How Can Python's ElementTree Parse XML Efficiently?

This guide covers using ElementTree in Python: its core concepts, common operations, and best practices.

What is ElementTree?

ElementTree is a simple and efficient API for parsing and manipulating XML data. It represents an XML document as a tree of Python objects, where each element is a node in the tree.

Two implementations of the ElementTree API are in common use:

xml.etree.ElementTree: The standard library module. It is fast and memory-efficient (CPython automatically uses a C accelerator under the hood) and is sufficient for most use cases. Use this one by default.
lxml.etree: A third-party library that implements the ElementTree API on top of libxml2 and libxslt and adds much more. It is often faster and supports full XPath 1.0, XSLT, XML Schema validation, and lenient parsing of broken HTML. It is the usual choice for complex or performance-critical applications.
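If you want code that uses lxml when it happens to be installed but still runs with only the standard library, a common pattern is a guarded import. This is a minimal sketch: it only relies on the small subset of the API both libraries share (`fromstring()`, `find()`, `.tag`, `.get()`), and anything beyond that subset should be tested against both.

```python
# Prefer lxml if it is installed; otherwise fall back to the standard library.
# The two are NOT fully interchangeable (e.g. only lxml has .xpath()).
try:
    from lxml import etree  # third-party, may be missing
except ImportError:
    import xml.etree.ElementTree as etree  # always available

# Bytes input works with both implementations.
root = etree.fromstring(b"<library><book id='bk101'/></library>")
print(root.tag)                     # library
print(root.find("book").get("id"))  # bk101
```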

For this guide, we'll focus on the built-in xml.etree.ElementTree.

Parsing XML

You can parse XML from a file or directly from a string.

From a File (`ET.parse`)

This is the most common method. It reads the entire file into an ElementTree object, which represents the whole document.

import xml.etree.ElementTree as ET
try:
    # Parse the XML file
    tree = ET.parse('my_data.xml')
    # Get the root element of the tree
    root = tree.getroot()
    print(f"Root tag: {root.tag}")
    print(f"Root attributes: {root.attrib}")
except FileNotFoundError:
    print("Error: 'my_data.xml' not found.")
    # Create a dummy file for demonstration
    xml_content = """<?xml version="1.0"?>
<library location="Main Street">
    <book id="bk101">
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
        <price>44.95</price>
    </book>
    <book id="bk102">
        <author>Ralls, Kim</author>
        <title>Midnight Rain</title>
        <price>5.95</price>
    </book>
</library>"""
    with open('my_data.xml', 'w') as f:
        f.write(xml_content)
    print("Created a dummy 'my_data.xml' file. Please re-run the script.")
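Because `ET.parse` builds the whole document in memory, it can become the bottleneck for very large files. The standard library also offers `ET.iterparse`, which yields elements as they finish parsing. The sketch below assumes each `<book>` can be processed independently; `iter_books` is a hypothetical helper name, and an in-memory buffer stands in for a real file (iterparse also accepts a filename).

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a large file on disk.
xml_bytes = b"""<library>
  <book id="bk101"><title>XML Developer's Guide</title><price>44.95</price></book>
  <book id="bk102"><title>Midnight Rain</title><price>5.95</price></book>
</library>"""

def iter_books(source):
    """Yield (id, title, price) for each <book> while streaming the input."""
    # An 'end' event fires once an element and all of its children are parsed.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "book":
            yield elem.get("id"), elem.findtext("title"), float(elem.findtext("price"))
            # Free the children and text of the processed <book>.
            elem.clear()

for book in iter_books(io.BytesIO(xml_bytes)):
    print(book)
```

Note that `clear()` empties each `<book>` but the empty element stays attached to `<library>`; for truly huge inputs you may also want to detach processed elements from their parent.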

From a String (`ET.fromstring`)

If your XML is already in a string, you can parse it directly. This gives you the root Element object immediately.

import xml.etree.ElementTree as ET
xml_string = """<?xml version="1.0"?>
<library location="Main Street">
    <book id="bk101">
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
        <price>44.95</price>
    </book>
</library>"""
# Parse from a string
root = ET.fromstring(xml_string)
print(f"Root tag from string: {root.tag}")
print(f"Root attribute 'location': {root.get('location')}") # .get() is a safe way to get attributes
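Both `ET.parse` and `ET.fromstring` raise `ET.ParseError` when the input is not well-formed XML. A minimal sketch of handling that, using a hypothetical helper `parse_or_none`; the error's `position` attribute holds a `(line, column)` tuple.

```python
import xml.etree.ElementTree as ET

def parse_or_none(text):
    """Return the root Element, or None if the text is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the problem.
        line, column = err.position
        print(f"Malformed XML at line {line}, column {column}: {err}")
        return None

print(parse_or_none("<library><book></library>"))  # mismatched tag, prints None
print(parse_or_none("<library/>").tag)            # library
```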

Navigating the Tree

Once you have the root Element, you can navigate the tree using properties and methods.

Key Properties of an `Element` Object:

.tag: The tag name (e.g., 'book', 'author').
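.text: The text directly inside the element, before its first child (e.g., 'Gambardella, Matthew'), or None if there is none. It can also be assigned to.
.attrib: A dictionary of the element's attributes (e.g., {'id': 'bk101'}).
.tail: The text that follows the element's closing tag, up to the next sibling or the parent's closing tag. Mostly relevant for mixed content.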
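The difference between `.text` and `.tail` only shows up with mixed content, where text and child elements are interleaved. A small illustration:

```python
import xml.etree.ElementTree as ET

para = ET.fromstring("<p>Hello <b>bold</b> world</p>")
bold = para.find("b")
print(repr(para.text))  # 'Hello '  (text before the first child)
print(repr(bold.text))  # 'bold'
print(repr(bold.tail))  # ' world'  (text after </b>, still inside <p>)

# itertext() walks every text and tail in document order.
print("".join(para.itertext()))  # Hello bold world
```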

Navigation Methods:

element.iter(): Iterates over the element itself and all of its descendants, in document order.
element.iter(tag): Iterates over the element and its descendants that have a specific tag.
element.findall(tag): Finds all direct children with a specific tag. Returns a list. It also accepts a limited XPath path, such as './/tag' for all descendants.
element.find(tag): Finds the first direct child with a specific tag. Returns an Element or None.
element.get(name, default=None): Returns an attribute value, or default if the attribute is missing.
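The most common surprise is that `findall('tag')` only looks one level down, while `iter('tag')` is recursive. A small document with one nested `<book>` makes the difference visible:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<library><book><title>A</title></book>"
    "<shelf><book><title>B</title></book></shelf></library>"
)
print(len(root.findall("book")))            # 1: only the direct child <book>
print(len(root.findall(".//book")))         # 2: './/' searches all descendants
print(len(list(root.iter("book"))))         # 2: iter() is recursive too
print([t.text for t in root.iter("title")])  # ['A', 'B']
```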

# Assume 'root' is the <library> element from the examples above
# --- Find all 'book' elements ---
all_books = root.findall('book')
print(f"\nFound {len(all_books)} books.")
# --- Iterate through the books ---
for book in all_books:
    print("\n--- Processing a Book ---")
    print(f"  Tag: {book.tag}")
    print(f"  Attributes: {book.attrib}")
    # Find the title and author text
    # .find() looks for the first child with that tag
    title_element = book.find('title')
    author_element = book.find('author')
    if title_element is not None:
        print(f"  Title: {title_element.text}")
    if author_element is not None:
        print(f"  Author: {author_element.text}")
# --- Find the first book ---
first_book = root.find('book')
if first_book is not None:
    print(f"\nFirst book ID: {first_book.get('id')}") # Use .get() for attributes
# --- Iterate over every single element in the document ---
print("\n--- Iterating over all elements ---")
for elem in root.iter():
    print(f"Tag: {elem.tag}, Text: {elem.text}")
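`find()` and `findall()` accept the limited XPath subset supported by ElementTree, including attribute predicates like `[@id='...']` and child-text predicates like `[price='...']`. `findtext()` is a shortcut that returns the text of the first match, or a default when nothing matches. A sketch on the library example:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<library location="Main Street">
    <book id="bk101"><title>XML Developer's Guide</title><price>44.95</price></book>
    <book id="bk102"><title>Midnight Rain</title><price>5.95</price></book>
</library>""")

# [@attr='value'] filters on an attribute.
book = root.find("book[@id='bk102']")
print(book.findtext("title"))  # Midnight Rain

# findtext() returns the default when there is no matching element.
print(root.findtext("book/isbn", default="n/a"))  # n/a

# [tag='text'] keeps elements whose child <tag> has exactly that text.
cheap = root.findall("book[price='5.95']")
print([b.get("id") for b in cheap])  # ['bk102']
```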

Modifying XML

ElementTree makes it easy to create, modify, and delete elements.

Creating and Adding Elements

import xml.etree.ElementTree as ET
# Start with a new root element
new_root = ET.Element("inventory")
# Create a new product element
product = ET.Element("product")
product.set("id", "p123") # Add an attribute
product.set("category", "electronics")
# Create sub-elements and add text
name = ET.SubElement(product, "name")
name.text = "Super Widget"
price = ET.SubElement(product, "price")
price.text = "99.99"
# Add the product to the root
new_root.append(product)
# You can also create elements from strings
# ET.fromstring returns an element, so we append it
another_product_str = "<product id='p456'><name>Mega Gadget</name><price>149.50</price></product>"
new_root.append(ET.fromstring(another_product_str))
print(ET.tostring(new_root, encoding='unicode'))
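`ET.tostring` prints everything on one line. On Python 3.9 and later, `ET.indent` adds indentation whitespace to the tree in place before serializing; on older versions this function is not available. A short sketch:

```python
import xml.etree.ElementTree as ET

inventory = ET.Element("inventory")
# Keyword arguments to SubElement become attributes.
product = ET.SubElement(inventory, "product", id="p123")
ET.SubElement(product, "name").text = "Super Widget"

ET.indent(inventory, space="  ")  # Python 3.9+; modifies the tree in place
pretty = ET.tostring(inventory, encoding="unicode")
print(pretty)
```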

Modifying Existing Elements

# Let's modify the first book from our original example
# Assume 'root' is the <library> element
first_book = root.find('book')
# Change an attribute
first_book.set('id', 'bk101-updated')
# Change text content
title_element = first_book.find('title')
if title_element is not None:
    title_element.text = "XML Developer's Guide (2nd Edition)"
# Add a new element
year_element = ET.SubElement(first_book, 'year')
year_element.text = "2005"
print("\n--- After Modification ---")
print(ET.tostring(root, encoding='unicode'))
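Since `.text` is always a string (or `None`), numeric updates need a conversion in both directions. A sketch that applies a hypothetical 10% discount to every `<price>` in the tree:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<library>
    <book id="bk101"><price>44.95</price></book>
    <book id="bk102"><price>5.95</price></book>
</library>""")

for price in root.iter("price"):
    if price.text is not None:
        # Convert to a number, compute, then format back to a string.
        discounted = float(price.text) * 0.9  # hypothetical 10% discount
        price.text = f"{discounted:.2f}"

print([p.text for p in root.iter("price")])
```

For money you may prefer `decimal.Decimal` over `float` to control rounding exactly.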

Removing Elements

# Let's remove the <price> element from the first book
first_book = root.find('book')
price_to_remove = first_book.find('price')
if price_to_remove is not None:
    first_book.remove(price_to_remove) # The remove() method is called on the parent
print("\n--- After Removal ---")
print(ET.tostring(root, encoding='unicode'))
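When removing several elements, iterate over a separate list rather than over the parent itself, because removing children while iterating over the same parent can skip elements. `findall()` already returns a new list, so it is safe to use here. A sketch that drops every book priced under 10:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<library>
    <book id="bk101"><price>44.95</price></book>
    <book id="bk102"><price>5.95</price></book>
    <book id="bk103"><price>8.00</price></book>
</library>""")

# findall() returns a list snapshot, so root.remove() inside the loop is safe.
for book in root.findall("book"):
    if float(book.findtext("price", default="0")) < 10:
        root.remove(book)

print([b.get("id") for b in root])  # ['bk101']
```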

Writing XML to a File

After modifying the tree, you'll want to save it. Use tree.write().

# If you modified a tree object (from ET.parse)
tree.write('my_data_modified.xml', encoding='utf-8', xml_declaration=True)
# If you only have an Element object (like our new_root)
# You need to wrap it in an ElementTree first
new_tree = ET.ElementTree(new_root)
new_tree.write('new_inventory.xml', encoding='utf-8', xml_declaration=True)
print("\nSaved modified and new XML files.")

encoding='utf-8': Recommended. The default, 'us-ascii', writes every non-ASCII character as a numeric character reference such as &#233;.
xml_declaration=True: Adds the <?xml version='1.0' encoding='utf-8'?> line at the top.
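`write()` also accepts any binary file-like object, which is handy for tests or for sending XML over a network. This sketch writes to an in-memory buffer and contrasts the result with `ET.tostring`'s default `us-ascii` encoding:

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("inventory")
ET.SubElement(root, "product", id="p123").text = "Café Widget"  # non-ASCII on purpose

buf = io.BytesIO()  # any binary file-like object works in place of a filename
ET.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)
data = buf.getvalue()
print(data.decode("utf-8"))
# <?xml version='1.0' encoding='utf-8'?>
# <inventory><product id="p123">Café Widget</product></inventory>

# With the default us-ascii encoding, é becomes a character reference.
print(ET.tostring(root))
# b'<inventory><product id="p123">Caf&#233; Widget</product></inventory>'
```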

Namespaces

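XML namespaces can complicate things. ElementTree discards the prefixes used in the source (such as ns0 in <ns0:book>) and stores each qualified tag as {uri}localname, for example {http://example.com/books}book. When searching with find() or findall(), you must therefore either write out the full namespace URI in that form or supply a prefix map.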

A common pattern is to define a dictionary of prefixes and URIs.

import xml.etree.ElementTree as ET
xml_with_namespace = """<?xml version="1.0"?>
<root xmlns:ns0="http://example.com/books" xmlns:ns1="http://example.com/price">
    <ns0:book id="bk101">
        <ns0:author>Gambardella, Matthew</ns0:author>
        <ns0:title>XML Developer's Guide</ns0:title>
        <ns1:amount>44.95</ns1:amount>
    </ns0:book>
</root>"""
root = ET.fromstring(xml_with_namespace)
# Define the namespace map
namespaces = {
    'b': 'http://example.com/books',  # 'b' is our chosen prefix
    'p': 'http://example.com/price'   # 'p' is our chosen prefix
}
# Now you can use the prefix in your find calls
# The format is prefix:localname; ElementTree expands it to {uri}localname
book_element = root.find('b:book', namespaces)
if book_element is not None:
    print(f"Found book with ID: {book_element.get('id')}")
    # Only search inside the book once we know it exists
    author_element = book_element.find('b:author', namespaces)
    price_element = book_element.find('p:amount', namespaces)
    if author_element is not None:
        print(f"Author: {author_element.text}")
    if price_element is not None:
        print(f"Price: {price_element.text}")
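The prefix map is optional: you can also search with the expanded `{uri}localname` form (Clark notation) directly. On output, ElementTree invents prefixes such as `ns0` unless you call `ET.register_namespace`, which sets a global prefix for a URI. A sketch with a single namespace; the `bk` prefix is an arbitrary choice:

```python
import xml.etree.ElementTree as ET

BOOKS = "http://example.com/books"
root = ET.fromstring(
    f'<root xmlns:b="{BOOKS}"><b:book id="bk101"><b:title>XML Guide</b:title></b:book></root>'
)

# Clark notation: {uri}localname, no prefix map needed.
book = root.find(f"{{{BOOKS}}}book")
print(book.tag)                            # {http://example.com/books}book
print(book.findtext(f"{{{BOOKS}}}title"))  # XML Guide

# Global registration: affects all later serialization in this process.
ET.register_namespace("bk", BOOKS)
out = ET.tostring(root, encoding="unicode")
print(out)
```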

Best Practices and Summary

| Task | Method / Function | Notes |
| --- | --- | --- |
| Parse from File | `ET.parse('file.xml')` | Returns an `ElementTree` object. Use `.getroot()` to get the root `Element`. |
| Parse from String | `ET.fromstring(xml_string)` | Returns the root `Element` directly. |
| Find Child Element | `parent.find('tag')` | Finds the first direct child. Returns `Element` or `None`. |
| Find Child Elements | `parent.findall('tag')` | Finds all direct children. Returns a list. |
| Iterate Recursively | `root.iter('tag')` | Iterates over all matching elements in the entire tree. |
| Get Attribute | `element.get('attr_name')` | Safe way to get an attribute (returns `None` if not found). |
| Set Attribute | `element.set('attr_name', 'value')` | Adds or updates an attribute. |
| Get Text | `element.text` | Returns the text content of an element. |
| Set Text | `element.text = 'new text'` | |
| Add Sub-Element | `ET.SubElement(parent, 'tag')` | Creates and adds a new element to a parent. |
| Remove Element | `parent.remove(child)` | Removes an element. The child must be a direct child of the parent. |
| Write to File | `tree.write('file.xml', ...)` | For an `ElementTree` object. |
| Write Element to File | `ET.ElementTree(element).write(...)` | For a single `Element` object. |
| Handle Namespaces | `root.find('prefix:tag', namespaces)` | Define a `{'prefix': 'uri'}` dictionary and pass it as the second argument. |
