Of course! Here's a comprehensive guide to using ElementTree in Python, covering its core concepts, common operations, and best practices.
What is ElementTree?
ElementTree is a simple and efficient API for parsing and manipulating XML data. It represents an XML document as a tree of Python objects, where each element is a node in the tree.
Python has two main implementations of ElementTree:
- xml.etree.ElementTree: The standard, built-in library. In CPython it is backed by a C accelerator, so it is fast and memory-efficient enough for most use cases. It supports only a limited subset of XPath. This is the one you should use by default.
- lxml.etree: A third-party library that is more powerful and feature-rich. It is often faster on large documents and supports advanced features like full XPath 1.0, XSLT, XML Schema validation, and tolerant parsing of broken HTML. It's the go-to choice for complex applications or performance-critical tasks.
For this guide, we'll focus on the built-in xml.etree.ElementTree.
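Because lxml.etree mirrors most of the ElementTree API, a common pattern is to try importing it and fall back to the standard library. The following is only a sketch: it works whether or not lxml is installed, and the flag name `USING_LXML` is made up for illustration.

```python
# Prefer lxml when it is installed, otherwise fall back to the standard library.
try:
    from lxml import etree as ET
    USING_LXML = True
except ImportError:
    import xml.etree.ElementTree as ET
    USING_LXML = False

# The basic parsing and lookup calls are the same in both libraries.
root = ET.fromstring("<library><book id='bk101'/></library>")
print(f"Using lxml: {USING_LXML}")
print(root.find("book").get("id"))
```

The two libraries are not fully interchangeable, so this fallback only makes sense for code that sticks to the shared subset of the API.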
Parsing XML
You can parse XML from a file or directly from a string.
From a File (ET.parse)
This is the most common method. It reads the entire file into an ElementTree object, which represents the whole document.
import xml.etree.ElementTree as ET
try:
    # Parse the XML file
    tree = ET.parse('my_data.xml')
    # Get the root element of the tree
    root = tree.getroot()
    print(f"Root tag: {root.tag}")
    print(f"Root attributes: {root.attrib}")
except FileNotFoundError:
    print("Error: 'my_data.xml' not found.")
    # Create a dummy file for demonstration
    xml_content = """<?xml version="1.0"?>
<library location="Main Street">
    <book id="bk101">
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
        <price>44.95</price>
    </book>
    <book id="bk102">
        <author>Ralls, Kim</author>
        <title>Midnight Rain</title>
        <price>5.95</price>
    </book>
</library>"""
    with open('my_data.xml', 'w') as f:
        f.write(xml_content)
    print("Created a dummy 'my_data.xml' file. Please re-run the script.")
From a String (ET.fromstring)
If your XML is already in a string, you can parse it directly. This gives you the root Element object immediately.
import xml.etree.ElementTree as ET
xml_string = """<?xml version="1.0"?>
<library location="Main Street">
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<price>44.95</price>
</book>
</library>"""
# Parse from a string
root = ET.fromstring(xml_string)
print(f"Root tag from string: {root.tag}")
print(f"Root attribute 'location': {root.get('location')}") # .get() is a safe way to get attributes
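As a rough illustration of how the two entry points differ, and of what happens with malformed input, here is a small sketch. It uses only the standard library; `io.BytesIO` stands in for a real file opened in binary mode, and the sample strings are made up.

```python
import io
import xml.etree.ElementTree as ET

xml_string = "<library location='Main Street'><book id='bk101'/></library>"

# ET.parse accepts a filename or a file-like object and returns an ElementTree.
tree = ET.parse(io.BytesIO(xml_string.encode("utf-8")))
# ET.fromstring returns the root Element directly.
root = ET.fromstring(xml_string)

print(type(tree).__name__)
print(tree.getroot().tag == root.tag)

# Malformed XML (here, an unclosed <book>) raises ET.ParseError.
try:
    ET.fromstring("<library><book></library>")
    parse_failed = False
except ET.ParseError as exc:
    parse_failed = True
    print(f"Could not parse XML: {exc}")
```

Catching `ET.ParseError` alongside `FileNotFoundError` is usually worthwhile when the XML comes from outside your program.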
Navigating the Tree
Once you have the root Element, you can navigate the tree using properties and methods.
Key Properties of an Element Object:
- .tag: The tag name (e.g., 'book', 'author').
- .text: The text content inside the element (e.g., 'Gambardella, Matthew').
- .attrib: A dictionary of the element's attributes (e.g., {'id': 'bk101'}).
- .tail: Text content that comes after the element's closing tag. (Less commonly used.)
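The difference between .text and .tail is easiest to see with mixed content. A minimal sketch, using a made-up `<p>` snippet rather than the library example:

```python
import xml.etree.ElementTree as ET

# Mixed content: text appears both inside and after the <b> element.
p = ET.fromstring("<p class='intro'>Hello <b>world</b>!</p>")
b = p.find("b")

print(p.tag)          # p
print(p.attrib)       # {'class': 'intro'}
print(repr(p.text))   # 'Hello '  (text before the first child)
print(repr(b.text))   # 'world'
print(repr(b.tail))   # '!'       (text after </b>, before </p>)
print(repr(p.tail))   # None      (nothing follows the root element)
```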
Navigation Methods:
- element.iter(): Iterates over the element itself and all of its descendants, in document order.
- element.iter(tag): Iterates over the element and all of its descendants that have a specific tag.
- element.findall(tag): Finds all direct children with a specific tag. Returns a list.
- element.find(tag): Finds the first direct child with a specific tag. Returns an Element or None.
- element.text: Gets/sets the text content.
# Assume 'root' is the <library> element from the examples above
# --- Find all 'book' elements ---
all_books = root.findall('book')
print(f"\nFound {len(all_books)} books.")
# --- Iterate through the books ---
for book in all_books:
    print("\n--- Processing a Book ---")
    print(f" Tag: {book.tag}")
    print(f" Attributes: {book.attrib}")
    # Find the title and author text
    # .find() looks for the first child with that tag
    title_element = book.find('title')
    author_element = book.find('author')
    if title_element is not None:
        print(f" Title: {title_element.text}")
    if author_element is not None:
        print(f" Author: {author_element.text}")
# --- Find the first book ---
first_book = root.find('book')
if first_book is not None:
    print(f"\nFirst book ID: {first_book.get('id')}") # Use .get() for attributes
# --- Iterate over every single element in the document ---
print("\n--- Iterating over all elements ---")
for elem in root.iter():
    print(f"Tag: {elem.tag}, Text: {elem.text}")
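Since find() and findall() only look at direct children while iter() walks the whole subtree, results can differ on nested documents. The sketch below shows this on a made-up document containing a hypothetical `<shelf>` wrapper; it also uses the `.//` path prefix and `findtext()`, both part of the standard library API.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<library>"
    "<book id='bk101'><title>XML Developer's Guide</title></book>"
    "<shelf><book id='bk102'><title>Midnight Rain</title></book></shelf>"
    "</library>"
)

# Direct children only: the book nested inside <shelf> is not found.
direct = [b.get("id") for b in root.findall("book")]
# Recursive: iter() and the './/' path both reach nested books.
nested = [b.get("id") for b in root.iter("book")]
via_path = [b.get("id") for b in root.findall(".//book")]

# findtext() returns the text of the first match, or a default if none exists.
first_title = root.findtext("book/title")
missing = root.findtext("book/isbn", default="n/a")

print(direct)    # ['bk101']
print(nested)    # ['bk101', 'bk102']
print(via_path)  # ['bk101', 'bk102']
print(first_title, missing)
```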
Modifying XML
ElementTree makes it easy to create, modify, and delete elements.
Creating and Adding Elements
import xml.etree.ElementTree as ET
# Start with a new root element
new_root = ET.Element("inventory")
# Create a new product element
product = ET.Element("product")
product.set("id", "p123") # Add an attribute
product.set("category", "electronics")
# Create sub-elements and add text
name = ET.SubElement(product, "name")
name.text = "Super Widget"
price = ET.SubElement(product, "price")
price.text = "99.99"
# Add the product to the root
new_root.append(product)
# You can also create elements from strings
# ET.fromstring returns an element, so we append it
another_product_str = "<product id='p456'><name>Mega Gadget</name><price>149.50</price></product>"
new_root.append(ET.fromstring(another_product_str))
print(ET.tostring(new_root, encoding='unicode'))
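By default, ET.tostring() emits everything on one line. If you want readable output, `ET.indent()` (added in Python 3.9) can insert whitespace in place. A minimal sketch, guarded for older Python versions and using a small made-up inventory:

```python
import xml.etree.ElementTree as ET

inventory = ET.Element("inventory")
# Attributes can also be passed as keyword arguments to SubElement.
product = ET.SubElement(inventory, "product", id="p123")
ET.SubElement(product, "name").text = "Super Widget"

compact = ET.tostring(inventory, encoding="unicode")
print(compact)

# ET.indent modifies the tree in place; it only exists on Python 3.9+.
if hasattr(ET, "indent"):
    ET.indent(inventory, space="  ")
pretty = ET.tostring(inventory, encoding="unicode")
print(pretty)
```

Note that indenting changes the .text and .tail whitespace of the elements, so it is best applied just before output.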
Modifying Existing Elements
# Let's modify the first book from our original example
# Assume 'root' is the <library> element
first_book = root.find('book')
# Change an attribute
first_book.set('id', 'bk101-updated')
# Change text content
title_element = first_book.find('title')
if title_element is not None:
    title_element.text = "XML Developer's Guide (2nd Edition)"
# Add a new element
year_element = ET.SubElement(first_book, 'year')
year_element.text = "2005"
print("\n--- After Modification ---")
print(ET.tostring(root, encoding='unicode'))
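One thing to keep in mind when modifying values: .text always holds a string (or None), so numeric edits need an explicit conversion in both directions. Here is a sketch that applies a made-up 10% discount, using `Decimal` to avoid floating-point rounding surprises:

```python
from decimal import Decimal, ROUND_HALF_UP
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<library>"
    "<book id='bk101'><price>44.95</price></book>"
    "<book id='bk102'><price>5.95</price></book>"
    "</library>"
)

for price in root.iter("price"):
    # Convert the text to a number, compute, then write a string back.
    discounted = Decimal(price.text) * Decimal("0.9")
    price.text = str(discounted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))

prices = [p.text for p in root.iter("price")]
print(prices)  # ['40.46', '5.36']
```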
Removing Elements
# Let's remove the <price> element from the first book
first_book = root.find('book')
price_to_remove = first_book.find('price')
if price_to_remove is not None:
    first_book.remove(price_to_remove) # The remove() method is called on the parent
print("\n--- After Removal ---")
print(ET.tostring(root, encoding='unicode'))
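Standard-library elements do not keep a reference to their parent, which is why remove() has to be called on the parent. When removing several children in a loop, it is safer to iterate over the list returned by findall() than over the parent itself. A sketch with made-up data and an arbitrary price threshold of 30:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<library>"
    "<book id='bk101'><price>44.95</price></book>"
    "<book id='bk102'><price>5.95</price></book>"
    "<book id='bk103'><price>36.95</price></book>"
    "</library>"
)

# findall() returns a separate list, so removing from root inside the loop
# does not disturb the iteration.
for book in root.findall("book"):
    if float(book.findtext("price")) > 30:
        root.remove(book)

remaining = [b.get("id") for b in root.findall("book")]
print(remaining)  # ['bk102']
```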
Writing XML to a File
After modifying the tree, you'll want to save it. Use tree.write().
# If you modified a tree object (from ET.parse)
tree.write('my_data_modified.xml', encoding='utf-8', xml_declaration=True)
# If you only have an Element object (like our new_root)
# You need to wrap it in an ElementTree first
new_tree = ET.ElementTree(new_root)
new_tree.write('new_inventory.xml', encoding='utf-8', xml_declaration=True)
print("\nSaved modified and new XML files.")
- encoding='utf-8': Highly recommended for compatibility.
- xml_declaration=True: Adds the <?xml version='1.0' encoding='utf-8'?> line at the top.
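To check that a written file is what you expect, you can write it and parse it straight back. The sketch below uses a temporary directory so it leaves nothing behind; the file name `inventory.xml` is just an example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.Element("inventory")
ET.SubElement(root, "product", id="p123").text = "Super Widget"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "inventory.xml")
    # Wrap the Element in an ElementTree so it can be written to disk.
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

    # Read the first line back to see the XML declaration.
    with open(path, encoding="utf-8") as f:
        first_line = f.readline().strip()

    # Parse the file again to confirm the content survived the round trip.
    reloaded = ET.parse(path).getroot()

print(first_line)
print(reloaded.find("product").text)
```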
Namespaces
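XML namespaces can complicate things. In the document, tags carry a prefix, like <ns0:book>, but ElementTree stores each tag in the expanded form {namespace-uri}localname. When searching with find() or findall(), you must therefore either spell out the full namespace URI in that form or pass a mapping of prefixes to URIs.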
A common pattern is to define a dictionary of prefixes and URIs.
import xml.etree.ElementTree as ET
xml_with_namespace = """<?xml version="1.0"?>
<root xmlns:ns0="http://example.com/books" xmlns:ns1="http://example.com/price">
<ns0:book id="bk101">
<ns0:author>Gambardella, Matthew</ns0:author>
<ns0:title>XML Developer's Guide</ns0:title>
<ns1:amount>44.95</ns1:amount>
</ns0:book>
</root>"""
root = ET.fromstring(xml_with_namespace)
# Define the namespace map
namespaces = {
    'b': 'http://example.com/books',  # 'b' is our chosen prefix
    'p': 'http://example.com/price'   # 'p' is our chosen prefix
}
# Now you can use the prefix in your find calls
# The format is prefix:localname, with the dictionary passed as the second argument
book_element = root.find('b:book', namespaces)
if book_element is not None:
    print(f"Found book with ID: {book_element.get('id')}")
    # Only search inside the book once we know it exists
    author_element = book_element.find('b:author', namespaces)
    price_element = book_element.find('p:amount', namespaces)
    if author_element is not None:
        print(f"Author: {author_element.text}")
    if price_element is not None:
        print(f"Price: {price_element.text}")
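The prefixes in your dictionary do not have to match the ones in the document, because ElementTree matches on the URI. The following sketch shows how a tag is stored internally, how to search with the expanded name directly, and how `ET.register_namespace()` controls the prefix used on output. The prefix `bk` is an arbitrary choice for illustration.

```python
import xml.etree.ElementTree as ET

xml_with_namespace = (
    '<root xmlns:ns0="http://example.com/books">'
    '<ns0:book id="bk101"><ns0:title>Midnight Rain</ns0:title></ns0:book>'
    '</root>'
)
root = ET.fromstring(xml_with_namespace)
book = root[0]

# Internally the tag is stored as {namespace-uri}localname.
print(book.tag)  # {http://example.com/books}book

# Searching with the expanded name needs no prefix dictionary.
same_book = root.find("{http://example.com/books}book")

# register_namespace sets the prefix used when serializing.
# Note that this is a global, process-wide setting.
ET.register_namespace("bk", "http://example.com/books")
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```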
Best Practices and Summary
| Task | Method / Function | Notes |
|---|---|---|
| Parse from File | ET.parse('file.xml') | Returns an ElementTree object. Use .getroot() to get the root Element. |
| Parse from String | ET.fromstring(xml_string) | Returns the root Element directly. |
| Find Child Element | parent.find('tag') | Finds the first direct child. Returns Element or None. |
| Find Child Elements | parent.findall('tag') | Finds all direct children. Returns a list. |
| Iterate Recursively | root.iter('tag') | Iterates over all matching elements in the entire tree. |
| Get Attribute | element.get('attr_name') | Safe way to get an attribute (returns None if not found). |
| Set Attribute | element.set('attr_name', 'value') | Adds or updates an attribute. |
| Get Text | element.text | Returns the text content of an element. |
| Set Text | element.text = 'new text' | |
| Add Sub-Element | ET.SubElement(parent, 'tag') | Creates and adds a new element to a parent. |
| Remove Element | parent.remove(child) | Removes an element. The child must be a direct child of the parent. |
| Write to File | tree.write('file.xml', ...) | For an ElementTree object. |
| Write Element to File | ET.ElementTree(element).write(...) | For a single Element object. |
| Handle Namespaces | root.find('prefix:tag', namespaces) | Define a {'prefix': 'uri'} dictionary and pass it as the second argument. |
