Of course! Let's dive deep into using xml.dom.minidom in Python.

What is xml.dom.minidom?
minidom stands for "Mini DOM" (Document Object Model). It's a lightweight implementation of the DOM API for Python, included in the standard library.
Key Characteristics:
- Standard Library: You don't need to install anything; just import xml.dom.minidom.
- Easy to Use: It's designed to be simpler and smaller than a full DOM implementation.
- Good for Small to Medium Files: It loads the entire XML document into memory. For very large files (gigabytes), this can exhaust memory, and a streaming parser like xml.sax would be better.
- Read-Write Functionality: You can parse existing XML and create new XML documents from scratch.
Core Concepts: The Document Object Model (DOM)
When you parse an XML file with minidom, it converts the text-based structure into a tree of Python objects in memory. You then navigate this tree to access your data.
The main objects you'll interact with are:

- Document: The root object of the XML tree, representing the entire file.
- Element: Represents an XML tag (e.g., <book>, <author>). It can have attributes, child elements, and text content.
- Node: The base class for all objects in the DOM tree (Document, Element, Text, etc.). You'll often see methods that operate on Node objects.
- Attr: Represents an attribute, i.e. a name-value pair on an element (e.g., id="123").
- Text: Represents the text content inside an element (e.g., the text "J.R.R. Tolkien" inside a <name> tag).
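To see these classes concretely, here is a minimal sketch that parses a tiny inline document (the XML string is an illustrative assumption, not part of the sample file) and prints each object's class name, its numeric nodeType, and whether it is a Node subclass:

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# A tiny inline document, used only for illustration
doc = minidom.parseString('<name id="123">J.R.R. Tolkien</name>')

elem = doc.documentElement          # Element for <name>
text = elem.firstChild              # Text node holding "J.R.R. Tolkien"
attr = elem.getAttributeNode('id')  # Attr node for id="123"

for obj in (doc, elem, text, attr):
    # Every DOM object is a Node subclass and carries a numeric nodeType
    print(type(obj).__name__, obj.nodeType, isinstance(obj, Node))
```

The nodeType values can be compared against constants such as Node.ELEMENT_NODE and Node.TEXT_NODE rather than hard-coded numbers.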
Step-by-Step Guide with Examples
Let's use a sample XML file named library.xml for our examples.
library.xml
```xml
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book category="fantasy">
    <title lang="en">The Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <year>1954</year>
    <price>22.99</price>
  </book>
  <book category="scifi">
    <title lang="en">Dune</title>
    <author>Frank Herbert</author>
    <year>1965</year>
    <price>21.99</price>
  </book>
</library>
```
Parsing an XML File
First, you need to read the file from disk and parse it into a Document object.
```python
import xml.dom.minidom

# Open the XML file and read its contents
with open('library.xml', 'r', encoding='utf-8') as f:
    xml_content = f.read()

# Parse the XML string
dom = xml.dom.minidom.parseString(xml_content)

# You can also parse directly from a filename or a file object:
# dom = xml.dom.minidom.parse('library.xml')

print(dom)  # Prints the Document object, e.g. <xml.dom.minidom.Document object at 0x...>
```
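If you don't have library.xml on disk yet, the following sketch makes parsing reproducible: it writes a shortened copy of the sample to a temporary file (the SAMPLE constant and the temp-file approach are assumptions for illustration) and hands the path straight to parse(), which opens the file itself:

```python
import os
import tempfile
import xml.dom.minidom

# Shortened copy of the library.xml sample, so the sketch runs without an existing file
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book category="fantasy"><title lang="en">The Lord of the Rings</title></book>
  <book category="scifi"><title lang="en">Dune</title></book>
</library>
"""

# Write the sample to a temporary file (delete=False so it can be reopened on Windows)
with tempfile.NamedTemporaryFile('w', suffix='.xml', encoding='utf-8', delete=False) as tmp:
    tmp.write(SAMPLE)
    path = tmp.name

try:
    # parse() accepts a filename (opened in binary mode) or an open file object
    dom = xml.dom.minidom.parse(path)
    print(dom.documentElement.tagName)  # library
finally:
    os.remove(path)  # clean up the temporary file; the parsed tree stays in memory
```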
Navigating the DOM Tree
The core of using minidom is navigating the tree structure.
Getting the Document Element
The documentElement property gives you the root element of the XML.

```python
# Get the root element
root = dom.documentElement
print(f"Root element: {root.tagName}")  # Output: Root element: library
```
Accessing Child Elements
The childNodes property returns all direct children of a node, not just elements, so you usually need to filter them.
```python
# Get all child nodes of the root
children = root.childNodes
for child in children:
    # Skip non-element nodes: the whitespace (newlines, indentation) between tags becomes Text nodes
    if child.nodeType == child.ELEMENT_NODE:
        print(f"Found a book element: {child.tagName}")
```
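To make the whitespace issue visible, this sketch parses a small indented document (an inline stand-in for library.xml) and compares the raw childNodes count with a filtered list. The element_children helper is a hypothetical name, not part of minidom:

```python
import xml.dom.minidom
from xml.dom.minidom import Node

# Indented XML: the newlines and spaces between tags become Text nodes
dom = xml.dom.minidom.parseString("<library>\n  <book/>\n  <book/>\n</library>")
root = dom.documentElement

def element_children(node):
    """Hypothetical helper: return only the Element children, skipping Text nodes."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]

print(len(root.childNodes))         # 5: text, book, text, book, text
print(len(element_children(root)))  # 2
```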
Finding Elements by Tag Name
The getElementsByTagName() method is very useful. It returns a NodeList (a list-like object) of all elements with the specified tag name, searching the entire document or a subtree.
```python
# Find all 'book' elements in the entire document
all_books = dom.getElementsByTagName('book')
print(f"\nFound {len(all_books)} book(s).")

# Iterate through the NodeList
for book in all_books:
    print(f" - Book element: {book.tagName}")
```
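The search scope depends on which node you call the method on. This sketch (using a compact inline version of the sample) shows a document-wide search, a search limited to one book's subtree, and the empty result for a tag that doesn't exist:

```python
import xml.dom.minidom

dom = xml.dom.minidom.parseString(
    "<library>"
    "<book><title>The Lord of the Rings</title></book>"
    "<book><title>Dune</title></book>"
    "</library>"
)

# Called on the Document: searches every descendant, at any depth
all_titles = dom.getElementsByTagName('title')
print(len(all_titles))  # 2

# Called on one Element: searches only that element's subtree
second_book = dom.getElementsByTagName('book')[1]
dune_titles = second_book.getElementsByTagName('title')
print(dune_titles[0].firstChild.data)  # Dune

# No match gives an empty NodeList, not an error
print(len(dom.getElementsByTagName('isbn')))  # 0
```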
Accessing Element Data
Once you have an Element object, you can get its data.
Getting Text Content
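Text content is stored in a child Text node, not in the element itself. The simplest way to get it is through the element's firstChild (the Text node) and its data property. This works when the element contains a single, non-empty text node.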
```python
# Get the first book element
first_book = all_books[0]

# Get the author element, then the text of its first child (a Text node)
author_node = first_book.getElementsByTagName('author')[0]
author_text = author_node.firstChild.data
print(f"\nThe author of the first book is: {author_text}")  # Output: J.R.R. Tolkien
```
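firstChild.data breaks down in two common cases: an empty element has no firstChild (it is None, so .data raises AttributeError), and a comment inside the element splits the text into several nodes. A more defensive sketch joins all direct Text and CDATA children; get_text is a hypothetical helper name, and the inline XML is illustrative:

```python
import xml.dom.minidom
from xml.dom.minidom import Node

def get_text(element):
    """Hypothetical helper: concatenate the element's direct Text and CDATA children."""
    return ''.join(
        child.data
        for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )

dom = xml.dom.minidom.parseString(
    "<book><author>Frank <!-- note -->Herbert</author><year/></book>"
)
author = dom.getElementsByTagName('author')[0]
year = dom.getElementsByTagName('year')[0]

print(repr(author.firstChild.data))  # 'Frank ' only: the comment splits the text
print(repr(get_text(author)))        # 'Frank Herbert'
print(repr(get_text(year)))          # '' (year.firstChild is None, so .data would raise)
```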
Getting Attributes
Use the getAttribute() method to get an attribute's value.
```python
# Get the 'category' attribute of the first book
category = first_book.getAttribute('category')
print(f"The category of the first book is: {category}")  # Output: fantasy

# Get the 'lang' attribute of the title
title_node = first_book.getElementsByTagName('title')[0]
lang = title_node.getAttribute('lang')
print(f"The language of the title is: {lang}")  # Output: en
```
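One detail worth knowing: getAttribute() returns an empty string, not None, when the attribute is missing. This sketch (with an illustrative inline element) uses hasAttribute() to tell "absent" apart from "present but empty", and the attributes map to list every attribute:

```python
import xml.dom.minidom

dom = xml.dom.minidom.parseString('<book category="fantasy"/>')
book = dom.documentElement

# getAttribute() returns '' (not None) when the attribute is absent
print(repr(book.getAttribute('isbn')))  # ''

# hasAttribute() distinguishes "missing" from "present but empty"
print(book.hasAttribute('isbn'))        # False
print(book.hasAttribute('category'))    # True

# book.attributes is a NamedNodeMap; items() returns (name, value) pairs
print(book.attributes.items())          # [('category', 'fantasy')]
```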
Creating a New XML Document from Scratch
You can also use minidom to generate XML.
```python
from xml.dom.minidom import Document

# 1. Create a new Document
new_doc = Document()

# 2. Create the root element
catalog = new_doc.createElement('catalog')
new_doc.appendChild(catalog)

# 3. Create a product element
product = new_doc.createElement('product')
product.setAttribute('id', 'p1001')
catalog.appendChild(product)

# 4. Create child elements and add text
name = new_doc.createElement('name')
name.appendChild(new_doc.createTextNode('Super Widget'))
product.appendChild(name)

price = new_doc.createElement('price')
price.appendChild(new_doc.createTextNode('19.99'))
product.appendChild(price)

# 5. Print the generated XML
# The 'toprettyxml' method formats the XML for readability
xml_str = new_doc.toprettyxml(indent="  ")
print("\n--- Generated XML ---")
print(xml_str)
```
Output of Generated XML:
```xml
<?xml version="1.0" ?>
<catalog>
  <product id="p1001">
    <name>Super Widget</name>
    <price>19.99</price>
  </product>
</catalog>
```
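Two serialization details are easy to miss: toxml() produces compact output with no added whitespace, and passing encoding= to toxml() or toprettyxml() returns bytes and adds the encoding to the XML declaration. A sketch (the file name catalog.xml in the commented-out save step is a hypothetical path):

```python
from xml.dom.minidom import Document

doc = Document()
catalog = doc.appendChild(doc.createElement('catalog'))  # appendChild returns the node it added
product = catalog.appendChild(doc.createElement('product'))
product.setAttribute('id', 'p1001')

# toxml() produces compact output with no added whitespace (returns str)
compact = doc.toxml()
print(compact)  # <?xml version="1.0" ?><catalog><product id="p1001"/></catalog>

# Passing encoding= returns bytes and adds the encoding to the XML declaration
pretty_bytes = doc.toprettyxml(indent="  ", encoding="utf-8")
print(pretty_bytes.decode("utf-8"))

# Saving to disk: write the bytes in binary mode ('catalog.xml' is a hypothetical path)
# with open('catalog.xml', 'wb') as f:
#     f.write(pretty_bytes)
```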
Important Methods and Properties
| Method/Property | Description |
|---|---|
| parse(file) / parseString(string) | Parses XML from a filename or file object, or from a string. |
| documentElement | Returns the root Element of the document. |
| getElementsByTagName(tag_name) | Returns a NodeList of all descendant elements with the given tag name. |
| nodeName | The name of the node (e.g., the tag name for an element). |
| nodeType | The type of the node (e.g., Node.ELEMENT_NODE, Node.TEXT_NODE). |
| nodeValue / data | The value of the node (e.g., the text content of a Text node; nodeValue is None for an Element). |
| getAttribute(name) | Gets the value of an attribute (an empty string if it is not set). |
| setAttribute(name, value) | Sets or creates an attribute. |
| appendChild(node) | Adds a node as the last child and returns it. |
| removeChild(node) | Removes a direct child node and returns it. |
| toxml() / toprettyxml(indent="\t") | Converts the DOM tree back to an XML string. toprettyxml adds indentation and newlines for readability. |
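removeChild() is listed above but not shown elsewhere, so here is a minimal sketch of deleting elements (the inline XML and the "remove the scifi book" rule are illustrative assumptions). In minidom, getElementsByTagName() returns a plain list rather than a live collection, so removing nodes while looping over it is safe:

```python
import xml.dom.minidom

dom = xml.dom.minidom.parseString(
    '<library><book category="fantasy"/><book category="scifi"/></library>'
)
root = dom.documentElement

for book in dom.getElementsByTagName('book'):
    if book.getAttribute('category') == 'scifi':
        # removeChild must be called on the node's direct parent
        book.parentNode.removeChild(book)
        book.unlink()  # optional: drop internal references so the node can be garbage-collected

print(root.toxml())  # <library><book category="fantasy"/></library>
```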
When to Use minidom vs. Alternatives
| Parser | When to Use It | Pros | Cons |
|---|---|---|---|
| xml.dom.minidom | Reading/writing small to medium XML files; quick scripts where simplicity is key; when you need to modify the XML structure. | Part of the standard library; simple, intuitive API for navigation; full read/write support. | High memory usage (loads the entire file); can be slower than the alternatives on large files. |
| xml.etree.ElementTree | The recommended default for most use cases; reading/writing XML, including large files via iterparse(); when performance is important. | Faster and more memory-efficient than minidom; simple, "Pythonic" API; part of the standard library. | Less flexible than a full DOM for complex document manipulation (e.g., elements have no reference to their parent). |
| lxml | High-performance parsing; advanced features like XPath, XSLT, and XML Schema validation; large files where speed is critical. | Very fast (built on the C libraries libxml2 and libxslt); largely compatible with the ElementTree API, plus much more; very powerful and feature-rich. | Not in the standard library (install with pip install lxml). |
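For comparison, here is a sketch of the earlier category-and-author lookup rewritten with xml.etree.ElementTree, using an inline, shortened copy of the sample. Note how find()/findall(), .text, and .get() stand in for getElementsByTagName(), firstChild.data, and getAttribute():

```python
import xml.etree.ElementTree as ET

xml_text = """<library>
  <book category="fantasy"><author>J.R.R. Tolkien</author></book>
  <book category="scifi"><author>Frank Herbert</author></book>
</library>"""

root = ET.fromstring(xml_text)  # returns the root Element directly, no Document wrapper

# findall('book') matches direct children; .get() and .text replace getAttribute() and firstChild.data
books = [(book.get('category'), book.find('author').text) for book in root.findall('book')]

for category, author in books:
    print(category, author)
```

Whitespace between tags ends up in .text/.tail attributes instead of separate nodes, so no nodeType filtering is needed.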
Conclusion: For most everyday XML tasks in Python, start with xml.etree.ElementTree. It's faster and uses less memory than minidom. Use minidom if you specifically need its simple DOM-style navigation or are working with very small, simple XML documents where the overhead doesn't matter. Use lxml when you need top performance or advanced XML features.
