How Do You Parse XML Files with Python minidom? - 杰瑞科技汇

This guide takes a close look at using xml.dom.minidom in Python.

What is `xml.dom.minidom`?

minidom stands for "Mini DOM" (Document Object Model). It's a lightweight implementation of the DOM API for Python, included in the standard library.

Key Characteristics:

Standard Library: You don't need to install anything (import xml.dom.minidom).
Easy to Use: It is intended to be simpler and significantly smaller than a full DOM implementation.
Good for Small to Medium Files: It loads the entire XML document into memory. For very large files (gigabytes), this can be inefficient, and a streaming parser like xml.sax would be better.
Read-Write Functionality: You can parse existing XML and create new XML documents from scratch.

Core Concepts: The Document Object Model (DOM)

When you parse an XML file with minidom, it converts the text-based structure into a tree of Python objects in memory. You then navigate this tree to access your data.

The main objects you'll interact with are:

Document: The root object of the XML tree, representing the entire file.
Element: Represents an XML tag (e.g., <book>, <author>). It can have attributes, child elements, and text content.
Node: The base class for all objects in the DOM tree (Document, Element, Text, etc.). You'll often see methods that operate on Node objects.
Attr: Represents an attribute, a name-value pair on an element (e.g., id="123").
Text: Represents the text content inside an element (e.g., the text "J.R.R. Tolkien" inside a <name> tag).
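
As a rough illustration of the node types listed above, the sketch below parses a tiny inline document (a made-up sample, not library.xml) and prints the class name and nodeType of a few nodes. The variable names are chosen for this example only.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# A tiny inline document (hypothetical sample, not library.xml)
doc = parseString('<book id="123"><name>J.R.R. Tolkien</name></book>')

book = doc.documentElement          # Element for <book>
name = book.firstChild              # Element for <name>
text = name.firstChild              # Text node inside <name>
attr = book.getAttributeNode('id')  # Attr node for id="123"

# Print each node's class name and its numeric nodeType constant
for label, node in [('doc', doc), ('book', book), ('text', text), ('attr', attr)]:
    print(label, type(node).__name__, node.nodeType)
```

Every one of these objects is a Node, so nodeType can be compared against the constants on xml.dom.Node to tell them apart.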

Step-by-Step Guide with Examples

Let's use a sample XML file named library.xml for our examples.

`library.xml`

<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book category="fantasy">
        <title lang="en">The Lord of the Rings</title>
        <author>J.R.R. Tolkien</author>
        <year>1954</year>
        <price>22.99</price>
    </book>
    <book category="scifi">
        <title lang="en">Dune</title>
        <author>Frank Herbert</author>
        <year>1965</year>
        <price>21.99</price>
    </book>
</library>

Parsing an XML File

First, you need to read the file from disk and parse it into a Document object.

import xml.dom.minidom
# Open the XML file
with open('library.xml', 'r', encoding='utf-8') as f:
    xml_content = f.read()
# Parse the XML string
dom = xml.dom.minidom.parseString(xml_content)
# You can also parse directly from a file object:
# with open('library.xml', 'r', encoding='utf-8') as f:
#     dom = xml.dom.minidom.parse(f)
print(dom) # Prints the document object
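
Parsing fails with an exception when the input is not well-formed. A minimal sketch of handling that case follows; try_parse is a hypothetical helper name, and the inline strings stand in for real file contents.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

def try_parse(xml_text):
    """Return a Document, or None if the text is not well-formed XML."""
    try:
        return parseString(xml_text)
    except ExpatError as exc:
        # minidom uses the expat parser, which reports the line and column
        print(f"Could not parse XML: {exc}")
        return None

good = try_parse('<library><book/></library>')
bad = try_parse('<library><book></library>')  # <book> is never closed
```

Whether to return None, re-raise, or log is a design choice for the calling code; this sketch only shows where the error surfaces.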

Navigating the DOM Tree

The core of using minidom is navigating the tree structure.

Getting the Document Element

The documentElement property gives you the root element of the XML.

# Get the root element
root = dom.documentElement
print(f"Root element: {root.tagName}") # Output: Root element: library

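Accessing Child Elements

The childNodes property returns all direct children of a node. This includes Text nodes for the whitespace between tags, not just elements, so you usually filter by nodeType.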

# Get all child nodes of the root
children = root.childNodes
for child in children:
    # We need to check if the node is an Element, as there can be text nodes (e.g., newlines)
    if child.nodeType == child.ELEMENT_NODE:
        print(f"Found a book element: {child.tagName}")
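
The nodeType check can be wrapped in a small helper. The sketch below is one possible way to do it; element_children is a hypothetical name, and the inline XML is a trimmed stand-in for library.xml.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def element_children(node):
    """Return only the Element children of node, skipping Text nodes."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]

doc = parseString("""<library>
    <book category="fantasy"/>
    <book category="scifi"/>
</library>""")
root = doc.documentElement

print(len(root.childNodes))         # counts the whitespace Text nodes too
print(len(element_children(root)))  # counts only the <book> elements
```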

Finding Elements by Tag Name

The getElementsByTagName() method is very useful. It returns a NodeList (a list-like object) of all elements with the specified tag name, searching the entire document or a subtree.

# Find all 'book' elements in the entire document
all_books = dom.getElementsByTagName('book')
print(f"\nFound {all_books.length} book(s).")
# Iterate through the NodeList
for book in all_books:
    print(f"  - Book element: {book.tagName}")
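
To make the difference between a document-wide search and a subtree search concrete, here is a small sketch using an inline document (a shortened stand-in for library.xml).

```python
from xml.dom.minidom import parseString

doc = parseString(
    '<library>'
    '<book><title>Dune</title><author>Frank Herbert</author></book>'
    '<book><title>The Lord of the Rings</title><author>J.R.R. Tolkien</author></book>'
    '</library>'
)

# Document-wide search: every <title> anywhere in the tree
all_titles = doc.getElementsByTagName('title')

# Subtree search: only the descendants of the second <book>
second_book = doc.getElementsByTagName('book')[1]
titles_in_second = second_book.getElementsByTagName('title')

print(len(all_titles), len(titles_in_second))
```

Calling the method on an element rather than on the document is the usual way to keep a lookup scoped to one record.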

Accessing Element Data

Once you have an Element object, you can get its data.

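Getting Text Content

Text content is stored in a child Text node. For an element that contains only text, the simplest way to get it is firstChild.data. Note that firstChild is None for an empty element, so this shortcut raises an AttributeError in that case.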

# Get the first book element
first_book = all_books[0]
# Get the author text
author_node = first_book.getElementsByTagName('author')[0]
author_text = author_node.firstChild.data
print(f"\nThe author of the first book is: {author_text}") # Output: J.R.R. Tolkien
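
Since firstChild.data assumes the element has exactly one Text child, a slightly more defensive approach is to join all direct text children. The helper below is a sketch under that assumption; get_text is a hypothetical name, not part of minidom.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def get_text(element):
    """Join the data of all direct Text/CDATA children; '' if there are none."""
    return ''.join(
        child.data for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )

doc = parseString('<book><author>Frank Herbert</author><note/></book>')
author = doc.getElementsByTagName('author')[0]
note = doc.getElementsByTagName('note')[0]

print(get_text(author))
print(repr(get_text(note)))  # empty element: firstChild is None here
```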

Getting Attributes

Use the getAttribute() method to get an attribute's value.

# Get the 'category' attribute of the first book
category = first_book.getAttribute('category')
print(f"The category of the first book is: {category}") # Output: fantasy
# Get the 'lang' attribute of the title
title_node = first_book.getElementsByTagName('title')[0]
lang = title_node.getAttribute('lang')
print(f"The language of the title is: {lang}") # Output: en
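
One detail worth knowing is what happens when an attribute is missing. The sketch below uses an inline sample and a made-up attribute name (isbn) to show it, and also lists all attributes of an element.

```python
from xml.dom.minidom import parseString

doc = parseString('<book category="fantasy"><title lang="en">Dune</title></book>')
book = doc.documentElement

# getAttribute returns an empty string for a missing attribute,
# so hasAttribute is the way to tell "missing" apart from "empty".
print(book.hasAttribute('isbn'))
print(repr(book.getAttribute('isbn')))

# attributes is a NamedNodeMap; items() yields (name, value) pairs
attrs = dict(book.attributes.items())
print(attrs)
```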

Creating a New XML Document from Scratch

You can also use minidom to generate XML.

from xml.dom.minidom import Document
# 1. Create a new Document
new_doc = Document()
# 2. Create the root element
catalog = new_doc.createElement('catalog')
new_doc.appendChild(catalog)
# 3. Create a product element
product = new_doc.createElement('product')
product.setAttribute('id', 'p1001')
catalog.appendChild(product)
# 4. Create child elements and add text
name = new_doc.createElement('name')
name.appendChild(new_doc.createTextNode('Super Widget'))
product.appendChild(name)
price = new_doc.createElement('price')
price.appendChild(new_doc.createTextNode('19.99'))
product.appendChild(price)
# 5. Print the generated XML
# The 'toprettyxml' method formats the XML for readability
xml_str = new_doc.toprettyxml(indent="  ")
print("\n--- Generated XML ---")
print(xml_str)

Output of Generated XML:

<?xml version="1.0" ?>
<catalog>
  <product id="p1001">
    <name>Super Widget</name>
    <price>19.99</price>
  </product>
</catalog>
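
To save a generated document to disk rather than print it, writexml() can write straight to a file object. This is a minimal sketch; the temporary directory and the file name catalog.xml are assumptions made to keep the example self-contained.

```python
import os
import tempfile
from xml.dom.minidom import Document, parse

doc = Document()
root = doc.createElement('catalog')
doc.appendChild(root)

# Hypothetical output location: a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), 'catalog.xml')

with open(out_path, 'w', encoding='utf-8') as f:
    # encoding= only sets the XML declaration; the file object does the encoding
    doc.writexml(f, addindent='  ', newl='\n', encoding='utf-8')

# Read the file back to confirm it is well-formed
reloaded = parse(out_path)
print(reloaded.documentElement.tagName)
```

Keeping the encoding passed to open() and to writexml() the same avoids a declaration that disagrees with the bytes on disk.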

Important Methods and Properties

Method/Property	Description
`parse(file)` / `parseString(xml_string)`	`parse` reads XML from a filename or an open file object; `parseString` reads it from a string.
`documentElement`	Returns the root `Element` of the document.
`getElementsByTagName(tag_name)`	Returns a `NodeList` of elements with the given tag name.
`nodeName`	The name of the node (e.g., tag name for an element).
`nodeType`	The type of the node (e.g., `Node.ELEMENT_NODE`, `Node.TEXT_NODE`).
`nodeValue` / `data`	The value of the node (e.g., the text content for a `Text` node).
`getAttribute(name)`	Gets the value of an attribute.
`setAttribute(name, value)`	Sets or creates an attribute.
`appendChild(node)`	Adds a new child node to the element.
`removeChild(node)`	Removes a child node.
`toxml()` / `toprettyxml(indent="\t")`	Converts the DOM tree back to an XML string. `toprettyxml` adds newlines and indentation for readability (the default indent is a tab).
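
Several of the methods in the table can be combined to edit an existing document. The sketch below works on an inline sample; the new id value, the new price, and the discontinued element are made up for illustration.

```python
from xml.dom.minidom import parseString

doc = parseString(
    '<catalog><product id="p1001"><price>19.99</price></product></catalog>'
)
product = doc.getElementsByTagName('product')[0]

# Change an attribute and replace the price text in place
product.setAttribute('id', 'p2002')
price = product.getElementsByTagName('price')[0]
price.firstChild.data = '24.99'

# Add a child element, then remove it again to show removeChild
tag = doc.createElement('discontinued')
product.appendChild(tag)
product.removeChild(tag)

# With encoding given, toxml returns bytes and adds an encoding declaration
xml_bytes = doc.toxml(encoding='utf-8')
print(xml_bytes.decode('utf-8'))
```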

When to Use `minidom` vs. Alternatives

Parser	When to Use It	Pros	Cons
`xml.dom.minidom`	Reading/writing small to medium XML files; quick scripts where simplicity is key; cases where you need to modify the XML structure.	Part of the standard library; simple, intuitive API for navigation; full read/write support.	High memory usage (loads the entire file); can be slower than other parsers for large files.
`xml.etree.ElementTree`	The recommended default for most use cases; reading/writing XML files of most sizes (`iterparse()` is available for incremental parsing of large inputs); cases where performance is important.	Fast and more memory-efficient than minidom; simple, "Pythonic" API; part of the standard library.	Less flexible for complex document manipulation than a full DOM parser.
`lxml`	High-performance parsing; advanced features like XPath, XSLT, and XML Schema validation; large files where speed is critical.	Very fast (C-based); supports the full ElementTree API plus much more; powerful and feature-rich.	Not in the standard library (must be installed via `pip install lxml`).

Conclusion: For most everyday XML tasks in Python, start with xml.etree.ElementTree. It's faster and uses less memory than minidom. Use minidom if you specifically need its simple DOM-style navigation or are working with very small, simple XML documents where the overhead doesn't matter. Use lxml when you need top performance or advanced XML features.
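
For a feel of how the two standard-library APIs differ, the sketch below reads the same author and category from one inline sample with both minidom and xml.etree.ElementTree. The sample string and variable names are chosen for this comparison only.

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import parseString

XML = '<library><book category="scifi"><author>Frank Herbert</author></book></library>'

# minidom: text lives in a child Text node
dom_book = parseString(XML).getElementsByTagName('book')[0]
dom_author = dom_book.getElementsByTagName('author')[0].firstChild.data
dom_category = dom_book.getAttribute('category')

# ElementTree: text and attributes are reached directly from the element
et_book = ET.fromstring(XML).find('book')
et_author = et_book.findtext('author')
et_category = et_book.get('category')

print(dom_author == et_author, dom_category == et_category)
```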
