How Do You Parse XML Schema in Python?

Working with XML Schema (XSD) in Python is a common task for data validation, especially when dealing with configurations, data interchange formats, or ensuring the structure of XML files generated by your application.

This guide covers the main approaches: what the standard library can and cannot do, and the most capable third-party options.

The Recommended Library: xmlschema

Python's standard library (xml.etree.ElementTree) can parse XML but has no built-in support for validating against an XSD schema. For this, you need a third-party library.
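
To illustrate the gap, here is a minimal sketch that uses only the standard library. It parses a small inline document (a hypothetical sample modeled on the library example used later in this article) and converts each value by hand. Nothing in it checks the document's structure, which is the part an XSD-aware library takes over.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical sample document, inlined so the sketch is self-contained.
XML_TEXT = """
<library>
  <book>
    <title>The Hobbit</title>
    <author>J.R.R. Tolkien</author>
    <year>1937</year>
    <price>12.50</price>
  </book>
</library>
""".strip()

root = ET.fromstring(XML_TEXT)

books = []
for book in root.findall("book"):
    # ElementTree returns every value as text, so each conversion is manual
    # and a missing or malformed element only fails here, at conversion time.
    books.append({
        "title": book.findtext("title"),
        "author": book.findtext("author"),
        "year": int(book.findtext("year")),
        "price": Decimal(book.findtext("price")),
    })

print(books)
```

With a schema-driven library, the type conversions and the structural checks both come from the XSD instead of hand-written code like this.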

The most popular and feature-rich library for this is xmlschema.

Installation

First, install the library:

pip install xmlschema

Key Features of xmlschema

  • Full XSD 1.0 & 1.1 Support: Supports almost all features of the XML Schema standard.
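  • Data Conversion: It can decode XML data into Python-native types based on the schema (e.g., xs:integer becomes an int and xs:decimal becomes a decimal.Decimal; date and time values are kept as strings by default).
  • JSON Conversion: You can decode XML data to JSON and encode JSON back to XML (for example with to_json and from_json).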
  • Easy API: Provides a simple and intuitive API for validation.

Practical Examples with xmlschema

Let's set up a simple example. We have an XML file and a corresponding XSD schema to validate it.

File Structure

project/
├── library.xsd
└── books.xml

library.xsd (The Schema)

This schema defines the rules for our XML file.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="author" type="xs:string"/>
              <xs:element name="year" type="xs:positiveInteger"/>
              <xs:element name="price" type="xs:decimal"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

books.xml (The Data to Validate)

This file conforms to the schema.

<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book>
    <title>The Hobbit</title>
    <author>J.R.R. Tolkien</author>
    <year>1937</year>
    <price>12.50</price>
  </book>
  <book>
    <title>Dune</title>
    <author>Frank Herbert</author>
    <year>1965</year>
    <price>18.99</price>
  </book>
</library>

Example 1: Basic Validation

This is the simplest use case: check if an XML file is valid against an XSD schema.

import xmlschema
# Define the path to your schema and XML file
schema_path = 'library.xsd'
xml_path = 'books.xml'
try:
    # 1. Create a schema object by loading the XSD file
    #    This is an expensive operation, so do it once and reuse it.
    schema = xmlschema.XMLSchema(schema_path)
    # 2. Validate the XML file
    #    If the file is valid, this will complete without an exception.
    #    If it's invalid, it will raise an xmlschema.XMLSchemaValidationError.
    schema.validate(xml_path)
    print(f"✅ Success: {xml_path} is a valid XML document according to {schema_path}")
except xmlschema.XMLSchemaValidationError as e:
    print(f"❌ Validation Error: {xml_path} is NOT valid.")
    print(f"   Reason: {e.reason}")
    print(f"   Path: {e.path}")
    print(f"   Offending object: {e.obj!r}")
except OSError as e:
    # Covers missing or unreadable files (FileNotFoundError is an OSError).
    print(f"❌ Could not read file: {e}")

Example 2: Parsing with Data Conversion

A powerful feature of xmlschema is its ability to parse the XML into Python data structures, respecting the types defined in the XSD.

import json
import xmlschema
schema = xmlschema.XMLSchema('library.xsd')
# The `to_dict` method decodes the XML into a Python dictionary using the
# types declared in the XSD. xs:decimal values become decimal.Decimal by
# default; decimal_type=float keeps the result JSON-serializable.
try:
    data = schema.to_dict('books.xml', decimal_type=float)
    print("--- Parsed Data (as Python dict) ---")
    print(json.dumps(data, indent=2))
    print("\n--- Accessing Data ---")
    # The dictionary holds the content of the root element, so there is
    # no top-level 'library' key.
    first_book_title = data['book'][0]['title']
    print(f"The first book's title is: '{first_book_title}'")
    first_book_year = data['book'][0]['year']
    print(f"The first book's year is: {first_book_year} (type: {type(first_book_year)})")
except xmlschema.XMLSchemaValidationError as e:
    print(f"❌ Parsing failed: {e.reason}")

Output of Example 2:

--- Parsed Data (as Python dict) ---
{
  "book": [
    {
      "title": "The Hobbit",
      "author": "J.R.R. Tolkien",
      "year": 1937,
      "price": 12.5
    },
    {
      "title": "Dune",
      "author": "Frank Herbert",
      "year": 1965,
      "price": 18.99
    }
  ]
}

--- Accessing Data ---
The first book's title is: 'The Hobbit'
The first book's year is: 1937 (type: <class 'int'>)
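
One detail to keep in mind: in current xmlschema releases, xs:decimal values are decoded to decimal.Decimal unless a decimal_type is passed, and json.dumps cannot serialize Decimal on its own. The sketch below uses only the standard library and a hand-written dictionary (a stand-in for decoded data, not real to_dict output) to show one possible way to serialize such values without going through float.

```python
import json
from decimal import Decimal

# Stand-in for decoded data; hand-written here, not produced by xmlschema.
decoded = {
    "book": [
        {"title": "The Hobbit", "year": 1937, "price": Decimal("12.50")},
    ]
}


def encode_decimal(value):
    """json.dumps fallback: emit Decimal as a string to avoid float rounding."""
    if isinstance(value, Decimal):
        return str(value)
    raise TypeError(f"Cannot serialize {type(value).__name__}")


text = json.dumps(decoded, indent=2, default=encode_decimal)
print(text)
```

Emitting prices as strings preserves the exact digits from the XML; passing decimal_type=float to the decoder is simpler when exact decimal precision does not matter.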

Example 3: Handling Invalid XML

Let's create an invalid XML file to see how the error handling works.

invalid_books.xml

<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book>
    <title>The Hobbit</title>
    <author>J.R.R. Tolkien</author>
    <year>1937</year>
    <price>twelve pounds</price> <!-- Invalid: price must be a decimal -->
  </book>
</library>

Now, run the validation code from Example 1 against invalid_books.xml:

import xmlschema
schema = xmlschema.XMLSchema('library.xsd')
xml_path = 'invalid_books.xml'
try:
    schema.validate(xml_path)
    print(f"✅ Success: {xml_path} is valid.")
except xmlschema.XMLSchemaValidationError as e:
    print(f"❌ Validation Error: {xml_path} is NOT valid.")
    print(f"   Reason: {e.reason}") # The error message
    print(f"   Path: {e.path}")     # The XPath to the error location
    print(f"   Offending value: {e.obj!r}") # The value that caused the error

Output of Example 3 (the exact wording of the reason can differ between xmlschema versions):

❌ Validation Error: invalid_books.xml is NOT valid.
   Reason: invalid literal for Decimal: 'twelve pounds'
   Path: /library/book[1]/price
   Offending value: 'twelve pounds'

Alternative Libraries

While xmlschema is the recommended choice for most use cases, it's good to know about alternatives.

xsdvalidate

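This is presented as a much simpler option if you only need validation and don't require the data conversion features. Confirm that the package is available on PyPI and check its documented API before depending on it; the example below assumes a validate(schema_path, xml_path) function that returns a boolean.

  • Pros: Lightweight, simple API.
  • Cons: Lacks advanced features like data binding or JSON conversion.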

Installation:

pip install xsdvalidate

Example:

from xsdvalidate import validate
# The validate function returns True or False
is_valid = validate('library.xsd', 'books.xml')
if is_valid:
    print("✅ The XML is valid.")
else:
    print("❌ The XML is invalid.")

lxml

The lxml library is a high-performance Pythonic binding for the C libraries libxml2 and libxslt. It has excellent XSD validation capabilities but can be more complex to set up and use than xmlschema.

  • Pros: Extremely fast, powerful XPath and XSLT support.
  • Cons: Steeper learning curve, depends on compiled C libraries (bundled in the binary wheels for common platforms), supports XSD 1.0 only.

Example with lxml:

from lxml import etree
# Parse the schema
schema = etree.XMLSchema(etree.parse('library.xsd'))
# Parse the XML document
xml_doc = etree.parse('books.xml')
# Validate
is_valid = schema.validate(xml_doc)
if is_valid:
    print("✅ The XML is valid.")
else:
    print("❌ The XML is invalid.")
    for error in schema.error_log:
        print(f"   Line {error.line}: {error.message}")
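
If you prefer exceptions over a boolean result, lxml's schema objects also have an assertValid() method that raises etree.DocumentInvalid. The sketch below assumes a hypothetical single-book schema and an inline document (both invented for illustration) so it runs without any files.

```python
from lxml import etree

# Hypothetical, trimmed-down schema inlined so the sketch is self-contained.
XSD_TEXT = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="year" type="xs:positiveInteger"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD_TEXT))

# -5 is not a positiveInteger, so this document should be rejected.
doc = etree.fromstring("<book><title>Dune</title><year>-5</year></book>")

try:
    schema.assertValid(doc)
    outcome = "valid"
except etree.DocumentInvalid as exc:
    outcome = f"invalid: {exc}"

print(outcome)
```

The exception message carries the first error; schema.error_log still holds the full list if you need every entry.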

Summary and Recommendation

Feature         | xmlschema                 | xsdvalidate                    | lxml
----------------|---------------------------|--------------------------------|--------------------------------------
Primary Use     | Validation & Data Binding | Simple Validation              | High-Performance Parsing & Validation
Ease of Use     | Excellent                 | Very Easy                      | Good
Data Conversion | Yes (Best)                | No                             | No
Performance     | Good                      | Good                           | Excellent
XSD Features    | Full 1.0 & 1.1 support    | Basic                          | Good (1.0)
Recommendation  | Best for most projects    | Quick scripts, validation-only | High-performance, complex XML needs

Final Recommendation:

For any new project in Python, start with xmlschema. It provides the best balance of power, ease of use, and features. Its ability to convert XML to Python objects with the correct types is a huge productivity booster and prevents a lot of manual type-checking and conversion code in your application. Use xsdvalidate for simple, one-off validation tasks, and lxml if you have extreme performance requirements or are already using it for other XML processing.
