杰瑞科技汇

python xml schema

Of course! Working with XML Schemas in Python is a common task, primarily handled by the built-in xmlschema library. It's powerful, easy to use, and supports both validation and data conversion.

Here's a comprehensive guide covering everything from installation to advanced use cases.

The Core Library: xmlschema

While Python's standard library has xml.etree.ElementTree for parsing XML, it doesn't have built-in tools for working with XML Schemas (XSD). The de-facto standard library for this is xmlschema.

Installation: You can install it easily using pip:

pip install xmlschema

The Scenario: A Simple XML and XSD

Let's use a simple, common example: a list of books. We'll define the structure in an XSD file and then validate a corresponding XML file.

books.xsd (The Schema)

This file defines the rules for our XML document.

  • The root element must be <catalog>.
  • <catalog> can have one or more <book> elements.
  • Each <book> must have an id attribute and must contain <title>, <author>, and <year> elements.
  • The <year> must be an integer.
<!-- books.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string" />
              <xs:element name="author" type="xs:string" />
              <xs:element name="year" type="xs:integer" />
            </xs:sequence>
            <xs:attribute name="id" type="xs:ID" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

books_valid.xml (A Valid XML Document)

This file follows all the rules defined in books.xsd.

<!-- books_valid.xml -->
<catalog>
  <book id="bk101">XML Developer's Guide</title>
    <author>McLaughlin, James</author>
    <year>2025</year>
  </book>
  <book id="bk102">Midnight Rain</title>
    <author>Galvin, Brian</author>
    <year>2025</year>
  </book>
</catalog>

books_invalid.xml (An Invalid XML Document)

This file intentionally breaks a rule: the <year> element contains text instead of a number.

<!-- books_invalid.xml -->
<catalog>
  <book id="bk103">Oberon's Legacy</title>
    <author>Corets, Eva</author>
    <year>TWO THOUSAND AND TWENTY-THREE</year> <!-- Invalid: not an integer -->
  </book>
</catalog>

Basic Validation

The most common use case is to check if an XML file conforms to a schema.

Method 1: Using is_valid() (Simple Check)

This method returns True or False and is great for quick checks.

import xmlschema
# Path to your schema and XML files
schema_path = 'books.xsd'
valid_xml_path = 'books_valid.xml'
invalid_xml_path = 'books_invalid.xml'
# Create a schema object from the XSD file
schema = xmlschema.XMLSchema(schema_path)
# --- Validation ---
print(f"Is '{valid_xml_path}' valid? {schema.is_valid(valid_xml_path)}")
# Output: Is 'books_valid.xml' valid? True
print(f"Is '{invalid_xml_path}' valid? {schema.is_invalid(invalid_xml_path)}")
# Output: Is 'books_invalid.xml' valid? False

Method 2: Using validate() (Detailed Error Reporting)

If you need to know why a validation failed, use validate(). It will raise an xmlschema.XMLSchemaValidationError exception.

import xmlschema
schema = xmlschema.XMLSchema('books.xsd')
print("\n--- Validating the invalid file ---")
try:
    schema.validate('books_invalid.xml')
    print("Validation successful!")
except xmlschema.XMLSchemaValidationError as e:
    print(f"Validation failed!")
    print(f"Error message: {e.message}")
    print(f"Path to error: {e.path}")
    print(f"Offending element/attribute: {e.elem}")

Output of the validate() error:

--- Validating the invalid file ---
Validation failed!
Error message: Element '{http://www.w3.org/2001/XMLSchema}year': 'TWO THOUSAND AND TWENTY-THREE' is not a valid value of the atomic type xs:integer.
Path to error: /catalog/book[1]/year
Offending element/attribute: <year>TWO THOUSAND AND TWENTY-THREE</year>

Data Conversion: XML to Python (and Back)

One of the best features of xmlschema is its ability to automatically convert XML data to Python data types and vice-versa.

Converting XML to Python (Decoding)

When you parse an XML file, xmlschema can return a native Python dictionary or list based on your schema.

import xmlschema
schema = xmlschema.XMLSchema('books.xsd')
# Parse the valid XML file into a Python object
data = schema.to_dict('books_valid.xml')
print("--- Parsed Data (Python Dictionary) ---")
print(data)

Output:

--- Parsed Data (Python Dictionary) ---
{
    'catalog': {
        'book': [
            {
                'id': 'bk101',
                'title': 'XML Developer\'s Guide',
                'author': 'McLaughlin, James',
                'year': 2025
            },
            {
                'id': 'bk102',
                'title': 'Midnight Rain',
                'author': 'Galvin, Brian',
                'year': 2025
            }
        ]
    }
}

Notice how:

  • The <year> values are now Python int objects.
  • The attributes (id) are included as keys in the dictionary.
  • Repeated elements (book) become a list of dictionaries.

Converting Python to XML (Encoding)

You can also take a Python dictionary and generate a valid XML string from it.

import xmlschema
schema = xmlschema.XMLSchema('books.xsd')
# A Python dictionary matching the schema structure
python_data = {
    'catalog': {
        'book': [
            {'id': 'bk301', 'title': 'Python for Beginners', 'author': 'S., Jane', 'year': 2025},
            {'id': 'bk302', 'title': 'Advanced Data Science', 'author': 'K., Alan', 'year': 2025}
        ]
    }
}
# Convert the Python dictionary to an XML string
xml_string = schema.tostring(python_data, pretty_print=True)
print("\n--- Generated XML String ---")
print(xml_string)

Output:

--- Generated XML String ---
<catalog>
  <book id="bk301">Python for Beginners</title>
    <author>S., Jane</author>
    <year>2025</year>
  </book>
  <book id="bk302">Advanced Data Science</title>
    <author>K., Alan</author>
    <year>2025</year>
  </book>
</catalog>

Advanced Features

Working with Namespaces

Real-world XML often uses namespaces. xmlschema handles them gracefully. You just need to provide the namespace map when loading the schema.

Let's modify our schema to include a namespace:

books_ns.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/books"
           xmlns:tns="http://example.com/books">
  <xs:element name="catalog" type="tns:CatalogType"/>
  <xs:complexType name="CatalogType">
    <xs:sequence>
      <xs:element name="book" type="tns:BookType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="BookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID" use="required"/>
  </xs:complexType>
</xs:schema>

books_ns.xml:

<!-- Note the default namespace declaration -->
<tns:catalog xmlns:tns="http://example.com/books">
  <tns:book id="bk401">
    <tns:title>The Namespace Guide</tns:title>
    <tns author="N., David">David N.</tns:author>
  </tns:book>
</tns:catalog>

Python Code with Namespaces:

import xmlschema
# Define the namespace map
namespaces = {
    'tns': 'http://example.com/books' # 'tns' is the prefix from the XML file
}
# When loading the schema, provide the target namespace
schema = xmlschema.XMLSchema('books_ns.xsd', target_namespace='http://example.com/books')
# When validating/parsing, provide the namespaces map
try:
    data = schema.to_dict('books_ns.xml', namespaces=namespaces)
    print("--- Parsed Namespaced Data ---")
    print(data)
except xmlschema.XMLSchemaValidationError as e:
    print(e)

Generating an XSD from an XML

Sometimes you have an XML file and want to create a corresponding XSD schema. xmlschema can help you get a starting point.

import xmlschema
# Generate an XSD schema from an XML file
# This creates a schema with very permissive types (xs:anyType)
generated_schema = xmlschema.to_jsonschema('books_valid.xml')
print("--- Generated JSON Schema (a starting point for XSD) ---")
print(generated_schema)
# You can then use this as a basis to write a more strict XSD file.

Summary: Key xmlschema Functions

Function Purpose Returns
xmlschema.XMLSchema('path/to/schema.xsd') Loads and parses an XSD file. An XMLSchema object.
schema.is_valid('path/to/file.xml') Checks if an XML is valid. True or False.
schema.validate('path/to/file.xml') Validates an XML and raises an error on failure. None (or raises an exception).
schema.to_dict('path/to/file.xml') Parses XML into a Python dictionary/list. A Python dict or list.
schema.tostring(data_object) Converts a Python dictionary/list to an XML string. An XML str.
schema.to_etree(data_object) Converts a Python dictionary/list to an ElementTree object. An xml.etree.ElementTree.Element.
分享:
扫描分享到社交APP
上一篇
下一篇