Of course! Working with XML Schemas in Python is a common task, usually handled by the third-party xmlschema library. It supports both validation and conversion between XML and Python data.
Here's a guide covering everything from installation to more advanced use cases.
The Core Library: xmlschema
While Python's standard library has xml.etree.ElementTree for parsing XML, it doesn't have built-in tools for working with XML Schemas (XSD). A widely used third-party package for this is xmlschema.
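To make that limitation concrete, here is a small sketch using only the standard library. The sample strings are made up for illustration. ElementTree rejects markup that is not well-formed, but it has no notion of a schema, so a non-numeric <year> parses without complaint.

```python
import xml.etree.ElementTree as ET

# Well-formed XML whose <year> is clearly not a number
doc = "<catalog><book id='bk103'><year>not a number</year></book></catalog>"

# ElementTree only checks well-formedness, so this parses fine
root = ET.fromstring(doc)
year_text = root.find("book/year").text

# Markup with a missing closing tag is rejected by the parser itself
try:
    ET.fromstring("<catalog><book></catalog>")
    well_formed = True
except ET.ParseError:
    well_formed = False

print(year_text, well_formed)
```

Catching the bad <year> value requires a schema-aware validator, which is what the rest of this guide covers.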
Installation: You can install it easily using pip:
pip install xmlschema
The Scenario: A Simple XML and XSD
Let's use a simple, common example: a list of books. We'll define the structure in an XSD file and then validate a corresponding XML file.
books.xsd (The Schema)
This file defines the rules for our XML document.
- The root element must be <catalog>.
- <catalog> can have one or more <book> elements.
- Each <book> must have an id attribute and must contain <title>, <author>, and <year> elements.
- The <year> must be an integer.
<!-- books.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string" />
              <xs:element name="author" type="xs:string" />
              <xs:element name="year" type="xs:integer" />
            </xs:sequence>
            <xs:attribute name="id" type="xs:ID" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
books_valid.xml (A Valid XML Document)
This file follows all the rules defined in books.xsd.
<!-- books_valid.xml -->
<catalog>
  <book id="bk101">
    <title>XML Developer's Guide</title>
    <author>McLaughlin, James</author>
    <year>2025</year>
  </book>
  <book id="bk102">
    <title>Midnight Rain</title>
    <author>Galvin, Brian</author>
    <year>2025</year>
  </book>
</catalog>
books_invalid.xml (An Invalid XML Document)
This file intentionally breaks a rule: the <year> element contains text instead of a number.
<!-- books_invalid.xml -->
<catalog>
  <book id="bk103">
    <title>Oberon's Legacy</title>
    <author>Corets, Eva</author>
    <year>TWO THOUSAND AND TWENTY-THREE</year> <!-- Invalid: not an integer -->
  </book>
</catalog>
Basic Validation
The most common use case is to check if an XML file conforms to a schema.
Method 1: Using is_valid() (Simple Check)
This method returns True or False and is great for quick checks.
import xmlschema
# Path to your schema and XML files
schema_path = 'books.xsd'
valid_xml_path = 'books_valid.xml'
invalid_xml_path = 'books_invalid.xml'
# Create a schema object from the XSD file
schema = xmlschema.XMLSchema(schema_path)
# --- Validation ---
print(f"Is '{valid_xml_path}' valid? {schema.is_valid(valid_xml_path)}")
# Output: Is 'books_valid.xml' valid? True
print(f"Is '{invalid_xml_path}' valid? {schema.is_valid(invalid_xml_path)}")
# Output: Is 'books_invalid.xml' valid? False
Method 2: Using validate() (Detailed Error Reporting)
If you need to know why validation failed, use validate(). It returns None on success and raises an xmlschema.XMLSchemaValidationError at the first error it finds.
import xmlschema
schema = xmlschema.XMLSchema('books.xsd')
print("\n--- Validating the invalid file ---")
try:
    schema.validate('books_invalid.xml')
    print("Validation successful!")
except xmlschema.XMLSchemaValidationError as e:
    # The exception carries the message, the XPath of the failing node,
    # and the offending ElementTree element
    print("Validation failed!")
    print(f"Error message: {e.message}")
    print(f"Path to error: {e.path}")
    print(f"Offending element/attribute: {e.elem}")
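Output of the validate() error (first two lines):
--- Validating the invalid file ---
Validation failed!
The remaining three lines depend on the installed xmlschema version, so they are described here rather than quoted. The error message reports that 'TWO THOUSAND AND TWENTY-THREE' failed against the xs:integer type, the path points at the <year> element inside the first <book> of <catalog>, and the last line shows the repr of the offending ElementTree element. Note that <year> is in no namespace, so the message will not place it in the XML Schema namespace.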
Data Conversion: XML to Python (and Back)
One of the best features of xmlschema is its ability to automatically convert XML data to Python data types and vice-versa.
Converting XML to Python (Decoding)
When you parse an XML file, xmlschema can return a native Python dictionary or list based on your schema.
import xmlschema
schema = xmlschema.XMLSchema('books.xsd')
# Parse the valid XML file into a Python object
data = schema.to_dict('books_valid.xml')
print("--- Parsed Data (Python Dictionary) ---")
print(data)
Output (reformatted for readability; print() emits it on a single line):
--- Parsed Data (Python Dictionary) ---
{
    'book': [
        {
            '@id': 'bk101',
            'title': "XML Developer's Guide",
            'author': 'McLaughlin, James',
            'year': 2025
        },
        {
            '@id': 'bk102',
            'title': 'Midnight Rain',
            'author': 'Galvin, Brian',
            'year': 2025
        }
    ]
}
Notice how:
- The <year> values are now Python int objects.
- Attributes (id) become keys carrying the default converter's '@' prefix ('@id').
- Repeated elements (book) become a list of dictionaries.
- The root <catalog> element is not a key; the dictionary holds its content.
Converting Python to XML (Encoding)
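You can also take a Python dictionary and generate XML from it. encode() validates the data against the schema and returns an ElementTree element, which you can then serialize to a string.
import xml.etree.ElementTree as ET
import xmlschema
schema = xmlschema.XMLSchema('books.xsd')
# The dictionary describes the content of the root <catalog> element;
# attributes use the default converter's '@' prefix
python_data = {
    'book': [
        {'@id': 'bk301', 'title': 'Python for Beginners', 'author': 'S., Jane', 'year': 2025},
        {'@id': 'bk302', 'title': 'Advanced Data Science', 'author': 'K., Alan', 'year': 2025}
    ]
}
# Encode the dictionary into an ElementTree element
root = schema.encode(python_data)
# Indent for readability (Python 3.9+), then serialize to a string
ET.indent(root)
xml_string = ET.tostring(root, encoding='unicode')
print("\n--- Generated XML String ---")
print(xml_string)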
Output:
--- Generated XML String ---
<catalog>
  <book id="bk301">
    <title>Python for Beginners</title>
    <author>S., Jane</author>
    <year>2025</year>
  </book>
  <book id="bk302">
    <title>Advanced Data Science</title>
    <author>K., Alan</author>
    <year>2025</year>
  </book>
</catalog>
Advanced Features
Working with Namespaces
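Real-world XML often uses namespaces. xmlschema reads the target namespace from the XSD itself, so nothing extra is needed when loading the schema. A prefix map is only needed if you want to control the prefixes used in decoded keys.
Let's modify our schema to include a namespace. Note elementFormDefault="qualified": without it, locally declared elements such as <book> would be expected without a namespace.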
books_ns.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/books"
           xmlns:tns="http://example.com/books"
           elementFormDefault="qualified">
  <xs:element name="catalog" type="tns:CatalogType"/>
  <xs:complexType name="CatalogType">
    <xs:sequence>
      <xs:element name="book" type="tns:BookType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="BookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID" use="required"/>
  </xs:complexType>
</xs:schema>
books_ns.xml:
<!-- Note the namespace declaration bound to the tns prefix -->
<tns:catalog xmlns:tns="http://example.com/books">
  <tns:book id="bk401">
    <tns:title>The Namespace Guide</tns:title>
    <tns:author>David N.</tns:author>
  </tns:book>
</tns:catalog>
Python Code with Namespaces:
import xmlschema
# Optional prefix map used when building the keys of the decoded data
namespaces = {
    'tns': 'http://example.com/books'  # 'tns' is the prefix from the XML file
}
# The target namespace is read from the XSD; no extra argument is needed
schema = xmlschema.XMLSchema('books_ns.xsd')
# When validating/parsing, you can pass the namespaces map
try:
    data = schema.to_dict('books_ns.xml', namespaces=namespaces)
    print("--- Parsed Namespaced Data ---")
    print(data)
except xmlschema.XMLSchemaValidationError as e:
    print(e)
Generating an XSD from an XML
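Sometimes you have an XML file and want to create a corresponding XSD schema. As far as I know, xmlschema validates and converts documents but does not infer a schema from an instance document. A practical starting point is to survey which elements and attributes the document uses, then write the XSD by hand.
import xml.etree.ElementTree as ET
from collections import defaultdict
# Standard-library survey (not an xmlschema feature): record which child
# elements and attributes each element name uses in the document
children = defaultdict(set)
attributes = defaultdict(set)
for elem in ET.parse('books_valid.xml').iter():
    children[elem.tag].update(child.tag for child in elem)
    attributes[elem.tag].update(elem.attrib)
print("--- Document structure (a starting point for writing an XSD) ---")
for tag in children:
    print(tag, sorted(children[tag]), sorted(attributes[tag]))
# You can then use this as a basis to write a strict XSD file by hand.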
Summary: Key xmlschema Functions
| Function | Purpose | Returns |
|---|---|---|
| xmlschema.XMLSchema('path/to/schema.xsd') | Loads and parses an XSD file. | An XMLSchema object. |
| schema.is_valid('path/to/file.xml') | Checks if an XML is valid. | True or False. |
| schema.validate('path/to/file.xml') | Validates an XML and raises an error on failure. | None (or raises an exception). |
| schema.iter_errors('path/to/file.xml') | Yields every validation error instead of stopping at the first. | An iterator of XMLSchemaValidationError. |
| schema.to_dict('path/to/file.xml') | Parses XML into a Python dictionary/list. | A Python dict or list. |
| schema.encode(data_object) | Converts a Python dictionary/list to an ElementTree element. | An xml.etree.ElementTree.Element. |
| schema.to_etree(data_object) | Alias of encode(). | An xml.etree.ElementTree.Element. |
| xml.etree.ElementTree.tostring(element, encoding='unicode') | Serializes an encoded element to an XML string (standard library). | An XML str. |
