杰瑞科技汇

How to Validate Data with Python jsonschema

This guide covers using jsonschema in Python, from installation through advanced features.


What is JSON Schema?

JSON Schema is a powerful vocabulary that allows you to annotate and validate JSON documents. Think of it as a "blueprint" or a "contract" for your JSON data. It describes:

  • Data Types: Is it a string, number, array, object, etc.?
  • Structure: What keys are required? What keys are optional?
  • Values: What are the allowed values for a key (e.g., via an enum)?
  • Constraints: Does a string have a maximum length? Is a number within a specific range?
  • Conditional Logic: If one key is present, another must also be present.

The jsonschema library is a widely used Python implementation of the JSON Schema specification. It checks whether JSON data, once loaded into Python objects such as dictionaries and lists, conforms to a given schema.


Installation

First, you need to install the library. It's available on PyPI.

pip install jsonschema

A Simple Example: The Basics

Let's start with a very basic example to understand the core workflow.


Step 1: Define your Schema

A schema is itself a JSON-like Python dictionary. It uses specific keywords to define rules.

# A schema that describes a simple "product"
product_schema = {
    "type": "object",  # The root must be a JSON object (a Python dict)
    "properties": {
        "name": {
            "type": "string"  # The "name" key must be a string
        },
        "price": {
            "type": "number"  # The "price" key must be a number
        },
        "in_stock": {
            "type": "boolean"  # The "in_stock" key must be a boolean
        }
    },
    "required": ["name", "price"]  # The "name" and "price" keys are mandatory
}

Step 2: Create Data to Validate

Now, let's create some Python dictionaries that represent our JSON data.

# A valid product
valid_product = {
    "name": "Laptop",
    "price": 1200.50,
    "in_stock": True
}
# An invalid product (missing 'price'; the extra 'category' key is allowed by default)
invalid_product_1 = {
    "name": "Mouse",
    "category": "Electronics"
}
# Another invalid product ('price' is a string, not a number)
invalid_product_2 = {
    "name": "Keyboard",
    "price": "75.99"
}

Step 3: Validate the Data

Use the validate() function from the jsonschema library.

from jsonschema import validate, ValidationError
print("--- Validating valid_product ---")
try:
    validate(instance=valid_product, schema=product_schema)
    print("✅ The data is valid!")
except ValidationError as e:
    # e.message is the short, one-line description of the failure
    print(f"❌ The data is invalid: {e.message}")
print("\n--- Validating invalid_product_1 ---")
try:
    validate(instance=invalid_product_1, schema=product_schema)
    print("✅ The data is valid!")
except ValidationError as e:
    print(f"❌ The data is invalid: {e.message}")
print("\n--- Validating invalid_product_2 ---")
try:
    validate(instance=invalid_product_2, schema=product_schema)
    print("✅ The data is valid!")
except ValidationError as e:
    print(f"❌ The data is invalid: {e.message}")

Output:

--- Validating valid_product ---
✅ The data is valid!

--- Validating invalid_product_1 ---
❌ The data is invalid: 'price' is a required property

--- Validating invalid_product_2 ---
❌ The data is invalid: '75.99' is not of type 'number'
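In practice the data often arrives as JSON text rather than as a Python dictionary. The sketch below assumes a product schema of the same shape as the one in Step 1 (repeated so the snippet runs on its own), a hypothetical helper named check, and made-up sample payloads. It parses the text with json.loads before validating.

```python
import json

from jsonschema import ValidationError, validate

# Same shape as the product schema in Step 1, repeated for self-containment.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price"],
}


def check(raw_json):
    """Parse JSON text and return None if it is valid, else the error message."""
    instance = json.loads(raw_json)  # JSON false -> Python False, 199 -> int
    try:
        validate(instance=instance, schema=product_schema)
    except ValidationError as err:
        return err.message
    return None


good_result = check('{"name": "Monitor", "price": 199, "in_stock": false}')
bad_result = check('{"name": "Monitor", "price": "199"}')

print(good_result)  # None
print(bad_result)   # '199' is not of type 'number'
```

Note that an integer such as 199 satisfies "type": "number", while the quoted "199" does not.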

Common Schema Keywords

Here are the most important keywords you'll use in your schemas.

| Keyword | Description | Example |
| --- | --- | --- |
| type | The data type. Can be "string", "number", "integer", "boolean", "object", "array", or "null". | "type": "string" |
| properties | Defines the schema for each key in an object. | "properties": {"name": {"type": "string"}} |
| required | An array of strings listing the keys that are mandatory in an object. | "required": ["name", "id"] |
| items | Defines the schema for all items in an array. | "items": {"type": "number"} |
| additionalProperties | By default, extra keys not listed in properties are allowed. Set to False to forbid them, or provide a schema that the value of every extra key must satisfy. | "additionalProperties": False |
| minimum / maximum | For numbers. The minimum/maximum inclusive value. | "minimum": 0 |
| exclusiveMinimum / exclusiveMaximum | For numbers. The minimum/maximum exclusive value. | "exclusiveMaximum": 100 |
| minLength / maxLength | For strings. The minimum/maximum length. | "minLength": 5 |
| pattern | For strings. A regular expression the string must match. | "pattern": "^[A-Za-z]+$" |
| enum | The value must be exactly one of the items in the provided list. | "enum": ["admin", "user", "guest"] |
| const | The value must be exactly the provided constant. | "const": "active" |
| anyOf | The data must be valid against at least one of the provided subschemas. | "anyOf": [{"type": "string"}, {"type": "boolean"}] |
| allOf | The data must be valid against all of the provided subschemas. | "allOf": [{"type": "string"}, {"minLength": 5}] |
| oneOf | The data must be valid against exactly one of the provided subschemas. | "oneOf": [{"type": "number"}, {"type": "string"}] |
| not | The data must not be valid against the provided schema. | "not": {"type": "null"} |

Handling Validation Errors

The validate() function raises a jsonschema.exceptions.ValidationError when validation fails. It's crucial to catch this exception to handle errors gracefully.

The ValidationError object is very informative and contains details about the error.

from jsonschema import validate, ValidationError
# Missing 'price', and 'name' is too short; validate() reports only one of these errors
data_to_test = {"name": "A"}
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 5},
        "price": {"type": "number"}
    },
    "required": ["name", "price"]
}
try:
    validate(instance=data_to_test, schema=schema)
except ValidationError as e:
    print(f"Validation failed: {e.message}")
    print(f"Path to error: {list(e.path)}")
    print(f"Invalid value: {e.instance}")
    print(f"Schema rule: {e.schema}")

Output:

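Validation failed: 'price' is a required property
Path to error: []
Invalid value: {'name': 'A'}
Schema rule: {'type': 'object', 'properties': {'name': {'type': 'string', 'minLength': 5}, 'price': {'type': 'number'}}, 'required': ['name', 'price']}

The path is empty because a missing required key is reported against the enclosing object (here, the root), and e.schema is the subschema that contains the failing keyword. You can also check the e.validator field to see which keyword caused the failure (e.g., 'required', 'type', 'minLength').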


Advanced Features

a) $id and Refs ($ref)

For large schemas, it's useful to break them into smaller, reusable parts. You can do this using $id and $ref.

  • $id: A unique URI for the schema, allowing other schemas to reference it.
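  • $ref: A reference to another schema, identified by its URI. The library resolves the reference during validation, provided the target schema has been made available to the validator (for example, through a registry).

Let's create a schema for a user that reuses an "address" schema.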

# Define a reusable schema for an address
address_schema = {
    "$id": "https://example.com/schemas/address.json",
    "type": "object",
    "properties": {
        "street_address": {"type": "string"},
        "city": {"type": "string"},
        "state": {"type": "string"}
    },
    "required": ["street_address", "city", "state"]
}
# Define the main user schema, which references the address schema
user_schema = {
    "$id": "https://example.com/schemas/user.json",
    "type": "object",
    "properties": {
        "username": {"type": "string"},
        "email": {"type": "string", "format": "email"},
        "address": {"$ref": "https://example.com/schemas/address.json"} # Reference the address schema
    },
    "required": ["username", "email", "address"]
}
# A valid user object
valid_user = {
    "username": "jane_doe",
    "email": "jane@example.com",
    "address": {
        "street_address": "123 Python Lane",
        "city": "Codeville",
        "state": "CA"
    }
}
# An invalid user object (invalid address)
invalid_user = {
    "username": "john_doe",
    "email": "john@example.com",
    "address": {
        "street_address": "456 Java Ave" # Missing 'city' and 'state'
    }
}
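from jsonschema import validate, ValidationError
from referencing import Registry, Resource
from referencing.jsonschema import DRAFT202012
# Register the address schema under its $id so the $ref can be resolved locally.
# This requires jsonschema 4.18 or newer, which installs the referencing package.
registry = Registry().with_resource(
    address_schema["$id"],
    Resource.from_contents(address_schema, default_specification=DRAFT202012),
)
print("--- Validating valid_user ---")
try:
    # Extra keyword arguments to validate() are passed on to the validator class
    validate(instance=valid_user, schema=user_schema, registry=registry)
    print("✅ The data is valid!")
except ValidationError as e:
    print(f"❌ The data is invalid: {e.message}")
print("\n--- Validating invalid_user ---")
try:
    validate(instance=invalid_user, schema=user_schema, registry=registry)
    print("✅ The data is valid!")
except ValidationError as e:
    print(f"❌ The data is invalid: {e.message}")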

b) format keyword

The format keyword is for semantic validation, not just structural validation. It checks if a string conforms to a common format.

  • "email": Checks for a basic email pattern.
  • "uri": Checks for a URI/URL.
  • "date-time": Checks for an RFC 3339 date-time string (a profile of ISO 8601).
  • "ipv4", "ipv6": Checks for IP addresses.

Note: jsonschema does not enforce format by default. The checks run only when a format_checker is passed to validate() or to a validator class. Even then, they can be looser than you might expect: the built-in "email" check only verifies that the string contains an "@", and some formats (such as "date-time" and "uri") are checked only if optional extra packages are installed. For critical applications, you might need a dedicated library for deep validation.

from jsonschema import FormatChecker, validate
user_schema_with_format = {
    "type": "object",
    "properties": {
        "email": {"type": "string", "format": "email"}
    },
    "required": ["email"]
}
valid_email_data = {"email": "test@example.com"}
invalid_email_data = {"email": "not-an-email"}
# Without format_checker, "format" is not enforced and both calls would pass
validate(instance=valid_email_data, schema=user_schema_with_format, format_checker=FormatChecker()) # Passes
validate(instance=invalid_email_data, schema=user_schema_with_format, format_checker=FormatChecker()) # Raises ValidationError

Best Practices

  1. Use Specific Types: Prefer "integer" over "number" if you only want whole numbers.
  2. Be Explicit with additionalProperties: Unless you want to allow any extra keys, it's best practice to set "additionalProperties": False to prevent typos in keys from going unnoticed.
  3. Provide Clear Error Messages: When a validation fails, the ValidationError is detailed. Use this information to give clear feedback to the user of your API or application.
  4. Reuse with $ref: For complex applications, break your schemas into logical, reusable components using $id and $ref. This keeps your main schemas clean and maintainable.
  5. Declare the Schema Version: Add a $schema keyword to state which JSON Schema draft your schema is written against (e.g., "https://json-schema.org/draft/2020-12/schema"). jsonschema uses it to select the matching validator, so the schema keeps its meaning when newer drafts are released.