杰瑞科技汇

python json bytes

Working with JSON and bytes is a common task in Python, especially when dealing with network requests, file I/O, or data serialization. This guide covers converting between the two.

The Core Concepts

  • JSON (JavaScript Object Notation): A text-based format for data interchange. It is human-readable and language-independent. In Python, a JSON document is usually held as a str, and json.dumps() always returns one.
  • Bytes (bytes): An immutable sequence of raw byte values (integers from 0 to 255). Files, network sockets, and memory buffers ultimately carry bytes, so text has to be encoded before it can be stored or transmitted.

The key to converting between them is encoding and decoding.

  • Encoding: Converting a string (like a JSON string) into bytes. You specify a character encoding, most commonly UTF-8.
  • Decoding: Converting bytes into a string. You must use the same character encoding that was used for encoding.
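
To make the encoding and decoding steps concrete, here is a minimal sketch using only str.encode() and bytes.decode(). The sample string is made up for illustration; it contains one non-ASCII character so that the difference between characters and bytes is visible.

```python
# A JSON document held as a Python str (sample text, not from a real API).
text = '{"city": "Zürich"}'

# Encoding: str -> bytes. "ü" takes two bytes in UTF-8, so the byte
# length is one more than the character count.
data = text.encode("utf-8")
print(len(text), len(data))

# Decoding with the same codec restores the original string exactly.
assert data.decode("utf-8") == text

# Decoding with a different codec may not raise an error at all;
# here it silently produces garbled text instead.
wrong = data.decode("latin-1")
print(wrong)
```

The last step is the reason the decoding side needs to know which encoding the sender used: a mismatch can go unnoticed until the text is displayed or parsed.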

From Python Object to JSON Bytes

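This is the most common scenario: you have a Python dictionary or list and want to send it over a network or write it to a file. Both ultimately carry bytes, so the JSON text has to be encoded first. Note that the result is still JSON text, just encoded as UTF-8, not a separate binary format.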

The process involves two steps:

  1. Serialize: Convert the Python object to a JSON string using json.dumps().
  2. Encode: Convert the JSON string to bytes using .encode().

Example: dict -> json.dumps() -> str -> .encode() -> bytes

import json
# 1. Start with a Python dictionary
python_data = {
    "name": "Alice",
    "age": 30,
    "is_student": False,
    "courses": ["History", "Math"],
    "address": None
}
# 2. Serialize the Python object to a JSON string
#    json.dumps() returns a string
json_string = json.dumps(python_data)
print(f"Type of json_string: {type(json_string)}")
print(f"JSON String: {json_string}")
# Output:
# Type of json_string: <class 'str'>
# JSON String: {"name": "Alice", "age": 30, "is_student": false, "courses": ["History", "Math"], "address": null}
# 3. Encode the JSON string to bytes
#    .encode('utf-8') is the standard way to do this
json_bytes = json_string.encode('utf-8')
print(f"\nType of json_bytes: {type(json_bytes)}")
print(f"JSON Bytes: {json_bytes}")
# Output:
# Type of json_bytes: <class 'bytes'>
# JSON Bytes: b'{"name": "Alice", "age": 30, "is_student": false, "courses": ["History", "Math"], "address": null}'

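Shortcut: Compact Output with json.dumps(..., separators=...)

For network transmission, you often want the most compact representation. json.dumps() always returns a str, so the .encode() call is still required, but the whole conversion can be chained into one expression. The separators argument removes the whitespace after commas and colons, and ensure_ascii=False writes non-ASCII characters as themselves instead of \uXXXX escapes.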

import json
python_data = {"name": "Alice", "age": 30}
# Serialize compactly, then encode, in one chained expression
# ensure_ascii=False allows non-ASCII characters (like emojis) to be kept as is
# separators=(',', ':') removes all unnecessary whitespace
json_bytes_compact = json.dumps(python_data, ensure_ascii=False, separators=(',', ':')).encode('utf-8')
print(json_bytes_compact)
# Output: b'{"name":"Alice","age":30}'
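
The effect of ensure_ascii on the encoded size can be checked directly. The following is a small sketch with made-up sample data; it compares the default (escaped) output with the ensure_ascii=False output after both are encoded to UTF-8.

```python
import json

# Sample data with two non-ASCII characters (illustrative only).
record = {"name": "Zoë", "note": "café"}

# Default: non-ASCII characters become \uXXXX escapes (6 ASCII bytes each).
escaped = json.dumps(record, separators=(",", ":")).encode("utf-8")

# ensure_ascii=False: the characters are kept, and UTF-8 encodes each
# of these two as 2 bytes.
literal = json.dumps(record, ensure_ascii=False, separators=(",", ":")).encode("utf-8")

print(escaped)
print(literal)
print(len(escaped), len(literal))

# Both byte strings parse back to the same Python object.
assert json.loads(escaped) == json.loads(literal) == record
```

Both forms are valid JSON. The escaped form is safe for channels that only handle ASCII, while the literal form is shorter when the data contains many non-ASCII characters.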

From JSON Bytes to Python Object

This is the reverse process, used when you receive data from a network or read it from a file.

The process also involves two steps:

  1. Decode: Convert the bytes into a JSON string using .decode().
  2. Deserialize: Convert the JSON string into a Python object (like a dict or list) using json.loads().

Example: bytes -> .decode() -> str -> json.loads() -> dict

import json
# Example bytes, such as a payload received from a network socket
json_bytes = b'{"name": "Bob", "age": 25, "city": "New York"}'
# 1. Decode the bytes to a JSON string
#    You must use the same encoding that was used for encoding (usually 'utf-8')
json_string = json_bytes.decode('utf-8')
print(f"Type of json_string: {type(json_string)}")
print(f"JSON String: {json_string}")
# Output:
# Type of json_string: <class 'str'>
# JSON String: {"name": "Bob", "age": 25, "city": "New York"}
# 2. Deserialize the JSON string to a Python dictionary
python_data = json.loads(json_string)
print(f"\nType of python_data: {type(python_data)}")
print(f"Python Dictionary: {python_data}")
# Output:
# Type of python_data: <class 'dict'>
# Python Dictionary: {'name': 'Bob', 'age': 25, 'city': 'New York'}
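
Bytes that arrive from a network or a file are not guaranteed to be valid, so each of the two steps can fail independently. The sketch below wraps them in a helper; parse_json_bytes is a hypothetical name chosen for this example, and turning both failures into ValueError is one possible design rather than a convention of the json module.

```python
import json


def parse_json_bytes(payload: bytes, encoding: str = "utf-8"):
    """Decode and deserialize payload, raising ValueError if either step fails."""
    try:
        return json.loads(payload.decode(encoding))
    except UnicodeDecodeError as exc:
        # Step 1 failed: the bytes are not valid text in this encoding.
        raise ValueError(f"payload is not valid {encoding}") from exc
    except json.JSONDecodeError as exc:
        # Step 2 failed: the text decoded fine but is not valid JSON.
        raise ValueError(
            f"payload is not valid JSON: {exc.msg} at position {exc.pos}"
        ) from exc


print(parse_json_bytes(b'{"name": "Bob", "age": 25}'))

for bad in (b"\xff\xfe", b'{"name": }'):
    try:
        parse_json_bytes(bad)
    except ValueError as err:
        print(f"Rejected {bad!r}: {err}")
```

Keeping the two except clauses separate makes it clear in logs whether the problem was the encoding or the JSON syntax.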

Shortcut: Combining Steps with json.loads(bytes_object.decode(...))

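You can do this in a single, chained line. Since Python 3.6, json.loads() also accepts bytes or bytearray directly, provided the data is encoded as UTF-8, UTF-16, or UTF-32, so the explicit .decode() is only required for other encodings.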

import json
json_bytes = b'{"name": "Charlie", "active": true}'
# Decode and deserialize in one line
python_data = json.loads(json_bytes.decode('utf-8'))
print(python_data)
# Output: {'name': 'Charlie', 'active': True}
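
As an alternative to the chained call, the bytes can be passed to json.loads() without decoding them first. This short sketch relies on the behavior documented for Python 3.6 and later; the UTF-16 payload is constructed here only to show that the encoding is detected automatically.

```python
import json

json_bytes = b'{"name": "Charlie", "active": true}'

# json.loads() accepts bytes and bytearray as well as str (Python 3.6+).
# The input must be encoded as UTF-8, UTF-16, or UTF-32.
python_data = json.loads(json_bytes)
print(python_data)
# Output: {'name': 'Charlie', 'active': True}

# The encoding is detected from the data, so UTF-16 input works too.
utf16_bytes = '{"name": "Charlie"}'.encode("utf-16")
print(json.loads(utf16_bytes))
# Output: {'name': 'Charlie'}
```

For any other encoding, such as Latin-1, decoding explicitly with .decode() is still necessary.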

Reading/Writing JSON Files with json Module

The json module's helper functions json.dump() and json.load() work on file objects directly. When the file is opened in text mode with an explicit encoding, the file object takes care of converting between text and bytes.

Writing to a File (json.dump())

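json.dump() serializes a Python object and writes the resulting JSON text to a file object. It writes str, not bytes, so the file must be opened in text mode; the file object then encodes the text using the encoding passed to open().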

import json
data_to_write = {"user": "David", "id": 123, "status": "active"}
# Use 'with' for safe file handling
with open('data.json', 'w', encoding='utf-8') as f:
    # json.dump() writes the Python object to the file as a JSON string
    json.dump(data_to_write, f, indent=4) # indent makes it readable
print("File 'data.json' has been written.")

After running this, data.json will contain:

{
    "user": "David",
    "id": 123,
    "status": "active"
}

Reading from a File (json.load())

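json.load() reads the entire file object and deserializes its content into a Python object. With a text-mode file, the file object decodes the bytes to a string using the given encoding. A file opened in binary mode also works (Python 3.6+), as long as its content is encoded as UTF-8, UTF-16, or UTF-32.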

import json
# Use 'with' for safe file handling
with open('data.json', 'r', encoding='utf-8') as f:
    # json.load() reads the file and converts it to a Python object
    data_read = json.load(f)
print(f"Type of data_read: {type(data_read)}")
print(f"Data read from file: {data_read}")
# Output:
# Type of data_read: <class 'dict'>
# Data read from file: {'user': 'David', 'id': 123, 'status': 'active'}
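
To see where the bytes come in during file I/O, the following sketch writes a file in text mode and then reads it back in binary mode. It uses a temporary directory so that it does not touch data.json; the path and variable names are chosen for this example only.

```python
import json
import os
import tempfile

data = {"user": "David", "id": 123, "status": "active"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")

    # Text mode: json.dump() writes str, and the file object encodes it.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)

    # Binary mode: inspect the raw bytes that ended up on disk.
    with open(path, "rb") as f:
        raw = f.read()
    print(type(raw), raw)

    # json.load() also accepts a binary file object (Python 3.6+).
    with open(path, "rb") as f:
        loaded = json.load(f)

assert loaded == data
```

Passing a binary-mode file to json.dump() would fail instead, because json.dump() only ever writes str.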

Advanced: orjson - A Faster Alternative

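For applications where performance is critical (e.g., processing large amounts of data or high-throughput APIs), the standard json library can be a bottleneck. The orjson library is a high-performance third-party alternative. Its API is similar but not a strict drop-in replacement: it works with bytes, has no dump()/load() file helpers, and takes an option argument in place of keyword arguments such as indent.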

Installation:

pip install orjson

Key Differences:

  • orjson.dumps() returns bytes directly. No need for .encode('utf-8').
  • orjson.loads() accepts bytes directly. No need for .decode('utf-8').
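  • It is typically much faster than the standard json module, and its output is compact by default: no whitespace between items, and non-ASCII characters written as UTF-8 rather than \uXXXX escapes.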

Example with orjson

import orjson
# 1. From Python Object to JSON Bytes (direct!)
python_data = {"name": "Eve", "value": 42.5, "items": [1, 2, 3]}
json_bytes = orjson.dumps(python_data)
print(f"Type from orjson.dumps: {type(json_bytes)}")
print(f"JSON Bytes: {json_bytes}")
# Output:
# Type from orjson.dumps: <class 'bytes'>
# JSON Bytes: b'{"name":"Eve","value":42.5,"items":[1,2,3]}'
# 2. From JSON Bytes to Python Object (direct!)
json_bytes_received = b'{"name":"Frank","id":456}'
python_data = orjson.loads(json_bytes_received)
print(f"\nType from orjson.loads: {type(python_data)}")
print(f"Python Dictionary: {python_data}")
# Output:
# Type from orjson.loads: <class 'dict'>
# Python Dictionary: {'name': 'Frank', 'id': 456}
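
Because orjson is an optional dependency, code that has to run where it may not be installed can fall back to the standard library. The sketch below shows one way to do that; dumps_bytes and loads_bytes are hypothetical helper names, and the fallback settings are chosen to mimic orjson's compact UTF-8 output for plain dicts, lists, strings, and numbers.

```python
import json

try:
    import orjson  # third-party, optional
except ImportError:
    orjson = None


def dumps_bytes(obj) -> bytes:
    """Serialize obj to compact UTF-8 JSON bytes."""
    if orjson is not None:
        return orjson.dumps(obj)
    # Fallback: compact separators and literal non-ASCII, then encode.
    return json.dumps(obj, ensure_ascii=False, separators=(",", ":")).encode("utf-8")


def loads_bytes(data: bytes):
    """Deserialize UTF-8 JSON bytes to a Python object."""
    if orjson is not None:
        return orjson.loads(data)
    return json.loads(data)


payload = dumps_bytes({"name": "Eve", "items": [1, 2, 3]})
print(payload)
print(loads_bytes(payload))
```

The two backends are only interchangeable for basic types. orjson serializes some types natively, such as datetime and dataclass instances, that the standard json module rejects unless a default function is supplied, so a fallback like this is best limited to plain data.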

Summary Table

Task | Standard json Library | orjson Library (Recommended for performance)
--- | --- | ---
Object -> Bytes | json.dumps(obj).encode('utf-8') | orjson.dumps(obj)
Bytes -> Object | json.loads(bytes_data.decode('utf-8')), or json.loads(bytes_data) for UTF-8/16/32 input | orjson.loads(bytes_data)
Write to File | json.dump(obj, file), with the file opened in text mode | file.write(orjson.dumps(obj)), with the file opened in binary mode ('wb')
Read from File | json.load(file) | orjson.loads(file.read())
Key Advantage | Built-in, standard, good for most use cases. | Very fast, compact output by default, works with bytes directly.
Key Disadvantage | Slower; returns str, so producing bytes requires an explicit .encode(). | Not built-in, requires pip install; not a strict drop-in replacement.