How Does Python Protobuf Serialize Data Efficiently? - 杰瑞科技汇

Let's dive into Python Protocol Buffers (often shortened to "protobuf"). This guide covers what they are, why you'd use them, and how to use them in Python with a complete, step-by-step example.

What are Protocol Buffers?

Imagine you need to send data from one program to another, or save it to a file. You need a structured format. Common options are JSON or XML.

Protocol Buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data. Think of them as a more efficient, faster, and smaller alternative to JSON or XML.

Here's the core workflow:

Define your data structure in a special .proto file (a schema).
Use the Protocol Buffer compiler (protoc) to generate data access classes in your chosen language (like Python, Java, C++, Go, etc.) from your .proto file.
Use the generated classes to easily write your structured data to a file or send it over the network. You can also read the data from a file or stream and parse it back into easy-to-use objects.

Why Use Protocol Buffers?

Feature	Protocol Buffers	JSON	XML
Size	Binary format. Very small and compact.	Text-based. Larger due to human-readable keys.	Text-based. Very large due to tags.
Speed	Extremely fast to serialize (write) and deserialize (read).	Slower due to text parsing.	Slowest due to complex parsing (DOM/SAX).
Schema	Required (`.proto` file). Enforces structure and allows schema evolution.	Optional (JSON Schema). Often not enforced.	Optional (XSD). Can be complex.
Human-Readable	No, it's binary.	Yes, very easy to read and write.	Yes, human-readable but verbose.
Data Types	Rich set (int32, int64, float, double, bool, string, enums, nested messages, etc.).	Basic types (string, number, boolean, array, object).	Rich set, but very verbose.

Key Takeaway: Use Protocol Buffers when performance, size, and strict data structure are critical (e.g., microservices communication, data storage). Use JSON when human-readability and simplicity are more important (e.g., web APIs, configuration files).
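
To make the size comparison concrete, here is a minimal sketch that hand-encodes a small record in the protobuf wire format and compares it with the same record as JSON. It uses only the standard library, and the helpers (encode_varint, encode_string_field, encode_int_field) and the sample record are made up for this illustration. In real code the generated classes do this encoding for you.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # High bit set: more bytes follow.
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode a string field: key, payload length, then UTF-8 payload."""
    payload = text.encode("utf-8")
    key = (field_number << 3) | 2  # Wire type 2 = length-delimited.
    return encode_varint(key) + encode_varint(len(payload)) + payload


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode a non-negative integer field: key, then varint value."""
    key = (field_number << 3) | 0  # Wire type 0 = varint.
    return encode_varint(key) + encode_varint(value)


# Hypothetical record: name is field 1, id is field 2, email is field 3.
record = {"name": "Alice", "id": 123, "email": "alice@example.com"}

binary = (
    encode_string_field(1, record["name"])
    + encode_int_field(2, record["id"])
    + encode_string_field(3, record["email"])
)
text = json.dumps(record).encode("utf-8")

print(len(binary), "bytes in protobuf wire format")
print(len(text), "bytes as JSON")
```

For this record the binary form is 28 bytes and the JSON form is 58 bytes, mostly because field names are replaced by one-byte keys. A single tiny record is not a benchmark, so treat the ratio as an illustration only.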

Step-by-Step Python Example

Let's create a simple example where we define a Person message, serialize a few Person objects to a file, and then read them back.

Prerequisites

First, you need to install the Protocol Buffer compiler and the Python library.

Install the Protocol Buffer Compiler (protoc)

The easiest way is often using a package manager.

On macOS (using Homebrew):
```
brew install protobuf
```

On Debian/Ubuntu:

sudo apt-get update
sudo apt-get install protobuf-compiler

On Windows: Download the prebuilt protoc zip archive for Windows from the Protocol Buffers GitHub releases page, extract it, and add its bin directory to your PATH.

Install the Python Protobuf Library

This library contains the Python runtime needed to use the generated classes.

pip install protobuf

Step 1: Define the Schema (`person.proto`)

Create a file named person.proto. This is where you define your data structure.

// person.proto
syntax = "proto3"; // Use proto3 syntax
// The package name helps prevent name collisions.
package tutorial;
// Define the Person message.
// A message is just like a class or a struct.
message Person {
  // Fields have a type, a name, and a unique number (tag).
  // The tag is used to identify fields in the binary format.
  // If you change the tag number, existing serialized data will break.
  string name = 1;
  int32 id = 2;       // Unique ID number for this person.
  string email = 3;
  // A person can have multiple phone numbers.
  // This is a "repeated" field, like a list or array.
  repeated string phones = 4;
}
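
The comments above say the tag number identifies a field in the binary format. As a rough sketch of what that means, each field on the wire starts with a key computed from the field number and a wire type (0 for varint values such as int32, 2 for length-delimited values such as string). The helper names make_key and split_key below are hypothetical and exist only for this illustration.

```python
from typing import Tuple

WIRE_VARINT = 0  # int32, int64, bool, enum, ...
WIRE_LEN = 2     # string, bytes, nested messages, ...


def make_key(field_number: int, wire_type: int) -> int:
    """Combine a field number and a wire type into the on-the-wire key."""
    return (field_number << 3) | wire_type


def split_key(key: int) -> Tuple[int, int]:
    """Recover (field_number, wire_type) from a key."""
    return key >> 3, key & 0x07


# The four fields of the Person message defined in person.proto.
person_fields = [
    ("name", 1, WIRE_LEN),
    ("id", 2, WIRE_VARINT),
    ("email", 3, WIRE_LEN),
    ("phones", 4, WIRE_LEN),
]

for name, number, wire_type in person_fields:
    key = make_key(number, wire_type)
    print(f"{name}: field {number} -> key byte 0x{key:02x}")
```

The field name never appears in the serialized bytes, only the key does. That is why renaming a field is safe, while changing its number makes a reader look for a key that old data does not contain.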

Step 2: Generate the Python Code

Now, use the protoc compiler to generate the Python classes from your .proto file.

Make sure you are in the same directory as person.proto.

Run the following command:

# The --python_out flag tells protoc to generate Python code.
# The '.' tells it to output the files in the current directory.
protoc --python_out=. person.proto

This command will create a new file: person_pb2.py. This is the magic file! It contains the Python classes (Person) that you can now use in your code. You should never edit this file by hand.

Step 3: Use the Generated Code (Write to a File)

Now, let's write a Python script to create Person objects and serialize them to a binary file. Create a file named create_persons.py.

# create_persons.py
import struct

import person_pb2  # Import the generated module


def write_message(f, message):
    """Writes one message, prefixed with its length as a 4-byte little-endian integer."""
    data = message.SerializeToString()  # Returns the binary data as bytes.
    f.write(struct.pack("<I", len(data)))
    f.write(data)


def create_persons():
    """Creates and serializes Person messages."""
    # Create a Person object and populate it with data.
    person1 = person_pb2.Person()
    person1.name = "Alice"
    person1.id = 123
    person1.email = "alice@example.com"
    person1.phones.append("555-1234")
    person1.phones.append("555-5678")
    # Create another Person object.
    person2 = person_pb2.Person()
    person2.name = "Bob"
    person2.id = 456
    person2.email = "bob@example.com"
    person2.phones.append("555-8765")
    # Serialize the objects to a binary file.
    # Protobuf messages are not self-delimiting, so each one is written
    # with a length prefix that tells the reader where it ends.
    with open("persons.bin", "wb") as f:
        write_message(f, person1)
        write_message(f, person2)
    print("Serialized 2 persons to persons.bin")


if __name__ == "__main__":
    create_persons()

Run this script from your terminal:

python create_persons.py

You will now have a persons.bin file in your directory. If you open it in a text editor, it will look mostly like gibberish because the format is binary, although the string values remain readable.

Step 4: Use the Generated Code (Read from a File)

Finally, let's create another script to read the binary file and parse the data back into Person objects. Create a file named read_persons.py.

# read_persons.py
import struct

import person_pb2  # Import the generated module


def read_persons():
    """Reads and deserializes length-prefixed Person messages from a file."""
    persons = []
    with open("persons.bin", "rb") as f:
        while True:
            # Each message starts with a 4-byte little-endian length.
            header = f.read(4)
            if not header:
                break  # End of file
            if len(header) < 4:
                raise ValueError("Truncated length prefix")
            (size,) = struct.unpack("<I", header)
            data = f.read(size)
            if len(data) < size:
                raise ValueError("Truncated message")
            # ParseFromString() turns the bytes back into an object.
            # It raises google.protobuf.message.DecodeError on malformed input.
            person = person_pb2.Person()
            person.ParseFromString(data)
            persons.append(person)
    # Print the deserialized data.
    for p in persons:
        print(f"Name: {p.name}")
        print(f"ID: {p.id}")
        print(f"Email: {p.email}")
        print("Phones:")
        for phone in p.phones:
            print(f"  - {phone}")
        print("-" * 20)


if __name__ == "__main__":
    read_persons()

Note on Reading Multiple Messages: The protobuf wire format is not self-delimiting, meaning a serialized message does not record its own length. If you simply concatenate serialized messages and pass the whole byte string to ParseFromString, the parser treats it as a single message and merges the fields: for singular fields such as name the last value wins, and repeated fields such as phones are appended together. This is why the scripts above prefix each message with its length. The 4-byte prefix is a convention chosen for this tutorial, not part of the protobuf format. The writer and the reader only need to agree on it.

Let's run our read_persons.py script:

python read_persons.py

Expected Output:

Name: Alice
ID: 123
Email: alice@example.com
Phones:
  - 555-1234
  - 555-5678
--------------------
Name: Bob
ID: 456
Email: bob@example.com
Phones:
  - 555-8765
--------------------
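
One widely used framing convention for storing several messages in one stream prefixes each message with its length encoded as a varint. Java's writeDelimitedTo and parseDelimitedFrom use this layout. The sketch below implements that framing with only the standard library, using plain byte strings as stand-ins for SerializeToString() output. The function names write_delimited and read_delimited are made up for this illustration and are not part of the protobuf package.

```python
import io
from typing import BinaryIO, Iterator


def write_delimited(stream: BinaryIO, payload: bytes) -> None:
    """Write one message as a varint length followed by the payload."""
    length = len(payload)
    while length >= 0x80:
        stream.write(bytes([(length & 0x7F) | 0x80]))  # More bytes follow.
        length >>= 7
    stream.write(bytes([length]))
    stream.write(payload)


def read_delimited(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each length-prefixed payload until the stream ends."""
    while True:
        first = stream.read(1)
        if not first:
            return  # Clean end of stream.
        length, shift, byte = 0, 0, first[0]
        while True:
            length |= (byte & 0x7F) << shift
            if not byte & 0x80:
                break
            shift += 7
            following = stream.read(1)
            if not following:
                raise EOFError("stream ended inside a length prefix")
            byte = following[0]
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("stream ended inside a message")
        yield payload


# Stand-in payloads; in practice these would be SerializeToString() results.
buffer = io.BytesIO()
for payload in (b"first message", b"x" * 300, b""):
    write_delimited(buffer, payload)

buffer.seek(0)
frames = list(read_delimited(buffer))
print([len(frame) for frame in frames])
```

Each payload yielded by read_delimited could then be passed to ParseFromString on a fresh Person object. A varint prefix costs one byte for messages shorter than 128 bytes, and an empty message is still framed correctly as a single zero byte.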

Schema Evolution (A Key Strength)

One of the best features of Protobuf is that you can evolve your schema without breaking old data.

Example:

You have a person.proto with name, id, and email.
You deploy your application, and millions of persons.bin files are created.
Later, you realize you need to add an age field.

New person.proto:

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
  repeated string phones = 4;
  // NEW FIELD!
  int32 age = 5; 
}

Now, you can:

Read old data (persons.bin) with the new person_pb2.py code. The age field is absent from the old bytes, so it simply reads as its default value (0 for int32).
Write new data with the new code. The new data will include the age field.
Old code can still read new data (it will just ignore the age field it doesn't know about).

This makes Protobuf incredibly resilient to changes in your application's data structure over time.
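
The two compatibility directions can be sketched with a tiny hand-written decoder. This is an illustration under simplifying assumptions, not how the protobuf library is implemented: it handles only varint and length-delimited fields, treats every length-delimited field as a UTF-8 string, and the names read_varint, decode, OLD_SCHEMA, and NEW_SCHEMA are hypothetical.

```python
from typing import Dict, Tuple


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


# Field number -> (field name, default value).
OLD_SCHEMA = {1: ("name", ""), 2: ("id", 0), 3: ("email", "")}
NEW_SCHEMA = {**OLD_SCHEMA, 5: ("age", 0)}


def decode(data: bytes, schema: Dict[int, Tuple[str, object]]) -> dict:
    """Decode known fields, starting from defaults and skipping unknown ones."""
    result = {name: default for name, default in schema.values()}
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # Varint value.
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # Length-delimited value, assumed to be a string.
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        if field_number in schema:
            result[schema[field_number][0]] = value
        # Otherwise the field is unknown to this schema and is skipped.
    return result


old_bytes = b"\x0a\x05Alice\x10\x7b"  # name="Alice", id=123
new_bytes = old_bytes + b"\x28\x1e"    # Same data plus age=30 as field 5.

print(decode(old_bytes, NEW_SCHEMA))  # New reader, old data: age falls back to 0.
print(decode(new_bytes, OLD_SCHEMA))  # Old reader, new data: field 5 is skipped.
```

Because every field carries its own number and wire type, a reader can step over a field it does not recognize without knowing what it means. This sketch simply drops such fields, which is enough to show why both directions keep working.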

Summary

Step	Action	Command / File	Purpose
1	Define Schema	`person.proto`	Describe your data structure.
2	Generate Code	`protoc --python_out=. person.proto`	Create Python classes (`person_pb2.py`) from the schema.
3	Write Data	`create_persons.py`	Use generated classes to create objects and serialize them to a binary file.
4	Read Data	`read_persons.py`	Use generated classes to read binary data and parse it back into objects.
