杰瑞科技汇

How to Install and Use Protobuf in Python

This is a comprehensive guide to using Protocol Buffers (Protobuf) with Python, covering everything from setup to advanced usage.


What are Protocol Buffers?

Protocol Buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data. Think of them as a smaller, faster-to-parse alternative to XML, JSON, or other text-based formats.

Key Advantages:

  • Schema: You define your data structure in a .proto file (a schema).
  • Code Generation: You use a compiler (protoc) to generate classes in your target language (Python, in this case).
  • Efficiency: The serialized data is small and fast to parse (it's binary, not text).
  • Type Safety: The generated Python classes check values when you assign them (for example, assigning a string to an int32 field raises a TypeError) and give you attribute access to your data.
  • Backward & Forward Compatibility: You can evolve your data schema without breaking old code that reads or writes the data.
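The efficiency claim comes from the binary wire format: each field is written as a varint key, (field_number << 3) | wire_type, followed by its value, and integers are packed 7 bits per byte. The following is a hand-rolled, standard-library sketch of that encoding for the name, id, and email fields of the Person message defined in Step 2. It is not how the protobuf library is implemented and handles only non-negative integers and strings; for these values the bytes should match what SerializeToString() produces.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (low 7 bits first)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """Every field starts with a key: (field_number << 3) | wire_type, as a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number: int, value: int) -> bytes:
    return encode_key(field_number, 0) + encode_varint(value)  # wire type 0 = VARINT


def encode_string_field(field_number: int, value: str) -> bytes:
    data = value.encode("utf-8")
    return encode_key(field_number, 2) + encode_varint(len(data)) + data  # wire type 2 = LEN


# Person fields from the schema: name = 1, id = 2, email = 3.
person_bytes = (
    encode_string_field(1, "John Doe")
    + encode_int_field(2, 1234)
    + encode_string_field(3, "jdoe@example.com")
)
person_json = json.dumps({"name": "John Doe", "id": 1234, "email": "jdoe@example.com"})

print(person_bytes.hex(" "))
print(f"{len(person_bytes)} bytes vs {len(person_json)} bytes of JSON")  # 31 bytes vs 61 bytes of JSON
```

Note that field names never appear on the wire, only field numbers, which is a large part of the size difference.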

Step 1: Installation

You need two main things:

  1. The Protocol Buffers compiler (protoc).
  2. The Python Protobuf library.

Install the Protobuf Compiler (protoc)

On macOS (using Homebrew):

brew install protobuf

On Debian/Ubuntu:

sudo apt-get update
sudo apt-get install protobuf-compiler

On Windows (using Chocolatey):

choco install protoc

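From GitHub (if needed): Prebuilt protoc binaries are published on the releases page of the official GitHub repository (protocolbuffers/protobuf); you can also download the source there and compile it yourself.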

Install the Python Library

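Install the Protobuf runtime library for Python. protoc has a built-in Python generator, so no separate compiler plugin is needed; the protobuf package is the runtime that the generated code imports. Keep the protoc and protobuf package versions reasonably close: if they are far apart, importing the generated module can fail.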

pip install protobuf

Step 2: Define Your Schema (.proto file)

Create a file named addressbook.proto. This file defines the structure of your data.

// addressbook.proto
// The syntax line must come first; without it, protoc assumes proto2.
syntax = "proto3";
// The package name helps prevent name collisions.
package tutorial;
// Import other .proto files if needed.
// import "other.proto";
// Enumerations are defined like this.
enum PhoneType {
  MOBILE = 0;
  HOME = 1;
  WORK = 2;
}
// A message is like a class or a struct.
message Person {
  // Scalar types include int32, int64, bool, float, double, string, and bytes;
  // fields can also be enums or other messages.
  // Any singular field may be left unset. Plain proto3 scalars do not track
  // whether they were set; declare them 'optional' if you need that.
  // 'repeated' fields can be repeated any number of times (like a list).
  string name = 1;
  int32 id = 2;  // Unique ID number for this person.
  string email = 3;
  // A nested message.
  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }
  // A list of PhoneNumber messages.
  repeated PhoneNumber phones = 4;
}
// Another message that uses the Person message.
message AddressBook {
  repeated Person people = 1;
}

Step 3: Generate Python Code

Now, use the protoc compiler to turn your .proto file into Python classes.

  1. Make sure you are in the same directory as your addressbook.proto file.

  2. Run the following command:

    # The --python_out=. flag tells protoc to generate Python code
    # in the current directory.
    protoc --python_out=. addressbook.proto

This command will create a new file: addressbook_pb2.py. This is the generated code. You should never edit this file by hand.
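If you regenerate code often, a small build helper can wrap the command above. This is a minimal sketch that assumes protoc is installed and on your PATH; protoc_command and compile_proto are hypothetical helper names, not part of protobuf. The -I flag (long form --proto_path) tells protoc where to resolve imports, and the .proto file must sit under that directory.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def protoc_command(proto_file: str, out_dir: str = ".") -> List[str]:
    """Build the protoc argument list for generating Python code (hypothetical helper)."""
    proto_path = Path(proto_file)
    return [
        "protoc",
        f"-I{proto_path.parent}",   # import root: the directory that holds the .proto file
        f"--python_out={out_dir}",  # where the *_pb2.py files are written
        str(proto_path),
    ]


def compile_proto(proto_file: str, out_dir: str = ".") -> None:
    """Run protoc, failing loudly if it is missing or reports an error (hypothetical helper)."""
    if shutil.which("protoc") is None:
        raise FileNotFoundError("protoc was not found on PATH; install it first (see Step 1)")
    # check=True turns a non-zero protoc exit status into CalledProcessError.
    subprocess.run(protoc_command(proto_file, out_dir), check=True)


# Show the command that compile_proto("addressbook.proto") would run.
print(" ".join(protoc_command("addressbook.proto")))
```

Building the argument list separately from running it keeps the command easy to inspect and avoids shell quoting issues.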


Step 4: Use the Generated Code in Python

Now you can use the classes from addressbook_pb2.py in your Python scripts.

Here are the main things you'll want to do:

A. Creating and Populating Messages

import addressbook_pb2
# Create an instance of the Person message
person = addressbook_pb2.Person()
# Set its fields. Note the attribute names match the field names in the .proto file.
person.id = 1234
person.name = "John Doe"
person.email = "jdoe@example.com"
# Create and add phone numbers
phone = person.phones.add() # .add() is used for repeated fields
phone.number = "555-4321"
phone.type = addressbook_pb2.PhoneType.HOME
# You can also create a PhoneNumber object and assign it
work_phone = addressbook_pb2.Person.PhoneNumber()
work_phone.number = "555-1234"
work_phone.type = addressbook_pb2.PhoneType.WORK
person.phones.append(work_phone) # .append() also works
print(f"Created person: {person.name} (ID: {person.id})")
print(f"Email: {person.email}")
print(f"First phone number: {person.phones[0].number} (Type: {person.phones[0].type})")

B. Serializing (Writing) to a File

The .SerializeToString() method returns the message's binary encoding as a bytes object (not a str, despite the name).

# Create an AddressBook and add our person to it
address_book = addressbook_pb2.AddressBook()
address_book.people.extend([person])
# Serialize the AddressBook to a binary string
serialized_data = address_book.SerializeToString()
# Write the binary string to a file
with open("addressbook.bin", "wb") as f:
    f.write(serialized_data)
print("\nSerialized data and wrote to addressbook.bin")

C. Parsing (Reading) from a File

The .ParseFromString() method does the reverse: it clears the message and fills it from a bytes object. (Use .MergeFromString() if you want to merge into the message's existing contents instead.)

# Create a new, empty AddressBook object
new_address_book = addressbook_pb2.AddressBook()
# Read the binary data from the file and parse it
with open("addressbook.bin", "rb") as f:
    raw_data = f.read()
    new_address_book.ParseFromString(raw_data)
# Now you can access the data from the parsed object
print("\nRead and parsed data from file:")
for p in new_address_book.people:
    print(f"  Person: {p.name}")
    print(f"    ID: {p.id}")
    print(f"    Email: {p.email}")
    for phone_num in p.phones:
        print(f"    Phone: {phone_num.number} (Type: {phone_num.type})")
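ParseFromString() treats the whole buffer as a single message, because the wire format has no end-of-message marker. To store several messages in one file, a common convention (the one Java's writeDelimitedTo()/parseDelimitedFrom() use) is to prefix each serialized message with its length as a varint. The Python runtime does not appear to expose a public helper for this, so here is a standard-library sketch; write_delimited and read_delimited are hypothetical names, and they operate on the bytes from SerializeToString() rather than on message objects.

```python
import io
from typing import BinaryIO, Iterator, Optional


def _write_varint(stream: BinaryIO, value: int) -> None:
    """Write a non-negative int as a base-128 varint."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            stream.write(bytes([byte | 0x80]))  # high bit set: more bytes follow
        else:
            stream.write(bytes([byte]))
            return


def _read_varint(stream: BinaryIO) -> Optional[int]:
    """Read a varint; return None if the stream is already at a clean end."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a length prefix")
        result |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            return result
        shift += 7


def write_delimited(stream: BinaryIO, payload: bytes) -> None:
    """Write one serialized message, preceded by its varint length."""
    _write_varint(stream, len(payload))
    stream.write(payload)


def read_delimited(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each length-prefixed payload until the stream is exhausted."""
    while True:
        length = _read_varint(stream)
        if length is None:
            return
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("stream ended inside a message")
        yield payload


# Demo with placeholder bytes standing in for SerializeToString() output.
buffer = io.BytesIO()
for payload in (b"first", b"", b"x" * 200):
    write_delimited(buffer, payload)
buffer.seek(0)
print([len(p) for p in read_delimited(buffer)])  # [5, 0, 200]
```

With the generated classes, you would call write_delimited(f, person.SerializeToString()) for each person, then for each yielded chunk create p = addressbook_pb2.Person() and call p.ParseFromString(chunk).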

Step 5: Running the Full Example

Create a Python script (e.g., main.py) with all the code from the previous steps.

# main.py
import addressbook_pb2
def main():
    # --- Part A: Creating and Populating Messages ---
    print("--- Creating and Populating Messages ---")
    person = addressbook_pb2.Person()
    person.id = 1234
    person.name = "John Doe"
    person.email = "jdoe@example.com"
    phone = person.phones.add()
    phone.number = "555-4321"
    phone.type = addressbook_pb2.PhoneType.HOME
    work_phone = addressbook_pb2.Person.PhoneNumber()
    work_phone.number = "555-1234"
    work_phone.type = addressbook_pb2.PhoneType.WORK
    person.phones.append(work_phone)
    print(f"Created person: {person.name} (ID: {person.id})")
    # --- Part B: Serializing to a File ---
    print("\n--- Serializing to a File ---")
    address_book = addressbook_pb2.AddressBook()
    address_book.people.extend([person])
    serialized_data = address_book.SerializeToString()
    with open("addressbook.bin", "wb") as f:
        f.write(serialized_data)
    print("Serialized data and wrote to addressbook.bin")
    # --- Part C: Parsing from a File ---
    print("\n--- Parsing from a File ---")
    new_address_book = addressbook_pb2.AddressBook()
    with open("addressbook.bin", "rb") as f:
        raw_data = f.read()
        new_address_book.ParseFromString(raw_data)
    for p in new_address_book.people:
        print(f"  Person: {p.name}")
        print(f"    ID: {p.id}")
        print(f"    Email: {p.email}")
        for phone_num in p.phones:
            print(f"    Phone: {phone_num.number} (Type: {phone_num.type})")
if __name__ == "__main__":
    main()

Run the script:

python main.py

You will see the expected output, and a file named addressbook.bin will be created in your directory. If you open this file with a text editor, it will look like garbled text, because it's a binary format.


Advanced Topics

Handling Optional Fields and Defaults

In proto3, any field can be left unset. Reading an unset field returns its type's default value:

  • Numerics: 0
  • Booleans: False
  • Strings: "" (empty string)
  • Bytes: b"" (empty bytes)
  • Enums: The first value defined, which must be 0 (e.g., MOBILE in our example)
  • Messages: An empty default instance of that message type (in Python, reading an unset message field never returns None)

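You can check whether a field was explicitly set with .HasField(), but only for fields that track presence: singular message fields, oneof members, and proto3 fields declared with the optional keyword. A plain proto3 scalar such as id does not track presence, so HasField() raises a ValueError and the only available check is a comparison against the default:

person = addressbook_pb2.Person()
person.name = "Jane Doe"
# person.id was never set, so it reads as the default value 0.
print(f"ID: {person.id}")
# 'id' is a plain proto3 scalar with no presence tracking, so HasField() raises.
try:
    person.HasField("id")
except ValueError as err:
    print(f"Cannot check presence of 'id': {err}")
# To make HasField("id") work, declare the field as: optional int32 id = 2;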
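The reason HasField() cannot work for id is visible on the wire: with implicit presence, proto3 does not write a scalar that holds its default value, so a parsed message looks the same whether id was set to 0 or never set. An optional field is written whenever it is set, even to 0. Below is a standard-library sketch of the two rules for an int field; encode_implicit_int and encode_optional_int are hypothetical helpers, and only non-negative values are handled.

```python
from typing import Optional


def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative ints."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_implicit_int(field_number: int, value: int) -> bytes:
    """Plain proto3 scalar: the default value 0 produces no bytes at all."""
    if value == 0:
        return b""
    return encode_varint(field_number << 3) + encode_varint(value)  # wire type 0


def encode_optional_int(field_number: int, value: Optional[int]) -> bytes:
    """'optional' scalar: None means unset; any set value, even 0, is written."""
    if value is None:
        return b""
    return encode_varint(field_number << 3) + encode_varint(value)  # wire type 0


print(encode_implicit_int(2, 0).hex())  # empty: indistinguishable from "never set"
print(encode_optional_int(2, 0).hex())  # 1000: key for field 2, then the value 0
```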

JSON Support

Protobuf can also serialize to and parse from JSON, which is great for web APIs.

No separate install is needed: the json_format module ships with the protobuf package as google.protobuf.json_format.

Serializing to JSON:

from google.protobuf import json_format
# ... (create address_book object as before)
# Serialize to a JSON string
json_string = json_format.MessageToJson(address_book)
print("\nJSON representation:")
print(json_string)

Parsing from JSON:

# ... (json_string is from the previous step)
# Parse from a JSON string
new_address_book_from_json = addressbook_pb2.AddressBook()
json_format.Parse(json_string, new_address_book_from_json)
# Now you can use new_address_book_from_json just like the one from binary
print("\nParsed from JSON:")
for p in new_address_book_from_json.people:
    print(f"  Person: {p.name}")
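MessageToJson() follows the proto3 JSON mapping: enum values are written by name (e.g. "HOME"), and field names are converted to lowerCamelCase unless a json_name option overrides them. Every field in this schema is a single word, so the JSON keys match the .proto names, but a hypothetical field phone_numbers would appear as "phoneNumbers". Passing preserving_proto_field_name=True to MessageToJson() keeps the original names, and Parse() accepts either form. The default name conversion works roughly like this; to_json_name is a hypothetical re-implementation for illustration, not a protobuf API.

```python
def to_json_name(field_name: str) -> str:
    """Approximate the default JSON name: drop '_' and uppercase the next character."""
    result = []
    capitalize_next = False
    for ch in field_name:
        if ch == "_":
            capitalize_next = True
        elif capitalize_next:
            result.append(ch.upper())
            capitalize_next = False
        else:
            result.append(ch)
    return "".join(result)


print(to_json_name("phone_numbers"))  # phoneNumbers
print(to_json_name("name"))           # name
```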

Best Practices for Evolution

A major strength of Protobuf is schema evolution. Follow these rules to ensure backward and forward compatibility:

  • Do not change field numbers. This is the most important rule.
  • You can add new fields. Old code reading new data skips the fields it doesn't recognize (current runtimes keep them as unknown fields and write them back out when re-serializing). New code reading old data sees the new fields as unset, with their default values.
  • You can remove fields, but you should reserve the removed field number. This prevents anyone from later reusing that number for a new field, which old code would misread as the old field.
    message Person {
      // ...
      reserved 2; // The old 'id' field was here
      reserved "email"; // You can also reserve field names
    }
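  • You can change a field's type only in certain compatible ways: for example, int32, uint32, int64, uint64, and bool are interchangeable (values that don't fit the new type are truncated), and string and bytes are interchangeable as long as the bytes are valid UTF-8. Check the official documentation for the full list of compatible changes.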
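These rules follow from the wire format. Every field on the wire starts with a key carrying its field number and wire type, and the wire type alone says how many bytes to skip, so an old reader can step over a field it has never heard of. The standard-library sketch below plays the role of such an old reader; read_known_fields, the extra field 5, and its "nickname" meaning are hypothetical. It also shows why numbers must never be reused: the reader attaches meaning to numbers, not names.

```python
from typing import Dict, Tuple


def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_known_fields(data: bytes, known: Dict[int, str]) -> Dict[str, object]:
    """Keep fields whose numbers the reader knows; skip the rest by wire type."""
    fields: Dict[str, object] = {}
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # VARINT
            value, pos = decode_varint(data, pos)
        elif wire_type == 1:  # I64: fixed 8 bytes
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:  # LEN: varint length, then payload
            length, pos = decode_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:  # I32: fixed 4 bytes
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number in known:
            fields[known[number]] = value
        # Unknown numbers are dropped here; real runtimes keep them as unknown fields.
    return fields


# A "new" writer's Person bytes: name (1) = "Ann", id (2) = 7, plus a new field 5 = "Annie".
new_bytes = b"\x0a\x03Ann" + b"\x10\x07" + b"\x2a\x05Annie"
old_reader_schema = {1: "name", 2: "id"}  # the old code knows only fields 1 and 2
print(read_known_fields(new_bytes, old_reader_schema))  # {'name': b'Ann', 'id': 7}
```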