杰瑞科技汇

How do you use or replace cPickle in Python 3.4?

This article covers what you need to know about cPickle in Python 3.4: its history, why it mattered, and how to handle it today.


Executive Summary (TL;DR)

  • cPickle was a Python 2 standard-library module: a C-optimized version of the pickle module. It does not exist as a separate module in Python 3.4.
  • Its primary purpose was to serialize and deserialize Python objects (convert them to a byte stream and back).
  • cPickle was significantly faster than the pure Python pickle module. This was its main advantage.
  • Crucially, in Python 3.0, cPickle was folded into pickle: the C code became the private _pickle accelerator, and the pickle module in Python 3.4 automatically uses it if it's available. You no longer need to (and cannot) import cPickle directly.
  • Security Warning: pickle is not secure. You should never unpickle data from an untrusted source, as it can execute arbitrary code.

The History of cPickle vs. pickle

To understand the context, you need to look at Python 2.

In Python 2:

  • pickle: The pure Python implementation of the pickling protocol. It was slower but more portable.
  • cPickle: A C-compiled version of the same protocol. It was much faster but required a C compiler to build from source.

Because of the significant speed difference, Python 2 code usually tried the fast module first and fell back to the pure Python one:

# Python 2: the common import idiom
try:
    import cPickle as pickle  # Faster C implementation
except ImportError:
    import pickle             # Slower, pure Python fallback

In Python 3:

The Python core developers decided to eliminate this confusion. They merged the two modules.

  • The pickle module in Python 3 is now a "smart" module.
  • When you import pickle, Python first tries to load the fast C implementation (which was essentially the old cPickle).
  • If the C implementation is not available (e.g., in a minimal Python installation like some embedded systems), it falls back to the pure Python implementation.

The result: There is no separate cPickle module in Python 3. You should always just use import pickle.
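
If you want to confirm which implementation is active, the check can be sketched as follows. It relies on a CPython implementation detail (the private pickle._Pickler name for the pure Python class), so treat it as a diagnostic aid, not a supported API.

```python
import pickle


def uses_c_accelerator():
    """Return True if pickle.Pickler is the C implementation.

    pickle._Pickler is the pure Python class. When the C accelerator
    imports successfully, pickle.Pickler is rebound to the C class, so
    the two names refer to different objects. This is a CPython
    implementation detail and may not hold on other interpreters.
    """
    return pickle.Pickler is not pickle._Pickler


print("C accelerator in use:", uses_c_accelerator())
```

On a standard CPython build this is expected to report that the accelerator is in use; either way, your own code only ever needs import pickle.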


How to Use Pickling in Python 3.4 (The Correct Way)

Even though you asked about cPickle, you should use the modern pickle module. The syntax is identical to what cPickle would have used.

The main functions are:

  • pickle.dump(obj, file): Serializes an object and writes it to a file-like object.
  • pickle.load(file): Reads from a file-like object and deserializes it back into a Python object.
  • pickle.dumps(obj): Serializes an object and returns it as a bytes object.
  • pickle.loads(bytes_data): Deserializes a bytes object back into a Python object.
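
Before the file-based example, here is a minimal in-memory sketch of the dumps()/loads() pair listed above. The record dictionary is made up for illustration.

```python
import pickle

# Hypothetical sample data, used only for this illustration.
record = {"name": "Alice", "scores": [88, 92, 95]}

# dumps() returns bytes; loads() rebuilds an equal but distinct object.
blob = pickle.dumps(record)
restored = pickle.loads(blob)

# A protocol can be requested explicitly. HIGHEST_PROTOCOL is the newest
# protocol the running interpreter supports (protocol 4 on Python 3.4).
newest = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)

print(type(blob).__name__)                      # bytes
print(restored == record, restored is record)   # True False
print(pickle.loads(newest) == record)           # True
```

Newer protocols are more compact but cannot be read by older Python versions, so pick the protocol based on who has to load the data.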

Example: Pickling and Unpickling a Dictionary

Let's create a simple Python object, save it to a file, and then load it back.

import pickle
# 1. Define a Python object to be serialized
data_to_save = {
    'name': 'Alice',
    'age': 30,
    'scores': [88, 92, 95],
    'is_student': False,
    'details': {
        'city': 'New York',
        'id': 12345
    }
}
# Define the filename for our pickled data
filename = 'my_data.pkl'
# 2. Pickle the object and save it to a file
# We use 'wb' for write-binary mode
try:
    with open(filename, 'wb') as f:
        pickle.dump(data_to_save, f)
    print("Data successfully pickled to '{}'".format(filename))
except Exception as e:
    print("An error occurred during pickling: {}".format(e))
# 3. Unpickle the object from the file
# We use 'rb' for read-binary mode
try:
    with open(filename, 'rb') as f:
        loaded_data = pickle.load(f)
    print("\n--- Data Successfully Unpickled ---")
    print("Loaded data:", loaded_data)
    print("Type of loaded data:", type(loaded_data))
    # Verify the data is identical
    print("\nIs the original data equal to the loaded data?", data_to_save == loaded_data)
except FileNotFoundError:
    print("Error: The file '{}' was not found.".format(filename))
except Exception as e:
    print("An error occurred during unpickling: {}".format(e))

Running this code will produce output like the following (on Python 3.4 the order of the dictionary keys may differ, since dicts were not ordered in that version):

Data successfully pickled to 'my_data.pkl'

--- Data Successfully Unpickled ---
Loaded data: {'name': 'Alice', 'age': 30, 'scores': [88, 92, 95], 'is_student': False, 'details': {'city': 'New York', 'id': 12345}}
Type of loaded data: <class 'dict'>

Is the original data equal to the loaded data? True

The Critical Security Warning of pickle (and cPickle)

This is the most important thing to know about using this module.

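The pickle protocol is not designed to be secure. A pickle stream is not just data: it is a small program for the unpickler, and its instructions can import any module and call any callable with arguments chosen by whoever wrote the stream. When you call pickle.loads() or pickle.load(), Python runs those instructions while reconstructing the objects.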

This means that if you unpickle a file from an untrusted source, an attacker could have crafted it to execute malicious code on your machine (e.g., deleting files, installing malware, opening a reverse shell).
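
To make this concrete, the following harmless sketch uses the standard pickletools module to list the opcodes in a pickle stream. CallsLen is a hypothetical class invented for this illustration; its pickle asks the unpickler to call len("abc").

```python
import pickle
import pickletools


class CallsLen:
    """Hypothetical class whose pickle tells the unpickler to call len('abc')."""

    def __reduce__(self):
        # (callable, argument tuple): the unpickler will call len("abc").
        return len, ("abc",)


blob = pickle.dumps(CallsLen())

# genops() only parses the stream; it does not execute anything.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(blob)]
print(opcode_names)

# Loading the stream executes the REDUCE opcode, which performs the call.
result = pickle.loads(blob)
print(result)  # 3
```

Note that loading returns 3, the result of the call, not a CallsLen instance. Replace len with a dangerous callable and the same mechanism becomes an attack.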

Example of Malicious Pickle

Imagine you receive a file evil.pkl from an untrusted source. It might contain something like this (conceptually):

# This is what the malicious creator of 'evil.pkl' might have done.
import os
import pickle
class Evil:
    def __reduce__(self):
        # This command will be executed when the object is unpickled
        return os.system, ('echo "YOU HAVE BEEN HACKED!" > /tmp/hacked.txt',)
# Pickle the malicious object
with open('evil.pkl', 'wb') as f:
    pickle.dump(Evil(), f)

If you run your previous unpickling script on evil.pkl, it would execute os.system('echo "YOU HAVE BEEN HACKED!" > /tmp/hacked.txt'), creating a file on your system without your consent.

Golden Rule: Only unpickle data that you have pickled yourself or that comes from a source you absolutely trust.
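
If you must load pickles whose origin you cannot fully control, the pickle documentation describes restricting which globals may be loaded by overriding Unpickler.find_class(). The sketch below follows that idea; the SAFE_BUILTINS allow-list and the restricted_loads helper name are choices made for this illustration. It reduces the attack surface but is not a guarantee of safety, so the golden rule still applies.

```python
import builtins
import io
import pickle

# Assumption for this sketch: only these builtins may be referenced by name.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks for a global such as os.system.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '{}.{}' is forbidden".format(module, name)
        )


def restricted_loads(data):
    """Like pickle.loads(), but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers and allow-listed builtins still load normally.
print(restricted_loads(pickle.dumps({"ids": [1, 2, 3], "span": range(3)})))

# A stream that references a non-listed global (here eval) is rejected.
try:
    restricted_loads(pickle.dumps(eval))
except pickle.UnpicklingError as exc:
    print("Blocked:", exc)
```

Plain data such as dicts, lists, strings, and numbers never goes through find_class(), so an allow-list this small is often enough for simple payloads.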


Modern Alternatives to pickle

Because of the security risks, other serialization formats are often preferred, especially for web applications or data interchange.

Format      | Library           | Use Case                                          | Pros                                               | Cons
------------|-------------------|---------------------------------------------------|----------------------------------------------------|-----
JSON        | json (built-in)   | Web APIs, Config files                            | Human-readable, language-agnostic, very secure.    | Doesn't support all Python types (e.g., datetime, custom classes).
YAML        | PyYAML            | Config files, human-readable data                 | More readable than JSON, supports comments.        | Slower than JSON, can be complex to parse.
MessagePack | msgpack           | Binary interchange (like JSON but smaller/faster) | Fast, compact, language-agnostic.                  | Not human-readable.
Pickle      | pickle (built-in) | Saving/loading Python objects (e.g., ML models)   | Preserves Python objects perfectly, very fast.     | SECURITY RISK, not human-readable, Python-specific.
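
The JSON limitation around types such as datetime can usually be worked around with the default and object_hook hooks of the json module. The sketch below is one possible convention, not a standard: the "__datetime__" tag and the helper names encode_extra and decode_extra are made up for this example.

```python
import json
from datetime import datetime

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S"


def encode_extra(obj):
    """Called by json.dumps() for objects it cannot serialize itself."""
    if isinstance(obj, datetime):
        # Assumed convention: wrap the value in a tagged dictionary.
        return {"__datetime__": obj.strftime(ISO_FORMAT)}
    raise TypeError("Cannot serialize {!r}".format(obj))


def decode_extra(mapping):
    """Called by json.loads() for every decoded JSON object."""
    if "__datetime__" in mapping:
        return datetime.strptime(mapping["__datetime__"], ISO_FORMAT)
    return mapping


record = {"name": "Alice", "joined": datetime(2015, 3, 14, 9, 26, 53)}

text = json.dumps(record, default=encode_extra, sort_keys=True)
restored = json.loads(text, object_hook=decode_extra)

print(text)
print(restored == record)  # True
```

Unlike unpickling, decoding this JSON only ever builds dicts, lists, strings, numbers, and the datetime values you explicitly allow.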

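For machine learning, libraries like scikit-learn, TensorFlow, and PyTorch have their own save formats (e.g., .joblib, .h5, .pt) that are often preferred over raw pickle. Be aware that some of these are built on pickle internally (joblib files and PyTorch's torch.save, for example), so the same rule applies: only load model files from sources you trust.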

Summary for Python 3.4

  1. Forget cPickle: It doesn't exist as a separate module in Python 3.4. Always use import pickle.
  2. Use pickle for: Saving the state of your Python application, caching complex data structures, or saving trained machine learning models for later use within your own trusted environment.
  3. NEVER use pickle for: Receiving data from the internet, untrusted users, or any source outside your control.
  4. Consider Alternatives: For web APIs or data sharing, use json. For performance-critical binary interchange, consider msgpack.