杰瑞科技汇

How do I use zlib in Python 2.7.8?

Here is a comprehensive guide to using the zlib module in Python 2.7.8, covering its core functionality, common use cases, and important considerations.


What is zlib?

zlib is a Python module that provides bindings to the widely-used zlib compression library. It allows you to compress and decompress data in memory using the DEFLATE compression algorithm, which is also used by formats like gzip and PNG.
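To make the relationship concrete, the sketch below compresses the same bytes as a raw zlib stream and as a gzip stream. The `16 + zlib.MAX_WBITS` value passed to `compressobj()` is the conventional way to request gzip framing from the underlying C library; treat the exact header bytes shown as illustrative:

```python
import zlib

data = b"the same DEFLATE algorithm underlies zlib, gzip and PNG"

# Default zlib.compress() produces a zlib-wrapped DEFLATE stream;
# its first byte is 0x78 ('x'), the common zlib header.
zlib_stream = zlib.compress(data)
print(zlib_stream[:1])

# Requesting wbits = 16 + MAX_WBITS yields a gzip-wrapped stream
# instead, recognisable by the gzip magic bytes \x1f\x8b.
comp = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
gzip_stream = comp.compress(data) + comp.flush()
print(gzip_stream[:2])
```

Both streams carry the same DEFLATE payload; only the framing (header and checksum) differs, which is why tools like gzip interoperate so easily with zlib.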

The primary goal of zlib is to reduce the size of data, which is useful for:

  • Saving disk space.
  • Reducing network bandwidth usage.
  • Speeding up data transfers.

Basic Compression and Decompression

This is the most fundamental operation. You take a string of data, compress it, and then decompress it back to its original form.

import zlib
# The original data (a string)
original_data = "This is a test string that will be compressed. " * 10
print "Original data (length %d): %r" % (len(original_data), original_data[:50] + "...")
# --- Compression ---
# 1. zlib operates on bytes. In Python 2 a plain str is already a byte
#    string, so it can be passed to zlib directly; a unicode string must
#    first be encoded, e.g. with .encode('utf-8').
data_to_compress = original_data.encode('utf-8')
# 2. Compress the data
compressed_data = zlib.compress(data_to_compress)
print "\nCompressed data (length %d): %r" % (len(compressed_data), compressed_data[:50] + "...")
# --- Decompression ---
# 1. Decompress the data
decompressed_data = zlib.decompress(compressed_data)
# 2. Convert the result back to a string
original_data_restored = decompressed_data.decode('utf-8')
# --- Verification ---
print "\nDecompressed data (length %d): %r" % (len(original_data_restored), original_data_restored[:50] + "...")
print "\nDoes the original data match the decompressed data?", original_data == original_data_restored

Output:

Original data (length 310): 'This is a test string that will be compressed. Thi...'
Compressed data (length 62): 'x\x9c\xcbH\xcd\xc9\xc9\x07\x00\x06,\x02\x15\x83\x8c\xf...'
Decompressed data (length 310): 'This is a test string that will be compressed. Thi...'
Does the original data match the decompressed data? True

As you can see, the compressed data is significantly smaller than the original.
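The flip side is worth knowing: the savings come from redundancy in the input. High-entropy data (random bytes, or data that is already compressed) has little redundancy, so zlib's framing overhead can make the output slightly larger than the input. A quick sketch:

```python
import os
import zlib

random_bytes = os.urandom(1000)          # high-entropy, incompressible input
compressed = zlib.compress(random_bytes)
print(len(compressed))                   # typically a little over 1000 bytes

repetitive = b"ab" * 500                 # same length, highly redundant
print(len(zlib.compress(repetitive)))    # far smaller
```

This is why compressing already-compressed formats (JPEG, ZIP, MP3) with zlib rarely pays off.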


Checking Compression Performance

You can easily check the compression ratio to see how effective zlib was.

import zlib
# A string with a lot of repetitive data compresses very well
text_data = "abababababababababababababababababababab" * 100
binary_data = text_data.encode('utf-8')
compressed = zlib.compress(binary_data)
decompressed = zlib.decompress(compressed)
original_size = len(binary_data)
compressed_size = len(compressed)
ratio = (1.0 - (float(compressed_size) / original_size)) * 100
print "Original size:  %d bytes" % original_size
print "Compressed size: %d bytes" % compressed_size
print "Space saved:     %.2f%%" % ratio
print "Data integrity OK:", binary_data == decompressed

Output:

Original size:  2000 bytes
Compressed size: 24 bytes
Space saved:     98.80%
Data integrity OK: True

Compression Levels

The zlib.compress() function accepts an optional level argument, which controls the trade-off between compression speed and the resulting compression ratio.

  • level=0: No compression at all. Fastest.
  • level=1: Fastest compression. The compressed data will be larger.
  • level=6: The default level. A good balance of speed and compression.
  • level=9: Best/slowest compression. The smallest compressed data size, but takes the longest.
import zlib
data = "This is a string for testing compression levels. " * 100
data_bytes = data.encode('utf-8')
# Test different compression levels
for level in [0, 1, 6, 9]:
    # In Python 2.7, zlib.compress() does not accept keyword arguments,
    # so the level must be passed positionally.
    compressed = zlib.compress(data_bytes, level)
    print "Level %2d: Compressed to %d bytes" % (level, len(compressed))

Output:

Level  0: Compressed to 5050 bytes  (Stored uncompressed; slightly larger than the 5000-byte original due to framing overhead)
Level  1: Compressed to 120 bytes
Level  6: Compressed to 104 bytes  (Default)
Level  9: Compressed to 99 bytes   (Best compression)
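To see the speed side of the trade-off, you can time each level yourself. A minimal sketch (absolute timings will vary by machine, so no expected numbers are shown):

```python
import time
import zlib

data = b"compressible payload " * 50000

for level in (1, 6, 9):
    start = time.time()
    out = zlib.compress(data, level)
    elapsed = time.time() - start
    print("level %d: %d bytes in %.4f s" % (level, len(out), elapsed))
```

Higher levels spend more CPU time searching for matches in exchange for a smaller output; level 9 should never produce a larger result than level 1 for the same input.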

Working with File-like Objects

For large files, you shouldn't load the entire file into memory. Instead, you can read, compress, and write it in chunks. The zlib module provides zlib.compressobj() and zlib.decompressobj() for this purpose.

This is the most memory-efficient way to handle large files.

Example: Compressing a Large File

import zlib
# Create a dummy large file for the example
with open('large_file.txt', 'w') as f:
    for i in range(100000):
        f.write("This is line %d of the large file.\n" % i)
print "Created large_file.txt"
# --- Compression using a file-like object ---
compressor = zlib.compressobj()
with open('large_file.txt', 'rb') as f_in, open('large_file.txt.z', 'wb') as f_out:
    while True:
        chunk = f_in.read(8192) # Read 8KB at a time
        if not chunk:
            break
        # Compress the chunk and write it to the output file
        f_out.write(compressor.compress(chunk))
    # Don't forget to flush the compressor to get any remaining data
    f_out.write(compressor.flush())
print "Compressed large_file.txt to large_file.txt.z"
# --- Decompression using a file-like object ---
decompressor = zlib.decompressobj()
with open('large_file.txt.z', 'rb') as f_in, open('large_file_decompressed.txt', 'wb') as f_out:
    while True:
        chunk = f_in.read(8192)
        if not chunk:
            break
        # Decompress the chunk and write it
        f_out.write(decompressor.decompress(chunk))
    # Flush the decompressor
    f_out.write(decompressor.flush())
print "Decompressed large_file.txt.z to large_file_decomressed.txt"

Useful Functions and Attributes

The zlib module also provides a few helpful utilities.

zlib.adler32(data) and zlib.crc32(data)

These functions compute checksums of data. They are not cryptographic hashes and offer no security; they are for detecting accidental corruption. You can compute a checksum of the original data and another of the decompressed data to verify that nothing was lost in the round trip.

import zlib
data = "A string for checksum testing."
data_bytes = data.encode('utf-8')
# Calculate checksums. In Python 2, adler32() and crc32() return a signed
# integer, so mask with 0xffffffff for a consistent unsigned value.
adler_orig = zlib.adler32(data_bytes) & 0xffffffff
crc_orig = zlib.crc32(data_bytes) & 0xffffffff
print "Original Adler-32: %x" % adler_orig
print "Original CRC-32:   %x" % crc_orig
# Compress and decompress
compressed = zlib.compress(data_bytes)
decompressed = zlib.decompress(compressed)
# Calculate checksums of the decompressed data (masked the same way)
adler_decomp = zlib.adler32(decompressed) & 0xffffffff
crc_decomp = zlib.crc32(decompressed) & 0xffffffff
print "\nDecompressed Adler-32: %x" % adler_decomp
print "Decompressed CRC-32:   %x" % crc_decomp
print "\nAdler checksums match:", adler_orig == adler_decomp
print "CRC checksums match:", crc_orig == crc_decomp

Output:

Original Adler-32: 5d9b38f3
Original CRC-32:   2e9f9f8a
Decompressed Adler-32: 5d9b38f3
Decompressed CRC-32:   2e9f9f8a
Adler checksums match: True
CRC checksums match: True
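Both functions also accept a running value as a second argument, so a checksum can be built up chunk by chunk — handy alongside the chunked file processing shown earlier. A minimal sketch:

```python
import zlib

chunks = [b"A string for ", b"checksum testing."]

# Feed the previous result back in as the starting value
crc = 0
for chunk in chunks:
    crc = zlib.crc32(chunk, crc)
# Mask for a consistent unsigned result across Python versions
crc &= 0xffffffff

one_shot = zlib.crc32(b"".join(chunks)) & 0xffffffff
print(crc == one_shot)  # True: incremental and one-shot results agree
```

The same pattern works for `zlib.adler32()`, so you can checksum a large file inside the same 8 KB read loop used for compression.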

Important Considerations for Python 2.7.8

String vs. Bytes Handling (Crucial for Python 2)

In Python 2, there are two types for sequences of characters: str and unicode.

  • str: A sequence of raw bytes.
  • unicode: A sequence of abstract Unicode characters.

The zlib library was written in C and operates on raw bytes. Therefore, you must pass a str object to zlib.compress() and zlib.decompress().

If you have a unicode string, you must first encode it into a str object using an encoding like 'utf-8'.

import zlib
# Correct way
unicode_string = u"This is a unicode string."
bytes_string = unicode_string.encode('utf-8') # Convert to bytes
compressed = zlib.compress(bytes_string)
# Decompress
decompressed_bytes = zlib.decompress(compressed)
unicode_string_restored = decompressed_bytes.decode('utf-8') # Convert back to unicode
print unicode_string == unicode_string_restored # True

Security Note

Like any decompressor, zlib can be abused with a "decompression bomb": a small, specially-crafted input that expands to an enormous amount of data, exhausting memory and potentially crashing your process.

This is generally not a concern when processing trusted data (e.g., files you created, data from your own database). However, if you are decompressing data received from an untrusted source (e.g., user uploads, network requests), you should be cautious: avoid one-shot zlib.decompress() on such input, and instead use zlib.decompressobj(), whose decompress() method accepts a max_length argument, so you can stop as soon as the output exceeds a sane limit.
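One concrete way to enforce such a limit is the max_length argument of a decompression object's decompress() method, available in Python 2.7. The helper below is a sketch (safe_decompress is a hypothetical name, and the 1 MB cap is an arbitrary choice, not a recommendation):

```python
import zlib

def safe_decompress(data, max_output=1024 * 1024):
    # Hypothetical helper: decompress at most max_output bytes.
    d = zlib.decompressobj()
    result = d.decompress(data, max_output)
    if d.unconsumed_tail:
        # The limit was hit before all input was consumed
        raise ValueError("decompressed output would exceed %d bytes" % max_output)
    return result

small = zlib.compress(b"harmless payload")
print(safe_decompress(small))

bomb = zlib.compress(b"\x00" * (10 * 1024 * 1024))  # tiny input, 10 MB output
try:
    safe_decompress(bomb)
except ValueError as exc:
    print(exc)
```

Checking unconsumed_tail is the documented way to detect that decompression stopped early because the output cap was reached.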

Summary

  • Compress a string: zlib.compress(data_bytes, level) — input must be str (bytes); output is a compressed str (bytes). In Python 2.7 the level is positional, not a keyword argument.
  • Decompress data: zlib.decompress(compressed_bytes) — input is a compressed str (bytes); output is the decompressed str (bytes).
  • Compress a large file (memory-safe): zlib.compressobj() — call compress() on each chunk in a loop, then flush() at the end.
  • Decompress a large file (memory-safe): zlib.decompressobj() — same chunked pattern with decompress() and a final flush().
  • Check data integrity: zlib.adler32(data_bytes) or zlib.crc32(data_bytes) — compute checksums before and after to detect corruption (mask with 0xffffffff in Python 2).
  • Control the speed/size trade-off: the level argument to zlib.compress() — 0 (fastest/largest) to 9 (slowest/smallest); the default is 6.