杰瑞科技汇

How do I use zlib in Python 2.7.8?

Here is a comprehensive guide to using the zlib module in Python 2.7.8, covering its core functionality, common use cases, and important considerations.


What is zlib?

zlib is a Python module that provides bindings to the widely-used zlib compression library. It allows you to compress and decompress data in memory using the DEFLATE compression algorithm, which is also used by formats like gzip and PNG.
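To make the relationship concrete, the sketch below compresses the same bytes as a raw zlib stream and as a gzip stream. The `16 + zlib.MAX_WBITS` value passed to `compressobj()` is the conventional way to request gzip framing from the underlying C library; treat the exact header bytes shown as illustrative:

```python
import zlib

data = b"the same DEFLATE algorithm underlies zlib, gzip and PNG"

# Default zlib.compress() produces a zlib-wrapped DEFLATE stream;
# its first byte is 0x78 ('x'), the common zlib header.
zlib_stream = zlib.compress(data)
print(zlib_stream[:1])

# Requesting wbits = 16 + MAX_WBITS yields a gzip-wrapped stream
# instead, recognisable by the gzip magic bytes \x1f\x8b.
comp = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
gzip_stream = comp.compress(data) + comp.flush()
print(gzip_stream[:2])
```

Both streams carry the same DEFLATE payload; only the framing (header and checksum) differs, which is why tools like gzip interoperate so easily with zlib.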

The primary goal of zlib is to reduce the size of data, which is useful for:

  • Saving disk space.
  • Reducing network bandwidth usage.
  • Speeding up data transfers.

Basic Compression and Decompression

This is the most fundamental operation. You take a string of data, compress it, and then decompress it back to its original form.

import zlib
# The original data (a string)
original_data = "This is a test string that will be compressed. " * 10
print "Original data (length %d): %r" % (len(original_data), original_data[:50] + "...")
# --- Compression ---
# 1. zlib operates on bytes. In Python 2 a plain str is already a byte
#    string, so it can be passed to zlib directly; a unicode string must
#    first be encoded, e.g. with .encode('utf-8').
data_to_compress = original_data.encode('utf-8')
# 2. Compress the data
compressed_data = zlib.compress(data_to_compress)
print "\nCompressed data (length %d): %r" % (len(compressed_data), compressed_data[:50] + "...")
# --- Decompression ---
# 1. Decompress the data
decompressed_data = zlib.decompress(compressed_data)
# 2. Convert the result back to a string
original_data_restored = decompressed_data.decode('utf-8')
# --- Verification ---
print "\nDecompressed data (length %d): %r" % (len(original_data_restored), original_data_restored[:50] + "...")
print "\nDoes the original data match the decompressed data?", original_data == original_data_restored

Output:

Original data (length 310): 'This is a test string that will be compressed. Thi...'
Compressed data (length 62): 'x\x9c\xcbH\xcd\xc9\xc9\x07\x00\x06,\x02\x15\x83\x8c\xf...'
Decompressed data (length 310): 'This is a test string that will be compressed. Thi...'
Does the original data match the decompressed data? True

As you can see, the compressed data is significantly smaller than the original.
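The flip side is worth knowing: the savings come from redundancy in the input. High-entropy data (random bytes, or data that is already compressed) has little redundancy, so zlib's framing overhead can make the output slightly larger than the input. A quick sketch:

```python
import os
import zlib

random_bytes = os.urandom(1000)          # high-entropy, incompressible input
compressed = zlib.compress(random_bytes)
print(len(compressed))                   # typically a little over 1000 bytes

repetitive = b"ab" * 500                 # same length, highly redundant
print(len(zlib.compress(repetitive)))    # far smaller
```

This is why compressing already-compressed formats (JPEG, ZIP, MP3) with zlib rarely pays off.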


Checking Compression Performance

You can easily check the compression ratio to see how effective zlib was.

import zlib
# A string with a lot of repetitive data compresses very well
text_data = "abababababababababababababababababababab" * 100
binary_data = text_data.encode('utf-8')
compressed = zlib.compress(binary_data)
decompressed = zlib.decompress(compressed)
original_size = len(binary_data)
compressed_size = len(compressed)
ratio = (1.0 - (float(compressed_size) / original_size)) * 100
print "Original size:  %d bytes" % original_size
print "Compressed size: %d bytes" % compressed_size
print "Space saved:     %.2f%%" % ratio
print "Data integrity OK:", binary_data == decompressed

Output:

Original size:  2000 bytes
Compressed size: 24 bytes
Space saved:     98.80%
Data integrity OK: True

Compression Levels

The zlib.compress() function accepts an optional level argument, which controls the trade-off between compression speed and the resulting compression ratio.

  • level=0: No compression at all. Fastest.
  • level=1: Fastest compression. The compressed data will be larger.
  • level=6: The default level. A good balance of speed and compression.
  • level=9: Best/slowest compression. The smallest compressed data size, but takes the longest.
import zlib
data = "This is a string for testing compression levels. " * 100
data_bytes = data.encode('utf-8')
# Test different compression levels
for level in [0, 1, 6, 9]:
    # In Python 2.7, zlib.compress() does not accept keyword arguments,
    # so the level must be passed positionally.
    compressed = zlib.compress(data_bytes, level)
    print "Level %2d: Compressed to %d bytes" % (level, len(compressed))

Output:

Level  0: Compressed to 5050 bytes  (Stored uncompressed; slightly larger than the 5000-byte original due to framing overhead)
Level  1: Compressed to 120 bytes
Level  6: Compressed to 104 bytes  (Default)
Level  9: Compressed to 99 bytes   (Best compression)
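To see the speed side of the trade-off, you can time each level yourself. A minimal sketch (absolute timings will vary by machine, so no expected numbers are shown):

```python
import time
import zlib

data = b"compressible payload " * 50000

for level in (1, 6, 9):
    start = time.time()
    out = zlib.compress(data, level)
    elapsed = time.time() - start
    print("level %d: %d bytes in %.4f s" % (level, len(out), elapsed))
```

Higher levels spend more CPU time searching for matches in exchange for a smaller output; level 9 should never produce a larger result than level 1 for the same input.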

Working with File-like Objects

For large files, you shouldn't load the entire file into memory. Instead, you can read, compress, and write it in chunks. The zlib module provides zlib.compressobj() and zlib.decompressobj() for this purpose.

This is the most memory-efficient way to handle large files.

Example: Compressing a Large File

import zlib
# Create a dummy large file for the example
with open('large_file.txt', 'w') as f:
    for i in range(100000):
        f.write("This is line %d of the large file.\n" % i)
print "Created large_file.txt"
# --- Compression using a file-like object ---
compressor = zlib.compressobj()
with open('large_file.txt', 'rb') as f_in, open('large_file.txt.z', 'wb') as f_out:
    while True:
        chunk = f_in.read(8192) # Read 8KB at a time
        if not chunk:
            break
        # Compress the chunk and write it to the output file
        f_out.write(compressor.compress(chunk))
    # Don't forget to flush the compressor to get any remaining data
    f_out.write(compressor.flush())
print "Compressed large_file.txt to large_file.txt.z"
# --- Decompression using a file-like object ---
decompressor = zlib.decompressobj()
with open('large_file.txt.z', 'rb') as f_in, open('large_file_decompressed.txt', 'wb') as f_out:
    while True:
        chunk = f_in.read(8192)
        if not chunk:
            break
        # Decompress the chunk and write it
        f_out.write(decompressor.decompress(chunk))
    # Flush the decompressor
    f_out.write(decompressor.flush())
print "Decompressed large_file.txt.z to large_file_decomressed.txt"

Useful Functions and Attributes

The zlib module also provides a few helpful utilities.

zlib.adler32(data) and zlib.crc32(data)

These functions compute checksums of data. They are not cryptographic hashes and offer no security; they are for detecting accidental corruption. You can compute a checksum of the original data and another of the decompressed data to verify that nothing was lost in the round trip.

import zlib
data = "A string for checksum testing."
data_bytes = data.encode('utf-8')
# Calculate checksums. In Python 2, adler32() and crc32() return a signed
# integer, so mask with 0xffffffff for a consistent unsigned value.
adler_orig = zlib.adler32(data_bytes) & 0xffffffff
crc_orig = zlib.crc32(data_bytes) & 0xffffffff
print "Original Adler-32: %x" % adler_orig
print "Original CRC-32:   %x" % crc_orig
# Compress and decompress
compressed = zlib.compress(data_bytes)
decompressed = zlib.decompress(compressed)
# Calculate checksums of the decompressed data (masked the same way)
adler_decomp = zlib.adler32(decompressed) & 0xffffffff
crc_decomp = zlib.crc32(decompressed) & 0xffffffff
print "\nDecompressed Adler-32: %x" % adler_decomp
print "Decompressed CRC-32:   %x" % crc_decomp
print "\nAdler checksums match:", adler_orig == adler_decomp
print "CRC checksums match:", crc_orig == crc_decomp

Output:

Original Adler-32: 5d9b38f3
Original CRC-32:   2e9f9f8a
Decompressed Adler-32: 5d9b38f3
Decompressed CRC-32:   2e9f9f8a
Adler checksums match: True
CRC checksums match: True
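Both functions also accept a running value as a second argument, so a checksum can be built up chunk by chunk — handy alongside the chunked file processing shown earlier. A minimal sketch:

```python
import zlib

chunks = [b"A string for ", b"checksum testing."]

# Feed the previous result back in as the starting value
crc = 0
for chunk in chunks:
    crc = zlib.crc32(chunk, crc)
# Mask for a consistent unsigned result across Python versions
crc &= 0xffffffff

one_shot = zlib.crc32(b"".join(chunks)) & 0xffffffff
print(crc == one_shot)  # True: incremental and one-shot results agree
```

The same pattern works for `zlib.adler32()`, so you can checksum a large file inside the same 8 KB read loop used for compression.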

Important Considerations for Python 2.7.8

String vs. Bytes Handling (Crucial for Python 2)

In Python 2, there are two types for sequences of characters: str and unicode.

  • str: A sequence of raw bytes.
  • unicode: A sequence of abstract Unicode characters.

The zlib library was written in C and operates on raw bytes. Therefore, you must pass a str object to zlib.compress() and zlib.decompress().

If you have a unicode string, you must first encode it into a str object using an encoding like 'utf-8'.

import zlib
# Correct way
unicode_string = u"This is a unicode string."
bytes_string = unicode_string.encode('utf-8') # Convert to bytes
compressed = zlib.compress(bytes_string)
# Decompress
decompressed_bytes = zlib.decompress(compressed)
unicode_string_restored = decompressed_bytes.decode('utf-8') # Convert back to unicode
print unicode_string == unicode_string_restored # True

Security Note

Like any decompressor, zlib can be abused with a "decompression bomb": a small, specially-crafted input that expands to an enormous amount of data, exhausting memory and potentially crashing your process.

This is generally not a concern when processing trusted data (e.g., files you created, data from your own database). However, if you are decompressing data received from an untrusted source (e.g., user uploads, network requests), you should be cautious: avoid one-shot zlib.decompress() on such input, and instead use zlib.decompressobj(), whose decompress() method accepts a max_length argument, so you can stop as soon as the output exceeds a sane limit.
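One concrete way to enforce such a limit is the max_length argument of a decompression object's decompress() method, available in Python 2.7. The helper below is a sketch (safe_decompress is a hypothetical name, and the 1 MB cap is an arbitrary choice, not a recommendation):

```python
import zlib

def safe_decompress(data, max_output=1024 * 1024):
    # Hypothetical helper: decompress at most max_output bytes.
    d = zlib.decompressobj()
    result = d.decompress(data, max_output)
    if d.unconsumed_tail:
        # The limit was hit before all input was consumed
        raise ValueError("decompressed output would exceed %d bytes" % max_output)
    return result

small = zlib.compress(b"harmless payload")
print(safe_decompress(small))

bomb = zlib.compress(b"\x00" * (10 * 1024 * 1024))  # tiny input, 10 MB output
try:
    safe_decompress(bomb)
except ValueError as exc:
    print(exc)
```

Checking unconsumed_tail is the documented way to detect that decompression stopped early because the output cap was reached.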

Summary

  • Compress a string: zlib.compress(data_bytes, level) — input must be str (bytes); output is a compressed str (bytes). In Python 2.7 the level is positional, not a keyword argument.
  • Decompress data: zlib.decompress(compressed_bytes) — input is a compressed str (bytes); output is the decompressed str (bytes).
  • Compress a large file (memory-safe): zlib.compressobj() — call compress() on each chunk in a loop, then flush() at the end.
  • Decompress a large file (memory-safe): zlib.decompressobj() — same chunked pattern with decompress() and a final flush().
  • Check data integrity: zlib.adler32(data_bytes) or zlib.crc32(data_bytes) — compute checksums before and after to detect corruption (mask with 0xffffffff in Python 2).
  • Control the speed/size trade-off: the level argument to zlib.compress() — 0 (fastest/largest) to 9 (slowest/smallest); the default is 6.