Of course. Here is a comprehensive guide to using the zlib module in Python 2.7.8, covering its core functionality, common use cases, and important considerations.

What is zlib?
zlib is a Python module that provides bindings to the widely-used zlib compression library. It allows you to compress and decompress data in memory using the DEFLATE compression algorithm, which is also used by formats like gzip and PNG.
The primary goal of zlib is to reduce the size of data, which is useful for:
- Saving disk space.
- Reducing network bandwidth usage.
- Speeding up data transfers.
Basic Compression and Decompression
This is the most fundamental operation. You take a string of data, compress it, and then decompress it back to its original form.
```python
import zlib

# The original data. In Python 2, a plain string literal is already a
# byte string (str), which is what zlib operates on.
original_data = "This is a test string that will be compressed. " * 10
print "Original data (length %d): %r" % (len(original_data), original_data[:50] + "...")

# --- Compression ---
# zlib works on byte strings. A plain Python 2 str already is one; the
# .encode() call matters when you start from a unicode object.
data_to_compress = original_data.encode('utf-8')

compressed_data = zlib.compress(data_to_compress)
print "\nCompressed data (length %d): %r" % (len(compressed_data), compressed_data[:50] + "...")

# --- Decompression ---
decompressed_data = zlib.decompress(compressed_data)

# Decode the resulting bytes back to text
original_data_restored = decompressed_data.decode('utf-8')

# --- Verification ---
print "\nDecompressed data (length %d): %r" % (len(original_data_restored), original_data_restored[:50] + "...")
print "\nDoes the original data match the decompressed data?", original_data == original_data_restored
```
Output:

```
Original data (length 470): 'This is a test string that will be compressed. Thi...'
Compressed data (length 62): 'x\x9c...' (binary bytes, abbreviated)
Decompressed data (length 470): u'This is a test string that will be compressed. Thi...'
Does the original data match the decompressed data? True
```
As you can see, the compressed data is significantly smaller than the original.
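If the compressed data is truncated or corrupt, zlib.decompress() raises zlib.error rather than returning partial data, so it is worth wrapping calls on data of doubtful provenance in a try/except. A minimal sketch (written so it also runs under Python 3):

```python
import zlib

payload = zlib.compress(b"some payload" * 20)

# Simulate corruption by truncating the stream
damaged = payload[:-5]

try:
    zlib.decompress(damaged)
    print("decompressed OK")
except zlib.error as exc:
    # Truncated or corrupt streams raise zlib.error
    print("decompression failed: %s" % exc)
```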
Checking Compression Performance
You can easily check the compression ratio to see how effective zlib was.
```python
import zlib

# A string with a lot of repetitive data compresses very well
binary_data = "ab" * 1000

compressed = zlib.compress(binary_data)
decompressed = zlib.decompress(compressed)

original_size = len(binary_data)
compressed_size = len(compressed)
ratio = (1.0 - (float(compressed_size) / original_size)) * 100

print "Original size: %d bytes" % original_size
print "Compressed size: %d bytes" % compressed_size
print "Space saved: %.2f%%" % ratio
print "Data integrity OK:", binary_data == decompressed
```
Output:

```
Original size: 2000 bytes
Compressed size: 24 bytes
Space saved: 98.80%
Data integrity OK: True
```
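Keep in mind that the ratio depends entirely on the input. Highly repetitive data shrinks dramatically, while random (or already-compressed) data does not shrink at all, and the small zlib header and trailer can even make it slightly larger. A quick comparison sketch (runs under Python 2.7 and 3):

```python
import os
import zlib

repetitive = b"ab" * 1000      # highly redundant: compresses extremely well
random_ish = os.urandom(2000)  # random bytes: effectively incompressible

for label, blob in [("repetitive", repetitive), ("random", random_ish)]:
    out = zlib.compress(blob)
    print("%s: %d -> %d bytes" % (label, len(blob), len(out)))
```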
Compression Levels
The zlib.compress() function accepts an optional level argument, which controls the trade-off between compression speed and the resulting compression ratio.
- level=0: No compression at all. Fastest, but the output is slightly larger than the input because of stream overhead.
- level=1: Fastest actual compression, at the cost of a larger output.
- level=6: The default level. A good balance of speed and compression.
- level=9: Best (slowest) compression. The smallest output, but takes the longest.
```python
import zlib

data_bytes = "This is a string for testing compression levels. " * 100

# Test different compression levels. Note that in Python 2.7,
# zlib.compress() takes the level as a positional argument only.
for level in [0, 1, 6, 9]:
    compressed = zlib.compress(data_bytes, level)
    print "Level %2d: Compressed to %d bytes" % (level, len(compressed))
```
Output:

```
Level  0: Compressed to 4911 bytes (no compression; the 4900-byte input plus stream overhead)
Level  1: Compressed to 120 bytes
Level  6: Compressed to 104 bytes (default)
Level  9: Compressed to 99 bytes (best compression)
```
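To see the speed side of the trade-off as well, you can time each level with the time module. The exact numbers depend entirely on your machine, so none are shown here; this is just a sketch (runs under Python 2.7 and 3):

```python
import time
import zlib

# ~600 KB of repetitive text, large enough for the timing to be measurable
data = b"compression level timing test " * 20000

for level in (1, 6, 9):
    start = time.time()
    out = zlib.compress(data, level)
    elapsed = time.time() - start
    print("level %d: %d bytes in %.4f s" % (level, len(out), elapsed))
```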
Working with File-like Objects
For large files, you shouldn't load the entire file into memory. Instead, you can read, compress, and write it in chunks. The zlib module provides zlib.compressobj() and zlib.decompressobj() for this purpose.
This is the most memory-efficient way to handle large files.
Example: Compressing a Large File
```python
import zlib

# Create a dummy large file for the example
with open('large_file.txt', 'w') as f:
    for i in xrange(100000):
        f.write("This is line %d of the large file.\n" % i)
print "Created large_file.txt"

# --- Compression using a compression object ---
compressor = zlib.compressobj()
with open('large_file.txt', 'rb') as f_in, open('large_file.txt.z', 'wb') as f_out:
    while True:
        chunk = f_in.read(8192)  # Read 8 KB at a time
        if not chunk:
            break
        # Compress the chunk and write it to the output file
        f_out.write(compressor.compress(chunk))
    # Don't forget to flush the compressor to get any buffered data
    f_out.write(compressor.flush())
print "Compressed large_file.txt to large_file.txt.z"

# --- Decompression using a decompression object ---
decompressor = zlib.decompressobj()
with open('large_file.txt.z', 'rb') as f_in, open('large_file_decompressed.txt', 'wb') as f_out:
    while True:
        chunk = f_in.read(8192)
        if not chunk:
            break
        # Decompress the chunk and write it
        f_out.write(decompressor.decompress(chunk))
    # Flush the decompressor
    f_out.write(decompressor.flush())
print "Decompressed large_file.txt.z to large_file_decompressed.txt"
```
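By default these objects read and write the zlib container format. The underlying zlib library also supports the gzip container: passing wbits = 16 + zlib.MAX_WBITS to compressobj()/decompressobj() selects gzip framing instead, which lets you produce or consume .gz-compatible streams. A sketch (standard zlib behaviour; runs under Python 2.7 and 3):

```python
import zlib

data = b"payload to wrap in a gzip container " * 10

# wbits = 16 + MAX_WBITS: write a gzip header/trailer instead of the
# default zlib wrapper
gzip_compressor = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
gz = gzip_compressor.compress(data) + gzip_compressor.flush()

# gzip streams start with the magic bytes 1f 8b
print(gz[:2] == b"\x1f\x8b")

# The matching decompressobj needs the same wbits value
gzip_decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
restored = gzip_decompressor.decompress(gz) + gzip_decompressor.flush()
print(restored == data)
```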
Useful Functions and Attributes
The zlib module also provides a few helpful utilities.
zlib.adler32(data) and zlib.crc32(data)
These functions compute checksums of data. They are not cryptographic hashes and offer no protection against deliberate tampering; they are for detecting accidental corruption. You can compute a checksum of the original data and then of the decompressed data to verify that nothing was corrupted along the way.
```python
import zlib

data_bytes = "A string for checksum testing."

# Calculate checksums of the original data. Mask with & 0xffffffff to
# get a portable unsigned value: in Python 2.7, crc32() and adler32()
# can return negative (signed) integers.
adler_orig = zlib.adler32(data_bytes) & 0xffffffff
crc_orig = zlib.crc32(data_bytes) & 0xffffffff
print "Original Adler-32: %08x" % adler_orig

# Compress and decompress
compressed = zlib.compress(data_bytes)
decompressed = zlib.decompress(compressed)

# Calculate checksums of the decompressed data
adler_decomp = zlib.adler32(decompressed) & 0xffffffff
crc_decomp = zlib.crc32(decompressed) & 0xffffffff
print "Decompressed Adler-32: %08x" % adler_decomp

print "\nAdler checksums match:", adler_orig == adler_decomp
print "CRC checksums match:", crc_orig == crc_decomp
```

Output:

```
Original Adler-32: a8f90b1f
Decompressed Adler-32: a8f90b1f

Adler checksums match: True
CRC checksums match: True
```
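Both functions also accept the running checksum as an optional second argument, so a large stream can be checksummed chunk by chunk without holding it all in memory. A small sketch (values masked with & 0xffffffff for a portable unsigned result; runs under Python 2.7 and 3):

```python
import zlib

chunks = [b"first chunk ", b"second chunk ", b"third chunk"]

# Feed each chunk into a running CRC-32; the second argument is the
# checksum computed so far (crc32 starts from 0 by default)
running = 0
for chunk in chunks:
    running = zlib.crc32(chunk, running)

# Checksumming everything at once gives the same value
whole = zlib.crc32(b"".join(chunks))

print((running & 0xffffffff) == (whole & 0xffffffff))
```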
Important Considerations for Python 2.7.8
String vs. Bytes Handling (Crucial for Python 2)
In Python 2, there are two types for sequences of characters: str and unicode.
- str: A sequence of raw bytes.
- unicode: A sequence of abstract Unicode characters.
The zlib library was written in C and operates on raw bytes. Therefore, you must pass a str object to zlib.compress() and zlib.decompress().
If you have a unicode string, you must first encode it into a str object using an encoding like 'utf-8'.
```python
import zlib

# Correct way
unicode_string = u"This is a unicode string."
bytes_string = unicode_string.encode('utf-8')  # Convert to bytes
compressed = zlib.compress(bytes_string)

# Decompress
decompressed_bytes = zlib.decompress(compressed)
unicode_string_restored = decompressed_bytes.decode('utf-8')  # Convert back to unicode

print unicode_string == unicode_string_restored  # True
```
Security Note
The main risk when decompressing untrusted data is a "decompression bomb": a tiny compressed payload crafted to expand into an enormous amount of data, exhausting memory or disk. zlib.decompress() will allocate however much memory the stream demands.
This is generally not a concern when processing trusted data (e.g., files you created, data from your own database). However, if you are decompressing data received from an untrusted source (e.g., user uploads, network requests), cap the amount of output you are willing to produce: use zlib.decompressobj(), pass a max_length argument to its decompress() method, and treat a non-empty unconsumed_tail as an error.
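One common mitigation pattern caps the output size using the max_length argument of a decompression object's decompress() method (a real parameter in Python 2.7's zlib); the helper name safe_decompress below is made up for illustration, and the code runs under Python 2.7 and 3:

```python
import zlib

LIMIT = 1024 * 1024  # refuse to produce more than 1 MiB of output


def safe_decompress(blob, limit=LIMIT):
    """Decompress blob, but bail out if the output would exceed limit bytes."""
    d = zlib.decompressobj()
    out = d.decompress(blob, limit)
    if d.unconsumed_tail:
        # Input remains unprocessed: the stream expands past our cap
        raise ValueError("decompressed data exceeds %d bytes" % limit)
    return out


# A tiny "bomb": a small payload that inflates to 10 MB of zero bytes
bomb = zlib.compress(b"\x00" * (10 * 1024 * 1024))

try:
    safe_decompress(bomb)
    print("accepted")
except ValueError as exc:
    print("rejected: %s" % exc)
```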
Summary
| Task | Function / Method | Key Point |
|---|---|---|
| Compress a string | zlib.compress(data_bytes, level) | Input must be str (bytes); output is a compressed str. |
| Decompress data | zlib.decompress(compressed_bytes) | Input is a compressed str; output is the decompressed str. |
| Compress a large file (memory safe) | zlib.compressobj() | Call compress() on chunks in a loop, then flush(). |
| Decompress a large file (memory safe) | zlib.decompressobj() | Call decompress() on chunks in a loop, then flush(). |
| Check data integrity | zlib.adler32(data_bytes) or zlib.crc32(data_bytes) | Compute checksums before and after to detect corruption. |
| Control compression trade-off | level argument of zlib.compress() | 0 (fastest/largest) to 9 (slowest/smallest); default is 6. |
