Let's dive deep into struct.pack in Python.
What is struct.pack?
In short, struct.pack is a Python function that converts Python values (such as integers, floats, and byte strings) into a packed bytes object, laid out according to a format string.
This is useful when you need to serialize data to a specific binary format, such as:
- Creating binary files (e.g., custom file formats, images).
- Network communication (sending data over a socket, where it must be sent as a byte stream).
- Interfacing with C code (C uses fixed-size data types, which struct can mimic).
- Working with binary protocols (e.g., serial port communication).
The "format string" is the most important part. It tells struct.pack how to lay out your Python values as bytes.
The Core Concept: Format Strings
The format string is a sequence of characters that defines the layout of the bytes. It uses special characters (called format characters) to specify the data type and size.
Here are the most common format characters:
| Character | C Type | Python Type | Size (bytes) | Notes |
|---|---|---|---|---|
| x | pad byte | - | 1 | (No value) Adds a padding byte. |
| c | char | bytes | 1 | A single byte. The value must be a bytes object of length 1; a str is not accepted. |
| b | signed char | int | 1 | Integer in range -128 to 127. |
| B | unsigned char | int | 1 | Integer in range 0 to 255. |
| h | short | int | 2 | |
| H | unsigned short | int | 2 | |
| i | int | int | 4 | |
| I | unsigned int | int | 4 | |
| q | long long | int | 8 | |
| Q | unsigned long long | int | 8 | |
| f | float | float | 4 | Single-precision floating point number. |
| d | double | float | 8 | Double-precision floating point number. |
| s | char[] | bytes | - | A sequence of bytes. A number before it gives the length in bytes (e.g., 10s); without a number the length is 1. |
| p | char[] | bytes | - | A Pascal string (with a length byte prefix). The number gives the total field size, including the length byte. |

The sizes above are the standard sizes. They apply, with no alignment padding, when the format string starts with a byte-order prefix:

| Prefix | Byte order |
|---|---|
| < | Little-endian (most common on modern PCs). |
| > | Big-endian (common in network protocols). |
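The sizes in the table can be checked with struct.calcsize. The following is a minimal sketch rather than a definitive reference; the variable names are illustrative only.

```python
import struct

# Standard sizes: the '<' prefix turns off native sizes and alignment.
for code in "bBhHiIqQfd":
    print(code, struct.calcsize("<" + code))

# A number before a format character repeats it: '3h' is the same as 'hhh'.
three_shorts = struct.pack("<3h", 1, 2, 3)

# For 's', the number is a byte length, not a repeat count.
size_of_10s = struct.calcsize("<10s")

# 'x' consumes no value and emits a single zero byte.
with_padding = struct.pack("<Bx", 1)
```

Note that 's' is the one place where the leading number changes meaning: '10s' is a single 10-byte value, while '10B' is ten separate one-byte values.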
Endianness
The < and > prefixes are crucial. They determine the byte order.
- Endianness is the order in which bytes are stored in memory.
- Little-endian (<): The least significant byte is stored first. This is the standard for x86 and x64 processors (Intel, AMD). Most of the time, you'll use this.
- Big-endian (>): The most significant byte is stored first. This is often used in network protocols (like TCP/IP) and older architectures.
- Native (@ or no prefix): Uses the native byte order of the machine running the code, along with its native sizes and alignment. This is generally not recommended for data that needs to be portable.
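To make the difference concrete, here is a small sketch that packs the same value with each byte order. The value 0x12345678 is chosen only because its bytes are easy to tell apart.

```python
import struct
import sys

value = 0x12345678

little = struct.pack("<I", value)   # least significant byte first
big = struct.pack(">I", value)      # most significant byte first
network = struct.pack("!I", value)  # '!' means network order, same as '>'

print(little.hex())   # 78563412
print(big.hex())      # 12345678
print(sys.byteorder)  # 'little' or 'big', depending on the machine

# Native mode ('@', the default) may also insert alignment padding,
# so its size can differ from the standard-size layout.
standard_size = struct.calcsize("<bi")  # 1 + 4 = 5
native_size = struct.calcsize("@bi")    # typically 8 on common platforms
print(standard_size, native_size)
```

The size difference in the last two lines is the main reason an explicit prefix is the safer choice for files and network data.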
Basic Examples
Let's see struct.pack in action with different data types.
Packing Integers
import struct
# Pack an integer (4 bytes) using little-endian format
# struct.pack(format_string, value1, value2, ...)
packed_data = struct.pack('<i', 42)
print(f"Packed data: {packed_data}")
print(f"Type: {type(packed_data)}")
print(f"Length in bytes: {len(packed_data)}")
# Output:
# Packed data: b'*\x00\x00\x00'
# Type: <class 'bytes'>
# Length in bytes: 4
# Packing a larger integer that needs 8 bytes (long long)
packed_long = struct.pack('<q', 123456789012345)
print(f"\nPacked long: {packed_long}")
print(f"Length in bytes: {len(packed_long)}")
# Output:
# Packed long: b'y\xdf\r\x86Hp\x00\x00'
# Length in bytes: 8
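Each integer format only accepts values that fit in its size, and struct.pack raises struct.error otherwise. A minimal sketch of that behavior follows; try_pack is a hypothetical helper written for this example, not part of the struct module.

```python
import struct

def try_pack(fmt, value):
    """Return the packed bytes, or None if the value does not fit."""
    try:
        return struct.pack(fmt, value)
    except struct.error as exc:
        print(f"{fmt!r} cannot hold {value}: {exc}")
        return None

ok = try_pack("<B", 255)        # fits in an unsigned char
too_big = try_pack("<B", 256)   # out of range for one byte
negative = try_pack("<I", -1)   # unsigned formats reject negative values
```

Choosing the smallest format that covers the full range of your data keeps files compact, while a failed range check points to a format that is too narrow.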
Packing a Float
import struct
# Pack a float (4 bytes) using little-endian format
packed_float = struct.pack('<f', 3.14159)
print(f"Packed float: {packed_float}")
print(f"Length in bytes: {len(packed_float)}")
# Output:
# Packed float: b'\xd0\x0fI@'
# Length in bytes: 4
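Python floats are double precision, so packing with f rounds the value to the nearest single-precision number. The sketch below compares a round trip through f with one through d; the variable names are illustrative.

```python
import struct

value = 3.14159

# Round trip through 4-byte single precision: some digits are lost.
as_single = struct.unpack("<f", struct.pack("<f", value))[0]

# Round trip through 8-byte double precision: Python floats are doubles,
# so the value comes back unchanged.
as_double = struct.unpack("<d", struct.pack("<d", value))[0]

print(as_single == value)             # False
print(as_double == value)             # True
print(abs(as_single - value) < 1e-6)  # True: close, but not identical
```

If the exact value matters after reading it back, d is the safer choice at the cost of four extra bytes per number.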
Packing a String
When packing a string, specify its length in bytes with a number before the s (e.g., 10s). Without a number, s means a single byte.
import struct
# Pack a 10-byte string field. The value is padded with null bytes (\x00)
# if it is shorter than 10 bytes, and truncated if it is longer.
my_string = "hello"
packed_string = struct.pack('<10s', my_string.encode('utf-8'))
print(f"Packed string: {packed_string}")
print(f"Length in bytes: {len(packed_string)}")
# Output:
# Packed string: b'hello\x00\x00\x00\x00\x00'
# Length in bytes: 10
Note: struct works with bytes, not strings. You must encode your string first (e.g., using .encode('utf-8')).
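Because the s format pads short values and silently truncates long ones, it can help to wrap fixed-width text fields in small helpers. This is one possible sketch; FIELD_WIDTH, pack_name, and unpack_name are hypothetical names introduced for illustration.

```python
import struct

FIELD_WIDTH = 8  # hypothetical field size, chosen for illustration

def pack_name(text):
    """Encode text and store it in a fixed 8-byte field."""
    raw = text.encode("utf-8")
    if len(raw) > FIELD_WIDTH:
        # struct.pack would truncate silently, so fail loudly instead.
        raise ValueError(f"{text!r} needs more than {FIELD_WIDTH} bytes")
    return struct.pack(f"<{FIELD_WIDTH}s", raw)

def unpack_name(data):
    """Read the field back and drop the null padding."""
    (raw,) = struct.unpack(f"<{FIELD_WIDTH}s", data)
    return raw.rstrip(b"\x00").decode("utf-8")

packed = pack_name("hello")
print(packed)               # b'hello\x00\x00\x00'
print(unpack_name(packed))  # hello

# Without the length check, extra bytes are silently dropped:
truncated = struct.pack("<3s", b"hello")
print(truncated)            # b'hel'
```

The length check runs on the encoded bytes rather than the str, since a non-ASCII character can take more than one byte in UTF-8.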
Packing Multiple Values
You can pack multiple values in one go by listing them after the format string. The order must match.
import struct
# Pack an integer, a float, and a 5-byte string
# Format: <i (int) + f (float) + 5s (string)
name = "bob"
data_to_pack = (42, 3.14, name.encode('utf-8'))
packed_multiple = struct.pack('<if5s', *data_to_pack)
print(f"Packed multiple values: {packed_multiple}")
print(f"Length in bytes: {len(packed_multiple)}")
# Expected length: 4 (int) + 4 (float) + 5 (string) = 13 bytes
# Output:
# Packed multiple values: b'*\x00\x00\x00\xc3\xf5H@bob\x00\x00'
# Length in bytes: 13
Using the * operator (*data_to_pack) is a clean way to unpack the tuple into arguments for the function.
The Reverse: struct.unpack
If you pack data, you'll eventually need to unpack it. The struct.unpack function does the exact opposite.
It takes a format string and a bytes object and returns a tuple of unpacked values.
import struct
# Let's use the data we packed in the previous example
packed_data = b'*\x00\x00\x00\xc3\xf5H@bob\x00\x00'
# Unpack it using the SAME format string
unpacked_tuple = struct.unpack('<if5s', packed_data)
print(f"Unpacked tuple: {unpacked_tuple}")
# Output:
# Unpacked tuple: (42, 3.140000104904175, b'bob\x00\x00')
# The float is not exactly 3.14 because 'f' stores only single precision,
# and the 5-byte string field still carries its null padding.
# You can access the individual values
original_int = unpacked_tuple[0]
original_float = unpacked_tuple[1]
original_string_bytes = unpacked_tuple[2]
print(f"Original int: {original_int}")
print(f"Original float: {original_float}")
print(f"Original string (bytes): {original_string_bytes}")
# Strip the null padding, then decode the bytes back to a Python str
original_string = original_string_bytes.rstrip(b'\x00').decode('utf-8')
print(f"Original string (decoded): {original_string}")
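struct.unpack expects a buffer whose length is exactly struct.calcsize(format). When the record sits inside a larger buffer, struct.unpack_from can read it at an offset instead. The sketch below assumes a made-up buffer with two prefix bytes and some trailing bytes around the record.

```python
import struct

record_format = "<if5s"
record = struct.pack(record_format, 42, 3.14, b"bob")

# unpack requires exactly calcsize(format) bytes, so extra bytes fail.
try:
    struct.unpack(record_format, record + b"extra")
    size_mismatch_detected = False
except struct.error as exc:
    size_mismatch_detected = True
    print(f"unpack failed: {exc}")

# unpack_from starts at an offset and ignores anything after the record.
buffer = b"\xff\xff" + record + b"extra"  # hypothetical surrounding bytes
number, ratio, name = struct.unpack_from(record_format, buffer, offset=2)

print(number)                # 42
print(name.rstrip(b"\x00"))  # b'bob'
```

This avoids slicing the buffer by hand, which is convenient when parsing several differently shaped records out of one bytes object.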
Practical Example: Creating a Simple Binary File
Let's create a file that stores a header and some data in a custom binary format.
File Format:
- A 4-byte integer for the "magic number" (to identify our file type). Let's use 2025.
- A 4-byte integer for the number of data points.
- For each data point, a 4-byte float.
import struct
# --- Data to write ---
magic_number = 2025
num_points = 3
data_points = [1.1, 2.2, 3.3]
# --- Writing to the file ---
filename = "my_data.bin"
# Open the file in binary write mode ('wb')
with open(filename, 'wb') as f:
    # 1. Pack and write the header
    header_format = '<ii' # Two 4-byte integers (magic, num_points)
    header = struct.pack(header_format, magic_number, num_points)
    f.write(header)
    # 2. Pack and write each data point
    data_format = '<f' # One 4-byte float
    for point in data_points:
        packed_point = struct.pack(data_format, point)
        f.write(packed_point)
print(f"Successfully created binary file: {filename}")
# --- Reading from the file (verification) ---
with open(filename, 'rb') as f:
    # 1. Unpack the header
    header_format = '<ii'
    header_size = struct.calcsize(header_format) # Get size of header in bytes
    header_data = f.read(header_size)
    magic, count = struct.unpack(header_format, header_data)
    print(f"\n--- Reading from file ---")
    print(f"Magic Number: {magic}")
    print(f"Number of Points: {count}")
    # 2. Unpack the data points
    data_format = '<f'
    data_size = struct.calcsize(data_format)
    read_points = []
    for _ in range(count):
        point_data = f.read(data_size)
        if len(point_data) < data_size:
            break # End of file, or a truncated final value
        point = struct.unpack(data_format, point_data)[0]
        read_points.append(point)
print(f"Read Data Points: {read_points}")
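The loop above packs one float at a time. An alternative, sketched here entirely in memory, is to build a single format string with a repeat count so the header and all points are packed in one call. The names file_format and blob are illustrative.

```python
import struct

data_points = [1.1, 2.2, 3.3]

# Header ('<ii') followed by len(data_points) floats, in one format string.
file_format = f"<ii{len(data_points)}f"
blob = struct.pack(file_format, 2025, len(data_points), *data_points)

print(len(blob))  # 4 + 4 + 3 * 4 = 20

# Reading: parse the header first, then build the format for the rest.
header_size = struct.calcsize("<ii")
magic, count = struct.unpack_from("<ii", blob)
points = struct.unpack_from(f"<{count}f", blob, header_size)

print(magic, count)                   # 2025 3
print([round(p, 1) for p in points])  # [1.1, 2.2, 3.3]
```

The rounding in the last line is only for display: the stored values are single precision, so they differ slightly from the original Python floats.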
Key Functions to Remember
- struct.pack(format, ...): Converts values to bytes.
- struct.unpack(format, data): Converts bytes back to a tuple of values.
- struct.calcsize(format): Returns the size (in bytes) of the packed data described by the format string. Very useful for reading files.
- struct.iter_unpack(format, data): (Python 3.4+) Returns an iterator that unpacks equally sized records from a bytes object one at a time, useful for reading repeated structures.
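As a brief illustration of struct.iter_unpack, the sketch below packs a few records back to back and reads them in one pass. The (id, temperature) records are made-up sample data.

```python
import struct

# Three hypothetical (id, temperature) records; each '<Hf' record is 6 bytes.
record_format = "<Hf"
readings = [(1, 20.5), (2, 21.0), (3, 19.25)]
blob = b"".join(struct.pack(record_format, *r) for r in readings)

# iter_unpack yields one tuple per record; len(blob) must be an
# exact multiple of struct.calcsize(record_format).
decoded = list(struct.iter_unpack(record_format, blob))
print(decoded)  # [(1, 20.5), (2, 21.0), (3, 19.25)]
```

The sample temperatures were picked because they are exactly representable in single precision, so they compare equal after the round trip; arbitrary values generally would not.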
