杰瑞科技汇

How Do You Convert Between Strings and Bytes in Python?

Converting between str and bytes is a fundamental concept in Python, especially when dealing with network programming, file I/O, or low-level data.


Let's break down the difference between str (string) and bytes in Python.

The Core Difference: Text vs. Bytes

  • str (String): Represents text. It's a sequence of Unicode characters. Unicode is a universal standard that assigns a unique number (a code point) to every character from every language (e.g., A, 你, 😂). A str object is abstract: it says nothing about how the characters are actually stored in memory or on disk.
  • bytes (Bytes): Represents raw binary data. It's a sequence of 8-bit values (integers from 0 to 255). Bytes are the actual "bytes" that are stored in memory, sent over a network, or written to a file. They have no inherent meaning; they just are.
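The difference shows up as soon as you index or measure the two types. The following is a minimal sketch; the sample text and variable names are chosen here purely for illustration.

```python
text = "A你😂"
data = text.encode("utf-8")

# Indexing a str yields a one-character str; indexing bytes yields an int.
first_char = text[0]
first_byte = data[0]

# len() counts characters for str and bytes for bytes.
# In UTF-8 these three characters take 1 + 3 + 4 bytes.
char_count = len(text)
byte_count = len(data)

# Code points (what str holds) versus byte values (what bytes holds).
code_points = [ord(ch) for ch in text]
byte_values = list(data)

print(first_char, first_byte)
print(char_count, byte_count)
print(code_points)
print(byte_values)
```

The byte values are always in the range 0 to 255, while code points can be much larger.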

Analogy: Think of str as a book written in a language you understand. The words and sentences have meaning. Think of bytes as the physical ink and paper the book is printed on. The ink itself is just a pattern; it only becomes meaningful when you know how to read it (i.e., which encoding to use).


Converting Between str and bytes

The process of converting between them is called encoding and decoding.

  • Encoding: Converting a str to bytes. You specify an encoding (like UTF-8, ASCII) to define how each Unicode character should be represented as a sequence of bytes.
  • Decoding: Converting bytes to a str. You must use the same encoding that was used to create the bytes to correctly interpret them back into text.
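To see why the encoding has to match on both sides, it can help to encode the same text with several codecs. This is a small sketch; the choice of UTF-16-LE and GBK as comparison encodings is an assumption made for illustration.

```python
text = "世界"

# The same two characters become three different byte sequences.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")
gbk_bytes = text.encode("gbk")

for name, raw in [("utf-8", utf8_bytes), ("utf-16-le", utf16_bytes), ("gbk", gbk_bytes)]:
    print(f"{name:10} {len(raw)} bytes: {raw!r}")

# Decoding with the encoding that produced the bytes recovers the text.
round_trips = [
    utf8_bytes.decode("utf-8"),
    utf16_bytes.decode("utf-16-le"),
    gbk_bytes.decode("gbk"),
]
print(round_trips)
```

Each byte sequence is only meaningful together with the name of the encoding that produced it.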

Encoding: str -> bytes

You use the .encode() method on a string.

# Our original string
my_string = "Hello, 世界! 👋"
# Encode the string into bytes using UTF-8 (the most common encoding)
my_bytes = my_string.encode('utf-8')
print(f"Original string: {my_string}")
print(f"Type of original: {type(my_string)}")
print("-" * 20)
print(f"Encoded bytes: {my_bytes}")
print(f"Type of encoded: {type(my_bytes)}")
# You can see the raw bytes. Note that non-ASCII characters take up more than one byte.
# 'H' -> b'H'
# ' ' -> b' '
# '世' -> b'\xe4\xb8\x96' (3 bytes)

Output:

Original string: Hello, 世界! 👋
Type of original: <class 'str'>
--------------------
Encoded bytes: b'Hello, \xe4\xb8\x96\xe7\x95\x8c! \xf0\x9f\x91\x8b'
Type of encoded: <class 'bytes'>

Decoding: bytes -> str

You use the .decode() method on a bytes object.

# We have the bytes from the previous example
my_bytes = b'Hello, \xe4\xb8\x96\xe7\x95\x8c! \xf0\x9f\x91\x8b'
# Decode the bytes back into a string using UTF-8
recovered_string = my_bytes.decode('utf-8')
print(f"Original bytes: {my_bytes}")
print(f"Type of original: {type(my_bytes)}")
print("-" * 20)
print(f"Decoded string: {recovered_string}")
print(f"Type of decoded: {type(recovered_string)}")

Output:

Original bytes: b'Hello, \xe4\xb8\x96\xe7\x95\x8c! \xf0\x9f\x91\x8b'
Type of original: <class 'bytes'>
--------------------
Decoded string: Hello, 世界! 👋
Type of decoded: <class 'str'>

What if you use the wrong encoding? This is a very common source of errors.

# Try to decode UTF-8 bytes using ASCII
# ASCII only covers characters from 0-127. The byte \xe4 is outside this range.
try:
    my_bytes.decode('ascii')
except UnicodeDecodeError as e:
    print(f"Error: {e}")

Output:

Error: 'ascii' codec can't decode byte 0xe4 in position 7: ordinal not in range(128)
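A wrong encoding does not always raise an exception. The sketch below shows the errors argument of .decode() and a case where the wrong codec fails silently; Latin-1 is used here as an illustrative "wrong" choice because it accepts every byte value.

```python
data = "Hello, 世界!".encode("utf-8")

# errors="replace" substitutes U+FFFD for each byte ASCII cannot decode.
replaced = data.decode("ascii", errors="replace")

# errors="ignore" silently drops those bytes.
ignored = data.decode("ascii", errors="ignore")

# Latin-1 maps every byte to some character, so it never raises.
# The result is garbled text (mojibake) rather than an error.
mojibake = data.decode("latin-1")

# Because Latin-1 is lossless per byte, the damage can be undone here.
recovered = mojibake.encode("latin-1").decode("utf-8")

print(replaced)
print(ignored)
print(mojibake)
print(recovered)
```

Both "replace" and "ignore" lose information, so they suit display or logging better than data you intend to store.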

Why is this important? (Common Use Cases)

You can't just mix str and bytes in operations. Python will raise a TypeError.

# This will FAIL!
# my_string + my_bytes  # TypeError: can only concatenate str (not "bytes") to str
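The fix is to bring both operands to the same type before combining them. A minimal sketch follows, with variable names chosen for illustration:

```python
my_string = "Hello, "
my_bytes = "世界".encode("utf-8")

# Mixing the two types raises TypeError.
try:
    my_string + my_bytes
except TypeError as exc:
    error_message = str(exc)
    print(f"TypeError: {error_message}")

# Option 1: decode the bytes and work with text.
as_text = my_string + my_bytes.decode("utf-8")

# Option 2: encode the string and work with bytes.
as_bytes = my_string.encode("utf-8") + my_bytes

# Equality comparison does not raise, but a str never equals a bytes object.
never_equal = ("abc" == b"abc")

print(as_text)
print(as_bytes)
print(never_equal)
```

The comparison case is worth remembering because it fails quietly instead of raising.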

Here’s where you need to be careful:

Reading from and Writing to Files

When you open a file, you must specify whether you're working with text ('r', 'w') or binary ('rb', 'wb').

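Text Mode (Default): Python automatically handles encoding/decoding for you. If you omit the encoding argument, Python falls back to a platform-dependent default (typically the locale's preferred encoding), which may differ between machines, so it's best to be explicit.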

# Writing a string to a file in text mode
with open('my_file.txt', 'w', encoding='utf-8') as f:
    f.write("Hello, 世界!")
# Reading from a file in text mode
with open('my_file.txt', 'r', encoding='utf-8') as f:
    content = f.read()  # content is a 'str'
    print(f"Read from file: {content} (type: {type(content)})")

Binary Mode: You work directly with bytes.

# Writing a string to a file in binary mode (you must encode it first)
my_string = "Hello, 世界!"
with open('my_file_bytes.bin', 'wb') as f:
    f.write(my_string.encode('utf-8'))
# Reading from a file in binary mode (you must decode it)
with open('my_file_bytes.bin', 'rb') as f:
    data = f.read()  # data is 'bytes'
    print(f"Read from file: {data} (type: {type(data)})")
    # Decode it to get the string back
    decoded_content = data.decode('utf-8')
    print(f"Decoded content: {decoded_content} (type: {type(decoded_content)})")
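Text mode and binary mode are two views of the same file contents. The sketch below writes a file in text mode and reads it back in binary mode; the temporary directory and the file name demo.txt are assumptions made so the example cleans up after itself.

```python
import locale
import os
import tempfile

text = "Hello, 世界!"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")

    # Text mode: Python encodes the str to UTF-8 while writing.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Binary mode: the same file comes back as the raw encoded bytes.
    with open(path, "rb") as f:
        raw = f.read()

    file_size = os.path.getsize(path)

# What open() may fall back to when no encoding is given (platform-dependent).
default_encoding = locale.getpreferredencoding(False)

print(f"{len(text)} characters, {file_size} bytes on disk")
print(raw)
print(f"Locale preferred encoding: {default_encoding}")
```

The file size matches the byte count, not the character count, which is a quick way to check which encoding a file was written with.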

Network Communication

When you send data over a network (e.g., via a socket), it must be sent as a stream of bytes. Any string you send must first be encoded.

# Receiving and replying over a TCP socket (server side, conceptual)
import socket
# The with blocks close both sockets even if an error occurs
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_socket:
    server_socket.bind(('127.0.0.1', 65432))
    server_socket.listen()
    # accept() blocks until a client connects
    conn, addr = server_socket.accept()
    with conn:
        # Receive data as bytes (at most 1024 bytes per call)
        data_from_client = conn.recv(1024)
        print(f"Received bytes: {data_from_client}")
        # Decode the bytes to get the string
        message_from_client = data_from_client.decode('utf-8')
        print(f"Decoded message: {message_from_client}")
        # Send a response (must encode the string first)
        response = "Message received!"
        conn.sendall(response.encode('utf-8'))
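The server code above blocks until a client connects, so it cannot be run on its own. As a self-contained sketch of the same encode/send/receive/decode cycle, the example below uses socket.socketpair(), which returns two sockets that are already connected to each other. It also shows codecs.getincrementaldecoder, one way to handle a multi-byte character that arrives split across two chunks; the messages and chunk boundary are made up for illustration.

```python
import codecs
import socket

# Two connected sockets in one process; no separate client program needed.
client, server = socket.socketpair()
with client, server:
    # Client side: encode before sending.
    client.sendall("你好, server!".encode("utf-8"))

    # Server side: receive bytes, then decode.
    received = server.recv(1024)
    message = received.decode("utf-8")

    # Reply, again as bytes.
    server.sendall(f"Got {len(message)} characters".encode("utf-8"))
    reply = client.recv(1024).decode("utf-8")

print(message)
print(reply)

# A stream socket may deliver a multi-byte character in pieces.
# An incremental decoder buffers the incomplete bytes until the rest arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
data = "世界".encode("utf-8")
pieces = [decoder.decode(data[:2]), decoder.decode(data[2:])]
print(pieces)
```

Calling .decode('utf-8') directly on the first two bytes alone would raise UnicodeDecodeError, which is why the incremental decoder is useful for streamed data.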

Summary Table

| Feature | str (String) | bytes (Bytes) |
| --- | --- | --- |
| Purpose | Represents text. | Represents raw binary data. |
| Content | Sequence of Unicode characters (e.g., 'A', '你', '😂'). | Sequence of integers from 0 to 255. |
| Literal Syntax | 'text' or "text" | b'bytes' or b"bytes" |
| Methods | .find(), .replace(), .split(), .format(), .encode() | .hex(), .fromhex(), .decode() |
| Common Use | Storing and manipulating text, user input, file I/O in text mode. | Network communication, reading/writing binary files (images, executables), cryptography. |
| Conversion | my_str.encode('utf-8') -> bytes | my_bytes.decode('utf-8') -> str |
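The bytes-specific methods in the table can be sketched briefly. The sample values below are chosen for illustration only.

```python
data = "世".encode("utf-8")

# .hex() renders each byte as two hexadecimal digits.
hex_text = data.hex()

# bytes.fromhex() is the inverse and is called on the type itself.
restored = bytes.fromhex(hex_text)

# bytes can also be built from a list of integers in the range 0-255.
from_ints = bytes([72, 105])

# Many str-style methods exist on bytes too, but they take bytes arguments.
parts = b"a,b,c".split(b",")

print(hex_text)
print(restored == data)
print(from_ints)
print(parts)
```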