杰瑞科技汇

How to Convert str to bytes in Python 3

Converting a Python str to bytes is a common task, especially in network programming, file I/O, and cryptography. This guide covers the main ways to do it, why the conversion is needed, and the details that matter.

The Short and Sweet Answer

The most common and Pythonic way to convert a string to bytes is by using the .encode() method.

my_string = "Hello, World!"
# Encode the string into bytes using UTF-8 (the default)
my_bytes = my_string.encode()
print(my_string)
# Output: Hello, World!
print(my_bytes)
# Output: b'Hello, World!'
print(type(my_string))
# Output: <class 'str'>
print(type(my_bytes))
# Output: <class 'bytes'>

The Detailed Explanation

The .encode() Method (Recommended)

This is the standard, object-oriented way to perform the conversion. Every string object in Python has an encode() method that returns a bytes representation of the string.

Syntax: string.encode(encoding='utf-8', errors='strict')

  • encoding: This is the most important parameter. It specifies the character encoding scheme to use. UTF-8 is the modern standard and the default, but others exist (like 'ascii', 'latin-1', 'utf-16').
  • errors: This parameter tells Python how to handle characters that cannot be encoded in the specified scheme.
    • 'strict' (default): Raises a UnicodeEncodeError if it encounters an unencodable character.
    • 'ignore': Silently drops characters that cannot be encoded.
    • 'replace': Replaces each unencodable character with a placeholder (a question mark, ?, when encoding).
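
The error handlers above can be compared side by side. This is a minimal sketch: the sample string is made up for illustration, and the last two handlers ('xmlcharrefreplace' and 'backslashreplace') are additional standard handlers that the list above does not cover.

```python
# Illustrative sample ("naïve café"), written with escapes so the source stays ASCII
text = "na\u00efve caf\u00e9"

# 'ignore' drops the characters that ASCII cannot represent
ignored = text.encode("ascii", errors="ignore")
print(ignored)      # b'nave caf'

# 'replace' substitutes a question mark for each unencodable character
replaced = text.encode("ascii", errors="replace")
print(replaced)     # b'na?ve caf?'

# 'xmlcharrefreplace' writes an XML numeric character reference instead
xml_refs = text.encode("ascii", errors="xmlcharrefreplace")
print(xml_refs)     # b'na&#239;ve caf&#233;'

# 'backslashreplace' writes a Python-style escape sequence instead
backslashed = text.encode("ascii", errors="backslashreplace")
print(backslashed)  # b'na\\xefve caf\\xe9'
```

Both 'ignore' and 'replace' lose information, so the original string cannot be recovered from their output.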

Examples with Different Encodings:

text = "café"
# Using UTF-8 (also the default when no encoding is given)
bytes_utf8 = text.encode('utf-8')
print(f"UTF-8: {bytes_utf8}")
# Output: UTF-8: b'caf\xc3\xa9'  # The 'é' is represented by two bytes
# Using Latin-1 encoding (which can represent 'é' in a single byte)
bytes_latin1 = text.encode('latin-1')
print(f"Latin-1: {bytes_latin1}")
# Output: Latin-1: b'caf\xe9'
# Using ASCII encoding (which cannot represent 'é')
try:
    text.encode('ascii')
except UnicodeEncodeError as e:
    print(f"ASCII Error: {e}")
# Output: ASCII Error: 'ascii' codec can't encode character '\xe9' in position 3: ordinal not in range(128)
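
To make the size difference between encodings concrete, the sketch below counts the bytes each encoding produces for the same string and decodes them back. The variable names are illustrative. It also uses .decode(), the inverse of .encode(), to show what happens when the wrong encoding is used on the way back.

```python
text = "caf\u00e9"  # "café", written with an escape so the source stays ASCII

sizes = {}
for encoding in ("utf-8", "latin-1", "utf-16"):
    encoded = text.encode(encoding)
    sizes[encoding] = len(encoded)
    # Decoding with the same encoding recovers the original string
    assert encoded.decode(encoding) == text

print(sizes)  # {'utf-8': 5, 'latin-1': 4, 'utf-16': 10}

# Decoding UTF-8 bytes as Latin-1 does not fail, but it yields the wrong text
mojibake = text.encode("utf-8").decode("latin-1")
print(mojibake == text)  # False
```

The 'utf-16' result is 10 bytes because the codec prepends a 2-byte byte order mark before the four 2-byte characters.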

The bytes() Constructor

You can also use the built-in bytes() constructor. It's more verbose but can be useful in specific situations, like creating a bytes object from an iterable of integers.

Syntax: bytes(string, encoding) (the encoding argument is required when the source is a str)

Example:

my_string = "Python"
# Using the bytes constructor
my_bytes = bytes(my_string, 'utf-8')
print(my_bytes)
# Output: b'Python'

While this works, my_string.encode('utf-8') is generally preferred for its clarity and conciseness.
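
The "specific situations" mentioned above can be sketched as follows. The byte values are chosen for illustration only; the point is that bytes() behaves differently depending on the type of its argument.

```python
# A str argument without an encoding is rejected
try:
    bytes("Python")
    missing_encoding_failed = False
except TypeError as exc:
    missing_encoding_failed = True
    print(f"TypeError: {exc}")

# An iterable of integers in range(256) becomes those byte values
from_ints = bytes([80, 121, 116, 104, 111, 110])
print(from_ints)  # b'Python'

# A single integer creates that many zero bytes
zeros = bytes(3)
print(zeros)      # b'\x00\x00\x00'
```

Because an integer argument means "length", bytes(3) is not the byte with value 3; use bytes([3]) for that.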

The bytearray() Type (For Mutable Bytes)

If you need a mutable sequence of bytes (i.e., you can change its contents after creation), you should use bytearray().

Example:

my_string = "mutable"
my_bytearray = bytearray(my_string, 'utf-8')
print(my_bytearray)
# Output: bytearray(b'mutable')
# You can modify it in-place
my_bytearray[0] = ord('M') # You must assign an integer (ordinal value of the character)
print(my_bytearray)
# Output: bytearray(b'Mutable')

Remember, bytes is immutable (like a tuple), while bytearray is mutable (like a list).
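
A short sketch of that difference, with illustrative variable names: item assignment fails on bytes, while a bytearray copy can be edited and then converted back to an immutable bytes object.

```python
data = "mutable".encode("utf-8")

# bytes does not support item assignment
try:
    data[0] = ord("M")
    assignment_failed = False
except TypeError as exc:
    assignment_failed = True
    print(f"TypeError: {exc}")

# Copy into a bytearray, edit in place, then freeze back to bytes
buffer = bytearray(data)
buffer[0] = ord("M")
buffer.extend(b"!")
frozen = bytes(buffer)
print(frozen)  # b'Mutable!'
```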


Why Do We Need to Convert str to bytes?

In Python 3, there is a clear and important distinction between text (str) and data (bytes).

  • str: A sequence of Unicode characters. It's an abstract representation of text. You use it for all your string manipulations, printing, and user-facing text.
  • bytes: A sequence of raw 8-bit values (integers from 0 to 255). It's a concrete, low-level representation of data.
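
The distinction shows up directly in how the two types behave. The following is a minimal sketch using the same "café" string as earlier; the variable names are illustrative.

```python
text = "caf\u00e9"           # "café" as a str: 4 Unicode characters
data = text.encode("utf-8")  # the same text as bytes: 5 byte values

print(len(text))  # 4
print(len(data))  # 5

# Indexing a str yields a one-character str; indexing bytes yields an int
print(text[0])    # c
print(data[0])    # 99

# The two types cannot be concatenated without an explicit conversion
try:
    text + data
    mixing_failed = False
except TypeError as exc:
    mixing_failed = True
    print(f"TypeError: {exc}")
```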

You need to convert str to bytes whenever you are interacting with systems that don't understand abstract Unicode text, such as:

  1. Network Sockets: Sending data over a network requires it to be in a binary format.

    message = "Hello server"
    # Send encoded bytes over the network
    # socket.send(message.encode('utf-8'))
  2. File I/O: When you open a file in binary mode ('wb' or 'rb'), you read and write raw bytes, not text.

    text = "This will be written to a file."
    with open('my_file.bin', 'wb') as f:
        f.write(text.encode('utf-8'))
  3. Cryptography: Hashing algorithms (like SHA-256) and encryption/decryption operations work on bytes, not strings.

    import hashlib
    data_to_hash = "my secret message"
    # You must encode the string before hashing
    hash_object = hashlib.sha256(data_to_hash.encode('utf-8'))
    hex_dig = hash_object.hexdigest()
    print(hex_dig)
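
The network case in item 1 leaves the socket call commented out. A runnable version might look like the sketch below, which uses socket.socketpair() as a stand-in for a real client/server connection (an assumption made so the example needs no server).

```python
import socket

# socketpair() returns two already-connected sockets, so no server is needed
sender, receiver = socket.socketpair()
try:
    message = "Hello server"
    sender.sendall(message.encode("utf-8"))  # sockets accept bytes, not str
    received = receiver.recv(1024)           # what arrives is bytes as well
    print(received)                  # b'Hello server'
    print(received.decode("utf-8"))  # Hello server
finally:
    sender.close()
    receiver.close()
```

The receiving side has to decode with the same encoding the sender used, so both ends of a protocol need to agree on it.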

Common Pitfall: Forgetting to Encode

A very common error for Python 3 newcomers is trying to use a str where a bytes object is expected.

# Example: Writing to a binary file
text = "some data"
# WRONG - This will raise a TypeError
try:
    with open('error_file.bin', 'wb') as f:
        f.write(text) # f.write() expects bytes, got str
except TypeError as e:
    print(f"Error: {e}")
    # Output: Error: a bytes-like object is required, not 'str'
# CORRECT - You must encode the string first
with open('correct_file.bin', 'wb') as f:
    f.write(text.encode('utf-8'))
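
If the file is meant to hold text, an alternative to encoding by hand is to open it in text mode with an explicit encoding and let Python do the conversion. The sketch below writes to a temporary directory; the file name is hypothetical.

```python
import os
import tempfile

text = "some data with an accent: caf\u00e9"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "text_mode.txt")  # hypothetical file name

    # Text mode: pass a str, and Python encodes it with the given encoding
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading the same file in binary mode shows the encoded bytes
    with open(path, "rb") as f:
        raw = f.read()

print(raw == text.encode("utf-8"))  # True
```

Passing encoding explicitly avoids depending on the platform's default text encoding.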

Summary Table

| Method | Syntax | Use Case | Mutability |
| --- | --- | --- | --- |
| .encode() | my_str.encode('utf-8') | Recommended. The standard, most Pythonic way to convert a string to bytes. | N/A (creates a new immutable bytes object) |
| bytes() | bytes(my_str, 'utf-8') | An alternative to .encode(). Verbose but functional. Useful for creating bytes from other iterables. | N/A (creates a new immutable bytes object) |
| bytearray() | bytearray(my_str, 'utf-8') | When you need a mutable sequence of bytes that can be changed in-place. | Mutable |