杰瑞科技汇

How do you convert a Python string to bytes?

Converting a Python str (string) to bytes is a common task, especially when dealing with file I/O, network communication, or cryptography.


Here’s a complete guide covering the most common methods, best practices, and important details.

The Short and Simple Answer

The most direct way to convert a string to bytes is the .encode() method. Its encoding argument defaults to 'utf-8', which is the standard and recommended choice. Passing the encoding explicitly still makes your intent clear.

my_string = "Hello, World! 🌍"
# Convert the string to bytes using UTF-8 encoding
my_bytes = my_string.encode('utf-8')
print(f"Original String: {my_string}")
print(f"Type: {type(my_string)}")
print(f"Converted Bytes: {my_bytes}")
print(f"Type: {type(my_bytes)}")

Output:

Original String: Hello, World! 🌍
Type: <class 'str'>
Converted Bytes: b'Hello, World! \xf0\x9f\x8c\x8d'
Type: <class 'bytes'>
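UTF-8 is the default, so calling .encode() with no argument should give the same result. The minimal sketch below checks this and also shows the round trip back to a string with .decode().

```python
text = "Hello, World! 🌍"

# With no argument, str.encode() uses UTF-8 (the default in Python 3)
implicit = text.encode()
explicit = text.encode("utf-8")
print(implicit == explicit)  # True

# Decoding the bytes with the same encoding restores the original string
restored = explicit.decode("utf-8")
print(restored == text)  # True
```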

Key Concepts to Understand

  1. String vs. Bytes:

    • A str is a sequence of Unicode characters. It's an abstract representation of text. For example, the character 'A' is the Unicode code point U+0041.
    • A bytes object is a sequence of raw 8-bit bytes (integers from 0-255). It's a concrete, physical representation of data.
  2. Encoding:

    • Encoding is the process of translating a string into a sequence of bytes. To do this, you need a "code map" or an encoding (like UTF-8, ASCII, Latin-1) that defines how each character is represented.
    • Decoding is the reverse process: translating a sequence of bytes back into a string using a specific encoding.
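The str/bytes distinction shows up directly in length and indexing. In this small sketch, one emoji counts as a single character in a str but occupies four bytes once encoded as UTF-8. Indexing a bytes object returns an integer, not a character.

```python
text = "A🌍"
data = text.encode("utf-8")

# A str counts Unicode code points; bytes counts raw 8-bit values
print(len(text))  # 2
print(len(data))  # 5 ('A' is 1 byte, the emoji is 4 bytes in UTF-8)

# Indexing a str gives a 1-character str; indexing bytes gives an int (0 to 255)
print(text[0], hex(ord(text[0])))  # A 0x41
print(data[0])  # 65
print(list(data))  # [65, 240, 159, 140, 141]
```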

Analogy: Think of a string (str) as a message written in English. An encoding (such as utf-8) is like Morse code, a fixed rule for turning each letter into signals. The bytes object is the transmitted sequence of dots and dashes. To read the message correctly, you need to know which code (encoding) was used to send it.
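To make the analogy concrete, the sketch below encodes with UTF-8 but decodes with Latin-1. This mismatch produces garbled text (often called mojibake) rather than an error, which is why knowing the correct encoding matters.

```python
original = "Café"
data = original.encode("utf-8")  # b'Caf\xc3\xa9': 'é' becomes two bytes

# Decoding with the same encoding restores the text
print(data.decode("utf-8"))  # Café

# Decoding with a different encoding silently produces the wrong characters
print(data.decode("latin-1"))  # CafÃ©
```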


Detailed Methods and Examples

The str.encode() Method (Recommended)

This is the most common, "Pythonic," and readable way to perform the conversion.

my_string = "This is a test."
# Using UTF-8 (the default and highly recommended)
bytes_utf8 = my_string.encode('utf-8')
print(f"UTF-8: {bytes_utf8}")
# Using ASCII: succeeds here because every character is ASCII,
# but it would raise UnicodeEncodeError for characters like 'é' or '🌍'
try:
    bytes_ascii = my_string.encode('ascii')
    print(f"ASCII: {bytes_ascii}")
except UnicodeEncodeError as e:
    print(f"ASCII Error: {e}")
# Using Latin-1 (ISO-8859-1)
bytes_latin1 = my_string.encode('latin-1')
print(f"Latin-1: {bytes_latin1}")
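The example above uses a pure-ASCII string, so all three encodings produce the same bytes. A sketch with accented characters shows where they differ. For this 10-character string, UTF-8 should need 12 bytes, Latin-1 10, and UTF-16 22 (two of which are a byte-order mark). ASCII fails outright.

```python
text = "naïve café"

# The same text produces different byte sequences under different encodings
for encoding in ("utf-8", "latin-1", "utf-16"):
    encoded = text.encode(encoding)
    print(f"{encoding:8} {len(encoded):2} bytes  {encoded!r}")

# ASCII cannot represent 'ï' or 'é', so strict encoding raises an error
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print(f"ascii    failed: {exc.reason}")
```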

The bytes() Constructor

You can also use the built-in bytes() constructor. When its first argument is a str, you must also pass an encoding.

my_string = "Another test."
# Using the bytes() constructor
bytes_from_constructor = bytes(my_string, 'utf-8')
print(f"From constructor: {bytes_from_constructor}")
# This is functionally identical to my_string.encode('utf-8')
print(my_string.encode('utf-8') == bytes_from_constructor) # Output: True
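Because bytes() treats a str specially, leaving out the encoding is an error. The sketch below shows that case. It also shows two other standard ways to call bytes(): with a list of integers, and with a length to get zero-filled bytes.

```python
# Passing a str to bytes() without an encoding raises TypeError
error_message = None
try:
    bytes("Another test.")
except TypeError as exc:
    error_message = str(exc)
print(f"TypeError: {error_message}")

# Other accepted inputs: an iterable of ints in range(256), or a size (zero-filled)
from_ints = bytes([72, 105])
zeros = bytes(3)
print(from_ints)  # b'Hi'
print(zeros)      # b'\x00\x00\x00'
```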

The bytearray() Type

A bytearray is a mutable version of bytes. You can modify its contents in-place. It's created in the same way as the bytes() constructor.

my_string = "Mutable test."
# Create a mutable bytearray
my_bytearray = bytearray(my_string, 'utf-8')
print(f"Original bytearray: {my_bytearray}")
# You can modify it
my_bytearray[0] = ord('m')  # ord() returns the integer code point of a character (109 for 'm')
print(f"Modified bytearray: {my_bytearray}")  # bytearray(b'mutable test.')

Note: ord('m') is used because assigning to a single index requires an integer between 0 and 255, not a one-character string.
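A bytearray supports most list-style mutations. The sketch below assigns a bytes slice, appends an int, and extends with bytes. It then makes an immutable copy with bytes() once the data should stop changing.

```python
buffer = bytearray("Mutable test.", "utf-8")

# Slice assignment accepts any bytes-like object, not just single ints
buffer[0:7] = b"Changed"
# append() takes one int in range(256); extend() takes an iterable of ints or bytes
buffer.append(ord("!"))
buffer.extend(b" Done")
print(buffer)  # bytearray(b'Changed test.! Done')

# bytes() makes an immutable copy when the data should no longer change
frozen = bytes(buffer)
print(frozen.decode("utf-8"))  # Changed test.! Done
```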


Common Encodings

• 'utf-8' (the default): A variable-width encoding (1 to 4 bytes per character) that can represent every character in the Unicode standard. It is backward-compatible with ASCII. When to use: almost always. This is the standard for modern applications, the web, and most file formats.
• 'ascii': A 7-bit encoding with only 128 characters: English letters (A-Z, a-z), digits, basic punctuation, and control characters. When to use: only when you are certain the string contains nothing but ASCII characters and you need compatibility with very old systems.
• 'latin-1' (or 'iso-8859-1'): An 8-bit encoding that covers Western European languages. Each byte value maps directly to one character (U+0000 to U+00FF), so decoding with Latin-1 never fails. Encoding still raises UnicodeEncodeError for characters outside that range, such as '€' or '🌍'. When to use: legacy systems, or when you need a simple one-to-one mapping between bytes and characters. It is not a universal standard, so use it with care.
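Latin-1's one-byte-per-character mapping is easy to check. In this short sketch, all 256 byte values decode and re-encode unchanged, while a character above U+00FF (the euro sign) still cannot be encoded.

```python
all_bytes = bytes(range(256))

# Every byte value 0 to 255 maps to exactly one character (U+0000 to U+00FF)
text = all_bytes.decode("latin-1")
print(len(text))                            # 256
print(text.encode("latin-1") == all_bytes)  # True

# Characters outside that range, such as the euro sign (U+20AC), fail to encode
try:
    "€".encode("latin-1")
except UnicodeEncodeError as exc:
    print(f"UnicodeEncodeError: {exc.reason}")
```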

Handling Errors During Encoding

What happens if you try to encode a character that isn't supported by the chosen encoding (e.g., trying to encode 'é' using ASCII)?

Python gives you several ways to handle this. You choose the behavior with the errors argument, which is the second parameter of .encode() and is usually passed by keyword.

problem_string = "Café"
# --- 1. 'strict' (Default) ---
# Raises a UnicodeEncodeError
try:
    problem_string.encode('ascii')
except UnicodeEncodeError as e:
    print(f"'strict' error: {e}")
# --- 2. 'ignore' ---
# Silently drops characters that cannot be encoded
ignored = problem_string.encode('ascii', errors='ignore')
print(f"'ignore' result: {ignored}") # Output: b'Caf'
# --- 3. 'replace' ---
# Replaces each unencodable character with '?'
replaced = problem_string.encode('ascii', errors='replace')
print(f"'replace' result: {replaced}") # Output: b'Caf?'
# --- 4. 'xmlcharrefreplace' ---
# Replaces unencodable characters with their XML/HTML entity reference
xml_replaced = problem_string.encode('ascii', errors='xmlcharrefreplace')
print(f"'xmlcharrefreplace' result: {xml_replaced}") # Output: b'Caf&#233;'
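Because errors is the second parameter, it can also be passed positionally. The sketch below checks that. It also shows 'backslashreplace', another standard handler, which writes a Python-style escape instead of losing the character.

```python
problem_string = "Café"

# errors is the second parameter of str.encode(), so it may be given positionally
positional = problem_string.encode("ascii", "replace")
keyword = problem_string.encode("ascii", errors="replace")
print(positional == keyword)  # True

# 'backslashreplace' keeps the information as an escape sequence
escaped = problem_string.encode("ascii", errors="backslashreplace")
print(escaped)  # b'Caf\\xe9'
```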

The Reverse: Bytes to String (bytes.decode())

To go back from bytes to str, you use the .decode() method.

my_bytes = b'Hello, World!'
# Decode the bytes back to a string
my_string_decoded = my_bytes.decode('utf-8')
print(f"Original Bytes: {my_bytes}")
print(f"Decoded String: '{my_string_decoded}'")
print(f"Type: {type(my_string_decoded)}")
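Decoding can fail too, when the bytes are not valid in the chosen encoding. The sketch below uses a byte (0xFF) that never appears in valid UTF-8. When decoding, 'replace' substitutes U+FFFD, the Unicode replacement character, rather than the '?' used when encoding.

```python
bad_bytes = b"abc\xff"  # 0xFF never appears in valid UTF-8

# The default 'strict' handler raises UnicodeDecodeError
try:
    bad_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print(f"strict: {exc.reason}")

# 'replace' substitutes U+FFFD (the replacement character) when decoding
replaced = bad_bytes.decode("utf-8", errors="replace")
print(replaced)  # abc followed by U+FFFD

# 'ignore' drops the undecodable byte
ignored = bad_bytes.decode("utf-8", errors="ignore")
print(ignored)  # abc
```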

Summary Table

• Convert str to bytes: my_string.encode('utf-8'), e.g. "hello".encode('utf-8')
• Convert bytes to str: my_bytes.decode('utf-8'), e.g. b'hello'.decode('utf-8')
• Convert str to bytearray: bytearray(my_string, 'utf-8'), e.g. bytearray("test", 'utf-8')
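As a practical example of why the conversion matters, APIs that work on raw data accept bytes but not str. The standard library's hashlib is one such API. In this minimal sketch, "abc" is the standard SHA-256 test input.

```python
import hashlib

message = "abc"

# hashlib works on bytes, so passing a str raises TypeError
hash_error = None
try:
    hashlib.sha256(message)
except TypeError as exc:
    hash_error = str(exc)
print(f"TypeError: {hash_error}")

# Encoding first gives the hash of the UTF-8 bytes
digest = hashlib.sha256(message.encode("utf-8")).hexdigest()
print(digest)
```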