How Do You Convert Between Strings and Bytes in Python? - 杰瑞科技汇

Converting between strings and bytes is a fundamental and often confusing topic in Python. Let's break it down clearly.

The Core Difference: Text vs. Bytes

Think of it like this:

A String (str) is an abstract representation of text. It's a sequence of human-readable characters.
- Example: "Hello, 世界!"
- Internally, Python doesn't store 'H', 'e', 'l', etc. It stores a sequence of Unicode code points. The character 'H' is the code point U+0048, and the character '世' is U+4E16.
Bytes (bytes) is a concrete sequence of raw, 8-bit values (integers from 0 to 255). It's a low-level representation of data.
- Example: b'Hello, \xe4\xb8\x96\xe7\x95\x8c!'
- This is what you actually send over a network or write to a file, and it is what you get when you encode a string.

The key takeaway: A str is text, while bytes is the binary data that represents that text in a specific encoding.
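To make the distinction concrete, the following minimal sketch inspects the same text both ways: first as code points, then as the byte values UTF-8 produces for it. The variable names are illustrative only.

```python
text = "世界"

# A str is a sequence of Unicode code points.
code_points = [f"U+{ord(ch):04X}" for ch in text]
print(code_points)        # ['U+4E16', 'U+754C']

# Encoding turns those code points into concrete byte values (0-255).
utf8_values = list(text.encode("utf-8"))
print(utf8_values)        # [228, 184, 150, 231, 149, 140]

# Two characters, but six bytes in UTF-8.
print(len(text), len(text.encode("utf-8")))   # 2 6
```

The length mismatch is the practical reason the two types cannot be treated as interchangeable.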

Python 3: The Clear Separation (Recommended)

In Python 3, str and bytes are two distinct, incompatible types. This is a major improvement over Python 2 and forces you to be explicit about encoding and decoding.
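A short sketch of what "incompatible" means in practice, assuming a standard CPython 3 interpreter: equal-looking values never compare equal, and mixing the two types raises an error rather than guessing an encoding. The names `text`, `data`, and `combined` are illustrative.

```python
text = "abc"
data = b"abc"

# Equal-looking str and bytes never compare equal.
print(text == data)            # False

# Mixing them in an operation raises TypeError instead of guessing an encoding.
try:
    combined = text + data
except TypeError as exc:
    combined = None
    print(f"TypeError: {exc}")

# An explicit conversion makes the intent clear.
combined = text + data.decode("ascii")
print(combined)                # abcabc
```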

Encoding: Converting `str` to `bytes`

You use the .encode() method on a string to turn it into bytes. In Python 3 the encoding argument defaults to 'utf-8', but stating it explicitly (for example 'utf-8', 'ascii', or 'latin-1') makes the intent clear.

# Our text string
my_string = "Hello, 世界!"
# --- UTF-8 Encoding (Most Common) ---
# UTF-8 is a variable-width encoding that can represent every character in Unicode.
# It's the standard for the web and most modern applications.
utf8_bytes = my_string.encode('utf-8')
print(f"Original String: {my_string}")
print(f"Type: {type(my_string)}")
print(f"UTF-8 Bytes: {utf8_bytes}")
print(f"Type: {type(utf8_bytes)}")
print("-" * 20)
# --- ASCII Encoding (Limited) ---
# ASCII can only represent characters from 0-127. It will fail on characters outside this range.
try:
    ascii_bytes = my_string.encode('ascii')
except UnicodeEncodeError as e:
    print(f"Encoding to ASCII failed: {e}")
    print("Because '世' and '界' are not ASCII characters.")
print("-" * 20)
# --- Latin-1 Encoding (Handles more, but still not all) ---
# Latin-1 (ISO-8859-1) can only represent code points 0-255, so it cannot encode
# '世' or '界' either. Like ASCII, it raises UnicodeEncodeError.
try:
    latin1_bytes = my_string.encode('latin-1')
except UnicodeEncodeError as e:
    print(f"Encoding to Latin-1 failed: {e}")
# Characters inside its range, such as 'é' (U+00E9), become a single byte each.
print(f"Latin-1 Bytes: {'café'.encode('latin-1')}")

Output:

Original String: Hello, 世界!
Type: <class 'str'>
UTF-8 Bytes: b'Hello, \xe4\xb8\x96\xe7\x95\x8c!'
Type: <class 'bytes'>
--------------------
Encoding to ASCII failed: 'ascii' codec can't encode characters in position 7-8: ordinal not in range(128)
Because '世' and '界' are not ASCII characters.
--------------------
Encoding to Latin-1 failed: 'latin-1' codec can't encode characters in position 7-8: ordinal not in range(256)
Latin-1 Bytes: b'caf\xe9'
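When a strict encode fails, as ASCII does above, `.encode()` also accepts an `errors` argument that changes what happens to unencodable characters. The sketch below shows three of the standard handlers. Which one is appropriate depends on whether losing data is acceptable, so treat it as an illustration rather than a recommendation.

```python
text = "Hello, 世界!"

# 'replace' substitutes '?' for each character that cannot be encoded.
replaced = text.encode("ascii", errors="replace")
print(replaced)    # b'Hello, ??!'

# 'ignore' silently drops them (this loses data).
ignored = text.encode("ascii", errors="ignore")
print(ignored)     # b'Hello, !'

# 'backslashreplace' keeps the information as escape sequences.
escaped = text.encode("ascii", errors="backslashreplace")
print(escaped)     # b'Hello, \\u4e16\\u754c!'
```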

Decoding: Converting `bytes` to `str`

You use the .decode() method on a bytes object to turn it back into a string. The encoding argument again defaults to 'utf-8', but it must match the encoding that was used to create the bytes, so it is best to state it explicitly.

# Let's use the UTF-8 bytes from the previous example
utf8_bytes = b'Hello, \xe4\xb8\x96\xe7\x95\x8c!'
# Decode the bytes back into a string
decoded_string = utf8_bytes.decode('utf-8')
print(f"Original Bytes: {utf8_bytes}")
print(f"Decoded String: {decoded_string}")
print(f"Type: {type(decoded_string)}")
print("-" * 20)
# --- What if you use the wrong encoding? ---
# If you try to decode bytes with the wrong encoding, you get garbage or an error.
# Let's try to decode the UTF-8 bytes using ASCII.
try:
    wrong_decoded = utf8_bytes.decode('ascii')
except UnicodeDecodeError as e:
    print(f"Decoding with ASCII failed: {e}")
    print("Because the byte \\xe4 is not a valid ASCII character.")

Output:

Original Bytes: b'Hello, \xe4\xb8\x96\xe7\x95\x8c!'
Decoded String: Hello, 世界!
Type: <class 'str'>
--------------------
Decoding with ASCII failed: 'ascii' codec can't decode byte 0xe4 in position 7: ordinal not in range(128)
Because the byte \xe4 is not a valid ASCII character.
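The "garbage" case deserves its own illustration, because it fails silently. In this sketch the UTF-8 bytes are decoded as Latin-1, which accepts every byte value and therefore never raises an error. The last part shows the `errors='replace'` handler as one way to decode without an exception, at the cost of losing the original characters.

```python
original = "Hello, 世界!"
utf8_bytes = original.encode("utf-8")

# Latin-1 maps every byte value to a character, so decoding never fails,
# but the result is mojibake rather than the original text.
garbled = utf8_bytes.decode("latin-1")
print(repr(garbled))
print(garbled == original)                 # False

# The mojibake is reversible only because Latin-1 is a one-to-one byte mapping.
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered == original)               # True

# errors='replace' swaps each undecodable byte for U+FFFD instead of raising.
lossy = utf8_bytes.decode("ascii", errors="replace")
print(lossy.count("\ufffd"))               # 6
```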

Python 2: The "Messy" Way (For Legacy Code)

In Python 2, there was no clear separation. str was a sequence of bytes, and unicode was for text. This led to many bugs.

str: A sequence of bytes. Its encoding was ambiguous.
unicode: A sequence of Unicode characters (like Python 3's str).

Key Python 2 Concepts:

Creating a Unicode string: Use the u prefix.
```
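# -*- coding: utf-8 -*-   (Python 2 needs this declaration at the top of a file that contains non-ASCII literals)
my_unicode_string = u"Hello, 世界!"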
```

Encoding a Unicode string to a byte string (str): Use .encode().

my_byte_string = my_unicode_string.encode('utf-8')
# my_byte_string is now a 'str' type, but it's properly encoded UTF-8 bytes.

Decoding a byte string (str) to a Unicode string: Use .decode().
```
back_to_unicode = my_byte_string.decode('utf-8')
```

The danger in Python 2 was that if you forgot to encode or decode, the interpreter would try to convert implicitly using its default encoding (ascii unless it had been changed), leading to a UnicodeDecodeError or UnicodeEncodeError as soon as non-ASCII data appeared, or to silent data corruption.

Practical Examples

Example 1: Reading from a File

When you read from a file in binary mode ('rb'), you get bytes. You must decode it to get a str.

# Let's create a dummy file
with open("my_data.txt", "w", encoding='utf-8') as f:
    f.write("This is some text with a newline.\n")
# Now, read it back in binary mode
with open("my_data.txt", "rb") as f:
    # read() returns bytes
    file_content_bytes = f.read()
print(f"Content from file (bytes): {file_content_bytes}")
print(f"Type: {type(file_content_bytes)}")
# You MUST decode it to work with it as a string
file_content_str = file_content_bytes.decode('utf-8')
print(f"Content from file (str): {repr(file_content_str)}")
print(f"Type: {type(file_content_str)}")
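If you do not need the raw bytes, opening the file in text mode with an explicit `encoding` lets Python do the encoding and decoding for you. The sketch below contrasts the two modes on the same file. The temporary path is a hypothetical scratch location created only for this example.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "my_data.txt")

# Text mode with an explicit encoding: write() accepts str and encodes it.
with open(path, "w", encoding="utf-8") as f:
    f.write("Hello, 世界!\n")

# Text mode again: read() returns str, already decoded.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Binary mode shows the bytes that were actually stored on disk.
with open(path, "rb") as f:
    raw = f.read()

print(type(text).__name__, repr(text))
print(type(raw).__name__, raw)
```

Passing `encoding` explicitly matters because the text-mode default depends on the platform and locale.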

Example 2: Networking (HTTP Request)

When you send data over a network, it must be in bytes.

import urllib.request
# The URL we want to request
url = "https://www.example.com"
# In Python 3, urllib.request.urlopen() returns a response object (http.client.HTTPResponse)
with urllib.request.urlopen(url) as response:
    # response.read() returns bytes
    html_bytes = response.read()
    # The charset is usually declared in the Content-Type header; fall back to UTF-8 if it is absent
    charset = response.headers.get_content_charset() or 'utf-8'
# To inspect or process the HTML as a string, decode it with that charset
html_string = html_bytes.decode(charset)
print(f"First 100 characters of HTML (str):\n{html_string[:100]}")
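The rule that network data must be bytes can be seen at the socket level without contacting any server. This sketch uses `socket.socketpair()` to create two connected sockets inside one process. The message text and variable names are assumptions made for the illustration.

```python
import socket

# socketpair() gives two connected sockets in one process, so no server is needed.
sender, receiver = socket.socketpair()
try:
    message = "Hello, 世界!"

    # Sockets only accept bytes-like objects; passing a str raises TypeError.
    try:
        sender.sendall(message)
    except TypeError as exc:
        print(f"TypeError: {exc}")

    # Encode before sending...
    sender.sendall(message.encode("utf-8"))

    # ...and decode after receiving.
    received_bytes = receiver.recv(1024)
    received_text = received_bytes.decode("utf-8")
finally:
    sender.close()
    receiver.close()

print(received_bytes)
print(received_text)
```

A single `recv()` is enough for this tiny local message. Real network code should collect all the bytes of a message before decoding, since a read can end in the middle of a multi-byte character.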

Summary Table

| Feature | Python 3 `str` (Text) | Python 3 `bytes` (Binary) | Python 2 `str` (Bytes) | Python 2 `unicode` (Text) |
| --- | --- | --- | --- | --- |
| Purpose | Abstract text representation | Raw 8-bit data | Raw 8-bit data | Abstract text representation |
| Literal | `"hello"` | `b'hello'` | `"hello"` | `u"hello"` |
| Methods | `.encode()` | `.decode()` | `.decode()` | `.encode()` |
| Use Case | Storing text, in-memory processing | Network I/O, File I/O, Cryptography | Same as Python 3 `bytes` | Same as Python 3 `str` |
| Key Rule | Always encode to bytes before sending. | Always decode to string after receiving. | Ambiguous encoding led to bugs. | Safer, but required explicit conversion. |

Final Advice: Always use Python 3. Be explicit about your encoding (UTF-8 is the best default). Remember the mantra: encode str to bytes for I/O, and decode bytes to str for processing.
