python str unicode-杰瑞科技汇

The difference between str and unicode is a fundamental and often confusing topic in Python, especially for developers who have worked with both Python 2 and Python 3. Let's break it down clearly.

The Core Idea: Bytes vs. Characters

At the heart of the str vs. unicode issue is the distinction between bytes and characters.

Bytes: A sequence of 8-bit values (0-255). This is how data is actually stored on your disk or transmitted over a network. It's just raw, numerical data.
Characters: An abstract concept, like the letter 'A', the Chinese character '中', or the euro symbol '€'. A character isn't a number; it's an idea.

The job of an encoding (like UTF-8, ASCII, Latin-1) is to translate between these two:

Encoding: Translating characters into bytes.
Decoding: Translating bytes into characters.
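
As a minimal sketch of both directions (written for Python 3; the variable names are illustrative and not part of the original text), the same character becomes different bytes under different encodings, and only the matching encoding decodes it back correctly:

```python
# Illustrative sketch (Python 3): one character, two encodings.
euro = "€"

utf8_bytes = euro.encode("utf-8")      # encoding: characters -> bytes
cp1252_bytes = euro.encode("cp1252")   # same character, different bytes

print(utf8_bytes)     # b'\xe2\x82\xac'
print(cp1252_bytes)   # b'\x80'

# Decoding with the encoding that produced the bytes recovers the character.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == euro)   # True

# Decoding with the wrong encoding does not fail here; it silently yields garbage.
wrong = utf8_bytes.decode("latin-1")
print(wrong == euro)        # False
```

The last two lines are the reason bytes must always travel with knowledge of their encoding: the bytes alone do not say how to read them.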

The Difference: Python 2 vs. Python 3

This is the most critical point. The meaning of str and unicode changed dramatically between these two versions.

Python 2 (The "Old" Way)

In Python 2, there were two distinct string types:

`str` (The "Byte String")

What it is: A sequence of bytes.
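Default Encoding: Whenever Python 2 had to convert a str implicitly (for example, when combining it with a unicode object), it assumed the bytes were ASCII.
Problem: You could create a str containing non-ASCII characters (like 你好), but Python had no record of which encoding produced those bytes. The implicit ASCII conversion then failed on them, which led to cryptic UnicodeDecodeError and UnicodeEncodeError exceptions.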

Example:

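# -*- coding: utf-8 -*-
# (Python 2 needs the declaration above to accept non-ASCII characters in source.)
# This is a byte string. Python 2 doesn't know its encoding.
my_str = "Hello, world! 你好"
# With a UTF-8 source file, this is actually a UTF-8 encoded byte string.
# But Python 2 just sees it as a sequence of bytes.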
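
Since Python 2 interpreters are hard to come by today, the following sketch reproduces the classic failure in Python 3 by doing explicitly what Python 2 did behind the scenes when a str was combined with unicode: decoding the bytes as ASCII. The names `raw` and `outcome` are illustrative.

```python
# Python 3 sketch of Python 2's implicit ASCII decoding.
# `raw` stands in for a Python 2 str that happens to hold UTF-8 bytes.
raw = "你好".encode("utf-8")

try:
    raw.decode("ascii")            # the conversion Python 2 attempted implicitly
    outcome = "decoded"
except UnicodeDecodeError as exc:
    outcome = "UnicodeDecodeError"
    print(exc.reason)              # ordinal not in range(128)

print(outcome)                     # UnicodeDecodeError
```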

`unicode` (The "Unicode String")

What it is: A sequence of abstract characters. It's an internal representation that is not tied to any specific encoding.
Purpose: To correctly handle text from all languages without ambiguity.
How to create: You create a unicode string by decoding a str (byte string) using a specific encoding, or by writing a literal with the u prefix, such as u"hello".

Example:

# -*- coding: utf-8 -*-
# my_str is a byte string (let's assume it's UTF-8 encoded)
my_str = "Hello, world! 你好"
# To get a proper unicode string, you must DECODE it
my_unicode = my_str.decode('utf-8')
print type(my_str)      # <type 'str'>
print type(my_unicode)  # <type 'unicode'>
# Now you can do things that require knowing the character, not the bytes
print len(my_unicode)   # 16 (it counts characters: 'H','e','l','l','o',...,'你','好')

The Golden Rule in Python 2: "Unicode sandwich".

The "bread" is your external interface (reading from a file, getting from a network request). This should be bytes (str).
The "filling" is all your internal processing. This should be unicode.
You decode bytes to unicode when you read them in, and encode unicode back to bytes when you write them out.

# Python 2 Golden Rule Example
# 1. Read bytes from a file (the top slice of bread)
with open('my_file.txt', 'r') as f:
    # f.read() returns a byte string ('str')
    data_from_file = f.read()
# 2. Decode to unicode for processing (the filling)
text_data = data_from_file.decode('utf-8')
# ... do all your text manipulation here with text_data (unicode) ...
# 3. Encode back to bytes to write or send (the bottom slice of bread)
data_to_write = text_data.encode('utf-8')
with open('another_file.txt', 'w') as f:
    f.write(data_to_write)

Python 3 (The "New" Way)

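Python 3 was designed to fix this confusion by making the text vs. bytes distinction explicit: str holds text, bytes holds binary data, and the two are never converted into each other implicitly. Source files are also read as UTF-8 by default.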

`str` (The "Text String")

What it is: A sequence of abstract characters. This is what Python 2 called unicode.
Default Encoding: The default encoding for your source code files is UTF-8. You can now write non-ASCII characters directly in your strings.
Purpose: This is the type you should use for all your text processing.

Example:

# This is a text string. It stores characters, not bytes.
# Python 3 knows this is a string of characters.
my_str = "Hello, world! 你好"
print(type(my_str))     # <class 'str'>
print(len(my_str))      # 16 (counts characters)
print(my_str[0])        # 'H'

`bytes` (The "Byte String")

What it is: A sequence of bytes. This is what Python 2 called str.
Purpose: Used for raw binary data (like images, network packets, or when you need to interface with a legacy system that only works with bytes).
How to create: You create a bytes object by encoding a str (text string).

Example:

# my_str is a text string ('str')
my_str = "Hello, world! 你好"
# To get a byte string, you must ENCODE it
my_bytes = my_str.encode('utf-8')
print(type(my_bytes))   # <class 'bytes'>
print(my_bytes)         # b'Hello, world! \xe4\xbd\xa0\xe5\xa5\xbd'
# The \xe4... are the UTF-8 byte representations for '你' and '好'
# You can also create a bytes literal with a 'b' prefix
my_bytes_literal = b"Hello, world!"
print(type(my_bytes_literal)) # <class 'bytes'>
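
To make the difference between the two types concrete, here is a small sketch (Python 3, illustrative variable names) comparing length and indexing on a str and on its UTF-8 encoded bytes:

```python
# Sketch: the same text measured as characters and as UTF-8 bytes.
text = "你好"
data = text.encode("utf-8")

print(len(text))   # 2  (characters)
print(len(data))   # 6  (each of these characters takes 3 bytes in UTF-8)

print(text[0])     # 你  (indexing a str gives a one-character str)
print(data[0])     # 228 (indexing bytes gives an int, here 0xe4)

# Slicing bytes on a character boundary gives something decodable.
first_char = data[:3].decode("utf-8")
print(first_char)  # 你
```

Slicing bytes in the middle of a multi-byte character would produce a fragment that cannot be decoded, which is one more reason to do text manipulation on str.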

The Golden Rule in Python 3: It's much simpler.

Use str for all text.
Use bytes for all binary data.
Encode str -> bytes when you need to send or store text.
Decode bytes -> str when you receive or read text.

# Python 3 Golden Rule Example
# 1. Read bytes from a file
with open('my_file.txt', 'rb') as f:  # Note the 'rb' (read bytes)
    data_from_file = f.read()  # data_from_file is 'bytes'
# 2. Decode to text (str) for processing
text_data = data_from_file.decode('utf-8')  # text_data is 'str'
# ... do all your text manipulation here with text_data ...
# 3. Encode back to bytes to write
data_to_write = text_data.encode('utf-8')  # data_to_write is 'bytes'
with open('another_file.txt', 'wb') as f:  # Note the 'wb' (write bytes)
    f.write(data_to_write)
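
In practice, Python 3 can do the decode and encode steps for you when a file is opened in text mode with an explicit `encoding` argument. The sketch below is one way to see this; the temporary directory and the file name `example.txt` are assumptions made so that the example runs anywhere.

```python
import os
import tempfile

# Hypothetical file in a throwaway directory, so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")

    # Text mode: we pass str, and open() encodes to UTF-8 on the way out.
    with open(path, "w", encoding="utf-8") as f:
        f.write("Hello, world! 你好")

    # Text mode: open() decodes on the way in and returns str.
    with open(path, "r", encoding="utf-8") as f:
        text_data = f.read()

    # Binary mode: the encoded bytes exactly as stored on disk.
    with open(path, "rb") as f:
        raw_data = f.read()

print(type(text_data).__name__)        # str
print(type(raw_data).__name__)         # bytes
print(len(text_data), len(raw_data))   # 16 20
```

Passing `encoding` explicitly avoids depending on the platform's default text encoding, which is not UTF-8 everywhere.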

Summary Table

| Feature | Python 2 `str` | Python 2 `unicode` | Python 3 `str` | Python 3 `bytes` |
|---|---|---|---|---|
| Type | Byte String | Text String | Text String | Byte String |
| Content | Raw bytes (0-255) | Abstract characters | Abstract characters | Raw bytes (0-255) |
| Encoding | Assumed ASCII by default | N/A (it's the decoded form) | N/A (it's the decoded form); source code defaults to UTF-8 | N/A (it's the encoded form) |
| Literal | `"hello"` | `u"hello"` | `"hello"` | `b"hello"` |
| How to get | From files, network | `my_str.decode('utf-8')` | Text-mode files, `my_bytes.decode('utf-8')` | Binary-mode files, network, `my_str.encode('utf-8')` |
| `len()` | Counts bytes | Counts characters | Counts characters | Counts bytes |

Key Takeaway

In Python 3, you should almost always work with str for text. Only use bytes when you are dealing with binary data or need to interface with a low-level system that requires it.
The unicode type does not exist in Python 3. It has been replaced by the improved str type.
The most common source of errors in Python 2 was mixing str and unicode without proper encoding/decoding. Python 3's design makes this much harder to do by accident.
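
A short sketch of that last point (Python 3, illustrative names): mixing str and bytes fails immediately with a TypeError instead of triggering a hidden conversion, and the fix is an explicit decode.

```python
# Sketch: Python 3 refuses to mix text and bytes implicitly.
text = "caf\u00e9"            # the str 'café'
data = text.encode("utf-8")   # the same text as UTF-8 bytes

try:
    combined = text + data    # str + bytes is not allowed
except TypeError:
    combined = None

print(combined)               # None

# The two types never compare equal either, even for "the same" content.
print(text == data)           # False

# Decoding explicitly makes the intent clear and the operation valid.
fixed = text + data.decode("utf-8")
print(fixed)                  # cafécafé
```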
