Garbled text with Python requests - 杰瑞科技汇

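Root cause

The root cause of garbled text is an encoding mismatch.

The server sends its data in one encoding (for example, GBK).
Your Python code reads the data using a different encoding (usually ISO-8859-1, which requests falls back to).
When the two encodings do not match, decoding naturally produces unreadable garbage.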
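The mismatch is easy to reproduce offline with plain bytes. The sketch below is only an illustration: the sample string is made up, and no network request is involved.

```python
# A made-up sample string; any Chinese text behaves the same way.
original = "中文内容"

# What a GBK server would put on the wire.
raw_bytes = original.encode("gbk")

# Decoding with the wrong codec does not raise, but yields garbled text.
garbled = raw_bytes.decode("iso-8859-1")

# Decoding with the codec the server actually used restores the text.
restored = raw_bytes.decode("gbk")

print(repr(garbled))
print(restored)
```

The bytes themselves are never damaged here; only the choice of codec at decode time decides whether the result is readable.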

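Solutions

The sections below start with the simplest, most recommended solution and then cover how to handle the other cases.

Use `response.text` and set the encoding manually (most recommended)

This is the most direct and most common approach. The requests Response object has an encoding attribute that you can set directly.

Steps:

Check the server's encoding: the server tells the client which encoding it used through the Content-Type field of the HTTP response headers, for example Content-Type: text/html; charset=gbk. Print the response headers first to see it.
Set the encoding manually: before reading response.text, assign the correct encoding to response.encoding.

Sample code: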

import requests
# Placeholder URL: assume this site returns GBK-encoded content
url = 'http://example.com/some-gbk-encoded-page'
try:
    response = requests.get(url)
    # 1. Print the response headers and check Content-Type
    print("Original response headers:")
    print(response.headers)
    print("-" * 30)
    # 2. If you see charset=gbk or another encoding, set it manually
    # Even if response.headers has no charset, you can still set it when you know the page is GBK
    response.encoding = 'gbk'
    # 3. response.text is now decoded as GBK, so it is no longer garbled
    html_content = response.text
    # Print the first 500 characters to check that they look right
    print(html_content[:500])
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

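Why is this the best practice? Because response.text handles the decoding for you and returns a string you can use directly, whereas response.content is the raw byte stream that you have to decode yourself. Prefer response.text with a correctly set encoding.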
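The header check from the steps above can also be done in code rather than by eye. The following is a minimal sketch that uses only the standard library so it runs without a network connection; the helper name `charset_from_content_type` is hypothetical, and with requests you would presumably pass it `response.headers.get("Content-Type")`.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the declared charset in lowercase, or None if none is declared."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() parses the header parameters for us.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=GBK"))
print(charset_from_content_type("text/html"))
```

When the helper returns a value, it can be assigned to response.encoding before reading response.text; when it returns None, the encoding has to come from somewhere else.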

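Use `response.content` and decode manually

If you need to handle binary data (such as images or PDFs), or want lower-level control over decoding, use response.content. It returns the raw byte string (bytes), which you convert to a string with the .decode() method.

Sample code: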

import requests
url = 'http://example.com/some-gbk-encoded-page'
try:
    response = requests.get(url)
    # response.content is of type bytes
    # Call .decode() directly with the correct encoding
    html_content = response.content.decode('gbk')
    # Print the first 500 characters
    print(html_content[:500])
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

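When should you use this? When you explicitly need the byte stream (for example, to save a downloaded image to a file), or when you are not satisfied with what response.text infers automatically and want full control over decoding.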
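One thing manual decoding gives you is control over what happens to bytes that cannot be decoded. The sketch below simulates a damaged GBK body (the truncation is an assumption made for illustration) and compares the default strict mode with errors="replace".

```python
# Simulated response body: GBK bytes with the last byte cut off,
# as might happen with a truncated or corrupted download.
raw = "编码".encode("gbk")[:-1]

try:
    raw.decode("gbk")  # strict error handling is the default
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
lenient = raw.decode("gbk", errors="replace")

print(strict_ok)
print(lenient)
```

Strict mode is useful for detecting a wrong encoding early, while the lenient mode keeps a scraper running when a page contains a few bad bytes.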

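Let `requests` infer the encoding automatically (not recommended once you already see garbled text)

requests tries to extract the charset from the Content-Type response header. If the header declares a text type without a charset, requests falls back to ISO-8859-1 when decoding response.text. ISO-8859-1 is a "safe" encoding in the sense that every byte maps to some character, so decoding never raises an error, but it is almost certainly not the Chinese encoding you want.

This approach only works when the server explicitly declares the correct encoding.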

import requests
url = 'https://www.baidu.com' # The Baidu home page is usually UTF-8
try:
    response = requests.get(url)
    # Let requests work out the encoding on its own
    # If the server header is Content-Type: text/html; charset=utf-8, this works fine
    html_content = response.text
    print(html_content[:500])
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

If the server provides no charset, or provides a wrong one, this approach fails and produces garbled text.
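When the header carries no charset, many HTML pages still declare one in a meta tag near the top of the body. The sketch below is one possible way to look for it; the helper name `sniff_meta_charset`, the 2048-byte limit, and the regular expression are all assumptions made for illustration, not part of requests.

```python
import re

# Matches both <meta charset="..."> and the older http-equiv form.
META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([\w-]+)", re.IGNORECASE)


def sniff_meta_charset(body, limit=2048):
    """Look for a declared charset in the first `limit` bytes of an HTML body."""
    match = META_CHARSET.search(body[:limit])
    return match.group(1).decode("ascii").lower() if match else None


modern = b'<html><head><meta charset="utf-8"></head></html>'
legacy = b'<meta http-equiv="Content-Type" content="text/html; charset=GB2312">'
print(sniff_meta_charset(modern))
print(sniff_meta_charset(legacy))
```

With requests, the bytes to inspect would be response.content. requests also offers response.apparent_encoding, which guesses the encoding from the bytes themselves; either result can be assigned to response.encoding, though a guess should still be checked against the actual output.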

Hands-on walkthrough: a typical garbled-text problem

Suppose you scrape an old forum and the page comes out garbled.

Observe the symptom

import requests
url = 'http://some-old-forum.com/article/123'
response = requests.get(url)
print(response.text) # Output looks like: ´ÓÒâ´®ÄÚÈÝ´ó¶à...

Debug: check the response headers

print(response.headers)

You might see output like this:

{
    'Content-Type': 'text/html; charset=GBK',  # <-- Found it: GBK
    'Server': 'nginx/1.0.0',
    ...
}

Apply the fix

Add one line to the code that explicitly tells requests to use GBK.

import requests
url = 'http://some-old-forum.com/article/123'
response = requests.get(url)
# The key line: set the encoding manually
response.encoding = 'gbk'
# The content now comes back as readable Chinese
article_content = response.text
print(article_content)
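If all you have left is the already garbled string (for example, text that was saved before the encoding was fixed), it can often be repaired after the fact. This is a minimal sketch, assuming the text was wrongly decoded as ISO-8859-1 and is really GBK; the helper name `repair_mojibake` and the sample string are hypothetical.

```python
def repair_mojibake(garbled, wrong="iso-8859-1", right="gbk"):
    """Undo a wrong decode: re-encode with the wrong codec, then decode correctly."""
    return garbled.encode(wrong).decode(right)


# Simulate the forum case offline with a made-up page body.
page_text = "文章内容"
garbled = page_text.encode("gbk").decode("iso-8859-1")
fixed = repair_mojibake(garbled)

print(fixed)
```

This works because ISO-8859-1 maps every byte to exactly one character, so re-encoding recovers the original bytes without loss. It does not work if the wrong decode already replaced or dropped bytes.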

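Summary and best practices

Method	When to use	Pros	Cons
`response.encoding = '...'`	Strongly recommended when you know the encoding or can read it from the response headers.	Simple and direct; `response.text` returns a string you can use immediately.	You need to know the correct encoding in advance.
`response.content.decode('...')`	When you need to handle binary data or want to control decoding yourself.	Highly flexible; gives you the raw bytes.	Manual handling makes the code slightly more verbose.
Default `response.text`	When the server explicitly provides a correct `charset` and the content is standard text (such as UTF-8).	The shortest code.	Error-prone with old or misconfigured servers.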

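Golden rules:

When you hit garbled text, the first step is print(response.headers) to check the charset in Content-Type.
If the charset is a Chinese encoding such as gbk, gb2312, or big5, set response.encoding to that value right away (for example, response.encoding = 'gbk').
If there is no charset but the page content is clearly Chinese, try the common Chinese encodings (such as gbk and utf-8).
Prefer response.text, and use response.content only when necessary.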
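The "try the common encodings" rule can be sketched as a small helper. The name `decode_with_fallback` and the candidate order are assumptions for illustration; gb18030 is used here because in practice it also covers GBK and GB2312 content.

```python
def decode_with_fallback(raw, candidates=("utf-8", "gb18030", "big5")):
    """Return (text, encoding) for the first candidate that decodes without error."""
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: fall back to replacement characters.
    return raw.decode(candidates[0], errors="replace"), None


gbk_result = decode_with_fallback("中文内容".encode("gbk"))
utf8_result = decode_with_fallback("中文内容".encode("utf-8"))
print(gbk_result)
print(utf8_result)
```

UTF-8 goes first because it is strict enough to reject most non-UTF-8 input. A decode that raises no error is still not proof that the encoding is right, so the result should be checked by eye, as the rules above suggest.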
