How do you download images with Python urllib2? - 杰瑞科技汇

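urllib2 is a Python 2 standard-library module. In Python 3 it was reorganized and merged into the urllib package, where its functionality lives mainly in urllib.request (with the exception classes in urllib.error). This article is split into two parts:

The recommended Python 3 way: use urllib.request.
The legacy Python 2 way: use urllib2 (for compatibility with old code).

The recommended Python 3 way (`urllib.request`)

This is the current mainstream, recommended approach. Its core flow is:

Import the urllib.request module.
Open the image URL with urllib.request.urlopen().
Read the image's binary data.
Write the binary data to a local file.

Example code

Below is a complete, runnable Python 3 example that downloads a sample image and saves it locally.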

import urllib.error
import urllib.request
# 1. URL of the target image
# This uses a sample image from unsplash
image_url = "https://images.unsplash.com/photo-1543857778-c4a1a569e7bd?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1200&q=80"
# 2. Local file name to save to
file_name = "downloaded_image.jpg"
try:
    # 3. Send the request and get the response
    # urlopen returns a file-like object that we can read data from
    with urllib.request.urlopen(image_url) as response:
        # 4. Read the binary content of the image
        # response.read() returns the raw bytes of the image
        image_data = response.read()
    # 5. Write the image data to a local file
    # Use 'wb' mode (write binary) to write binary data
    with open(file_name, 'wb') as f:
        f.write(image_data)
    print(f"Image saved as {file_name}")
except urllib.error.URLError as e:
    # HTTPError is a subclass of URLError, so HTTP failures land here too
    print(f"Download failed, URL error: {e.reason}")
except Exception as e:
    print(f"Unexpected error: {e}")

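Code walkthrough

import urllib.request: imports the module that handles URL requests.
image_url: the full network address of the image you want to download.
file_name: the name under which you want to save the file locally.
with urllib.request.urlopen(image_url) as response:
- The urllib.request.urlopen() function opens the given URL.
- The with statement ensures the network connection is closed automatically once the block finishes, even if an exception is raised. This is good practice.
- The response object carries the server's response information, such as the status code and response headers. Most importantly, we can read data from it.
image_data = response.read():
- The response.read() method reads the entire response body into memory. For an image, that content is binary data (bytes).
with open(file_name, 'wb') as f:
- The open() function opens the local file.
- 'wb' is the key part: w stands for write and b for binary. An image is a binary file and must be opened in binary mode. In Python 3, writing bytes to a file opened in text mode raises a TypeError; in Python 2 on Windows, text mode can corrupt the data through newline translation.
f.write(image_data): writes the binary data read from the network to the local file.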
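
Because response.read() loads the whole body into memory at once, a large image may be better copied to disk in chunks. The following is a minimal sketch, not part of the original example; the helper name download_in_chunks, the 64 KiB chunk size, and the 10-second timeout are illustrative assumptions.

```python
import shutil
import urllib.request


def download_in_chunks(url, dest_path, chunk_size=64 * 1024, timeout=10):
    """Stream the response body to dest_path without holding it all in memory.

    download_in_chunks is a hypothetical helper; chunk_size and timeout
    are assumed defaults, not values taken from the article.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(dest_path, "wb") as f:
            # copyfileobj repeatedly reads up to chunk_size bytes from the
            # response and writes them to the file until the body ends.
            shutil.copyfileobj(response, f, chunk_size)
    return dest_path
```

It could be called as download_in_chunks(image_url, file_name), reusing the two variables from the example above. Peak memory use then stays close to chunk_size rather than the size of the image.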

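The legacy Python 2 way (`urllib2`)

If you are maintaining a Python 2 project, the code differs slightly. The main differences are that urllib2 is a standalone module, that its exception classes live in urllib2 itself, and that the response object cannot be used directly in a with statement. Python 2 also has no f-strings, so the messages are built with str.format().

Example code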

import urllib2
from contextlib import closing
# 1. URL of the target image
image_url = "https://images.unsplash.com/photo-1543857778-c4a1a569e7bd?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1200&q=80"
# 2. Local file name to save to
file_name = "downloaded_image_py2.jpg"
try:
    # 3. Send the request and get the response
    # In Python 2 the response is not a context manager, so closing()
    # is used to make sure it is closed even if reading fails
    with closing(urllib2.urlopen(image_url)) as response:
        # 4. Read the binary content of the image
        image_data = response.read()
    # 5. Write the image data to a local file
    with open(file_name, 'wb') as f:
        f.write(image_data)
    # Python 2 has no f-strings, so use str.format()
    print("Image saved as {}".format(file_name))
except urllib2.URLError as e:
    print("Download failed, URL error: {}".format(e.reason))
except Exception as e:
    print("Unexpected error: {}".format(e))

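Main differences between Python 2 and Python 3

Feature	Python 3 (`urllib.request`)	Python 2 (`urllib2`)
Module name	`urllib.request`	`urllib2`
Opening a URL	`urllib.request.urlopen()`	`urllib2.urlopen()`
Exception class	`urllib.error.URLError`	`urllib2.URLError`
Reading data	`response.read()`	`response.read()`
File mode	`'wb'` (binary write)	`'wb'` (binary write)
Resource management	Use the `with` statement directly	The response is not a context manager; wrap it in `contextlib.closing()` or call `response.close()` manually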
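
If a single script has to run on both interpreters, the differences in the table can be hidden behind a small import shim. This is only a sketch under that assumption; fetch_bytes is a hypothetical helper name, not something from either library.

```python
from contextlib import closing

try:
    # Python 3: the functionality is split across urllib.request and urllib.error
    from urllib.request import urlopen
    from urllib.error import URLError
except ImportError:
    # Python 2: everything lives in the standalone urllib2 module
    from urllib2 import urlopen, URLError


def fetch_bytes(url):
    """Return the raw response body for url (hypothetical helper)."""
    # closing() only needs a close() method, so it works with the
    # response objects of both Python versions.
    with closing(urlopen(url)) as response:
        return response.read()
```

The rest of the script can then catch URLError and call fetch_bytes() without caring which interpreter is running.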

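Advanced: adding request headers (imitating a browser)

Some websites inspect where a request comes from and may refuse access if they detect a simple crawler script. By default urllib identifies itself with a User-Agent such as Python-urllib/3.x. To work around such checks, we can add a User-Agent request header so that the request looks like it came from a real browser.

Python 3 example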

import urllib.error
import urllib.request
image_url = "https://httpbin.org/image/png" # An image URL that can be used for testing
file_name = "downloaded_image_with_header.png"
# Create a request object and add a custom request header
req = urllib.request.Request(
    image_url,
    headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }
)
try:
    # Open the URL using the Request object that carries the header
    with urllib.request.urlopen(req) as response:
        image_data = response.read()
    with open(file_name, 'wb') as f:
        f.write(image_data)
    print(f"Image saved as {file_name}")
except urllib.error.URLError as e:
    print(f"Download failed, URL error: {e.reason}")
except Exception as e:
    print(f"Unexpected error: {e}")
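
When many images are downloaded, building a Request object for each URL becomes repetitive. One possible alternative, sketched here as an assumption rather than as the article's method, is to install a global opener whose default headers include the User-Agent. The function name install_default_user_agent is hypothetical; build_opener(), addheaders, and install_opener() are standard urllib.request features.

```python
import urllib.request

# Browser-like User-Agent string, the same one used in the Request example.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
)


def install_default_user_agent(user_agent=USER_AGENT):
    """Make later urllib.request.urlopen() calls send this User-Agent."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs; replacing it drops the
    # default Python-urllib User-Agent entry.
    opener.addheaders = [("User-Agent", user_agent)]
    urllib.request.install_opener(opener)
    return opener


opener = install_default_user_agent()
```

After this runs, a plain urllib.request.urlopen(image_url) call sends the browser-like header. Note that the setting is process-wide, so it also affects other code that uses urlopen().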

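A more modern alternative: the `requests` library

Although urllib is part of Python's standard library and needs no installation, its API is relatively cumbersome. In practice, the third-party requests library is usually the better choice, since it offers a simpler, more user-friendly API.

You need to install it first:

pip install requests

With requests, the code for downloading an image becomes very short: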

import requests
image_url = "https://images.unsplash.com/photo-1543857778-c4a1a569e7bd?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1200&q=80"
file_name = "downloaded_image_requests.jpg"
try:
    # Send a GET request
    # stream=True defers downloading the body until it is iterated, which suits large files
    # timeout=10 avoids waiting forever on an unresponsive server
    response = requests.get(image_url, stream=True, timeout=10)
    response.raise_for_status() # Raise an exception if the request failed (4xx or 5xx status code)
    # Write to the file
    with open(file_name, 'wb') as f:
        # iter_content(chunk_size=8192) reads the content in chunks, so the whole image is never held in memory
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print(f"Image saved as {file_name}")
except requests.exceptions.RequestException as e:
    print(f"Download failed: {e}")

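Method	Pros	Cons	When to use
`urllib.request` (Python 3)	No installation needed, standard library	Cumbersome API, relatively basic features	Simple scripts, standard environments, cases where you do not want to install third-party libraries
`urllib2` (Python 2)	No installation needed, standard library	Python 2 is obsolete, and the API is just as cumbersome	Maintaining old Python 2 projects
`requests`	Clean, elegant API with powerful features (sessions, cookies, uploads, etc.)	Requires a third-party install (`pip install requests`)	Strongly recommended; almost any Python project that makes network requests

For new projects, prefer the requests library. If you cannot install third-party libraries, using urllib.request in Python 3 is the right choice.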
