杰瑞科技汇

How do you customize and use Python's build_opener?

This guide takes a close look at urllib.request.build_opener, the standard-library tool for making HTTP requests when you need more control than the basic urllib.request.urlopen() provides.

What is build_opener?

build_opener is a function from Python's urllib.request module. Its primary job is to create a custom opener object.

Think of it this way:

  • urlopen(): This is like using a simple, default web browser. It's great for quick, simple requests but doesn't have any special extensions or cookies.
  • build_opener(): This is like building a custom browser. You can add "extensions" (called handlers) to it before you start browsing. These handlers can add functionality like handling cookies, managing redirects, working through proxies, or adding custom headers.

The "opener" object you get back is an instance of OpenerDirector, which is the object you actually call to open a URL (e.g., opener.open(...)).
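
To make the return type concrete, the short sketch below builds an opener with no extra handlers and lists what it contains. Reading the `handlers` attribute is an assumption about CPython's implementation (it is a plain list there, but it is not a documented part of the API), so treat this as an inspection aid only.

```python
import urllib.request

# With no arguments, build_opener() returns an opener holding only the default handlers.
opener = urllib.request.build_opener()

# The returned object is an OpenerDirector.
print(type(opener).__name__)

# CPython keeps the registered handlers in a list attribute (an implementation detail).
handler_names = [type(handler).__name__ for handler in opener.handlers]
print(handler_names)
```

The printed list typically includes HTTPHandler and HTTPRedirectHandler, which is why a bare opener can already fetch pages and follow redirects.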

Why Use build_opener?

You use build_opener when you need to do more than just fetch a URL. Common reasons include:

  1. Handling Cookies: To automatically send and receive cookies across multiple requests to the same domain.
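  2. Customizing Error and Redirect Handling: The default opener already follows redirects (e.g., HTTP 301, 302) and raises HTTPError for error statuses (like 404 Not Found). Custom handlers let you change that, for example to stop following redirects or to get error responses back without an exception.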
  3. Adding Custom Headers: To send headers like User-Agent, Accept, or Authorization with every request.
  4. Using Proxies: To route your requests through a proxy server.
  5. Handling Basic/Digest Authentication: To access protected resources that require a username and password.

How to Use build_opener (The Core Concept)

The process is always the same:

  1. Import the necessary modules: urllib.request and any specific handlers you need.
  2. Instantiate Handlers: Create the handler objects you want to add to your opener.
  3. Build the Opener: Pass your handlers to build_opener() as positional arguments.
  4. Install the Opener (Optional): Call install_opener(). This makes your custom opener the default for the entire Python process, so plain urlopen() calls use it too. Because the change is global, skip this step if only part of your program needs the custom behavior and call opener.open() directly instead.
  5. Open the URL: Use your new opener to open the URL.
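
The five steps can be sketched without any network access by using a data: URL, which the default handler set can open. `RequestLogger` is a hypothetical handler written for this illustration; it only records the URLs that pass through the opener, which makes the effect of install_opener() visible.

```python
import urllib.request  # Step 1: import the module


class RequestLogger(urllib.request.BaseHandler):
    """Hypothetical handler that records every data: URL passing through the opener."""

    def __init__(self):
        self.seen = []

    def data_request(self, request):
        # Pre-processing hook: called before a data: URL is opened.
        self.seen.append(request.full_url)
        return request


# Step 2: instantiate the handler.
logger = RequestLogger()

# Step 3: build the opener, passing handlers as positional arguments.
opener = urllib.request.build_opener(logger)

# Step 5 without step 4: call the opener directly.
with opener.open("data:text/plain,hello") as response:
    first = response.read()

# Step 4: install the opener so that plain urlopen() uses it too.
urllib.request.install_opener(opener)
with urllib.request.urlopen("data:text/plain,world") as response:
    second = response.read()

print(first, second, logger.seen)
```

Both requests end up in `logger.seen`: the first because the opener was called directly, the second because urlopen() was routed through the installed opener.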

Let's look at the most common use cases.


Use Case 1: Handling Cookies (Very Common)

This is the classic example. You want to log in to a website and then make a subsequent request while staying logged in.

import urllib.request
import urllib.parse
import http.cookiejar
# 1. Create a cookie jar to store cookies
cookie_jar = http.cookiejar.CookieJar()
# 2. Create a cookie handler and install it
# The HTTPCookieProcessor takes the cookie jar
cookie_handler = urllib.request.HTTPCookieProcessor(cookie_jar)
opener = urllib.request.build_opener(cookie_handler)
urllib.request.install_opener(opener) # Now, all urlopen calls will use this opener
# --- First Request: Login ---
# We'll use httpbin.org, a great testing service that echoes back what you send it.
login_url = 'https://httpbin.org/post'
login_data = {
    'username': 'testuser',
    'password': 'password123'
}
# Encode the data to be sent in the POST request
encoded_data = urllib.parse.urlencode(login_data).encode('utf-8')
print("--- Making login request... ---")
# httpbin.org/post only echoes the form data back; it does not set a cookie.
# A real login endpoint would answer with a Set-Cookie header, which the
# cookie handler would save automatically.
with urllib.request.urlopen(login_url, data=encoded_data) as response:
    login_response = response.read().decode('utf-8')
    print("Login response status:", response.status)
    # print("Login response body:", login_response)
# To stand in for that Set-Cookie header, ask httpbin.org to set a cookie.
# This endpoint sets the cookie and then redirects to /cookies.
with urllib.request.urlopen('https://httpbin.org/cookies/set?session=abc123') as response:
    response.read()
# --- Second Request: Access a protected page ---
# Now, we'll make another request to a different endpoint.
# The cookie handler will automatically attach the saved cookie to this request.
protected_url = 'https://httpbin.org/cookies'
print("\n--- Making second request (with cookies)... ---")
with urllib.request.urlopen(protected_url) as response:
    protected_response = response.read().decode('utf-8')
    # The response echoes the cookies we sent, proving they were attached.
    print("Protected response body:", protected_response)
    # The "cookies" object in the body should list the session cookie set above.
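
The httpbin.org example needs network access. As an offline alternative, here is a minimal sketch of the same cookie round trip against a throwaway local server. `DemoHandler`, its `/login` and `/me` paths, and the cookie value are all hypothetical names invented for this illustration. The empty `ProxyHandler({})` is there so that proxy settings from the environment do not intercept requests to 127.0.0.1.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test server: /login sets a cookie, every path echoes the Cookie header."""

    def do_GET(self):
        body = self.headers.get("Cookie", "").encode("utf-8")
        self.send_response(200)
        if self.path == "/login":
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Port 0 lets the operating system pick a free port.
server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # ignore proxies configured in the environment
    urllib.request.HTTPCookieProcessor(cookie_jar),
)

# First request: the server sends Set-Cookie and the jar stores it.
with opener.open(base_url + "/login") as response:
    response.read()
stored = {cookie.name: cookie.value for cookie in cookie_jar}

# Second request: the opener sends the stored cookie back.
with opener.open(base_url + "/me") as response:
    echoed = response.read().decode("utf-8")

server.shutdown()
server.server_close()
print(stored, echoed)
```

Iterating over the CookieJar, as done for `stored`, is also a convenient way to check what a real site has set after a login request.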

Use Case 2: Adding Custom Headers

Sometimes you need to pretend to be a real browser or send an API key.

import urllib.request
# 1. Define the headers you want to send
headers = {
    'User-Agent': 'MyCoolApp/1.0 (myemail@example.com)',
    'Accept': 'application/json', # Tell the server we want JSON back
    'X-Custom-API-Key': 'abcdef123456'
}
# 2. Create a request object
# A Request object carries headers for this one request only
request = urllib.request.Request('https://httpbin.org/user-agent', headers=headers)
# 3. Build a simple opener (no special handlers needed here)
opener = urllib.request.build_opener()
# 4. Open the URL using the request object
print("--- Request with custom headers ---")
with opener.open(request) as response:
    response_data = response.read().decode('utf-8')
    print("Response Body:", response_data)
    # You will see the User-Agent you sent in the response, proving it worked.
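
A Request object sets headers for a single request. To send the same headers with every request made through an opener, one option is the opener's `addheaders` attribute, a list of (name, value) pairs. The sketch below checks this against a hypothetical local echo server (`EchoHeaders` is invented for this illustration) and also shows that a header given on a Request takes precedence over the opener-wide value.

```python
import http.server
import json
import threading
import urllib.request


class EchoHeaders(http.server.BaseHTTPRequestHandler):
    """Hypothetical local test server that echoes the request headers as JSON."""

    def do_GET(self):
        # Lower-case the names so the result does not depend on header capitalization.
        received = {name.lower(): value for name, value in self.headers.items()}
        body = json.dumps(received).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = http.server.HTTPServer(("127.0.0.1", 0), EchoHeaders)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
# These headers are added to every request opened through this opener.
opener.addheaders = [("User-Agent", "MyCoolApp/1.0"), ("Accept", "application/json")]

with opener.open(url) as response:
    default_headers = json.loads(response.read())

# A header set on the Request itself wins over the opener-wide value.
request = urllib.request.Request(url, headers={"Accept": "text/plain"})
with opener.open(request) as response:
    overridden_headers = json.loads(response.read())

server.shutdown()
server.server_close()
print(default_headers["accept"], overridden_headers["accept"])
```

Note that assigning to `addheaders` replaces the default list, which otherwise contains only the Python-urllib User-Agent.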

Use Case 3: Handling HTTP Errors and Redirects

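Both urlopen() and any opener created by build_opener() already raise urllib.error.HTTPError, a subclass of URLError, for 4xx and 5xx responses, and they follow 3xx redirects automatically. That is because HTTPErrorProcessor, HTTPDefaultErrorHandler, and HTTPRedirectHandler are part of the default handler set. The example passes HTTPErrorProcessor explicitly only to make that step visible; the behavior is the same without it.

import urllib.request
import urllib.error
# 1. Create an HTTP error processor
# It hands every non-2xx response to the opener's error handlers, which
# follow redirects (3xx) or raise HTTPError (4xx and 5xx)
http_error_handler = urllib.request.HTTPErrorProcessor()
# 2. Build the opener with this handler (it replaces the default instance)
opener = urllib.request.build_opener(http_error_handler)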
# 3. Try to open a URL that will return a 404 Not Found
try:
    print("--- Attempting to open a non-existent URL ---")
    with opener.open('https://httpbin.org/status/404') as response:
        print("This line should not be reached.")
        print("Status Code:", response.status)
except urllib.error.HTTPError as e:
    # Now we catch a specific HTTPError
    print(f"\nCaught an HTTPError: {e.code} {e.reason}")
    print("Headers:", e.headers)
except urllib.error.URLError as e:
    # This would catch other URL-related errors (like DNS failure)
    print(f"\nCaught a URLError: {e.reason}")
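
Where custom handling does change behavior is when a handler overrides the default. As a minimal sketch, `PassThroughErrors` below is a hypothetical subclass of HTTPErrorProcessor that returns every response unchanged, so a 404 comes back as a normal response object. `NotFoundServer` is a throwaway local server invented for this illustration. Because this processor no longer dispatches non-2xx responses, redirects are not followed either.

```python
import http.server
import threading
import urllib.error
import urllib.request


class PassThroughErrors(urllib.request.HTTPErrorProcessor):
    """Hypothetical processor that returns every response instead of raising."""

    def http_response(self, request, response):
        return response

    https_response = http_response


class NotFoundServer(http.server.BaseHTTPRequestHandler):
    """Hypothetical local test server that answers every GET with 404."""

    def do_GET(self):
        self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = http.server.HTTPServer(("127.0.0.1", 0), NotFoundServer)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/missing"

# Default behavior: a 404 response raises HTTPError.
default_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
try:
    default_opener.open(url)
    default_result = "no exception"
except urllib.error.HTTPError as error:
    default_result = error.code
    error.close()

# The subclass instance replaces the default HTTPErrorProcessor, so nothing is raised.
lenient_opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),
    PassThroughErrors(),
)
with lenient_opener.open(url) as response:
    lenient_status = response.status

server.shutdown()
server.server_close()
print(default_result, lenient_status)
```

This pattern suits code that prefers to branch on the status code rather than wrap every call in try/except.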

Built-in Handlers

You don't have to create handlers from scratch. urllib.request provides many:

| Handler Class | Purpose |
| --- | --- |
| HTTPHandler / HTTPSHandler | Handles HTTP and HTTPS requests. These are included by default in build_opener. |
| HTTPCookieProcessor | Manages cookies using a CookieJar. |
| HTTPRedirectHandler | Handles HTTP 3xx redirects. Also included by default. |
| ProxyHandler | Routes requests through a proxy. |
| HTTPBasicAuthHandler / HTTPDigestAuthHandler | Handles HTTP Basic and Digest Authentication. |
| HTTPHandler(debuglevel=1) | Enables debugging for HTTP traffic (very useful for seeing raw requests/responses). |
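
Several of these handlers are often combined in one opener. The sketch below wires up a proxy, Basic authentication, and HTTP debugging without sending any request. The proxy address, API URL, and credentials are hypothetical placeholders.

```python
import urllib.request

# Hypothetical proxy address, for illustration only.
proxy_handler = urllib.request.ProxyHandler({
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
})

# The password manager maps URL prefixes to credentials.
# Passing None as the realm means "use these credentials for any realm".
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://api.example.com/", "testuser", "password123")
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# debuglevel=1 prints the raw HTTP conversation to stdout once requests are made.
debug_handler = urllib.request.HTTPHandler(debuglevel=1)

# Instances passed here replace the default handlers of the same class.
opener = urllib.request.build_opener(proxy_handler, auth_handler, debug_handler)

handler_names = sorted(type(handler).__name__ for handler in opener.handlers)
credentials = password_mgr.find_user_password(None, "https://api.example.com/data")
print(handler_names)
print(credentials)
```

The credentials are only sent after the server answers with a 401 challenge; the auth handler then retries the request with an Authorization header.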

Key Takeaways

| Feature | urllib.request.urlopen() | urllib.request.build_opener() |
| --- | --- | --- |
| Simplicity | High. One line of code. | More setup. Requires creating handlers. |
| Flexibility | Low. Uses default handlers. | High. You can mix and match any handlers. |
| Use Case | Quick, simple GET/POST requests where you don't need state (like cookies). | Complex scenarios requiring cookies, custom headers, proxies, or special authentication. |
| Default Behavior | Uses the globally installed opener. If none is installed, a default one is built on the first call and reused. | Returns a reusable opener object that you can call directly or install as the default. |

Modern Alternative: requests Library

While urllib is built-in and powerful, the third-party requests library is the de-facto standard for HTTP in Python today. It simplifies all the concepts above into a much more user-friendly API.

Here's the cookie example using requests:

import requests
# 1. Create a session object. This is like a pre-configured opener.
# It automatically handles cookies for you.
session = requests.Session()
# 2. Make the login request
login_url = 'https://httpbin.org/post'
login_data = {'username': 'testuser', 'password': 'password123'}
print("--- Making login request with requests... ---")
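# httpbin.org/post only echoes the form data; it does not set a cookie
response = session.post(login_url, data=login_data)
print("Login response status:", response.status_code)
# Stand in for a real login's Set-Cookie header by asking httpbin.org to set one
session.get('https://httpbin.org/cookies/set?session=abc123')
# 3. Make the second request using the same session
protected_url = 'https://httpbin.org/cookies'
print("\n--- Making second request (with cookies)... ---")
response = session.get(protected_url)
# The session stored the cookie and sends it back automatically
print("Protected response body:", response.json())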

As you can see, requests abstracts away the build_opener and handler concepts, making the code cleaner and more intuitive. For any new project, requests is highly recommended. However, understanding build_opener is still valuable for understanding how HTTP clients work under the hood and for working in environments where you can't install third-party libraries.
