杰瑞科技汇

How do you use urllib2 in Python 3.4?

This guide explains how to get the functionality of urllib2 in Python 3.4.


Important: urllib2 in Python 3.4

First, a critical point of clarification: urllib2 does not exist in Python 3. Python 2's urllib, urllib2, and urlparse modules were merged and reorganized into a single urllib package with several submodules.

In Python 3.4, the functionality of the old Python 2 urllib2 is spread across two main modules:

  • urllib.request: For opening and reading URLs (this is the direct replacement for urllib2).
  • urllib.error: Contains exception classes raised by urllib.request (this replaces urllib2's URLError and HTTPError).

So, when you see a Python 2 tutorial using import urllib2, you need to translate it to import urllib.request in Python 3.4.


Quick Translation Guide: Python 2 urllib2 vs. Python 3.4

Python 2 (urllib2)              Python 3.4 (urllib)         Description
import urllib2                  import urllib.request       Main module for opening URLs
import urllib, import urlparse  import urllib.parse         URL parsing, quoting, and encoding (split between urllib and urlparse in Py2)
urllib2.urlopen()               urllib.request.urlopen()    Function to open a URL
urllib2.Request()               urllib.request.Request()    Class to create a request object
urllib2.HTTPError               urllib.error.HTTPError      Exception for HTTP errors
urllib2.URLError                urllib.error.URLError       Exception for URL errors (e.g., network down)
urllib.urlencode()              urllib.parse.urlencode()    To encode form data
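
The mapping in the table can be sketched as a small import shim for code that has to run on both Python 2 and Python 3. This is only an illustration of the renames listed above, not something the standard library provides; the final print call is there just to show that the imported name works.

```python
# Import shim: prefer the Python 3 module layout, fall back to Python 2.
try:
    from urllib.request import urlopen, Request
    from urllib.error import URLError, HTTPError
    from urllib.parse import urlencode
except ImportError:  # Python 2
    from urllib2 import urlopen, Request, URLError, HTTPError
    from urllib import urlencode

# The same names are now usable on either interpreter.
print(urlencode([('q', 'python')]))
```

After this block, the rest of the program can call urlopen(), Request(), and urlencode() without caring which interpreter is running.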

Core Functionality in Python 3.4

Let's dive into the most common tasks.


Making a Simple GET Request

This is the most basic operation: fetching the content of a webpage.

# In Python 2:
# import urllib2
# response = urllib2.urlopen('http://python.org')
# In Python 3.4:
import urllib.error
import urllib.request
try:
    # Open the URL and get a response object
    with urllib.request.urlopen('http://python.org') as response:
        # Read the response data (it's in bytes)
        html = response.read()
    # The data is in bytes, so we need to decode it to a string (e.g., using UTF-8)
    html_string = html.decode('utf-8')
    print(html_string[:200]) # Print the first 200 characters
except urllib.error.URLError as e:
    # str.format() works on Python 3.4 (f-strings need 3.6+)
    print("Failed to reach the server. Reason: {}".format(e.reason))

Explanation:

  • urllib.request.urlopen() opens the URL.
  • The with statement ensures the network connection is properly closed.
  • response.read() returns the entire content of the response as a bytes object.
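  • .decode('utf-8') converts the bytes into a human-readable string. This assumes the page is encoded as UTF-8; other pages may declare a different charset in their Content-Type header.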
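
Hard-coding UTF-8 works for many pages but not all. A minimal sketch of a more careful decode step follows, assuming the server declares its charset in the Content-Type header. The helper name decode_body is hypothetical, and the Message object below is only an offline stand-in for response.headers so the example runs without a network connection.

```python
from email.message import Message

def decode_body(raw_bytes, headers, default='utf-8'):
    """Decode bytes using the charset from the headers, else a default."""
    # get_content_charset() returns None when no charset is declared.
    charset = headers.get_content_charset() or default
    return raw_bytes.decode(charset, errors='replace')

# Offline stand-in for response.headers (an email.message.Message subclass).
fake_headers = Message()
fake_headers['Content-Type'] = 'text/html; charset=iso-8859-1'

text = decode_body(b'caf\xe9', fake_headers)
print(text)
```

With a real response, the call would be decode_body(response.read(), response.headers) inside the with block.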

Making a POST Request (Sending Form Data)

To send data to a server (like a login form), you need to make a POST request. The data must be encoded.
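
As a quick illustration of what "encoded" means here, the sketch below runs the encoding step on its own, with no request sent. The field values are made up to show how spaces and reserved characters are escaped; a list of pairs is used instead of a dict so the field order is fixed on every Python version.

```python
import urllib.parse

# Hypothetical form fields containing a space and reserved characters.
fields = [('username', 'john doe'), ('note', 'a&b=c')]

# Step 1: build the URL-encoded string.
query = urllib.parse.urlencode(fields)
print(query)

# Step 2: turn the string into bytes, which is what urlopen() expects as data.
body = query.encode('utf-8')

# parse_qs reverses the encoding, which is handy for checking the result.
decoded = urllib.parse.parse_qs(query)
print(decoded)
```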

import urllib.error
import urllib.request
import urllib.parse
# The data to be sent (as a dictionary)
data = {
    'username': 'john_doe',
    'password': 'secret_password'
}
# Encode the data into bytes
# urlencode converts the dict into 'username=john_doe&password=secret_password'
encoded_data = urllib.parse.urlencode(data).encode('utf-8')
# The URL to send the POST request to
url = 'http://httpbin.org/post' # A testing service that echoes back what you send
try:
    # Create a request object
    request = urllib.request.Request(url, data=encoded_data, method='POST')
    # Add a custom User-Agent header (good practice)
    request.add_header('User-Agent', 'MyCoolApp/1.0')
    # Send the request and get the response
    with urllib.request.urlopen(request) as response:
        response_data = response.read()
    # Decode and print the server's response
    print(response_data.decode('utf-8'))
except urllib.error.HTTPError as e:
    # HTTPError is a subclass of URLError, so it must be caught first
    print("HTTP Error: {} - {}".format(e.code, e.reason))
except urllib.error.URLError as e:
    print("URL Error: {}".format(e.reason))

Explanation:

  1. urllib.parse.urlencode(): This function takes a dictionary of data and converts it into a URL-encoded query string (e.g., key1=value1&key2=value2).
  2. .encode('utf-8'): The urlopen function requires the data to be in bytes, so we encode the string.
  3. urllib.request.Request(): We create a request object. This allows us to specify the URL, the data to send, and the HTTP method.
  4. method='POST': Explicitly sets the request method to POST.
  5. request.add_header(): You can add custom headers to your request. This is very common for things like User-Agent, Authorization, Content-Type, etc.
  6. Error Handling: We catch HTTPError (for bad status codes like 404, 500) and URLError (for network-level problems like DNS failure). HTTPError is a subclass of URLError, so its except clause has to come first.
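
The points above can be checked without touching the network, because a Request object is just a description of the request until it is passed to urlopen(). The following sketch builds one and inspects it; the payload value is illustrative.

```python
import urllib.error
import urllib.request

payload = b'username=john_doe'
req = urllib.request.Request('http://httpbin.org/post',
                             data=payload, method='POST')
req.add_header('User-Agent', 'MyCoolApp/1.0')

# Nothing has been sent yet; we are only inspecting the request object.
print(req.get_method())      # the HTTP method that would be used
print(req.full_url)          # the target URL
print(req.header_items())    # headers added so far

# Without data or an explicit method, the request defaults to GET.
plain = urllib.request.Request('http://httpbin.org/get')
print(plain.get_method())

# HTTPError derives from URLError, which is why it is caught first.
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))
```

Note that add_header() stores the header name in capitalized form, so a later lookup needs req.get_header('User-agent').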

Handling Cookies

urllib.request has built-in support for cookies using http.cookiejar. This is essential for websites that require a login session.

import urllib.error
import urllib.request
import urllib.parse
import http.cookiejar
# 1. Create a CookieJar object to store cookies
cookie_jar = http.cookiejar.CookieJar()
# 2. Create an opener that will use the CookieJar
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie_jar))
# 3. Install the opener. Now all urllib.request calls will use it.
urllib.request.install_opener(opener)
# --- First Request: Login to the site ---
login_url = 'http://httpbin.org/post' # Using httpbin to simulate a login form
login_data = urllib.parse.urlencode({'username': 'test', 'password': 'test'}).encode('utf-8')
try:
    # A real login endpoint would answer with Set-Cookie headers, which are
    # automatically stored in our cookie_jar (httpbin's /post endpoint only
    # echoes the request and does not set cookies itself)
    with urllib.request.urlopen(login_url, data=login_data) as response:
        print("Login request sent.")
        # print(response.read().decode('utf-8'))
    # --- Second Request: Access a protected page ---
    # The opener will automatically send the stored cookies with this request
    protected_url = 'http://httpbin.org/cookies' # This page shows the cookies it received
    with urllib.request.urlopen(protected_url) as response:
        # The server now knows we are logged in
        protected_data = response.read().decode('utf-8')
        print("\nAccessing protected page:")
        print(protected_data)
except urllib.error.URLError as e:
    print("An error occurred: {}".format(e.reason))
# You can inspect the cookies that were stored
print("\nCookies stored in the jar:")
for cookie in cookie_jar:
    print(cookie)
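
install_opener() changes the behaviour of every later urllib.request.urlopen() call in the process. Where that global effect is unwanted, one option is to keep the opener local and call its open() method directly. The sketch below shows that variation; the helper name make_cookie_opener is hypothetical, and no request is actually sent.

```python
import http.cookiejar
import urllib.request

def make_cookie_opener():
    """Return an opener with its own private cookie jar."""
    jar = http.cookiejar.CookieJar()
    handler = urllib.request.HTTPCookieProcessor(jar)
    opener = urllib.request.build_opener(handler)
    return opener, jar

opener, jar = make_cookie_opener()

# opener.open(url) would be used in place of urllib.request.urlopen(url);
# cookies from its responses would land in jar and nowhere else.
has_cookie_handler = any(
    isinstance(h, urllib.request.HTTPCookieProcessor)
    for h in opener.handlers
)
print(has_cookie_handler)
print(len(jar))  # no requests made yet, so the jar is still empty
```

Each call to make_cookie_opener() gives an independent session, which makes it possible to act as two different logged-in users in one program.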

The Modern Alternative: requests

While urllib.request is powerful and built-in, it can be verbose and unintuitive. The requests library is the de-facto standard for HTTP in Python because it's much simpler and more user-friendly.

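If you can install third-party packages, requests is usually the better choice. Note that current requests releases no longer support Python 3.4, so on 3.4 you would need to pin an older release.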

Here's how you would do the same tasks with requests:

Installation

pip install requests

GET Request with requests

import requests
try:
    response = requests.get('http://python.org')
    # Automatically decodes to a string (if possible)
    html = response.text 
    print(html[:200])
    # For a JSON response, response.json() parses the body into Python objects
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

POST Request with requests

import requests
url = 'http://httpbin.org/post'
data = {
    'username': 'john_doe',
    'password': 'secret_password'
}
# Headers are easy to add
headers = {
    'User-Agent': 'MyCoolApp/1.0'
}
try:
    response = requests.post(url, data=data, headers=headers)
    # Raise an exception for bad status codes (4xx or 5xx)
    response.raise_for_status() 
    # Parse the JSON body into a Python dict
    print(response.json()) # httpbin.org/post returns JSON
except requests.exceptions.HTTPError as e:
    print(f"HTTP Error: {e}")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

As you can see, requests handles encoding, headers, JSON, and error handling in a much cleaner way. It also has excellent support for sessions (which handle cookies automatically), authentication, and timeouts.
