How do you use Python's urllib.open? - 杰瑞科技汇

Python has no function called urllib.open. The function you are looking for is urllib.request.urlopen(), which lives in the built-in urllib.request module and is the standard library's basic tool for making HTTP requests.

Let's break it down, from the basic concept to more advanced usage.

The Core Concept: `urllib.request.urlopen()`

The main function you'll use is urllib.request.urlopen(). It opens a URL (like a webpage) and returns a file-like object. This means you can read from it just like you would read from a local file.

Basic Example: Fetching a Web Page

Here's the simplest way to use it to get the HTML of a webpage.

import urllib.error
import urllib.request
# The URL you want to open
url = 'https://www.example.com'
try:
    # Open the URL
    # This returns a response object
    with urllib.request.urlopen(url) as response:
        # Read the response and decode it from bytes to a string
        html_content = response.read().decode('utf-8')
        # Print the first 500 characters
        print(html_content[:500])
except urllib.error.URLError as e:
    print(f"Failed to open the URL: {e.reason}")

What's happening here?

import urllib.request: We import the necessary module.
with urllib.request.urlopen(url) as response:: This opens the URL. The with statement is best practice as it automatically closes the connection for you. The result, response, is a file-like object.
response.read(): This reads the entire content of the response from the server. By default, it returns the content as bytes.
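.decode('utf-8'): We convert the bytes object into a human-readable string. UTF-8 is the most common encoding for web pages, but a server may declare a different one in its Content-Type header.
except urllib.error.URLError: If the server cannot be reached (unknown host, refused connection, other network problems), urlopen() raises a URLError. An HTTP error status such as 404 raises HTTPError, which is a subclass of URLError, so this clause catches both. A URL with no scheme at all (for example 'www.example.com') raises ValueError instead.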
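Because HTTPError is a subclass of URLError, the two cases can be told apart by catching HTTPError first. The following is a minimal sketch of that pattern; the helper name fetch_text and the 10-second default timeout are illustrative choices, not part of urllib.

```python
import urllib.error
import urllib.request


def fetch_text(url, timeout=10):
    """Return (text, error_message); exactly one of the two is None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8"), None
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        return None, f"HTTP error {e.code}: {e.reason}"
    except urllib.error.URLError as e:
        # No usable answer at all: DNS failure, refused connection, unknown scheme.
        return None, f"Could not reach the server: {e.reason}"
```

Calling fetch_text('https://www.example.com') would return the page text and None on success, or None and a short message on failure.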

Working with the Response Object

The object returned by urlopen() has several useful attributes and methods:

response.read(): Reads the entire body of the response.
response.readline(): Reads one line at a time.
response.readlines(): Reads all lines into a list.
response.status: The HTTP status code (e.g., 200 for OK, 404 for Not Found).
response.getcode(): Returns the same value as response.status. It has been deprecated since Python 3.9 in favor of status.
response.headers: A dictionary-like object containing the response headers (e.g., Content-Type, Server).

Example: Inspecting the Response

import urllib.error
import urllib.request
url = 'https://httpbin.org/get' # A great site for testing HTTP requests
try:
    with urllib.request.urlopen(url) as response:
        print(f"Status Code: {response.status}")
        print("-" * 30)
        print("Headers:")
        for header, value in response.headers.items():
            print(f"{header}: {value}")
        print("-" * 30)
        print("Response Body (first 200 chars):")
        body = response.read().decode('utf-8')
        print(body[:200])
except urllib.error.URLError as e:
    print(f"Error: {e.reason}")
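The examples so far assume the body is UTF-8. Since response.headers is a message object from the standard email package, its get_content_charset() method can be used to read the charset the server declared. Here is a small sketch; the helper name read_text and the UTF-8 fallback are assumptions made for illustration, and a data: URL is used only so the snippet runs without a network connection.

```python
import urllib.request


def read_text(response, default_charset="utf-8"):
    """Decode a response body using the charset from its Content-Type header."""
    # get_content_charset() returns None when the header names no charset.
    charset = response.headers.get_content_charset() or default_charset
    return response.read().decode(charset, errors="replace")


# The body below is the single byte 0xE9 after "caf", declared as ISO-8859-1.
with urllib.request.urlopen("data:text/plain;charset=iso-8859-1,caf%E9") as response:
    text = read_text(response)  # text is now the str "café"
```

The same helper works unchanged on the response returned for an http or https URL.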

Making POST Requests

By default, urlopen() makes a GET request. To make a POST request, you need to pass some extra data.

The data must be encoded into bytes.

Example: Making a POST Request

import urllib.error
import urllib.parse
import urllib.request
url = 'https://httpbin.org/post'
# Data to send in the POST request
# This should be a dictionary
data = {
    'username': 'testuser',
    'password': 'securepassword123'
}
# Encode the data into bytes
# urllib.parse.urlencode() is perfect for this
post_data = urllib.parse.urlencode(data).encode('utf-8')
try:
    # Create a request object with the URL and data
    request = urllib.request.Request(url, data=post_data, method='POST')
    # Open the request
    with urllib.request.urlopen(request) as response:
        response_body = response.read().decode('utf-8')
        print("POST Request Successful!")
        print(response_body)
except urllib.error.URLError as e:
    print(f"Error: {e.reason}")

Key changes for POST:

urllib.parse.urlencode(data): This takes a dictionary and turns it into a URL-encoded string like username=testuser&password=securepassword123.
.encode('utf-8'): The urlopen() function requires the data to be in bytes.
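urllib.request.Request(url, data=post_data, method='POST'): We create a Request object, which allows us to specify the data and the HTTP method. When data is supplied, the method already defaults to POST, so method='POST' is optional here; spelling it out simply makes the intent explicit.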
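Form encoding is not the only option. Many APIs expect a JSON body instead, which takes the same Request object plus a Content-Type header. The sketch below only builds the request so that it can be inspected without sending anything; build_json_request is a hypothetical helper name, and the payload is made up.

```python
import json
import urllib.request


def build_json_request(url, payload):
    """Build (but do not send) a POST request carrying a JSON body."""
    body = json.dumps(payload).encode("utf-8")  # data must still be bytes
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_json_request("https://httpbin.org/post", {"username": "testuser"})
print(request.get_method())  # POST
print(request.data)          # b'{"username": "testuser"}'
```

Passing this request to urllib.request.urlopen() sends it exactly as in the form-encoded example.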

Adding Headers (e.g., User-Agent)

Some websites block default urllib requests because urllib identifies itself with a User-Agent such as Python-urllib/3.x, which does not look like a real browser. You can add headers to your request so that it resembles one.

Example: Adding a User-Agent Header

import urllib.error
import urllib.request
url = 'https://httpbin.org/user-agent' # This endpoint returns the User-Agent it sees
# Create a dictionary of headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept': 'application/json' # Ask for JSON data
}
# Create a request object and add the headers
request = urllib.request.Request(url, headers=headers)
try:
    with urllib.request.urlopen(request) as response:
        response_body = response.read().decode('utf-8')
        print("Request with custom User-Agent:")
        print(response_body)
except urllib.error.URLError as e:
    print(f"Error: {e.reason}")
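If every request in a script should carry the same headers, one option is to build an opener once and reuse it, rather than passing headers to each Request. This is a rough sketch using urllib.request.build_opener() and the opener's addheaders list; the User-Agent string and the helper name make_opener are placeholders chosen for illustration.

```python
import urllib.request

# Hypothetical User-Agent string, for illustration only.
USER_AGENT = "my-script/1.0"


def make_opener(user_agent=USER_AGENT):
    """Return an opener that sends the given User-Agent with every HTTP request."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs added to each request.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = make_opener()
```

opener.open(url) can then be used in place of urllib.request.urlopen(url), including inside a with statement.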

Handling Timeouts

If a server is slow or unresponsive, your program could hang indefinitely. You should always set a timeout.

Example: Setting a Timeout

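import socket
import urllib.error
import urllib.request
url = 'https://httpbin.org/delay/5' # This endpoint waits 5 seconds before responding
try:
    # Set a timeout of 3 seconds
    # The request will fail if the server doesn't respond within 3 seconds
    with urllib.request.urlopen(url, timeout=3) as response:
        print("Request succeeded within the timeout!")
        print(response.read().decode('utf-8'))
except socket.timeout:
    # A timeout while waiting for or reading the response is raised directly,
    # not wrapped in URLError
    print("Request timed out")
except urllib.error.URLError as e:
    # Covers timeouts while connecting and all other failures
    print(f"Request timed out or failed: {e.reason}")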
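A timeout can surface in two forms: wrapped in a URLError whose reason is a socket.timeout (typically while connecting), or raised directly as socket.timeout while waiting for or reading the response. Since Python 3.10, socket.timeout is an alias of the built-in TimeoutError. The helper below is a small sketch that recognizes both forms; the name is_timeout is an illustrative choice.

```python
import socket
import urllib.error


def is_timeout(exc):
    """Return True if exc is a timeout in either of the two forms urllib produces."""
    if isinstance(exc, socket.timeout):
        # Raised directly, e.g. while waiting for the response.
        return True
    # Wrapped form: URLError whose reason is the underlying timeout.
    return isinstance(exc, urllib.error.URLError) and isinstance(
        exc.reason, socket.timeout
    )
```

Inside an except OSError as e: block, is_timeout(e) lets the caller decide whether the failure was a timeout or something else.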

Summary: `urllib.request.urlopen()` vs. `requests`

While urllib is built-in and powerful, the requests library is much more user-friendly and is the de facto standard for most Python developers.

Feature	`urllib.request.urlopen()`	`requests` library
Ease of Use	Verbose, requires manual encoding and decoding.	Very simple, intuitive syntax.
JSON Handling	Requires manual parsing with `json.loads()`.	`response.json()` method for automatic parsing.
Sessions	No built-in session object.	`requests.Session()` handles cookies and connections across requests.
Installation	Built-in with Python.	`pip install requests`
Recommendation	Good for simple scripts or environments where you can't install packages.	Highly recommended for almost all HTTP tasks in Python.

Example with requests for comparison:

# Using the requests library
import requests
url = 'https://httpbin.org/get'
params = {'key1': 'value1', 'key2': 'value2'}
try:
    # GET request with params
    response = requests.get(url, params=params, timeout=5)
    # Raise an exception for bad status codes (4xx or 5xx)
    response.raise_for_status()
    # Access JSON data directly
    data = response.json()
    print("Status Code:", response.status_code)
    print("JSON Response:", data)
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
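For comparison, roughly the same GET request can be written with the standard library alone: the query string is built with urllib.parse.urlencode() and the body is parsed with json.loads(). This is a sketch, not a drop-in replacement for requests; build_url and fetch_json are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def build_url(base_url, params):
    """Append URL-encoded query parameters to base_url."""
    return f"{base_url}?{urllib.parse.urlencode(params)}"


def fetch_json(url, timeout=5):
    """GET a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_url("https://httpbin.org/get", {"key1": "value1", "key2": "value2"})
print(url)  # https://httpbin.org/get?key1=value1&key2=value2
```

Calling fetch_json(url) performs the request. Unlike requests, there is no separate raise_for_status() step, because urlopen() already raises HTTPError for 4xx and 5xx responses.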

Final Recommendation

For learning purposes, understanding urllib is great because it's built-in and shows you what's happening under the hood. However, for any serious project, use the requests library. It will save you a lot of time and make your code cleaner and more robust.
