
How Should You Choose a Python HTTP Library?

Python's standard library for making HTTP requests is urllib, but it is known for being clunky and unintuitive. The de facto standard, and by a huge margin the most popular third-party choice, is requests.


Here’s a comprehensive breakdown, starting with the recommended choice and then covering the standard library.


The requests Library (Highly Recommended)

requests is an elegant and simple HTTP library for Python, built for human beings. It abstracts away the complexities of urllib and provides a beautiful, easy-to-use API.

Installation

First, you need to install it:

pip install requests

Core Features

  • Simple API: Methods like requests.get(), requests.post(), etc., are intuitive.
  • Automatic Content Decoding: It handles decoding of response content (like gzip) automatically.
  • Session Objects: Allow you to persist parameters across requests (e.g., cookies, headers).
  • JSON Handling: Built-in JSON decoding (response.json()).
  • Timeouts: Easy to set timeouts to prevent your script from hanging.
  • Authentication: Simple support for Basic, Digest, and other authentication schemes.
  • Streaming Requests: Download large files efficiently without loading them all into memory.
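As a minimal sketch of how a Session carries persistent settings into each request, the snippet below uses prepare_request to inspect the merged result without touching the network; the token value and endpoint here are placeholders:

```python
import requests

# A Session persists headers, cookies, and auth across requests.
session = requests.Session()
session.headers.update({'Authorization': 'token demo'})

# prepare_request merges session-level settings into a request
# without sending it, so we can see what would go over the wire.
req = requests.Request('GET', 'https://api.github.com/user', params={'page': 1})
prepared = session.prepare_request(req)

print(prepared.url)                       # https://api.github.com/user?page=1
print(prepared.headers['Authorization'])  # token demo
```

In real code the Session also reuses the underlying TCP connection across calls made through it, which is a noticeable speedup when hitting the same host repeatedly.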

requests Quick Start Guide

Here are the most common use cases.


a) Making a GET Request

This is the most common request type, used to retrieve data.

import requests
# The URL of the API you want to query
url = "https://api.github.com"
try:
    # Make a GET request (with a timeout so the script cannot hang forever)
    response = requests.get(url, timeout=10)
    # Raise an exception if the request was unsuccessful (e.g., 404, 500)
    response.raise_for_status()
    # The response object contains all the data
    print(f"Status Code: {response.status_code}")
    print(f"Headers: {response.headers}")
    # The content of the response is in text format
    print("\nResponse Text (first 200 chars):")
    print(response.text[:200])
    # If the response is JSON, you can parse it easily. Note the header
    # often includes a charset (e.g. "application/json; charset=utf-8"),
    # so test with "in" rather than equality.
    if 'application/json' in response.headers.get('content-type', ''):
        data = response.json()
        print(f"\nJSON Response (keys): {list(data.keys())}")
except requests.exceptions.HTTPError as errh:
    print(f"Http Error: {errh}")
except requests.exceptions.ConnectionError as errc:
    print(f"Error Connecting: {errc}")
except requests.exceptions.Timeout as errt:
    print(f"Timeout Error: {errt}")
except requests.exceptions.RequestException as err:
    print(f"Oops: Something Else: {err}")

b) Making a POST Request

Used to send data to a server, typically for creating a new resource.

import requests
import json # Using json.dumps to create a valid JSON string
url = "https://httpbin.org/post" # A testing service that echoes back what you send
# Data to send in the request body
payload = {
    'username': 'newuser',
    'email': 'newuser@example.com'
}
# Headers to specify the content type
headers = {
    'Content-Type': 'application/json'
}
try:
    # Make a POST request with the payload and headers
    # (requests.post(url, json=payload) would serialize and set the
    # Content-Type header for you in one step)
    response = requests.post(url, data=json.dumps(payload), headers=headers, timeout=10)
    response.raise_for_status() # Check for errors
    # The response will contain the data we sent
    response_data = response.json()
    print("Successfully posted data!")
    print("Server received this JSON:")
    print(json.dumps(response_data['json'], indent=2))
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

c) Passing URL Parameters

URL parameters are often used for filtering or searching. requests makes this easy.

import requests
# The base URL
base_url = "https://httpbin.org/get"
# Parameters to add to the URL
params = {
    'q': 'python programming',
    'sort': 'stars',
    'page': 1
}
try:
    # requests automatically encodes the dictionary and adds it to the URL
    response = requests.get(base_url, params=params, timeout=10)
    response.raise_for_status()
    print("Full URL sent:")
    print(response.url) # The URL with the parameters appended
    print("\nResponse JSON:")
    print(response.json())
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

d) Handling Headers and Authentication

You can easily add custom headers or handle authentication.

import requests
url = "https://api.github.com/user" # This endpoint requires authentication
# Your GitHub Personal Access Token
# IMPORTANT: Don't hardcode tokens in real code! Use environment variables.
token = "YOUR_GITHUB_TOKEN" 
# Headers for authentication
headers = {
    'Authorization': f'token {token}',
    'Accept': 'application/vnd.github.v3+json'
}
try:
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    user_data = response.json()
    print(f"Logged in as: {user_data['login']}")
    print(f"Name: {user_data['name']}")
except requests.exceptions.HTTPError as err:
    if err.response.status_code == 401:
        print("Authentication failed. Check your token.")
    else:
        print(f"HTTP Error: {err}")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

The Standard Library: urllib

urllib is part of Python's standard library, so no installation is needed. It is lower-level, considerably more verbose, and less user-friendly than requests. It is split into several modules:

  • urllib.request: For opening and reading URLs.
  • urllib.parse: For parsing URLs.
  • urllib.error: For exception handling.
  • urllib.robotparser: For parsing robots.txt files.
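To make that division of labor concrete, here is a short offline sketch of urllib.parse, which handles the URL encoding and splitting that requests does for you behind the scenes (the URL is just an example):

```python
from urllib.parse import urlencode, urlparse, parse_qs

# urlencode turns a dict into a query string -- the manual step that
# requests performs automatically via its params argument.
query = urlencode({'q': 'python programming', 'page': 1})
print(query)  # q=python+programming&page=1

# urlparse splits a URL into components; parse_qs reverses urlencode.
url = f"https://httpbin.org/get?{query}"
parts = urlparse(url)
print(parts.netloc)                # httpbin.org
print(parse_qs(parts.query)['q'])  # ['python programming']
```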

urllib.request Example (GET Request)

Compare this to the simple requests.get() call.

from urllib.request import urlopen
from urllib.error import URLError, HTTPError
url = "https://api.github.com"
try:
    # urlopen returns a file-like object (set a timeout so it cannot hang)
    with urlopen(url, timeout=10) as response:
        # Read the response content (bytes)
        body_bytes = response.read()
        # Decode the bytes to a string (e.g., UTF-8)
        body_str = body_bytes.decode('utf-8')
    print(f"Status Code: {response.status}")
    print(f"Headers: {response.headers}")
    print("\nResponse Text (first 200 chars):")
    print(body_str[:200])
except HTTPError as e:
    print(f"HTTP Error: {e.code} - {e.reason}")
except URLError as e:
    print(f"URL Error: {e.reason}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

As you can see, handling JSON, adding headers, or making a POST request with urllib is significantly more complex and requires manual work.
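For example, here is a rough sketch of the earlier POST rebuilt with urllib: you serialize the JSON, encode it to bytes, and set the header yourself before ever sending anything (the URL mirrors the httpbin example above):

```python
import json
from urllib.request import Request

payload = {'username': 'newuser', 'email': 'newuser@example.com'}

# urllib needs the body as bytes and the Content-Type set by hand --
# requests.post(url, json=payload) does all of this in one call.
body = json.dumps(payload).encode('utf-8')
req = Request(
    'https://httpbin.org/post',
    data=body,
    headers={'Content-Type': 'application/json'},
    method='POST',
)

print(req.get_method())                # POST
# urllib normalizes header names to capitalized form internally.
print(req.get_header('Content-type'))  # application/json
# Actually sending it would then require urlopen(req), plus manual
# .read(), .decode(), and json.loads() on the response bytes.
```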


Summary: requests vs. urllib

  • Ease of Use: requests has a simple, intuitive API; urllib is verbose and far less convenient.
  • Installation: requests needs pip install requests; urllib is built in, so no installation is needed.
  • JSON Handling: requests decodes JSON automatically via response.json(); with urllib you must import the json module and parse the body yourself.
  • Session Management: requests provides the requests.Session() object for persistence; with urllib this is possible but complex, requiring manual cookie handling.
  • URL Parameters: requests encodes a dictionary passed to the params argument; with urllib you must call urllib.parse.urlencode() yourself.
  • Authentication: requests has built-in support for various schemes; urllib requires manual handler setup.
  • Dependencies: requests pulls in external packages (such as urllib3 and idna); urllib has none, being part of the standard library.

Final Recommendation

  • For 99% of use cases, use requests. It's the industry standard, it's easier to write and read, and it will save you a lot of time and frustration.
  • Use urllib only if you cannot install external libraries (e.g., in some restricted environments) or if you are writing a very basic script where adding a dependency is undesirable.