杰瑞科技汇

How Do You Use FormRequest Correctly in Python?

In Python, a "form request" isn't a single built-in object, unlike in some frameworks (e.g., the FormRequest class in Scrapy). Instead, it's a general concept: sending data to a server the same way an HTML form would.


This is typically done using the HTTP POST method. The data is sent in the body of the request, encoded in a specific format.

Let's break down the concept and then look at the most common ways to implement it in Python.


The Core Concept: What is a Form Request?

When you fill out a form on a website (like a login form, a search bar, or a contact form) and click "Submit", your browser does the following:

  1. Collects Data: Gathers all the data from the input fields (<input>, <textarea>, <select>, etc.).
  2. Encodes Data: Formats the data into a key-value pair structure. The two most common encodings are:
    • application/x-www-form-urlencoded: This is the default. Data looks like name=John+Doe&email=john%40example.com&message=Hello+World. Spaces are replaced by + and special characters are percent-encoded.
    • multipart/form-data: This is used when the form contains file uploads. It's more complex and involves boundaries to separate the different parts of the form.
  3. Sends an HTTP Request: The browser sends an HTTP POST request to the URL specified in the form's action attribute. The encoded data is sent in the body of the request.
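
The encoding step can be reproduced with the standard library alone. The snippet below is a minimal sketch that reuses the sample values from the encoding example above; it encodes a dictionary and then decodes it again, roughly as a server would.

```python
from urllib.parse import urlencode, parse_qs

# Sample field values taken from the encoding example above
fields = {
    "name": "John Doe",
    "email": "john@example.com",
    "message": "Hello World",
}

# urlencode() joins the pairs with "&", turns spaces into "+",
# and percent-encodes reserved characters such as "@"
body = urlencode(fields)
print(body)

# parse_qs() reverses the encoding; each value comes back as a list
# because a form may repeat the same field name
decoded = parse_qs(body)
print(decoded)
```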

In Python, you replicate this process using libraries like requests.


The Standard Library: urllib

Before third-party libraries were popular, Python's standard library urllib was used for this. It's a bit more verbose.

import urllib.error
import urllib.parse
import urllib.request
# The URL where the form data will be sent
url = 'https://httpbin.org/post' # A test endpoint that echoes back what you send
# The form data as a dictionary
form_data = {
    'username': 'test_user',
    'password': 'secure_password123',
    'message': 'Hello from urllib!'
}
# Encode the dictionary into the application/x-www-form-urlencoded format
# .encode('utf-8') converts the resulting string to bytes, which the request body must be
encoded_data = urllib.parse.urlencode(form_data).encode('utf-8')
# Create a request object
# We specify the data and the Content-Type header
request = urllib.request.Request(url, data=encoded_data, method='POST')
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
# Send the request and get the response
try:
    with urllib.request.urlopen(request) as response:
        # Read the response content
        response_body = response.read()
        print("Status Code:", response.status)
        print("Response Body:")
        print(response_body.decode('utf-8'))
except urllib.error.URLError as e:
    print(f"Failed to reach the server. Reason: {e.reason}")
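
When debugging, it can help to build the request and inspect it before anything goes over the network. The following is a small sketch along those lines; build_form_request is a hypothetical helper name introduced here for illustration, not part of urllib.

```python
import urllib.parse
import urllib.request


def build_form_request(url, fields):
    """Return an unsent urllib Request carrying url-encoded form data.

    build_form_request is a hypothetical helper, not part of urllib.
    """
    body = urllib.parse.urlencode(fields).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request


req = build_form_request("https://httpbin.org/post", {"username": "test_user"})

# Nothing has been sent yet; these calls only inspect the Request object
print(req.get_method())
print(req.data)
# urllib stores header names capitalized as "Content-type"
print(req.get_header("Content-type"))
```

Passing the returned object to urllib.request.urlopen() would then send it.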

The Modern & Recommended Approach: requests Library

The requests library is the de facto standard for making HTTP requests in Python. It's much simpler and more intuitive.

First, you need to install it:

pip install requests

Here's how to do the same thing with requests:

import requests
# The URL where the form data will be sent
url = 'https://httpbin.org/post' # A test endpoint
# The form data as a dictionary
form_data = {
    'username': 'test_user',
    'password': 'secure_password123',
    'message': 'Hello from the requests library!'
}
# The `requests.post()` function automatically:
# 1. Encodes the dictionary into application/x-www-form-urlencoded.
# 2. Sets the 'Content-Type' header correctly.
# 3. Sends the data in the request body.
response = requests.post(url, data=form_data)
# Raise requests.HTTPError if the server returned a 4xx or 5xx status
response.raise_for_status()
# Print the response content
print("Status Code:", response.status_code)
print("Response Body (JSON):")
# httpbin.org/post returns the data we sent in a JSON format
print(response.json())
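
To confirm what requests does with the data argument, you can prepare a request without sending it and look at the result. This is a minimal sketch; the field values are placeholders chosen for illustration.

```python
import requests

form_data = {"username": "test_user", "message": "Hello World"}

# Build the request without sending it so the encoded form can be inspected
prepared = requests.Request(
    "POST", "https://httpbin.org/post", data=form_data
).prepare()

# The header and body that would be sent over the wire
print(prepared.headers["Content-Type"])
print(prepared.body)
```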

Handling Multipart/Form-Data (with Files)

If your form includes file uploads, you must use multipart/form-data. The requests library handles this seamlessly by using the files argument.

import requests
url = 'https://httpbin.org/post'
# Form data (fields)
form_data = {
    'description': 'A profile picture for John Doe'
}
# File to upload
# The key 'profile_pic' should match the name attribute of the <input type="file"> in the HTML form.
# The tuple contains (filename, file_object, content_type)
with open('profile_pic.jpg', 'rb') as f:
    files = {
        'profile_pic': ('profile_pic.jpg', f, 'image/jpeg')
    }
    # The `requests.post()` function will automatically set the Content-Type
    # to 'multipart/form-data' and generate the correct boundary.
    response = requests.post(url, data=form_data, files=files)
print("Status Code:", response.status_code)
print("Response Body (JSON):")
print(response.json())
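
To make the boundary mechanism concrete, here is a hand-rolled sketch of roughly what a multipart/form-data body looks like. It is for illustration only: encode_multipart is a hypothetical helper, the file bytes are fake, and in real code the requests files argument should do this work for you.

```python
import uuid


def encode_multipart(fields, files):
    """Build a multipart/form-data body by hand (illustration only).

    fields: dict of name -> str value
    files:  dict of name -> (filename, content_bytes, content_type)
    Returns (content_type_header, body_bytes).
    """
    boundary = uuid.uuid4().hex  # random string that separates the parts
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}".encode())
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        lines.append(b"")  # blank line between part headers and part body
        lines.append(value.encode("utf-8"))
    for name, (filename, content, content_type) in files.items():
        disposition = (
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'
        )
        lines.append(f"--{boundary}".encode())
        lines.append(disposition.encode())
        lines.append(f"Content-Type: {content_type}".encode())
        lines.append(b"")
        lines.append(content)
    lines.append(f"--{boundary}--".encode())  # closing delimiter
    lines.append(b"")
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)


content_type, body = encode_multipart(
    {"description": "A profile picture for John Doe"},
    {"profile_pic": ("profile_pic.jpg", b"\xff\xd8fake-jpeg-bytes", "image/jpeg")},
)
print(content_type)
print(body.decode("latin-1"))
```

Each part starts with the boundary line, carries its own headers, and the body ends with the boundary followed by two extra dashes.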

Advanced Use Case: Handling CSRF Tokens with requests-html

Modern websites often use Cross-Site Request Forgery (CSRF) tokens for security. These are hidden input values in the form that you must send back with your request. Parsing HTML to find these tokens can be tricky.

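The requests-html library is convenient for this: it combines a requests-style session with an HTML parser that supports CSS selectors, and it can optionally render JavaScript-driven pages when you call response.html.render().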

First, install it:

pip install requests-html

Here’s a conceptual example of how you'd log in to a site that uses a CSRF token.

from requests_html import HTMLSession
# Start a session to persist cookies (the CSRF token is often stored in a cookie)
session = HTMLSession()
# 1. GET the login page to retrieve the CSRF token
login_url = 'https://example.com/login'
response = session.get(login_url)
# 2. Find the CSRF token in the HTML
# This selector is just an example; you need to inspect the actual website's HTML
token_input = response.html.find('input[name="csrf_token"]', first=True)
if token_input is None:
    raise RuntimeError("CSRF token input not found; check the selector")
csrf_token = token_input.attrs['value']
print(f"Found CSRF Token: {csrf_token}")
# 3. Prepare the login data with the token
login_data = {
    'username': 'your_username',
    'password': 'your_password',
    'csrf_token': csrf_token  # Include the token in the form data
}
# 4. POST the login data to the login URL
# The session object will automatically send cookies with the request
response = session.post('https://example.com/login', data=login_data)
# 5. Check if login was successful
if "Welcome" in response.text:
    print("Login successful!")
    # You can now use the same session object to access protected pages
    dashboard_response = session.get('https://example.com/dashboard')
    print("Dashboard page title:", dashboard_response.html.find('title', first=True).text)
else:
    print("Login failed.")
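
If the token sits in plain HTML and no JavaScript rendering is needed, the extraction step can also be sketched with the standard library. In the example below, the field name csrf_token, the sample HTML, and the helper names are all assumptions for illustration; a real site will use its own markup.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        input_type = (attributes.get("type") or "").lower()
        if input_type == "hidden" and attributes.get("name"):
            self.hidden_fields[attributes["name"]] = attributes.get("value") or ""


def extract_csrf_token(html, field_name="csrf_token"):
    """Return the value of the hidden input called field_name, or raise."""
    parser = HiddenInputParser()
    parser.feed(html)
    token = parser.hidden_fields.get(field_name)
    if token is None:
        raise ValueError(f"No hidden input named {field_name!r} found")
    return token


# Made-up login form used only to exercise the helper
sample_html = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

token = extract_csrf_token(sample_html)
print(token)
```

The returned token would then go into the login data dictionary exactly as in step 3 above.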

Summary

| Method | Best For | Pros | Cons |
|---|---|---|---|
| urllib | Simple scripts, environments where you can't install packages. | Part of the standard library. | Verbose, less intuitive API. |
| requests | Most common use cases. Simple form data, JSON APIs, file uploads. | Extremely easy to use, readable syntax, handles encoding and headers automatically. | Requires an external installation (pip install requests). |
| requests-html | Websites with CSRF tokens or that render content with JavaScript. | Can parse HTML, execute JS, and maintain sessions easily. | Slower than requests due to JS rendering. |

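For most form submissions in Python, start with the requests library. It is the de facto standard and usually the simplest way to handle form requests; reach for urllib when you cannot install packages, and for requests-html when you need HTML parsing or JavaScript rendering.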
