杰瑞科技汇

How do you use URL encoding in Python?

This guide explains how to use URL encoding in Python, covering the standard library, the third-party requests library, and best practices.

What is URL Encoding?

URL encoding (also known as percent-encoding) is a mechanism to encode information in a Uniform Resource Identifier (URI). Certain characters, like spaces, &, ?, and #, have special meaning in URLs. If you want to include these characters as data, you must encode them.

For example, a space is encoded as %20, and a # is encoded as %23.
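
To make the mechanism concrete, the sketch below percent-encodes text by hand: each UTF-8 byte becomes % followed by two hex digits. The helper name percent_encode is hypothetical and exists only for illustration; in real code you would call urllib.parse.quote(), which is shown alongside for comparison.

```python
from urllib.parse import quote


def percent_encode(text: str) -> str:
    """Percent-encode every character of text (illustrative helper, not a library function)."""
    # Each UTF-8 byte is written as '%' plus two uppercase hex digits.
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


space = percent_encode(" ")    # '%20'
hash_sign = percent_encode("#")  # '%23'
chinese = percent_encode("中")  # '%E4%B8%AD' (three UTF-8 bytes)

# quote() produces the same result for these characters. Unlike the helper,
# it leaves letters, digits and the characters "_.-~" unencoded.
print(space, hash_sign, chinese)
print(quote(" "), quote("#"), quote("中"))
```

Non-ASCII characters expand to one escape per byte, which is why a single Chinese character turns into three %XX groups.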


The Standard Library: urllib.parse

Python's built-in urllib.parse module is the standard and recommended way to handle URL encoding and decoding. It provides two main functions: quote() and urlencode().

A. urllib.parse.quote() - For encoding individual strings

Use this function to encode a single string that will be used as a component of a URL (like a path, a query parameter value, etc.).

Syntax: urllib.parse.quote(string, safe='/', encoding=None, errors=None)

  • string: The string to encode.
  • safe: An additional string of characters that should not be encoded. The default is '/', which is useful for encoding a full path.
  • encoding: The encoding to use (e.g., 'utf-8'). Defaults to 'utf-8'.
  • errors: Specifies how to handle encoding errors (e.g., 'strict', 'ignore').

Example: Encoding a Path and a Query Parameter

from urllib.parse import quote
# A path segment that contains a space
path_segment = "my documents"
encoded_path = quote(path_segment)
print(f"Path: /{encoded_path}")
# Output: Path: /my%20documents
# A query parameter value that contains special characters
query_value = "100% natural & healthy"
encoded_query_value = quote(query_value)
print(f"URL: ?search={encoded_query_value}")
# Output: URL: ?search=100%25%20natural%20%26%20healthy
# Note: % becomes %25 and & becomes %26
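
Because safe defaults to '/', quote() leaves slashes alone, which is not always what you want for a query value or a single path segment. The sketch below compares the default with safe='' and with quote_plus(), a related function in the same module; the sample string is made up for illustration.

```python
from urllib.parse import quote, quote_plus

value = "a/b c&d"

# Default safe='/' keeps slashes, which suits a full path.
as_path = quote(value)                 # 'a/b%20c%26d'

# safe='' also encodes '/', so the value cannot be mistaken for a path separator.
as_component = quote(value, safe="")   # 'a%2Fb%20c%26d'

# quote_plus() encodes '/' as well and writes spaces as '+',
# the convention used for form-style query strings.
as_form = quote_plus(value)            # 'a%2Fb+c%26d'

print(as_path)
print(as_component)
print(as_form)
```

As a rule of thumb, pass safe='' whenever the string is a single component rather than a path that legitimately contains slashes.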

B. urllib.parse.urlencode() - For encoding dictionaries of query parameters

This is the most convenient function for building the query string part of a URL (the part after the ?). It takes a dictionary (or a sequence of key-value pairs) and returns a properly formatted query string.

Syntax: urllib.parse.urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=quote_plus)

  • query: A dictionary where keys are parameter names and values are parameter values.
  • doseq: If True, values that are sequences (like lists) are encoded as multiple key-value pairs (e.g., ?a=1&a=2). If False, the sequence is converted to its string form and encoded as a single value (e.g., [1, 2] becomes ?a=%5B1%2C+2%5D).

Example: Encoding a Dictionary

from urllib.parse import urlencode
# A dictionary of query parameters
params = {
    'search': 'python tutorials',
    'category': 'web development',
    'page': 1
}
# Encode the dictionary into a query string
query_string = urlencode(params)
print(f"Full URL: https://example.com/search?{query_string}")
# Output: Full URL: https://example.com/search?search=python+tutorials&category=web+development&page=1
# Note: Spaces are encoded as + by default for query parameters.

Example: Handling Multiple Values for a Single Key

from urllib.parse import urlencode
# A dictionary where one key has a list of values
params = {
    'filter': ['new', 'popular'],
    'sort': 'date'
}
# doseq=True ensures each item in the list gets its own parameter
query_string = urlencode(params, doseq=True)
print(f"URL: ?{query_string}")
# Output: URL: ?filter=new&filter=popular&sort=date
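
The remaining options are easiest to see side by side. The sketch below shows what happens to a list without doseq, how quote_via=quote switches spaces from + to %20, and how a sequence of pairs can stand in for a dictionary. The parameter names and values are invented for illustration.

```python
from urllib.parse import quote, urlencode

params = {"filter": ["new", "popular"], "q": "python tutorials"}

# Without doseq, the list is converted with str() and encoded as one value.
single = urlencode(params)
# 'filter=%5B%27new%27%2C+%27popular%27%5D&q=python+tutorials'

# quote_via=quote encodes spaces as '%20' instead of '+'.
percent = urlencode(params, doseq=True, quote_via=quote)
# 'filter=new&filter=popular&q=python%20tutorials'

# A sequence of (key, value) pairs allows repeated keys in an explicit order.
pairs = urlencode([("a", 1), ("a", 2), ("b", "x y")])
# 'a=1&a=2&b=x+y'

print(single)
print(percent)
print(pairs)
```

Some servers only accept %20 for spaces in query strings, which is the usual reason to pass quote_via=quote.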

Third-Party Libraries: requests

The popular requests library simplifies working with HTTP requests and handles URL encoding for you automatically when you pass parameters. This is often the easiest method in practice.

A. requests.get() with params

When you use the params argument in a requests function, it automatically takes a dictionary and encodes it into a query string.

Example: Automatic Encoding with requests

import requests
# The dictionary of parameters
params = {
    'q': 'what is a url?',
    'source': 'web',
    'limit': 10
}
# requests automatically encodes the 'params' dictionary
# You don't need to call urlencode() yourself
response = requests.get('https://api.example.com/search', params=params)
print(f"Request URL: {response.url}")
# Output: Request URL: https://api.example.com/search?q=what+is+a+url%3F&source=web&limit=10
# Note: ? becomes %3F and spaces become +
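
If requests is not available, a similar convenience can be approximated with the standard library. The sketch below is not how requests is implemented. The helper add_query_params is hypothetical, and it assumes you want to keep any query string already present in the URL and append the new parameters after it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_params(url, params):
    """Hypothetical helper: append params to url, keeping any existing query."""
    parts = urlsplit(url)
    # Decode the pairs already in the URL so they are re-encoded consistently.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.extend(params.items())
    # doseq=True expands list values into repeated keys.
    new_query = urlencode(pairs, doseq=True)
    return urlunsplit(parts._replace(query=new_query))


url = add_query_params(
    "https://api.example.com/search?lang=en",
    {"q": "what is a url?", "limit": 10},
)
print(url)
# https://api.example.com/search?lang=en&q=what+is+a+url%3F&limit=10
```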

Decoding URLs

Encoded URLs often need to be decoded again, for example when reading parameters from an incoming request. The urllib.parse module has unquote() for this.

urllib.parse.unquote()

This function converts percent-encoded characters back to their original form.

Example: Decoding a URL

from urllib.parse import unquote
# A URL-encoded string
encoded_string = "Hello%20World%21%20This%20is%20a%20test."
decoded_string = unquote(encoded_string)
print(f"Encoded: {encoded_string}")
print(f"Decoded: {decoded_string}")
# Output:
# Encoded: Hello%20World%21%20This%20is%20a%20test.
# Decoded: Hello World! This is a test.
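
One caveat: unquote() does not treat + as a space, so query strings produced by urlencode() need unquote_plus(), which lives in the same module. A short sketch with made-up input strings:

```python
from urllib.parse import unquote, unquote_plus

encoded = "python+tutorials%21"

# unquote() only reverses %XX escapes; '+' is left as a literal plus sign.
plain = unquote(encoded)             # 'python+tutorials!'

# unquote_plus() also turns '+' back into a space.
form_style = unquote_plus(encoded)   # 'python tutorials!'

# A literal plus sign that was encoded as %2B survives the round trip.
arithmetic = unquote_plus("1%2B1+%3D+2")   # '1+1 = 2'

print(plain)
print(form_style)
print(arithmetic)
```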

Summary: Which One to Use?

Use Case | Recommended Function | Why?
--- | --- | ---
Manually building a URL string | urllib.parse.quote() | Gives you fine-grained control over which characters are considered "safe".
Building a query string from a dictionary | urllib.parse.urlencode() | The standard, clean, and Pythonic way. Handles encoding for you.
Making HTTP requests with parameters | requests.get(..., params=...) | The easiest and most common way. It's convenient and less error-prone.
Decoding a URL or part of a URL | urllib.parse.unquote() | The standard function for reversing the encoding process.

Complete Example: Building a Full URL

Here is a complete example showing how to build a full, valid URL from parts.

from urllib.parse import quote, urlencode, urlunparse, urlparse
# 1. Define the components of the URL
base_url = "https://www.example.com/search"
search_term = "python urllib tutorial"
filters = ["beginner", "advanced"]
page_number = 2
# 2. Encode the path component (if it had special chars)
# In this case, the base_url is already clean.
# 3. Encode the query parameters
query_params = {
    'q': search_term,
    'filter': filters, # This is a list
    'page': page_number
}
encoded_query = urlencode(query_params, doseq=True)
# 4. Construct the final URL
# urlparse splits base_url into a 6-tuple: (scheme, netloc, path, params, query, fragment)
# _replace swaps in the encoded query, and urlunparse joins the parts back together
parts = urlparse(base_url)
final_url = urlunparse(parts._replace(query=encoded_query))
print("--- Manual Construction ---")
print(f"Final URL: {final_url}")
# Output: Final URL: https://www.example.com/search?q=python+urllib+tutorial&filter=beginner&filter=advanced&page=2
# 5. Easier way with requests (recommended for HTTP requests)
import requests
params = {
    'q': search_term,
    'filter': filters,
    'page': page_number
}
requests_url = "https://www.example.com/search"
# requests builds and encodes the URL when the request is prepared;
# list values become repeated keys, and nothing is sent over the network here
final_url_requests = requests.Request('GET', requests_url, params=params).prepare().url
print("\n--- Using requests ---")
print(f"Final URL: {final_url_requests}")
# Output: Final URL: https://www.example.com/search?q=python+urllib+tutorial&filter=beginner&filter=advanced&page=2
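
To check a constructed URL, it can help to parse it back into its parts. The sketch below takes the URL from the example output above and recovers the original parameters with urlsplit() and parse_qs(), both from urllib.parse.

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "https://www.example.com/search"
    "?q=python+urllib+tutorial&filter=beginner&filter=advanced&page=2"
)

parts = urlsplit(url)
# parse_qs decodes '+' and %XX escapes and groups repeated keys into lists.
query = parse_qs(parts.query)

print(parts.netloc)  # www.example.com
print(parts.path)    # /search
print(query)
# {'q': ['python urllib tutorial'], 'filter': ['beginner', 'advanced'], 'page': ['2']}
```

Every value comes back as a list of strings, so numeric parameters such as page need an explicit int() conversion.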