
How do you set an IP proxy with Python requests?

When using the requests library in Python, you can control the IP address from which your requests appear to originate. This is most commonly done through a proxy server.


Here's a comprehensive guide covering the different methods, from the simplest to more advanced use cases.


The Core Concept: Proxies

A proxy server acts as an intermediary between your computer and the internet. Instead of your request going directly to the target website, it first goes to the proxy. The proxy then forwards your request to the website using its own IP address. The website sees the proxy's IP, not yours.
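A quick way to see this in action is to ask https://httpbin.org/ip for the address it sees, once directly and once through a proxy. The proxy address below is a placeholder you would replace with a real one:

import requests

url = 'https://httpbin.org/ip'

# Direct request: httpbin reports your own public IP
print('Direct:', requests.get(url, timeout=10).json())

# Proxied request: httpbin reports the proxy's IP instead
proxies = {
    'http': 'http://your_proxy_ip:port',
    'https': 'http://your_proxy_ip:port'
}
print('Proxied:', requests.get(url, proxies=proxies, timeout=10).json())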

Method 1: Using a Single Proxy (The Basics)

This is the most straightforward method. You provide the proxy's address and port to the proxies parameter in your requests call.

How to Format the Proxy Dictionary

The proxies parameter expects a dictionary mapping the target URL's scheme ('http' or 'https') to a proxy URL. It's good practice to always include both keys so every request is routed through the proxy.

proxies = {
  'http': 'http://your_proxy_ip:port',
  'https': 'http://your_proxy_ip:port'
}

Important: The scheme in the proxy URL (http://) describes how your script talks to the proxy, and it is usually http even when the target URL is https. For an https target, requests asks the proxy to open a tunnel (via an HTTP CONNECT request), and the encrypted connection to the destination passes through it.
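Related note: if your provider gives you a SOCKS proxy rather than an HTTP one, requests can use it too, provided the optional PySocks dependency is installed (pip install requests[socks]). Only the URL scheme changes; the host and port below are placeholders:

import requests

# Same idea, different scheme; 1080 is the conventional SOCKS port
proxies = {
    'http': 'socks5://your_proxy_ip:1080',
    'https': 'socks5://your_proxy_ip:1080'
}
# Use socks5h:// instead if you want DNS lookups to happen on the proxy side
response = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=10)
print(response.json())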

Complete Example

Let's say you have a proxy server at 192.168.1.100 on port 8080.

import requests
# The proxy server's address and port
proxy_ip = '192.168.1.100'
proxy_port = '8080'
# Format the proxy dictionary
proxies = {
    'http': f'http://{proxy_ip}:{proxy_port}',
    'https': f'http://{proxy_ip}:{proxy_port}'
}
try:
    # The URL you want to visit
    url = 'https://httpbin.org/ip'  # This URL simply returns the IP address it sees
    # Make the request, passing the proxies dictionary
    response = requests.get(url, proxies=proxies, timeout=10)
    # Check if the request was successful
    response.raise_for_status() 
    print(f"Successfully connected to {url}")
    print(f"Status Code: {response.status_code}")
    print("Response Body (the IP seen by the server):")
    print(response.json()) # httpbin.org/ip returns a JSON like {"origin": "x.x.x.x"}
except requests.exceptions.ProxyError as e:
    print(f"Proxy Error: Could not connect to the proxy. Check the IP and port. Error: {e}")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

When you run this, response.json() will show the IP address the server sees, which is the proxy's address (not your own public IP).


Method 2: Using a Username and Password (Authenticated Proxies)

Many proxies, especially paid ones, require authentication. You just need to add the username and password to the proxy URL.

The format is: http://username:password@ip:port
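One pitfall: if the username or password contains reserved characters such as @, :, or /, they must be percent-encoded before being embedded in the URL, or the proxy URL will be parsed incorrectly. A minimal sketch using the standard library (the credentials are made up):

from urllib.parse import quote

# Hypothetical credentials containing characters that would break the URL
proxy_user = quote('my@user', safe='')
proxy_pass = quote('p:ss/word', safe='')
proxy_url = f'http://{proxy_user}:{proxy_pass}@123.45.67.89:3128'
proxies = {'http': proxy_url, 'https': proxy_url}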

Example with Authentication

import requests
# Proxy with username and password
proxy_user = 'myuser'
proxy_pass = 'mypassword'
proxy_ip = '123.45.67.89'
proxy_port = '3128'
proxies = {
    'http': f'http://{proxy_user}:{proxy_pass}@{proxy_ip}:{proxy_port}',
    'https': f'http://{proxy_user}:{proxy_pass}@{proxy_ip}:{proxy_port}'
}
url = 'https://httpbin.org/ip'
try:
    response = requests.get(url, proxies=proxies, timeout=10)
    response.raise_for_status()
    print("Successfully connected using an authenticated proxy.")
    print("Response Body (the IP seen by the server):")
    print(response.json())
except requests.exceptions.ProxyError as e:
    print(f"Proxy Error: Authentication might have failed or proxy is down. Error: {e}")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

Method 3: Using a Proxy Rotation Service (Best for Scraping)

If you are scraping data at scale, using a single proxy will get you blocked quickly. The solution is to use a proxy rotation service. These services provide you with a list of proxies, and you can switch between them for each request.

Many services also provide a special "proxy URL" that automatically rotates the proxy for you. You just use that single URL in your proxies dictionary.
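A minimal sketch of that style, assuming a hypothetical gateway endpoint gate.rotatingservice.com:8080 supplied by your provider:

import requests

# Hypothetical gateway; the provider rotates the exit IP behind it
rotating_proxy = 'http://myuser:mypassword@gate.rotatingservice.com:8080'
proxies = {'http': rotating_proxy, 'https': rotating_proxy}

# Two requests through the same endpoint may exit from different IPs
for _ in range(2):
    print(requests.get('https://httpbin.org/ip', proxies=proxies, timeout=10).json())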

Example: Rotating Through a Proxy List

The example below does the rotation manually instead: it keeps a small list of authenticated proxies and picks one at random for each request.

import requests
import random
# A list of proxies from a rotation service
# In a real scenario, you would fetch this list from an API
PROXY_LIST = [
    'http://user1:pass1@proxy1.rotatingservice.com:8080',
    'http://user2:pass2@proxy2.rotatingservice.com:8080',
    'http://user3:pass3@proxy3.rotatingservice.com:8080',
]
def make_request_through_random_proxy(url):
    """Makes a request using a randomly selected proxy from the list."""
    proxy = random.choice(PROXY_LIST)
    proxies = {
        'http': proxy,
        'https': proxy
    }
    try:
        print(f"Requesting through proxy: {proxy}")
        response = requests.get(url, proxies=proxies, timeout=10)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Request failed with proxy {proxy}. Error: {e}")
        return None
# --- Main execution ---
target_url = 'https://httpbin.org/ip'
for i in range(5):
    print(f"\n--- Request #{i+1} ---")
    ip_data = make_request_through_random_proxy(target_url)
    if ip_data:
        print(f"IP received from server: {ip_data.get('origin')}")

This script will make 5 requests, each time using a different, randomly chosen proxy from the list.
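A common refinement, sketched below, is to retry a failed request through another randomly chosen proxy instead of giving up after the first error. It reuses the make_request_through_random_proxy function defined above:

def make_request_with_retries(url, attempts=3):
    """Try up to `attempts` randomly chosen proxies before giving up."""
    for _ in range(attempts):
        result = make_request_through_random_proxy(url)
        if result is not None:
            return result
    return None

Because the proxy is chosen at random, a retry can occasionally land on the same dead proxy; a production version would track failures and skip known-bad entries.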


Important Considerations and Best Practices

  1. Timeouts: Proxies can be slow or unreliable. Always use a timeout argument in your requests call to prevent your script from hanging indefinitely.

    response = requests.get(url, proxies=proxies, timeout=15) # 15 second timeout
  2. HTTPS and SSL Verification:

    • Some proxies, especially older or free ones, might not have valid SSL certificates.
    • If you get an SSLError, you might need to disable SSL verification. This is less secure, so use it with caution.
      # WARNING: This disables SSL certificate verification
      response = requests.get(url, proxies=proxies, verify=False)

      You can also suppress the warning:

      import urllib3
      urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
  3. Finding Proxies:

    • Free Proxies: Search for "free proxy list". Be warned: they are often unreliable, slow, and may be monitored or contain malware. Never use them for anything sensitive.
    • Paid Proxy Services: For serious scraping or automation, use a reputable paid service. They offer reliability, speed, authentication, and rotation. Examples include Bright Data, Smartproxy, OxyLabs, etc.
  4. Session Objects: If you are making multiple requests to the same domain, it's more efficient to use a Session object. You can set the proxies on the session, and they will be used for all subsequent requests made with that session.

    import requests
    proxies = {'http': 'http://192.168.1.100:8080', 'https': 'http://192.168.1.100:8080'}
    url = 'https://httpbin.org'
    # Create a session and set the proxies
    with requests.Session() as session:
        session.proxies = proxies
        # All requests made with this session will use the proxy
        response1 = session.get(url + '/get')
        response2 = session.get(url + '/ip')
        print("Session Request 1 Origin:", response1.json()['origin'])
        print("Session Request 2 Origin:", response2.json()['origin'])