A comprehensive guide to rate limiting in Python: the core concepts, popular libraries, and working code examples.

## What is Rate Limiting?
Rate limiting is a technique used to control the rate at which a user or client can make requests to a server or API. It's a crucial mechanism for:
- Preventing Abuse: Stopping bots or malicious actors from overwhelming your service.
- Ensuring Fairness: Guaranteeing that all users have equitable access to a resource.
- Managing Load: Protecting your backend from being overloaded, which can cause slowdowns or crashes.
- Cost Control: For third-party APIs, it helps you stay within your usage quota and avoid unexpected charges.
## Core Concepts
Before diving into code, let's understand the main types of rate limiters:
- Fixed Window Counter: This is the simplest approach. You define a window (e.g., 1 minute) and a limit (e.g., 100 requests). The counter resets at the start of each new window.
  - Pros: Simple to implement.
  - Cons: "Burstiness" problem. If a user makes 100 requests in the last second of a window, they can immediately make another 100 in the first second of the next window.
- Sliding Window Log: This approach is more precise. For each request, you store its timestamp. When a new request comes in, you remove all timestamps older than your time window (e.g., 1 minute), then check whether the number of remaining timestamps is under the limit.
  - Pros: Prevents the burstiness problem.
  - Cons: Can be memory-intensive, as you need to store a timestamp for every request in the current window.
- Token Bucket: Imagine a "bucket" that holds tokens. Tokens are added to the bucket at a fixed rate (e.g., 10 tokens per second), and each request consumes one token. If the bucket is empty, the request is denied.
  - Pros: Smooths out request rates and is very efficient. It allows bursts of traffic (while the bucket has tokens) but enforces a long-term average rate.
  - Cons: Slightly more complex to implement from scratch.
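To make the first variant concrete, here is a minimal sketch of a fixed window counter; the class and parameter names are illustrative, not from any library:

```python
import time

class FixedWindowCounter:
    """Allow at most `limit` requests per `window_seconds` window.

    The counter resets at each window boundary, which is exactly what
    permits the 2x-limit burst straddling two adjacent windows.
    """

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self):
        now = time.monotonic()
        if now - self.window_start >= self.window_seconds:
            # A new window has begun: reset the counter
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

Note the use of `time.monotonic()` rather than `time.time()`: a monotonic clock cannot jump backwards if the system clock is adjusted.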
## Method 1: The Manual Approach (Token Bucket)
Understanding how to build a rate limiter yourself is a great learning experience. Here's a simple, thread-safe implementation of a token bucket limiter using `time` and `threading.Lock`:
```python
import time
import threading

class TokenBucket:
    def __init__(self, rate, capacity):
        """
        Initializes the Token Bucket.

        :param rate: The rate at which tokens are added (tokens per second).
        :param capacity: The maximum number of tokens the bucket can hold.
        """
        self.rate = rate  # tokens per second
        self.capacity = capacity
        self.tokens = capacity
        self.last_refilled = time.monotonic()
        self.lock = threading.Lock()

    def _refill(self):
        """Refills the bucket based on elapsed time. Caller must hold the lock."""
        now = time.monotonic()
        time_passed = now - self.last_refilled
        # Add tokens, but don't exceed capacity
        self.tokens = min(self.capacity, self.tokens + time_passed * self.rate)
        self.last_refilled = now

    def consume(self, tokens=1):
        """
        Consumes a number of tokens from the bucket.

        :param tokens: The number of tokens to consume.
        :return: True if tokens were consumed, False otherwise.
        """
        # Refill and consume under a single lock so the
        # check-and-decrement is atomic across threads.
        with self.lock:
            self._refill()
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False

# --- Example Usage ---
if __name__ == "__main__":
    # Allow 10 requests per second, with a burst capacity of 10
    limiter = TokenBucket(rate=10, capacity=10)

    for i in range(15):
        if limiter.consume(1):
            print(f"Request {i+1}: Allowed")
        else:
            print(f"Request {i+1}: Rate Limited! Waiting...")
            # In a real app, you might wait and try again
            time.sleep(0.1)  # Simulate waiting a bit
        time.sleep(0.05)  # Simulate a small delay between requests
```
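The "wait and try again" branch above can be factored into a small polling helper. This is a sketch, not part of the class; `acquire_blocking` and its parameters are illustrative names:

```python
import time

def acquire_blocking(consume, tokens=1, poll_interval=0.05, timeout=5.0):
    """Repeatedly call `consume(tokens)` until it succeeds or `timeout` elapses.

    Returns True if the tokens were obtained, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if consume(tokens):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

Usage would be `acquire_blocking(limiter.consume)` in place of the `else` branch; polling trades a little latency for simplicity, whereas a production limiter would compute the exact wait time from the refill rate.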
## Method 2: Using Popular Libraries
For production code, it's almost always better to use a well-tested, feature-rich library.
### ratelimit
A simple and popular decorator-based library.
Installation:

```shell
pip install ratelimit
```
Example:

```python
from ratelimit import limits, sleep_and_retry
import time

# Define the rate limit: 10 calls per 10 seconds
CALLS_PER_10_SECONDS = 10

@sleep_and_retry
@limits(calls=CALLS_PER_10_SECONDS, period=10)
def limited_api_call():
    print("API call successful at:", time.time())
    return "data"

if __name__ == "__main__":
    print("Starting 15 rapid calls...")
    for i in range(15):
        limited_api_call()
        # The @sleep_and_retry decorator will automatically pause
        # if the limit is reached.
```
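What the two decorators do together can be approximated in pure stdlib Python. This is a simplified single-threaded sketch of the sleep-until-allowed pattern (fixed window), not the library's actual implementation:

```python
import functools
import time

def rate_limited(calls, period):
    """Decorator sketch: allow `calls` invocations per `period` seconds,
    sleeping until the window resets when the budget is exhausted."""
    def decorator(func):
        state = {"window_start": time.monotonic(), "count": 0}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            if now - state["window_start"] >= period:
                # The window has elapsed: start a fresh one
                state["window_start"] = now
                state["count"] = 0
            if state["count"] >= calls:
                # Budget spent: sleep out the rest of the window
                time.sleep(period - (now - state["window_start"]))
                state["window_start"] = time.monotonic()
                state["count"] = 0
            state["count"] += 1
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

A real implementation would also need a lock for thread safety, which is one reason to prefer the library over rolling your own.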
### pyrate-limiter
A more powerful and flexible library that supports different rate-limiting algorithms and backends, and can be used as a decorator or called directly.
Installation:

```shell
pip install pyrate-limiter
```
Example (decorator, using the pyrate-limiter 2.x API; the 3.x release renamed these classes, so check the version you install):

```python
from pyrate_limiter import BucketFullException, Duration, Limiter, RequestRate
import time

# Define the rate limit: 10 calls per 10 seconds
limiter = Limiter(RequestRate(10, Duration.SECOND * 10))

@limiter.ratelimit("api_call")
def api_call():
    print("API call successful at:", time.time())
    return "data"

if __name__ == "__main__":
    print("Starting 15 rapid calls...")
    for i in range(15):
        try:
            api_call()
        except BucketFullException:
            # The decorator raises by default; pass delay=True to
            # limiter.ratelimit to wait instead of raising.
            print(f"Call {i+1} was rate-limited.")
```
### fastapi-limiter
If you are using the FastAPI web framework, this is the go-to library. It integrates with FastAPI's dependency injection system and stores its counters in Redis, so limits are shared across workers.
Installation:

```shell
pip install fastapi-limiter
```
Example (FastAPI application; note that the limiter is applied as a dependency, not a decorator, and a running Redis server is required):

```python
from fastapi import Depends, FastAPI
from fastapi_limiter import FastAPILimiter
from fastapi_limiter.depends import RateLimiter
import redis.asyncio as redis
import uvicorn

app = FastAPI()

@app.on_event("startup")
async def startup():
    # fastapi-limiter keeps its counters in Redis, so the limit
    # holds across multiple workers or containers.
    connection = redis.from_url("redis://localhost", encoding="utf8")
    await FastAPILimiter.init(connection)

@app.on_event("shutdown")
async def shutdown():
    await FastAPILimiter.close()

@app.get("/public")
def public_endpoint():
    return {"message": "This is a public endpoint, no rate limit."}

# Allow 5 calls per 10 seconds
@app.get("/limited", dependencies=[Depends(RateLimiter(times=5, seconds=10))])
async def limited_endpoint():
    return {"message": "This is a rate-limited endpoint."}

if __name__ == "__main__":
    uvicorn.run("your_file_name:app", host="0.0.0.0", port=8000)
```
## Method 3: Using a Dedicated Service (Redis)
For distributed systems (e.g., multiple servers or containers), an in-memory limiter won't work. You need a shared, centralized store, and Redis is a natural fit.
Redis's EVAL command runs Lua scripts atomically on the server, which makes it an efficient way to implement a rate limiter: the check and the update happen as a single operation, with no race between clients.
Installation:

```shell
pip install redis
```
Concept:
We'll use the ZADD (Sorted Set) and ZREMRANGEBYSCORE commands.
- The Sorted Set will store request timestamps as members.
- The score will be the timestamp itself.
- We'll remove all timestamps older than our time window.
- We'll check the number of remaining items in the set.
Python Code:

```python
import redis
import time

# Connect to your Redis server
r = redis.Redis(host='localhost', port=6379, db=0)

def is_rate_allowed(key, limit, window_seconds):
    """
    Checks if a request is allowed based on a rate limit.

    :param key: A unique key for the user or client (e.g., user_id:ip).
    :param limit: The maximum number of allowed requests.
    :param window_seconds: The size of the time window in seconds.
    :return: True if allowed, False if rate limited.
    """
    # Lua script for an atomic check-and-add (sliding window log)
    lua_script = """
    local key = KEYS[1]
    local limit = tonumber(ARGV[1])
    local window = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])
    -- Remove all timestamps older than the window
    redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
    -- Get the current count
    local current = redis.call('ZCARD', key)
    -- Check if we can add the new request
    if current < limit then
        -- In production, use a unique member (e.g. timestamp plus a
        -- request id) to avoid collisions on identical timestamps.
        redis.call('ZADD', key, now, now)
        redis.call('EXPIRE', key, window)
        return 1
    else
        return 0
    end
    """
    now = time.time()
    # Execute the script atomically on the server
    allowed = r.eval(lua_script, 1, key, limit, window_seconds, now)
    return bool(allowed)

# --- Example Usage ---
USER_KEY = "user:123:rate_limit"
LIMIT = 5
WINDOW = 10  # seconds

for i in range(7):
    if is_rate_allowed(USER_KEY, LIMIT, WINDOW):
        print(f"Request {i+1}: Allowed")
    else:
        print(f"Request {i+1}: Rate Limited!")
    time.sleep(1)
```
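For a single process, the same sliding-window-log algorithm can be sketched without Redis using a `collections.deque` of timestamps (the in-process analogue of the sorted set; the class name is illustrative):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Allow at most `limit` requests per `window_seconds`, tracked
    as a log of request timestamps."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def allow(self, now=None):
        # `now` is injectable for testing; defaults to the monotonic clock
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window
        # (the deque is ordered, so only the oldest entries can expire)
        while self.timestamps and self.timestamps[0] <= now - self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

The memory cost (one entry per request in the window) is the trade-off the Core Concepts section mentions, and it is the same whether the log lives in a deque or a Redis sorted set.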
## Summary: Which One to Choose?
| Method | Best For | Pros | Cons |
|---|---|---|---|
| Manual (Token Bucket) | Learning, simple scripts, single-threaded apps. | Full control, no external dependencies. | Complex to get right (thread-safety), not scalable. |
| `ratelimit` | Simple scripts, functions, or methods. | Very easy to use with decorators. | Less flexible, not designed for distributed systems. |
| `pyrate-limiter` | General purpose, more complex applications. | Flexible algorithms, can be used as a class or decorator. | Slightly more complex setup than `ratelimit`. |
| `fastapi-limiter` | FastAPI applications. | Seamless integration, production-ready (Redis support). | Only for FastAPI. |
| Redis | Distributed systems, microservices, high-traffic apps. | Highly scalable, fast, atomic operations. | Requires a Redis server, more complex setup. |
