Of course. Let's dive into concurrency in Python 2. This is a crucial topic: the core mechanics (the GIL, threading, multiprocessing) carry forward into Python 3, but Python 2 lacks the modern high-level APIs, so you work closer to the primitives.

The Core Concept: The Global Interpreter Lock (GIL)
Before we talk about the tools, you must understand the Global Interpreter Lock (GIL).
- What it is: The GIL is a mutex (a lock) that protects access to Python objects, preventing multiple native threads from executing Python bytecode at the same time within a single process.
- The Implication: This means that even if you use multiple threads, only one thread can execute Python code at any given moment. This prevents true parallelism on multi-core processors for CPU-bound tasks.
- Why it exists: It simplifies memory management in CPython (the standard Python implementation). The GIL protects the interpreter's internal state, such as reference counts, from corruption by concurrent threads. Note, however, that it only makes individual bytecode operations safe; it does not make your own multi-step operations on shared objects atomic.
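That last caveat is worth seeing concretely. Here is a minimal sketch (runnable under Python 2 or 3) showing that `counter += 1` is a read-modify-write sequence, so the GIL alone does not guarantee that concurrent increments are all preserved:

```python
from __future__ import print_function
import threading

counter = 0

def bump(n):
    # 'counter += 1' compiles to several bytecodes (load, add, store);
    # the GIL may hand control to another thread between them, so
    # increments can be lost without a Lock.
    global counter
    for _ in range(n):
        counter += 1

threads = [threading.Thread(target=bump, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Without a Lock the final value is frequently below 400000 on Python 2;
# how often updates are lost varies by interpreter version.
print("counter =", counter)
```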
So, when does concurrency help in Python 2?
- I/O-Bound Tasks: When a thread is waiting for an external operation to complete (like reading a file, making a network request, or querying a database), it releases the GIL. This allows another thread to run. This is the primary use case for threading in Python 2/3.
- CPU-Bound Tasks: For tasks that are purely computational and don't involve I/O (e.g., heavy calculations, image processing), the GIL will be a bottleneck. For these tasks in Python 2, the best tool is multiprocessing, which gets around the GIL by using separate processes, each with its own memory space and its own GIL.
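The I/O-bound case is easy to demonstrate with a timing sketch (runnable under Python 2 or 3). Here `time.sleep` stands in for a blocking network or disk call; because sleeping threads release the GIL, four one-second waits overlap and finish in about one second, not four:

```python
from __future__ import print_function
import threading
import time

def wait_one_second():
    # time.sleep releases the GIL while blocking, just like real I/O,
    # so the waiting threads run concurrently.
    time.sleep(1)

start = time.time()
threads = [threading.Thread(target=wait_one_second) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

print("elapsed: %.2f seconds" % elapsed)  # close to 1, not 4
```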
The Main Concurrency Tools in Python 2
Python 2 provided two primary modules for concurrency: threading and multiprocessing. A third, multiprocessing.dummy, is a lesser-known but very useful wrapper.
threading Module
The threading module is the standard way to handle multiple threads. It's perfect for I/O-bound applications.

Use Case: Making multiple network requests, reading from multiple files, or any task that spends most of its time waiting.
Key Concepts:
- Thread: The class used to create and manage a new thread.
- Lock: A synchronization primitive to prevent race conditions when multiple threads try to modify the same shared resource.
- Queue (from the Queue module): A thread-safe data structure for passing data between threads. This is highly recommended over manually sharing lists or dictionaries.
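Before the full download example, here is a minimal producer-consumer sketch using Queue (the module is named Queue in Python 2 and queue in Python 3; the try/except import below covers both). A None sentinel per worker signals shutdown:

```python
from __future__ import print_function
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

task_queue = queue.Queue()
results = queue.Queue()

def worker():
    # Each worker pulls items until it sees the None sentinel.
    while True:
        item = task_queue.get()
        if item is None:
            break
        results.put(item * item)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()

for n in range(5):
    task_queue.put(n)
for _ in threads:
    task_queue.put(None)   # one sentinel per worker
for t in threads:
    t.join()

squares = sorted(results.get() for _ in range(5))
print(squares)  # [0, 1, 4, 9, 16]
```

The queue handles all the locking internally, which is why it is preferred over a shared list guarded by a manual Lock.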
Example: Downloading Multiple URLs with Threads
This classic example shows how to download several web pages concurrently. Because the threads overlap their waiting, the total time is roughly that of the slowest download rather than the sum of all of them.

```python
# concurrent_downloader.py
import threading
import urllib2
import time

# A shared list to store results
results = []
# A lock to prevent race conditions when appending to the list
lock = threading.Lock()

def download_url(url):
    """Downloads a single URL and appends its content length to results."""
    try:
        print "Starting download: %s" % url
        response = urllib2.urlopen(url)
        content = response.read()
        # Use the lock to safely modify the shared list
        with lock:
            results.append((url, len(content)))
        print "Finished download: %s (size: %d bytes)" % (url, len(content))
    except Exception as e:
        print "Error downloading %s: %s" % (url, e)

if __name__ == "__main__":
    urls = [
        'http://www.python.org',
        'http://www.yahoo.com',
        'http://www.google.com',
        'http://www.apache.org',
        'http://www.github.com',
    ]

    start_time = time.time()

    # Create and start one thread per URL
    threads = []
    for url in urls:
        thread = threading.Thread(target=download_url, args=(url,))
        threads.append(thread)
        thread.start()

    # Wait for all threads to complete
    for thread in threads:
        thread.join()

    end_time = time.time()

    print "\n--- All downloads complete ---"
    for url, size in results:
        print "%s: %d bytes" % (url, size)
    print "\nTotal time: %f seconds" % (end_time - start_time)
```
To run this, you'd execute it from the command line. You'll see the output messages from different threads interleaved, demonstrating that they are running concurrently.
multiprocessing Module
The multiprocessing module was introduced in Python 2.6 to address the GIL limitation for CPU-bound tasks. It creates new processes, each with its own Python interpreter and memory space.
Use Case: Video encoding, scientific calculations, data processing, any task that is heavy on CPU.
Key Concepts:
- Process: The class used to create and manage a new process.
- Queue and Pipe: Inter-process communication (IPC) mechanisms to pass data between processes. These are necessary because processes don't share memory.
- Pool: A high-level abstraction that manages a pool of worker processes, making it easy to parallelize a function across multiple inputs.
Example: CPU-Bound Task with multiprocessing.Pool
Let's create a function that simulates a heavy computation and then run it on multiple inputs in parallel.
```python
# cpu_bound_worker.py
import multiprocessing
import time
import random

def is_prime(n):
    """A CPU-bound function to check if a number is prime."""
    if n <= 1:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, int(n ** 0.5) + 1, 2):
        if n % i == 0:
            return False
    return True

def check_prime_chunk(numbers_chunk):
    """Processes a chunk of numbers and returns the primes found."""
    primes_found = []
    for num in numbers_chunk:
        if is_prime(num):
            primes_found.append(num)
    return primes_found

if __name__ == "__main__":
    # The `if __name__ == "__main__":` guard prevents issues on some
    # platforms (like Windows) where child processes re-import this module.

    # Generate a large list of random numbers to check
    all_numbers = [random.randint(1, 100000) for _ in range(100000)]

    # Create a pool of 4 worker processes
    pool = multiprocessing.Pool(processes=4)

    start_time = time.time()

    # map() distributes the inputs across the workers and blocks
    # until all results are ready
    results = pool.map(is_prime, all_numbers)

    # Alternatively, hand each worker a larger chunk to cut down on IPC overhead:
    # chunk_size = len(all_numbers) // 4
    # chunks = [all_numbers[i:i + chunk_size]
    #           for i in range(0, len(all_numbers), chunk_size)]
    # all_primes = [p for chunk in pool.map(check_prime_chunk, chunks)
    #               for p in chunk]

    end_time = time.time()

    print "Checked %d numbers, found %d primes." % (len(all_numbers), sum(results))
    print "Total time: %f seconds" % (end_time - start_time)

    pool.close()
    pool.join()
```
When you run this, you'll see that it uses all available CPU cores (up to 4 in this case), significantly speeding up the computation compared to a single-threaded version.
multiprocessing.dummy Module (The Hidden Gem)
This is a thin wrapper inside the multiprocessing package that provides the same Pool API, but backed by threads instead of processes (the threading module itself has no public pool class in Python 2).
What it does: It provides a Pool class that uses threads instead of processes.
Why is it useful? It gives you the clean, high-level API of multiprocessing.Pool but for threading. This is great for I/O-bound tasks where you want to write concise code without manually managing threading.Thread objects.
Example: Using multiprocessing.dummy for the downloader
This achieves the same result as the first example but is often considered more readable.
```python
# dummy_downloader.py
import urllib2
import time
from multiprocessing.dummy import Pool as ThreadPool

def download_url(url):
    """Downloads a single URL and returns its URL and content length."""
    try:
        print "Starting download: %s" % url
        response = urllib2.urlopen(url)
        content = response.read()
        print "Finished download: %s (size: %d bytes)" % (url, len(content))
        return (url, len(content))
    except Exception as e:
        print "Error downloading %s: %s" % (url, e)
        return (url, 0)

if __name__ == "__main__":
    urls = [
        'http://www.python.org',
        'http://www.yahoo.com',
        'http://www.google.com',
        'http://www.apache.org',
        'http://www.github.com',
    ]

    start_time = time.time()

    # Create a ThreadPool with 4 worker threads
    pool = ThreadPool(processes=4)

    # map() applies the function to each URL and returns a list of results
    results = pool.map(download_url, urls)

    # Close the pool and wait for the work to finish
    pool.close()
    pool.join()

    end_time = time.time()

    print "\n--- All downloads complete ---"
    for url, size in results:
        if size > 0:
            print "%s: %d bytes" % (url, size)
    print "\nTotal time: %f seconds" % (end_time - start_time)
```
Summary: Which One to Use in Python 2?
| Scenario | Recommended Tool | Why? |
|---|---|---|
| I/O-bound task (network, disk, database) | threading or multiprocessing.dummy.Pool | Threads release the GIL while waiting, allowing other threads to run. dummy.Pool is a clean way to use threads. |
| CPU-bound task (heavy calculations) | multiprocessing | Bypasses the GIL by using separate processes, allowing true parallel execution on multiple cores. |
| Share simple data between threads | threading.Lock | Essential for preventing race conditions when modifying shared variables. |
| Pass data between threads/processes | Queue | Thread-safe (and, via multiprocessing.Queue, process-safe) structure designed for producer-consumer patterns. |
| Parallelize a function easily | multiprocessing.Pool | High-level API that manages a pool of workers, abstracting away the complexity of Process objects. |
The Modern Path: Python 3
It's important to know that Python 3 has refined these concepts:
- The Queue module was renamed to queue (multiprocessing keeps its own Queue for processes).
- The concurrent.futures module provides a high-level, unified interface for both ThreadPoolExecutor and ProcessPoolExecutor, making the choice even cleaner.
- The asyncio module offers a different concurrency model based on single-threaded asynchronous I/O, which is extremely powerful for high-performance network services.
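For comparison, here is a sketch of that unified concurrent.futures API (Python 3 only); `fetch_length` is a hypothetical stand-in for an I/O-bound call such as downloading a URL, and swapping ThreadPoolExecutor for ProcessPoolExecutor is the only change needed to target CPU-bound work:

```python
# Python 3 only: one API for both threads and processes.
from concurrent.futures import ThreadPoolExecutor

def fetch_length(text):
    # Hypothetical stand-in for an I/O-bound call.
    return len(text)

# The executor is a context manager: workers are joined on exit.
with ThreadPoolExecutor(max_workers=4) as pool:
    lengths = list(pool.map(fetch_length, ["python", "gil", "threads"]))

print(lengths)  # [6, 3, 7]
```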
However, if you are maintaining or working on a Python 2 codebase, understanding threading and multiprocessing is essential.
