Of course! Let's dive deep into Python's multiprocessing module.

What is multiprocessing and Why Use It?
Python has a Global Interpreter Lock (GIL), which is a mutex that protects access to Python objects, preventing multiple native threads from executing Python bytecode at the same time within a single process. This is a major bottleneck for CPU-bound tasks (tasks that require a lot of computation, like mathematical calculations, data processing, or image rendering) when using the standard threading module.
The multiprocessing module sidesteps the GIL by creating separate processes, each with its own Python interpreter and memory space. This allows your program to fully utilize multiple CPU cores, achieving true parallelism.
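If you want to see this on your own machine, here's a minimal sketch (the function `count_down` is just a demo busy-loop; exact timings depend on your hardware):

```python
import multiprocessing
import time

def count_down(n):
    """A CPU-bound busy loop: pure Python bytecode, so the GIL matters."""
    while n > 0:
        n -= 1

if __name__ == "__main__":
    N = 50_000_000

    # Sequential baseline: two runs back to back.
    start = time.perf_counter()
    count_down(N)
    count_down(N)
    print(f"Sequential:    {time.perf_counter() - start:.2f}s")

    # Two processes: each has its own interpreter and its own GIL,
    # so the two runs can execute on two cores at the same time.
    start = time.perf_counter()
    p1 = multiprocessing.Process(target=count_down, args=(N,))
    p2 = multiprocessing.Process(target=count_down, args=(N,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print(f"Two processes: {time.perf_counter() - start:.2f}s")
```

On a machine with at least two free cores, the second timing should be roughly half the first; with the `threading` module instead, it would not be.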
Core Concepts
- **Process**: An independent instance of a running program. It has its own memory and its own Python interpreter. Communication between processes is more complex than between threads.
- **`multiprocessing.Process`**: The class used to create and manage processes.
- **`multiprocessing.Pool`**: A high-level interface for distributing tasks to a pool of worker processes. This is often the easiest and most efficient way to parallelize simple operations.
- **Inter-Process Communication (IPC)**: Since processes don't share memory, they need ways to communicate. The main methods are (two of them are sketched right after this list):
  - **Queues** (`multiprocessing.Queue`): A data structure for passing messages between processes, similar to a `queue.Queue` but process-safe.
  - **Pipes** (`multiprocessing.Pipe`): A two-way communication channel, useful for one-to-one communication.
  - **Shared Memory** (`multiprocessing.Value`, `multiprocessing.Array`, `multiprocessing.Manager`): Allows processes to read and write to the same block of memory. This is the fastest way to share data but requires careful management to avoid race conditions.
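To make the last two concrete, here are two minimal sketches. First, `multiprocessing.Pipe` for one-to-one, two-way messaging (the function name `child` is just for illustration):

```python
import multiprocessing

def child(conn):
    """Receives a message over the pipe, replies, and closes its end."""
    msg = conn.recv()
    conn.send(f"child got: {msg}")
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = multiprocessing.Pipe()  # two-way by default
    p = multiprocessing.Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send("hello")
    print(parent_conn.recv())  # -> child got: hello
    p.join()
```

And second, shared memory with `multiprocessing.Value`, including the locking that guards against the race conditions mentioned above:

```python
import multiprocessing

def increment(counter, n):
    """Adds to a shared counter; the lock prevents lost updates."""
    for _ in range(n):
        with counter.get_lock():  # a synchronized Value carries its own lock
            counter.value += 1

if __name__ == "__main__":
    counter = multiprocessing.Value("i", 0)  # "i" = C signed int
    workers = [multiprocessing.Process(target=increment, args=(counter, 10_000))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)  # 40000; without the lock, updates could be lost
```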
Example 1: The Basic Process Class
This is the "hello world" of multiprocessing. We'll create a function that prints a message and run it in a separate process.
```python
import multiprocessing
import time
import os

def worker(num):
    """A simple function that runs in a separate process."""
    process_id = os.getpid()
    print(f"Worker {num} (Process ID: {process_id}) is starting...")
    time.sleep(2)  # Simulate a long-running task
    print(f"Worker {num} (Process ID: {process_id}) has finished.")

if __name__ == "__main__":
    print(f"Main Process ID: {os.getpid()}")

    # Create a list to hold the process objects
    processes = []

    # Create and start 5 processes
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        processes.append(p)
        p.start()

    # Wait for all processes to complete.
    # This is crucial, otherwise the main script might exit before the workers finish.
    for p in processes:
        p.join()

    print("Main process has finished.")
```
How it works:

- `if __name__ == "__main__":`: This is a standard Python construct. It's essential for multiprocessing on some platforms (like Windows) to prevent child processes from re-importing and re-executing the script's main code, which would lead to an infinite loop of process creation.
- `multiprocessing.Process(target=worker, args=(i,))`: This creates a process object.
  - `target`: The function that the process will execute.
  - `args`: A tuple of arguments to pass to the target function.
- `p.start()`: This tells the OS to start the new process and begin executing the `worker` function.
- `p.join()`: This is a blocking call. The main process will wait here until the process `p` has completed.
Output (will show different Process IDs):
```
Main Process ID: 12345
Worker 0 (Process ID: 12346) is starting...
Worker 1 (Process ID: 12347) is starting...
Worker 2 (Process ID: 12348) is starting...
Worker 3 (Process ID: 12349) is starting...
Worker 4 (Process ID: 12350) is starting...
# (a 2-second pause)
Worker 0 (Process ID: 12346) has finished.
Worker 1 (Process ID: 12347) has finished.
Worker 2 (Process ID: 12348) has finished.
Worker 3 (Process ID: 12349) has finished.
Worker 4 (Process ID: 12350) has finished.
Main process has finished.
```
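Before moving on, note that a `Process` object offers more than `start()` and `join()`. Here's a small sketch of the other commonly used attributes (`slow_task` is just a placeholder function):

```python
import multiprocessing
import time

def slow_task():
    time.sleep(5)

if __name__ == "__main__":
    p = multiprocessing.Process(target=slow_task, name="slow-worker")
    p.start()
    print(p.name, p.pid, p.is_alive())  # e.g. "slow-worker 12351 True"

    p.join(timeout=1)   # wait at most 1 second
    if p.is_alive():    # still running after the timeout?
        p.terminate()   # forcibly stop it (sends SIGTERM on Unix)
        p.join()
    print(p.exitcode)   # 0 on a clean exit; negative if killed by a signal
```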
Example 2: The Pool for Parallel Data Processing (Map-Reduce Pattern)
For tasks where you have a list of items and want to apply the same function to each item, `multiprocessing.Pool` is the perfect tool. It provides a `map` method that is a parallel version of the built-in `map`.
Let's calculate squares of numbers in parallel.
```python
import multiprocessing
import time

def square(n):
    """Function to calculate the square of a number."""
    time.sleep(0.5)  # Simulate some computation
    result = n * n
    print(f"Calculated square of {n} = {result}")
    return result

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

    # Create a pool of worker processes.
    # By default, it uses the number of available CPU cores.
    print(f"Running on a pool with {multiprocessing.cpu_count()} processes.")

    with multiprocessing.Pool() as pool:
        # pool.map applies the 'square' function to every item in 'numbers'.
        # It collects the results in the same order as the inputs.
        results = pool.map(square, numbers)

    print("\nFinal results from pool.map:", results)
```
How it works:

- `with multiprocessing.Pool() as pool:`: This creates a pool of worker processes. The `with` statement ensures the pool is properly shut down after use.
- `pool.map(square, numbers)`: This is the key part.
  - It divides the `numbers` list into chunks.
  - It sends each chunk to a free worker process in the pool.
  - Each worker executes the `square` function on its chunk.
  - The `map` function waits for all workers to finish and then returns a list of the results, in the original order.
Output:
```
Running on a pool with 8 processes.
Calculated square of 1 = 1
Calculated square of 2 = 4
Calculated square of 3 = 9
Calculated square of 4 = 16
Calculated square of 5 = 25
Calculated square of 6 = 36
Calculated square of 7 = 49
Calculated square of 8 = 64
Calculated square of 9 = 81
Calculated square of 10 = 100

Final results from pool.map: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```
(Note: The order of the "Calculated square..." lines might vary depending on which process gets a chunk first, but the final results list will always be in order.)
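`pool.map` is only one of `Pool`'s methods. If your function takes multiple arguments, or you want results as soon as they are ready, these standard variants may fit better; a brief sketch (`power` is just a demo function):

```python
import multiprocessing

def power(base, exponent):
    return base ** exponent

def square(n):
    return n * n

if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        # starmap unpacks each tuple into the function's arguments
        print(pool.starmap(power, [(2, 3), (3, 2), (4, 2)]))  # [8, 9, 16]

        # imap_unordered yields results as they complete, in any order
        for result in pool.imap_unordered(square, range(5)):
            print(result)

        # apply_async submits a single task and returns immediately
        async_result = pool.apply_async(power, (2, 10))
        print(async_result.get())  # blocks until ready -> 1024
```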
Example 3: Using a Queue for Communication
Let's create a "producer" process that generates data and a "consumer" process that processes it, using a `Queue` to pass the data between them.
```python
import multiprocessing
import time
import random

def producer(queue):
    """Generates data and puts it into the queue."""
    for _ in range(5):
        item = random.randint(1, 100)
        print(f"Producer: putting item {item} into the queue")
        queue.put(item)
        time.sleep(1)  # Simulate time to produce an item
    print("Producer: finished, putting None to signal exit")
    queue.put(None)  # Signal the consumer to exit

def consumer(queue):
    """Consumes data from the queue until it sees a None."""
    while True:
        item = queue.get()  # This is a blocking call
        if item is None:
            print("Consumer: received None, exiting.")
            break
        print(f"Consumer: got item {item}, processing...")
        time.sleep(2)  # Simulate time to process an item

if __name__ == "__main__":
    # Create a shared queue
    task_queue = multiprocessing.Queue()

    # Create producer and consumer processes
    p1 = multiprocessing.Process(target=producer, args=(task_queue,))
    p2 = multiprocessing.Process(target=consumer, args=(task_queue,))

    p1.start()
    p2.start()

    # Wait for both processes to complete
    p1.join()
    p2.join()

    print("All processes finished.")
```
How it works:
- `multiprocessing.Queue()`: Creates a process-safe queue that can be passed to multiple processes.
- `queue.put(item)`: Puts an item onto the queue. It's thread-safe and process-safe.
- `queue.get()`: Removes and returns an item from the queue. If the queue is empty, it blocks (waits) until an item is available.
- `queue.put(None)`: A common pattern is to use a special "sentinel value" (like `None`) to signal a consumer process that there will be no more data and it should terminate.
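This single-producer/single-consumer pattern scales to multiple consumers: since each consumer stops at the first `None` it receives, put one sentinel per consumer. A minimal sketch (the counts are arbitrary):

```python
import multiprocessing

def consumer(queue):
    """Processes items until it receives its own None sentinel."""
    while True:
        item = queue.get()
        if item is None:
            break
        print(f"{multiprocessing.current_process().name} processed {item}")

if __name__ == "__main__":
    num_consumers = 3
    queue = multiprocessing.Queue()
    consumers = [multiprocessing.Process(target=consumer, args=(queue,))
                 for _ in range(num_consumers)]
    for c in consumers:
        c.start()
    for item in range(10):
        queue.put(item)
    for _ in range(num_consumers):  # one sentinel per consumer
        queue.put(None)
    for c in consumers:
        c.join()
```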
multiprocessing vs. threading: A Quick Comparison
| Feature | `multiprocessing` | `threading` |
|---|---|---|
| GIL | Bypasses the GIL. Each process has its own interpreter and its own GIL. | Subject to the GIL. Only one thread executes Python bytecode at a time. |
| Parallelism | True parallelism. Multiple processes run on multiple CPU cores simultaneously. | Concurrency. Threads take turns executing, often giving the illusion of parallelism. |
| Memory | Separate memory. Processes do not share memory by default; data must be explicitly shared (e.g., via `Queue`, `Manager`). | Shared memory. All threads within a process share the same memory space. |
| Overhead | High. Creating a process is more expensive (time and memory) than creating a thread. | Low. Threads are lightweight and cheaper to create. |
| Best for | CPU-bound tasks (heavy computation). | I/O-bound tasks (network requests, file reading/writing, database queries). |
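If you like this CPU-vs-I/O split but want one uniform interface, the standard-library `concurrent.futures` module wraps both: `ProcessPoolExecutor` is backed by `multiprocessing`, `ThreadPoolExecutor` by `threading`. A brief sketch (`cpu_bound` and `io_bound` are demo functions):

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def cpu_bound(n):
    """Heavy computation: needs processes for real parallelism."""
    return sum(i * i for i in range(n))

def io_bound(seconds):
    """Pure waiting (stands in for a network request): threads suffice."""
    time.sleep(seconds)
    return seconds

if __name__ == "__main__":
    # Processes for CPU-bound work (each worker bypasses the GIL)
    with ProcessPoolExecutor() as executor:
        print(list(executor.map(cpu_bound, [100_000] * 4)))

    # Threads for I/O-bound work (the GIL is released while waiting)
    with ThreadPoolExecutor(max_workers=4) as executor:
        print(list(executor.map(io_bound, [0.5] * 4)))  # ~0.5 s total, not 2 s
```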
Key Takeaways
- Use `multiprocessing` for tasks that are heavy on CPU calculation.
- Use `multiprocessing.Pool` for simple "map-reduce" style tasks. It's easy to use and efficient.
- Use `multiprocessing.Process` for more complex workflows where you need fine-grained control over each worker.
- Use `multiprocessing.Queue` or `Pipe` for communication between processes.
- Always wrap your main entry point in `if __name__ == "__main__":`.
- Remember that `join()` is your friend; use it to wait for child processes to complete.
