杰瑞科技汇

How Can Python Multiprocessing Achieve Efficient Parallel Computation?

Let's take a deep dive into Python's multiprocessing module.


What is multiprocessing and Why Use It?

Python has a Global Interpreter Lock (GIL), which is a mutex that protects access to Python objects, preventing multiple native threads from executing Python bytecode at the same time within a single process. This is a major bottleneck for CPU-bound tasks (tasks that require a lot of computation, like mathematical calculations, data processing, or image rendering) when using the standard threading module.

The multiprocessing module sidesteps the GIL by creating separate processes, each with its own Python interpreter and memory space. This allows your program to fully utilize multiple CPU cores, achieving true parallelism.


Core Concepts

  1. Process: An independent instance of a running program. It has its own memory and its own Python interpreter. Communication between processes is more complex than between threads.
  2. multiprocessing.Process: The class used to create and manage processes.
  3. multiprocessing.Pool: A high-level interface for distributing tasks to a pool of worker processes. This is often the easiest and most efficient way to parallelize simple operations.
  4. Inter-Process Communication (IPC): Since processes don't share memory, they need ways to communicate. The main methods are:
    • Queues (multiprocessing.Queue): A data structure for passing messages between processes, similar to a queue.Queue but process-safe.
    • Pipes (multiprocessing.Pipe): A two-way communication channel, useful for one-to-one communication.
    • Shared Memory (multiprocessing.Value, multiprocessing.Array, multiprocessing.Manager): Allows processes to read and write to the same block of memory. This is the fastest way to share data but requires careful management to avoid race conditions.

Example 1: The Basic Process Class

This is the "hello world" of multiprocessing. We'll create a function that prints a message and run it in a separate process.

import multiprocessing
import time
import os
def worker(num):
    """A simple function that runs in a separate process."""
    process_id = os.getpid()
    print(f"Worker {num} (Process ID: {process_id}) is starting...")
    time.sleep(2) # Simulate a long-running task
    print(f"Worker {num} (Process ID: {process_id}) has finished.")
if __name__ == "__main__":
    print(f"Main Process ID: {os.getpid()}")
    # Create a list to hold the process objects
    processes = []
    # Create and start 5 processes
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        processes.append(p)
        p.start()
    # Wait for all processes to complete
    # This is crucial, otherwise the main script might exit before the workers finish.
    for p in processes:
        p.join()
    print("Main process has finished.")

How it works:

  1. if __name__ == "__main__":: This is a standard Python construct. It's essential for multiprocessing on some platforms (like Windows) to prevent child processes from re-importing and re-executing the script's main code, which would lead to an infinite loop of process creation.
  2. multiprocessing.Process(target=worker, args=(i,)): This creates a process object.
    • target: The function that the process will execute.
    • args: A tuple of arguments to pass to the target function.
  3. p.start(): This tells the OS to start the new process and begin executing the worker function.
  4. p.join(): This is a blocking call. The main process will wait here until the process p has completed.

Output (will show different Process IDs):

Main Process ID: 12345
Worker 0 (Process ID: 12346) is starting...
Worker 1 (Process ID: 12347) is starting...
Worker 2 (Process ID: 12348) is starting...
Worker 3 (Process ID: 12349) is starting...
Worker 4 (Process ID: 12350) is starting...
# (a 2-second pause)
Worker 0 (Process ID: 12346) has finished.
Worker 1 (Process ID: 12347) has finished.
Worker 2 (Process ID: 12348) has finished.
Worker 3 (Process ID: 12349) has finished.
Worker 4 (Process ID: 12350) has finished.
Main process has finished.

Example 2: The Pool for Parallel Data Processing (Map-Reduce Pattern)

For tasks where you have a list of items and want to apply the same function to each item, multiprocessing.Pool is the perfect tool. It provides a map method that is a parallel version of the built-in map.

Let's calculate squares of numbers in parallel.

import multiprocessing
import time
def square(n):
    """Function to calculate the square of a number."""
    time.sleep(0.5) # Simulate some computation
    result = n * n
    print(f"Calculated square of {n} = {result}")
    return result
if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    # Create a pool of worker processes.
    # By default, it uses the number of available CPU cores.
    print(f"Running on a pool with {multiprocessing.cpu_count()} processes.")
    with multiprocessing.Pool() as pool:
        # pool.map applies the 'square' function to every item in 'numbers'
        # It collects the results in the same order as the inputs.
        results = pool.map(square, numbers)
    print("\nFinal results from pool.map:", results)

How it works:

  1. with multiprocessing.Pool() as pool:: This creates a pool of worker processes. The with statement ensures the pool is properly shut down after use.
  2. pool.map(square, numbers): This is the key part.
    • It divides the numbers list into chunks.
    • It sends each chunk to a free worker process in the pool.
    • Each worker executes the square function on its chunk.
    • The map function waits for all workers to finish and then returns a list of the results, in the original order.

Output:

Running on a pool with 8 processes.
Calculated square of 1 = 1
Calculated square of 2 = 4
Calculated square of 3 = 9
Calculated square of 4 = 16
Calculated square of 5 = 25
Calculated square of 6 = 36
Calculated square of 7 = 49
Calculated square of 8 = 64
Calculated square of 9 = 81
Calculated square of 10 = 100
Final results from pool.map: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

(Note: The order of the "Calculated square..." lines might vary depending on which process gets a chunk first, but the final results list will always be in order.)


Example 3: Using a Queue for Communication

Let's create a "producer" process that generates data and a "consumer" process that processes it, using a Queue to pass the data between them.

import multiprocessing
import time
import random
def producer(queue):
    """Generates data and puts it into the queue."""
    for i in range(5):
        item = random.randint(1, 100)
        print(f"Producer: putting item {item} into the queue")
        queue.put(item)
        time.sleep(1) # Simulate time to produce an item
    print("Producer: finished, putting None to signal exit")
    queue.put(None) # Signal the consumer to exit
def consumer(queue):
    """Consumes data from the queue until it sees a None."""
    while True:
        item = queue.get() # This is a blocking call
        if item is None:
            print("Consumer: received None, exiting.")
            break
        print(f"Consumer: got item {item}, processing...")
        time.sleep(2) # Simulate time to process an item
if __name__ == "__main__":
    # Create a shared queue
    task_queue = multiprocessing.Queue()
    # Create producer and consumer processes
    p1 = multiprocessing.Process(target=producer, args=(task_queue,))
    p2 = multiprocessing.Process(target=consumer, args=(task_queue,))
    p1.start()
    p2.start()
    # Wait for both processes to complete
    p1.join()
    p2.join()
    print("All processes finished.")

How it works:

  1. multiprocessing.Queue(): Creates a process-safe queue that can be passed to multiple processes.
  2. queue.put(item): Puts an item onto the queue. It's thread-safe and process-safe.
  3. queue.get(): Removes and returns an item from the queue. If the queue is empty, it blocks (waits) until an item is available.
  4. queue.put(None): A common pattern is to use a special "sentinel value" (like None) to signal a consumer process that there will be no more data and it should terminate.

multiprocessing vs. threading: A Quick Comparison

| Feature | multiprocessing | threading |
| --- | --- | --- |
| GIL | Bypasses the GIL; each process has its own interpreter. | Subject to the GIL; only one thread executes Python bytecode at a time. |
| Parallelism | True parallelism: processes run simultaneously on multiple CPU cores. | Concurrency: threads take turns executing, often giving the illusion of parallelism. |
| Memory | Separate memory; processes do not share memory by default, so data must be shared explicitly (e.g., via Queue or Manager). | Shared memory; all threads within a process share the same memory space. |
| Overhead | High; creating a process costs more time and memory than creating a thread. | Low; threads are lightweight and cheap to create. |
| Best for | CPU-bound tasks (heavy computation). | I/O-bound tasks (network requests, file reading/writing, database queries). |

Key Takeaways

  • Use multiprocessing for tasks that are heavy on CPU calculation.
  • Use multiprocessing.Pool for simple "map-reduce" style tasks. It's easy to use and efficient.
  • Use multiprocessing.Process for more complex workflows where you need fine-grained control over each worker.
  • Use multiprocessing.Queue or Pipe for communication between processes.
  • Always wrap your main entry point in if __name__ == "__main__":.
  • Remember that join() is your friend; use it to wait for child processes to complete.