
How Do You Achieve Efficient Multitasking with Concurrency in Python 2?

Let's dive into concurrency in Python 2. This is a crucial topic because Python 2's approach to concurrency differs in important ways from modern Python 3's.


The Core Concept: The Global Interpreter Lock (GIL)

Before we talk about the tools, you must understand the Global Interpreter Lock (GIL).

  • What it is: The GIL is a mutex (a lock) that protects access to Python objects, preventing multiple native threads from executing Python bytecode at the same time within a single process.
  • The Implication: This means that even if you use multiple threads, only one thread can execute Python code at any given moment. This prevents true parallelism on multi-core processors for CPU-bound tasks.
  • Why it exists: It simplifies memory management in CPython (the standard Python implementation). Reference counting is not thread-safe on its own, so the GIL acts as one giant lock around the interpreter's internals. Note that it protects the interpreter, not your program's data: your own shared state can still suffer race conditions, which is why primitives like Lock and Queue exist.

So, when does concurrency help in Python 2?

  1. I/O-Bound Tasks: When a thread is waiting for an external operation to complete (like reading a file, making a network request, or querying a database), it releases the GIL. This allows another thread to run. This is the primary use case for threading in Python 2/3.
  2. CPU-Bound Tasks: For tasks that are purely computational and involve no I/O (e.g., heavy calculations, image processing), the GIL is a bottleneck. For these tasks in Python 2, the best tool is multiprocessing, which gets around the GIL by using separate processes, each with its own memory space and its own GIL. The sketch just below shows why threads don't help here.
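You can see the GIL's effect directly. Here is a minimal sketch (the countdown workload is made up purely for illustration): on CPython 2, the two-thread version is typically no faster than the sequential one, and often slightly slower due to lock contention.

# gil_demo.py
import threading
import time
def countdown(n):
    """A pure CPU-bound loop; it never releases the GIL by waiting on I/O."""
    while n > 0:
        n -= 1
if __name__ == "__main__":
    N = 10000000
    # Sequential: one thread does all the work
    start = time.time()
    countdown(N)
    countdown(N)
    print "Sequential: %.2f seconds" % (time.time() - start)
    # Threaded: two threads split the same work, but the GIL allows only
    # one of them to execute Python bytecode at any given moment
    start = time.time()
    t1 = threading.Thread(target=countdown, args=(N,))
    t2 = threading.Thread(target=countdown, args=(N,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    print "Two threads: %.2f seconds" % (time.time() - start)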

The Main Concurrency Tools in Python 2

Python 2 provided two primary modules for concurrency: threading and multiprocessing. A third, multiprocessing.dummy, is a lesser-known but very useful wrapper.

threading Module

The threading module is the standard way to handle multiple threads. It's perfect for I/O-bound applications.


Use Case: Making multiple network requests, reading from multiple files, or any task that spends most of its time waiting.

Key Concepts:

  • Thread: The class used to create and manage a new thread.
  • Lock: A synchronization primitive to prevent race conditions when multiple threads try to modify the same shared resource.
  • Queue: A thread-safe data structure for passing data between threads. This is strongly recommended over manually sharing lists or dictionaries; see the producer/consumer sketch after this list.
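Here is a minimal producer/consumer sketch using Queue (the worker count and task items are arbitrary placeholders):

# queue_workers.py
import threading
import Queue  # renamed to the lowercase `queue` module in Python 3
task_queue = Queue.Queue()
def worker():
    """Pull items off the shared queue until a None sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is None:
            # Sentinel value: mark it done and exit the thread
            task_queue.task_done()
            break
        print "Processing %s" % item
        task_queue.task_done()
if __name__ == "__main__":
    threads = [threading.Thread(target=worker) for _ in range(3)]
    for t in threads:
        t.start()
    for item in ["task-a", "task-b", "task-c", "task-d"]:
        task_queue.put(item)
    # One sentinel per worker so every thread can shut down cleanly
    for _ in threads:
        task_queue.put(None)
    task_queue.join()  # blocks until every item has been marked done
    for t in threads:
        t.join()

The sentinel pattern lets each worker exit on its own once the queue is drained, so you never have to forcibly kill a thread.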

Example: Downloading Multiple URLs with Threads

This classic example downloads several web pages concurrently. Because each thread releases the GIL while it waits on the network, the total time is close to that of the slowest single download rather than the sum of all of them.

# concurrent_downloader.py
import threading
import urllib2
import time
# A shared list to store results
results = []
# A lock to prevent race conditions when appending to the list
lock = threading.Lock()
def download_url(url):
    """Downloads a single URL and appends its content length to results."""
    try:
        print "Starting download: %s" % url
        response = urllib2.urlopen(url)
        content = response.read()
        # Use the lock to safely modify the shared list
        with lock:
            results.append((url, len(content)))
            print "Finished download: %s (size: %d bytes)" % (url, len(content))
    except Exception as e:
        print "Error downloading %s: %s" % (url, e)
if __name__ == "__main__":
    urls = [
        'http://www.python.org',
        'http://www.yahoo.com',
        'http://www.google.com',
        'http://www.apache.org',
        'http://www.github.com'
    ]
    start_time = time.time()
    # Create a list of thread objects
    threads = []
    for url in urls:
        thread = threading.Thread(target=download_url, args=(url,))
        threads.append(thread)
        thread.start()
    # Wait for all threads to complete
    for thread in threads:
        thread.join()
    end_time = time.time()
    print "\n--- All downloads complete ---"
    for url, size in results:
        print "%s: %d bytes" % (url, size)
    print "\nTotal time: %f seconds" % (end_time - start_time)

To run this, you'd execute it from the command line. You'll see the output messages from different threads interleaved, demonstrating that they are running concurrently.

multiprocessing Module

The multiprocessing module was introduced in Python 2.6 to address the GIL limitation for CPU-bound tasks. It creates new processes, each with its own Python interpreter and memory space.

Use Case: Video encoding, scientific calculations, data processing, any task that is heavy on CPU.

Key Concepts:

  • Process: The class used to create and manage a new process.
  • Queue, Pipe: Inter-process communication (IPC) mechanisms for passing data between processes, which is necessary because processes don't share memory (see the sketch after this list).
  • Pool: A high-level abstraction that manages a pool of worker processes, making it easy to parallelize a function across multiple inputs.
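To make the IPC point concrete, here is a minimal Process-plus-Queue sketch (the squaring job is purely illustrative):

# process_ipc.py
import multiprocessing
def square(numbers, result_queue):
    """Run in a child process; send each result back through the queue."""
    for n in numbers:
        result_queue.put((n, n * n))
if __name__ == "__main__":
    result_queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=square, args=([1, 2, 3], result_queue))
    p.start()
    # Drain the queue before joining to avoid blocking on a full pipe
    for _ in range(3):
        n, sq = result_queue.get()
        print "%d squared is %d" % (n, sq)
    p.join()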

Example: CPU-Bound Task with multiprocessing.Pool

Let's create a function that simulates a heavy computation and then run it on multiple inputs in parallel.

# cpu_bound_worker.py
import multiprocessing
import time
import random
def is_prime(n):
    """A CPU-bound function to check if a number is prime."""
    if n <= 1:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, int(n**0.5) + 1, 2):
        if n % i == 0:
            return False
    return True
def check_prime_chunk(numbers_chunk):
    """Processes a chunk of numbers and returns the primes found."""
    primes_found = []
    for num in numbers_chunk:
        if is_prime(num):
            primes_found.append(num)
    return primes_found
if __name__ == "__main__":
    # We use the `if __name__ == "__main__":` guard to prevent issues on some platforms
    # (like Windows) when importing this module.
    # Generate a large list of random numbers to check
    all_numbers = [random.randint(1, 100000) for _ in range(100000)]
    # Create a pool of 4 worker processes
    pool = multiprocessing.Pool(processes=4)
    # Split the work into explicit chunks (used only by the commented-out
    # alternative below); pool.map can also choose a chunk size itself.
    chunk_size = len(all_numbers) / 4  # integer division in Python 2
    start_time = time.time()
    # map() blocks until all results are ready
    results = pool.map(is_prime, all_numbers)
    # Alternatively, for more control, we can use chunks
    # results_chunks = pool.map(check_prime_chunk, [all_numbers[i:i+chunk_size] for i in range(0, len(all_numbers), chunk_size)])
    # all_primes = [prime for chunk in results_chunks for prime in chunk]
    end_time = time.time()
    print "Checked %d numbers." % len(all_numbers)
    print "Total time: %f seconds" % (end_time - start_time)
    pool.close()
    pool.join()

When you run this, you'll see that it uses all available CPU cores (up to 4 in this case), significantly speeding up the computation compared to a single-threaded version.
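If you want results back one at a time instead of all at once, Pool also provides apply_async. Here is a minimal sketch, assuming the is_prime function from the cpu_bound_worker.py example above:

# async_primes.py
import multiprocessing
from cpu_bound_worker import is_prime  # the module defined above
if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    candidates = [104729, 104730, 104731]
    # Submit each job individually; apply_async returns immediately
    async_results = [pool.apply_async(is_prime, (n,)) for n in candidates]
    pool.close()
    for n, res in zip(candidates, async_results):
        # get() blocks until that particular result is ready
        print "%d prime? %s" % (n, res.get())
    pool.join()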

multiprocessing.dummy Module (The Hidden Gem)

This is a thread-backed wrapper that ships inside the multiprocessing package: it exposes the exact multiprocessing.Pool API, but implemented with threads instead of processes (the threading module itself has no public pool class).

What it does: It provides a Pool class that uses threads instead of processes.

Why is it useful? It gives you the clean, high-level API of multiprocessing.Pool but for threading. This is great for I/O-bound tasks where you want to write concise code without manually managing threading.Thread objects.

Example: Using multiprocessing.dummy for the downloader

This achieves the same result as the first example but is often considered more readable.

# dummy_downloader.py
import urllib2
import time
from multiprocessing.dummy import Pool as ThreadPool
def download_url(url):
    """Downloads a single URL and returns its URL and content length."""
    try:
        print "Starting download: %s" % url
        response = urllib2.urlopen(url)
        content = response.read()
        print "Finished download: %s (size: %d bytes)" % (url, len(content))
        return (url, len(content))
    except Exception as e:
        print "Error downloading %s: %s" % (url, e)
        return (url, 0)
if __name__ == "__main__":
    urls = [
        'http://www.python.org',
        'http://www.yahoo.com',
        'http://www.google.com',
        'http://www.apache.org',
        'http://www.github.com'
    ]
    start_time = time.time()
    # Create a ThreadPool with 4 worker threads
    pool = ThreadPool(processes=4)
    # map() applies the function to each item in the list and returns a list of results
    results = pool.map(download_url, urls)
    # Close the pool and wait for the work to finish
    pool.close()
    pool.join()
    end_time = time.time()
    print "\n--- All downloads complete ---"
    for url, size in results:
        if size > 0:
            print "%s: %d bytes" % (url, size)
    print "\nTotal time: %f seconds" % (end_time - start_time)

Summary: Which One to Use in Python 2?

  • I/O-bound task (network, disk, database) → threading or multiprocessing.dummy.Pool: threads release the GIL while waiting, letting other threads run, and dummy.Pool gives threads a clean high-level API.
  • CPU-bound task (heavy calculations) → multiprocessing: bypasses the GIL with separate processes, allowing true parallel execution on multiple cores.
  • Sharing simple data between threads → threading.Lock: essential for preventing race conditions when modifying shared variables.
  • Passing data between threads/processes → Queue.Queue (threads) or multiprocessing.Queue (processes): safe data structures designed for producer-consumer patterns.
  • Parallelizing a function easily → multiprocessing.Pool: a high-level API that manages a pool of workers and hides the complexity of raw Process objects.

The Modern Path: Python 3

It's important to know that Python 3 has refined these concepts:

  • Python 2's Queue module was renamed to the lowercase queue; multiprocessing still provides its own process-safe Queue.
  • The concurrent.futures module provides a high-level, unified interface for both ThreadPoolExecutor and ProcessPoolExecutor, making the choice even cleaner (see the sketch after this list).
  • The asyncio module offers a different concurrency model based on single-threaded asynchronous I/O, which is extremely powerful for high-performance network services.
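For comparison, here is a minimal Python 3 sketch of the earlier downloader using concurrent.futures; swapping ThreadPoolExecutor for ProcessPoolExecutor switches it from threads to processes with no other changes:

# futures_downloader.py (Python 3)
import time
from urllib.request import urlopen  # Python 3's replacement for urllib2
from concurrent.futures import ThreadPoolExecutor, as_completed
def download_url(url):
    return url, len(urlopen(url).read())
if __name__ == "__main__":
    urls = ['http://www.python.org', 'http://www.apache.org']
    start = time.time()
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(download_url, u) for u in urls]
        for future in as_completed(futures):
            url, size = future.result()
            print("%s: %d bytes" % (url, size))
    print("Total time: %f seconds" % (time.time() - start))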

However, if you are maintaining or working on a Python 2 codebase, understanding threading and multiprocessing is essential.
