Of course! Let's dive deep into Python multithreading. This is a fundamental concept for writing concurrent applications.
What is Multithreading?
At its core, a thread is the smallest sequence of programmed instructions that can be managed independently by a scheduler. A process is an instance of a program in execution. Every process has at least one thread, called the main thread.
Multithreading is the technique of using multiple threads within a single process to execute tasks concurrently. Think of it like a chef in a kitchen:
- Single-threaded Chef: The chef does everything one at a time: chops vegetables, then puts them on the stove, then plates the dish. While waiting for water to boil, they just stand there idle.
- Multithreaded Chef: The chef delegates tasks. They start boiling water (Thread 1), while simultaneously chopping vegetables (Thread 2). When the water is boiling, they stop chopping, add the pasta, and then go back to chopping. This is much more efficient.
Why Use Multithreading? (The Pros and Cons)
Use Cases (When to use it):
- I/O-Bound Tasks: This is the primary use case for Python's multithreading. These are tasks that spend most of their time waiting for external operations to complete.
  - Examples: network requests (API calls, web scraping), reading/writing files on disk, database queries.
  - Why it works: While one thread is waiting for a network response, the Python Global Interpreter Lock (GIL) is released, allowing another thread to run. This way, your program isn't idle; it's doing other useful work.
- Responsive GUI Applications: To keep a user interface responsive while a background task is running (e.g., downloading a file in the background).
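To make the I/O-bound point concrete, here is a minimal sketch that simulates five blocking I/O waits with time.sleep (a stand-in for real network or disk calls); because sleeping threads release the GIL, the waits overlap:

```python
import threading
import time

def io_task(name):
    # Simulate a blocking I/O wait. The GIL is released while sleeping,
    # so the other threads can run in the meantime.
    time.sleep(0.2)

start = time.perf_counter()
threads = [threading.Thread(target=io_task, args=(f"task-{i}",)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Five 0.2 s waits overlap, so total time is close to 0.2 s, not 1 s.
print(f"Elapsed: {elapsed:.2f}s")
```

Run sequentially, the same five waits would take about a second; threaded, the wall-clock time stays near a single wait.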
Why NOT to Use It (The Pitfalls):
- The Global Interpreter Lock (GIL): This is the most critical concept to understand in Python multithreading.
  - The GIL is a mutex (a lock) that protects access to Python objects, preventing multiple native threads from executing Python bytecode at the same time within a single process.
  - Implication: True parallelism on multi-core processors for CPU-bound tasks is not achieved with threads. Only one thread can execute Python bytecode at any given moment.
  - Analogy: Imagine a store with several cashiers (threads) but a single checkout counter (the interpreter). Only one cashier can use the counter at a time; the GIL enforces this.
- CPU-Bound Tasks: These are tasks that require heavy computation.
  - Examples: mathematical calculations, image/video processing, data compression.
  - Why it's bad: Because of the GIL, multiple threads simply take turns on the CPU; they don't run simultaneously. The overhead of switching between threads can even make your program slower than a single-threaded version. For CPU-bound tasks, use multiprocessing, which creates separate processes, each with its own GIL and Python interpreter.
- Complexity and Synchronization Issues: Managing shared data between threads can lead to bugs that are incredibly hard to find and fix, such as:
  - Race Conditions: two threads read and write the same data at the same time, producing inconsistent results.
  - Deadlocks: two or more threads are blocked forever, each waiting for the other to release a resource.
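As a hedged sketch of the CPU-bound pitfall: the busy loops below all run to completion across four threads, but because the GIL serializes pure-Python bytecode, they take turns on the interpreter rather than running in parallel (exact timings vary by machine, so this sketch only checks that the work finishes):

```python
import threading

def count_down(n):
    # A pure-Python busy loop: CPU-bound, so the GIL serializes it.
    while n > 0:
        n -= 1
    return n

results = []

def worker():
    # list.append is thread-safe in CPython, so no lock is needed here.
    results.append(count_down(1_000_000))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All four loops completed, but they shared one interpreter, so the
# wall-clock time is roughly the same as running them back-to-back.
print(len(results))  # → 4
```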
How to Use Multithreading in Python: The threading Module
Python's built-in threading module is the standard way to work with threads.
The Basic Approach: Thread Class
You create a Thread object, passing it a function (or a callable object) to run in that thread.
```python
import threading
import time
import os

def print_numbers():
    """A simple function that prints numbers."""
    thread_id = threading.get_ident()
    process_id = os.getpid()
    for i in range(1, 6):
        print(f"Thread {thread_id} (Process {process_id}): Count {i}")
        time.sleep(0.5)  # Simulate an I/O operation

def print_letters():
    """Another simple function that prints letters."""
    thread_id = threading.get_ident()
    process_id = os.getpid()
    for letter in 'ABCDE':
        print(f"Thread {thread_id} (Process {process_id}): Letter {letter}")
        time.sleep(0.5)  # Simulate an I/O operation

# --- Main execution ---
if __name__ == "__main__":
    print(f"Main Process ID: {os.getpid()}")

    # Create two thread objects
    thread1 = threading.Thread(target=print_numbers)
    thread2 = threading.Thread(target=print_letters)

    # Start the threads
    print("Starting threads...")
    thread1.start()
    thread2.start()

    # Wait for both threads to complete their execution.
    # This is crucial! Otherwise, the main program might exit before the threads are done.
    print("Waiting for threads to finish...")
    thread1.join()
    thread2.join()

    print("All threads finished.")
```
Output (the exact interleaving varies between runs):

```
Main Process ID: 12345
Starting threads...
Thread 140123456789120 (Process 12345): Count 1
Thread 140123456789232 (Process 12345): Letter A
Waiting for threads to finish...
Thread 140123456789120 (Process 12345): Count 2
Thread 140123456789232 (Process 12345): Letter B
Thread 140123456789120 (Process 12345): Count 3
Thread 140123456789232 (Process 12345): Letter C
Thread 140123456789120 (Process 12345): Count 4
Thread 140123456789232 (Process 12345): Letter D
Thread 140123456789120 (Process 12345): Count 5
Thread 140123456789232 (Process 12345): Letter E
All threads finished.
```
Notice how the output is interleaved. This is concurrency in action: both threads make progress even though they run in the same process.
Passing Arguments to Threads
You can pass arguments to your target function using the args keyword (as a tuple) or kwargs (as a dictionary).
```python
import threading

def worker(num):
    print(f"Worker {num} is running")

threads = []
for i in range(5):
    # Create a thread for each number; args must be a tuple.
    thread = threading.Thread(target=worker, args=(i,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All worker threads have completed.")
```
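The same pattern works with keyword arguments via kwargs; a small sketch with a hypothetical greeting parameter (results are collected in a shared list, which is safe here because list.append is atomic in CPython):

```python
import threading

results = []

def worker(num, greeting="Hello"):
    # args arrive as a tuple, kwargs as a dict.
    results.append(f"{greeting}, worker {num}")

threads = [
    threading.Thread(target=worker, args=(i,), kwargs={"greeting": "Hi"})
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Sort because the threads may finish in any order.
print(sorted(results))  # → ['Hi, worker 0', 'Hi, worker 1', 'Hi, worker 2']
```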
Synchronization: The Lock
When multiple threads access and modify the same shared data, you need a Lock to prevent race conditions.
Imagine a bank account with two threads trying to deposit money simultaneously.
```python
import threading
import time

class BankAccount:
    def __init__(self):
        self.balance = 0
        self.lock = threading.Lock()  # Create a lock object

    def deposit(self, amount):
        # Acquire the lock. If another thread holds it, this will wait.
        with self.lock:
            print(f"Depositing {amount}. Current balance: {self.balance}")
            current_balance = self.balance
            # Simulate a delay where another thread could interfere
            time.sleep(0.1)
            self.balance = current_balance + amount
            print(f"New balance after deposit: {self.balance}")

account = BankAccount()

def make_deposit(amount):
    account.deposit(amount)

# Create two threads that try to deposit 100 at the same time
thread1 = threading.Thread(target=make_deposit, args=(100,))
thread2 = threading.Thread(target=make_deposit, args=(100,))

thread1.start()
thread2.start()
thread1.join()
thread2.join()

print(f"Final balance: {account.balance}")
```
Without the lock, both threads could read the same starting balance and the final balance could be 100 due to a race condition. With the lock, the final balance is always 200:

```
Depositing 100. Current balance: 0
New balance after deposit: 100
Depositing 100. Current balance: 100
New balance after deposit: 200
Final balance: 200
```
The with self.lock: statement ensures that only one thread can execute the code inside the block at a time.
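The with block is shorthand for an acquire/release pair; a sketch of the manual equivalent, which needs try/finally to guarantee the lock is released even if the critical section raises:

```python
import threading

lock = threading.Lock()
counter = 0

def increment():
    global counter
    lock.acquire()
    try:
        # Critical section: only one thread at a time can be here.
        counter += 1
    finally:
        # Always release, even if the critical section raises.
        lock.release()

threads = [threading.Thread(target=increment) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # → 100
```

In practice, prefer the with form: it is shorter and impossible to forget the release.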
The High-Level Approach: concurrent.futures.ThreadPoolExecutor
For most common use cases, especially I/O-bound ones, the ThreadPoolExecutor is a much more convenient and modern way to manage a pool of threads. It abstracts away the manual creation and joining of threads.
```python
import concurrent.futures
import requests  # Third-party: pip install requests

def fetch_url(url):
    """Fetches a URL and returns its length."""
    try:
        print(f"Fetching {url}...")
        response = requests.get(url, timeout=5)
        print(f"Fetched {url}. Length: {len(response.text)}")
        return len(response.text)
    except requests.RequestException as e:
        print(f"Error fetching {url}: {e}")
        return 0

urls = [
    'https://www.python.org',
    'https://www.google.com',
    'https://www.github.com',
    'https://nonexistent.url.example',  # This will fail
]

# Using a ThreadPoolExecutor with a context manager
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # map() applies the function to each item in the iterable, returning results in order:
    # results = executor.map(fetch_url, urls)

    # submit() is more flexible; it returns a Future object for each task
    futures = [executor.submit(fetch_url, url) for url in urls]

    # Process results as they complete
    for future in concurrent.futures.as_completed(futures):
        try:
            result_length = future.result()
            print(f"Task completed. Result: {result_length}")
        except Exception as e:
            print(f"A task generated an exception: {e}")

print("All URL fetching tasks are done.")
```
Key advantages of ThreadPoolExecutor:
- Simpler Syntax: No need to manually create and join Thread objects.
- Resource Management: The with statement ensures threads are properly cleaned up.
- Result Handling: as_completed() lets you process results as soon as they are ready, which is very efficient.
- Exception Handling: It's easier to catch exceptions from individual tasks.
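For comparison with submit()/as_completed(), here is a minimal executor.map sketch using a pure-Python function instead of network calls; map() yields results in input order, even if tasks finish out of order:

```python
import concurrent.futures

def square(n):
    return n * n

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # map() returns a lazy iterator; materialize it before the pool shuts down.
    results = list(executor.map(square, [1, 2, 3, 4]))

print(results)  # → [1, 4, 9, 16]
```

Use map() when you want ordered results for a uniform batch of inputs, and submit() when you need per-task control or completion-order processing.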
Summary: Multithreading vs. Multiprocessing
| Feature | Multithreading (threading) | Multiprocessing (multiprocessing) |
|---|---|---|
| What it is | Multiple threads in a single process. | Multiple processes, each with its own memory. |
| Memory | Threads share the same memory space. | Processes have separate memory spaces. |
| GIL Impact | Limited by the GIL. No true parallelism for CPU-bound tasks. | Bypasses the GIL. Achieves true parallelism on multi-core CPUs. |
| Best For | I/O-bound tasks (network, disk). | CPU-bound tasks (computation, data processing). |
| Communication | Easy (shared variables), but risky (needs locks). | Harder (requires Inter-Process Communication like Queues, Pipes). |
| Overhead | Low. Creating a thread is cheap. | High. Creating a process is more expensive. |
| Debugging | Can be complex due to shared state. | Generally simpler due to isolated memory. |
Final Recommendation
- For I/O-Bound Work: Use concurrent.futures.ThreadPoolExecutor. It's the modern, efficient, and easy-to-use choice.
- For CPU-Bound Work: Use multiprocessing.Pool or concurrent.futures.ProcessPoolExecutor. This is the only way to leverage multiple CPU cores in Python for heavy computation.
- For Low-Level Control: If you have very specific needs, the threading.Thread class gives you fine-grained control, but be prepared to handle synchronization manually.
