杰瑞科技汇

How Does Python's reporthook Display Download Progress?

Python's reporthook is an often-overlooked feature for tracking the progress of long-running downloads.

What is a reporthook?

A reporthook is a callback function that you can pass to certain functions in Python's standard library, most notably urllib.request.urlretrieve. urlretrieve calls it once when the connection is established and again after each block of data is read, which lets you display progress or log information. The hook only reports on the transfer; resuming an interrupted download is outside what it can do and needs a manual download loop.

Think of it as a progress reporter for your download.


The Core Function: urllib.request.urlretrieve

The primary place you'll use a reporthook is with urllib.request.urlretrieve. This function downloads a file from a URL and saves it locally.

Signature:

urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)
  • url: The URL of the file to download.
  • filename: The local path to save the file. If None, it's saved in a temporary file.
  • reporthook: This is the function we're interested in. It's called once when the network connection is established and once after each block of data is read.
  • data: Optional data to send to the server (e.g., for POST requests).
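
As a minimal sketch of the call itself, the snippet below retrieves a local file through a file:// URL so that it runs without network access. The helper name fetch_local_copy, the recording hook, and the temporary file names are assumptions made for illustration; only urlretrieve and its parameters come from the standard library.

```python
import tempfile
import urllib.request
from pathlib import Path


def fetch_local_copy(payload: bytes):
    """Copy a temporary file via urlretrieve and record every reporthook call."""
    calls = []

    def record(block_num, block_size, total_size):
        # Store the raw arguments so they can be inspected afterwards
        calls.append((block_num, block_size, total_size))

    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "source.bin"
        target = Path(tmp) / "copy.bin"
        source.write_bytes(payload)

        # urlretrieve returns a (filename, headers) tuple
        saved_path, headers = urllib.request.urlretrieve(
            source.as_uri(), filename=str(target), reporthook=record
        )
        copied = Path(saved_path).read_bytes()

    return copied, calls


copied, calls = fetch_local_copy(b"x" * 20_000)
print(len(copied), "bytes copied in", len(calls), "hook calls")
```

The recorded tuples make it easy to see what a hook receives before writing any display logic around it.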

How to Write a reporthook Function

A reporthook function must accept three positional arguments:

  1. block_num: The number of blocks of data that have been transferred so far. Starts at 0.
  2. block_size: The size (in bytes) of each block of data.
  3. total_size: The total size of the file in bytes. Important: This value is -1 (not None) if the server doesn't provide a Content-Length header.
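
To make the three arguments concrete, here is a small sketch that turns them into a status string without touching the network. The function name describe_progress and the sample numbers are hypothetical; the sketch assumes that urlretrieve reports an unknown size as -1 and that the last block may be only partly filled.

```python
def describe_progress(block_num, block_size, total_size):
    """Turn the three reporthook arguments into a short status string."""
    downloaded = block_num * block_size
    if total_size > 0:
        # The final block is usually partial, so cap the estimate at the total
        downloaded = min(downloaded, total_size)
        return f"{downloaded / total_size:.0%} ({downloaded}/{total_size} bytes)"
    # Unknown size (reported as -1): fall back to a plain byte count
    return f"{downloaded} bytes"


print(describe_progress(0, 8192, 20000))   # 0% (0/20000 bytes)
print(describe_progress(3, 8192, 20000))   # 100% (20000/20000 bytes)
print(describe_progress(3, 8192, -1))      # 24576 bytes
```

Keeping the formatting in a pure function like this makes the hook itself a one-liner and easy to check in isolation.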

Basic reporthook Example

Here is a simple reporthook that prints the percentage of the file downloaded.

import urllib.request
import sys

def simple_reporthook(block_num, block_size, total_size):
    """
    A simple reporthook to print download progress.
    """
    downloaded = block_num * block_size
    if total_size > 0:
        # The last block is usually partial, so cap the estimate at the total
        downloaded = min(downloaded, total_size)
        percent = (downloaded / total_size) * 100
        # Use carriage return (\r) to overwrite the previous line
        sys.stdout.write(f"\rDownloaded: {percent:.2f}%")
    else:
        # total_size is -1 when unknown, so just show downloaded bytes
        sys.stdout.write(f"\rDownloaded: {downloaded} bytes")
    sys.stdout.flush()

# --- Main part of the script ---
file_url = "https://www.python.org/static/img/python-logo.png"
local_filename = "python-logo.png"

print(f"Starting download of {file_url}...")
# Pass our function as the reporthook argument
urllib.request.urlretrieve(file_url, local_filename, reporthook=simple_reporthook)
print(f"\nDownload complete! File saved as {local_filename}")

How it works:

  1. We define simple_reporthook with the required three arguments.
  2. Inside the function, we calculate the total bytes downloaded so far (block_num * block_size).
  3. If the server provided a total_size, we calculate the percentage.
  4. sys.stdout.write("\r...") first moves the cursor back to the beginning of the line (\r) and then prints the progress, so each update overwrites the previous one and creates a "live" progress display.
  5. sys.stdout.flush() ensures the output is written to the console immediately, even if it's not a full line.
  6. We pass simple_reporthook directly to the reporthook argument of urlretrieve.

Advanced reporthook Example (with TQDM)

While the simple example works, it's not very fancy. A much better way to display progress is using the tqdm library, which creates beautiful, feature-rich progress bars.

First, install tqdm:

pip install tqdm

Now, let's create a reporthook that integrates with tqdm.

import urllib.request
from tqdm import tqdm

def create_tqdm_hook(tqdm_instance):
    """
    Creates and returns a reporthook function that updates a tqdm progress bar.
    """
    def hook(block_num, block_size, total_size):
        # total_size is -1 when the server sends no Content-Length
        if total_size > 0:
            tqdm_instance.total = total_size
        downloaded = block_num * block_size
        if total_size > 0:
            # The last block is usually partial, so cap the estimate
            downloaded = min(downloaded, total_size)
        # update() expects an increment, so pass the change since the last call
        tqdm_instance.update(downloaded - tqdm_instance.n)
    return hook

# --- Main part of the script ---
file_url = "https://www.python.org/static/img/python-logo.png"
local_filename = "python-logo.png"

# 1. Create a tqdm progress bar instance
#    The total starts as unknown; the hook fills it in from total_size
with tqdm(
    total=None,
    unit='B',
    unit_scale=True,
    unit_divisor=1024,
    desc="Downloading Python Logo",
    ascii=True # Use basic characters for better compatibility
) as progress_bar:
    # 2. Create the reporthook using our factory function
    reporthook = create_tqdm_hook(progress_bar)
    # 3. Start the download; the with block closes the bar afterwards
    urllib.request.urlretrieve(file_url, local_filename, reporthook=reporthook)

print(f"\nDownload complete! File saved as {local_filename}")

Why this advanced example is better:

  • Visual Appeal: tqdm provides a clean, professional-looking progress bar.
  • Handles Unknown Total Size: When the bar is created with total=None, tqdm shows the running byte count, elapsed time, and transfer rate instead of a percentage.
  • Separation of Concerns: The create_tqdm_hook factory function cleanly separates the logic of creating the hook from the tqdm bar itself.
  • Rich Features: tqdm can estimate time remaining, show speed, and more.
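
One detail worth checking in any tqdm-based hook: reporthook reports a cumulative block count, while tqdm's update() expects an increment. The sketch below shows the conversion using a hypothetical stand-in class, FakeBar, that mimics only the n and total attributes and the update() method, so it runs without tqdm installed. The names FakeBar and make_hook and the sample sizes are assumptions for illustration.

```python
class FakeBar:
    """Hypothetical stand-in exposing the tqdm members the hook relies on."""

    def __init__(self):
        self.n = 0          # bytes counted so far, like tqdm's `n`
        self.total = None   # unknown until the hook learns the size

    def update(self, increment):
        self.n += increment


def make_hook(bar):
    """Build a reporthook that feeds increments (not totals) to `bar`."""
    def hook(block_num, block_size, total_size):
        if total_size > 0:
            bar.total = total_size
        downloaded = block_num * block_size
        if total_size > 0:
            # Cap the estimate because the last block is usually partial
            downloaded = min(downloaded, total_size)
        bar.update(downloaded - bar.n)
    return hook


bar = FakeBar()
hook = make_hook(bar)
for block_num in range(4):      # calls 0..3, as for a 20000-byte file
    hook(block_num, 8192, 20000)
print(bar.n, bar.total)         # 20000 20000
```

Passing block_size to update() on every call would instead count 4 * 8192 = 32768 bytes for this 20000-byte file, because the initial call and the partial last block are both counted as full blocks.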

When reporthook is Called

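The reporthook is called once when the network connection is established (with block_num equal to 0) and then once after each block is read. urlretrieve chooses the block size itself (8192 bytes in current CPython) and offers no parameter to change it, so the number of calls grows with the file size. Because the last block is usually only partly filled, block_num * block_size can exceed total_size on the final call.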

Example Flow: Let's say you download a 1000-byte file with a block_size of 100 bytes.

Call # | block_num | block_size | total_size | Bytes downloaded (block_num * block_size)
1      | 0         | 100        | 1000       | 0
2      | 1         | 100        | 1000       | 100
3      | 2         | 100        | 1000       | 200
...    | ...       | ...        | ...        | ...
10     | 9         | 100        | 1000       | 900
11     | 10        | 100        | 1000       | 1000
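
The same call sequence can be reproduced with a small simulation. This is a sketch under the assumption that every read returns a full block until the data runs out; simulate_hook_calls is a hypothetical helper, not part of urllib.

```python
def simulate_hook_calls(file_size, block_size):
    """Return the (block_num, block_size, total_size) tuples a hook would see."""
    calls = [(0, block_size, file_size)]  # initial call before any data is read
    read = 0
    block_num = 0
    while read < file_size:
        read += min(block_size, file_size - read)
        block_num += 1
        calls.append((block_num, block_size, file_size))
    return calls


for call_number, (block_num, block_size, total_size) in enumerate(
    simulate_hook_calls(1000, 100), start=1
):
    print(call_number, block_num, block_size, total_size, block_num * block_size)
```

For a 1000-byte file and 100-byte blocks this prints 11 rows, one for the initial call and one per block.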

Alternatives to reporthook

While reporthook is convenient with urlretrieve, the Python documentation lists urlretrieve as a legacy interface that might become deprecated in the future. For more explicit control, read from urllib.request.urlopen in a loop.

Modern Approach with urlopen and a while loop:

This method gives you full control over the download process and is generally more flexible.

import urllib.request
from tqdm import tqdm

file_url = "https://www.python.org/static/img/python-logo.png"
local_filename = "python-logo.png"

# Open the URL once; the same response provides both the size and the data
with urllib.request.urlopen(file_url) as response, open(local_filename, 'wb') as out_file:
    # Content-Length may be missing; total=None makes tqdm show a plain byte count
    content_length = response.headers.get('Content-Length')
    total_size = int(content_length) if content_length else None
    # The with block closes the progress bar even if the download fails
    with tqdm(
        total=total_size,
        unit='B',
        unit_scale=True,
        desc="Downloading (Modern Method)",
        ascii=True
    ) as progress:
        # Read and write in chunks
        while True:
            chunk = response.read(8192) # Read 8 KiB at a time
            if not chunk:
                break
            out_file.write(chunk)
            progress.update(len(chunk))

print(f"\nDownload complete! File saved as {local_filename}")
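
The read/write loop can also be separated from the display, which makes it testable without a network connection. The sketch below uses in-memory streams in place of the HTTP response and the output file; copy_with_progress and its on_progress callback are hypothetical names chosen for illustration.

```python
import io


def copy_with_progress(source, destination, on_progress, chunk_size=8192):
    """Copy `source` to `destination` chunk by chunk, reporting bytes written."""
    written = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        destination.write(chunk)
        written += len(chunk)
        on_progress(written)  # cumulative byte count after this chunk
    return written


source = io.BytesIO(b"x" * 20_000)
destination = io.BytesIO()
seen = []
total = copy_with_progress(source, destination, seen.append)
print(total, seen)  # 20000 [8192, 16384, 20000]
```

Unlike the reporthook arguments, the count reported here is exact, because len(chunk) reflects the real size of the final partial chunk.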

Comparison: reporthook vs. urlopen loop

Feature | urlretrieve with reporthook | urlopen with while loop
Simplicity | Very simple. One function call. | More verbose, requires manual file handling.
Control | Low. You get callbacks, but not direct stream control. | High. You control the read/write loop entirely.
Flexibility | Low. Tied to the urlretrieve function. | High. Can be integrated with any custom logic.
Best For | Quick and simple scripts where you just need a progress indicator. | Complex downloads, resuming, custom processing of chunks, or when urlretrieve is not available.
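
As an illustration of the "resuming" case, a manual loop can ask the server for only the missing bytes by sending an HTTP Range header. The sketch below only builds the request and does not contact any server. The helper build_resume_request, the example URL, and the .part file name are assumptions, and the approach works only if the server supports range requests.

```python
import os
import urllib.request


def build_resume_request(url, partial_path):
    """Build a Request asking only for the bytes missing from `partial_path`."""
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if offset:
        # Ask the server to start at the first byte we do not have yet
        request.add_header("Range", f"bytes={offset}-")
    return request, offset


# Hypothetical URL and file name, used only to show the call
request, offset = build_resume_request("https://example.com/file.bin", "file.bin.part")
print(offset, request.get_header("Range"))
```

When the request is opened with urlopen, a status of 206 indicates that the server honored the range and the data can be appended to the partial file; a status of 200 means the full file is being sent and the download should start over.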

Summary

  • A reporthook is a callback function for monitoring downloads.
  • It's primarily used with urllib.request.urlretrieve.
  • It receives three arguments: block_num, block_size, and total_size.
  • For simple needs, a print with \r works.
  • For a professional look and feel, use the tqdm library.
  • While reporthook is convenient, the modern urlopen with a while loop offers more control and is often preferred in larger applications.