杰瑞科技汇

How do you use Python urlgrabber?

urlgrabber is a long-standing Python library for downloading files from URLs. It has largely been superseded by the more modern requests library, but it is still used in legacy systems (such as the yum package manager) and is worth understanding.

This guide covers what urlgrabber is, how to install it, and how to use its core features.


What is urlgrabber?

urlgrabber is a Python module designed to make it easy to grab files from remote URLs. Its key strengths are:

  1. Progress Bar: It has a built-in, easy-to-use progress bar, which is a huge plus for command-line applications.
  2. Resume Capability: It can resume interrupted downloads when you ask it to (through the reget option), which is very useful for large files.
  3. Robustness: It handles various URL schemes (http, https, ftp) and has built-in retry logic for transient network errors.
  4. Integration: It was the go-to library for tools like yum and anaconda because of its reliability and feature set.
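
The retry behaviour mentioned in item 3 is, as far as I know, controlled in urlgrabber by a retry keyword option. The idea itself is simple enough to sketch in plain Python. The helper below is a hypothetical illustration of retry-on-transient-error, not urlgrabber's own implementation; with_retries, fetch, and retry_on are names made up for this example.

```python
import time


def with_retries(fetch, attempts=3, delay=0.0, retry_on=(OSError,)):
    """Call fetch() until it succeeds or the attempts are used up.

    Hypothetical helper: it only illustrates the retry idea and is
    not part of urlgrabber.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on as exc:
            # Remember the failure and pause before the next attempt
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
    # Every attempt failed: re-raise the last error we saw
    raise last_error
```

A caller would wrap the download call in a zero-argument function (or a lambda) and pass it as fetch.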

Installation

You can install urlgrabber using pip. Recent versions depend on pycurl, so on some systems you may need to install the libcurl and Python development packages first.

# Using pip
pip install urlgrabber
# On systems like CentOS/RHEL, you might also need:
# sudo yum install libcurl-devel python-devel

Core Usage Examples

Let's start with the most common use case: downloading a file.

Basic Download

The main function is urlgrab(). You give it a URL and a local filename, and it downloads the file.

import urlgrabber
url = 'https://www.python.org/static/img/python-logo.png'
local_file = 'python-logo.png'
try:
    print(f"Downloading {url} to {local_file}...")
    # The urlgrab() function does the work
    urlgrabber.urlgrab(url, local_file)
    print("Download complete!")
except urlgrabber.grabber.URLGrabError as e:
    print(f"An error occurred: {e}")

Letting urlgrabber Choose the Filename

Sometimes you don't want to specify the output filename. If you omit it, urlgrab() names the file after the last path component of the URL, saves it in the current working directory, and returns that path. It does not create a temporary file, so an existing file with the same name will be overwritten.

import os
import urlgrabber
url = 'https://www.python.org/static/img/python-logo.png'
try:
    print(f"Downloading {url} without naming the output file...")
    # urlgrab() returns the path of the downloaded file
    # (here 'python-logo.png', taken from the URL)
    saved_path = urlgrabber.urlgrab(url)
    print(f"Downloaded to: {saved_path}")
    # You can now work with the file...
    # For example, print its size
    print(f"File size: {os.path.getsize(saved_path)} bytes")
    # Remove the file if you only needed it briefly
    os.remove(saved_path)
    print("Downloaded file removed.")
except urlgrabber.grabber.URLGrabError as e:
    print(f"An error occurred: {e}")
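
When no filename is passed, urlgrab() appears to derive one from the URL's path rather than inventing a random name. The sketch below approximates that rule with the standard library so you can predict where a file will land. default_filename is a hypothetical helper, and the 'index.html' fallback reflects my reading of the library source rather than a documented guarantee.

```python
import posixpath
from urllib.parse import unquote, urlparse


def default_filename(url):
    """Approximate the name urlgrab() picks when no filename is given.

    Hypothetical helper for illustration; not part of urlgrabber.
    """
    # Take the last component of the (percent-decoded) URL path
    name = posixpath.basename(unquote(urlparse(url).path))
    # Fall back to a generic name when the path ends in '/'
    return name or "index.html"


if __name__ == "__main__":
    print(default_filename("https://www.python.org/static/img/python-logo.png"))
```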

Using the Progress Bar

One of the best features is the simple progress bar. It is off by default; you enable it by passing a meter object through the progress_obj option.

import urlgrabber
from urlgrabber.progress import text_progress_meter
url = 'https://www.python.org/ftp/python/3.11.4/Python-3.11.4.tgz'
local_file = 'Python-3.11.4.tgz'
try:
    print(f"Downloading {url} with a progress bar...")
    # No meter is shown unless you supply one.
    # text_progress_meter() draws a simple one-line text meter.
    urlgrabber.urlgrab(url, local_file, progress_obj=text_progress_meter())
    print("\nDownload complete!")
except urlgrabber.grabber.URLGrabError as e:
    print(f"An error occurred: {e}")

When you run this, you'll see a progress bar in your terminal like:

Downloading https://www.python.org/ftp/python/3.11.4/Python-3.11.4.tgz with a progress bar...
Python-3.11.4.tgz   12.01 MB/s |  24 MB  |   00:02    
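
Conceptually, a progress meter is just a callback that receives the running byte count as chunks arrive. The standard-library sketch below shows that idea in isolation. copy_with_progress and its report parameter are hypothetical names for illustration and are not part of urlgrabber.

```python
import io


def copy_with_progress(src, dst, total, chunk_size=8192, report=print):
    """Copy src to dst in chunks, reporting the running byte count."""
    done = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        done += len(chunk)
        # Guard against a missing or zero total size
        percent = done * 100 // total if total else 0
        report(f"{done}/{total} bytes ({percent}%)")
    return done


if __name__ == "__main__":
    # In-memory stand-in for a network response body
    payload = io.BytesIO(b"x" * 20000)
    copy_with_progress(payload, io.BytesIO(), total=20000)
```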

Resuming a Download

If a download is interrupted, you can resume it from where it left off by passing reget='simple'. Without that option, urlgrab() starts over and overwrites the local file.

import urlgrabber
import os
url = 'https://www.python.org/ftp/python/3.11.4/Python-3.11.4.tgz'
local_file = 'Python-3.11.4.tgz'
# First, let's create a dummy, small file to simulate an interruption.
# In a real scenario, this would be a partially downloaded file.
# Because these bytes are not the real start of the archive, the
# resumed file will not be a valid tarball; this only shows the mechanics.
with open(local_file, 'wb') as f:
    f.write(b"This is some partial data.")
print(f"Simulating an interrupted download. File size: {os.path.getsize(local_file)} bytes")
try:
    print("Attempting to resume download...")
    # With reget='simple', urlgrabber trusts the existing local bytes
    # and requests only the remainder using an HTTP range request.
    urlgrabber.urlgrab(url, local_file, reget='simple')
    print("\nDownload (or resume) complete!")
except urlgrabber.grabber.URLGrabError as e:
    print(f"An error occurred: {e}")
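
Resuming relies on the HTTP Range header: the client asks the server to start sending at the byte offset it already has on disk. The sketch below shows how such a header can be built from the size of a partial file. resume_headers is a hypothetical helper written for this guide, not a urlgrabber function, and it assumes the server honours range requests.

```python
import os


def resume_headers(local_path):
    """Return the request headers a simple resume would send.

    Hypothetical helper for illustration; not part of urlgrabber.
    """
    # Nothing on disk yet: download from the beginning, no Range header
    if not os.path.exists(local_path):
        return {}
    offset = os.path.getsize(local_path)
    # Ask for everything from the first missing byte onward
    return {"Range": f"bytes={offset}-"} if offset else {}
```

The same header is what you would set by hand when implementing resume with a general-purpose HTTP library.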

Adding Download Options (e.g., User-Agent)

You can pass keyword options to urlgrab() to control the download behavior. A common use case is setting a User-Agent.

import urlgrabber
url = 'https://httpbin.org/user-agent' # A URL that shows your user agent
local_file = 'user_agent_response.txt'
# Options are ordinary keyword arguments; collecting them in a dict
# and unpacking it with ** keeps them reusable across calls
options = {
    'user_agent': 'MyCoolScript/1.0 (https://mycoolscript.com)'
}
try:
    print(f"Downloading {url} with a custom User-Agent...")
    urlgrabber.urlgrab(url, local_file, **options)
    # Read and print the response to see the user agent
    with open(local_file, 'r') as f:
        print("Response from server:")
        print(f.read())
except urlgrabber.grabber.URLGrabError as e:
    print(f"An error occurred: {e}")

urlgrabber vs. requests

For any new project, requests is the recommended library. Here’s a quick comparison to understand why.

Feature | urlgrabber | requests
Ease of Use | Simple for basic downloads. More complex for custom headers. | Extremely simple and intuitive. The de facto standard.
Progress Bar | Built-in and easy. A major advantage for CLI apps. | Not built-in. Requires a third-party library like tqdm.
Resume | Built-in. Very easy to use. | Not built-in. Requires manual implementation with Range headers.
Features | Focused on downloading. Good for its purpose. | A full-featured HTTP library for making any kind of request.
Status | Legacy. Not actively developed. | Active. Continuously updated and maintained.
Modern Python | Can feel a bit dated. | The modern standard, widely used and supported.

Conclusion: Use urlgrabber if you need its specific, built-in features (especially progress bar and resume) and are working on a legacy project. For any new development, use requests and add tqdm for a progress bar if needed.
