杰瑞科技汇

How do you extract file contents with extractfile in Python?

In Python, extractfile() is a method of TarFile objects in the tarfile module. The zipfile module has no extractfile() method; its counterpart is ZipFile.open(), which likewise returns a file-like object. Both are useful for reading the contents of a file directly from an archive without extracting it to your disk first.

I'll cover this in detail, and also show you how to extract all files from a ZIP archive, which is a more common task.


The Core Concept: ZipFile.open() and TarFile.extractfile()

Both ZipFile.open() and TarFile.extractfile() return a binary file-like object. This means you can read the content of a file inside the archive as if you had opened a regular file on your computer in 'rb' mode. The data is decompressed as you read it, so only the member you ask for is processed, and you can read it in chunks instead of loading it all at once.

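Key Parameters of ZipFile.open():

  • name: The name of the file you want to read from the ZIP archive, or a ZipInfo object describing it.
  • mode (optional): 'r' (the default) to read the member, or 'w' to write a new one.
  • pwd (optional): The password, as bytes, for an encrypted member.

The path to the archive itself (e.g., 'my_archive.zip') is passed to zipfile.ZipFile(), not to open().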

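What it Returns:

  • ZipFile.open(): a file-like object (zipfile.ZipExtFile) if the member exists. It raises KeyError if the member does not exist.
  • TarFile.extractfile(): a file-like object (io.BufferedReader) if the member is a regular file or a link, None for any other existing member such as a directory, and KeyError if the member does not exist.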
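
The behaviour for existing, non-file, and missing members can be checked against small in-memory archives. The following is a minimal sketch, not a definitive reference: the member names (notes.txt, docs, missing.txt) and the helper build_tar() are made up for illustration.

```python
import io
import tarfile
import zipfile


def build_tar() -> bytes:
    """Build a tiny tar archive in memory (hypothetical member names)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        payload = b"hello from tar"
        info = tarfile.TarInfo(name="notes.txt")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
        # A directory entry exists in the archive but is not a regular file
        dir_info = tarfile.TarInfo(name="docs")
        dir_info.type = tarfile.DIRTYPE
        tf.addfile(dir_info)
    return buf.getvalue()


with tarfile.open(fileobj=io.BytesIO(build_tar()), mode="r") as tf:
    regular = tf.extractfile("notes.txt")  # file-like object
    tar_text = regular.read().decode("utf-8")
    directory = tf.extractfile("docs")     # None: not a regular file or link
    try:
        tf.extractfile("missing.txt")
        tar_missing = "found"
    except KeyError:
        tar_missing = "KeyError"

# The ZIP counterpart: ZipFile.open() raises KeyError for unknown names
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("notes.txt", "hello from zip")
with zipfile.ZipFile(zip_buf, "r") as zf:
    with zf.open("notes.txt") as member:
        zip_text = member.read().decode("utf-8")
    try:
        zf.open("missing.txt")
        zip_missing = "found"
    except KeyError:
        zip_missing = "KeyError"

print(tar_text, directory, tar_missing)
print(zip_text, zip_missing)
```

In practice this means a missing member is handled with try/except KeyError, while an "is not None" check only guards against tar members that are not regular files.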

Practical Examples

Let's start by creating a sample ZIP archive to work with.

Step 1: Create a Sample ZIP File

First, let's create a simple script to make a ZIP file named sample.zip containing two files: hello.txt and data.json.

# script_to_create_zip.py
import zipfile
import json
# Create some dummy files to add to the zip
with open("hello.txt", "w") as f:
    f.write("Hello from Python!\nThis is inside a ZIP file.")
data_to_write = {"name": "Alice", "score": 95, "items": ["apple", "banana"]}
with open("data.json", "w") as f:
    json.dump(data_to_write, f, indent=2)
# Create the ZIP archive
with zipfile.ZipFile("sample.zip", "w") as zf:
    zf.write("hello.txt")
    zf.write("data.json")
print("Created 'sample.zip' with 'hello.txt' and 'data.json'")

Run this script once to create sample.zip in the same directory.


Example 2: Read a Single File Without Extracting It (ZipFile.open)

Now, let's use ZipFile.open(), the ZIP counterpart of extractfile(), to read the content of hello.txt directly from sample.zip without creating a new hello.txt file on our disk.

# script_to_read_from_zip.py
import zipfile
# Open the ZIP file in read mode
with zipfile.ZipFile("sample.zip", "r") as zf:
    # Specify the file name inside the archive you want to read
    file_to_read = "hello.txt"
    try:
        # ZipFile.open() returns a binary file-like object for the member
        with zf.open(file_to_read) as file_object:
            print(f"Successfully found and opened '{file_to_read}' inside the archive.")
            # The object yields bytes, so decode them to get text
            content = file_object.read().decode("utf-8")
        # The inner 'with' block has closed the file-like object by now
        print("\n--- Content of the file ---")
        print(content)
    except KeyError:
        # ZipFile.open() raises KeyError when the member does not exist
        print(f"Error: File '{file_to_read}' not found in the archive.")

Output:

Successfully found and opened 'hello.txt' inside the archive.

--- Content of the file ---
Hello from Python!
This is inside a ZIP file.
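
For larger text members, calling read() pulls the whole file into memory at once. One alternative, sketched here under the assumption that the member is UTF-8 text, is to wrap the binary stream in io.TextIOWrapper and iterate over decoded lines. The archive is built in memory only so that the sketch is self-contained.

```python
import io
import zipfile

# Build a small archive in memory so the sketch is self-contained
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("hello.txt", "Hello from Python!\nThis is inside a ZIP file.")

lines = []
with zipfile.ZipFile(archive, "r") as zf:
    with zf.open("hello.txt") as raw:
        # Wrap the binary stream so that it yields decoded text lines
        with io.TextIOWrapper(raw, encoding="utf-8") as text_stream:
            for line in text_stream:
                lines.append(line.rstrip("\n"))

print(lines)
```

The same wrapper can be handed to anything that expects a text file, such as csv.reader.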

Example 3: Reading a Binary File (like an image)

The process is the same for binary files. You just don't need to .decode() the bytes.

Let's add a dummy image to our sample.zip first (you can use any small image file, e.g., logo.png).

# Add an image to the existing zip
import zipfile
with zipfile.ZipFile("sample.zip", "a") as zf: # 'a' for append
    zf.write("logo.png") # Make sure you have a logo.png file

Now, let's read it.

# script_to_read_binary_from_zip.py
import zipfile
with zipfile.ZipFile("sample.zip", "r") as zf:
    file_to_read = "logo.png"
    try:
        # ZipFile.open() returns a binary file-like object
        with zf.open(file_to_read) as file_object:
            print(f"Reading binary file: '{file_to_read}'")
            # Read the raw bytes; no decoding is needed for binary data
            image_data = file_object.read()
        print(f"Read {len(image_data)} bytes of binary data.")
        # You could now save this 'image_data' to a new file
        # with open("extracted_logo.png", "wb") as f:
        #     f.write(image_data)
    except KeyError:
        # Raised when the member is not in the archive
        print(f"Error: File '{file_to_read}' not found.")
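
When the goal is to save one binary member under a name of your choosing, the bytes do not have to pass through a single read() call. A possible approach, shown as a sketch, is to stream from the archive to the target file with shutil.copyfileobj. The payload below is a stand-in for real image data, and the temporary directory is only there to keep the example self-contained.

```python
import io
import shutil
import tempfile
import zipfile
from pathlib import Path

# Stand-in for real binary data such as a PNG file
payload = bytes(range(256)) * 64

archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("logo.png", payload)

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "extracted_logo.png"
    with zipfile.ZipFile(archive, "r") as zf:
        # Copy in chunks from the archive member to the file on disk
        with zf.open("logo.png") as source, open(target, "wb") as destination:
            shutil.copyfileobj(source, destination)
    copied = target.read_bytes()

print(len(copied))
```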

The More Common Task: Extracting All Files

Often, you don't want to read a file in memory; you want to extract all files from the archive to a specific directory. For this, the ZipFile.extractall() method is perfect.

ZipFile.extractall(path=None, members=None, pwd=None)

  • path: The folder where you want to extract all files. If None, it extracts to the current working directory.
  • members: A list of files to extract. If None, it extracts all files.

Example: Extract Everything to a Folder

# script_to_extract_all.py
import zipfile
import os
# Create a directory to extract to if it doesn't exist
extract_to_dir = "extracted_files"
if not os.path.exists(extract_to_dir):
    os.makedirs(extract_to_dir)
    print(f"Created directory: '{extract_to_dir}'")
with zipfile.ZipFile("sample.zip", "r") as zf:
    # Extract all files to the specified directory
    zf.extractall(path=extract_to_dir)
    print(f"All files extracted to '{extract_to_dir}'")
# Verify the contents
print("\nContents of the extracted directory:")
for item in os.listdir(extract_to_dir):
    print(f"- {item}")

Output:

Created directory: 'extracted_files'
All files extracted to 'extracted_files'
Contents of the extracted directory:
- data.json
- hello.txt
- logo.png
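
The members parameter can restrict extraction to part of the archive. The sketch below assumes you only want the text-based files; the filter on file extensions is an illustrative choice, and the archive and target directory are temporary so the example runs on its own.

```python
import io
import os
import tempfile
import zipfile

# Build an archive in memory with the same member names as sample.zip
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("hello.txt", "Hello from Python!")
    zf.writestr("data.json", '{"name": "Alice"}')
    zf.writestr("logo.png", b"\x89PNG placeholder bytes")

with tempfile.TemporaryDirectory() as extract_to_dir:
    with zipfile.ZipFile(archive, "r") as zf:
        # namelist() returns the member names stored in the archive
        wanted = [name for name in zf.namelist() if name.endswith((".txt", ".json"))]
        zf.extractall(path=extract_to_dir, members=wanted)
    extracted = sorted(os.listdir(extract_to_dir))

print(extracted)
```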

Summary: ZipFile.open() vs. ZipFile.extractall()

| Feature | ZipFile.open(name) | ZipFile.extractall(path) |
|---|---|---|
| Purpose | Read a file's contents directly from the archive. | Extract all (or specific) files from the archive to disk. |
| Output | A binary file-like object (zipfile.ZipExtFile). | Returns None. Modifies the filesystem. |
| Use Case | When you need to process the data of a file inside a zip without saving it to disk first (e.g., parsing a CSV, reading a config). | When you need to un-zip an entire archive to use its files normally. |
| Cost | Only the requested member is decompressed, and it can be read in chunks. Calling read() with no argument loads that whole member into memory. | Every selected member is decompressed and written to disk, so large archives cost time and disk space. |
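
As an illustration of the "process without saving to disk" use case, a JSON member can be parsed straight from the archive. This is a minimal sketch that rebuilds a data.json member in memory with the same contents as the sample file; json.load() accepts the binary file-like object directly because it can decode UTF-8 bytes.

```python
import io
import json
import zipfile

# Recreate a data.json member in an in-memory archive
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr(
        "data.json",
        json.dumps({"name": "Alice", "score": 95, "items": ["apple", "banana"]}),
    )

with zipfile.ZipFile(archive, "r") as zf:
    with zf.open("data.json") as member:
        # Parse the member without writing it to disk
        record = json.load(member)

print(record["name"], record["score"])
```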

Other File Formats (.tar, .gz, etc.)

For other archive formats, Python has similar modules:

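  • .tar files: Use the tarfile module. Open the archive with tarfile.open(), then call TarFile.extractfile(member) to read one member or TarFile.extractall() to unpack everything. This is where the extractfile() method actually lives.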
  • .gz files: These are single compressed files. Use the gzip module. gzip.open('file.gz', 'rt') gives you a file-like object to read from.
  • .rar files: Python's standard library does not support RAR. You would need a third-party library like rarfile.
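
The tarfile and gzip cases above can be sketched as follows. The file names (sample.tar.gz, notes.txt.gz) and contents are made up for illustration, and everything is written to a temporary directory so the example cleans up after itself.

```python
import gzip
import io
import tarfile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    tar_path = Path(tmp_dir) / "sample.tar.gz"
    gz_path = Path(tmp_dir) / "notes.txt.gz"

    # Create a gzip-compressed tar archive with one text member
    payload = "Hello from a tar archive!".encode("utf-8")
    with tarfile.open(tar_path, "w:gz") as tf:
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))

    # Read the member with TarFile.extractfile(), without unpacking to disk
    with tarfile.open(tar_path, "r:gz") as tf:
        member = tf.extractfile("hello.txt")
        # extractfile() returns None for members that are not regular files
        tar_content = member.read().decode("utf-8") if member is not None else None

    # A .gz file holds a single compressed stream; 'rt' reads it as text
    with gzip.open(gz_path, "wt", encoding="utf-8") as f:
        f.write("Hello from a gzip file!")
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:
        gz_content = f.read()

print(tar_content)
print(gz_content)
```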