杰瑞科技汇

How to Merge PDF Files in Python

Merging PDF files in Python is a common task, and several good libraries can do it. The one most tutorials use is PyPDF2, although its development has since moved to the pypdf package.


This guide starts with the simplest approach, then covers two alternative libraries and a small command-line tool.


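Method 1: Using PyPDF2 (Simple, but Deprecated)

PyPDF2 is a pure-Python library for working with PDF documents. It is easy to install and use for basic merging, and many existing scripts rely on it. Note, however, that PyPDF2 is deprecated: its maintainers moved development back to the pypdf package (Method 2), so new code should prefer pypdf.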

Installation

First, you need to install the library using pip:

pip install PyPDF2
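PyPDF2 and pypdf are separate packages on PyPI with very similar APIs, so it is easy to end up with the wrong one (or both) installed. A quick way to check is sketched below using only the standard library's importlib.metadata (Python 3.8+); the helper name installed_pdf_libs is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_pdf_libs(names=("PyPDF2", "pypdf", "pdfrw")):
    """Return {distribution name: installed version or None} (illustrative helper)."""
    found = {}
    for name in names:
        try:
            # Look up the installed distribution's metadata
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the current environment
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_pdf_libs().items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it with the same interpreter you use for the merge script, since each virtual environment has its own set of packages.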

Basic Script to Merge PDFs

This script will take a list of PDF file paths and merge them into a single output file.

import os
import sys
import PyPDF2
def merge_pdfs(pdf_paths, output_path):
    """
    Merges multiple PDF files into a single PDF.
    :param pdf_paths: A list of paths to the PDF files to be merged.
    :param output_path: The path where the merged PDF will be saved.
    """
    # Create a PDF writer object that collects the pages
    pdf_writer = PyPDF2.PdfWriter()
    for pdf_path in pdf_paths:
        # PdfReader accepts a path directly, so no input file handle
        # has to stay open until the writer is saved
        pdf_reader = PyPDF2.PdfReader(pdf_path)
        # Add every page of this file, in order
        for page in pdf_reader.pages:
            pdf_writer.add_page(page)
    # Write the merged content to a new PDF file
    with open(output_path, 'wb') as output_file:
        pdf_writer.write(output_file)
    print(f"Successfully merged {len(pdf_paths)} files into {output_path}")
# --- Example Usage ---
if __name__ == "__main__":
    # List of PDF files you want to merge
    # Make sure these files exist in the same directory or provide full paths
    pdf_files_to_merge = [
        'document1.pdf',
        'document2.pdf',
        'document3.pdf'
    ]
    # The name of the output merged file
    output_filename = 'merged_document.pdf'
    # Check that all input files exist before merging
    for file in pdf_files_to_merge:
        if not os.path.exists(file):
            print(f"Error: File not found - {file}")
            sys.exit(1)  # stop with a non-zero (error) exit status
    merge_pdfs(pdf_files_to_merge, output_filename)
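The existence check in the example usage stops at the first missing file. A variant that reports every problem path at once, using pathlib, might look like the sketch below; find_missing and require_inputs are illustrative names, not part of any PDF library.

```python
from pathlib import Path


def find_missing(paths):
    """Return every entry of `paths` that is not an existing regular file."""
    # Path.is_file() is False both for missing paths and for directories,
    # and either case would make the merge fail later.
    return [p for p in paths if not Path(p).is_file()]


def require_inputs(paths):
    """Raise a single FileNotFoundError that names all missing inputs."""
    missing = find_missing(paths)
    if missing:
        raise FileNotFoundError("File(s) not found: " + ", ".join(map(str, missing)))
```

In the example usage you could call require_inputs(pdf_files_to_merge) right before merging, so a typo in one file name and a missing second file are both reported in the same run.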

How to Run the Script

  1. Save the code above as a Python file (e.g., merger.py).
  2. Place the PDF files you want to merge (document1.pdf, document2.pdf, etc.) in the same directory.
  3. Run the script from your terminal:
    python merger.py
  4. A new file named merged_document.pdf will be created.
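Step 2 assumes you list the files by hand. If the PDFs sit in one folder with numbered names, a small helper can build the list instead. Plain sorted() would put page10.pdf before page2.pdf, so this sketch compares the digit runs as integers (a "natural" sort); the function names and the numbered-file convention are assumptions for illustration.

```python
import re
from pathlib import Path


def _natural_key(path):
    """Split a file name into text and number chunks so 'page10' sorts after 'page2'."""
    # re.split with a capturing group alternates text (even index) and digits (odd index),
    # so every position always holds the same type and comparisons never mix str and int.
    chunks = re.split(r"(\d+)", path.name)
    return [int(c) if i % 2 else c.lower() for i, c in enumerate(chunks)]


def pdfs_in_folder(folder):
    """Return the .pdf files directly inside `folder`, in natural (human) order."""
    files = [p for p in Path(folder).iterdir()
             if p.is_file() and p.suffix.lower() == ".pdf"]
    return sorted(files, key=_natural_key)
```

If the merged output is written to the same folder, exclude it, for example with pdf_files_to_merge = [str(p) for p in pdfs_in_folder(".") if p.name != "merged_document.pdf"]; otherwise a second run would merge the previous result as well.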

Method 2: Using pypdf (The Modern Successor to PyPDF2)

PyPDF2 is deprecated: its development has moved back to the pypdf package, which is actively maintained and recommended for new projects. pypdf's API is very close to PyPDF2's, so switching is usually straightforward.

Installation

pip install pypdf

Script to Merge PDFs

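The merge logic is shorter here: instead of copying pages one by one, append() adds a whole file in a single call. Older examples use a PdfMerger class for this; it has been deprecated in favor of PdfWriter, which offers the same append() method, and recent pypdf releases no longer include PdfMerger.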

import os
import sys
from pypdf import PdfWriter
def merge_pdfs_pypdf(pdf_paths, output_path):
    """
    Merges multiple PDF files into a single PDF using pypdf.
    :param pdf_paths: A list of paths to the PDF files to be merged.
    :param output_path: The path where the merged PDF will be saved.
    """
    # PdfWriter.append() replaces the deprecated PdfMerger class
    merger = PdfWriter()
    for pdf_path in pdf_paths:
        # Append all pages of each PDF to the writer
        merger.append(pdf_path)
    # Write the merged PDF to a file
    merger.write(output_path)
    # Close the writer to release resources
    merger.close()
    print(f"Successfully merged {len(pdf_paths)} files into {output_path}")
# --- Example Usage ---
if __name__ == "__main__":
    pdf_files_to_merge = [
        'document1.pdf',
        'document2.pdf',
        'document3.pdf'
    ]
    output_filename = 'merged_document_pypdf.pdf'
    for file in pdf_files_to_merge:
        if not os.path.exists(file):
            print(f"Error: File not found - {file}")
            sys.exit(1)
    merge_pdfs_pypdf(pdf_files_to_merge, output_filename)

As you can see, merger.append() is a very convenient and readable method.


Method 3: Using pdfrw (Good for Modifying PDFs)

pdfrw is another pure-Python library, useful if you need to do more than just merge, such as modifying pages or filling forms. Be aware that it is no longer actively maintained: its latest release, 0.4, dates from 2017.


Installation

pip install pdfrw

Script to Merge PDFs

import os
import sys
from pdfrw import PdfReader, PdfWriter
def merge_pdfs_pdfrw(pdf_paths, output_path):
    """
    Merges multiple PDF files into a single PDF using pdfrw.
    :param pdf_paths: A list of paths to the PDF files to be merged.
    :param output_path: The path where the merged PDF will be saved.
    """
    # Create a PDF writer object
    pdf_writer = PdfWriter()
    for pdf_path in pdf_paths:
        # Read each PDF and add all of its pages to the writer
        pdf_reader = PdfReader(pdf_path)
        pdf_writer.addpages(pdf_reader.pages)
    # Write the output PDF
    pdf_writer.write(output_path)
    print(f"Successfully merged {len(pdf_paths)} files into {output_path}")
# --- Example Usage ---
if __name__ == "__main__":
    pdf_files_to_merge = [
        'document1.pdf',
        'document2.pdf',
        'document3.pdf'
    ]
    output_filename = 'merged_document_pdfrw.pdf'
    for file in pdf_files_to_merge:
        if not os.path.exists(file):
            print(f"Error: File not found - {file}")
            sys.exit(1)
    merge_pdfs_pdfrw(pdf_files_to_merge, output_filename)

Comparison and Recommendation

| Feature | PyPDF2 | pypdf | pdfrw |
|---|---|---|---|
| Ease of use | Good | Excellent | Good |
| API | Standard | Clean, modern | Standard |
| Maintenance | None (deprecated in favor of pypdf) | Active | Inactive (last release 0.4, 2017) |
| Key strength | Simplicity, pure Python | Modern, actively developed | Good for modifying pages and forms |
| Best for | Existing scripts | New projects; general-purpose PDF manipulation | Tasks like modifying forms or splitting pages |

Recommendation:

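  • For new projects, use pypdf. It is the actively maintained successor to PyPDF2.
  • Existing PyPDF2 scripts will generally keep working, but PyPDF2 no longer receives updates. Because pypdf keeps a very similar API, migrating is often just a matter of changing the import.
  • pdfrw can also rotate, extract, or modify pages, but since it is no longer maintained, check whether pypdf already covers your task before choosing it.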

Advanced Tip: Creating a Command-Line Tool

You can make your script much more useful by allowing it to accept file paths as command-line arguments. The argparse module is perfect for this.

Here's an enhanced version of the pypdf script that does this:

# merge_cli.py
import argparse
import os
import sys
from pypdf import PdfWriter
def main():
    # Set up the argument parser
    parser = argparse.ArgumentParser(
        description="Merge multiple PDF files into a single PDF."
    )
    # Add arguments for the output file and the input files
    parser.add_argument(
        'output',
        help='The name of the output merged PDF file.'
    )
    parser.add_argument(
        'pdf_files',
        nargs='+',  # This means one or more arguments
        help='A list of PDF files to merge.'
    )
    # Parse the arguments
    args = parser.parse_args()
    # Check that all input files exist before doing any work
    for file in args.pdf_files:
        if not os.path.exists(file):
            print(f"Error: File not found - {file}", file=sys.stderr)
            sys.exit(1)
    # Perform the merge (PdfWriter.append replaces the deprecated PdfMerger)
    merger = PdfWriter()
    try:
        for pdf_file in args.pdf_files:
            print(f"Adding {pdf_file}...")
            merger.append(pdf_file)
        merger.write(args.output)
    except Exception as e:
        print(f"An error occurred: {e}", file=sys.stderr)
        sys.exit(1)
    finally:
        # Runs on success and on error, so resources are always released
        merger.close()
    print(f"\nSuccess! Files merged into '{args.output}'")
if __name__ == "__main__":
    main()

How to Use the Command-Line Tool

  1. Save the code as merge_cli.py.
  2. Make sure pypdf is installed (pip install pypdf).
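  3. Run it from your terminal, passing the output file name first, followed by the PDF files to merge:
    python merge_cli.py merged_document.pdf document1.pdf document2.pdf document3.pdf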
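To see how argparse maps a command line onto the two arguments, you can pass parse_args() a list of strings instead of letting it read sys.argv. The parser below mirrors the one in merge_cli.py (a positional output, then nargs='+' for the inputs); build_parser is a name introduced here for illustration.

```python
import argparse


def build_parser():
    """Recreate merge_cli.py's parser: an output path, then one or more input PDFs."""
    parser = argparse.ArgumentParser(
        description="Merge multiple PDF files into a single PDF."
    )
    parser.add_argument("output", help="The name of the output merged PDF file.")
    parser.add_argument("pdf_files", nargs="+", help="A list of PDF files to merge.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["merged_document.pdf", "document1.pdf", "document2.pdf"]
    )
    print(args.output)     # merged_document.pdf
    print(args.pdf_files)  # ['document1.pdf', 'document2.pdf']
```

Passing only an output name makes argparse print a usage error and exit with status 2, because nargs='+' requires at least one input. The order matters: typing the inputs first (python merge_cli.py document1.pdf document2.pdf merged_document.pdf) would make document1.pdf the output path. The script's existence check catches that mistake only when merged_document.pdf does not exist yet; otherwise document1.pdf would be overwritten.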