杰瑞科技汇

Python RESTful PDF API

Building a RESTful API in Python to handle PDFs is a common task. It involves three key components:

  1. Web Framework: To create the REST API endpoints. FastAPI is the modern choice due to its speed, automatic data validation, and interactive API documentation. Flask is also a great, lightweight alternative.
  2. PDF Library: To read, create, or manipulate PDF files. pypdf (the maintained successor to PyPDF2) is good for reading and merging, while ReportLab is excellent for generating PDFs from scratch.
  3. File Handling: Managing file uploads and downloads, which requires setting up the correct media types (multipart/form-data for uploads, application/pdf for downloads).

Let's build a complete, practical example using FastAPI. We'll create an API that can:

  • Upload a PDF file to the server.
  • Download an existing PDF file from the server.
  • Merge multiple uploaded PDFs into a single file.

Step 1: Project Setup

First, let's set up our project directory and a virtual environment.

# Create a project folder
mkdir pdf_api_project
cd pdf_api_project
# Create a virtual environment
python -m venv venv
# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install the necessary libraries
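pip install "fastapi[all]" pypdf python-multipart
  • fastapi[all]: Installs FastAPI together with its optional dependencies, including uvicorn (the server) and jinja2 (for templates).
  • pypdf: A pure-Python library for PDF manipulation and the maintained successor to PyPDF2. The code below imports it as pypdf.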
  • python-multipart: Required by FastAPI to handle form data with files.
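
Because main.py imports pypdf, it can help to confirm which distributions actually landed in the virtual environment before going further. The following is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical and the package list is an assumption based on the install command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed package list; adjust it to whatever you installed.
    wanted = ["fastapi", "uvicorn", "pypdf", "python-multipart"]
    for name, ver in installed_versions(wanted).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

A None entry means the package is not installed in the active environment, which usually points to a virtual environment that was not activated.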

Step 2: The FastAPI Application

Create a file named main.py. This will contain all our API logic.

# main.py
import os
import shutil
from fastapi import FastAPI, UploadFile, File, HTTPException
from fastapi.responses import FileResponse
from pathlib import Path
import pypdf
# --- Configuration ---
# Define a directory to store uploaded files
UPLOAD_DIR = Path("uploads")
# Create the directory if it doesn't exist
UPLOAD_DIR.mkdir(exist_ok=True)
# --- FastAPI App Initialization ---
app = FastAPI(
    title="PDF REST API",
    description="A simple API to upload, download, and merge PDF files.",
    version="1.0.0"
)
# --- Helper Functions ---
def merge_pdfs(pdf_list: list[Path], output_path: Path):
    """Merges a list of PDF files into a single PDF."""
    if not pdf_list:
        raise ValueError("No PDFs provided to merge.")
    # PdfWriter.append replaces PdfMerger, which recent pypdf releases no longer ship
    merger = pypdf.PdfWriter()
    try:
        for pdf in pdf_list:
            merger.append(pdf)
        merger.write(output_path)
    except Exception:
        # Clean up partially merged file if something goes wrong
        if output_path.exists():
            output_path.unlink()
        raise
    finally:
        # Always release the writer, whether the merge succeeded or failed
        merger.close()
# --- API Endpoints ---
@app.get("/")
async def read_root():
    """Root endpoint to check if the API is running."""
    return {"message": "Welcome to the PDF REST API!"}
@app.post("/upload-pdf/")
async def upload_pdf(file: UploadFile = File(...)):
    """
    Uploads a single PDF file.
    - **file**: The PDF file to upload.
    """
    # Keep only the final path component so a crafted name cannot escape UPLOAD_DIR
    safe_name = Path(file.filename or "").name
    # Check if the uploaded file is a PDF (by extension only)
    if not safe_name.lower().endswith('.pdf'):
        raise HTTPException(status_code=400, detail="File is not a PDF.")
    # Construct the full file path
    file_location = UPLOAD_DIR / safe_name
    # Save the file
    try:
        with open(file_location, "wb") as buffer:
            shutil.copyfileobj(file.file, buffer)
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Could not save file: {e}")
    return {"info": f"file '{safe_name}' saved at '{file_location}'"}
@app.get("/download-pdf/{filename}")
async def download_pdf(filename: str):
    """
    Downloads a PDF file by its filename.
    - **filename**: The name of the file to download.
    """
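    # Keep only the final path component to block traversal such as ".."
    file_location = UPLOAD_DIR / Path(filename).name
    # is_file() also rejects directories, which exists() would accept
    if not file_location.is_file():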
        raise HTTPException(status_code=404, detail="File not found")
    # Use FileResponse to serve the file for download
    return FileResponse(
        path=file_location,
        media_type='application/pdf',
        filename=filename
    )
@app.post("/merge-pdfs/")
async def merge_pdfs_endpoint(files: list[UploadFile] = File(...)):
    """
    Merges multiple uploaded PDF files into a single file.
    - **files**: A list of PDF files to merge.
    """
    if len(files) < 2:
        raise HTTPException(status_code=400, detail="Please provide at least two PDF files to merge.")
    # Create a temporary directory for this merge operation
    temp_dir = Path("temp_uploads")
    temp_dir.mkdir(exist_ok=True)
    temp_files = []
    try:
        # 1. Save all uploaded files to the temp directory
        for index, file in enumerate(files):
            # Keep only the final path component so a crafted name cannot escape temp_dir
            safe_name = Path(file.filename or "").name
            if not safe_name.lower().endswith('.pdf'):
                raise HTTPException(status_code=400, detail=f"File '{file.filename}' is not a PDF.")
            # Prefix with the index so two uploads with the same name don't overwrite each other
            temp_file_path = temp_dir / f"{index}_{safe_name}"
            with open(temp_file_path, "wb") as buffer:
                shutil.copyfileobj(file.file, buffer)
            temp_files.append(temp_file_path)
        # 2. Define the output file name
        output_filename = "merged_output.pdf"
        output_path = UPLOAD_DIR / output_filename
        # 3. Merge the files
        merge_pdfs(temp_files, output_path)
        # 4. Return the merged file for download
        return FileResponse(
            path=output_path,
            media_type='application/pdf',
            filename=output_filename
        )
    except HTTPException:
        # Let deliberate 4xx responses through instead of turning them into a 500
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"An error occurred during merge: {e}")
    finally:
        # 5. Clean up the temporary files and directory
        for temp_file in temp_files:
            if temp_file.exists():
                temp_file.unlink()
        try:
            temp_dir.rmdir()
        except OSError:
            # Directory is already gone or still holds files from a concurrent request
            pass
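
The upload endpoint above decides whether a file is a PDF from its extension alone, and it copies the stream without any size cap. As a hedged sketch of how both gaps might be narrowed, the helpers below check for the %PDF- signature at the start of the stream and stop copying once a limit is exceeded. The names looks_like_pdf, copy_with_limit, UploadTooLarge and the 20 MB default are assumptions made for illustration, not part of FastAPI or pypdf.

```python
from typing import BinaryIO

PDF_MAGIC = b"%PDF-"
MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # assumed limit; tune it for your deployment


class UploadTooLarge(Exception):
    """Raised when an upload exceeds the configured size limit."""


def looks_like_pdf(stream: BinaryIO) -> bool:
    """Return True if the stream starts with the PDF signature.

    The stream position is restored, so the caller can still copy the full file.
    """
    start = stream.tell()
    header = stream.read(len(PDF_MAGIC))
    stream.seek(start)
    return header == PDF_MAGIC


def copy_with_limit(src: BinaryIO, dst: BinaryIO,
                    max_bytes: int = MAX_UPLOAD_BYTES,
                    chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst in chunks and return the number of bytes written.

    Raises UploadTooLarge as soon as the running total passes max_bytes.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > max_bytes:
            raise UploadTooLarge(f"upload exceeds {max_bytes} bytes")
        dst.write(chunk)
```

Inside upload_pdf, these could be called on file.file in place of shutil.copyfileobj, mapping a failed signature check to a 400 response and UploadTooLarge to a 413. A signature check is only a first filter: a file can start with %PDF- and still be malformed.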

Step 3: Run the API Server

Use uvicorn to run your FastAPI application.

uvicorn main:app --reload
  • main: The file main.py.
  • app: The object app = FastAPI() inside main.py.
  • --reload: Makes the server restart after code changes.

You will see output like this:

INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [12345]
INFO:     Started server process [12347]
INFO:     Waiting for application startup.
INFO:     Application startup complete.

Step 4: Test the API

FastAPI automatically generates interactive API documentation. Open your browser and go to http://127.0.0.1:8000/docs. You'll see the Swagger UI, where every endpoint can be expanded and tried out directly.

Upload a PDF

  • Expand the POST /upload-pdf/ endpoint.
  • Click "Try it out".
  • Click "Choose File" and select a PDF from your computer.
  • Click "Execute".

You should get a 200 OK response with a JSON body like:

{
  "info": "file 'my_document.pdf' saved at 'uploads/my_document.pdf'"
}

Download a PDF

  • Expand the GET /download-pdf/{filename} endpoint.
  • Click "Try it out".
  • In the filename field, type the name of the file you just uploaded (e.g., my_document.pdf).
  • Click "Execute".

Your browser will prompt you to download the file.

Merge PDFs

  • Expand the POST /merge-pdfs/ endpoint.
  • Click "Try it out".
  • Click "Choose File" and select at least two PDF files.
  • Click "Execute".

The API will process the files, merge them, and then return the merged_output.pdf for you to download.
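
If you have no PDFs at hand for these tests, a tiny one can be generated with the standard library alone. The sketch below hand-assembles a one-page blank document and computes the cross-reference offsets as it goes; make_minimal_pdf is a hypothetical helper, and the output is deliberately bare (no fonts, no content stream), so strict validators may complain even though it is enough for exercising upload and merge.

```python
from pathlib import Path


def make_minimal_pdf() -> bytes:
    """Build a one-page blank PDF, recording each object's byte offset for the xref table."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for the reserved object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


if __name__ == "__main__":
    # Write two sample files to use with the upload and merge endpoints.
    for name in ("sample_a.pdf", "sample_b.pdf"):
        Path(name).write_bytes(make_minimal_pdf())
```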


Alternative: Flask Example

If you prefer Flask, here's how you could implement the upload and download endpoints.

First, install Flask: pip install Flask

Then, create app.py:

# app.py
# send_from_directory is the function the download route calls
from flask import Flask, request, send_from_directory, jsonify
from werkzeug.utils import secure_filename
import os
app = Flask(__name__)
# Configuration
UPLOAD_FOLDER = 'uploads'
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
ALLOWED_EXTENSIONS = {'pdf'}
# Ensure upload folder exists
os.makedirs(UPLOAD_FOLDER, exist_ok=True)
def allowed_file(filename):
    return '.' in filename and \
           filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS
@app.route('/')
def index():
    return "Welcome to the Flask PDF API!"
@app.route('/upload', methods=['POST'])
def upload_file():
    if 'file' not in request.files:
        return jsonify({"error": "No file part"}), 400
    file = request.files['file']
    if file.filename == '':
        return jsonify({"error": "No selected file"}), 400
    if file and allowed_file(file.filename):
        filename = secure_filename(file.filename)
        file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
        return jsonify({"info": f"File '{filename}' uploaded successfully!"}), 200
    return jsonify({"error": "File type not allowed"}), 400
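@app.route('/download/<filename>', methods=['GET'])
def download_file(filename):
    # send_from_directory refuses paths outside the folder and aborts with 404 if the file is missing
    return send_from_directory(app.config['UPLOAD_FOLDER'], filename, as_attachment=True)
@app.errorhandler(404)
def not_found(error):
    # Return 404s (missing files and unknown routes alike) as JSON
    return jsonify({"error": "File not found"}), 404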
if __name__ == '__main__':
    app.run(debug=True)

To run the Flask app: python app.py

You would test this using a tool like curl or Postman, as Flask doesn't have built-in interactive docs like FastAPI.
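
If you would rather script the test in Python than use curl or Postman, the standard library is enough. The sketch below builds a multipart/form-data body by hand and posts it with urllib; build_multipart and upload_pdf are hypothetical helper names, and the URL assumes Flask's default development address of http://127.0.0.1:5000 with the /upload route from app.py.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/pdf") -> tuple[bytes, str]:
    """Return (body, Content-Type header value) for a single-file multipart upload."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_pdf(url: str, path: Path) -> str:
    """POST the file at `path` under the form field 'file' and return the response body."""
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Assumes the Flask app is running locally and my_document.pdf exists.
    print(upload_pdf("http://127.0.0.1:5000/upload", Path("my_document.pdf")))
```

The field name "file" matches request.files['file'] in the Flask handler; a different field name would produce the "No file part" error.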

Summary and Best Practices

  • Use FastAPI: It's a strong default for new Python APIs, offering good performance, validation driven by type hints, and automatic documentation.
  • File Storage: For production, don't store files directly on your server's local disk. Use a cloud storage service like Amazon S3, Google Cloud Storage, or Azure Blob Storage.
  • Security: Be mindful of security. Sanitize filenames (as secure_filename does in Flask), validate file types, and consider setting file size limits to prevent denial-of-service attacks.
  • Error Handling: Always handle potential errors gracefully, as shown with try...except blocks and HTTPException.
  • Asynchronous Tasks: For long-running tasks like merging very large PDFs, use a task queue like Celery with a message broker like Redis or RabbitMQ. This way, your API can return a job ID immediately and the client can poll for the result later.