Of course! Building a RESTful API in Python to handle PDFs is a very common and useful task. This involves several key components:
- Web Framework: To create the REST API endpoints. FastAPI is the modern choice due to its speed, automatic data validation, and interactive API documentation. Flask is also a great, lightweight alternative.
- PDF Library: To read, create, or manipulate PDF files. pypdf (the maintained successor to PyPDF2) is good for reading and merging, while ReportLab is excellent for generating PDFs from scratch.
- File Handling: Managing file uploads and downloads, which requires setting up the correct media types (multipart/form-data for uploads, application/pdf for downloads).
Let's build a complete, practical example using FastAPI. We'll create an API that can:
- Upload a PDF file to the server.
- Download an existing PDF file from the server.
- Merge multiple uploaded PDFs into a single file.
Step 1: Project Setup
First, let's set up our project directory and a virtual environment.
# Create a project folder
mkdir pdf_api_project
cd pdf_api_project
# Create a virtual environment
python -m venv venv
# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install the necessary libraries
pip install "fastapi[all]" pypdf python-multipart
- fastapi[all]: Installs FastAPI and uvicorn (the server) and jinja2 (for templates).
- pypdf: A pure-Python library for PDF manipulation. It is the maintained successor to PyPDF2 and the package that main.py imports.
- python-multipart: Required by FastAPI to handle form data with files.
Step 2: The FastAPI Application
Create a file named main.py. This will contain all our API logic.
# main.py
import shutil
from pathlib import Path

import pypdf
from fastapi import FastAPI, UploadFile, File, HTTPException
from fastapi.responses import FileResponse

# --- Configuration ---
# Define a directory to store uploaded files
UPLOAD_DIR = Path("uploads")
# Create the directory if it doesn't exist
UPLOAD_DIR.mkdir(exist_ok=True)

# --- FastAPI App Initialization ---
# FastAPI takes keyword arguments only, so the title must be named
app = FastAPI(
    title="PDF REST API",
    description="A simple API to upload, download, and merge PDF files.",
    version="1.0.0",
)

# --- Helper Functions ---
def merge_pdfs(pdf_list: list[Path], output_path: Path):
    """Merges a list of PDF files into a single PDF."""
    if not pdf_list:
        raise ValueError("No PDFs provided to merge.")
    # PdfWriter.append() replaces the PdfMerger class, which recent
    # pypdf releases no longer provide
    writer = pypdf.PdfWriter()
    try:
        for pdf in pdf_list:
            writer.append(pdf)
        writer.write(output_path)
    except Exception:
        # Clean up partially merged file if something goes wrong
        if output_path.exists():
            output_path.unlink()
        raise
    finally:
        writer.close()

# --- API Endpoints ---
@app.get("/")
async def read_root():
    """Root endpoint to check if the API is running."""
    return {"message": "Welcome to the PDF REST API!"}

@app.post("/upload-pdf/")
async def upload_pdf(file: UploadFile = File(...)):
    """
    Uploads a single PDF file.

    - **file**: The PDF file to upload.
    """
    # Keep only the final path component so a crafted name such as
    # "../../main.py" cannot be written outside UPLOAD_DIR
    filename = Path(file.filename or "").name
    # Check if the uploaded file is a PDF (extension check only)
    if not filename.lower().endswith('.pdf'):
        raise HTTPException(status_code=400, detail="File is not a PDF.")
    # Construct the full file path
    file_location = UPLOAD_DIR / filename
    # Save the file
    try:
        with open(file_location, "wb") as buffer:
            shutil.copyfileobj(file.file, buffer)
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Could not save file: {e}")
    return {"info": f"file '{filename}' saved at '{file_location}'"}

@app.get("/download-pdf/{filename}")
async def download_pdf(filename: str):
    """
    Downloads a PDF file by its filename.

    - **filename**: The name of the file to download.
    """
    # Use only the final path component so the lookup stays inside UPLOAD_DIR
    safe_name = Path(filename).name
    file_location = UPLOAD_DIR / safe_name
    # is_file() also rejects directories, which exists() would accept
    if not file_location.is_file():
        raise HTTPException(status_code=404, detail="File not found")
    # Use FileResponse to serve the file for download
    return FileResponse(
        path=file_location,
        media_type='application/pdf',
        filename=safe_name
    )

@app.post("/merge-pdfs/")
async def merge_pdfs_endpoint(files: list[UploadFile] = File(...)):
    """
    Merges multiple uploaded PDF files into a single file.

    - **files**: A list of PDF files to merge.
    """
    if len(files) < 2:
        raise HTTPException(status_code=400, detail="Please provide at least two PDF files to merge.")
    # Create a temporary directory for this merge operation
    temp_dir = Path("temp_uploads")
    temp_dir.mkdir(exist_ok=True)
    temp_files = []
    try:
        # 1. Save all uploaded files to the temp directory
        for index, file in enumerate(files):
            name = Path(file.filename or "").name
            if not name.lower().endswith('.pdf'):
                raise HTTPException(status_code=400, detail=f"File '{file.filename}' is not a PDF.")
            # Prefix with the position so two uploads that share a name
            # do not overwrite each other
            temp_file_path = temp_dir / f"{index}_{name}"
            with open(temp_file_path, "wb") as buffer:
                shutil.copyfileobj(file.file, buffer)
            temp_files.append(temp_file_path)
        # 2. Define the output file name
        output_filename = "merged_output.pdf"
        output_path = UPLOAD_DIR / output_filename
        # 3. Merge the files
        merge_pdfs(temp_files, output_path)
        # 4. Return the merged file for download
        return FileResponse(
            path=output_path,
            media_type='application/pdf',
            filename=output_filename
        )
    except HTTPException:
        # Re-raise client errors (such as the 400 above) instead of
        # converting them into a 500
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"An error occurred during merge: {e}")
    finally:
        # 5. Clean up the temporary files and directory
        for temp_file in temp_files:
            if temp_file.exists():
                temp_file.unlink()
        try:
            temp_dir.rmdir()
        except OSError:
            # Directory is already gone or still holds another request's files
            pass
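
The upload endpoint above decides whether a file is a PDF from its extension alone. A minimal sketch of stricter checks follows, using only the standard library. The helper names safe_pdf_name and read_pdf_capped and the 10 MB cap are assumptions made for illustration; they are not part of FastAPI or of the code above.

```python
import io
from pathlib import PurePosixPath, PureWindowsPath
from typing import BinaryIO

MAX_PDF_BYTES = 10 * 1024 * 1024  # assumed cap; tune for your deployment
PDF_MAGIC = b"%PDF-"              # a well-formed PDF begins with this marker


def safe_pdf_name(raw_name: str) -> str:
    """Return the bare file name, or raise ValueError if it is not a .pdf name."""
    # Strip directory parts written with either "/" or "\" separators.
    name = PureWindowsPath(PurePosixPath(raw_name).name).name
    if not name.lower().endswith(".pdf") or name.lower() == ".pdf":
        raise ValueError(f"not a PDF file name: {raw_name!r}")
    return name


def read_pdf_capped(stream: BinaryIO, limit: int = MAX_PDF_BYTES) -> bytes:
    """Read a stream, rejecting non-PDF content and anything over `limit` bytes."""
    head = stream.read(len(PDF_MAGIC))
    if head != PDF_MAGIC:
        raise ValueError("content does not start with the %PDF- header")
    chunks, total = [head], len(head)
    # Read in 64 KiB pieces so an oversized upload is rejected early.
    while chunk := stream.read(64 * 1024):
        total += len(chunk)
        if total > limit:
            raise ValueError(f"file exceeds {limit} bytes")
        chunks.append(chunk)
    return b"".join(chunks)
```

Inside an endpoint, one way to use these would be to call safe_pdf_name(file.filename) and read_pdf_capped(file.file), then translate a ValueError into an HTTPException with status 400.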
Step 3: Run the API Server
Use uvicorn to run your FastAPI application.
uvicorn main:app --reload
- main: The file main.py.
- app: The object app = FastAPI() inside main.py.
- --reload: Makes the server restart after code changes.
You will see output like this:
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: Started reloader process [12345]
INFO: Started server process [12347]
INFO: Waiting for application startup.
INFO: Application startup complete.
Step 4: Test the API
FastAPI automatically generates interactive API documentation. Open your browser and go to http://127.0.0.1:8000/docs to see the Swagger UI, where you can call each endpoint directly.
Upload a PDF
- Expand the POST /upload-pdf/ endpoint.
- Click "Try it out".
- Click "Choose File" and select a PDF from your computer.
- Click "Execute".
You should get a 200 OK response with a JSON body like:
{
  "info": "file 'my_document.pdf' saved at 'uploads/my_document.pdf'"
}
Download a PDF
- Expand the GET /download-pdf/{filename} endpoint.
- Click "Try it out".
- In the filename field, type the name of the file you just uploaded (e.g., my_document.pdf).
- Click "Execute".
Your browser will prompt you to download the file.
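
If you would rather script these two checks than click through Swagger UI, here is a rough client sketch that uses only the standard library. It assumes the server from Step 3 is listening on 127.0.0.1:8000 and that the form field is named file, matching the upload endpoint's parameter. The function names (build_multipart, upload_pdf, download_pdf) are made up for this example.

```python
import json
import os
import urllib.parse
import urllib.request
import uuid

BASE_URL = "http://127.0.0.1:8000"  # assumes the uvicorn server from Step 3


def build_multipart(field: str, filename: str, content: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def upload_pdf(path: str) -> dict:
    """POST a local PDF to /upload-pdf/ and return the decoded JSON reply."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart("file", os.path.basename(path), fh.read())
    request = urllib.request.Request(
        f"{BASE_URL}/upload-pdf/",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def download_pdf(filename: str, target: str) -> None:
    """GET /download-pdf/{filename} and write the response bytes to `target`."""
    url = f"{BASE_URL}/download-pdf/{urllib.parse.quote(filename)}"
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        out.write(response.read())
```

With the server running, upload_pdf("my_document.pdf") should return the same JSON shown above, and download_pdf("my_document.pdf", "copy.pdf") should save a copy locally. A library such as requests or httpx would shorten this considerably; the hand-built body is only there to show what a multipart/form-data upload looks like on the wire.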
Merge PDFs
- Expand the POST /merge-pdfs/ endpoint.
- Click "Try it out".
- Click "Choose File" and select at least two PDF files.
- Click "Execute".
The API will process the files, merge them, and then return the merged_output.pdf for you to download.
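
One caveat: the merge endpoint uses the fixed names temp_uploads and merged_output.pdf, so two merge requests running at the same time could overwrite each other's files. A possible way around this, sketched with the standard library only, is to give every request its own scratch directory and output name. The merge_workspace helper below is hypothetical and is not wired into main.py.

```python
import shutil
import tempfile
import uuid
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def merge_workspace(output_dir: Path):
    """Yield (scratch_dir, output_path) that no other request shares."""
    # mkdtemp creates a fresh, uniquely named directory on every call.
    scratch = Path(tempfile.mkdtemp(prefix="merge_"))
    # A random suffix keeps concurrent merges from writing the same file.
    output_path = output_dir / f"merged_{uuid.uuid4().hex}.pdf"
    try:
        yield scratch, output_path
    finally:
        # Remove the scratch directory and everything saved in it.
        shutil.rmtree(scratch, ignore_errors=True)
```

In the endpoint, the uploads would be saved under scratch and the merged file written to output_path, with the whole body wrapped in "with merge_workspace(UPLOAD_DIR) as (scratch, output_path):". The merged outputs then accumulate in the upload directory, so some later cleanup policy would still be needed.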
Alternative: Flask Example
If you prefer Flask, here's how you could implement the upload and download endpoints.
First, install Flask:
pip install Flask
Then, create app.py:
# app.py
import os

from flask import Flask, request, send_from_directory, jsonify
from werkzeug.exceptions import NotFound
from werkzeug.utils import secure_filename

app = Flask(__name__)

# Configuration
UPLOAD_FOLDER = 'uploads'
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
ALLOWED_EXTENSIONS = {'pdf'}

# Ensure upload folder exists
os.makedirs(UPLOAD_FOLDER, exist_ok=True)


def allowed_file(filename):
    return '.' in filename and \
        filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS


@app.route('/')
def index():
    return "Welcome to the Flask PDF API!"


@app.route('/upload', methods=['POST'])
def upload_file():
    if 'file' not in request.files:
        return jsonify({"error": "No file part"}), 400
    file = request.files['file']
    if file.filename == '':
        return jsonify({"error": "No selected file"}), 400
    if file and allowed_file(file.filename):
        # secure_filename strips path separators and other unsafe characters
        filename = secure_filename(file.filename)
        file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
        return jsonify({"info": f"File '{filename}' uploaded successfully!"}), 200
    return jsonify({"error": "File type not allowed"}), 400


@app.route('/download/<filename>', methods=['GET'])
def download_file(filename):
    try:
        return send_from_directory(app.config['UPLOAD_FOLDER'], filename, as_attachment=True)
    except NotFound:
        # send_from_directory raises werkzeug's NotFound, not FileNotFoundError
        return jsonify({"error": "File not found"}), 404


if __name__ == '__main__':
    # debug=True is for local development only
    app.run(debug=True)
To run the Flask app:
python app.py
You would test this using a tool like curl or Postman, as Flask doesn't have built-in interactive docs like FastAPI.
Summary and Best Practices
- Use FastAPI: It's the modern standard for building APIs in Python, offering speed, type hints, and automatic documentation.
- File Storage: For production, don't store files directly on your server's local disk. Use a cloud storage service like Amazon S3, Google Cloud Storage, or Azure Blob Storage.
- Security: Be mindful of security. Sanitize filenames (as secure_filename does in Flask), validate file types, and consider setting file size limits to prevent denial-of-service attacks.
- Error Handling: Always handle potential errors gracefully, as shown with try...except blocks and HTTPException.
- Asynchronous Tasks: For long-running tasks like merging very large PDFs, use a task queue like Celery with a message broker like Redis or RabbitMQ. This way, your API can return a job ID immediately and the client can poll for the result later.
