Of course! This is a very common and powerful combination in the Odoo ecosystem. Using Python's pypdf library within Odoo allows you to programmatically manipulate PDF files, which is essential for features like report generation, document processing, and data extraction.

Let's break down how to use pypdf in Odoo, covering the basics, integration methods, and a practical example.
The Basics: What is pypdf?
pypdf (formerly PyPDF2) is a pure-Python library for working with PDF files. It can:
- Read existing PDF files.
- Extract text and metadata (like author, title).
- Merge multiple PDFs into one.
- Split a single PDF into multiple files.
- Rotate pages.
- Encrypt/Decrypt PDFs.
- Overlay content (like a watermark or a stamp) on existing pages.
Installation in an Odoo Environment
You need to add pypdf to your Odoo project's dependencies.
Option A: Using requirements.txt (Recommended for Development & Git-based Deployments)

-
Navigate to your Odoo project's root directory.
-
Open the
requirements.txtfile. -
Add
pypdfto it.# requirements.txt ... pypdf>=3.0.0 ...
-
Install the dependencies:
(图片来源网络,侵删)pip install -r requirements.txt
Option B: Using the Odoo Addon Manifest (__manifest__.py)
This is the standard way for distributing an addon that requires pypdf. Odoo will automatically install the package when the addon is installed.
-
Open your addon's
__manifest__.pyfile. -
Add
'pypdf'to theexternal_dependencieslist under thepythonkey.# my_addon/__manifest__.py { 'name': 'My Awesome PDF Processing', 'version': '1.0', 'summary': 'Processes PDFs using pypdf', 'author': 'Your Name', 'website': 'https://your-website.com', 'depends': ['base'], # Add your module dependencies here 'data': [ # your views, security, etc. ], 'installable': True, 'application': True, 'external_dependencies': {'python': ['pypdf']}, }
How to Use pypdf in Odoo: Key Scenarios
Here are the most common use cases for pypdf in an Odoo module.
Scenario 1: Extracting Text from a PDF (e.g., for OCR or Data Entry)
Let's say you have a ir.attachment record (which is a PDF) and you want to extract all its text.
import io
import pypdf
from odoo import models, api
class MyPdfProcessor(models.Model):
_name = 'my.pdf.processor'
_description = 'PDF Processing Utilities'
@api.model
def extract_text_from_attachment(self, attachment_id):
"""
Extracts text from a PDF attachment.
:param attachment_id: int, ID of the ir.attachment record.
:return: str, The extracted text.
"""
attachment = self.env['ir.attachment'].browse(attachment_id)
if not attachment or not attachment.datas:
raise ValueError("Attachment not found or has no data.")
# The PDF content is in base64, so we need to decode it first
pdf_data = io.BytesIO(base64.b64decode(attachment.datas))
text_content = ""
try:
with pypdf.PdfReader(pdf_data) as reader:
for page in reader.pages:
# Extract text from each page. Note: pypdf is not great with
# scanned images (it won't extract text from them).
# For that, you'd need OCR libraries like pytesseract.
text_content += page.extract_text() + "\n"
except Exception as e:
raise ValueError(f"Failed to read PDF: {e}")
return text_content.strip()
# --- How to call it ---
# pdf_processor = env['my.pdf.processor']
# attachment = env['ir.attachment'].search([('name', '=', 'my_document.pdf')], limit=1)
# if attachment:
# extracted_text = pdf_processor.extract_text_from_attachment(attachment.id)
# print(extracted_text)
Scenario 2: Merging Multiple PDFs into One
This is useful for creating a single "report package" from multiple documents related to a record (e.g., all invoices for a customer).
import io
import base64
import pypdf
from odoo import models, api
class MyPdfMerger(models.Model):
_name = 'my.pdf.merger'
_description = 'PDF Merging Utilities'
def merge_pdf_attachments(self, attachment_ids, output_filename='merged_document.pdf'):
"""
Merges multiple PDF attachments into a single PDF.
:param attachment_ids: list of int, IDs of ir.attachment records to merge.
:param output_filename: str, The desired name for the merged file.
:return: base64 string of the merged PDF, ready to be stored in an attachment.
"""
merger = pypdf.PdfMerger()
attachment_obj = self.env['ir.attachment']
for att_id in attachment_ids:
attachment = attachment_obj.browse(att_id)
if not attachment or not attachment.datas:
continue # Skip invalid attachments
pdf_data = io.BytesIO(base64.b64decode(attachment.datas))
merger.append(pdf_data)
# Write the merged PDF to a BytesIO buffer
merged_pdf_buffer = io.BytesIO()
merger.write(merged_pdf_buffer)
merger.close()
# Get the PDF data as bytes and encode it to base64
merged_pdf_bytes = merged_pdf_buffer.getvalue()
merged_pdf_base64 = base64.b64encode(merged_pdf_bytes)
return merged_pdf_base64, output_filename
# --- How to call it ---
# merger = env['my.pdf.merger']
# attachment_ids = [1, 2, 3] # List of attachment IDs to merge
# merged_pdf_b64, filename = merger.merge_pdf_attachments(attachment_ids)
#
# # Now, save the merged PDF as a new attachment
# new_attachment_vals = {
# 'name': filename,
# 'type': 'binary',
# 'datas': merged_pdf_b64,
# 'res_model': 'some.model', # e.g., 'account.move'
# 'res_id': 123, # e.g., ID of the account.move
# 'mimetype': 'application/pdf',
# }
# env['ir.attachment'].create(new_attachment_vals)
Scenario 3: Adding a Watermark or a Page Stamp
This is a common requirement for official documents. You'll have a "stamp" PDF (e.g., a page with "DRAFT" or "CONFIDENTIAL") and you'll overlay it on every page of the target document.
import io
import base64
import pypdf
from odoo import models, api
class PdfWatermarker(models.Model):
_name = 'my.pdf.watermarker'
_description = 'PDF Watermarking Utilities'
def add_watermark(self, target_attachment_id, stamp_attachment_id):
"""
Adds a watermark (stamp) from one PDF to every page of another PDF.
:param target_attachment_id: int, ID of the PDF to be watermarked.
:param stamp_attachment_id: int, ID of the PDF containing the watermark.
:return: base64 string of the watermarked PDF.
"""
target_attachment = self.env['ir.attachment'].browse(target_attachment_id)
stamp_attachment = self.env['ir.attachment'].browse(stamp_attachment_id)
if not all([target_attachment.datas, stamp_attachment.datas]):
raise ValueError("One or both attachments are empty.")
# Read the main PDF
target_pdf_data = io.BytesIO(base64.b64decode(target_attachment.datas))
target_reader = pypdf.PdfReader(target_pdf_data)
# Read the stamp PDF
stamp_pdf_data = io.BytesIO(base64.b64decode(stamp_attachment.datas))
stamp_reader = pypdf.PdfReader(stamp_pdf_data)
stamp_page = stamp_reader.pages[0] # Use the first page of the stamp
# Create a writer object
writer = pypdf.PdfWriter()
# Loop through all pages of the target PDF
for page in target_reader.pages:
# Merge the stamp page onto the target page
# The scale and position can be adjusted here
page.merge_page(stamp_page)
writer.add_page(page)
# Write the result to a buffer
output_pdf_buffer = io.BytesIO()
writer.write(output_pdf_buffer)
return base64.b64encode(output_pdf_buffer.getvalue())
# --- How to call it ---
# watermarker = env['my.pdf.watermarker']
# target_att = env['ir.attachment'].search([('name', '=', 'final_report.pdf')], limit=1)
# stamp_att = env['ir.attachment'].search([('name', '=', 'draft_stamp.pdf')], limit=1)
#
# if target_att and stamp_att:
# watermarked_pdf_b64 = watermarker.add_watermark(target_att.id, stamp_att.id)
# # ... save watermarked_pdf_b64 as a new attachment
Important Considerations & Best Practices
- Performance with Large Files:
pypdfloads entire PDFs into memory. For very large PDFs (hundreds of pages), this can consume significant RAM and slow down your Odoo worker. Be mindful of this in your business logic. - Text Extraction Limitation:
pypdfextracts text from the PDF's internal text streams. It cannot read text from scanned images (this is called Optical Character Recognition or OCR). If your PDFs are scans, you'll need a more advanced library likepytesseractin combination withPillow(PIL). - Error Handling: Always wrap your PDF operations in
try...exceptblocks. PDFs can be corrupted or malformed, leading to errors that can crash your Odoo process. - File Paths vs. In-Memory Data: In a web server environment like Odoo, it's almost always better to work with files in memory (
io.BytesIO) rather than saving them to the server's disk. This avoids file permission issues and cleanup problems. Their.attachmentmodel stores PDFs as base64-encoded binary data, which fits perfectly with this in-memory approach. - Alternatives: While
pypdfis excellent for many tasks, other libraries exist:PyMuPDF(fitz): Known for its high performance, especially for text extraction and rendering. It's a C++ library with Python bindings, so it's faster but might be slightly harder to install.reportlab: Better for creating PDFs from scratch rather than manipulating existing ones.
By combining pypdf with Odoo's ir.attachment model, you can build powerful document automation features directly into your application.
