杰瑞科技汇

Python pypdf与Odoo如何高效集成?

Of course! This is a very common and powerful combination in the Odoo ecosystem. Using Python's pypdf library within Odoo allows you to programmatically manipulate PDF files, which is essential for features like report generation, document processing, and data extraction.

Python pypdf与Odoo如何高效集成?-图1
(图片来源网络,侵删)

Let's break down how to use pypdf in Odoo, covering the basics, integration methods, and a practical example.


The Basics: What is pypdf?

pypdf (formerly PyPDF2) is a pure-Python library for working with PDF files. It can:

  • Read existing PDF files.
  • Extract text and metadata (like author, title).
  • Merge multiple PDFs into one.
  • Split a single PDF into multiple files.
  • Rotate pages.
  • Encrypt/Decrypt PDFs.
  • Overlay content (like a watermark or a stamp) on existing pages.

Installation in an Odoo Environment

You need to add pypdf to your Odoo project's dependencies.

Option A: Using requirements.txt (Recommended for Development & Git-based Deployments)

Python pypdf与Odoo如何高效集成?-图2
(图片来源网络,侵删)
  1. Navigate to your Odoo project's root directory.

  2. Open the requirements.txt file.

  3. Add pypdf to it.

    # requirements.txt
    ...
    pypdf>=3.0.0
    ...
  4. Install the dependencies:

    Python pypdf与Odoo如何高效集成?-图3
    (图片来源网络,侵删)
    pip install -r requirements.txt

Option B: Using the Odoo Addon Manifest (__manifest__.py)

This is the standard way for distributing an addon that requires pypdf. Odoo will automatically install the package when the addon is installed.

  1. Open your addon's __manifest__.py file.

  2. Add 'pypdf' to the external_dependencies list under the python key.

    # my_addon/__manifest__.py
    {
        'name': 'My Awesome PDF Processing',
        'version': '1.0',
        'summary': 'Processes PDFs using pypdf',
        'author': 'Your Name',
        'website': 'https://your-website.com',
        'depends': ['base'], # Add your module dependencies here
        'data': [
            # your views, security, etc.
        ],
        'installable': True,
        'application': True,
        'external_dependencies': {'python': ['pypdf']},
    }

How to Use pypdf in Odoo: Key Scenarios

Here are the most common use cases for pypdf in an Odoo module.

Scenario 1: Extracting Text from a PDF (e.g., for OCR or Data Entry)

Let's say you have a ir.attachment record (which is a PDF) and you want to extract all its text.

import io
import pypdf
from odoo import models, api
class MyPdfProcessor(models.Model):
    _name = 'my.pdf.processor'
    _description = 'PDF Processing Utilities'
    @api.model
    def extract_text_from_attachment(self, attachment_id):
        """
        Extracts text from a PDF attachment.
        :param attachment_id: int, ID of the ir.attachment record.
        :return: str, The extracted text.
        """
        attachment = self.env['ir.attachment'].browse(attachment_id)
        if not attachment or not attachment.datas:
            raise ValueError("Attachment not found or has no data.")
        # The PDF content is in base64, so we need to decode it first
        pdf_data = io.BytesIO(base64.b64decode(attachment.datas))
        text_content = ""
        try:
            with pypdf.PdfReader(pdf_data) as reader:
                for page in reader.pages:
                    # Extract text from each page. Note: pypdf is not great with
                    # scanned images (it won't extract text from them).
                    # For that, you'd need OCR libraries like pytesseract.
                    text_content += page.extract_text() + "\n"
        except Exception as e:
            raise ValueError(f"Failed to read PDF: {e}")
        return text_content.strip()
# --- How to call it ---
# pdf_processor = env['my.pdf.processor']
# attachment = env['ir.attachment'].search([('name', '=', 'my_document.pdf')], limit=1)
# if attachment:
#     extracted_text = pdf_processor.extract_text_from_attachment(attachment.id)
#     print(extracted_text)

Scenario 2: Merging Multiple PDFs into One

This is useful for creating a single "report package" from multiple documents related to a record (e.g., all invoices for a customer).

import io
import base64
import pypdf
from odoo import models, api
class MyPdfMerger(models.Model):
    _name = 'my.pdf.merger'
    _description = 'PDF Merging Utilities'
    def merge_pdf_attachments(self, attachment_ids, output_filename='merged_document.pdf'):
        """
        Merges multiple PDF attachments into a single PDF.
        :param attachment_ids: list of int, IDs of ir.attachment records to merge.
        :param output_filename: str, The desired name for the merged file.
        :return: base64 string of the merged PDF, ready to be stored in an attachment.
        """
        merger = pypdf.PdfMerger()
        attachment_obj = self.env['ir.attachment']
        for att_id in attachment_ids:
            attachment = attachment_obj.browse(att_id)
            if not attachment or not attachment.datas:
                continue # Skip invalid attachments
            pdf_data = io.BytesIO(base64.b64decode(attachment.datas))
            merger.append(pdf_data)
        # Write the merged PDF to a BytesIO buffer
        merged_pdf_buffer = io.BytesIO()
        merger.write(merged_pdf_buffer)
        merger.close()
        # Get the PDF data as bytes and encode it to base64
        merged_pdf_bytes = merged_pdf_buffer.getvalue()
        merged_pdf_base64 = base64.b64encode(merged_pdf_bytes)
        return merged_pdf_base64, output_filename
# --- How to call it ---
# merger = env['my.pdf.merger']
# attachment_ids = [1, 2, 3] # List of attachment IDs to merge
# merged_pdf_b64, filename = merger.merge_pdf_attachments(attachment_ids)
#
# # Now, save the merged PDF as a new attachment
# new_attachment_vals = {
#     'name': filename,
#     'type': 'binary',
#     'datas': merged_pdf_b64,
#     'res_model': 'some.model', # e.g., 'account.move'
#     'res_id': 123,            # e.g., ID of the account.move
#     'mimetype': 'application/pdf',
# }
# env['ir.attachment'].create(new_attachment_vals)

Scenario 3: Adding a Watermark or a Page Stamp

This is a common requirement for official documents. You'll have a "stamp" PDF (e.g., a page with "DRAFT" or "CONFIDENTIAL") and you'll overlay it on every page of the target document.

import io
import base64
import pypdf
from odoo import models, api
class PdfWatermarker(models.Model):
    _name = 'my.pdf.watermarker'
    _description = 'PDF Watermarking Utilities'
    def add_watermark(self, target_attachment_id, stamp_attachment_id):
        """
        Adds a watermark (stamp) from one PDF to every page of another PDF.
        :param target_attachment_id: int, ID of the PDF to be watermarked.
        :param stamp_attachment_id: int, ID of the PDF containing the watermark.
        :return: base64 string of the watermarked PDF.
        """
        target_attachment = self.env['ir.attachment'].browse(target_attachment_id)
        stamp_attachment = self.env['ir.attachment'].browse(stamp_attachment_id)
        if not all([target_attachment.datas, stamp_attachment.datas]):
            raise ValueError("One or both attachments are empty.")
        # Read the main PDF
        target_pdf_data = io.BytesIO(base64.b64decode(target_attachment.datas))
        target_reader = pypdf.PdfReader(target_pdf_data)
        # Read the stamp PDF
        stamp_pdf_data = io.BytesIO(base64.b64decode(stamp_attachment.datas))
        stamp_reader = pypdf.PdfReader(stamp_pdf_data)
        stamp_page = stamp_reader.pages[0] # Use the first page of the stamp
        # Create a writer object
        writer = pypdf.PdfWriter()
        # Loop through all pages of the target PDF
        for page in target_reader.pages:
            # Merge the stamp page onto the target page
            # The scale and position can be adjusted here
            page.merge_page(stamp_page)
            writer.add_page(page)
        # Write the result to a buffer
        output_pdf_buffer = io.BytesIO()
        writer.write(output_pdf_buffer)
        return base64.b64encode(output_pdf_buffer.getvalue())
# --- How to call it ---
# watermarker = env['my.pdf.watermarker']
# target_att = env['ir.attachment'].search([('name', '=', 'final_report.pdf')], limit=1)
# stamp_att = env['ir.attachment'].search([('name', '=', 'draft_stamp.pdf')], limit=1)
#
# if target_att and stamp_att:
#     watermarked_pdf_b64 = watermarker.add_watermark(target_att.id, stamp_att.id)
#     # ... save watermarked_pdf_b64 as a new attachment

Important Considerations & Best Practices

  1. Performance with Large Files: pypdf loads entire PDFs into memory. For very large PDFs (hundreds of pages), this can consume significant RAM and slow down your Odoo worker. Be mindful of this in your business logic.
  2. Text Extraction Limitation: pypdf extracts text from the PDF's internal text streams. It cannot read text from scanned images (this is called Optical Character Recognition or OCR). If your PDFs are scans, you'll need a more advanced library like pytesseract in combination with Pillow (PIL).
  3. Error Handling: Always wrap your PDF operations in try...except blocks. PDFs can be corrupted or malformed, leading to errors that can crash your Odoo process.
  4. File Paths vs. In-Memory Data: In a web server environment like Odoo, it's almost always better to work with files in memory (io.BytesIO) rather than saving them to the server's disk. This avoids file permission issues and cleanup problems. The ir.attachment model stores PDFs as base64-encoded binary data, which fits perfectly with this in-memory approach.
  5. Alternatives: While pypdf is excellent for many tasks, other libraries exist:
    • PyMuPDF (fitz): Known for its high performance, especially for text extraction and rendering. It's a C++ library with Python bindings, so it's faster but might be slightly harder to install.
    • reportlab: Better for creating PDFs from scratch rather than manipulating existing ones.

By combining pypdf with Odoo's ir.attachment model, you can build powerful document automation features directly into your application.

分享:
扫描分享到社交APP
上一篇
下一篇