How Does Java Handle Office and PDF File Conversion?

Working with PDFs in a Java application, especially in an office context (generating reports, invoices, or forms), is a common requirement. This guide covers the most popular libraries for the job, with a code example for each.

Summary of Approaches

Library | Type | License | Key Strengths | Best For
Apache PDFBox | Open source | Apache 2.0 | Pure Java, no native dependencies, free for commercial use. | Low-level PDF manipulation, text extraction, filling forms, splitting/merging.
iText 7 | Dual-licensed | AGPL / Commercial | Extremely powerful, feature-rich, excellent documentation. | High-end PDF generation, digital signatures, complex layouts, templates.
OpenPDF | Open source | LGPL / MPL | A fork of the classic iText (2.x/4.x) code base, familiar API. | Simple to moderate PDF generation, especially for those with iText 2.x experience.
Flying Saucer (xhtmlrenderer) | Open source | LGPL | Renders XHTML/CSS to PDF. | Generating PDFs from existing web technologies (HTML, CSS).

Apache PDFBox (Recommended for most cases)

PDFBox is a pure Java library from the Apache Software Foundation. It's free, powerful, and needs no native libraries or external tools. It's often the best starting point for a PDF task in Java.

Key Features:

  • Text Extraction: Read text from PDFs.
  • PDF Creation: Create PDFs from scratch.
  • PDF Manipulation: Merge, split, and encrypt PDFs.
  • Form Filling: Fill in AcroForms (PDF forms).
  • Image Handling: Add images to PDFs.

Maven Dependency:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>3.0.2</version> <!-- Check for the latest version -->
</dependency>

Example: Generating a Simple PDF Report

This example creates a PDF with a title, some text, and a table, mimicking a simple office report.

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.Standard14Fonts;
import java.io.IOException;
public class PdfBoxReportGenerator {
    public static void main(String[] args) {
        String outputFileName = "office_report.pdf";
        try (PDDocument document = new PDDocument()) {
            // PDFBox 3.x removed the PDType1Font.HELVETICA constants;
            // the standard 14 fonts are created through Standard14Fonts.FontName instead.
            PDType1Font regular = new PDType1Font(Standard14Fonts.FontName.HELVETICA);
            PDType1Font bold = new PDType1Font(Standard14Fonts.FontName.HELVETICA_BOLD);
            // 1. Create a new page
            PDPage page = new PDPage();
            document.addPage(page);
            // 2. Create a content stream to write on the page
            try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
                // 3. Set fonts and initial position (origin is the bottom-left corner)
                contentStream.setFont(bold, 16);
                contentStream.beginText();
                contentStream.newLineAtOffset(50, 750);
                contentStream.showText("Monthly Sales Report");
                contentStream.endText();
                // 4. Add a subtitle
                contentStream.setFont(regular, 12);
                contentStream.beginText();
                contentStream.newLineAtOffset(50, 730);
                contentStream.showText("Generated on: " + java.time.LocalDate.now());
                contentStream.endText();
                // 5. Add a table header
                contentStream.setFont(bold, 12);
                float yPosition = 700;
                contentStream.beginText();
                contentStream.newLineAtOffset(50, yPosition);
                contentStream.showText("Product Name");
                // Offsets are relative to the previous position: x is now 50 + 150 = 200
                contentStream.newLineAtOffset(150, 0);
                contentStream.showText("Quantity");
                contentStream.newLineAtOffset(100, 0); // x is now 300
                contentStream.showText("Price");
                contentStream.endText();
                // 6. Add table data
                String[][] tableData = {
                    {"Laptop", "10", "$1200.00"},
                    {"Mouse", "50", "$25.00"},
                    {"Keyboard", "30", "$75.00"}
                };
                contentStream.setFont(regular, 12);
                yPosition -= 20; // Move down for the first data row
                for (String[] row : tableData) {
                    contentStream.beginText();
                    contentStream.newLineAtOffset(50, yPosition);
                    contentStream.showText(row[0]);
                    contentStream.newLineAtOffset(150, 0);
                    contentStream.showText(row[1]);
                    contentStream.newLineAtOffset(100, 0);
                    contentStream.showText(row[2]);
                    contentStream.endText();
                    yPosition -= 20; // Move down for the next row
                }
            }
            // 7. Save the document
            document.save(outputFileName);
            System.out.println("PDF created successfully: " + outputFileName);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
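
The example above moves down 20 points per row and never checks whether yPosition has dropped below the bottom of the page, so a long table would simply run off the sheet. The following is a minimal sketch of the arithmetic needed to decide when a new page is required. It uses only the JDK; the class and method names are hypothetical, and the 50-point bottom margin and the start positions are assumptions chosen to match the example, not PDFBox defaults.

```java
/** Hypothetical helper: works out how many fixed-height rows fit on a page. */
class RowPaginator {

    /** Rows are drawn at startY, startY - rowHeight, ... while y stays at or above bottomMargin. */
    static int rowsThatFit(float startY, float bottomMargin, float rowHeight) {
        if (rowHeight <= 0f) {
            throw new IllegalArgumentException("rowHeight must be positive");
        }
        if (startY < bottomMargin) {
            return 0; // not even the first row fits
        }
        return (int) Math.floor((startY - bottomMargin) / rowHeight) + 1;
    }

    /** Pages needed when the first page starts lower (title, header) than the following ones. */
    static int pagesNeeded(int rowCount, float firstPageStartY, float nextPageStartY,
                           float bottomMargin, float rowHeight) {
        int onFirstPage = rowsThatFit(firstPageStartY, bottomMargin, rowHeight);
        if (rowCount <= onFirstPage) {
            return 1;
        }
        int perFollowingPage = rowsThatFit(nextPageStartY, bottomMargin, rowHeight);
        if (perFollowingPage == 0) {
            throw new IllegalArgumentException("no row fits on a follow-up page");
        }
        int remaining = rowCount - onFirstPage;
        // Integer ceiling division for the rows that spill past the first page
        return 1 + (remaining + perFollowingPage - 1) / perFollowingPage;
    }

    public static void main(String[] args) {
        // First data row at y = 680 (as in the example), later pages start at y = 750
        System.out.println(rowsThatFit(680f, 50f, 20f));            // 32
        System.out.println(pagesNeeded(100, 680f, 750f, 50f, 20f)); // 3
    }
}
```

In the report generator, the same check would go inside the row loop: once yPosition falls below the margin, close the current content stream, add a new PDPage, open a new content stream on it, and reset yPosition.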

iText 7 (The Powerhouse)

iText is a widely used library for high-quality PDF generation. Version 7 is a ground-up rewrite of the older iText 5 API and is not source-compatible with it. Important note: iText 7 Core is released under the AGPL, which generally obliges you to publish your own application's source under the same license if you distribute it or offer it as a network service. For closed-source commercial applications you will need a commercial license.

Key Features:

  • Layout & Design: Advanced layout managers for complex documents.
  • Templating: Create templates with placeholders for dynamic content.
  • Digital Signatures: Sign PDFs with digital certificates.
  • PDF/A Compliance: Generate PDFs for long-term archival.
  • Advanced Graphics: Draw complex shapes, charts, and barcodes.

Maven Dependency:

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itext7-core</artifactId>
    <version>7.2.5</version> <!-- Check for the latest version -->
    <type>pom</type>
</dependency>
<!-- itext7-core pulls in all core modules. To keep the dependency set smaller,
     declare individual modules such as kernel, io, and layout (same groupId and version) instead. -->

Example: Generating a PDF with iText 7 Layout

This example uses the high-level layout API to create a similar report, which is much cleaner for complex documents.

import com.itextpdf.io.font.constants.StandardFonts;
import com.itextpdf.kernel.font.PdfFont;
import com.itextpdf.kernel.font.PdfFontFactory;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Cell;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.element.Table;
import com.itextpdf.layout.properties.TextAlignment; // package was "layout.property" before 7.2
import com.itextpdf.layout.properties.UnitValue;
import java.io.IOException;
public class ItextReportGenerator {
    public static void main(String[] args) {
        String outputFileName = "itext_office_report.pdf";
        try (PdfDocument pdf = new PdfDocument(new PdfWriter(outputFileName));
             Document document = new Document(pdf)) {
            // Add a title
            PdfFont font = PdfFontFactory.createFont(StandardFonts.HELVETICA);
            Paragraph title = new Paragraph("Monthly Sales Report")
                    .setFont(font)
                    .setFontSize(18)
                    .setBold()
                    .setTextAlignment(TextAlignment.CENTER);
            document.add(title);
            document.add(new Paragraph("\n")); // Add a newline for spacing
            // Create a table with 3 columns in a 3:1:1 ratio (60% / 20% / 20%)
            Table table = new Table(UnitValue.createPercentArray(new float[]{60, 20, 20}));
            table.setWidth(UnitValue.createPercentValue(100)); // Use full page width
            // Add table header
            table.addHeaderCell(new Cell().add(new Paragraph("Product Name").setBold()));
            table.addHeaderCell(new Cell().add(new Paragraph("Quantity").setBold()));
            table.addHeaderCell(new Cell().add(new Paragraph("Price").setBold()));
            // Add table data
            table.addCell(new Cell().add(new Paragraph("Laptop")));
            table.addCell(new Cell().add(new Paragraph("10")));
            table.addCell(new Cell().add(new Paragraph("$1200.00")));
            table.addCell(new Cell().add(new Paragraph("Mouse")));
            table.addCell(new Cell().add(new Paragraph("50")));
            table.addCell(new Cell().add(new Paragraph("$25.00")));
            table.addCell(new Cell().add(new Paragraph("Keyboard")));
            table.addCell(new Cell().add(new Paragraph("30")));
            table.addCell(new Cell().add(new Paragraph("$75.00")));
            document.add(table);
            // Report success only when nothing above threw
            System.out.println("PDF created successfully: " + outputFileName);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
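
The column widths in the table above are meant as a 3:1:1 ratio, which corresponds to 60%, 20%, and 20% of the table width. As a small worked example of that normalization, here is a sketch in plain Java; the class and method names are hypothetical, and this is not how iText computes widths internally.

```java
import java.util.Arrays;

/** Hypothetical helper: turns relative column weights into percentages. */
class ColumnWidths {

    /** Converts weights such as {3, 1, 1} into percentages that sum to 100. */
    static float[] toPercentages(float[] weights) {
        float total = 0f;
        for (float weight : weights) {
            if (weight <= 0f) {
                throw new IllegalArgumentException("weights must be positive");
            }
            total += weight;
        }
        float[] percentages = new float[weights.length];
        for (int i = 0; i < weights.length; i++) {
            percentages[i] = weights[i] * 100f / total;
        }
        return percentages;
    }

    public static void main(String[] args) {
        // prints [60.0, 20.0, 20.0]
        System.out.println(Arrays.toString(toPercentages(new float[]{3, 1, 1})));
    }
}
```

Passing explicit percentages that sum to 100 to the table keeps the intended proportions unambiguous, whichever iText 7 release is in use.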

OpenPDF (A Good iText 2.x Alternative)

OpenPDF is a fork of iText 4, the last LGPL/MPL-licensed line of the classic iText code base (the same com.lowagie API as iText 2.x), taken from before iText moved to the AGPL. OpenPDF keeps that legacy code maintained and free. The API is very similar to the old iText.

Maven Dependency:

<dependency>
    <groupId>com.github.librepdf</groupId>
    <artifactId>openpdf</artifactId>
    <version>1.3.30</version> <!-- Check for the latest version -->
</dependency>

Example (Conceptual)

The code would look very similar to the old iText 2.x style, which is less structured than iText 7's layout but very direct.

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfPCell;
import com.lowagie.text.pdf.PdfPTable;
import com.lowagie.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;
public class OpenPdfExample {
    public static void main(String[] args) {
        Document document = new Document();
        try {
            PdfWriter.getInstance(document, new FileOutputStream("openpdf_report.pdf"));
            document.open();
            document.add(new Paragraph("Hello, OpenPDF!"));
            PdfPTable table = new PdfPTable(3);
            table.addCell("Header 1");
            table.addCell("Header 2");
            table.addCell("Header 3");
            table.addCell("Cell 1");
            table.addCell("Cell 2");
            table.addCell("Cell 3");
            document.add(table);
        } catch (DocumentException | IOException e) {
            e.printStackTrace();
        } finally {
            document.close();
        }
    }
}

Flying Saucer (for HTML to PDF)

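If your "Office" documents are already designed as HTML pages (e.g., generated by a templating engine like Thymeleaf or FreeMarker), Flying Saucer is a good fit. It is a pure-Java renderer that lays out well-formed XHTML styled with CSS 2.1 and writes the result as a PDF. It is not a browser engine: it does not run JavaScript, and the input must parse as XML, so every tag has to be closed.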
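
Because Flying Saucer parses its input as XML, markup that a browser would tolerate (an unclosed br tag, for instance) makes rendering fail. The sketch below shows one way to check a generated page for well-formedness before handing it to the renderer. It uses only the JDK's XML parser; the class name is hypothetical, and the check may be stricter than the renderer itself (for example, it rejects named entities such as &nbsp; that are not declared in the document).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical pre-flight check for markup that will be rendered as XHTML. */
class XhtmlCheck {

    /** Returns true if the markup parses as well-formed XML. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not fetch external DTDs over the network while checking
            factory.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // malformed markup
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser unavailable or input unreadable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>Line one<br/>Line two</p>")); // true
        System.out.println(isWellFormed("<p>Line one<br>Line two</p>"));  // false
    }
}
```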

Maven Dependency:

<dependency>
    <groupId>org.xhtmlrenderer</groupId>
    <artifactId>flying-saucer-pdf</artifactId>
    <version>9.1.22</version> <!-- Check for the latest version -->
</dependency>

Example: Converting HTML to PDF

  1. Create an HTML file (report.html):

    <!DOCTYPE html>
    <html>
    <head>
        <style>
            body { font-family: Arial, sans-serif; }
            h1 { color: #333; }
            table { width: 100%; border-collapse: collapse; }
            th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
            th { background-color: #f2f2f2; }
        </style>
    </head>
    <body>
        <h1>Monthly Sales Report</h1>
        <p>Generated on: <span id="date"></span></p>
        <table>
            <thead>
                <tr>
                    <th>Product Name</th>
                    <th>Quantity</th>
                    <th>Price</th>
                </tr>
            </thead>
            <tbody>
                <tr>
                    <td>Laptop</td>
                    <td>10</td>
                    <td>$1200.00</td>
                </tr>
                <tr>
                    <td>Mouse</td>
                    <td>50</td>
                    <td>$25.00</td>
                </tr>
            </tbody>
        </table>
    </body>
    </html>
  2. Java Code to render the HTML:

    import org.xhtmlrenderer.pdf.ITextRenderer;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.OutputStream;
    public class HtmlToPdfConverter {
        public static void main(String[] args) {
            try {
                // 1. Create an ITextRenderer instance
                ITextRenderer renderer = new ITextRenderer();
                // 2. Set the XHTML content
                // You can load from a file or a string
                renderer.setDocument(new File("report.html"));
                // 3. Layout and render the PDF
                renderer.layout();
                // 4. Write the PDF to a file
                try (OutputStream os = new FileOutputStream("html_report.pdf")) {
                    renderer.createPDF(os);
                }
                System.out.println("PDF created from HTML successfully: html_report.pdf");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
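
The span with id "date" in report.html stays empty in the PDF, because Flying Saucer does not execute JavaScript and nothing else fills it in. A minimal sketch of filling it on the Java side before rendering is shown below. It uses only the JDK; the class and method names are hypothetical, and a real project would more likely let a templating engine such as Thymeleaf or FreeMarker do this step.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: fills the empty date placeholder from report.html. */
class ReportTemplate {

    private static final String DATE_PLACEHOLDER = "<span id=\"date\"></span>";

    /** Returns the markup with the placeholder span filled with an ISO date (yyyy-MM-dd). */
    static String fillDate(String html, LocalDate date) {
        String formatted = date.format(DateTimeFormatter.ISO_LOCAL_DATE);
        return html.replace(DATE_PLACEHOLDER, "<span id=\"date\">" + formatted + "</span>");
    }

    public static void main(String[] args) {
        String html = "<p>Generated on: <span id=\"date\"></span></p>";
        // prints <p>Generated on: <span id="date">2024-01-31</span></p>
        System.out.println(fillDate(html, LocalDate.of(2024, 1, 31)));
    }
}
```

The filled string can then be given to the renderer with setDocumentFromString instead of loading the file directly.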

How to Choose?

  • For simple to moderate PDF tasks: Start with Apache PDFBox. It's free, robust, and covers most common use cases (creation, text extraction, merging, form filling).
  • For complex, high-quality, or template-based reports: Use iText 7. Its layout API is a game-changer for professional document generation. Be mindful of the licensing for commercial projects.
  • If you have existing HTML/CSS: Use Flying Saucer. It's the most natural and maintainable way to generate PDFs from web content.
  • If you need a drop-in replacement for the old iText 2.x: Use OpenPDF.