杰瑞科技汇

How to Convert Word to PDF in Java

Converting a Word document (.docx) to PDF is a common requirement in Java. Several libraries can do it, and they differ in license, output fidelity, and the amount of code you have to write.


Here’s a comprehensive guide covering the most popular and effective methods.

Summary of Options

Library       | License                        | Ease of Use | Performance | Key Feature
--------------|--------------------------------|-------------|-------------|------------------------------------------------------------
Apache POI    | Apache 2.0 (permissive)        | Medium      | Good        | The standard for Office file manipulation in Java.
iText 7       | AGPL or commercial             | Medium      | Excellent   | Powerful and feature-rich, but the AGPL can be restrictive.
Docx4j        | Apache 2.0 (permissive)        | Medium      | Good        | Excellent choice if you also work with other OOXML formats.
Aspose.Words  | Commercial (free trial)        | Easiest     | Excellent   | The most robust and feature-rich, but requires a paid license.

Method 1: Apache POI with PDFBox (Recommended Free & Open Source)

This is a very popular combination. Apache POI is the de facto standard for reading and writing Office files in Java, but it has no PDF writer of its own. There is also no ready-made bridge between POI and Apache PDFBox: your code reads the Word content through POI and draws it onto PDF pages with PDFBox.

How it works: POI parses the .docx file, extracts the text, paragraphs, and basic formatting, and then PDFBox is used to lay out this content on a PDF page.
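
To make the parsing step less abstract, here is a minimal JDK-only sketch of what "extracting the paragraphs" involves. A .docx file is a ZIP archive whose main text lives in the entry word/document.xml, with paragraphs stored as w:p elements and text runs as w:t elements. This is not how POI is implemented, and the names DocxTextSketch and extractParagraphs are made up for illustration. It assumes Java 9 or later and ignores formatting, tables, and every other part of the package.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only (not part of POI or PDFBox).
final class DocxTextSketch {
    // Namespace of the WordprocessingML elements (w:p, w:r, w:t).
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the plain text of each w:p paragraph found in word/document.xml. */
    static List<String> extractParagraphs(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    // readAllBytes() stops at the end of the current ZIP entry.
                    return parseParagraphs(zip.readAllBytes());
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a .docx file?");
    }

    private static List<String> parseParagraphs(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        List<String> paragraphs = new ArrayList<>();
        NodeList pNodes = dom.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < pNodes.getLength(); i++) {
            // Concatenate the text runs (w:t) that belong to this paragraph.
            NodeList tNodes = ((Element) pNodes.item(i)).getElementsByTagNameNS(W_NS, "t");
            StringBuilder text = new StringBuilder();
            for (int j = 0; j < tNodes.getLength(); j++) {
                text.append(tNodes.item(j).getTextContent());
            }
            paragraphs.add(text.toString());
        }
        return paragraphs;
    }
}
```

In real code, XWPFDocument.getParagraphs() from POI does this work for you and also exposes runs, styles, and tables; the sketch only shows why plain paragraph text is the easy part of the conversion.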

Add Dependencies

You need both poi and pdfbox in your pom.xml:

<dependencies>
    <!-- Apache POI for .docx file handling -->
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>5.2.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>5.2.5</version>
    </dependency>
    <!-- Apache PDFBox for PDF creation -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>3.0.2</version>
    </dependency>
</dependencies>

Java Code

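This example demonstrates a basic conversion. Important: it copies only the plain text of each paragraph onto a single page. Formatting, tables, images, headers, and footers are dropped, long paragraphs are not wrapped, and text that runs past the bottom of the page is not moved to a new page.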
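
Wrapping and page breaks are something your own code has to supply. The sketch below shows one possible approach using only the JDK: break each paragraph into lines of a maximum length, then group the lines into pages. The class PageLayoutSketch and its methods are hypothetical, and counting characters is only an approximation; with PDFBox you would more likely measure each line with the font's getStringWidth method. With the example's 15-point line spacing, a start at y = 750, and a 50-point bottom margin, about 47 lines fit on a page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical layout helper, for illustration only.
final class PageLayoutSketch {
    /**
     * Splits text into lines of at most maxChars characters, breaking on whitespace.
     * A single word longer than maxChars is kept whole on its own line.
     * An empty paragraph yields one empty line.
     */
    static List<String> wrap(String text, int maxChars) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            // Start a new line if adding " word" would exceed the limit.
            if (line.length() > 0 && line.length() + 1 + word.length() > maxChars) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        lines.add(line.toString());
        return lines;
    }

    /** Groups lines into pages holding at most linesPerPage lines each. */
    static List<List<String>> paginate(List<String> lines, int linesPerPage) {
        List<List<String>> pages = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerPage) {
            int end = Math.min(i + linesPerPage, lines.size());
            pages.add(new ArrayList<>(lines.subList(i, end)));
        }
        return pages;
    }
}
```

Each inner list would then be drawn on its own PDPage with its own content stream, resetting the text position to the top of the page.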

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.Standard14Fonts;
import java.io.FileInputStream;
import java.io.IOException;

public class WordToPdfConverter {
    public static void main(String[] args) {
        // Input and output file paths
        String inputWordPath = "path/to/your/document.docx";
        String outputPdfPath = "path/to/your/output.pdf";
        try (FileInputStream fis = new FileInputStream(inputWordPath);
             XWPFDocument document = new XWPFDocument(fis);
             PDDocument pdfDocument = new PDDocument()) {
            // Create a new page for the PDF
            PDPage page = new PDPage();
            pdfDocument.addPage(page);
            // PDFBox 3.x no longer has the PDType1Font.HELVETICA constant;
            // build the standard font from Standard14Fonts instead
            PDType1Font font = new PDType1Font(Standard14Fonts.FontName.HELVETICA);
            try (PDPageContentStream contentStream = new PDPageContentStream(pdfDocument, page)) {
                // Set font and font size
                contentStream.setFont(font, 12);
                contentStream.beginText();
                contentStream.newLineAtOffset(50, 750); // x, y coordinates
                // Iterate through paragraphs of the Word document
                for (XWPFParagraph paragraph : document.getParagraphs()) {
                    // showText() rejects control characters such as tabs and line breaks,
                    // so replace them with spaces. Characters the font cannot encode
                    // (for example CJK text with Helvetica) still throw IllegalArgumentException.
                    String text = paragraph.getText().replaceAll("\\p{Cntrl}", " ");
                    contentStream.showText(text);
                    // Move to the next line
                    contentStream.newLineAtOffset(0, -15); // Negative value moves down
                }
                contentStream.endText();
            }
            // Save the PDF
            pdfDocument.save(outputPdfPath);
            System.out.println("PDF created successfully at: " + outputPdfPath);
        } catch (IOException e) {
            System.err.println("Error during Word to PDF conversion: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Method 2: iText 7 (Powerful, but Check License)

iText is a powerful library for creating and manipulating PDFs. iText 7 cannot read Word files itself, but its pdfHTML add-on (Maven artifact html2pdf) converts HTML to PDF. For Word, the approach is therefore to convert the .docx to HTML first and then use iText to render the HTML into a PDF.

How it works: Docx4j (or another library) converts the .docx to HTML. Then pdfHTML's HtmlConverter renders the HTML into a PDF. (HTMLWorker, which older tutorials mention, belongs to iText 5 and is deprecated.)

Add Dependencies

You'll need itext7-core and html2pdf for the HTML to PDF step, and docx4j for the .docx to HTML conversion.

<dependencies>
    <!-- iText 7 Core -->
    <dependency>
        <groupId>com.itextpdf</groupId>
        <artifactId>itext7-core</artifactId>
        <version>7.2.5</version>
        <type>pom</type>
    </dependency>
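    <!-- pdfHTML add-on; keep its version aligned with the iText core release (4.0.5 pairs with 7.2.5) -->
    <dependency>
        <groupId>com.itextpdf</groupId>
        <artifactId>html2pdf</artifactId>
        <version>4.0.5</version>
    </dependency>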
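    <!-- Docx4j for .docx to HTML conversion -->
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-core</artifactId>
        <version>11.4.4</version>
    </dependency>
    <!-- docx4j 8 and later need one JAXB binding module on the classpath -->
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
        <version>11.4.4</version>
    </dependency>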
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-export-fo</artifactId>
        <version>11.4.4</version>
    </dependency>
</dependencies>

Java Code

This example uses Docx4j to export the Word document to XHTML in memory, which is then passed to iText.

import com.itextpdf.html2pdf.HtmlConverter;
import org.docx4j.Docx4J;
import org.docx4j.convert.out.HTMLSettings;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class ITextWordToPdfConverter {
    public static void main(String[] args) {
        String inputWordPath = "path/to/your/document.docx";
        String outputPdfPath = "path/to/your/output.pdf";
        try {
            // 1. Load the Word document using Docx4j
            WordprocessingMLPackage wordMLPackage = Docx4J.load(new File(inputWordPath));
            // 2. Export the Word document to XHTML, held in memory.
            //    Basic formatting survives; complex layouts may not.
            HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
            htmlSettings.setOpcPackage(wordMLPackage);
            ByteArrayOutputStream htmlBytes = new ByteArrayOutputStream();
            Docx4J.toHTML(htmlSettings, htmlBytes, Docx4J.FLAG_EXPORT_PREFER_XSL);
            // 3. Render the HTML to PDF using iText's pdfHTML add-on
            try (OutputStream pdfOut = new FileOutputStream(outputPdfPath)) {
                HtmlConverter.convertToPdf(new ByteArrayInputStream(htmlBytes.toByteArray()), pdfOut);
            }
            System.out.println("PDF created successfully at: " + outputPdfPath);
        } catch (Exception e) {
            System.err.println("Error during Word to PDF conversion: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Method 3: Docx4j (Good for OOXML Workflows)

Docx4j is another excellent library specifically for working with OOXML formats (.docx, .xlsx, .pptx). It has built-in conversion capabilities to PDF via its docx4j-export-fo module, which uses Apache FOP.

How it works: Docx4j converts the .docx into an XSL-FO (Formatting Objects) document. Apache FOP then takes this XSL-FO and renders it into a PDF.
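
For readers who have not seen XSL-FO, the sketch below builds a very small FO document from a list of paragraphs using only the JDK. It is meant to show the shape of the intermediate format (a page master plus a flow of fo:block elements), not what Docx4j actually emits, which is far richer. The class XslFoSketch, the method toFo, and the A4 page settings are assumptions chosen for illustration.

```java
import java.util.List;

// Hypothetical generator, for illustration only (not part of Docx4j or FOP).
final class XslFoSketch {
    // Standard namespace of XSL Formatting Objects.
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Builds a single-flow XSL-FO document with one fo:block per paragraph. */
    static String toFo(List<String> paragraphs) {
        StringBuilder fo = new StringBuilder();
        fo.append("<fo:root xmlns:fo=\"").append(FO_NS).append("\">\n")
          // The page master describes the page geometry.
          .append("  <fo:layout-master-set>\n")
          .append("    <fo:simple-page-master master-name=\"A4\"")
          .append(" page-height=\"29.7cm\" page-width=\"21cm\" margin=\"2cm\">\n")
          .append("      <fo:region-body/>\n")
          .append("    </fo:simple-page-master>\n")
          .append("  </fo:layout-master-set>\n")
          // The page sequence holds the content that flows across pages.
          .append("  <fo:page-sequence master-reference=\"A4\">\n")
          .append("    <fo:flow flow-name=\"xsl-region-body\">\n");
        for (String paragraph : paragraphs) {
            fo.append("      <fo:block font-size=\"12pt\" space-after=\"6pt\">")
              .append(escape(paragraph))
              .append("</fo:block>\n");
        }
        fo.append("    </fo:flow>\n")
          .append("  </fo:page-sequence>\n")
          .append("</fo:root>\n");
        return fo.toString();
    }

    /** Escapes the characters that are special in XML text content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Unlike the PDFBox example, nothing here computes line breaks or page breaks: the FO processor (Apache FOP in Docx4j's case) handles layout, which is the main reason this route copes better with longer documents.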

Add Dependencies

<dependencies>
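    <!-- Docx4j Core -->
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-core</artifactId>
        <version>11.4.4</version>
    </dependency>
    <!-- docx4j 8 and later need one JAXB binding module on the classpath -->
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
        <version>11.4.4</version>
    </dependency>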
    <!-- Docx4j Export to FO (Formatting Objects) -->
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-export-fo</artifactId>
        <version>11.4.4</version>
    </dependency>
</dependencies>

Java Code

This is a very direct approach with Docx4j.

import org.docx4j.Docx4J;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class Docx4jConverter {
    public static void main(String[] args) {
        String inputWordPath = "path/to/your/document.docx";
        String outputPdfPath = "path/to/your/output.pdf";
        try {
            // 1. Load the Word document
            WordprocessingMLPackage wordMLPackage = Docx4J.load(new File(inputWordPath));
            // 2. Convert the Word document to PDF.
            //    Docx4J.toPDF writes to an OutputStream, not a File; the rendering
            //    goes through XSL-FO ("FO" stands for Formatting Objects).
            try (OutputStream pdfOut = new FileOutputStream(outputPdfPath)) {
                Docx4J.toPDF(wordMLPackage, pdfOut);
            }
            System.out.println("PDF created successfully at: " + outputPdfPath);
        } catch (Exception e) {
            System.err.println("Error during Word to PDF conversion: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Method 4: Aspose.Words (Easiest & Most Robust, Commercial)

Aspose.Words is a commercial library widely regarded as the most robust and feature-rich solution for Word processing and conversion. It has a free trial version that places a watermark on the output PDF, making it perfect for evaluation.

How it works: Aspose.Words has a highly optimized rendering engine that converts the Word document's internal structure directly to PDF with high fidelity, preserving almost all formatting, layouts, and even complex elements like tables, headers, footers, and images.

Add Dependencies

You need to download the JAR from the Aspose website and add it to your project, or use their Maven repository.

<repositories>
    <repository>
        <id>aspose-java-releases</id>
        <name>Aspose Java API Repository</name>
        <url>https://repository.aspose.com/repo/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>com.aspose</groupId>
        <artifactId>aspose-words</artifactId>
        <version>23.8</version> <!-- Use the latest version -->
        <classifier>jdk17</classifier> <!-- Aspose publishes this artifact with a JDK classifier -->
    </dependency>
</dependencies>

Java Code

The code is remarkably simple.

import com.aspose.words.Document;
import com.aspose.words.SaveFormat;

public class AsposeWordsConverter {
    public static void main(String[] args) {
        String inputWordPath = "path/to/your/document.docx";
        String outputPdfPath = "path/to/your/output.pdf";
        try {
            // 1. Load the Word document
            Document doc = new Document(inputWordPath);
            // 2. Save the document as PDF
            doc.save(outputPdfPath, SaveFormat.PDF);
            System.out.println("PDF created successfully at: " + outputPdfPath);
        } catch (Exception e) {
            System.err.println("Error during Word to PDF conversion: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Which One Should You Choose?

  • For a simple, free, and open-source solution: Use Apache POI + PDFBox. It's great if you only need basic text conversion and want to stick with Apache-licensed libraries.
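  • For better formatting fidelity with open-source libraries: Use Docx4j or iText 7 (with Docx4j). They preserve formatting much better than the POI/PDFBox combination, although complex layouts can still differ from what Word renders. Before choosing iText, check that its AGPL terms fit your project.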
  • For the best possible results with minimal code: Use Aspose.Words. If your project can afford a commercial license, this is the top choice for reliability, performance, and feature support. Always test with the free trial first.
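
Because the choice may change as requirements or licensing change, it can help to hide the library behind a small interface. The following is a sketch under that assumption, using only the JDK: WordToPdf, BatchConvertSketch, pdfPathFor, and convertAll are hypothetical names, and each of the four methods above could be wrapped as one implementation of the interface.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical abstraction; implement it with whichever library you pick. */
interface WordToPdf {
    void convert(Path docx, Path pdf) throws Exception;
}

// Hypothetical batch helper, for illustration only.
final class BatchConvertSketch {
    /** Maps "report.docx" to "report.pdf" in the same directory. */
    static Path pdfPathFor(Path docx) {
        String name = docx.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return docx.resolveSibling(base + ".pdf");
    }

    /** Converts every .docx directly inside dir and returns the files that failed. */
    static List<Path> convertAll(Path dir, WordToPdf converter) throws IOException {
        List<Path> candidates;
        try (Stream<Path> files = Files.list(dir)) {
            candidates = files
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".docx"))
                    .collect(Collectors.toList());
        }
        List<Path> failed = new ArrayList<>();
        for (Path docx : candidates) {
            try {
                converter.convert(docx, pdfPathFor(docx));
            } catch (Exception e) {
                // Record the failure and continue with the remaining files.
                failed.add(docx);
            }
        }
        return failed;
    }
}
```

With this in place, switching from one library to another means supplying a different WordToPdf implementation, and the same test documents can be run through each candidate to compare the output side by side.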