How to Convert PDF to JPG in Java? - 杰瑞科技汇

Converting PDF pages to JPG images in Java is a common task. A widely used library for this is Apache PDFBox, an open-source tool from the Apache Software Foundation.

Here’s a complete guide covering:

Prerequisites: What you need to set up.
Step-by-Step Code: A simple, runnable Java example.
Explanation of the Code: Breaking down what each part does.
Handling Complex PDFs: Tips for dealing with scanned documents or multi-page PDFs.
Alternative Libraries: Other options like iText.

Prerequisites

You need to have the following set up:

Java Development Kit (JDK): Version 8 or newer.
An IDE: Such as IntelliJ IDEA, Eclipse, or VS Code.
Apache PDFBox Library: You'll need to add this dependency to your project.

Adding PDFBox Dependency

For Maven Projects: Add this to your pom.xml file:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.29</version> <!-- The code in this guide uses the PDFBox 2.x API; 3.x loads documents with Loader.loadPDF instead of PDDocument.load -->
</dependency>

For Gradle Projects: Add this to your build.gradle file:

implementation 'org.apache.pdfbox:pdfbox:2.0.29' // The code in this guide uses the PDFBox 2.x API

Step-by-Step Java Code

This example will convert each page of a PDF file into a separate JPG image.

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.rendering.ImageType;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
public class PdfToJpgConverter {
    public static void main(String[] args) {
        // 1. Define the input PDF file and the output directory
        File pdfFile = new File("input.pdf");
        File outputDir = new File("output_jpgs");
        // Create the output directory if it doesn't exist
        if (!outputDir.exists()) {
            outputDir.mkdirs();
        }
        try (PDDocument document = PDDocument.load(pdfFile)) {
            // 2. Create a PDFRenderer object
            PDFRenderer pdfRenderer = new PDFRenderer(document);
            // 3. Iterate through each page and convert it to an image
            for (int page = 0; page < document.getNumberOfPages(); ++page) {
                // Render the page as a BufferedImage
                // You can change the image type (e.g., ImageType.RGB, ImageType.GRAYSCALE)
                // and the DPI for higher/lower quality.
                BufferedImage image = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
                // 4. Define the output file path for the JPG
                String outputFileName = outputDir + File.separator + "page_" + (page + 1) + ".jpg";
                File outputFile = new File(outputFileName);
                // 5. Write the BufferedImage to a JPG file
                ImageIO.write(image, "jpg", outputFile);
                System.out.println("Converted page " + (page + 1) + " to " + outputFileName);
            }
            System.out.println("PDF conversion completed successfully!");
        } catch (IOException e) {
            System.err.println("Error during PDF to JPG conversion: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
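
The 300 passed to renderImageWithDPI in the code above determines how large each image is. PDF page sizes are measured in points, with 72 points per inch by default, so the pixel size is roughly points * dpi / 72. The sketch below works that out using only the JDK. The class and method names are illustrative, PDFBox's own rounding may differ by a pixel, and the memory figure assumes 4 bytes per pixel for an int-packed RGB image.

```java
public class RenderSizeEstimator {
    // PDF user space uses 72 points per inch by default.
    static final float POINTS_PER_INCH = 72f;

    /** Approximate pixel length of a page side (given in points) at the requested DPI. */
    static int toPixels(float points, float dpi) {
        return Math.round(points * dpi / POINTS_PER_INCH);
    }

    /** Rough in-memory size of the raster, assuming 4 bytes per pixel (an assumption). */
    static long estimateBytes(int width, int height) {
        return (long) width * height * 4L;
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points (8.5 x 11 inches).
        int width = toPixels(612f, 300f);
        int height = toPixels(792f, 300f);
        long mib = estimateBytes(width, height) / (1024 * 1024);
        System.out.println(width + " x " + height + " px, about " + mib + " MiB in memory");
    }
}
```

A US Letter page at 300 DPI comes out at 2550 x 3300 pixels, which is why lowering the DPI is the first thing to try if rendering large documents runs out of memory.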

Explanation of the Code

File Definitions:
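- File pdfFile = new File("input.pdf");: Specifies the source PDF file. A relative path like this one resolves against the directory the program is launched from (often the project root when running in an IDE), so either place the file there or provide the full path.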
- File outputDir = new File("output_jpgs");: Specifies the folder where the JPG images will be saved. The code creates this folder if it doesn't exist.
Loading the PDF:
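- try (PDDocument document = PDDocument.load(pdfFile)): Parses the PDF and returns a PDDocument, PDFBox's in-memory representation of the file. PDDocument.load is the PDFBox 2.x API; PDFBox 3.x uses Loader.loadPDF(pdfFile) instead.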
- The try-with-resources statement (try (...)) is used here to ensure that the PDDocument is automatically closed, even if an error occurs. This prevents resource leaks.
Rendering the PDF:
- PDFRenderer pdfRenderer = new PDFRenderer(document);: This class is responsible for converting PDF pages into Java BufferedImage objects.
- for (int page = 0; page < document.getNumberOfPages(); ++page): We loop through each page of the PDF.
- BufferedImage image = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);: This is the most important line for conversion.
  - page: The page number (0-indexed).
  - 300: The DPI (Dots Per Inch). A higher DPI results in a larger, higher-quality image. 150-300 is a good range for most use cases.
  - ImageType.RGB: Specifies the color format. RGB is for color images. You can use ImageType.GRAYSCALE for black and white documents, which can be smaller in file size.
Saving the Image:
- String outputFileName = ...: We construct a unique filename for each page (e.g., page_1.jpg, page_2.jpg).
- ImageIO.write(image, "jpg", outputFile);: This standard Java method writes the BufferedImage to a file in the specified format ("jpg").
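
ImageIO.write uses the JPEG writer's default compression settings. If you need to trade file size against quality, one option is to drive the ImageWriter directly. The following is a minimal sketch using only the JDK; the class name, the writeJpeg helper, and the 0.9 quality value are illustrative assumptions, and the blank BufferedImage stands in for a page rendered by PDFRenderer.

```java
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Iterator;

public class JpegQualityWriter {
    /** Writes the image as a JPEG with an explicit quality from 0.0 (smallest) to 1.0 (best). */
    static void writeJpeg(BufferedImage image, File target, float quality) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpg");
        if (!writers.hasNext()) {
            throw new IOException("No JPEG writer available");
        }
        ImageWriter writer = writers.next();
        // Start from a fresh file so no stale bytes from an older, longer file remain.
        Files.deleteIfExists(target.toPath());
        try (ImageOutputStream out = ImageIO.createImageOutputStream(target)) {
            if (out == null) {
                throw new IOException("Cannot open " + target + " for writing");
            }
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a page image; in the converter this would come from PDFRenderer.
        BufferedImage image = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        File target = File.createTempFile("page_", ".jpg");
        writeJpeg(image, target, 0.9f);
        System.out.println("Wrote " + target.length() + " bytes to " + target);
    }
}
```

In the converter, a call such as writeJpeg(image, outputFile, 0.9f) would take the place of ImageIO.write(image, "jpg", outputFile).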

Handling Complex PDFs (Scanned Documents)

If your PDF is a scanned document (essentially a picture of each page), the code above will still work. Each page is just an embedded raster image, so rendering resamples that scan to the DPI you request; a DPI above the scan's native resolution makes the files larger without adding detail.

For scanned documents, you often get better results by converting them to black and white (1-bit) images. This also drastically reduces the file size.

To do this, pass ImageType.BINARY to the same rendering method:

// Inside the loop, replace the renderImageWithDPI line with this:
// For black and white (1-bit) images
BufferedImage image = pdfRenderer.renderImageWithDPI(page, 300, ImageType.BINARY);
// For grayscale images
// BufferedImage image = pdfRenderer.renderImageWithDPI(page, 300, ImageType.GRAYSCALE);

Note that the two-argument overload, renderImageWithDPI(page, 300), does not pick a format based on the page content: it renders in ImageType.RGB. To get a 1-bit image you have to request ImageType.BINARY explicitly. JPEG is a lossy format designed for continuous-tone images, so for 1-bit output a lossless format such as PNG is usually a better fit.
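
To make concrete what a 1-bit image is, here is a small JDK-only sketch that thresholds an already rendered image into black and white. It is not how PDFBox produces its binary output; the class name, the toBinary helper, and the threshold of 128 are assumptions chosen for illustration.

```java
import java.awt.image.BufferedImage;

public class BinaryImageConverter {
    /** Returns a 1-bit copy: pixels at or above the luminance threshold become white, the rest black. */
    static BufferedImage toBinary(BufferedImage source, int threshold) {
        int width = source.getWidth();
        int height = source.getHeight();
        BufferedImage binary = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = source.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived brightness (0..255).
                int luma = (r * 299 + g * 587 + b * 114) / 1000;
                binary.setRGB(x, y, luma >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return binary;
    }

    public static void main(String[] args) {
        // Stand-in for a rendered page: one white pixel and one dark gray pixel.
        BufferedImage page = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        page.setRGB(0, 0, 0xFFFFFF);
        page.setRGB(1, 0, 0x202020);
        BufferedImage binary = toBinary(page, 128);
        System.out.println("Bits per pixel: " + binary.getColorModel().getPixelSize());
    }
}
```

A fixed threshold works for clean scans; unevenly lit scans usually need something adaptive, which is outside the scope of this guide.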

Alternative Libraries

While PDFBox is the recommended choice, here are two other popular options.

a) iText (Commercial License Warning)

iText is another powerful library, but be aware that its AGPL license can be problematic for commercial, closed-source applications. You would need to purchase a commercial license.

Maven Dependency:

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.13.3</version> <!-- Note: iText 7 has a different structure -->
</dependency>

Conceptual Code (iText 5):

// Note: This is a skeleton only. iText 5 can extract images embedded in a page,
// but it has no built-in method for rendering a whole page to an image.
import com.itextpdf.text.pdf.PdfReader;
import java.io.IOException;
public class PdfToJpgWithIText {
    public static void main(String[] args) throws IOException {
        PdfReader reader = new PdfReader("input.pdf");
        try {
            // iText page numbers are 1-based
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                // iText doesn't have a direct "render page to image" method like PDFBox.
                // The closest route is PdfReaderContentParser with a custom RenderListener
                // (both in com.itextpdf.text.pdf.parser), which receives embedded images
                // through ImageRenderInfo. This is more complex and less direct than PDFBox.
                System.out.println("Processing page " + i + " with iText is more involved.");
            }
        } finally {
            // Close the reader even if processing fails
            reader.close();
        }
    }
}

As you can see, iText is more verbose for this specific task, and because it has no page-rendering API it cannot produce a JPG of a full page on its own.

b) OpenPDF (LGPL License)

OpenPDF is a fork of the old iText 2.x codebase, which uses a more permissive LGPL license. It's a good alternative if you need iText-like features without the AGPL restrictions.

Maven Dependency:

<dependency>
    <groupId>com.github.librepdf</groupId>
    <artifactId>openpdf</artifactId>
    <version>1.3.30</version>
</dependency>

The conversion process with OpenPDF is very similar to the old iText 2.x style and is generally more complex than using PDFBox.

Summary

Feature	Apache PDFBox	iText	OpenPDF
License	Apache 2.0 (Very permissive)	AGPL (Commercial for closed-source)	LGPL (Permissive)
Ease of Use	Excellent. Very direct API.	Complex for this task.	More complex than PDFBox.
Recommendation	Highly Recommended for this task.	Powerful, but licensing is a concern.	Good alternative if you need iText 2.x features.

For most Java applications, Apache PDFBox is the clear winner for converting PDFs to JPG due to its power, simplicity, and permissive license.
