How to Use Asprise OCR with Java - 杰瑞科技汇

This guide covers what Asprise OCR for Java is, how to set it up, and how to call it from code, with examples.

What is Asprise OCR?

Asprise OCR is a commercial Optical Character Recognition (OCR) software library that allows developers to extract text from images and PDF documents. It's known for its ease of use, high accuracy, and support for a wide range of languages.

Key Features:

Multi-format Support: Reads images (JPEG, PNG, TIFF, BMP, GIF, etc.) and PDFs.
High Accuracy: Provides good text recognition, including for complex layouts.
Language Support: Recognizes English and a range of other languages. The exact set depends on the language data shipped with your version, so check the vendor's language list before relying on a specific language.
Multiple Output Formats: Can output plain text, structured XML, or searchable PDF. Input does not have to be a file; an in-memory java.awt.image.BufferedImage can be recognized as well.
Easy Integration: Provides a straightforward Java API.

Important Note: Asprise OCR is a commercial product. While it offers a free trial for evaluation, a license must be purchased for use in production applications. The trial version restricts or alters the output; check the vendor's license terms for the exact limitations.

Step 1: Setup and Dependencies

To use Asprise OCR in your Java project, you need to add the JAR file to your project's classpath.

Method A: Manual Download (For Maven/Gradle projects)

Download the JAR: Go to the Asprise OCR for Java download page and download the latest version.
Add to Project:
- For Maven: Install the JAR into your local Maven repository, then declare it in your pom.xml. The coordinates must match the ones you used when installing; the ones below follow the published artifact.
```
<dependency>
    <groupId>com.asprise.ocr</groupId>
    <artifactId>java-ocr-api</artifactId>
    <version>15.3.0.2</version>
</dependency>
```
- For Gradle: Copy the JAR into your libs folder and reference it from your build.gradle:
```
implementation files('libs/java-ocr-api-15.3.0.2.jar') // Path to your downloaded JAR
```
  Or, if you installed it into your local Maven repository (this also requires mavenLocal() in the repositories block):
```
implementation 'com.asprise.ocr:java-ocr-api:15.3.0.2'
```
- For a Standard Java Project: Simply add the downloaded java-ocr-api-<version>.jar to your project's build path in your IDE (like IntelliJ or Eclipse).

Method B: Using Maven (Recommended)

If you use Maven, you can declare the dependency directly and let Maven download it. The artifact is published under the com.asprise.ocr group; confirm in the repository that the version you reference actually exists.

```
<dependency>
    <groupId>com.asprise.ocr</groupId>
    <artifactId>java-ocr-api</artifactId>
    <version>15.3.0.2</version> <!-- Always check for the latest version on their site -->
</dependency>
```

Step 2: Basic OCR Example (Extracting Text from an Image)

This is the simplest use case: loading an image file and extracting all the text from it.

Prerequisites:

You have an image file (e.g., sample.png).
You have the Asprise JAR in your project's classpath.
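
Both prerequisites can be checked from code before any OCR call is made. The following is a minimal sketch that uses only the JDK; the class name OcrPrerequisites and the default file name are illustrative assumptions, and com.asprise.ocr.Ocr is simply the class imported by the examples in this guide.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative helper (not part of any library) for checking OCR prerequisites. */
final class OcrPrerequisites {

    /** Returns true if the named class can be located on the classpath. */
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, OcrPrerequisites.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns true if the path points to an existing, readable regular file. */
    static boolean isReadableFile(String path) {
        Path p = Paths.get(path);
        return Files.isRegularFile(p) && Files.isReadable(p);
    }

    public static void main(String[] args) {
        String ocrClass = "com.asprise.ocr.Ocr"; // class imported by the examples in this guide
        String imagePath = args.length > 0 ? args[0] : "sample.png"; // assumed file name
        System.out.println("OCR class on classpath: " + isClassAvailable(ocrClass));
        System.out.println("Image readable: " + isReadableFile(imagePath));
    }
}
```

If the first line prints false, the JAR is not on the runtime classpath, which usually points to a build configuration problem rather than a code problem.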

Java Code:

```
import com.asprise.ocr.Ocr;
import java.io.File;

public class BasicOCRExample {
    public static void main(String[] args) {
        // 1. One-time setup
        Ocr.setUp();
        // 2. Create an Ocr object and start the engine for English
        Ocr ocr = new Ocr();
        ocr.startEngine("eng", Ocr.SPEED_FASTEST);
        try {
            // 3. Perform OCR on an image file; the result is returned as a String
            String s = ocr.recognize(
                    new File[] {new File("./path/to/your/sample.png")}, // Image file(s) to read
                    Ocr.RECOGNIZE_TYPE_ALL,      // Recognize text and barcodes
                    Ocr.OUTPUT_FORMAT_PLAINTEXT  // Output as plain text
            );
            // 4. Print the result
            System.out.println("OCR Result:\n" + s);
        } finally {
            // 5. Stop the OCR engine, even if recognition fails
            ocr.stopEngine();
        }
    }
}
```

Explanation:

Ocr.setUp(): One-time setup that runs before the first engine is created.
new Ocr(): Creates a new instance of the OCR engine.
ocr.startEngine("eng", Ocr.SPEED_FASTEST): Starts the engine. It must be called before recognize(), in the trial version as well. "eng" specifies the language (English), and the second argument selects the speed setting. Codes for other languages are listed in the vendor documentation.
ocr.recognize(...): This is the core method.
- First argument: An array of the image or PDF files to process.
- Second argument: Ocr.RECOGNIZE_TYPE_ALL tells it to recognize both text and barcodes. You can also use Ocr.RECOGNIZE_TYPE_TEXT for text only.
- Third argument: The desired output format. Ocr.OUTPUT_FORMAT_PLAINTEXT is the most common.
ocr.stopEngine(): Releases the resources used by the OCR engine. Call it when you are done, ideally in a finally block so that it also runs when recognition fails.

Step 3: Advanced Example (Processing a PDF and Getting Structured Output)

This example shows how to process a multi-page PDF and get the output in structured XML format, which can be very useful for parsing.

Java Code:

```
import com.asprise.ocr.Ocr;
import java.io.File;

public class AdvancedOCRExample {
    public static void main(String[] args) {
        Ocr.setUp(); // One-time setup
        Ocr ocr = new Ocr();
        ocr.startEngine("eng", Ocr.SPEED_FASTEST); // Must be called before recognize()
        try {
            System.out.println("Processing PDF...");
            // Recognize a PDF file and get the output as XML
            String xmlResult = ocr.recognize(
                    new File[] {new File("./path/to/your/document.pdf")}, // PDF file(s) to read
                    Ocr.RECOGNIZE_TYPE_ALL,
                    Ocr.OUTPUT_FORMAT_XML // Output as structured XML
            );
            // Print the XML result
            System.out.println("OCR XML Output:\n" + xmlResult);
            // You can now parse this XML string with a DOM or SAX parser
            // to extract specific text blocks, coordinates, etc.
        } finally {
            ocr.stopEngine(); // Release engine resources even if recognition fails
        }
    }
}
```

Sample XML Output Snippet: The XML output is very detailed. Each <block> contains information about a recognized text area, including its coordinates, text, and confidence score. The snippet below is illustrative only; element and attribute names in real output may differ, so inspect what your version produces before writing a parser against it.

```
<?xml version="1.0" encoding="UTF-8"?>
<asprise-ocr version="15.3">
    <page index="0" width="612" height="792">
        <block x="50" y="100" width="500" height="50" confidence="0.98">
            <line x="50" y="100" width="500" height="50" confidence="0.98">
                <word x="50" y="100" width="150" height="30" confidence="0.99">Hello</word>
                <word x="220" y="100" width="150" height="30" confidence="0.97">World</word>
            </line>
        </block>
        <block x="50" y="200" width="500" height="50" confidence="0.95">
            <line x="50" y="200" width="500" height="50" confidence="0.95">
                <word x="50" y="200" width="200" height="30" confidence="0.96">This is a test.</word>
            </line>
        </block>
    </page>
</asprise-ocr>
```
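
Parsing the XML result with the JDK's DOM parser might look like the sketch below. It assumes the element and attribute names from the sample snippet in this guide (word elements carrying a confidence attribute), which may not match what your version actually emits; the class OcrXmlWords and its method are illustrative names, not part of the Asprise API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative helper: pulls confidently recognized words out of OCR XML. */
final class OcrXmlWords {

    /** Returns the text of every word element whose confidence is at least minConfidence. */
    static List<String> wordsAbove(String xml, double minConfidence) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // OCR output needs no DOCTYPE, so reject any (guards against XXE attacks).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            List<String> words = new ArrayList<>();
            NodeList nodes = doc.getElementsByTagName("word"); // assumed element name
            for (int i = 0; i < nodes.getLength(); i++) {
                Element word = (Element) nodes.item(i);
                String confidence = word.getAttribute("confidence"); // assumed attribute name
                if (!confidence.isEmpty() && Double.parseDouble(confidence) >= minConfidence) {
                    words.add(word.getTextContent().trim());
                }
            }
            return words;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse OCR XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<asprise-ocr version=\"15.3\"><page index=\"0\"><block><line>"
                + "<word confidence=\"0.99\">Hello</word>"
                + "<word confidence=\"0.97\">World</word>"
                + "</line></block></page></asprise-ocr>";
        System.out.println(wordsAbove(xml, 0.98)); // prints [Hello]
    }
}
```

Filtering on confidence is one simple use of the structured output; the same loop could read the x, y, width, and height attributes to locate text on the page.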

Step 4: Processing an Image from a URL or in Memory

You can also process images that are not on the local filesystem.

From a URL:

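```
import com.asprise.ocr.Ocr;
import java.awt.image.BufferedImage;
import java.net.URL;
import javax.imageio.ImageIO;
public class OCRFromUrl {
    public static void main(String[] args) throws Exception {
        String imageUrl = "https://www.asprise.com/content/img/sample-ocr.png";
        // Download and decode the image, then recognize it as an in-memory image
        BufferedImage image = ImageIO.read(new URL(imageUrl));
        Ocr.setUp(); // One-time setup
        Ocr ocr = new Ocr();
        ocr.startEngine("eng", Ocr.SPEED_FASTEST);
        try {
            String text = ocr.recognize(image, Ocr.RECOGNIZE_TYPE_TEXT, Ocr.OUTPUT_FORMAT_PLAINTEXT);
            System.out.println(text);
        } finally {
            ocr.stopEngine();
        }
    }
}
```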

From an Image in Memory (e.g., from a BufferedImage):

This is useful when you receive an image from another part of your application, like a webcam or a user upload.

```
import com.asprise.ocr.Ocr;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
public class OCRFromBufferedImage {
    public static void main(String[] args) throws Exception {
        // Load an image into a BufferedImage
        File imageFile = new File("./path/to/your/sample.png");
        BufferedImage bufferedImage = ImageIO.read(imageFile);
        Ocr.setUp(); // One-time setup
        Ocr ocr = new Ocr();
        ocr.startEngine("eng", Ocr.SPEED_FASTEST); // Must be called before recognize()
        try {
            // Recognize from the BufferedImage
            String text = ocr.recognize(bufferedImage, Ocr.RECOGNIZE_TYPE_TEXT, Ocr.OUTPUT_FORMAT_PLAINTEXT);
            System.out.println(text);
        } finally {
            ocr.stopEngine();
        }
    }
}
```
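
When the image arrives as raw bytes, for example from a user upload, it first has to be decoded into a BufferedImage. The sketch below shows one way to do that with the JDK only; UploadedImageDecoder and its methods are illustrative names, and encodePng exists solely so the demo can produce sample bytes without a file.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

/** Illustrative helper: turns uploaded image bytes into a BufferedImage. */
final class UploadedImageDecoder {

    /** Decodes raw image bytes (PNG, JPEG, ...) or throws if the data is not an image. */
    static BufferedImage decode(byte[] imageBytes) {
        try {
            BufferedImage image = ImageIO.read(new ByteArrayInputStream(imageBytes));
            if (image == null) {
                // ImageIO.read returns null when no registered reader understands the data
                throw new IllegalArgumentException("Unsupported or corrupt image data");
            }
            return image;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Encodes an image as PNG bytes; used here only to create demo input. */
    static byte[] encodePng(BufferedImage image) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(image, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] upload = encodePng(new BufferedImage(4, 2, BufferedImage.TYPE_INT_RGB));
        BufferedImage decoded = decode(upload);
        System.out.println(decoded.getWidth() + "x" + decoded.getHeight()); // prints 4x2
    }
}
```

Rejecting undecodable data up front keeps a null image from reaching the OCR call, where the resulting failure would be harder to diagnose.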

Summary and Best Practices

License is Key: Remember that the trial version is for evaluation only. For any commercial or production use, you must purchase a license and apply it as described in the vendor's licensing instructions.
Stop the Engine: Always call ocr.stopEngine() at the end of your OCR processing to free up memory and system resources.
Choose the Right Output Format:
- Use OUTPUT_FORMAT_PLAINTEXT for simple text extraction.
- Use OUTPUT_FORMAT_XML when you need information about text layout, coordinates, or confidence scores.
- Use OUTPUT_FORMAT_PDF to create a new, searchable PDF from an image or scanned document.
Performance: The OCR engine can be resource-intensive. If you are processing many images in a loop, consider initializing the Ocr object once and reusing it, rather than creating a new one for each image.
Error Handling: Wrap your OCR calls in try-catch blocks to handle I/O failures (such as a missing or unreadable file) and runtime exceptions raised by the engine.
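
The engine reuse and error handling advice above can be sketched as a single batch loop. The Engine interface here is a hypothetical stand-in for the start, recognize, and stop calls so that the example runs without the OCR library; it is not the Asprise API, and FakeEngine exists only for the demo.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative batch pattern: one engine, many images, guaranteed cleanup. */
final class BatchOcrSketch {

    /** Hypothetical stand-in for the engine calls; not part of any real library. */
    interface Engine {
        void start();
        String recognize(String imagePath);
        void stop();
    }

    /** Starts the engine once, processes every path, and always stops the engine. */
    static List<String> recognizeAll(Engine engine, List<String> imagePaths) {
        List<String> results = new ArrayList<>();
        engine.start(); // pay the startup cost once, not once per image
        try {
            for (String path : imagePaths) {
                try {
                    results.add(engine.recognize(path));
                } catch (RuntimeException e) {
                    // One bad file should not abort the batch: log it, record an empty result
                    System.err.println("OCR failed for " + path + ": " + e.getMessage());
                    results.add("");
                }
            }
        } finally {
            engine.stop(); // runs even if something unexpected escapes the loop
        }
        return results;
    }

    /** Fake engine for the demo: counts lifecycle calls, fails on paths containing "bad". */
    static final class FakeEngine implements Engine {
        int starts;
        int stops;

        @Override public void start() { starts++; }

        @Override public String recognize(String imagePath) {
            if (imagePath.contains("bad")) {
                throw new IllegalStateException("unreadable image");
            }
            return "text from " + imagePath;
        }

        @Override public void stop() { stops++; }
    }

    public static void main(String[] args) {
        FakeEngine engine = new FakeEngine();
        List<String> results = recognizeAll(engine, List.of("a.png", "bad.png", "c.png"));
        System.out.println(results.size() + " results");
        System.out.println("starts=" + engine.starts + ", stops=" + engine.stops); // starts=1, stops=1
    }
}
```

Recording an empty result for a failed file keeps the output list aligned with the input list; whether to skip, retry, or abort instead is a design choice that depends on the application.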
