So far, all the previous examples have shown how to read strings and other kinds of data from an image. Now let's see how to read a PDF file using the OCR class methods.
To start off, you can create a simple PDF file with some sample text in it (as many pages as you want) or simply download any PDF file. In my example I created a PDF file with 3 pages, each containing the text below:
Page 01
Page 02
Page 03
Now the code below will open the PDF file, run OCR on each page and print the recognized text.
import java.awt.image.BufferedImage;
import java.io.File;
import com.asprise.util.ocr.OCR;
import com.asprise.util.pdf.PDFReader;

public class Test2 {
    public static void main(String[] args) throws Exception {
        // Create a new OCR object
        OCR ocr = new OCR();
        // Create a PDFReader for the PDF file location
        PDFReader reader = new PDFReader(new File("D:\\1.pdf"));
        // Open the PDF file
        reader.open();
        try {
            // Store the number of pages in the PDF file in an int variable
            int pages = reader.getNumberOfPages();
            // Print the number of pages in the PDF file
            System.out.println("Number of pages are : " + pages);
            // Render each page as an image, run OCR on it and print the result
            for (int i = 0; i < pages; i++) {
                BufferedImage image = reader.getPageAsImage(i);
                System.out.println("OCR result:\n" + ocr.recognizeEverything(image));
            }
        } finally {
            // Close the PDF file even if OCR throws an exception
            reader.close();
        }
    }
}
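The output of this program is shown below. Note that the evaluation version of the library adds a watermark line to every page, and that OCR misread "02" as "OZ" on the second page: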
Number of pages are : 3
OCR result:
^UNLICENSED VERSlON FOR EVALUATlON PURPOSE ONLY. Asprise Java PDF Libray - http:llasp
Page 01
OCR result:
^UNLICENSED VERSlON FOR EVALUATlON PURPOSE ONLY. Asprise Java PDF Libray - http:llasp
Page OZ
OCR result:
^UNLICENSED VERSlON FOR EVALUATlON PURPOSE ONLY. Asprise Java PDF Libray - http:llasp
Page 03
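Every OCR result above starts with the evaluation banner. If you only want the page text, a small post-processing step can drop that line. The sketch below is plain Java with no Asprise calls; the class name WatermarkFilter, the method stripWatermark and the assumption that the banner line always starts with "^UNLICENSED" are hypothetical, based only on the output shown above.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class WatermarkFilter {

    // Assumed banner prefix, taken from the evaluation output shown above (hypothetical).
    private static final String BANNER_PREFIX = "^UNLICENSED";

    /** Returns the OCR text with every line that starts with the banner prefix removed. */
    public static String stripWatermark(String ocrText) {
        return Arrays.stream(ocrText.split("\\R"))              // split on any line break
                .filter(line -> !line.trim().startsWith(BANNER_PREFIX)) // drop banner lines
                .collect(Collectors.joining("\n"))             // rejoin the remaining lines
                .trim();
    }

    public static void main(String[] args) {
        String page = "^UNLICENSED VERSlON FOR EVALUATlON PURPOSE ONLY. Asprise Java PDF Libray - http:llasp\n"
                + "Page 01";
        System.out.println(stripWatermark(page)); // prints "Page 01"
    }
}
```

Assuming recognizeEverything returns a String, you could call it inside the loop of the earlier program as stripWatermark(ocr.recognizeEverything(image)). Filtering by line prefix keeps the helper independent of the OCR library, so it also works on text saved from earlier runs.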