Tess4j result iterator

https://stackoverflow.com/questions/19732841

02-07-2022

Question

I have the following code:

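import java.awt.image.BufferedImage;
import java.util.StringTokenizer;
import net.sourceforge.tess4j.Tesseract1;
import net.sourceforge.tess4j.TesseractException;

public String getName(BufferedImage subc) {
    Tesseract1 instance = new Tesseract1();
    instance.setPageSegMode(8); // 8 = PSM_SINGLE_WORD: treat the image as a single word
    instance.setLanguage("eng");
    // Only allow letters, digits, underscore and dot in the result
    instance.setTessVariable("tessedit_char_whitelist", "qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM0123456789_.");
    String name = null;
    try {
        name = instance.doOCR(subc);
    } catch (TesseractException e) {
        System.err.println(e.getMessage());
    }
    if (name == null) {
        return null; // OCR failed; avoid a NullPointerException in StringTokenizer
    }
    // Keep only the first line of the OCR output
    StringTokenizer lines = new StringTokenizer(name, "\n");
    return lines.hasMoreTokens() ? lines.nextToken() : "";
}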

where subc is the image of the word, already cropped and preprocessed. What I want is either to obtain the confidence of the recognition, or to iterate over the, let's say, 30 most likely words. I have found examples like "Tess4J: How to get a Character's confidence value?", but that one breaks at the first line,

TessResultIterator ri = TessAPI1.TessBaseAPIGetIterator(api);

when I pass my object "instance" as the parameter "api". After some attempts with getPointer and different objects, I've had no luck so far. From the class summary at http://tess4j.sourceforge.net/docs/docs-1.0/net/sourceforge/tess4j/package-summary.html, I understand that Tesseract or Tesseract1 may not be the most appropriate classes for what I want to do, but I didn't manage to recognize a word from an image with TessAPI or TessAPI1. The ResultIterator in C++ looks pretty concise, but it relies on pointers: https://code.google.com/p/tesseract-ocr/wiki/APIExample Thanks!

Solution

Tesseract (like Tesseract1) is a simplified API that exposes only the most commonly used methods of the TessAPI interface. A Tesseract1 instance is not a TessBaseAPI handle, which is why passing it to TessBaseAPIGetIterator fails. To get the text confidence, you'll need to work with TessAPI (or TessAPI1) directly. The library's unit tests cover some common use cases and are well worth a look.
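As a rough illustration of the approach, here is a minimal sketch modeled on the result-iterator unit test shipped with Tess4J. It assumes the Tess4J 3.x or later package layout (ITessAPI nested types, net.sourceforge.tess4j.util.ImageIOHelper); older releases place some of these classes elsewhere. The class name WordConfidence, the Result holder and the datapath parameter (the directory containing tessdata) are hypothetical names chosen for this example. It reports the confidence of each recognized word and does not cover listing alternative candidates.

```java
import java.awt.image.BufferedImage;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

import com.sun.jna.Pointer;

import net.sourceforge.tess4j.ITessAPI.ETEXT_DESC;
import net.sourceforge.tess4j.ITessAPI.TessBaseAPI;
import net.sourceforge.tess4j.ITessAPI.TessPageIterator;
import net.sourceforge.tess4j.ITessAPI.TessPageIteratorLevel;
import net.sourceforge.tess4j.ITessAPI.TessPageSegMode;
import net.sourceforge.tess4j.ITessAPI.TessResultIterator;
import net.sourceforge.tess4j.TessAPI1;
import net.sourceforge.tess4j.util.ImageIOHelper; // assumption: Tess4J 3.x+ location

public class WordConfidence {

    /** Hypothetical holder: a recognized word and its confidence (0-100). */
    public static final class Result {
        public final String text;
        public final float confidence;

        Result(String text, float confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    private static final String WHITELIST =
            "qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM0123456789_.";

    /** datapath is assumed to be the directory that contains the tessdata folder. */
    public static List<Result> recognize(BufferedImage subc, String datapath) {
        List<Result> results = new ArrayList<>();
        // Work with a raw TessBaseAPI handle instead of a Tesseract1 instance
        TessBaseAPI handle = TessAPI1.TessBaseAPICreate();
        try {
            if (TessAPI1.TessBaseAPIInit3(handle, datapath, "eng") != 0) {
                throw new IllegalStateException("Could not initialize Tesseract");
            }
            TessAPI1.TessBaseAPISetPageSegMode(handle, TessPageSegMode.PSM_SINGLE_WORD);
            TessAPI1.TessBaseAPISetVariable(handle, "tessedit_char_whitelist", WHITELIST);

            // Hand the raw pixel data to Tesseract, as the Tess4J unit tests do
            ByteBuffer buf = ImageIOHelper.convertImageData(subc);
            int bpp = subc.getColorModel().getPixelSize();
            int bytesPerPixel = bpp / 8;
            int bytesPerLine = (int) Math.ceil(subc.getWidth() * bpp / 8.0);
            TessAPI1.TessBaseAPISetImage(handle, buf, subc.getWidth(), subc.getHeight(),
                    bytesPerPixel, bytesPerLine);

            // Recognition must run before an iterator is requested
            TessAPI1.TessBaseAPIRecognize(handle, new ETEXT_DESC());
            TessResultIterator ri = TessAPI1.TessBaseAPIGetIterator(handle);
            if (ri == null) {
                return results; // nothing was recognized
            }
            try {
                TessPageIterator pi = TessAPI1.TessResultIteratorGetPageIterator(ri);
                TessAPI1.TessPageIteratorBegin(pi);
                int level = TessPageIteratorLevel.RIL_WORD;
                do {
                    Pointer ptr = TessAPI1.TessResultIteratorGetUTF8Text(ri, level);
                    if (ptr == null) {
                        continue; // no text at this position
                    }
                    String word = ptr.getString(0);
                    TessAPI1.TessDeleteText(ptr); // free the native string
                    float conf = TessAPI1.TessResultIteratorConfidence(ri, level);
                    results.add(new Result(word, conf));
                } while (TessAPI1.TessPageIteratorNext(pi, level) != 0);
            } finally {
                TessAPI1.TessResultIteratorDelete(ri);
            }
        } finally {
            TessAPI1.TessBaseAPIDelete(handle); // release the native engine
        }
        return results;
    }
}
```

With a single-word image such as subc, the returned list would normally hold one entry, so something like recognize(subc, datapath).get(0).confidence gives the value asked about. Switching level to RIL_SYMBOL should yield per-character confidences instead.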

Licensed under: CC-BY-SA with attribution
