Question

I'm using the Tesseract library in my Android OCR app, and I need to get the bounding box of each character, so I followed this tutorial. But when I write this code, it shows an error. Here's my code:

    TessBaseAPI baseApi = new TessBaseAPI();
    baseApi.setDebug(true);
    baseApi.init(Path, Lang);
    baseApi.setImage(ReadFile.readBitmap(BitmapBiner));
    baseApi.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, CharacterBlacklist);
    baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, CharacterWhitelist);
    String RecognizedText = baseApi.getUTF8Text();
    List<Rect> characterBoundingBoxes = baseApi.getCharacters().getBoxRects();

    BitmapBiner = BitmapBiner.copy(Bitmap.Config.RGB_565, true);
    Canvas canvas = new Canvas(BitmapBiner);

    // draw bounding box for each character
    for (int i = 0; i < characterBoundingBoxes.size(); i++) {
        paint.setAlpha(0xFF);
        paint.setColor(0xFF00CCFF);
        paint.setStyle(Style.STROKE);
        paint.setStrokeWidth(1);
        Rect r = characterBoundingBoxes.get(i);
        canvas.drawRect(r, paint);
    }

Then it showed an error at line 8: "The method getCharacters() is undefined for the type TessBaseAPI". So I decided to try another approach, ResultIterator. Here's my code:

    TessBaseAPI baseApi = new TessBaseAPI();
    baseApi.setDebug(true);
    baseApi.init(Path, Lang);
    baseApi.setImage(ReadFile.readBitmap(BitmapBiner));
    baseApi.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, CharacterBlacklist);
    baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, CharacterWhitelist);
    String RecognizedText = baseApi.getUTF8Text();

    final ResultIterator iterator = baseApi.getResultIterator();
    String lastUTF8Text;
    float lastConfidence;
    int[] lastBoundingBox;
    int count = 0;
    iterator.begin();
    do {
        lastUTF8Text = iterator.getUTF8Text(PageIteratorLevel.RIL_SYMBOL);
        lastConfidence = iterator.confidence(PageIteratorLevel.RIL_SYMBOL);
        lastBoundingBox = iterator.getBoundingBox(PageIteratorLevel.RIL_SYMBOL);
        count++;
    } while (iterator.next(PageIteratorLevel.RIL_SYMBOL));

    BitmapBiner = BitmapBiner.copy(Bitmap.Config.RGB_565, true);
    Canvas canvas = new Canvas(BitmapBiner);

    // draw bounding box for each character
    for (int i = 0; i < lastBoundingBox.length; i++) {
        paint.setAlpha(0xA0);
        paint.setColor(Color.RED);
        paint.setStyle(Style.STROKE);
        paint.setStrokeWidth(1);
        Rect r = new Rect(lastBoundingBox[0], lastBoundingBox[1],
                          lastBoundingBox[2], lastBoundingBox[3]);
        canvas.drawRect(r, paint);
    }

It works fairly well so far, but now only the last character gets a bounding box. For example, if the word is "DOG", the only character with a bounding box in the picture is "G"; the others have no bounding box at all. Is it impossible to achieve this with the Tesseract library? Thanks.


Solution

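You should move the drawing inside the do loop. Each pass of the loop overwrites lastBoundingBox, so once the loop ends it only holds the box of the last symbol ("G" in "DOG"), and the for loop afterwards just draws that same rectangle four times (once per element of the array). Drawing each rectangle inside the do loop, while the iterator is still positioned on that symbol, gives one box per character.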

Updated code:

    BitmapBiner = BitmapBiner.copy(Bitmap.Config.RGB_565, true);
    Canvas canvas = new Canvas(BitmapBiner);

    // setColor() takes a full ARGB value and overwrites any alpha set before it,
    // so set the color first and the alpha second
    paint.setColor(Color.RED);
    paint.setAlpha(0xA0);
    paint.setStyle(Style.STROKE);
    paint.setStrokeWidth(1);

    // rewind to the first symbol (harmless if begin() was already called)
    iterator.begin();
    do {
        lastUTF8Text = iterator.getUTF8Text(PageIteratorLevel.RIL_SYMBOL);
        lastConfidence = iterator.confidence(PageIteratorLevel.RIL_SYMBOL);
        lastBoundingBox = iterator.getBoundingBox(PageIteratorLevel.RIL_SYMBOL);
        count++;

        // draw the bounding box of the current character while the iterator
        // is still on it; lastBoundingBox is {left, top, right, bottom}
        Rect r = new Rect(lastBoundingBox[0], lastBoundingBox[1],
                          lastBoundingBox[2], lastBoundingBox[3]);
        canvas.drawRect(r, paint);
    } while (iterator.next(PageIteratorLevel.RIL_SYMBOL));
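Stripped of the Android and Tesseract classes, the difference between the question's loop and the answer's loop can be reproduced in plain Java. This is a minimal sketch, assuming a hypothetical SymbolIterator interface (not part of tess-two) that mimics the begin()/next()/getBoundingBox() calls used above, with each box stored as {left, top, right, bottom}. The box coordinates in sampleDog() are made up for illustration.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class SymbolBoxes {

        /** Hypothetical stand-in for tess-two's ResultIterator; NOT the real API. */
        interface SymbolIterator {
            void begin();           // rewind to the first symbol
            boolean next();         // advance; returns false when no symbols remain
            int[] getBoundingBox(); // {left, top, right, bottom} of the current symbol
        }

        /** In-memory iterator over pre-computed boxes, for illustration only. */
        static class ListSymbolIterator implements SymbolIterator {
            private final List<int[]> boxes;
            private int index = 0;

            ListSymbolIterator(List<int[]> boxes) {
                this.boxes = boxes;
            }

            public void begin() { index = 0; }
            public boolean next() { index++; return index < boxes.size(); }
            public int[] getBoundingBox() { return boxes.get(index); }
        }

        /** Hypothetical sample data: three side-by-side boxes, as for the word "DOG". */
        static SymbolIterator sampleDog() {
            return new ListSymbolIterator(List.of(
                    new int[] {0, 0, 10, 20},
                    new int[] {12, 0, 22, 20},
                    new int[] {24, 0, 34, 20}));
        }

        /** Question's pattern: the variable is overwritten, so only the last box survives. */
        static int[] lastBoxOnly(SymbolIterator it) {
            int[] lastBoundingBox;
            it.begin();
            do { // like the original, assumes at least one symbol
                lastBoundingBox = it.getBoundingBox();
            } while (it.next());
            return lastBoundingBox;
        }

        /** Answer's pattern: handle every box inside the loop (here: store it). */
        static List<int[]> allBoxes(SymbolIterator it) {
            List<int[]> result = new ArrayList<>();
            it.begin();
            do { // like the original, assumes at least one symbol
                result.add(it.getBoundingBox());
            } while (it.next());
            return result;
        }

        public static void main(String[] args) {
            // prints: last only: [24, 0, 34, 20]
            System.out.println("last only: " + Arrays.toString(lastBoxOnly(sampleDog())));
            // prints: all boxes: 3
            System.out.println("all boxes: " + allBoxes(sampleDog()).size());
        }
    }

In the Android code, the body of the fixed loop would either call canvas.drawRect(...) directly, as in the answer, or add a new Rect(...) to a List<Rect> if the boxes are also needed after the loop (for example to filter them by confidence before drawing).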
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow