I'm using the Tesseract library for my Android OCR app, and I need the bounding box of each character. I followed this tutorial, but the code below gives an error. Here's my code:

    TessBaseAPI baseApi = new TessBaseAPI();
    baseApi.setDebug(true);
    baseApi.init(Path, Lang);
    baseApi.setImage(ReadFile.readBitmap(BitmapBiner));
    baseApi.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, CharacterBlacklist);
    baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, CharacterWhitelist);
    String RecognizedText = baseApi.getUTF8Text();
    List<Rect> characterBoundingBoxes = baseApi.getCharacters().getBoxRects();

    BitmapBiner = BitmapBiner.copy(Bitmap.Config.RGB_565, true);
    Canvas canvas = new Canvas(BitmapBiner);

    // draw bounding box for each character
    for (int i = 0; i < characterBoundingBoxes.size(); i++) {
        paint.setAlpha(0xFF);
        paint.setColor(0xFF00CCFF);
        paint.setStyle(Style.STROKE);
        paint.setStrokeWidth(1);
        Rect r = characterBoundingBoxes.get(i);
        canvas.drawRect(r, paint);
    }

The error is reported on line 8 of that snippet: "The method getCharacters() is undefined for the type TessBaseAPI". So I decided to try another approach, ResultIterator. Here's my code:

    TessBaseAPI baseApi = new TessBaseAPI();
    baseApi.setDebug(true);
    baseApi.init(Path, Lang);
    baseApi.setImage(ReadFile.readBitmap(BitmapBiner));
    baseApi.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, CharacterBlacklist);
    baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, CharacterWhitelist);
    String RecognizedText = baseApi.getUTF8Text();  

    final ResultIterator iterator = baseApi.getResultIterator();
    String lastUTF8Text;
    float lastConfidence;
    int[] lastBoundingBox;
    int count = 0;
    iterator.begin();
    do {
        lastUTF8Text = iterator.getUTF8Text(PageIteratorLevel.RIL_SYMBOL);
        lastConfidence = iterator.confidence(PageIteratorLevel.RIL_SYMBOL);
        lastBoundingBox = iterator.getBoundingBox(PageIteratorLevel.RIL_SYMBOL);
        count++;
    } while (iterator.next(PageIteratorLevel.RIL_SYMBOL));

    BitmapBiner = BitmapBiner.copy(Bitmap.Config.RGB_565, true);
    Canvas canvas = new Canvas(BitmapBiner);

    // draw bounding box for each character
    for (int i = 0; i < lastBoundingBox.length; i++) {
        paint.setAlpha(0xA0);
        paint.setColor(Color.RED);
        paint.setStyle(Style.STROKE);
        paint.setStrokeWidth(1);
        Rect r = new Rect(lastBoundingBox[0], lastBoundingBox[1],
                          lastBoundingBox[2], lastBoundingBox[3]);
        canvas.drawRect(r, paint);
    }

This compiles and runs, but only the last character gets a bounding box. For example, if the word is "DOG", the only character with a box drawn around it in the picture is "G"; the others get no box at all. Is it impossible to achieve this with the Tesseract library? Thanks.


Solution

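You should move the drawing inside the do loop. As written, lastBoundingBox is overwritten on every iteration, so when the loop ends it holds only the box of the last symbol. The for loop that follows then draws that same rectangle four times, once per element of the array.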

Updates:

    // Make a mutable copy of the bitmap to draw on
    BitmapBiner = BitmapBiner.copy(Bitmap.Config.RGB_565, true);
    Canvas canvas = new Canvas(BitmapBiner);

    // Configure the paint once, outside the loop.
    // setColor() also sets the alpha, so setAlpha() has to come after it.
    paint.setColor(Color.RED);
    paint.setAlpha(0xA0);
    paint.setStyle(Style.STROKE);
    paint.setStrokeWidth(1);

    iterator.begin();
    do {
        lastUTF8Text = iterator.getUTF8Text(PageIteratorLevel.RIL_SYMBOL);
        lastConfidence = iterator.confidence(PageIteratorLevel.RIL_SYMBOL);
        lastBoundingBox = iterator.getBoundingBox(PageIteratorLevel.RIL_SYMBOL);
        count++;

        // draw the bounding box of the current character
        Rect r = new Rect(lastBoundingBox[0], lastBoundingBox[1],
                          lastBoundingBox[2], lastBoundingBox[3]);
        canvas.drawRect(r, paint);
    } while (iterator.next(PageIteratorLevel.RIL_SYMBOL));
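
If you would rather keep recognition and drawing separate, another option is to collect every box into a list inside the loop and draw from the list afterwards. The sketch below shows only that collection pattern in plain Java so it can be run off-device. `SymbolSource` and `fromArray` are hypothetical stand-ins for the iterator calls used in the question, not part of the Tesseract wrapper, and the coordinates are made-up sample values.

```java
import java.util.ArrayList;
import java.util.List;

class SymbolBoxes {

    /** Hypothetical stand-in for the iterator calls in the question. */
    public interface SymbolSource {
        void begin();          // rewind to the first symbol
        int[] boundingBox();   // {left, top, right, bottom} of the current symbol
        boolean next();        // advance; returns false when there are no more symbols
    }

    /**
     * Walks the source once and keeps a copy of every box,
     * instead of overwriting a single "last" variable.
     * Like the do/while in the question, this assumes at least one symbol.
     */
    public static List<int[]> collectBoxes(SymbolSource source) {
        List<int[]> boxes = new ArrayList<>();
        source.begin();
        do {
            boxes.add(source.boundingBox().clone());
        } while (source.next());
        return boxes;
    }

    /** In-memory source backed by an array, for demonstration only. */
    public static SymbolSource fromArray(final int[][] data) {
        return new SymbolSource() {
            private int index = 0;

            @Override
            public void begin() {
                index = 0;
            }

            @Override
            public int[] boundingBox() {
                return data[index];
            }

            @Override
            public boolean next() {
                if (index + 1 < data.length) {
                    index++;
                    return true;
                }
                return false;
            }
        };
    }

    public static void main(String[] args) {
        // Made-up boxes for the three letters of "DOG"
        int[][] dog = { {10, 5, 30, 40}, {35, 5, 55, 40}, {60, 5, 80, 40} };
        List<int[]> boxes = collectBoxes(fromArray(dog));
        System.out.println(boxes.size() + " boxes collected");
    }
}
```

On Android, each collected array `b` can then be turned into `new Rect(b[0], b[1], b[2], b[3])` and passed to `canvas.drawRect(r, paint)` in a separate loop over the list.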
Licensed under: CC-BY-SA with attribution