Question

I have successfully integrated Tesseract into my Android app, and it reads whatever image I capture, but with very low accuracy. Most of the time I do not get the correct text, because some text around the region of interest also gets captured.

All I want is to read all the text inside a rectangular area, accurately, without capturing the edges of the rectangle. I have done some research and posted about this on Stack Overflow twice, but still have not gotten a satisfactory result!

Here are the two posts I made:

https://stackoverflow.com/questions/16663504/extract-text-from-a-captured-image?noredirect=1#comment23973954_16663504

Extracting information from captured image in android

I am not sure whether to go ahead with Tesseract or use OpenCV.

Solution

Including the many links and answers from others, I think it's good to take a step back and note that there are actually two fundamental steps to optical character recognition (OCR):

  • Text Detection: This is the title and focus of your question, and it is concerned with localizing regions in an image that contain text.
  • Text Recognition: This is where the actual recognition happens, where the localized image regions from detection get segmented character-by-character and classified. This is also where tools like Tesseract come into play.
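The two steps connect through bounding boxes: detection proposes a box, and cropping slightly *inside* that box before recognition keeps the rectangle's border strokes out of the OCR input, which is exactly the contamination described in the question. A minimal pure-Python sketch, where the image is a 2D list of grayscale values and the box coordinates and `margin` are illustrative:

```python
def crop_roi(image, box, margin=2):
    """Crop a detected text region from a grayscale image (2D list of
    pixel values), shrinking the box by `margin` pixels on every side
    so the rectangle's border strokes are excluded from the OCR input."""
    x, y, w, h = box
    top, bottom = y + margin, y + h - margin
    left, right = x + margin, x + w - margin
    return [row[left:right] for row in image[top:bottom]]

# Toy 6x8 image: a dark rectangle border (0) surrounding lighter pixels
image = [[0] * 8,
         [0, 200, 200, 200, 200, 200, 200, 0],
         [0, 200,  50, 200,  50, 200, 200, 0],
         [0, 200, 200, 200, 200, 200, 200, 0],
         [0, 200, 200,  50, 200,  50, 200, 0],
         [0] * 8]

roi = crop_roi(image, (0, 0, 8, 6), margin=1)
# The crop contains no border pixels (value 0)
assert all(p != 0 for row in roi for p in row)
```

The same inset-cropping idea applies unchanged when the boxes come from a real detector and the recognition call is Tesseract.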

Now, there are also two general settings in which OCR is applied:

  • Controlled: These are images taken from a scanner or something similar, where the target is a document and factors like perspective, scale, font, orientation, and background consistency are fairly tame.
  • Uncontrolled/Scene: These are the more natural and in-the-wild photos, e.g. those taken from a camera, where you are trying to recognize a street sign, shop name, etc.

Tesseract, as-is, is most applicable to the "controlled" setting. In general, and for scene OCR especially, "re-training" Tesseract will not directly improve detection, though it may improve recognition.

If you are looking to improve scene text detection, see this work; and if you are looking at improving scene text recognition, see this work. Since you asked about detection, the detection reference uses maximally stable extremal regions (MSER), which has a plethora of implementation resources, e.g. see here.
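A real MSER detector needs a full implementation (OpenCV ships one), but the underlying idea of grouping connected foreground pixels into candidate text regions can be illustrated with a toy connected-component pass over a binarized image. This is a simplified stand-in for MSER, not the algorithm itself; the function name and the tiny test image are illustrative:

```python
def find_text_regions(binary):
    """Group adjacent foreground pixels (value 1) of a binarized image
    (2D list) into 4-connected components and return each component's
    bounding box as (x, y, w, h). A crude stand-in for a real detector
    such as MSER, which additionally scores region stability."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Flood-fill this component, tracking its extent
            stack = [(sy, sx)]
            seen[sy][sx] = True
            min_x = max_x = sx
            min_y = max_y = sy
            while stack:
                y, x = stack.pop()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and \
                            binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes

# Two separated blobs -> two candidate text regions
img = [[0, 0, 0, 0, 0, 0],
       [0, 1, 1, 0, 0, 0],
       [0, 1, 1, 0, 1, 0],
       [0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0]]
print(find_text_regions(img))  # -> [(1, 1, 2, 2), (4, 2, 1, 2)]
```

In practice you would filter the returned boxes by size and aspect ratio before passing them on to recognition.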

There's also a text-detection project specifically for Android:
https://github.com/dreamdragon/text-detection

As many have noted, keep in mind that recognition is still an open research challenge.

OTHER TIPS

There are two main ways to improve the OCR output:

  • either use more training data to train it better

  • filter its input using linear filters (grayscaling, contrast enhancement, blurring)
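The filtering bullet can be sketched in pure Python on a tiny image represented as nested lists; the coefficients are the standard luminosity weights, and the 3x3 mean filter is one simple choice of blur, not a prescription:

```python
def grayscale(rgb):
    """Luminosity grayscale: 0.299 R + 0.587 G + 0.114 B per pixel."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]

def stretch_contrast(gray):
    """Linearly rescale pixel values to span the full 0-255 range."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    span = hi - lo or 1
    return [[(p - lo) * 255 // span for p in row] for row in gray]

def box_blur(gray):
    """3x3 mean filter over interior pixels to suppress sensor noise."""
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(gray[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1)) // 9
    return out

rgb = [[(30, 30, 30), (30, 30, 30), (30, 30, 30)],
       [(30, 30, 30), (200, 200, 200), (30, 30, 30)],
       [(30, 30, 30), (30, 30, 30), (30, 30, 30)]]
gray = grayscale(rgb)              # uniform channels -> value unchanged
contrasted = stretch_contrast(gray)  # 30 -> 0, 200 -> 255
```

On a real image you would apply the same chain via an image library and then binarize before handing the result to Tesseract.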

In the chat we posted a number of links describing filtering techniques used in OCR, but no sample code was posted.

Some of the links posted were

Improving input for OCR

How to train Tesseract

Text enhancement using asymmetric filters <-- this paper is easily found on Google and is worth reading in full, as it clearly illustrates and demonstrates the preprocessing steps needed before OCR-processing the image.

OCR Classification

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow