Question

I would like to build an Android application that uses an OCR library to scan a picture and extract the text from it.

What Java library should I use?


Solution

I don't know how good it is (it definitely needs to be trained first), but there is Ron Cemer's Java OCR library.

OTHER TIPS

If you are looking for a very extensible option, or have a specific problem domain, you could consider rolling your own using the Java Object Oriented Neural Engine (JOONE).

I used it successfully in a personal project to identify the letter shown in an image. All the source for the OCR component of my application is available on GitHub.
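For the roll-your-own route, a neural network generally expects a fixed-length numeric vector per character rather than a raw image. The following is a minimal sketch of that preprocessing step only. It is not the code from the project mentioned above and it does not use JOONE's API: the class `GlyphFeatures`, its method names, and the threshold are hypothetical, and it assumes the character has already been cropped into its own ARGB pixel array (on Android, for example, one filled by `Bitmap.getPixels`).

```java
/** Hypothetical helper: turns one cropped character image into network input. */
final class GlyphFeatures {

    /**
     * Converts packed ARGB pixels (row-major) into a binary mask.
     * A pixel counts as "ink" when its luminance is below the threshold (0-255).
     */
    static boolean[][] binarize(int[] argb, int width, int height, int threshold) {
        boolean[][] mask = new boolean[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int p = argb[y * width + x];
                int r = (p >> 16) & 0xFF;
                int g = (p >> 8) & 0xFF;
                int b = p & 0xFF;
                // Integer approximation of perceived brightness.
                int luminance = (r * 299 + g * 587 + b * 114) / 1000;
                mask[y][x] = luminance < threshold;
            }
        }
        return mask;
    }

    /**
     * Downsamples a binary glyph to a rows x cols grid.
     * Each output value is the fraction of ink pixels in that cell, in [0, 1].
     */
    static double[] toFeatureVector(boolean[][] glyph, int rows, int cols) {
        int h = glyph.length;
        int w = glyph[0].length;
        double[] features = new double[rows * cols];
        for (int r = 0; r < rows; r++) {
            // Cell boundaries in source pixel coordinates (at least one pixel tall).
            int y0 = r * h / rows;
            int y1 = Math.max(y0 + 1, (r + 1) * h / rows);
            for (int c = 0; c < cols; c++) {
                int x0 = c * w / cols;
                int x1 = Math.max(x0 + 1, (c + 1) * w / cols);
                int ink = 0;
                for (int y = y0; y < y1; y++) {
                    for (int x = x0; x < x1; x++) {
                        if (glyph[y][x]) {
                            ink++;
                        }
                    }
                }
                features[r * cols + c] = (double) ink / ((y1 - y0) * (x1 - x0));
            }
        }
        return features;
    }
}
```

Calling `GlyphFeatures.toFeatureVector(GlyphFeatures.binarize(pixels, w, h, 128), 8, 8)` would give 64 values between 0 and 1 that could be passed to whichever network library you pick. The grid size and the threshold of 128 are arbitrary starting points, not recommendations.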

Try Tesseract. Check out this article: http://www.itwizard.ro/interfacing-cc-libraries-via-jni-example-tesseract-163.html and this example: http://code.google.com/p/mezzofanti/

Edit: some more facts

- Tesseract is one of the best open source OCR engines and is used by Google.
- There is training data available for many languages.
- Mezzofanti is an Android app that uses Tesseract.
- Beware: OCR uses a lot of CPU power. Trying to OCR an A4 page on your T-Mobile G1 will take a lot of time, and the result may not impress you ;-)

You can use the OCR feature from Google Docs. Check the Documents List Data API http://code.google.com/apis/documents/docs/3.0/developers_guide_protocol.html#OCR

Licensed under: CC-BY-SA with attribution