Fast character detection

https://stackoverflow.com/questions/1081526

text
image-processing
ocr

22-08-2019

Question

I don't want to know what the text says, and the images won't have any CAPTCHA-like distortion. I just want to know whether a bunch of images contain any text.

This will run on a couple of idle Linux servers, where a cron job will process a large batch of images several times a day.

One of the things I want to do in the process is discard any images that contain text. I don't mind some false positives, but when it comes to identifying images with text that should be discarded, I would like a fail rate as close to zero percent as possible.

Solution

Tesseract OCR is what Google uses for Google Books. Give it a try.
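The answer only names Tesseract. As a minimal sketch, assuming the `tesseract` command-line program (3.05 or later, which can emit TSV) is installed on the servers, a cron-driven Python script could ask it for per-word confidences and treat an image as containing text if any word clears a threshold. The thresholds `MIN_CONF` and `MIN_CHARS`, the function names, and the move-to-a-reject-directory behaviour are hypothetical choices for illustration, not part of Tesseract itself.

```python
from __future__ import annotations

import csv
import io
import shutil
import subprocess
from pathlib import Path

# Hypothetical defaults; tune them against a sample of your own images.
MIN_CONF = 60.0   # Tesseract word confidence, 0-100
MIN_CHARS = 2     # ignore single stray characters, which are often noise
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def tesseract_tsv(image_path: Path) -> str:
    """Run the tesseract CLI on one image and return its TSV output."""
    # "stdout" as the output base writes results to standard output;
    # the trailing "tsv" selects Tesseract's TSV output config.
    result = subprocess.run(
        ["tesseract", str(image_path), "stdout", "tsv"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def has_text(tsv: str, min_conf: float = MIN_CONF, min_chars: int = MIN_CHARS) -> bool:
    """Return True if any recognised word passes the confidence and length checks."""
    # QUOTE_NONE: recognised words may contain quote characters.
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row.get("level") != "5":   # level 5 rows are individual words
            continue
        word = (row.get("text") or "").strip()
        try:
            conf = float(row.get("conf") or -1)
        except ValueError:
            continue
        if conf >= min_conf and sum(ch.isalnum() for ch in word) >= min_chars:
            return True
    return False


def discard_images_with_text(src_dir: Path, reject_dir: Path) -> list[Path]:
    """Move every image in src_dir that appears to contain text into reject_dir."""
    reject_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(src_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        if has_text(tesseract_tsv(path)):
            target = reject_dir / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

A cron job might call `discard_images_with_text(Path("/srv/images/incoming"), Path("/srv/images/rejected"))` (hypothetical paths). Since false positives are acceptable here but misses are not, lowering `MIN_CONF` trades more false discards for fewer missed images. Tesseract works best on clean, document-like text, so measure the miss rate on your own images before relying on it. With `check=True`, an image Tesseract cannot read raises an exception and stops the batch; wrap the call if you would rather log and continue.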

Licensed under: CC-BY-SA with attribution
