Fast character detection
22-08-2019
Question
I don't need to know what the text says, and I won't be dealing with distortion like in a CAPTCHA. I just want to know whether any of a bunch of images contain text.
This will run on a couple of idle Linux servers, where a cron job will process a large batch of images multiple times a day.
One of the things I want to do in the process is discard any images with text in them. I don't mind some false positives, but I would like to get as close as possible to a zero-percent miss rate on images that contain text and should be discarded.
Solution
Tesseract OCR is what Google uses for Google Books. Give it a try.
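As a rough sketch of how that could look in a batch job: run each image through the tesseract command-line tool and treat any recognized letters or digits as "contains text". This assumes the tesseract binary is installed and on the PATH. The function names, the min_chars threshold, and the choice to count only alphanumeric characters are my own illustration, not part of Tesseract. Since false positives are acceptable here but misses are not, the threshold defaults to a single character.

```python
import subprocess


def run_tesseract(image_path):
    """Return whatever text the tesseract CLI recognizes in the image.

    Assumes the `tesseract` binary is installed and on the PATH.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout"],  # "stdout" makes tesseract print the text
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def contains_text(image_path, ocr=run_tesseract, min_chars=1):
    """Return True if OCR finds at least `min_chars` letters or digits.

    `ocr` is any callable mapping a path to recognized text; it is a
    parameter only so the OCR step can be swapped out or stubbed.
    `min_chars` is a hypothetical tuning knob: keep it low to avoid
    missing images with text, at the cost of more false positives.
    """
    recognized = ocr(image_path)
    return sum(ch.isalnum() for ch in recognized) >= min_chars
```

The cron job could then call contains_text(path) for each file in the batch and discard the ones where it returns True. It is worth checking the result against a sample of your own images before trusting it, since OCR can both miss stylized text and hallucinate characters in busy textures.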
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow