Question

I am doing OCR using Tesseract on a quad-core processor. For better speed, I want to read 4 words at a time, using 4 threads. Is it safe to call Tesseract from multiple threads concurrently?

Note: each thread will be working on a different, non-shared image.

Note: guarding the Tesseract calls with locks is not an option, because it would cost too much speed.


Solution

I don't think Tesseract can currently be called safely from multiple threads in one process (see this thread), although one of the main goals for v3.0 is to make it more thread-safe.

However, you can always parallelize by running n concurrent Tesseract processes, since separate processes share no state. If you want to parallelize the OCR of a single image, it is up to you to split the image, feed each part to one of these n processes, and merge the results (basically a map-reduce).
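As a rough illustration of the process-based approach, here is a minimal Python sketch. It assumes the `tesseract` command-line tool is on the PATH and is invoked as `tesseract <image> <outputbase>`, which writes `<outputbase>.txt`. The helper names (`tesseract_command`, `ocr_one`, `ocr_many`) and the `.ocr` output naming are made up for this example. The Python threads only wait on child processes; the OCR work itself runs in separate Tesseract processes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def tesseract_command(image_path, output_base):
    """Build the CLI call; tesseract writes its text to <output_base>.txt."""
    return ["tesseract", str(image_path), str(output_base)]


def ocr_one(image_path, build_command=tesseract_command):
    """Run one OCR process for one image and return the recognized text."""
    # Hypothetical naming scheme: word1.png -> word1.png.ocr.txt
    output_base = f"{image_path}.ocr"
    subprocess.run(
        build_command(image_path, output_base),
        check=True,           # raise if the process exits with an error
        capture_output=True,  # keep the child's console output out of ours
    )
    with open(f"{output_base}.txt", encoding="utf-8") as handle:
        return handle.read()


def ocr_many(image_paths, workers=4, build_command=tesseract_command):
    """Keep up to `workers` OCR processes running; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda path: ocr_one(path, build_command), image_paths))
```

For the quad-core case in the question, something like `ocr_many(["word1.png", "word2.png", "word3.png", "word4.png"], workers=4)` would keep four processes busy. Process startup has a cost, so for very small word images it may pay to batch several words per image.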

OTHER TIPS

According to the release notes, Tesseract is thread-safe as of 3.01 (Oct 21 2011), at least mostly, and to the degree that you describe needing:

"Thread-safety! Moved all critical globals and statics to members of the appropriate class. Tesseract is now thread-safe (multiple instances can be used in parallel in multiple threads.) with the minor exception that some control parameters are still global and affect all threads."

I have been using it successfully on multiple cores since that release (or longer, counting the dev branch).
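The "multiple instances in multiple threads" pattern from the release notes can be sketched as one engine per thread, never shared. The sketch below is in Python and is deliberately generic: `make_engine` and `recognize` are hypothetical callables that you would supply for whatever Tesseract binding you use, and nothing here is Tesseract's own API. Whether threads actually run in parallel from Python depends on the binding releasing the interpreter lock during recognition, which is an assumption to verify for your binding.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def ocr_with_per_thread_engines(image_paths, make_engine, recognize, workers=4):
    """Run recognize(engine, path) for each path, one engine per worker thread.

    make_engine: hypothetical factory returning a fresh OCR engine instance.
    recognize:   hypothetical function that runs OCR with that engine.
    """
    local = threading.local()  # each thread sees its own attributes

    def work(path):
        # Create the engine lazily, the first time this thread gets work.
        if not hasattr(local, "engine"):
            local.engine = make_engine()
        return recognize(local.engine, path)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as image_paths.
        return list(pool.map(work, image_paths))
```

Because each engine is created and used by exactly one thread, no locking is needed around recognition. The exception noted in the release notes still applies: control parameters that remain global affect every thread, so set them once before starting the workers. Engine cleanup is left out of this sketch.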

Licensed under: CC-BY-SA with attribution