I am doing OCR using Tesseract on a quad-core processor. For better speed, I want to read 4 words at a time, using 4 threads. Is it safe to call Tesseract from multiple threads concurrently?

Note: each thread will be working on a different, non-shared image.

Note: guarding with locks is not ok because of speed.


Solution

I don't think Tesseract is currently parallelizable (see this thread), although one of the main goals for v3.0 is to make it more thread-safe.

However, you could always parallelize by running n concurrent processes of tesseract. If you want to parallelize the OCRing of a single image, it would be up to you to split it and feed each part to each of these n processes (basically a mapreduce).
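The process-based approach above can be sketched roughly as follows. This is only an illustration, not a tested recipe: it assumes the `tesseract` binary is on the PATH and that passing `stdout` as the output base makes it print the recognized text. The names `tesseract_command`, `run_one` and `ocr_in_processes` are made up for this example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def tesseract_command(image_path):
    # Assumption: `tesseract` is on PATH and accepts "stdout" as the output
    # base, so the recognized text is printed instead of written to a file.
    return ["tesseract", image_path, "stdout"]


def run_one(command):
    # Each call starts a separate OS process, so no Tesseract state is shared.
    done = subprocess.run(command, capture_output=True, text=True, check=True)
    return done.stdout.strip()


def ocr_in_processes(image_paths, workers=4, build_command=tesseract_command):
    # The threads here only wait on child processes; the OCR work itself
    # runs in up to `workers` concurrent processes.
    commands = [build_command(path) for path in image_paths]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))  # results keep input order


if __name__ == "__main__":
    # Usage: python ocr_parallel.py word1.png word2.png word3.png word4.png
    for text in ocr_in_processes(sys.argv[1:]):
        print(text)
```

Starting a process per image has a fixed startup cost (loading the language data each time), so for very small inputs such as single words it may pay to batch several words into one image per process.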

Other answers

From the release notes, Tesseract is thread-safe (mostly, and to the degree that you describe needing) as of 3.01 (Oct 21 2011):

Thread-safety! Moved all critical globals and statics to members of the appropriate class. Tesseract is now thread-safe (multiple instances can be used in parallel in multiple threads.) with the minor exception that some control parameters are still global and affect all threads.

I've been successfully using it on multiple cores for that long (or longer, from dev branch).
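As a rough sketch of the "multiple instances in multiple threads" pattern from the release notes: give every worker thread its own engine instance and never share one between threads. The code below is deliberately binding-agnostic. `make_engine` and `recognize` are hypothetical callables you would supply yourself (for example, wrappers around whatever binding to the C++ `tesseract::TessBaseAPI` class you use); nothing here is Tesseract's own API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PerThreadEngines:
    """Hands each thread its own engine instance, created on first use."""

    def __init__(self, make_engine):
        self._make_engine = make_engine  # hypothetical factory supplied by the caller
        self._local = threading.local()  # storage that is private to each thread

    def get(self):
        engine = getattr(self._local, "engine", None)
        if engine is None:
            # First call on this thread: create an instance only it will use.
            engine = self._make_engine()
            self._local.engine = engine
        return engine


def ocr_words(images, make_engine, recognize, workers=4):
    # `recognize(engine, image)` is a hypothetical callable that runs OCR on
    # one image using the engine belonging to the calling thread.
    engines = PerThreadEngines(make_engine)

    def work(image):
        return recognize(engines.get(), image)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, images))  # results keep input order
```

Two caveats: in Python the threads only run truly in parallel if the binding releases the interpreter lock during recognition, which is an assumption to verify for your binding; and per the release notes, some control parameters are still global, so changing them from one thread affects all of them.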

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow