The error message is clear: Tesseract needs the osd.traineddata file (Orientation & Script Detection data). You can install or download it from https://github.com/tesseract-ocr/tessdata and place it in the tessdata directory named in the error, /usr/local/share/tessdata/.
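Before and after copying the file, it helps to confirm which directory Tesseract is actually searching. The sketch below follows the rule stated in the error text (TESSDATA_PREFIX is the parent of the tessdata directory). It assumes that, with TESSDATA_PREFIX unset, the built-in prefix is /usr/local/share, as the path in the error message suggests. The function names tessdata_dir and has_osd are hypothetical helpers, not part of Tesseract.

```shell
#!/bin/sh
# Sketch: report where Tesseract 3.x looks for osd.traineddata and whether it exists.
# Assumption: without TESSDATA_PREFIX the built-in prefix is /usr/local/share,
# matching the path in the error message.

# Hypothetical helper: print the tessdata directory implied by TESSDATA_PREFIX.
tessdata_dir() {
  # TESSDATA_PREFIX names the *parent* of tessdata; strip one trailing slash.
  prefix="${TESSDATA_PREFIX:-/usr/local/share}"
  printf '%s/tessdata\n' "${prefix%/}"
}

# Hypothetical helper: succeed only if osd.traineddata is present there.
has_osd() {
  [ -f "$(tessdata_dir)/osd.traineddata" ]
}

if has_osd; then
  echo "osd.traineddata found in $(tessdata_dir)"
else
  echo "osd.traineddata missing: copy it from https://github.com/tesseract-ocr/tessdata into $(tessdata_dir)" >&2
fi
```

If the check reports the file as missing, download osd.traineddata into the printed directory and run the pdf command again.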
tesseract (v3.03) output as PDF [closed]
Question
Why is this error returned?
root@amd-3700-2gb ~/ocr_test # tesseract -l dan pdf.png out pdf
Tesseract Open Source OCR Engine v3.03 with Leptonica
Error opening data file /usr/local/share/tessdata/osd.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'osd'
Tesseract couldn't load any languages!
Warning: Auto orientation and script detection requested, but osd language failed to load
Language list
root@amd-3700-2gb ~/ocr_test # tesseract --list-langs
List of available languages (3):
eng
dan
dan-frak
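The listing above shows no osd entry, which matches the "Failed loading language 'osd'" error. A quick way to check for a given language is to match whole lines of the --list-langs output, so that dan is not confused with dan-frak. This is a minimal sketch: has_lang is a hypothetical helper, and redirecting stderr with 2>&1 is a precaution in case a given Tesseract build prints the list to stderr rather than stdout.

```shell
#!/bin/sh
# Hypothetical helper: read a --list-langs listing on stdin and succeed
# only if "$1" appears as a whole line (fixed string, no regex).
has_lang() {
  grep -qxF -- "$1"
}

# Run the check only when the tesseract binary is on PATH.
# 2>&1 is a precaution in case the list goes to stderr on some builds.
if command -v tesseract >/dev/null 2>&1; then
  if tesseract --list-langs 2>&1 | has_lang osd; then
    echo "osd is installed"
  else
    echo "osd is not installed"
  fi
fi
```

With the listing shown above (eng, dan, dan-frak), has_lang osd fails while has_lang dan succeeds.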
Output as txt
This works fine and outputs the text to out.txt:
tesseract -l dan pdf.png out
Output PDF
This creates out.pdf, but it also returns the error above, and the searchable text in the PDF doesn't make sense:
tesseract -l dan pdf.png out pdf
Licensed under: CC-BY-SA with attribution