Tesseract identifies a 0 as a Q

https://stackoverflow.com/questions/20676620

optimization
tesseract
tiff
identification

19-09-2022

Problem

I am using Tesseract OCR to extract an exclusively numeric string from a PDF file. The PDF contains: 6660003377.pdf but Tesseract recognizes: 66600Q3377.pdf

The input is a TIFF file, and the quality is good enough (see the screenshot).

Is there a way to improve Tesseract's accuracy? I could always replace the Q with a 0, but I'm afraid of further unexpected mistakes.


Solution

This is covered in the Tesseract FAQ:

Run a tesseract command like this to permit only digits in the input image:

tesseract imagename outputbase digits
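
Since the concern is about further unexpected mistakes, it may help to wrap the command above in a small script that refuses any result containing a non-digit, rather than patching individual characters. The following is only a minimal sketch, assuming the tesseract executable is on the PATH and that the image holds a single number; the function names and the file names in the usage note are hypothetical.

```python
import re
import subprocess
from pathlib import Path


def build_digits_command(image_path, output_base):
    """Return the argv list for the FAQ's digits-only invocation."""
    return ["tesseract", str(image_path), str(output_base), "digits"]


def extract_number(ocr_text):
    """Return the OCR text with whitespace removed if it is purely 0-9.

    Raises ValueError otherwise, so a misread character is reported
    instead of being silently patched.
    """
    # Assumption: the image holds one number, so all whitespace
    # (newlines, trailing form feed) can simply be dropped.
    candidate = "".join(ocr_text.split())
    if not re.fullmatch(r"[0-9]+", candidate):
        raise ValueError(f"non-digit characters in OCR output: {candidate!r}")
    return candidate


def ocr_digits(image_path, output_base):
    """Run tesseract with the digits config and validate the result."""
    subprocess.run(build_digits_command(image_path, output_base), check=True)
    # tesseract writes its text result to <output_base>.txt
    text = Path(f"{output_base}.txt").read_text(encoding="utf-8")
    return extract_number(text)
```

Calling ocr_digits("scan.tif", "out") would then either return the numeric string or raise, which makes a misread such as 66600Q3377 visible instead of letting it through. The check is deliberately stricter than the OCR step and accepts only the characters 0-9.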

License: CC BY-SA with attribution

Not affiliated with StackOverflow