Question

Does anybody have any experience with different fonts for OCR? I am generating an ID and then trying to scan it with Tesseract. At the moment I am just trying different fonts by trial and error, but this seems pretty inefficient. I've tried the OCR* family of fonts, and various others such as Arial and Georgia. Tesseract tends to get confused by the OCR* fonts.

Is there any font specifically designed for Tesseract, or any system font which works well with it?


Solution

Okay, a Google search turns up this, a font designed specifically for OCR: OCR Font

Looks like it's a standard adopted in 1973.

OTHER TIPS

After trying a lot of different fonts and OCR engines I tend to get the best results using Consolas. It is a monospaced typeface like OCR-A, but easier to read for humans. Consolas is included in several Microsoft products.

There is also an open source font Inconsolata, which is influenced by Consolas. Inconsolata is a good replacement for Consolas, especially considering the licensing details.

In my tests, the numbers and spaces in the Calibri font were not always recognized properly. OCR-A gave lots of reading errors. I did not give MICR a try, since it is not easily readable for most humans.

Note: Tesseract requires a lot of testing and fine-tuning before it is reliable. In our case we switched to a commercially licensed OCR engine (ABBYY), since reliability was very important and we needed to support multiple (European) languages.
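To make that kind of font testing less ad hoc, the comparison can be scored automatically. The following is only a minimal sketch, not a method taken from any of the answers here: it assumes the `tesseract` command-line tool is on the PATH, that each candidate font has already been rendered to an image of a single line of text, and that the helper names (`edit_distance`, `char_accuracy`, `run_tesseract`, `rank_fonts`) are hypothetical. Accuracy is measured as one minus the edit distance divided by the length of the expected ID.

```python
import shutil
import subprocess
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute ca -> cb
        prev = curr
    return prev[-1]


def char_accuracy(expected: str, recognized: str) -> float:
    """Return 1.0 for a perfect read, lower values for more errors."""
    recognized = recognized.strip()  # OCR output usually ends with a newline
    if not expected:
        return 1.0 if not recognized else 0.0
    errors = edit_distance(expected, recognized)
    return max(0.0, 1.0 - errors / len(expected))


def run_tesseract(image_path: str) -> str:
    """Read one line of text from an image via the tesseract CLI.

    Assumption: the image holds a single text line, hence --psm 7.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "--psm", "7"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def rank_fonts(expected: str, readings: Dict[str, str]) -> List[Tuple[str, float]]:
    """Sort fonts by accuracy, best first. `readings` maps font name -> OCR text."""
    scores = {font: char_accuracy(expected, text) for font, text in readings.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A typical run would render the same ID once per font, call `run_tesseract` on each image, and pass the results to `rank_fonts`. Repeating this over many random IDs and several font sizes should give a more trustworthy ranking than a single sample.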

Update: 2017 Jan 31 - Changed 'based on Consolas' to 'influenced by Consolas' due to potential copyright issues.

I find that Calibri works the best for me. We use OCR software daily in an automated system, and after testing dozens of fonts (including some OCR-specific ones) we found that Calibri is consistently the best.

Good luck.

I'd probably use the same font that banks use for the routing numbers at the bottom of checks:

http://morovia.com/font/micr.asp

It was specifically designed to be unambiguously machine-readable.

I have always had success by simply using Times New Roman.

I've been doing extensive testing on this recently in an ECM called Laserfiche, which uses Nuance OmniPage, and I've found that monospace fonts perform poorly compared to dynamically spaced fonts. Those old OCR fonts don't perform as well as more 'normal' looking fonts, especially for strings of numbers at smaller font sizes like 12 point.

It's strange that someone else is having success with Calibri. It performed very poorly in my tests, routinely confusing similar-looking letters and numbers with each other. The best fonts (among those that come on a Windows computer with Office installed) were Consolas, Verdana, and Book Antiqua. These are all fonts in which letters and numbers look distinct. Consolas was the champion.
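One way to check which letters and numbers a given font gets confused is to tally the substitutions across many test reads. This is a rough sketch under stated assumptions: `confusion_pairs` is a hypothetical helper, the input is a list of (expected, recognized) string pairs collected from whatever OCR engine is in use, and only reads of the correct length are compared, since dropped or extra characters make position-by-position alignment ambiguous.

```python
from collections import Counter
from typing import Iterable, Tuple


def confusion_pairs(samples: Iterable[Tuple[str, str]]) -> Counter:
    """Count (expected_char, recognized_char) substitutions over all samples."""
    pairs: Counter = Counter()
    for expected, recognized in samples:
        recognized = recognized.strip()  # drop the trailing newline from OCR output
        if len(expected) != len(recognized):
            continue  # skip reads with dropped or extra characters
        for want, got in zip(expected, recognized):
            if want != got:
                pairs[(want, got)] += 1
    return pairs
```

Calling `confusion_pairs(samples).most_common(5)` lists the most frequent mix-ups for a font, which makes it easier to see whether the errors come from a few look-alike glyphs or are spread across the whole character set.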

I'm currently using Monospace. I've tried a great many fonts, but this is the most accurate one for me.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow