Question

I'm looking for an explanation, API documentation, or examples of how to use (and possibly train) Tesseract in C++. There's nothing useful on the Google Tesseract page, and I have yet to find anything elsewhere on the web.

Any useful sources or experiences would be more than welcome, as I have no idea where to begin.

P.S:

  1. I'm open to suggestions for other libraries.
  2. Only FREE libraries

Solution

I have some experience with Tesseract. A quick search for 'training tesseract' turns up this page: http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract, where you choose which version of Tesseract you want to train. Version 3 is the latest, but it is brand new and people are still ironing out its issues, so I'm still using version 2.4.

On that page you'll see there are about 9 steps to training Tesseract for a particular 'language' (which would better have been called a 'font' or 'character set'). You could also just use the existing 'eng' language; whether that is good enough depends on your application.

For example, my application does its own document analysis, takes a particular region, and OCRs a 13-character string of digits. I needed high accuracy and didn't want it reading '5' as 'S' or '0' as 'O', so it made sense to create a dedicated 'language' for my font set containing only the characters 0-9, whereas you might not care if you get extra 'noise'.

OTHER TIPS

Tesseract OCR is an open-source library for optical character recognition. If you are using Visual Studio, you just need to add the library's headers and library files to your project. If you are using Qt Creator, you have to build the library yourself so that it works with Qt, using CMake (either from the CMakeLists.txt on the command line or through the CMake GUI). See the guide "Opencv Ocr build for Qt 5.4 mingw" for details.

Licensed under: CC-BY-SA with attribution