Ad-Hoc dictionary

https://stackoverflow.com//questions/25047110

21-12-2019

Question

I'm currently working on a small project with the FineReader 11 SDK. To improve my results, I would like to work with an ad-hoc dictionary. The content of the dictionary is based on the first word of a given line.

Example:

Samsung Galaxy S3 ... many other word in this line
Apple Iphone 4 ... much more words
some more lines

My idea is to recognize the first word (Samsung or Apple) and fill the dictionary with all possible words based on that first word (for Samsung: Galaxy, S3, ...).

Any idea how to solve this with FineReader?

Regards

Solution

Thank you for the clarification. Here is what you can do, in my opinion. This applies to the FineReader product line in general; in the SDK you have more specific control via the API.

FineReader OCR has these dictionaries:

Built-in dictionary - a large set of common words and their variations, one of the strengths of ABBYY OCR technology. It does not contain specialized words such as "Samsung" and "S3". By selecting a popular language, you automatically turn on the built-in dictionary for that language.
Custom dictionary - a dictionary that you build yourself and use alone or in conjunction with the built-in dictionary.

So for your project, I believe it makes sense to use the built-in dictionary, because your phrases may contain standard English words (you did not provide the full phrases for me to see, so decide on this yourself).

I also strongly believe that you need to create a custom dictionary with brands, models, and so on, if you have that option, and it sounds like you do. It should greatly improve recognition, especially for unnatural words like "S3", because common language rules assume that letters and digits are not mixed within a word. This is very easy to do.
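As a rough illustration of the custom dictionary idea, here is a minimal Python sketch that flattens a brand/model catalog into a deduplicated word list. The `CATALOG` contents and the `build_word_list` helper are hypothetical and only extend the example from the question; no FineReader call is involved, and how the resulting words get loaded into a custom dictionary depends on the product or SDK you use.

```python
from typing import Dict, List

# Hypothetical catalog (brand -> model names), for illustration only.
CATALOG: Dict[str, List[str]] = {
    "Samsung": ["Galaxy S3", "Galaxy Note"],
    "Apple": ["Iphone 4"],
}


def build_word_list(catalog: Dict[str, List[str]]) -> List[str]:
    """Collect every brand and every word of every model name, without duplicates."""
    words = set()
    for brand, models in catalog.items():
        words.add(brand)
        for model in models:
            # "Galaxy S3" contributes two dictionary words: "Galaxy" and "S3".
            words.update(model.split())
    return sorted(words)


if __name__ == "__main__":
    for word in build_word_list(CATALOG):
        print(word)
```

Splitting model names into single words matches the way the question describes the dictionary content (Galaxy, S3, ...), but whether multi-word entries would work better is something to test against your own images.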

At present I do not see the benefit of reading each line with a separate dictionary, unless you expect very similar words to apply to different lines and want to keep those words in separate dictionaries, one per kind of line. In that case you can create separate dictionaries and turn on the matching one for a second recognition pass, based on the initial word. To achieve that, however, you first need to split the page into lines (in memory, or by actually cropping the images) so that each line can be processed separately with its own dictionary. That is possible only in the SDK, and with a substantial amount of work.
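The two-pass flow described above could be sketched roughly as follows in Python. Everything here is an assumption made for illustration: `BRAND_WORDS`, `pick_dictionary`, and `recognize_with_adhoc_dictionary` are hypothetical names, and the `recognize` callback is a stand-in for whatever OCR call you make on a single cropped line with a given word list. It is not a FineReader SDK function.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical per-brand word lists, taken from the example in the question.
BRAND_WORDS: Dict[str, List[str]] = {
    "Samsung": ["Galaxy", "S3"],
    "Apple": ["Iphone", "4"],
}

# Stand-in for the real OCR call: (line image, optional word list) -> text.
Recognizer = Callable[[object, Optional[Sequence[str]]], str]


def pick_dictionary(first_pass_text: str,
                    brand_words: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return the word list for the brand that starts the line, if any."""
    tokens = first_pass_text.split()
    if not tokens:
        return None
    first = tokens[0].casefold()
    for brand, words in brand_words.items():
        if brand.casefold() == first:
            return [brand, *words]
    return None


def recognize_with_adhoc_dictionary(line_images: Sequence[object],
                                    recognize: Recognizer,
                                    brand_words: Dict[str, List[str]]) -> List[str]:
    """First pass finds the brand; second pass rereads the line with its words."""
    results = []
    for image in line_images:
        # Pass 1: only the brand names are offered as dictionary words.
        first_pass = recognize(image, list(brand_words))
        words = pick_dictionary(first_pass, brand_words)
        # Pass 2 only when a known brand was found; otherwise keep pass 1.
        results.append(recognize(image, words) if words else first_pass)
    return results
```

The sketch takes the line images as already split, which is the step the answer calls the substantial part of the work. It also matches the brand exactly (ignoring case), so a misread first word falls back to the first-pass result.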

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow