Question

Using Kofax Capture 10 (SP1, FP2), I have recognition zones set up on some fields on a document. These fields are consistently recognizing I's as 1's. I have tried every combination of settings I can think of that don't obliterate all the characters in the field, to no avail. I have tried Advanced OCR and High Performance OCR, different filters for characters. All kinds of things.

What options can I try to automatically recognize this character? Should I tell the people producing the forms (they're generated by a computer) they need to try using a different font? Convince them that now is the time to consider using Validation?

My current field setup:

Kofax Advanced OCR with no custom settings except Maximize Accuracy in the advanced dialog. This has worked as well as anything else I have tried so far.

The font being used is 8 to 12 pt Arial, by the way.


Solution

Validation is a must whenever OCR is involved, whether you are processing e-docs or paper documents. For paper documents it is even more important.

Use at least 11 pt Arial and render the document as a 300 dpi image. That should give you roughly 99.9% accuracy (about 1 character missed in every 1,000). Accuracy can drop when digits and letters are mixed within one word, especially with look-alike pairs such as 1-I, 0-O, and 6-G.

Recognition scripts can help if you know your data contains no such mixed words but OCR still returns digits mixed in with letters. You can use the PostRecognition script event to catch the recognition result from the OCR engine and modify it with an SBL or VB.NET script. How well this works depends heavily on the documents and data you process.
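The substitution logic such a script would apply can be sketched outside Kofax. This is a minimal illustration in Python, not Kofax API code: the confusion map and the assumption that the field contains only letters are both hypothetical choices you would adapt to your own forms.

```python
# Illustrative sketch only (not the Kofax scripting API): the same kind of
# correction a PostRecognition script would perform, assuming the field is
# known to be letters-only, so any digit look-alike must be an OCR confusion.
CONFUSIONS = {"1": "I", "0": "O", "6": "G", "5": "S", "8": "B"}

def correct_alpha_field(value: str) -> str:
    """Replace digit look-alikes in a field known to contain only letters."""
    return "".join(CONFUSIONS.get(ch, ch) for ch in value)

print(correct_alpha_field("1NVO1CE"))  # prints "INVOICE"
```

The key point is the precondition: this is only safe on fields that can never legitimately contain digits. Applied to a mixed alphanumeric field, it would corrupt valid data, which is why the answer stresses that it "greatly depends on the documents and data you process."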

Image cleanup will not do any good for e-docs.

I'd say your best bet is to use validation. At the very least, that shifts responsibility to the validation operator.

License: CC BY-SA with attribution