How to find blank field on scanned document image

https://stackoverflow.com/questions/548309

23-08-2019

Question

I want my application to fill in a single field in a form that exists as a black-and-white image file. The form always starts as the same paper version, but by the time my application gets it from my users, it may have been scanned or faxed more than once. Because of that, the field I need is not in the same place in every file.

My users do not always get the blank form from me, so I do not have the ability to print a mark or placeholder that I can recognize later.

There is text on the original blank form, but because it may have been faxed, I have only 200 dpi of resolution. The text is always big enough for a human to read, but I'm skeptical about OCR.

I have some budget so I do not need a free solution ... let's just say $2000.

That said, I am considering the following options:

1. Get an OCR solution to find the text label on the field I need. I do not think I have the resources or expertise to roll my own. I do not need perfect recognition, since I already know what the text says. But I do need to know the X- and Y-coordinates. Is there software that does this? Or is the programming easier than I think?
2. Build or buy software to recognize the edges of the form. From there, I could get the relative position of the field I need. I'm thinking of the dashed line my scanner software puts around the image of a small document. Is that a known algorithm, or is there an available solution?
3. Some other way to recognize the field I need. Attempts to google form-filling software give me hundreds of matches for web forms, PDF forms, etc. that do not do what I need.

I'm not picky about language. My application runs on Linux, but if the best solution is Microsoft, I can probably make that work.

I'd appreciate your thoughts.

Solution

If I understand correctly, the form is always the same, but may be shifted, scaled, or slightly rotated due to photocopying/faxing. In that case, your problem is one of image registration: find the optimal rigid transformation that makes a form from a user line up with your "model" form, in which you know the location of the field of interest. Once you know the transformation, you can compute the location of the field in the user's form.
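To make that last step concrete, here is a minimal Python sketch. It assumes the transformation is a similarity (shift, rotation, uniform scale) and that you have already located the same two landmarks in both the model form and the user's form; how you find those landmarks is not covered here. The function names and the coordinates in the usage note are hypothetical.

```python
def estimate_similarity(model_a, model_b, user_a, user_b):
    """Estimate the map z -> a*z + b that sends two model landmarks
    onto the matching landmarks in the user's scan.

    Points are (x, y) tuples, treated as complex numbers x + yj, so the
    single complex factor `a` encodes both rotation and uniform scale.
    """
    p1, p2 = complex(*model_a), complex(*model_b)
    q1, q2 = complex(*user_a), complex(*user_b)
    if p1 == p2:
        raise ValueError("model landmarks must be distinct")
    a = (q2 - q1) / (p2 - p1)   # rotation and scale
    b = q1 - a * p1             # translation
    return a, b


def map_point(transform, point):
    """Apply the estimated transform to an (x, y) point from the model form."""
    a, b = transform
    z = a * complex(*point) + b
    return z.real, z.imag
```

For example, if the landmarks at (100, 100) and (500, 100) in the model show up at (210, 220) and (1010, 220) in a scan, then `map_point(estimate_similarity((100, 100), (500, 100), (210, 220), (1010, 220)), (300, 400))` gives the field position in the scan. Two landmarks are the bare minimum; with noisy scans you would probably want more correspondences and a least-squares fit.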

There are many image registration algorithms, typically developed for applications such as aligning MR images of the brain. They tend to be computationally expensive and often require statistical priors. Fortunately, your case is easier: all you need to do is fit a rectangle around the contents of the user's form. Coordinate descent over the rectangle's position and size should work. You will need some tolerance for noise (junk outside the form).
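As a rough illustration of the rectangle-fitting idea, the Python sketch below uses row and column projection profiles rather than coordinate descent, which is a simpler stand-in and not necessarily what you would ship. It assumes the image is already loaded as a list of rows of 0 (white) and 1 (black) pixels, and that the page is not noticeably rotated. The names `content_bbox`, `map_field`, and the `min_fraction` threshold are made up for this example.

```python
def content_bbox(image, min_fraction=0.02):
    """Return (left, top, right, bottom) of the form's contents.

    A row or column counts as content only if more than `min_fraction`
    of its pixels are black, which ignores isolated specks of noise.
    """
    height, width = len(image), len(image[0])
    row_counts = [sum(row) for row in image]
    col_counts = [sum(image[y][x] for y in range(height)) for x in range(width)]

    def extent(counts, line_length):
        threshold = min_fraction * line_length
        hits = [i for i, count in enumerate(counts) if count > threshold]
        if not hits:
            raise ValueError("no content found")
        return hits[0], hits[-1]

    top, bottom = extent(row_counts, width)
    left, right = extent(col_counts, height)
    return left, top, right, bottom


def map_field(field, model_box, user_box):
    """Map a field position from the model form into the user's scan,
    using the two bounding boxes to recover shift and scale."""
    fx, fy = field
    ml, mt, mr, mb = model_box
    ul, ut, ur, ub = user_box
    sx = (ur - ul) / (mr - ml)   # horizontal scale
    sy = (ub - ut) / (mb - mt)   # vertical scale
    return ul + (fx - ml) * sx, ut + (fy - mt) * sy
```

You would run `content_bbox` once on the clean model form and once on each incoming scan, then pass both boxes to `map_field`. The threshold needs tuning for real faxes: a long streak or a dark scanner edge will still fool it, and rotation is not handled at all.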

OTHER TIPS

Here's a little summary of some available OCR solutions (open source and not): http://googlesystem.blogspot.com/2007/04/open-source-ocr-software-sponsored-by.html

Rigid registration may not be enough. Users may modify the layout and formatting of a template form: changing the fonts, moving a checkbox or an entry box, breaking a paragraph at different places, and so on. These differences are harder to deal with than a pure shift, rotation, or scale transformation. Besides, if your image is a binary (black-and-white) image, I don't think those medical image registration algorithms, which work on grayscale images, will help much. Your cost function and minimization strategy may need to change accordingly.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow