Question

I was wondering if anyone would give me pointers to image rec packages that would help me recognize "text" (not OCR, just something that looks like text) and a black box frame. So, suppose:

text
+----------+
|          |
|   text1  |
|          |
|          |
+----------+
     text

How do I recognize that "text" boxes are text, and that, say, text1 is inside the box?

Apologies for the vague question... I wouldn't know where to start. This is not homework, btw.


Solution

[This is of interest to us.] I am assuming your input is effectively a bitmap - a rectangular matrix of pixels. The first question is whether it is aligned with the axes - if it has been scanned, it probably is not. You may need deskewing algorithms (rather dated, but a useful start: http://www.eecs.berkeley.edu/~fateman/kathey/node11.html).

The classic line detection method is the Hough transform (http://en.wikipedia.org/wiki/Hough_transform), though our current collaborators do better than this for simple boxes by projecting pixels onto different viewpoints, similar to tomography. Rotate the image and count the density/histogram of points on the projection lines. For simple boxes that gives a clear signal.
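The projection idea can be sketched in a few lines of plain Python. This is only an illustration, not our collaborators' code: it assumes the image is a list of rows of 0/1 values (1 = black), and the names projection_profile, peaks and best_angle are made up for this example. Instead of rotating the bitmap, it projects each black pixel onto the normal of lines at a given angle, which gives the same histogram. Scoring a profile by its sum of squared counts is also an assumption; any measure of "peakiness" would do.

```python
import math

def projection_profile(pixels, angle_deg):
    """Count black pixels per projection line for lines drawn at angle_deg.

    pixels is a list of rows of 0/1 values (1 = black). The result maps a
    rounded signed offset to the number of black pixels on that line.
    At 0 degrees the offset is the row index y; at 90 degrees it is -x.
    """
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    hist = {}
    for y, row in enumerate(pixels):
        for x, value in enumerate(row):
            if value:
                offset = round(y * c - x * s)
                hist[offset] = hist.get(offset, 0) + 1
    return hist

def peaks(hist, min_count):
    """Offsets whose line holds at least min_count black pixels."""
    return sorted(d for d, n in hist.items() if n >= min_count)

def best_angle(pixels, angles):
    """Pick the angle whose profile is most concentrated (sum of squares)."""
    def sharpness(angle):
        return sum(n * n for n in projection_profile(pixels, angle).values())
    return max(angles, key=sharpness)
```

For an axis-aligned box, the 0 degree profile has two strong peaks at the top and bottom edges and the 90 degree profile has two at the left and right edges. For a skewed scan, trying a range of angles with best_angle gives a rough deskew estimate.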

For the text, I suspect you either have to have a set of likely fonts or use machine learning. In the latter case you have to devise features and then select a series of images that have been classified by humans as text and not-text. Your algorithm (there are many: neural nets, maximum entropy, etc.) is then trained against these.
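To make the "devise features, then train" step concrete, here is a deliberately tiny sketch. It does not use a neural net or maximum entropy; it swaps in a nearest-centroid classifier because that fits in a few lines. The two features (ink density and black/white transitions per pixel along each row) are assumptions chosen for illustration, as are the function names and the labels "text" and "not-text". Patches are lists of rows of 0/1 values.

```python
def features(patch):
    """Two hand-made features for a binary patch (1 = black).

    Returns (ink density, horizontal transitions per pixel). Text tends
    to flip between black and white far more often than a solid shape.
    """
    h, w = len(patch), len(patch[0])
    ink = sum(sum(row) for row in patch) / (h * w)
    flips = sum(
        sum(1 for a, b in zip(row, row[1:]) if a != b) for row in patch
    )
    return (ink, flips / (h * w))

def train(labelled_patches):
    """Average the feature vectors of each human-assigned label."""
    grouped = {}
    for patch, label in labelled_patches:
        grouped.setdefault(label, []).append(features(patch))
    return {
        label: tuple(sum(col) / len(vecs) for col in zip(*vecs))
        for label, vecs in grouped.items()
    }

def classify(model, patch):
    """Return the label whose centroid is closest to the patch features."""
    f = features(patch)
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(f, model[label])),
    )
```

In practice you would need many more labelled patches and better features, and you would probably hand the feature vectors to an existing machine learning library rather than a hand-rolled classifier.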

The quality of the pixel map makes a great deal of difference. Documents from 20 years ago are much harder than bitmaps of documents created through drawing programs and dumped as PDF (of course, if you can interpret the text in the PDF, that helps a good deal).

OTHER TIPS

You can apply any border detection algorithm to detect the box. And since the color of the text is different from the color of the background, you can use even a linear search to find the black pixels of the 'text'. I may be wrong, sorry about that.

A very simple algorithm would be to scan left-to-right and top-to-bottom, looking for the three black pixels that make up an upper-left corner of a box (and then continuing to scan for the three pixels that would make up the matching lower-right corner). Once you've identified each box in the image in this way, you could scan the inner portion and assume that any non-white pixels mean there is some text in the box. Of course, this would not differentiate between text and images inside the box, but that would be a much more difficult problem anyway.
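A rough sketch of that scan, assuming a clean, axis-aligned, one-pixel-wide frame in an image stored as a list of rows of 0/1 values (1 = black). The function names are invented for this example. Rather than searching separately for the lower-right corner, this version follows the top and left edges from the upper-left corner and then checks that the bottom and right edges close the rectangle, which amounts to the same thing for simple frames.

```python
def find_boxes(img):
    """Return (top, left, bottom, right) for each closed rectangular frame."""
    h, w = len(img), len(img[0])

    def black(y, x):
        return 0 <= y < h and 0 <= x < w and img[y][x] == 1

    boxes = []
    for y in range(h):
        for x in range(w):
            # Upper-left corner: this pixel, its right neighbour and the
            # pixel below are black; nothing black above or to the left.
            if not (black(y, x) and black(y, x + 1) and black(y + 1, x)):
                continue
            if black(y - 1, x) or black(y, x - 1):
                continue
            right = x
            while black(y, right + 1):   # follow the top edge
                right += 1
            bottom = y
            while black(bottom + 1, x):  # follow the left edge
                bottom += 1
            if right - x < 2 or bottom - y < 2:
                continue                 # too small to have an interior
            closed = all(black(bottom, i) for i in range(x, right + 1)) and \
                all(black(j, right) for j in range(y, bottom + 1))
            if closed:
                boxes.append((y, x, bottom, right))
    return boxes

def box_has_content(img, box):
    """True if any pixel strictly inside the frame is black."""
    top, left, bottom, right = box
    return any(
        img[j][i]
        for j in range(top + 1, bottom)
        for i in range(left + 1, right)
    )
```

Anything black outside every detected box (the stray "text" above and below the frame in the question) can then be treated as free-standing text. A solid black rectangle would also pass the frame test here, so real input needs extra checks.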

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow