how do I identify letters in an image? (before OCRing)

https://stackoverflow.com/questions/4894758

28-10-2019
|

题

all I can find in the web is about OCR but I'm not there yet, I still have to recognize where the letters are in the image.

any help will be appreciated

解决方案

Generally speaking you'll be looking for small contiguous areas of nearly solid color. I would suggest sampling each pixel and building an array of nearby pixels that also fall within a threshold of the original pixels color (repeat for neighbours of each matching pixel). Put the entire array aside as a potential character (or check it now) and move on (potentially ignoring previously collected pixels for a speedup).

Optimisations are possible if you know in advance the font-size, quality and/or color of the text. If not you'll want to be fairly generous with your thresholds of what constitutes a "contiguous area".

其他提示

The interesting thing is that the answer is not that simple as it may seem. Some may think that locating characters on the picture is first step of OCR, but it is not the case. Actually, you won't be sure where each character is located until you actually finish with recognizing.

The way it works completely depends on the type of image you are going to recognize. First you should segment you image on text areas (blocks) and everything other.

Just few examples:

If you are recognizing license plate on car picture, you should first locate license plate, and only then split it to separate characters.
If you are recognizing some application form, you can locate areas where text is just by knowing it's layout
If you are recognizing scan of book page, you have to distinguish pictures from text areas and then work only on text.

Starting from this moment you don't need original image any more, all you need is binarized image of text block. All OCR alorithms work on binary images. You may need also doing other kind of image transformations like line straightening, perspective correction, skew correction and so on - all that again depends on type of images you are recognizing.

Once text block is found and normalized, you should go further and find lines of text on the text block. In trivial case of horisontal lines of text it is quite simple by creating pixel histogram by horisontal lines.

Now, when you have lines, you may think that now it is simple, you can split it to characters, huray! Again, it is wrong. There are such phenomena as connected characters, broken characters and even ligatures (two letters forming one single shape), or letter that have their parts go further to the right above or bellow next character. What you should do is to create several hipotesis of splitting line to words and individual characters, then try OCR every single variant, weight every hypotesis with confidence level. Last step would be checking different paths in this graph using dictionary and selecting best one.

And only now, when you actually recognized everything, you can say where individual characters are located.

So, simple answer is: recognize your image with OCR program, and get coordinates of charaters from it's output.

许可以下： CC-BY-SA 和归因

不隶属于 StackOverflow