Question

I'm working on OCR, and right now I'm working on separating each individual character from the others. E.g., if I have an image that says the following:

12345678.90

I want to detect the x,y coordinates of where each number starts and where it ends in the image, so that I can determine how many numbers there are to process, and to then parse out each individual number / character, and process it.

I have devised a simple algorithm for doing it, and I want some opinions / reviews on how it could be improved.

(In this application, I have to only process numbers, but if this algorithm could also parse out letters, that'd be even better).

  • 1) I would read the pixels in the image in a straight line, at the bottom of the image. E.g, if the image is 30x30, then I would start reading from 0,30 to 30,30.

  • 2) I will compare the color of the pixel. Having already determined the background and foreground colors, I will compare each pixel's color to see if it's in the background or the foreground.

  • 3) If it's the background, it will be ignored. If I encounter any pixel in the foreground, that would indicate the start of a digit. In that case, I would note the location, and then start to read the pixels upwards. E.g., if at 5,30 I detect a foreground color, I would start to read 5,29, 5,28, etc.

  • 4) I would read the pixels upwards (y axis) until I encounter a pixel in the background color. This should give me the height of the character. (I know that for some chars like 5 it would be more complicated; let's ignore them for now.) So I'd determine, e.g., that the character goes from 5,20 to 5,30 vertically.

  • 5) Then I would go back to the x axis (5,30) where I detected the character's start horizontally. I would continue to read horizontally to determine the width of the character, e.g 6,30, 7,30, etc.

  • 6) Here's the tricky step. I'm guessing that between each pair of adjacent characters in the following:

    12345678.90

there is a pixel or so of gap in the background color. It may not be visible to us, but it is there and will be found by the program as it goes pixel by pixel horizontally, reading the colors. That would tell it where the character ends horizontally. So, e.g., it might detect the background color pixel at 15,30.

  • 7) That's the algorithm, it should give the x,y coordinates of where each letter starts and the next one begins. In the example above, the character would run from 5,20 to 15,30, and is 10x10.
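As a rough illustration, the steps above might be sketched like this in Python. This is only a sketch under assumptions that are not in the description itself: the image is already reduced to a list of rows holding 1 for foreground and 0 for background, coordinates are 0-based (so the bottom row of a 30x30 image is y = 29), and the function name `segment_bottom_row` is made up for the example.

```python
def segment_bottom_row(image, fg=1):
    """Follow steps 1-7 literally on a grid of 0/1 pixels (assumed format).

    Returns a list of inclusive boxes (left, top, right, bottom).
    """
    height = len(image)
    width = len(image[0])
    bottom = height - 1              # last row, 0-based
    boxes = []
    x = 0
    while x < width:
        if image[bottom][x] != fg:
            x += 1                   # step 3: background pixels are ignored
            continue
        start_x = x
        # Step 4: read upwards from the first foreground pixel.
        top = bottom
        while top > 0 and image[top - 1][start_x] == fg:
            top -= 1
        # Steps 5-6: read along the bottom row until a background pixel.
        while x < width and image[bottom][x] == fg:
            x += 1
        boxes.append((start_x, top, x - 1, bottom))
    return boxes


image = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
]
print(segment_bottom_row(image))
```

Note that the height is measured only along the first column of each character, which is exactly the limitation mentioned in step 4.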

Could this algorithm be improved, and/or am I correct in my assumption on step 6?


Solution

A common approach which I know for segmentation of digits is the sliding window. The basic idea is that you slide a window of some size over the image of digits.

Each position of the sliding window produces a small image (you look only at the pixels covered by the window). The window should be narrow. A classifier can then be trained to map each window to 1 or 0, where 1 indicates that the window is centered on a split between two digits, and 0 indicates that it is not.

You would need some training data to train the classifier. Or you can try to use unsupervised learning.
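A minimal sketch of the windowing part, assuming the image is a list of rows of 0/1 pixels. The names `sliding_windows`, `is_split` and `find_splits` are made up for this example, and `is_split` is only a hand-written placeholder (it fires when the window contains no foreground pixel at all); a trained classifier would take its place.

```python
def sliding_windows(image, window_width):
    """Yield (x, window) for every horizontal position of the window."""
    width = len(image[0])
    for x in range(width - window_width + 1):
        yield x, [row[x:x + window_width] for row in image]


def is_split(window, fg=1):
    """Placeholder for the trained classifier (an assumption, not a model).

    Returns 1 if the window holds no foreground pixel, else 0.
    """
    return int(all(pixel != fg for row in window for pixel in row))


def find_splits(image, window_width=1, classify=is_split):
    """Return the left edge x of every window the classifier labels 1."""
    return [x for x, window in sliding_windows(image, window_width)
            if classify(window) == 1]


image = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
]
print(find_splits(image, window_width=2))
```

Passing a different `classify` function is where a learned model would plug in; the windowing code does not need to change.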

EDIT: This video may be useful: https://www.youtube.com/watch?v=y6ga5DeVgSY

OTHER TIPS

DISCLAIMER: I never wrote any OCR-like software before.

To me, your algorithm seems a bit off, because of the following reasons:

  • The digit 1 does not start where you find the first pixel at the bottom, because there is still the little stroke pointing to the left at the top of the 1.
  • The digit 2 would be measured as only a few pixels high, since you go straight up until you find a background pixel.
  • The digit 3 would come out as only 1 pixel by 1 pixel, for the same reason.
  • etc...

I would try to use a recursive algorithm that follows the foreground color pixels as far as it can without going into the background pixels. When using big images with big characters, this might cause a stack overflow, so it would be nice to do the trick in a couple of for loops instead of using a recursive function.
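Here is a rough sketch of that idea with an explicit stack instead of recursion, so large characters cannot overflow the call stack. It assumes the image is a list of rows of 0/1 pixels and treats diagonal neighbours as connected; the function name `connected_components` is just for this example.

```python
def connected_components(image, fg=1):
    """Return one inclusive box (min_x, min_y, max_x, max_y) per blob.

    A blob is a set of foreground pixels connected through any of their
    8 neighbours. Pixels are followed with an explicit stack, not recursion.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if image[y0][x0] != fg or seen[y0][x0]:
                continue
            stack = [(x0, y0)]
            seen[y0][x0] = True
            min_x = max_x = x0
            min_y = max_y = y0
            while stack:
                x, y = stack.pop()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == fg
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((nx, ny))
            boxes.append((min_x, min_y, max_x, max_y))
    return sorted(boxes)      # left-to-right order


image = [
    [0, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 1, 1],
]
print(connected_components(image))
```

In the sample grid the first blob includes the small stroke at the top left, which a scan of the bottom row alone would miss.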

If you are doing this pixel-by-pixel discovery of one character, you can use that process to create vector information on what your character looks like. I think that would be a cool starting point for recognizing the characters.

I've not tried to write OCR software, but we do use it, and it is (or can get) very complicated.

It's not totally clear where your image is coming from; if it's a scanned image, then there are several complications. The most relevant one for your plan is that even if there is a gap between digits, it may not be vertical (it's very unlikely that the scanned page will be perfectly straight). Other factors include "speckle": random dots caused by dirt etc. on the image or the scanner. If you're processing this kind of image, you almost certainly need to look towards image processing techniques that apply mathematical operations to the whole array of pixels to do things like deskew (straighten the image), despeckle (get rid of random dots), and edge enhancement (strengthen changes from light to dark to enhance lines).

From your use of "background" and "foreground" colours, it may be that you're trying to "OCR" an image from the screen? If so (some kind of "screen-scraping" process), and you know (or can be trained with) the specific character-shapes being interpreted, then a variant of the sliding window may help: you slide the known image of a '5' around the image at different offsets: if all the pixels of the '5' match "foreground" pixels in the image, then you know you've found a '5'. Repeat for other digits. As above, this is a "virtual" window we're talking about.
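A minimal sketch of that matching step, assuming both the image and the known character shape are lists of rows of 0/1 pixels. The names `matches_at` and `find_glyph` are made up for this example.

```python
def matches_at(image, glyph, ox, oy, fg=1):
    """True if every foreground pixel of the glyph, placed at offset
    (ox, oy), lands on a foreground pixel of the image."""
    return all(
        image[oy + gy][ox + gx] == fg
        for gy, row in enumerate(glyph)
        for gx, pixel in enumerate(row)
        if pixel == fg
    )


def find_glyph(image, glyph, fg=1):
    """Slide the glyph over the image; return every matching (x, y) offset."""
    img_h, img_w = len(image), len(image[0])
    g_h, g_w = len(glyph), len(glyph[0])
    return [
        (ox, oy)
        for oy in range(img_h - g_h + 1)
        for ox in range(img_w - g_w + 1)
        if matches_at(image, glyph, ox, oy, fg)
    ]


image = [
    [0, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 1, 1],
]
glyph = [
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
]
print(find_glyph(image, glyph))
```

Because only the glyph's foreground pixels are checked, a solid block of foreground would match every glyph; also requiring the glyph's background pixels to land on background is one way to tighten the test.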

Licensed under: CC-BY-SA with attribution