Algorithm for parsing characters from an image for OCR

Question 1

A common approach to segmenting digits is the sliding window. The basic idea is that you slide a window of a fixed size across the image of digits.

Each position of the sliding window produces a small image (you look only at the pixels covered by the window). The window should be narrow. A classifier can then be trained to map each window to 1 or 0, where 1 indicates that the window is centered on the split between two digits and 0 indicates that it is not.

You would need labeled training data to train the classifier. Alternatively, you can try unsupervised learning.
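
A rough sketch of the window-sliding part, in case it helps. It assumes the image is already binarized into rows of 0/1 pixels (1 = ink). The names `sliding_windows`, `find_splits` and `is_split_stub` are made up for illustration, and `is_split_stub` is only a stand-in for a trained classifier: it simply says "split" when the window contains no ink at all.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[int]]  # rows of pixels: 1 = foreground (ink), 0 = background


def sliding_windows(image: Image, width: int) -> Iterator[Tuple[int, Image]]:
    """Yield (left_column, window) for every horizontal position of the window."""
    n_cols = len(image[0])
    for left in range(n_cols - width + 1):
        # Each window spans the full image height and `width` columns.
        yield left, [row[left:left + width] for row in image]


def is_split_stub(window: Image) -> int:
    """Stand-in for a trained classifier (assumption): 1 if the window has no ink."""
    return 0 if any(any(row) for row in window) else 1


def find_splits(image: Image, width: int,
                classify: Callable[[Image], int]) -> List[int]:
    """Return the centre column of every window that is classified as a split."""
    return [left + width // 2
            for left, window in sliding_windows(image, width)
            if classify(window) == 1]


digits = [
    [1, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
]
splits = find_splits(digits, 2, is_split_stub)
print(splits)  # [3]
```

A real classifier would replace `is_split_stub` and could also handle digits that touch, which this stand-in cannot.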

EDIT: This video may be useful: https://www.youtube.com/watch?v=y6ga5DeVgSY

Question 2

DISCLAIMER: I have never written any OCR-like software before.

To me, your algorithm seems a bit off, for the following reasons:

The digit 1 does not start where you find the first pixel at the bottom, because there is still the little stroke at the top of the 1 that points to the left.
The digit 2 would come out only a few pixels high, since you go straight up until you find a background pixel.
The digit 3 would come out only 1 pixel by 1 pixel, for the same reason.
etc...

I would try a recursive algorithm that follows the foreground pixels as far as it can without going into the background pixels (essentially a flood fill). With big images and big characters the recursion might cause a stack overflow, so it would be better to do the same thing with a couple of loops instead of a recursive function.
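
Here is a minimal sketch of the loop-based version, using an explicit stack instead of recursion. It assumes a binarized image (rows of 0/1, 1 = foreground) and treats diagonal neighbours as connected; `trace_character` and `bounding_box` are names I made up for the example.

```python
from typing import List, Set, Tuple

Image = List[List[int]]  # rows of pixels: 1 = foreground, 0 = background
Pixel = Tuple[int, int]  # (row, column)


def trace_character(image: Image, start: Pixel) -> Set[Pixel]:
    """Collect every foreground pixel connected to `start` (8-connectivity)."""
    rows, cols = len(image), len(image[0])
    seen: Set[Pixel] = set()
    stack = [start]  # explicit stack, so deep shapes cannot overflow the call stack
    while stack:
        r, c = stack.pop()
        if (r, c) in seen:
            continue
        if not (0 <= r < rows and 0 <= c < cols):
            continue
        if image[r][c] == 0:
            continue  # background pixel: stop following in this direction
        seen.add((r, c))
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr or dc:
                    stack.append((r + dr, c + dc))
    return seen


def bounding_box(pixels: Set[Pixel]) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of a set of pixels."""
    rs = [r for r, _ in pixels]
    cs = [c for _, c in pixels]
    return min(rs), min(cs), max(rs), max(cs)


# A crude "1" with a stroke pointing left at the top, plus a separate bar.
image = [
    [0, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 1],
]
pixels = trace_character(image, (3, 2))  # start at the bottom of the "1"
print(len(pixels), bounding_box(pixels))  # 6 (0, 0, 3, 2)
```

Starting from the bottom pixel of the "1", the trace also picks up the stroke at the top left, while the separate bar in the last column is left alone.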

If you discover a character pixel by pixel like this, you can use the same process to build vector information about what the character looks like. I think that would be a good starting point for recognizing the characters.

Question 3

I've not tried to write OCR software, but we do use it, and it is (or can get) very complicated.

It's not totally clear where your image is coming from. If it's a scanned image, then there are several complications. Not least for your plan, even if there is a gap between digits it may not be vertical (it's very unlikely that the scanned page will be perfectly straight). Other factors include "speckle": random dots caused by dirt etc. on the page or the scanner. If you're processing this kind of image, you almost certainly need to look at image processing techniques that apply mathematical operations to the whole array of pixels to do things like deskew (straighten the image), despeckle (get rid of random dots), and edge enhancement (strengthen changes from light to dark to sharpen lines).
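
To give a feel for what "despeckle" means, here is a deliberately simplified sketch: it drops any foreground pixel that has too few foreground neighbours. This is my own simplification for illustration, not what any particular OCR package does (real tools more often use something like a median filter), and it assumes a binarized image of 0/1 rows. The name `despeckle` and the `min_neighbours` parameter are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of pixels: 1 = foreground, 0 = background


def despeckle(image: Image, min_neighbours: int = 1) -> Image:
    """Return a copy with foreground pixels removed if they have fewer than
    `min_neighbours` foreground pixels among their 8 neighbours."""
    rows, cols = len(image), len(image[0])

    def neighbour_count(r: int, c: int) -> int:
        total = 0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                    total += image[r + dr][c + dc]
        return total

    return [
        [1 if image[r][c] and neighbour_count(r, c) >= min_neighbours else 0
         for c in range(cols)]
        for r in range(rows)
    ]


noisy = [
    [1, 0, 0, 0],  # the lone pixel in the corner is a speck
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
clean = despeckle(noisy)
print(clean[0])  # [0, 0, 0, 0]
```

The two stacked pixels survive because each has the other as a neighbour; only the isolated corner pixel is removed.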

From your use of "background" and "foreground" colours, it may be that you're trying to "OCR" an image from the screen. If so (some kind of "screen-scraping" process), and you know (or can be trained with) the specific character shapes being interpreted, then a variant of the sliding window may help: you slide the known image of a '5' around the image at different offsets, and if all the pixels of the '5' match "foreground" pixels in the image at some offset, then you know you've found a '5'. Repeat for the other digits. As above, this is a "virtual" window we're talking about.
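
A small sketch of that matching step, assuming both the screen image and the known glyph are binarized into rows of 0/1 (1 = foreground). `find_template` is a made-up name, and the tiny 2x2 "glyph" below is just a placeholder shape rather than a real '5'.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixels: 1 = foreground, 0 = background


def find_template(image: Image, template: Image) -> List[Tuple[int, int]]:
    """Return every (top, left) offset at which all foreground pixels of
    `template` land on foreground pixels of `image`."""
    rows, cols = len(image), len(image[0])
    t_rows, t_cols = len(template), len(template[0])
    hits = []
    for top in range(rows - t_rows + 1):
        for left in range(cols - t_cols + 1):
            # Only the template's foreground pixels are checked, as described.
            if all(image[top + r][left + c]
                   for r in range(t_rows)
                   for c in range(t_cols)
                   if template[r][c]):
                hits.append((top, left))
    return hits


glyph = [
    [1, 1],
    [1, 0],
]
screen = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(find_template(screen, glyph))  # [(1, 1)]
```

One caveat: because only the glyph's foreground pixels are compared, a solid block of foreground would match every glyph. Requiring the glyph's background pixels to match background as well would make the test stricter.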