I've not tried to write OCR software myself, but we do use it, and it can get very complicated.
It's not totally clear where your image is coming from. If it's a scanned image, there are several complications. The one most relevant to your plan is that even if there is a gap between digits, it may not be vertical (it's very unlikely that the scanned page will be perfectly straight). Other factors include "speckle": random dots caused by dirt etc. on the page or the scanner. If you're processing this kind of image, you almost certainly need to look at image-processing techniques, which apply mathematical operations to the whole array of pixels to do things like deskew (straighten the image), despeckle (get rid of random dots), and edge enhancement (strengthen changes from light to dark to enhance lines).
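To give a feel for what "an operation over the whole array of pixels" means, here is a minimal despeckle sketch in Python. It is only an illustration, not how any particular OCR package does it: the image is assumed to be a list of equal-length strings with '#' for foreground and '.' for background, and the name `despeckle` and the "no foreground neighbour" rule are my own simplifications. Real scans would need something more robust (and deskewing is considerably more involved).

```python
def despeckle(image, fg="#", bg="."):
    """Clear foreground pixels that have no foreground pixel among their 8 neighbours.

    `image` is assumed to be a list of equal-length strings (a toy format
    chosen for this sketch, not a real image type).
    """
    height, width = len(image), len(image[0])

    def has_fg_neighbour(r, c):
        # Look at the up-to-8 surrounding pixels, skipping the pixel itself
        # and anything that falls outside the image.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width and image[rr][cc] == fg:
                    return True
        return False

    # Build a new image; the original is only read, never modified.
    return [
        "".join(
            bg if image[r][c] == fg and not has_fg_neighbour(r, c) else image[r][c]
            for c in range(width)
        )
        for r in range(height)
    ]


NOISY = [
    "#....",
    "..##.",
    "..#..",
    "....#",
]
CLEANED = despeckle(NOISY)
```

Here the two isolated dots in the corners are cleared, while the three connected pixels in the middle survive.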
From your use of "background" and "foreground" colours, it may be that you're trying to "OCR" an image from the screen. If so (some kind of "screen-scraping" process), and you know (or can train on) the specific character shapes being interpreted, then a variant of the sliding window may help: slide the known image of a '5' across the image at different offsets; if, at some offset, all the pixels of the '5' match "foreground" pixels in the image, you've found a '5'. Repeat for the other digits. As above, this is a "virtual" window we're talking about.
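A rough sketch of that sliding-window match, in Python, might look like the following. The representation (lists of strings, '#' for foreground), the function name `find_glyph`, and the 3x5 shape used for the '5' are all assumptions made up for the example; you would substitute the real pixel data and the actual glyph shapes from your screen font.

```python
def find_glyph(image, glyph, fg="#"):
    """Return every (row, col) offset at which all foreground pixels of
    `glyph` land on foreground pixels of `image`.

    Both arguments are assumed to be lists of equal-length strings
    (a toy format for this sketch).
    """
    img_h, img_w = len(image), len(image[0])
    gl_h, gl_w = len(glyph), len(glyph[0])
    hits = []
    # The "window" is virtual: we only vary the offset, nothing is copied.
    for top in range(img_h - gl_h + 1):
        for left in range(img_w - gl_w + 1):
            if all(
                image[top + r][left + c] == fg
                for r in range(gl_h)
                for c in range(gl_w)
                if glyph[r][c] == fg
            ):
                hits.append((top, left))
    return hits


# Hypothetical 3x5 shape for the digit '5'.
GLYPH_5 = [
    "###",
    "#..",
    "###",
    "..#",
    "###",
]

IMAGE = [
    ".......",
    "..###..",
    "..#....",
    "..###..",
    "....#..",
    "..###..",
    ".......",
]

print(find_glyph(IMAGE, GLYPH_5))  # [(1, 2)]
```

One caveat with matching only the foreground pixels: a shape that contains the '5' (an '8' in the same font, say) would also match. Checking that the glyph's background pixels are background in the image as well is one way to avoid that.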