Question

Let's say I have a small bitmap that contains a single handwritten digit (0..9).

Is it possible to detect the digit using a (two-layered) perceptron?

Are there other possibilities to detect single digits from bitmaps besides using neural nets?

Solution

Feeding the raw pixels of a bitmap directly into a neural network requires a lot of training, and the network will not cope well with digits that are scaled or rotated.

To help the neural network classify well, preprocess the image first and extract features from it:

  • Normalize the image:
    • Adjust the contrast and brightness so that the histogram of the image matches a reference image.
    • Blur the image, to remove noise.
    • Convert it to black & white, using some threshold.
    • Find the bounding box of the shape and scale it to a predefined size.
  • Calculate various features of the image that can be used to differentiate one digit from another:
    • The Euler number of the image (the number of connected shapes minus the number of holes). For a single connected digit it tells you how many "holes" there are in the shape (e.g. two holes for the digit 8, which gives an Euler number of -1).
    • The number of white pixels (the area of the digit).
    • The principal components of the set of coordinates of the white pixels, which tell you how "elongated" the shape is.
    • ... other features that you can think of that tend to have similar values for similar digits.
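
The thresholding, bounding-box and feature steps above can be sketched with NumPy alone. This is a minimal illustration rather than a reference implementation: the function names, the threshold of 128, the 16x16 target size and the assumption of light ink on a dark background are all choices made for this example. Histogram matching and blurring are left out. The Euler number is computed with the classic 2x2 "bit quad" counting rule for 8-connected shapes.

```python
import numpy as np

def normalize_digit(gray, threshold=128, size=16):
    """Threshold, crop to the bounding box, and resize to size x size."""
    # Assumption for this sketch: the ink is lighter than the background.
    binary = np.asarray(gray) > threshold
    rows = np.flatnonzero(binary.any(axis=1))
    cols = np.flatnonzero(binary.any(axis=0))
    if rows.size == 0:
        raise ValueError("no foreground pixels above the threshold")
    crop = binary[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    # Nearest-neighbour resampling; this stretches the aspect ratio.
    r_idx = np.arange(size) * crop.shape[0] // size
    c_idx = np.arange(size) * crop.shape[1] // size
    return crop[np.ix_(r_idx, c_idx)]

def euler_number(binary):
    """Connected shapes minus holes, for 8-connected foreground."""
    b = np.pad(np.asarray(binary).astype(np.uint8), 1)  # zero border
    # The four corners of every 2x2 window in the padded image.
    tl, tr, bl, br = b[:-1, :-1], b[:-1, 1:], b[1:, :-1], b[1:, 1:]
    total = tl + tr + bl + br
    q1 = np.count_nonzero(total == 1)                # one pixel set
    q3 = np.count_nonzero(total == 3)                # three pixels set
    qd = np.count_nonzero((total == 2) & (tl == br)) # two pixels on a diagonal
    return (q1 - q3 - 2 * qd) // 4

def shape_features(binary):
    """A few scalar features of a binary digit image."""
    area = int(np.count_nonzero(binary))
    # Holes = 1 - Euler number only holds for a single connected shape.
    holes = 1 - euler_number(binary)
    return {"area": area, "holes": holes}
```

Nearest-neighbour scaling is used only to keep the sketch dependency-free; an image library's resize with interpolation would normally give smoother results.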

The principal components can also be used to normalize rotation of the shape, so that the longest axis is vertical.
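
As a rough sketch of that idea, the principal axes can be obtained from the eigenvectors of the covariance matrix of the white-pixel coordinates. The function names below are made up for this example, and the code returns rotated coordinates rather than a re-rasterized image, which would need an extra resampling step.

```python
import numpy as np

def principal_axis(binary):
    """Return centered (x, y) pixel coordinates, elongation, and tilt angle."""
    ys, xs = np.nonzero(binary)
    coords = np.column_stack([xs, ys]).astype(float)
    centered = coords - coords.mean(axis=0)
    # eigh returns eigenvalues in ascending order, so column 1 is the major axis.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    major = eigvecs[:, 1]
    if major[1] < 0:          # eigenvector sign is arbitrary; pick one
        major = -major
    # Ratio of spreads along the major and minor axes.
    elongation = float(np.sqrt(eigvals[1] / max(eigvals[0], 1e-12)))
    # Angle between the major axis and the vertical (y) axis.
    angle = float(np.arctan2(major[0], major[1]))
    return centered, elongation, angle

def rotate_upright(centered, angle):
    """Rotate the coordinates so that the major axis lies along y."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    return centered @ rot.T
```

Note that this only fixes the orientation up to 180 degrees: an upside-down digit has the same principal axis as an upright one.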

The features are what you feed into the neural network for classification, not the pixels.
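
To make this concrete, here is a minimal sketch of a two-layer perceptron (one hidden layer plus a softmax output) that takes a feature vector per image instead of pixels. Everything in it is an assumption for illustration: the class name, the hidden-layer size, the learning rate, and the synthetic feature data, which merely stands in for real features such as hole count, area and elongation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class TwoLayerPerceptron:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)          # hidden activations
        return H, softmax(H @ self.W2 + self.b2)    # class probabilities

    def loss(self, X, y):
        _, P = self.forward(X)
        return float(-np.mean(np.log(P[np.arange(len(y)), y] + 1e-12)))

    def train_step(self, X, y, lr=0.1):
        """One full-batch gradient descent step on the cross-entropy loss."""
        H, P = self.forward(X)
        G = P.copy()
        G[np.arange(len(y)), y] -= 1.0              # dLoss/dlogits
        G /= len(y)
        dH = (G @ self.W2.T) * (1.0 - H ** 2)       # back through tanh
        self.W2 -= lr * (H.T @ G)
        self.b2 -= lr * G.sum(axis=0)
        self.W1 -= lr * (X.T @ dH)
        self.b1 -= lr * dH.sum(axis=0)

    def predict(self, X):
        return self.forward(X)[1].argmax(axis=1)

# Synthetic stand-in data: 3 features per sample, 10 digit classes.
rng = np.random.default_rng(1)
centers = rng.normal(0.0, 3.0, (10, 3))
y = np.repeat(np.arange(10), 20)
X = centers[y] + 0.3 * rng.normal(size=(200, 3))

net = TwoLayerPerceptron(n_in=3, n_hidden=16, n_out=10)
for _ in range(300):
    net.train_step(X, y)
print("training loss:", net.loss(X, y))
```

With real data the features should be scaled to comparable ranges before training, since the area in pixels is numerically much larger than a hole count.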

OTHER TIPS

Here is a link to a huge database of handwritten digits: MNIST digits database and performance. The front page also lists the relative performance of many different methods, including two-layer neural networks. This ought to give you a good start.

You might also want to check out Geoff Hinton's work on Restricted Boltzmann Machines, which he says perform fairly well. There is a good, very watchable explanatory lecture on his site.

Here is a Matlab example program that uses a trained neural network to detect single digits (image size fixed at 28×28 pixels).

Licensed under: CC-BY-SA with attribution