Question

I developed an image-processing program that identifies which digit appears in an image of a number. Each image is 27x27 pixels = 729 pixels. I take each R, G and B value, which gives me 2187 variables per image (+1 for the intercept = 2188 in total).

I used the below gradient descent formula:

Repeat {
    θj := θj − (α/m) ∑ (hθ(x) − y) xj
}

Where θj is the coefficient of variable j; α is the learning rate; hθ(x) is the hypothesis; y is the true value; and xj is the value of variable j. m is the number of training examples. hθ(x), y and xj are taken per training example (that's what the summation sign is over). The hypothesis is defined as:

hθ(x) = 1/(1 + e^(−z))
z = θ0 + θ1x1 + θ2x2 + θ3x3 + … + θnxn
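For concreteness, the update rule and hypothesis above can be sketched in vectorized NumPy form (the function names here are illustrative, not from the original program; `X` is assumed to carry a leading column of ones for the intercept):

```python
import numpy as np

def sigmoid(z):
    # hθ(x) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    # X: (m, n) design matrix, first column all ones for the intercept θ0
    # y: (m,) vector of 0/1 labels
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)       # hypothesis for all m examples at once
        grad = X.T @ (h - y) / m     # (1/m) Σ (hθ(x) − y) xj, vectorized
        theta -= alpha * grad        # simultaneous update of every θj
    return theta
```

Computing the whole gradient as one matrix product, instead of looping over the 2188 variables and m examples, is exactly the "vectorize" advice given in the answer below and is typically orders of magnitude faster.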

With this and 3000 training images, I was able to train my program in just over an hour, and when tested on a cross-validation set it identified the correct digit ~67% of the time.

I wanted to improve on that, so I decided to try a polynomial of degree 2.

However, the number of variables jumps from 2188 to 2,394,766 per image! A single step of gradient descent now takes an hour.
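The 2,394,766 figure is just the count of all monomials of degree at most 2 in 2187 variables, which a few lines of Python confirm:

```python
from math import comb

n = 2187  # raw R, G, B features per 27x27 image

# Monomials of degree <= 2: one intercept, n linear terms, and every
# product of two variables including squares, i.e. C(n + 1, 2) of them.
total = 1 + n + comb(n + 1, 2)
print(total)  # 2394766
```

So the quadratic blow-up is inherent to the degree-2 expansion, not an implementation detail.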

So my question is: how are such vast numbers of variables handled in machine learning? On the one hand, I don't have enough space to store that many variables per training example. On the other hand, even though I currently store only 2188 variables per training example, computing the degree-2 features (the value of each variable multiplied by every other variable) takes O(n^2) work.

So any suggestions / advice is greatly appreciated.


Solution

  • Try some dimensionality reduction first (PCA, kernel PCA, or LDA if you are classifying the images).

  • Vectorize your gradient descent; with most math libraries, or in MATLAB etc., it will run much faster.

  • Parallelize the algorithm and run it on multiple CPUs (though your library for multiplying vectors may already support parallel computation).
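The first suggestion can be sketched with a minimal SVD-based PCA in NumPy (the helper name `pca_fit_transform` is hypothetical; in practice a library implementation such as scikit-learn's `PCA` would be used):

```python
import numpy as np

def pca_fit_transform(X, k):
    # Center the data, then project onto the top-k principal directions
    # given by the right singular vectors of the centered matrix.
    mu = X.mean(axis=0)
    Xc = X - mu
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # (m, k) reduced representation
```

Reducing the 2187 raw pixel features to a few hundred principal components before building the degree-2 terms keeps the expanded feature count manageable.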

OTHER TIPS

Along with Jirka-x1's answer, I would first say that this is one of the key differences between working with image data and, say, text data for ML: high dimensionality.

Second, this is a duplicate; see How to approach machine learning problems with high dimensional input space?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow