Question

I've been playing with some SVM implementations and I am wondering - what is the best way to normalize feature values to fit into one range? (from 0 to 1)

Let's suppose I have 3 features with values in ranges of:

  1. 3 - 5.

  2. 0.02 - 0.05

  3. 10-15.

How do I convert all of those values into range of [0,1]?

What If, during training, the highest value of feature number 1 that I will encounter is 5 and after I begin to use my model on much bigger datasets, I will stumble upon values as high as 7? Then in the converted range, it would exceed 1...

How do I normalize values during training to account for the possibility of "values in the wild" exceeding the highest(or lowest) values the model "seen" during training? How will the model react to that and how I make it work properly when that happens?

Was it helpful?

Solution 2

You normalise a vector by converting it to a unit vector. This trains the SVM on the relative values of the features, not the magnitudes. The normalisation algorithm will work on vectors with any values.

To convert to a unit vector, divide each value by the length of the vector. For example, a vector of [4 0.02 12] has a length of 12.6491. The normalised vector is then [4/12.6491 0.02/12.6491 12/12.6491] = [0.316 0.0016 0.949].

If "in the wild" we encounter a vector of [400 2 1200] it will normalise to the same unit vector as above. The magnitudes of the features is "cancelled out" by the normalisation and we are left with relative values between 0 and 1.

OTHER TIPS

Besides scaling to unit length method provided by Tim, standardization is most often used in machine learning field. Please note that when your test data comes, it makes more sense to use the mean value and standard deviation from your training samples to do this scaling. If you have a very large amount of training data, it is safe to assume they obey the normal distribution, so the possibility that new test data is out-of-range won't be that high. Refer to this post for more details.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top