Normalizing dataset with ruby
-
19-09-2019 - |
Question
I have a data set that ranges from 1 to 30,000
I want to normalize it, so that it becomes 0.1 to 10
What is the best method/function to do that?
Would greatly appreciate it if you could give some sample code!
Solution
Here's a code snippet, assuming you want a linear normalization. It's a very simplistic version (just straight code, no methods), so you can see "how it works" and can apply it to anything.
xmin = 1.0
xmax = 30000.0
ymin = 0.1
ymax = 10.0
xrange = xmax-xmin
yrange = ymax-ymin
y = ymin + (x-xmin) * (yrange / xrange)
And here it is done as a function:
def normalise(x, xmin, xmax, ymin, ymax)
xrange = xmax - xmin
yrange = ymax - ymin
ymin + (x - xmin) * (yrange.to_f / xrange)
end
puts normalise(2000, 1, 30000, 0.1, 10)
(Note: the to_f
ensures we don't fall into the black hole of integer division)
OTHER TIPS
Here's the Ruby Way for the common case of setting an array's min to 0.0 and max to 1.0.
class Array
def normalize!
xMin,xMax = self.minmax
dx = (xMax-xMin).to_f
self.map! {|x| (x-xMin) / dx }
end
end
a = [3.0, 6.0, 3.1416]
a.normalize!
=> [0.0, 1.0, 0.047199999999999985]
For a min and max other than 0 and 1, add arguments to normalize!
in the manner of Elfstrom's answer.
This is a well known way to scale a collection numbers. It has more precise name but I can't remember and fail to google it.
def scale(numbers, min, max)
current_min = numbers.min
current_max = numbers.max
numbers.map {|n| min + (n - current_min) * (max - min) / (current_max - current_min)}
end
dataset = [1,30000,15000,200,3000]
result = scale(dataset, 0.1, 10.0)
=> [0.1, 10.0, 5.04983499449982, 0.165672189072969, 1.08970299009967]
scale(result, 1, 30000)
=> [1.0, 30000.000000000004, 15000.0, 199.99999999999997, 3000.0000000000005]
As you can see, you have to be aware of rounding issues. You should probably also make sure that you don't get integers as min & max because integer division will damage the result.
x = x / 3030.3031 + 0.1