Grouping numbers for a histogram
Question
I have a bunch of numbers I want to use to generate a histogram for a standard score.
Therefore I compute the mean and the standard deviation of the numbers and normalize each x with this formula:
x' = (x-mean)/std_dev
The result is usually a number between -4 and 4. I want to chart that result. I am looking for a way to group the numbers in order to avoid bars that are too small.
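For reference, the normalization described above could be sketched as follows. The method name standard_scores and the sample data are made up for illustration, and the sketch assumes the population standard deviation (dividing by n), which the question does not specify.

```ruby
# Standard score (z-score) of each value: (x - mean) / std_dev.
# Assumption: population standard deviation (divide by n, not n - 1).
# If all values are equal, std_dev is 0 and the results are NaN.
def standard_scores(values)
  n = values.size.to_f
  mean = values.sum / n
  variance = values.sum { |x| (x - mean)**2 } / n
  std_dev = Math.sqrt(variance)
  values.map { |x| (x - mean) / std_dev }
end

# Hypothetical sample data with mean 5 and standard deviation 2.
scores = standard_scores([2, 4, 4, 4, 5, 5, 7, 9])
```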
My plan is to have bins in the interval [-4,4] centered at consecutive quarter units, i.e. [-4,-3.75,...,3.75,4]
Example: 0.1 => bin "0.0", 0.3 => bin "0.25", -1.3 => bin "-1.25"
What is the best way to achieve that?
Solution
Here's a solution that doesn't use any third-party libraries. The numbers should be in the Array vals.
MULTIPLIER = 0.25
multipliers = []
0.step(1, MULTIPLIER) { |n| multipliers << n }  # [0.0, 0.25, 0.5, 0.75, 1.0]
histogram = Hash.new(0)
# find the appropriate "bin" and create the histogram
vals.each do |val|
  # pair each candidate offset with its distance to val's fractional part
  cmp = multipliers.map { |group| [group, (group - val % 1).abs] }
  # pick the offset with the smallest distance
  bin = cmp.min { |a, b| a.last <=> b.last }.first
  # use floor, not truncate: val % 1 is measured up from the floor, also for negative val
  histogram[val.floor + bin] += 1
end
I think it performs the proper rounding, but I only tried it with:
vals = Array.new(10000) { (rand * 10) % 4 * (rand(2) == 0 ? 1 : -1) }
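and the distribution came out somewhat skewed. That is at least partly due to the test data rather than the random number generator: (rand * 10) % 4 folds [0, 10) onto [0, 4) unevenly, so values below 2 occur about 1.5 times as often as values between 2 and 4.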
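As a cross-check, nearest-quarter binning can also be sketched with plain rounding instead of a search over residuals. This is an alternative technique, not the code above; the helper name nearest_bin and the sample values are hypothetical. Note that Ruby's Float#round rounds exact ties away from zero by default.

```ruby
# Round val to the nearest multiple of width (ties round away from zero).
def nearest_bin(val, width = 0.25)
  (val / width).round * width
end

# Count how many values fall into each bin.
histogram = Hash.new(0)
[0.1, 0.3, -1.3, -0.2, 3.9].each { |v| histogram[nearest_bin(v)] += 1 }
```

With a width of 0.25 the bin centers are exactly representable as floats, so they are safe to use as Hash keys; other widths, such as 0.1, may need the keys rounded to a fixed number of decimals.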
OTHER TIPS
Rails provides Enumerable#group_by (it has also been part of core Ruby since 1.8.7). If you're not using Rails, see the source here: http://api.rubyonrails.org/classes/Enumerable.html
Assuming your list is called xs, you could do something like the following (untested):
bars = xs.group_by do |x|
  # determine bin here
end
Then you'll have a hash that looks like:
bars = { 0 => [elements, in, first, bin], 1 => [elements, in, second, bin], ... }
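One way the block could be filled in for the quarter-unit bins from the question is sketched below. The bin rule is an assumption rather than part of this answer: it rounds to the nearest quarter and clamps outliers into the edge bins at -4 and 4. The sample values in xs are made up.

```ruby
xs = [0.1, 0.3, -1.3, 0.2, 5.3]

# Group each value under its nearest quarter-unit bin.
# Assumption: values beyond [-4, 4] are collected in the outermost bins.
bars = xs.group_by { |x| ((x * 4).round / 4.0).clamp(-4.0, 4.0) }

# Reduce each group to its size to get the bar heights, ordered by bin.
counts = bars.map { |bin, members| [bin, members.size] }.sort.to_h
```

Here bars keeps the original values per bin, while counts holds only the bar heights needed for the chart.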