Quick way to calculate uniformity or discrepancy of number set
27-09-2019
Question
Hello, assume I have a set of numbers and I want a quick way to calculate some measure of uniformity. I know the variance is the most obvious answer, but I am afraid the complexity of the naive algorithm is too high. Does anyone have any suggestions?
Solution
"Intuitive" algorithms for calculating variance usually suffer from one or both of the following problems:
- Use two loops (one for calculating the mean, the other for the variance)
- Are not numerically stable
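To make the stability point concrete, here is a minimal Python sketch of the usual one-loop shortcut, which accumulates the sum and the sum of squares and then computes (sum of squares - sum*sum/n)/(n - 1). The function name `naive_one_pass_variance` and the test data (four values near 10^9) are illustrative choices, not part of the original answer; Python's `statistics.variance` is used as an accurate reference.

```python
import statistics

def naive_one_pass_variance(xs):
    """Sample variance via the textbook shortcut: one loop, but prone to cancellation."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in xs:
        n += 1
        total += x
        total_sq += x * x
    # Subtracting two huge, nearly equal numbers discards most significant digits
    return (total_sq - total * total / n) / (n - 1)

# Large offset, small spread: the exact sample variance of these values is 30
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

naive = naive_one_pass_variance(data)
reference = statistics.variance(data)  # accurate reference value
print(naive, reference)
```

Here both sums of squares are around 4e18, where neighbouring double-precision numbers are 512 apart, so the shortcut cannot resolve the true difference of 90 and returns a badly wrong result (it can even come out negative), while the reference gives 30.0.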
A good algorithm that uses only one loop and is numerically stable is given by D. Knuth (as always), who credits it to B. P. Welford:
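import math

# Running state, updated once per data point
n = 0
mean = 0.0
M2 = 0.0  # sum of squared deviations from the current mean

def calculate_online_variance(x):
    global n, mean, M2
    n = n + 1
    delta = x - mean
    mean = mean + delta / n
    M2 = M2 + delta * (x - mean)  # This expression uses the new value of mean
    variance_n = M2 / n  # population variance (divide by n), if you prefer it
    # Sample variance (divide by n - 1); on the first pass n = 1, so return Inf
    variance = M2 / (n - 1) if n > 1 else math.inf
    return variance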
You should invoke calculate_online_variance(x) once for each point; it returns the sample variance of all points seen so far.
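Module-level globals make it awkward to track more than one stream of numbers at a time. Below is a minimal sketch that keeps the same Welford update rule but stores the state in an object; the class name `RunningVariance` and its method and property names are hypothetical, chosen only for illustration.

```python
import math

class RunningVariance:
    """Welford-style online mean and variance (one pass, numerically stable)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the old delta together with the updated mean
        self.m2 += delta * (x - self.mean)

    @property
    def population_variance(self):
        # Divide by n; returns NaN before any data has been added
        return self.m2 / self.n if self.n > 0 else math.nan

    @property
    def sample_variance(self):
        # Divide by n - 1; infinite for fewer than two points, as in the comment above
        return self.m2 / (self.n - 1) if self.n > 1 else math.inf


rv = RunningVariance()
for x in [2, 4, 4, 4, 5, 5, 7, 9]:
    rv.add(x)
print(rv.mean, rv.population_variance, rv.sample_variance)
```

For this data the printed values should be (up to floating-point rounding) 5.0, 4.0 and 32/7, roughly 4.571. Each `add` call is O(1), so the whole stream costs O(n) time and O(1) memory.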
OTHER TIPS
I don't see why calculating the variance should be a problem at all: it needs only two linear passes over the data, so it takes O(n) time. The variance is just the sum of the squared distances from the mean, divided by the number of elements, so basic pseudocode would be:
- Calculate mu, the mean of the set
- Let s = 0
- For each element x in the list, let s = s + (x - mu)*(x - mu)
- Calculate s / n
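The four steps above translate directly into Python. This is a sketch; the function name `two_pass_variance` is an illustrative choice, and like the pseudocode it divides by n.

```python
def two_pass_variance(xs):
    """Population variance: one pass for the mean, a second for squared deviations."""
    n = len(xs)
    if n == 0:
        raise ValueError("variance of an empty set is undefined")
    mu = sum(xs) / n              # step 1: the mean
    s = 0.0                       # step 2
    for x in xs:                  # step 3: accumulate squared distances from mu
        s += (x - mu) * (x - mu)
    return s / n                  # step 4

print(two_pass_variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 4.0
```

Both loops are linear, so the total cost is O(n); the only price compared with a one-pass method is reading the data twice.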
Note that sometimes it's better to divide s by n - 1 instead, specifically when the set is a sample from a larger population and you want an unbiased estimate of that population's variance. See the Wikipedia article on Bessel's correction for why.
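Python's standard `statistics` module offers both conventions: `pvariance` divides by n and `variance` divides by n - 1. A small comparison on an illustrative data set:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5; sum of squared deviations is 32

pop = statistics.pvariance(data)   # 32 / 8: treats data as the whole population
sample = statistics.variance(data) # 32 / 7: Bessel-corrected estimate from a sample
print(pop, sample)
```

For large n the two results are nearly identical; the choice matters mainly for small sets.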
Of course, a lower variance indicates higher uniformity.
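Note that it might not be a bad idea to further divide your variance by mu^2 to get a relative, scale-independent measure of uniformity (this is the squared coefficient of variation). That way ".5 1 .5 1 .5 1" is considered less tight than "100 101 100 101 100 101", because the relative differences are much bigger in the former than in the latter, even though its raw variance is smaller. This only makes sense when mu is nonzero.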
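A minimal sketch of this relative measure, applied to the two example sets; the helper name `relative_variance` is hypothetical, and it assumes the mean is nonzero.

```python
import statistics

def relative_variance(xs):
    """Population variance divided by the squared mean (squared coefficient of variation)."""
    mu = statistics.fmean(xs)
    if mu == 0:
        raise ValueError("relative variance is undefined when the mean is zero")
    return statistics.pvariance(xs) / (mu * mu)

small = [0.5, 1.0, 0.5, 1.0, 0.5, 1.0]
large = [100, 101, 100, 101, 100, 101]

print(statistics.pvariance(small), statistics.pvariance(large))  # 0.0625 0.25
print(relative_variance(small), relative_variance(large))
```

The raw variance ranks the first set as more uniform (0.0625 versus 0.25), while the relative measure ranks the second set as far tighter (about 0.111 versus roughly 2.5e-5). The measure is most meaningful for positive quantities whose mean is well away from zero.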