Question

Hello. Assume I have a set of numbers and I want a quick way to calculate some measure of uniformity. I know the variance is the most obvious answer, but I am afraid the complexity of the naive algorithm is too high. Does anyone have any suggestions?


Solution

"Intuitive" algorithms for calculating variance usually suffer from one or both of the following:

  1. Use two loops (one for calculating the mean, the other for the variance)
  2. Are not numerically stable

A good algorithm that needs only one loop and is numerically stable is the one presented by D. Knuth (as always), who credits it to B. P. Welford.

From Wikipedia:

n = 0
mean = 0.0
M2 = 0.0

def calculate_online_variance(x):
    global n, mean, M2  # the running state lives at module level
    n = n + 1
    delta = x - mean
    mean = mean + delta / n
    M2 = M2 + delta * (x - mean)  # This expression uses the new value of mean

    if n < 2:
        # The sample variance is undefined for a single point (division by zero)
        return float("nan")
    return M2 / (n - 1)  # use M2 / n instead for the population variance

You should invoke calculate_online_variance(x) for each point, and it returns the variance calculated so far.
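As a rough sketch of how this might be used, the same update can be wrapped in a small class so the running state does not live in globals. The class name `RunningVariance`, its `push` and `variance` methods, and the sample data are illustrative assumptions, not part of the Wikipedia listing.

```python
import math


class RunningVariance:
    """Single-pass accumulator for the mean and variance of a stream."""

    def __init__(self):
        self.n = 0        # number of points seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def push(self, x):
        """Fold one new data point into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # uses the updated mean

    def variance(self, sample=True):
        """Return the sample (n - 1) or population (n) variance so far."""
        denom = self.n - 1 if sample else self.n
        if denom <= 0:
            return math.nan  # not enough points yet
        return self.m2 / denom


acc = RunningVariance()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    acc.push(value)

print(acc.mean)                    # running mean
print(acc.variance(sample=False))  # population variance
print(acc.variance())              # sample variance
```

For this data the mean is 5, the population variance is 4, and the sample variance is 32/7, up to floating-point rounding.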

OTHER TIPS

I don't see why calculating the variance should be a problem at all. The variance is just the sum of the squared distances from the mean, divided by the number of elements, so it takes two linear passes over the data. Basic pseudocode to do this would be

  1. Calculate mu, the mean of the set
  2. Let s = 0
  3. For each element x in the list, let s = s + (x - mu)*(x-mu)
  4. Calculate s / n

Note that sometimes it's better to divide s by n-1 (specifically, when you're worried about biased estimators). See the Wikipedia article on Bessel's correction for why.

Of course, a lower variance indicates higher uniformity.

Note that it might not be a bad idea to further divide your variance by mu^2 to get a relative, scale-independent measure of uniformity (that is, so that ".5 1 .5 1 .5 1" is considered less tight than "100 101 100 101 100 101", as the relative differences are much bigger in the former than in the latter). This only makes sense when mu is not zero.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow