Running average in Python

https://stackoverflow.com/questions/1790550

22-09-2019
|

Question

Is there a pythonic way to build up a list that contains a running average of some function?

After reading a fun little piece about Martians, black boxes, and the Cauchy distribution, I thought it would be fun to calculate a running average of the Cauchy distribution myself:

import math 
import random

def cauchy(location, scale):
    p = 0.0
    while p == 0.0:
        p = random.random()
    return location + scale*math.tan(math.pi*(p - 0.5))

# is this next block of code a good way to populate running_avg?
sum = 0
count = 0
max = 10
running_avg = []
while count < max:
    num = cauchy(3,1)
    sum += num
    count += 1
    running_avg.append(sum/count)

print running_avg     # or do something else with it, besides printing

I think that this approach works, but I'm curious if there might be a more elegant approach to building up that running_avg list than using loops and counters (e.g. list comprehensions).

There are some related questions, but they address more complicated problems (small window size, exponential weighting) or aren't specific to Python:

Solution

You could write a generator:

def running_average():
  sum = 0
  count = 0
  while True:
    sum += cauchy(3,1)
    count += 1
    yield sum/count

Or, given a generator for Cauchy numbers and a utility function for a running sum generator, you can have a neat generator expression:

# Cauchy numbers generator
def cauchy_numbers():
  while True:
    yield cauchy(3,1)

# running sum utility function
def running_sum(iterable):
  sum = 0
  for x in iterable:
    sum += x
    yield sum

# Running averages generator expression (** the neat part **)
running_avgs = (sum/(i+1) for (i,sum) in enumerate(running_sum(cauchy_numbers())))

# goes on forever
for avg in running_avgs:
  print avg

# alternatively, take just the first 10
import itertools
for avg in itertools.islice(running_avgs, 10):
  print avg

OTHER TIPS

You could use coroutines. They are similar to generators, but allows you to send in values. Coroutines was added in Python 2.5, so this won't work in versions before that.

def running_average():
    sum = 0.0
    count = 0
    value = yield(float('nan'))
    while True:
        sum += value
        count += 1
        value = yield(sum/count)

ravg = running_average()
next(ravg)   # advance the corutine to the first yield

for i in xrange(10):
    avg = ravg.send(cauchy(3,1))
    print 'Running average: %.6f' % (avg,)

As a list comprehension:

ravg = running_average()
next(ravg)
ravg_list = [ravg.send(cauchy(3,1)) for i in xrange(10)]

Edits:

Using the next() function instead of the it.next() method. This is so it also will work with Python 3. The next() function has also been back-ported to Python 2.6+.
In Python 2.5, you can either replace the calls with it.next(), or define a next function yourself.
(Thanks Adam Parkin)

I've got two possible solutions here for you. Both are just generic running average functions that work on any list of numbers. (could be made to work with any iterable)

Generator based:

nums = [cauchy(3,1) for x in xrange(10)]

def running_avg(numbers):
    for count in xrange(1, len(nums)+1):
        yield sum(numbers[:count])/count

print list(running_avg(nums))

List Comprehension based (really the same code as the earlier):

nums = [cauchy(3,1) for x in xrange(10)]

print [sum(nums[:count])/count for count in xrange(1, len(nums)+1)]

Generator-compatabile Generator based:

Edit: This one I just tested to see if I could make my solution compatible with generators easily and what it's performance would be. This is what I came up with.

def running_avg(numbers):
    sum = 0
    for count, number in enumerate(numbers):
        sum += number
        yield sum/(count+1)

See the performance stats below, well worth it.

Performance characteristics:

Edit: I also decided to test Orip's interesting use of multiple generators to see the impact on performance.

Using timeit and the following (1,000,000 iterations 3 times):

print "Generator based:", ', '.join(str(x) for x in Timer('list(running_avg(nums))', 'from __main__ import nums, running_avg').repeat())
print "LC based:", ', '.join(str(x) for x in Timer('[sum(nums[:count])/count for count in xrange(1, len(nums)+1)]', 'from __main__ import nums').repeat())
print "Orip's:", ', '.join(str(x) for x in Timer('list(itertools.islice(running_avgs, 10))', 'from __main__ import itertools, running_avgs').repeat())

print "Generator-compatabile Generator based:", ', '.join(str(x) for x in Timer('list(running_avg(nums))', 'from __main__ import nums, running_avg').repeat())

I get the following results:

Generator based: 17.653908968, 17.8027219772, 18.0342400074
LC based: 14.3925321102, 14.4613749981, 14.4277560711
Orip's: 30.8035550117, 30.3142540455, 30.5146529675

Generator-compatabile Generator based: 3.55352187157, 3.54164409637, 3.59098005295

See comments for code:

Orip's genEx based: 4.31488609314, 4.29926609993, 4.30518198013

Results are in seconds, and show the LC new generator-compatible generator method to be consistently faster, your results may vary though. I expect the massive difference between my original generator and the new one is the fact that the sum isn't calculated on the fly.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow