Question

Suppose I have a list of numbers or objects, such as l = [3,5,3,6,47,89]. I can calculate the minimum, maximum and average with the following Python code:

minimum = min(l)
maximum = max(l)
avg = sum(l) / len(l)

Since each of these calls iterates over the entire list, this is slow for large lists and takes a lot of code. Is there a Python module that can calculate all of these values together?


Solution

Cython function:

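cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def minmaxAvg(list x):
    # Single pass over the list. Assumes x is non-empty and that every
    # value, as well as the running sum, fits in a C int.
    cdef int i
    cdef int _min, _max, total
    _min = x[0]
    _max = x[0]
    total = 0
    for i in x:
        if i < _min:
            _min = i
        elif i > _max:
            _max = i
        total += i
    return _min, _max, total/len(x)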

Pure Python function to compare against:

def builtinfuncs(x):
    a = min(x)
    b = max(x)
    avg = sum(x) / len(x)
    return a,b,avg


In [16]: x = [random.randint(0,1000) for _ in range(10000)]

In [17]: %timeit minmaxAvg(x)
10000 loops, best of 3: 34 µs per loop

In [18]: %timeit builtinfuncs(x)
1000 loops, best of 3: 460 µs per loop

Disclaimer:
- The speed of the Cython version will depend on your hardware.
- It is not as flexible or foolproof as the builtins. For example, you would have to change the function to handle anything other than integers.
- Before going down this path, you should ask yourself if this operation really is a big bottleneck in your application. It's probably not.
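
To illustrate the flexibility point above, here is a minimal pure-Python sketch of the same single-pass loop that accepts any values supporting comparison and addition (ints, floats, Fraction, and so on). The name min_max_avg is hypothetical and not part of the original answer. In CPython a loop like this will usually be slower than the three builtins, so it only shows the logic, not a speedup.

```python
def min_max_avg(values):
    """Return (minimum, maximum, average) of a non-empty iterable in one pass."""
    iterator = iter(values)
    try:
        first = next(iterator)
    except StopIteration:
        # Mirror the builtins, which also refuse an empty input.
        raise ValueError("min_max_avg() requires at least one value") from None

    lo = hi = total = first
    count = 1
    for v in iterator:
        if v < lo:
            lo = v
        elif v > hi:
            hi = v
        total = total + v
        count += 1
    return lo, hi, total / count


print(min_max_avg([3, 5, 3, 6, 47, 89]))  # (3, 89, 25.5)
```

Because it consumes an iterator, this version also works on generators, where calling min, max and sum separately would not.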

OTHER TIPS

If you have pandas installed, you can do something like this:

import numpy as np
import pandas
s = pandas.Series(np.random.normal(size=37))
stats = s.describe()

stats will be another Series that behaves like a dictionary:

print(stats)
count    37.000000
mean      0.072138
std       0.932000
min      -1.267888
25%      -0.688728
50%      -0.048624
75%       0.784244
max       2.501713
dtype: float64

stats['max']
2.501713

...etc. However, I don't recommend this unless you're striving simply for concise code. Here's why:

%%timeit
stats = s.describe()
# 100 loops, best of 3: 1.44 ms per loop

%%timeit
mymin = min(s)
mymax = max(s)
myavg = sum(s)/len(s)
# 10000 loops, best of 3: 89.5 µs per loop

I just can't imagine that you'll be able to squeeze any more performance out of the built-in functions with your own implementation (barring some Cython voodoo, maybe).
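
One way to check this on your own machine is a small timing harness built on the standard-library timeit module. The sketch below compares the three builtins against a hand-written single-pass loop. The function names builtin_stats and single_pass_stats are hypothetical, and the numbers it prints depend on your hardware and Python version, so none are quoted here.

```python
import random
import timeit


def builtin_stats(values):
    # Three separate passes, each running in C inside the interpreter.
    return min(values), max(values), sum(values) / len(values)


def single_pass_stats(values):
    # One pass, but every comparison and addition runs as Python bytecode.
    lo = hi = values[0]
    total = 0
    for v in values:
        if v < lo:
            lo = v
        elif v > hi:
            hi = v
        total += v
    return lo, hi, total / len(values)


data = [random.randint(0, 1000) for _ in range(10000)]

# Both approaches must agree before their timings are worth comparing.
assert builtin_stats(data) == single_pass_stats(data)

for func in (builtin_stats, single_pass_stats):
    # Best of 3 repeats of 100 calls each, reported per call.
    best = min(timeit.repeat(lambda: func(data), number=100, repeat=3)) / 100
    print(f"{func.__name__}: {best * 1e6:.1f} us per call")
```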

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow