Question

Suppose I have a list of numbers or objects, such as l = [3,5,3,6,47,89]. I can calculate the minimum, maximum and average using the following Python code:

minimum = min(l)
maximum = max(l)
avg = sum(l) / len(l)

Since each of these calls iterates over the entire list, this is slow for large lists and takes a lot of code. Is there a Python module that can calculate all of these values together?

Solution

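Cython function:

cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def minmaxAvg(list x):
    # Assumes x is a non-empty list of integers that each fit in a C int.
    cdef int i
    cdef int _min, _max
    cdef long long total = 0  # wider type, so summing a long list is less likely to overflow
    _min = x[0]
    _max = x[0]
    for i in x:
        if i < _min: _min = i
        elif i > _max: _max = i
        total += i
    # Cast to double so this is true division regardless of the Cython language level.
    return _min, _max, <double>total / len(x)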

Pure Python function to compare against:

def builtinfuncs(x):
    a = min(x)
    b = max(x)
    avg = sum(x) / len(x)
    return a,b,avg


In [16]: x = [random.randint(0,1000) for _ in range(10000)]

In [17]: %timeit minmaxAvg(x)
10000 loops, best of 3: 34 µs per loop

In [18]: %timeit builtinfuncs(x)
1000 loops, best of 3: 460 µs per loop

Disclaimer:
- The Cython timings depend on your hardware.
- This is not as flexible or foolproof as the builtins. For example, you would have to change the function to handle anything other than integers.
- Before going down this path, ask yourself whether this operation really is a big bottleneck in your application. It's probably not.
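
For reference, the same single-pass idea can be sketched in plain Python, where it works for any values that support comparison and addition (ints, floats, Decimal and so on). The name min_max_avg is made up for this illustration and is not part of any library. Since the loop runs in the interpreter rather than in C, it will likely be slower than calling min, max and sum separately, so treat it as an illustration of the algorithm rather than an optimization.

```python
def min_max_avg(values):
    """Return (minimum, maximum, mean) of a non-empty iterable in one pass."""
    iterator = iter(values)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("min_max_avg() requires at least one value") from None

    # Seed all three running values with the first element.
    smallest = largest = total = first
    count = 1
    for value in iterator:
        if value < smallest:
            smallest = value
        elif value > largest:
            largest = value
        total += value
        count += 1
    return smallest, largest, total / count


print(min_max_avg([3, 5, 3, 6, 47, 89]))  # (3, 89, 25.5)
```

Because it only consumes an iterator, this version also accepts generators, which the three-builtin approach cannot do without first storing the values in a list.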

Other tips

If you have pandas installed, you can do something like this:

import numpy as np
import pandas
s = pandas.Series(np.random.normal(size=37))
stats = s.describe()

stats will be another Series that behaves like a dictionary:

print(stats)
count    37.000000
mean      0.072138
std       0.932000
min      -1.267888
25%      -0.688728
50%      -0.048624
75%       0.784244
max       2.501713
dtype: float64

stats['max']
2.501713

...etc. However, I don't recommend this unless you're striving simply for concise code. Here's why:

%%timeit
stats = s.describe()
# 100 loops, best of 3: 1.44 ms per loop

%%timeit
mymin = min(s)
mymax = max(s)
myavg = sum(s)/len(s)
# 10000 loops, best of 3: 89.5 µs per loop

I just can't imagine that you'll be able to squeeze any more performance out of the built-in functions with your own pure-Python implementations (barring some Cython voodoo, maybe).
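
If you want to check that claim on your own machine, a rough harness along these lines should do. It uses only the standard library; three_passes and one_pass are hypothetical names chosen for this sketch, and the list size and repeat count are arbitrary. No timings are quoted here because they vary with hardware and Python version.

```python
import random
import timeit


def three_passes(values):
    # The builtins each loop over the list in C.
    return min(values), max(values), sum(values) / len(values)


def one_pass(values):
    # A hand-written single loop, executed by the interpreter.
    smallest = largest = values[0]
    total = 0
    for value in values:
        if value < smallest:
            smallest = value
        elif value > largest:
            largest = value
        total += value
    return smallest, largest, total / len(values)


data = [random.randint(0, 1000) for _ in range(10000)]
assert three_passes(data) == one_pass(data)  # same answer either way

runs = 100
for func in (three_passes, one_pass):
    seconds = timeit.timeit(lambda: func(data), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e6:.1f} microseconds per call")
```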

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow