Pergunta

Was writing a blog post about some python coding styles and came across something that I found very strange and I was wondering if someone understood what was going on with it. Basically I've got two versions of the same function:

a = lambda x: (i for i in range(x))
def b(x):
    for i in range(x):
        yield i

And I want to compare the performance of these two doing just being set up. In my mind this should involve a negligible amount of computation and both methods should come up pretty close to zero, however, when I actually ran the timeit:

def timing(x, number=10):
    implicit = timeit.timeit('a(%s)' % int(x), 'from __main__ import a', number=number)
    explicit = timeit.timeit('b(%s)' % int(x), 'from __main__ import b', number=number)
    return (implicit, explicit)

def plot_timings(*args, **kwargs):
    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    x_vector = np.linspace(*args, **kwargs)
    timings = np.vectorize(timing)(x_vector)
    ax.plot(x_vector, timings[0], 'b--')
    ax.plot(x_vector, timings[1], 'r--')
    ax.set_yscale('log')
    plt.show()

plot_timings(1, 1000000, 20)

I get a HUGE difference between the two methods as shown below:

a is in blue, b is in red">

Where a is in blue, and b is in red.

Why is the difference so huge? It looks the explicit for loop version is also growing logarithmically, while the implicit version is doing nothing (as it should).

Any thoughts?

Foi útil?

Solução

The difference is caused by range

a needs to call range when you construct it.
b doesn't need to call range until the first iteration

>>> def myrange(n):
...     print "myrange(%s)"%n
...     return range(n)
... 
>>> a = lambda x: (i for i in myrange(x))
>>> def b(x):
...     for i in myrange(x):
...         yield i
... 
>>> a(100)
myrange(100)
range(100)
<generator object <genexpr> at 0xd62d70>
>>> b(100)
<generator object b at 0xdadb90>
>>> next(_)   # <-- first iteration of b(100)
myrange(100)
range(100)
0

Outras dicas

The lambda call is the slow one. Check this out:

import cProfile

a = lambda x: (i for i in range(x))

def b(x):
    for i in range(x):
        yield i

def c(x):
    for i in xrange(x):
        yield i

def d(x):
    i = 0
    while i < x:
        yield i
        i += 1


N = 100000
print " -- a --"
cProfile.run("""
for x in xrange(%i):
    a(x)
""" % N)

print " -- b --"
cProfile.run("""
for x in xrange(%i):
    b(x)
""" % N)

print " -- c --"
cProfile.run("""
for x in xrange(%i):
    c(x)
""" % N)

print " -- d --"
cProfile.run("""
for x in xrange(%i):
    d(x)
""" % N)

print " -- a (again) --"
cProfile.run("""
for x in xrange(%i):
    a(x)
""" % N)

Gives me the following results:

 -- a --
         300002 function calls in 61.764 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   30.881   30.881   61.764   61.764 <string>:3(<module>)
   100000    0.051    0.000    0.051    0.000 test.py:5(<genexpr>)
   100000    0.247    0.000   30.832    0.000 test.py:5(<lambda>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   100000   30.585    0.000   30.585    0.000 {range}


 -- b --
         100002 function calls in 0.076 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.066    0.066    0.076    0.076 <string>:3(<module>)
   100000    0.010    0.000    0.010    0.000 test.py:7(b)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


 -- c --
         100002 function calls in 0.075 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.065    0.065    0.075    0.075 <string>:3(<module>)
   100000    0.010    0.000    0.010    0.000 test.py:11(c)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


 -- d --
         100002 function calls in 0.075 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.065    0.065    0.075    0.075 <string>:3(<module>)
   100000    0.010    0.000    0.010    0.000 test.py:15(d)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


 -- a (again) --
         300002 function calls in 60.890 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   30.487   30.487   60.890   60.890 <string>:3(<module>)
   100000    0.049    0.000    0.049    0.000 test.py:5(<genexpr>)
   100000    0.237    0.000   30.355    0.000 test.py:5(<lambda>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   100000   30.118    0.000   30.118    0.000 {range}
Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top