Domanda

Was writing a blog post about some python coding styles and came across something that I found very strange and I was wondering if someone understood what was going on with it. Basically I've got two versions of the same function:

a = lambda x: (i for i in range(x))
def b(x):
    for i in range(x):
        yield i

And I want to compare the performance of these two doing just being set up. In my mind this should involve a negligible amount of computation and both methods should come up pretty close to zero, however, when I actually ran the timeit:

def timing(x, number=10):
    implicit = timeit.timeit('a(%s)' % int(x), 'from __main__ import a', number=number)
    explicit = timeit.timeit('b(%s)' % int(x), 'from __main__ import b', number=number)
    return (implicit, explicit)

def plot_timings(*args, **kwargs):
    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    x_vector = np.linspace(*args, **kwargs)
    timings = np.vectorize(timing)(x_vector)
    ax.plot(x_vector, timings[0], 'b--')
    ax.plot(x_vector, timings[1], 'r--')
    ax.set_yscale('log')
    plt.show()

plot_timings(1, 1000000, 20)

I get a HUGE difference between the two methods as shown below:

<code>a</code> is in blue, <code>b</code> is in red

Where a is in blue, and b is in red.

Why is the difference so huge? It looks the explicit for loop version is also growing logarithmically, while the implicit version is doing nothing (as it should).

Any thoughts?

È stato utile?

Soluzione

The difference is caused by range

a needs to call range when you construct it.
b doesn't need to call range until the first iteration

>>> def myrange(n):
...     print "myrange(%s)"%n
...     return range(n)
... 
>>> a = lambda x: (i for i in myrange(x))
>>> def b(x):
...     for i in myrange(x):
...         yield i
... 
>>> a(100)
myrange(100)
range(100)
<generator object <genexpr> at 0xd62d70>
>>> b(100)
<generator object b at 0xdadb90>
>>> next(_)   # <-- first iteration of b(100)
myrange(100)
range(100)
0

Altri suggerimenti

The lambda call is the slow one. Check this out:

import cProfile

a = lambda x: (i for i in range(x))

def b(x):
    for i in range(x):
        yield i

def c(x):
    for i in xrange(x):
        yield i

def d(x):
    i = 0
    while i < x:
        yield i
        i += 1


N = 100000
print " -- a --"
cProfile.run("""
for x in xrange(%i):
    a(x)
""" % N)

print " -- b --"
cProfile.run("""
for x in xrange(%i):
    b(x)
""" % N)

print " -- c --"
cProfile.run("""
for x in xrange(%i):
    c(x)
""" % N)

print " -- d --"
cProfile.run("""
for x in xrange(%i):
    d(x)
""" % N)

print " -- a (again) --"
cProfile.run("""
for x in xrange(%i):
    a(x)
""" % N)

Gives me the following results:

 -- a --
         300002 function calls in 61.764 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   30.881   30.881   61.764   61.764 <string>:3(<module>)
   100000    0.051    0.000    0.051    0.000 test.py:5(<genexpr>)
   100000    0.247    0.000   30.832    0.000 test.py:5(<lambda>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   100000   30.585    0.000   30.585    0.000 {range}


 -- b --
         100002 function calls in 0.076 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.066    0.066    0.076    0.076 <string>:3(<module>)
   100000    0.010    0.000    0.010    0.000 test.py:7(b)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


 -- c --
         100002 function calls in 0.075 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.065    0.065    0.075    0.075 <string>:3(<module>)
   100000    0.010    0.000    0.010    0.000 test.py:11(c)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


 -- d --
         100002 function calls in 0.075 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.065    0.065    0.075    0.075 <string>:3(<module>)
   100000    0.010    0.000    0.010    0.000 test.py:15(d)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


 -- a (again) --
         300002 function calls in 60.890 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   30.487   30.487   60.890   60.890 <string>:3(<module>)
   100000    0.049    0.000    0.049    0.000 test.py:5(<genexpr>)
   100000    0.237    0.000   30.355    0.000 test.py:5(<lambda>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   100000   30.118    0.000   30.118    0.000 {range}
Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top