Python Generator that yields more results takes more time to create

https://stackoverflow.com/questions/22935702

29-06-2023

Question

I have the following code in Python:

import time
import sys

def returnlist(times):
    t = time.time()
    l = [i for i in range(times)]
    print "list: {}".format(time.time() - t)
    return l

def returngenerator(times):
    t = time.time()
    g = (i for i in range(times))
    print "generator: {}".format(time.time() - t)
    return g

times = 1000000  # set to 10000000 for the second run
g = returngenerator(times)
l = returnlist(times)

1. For times = 1000000 I get the results:

generator: 0.107323884964

list: 0.225493192673

2. For times = 10000000 I get:

generator: 0.856524944305

list: 1.83883309364

I understand why the second list takes longer to create, but why does the second generator take longer as well? I assumed that, because of lazy evaluation, it would take about the same time to create as the first generator.

I am running this program on an Ubuntu VM.

Solution

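The problem in your code is the range function. In Python 2, range builds a complete list, and because the outermost iterable of a generator expression is evaluated as soon as the expression is created, that list is built before returngenerator returns. For large inputs like the ones in your benchmarks, building the list dominates the timing. In Python 3, range returns a lazy range object instead of a list (it is a sequence, not strictly a generator). A workaround for Python 2 is to use the xrange function, which is lazy as well.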
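To see why creating the generator is not free, here is a minimal sketch showing that the outermost iterable of a generator expression is evaluated at creation time. The names eager_source and calls are hypothetical, introduced only for this illustration; eager_source stands in for Python 2's range by building a full list on every call.

```python
calls = []  # records each time the source iterable is built

def eager_source(n):
    # Hypothetical stand-in for Python 2's range(): builds the whole list up front.
    calls.append(n)
    return list(range(n))

# Creating the generator expression already calls eager_source(5),
# because the outermost iterable is evaluated immediately.
gen = (i * 2 for i in eager_source(5))
calls_after_creation = list(calls)

# Only the per-item work (i * 2) is deferred until iteration.
values = list(gen)

print(calls_after_creation)
print(values)
```

The call to eager_source is recorded before any item is consumed, which matches the benchmark: the time reported for the generator is mostly the time spent building the list it iterates over.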

As a test, let's create a benchmark function like yours, but using xrange:

def returngenerator2(times):
    t = time.time()
    g = (i for i in xrange(times))
    print "generator2: {}".format(time.time() - t)
    return g

And test it:

>>> l = returnlist(10**7)
list: 0.580000162125
>>> g = returngenerator(10**7)
generator: 0.115000009537
>>> x = returngenerator2(10**7)
generator2: 0.0
>>> x2 = returngenerator2(10**8)
generator2: 0.0
>>> x3 = returngenerator2(10**9)
generator2: 0.0

Seems to work. :)

Other tips

Because range() returns an actual list in Python 2. In Python 3, this was changed so that range() returns a lazy range object that computes its values on demand. Use xrange() in Python 2 to avoid building the list.
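As a rough illustration of the Python 3 behaviour, the sketch below (assuming Python 3 on a 64-bit build) shows that a range object is created without materialising its elements, yet still behaves like a sequence rather than a one-shot generator. All variable names are illustrative.

```python
r = range(10**9)  # no billion-element list is built here

length = len(r)                  # a range knows its length
item = r[12345]                  # and supports indexing
member = (10**9 - 1) in r        # integer membership needs no iteration
is_own_iterator = iter(r) is r   # False: a range is an iterable, not an iterator

# A range can be iterated repeatedly; a generator is exhausted after one pass.
small = range(3)
range_twice = (list(small), list(small))
gen = (i for i in small)
gen_twice = (list(gen), list(gen))

print(length, item, member, is_own_iterator)
print(range_twice)
print(gen_twice)
```

The practical consequence for the benchmark is the same either way: with a lazy range (or xrange in Python 2), creating the generator expression costs about the same regardless of the size passed in.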

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow