Question

I have a list of some (small number) of items, eg:

my_list = [1,2,3,4,5,6,7,8,9,10]

and I have a tuple of indexes, eg:

indexes = (1,5,9)

I want the tuple of the values from the list, eg:

tuple(my_list[x] for x in indexes)

but this is proving to be quite slow (when run many many times).

The tuple of indexes doesn't change for each list I run this over - so is there a faster way?

I'm using Python 2.5, and I get these surprising results so far:

python -m timeit -s "indexes = (1,5,9); l = [1,2,3,4,5,6,7,8,9,10]" "tuple(l[i] for i in indexes)"
100000 loops, best of 3: 3.02 usec per loop

python -m timeit -s "indexes = (1,5,9); l = [1,2,3,4,5,6,7,8,9,10]" "tuple([l[i] for i in indexes])"
1000000 loops, best of 3: 0.707 usec per loop

Is this an anomaly, or is the list comprehension really that much better than the generator expression?


Solution

Use operator.itemgetter (do you really have to use 2.5? It's dead and buried.)

Aside from being simpler, it also ought to be slightly faster since it is implemented in C. You can construct the itemgetter object once, when you know which indices you want, and then call it repeatedly on many lists. It still has to copy N items and create a tuple on each call, but it should do that about as fast as reasonably possible.
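For example, a minimal sketch using the names from the question (get_items is just an illustrative name):

from operator import itemgetter

indexes = (1, 5, 9)
get_items = itemgetter(*indexes)   # build the getter once, up front

my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(get_items(my_list))          # (2, 6, 10) -- already a tuple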

OTHER TIPS

A tuple is an immutable sequence, so when it is created (and its memory allocated) it needs to know how many elements it will contain up front. This means that when you create a tuple from a generator expression, the generator has to be exhausted first (generators can only be consumed once) and its elements have to be stored somewhere in the meantime. So what happens is comparable to this:

tuple(list(generator))

Now, creating a list from a generator expression is slower than creating the same list with a list comprehension, so you save time by building that intermediate list with a list comprehension in the first place.
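If you want to measure just that part, here is a sketch using the timeit module with the question's data; the absolute numbers will of course depend on your machine and Python version:

import timeit

setup = "indexes = (1, 5, 9); l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]"
# list built from a generator expression vs. a list comprehension
print(timeit.timeit("list(l[i] for i in indexes)", setup))
print(timeit.timeit("[l[i] for i in indexes]", setup))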

And if you don’t have a real reason to use a tuple, i.e. if you don’t require the immutability, you could also keep the list and not convert it to a tuple to save a bit more time.

Finally, no, there isn't really a better way than to iterate over your indexes and look each one up in the sequence. Even though the indexes are the same every time, the lookups still have to be performed against each new list, so you have to repeat them anyway.

You can, however, save some more time if those indexes are truly fixed: a simple (l[1], l[5], l[9]) will be a lot faster than anything else ;)
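For instance, a sketch of that fixed-index case (pick is just an illustrative name):

def pick(l):
    # indexes hard-coded because they never change
    return (l[1], l[5], l[9])

print(pick([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # (2, 6, 10)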


Here are some references from the source (using 3.4 here, but it should be similar in 2.x):

The creation of a tuple using the built-in tuple() function is done in the function PySequence_Tuple.

If the argument is a list, then Python handles that explicitly by calling PyList_AsTuple, which essentially allocates a tuple of the list's length and just copies all items over.

Otherwise, it creates an iterator from the argument and first tries to guess the length. Since generators have no length, Python uses the default guess of 10 and allocates a tuple of that length; note that for your three-element result, that is 7 slots too many. It then iterates the iterator, assigning each value to its place in the tuple, and afterwards resizes the tuple to the actual length.
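The missing length is easy to observe from Python itself; a small sketch using operator.length_hint (available since 3.4):

import operator

gen = (x * x for x in range(3))
# a list knows its real length; a generator has no length hint,
# so only the supplied default comes back
print(operator.length_hint([1, 2, 3], 10))  # 3
print(operator.length_hint(gen, 10))        # 10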

Now, the actual difference most likely comes from the way list comprehensions work. A list comprehension is essentially a sequence of low-level list appends, which is similar to how the tuple is filled in PySequence_Tuple as described above, so from that point of view both methods would be roughly equal. The difference with a generator expression is the overhead of creating an actual generator object (a sequence of yields) that then has to be iterated, and that extra work is exactly what you avoid with a list comprehension.
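You can see that difference in the bytecode; a rough sketch using the dis module (opcode names vary between Python versions):

import dis

# the comprehension body uses low-level LIST_APPEND opcodes, while the
# generator expression compiles to a generator code object whose yielded
# values tuple() then has to iterate over
dis.dis(compile("[l[i] for i in indexes]", "<listcomp>", "eval"))
dis.dis(compile("(l[i] for i in indexes)", "<genexpr>", "eval"))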

Another option, albeit slower than delnan's, is using __getitem__ in conjunction with map. However, even with the import statement, delnan's itemgetter version is faster.

In [36]: %timeit tuple(map(my_list.__getitem__,indexes))
1000000 loops, best of 3: 653 ns per loop


In [38]: %timeit itemgetter(*indexes)(my_list)
1000000 loops, best of 3: 292 ns per loop

Without ipython:

python -m timeit -s "indexes = (1,5,9); l = [1,2,3,4,5,6,7,8,9,10]" "tuple(map(l.__getitem__,indexes))"
1000000 loops, best of 3: 0.645 usec per loop

python -m timeit -s "import operator" "indexes = (1,5,9); l = [1,2,3,4,5,6,7,8,9,10]" "operator.itemgetter(*indexes)(l)"
1000000 loops, best of 3: 0.463 usec per loop

Looks like the conversion to tuple is making the map-variant slower than the itemgetter-variant:

python -m timeit -s "indexes = (1,5,9); l = [1,2,3,4,5,6,7,8,9,10]" "map(l.__getitem__,indexes)"
1000000 loops, best of 3: 0.489 usec per loop
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow