Question

I've got a large iterable, in fact, a large iterable given by:

itertools.permutations(range(10))

I would like to access to the millionth element. I alredy have problem solved in some different ways.

  1. Casting iterable to list and getting 1000000th element:

    return list(permutations(range(10)))[999999]
    
  2. Manually skiping elements till 999999:

    p = permutations(range(10))
    for i in xrange(999999): p.next()
    return p.next()
    
  3. Manually skiping elements v2:

    p = permutations(range(10))
    for i, element in enumerate(p):
        if i == 999999:
            return element
    
  4. Using islice from itertools:

    return islice(permutations(range(10)), 999999, 1000000).next()
    

But I still don't feel like none of them is the python's elegant way to do that. First option is just too expensive, it needs to compute the whole iterable just to access a single element. If I'm not wrong, islice does internally the same computation I just did in method 2, and is almost exactly as 3rd, maybe it has even more redundant operations.

So, I'm just curious, wondering if there is in python some other way to access to a concrete element of an iterable, or at least to skip the first elements, in some more elegant way, or if I just need to use one of the aboves.

Was it helpful?

Solution

Use the itertools recipe consume to skip n elements:

def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is none, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

Note the islice() call there; it uses n, n, effectively not returning anything, and the next() function falls back to the default.

Simplified to your example, where you want to skip 999999 elements, then return element 1000000:

return next(islice(permutations(range(10)), 999999, 1000000))

islice() processes the iterator in C, something that Python loops cannot beat.

To illustrate, here are the timings for just 10 repeats of each method:

>>> from itertools import islice, permutations
>>> from timeit import timeit
>>> def list_index():
...     return list(permutations(range(10)))[999999]
... 
>>> def for_loop():
...     p = permutations(range(10))
...     for i in xrange(999999): p.next()
...     return p.next()
... 
>>> def enumerate_loop():
...     p = permutations(range(10))
...     for i, element in enumerate(p):
...         if i == 999999:
...             return element
... 
>>> def islice_next():
...     return next(islice(permutations(range(10)), 999999, 1000000))
... 
>>> timeit('f()', 'from __main__ import list_index as f', number=10)
5.550895929336548
>>> timeit('f()', 'from __main__ import for_loop as f', number=10)
1.6166789531707764
>>> timeit('f()', 'from __main__ import enumerate_loop as f', number=10)
1.2498459815979004
>>> timeit('f()', 'from __main__ import islice_next as f', number=10)
0.18969106674194336

The islice() method is nearly 7 times faster than the next fastest method.

OTHER TIPS

Finding the nth permutation may just be an example but if this is actually the problem you are trying to solve then there is a much better way to do this. Instead of skipping the elements of the iterable you can calculate the nth permutation directly. Borrowing the code from another answer here:

import math

def nthperm(li, n):
    li = list(li)
    n -= 1
    s = len(li)
    res = []
    if math.factorial(s) <= n:
        return None
    for x in range(s-1,-1,-1):
        f = math.factorial(x)
        d = n / f
        n -= d * f
        res.append(li[d])
        del(li[d])
    return res

Example and timing comparison:

In [4]: nthperm(range(10), 1000000)
Out[4]: [2, 7, 8, 3, 9, 1, 5, 4, 6, 0]

In [5]: next(islice(permutations(range(10)), 999999, 1000000))
Out[5]: (2, 7, 8, 3, 9, 1, 5, 4, 6, 0)

In [6]: %timeit nthperm(range(10), 1000000)
100000 loops, best of 3: 9.01 us per loop

In [7]: %timeit next(islice(permutations(range(10)), 999999, 1000000))
10 loops, best of 3: 29.5 ms per loop

Same answer, over 3000 times faster. Note that I did make a slight modification to the original code so that it will no longer destroy the original list.

It is indeed awefully wasteful to slurp up a million items just to get to the next. Unfortunately, whether it can be avoided depends on your iterator: If the iterator has a way to skip directly to a particular offset, it can implement the __getitem__ method and you can use it to request iterator[1000000] directly. (How it gets there is up to the generating algorithm).

If your data source needs to generate all the prior values in order to get there, how you throw them away is the least of your problems. You can choose a nice way, but it's just icing on the cake.

PS. Given the context of your question I was going to outline an algorithm for generating the n-th permutation directly, but I see @F.J. beat me to it. Nice solution! :-)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top