Pythonic iteration over sliding window pairs in list?

https://stackoverflow.com/questions/13014595

13-07-2021

Question

What's the most Pythonic, efficient way to iterate over a list in sliding pairs? Here's a related example:

>>> l
['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> for x, y in itertools.izip(l, l[1::2]): print x, y
... 
a b
b d
c f

This is iteration in pairs, but how can we iterate over sliding pairs? That is, over the pairs:

a b
b c
c d
d e
etc.

That is, iteration over pairs, except that the pair slides by one element each time rather than by two. Thanks.

Solution

How about:

for x, y in itertools.izip(l, l[1:]): print x, y
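The answer above is Python 2. As a sketch for Python 3 (assuming that is what you are running): itertools.izip no longer exists because the built-in zip is already lazy, and Python 3.10 added itertools.pairwise, which yields exactly these overlapping pairs.

```python
import itertools
import sys

l = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

if sys.version_info >= (3, 10):
    # itertools.pairwise yields (l[0], l[1]), (l[1], l[2]), ...
    pair_iter = itertools.pairwise(l)
else:
    # Fallback: Python 3's zip is lazy, so this mirrors izip(l, l[1:])
    pair_iter = zip(l, l[1:])

result = list(pair_iter)
for x, y in result:
    print(x, y)
```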

OTHER TIPS

You can go even simpler. Just zip the list and the list offset by one.

In [4]: zip(l, l[1:])
Out[4]: [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('e', 'f'), ('f', 'g')]
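Note that l[1:] builds a second list of n - 1 references. If that copy matters for a large list, one alternative (a sketch, not from the original answer; offset_pairs is a hypothetical name) is to offset with itertools.islice, which walks the same list without copying it:

```python
from itertools import islice

def offset_pairs(seq):
    """Yield overlapping pairs from a sequence without slicing a copy.

    offset_pairs is a hypothetical helper name used for illustration.
    """
    # zip calls iter(seq), and islice(seq, 1, None) creates a second,
    # independent iterator over the same list that skips the first item
    return zip(seq, islice(seq, 1, None))

l = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(list(offset_pairs(l)))
```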

Here is a little generator that I wrote a while back for a similar scenario:

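def pairs(items):
    items_iter = iter(items)
    try:
        prev = next(items_iter)
    except StopIteration:
        # Empty input: no pairs. Letting StopIteration escape a generator
        # is a RuntimeError on Python 3.7+ (PEP 479).
        return

    for item in items_iter:
        yield prev, item
        prev = item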
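The slicing answers need a sequence, while a generator like the one above also accepts one-pass iterators. For comparison, the pairwise recipe from the itertools documentation (used before itertools.pairwise arrived in Python 3.10) gets the same effect with tee; a Python 3 sketch:

```python
from itertools import tee

def pairwise(iterable):
    # tee makes two independent iterators over the same input
    a, b = tee(iterable)
    # Advance the second one by one item; the default avoids
    # StopIteration when the input is empty
    next(b, None)
    return zip(a, b)

# Works on a generator, which cannot be sliced with [1:]
squares = (n * n for n in range(5))
print(list(pairwise(squares)))  # [(0, 1), (1, 4), (4, 9), (9, 16)]
```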

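Here's a function for arbitrarily sized sliding windows that works for iterators/generators as well as lists (Python 2):

from itertools import izip, starmap, islice, tee, count, repeat

def sliding(seq, n):
  # The i-th tee copy is advanced i items by islice; izip lines the copies up
  return izip(*starmap(islice, izip(tee(seq, n), count(0), repeat(None))))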

Nathan's solution is probably more efficient though.
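The sliding function above depends on Python 2's izip. A Python 3 translation of the same tee/islice idea might look like this (the name sliding is kept from the answer, but this port is a sketch, not the answerer's code): each of the n tee copies is advanced by its index, and zip lines them up into windows.

```python
from itertools import islice, tee

def sliding(seq, n):
    """Yield n-tuples of consecutive items from any iterable."""
    # n independent iterators over the input; the i-th one starts i items in
    iterators = [islice(it, i, None) for i, it in enumerate(tee(seq, n))]
    # zip stops when the most-advanced iterator runs out,
    # so every window it yields is complete
    return zip(*iterators)

print(list(sliding('abcde', 3)))  # [('a', 'b', 'c'), ('b', 'c', 'd'), ('c', 'd', 'e')]
```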

Below are timings for each answer, measured with a loop that adds the two elements of every consecutive pair in l = range(10000), ordered from fastest to slowest.

Gil

In [69]: timeit.repeat("for x,y in itertools.izip(l, l[1::1]): x + y", setup=setup, number=1000)
Out[69]: [1.029047966003418, 0.996290922164917, 0.998831033706665]

Geoff Reedy

In [70]: timeit.repeat("for x,y in sliding(l,2): x+y", setup=setup, number=1000)
Out[70]: [1.2408790588378906, 1.2099130153656006, 1.207326889038086]

Alestanis

In [66]: timeit.repeat("for i in range(0, len(l)-1): l[i] + l[i+1]", setup=setup, number=1000)
Out[66]: [1.3387370109558105, 1.3243639469146729, 1.3245630264282227]

chmullig

In [68]: timeit.repeat("for x,y in zip(l, l[1:]): x+y", setup=setup, number=1000)
Out[68]: [1.4756009578704834, 1.4369518756866455, 1.5067830085754395]

Nathan Villaescusa

In [63]: timeit.repeat("for x,y in pairs(l): x+y", setup=setup, number=1000)
Out[63]: [2.254757881164551, 2.3750967979431152, 2.302199125289917]

sr2222

Note the reduced loop count (number=100 rather than 1000):

In [60]: timeit.repeat("for x,y in SubsequenceIter(l,2): x+y", setup=setup, number=100)
Out[60]: [1.599524974822998, 1.5634570121765137, 1.608154058456421]

The setup code:

setup="""
from itertools import izip, starmap, islice, tee, count, repeat
l = range(10000)

def sliding(seq, n):
  return izip(*starmap(islice, izip(tee(seq, n), count(0), repeat(None))))

class SubsequenceIter(object):

    def __init__(self, iterable, subsequence_length):

        self.iterator = iter(iterable)
        self.subsequence_length = subsequence_length
        self.subsequence = [0]

    def __iter__(self):

        return self

    def next(self):

        self.subsequence.pop(0)
        while len(self.subsequence) < self.subsequence_length:
            self.subsequence.append(self.iterator.next())
        return self.subsequence

def pairs(items):
    items_iter = iter(items)
    prev = items_iter.next()

    for item in items_iter:
        yield (prev, item)
        prev = item
"""

Not exactly the most efficient, but quite flexible:

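class SubsequenceIter(object):

    def __init__(self, iterable, subsequence_length):
        self.iterator = iter(iterable)
        self.subsequence_length = subsequence_length
        # Placeholder so the first next() call has something to pop
        self.subsequence = [0]

    def __iter__(self):
        return self

    def next(self):
        # Drop the oldest item, then refill up to the window length;
        # StopIteration from the underlying iterator ends the loop
        self.subsequence.pop(0)
        while len(self.subsequence) < self.subsequence_length:
            self.subsequence.append(next(self.iterator))
        # Return a copy so earlier windows are not changed by later calls
        return tuple(self.subsequence)

    __next__ = next  # Python 3 name for the iterator protocol method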

Usage:

for x, y in SubsequenceIter(l, 2):
    print x, y
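For larger windows, list.pop(0) has to shift every remaining element. A sketch of the same idea using collections.deque with maxlen, which drops the oldest item in constant time when a new one is appended, follows the lines of the sliding_window recipe in the Python 3 itertools documentation; it yields tuples, so each window is an independent object:

```python
from collections import deque
from itertools import islice

def sliding_window(iterable, n):
    """Yield overlapping n-tuples: (s0, ..., s[n-1]), (s1, ..., s[n]), ..."""
    iterator = iter(iterable)
    # Pre-fill with the first n - 1 items; maxlen=n makes each append
    # push the oldest item out of the left end
    window = deque(islice(iterator, n - 1), maxlen=n)
    for item in iterator:
        window.append(item)
        yield tuple(window)

for x, y in sliding_window(['a', 'b', 'c', 'd'], 2):
    print(x, y)
```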

No imports needed. This works on a list of objects or a string, i.e. anything that supports slicing with var[i:]. Tested on Python 3.6.

# Creates windows where consecutive windows overlap in all but one element
def ngrams_list(a_list, window_size=5, skip_step=1):
    return list(zip(*[a_list[i:] for i in range(0, window_size, skip_step)]))

With a_list being the alphabet and window_size=5 (the OP would want window_size=2), the list comprehension by itself creates this:

['ABCDEFGHIJKLMNOPQRSTUVWXYZ',
 'BCDEFGHIJKLMNOPQRSTUVWXYZ', 
 'CDEFGHIJKLMNOPQRSTUVWXYZ', 
 'DEFGHIJKLMNOPQRSTUVWXYZ',
 'EFGHIJKLMNOPQRSTUVWXYZ']

zip(*result_of_list_comprehension) then collects each full vertical column as one window; zip stops at the shortest string, so partial columns at the end are dropped. If you want less than all-but-one overlap:

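# You can sample that output to get less overlap. Consecutive windows of
# window_size items share overlap items when the start index advances by
# window_size - overlap, so keep every (window_size - overlap)-th window.
def sliding_windows_with_overlap(a_list, window_size=5, overlap=2):
    zip_output_as_list = ngrams_list(a_list, window_size)
    return zip_output_as_list[::window_size - overlap]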

With overlap=2, it skips the windows starting with B and C and keeps the one starting with D:

[('A', 'B', 'C', 'D', 'E'),
 ('D', 'E', 'F', 'G', 'H'), 
 ('G', 'H', 'I', 'J', 'K'), 
 ('J', 'K', 'L', 'M', 'N'), 
 ('M', 'N', 'O', 'P', 'Q'), 
 ('P', 'Q', 'R', 'S', 'T'), 
 ('S', 'T', 'U', 'V', 'W'), 
 ('V', 'W', 'X', 'Y', 'Z')]
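The approach above builds every window and then discards most of them. A sketch that computes only the kept windows, by stepping the start index directly, is below; stepped_windows is a hypothetical name, and it assumes overlap is smaller than window_size:

```python
import string

def stepped_windows(seq, window_size=5, overlap=2):
    """Return windows of window_size items where consecutive windows share
    overlap items. Works on any sliceable sequence (list, tuple, str).

    stepped_windows is a hypothetical helper, not from the original answer.
    """
    step = window_size - overlap
    if step < 1:
        raise ValueError("overlap must be smaller than window_size")
    # Last index at which a full window still fits
    last_start = len(seq) - window_size
    return [tuple(seq[i:i + window_size]) for i in range(0, last_start + 1, step)]

windows = stepped_windows(string.ascii_uppercase, window_size=5, overlap=2)
print(windows[0], windows[-1])  # ('A', 'B', 'C', 'D', 'E') ('V', 'W', 'X', 'Y', 'Z')
```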

EDIT: This looks similar to what @chmullig provided, with extra options.
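For the question's case, window_size=2 makes ngrams_list return the same pairs as zip(l, l[1:]). Note also that skip_step > 1 does not move the windows further apart; it takes every skip_step-th element inside each window. A self-contained check (the function is repeated so the snippet runs on its own):

```python
def ngrams_list(a_list, window_size=5, skip_step=1):
    # One shifted slice of the input per offset in range(0, window_size, skip_step),
    # zipped column-wise into tuples
    return list(zip(*[a_list[i:] for i in range(0, window_size, skip_step)]))

l = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(ngrams_list(l, window_size=2) == list(zip(l, l[1:])))  # True
print(ngrams_list('abcd', window_size=3))  # [('a', 'b', 'c'), ('b', 'c', 'd')]
# Offsets 0, 2, 4 give dilated windows (s[i], s[i+2], s[i+4])
print(ngrams_list('abcdefg', window_size=5, skip_step=2))  # [('a', 'c', 'e'), ('b', 'd', 'f'), ('c', 'e', 'g')]
```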

Licensed under: CC-BY-SA with attribution
