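How do I look ahead (peek) one element in a Python generator?
Question
I can't figure out how to look ahead one element in a Python generator. As soon as I look, it's gone.
Here is what I mean: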
gen = iter([1,2,3])
next_value = next(gen) # okay, I looked forward and see that next_value = 1
# but now:
list(gen) # is [2, 3] -- the first value is gone!
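Here is a more realistic example:
gen = element_generator()
if gen.next_value() == 'STOP':  # next_value() is the wished-for peek, not a real method
    quit_application()
else:
    process(next(gen))
Can anyone help me write a generator that can look one element ahead?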
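Solution
The Python generator API is one-way: you cannot push back elements you have already read. However, you can create a new iterator with the itertools module and prepend the element: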
import itertools
gen = iter([1,2,3])
peek = next(gen)
print(list(itertools.chain([peek], gen)))  # [1, 2, 3]
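Other tips
For completeness, the more-itertools package (which should probably be part of any Python programmer's toolbox) includes a peekable wrapper that implements this behavior. As this example shows: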
>>> from more_itertools import peekable
>>> p = peekable(range(2))
>>> p.peek()
0
>>> next(p)
0
>>> p.peek()
1
>>> next(p)
1
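At the time of writing, the package was compatible with both Python 2 and 3, even though its documentation showed Python 2 syntax.
OK, two years too late, but I came across this question and did not find any of the answers to my satisfaction. I came up with this meta-generator: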
class Peekorator(object):
    def __init__(self, generator):
        self.empty = False
        self.peek = None
        self.generator = generator
        try:
            # Fetch the first element up front so it can be peeked at.
            self.peek = next(self.generator)
        except StopIteration:
            self.empty = True

    def __iter__(self):
        return self

    def __next__(self):
        """
        Return the self.peek element, or raise StopIteration
        if empty
        """
        if self.empty:
            raise StopIteration()
        to_return = self.peek
        try:
            self.peek = next(self.generator)
        except StopIteration:
            self.peek = None
            self.empty = True
        return to_return

def simple_iterator():
    for x in range(10):
        yield x*3

pkr = Peekorator(simple_iterator())
for i in pkr:
    print(i, pkr.peek, pkr.empty)
Output:
0 3 False
3 6 False
6 9 False
9 12 False
...
24 27 False
27 None True
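That is, at any point during iteration you have access to the next item in the sequence.
You can use itertools.tee to produce a lightweight copy of the generator. Peeking ahead in one copy will then not affect the second copy: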
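import itertools

def process(seq):
    peeker, items = itertools.tee(seq)
    # initial peek ahead
    # so that peeker is one ahead of items
    # (the None default avoids a StopIteration on an empty sequence)
    if next(peeker, None) == 'STOP':
        return
    for item in items:
        # peek ahead; the default covers the last item, where nothing follows
        if next(peeker, None) == "STOP":
            return
        # process items
        print(item)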
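The 'items' generator is unaffected by whatever you do to 'peeker'. Note that you should not use the original 'seq' after calling tee on it; that will break things.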
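To see that the two copies returned by itertools.tee really advance independently, here is a minimal sketch. The sample data is made up for illustration and is not from the answer.

```python
import itertools

source = iter(["a", "b", "STOP"])      # illustrative data only
peeker, items = itertools.tee(source)

first = next(peeker)                   # look ahead on one copy only
remaining = list(items)                # the other copy still starts at the beginning

print(first)
print(remaining)
```

Keep in mind that tee has to buffer every item that one copy has seen and the other has not, so the copies should stay close together.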
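FWIW, this is the wrong way to solve this problem. Any algorithm that requires you to look one item ahead in a generator can alternatively be written to use the current generator item and the previous item. Then you don't have to mangle your use of generators, and your code will be much simpler. See my other answer to this question.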
>>> gen = iter(range(10))
>>> peek = next(gen)
>>> peek
0
>>> gen = (value for g in ([peek], gen) for value in g)
>>> list(gen)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Just for fun, I created an implementation of a lookahead class based on the suggestion by Aaron:
import itertools

class lookahead_chain(object):
    def __init__(self, it):
        self._it = iter(it)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def peek(self, default=None, _chain=itertools.chain):
        it = self._it
        try:
            v = next(self._it)
            # Put the peeked value back in front of the remaining items.
            self._it = _chain((v,), it)
            return v
        except StopIteration:
            return default

lookahead = lookahead_chain
With this, the following works:
>>> t = lookahead(range(8))
>>> list(itertools.islice(t, 3))
[0, 1, 2]
>>> t.peek()
3
>>> list(itertools.islice(t, 3))
[3, 4, 5]
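With this implementation it is a bad idea to call peek many times in a row...
While looking at the CPython source code I just found a better way that is both shorter and more efficient: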
class lookahead_tee(object):
    def __init__(self, it):
        self._it, = itertools.tee(it, 1)

    def __iter__(self):
        return self._it

    def peek(self, default=None):
        try:
            # Advance a copy of the tee object; self._it keeps its position.
            return next(self._it.__copy__())
        except StopIteration:
            return default

lookahead = lookahead_tee
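Usage is the same as above, but here you do not pay a price for calling peek many times in a row. With a few more lines you can also look ahead more than one item in the iterator (up to available RAM).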
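The answer does not show those extra lines. One possible sketch of multi-item lookahead is given here; note that it swaps the tee copy trick for an explicit deque buffer, and that the class name LookaheadBuffer and the peek(n) signature are assumptions made for illustration.

```python
import collections
import itertools

class LookaheadBuffer:
    """Hypothetical helper: peek at up to n upcoming items via an explicit buffer."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buffer = collections.deque()

    def __iter__(self):
        return self

    def __next__(self):
        # Serve buffered (already peeked) items first.
        if self._buffer:
            return self._buffer.popleft()
        return next(self._it)

    def peek(self, n=1):
        # Pull items from the underlying iterator until n are buffered
        # (or it runs out), then return them without consuming them.
        missing = n - len(self._buffer)
        if missing > 0:
            self._buffer.extend(itertools.islice(self._it, missing))
        return list(itertools.islice(self._buffer, n))

t = LookaheadBuffer(range(8))
first_three = list(itertools.islice(t, 3))
upcoming = t.peek(3)
next_two = list(itertools.islice(t, 2))
print(first_three, upcoming, next_two)
```

The buffer only grows as far as the deepest peek requested, which is where the "up to available RAM" limit comes from.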
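Instead of using the items (i, i+1), where 'i' is the current item and i+1 is the 'peek ahead' version, you should be using (i-1, i), where 'i-1' is the previous item from the generator.
Tweaking your algorithm this way will produce something identical to what you currently have, apart from the extra needless complexity of trying to 'peek ahead'.
Peeking ahead is a mistake, and you should not be doing it.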
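As a rough illustration of the (i-1, i) idea, the sketch below pairs each item with the one before it, so the loop body never needs to touch the generator beyond the current item. The helper name with_previous and the use of None for "no previous item yet" are assumptions, not part of the answer.

```python
def with_previous(iterable):
    """Yield (previous, current) pairs; previous is None for the first item."""
    previous = None
    for current in iterable:
        yield previous, current
        previous = current

pairs = list(with_previous([10, 20, 30]))
print(pairs)
```

Any decision about an item that depends on its successor is then made one step later, when that successor arrives as the current item.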
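This will work: it buffers an item and calls a function with each item and the next item in the sequence.
Your requirements are murky on what happens at the end of the sequence. What does "look ahead" mean when you are at the last one?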
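def process_with_lookahead(iterable, aFunction):
    iterator = iter(iterable)
    prev = next(iterator)  # raises StopIteration if the iterable is empty
    for item in iterator:
        aFunction(prev, item)
        prev = item
    # The last item has no successor, so None is passed in its place.
    aFunction(prev, None)

def someLookaheadFunction(item, next_item):
    print(item, next_item)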
A simple solution is to use a function like this:
import itertools

def peek(it):
    first = next(it)
    return first, itertools.chain([first], it)
Then you can do:
>>> it = iter(range(10))
>>> x, it = peek(it)
>>> x
0
>>> next(it)
0
>>> next(it)
1
If anyone is interested (and please correct me if I am wrong), I believe it is pretty easy to add some push-back functionality to any iterator.
class Back_pushable_iterator:
    """Class whose constructor takes an iterator as its only parameter, and
    returns an iterator that behaves in the same way, with added push back
    functionality.

    The idea is to be able to push back elements that need to be retrieved once
    more with the iterator semantics. This is particularly useful to implement
    LL(k) parsers that need k tokens of lookahead. Lookahead or push back is
    really a matter of perspective. The pushing back strategy allows a clean
    parser implementation based on recursive parser functions.

    The invoker of this class takes care of storing the elements that should be
    pushed back. A consequence of this is that any elements can be "pushed
    back", even elements that have never been retrieved from the iterator.
    The elements that are pushed back are then retrieved through the iterator
    interface in a LIFO-manner (as should logically be expected).

    This class works for any iterator but is especially meaningful for a
    generator iterator, which offers no obvious push back ability.

    In the LL(k) case mentioned above, the tokenizer can be implemented by a
    standard generator function (clean and simple), that is completed by this
    class for the needs of the actual parser.
    """
    def __init__(self, iterator):
        self.iterator = iterator
        self.pushed_back = []

    def __iter__(self):
        return self

    def __next__(self):
        if self.pushed_back:
            return self.pushed_back.pop()
        else:
            return next(self.iterator)

    def push_back(self, element):
        self.pushed_back.append(element)

it = Back_pushable_iterator(x for x in range(10))
x = next(it) # 0
print(x)
it.push_back(x)
x = next(it) # 0
print(x)
x = next(it) # 1
print(x)
x = next(it) # 2
y = next(it) # 3
print(x)
print(y)
it.push_back(y)
it.push_back(x)
x = next(it) # 2
y = next(it) # 3
print(x)
print(y)
for x in it:
    print(x) # 4-9
Although itertools.chain() is the natural tool for the job here, beware of loops like this:
for elem in gen:
    ...
    peek = next(gen)
    gen = itertools.chain([peek], gen)
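...because this will consume a linearly growing amount of memory and eventually grind to a halt. (This code essentially seems to create a linked list, one node per chain() call.) I know this not because I inspected the libs but because it caused a major slowdown of my program; getting rid of the gen = itertools.chain([peek], gen) line sped it up again. (Python 3.3)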
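One way to avoid stacking a new chain object on every iteration is to keep the peeked value in a single slot on one wrapper object. The following is only a sketch of that idea; the class name OneSlotPeeker and its sentinel are made up for illustration and do not come from the answer.

```python
class OneSlotPeeker:
    """Hypothetical wrapper that holds at most one peeked item."""

    _EMPTY = object()  # sentinel meaning "nothing has been peeked"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._slot = self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        # Hand out the peeked item first, if there is one.
        if self._slot is not self._EMPTY:
            value, self._slot = self._slot, self._EMPTY
            return value
        return next(self._it)

    def peek(self, default=None):
        # Fill the slot on demand; repeated peeks reuse the same slot.
        if self._slot is self._EMPTY:
            try:
                self._slot = next(self._it)
            except StopIteration:
                return default
        return self._slot

gen = OneSlotPeeker(range(5))
seen = []
for elem in gen:
    seen.append((elem, gen.peek()))
print(seen)
```

Because the wrapper is created once, memory use stays constant however many times peek is called inside the loop.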
Python 3 snippet for @jonathan-hartley's answer:
def peek(iterator, eoi=None):
    iterator = iter(iterator)
    try:
        prev = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    for elm in iterator:
        yield prev, elm
        prev = elm
    yield prev, eoi

for curr, nxt in peek(range(10)):
    print((curr, nxt))
# (0, 1)
# (1, 2)
# (2, 3)
# (3, 4)
# (4, 5)
# (5, 6)
# (6, 7)
# (7, 8)
# (8, 9)
# (9, None)
It would be straightforward to create a class that does this in __iter__, yields just the prev item, and puts elm in some attribute.
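A minimal sketch of such a class might look like the following. The name PeekIter and the attribute next_item are assumptions chosen for illustration; the answer leaves both unspecified.

```python
class PeekIter:
    """Hypothetical class: iterate normally, expose the upcoming item as .next_item."""

    def __init__(self, iterable, eoi=None):
        self._iterable = iterable
        self._eoi = eoi
        self.next_item = eoi

    def __iter__(self):
        iterator = iter(self._iterable)
        try:
            prev = next(iterator)
        except StopIteration:
            return  # empty input: nothing to yield
        for elm in iterator:
            self.next_item = elm   # what will be yielded on the next step
            yield prev
            prev = elm
        self.next_item = self._eoi  # last item: nothing follows
        yield prev

p = PeekIter(range(4))
pairs = [(curr, p.next_item) for curr in p]
print(pairs)
```

The loop variable is then just the current item, and the lookahead is read from the object only when it is needed.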
WRT @David Z's post, the newer seekable tool from more-itertools can wrap an iterator and reset it to a previous position.
>>> import more_itertools as mit
>>> s = mit.seekable(range(3))
>>> next(s)
# 0
>>> s.seek(0)  # reset iterator
>>> next(s)
# 0
>>> next(s)
# 1
>>> s.seek(1)
>>> next(s)
# 1
>>> next(s)
# 2