Question

What is the correct way to chunk a list of the following form: ["record_a:", "x"*N, "record_b:", "y"*M, ...], i.e. a list where the start of each record is marked by a string ending in ":", and each record includes all the elements up to the next record header? So the following list:

["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]

would be split into:

[["record_a", "a", "b"], ["record_b", "1", "2", "3", "4"]]

The list contains an arbitrary number of records, and each record contains an arbitrary number of list items (up until the next record begins or there are no more records). How can this be done efficiently?


Solution

Use a generator:

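def chunkRecords(records):
    record = []
    for r in records:
        # a string ending in ':' starts a new record
        if r.endswith(':'):
            if record:
                yield record
            record = [r[:-1]]  # drop the trailing ':'
        else:
            record.append(r)
    # emit the final record, if any
    if record:
        yield record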

Then loop over that:

for record in chunkRecords(records):
    print(record)  # record is a list, e.g. ['record_a', 'a', 'b']

or turn it into a list again:

records = list(chunkRecords(records))

The latter results in:

>>> records = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
>>> records = list(chunkRecords(records))
>>> records
[['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]
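
Because the generator only looks at one item at a time, it should also work on inputs that are not lists, such as another generator or a file object. Below is a minimal sketch of that idea. The names chunk_records and endless_stream are hypothetical and not part of the answer above; endless_stream stands in for any large or unbounded source.

```python
from itertools import islice

def chunk_records(items):
    """Yield one list per record; a record starts at a string ending in ':'."""
    record = []
    for item in items:
        if item.endswith(":"):
            if record:
                yield record
            record = [item[:-1]]
        else:
            record.append(item)
    if record:
        yield record

def endless_stream():
    """Hypothetical input: an unbounded stream of records, for illustration only."""
    n = 0
    while True:
        yield f"record_{n}:"
        yield str(n)
        yield str(n + 1)
        n += 1

# Only as much of the stream is consumed as is needed for two records.
first_two = list(islice(chunk_records(endless_stream()), 2))
print(first_two)  # [['record_0', '0', '1'], ['record_1', '1', '2']]
```

Calling list() on the generator, as shown earlier, only makes sense when the input is finite.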

OTHER TIPS

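lst = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
out = []
for x in lst:
    if x.endswith(':'):
        # a header starts a new record; drop the trailing ':'
        out.append([x[:-1]])
    else:
        # assumes the list starts with a header, otherwise out[-1] raises IndexError
        out[-1].append(x)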
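
The loop above relies on the first element being a record header. Here is a hedged sketch of the same idea wrapped in a function, with an explicit check for that case. The name chunk_list and the choice to raise ValueError are assumptions made for illustration; skipping stray items would be an equally valid policy.

```python
def chunk_list(items):
    """Group items into records, one list per header string ending in ':'."""
    out = []
    for x in items:
        if x.endswith(":"):
            # start a new record, dropping the trailing ':'
            out.append([x[:-1]])
        elif out:
            out[-1].append(x)
        else:
            # assumption: an item with no preceding header is treated as an error
            raise ValueError(f"item {x!r} appears before any record header")
    return out

print(chunk_list(["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]))
# [['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]
```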

Okay, here's my end-of-work-day crazy itertools solution:

>>> from itertools import groupby
>>> d = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
>>> groups = (list(g) for _, g in groupby(d, lambda x: x.endswith(":")))
>>> paired = zip(groups, groups)  # pair each header group with the item group that follows it
>>> combined = [[a[0][:-1]] + b for a, b in paired]
>>> 
>>> combined
[['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]

(Done more as an example of the sorts of things one can do than as a piece of code I'd necessarily use.)
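
One caveat with the groupby approach: it groups consecutive items by whether they end in ":", so two headers in a row (a record with no items) land in the same group. The sketch below illustrates this. The function name chunk_with_groupby is hypothetical, and it pairs groups with zip rather than next()/count().

```python
from itertools import groupby

def chunk_with_groupby(items):
    """Pair each run of header strings with the run of items that follows it."""
    groups = (list(g) for _, g in groupby(items, lambda x: x.endswith(":")))
    # zip pulls alternately from the same generator: header run, then item run
    return [[head[0][:-1]] + body for head, body in zip(groups, groups)]

print(chunk_with_groupby(["record_a:", "a", "b", "record_b:", "1"]))
# [['record_a', 'a', 'b'], ['record_b', '1']]

# "empty:" and "record_b:" form one header run; only the first header is used,
# so the item "1" ends up under 'empty' and 'record_b' is lost.
print(chunk_with_groupby(["empty:", "record_b:", "1"]))
# [['empty', '1']]
```

If empty records can occur in the input, the plain loop or generator versions are the safer choice.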

from itertools import groupby, chain

l = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]

# zip replaces Python 2's itertools.izip; in Python 3 the built-in zip is already lazy
[list(chain([x[0][0].strip(':')], x[1])) for x in zip(*[(list(g)
            for _, g in groupby(l, lambda x: x.endswith(':')))]*2)]

out:

[['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]
Licensed under: CC-BY-SA with attribution