Question

I have a huge data file (~2 GB) that needs to be split into odd and even lines, which are then processed separately and written to two files. I don't want to read the whole file into RAM, so I think a generator would be a suitable choice. In short, I want to do something like this:

lines = (l.strip() for l in open(inputfn))
oddlines = somefunction(getodds(lines))
evenlines = somefunction(getevens(lines))
outodds.write(oddlines)
outevens.write(evenlines)

Is this possible? Apparently indexing will not work:

In [75]: lines[::2]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/kaiyin/Phased/build37/chr22/segments/segment_1/<ipython-input-75-97be680d00e3> in <module>()
----> 1 lines[::2]

TypeError: 'generator' object is not subscriptable

Solution

def oddlines(fileobj):
    return (line for index,line in enumerate(fileobj) if index % 2)

def evenlines(fileobj):
    return (line for index,line in enumerate(fileobj) if not index % 2)

Note that this will require scanning the file twice: each generator consumes the file object it is given, so the two cannot share a single pass over the same open file. It does, however, lead to much less complex code. (Also note that an 'odd' line here is one with an index of 1, 3, 5, and so on, which means that the first line of the file is an 'even' line due to zero-indexing.)
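A minimal sketch of the two-pass usage, assuming the two functions above. The helper name split_two_pass and its file name arguments are hypothetical and only there for illustration; the input is reopened for the second pass so that enumeration restarts at index 0:

```python
def oddlines(fileobj):
    # Lines at zero-based index 1, 3, 5, ...
    return (line for index, line in enumerate(fileobj) if index % 2)

def evenlines(fileobj):
    # Lines at zero-based index 0, 2, 4, ...
    return (line for index, line in enumerate(fileobj) if not index % 2)

def split_two_pass(inputfn, oddfn, evenfn):
    # Hypothetical helper: first pass streams the odd-indexed lines.
    with open(inputfn) as src, open(oddfn, "w") as out:
        out.writelines(oddlines(src))
    # Second pass: reopen the input so the index starts from 0 again.
    with open(inputfn) as src, open(evenfn, "w") as out:
        out.writelines(evenlines(src))
```

Only one line is held in memory at a time, since writelines pulls lines from the generator lazily.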

As Ashwini notes, you could also use itertools.islice to do this.

OTHER TIPS

Use itertools.islice to slice an iterator:

from itertools import islice
with open('filename') as f1, open('evens.txt', 'w') as f2:
    for line in islice(f1, 0, None, 2):
        f2.write(line)

with open('filename') as f1, open('odds.txt', 'w') as f2:
    for line in islice(f1, 1, None, 2):
        f2.write(line)

If you want to read the file just once, write a generator that wraps a file and returns a flag indicating whether the line is even or odd along with the actual line read from the file.

def oddeven(f, even=True):
    for line in f:
        yield even, line
        even = not even

Usage:

with open("infile.txt") as infile, \
     open("odd.txt", "w") as oddfile, \
     open("even.txt", "w") as evenfile:
    for even, line in oddeven(infile):
        if even:
            evenfile.write(line)
        else:
            oddfile.write(line)

This can be further simplified by storing the output file objects in an indexable container:

with open("infile.txt") as infile, \
     open("odd.txt", "w") as oddfile, \
     open("even.txt", "w") as evenfile:
    # False indexes as 0 (odd file), True indexes as 1 (even file)
    outfiles = (oddfile, evenfile)
    for even, line in oddeven(infile):
        outfiles[even].write(line)
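
The question also asks for the two streams to be processed separately. One way this might fit into the single-pass approach is to keep a per-parity processing function next to each output file. This is only a rough sketch: split_one_pass and the process_odd / process_even parameters are hypothetical names, and the default of str.strip is an assumption borrowed from the l.strip() call in the question.

```python
def oddeven(f, even=True):
    # Yield (flag, line) pairs, flipping the flag after every line.
    for line in f:
        yield even, line
        even = not even

def split_one_pass(infile, oddfile, evenfile,
                   process_odd=str.strip, process_even=str.strip):
    # Hypothetical helper. Index 0 (False) is the odd target,
    # index 1 (True) is the even target.
    targets = ((oddfile, process_odd), (evenfile, process_even))
    for even, line in oddeven(infile):
        out, process = targets[even]
        # Each processing function receives one raw line; a newline is
        # added back because the default processing strips it.
        out.write(process(line) + "\n")
```

This works only when the processing can be done line by line. If somefunction needs to see a whole stream at once, the two-pass variants are the simpler fit.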
Licensed under: CC-BY-SA with attribution