Question

I'm trying to read a column of numbers into Python with the csv module. I get the following behavior:

import csv

f=open('myfile.txt','r')
reader=csv.reader(f)
print [x for x in reader] #  This outputs the contents of "myfile.txt",
                          #  broken up by line.
print [x for x in reader] #  This line prints an empty list.

Why is this happening? Is there some reason the reader object can only be used once?


Solution

Same reason here:

>>> li=[1,2,3,4,5,6,7,8,9]
>>> it=iter(li)
>>> print [x for x in it], [x for x in it]
[1, 2, 3, 4, 5, 6, 7, 8, 9] []

Note the empty list...

csv.reader returns an iterator, an object that produces items from a container or sequence one by one until the StopIteration exception indicates there are no more items.
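To make the exhaustion visible, here is a minimal sketch (Python 3 syntax) that drives a csv.reader by hand with next(). The in-memory io.StringIO data is an assumption that stands in for myfile.txt; it is not from the question.

```python
import csv
import io

# Hypothetical stand-in for "myfile.txt": three lines, one number each.
data = io.StringIO("1\n2\n3\n")
reader = csv.reader(data)

rows = []
while True:
    try:
        rows.append(next(reader))   # fetch one row at a time
    except StopIteration:           # raised once the reader has no more rows
        break

leftover = list(reader)             # the reader is still exhausted here

print(rows)      # the three rows, each a list of strings
print(leftover)  # an empty list
```

A for loop or list comprehension does the same thing behind the scenes; it just catches StopIteration for you.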

For built-in types (and all library types like csv that I know of), iteration is one way, and the only way to 'go back' is to keep the items you are interested in or recreate the iterator.

You can hack/fool csv.reader by doing a backwards seek I suppose, but why do this?

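You can store the items of an iterator in a list if you need to go over them more than once. Note that the iterator has to be fresh, since the one above is already exhausted:

>>> it=iter(li)
>>> it_copy=list(it)
>>> print [x for x in it_copy],[x for x in it_copy]
[1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3, 4, 5, 6, 7, 8, 9]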

Or use itertools.tee as Mark Ransom notes.

The best approach is to design your algorithm around a single one-way trip through the iterator. That uses less memory and is often faster.
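As a rough illustration of the single-pass idea, the sketch below (Python 3 syntax) computes several statistics in one trip through the reader instead of iterating once per statistic. The sample data and the variable names count, total, largest and mean are assumptions made up for the example.

```python
import csv
import io

# Hypothetical stand-in for a file with one number per line.
data = io.StringIO("4\n8\n15\n16\n")

count = 0
total = 0.0
largest = None

# One trip through the reader: update every statistic as each row arrives.
for row in csv.reader(data):
    if not row:            # skip blank lines, which csv yields as empty lists
        continue
    value = float(row[0])
    count += 1
    total += value
    if largest is None or value > largest:
        largest = value

mean = total / count if count else 0.0
print(count, total, largest, mean)
```

Nothing is stored except the running values, so memory use stays constant regardless of file size.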

Other tips

The reason you can only go one way is that the file you passed to it only goes one way. If you want to loop over the csv file again, you can do something like this:

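>>> import csv
>>> with open("output.csv", 'r') as f:
...     r = csv.reader(f)
...     for l in r:       # first pass reads to the end of the file
...         print l
...     f.seek(0)         # rewind the underlying file
...     for l in r:       # the same reader now yields the rows again
...         print l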

That was a rather poor explanation, and unfortunately I don't know the term for "only goes one way"; perhaps someone else could help me out with the vocabulary.

When you read, you fetch the rows one by one. After you finish reading, you are at the end of the file. To read the rows again, reset the read position of the file object to its beginning:

f.seek(0)
print [x for x in reader]
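Here is a self-contained sketch of the same idea in Python 3 syntax. It assumes an in-memory io.StringIO object in place of the real file, and the names first_pass, exhausted and second_pass are only for illustration.

```python
import csv
import io

# Hypothetical stand-in for the opened file "myfile.txt".
f = io.StringIO("1\n2\n3\n")
reader = csv.reader(f)

first_pass = [row for row in reader]    # consumes the file to the end
exhausted = [row for row in reader]     # nothing left to read
f.seek(0)                               # rewind the underlying file object
second_pass = [row for row in reader]   # the same reader yields the rows again

print(first_pass)
print(exhausted)
print(second_pass)
```

Creating a new csv.reader after the seek is probably the tidier choice, since it does not depend on the old reader's internal state.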

The reader object is an iterator, and by definition iterator objects can only be used once. When they're done iterating you don't get any more out of them.

You can use itertools.tee to split an iterator into two copies, each of which can be used independently and will return the same data. Note that tee has to buffer every item that one copy has consumed and the other has not, so if you exhaust one copy before starting the other, the entire data set ends up in memory, and with a large file you might run out of memory.

import csv
import itertools

f=open('myfile.txt', 'r')
reader = csv.reader(f)
reader1, reader2 = itertools.tee(reader)
print [x for x in reader1] #  This outputs the contents of "myfile.txt"
print [x for x in reader2] #  This line prints the same thing.
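If the two copies can be advanced together, the buffer stays small. The following is a minimal sketch in Python 3 syntax, assuming in-memory sample data instead of myfile.txt; the pairs list is just a way to show that both copies see the same rows.

```python
import csv
import io
import itertools

# Hypothetical stand-in for "myfile.txt".
data = io.StringIO("1\n2\n3\n")
reader1, reader2 = itertools.tee(csv.reader(data))

# Advance both copies in lockstep. tee only has to remember rows that one
# copy has seen and the other has not, which here is at most one row.
pairs = []
for a, b in zip(reader1, reader2):
    pairs.append((a, b))

print(pairs)
```

When one copy has to be fully consumed before the other starts, calling list() on the reader once is usually simpler than tee and costs about the same memory.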
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow