Question

I don't understand why numpy.genfromtxt doesn't split the following string correctly using delimiter="," while it works for most of the other strings in my chunk.

chunk[12968]
Out[143]: '2901869281,3279442095,2012-12-15T23:00:00.003Z,Sacramento,CA,R#3817874,United States,38.583,-121.498,11, 8, 6, 5, 1, 0, 2, 3, 3, 5, 3, 3, 2, 2, 6, 6, 1, 2, 3, 0, 1, 1, 0, 0, 2, 2, 2, 2, 1, 0, 0, 2, 1, 0, 1, 1, 2, 0, 3, 1, 1, 1, 1, 0, 0, 4, 0, 0, 0, 1, 3, 1, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 9, 0, 0, 0, 2, 3, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,130\n'

I would expect an array of shape (110,), but I get the following instead:

genfromtxt([chunk[12968]],delimiter=",",dtype=np.int64)
Out[142]: 
array([2901869281, 3279442095,         -1,         -1,         -1,
               -1], dtype=int64)

Note that I am using izip_longest from itertools to read a large CSV file in chunks, like this:

from itertools import izip_longest

with open('events.csv', 'r') as f:
    for chunk in izip_longest(*[f] * 50000):
        ...

Thanks for help.


Solution

The comments argument to genfromtxt() defaults to '#', so everything from the # onward in your input is ignored. Only the six fields before it are parsed, which is why you get a six-element array:

2901869281,3279442095,2012-12-15T23:00:00.003Z,Sacramento,CA,R#3817874,United States,...
                                                              ^ start of comment
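
One way to avoid this is to turn comment handling off by passing comments=None. The following is a minimal sketch rather than a drop-in fix: the input line is a shortened, made-up stand-in for the row in the question, and it assumes that non-numeric fields can simply be left to fall back to the integer fill value, as in the output shown in the question.

```python
import numpy as np

# Shortened, made-up stand-in for the problem row: one field contains a '#'.
line = "10,20,CA,R#3817874,United States,7,8\n"

# Default comments='#': the line is cut at the '#', so only the
# fields 10, 20, CA and R are parsed.
truncated = np.genfromtxt([line], delimiter=",", dtype=np.int64)

# comments=None disables comment stripping, so all 7 fields are split.
full = np.genfromtxt([line], delimiter=",", dtype=np.int64, comments=None)

print(truncated.shape)
print(full.shape)
```

If the file does contain real comment lines, another option is to set comments to a string that never occurs in the data instead of None.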
Licensed under: CC-BY-SA with attribution