Question

I am importing data using numpy.genfromtxt, and I would like to add a field of values derived from some of the fields already in the dataset. Since the result is a structured array, the simplest and most efficient way to add a new column seems to be numpy.lib.recfunctions.append_fields(). I found a good description of this library HERE.

Is there a way of doing this without copying the array, perhaps by forcing genfromtxt to create an empty column to which I can append derived values?
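For reference, a minimal sketch of the append_fields approach mentioned above. The small array stands in for the genfromtxt result, and the field names 'a', 'b' and 'total' are made up for illustration. It shows that the call returns a new array and leaves the original untouched:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# A small structured array standing in for the genfromtxt result;
# the field names 'a' and 'b' are hypothetical.
data = np.array([(1.0, 11.0), (2.0, 22.0), (3.0, 33.0)],
                dtype=[('a', 'f8'), ('b', 'f8')])

# append_fields builds and returns a new array with the extra field;
# `data` itself keeps its original two fields.
extended = rfn.append_fields(data, 'total', data['a'] + data['b'],
                             usemask=False)

print(extended.dtype.names)
print(data.dtype.names)
```

Passing usemask=False asks for a plain ndarray rather than a masked array.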


Solution

Here's a simple example that uses a generator to add a field while reading a data file with genfromtxt.

Our example data file will be data.txt with the contents:

1,11,1.1
2,22,2.2
3,33,3.3

Reading it directly gives:

In [19]: np.genfromtxt('data.txt',delimiter=',')
Out[19]:
array([[  1. ,  11. ,   1.1],
       [  2. ,  22. ,   2.2],
       [  3. ,  33. ,   3.3]])

If we make a generator such as:

def genfield():
    # Prepend a placeholder field ('0,') to every line of data.txt
    with open('data.txt') as f:
        for line in f:
            yield '0,' + line

which prepends a comma-delimited 0 to each line of the file, then:

In [22]: np.genfromtxt(genfield(),delimiter=',')
Out[22]:
array([[  0. ,   1. ,  11. ,   1.1],
       [  0. ,   2. ,  22. ,   2.2],
       [  0. ,   3. ,  33. ,   3.3]])

You can do the same thing with a generator expression:

In [26]: np.genfromtxt(('0,'+line for line in open('data.txt')),delimiter=',')
Out[26]:
array([[  0. ,   1. ,  11. ,   1.1],
       [  0. ,   2. ,  22. ,   2.2],
       [  0. ,   3. ,  33. ,   3.3]])

OTHER TIPS

I was trying to make genfromtxt read this:

11,12,13,14,15
21,22,
31,32,33,34,35
41,42,43,,45

using:

import numpy as np
print(np.genfromtxt('tmp.txt', delimiter=',', filling_values='0'))

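but it did not work, because the second line has fewer delimiters than the others, so genfromtxt sees an inconsistent number of columns. I had to edit the input, adding commas to mark the empty columns: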

11,12,13,14,15
21,22,,,
31,32,33,34,35
41,42,43,,45

then it worked, returning:

[[ 11.  12.  13.  14.  15.]
 [ 21.  22.   0.   0.   0.]
 [ 31.  32.  33.  34.  35.]
 [ 41.  42.  43.   0.  45.]]
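
If editing the file by hand is not an option, the generator idea from the accepted answer could be reused to pad short rows on the fly. This is only a sketch: pad_rows and NCOLS are hypothetical names, the column count is assumed to be known in advance, and the input is held in a string so the snippet is self-contained:

```python
import io
import numpy as np

# The ragged input from above, held in memory for a self-contained sketch.
raw = "11,12,13,14,15\n21,22,\n31,32,33,34,35\n41,42,43,,45\n"

NCOLS = 5  # assumed number of columns; not detected automatically


def pad_rows(lines, ncols=NCOLS, delimiter=','):
    """Yield each line with enough trailing delimiters to reach ncols fields."""
    for line in lines:
        line = line.rstrip('\n')
        missing = (ncols - 1) - line.count(delimiter)
        yield line + delimiter * max(missing, 0) + '\n'


result = np.genfromtxt(pad_rows(io.StringIO(raw)),
                       delimiter=',', filling_values=0)
print(result)
```

For a real file, the io.StringIO(raw) argument would be replaced by an open file object.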
Licensed under: CC-BY-SA with attribution