Question

I generate data using numpy.genfromtxt like this:

from datetime import datetime
import numpy

# Parse day/month/year strings in the first two columns into datetime objects
ConvertToDate = lambda s: datetime.strptime(s, "%d/%m/%Y")
data = numpy.genfromtxt(open("PSECSkew.csv", "rb"),
                        delimiter=',',
                        dtype=[('CalibrationDate', datetime), ('Expiry', datetime), ('B0', float), ('B1', float), ('B2', float), ('ATMAdjustment', float)],
                        converters={0: ConvertToDate, 1: ConvertToDate})

I now want to extract the last 4 columns of each row into separate variables. This happens in a loop, so let's just consider a single row. So I do this:

    B0 = data[0][2]
    B1 = data[0][3]
    B2 = data[0][4]
    ATM = data[0][5]

But if I can do this (like I could with a normal 2D ndarray for example) I would prefer it:

    B0, B1, B2, ATM = data[0][2:]

But this gives me an 'invalid index' error. Is there a way to do this nicely or should I stick with the 4 line approach?


Solution

The output of np.genfromtxt here is a structured array, that is, a 1D array in which each row has several named fields. A single row is a record rather than a 1D array, which is why slicing it with [2:] fails.
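To make the shape of a structured array concrete, here is a minimal sketch. The two rows of values are made up for illustration, and the date columns are left out to keep it short.

```python
import numpy as np

# Hypothetical data using the float field names from the question.
dtype = [("B0", float), ("B1", float), ("B2", float), ("ATMAdjustment", float)]
data = np.array([(0.1, 0.2, 0.3, 0.4), (1.1, 1.2, 1.3, 1.4)], dtype=dtype)

print(data.ndim)         # number of dimensions of the array
print(data.shape)        # one entry per row
print(data.dtype.names)  # field names, in column order

row = data[0]            # a single record, not a 1D float array
print(type(row))
```

The columns live in the dtype rather than in a second axis, so there is no second dimension to slice.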

If you want to access some fields, just access them by names:

data["B0"], data["B1"], ...
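As a small sketch of what access by name returns (the values are invented for illustration), a single field comes back as an ordinary array that supports the usual numeric operations:

```python
import numpy as np

# Hypothetical two-field array; only the field names come from the question.
data = np.array([(0.1, 0.2), (1.1, 1.2)], dtype=[("B0", float), ("B1", float)])

b0 = data["B0"]    # one value per row, as a plain float array
print(b0)
print(b0.dtype)
print(b0.mean())   # regular ndarray methods are available
```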

You can also group them:

data[["B0", "B1"]]

which gives you a 'new' structured array containing only the fields you asked for. The quotes around 'new' are there because, in NumPy 1.16 and later, the result is a view onto your initial array rather than a copy (older versions returned a copy).

Should you want some specific 'rows', just do:

data[["B0","B1"]][0]

which outputs the first row. Slicing and fancy indexing work too.
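For illustration, here is a sketch of row selection on a multi-field subset. The three rows are made-up values, and the variable names (subset, picked, masked) are only placeholders.

```python
import numpy as np

# Hypothetical data using the float field names from the question.
dtype = [("B0", float), ("B1", float), ("B2", float), ("ATMAdjustment", float)]
data = np.array(
    [(0.1, 0.2, 0.3, 0.4), (1.1, 1.2, 1.3, 1.4), (2.1, 2.2, 2.3, 2.4)],
    dtype=dtype,
)

subset = data[["B0", "B1"]]      # keep two fields, all rows

first_two = subset[:2]           # slicing selects rows
picked = subset[[0, 2]]          # fancy indexing with a list of row numbers
masked = subset[data["B0"] > 1]  # boolean mask built from one field

print(first_two)
print(picked)
print(masked)
```

In every case the index selects rows; fields are always selected by name.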

So, for your example:

B0, B1, B2, ATM = data[["B0","B1","B2","ATMAdjustment"]][0]
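If you would still rather slice by position, as in the question, one possible workaround is to convert the record to a tuple first, since tuples accept slices. This is a sketch with one invented row; the dates and numbers are placeholders, and the date fields are declared as object here.

```python
from datetime import datetime

import numpy as np

# Hypothetical single-row array with the same field layout as the question.
dtype = [
    ("CalibrationDate", object), ("Expiry", object),
    ("B0", float), ("B1", float), ("B2", float), ("ATMAdjustment", float),
]
data = np.array(
    [(datetime(2012, 1, 2), datetime(2012, 6, 15), 0.1, 0.2, 0.3, 0.4)],
    dtype=dtype,
)

# A record cannot be sliced directly, but a tuple built from it can.
B0, B1, B2, ATM = tuple(data[0])[2:]
print(B0, B1, B2, ATM)
```

This depends on column order, so selecting by field name remains the more robust choice if the file layout may change.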

If you want to access only those fields row after row, I would suggest storing the array of the fields you want first, then iterating over it:

filtered_data = data[["B0","B1","B2","ATMAdjustment"]]
for row in filtered_data:
    (B0, B1, B2, ATM) = row
    do_something

or even:

for (B0, B1, B2, ATM) in filtered_data:
    do_something
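Putting the pieces together, here is a self-contained sketch of the loop. It assumes a simplified file with only the four float columns, supplied as in-memory text instead of PSECSkew.csv, and it sums each row as a stand-in for do_something.

```python
import io

import numpy as np

# Hypothetical CSV content; the real file also has two leading date columns.
csv_text = "0.1,0.2,0.3,0.4\n1.1,1.2,1.3,1.4\n"

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    dtype=[("B0", float), ("B1", float), ("B2", float), ("ATMAdjustment", float)],
)

filtered_data = data[["B0", "B1", "B2", "ATMAdjustment"]]

totals = []
for (B0, B1, B2, ATM) in filtered_data:
    # Placeholder for the real per-row work.
    totals.append(B0 + B1 + B2 + ATM)

print(totals)
```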
Licensed under: CC-BY-SA with attribution