Question

I have a large data set that I need to manipulate with NumPy. It contains strings that I will need for downstream processing. When I converted the data into a structured array, I specified that the label field was a string. The array was created without any errors; however, when I try to convert it back into a list of nested lists, my string data is gone. Here is a sample:

import numpy as np

data = [
    [100.0, 400.0, 'stringhere'],
    [200.0, 500.0, 'another sting'],
]

npdata = np.array(map(tuple, data),
                dtype=([('x', 'float64'), ('y', 'float64'), ('label', 'S'), ])
)

for entry in npdata:
    print list(entry)

This prints:

[100.0, 400.0, '']
[200.0, 500.0, '0']

I'm new to structured arrays, so I assume I either specified my data type incorrectly or misunderstand how structured arrays handle strings. How do I get my string data out of a structured array?


Solution

You need to specify the size, in bytes, of your string dtype. Otherwise, NumPy is setting the size to 1 byte:

In [44]: npdata['label'].dtype
Out[44]: dtype('S1')

and silently truncating your data.
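To see the truncation directly, here is a small sketch written for Python 3 (an assumption; the code in the question is Python 2). It uses an explicit 'S1' field rather than a bare 'S', so the field width does not depend on how a given NumPy version interprets an unsized string type:

```python
import numpy as np

# A one-byte string field: each label keeps only its first byte.
dtype = [('x', 'float64'), ('y', 'float64'), ('label', 'S1')]
rows = [(100.0, 400.0, 'stringhere'), (200.0, 500.0, 'another sting')]
truncated = np.array(rows, dtype=dtype)

print(truncated['label'].dtype.itemsize)  # 1
print(truncated['label'].tolist())        # [b's', b'a']
```

No error or warning is raised; the extra characters are simply dropped.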

So, for example, if you replace S with |S20, the string field will hold strings of up to 20 bytes (anything longer is still truncated):

npdata = np.array(map(tuple, data),
                dtype=([('x', 'float64'), ('y', 'float64'), ('label', '|S20'), ]))

for entry in npdata:
    print list(entry)

yields:

[100.0, 400.0, 'stringhere']
[200.0, 500.0, 'another sting']
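On Python 3, a variation that may be more convenient is sketched below. It makes two choices that differ from the answer above. First, it sizes the field from the longest label instead of hard-coding 20. Second, it uses NumPy's 'U' (Unicode) type instead of 'S', so labels come back as str rather than bytes. It also builds a list of tuples rather than passing map() directly, because in Python 3 map() returns a lazy iterator:

```python
import numpy as np

data = [
    [100.0, 400.0, 'stringhere'],
    [200.0, 500.0, 'another sting'],
]

# Size the text field from the longest label so nothing is truncated.
# 'U' stores Unicode text, so values come back as Python str.
width = max(len(row[2]) for row in data)
dtype = [('x', 'float64'), ('y', 'float64'), ('label', f'U{width}')]

# Structured rows must be tuples; build a real list, not a map object.
npdata = np.array([tuple(row) for row in data], dtype=dtype)

# tolist() turns each record into a tuple of plain Python scalars.
nested = [list(record) for record in npdata.tolist()]
print(nested)  # [[100.0, 400.0, 'stringhere'], [200.0, 500.0, 'another sting']]
```

If new, longer labels may be added to the array later, choose a width with some headroom, because anything beyond the field size is cut off silently.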
Licensed under: CC-BY-SA with attribution