How do I properly pass string data to a NumPy record array (and get it out)?

StackOverflow https://stackoverflow.com/questions/23616165

  •  21-07-2023

Question

I have a large data set that I need to manipulate with NumPy. It contains strings that I will need for downstream processing. When I converted the data into a structured array, I specified that the field was a string. The record array was created without any errors; however, when I convert the data back into a list of nested lists, my string data is gone. Here is a sample:

import numpy as np

data = [
    [100.0, 400.0, 'stringhere'],
    [200.0, 500.0, 'another sting'],
]

npdata = np.array(map(tuple, data),
                dtype=([('x', 'float64'), ('y', 'float64'), ('label', 'S'), ])
)

for entry in npdata:
    print list(entry)

This prints:

[100.0, 400.0, '']
[200.0, 500.0, '0']

I'm new to structured arrays, so I'm assuming I either specified my data type incorrectly or I'm misunderstanding how structured arrays deal with strings. How do I get my string data out of a structured array?


Solution

You need to specify the number of bytes in your string dtype. Otherwise, NumPy sets the number of bytes to 1:

In [44]: npdata['label'].dtype
Out[44]: dtype('S1')

and truncates your data.
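To see the effect of the field width in isolation, here is a minimal sketch that requests a one-byte field explicitly and compares it with a 20-byte field. It is written for Python 3, where 'S' fields come back as bytes objects. The names record, narrow, and wide are illustrative only and are not part of the question's code.

```python
import numpy as np

# One record, passed as a list of tuples as structured arrays expect.
record = [(100.0, 400.0, 'stringhere')]

# Same layout twice; only the width of the 'label' field differs.
narrow = np.array(record, dtype=[('x', 'float64'), ('y', 'float64'), ('label', 'S1')])
wide = np.array(record, dtype=[('x', 'float64'), ('y', 'float64'), ('label', 'S20')])

# itemsize reports how many bytes each field reserves per record.
narrow_width = narrow.dtype['label'].itemsize
wide_width = wide.dtype['label'].itemsize

# The narrow field silently keeps only what fits; the wide one keeps it all.
narrow_label = narrow['label'][0]
wide_label = wide['label'][0]

print(narrow_width, narrow_label)
print(wide_width, wide_label)
```

No error or warning is raised when the value does not fit, so checking the field's dtype is the quickest way to spot this problem.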

So, for example, if you replace 'S' with '|S20', the string field will hold strings of up to 20 bytes (the leading | only marks byte order as not applicable, so 'S20' is equivalent):

npdata = np.array(map(tuple, data),
                dtype=([('x', 'float64'), ('y', 'float64'), ('label', '|S20'), ]))

for entry in npdata:
    print list(entry)

yields:

[100.0, 400.0, 'stringhere']
[200.0, 500.0, 'another sting']
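
The code above is Python 2. As a hedged sketch of the same round trip under Python 3, the version below makes two assumptions that go beyond the original answer: it uses a unicode 'U' field so the labels come back as str rather than bytes, and it derives the field width from the longest label instead of hard-coding 20. The names width and rows are illustrative.

```python
import numpy as np

data = [
    [100.0, 400.0, 'stringhere'],
    [200.0, 500.0, 'another sting'],
]

# Size the text field from the longest label instead of guessing a width.
width = max(len(row[2]) for row in data)

# 'U<n>' is a fixed-width unicode field of n characters; 'S<n>' would store bytes.
dtype = [('x', 'float64'), ('y', 'float64'), ('label', 'U%d' % width)]

# map() returns an iterator in Python 3, so build the list of tuples explicitly.
npdata = np.array([tuple(row) for row in data], dtype=dtype)

# tolist() converts each record to a tuple of plain Python values.
rows = [list(record) for record in npdata.tolist()]

for row in rows:
    print(row)
```

If the byte-string 'S' dtype is kept instead, the labels are returned as bytes under Python 3 and need .decode() before they can be used as text.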
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow