Question

I would appreciate any help please :)

I'm trying to create a record array from a 1D array of strings and a 2D array of numbers (so I can use np.savetxt to dump it into a file). Unfortunately, the docs for np.core.records.fromarrays aren't very informative.

>>> import numpy as np
>>> x = ['a', 'b', 'c']
>>> y = np.arange(9).reshape((3,3))
>>> print x
['a', 'b', 'c']
>>> print y
[[0 1 2]
 [3 4 5]
 [6 7 8]]
>>> records = np.core.records.fromarrays([x,y])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/core/records.py", line 560, in fromarrays
    raise ValueError, "array-shape mismatch in array %d" % k
ValueError: array-shape mismatch in array 1

And the output I need is:

[['a', 0, 1, 2]
 ['b', 3, 4, 5]
 ['c', 6, 7, 8]]

Solution

If all you wish to do is dump x and y to a CSV file, then it is not necessary to use a recarray. If, however, you have some other reason for wanting a recarray, here is how you could create it:

import numpy as np
import numpy.lib.recfunctions as recfunctions

x = np.array(['a', 'b', 'c'], dtype=[('x', '|S1')])
y = np.arange(9).reshape((3,3))
y = y.view([('', y.dtype)]*3)

z = recfunctions.merge_arrays([x, y], flatten=True)
# [('a', 0, 1, 2) ('b', 3, 4, 5) ('c', 6, 7, 8)]

np.savetxt('/tmp/out', z, fmt='%s')

writes

a 0 1 2
b 3 4 5
c 6 7 8

to /tmp/out.
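
If you only need the file and not the recarray, as mentioned at the top of this answer, a minimal sketch with the standard csv module could look like this. The in-memory buffer and the space delimiter are my own assumptions, chosen to mirror the output shown above; swap in a real file handle as needed.

```python
import csv
import io

import numpy as np

x = ['a', 'b', 'c']
y = np.arange(9).reshape((3, 3))

# Write to an in-memory buffer for illustration; open('/tmp/out', 'w', newline='')
# would work the same way for a real file.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=' ', lineterminator='\n')

# Pair each label with its row of numbers and write them as one line.
for label, row in zip(x, y):
    writer.writerow([label] + row.tolist())

text = buf.getvalue()
print(text)
# a 0 1 2
# b 3 4 5
# c 6 7 8
```

This avoids structured dtypes entirely, at the cost of a Python-level loop over the rows.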


Alternatively, to use np.core.records.fromarrays you would need to list each column of y separately, so the input passed to fromarrays is, as the doc says, a "flat list of arrays".

x = ['a', 'b', 'c']
y = np.arange(9).reshape((3,3))
z = np.core.records.fromarrays([x] + [y[:,i] for i in range(y.shape[1])])

Each item in the list passed to fromarrays will become one column of the resultant recarray. You can see this by inspecting the source code:

_array = recarray(shape, descr)

# populate the record array (makes a copy)
for i in range(len(arrayList)):
    _array[_names[i]] = arrayList[i]

return _array
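
To see that one-item-per-column behavior in practice, here is a small sketch. It uses np.rec.fromarrays, which is the public alias for the same function, and the field names label, c0, c1, c2 are made up for illustration (they are not from the question).

```python
import numpy as np

x = ['a', 'b', 'c']
y = np.arange(9).reshape((3, 3))

# One 1D array per column: the labels first, then each column of y.
columns = [np.array(x)] + [y[:, i] for i in range(y.shape[1])]

# The names are hypothetical; without them NumPy assigns f0, f1, ...
z = np.rec.fromarrays(columns, names='label,c0,c1,c2')

print(z.dtype.names)     # ('label', 'c0', 'c1', 'c2')
print(z.label.tolist())  # ['a', 'b', 'c']
print(z.c1.tolist())     # [1, 4, 7]
```

Each field of z holds exactly one of the arrays from the list, which is why the 2D y has to be split into its columns first.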

By the way, you might want to use pandas here for the extra convenience (no mucking around with dtypes, flattening, or iterating over columns required):

import numpy as np
import pandas as pd

x = ['a', 'b', 'c']
y = np.arange(9).reshape((3,3))

df = pd.DataFrame(y)
df['x'] = x

print(df)
#    0  1  2  x
# 0  0  1  2  a
# 1  3  4  5  b
# 2  6  7  8  c

df.to_csv('/tmp/out')
# ,0,1,2,x
# 0,0,1,2,a
# 1,3,4,5,b
# 2,6,7,8,c
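
If you want the file to match the layout asked for in the question (labels first, no header, no index), something along these lines should work. This is only a sketch; the in-memory buffer is an assumption for illustration and can be replaced by a file path.

```python
import io

import numpy as np
import pandas as pd

x = ['a', 'b', 'c']
y = np.arange(9).reshape((3, 3))

df = pd.DataFrame(y)
# Put the string column in front instead of appending it at the end.
df.insert(0, 'x', x)

# Drop the header and index, and separate fields with a space.
buf = io.StringIO()
df.to_csv(buf, sep=' ', header=False, index=False)

text = buf.getvalue()
print(text)
```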
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow