How to change the dtype of an ndarray to a custom one in NumPy?

https://stackoverflow.com/questions/7644597

06-02-2021

Question

I made a dtype that is:

mytype = np.dtype([('a',np.uint8), ('b',np.uint8), ('c',np.uint8)])

so the array using this dtype:

test1 = np.zeros(3, dtype=mytype)

test1 is:

array([(0, 0, 0), (0, 0, 0), (0, 0, 0)],
      dtype=[('a', '|u1'), ('b', '|u1'), ('c', '|u1')])

Now I have test2:

test2 = np.array([[1,2,3], [4,5,6], [7,8,9]])

When I use test2.astype(mytype), the result is not what I want:

array([[(1, 1, 1), (2, 2, 2), (3, 3, 3)],
       [(4, 4, 4), (5, 5, 5), (6, 6, 6)],
       [(7, 7, 7), (8, 8, 8), (9, 9, 9)]],
      dtype=[('a', '|u1'), ('b', '|u1'), ('c', '|u1')])

I want the result to be:

array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
      dtype=[('a', '|u1'), ('b', '|u1'), ('c', '|u1')])

Is there any way? Thanks.

Solution

You can use the fromarrays function of numpy.core.records, exposed as np.rec (see the documentation):

np.rec.fromarrays(test2.T, mytype)
Out[13]: 
rec.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)], 
      dtype=[('a', '|u1'), ('b', '|u1'), ('c', '|u1')])

The array has to be transposed first because the function treats each row of the input as one column (field) of the structured array in the output. See also this question: Converting a 2D numpy array to a structured array
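To make the role of the transpose concrete, here is a minimal sketch that reuses mytype and test2 from the question. The name rec is only illustrative.

```python
import numpy as np

mytype = np.dtype([('a', np.uint8), ('b', np.uint8), ('c', np.uint8)])
test2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# fromarrays expects one array per field. Iterating over test2.T yields
# the columns of test2, so column 0 becomes field 'a', column 1 becomes
# field 'b', and column 2 becomes field 'c'.
rec = np.rec.fromarrays(test2.T, dtype=mytype)

print(rec.shape)     # one record per row of test2
print(rec['a'])      # the first column of test2
print(rec.tolist())  # the records as plain Python tuples
```

Passing test2 without the transpose would instead put the first row, 1, 2, 3, into field 'a'.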

OTHER TIPS

Because all the fields are the same type, you can also use:

>>> test2.astype(np.uint8).view(mytype).squeeze(axis=-1)
array([(1, 2, 3), (4, 5, 6), (7, 8, 9)], 
      dtype=[('a', 'u1'), ('b', 'u1'), ('c', 'u1')])

The squeeze is needed because test2 is 2-D, so the view has shape (3, 1), but you wanted a 1-D result.
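The intermediate shapes can be checked step by step. This is a small sketch using the question's mytype and test2; the names as_bytes, viewed and flat are only illustrative.

```python
import numpy as np

mytype = np.dtype([('a', np.uint8), ('b', np.uint8), ('c', np.uint8)])
test2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# view() reinterprets raw bytes, so the data must first be converted to
# one byte per element to match the three uint8 fields of mytype.
as_bytes = test2.astype(np.uint8)   # shape (3, 3)

# Each row of three bytes is reinterpreted as a single record,
# which leaves a trailing axis of length 1.
viewed = as_bytes.view(mytype)      # shape (3, 1)

# Dropping that trailing axis gives the 1-D structured array.
flat = viewed.squeeze(axis=-1)      # shape (3,)

print(as_bytes.shape, viewed.shape, flat.shape)
print(np.shares_memory(flat, as_bytes))  # the view does not copy the data
```

Because the result is a view, writing to flat also changes as_bytes. Call flat.copy() if an independent array is needed.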

When creating the array, if the input iterable contains tuples (which are immutable) rather than lists (which are mutable), NumPy treats each tuple as one record and fills the fields the way you want, as long as the number of items in each tuple equals the number of fields in the structure:

In[7]: test2 = np.array([(1,2,3), (4,5,6), (7,8,9)], dtype = mytype)

In[8]: test2
Out[8]: 
array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
      dtype=[('a', 'u1'), ('b', 'u1'), ('c', 'u1')])

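There is no need to use np.rec just for this. If, however, the input iterable contains lists rather than tuples, NumPy does not map the items onto the fields one by one as you might expect. It treats the lists as another array dimension and copies each value into every field, which is the duplication shown in the question.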
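If the data already exists as a 2-D array, as test2 does in the question, one way to apply this is to convert each row to a tuple first. This is a minimal sketch; the name structured is only illustrative.

```python
import numpy as np

mytype = np.dtype([('a', np.uint8), ('b', np.uint8), ('c', np.uint8)])
test2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# tolist() gives nested Python lists; turning each inner list into a
# tuple tells NumPy to treat that row as one record with three fields.
rows_as_tuples = [tuple(row) for row in test2.tolist()]

structured = np.array(rows_as_tuples, dtype=mytype)

print(structured)
print(structured['b'])  # the second column of test2
```

This goes through Python objects, so it is convenient for small inputs but may be slow for large arrays.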

Licensed under: CC-BY-SA with attribution