How to convert a 2D array to a structured array using view (numpy)?

https://stackoverflow.com/questions/12190391

29-06-2021

Question

I am having some problems assigning fields to an array using the view method. There doesn't seem to be any way to control how the fields are assigned.

from numpy import array

a=array([[1,2],[1,2],[1,2]]) # 3x2 matrix
#array([[1, 2],
#       [1, 2],
#       [1, 2]])  

aa=a.transpose() # 2x3 matrix
#array([[1, 1, 1],
#       [2, 2, 2]])

a.view(dtype='i8,i8') # This works
a.view(dtype='i8,i8,i8') # This returns error ValueError: new type not compatible with array.
aa.view(dtype='i8,i8') # This works
aa.view(dtype='i8,i8,i8') # This returns error ValueError: new type not compatible with array.

In fact, if I create aa from scratch instead of taking the transpose of a,

b=array([[1,1,1],[2,2,2]])
b.view(dtype='i8,i8') # This returns ValueError again.
b.view(dtype='i8,i8,i8') # This works

Why does this happen? Is there any way I can set the fields to represent rows or columns?

Solution

When you create a standard array in NumPy, the data occupies contiguous blocks of memory. The size of each block is set by the dtype; the number and organization of the blocks are set by the shape of your array. Structured arrays follow the same pattern, except that each block is now composed of several sub-blocks, one per field, each occupying the space defined by the dtype of that field.
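As a rough illustration of the block and sub-block picture, the sketch below compares a plain array with a structured one. The names plain, pair and structured are made up for this example, and the dtype is pinned to np.int64 so that each block is 8 bytes on every platform.

```python
import numpy as np

# A plain (3, 2) array: six 8-byte blocks laid out row after row.
plain = np.zeros((3, 2), dtype=np.int64)
print(plain.itemsize)  # bytes per block
print(plain.strides)   # bytes to step to the next row / next column
print(plain.nbytes)    # total bytes

# A structured array of 3 records: each 16-byte block holds two 8-byte sub-blocks.
pair = np.dtype([('f0', np.int64), ('f1', np.int64)])
structured = np.zeros(3, dtype=pair)
print(structured.itemsize)   # bytes per record
print(pair.fields['f1'][1])  # byte offset of the second sub-block inside a record
print(structured.nbytes)     # same total as the plain array
```

Both arrays occupy the same number of bytes; only the way those bytes are grouped differs.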

In your example, you define a (3,2) array of ints (a). That's 2 int blocks for the first row, followed by 2 blocks for the second and then 2 last blocks for the third. If you want to transform it into a structured array, you can either keep the original layout, where each block becomes a single field (a.view(dtype=[('f0', int)])), or transform your 2-block rows into rows of 1 larger block consisting of 2 sub-blocks, each sub-block having the size of an int. That's what happens when you do a.view(dtype=[('f0',int),('f1',int)]).

You can't make larger blocks (i.e., dtype="i8,i8,i8"), as the corresponding information would be spread across different rows.
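A minimal sketch of those three cases, using the array a from the question. The dtype is set to np.int64 explicitly (an assumption, so that 'i8' matches on every platform), and one_field, pairs and triple_ok are illustrative names only.

```python
import numpy as np

a = np.array([[1, 2], [1, 2], [1, 2]], dtype=np.int64)  # 3 rows of 2 blocks

# Keep the layout: every 8-byte block becomes a one-field record.
one_field = a.view([('f0', np.int64)])
print(one_field.shape)

# Merge each 2-block row into a single record with two fields.
pairs = a.view('i8,i8')
print(pairs.shape)
print(pairs['f0'].ravel(), pairs['f1'].ravel())

# A 24-byte record does not fit into a 16-byte row, so NumPy refuses.
try:
    a.view('i8,i8,i8')
    triple_ok = True
except ValueError as exc:
    triple_ok = False
    print("ValueError:", exc)
```

Note that the two-field view keeps a trailing axis of length 1; pairs[:, 0] or pairs.ravel() gives a flat array of 3 records.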

Now, you can display your array in a different way, for example column by column: that's what happens when you take the .transpose() of your array. It's only a display, though (a 'view' in NumPy lingo), and it doesn't change the original memory layout. So, in your aa example, the original layout is still "3 rows of 2 integers", which you can represent as "3 rows of one block of 2 integers".
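One way to check that the transpose is only a different reading of the same buffer is to compare strides and test for shared memory. This is a sketch with the question's a and aa, again assuming an np.int64 dtype. It deliberately does not call aa.view('i8,i8'): recent NumPy releases may reject that call because the last axis of the transposed view is not contiguous.

```python
import numpy as np

a = np.array([[1, 2], [1, 2], [1, 2]], dtype=np.int64)
aa = a.transpose()

# Same buffer, with the shape and strides simply swapped.
print(a.shape, a.strides)
print(aa.shape, aa.strides)
print(np.shares_memory(a, aa))
print(aa.flags['C_CONTIGUOUS'])  # the transposed view is not row-major

# Writing through the view changes the original array.
aa[1, 0] = 99
print(a[0, 1])
```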

In your second example, b=array([[1,1,1],[2,2,2]]), you have a different layout: 2 rows of 3 int blocks. You can group the 3 int blocks into one larger block (dtype="i8,i8,i8") because you're not going over a row. You can't group them two by two, because you would have an extra block on each row.
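The "extra block" argument comes down to byte arithmetic on one row. The sketch below works it out for b, assuming an np.int64 dtype; row_bytes, leftover, triples and pair_ok are names introduced here for illustration.

```python
import numpy as np

b = np.array([[1, 1, 1], [2, 2, 2]], dtype=np.int64)  # 2 rows of 3 blocks

row_bytes = b.shape[-1] * b.itemsize  # bytes in one row
print(row_bytes, np.dtype('i8,i8,i8').itemsize)  # the record fills the row exactly

triples = b.view('i8,i8,i8')
print(triples.shape)

# A two-field record is 16 bytes, which leaves 8 bytes over in each row.
leftover = row_bytes % np.dtype('i8,i8').itemsize
print(leftover)

try:
    b.view('i8,i8')
    pair_ok = True
except ValueError as exc:
    pair_ok = False
    print("ValueError:", exc)
```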

You can transform an (N,M) standard array only into (1) a structured array of N records with M fields each or (2) an (N,M) structured array with 1 field, and that's it. (N,M) is the shape given to the array at its creation. You can display your array as an (M,N) array by transposing it, but that doesn't modify the original memory layout.
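If the fields should follow the other axis, one possible workaround (not part of the answer above) is to copy the transpose into a fresh row-major buffer first, so that the memory layout really becomes (M,N), and then take the view. The sketch assumes an np.int64 dtype; by_column and by_row are illustrative names. It also shows numpy.lib.recfunctions.unstructured_to_structured, which builds one record per row without a hand-written dtype string.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.array([[1, 2], [1, 2], [1, 2]], dtype=np.int64)

# Copy the transpose so that memory really holds 2 rows of 3 blocks,
# then group each row into one three-field record.
by_column = np.ascontiguousarray(a.T).view('i8,i8,i8')[:, 0]
print(by_column.shape)
print(by_column.tolist())
print(np.shares_memory(a, by_column))  # the copy no longer shares memory with a

# One record per row of a, with default field names f0 and f1.
by_row = rfn.unstructured_to_structured(a)
print(by_row.shape, by_row.dtype.names)
```

The copy costs memory and breaks the link to the original array, which is the price of changing the layout rather than just the display.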

OTHER TIPS

When you specify the view as b.view(dtype='i8, i8'), you are asking NumPy to reinterpret the values as a set of tuples with two values each. That simply isn't feasible here, since each row has 3 values, which isn't a multiple of two. It's like a reshape that would produce a matrix of a different size, which NumPy refuses to do.

Licensed under: CC-BY-SA with attribution
