numpy.vstack is perfect for this situation:
import numpy as np
arr = np.ones((50,100,25))
np.vstack(arr).shape
> (5000, 25)
I prefer to use stack, vstack or hstack over reshape because reshape just reads the elements in their flattened (row-major) order and pours them into the desired shape, with no regard for which row or column they originally belonged to. This can be problematic if you are, e.g., going to take column averages.
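As a quick sanity check of the column-average point, here is a minimal sketch. The random data and the variable names are my own, chosen only for illustration; the shape is the one from the example above. It compares the column means of the stacked array with the means taken over the first two axes of the original.

```python
import numpy as np

# Illustrative data only: random values in the (50, 100, 25) shape used above.
rng = np.random.default_rng(0)
arr = rng.random((50, 100, 25))

# Stack the 50 sub-arrays of shape (100, 25) on top of each other.
stacked = np.vstack(arr)

# Column means of the stacked array...
col_means_stacked = stacked.mean(axis=0)
# ...should match the means over the first two axes of the original.
col_means_original = arr.mean(axis=(0, 1))

same = np.allclose(col_means_stacked, col_means_original)
print(stacked.shape, same)
```

Each of the 25 columns of the stacked array still holds exactly the values that sat in that column of the original sub-arrays, which is why the two sets of means agree.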
Here's an illustration of what I mean. Suppose we have the following array:
>>> arr.shape
(2, 3, 4)
>>> arr
array([[[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]],

       [[7, 7, 7, 7],
        [7, 7, 7, 7],
        [7, 7, 7, 7]]])
We apply both methods to get an array of shape (3, 8):
>>> arr.reshape((3,8)).shape
(3, 8)
>>> np.hstack(arr).shape
(3, 8)
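However, if we look at how the values are arranged in each case, the hstack result lets us take column sums that we could also have calculated from the original array, because each block of four columns holds one sub-array's columns intact. With reshape this isn't possible, because values from different sub-arrays end up in the same column.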
>>> arr.reshape((3,8))
array([[1, 2, 3, 4, 1, 2, 3, 4],
       [1, 2, 3, 4, 7, 7, 7, 7],
       [7, 7, 7, 7, 7, 7, 7, 7]])
>>> np.hstack(arr)
array([[1, 2, 3, 4, 7, 7, 7, 7],
       [1, 2, 3, 4, 7, 7, 7, 7],
       [1, 2, 3, 4, 7, 7, 7, 7]])
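To make the column-sum comparison concrete, here is a small sketch using the same (2, 3, 4) array. The variable names are mine. It also shows that, for this case, reshape can reproduce the hstack layout if the axes are transposed first, so the issue is the element order rather than reshape itself.

```python
import numpy as np

# The (2, 3, 4) array from the illustration above.
arr = np.array([[[1, 2, 3, 4]] * 3,
                [[7, 7, 7, 7]] * 3])

hstacked = np.hstack(arr)        # shape (3, 8), sub-arrays placed side by side
reshaped = arr.reshape((3, 8))   # shape (3, 8), elements refilled in row-major order

# Column sums computed directly from the original array:
# sum each sub-array over its rows, then lay the results end to end.
expected = arr.sum(axis=1).ravel()

hstack_matches = np.array_equal(hstacked.sum(axis=0), expected)
reshape_matches = np.array_equal(reshaped.sum(axis=0), expected)

# Swapping the first two axes before reshaping gives the hstack layout.
via_transpose = arr.transpose(1, 0, 2).reshape((3, 8))
transpose_matches = np.array_equal(via_transpose, hstacked)

print(expected)           # [ 3  6  9 12 21 21 21 21]
print(hstack_matches)     # True
print(reshape_matches)    # False
print(transpose_matches)  # True
```

So hstack is the convenient spelling when you want the sub-arrays side by side; plain reshape only gives the same result if you reorder the axes yourself first.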