There are good answers for visualization however it might help to think purely from analytical perspective.
You can create array of arbitrary dimension with numpy.
For example, here's a 5-dimension array:
>>> a = np.random.rand(2, 3, 4, 5, 6)
>>> a.shape
(2, 3, 4, 5, 6)
You can access any element of this array by specifying indices. For example, here's the first element of this array:
>>> a[0, 0, 0, 0, 0]
0.0038908603263844155
Now if you take out one of the dimensions, you get number of elements in that dimension:
>>> a[0, 0, :, 0, 0]
array([0.00389086, 0.27394775, 0.26565889, 0.62125279])
When you apply a function like sum
with axis
parameter, that dimension gets eliminated and array of dimension less than original gets created. For each cell in new array, the operator will get list of elements and apply the reduction function to get a scaler.
>>> np.sum(a, axis=2).shape
(2, 3, 5, 6)
Now you can check that the first element of this array is sum of above elements:
>>> np.sum(a, axis=2)[0, 0, 0, 0]
1.1647502999560164
>>> a[0, 0, :, 0, 0].sum()
1.1647502999560164
The axis=None
has special meaning to flatten out the array and apply function on all numbers.
Now you can think about more complex cases where axis is not just number but a tuple:
>>> np.sum(a, axis=(2,3)).shape
(2, 3, 6)
Note that we use same technique to figure out how this reduction was done:
>>> np.sum(a, axis=(2,3))[0,0,0]
7.889432081931909
>>> a[0, 0, :, :, 0].sum()
7.88943208193191
You can also use same reasoning for adding dimension in array instead of reducing dimension:
>>> x = np.random.rand(3, 4)
>>> y = np.random.rand(3, 4)
# New dimension is created on specified axis
>>> np.stack([x, y], axis=2).shape
(3, 4, 2)
>>> np.stack([x, y], axis=0).shape
(2, 3, 4)
# To retrieve item i in stack set i in that axis
Hope this gives you generic and full understanding of this important parameter.