Question

Scott Meyers describes here that traversing a symmetric matrix row-wise performs significantly better than traversing it column-wise, which is also quite counter-intuitive.

The reasoning is connected with how the CPU caches are utilized. But I do not really understand the explanation, and I would like to, because I think it is relevant to me.

Is it possible to put it in simpler terms for somebody who does not hold a PhD in computer architecture and lacks experience in hardware-level programming?


Solution

In today's standard architectures, the cache exploits what is called "spatial locality". This is the intuitive idea that if you access some cell in memory, you are likely to access cells that are "close by" soon afterwards. Indeed, this is what happens when you read through a 1D array.

Now, consider how a matrix is represented in memory: a 2D matrix is simply encoded as a 1D array, row by row. For example, the matrix $\left(\begin{array}{ll} 2 & 3 \\ 4 & 5\end{array}\right)$ is represented as $2,3,4,5$.

When you start reading the matrix at cell $(0,0)$, the CPU automatically caches the cells that are close by, starting with the rest of the first row (and, if there is enough cache, possibly spilling into the next row, etc.).

If your algorithm works row by row, then the next access will be to an element still in this row, which is cached, so you will get a fast response. If, however, you access an element in a different row (albeit in the same column), you are more likely to get a cache miss, and you will need to fetch the cell from a higher level of the memory hierarchy.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange