Question

There were a few questions on SO, such as this one, about performance degradation when arrays or matrices happen to align with cache sizes. The idea of how to solve this in hardware has been around for decades. Why, then, don't modern computers interleave caches to reduce the consequences of super-alignment?
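To make the symptom concrete, here is a minimal sketch (the cache geometry of 32 KiB, 8-way, 64-byte lines, i.e. 64 sets, is an assumption, not taken from the question) showing how a power-of-two row stride maps an entire matrix column into a single cache set:

```python
# Illustrative only: why "super-aligned" data hammers one cache set.
# Assumed geometry: 32 KiB, 8-way, 64-byte lines -> 64 sets.
LINE = 64
SETS = 64  # 32 KiB / (8 ways * 64 B)

def set_index(addr):
    # conventional set mapping: low bits of the line address
    return (addr // LINE) % SETS

stride = 4096  # bytes per row, e.g. a 1024-column float matrix
sets_used = {set_index(row * stride) for row in range(100)}
print(sets_used)  # -> {0}: every column element lands in the same set
```

With only 8 ways in that set, walking a column evicts earlier lines long before they are reused.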


Solution

Most modern caches are already banked, but banking (like the memory interleaving your link describes) is meant to improve access timing and sequential-access bandwidth, not to solve other problems.

The question you link to was resolved as a case of bad coding (traversing row-wise instead of column-wise), but in general, if you want to solve issues arising from bad alignment in caches, you're looking for skewed-associative caches (example paper). With this method, the set mapping is not based on the simple set bits; instead it involves a shuffle based on the tag bits, which spreads data better in cases where it would otherwise conflict over the same sets. Note that this wouldn't really help if you're using up your entire cache; it only helps in corner cases where some "hot sets" are overused while others are left mostly untouched.
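As a rough sketch of the idea (the skewing function below is made up for illustration; Seznec's paper uses carefully chosen per-way hash functions):

```python
# Illustrative only: a hypothetical skewed set mapping that XORs tag
# bits into the set index, versus the conventional mapping.
LINE = 64
SETS = 64

def conventional_index(addr):
    return (addr // LINE) % SETS

def skewed_index(addr, way):
    # hypothetical skew: XOR a per-way shift of the tag bits into the
    # set bits (NOT the real function from the paper)
    block = addr // LINE
    set_bits = block % SETS
    tag_bits = block // SETS
    return (set_bits ^ ((tag_bits >> way) & (SETS - 1))) % SETS

stride = 4096
addrs = [i * stride for i in range(16)]
# conventionally, every address conflicts in set 0...
print({conventional_index(a) for a in addrs})  # -> {0}
# ...while skewing spreads the same addresses across 16 different sets
print(len({skewed_index(a, way=0) for a in addrs}))  # -> 16
```

The cost is that each way may index a different set, which complicates lookup and replacement; that is part of why it has not caught on in latency-critical caches.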

However, as far as I know this is not common practice, because it's a very specific problem that can easily be solved in code (or by a compiler), and is therefore probably not worth a HW solution.


Edit:
I did a few more searches following Paul's question. It seems that the closer, latency-critical caches don't use this (or at least it isn't published; but I'd guess that if it were done it would appear in optimization guides, since it matters for performance tuning and is easily detectable). That would probably include the L1 caches and the TLBs, which have to be queried on every memory access.

However, according to this link, it is done at least in the L3 cache of some Intel chips: http://www.realworldtech.com/sandy-bridge/8/

There is one slice of the L3 cache for each core, and each slice can provide half a cache line (32B) to the data ring per cycle. All physical addresses are distributed across the cache slices with a single hash function. Partitioning data between the cache slices simplifies coherency, increases the available bandwidth and reduces hot spots and contention for cache addresses.

So it is used at least for larger, less latency-critical caches.
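The actual hash Intel uses is undocumented; the XOR-fold below is only an assumed stand-in to illustrate how hashing the line address can spread super-aligned addresses across slices where a plain modulo would put them all in one place:

```python
# Illustrative only: distributing physical addresses over cache slices
# with a hash. The real Intel slice hash is not public; this XOR-fold
# is a hypothetical example.
SLICES = 4

def slice_of(addr):
    block = addr >> 6  # 64-byte lines
    h = 0
    while block:
        # XOR-fold the line address down to log2(SLICES) = 2 bits
        h ^= block & (SLICES - 1)
        block >>= 2
    return h

stride = 4096
# a plain (block % SLICES) would map all of these to slice 0;
# the fold uses higher address bits, so all four slices get traffic
print({slice_of(i * stride) for i in range(64)})  # -> {0, 1, 2, 3}
```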

Other tips

Interleaving solves a different problem (memory access delays). Since caches are fast, interleaving doesn't really help. For cache alignment issues, the traditional solution is to increase the associativity.
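A small simulation sketch (assumed 64-set geometry and a hypothetical access trace) of why raising associativity helps: once the number of ways reaches the number of conflicting lines, the conflict misses disappear:

```python
# Illustrative only: LRU set-associative cache simulator showing the
# effect of associativity on a super-aligned access pattern.
from collections import OrderedDict

def misses(addrs, ways, sets=64, line=64):
    cache = [OrderedDict() for _ in range(sets)]
    m = 0
    for a in addrs:
        blk = a // line
        s = cache[blk % sets]
        if blk in s:
            s.move_to_end(blk)         # hit: refresh LRU position
        else:
            m += 1                     # miss
            if len(s) >= ways:
                s.popitem(last=False)  # evict least recently used
            s[blk] = True
    return m

# 4 addresses, 4 KiB apart (all in set 0), touched round-robin 10 times
trace = [i * 4096 for i in range(4)] * 10
print(misses(trace, ways=2), misses(trace, ways=4))  # -> 40 4
```

With 2 ways, the 4 conflicting lines thrash and every access misses; with 4 ways, only the 4 compulsory misses remain.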

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow