Question

In a multicore processor, what happens to the contents of a core's cache (say L1) when a context switch occurs on that core?

Is the behaviour dependent on the architecture or is it a general behaviour followed by all chip manufacturers?

Solution

That depends both on the processor (not just the processor series; it can vary from model to model) and on the operating system, but there are general principles. Whether a processor is multicore has no direct impact on this aspect: the same process can be executing on multiple cores simultaneously (if it is multithreaded), and memory can be shared between processes, so cache synchronization is unavoidable regardless of what happens on a context switch.
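
As a concrete illustration, here is a minimal POSIX sketch: parent and child access the same physical page through a shared anonymous mapping, so the hardware must keep their cached views coherent whether or not a context switch happens in between.

    /* Minimal POSIX illustration: two processes touching one physical page.
     * The caches must stay coherent here regardless of context switches. */
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        /* Shared anonymous mapping: backed by the same physical frame in
         * both the parent and the child after fork(). */
        int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (shared == MAP_FAILED) { perror("mmap"); return 1; }

        *shared = 0;
        pid_t pid = fork();
        if (pid == 0) {              /* child writes through its mapping */
            *shared = 42;
            _exit(0);
        }
        waitpid(pid, NULL, 0);       /* parent must observe the write */
        printf("parent sees %d\n", *shared);   /* prints 42 */
        return 0;
    }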

When a processor looks up a memory location in the cache, if there is an MMU, it can use either the physical or the virtual address of that location (sometimes even a combination of both, but that's not really relevant here).

With physical addresses, it doesn't matter which process is accessing the address: the contents can be shared. So there is no need to invalidate the cache contents during a context switch. If two processes map the same physical page with different attributes, this is handled by the MMU (acting as an MPU, a memory protection unit). The downside of a physically addressed cache is that the MMU has to sit between the processor and the cache, so the cache lookup is slow. L1 caches are almost never physically addressed; higher-level caches may be.
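
To illustrate "same physical page, different attributes", here is a Linux-specific sketch using memfd_create (shown within a single process for brevity, though the same applies across processes): the MMU enforces the attribute difference, while a physically addressed cache can share the contents between both views.

    /* Linux-specific sketch: one physical page, two mappings with
     * different attributes. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        int fd = memfd_create("page", 0);   /* anonymous in-memory file */
        if (fd < 0 || ftruncate(fd, 4096) < 0) { perror("setup"); return 1; }

        char *rw = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        char *ro = mmap(NULL, 4096, PROT_READ,              MAP_SHARED, fd, 0);
        if (rw == MAP_FAILED || ro == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(rw, "hello");    /* write through the writable view */
        printf("%s\n", ro);     /* read-only view sees "hello": same frame */
        /* Writing through `ro` would fault: permissions are checked by
         * the MMU, not by the cache. */
        return 0;
    }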

The same virtual address can denote different memory locations in different processes. Hence, with a virtually addressed cache, the processor and the operating system must cooperate to ensure that a process will find the right memory. There are several common techniques. The context-switching code provided by the operating system can invalidate the whole cache; this is correct but very costly. Some CPU architectures have room in their cache lines for an ASID (address space identifier), the hardware version of a process ID, which is also used by the MMU. This effectively separates the cache entries of different processes, but it means that two processes that map the same physical page get distinct cache entries for it and can therefore have incoherent views of that page (there is usually a special ASID value indicating a shared page, but such entries need to be flushed if the page is not mapped at the same address in every process that maps it). Finally, if the operating system takes care that different processes use non-overlapping address spaces (which defeats some of the purpose of using virtual memory, but can sometimes be done), then cache lines remain valid across a context switch.
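
A minimal, compilable sketch of the two main strategies follows; set_current_asid and flush_entire_dcache are hypothetical stand-ins for architecture-specific operations (a CONTEXTIDR-style register write, a flush-all instruction), not any real kernel's API.

    /* A sketch (not any real kernel's code) of the two strategies for a
     * virtually addressed cache on a context switch. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical stand-ins for architecture-specific operations. */
    static void set_current_asid(uint16_t asid) {
        printf("ASID register <- %u\n", (unsigned)asid);
    }
    static void flush_entire_dcache(void) {
        puts("invalidating the whole data cache");
    }

    /* Called on every switch to a different address space. */
    static void switch_address_space(uint16_t next_asid, bool asid_tagged) {
        if (asid_tagged) {
            /* Other processes' entries stay cached but are ignored,
             * because their ASID tag no longer matches. Cheap. */
            set_current_asid(next_asid);
        } else {
            /* Untagged cache: any entry may belong to the previous
             * process, so everything goes. Correct but very costly. */
            flush_entire_dcache();
        }
    }

    int main(void) {
        switch_address_space(7, true);    /* ASID-tagged hardware  */
        switch_address_space(7, false);   /* plain virtual cache   */
        return 0;
    }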

Most processors that have an MMU also have a TLB. The TLB is a cache of mappings from virtual addresses to physical addresses. The TLB is consulted before lookups in physically-addressed caches, to determine the physical address quickly when possible; the processor may start the cache lookup before the TLB lookup is complete, as often candidate cache lines can be identified from the middle bits of the address, between the bits that determine the offset in a cache line and the bits that determine the page. Virtually-addressed caches bypass the TLB if there is a cache hit, although the processor may initiate the TLB lookup while it is querying the cache, in case of a miss.
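
As an illustration of that middle-bits trick, here is a small sketch with made-up but typical parameters (4 KiB pages, 64-byte lines, 64 sets): the index bits fall entirely within the page offset, which virtual and physical addresses share, so the cache can select candidate lines before the TLB answers.

    /* Line-offset + index bits (6 + 6 = 12) fit inside the 12-bit page
     * offset, so the set index is the same for the virtual address and
     * its translation. */
    #include <stdint.h>
    #include <stdio.h>

    #define LINE_BITS 6    /* 64-byte cache lines */
    #define SET_BITS  6    /* 64 sets */
    #define PAGE_BITS 12   /* 4 KiB pages */

    static unsigned set_index(uint64_t addr) {
        return (addr >> LINE_BITS) & ((1u << SET_BITS) - 1);
    }

    int main(void) {
        uint64_t va = 0x7f12345678c0ull;   /* some virtual address */
        uint64_t pa = 0x0000abcde8c0ull;   /* its (made-up) translation:
                                              same low PAGE_BITS bits */
        printf("va set %u, pa set %u\n", set_index(va), set_index(pa));
        return 0;   /* both print set 35 */
    }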

The TLB itself must be managed during a context switch. If the TLB entries contain an ASID, they can remain in place; the operating system only needs to flush TLB entries when their ASID changes meaning (e.g. because a process has exited and the ASID is reused). If the TLB entries are global (not tagged with an ASID), they must be invalidated when switching to a different address space.
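
The following sketch models ASID-tagged TLB entries (hypothetical structures, not a real TLB) to show why a context switch needs no flush until an ASID is recycled.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint64_t vpn;     /* virtual page number   */
        uint64_t pfn;     /* physical frame number */
        uint16_t asid;    /* owning address space  */
        bool     global;  /* e.g. kernel mappings shared by everyone */
        bool     valid;
    } tlb_entry;

    /* A hit requires a matching ASID (or a global entry), so a context
     * switch needs no flush: it just changes the current ASID. */
    static bool tlb_hit(const tlb_entry *e, uint64_t vpn, uint16_t asid) {
        return e->valid && e->vpn == vpn && (e->global || e->asid == asid);
    }

    /* Only when an ASID is recycled for a new process must its stale
     * entries be invalidated. */
    static void invalidate_asid(tlb_entry *tlb, size_t n, uint16_t dead) {
        for (size_t i = 0; i < n; i++)
            if (!tlb[i].global && tlb[i].asid == dead)
                tlb[i].valid = false;
    }

    int main(void) {
        /* Same virtual page in two processes, mapped to two frames. */
        tlb_entry tlb[2] = {
            { .vpn = 0x10, .pfn = 0x80, .asid = 1, .valid = true },
            { .vpn = 0x10, .pfn = 0x90, .asid = 2, .valid = true },
        };
        printf("asid 1 hit: %d\n", tlb_hit(&tlb[0], 0x10, 1));       /* 1 */
        printf("asid 2 on entry 0: %d\n", tlb_hit(&tlb[0], 0x10, 2)); /* 0 */
        invalidate_asid(tlb, 2, 1);   /* ASID 1 recycled: flush it */
        printf("after recycle: %d\n", tlb_hit(&tlb[0], 0x10, 1));     /* 0 */
        return 0;
    }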

OTHER TIPS

The cache is typically oblivious to a context switch. Only the sequence of memory addresses accessed determines which cache lines are replaced.

The replacement policy is usually a heuristic that depends on the manufacturer and the particular microarchitecture. The problem is that the heuristic cannot predict the future, i.e. which address, and therefore which cache line, will be accessed next.

The heuristic can be as simple as LRU (least recently used), but with modern CPUs the heuristics are more intricate.
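
For intuition, here is a toy model of strict LRU within one 4-way set; it is illustrative only (real CPUs use cheaper approximations such as tree pseudo-LRU), and note that the policy sees only the sequence of accessed tags, never which process issued them.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    #define WAYS 4

    typedef struct {
        uint64_t tag[WAYS];
        unsigned age[WAYS];   /* 0 = most recently used */
        bool     valid[WAYS];
    } cache_set;

    static void access_set(cache_set *s, uint64_t tag) {
        int victim = 0;
        for (int w = 0; w < WAYS; w++) {
            if (s->valid[w] && s->tag[w] == tag) {          /* hit */
                for (int i = 0; i < WAYS; i++)              /* re-age */
                    if (s->valid[i] && s->age[i] < s->age[w]) s->age[i]++;
                s->age[w] = 0;
                printf("hit  tag %llu in way %d\n", (unsigned long long)tag, w);
                return;
            }
            if (!s->valid[w]) victim = w;                   /* prefer empty */
            else if (s->valid[victim] && s->age[w] > s->age[victim]) victim = w;
        }
        /* miss: fill or evict the least recently used way */
        printf("miss tag %llu -> way %d\n", (unsigned long long)tag, victim);
        for (int i = 0; i < WAYS; i++)
            if (s->valid[i]) s->age[i]++;
        s->tag[victim] = tag; s->age[victim] = 0; s->valid[victim] = true;
    }

    int main(void) {
        cache_set set = {0};
        uint64_t trace[] = {1, 2, 3, 4, 1, 5};   /* 5 evicts 2, the LRU line */
        for (size_t i = 0; i < sizeof trace / sizeof trace[0]; i++)
            access_set(&set, trace[i]);
        return 0;
    }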

Take a look at the Intel® 64 and IA-32 Architectures Software Developer’s Manual: Volume 3, Chapter 11 explains the memory cache and the cache control mechanisms. AMD covers this in Chapter 7 of the AMD64 Architecture Programmer’s Manual, Volume 2: System Programming. For ARM-based CPUs, the corresponding PDFs seem to be available only to registered customers.
