What are the signs of non-data cache misses (instruction, TLB, etc.)?

https://stackoverflow.com/questions/20215304

05-08-2022

Question

When you're debugging performance-critical code and looking at the disassembly, it's not too hard to spot bottlenecks due to data cache misses:

Load/store instructions tend to be the usual bottlenecks, so if you stop the program at a random moment, chances are it will stop close to a load/store instruction accessing some unpredictable memory address.
Similarly, one way to find branch mispredictions is to see whether breaking into the program usually stops it near particular jumps, and then to look at the code to see whether those jumps are predictable.

(Or, at least, that's how I try to find such bottlenecks. If I'm looking for the wrong symptoms, let me know.)

What, however, are the symptoms of other kinds of cache misses?
I do know they're rare, but I still want to know how to spot them if/when they come up.

By "other" caches, I mean things like:

Instruction Cache(s)
Translation Lookaside Buffer
Bonus points for other important caches I should know about but I'm not aware of

Solution

Ah, the good old poor man's profiler technique. I'd be lying if I said I haven't used it from time to time, but it is indeed very problematic: it will probably be biased toward finding heisenbugs and will not necessarily reflect the real behavior. Another issue is that instructions overlap on modern out-of-order CPUs, so even if the program spends a long time on some load or store, the actual break point might fall far away from it (long before the long-latency load instruction actually commits, or long after a store instruction does).
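Because the stop address can land some distance from the instruction that is actually slow, one way to read such samples is to tally them by a coarse address window rather than by exact instruction. The following is only a minimal sketch of that idea: the function name `tally_by_window`, the sample addresses, and the 64-byte window are all made up for illustration and do not come from any debugger or profiler API.

```python
from collections import Counter

def tally_by_window(sample_pcs, window_bytes=64):
    """Count sampled stop addresses per aligned window of window_bytes.

    sample_pcs: program-counter values noted each time the program was
    interrupted (hypothetical input, collected by hand from a debugger).
    Returns (window_start, count) pairs, most frequent first.
    """
    counts = Counter((pc // window_bytes) * window_bytes for pc in sample_pcs)
    return counts.most_common()

# Hypothetical stop addresses from five manual interruptions.
samples = [0x401000, 0x401004, 0x40103C, 0x401040, 0x401800]
for start, count in tally_by_window(samples):
    print(hex(start), count)
```

A window that collects most of the samples is only a hint about where to look in the disassembly; it does not say which instruction in that window is the slow one.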

That said, if you insist on using it, you can:

Check the page offset of load/store addresses in the vicinity of the break point (relative to a 4K/2M/... page size, depending on your system configuration). A small offset within a stream of accesses suggests the stream has just crossed into a new page, which might indicate a TLB miss and a page walk.
Use LBRs (Last Branch Records) to check the behavior and predictability of the most recent branches.
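The page-offset check can be sketched as a small helper. This is an illustration under stated assumptions, not a definitive detector: the function names, the 64-byte "small offset" threshold, and the example addresses are hypothetical, and a flagged access only marks a place where a TLB miss is plausible.

```python
PAGE_SIZE_4K = 4 * 1024          # common base page size
PAGE_SIZE_2M = 2 * 1024 * 1024   # common huge-page size

def page_offset(addr, page_size=PAGE_SIZE_4K):
    """Byte offset of addr within its page (page_size must be a power of two)."""
    return addr & (page_size - 1)

def new_page_accesses(addresses, page_size=PAGE_SIZE_4K, small_offset=64):
    """Return accesses that land on a different page than the previous access
    and sit within small_offset bytes of that page's start.

    addresses: load/store addresses observed near the break point, in order.
    small_offset: arbitrary threshold chosen for this sketch.
    """
    flagged = []
    prev_page = None
    for addr in addresses:
        page = addr // page_size
        if (prev_page is not None and page != prev_page
                and page_offset(addr, page_size) < small_offset):
            flagged.append(addr)
        prev_page = page
    return flagged

# Hypothetical stream walking upward across a 4K page boundary.
stream = [0x7F0000000FC0, 0x7F0000000FE0, 0x7F0000001000, 0x7F0000001020]
print([hex(a) for a in new_page_accesses(stream)])                # ['0x7f0000001000']
print([hex(a) for a in new_page_accesses(stream, PAGE_SIZE_2M)])  # []
```

The second call shows why the page size matters: with 2M pages the same stream never leaves its page, so nothing is flagged.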

I can't think of a way to recognize an I-cache miss, as these occur even earlier and are further decoupled from the execution pipelines where your debugger is likely to catch the "current" instruction.

Licensed under: CC-BY-SA with attribution
