Ah, the good old poor man's profiler technique. I'd be lying if I said I haven't used it from time to time, but it's indeed very problematic: it will probably be biased toward finding heisenbugs and won't necessarily reflect the real behavior. Another issue is that instructions overlap on modern out-of-order CPUs, so even if the program takes longer to do some load or store, your actual breaking point might fall far away from it (long before the long-latency load instruction actually commits, or long after a store instruction does).
Having said that, if you insist on using it, you can:
- check the page offset of load/store addresses in the vicinity of the breaking point (the page size is 4k/2M/.. depending on your system configuration). A small offset within a stream of accesses means the stream has just entered a new page, which might indicate a TLB miss and a page walk
- use LBRs (Last Branch Records) to check the behavior and predictability of the most recent branches
I can't think of a way to recognize an I-cache miss, as these happen even earlier and are further decoupled from the execution pipelines, where your debugger is likely to catch the "current" instruction.
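For the page-offset check, here is a rough sketch of what I mean, assuming you've already collected the load/store addresses around the breaking point by hand (e.g. from register dumps in the debugger). The function names and the 64-byte threshold are made up for illustration, and it only looks at ascending streams. A flagged access is merely a candidate: entering a new page doesn't prove the TLB actually missed.

```python
# Common x86-64 page sizes; which one applies depends on your system configuration.
PAGE_SIZES = {"4K": 4 << 10, "2M": 2 << 20, "1G": 1 << 30}


def page_offset(addr, page_size):
    """Offset of addr within its page (page_size must be a power of two)."""
    return addr & (page_size - 1)


def flag_possible_tlb_misses(addresses, page_size=PAGE_SIZES["4K"], threshold=64):
    """Return (index, address, offset) for accesses that are TLB-miss candidates.

    An access is flagged when it falls in a different page than the previous
    access AND sits within `threshold` bytes of the start of its page.
    This is a heuristic only; `threshold` is an arbitrary illustrative value.
    """
    flagged = []
    for i in range(1, len(addresses)):
        prev_page = addresses[i - 1] // page_size
        cur_page = addresses[i] // page_size
        offset = page_offset(addresses[i], page_size)
        if cur_page != prev_page and offset < threshold:
            flagged.append((i, addresses[i], offset))
    return flagged


# Hypothetical stream: 64-byte strides that cross a 4K page boundary.
stream = [0x7F0000000F80, 0x7F0000000FC0, 0x7F0000001000, 0x7F0000001040]

for index, addr, offset in flag_possible_tlb_misses(stream):
    print(f"access #{index} at {addr:#x} (page offset {offset:#x}) entered a new page")
```

With 4K pages, the third access in that stream is the one that gets flagged. If the region is backed by 2M pages, pass `page_size=PAGE_SIZES["2M"]` and the same stream stays within a single page, so nothing is flagged.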