Question

At this web link:

http://www.7-cpu.com/cpu/IvyBridge.html

it says the latency for Ivy Bridge L1 cache access is:

  • L1 Data Cache Latency = 4 cycles for simple access via pointer
  • L1 Data Cache Latency = 5 cycles for access with complex address calculation (size_t n, *p; n = p[n]).

By "simple", do they mean that the pointer size is the same as the word size? That is, if the pointer is 32-bit and it's a 32-bit OS, the access would be "simple", and otherwise it would cost the "complex" latency?

I just don't quite understand their explanation for the difference in the two latencies.


Solution

The full x86 effective address looks like displacement + base + index * scale (where displacement is a constant, base and index are registers, and scale is 1, 2, 4 or 8).
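As a rough illustration of that formula, here is a minimal C sketch that computes an effective address from its four parts. The function name `effective_address` and the sample values are hypothetical, chosen only for this example; the real computation happens in the CPU's address-generation hardware, not in software.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: computes displacement + base + index * scale,
   mirroring the x86 effective-address formula. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int32_t displacement)
{
    /* x86 only encodes scale factors of 1, 2, 4 or 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    /* The displacement is a signed constant, so sign-extend it first. */
    return base + index * scale + (uint64_t)(int64_t)displacement;
}
```

For instance, `effective_address(0x1000, 0, 1, 16)` models an operand such as `[rdi + 16]` (base plus displacement) and yields `0x1010`, while `effective_address(0x1000, 3, 8, 16)` models `[rdi + rsi*8 + 16]` (base, scaled index and displacement) and yields `0x1028`.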

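It sounds like they call an address simple if only the displacement is present (or maybe additionally the base term), while anything with index * scale would certainly fall under the complex category. In other words, the distinction is about the addressing mode of the load, not about pointer size versus word size.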

Update: Indeed, the Intel optimization manual has this statement (for Sandy Bridge, though): "The common load latency is five cycles. When using a simple addressing mode, base plus offset that is smaller than 2048, the load latency can be four cycles." See also Table 2-12, "Effect of Addressing Modes on Load Latency".
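To connect this to the two cases in the question, here is a hedged C sketch of two dependent-load loops: one follows a chain of pointers, the other follows a chain of indices in the style of `n = p[n]`. All function names are hypothetical, and this is not the code used by the linked site. The addressing modes noted in the comments are what a compiler would typically emit on x86-64, but that is an assumption; the actual instructions depend on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/* Link the slots into a ring: slot i holds the address of slot (i + 1) % n. */
void build_pointer_ring(void **slots, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        slots[i] = &slots[(i + 1) % n];
}

/* The same ring expressed as indices: element i holds (i + 1) % n. */
void build_index_ring(size_t *p, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        p[i] = (i + 1) % n;
}

/* "Simple access via pointer": the address of each load is just the
   value returned by the previous load (base register only). */
void *chase_pointer(void *start, size_t steps)
{
    void *p = start;
    while (steps--)
        p = *(void **)p;   /* typically: mov rax, [rax] */
    return p;
}

/* "Complex address calculation": the address of each load is
   p + n * sizeof(size_t), i.e. base + index * scale. */
size_t chase_index(const size_t *p, size_t start, size_t steps)
{
    size_t n = start;
    while (steps--)
        n = p[n];          /* typically: mov rax, [rdi + rax*8] */
    return n;
}
```

Because every load in each loop depends on the result of the previous one, the time per iteration is dominated by load latency. Timing the two loops over a ring small enough to stay in L1 is one way the 4-cycle versus 5-cycle difference could show up, assuming the compiler emits the addressing modes noted in the comments.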

Licensed under: CC-BY-SA with attribution