Question

I have an Intel Core Ivy Bridge processor, an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (L1: 32KB, L2: 256KB, L3: 8MB). I know the L3 is inclusive and shared among the cores. I want to know the following with respect to my system:

PART 1:

  1. Is the L1 inclusive or exclusive?
  2. Is the L2 inclusive or exclusive?

PART 2:

If L1 and L2 are both inclusive, then to find the access time of L2 we first declare an array (1MB) larger than the L2 cache (256KB) and access the whole array to load it into the L2 cache. After that, we access the array elements from the start index to the end index with a stride of 64B, since the cache line size is 64B. To get a more accurate result, we repeat this sweep from start to end many times, say 1 million times, and take the average.
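A minimal C sketch of the procedure just described, for illustration only. The names `sweep` and `avg_ns_per_access` are hypothetical, the timer is the standard `clock()` (an assumption, since no timing method is specified), and the repeat count is a parameter so that short runs are possible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define LINE_BYTES 64   /* cache line size stated above */

/* Read one byte per cache line across the buffer. The volatile qualifier
   keeps the compiler from dropping the loads. */
static uint64_t sweep(const volatile uint8_t *buf, size_t bytes)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < bytes; i += LINE_BYTES)
        sum += buf[i];
    return sum;
}

/* Average nanoseconds per access over `repeats` sweeps of a `bytes`-sized
   array (hypothetical helper). Returns -1.0 if allocation fails. */
static double avg_ns_per_access(size_t bytes, long repeats)
{
    assert(bytes >= LINE_BYTES && bytes % LINE_BYTES == 0 && repeats > 0);

    uint8_t *buf = malloc(bytes);
    if (buf == NULL)
        return -1.0;

    /* First pass: touch every byte so the array is brought into the caches. */
    for (size_t i = 0; i < bytes; i++)
        buf[i] = (uint8_t)i;

    uint64_t sink = 0;
    clock_t t0 = clock();
    for (long r = 0; r < repeats; r++)
        sink += sweep(buf, bytes);
    clock_t t1 = clock();

    free(buf);
    (void)sink;

    double total_ns = (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9;
    double accesses = (double)repeats * (double)(bytes / LINE_BYTES);
    return total_ns / accesses;
}
```

For the setup above, a call such as `avg_ns_per_access((size_t)1 << 20, 1000)` times a 1MB array. Each sweep makes 16384 accesses, so a full 1 million repeats means roughly 1.6e10 accesses. Note that a sequential 64B stride is easy for hardware prefetchers to follow, so the figure is an average cost per access in a streaming sweep and not necessarily the load latency of one particular cache level.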

My understanding of why this approach gives the correct result is as follows. When we access an array larger than the L2 cache, the whole array is loaded from main memory into L3, then from L3 into L2, then from L2 into L1. The last 32KB of the array is in L1 because it was accessed most recently. The whole array is also present in the L2 and L3 caches because of the inclusive property and cache coherency. Now, when I start accessing the array again from the starting index, which is not in the L1 cache but is in the L2 cache, there will be an L1 miss and the line will be loaded from the L2 cache. In this way every element of the array incurs the higher L2 access time, and I get the total access time for the whole array. To get the time of a single access, I divide the total by the number of accesses.

My question is: am I correct?

Thanks in advance.


Solution

See section 2.2.5 of the Intel optimization manual:
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

(Note that this section describes Sandy Bridge, but nothing appears to have changed for Ivy Bridge, which has only minor microarchitectural changes over the previous generation.)

So regarding your questions:

  1. For the L1 there is no question of inclusiveness, since there are no upper-level caches (closer to the core) for it to be inclusive of.
  2. The L2 cache is not inclusive, meaning there is no guarantee that a line residing in the L1 is also in the L2. In most cases, however, it is likely to be there: it was probably filled into the L2 when the core originally requested it, and it has a good chance of surviving longer in the L2, since the L2 is bigger (so evictions are spread over more sets) and is filtered by the L1 (which usually means fewer evictions).

Also note that if your benchmark accesses a data set larger than the L2, it will probably not stay resident in the L2 (especially if you access it serially and exceed the L2 by more than the size of a single way), and you will have to fetch it from the L3.
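One way to act on that point, sketched in C under stated assumptions: keep the working set above the L1 size but below the L2 size (128KB is an assumed value chosen from the 32KB and 256KB figures in the question) and visit the lines in a random cyclic order, so that each load depends on the previous one. The names `build_cycle`, `chase`, and `chase_ns_per_load` are hypothetical, and `clock()` is used only as a simple portable timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define LINE_BYTES 64   /* cache line size from the question */

/* One node per cache line; `next` is the index of the node to visit next. */
struct node {
    size_t next;
    char pad[LINE_BYTES - sizeof(size_t)];
};

/* Link all n nodes into a single random cycle (Sattolo's algorithm), so a
   walk starting at node 0 visits every node once before returning to 0. */
static void build_cycle(struct node *nodes, size_t n)
{
    assert(n >= 2);
    for (size_t i = 0; i < n; i++)
        nodes[i].next = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;          /* j is in [0, i) */
        size_t tmp = nodes[i].next;
        nodes[i].next = nodes[j].next;
        nodes[j].next = tmp;
    }
}

/* Follow the chain for `steps` dependent loads and return the final index. */
static size_t chase(const struct node *nodes, size_t steps)
{
    size_t idx = 0;
    for (size_t s = 0; s < steps; s++)
        idx = nodes[idx].next;
    return idx;
}

/* Average nanoseconds per dependent load for a given working-set size.
   Returns -1.0 if allocation fails. */
static double chase_ns_per_load(size_t working_set_bytes, size_t steps)
{
    size_t n = working_set_bytes / sizeof(struct node);
    assert(n >= 2 && steps > 0);

    struct node *nodes = malloc(n * sizeof *nodes);
    if (nodes == NULL)
        return -1.0;
    build_cycle(nodes, n);

    volatile size_t sink = chase(nodes, n);     /* warm-up lap over every node */
    clock_t t0 = clock();
    sink = chase(nodes, steps);
    clock_t t1 = clock();
    (void)sink;

    free(nodes);
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}
```

Comparing, for example, `chase_ns_per_load(16 * 1024, steps)` with `chase_ns_per_load(128 * 1024, steps)` for the same `steps` should show whether the larger working set costs more per load. The random order is a design choice meant to make the access pattern hard to prefetch; the sizes used are illustrative and not taken from the Intel manual.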

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow