Question

I'm analyzing some code and using Cachegrind to get the number of cache misses (L2 and L3) during execution.

My question is: how do I determine the time spent waiting for the cache to get ready, based on the cache misses?

I would like to be able to say something like, "my code gets 90% CPU utilization."

Is it possible to do this based on the Cachegrind output?


Solution

Cachegrind simply simulates execution on a CPU, emulating how the cache and branch predictor might behave. Knowing how long you would spend blocked on the cache requires a lot more information. Specifically, you need to know when execution can be speculated and how many instructions can be dispatched in parallel (as well as how memory accesses can be coordinated simultaneously). Cachegrind can't do this, and any tool that could would depend heavily on the processor (whereas cache misses are much less processor dependent).

If you have access to a modern Intel CPU, I'd recommend getting a free copy of VTune (for non-commercial purposes) and seeing what it says. It can tell the processor to collect data on cache misses and report it back to you, so you can see what actually happened rather than just simulating it. It will give you a clocks-per-instruction figure for each line of code, and from this you can see which lines are blocking on the cache (and for how long). It can also give you all the other information Cachegrind can.

You can get it here:

http://software.intel.com/en-us/articles/non-commercial-software-download/
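To make the clocks-per-instruction idea concrete, here is a minimal sketch of the arithmetic. The per-line counter values are made up for illustration and are not the output of VTune or any other tool; a line whose ratio is far above its neighbours is a candidate for stalling on memory.

```python
def cycles_per_instruction(cycles, instructions):
    """Return cycles divided by retired instructions (CPI)."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


# Hypothetical per-line samples: (cycles, retired instructions).
samples = {
    "line 10": (1_000, 1_000),
    "line 11": (12_000, 1_000),
}

cpi_by_line = {
    line: cycles_per_instruction(cycles, instructions)
    for line, (cycles, instructions) in samples.items()
}

for line, cpi in cpi_by_line.items():
    print(f"{line}: CPI = {cpi:.1f}")
```

A high CPI only says the line is slow per instruction; whether the cache is the cause still has to be confirmed with the cache-miss counters for that line.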

Other tips

The only way to be sure is to use your CPU's performance monitoring counters to measure your particular CPU - and even then, the results are very specific and any optimisations you do based on this may perform very badly on CPUs with different cache sizes, bus architecture or memory configuration.

A variable can be fetched from the cache in a few clock cycles.

It can take more than one hundred clock cycles to fetch it from RAM if it isn't in the cache.
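Those two latencies can be turned into a very rough back-of-the-envelope estimate from miss counts. The sketch below is only a ballpark under loud assumptions: one cycle per instruction, no overlap between misses and other work, and assumed penalties of 10 cycles for a miss served by the last-level cache and 100 cycles for a miss that goes to RAM. The function name, parameters, and example counts are hypothetical. A real CPU overlaps much of this latency, so treat the result as a pessimistic guess rather than a measurement.

```python
def estimate_miss_share(instructions, l1_misses_hit_ll, ll_misses,
                        ll_hit_penalty=10, ram_penalty=100):
    """Estimate the fraction of cycles spent on cache-miss penalties.

    Assumes one cycle per instruction and no overlap of misses with
    other work; both penalties are assumed values, not measured ones.
    """
    penalty_cycles = (l1_misses_hit_ll * ll_hit_penalty
                      + ll_misses * ram_penalty)
    total_cycles = instructions + penalty_cycles
    if total_cycles == 0:
        return 0.0
    return penalty_cycles / total_cycles


# Made-up counts for illustration only.
share = estimate_miss_share(
    instructions=1_000_000,
    l1_misses_hit_ll=20_000,
    ll_misses=5_000,
)
print(f"Estimated share of cycles lost to misses: {share:.0%}")
```

Replacing the assumed penalties with latencies measured on the target machine makes the estimate less arbitrary, but it remains a model, not a substitute for hardware counters.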

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow