It seems this is a known behavior in valgrind.
I used the example that outputs the cache base address, and I also disabled address space layout randomization.
I ran the executable twice and got the same results in both runs:
==15016== D refs: 40,649 (28,565 rd + 12,084 wr)
==15016== D1 misses: 11,465 ( 8,412 rd + 3,053 wr)
==15016== LLd misses: 1,516 ( 1,052 rd + 464 wr)
==15016== D1 miss rate: 28.2% ( 29.4% + 25.2% )
==15016== LLd miss rate: 3.7% ( 3.6% + 3.8% )
villar@localhost ~ $ cache=8 && valgrind --tool=cachegrind --I1=$((cache * 64)),$cache,64 --D1=$((cache * 64)),$cache,64 --L2=262144,4096,64 ./a.out
==15019== D refs: 40,649 (28,565 rd + 12,084 wr)
==15019== D1 misses: 11,465 ( 8,412 rd + 3,053 wr)
==15019== LLd misses: 1,516 ( 1,052 rd + 464 wr)
==15019== D1 miss rate: 28.2% ( 29.4% + 25.2% )
==15019== LLd miss rate: 3.7% ( 3.6% + 3.8% )
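As a side note on the options used in these runs: cachegrind's cache options take `size,associativity,line size`, so with `cache=8` the first-level caches are 512 bytes, 8-way, with 64-byte lines. Here is a minimal Python sketch of that arithmetic. The helper name `cache_geometry` is my own and is not part of valgrind.

```python
def cache_geometry(size, assoc, line_size):
    """Return (number of lines, number of sets) for a size,assoc,line_size triple."""
    lines = size // line_size   # total cache lines
    sets = lines // assoc       # lines are grouped into sets of `assoc` ways
    return lines, sets

cache = 8
d1 = cache_geometry(cache * 64, cache, 64)   # --D1=512,8,64
l2 = cache_geometry(262144, 4096, 64)        # --L2=262144,4096,64
print("D1 (lines, sets):", d1)
print("L2 (lines, sets):", l2)
```

Both configurations work out to a single set, i.e. fully associative. If that reading is right, the set an address maps to cannot differ between runs, and what can differ is how the data lines up against 64-byte line boundaries.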
According to the cachegrind documentation (http://www.cs.washington.edu/education/courses/cse326/05wi/valgrind-doc/cg_main.html):
> Another thing worth noting is that results are very sensitive. Changing the size of the valgrind.so file, the size of the program being profiled, or even the length of its name can perturb the results. Variations will be small, but don't expect perfectly repeatable results if your program changes at all.
> While these factors mean you shouldn't trust the results to be super-accurate, hopefully they should be close enough to be useful.
After reading this, I changed the file name and got the following:
villar@localhost ~ $ mv a.out a.out2345345345
villar@localhost ~ $ cache=8 && valgrind --tool=cachegrind --I1=$((cache * 64)),$cache,64 --D1=$((cache * 64)),$cache,64 --L2=262144,4096,64 ./a.out2345345345
==15022== D refs: 40,652 (28,567 rd + 12,085 wr)
==15022== D1 misses: 10,737 ( 8,201 rd + 2,536 wr)
==15022== LLd misses: 1,517 ( 1,054 rd + 463 wr)
==15022== D1 miss rate: 26.4% ( 28.7% + 20.9% )
==15022== LLd miss rate: 3.7% ( 3.6% + 3.8% )
Changing the name back to "a.out" gave me exactly the same result as before.
Notice that changing the file name, or the path to it, changes the base of the stack.
That may be the cause, given what Mr. Evgeny said in a prior comment:
> When you change current working directory, you also change corresponding environment variable (and its length). Since a copy of all environment variables is usually stored just above the stack, you get different allocation for stack variables and different number of cache misses. (And shell could change some other variables besides "PWD").
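To make the quoted explanation concrete, here is a rough Python sketch of how moving the stack by a few bytes can change the number of 64-byte cache lines a stack object covers. The addresses, the 64-byte buffer, and the 16-byte shift are all made up for illustration; they are not taken from the runs above.

```python
LINE = 64  # line size passed to cachegrind in the runs above

def lines_touched(base, size, line=LINE):
    """Number of distinct cache lines covered by `size` bytes starting at `base`."""
    first = base // line
    last = (base + size - 1) // line
    return last - first + 1

# Hypothetical 64-byte stack buffer (addresses are illustrative only).
aligned = 0x7ffc00001000    # a multiple of 64
shifted = aligned - 16      # assumed shift, e.g. from a longer file name or PWD

print(lines_touched(aligned, 64))   # buffer fits in one line
print(lines_touched(shifted, 64))   # same buffer now straddles two lines
```

With a first-level cache of only 8 lines, as configured here, even one extra line per hot stack object could plausibly shift the miss counts noticeably.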
EDIT: The documentation also says:
> Program start-up/shut-down calls a lot of functions that aren't interesting and just complicate the output. Would be nice to exclude these somehow.
The simulated cache may also be counting the program's start-up and shut-down code, which could be the source of the variations.