Different cache miss count for a same program between multiple runs

Question 1

Linux implements the "Address space layout randomization" technique (http://en.wikipedia.org/wiki/Address_space_layout_randomization) for security matters. And you can deactivate this behavior like this:

echo -n "0" > /proc/sys/kernel/randomize_va_space

You can test that through this example:

#include <stdio.h>

int main() {
   char a;
   printf("%u\n", &a);
   return 0;
}

You should always have the same value printed.

Before:

 % ./a.out
4006500239
 % ./a.out
819175583
 % ./a.out
2443759599
 % ./a.out
2432498159

After:

 % ./a.out
4294960207
 % ./a.out
4294960207
 % ./a.out
4294960207
 % ./a.out
4294960207

That also explain the different amount of cache misses since two variables that were in the same line can now be in two different lines.

Edit: This does not solve entirely the problem apparently but I think it was one of the reason. I'll give the bounty to anyone who can help me resolving this issue.

Question 2

It seems this is a known behavior in valgrind:

I used the example that outputs the cache base address, I also disabled the layout randomization.

I ran the executable twice getting the same results in both runs:

D   refs:       40,649  (28,565 rd   + 12,084 wr)
==15016== D1  misses:     11,465  ( 8,412 rd   +  3,053 wr)
==15016== LLd misses:      1,516  ( 1,052 rd   +    464 wr)
==15016== D1  miss rate:    28.2% (  29.4%     +   25.2%  )
==15016== LLd miss rate:     3.7% (   3.6%     +    3.8%  )

villar@localhost ~ $ cache=8 && valgrind --tool=cachegrind --I1=$((cache * 64)),$cache,64 --D1=$((cache * 64)),$cache,64 --L2=262144,4096,64 ./a.out 

==15019== D   refs:       40,649  (28,565 rd   + 12,084 wr)
==15019== D1  misses:     11,465  ( 8,412 rd   +  3,053 wr)
==15019== LLd misses:      1,516  ( 1,052 rd   +    464 wr)
==15019== D1  miss rate:    28.2% (  29.4%     +   25.2%  )
==15019== LLd miss rate:     3.7% (   3.6%     +    3.8%  )

According to the cachegrind documentation (http://www.cs.washington.edu/education/courses/cse326/05wi/valgrind-doc/cg_main.html)

Another thing worth nothing is that results are very sensitive. Changing the size of the >valgrind.so file, the size of the program being profiled, or even the length of its name can perturb the results. Variations will be small, but don't expect perfectly >repeatable results if your program changes at all. While these factors mean you shouldn't trust the results to be super-accurate, hopefully >they should be close enough to be useful.

After reading this, I changed the file name and got the following:

villar@localhost ~ $ mv a.out a.out2345345345
villar@localhost ~ $ cache=8 && valgrind --tool=cachegrind --I1=$((cache * 64)),$cache,64 --D1=$((cache * 64)),$cache,64 --L2=262144,4096,64 ./a.out2345345345 

==15022== D   refs:       40,652  (28,567 rd   + 12,085 wr)
==15022== D1  misses:     10,737  ( 8,201 rd   +  2,536 wr)
==15022== LLd misses:      1,517  ( 1,054 rd   +    463 wr)
==15022== D1  miss rate:    26.4% (  28.7%     +   20.9%  )
==15022== LLd miss rate:     3.7% (   3.6%     +    3.8%  )

Changing the name back to "a.out" gave me exactly the same result as before.

Notice that changing the file name or the path to it will change the base of the stack!!. and this may be the cause after reading what Mr. Evgeny said in a prior comment

When you change current working directory, you also change corresponding environment variable (and its length). Since a copy of all environment variables is usually stored just above the stack, you get different allocation for stack variables and different number of cache misses. (And shell could change some other variables besides "PWD").

EDIT: Documentation also says:

Program start-up/shut-down calls a lot of functions that aren't interesting and just complicate the output. Would be nice to exclude these somehow.

The simulated cache may be tracking the start and end of the program being it the source of the variations.