Question

I am profiling my code and have already found the most expensive part of it. However, it is in an inlined function. To measure its impact, I forced the function not to be inlined.

Now I would like to report accurate profiling data. Without inlining there is a massive call overhead (the function is basically a single loop, but it is called very, very often).

I wonder if it is possible to instruct valgrind to treat a specific section of the code as if it were a function by itself (along the lines of the macros CALLGRIND_START_INSTRUMENTATION and CALLGRIND_STOP_INSTRUMENTATION), without forcing the function not to be inlined.

Solution

valgrind --tool=callgrind

can show in detail where CPU cost (and other costs, such as cache misses) is spent. kcachegrind (a visualisation tool) can easily show the various costs, including those of inlined functions.

Try running it with, e.g.:

valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes 

Note: to look at costs at the instruction level, you must use kcachegrind.

OTHER TIPS

Maybe you could call the CALLGRIND_TOGGLE_COLLECT macro just before calling your function and again at the beginning of your function, and likewise at the end of your function and just after the call returns, so that the call overhead itself is not collected. E.g.:

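#include <valgrind/callgrind.h>

// Defined before main() so the call below compiles.
__attribute__((noinline))
void myFunction()
{
    CALLGRIND_TOGGLE_COLLECT;   // collection back on: count the body
    // Do stuff
    CALLGRIND_TOGGLE_COLLECT;   // off again: skip the return
}

int main()
{
    CALLGRIND_TOGGLE_COLLECT;   // off (collection is on at start by default): skip the call
    myFunction();
    CALLGRIND_TOGGLE_COLLECT;   // back on after the call has returned
}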

Should do the trick.
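
The same macro can also be placed directly around the loop of a function that stays inlined. What follows is only a minimal sketch, not a tested recipe: the function name sum_values and its body are hypothetical stand-ins for your hot loop, and it assumes the program is started with collection switched off, e.g. `valgrind --tool=callgrind --collect-atstart=no ./your_program` (the program name is a placeholder). The no-op fallback for the macro is my own addition so the file still builds where the Valgrind headers are missing.

```c
#include <stddef.h>

/* Use the real macro when the Valgrind headers are available; otherwise
   fall back to a no-op so normal (non-profiling) builds still compile.
   This fallback is an assumption of the sketch, not part of Valgrind. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
#ifndef CALLGRIND_TOGGLE_COLLECT
#  define CALLGRIND_TOGGLE_COLLECT ((void)0)
#endif

/* Hypothetical hot function: a single loop that the compiler may inline. */
static inline long sum_values(const int *values, size_t count)
{
    long total = 0;

    CALLGRIND_TOGGLE_COLLECT;   /* collection on: only the loop is counted */
    for (size_t i = 0; i < count; ++i)
        total += values[i];
    CALLGRIND_TOGGLE_COLLECT;   /* collection off again */

    return total;
}
```

With collection off at start, only the events between the two toggles should be recorded. Since the code is inlined, the cost will most likely show up under the calling function rather than under a separate sum_values entry, so this isolates the loop's cost but does not make it appear as a function of its own.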

I'm not sure whether this is what you want, but I'm not sure it isn't either :)
http://valgrind.org/docs/manual/cg-manual.html#cg-manual.overview

Also, since one instruction cache read is performed per instruction executed, you can find out how many instructions are executed per line, which can be useful for traditional profiling.

Licensed under: CC-BY-SA with attribution