Question

I am using -O3 when compiling the code, and now I need to profile it. For profiling, there are two main choices I came across: valgrind --tool=callgrind and gprof.

Valgrind (callgrind) docs state:

As with Cachegrind, you probably want to compile with debugging info (the -g option) and with optimization turned on.

However, in the C++ optimization book by Agner Fog, I have read the following:

Many optimization options are incompatible with debugging. A debugger can execute a code one line at a time and show the values of all variables. Obviously, this is not possible when parts of the code have been reordered, inlined, or optimized away. It is common to make two versions of a program executable: a debug version with full debugging support which is used during program development, and a release version with all relevant optimization options turned on. Most IDE's (Integrated Development Environments) have facilities for making a debug version and a release version of object files and executables. Make sure to distinguish these two versions and turn off debugging and profiling support in the optimized version of the executable.

This seems to conflict with the callgrind instructions to compile the code with the debugging-info flag -g. If I enable debugging in the following way:

-ggdb -DFULLDEBUG

am I not causing these options to conflict with the -O3 optimization flag? Using the two together makes no sense to me after what I have read so far.
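
For concreteness, the callgrind workflow I have in mind looks something like this (file names are just placeholders):

g++ -O3 -g -o myapp main.cpp
valgrind --tool=callgrind ./myapp
callgrind_annotate callgrind.out.<pid>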

If I use, say, the -O3 optimization flag, can I compile the code with additional profiling info by using:

-pg

and still profile it with valgrind?
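
That is, something along these lines (again with placeholder names), compiling once with -pg for gprof and then running the same binary under callgrind:

g++ -O3 -pg -g -o myapp main.cpp
./myapp                          (writes gmon.out for gprof)
gprof ./myapp gmon.out > gprof-report.txt
valgrind --tool=callgrind ./myapp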

Does it ever make sense to profile code compiled with the

-ggdb -DFULLDEBUG -O0

flags? It seems silly: skipping inlining and loop unrolling may shift the bottlenecks in the code, so such a build should be used during development only, to get the code to actually behave correctly.

Does it ever make sense to compile the code with one optimization flag, and profile the code compiled with another optimization flag?


Solution

Why are you profiling? Just to get measurements or to find speedups?

The common wisdom that you should only profile optimized code rests on the assumption that the code is nearly optimal to begin with; if there are still significant speedups to be found, it is not.

You should treat finding speedups as if they were bugs, and hunt them down the same way. One common way to do so is to take random stack samples while the program runs and look at what it is actually doing at each one.
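
A rough sketch of that sampling approach, assuming a build with debug info (binary and source names are made up): run the program under gdb, interrupt it a few times while it feels slow, and inspect the call stack each time.

g++ -g -o myapp main.cpp
gdb ./myapp
(gdb) run
  ... press Ctrl-C while the program is busy ...
(gdb) bt          (what is it doing right now, and why?)
(gdb) continue
  ... repeat a handful of times ...

Whatever keeps showing up on those stacks, and doesn't strictly need to be there, is a candidate speedup.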

After you've removed needless computations, if you still have tight CPU loops, i.e. you're not spending all your time in system, library, or I/O routines that the optimizer never sees, then turn on -O3 and let it do its magic.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow