Question

What are the differences between using profilers that need to recompile the source code with debugging options (like gprof) and profilers that do not need recompiling (like Valgrind, OProfile, ...)?


Solution

I'm not familiar with the profilers you named, but there are two major approaches to profiling:

Instrumentation: this method usually requires recompiling (though not always; Java and .NET applications, for example, can be instrumented dynamically). With this method it is possible to measure exactly how often a routine is called, or how many iterations a certain loop makes.

Sampling: this method does not require any recompiling; it simply takes a snapshot of the call stack at set intervals. This has proven to be an effective way to find bottlenecks.
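To illustrate the sampling idea, here is a minimal, self-contained C sketch (not how any particular profiler is implemented, and the file name is made up): it asks the OS to deliver SIGPROF at regular intervals of consumed CPU time and takes a stack snapshot in the handler. A real sampling profiler would aggregate those snapshots to see where the time goes.

    /* sample_sketch.c -- a toy, in-process illustration of sampling
     * (hypothetical name; tools like OProfile sample from outside the
     * process, but the idea is the same).
     *
     * Build and run (GCC/glibc assumed, for backtrace()):
     *   gcc -g -o sample_sketch sample_sketch.c
     *   ./sample_sketch
     */
    #include <execinfo.h>     /* backtrace() -- glibc extension */
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>

    static volatile sig_atomic_t samples;

    /* Called on every profiling tick: snapshot the call stack.
     * A real profiler would store and aggregate the addresses; here we
     * only count the samples to keep the sketch short.  (backtrace()
     * is not strictly async-signal-safe; fine for a demonstration.) */
    static void on_tick(int sig)
    {
        void *frames[32];
        (void)sig;
        (void)backtrace(frames, 32);
        samples++;
    }

    static long busy_work(long n)     /* the "hot" routine */
    {
        long sum = 0;
        for (long i = 0; i < n; ++i)
            sum += i % 7;
        return sum;
    }

    int main(void)
    {
        struct sigaction sa;
        struct itimerval timer = { {0, 10000}, {0, 10000} };  /* every 10 ms of CPU time */

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_tick;
        sigaction(SIGPROF, &sa, NULL);       /* SIGPROF fires when ITIMER_PROF expires */
        setitimer(ITIMER_PROF, &timer, NULL);

        long result = busy_work(200000000L); /* do the work, collecting samples as we go */
        printf("result=%ld, samples taken=%d\n", result, (int)samples);
        return 0;
    }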

There is some more information about the two strategies here.

OTHER TIPS

I can speak to Valgrind and gprof, at least.

The primary difference between using the two is basically what you already said. For gprof, you have to compile your program specially (with the -pg option) to include the profiling code. When you then run your executable, the profiling code is executed (since it's built into your program), and a gmon.out file is created that can then be processed by gprof to show you runtime statistics of your program.

Valgrind is different in that you don't need to compile your program in any special way (except to add debug symbols if you want the output to be useful). Valgrind dynamically translates your program into an internal format that is run on a simulated CPU (although this is slow). This means that any program can be run through Valgrind without any special compilation.
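To make the difference concrete, here is a sketch of the two workflows on a toy program (the file name is made up; the commands are the standard gprof and Callgrind invocations):

    /* example.c -- a toy program to profile (hypothetical file name).
     *
     * gprof: the profiling hooks must be compiled in with -pg.
     *   gcc -pg -g -o example example.c
     *   ./example                      (writes gmon.out in the working directory)
     *   gprof example gmon.out         (prints the flat profile and call graph)
     *
     * Valgrind (Callgrind tool): no -pg needed; -g only keeps debug
     * symbols so the report maps back to your source.
     *   gcc -g -o example example.c
     *   valgrind --tool=callgrind ./example
     *   callgrind_annotate callgrind.out.<pid>
     */
    #include <stdio.h>

    static long busy_work(long n)      /* something worth profiling */
    {
        long sum = 0;
        for (long i = 0; i < n; ++i)
            sum += i % 7;
        return sum;
    }

    int main(void)
    {
        printf("%ld\n", busy_work(100000000L));
        return 0;
    }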

Another important difference is that Valgrind can report a lot more information than gprof does, although that is a difference in what the tools can tell you rather than in how you use them.

Any profiling technique is going to need symbol table information, so that has to be requested in the compilation and linking (for example, by compiling with -g and not stripping the binary).

Other than that, some profilers work by compiling in calls to record-keeping routines at the beginning and possibly the end of each function. Those routines can attempt to record the time used by the function, and some record of where it was called from. Such a profiler's timing figures are made inaccurate by the overhead of calling those recording routines.
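GCC's -finstrument-functions flag makes this concrete: it compiles a call to a user-supplied hook into the entry and exit of every function it compiles, which is essentially the record-keeping described above. A minimal sketch (counting calls only; a real profiler would also record timestamps and call sites):

    /* instr_sketch.c -- what compiled-in instrumentation looks like
     * (hypothetical file name).
     *
     * Build:
     *   gcc -g -finstrument-functions -o instr_sketch instr_sketch.c
     *
     * With that flag the compiler inserts __cyg_profile_func_enter()
     * and __cyg_profile_func_exit() calls around every function, so
     * the hooks below run on each entry and return.
     */
    #include <stdio.h>

    static unsigned long entries;   /* how many instrumented calls were seen */

    /* The hooks themselves must not be instrumented, or they would recurse. */
    __attribute__((no_instrument_function))
    void __cyg_profile_func_enter(void *fn, void *call_site)
    {
        (void)fn; (void)call_site;
        entries++;                  /* a real profiler records fn, call_site and a start time */
    }

    __attribute__((no_instrument_function))
    void __cyg_profile_func_exit(void *fn, void *call_site)
    {
        (void)fn; (void)call_site;  /* a real profiler would stop the timer here */
    }

    static long leaf(long i) { return i % 7; }

    int main(void)
    {
        long sum = 0;
        for (long i = 0; i < 1000; ++i)
            sum += leaf(i);
        printf("sum=%ld, instrumented entries=%lu\n", sum, entries);
        return 0;
    }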

Other profilers do not need to do that, instead relying on periodic samples of the call stack. Such a profiler has lower overhead. Its timing figures are made inaccurate by the statistical nature of its sampling.

Implicit in this is the assumption that accuracy of timing is necessary for locating "bottlenecks", which has never, to my knowledge, been shown to be true. The method I've always used to get orders-of-magnitude speedup relies on insight into what the program is doing as it spends time, rather than on precisely how much time is spent. If you're interested in the statistical rationale, you could look here.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow