Question

I need to profile a program to see whether any changes need to be made regarding performance. I suspect there is a need, but measuring first is the way to go. This is not that program, but it illustrates the problem I'm having:

#include <stdio.h>
int main (int argc, char** argv)
{
  FILE* fp = fopen ("trivial.c", "r");
  if (fp)
  {
    char line[80];
    while (fgets (line, 80, fp))
      printf ("%s", line);  /* use "%s" so a '%' in the file isn't treated as a format directive */

    fclose (fp);
  }
  return 0;
}

Here's what I did with it:

% gcc trivial.c -pg -o trivial
% ./trivial
...
% gprof trivial gmon.out

Granted, this is a trivial program, but I would have thought it would make some kind of blip on the profiling radar. It didn't:

                                  called/total       parents 
index  %time    self descendents  called+self    name       index
                                  called/total       children

                0.00        0.00       1/1           __start [1704]
[105]    0.0    0.00        0.00       1         _main [105]

-----------------------------------------------

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  0.0       0.00     0.00        1     0.00     0.00  _main [105]

Index by function name

 [105] _main              

Can anyone guide me here? I would like the output to reflect that it called fgets and printf at least 14 times, and it did hit the disk after all - there should be some measured time, surely.

When I run the same command on the real program, I get more functions listed, but even then it is not a complete list - just a sample.

Perhaps gprof is not the right tool to use. What is?

This is on OS X Leopard.

Edit: I ran the real program and got this:

% time real_program
real      4m24.107s
user      2m34.630s
sys       0m38.716s

Solution

There are certain commonly-accepted beliefs in this business, that I would suggest you examine closely.

One is that the best (if not only) way to find performance problems is to measure the time each subroutine takes and count how many times it is called.

That is top-down. It stems from a belief that the forest is more important than the trees. It is based on myths about "speed of code" and "bottlenecks". It is not very scientific.

A performance problem is more like a bug than a quantitative thing. What the program is doing wrong is wasting time, and that needs to be found and fixed. The approach is based on a simple observation:

Slowness consists of time being spent for poor reasons.

To find it, sample the program state at random slivers of clock time, and investigate their reasons.

If something is causing slowness, then that fact alone exposes it to your samples. So if you take enough samples, you will see it. You will know approximately how much time it is costing you, by the fraction of samples that show it.
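For example, if some particular call shows up on 6 out of 10 random stack samples, it accounts for very roughly 60% of the time, and fixing it would save about that much. That is a crude estimate, but it is plenty accurate enough to tell you where to look.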

A good way to tell if a sliver of time is being spent for a good reason is to look carefully at the call stack. Every function invocation on the stack has an implicit reason, and if any of those reasons are poor, then the reason for the entire sample is poor.

Some profilers tell you, at the statement level, what each statement is costing you.

Personally, I just randomly halt the program several times. Any invocations showing up on multiple samples are likely candidates for suspicion. It never fails.
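On OS X Leopard an easy way to take such a sample by hand is under gdb -- a minimal sketch, assuming your binary is called real_program:

% gdb ./real_program
(gdb) run
  ... let it run, then hit Ctrl-C at a random moment ...
(gdb) bt
  ... read the whole stack and ask what it is doing, and why ...
(gdb) continue
  ... resume, then repeat a handful of times ...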

You may say "It's not accurate." It's extremely accurate. It precisely pinpoints the instructions causing the problem. It doesn't give you 3 decimal places of timing accuracy. I.e. it is lousy for measurement, but superb for diagnosis.

You may say "What about recursion?". Well, what about it?

You may say "I think that could only work on toy programs." That would be just wishing. In fact large programs tend to have more performance problems, because they have deeper stacks, thus more opportunity for invocations with poor reasons, and sampling finds them just fine, thanks.

Sorry to be a curmudgeon. I just hate to see myths in what should be a scientifically-based field.


OTHER TIPS

I think that you could try various Valgrind tools, especially callgrind (used to get call counts and inclusive cost for each call happening in your program).

There are various nice visualisation tools for the valgrind output. I don't know about particular tools for OS X though.
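If Valgrind does work on your platform, a typical callgrind session looks roughly like this (a sketch; the numeric suffix of the output file is the process id of your run):

% valgrind --tool=callgrind ./trivial
% callgrind_annotate callgrind.out.12345

On Linux/KDE the usual graphical viewer for those output files is KCachegrind.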

By default gprof shows you limited data, which is good. Look at your output -- it mentions only main (the default). Now look at the calls column -- this is what you want, but for the other functions too, so try:

gprof -e main -f printf -f fgets trivial > gprof.output

The gprof documentation covers the rest of the options and how to interpret the data; also try man gprof on your system.

Also look up ltrace, strace, and ptrace if they are available -- I no longer recall whether all of them exist on OS X -- they are fun!

Shark is the profiler that is included in the developer tools.

Before profiling your code you need to see where your program spends its time. Run it under time(1) to see the corresponding user, system, and wall clock time. Profiling your code makes sense only when the user time is close to the wall clock time. If user and system time are very small compared to the wall clock time, then your program is I/O bound; if system time is close to the wall clock time your program is kernel bound. In both of these cases run your program under strace -c or a suitable dtrace script to determine the time spent in each system call.
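On Leopard there is no strace, but dtruss or a one-line DTrace script will give you per-system-call counts -- a sketch, using the trivial program from the question (both need root):

% sudo dtruss -c ./trivial
% sudo dtrace -n 'syscall:::entry /pid == $target/ { @[probefunc] = count(); }' -c ./trivial

The first prints a table of system-call counts when the program exits; the second aggregates the same information by system-call name.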

Profiling doesn't indicate disk access, just what functions were called, and those won't be representative due to VM caching.

Valgrind doesn't work well on OS X.

With Leopard you have the DTrace utility; I haven't used it, but it might get you the info you're looking for.

The absence of certain functions typically means that those functions are not compiled for profiling. Specifically, to profile code that uses standard functions such as printf (almost always, I'd guess), you need a version of the C library that is compiled with profiling support. I'm not familiar with OS X, but on Linux I needed to install a libc6-prof package which includes the libc_p library.
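For what it's worth, on Linux the build then looked something like the following -- a sketch from memory, and the exact flags (including whether static linking is needed) depend on the distribution; it does not apply as-is to OS X:

% gcc -pg -static trivial.c -o trivial -lc_p
% ./trivial
% gprof trivial gmon.out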

By the way, I do believe OS X (or perhaps Xcode?) comes with a profiling tool. It is not as precise as the gprof method because it uses sampling, but you can run it on any program without special compilation.
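Besides Shark, Leopard also ships a command-line sampling profiler called sample, which needs no recompilation. For instance, to sample a running process by name for ten seconds (the process name and duration here are just placeholders):

% ./real_program &
% sample real_program 10

It writes out a text report of the call stacks it captured while the process ran.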

Taking a look at your program: since it is doing file handling (only), its timing also depends on whatever caching is in effect. So beware -- your profiling results may vary with cache behaviour.

For the sample code you gave above, if you sample the call stack some number of times, you will basically see these stacks, in some proportion:

-------------------------------------
...
main.c:  4 call _fopen
...        call _main
-------------------------------------
...
main.c:  8 call _fgets
...        call _main
-------------------------------------
...
main.c:  9 call _printf
...        call _main
-------------------------------------
...
main.c: 11 call _fclose
...        call _main

and the proportions will tell you roughly what fraction of the time is spent in each call. You are not likely to see much else because the "exclusive" time is essentially zero compared to the I/O library calls. That's what stack samples can tell you - the precise statements that are costing you the most, and roughly how much, no matter how big the program is.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow