You are one of numerous people trying to figure out what gprof is telling them, to four decimal places.
I use random pausing and examining the stack.
In the first place, gprof is a "CPU profiler".
That means that during I/O, mutex waits, paging, or any other blocking system call, it is shut off and does not count that time.
You say you're doing nothing of the sort, but it could be happening deep inside some library function.
If it is, gprof masks it.
On the other hand, a single stack sample will show what it is waiting for, with probability equal to the fraction of time it is waiting.
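For a C program you would pause it in a debugger such as gdb and read the backtrace. The same proportionality is easy to demonstrate in Python, which lets one thread sample another's stack. This is only a sketch: the 80/20 split and the names `wait_io` and `compute` are invented for the illustration.

```python
import sys, threading, time, traceback

def wait_io():
    # stands in for a blocking call (I/O, mutex, page fault)
    time.sleep(0.08)

def compute():
    # on-CPU work for roughly 20 ms
    end = time.monotonic() + 0.02
    while time.monotonic() < end:
        pass

def worker():
    for _ in range(20):
        wait_io()    # roughly 80% of the wall time is spent blocked
        compute()    # roughly 20% is spent on-CPU

t = threading.Thread(target=worker)
t.start()

# Sample the worker's stack at a constant frequency.
samples = []
while t.is_alive():
    frame = sys._current_frames().get(t.ident)
    if frame is not None:
        samples.append([f.name for f in traceback.extract_stack(frame)])
    time.sleep(0.005)
t.join()

blocked = sum('wait_io' in s for s in samples) / len(samples)
print(f"{len(samples)} samples, fraction inside wait_io: {blocked:.2f}")
```

The blocked function shows up on most samples, in rough proportion to the time spent waiting in it, which is exactly the time a CPU profiler discards.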
Second, as @keshlam said, it's important to understand "self time".
If it's high, that means the program counter was found in that routine for a good percentage of the CPU time.
However, if it's low, that does not mean the function isn't guilty.
The function could be spending lots of time, but doing it by calling subfunctions.
To see that, you need gprof's "total" column, but as a percent of total time, not as the absolute time per call it gives you.
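The same self-versus-inclusive distinction exists in any profiler. Here is a minimal Python sketch using cProfile, where `tottime` plays the role of gprof's self time and `cumtime` the inclusive total; the functions `parent` and `leaf` are invented for the illustration:

```python
import cProfile, pstats

def leaf():
    # all the real work happens here
    s = 0
    for i in range(200_000):
        s += i * i
    return s

def parent():
    # almost no self time: its cost is entirely in calls to leaf()
    total = 0
    for _ in range(10):
        total += leaf()
    return total

pr = cProfile.Profile()
pr.enable()
parent()
pr.disable()

# Raw entries: {(file, line, name): (callcount, ncalls, tottime, cumtime, callers)}
entries = {key[2]: val for key, val in pstats.Stats(pr).stats.items()}
tottime, cumtime = entries['parent'][2], entries['parent'][3]
print(f"parent: self {tottime:.4f}s, inclusive {cumtime:.4f}s")
```

Judged by self time alone, `parent` looks innocent; its inclusive time tells the real story.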
If you take a stack sample, then any routine will appear on it with probability equal to the fraction of time it is spending.
What's more, you will know exactly why that time is being spent, because the sample will show you the precise line numbers where the calls occur.
ADDED: gprof attempts to handle recursive functions but, as its authors point out, it does not succeed. Stack samples, however, have no problem with recursion. If a sample is taken during a recursive call, the function appears more than once on the stack, possibly many times. Even so, the inclusive time cost of a function, or of any line of code that calls a function, is still simply the fraction of time it is on the stack.
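The counting rule is: per sample, ask whether the function is on the stack at all, not how many frames it occupies. A tiny sketch with made-up samples:

```python
# Four synthetic stack samples from a run containing a recursive function r.
samples = [
    ['main', 'r', 'r', 'r'],  # r is 3 frames deep here, but counts once
    ['main', 'r'],
    ['main', 'other'],
    ['main', 'r', 'r'],
]
# Inclusive cost of r = fraction of samples on which r appears at least once.
inclusive = sum('r' in s for s in samples) / len(samples)
print(inclusive)  # 0.75, even though r occupies 6 frames in total
```

Membership per sample, not frame count, is what measures the cost correctly.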
To see this, suppose samples are taken at a constant frequency, for a total of M samples, and a particular function or line of code appears on fraction F of them.
If that function or line of code could be made to take no time, such as by deleting it, branching around it, or passing it off to an infinitely fast processor, then it would have no exposure to being sampled.
Then the M*F samples on which it had appeared would disappear, shortening execution time by fraction F.
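A quick numeric check of that argument, with M, F, and the seed chosen arbitrarily:

```python
import random

random.seed(1)
M = 10_000   # samples taken at a constant frequency
F = 0.3      # true fraction of time spent on the suspect line

# Each sample lands on the suspect line with probability F.
samples = ['suspect' if random.random() < F else 'other' for _ in range(M)]
hits = samples.count('suspect')
print(f"observed fraction: {hits / M:.3f}")    # close to F

# If the suspect line took no time at all, exactly those samples would
# vanish, and the run would shorten by the same fraction:
remaining = M - hits
print(f"time saved: {1 - remaining / M:.3f}")  # equal to the observed fraction
```

The fraction of samples the line appears on and the speedup available from eliminating it are the same number, which is why a handful of samples is enough to find the big costs.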