Question

I use gprof to analysis a program running on linux which uses a 25ms interval timer (created by timer_create()) to update a global variable. The signal handler is something like update_every_25ms(), it calls profile_dummy() to increase a grobal variable:

static void update_every_25ms(int sig_nr) { profile_dummy(); }
void profile_dummy(void) { global_var++; }

The timer is created through:

timer_t timer;
struct itimerspec itimer;
timer_create(CLOCK_MONOTONIC, NULL, &timer)
itimer.it_interval.tv_sec = 0;
itimer.it_interval.tv_nsec = 25 * 1000 * 1000;
timer_settime(timer, 0, &itimer, NULL);

My program do a lot of heavy works but gprof shows that the update_every_25ms() functions takes the most of time when CPU usage is 100%, and this function itself takes about 1.53ms per call!

Following are several groups of the gprof output when the interval is 25ms:

%time  cumulative  self     calls  self    total    name
       seconds     seconds         ms/call ms/call
3.72   116.26      7.76     22963  0.34    0.34     profile_dummy

Here CPU usage is 60%. Why does profile_dummy() need 0.34ms per call?

%time  cumulative  self     calls  self    total    name
       seconds     seconds         ms/call ms/call
9.38   38.87       38.87    25349  0.00    0.00     profile_dummy

Here CPU usage is 100%. 38.87s / 25349 = 1.53ms but gprof outputs 0.00, what happend?

%time  cumulative  self     calls  self    total    name
       seconds     seconds         ms/call ms/call
6.21   270.58      57.72 59105     0.00    0.00     profile_dummy

Here CPU usage is 90%. And 57.72s / 59105 = 0.98ms but gprof outputs 0.00 too.

This one is the gprof output when I changed the timer interval to 25s:

%time  cumulative  self     calls  self    total    name
       seconds     seconds         ms/call ms/call
0.01   287.52      0.03  23        1.30    1.30     profile_dummy

The function just increase a global variable, why does it need 1.30ms?

Thanks for any reply.

Était-ce utile?

La solution

I'm just hazarding a guess here.

gprof has a signal interrupt every 10ms, at which time it samples the program counter (PC). The total self time of a routine is the number of samples landing in that routine. Since it also counts calls to the routine, it can get the average self time per call.

Normally, when your program is blocked, like for I/O, sleep, or multitasking suspension, the PC is meaningless, so the interrupts are suspended. That's why gprof cannot see I/O at all.

However, if you have your own timer interrupt, that may only mask the profiler interrupt, not suspend it, causing it to be answered after your timer interrupt is dismissed, which might put the PC preferentially in your interrupt handler.

But that's just a guess.

Unless this is purely an academic question, there are better profilers than gprof.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top