Question

I am running the following command in a directory with ~7000 files:

time find . | xargs -P8 -n1 grep -F 'xxx'

The results are:

real    0m1.638s
user    1m1.090s
sys     0m5.080s

I understand very well that (user + sys) can be less than or greater than the real (wall-clock) time, but the following bound should hold:

(user + sys) < real * NCPU

Where NCPU is the number of logical cores on the system: at any instant there should be at most NCPU processes running and being charged either user or sys time. Yet I have 12 logical cores (6 physical cores x 2 hyperthreads), so the bound is 1.638 * 12 ≈ 20 seconds, whereas somehow my command managed to consume more than a minute of CPU time.
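For concreteness, the arithmetic can be checked directly; this is just the question's own `time` output plugged into the claimed bound (NCPU = 12, as stated above):

```python
# Check the bound (user + sys) < real * NCPU against the reported numbers.
real = 1.638
user = 61.090          # 1m1.090s from the `time` output
sys_t = 5.080

ncpu = 12
cpu_time = user + sys_t
bound = real * ncpu

print(f"user + sys = {cpu_time:.3f}s, real * NCPU = {bound:.3f}s")
# The accounted CPU time is more than three times the supposed upper bound.
```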

Subjectively the 1.6s real time is about right (and I tried it on larger directories).

Varying the -P value changes the reported real and sys times, but they plateau somewhere around 8-12.

Note that none of the files contain xxx, but the results are the same if I use a string which gets some hits.


Solution

On Linux, the most common mechanism for accounting "user" and "system" time is based on a periodic timer. This method is inaccurate, especially for short-lived processes:

http://www.cs.rochester.edu/courses/252/spring2014/notes/XX_timing

The kernel uses regular timer interrupts to trigger context switches, deliver SIGALRM signals, track the time of day (literally, by counting timer ticks), schedule bookkeeping tasks, etc.

The kernel keeps, for every process, a count of how many times the timer interrupt handler found it (the process) in (a) user mode or (b) kernel (system) mode. These counts work like the statistical profiling of gprof to give you a good sense, averaged over a long period of time, of what fraction of time a process was running, and what fraction of that it spent in the kernel (system). ... Because the granularity of the statistical sampling is roughly equivalent to a scheduling quantum (~5-20ms), interval timing is terribly inaccurate for times below about 100ms. Even beyond that, it's only good to within about 10%. It also tends to charge processes for some of the overhead of processing timer interrupts. The authors report that on their Linux system this overestimates consumed time by 4-5%.

Also http://www.cs.toronto.edu/~demke/469F.06/Lectures/Lecture5.pdf slides 5, 6, 7 "Accuracy of Interval Counting".
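The effect described above can be sketched with a toy model. The tick frequency (HZ = 250) and the per-process CPU time are assumptions chosen for illustration, not measurements from the question's system:

```python
# Toy model of tick-based accounting: a process that is on the CPU when the
# timer interrupt fires is charged one whole tick, regardless of how little
# of that tick it actually used.
HZ = 250                     # assumed kernel timer frequency -> 4 ms ticks
TICK = 1.0 / HZ

n_procs = 7000               # one short grep per file, as with xargs -n1
true_cpu_each = 0.002        # assume each grep really burns ~2 ms of CPU

# Worst case for short processes: each one is caught by a tick and charged
# a full 4 ms instead of its true 2 ms.
charged = n_procs * TICK
actual = n_procs * true_cpu_each
print(f"charged: {charged:.1f}s  actual: {actual:.1f}s  "
      f"overestimate: {charged / actual:.1f}x")
```

With thousands of sub-tick processes, whole-tick charging alone can double the reported CPU time in this model.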

If we check the basic implementation, we can see that "user" and "system" times are updated in the kernel/timer.c file of the Linux kernel, in the update_process_times function: http://lxr.free-electrons.com/source/kernel/timer.c#L1349

1345 /*
1346  * Called from the timer interrupt handler to charge one tick to the current
1347  * process.  user_tick is 1 if the tick is user time, 0 for system.
1348  */
1349 void update_process_times(int user_tick)
1350 {
1351         struct task_struct *p = current;
1352         int cpu = smp_processor_id();
1353 
1354         /* Note: this timer irq context must be accounted for as well. */
1355         account_process_tick(p, user_tick);  // <<<<<<<<<<< here
1356         run_local_timers();
1357         rcu_check_callbacks(cpu, user_tick);
    ...
1362         scheduler_tick();
1363         run_posix_cpu_timers(p);
1364 }

The update_process_times function is called from tick_nohz_handler, or from tick_sched_timer -> tick_sched_handle (kernel/time/tick-sched.c#L147), and from tick_handle_periodic -> tick_periodic (kernel/time/tick-common.c#L90). I think that in some cases update_process_times may be called more often than once per timer interrupt.
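A back-of-envelope check supports this explanation: dividing the "impossible" excess CPU time by the number of grep processes gives a per-process over-charge of only a few milliseconds, i.e. on the order of one or two timer ticks (the ~7000-process count and the timings are taken from the question; the tick-size range is an assumption about typical HZ values):

```python
# How much over-charge per process would explain the observation?
observed_cpu = 61.090 + 5.080   # user + sys reported by `time`
upper_bound = 1.638 * 12        # real * NCPU: the most CPU time 12 logical
                                # cores could really deliver in 1.638 s
excess = observed_cpu - upper_bound

n_procs = 7000                  # roughly one grep per file (xargs -n1)
per_proc_ms = excess / n_procs * 1000
print(f"~{per_proc_ms:.1f} ms of phantom CPU time per process")
# A few ms per process is roughly 1-2 timer ticks (4-10 ms at HZ=100..250),
# consistent with whole-tick accounting of very short-lived processes.
```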

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow