Question

Using the Python built-in profiler with a script running on one processor (and no multithreading):

time python -m cProfile myscript.py

the CPU time reported by the profiler is 345.710 CPU seconds

24184348 function calls (24183732 primitive calls) in 345.710 CPU seconds

and the real, user and sys time are:

real    5m45.926s
user    1m59.340s
sys     0m39.452s

As you can see, the CPU time is nearly the real time (345.710 s = 5m45.710s).

So, given that result, is it safe to assume that the CPU time reported by the profiler includes the time slices used by other processes and the time the process spends blocked? That is, that the profiler's CPU time is not the process time (user + sys) but the wall-clock time, as explained in What do 'real', 'user' and 'sys' mean in the output of time(1)?

Thanks very much in advance.


Solution

This answer beautifully details the meaning of the real, user and sys timings (a short sketch after the quoted list shows how the same quantities can be read from inside a Python process). To quote:

  • 'Real' is wall clock time - time from start to finish of the call. This is all elapsed time including time slices used by other processes and time the process spends blocked (for example if it is waiting for I/O to complete).

  • 'User' is the amount of CPU time spent in user-mode code (outside the kernel) within the process. This is only actual CPU time used in executing the process. Other processes and time the process spends blocked do not count towards this figure.

  • 'Sys' is the amount of CPU time spent in the kernel within the process. This means executing CPU time spent in system calls within the kernel, as opposed to library code, which is still running in user-space. Like 'user', this is only CPU time used by the process.
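
A minimal sketch of how these three quantities can be observed from inside a Python process (an illustration of the definitions above, assuming Linux; os.times() returns the user and system CPU time of the current process, and time.time() gives the wall clock):

import os
import time

start_wall = time.time()

def show(label):
    # os.times() -> (user, sys, children_user, children_sys, elapsed)
    t = os.times()
    print "%-6s user=%.3fs  sys=%.3fs  wall=%.3fs" % (label, t[0], t[1], time.time() - start_wall)

show("start")
time.sleep(2)               # blocked: adds to wall-clock ('real') time only
sum(xrange(3 * 10 ** 7))    # busy work: adds to 'user' time as well
show("end")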

From the above explanation it appears that user + sys should be equal to the CPU seconds reported by the profiler. Instead, it's closer to the 'real' time. Strange!

There is a fair explanation for that. 'User' time does not include the seconds the process spends waiting on I/O operations; it only measures CPU time spent executing user-mode code. The rule of thumb is:

real time = user + sys + I/O time + interpreter startup time + bytecode compilation time

To validate this, I made a urllib2.urlopen(urllib2.Request(url)) call that does intensive I/O. Here are the results:

         100792 function calls (98867 primitive calls) in 2.076 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      512    0.719    0.001    0.719    0.001 {method 'recv' of '_socket.socket' objects}
        1    0.464    0.464    0.473    0.473 {_socket.getaddrinfo}
        1    0.352    0.352    0.352    0.352 <string>:1(connect)
        1    0.248    0.248    0.348    0.348 common_functions.py:1(<module>)
        1    0.022    0.022    0.075    0.075 __init__.py:2(<module>)
        1    0.017    0.017    0.030    0.030 urllib.py:1442(getproxies_macosx_sysconf)
   330/67    0.012    0.000    0.031    0.000 sre_parse.py:385(_parse)


real    0m2.255s
user    0m0.352s
sys     0m0.093s

Here, 2.076 - (0.352 + 0.093) = 1.631 CPU seconds were consumed in I/O operations (mainly _socket.socket recv and _socket.getaddrinfo). The rest of the difference, 2.255 - 2.076 = 0.179 seconds, was spent on the cold start of the code (interpreter startup and bytecode compilation).
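
For reference, a minimal sketch of the kind of script that produces a profile like the one above, run as time python -m cProfile fetch_test.py (the file name and URL are placeholders; any remote resource that triggers a DNS lookup and socket reads will do):

# fetch_test.py
import urllib2

URL = 'http://example.com/'   # placeholder URL

# Blocking network I/O: the DNS lookup, connect and socket recv calls
# dominate the cProfile output, yet barely register as 'user'/'sys' time.
response = urllib2.urlopen(urllib2.Request(URL))
body = response.read()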

Hope that was useful.

Update: On multi-core systems, where multiple CPUs work in parallel, the case is slightly different. The total CPU seconds reported by cProfile is the sum of the time spent by all CPUs individually. For example, on a 2-core system, if one CPU works for 10 seconds while another works for 15 seconds in parallel, the total CPU seconds reported will be 25 seconds, even though the real time elapsed may be just 15 seconds. Hence, because the CPUs work in parallel, CPU time may be greater than real time on multi-core systems.
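
The general effect (CPU seconds summed over cores exceeding wall-clock time) can be seen without cProfile. A minimal sketch, assuming Linux and at least two cores; it spins two worker processes and compares their combined CPU time with the elapsed wall time:

import multiprocessing
import resource
import time

def burn(seconds):
    # Busy-loop so the worker consumes real CPU time on its core.
    end = time.time() + seconds
    while time.time() < end:
        pass

if __name__ == '__main__':
    start = time.time()
    workers = [multiprocessing.Process(target=burn, args=(s,)) for s in (10, 15)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    print "wall time: %.1fs" % (time.time() - start)               # ~15s with 2+ cores
    print "CPU time : %.1fs" % (usage.ru_utime + usage.ru_stime)   # ~25s (10 + 15)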

OTHER TIPS

I've been puzzled by the same problem.

The answer is that cProfile uses wall-clock time, and its output was historically mislabelled but has since been fixed (the 'CPU' in 'CPU seconds' has been removed). I don't know exactly when, but Python 2.6.6 in Debian 6.0 has the bug while Python 2.7.3 in Debian 7.0 is fine.

This is puzzling because most profilers account for time spent on the CPU, not wall-clock time. But this is a "cheap" profiler.

The doc http://docs.python.org/2/library/profile.html#limitations explains (not very clearly) that the timing is based on a ticking wall clock rather than on getrusage() or ptrace techniques.

If you check the code (http://hg.python.org/cpython/file/44f455e6163d/Modules/_lsprof.c) you can verify that it is QueryPerformanceFrequency/gettimeofday based.

Jerrymouse is right that the 'time' output and the cProfile timings cannot coincide, because cProfile only starts running once the code has been compiled. But beyond that, its 'real time' equation is bogus.

The difference between wall-clock and user+sys time can come from many different factors: I/O done on behalf of your process or of any other process, including the kernel itself (swapping, journaling, etc.); CPU time spent in other processes; or waiting on anything on behalf of your process that cannot be accounted for because it is a remote resource (network, or indirect via NFS or iSCSI). You name it.

The Python profiler measures wall time by default, but can be made to measure CPU time using a custom timer function. The following works under Linux, but not Windows (since time.clock measures wall time on Windows):

import cProfile
import time


def idleWait():
    time.sleep(10)                  # blocked: burns wall-clock time, almost no CPU

def busyWait():
    t = time.time() + 10
    while time.time() < t: pass     # spins: burns both wall-clock time and CPU

def target():
    idleWait()
    busyWait()


print "Wall time:"
p = cProfile.Profile()
p.runcall(target)
p.print_stats()

print "CPU time:"
p = cProfile.Profile(time.clock)
p.runcall(target)
p.print_stats()

The first profile run will show 20 seconds elapsed, with about half of them spent in time.sleep. The second shows 10 seconds elapsed, with none of them spent in time.sleep.
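
On Python 3 (where time.clock is deprecated and was removed in 3.8), the same idea would use time.process_time as the CPU-time timer; a sketch adapting the example above (not part of the original answer):

import cProfile
import time

def idle_wait():
    time.sleep(2)                   # blocked: no CPU consumed

def busy_wait():
    end = time.time() + 2
    while time.time() < end:        # burns CPU for ~2 seconds
        pass

def target():
    idle_wait()
    busy_wait()

# Wall-clock profile: ~4s total, about half of it inside time.sleep.
p = cProfile.Profile()
p.runcall(target)
p.print_stats()

# CPU-time profile: ~2s total, time.sleep contributes almost nothing.
p = cProfile.Profile(time.process_time)
p.runcall(target)
p.print_stats()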

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow