Question

I'm interested in comparing the CPU time of some code portions written in C++ versus Python (running on Linux). Will the following methods produce a "fair" comparison between the two?

Python

Using the resource module:

import resource
def cpu_time():
    return (resource.getrusage(resource.RUSAGE_SELF)[0] +  # time in user mode
            resource.getrusage(resource.RUSAGE_SELF)[1])   # time in system mode

which allows for timing like so:

def timefunc( func ):
    start=cpu_time()
    func()
    return (cpu_time()-start)

Then I test like:

def f():
    for i in range(int(1e6)):
        pass

avg = 0
for k in range(10):
    avg += timefunc( f ) / 10.0
print avg
=> 0.002199700000000071

C++

Using the ctime lib:

#include <ctime>
#include <iostream>

int main() {
    double avg = 0.0;
    int N = (int) 1e6;
    for (int k=0; k<10; k++) {
        clock_t start;
        start = clock();
        for (int i=0; i<N; i++) continue;
        avg += (double)(clock()-start) / 10.0 / CLOCKS_PER_SEC;
    }
    std::cout << avg << '\n';
    return 0;
}

which yields 0.002.

Concerns:

  1. I've read that C++'s clock() measures CPU time, which is what I'm after, but I can't seem to find whether it includes both user and system time.
  2. Results from C++ are much less precise. Why is that?
  3. The overall fairness of the comparison, as mentioned above.

Update

Updated the C++ code as per David's suggestion in the comments:

#include <sys/resource.h>
#include <iostream>

int main() {
    double avg = 0.0;
    int N = (int) 1e6;
    struct rusage usage;
    struct timeval ustart, ustop, sstart, sstop;

    for (int k=0; k<10; k++) {
        getrusage(RUSAGE_SELF, &usage);   // user/system times at the start of the trial
        ustart = usage.ru_utime;
        sstart = usage.ru_stime;

        for (int i=0; i<N; i++) continue;

        getrusage(RUSAGE_SELF, &usage);   // user/system times at the end of the trial
        ustop = usage.ru_utime;
        sstop = usage.ru_stime;

        avg += (
            (ustop.tv_sec+ustop.tv_usec/1e6+
            sstop.tv_sec+sstop.tv_usec/1e6)
            -
            (ustart.tv_sec+ustart.tv_usec/1e6+
            sstart.tv_sec+sstart.tv_usec/1e6)
        ) / 10.0; 
    }

    std::cout << avg << '\n';

    return 0;
}

Running:

g++ -O0 cpptimes.cpp ; ./a.out
=> 0.0020996
g++ -O1 cpptimes.cpp ; ./a.out
=> 0

So I suppose getrusage gets me a little bit better resolution, but I'm not sure how much I should read into it. Setting the optimization flag certainly makes a big difference.

Solution

The documentation says:

"Returns the approximate processor time used by the process since the beginning of an implementation-defined era related to the program's execution. To convert result value to seconds divide it by CLOCKS_PER_SEC."

That's pretty vague. CLOCKS_PER_SEC is set to 10^6, and the "approximate" stands for poor resolution, not that modern clocks tick over 1000 times faster and the results are rounded. That may not be a very technical term, but it is appropriate. The actual resolution everywhere I tested was about 100 Hz = 0.01 s, and it has been like that for years; note the date on http://www.guyrutenberg.com/2007/09/10/resolution-problems-in-clock/.
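
To check what clock() actually resolves to on a given machine, a quick probe along these lines (a sketch, not part of the original post) spins until the value ticks over and prints the step size:

#include <ctime>
#include <iostream>

int main() {
    // Spin until clock() ticks over and print the size of each step,
    // i.e. the effective granularity of clock() on this machine.
    for (int n = 0; n < 5; ++n) {
        std::clock_t t0 = std::clock();
        std::clock_t t1;
        while ((t1 = std::clock()) == t0) { /* busy wait */ }
        std::cout << "step: " << double(t1 - t0) / CLOCKS_PER_SEC << " s\n";
    }
    return 0;
}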

Then the doc follows with: "On POSIX-compatible systems, clock_gettime with clock id CLOCK_PROCESS_CPUTIME_ID offers better resolution."
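
A minimal sketch of that POSIX route, assuming Linux/glibc; the million-iteration loop and the volatile sink are just placeholders to give the clock something to measure:

#include <time.h>     // POSIX clock_gettime / clock_getres
#include <cstdio>

int main() {
    // Report the advertised resolution of the per-process CPU-time clock.
    timespec res;
    clock_getres(CLOCK_PROCESS_CPUTIME_ID, &res);
    std::printf("resolution: %ld ns\n", (long)res.tv_nsec);

    // Time a busy loop with CPU time rather than wall-clock time.
    timespec start, stop;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
    volatile long sink = 0;
    for (long i = 0; i < 1000000; ++i) sink += i;   // keeps the loop alive under -O2
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &stop);

    double cpu = (stop.tv_sec - start.tv_sec)
               + (stop.tv_nsec - start.tv_nsec) / 1e9;
    std::printf("cpu time: %.9f s\n", cpu);
    return 0;
}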

So:

  1. It's CPU time only, but 2 threads = 2×CPU time, because clock() keeps counting for every thread in the process. See the example on cppreference, and the sketch after this list.

  2. It is not suited for fine-grained measurements at all, as explained above. You were right at the edge of its accuracy.

  3. IMO measuring wall-clock time is the only sensible thing, but that's a rather personal opinion, especially with multithreaded applications and multiprocessing in general. Otherwise the results for system + user time should be similar anyway.
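
As a rough illustration of point 1, here is a sketch along the lines of the cppreference example: two busy threads accumulate process CPU time roughly twice as fast as the wall clock. The burn() helper and the one-second duration are arbitrary choices:

#include <chrono>
#include <ctime>
#include <iostream>
#include <thread>

// Busy-wait for roughly `seconds` of wall-clock time.
static void burn(double seconds) {
    auto end = std::chrono::steady_clock::now()
             + std::chrono::duration<double>(seconds);
    volatile unsigned long sink = 0;
    while (std::chrono::steady_clock::now() < end) sink = sink + 1;
}

int main() {
    auto wall_start = std::chrono::steady_clock::now();
    std::clock_t cpu_start = std::clock();

    std::thread t1(burn, 1.0), t2(burn, 1.0);
    t1.join();
    t2.join();

    double cpu  = double(std::clock() - cpu_start) / CLOCKS_PER_SEC;
    double wall = std::chrono::duration<double>(
                      std::chrono::steady_clock::now() - wall_start).count();

    std::cout << "wall: " << wall << " s, cpu: " << cpu << " s\n";
    return 0;
}

Build with something like g++ -O2 -pthread; the reported CPU time should come out near twice the wall time.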

EDIT: Regarding 3: this of course holds for computational tasks. If your process sleeps or otherwise gives execution back to the system, measuring CPU time might be more sensible. Also, regarding the comment that clock()'s resolution is, erm, bad: it is, but to be fair one could argue that you should not be measuring such short computations in the first place. IMO it's too bad, but if you measure times over a few seconds I guess it's fine. I would personally use other available tools.
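
And the gap in the other direction, again just a sketch: a process that spends its time sleeping racks up wall-clock time but almost no CPU time:

#include <chrono>
#include <ctime>
#include <iostream>
#include <thread>

int main() {
    auto wall_start = std::chrono::steady_clock::now();
    std::clock_t cpu_start = std::clock();

    // Give execution back to the system for a second.
    std::this_thread::sleep_for(std::chrono::seconds(1));

    double cpu  = double(std::clock() - cpu_start) / CLOCKS_PER_SEC;
    double wall = std::chrono::duration<double>(
                      std::chrono::steady_clock::now() - wall_start).count();

    // Expect wall to be about 1 s while cpu stays near 0.
    std::cout << "wall: " << wall << " s, cpu: " << cpu << " s\n";
    return 0;
}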

OTHER TIPS

"Setting the optimization flag certainly makes a big difference."

C++ is a language that begs to be compiled with optimizations enabled, particularly if the code in question uses containers and iterators from the C++ standard library. A simple ++iterator shrinks from a good-sized chain of function calls when compiled unoptimized to one or two assembly statements when optimization is enabled.

That said, I knew what the compiler would do to your test code. Any decent optimizing compiler will make that for (int i=0; i<N; i++) continue; loop vanish. It's the as-if rule at work. That loop does nothing, so the compiler is free to treat it as if it wasn't even there.

When I look at the CPU behavior of a suspect CPU hog, I write a simple driver (in a separate file) that calls the suspect function a number of times, sometimes a very large number of times. I compile the functionality to be tested with optimization enabled, but I compile the driver with optimization disabled. I don't want a too-smart optimizing compiler to see that those 100,000 calls to function_to_be_tested() can be pulled out of the loop and then further optimize the loop away.

There are a number of solid reasons for calling the test function many times between the single calls that start and stop the timer. This is why Python has the timeit module.
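
A sketch of that driver pattern in C++, where function_to_be_tested(), the file names, and the workload are all placeholders; the repetitions amortize both the timer's coarse resolution and the per-call overhead:

// work.cpp -- the code under test, compiled with optimization:
//   g++ -O2 -c work.cpp
#include <cmath>

double function_to_be_tested(double x) {
    return std::sqrt(x) * std::sin(x);   // placeholder workload
}

// driver.cpp -- the driver, compiled unoptimized so the loop survives:
//   g++ -O0 -c driver.cpp && g++ work.o driver.o -o bench
#include <ctime>
#include <iostream>

double function_to_be_tested(double x);  // defined in work.cpp

int main() {
    const int reps = 100000;
    volatile double sink = 0.0;          // keep the results observable

    std::clock_t start = std::clock();
    for (int i = 0; i < reps; ++i)
        sink = sink + function_to_be_tested(i * 0.001);
    std::clock_t stop = std::clock();

    double total = double(stop - start) / CLOCKS_PER_SEC;
    std::cout << "total: " << total << " s, per call: "
              << total / reps * 1e9 << " ns\n";
    return 0;
}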

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow