Question

I have to time the clock_gettime() function itself in order to estimate and profile other operations. It's for homework, so I can't use a profiler and have to write my own code.

The way I'm doing it is like below:

clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &begin);

for (int i = 0; i <= n; i++)
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);

cout << (end.tv_nsec - begin.tv_nsec) / n; // time per clock_gettime() call

The problem is that the output keeps decreasing as n grows: with n = 100 the output is 370.63 ns, with n = 100000 it's 330 ns, with n = 1000000 it's 260 ns, with n = 10000000 it's 55 ns, and so on.

I understand that this is happening because of instruction caching, but I don't know how to handle it when profiling. For example, when I estimate the time for a function call using clock_gettime(), how would I know how much time clock_gettime() itself used?

Would taking a weighted mean of all these values be a good idea? (I could run the operation I want to measure the same number of times, take a weighted mean of that, subtract the weighted mean of clock_gettime(), and get a good estimate of the operation irrespective of caching.)

Any suggestions are welcome.

Thank you in advance.


Solution

When you compute the time difference as (end.tv_nsec - begin.tv_nsec) / n, you take only the nanoseconds part of the elapsed time into account. You must also include the seconds, since the tv_nsec field holds only the fractional part of a second:

int64_t end_ns = ((int64_t)end.tv_sec * 1000000000) + end.tv_nsec;
int64_t begin_ns = ((int64_t)begin.tv_sec * 1000000000) + begin.tv_nsec;
int64_t elapsed_ns = end_ns - begin_ns;

In fact, with your current code you will sometimes get negative results: whenever a second boundary is crossed, end's tv_nsec wraps around and can be smaller than begin's tv_nsec.

Fix that, and you'll be able to observe much more consistent results.


Edit: for the sake of completeness, here's the code I used for my tests, which gives me very consistent results (between 280 and 300 ns per call, regardless of the number of iterations):

#include <cstdint>
#include <ctime>
#include <iostream>

int main() {
  const int loops = 100000000;

  struct timespec begin;
  struct timespec end;
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &begin);

  for(int i = 0; i < loops; i++)
      clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);

  int64_t end_ns = ((int64_t)end.tv_sec * 1000000000) + end.tv_nsec;
  int64_t begin_ns = ((int64_t)begin.tv_sec * 1000000000) + begin.tv_nsec;
  int64_t elapsed_ns = end_ns - begin_ns;
  int64_t ns_per_call = elapsed_ns / loops;
  std::cout << ns_per_call << std::endl;
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow