Question

I recently compared two ways of measuring kernel runtime and got some confusing results.

I use an AMD Bobcat CPU (E-350) with an integrated GPU on Ubuntu Linux (CL_PLATFORM_VERSION is OpenCL 1.2 AMD-APP (923.1)).

The basic gettimeofday idea looks like this:

clFinish(...);                   // make sure all earlier tasks on the command queue are finished
gettimeofday(&starttime, NULL);
clEnqueueNDRangeKernel(...);
clFlush(...);
clWaitForEvents(...);
gettimeofday(&endtime, NULL);

This says the kernel needs around 5466 ms.
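
For reference, the wall-clock difference can be computed like this. It is only a minimal sketch: the helper name elapsed_ms is made up for illustration, and it assumes starttime and endtime are the two struct timeval samples taken above.

```c
#include <assert.h>
#include <sys/time.h>

/* Elapsed wall-clock time between two gettimeofday() samples, in milliseconds.
 * elapsed_ms is a hypothetical helper name, not part of any library. */
static double elapsed_ms(struct timeval start, struct timeval end)
{
    /* Work in whole microseconds so that a smaller tv_usec in `end`
     * (for example 16.366 s after 10.900 s) is handled correctly. */
    long long us = (long long)(end.tv_sec - start.tv_sec) * 1000000LL
                 + (long long)(end.tv_usec - start.tv_usec);

    /* gettimeofday() is not monotonic; a negative difference would mean
     * the system clock was stepped backwards between the two samples. */
    assert(us >= 0);

    return (double)us / 1000.0;
}
```

It would be called as elapsed_ms(starttime, endtime) right after the second gettimeofday call.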

For the second measurement I used clGetEventProfilingInfo to read the QUEUED / SUBMIT / START / END timestamps.

With these four timestamps I can calculate the time spent in the different states:

  • time spent queued: 0.06 ms,
  • time spent submitted: 2733 ms,
  • time spent in execution: 2731 ms (actual execution time).

I see that it adds up to roughly the 5466 ms, but why does the kernel stay in the submitted state for half the time?
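
The arithmetic behind these three numbers is roughly the following. This is a sketch under assumptions: the struct and function names are invented for illustration, and the four fields stand for the cl_ulong nanosecond values that clGetEventProfilingInfo returns for CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END on a queue created with CL_QUEUE_PROFILING_ENABLE. The code itself does not call OpenCL, so it can be checked without a device.

```c
#include <assert.h>
#include <stdint.h>

/* The four profiling timestamps of one event, in nanoseconds.
 * Hypothetical container; in real host code each field would be filled
 * by one clGetEventProfilingInfo() query on the kernel's event. */
typedef struct {
    uint64_t queued;  /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit;  /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start;   /* CL_PROFILING_COMMAND_START  */
    uint64_t end;     /* CL_PROFILING_COMMAND_END    */
} ProfilingTimes;

/* Time spent in each state, in nanoseconds. */
typedef struct {
    uint64_t queued_ns;     /* QUEUED -> SUBMIT */
    uint64_t submitted_ns;  /* SUBMIT -> START  */
    uint64_t running_ns;    /* START  -> END    */
} StateDurations;

static StateDurations state_durations(ProfilingTimes t)
{
    /* The timestamps are expected to be ordered; unsigned subtraction
     * would wrap around silently if they were not. */
    assert(t.queued <= t.submit && t.submit <= t.start && t.start <= t.end);

    StateDurations d = {
        t.submit - t.queued,
        t.start  - t.submit,
        t.end    - t.start
    };
    return d;
}

/* Convert nanoseconds to milliseconds for printing. */
static double ns_to_ms(uint64_t ns)
{
    return (double)ns / 1e6;
}
```

Comparing ns_to_ms(t.end - t.queued) against the gettimeofday result is how I checked that the three states together account for the whole wall-clock time.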

And the odd things are:

  • the time spent in the submitted state is always about equal to the actual execution time, i.e. half of the total, even for different kernels or different workloads (so it can't be a constant setup time),

  • for the CPU device the time spent in the submitted state is 0 and the execution time is equal to the gettimeofday result,

  • I tested my kernels on an Intel Ivy Bridge with Windows using CPU and GPU and I didn't see these effects there.

Does anyone have a clue?

I suspect that either the GPU runs the kernel twice (resulting in gettimeofday being double of the actual execution time) or that the function clGetEventProfilingInfo is not working correctly for the AMD GPU.


Solution

I posted the problem in an AMD forum. They say it's a bug in the AMD profiler.

http://devgurus.amd.com/thread/159809

Licensed under: CC-BY-SA with attribution