Pure C OpenCL vs Python OpenCL performance

Question 1

It is likely that PyOpenCL is your best choice. I would choose to use C only in very specific situations (a super-critical need for speed/low-latency on the host). For most casual parallel programs, it is fine for the host side to have plenty of slack, because all the real work gets done on the device.

You can consider PyOpenCL and OpenCL to have identical performance on the device.

Maybe use C if you are, like... designing a self-driving car, and every millisecond/amp matters. But even in that situation, it is likely that Python could be used effectively.

The best way to figure out if your specific program is slowed down is to time your code. For PyOpenCL that means:

import time

and

cl.command_queue_properties.PROFILING_ENABLE

Many smart companies and individuals choose to code first in Python, because they can build a flexible, working prototype quickly. If they end up needing more host performance later, it is relatively easy to port Python to C.

Hope that helps!

Question 2

OpenCL uses precompiled programs, that later sent to device for execution. They are so-called "kernels". These kernels are deployed to be executed on end-device. This means main cost that must be measured is OpenCL implementation API I/O. Therefore, you can't rely on memory/CPU measurements, as real OpenCL part will use same of them.

AFAIK, no benchmarks available, but it is not hard to do one, if you will need it (matrix multiplication is hello world example, overall).

OpenCL is not that kind, that uses I/O on every CPU cycle. Field of use - really big data processing, that uses one big input, a lot of processing operations, and one output (no matter small or big). No one says that OpenCL can't be used with many I/O and minimal calculation variations, but implementation API overhead not worth it.

Expectations must be that I/O is pretty same fast in approximation to overall application performance.

Question 3

There is a benchmark here: https://github.com/bennylp/saxpy-benchmark, comparing PyOpenCL against OpenCL as well as other frameworks/methods such as CUDA, plain C++, Numpy, R, Octave, and even TensorFlow (disclaimer: I'm the author)

According to the benchmark results, the performance difference between OpenCL and PyOpenCL varies too wildly. The PyOpenCL GPU target is almost 7x slower than OpenCL, but for the CPU target PyOpenCL is actually more than 2x faster than OpenCL!