Question

In my experience, applications written in CUDA run faster than those written in OpenCL when run on the same NVidia hardware.

How can this performance advantage be used without losing the cross-platform capabilities of OpenCL?

I suspect it may be possible to create a "fallback" system where, if there are no NVidia devices available and/or no CUDA version of the requested kernel, the system falls back to the OpenCL version. Alternatively, large tasks could be load balanced across NVidia and non-NVidia hardware. Ideally such an application would be cross-platform and would also work on machines that don't have NVidia hardware available.

As far as I can tell, this boils down to being able to utilize CUDA support as dynamic libraries (dll/.so). I am already using JOCL to access OpenCL but I don't see how I would be able to bind to kernels generated with CUDA as all examples I'm able to find are stand-alone applications.

Are there any open-source examples of such systems?

Are there any technical limitations that make developing such a hybrid application impossible?


Solution

Answering the question:

Developing such a hybrid application is possible, and the tools mentioned in the comments (for example, JCUDA and JOCL) are enough to do it. A quick Google search turns up many free wrappers that expose CUDA and OpenCL to Java.

As for the fallback, JCUDA returns CUDA_ERROR_INVALID_DEVICE when CUDA is initialized on a non-CUDA system, and JOCL gives a similar error at its initialization stage. You can then simply select whichever one did not fail, or the best one for your case (or, as a last resort, CPU-only code in Java).
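The selection logic described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the `BackendSelector` and `Backend` names are made up for this example, and the probes are stand-ins. In a real application the CUDA probe would wrap the JCUDA initialization call and the OpenCL probe would wrap the JOCL platform query, each reporting failure by throwing or by returning false.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, not part of JCUDA or JOCL.
public class BackendSelector {

    /** A compute backend: a name plus a probe that reports whether it can be used. */
    public static final class Backend {
        final String name;
        final Supplier<Boolean> probe;

        public Backend(String name, Supplier<Boolean> probe) {
            this.name = name;
            this.probe = probe;
        }
    }

    /** Runs the probe and treats any initialization failure as "not available". */
    static boolean isUsable(Backend backend) {
        try {
            return Boolean.TRUE.equals(backend.probe.get());
        } catch (LinkageError | RuntimeException e) {
            // LinkageError covers a missing native library (UnsatisfiedLinkError);
            // RuntimeException covers an error raised by a failed init call.
            return false;
        }
    }

    /** Returns the first usable backend in preference order, else plain Java on the CPU. */
    public static String select(List<Backend> preferred) {
        for (Backend backend : preferred) {
            if (isUsable(backend)) {
                return backend.name;
            }
        }
        return "java-cpu";
    }

    public static void main(String[] args) {
        // Stand-in probes: the CUDA one simulates a machine without the CUDA driver.
        Backend cuda = new Backend("cuda", () -> {
            throw new UnsatisfiedLinkError("CUDA driver library not found");
        });
        Backend opencl = new Backend("opencl", () -> true);

        System.out.println(select(List.of(cuda, opencl))); // prints "opencl"
    }
}
```

Putting CUDA first in the list expresses the "prefer CUDA, otherwise OpenCL" policy from the question; reordering the list changes the preference without touching the selection code.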

However, I don't understand the premise of your question, since I couldn't find any situation where OpenCL was slower than CUDA, at least not with the latest versions of the standards. In my own usage the two are within about 5% of each other, and in some cases OpenCL is even the faster one. Of course, you need to implement both properly; otherwise one of them will be heavily penalized by a poor implementation.

You would be better off using just one of the two options: either CUDA (if you find it easier and it gives you good performance without any headaches) or OpenCL (for flexibility). Using both, maintaining both, choosing the right one for each case, and dealing with the fallback code will make your project terribly difficult.

OTHER TIPS

Maybe also have a look at OpenCL, which in theory should be a bit more cross-platform and also lets you run transparently on different processors (read: GPU and/or CPU, as available).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow