Answering the question:
Yes, it is possible, and you can do it without major problems using the tools mentioned in the comments (for example, JCuda and JOCL). A quick Google search will turn up many free wrappers that make CUDA and OpenCL available from Java.
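As for the fail-safe: in JCuda, initializing CUDA on a system without a CUDA-capable device fails with an error code (typically CUDA_ERROR_NO_DEVICE from cuInit, while CUDA_ERROR_INVALID_DEVICE is what you get when requesting a device that does not exist), and if no CUDA driver is installed at all, the native library may fail to load. JOCL reports a similar error during initialization. You can then simply pick the backend that did not fail, or the one that suits you best, and as a last resort fall back to plain Java code running on the CPU.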
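A minimal sketch of that fallback order in plain Java is shown below. The `Backend` class, the `select` method and the probes are hypothetical names for illustration. In a real project each probe would wrap the wrapper's own initialization (for example `cuInit` in JCuda or `clGetPlatformIDs` in JOCL) and report failure through an error code, an exception, or a device count of zero. Here the probes are stand-ins so the example runs without either library.

```java
import java.util.List;
import java.util.concurrent.Callable;

public class BackendSelector {

    /** Hypothetical holder: a backend label plus a probe that tries to initialize it. */
    static final class Backend {
        final String name;
        final Callable<Boolean> probe; // true = usable; false or an exception = unusable

        Backend(String name, Callable<Boolean> probe) {
            this.name = name;
            this.probe = probe;
        }
    }

    /**
     * Tries each backend in order of preference and returns the name of the
     * first one whose probe reports success; returns "CPU" if all of them fail.
     */
    static String select(List<Backend> candidates) {
        for (Backend b : candidates) {
            try {
                if (Boolean.TRUE.equals(b.probe.call())) {
                    return b.name;
                }
            } catch (Exception | LinkageError e) {
                // An initialization error surfaced as an exception, or a missing
                // native library (UnsatisfiedLinkError is a LinkageError), means
                // this backend is unavailable; move on to the next candidate.
            }
        }
        return "CPU"; // plain-Java fallback
    }

    public static void main(String[] args) {
        // Stand-in probes simulating a machine with OpenCL but no CUDA device.
        Backend cuda = new Backend("CUDA", () -> {
            throw new IllegalStateException("simulated: no CUDA device");
        });
        Backend openCl = new Backend("OpenCL", () -> true);
        System.out.println(select(List.of(cuda, openCl))); // prints OpenCL
    }
}
```

Putting the candidates in order of preference turns "the best one for you" into a matter of list order. Note that this extra layer is itself part of the cost of supporting more than one backend.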
However, I don't understand the motivation behind your question, because I have not found any situation where OpenCL was slower than CUDA, at least not with the latest versions of the standards. In my own use, OpenCL has even been faster in some cases (with differences of around ±5%). Of course, you need to implement both properly; otherwise, one of them will be heavily penalized by a poor implementation.
You would be better off using just one of the two options: either CUDA (if you find it easier and it gives you good performance without headaches) or OpenCL (for flexibility). Using both, maintaining both, correctly selecting the right one for each case, and on top of that dealing with the fail-safe code will make your project far more difficult.