CUDA vs DirectX 10 for parallel mathematics. Any thoughts you have about it?

StackOverflow https://stackoverflow.com/questions/625162


Question

CUDA vs DirectX 10 for parallel mathematics. Any thoughts you have about it?


Solution

CUDA is probably the better option if you know your target architecture is using NVIDIA chips. You have complete control over your data transfers, instruction paths, and order of operations. You can also get by with a lot fewer __syncthreads calls when you're working at the lower level.
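To make that concrete, here is a minimal sketch (the kernel, names, and sizes are illustrative, not from the original answer) of the control CUDA gives you: every host-device copy is scheduled explicitly, and each __syncthreads barrier sits exactly where the algorithm needs one.

    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    // Illustrative kernel: each 256-thread block sums its slice of the
    // input in shared memory. Barriers are placed only between the
    // cooperative phases -- at this level, you decide where they go.
    __global__ void blockSum(const float *in, float *out)
    {
        __shared__ float buf[256];
        unsigned int tid = threadIdx.x;
        buf[tid] = in[blockIdx.x * blockDim.x + tid];
        __syncthreads();                     // all loads visible to the block

        for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (tid < s)
                buf[tid] += buf[tid + s];
            __syncthreads();                 // one barrier per reduction step
        }
        if (tid == 0)
            out[blockIdx.x] = buf[0];
    }

    int main(void)
    {
        const int n = 1 << 20, threads = 256, blocks = n / threads;
        float *hIn = (float *)malloc(n * sizeof(float));
        float *hOut = (float *)malloc(blocks * sizeof(float));
        for (int i = 0; i < n; ++i) hIn[i] = 1.0f;

        // Nothing moves between host and device unless you ask it to.
        float *dIn, *dOut;
        cudaMalloc(&dIn, n * sizeof(float));
        cudaMalloc(&dOut, blocks * sizeof(float));
        cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);

        blockSum<<<blocks, threads>>>(dIn, dOut);

        cudaMemcpy(hOut, dOut, blocks * sizeof(float), cudaMemcpyDeviceToHost);
        printf("first block sum: %f\n", hOut[0]);   // expect 256.0

        cudaFree(dIn); cudaFree(dOut);
        free(hIn); free(hOut);
        return 0;
    }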

DirectX 10 will be easier to interface against, I should think, but if you really want to push your speed optimization, you have to bypass that extra layer. DirectX 10 will also not know when to use texture memory versus constant memory versus shared memory as well as you will, depending on your particular algorithm.
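As an example of the placement decision being described, here is a hypothetical kernel (the filter and all names are mine, not from the answer) that puts read-only coefficients used by every thread in constant memory and stages the block's working set in shared memory; a DirectX 10 shader exposes no equivalent choice:

    #include <cuda_runtime.h>

    __constant__ float coeff[9];   // broadcast-cached, read-only: constant memory

    // 1-D 9-tap filter; assumes a launch with 256 threads per block.
    __global__ void filter9(const float *in, float *out, int n)
    {
        __shared__ float tile[256 + 8];    // block-local scratch: shared memory
        int gid = blockIdx.x * blockDim.x + threadIdx.x;

        // Stage this block's slice, plus an 8-element halo, into shared memory.
        tile[threadIdx.x] = (gid < n) ? in[gid] : 0.0f;
        if (threadIdx.x < 8) {
            int halo = gid + blockDim.x;
            tile[blockDim.x + threadIdx.x] = (halo < n) ? in[halo] : 0.0f;
        }
        __syncthreads();

        if (gid < n) {
            float acc = 0.0f;
            for (int k = 0; k < 9; ++k)
                acc += coeff[k] * tile[threadIdx.x + k];
            out[gid] = acc;
        }
    }

On the host you would load the coefficients once with cudaMemcpyToSymbol(coeff, hostCoeff, sizeof(coeff)) before launching.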

If you have access to a Tesla C1060 or something like that, CUDA is the better choice, hands down. You can really speed things up if you know the specifics of your GPGPU: I've seen a 188x speedup in one particular algorithm on a Tesla versus my desktop.

OTHER TIPS

I find CUDA awkward. It's not C, but a subset of it. It doesn't support double-precision floating point natively; doubles are emulated. For single precision it's okay, though. It depends on the type of task you throw at it: you have to spend more time computing in parallel than you spend moving the data around for it to be worth using. But that issue is not unique to CUDA.
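To put rough, illustrative numbers on that (mine, not the answer's): at a few GB/s over PCIe, shipping 100 MB to the card and back costs on the order of tens of milliseconds, so a kernel that only shaves a few milliseconds off the CPU version loses to its own copies. The GPU wins when compute time dwarfs transfer time.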

I'd wait for Apple's OpenCL, which seems like it will be the industry standard for parallel computing.

Well, CUDA is portable across operating systems, unlike DirectX... That's a big win if you ask me...

CUDA itself has nothing to do with whether double-precision floating point operations are supported; that depends on the hardware available. Double precision first appeared in compute capability 1.3 hardware: the GT200-based GeForce 200 series and the Tesla C1060/S1070.
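Rather than memorizing part numbers, you can ask the runtime. This is a small sketch using the standard CUDA runtime API; hardware double precision is present from compute capability 1.3 up:

    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int count = 0;
        cudaGetDeviceCount(&count);
        for (int d = 0; d < count; ++d) {
            cudaDeviceProp p;
            cudaGetDeviceProperties(&p, d);
            // Compute capability >= 1.3 means native double precision.
            int hasDouble = (p.major > 1) || (p.major == 1 && p.minor >= 3);
            printf("device %d: %s (sm_%d%d), double precision: %s\n",
                   d, p.name, p.major, p.minor, hasDouble ? "yes" : "no");
        }
        return 0;
    }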

It should be easy to decide between them.

If your app can tolerate being Windows-specific, you can still consider DirectX Compute. Otherwise, use CUDA or OpenCL.

If your app cannot tolerate vendor lock-in to NVIDIA, you cannot use CUDA; you must use OpenCL or DirectX Compute.

If your app is doing DirectX interop, consider that CUDA/OpenCL will incur context switch overhead doing graphics API interop, and DirectX Compute will not.

Unless one or more of those criteria affect your application, use the great granddaddy of massively parallel toolchains: CUDA.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow