CUDA/PTX 32-bit vs. 64-bit

Question

Pointers are certainly the most obvious difference. 64 bit machine model enables 64-bit pointers. 64 bit pointers enable a variety of things, such as address spaces larger than 4GB, and unified virtual addressing. Unified virtual addressing in turn enables other things, such as GPUDirect Peer-to-Peer. The CUDA IPC API also depends on 64 bit machine model.

The x64 ISA is not completely different than the x86 ISA, it's mostly an extension of it. Those familiar with the x86 ISA will find the x64 ISA familiar, with natural extensions for 64-bits where needed. Likewise 64 bit machine model is an extension of the capabilities of the PTX ISA to 64-bits. Most PTX instructions work exactly the same way.

32 bit machine model can handle 64 bit data types (such as double and long long), so frequently there don't need to be any changes to properly written CUDA C/C++ source code to compile for 32 bit machine model or 64 bit machine model. If you program directly in PTX, you may have to account for the pointer size differences, at least.