CUDA C++ does not support this kind of function overloading, because in the end the two versions would have the same function signature. The common approach to provide both a host and a device version is to mark the function __host__ __device__ and select the implementation with an #ifdef on __CUDA_ARCH__, e.g.
__host__ __device__ double stdinvcdf(float x)
{
#ifdef __CUDA_ARCH__
/* DEVICE CODE */
#else
/* HOST CODE */
#endif
}
This solution was also discussed in this thread in the NVIDIA developer forum.
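As a more concrete sketch of the same pattern, the function below uses a fast intrinsic on the device and the standard library on the host. The names HD and my_exp are hypothetical and chosen only for illustration; the HD macro is an assumption that lets the same file also build with a plain host compiler, where __host__ and __device__ are not defined.

```cpp
#include <cmath>

// HD is a hypothetical helper macro: under nvcc it expands to the
// execution-space qualifiers, under a plain C++ compiler it expands to nothing.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// my_exp is a hypothetical example function with one signature and two bodies.
HD inline float my_exp(float x)
{
#ifdef __CUDA_ARCH__
    // Device compilation pass: use the fast, reduced-accuracy CUDA intrinsic.
    return __expf(x);
#else
    // Host compilation pass: use the standard library implementation.
    return std::exp(x);
#endif
}
```

Only the function body should depend on __CUDA_ARCH__. The signature itself has to be identical in both compilation passes, so the return type and parameters stay outside the #ifdef.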