CUDA external class linkage and unresolved extern function in ptxas file

https://stackoverflow.com/questions/17188527

01-06-2022

Question

I'm working with CUDA and I have created an int2_ class to deal with complex integer numbers.

The class is declared in the ComplexTypes.h file as follows:

namespace LibraryNameSpace
{
    class int2_ {

        public:
            int x;
            int y;

            // Constructors
            __host__ __device__ int2_(const int,const int);
            __host__ __device__ int2_();
            // etc.

            // Assignments from other types
            __host__ __device__ const int2_& operator=(const int);
            __host__ __device__ const int2_& operator=(const float);
            // etc.

    };
}

The class is implemented in the ComplexTypes.cpp file as follows:

#include "ComplexTypes.h"

__host__ __device__ LibraryNameSpace::int2_::int2_(const int x_, const int y_) { x = x_; y = y_; }
__host__ __device__ LibraryNameSpace::int2_::int2_() {}
// etc.

__host__ __device__ const LibraryNameSpace::int2_& LibraryNameSpace::int2_::operator=(const int a)   { x = a;      y = 0; return *this; }
__host__ __device__ const LibraryNameSpace::int2_& LibraryNameSpace::int2_::operator=(const float a) { x = (int)a; y = 0; return *this; }
// etc.

Everything works well: in the main function (which includes ComplexTypes.h) I can work with int2_ numbers.

In the CudaMatrix.cu file, I now include ComplexTypes.h, then define and explicitly instantiate the __global__ function:

template <class T1, class T2>
__global__ void evaluation_matrix(T1* data_, T2* ob, int NumElements)
{
    const int i = blockDim.x * blockIdx.x + threadIdx.x;
    if(i < NumElements) data_[i] = ob[i];
}

template __global__ void evaluation_matrix(LibraryNameSpace::int2_*,int*,int);

The CudaMatrix.cu file appears to be in the same situation as the main function. Nevertheless, the compiler complains:

Error   19  error : Unresolved extern function '_ZN16LibraryNameSpace5int2_aSEi'    C:\Users\Documents\Project\Test\Testing_Files\ptxas simpleTest

Please consider the following:

Before moving the implementation to separate files, everything was working correctly when including both declarations and implementations in the main file.
The problematic instruction is data_[i] = ob[i].
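
For reference, the mangled symbol in the error above decodes to LibraryNameSpace::int2_::operator=(int), which is consistent with the assignment data_[i] = ob[i] being the problematic instruction. A minimal sketch of how this can be checked follows, assuming a GCC or Clang toolchain that provides <cxxabi.h>; the helper name demangle is hypothetical and not part of the project.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input string unchanged if demangling fails.
std::string demangle(const char* mangled)
{
    int status = 0;
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return mangled;
    std::string result(out);
    std::free(out);   // the buffer was allocated with malloc by __cxa_demangle
    return result;
}
```

Calling demangle("_ZN16LibraryNameSpace5int2_aSEi") should give the operator name quoted above.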

Does anyone have an idea of what is going on?

Solution

The procedure I followed in my post above has two issues:

1. The ComplexTypes.cpp file must be renamed to ComplexTypes.cu so that nvcc treats it as CUDA source and handles the CUDA keywords __device__ and __host__. This was pointed out by Talonmies in his comment. Actually, before posting, I had already tried renaming the file from .cpp to .cu, but the compiler kept showing the same error, so I naively reverted the change.
2. In Visual Studio 2010, one has to go to View -> Property Pages; Configuration Properties -> CUDA C/C++ -> Common -> Generate Relocatable Device Code -> Yes (-rdc=true). This is necessary for separate compilation. Indeed, the NVIDIA CUDA Compiler Driver NVCC documentation says:

CUDA works by embedding device code into host objects. In whole program compilation, it embeds executable device code into the host object. In separate compilation, we embed relocatable device code into the host object, and run the device linker (nvlink) to link all the device code together. The output of nvlink is then linked together with all the host objects by the host linker to form the final executable. The generation of relocatable vs executable device code is controlled by the --relocatable-device-code={true,false} option, which can be shortened to -rdc={true,false}.
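
If separate compilation is not actually required, a different route (not the one taken above) is to define the member functions inline in the header, so that every .cu file including it sees the device code directly and no device linking is involved. The following is only a sketch under assumptions: the include guard name and the macro fallback for non-nvcc compilers are hypothetical additions, and the default constructor zero-initializes the members, which the original does not do.

```cpp
// ComplexTypes.h, header-only variant (a sketch, not the original layout)
#ifndef COMPLEX_TYPES_H   // hypothetical include guard name
#define COMPLEX_TYPES_H

// Assumption: when the header is seen by a plain host compiler rather than
// nvcc compiling CUDA source, the CUDA qualifiers are defined away. This
// presumes the host file does not also include the CUDA runtime headers,
// which define these macros themselves.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

namespace LibraryNameSpace
{
    class int2_ {
        public:
            int x;
            int y;

            // Constructors, defined in the class body (implicitly inline)
            __host__ __device__ int2_() : x(0), y(0) {}
            __host__ __device__ int2_(const int x_, const int y_) : x(x_), y(y_) {}

            // Assignments from other types: the imaginary part is reset to 0
            __host__ __device__ const int2_& operator=(const int a)
            { x = a; y = 0; return *this; }
            __host__ __device__ const int2_& operator=(const float a)
            { x = static_cast<int>(a); y = 0; return *this; }
    };
}

#endif // COMPLEX_TYPES_H
```

With the definitions visible in each translation unit, the kernel in CudaMatrix.cu no longer refers to an external device function, at the cost of recompiling the class wherever the header is included.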

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow