Question

I would like to organise my CUDA code into separate object files to be linked at the end of compilation, as in C++. To that end I'd like to be able to declare an extern pointer to __constant__ memory in a header file and put the definition in one of the .cu files, also following the pattern from C++. But it seems that when I do so, nvcc ignores the 'extern': it treats each declaration as a definition. Is there a way around this?

To be more specific about the code and the errors, I have this in a header file:

extern __device__ void* device_function_table[];

followed by this in a .cu file:

void* __device__ device_function_table[200];

which gives this error on compiling:

(path).cu:40: error: redefinition of ‘void* device_function_table [200]’
(path).hh:29: error: ‘void* device_function_table [200]’ previously declared here

My current workaround is to use Makefile magic to glob all my .cu files together into what is effectively one big translation unit, while keeping some semblance of file organisation. But this is already slowing down compiles noticeably, since a change to any one of my classes means recompiling all of them, and I anticipate adding several more classes.

Edit: I see I put __constant__ in the text and __device__ in the example; the question applies to both.


Solution

From the CUDA C Programming Guide version 4.0, section D.2.1.1:

The __device__, __shared__ and __constant__ qualifiers are not allowed on:

  • class, struct, and union data members,
  • formal parameters,
  • local variables within a function that executes on the host.

__shared__ and __constant__ variables have implied static storage.

__device__ and __constant__ variables are only allowed at file scope.

__device__, __shared__ and __constant__ variables cannot be defined as external using the extern keyword. The only exception is for dynamically allocated __shared__ variables as described in Section B.2.3.
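For completeness, here is a minimal sketch of that one exception: a dynamically allocated __shared__ array declared extern inside a kernel, with its size supplied at launch time. The kernel and variable names are illustrative, not taken from the original post.

// scale.cu -- illustrative sketch of the extern __shared__ exception
__global__ void scale(const float* in, float* out, int n, float factor)
{
    // Size is unknown at compile time; it comes from the third
    // launch-configuration parameter (bytes of dynamic shared memory).
    extern __shared__ float sdata[];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        sdata[threadIdx.x] = in[i];   // stage the input through shared memory
    __syncthreads();                  // synchronise outside the bounds check
    if (i < n)
        out[i] = sdata[threadIdx.x] * factor;
}

// Launched as, e.g.:
// scale<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, n, 2.0f);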

Other tips

To cut a long story short: with a recent CUDA toolkit (I'm on v8) and compute capability of at least 2.0, in Visual Studio go to Project Properties -> CUDA C/C++ -> Common, find "Generate Relocatable Device Code" in the list, and set it to "Yes (-rdc=true)".

For the command line, this page suggests the -dc compiler option.
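As a rough sketch of that command-line workflow (the file names a.cu and b.cu are placeholders): -dc compiles each translation unit to an object containing relocatable device code, and nvcc then performs the device link when building the final binary.

# compile each translation unit with relocatable device code
nvcc -arch=sm_20 -dc a.cu -o a.o
nvcc -arch=sm_20 -dc b.cu -o b.o

# let nvcc do the device link and the host link in one step
nvcc -arch=sm_20 a.o b.o -o app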

Since CUDA 5.0, it is possible to have externally defined device data, provided separate compilation and linking are enabled. This blog post explains it: http://devblogs.nvidia.com/parallelforall/separate-compilation-linking-cuda-device-code/

If this is done, the extern declarations can be used exactly as in the original post, and it 'just works'.
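For example, a layout along these lines should build once relocatable device code is enabled. The file names are hypothetical, and the extra __constant__ variable is only there to show that the same pattern covers both qualifiers mentioned in the question; every .cu file involved must be compiled with -dc (or -rdc=true in Visual Studio) and device-linked.

// table.hh -- declarations only
extern __device__ void* device_function_table[];
extern __constant__ int table_size;

// table.cu -- the one and only definition
#include "table.hh"
__device__ void* device_function_table[200];
__constant__ int table_size = 200;

// Any other .cu file simply includes table.hh and refers to the symbols.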

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow