Question

I'm just a beginner with CUDA and Nsight and want to utilize great GPU performance with linear algebra operations (e.g. CUBLAS). I've got a lots of custom code written with the help of Eigen and there are lots of matrix multiplication operations, so I wanted to have my code unchanged, just do those operations on GPU.

I've created a sample project with Visual Studio Nsight and it worked fine, but when I add

#include <Eigen/Dense>

line to that project, I've got following errors

1>------ Build started: Project: MatrixPerformanceCompare, Configuration: Debug Win32 ------
1>  Compiling CUDA source file kernel.cu...
1>  
1>  C:\CUDA\Progs\VS\SampleProject\MatrixPerformanceCompare>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2010 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin"  -Ic:\CUDA\Progs\VS\SampleProject\MatrixPerformanceCompare\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include"  -G   --keep-dir Debug -maxrregcount=0  --machine 32 --compile -cudart static  -g   -DWIN32 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd  " -o Debug\kernel.cu.obj "C:\CUDA\Progs\VS\SampleProject\MatrixPerformanceCompare\kernel.cu" 
1>c:\cuda\progs\vs\sampleproject\matrixperformancecompare\include\eigen\src/Core/Block.h(102): error : "operator=" has already been declared in the current scope
1>c:\cuda\progs\vs\sampleproject\matrixperformancecompare\include\eigen\src/Core/Ref.h(122): error : "operator=" has already been declared in the current scope
1>c:\cuda\progs\vs\sampleproject\matrixperformancecompare\include\eigen\src/Core/products/Parallelizer.h(20): warning : variable "m_maxThreads" was set but never used
1>c:\cuda\progs\vs\sampleproject\matrixperformancecompare\include\eigen\src/Geometry/RotationBase.h(76): error : function template "Eigen::operator*(const Eigen::EigenBase<OtherDerived> &, const Eigen::Quaternion<_Scalar, _Options> &)" has already been defined
1>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA 5.5.targets(592,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2010 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin"  -Ic:\CUDA\Progs\VS\SampleProject\MatrixPerformanceCompare\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include"  -G   --keep-dir Debug -maxrregcount=0  --machine 32 --compile -cudart static  -g   -DWIN32 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd  " -o Debug\kernel.cu.obj "C:\CUDA\Progs\VS\SampleProject\MatrixPerformanceCompare\kernel.cu"" exited with code 2.
========== Build: 0 succeeded, 1 failed, 1 up-to-date, 0 skipped ==========

I know that this is a error connected with the define guards, but those in Eigen seems OK, and in simple c++ project the code with the same Eigen source compiles fine. Could you help me?

Was it helpful?

Solution

The CUDA front end parser for C++ code is not capable of parsing extremely complex host template definitions correctly in all situations. It's job is to look through code in a .cu file and try and split out code which must be compiled by the GPU toolchain from code which should pass through to the host compiler. It is known to fail when importing Boost and QT headers into .cu files. I'll wager the Eigen templates are causing the same problem.

The only solution I am aware of is to refactor your code to separate out the host code that relies on the templates to a different file with a .cc extension. The CUDA front end never sees any code in a .cc file and the problem disappears. In practice, this sort of code splitting isn't really a problem because the host template code can't actually be used inside CUDA GPU code anyway, and at worst you might require a small wrapper function or additional level of abstraction to keep your GPU and host code separate.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top