Ptx assembly aborted due to errors [closed]

https://stackoverflow.com/questions/23106643

04-07-2023

Question

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.

Closed 9 years ago.

I am trying to write a program that multiplies a CSR matrix by a vector in C/CUDA 5.5 with MS Visual Studio 2012, but I ran into a ptxas error. Here is my code:

```
__global__ void multKernelSymm(double* s, double* u, double* val, int* rowPtr, int* colInd)
{
    int l = 0, jl, i;
    int idx = blockDim.x*blockIdx.x;

    l = rowPtr[idx] - 1;
    for ( i = 0; i < (rowPtr[idx + 1] - rowPtr[idx]); i++){
        jl = colInd[l] - 1;
        s[idx] = s[idx] + val[l] * u[jl];
        l+=1;
    }

    l = 0;
    l = rowPtr[idx] - 1;
    for (int i = 0; i < (rowPtr[idx + 1] - rowPtr[idx]); i++){
        jl = colInd[l] - 1;
        if(jl > idx)
            atomicAdd(&s[jl], val[l] * u[idx]);
            //s[jl] = s[jl] + val[l] * u[idx];
        l+=1;
    }
}

__device__ double atomicAdd(double* address, double val)
{
     unsigned long long int* address_as_ull =
     (unsigned long long int*)address;
     unsigned long long int old = *address_as_ull, assumed;
     do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed, __double_as_longlong(val + __longlong_as_double(assumed)));
     } while (assumed != old);
     return __longlong_as_double(old);
}
```

This is the build output I get in Visual Studio:

```
C:\Users\Дмитрий\Documents\Visual Studio 2012\Projects\matrix mult simple\matrix mult simple>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.0\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2012 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin\x86_amd64"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.0\include"  -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile -cudart static  -g   -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd  " -o x64\Debug\kernel.cu.obj "C:\Users\Дмитрий\Documents\Visual Studio 2012\Projects\matrix mult simple\matrix mult simple\kernel.cu"
1>  ptxas C:/Users/AACE~1/AppData/Local/Temp/tmpxft_00001b88_00000000-4_kernel.ptx, line 479; : fatal error : Parsing error near '-': syntax error
1>ptxas : fatal error : Ptx assembly aborted due to errors
1>  kernel.cu
1>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V110\BuildCustomizations\CUDA 6.0.targets(597,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.0\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2012 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin\x86_amd64"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.0\include"  -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile -cudart static  -g   -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd  " -o x64\Debug\kernel.cu.obj "C:\Users\Дмитрий\Documents\Visual Studio 2012\Projects\matrix mult simple\matrix mult simple\kernel.cu"" exited with code -1.
1>
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
```

How can I fix it?

Solution

I was able to get your code to compile by making 3 changes:

1. As @PaulR indicated, remove the double keyword from the call to your custom atomicAdd function. The return type belongs in the function's declaration, not at the call site.
2. The first parameter of atomicAdd must be a pointer to the location you want to update (the address the value is added to). You were passing the value stored at that address instead. The call should look like this:
   ```
   atomicAdd(&(s[jl]), val[l] * u[idx]);
   ```
3. Finally, move the declaration and definition of your custom atomicAdd function above your kernel definition, so that the function is declared before the kernel calls it.

With those changes I could get your code to compile.
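
For reference, here is a minimal sketch of how the file might look with the three changes applied. It is an illustration under some assumptions, not a drop-in replacement. The helper is renamed to atomicAddDouble (a hypothetical name), because CUDA 8 and later already provide a built-in atomicAdd for double on devices of compute capability 6.0 and higher, and a second definition with the same signature would clash there. I also accumulate each row's sum in a local variable and add it to s[idx] atomically. That is my own adjustment, since other blocks may be updating the same element of s at the same time.

```cuda
#include <cuda_runtime.h>

// Change 3: the helper is defined BEFORE the kernel that calls it.
// atomicAddDouble is a hypothetical name chosen to avoid clashing with the
// built-in double atomicAdd on newer toolkits and architectures.
__device__ double atomicAddDouble(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Swap in the new sum only if nobody changed the value in the meantime.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);  // another thread intervened: retry
    return __longlong_as_double(old);
}

// Symmetric CSR matrix-vector product, upper triangle stored, 1-based indices
// (as in the question). idx is computed as in the question, so this sketch
// assumes one thread per block.
__global__ void multKernelSymm(double* s, double* u, double* val, int* rowPtr, int* colInd)
{
    int idx   = blockDim.x * blockIdx.x;
    int first = rowPtr[idx] - 1;
    int count = rowPtr[idx + 1] - rowPtr[idx];

    // Contribution of the stored entries of row idx.
    double sum = 0.0;
    for (int i = 0; i < count; i++) {
        int jl = colInd[first + i] - 1;
        sum += val[first + i] * u[jl];
    }
    atomicAddDouble(&s[idx], sum);

    // Contribution of the mirrored (lower-triangle) entries.
    for (int i = 0; i < count; i++) {
        int jl = colInd[first + i] - 1;
        if (jl > idx) {
            // Changes 1 and 2: no type keyword in the call, and the ADDRESS
            // of s[jl] is passed rather than its value.
            atomicAddDouble(&s[jl], val[first + i] * u[idx]);
        }
    }
}
```

With this layout the kernel would be launched with one block per matrix row and one thread per block, for example multKernelSymm<<<n, 1>>>(s, u, val, rowPtr, colInd) for an n-row matrix, and s must be zeroed beforehand.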

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow