Question

According to the CUBLAS reference, the asum function (which computes the sum of the absolute values of a vector's elements) is declared as:

cublasStatus_t  cublasSasum(cublasHandle_t handle, int n, const float *x, int incx, float *result)

The reference explains the parameters; roughly, x is a vector of n elements with a distance of incx between consecutive elements.

My code, heavily simplified (I tested this version too, and it still shows the error):

int arraySize = 10;
float* a = (float*) malloc (sizeof(float) * arraySize);

float* d_a;
cudaMalloc((void**) &d_a, sizeof(float) * arraySize);

for (int i=0; i<arraySize; i++)
    a[i]=0.8f;

cudaMemcpy(d_a, a, sizeof(float) * arraySize, cudaMemcpyHostToDevice);

cublasStatus_t ret;  
cublasHandle_t handle;
ret = cublasCreate(&handle);

float* cb_result = (float*) malloc (sizeof(float));

ret = cublasSasum(handle, arraySize, d_a, sizeof(float), cb_result);

printf("\n\nCUBLAS: %.3f", *cb_result);

cublasDestroy(handle);

I have removed the error checking and the free and cudaFree calls to simplify the code (there are no errors; the CUBLAS functions return CUBLAS_STATUS_SUCCESS).

It compiles and runs without reporting any error, but the printed result is 0, and in the debugger the value is actually 1.QNAN.

What did I miss?


Solution

One of the arguments to cublasSasum is incorrect. The call should look like this:

ret = cublasSasum(handle, arraySize, d_a, 1, cb_result);

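Note that the second-to-last argument, incx, is the stride between consecutive elements measured in elements (here, floats), not in bytes. Passing sizeof(float) asks the routine to read every fourth float, so it runs far past the end of the 10-element array.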
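To make the stride semantics concrete, here is a minimal host-side sketch in plain C of what an asum-style routine computes. The names ref_sasum and last_index_read are hypothetical helpers written for illustration, not part of CUBLAS, and the early return for non-positive n or incx follows the reference BLAS convention rather than anything stated in the question.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical reference for asum (not a CUBLAS function):
 * returns |x[0]| + |x[incx]| + ... + |x[(n-1)*incx]|.
 * incx is counted in elements, not bytes. */
float ref_sasum(int n, const float *x, int incx)
{
    /* Reference BLAS returns 0 for non-positive n or incx. */
    if (n <= 0 || incx <= 0)
        return 0.0f;

    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += fabsf(x[(size_t)i * (size_t)incx]);
    return sum;
}

/* Hypothetical helper: index of the last element read for a
 * given n (assumed >= 1) and incx (assumed >= 1). */
size_t last_index_read(int n, int incx)
{
    return (size_t)(n - 1) * (size_t)incx;
}
```

For the array in the question (ten floats, each 0.8f), ref_sasum(10, a, 1) touches indices 0 through 9 and returns approximately 8.0. With incx = sizeof(float), which is 4 on typical platforms, last_index_read(10, 4) is 36, so the routine would read well outside the 10-element allocation, which is consistent with the garbage result seen on the device.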

Licensed under: CC-BY-SA with attribution