Your code had several errors. As I mentioned in the comments, you missed the fact that neither your device write nor your device read message was being printed, because those functions (cublasSetMatrix, cublasGetMatrix) were in fact failing.
To fix the cublasSetMatrix and cublasGetMatrix calls, change the lda and ldb parameters to 1:
status = cublasSetMatrix(1, N, sizeof(R[0]), R, 1, dR, 1);
...
status = cublasGetMatrix (1, N, sizeof(dR[0]), dR, 1, R, 1);
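The documentation for these functions says: "with the leading dimension of the source matrix A and destination matrix B given in lda and ldb, respectively. The leading dimension indicates the number of rows of the allocated matrix". Here the vector is passed as a matrix with 1 row and N columns, so the leading dimension of both the source and the destination is 1.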
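To see why the leading dimension matters, here is a small plain-C sketch of column-major addressing. The helper names (colmajor_offset, last_offset) are made up for illustration and are not part of cuBLAS; this only models how an element's position depends on the leading dimension, not how the library is implemented.

```c
#include <stddef.h>

/* Column-major offset of element (row, col) in a matrix whose
 * consecutive columns are ld elements apart in memory.
 * ld is the "leading dimension". Hypothetical helper, not a cuBLAS call. */
static size_t colmajor_offset(size_t row, size_t col, size_t ld)
{
    return row + col * ld;
}

/* Largest offset touched when walking a rows x cols matrix with
 * leading dimension ld. The buffer needs at least last_offset + 1
 * elements. Assumes rows >= 1 and cols >= 1. */
static size_t last_offset(size_t rows, size_t cols, size_t ld)
{
    return colmajor_offset(rows - 1, cols - 1, ld);
}
```

For a 1 x 8 matrix, last_offset(1, 8, 1) is 7, which fits an 8-element array. With a leading dimension of 8 instead, last_offset(1, 8, 8) is 56, well past the end of the same array.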
In the line that prints the result of the cublasSasum operation, your printf statement uses an int format specifier (%d) to print a float value. The format specifier has to match the argument type, so this won't print the correct value. Change the %d to %f:
fprintf(stderr, "\ncublasSasum produced no error. Sum of dR: %f\n", ans);
With those changes, I was able to get a sensible result:
Values of R:
0.123020, 0.367809, 0.834681, 0.035096, 0.517014, 0.662984, 0.426221, 0.104678,
CUBLAS initialization succeeded.
Device memory allocation succeeded.
Device write succeeded.
cublasSasum produced no error. Sum of dR: 3.071503
cublasSaxpy produced no error.
Device read succeded
Values of R, after cublasSaxpy:
0.369060, 1.103427, 2.504043, 0.105288, 1.551042, 1.988952, 1.278663, 0.314034,
Zeroing with cudaMemset on R produced no error.
Device read succeded.
Values of R, after zeroing with cudaMemset:
0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
Note that this SO question/answer provides a tip for a useful, convenient cublas error parser function. It's not difficult to build this into a wrapper or error check macro for your cublas function calls.