Why does cublasGetVector give this result? [closed]

https://stackoverflow.com/questions/13144698

cuda
cublas

21-07-2021

Question

I'm having a hard time understanding how an array's dimensions are organized in cublas. I ran the following test, but I can't explain the output. Thanks for any help!

#include <stdio.h>
#include <stdlib.h>
#include <cublas.h>

#define DIMX 5
#define DIMY 5
#define ROW 2
#define COL 3

typedef int TYPE;

void print_matrix(TYPE * v)
{
    int i,j;
    for (i=0; i<DIMX; i++)
    {
        for (j=0; j<DIMY; j++) printf("%5d ",v[i*DIMY+j]);
        printf("\n");
    }
}

    int main()
    {
        printf("Hello world!\n");

        int i;
        //Initialize the array
        TYPE v[DIMX*DIMY];
        for (i=0; i<DIMX*DIMY; i++) v[i]=i+1;
        printf("Before:\n");
        print_matrix(v);

        //Cublas part
        cublasInit();
        int *cv;
        cublasAlloc(DIMX*DIMY,sizeof(TYPE),(void**)&cv);
        cublasSetMatrix(ROW,COL,sizeof(TYPE),v,DIMX,cv,DIMY);
        //cublasGetVector(DIMX*DIMY,sizeof(TYPE),cv,1,v,1);
        cublasGetVector(DIMX*DIMY,sizeof(TYPE),cv,DIMX,v,DIMX);
        cublasFree(cv);
        cublasShutdown();

        printf("After:\n");
        print_matrix(v);
        return 0;
    }

Output:

Hello world!
Before:
    1     2     3     4     5
    6     7     8     9    10
   11    12    13    14    15
   16    17    18    19    20
   21    22    23    24    25
After:
    1     2     3     4     5
    6     7     8     9    10
   11    12    13    14    15
   16    17    18    19    20
   21    22    23    24    25

Solution

The first problem is that you're doing no error checking. If you were, you'd discover that your call to cublasGetVector returns a mapping error.

Second, you need to review the API definitions for the cublas calls. In your call to cublasSetMatrix, you pass DIMX as the leading dimension of the first (source) matrix and DIMY as the leading dimension of the second (destination) matrix. Both should be DIMX. It makes no difference here only because the matrices are square (DIMX equals DIMY).

The problem with your cublasGetVector call is that you are passing DIMX for both increment parameters (incx and incy). Copying DIMX*DIMY elements with a stride of DIMX addresses a last element at index (DIMX*DIMY - 1)*DIMX = 120, while cv holds only 25 elements, so the copy runs past the end of cv in GPU memory. Pass 1 for both increments if you want to capture the upper-left-corner elements selected by your ROW and COL parameters.

Here's some code that does what I think you intended, and shows an example of error checking:

#include <stdio.h>
#include <stdlib.h>
#include <cublas.h>
#include <helper_cuda.h>

#define DIMX 5
#define DIMY 5
#define ROW 2
#define COL 3

typedef int TYPE;

#define cublasCheckErrors(fn) \
    do { \
        cublasStatus_t __err = fn; \
        if (__err != CUBLAS_STATUS_SUCCESS) { \
            fprintf(stderr, "Fatal error: %s (at %s:%d)\n", \
                _cudaGetErrorEnum(__err), \
                __FILE__, __LINE__); \
            fprintf(stderr, "*** FAILED - ABORTING\n"); \
            exit(1); \
        } \
    } while (0)

void print_matrix(TYPE * v)
{
    int i,j;
    for (i=0; i<DIMX; i++)
    {
        for (j=0; j<DIMY; j++) printf("%5d ",v[i*DIMY+j]);
        printf("\n");
    }
}

    int main()
    {
        printf("Hello world!\n");

        int i;
        //Initialize the array
        TYPE v[DIMX*DIMY];
        for (i=0; i<DIMX*DIMY; i++) v[i]=i+1;
        printf("Before:\n");
        print_matrix(v);

        //Cublas part
        cublasCheckErrors(cublasInit());
        int *cv;
        cublasCheckErrors(cublasAlloc(DIMX*DIMY,sizeof(TYPE),(void**)&cv));
        cublasCheckErrors(cublasSetMatrix(ROW,COL,sizeof(TYPE),v,DIMX,cv,DIMX));
        //cublasGetVector(DIMX*DIMY,sizeof(TYPE),cv,1,v,1);
        cublasCheckErrors(cublasGetVector(DIMX*DIMY,sizeof(TYPE),cv,1,v,1));
        cublasCheckErrors(cublasFree(cv));
        cublasCheckErrors(cublasShutdown());

        printf("After:\n");
        print_matrix(v);
        return 0;
    }

Compile it with a command along these lines:

g++ -I/usr/local/cuda/include -I/usr/local/cuda/samples/common/inc -L/usr/local/cuda/lib64 -o t24 t24.cpp -lcublas

This assumes a standard CUDA 5 installation with the CUDA 5 samples installed in the default location. The samples' helper_cuda.h header gives me a convenient error parser for cublas status codes: _cudaGetErrorEnum()

With these changes, I get a result like this:

Hello world!
Before:
    1     2     3     4     5
    6     7     8     9    10
   11    12    13    14    15
   16    17    18    19    20
   21    22    23    24    25
After:
    1     2     0     0     0
    6     7     0     0     0
   11    12     0     0     0
    0     0     0     0     0
    0     0     0     0     0

Note also that you are only partially populating cv but copying all of its contents back to v. Where my After: result shows zeros, you could get any number, so you should initialize every element of cv to some value.

My After: result shows 3 rows and 2 columns of non-zero values because, although you have a parameter ROW, you are passing it in the wrong position in the cublasSetMatrix call for data laid out the way you print it. The cublas API generally expects data in column-major form, which reverses the indices relative to row-major form (the typical C or C++ layout).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow