Question

I have written a struct and some functions that wrap a cuBLAS matrix.

The struct is:

#include <stdio.h>   /* fprintf */
#include <stdlib.h>  /* malloc, free */
#include <cuda.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

#define uint unsigned int

typedef struct {
    uint rows;
    uint cols;
    float* devPtrvals;
} matrix;

The alloc function creates the matrix struct:

matrix* matrix_alloc(uint rows, uint cols)
{
    cudaError_t cudaStat;
    matrix* w = malloc(sizeof(matrix));
    if(w == NULL) {
        fprintf(stderr, "host memory allocation failed\n");
        return NULL;
    }
    w->rows = rows;
    w->cols = cols;
    cudaStat = cudaMalloc((void**)&w->devPtrvals, sizeof(float) * rows * cols);
    if(cudaStat != cudaSuccess) {
        fprintf(stderr, "device memory allocation failed\n");
        free(w);  /* do not leak the host struct on failure */
        return NULL;
    }
    return w;
}

Free function:

uint matrix_free(matrix* w)
{
    cudaFree(w->devPtrvals);
    free(w);
    return 1;
}

Function that sets the values of the matrix from a float array:

uint matrix_set_vals(matrix* w, float* vals)
{
    cublasStatus_t stat;
    stat = cublasSetMatrix(w->rows, w->cols, sizeof(float),
                           vals, w->rows, w->devPtrvals, w->rows);
    if(stat != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "data upload failed\n");
        return 0;
    }
    return 1;
}

I am having trouble writing a general dot product function that also handles transposed matrices. This is what I have written:

matrix* matrix_dot(cublasHandle_t handle, char transA, char transB,
                   float alpha, matrix* v, matrix* w, float beta)
{
    matrix* x = matrix_alloc(transA == CUBLAS_OP_N ? v->rows : v->cols,
                             transB == CUBLAS_OP_N ? w->cols : w->rows);
    //cublasStatus_t cublasSgemm(cublasHandle_t handle,
    //                 cublasOperation_t transa, cublasOperation_t transb,
    //                 int m, int n, int k,
    //                 const float           *alpha,
    //                 const float           *A, int lda,
    //                 const float           *B, int ldb,
    //                 const float           *beta,
    //                 float           *C, int ldc)
    cublasSgemm(handle, transA, transB,
                transA == CUBLAS_OP_N ? v->rows : v->cols,
                transB == CUBLAS_OP_N ? w->cols : w->rows,
                transA == CUBLAS_OP_N ? v->cols : v->rows,
                &alpha, v->devPtrvals, v->rows, w->devPtrvals,
                w->rows, &beta, x->devPtrvals, x->rows);
    return x;
}

Example:

I want a matrix A:

 1  2  3
 4  5  6
 7  8  9
 10 11 12

that is:

float a[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
matrix* A = matrix_alloc(4, 3);
matrix_set_vals(A, a);

and multiply it by the transpose of B:

1 2 3
4 5 6

that is:

float b[] = {1, 2, 3, 4, 5, 6};
matrix* B = matrix_alloc(2, 3);
matrix_set_vals(B, b);

The expected result, C = A * B^T, is:

14    32
32    77
50   122
68   167

I'm using the dot function:

matrix* C = matrix_dot(handle, CUBLAS_OP_N, CUBLAS_OP_T, 1.0, A, B, 0.0);

When I call this function I get the following error:

 ** On entry to SGEMM parameter number 10 had an illegal value

What am I doing wrong?


Solution

There are two problems in your code.

First, you store your matrices in row-major order, but cuBLAS assumes column-major storage. To hold A in column-major order, the host array has to be initialized with the following data:

float a[] = {1,4,7,10,2,5,8,11,3,6,9,12};
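Reordering the data by hand is error-prone, so here is a minimal sketch of a host-side helper that does it before the upload. The name row_to_col_major is made up for this illustration and is not part of cuBLAS; it assumes both buffers are densely packed with no padding.

```c
#include <stddef.h>

/* Hypothetical helper (not a cuBLAS function): copy a rows x cols
   matrix from row-major order (src) into column-major order (dst).
   Both buffers hold rows * cols floats with no padding. */
void row_to_col_major(const float *src, float *dst,
                      size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            dst[c * rows + r] = src[r * cols + c];
}
```

Called with the row-major array from the question and rows = 4, cols = 3, it produces the column-major order shown above, which could then be passed to matrix_set_vals.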

In fact, as you may have noticed, the column-major cuBLAS GEMM routine can also be used to compute a row-major matrix product. This works because the data layout of a matrix M stored in row-major order is exactly the same as the layout of the transposed matrix M^T stored in column-major order, provided there are no padding bytes in the storage. So if you want to compute

 C_row = A_row * B_row

you can use this instead

 C_col_trans = B_col_trans * A_col_trans

where the underlying storage of C_row and C_col_trans is exactly the same, and likewise for A and B.
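To check this identity on the example from the question without a GPU, here is a small CPU sketch. ref_gemm_colmajor is a hypothetical reference routine that only mimics the column-major argument order of cublasSgemm (without handle, alpha and beta), and op_t is a local stand-in for cublasOperation_t. Because the question asks for A * B^T rather than A * B, the identity becomes C^T = B * A^T: the row-major buffer of B is read as B^T in column-major order and therefore needs the transpose flag, while the buffer of A is read as A^T and is passed untransposed.

```c
#include <stddef.h>

/* Local stand-in for cublasOperation_t so the sketch builds without CUDA. */
typedef enum { OP_N, OP_T } op_t;

/* Hypothetical reference routine: column-major C = op(A) * op(B),
   where C is m x n and k is the shared dimension. */
void ref_gemm_colmajor(op_t ta, op_t tb, int m, int n, int k,
                       const float *A, int lda,
                       const float *B, int ldb,
                       float *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p) {
                /* element (i, p) of op(A) and element (p, j) of op(B) */
                float a = (ta == OP_N) ? A[p * lda + i] : A[i * lda + p];
                float b = (tb == OP_N) ? B[j * ldb + p] : B[p * ldb + j];
                sum += a * b;
            }
            C[j * ldc + i] = sum;
        }
    }
}

/* Fills c_row (8 floats) with the row-major 4x2 product A * B^T,
   using the row-major data from the question unchanged. */
void example_a_times_bt(float *c_row)
{
    static const float a_row[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    static const float b_row[6]  = {1, 2, 3, 4, 5, 6};

    /* Column-major view: C^T (2x4) = B (2x3) * A^T (3x4).
       All three leading dimensions are the row-major column counts. */
    ref_gemm_colmajor(OP_T, OP_N, 2, 4, 3,
                      b_row, 3, a_row, 3, c_row, 2);
}
```

Read back in row-major order, c_row holds the 4x2 matrix the question expects. With cuBLAS the same operand order, flags and dimensions would apply, under the same no-padding assumption.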

The second problem concerns the leading dimension (ld). When there are no padding bytes in the storage, the ld of a row-major matrix equals its number of columns, and the ld of a column-major matrix equals its number of rows.
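As a rough aid for debugging "illegal value" errors, the minimum leading dimensions that the cuBLAS documentation lists for GEMM can be checked on the host before the call. This is a sketch only: gemm_lds_valid is a hypothetical helper, and its boolean flags stand in for passing CUBLAS_OP_T.

```c
#include <stdbool.h>

/* Larger of two ints. */
static int max_int(int a, int b) { return a > b ? a : b; }

/* Hypothetical pre-flight check (not a cuBLAS function).
   For column-major C (m x n) = op(A) * op(B) with shared dimension k,
   returns true when lda, ldb and ldc meet the documented minimums.
   trans_a / trans_b are true where CUBLAS_OP_T would be passed. */
bool gemm_lds_valid(bool trans_a, bool trans_b, int m, int n, int k,
                    int lda, int ldb, int ldc)
{
    /* A is stored as m x k when not transposed, k x m when transposed. */
    int min_lda = max_int(1, trans_a ? k : m);
    /* B is stored as k x n when not transposed, n x k when transposed. */
    int min_ldb = max_int(1, trans_b ? n : k);
    /* C is always stored as m x n. */
    int min_ldc = max_int(1, m);

    return lda >= min_lda && ldb >= min_ldb && ldc >= min_ldc;
}
```

For instance, gemm_lds_valid(false, false, 4, 2, 3, 3, 3, 4) returns false, because an untransposed 4x3 column-major A needs lda of at least 4; passing the column count there is the typical mistake when row-major data is handed over unchanged.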

Edit

Another problem is that you may have to use cublasOperation_t transA instead of char transA.

Licensed under: CC-BY-SA with attribution