Question

Does anybody know why the following function uses 16,432 B of shared data? In my opinion it should be 32 x 32 x 8 x 2 = 16,384 B.

#define TILE_WIDTH 32  // assumed: not shown in the question, taken from the 32x32 shared arrays

__global__ void matrixMulKernel(double *c, const double *a, const double *b, unsigned int size)
{
    __shared__ double as[32][32];
    __shared__ double bs[32][32];
    unsigned int bx = blockIdx.x, by = blockIdx.y;
    unsigned int tx = threadIdx.x, ty = threadIdx.y;
    unsigned int row = bx * TILE_WIDTH + tx;
    unsigned int col = by * TILE_WIDTH + ty;
    double Pval = 0.0;
    for(unsigned int q = 0; q < size / TILE_WIDTH; q++)
    {
        as[tx][ty] = a[row * size + q * TILE_WIDTH + ty];
        bs[tx][ty] = b[(q * TILE_WIDTH + tx) * size + col];  // stored as [tx][ty] so that bs[k][ty] below reads row q * TILE_WIDTH + k, column col of b
        __syncthreads();

        for(unsigned int k = 0; k < TILE_WIDTH; k++)
            Pval += as[tx][k] * bs[k][ty];
        __syncthreads();
    }

    c[row * size + col] = Pval;
}

The compiler gives the following error:

Entry function '_Z15matrixMulKernelPdPKdS1_j' uses too much shared data (0x4030 bytes, 0x4000 max)

I'm interested in why this happens, not in a workaround :)


Solution

You are probably compiling for a compute capability (cc) 1.x device. The documentation indicates that __global__ kernel parameters are passed via shared memory on cc 1.x devices.

So 16,384 bytes go to your explicit __shared__ declarations. The remaining 48 bytes (0x4030 - 0x4000) come from the 28 bytes (assuming a 64-bit target) required by your explicit kernel parameters, plus other overhead that is communicated via shared memory.
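
The accounting can be checked with a few lines of host-side C++ (which also compiles under nvcc). This is only a sketch of the arithmetic, not a description of how the compiler lays out shared memory. The constant names are made up for illustration, the type sizes are the ones CUDA uses on a 64-bit target, and the two hex figures are copied from the error message in the question.

```cpp
#include <cstddef>

// All names below are hypothetical; only the numbers come from the thread.
constexpr std::size_t kTile       = 32;  // edge of as[32][32] and bs[32][32]
constexpr std::size_t kDoubleSize = 8;   // sizeof(double) in CUDA
constexpr std::size_t kPointerSize = 8;  // assumes a 64-bit target
constexpr std::size_t kUIntSize   = 4;   // sizeof(unsigned int) in CUDA

// Explicit __shared__ declarations: two 32x32 tiles of double.
constexpr std::size_t kExplicitShared = 2 * kTile * kTile * kDoubleSize;

// Kernel parameters: double *c, const double *a, const double *b, unsigned int size.
constexpr std::size_t kParamBytes = 3 * kPointerSize + kUIntSize;

// Figures reported by the compiler in the question.
constexpr std::size_t kReported = 0x4030;
constexpr std::size_t kLimit    = 0x4000;

// What is used beyond the explicit declarations, and what the parameters do not explain.
constexpr std::size_t kExcess        = kReported - kExplicitShared;
constexpr std::size_t kOtherOverhead = kExcess - kParamBytes;

static_assert(kExplicitShared == 16384, "the two tiles alone fill the 0x4000 limit");
static_assert(kExplicitShared == kLimit, "no room is left for anything else");
static_assert(kParamBytes == 28, "three pointers plus one unsigned int");
static_assert(kExcess == 48, "0x4030 - 0x4000");
static_assert(kOtherOverhead == 20, "bytes not explained by the parameters");
```

If the file compiles, for example with g++ -std=c++11 -c, every static_assert holds: the tiles alone reach the 16,384-byte limit, the parameters add 28 bytes, and 20 more bytes are the other overhead mentioned above.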

Try compiling for a cc 2.x device:

nvcc -arch=sm_20 ...
License: CC-BY-SA with attribution
Not affiliated with StackOverflow