Question

I'm running my kernel from a multithreaded "host" program in which every host thread launches the kernel. I have a problem with constant memory: it holds some parameters, but these differ for every host thread. I have built a sample that reproduces the problem.

This is the kernel

__global__ void Kernel( int *aiOutput, int Length )
{
    int id = threadIdx.x + blockIdx.x * blockDim.x;

    int iValue = 0;

    // bound check
    if( id < Length )
    {
        if( id % 3 == 0 )
            iValue = c_iaCoeff[2];
        else if( id % 2 == 0 )
            iValue = c_iaCoeff[1];
        else
            iValue = c_iaCoeff[0];

        aiOutput[id] = iValue;
    }
    __syncthreads();
}

Each pthread runs the following function.

void* WrapperCopy( void* params )
{
    // choose cuda device to perform on
    CUDA_CHECK_RETURN( cudaSetDevice( 0 ) );

    // cast of params
    SParams *_params = (SParams*)params;

    // copy coefficients to constant memory
    CUDA_CHECK_RETURN( cudaMemcpyToSymbol( c_iaCoeff, _params->h_piCoeff, 3*sizeof(int) ) );

    // loop kernel
    for( int i=0; i<100; i++ )
    {
        // perform kernel
        Kernel<<< BLOCKCOUNT, BLOCKSIZE >>>( _params->d_piArray, _params->iLength );
    }

    // copy data back from gpu
    CUDA_CHECK_RETURN( cudaMemcpy(
            _params->h_piArray, _params->d_piArray, BLOCKSIZE*BLOCKCOUNT*sizeof(int), cudaMemcpyDeviceToHost ) );

    return NULL;
}

Constant memory is declared like this.

__constant__ int c_iaCoeff[ 3 ];

Every host thread has different values in h_piCoeff and copies them to constant memory.

Now every pthread call gives the same results, because all of them see the same values in c_iaCoeff. I think this comes from how constant memory works and from the context it is declared in: the sample declares only one c_iaCoeff for all calling pthreads, so the kernels launched by the pthreads all get the values of the last cudaMemcpyToSymbol. Is that right?

Next I tried changing my constant memory into a two-dimensional array. The second dimension holds the values as before, and the first is the index of the pthread in use.

__constant__ int c_iaCoeff2[ THREADS ][ 3 ];

In the kernel it is used like this.

iValue = c_iaCoeff2[iTId][2];

But I don't know whether constant memory can be used this way. Can it? I also get an error when I try to copy data to the constant memory.

CUDA_CHECK_RETURN( cudaMemcpyToSymbol( c_iaCoeff[_params->iTId], _params->h_piCoeff, 3*sizeof(int) ) );

In general, is it possible to use constant memory as a two-dimensional array, and if so, where is my mistake?


Solution

Yes, you should be able to use constant memory in the way you want to, but the cudaMemcpyToSymbol copy operation you are using is incorrect. The first argument to the call is a symbol, and the API does a lookup in the runtime symbol table to get the address of the constant memory symbol you request. So an address can't be passed to the call (although your code is actually passing an initialised host value to the call, why that is I will leave as an exercise to the reader).

What you may have missed is the optional fourth argument in the call, which is an offset into the memory pointed to by the symbol you request. So you should be able to do something like:

cudaMemcpyToSymbol( c_iaCoeff2,                   // symbol to look up (the 2D array)
                    _params->h_piCoeff,           // source location
                    3*sizeof(int),                // number of bytes to copy
                    (3*_params->iTId)*sizeof(int) // offset in bytes
                   );

[standard disclaimer: written in browser, untested. Use at own risk]

The last argument is the offset in bytes from the start of the symbol. Your 2D array will be laid out in row major order, so you need to use the pitch of the rows multiplied by the row index as an offset for each copy operation.
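To make the offset arithmetic concrete, here is a small host-only sketch in plain C++. It is not CUDA code: fakeConstant stands in for the __constant__ array, and FakeMemcpyToSymbol is a hypothetical helper that only imitates the "start of symbol plus byte offset" behaviour described above using std::memcpy. The value THREADS = 4 and the names RowOffsetBytes and StoreCoeffs are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr int THREADS = 4;  // assumed number of host threads
constexpr int COEFFS  = 3;  // coefficients per host thread, as in the question

// Host-side stand-in for: __constant__ int c_iaCoeff2[THREADS][3];
static int fakeConstant[THREADS][COEFFS] = {};

// Byte offset of row iTId from the start of the array (row-major layout):
// row index multiplied by the size of one row in bytes.
constexpr std::size_t RowOffsetBytes(int iTId)
{
    return static_cast<std::size_t>(iTId) * COEFFS * sizeof(int);
}

// Hypothetical helper imitating the count/offset semantics of
// cudaMemcpyToSymbol: copy `count` bytes to (start of array + `offset` bytes).
void FakeMemcpyToSymbol(const void* src, std::size_t count, std::size_t offset)
{
    assert(offset + count <= sizeof(fakeConstant));  // stay inside the array
    std::memcpy(reinterpret_cast<char*>(fakeConstant) + offset, src, count);
}

// Copies one host thread's three coefficients into its own row.
void StoreCoeffs(int iTId, const int* h_piCoeff)
{
    FakeMemcpyToSymbol(h_piCoeff, COEFFS * sizeof(int), RowOffsetBytes(iTId));
}
```

Calling StoreCoeffs(2, coeffs) fills only fakeConstant[2][0..2] and leaves the other rows untouched, which is the effect the offset argument should have on the real constant array.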

Licensed under: CC-BY-SA with attribution