CUDA context lifetime

https://stackoverflow.com/questions/17721212

03-06-2022

Question

Part of my application works as follows:

main.cpp

int main()
{
  //First dimension usually small (1-10)
  //Second dimension (100 - 1500)
  //Third dimension (10000 - 1000000)
  vector<vector<vector<double>>> someInfo;

  Object someObject(...); //Host class

  for (int i = 0; i < N; i++)
     someObject.functionA(&(someInfo[i]));
}

Object.cpp

void Object::functionA(vector<vector<double>> *someInfo) //Name matches the class and call in main.cpp
{
#define GPU 1
#if GPU == 1
    //GPU COMPUTING
    computeOnGPU(someInfo, aConstValue, aSecondConstValue);
#else
    //CPU COMPUTING
#endif
}

Object.cu

extern "C" void computeOnGPU(vector<vector<double>> *someInfo, int aConstValue, int aSecondConstValue)
{
   //Copy values to constant memory

   //Allocate memory on GPU       

   //Copy data to GPU global memory

   //Launch Kernel

   //Copy data back to CPU

   //Free memory
}

As I hope the code shows, the function that prepares the GPU is called repeatedly, once for each element of the first dimension.

All the values that I send to constant memory always remain the same, and the sizes of the buffers allocated in global memory are always the same (only the data changes).

This is the actual workflow in my code, but I'm not getting any speedup from the GPU. The kernel does execute faster, but the memory transfers have become the bottleneck (as reported by nvprof).

So I was wondering where in my app the CUDA context starts and finishes, to see whether there is a way to do the copies to constant memory and the memory allocations only once.

Solution

Normally, the CUDA context begins with the first CUDA call in your application and ends when the application terminates.

You should be able to do what you have in mind: do the allocations only once (at the beginning of your app), do the corresponding free operations only once (at the end of your app), and populate __constant__ memory only once, before it is used for the first time.

It's not necessary to allocate and free the data structures in GPU memory repeatedly if they are not changing in size.
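A minimal sketch of that restructuring might look like the following. It splits the single computeOnGPU function into three steps: a one-time setup, a per-iteration compute step that only moves data and launches the kernel, and a one-time teardown. Everything named here is hypothetical and not from the question: GpuWorkspace, gpuInit, gpuCompute, gpuRelease, the dConstA/dConstB symbols, and placeholderKernel (the real kernel is not shown). The sketch also assumes each vector<vector<double>> slice has been flattened into one contiguous std::vector<double>, since a single cudaMemcpy needs contiguous host memory.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical stand-ins for aConstValue and aSecondConstValue.
__constant__ int dConstA;
__constant__ int dConstB;

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel; the real computation is not given in the question.
__global__ void placeholderKernel(double *data, size_t n)
{
    size_t idx = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (idx < n)
        data[idx] = data[idx] * dConstA + dConstB;
}

// Device resources that live for the whole run (hypothetical helper type).
struct GpuWorkspace {
    double *dData = nullptr;  // device buffer, allocated once
    size_t capacity = 0;      // buffer size in elements
};

// Call once, before the loop: fill constant memory and allocate the buffer.
void gpuInit(GpuWorkspace &ws, size_t maxElements, int a, int b)
{
    CUDA_CHECK(cudaMemcpyToSymbol(dConstA, &a, sizeof(int)));
    CUDA_CHECK(cudaMemcpyToSymbol(dConstB, &b, sizeof(int)));
    CUDA_CHECK(cudaMalloc(&ws.dData, maxElements * sizeof(double)));
    ws.capacity = maxElements;
}

// Call once per iteration: only the data transfer and the launch remain.
void gpuCompute(GpuWorkspace &ws, std::vector<double> &flat)
{
    size_t n = flat.size();
    if (n == 0)
        return;
    if (n > ws.capacity) {
        fprintf(stderr, "input larger than the preallocated buffer\n");
        exit(EXIT_FAILURE);
    }
    size_t bytes = n * sizeof(double);
    CUDA_CHECK(cudaMemcpy(ws.dData, flat.data(), bytes, cudaMemcpyHostToDevice));
    int threads = 256;
    int blocks = (int)((n + threads - 1) / threads);
    placeholderKernel<<<blocks, threads>>>(ws.dData, n);
    CUDA_CHECK(cudaGetLastError());
    // This blocking copy also waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(flat.data(), ws.dData, bytes, cudaMemcpyDeviceToHost));
}

// Call once, after the loop: free the buffer.
void gpuRelease(GpuWorkspace &ws)
{
    CUDA_CHECK(cudaFree(ws.dData));
    ws.dData = nullptr;
    ws.capacity = 0;
}
```

In the question's layout, gpuInit would be called before the for loop in main (or in the Object constructor), gpuCompute inside the loop, and gpuRelease after it. This removes the repeated allocation, free, and constant-memory copies, but the per-iteration transfers of the changing data remain, so how much it helps depends on how much of the transfer time nvprof attributes to those.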

License: CC-BY-SA with attribution

Not affiliated with StackOverflow