質問

Is it possible to share a cudaMalloc'ed GPU buffer between different contexts (CPU threads) which use the same GPU? Each context allocates an input buffer which need to be filled up by a pre-processing kernel which will use the entire GPU and then distribute the output to them.

This scenario is ideal to avoid multiple data transfer to and from the GPUs. The application is a beamformer, which will combine multiple antenna signals and generate multiple beams, where each beam will be processed by a different GPU context. The entire processing pipeline for the beams is already in place, I just need to add the beamforming part. Having each thread generate it's own beam would duplicate the input data so I'd like to avoid that (also, the it's much more efficient to generate multiple beams at one go).

役に立ちましたか?

解決

Each CUDA context has it's own virtual memory space, therefore you cannot use a pointer from one context inside another context.

That being said, as of CUDA 4.0 by default there is one context created per process and not per thread. If you have multiple threads running with the same CUDA context, sharing device pointers between threads should work without problems.

他のヒント

I don't think multiple threads can run with the same CUDA context. I have done the experiments, parent cpu thread create a context and then fork a child thread. The child thread will launch a kernel using the context(cuCtxPushCurrent(ctx) ) created by the parent thread. The program just hang there.

ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top