Is there a good implementation of a reduction algorithm callable from a kernel with dynamic parallelism?

StackOverflow https://stackoverflow.com/questions/21071278

  •  27-09-2022

Question

I have seen reduction algorithms in CUDA (such as summation and maximization over a range of elements) discussed in previous posts, but with dynamic parallelism they could potentially be implemented in a different way. Is there a more efficient implementation that is callable from inside kernels?

Solution

Is there a more efficient implementation which is callable from inside the kernels?

CUB provides CUDA reduction primitives that are compatible with dynamic parallelism, that is, they can be called from within kernels.
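
As a rough illustration of what such an in-kernel call might look like, here is a minimal sketch using `cub::DeviceReduce::Sum`. The names `parent_kernel` and `sum_with_child_reduction` are hypothetical and not part of CUB. The sketch assumes the code is built with relocatable device code (for example `nvcc -rdc=true ... -lcudadevrt`) on a GPU that supports dynamic parallelism, that the temporary-storage size queried on the host is also what the device-side call needs, and it omits error checking. Older CUB releases may additionally need the `CUB_CDP` macro defined.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Parent kernel: a single thread invokes CUB's device-wide sum, which
// launches its own child kernels through dynamic parallelism.
__global__ void parent_kernel(const int* d_in, int* d_out, int n,
                              void* d_temp, size_t temp_bytes)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n);
    }
}

// Hypothetical host helper (illustration only): sums h_in on the GPU by
// letting the parent kernel call the reduction.
int sum_with_child_reduction(const std::vector<int>& h_in)
{
    const int n = static_cast<int>(h_in.size());
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // Size query on the host: with a null temp pointer, CUB only writes
    // the required byte count into temp_bytes and does no reduction work.
    void* d_temp = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n);
    cudaMalloc(&d_temp, temp_bytes);

    parent_kernel<<<1, 1>>>(d_in, d_out, n, d_temp, temp_bytes);
    // The parent grid is complete only after its child grids have finished,
    // so a host-side synchronize is enough before reading the result.
    cudaDeviceSynchronize();

    int h_out = 0;
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_temp);
    cudaFree(d_out);
    cudaFree(d_in);
    return h_out;
}
```

Pre-allocating the temporary storage on the host keeps the kernel free of device-side allocation. If the reduction only needs to cover the data owned by one thread block, the block-level `cub::BlockReduce` can be used inside a kernel without dynamic parallelism at all.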

License: CC-BY-SA with attribution
Not affiliated with StackOverflow