Question

How are these two block shapes (1024x1 vs 32x32) expected to perform from a thread-scheduling and memory-bandwidth perspective? Is there any expected difference in performance between the two? Note that both use 1024 threads per block.


Solution

Threadblock dimensions, especially when we are talking about the same number of threads per block, don't by themselves affect performance.

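Threads are still grouped for execution into warps of 32, formed from the linearized thread index within the block (x varies fastest), so either shape yields 32 warps per block. The only direct effect of threadblock dimensions is to change the values of the built-in variables (e.g. threadIdx.x, threadIdx.y, blockDim.x) that each thread sees, which is not a performance issue. Any difference would come only from how the kernel's own indexing maps those values to memory addresses.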
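To make the warp grouping concrete, here is a minimal sketch, assuming a warp size of 32 and the x-fastest linearization described in the CUDA programming guide. The names `warp_index` and `record_warp` are hypothetical helpers made up for this illustration, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption: 32 threads per warp, as on current NVIDIA GPUs.
constexpr unsigned kWarpSize = 32;

// Hypothetical helper: warp index of a thread inside its block.
// Threads are linearized with x varying fastest, then y, then z.
__host__ __device__ inline unsigned warp_index(unsigned tx, unsigned ty, unsigned tz,
                                               unsigned bx, unsigned by) {
    unsigned linear = tx + ty * bx + tz * bx * by;
    return linear / kWarpSize;
}

// Each thread records which warp it belongs to, indexed by linear thread id.
__global__ void record_warp(unsigned *warp_of_thread) {
    unsigned linear = threadIdx.x + threadIdx.y * blockDim.x
                    + threadIdx.z * blockDim.x * blockDim.y;
    warp_of_thread[linear] = warp_index(threadIdx.x, threadIdx.y, threadIdx.z,
                                        blockDim.x, blockDim.y);
}

int main() {
    // The two block shapes from the question, both 1024 threads.
    const dim3 shapes[2] = {dim3(1024, 1), dim3(32, 32)};

    unsigned *d_out = nullptr;
    cudaMalloc(&d_out, 1024 * sizeof(unsigned));

    for (const dim3 &block : shapes) {
        record_warp<<<1, block>>>(d_out);

        unsigned h_out[1024];
        // cudaMemcpy on the default stream waits for the kernel to finish.
        cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

        printf("block %ux%u: first thread in warp %u, last thread in warp %u\n",
               block.x, block.y, h_out[0], h_out[1023]);
    }

    cudaFree(d_out);
    return 0;
}
```

In the 1024x1 block each run of 32 consecutive threadIdx.x values forms a warp, while in the 32x32 block each row (fixed threadIdx.y) forms a warp. Either way the block contains 1024 / 32 = 32 warps, which is why the scheduler sees no difference between the two shapes.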

Licensed under: CC-BY-SA with attribution