Question

How are these two block sizes (1024x1 vs 32x32) expected to perform from a thread scheduling and memory bandwidth perspective? Is there any expected difference in performance between them? Note that both use 1024 threads per block.

Solution

Threadblock dimensions don't by themselves affect performance, especially when the number of threads per block is the same.

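Threads are still grouped for execution into warps of 32 threads, so a 1024-thread block is 32 warps in either shape. The only direct effect of threadblock dimensions is to change the values of the built-in variables (e.g. threadIdx.x, blockIdx.x, etc.) that are passed to each thread, which is not in itself a performance issue. Any difference would have to come indirectly, from how the kernel code uses those variables, for example to compute memory addresses.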
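To make the warp grouping concrete, here is a minimal sketch. The helper names (linearThreadId, warpInBlock) and the kernel reportWarps are made up for illustration and are not part of the CUDA API. It relies on the documented rule that threads in a block are linearized with x varying fastest, and it assumes a warp size of 32.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: linear thread ID within a block.
// x varies fastest, then y, then z.
__host__ __device__ inline unsigned linearThreadId(uint3 idx, dim3 dim) {
    return idx.x + idx.y * dim.x + idx.z * dim.x * dim.y;
}

// Hypothetical helper: warp index within the block.
// Assumes 32 threads per warp.
__host__ __device__ inline unsigned warpInBlock(uint3 idx, dim3 dim) {
    return linearThreadId(idx, dim) / 32;
}

// The first thread of warps 0 and 1 reports its coordinates.
__global__ void reportWarps() {
    unsigned tid = linearThreadId(threadIdx, blockDim);
    if (tid % 32 == 0 && tid / 32 < 2) {
        printf("blockDim=(%u,%u) warp %u starts at threadIdx=(%u,%u)\n",
               blockDim.x, blockDim.y, warpInBlock(threadIdx, blockDim),
               threadIdx.x, threadIdx.y);
    }
}

int main() {
    reportWarps<<<1, dim3(1024, 1)>>>();  // 1024x1 block
    reportWarps<<<1, dim3(32, 32)>>>();   // 32x32 block
    cudaDeviceSynchronize();              // wait so device printf is flushed
    return 0;
}
```

In the 1024x1 block, a warp covers 32 consecutive values of threadIdx.x. In the 32x32 block, a warp is one full row: all 32 values of threadIdx.x for a single threadIdx.y. Either way the block is 32 warps, so the scheduler sees the same amount of work. What each warp reads from memory depends on how the kernel turns those indices into addresses.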

Licensed under: CC-BY-SA with attribution