Question

How are these two block sizes (1024x1 vs 32x32) expected to perform from a thread scheduling and memory bandwidth perspective? Is there any expected difference in performance between these two block sizes? Note that both use 1024 threads per block.


Solution

Threadblock dimensions don't by themselves affect performance, especially when the number of threads per block is the same.

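Threads are still grouped into warps for execution, whatever the block shape. The only direct effect of threadblock dimensions is to change the built-in variables (e.g. threadIdx.x, blockIdx.x, etc.) that are passed to each thread, which is not a performance issue. Any difference in practice would come indirectly, from how the kernel uses those indices to address memory.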
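To make the warp grouping concrete, here is a minimal host-side sketch in plain C++ (not a CUDA kernel) that models how threads in a block map to warps. It assumes a warp size of 32 and the linearization used by the CUDA programming model, where a thread's ID within a 2D block is x + y * blockDim.x and warps are formed from consecutive IDs. The names `Dim2`, `linear_thread_id`, `warp_id` and `warps_per_block` are made up for illustration and are not part of any CUDA API.

```cpp
#include <cassert>

// Assumption: 32 threads per warp, as on current NVIDIA GPUs.
constexpr int kWarpSize = 32;

// Hypothetical stand-in for the x/y parts of CUDA's dim3 / threadIdx.
struct Dim2 {
    int x;
    int y;
};

// Linear thread ID within a 2D block: x varies fastest.
int linear_thread_id(Dim2 block_dim, Dim2 thread_idx) {
    assert(thread_idx.x >= 0 && thread_idx.x < block_dim.x);
    assert(thread_idx.y >= 0 && thread_idx.y < block_dim.y);
    return thread_idx.y * block_dim.x + thread_idx.x;
}

// Warp that a given thread belongs to: consecutive linear IDs share a warp.
int warp_id(Dim2 block_dim, Dim2 thread_idx) {
    return linear_thread_id(block_dim, thread_idx) / kWarpSize;
}

// Number of warps needed to hold every thread of the block (rounded up).
int warps_per_block(Dim2 block_dim) {
    return (block_dim.x * block_dim.y + kWarpSize - 1) / kWarpSize;
}
```

Under these assumptions both shapes yield the same number of warps per block. In the 1024x1 block a warp is 32 consecutive values of threadIdx.x, while in the 32x32 block each warp is one full row (a fixed threadIdx.y). Only the index values differ, not the way the threads are scheduled.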

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow