Threadblock dimensions don't by themselves affect performance, especially when comparing configurations with the same number of threads per block.
Threads are still grouped for execution into warps. The only direct effect of threadblock dimensions is to change the built-in variables (e.g. threadIdx.x, blockIdx.x, etc.) that are passed to each thread, which is not a performance issue.
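To make the warp grouping concrete, here is a minimal host-side sketch in plain C++ (not device code) of how a thread's coordinates map to a warp. It assumes the linearization given in the CUDA programming guide (x varies fastest, then y, then z) and a warp size of 32. The names kWarpSize, linear_thread_id, warp_of_thread and warps_per_block are made up for illustration and are not part of the CUDA API.

```cpp
#include <cassert>

// Assumed warp size of 32; device code would read the built-in warpSize instead.
constexpr int kWarpSize = 32;

// Linear thread ID inside a block of shape (dx, dy, dz):
// x varies fastest, then y, then z.
inline int linear_thread_id(int x, int y, int z, int dx, int dy) {
    return x + y * dx + z * dx * dy;
}

// Warp index of a thread: consecutive linear IDs are packed into warps.
inline int warp_of_thread(int x, int y, int z, int dx, int dy) {
    return linear_thread_id(x, y, z, dx, dy) / kWarpSize;
}

// Number of warps needed for a block of the given shape (rounded up).
inline int warps_per_block(int dx, int dy, int dz) {
    assert(dx > 0 && dy > 0 && dz > 0);
    const int threads = dx * dy * dz;
    return (threads + kWarpSize - 1) / kWarpSize;
}
```

Under these assumptions, block shapes of 256x1, 16x16 and 32x8 all come out to the same number of warps per block; only the (x, y, z) coordinates that each thread sees are different.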