Coalescing refers to combining memory requests from individual threads in a warp into a single memory transaction.
A single memory transaction is typically a 128-byte cache line, so it can hold eight 128-bit quantities (e.g. float4, 16 bytes each).
So, yes, there is a benefit to having multiple threads request adjacent 128-bit quantities, because up to eight of them (8 × 16 bytes = 128 bytes) can still be coalesced into a single 128-byte cache line request to memory.
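To make the arithmetic concrete, here is a small host-side C++ model. It is not CUDA device code and not a description of how the hardware actually works. It counts how many 128-byte lines a set of per-thread float4 reads touches. It assumes 128-byte, 128-byte-aligned lines and ignores caching and architecture-specific details. The names Float4, lines_touched, contiguous and strided are hypothetical. In CUDA terms, the contiguous pattern corresponds to thread i of a warp reading in[i] through a float4 pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Assumption: memory is modeled as 128-byte cache lines aligned to 128 bytes.
constexpr std::uint64_t kLineBytes = 128;

// Host-side stand-in for CUDA's float4: four 32-bit floats, 16 bytes (128 bits).
struct alignas(16) Float4 {
    float x, y, z, w;
};
static_assert(sizeof(Float4) == 16, "a float4 is 128 bits");
static_assert(kLineBytes / sizeof(Float4) == 8, "eight float4 values fit in one line");

// Thread t reads element element_index[t] of a Float4 array starting at byte
// address `base`. Returns how many distinct cache lines those reads touch,
// i.e. how many line requests this model needs for the group of threads.
std::size_t lines_touched(std::uint64_t base,
                          const std::vector<std::uint64_t>& element_index) {
    std::set<std::uint64_t> lines;
    for (std::uint64_t idx : element_index) {
        const std::uint64_t first = base + idx * sizeof(Float4);
        const std::uint64_t last = first + sizeof(Float4) - 1;
        // A misaligned 16-byte read may straddle two lines; record both.
        for (std::uint64_t line = first / kLineBytes; line <= last / kLineBytes; ++line) {
            lines.insert(line);
        }
    }
    return lines.size();
}

// Thread t reads element t: adjacent 128-bit quantities.
std::vector<std::uint64_t> contiguous(std::size_t threads) {
    std::vector<std::uint64_t> idx(threads);
    for (std::size_t t = 0; t < threads; ++t) idx[t] = t;
    return idx;
}

// Thread t reads element t * stride (stride measured in float4 elements).
std::vector<std::uint64_t> strided(std::size_t threads, std::uint64_t stride) {
    assert(stride > 0);
    std::vector<std::uint64_t> idx(threads);
    for (std::size_t t = 0; t < threads; ++t) idx[t] = t * stride;
    return idx;
}
```

In this model, eight threads reading adjacent float4 values from an aligned base share one line (lines_touched(0, contiguous(8)) is 1), and a full warp of 32 such threads needs four lines. Shifting the base by 16 bytes makes the same eight reads straddle two lines. A stride of eight elements (128 bytes) forces one line per thread, which is the uncoalesced case.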