Looking at this simple code:
long f(int a, int b) {
    return a * b;
}
Three compilers generate three different instruction sequences. gcc:
movl %edi, %eax
imull %esi, %eax
cltq
clang:
imull %esi, %edi
movslq %edi, %rax
Intel (icc):
movslq %esi, %rsi
movslq %edi, %rdi
imulq %rsi, %rdi
movq %rdi, %rax
Basically, there are two options. You can multiply the two 32-bit numbers (imull) and then sign-extend the result. Or you can sign-extend the operands and then multiply them as 64-bit numbers (imulq). In the second case you should, in principle, keep only the low 32 bits of the product and sign-extend them. That step is unnecessary: the full 64-bit product differs from the sign-extended 32-bit product only when a*b overflows int, and signed overflow is undefined behavior, so the compiler may assume it never happens. Removing that final sign-extension is precisely the optimization you observed.