Question

If I write

int main(int argc, char *argv[])
{
    int temp[50][3];
    return &temp[argc] - &temp[0];
}

and compile it with Visual C++, I get back:

009360D0 55                   push        ebp  
009360D1 8B EC                mov         ebp,esp  
009360D3 8B 45 08             mov         eax,dword ptr [argc]  
009360D6 8D 0C 40             lea         ecx,[eax+eax*2]  
009360D9 B8 AB AA AA 2A       mov         eax,2AAAAAABh  
009360DE C1 E1 02             shl         ecx,2  
009360E1 F7 E9                imul        ecx  
009360E3 D1 FA                sar         edx,1  
009360E5 8B C2                mov         eax,edx  
009360E7 C1 E8 1F             shr         eax,1Fh  
009360EA 03 C2                add         eax,edx  
009360EC 5D                   pop         ebp  
009360ED C3                   ret  

Why am I getting an imul instruction here instead of just bit shifts, etc.? I find this pretty annoying, because I'm doing pointer arithmetic like this in a tight loop, and I'm suspecting imul is killing its performance. In any case, it shouldn't be necessary.

Is there a good way to prevent it in general and instead replace it with cheaper operations?

Update:

In my original program, I tried adding a dummy variable to make the size of each element a multiple of 4 instead of 3 so the compiler can use bit-shifting instead of division.

The result? Even though the data structure was bigger, the running time of the program decreased from 9.2 seconds to 7.4 seconds.

So yes, this is indeed very slow.

Was it helpful?

Solution

Why am I getting an imul instruction here instead of just bit shifts, etc.?

The multiplication is to divide by 3 (the size of each inner array), using the fact that 0x2AAAAAAB is 231/3. You can't do that with a small number of shifts and adds; multiplication really is the fastest option.

I'm suspecting imul is killing its performance.

On most modern platforms, integer multiplication is usually as fast as simpler operations, so it may be the fastest option even when it could be replaced by a couple of shifts and adds. When you have performance issues, always measure to find the real bottlenecks; they're often in the places you least suspect.

Is there a good way to prevent it in general and instead replace it with cheaper operations?

On platforms where multiplication really is expensive: avoid arrays of awkwardly sized data structures; and avoid subtracting pointers, since that requires division by the object size.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top