How can I optimize my C / x86 code?

https://stackoverflow.com/questions/2222935

19-09-2019
|

Question

int lcm_old(int a, int b) {
    int n;
    for(n=1;;n++)
        if(n%a == 0 && n%b == 0)
            return n;  
}

int lcm(int a,int b)  {
    int n = 0;
    __asm {
lstart:
        inc n;
        mov eax, n;
        mov edx, 0;
        idiv a;
        mov eax, 0;
        cmp eax, edx;
        jne lstart;
        mov eax, n;
        mov edx, 0;
        idiv b;
        mov eax, 0;
        cmp eax, edx;
        jnz lstart;
    }
    return n;
}

I'm trying to beat/match the code for the top function with my own function (bottom). Have you got any ideas how I can optimize my routine?

PS. This is just for fun.

Solution

I'm going to assume you want to keep the same algorithm. This should at least be a slightly more efficient implementation of it. The main difference is that the code in the loop only uses registers, not memory.

int lcm(int a,int b)  {
    __asm {
        xor ecx, ecx
        mov esi, a
        mov edi, b
lstart:
        inc ecx
        mov eax, ecx
        xor edx, edx
        idiv esi
        test edx, edx
        jne lstart
        mov eax, ecx;
        idiv edi
        test edx, edx
        jnz lstart
        mov eax, ecx
        leave
        ret
    }
}

As Jason pointed out, however, this really isn't a very efficient algorithm -- multiplying, finding the GCD, and dividing will normally be faster (unless a and b are quite small).

Edit: there is another algorithm that's almost simpler to understand, that should also be a lot faster (than the original -- not than multiplying, then dividing by GCD). Instead of generating consecutive numbers until you find one that divides both a and b, generate consecutive multiples of one (preferably the larger) until you find one that divides evenly by the other:

int lcm2(int a, int b) { 
    __asm { 
        xor ecx, ecx
        mov esi, a
        mov edi, b
    lstart:
        add ecx, esi
        mov eax, ecx
        xor edx, edx
        idiv edi
        test edx, edx
        jnz lstart
        mov eax, ecx
        leave
        ret
    }
}

This remains dead simple to understand, but should give a considerable improvement over the original.

OTHER TIPS

I would optimize by using a different algorithm. Searching linearly like you are doing is super-slow. It's a fact that the least common mulitple of two natural numbers is the quotient of their product divided by their greatest common divisor. You can compute the greatest common divisor quickly using the Euclidean algorithm.

Thus:

int lcm(int a, int b) {
    int p = a * b;
    return p / gcd(a, b);
}

where you need to implement gcd(int, int). As the average number of steps in the Euclidean algorithm is O(log n), we beat the naive linear search hands down.

There are other approaches to this problem. If you had an algorithm that could quickly factor integers (say a quantum computer) then you can also solve this problem like so. If you write each of a and b into its canonical prime factorization

a = p_a0^e_a0 * p_a1^e_a1 * ... * p_am^e_am
b = p_b0^e_b0 * p_b1^e_b1 * ... * p_bn^e_bn

then the least common multiple of a and b is the obtained by taking for each prime factor appearing in at least one of the factorizations of a and b, taking it with the maximum exponent that it appears in the factorization of a or b. For example:

28 = 2^2 * 7
312 = 2^3 * 39

so that

lcm(28, 312) = 2^3 * 7 * 39 = 2184

All of this is to point out that naive approaches are admirable in their simplicity but you can spend all day optimizing every last nanosecond out of them and still not beat a superior algorithm.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow