How can I optimize my C / x86 code?
-
19-09-2019 - |
Question
int lcm_old(int a, int b) {
int n;
for(n=1;;n++)
if(n%a == 0 && n%b == 0)
return n;
}
int lcm(int a,int b) {
int n = 0;
__asm {
lstart:
inc n;
mov eax, n;
mov edx, 0;
idiv a;
mov eax, 0;
cmp eax, edx;
jne lstart;
mov eax, n;
mov edx, 0;
idiv b;
mov eax, 0;
cmp eax, edx;
jnz lstart;
}
return n;
}
I'm trying to beat/match the code for the top function with my own function (bottom). Have you got any ideas how I can optimize my routine?
PS. This is just for fun.
Solution
I'm going to assume you want to keep the same algorithm. This should at least be a slightly more efficient implementation of it. The main difference is that the code in the loop only uses registers, not memory.
int lcm(int a,int b) {
__asm {
xor ecx, ecx
mov esi, a
mov edi, b
lstart:
inc ecx
mov eax, ecx
xor edx, edx
idiv esi
test edx, edx
jne lstart
mov eax, ecx;
idiv edi
test edx, edx
jnz lstart
mov eax, ecx
leave
ret
}
}
As Jason pointed out, however, this really isn't a very efficient algorithm -- multiplying, finding the GCD, and dividing will normally be faster (unless a
and b
are quite small).
Edit: there is another algorithm that's almost simpler to understand, that should also be a lot faster (than the original -- not than multiplying, then dividing by GCD). Instead of generating consecutive numbers until you find one that divides both a
and b
, generate consecutive multiples of one (preferably the larger) until you find one that divides evenly by the other:
int lcm2(int a, int b) {
__asm {
xor ecx, ecx
mov esi, a
mov edi, b
lstart:
add ecx, esi
mov eax, ecx
xor edx, edx
idiv edi
test edx, edx
jnz lstart
mov eax, ecx
leave
ret
}
}
This remains dead simple to understand, but should give a considerable improvement over the original.
OTHER TIPS
I would optimize by using a different algorithm. Searching linearly like you are doing is super-slow. It's a fact that the least common mulitple of two natural numbers is the quotient of their product divided by their greatest common divisor. You can compute the greatest common divisor quickly using the Euclidean algorithm.
Thus:
int lcm(int a, int b) {
int p = a * b;
return p / gcd(a, b);
}
where you need to implement gcd(int, int)
. As the average number of steps in the Euclidean algorithm is O(log n)
, we beat the naive linear search hands down.
There are other approaches to this problem. If you had an algorithm that could quickly factor integers (say a quantum computer) then you can also solve this problem like so. If you write each of a
and b
into its canonical prime factorization
a = p_a0^e_a0 * p_a1^e_a1 * ... * p_am^e_am
b = p_b0^e_b0 * p_b1^e_b1 * ... * p_bn^e_bn
then the least common multiple of a
and b
is the obtained by taking for each prime factor appearing in at least one of the factorizations of a
and b
, taking it with the maximum exponent that it appears in the factorization of a
or b
. For example:
28 = 2^2 * 7
312 = 2^3 * 39
so that
lcm(28, 312) = 2^3 * 7 * 39 = 2184
All of this is to point out that naive approaches are admirable in their simplicity but you can spend all day optimizing every last nanosecond out of them and still not beat a superior algorithm.