Pergunta

I'm playing through project Euler in my spare time, and it's come to the point where I need to do some refactoring. I've implemented Miller-Rabin, as well as a few sieves. I've heard before that sieves are actually faster for small-ish numbers, as in under a few million. Does anybody have any information on this? Google wasn't very helpful.

Foi útil?

Solução

Yes, you'll find with most algorithms that you can trade space for time. In other words, by allowing the use of more memory, the speed is greatly increased *a.

I don't actually know the Miller-Rabin algorithm but, unless it's simpler than a single shift-left/add and memory extraction, it will be blown out of the water by a pre-calculated sieve.

The important thing here is pre-calculated. It's a good idea, in terms of performance, to pre-calculate things like this since the first million primes will be unlikely to change in the near future :-)

In other words, create your sieve with something like:

unsigned char primeTbl[] = {0,0,1,1,0,1,0,1,0,0,0,1};
#define isPrime(x) ((x < sizeof(primeTbl) ? primeTbl[x] : isPrimeFn(x))

with all the usual caveats about not passing things like a++ into macros. This gives you the best of both worlds, a blindingly fast table lookup for "small-ish" primes, dropping back to a calculation method for those outside the range.

Obviously you would write a program using one of the other methods to generate that lookup table - you don't really want to have to type it all in by hand.

But, as with all optimisation questions, measure, don't guess!


*a A classic case of this was some trig functions I once had to write for an embedded system. This was a competitive contract bid and the system had a little more storage than CPU grunt.

We actually won the contract since our benchmark figures for the functions blew the competition away.

Why? Because we pre-calculated the values into a lookup table originally calculated on another machine. By judicious use of reduction (bringing the input values down below 90 degrees) and trig properties (the fact that cosine is just a phase shift of sine and that the other three quadrants are related to the first), we got the lookup table down to 180 entries (one per half degree).

The best solutions are those that are elegant and devious :-)


For what it's worth, the following C code will generate such a table for you, all the primes below four million (283,000 of them).

#include <stdio.h>

static unsigned char primeTbl[4000000];

int main (void) {
    int i, j;

    for (i = 0; i < sizeof(primeTbl); i++)
        primeTbl[i] = 1;

    primeTbl[0] = 0;
    primeTbl[1] = 0;
    for (i = 2; i < sizeof(primeTbl); i++)
        if (primeTbl[i])
            for (j = i + i; j < sizeof(primeTbl); j += i)
                primeTbl[j] = 0;

    printf ("static unsigned char primeTbl[] = {");
    for (i = 0; i < sizeof(primeTbl); i++) {
        if ((i % 50) == 0) {
            printf ("\n   ");
        }
        printf ("%d,", primeTbl[i]);
    }
    printf ("\n};\n");
    printf ("#define isPrime(x) "
        "((x < sizeof(primeTbl) ? primeTbl[x] : isPrimeFn(x))\n");

    return 0;
}

If you can bump up the primeTbl table to sixteen million entries (16M), you'll find that's enough to keep the prime count above a million (the first 1,031,130 primes).

Now there are ways to make that take less storage such as only storing odd numbers and adjusting the macro to take care of that, or using a bit mask instead of unsigned characters. I prefer simplicity of algorithms myself if the memory is available.

Outras dicas

I recommend a tiered approach. First, make sure there are no small prime factors. Trial-dividing by the first 20 or 30 primes works, though if you use a clever approach you can reduce the number of divisions needed by using gcds. This step filters out about 90% of the composites.

Next, test if the number is a strong probable prime (Miller-Rabin test) to base 2. This step removes almost all remaining composites, but some rare composites can pass.

The final proving step depends on how large you want to go. If you are willing to work in a small range, do a binary search on a list of 2-pseudoprimes up the the largest you allow. If that's 2^32, your list will have only 10,403 members, so the lookup should take only 14 queries.

If you want to go up to 2^64, it now suffices (thanks to the work of Jan Feitisma) to check if the number is a BPSW pseudoprime. (You could also download the 3 GB list of all exceptions, remove those which trial division would remove, and write a disk-based binary search.) T. R. Nicely has a nice page explaining how to implement this reasonably efficiently.

If you need to go higher, implement the above method and use it as a subroutine for a Pocklington-style test. This stretches the definition of "small-ish"; if you want more information on these methods, just ask.

As a variant on the notion of pre-computation, you can first cheaply check whether the candidate number p is divisible by 2, 3, 5, 7, or 11. If not, then declare p prime if 2p-1 = 1 (mod p). This will fail at some point, but it works up to 100 million because I tested it (pre-computation).

In other words, all the small-ish Fermat pseudo-primes to the base 2 are divisible by one of 3, 5, 7, or 11.

EDIT:

As correctly noted by @starblue, the above is simply wrong. I had a bug in my program. The best I can do is amend the above to:

If candidate p is divisible by 2, 3, 5, 7, or 11, declare it composite;
Else if p is one of {4181921, 4469471, 5256091, 9006401, 9863461}, declare it composite;
Else if p passes the Miller-Rabin test for bases 2 and 5 then declare it prime;
Else declare it composite.

This I tested for integers less than 10,000,000. Perhaps a different pair of bases would do even better.

Please accept my apologies for my mistakes.

EDIT 2:

Well, it appears that the information I was after is already on the Wikipedia page for the Miller-Rabin algorithm, the section titled "Deterministic variants of the test".

The only way is to benchmark yourself. When you do, write it up, and post it online somewhere.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top