Finding a Prime Sieve Inconsistency in Python

https://stackoverflow.com/questions/16114614

11-04-2022
|

Question

I'm attempting to learn python and I thought trying to develop my own prime sieve would be an interesting problem for the afternoon. When required thus far, I would just import a version of the Sieve of Eratosthenes that I found online -- it's this that I used as my benchmark.

After trying several different optimizations, I thought I had written a pretty decent sieve:

def sieve3(n):
top = n+1
sieved = dict.fromkeys(xrange(3,top,2), True)
for si in sieved:
    if si * si > top:
        break
    if sieved[si]:
        for j in xrange((si*2) + si, top, si*2):       [****]
            sieved[j] = False
return [2] + [pr for pr in sieved if sieved[pr]]

Using the first 1,000,000 integers as my range, this code would generate the correct number of primes and was only about 3-5x slower than my benchmark. I was about to give up and pat myself on the back when I tried it on a larger range, but it no longer worked!

n = 1,000 -- Benchmark = 168 in 0.00010 seconds
n = 1,000 -- Sieve3 = 168 in 0.00022 seconds

n = 4,194,304 -- Benchmark = 295,947 in 0.288 seconds
n = 4,194,304 -- Sieve3 = 295,947 in 1.443 seconds

n = 4,194,305 -- Benchmark = 295,947 in 3.154 seconds
n = 4,194,305 -- Sieve3 = 2,097,153 in 0.8465 seconds

I think the problem comes from the line with [****], but I can't figure out why it's so broken. It's supposed to mark each odd multiple of 'j' as False and it works most of the time, but for anything above 4,194,304 the sieve is broken. (To be fair, it breaks on random other numbers too, like 10,000 for instance).

I made a change and it significantly slowed my code down, but it would actually work for all values. This version includes all numbers (not just odds) but is otherwise identical.

def sieve2(n):
top = n+1
sieved = dict.fromkeys(xrange(2,top), True)
for si in sieved:
    if si * si > top:
        break
    if sieved[si]:
        for j in xrange((si*2), top, si):
            sieved[j] = False
return [pr for pr in sieved if sieved[pr]]

Can anyone help me figure out why my original function (sieve3) doesn't work consistently?

Edit: I forgot to mention, that when Sieve3 'breaks', sieve3(n) returns n/2.

Solution

The sieve requires the loop over candidate primes to be ordered. The code in question is enumerating the keys of a dictionary, which are not guaranteed to be ordered. Instead, go ahead and use the xrange you used to initialize the dictionary for your main sieve loop as well as the return result loop as follows:

def sieve3(n):
    top = n+1
    sieved = dict.fromkeys(xrange(3,top,2), True)
    for si in xrange(3,top,2):
        if si * si > top:
            break
        if sieved[si]:
            for j in xrange(3*si, top, si*2):     
                sieved[j] = False
    return [2] + [pr for pr in xrange(3,top,2) if sieved[pr]]

OTHER TIPS

It's because dictionary keys are not ordered. Some of the time, by chance, for si in sieved: will loop through your keys in increasing order. With your last example, the first value si gets is big enough to break the loop immediately.

You can simply use: for si in sorted(sieved):

Well, look at the runtime -- you see that the runtime on the last case you showed was almost 5 times faster than the benchmark, while it had usually been 5 times slower. So that is a red flag, maybe you aren't performing all of the iterations? (And it is 5 times faster while having almost 10 times as many primes...)

I don't have time to look into the code more right now, but I hope this helps.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow