Question

I'm using a modified prime generator from here(and here) and there the author suggests that it’s arbitrary where the generator starts as long as the internal data is consistent with the starting value. So I thought of extracting this data and storing it for later use. This is my modified generator:

from itertools import count
def SieveLoadable(level=0,PData={}):
    "modified prime sieve from http://stackoverflow.com/a/19391111"
    if not PData.get(level):#no data, start from scratch
        for i in[2, 3, 5, 7]:
            yield i
        D,C={},9
        ps=SieveLoadable(level+1)
        next(ps)
        p=next(ps)
        psq=p*p

    else:#data! load it
        D,C,psq,p=PData[level]
        ps=SieveLoadable(level+1,PData)
        while p<=7:#have next level down skip the values it gave before
            p=next(ps)

    for i in count(C,2):
        if STOP:#set this outside of the generator
        #store data in a dict outside the generator
            Internals[level]=[D,i,psq,p]
            w=next(ps)#call the next level down
            # it will hit this if statement and stop before changing its data
            break

        if i in D:
            step=D.pop(i)
        elif i<psq:
            yield i
            continue
        else:
            step=2*p
            p=next(ps)
            psq=p*p

        i+=step
        while i in D:
            i+=step
        D[i]=step

This works to some extent, but I’ve noticed starting and stopping makes it skip some primes (e.g. restarting it every 1 million primes makes it skip 32452883 and 32452909), so how can I make this so it doesn’t skip a single prime?

Here is how I’m calling the generator:

import pickle, os
PrimeFolder='C:\\Primes'
sieveData='\\'.join([PrimeFolder,"Internals.dmp"])
Internals={}
STOP=False
numPrimes = 1e6

if not os.path.exists(PrimeFolder):os.makedirs(PrimeFolder)

if os.path.exists(sieveData):#load from previous run
    File=open(sieveData,'rb')
    Internals = pickle.load(File)
    File.close()

for i,p in enumerate(SieveLoadable(0,Internals)):
    #store p in a list here

    if not i:print('Starting at: {:,}'.format(p))
    if i>=numPrimes:#amount of primes to generate at a time
        #dump list of primes to file in this if statement

        print('Stopping at: {:,}'.format(p))
        STOP=True#stop the generator

File=open(sieveData,'wb')#save for next time
pickle.dump(Internals,File)
File.close()

While I started with this specific prime generator, any prime generator that can dump their data and reload it for later use would be appreciated.

Was it helpful?

Solution 2

Using Will Ness’ suggestion, I got a two level form of his prime generator to be paused/stopped and started at any point; provided it’s not stopped after the first 4 primes that is. Here are the modified generators.

The internal generator:

from itertools import count
def sieveL(l=0,Data={}):#l is not used, its how the test calls the generators
    '''modified from 
    http://code.activestate.com/recipes/117119-sieve-of-eratosthenes/
    L for loadable'''


    if Data.get('I',0):#there's data from a previous run, load it
        D,C=Data['I'] #I for inner

    else:#no data, start from scratch
        yield 2 
        D={}
        C=3#starting counter

    for i in count(C,2):
        if STOP:
            Internals['I']=[D,i]#store the current counter and the internal data
            break
        s=D.pop(i, 0)
        if not s:
            yield i
            D[i*i]=2*i
        else:
            i+=s
            while i in D:i+=s
            D[i]=s

And the external generator:

from itertools import count
def postponed_sieveL(level=0,PData={}):
    '''uses a seperate internal generator (two level form) - sieveL
    modified from 
    https://stackoverflow.com/a/10733621, https://stackoverflow.com/a/19391111
    '''
    #this should be fine unless you stop the generator 
    #  before it passes the first four primes

    dat=PData.get(level,0)#load any previous data
    if not dat:#no previous data, start from 2
        for i in [2, 3, 5, 7]:
            yield i

        D,C={},9
        ps=sieveL('I',PData)#inner generator
        next(ps)
        p=next(ps)
        psq=p*p

    else:#past the inital primes, load from previous run  
        D,C,psq,p=PData[level]
        ps=sieveL('I',PData)#inner generator

    for i in count(C,2):
        if STOP:#set this outside of the generator

        #dict, current index (count value), squared prime and prime
            Internals[level]=[D,i,psq,p]

            w=next(ps)#inform the inner generator that it should stop
            break

        if i in D:
            step=D.pop(i)
        elif i<psq:
            yield i
            continue
        else:
            step=2*p
            p=next(ps)
            psq=p*p
        i+=step
        while i in D:
            i+=step
        D[i]=step

While looking at this I had actually managed to work his recursive prime generator into something that can be stopped and started as well. It’s not to my liking, but it works. Handling those initial primes was harder than I thought it should have been. Here’s the recursive form:

from itertools import count
def postponed_sieveR(level=0,PData={}):
    '''recursive form of postponed_sieve
    modified from 
    https://stackoverflow.com/a/10733621
    https://stackoverflow.com/a/19391111
    '''
    dat=PData.get(level,[0,0,0])#load any previous data
    #inital data is stored as [0, current_prime, starting_index]

    if not dat[0]:# handling the inital primes
        init=[2, 3, 5, 7]
        p,srt=dat[1:]
        i=srt

        if p<=7:
            for i,p in enumerate(init[srt:]):#start
                i+=srt#correct the index
                if STOP:
                    break
                yield p

                #to prevent getting caught constantly returning 7 after reloads
                if p==init[-1]:p+=2

        if STOP:# store the data
            Internals[level]=[0,p,i]
            return# no lower levels, so return

        D,C={},9
        ps=postponed_sieveR(level+1,PData)
        next(ps)
        p=next(ps)
        psq=p*p

    else:#past the inital primes, load from previous run  
        D,C,psq,p=PData[level]
        ps=postponed_sieveR(level+1,PData)

    for i in count(C,2):

        if STOP:#set this outside of the generator

            #dict, current index (count value), squared prime and prime
            Internals[level]=[D,i,psq,p]
            w=next(ps)#call the next level down 
            #(it will hit this if statement and stop before changing its data)
            break

        if i in D:
            step=D.pop(i)
        elif i<psq:
            yield i
            continue
        else:
            step=2*p
            p=next(ps)
            psq=p*p

        i+=step
        while i in D:
            i+=step
        D[i]=step

Interestingly, the recursive form uses about half the space of the two level form. My initial thought was that they’d be about the same, but now thinking about it it makes sense. The recursive form is a stack of dictionaries storing a square root amount they need while the two level form is a linear amount and a square root amount. Here is the code I used to test the generators and their results. I might have gone overboard with the formatting to make the fancy table.

import os
import pickle
from time import time

Internals={} #generator's internal data
STOP=False # flag for stopping the generator

Max=10639 # No. primes to generate per segment
segments=83 # No. segments to unload, reload the data from the generator

            # name  : generator
generators={'1 sieveL':sieveL,
            '2 postponed_sieveL - two level form':postponed_sieveL,
            '3 postponed_sieveR - recursive form':postponed_sieveR,
            }

print 'Doing {:,} segment{}, each generating {:,} prime{} ({:,} prime{})\n'.format(segments,''if segments==1 else's',
                                                                      Max,''if Max==1 else's',Max*segments,''if Max*segments==1 else's')
#find sum of primes of a non paused generator for comparison
Pcom=0
t1=time()
for i,k in enumerate(postponed_sieveR()):
    Pcom+=k
    if i==(Max*segments)-1:
        break
del_t=time()-t1

NamCen=max([len(i)for i in generators.keys()])
col_1='Generator Name'.center(NamCen)
col_2='Sum of all generated primes'
col_3='Data size (Bytes)'
col_4='Time (Seconds)'
Size='N/A'

# table and non paused generator
print(' | '.join([col_1,col_2,col_3,col_4]))
print(' | '.join(['-'*len(col_1),'-'*len(col_2),'-'*len(col_3),'-'*len(col_4)])) 
print(' | '.join(['0 Non-paused sieve'.ljust(len(col_1)),'{:,}'.format(Pcom).center(len(col_2)),
                  Size.center(len(col_3)),'{:06.03f}'.format(del_t).center(len(col_4))]))

for name,gen in sorted(generators.items()):
    Psum=0#reset the sum and the data storage for the next sieve
    Internals={}
    t1=time()

    #print('\nstarting {}'.format(name))
    for segs in range(segments):
        for i,j in enumerate(gen(0,Internals)):
            Psum+=j
            if i==Max-1:
                STOP=True
        STOP=False
        #print('\tsegment {}; stopped at {}'.format(segs,j))
    del_t=time()-t1

    # dump data (this section can be commented out without issues)
    DataPath="C:\\data.prime"
    Data=open(DataPath,'wb')
    pickle.dump(Internals,Data)
    Data.close()
    Size=os.path.getsize(DataPath)# find size of data dump after last segment
    os.remove(DataPath)# then remove it, data was only dumped to find file size

    # show stats
    print(' | '.join([name.ljust(len(col_1)),'{:,}'.format(Psum).center(len(col_2)),
                     (Size if type(Size)==str else '{:,}'.format(Size)).center(len(col_3)),
                      '{:06.03f}'.format(del_t).center(len(col_4))]))

And its output:

Doing 83 segments, each generating 10,639 primes (883,037 primes)

           Generator Name           | Sum of all generated primes | Data size (Bytes) | Time (Seconds)
----------------------------------- | --------------------------- | ----------------- | --------------
0 Non-paused sieve                  |      5,774,833,097,284      |        N/A        |     03.114    
1 sieveL                            |      5,774,833,097,284      |     24,195,100    |     04.219
2 postponed_sieveL - two level form |      5,774,833,097,284      |       16,618      |     03.175    
3 postponed_sieveR - recursive form |      5,774,833,097,284      |       8,988       |     03.057    

year+ edit: I've been able to make a much better version of the saving recursive generator:

from itertools import count

def Generator(level=0, PData={}):
    '''prime generator that can save its internal data
    Refs:  https://stackoverflow.com/a/10733621
           https://stackoverflow.com/a/19391111
           https://stackoverflow.com/a/23258396'''
    # this version works if you don't stop before the first 4 primes
    dat=PData.get(level, 0)

    if not dat:# handling the initial primes
        for p in [2, 3, 5, 7]:
            yield p

        if STOP:return#lowest level has nothing to store so return

        D,c={},9
        ps=Generator(level+1, PData)
        next(ps);p=next(ps)
        psq=p*p

    else:       # past the initial primes, load from previous run
        D,c,p=PData[level]
        psq=p*p # need the squared prime, it was not stored
        g=0     # correction factor
        ps=Generator(level+1,PData)

        #if p<=7, its the lowest level, so skip previous values it gave before.
        while g<p and p<=7:g=next(ps)

    #all primes after initial set 
    for i in count(c, 2):

        if STOP:
            Internals[level]=[D,i,p]#store dict, current value and prime
            next(ps)#call the next level down
            #it will hit this if statement and stop before changing its data
            break

        step=D.pop(i, 0)
        if not step:
            if i<psq:
                yield i
                continue
            else:
                step=2*p
                p=next(ps)
                psq=p*p
        i+=step
        while i in D:
            i+=step
        D[i]=step

Instead of trying to handle those initial primes, I just ignore them. It works out better that way and I think is cleaner. On reloading the data, I have it reset the lowest level to where it was before. Here's how it compares to the ones above:

Doing 83 segments, each generating 10,639 primes (883,037 primes)

           Generator Name           | Sum of all generated primes | Data size (Bytes) | Time (Seconds)
----------------------------------- | --------------------------- | ----------------- | --------------
0 Non-paused sieve                  |      5,774,833,097,284      |        N/A        |     02.923    
1 sieveL                            |      5,774,833,097,284      |     24,195,100    |     04.169    
2 postponed_sieveL - two level form |      5,774,833,097,284      |       16,618      |     03.151    
3 postponed_sieveR - recursive form |      5,774,833,097,284      |       8,988       |     03.007    
4 Generator                         |      5,774,833,097,284      |       8,938       |     03.038

OTHER TIPS

Without looking into your code, a comment about the algorithm: it recursively creates a tower of primes generators, each reaching up to the square root of the production point of a generator above it.

But this was done mainly for code brevity. The inner primes generator can well be a regular, non-postponed sieve generator as seen in the original ActiveState code. It reaches only up to the square root of the top generator's limit anyway, and the space complexity doesn't change, which is why this code shortcut was acceptable in the first place. The code can be seen in the test entry on Ideone, as mentioned in my answer that you cite.

That way you will have only two dictionaries to store and reload. You can even maintain the two dictionaries explicitly inside one generator:

                        /
                       /
              generator          {primes are just produced}
             /
            /              {top internal dict uses separate supply of primes}
           /
   internal_loop       {each prime produced is added into the loop dict as well}
  /             \
  \_____________/

This is the same as the difference, in Haskell, between using

_Y g = g (_Y g)      -- recursive tower of generators

and

_Y g = g x           -- two-staged production with 
    where  
         x = g x     --   an internal loop

to express the recursive creation of supply.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top