What's causing the 135k/sec page faults in this python code? (trial-division prime sieve)

https://stackoverflow.com/questions/12356257

01-07-2021
|

Question

The following code executes in a work thread, and just happily spins away, receiving pause/report commands, etc. I assume they're soft faults since I'm not having RAM use issues and my hard drive didn't melt over the weekend, and that it's got something to do with how long I've let results accumulate, as I'm only getting ~50 page faults when I started it up a few days ago.

The "counter" attribute is currently at 22,496,115, and "results" has 1,418,641 elements. The slice of "results" is taken because I was feeling contrary and started the list at 1.

def run(self):
    while self.keep_running:
        self.lock.acquire()

        is_prime = True
        self.counter += 1
        cutoff_val = pow(self.counter,.5)
        for number in self.results[1:]:
            if number > cutoff_val:
                break

            if self.counter % number == 0:
                is_prime = False
                break

        if is_prime:
            self.results.append(self.counter)

        self.lock.release()

Note: I know that I could use the Sieve of Eratosthenes to optimize the algorithm and probably cut down on the page faults, but that's not the point: I'm trying to pinpoint the exact reason - or at least the worst offender - behind the page faults so I can avoid doing the same kinds of things in the future. This algorithm is used simply for testing UI responsiveness when I need a "stupidly expensive, simple work thread."

Additional setup pieces as requested:

def __init__(self):
    self.counter = 0
    self.keep_running = False;
    self.lock = threading.Lock()
    self.results = list()

def __call__(self, *args):
    if not self.keep_running:
        self.keep_running = True
        self.run()

Solution

I think that @John Gaines Jr. has pointed out something you need to change. If your list is really big, you don't want to be making copies of it like that.

Here is a good way to loop over the same values as self.results[1:] but without making a copy:

res = iter(self.results)  # get an iterator for the list values
next(res)  # iterate once to throw away first value
for number in res:
    # same code you already have goes here
    ...

EDIT: The above code is correct and simple, but doesn't scale well. I thought about it and figured there must be something in itertools for this, and sure enough there is:

import itertools as it
res = it.islice(self.results, 1, None)
for number in res:
    # same code you already have goes here
    ...

EDIT: Thank you to @John Gaines Jr. for pointing out that you can use None instead of len(self.results) in the call to it.islice().

OTHER TIPS

From the Python tutorial Lists section:

All slice operations return a new list containing the requested elements. This means that the following slice returns a shallow copy of the list a:

>>> a[:]
['spam', 'eggs', 100, 1234]

So in your for loop, the bit self.results[1:] results in a copy of the results list. If this routine is being called over and over, it could quite easily cause memory thrashing.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow