Why is this scala prime generation so slow/memory intensive?

https://stackoverflow.com/questions/6802112

22-10-2019
|

Question

I run out of memory while finding the 10,001th prime number.

object Euler0007 {
  def from(n: Int): Stream[Int] = n #:: from(n + 1)
  def sieve(s: Stream[Int]): Stream[Int] = s.head #:: sieve(s.filter(_ % s.head != 0))
  def primes = sieve(from(2))
  def main(args: Array[String]): Unit = {
    println(primes(10001))
  }
}

Is this because after each "iteration" (is this the correct term in this context?) of primes, I increase the stack of functions to be called to get the next element by one?

One solution that I've found on the web which doesn't resort to an iterative solution (which I'd like to avoid to get into functional programming/idiomatic scala) is this (Problem 7):

lazy val ps: Stream[Int] = 2 #:: Stream.from(3).filter(i => ps.takeWhile(j => j * j <= i).forall(i % _ > 0))

From what I can see, this does not lead to this recursion-like way. Is this a good way to do it, or do you know of a better way?

Solution

One reason why this is slow is that it isn't the sieve of Eratosthenes. Read http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf for a detailled explanation (the examples are in Haskell, but can be translated directly into Scala).

My old solution for Euler problem #7 wasn't the "true" sieve either, but it seems to work good enough for little numbers:

object Sieve {

    val primes = 2 #:: sieve(3)

    def sieve(n: Int) : Stream[Int] =
          if (primes.takeWhile(p => p*p <= n).exists(n % _ == 0)) sieve(n + 2)
          else n #:: sieve(n + 2)

    def main(args: Array[String]) {
      println(primes(10000)) //note that indexes are zero-based
    }
}

I think the problem with your first version is that you have only defs and no val which collects the results and can be consulted by the generating function, so you always recalculate from scratch.

OTHER TIPS

Yes, it is because you "increase the stack of functions to be called to get the next element, by one after each "iteration" " - i.e. add a new filter on top of stack of filters each time after getting each prime. That's way too many filters.

This means that each produced prime gets tested by all its preceding primes - but only those below its square root are really needed. For instance, to get the 10001-th prime, 104743, there will be 10000 filters created, at run-time. But there are just 66 primes below 323, the square root of 104743, so only 66 filters were really needed. All the 9934 others will be there needlessly, taking up memory, hard at work producing absolutely no added value.

This is the key deficiency of that "functional sieve", which seems to have originated in the 1970s code by David Turner, and later have found its way into the SICP book and other places. It is not that it's a trial division sieve (rather than the sieve of Eratosthenes). That's far too remote a concern for it. Trial division, when optimally implemented, is perfectly capable of producing the 10000th prime very fast.

The key deficiency of that code is that it does not postpone the creation of filters to the right moment, and ends up creating far too many of them.

Talking complexities now, the "old sieve" code is O(n²), in n primes produced. The optimal trial division is O(n^1.5/log^0.5(n)), and the sieve of Eratosthenes is O(n*log(n)*log(log(n))). As empirical orders of growth the first is seen typically as ~ n^2, the second as ~ n^1.45 and the third ~ n^1.2.

You can find Python generators-based code for optimal trial division implemented in this answer (2nd half of it). It was originally discussed here dealing with the Haskell equivalent of your sieve function.

Just as an illustration, a "readable pseudocode" :) for the old sieve is

primes = sieve [2..]  where
   sieve (x:xs) = x : sieve [ y | y <- xs, rem y x > 0 ]
                         -- list of 'y's, drawn from 'xs',
                         --         such that (y % x > 0)

and for optimal trial division (TD) sieve, synchronized on primes' squares,

primes = sieve [2..] primes   where
  sieve (x:xs) ps = x : (h ++ sieve [ y | y <- t, rem y p > 0 ] qs)
          where
            (p:qs) = ps     -- 'p' is head elt in 'ps', and 'qs' the rest
            (h,t)  = span (< p*p) xs    -- 'h' are elts below p^2 in 'xs'
                                        -- and 't' are the rest

and for a sieve of Eratosthenes, devised by Richard Bird, as seen in that JFP article mentioned in another answer here,

primes = 2 : minus [3..] 
               (foldr (\p r-> p*p : union [p*p+p, p*p+2*p..] r) [] primes)
                      -- function of 'p' and 'r', that returns 
                      -- a list with p^2 as its head elt, ...

Short and fast. (minus a b is a list a with all the elts of b progressively removed from it; union a b is a list a with all the elts of b progressively added to it without duplicates; both dealing with ordered, non-decreasing lists). foldr is the right fold of a list. Because it is linear this runs at ~ n^1.33, to make it run at ~ n^1.2 the tree-like folding function foldi can be used).

The answer to your second question is also a yes. Your second code, re-written in same "pseudocode",

ps = 2 : [i | i <- [3..], all ((> 0).rem i) (takeWhile ((<= i).(^2)) ps)]

is very similar to the optimal TD sieve above - both arrange for each candidate to be tested by all primes below its square root. While the sieve arranges that with a run-time sequence of postponed filters, the latter definition re-fetches the needed primes anew for each candidate. One might be faster than another depending on a compiler, but both are essentially the same.

And the third is also a yes: the sieve of Eratosthenes is better,

ps = 2 : 3 : minus [5,7..] (unionAll [[p*p, p*p+2*p..] | p <- drop 1 ps])

unionAll = foldi union' []          -- one possible implementation
union' (x:xs) ys = x : union xs ys
   -- unconditionally produce first elt of the 1st arg 
   -- to avoid run-away access to infinite lists

It looks like it can be implemented in Scala too, judging by the similarity of other code snippets. (Though I don't know Scala). unionAll here implements tree-like folding structure (click for a picture and full code) but could also be implemented with a sliding array, working segment by segment along the streams of primes' multiples.

TL;DR: yes, yes, and yes.

FWIW, here's a real Sieve of Eratosthenes:

def sieve(n: Int) = (2 to math.sqrt(n).toInt).foldLeft((2 to n).toSet) { (ps, x) => 
    if (ps(x)) ps -- (x * x to n by x) 
    else ps
}

Here's an infinite stream of primes using a variation on the Sieve of Eratosthenes that preserves its fundamental properties:

case class Cross(next: Int, incr: Int)

def adjustCrosses(crosses: List[Cross], current: Int) = {
  crosses map {
    case cross @ Cross(`current`, incr) => cross copy (next = current + incr)
    case unchangedCross                 => unchangedCross
  }
}

def notPrime(crosses: List[Cross], current: Int) = crosses exists (_.next == current)

def sieve(s: Stream[Int], crosses: List[Cross]): Stream[Int] = {
    val current #:: rest = s

    if (notPrime(crosses, current)) sieve(rest, adjustCrosses(crosses, current))
    else current #:: sieve(rest, Cross(current * current, current) :: crosses)
}

def primes = sieve(Stream from 2, Nil)

This is somewhat difficult to use, however, since each element of the Stream is composed using the crosses list, which has as many numbers as there have been primes up to a number, and it seems that, for some reason, these lists are being kept in memory for each number in the Stream.

For example, prompted by a comment, primes take 6000 contains 56993 would throw a GC exception whereas primes drop 5000 take 1000 contains 56993 would return a result rather fast on my tests.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow