efficient sieve of Euler in stream processing style

Question

primesEU = 2:sieve [3,5 ..] where
    sieve (p:i:xt) = p:(sieve (i:xt) `minus` (lps i)) where
        lps i = map (i*) (takeWhile' (\p->(p<i)&&(mod i p /= 0)) primesEU)

First of all, the implementation is incorrect, it considers 9 prime. I suspect you meant sieve (i:xt) `minus` lps p there.

Then, since the sieved list contains only odd numbers, you can restrict to tail primesEU in lps, which makes a small difference.

So what is happening here?

The gist of it (I'm off to dinner in a moment, will expand when I come back) is that each prime p has to bubble through all the minus calls created by the odd numbers (>= 3) smaller than p [not quite, a few minus are eliminated before]. That's (p-3)/2 layers, each costing one step.

So, constant factors and composites aside, producing the primes <= n costs O(∑ p), where the sum is over the primes not greater than n. That's O(n²/log n).

Let us follow a few steps of the evaluation of sieve [3,5 .. ]. For brevity, I denote by s k the list sieve [k, k+2 ..] and minus by (\\). I use the corrected definition

sieve (k:ks) = k : (sieve ks `minus` lps k)

that doesn't consider 9 prime. I expand the list of multiples immediately, which reduces the work somewhat. We get

 (:)
 / \
3  (\\)
   /  \
 s 5  [9]

immediately, and the 3 can then be consumed directly. After that, the minus topping the right subtree demands that s 5 be evaluated enough to tell whether it's empty.

   (\\)
   /  \
 (:)  [9]
 / \
5  (\\)
   /  \
 s 7  [15,25]

That's done in one step. (Then it needs to be determined whether lps 3 is empty, and similarly later, but for each member of a lps k, it's only one step, so we can ignore that.) Then minus needs a comparison to bubble the 5 to the top, leaving

   (:)
   / \
  5  (\\)
     /  \
  (\\)  [9]
  /  \
s 7  [15,25]

Now, after the 5 has been consumed, the right child of the top (:) needs to be expanded

      (\\)
      /  \
   (\\)  [9]
   /  \
 (:)  [15,25]
 / \
7  (\\)
   /  \
 s 9  [21,35,49]

One comparison to move the 7 past the multiples of 5

     (\\)
     /  \
   (:)  [9]
   / \
  7  (\\)
     /  \
  (\\)  [15,25]
  /  \
s 9  [21,35,49]

and one more to lift it past the multiple of 3

      (:)
      / \
     7  (\\)
        /  \
     (\\)  [9]
     /  \
  (\\)  [15,25]
  /  \
s 9  [21,35,49]

After moving the 7 past two lists of composites, it has become ready for consumption. After that, the right subtree of the top-level (:) here must be evaluated further to determine the next prime (if there is any). The evaluation must step down three levels of (\\) in the tree, to reach the s 9 that provides the next candidate.

         (\\)
         /  \
      (\\)  [9]
      /  \
   (\\)  [15,25]
   /  \
 (:)  [21,35,49]
 / \
9  (\\)
   /  \
 s 11 [27]

That candidate must be lifted past two minuses to finally reach the minus that eliminates it

        (\\)
        /  \
     (\\)  [9]
     /  \
   (:)  [15,25]
   / \
  9  (\\)
     /  \
  (\\)  [21,35,49]
  /  \
s 11 [27]

        (\\)
        /  \
      (:)  [9]
      / \
     9  (\\)
        /  \
     (\\)  [15,25]
     /  \
  (\\)  [21,35,49]
  /  \
s 11 [27]

Now the minus can do its work for the first time, producing

           (\\)
           /  \
        (\\)  []
        /  \
     (\\)  [15,25]
     /  \
  (\\)  [21,35,49]
  /  \
s 11 [27]

(the xs `minus` [] at the top eliminates the empty list only after the first argument has been determined non-empty; swapping the first two equations of minus would change that, but the difference in steps needed would be small). Then the evaluation must step down four levels in the tree to produce the next candidate (11). That candidate must be lifted past four minus until it reaches the top. One minus is eliminated from the tree in the course of that, so the next candidate (13) is produced only four levels down the tree, and not five ((13-3)/2), thus the number of steps to bring p to the top so that it may be consumed is not quite (p-3)/2, but it's not much less.

The last element in an lps k is always divisible by the square of the smallest prime factor of k, and they are all odd, so there are at most

1/2*(n/3² + n/5² + n/7² + n/11² + ...) ~ n/10

lists that can be eliminated when reaching n (there are a few less, since that counts all numbers having a square prime divisor, and those having more than one multiple times).

So the problem is that each prime is produced deep down in the tree, at least p*0.4 levels from the top. That means lifting p to the top to make it consumable takes at least p*0.4 steps, giving a basically quadratic overall complexity, not counting the work needed to produce the lists of multiples.

However, the actual behaviour is worse at first, it nears quadratic behaviour when computing primes in the region between 100000 and a 200000 - should become quadratic modulo logarithmic factors in the limit, but I don't think I have the patience to wait for a million or two.

Anyway, is it possible to implement a efficient sieve of Euler in stream processing style? Or the algorithm has something inherent does against the nature of stream?

I can't prove it's impossible, but I know of no way to implement it efficiently. Neither as producing a stream of primes, nor as producing a list of primes to a given limit.

What makes the Sieve of Eratosthenes efficient is that it is trivial to reach the next multiple of a prime (just add p or 2*p or so to the index) without caring whether it has already been crossed off as a multiple of a smaller prime. The work needed to avoid multiple crossings-off is much more than the work of the multiple crossings-off.