Why wouldn't my sieve terminate when I rewrote it as a foldl?

Question 1

foldl can't ever terminate on an infinite list. This is the definition of the function:

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs

Notice that in Haskell, when you force a thunk, evaluation only stops when we reach a step where the outermost operation is a data constructor (well, unless you use explicit forcing or lazy pattern matching). In the case of foldl, that can only be when it hits []. Let's look at this example, reversing an infinite list (which should give it away already):

foldl (flip (:)) [] [1..]
  = foldl (flip (:)) [1] [2..]
  = foldl (flip (:)) [2, 1] [3..]
  = foldl (flip (:)) [3, 2, 1] [4..]
  = foldl (flip (:)) [4, 3, 2, 1] [5..]
  .
  .
  .

Since foldl will always be the outermost operator, and foldl is not a constructor, it will never terminate. With some reflection you can see that no left fold of an infinite list can ever terminate.

foldr doesn't have that problem because its second equation has f at the top:

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)

Example:

foldr (:) [] [1..]
  = 1 : foldr (:) [] [2..]

Now (:) is the outermost operator, so evaluation stops here; something has to force the constructor's arguments for evaluation to continue.

Question 2

_{(clarification: this repeats more or less closely the last part of my answer to your question on https://cstheory.stackexchange.com/ and elaborates on it a bit; I'm not sure if it's a good fit there or not.)}

Your first version is actually very close to Richard Bird's solution (from page 11 of M. O'Neill's paper). You can get there not by trial and error, but by code transformation and stepwise improvement.

After some renaming, pre-calculating the first step gives us this very nice-looking code:

--     map (*i) [i ..] == map (*i) [i, i+1 ..] == [i*i, i*i+i ..]
--     sieve 2 [] where sieve i [] = (i:) $
--               sieve (i + 1) [i*i, i*i+i ..] == 2 : sieve 3 [4, 6 ..]

primesA :: [Integer]
primesA = 2 : sieve 3 [4, 6 ..]

sieve p cs@(c : t)
   | p == c    =     sieve (p + 1) t
   | otherwise = p : sieve (p + 1) (unionI cs [p*p, p*p+p ..])

unionI a@(x:xs) b@(y:ys) | x <  y    = x : xs `unionI` b
                         | x == y    = x : xs `unionI` ys
                         | otherwise = y : a  `unionI` ys

There are two problems with this code. There is the (...(((a+b)+c)+d)+...) calculation structure, which puts the more frequently-producing streams lower in the structure than the less frequently-producing ones.

But the more immediate concern is the premature handling of each prime's multiples. E.g. when you reach 5, [25, 30 ..] is added into the composites. But that should only be done when 25 is reached. Doing so would radically reduce the total number of multiples streams being processed at each moment (from Θ(n) to Θ(sqrt(n/log n)), for n-th prime). This has very significant impact on efficiency.

We can explicitly synchronize on squares of primes, taken from the primes sequence itself as it is being produced (this just maintains a separate back-pointer ps into the sequence, which is advanced at much slower pace than the production point):

primesAC = [2,3] ++ sieve 4 (tail primesAC) 9 [4,6..]

sieve k ps@(p:pt) q cs@(c:ct)                 -- candidate "k", prime "p"
  | k == c  =     sieve (k + 1) ps q ct       -- square "q", composite "c"
  | k < q   = k : sieve (k + 1) ps q cs
  | otherwise =   sieve k       pt (head pt^2)       -- k == q == p*p
                        (unionI cs [q, q+p ..])

This is now just half a step away from Richard Bird's solution, which fixes both problems with one cleverly used foldr, creating the a+(b+(c+(...))) calculation structure where each new prime's multiples stream starts from the prime's square anyway and the synchronization is implicit in the data:

primesAD = (2 :) . sieve 3 . 
            foldr (\p r-> p*p : unionI [p*p+p, p*p+2*p ..] r) [] $ primesAD

sieve p cs@(c : t)                  -- == diffI [p..] cs, if p<=c
  | p == c    =     sieve (p + 1) t
  | otherwise = p : sieve (p + 1) cs

Each prime's square is produced first unconditionally, to prevent the run-away access prematurely forcing further parts of data.

So foldr, and not foldl, is a natural fit to this problem.

You 2nd variant is

primesB :: [Integer]
primesB = [2..] `diffI` composites

composites = foldl f [4,6..] [3..]
  where 
        f cs@(h:t) i | i == h    = cs    
                     | otherwise = h : unionI t [i*i, i*i+i ..]

diffI (x:xs) (y:ys) | x <  y    = x : xs `diffI` (y:ys)
                    | x == y    =     xs `diffI` ys
                    | otherwise = (x:xs) `diffI` ys

This evidently intends to compute composites by similar iterative addition of each number's multiples into the mix; trouble is, foldl doesn't let us in into its internal computation process which is never quite finished (as other answers already pointed out). foldl is unrelenting like that. :) We could turn it into a productive iterate-based iteration, but the same problems of premature addition and the inverted structure will remain. Plus it now adds multiples of all numbers, not just of primes as before:

composites = foldl f [4,6..] [3..]
 -- no part of calculation of (f [4,6..] 3) is forced here, but if it were, ...
 = foldl f (f [4,6..] 3) [4..]                  
 = foldl f (4:unionI [6,8..] [9,12..]) [4..]    
 = foldl f (4:unionI [6,8..] [9,12..]) [5..]
 = foldl f (4:union (unionI [6,8..] [9,12..]) [25,30..]) [6..]
 = foldl f (4:union (union (unionI [6,8..] [9,12..]) [25,30..]) [36,42..]) [7..]
 = ....

And ([2..] `diffI`) is the same as sieve 2 above, so that adds nothing.

Question 3

For anyone who may wonder how to correct my code primesB, the following revision will work. It will "interleave the individual heads properly" as Klaus Draeger wrote here in cstheory.se. It's equivalent to, though maybe less pleasing than, the code attributed to Richard Bird on page 11 of M.E. O'Neill The Genuine Sieve of Eratosthenes: http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf (thanks @Wes) in print as: http://dx.doi.org/10.1017/S0956796808007004

primesB :: [Integer]
-- #1 necessary change
-- primesB = [2..] `differenceIncreasing` composites
primesB = 2 : [3..] `differenceIncreasing` composites

-- #2 necessary change, entire function
composites = foldr f [] primesB -- not foldl, not [2..]
  where f :: Integer -> [Integer] -> [Integer]
        -- no destructuring of knownComposites;
        -- square the first list element and don't put that in unionIncreasing's second argument
        f i knownComposites = ((i*i):) $ unionIncreasing knownComposites $ map (*i) [(i+1)..]

-- No change needed in `unionIncreasing` and `differenceIncreasing`.

I'm still not certain how a person achieves this solution, unless it's only by trial and error.