Reduce list on the fly in Haskell

Question 1

foldPairProduct :: (Num a, Ord a)  => (a -> a -> a) -> [a] -> [a] -> a
foldPairProduct f xs ys = foldl1 f [ x*y | x <- xs, y <- ys]

can be a good memory citizen. The second argument, ys, is used repeatedly, so it has to be held in memory in its entirety during the computation, but the intermediate list is produced lazily as it is consumed, so it contributes only a constant amount of memory, giving an overall O(length ys) space complexity. Of course, length xs * length ys list cells have to be produced and consumed, so the overall allocation is O(length xs * length ys) [assuming each a value uses bounded space]. The number of bytes copied during GC (and thus the time needed for GC) can be drastically reduced by providing a larger allocation area: with +RTS -A1M, the number drops from

3,717,333,376 bytes copied during GC

for the default setting to

20,445,728 bytes copied during GC

and the GC time drops from 4.88s to 0.07s, for xs == ys = [1 .. 10000] :: [Int] and f = (+).

But that depends on the strictness analyser doing its job, which it does fine if the type the function is used at is known during compilation (e.g. Int) and the combining function is known to be strict. If the code is not specialised, or if the combining function is not known to be strict, the fold will build a thunk of O(length xs * length ys) size. That problem can be alleviated by using the stricter foldl1'.
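A minimal sketch of the foldl1' variant mentioned above. The name foldPairProductStrict is made up for illustration; the only change from foldPairProduct is the fold itself.

```haskell
import Data.List (foldl1')

-- Hypothetical name: same as foldPairProduct, but using the strict
-- left fold, which forces the accumulator to weak head normal form
-- at every step instead of relying on the strictness analyser.
foldPairProductStrict :: Num a => (a -> a -> a) -> [a] -> [a] -> a
foldPairProductStrict f xs ys = foldl1' f [ x * y | x <- xs, y <- ys ]
```

For a flat type such as Int, weak head normal form is the fully evaluated value, so no chain of thunks can build up in the accumulator. Like foldl1, this still fails on an empty product list.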

foldPairProduct' :: Num a => (Maybe a -> Maybe a -> Maybe a) -> [a] -> [a] -> Maybe a  
foldPairProduct' _ _ [] = Nothing
foldPairProduct' _ [] _ = Nothing
foldPairProduct' f (x:xs) (y:ys) = 
  foldl1 f [Just $ x*y, foldPairProduct' f [x] ys, foldPairProduct' f xs [y], 
            foldPairProduct' f xs ys]

runs head-on into the problem of insufficient strictness. The value wrapped by a Just constructor can't be made strict by the compiler here, since it might not be needed for the overall result, so the fold often produces a thunk of O(length xs * length ys) size under the Just. Of course, for some f, like const, it will behave well as is. For it to be a good memory citizen when all values are used, you must use a sufficiently strict combining function f, one that also forces the value under the Just in the result (if it is a Just); using foldl1' also helps. With that, it can have O(length ys + length xs) space complexity (the lists xs and ys are used more than once, so they are retained).
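One way a "sufficiently strict" combining function could look, as a sketch under assumptions: strictLift and foldPairProductM are hypothetical names, and treating Nothing as "no contribution" is my reading of the base cases, not something stated in the question.

```haskell
import Data.List (foldl1')

-- Hypothetical helper: lift a binary function to Maybe and force the
-- combined value before wrapping it, so no thunk hides under Just.
-- Assumption: Nothing means "no contribution" and is skipped.
strictLift :: (a -> a -> a) -> Maybe a -> Maybe a -> Maybe a
strictLift g (Just a) (Just b) = let c = g a b in c `seq` Just c
strictLift _ Nothing  mb       = mb
strictLift _ ma       Nothing  = ma

-- foldPairProduct' with foldl1' and a strictly evaluated first product.
foldPairProductM :: Num a
                 => (Maybe a -> Maybe a -> Maybe a) -> [a] -> [a] -> Maybe a
foldPairProductM _ _ [] = Nothing
foldPairProductM _ [] _ = Nothing
foldPairProductM f (x:xs) (y:ys) =
  foldl1' f [ Just $! x * y
            , foldPairProductM f [x] ys
            , foldPairProductM f xs [y]
            , foldPairProductM f xs ys ]
```

With these definitions, foldPairProductM (strictLift (+)) xs ys sums all pair products. foldl1' alone only forces the Maybe constructor; it is the seq inside strictLift that forces the number under it.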

foldCrossProduct :: Num a => (a -> a -> a) -> [[a]]  -> a
foldCrossProduct f xss = foldl1 f (crossProduct xss)

crossProduct :: Num a => [[a]] -> [a]
crossProduct [] = []
crossProduct (xs:[]) = xs
crossProduct (xs:xss) = [x * y | x <- xs, y <- crossProduct xss]

Although GHC does little CSE (common subexpression elimination), the list crossProduct xss will (probably) be shared here between the different elements x of xs, so that produces O(N2*...*Nk) space complexity. If the order of elements in the list doesn't matter, reordering to

crossProduct (xs:xss) = [x * y | y <- crossProduct xss, x <- xs]

helps. Then crossProduct xss need not be in memory at once, so can be incrementally produced and consumed, only xs must be remembered because it's used multiple times. For the recursive invocations, the first of the remaining lists has to be shared, so that would produce an overall O(N1+...+Nk-1) space complexity.
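Put together, the reordered version could look like the sketch below. Using foldl1' instead of foldl1 is a substitution on my part, following the strictness advice earlier in this answer.

```haskell
import Data.List (foldl1')

-- The inner products are the outer generator, so they are consumed
-- incrementally; only xs is traversed repeatedly and kept alive.
crossProduct :: Num a => [[a]] -> [a]
crossProduct []       = []
crossProduct [xs]     = xs
crossProduct (xs:xss) = [ x * y | y <- crossProduct xss, x <- xs ]

-- Assumption: strict fold swapped in for foldl1; it still fails
-- if crossProduct yields an empty list.
foldCrossProduct :: Num a => (a -> a -> a) -> [[a]] -> a
foldCrossProduct f xss = foldl1' f (crossProduct xss)
```

Note that the element order changes: crossProduct [[1,2],[3,4]] now yields the products for y = 3 first, then those for y = 4, which only matters if f is not commutative and associative.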

Question 2

There is a specific optimization for the creation/modification/consumption of lists called loop fusion. Because Haskell is pure and non-strict, there are a number of laws like map f . map g == map (f . g).
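As a small illustration of that law (the names twoPasses and onePass are made up), both definitions below compute the same list; the second is the form a fusing compiler aims to reach without the programmer writing it by hand.

```haskell
-- As written, this traverses twice and builds an intermediate list.
twoPasses :: [Int] -> [Int]
twoPasses = map (+ 1) . map (* 2)

-- Fused by hand using map f . map g == map (f . g): one traversal.
onePass :: [Int] -> [Int]
onePass = map ((+ 1) . (* 2))
```

The rewrite is only valid because map's function arguments have no side effects, which is what purity guarantees.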

If the compiler for some reason did not recognize the code and produced sub-optimal code (even with the -O flag), I would look into stream fusion in detail to see what is preventing it.

Question 3

(Ok, I was wrong: it will not work in constant space because one of the lists is used multiple times, so it is most likely to have linear space complexity.)

Did you try compiling the test program with optimizations enabled? Your foldPairProduct looks good to me, and I expect it to work in constant space.

ADD: Yes, it works in constant space (3 MB total memory in use):

shum@shum-laptop:/tmp/shum$ cat test.hs 

foldPairProduct f xs ys = foldl1 f [ x*y | x <- xs, y <- ys]

n :: Int
n = 10000

main = print $ foldPairProduct (+) [1..n] [1..n]
shum@shum-laptop:/tmp/shum$ ghc --make -fforce-recomp -O test.hs 
[1 of 1] Compiling Main             ( test.hs, test.o )
Linking test ...
shum@shum-laptop:/tmp/shum$ time ./test +RTS -s
2500500025000000
  10,401,332,232 bytes allocated in the heap
   3,717,333,376 bytes copied during GC
         428,280 bytes maximum residency (3335 sample(s))
         219,792 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     16699 colls,     0 par    4.27s    4.40s     0.0003s    0.0009s
  Gen  1      3335 colls,     0 par    1.52s    1.52s     0.0005s    0.0012s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    2.23s  (  2.17s elapsed)
  GC      time    5.79s  (  5.91s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    8.02s  (  8.08s elapsed)

  %GC     time      72.2%  (73.2% elapsed)

  Alloc rate    4,659,775,665 bytes per MUT second

  Productivity  27.8% of total user, 27.6% of total elapsed


real    0m8.085s
user    0m8.025s
sys     0m0.040s
shum@shum-laptop:/tmp/shum$