Implement zip using foldr

https://stackoverflow.com/questions/235148

04-07-2019
|

Question

I'm currently on chapter 4 of Real World Haskell, and I'm trying to wrap my head around implementing foldl in terms of foldr.

(Here's their code:)

myFoldl :: (a -> b -> a) -> a -> [b] -> a

myFoldl f z xs = foldr step id xs z
    where step x g a = g (f a x)

I thought I'd try to implement zip using the same technique, but I don't seem to be making any progress. Is it even possible?

Solution

zip2 xs ys = foldr step done xs ys
  where done ys = []
        step x zipsfn []     = []
        step x zipsfn (y:ys) = (x, y) : (zipsfn ys)

How this works: (foldr step done xs) returns a function that consumes ys; so we go down the xs list building up a nested composition of functions that will each be applied to the corresponding part of ys.

How to come up with it: I started with the general idea (from similar examples seen before), wrote

zip2 xs ys = foldr step done xs ys

then filled in each of the following lines in turn with what it had to be to make the types and values come out right. It was easiest to consider the simplest cases first before the harder ones.

The first line could be written more simply as

zip2 = foldr step done

as mattiast showed.

OTHER TIPS

The answer had already been given here, but not an (illustrative) derivation. So even after all these years, perhaps it's worth adding it.

It is actually quite simple. First,

foldr f z xs 
   = foldr f z [x1,x2,x3,...,xn] = f x1 (foldr f z [x2,x3,...,xn])
   = ... = f x1 (f x2 (f x3 (... (f xn z) ...)))

hence by eta-expansion,

foldr f z xs ys
   = foldr f z [x1,x2,x3,...,xn] ys = f x1 (foldr f z [x2,x3,...,xn]) ys
   = ... = f x1 (f x2 (f x3 (... (f xn z) ...))) ys

As is apparent here, if f is non-forcing in its 2nd argument, it gets to work first on x1 and ys, f x1r1ys where r1 =(f x2 (f x3 (... (f xn z) ...)))= foldr f z [x2,x3,...,xn].

So, using

f x1 r1 [] = []
f x1 r1 (y1:ys1) = (x1,y1) : r1 ys1

we arrange for passage of information left-to-right along the list, by calling r1 with the rest of the input list ys1, foldr f z [x2,x3,...,xn]ys1 = f x2r2ys1, as the next step. And that's that.

When ys is shorter than xs (or the same length), the [] case for f fires and the processing stops. But if ys is longer than xs then f's [] case won't fire and we'll get to the final f xnz(yn:ysn) application,

f xn z (yn:ysn) = (xn,yn) : z ysn

Since we've reached the end of xs, the zip processing must stop:

z _ = []

And this means the definition z = const [] should be used:

zip xs ys = foldr f (const []) xs ys
  where
    f x r []     = []
    f x r (y:ys) = (x,y) : r ys

From the standpoint of f, r plays the role of a success continuation, which f calls when the processing is to continue, after having emitted the pair (x,y).

So r is "what is done with more ys when there are more xs", and z = const [], the nil-case in foldr, is "what is done with ys when there are no more xs". Or f can stop by itself, returning [] when ys is exhausted.

Notice how ys is used as a kind of accumulating value, which is passed from left to right along the list xs, from one invocation of f to the next ("accumulating" step being, here, stripping a head element from it).

Naturally this corresponds to the left fold, where an accumulating step is "applying the function", with z = id returning the final accumulated value when "there are no more xs":

foldl f a xs =~ foldr (\x r a-> r (f a x)) id xs a

Similarly, for finite lists,

foldr f a xs =~ foldl (\r x a-> r (f x a)) id xs a

And since the combining function gets to decide whether to continue or not, it is now possible to have left fold that can stop early:

foldlWhile t f a xs = foldr cons id xs a
  where 
    cons x r a = if t x then r (f a x) else a

or a skipping left fold, foldlWhen t ..., with

    cons x r a = if t x then r (f a x) else r a

etc.

I found a way using quite similar method to yours:

myzip = foldr step (const []) :: [a] -> [b] -> [(a,b)]
    where step a f (b:bs) = (a,b):(f bs)
          step a f [] = []

For the non-native Haskellers here, I've written a Scheme version of this algorithm to make it clearer what's actually happening:

> (define (zip lista listb)
    ((foldr (lambda (el func)
           (lambda (a)
             (if (empty? a)
                 empty
                 (cons (cons el (first a)) (func (rest a))))))
         (lambda (a) empty)
         lista) listb))
> (zip '(1 2 3 4) '(5 6 7 8))
(list (cons 1 5) (cons 2 6) (cons 3 7) (cons 4 8))

The foldr results in a function which, when applied to a list, will return the zip of the list folded over with the list given to the function. The Haskell hides the inner lambda because of lazy evaluation.

To break it down further:

Take zip on input: '(1 2 3) The foldr func gets called with

el->3, func->(lambda (a) empty)

This expands to:

(lambda (a) (cons (cons el (first a)) (func (rest a))))
(lambda (a) (cons (cons 3 (first a)) ((lambda (a) empty) (rest a))))

If we were to return this now, we'd have a function which takes a list of one element and returns the pair (3 element):

> (define f (lambda (a) (cons (cons 3 (first a)) ((lambda (a) empty) (rest a)))))
> (f (list 9))
(list (cons 3 9))

Continuing, foldr now calls func with

el->3, func->f ;using f for shorthand
(lambda (a) (cons (cons el (first a)) (func (rest a))))
(lambda (a) (cons (cons 2 (first a)) (f (rest a))))

This is a func which takes a list with two elements, now, and zips them with (list 2 3):

> (define g (lambda (a) (cons (cons 2 (first a)) (f (rest a)))))
> (g (list 9 1))
(list (cons 2 9) (cons 3 1))

What's happening?

(lambda (a) (cons (cons 2 (first a)) (f (rest a))))

a, in this case, is (list 9 1)

(cons (cons 2 (first (list 9 1))) (f (rest (list 9 1))))
(cons (cons 2 9) (f (list 1)))

And, as you recall, f zips its argument with 3.

And this continues etc...

The problem with all these solutions for zip is that they only fold over one list or the other, which can be a problem if both of them are "good producers", in the parlance of list fusion. What you actually need is a solution that folds over both lists. Fortunately, there is a paper about exactly that, called "Coroutining Folds with Hyperfunctions".

You need an auxiliary type, a hyperfunction, which is basically a function that takes another hyperfunction as its argument.

newtype H a b = H { invoke :: H b a -> b }

The hyperfunctions used here basically act like a "stack" of ordinary functions.

push :: (a -> b) -> H a b -> H a b
push f q = H $ \k -> f $ invoke k q

You also need a way to put two hyperfunctions together, end to end.

(.#.) :: H b c -> H a b -> H a c
f .#. g = H $ \k -> invoke f $ g .#. k

This is related to push by the law:

(push f x) .#. (push g y) = push (f . g) (x .#. y)

This turns out to be an associative operator, and this is the identity:

self :: H a a
self = H $ \k -> invoke k self

You also need something that disregards everything else on the "stack" and returns a specific value:

base :: b -> H a b
base b = H $ const b

And finally, you need a way to get a value out of a hyperfunction:

run :: H a a -> a
run q = invoke q self

run strings all of the pushed functions together, end to end, until it hits a base or loops infinitely.

So now you can fold both lists into hyperfunctions, using functions that pass information from one to the other, and assemble the final value.

zip xs ys = run $ foldr (\x h -> push (first x) h) (base []) xs .#. foldr (\y h -> push (second y) h) (base Nothing) ys where
  first _ Nothing = []
  first x (Just (y, xys)) = (x, y):xys

  second y xys = Just (y, xys)

The reason why folding over both lists matters is because of something GHC does called list fusion, which is talked about in the GHC.Base module, but probably should be much more well-known. Being a good list producer and using build with foldr can prevent lots of useless production and immediate consumption of list elements, and can expose further optimizations.

I tried to understand this elegant solution myself, so I tried to derive the types and evaluation myself. So, we need to write a function:

zip xs ys = foldr step done xs ys

Here we need to derive step and done, whatever they are. Recall foldr's type, instantiated to lists:

foldr :: (a -> state -> state) -> state -> [a] -> state

However our foldr invocation must be instantiated to something like below, because we must accept not one, but two list arguments:

foldr :: (a -> ? -> ?) -> ? -> [a] -> [b] -> [(a,b)]

Because -> is right-associative, this is equivalent to:

foldr :: (a -> ? -> ?) -> ? -> [a] -> ([b] -> [(a,b)])

Our ([b] -> [(a,b)]) corresponds to state type variable in the original foldr type signature, therefore we must replace every occurrence of state with it:

foldr :: (a -> ([b] -> [(a,b)]) -> ([b] -> [(a,b)]))
      -> ([b] -> [(a,b)])
      -> [a]
      -> ([b] -> [(a,b)])

This means that arguments that we pass to foldr must have the following types:

step :: a -> ([b] -> [(a,b)]) -> [b] -> [(a,b)]
done :: [b] -> [(a,b)]
xs :: [a]
ys :: [b]

Recall that foldr (+) 0 [1,2,3] expands to:

1 + (2 + (3 + 0))

Therefore if xs = [1,2,3] and ys = [4,5,6,7], our foldr invocation would expand to:

1 `step` (2 `step` (3 `step` done)) $ [4,5,6,7]

This means that our 1 `step` (2 `step` (3 `step` done)) construct must create a recursive function that would go through [4,5,6,7] and zip up the elements. (Keep in mind, that if one of the original lists is longer, the excess values are thrown away). IOW, our construct must have the type [b] -> [(a,b)].

3 `step` done is our base case, where done is an initial value, like 0 in foldr (+) 0 [1..3]. We don't want to zip anything after 3, because 3 is the final value of xs, so we must terminate the recursion. How do you terminate the recursion over list in the base case? You return empty list []. But recall done type signature:

done :: [b] -> [(a,b)]

Therefore we can't return just [], we must return a function that would ignore whatever it receives. Therefore use const:

done = const [] -- this is equivalent to done = \_ -> []

Now let's start figuring out what step should be. It combines a value of type a with a function of type [b] -> [(a,b)] and returns a function of type [b] -> [(a,b)].

In 3 `step` done, we know that the result value that would later go to our zipped list must be (3,6) (knowing from original xs and ys). Therefore 3 `step` done must evaluate into:

\(y:ys) -> (3,y) : done ys

Remember, we must return a function, inside which we somehow zip up the elements, the above code is what makes sense and typechecks.

Now that we assumed how exactly step should evaluate, let's continue the evaluation. Here's how all reduction steps in our foldr evaluation look like:

3 `step` done -- becomes
(\(y:ys) -> (3,y) : done ys)
2 `step` (\(y:ys) -> (3,y) : done ys) -- becomes
(\(y:ys) -> (2,y) : (\(y:ys) -> (3,y) : done ys) ys)
1 `step` (\(y:ys) -> (2,y) : (\(y:ys) -> (3,y) : done ys) ys) -- becomes
(\(y:ys) -> (1,y) : (\(y:ys) -> (2,y) : (\(y:ys) -> (3,y) : done ys) ys) ys)

The evaluation gives rise to this implementation of step (note that we account for ys running out of elements early by returning an empty list):

step x f = \[] -> []
step x f = \(y:ys) -> (x,y) : f ys

Thus, the full function zip is implemented as follows:

zip :: [a] -> [b] -> [(a,b)]
zip xs ys = foldr step done xs ys
  where done = const []
        step x f [] = []
        step x f (y:ys) = (x,y) : f ys

P.S.: If you are inspired by elegance of folds, read Writing foldl using foldr and then Graham Hutton's A tutorial on the universality and expressiveness of fold.

A simple approach:

lZip, rZip :: Foldable t => [b] -> t a -> [(a, b)]

-- implement zip using fold?
lZip xs ys = reverse.fst $ foldl f ([],xs) ys
     where f  (zs, (y:ys)) x = ((x,y):zs, ys)

-- Or;
rZip xs ys = fst $ foldr f ([],reverse xs) ys
     where f x (zs, (y:ys))  = ((x,y):zs, ys)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow