Argument of groupBy's Lambda

Question 1

These are the variables to compare. You know that group puts equal neighbored values together. To decide what a equal value is it uses a compare function. group relies on the instance of your type of the Eq typeclass. But groupBy allows you to choose how to compare the neighbored values.

Question 2

If we look at the type of groupBy:

groupBy :: (a -> a -> Bool) -> [a] -> [[a]]

The first argument to groupBy is a function that takes two arguments of type a and return a Bool. You could equivalently write this as

groupBy comparer values where comparer x y = (x > 0) == (y > 0)

The \x y -> part just says that the lambda function takes two arguments named x and y, just like with any other function declaration.

The easiest way to see what this expression does is to just run it:

ghci> groupBy (\x y -> (x > 0) == (y > 0)) values
[[-4.3,-2.4,-1.2],[0.4,2.3,5.9,10.5,29.1,5.3],[-2.4,-14.5],[2.9,2.3]]

If you look closely, you can see that each sublist is grouped by if it's positive or negative. The groupBy function groups elements of a list by the given condition, but only in sequential order. For example:

ghci> groupBy (\x y -> x == y) [1, 1, 2, 2, 2, 3, 3, 4]
[[1,1],[2,2,2],[3,3],[4]]
ghci> groupBy (\x y -> x == y) [1, 1, 2, 2, 2, 3, 3, 1]
[[1,1],[2,2,2],[3,3],[1]]

In the second example, notice that the 1s haven't all been grouped together because they aren't adjacent.

Question 3

In cases like these, it's best to go straight to the source! groupBy is part of Data.List, so you can find it the base package on Hackage. When you don't know what package a function is in, search for the function in Hoogle and click on the name to be taken to the Haddocks on Hackage. When you're looking at Haddock documentation, there will usually be a "Source" link on the righthand side of the function type definition to take you to the definition. Here's the source for groupBy.

I've reproduced the definition here to step through it.

-- | The 'groupBy' function is the non-overloaded version of 'group'.
groupBy                 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []           =  []
groupBy eq (x:xs)       =  (x:ys) : groupBy eq zs
                           where (ys,zs) = span (eq x) xs

First, the documentation line at the top tells us that groupBy is the non-overloaded version of group, which is a very common pattern in base. You can go check out group to figure out the simplest case of grouping functionality, then you can understand the -By version as allowing you to supply your own predicate (in case you wanted to compare equality differently than the Eq instance for a type, or whatever other operation you're trying to do).

The base case is trivial, but the recursive step might be a little confusing if you don't know what span does (time to hit Hackage again!). span takes a predicate and a list and returns a pair (2-tuple) of lists broken before the first element that doesn't match the predicate (it's like break but (not) negated).

So now you should be able to put it all together and see that groupBy groups elements of a list together by segregating runs of elements which are "equal" to the first element in that run. Note that it is NOT comparing elements pairwise (I was burned by that before) so don't assume that the two elements being passed to the predicate function would be adjacent in the list!

Question 4

Let's start with group. This function simply groups together all the adjacent elements that are identical. e.g.,

group [0,1,2,3,3,4] = [[0],[1],[2],[3,3],[4]]

GHC defines group as follows:

group :: Eq a => [a] -> [[a]]
group =  groupBy (==)

That is, the equality test is implicit. The same thing can be written with an explicit equality test using groupBy as;

groupBy (\x y -> x == y) [0,1,2,3,3,4] = [[0],[1],[2],[3,3],[4]]

Now, let's look at how GHC defines groupBy:

groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []           =  []
groupBy eq (x:xs)       =  (x:ys) : groupBy eq zs
                           where (ys,zs) = span (eq x) xs

eq is used to split the rest of the list based on comparison with the first element. groupBy is then recursively called on the list of elements that fail the comparison. Note that (x:ys) is a concatenation of the first element, with the list of elements that satisfy the comparison condition. Also, the span function will start the second list at the first element where test condition is not met.

Hence, in the given example, the moment you reach the value 0.4, a new list has to start, since 0.4 will be the first element of zs from the above definition.

Question 5

groupBy divides a list into groups according to some “rule”. In groupBy (\x y -> x `someComparison` y) someList, x is a first element of the “current” group, y is an element of someList. groupBy traverses someList, so at each step y becomes the next element of someList. The new group started, when the predicate returns False. y becomes the first member of the new group. Iteration continues, the first member of this new group now becomes x, and the next element of someList becomes y.

groupBy does NOT compare elements pairwise (1^st with 2^nd, 2^nd with 3^rd, etc), instead it compares each element of the list with the first element of the group currently being filled. Example:

groupBy (\x y -> x < y) [1,2,3,2,1] -- returns: [[1,2,3,2],[1]]

Step by step groupBy:

compares 1 with 2. 1 < 2, thus both numbers go into the same group. Groups: [[1,2]]
compares 1 with 3. 1 < 3, thus 3 goes into the old group. Groups: [[1,2,3]]
compares 1 with 2. 1 < 2, thus 2 goes into the old group. Groups: [[1,2,3,2]]
compares 1 with 1. 1 ≮ 1, thus the new group is formed, and 1 goes as its first element. Groups: [[1,2,3,2],[1]]

Exercise: To understand how groupBy works, try to figure out how the following expressions return their results with pen and paper:

groupBy (\x y -> x < y) [1,2,3,4,5,4,3,2,1] -- [[1,2,3,4,5,4,3,2],[1]]
groupBy (\x y -> x < y) [1,3,5,2,1] -- [[1,3,5,2],[1]]
groupBy (\x y -> x <= y) [3,5,3,2,1,0,1,0] -- [[3,5,3],[2],[1],[0,1,0]]
groupBy (\x y -> x <= y) [1,2,3,2,1] -- [[1,2,3,2,1]]

Again, the key to understand behavior of groupBy is to remember, that at each step it compares the first element of the current group with a consecutive element of the list. The moment predicate returns False, the new group is formed, and the process continues.

To understand why groupBy behaves that way, inspect its source:

groupBy                 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []           =  []
groupBy eq (x:xs)       =  (x:ys) : groupBy eq zs
                           where (ys,zs) = span (eq x) xs

The key here is the use of span function: span (1<) [2,3,2,1] returns ([2,3,2],[1]). span (eq x) xs in the above code puts all elemets of xs that match (eq x) into the first part of a pair, and te rest of xs into the second. (x:ys) then joins x with the first part of a pair, while groupBy is recursively invoked on the rest of xs (which is zs). That's why groupBy works this strange way.