Question

I was going through the source code of the nubBy function in Data.List:

nubBy                   :: (a -> a -> Bool) -> [a] -> [a]
#ifdef USE_REPORT_PRELUDE
nubBy eq []             =  []
nubBy eq (x:xs)         =  x : nubBy eq (filter (\ y -> not (eq x y)) xs)
#else
nubBy eq l              = nubBy' l []
  where
    nubBy' [] _         = []
    nubBy' (y:ys) xs
       | elem_by eq y xs = nubBy' ys xs
       | otherwise       = y : nubBy' ys (y:xs)

Now it seems to me that the two versions above differ from each other. If I take the USE_REPORT_PRELUDE version, I get

nubBy (>) [1..10]
[1,2,3,4,5,6,7,8,9,10]

while the other implementation yields

nubBy (>) [1..10]
[1]

What is the reasoning behind this?

Was it helpful?

Solution

I think that nubBy requires the binary boolean operation to be an equivalence relation.

This is roughly in the same spirit of sortBy requiring a preorder relation (reflexive & transitive). If this requirement is invalidated, then quicksort and mergesort become non equivalent algorithms. The intent of the Haskell report is instead to allow an implementation to use any of them (or another correct sorting algorithm).

Similarly, if the nubBy comparison is allowed being a non-equivalence, the implementation is unnecessarily constrained to use exactly the reference Prelude algorithm, preventing the use of a more efficient alternative.


To be honest, the exact requirements on the operators passed to the "...By" are not always obvious. For instance the documentation of sortBy seems to guarantee correctness only for total orderings, albeit I expect the actual implementation will also work for a larger class of orderings, provided the result is interpreted up-to the equivalence induced by the ordering.

The documentation for nubBy simply states that the first argument is a "user-supplied equality predicate". So, it is only guaranteed for equality, and not for arbitrary equivalences.

However, my feeling is that if its implementation works for equality, it has to work also for equivalences (provided the result is read up-to, of course). This might indeed be provable by exploiting the "free theorem" associated to the nubBy type. My lack of expertise with parametricity betrays me here :)

OTHER TIPS

There is a GHC bug report about this. The behavior of nubBy originally matched the Prelude implementation, but was changed at some point as a "fix" to another bug report. I think the jury is still out on what is the right thing to do.

You can see that on codepad.org, your code does produce [1,2,3,4,5,6,7,8,9,10]; but on ideone.com, your code produces [1]. So clearly one uses an older or different implementation from the other.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top