What is groupBy supposed to do?

https://stackoverflow.com/questions/22897907

28-06-2023
|

문제

I wrote something using Data.List.groupBy. It didn't work as expected so I end up writting my own version of groupBy : after all I'm not sure that the Data.List one is supposed to do (there is no real documentation).

Anyway my tests passed with my version of groupBy whereas it fails with the Data.List. I found (thanks quickcheck) a case where the two function behaves differently, and I still don't understand why there is a difference between the two versions. Is the Data.List version buggy or is is mine ? (Of course mine is a naive implementation and is probably not the most efficient way to do so).

Here is the code :

import qualified Data.List as DL
import Data.Function (on)
import Test.QuickCheck

groupBy' :: (a -> a ->  Bool) -> [a] -> [[a]]
groupBy' _ [] = []
groupBy' eq (x:xs) = xLike:(groupBy' eq xNotLike) where
    xLike = x:[ e | e <- xs, x `eq` e  ]
    xNotLike = [ e | e <- xs, not $ x `eq` e  ]

head' [] = Nothing
head'  (x:xs) = Just x

prop_a s = (groupBy' by s) == (DL.groupBy by s) where
    types = s :: [String]
    by = (==) `on` head'

running in ghc quickCheck prop_a returns ["", "a", ""]

*Main> groupBy' ((==) `on` head') ["","a",""]
[["",""],["a"]] # correct in my opinion
*Main> DL.groupBy ((==) `on` head') ["","a",""]
[[""],["a"],[""]] # incorrect.

What's happening ? I can't believe there is a bug in the haskell-platform .

해결책

Your version is O (n²) – which can be unacceptably slow in real-world use¹.

The standard version avoids this by only grouping adjacent elements if they are equivalent. Hence,

*Main> groupBy ((==) `on` head') ["", "", "a"]

will yield the result you're after.

A simple way to obtain "universal grouping" with groupBy is to first sort the list if that's feasible for the data type.

*Main> groupBy ((==) `on` head') $ DL.sort ["", "a", ""]

The complexity of this is only O (n log n).

¹ This didn't prevent the committee from specifying nub as O (n²)...

다른 팁

Data.List.groupBy in Haskell is a usability mistake! A user friendly groupBy should behave like this:

groupByWellBehaved p = foldr (\x rest -> if null rest
                                     then [[x]]
                                     else if p x (head (head rest))
                                          then (x : head rest) : (tail rest)
                                          else [x] : rest) []

Perhaps there is a better implementation, but at least this is O(n).

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow