Construct infinite sorted list without adding duplicates

Question 1

I'm going to present a solution that uses an infinite pairing heap. We'll have logarithmic overhead per element constructed, but no one knows how to do better (in a model with comparison-based methods and real numbers).

The first bit of code is just the standard pairing heap.

module Queue where
import Data.Maybe (fromMaybe)

data Queue k = E
             | T k [Queue k]
             deriving Show

fromOrderedList :: (Ord k) => [k] -> Queue k
fromOrderedList [] = E
fromOrderedList [k] = T k []
fromOrderedList (k1 : ks'@(k2 : _ks''))
  | k1 <= k2 = T k1 [fromOrderedList ks']

mergePairs :: (Ord k) => [Queue k] -> Queue k
mergePairs [] = E
mergePairs [q] = q
mergePairs (q1 : q2 : qs'') = merge (merge q1 q2) (mergePairs qs'')

merge :: (Ord k) => Queue k -> Queue k -> Queue k
merge (E) q2 = q2
merge q1 (E) = q1
merge q1@(T k1 q1's) q2@(T k2 q2's)
  = if k1 <= k2 then T k1 (q2 : q1's) else T k2 (q1 : q2's)

deleteMin :: (Ord k) => Queue k -> Maybe (k, Queue k)
deleteMin (E) = Nothing
deleteMin (T k q's) = Just (k, mergePairs q's)

toOrderedList :: (Ord k) => Queue k -> [k]
toOrderedList q
  = fromMaybe [] $
      do (k, q') <- deleteMin q
         return (k : toOrderedList q')

Note that fromOrderedList accepts infinite lists. I think that this can be justified theoretically by pretending as though the infinite list of descendants effectively are merged "just in time". This feels like the kind of thing that should be in the literature on purely functional data structures already, but I'm going to be lazy and not look right now.

The function mergeOrderedByMin takes this one step further and merges a potentially infinite list of queues, where the min element in each queue is nondecreasing. I don't think that we can reuse merge, since merge appears to be insufficiently lazy.

mergeOrderedByMin :: (Ord k) => [Queue k] -> Queue k
mergeOrderedByMin [] = E
mergeOrderedByMin (E : qs') = mergeOrderedByMin qs'
mergeOrderedByMin (T k q's : qs')
  = T k (mergeOrderedByMin qs' : q's)

The next function removes duplicates from a sorted list. It's in the library that m09 suggested, but for the sake of completeness, I'll define it here.

nubOrderedList :: (Ord k) => [k] -> [k]
nubOrderedList [] = []
nubOrderedList [k] = [k]
nubOrderedList (k1 : ks'@(k2 : _ks''))
  | k1 < k2 = k1 : nubOrderedList ks'
  | k1 == k2 = nubOrderedList ks'

Finally, we put it all together. I'll use the squares as an example.

squares :: [Integer]
squares = map (^ 2) [0 ..]

sumsOfTwoSquares :: [Integer]
sumsOfTwoSquares
  = nubOrderedList $ toOrderedList $
      mergeOrderedByMin
        [fromOrderedList (map (s +) squares) | s <- squares]

Question 2

If you don't want to modify your code that much, you can use the nub function of Data.List.Ordered (installable by cabal install data-ordlist) to filter duplicates out.

It runs in linear time, ie complexity wise your algorithm won't change.

Question 3

for your example [1..] the result is just [2..]. A "very smart compiler" could deduce this from the general solution with implicit heap, that follows.

gens xs is better expressed as

gens xs = map (\t@(x:_) -> map (x+) t) $ tails xs   -- or should it be
     --   map (\(x:ys) -> map (x+) ys) $ tails xs   --                ?

Its resulting list of lists is easily merged without duplicates by tree-like folding¹ (pictured here), with

pairsums xs = foldi (\(x:l) r-> x : union l r) $ gens xs

This assumes the input list is ordered in increasing order. If it's merely in non-decreasing order (with only finite runs of equals in it, of course), you'll need to slap an orderedNub on top of that (as m09 mentions),

pairsums' = orderedNub . pairsums

Just by using foldi where foldr would work, we often get an algorithmic improvement in complexity from a factor of n to log n, a pretty significant speedup. I use it as a general tool all the time.

¹The code, adjusted for infinite lists only:

foldi f (x:xs)  = f x (foldi f (pairs f xs))
pairs f (x:y:t) = f x y : pairs f t
union (x:xs) (y:ys) = case compare x y of           
                          LT -> x : union  xs (y:ys) 
                          EQ -> x : union  xs    ys  
                          GT -> y : union (x:xs) ys

Construct infinite sorted list without adding duplicates

EDIT: