Question

Reading up on quotient types and their use in functional programming, I came across this post. The author mentions Data.Set as an example of a module which provides a ton of functions which need access to module's internals:

Data.Set has 36 functions, when all that are really needed to ensure the meaning of a set ("These elements are distinct") are toList and fromList.

The author's point seems to be that we need to "open up the module and break the abstraction" if we forgot some function which can be implemented efficiently only using module's internals.

He then says

We could alleviate all of this mess with quotient types.

but gives no explanation to that claim.

So my question is: how are quotient types helping here?


EDIT

I've done a bit more research and found a paper "Constructing Polymorphic Programs with Quotient Types". It elaborates on declaring quotient containers and mentions the word "efficient" in abstract and introduction. But if I haven't misread, it does not give any example of an efficient representation "hiding behind" a quotient container.


EDIT 2

A bit more is revealed in "[PDF] Programming in Homotopy Type Theory" paper in Chapter 3. The fact that quotient type can be implemented as a dependent sum is used. Views on abstract types are introduced (which look very similar to type classes to me) and some relevant Agda code is provided. Yet the chapter focuses on reasoning about abstract types, so I'm not sure how this relates to my question.

Was it helpful?

Solution

I recently made a blog post about quotient types, and I was led here by a comment. The blog post may provide some additional context in addition to the papers referenced in the question.

The answer is actually pretty straightforward. One way to arrive at it is to ask the question: why are we using an abstract data type in the first place for Data.Set?

There are two distinct and separable reasons. The first reason is to hide the internal type behind an interface so that we can substitute a completely new type in the future. The second reason is to enforce implicit invariants on values of the internal type. Quotient type and their dual subset types allow us to make the invariants explicit and enforced by the type checker so that we no longer need to hide the representation. So let me be very clear: quotient (and subset) types do not provide you with any implementation hiding. If you implement Data.Set with quotient types using lists as your representation, then later decide you want to use trees, you will need to change all code that uses your type.

Let's start with a simpler example (leftaroundabout's). Haskell has an Integer type but not a Natural type. A simple way to specify Natural as a subset type using made up syntax would be:

type Natural = { n :: Integer | n >= 0 }

We could implement this as an abstract type using a smart constructor that threw an error when given a negative Integer. This type says that only a subset of the values of type Integer are valid. Another approach we could use to implement this type is to use a quotient type:

type Natural = Integer / ~ where n ~ m = abs n == abs m

Any function h :: X -> T for some type T induces a quotient type on X quotiented by the equivalence relation x ~ y = h x == h y. Quotient types of this form are more easily encoded as abstract data types. In general, though, there may not be such a convenient function, e.g.:

type Pair a = (a, a) / ~ where (a, b) ~ (x, y) = a == x && b == y || a == y && b == x

(As to how quotient types relate to setoids, a quotient type is a setoid that enforces that you respect its equivalence relation.) This second definition of Natural has the property that there are two values that represent 2, say. Namely, 2 and -2. The quotient type aspect says we are allowed to do whatever we want with the underlying Integer, so long as we never produce a result that differentiates between these two representatives. Another way to see this is that we can encode a quotient type using subset types as:

X/~ = forall a. { f :: X -> a | forEvery (\(x, y) -> x ~ y ==> f x == f y) } -> a

Unfortunately, that forEvery is tantamount to checking equality of functions.

Zooming back out, subset types add constraints on producers of values and quotient types add constraints on consumers of values. Invariants enforced by an abstract data type may be a mixture of these. Indeed, we may decide to represent a Set as the following:

data Tree a = Empty | Branch (Tree a) a (Tree a)
type BST a = { t :: Tree a | isSorted (toList t) }
type Set a = { t :: BST a | noDuplicates (toList t) } / ~ 
    where s ~ t = toList s == toList t

Note, nothing about this ever requires us to actually execute isSorted, noDuplicates, or toList. We "merely" need to convince the type checker that the implementations of functions on this type would satisfy these predicates. The quotient type allows us to have a redundant representation while enforcing that we treat equivalent representations in the same way. This doesn't mean we can't leverage the specific representation we have to produce a value, it just means that we must convince the type checker that we would have produced the same value given a different, equivalent representation. For example:

maximum :: Set a -> a
maximum s = exposing s as t in go t
    where go Empty              = error "maximum of empty Set"
          go (Branch _ x Empty) = x
          go (Branch _ _     r) = go r

The proof obligation for this is that the right-most element of any binary search tree with the same elements is the same. Formally, it's go t == go t' whenever toList t == toList t'. If we used a representation that guaranteed the tree would be balanced, e.g. an AVL tree, this operation would be O(log N) while converting to a list and picking the maximum from the list would be O(N). Even with this representation, this code is strictly more efficient than converting to a list and getting the maximum from the list. Note, that we could not implement a function that displayed the tree structure of the Set. Such a function would be ill-typed.

OTHER TIPS

I'll give a simpler example where it's reasonably clear. Admittedly I myself don't really see how this would translate to something like Set, efficiently.

data Nat = Nat (Integer / abs)

To use this safely, we must be sure that any function Nat -> T (with some non-quotient T, for simplicity's sake) does not depend on the actual integer value, but only on its absolute. To do so, it's not really necessary to hide Integer completely; it would be sufficient to prevent you from matching on it directly. Instead, the compiler might rewrite the matches, e.g.

even' :: Nat -> Bool
even' (Nat 0) = True
even' (Nat 1) = False
even' (Nat n) = even' . Nat $ n - 2

could be rewritten to

even' (Nat n') = case abs n' of
           [|abs 0|]  -> True
           [|abs 1|]  -> False
           n          -> even' . Nat $ n - 2

Such a rewriting would point out equivalence violations, e.g.

bad (Nat 1) = "foo"
bad (Nat (-1)) = "bar"
bad _ = undefined

would rewrite to

bad (Nat n') = case n' of
      1 -> "foo"
      1 -> "bar"
      _ -> undefined

which is obviously an overlapped pattern.

Disclaimer: I just read up on quotient types upon reading this question.

I think the author's just saying that sets can be described as quotient types over lists. Ie: (making up some haskell-like syntax):

data Set a = Set [a] / (sort . nub) deriving (Eq)

Ie, a Set a is just a [a] with equality between two Set a's determined by whether the sort . nub of the underlying lists are equal.

We could do this explicitly like this, I guess:

import Data.List

data Set a = Set [a] deriving (Show)

instance (Ord a, Eq a) => Eq (Set a) where
  (Set xs) == (Set ys) = (sort $ nub xs) == (sort $ nub ys)

Not sure if this is actually what the author intended as this isn't a particularly efficient way of implementing a set. Someone can feel free to correct me.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top