How can quotient types help safely expose module internals?

Question 1

I recently made a blog post about quotient types, and I was led here by a comment. The blog post may provide some additional context in addition to the papers referenced in the question.

The answer is actually pretty straightforward. One way to arrive at it is to ask the question: why are we using an abstract data type in the first place for Data.Set?

There are two distinct and separable reasons. The first reason is to hide the internal type behind an interface so that we can substitute a completely new type in the future. The second reason is to enforce implicit invariants on values of the internal type. Quotient type and their dual subset types allow us to make the invariants explicit and enforced by the type checker so that we no longer need to hide the representation. So let me be very clear: quotient (and subset) types do not provide you with any implementation hiding. If you implement Data.Set with quotient types using lists as your representation, then later decide you want to use trees, you will need to change all code that uses your type.

Let's start with a simpler example (leftaroundabout's). Haskell has an Integer type but not a Natural type. A simple way to specify Natural as a subset type using made up syntax would be:

type Natural = { n :: Integer | n >= 0 }

We could implement this as an abstract type using a smart constructor that threw an error when given a negative Integer. This type says that only a subset of the values of type Integer are valid. Another approach we could use to implement this type is to use a quotient type:

type Natural = Integer / ~ where n ~ m = abs n == abs m

Any function h :: X -> T for some type T induces a quotient type on X quotiented by the equivalence relation x ~ y = h x == h y. Quotient types of this form are more easily encoded as abstract data types. In general, though, there may not be such a convenient function, e.g.:

type Pair a = (a, a) / ~ where (a, b) ~ (x, y) = a == x && b == y || a == y && b == x

(As to how quotient types relate to setoids, a quotient type is a setoid that enforces that you respect its equivalence relation.) This second definition of Natural has the property that there are two values that represent 2, say. Namely, 2 and -2. The quotient type aspect says we are allowed to do whatever we want with the underlying Integer, so long as we never produce a result that differentiates between these two representatives. Another way to see this is that we can encode a quotient type using subset types as:

X/~ = forall a. { f :: X -> a | forEvery (\(x, y) -> x ~ y ==> f x == f y) } -> a

Unfortunately, that forEvery is tantamount to checking equality of functions.

Zooming back out, subset types add constraints on producers of values and quotient types add constraints on consumers of values. Invariants enforced by an abstract data type may be a mixture of these. Indeed, we may decide to represent a Set as the following:

data Tree a = Empty | Branch (Tree a) a (Tree a)
type BST a = { t :: Tree a | isSorted (toList t) }
type Set a = { t :: BST a | noDuplicates (toList t) } / ~ 
    where s ~ t = toList s == toList t

Note, nothing about this ever requires us to actually execute isSorted, noDuplicates, or toList. We "merely" need to convince the type checker that the implementations of functions on this type would satisfy these predicates. The quotient type allows us to have a redundant representation while enforcing that we treat equivalent representations in the same way. This doesn't mean we can't leverage the specific representation we have to produce a value, it just means that we must convince the type checker that we would have produced the same value given a different, equivalent representation. For example:

maximum :: Set a -> a
maximum s = exposing s as t in go t
    where go Empty              = error "maximum of empty Set"
          go (Branch _ x Empty) = x
          go (Branch _ _     r) = go r

The proof obligation for this is that the right-most element of any binary search tree with the same elements is the same. Formally, it's go t == go t' whenever toList t == toList t'. If we used a representation that guaranteed the tree would be balanced, e.g. an AVL tree, this operation would be O(log N) while converting to a list and picking the maximum from the list would be O(N). Even with this representation, this code is strictly more efficient than converting to a list and getting the maximum from the list. Note, that we could not implement a function that displayed the tree structure of the Set. Such a function would be ill-typed.

Question 2

I'll give a simpler example where it's reasonably clear. Admittedly I myself don't really see how this would translate to something like Set, efficiently.

data Nat = Nat (Integer / abs)

To use this safely, we must be sure that any function Nat -> T (with some non-quotient T, for simplicity's sake) does not depend on the actual integer value, but only on its absolute. To do so, it's not really necessary to hide Integer completely; it would be sufficient to prevent you from matching on it directly. Instead, the compiler might rewrite the matches, e.g.

even' :: Nat -> Bool
even' (Nat 0) = True
even' (Nat 1) = False
even' (Nat n) = even' . Nat $ n - 2

could be rewritten to

even' (Nat n') = case abs n' of
           [|abs 0|]  -> True
           [|abs 1|]  -> False
           n          -> even' . Nat $ n - 2

Such a rewriting would point out equivalence violations, e.g.

bad (Nat 1) = "foo"
bad (Nat (-1)) = "bar"
bad _ = undefined

would rewrite to

bad (Nat n') = case n' of
      1 -> "foo"
      1 -> "bar"
      _ -> undefined

which is obviously an overlapped pattern.

Question 3

Disclaimer: I just read up on quotient types upon reading this question.

I think the author's just saying that sets can be described as quotient types over lists. Ie: (making up some haskell-like syntax):

data Set a = Set [a] / (sort . nub) deriving (Eq)

Ie, a Set a is just a [a] with equality between two Set a's determined by whether the sort . nub of the underlying lists are equal.

We could do this explicitly like this, I guess:

import Data.List

data Set a = Set [a] deriving (Show)

instance (Ord a, Eq a) => Eq (Set a) where
  (Set xs) == (Set ys) = (sort $ nub xs) == (sort $ nub ys)

Not sure if this is actually what the author intended as this isn't a particularly efficient way of implementing a set. Someone can feel free to correct me.