data families use cases

Question

One benefit is that data families are injective, unlike type families.

If you have

type family TF a
data family DF a

Then you know that DF a ~ DF b implies that a ~ b, while with TF, you don't -- for any a you can be sure that DF a is a completely new type (just like [a] is a different type from [b], unless of course a ~ b), while a type family can map multiple input types onto the same existing type.

A second is that data families can be partially applied, like any other type constructor, while type families can't.

This is not a particularly real-world example, but for example, you can do:

data instance DF Int    = DInt    Int
data instance DF String = DString String

class C t where
    foo :: t Int -> t String

instance C DF where -- notice we are using DF without an argument
                    -- notice also that you can write instances for data families at all,
                    -- unlike type families
    foo (DInt i) = DString (show i)

Basically, DF and DF a are actual, first-class, legitimate types, in themselves, like any other type you declare with data. TF a is just an intermediate form that evaluates to a type.

But I suppose all of that's not very enlightening, or at least it wasn't for me, when I was wondering about data families and read similar things.

Here's the rule of thumb I go by. Whenever you find yourself repeating the pattern that you have a type family, and for every input type, you declare a new data type for the type family to map onto, it's nicer to cut out the middleman and use a data family instead.

A real-world example from the vector library. vector has several different kinds of Vectors: boxed vectors, unboxed vectors, primitive vectors, storable vectors. For each Vector type there is a corresponding, mutable MVector type (normal Vectors are immutable). So it looks like this:

type family Mutable v :: * -> * -> * -- the result type has two type parameters

module Data.Vector{.Mutable} where
data Vector a = ...
data MVector s a = ...
type instance Mutable Vector = MVector

module Data.Vector.Storable{.Mutable} where
data Vector a = ...
data MVector s a = ...
type instance Mutable Vector = MVector

[etc.]

Now instead of that, I would rather have:

data family Mutable v :: * -> * -> *

module Data.Vector{.Mutable} where
data Vector a = ...
data instance Mutable Vector s a = ...
type MVector = Mutable Vector

module Data.Vector.Storable{.Mutable} where
data Vector a = ...
data instance Mutable Vector s a = ...
type MVector = Mutable Vector

[etc.]

Which encodes the invariant that for every Vector type there is exactly one Mutable Vector type, and that there is a one-to-one correspondence between them. The mutable version of a Vector is always called Mutable Vector: that is its name, and it has no other. If you have a Mutable Vector, you can get the type of the corresponding immutable Vector, because it's right there as a type argument. With type family Mutable, once you apply it to an argument it evaluates to an unspecified result type (presumably called MVector, but you can't know), and you have no way to map backwards.