Can lists be defined in a special way so that they contain things of different type?

https://cs.stackexchange.com/questions/129264

29-09-2020
|

Question

In https://www.seas.harvard.edu/courses/cs152/2019sp/lectures/lec18-monads.pdf it is written that

A type $\tau$ list is the type of lists with elements of type $\tau$

Why must a list contain elements of the same type? Why can't it contain elements of different types? Is there a way of defining a list polymorphically in the typed lambda calculus, so that it takes elements of any type?

Can we then use the List monad on lists, defined polymorphically?

Solution

The short answer is that $\tau\ \text{list}$ is defined as a type constructor, along with rules for formation and elimination, and so we could similarly define a type constructor that allowed terms of different types to form a single "variably-typed list". However, lists cannot take different types in the definition given, simply because they are defined with respect to a single type. In either case, adding lists, or variably-typed lists, involves extending the simply-typed $\lambda$-calculus, as lists of any kind do not exist in the usual presentation.

If we have a slightly richer type system than the simply-typed $\lambda$-calculus, we can encode variably-typed lists using standard $\tau\ \text{list}$s.

If we have a form of subtyping, we can store terms of different types, as long as they share a supertype. However, when we project elements out of the list, we can no longer tell specifically what type they were to begin with (this may be familiar from object-oriented programming), so this is a little limited.
If we have dependent sum types (also called $\Sigma$-types) and a universe type $\mathcal U$ (i.e. a "type of types"), we can form the type $(\Sigma_{A : \mathcal U} A)\ \text{list}$, whose elements are pairs consisting of a type $A$ and a term of that type.

Finally, I'll just note that polymorphism doesn't help us if we want heterogeneous lists: it just allows us to manipulate homogeneous lists for different $\tau$ more effectively. Polymorphic types have to be uniform in some sense, which is why we need dependency here instead.

To answer a follow-up question: if we have two variably-sorted lists using the dependent type approach, we can concatenate and flatten lists just as with ordinary lists.

The $\mathrm{List}$ monad has an operation $\mathrm{join}$ (in the language of Haskell), so given a list of variably-typed lists, $$l = [[(A, a), (B, b)], [(C, c), (D, d)]] : (\Sigma_{X : \mathcal U} X)\ \text{list list}$$ we can perform $\mathrm{join}$ to get a new list: $$\mathrm{join}(l) = [(A, a), (B, b), (C, c), (D, d)] : (\Sigma_{X : \mathcal U} X)\ \text{list}$$
Similarly, $\tau\ \text{list}$ can be equipped with a concatenation operation $+\!+$, so given the two lists in the previous example, we can concatenate them for a similar result: $$[(A, a), (B, b)]\ {+\!+}\ [(C, c), (D, d)] = [(A, a), (B, b), (C, c), (D, d)] : (\Sigma_{X : \mathcal U} X)\ \text{list}$$

OTHER TIPS

No, that's not possible, at least not in a useful way. Think about what the type of head would be. When every element has same type, head has the type $\tau \; \mathsf{list} \to \tau$. Without that guarantee, there would be no way to write a coherent type for head. For the list type to be useful, we want to be able to draw useful conclusions about what the type of the output of head is; and that requires all elements of the list to have the same type.

I suppose you could define a "list" some other way, but it either wouldn't be useful (you couldn't reason about the type of values you get out of it with head) or it wouldn't correspond to something that computer scientists would call a "list".

You can't usefully define a type $\mathsf{list}$ that doesn't indicate the type of its elements. That doesn't mean that you can't have lists that contain things of different types: it's still a $\tau \, \mathsf{list}$, but you can put the “contain things of different types” part in the $\tau$.

(These basic ideas were already in D.W. and varkor's answers. It's important to realize that these answers aren't contradictory! They're looking at different aspects of the bigger picture.)

If the type system lets you define a type $\mathsf{list}$ that can contain elements of any type, then consider the return type of a destructor like $\mathsf{head}$ or $\mathsf{nth}$, or the type of the function argument to $\mathsf{fold}$. You have no information on the type of the elements, so they would have to allow any type. This means that for example $\lambda x. \mathsf{head}(\mathsf{cons}(x, \mathsf{nil}))$ will not give you back a value of the same type as $x$ (or $x \, \mathsf{option}$, so that $\mathsf{head}$ can return $\mathsf{None}$ on empty lists). But then what do you get back from $\mathsf{head}$?

If $\mathsf{head}$ allows the caller to specify any return type, then the type system is pretty much useless, since it allows arbitrary coercions between types through $\lambda x. \mathsf{head}(\mathsf{cons}(x, \mathsf{nil}))$. It's useless for logic since the Curry-Howard correspondence maps an arbitrary coercion between types to having every proposition imply every other proposition, so you have an inconsistent logic.
If not, then you can't get back a value of the original type through $\lambda x. \mathsf{head}(\mathsf{cons}(x, \mathsf{nil}))$. So you may be able to build lists, but you can't get elements out of them.

A real-life example which in fact demonstrates both behaviors above is early versions of Java, before it had generics. Java has both a static type system and a dynamic type system. In the static type system, any¹ value can be transparently coerced to Object, because Object is considered a supertype of everything. So you can put any value in a List. But what you get back out of it is the original value cast to Object, not the original value itself. In the dynamic type system, you can coerce any type to any other type, so in practice, to get a value out of a list, you coerce it to the desired type. But having coercions defeats the purpose of a type system. This problem is the main reason why Java acquired generics: they allow the language to have $\tau \, \mathsf{list}$ instead of $\mathsf{list}$ (or in Java notation, List<T> instead of List).

Just because a list has a type of elements — $\tau \, \mathsf{list}$ is a list of elements of type $\tau$ — doesn't mean that you can't arrange to put values of different types in the same list. Pretty much any language that allows defining a list type does it by allowing algebraic data type definitions, something like this: $$ \tau \, \mathsf{list} ::= \mathsf{nil} \mid \mathsf{cons} \: \tau \: (\tau \, \mathsf{list}) $$ Suppose you want to put both integers and strings in the same list. Define a type $$ U ::= \mathsf{I} \: \mathsf{int} \mid \mathsf{S} \: \mathsf{string} $$ Now $U \, \mathsf{list}$ is the type of lists that can contain a mixture of integers and strings, e.g. $[\mathsf{I}(3), \mathsf{S}(\texttt{"foo"}), \mathsf{I}(4)]$.

You can make heterogeneous lists this way to the extent that the type system allows heterogeneous types. Note that “heterogeneous lists” isn't quite correct: the list itself is homogeneous: it's a list of elements of type $U$. The heterogeneity is in the type $U$. To put an element in the list, you apply a constructor of $U$ first. After you get an element out of the list, apply a destructor of $U$ to get the original value with its original type.

You can do this with any type that the language supports. If you want a completely heterogeneous list, you need a language that supports an “any” type. That's Object in Java, for example. Strongly typed can have an “any” type if they carry the necessary type information at runtime. Java does it all the time. Languages that are statically typed (such as OCaml and other ML dialects, Haskell, Clean, Swift or Rust) can do it with a $\mathsf{dyn}$ type whose runtime representation contains the type of the value. With such a type, $\mathsf{dyn} \, \mathsf{list}$ is a list type that can contain a value of any type. This type coexists with other list types such as $\mathsf{int} \, \mathsf{list}$ (where the list elements don't carry runtime type information).

A related approach to building heterogeneous data structures is existential types. Existential types let you package a type with a value of that type: $(\exists \tau : P(\tau). a)$ where $a$ is an expression of some type $T$ such that $P(T)$ is true. For example, $\mathsf{dyn}$ can be modeled as a special case where $P$ is true of all types (an unbounded existential). A common use for existential types is to say that $\tau$ is a record, module or class with some particular elements or methods, without giving all the details: existential types are a way to model abstract types. With a bounded existential, you can still do some useful things with the value even without runtime type information (e.g. you can call the methods that $P$ describes), but not get the original type. A list whose elements have an existential type $T_E = (\exists \tau \ldots)$ can be seen as a heterogeneous list (because its elements have different “real” types), but it's still homogeneous in the sense that if you retrieve a value from the list, all you know is its package type $T_E$.

If the language has dependent types, you can package a value with its type in a way that allows recovering the original value: $\mathsf{package} ::= \sum_{\tau:\mathsf{TYPE}} \tau$ where $\mathsf{TYPE}$ is the type of types. This is a dependent sum type where the first component happens to be a type. The $\mathsf{package}$ type is a way to implement unbounded existentials in a dependently typed language. You can construct bounded existentials by adding constraints on $\tau$. Once again, you can build heterogeneous lists in the sense that a $\mathsf{package} \, \mathsf{list}$ contains elements whose “real” types are different, but the list itself is homogeneous in the sense that each list element has the type $\mathsf{package}$. As with existential types, you can't extract a value from a list and directly recover its “real” type. It's possible to destruct a value of type $\mathsf{package}$ by applying the second-element projection, but all you know about the result is that its type is the first-element projection: $p : \mathsf{package} \vdash \pi_2(p) : \pi_1(p)$.

So far, we've seen that in a non-degenerate type system, lists are homogeneous. It's possible to build heterogeneous lists, but the list type constructor itself is homogeneous: the heterogeneity comes from the element type. In a language that has both algebraic datatypes and types that depend on an integer (or something isomorphic to the naturals), it is possible to define a truly heterogenenous list type. Given a type family $(T_n)_{n \in \mathbb{N}}$, you can define the type of lists whose $n$th element has the type $T_n$. Here's such a definition in the language of the calculus of inductive constructions, specifically in Coq syntax. First, I define an example of a family of types indexed by an integer: tuple A n is the type of n-element tuples whose components all have the type A. To keep the definition simple, all tuples have an additional value U at the beginning of unit type. Then I define the inductive type hlist_ which is parametrized by both a type family T and an integer n, which is a heterogeneous list whose kth element has the type n + k. The parameter n is necessary to keep the definition constructive. Finally I show some example terms of type hlist (tuple bool), that is, lists whose nth element is an nth-element tuple of bool values (with U prepended).

Inductive unit : Type := U : unit.
Fixpoint tuple (A : Type) (n : nat) : Type :=
  match n with
    | 0 => unit
    | S m => (tuple A m) * A
  end.

Inductive hlist_ (T : nat -> Type) n :=
  | Hnil : hlist_ T n
  | Hcons : (T n) -> hlist_ T (S n) -> hlist_ T n.
Definition hlist T := hlist_ T 0.

Check (Hcons (tuple bool) 0 U (Hnil _ _) : hlist (tuple bool)).
Check (Hcons (tuple bool) 0 U (Hcons _ 1 (U, true) (Hnil _ _)) : hlist (tuple bool)).
Check (Hcons (tuple bool) 0 U (Hcons _ 1 (U, true) (Hcons _ 2 (U, true, true) (Hnil _ _))) : hlist (tuple bool)).

¹ _{Except for some primitive data types, actually, but that's not important here. When I say “any” about Java in this answer, I mean objects only, not primitive data types.}

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange