What are union types and intersection types? [duplicate]

Question 1

Union and intersection types are just considering types to be sets of values (infinite sets, mostly). If that's how you're thinking of types then any operation on sets that results in a set could be applied to types (sets of values) to make a new type (set of values), at least conceptually.

A union type is similar in some senses to a sum type, which you seem to be familiar with. Indeed I've often heard sum types described as "discriminated union" types. The basic difference is that a sum type like (Haskell notation) data FooBar = Foo Integer | Bar String allows you to tell whether a given FooBar value contains an Integer or a String (because FooBar values are tagged with Foo or Bar). Even if we write data FooBar = Foo Integer | Bar Integer where both types are the same, the "tag" adds extra information and we can tell "which integer" the FooBar value is.

The union type equivalent would be something like (not valid Haskell) data FooBar = Integer | String. The values in FooBar simply are all the string values and all the integer values. If we make a union type of the same two types like data FooBar = Integer | Integer it should be logically indistinguishable from just Integer, since the union of a set with itself is itself.

In principle, the things you could do with values in a type U that is the union of types A and B are just the operations that work on As and also work on Bs; any operation that only works on either As or Bs might get the wrong kind of input, because a U has no information to say whether it's an A or a B.¹

(Undiscriminated) union types wouldn't be very interesting in languages with type systems similar to Haskell's, because concrete types are disjoint², so the only operations that work on both As and Bs work on all values (unless A is B, in which it's just all the operations that work on that single type).

But in a way, a type classes (if you're familiar with them) are a way of providing something a bit like a union type. A type which is polymorphic but constrained to be a member of some type class is a little like a union of all the types which are in the type class (except that you don't know what those are, because type classes are in principle open); the only things you can do with such a value are the thins which have been declared to work on values of every type in the type class.

Union types could be interesting in a language with sub-typing (as is common in object-oriented programming languages). If you union together two subtypes of a common super-type you get something that supports at least the operations of the super-type, but it excludes any other subtypes of the super-type, so it isn't the same as just using the super-type.

An intersection type is exactly the concept, but using intersection instead of union. This means the things you could do with a value in a type I that is the intersection of types A and B are the operations that work on As plus the operations that work on Bs; anything in I is guaranteed to be both an A and a B, so it can safely be given to either kind of operation.

These also wouldn't be very interesting in languages with Haskell-like type systems. Because concrete types are disjoint², any non-trivial intersection is empty. But again, type class constraints can provide something a bit like an intersection type; if you add multiple type class constraints to the same type variable, then the only values that can be used where that type variable is expected are of types that are in the "intersection" of all the type classes, and the operations you can use on such values are the operations that work with any of the type classes.

¹ You could imagine combining an operation A -> C and an operation B -> D to get an operation (A | B) -> (C | D), much like you can use the tags of sum types to "route" a sum type to the appropriate operation. But it gets murky for fully general union types. If A and B overlap (and overlapping types enter the fray as soon as you've got union types), then which operation do you invoke on a value in the overlapping region? If you can tell whether it's an A or a B then you've really got a sum type rather than a union type, and if you apply some arbitrary resolution strategy like picking the A -> C operation because A was listed earlier in the union type's definition, then things work fine in simple cases but get very confusing if you've got types like (A | B) & (B | A) (where I'm using & to denote intersection).

² Although the "disjoint types" point is debatable. In types like data Maybe a = Nothing | Just a you could justifiably argue that the Nothing is the "same value" even for different a. If so, then the union of Maybe String and Maybe Integer only contains one Nothing (rather than both the Nothing that is "no string" and the Nothing that is "no integer"). And the intersection of Maybe String and Maybe Integer contains only one value, that being Nothing.

Question 2

The Whiley programming language supports union and intersection types. If you think of types as sets (i.e. the type int is the set of all valid integers), then union types correspond to set union, whilst intersection types to set intersection.

The classic example in Whiley for union types is representing a "nullable" type, as follows:

null|int indexOf(string str, char c):
    for i in 0..|str|:
       if str[i] == c:
          return i // found a match
    // didn't find a match
    return null

Here, the type null|int may hold any valid integer, or the special null value. In Whiley, you cannot perform arithmetic on such a type. Therefore, you first have to type test to check for the null value before you can use the returned value. For example, like so:

string replaceFirst(string str, char old, char new):
    idx = indexOf(str,old)
    if idx is int:
        str[idx] = new
    // return potentially updated string
    return str

I've written some posts on union types in Whiley here and here. Intersection types are similar, although not currently very well supported in the compiler.

Question 3

A type can be seen as a set of values. For example, if Boolean is the set of the values true and false, saying that some value has type Boolean means it is either the value true or false.

Note that some types, like String, can have infinitely many possible values.

As you probably know, union and intersection are set operations, and hence they apply to types as well. For example, when one has types T1 = {male, female} and T2 = {not-applicable} one can build the type T3 = T1 \union T2 = {male, female, not-applicable}. An example where this type would be useful would be answers to the question: "What is the gender of your first born child?" Since some people don't have any children, they could answer: not-applicable.