Type inference with overloading
-
29-09-2020 - |
Question
I am working on a type system supporting overloading. I have a rough idea of how type inference is usually implemented in such a scenario, but I am wondering how - after type inference is completed - the correct implementation of an overloaded operator can be chosen. Or, in other words, how the inferred type can be passed back down the syntax tree to the operator.
For a small example, consider the expression (x + y) + 1
where x :: N | S, y :: a, + :: (N -> N -> N) | (S -> S -> S), 1 :: N
. ::
stands for type of, and a | b
stands for type a
or type b
.
The way, I assume, type inference would now work, is to traverse the syntax tree, and for each node return a type constraint:
(x + y) + 1 => ((N & (N[a=N] | S[a=S])), (N & N) -> N) | ((S & (N[a=N] | S[a=S])), (S & N) -> S) => N[a=N]
1 => N
+ => (N -> N -> N) | (S -> S -> S)
x + y => ((N & (N | S)), (N & a) -> N) | ((S & (N | S)), (S & a) -> S) => N[a=N] | S[a=S]
x => N | S
y => a
+ => (N -> N -> N) | (S -> S -> S)
a & b
in this example stands for unifying the types a
and b
, [a=T, b=U]
is a set of equality constraints for type variables.
As expected the return type of the given expression is inferred as N[a=N]
, that is N
where the type variable a
is expected to be N
.
Therefore, of the two provided implementations for the +
operator (N -> N -> N
, S -> S -> S
), N -> N -> N
should be used. In the given example, the resulting type is inferred, but not the type of the overloaded operator. My question is if there is a common pattern that is used to inform the +
node in the syntax tree of the used implementation.
Solution
You could organize type inference as follows.
Suppose your input syntax has type Input
. Define output syntax Output
to be like Input
but with explicit type annotations on all variables. The type inference would have type
infer : Input -> List (Output * Type)
That is, given some input e
, it returns a list of possible answers. Each answer is a pair (e', t)
where e'
is e
with variables annotated by types, and t
is the inferred type of e
.
You can view this as all happenning in the nondeterminism monad. Whenever you need to infer the type of a variable x
, you look up its possible types S | T | ...
and branch on each one of them. This way you do not have to "pass back" any information to sub-expressions. Instead, each sub-expressions already comes annotated, in all possible ways.