Clear, intuitive derivation of the fixed-point combinator (Y combinator)?

https://cs.stackexchange.com/questions/9604

16-10-2019
|

Pergunta

The fixed-point combinator FIX (aka the Y combinator) in the (untyped) lambda calculus ($\lambda$) is defined as:

FIX $\triangleq \lambda f.(\lambda x. f~(\lambda y. x~x~y))~(\lambda x. f~(\lambda y. x~x~y))$

I understand its purpose and I can trace the execution of its application perfectly fine; I would like to understand how to derive FIX from first principles.

Here is as far as I get when I try to derive it myself:

FIX is a function: FIX $\triangleq \lambda_\ldots$
FIX takes another function, $f$, to make it recursive: FIX $\triangleq \lambda f._\ldots$
The first argument of the function $f$ is the "name" of the function, used where a recursive application is intended. Therefore, all appearances of the first argument to $f$ should be replaced by a function, and this function should expect the rest of the arguments of $f$ (let's just assume $f$ takes one argument): FIX $\triangleq \lambda f._\ldots f~(\lambda y. _\ldots y)$

This is where I do not know how to "take a step" in my reasoning. The small ellipses indicate where my FIX is missing something (although I am only able to know that by comparing it to the "real" FIX).

I already have read Types and Programming Languages, which does not attempt to derive it directly, and instead refers the reader to The Little Schemer for a derivation. I have read that, too, and its "derivation" was not so helpful. Moreover, it is less of a direct derivation and more of a use of a very specific example and an ad-hoc attempt to write a suitable recursive function in $\lambda$.

Solução

I haven't read this anywhere, but this is how I believe $Y$ could have been derived:

Let's have a recursive function $f$, perhaps a factorial or anything else like that. Informally, we define $f$ as pseudo-lambda term where $f$ occurs in its own definition:

$$f = \ldots f \ldots f \ldots $$

First, we realize that the recursive call can be factored out as a parameter:

$$f = \underbrace{(\lambda r . (\ldots r \ldots r \ldots))}_{M} f$$

Now we could define $f$ if we only had a way how to pass it as an argument to itself. This is not possible, of course, because we don't have $f$ at hand. What we have at hand is $M$. Since $M$ contains everything we need to define $f$, we can try to pass $M$ as the argument instead of $f$ and try to reconstruct $f$ from it later inside. Our first attempt looks like this:

$$f = \underbrace{(\lambda r . (\ldots r \ldots r \ldots))}_{M} \underbrace{(\lambda r . (\ldots r \ldots r \ldots))}_{M}$$

However, this is not completely correct. Before, $f$ got substituted for $r$ inside $M$. But now we pass $M$ instead. We have to somehow fix all places where we use $r$ so that they reconstruct $f$ from $M$. Actually, this not difficult at all: Now that we know that $f = M M$, everywhere we use $r$ we simply replace it by $(r r)$.

$$f = \underbrace{(\lambda r . (\ldots (rr) \ldots (rr) \ldots))}_{M'} \underbrace{(\lambda r . (\ldots (rr) \ldots (rr) \ldots))}_{M'}$$

This solution is good, but we had to alter $M$ inside. This is not very convenient. We can do this more elegantly without having to modify $M$ by introducing another $\lambda$ that sends to $M$ its argument applied to itself: By expressing $M'$ as $\lambda x.M(xx)$ we get

$$f = (\lambda x.\underbrace{(\lambda r . (\ldots r \ldots r \ldots))}_{M}(xx)) (\lambda x.\underbrace{(\lambda r . (\ldots r \ldots r \ldots))}_{M}(xx))$$

This way, when $M$ is substituted for $x$, $MM$ is substituted for $r$, which is by the definition equal to $f$. This gives us a non-recursive definition of $f$, expressed as a valid lambda term!

The transition to $Y$ is now easy. We can take an arbitrary lambda term instead of $M$ and perform this procedure on it. So we can factor $M$ out and define

$$Y = \lambda m . (\lambda x. m(xx)) (\lambda x.m(xx))$$

Indeed, $Y M$ reduces to $f$ as we defined it.

Note: I've derived $Y$ as it is defined in literature. The combinator you've described is a variant of $Y$ for call-by-value languages, sometimes also called $Z$. See this Wikipedia article.

Outras dicas

As Yuval has pointed out there is not just one fixed-point operator. There are many of them. In other words, the equation for fixed-point theorem do not have a single answer. So you can't derive the operator from them.

It is like asking how people derive $(x,y)=(0,0)$ as a solution for $x=y$. They don't! The equation doesn't have a unique solution.

Just in case that what you want to know is how the first fixed-point theorem was discovered. Let me say that I also wondered about how they came up with the fixed-point/recursion theorems when I first saw them. It seems so ingenious. Particularly in the computability theory form. Unlike what Yuval says it is not the case that people played around till they found something. Here is what I have found:

As far as I remember, the theorem is originally due to S.C. Kleene. Kleene came up with the original fixed-point theorem by salvaging the proof of inconsistency of Church's original lambda calculus. Church's original lambda calculus suffered from a Russel type paradox. The modified lambda calculus avoided the problem. Kleene studied the proof of inconsistency probably to see how if the modified lambda calculus would suffer from a similar problem and turned the proof of inconsistency into a useful theorem in of the modified lambda calculus. Through his work regarding equivalence of lambada calculus with other models of computation (Turing machines, recursive functions, etc.) he transferred it to other models of computation.

How to derive the operator you might ask? Here is how I keep it in mind. Fixed-point theorem is about removing self-reference.

Everyone knows the liar paradox:

I am a lair.

Or in the more linguistic form:

This sentence is false.

Now most people think the problem with this sentence is with the self-reference. It is not! The self-reference can be eliminated (the problem is with truth, a language cannot speak about the truth of its own sentences in general, see Tarski's undefinability of truth theorem). The form where the self-reference is removed is as follows:

If you write the following quote twice, the second time inside quotes, the resulting sentence is false: "If you write the following quote twice, the second time inside quotes, the resulting sentence is false:"

No self-reference, we have instructions about how to construct a sentence and then do something with it. And the sentence that gets constructed is equal to the instructions. Note that in $\lambda$-calculus we don't need quotes because there is no distinction between data and instructions.

Now if we analyse this we have $MM$ where $Mx$ is the instructions to construct $xx$ and do something to it.

$Mx = f(xx)$

So $M$ is $\lambda x. f(xx)$ and we have

$MM = (\lambda x. f(xx))(\lambda x. f(xx))$

This is for a fixed $f$. If you want to make it an operator we just add $\lambda f$ and we get $Y$:

$Y = \lambda f. (MM) = \lambda f.((\lambda x. f(xx))(\lambda x. f(xx)))$

So I just keep in mind the paradox without self-reference and that helps me understand what $Y$ is about.

So you need to define a fixed point combinator

fix f = f (fix f)
      = f (f (fix f))
      = f (f (f ... ))

but without explicit recursion. Let's start with the simplest irreducible combinator

omega = (\x. x x) (\x. x x)
      = (\x. x x) (\x. x x)
      = ...

The x in the first lambda is repeatedly substituted by the second lambda. Simple alpha-conversion makes this process clearer:

omega =  (\x. x x) (\x. x x)
      =α (\x. x x) (\y. y y)
      =β (\y. y y) (\y. y y)
      =α (\y. y y) (\z. z z)
      =β (\z. z z) (\z. z z)

I.e. the variable in the first lambda always disappears. So if we add an f to the first lambda

(\x. f (x x)) (\y. y y)

the f will bob up

f ((\y. y y) (\y. y y))

We have got our omega back. It should be clear now, that if we add an f to the second lambda, then the f will appear in the first lambda and then it will bob up:

Y f = (\x. x x)     (\x. f (x x))
      (\x. f (x x)) (\x. f (x x)) -- the classical definition of Y

Since

(\x. s t) z = s ((\x. t) z), if `x' doesn't occur free in `s'

we can rewrite the expression as

f ((\x. x x) (\x. f (x x))

which is just

f (Y f)

and we have got our equation Y f = f (Y f). So the Y combinator is essentially

double the f
make the first f bobbed up
repeat

You may have seen the classic example of an equation without a normal form:

$$(\lambda x.xx)(\lambda x.xx) \triangleright (\lambda x.xx)(\lambda x.xx)$$

A similar equation is suggested by that for general recursion:

$$\begin{array} {rr} & (\lambda x.R(xx))(\lambda x.R(xx)) ~\\ \triangleright & R(~ (\lambda x.R(xx))(\lambda x.R(xx))~) \\ \triangleright & R(R(~ (\lambda x.R(xx))(\lambda x.R(xx))~)) \\ \triangleright & \dots \end{array} \tag{A}$$

(A) is a way to write general recursive equations in lambda calculus (beyond primitive recursive). So how you solve the equation $Yf = f(Yf)$ ? Plug $f$ in for $R$ in the above equation to get:

$$Yf = (\lambda x.f(xx))(\lambda x.f(xx))$$ $$Y = \lambda f.(\lambda x.f(xx))(\lambda x.f(xx))$$

Licenciado em: CC-BY-SA com atribuição

Não afiliado a cs.stackexchange