[comp.lang.scheme] Y combinator derivation

markf@ZURICH.AI.MIT.EDU (09/23/89)

Sometime in the middle past (i.e. more than six months ago and less
than two years ago) someone posted a nice derivation of the Y
combinator. He or she took a recursive definition of factorial (I
think) and through a series of abstractions derived Y. 

I thought I saved this little gem but I can't find it. I also can't
find it in the scheme archives here at MIT. If someone else saved this
or if the author reads this I would appreciate it if they could send
me a copy.

Thanks.


Mark Friedman
MIT Artificial Intelligence Lab
545 Technology Sq.
Cambridge, Ma. 02139

markf@zurich.ai.mit.edu

chewy@apple.com (Paul Snively) (09/23/89)

In article <8909222102.AA19330@zurich.ai.mit.edu> markf@ZURICH.AI.MIT.EDU 
writes:
> Sometime in the middle past (i.e. more than six months ago and less
> than two years ago) someone posted a nice derivation of the Y
> combinator. He or she took a recursive definition of factorial (I
> think) and through a series of abstractions derived Y. 

This sounds suspiciously like the derivation of the applicative-order 
Y-combinator that appears in "The Little Lisper," either the trade edition 
or the third edition of which is STILL the standard way to turn someone on 
to Scheme, in my book, especially if they're not much into computer 
science (if they're already into computer science, I'll forget the quick 
kiss of "The Little Lisper" and move right on into pure seduction with 
"Structure and Interpretation of Computer Programs.")

__________________________________________________________________________
Just because I work for Apple Computer, Inc. doesn't mean that they 
believe what I believe or vice-versa.
__________________________________________________________________________

jozwiak@uicsrd.csrd.uiuc.edu (09/24/89)

You may want to consult:
 "Denotational Semantics-A Methodology for Language Development"
by David A. Schmidt, ISBN 0-697-06849-8.
 I think that the Y combinator is:

 Y = lambda h.(lambda x. h(x x))
              (lambda x. h(x x)) .

 I think that this does what it should is pretty patent from it, but
as for deriving it, this involves insight that I don't yet have.

  john

pierce@lanai.cs.ucla.edu (Brad Pierce) (09/25/89)

In article <16800018@uicsrd.csrd.uiuc.edu> jozwiak@uicsrd.csrd.uiuc.edu writes:
>
>You may want to consult:
> "Denotational Semantics-A Methodology for Language Development"
>by David A. Schmidt, ISBN 0-697-06849-8.
> I think that the Y combinator is:
>
> Y = lambda h.(lambda x. h(x x))
>              (lambda x. h(x x)) .
>
> I think that this does what it should is pretty patent from it, but
>as for deriving it, this involves insight that I don't yet have.
>
>  john

I think the previous poster was asking for a derivation of the
*applicative-order* Y combinator, not the normal-order as given
here. ``The Little Lisper'' (as someone else mentioned) has a
good presentation, but its complexity points out the price pays
for using an applicative-order language.

In normal-order languages, like standard lambda calculus, the
derivation is straightforward. The motivation for the Y-combinator
in lambda calculus is to allow the use of recursive definitions.
Recall that there is no such thing as assignment in lambda calculus, 
but that this elegant and mathematically tractable model of 
computation is rich enough to support recursion anyway.

The standard example goes as follows:

Define the factorial function FACT(n) = n! as

     FACT (n) = lambda n. if (n == 0) 1 else n*FACT(n-1)

Then abstracting out the FACT, define:

     H = lambda fact. lambda n. if (n == 0) 1 else n*fact(n-1)

Then we have the equation:

     FACT = H(FACT)

We want to solve for FACT. Since the only other factor is the
function H, we look for a function that will take H as input
and return the solution FACT. Said another way, we're looking 
for a function, F, that satisfies the following equation:

     FH = H(FH)

If combinator theory is defined in terms of the lambda calculus,
then we can define combinators as lambda expressions without
free variables. For obvious reasons, a combinator with the kind
of property described above is called a fixed-point combinator.
It is possible to define an infinite number of different fixed-
point combinators in the lambda calculus, but the simplest
is the so-called Y-combinator given above. It is easy to verify
the definition, as long as you remember that it is to be
evaluated in normal-order. Finding a fixed-point combinator
is not so easy when using applicative-order evaluation. Please see 
the ``Little Lisper''. 

Maybe it's time to start thinking seriously about whether the
current performance benefits of applicative-order evaluation
is worth the theoretical and human factors price tag.

-- Brad

marshall@iuvax.cs.indiana.edu (Jim Marshall) (09/25/89)

Hello world.  Someone recently requested information about deriving the
applicative-order Y-combinator from scratch, so I thought I would post a
derivation of it in Scheme.  The following derivation is similar in flavor
to the derivation found in "The Little LISPer" by Friedman/Felleisen, but
uses a slightly different starting approach (for one thing, I begin with
the "length" function).  Maybe this version will be a little easier to
follow (maybe not, who knows?).  Enjoy...

Step 0...  We wish to write the recursive function "length" without
having to give it a name.  Then we could invoke it on a list as follows:

((lambda (l)
   (if (null? l)
       0
       (add1 (??? (cdr l)))))
 '(any old list you like))

The problem, of course, is that we can't plug in the function
expression itself directly in place of the ??? because that
immediately leads to an infinite regress.  It's like trying to quote
an entire sentence inside the sentence itself.

----------------------------------------------------------------------

Step 1...  This is the _only_ step in the entire derivation that
requires any real thought.  We can get around the above problem by
passing in a _copy_ of the length function as an extra argument and
then using _that_ in place of the ???.  But we must ensure that the
copy of our function looks _exactly_ like the function itself at all
times.  WE WILL ADHERE TO THIS REQUIREMENT IN EVERY STEP THAT FOLLOWS.
Notice that since f is a copy of the function, and the function takes
a copy of itself as a second argument, we must also pass f to f (as a
second argument).  Passing a copy of the function to itself is the
whole secret of the Y-combinator.  The following expression will
evaluate to 5 (the length of the example list):

((lambda (l f)
   (if (null? l)
       0
       (add1 (f (cdr l) f))))
 '(any old list you like)
 (lambda (l f)
   (if (null? l)
       0
       (add1 (f (cdr l) f)))))

------------------------------------------------------------------------

Step 2...  We just switch the order of the function arguments, l and
f.  "(f (cdr l) f)" changes to "(f f (cdr l))", and the arguments to
the top-level invocation also switch places:

((lambda (f l)
   (if (null? l)
       0
       (add1 (f f (cdr l)))))
 (lambda (f l)
   (if (null? l)
       0
       (add1 (f f (cdr l)))))
 '(any old list you like))

------------------------------------------------------------------------

Step 3...  We simply curry the function so that it takes its two
arguments one at a time.  Note the extra set of ()'s around the
top-level invocation as well.  This expression still evaluates to 5:

(((lambda (f)
    (lambda (l)
      (if (null? l)
	  0
	  (add1 ((f f) (cdr l))))))
  (lambda (f)
    (lambda (l)
      (if (null? l)
	  0
	  (add1 ((f f) (cdr l)))))))
 '(any old list you like))

-------------------------------------------------------------------------

Step 4...  The above expression is now of the form
"(<function> '(any old list you like))", where <function> is now a
self-contained recursive version of "length", although still in a
clumsy form.  We can forget about '(any old list you like) for the
remainder of the derivation and concentrate on just the <function>
part, since that's what we're interested in.  So here it is by itself:

((lambda (f)
   (lambda (l)
     (if (null? l)
	 0
	 (add1 ((f f) (cdr l))))))
 (lambda (f)
   (lambda (l)
     (if (null? l)
	 0
	 (add1 ((f f) (cdr l)))))))

-------------------------------------------------------------------------

Step 5...  Notice that in the above expression (f f) returns a
function which gets applied to (cdr l).  In the same way that the add1
function is equivalent to the function (lambda (a) (add1 a)), the
"(f f) function" is equivalent to the function (lambda (a) ((f f) a)).
[This is just an inverse eta step to you lambda-calculus pros out
there].  This step is necessary to avoid infinite loops, since we're
assuming applicative order (i.e, Scheme).

((lambda (f)
   (lambda (l)
     (if (null? l)
	 0
	 (add1 ((lambda (a) ((f f) a)) (cdr l))))))
 (lambda (f)
   (lambda (l)
     (if (null? l)
	 0
	 (add1 ((lambda (a) ((f f) a)) (cdr l)))))))

-------------------------------------------------------------------------

Step 6...  Here we just give the (lambda (a) ((f f) a)) function the name
"r" using a let-expression.  Simple.  (Notice how every change to our
function requires an identical change to the _copy_ of the function, as
mentioned earlier).

((lambda (f)
   (let ([r (lambda (a) ((f f) a))])
     (lambda (l)
       (if (null? l)
	   0
	   (add1 (r (cdr l)))))))
 (lambda (f)
   (let ([r (lambda (a) ((f f) a))])
     (lambda (l)
       (if (null? l)
	   0
	   (add1 (r (cdr l))))))))

-------------------------------------------------------------------------

Step 7...  Now we just expand the let-expressions into their equivalent
lambda-forms.  In general, "(let ([x val]) body)" is equivalent to
"((lambda (x) body) val)".  This may look complicated but it's not.
Just match up the ()'s carefully.

((lambda (f)
   ((lambda (r)
      (lambda (l)
	(if (null? l)
	    0
	    (add1 (r (cdr l))))))
    (lambda (a) ((f f) a))))
 (lambda (f)
   ((lambda (r)
      (lambda (l)
	(if (null? l)
	    0
	    (add1 (r (cdr l))))))
    (lambda (a) ((f f) a)))))

-------------------------------------------------------------------------

Step 8...  Now we can give the (lambda (r) (lambda (l) ...))
expression a name also ("m") using a let-expression, since it has
no free variables (except for primitives, but they're bound globally
anyway).  This step is just like Step 6.

(let ([m (lambda (r)
	   (lambda (l)
	     (if (null? l)
		 0
		 (add1 (r (cdr l))))))])
  ((lambda (f)
     (m (lambda (a) ((f f) a))))
   (lambda (f)
     (m (lambda (a) ((f f) a))))))

-------------------------------------------------------------------------

Step 9...  Now we replace the let-expression for "m" by its equivalent
lambda-form, just like in Step 7, and out pops the applicative-order
Y-combinator!  The expression below still represents the self-contained
recursive length function, but now it's in a nicer form.  In particular,
the (lambda (m) ...) sub-expression is Y:

((lambda (m)
   ((lambda (f) (m (lambda (a) ((f f) a))))
    (lambda (f) (m (lambda (a) ((f f) a))))))
 (lambda (r)
   (lambda (l)
     (if (null? l)
	 0
	 (add1 (r (cdr l)))))))

------------------------------------------------------------------------

Step 10...  We just pull out the (lambda (m) ...) sub-expression and
call it Y, since all of its variables are bound (after all, it's a
combinator).  Then the expression above for the recursive length function
can be rewritten as shown below.  The expression passed to Y is a
"template" for the recursive length function.  Instead of "???", we call
the recursive invocation "r", wrap the whole thing with (lambda (r) ...),
and hand it over to Y, which returns a self-contained recursive function.
You can give it a name with define if you want, but you don't have to.

(define Y
  (lambda (m)
    ((lambda (f) (m (lambda (a) ((f f) a))))
     (lambda (f) (m (lambda (a) ((f f) a)))))))

(Y (lambda (r)
     (lambda (l)
       (if (null? l)
	   0
	   (add1 (r (cdr l)))))))

-------------------------------------------------------------------------
Jim Marshall, Gradual Student      |  "That it will never come again
Computer Science Department        |   Is what makes life so sweet."
Indiana University, Bloomington    |                 -Emily Dickinson
marshall@iuvax.cs.indiana.edu      |
-------------------------------------------------------------------------

dorai@titan.rice.edu (Dorai Sitaram) (09/29/89)

In article <27390@shemp.CS.UCLA.EDU> pierce@lanai.UUCP (Brad Pierce) writes:
$
$I think the previous poster was asking for a derivation of the
$*applicative-order* Y combinator, not the normal-order as given
$here. ``The Little Lisper'' (as someone else mentioned) has a
$good presentation, but its complexity points out the price pays
$for using an applicative-order language.
$
$In normal-order languages, like standard lambda calculus, the
$derivation is straightforward. The motivation for the Y-combinator
$in lambda calculus is to allow the use of recursive definitions.
$Recall that there is no such thing as assignment in lambda calculus, 
$but that this elegant and mathematically tractable model of 
$computation is rich enough to support recursion anyway.
$
$The standard example goes as follows:
$
$ [A quite proper description of normal-order Y deleted.]
$
$is the so-called Y-combinator given above. It is easy to verify
$the definition, as long as you remember that it is to be
$evaluated in normal-order. Finding a fixed-point combinator
$is not so easy when using applicative-order evaluation. Please see 
$the ``Little Lisper''. 
$
$Maybe it's time to start thinking seriously about whether the
$current performance benefits of applicative-order evaluation
$is worth the theoretical and human factors price tag.

I find your _conclusions_ (ie, not the body of your posting)
astounding. The applicative-order Y is not all that prohibitively more
complex than the normal-order one. One only has to replace in

$> Y = lambda h.(lambda x. h(x x))
$>              (lambda x. h(x x)) .

both occurrences of (x x) by (lambda z. (x x) z). Note also that the
new Y shares the elegance and lack of assignment that you claim for
the old Y.

Re the current acceptance of applicative-order evaluation in the
world's programming languages, I hardly think it can be attributed
solely to performance benefits. Applicative-order is _understandable_
in the presence of side-effects in a way that normal-order fails to
be.  You may protest that assignment-type side-effects should be
eliminated from the language, but then things like input and output
cannot be totally wished away.

Assuming you have a 'let' defined in the usual way in terms of lambda,
we have, in a potential normal-order real-life programming language
with input/output, among other things,

(let ([y (printf "y~n")])
  ((lambda (x) t) y))

never printing y at all;

(let ([y (printf "y~n")])
  (or y y))

printing y twice (assuming printf returns nil);

whereas in applicative order, y is just printed once (as expected,
IMHO), regardless of the body of the let.

--dorai
-------------------------------------------------------------------------------
Alles ist richtig, auch das Gegenteil. Nur: zwar ... aber -- das ist
nie richtig. 						       --Kurt Tucholsky
-------------------------------------------------------------------------------