[net.lang.lisp] call/cc and proper tail recursion

harrison@uicsrd.CSRD.UIUC.EDU (10/11/86)

	A question about the definition of Scheme as "properly tail-recursive."

	Consider the following example:

	(define f (lambda (x y)
			(if (f x y) (bad-function))
			(function-with-side-effects x y)
			(if (p x y) #!null (f (cdr x) (cdr y))) ))

	This function is tail-recursive.  My understanding is that such
a function is evaluated with no net growth in the stack.  This means,
in short, that the parameters x and y will be assigned to (cdr x) and
(cdr y), and the recursive call effected by jumping to the entrance of
the function.  Is this so?
	If so, then consider what happens when bad-function is defined
as

	(lambda () (set! g (call/cc (lambda (x) x))))

i.e., bad-function records the current state, or continuation, in
the global variable g, and exits.  Now, at the termination of f,
g will contain some continuation which, if applied, would mean that
the bindings of x and y would have to be restored to a state which occurred
during one of the recursive instances of f.  My question is, how can this
occur if we don't record these values in a stack frame?

willc@tekchips.UUCP (Will Clinger) (10/16/86)

In article <20000003@uicsrd> harrison@uicsrd.CSRD.UIUC.EDU has a question
about the definition of Scheme as "properly tail-recursive":
>
>	(define f (lambda (x y)
>			(if (f x y) (bad-function))
>			(function-with-side-effects x y)
>			(if (p x y) #!null (f (cdr x) (cdr y))) ))
>
>	This function is tail-recursive.  My understanding is that such
>a function is evaluated with no net growth in the stack.  This means,
>in short, that the parameters x and y will be assigned to (cdr x) and
>(cdr y), and the recursive call effected by jumping to the entrance of
>the function.  Is this so?

Almost.  x and y are bound, not assigned; this distinction is important
because assignment would throw away the old values but binding does not.
By the way, I assume you intended for (f x y) in the first line of the
lambda body to be (g x y) to avoid an infinite (non-tail-recursive) loop.
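
To see why that distinction matters, here is a small sketch (the names
are mine, not part of the example under discussion).  Each call of
remember creates a fresh binding of n, and the saved closures prove it:

        (define seen '())

        (define remember
          (lambda (n)
            (set! seen (cons (lambda () n) seen))   ; close over THIS n
            (if (zero? n) 'done (remember (- n 1)))))

        (remember 2)
        ((car seen))    ; => 0
        ((cadr seen))   ; => 1
        ((caddr seen))  ; => 2

If the tail call merely assigned new values to a single location, all
three closures would return 0.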

>	If so, then consider what happens when bad-function is defined
>as
>	(lambda () (set! g (call/cc (lambda (x) x))))
>
>i.e., bad-function records the current state, or continuation, in
>the global variable g, and exits.  Now, at the termination of f,
>g will contain some continuation which, if applied, would mean that
>the bindings of x and y would have to be restored to a state which occurred
>during one of the recursive instances of f.  My question is, how can this
>occur if we don't record these values in a stack frame?

It is best to think of Scheme as a heap-based language instead of a
stack-based language.  In other words, pretend that continuations
(analogous to stack frames) are allocated on a heap and reclaimed by
a garbage collector instead of by adjusting a stack pointer.  Pretend
also that the continuation stored in g was created for the sole purpose
of preserving registers across the call to bad-function, and that those
registers are restored immediately upon return from the call.

The tail-recursive call in the last line of the body doesn't need to
preserve registers, so no continuation is created for it.
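
Here is that picture rendered in Scheme itself, with continuations
reified as ordinary closures (a sketch of mine, not any particular
compiler's output).  The closure allocated for the pending
multiplication plays the role of a stack frame; the recursive call is
a tail call and allocates nothing of its own:

        (define fact-cps
          (lambda (n k)
            (if (zero? n)
                (k 1)                       ; tail call: no new frame
                (fact-cps (- n 1)           ; tail call: no new frame
                          (lambda (v)       ; "frame" for the pending (* n v)
                            (k (* n v)))))))

        (fact-cps 5 (lambda (v) v))         ; => 120

When nothing is pending, as in the last line of f's body, no such
closure needs to be built.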

If you're wondering how this semantics can be implemented efficiently,
you might start by reading the paper on PC Scheme that was presented at
the 1986 ACM conference on Lisp and functional programming.  That paper
tells how the speed of a stack-based implementation can be achieved
by making call-with-current-continuation an expensive procedure to call.

Will Clinger
Tektronix Computer Research Lab

harrison@uicsrd.CSRD.UIUC.EDU (10/16/86)

	The example has a typo!   Should be:

	(define f (lambda (x y)
                    (if (p1 x y) (bad-function))
                    (fn-with-side-effects x y)
                    (if (p2 x y) #!null (f (cdr x) (cdr y)))))

	Again, bad-function is defined as

	(lambda () (set! g (call/cc (lambda (x) x))))

        My claim is that 'proper' tail recursion will cause an incorrect
result here (upon the application of g).

harrison@uicsrd.CSRD.UIUC.EDU (10/20/86)

Thanks for replying.

In the TI Scheme Language Reference Manual, p 14, I find:

	Scheme is defined to be properly tail recursive.  This means that
        Scheme handles all tail recursive procedure calls with no net growth
	of the stack.

	...its most important characteristic is that it allows recursion
	to subsume iteration completely without losing efficiency.

It seems to me that if tail recursion incurs a penalty of space proportional
to the number of calls, plus time to garbage collect the space, then it is
significantly more expensive than iteration.  I have always assumed that the
'stack' in Scheme is actually maintained as a heap; my confusion came in
reading the claim above (and in several other presentations of Scheme) that
tail-recursive calls occur with no net growth in the stack.  What this seems
to mean in reality is merely that unwinding from a series of tail recursive
calls takes constant time (if we ignore the expense of garbage collecting
all the stack frames allocated).

darrelj@sdcrdcf.UUCP (Darrel VanBuer) (10/26/86)

Tail recursion results in no stack growth and the efficiency of iteration
because by the time the compiler finishes with it, it IS iteration.  That is,
when the compiler finds any leg of code which ends activity in a function by
returning the value of a call to the same function, it instead fixes up the
argument values and jumps to the top of the function.
Example:
(DEFUN ASSOC (KEY LST)
  (COND ((EQ KEY (CAAR LST))
         (CAR LST))
        (T (ASSOC KEY (CDR LST)))))
compiles to something like: (for a simple stack machine)
ASSOC:
	GETVAL KEY
	GETVAL LST
	CAR
	CAR
	EQ
	JUMPifFALSE FIXTAIL
	GETVAL LST
	CAR
	RETURN
FIXTAIL:
	GETVAL LST
	CDR
	SETQ LST
	JUMP ASSOC   (instead of CALL ASSOC; RETURN)

On tail recursion and calls to functions with bad side effects on local
variables (where the function, just before recurring, calls some other
function which sets a local variable of the supposedly recursive
function): the iterative unfolding collapses all the bindings into a
single binding.  There are several ways of dealing with the problem:

1.  Do as Scheme: use strict lexical binding -- if you can't see a reference
to a variable in directly embedded code, there ARE NO REFERENCES to it;
there is no such thing as a SPECIAL variable in Scheme; thus tail recursion
is always safe based on static analysis of a single function.
2.  Ignore the problem during optimization and lusers get what they deserve
in manipulating special variables [They could always write
  (PROG1 (SELF --) (DUMMYFUNCTION)) to defeat the tail optimization;
see the sketch after this list]
3.  Have the compiler do some sort of extended analysis of the safety of
function calls.  This could range in sophistication from checking a list of
functions known to be free of side effects to elaborate analysis of all user
functions.  Then only do tail optimization in cases where it's known to be
safe.
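
A sketch of that PROG1 escape hatch in Scheme (names mine; walk stands
for the self-recursive function).  Because work remains after the
self-call, the call is not in tail position and each frame survives:

        (define dummy-function
          (lambda () 'ignored))

        (define walk
          (lambda (lst)
            (if (null? lst)
                'done
                (let ((result (walk (cdr lst))))  ; non-tail self-call
                  (dummy-function)                ; work pending after it
                  result))))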
-- 
Darrel J. Van Buer, PhD
System Development Corp.
2525 Colorado Ave
Santa Monica, CA 90406
(213)820-4111 x5449
...{allegra,burdvax,cbosgd,hplabs,ihnp4,orstcs,sdcsvax,ucla-cs,akgua}
                                                            !sdcrdcf!darrelj
VANBUER@USC-ECL.ARPA

willc@tekchips.UUCP (Will Clinger) (10/28/86)

In article <20000008@uicsrd> harrison@uicsrd.CSRD.UIUC.EDU writes:
>It seems to me that if tail recursion incurs a penalty of space proportional
>to the number of calls, plus time to garbage collect the space, then it is
>significantly more expensive than iteration.

Tail recursion in Scheme doesn't incur any space penalty at all.  I can't
speak for all implementations of Scheme (because many pure interpreters do
indeed create garbage as part of the interpretation process) but I know that
if you say something like

        ((rec loop (lambda () (loop))))

in PC Scheme or MacScheme or compiled T3 you will get a tight infinite
tail-recursive loop in which no storage is allocated and the garbage
collector never runs.  (Not all infinite loops have this property, of
course; for example ((rec loop (lambda (n) (loop (1+ n)))) 0) will
eventually allocate storage for bignums.)

>                                              I have always assumed that the
>'stack' in Scheme is actually maintained as a heap; my confusion came in
>reading the claim above (and in several other presentations of Scheme) that
>tail-recursive calls occur with no net growth in the stack.  What this seems
>to mean in reality is merely that unwinding from a series of tail recursive
>calls takes constant time (if we ignore the expense of garbage collecting
>all the stack frames allocated).

No, it means that tail-recursion is as efficient as iteration in both
time and space.  That's not to say that any particular compiler will
generate exactly the same code for a do loop that it does for an equivalent
tail recursion, but if there's a significant difference in performance then
something's wrong with the compiler.
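
For instance, a compiler should be able to produce essentially the
same code for these two definitions (a sketch, using the DO syntax of
the Scheme reports):

        (define sum-iter
          (lambda (n)
            (do ((i n (- i 1))
                 (acc 0 (+ acc i)))
                ((zero? i) acc))))

        (define sum-rec
          (lambda (n acc)
            (if (zero? n) acc (sum-rec (- n 1) (+ acc n)))))

        (sum-iter 5)        ; => 15
        (sum-rec 5 0)       ; => 15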

Peace, William Clinger
Tektronix Computer Research Lab

harrison@uicsrd.CSRD.UIUC.EDU (10/28/86)

>/* Written 10:04 am  Oct 26, 1986 by darrelj@sdcrdcf.UUCP
>   in uicsrd:net.lang.lisp */
>Tail recursion results in no stack growth and the efficiency of iteration
>because by the time the compiler finishes with it, it IS iteration.

...

>1.  Do as Scheme: use strict lexical binding -- if you can't see a reference
>to a variable in directly embedded code, there ARE NO REFERENCES to it;
>there is no such thing as a SPECIAL variable in Scheme; thus tail recursion
>is always safe based on static analysis of a single function.
   
I think that my first example demonstrates that this is not so, given call/cc.
Here is another example, a *bit* less contrived, since it actually does
something!

(define f (lambda (x y) (if (zero? x) y (f (-1+ x) (+ y (* x (h)))))))

(define h (lambda () (set! multiplier (call/cc (lambda (x) x)))
                     (set! entry-points (cons multiplier entry-points))
                     (if (integer? multiplier) multiplier 1)))

consider the sequence:
   (set! entry-points nil)
   (f 5 0) ==> 15
   
if at this point we evaluate
   ((car entry-points) 3), we get 17;
if instead we evaluate
   ((cadr entry-points) 3), we get 19;
if instead we do
   ((caddr entry-points) 3), we get 21; and so on.
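
A worked trace of the first case (my annotation): with (h) returning 1
throughout, (f 5 0) computes 0+5+4+3+2+1 = 15.  The continuation at the
head of entry-points was captured during the final call, where x = 1
and y = 14; resuming it with 3 makes that occurrence of (h) return 3,
so f finishes with (+ 14 (* 1 3)) => 17.  The earlier captures (x = 2
with y = 12, and x = 3 with y = 9) account for 19 and 21 the same way.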

It would not be possible
to achieve this result if we performed the recursive calls of f without
preserving the bindings of x and y.  Notice that there are no side-effects
on anything local to f; all of the trickery is within h, which would (possibly)
be compiled separately.  (We could as easily make h a parameter, or grab it
off of an atom property, whatever.)

I agree that global analysis of the program is the answer; my point is that
the definition of Scheme makes it seem as though one can state simply that
tail recursion can be made as efficient as iteration, which simply isn't so
in the presence of call/cc.

baldwin@rochester.ARPA (Douglas Baldwin) (10/31/86)

In article <20000010@uicsrd>, harrison@uicsrd.CSRD.UIUC.EDU writes:
> 
> (define f (lambda (x y) (if (zero? x) y (f (-1+ x) (+ y (* x (h)))))))
> 
> (define h (lambda () (set! multiplier (call/cc (lambda (x) x)))
>                      (set! entry-points (cons multiplier entry-points))
>                      (if (integer? multiplier) multiplier 1)))

	It seems to me (as a Scheme fan more than implementor) that a lot
of the confusion here has to do with distinguishing tail recursion
optimization from how closures are implemented. In particular, the
continuation accessed by call/cc is (presumably) a closure, i.e., a
procedure and bindings for its local variables. This closure obviously
has to contain bindings for X and Y, but these bindings can be copied
off the stack into the heap at the time the closure is created. Creation
of the closure could happen as late as the time at which call/cc is
invoked (i.e., interpreters and compilers presumably don't keep
continuations as full closures until something like call/cc forces
them to do so). What this means is that tail recursion optimization in
F and closure creation in H are entirely different activities.  In
particular, tail recursion can happen with no net stack growth; it is
call/cc that increases heap size in order to save a closure, and it is
the implementation of closures that has to be smart enough to know
that bindings on the stack may be lost if they aren't copied somewhere.
(I realize that the stack might actually be part of the heap, and that
there are alternatives to copying for saving bindings in closures,
but I think the net effect is as I've described here.)
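
A minimal illustration of that last point (my sketch): the continuation
captured inside demo closes over that call's binding of x exactly as an
ordinary lambda would, so the binding must be preserved off the stack
if the frame is to be reused.

        (define grab '())

        (define demo
          (lambda (x)
            (call/cc (lambda (k) (set! grab k)))
            x))

        (demo 42)           ; => 42
        (grab 'ignored)     ; re-enters demo's continuation
                            ; and returns 42 again

At an interactive top level, applying grab resumes the old call of
demo, which returns its saved x to whatever was waiting for (demo 42).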

harrison@uicsrd.CSRD.UIUC.EDU (11/01/86)

	I read the reference you suggested, and think I understand how this
works now.  call/cc simply copies the stack into the heap; if a frame
representing a tail-recursive call is 'caught' by a call to call/cc, it will
be copied into the heap before the next call is made.  Barring an invocation
of call/cc (no doubt the perversity in my example is rare), only one frame's
worth of locations are allocated for the bindings of the local vars of the
tail-recursive function.
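
In other words, something like this toy model (mine, and much
simplified -- a real implementation copies frame contents as well, not
just the spine of a list):

        (define the-stack '())

        (define push-frame!
          (lambda (frame) (set! the-stack (cons frame the-stack))))

        (define pop-frame!
          (lambda () (set! the-stack (cdr the-stack))))

        (define capture-stack           ; the expensive step in call/cc
          (lambda ()
            (let copy ((s the-stack))
              (if (null? s) '() (cons (car s) (copy (cdr s)))))))

        (define reinstate-stack!        ; applying the continuation
          (lambda (saved) (set! the-stack saved)))

Once captured, the saved copy is immune to later pushes and pops of the
live stack.
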
	This implementation of call/cc surprises me.  To quote from
"The Implementation of PC Scheme", by D. H. Bartley and John C. Jensen, p 88,

	When CALL-WITH-CURRENT-CONTINUATION is invoked, the stack is copied
	into the heap to make a continuation object.  To reduce the worst-case
	cost of this copying and to permit arbitrary growth of the stack,
	a small buffer is used to hold the topmost stack frames.  When
	this buffer overflows, all frames but the currently active one are
	copied into a continuation object in the heap.

It would seem that most stack frames would end up on the heap, especially if
either the buffer is small, or call/cc is used heavily.  Given that, why not
allocate all frames on the heap to start with?  Having done so directly,
call/cc becomes dirt cheap, as a continuation may be represented merely by a
pointer to a 'stack' frame and a code pointer.  Likewise, closure formation
would be dirt cheap, requiring only a code pointer and a lexical parent pointer
(a static link, as they call it in compilers for algol-ish languages).
We would have to garbage collect the entire stack, but if we have to do
(nearly) that anyway, why not make closure formation and call/cc cheap?
(We could still perform tail recursion 'properly', except in the event that a
closure was created which caught a local variable of the tail-recursive
function, such that the closure could out-live the stack frame of that
function, or call/cc was used as per my example.  In those cases, we could make
a normal recursive call instead of a properly tail-recursive one.)
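
A sketch of those representations (mine, purely illustrative): with
every frame heap-allocated and linked to its parent, both objects are
just two pointers.

        (define make-closure
          (lambda (code-ptr static-link)
            (vector 'closure code-ptr static-link)))

        (define make-continuation
          (lambda (code-ptr frame-ptr)
            (vector 'cont code-ptr frame-ptr)))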

ram@spice.cs.cmu.edu (Rob MacLachlan) (11/01/86)

    From: harrison@uicsrd.CSRD.UIUC.EDU
    Newsgroups: net.lang.lisp
    Subject: Re: call/cc and proper tail recursion
    I agree that global analysis of the program is the answer; my point is that
    the definition of scheme makes it seem as though one can state simply that
    tail recursion can be made as efficient as iteration, which simply isn't so
    in the presence of call/cc.


Proof notwithstanding, you are totally wrong.  As was earlier mentioned by
someone who knew what they were talking about, "call with current
continuation" is implemented in a somewhat expensive way so that tail
recursion still does work.  Basically what you do is copy the entire stack
onto the heap.  Your time would be better spent if you avoided proving
impossible things which have already been done.