[comp.lang.scheme] stack/heap continuation allocation

gyro@cymbal.reasoning.COM (Scott Layson Burson) (05/02/91)

I've had an interesting idea come together over the last few days.  I
was thinking about implementations of firstclass continuations.  The
ones I'm familiar with work basically as follows: to reify a
continuation, they allocate a chunk of heap and copy the stack into
it; to invoke it, they copy it back into the stack.  I was reflecting
that that seemed like a lot of overhead, and noting that if one could
heap-allocate activation frames, continuation reification and
invocation per se would be extremely cheap, but of course heap-
allocating frames is extremely expensive in terms of GC, and wouldn't
it be nice to have the best of both worlds?  The efficiency of a stack
for the normal case, that is, with the efficiency of a heap for
continuations.

And while I haven't worked out all the details, I think I've come up
with a way to do that.  The fundamental observation is quite simple.
Roughly stated (I'm aware that I'm skipping over details and
variations in the way this is done): in a normal stack-oriented
language implementation, there are two registers, the stack pointer
(SP) and the frame pointer (FP).  Allocation of an activation frame
usually looks something like

  <push arguments>
  push FP
  move SP into FP

and deallocation looks like

  pop FP
  move FP into SP

Thus FP "tracks" SP in the sense that the difference SP - FP is the
number of words that have been pushed since the last frame allocation.
[For expository purposes I take the point of view that the stack grows
"up", toward "higher" addresses, though of course inverted stacks are
common.]  It occurred to me that this tracking is not strictly
necessary; it is possible to use FP for references to the current
activation frame and SP for allocation of the next activation frame,
and have the two be potentially independent.  Specifically: we can
denote a register to be the "stack bottom" register SB, such that we
allow FP < SB but require always that SP >= SB.  Then, basically, what
CALL/CC does to reify the current continuation is to set SB to SP,
conceptually transferring the frames that had been between SB and SP
from the stack into the heap (but this transfer is conceptual only,
and requires no copying).

Then FP tracks SP as usual *except* when execution is inside a
captured continuation.

The only overhead this adds to normal operation is to increase the
amount of work required to deallocate a frame:

    load FP[offset] into FP    ; <offset> known at compile time
    compare SP with SB + 1     ; `+ 1' covers initial `push FP'
    branch if equal to L
    move FP into SP
 L: 

Simple, yes?

This probably isn't the right thing for creating continuations in a
preemptive multithreading system.  The problem is that on every thread
switch, all the frames "recently" pushed on the stack get transferred
into the heap.  A multithreading system can and should take advantage
of the fact that each continuation it creates will be invoked at most
once, by switching entire stacks.

But for a program that really does make use of firstclass
continuations in a big way, this is as efficient an approach as I can
imagine.  Of course the garbage collector has to be written to handle
the structures that result from doing this.

(Another caveat: I am not sure how this interacts with some of the
cleverer ways of using the stack that Orbit (the T compiler), for
instance, is capable of.)

If this pans out -- and I haven't found anything wrong with it yet --
it definitely qualifies as my best idea in months.  I don't have time
to turn it into a paper -- if anyone out there would like to do so,
with me as co-author, I would be delighted!

-- Scott

gumby@Cygnus.COM (David V. Wallace) (05/03/91)

Seems to me that any system which spaghettis its stack has to go to
similar contortions already.

You might look at the 86 lisp conf proceedings -- I seem to remember
there were a couple of papers on this exact subject at that
conference.

jaffer@zurich.ai.mit.edu (Aubrey Jaffer) (05/03/91)

> ...
> If this pans out -- and I haven't found anything wrong with it yet --
> it definitely qualifies as my best idea in months.  I don't have time
> to turn it into a paper -- if anyone out there would like to do so,
> with me as co-author, I would be delighted!

This sounds similar to an late 1970s MIT AI Lab memo which I can't
find right now.  The citation was something like:

Richard Stallman,  Phantom Stacks: If you look to hard they dissapear,
MIT Artificial Intellegence Laboratory memo XXX.

As far as I know it was not implemented so there is still opportunity
to do so.

max@nic.gac.EDU (Max Hailperin) (05/03/91)

Scott Layson Burson <gyro@cymbal.reasoning.com> suggests in Scheme
Digest V3 #205 an implementation of call/cc which would convert the
stack to heap and then allow further stack growth (and contraction) in
the region on top of that.  Several parties responded that the idea
sounded familiar, but had some difficulty coming up with a concrete
citation.  The idea sounded familiar to me, and I have the good
fortune to put my hand onat least one of the places it was published:

MIT AI Memo No. 559 (EE&CS Integrated Circuit Memo No. 80-6), January
1980, "The SCHEME-79 Chip," by Jack Holloway, Guy Lewis Steele Jr.,
Gerald Jay Sussman, and Alan Bell, p. 24.

The relevant quotation is:

"Richard Stallman has observed that one can make stack out of list
nodes, as we do, but by allocating it from a separate region, the
stack will usually be a linear sequence of cells.  Thus, if we can be
sure that there are no pointers to the current region, pops can be
done by just moving the stack pointer as in a traditional machine.
If, however, the control environment is to be retained, we cannot just
deallocate linearly, but rather we must defer to the garbage collector
as usual.  Since it is only the construcdtion of retained control
environments which makes the stack non-linear, and since these
retained control environments are only constructed at known times by
either the user or the interrupt system, it is possible to know just
which control environments are to be retained.  Stallman suggests
having a movable base register which points to the base of the linear
(not retained) portion of the stack.  When the stack is retained, the
base register is just moved up to the stack pointer. When the stack is
popped, it may be deallocated unless the pointer is below the base
register.  Assuming that such retained control environments are rare
by comparison to the usual uses of the stack, one can usually treat
most of the stack as a linear array, gettin most of the efficiency of
normal stack operations on conventional machines."