gyro@cymbal.reasoning.COM (Scott Layson Burson) (05/02/91)
I've had an interesting idea come together over the last few days. I was thinking about implementations of firstclass continuations. The ones I'm familiar with work basically as follows: to reify a continuation, they allocate a chunk of heap and copy the stack into it; to invoke it, they copy it back into the stack. I was reflecting that that seemed like a lot of overhead, and noting that if one could heap-allocate activation frames, continuation reification and invocation per se would be extremely cheap, but of course heap- allocating frames is extremely expensive in terms of GC, and wouldn't it be nice to have the best of both worlds? The efficiency of a stack for the normal case, that is, with the efficiency of a heap for continuations. And while I haven't worked out all the details, I think I've come up with a way to do that. The fundamental observation is quite simple. Roughly stated (I'm aware that I'm skipping over details and variations in the way this is done): in a normal stack-oriented language implementation, there are two registers, the stack pointer (SP) and the frame pointer (FP). Allocation of an activation frame usually looks something like <push arguments> push FP move SP into FP and deallocation looks like pop FP move FP into SP Thus FP "tracks" SP in the sense that the difference SP - FP is the number of words that have been pushed since the last frame allocation. [For expository purposes I take the point of view that the stack grows "up", toward "higher" addresses, though of course inverted stacks are common.] It occurred to me that this tracking is not strictly necessary; it is possible to use FP for references to the current activation frame and SP for allocation of the next activation frame, and have the two be potentially independent. Specifically: we can denote a register to be the "stack bottom" register SB, such that we allow FP < SB but require always that SP >= SB. Then, basically, what CALL/CC does to reify the current continuation is to set SB to SP, conceptually transferring the frames that had been between SB and SP from the stack into the heap (but this transfer is conceptual only, and requires no copying). Then FP tracks SP as usual *except* when execution is inside a captured continuation. The only overhead this adds to normal operation is to increase the amount of work required to deallocate a frame: load FP[offset] into FP ; <offset> known at compile time compare SP with SB + 1 ; `+ 1' covers initial `push FP' branch if equal to L move FP into SP L: Simple, yes? This probably isn't the right thing for creating continuations in a preemptive multithreading system. The problem is that on every thread switch, all the frames "recently" pushed on the stack get transferred into the heap. A multithreading system can and should take advantage of the fact that each continuation it creates will be invoked at most once, by switching entire stacks. But for a program that really does make use of firstclass continuations in a big way, this is as efficient an approach as I can imagine. Of course the garbage collector has to be written to handle the structures that result from doing this. (Another caveat: I am not sure how this interacts with some of the cleverer ways of using the stack that Orbit (the T compiler), for instance, is capable of.) If this pans out -- and I haven't found anything wrong with it yet -- it definitely qualifies as my best idea in months. I don't have time to turn it into a paper -- if anyone out there would like to do so, with me as co-author, I would be delighted! -- Scott
gumby@Cygnus.COM (David V. Wallace) (05/03/91)
Seems to me that any system which spaghettis its stack has to go to similar contortions already. You might look at the 86 lisp conf proceedings -- I seem to remember there were a couple of papers on this exact subject at that conference.
jaffer@zurich.ai.mit.edu (Aubrey Jaffer) (05/03/91)
> ... > If this pans out -- and I haven't found anything wrong with it yet -- > it definitely qualifies as my best idea in months. I don't have time > to turn it into a paper -- if anyone out there would like to do so, > with me as co-author, I would be delighted! This sounds similar to an late 1970s MIT AI Lab memo which I can't find right now. The citation was something like: Richard Stallman, Phantom Stacks: If you look to hard they dissapear, MIT Artificial Intellegence Laboratory memo XXX. As far as I know it was not implemented so there is still opportunity to do so.
max@nic.gac.EDU (Max Hailperin) (05/03/91)
Scott Layson Burson <gyro@cymbal.reasoning.com> suggests in Scheme Digest V3 #205 an implementation of call/cc which would convert the stack to heap and then allow further stack growth (and contraction) in the region on top of that. Several parties responded that the idea sounded familiar, but had some difficulty coming up with a concrete citation. The idea sounded familiar to me, and I have the good fortune to put my hand onat least one of the places it was published: MIT AI Memo No. 559 (EE&CS Integrated Circuit Memo No. 80-6), January 1980, "The SCHEME-79 Chip," by Jack Holloway, Guy Lewis Steele Jr., Gerald Jay Sussman, and Alan Bell, p. 24. The relevant quotation is: "Richard Stallman has observed that one can make stack out of list nodes, as we do, but by allocating it from a separate region, the stack will usually be a linear sequence of cells. Thus, if we can be sure that there are no pointers to the current region, pops can be done by just moving the stack pointer as in a traditional machine. If, however, the control environment is to be retained, we cannot just deallocate linearly, but rather we must defer to the garbage collector as usual. Since it is only the construcdtion of retained control environments which makes the stack non-linear, and since these retained control environments are only constructed at known times by either the user or the interrupt system, it is possible to know just which control environments are to be retained. Stallman suggests having a movable base register which points to the base of the linear (not retained) portion of the stack. When the stack is retained, the base register is just moved up to the stack pointer. When the stack is popped, it may be deallocated unless the pointer is below the base register. Assuming that such retained control environments are rare by comparison to the usual uses of the stack, one can usually treat most of the stack as a linear array, gettin most of the efficiency of normal stack operations on conventional machines."