[comp.lang.lisp] MMU detection of stack/heap overflow: summary of responses

torbenm@diku.dk (Torben [gidius Mogensen) (07/20/90)
Hi,

I received several responses to my question about using MMU hardware
to detect stack/heap overflow. Below is a summary of the responses,
some of which I have commented [ in brackets ].

	Torben Mogensen (torbenm@diku.dk)

------------------------------------

From: Preston Briggs <preston@rice.edu>

Hi,

You asked about using the MMU to detect stack and heap overflow.

The stack checking has been done on the IBM RT (and presumably
other machines).  The stack grows down, and the page below the stack
is marked as write-protected.  On a protection violation,
they check the SP and extend the stack if necessary.

On the heap checking...
Is there a need?  Don't people usually check at allocation time,
usually as a small part of the many other decisions being made
at the same time (which free list to use and so forth)?

[ In functional languages heap access must be fast, as lots of heap
allocation of small objects (Cons-cells, closures etc.) is done. In
implementations of such languages, the free list is often just a
pointer to the rest of the free heap, which is one contiguous block of
memory. Allocation is then just done by incrementing this pointer by
the amount of memory needed, so testing for overflow can be a large
proportion of the cost. I think Preston is thinking about C style
malloc(), where large blocks of memory are allocated infrequently. ]

Regards,
Preston
--------------------------------------
From: Greg.Morrisett@VACHE.VENARI.CS.CMU.EDU

Torben,

In fact, the SML of NJ compiler version 0.56 used the
very scheme you described.  SML/NJ doesn't use a stack
at all, so everything is heap allocated.  Basically, they
would just protect the page beyond the heap, and make sure
that when writing data out, they wrote it backwards, so that
it would be easy to restart (i.e. you'll trap before you
write any of the data out.)  As you say, all they have to
do is install a handler for bad memory access to trigger
a gc.  In short, it's been done.

For some reason, the folks at Princeton have changed this
in 0.59.  Now what they seem to do is a little different.
At the entrance to an extended basic block or EBB (single entry,
multiple exits), they take the heap pointer and add the
maximum number of bytes that might be allocated in the EBB
and then subtract that from the heap limit pointer.  (Actually,
the subtraction turns into another add, as they keep the
heap limit pointer inverted.)  If an overflow occurs on
the adds, this signals the GC.  (Note that they don't update
the heap pointer when doing the adds.)  So, instead of
trapping on bad access, they trap on overflow.  

It seems to me that if handling an oveflow trap costs the
same as handling a vm trap, that their latest version should
loose -- they're doing 2 adds per EBB and as you point out,
the size of basic blocks (and EBB's) is relatively small.
(My hunch is that the cost of an overflow trap is only slightly
less than that for a vm trap.)

However, any implementation of SML must have facilities for
trapping overflow since ML has provisions for exception handling
and overflow is a "default" exception.  Therefore, it's more
likely that this sort of "trick" will be more portable than
the vm hacks.  Of course, it's easy to put in the explicit
check with this system if you don't have overflow signal facilities.

Another idea to kick around is the use of vm protection for
incremental, concurrent garbage collection.  Assuming you're
using a copying collector, when a gc needs to take place, the
"gc-thread" can just read/write protect the "from" space.  When
a mutator attempts to access something that's protected (i.e.
in "from" space), the gc-thread will be notified.  It will
then "forward" the page that the mutator touched to the "to"
space (following pointers to some depth to amortize the cost).
After it's done forwarding that particular page, it lets the
mutator continue.  

What's nice about this idea is that (1) you don't have one big
stop and copy -- it's broken up into little stops and copies
so response can be much more "real-time", (2) there's no reason
why there only has to be one mutator.  Other threads of control
can be running around while gc is going on.  They'll only be
blocked when the touch a page that hasn't been forwarded, but
they'll be resumed quickly.

A fellow by the name of Dave Detlefs is working on a concurrent,
incremental garbage collector for C and C++ here at CMU.

In short, there are probably a *lot* of nice tricks that one
can pull with vm protection.  I'm messing around with a lot
of these.  'Course, since I'm using Mach, I have better primitives
and facilities for this sort of thing than the average UNIX user.
I think Andrew Appel at Princeton is planning on writing up
a survey of these vm tricks.  I'll let you know if I get a copy.

Glad to here that someone else is interested in this!

 Greg Morrisett                            (jgmorris@cs.cmu.edu)
 Carnegie Mellon University
 School of Computer Science
 Pittsburgh, PA  15213
 (412) 268-3053
------------------------------------------------

From: Kevin T. Likes <likes@loanshark.cs.indiana.edu>


If I recall correctly, Chez Scheme uses something like this in its
implementation.  (The difference being stacks made more suitable for call/cc.)

Kevin Likes
---------------------------------------------------

From: David.Tarditi@B.GP.CS.CMU.EDU

This has been done before.  Andrew Appel used this trick in the Standard ML
of New Jersey compiler until recently.  For references on this trick,
see [1],[2],[3].

This trick was recently removed from the Standard ML compiler and replaced
by a heap check at the top of an extended basic block (an extended basic
block has no jumps into it except at the top of the block and no loops, but
it may "forward" jumps).

It was removed because it causes portability problems and because it is
difficult (if not impossible) to implement correctly in a user-level program
on many operating systems.   You have to be able to restart an instruction
as a user-level process, and on some processors and operating systems this
is almost impossible.  To do this, you need access to all sorts of information
about the processor state, some of which may not be available to your
user-level signal handler that is trapping the page fault (or which may
be incorrect due to OS bugs).  I know that the Standard ML of NJ implementors
had serious problems getting this VM trick to work on the 68020.

     David Tarditi
-------
[1] A. Appel, J. Ellis, K. Li, "Real-time Concurrent Collection on
Stock Multiprocessors, Department of Computer Science Technical
Report CS-TR-133-88, Princeton University, March 2, 1988.

[2] A. Appel, "Garbage Collection Can Be Faster Than Stack Allocation",
Department of Computer Science Technical Report CS-TR-045-86,
Princeton University, June, 1986.

[3] A. Appel, "Simple Generational Garbage Collection and Fast Allocation",
Department of Computer Science Technical Report CS-TR-143-88,
Princeton University, March 1988.
---------------------------------------

From: tmb@ai.mit.edu (Thomas M. Breuel)

In article <1990Jul19.151524.22544@diku.dk> you write:
|>My question is: has this been done? And if not, why not?

Yes, many times. Under UNIX, a recent use of it is in SML of NJ.
-------------------------------------------

From: rh@craycos.com (Robert Herndon)

Torben:
  Certainly relatives of this idea have long been practiced.
Original Version 6 PDP-11 unix extends stacks by detecting
stack violations, extending the stack and restarting.  More
recently, HP-UX binaries on 680X0 processors issue TRAP 2
instructions to check for stack overflow.  These are over-written(!)
by the operating system to tst.b instructions that probe the
lower-most stack location needed by the procedure.  The OS
can then restart these instructions after one fails due to
insufficient stack.
  On the general memory management side, the Bourne shell
(/bin/sh for most of us) does very nasty things along these
lines.  These were the subject of a recent paper by
geoff@utstat.toronto.edu in one of the USENIX proceedings.
The Bourne shell treats the top of the heap in the fashion
you suggest:  access it, if it fails, handle the trap, make
an sbrk, do a bunch of stuff, and restart where you left off.
For machines that don't restart instructions, re-arranging
the code into a more reasonable form is quite painful.


			Robert Herndon
Cray Computer Corp.			(USA) 719/540-4240
1110 Bayfield Dr.			rh@craycos.com
Colorado Springs, CO  80906
	USA
------------------------------------

From: grunwald@foobar.colorado.edu (Dirk Grunwald)

Yes, see work by Andrew Appel on just this notion. I know it came out
as DEC-SRC tech report, and was published elsewhere as well. I think
he advocated this for the SML system.

You can do something similar to this in Ultrix and SunOS using the
``mprotect'' call to protect specific pages of memory, and then use a
signal handler for faults/seg violations (whatever error is normally
raised).

I've done this to protect the stack segments for my tasking library;
when creating 1000's of tasks, it's a little wasteful of time and
space (like, 2x slower and 4x larger when using 8K pages), but it
certainly helps locate tasking bugs.

Dirk Grunwald -- Univ. of Colorado at Boulder	(grunwald@foobar.colorado.edu)
						(grunwald@boulder.colorado.edu)
------------------------------