torbenm@diku.dk (Torben [gidius Mogensen) (07/20/90)
Hi, I received several responses to my question about using MMU hardware to detect stack/heap overflow. Below is a summary of the responses, some of which I have commented [ in brackets ]. Torben Mogensen (torbenm@diku.dk) ------------------------------------ From: Preston Briggs <preston@rice.edu> Hi, You asked about using the MMU to detect stack and heap overflow. The stack checking has been done on the IBM RT (and presumably other machines). The stack grows down, and the page below the stack is marked as write-protected. On a protection violation, they check the SP and extend the stack if necessary. On the heap checking... Is there a need? Don't people usually check at allocation time, usually as a small part of the many other decisions being made at the same time (which free list to use and so forth)? [ In functional languages heap access must be fast, as lots of heap allocation of small objects (Cons-cells, closures etc.) is done. In implementations of such languages, the free list is often just a pointer to the rest of the free heap, which is one contiguous block of memory. Allocation is then just done by incrementing this pointer by the amount of memory needed, so testing for overflow can be a large proportion of the cost. I think Preston is thinking about C style malloc(), where large blocks of memory are allocated infrequently. ] Regards, Preston -------------------------------------- From: Greg.Morrisett@VACHE.VENARI.CS.CMU.EDU Torben, In fact, the SML of NJ compiler version 0.56 used the very scheme you described. SML/NJ doesn't use a stack at all, so everything is heap allocated. Basically, they would just protect the page beyond the heap, and make sure that when writing data out, they wrote it backwards, so that it would be easy to restart (i.e. you'll trap before you write any of the data out.) As you say, all they have to do is install a handler for bad memory access to trigger a gc. In short, it's been done. For some reason, the folks at Princeton have changed this in 0.59. Now what they seem to do is a little different. At the entrance to an extended basic block or EBB (single entry, multiple exits), they take the heap pointer and add the maximum number of bytes that might be allocated in the EBB and then subtract that from the heap limit pointer. (Actually, the subtraction turns into another add, as they keep the heap limit pointer inverted.) If an overflow occurs on the adds, this signals the GC. (Note that they don't update the heap pointer when doing the adds.) So, instead of trapping on bad access, they trap on overflow. It seems to me that if handling an oveflow trap costs the same as handling a vm trap, that their latest version should loose -- they're doing 2 adds per EBB and as you point out, the size of basic blocks (and EBB's) is relatively small. (My hunch is that the cost of an overflow trap is only slightly less than that for a vm trap.) However, any implementation of SML must have facilities for trapping overflow since ML has provisions for exception handling and overflow is a "default" exception. Therefore, it's more likely that this sort of "trick" will be more portable than the vm hacks. Of course, it's easy to put in the explicit check with this system if you don't have overflow signal facilities. Another idea to kick around is the use of vm protection for incremental, concurrent garbage collection. Assuming you're using a copying collector, when a gc needs to take place, the "gc-thread" can just read/write protect the "from" space. When a mutator attempts to access something that's protected (i.e. in "from" space), the gc-thread will be notified. It will then "forward" the page that the mutator touched to the "to" space (following pointers to some depth to amortize the cost). After it's done forwarding that particular page, it lets the mutator continue. What's nice about this idea is that (1) you don't have one big stop and copy -- it's broken up into little stops and copies so response can be much more "real-time", (2) there's no reason why there only has to be one mutator. Other threads of control can be running around while gc is going on. They'll only be blocked when the touch a page that hasn't been forwarded, but they'll be resumed quickly. A fellow by the name of Dave Detlefs is working on a concurrent, incremental garbage collector for C and C++ here at CMU. In short, there are probably a *lot* of nice tricks that one can pull with vm protection. I'm messing around with a lot of these. 'Course, since I'm using Mach, I have better primitives and facilities for this sort of thing than the average UNIX user. I think Andrew Appel at Princeton is planning on writing up a survey of these vm tricks. I'll let you know if I get a copy. Glad to here that someone else is interested in this! Greg Morrisett (jgmorris@cs.cmu.edu) Carnegie Mellon University School of Computer Science Pittsburgh, PA 15213 (412) 268-3053 ------------------------------------------------ From: Kevin T. Likes <likes@loanshark.cs.indiana.edu> If I recall correctly, Chez Scheme uses something like this in its implementation. (The difference being stacks made more suitable for call/cc.) Kevin Likes --------------------------------------------------- From: David.Tarditi@B.GP.CS.CMU.EDU This has been done before. Andrew Appel used this trick in the Standard ML of New Jersey compiler until recently. For references on this trick, see [1],[2],[3]. This trick was recently removed from the Standard ML compiler and replaced by a heap check at the top of an extended basic block (an extended basic block has no jumps into it except at the top of the block and no loops, but it may "forward" jumps). It was removed because it causes portability problems and because it is difficult (if not impossible) to implement correctly in a user-level program on many operating systems. You have to be able to restart an instruction as a user-level process, and on some processors and operating systems this is almost impossible. To do this, you need access to all sorts of information about the processor state, some of which may not be available to your user-level signal handler that is trapping the page fault (or which may be incorrect due to OS bugs). I know that the Standard ML of NJ implementors had serious problems getting this VM trick to work on the 68020. David Tarditi ------- [1] A. Appel, J. Ellis, K. Li, "Real-time Concurrent Collection on Stock Multiprocessors, Department of Computer Science Technical Report CS-TR-133-88, Princeton University, March 2, 1988. [2] A. Appel, "Garbage Collection Can Be Faster Than Stack Allocation", Department of Computer Science Technical Report CS-TR-045-86, Princeton University, June, 1986. [3] A. Appel, "Simple Generational Garbage Collection and Fast Allocation", Department of Computer Science Technical Report CS-TR-143-88, Princeton University, March 1988. --------------------------------------- From: tmb@ai.mit.edu (Thomas M. Breuel) In article <1990Jul19.151524.22544@diku.dk> you write: |>My question is: has this been done? And if not, why not? Yes, many times. Under UNIX, a recent use of it is in SML of NJ. ------------------------------------------- From: rh@craycos.com (Robert Herndon) Torben: Certainly relatives of this idea have long been practiced. Original Version 6 PDP-11 unix extends stacks by detecting stack violations, extending the stack and restarting. More recently, HP-UX binaries on 680X0 processors issue TRAP 2 instructions to check for stack overflow. These are over-written(!) by the operating system to tst.b instructions that probe the lower-most stack location needed by the procedure. The OS can then restart these instructions after one fails due to insufficient stack. On the general memory management side, the Bourne shell (/bin/sh for most of us) does very nasty things along these lines. These were the subject of a recent paper by geoff@utstat.toronto.edu in one of the USENIX proceedings. The Bourne shell treats the top of the heap in the fashion you suggest: access it, if it fails, handle the trap, make an sbrk, do a bunch of stuff, and restart where you left off. For machines that don't restart instructions, re-arranging the code into a more reasonable form is quite painful. Robert Herndon Cray Computer Corp. (USA) 719/540-4240 1110 Bayfield Dr. rh@craycos.com Colorado Springs, CO 80906 USA ------------------------------------ From: grunwald@foobar.colorado.edu (Dirk Grunwald) Yes, see work by Andrew Appel on just this notion. I know it came out as DEC-SRC tech report, and was published elsewhere as well. I think he advocated this for the SML system. You can do something similar to this in Ultrix and SunOS using the ``mprotect'' call to protect specific pages of memory, and then use a signal handler for faults/seg violations (whatever error is normally raised). I've done this to protect the stack segments for my tasking library; when creating 1000's of tasks, it's a little wasteful of time and space (like, 2x slower and 4x larger when using 8K pages), but it certainly helps locate tasking bugs. Dirk Grunwald -- Univ. of Colorado at Boulder (grunwald@foobar.colorado.edu) (grunwald@boulder.colorado.edu) ------------------------------