torbenm@diku.dk (Torben Ægidius Mogensen) (07/19/90)
Most implementations (that I know of) of programming languages with stacks or heaps use some kind of explicit check to see when a stack or heap is full. This is used to trigger events such as error messages, garbage collection or allocation of more memory for a fragmented stack. Usually, good implementations avoid testing on every use of the stack/heap and test only once per basic block or procedure.

By using the MMU facilities present in most modern computers, these tests can be done entirely in hardware, and fragmented stacks can be handled transparently. This is the idea:

Give each stack or heap a small number of pages of memory and make sure that the logical addresses are far apart; each is given a large logical address space, but only the first small part is actually mapped to physical (or virtual) memory.

Assign to each stack/heap an event that is invoked by the MMU trap handler on access to unmapped memory. An event can ask for a new physical page to be mapped to the corresponding area, or maybe try to do garbage collection.

Virtual memory can handle the stack problem, as the only sensible thing to do would be to assign new pages. However, triggering garbage collection in the way described above is different from just extending the virtual memory space. Furthermore, the idea is perfectly usable even on systems with no virtual memory support.

I believe heap-intensive languages like LISP, Prolog or functional languages could benefit from such a strategy, especially if procedures and basic blocks are short, which is often the case for object code from these languages.

My question is: has this been done? And if not, why not?

I suspect that operating systems like Unix don't provide such facilities to user processes, but the issue is then: should they?

Torben Mogensen (torbenm@diku.dk)
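A minimal sketch of the scheme in present-day POSIX C, assuming mmap()/mprotect() and SA_SIGINFO signal handlers (the 1990-era systems discussed in the follow-ups differ); the region size and all names (heap_base, gc_requested, ...) are made up for the example:

    #include <signal.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define REGION_SIZE (64 * 1024 * 1024)     /* large reserved range per stack/heap */

    static char *heap_base;                    /* start of the reserved heap region */
    static long  pagesize;
    static volatile sig_atomic_t gc_requested; /* GC itself runs later, at a safe point */

    /* The "event": if the fault lies inside the reserved heap region, make the
       faulting page accessible and request a garbage collection; otherwise die. */
    static void fault_handler(int sig, siginfo_t *si, void *ctx)
    {
        char *addr = si->si_addr;

        if (addr >= heap_base && addr < heap_base + REGION_SIZE) {
            char *page = (char *)((uintptr_t)addr & ~(uintptr_t)(pagesize - 1));
            mprotect(page, pagesize, PROT_READ | PROT_WRITE);
            gc_requested = 1;
            return;                       /* the faulting access is simply retried */
        }
        abort();                          /* a genuine wild pointer */
    }

    int main(void)
    {
        struct sigaction sa;
        sa.sa_sigaction = fault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        pagesize = sysconf(_SC_PAGESIZE);

        /* Reserve a large region with no access rights; pages are added on demand. */
        heap_base = mmap(NULL, REGION_SIZE, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        heap_base[0] = 1;                 /* first touch faults and maps in one page */
        return 0;
    }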
grunwald@foobar.colorado.edu (Dirk Grunwald) (07/20/90)
T> Assign to each stack/heap an event that is invoked by the MMU trap
T> handler on access to unmapped memory. An event can ask for a new
T> physical page to be mapped to the corresponding area, or maybe try to
T> do garbage collection.
T>
T> Virtual memory can handle the stack problem, as the only sensible
T> thing to do would be to assign new pages. However, triggering garbage
T> collection in the way described above is different from just extending
T> the virtual memory space. Furthermore, the idea is perfectly usable
T> even on systems with no virtual memory ...
T>
T> My question is: has this been done? And if not, why not? ...
T> Torben Mogensen (torbenm@diku.dk)

Yes, see work by Andrew Appel on just this notion. I know it came out as a DEC-SRC tech report, and was published elsewhere as well. I think he advocated this for the SML system.

You can do something similar to this in Ultrix and SunOS using the ``mprotect'' call to protect specific pages of memory, and then use a signal handler for faults/seg violations (whatever error is normally raised). I've done this to protect the stack segments for my tasking library; when creating 1000's of tasks, it's a little wasteful of time and space (like, 2x slower and 4x larger when using 8K pages), but it certainly helps locate tasking bugs.

Dirk Grunwald -- Univ. of Colorado at Boulder
(grunwald@foobar.colorado.edu)
(grunwald@boulder.colorado.edu)
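The mprotect trick described above looks roughly like this (a sketch only; the function names and the one-guard-page layout are illustrative, not taken from the tasking library mentioned):

    #include <signal.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Allocate a task stack with an inaccessible guard page at its low end, so
       that overflow raises an immediate, catchable fault instead of silently
       trashing the neighbouring task's stack. */
    static void *alloc_guarded_stack(size_t size)
    {
        long pagesize = sysconf(_SC_PAGESIZE);
        char *base = mmap(NULL, size + pagesize, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED)
            return NULL;
        mprotect(base, pagesize, PROT_NONE);   /* guard page below the stack */
        return base + pagesize;                /* usable stack begins above it */
    }

    /* Installed with sigaction()/signal() on SIGSEGV (SIGBUS on some systems).
       A tasking library would look up which task owns the faulting guard page
       before aborting or growing that task's stack. */
    static void overflow_handler(int sig)
    {
        static const char msg[] = "task stack overflow\n";
        write(2, msg, sizeof msg - 1);
        _exit(1);
    }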
gjc@paradigm.com (07/20/90)
> Most implementations (that I know) of programming languages with
> stacks or heaps, use some kind of explicit check to see when a stack
> or heap is full...
> By using the MMU facilities present in most modern computers, these
> tests can be entirely done in hardware...

The VAX-NIL implementation of lisp, circa 1981, did this. The heap was easy, but the stack was trickier, requiring a supervisor-mode handler in the case of the MAIN CONTROL stack (and to be honest, the scheme for stack overflow detection was never implemented that way; instead a check was made, which only added one instruction to the function call setup after entry).

But with current processor technology, where you have multi-MIP RISC processors running many times faster than the available memory bandwidth, you can just initiate the memory operation first and then do the bounds check while you are waiting for the memory operation to complete. This is the kind of thing you might do in a microcoded lisp implementation.

More complete MMU implementations that could differentiate between tagged pointer data references and untagged ones, and could help keep track of object volatilities on memory pages, would of course really be a help.

-gjc
hg@ecs.soton.ac.uk (Hugh Glaser) (07/20/90)
Tony Field did something like this on the original Hope/FPM system (running on VAX/VMS), I guess circa 1985.
jan@swi.psy.uva.nl (Jan Wielemaker) (07/20/90)
In article <1990Jul19.151524.22544@diku.dk> torbenm@diku.dk (Torben Ægidius Mogensen) writes:
>
>Most implementations (that I know) of programming languages with
>stacks or heaps, use some kind of explicit check to see when a stack
>or heap is full.
[stuff deleted]
>By using the MMU facilities present in most modern computers, these
>tests can be entirely done in hardware, and fragmented stacks can be
>done transparently. This is the idea:
[stuff deleted]
>My question is: has this been done? And if not, why not?

I implemented this scheme for SWI-Prolog. Richard Tobin of Edinburgh University pointed out to me at NACLP in Cleveland last year that the MMU can be reached under SunOS 4.0 via mmap() and related calls. The scheme supports expanding the stacks and triggering the garbage collector. It works as follows:

1) A file of 1 page is created in /tmp and opened for reading.

2) Based on the command-line options for the stack limits, virtual address space is reserved for all the Prolog stacks. A one-page gap is always maintained between the stacks.

3) On a segmentation violation, the stack causing the problem is determined. On SunOS the address is passed to the signal handler; on systems that do not provide this I walk along all stacks, comparing the current top with the allocated top. Next, a page is added to the stack and a call to considerGarbageCollect() is made. Afterwards the signal handler returns.

4) considerGarbageCollect() checks whether this is a stack that needs garbage collection and how much the stack has grown since the last garbage collection on this stack. If a garbage collection is appropriate, a global flag is set to indicate this.

5) At a safe point (the call port for Prolog) the global flag indicating a garbage collection request is checked, and if requested a garbage collection is executed. After the garbage collection all pages above the new stack top are unmapped from the address space.

It is next to impossible to invoke the garbage collector directly from the signal handler: one does not know the status of the virtual machine, and the stacks are not guaranteed to be consistent at an arbitrary point in time.

The proposed scheme is very simple to implement. It provides dynamically growing stacks (up to some limit, but on most modern machines this limit can be high, as the virtual address space is large). Expanding the stacks is almost free. On conventional systems a stack shifter needs to be implemented instead; this is tiresome, and shifting the stacks is an expensive operation.

I have seen mmap() and related calls only on SunOS 4.0. The call maps a file at a specific address in the virtual address space. SunOS supports both shared maps (writes to the mapped addresses are propagated to the file) and private maps (writes are not propagated to the file). For the dynamic stacks I map the same one-page file many times, using a private map, at different places.

On some System V Unix machines the same result can be obtained using shared memory. In this case shared memory segments are allocated and mapped into the stacks. I implemented this on a Gould PN9000 under UTX 2.0(?). It works nicely on SunOS (as an alternative to mmap()) as well. On HP-UX 6.5, for one reason or another, all stacks point to the same physical memory (the same bug occurs in SunOS 4.0.0 on Sun-3; only on 4.0.1 and later are things OK).
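A sketch of what steps 1) and 3) above might look like in C (this is a guess at the mechanism, not SWI-Prolog source; the file name and function names are illustrative):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static int  page_fd;     /* the one-page scratch file created in /tmp */
    static long pagesize;

    /* Step 1: create a file of exactly one page and keep it open. */
    static void init_page_file(void)
    {
        pagesize = sysconf(_SC_PAGESIZE);
        page_fd = open("/tmp/pl-page", O_RDWR | O_CREAT, 0600);
        ftruncate(page_fd, pagesize);          /* make the file one page long */
    }

    /* Step 3 (called from the SIGSEGV handler): extend the offending stack by
       one page, by privately mapping the same one-page file once more at the
       current mapped top of that stack.  considerGarbageCollect() may then set
       a global flag; the GC itself only runs at the next call port, and
       munmap() later releases pages above the new stack top. */
    static int extend_stack(char *stack_mapped_top)
    {
        void *p = mmap(stack_mapped_top, pagesize, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_FIXED, page_fd, 0);
        return p == MAP_FAILED ? -1 : 0;
    }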
Finally, on AIX 2.0 on a PS/2 machine, shared memory segments can only be mapped on 4 MB boundaries, which means only a more awkward scheme can be used: 1) create a new segment, larger than the old one, and map it at an arbitrary place; 2) copy the contents of the old into the new; 3) deallocate the old; 4) map the new into the old place and unmap it from the temporary place. I have not yet implemented this latter scheme.

Neither scheme is ideal. mmap() requires a file and reads the file into the page as the page is mapped. This is a waste of resources; I just want a page of uninitialised memory and need no file. The shared memory solution is not ideal either, because shared memory objects are global resources in System V, and only a limited number of them is available to ALL processes. Also, the process is responsible for deallocating the resource; otherwise the segments survive after the process terminates.

It would be ideal to have two simple calls:

    map(base, length)
    unmap(base, length)

Jan
schwartz@groucho.cs.psu.edu (Scott E. Schwartz) (07/21/90)
In article <3260@swi.swi.psy.uva.nl> jan@swi.psy.uva.nl (Jan Wielemaker) writes:
| Neither scheme is ideal. mmap() requires a file and reads the file
| into the page as the page is mapped. This is a waste of resources; I
| just want a page of uninitialised memory and need no file.

Rather than mapping a file, you can map /dev/zero, which provides zero-filled copy-on-write pages.
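For instance (a sketch; map_zero_pages is not an existing library routine):

    #include <fcntl.h>
    #include <stddef.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Privately mapping /dev/zero gives demand-zero, copy-on-write pages;
       no scratch file in /tmp is needed. */
    void *map_zero_pages(void *addr, size_t len)
    {
        int fd = open("/dev/zero", O_RDONLY);
        void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | (addr ? MAP_FIXED : 0), fd, 0);
        close(fd);              /* the mapping survives closing the descriptor */
        return p;
    }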
lkaplan@bbn.com (Larry Kaplan) (07/21/90)
In article <Fhtz5421@cs.psu.edu> schwartz@groucho.cs.psu.edu (Scott E. Schwartz) writes:
>In article <3260@swi.swi.psy.uva.nl>
> jan@swi.psy.uva.nl (Jan Wielemaker) writes:
>| Neither scheme is ideal. mmap() requires a file and reads the file
>| into the page as the page is mapped. This is a waste of resources; I
>| just want a page of uninitialised memory and need no file.
>
>Rather than mapping a file, you can map /dev/zero, which provides zero
>filled copy on write pages.

Ever hear of anonymous mapped memory? Check out the 4.3BSD Architecture Manual (PS1:6) for a description of this rarely (or almost never) implemented feature. Using MAP_ANON causes the memory to be zero-filled, but not necessarily copy-on-write. It is meant more for sharing of zero-initialized pages. The file descriptor is only used for rendezvous. We've implemented it here and found it very useful.

#include <std_disclaimer>

Laurence S. Kaplan          BBN Advanced Computers
lkaplan@bbn.com             10 Fawcett St.
(617) 873-2431              Cambridge, MA 02238
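Roughly like this (a sketch; on most later systems that adopted the idea the descriptor is simply passed as -1 for a private anonymous mapping, rather than being used for rendezvous as in the 4.3BSD proposal):

    #include <stddef.h>
    #include <sys/mman.h>

    /* Anonymous mapping: zero-filled pages with no file behind them at all. */
    void *map_anon_pages(size_t len)
    {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANON, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }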
pereira@alice.UUCP (Fernando Pereira) (07/21/90)
In article <1990Jul19.151524.22544@diku.dk> torbenm@diku.dk (Torben Ægidius Mogensen) writes:
>
>By using the MMU facilities present in most modern computers, these
>tests can be entirely done in hardware, and fragmented stacks can be
>done transparently. This is the idea:
>
>Give each stack or heap a small number of pages of memory and make
>sure that the logical addresses are far apart, and that a large
>logical address space is given each, but where only the first small
>part is actually mapped to physical (or virtual) memory.
>
>Assign to each stack/heap an event that is invoked by the MMU trap
>handler on access to unmapped memory. An event can ask for a new
>physical page to be mapped to the corresponding area, or maybe try to
>do garbage collection.
>[...]
>My question is: has this been done? And if not, why not?
>
>I suspect that operating systems like Unix don't provide such
>facilities to user processes, but the issue is then: should they?

I know of Prolog and ML implementers that have tried this, but unfortunately it is not a portable solution in Unix, even in those versions that offer the mmap system call to create ``islands'' of virtual address space. The problem is that certain machine architectures (e.g. the Motorola 68K) and OS implementations (e.g. SunOS, at least in some versions) do not provide a continuable address violation signal (SIGSEGV), even though at the kernel level address translation faults (page faults) are continuable. Not having looked at the insides of those machine/OS combinations, I suspect that enough instruction execution context can be saved for filling a page fault in the kernel, but not enough for reentering the faulting process to handle a Unix signal and allocate the needed storage.

These observations are the result of practical experiments 5 or so years ago with Sun 2's and VAXen running Berkeley Unix. The former could not recover correctly from the segmentation violation (the PC was corrupted on return from the signal); the latter could. Newer machines/architectures might handle this better, although comments I have heard from the SML-NJ implementers indicate that there are problem configurations even now.

Fernando Pereira
2D-447, AT&T Bell Laboratories
600 Mountain Ave, Murray Hill, NJ 07974
pereira@research.att.com
mo@messy.bellcore.com (Michael O'Dell) (07/22/90)
Interlisp did this as a matter of course, and Vax-Interlisp was the single biggest reason for the originally-published (as opposed to originally-discussed-but-not-published) 4.2BSD virtual memory proposal being page-based, just like Tenex. This was finally implemented in Mach and SunOS 4.0, and is forthcoming in 4.4BSD.

While this sounds like a good idea, it may not be on very fast computers. Providing user programs with completely general access to program state on segmentation faults can be VERY expensive. I worked on one design whereby the kernel could do all the usual stuff (copy-on-write, etc.) by "re-effecting" faulting instructions even in the face of out-of-order instruction completion. But letting the user program poke its nose around as described by the signal() interface requires "atomic trap behavior", and that makes a program run MUCH MUCH slower on some machines. For details, see my paper "Putting UNIX on Fast Computers" in the latest USENIX Anaheim Conference Proceedings.

-Mike
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (07/22/90)
If my memory serves me correctly, EMAS Prolog (a Prolog interpreter written in IMP, running under the EMAS operating system on ICL 2900s) handled Prolog's several stacks by mapping each of them to a scratch file, leaving unmapped pages in between. This was _early_ 80s. The C Prolog interpreter was derived from EMAS Prolog, and it was going to use the same scheme just as soon as Berkeley implemented mmap (the 4.1 manuals said it would be in 4.2...).

Isn't the name for this scheme "red pages"?
--
Science is all about asking the right questions.  | ok@goanna.cs.rmit.oz.au
I'm afraid you just asked one of the wrong ones.  | (quote from Playfair)
guy@auspex.auspex.com (Guy Harris) (07/24/90)
>The problem is that certain machine architectures (eg. Motorola 68K) and OS
>implementations (eg. SunOS at least in some versions) do not provide
>a continuable address violation signal (SIGSEGV), even though at the kernel
>level, address translation faults (page faults) are continuable. Not having
>looked at the insides of those machine/OS combinations, I suspect that
>enough instruction execution context can be saved for filling a page fault
>in the kernel, but not enough for reentering the faulting process to
>handle a Unix signal and allocate the needed storage.

The amount of instruction execution context is the same in both cases; the only difference is where it has to be stored. I think SunOS 4.1 may store enough of it outside the kernel stack to permit *one* such fault to be continued from (i.e., don't expect to be able to return from a SIGSEGV that occurs while you're handling a SIGSEGV).

Part of the problem is that Motorola:

1) wouldn't commit to the "stack puke" stored by the 680[1andup]0 being "safe" to hand to user-mode code; i.e., they wouldn't say "nothing you can do to the 'stack puke' is risky"; and

2) wouldn't describe the format of the "stack puke" to the extent necessary to have the kernel validate it.

I can see their not doing so as being perfectly legitimate; for all I know, different revisions of the same chip may have different "stack puke" formats, and even if they don't, they might not want any of that stuff to be considered a "feature", and have people then write code dependent on it, locking them into continuing to provide those "features". It does, however, complicate the task of allowing user-mode code to continue from a SIGSEGV.

>These observations are the result of practical experiments 5 or so years
>ago with Sun 2's and VAXen running Berkeley Unix. The former could not
>recover correctly from the segmentation violation (PC corrupted on return
>from the signal), the latter could.

The former has more context than the latter; the former has the 68010 "stack puke", while the latter has, as I remember, just the First Part Done bit (and some of the general-purpose registers, for some of the long-running instructions like MOVCn). The latter is safe and, I think, appears in the "signal context" saved by a BSD signal, so that the instruction can be continued from user mode without the kernel having to tuck away one or more sets of context.

>Newer machines/architectures might handle this better,

I think most RISC machines (not entirely surprisingly) have less or no context of that sort; I'd expect things to work OK on a SPARC-based Sun, for example, as well as on a MIPS-based machine. In fact, what architectures other than the 68K architectures have lots of context for that? I don't think the 386 or the WE32K, for example, have that problem.