[comp.arch] GC triggering and stack limit checking by MMU hardware

torbenm@diku.dk (Torben [gidius Mogensen) (07/19/90)

Most implementations (that I know) of programming languages with
stacks or heaps, use some kind of explicit check to see when a stack
or heap is full. This is used to trigger events such as error
messages, garbage collection or allocation of more memory for a
fragmented stack. Usually, good implementations avoid testing on every
use of the stack / heap, but only once for every basic block or
procedure.

By using the MMU facilities present in most modern computers, these
tests can be entirely done in hardware, and fragmented stacks can be
done transparently. This is the idea:

Give each stack or heap a small number of pages of memory and make
sure that the logical addresses are far apart, and that a large
logical address space is given each, but where only the first small
part is actually mapped to physical (or virtual) memory.

Assign to each stack/heap an event that is invoked by the MMU trap
handler on access to unmapped memory. An event can ask for a new
physical page to be mapped to the corresponding area, or maybe try to
do garbage collection.

Virtual memory can handle the stack problem, as the only sensible
thing to do would be to assign new pages. However, triggering garbage
collection in the way described above is different from just extending
the virtual memory space. Furthermore, the idea is perfectly usable
even on systems with no virtual memory support.

I believe heap intensive languages like LISP, Prolog or functional
languages could benefit from such a strategy, especially if procedures
and basic blocks are short, which is often the case for object code
for these languages.

My question is: has this been done? And if not, why not?

I suspect that operating systems like Unix doesn't provide such
facilities to user processes, but the issue is then: should they?

	Torben Mogensen (torbenm@diku.dk)

grunwald@foobar.colorado.edu (Dirk Grunwald) (07/20/90)

T> Assign to each stack/heap an event that is invoked by the MMU trap
T> handler on access to unmapped memory. An event can ask for a new
T> physical page to be mapped to the corresponding area, or maybe try to
T> do garbage collection.
T> 
T> Virtual memory can handle the stack problem, as the only sensible
T> thing to do would be to assign new pages. However, triggering garbage
T> collection in the way described above is different from just extending
T> the virtual memory space. Furthermore, the idea is perfectly usable
T> even on systems with no virtual memory 
	...
T> 
T> My question is: has this been done? And if not, why not?
	...
T> 	Torben Mogensen (torbenm@diku.dk)
--

Yes, see work by Andrew Appel on just this notion. I know it came out
as DEC-SRC tech report, and was published elsewhere as well. I think
he advocated this for the SML system.

You can do something similar to this in Ultrix and SunOS using the
``mprotect'' call to protect specific pages of memory, and then use a
signal handler for faults/seg violations (whatever error is normally
raised).

I've done this to protect the stack segments for my tasking library;
when creating 1000's of tasks, it's a little wasteful of time and
space (like, 2x slower and 4x larger when using 8K pages), but it
certainly helps locate tasking bugs.

Dirk Grunwald -- Univ. of Colorado at Boulder	(grunwald@foobar.colorado.edu)
						(grunwald@boulder.colorado.edu)

gjc@paradigm.com (07/20/90)

> Most implementations (that I know) of programming languages with
> stacks or heaps, use some kind of explicit check to see when a stack
> or heap is full...
> By using the MMU facilities present in most modern computers, these
> tests can be entirely done in hardware...

The VAX-NIL implementation of lisp, cica 1981, did this. The heap was
easy, but the stack was trickier, requiring a supervisor mode handler
in the case of the MAIN CONTROL stack, (and to be honest, the scheme for 
stack overflow detection was never implemented that way, and instead a
check was made, which only added an instruction to the function call
setup after entry).

But in current processor technology, where you have multi-mip RISC
processors running at many times faster rates than available
memory bandwith you can just initiate the memory operation first,
and then do the bounds check while you are waiting for the memory
operation to complete. The kind of thing you might do in a
microcoded lisp implementation.

More complete MMU implementations that could differentiate between
tagged pointer data references and untagged, and could help keep
track of object volatilities on memory pages would of course really
be a help.

-gjc

hg@ecs.soton.ac.uk (Hugh Glaser) (07/20/90)

Tony Field did something like this on the original Hope/FPM system
(running on VAX/VMS), I guess circa 1985.

jan@swi.psy.uva.nl (Jan Wielemaker) (07/20/90)

In article <1990Jul19.151524.22544@diku.dk> torbenm@diku.dk (Torben [gidius Mogensen) writes:
>
>Most implementations (that I know) of programming languages with
>stacks or heaps, use some kind of explicit check to see when a stack
>or heap is full.
[stuff deleted]
>By using the MMU facilities present in most modern computers, these
>tests can be entirely done in hardware, and fragmented stacks can be
>done transparently. This is the idea:
[stuff deleted]
>My question is: has this been done? And if not, why not?

I implemented this schema for SWI-Prolog.  Richard Tobin of Edinburgh
University noted me at the NACLP in Cleveland last year of the
possibility to reach the MMU under SunOs 4.0 via the mmap() and related
calls.  The schema supports expanding the stacks and triggering the
garbage collector.  It works as follows:

    1) A file of 1 page is created on /tmp and opened for reading.
    2) Computed from the command line options for the stack limits,
       virtual address space is reserved for all the Prolog stacks.
       A 1 page gap is always maintained between the gaps.
    3) On a segmentation violation, the stack causing the problem
       is determined.  On SunOs the address is passed via the signal
       handler; on systems that do not provide this I walk along
       all stacks, comparing the current top with the allocated
       top.  Next, a page is added to the stack and a call to
       considerGarbageCollect() is made.  Afterwards the signal
       handler returns.
    4) considerGarbageCollect() checks whether this is a stack
       that needs garbage collection and how much the stack has
       grown after the last garbage collection on this stack. If
       a garbage collection is appropriate a global flag is set
       to indicate such.
    5) On a save point (the call port for Prolog) the global flag
       indicating a garbage collection request is checked and if
       requested a garbage collection is executed.  After the
       garbage collection all pages above the new stack top are
       unmapped from the address space.

       It is next to impossible to invoke the garbage collector
       direct from the signal handler.  One does not know the
       status of the virtual machine and the stacks are not
       guarantied to be consistent at an arbirary point in time.

The proposed schema is very simple to implement.  It provides
dynamically growing stacks (to some limit, but on most modern machines
this limit can be high as the virtual address space is large).  Expanding
the stacks is almost for free.  On conventional systems a stack shifter
needs to be implemented.  This is tiresome and shifting the stacks is
an expensive operation.

I have seen mmap() and related calls only on SunOs 4.0.  The call maps
at file a specific address in the virtual address space.  SunOs both
supports shared maps (writing in the mapped addresses are forwarded to
the file) and private maps (writing is not forwarded to the file).  For
the dynamic stacks I map the same 1 page file many times using a private
map at different places.

On some system-V Unix machines the same results can be obtained using
shared memory.  In this case the shared memory segments are allocated
and mapped in the stacks.  I implemented this on a GOULD PN9000 under
UTX 2.0?  It works nice on SunOs (as an alternative to mmap()) as well.
On HPUX 6.5 for one or another reason all stacks point to the same
physical memory (the same bug occurs in SunOs 4.0.0 on SUN-3; only on
4.0.1 and later things are ok).  Finally on AIX 2.0 on a PS2 machine
shared memory segments can only be mapped on 4M boundaries, which
implies only a difficult schema can be used: 1) create a new segment,
larger than the old one and map it at an arbitrary place; 2) copy the
contents of the old into the new; 3) deallocate the old; 4) map the new
into the old place and unmap it from the temporary place.  I did not yet
implement this latter schema.

Both schemas are not ideal.  The mmap() requires a file and reads the
file into the page as a page is mapped.  This is a waste of resources; I
just want a page of uninitialised memory and need no file.  The shared
memory solution is not ideal as well because shared memory objects are
global resources in system-V and only a limited number of them is
available to ALL processes.  Also, the process is responsible for
deallocating the resource, otherwise they survive the terminating
process.  It would be ideal to have two simple calls:

    map(base, lenght)
    unmap(base, lenght)

	Jan

schwartz@groucho.cs.psu.edu (Scott E. Schwartz) (07/21/90)

In article <3260@swi.swi.psy.uva.nl> 
 jan@swi.psy.uva.nl (Jan Wielemaker) writes:
| Both schemas are not ideal.  The mmap() requires a file and reads the
| file into the page as a page is mapped.  This is a waste of resources; I
| just want a page of uninitialised memory and need no file.

Rather than mapping a file, you can map /dev/zero, which provides zero
filled copy on write pages.

lkaplan@bbn.com (Larry Kaplan) (07/21/90)

In article <Fhtz5421@cs.psu.edu> schwartz@groucho.cs.psu.edu (Scott E. Schwartz) writes:
>
>In article <3260@swi.swi.psy.uva.nl> 
> jan@swi.psy.uva.nl (Jan Wielemaker) writes:
>| Both schemas are not ideal.  The mmap() requires a file and reads the
>| file into the page as a page is mapped.  This is a waste of resources; I
>| just want a page of uninitialised memory and need no file.
>
>Rather than mapping a file, you can map /dev/zero, which provides zero
>filled copy on write pages.

Ever hear of anonymous mapped memory.  Check out the 4.3BSD Architecture Manual
(PS1:6) for a description of this rarely (or almost never) implemented 
feature.  Using MAP_ANON causes the memory to be zero filled, but not 
necessarily copy on write.  It is more meant for sharing of zero initialized
pages.  The file descriptor is only used for rendezvous.  We've implemented
it here and found it very useful.

#include <std_disclaimer>
_______________________________________________________________________________
				 ____ \ / ____
Laurence S. Kaplan		|    \ 0 /    |		BBN Advanced Computers
lkaplan@bbn.com			 \____|||____/		10 Fawcett St.
(617) 873-2431			  /__/ | \__\		Cambridge, MA  02238

pereira@alice.UUCP (Fernando Pereira) (07/21/90)

In article <1990Jul19.151524.22544@diku.dk> torbenm@diku.dk (Torben [gidius Mogensen) writes:
>
>By using the MMU facilities present in most modern computers, these
>tests can be entirely done in hardware, and fragmented stacks can be
>done transparently. This is the idea:
>
>Give each stack or heap a small number of pages of memory and make
>sure that the logical addresses are far apart, and that a large
>logical address space is given each, but where only the first small
>part is actually mapped to physical (or virtual) memory.
>
>Assign to each stack/heap an event that is invoked by the MMU trap
>handler on access to unmapped memory. An event can ask for a new
>physical page to be mapped to the corresponding area, or maybe try to
>do garbage collection.
> [...]
>My question is: has this been done? And if not, why not?
>
>I suspect that operating systems like Unix doesn't provide such
>facilities to user processes, but the issue is then: should they?
>

I know of Prolog and ML implementers that have tried this, but unfortunately
it is not a portable solution in Unix, even those versions that offer the
mmap system call to create ``islands'' of virtual address space. The problem
is that certain machine architectures (eg. Motorola 68K) and OS
implementations (eg. SunOS at least in some versions) do not provide
a continuable address violation signal (SIGSEGV), even though at the kernel
level, address translation faults (page faults) are continuable. Not having
looked at the insides of those machine/OS combinations, I suspect that
enough instruction execution context can be saved for filling a page fault
in the kernel, but not enough for reentering the faulting process to
handle a Unix signal and allocate the needed storage.

These observations are the result of practical experiments 5 or so years
ago with Sun 2's and VAXen running Berkeley Unix. The former could not
recover correctly from the segmentation violation (PC corrupted on return
from the signal), the latter could. Newer machines/architectures might handle
this better, although comments I have heard from the SML-NJ implementers
indicate that there are problem configurations even now.

Fernando Pereira
2D-447, AT&T Bell Laboratories
600 Mountain Ave, Murray Hill, NJ 07974
pereira@research.att.com

jac@paul.rutgers.edu (Jonathan A. Chandross) (07/21/90)

[ note editing of newsgroups line ]

torbenm@diku.dk (Torben [gidius Mogensen)
> Most implementations (that I know) of programming languages with
> stacks or heaps, use some kind of explicit check to see when a stack
> or heap is full. This is used to trigger events such as error
> messages, garbage collection or allocation of more memory for a
> fragmented stack.
> [ using an mmu to do this deleted ]

There is another way.  What you do is generalize the trap mechanism.
Traditionally, the trap has been viewed as an error condition --
access out of bounds, overflow, floating point exception, etc.
Instead, treat the trap as a notification mechanism.

Consider a scheme where some of the processor's registers have a bit
of additional hardware associated with them.  That hardware is just a
comparator, a value to compare against, and an address to branch to if
some user-specified trap condition is satisfied.

Each cycle the register's value is compared with the comparand value.
If the relation is satisfied control is transferred to the specified
address.  The address of the instruction which caused the trap is
available in a register so you can resume processing.

I describe how you can use such a mechanism for flow of control,
drastically reducing loop overhead, and to implement a cache of
the past n subroutine return addresses in: 
	%A Jonathan A. Chandross
	%A H. V. Jagadish
	%A Abhaya Asthana
	%T The Trap as a Flow Control Mechanism
	%J Proceedings 21st Annual Workshop on Microprogramming (MICRO 21)
	%D 1988


Jonathan A. Chandross
Internet: jac@paul.rutgers.edu
UUCP: rutgers!paul.rutgers.edu!jac

mo@messy.bellcore.com (Michael O'Dell) (07/22/90)

Interlisp did this as a matter of course, and Vax-Interlisp was
the single biggest reason for the originally-published
(as opposed to originally discussed but not published)
4.2BSD virtual memory proposal being page-based, just like Tenex.
This was finally implemented in Mach and SunOS 4.0, and
is forthcoming in 4.4BSD.

While this sounds like a good idea, it may not be on very fast
computers.  Providing user programs with completely general
program state access on segmentation faults can be VERY expensive.
I worked on one design whereby the kernel could do all the
usual stuff (copy-on-write, etc) by "reeffecting" faulting instructions
even in the face of out-of-order instruction completion,  But 
letting the user program poke its nose around as described by the
signal() interface requires "atomic trap behavior", and
that makes a program run MUCH MUCH slower on some machines.
For details, see my paper "Putting UNIX on Fast Computers"
in the latest USENIX Anaheim Conference Proceedings.

	-Mike

guy@auspex.auspex.com (Guy Harris) (07/22/90)

(UNIX issue, not system architecture issue, at this point)

>>Rather than mapping a file, you can map /dev/zero, which provides zero
>>filled copy on write pages.

No, it provides zero-fill on demand pages; there's no copy-on-write
involved, and no copy-on-write needed.  (It provides the same kind of
pages as the kind that lie behind BSS space when an executable image is
run, behind pages added on with "sbrk()", and behind pages added on to
swap space.)

>Ever hear of anonymous mapped memory.

Yes, it's what you get when you "mmap()" "/dev/zero" in SunOS 4.x and
System V Release 4.

>Check out the 4.3BSD Architecture Manual (PS1:6) for a description of
>this rarely (or almost never) implemented feature.

Yup.  That section says that the "flags" field of the "mmap()" call
has one of two mapping types, MAP_FILE and MAP_ANON, defined as 0x0001
and 0x0002 in <sys/mman.h>, and one of two sharing types, MAP_SHARED and
MAP_PRIVATE, defined as 0x0010 and 0x0000 in <sys/mman.h>.

Of course, the <sys/mman.h> that actually *came* with 4.3BSD and
4.3-tahoe has, instead,

	/* sharing types: choose either SHARED or PRIVATE */
	#define	MAP_SHARED	1		/* share changes */
	#define	MAP_PRIVATE	2		/* changes are private */

which would seem to conflict with MAP_FILE and MAP_ANON, and doesn't
have MAP_FILE or MAP_ANON themselves.

So Sun, I guess, is guilty of modeling "mmap()" more after what 4.3BSD
actually appears to have intended to do, based on the code, rather than
what the 4.3BSD Architecture Manual claimed they intended to do.  Then
again, often, when confronted with UNIX, you have to go with what the
code says, not what the documentation says; Berkeley may simply have
changed their mind and forgotten to update the Architecture Manual....

>Using MAP_ANON causes the memory to be zero filled, but not 
>necessarily copy on write.

Yup, just like mapping "/dev/zero".

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (07/22/90)

If my memory serves me correctly, EMAS Prolog (a Prolog interpreter
written in IMP, running under the EMAS operating system on ICL 2900s)
handled Prolog's several stacks by mapping each of them to a scratch
file, leaving unmapped pages in between.  This was _early_ 80s.  The
C Prolog interpreter was derived from EMAS Prolog, and it was going
to use the same scheme just as soon as Berkeley implemented mmap (the
4.1 manuals said it would be in 4.2...).  Isn't the name for this scheme
"red pages"?  

-- 
Science is all about asking the right questions.  | ok@goanna.cs.rmit.oz.au
I'm afraid you just asked one of the wrong ones.  | (quote from Playfair)

guy@auspex.auspex.com (Guy Harris) (07/24/90)

>The problem is that certain machine architectures (eg. Motorola 68K) and OS
>implementations (eg. SunOS at least in some versions) do not provide
>a continuable address violation signal (SIGSEGV), even though at the kernel
>level, address translation faults (page faults) are continuable. Not having
>looked at the insides of those machine/OS combinations, I suspect that
>enough instruction execution context can be saved for filling a page fault
>in the kernel, but not enough for reentering the faulting process to
>handle a Unix signal and allocate the needed storage.

The amount of instruction execution context is the same in both cases;
the only difference is where it has to be stored.  I think SunOS 4.1 may
store enough of it outside the kernel stack to permit *one* such fault
to be continued from (i.e., don't expect to be able to return from a
SIGSEGV that occurs while you're handling a SIGSEGV).

Part of the problem is that Motorola:

1) wouldn't commit to the the "stack puke" stored by the 680[1andup]0
   being "safe" to hand to user-mode code; i.e., they wouldn't say
   "nothing you can do to the 'stack puke' is risky";

and

2) wouldn't describe the format of the "stack puke" to the extent
   necessary to have the kernel validate it.

I can see their not doing so as being perfectly legitimate; for all I
know, different revisions of the same chip may have different "stack
puke" formats, and even if they don't, they might not want any of that
stuff to be considered a "feature", and have people then write code
dependent on that stuff and lock them into continuing to provide those
"features".  It does, however, complicate the task of allowing user-mode
code to continue from a SIGSEGV.

>These observations are the result of practical experiments 5 or so years
>ago with Sun 2's and VAXen running Berkeley Unix. The former could not
>recover correctly from the segmentation violation (PC corrupted on return
>from the signal), the latter could.

The former has more context than the latter; the former has the 68010
"stack puke", the latter has, as I remember, just the First Part Done
bit (and some of the general-purpose registers, for some of the
long-running instructions like MOVCn).

The latter is safe and, I think, appears in the "signal context"
saved by a BSD signal, so that the instruction can be continued from
user mode without the kernel having to tuck away one or more sets of
context.

>Newer machines/architectures might handle this better,

I think most RISC machines (not entirely surprisingly) have less or no
context of that sort; I'd expect things to work OK on a SPARC-based Sun,
for example, as well as a MIPS-based machine. 

In fact, what architectures other than the 68K architectures have lots
of context for that?  I don't think the 386 or the WE32K, for example,
have that problem.

dswartz@bigbootay.sw.stratus.com (Dan Swartzendruber) (07/24/90)

In article <3729@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
::The problem is that certain machine architectures (eg. Motorola 68K) and OS
::implementations (eg. SunOS at least in some versions) do not provide
::a continuable address violation signal (SIGSEGV), even though at the kernel
::level, address translation faults (page faults) are continuable. Not having
::looked at the insides of those machine/OS combinations, I suspect that
::enough instruction execution context can be saved for filling a page fault
::in the kernel, but not enough for reentering the faulting process to
::handle a Unix signal and allocate the needed storage.
:
:The amount of instruction execution context is the same in both cases;
:the only difference is where it has to be stored.  I think SunOS 4.1 may
:store enough of it outside the kernel stack to permit *one* such fault
:to be continued from (i.e., don't expect to be able to return from a
:SIGSEGV that occurs while you're handling a SIGSEGV).
:
:Part of the problem is that Motorola:
:
:1) wouldn't commit to the the "stack puke" stored by the 680[1andup]0
:   being "safe" to hand to user-mode code; i.e., they wouldn't say
:   "nothing you can do to the 'stack puke' is risky";
:
:and
:
:2) wouldn't describe the format of the "stack puke" to the extent
:   necessary to have the kernel validate it.
:
:I can see their not doing so as being perfectly legitimate; for all I
:know, different revisions of the same chip may have different "stack
:puke" formats, and even if they don't, they might not want any of that
:stuff to be considered a "feature", and have people then write code
:dependent on that stuff and lock them into continuing to provide those
:"features".  It does, however, complicate the task of allowing user-mode
:code to continue from a SIGSEGV.

This is true!  I have been at companies doing 680x0 products where you
had to absolutely sure that all processors on the machine were running
compatible versions of microcode.  Otherwise you could get into the
scenario where processor #1 takes a page fault, dumps "stack puke", the
page fault is serviced, and the user process is restarted on a processor
with incompatible "stack puke".  The net result is not good!

:
::These observations are the result of practical experiments 5 or so years
::ago with Sun 2's and VAXen running Berkeley Unix. The former could not
::recover correctly from the segmentation violation (PC corrupted on return
::from the signal), the latter could.
:
:The former has more context than the latter; the former has the 68010
:"stack puke", the latter has, as I remember, just the First Part Done
:bit (and some of the general-purpose registers, for some of the
:long-running instructions like MOVCn).
:
:The latter is safe and, I think, appears in the "signal context"
:saved by a BSD signal, so that the instruction can be continued from
:user mode without the kernel having to tuck away one or more sets of
:context.
:
::Newer machines/architectures might handle this better,
:
:I think most RISC machines (not entirely surprisingly) have less or no
:context of that sort; I'd expect things to work OK on a SPARC-based Sun,
:for example, as well as a MIPS-based machine. 
:
:In fact, what architectures other than the 68K architectures have lots
:of context for that?  I don't think the 386 or the WE32K, for example,
:have that problem.


--

Dan S.

jkenton@pinocchio.encore.com (Jeff Kenton) (07/24/90)

From article <3729@auspex.auspex.com>, by guy@auspex.auspex.com (Guy Harris):
>>The problem is that certain machine architectures (eg. Motorola 68K) and OS
>>implementations (eg. SunOS at least in some versions) do not provide
>>a continuable address violation signal (SIGSEGV), even though at the kernel
>>level, address translation faults (page faults) are continuable. Not having
> 
>>Newer machines/architectures might handle this better,
> 
> I think most RISC machines (not entirely surprisingly) have less or no
> context of that sort; I'd expect things to work OK on a SPARC-based Sun,
> for example, as well as a MIPS-based machine. 
> 
> In fact, what architectures other than the 68K architectures have lots
> of context for that?  I don't think the 386 or the WE32K, for example,
> have that problem.


The 88000 provides enough information to know what memory access failed,
but not where the instruction was that triggered the failure (the access
enters the memory pipeline while the instruction stream continues).  The
typical kernel faults in a missing page (if possible) and completes the
memory access from within the kernel.  It then returns to the user code
stream at the place where the exception took, which is usually not the
instruction which caused the fault.








- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      jeff kenton  ---	temporarily at jkenton@pinocchio.encore.com	 
		   ---  always at (617) 894-4508  ---
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

henry@zoo.toronto.edu (Henry Spencer) (07/25/90)

In article <3729@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>I think most RISC machines (not entirely surprisingly) have less or no
>context of that sort; I'd expect things to work OK on a SPARC-based Sun,
>for example, as well as a MIPS-based machine. 
>
>In fact, what architectures other than the 68K architectures have lots
>of context for that?  I don't think the 386 or the WE32K, for example,
>have that problem.

I think it's really only an issue on the stack-puke machines, and I
think the 680[123]0 is the full list.  I seem to recall reading that
Motorola has gone for instruction retry rather than restart on the 68040,
which may mean that the stack puke is gone too.  It was always sort of
an artifact of Motorola CPUs not keeping enough information around to
back out of a faulting instruction.

(Well, I should qualify the above or Mike will jump on me... :-)  I
think the 680[123]0 is the full list until you start getting up into the
supercomputer range where the speed of light is a big problem.)
-- 
NFS:  all the nice semantics of MSDOS, | Henry Spencer at U of Toronto Zoology
and its performance and security too.  |  henry@zoo.toronto.edu   utzoo!henry