[comp.lang.prolog] GC triggering and stack limit checking by MMU hardware

torbenm@diku.dk (Torben Ægidius Mogensen) (07/19/90)

Most implementations (that I know of) of programming languages with
stacks or heaps use some kind of explicit check to see when a stack
or heap is full. This is used to trigger events such as error
messages, garbage collection or allocation of more memory for a
fragmented stack. Usually, good implementations avoid testing on every
use of the stack / heap, testing only once per basic block or
procedure.

By using the MMU facilities present in most modern computers, these
tests can be entirely done in hardware, and fragmented stacks can be
done transparently. This is the idea:

Give each stack or heap a small number of pages of memory and make
sure that the logical addresses are far apart, and that a large
logical address space is given each, but where only the first small
part is actually mapped to physical (or virtual) memory.

Assign to each stack/heap an event that is invoked by the MMU trap
handler on access to unmapped memory. An event can ask for a new
physical page to be mapped to the corresponding area, or maybe try to
do garbage collection.

Virtual memory can handle the stack problem, as the only sensible
thing to do would be to assign new pages. However, triggering garbage
collection in the way described above is different from just extending
the virtual memory space. Furthermore, the idea is perfectly usable
even on systems with no virtual memory support.
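On a system that does expose the MMU to user code, the idea can be sketched roughly as follows. This is a hypothetical C sketch, not from the original post, assuming a POSIX-style mmap()/sigaction() interface; the names `region` and `grow_demo` are mine:

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS and SA_SIGINFO */
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Sketch of the scheme: reserve a large span of logical addresses
   with no access, and let the SIGSEGV handler play the part of the
   MMU trap "event", mapping one more page per fault.  A real system
   could trigger garbage collection here instead. */

#define REGION_SIZE (1 << 20)   /* 1 MB of reserved logical addresses */

static char *region;            /* base of the reserved region */
static size_t mapped;           /* bytes currently accessible */
static size_t page;

static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    char *addr = (char *)si->si_addr;
    (void)sig; (void)ctx;
    if (addr < region || addr >= region + REGION_SIZE)
        _exit(1);               /* a genuine wild pointer */
    /* The "event": make one more page accessible, then return so the
       faulting instruction re-executes and succeeds. */
    mprotect(region + mapped, page, PROT_READ | PROT_WRITE);
    mapped += page;
}

int grow_demo(void)
{
    struct sigaction sa;
    page = (size_t)sysconf(_SC_PAGESIZE);

    /* Reserve logical addresses without mapping physical memory. */
    region = mmap(NULL, REGION_SIZE, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    region[0] = 1;              /* faults; handler maps the first page */
    region[page + 1] = 2;       /* faults again; second page mapped */
    return (int)(mapped / page);
}
```

The object code pays nothing at all for the checks; the cost is one trap per page of growth.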

I believe heap intensive languages like LISP, Prolog or functional
languages could benefit from such a strategy, especially if procedures
and basic blocks are short, which is often the case for object code
for these languages.

My question is: has this been done? And if not, why not?

I suspect that operating systems like Unix don't provide such
facilities to user processes, but the issue is then: should they?

	Torben Mogensen (torbenm@diku.dk)

grunwald@foobar.colorado.edu (Dirk Grunwald) (07/20/90)

T> Assign to each stack/heap an event that is invoked by the MMU trap
T> handler on access to unmapped memory. An event can ask for a new
T> physical page to be mapped to the corresponding area, or maybe try to
T> do garbage collection.
T> 
T> Virtual memory can handle the stack problem, as the only sensible
T> thing to do would be to assign new pages. However, triggering garbage
T> collection in the way described above is different from just extending
T> the virtual memory space. Furthermore, the idea is perfectly usable
T> even on systems with no virtual memory 
	...
T> 
T> My question is: has this been done? And if not, why not?
	...
T> 	Torben Mogensen (torbenm@diku.dk)
--

Yes, see work by Andrew Appel on just this notion. I know it came out
as a DEC-SRC tech report, and was published elsewhere as well. I think
he advocated this for the SML system.

You can do something similar to this in Ultrix and SunOS using the
``mprotect'' call to protect specific pages of memory, and then use a
signal handler for faults/seg violations (whatever error is normally
raised).

I've done this to protect the stack segments for my tasking library;
when creating 1000's of tasks, it's a little wasteful of time and
space (like, 2x slower and 4x larger when using 8K pages), but it
certainly helps locate tasking bugs.
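A minimal sketch of this guard-page trick, assuming POSIX mprotect() and SA_SIGINFO signals (the names and the recover-by-unprotecting step are mine, for demonstration; a real tasking library would abort or repair the offending task instead):

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS and SA_SIGINFO */
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Guard pages for task stacks: the page below each stack is made
   inaccessible, so running off the end raises SIGSEGV instead of
   silently corrupting a neighbouring task's stack. */

#define STACK_PAGES 4

static char *guard;                       /* base of the guard page */
static size_t pagesz;
static volatile sig_atomic_t overflowed;  /* set when a stack is blown */

static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    char *addr = (char *)si->si_addr;
    (void)sig; (void)ctx;
    if (addr < guard || addr >= guard + pagesz)
        _exit(1);                         /* not our guard page */
    overflowed = 1;
    /* For this demo only, open the page so the faulting write can
       proceed; a real library would kill the task here. */
    mprotect(guard, pagesz, PROT_READ | PROT_WRITE);
}

int overflow_demo(void)
{
    struct sigaction sa;
    char *stack;

    pagesz = (size_t)sysconf(_SC_PAGESIZE);

    /* Guard page plus usable stack, allocated together. */
    stack = mmap(NULL, (STACK_PAGES + 1) * pagesz,
                 PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (stack == MAP_FAILED)
        return -1;
    guard = stack;                        /* stacks grow downward */
    mprotect(guard, pagesz, PROT_NONE);

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    guard[10] = 42;                       /* simulate the overflowing push */
    return (int)overflowed;
}
```

The space overhead Dirk mentions comes from rounding every task stack up to whole pages plus one guard page each.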

Dirk Grunwald -- Univ. of Colorado at Boulder	(grunwald@foobar.colorado.edu)
						(grunwald@boulder.colorado.edu)

gjc@paradigm.com (07/20/90)

> Most implementations (that I know) of programming languages with
> stacks or heaps, use some kind of explicit check to see when a stack
> or heap is full...
> By using the MMU facilities present in most modern computers, these
> tests can be entirely done in hardware...

The VAX-NIL implementation of lisp, circa 1981, did this. The heap was
easy, but the stack was trickier, requiring a supervisor-mode handler
in the case of the MAIN CONTROL stack.  (To be honest, the scheme for
stack overflow detection was never implemented that way; instead a
check was made, which only added an instruction to the function call
setup after entry.)

But with current processor technology, where you have multi-MIP RISC
processors running many times faster than available memory
bandwidth, you can just initiate the memory operation first,
and then do the bounds check while you are waiting for the memory
operation to complete.  This is the kind of thing you might do in a
microcoded lisp implementation.

More complete MMU implementations that could differentiate between
tagged pointer data references and untagged, and could help keep
track of object volatilities on memory pages would of course really
be a help.

-gjc

pgl@cup.portal.com (Peter G Ludemann) (07/20/90)

IBM-Prolog has used the memory-trap technique for local and
global stack overflow since 1985 or so.  An addressing exception
triggers garbage collection.

The scheme is completely general; the error-catch mechanism
can even catch stack overflow errors and do something reasonable
with them (for example, I use this as a way of trapping left
recursion in a grammar-rule interpreter).

- peter ludemann	(usual disclaimer)

hg@ecs.soton.ac.uk (Hugh Glaser) (07/20/90)

Tony Field did something like this on the original Hope/FPM system
(running on VAX/VMS), I guess circa 1985.

jan@swi.psy.uva.nl (Jan Wielemaker) (07/20/90)

In article <1990Jul19.151524.22544@diku.dk> torbenm@diku.dk (Torben Ægidius Mogensen) writes:
>
>Most implementations (that I know) of programming languages with
>stacks or heaps, use some kind of explicit check to see when a stack
>or heap is full.
[stuff deleted]
>By using the MMU facilities present in most modern computers, these
>tests can be entirely done in hardware, and fragmented stacks can be
>done transparently. This is the idea:
[stuff deleted]
>My question is: has this been done? And if not, why not?

I implemented this scheme for SWI-Prolog.  Richard Tobin of Edinburgh
University told me at NACLP in Cleveland last year about the
possibility of reaching the MMU under SunOS 4.0 via the mmap() and related
calls.  The scheme supports expanding the stacks and triggering the
garbage collector.  It works as follows:

    1) A file of 1 page is created in /tmp and opened for reading.
    2) Based on the command line options for the stack limits,
       virtual address space is reserved for all the Prolog stacks.
       A 1 page gap is always maintained between the stacks.
    3) On a segmentation violation, the stack causing the problem
       is determined.  On SunOS the address is passed to the signal
       handler; on systems that do not provide this I walk along
       all stacks, comparing the current top with the allocated
       top.  Next, a page is added to the stack and a call to
       considerGarbageCollect() is made.  Afterwards the signal
       handler returns.
    4) considerGarbageCollect() checks whether this is a stack
       that needs garbage collection and how much the stack has
       grown after the last garbage collection on this stack. If
       a garbage collection is appropriate a global flag is set
       to indicate such.
    5) On a save point (the call port for Prolog) the global flag
       indicating a garbage collection request is checked and if
       requested a garbage collection is executed.  After the
       garbage collection all pages above the new stack top are
       unmapped from the address space.

       It is next to impossible to invoke the garbage collector
       directly from the signal handler.  One does not know the
       status of the virtual machine, and the stacks are not
       guaranteed to be consistent at an arbitrary point in time.
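Steps 3 to 5 might look roughly like this in C. This is a hedged sketch under my own naming, not SWI-Prolog's actual code; considerGarbageCollect() is reduced to setting a flag, and the call port to checking it:

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS and SA_SIGINFO */
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* The handler only grows the stack and raises a flag; the expensive
   GC work happens later, at a safe point (the Prolog call port),
   where the virtual machine state is known to be consistent. */

#define STACK_MAX (1 << 20)

static char *stk;                         /* base of one Prolog stack */
static size_t stk_mapped;                 /* accessible bytes */
static size_t pg;
static volatile sig_atomic_t gc_requested;
static int gc_runs;

static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    char *addr = (char *)si->si_addr;
    (void)sig; (void)ctx;
    if (addr < stk || addr >= stk + STACK_MAX)
        _exit(1);                         /* not one of our stacks */
    mprotect(stk + stk_mapped, pg, PROT_READ | PROT_WRITE);
    stk_mapped += pg;
    gc_requested = 1;    /* considerGarbageCollect(), in essence */
}

static void call_port(void)               /* the save point */
{
    if (gc_requested) {                   /* stacks consistent here */
        gc_requested = 0;
        gc_runs++;                        /* a real system collects here */
    }
}

int gc_demo(void)
{
    struct sigaction sa;
    pg = (size_t)sysconf(_SC_PAGESIZE);
    stk = mmap(NULL, STACK_MAX, PROT_NONE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (stk == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    stk[0] = 1;                           /* push: faults, flag raised */
    call_port();                          /* GC happens here, not in handler */
    return gc_runs;
}
```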

The proposed scheme is very simple to implement.  It provides
dynamically growing stacks (to some limit, but on most modern machines
this limit can be high, as the virtual address space is large).  Expanding
the stacks is almost free.  On conventional systems a stack shifter
needs to be implemented.  This is tiresome, and shifting the stacks is
an expensive operation.

I have seen mmap() and related calls only on SunOS 4.0.  The call maps
a file at a specific address in the virtual address space.  SunOS
supports both shared maps (writes to the mapped addresses are forwarded
to the file) and private maps (writes are not forwarded to the file).  For
the dynamic stacks I map the same 1 page file many times, using a private
map, at different places.

On some System V Unix machines the same results can be obtained using
shared memory.  In this case shared memory segments are allocated
and mapped in as the stacks.  I implemented this on a GOULD PN9000 under
UTX 2.0(?).  It works nicely on SunOS (as an alternative to mmap()) as well.
On HP-UX 6.5, for one reason or another, all stacks point to the same
physical memory (the same bug occurs in SunOS 4.0.0 on SUN-3; only on
4.0.1 and later are things ok).  Finally, on AIX 2.0 on a PS/2 machine
shared memory segments can only be mapped on 4M boundaries, which
implies only a difficult scheme can be used: 1) create a new segment,
larger than the old one, and map it at an arbitrary place; 2) copy the
contents of the old into the new; 3) deallocate the old; 4) map the new
into the old place and unmap it from the temporary place.  I have not yet
implemented this latter scheme.

Neither scheme is ideal.  mmap() requires a file, and reads the
file into the page as the page is mapped.  This is a waste of resources; I
just want a page of uninitialised memory and need no file.  The shared
memory solution is not ideal either, because shared memory objects are
global resources in System V and only a limited number of them is
available to ALL processes.  Also, the process is responsible for
deallocating the resource; otherwise they survive the terminating
process.  It would be ideal to have two simple calls:

    map(base, length)
    unmap(base, length)
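For what it is worth, these two wished-for calls correspond closely to what an anonymous-mapping mmap()/munmap() pair can provide. A speculative sketch, assuming a modern POSIX mmap() with MAP_ANONYMOUS and MAP_FIXED, neither of which was widely available at the time; the wrapper names are mine:

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS */
#include <sys/mman.h>
#include <unistd.h>

/* map() makes [base, base+len) accessible, backed by zero-filled
   anonymous pages; unmap() gives the pages back to the system. */

static int map_region(void *base, size_t len)
{
    void *p = mmap(base, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}

static int unmap_region(void *base, size_t len)
{
    return munmap(base, len);
}

int wish_demo(void)
{
    long pg = sysconf(_SC_PAGESIZE);
    int v;

    /* Reserve a span first, so MAP_FIXED lands on addresses we own. */
    char *base = mmap(NULL, (size_t)pg * 4, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    if (map_region(base, (size_t)pg) != 0)
        return -2;
    base[0] = 7;                          /* now accessible */
    v = base[0];
    if (unmap_region(base, (size_t)pg) != 0)
        return -3;
    return v;                             /* page already given back */
}
```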

	Jan

schwartz@groucho.cs.psu.edu (Scott E. Schwartz) (07/21/90)

In article <3260@swi.swi.psy.uva.nl> 
 jan@swi.psy.uva.nl (Jan Wielemaker) writes:
| Neither scheme is ideal.  mmap() requires a file, and reads the
| file into the page as the page is mapped.  This is a waste of resources; I
| just want a page of uninitialised memory and need no file.

Rather than mapping a file, you can map /dev/zero, which provides zero
filled copy on write pages.

lkaplan@bbn.com (Larry Kaplan) (07/21/90)

In article <Fhtz5421@cs.psu.edu> schwartz@groucho.cs.psu.edu (Scott E. Schwartz) writes:
>
>In article <3260@swi.swi.psy.uva.nl> 
> jan@swi.psy.uva.nl (Jan Wielemaker) writes:
>| Neither scheme is ideal.  mmap() requires a file, and reads the
>| file into the page as the page is mapped.  This is a waste of resources; I
>| just want a page of uninitialised memory and need no file.
>
>Rather than mapping a file, you can map /dev/zero, which provides zero
>filled copy on write pages.

Ever hear of anonymous mapped memory?  Check out the 4.3BSD Architecture Manual
(PS1:6) for a description of this rarely (or almost never) implemented
feature.  Using MAP_ANON causes the memory to be zero filled, but not
necessarily copy-on-write.  It is meant more for sharing of zero-initialized
pages.  The file descriptor is only used for rendezvous.  We've implemented
it here and found it very useful.
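The two file-less alternatives, /dev/zero and anonymous mapping, can be sketched side by side. This assumes a modern POSIX mmap(); MAP_ANONYMOUS is the later spelling of MAP_ANON, and the function name is mine:

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS */
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Either route yields zero-filled private pages without a scratch
   file in /tmp. */

int zero_page_demo(void)
{
    long pg = sysconf(_SC_PAGESIZE);
    char *a, *z;
    int fd;

    /* Anonymous mapping: no file, no rendezvous. */
    a = mmap(NULL, (size_t)pg, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED)
        return -1;

    /* /dev/zero: the older spelling of the same idea. */
    fd = open("/dev/zero", O_RDWR);
    z = mmap(NULL, (size_t)pg, PROT_READ | PROT_WRITE,
             MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping survives the close */
    if (z == MAP_FAILED)
        return -1;

    /* Both arrive zero-filled; writes stay private (copy-on-write). */
    if (a[0] != 0 || z[0] != 0)
        return -2;
    a[0] = 1;
    z[0] = 2;
    return a[0] + z[0];
}
```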

#include <std_disclaimer>
_______________________________________________________________________________
				 ____ \ / ____
Laurence S. Kaplan		|    \ 0 /    |		BBN Advanced Computers
lkaplan@bbn.com			 \____|||____/		10 Fawcett St.
(617) 873-2431			  /__/ | \__\		Cambridge, MA  02238

pereira@alice.UUCP (Fernando Pereira) (07/21/90)

In article <1990Jul19.151524.22544@diku.dk> torbenm@diku.dk (Torben Ægidius Mogensen) writes:
>
>By using the MMU facilities present in most modern computers, these
>tests can be entirely done in hardware, and fragmented stacks can be
>done transparently. This is the idea:
>
>Give each stack or heap a small number of pages of memory and make
>sure that the logical addresses are far apart, and that a large
>logical address space is given each, but where only the first small
>part is actually mapped to physical (or virtual) memory.
>
>Assign to each stack/heap an event that is invoked by the MMU trap
>handler on access to unmapped memory. An event can ask for a new
>physical page to be mapped to the corresponding area, or maybe try to
>do garbage collection.
> [...]
>My question is: has this been done? And if not, why not?
>
>I suspect that operating systems like Unix doesn't provide such
>facilities to user processes, but the issue is then: should they?
>

I know of Prolog and ML implementers who have tried this, but unfortunately
it is not a portable solution in Unix, even in those versions that offer the
mmap system call to create ``islands'' of virtual address space. The problem
is that certain machine architectures (eg. Motorola 68K) and OS
implementations (eg. SunOS at least in some versions) do not provide
a continuable address violation signal (SIGSEGV), even though at the kernel
level, address translation faults (page faults) are continuable. Not having
looked at the insides of those machine/OS combinations, I suspect that
enough instruction execution context can be saved for filling a page fault
in the kernel, but not enough for reentering the faulting process to
handle a Unix signal and allocate the needed storage.

These observations are the result of practical experiments 5 or so years
ago with Sun 2's and VAXen running Berkeley Unix. The former could not
recover correctly from the segmentation violation (PC corrupted on return
from the signal), the latter could. Newer machines/architectures might handle
this better, although comments I have heard from the SML-NJ implementers
indicate that there are problem configurations even now.

Fernando Pereira
2D-447, AT&T Bell Laboratories
600 Mountain Ave, Murray Hill, NJ 07974
pereira@research.att.com

pereira@alice.UUCP (Fernando Pereira) (07/21/90)

In article <3260@swi.swi.psy.uva.nl> jan@swi.psy.uva.nl (Jan Wielemaker) writes:
>The proposed scheme is very simple to implement.  It provides
>dynamically growing stacks (to some limit, but on most modern machines
>this limit can be high, as the virtual address space is large).  Expanding
>the stacks is almost free.  On conventional systems a stack shifter
>needs to be implemented.  This is tiresome, and shifting the stacks is
>an expensive operation.

I disagree on several counts:

1. The amortized cost of stack shifting, if the stack expansion and garbage-collection
scheduling factors are computed right, is in my experience trivial (<1% of runtime).
I wrote the first Prolog stack shifter, for DEC-10 Prolog, and once I got the
expansion factors right, no one ever complained about stack shifting costs.
I was also involved in designing and implementing the Quintus stack shifter,
which again has very low amortized costs because of properly chosen expansion
factors.

2. As I pointed out in an earlier message, it is not in general safe to rely
on being able to return from a segmentation violation in Unix. Even if it works
in your tests, it might fail for a violation occurring in particular processor
state, particularly if other aspects of the state (eg. page maps) are affected
by the signal function. Nothing in the OS specification guarantees that
segmentation violations can be returned from. And then there are OS bugs...

3. Very large (even if sparse) address spaces are not free, because with them
one gets into multilevel page tables and other address translation overheads.
In my experience, VM implementations are optimized for relatively small
and compact address spaces on various common machines. Large address spans can bring
out the worst in paging algorithms.


A well-designed bounds check mechanism costs relatively little,
is portable, and does not depend on the vagaries of machine and OS implementation.
The idea of using VM for bounds check has come up over and over again (we
considered it for DEC-10 Prolog in 1978, for example!), but the payoff
always seems mediocre compared with its complication, lack of portability
and sensitivity to OS bugs.

Fernando Pereira
AT&T Bell Labs, Murray Hill, NJ 07974
pereira@research.att.com

mo@messy.bellcore.com (Michael O'Dell) (07/22/90)

Interlisp did this as a matter of course, and Vax-Interlisp was
the single biggest reason for the originally-published
(as opposed to originally discussed but not published)
4.2BSD virtual memory proposal being page-based, just like Tenex.
This was finally implemented in Mach and SunOS 4.0, and
is forthcoming in 4.4BSD.

While this sounds like a good idea, it may not be on very fast
computers.  Providing user programs with completely general
program state access on segmentation faults can be VERY expensive.
I worked on one design whereby the kernel could do all the
usual stuff (copy-on-write, etc) by "reeffecting" faulting instructions
even in the face of out-of-order instruction completion,  But 
letting the user program poke its nose around as described by the
signal() interface requires "atomic trap behavior", and
that makes a program run MUCH MUCH slower on some machines.
For details, see my paper "Putting UNIX on Fast Computers"
in the latest USENIX Anaheim Conference Proceedings.

	-Mike

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (07/22/90)

If my memory serves me correctly, EMAS Prolog (a Prolog interpreter
written in IMP, running under the EMAS operating system on ICL 2900s)
handled Prolog's several stacks by mapping each of them to a scratch
file, leaving unmapped pages in between.  This was _early_ 80s.  The
C Prolog interpreter was derived from EMAS Prolog, and it was going
to use the same scheme just as soon as Berkeley implemented mmap (the
4.1 manuals said it would be in 4.2...).  Isn't the name for this scheme
"red pages"?  

-- 
Science is all about asking the right questions.  | ok@goanna.cs.rmit.oz.au
I'm afraid you just asked one of the wrong ones.  | (quote from Playfair)

guy@auspex.auspex.com (Guy Harris) (07/24/90)

>The problem is that certain machine architectures (eg. Motorola 68K) and OS
>implementations (eg. SunOS at least in some versions) do not provide
>a continuable address violation signal (SIGSEGV), even though at the kernel
>level, address translation faults (page faults) are continuable. Not having
>looked at the insides of those machine/OS combinations, I suspect that
>enough instruction execution context can be saved for filling a page fault
>in the kernel, but not enough for reentering the faulting process to
>handle a Unix signal and allocate the needed storage.

The amount of instruction execution context is the same in both cases;
the only difference is where it has to be stored.  I think SunOS 4.1 may
store enough of it outside the kernel stack to permit *one* such fault
to be continued from (i.e., don't expect to be able to return from a
SIGSEGV that occurs while you're handling a SIGSEGV).

Part of the problem is that Motorola:

1) wouldn't commit to the "stack puke" stored by the 680[1andup]0
   being "safe" to hand to user-mode code; i.e., they wouldn't say
   "nothing you can do to the 'stack puke' is risky";

and

2) wouldn't describe the format of the "stack puke" to the extent
   necessary to have the kernel validate it.

I can see their not doing so as being perfectly legitimate; for all I
know, different revisions of the same chip may have different "stack
puke" formats, and even if they don't, they might not want any of that
stuff to be considered a "feature", and have people then write code
dependent on that stuff and lock them into continuing to provide those
"features".  It does, however, complicate the task of allowing user-mode
code to continue from a SIGSEGV.

>These observations are the result of practical experiments 5 or so years
>ago with Sun 2's and VAXen running Berkeley Unix. The former could not
>recover correctly from the segmentation violation (PC corrupted on return
>from the signal), the latter could.

The former has more context than the latter; the former has the 68010
"stack puke", the latter has, as I remember, just the First Part Done
bit (and some of the general-purpose registers, for some of the
long-running instructions like MOVCn).

The latter is safe and, I think, appears in the "signal context"
saved by a BSD signal, so that the instruction can be continued from
user mode without the kernel having to tuck away one or more sets of
context.

>Newer machines/architectures might handle this better,

I think most RISC machines (not entirely surprisingly) have less or no
context of that sort; I'd expect things to work OK on a SPARC-based Sun,
for example, as well as a MIPS-based machine. 

In fact, what architectures other than the 68K architectures have lots
of context for that?  I don't think the 386 or the WE32K, for example,
have that problem.

jan@swi.psy.uva.nl (Jan Wielemaker) (07/24/90)

Computers and operating systems are (or should be) designed to make life
easier for the user and programmer.  Fernando Pereira claims that the
overhead of stack shifting and stack overflow checks is not very large,
that his scheme is much more portable, and that it is less sensitive
to OS errors.  All these arguments are correct.  There is another reason
why I don't like such schemes.  I do not open /dev/rxy0c to read and write
files because I do not trust the OS disk cache and file system handling.
Instead, I tend to use open(), read(), write() and close().  This problem
is a bit similar.  Using the MMU to handle the stacks, I get the
following advantages:

	- I do not have to think about where to do stack overflow checks or
	  how to do them cleverly.  [Question: how do you know how
	  much will be written on the trail before the next point
	  where you can safely do a stack shift?]
	- I do not have to write a stack shifter, nor think about when
	  to call this thing and how much to grow the stacks.  For any
	  such parameters there are always programs for which they have
	  the wrong value.
	- If stack shifts were the only thing I wanted, I would not need
	  to consider keeping track of references to the stacks (from
	  the virtual machine, foreign language code, etc.) as the stack
	  does not move.  [Unfortunately I need to keep track of all
	  these things for the garbage collector.]
	- If I use malloc()/free() or a similar package for handling the
	  other data (notably the program), I will need to write my own
	  versions of them, because the stacks need to be kept together
	  to avoid large unused chunks of memory.  [Unluckily a dedicated
	  perfect-fit algorithm performs much better than the malloc()/free()
	  of most OS's, so I wrote a dedicated memory allocation system
	  anyway, but I think the argument nevertheless holds.  Also,
	  my dedicated algorithm performs nicely for Prolog, but might not
	  suit foreign language code very well.  Such code can now still use
	  malloc()/free() without conflicting with Prolog.]
	- Still, it IS faster (provided it works).

Having access to the MMU and the possibility of returning from the SEGV
handler saves me all the research and programming effort for this.  On
SunOS (from version 4.0.1) this works nicely on SUN-3 and SPARC.  It also
works on a GOULD PN9000 under UTX 2.1.  Probably there are more systems
that offer this.  Besides all this, I even gain something concrete on
these systems: I really can give memory resources back to the system, so
after a query that used 4 MB of stack I will not claim these 4 MB for
the rest of the session.  I use this after calling the garbage collector
and after finishing a user query (there is a Prolog predicate for
deallocating everything above the used part of the stack).  I am considering
using it after deep backtracking as well, if the used part of the stack is
far below the allocated part.

To me it is clear that access to the MMU simplifies the task of the
implementor of, notably, multi-stack languages.  The only real question
is: can an OS/hardware combination *in principle* do sparse addressing as
efficiently as handling one contiguous address space?  If the answer to this
question is negative, there is a tradeoff.  If the answer is positive, any
decent OS should provide these facilities.

	Jan

alanf@bruce.cs.monash.OZ.AU (Alan Grant Finlay) (07/24/90)

I agree with those who point out that using MMU segment violations to trap
stack overflow is not a good idea.  Apart from the reasons which have already
been given, where hardware is concerned it should be used as intended.  What
we really need is for hardware designers to recognise the need for multiple
stack overflow detection and provide a uniform mechanism to support it.
I suggest a simple "notify addr" instruction which saves a limited number of
addresses in hardware registers and generates a trap when these addresses are
accessed.  The point is that if you want to open a can then you should use
a can opener, not a hammer.

pereira@alice.UUCP (Fernando Pereira) (07/25/90)

In article <3261@swi.swi.psy.uva.nl> jan@swi.psy.uva.nl (Jan Wielemaker) writes:
>Computers and operating systems are (or should be) designed to make life
>easier for the user and programmer.  Fernando Pereira claims that the
>overhead of stack shifting and stack overflow checks is not very large,
>that his scheme is much more portable, and that it is less sensitive
>to OS errors.  All these arguments are correct.  There is another reason
>why I don't like such schemes.  I do not open /dev/rxy0c to read and write
>files because I do not trust the OS disk cache and file system handling.
>Instead, I tend to use open(), read(), write() and close().

The difference is that the interface offered by system calls such as open() is
part of the specification of the OS and so can be relied upon, whereas the
continuability of a SIGSEGV is not and thus cannot be expected to exist
in all implementations of Unix. Furthermore, even for a particular
implementation, the signal may be continuable in certain circumstances but
not others. As Guy Harris pointed out in a previous message, OS/machine
designers are free to do here whatever they want, since the specification
(such as it is) does not require continuability. My point was not that
some *possible* OS could not support the proposed use of address exceptions,
but that widely used versions of the Unix OS do not.
I was talking about *current* engineering choices, not about OS design.

>	- I do not have to think about where to do stack overflow checks or
>	  how to do them cleverly.  [Question: how do you know how
>	  much will be written on the trail before the next point
>	  where you can safely do a stack shift?]

Except for the general unifier, which would need to do the test every N'th
time through its main loop (N some appropriate margin), a safe upper
bound for # of trail entries can be computed for each clause. Furthermore,
the general unifier must be made interruptible anyway, to allow other
asynchronous signals or some desirable support for multi-threading.

>	- I do not have to write a stack shifter, nor think about when
>	  to call this thing and how much to grow the stacks.  For any
>	  such parameters there are always programs for which they have
>	  the wrong value.

It's not *that* hard! I've designed and used stack shifting schemes for
the last 12 years without being severely bitten by them. The crucial
point is to use a (simple) adaptive algorithm to set expansion factors.

>	- If stack shifts were the only thing I wanted, I would not need
>	  to consider keeping track of references to the stacks (from
>	  the virtual machine, foreign language code, etc.) as the stack
>	  does not move.  [Unfortunately I need to keep track of all
>	  these things for the garbage collector.]

You said it. You need that info for GC anyway.

>Having access to the MMU and the possibility of returning from the SEGV
>handler saves me all the research and programming effort for this.

You are trading language implementation effort for OS design and
implementation effort. Unfortunately, one's ability to influence OS
design to address all our particular needs seems rather limited, and
after 15 years of involvement in several Prolog implementations I
have learned painfully that relying on more than the 10% most used portions
of an OS (you know, those that are used 90% of the time (-:)) is very
dangerous for portability and reliability.

Fernando Pereira
2D-447, AT&T Bell Laboratories
600 Mountain Ave, Murray Hill, NJ 07974
pereira@research.att.com

andrewt@cs.su.oz (Andrew Taylor) (07/25/90)

In article <11079@alice.UUCP> pereira@alice.UUCP () writes:
>2. As I pointed out in an earlier message, it is not in general safe to rely
>on being able to return from a segmentation violation in Unix. Even if it works
>in your tests, it might fail for a violation occurring in particular processor
>state, particularly if other aspects of the state (eg. page maps) are affected
>by the signal function. Nothing in the OS specification guarantees that
>segmentation violations can be returned from. And then there are OS bugs...
>
>3. Very large (even if sparse) address spaces are not free, because with them
>one gets into multilevel page tables and other address translation overheads.
>In my experience, VM implementations are optimized for relatively small
>and compact address spaces on various common machines.
>Large address spans can bring out the worst in paging algorithms.
>
>
>A well-designed bounds check mechanism costs relatively little, is portable,
>and does not depend on the vagaries of machine and OS implementation.
>The idea of using VM for bounds check has come up over and over again (we
>considered it for DEC-10 Prolog in 1978, for example!), but the payoff
>always seems mediocre compared with its complication, lack of portability
>and sensitivity to OS bugs.

I've been working on native code compilation for the MIPS architecture.
Even if you can afford to keep a register for each stack limit,
the saving in execution time from moving checking to hardware can be at
least 20%. How you assess such a gain depends on your aims. My aim
is high performance, and I'll go to great lengths for a 20% gain.

A demand for high performance may still be rare in the Prolog world, but
it isn't elsewhere; to quote John Mashey (of MIPS):
"we've got customers who will commit vile acts for another ten percent"

Returning from a segmentation violation isn't a problem on the MIPS
nor on most RISCs I expect. I don't fully understand remark (3) but
I can't see how having 5 address regions instead of 3 will cause poor
VM performance. I'm more worried about multiple stacks causing poor cache
performance because of conflicts.

Invoking the garbage collector is a problem with hardware limit checking,
because the stacks are likely not to be in a consistent state when an
exception occurs. One possibility, which I believe ECRC's Sepia system
uses, is for the exception routine to set a global flag which is checked
at CALL ports, with the garbage collector invoked if it is set. This brings
back some of the costs we were trying to avoid.

It's possible, if all stack data words are tagged, that a robust garbage
collector could operate with inconsistent stacks. If this can be made to work,
and having all stack data tagged is not too onerous, then this is the go.

Alternatively, on the MIPS the exception could invoke the garbage collector,
which simulates instruction execution until an instruction is reached where
the stacks are known to be consistent (e.g. a call instruction). This will
definitely involve getting your hands dirty, but I believe it's feasible.

Andrew