[comp.arch] Demand paged virtual memory

blarson@skat.usc.edu (Bob Larson) (09/29/87)

In article <6488@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>Basically, three things are necessary for full support of demand-paged
>virtual memory:  mappable per-process virtual address space pages,
>generation of a trap when a reference is made to an unmapped page,
>and ability to restart the faulted instruction after changing the map.

Instruction restart is NOT needed, instruction continuation also works
quite well.  The 68010 and 68020 both use instruction continuation.  If
there is strict control of code generation, it is even possible to
imagine a demand page virtual memory system that does not need either.
(Preceed each memory reference instruction by an address validity
check instruction -- I assume this would be used on a RISC processor if
anywhere.)  Many real systems use a combination of techniques: I.E.
the PDP-10 uses restart except for the BLT instruction which uses
continuation.  (I think some of the KL only instuctions also continued.)

Each technique has advantages and disadvantages, I don't want to start
another holy war.  (The basic tradoff is the time to re-do on restart
vs. the time and space used to store additional context for
continuation -- obviously different for different architectures.)
--
Bob Larson		Arpa: Blarson@Ecla.Usc.Edu
Uucp: {sdcrdcf,cit-vax}!oberon!skat!blarson		blarson@skat.usc.edu
Prime mailing list (requests):	info-prime-request%fns1@ecla.usc.edu

mash@mips.UUCP (09/30/87)

In article <4558@oberon.USC.EDU> blarson@skat.usc.edu (Bob Larson) writes:

>Instruction restart is NOT needed, instruction continuation also works
>quite well.  The 68010 and 68020 both use instruction continuation.  If
>there is strict control of code generation, it is even possible to
>imagine a demand page virtual memory system that does not need either.
>(Preceed each memory reference instruction by an address validity
>check instruction -- I assume this would be used on a RISC processor if
>anywhere.)...
Hmmm.  I don't know anybody who actually does this on a RISC,
but it would be interesting to hear, if so.
Re: continuation versus restart: RISC designs tend to have the
kinds of pipelines that favor restart, although some of the multiple-PC-on-
exception schemes are almost like continuation.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

aglew@ccvaxa.UUCP (09/30/87)

>/* Written  7:28 pm  Sep 28, 1987 by blarson@skat.usc.edu in ccvaxa:comp.arch */
>/* ---------- "Demand paged virtual memory (was Re" ---------- */
>In article <6488@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>>Basically, three things are necessary for full support of demand-paged
>>virtual memory:  mappable per-process virtual address space pages,
>>generation of a trap when a reference is made to an unmapped page,
>>and ability to restart the faulted instruction after changing the map.
>
>Instruction restart is NOT needed, instruction continuation also works
>quite well.

You think that instruction continuation vs. instruction restart would be
a holy war? How about this: neither restart nor continuation of the faulting
instruction is necessary. All you have to do is to back up to some
consistent state prior to the fault, fetch the page, and then start up
before the faulting instruction and steam past it.

There have been a whole slew of papers on checkpoint/restart mechanisms
recently (cf. Wen-Mei Hwu in CompArch 14).

Or you can go whole hog, and say that precise (or micro-precise) page
faults aren't necessary at all.

	Many people are familiar with th Tomasulo algorithm for
instruction dispatch: dispatch an instruction as soon as all of its
inputs are ready from previous instructions (plus a few extra
conditions).
	Turn this on its head: don't write back the results of an instruction
until all previous instructions whose inputs may be modified by this
instructions outputs have completed.
	This will give you imprecise traps, with some instructions in the
future having completed before preceding instructions - but it is restartable
in the sense that it will not hurt to re-execute those instructions.

Eg. in a time sequence of instructions:

	   ....xxxxxxxxddddd--dxxx--xxxPxx------
		       ^  	       ^
		       | LPC	       | Page Fault
		       | All previous instructions have completed

	Where x = completed instruction, d = dispatched but not written back,
	- = not yet started, and P is the page fault

	Even though the page fault doesnt have precise state,
	restart at LPC is possible if the Tomasulo condition on writeback
	is satisfied.

---

Berkeley's Kahan, at a recent talk, said that the traditional objection,
that imprecise traps are hard to debug, doesn't really apply to his class
of problems - or, rather, it already does, because optimizing compilers
move his code around so much that he already has imprecise faults.
Yet he still manages to produce good, fairly well debugged, code.


Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms.arpa

I always felt that disclaimers were silly and affected, but there are people
who let themselves be affected by silly things, so: my opinions are my own,
and not the opinions of my employer, or any other organisation with which I am
affiliated. I indicate my employer only so that other people may account for
any possible bias I may have towards my employer's products or systems.

radford@calgary.UUCP (Radford Neal) (10/01/87)

In article <4558@oberon.USC.EDU>, blarson@skat.usc.edu (Bob Larson) writes:
> ... If there is strict control of code generation, it is even possible to
> imagine a demand page virtual memory system that does not need either.
> (Preceed each memory reference instruction by an address validity
> check instruction -- I assume this would be used on a RISC processor if
> anywhere.)

I've seen the equivalent of this in a 68000 C compiler. The procedure
entry prologue did an otherwise unnecessary memory reference, apparently
so that a stack-expansion address error would occur in a well-defined
context that the kernel knew how to restart from.

   Radford Neal
   The University of Calgary

oster@dewey.soe.berkeley.edu (David Phillip Oster) (10/08/87)

In article <1082@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes:

>In article <4558@oberon.USC.EDU>, blarson@skat.usc.edu (Bob Larson) writes:
>> ... If there is strict control of code generation, it is even possible to
>> imagine a demand page virtual memory system that does not need either.
>> (Preceed each memory reference instruction by an address validity
>> check instruction )

>I've seen the equivalent of this in a 68000 C compiler. The procedure
>entry prologue did an otherwise unnecessary memory reference, apparently
>so that a stack-expansion address error would occur in a well-defined
>context that the kernel knew how to restart from.



Apple's Lisa Office System used this technique. They had a 68000 in the Lisa
and a small amount of extra hardware that compared the address on the address
bus against upper-bound and lower-bound values. Out of bounds addresses
generated an interrupt. The hardware actually supported 4 pairs of bounds,
and these were mapped as stack, instruction space, data space, i/o space.

This allowed the stack to grow to bigger than main memory (swapping
stack pages to disk as needed, protected processes against each other
(and protected the O.S.)  It did slightly increase the context switch
time though (the 8 limit registers were part of the process context.)
Since Apple controlled the compilers for the machine, it was no
problem to have them generate position independent code, with
consistant stack frames. To swap out a code segment, just swap it out
and patch the stack. (You can patch the stack because the compiler
has left hints in the stack frames.)  When you need to swap it back
in, the patched stack "returns" to a handler that reads the segment
into a new location, and fixes up the stack to point to where the code
is now.  The data is harder to move around, but with the compiler putting
enough hints into the segment header, even this can be made to work.

On top of this layer, it had a hierarchical file system, all
applications were written in a Pascal-with-object-oriented-extensions,
and it supported multiple applications, each with multiple windows,
all running concurrently.  with cut and paste of structured data
between applications. Kind of like a modern macintosh+multifinder
system but more integrated and with virtual memory too.

Very sophisticated for a company who's previous products had all been
6502 based.  I was a certified developer at the time. It was slower
than other ways of using those parts, but incredibly easier to use
(hence made better use of people's time) than anything in its price
class at the time. I believe it failed becuase of poor marketing to
corporations, and the fact that third party developers couldn't get
their hands on the documentation for the class hierarchy used by
Clascal, so it was impossible to add new applications that were
integrated into the system.  They copied the Xerox Star too well.  To
use most third party software, you had to reboot the machine under a
different operating system. (Even the development environment ran on
the same hardware unde a different, less spiffy, operating system.)

Burroughs mainframes use a similar technique.  It is kind of RISCy:
Their computations showed that it was more effiecient to use their
silicon real estate to have hardware support for a segment based
architecture, with bounds checking, position independence, and tagged data
and do the stack patching I've just described in software then it would
have been to do virtual address translation in hardware and all the rest
in hardware.

bobw@wdl1.UUCP (Robert Lee Wilson Jr.) (10/08/87)

I think your instruction that checks validity will have to be a
special instruction that locks out interrupts for the subsequent
instruction (except for some special interrupts corresponding to
memory invalid): Otherwise you could get a context switch in there
with the result that memory checks valid on one instruction but by
the time you execute the next instruction (from the viewpoint of
this process) pages have been rearranged, etc., so that the real
memory reference faults, and you have the same restart/continue
problem that sparked this discussion in the first place!

(My opinions, not my employer's: As to facts, we're still arguing.)