[comp.arch] long instructions & page faults

jjb@sequent.UUCP (Jeff Berkowitz) (07/16/89)

In article <1449@mdbs.UUCP> wsmith@mdbs.UUCP (Bill Smith) writes:
>
>What will happen when [a long] instruction crosses a page boundary and
>a page fault occurs?   Is there some trickery that must be written into
>the operating system to avoid thrashing when the instruction is restarted?
>
[Finally a legitimate architecture topic!]

I've seen the paging code for several "well known" CISCs (VAX, 68010/020,
ns32000, i386).  I'm not aware of any "trickery" THAT THE CPU DESIGNERS
PLANNED FOR being required on any of these machines.  The machine can
indeed fault on a piece of an instruction during a return from interrupt
or page fault.  If memory is sufficiently tight, the machine might thrash
here.  This is no more or less serious than thrashing because e.g. the two
data pages required for a block copy can't be kept in the working set.

Page boundaries seem to be a fertile source of CPU hardware bugs, however.
On more than one of the processors in the list, achieving *reliable* page
fault response requires the operating system to contain bug workarounds
that can be described as "trickery" - dealing with registers that don't
contain the expected information if the fault occurred during the execution
of a certain opcode that happened to cross a page boundary, etc.

I imagine the page-fault-time (hardware) state save operations on heavily
pipelined CISCs like the 486 and 68040 must be extraordinarily complex.
Perhaps a designer out there can comment about this.  Current generation
RISCs achieve comparable (or better) performance with much lower
complexity in this area, yes?  Complexity always ends up costing money -
silicon that could have been used for functionality rather than sequencing
the state save operations, etc.

The now-defunct Culler 7 had a three stage instruction pipeline with
prebranching.  Return from interrupt (including page fault) involved
having the CPU refetch all the program memory containing all the
instructions that were in the pipeline at the time of the interrupt.
The machine was not interruptible during this time, so the kernel had
to guarantee that all the pages containing all this code were present
and wired down during the interrupt return.  Since the pipeline could
contain branches to branches, etc, this worked out to possibly four
separate pages.  As might be imagined, ensuring that all four pages
were wired down at interrupt return time caused significant
complexity in the OS.  I believe this translated into a large time cost
in restarting faults, but I have no measurements to prove it.
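The Culler 7 constraint above can be sketched in a few lines.  This is a
toy Python model, not the actual kernel code; the page size and PC values
are invented for illustration.  The point is just that the set of pages the
kernel must wire is the set of distinct pages touched by the PCs of the
instructions being refetched, which with branches-to-branches can reach
four.

```python
# Toy model of the Culler 7 interrupt-return constraint: the kernel must
# wire down every page holding an instruction that the CPU will refetch
# uninterruptibly on return from interrupt.  Page size is invented here.

PAGE = 4096

def pages_to_wire(pipeline_pcs):
    """pipeline_pcs: PCs of the instructions in the pipeline at the
    time of the interrupt.  Returns the distinct pages to wire down."""
    return sorted({pc // PAGE for pc in pipeline_pcs})

# Straight-line code: all three pipeline stages on one page.
assert pages_to_wire([0x1000, 0x1004, 0x1008]) == [1]

# Branch to a branch to a branch: each in-flight instruction on a
# different page, plus the target -- four pages the kernel must keep
# resident for the duration of the interrupt return.
assert pages_to_wire([0x1FFC, 0x5000, 0x9000, 0xD000]) == [1, 5, 9, 13]
```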
-- 
Jeff Berkowitz N6QOM			uunet!sequent!jjb
Sequent Computer Systems		Custom Systems Group

slackey@bbn.com (Stan Lackey) (07/17/89)

In article <18766@sequent.UUCP> jjb@sequent.UUCP (Jeff Berkowitz) writes:
>In article <1449@mdbs.UUCP> wsmith@mdbs.UUCP (Bill Smith) writes:
>>What will happen when [a long] instruction crosses a page boundary and
>>a page fault occurs? 

Actually, in the case of the VAX, a two-byte instruction can cross a
page boundary.

There are (at least) two solutions.  One would be to "back out" of an
instruction, that is, pretend it never started, before taking the trap.
This requires saving the PC of the opcode and undoing any register
autoincrements/decrements that had been done along the way.  The second
method is to save and restore as much raw internal hardware state as is
necessary to resume the instruction upon return.  The advantage of (1)
is less hardware (no read/write paths to so much internal state, and no
sequencing logic to handle the process).  (2) makes the instruction
more "continuable": once a page has been used in the parsing of an
instruction, it can be paged out and is not needed again after the page
whose fault caused the problem has been brought in.
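Solution (1) can be illustrated with a toy model.  Nothing here is a real
ISA; the register names, page size, and two-operand copy instruction are
all invented.  The sketch shows the essential mechanism: log each
autoincrement as it happens, and on a fault unwind the log so the trap
handler sees the machine as if the instruction had never started.

```python
# Toy model of "backing out" of a faulted instruction: undo the
# autoincrement side effects, so the trap need only save the PC of the
# opcode and restart the instruction from scratch.

class PageFault(Exception):
    pass

class Machine:
    def __init__(self, resident_pages):
        self.regs = {"r0": 0x1000, "r1": 0x2000}
        self.resident = set(resident_pages)   # pages currently in memory
        self.undo_log = []                    # (reg, old_value) pairs

    def autoinc(self, reg, size):
        # Record the old value before bumping the register.
        self.undo_log.append((reg, self.regs[reg]))
        self.regs[reg] += size
        return self.regs[reg] - size

    def access(self, addr):
        if addr // 512 not in self.resident:
            raise PageFault(addr)

    def copy_word(self):
        # An invented two-operand instruction: (r0)+ -> (r1)+
        self.undo_log.clear()
        try:
            self.access(self.autoinc("r0", 4))   # source operand
            self.access(self.autoinc("r1", 4))   # destination faults
        except PageFault:
            # Back out: restore every register touched so far.
            for reg, old in reversed(self.undo_log):
                self.regs[reg] = old
            raise

m = Machine(resident_pages=[0x1000 // 512])   # source page in, dest out
try:
    m.copy_word()
except PageFault:
    pass
assert m.regs == {"r0": 0x1000, "r1": 0x2000}  # as if never started
```

Solution (2) would instead dump the undo log (and the rest of the
internal state) somewhere restorable and continue mid-instruction, which
is exactly the extra read/write-path hardware mentioned above.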

>Page boundaries seem to be a fertile source of CPU hardware bugs, however.

This is one of the many reasons RAM is desirable for microcode.  The
supplier can fix bugs by sending out patches.

>I imagine the page-fault-time (hardware) state save operations on heavily
>pipelined CISCs like the 486 and 68040 must be extraordinarily complex.
>Perhaps a designer out there can comment about this.  Current generation
>RISCs achieve comparable (or better) performance with much lower
>complexity in this area, yes?  Complexity always ends up costing money ...

A lot of this complexity is common to both RISCs and CISCs,
however.  This is the class of problem where instruction N gets a
page fault doing its memory operation while N+1 and N-1 etc. are still
in various stages of execution.  One way to keep this complexity to a
minimum is to design the pipeline such that exceptions can only happen
at fixed points, that when an exception happens all older instructions
complete, and that any newer instructions (started after the
instruction causing the exception) are aborted.  Then state can be
saved, and execution eventually resumed, in the most straightforward
way.
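The fixed-exception-point discipline is easy to model.  This toy Python
sketch (the PC values and single recognition point are invented) shows
the invariant: when a fault is recognized, every older instruction has
already retired in order, every newer one is aborted, and the saved PC
is the faulting instruction, so restart is trivial.

```python
# Toy sketch of precise exceptions at a fixed pipeline point: older
# instructions drain, newer ones are flushed, and the saved PC is the
# faulting instruction itself.

def run(prog, faulting_pc):
    """prog: instruction PCs in program order.
    Returns (completed, saved_pc, flushed)."""
    completed, flushed = [], []
    for pc in prog:
        if pc == faulting_pc:
            # Fault recognized at the fixed point: older instructions
            # have completed in order; newer ones are aborted before
            # they change architectural state.
            flushed = prog[prog.index(pc) + 1:]
            return completed, pc, flushed
        completed.append(pc)          # older instruction retires normally
    return completed, None, flushed

done, saved_pc, killed = run(prog=[100, 104, 108, 112], faulting_pc=108)
assert done == [100, 104]      # N-1 and earlier completed
assert saved_pc == 108         # restart here once the page is in
assert killed == [112]         # N+1 aborted; it will be refetched
```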

Nah - just spill all hardware state on the stack.
-Stan

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (07/19/89)

Another fun one is what happens when instruction fetchahead 
pagefaults. 

- the program may branch before it uses the page, thus you've wasted
  the cycles.
- the program may branch before it uses the page, thus making any
  horrendous error condition into a non-error that you must not
  report.
- the pipeline conditions are different from the pipeline conditions
  during a faulted load/store. Hence, if you take these faults, then
  the fault handler has to be able to tell the difference, and has to
  be able to restore to the correct state.
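One common way to get the second bullet right is to defer the fault: the
prefetcher remembers that the fetch failed instead of trapping, and the
trap is raised only if the program actually consumes the poisoned slot.
This toy Python sketch (page size and buffer layout invented) shows the
branch-away case simply never faulting.

```python
# Toy model of deferred fetchahead faults: tag a failed prefetch as
# "poisoned" rather than trapping, and trap only if the program actually
# tries to execute from that slot.  A branch that skips the slot turns
# the would-be fault into a non-event.

class Prefetcher:
    def __init__(self, resident_pages, page=512):
        self.resident = set(resident_pages)
        self.page = page
        self.buf = {}          # pc -> ("ok", word) or ("poison", None)

    def fetchahead(self, pc):
        if pc // self.page in self.resident:
            self.buf[pc] = ("ok", "insn@%x" % pc)
        else:
            self.buf[pc] = ("poison", None)   # remember; don't trap yet

    def consume(self, pc):
        status, word = self.buf[pc]
        if status == "poison":
            raise RuntimeError("instruction page fault at %#x" % pc)
        return word

p = Prefetcher(resident_pages=[0])
p.fetchahead(0x1F8)        # last word of the resident page: fine
p.fetchahead(0x200)        # next page not resident: poisoned, no trap
assert p.consume(0x1F8) == "insn@1f8"
# The program branches away and never consumes 0x200 -- no fault at all.
```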

Early models of the IBM PC/RT had a hardware bug in this area. They 
patched it (on that model) by suppressing fetchahead.
-- 
Don		D.C.Lindsay 	Carnegie Mellon School of Computer Science