[comp.arch] Interrupts in user space; Lightweight Traps

kend@data.UUCP (Ken Dickey) (09/26/90)

jkenton@pinocchio.encore.com (Jeff Kenton) writes:
>With the recent RISC chips (88000, MIPS and i860 come to mind) the overhead
>of getting the machine state safely saved away in the low level exception
>code is substantial.  You have to do this anyway before you can hand off
>control to the user program "without going into the OS kernel", so the
>savings don't amount to much.

Let's talk about the next generation...

There is another aspect of this which needs attention by HW
implementors.  There is a desire for fast computing.  There is a
desire for reasonable models of computation.  The problem is that all
traps (e.g. on the 88k) are "heavyweight" in the sense that for
reasons mentioned above, OS designers don't give compiler/runtime
implementors control directly.  Typically, all traps get reflected
through the O.S.  Divide by zero and other "show stoppers" can take a
long time to service, but who cares?  Such exceptions reflect to a
runtime handler (which may be a debugger).  The real pain with
heavyweight traps is trying to give reasonable performance with
reasonable models.

As an example, if one gets an integer numeric overflow, some languages
change representations and go to bignums.  It is a real pain (not to
mention performance loss and code expansion) to implement the guard
bits and checks almost everywhere because the runtime system can't
afford the time to take a heavyweight trap.  The same situation
applies whether one is checking tagged data on polymorphic operation
dispatch or using operations on logical entities, such as numbers, that
have multiple representations.
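
As a rough illustration of the inline guard described above, here is a minimal C sketch of a checked fixnum add. The name fixnum_add and its calling convention are assumptions made for this example, not taken from any particular runtime. The compare-and-branch it performs before every add is the cost that a lightweight overflow trap would remove from the common path.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical inline guard for a fixnum add.
 * Returns true and stores the sum in *sum when a + b fits in a long.
 * Returns false when the add would overflow; the caller would then
 * switch representations and redo the operation with bignums.
 */
static bool fixnum_add(long a, long b, long *sum)
{
    assert(sum != NULL);

    /* Test before adding, because signed overflow is undefined in C. */
    if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b))
        return false;           /* slow path: go to bignums */

    *sum = a + b;               /* fast path: plain machine add */
    return true;
}
```

A runtime that could take a cheap overflow trap would emit only the add and leave the range test to the hardware.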

I think that the thing to concentrate on is design of HW architectures
which support the above so that there are a set of "lightweight traps"
which can be safely reflected to the runtime system.  This could lead
to a large savings in code size.

I still hear "C only needs fixnums and floats, Unix only core dumps so
we don't have to do better."  C and Unix are in the class of 20-year-old
technologies (circa 1972).  A more general and mature view of OS and
language implementation technologies than this is definitely called
for.  [Of course none of us would say that 8^]!

-Ken Dickey			kend@data.uucp

rpw3@rigden.wpd.sgi.com (Rob Warnock) (09/28/90)

In article <413@data.UUCP> kend@data.UUCP (Ken Dickey) writes:
+---------------
| I think that the thing to concentrate on is design of HW architectures
| which support the above so that there are a set of "lightweight traps"
| which can be safely reflected to the runtime system.  This could lead
| to a large savings in code size.
+---------------

In both of the Unix ports done to the Am29000 (S5 & BSD), the "spill/fill"
assertion traps [used to shift the register window] were "trampolined" back
to user mode code to perform the actual operation. In addition, a user program
could request that most other (synchronous) traps also "bounce" back to a
user-mode handler. The trampoline code was quite short. In the case of
spill & fill, the address of the user-mode handler was kept in a kernel-
protected register, thus the trampoline code was only 5 instructions:

uspilltrap:
        mfsr    tpc, PC1                ; save return address
        mtsr    PC1, uspill             ; "branch"
        add     tav, uspill, 4          ; (sequential fetch)
        mtsr    PC0, tav                ; "branch" completely
        iret

This was done for a couple of reasons:

- The copying done in a spill/fill might result in page faults or stack
  growth, both of which were a lot easier to handle if they came from user
  mode rather than from some random kernel trap handler.

- You have to save/restore a dozen+ words of 29k CPU state before you can
  safely "come off freeze mode", which you have to do to use things like
  load & store multiple (which the spill/fill code wants to use!), so bouncing
  straight back to user mode saves all that saving/restoring. [None of that
  state is actually useful in the case of a synchronous trap from an "ASSERT"
  or "EMULATE" instruction, so nothing is "lost".]

Other uses than spill/fill were found for this feature. Since in the 29k BCS
different operating systems use different trap vector numbers, it was possible
to write a user-mode syscall-emulation routine that could be bound with an
object module from another operating system, so that system calls from that
other system could be emulated under Unix by the user-mode library.

Most of the compiler/assembler/tool vendors for the 29k targeted their tools
for a small "operating system" (originally called "HIF", later changed to some
other name -- I forgot what) that ran on an AMD prototyping board (PCEB/29k)
that plugged into a PC. HIF provided a POSIX-subset interface, which was mapped
to MS/DOS calls (by a program running in the PC). But the above "trampoline"
feature made it possible for us to write a HIF system call emulator that was
bound with the HIF object files, so we could instantly run all the PC-targeted
tools under Unix!

[No, nobody ever tried to emulate Sys-V under BSD or vice-versa! ;-} ]

And if you want to go back *really* far, the DEC PDP-10 hardware dispatched
half of the "UUO"s (Unimplemented User Operations -- trap/emulate instructions,
really) to kernel mode (where they were used for system calls), and the other
half directly back to the user process. Many of the compilers (especially
FORTRAN) used trap-to-user UUOs to synthesize "fat instructions" which were
implemented by the run-time libraries. The generated code for FORTRAN I/O was
full of these things...


-Rob

-----
Rob Warnock, MS-9U/510		rpw3@sgi.com		rpw3@pei.com
Silicon Graphics, Inc.		(415)335-1673		Protocol Engines, Inc.
2011 N. Shoreline Blvd.
Mountain View, CA  94039-7311

jkenton@pinocchio.encore.com (Jeff Kenton) (09/28/90)

From article <70576@sgi.sgi.com>, by rpw3@rigden.wpd.sgi.com (Rob Warnock):
> In article <413@data.UUCP> kend@data.UUCP (Ken Dickey) writes:
> +---------------
> | I think that the thing to concentrate on is design of HW architectures
> | which support the above so that there are a set of "lightweight traps"
> | which can be safely reflected to the runtime system.  This could lead
> | to a large savings in code size.
> +---------------
> 
> In both of the Unix ports done to the Am29000 (S5 & BSD), the "spill/fill"
> assertion traps [used to shift the register window] were "trampolined" back
> to user mode code to perform the actual operation. In addition, a user program
> could request that most other (synchronous) traps also "bounce" back to a
> user-mode handler. The trampoline code was quite short.

On the 88000 this sort of fast trap is also possible, and various suppliers
of 88000 boxes have made use of them (especially the real-time people). The
question which started this thread, however, had to do with Herman Rubin's
request for a buffer-full (or was it buffer-empty?) exception. This would occur under
arbitrary circumstances without the need for the user code to issue any test
or trap instructions (TCND or TBND on the 88000).

Providing special services to user code is a lot harder when the user isn't
willing to do at least a little of the work himself.

----- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -----
----- jeff kenton  ---	temporarily at jkenton@pinocchio.encore.com ----- 
-----		   ---  always at (617) 894-4508  ---		    -----
----- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -----

feustel@netcom.UUCP (David Feustel) (09/29/90)

These traps could be simply implemented in the Intel 80586 by
implementing a software interrupt bit vector similar to the io port
permission bitmap that already exists on the 386 and 486. If the sw
int bit is clear, control goes to the system when the corresponding
sw int instruction is executed. Otherwise, the processor indexes into the
first 256 pointers in low memory in the user task to get the address of
the routine to be called.
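
The dispatch rule proposed above can be modelled in a few lines of C. This is only a software sketch of the idea: struct task, allow_user_int, dispatch_sw_int and record_vector are names invented for the example, and nothing here describes an existing x86 mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_VECTORS 256

typedef void (*user_handler)(int vector);

/* Hypothetical per-task state for the proposed feature. */
struct task {
    uint8_t      user_int_bitmap[NUM_VECTORS / 8]; /* bit set: deliver to user */
    user_handler user_vectors[NUM_VECTORS];        /* the "first 256 pointers" */
};

/* Mark one software interrupt as user-handled and record its routine. */
static void allow_user_int(struct task *t, int vector, user_handler h)
{
    assert(vector >= 0 && vector < NUM_VECTORS);
    t->user_int_bitmap[vector / 8] |= (uint8_t)(1u << (vector % 8));
    t->user_vectors[vector] = h;
}

/*
 * Model of what the hardware would do on a software interrupt.
 * Returns true if the user routine was called, false if control
 * must go to the system (bit clear, or no routine installed).
 */
static bool dispatch_sw_int(struct task *t, int vector)
{
    assert(vector >= 0 && vector < NUM_VECTORS);
    bool user = (t->user_int_bitmap[vector / 8] >> (vector % 8)) & 1u;
    if (!user || t->user_vectors[vector] == NULL)
        return false;
    t->user_vectors[vector](vector);
    return true;
}

/* Demo handler for the sketch: remembers the last vector delivered. */
static int last_vector = -1;
static void record_vector(int vector) { last_vector = vector; }
```

With a zeroed struct task every vector goes to the system; after allow_user_int(&t, 0x40, record_vector), only vector 0x40 is delivered to the user routine.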
-- 
David Feustel, 1930 Curdes Ave, Fort Wayne, IN 46805, (219) 482-9631

seanf@sco.COM (Sean Fagan) (10/02/90)

In article <12824@encore.Encore.COM> jkenton@pinocchio.encore.com (Jeff Kenton) writes:
>On the 88000 this sort of fast trap is also possible, and various suppliers
>of 88000 boxes have made use of them (especially the real-time people). 

The '386 can do this too.  The question is whether it is a win, in the general
case, and how you would do it.  I can picture a system call

	void (*trap(int trapno, void (*handler)(int)))(int);

similar to the unix signal() syscall.  (Obviously, you would then set it up
such that the trap would jump into user mode.)

And, of course, you would have a header file (<trap.h> or <sys/trap.h>),
which would define the trap numbers.  You would only want to allow certain
conditions to be trapped, such as overflow, divide by zero, fp exception,
etc.

For languages such as Ada or PL/I, which want to be able to trap these
things, it would certainly be quicker (I guess 8-)) than having to go
through the kernel for the trap.  But I think the major speedup would be
from the fact that you could make a general function, which called a
function pointer, which you could change when you entered or left blocks
(instead of making a system call for each exception you wish to handle).
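
A minimal sketch of that last point, assuming the trap() call above existed: one general function is registered once, and entering or leaving a block only swaps a function pointer. The trap numbers and the names general_trap_handler, swap_handler and count_overflow are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*trap_fn)(int trapno);

/* Hypothetical trap numbers, of the kind <trap.h> might define. */
#define TRAP_OVERFLOW 0
#define TRAP_DIVZERO  1
#define NTRAPS        2

/* Per-trap handler currently in effect; NULL means "ignore". */
static trap_fn current_handler[NTRAPS];

/* The one general function that would be registered with trap(). */
static void general_trap_handler(int trapno)
{
    assert(trapno >= 0 && trapno < NTRAPS);
    if (current_handler[trapno] != NULL)
        current_handler[trapno](trapno);
}

/*
 * Called on block entry and exit: installs h and returns the previous
 * handler so the caller can restore it later.  No system call needed.
 */
static trap_fn swap_handler(int trapno, trap_fn h)
{
    assert(trapno >= 0 && trapno < NTRAPS);
    trap_fn old = current_handler[trapno];
    current_handler[trapno] = h;
    return old;
}

/* Demo handler for the sketch: counts overflow traps. */
static int overflow_count;
static void count_overflow(int trapno) { (void)trapno; overflow_count++; }
```

A block that wants overflow handling would call swap_handler(TRAP_OVERFLOW, count_overflow) on entry, keep the returned pointer, and pass it back to swap_handler on exit.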

Note that I've moved this to comp.os.misc; I think it's more appropriate
there (or comp.lang.misc, if it comes to that, but I think c.o.m is still
correct).

-- 
-----------------+
Sean Eric Fagan  | "Never knock on Death's door:  ring the bell and 
seanf@sco.COM    |   run away!  Death really hates that!"
uunet!sco!seanf  |     -- Dr. Mike Stratford (Matt Frewer, "Doctor, Doctor")
(408) 458-1422   | Any opinions expressed are my own, not my employers'.