[comp.arch] Interrupts in user space

moss@cs.umass.edu (Eliot Moss) (09/17/90)

In article <2128@key.COM> sjc@key.COM (Steve Correll) writes:

   Well, Prof. Rubin _did_ ask for an "interrupt", and most Unix kernels insist
   on being involved in any interrupts. And Unix is not unique in this.

My reaction is that there is no reason why an interrupt of this kind need go
through the OS kernel. I feel that there are many synchronous interrupts that
should be deliverable directly to the user program (or language run-time
system) without kernel intervention. Examples include overflow, divide by
zero, range/bounds errors, and even certain kinds of memory access faults
(i.e., ones where the user program is going to take and handle the exception).
It is reasonable for the machine to do a forced call through a vector location
in user space. If the user really does not want to handle the error, putting a
bad address in the vector would force a trap into the kernel.
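
As a rough illustration of the dispatch rule described above, here is a small
Python simulation. It is only a sketch, not real hardware or any real OS
interface: the names USER_VECTOR, deliver_trap, and kernel_trap are
hypothetical, and a None entry stands in for the "bad address" that forces
the trap into the kernel.

```python
# Simulation only: models a machine that delivers synchronous traps through
# a vector in user space and falls back to the kernel when the vector entry
# is unusable. All names here are hypothetical, for illustration.

KERNEL_LOG = []  # records traps that ended up in the (simulated) kernel


def kernel_trap(kind, detail):
    """Stand-in for the slow path: the kernel takes the exception."""
    KERNEL_LOG.append((kind, detail))
    return "kernel"


# User-space vector: trap kind -> handler, or None for a "bad address".
USER_VECTOR = {
    "overflow": None,
    "divide_by_zero": None,
}


def deliver_trap(kind, detail):
    """Forced call through the user vector; bad entries trap to the kernel."""
    handler = USER_VECTOR.get(kind)
    if not callable(handler):
        return kernel_trap(kind, detail)
    return handler(detail)


def on_divide_by_zero(detail):
    """User-level handler: here it just reports which operation faulted."""
    return "user handled " + detail


USER_VECTOR["divide_by_zero"] = on_divide_by_zero
```

With this setup, deliver_trap("divide_by_zero", ...) runs the user routine
directly, while deliver_trap("overflow", ...) lands in kernel_trap because
its vector slot was left unusable.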

While we're on the subject, a user-implemented flavor of system call is nice,
too. I'm thinking of something akin to the PDP-10 UUO (unimplemented user
operation) instruction. A short instruction that calls through a vector. This
is essentially a call to a global subroutine, but exactly which subroutine it
is can be changed dynamically (by changing the vector), and the instruction
fits into substantially fewer bits. (Not entirely clear what the interaction
is with RISC here.)

Cheers!
--

		J. Eliot B. Moss, Assistant Professor
		Department of Computer and Information Science
		Lederle Graduate Research Center
		University of Massachusetts
		Amherst, MA  01003
		(413) 545-4206; Moss@cs.umass.edu

jkenton@pinocchio.encore.com (Jeff Kenton) (09/17/90)

From article <MOSS.90Sep17084423@ibis.cs.umass.edu>, by moss@cs.umass.edu (Eliot Moss):
> In article <2128@key.COM> sjc@key.COM (Steve Correll) writes:
> 
>    Well, Prof. Rubin _did_ ask for an "interrupt", and most Unix kernels insist
>    on being involved in any interrupts. And Unix is not unique in this.
> 
> My reaction is that there is no reason why an interrupt of this kind need go
> through the OS kernel.

With the recent RISC chips (88000, MIPS and i860 come to mind) the overhead
of getting the machine state safely saved away in the low level exception
code is substantial.  You have to do this anyway before you can hand off
control to the user program "without going into the OS kernel", so the
savings don't amount to much.

----- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -----
----- jeff kenton  ---	temporarily at jkenton@pinocchio.encore.com ----- 
-----		   ---  always at (617) 894-4508  ---		    -----
----- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -----

mash@mips.COM (John Mashey) (09/18/90)

In article <12738@encore.Encore.COM> jkenton@pinocchio.encore.com (Jeff Kenton) writes:

>With the recent RISC chips (88000, MIPS and i860 come to mind) the overhead
>of getting the machine state safely saved away in the low level exception
>code is substantial.  You have to do this anyway before you can hand off
>control to the user program "without going into the OS kernel", so the
>savings don't amount to much.

1) In any machine, one must perform the appropriate state saving.

2) I recommend Mike O'Dell's paper in this summer's USENIX proceedings.
He has some insightful comments about the interaction of the
interface to the kernel and aggressive hardware design.  Specifically,
he described the serious performance issues of overcommitting to the
user (either explicitly, or even worse, implicitly) the state of the
machine, and what can/cannot be expected in a signal-handling
routine.  As an early example, consider the pain caused many people by
the implicit requirements inherent in the Bourne shell's use of
memory-fault handling.....

3) This is not to say that minimal-overhead fault-handling is a bad
thing - it isn't, just that it is another area where:
	a) One must be careful.
	b) Completely unexpected side-effects can pop up and bite
	you - in the case Mike described, providing the expected
	signal-handling behavior sometimes cost them 2X or more in
	performance.
	c) In general, exception-handling is one of the most difficult
	to get right, and stays the buggiest longest, and adds years
	to systems programmers' ages....

-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

lewine@cheshirecat.rtp.dg.com (Donald Lewine) (09/18/90)

In article <MOSS.90Sep17084423@ibis.cs.umass.edu>, moss@cs.umass.edu (Eliot Moss) writes:
|> While we're on the subject, a user-implemented flavor of system call is nice,
|> too. I'm thinking of something akin to the PDP-10 UUO (unimplemented user
|> operation) instruction. A short instruction that calls through a vector. This
|> is essentially a call to a global subroutine, but exactly which subroutine it
|> is can be changed dynamically (by changing the vector), and the instruction
|> fits into substantially fewer bits. 

Funny, that is exactly the problem we were trying to fix with the
VAX!  The *caller* should not need to know if he is calling the
kernel or another subroutine.  
 
One is far more likely to move a function into the kernel for speed,
or out into user space, than to want to change the library function
on the fly.  The problem with the UUO (or EMT on the PDP-11) is that 
the compiler had to know which routines were called with a CALL 
and which with a UUO.  [The alternative is a call to a routine which
does a UUO.  That does not provide the speed advantage you want!]
 
It is far better to use the RISC scheme of making all calls small and
fast than to make a selected few into UUOs.

BTW, if you do want to call a dynamically changing subroutine
	CALL @vector
works just fine.

--------------------------------------------------------------------
Donald A. Lewine                (508) 870-9008 Voice
Data General Corporation        (508) 366-0750 FAX
4400 Computer Drive. MS D112A
Westboro, MA 01580  U.S.A.

uucp: uunet!dg!lewine   Internet: lewine@cheshirecat.webo.dg.com

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (09/18/90)

In article <12738@encore.Encore.COM>, jkenton@pinocchio.encore.com (Jeff Kenton) writes:
> From article <MOSS.90Sep17084423@ibis.cs.umass.edu>, by moss@cs.umass.edu (Eliot Moss):
> > My reaction is that there is no reason why an interrupt of this kind need go
> > through the OS kernel.

> With the recent RISC chips (88000, MIPS and i860 come to mind) the overhead
> of getting the machine state safely saved away in the low level exception
> code is substantial.

But you are still thinking in terms of asynchronous interrupts.
What we're talking about here is a "trap", and there isn't the
slightest reason why a trap should have to save any more state
than a procedure call.  The trap is synchronous:  it cannot happen
at an _arbitrary_ point during the execution of an instruction,
only when the architect chose to have it be detected.  It relates
specifically to the currently executing thread, not to some other
I/O device.

To make this absolutely specific and relate it to existing practice,
suppose we wanted to provide something like the VAX mode where
integer overflow generates a trap.  So suppose we wanted to have
an instruction

	ADDI	src1, src2, dest
	;; dest := src1 + src2
	;; on (signed) overflow, this sets dest to the result modulo
	;; 2**32, and calls the procedure whose address is stored in
	;; [0x400].

Why does *this* have to do more state-saving than the instruction
sequence
	LOAD	[0x400], rtemp
	CALL	rtemp

If you use the same machinery for this as you do for "power failure"
and "device ready", yes it's reasonable to expect a detour through
the OS.  But why _should_ you implement _this_ kind of trap that way?
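
To make the comparison concrete, here is a small Python simulation of the
proposed ADDI. It is a sketch under stated assumptions, not a model of any
real machine: VECTOR, OVERFLOW_SLOT, and addi are hypothetical names, the
slot number only mirrors the [0x400] above, and operands are assumed to be
in signed 32-bit range already. The point it shows is that the overflow
path is nothing more than an indirect call through a vector entry.

```python
# Simulation only: a 32-bit add that, on signed overflow, computes the
# wrapped result and then calls whatever handler is installed in a vector
# slot. VECTOR, OVERFLOW_SLOT and addi are hypothetical names.

MASK32 = 0xFFFFFFFF
OVERFLOW_SLOT = 0x400   # mirrors the [0x400] in the post; just a dict key here
VECTOR = {}


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def addi(src1, src2):
    """Return (src1 + src2) modulo 2**32, read as a signed 32-bit value.

    If the true sum does not fit in 32 signed bits, call the handler in
    VECTOR[OVERFLOW_SLOT] with the operands and the wrapped result, the
    same way an ordinary indirect procedure call would.
    """
    true_sum = src1 + src2
    dest = to_signed32(true_sum)
    if dest != true_sum:
        VECTOR[OVERFLOW_SLOT](src1, src2, dest)
    return dest


# A user-level handler that just records each overflow it is told about.
overflows = []
VECTOR[OVERFLOW_SLOT] = lambda a, b, wrapped: overflows.append((a, b, wrapped))
```

Calling addi(1, 2) never touches the vector; calling addi(2**31 - 1, 1)
wraps to -2**31 and invokes the installed handler once.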

-- 
Heuer's Law:  Any feature is a bug unless it can be turned off.

jkenton@pinocchio.encore.com (Jeff Kenton) (09/18/90)

From article <3783@goanna.cs.rmit.oz.au>, by ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe):
> In article <12738@encore.Encore.COM>, jkenton@pinocchio.encore.com (Jeff Kenton) writes:
> 
>> With the recent RISC chips (88000, MIPS and i860 come to mind) the overhead
>> of getting the machine state safely saved away in the low level exception
>> code is substantial.
> 
> But you are still thinking in terms of asynchronous interrupts.
> What we're talking about here is a "trap", and there isn't the
> slightest reason why a trap should have to save any more state
> than a procedure call.  The trap is synchronous:  it cannot happen
> at an _arbitrary_ point during the execution of an instruction,

Depends on the specifics.  You gave a VAX example; I had the 88000 in mind.
Even if this occurs as a synchronous trap you need to protect both the OS
kernel and the user code thread which was executing (in case you want to
dismiss the exception and continue).  On the 88000 this means saving at
least the volatile registers, checking the data pipeline for data faults
and letting the floating point unit drain (or take further exceptions).
If you implement this feature using existing hardware and having the MMU
indicate illegal memory access (as someone suggested), the exception is
not even synchronous.  The MMU will indicate a data fault several cycles
after the faulting instruction is issued.

All the world is not a VAX.

----- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -----
----- jeff kenton  ---	temporarily at jkenton@pinocchio.encore.com ----- 
-----		   ---  always at (617) 894-4508  ---		    -----
----- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -----

bob@tera.com (Bob Alverson) (09/18/90)

In article <3783@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>But you are still thinking in terms of asynchronous interrupts.
>What we're talking about here is a "trap", and there isn't the
>slightest reason why a trap should have to save any more state
>than a procedure call.  The trap is synchronous:  it cannot happen
>at an _arbitrary_ point during the execution of an instruction,
>only when the architect chose to have it be detected.  It relates
>specifically to the currently executing thread, not to some other
>I/O device.

For a "RISC", having operations which both compute (or load) and
conditionally branch is not a large problem.  However, you must expect
to have at least one and probably several branch delay slots after the
instruction.  If the condition you are checking is more complicated
than the conditions the regular jumps test, then this complex op will
probably need more branch delay slots than a normal jump.  If
exceptions can occur while you are in these extended delay slots,
then the trap handler must know about them and handle yet another
bizarre case properly.  Is it worth it?

If you insist on no branch delay slots, then the instruction will take
longer to execute than the simple instructions it replaces, since it
cannot draw from the surrounding instructions to keep the pipeline
full (whereas the compiler can schedule the pipeline with separate
instructions).

Bob (bob@tera.com)

rro@debussy.cs.colostate.edu (Rod Oldehoeft) (09/19/90)

In article <12743@encore.Encore.COM> jkenton@pinocchio.encore.com (Jeff Kenton) writes:
>
>All the world is not a VAX.
>

Right.  Burton Smith's Tera Horizon architecture has the capability
for testing a little bit map in the memory instruction against the tag
on a data memory word, and doing a trap to a user-defined location on
a match or mismatch, with no intervention at all by an OS.  This nifty
feature has many uses:  the definition of I-structure memory without
software intervention, boundaries around stretches of memory, etc.
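
The tag-check mechanism described above can be sketched as a small Python
simulation. This is an illustration only and does not reproduce the actual
Horizon encoding: the 2-bit tag width, the TaggedMemory class, and the
EMPTY/FULL tag meanings are all assumptions made for the example.

```python
# Simulation only: each memory word carries a small tag, and each load
# carries a bit map saying which tag values should trap to a user routine.
# The 2-bit tag width and all names are assumptions, not the Horizon design.

TAG_BITS = 2  # assumed tag width, giving tag values 0..3


class TaggedMemory:
    def __init__(self, size):
        self.data = [0] * size
        self.tags = [0] * size

    def store(self, addr, value, tag=0):
        if not 0 <= tag < (1 << TAG_BITS):
            raise ValueError("tag out of range")
        self.data[addr] = value
        self.tags[addr] = tag

    def load(self, addr, trap_map, handler):
        """Load a word; if bit `tag` of trap_map is set, call handler instead.

        The handler runs in the caller's own context (no simulated kernel),
        and its return value replaces the loaded value.
        """
        tag = self.tags[addr]
        if (trap_map >> tag) & 1:
            return handler(self, addr, tag)
        return self.data[addr]


EMPTY, FULL = 0, 1  # hypothetical tag meanings for an I-structure-like cell


def on_empty(mem, addr, tag):
    """User routine for reads of a cell that has not been written yet."""
    return ("deferred", addr)


mem = TaggedMemory(4)
mem.store(0, 42, tag=FULL)
mem.store(1, 0, tag=EMPTY)
TRAP_ON_EMPTY = 1 << EMPTY
```

A load of the FULL cell returns its value directly; a load of the EMPTY
cell with TRAP_ON_EMPTY goes to on_empty; the same load with a trap map of
0 returns the raw word with no trap at all.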

I don't recall if arithmetic traps can be mapped to user routines just
as simply or not.

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (09/20/90)

In article <1990Sep18.152339.25203@tera.com>, bob@tera.com (Bob Alverson) writes:
> In article <3783@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:

> For a "RISC", having operations which both compute (or load) and
> conditionally branch is not a large problem.

I have to confess that I'm playing devil's advocate here.
I too am a True Believer in the Revealed Instruction Set Creed.
But I don't recall either Rubin or me talking about "RISC"s in this context.

> However, you must expect
> to have at least one and probably several branch delay slots after the
> instruction.

The particular case we were considering was "fetch from a buffer and
trap on underflow".  For many of today's machines, we already have to
worry about memory fetch delay, which would typically swamp the delay
due to the comparison.  Let's look at what's involved in a RISCish version:

	FETCH	curptr, limit, dest
	<instr>
	<instr>
 	-- now it is safe to use "dest"

As far as delay slot filling is concerned, this is exactly like
	[ Bcc <handler> | LOAD (curptr), dest ]
{that is, both kinds of instruction combined in one}.  Either the
instructions in the delay slots can safely be executed anyway, or
else you have to be able to annul them.  But that's true of any
conditional branch.  How complex is this condition?  Well, the 88k
does "if R1 = R2 then goto L" as one conditional branch, and that's
all we need here.  So

> If the condition you are checking is more complicated
> than the conditions the regular jumps test,

in _this_ specific case the condition would _not_ be more complicated.

> If exceptions can occur while you are in these extended delay slots,
> then the trap handler must know about them and handle yet another
> bizarre case properly.  Is it worth it?

What "extended" delay slots?  One or two slots, depending on how the
machine handles other _simple_ conditional branches.  If I have a
conditional branch
	BEQ R1, R2, L
my code at label L doesn't have to worry about exceptions in the
delay slot instructions following BEQ.  It's the handler for _those_
exceptions which has to worry.  So if an exception could happen in
the one or two instructions following a FETCH instruction, it's the
handler (this one's in the OS kernel) for _those_ exceptions which
has to know that FETCH is like a conditional branch, but it already
has to know about conditional branches.
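
The FETCH semantics argued for above can be sketched in Python. This is a
simulation under assumptions, not a description of any real instruction:
fetch, refill, and the (value, next_ptr) return convention are hypothetical,
and the underflow test is taken to be a plain equality of curptr and limit.

```python
# Simulation only: FETCH as "conditional branch fused with a load".
# If curptr == limit the buffer is exhausted and the handler is called;
# otherwise the word at curptr is returned. Names are hypothetical.


def fetch(memory, curptr, limit, handler):
    """Return (value, next_ptr), or whatever handler returns on underflow.

    The test is a plain equality of two "registers", the same kind of
    condition a simple compare-and-branch sequence evaluates.
    """
    if curptr == limit:
        return handler(memory, curptr, limit)
    return memory[curptr], curptr + 1


def refill(memory, curptr, limit):
    """User-level underflow handler: here it just signals end of input."""
    return None, curptr


buf = [10, 20, 30]
```

Walking curptr from 0 to len(buf) with fetch(buf, curptr, len(buf), refill)
yields each word in turn, and the handler runs only on the final call where
curptr has reached the limit.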

-- 
Heuer's Law:  Any feature is a bug unless it can be turned off.

dfields@neutrino.urbana.mcd.mot.com (David Fields) (09/20/90)

In article <3793@goanna.cs.rmit.oz.au>, ok@goanna.cs.rmit.oz.au (Richard
A. O'Keefe) writes:
|>As far as delay slot filling is concerned, this is exactly like
|>	[ Bcc <handler> | LOAD (curptr), dest ]
|>{that is, both kinds of instruction combined in one}.  Either the
|>instructions in the delay slots can safely be executed anyway, or
|>else you have to be able to annul them.  But that's true of any
|>conditional branch.  How complex is this condition?  Well, the 88k
|>does "if R1 = R2 then goto L" as one conditional branch, and that's
|>all we need here.  So
|>

Just to set the record straight, the 88k does not have such an instruction.
There is a conditional branch which tests for eq0,ne0,gt0,ge0,lt0 and le0
but if you want to test equality of two registers it's a two instruction
sequence.

Others have pointed out the tricks required to resume after an exception
on the 88100.
                                                        
Dave Fields // Motorola MCD // uiucuxc!udc!dfields //
dfields@urbana.mcd.mot.com