moss@cs.umass.edu (Eliot Moss) (09/17/90)
In article <2128@key.COM> sjc@key.COM (Steve Correll) writes:
Well, Prof. Rubin _did_ ask for an "interrupt", and most Unix kernels insist
on being involved in any interrupts. And Unix is not unique in this.
My reaction is that there is no reason why an interrupt of this kind need go
through the OS kernel. I feel that there are many synchronous interrupts that
should be deliverable directly to the user program (or language run-time
system) without kernel intervention. Examples include overflow, divide by
zero, range/bounds errors, and even certain kinds of memory access faults
(i.e., ones where the user program is going to take and handle the exception).
It is reasonable for the machine to do a forced call through a vector location
in user space. If the user really does not want to handle the error, putting a
bad address in the vector would force a trap into the kernel.
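
A minimal C sketch of the idea, simulated in software rather than taken from any real machine: a per-trap vector lives in user space, a fault makes a forced call through it, and an unusable slot falls through to the kernel. All names here (`user_vector`, `raise_trap`, `kernel_trap`, `checked_div`) are hypothetical, and a NULL slot stands in for the "bad address" case.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical trap kinds; illustrative only, not from any real ABI. */
enum trap_kind { TRAP_OVERFLOW, TRAP_DIVIDE_BY_ZERO, TRAP_BOUNDS, TRAP_COUNT };

typedef long (*trap_handler)(long a, long b);

/* The vector in user space: one slot per synchronous trap. */
trap_handler user_vector[TRAP_COUNT];

/* Counts how often the simulated kernel path was taken. */
int kernel_traps;

/* Stand-in for the kernel path (a real system might deliver a signal). */
long kernel_trap(enum trap_kind kind) {
    (void)kind;
    kernel_traps++;
    return 0;
}

/* What the hardware would do on a trap: a forced call through the vector.
 * A NULL slot models a bad address and ends up in the kernel instead. */
long raise_trap(enum trap_kind kind, long a, long b) {
    trap_handler h = user_vector[kind];
    return h ? h(a, b) : kernel_trap(kind);
}

/* Division that traps instead of running into undefined behaviour. */
long checked_div(long a, long b) {
    if (b == 0)
        return raise_trap(TRAP_DIVIDE_BY_ZERO, a, b);
    if (a == LONG_MIN && b == -1)
        return raise_trap(TRAP_OVERFLOW, a, b);
    return a / b;
}

/* Example user handler: substitute a saturated result for x / 0. */
long saturate_on_div0(long a, long b) {
    (void)b;
    return a < 0 ? LONG_MIN : LONG_MAX;
}
```

Installing a handler is just a store into the vector (`user_vector[TRAP_DIVIDE_BY_ZERO] = saturate_on_div0;`); leaving the slot empty keeps the kernel in the loop.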
While we're on the subject, a user-implemented flavor of system call is nice,
too. I'm thinking of something akin to the PDP-10 UUO (unimplemented user
operation) instruction. A short instruction that calls through a vector. This
is essentially a call to a global subroutine, but exactly which subroutine it
is can be changed dynamically (by changing the vector), and the instruction
fits into substantially fewer bits. (Not entirely clear what the interaction
is with RISC here.)
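
As a rough illustration only, the UUO idea can be modelled in C as a small opcode-indexed table of function pointers. The 4-bit opcode field, the names `uuo_vector` and `uuo`, and the two sample routines are assumptions made for this sketch; they do not describe the actual PDP-10 encoding.

```c
#include <stdbool.h>
#include <stddef.h>

enum { UUO_SLOTS = 16 };   /* assumed: a 4-bit opcode field */

typedef int (*uuo_routine)(int arg);

/* The vector: which routine each short opcode currently reaches. */
uuo_routine uuo_vector[UUO_SLOTS];

/* Models executing "UUO op, arg". Only the low bits of op are encoded.
 * Returns false when the slot is empty; *result is left untouched then. */
bool uuo(unsigned op, int arg, int *result) {
    uuo_routine r = uuo_vector[op % UUO_SLOTS];
    if (r == NULL)
        return false;
    *result = r(arg);
    return true;
}

/* Two sample routines to swap in and out of a slot. */
int twice(int x)  { return 2 * x; }
int negate(int x) { return -x; }
```

Rebinding a call site is a single store such as `uuo_vector[3] = negate;`, with no change to the calling code.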
Cheers!
--
J. Eliot B. Moss, Assistant Professor
Department of Computer and Information Science
Lederle Graduate Research Center
University of Massachusetts
Amherst, MA 01003
(413) 545-4206; Moss@cs.umass.edu
jkenton@pinocchio.encore.com (Jeff Kenton) (09/17/90)
From article <MOSS.90Sep17084423@ibis.cs.umass.edu>, by moss@cs.umass.edu (Eliot Moss):
> In article <2128@key.COM> sjc@key.COM (Steve Correll) writes:
>
> Well, Prof. Rubin _did_ ask for an "interrupt", and most Unix kernels insist
> on being involved in any interrupts. And Unix is not unique in this.
>
> My reaction is that there is no reason why an interrupt of this kind need go
> through the OS kernel.

With the recent RISC chips (88000, MIPS and i860 come to mind) the overhead
of getting the machine state safely saved away in the low level exception
code is substantial. You have to do this anyway before you can hand off
control to the user program "without going into the OS kernel", so the
savings don't amount to much.

----- jeff kenton --- temporarily at jkenton@pinocchio.encore.com -----
-----             --- always at (617) 894-4508 ---                -----
mash@mips.COM (John Mashey) (09/18/90)
In article <12738@encore.Encore.COM> jkenton@pinocchio.encore.com (Jeff Kenton) writes:
>With the recent RISC chips (88000, MIPS and i860 come to mind) the overhead
>of getting the machine state safely saved away in the low level exception
>code is substantial. You have to do this anyway before you can hand off
>control to the user program "without going into the OS kernel", so the
>savings don't amount to much.

1) In any machine, one must perform the appropriate state saving.

2) I recommend Mike O'Dell's paper in this summer's USENIX proceedings.
He has some insightful comments about the interaction of the interface
to the kernel and aggressive hardware design. Specifically, he described
the serious performance issues of overcommitting to the user (either
explicitly, or even worse, implicitly) the state of the machine, and what
can/cannot be expected in a signal-handling routine. As an early example,
consider the pain caused many people by the implicit requirements inherent
in the Bourne shell's use of memory-fault handling.....

3) This is not to say that minimal-overhead fault-handling is a bad thing -
it isn't, just that it is another area where:
	a) One must be careful.
	b) Completely unexpected side-effects can pop up and bite you - in
	the case Mike described, providing the expected signal-handling
	behavior sometimes cost them 2X or more in performance.
	c) In general, exception-handling is one of the most difficult
	things to get right, and stays the buggiest longest, and adds years
	to systems programmers' ages....
--
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
lewine@cheshirecat.rtp.dg.com (Donald Lewine) (09/18/90)
In article <MOSS.90Sep17084423@ibis.cs.umass.edu>, moss@cs.umass.edu (Eliot Moss) writes:
|> While we're on the subject, a user-implemented flavor of system call is nice,
|> too. I'm thinking of something akin to the PDP-10 UUO (unimplemented user
|> operation) instruction. A short instruction that calls through a vector. This
|> is essentially a call to a global subroutine, but exactly which subroutine it
|> is can be changed dynamically (by changing the vector), and the instruction
|> fits into substantially fewer bits.

Funny, that is exactly the problem we were trying to fix with the VAX!
The *caller* should not need to know if he is calling the kernel or
another subroutine. It is far more likely to move a function into
the kernel for speed or into user space than to want to change the
library function on the fly.

The problem with the UUO (or EMT on the PDP-11) is that the compiler
had to know which routines were called with a CALL and which with a
UUO. [The alternative is a call to a routine which does a UUO. That
does not provide the speed advantage you want!]

It is far better to use the RISC scheme of making all calls small and
fast than to make a selected few into UUOs.

BTW, if you do want to call a dynamically changing subroutine,

	CALL @vector

works just fine.

--------------------------------------------------------------------
Donald A. Lewine                (508) 870-9008 Voice
Data General Corporation        (508) 366-0750 FAX
4400 Computer Drive. MS D112A
Westboro, MA 01580 U.S.A.

uucp: uunet!dg!lewine   Internet: lewine@cheshirecat.webo.dg.com
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (09/18/90)
In article <12738@encore.Encore.COM>, jkenton@pinocchio.encore.com (Jeff Kenton) writes:
> From article <MOSS.90Sep17084423@ibis.cs.umass.edu>, by moss@cs.umass.edu (Eliot Moss):
> > My reaction is that there is no reason why an interrupt of this kind need go
> > through the OS kernel.

> With the recent RISC chips (88000, MIPS and i860 come to mind) the overhead
> of getting the machine state safely saved away in the low level exception
> code is substantial.

But you are still thinking in terms of asynchronous interrupts.
What we're talking about here is a "trap", and there isn't the
slightest reason why a trap should have to save any more state
than a procedure call. The trap is synchronous: it cannot happen
at an _arbitrary_ point during the execution of an instruction,
only when the architect chose to have it be detected. It relates
specifically to the currently executing thread, not to some other
I/O device.

To make this absolutely specific and relate it to existing practice,
suppose we wanted to provide something like the VAX mode where integer
overflow generates a trap. So suppose we wanted to have an instruction

	ADDI src1, src2, dest
		;; dest := src1 + src2
		;; on (signed) overflow, this sets dest to the result modulo
		;; 2**32, and calls the procedure whose address is stored in
		;; [0x400].

Why does *this* have to do more state-saving than the instruction sequence

	LOAD [0x400], rtemp
	CALL rtemp

If you use the same machinery for this as you do for "power failure" and
"device ready", yes it's reasonable to expect a detour through the OS.
But why _should_ you implement _this_ kind of trap that way?
--
Heuer's Law: Any feature is a bug unless it can be turned off.
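
The semantics of the ADDI instruction described above can be modelled in C; this is a sketch of the behaviour, not of any hardware. The variable `overflow_vector` stands in for the word at [0x400], and the handler signature and the recording handler `count_overflow` are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*overflow_handler)(int32_t src1, int32_t src2, int32_t wrapped);

/* Models the procedure address stored at [0x400]. */
overflow_handler overflow_vector;

/* Convert a 32-bit pattern to its two's-complement value without
 * relying on implementation-defined narrowing conversions. */
int32_t to_signed(uint32_t u) {
    return u <= (uint32_t)INT32_MAX
        ? (int32_t)u
        : (int32_t)(u - 0x80000000u) + INT32_MIN;
}

/* ADDI src1, src2, dest: dest := src1 + src2 modulo 2**32; on signed
 * overflow, additionally call through the vector (if one is installed). */
int32_t addi(int32_t src1, int32_t src2) {
    uint32_t a = (uint32_t)src1, b = (uint32_t)src2;
    uint32_t sum = a + b;                       /* wraps modulo 2**32 */
    /* Overflow iff both operands differ in sign from the result. */
    int overflowed = (int)(((a ^ sum) & (b ^ sum)) >> 31);
    int32_t dest = to_signed(sum);
    if (overflowed && overflow_vector != NULL)
        overflow_vector(src1, src2, dest);      /* the "trap": a plain call */
    return dest;
}

/* Example handler that only records what happened. */
int overflow_count;
int32_t last_wrapped;
void count_overflow(int32_t src1, int32_t src2, int32_t wrapped) {
    (void)src1; (void)src2;
    overflow_count++;
    last_wrapped = wrapped;
}
```

In this model the trap costs no more than the indirect call it contains, which is the point of the comparison with the LOAD/CALL sequence; whether real pipelines can deliver it that cheaply is exactly what the rest of the thread disputes.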
jkenton@pinocchio.encore.com (Jeff Kenton) (09/18/90)
From article <3783@goanna.cs.rmit.oz.au>, by ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe):
> In article <12738@encore.Encore.COM>, jkenton@pinocchio.encore.com (Jeff Kenton) writes:
>
>> With the recent RISC chips (88000, MIPS and i860 come to mind) the overhead
>> of getting the machine state safely saved away in the low level exception
>> code is substantial.
>
> But you are still thinking in terms of asynchronous interrupts.
> What we're talking about here is a "trap", and there isn't the
> slightest reason why a trap should have to save any more state
> than a procedure call. The trap is synchronous: it cannot happen
> at an _arbitrary_ point during the execution of an instruction,

Depends on the specifics. You gave a VAX example; I had the 88000 in mind.

Even if this occurs as a synchronous trap you need to protect both the OS
kernel and the user code thread which was executing (in case you want to
dismiss the exception and continue). On the 88000 this means saving at
least the volatile registers, checking the data pipeline for data faults
and letting the floating point unit drain (or take further exceptions).

If you implement this feature using existing hardware and having the MMU
indicate illegal memory access (as someone suggested), the exception is
not even synchronous. The MMU will indicate a data fault several cycles
after the faulting instruction is issued.

All the world is not a VAX.

----- jeff kenton --- temporarily at jkenton@pinocchio.encore.com -----
-----             --- always at (617) 894-4508 ---                -----
bob@tera.com (Bob Alverson) (09/18/90)
In article <3783@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>But you are still thinking in terms of asynchronous interrupts.
>What we're talking about here is a "trap", and there isn't the
>slightest reason why a trap should have to save any more state
>than a procedure call. The trap is synchronous: it cannot happen
>at an _arbitrary_ point during the execution of an instruction,
>only when the architect chose to have it be detected. It relates
>specifically to the currently executing thread, not to some other
>I/O device.

For a "RISC", having operations which both compute (or load) and
conditionally branch is not a large problem. However, you must expect
to have at least one and probably several branch delay slots after the
instruction. If the condition you are checking is more complicated
than the conditions the regular jumps test, then this complex op will
probably need more branch delay slots than a normal jump.

If exceptions can occur while you are in these extended delay slots,
then the trap handler must know about them and handle yet another
bizarre case properly. Is it worth it?

If you insist on no branch delay slots, then the instruction will take
longer to execute than the simple instructions it replaces, since it
cannot draw from the surrounding instructions to keep the pipeline full
(whereas the compiler can schedule the pipeline with separate
instructions).

Bob (bob@tera.com)
rro@debussy.cs.colostate.edu (Rod Oldehoeft) (09/19/90)
In article <12743@encore.Encore.COM> jkenton@pinocchio.encore.com (Jeff Kenton) writes:
>
>All the world is not a VAX.
>

Right. Burton Smith's Tera Horizon architecture has the capability
for testing a little bit map in the memory instruction against the
tag on a data memory word, and doing a trap to a user-defined location
on a match or mismatch, with no intervention at all by an OS.

This nifty feature has many uses: the definition of I-structure
memory without software intervention, boundaries around stretches
of memory, etc.

I don't recall if arithmetic traps can be mapped to user routines
just as simply or not.
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (09/20/90)
In article <1990Sep18.152339.25203@tera.com>, bob@tera.com (Bob Alverson) writes:
> In article <3783@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:

> For a "RISC", having operations which both compute (or load) and
> conditionally branch is not a large problem.

I have to confess that I'm playing devil's advocate here. I too am a
True Believer in the Revealed Instruction Set Creed. But I don't recall
either Rubin or me talking about "RISC"s in this context.

> However, you must expect
> to have at least one and probably several branch delay slots after the
> instruction.

The particular case we were considering was "fetch from a buffer and
trap on underflow". For many of today's machines, we already have to
worry about memory fetch delay, which would typically swamp the delay
due to the comparison. Let's look at what's involved in a RISCish version:

	FETCH curptr, limit, dest
	<instr>
	<instr>
	-- now it is safe to use "dest"

As far as delay slot filling is concerned, this is exactly like
	[ Bcc <handler> | LOAD (curptr), dest ]
{that is, both kinds of instruction combined in one}. Either the
instructions in the delay slots can safely be executed anyway, or
else you have to be able to annul them. But that's true of any
conditional branch. How complex is this condition? Well, the 88k
does "if R1 = R2 then goto L" as one conditional branch, and that's
all we need here. So

> If the condition you are checking is more complicated
> than the conditions the regular jumps test,

in _this_ specific case the condition would _not_ be more complicated.

> If exceptions can occur while you are in these extended delay slots,
> then the trap handler must know about them and handle yet another
> bizarre case properly. Is it worth it?

What "extended" delay slots? One or two slots, depending on how the
machine handles other _simple_ conditional branches.

If I have a conditional branch

	BEQ R1, R2, L

my code at label L doesn't have to worry about exceptions in the delay
slot instructions following BEQ. It's the handler for _those_ exceptions
which has to worry. So if an exception could happen in the one or two
instructions following a FETCH instruction, it's the handler (this one's
in the OS kernel) for _those_ exceptions which has to know that FETCH is
like a conditional branch, but it already has to know about conditional
branches.
--
Heuer's Law: Any feature is a bug unless it can be turned off.
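
In software terms, the FETCH operation discussed above resembles a buffered get-character with a refill hook. The following C sketch models only that behaviour: one equality test between the current pointer and the limit, with the rare case handed to a user-supplied routine. The `struct stream` layout, `fetch`, and the sample handler `refill_once` are hypothetical names invented for the example.

```c
#include <stddef.h>

struct stream {
    const unsigned char *cur;     /* curptr: next byte to deliver      */
    const unsigned char *limit;   /* limit: one past the last byte     */
    /* User-level "trap handler": returns a byte, or -1 at end of data. */
    int (*on_underflow)(struct stream *s);
};

/* FETCH curptr, limit, dest: a single equality test guards the load. */
int fetch(struct stream *s) {
    if (s->cur == s->limit)           /* the "if R1 = R2" condition */
        return s->on_underflow(s);    /* taken only on underflow    */
    return *s->cur++;
}

/* Sample handler: supply one extra chunk, then report end of data. */
const unsigned char second_chunk[2] = { 'c', 'd' };
int chunk_used;

int refill_once(struct stream *s) {
    if (chunk_used)
        return -1;
    chunk_used = 1;
    s->cur = second_chunk;
    s->limit = second_chunk + 2;
    return *s->cur++;
}
```

The common path is a compare and a load; the handler runs as an ordinary call in user code, which is the shape of the argument that no kernel detour is needed for this kind of trap.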
dfields@neutrino.urbana.mcd.mot.com (David Fields) (09/20/90)
In article <3793@goanna.cs.rmit.oz.au>, ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
|>As far as delay slot filling is concerned, this is exactly like
|>	[ Bcc <handler> | LOAD (curptr), dest ]
|>{that is, both kinds of instruction combined in one}. Either the
|>instructions in the delay slots can safely be executed anyway, or
|>else you have to be able to annul them. But that's true of any
|>conditional branch. How complex is this condition? Well, the 88k
|>does "if R1 = R2 then goto L" as one conditional branch, and that's
|>all we need here. So
|>

Just to set the record straight, the 88k does not have such an
instruction. There is a conditional branch which tests for
eq0, ne0, gt0, ge0, lt0 and le0, but if you want to test equality of two
registers it's a two instruction sequence.

Others have pointed out the tricks required to resume after an
exception on the 88100.

Dave Fields // Motorola MCD // uiucuxc!udc!dfields // dfields@urbana.mcd.mot.com