[comp.sys.m88k] Grabbing arithmetic overflow traps ?

wood@gen-rtx.rtp.dg.com (Tom Wood) (06/28/90)

In article <3418@oakhill.UUCP> shebanow@oakhill.UUCP (Mike Shebanow) writes:

> The 88k "add" and "sub" instructions do check for arithmetic overflow
> automatically.  I also have an Aviion; I replaced an "addu" instruction
> with an "add" in a piece of test code and purposely generated an
> overflow with the add.  The result was a floating point exception.
> You can catch this with your own handler w/o modifying the kernel.
> Unfortunately, looking at the gcc man page (online), I saw no way to
> get the compiler to generate "add" instead of "addu" when performing ops
> on signed integers (unless I overlooked something in the man page).
> Does anyone at DG know how to do this?

It wouldn't be too hard to change the `key' uses of addu/subu in the GCC
compiler to add/sub, but unfortunately, you would no longer have a C
compiler.  Perhaps that's too strong a statement, but in C, you typically
don't get integer overflow checks, and most often you don't want them.
---
			Tom Wood	(919) 248-6067
			Data General, Research Triangle Park, NC
			{the known world}!rti!xyzzy!wood, wood@dg-rtp.dg.com

aglew@basagran.csg.uiuc.edu (Andy Glew) (06/28/90)

Doesn't GCC support an escape to assembly language?  So, could the
fellow who wants overflow not just define his own "add with overflow
detection" function, which uses the assembly inline function for "add"
(with a trap handler for the desired side effects) --- or a more
detailed and portable check on other machines?
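
A minimal sketch of the "more detailed and portable check" mentioned above, written in plain C with no inline assembly and no trap. The name checked_add and its interface are hypothetical, chosen only for illustration; nothing like it is given in the thread.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *sum and returns true when the
 * exact result fits in an int.  On overflow it returns false and leaves
 * *sum untouched.  The range tests run before the addition, so the
 * signed add itself can never overflow. */
static bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}

/* Usage example. */
static void checked_add_demo(void)
{
    int r = 0;
    assert(checked_add(40, 2, &r) && r == 42);
    assert(!checked_add(INT_MAX, 1, &r));   /* would overflow */
    assert(r == 42);                        /* result left untouched */
}
```

The cost is two comparisons per add, which is the price of not relying on the hardware trap.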

--
Andy Glew, aglew@uiuc.edu

devil@techunix.BITNET (Gil Tene) (06/29/90)

Well, I am the guy who originally started this question running.
Actually, I don't need to have the C compiler generate instructions
which will cause overflows; I generate the binary code myself, as
part of the application. I DO NEED the overflow detection to be FAST,
or more importantly, to minimize overhead.

I want to thank all the people who replied. The basics were the same
from everyone: integer and floating point overflows both generate a
SIGFPE signal in the kernel, so all I have to do is provide a handler.
(I really didn't know this...).
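
A minimal sketch of "providing a handler", assuming only the ISO C signal() and raise() interfaces. It is not specific to DG/UX 4.20. The names on_sigfpe, install_overflow_handler and overflow_seen are hypothetical. The demo delivers the signal with raise() rather than a real "add" overflow, so it runs on any hosted C implementation.

```c
#include <assert.h>
#include <signal.h>

/* Set by the handler.  volatile sig_atomic_t is the object type that
 * ISO C guarantees can be written safely from a signal handler. */
static volatile sig_atomic_t overflow_seen = 0;

static void on_sigfpe(int signo)
{
    (void)signo;
    overflow_seen = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_overflow_handler(void)
{
    return signal(SIGFPE, on_sigfpe) == SIG_ERR ? -1 : 0;
}

/* Demo: deliver SIGFPE synchronously with raise() and check the flag. */
static void sigfpe_demo(void)
{
    overflow_seen = 0;
    assert(install_overflow_handler() == 0);
    raise(SIGFPE);
    assert(overflow_seen == 1);
}
```

One caveat: ISO C leaves the behaviour undefined if a handler simply returns from a SIGFPE raised by a real arithmetic exception, so a handler for genuine hardware overflows has to do more than set a flag. What exactly it must do is system-specific and is not covered here.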

Now for the followup question:

I can handle the overflows now, but signal overhead is HUGE for what I am
doing, since it includes a context switch and all the kernel overhead.
I want something that is FAST, since I may expect more than 5000 overflows
a second. My application is simulating a computer model, and must provide
the same functionality as the simulated machine. This includes overflow,
and means that the overflow flag of the simulated machine must be set
on every overflow that occurs, whether it is checked or not.

I would like opinions on the following "bad-programming" idea:

Interpose the kernel's overflow-trap handling routine with my own
routine. My routine will check the current context, to make sure it
is the same as the simulating process's. If the context is different,
the kernel's overflow routine is called; otherwise, my routine handles
the overflow, and simulates its behaviour on the simulated machine.
This would mean I need to allocate kernel memory space, stick my
routine in it, and handle the trap interposing. I know this is
"bad-programming", but it is much FASTER than letting the kernel do
the work.

Anyone have any suggestions/observations ?
(This is all done on an Aviion AV300, running DG/UX 4.20, BTW.)

AdvThanks,

Gil.

--
--------------------------------------------------------------------
| Gil Tene                      "Some days it just doesn't pay     |
| devil@techunix.technion.ac.il   to go to sleep in the morning."  |
--------------------------------------------------------------------

jkenton@pinocchio.encore.com (Jeff Kenton) (06/29/90)

From article <9733@discus.technion.ac.il>, by devil@techunix.BITNET (Gil Tene):
> 
> I would like opinions on the following "bad-programming" idea:
> 
> Interpose the kernel's overflow-trap handling routine with my own
> routine. My routine will check the current context, to make sure it
> is the same as the simulating process's. If the context is different,
> the kernel's overflow routine is called; otherwise, my routine handles
> the overflow, and simulates its behaviour on the simulated machine.
> This would mean I need to allocate kernel memory space, stick my
> routine in it, and handle the trap interposing. I know this is
> "bad-programming", but it is much FASTER than letting the kernel do
> the work.
> 

I don't think this is a bad idea at all (although I can imagine a *bad*
implementation).  Kernel overhead is huge.  If you can turn overflow
exceptions into "fast traps" for your own processes without changing
the semantics of overflow for others, go for it.

On the 88000, people have found lots of uses for "fast traps", where
you get into and out of the kernel without the full overhead.

*** WARNING:  *** Warning Ahead ***  WARNING ***

However, most fast trap implementations I have seen use real trap
instructions.  You can't.  Therefore, you have to be very careful
about the memory pipeline and the floating point pipeline.  If there
is anything in the data pipeline, you are probably better off going
into normal kernel processing (with the registers *exactly* right).
For the FP pipe, you probably have to keep it disabled throughout
your (short) procedure, since there is no good way to find out what's
in it (and pass that info to the kernel).  That means no FP instructions,
of course, and no multiply or divide.

*** RETRACTION ***

Maybe it is a *BAD* idea.  If you have kernel source and know exactly
what you're doing, it could certainly work, and be much faster.  But,
it may be scarier than I thought three paragraphs ago.

Good luck.

Questions? --> e-mail: jkenton@pinocchio.encore.com


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      jeff kenton  ---	temporarily at jkenton@pinocchio.encore.com	 
		   ---  always at (617) 894-4508  ---
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

andrew@frip.WV.TEK.COM (Andrew Klossner) (06/29/90)

We've given some thought to these problems, but haven't implemented
anything.  Some thoughts:

Yes, signal overhead is huge.  88k exception overhead is pretty large
all by itself.  On any exception, you've got to clean out the
pipelines.  There can be up to three data loads/stores suspended in
flight, so you have to relaunch them, and two of them might cause data
access exceptions (if, for example, they refer to invalid virtual
addresses).  You must also clean out the floating point unit pipelines
and deal with any exceptions arising from this.  This is tricky code to
get right, and it must execute with shadow registers frozen, so you
can't use a conventional debugger.  (However, a Tektronix DAS 9200
ICEbox is quite useful in these circumstances -- end of commercial.)

If you want to plug into the kernel's floating point exception handler,
you'll likely find yourself operating within this constricted
environment.  I've been hacking 88k kernels for three years, but I
wouldn't want to take on this task.

Here are a couple of alternative means to your end:

1:  Non-trap overflow detection.  Yes, you lose cycles if you have to
follow every addu and subu with conditional branching.  It helps that a
non-taken conditional branch eats only one cycle.  If you're taking
5000 overflows a second, that's one or more overflows per 1000
operations.  If you can detect overflow with a single one-cycle
conditional branch, you'll do as well as if you install a trap handler
that takes 1000 or more cycles to complete.

2: Restrictive overflow handling.  Arrange that all pipelines will be
empty when you perform an add or subtract, either by sophisticated
instruction scheduling or by using "tb1 0,r0,0" instructions to wall
off your add/sub code from loads, stores, and floating point or
multiply/divide instructions.  Modify the kernel so that, when your
process is executing, integer overflow exceptions are delivered
directly to you, bypassing pipeline correction.  The kernel code might
implement this by changing the code at the integer overflow exception
vector to something like this:

vector+0x48:
	br.n	custom_int_overflow
	stcr	r1,sr0		; Save user's r1.

	...

custom_int_overflow:
	subu	r31,r31,4	; Stack the SNIP -- address of instruction
	ldcr	r1,snip		;   about to be executed.
	st.usr	r1,r31,r0	; System will take ERR exception and crash
				;   if user's r31 is invalid.
	subu	r31,r31,4	; Stack the SXIP -- address of faulting
	ldcr	r1,sxip		;   instruction.
	st.usr	r1,r31,r0
				; Fetch the address of the user's exception
				;   handler.
	or.u	r1,r0,hi16(u.u_int_handler)
	ld	r1,r1,lo16(u.u_int_handler)
				; Fill load shadow with as many instructions
				;   as possible:
	stcr	r0,snip		; Clear valid bit in SNIP.
	stcr	r0,ssbr		; Wipe out all shadow scoreboard bits.
				;   Otherwise system will hang at RTE if
				;   pipelines were not in fact empty.
	or	r1,r1,2		; Turn on the "valid" bit, and arrange that
	stcr	r1,sfip		;   execution will resume at user's exception
				;   handler.
	ldcr	r1,sr0		; Restore user's r1.
	rte			; Return to user code.

We call this "lightweight exception dispatch."  Several further
simplifications are possible.  When any process other than yours is
running, the code at vector+0x48 would point to the usual kernel
overflow handler.
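
Alternative 1 above (non-trap overflow detection) can be sketched in C as follows. This is an illustration under stated assumptions, not code from the thread: struct sim_cpu and sim_add are hypothetical names, and the flag here reflects only the most recent add. The add is done in unsigned arithmetic, which wraps the way "addu" does, and the overflow condition is then derived from the sign bits instead of from a trap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state of the simulated machine: only the overflow flag. */
struct sim_cpu {
    int overflow;   /* 1 if the most recent add overflowed, else 0 */
};

/* Wraparound add followed by a sign-bit test.  Signed overflow happened
 * exactly when both operands share a sign and the result's sign differs.
 * The final conversion back to int32_t is implementation-defined for
 * out-of-range values; two's-complement wraparound is assumed here. */
static int32_t sim_add(struct sim_cpu *cpu, int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t ur = ua + ub;                  /* unsigned add cannot trap */
    cpu->overflow = (int)(((ua ^ ur) & (ub ^ ur)) >> 31);
    return (int32_t)ur;
}

/* Usage example. */
static void sim_add_demo(void)
{
    struct sim_cpu cpu = { 0 };
    int32_t r = sim_add(&cpu, INT32_MAX, 1);
    assert(r == INT32_MIN);
    assert(cpu.overflow == 1);
}
```

This form computes the flag unconditionally, which fits a simulator that must record every overflow whether or not the simulated program tests for it. A version that branches only on the rare overflow case would be closer to the one-cycle non-taken branch described in alternative 1.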

  -=- Andrew Klossner   (uunet!tektronix!frip.WV.TEK!andrew)    [UUCP]
                        (andrew%frip.wv.tek.com@relay.cs.net)   [ARPA]