wood@gen-rtx.rtp.dg.com (Tom Wood) (06/28/90)
In article <3418@oakhill.UUCP> shebanow@oakhill.UUCP (Mike Shebanow) writes:
> The 88k "add" and "sub" instructions do check for arithmetic overflow
> automatically. I also have an Aviion; I replaced an "addu" instruction
> with an "add" in a piece of test code and purposely generated an
> overflow with the add. The result was a floating point exception.
> You can catch this with your own handler w/o modifying the kernel.
> Unfortunately, looking at the gcc man page (online), I saw no way to
> get the compiler to generate "add" instead of "addu" when performing ops
> on signed integers (unless I overlooked something in the man page).
> Does anyone at DG know how to do this?

It wouldn't be too hard to change the `key' uses of addu/subu in the GCC compiler to add/sub, but unfortunately, you would no longer have a C compiler. Perhaps that's too strong a statement, but in C, you typically don't get integer overflow checks, and most often you don't want them.
---
Tom Wood  (919) 248-6067
Data General, Research Triangle Park, NC
{the known world}!rti!xyzzy!wood, wood@dg-rtp.dg.com
aglew@basagran.csg.uiuc.edu (Andy Glew) (06/28/90)
Doesn't GCC support an escape to assembly language? So, could the fellow who wants overflow not just define his own "add with overflow detection" function, which uses the assembly inline function for "add" (with a trap handler for the desired side effects) --- or a more detailed and portable check on other machines? -- Andy Glew, aglew@uiuc.edu
devil@techunix.BITNET (Gil Tene) (06/29/90)
Well, I am the guy who originally started this question running. Actually, I don't need to have the C compiler generate instructions which will cause overflows; I generate the binary code myself, and this is part of the application. I DO NEED the overflow detection to be FAST, or, more importantly, to have minimal overhead.

I want to thank all the people who replied. The basics were the same from everyone: integer and floating point overflows both generate a SIGFPE signal in the kernel, and all I have to do is provide a handler. (I really didn't know this...)

Now for the followup question: I can handle the overflows now, but signal overhead is HUGE for what I am doing, since it includes a context switch and all the kernel overhead. I want something that is FAST, since I may expect more than 5000 overflows a second. My application simulates a computer model, and must provide the same functionality as the simulated machine. This includes overflow, and means that the overflow flag of the simulated machine must be set on every overflow that occurs, whether it is checked or not.

I would like opinions on the following "bad-programming" idea: interpose my own routine in front of the kernel's overflow-trap handling routine. My routine will check the current context, to make sure it is the simulating process's. If the context is different, the kernel's overflow routine is called; otherwise, my routine handles the overflow and simulates its behaviour on the simulated machine. This would mean I need to allocate kernel memory space, stick my routine in it, and handle the trap interposing. I know this is "bad-programming", but it is much FASTER than letting the kernel do the work.

Anyone have any suggestions/observations? (This is all done on an Aviion AV300, running DG/UX 4.20, BTW.)

AdvThanks,
Gil.
--
--------------------------------------------------------------------
| Gil Tene                        "Some days it just doesn't pay   |
| devil@techunix.technion.ac.il    to go to sleep in the morning."  |
--------------------------------------------------------------------
jkenton@pinocchio.encore.com (Jeff Kenton) (06/29/90)
From article <9733@discus.technion.ac.il>, by devil@techunix.BITNET (Gil Tene):
>
> I would like opinions on the following "bad-programming" idea:
>
> Interpose the kernel's overflow-trap handling routine with my own
> routine. My routine will check the current context, to make sure it
> is the same as the simualting process's. If the context is different,
> the kernel's overflow routine is called, otherwise, my routine handles
> the overflow, and simulates it's behaviour on the simulated machine.
> This would mean I need to allocate kernel memory space, stick my
> routine in it, and handle the trap interposing. I know this is
> "bad-programming", but it is much FASTER then letting the kernel do
> the work.
>

I don't think this is a bad idea at all (although I can imagine a *bad* implementation). Kernel overhead is huge. If you can turn overflow exceptions into "fast traps" for your own processes without changing the semantics of overflow for others, go for it. On the 88000, people have found lots of uses for "fast traps", where you get into and out of the kernel without the full overhead.

*** WARNING: *** Warning Ahead *** WARNING ***

However, most fast trap implementations I have seen use real trap instructions. You can't. Therefore, you have to be very careful about the memory pipeline and the floating point pipeline. If there is anything in the data pipeline, you are probably better off going into normal kernel processing (with the registers *exactly* right). For the FP pipe, you probably have to keep it disabled throughout your (short) procedure, since there is no good way to find out what's in it (and pass that info to the kernel). That means no FP instructions, of course, and no multiply or divide (on the 88100, integer multiply and divide also execute in the floating point unit).

*** RETRACTION ***

Maybe it is a *BAD* idea. If you have kernel source and know exactly what you're doing, it could certainly work, and be much faster. But it may be scarier than I thought three paragraphs ago.

Good luck. Questions?
--> e-mail: jkenton@pinocchio.encore.com - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - jeff kenton --- temporarily at jkenton@pinocchio.encore.com --- always at (617) 894-4508 --- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
andrew@frip.WV.TEK.COM (Andrew Klossner) (06/29/90)
We've given some thought to these problems, but haven't implemented anything. Some thoughts:

Yes, signal overhead is huge. 88k exception overhead is pretty large all by itself. On any exception, you've got to clean out the pipelines. There can be up to three data loads/stores suspended in flight, so you have to relaunch them, and two of them might cause data access exceptions (if, for example, they refer to invalid virtual addresses). You must also clean out the floating point unit pipelines and deal with any exceptions arising from this. This is tricky code to get right, and it must execute with shadow registers frozen, so you can't use a conventional debugger. (However, a Tektronix DAS 9200 ICEbox is quite useful in these circumstances -- end of commercial.)

If you want to plug into the kernel's floating point exception handler, you'll likely find yourself operating within this constricted environment. I've been hacking 88k kernels for three years, but I wouldn't want to take on this task.

Here are a couple of alternative means to your end:

1: Non-trap overflow detection. Yes, you lose cycles if you have to follow every addu and subu with conditional branching. It helps that a non-taken conditional branch eats only one cycle. If you're taking 5000 overflows a second, that's one or more overflows per 1000 operations. If you can detect overflow with a single one-cycle conditional branch, you'll do as well as if you install a trap handler that takes 1000 or more cycles to complete.

2: Restrictive overflow handling. Arrange that all pipelines will be empty when you perform an add or subtract, either by sophisticated instruction scheduling or by using "tb1 0,r0,0" instructions to wall off your add/sub code from loads, stores, and floating point or multiply/divide instructions. Modify the kernel so that, when your process is executing, integer overflow exceptions are delivered directly to you, bypassing pipeline correction.
The kernel code might implement this by changing the code at the integer overflow exception vector to something like this:

    vector+0x48:
            br.n    custom_int_overflow
            stcr    r1,sr0          ; Save user's r1.
    ...
    custom_int_overflow:
            subu    r31,r31,4       ; Stack the SNIP -- address of instruction
            ldcr    r1,snip         ; about to be executed.
            st.usr  r1,r31,r0       ; System will take ERR exception and crash
                                    ; if user's r31 is invalid.
            subu    r31,r31,4       ; Stack the SXIP -- address of faulting
            ldcr    r1,sxip         ; instruction.
            st.usr  r1,r31,r0
                                    ; Fetch the address of the user's exception
                                    ; handler.
            or.u    r1,r0,hi16(u.u_int_handler)
            ld      r1,r1,lo16(u.u_int_handler)
                                    ; Fill load shadow with as many instructions
                                    ; as possible:
            stcr    r0,snip         ; Clear valid bit in SNIP.
            stcr    r0,ssbr         ; Wipe out all shadow scoreboard bits.
                                    ; Otherwise system will hang at RTE if
                                    ; pipelines were not in fact empty.
            or      r1,r1,2         ; Turn on the "valid" bit, and arrange that
            stcr    r1,sfip         ; execution will resume at user's exception
                                    ; handler.
            ldcr    r1,sr0          ; Restore user's r1.
            rte                     ; Return to user code.

We call this "lightweight exception dispatch." Several further simplifications are possible.

When any process other than yours is running, the code at vector+0x48 would point to the usual kernel overflow handler.

-=-
Andrew Klossner   (uunet!tektronix!frip.WV.TEK!andrew)    [UUCP]
                  (andrew%frip.wv.tek.com@relay.cs.net)   [ARPA]