sdempsey@UCSD.EDU (Steve Dempsey) (07/26/90)
The following discussion pertains to a 4D/25TG running 3.2.1 and a
4D/340VGX running 3.3.
Recently I have been doing a performance analysis of a number cruncher
program that runs much more slowly on IRISes than one would expect.
I fired up gr_osview and ran the program, expecting to see lots of
system calls or swapping, and an indication of where the cpu time was being
wasted. What I saw was something quite strange! The cpu was spending
99% of its time in user mode, just like any decent number cruncher should.
The shock came from the interrupt rate, which went from a background level
of 200-400 per second up to ~20K per second (35K on the 340VGX!)
Ultimately, I discovered that the extra interrupts were occurring whenever
floating point operations resulted in underflow. This behavior can be
demonstrated by compiling and running this code:
#include <values.h>     /* MINDOUBLE */

main()
{
    double x, y, z;
    int i;

    y = MINDOUBLE;      /* smallest positive normalized double */
    z = 0.5;
    i = 10000000;
    while (i--) x = y * z;  /* y * z underflows on every iteration */
}
Both C and Fortran versions of this code produce the same results.
I tried similar tests, forcing overflows and divide-by-zero, but no extra
interrupts appeared for those floating-point exceptions.
Can anybody explain what's so special about underflows, and why I get
interrupts even though floating point exception interrupts are not enabled?
--------------------------------------------------------------------------------
Steve Dempsey (619) 534-0208
Dept. of Chemistry Computer Facility, 0314 INTERNET: sdempsey@ucsd.edu
University of California, San Diego BITNET: sdempsey@ucsd
La Jolla, CA 92093-0314 UUCP: ucsd!sdempsey
bron@bronze.wpd.sgi.com (Bron Campbell Nelson) (07/27/90)
In article <9007260022.AA03729@chem.chem.ucsd.edu>, sdempsey@UCSD.EDU (Steve Dempsey) writes:
> The shock came from the interrupt rate, which went from a background level
> of 200-400 per second up to ~20K per second (35K on the 340VGX!)
>
> Ultimately, I discovered that the extra interrupts were occurring whenever
> floating point operations resulted in underflow. This behavior can be
> demonstrated by compiling and running this code:
[deleted]

The MIPS R3010 floating point hardware does not handle the "exceptional"
conditions of IEEE floating point, including underflow. Whenever an f.p.
operation would result in underflow, the chip generates an interrupt, and the
f.p. operation is done in software, correctly dealing with all the obscure
conventions of IEEE arithmetic. This is one of the reasons the chip is
(normally) so fast: the silicon that would otherwise be devoted to these
cases is instead invested in making the normal case go faster. Of course,
it is also the reason the chip is so slow in your particular case.

The reason underflow is particularly bad is that once you get an IEEE denorm,
subsequent operations on that denorm will also cause interrupts, et cetera.

You say that you have 3.3; if you are not too worried about exact IEEE
semantics for your f.p. operations, you can use the "sigfpe(3C)" package (or
"fsigfpe(3F)" for the Fortran interface). This lets you specify what you want
done when these sorts of exceptions occur. The fast, simple approach is that
when an underflow (_UNDERFL) exception occurs, instead of computing the
correct denorm value, just use zero as the result (note that non-IEEE
machines typically do exactly that). You will still take an interrupt when a
denorm value is *first* generated, but by replacing it with zero you prevent
that denorm interrupt from propagating into subsequent calculations. This
normally gets rid of the vast majority of these interrupts.
Sadly, if you *really* need the exact correct IEEE denormalized values, you
are stuck. As I said, the R3010 does not have hardware support for denorms,
and so operations on denorms must be done in software.
--
Bron Campbell Nelson
bron@sgi.com  or possibly  ..!ames!sgi!bron
These statements are my own, not those of Silicon Graphics.