[comp.lang.fortran] How to detect NaN's

vlr@litwin.com (Vic Rice) (05/31/91)

I have encountered a problem on a DECStation 5000 using the Mips Fortran
compiler. Some operation I am performing is generating a NaN which then
proceeds to propogate to other dependent variables. Since this is in
a loop, I am not exactly sure where it starts.

How can I test a variable for the presence of a NaN or Infinity ???
-- 
Dr. Victor L. Rice
Litwin Process Automation

jim@interet.UUCP (User) (05/31/91)

In article <1991May30.204332.16506@litwin.com>, vlr@litwin.com (Vic Rice) writes:
    > How can I test a variable for the presence of a NaN or Infinity ???

We test for NAN and INF by using the following:
      if(x*0.0 .ne. 0.0) write(*,*) 'ERROR: X is not a number'
The idea is that if x is NAN, then x times zero is also NAN and not  zero.
Unfortunately,  the  above code may be optimized out by an optimizing com-
piler which "knows" that zero times x is zero, so you may have to code  it
as:

      if(notone.eq.1) then
        temp=1.0
      else
c  this branch is always taken
        temp=0.0
      endif
      if(x*temp.ne.0.0) write(*,*) 'ERROR: X is not a number'

Note that x may be NAN or maybe infinity or some other special bit pattern
depending on your floating point implementation.  But if the test succeeds,
then X is definitely not an ordinary number.

Jim    uunet!interet!jim

keinert@IASTATE.EDU (Keinert Fritz) (06/01/91)

In article <1991May30.204332.16506@litwin.com>, vlr@litwin.com (Vic Rice)
writes:
> I have encountered a problem on a DECStation 5000 using the Mips Fortran
> compiler. Some operation I am performing is generating a NaN which then
> proceeds to propogate to other dependent variables. Since this is in
> a loop, I am not exactly sure where it starts.
> 
> How can I test a variable for the presence of a NaN or Infinity ???
> -- 
> Dr. Victor L. Rice
> Litwin Process Automation

One of my biggest gripes with both the Sun and Mips Fortran compilers
is that there is no easy way to turn on floating point exception
handling. This results in exactly the problem Vic describes: at the
end of the program, most of your output is infinity or NaN, and you
have no clue where it started.

Here is a simple floating point exception handler that I wrote (Mips
Fortran version). Compile it with C and link with your Fortran (or C)
program, like this:

	% cc -c fpx_mips.c
	% f77 prog.f fpx_mips.o -o prog

or something like that.

To use it, just insert a statement

	call standard_fpx()

or 

	call setcsr(whatever)

as the first executable statement in your Fortran program.

I welcome suggestions for improvement, but I am not prepared to answer
a lot of questions or fix bugs in this code. Good Luck.

A last remark for the knowledgeable: the reason I am trapping SIGILL
is a known bug in Ultrix 4.1. Floating point errors cause SIGILL
instead of SIGFPE.
--
Fritz Keinert                             phone:  (515) 294-5128
Department of Mathematics                 fax:    (515) 294-5454
Iowa State University                     e-mail: keinert@iastate.edu
Ames, IA 50011

--------------------------------------------------------------------------
/**********************************************************************
 *                                                                    *
 *  Floating point exception handler for MIPS machines                *
 *  Fritz Keinert (keinert@iastate.edu)                               *
 *  April 5, 1991                                                     *
 *                                                                    *
 *  building on routines by Howard Jespersen and Srini Uppugunduri    *
 *                                                                    *
 *  As they say in NETLIB: Everything free comes with no guarantee.   *
 *                                                                    *
 * This file contains the following routines:                         *
 *                                                                    *
 * int getcsr()     - returns the contents of the control/status      *
 *                    register (CSR) of the MIPS floating point       *
 *                  unit (FPU). The register contents are             *
 *                  described in /usr/include/mips/fpu.h              *
 *                                                                    *
 * int getcsr_()    - same routine, callable from Fortran             *
 *                                                                    *
 * int setcsr(csr)  - sets CSR to new value, returns old value;       *
 *                    also installs fpx_handler                       *
 *                                                                    *
 * int setcsr_(csr) - same routine, callable from Fortran             *
 *                                                                    *
 * void fpx_handler(...) - floating point exception handler. Prints   *
 *                         an error message and dies.                 *
 *                                                                    *
 * int standard_fpx() - same as setcsr(3584), to set up the           *
 *                      most frequently useful behavior.              *
 *                                                                    *
 *                                                                    *
 * int standard_fpx_() - same routine, callable from Fortran          *
 *                                                                    *
 *                                                                    *
 **********************************************************************
 *                                                                    *
 * The floating point exception handling works like this:             *
 *                                                                    *
 * First, you have to tell the floating point unit to enable          *
 * interrupts, by setting the CSR accordingly. (I assume this         *
 * stands for Control and Status Register). The correct value         *
 * for CSR is found by adding                                         *
 *                                                                    *
 *      one of the following, to define the rounding behavior:        *
 *                                                                    *
 *          0  - round to nearest                                     *
 *          1  -          zero                                        *
 *          2  -          + infinity                                  *
 *          3  -          - infinity                                  *
 *                                                                    *
 *      plus as many as needed of the following, to turn on           *
 *      interrupts:                                                   *
 *                                                                    *
 *        128  - cause interrupt on inexact result                    *
 *        256  -                    underflow                         *
 *        512  -                    overflow                          *
 *       1024  -                    division by zero                  *
 *       2048  -                    invalid operation                 *
 *                                                                    *
 * The standard values are:                                           *
 *                                                                    *
 * 1. round to nearest, interrupt on overflow, division by zero       *
 * and invalid operation. CSR = 3584.                                 *
 *                                                                    *
 * 2. round to nearest, interrupt on underflow, overflow,             *
 * division by zero and invalid operation. CSR = 3840.                *
 *                                                                    *
 * Second, you have to install an interrupt handler to give you       *
 * a message when an interrupt actually happens. Otherwise, the       *
 * operating system will just ignore it, or kill your program with    *
 * a highly informative message like "illegal operation".             *
 *                                                                    *
 * My interrupt handler just prints a message and dies. For more      *
 * sophisticated behavior, like "print a message on underflow, set    *
 * the result to zero, and continue", or a traceback of where this    *
 * happened, you are on your own.                                     *
 *                                                                    *
 * Use the dbx debugger to find out where in your source code the     *
 * problem is. Here is an example: program test dies with the         *
 * message                                                            *
 *                                                                    *
 *      caught signal 4, code 0                                       *
 *      floating point error: overflow                                *
 *      at or near location 0x400448                                  *
 *                                                                    *
 * Call up dbx, execute the first line of your program, set the       *
 * program counter to the address reported by the interrupt routine,  *
 * and ask dbx where that is:                                         *
 *                                                                    *
 *      % dbx test                                                    *
 *      dbx version 2.0                                               *
 *      Type 'help' for help.                                         *
 *      reading symbolic information ...                              *
 *      [using test.test]                                             *
 *      test:   6  print*,'round to nearest  = 0'                     *
 *      (dbx) stop at 6                                               *
 *      [2] stop at "test.f":6                                        *
 *      (dbx) run                                                     *
 *      [2] stopped at [test.test:6 ,0x40020c]                        *
 *                             print*,'round to nearest  = 0'         *
 *      (dbx) assign $pc=0x400448                                     *
 *      4195400                                                       *
 *      (dbx) where                                                   *
 *      >  0 test.test() ["test.f":24, 0x400448]                      *
 *         1 main.main(0x7fffb9c4, 0x7fffb9cc, 0x0, 0x0, 0x0)         *
 *                                  ["main.c":35, 0x400f2c]           *
 *      (dbx) quit                                                    *
 *                                                                    *
 * and we know: it was around line 24 in file test.f.                 *
 *                                                                    *
 * This method may not work if the error is in a system subroutine.   *
 * For example, if you try sqrt(-1.), this will tell you that the     *
 * problem is inside a square root, but there may be many square      *
 * roots in your program.                                             *
 *                                                                    *
 * If all else fails, just run your program inside dbx until it       *
 * dies, and then type "where". This will get you an exact            *
 * traceback of where you are (sqrt function, called from line        *
 * ?? of subroutine ??, which was called from line ?? in main         *
 * program).                                                          *
 *                                                                    *
 * You may have to use the "-g" flag for compiling, to get the        *
 * debugger to work properly.                                         *
 *                                                                    *
 **********************************************************************/

#include <mips/fpu.h>
#include <mips/cpu.h>
#include <signal.h>
#include <stdio.h>


void fpx_handler(int sig, int code, struct sigcontext *scp)
{
        union fpc_csr csr;

        csr.fc_word = scp->sc_fpc_csr;

        fprintf(stderr,"caught signal %d, code %d\n", sig, code);

        if (csr.fc_struct.ex_invalid)
        {
          fprintf(stderr,"floating point error: invalid operation\n");
          fprintf(stderr,"at or near location 0x%x\n",scp->sc_pc);
          exit(1);
        }
        if (csr.fc_struct.ex_divide0)
        {
          fprintf(stderr,"floating point error: division by zero\n");
          fprintf(stderr,"at or near location 0x%x\n",scp->sc_pc);
          exit(1);
        } 
        if (csr.fc_struct.ex_underflow)
        {
          fprintf(stderr,"floating point error: underflow\n");
          fprintf(stderr,"at or near location 0x%x\n",scp->sc_pc);
          exit(1);
        }
        if (csr.fc_struct.ex_overflow)
        {
          fprintf(stderr,"floating point error: overflow\n");
          fprintf(stderr,"at or near location 0x%x\n",scp->sc_pc);
          exit(1);
        }
        if (csr.fc_struct.ex_inexact)
        {
          fprintf(stderr,"floating point error: inexact operation\n");
          fprintf(stderr,"at or near location 0x%x\n",scp->sc_pc);
          exit(1);
        }
        if (csr.fc_struct.ex_unimplemented)
        {
          fprintf(stderr,"floating point error: unimplemented operation\n");
          fprintf(stderr,"at or near location 0x%x\n",scp->sc_pc);
          exit(1);
        }

        if (code == BRK_DIVZERO) {
          fprintf(stderr,"integer error: division by zero\n");
          fprintf(stderr,"at or near location 0x%x\n",scp->sc_pc);
          exit(1);
        }
          exit(1);

        fprintf(stderr,"floating point error: unknown exception code\n");
        fprintf(stderr,"at or near location 0x%x\n",scp->sc_pc);
        fprintf(stderr,"fp status word 0x%x\n",csr);
        exit(1);
}


int getcsr()
/*
 * return contents of fpu control and status register
 */
{
  return get_fpc_csr();
}

int getcsr_()
/*
 * Fortran wrapper for getcsr()
 */
{
  return getcsr();
}

int setcsr(unsigned int csr)
/*
 * set up floating point interrupt handler, turn on interrupts
 */
{
  signal(SIGTRAP, fpx_handler);
  signal(SIGFPE,  fpx_handler);
  signal(SIGILL,  fpx_handler);
  return set_fpc_csr(csr);
}

int setcsr_(unsigned int *csr)
/*
 * Fortran wrapper for setcsr()
 */
{
  return setcsr(*csr);
}

int standard_fpx()
/*
 * set up standard error handling: die on overflow,
 * division by zero, illegal operation, but continue
 * on underflow. Round to nearest.
 */
{
  return setcsr((unsigned int) 3584);
}

int standard_fpx_()
/*
 * Fortran wrapper for standard_fpx()
 */
{
  return standard_fpx();
}

meissner@osf.org (Michael Meissner) (06/01/91)

In article <1991May30.204332.16506@litwin.com> vlr@litwin.com (Vic
Rice) writes:

| I have encountered a problem on a DECStation 5000 using the Mips Fortran
| compiler. Some operation I am performing is generating a NaN which then
| proceeds to propogate to other dependent variables. Since this is in
| a loop, I am not exactly sure where it starts.
| 
| How can I test a variable for the presence of a NaN or Infinity ???

To test for a NaN, simply compare the nubmer against itself.  If it's
a NaN, it will not compare equal.

To test for infinity, I would use (1. / 0.), since division by 0 of
anything but zero gives an infinity with the appropriate sign.  Zero
divided by zero, gives Nan.

Of course there are problably compilers out there that 'optimize' away
such tests, or complain if compile time arithmetic has a division by
zero in it.
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

You are in a twisty little passage of standards, all conflicting.

gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) (06/01/91)

In article <1991May31.130807@IASTATE.EDU> keinert@IASTATE.EDU (Keinert Fritz) writes:

>One of my biggest gripes with both the Sun and Mips Fortran compilers
>is that there is no easy way to turn on floating point exception
>handling.

It's fairly trivial under SunOS 4.X:

      subroutine initerrors
      integer n, ieee_handler, sigfpe_abort
      n = ieee_handler( 'set', 'common', SIGFPE_ABORT )
      return
      end

This produces nice crashes when you want them. Now if only IBM would
see the light...

warren@atmos.washington.edu (David Warren) (06/01/91)

In article <1991May30.204332.16506@litwin.com> vlr@litwin.com (Vic Rice) writes:

   Xref: milton comp.unix.ultrix:6260 comp.lang.fortran:3304
   Path: milton!hayes.ims.alaska.edu!bionet!uwm.edu!zaphod.mps.ohio-state.edu!swrinde!cs.utexas.edu!uunet!dynsim1!litwin!vlr
   From: vlr@litwin.com (Vic Rice)
   Newsgroups: comp.unix.ultrix,comp.lang.fortran
   Date: 30 May 91 20:43:32 GMT
   Organization: Litwin Process Automation
   Lines: 9

   I have encountered a problem on a DECStation 5000 using the Mips Fortran
   compiler. Some operation I am performing is generating a NaN which then
   proceeds to propogate to other dependent variables. Since this is in
   a loop, I am not exactly sure where it starts.

   How can I test a variable for the presence of a NaN or Infinity ???
   -- 
   Dr. Victor L. Rice
   Litwin Process Automation


how about linking with this c function:

int *check_nan_(num)
/*make sure num != NaN*/
float *num;
{
    int true=1,false=0;
    if(isnan((double)num))return(&true);
    else return(&false);
}

then do

ISTAT = CHECK_NAN(RVAL)
--
David Warren 		INTERNET: warren@atmos.washington.edu
(206) 543-0945		UUCP:	  uw-beaver!atmos.washington.edu!warren
Dept of Atmospheric Sciences, AK-40
University of Washington

elwin@hubcap.clemson.edu (lawrence brown) (06/03/91)

From article <1991May30.204332.16506@litwin.com>, by vlr@litwin.com (Vic Rice):
> I have encountered a problem on a DECStation 5000 using the Mips Fortran
> compiler. Some operation I am performing is generating a NaN which then
> proceeds to propogate to other dependent variables. Since this is in
> a loop, I am not exactly sure where it starts.
> 
> How can I test a variable for the presence of a NaN or Infinity ???
> -- 
> Dr. Victor L. Rice
> Litwin Process Automation
Our compiler (Fortran for RISC version 2.0 I think) has a barely
documented option that will trap any NaN's and generate an illegal
instruction error.  It also sets all variables to NaN at the beginning
of execution (rather than setting them to zero) so it will test for 
unassigned variable usage (e.g. misspelled variable names). Try
f77 -trapuv
Hope this helps.
Larry Brown

khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (06/04/91)

In article <1991May31.130807@IASTATE.EDU> keinert@IASTATE.EDU (Keinert Fritz) writes:

   One of my biggest gripes with both the Sun and Mips Fortran
                                          ^^^????
try man ieee_handler() and man f77 (-fnonstandard)

I don't see how to make this any easier; here is what I usually use

	i=ieee_handler("set","common",%val(2)) !%val(2) is SIGFPE_ABORT
					       
f77_floatingpoint.h contains the definition of %val(2) and a lot of
other things. I can type %val(2) faster, so I personally wind up using
it more often than I should.

This isn't a particularly new feature; you might wish to peruse the
Numerical Computation Guide.
--
----------------------------------------------------------------
Keith H. Bierman    keith.bierman@Sun.COM| khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33			 | (415 336 2648)   
    Mountain View, CA 94043

vlr@litwin.com (Vic Rice) (06/04/91)

In <WARREN.91May31174433@stormy.atmos.washington.edu> warren@atmos.washington.edu (David Warren) writes:

>how about linking with this c function:

>int *check_nan_(num)
>/*make sure num != NaN*/
>float *num;
>{
>    int true=1,false=0;
>    if(isnan((double)num))return(&true);
>    else return(&false);
>}

>then do

>ISTAT = CHECK_NAN(RVAL)

I originally looked at this. However, as best as I can tell this macro expects
bit patterns associated with IEEE format machines. I don't believe the Mips
-- 
Dr. Victor L. Rice
Litwin Process Automation

martelli@cadlab.sublink.ORG (Alex Martelli) (06/04/91)

warren@atmos.washington.edu (David Warren) writes:

:In article <1991May30.204332.16506@litwin.com> vlr@litwin.com (Vic Rice) writes:
	...
:   I have encountered a problem on a DECStation 5000 using the Mips Fortran
	...
:how about linking with this c function:
:
:int *check_nan_(num)
:/*make sure num != NaN*/
:float *num;
:{
:    int true=1,false=0;
:    if(isnan((double)num))return(&true);
:    else return(&false);
:}
:
:then do
:
:ISTAT = CHECK_NAN(RVAL)

There are several problems here!  Integer and Logical function results are to
be returned by-value, not by-reference, on DECstations as well as all other
platforms I know; Fortran .TRUE. is generally -1, not 1; it is dangerous
for a C function to return the address of a variable of 'automatic' storage
class, since the storage it is pointing to is "going away" (i.e. about to be
recycled for automatic variables of further called functions) just after the
execution of the return statement; it is wrong to cast num, which is the
ADDRESS of a float, to a double - what you want to do is cast the POINTED-TO
value, *num, and such a cast is not needed anyway since floats are promoted
to double by default in C when used in expressions.
All in all, the idea is not bad, but the implementation should preferably be
something like:

int checknan(num) float *num; {
    return isnan(*num) ? -1 : 0 ;
}

to be declared and used like this:

	LOGICAL CHECKNAN
	...
	IF(CHECKNAN(RVAL)) THEN
		WRITE(*,*) 'HELP! RVAL is Not A Number!'
	ELSE
		WRITE(*,*) 'All is fine so far...'
	ENDIF
-- 
Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 53, Bologna, Italia
Email: (work:) martelli@cadlab.sublink.org, (home:) alex@am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434; 
Fax: ++39 (51) 366964 (work only), Fidonet: 332/407.314 (home only).

mccoy@pixar.com (Dan McCoy) (06/20/91)

In article <1991May30.204332.16506@litwin.com> vlr@litwin.com (Vic Rice) writes:
>How can I test a variable for the presence of a NaN or Infinity ???

Others have answered this question if detecting these values is what
you really want.
But if what you really want is to write code that won't fail when presented
with NaN's and Inf's,  it's often possible to just write code so that 
sensible behaviour just falls out.

For instance, a NaN will fail any test.  It's not equal to anything, including
itself, and it's not greater than anything or less than anything.

If you write:
	if (x > maxvalue) outofrange();
x=NaN will appear to be in range, since it will fail the ">" test.

But if you write:
	if (!(x <= maxvalue)) outofrange();
then x=NaN will be out of range since it failed the "<=" test.

This type of defensive coding allows you to write bullet-proof code without 
having to clog it up with explicit NaN checks all over the place.

(I noticed the cross-post to comp.lang.fortran just before sending.  
 The concept applies just as well in Fortran.
 However my Fortran is so rusty, I better let the example stand in C.)

Dan McCoy     Pixar     ...!ucbvax!pixar!mccoy