[comp.unix.programmer] Fortran divide check crashes interactive '386

rhl@grendel.Princeton.EDU (Robert Lupton (the Good)) (01/01/91)

A colleague has a 386 running Interactive's unix, and the LPI fortran
compiler. He reported that a divide by zero in fortran crashes the
system, so I wrote a trivial C programme to catch SIGFPE --- it works
fine. So I wrote a stub to set the handler from fortran, and now it
works fine ONCE --- if you run the fortran a second time it still
crashes the system. 

I have no further ideas --- do any of you?

				Robert
				

jackv@turnkey.tcc.com (Jack F. Vogel) (01/02/91)

In article <4983@idunno.Princeton.EDU> rhl@grendel.Princeton.EDU (Robert Lupton (the Good)) writes:
 
>A colleague has a 386 running Interactive's unix, and the LPI fortran
>compiler. He reported that a divide by zero in fortran crashes the
>system, so I wrote a trivial C programme to catch SIGFPE --- it works
>fine. So I wrote a stub to set the handler from fortran, and now it
>works fine ONCE --- if you run the fortran a second time it still
>crashes the system. 
 
When will people learn that problem descriptions of the form "crashes the
system" are about as useful to a support person as "the car won't run" would
be to a mechanic :-}!

Seriously, you need to be more specific: what exactly happens? Does the system
panic, and if so what is the panic message or type? Also, what release level of
the system is this? What type of hardware, etc, etc...

In any case, it sounds like a fairly serious bug. If it panics, that points to a
bug in the ISC trap code: a user application should just get a signal, it should
never panic the system. Of course, one might ask what the hell someone is
doing a divide-by-zero for anyway :-}, but that isn't meant as an answer.

I think someone at Interactive should take a close look at this once you
provide some more detail. I also have crossposted this followup to sysv386
where it will definitely be seen. Good Luck!

Disclaimer: I in no way speak for my employer, and certainly not ISC :-}.

-- 
Jack F. Vogel			jackv@locus.com
AIX370 Technical Support	       - or -
Locus Computing Corp.		jackv@turnkey.TCC.COM

src@scuzzy.in-berlin.de (Heiko Blume) (01/02/91)

rhl@grendel.Princeton.EDU (Robert Lupton (the Good)) writes:


>A colleague has a 386 running Interactive's unix, and the LPI fortran
>compiler. He reported that a divide by zero in fortran crashes the
>system, so I wrote a trivial C programme to catch SIGFPE --- it works
>fine. So I wrote a stub to set the handler from fortran, and now it
>works fine ONCE --- if you run the fortran a second time it still
>crashes the system. 

either

/* POSIX signal facilities */
your_catcher() { printf("Division by zero\n"); }
main() { [...]
sigset(SIGFPE,your_catcher);
[...] }


or

/* plain signal facilities */
your_catcher() { signal(SIGFPE,your_catcher); printf("Division by zero\n"); }
main() { [...]
signal(SIGFPE,your_catcher);
[....] }

should do it. however, if you want to setjmp()/longjmp() out of the handler
with the first (sigset) variant, you must either call sigrelse() to unblock
the signal or use sigsetjmp(2) and siglongjmp() in the first place.
-- 
      Heiko Blume <-+-> src@scuzzy.in-berlin.de <-+-> (+49 30) 691 88 93
                    public source archive [HST V.42bis]:
        scuzzy Any ACU,f 38400 6919520 gin:--gin: nuucp sword: nuucp
                     uucp scuzzy!/src/README /your/home

buhrt@sawmill.uucp (Jeffery A Buhrt) (01/02/91)

>
>/* POSIX signal facilities */
>your_catcher() { printf("Division by zero\n"); }
>main() { [...]
>sigset(SIGFPE,your_catcher);
>[...] }
>
>/* plain signal facilities */
>your_catcher() { signal(SIGFPE,your_catcher); printf("Division by zero\n"); }
>main() { [...]
>signal(SIGFPE,your_catcher);
>[....] }
>
>should do it. however, if you want to set/longjmp()  with the POSIX
>thing you must either engage sigrelse() or use sigsetjmp(2) and siglongjmp()
>in the first place.

>      Heiko Blume <-+-> src@scuzzy.in-berlin.de <-+-> (+49 30) 691 88 93
>

Should work ... kind of...
as long as you are not on a '386 (as the original article asks).
The above code is correct and works fine on all systems that I know
of except a 386/387 system.

The problem is probably not so much one divide by zero as it is
the 8th and 9th (the 387 stack is 8 deep): after eight divide by zero calls
the 387 stack is full and the next (or later) 387 call will die.

I do NOT have ISC and can't say if maybe the user->kernel switch after the
code exits doesn't reset his 387 stack (I assume if the system does
panic it would not be an emulator doing it).

I don't have a portable solution yet though.

If you have a '387 and/or a FULL 387 emulator you can call
	fsave and frstor to reset the 387 FPU stack, but most of the default
	emulators (ex: Esix Rev. D) do not implement all the instructions.

What is the proper way to reset the 387's stack? When you get an FPE
the stack is not reset, and it eventually overwrites the rest of user memory
and/or generates random FPU faults.

Below is a shar file of a stripped down version of Sc that causes a 
387/387 emulator stack overflow, as far as I know on any 386/unix system.

Basically what happens:

	signal(SIGFPE, eval_fpe);
		...divide by 0 (8 times)
	signal(SIGFPE, quit_fpe);
		...do some other floating point operation
			FPU stack overflow

This code dies the same way on: Esix rev. D: cc, gcc and
	a Sequent Symmetry: cc, atscc, and gcc (all tested in BSD and USG)
with the message:
"quit_fpe() not called in EvalAll() (STACK TRASHED)"
	(which comes from a doprnt in sprintf in update())

1) unshar
2) pick the correct options in the Makefile for your system
3) make;fpedie

-- now turn on the fsave/frstor fix
4) If you are not on Esix, #define I387 in interp.c
5) make;fpedie
	-this works fine on the Symmetry, but not on Esix, because:
		a) The 'struct fpusave' is part of the <sys/user.h> structure
		b) per the RevD manual page 7-6:
			FSAVE, FRSTOR are not defined

						-Jeff Buhrt
						317-477-6000
			{sequent, tippy.cs.purdue.edu}!sawmill!buhrt

#!/bin/sh
# This is a shell archive (shar 3.20)
# made 12/21/1990 19:03 UTC by buhrt@sawmill
# Source directory /users/buhrt/src/sc/z3/t/ok
#
# existing files WILL be overwritten
#
# This shar contains:
# length  mode       name
# ------ ---------- ------------------------------------------
#   2009 -rw-r--r-- Makefile
#   1716 -rw-r--r-- interp.c
#   1604 -rw-r--r-- sc.c
#    972 -rw-rw-r-- sc.h
#    964 -rw-r--r-- vmtbl.c
#
if touch 2>&1 | fgrep '[-amc]' > /dev/null
 then TOUCH=touch
 else TOUCH=true
fi
# ============= Makefile ==============
echo "x - extracting Makefile (Text)"
sed 's/^X//' << 'SHAR_EOF' > Makefile &&
X# Set SIGVOID if signal routines are type void.  System 5.3, SunOS 4.X,
X# VMS and ANSI C Compliant systems use this.  Most BSD systems and the
X# UNIXPC 'cc' do not.
X#SIGVOID=-DSIGVOID
XSIGVOID=
X
X# Set IEEE_MATH if you need setsticky() calls in your signal handlers
X#
X#IEEE_MATH=-DIEEE_MATH
XIEEE_MATH=
X
X# flags for lint
XLINTFLAGS=-abchxv
X
X# For ULTRIX: define the BSD4.2 section and SIGVOID above
X#	tdw@cl.cam.ac.uk tested on Ultrix 3.1C-0
X
X# Use this for system AIX V3.1
X#CFLAGS= -O -DSYSV2 -DCHTYPE=int -DNLS
X#LDFLAGS=
X#LIB=
X
X# Use this for system V.2
X#CFLAGS= -O -DSYSV2 
X#LDFLAGS=
X#LIB=
X
X# Use this for system V.3
X#CFLAGS= -O -DSYSV3
X#LDFLAGS=
X#LIB=
X
X# Microport
X#CFLAGS= -DSYSV2 -O -DUPORT -Ml
X#LDFLAGS=-Ml
X#LIB=
X
X# Use this for BSD 4.2
X#CFLAGS= -O -DBSD42
X#LDFLAGS=
X#LIB=
X
X# Use this for Sequent boxes
X#CC=atscc
XCC=gcc
XCFLAGS=-g -DBSD42
X#LDFLAGS= -s
XLDFLAGS= -g
XLIB=-ldmalloc 
XPSCLIB=
X
X# Use this for BSD 4.3
X#CFLAGS= -O -DBSD43
X#LDFLAGS=
X#LIB=
X
X# Use this for SunOS 4.X if you have the System V package installed.
X# This will link with the System V curses which is preferable to the
X# BSD curses (especially helps scrolling on slow (9600bps or less)
X# serial lines).
X#
X# Be sure to define SIGVOID and RE_COMP above.
X# 
X#CC=/usr/5bin/cc
X#CFLAGS= -O -DSYSV3 
X#LDFLAGS=
X#LIB=
X
X# Use this for system III (XENIX)
X#CFLAGS= -O -DSYSIII
X#LDFLAGS= -i
X#LIB=
X
X# Use this for VENIX
X#CFLAGS= -DVENIX -DBSD42 -DV7
X#LDFLAGS= -z -i 
X#LIB=
X
X# For SCO Unix V rel. 3.2.0
X#       -compile using rcc, cc does not cope with gram.c
X#       -edit /usr/include/curses.h, rcc does not understand #error
X#       -link: make CC=cc, rcc's loader gets unresolved __cclass, __range
X#               (rather strange,?)
X#CC=rcc
X#SIGVOID=-DSIGVOID
X#CFLAGS= -O -DSYSV3
X#LDFLAGS=
X#LIB=
X
X# The objects
XOBJS=sc.o interp.o   vmtbl.o  
X
Xfpedie:$(PAR) 	$(OBJS)
X	$(CC) ${CFLAGS} ${LDFLAGS} ${OBJS} ${LIB} -o fpedie
X
Xsc.o:	sc.h sc.c
X	$(CC) ${CFLAGS} ${SIGVOID} -c sc.c
X
Xinterp.o:	interp.c sc.h
X	$(CC) ${CFLAGS} ${IEEE_MATH} ${SIGVOID} -c interp.c
X
SHAR_EOF
$TOUCH -am 1221140290 Makefile &&
chmod 0644 Makefile ||
echo "restore of Makefile failed"
set `wc -c Makefile`;Wc_c=$1
if test "$Wc_c" != "2009"; then
	echo original size 2009, current size $Wc_c
fi
# ============= interp.c ==============
echo "x - extracting interp.c (Text)"
sed 's/^X//' << 'SHAR_EOF' > interp.c &&
X/*#define I387	/* HERE is the bigee */
X
X#ifdef aiws
X#undef _C_func			/* Fixes for undefined symbols on AIX */
X#endif
X
X#ifdef I387
X#include <sys/types.h>
X#include <i386/fpu.h>
Xstruct fpusave	fpu_buf;
X#endif /* I387 */
X
X#ifdef IEEE_MATH
X#include <ieeefp.h>
X#endif /* IEEE_MATH */
X
X#include <signal.h>
X#include <setjmp.h>
X#include <stdio.h>
X
Xextern int errno;		/* set by math functions */
X
X#include "sc.h"
X
Xjmp_buf fpe_save;
X
Xint quit_fpe();
X
Xdouble	eval();
X
X#ifdef SIGVOID
Xvoid
X#else
Xint
X#endif
Xeval_fpe() /* Trap for FPE errors in eval */
X{
X	fputs("eval_fpe called\n", stderr);
X/* not sure if needed since we do a frstor */
X/*
X#ifdef i386
X	asm("	fnclex");
X	asm("	fwait");
X#else
X*/
X#ifdef IEEE_MATH
X	(void)fpsetsticky((fp_except)0);	/* Clear exception */
X#endif /* IEEE_MATH */
X#ifdef PC
X	_fpreset();
X#endif
X/*#endif /* from #ifdef i386*/
X
X#ifdef I387
X	fputs("fpe_save\n", stderr);
X	asm(" frstor _fpu_buf ");
X#endif /* I387 */
X	/* re-establish signal handler for next time */
X	(void) signal(SIGFPE, eval_fpe);
X	longjmp(fpe_save, 1);
X}
X
Xdouble 
Xeval(e)
Xregister struct enode *e;
X{
X	double	denom;
X	denom = (double)0;
X	return((double)1/denom);
X}
X
X
X#ifdef SIGVOID
Xvoid
X#else
Xint
X#endif
Xquit_fpe()
X{
X    fputs("quit_fpe() not called in EvalAll() (STACK TRASHED)\n", stderr);
X    abort();	/* what might be left */
X    exit(1);
X}
X
Xvoid
XEvalAll () {
X    int i;
X    struct ent *p;
X
X    (void) signal(SIGFPE, eval_fpe);
X#ifdef I387
X	fputs("fsave", stderr);
X	asm(" fsave _fpu_buf ");
X	asm(" frstor _fpu_buf ");
X#endif /* I387 */
X
X    for (i=0; i<8; i++)
X	if (p = *ATBL(tbl,1,0))
X	{   double v;
X
X	    if (setjmp(fpe_save)) {
X		v = (double)0.0;
X	    } else {
X		v = eval (p->expr);
X	    }
X	}
X
X    (void) signal(SIGFPE, quit_fpe);
X}
SHAR_EOF
$TOUCH -am 1221140290 interp.c &&
chmod 0644 interp.c ||
echo "restore of interp.c failed"
set `wc -c interp.c`;Wc_c=$1
if test "$Wc_c" != "1716"; then
	echo original size 1716, current size $Wc_c
fi
# ============= sc.c ==============
echo "x - extracting sc.c (Text)"
sed 's/^X//' << 'SHAR_EOF' > sc.c &&
X/*	SC	A Spreadsheet Calculator
X *		Main driver
X *
X *		original by James Gosling, September 1982
X *		modifications by Mark Weiser and Bruce Israel,
X *			University of Maryland
X *
X *              More mods Robert Bond, 12/86
X *		More mods by Alan Silverstein, 3-4/88, see list of changes.
X *		Currently supported by sequent!sawmill!buhrt (Jeff Buhrt)
X *		$Revision: 6.12 $
X *
X */
X
X#include <stdio.h>
X#include "sc.h"
X
X#ifdef SYSV3
Xvoid exit();
X#endif
X
X/* Globals defined in sc.h */
Xstruct ent ***tbl;
X
Xchar line[FBUFLEN];
X
Xvoid	update();
X
Xstruct enode *
Xnew_const(op, a1)
Xint	op;
Xdouble a1;
X{
X    register struct enode *p;
X    p = (struct enode *) malloc ((unsigned)sizeof (struct enode));
X    p->op = op;
X    p->e.k = a1;
X    return p;
X}
X
X/* return a pointer to a cell's [struct ent *], creating if needed */
Xstruct ent *
Xlookat(row,col)
Xint	row, col;
X{
X    register struct ent **pp;
X
X    pp = ATBL(tbl, row, col);
X    if (*pp == (struct ent *)0) {
X	*pp = (struct ent *) malloc((unsigned)sizeof(struct ent));
X	(*pp)->expr = new_const(O_CONST, (double)4);
X	(*pp)->v = (double) 0.0;
X    }
X    return(*pp);
X}
X
Xvoid
Xupdate ()
X{
X	struct ent *p1;
X
X	if (p1 = *ATBL(tbl, 0, 0))
X		(void) sprintf (line, "%.15g", p1 -> v);
X}
X
Xint
Xmain (argc, argv)
Xint argc;
Xchar  **argv;
X{
X	/* setup the spreadsheet arrays, initscr() will get the screen size */
X    if (!growtbl(0, 0, 0))
X    {	exit(1);
X    }
X    lookat(0, 0);
X    lookat(1, 0);
X    lookat(2, 0);
X    lookat(3, 0);
X    lookat(4, 0);
X    lookat(5, 0);
X    lookat(6, 0);
X    lookat(7, 0);
X    lookat(8, 0);
X    EvalAll();
X    update();
X    EvalAll();
X
X    exit(0);
X}
SHAR_EOF
$TOUCH -am 1221140290 sc.c &&
chmod 0644 sc.c ||
echo "restore of sc.c failed"
set `wc -c sc.c`;Wc_c=$1
if test "$Wc_c" != "1604"; then
	echo original size 1604, current size $Wc_c
fi
# ============= sc.h ==============
echo "x - extracting sc.h (Text)"
sed 's/^X//' << 'SHAR_EOF' > sc.h &&
X/*	SC	A Table Calculator
X *		Common definitions
X *
X *		original by James Gosling, September 1982
X *		modified by Mark Weiser and Bruce Israel,
X *			University of Maryland
X *		R. Bond  12/86
X *		More mods by Alan Silverstein, 3-4/88, see list of changes.
X *		$Revision: 6.12 $
X *
X */
X
X#define	ATBL(tbl, row, col)	(*(tbl + row) + (col))
X#define	FBUFLEN	1024	/* buffer size for a single field */
X
Xstruct ent_ptr {
X    int vf;
X    struct ent *vp;
X};
X
X/* info for each cell, only alloc'd when something is stored in a cell */
Xstruct ent {
X    double v;		/* v && label are set in EvalAll() */
X    struct enode *expr;	/* cell's contents */
X    short flags;	
X};
X
X/* stores type of operation this cell will perform */
Xstruct enode {
X    int op;
X    union {
X	double k;		/* constant # */
X	struct ent_ptr v;	/* ref. another cell */
X    } e;
X};
X
X/* op values */
X#define O_CONST 'k'
X
Xextern	struct ent ***tbl;	/* data table ref. in vmtbl.c and ATBL() */
X
X#define	FALSE	0
X#define TRUE	1
SHAR_EOF
$TOUCH -am 1221140290 sc.h &&
chmod 0664 sc.h ||
echo "restore of sc.h failed"
set `wc -c sc.h`;Wc_c=$1
if test "$Wc_c" != "972"; then
	echo original size 972, current size $Wc_c
fi
# ============= vmtbl.c ==============
echo "x - extracting vmtbl.c (Text)"
sed 's/^X//' << 'SHAR_EOF' > vmtbl.c &&
X# include <stdio.h>
X# include "sc.h"
X
Xextern	char	*malloc();
Xextern	char	*realloc();
X	
X
X/*
X * grow the main && auxiliary tables (reset maxrows/maxcols as needed)
X * toprow &&/|| topcol tell us a better guess of how big to become.
X * we return TRUE if we could grow, FALSE if not....
X */
Xint
Xgrowtbl(rowcol, toprow, topcol)
Xint	rowcol;
Xint	toprow, topcol;
X{
X	struct ent ** nullit;
X	struct ent *** tnullit;
X	int	maxrows, maxcols;
X	int	cnt;
X	int	i;
X
X	maxrows = maxcols = 20;
X	tbl = (struct ent ***)malloc((unsigned)(maxrows*sizeof(struct ent **)));
X	for(tnullit = tbl, cnt = 0; cnt < maxrows; cnt++, tnullit++)
X		*tnullit = (struct ent **)NULL;
X
X	/* fill in the bottom of the table */
X	for (i = 0; i < maxrows; i++)
X	{	if ((tbl[i] = (struct ent **)malloc((unsigned)(maxcols *
X				sizeof(struct ent **)))) == (struct ent **)0)
X		{	return(FALSE);
X		}
X		for(nullit = tbl[i], cnt = 0; cnt < maxcols; cnt++, nullit++)
X			*nullit = (struct ent *)NULL;
X	}
X
X	return(TRUE);
X}
SHAR_EOF
$TOUCH -am 1221140290 vmtbl.c &&
chmod 0644 vmtbl.c ||
echo "restore of vmtbl.c failed"
set `wc -c vmtbl.c`;Wc_c=$1
if test "$Wc_c" != "964"; then
	echo original size 964, current size $Wc_c
fi
exit 0

src@scuzzy.in-berlin.de (Heiko Blume) (01/03/91)

buhrt@sawmill.uucp (Jeffery A Buhrt) writes:

>>[signal handling stuff]

>Should work ... kind of...
>as long as you are not on a '386 (as the original article asks).
>The above code is very correct and works fine on all systems that I know
>of except a 386/387 system.

oh, well. at least it solves the original problem, that the
signal wasn't caught after the first occurrence :-)

>The problem is probably not so much one divide by zero as it is
>8 and 9 (the 387 stack is 8 deep), after eight divide by zero calls
>the 387 stack is full and the next (or later) 387 call will die.

>I do NOT have ISC and can't say if maybe the user->kernel switch after the
>code exits doesn't reset his 387 stack (I assume if the system does
>panic it would not be an emulator doing it).

i tried your fpedie on interactive 2.2.1 without a 387:

eval_fpe called
eval_fpe called
eval_fpe called
eval_fpe called
eval_fpe called
eval_fpe called
eval_fpe called
eval_fpe called
quit_fpe() not called in EvalAll() (STACK TRASHED)
IOT trap (core dumped)

not a panic at least. the #define I387 build failed because there is no
struct fpusave in the header files. i kludged around a bit with _fpstate
but got only segmentation faults. anyway, the <ieeefp.h> says
"When a signal handler catches a FPE, it will have a freshly initialized
coprocessor (really?). [...] it gets a single parameter of type struct 
_fpustackframe. [...] By modifying it, the state of the
coprocessor (and emulator?) can be changed upon return to the main task."

since i don't know sh*t about the 387 innards i can't investigate this,
but it sounds like the right way to try.

rhl@grendel.Princeton.EDU (Robert Lupton (the Good)) (01/04/91)

Let me defend my reputation. I started this thread, and apparently
wasn't careful enough. My problem was not that I didn't reinstall the
signal handler (I actually called exit instead of reinstalling it, but I
would have called signal() if the handler returned). The trouble was the
sequence

	for(;;) {
		Reboot machine
		run fortran code that divide checked with my handler
		(all OK so far. The programme exited. Back at the shell)
		run fortran code that divide checked with my handler
		(now the machine crashes)
	}
	
So the problem is not missing handlers, and I don't think that it is
this 8-deep stack on the '387 either; unless the fortrash handlers are
worse than I think, I doubt that they manage to generate an extra 7 divide
checks while trying to clean up.

		
				Robert