[comp.unix.wizards] Floating point exceptions on Sun Workstations

mykel@gestetner.oz (Michael Landers) (02/23/90)

Unfortunately, I am asking this question for someone else, and I must say
I might not have a full grip on the problem, but here goes...

We are doing development on a Sun Workstation (3/60 with 68881 co-processor,
SunOS 4.0.1), in assembler.  The problem we have is that when the co-processor
gets an addressing exception, the Sun kernel doesn't seem to handle it
properly.  An example message is

	psi: USER COPROCESSOR PROTOCOL ERROR
	trap address 0x34, pid 3445, pc = 9e65e, sr = 4, stkfmt 9, context 5
	D0-D7  3 fdb62455 44d007db 0 44b239e7 44b291a3 44b1e349 44b29251
	A0-A7  efff84c d384c 0 d3fcc d384c efff9b8 efff83c

Sometimes these exceptions cause the program to crash (not _too_ bad), but
often the exception causes the machine to crash.

What I want to know is, has anyone else doing development on Suns in
this manner come across this problem, and if so, what has been done as a
workaround?  It is very difficult to debug anything when the machine
constantly crashes.

Response by EMail preferred but not required.

Thanks in Advance,

Mykel
--
 ()                                  \\     Black Wind always follows
|\/|ykel Landers (mykel@gestetner.oz) \\    Where my black horse rides.
_||_    Gestetner Laser Systems        \\   Fire's in my soul
Phone: 612 975 0555  Fax: 612 975 0448  \\  Steel is on my side.
ACSnet: mykel@gestetner.oz.au	Internet: mykel@gestetner.oz.au@uunet.uu.net	

pt@geovision.uucp (Paul Tomblin) (02/28/90)

In article <427@gestetner.oz> mykel@gestetner.oz.au writes:
>[about his problem with:]
>	psi: USER COPROCESSOR PROTOCOL ERROR
>	trap address 0x34, pid 3445, pc = 9e65e, sr = 4, stkfmt 9, context 5
>	D0-D7  3 fdb62455 44d007db 0 44b239e7 44b291a3 44b1e349 44b29251
>	A0-A7  efff84c d384c 0 d3fcc d384c efff9b8 efff83c
>
>What I want to know is, has anyone else doing development on Suns in
>this manner come across this problem, and if so, what has been done as a
>workaround?  It is very difficult to debug anything when the machine
>constantly crashes.
>
I got exactly the same error in a piece of code with two routines (call
them 'a' and 'b'), where 'a' passed 'b' a pointer to a structure and 'b'
then read a float member through that pointer.  When 'a' passed a NULL
pointer, this is what we got.  Sun was very little help in debugging it,
since they claimed it wasn't their message.  I had to run strings on vmunix
to prove to them that it was.  The workaround is DON'T DO IT!!!  Check your
pointers.  Note that this error comes from the 68881 exception handler, so
it only happens with floats and doubles.
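
For what it's worth, here is a rough sketch in C of the kind of check I
mean.  The structure and function names are made up for illustration (they
are not from our actual code); the only point is that 'b' tests the pointer
before it touches the float member.

```c
#include <stddef.h>

/* Hypothetical structure standing in for the one 'a' passes to 'b'. */
struct sample {
    int   id;
    float value;    /* the float member that 'b' reads */
};

/*
 * Hypothetical stand-in for routine 'b'.
 * Copies the float member into *out and returns 0 on success.
 * Returns -1 without dereferencing anything if either pointer is NULL,
 * so a floating-point load is never attempted through a bad address.
 */
int read_value(const struct sample *s, float *out)
{
    if (s == NULL || out == NULL)
        return -1;

    *out = s->value;
    return 0;
}
```

Routine 'a' would then test the return value instead of assuming the call
worked.  The same test before any floating-point load through a pointer
should carry over to assembler.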

I hope this helps.  Sorry about speaking 'C', but I don't speak assembler.
-- 
Paul Tomblin nrcaer!cognos!geovision!pt or uunet!geovision!pt
Life: Loath it or ignore it, you can't like it.  (Marvin)
(My employer may not agree with my opinions, especially my .signature)