[comp.unix.microport] Floating exception bug

dave@viper.Lynx.MN.Org (David Messer) (12/02/88)

I don't know if this has been reported yet, but the following code
causes a "Floating exception error" and a core dump when run in a
system with an 80287 co-processor.  (I don't know if it generates an
error when floating point operations are emulated with software.)

------File x.c
main()
{
	/*extern double y();*/
	for(;;) {
		y() ;
		}
	}
------File y.c
double
y()
{
	return( 1.0 ) ;
	}
------
Compile with:
	cc x.c y.c

and execute a.out.

The fix is to uncomment "extern double y();" in main().  Since the value
returned from y() is not used, no extern definition should be required.

Apparently if the function is not declared double, it uses up resources
in the floating-point chip which eventually causes an error.  It was a
real bear to find because the error could occur on any floating-point
operation once the resources got too low.
-- 
_____________________________________________________________________________
   __                     _ _ _              David Messer - Lynx Data Systems
  /  )              /    ' ) ) )                 dave@Lynx.MN.Org  -or-
 /  / __. , __o  __/      / / / _  _   _   _  __     ...{amdahl,hpda}!bungia!
/__/_(_/|_\/ <__(_/_     / ' (_</_/_)_/_)_</_/ (_                 viper!dave

geertj@nlgvax.UUCP (Geert Jan de Groot) (12/03/88)

In article <1656@viper.Lynx.MN.Org> dave@viper (David Messer) writes:
>I don't know if this has been reported yet, but the following code
>causes a "Floating exception error" and a core dump when run in a
>system with an 80287 co-processor.  
>------File x.c
>main()
>{
>	/*extern double y();*/
>	for(;;) {
>		y() ;
>		}
>	}
>------File y.c
>double
>y()
>{
>	return( 1.0 ) ;
>	}
>------
>Compile with:
>	cc x.c y.c
>
>and execute a.out.
>
>The fix is to uncomment "extern double y();" in main().  Since the value
>returned from y() is not used, no extern definition should be required.

Not true! Functions always return values on the stack, even if the value
isn't used. Thus, y() returns a double on the floating point stack.
But, main() doesn't remove it because of the wrong declaration. 
Thus, y() _must_ be declared double.
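
To picture why the error only shows up after several calls: the 80287
keeps its operands on a register stack with eight entries.  What follows
is a toy model in portable C, not real coprocessor code.  The names
fpu_model, model_push, model_pop and calls_before_fault are invented for
this sketch, and the "faulted" flag merely stands in for the floating
exception.

```c
#include <assert.h>

#define FPU_SLOTS 8   /* the x87 register stack holds eight entries */

/* Toy model of the coprocessor register stack (hypothetical names). */
struct fpu_model {
    int depth;      /* number of slots currently in use */
    int faulted;    /* set once a push finds the stack full */
};

/* Callee side: "return 1.0" leaves one value on the stack. */
static void model_push(struct fpu_model *f)
{
    if (f->depth == FPU_SLOTS)
        f->faulted = 1;       /* stands in for the floating exception */
    else
        f->depth++;
}

/* Caller side: only happens when the caller knows a double came back. */
static void model_pop(struct fpu_model *f)
{
    assert(f->depth > 0);     /* popping an empty stack is a model bug */
    f->depth--;
}

/* Call y() up to max_calls times.  Returns the number of calls that
   completed before the fault, or max_calls if no fault occurred. */
int calls_before_fault(int caller_pops, int max_calls)
{
    struct fpu_model f = { 0, 0 };
    int i;

    for (i = 0; i < max_calls; i++) {
        model_push(&f);           /* y() returns its double */
        if (f.faulted)
            return i;
        if (caller_pops)
            model_pop(&f);        /* y() was declared double */
    }
    return max_calls;
}
```

In this model calls_before_fault(0, 1000) gives 8, because the ninth call
finds the stack full, while calls_before_fault(1, 1000) completes all
1000 calls.  In a real program other floating-point code shares the same
stack, so the failure can surface on some unrelated operation, which
matches the original report.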

>Apparently if the function is not declared double, it uses up resources
>in the floating-point chip which eventually causes an error.  It was a
>real bear to find because the error could occur on any floating-point
>operation once the resources got too low.

I used lint (On ultrix, but that doesn't matter), and it says:

y value declared inconsistently	y.c(3)  ::  x.c(5)
y returns value which is always ignored

Well, the second comment can be lived with, but the first predicts the trouble.

Regards,
geert jan de groot
luc rooyakkers (on visit, not on the net)

-.-.- --... ...--   -.. .   .--. . .---- .... --.. --. .-.-.

Geert Jan de Groot,			Email: geertj@nlgvax.pcg.philips.nl
Philips Research Laboratories,		Packet: PE1HZG @ PI8ZAA
Project Centre Geldrop,			AMPRNET: [44.137.24.3]
Building XR, Room 15,
Willem Alexanderlaan 7B,		"When in doubt,
5664 AN Geldrop, The Netherlands.	 tune for minimum smoke
phone: +31 40 892204			 and then consult a reference"
[Standard disclaimers apply]		-(Found in a manual)	

dave@viper.Lynx.MN.Org (David Messer) (12/07/88)

In article <171@nlgvax.UUCP> geertj@nlgvax.UUCP (Geert Jan de Groot) writes:
 >In article <1656@viper.Lynx.MN.Org> dave@viper (David Messer) writes:
 >
 >Not true! Functions always return values on the stack, even if the value
 >isn't used. Thus, y() returns a double on the floating point stack.
 >But, main() doesn't remove it because of the wrong declaration. 
 >Thus, y() _must_ be declared double.

I've had many people point out to me (in boring detail :-) that the problem
is an inconsistent declaration.  Of course that is the problem!  What is
unacceptable to me is that a function in a _library_, whose definition
I may not have known, caused a problem which only showed up much later
in the execution of the program.  If I hadn't had the source to the
function in question, I never would've found it.

The program in question has compiled and executed correctly for many
years on a variety of UNIX systems -- it should've compiled and
executed correctly on Microport.

rcd@ico.ISC.COM (Dick Dunn) (12/08/88)

In article <1675@viper.Lynx.MN.Org>, dave@viper.Lynx.MN.Org (David Messer)
writes about a program core-dumping on a Microport system:
> 
> I've had many people point out to me (in boring detail :-) that the problem
> is an inconsistent declaration.  Of course that is the problem!

So fix it!  How hard is it to reason through this one?
	- The program has a bug
	- The bug causes a core dump
	- The bug has been found
	- The fix is trivial
Why should there be a fix somewhere other than where the bug is?
	
>...What is
> unacceptable to me is that a function in a _library_, whose definition
> I may not have known, caused a problem...

If that's unacceptable to you, then find another programming language,
'cause that's the way C works.  Each compilation unit (source file) has to
have all the declarations it takes to make the program consistent.  As a
result of that, you can't just blindly use functions without knowing their
interfaces and either explicitly declaring what you need or getting the
declarations from an include file.  Of course, it's also hard to figure out
how you can use a function without knowing its definition!  (How do you use
something if you don't know what it does?!)

> ...If I hadn't had the source to the
> function in question, I never would've found it...

I regularly use hundreds of functions for which I don't have source.  I get
the information about them (including proper declarations) from the
documentation.  I don't need the source; I treat them as black boxes which
perform certain functions.  All I need to know is the interface.

> The program in question has compiled and executed correctly for many
> years on a variety of UNIX systems ...

Not so.  The program in question, by your own admission, is not a correct C
program; therefore it is not possible for it to "compile and execute
correctly".  It IS possible for it to compile and do what you want it to.
I'm sorry if that seems like splitting hairs; it's not.  The whole point of
having language definitions is so that we know what should work on any
correct implementation of the language.  The definition is binding both on
the language implementor (to make the implementation do what it's defined
to do) and on the programmer using the implementation (to use only what's
defined, and not anything that happens to seem to work).

>...it should've compiled and
> executed correctly on Microport.

Why?  You want them to "fix" their implementation so you can retain your
bug, to be painfully rediscovered on another system in the future?
-- 
Dick Dunn      UUCP: {ncar,nbires}!ico!rcd           (303)449-2870
   ...I'm not cynical - just experienced.

scjones@sdrc.UUCP (Larry Jones) (12/08/88)

In article <1675@viper.Lynx.MN.Org>, dave@viper.Lynx.MN.Org (David Messer)
writes [about his buggy code]:
> I've had many people point out to me (in boring detail :-) that the problem
> is an inconsistent declaration.  Of course that is the problem!  What is
> unacceptable to me is that a function in a _library_, whose definition
> I may not have known, caused a problem which only showed up much later
> in the execution of the program.  If I hadn't had the source to the
> function in question, I never would've found it.
> 
> The program in question has compiled and executed correctly for many
> years on a variety of UNIX systems -- it should've compiled and
> executed correctly on Microport.

Well, gee, officer, I've run that stop sign every day for the
past five years and you've never given me a ticket before!

Just because something happens to work BY ACCIDENT on some system
(or even on lots of systems) doesn't make it correct.  The code
is wrong and Microport has no obligation to try to make buggy
code work.  Library routines should have associated headers which
declare them correctly so that users can include the header and
get a correct declaration without having to know what it is.  If
they don't, beat up whoever wrote the library routines.  If they
do and the user didn't use them, beat up the user.
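
A minimal sketch of that arrangement, using the y() from the original
post.  The header name y.h and the helper call_y_repeatedly are made up
for illustration, and the three files are shown in one listing (with the
#include "y.h" lines as comments) so that it compiles as a single unit.

```c
#include <assert.h>          /* used only by the caller's sanity check */

/* ------ y.h: header shipped with the library (hypothetical name) ------ */
#ifndef Y_H
#define Y_H
double y(void);              /* the one authoritative declaration */
#endif

/* ------ y.c: the library routine ------ */
/* #include "y.h"   lets the compiler check the definition against it */
double y(void)
{
    return 1.0;
}

/* ------ x.c: the user's code ------ */
/* #include "y.h"   the caller now knows that y() returns a double */
long call_y_repeatedly(long n)
{
    long i;

    assert(n >= 0);          /* a negative count makes no sense here */
    for (i = 0; i < n; i++)
        (void) y();          /* result discarded, but its type is known */
    return i;
}
```

The user never has to know what y() returns; including the header is
enough, and a lint run over both files should no longer complain that the
value is declared inconsistently.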

----
Larry Jones                         UUCP: uunet!sdrc!scjones
SDRC                                      scjones@sdrc.uucp
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150                  AT&T: (513) 576-2070
"Save the Quayles" - Mark Russell

nusip@maccs.McMaster.CA (Mike Borza) (12/08/88)

In article <1675@viper.Lynx.MN.Org> dave@viper.Lynx.MN.Org (David Messer) writes:
>In article <171@nlgvax.UUCP> geertj@nlgvax.UUCP (Geert Jan de Groot) writes:
> >In article <1656@viper.Lynx.MN.Org> dave@viper (David Messer) writes:
> >
> >Not true! Functions always return values on the stack, even if the value
> >isn't used. Thus, y() returns a double on the floating point stack.
> >But, main() doesn't remove it because of the wrong declaration. 
> >Thus, y() _must_ be declared double.
>
>I've had many people point out to me (in boring detail :-) that the problem
>is an inconsistent declaration.  Of course that is the problem!  What is
>unacceptable to me is that a function in a _library_, whose definition
>I may not have known, caused a problem which only showed up much later
>in the execution of the program.  If I hadn't had the source to the
>function in question, I never would've found it.

That the undeclared library function definition is unacceptable to you
is irrelevant.  The compiler has correctly used what semantic
information it has available to make a decision about how to
handle the return value.  If this semantic information is in error,
the compiler can hardly be blamed for the mistake.
>
>The program in question has compiled and executed correctly for many
>years on a variety of UNIX systems -- it should've compiled and
>executed correctly on Microport.

I have no doubt that the error in question was difficult and tedious
to find, but that changes nothing.  The fact that the computed results
were similar across a variety of machines is not a proof of correctness.
The implementor of the compiler is free to make whatever decisions
s/he likes about the underlying implementation of the language, so long
as the implementation conforms to the standard.  The program you describe
apparently made an implicit assumption about the implementation, which
was true enough across a variety of machines, and possibly compilers,
that the error went undetected until now.

I know a researcher who migrated a large-ish program "which had been
working for years" from a CDC mainframe to a VAX VMS environment.
The program crashed unpredictably with segmentation exceptions,
and the researcher was sure the problem was in the hardware or
system software.  One of his assistants spent more than a month
finding that the cause was a scalar which was passed to a subroutine
and picked up as an array.  This bug had gone undetected for years
and survived through a variety of hardware and system software changes.
Should this program have "worked"?  Semantically, there is little
difference between this problem and the one you describe.

Now my real question... what other little goodies are waiting in
store for you the next time you port this monster?  Will it port
painlessly to a vectorizing architecture?  To a parallel one?

mike borza  <antel!mike@maccs.uucp or nusip@maccs.uucp>