[comp.lang.c] C and Floating Point

dgh@sun.UUCP (04/01/87)

I regret that it's time for some flaming on this newsgroup.
After reading part way through too many postings which turn out
to be another instance of the blind arguing with the ignorant,
I have given up on individual mail responses which usually bounce anyway.

Fortran was created before the phrase "computer science" was imagined,
by people who were trained as applied mathematicians.  Consequently they had
some familiarity with what was needed and what had been done by 
mathematicians over the past several centuries.   It would never have
occurred to them, for instance, to disregard parentheses while evaluating
expressions or to disallow equality comparison of numbers!  Numerical
difficulties are inherent in the nature of the problems to be solved
and can't be legislated away by language design, but poor language design
can exacerbate those problems, just as poor design makes Fortran I/O
gratuitously more difficult than I/O problems inherently need to be.

C was originally created and developed by people who knew and cared little 
for the issues of floating-point computation.  This was tolerable while C was
primarily used for implementing kernels and compilers, but now other
people are building large numerical applications entirely in C.  
Version 3 of Spice is a relatively familiar and readily available example.
Large numerical applications are written in C primarily for portability
and because, in some non-numerical respects, the C environment is far more
powerful than that of many older languages, particularly Fortran.

Consequently persons who are expert in C as it has been are not always
fully qualified to prescribe what C should become (particularly in the
context of ANSI standardization).  (The same statement applies to Fortran
standardization, by the way).  The energy spent arguing rationalizations for
old or new mistakes in language design for C might be better spent thinking 
about why every other widely-used algebraic expression language requires 
(for instance) that parentheses be honored and allows numbers to be 
compared for equality.

Studying Fortran as a model of language design is confusing since it requires
much varied experience to be able to separate the good ideas from the bad.
In the area of floating-point computation, there are later and better sources
of inspiration, particularly in the expositions and explanations of the IEEE
Standard, some of which are listed below.  However, the committee which 
drafted that Standard, wanting to finish its work in timely fashion,
declined to prescribe for specific programming languages.
The perhaps naive thought was that the necessary ideas, once enumerated, 
would eventually be understood and incorporated in language standards in
"correct" ways.  Consequently the Standard does
not mention the requirement, obvious to its drafters, that parentheses in
algebraic expressions must be honored.   It is worth observing that none
of the numerical analysts who participated in its deliberations suggested that
exact floating-point equality comparisons were undesirable or that fuzzy
comparisons were preferable.

People who make dubious pronouncements outside their areas of expertise - 
by no means limited to this newsgroup or to the C language -
undermine their credibility when speaking on subjects in which they are
knowledgeable.  As always - "better to remain silent and be thought a fool
than to hit ^D and confirm it".

Here are some ways to become knowledgeable about floating point:

IEEE Standard for Binary Floating-Point Arithmetic,
ANSI/IEEE Std 754-1985, IEEE, New York, 1985.

Coonen,
Contributions to a Proposed Standard for Binary Floating-Point Arithmetic,
Ph. D. thesis, University of California, Berkeley, 1984.

Cody et al., "A Proposed Radix- and Word-length-independent Standard for
Floating-Point Arithmetic,"
IEEE Computer,
August 1984.

Stevenson et al., Cody, Hough, Coonen,
various papers proposing and analyzing a draft standard for binary floating-point arithmetic,
IEEE Computer,
March 1981.

Coonen,
"An Implementation Guide to a Proposed Standard for Floating-Point Arithmetic,"
IEEE Computer,
January 1980.

The Proposed IEEE Floating-Point Standard,
special issue of the ACM
SIGNUM Newsletter,
October 1979.

Apple Numerics Manual,
Addison-Wesley, 1986.

"Appendix: Accuracy of Numerical Calculations,"
in
HP-15C Advanced Functions Handbook,
00015-90011, Hewlett-Packard, 1982.

Cody and Waite,
Software Manual for the Elementary Functions,
Prentice-Hall, 1980.

Bunch, Dongarra, Moler, Stewart,
Linpack Users' Guide,
SIAM, Philadelphia, 1979.

Sterbenz,
Floating-Point Computation,
Prentice-Hall, 1974.

Kahan, Implementation of Algorithms, 1973,
NTIS # DDC AD-769 124.

Kahan,
"A Survey of Error Analysis,"
in
Proceedings of 1971 IFIP Conference,
IFIP, 1971.
 

gwyn@brl-smoke.UUCP (04/02/87)

In article <15958@sun.uucp> dgh@sun.uucp (David Hough) writes:
>After reading part way through too many postings which turn out
>to be another instance of the blind arguing with the ignorant,
>...

I really have to take objection to the implication that people
who disagree with Mr. Hough's viewpoint do so because of ignorance.

It is certainly true that only a few members of X3J11 feel really
comfortable concerning requirements for floating-point algorithms.
However, there ARE members who do have lots of experience in this area.

It is BECAUSE I care about good floating-point algorithms that I even
bothered to suggest that floating-point == was not a good idea.  I'm
not going to try to get this put in the standard, but I WAS hoping that
people like Mr. Hough might learn something from the ensuing discussion.

>Fortran was created before the phrase "computer science" was imagined,
>by people who were trained as applied mathematicians.  Consequently they had
>some familiarity with what was needed and what had been done by 
>mathematicians over the past several centuries.
>...

It is the failure to understand that the properties of machine
floating-point arithmetic are NOT identical with those of the
real number system of mathematics that leads to many problems
in computational algorithms.  The whole field of numerical analysis
developed BECAUSE of initial problems encountered by programmers in
the "good old days".  It might be "nice" if programming languages
accurately mimicked pure arithmetic, but use of hardware floating-
point is simply not that pure.  Wishful thinking is irrelevant here!

It is totally bogus to argue that mathematicians' idea of the meaning
of parentheses in expressions has anything to do with time order of
evaluation.  That is a Fortran notion.  C actually conforms more
closely to what a mathematician means by parentheses, which is to
group subexpressions in order to override default rules of operator
precedence.  Because C also has a heavily-used macro expansion
facility, we have to take that into account when deciding what the
"right" thing to do with parentheses is.  I assure you that X3J11 has
heard the arguments that the Fortran-style proponents give, and we
decided differently for SOUND REASONS.  We ALSO found that their
legitimate concern could be met by use of the unary + operator,
which was originally introduced for unrelated reasons.  Therefore
there is no reason remaining other than "to act just like Fortran"
that can be given for what Mr. Hough apparently would prefer.  Since
programmers who try to apply their Fortran intuition to C are likely
to get into big trouble in many other ways (e.g. function parameters),
that is a pretty weak reason.

I am still waiting for a really good reason for using floating-point
== in any robust algorithm.  I have posted the only one that anyone
has yet come up with (== 0.0 special case switch) and have also posted
an example of the problems its general use can cause.  I have many years
of experience in programming significant numerical applications in both
Fortran and C, and I am convinced that floating == is virtually NEVER
a good thing to write in one's code.  It would be a Very Good Thing in
my estimation if "lint" (or your local equivalent) would warn about
any use of floating ==, so you would be likely to re-think those
sections of code.

Finally, I would urge newsgroup readers to (a) consider that others
may have something useful to say, and if you think you disagree,
try at least to understand what their reasons are for their differing
opinions, and to (b) avoid posting when you really are unlikely to
contribute significantly to the discussion.  Following these principles
will help to reduce the volume of flamage and wasted time.  (Many former
useful newsgroup contributors "dropped out" long ago when they found
the signal-to-noise ratio too low.)

Yours for good ideas and useful discussions,
	Gwyn@BRL.ARPA

bright@dataio.UUCP (04/02/87)

In article <5716@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <15958@sun.uucp> dgh@sun.uucp (David Hough) writes:
>It is BECAUSE I care about good floating-point algorithms that I even
>bothered to suggest that floating-point == was not a good idea.
>
>I am still waiting for a really good reason for using floating-point
>== in any robust algorithm.  I have posted the only one that anyone
>has yet come up with (== 0.0 special case switch) and have also posted
>an example of the problems its general use can cause.  I have many years
>of experience in programming significant numerical applications in both
>Fortran and C, and I am convinced that floating == is virtually NEVER
>a good thing to write in one's code.

I use == for doubles and floats frequently:
	o in my test suites for my C compiler (Datalight C). This
	  is because I know EXACTLY what bit pattern
	  I want and want to make sure that all the library routines work
	  exactly.

	o to detect +infinity and -infinity, as the 8087 handles it
	  differently than the software floating point does.

	o in routines that attempt to determine the characteristics
	  of the floating point code (like Cody and Waite's machine
	  arithmetic program; a sketch of such a probe appears below).

	o to detect special values supported by the 8087 in hardware
	  (such as pi).

I suspect that == is also useful:
	o in specially tweaked numerical programs. As Cody and Waite point
	  out, for many purposes you cannot write numerical routines that
	  are both portable and accurate to the limits of the machine.
	  Take a look at the code to implement sin() for an example.
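
Here is the sort of thing I have in mind for the machine-characteristics
case.  This is a sketch from memory rather than code lifted from Cody and
Waite: the classic run-time radix probe.  It leans on floating == and it
assumes each stored intermediate really is rounded to double (an 8087
keeping values in its 80-bit registers can fool it):

	int
	find_radix()
	{
		double a, b, t;

		/* grow a by doubling until adding 1.0 is no longer felt */
		a = 1.0;
		do {
			a += a;
			t = a + 1.0;
			t = t - a;
		} while (t == 1.0);

		/* find the smallest power of two whose addition IS felt;
		   the difference it makes is the radix of the arithmetic */
		b = 1.0;
		do {
			b += b;
			t = a + b;
			t = t - a;
		} while (t == 0.0);

		return (int)t;	/* 2 on binary machines, 16 on hex machines */
	}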

And now for something completely different:

	How about allowing numbers to be specified in binary radix,
	as in 0b00110101 ? No existing code would be broken, and it
	would be very useful to those of us writing graphics code,
	code to fiddle with bits in i/o ports, etc. It seems odd that
	decimal, octal, hex, floating and ascii formats are supported,
	but not binary in a language supposedly designed for bit twiddling!
	As a corollary, support %b in printf and scanf.
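
	In the meantime the usual workaround is a hex constant with the
	bit pattern spelled out beside it, which is exactly the manual
	translation step the 0b form would eliminate:

		char mask = 0x35;	/* 0011 0101, i.e. the proposed 0b00110101 */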

	Also, support specifying floating numbers in hex radix. This
	would avoid horrible kludges like:

		unsigned value[4] = {0x1234,0x5678,0xabcd,0x4532};
		double d;

		if (d < *(double *)value)
			...

	The syntax would be:

		double def = 0x1234.5E678e+0x12;
		double abc = -0x.1B4F;

	def prescribes a mantissa of 0x12345E678 and an exponent of
	(4*16+0x12).

	Note that a + or - would be required before the exponent, to
	distinguish the e from the digit e. Also note that octal is
	a little difficult to parse, as lots of people use leading 0s
	already.

	Such syntax would be very useful to those of us writing numerical
	subroutines. Numerical analysis books frequently give the
	constants to use specified in hex or octal, so as to control
	the resulting values exactly. Note also that the constants
	for def and abc are more portably defined than value is, as
	value depends on byte order, exponent bias, where the sign bit
	is, etc. They are also less error prone to type in, as they
	can be compared directly with the book values instead of going
	through a manual translation process.
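
	In the meantime, one stopgap that at least avoids the byte-order
	dependence is to assemble such a constant at run time.  This is
	only a sketch (it assumes nothing beyond ldexp() from the standard
	math library, and it builds just the 0x1234.5E678 part of the
	example, ignoring the extra exponent):

		#include <math.h>

		double def;

		init_constants()
		{
			/* 0x12345E678 is too big for a 32-bit integer constant,
			   so build it from two halves; ldexp() then supplies
			   the binary point: 0x1234.5E678 == 0x12345E678 * 2^-20 */
			def = ldexp((double)0x12345L * 65536.0 + (double)0xE678L, -20);
		}

	This still can't be checked against the book by eye the way a hex
	literal could, which is the point of the proposal.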

jimv@omepd.UUCP (04/04/87)

Doug, I believe that you are mistaken in a number of points you make
in your rejoinder to Dave Hough's note here.

Before I dive into the details, though, let me first comment on your
trailing remark:

>Finally, I would urge newsgroup readers to (a) consider that others
>may have something useful to say, and if you think you disagree,
>try at least to understand what their reasons are for their differing
>opinions, and to (b) avoid posting when you really are unlikely to
>contribute significantly to the discussion.

This is good advice, but it seems to me that you ignored it when you
posted your article.  When reading (and perhaps responding to) this article, I
hope that you keep in mind that I am posting this because I believe
that I have something valuable to contribute.  As you will see below, I
also think that Dave's article had a lot to contribute -- in fact more
than your response.

Okay, on to what I want to say.


In his article, Dave made a number of important points about
programming in C and numerical programming.  You picked up on his
references to Fortran and responded rather angrily (more on that
later), but didn't address or acknowledge anything about the references
he suggested as important background.  Those references are a veritable
gold mine of important information on floating-point arithmetic.  I
strongly recommend that anyone and everyone who is going to do any
significant amount of floating-point programming familiarize themselves
with the information in these references.

You say:
>It is BECAUSE I care about good floating-point algorithms that I even
>bothered to suggest that floating-point == was not a good idea.  I'm
>not going to try to get this put in the standard, but I WAS hoping that
>people like Mr. Hough might learn something from the ensuing discussion.
	and
>It is certainly true that only a few members of X3J11 feel really
>comfortable concerning requirements for floating-point algorithms.
>However, there ARE members who do have lots of experience in this area.

I would hope that you would learn something from Dave.  Dave was one of
the major players in the 754 and 854 IEEE floating-point standards.  He
has been involved in the design and implementation of many
floating-point systems over the last 10 years, including Sun's more
recent offerings.  I don't know exactly what Dave's PhD was in at UC
Berkeley, but the gist of it was numerical programming.  In short, Dave
is a trained and experienced *expert* in floating-point programming.
His credentials are excellent and are not to be ignored.

I agree that there are many ways to misuse a floating ==, but that
doesn't mean that all uses of it are misuses.  I believe that there are
just as many bad uses of "goto", pointer conversions, and so on that
are possible in the language.  Is the possibility of misuse sufficient
justification for prohibiting useful operations?  I say no.  Surely
you're not suggesting that we need to protect the ignorant
floating-point users from themselves?

>I am still waiting for a really good reason for using floating-point
>== in any robust algorithm.  I have posted the only one that anyone
>has yet come up with (== 0.0 special case switch) and have also posted
>an example of the problems its general use can cause.  I have many years
>of experience in programming significant numerical applications in both
>Fortran and C, and I am convinced that floating == is virtually NEVER
>a good thing to write in one's code.  It would be a Very Good Thing in
>my estimation if "lint" (or your local equivalent) would warn about
>any use of floating ==, so you would be likely to re-think those
>sections of code.

Maybe the "lint" approach is reasonable, although I am not entirely
convinced.  In any case, there are just too many good uses of floating
== to put this warning in the compiler, or to uniformly deprecate it as
an operation.

Let me give you some examples of good uses of floating ==.  I don't
have a whole lot of floating-point code on this machine to draw on, but
here are some examples taken from the code which I can easily get my
hands on.

(1)  First, there's your example of comparing a value for == or != to
     0.0.  A quick textual search of Spice3 found 109 obvious instances
     of this usage.  Usually this check is to avoid a zero divide
     fault, but sometimes it also is used to detect discontinuities in
     approximated functions (e.g. ATAN2(0,0)).  Some typical examples
     are:
		if (model->MOS2substrateDoping == 0.0) xwb = 0.25e-6;
		if ((model->MOS2fastSurfaceStateDensity == 0.0) ||
		(OxideCap == 0.0)) {
		if (dd[i] == 0.0)
		if (realpart(&cc[i]) == 0.0) {
		if (largest == 0.0) {
		} else if (lo == 0.0) { 
		if (delta == 0.0) {
		if (mat1[i * n + i] == 0.0)
		else if (((num < 10.0) && (num > -10.0)) || (num == 0.0))
     Sometimes the check is made for efficiency, such as in the
     following example:
		if(arg != 0) sqarg=sqrt(arg);

(2)  Similar to (1), there are also comparisons against other "known"
     values.  Sometimes these values are extrema, sometimes these are
     initial values, and sometimes these are just critical or undefined
     domains in the computed function.  Examples from Spice3 include:
		if(model->MOS1bulkJctBotGradingCoeff == .5) {
		} else if ((ox != HUGE) && (oy != HUGE)) {
		if (stack[st].e_dlist[1] != HUGE)
		for (d = dv; *d != HUGE; d++)
     Admittedly there are fewer of these in Spice than comparisons
     against zero, but they do occur.  Another place I looked was the
     4.3 math library.  There I found more examples, of which the
     following are typical:
		if( x != -1.0)
		if (x == 1.0) {
		if ( x == zero )
		if ( x == negone )
		if((t=copysign(x,one))==one) return(zero/zero);
		else if(y==two)       return(x*x);
		else if ( (t=drem(y,two)) == zero)	return( pow_p(-x,y) );
     Yet another example of this is seen in some of the following code
     which tests a math library for correct handling of special case
     operands:
		if(x!=pzero||s!=1.0) printf("Failed; "); else printf("O.K.  ; ");  
		if(x!=PI) printf("Failed; "); else printf("O.K.  ; ");  
		if(x!= -PI) printf("Failed; "); else printf("O.K.  ; ");  
		if(x!= PIh) {printf("Failed; ");
		else if(y!= PIh) {printf("Failed; ");
		if(x==pinf)   printf("O.K.  ; "); else printf("Failed; ");
		if(x==none)   printf("O.K.  ; "); else printf("Failed; ");
		if(x==nzero&&s== -1.0) printf("O.K.  ; "); else printf("Failed; ");
		if(x==unft)   printf("O.K.  ; "); else printf("Failed; ");
		if(x==gunt)   printf("O.K.  ; "); else printf("Failed; ");
		if(x==pone) printf("O.K.  ; "); else printf("Failed; ");
		if(x==pinf&&y==pinf)      {printf("O.K.  ; "); 
		if(x== 5.0)      printf("O.K.  ; "); else printf("Failed; ");
		if(x== 5.0*unft)      printf("O.K.  ; "); else printf("Failed; ");
		if(x==ninf) printf("O.K.  ; "); else printf("Failed; ");
		if(x!=  5929741.) printf("Failed; "); else printf("O.K.  ; ");  

(3)  Sometimes it is quite natural to compare two floating-point
     numbers for equality.  Often this is done as an early guard in an
     expression where the difference of the two numbers must be
     non-zero.  (This guard doesn't always guarantee that the
     difference, if actually computed, would be nonzero for all
     rounding modes, but that may be irrelevant because the function is
     actually computed using an algebraically equivalent formula.)
     Other times one is looking to see if a floating-point number has
     an integral value.  Some examples of this from Spice3 and 4.3bsd
     libm are:
		rcheck((dd1[i] >= 0) || (floor(dd2[i]) == ceil(dd2[i])), "power");
		d[i] = dd1[i] == dd2[i];
		d[i] = ((realpart(&c1) == realpart(&c2)) &&
		(imagpart(&c1) == imagpart(&c2)));
		if (arg == t) {
		if((double)k==y) {	/* if y is an integer */
     In yet other circumstances, a function may be defined in such a
     way that its value is computed differently or in a special way
     when the arguments are equal.  The common example of this is
     ATAN2(x,x).

(4)  In IEEE code, the construct (x != x) is often used to quickly and
     easily detect NaN operands.  The following example from 4.3bsd libm
     says it all:
		if(x!=x) return(x);	/* x is NaN */

One piece of software which I have not quoted examples from, but does
make heavy use of floating ==, is Kahan's Paranoia.  I argue that even
though it isn't a typical floating-point program, it is performing a
function that should be expressible in a language which purports to
support floating-point arithmetic.


On to the next point you make:
>It is totally bogus to argue that mathematicians' idea of the meaning
>of parentheses in expressions has anything to do with time order of
>evaluation.  That is a Fortran notion.  C actually conforms more
>closely to what a mathematician means by parentheses, which is to
>group subexpressions in order to override default rules of operator
>precedence.

I don't believe that Dave explicitly said that the reason for a
compiler to honor parentheses is a mathematical one.  I think that he
said that the mathematicians who did the early programming in
floating-point influenced Fortran to guarantee the order of operations
by obeying parentheses.

All of us who have programmed floating-point know that the order that
operations are done in is sometimes critical to the correct evaluation
of expressions.  Granted: ANSI has specified the (albeit clumsy) "+"
operator to guarantee order of evaluation.  Dave's point was that
  (a) pre-ANSI C did not have this concept, and
  (b) the people doing early floating-point programming (in
      Fortran) recognized the value of enforced order of evaluation.

Sure it's possible in pre-ANSI C to guarantee order of evaluation by
breaking the expressions up into separate statements.  The cost is
code that obscures what is being computed.  Requiring order of
evaluation to be enforced by separate statements is almost as clumsy as
requiring each integer operation to be a separate statement,
disallowing side-effects, or not supporting more than 1 level of
dereferencing pointers in a single expression.  Sure, you can program
with restrictions like these, but it sure isn't going to be your
language of choice.
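
For instance, to pin down the grouping of the first expression in the
list below without relying on parentheses, a pre-ANSI programmer has to
write something on the order of (a sketch; the temporaries are mine):

		double t1, t2;

		t1 = x * a;
		t2 = one + t1;
		t2 = t2 - one;
		x  = t2 * 16.0;

Four statements and two scratch variables to say what one parenthesized
line says.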

Here are some examples of parenthesized expressions, pulled from the
Elefunt tests, which want the order of evaluation guaranteed by the
parentheses:
		x = ((one + x * a) - one) * 16.0;
		y = x / ((half + x * half) *((half - x) + half));
		x = (x + eight) - eight;
		while (((b+one)-b)-one == zero);
		if ((a+betam1)-a != zero)
		if ((one - a) - one != zero)
		y2 = (y/two + y) - y;
		w = zz * xsq/(den*(den-one));
		w = (x - zz) + w;
		w = ((half-zz) + half) * ((half + zz) + half);
(I suppose some of these look pretty obscure even coded like this, eh? :-))

>Because C also has a heavily-used macro expansion
>facility, we have to take that into account when deciding what the
>"right" thing to do with parentheses is.  I assure you that X3J11 has
>heard the arguments that the Fortran-style proponents give, and we
>decided differently for SOUND REASONS.  We ALSO found that their
>legitimate concern could be met by use of the unary + operator,
>which was originally introduced for unrelated reasons.  Therefore
>there is no reason remaining other than "to act just like Fortran"
>that can be given for what Mr. Hough apparently would prefer.

I understand that there are historical reasons and compatibility issues
for the ANSI spec to read the way it does about parentheses.  I
understand that the C macro facility tends to require excess parentheses.
And, I accept this decision by ANSI.

But, that doesn't mean that Dave's comments are invalid.  It is the
"easy" coding using parentheses (as seen above) that Dave was arguing
is a *good thing* for numerical programmers.  What is being traded off
here is clean expressibility of floating-point expressions.  It's okay
in some sense that this has happened, but you can't deny that it has
happened, and you can't claim that somehow the resulting language is
better for floating-point computations.



Let me try to wrap this up with a few more miscellaneous thoughts.
First, ANSI C is much better for floating-point programming than the
existing C compilers.  The biggest win that comes to mind is prototyped
functions, which gives much-needed control over data types of passed
arguments.  Personally, I wish that there was a function storage class
called "inline" which would allow me to get rid of a number of hairy
macros.  The improvements in the ANSI specification of the C math
library are good if not perfect, as is the language support for
extended (long double) variables.  I'm even reasonably happy with the
parenthesis situation, since I just overspec my compiler and require it
to preserve parentheses for floating-point operations.  (Sure, that's
not encouraging portable code, but it's an easy dependency to document
and consequently doesn't bother me much.)

And who am I?  Just a floating-point newcomer: only 5 years of
continuous floating-point experience, and only working on my 4th
floating-point processor.
--
Jim Valerio	{verdix,intelca!mipos3}!omepd!jimv, jimv@omepd.intel.com

g-rh@cca.UUCP (04/04/87)

In article <5716@brl-smoke.ARPA> gwyn@brl.arpa
	(Doug Gwyn (VLD/VMB) <gwyn>) writes:

....
>
>It is totally bogus to argue that mathematicians' idea of the meaning
>of parentheses in expressions has anything to do with time order of
>evaluation.  That is a Fortran notion.  C actually conforms more
>closely to what a mathematician means by parentheses, which is to
>group subexpressions in order to override default rules of operator
>precedence.  Because C also has a heavily-used macro expansion
>facility, we have to take that into account when deciding what the
>"right" thing to do with parentheses is.  I assure you that X3J11 has
>heard the arguments that the Fortran-style proponents give, and we
>decided differently for SOUND REASONS.  We ALSO found that their
>legitimate concern could be met by use of the unary + operator,
>which was originally introduced for unrelated reasons....

Doug, much as we all respect your knowledge of C and Unix, this
just doesn't wash.  The essence of the problem is that floating point
arithmetic is commutative but not associative.  From a mathematicians
viewpoint 'x + y + z', where x, y, and z are floating point numbers
and '+' is floating point addition, is not well defined.  I repeat,
NOT WELL DEFINED.  The use of 'x op y op z' where 'op' is a two place
operator is ambiguous:  it may either express

	op(op(x,y),z) or op(x,op(y,z))

Associativity, of course, means that the two are equivalent (i.e. they
both yield the same result.)  Furthermore we can prove (and it has to be
proved) that the expression 'x1 op x2 op x3 ... op xn' is unambiguous
because all possible groupings are equivalent if op is associative.

If we are given an expression 'x op y op z' and 'op' is not associative
then either we must use a predefined order of evaluation rule or the
expression is not well defined.
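
A concrete illustration, assuming IEEE double arithmetic (a machine that
carries extra precision in registers, such as an 8087, may print 1 both
times, which only strengthens the point).  The groupings are forced with
separate statements precisely because C does not promise to honor the
parentheses:

	#include <stdio.h>

	main()
	{
		double x = 1.0e17, y = -1.0e17, z = 1.0, t;

		t = x + y;			/* op(op(x,y),z) */
		printf("%g\n", t + z);		/* prints 1 */

		t = y + z;			/* op(x,op(y,z)) */
		printf("%g\n", x + t);		/* prints 0: the 1.0 rounds away */
		return 0;
	}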

This is a question of mathematics, and is not, as you say, a question
of Fortran versus C.  As it happens, the rules for evaluating expressions
in Fortran are mathematically 'correct' in that they correctly model the
mathematical properties of floating point numbers (in this instance),
whereas those of C are not.

Now it may well be that there are 'SOUND REASONS' for leaving the situation
as it is.  I would guess that the argument runs as follows:  It will be
much more expensive (in terms of compiler and preprocessor time) if
parentheses must be honored in all instances, and even more expensive
if they must be detected and honored only in the case of nonassociative
operators.  Furthermore, the insistence upon honoring order of evaluation
will 'break' large classes of existing compilers.

These are legitimate reasons.  They amount to asserting that for economic
and historical reasons it is preferable to be living in a certain amount
of mathematical 'sin' rather than to be mathematically 'pure' after the
fact.  But please, do not misrepresent the mathematics of the situation.
It is misleading to assert that the use of parentheses is simply a matter
of grouping of common subexpressions.
-- 

Richard Harter, SMDS Inc. [Disclaimers not permitted by company policy.]

g-rh@cca.UUCP (04/05/87)

One class of usage for floating point equality has been suggested,
detection of exact bit patterns.  I suggest that this is not advisable,
both in principle and in practice.  I say 'in principle' because you
are using an operation with one set of semantics to simulate an
operation with a different set of semantics.  I say 'in practice'
because you can get unexpected results if the bit patterns are
equivalent to unnormalized floating point numbers.

However let us consider the following rather common situation:
We have an iterative process which generates a floating point number,
x, with the following iteration equation

	x <- x + delta

where delta is a function of x and where it is expected that the
iteration converges to some limit point x0.  We have the generic
problem of determining when the iterative sequence has converged,
taking into account the nature of floating point arithmetic.

Typically we will find that the calculation of the delta in question
becomes numerically unstable when x is sufficiently close to x0.
Let us suppose that we have addressed this problem, so that we
are simply left with the convergence issue.

The iteration will have converged if the iteration equation does
not change the value of x; i.e. we will have found the limit point
of the sequence, given that we are using floating point arithmetic
of the machine in question.  The question then is, how do we test
for this condition?  Is the following pseudo code acceptable?

	if ((x+delta)==x) then terminate_iteration
	else continue_iteration

If not, why not?  Alternatively should one use

	if (delta>0.) then
		if ((x+delta)<=x) then terminate_iteration
		else continue_iteration
	else if (delta<0.) then
		if ((x+delta)>=x) then terminate_iteration
		else continue_iteration
	else terminate_iteration

or is this also unacceptable?  If so, why?  Is there a best method?
What is it, and what are the issues involved?  If it is not
guaranteed that the calculation of delta is stable, what further
issues are involved?

You will be given a test on this at the end of the week.  No grade
will be given, but, if your answers are incorrect, World War III
will happen.

-- 

Richard Harter, SMDS Inc. [Disclaimers not permitted by company policy.]

flaps@utcsri.UUCP (Alan J Rosenthal) (04/05/87)

In article <15958@sun.uucp> dgh@sun.UUCP writes:
>Fortran was created before the phrase "computer science" was imagined,
>by people who were trained as applied mathematicians.  Consequently they had
>some familiarity with what was needed and what had been done by 
>mathematicians over the past several centuries.   It would never have
>occurred to them, for instance, to disregard parentheses while evaluating
>expressions or to disallow equality comparison of numbers!

Ahem.

1.  Give any applied mathematician a calculator and the expression
"(57382 * 23) + (57382 * 2)".  They will ALL punch "57382 * 25" on the
calculator.  They know that parentheses can be rearranged.  (obviously not
disregarded, C doesn't do that.)

2.  Kernighan and Plauger, in _The_Elements_of_Programming_Style_, note that
the version of fortran at their local installation fails on an equality test
when one number is on an input punch-card and the other, identically typed
and also a constant, is in a program.  This is because different scanning
routines were used in the i/o library compiled with the fortran program and
the i/o routines that were part of the fortran compiler!  So much for
floating-point equality in fortran.  It's just a fact of life that you can't
count on floating-point equality (although nothing in C parallels the idiocy
of having two versions of a floating-point number scanning routine).

-- 

Alan J Rosenthal

flaps@csri.toronto.edu, {seismo!utai or utzoo}!utcsri!flaps,
flaps@toronto on csnet, flaps at utorgpu on bitnet.

"Probably the best operating system in the world is the [operating system]
made for the PDP-11 by Bell Laboratories." - Ted Nelson, October 1977

gwyn@brl-smoke.UUCP (04/06/87)

I suppose I should have made it clear that I was discussing
the issue in terms of portable programming practice.

Tests for floating-point equality with particular bit patterns
are clearly very machine-specific.  Walter Bright further
assumed he would not get in trouble due to register/memory
width differences such as I explained in an earlier example.
Seldom are floating-point values merely picked up and passed
around; much more often they are computed quantities that
inherently have "fuzz" attached to them.  It is the "fuzz"
that makes floating-point equality testing so problematic.

Certainly if you can count on specific behavior for your
system then you might find ways to exploit it.  I try to
avoid writing code like that (except in the special case
where I'm implementing the run-time system, or similar
circumstances where the gain outweighs the disadvantages).
I doubt, though, that I'd trust future releases of the C compiler
not to change the code they generate for a particular source
construct.

gwyn@brl-smoke.UUCP (04/06/87)

In article <532@omepd> jimv@omepd.UUCP (Jim Valerio) writes:
>Doug, I believe that you are mistaken in a number of points you make
>in your rejoinder to Dave Hough's note here.

Thanks for the extended discussion, Jim.  I won't repeat the points
I made before; however I must note that about half of the Spice
examples of use of floating == were benign, but the other half fell
into the pitfalls that make me so distrustful of floating ==.  For
the most important example, simply checking that the divisor is non-
zero does NOT guarantee against overflow in the subsequent division.
Truly robust floating-point code must be much more carefully crafted
than that.

A simple list of how floating == has been used (e.g. in Spice) is
not the same thing as a list of contexts in which its use is safe.

It's also not very surprising to learn that Mr. Hough was one of the
people we have to "thank" for the IEEE floating-point standard.
Many of the qualified people I've discussed the IEEE standard with
have been very dissatisfied with it.  My particular gripe is that it
tries to legitimatize arithmetic with nominally invalid operations
(producing extended "values" such as plus or minus "infinity"); this
merely encourages those programmers who weren't taking sufficient
pains with their algorithms to run off and produce even more
meaningless results ("The hardware supports it, it must be okay.").

I've been burned too many times by floating-point computations that
misbehave when pushed near their extremes.  Unfortunately I can't
offer a simple recipe for avoiding such problems, other than to
know what can go wrong and to think carefully when designing the
algorithm.  The only simple formulas I know of that are right most
of the time are "watch out when dividing", "use atan2(), not atan()",
"consider normalizing", and "be suspicious of any test for equality".
I wish I had better, more specific rules, but I don't (and I have
read many of the references Mr. Hough gave; they don't help much).

The reason I'm not actually going to propose removal of floating
== from the ANSI C standard is that it doesn't solve the real
problem, which is that most programmers are insufficiently aware
of pitfalls in floating-point arithmetic.  Neither a change in ANSI
C nor the IEEE floating-point standard directly address this problem.

ad3@h.cc.purdue.edu.UUCP (04/06/87)

In article <14681@cca.CCA.COM> g-rh@CCA.CCA.COM.UUCP (Richard Harter) writes:
>One class of usage for floating point equality has been suggested,
>detection of exact bit patterns.  I suggest that this is not advisable,
>both in principle and in practice.  I say 'in principle' because you
>are using an operation with one set of semantics to simulate an
>operation with a different set of semantics.  I say 'in practice'
>because you can get unexpected results if the bit patterns are
>equivalent to unnormalized floating point numbers.

I've had the "opportunity" to track down and fix exactly this problem
in two major statistical packages that we run on our CDC 6000 systems.
The tale related here is a Fortran rather than a C example.  But it
could easily happen with any language.

Both packages were originally developed on IBM systems in the days
before Fortran77.  In those days, Fortran didn't have character
variables, so character data had to be stored in numeric variables.  In
IBM-land, this data often ended up in REAL-typed variables.

Floating-point comparison of character data may work on IBM systems
(I'm not intimately familiar with IBM data representations and
operations), but it can be a problem on the CDC 6000.  A little
background follows...  The CDC 6000 has a 60-bit word size and a 6-bit
character size.  The character set includes "A"-"Z" (01-32 octal),
"0"-"9" (33-44 octal), blank (55 octal), and a number of other special
characters.  Floating point format uses the upper 12 bits (2
characters) for the exponent and the lower 48 bits (8 characters) for
the mantissa.

The problem described here was in the package's command language
decoding.  The command language input is broken into tokens, which were
stored left-justified with blank fill in floating point variables.
Parsing the program naturally includes checking these tokens to try to
recognize command language keywords.  One of these keywords was "TO",
the user had a variable named "TRE".  These are stored internally as:
    24 17 55 55 55 55 55 55 55 55    TO
    24 22 05 55 55 55 55 55 55 55    TRE
(All bytes are in octal.)

So, how do we compare floating point quantities?  The compiler can't
assume that the data will be normalized, so it can't generate code to
do a bitwise comparison.  Instead, the generated code subtracts one
"number" from the other and compares the result to 0.  In doing the
subtraction, the hardware adjusts the number with the smaller exponent
so that the exponents match.  This exponent adjustment must be
compensated by shifting the mantissa so that the adjusted number has
the same value.

So what happens in this particular case?  "TO" has the smaller exponent
(24 17), so its exponent is incremented by 3, making it match the other
(24 22).  This must be compensated by shifting the mantissa (55 55 55
...) right 3 bits, making it (05 55 55 ...).  Putting it all together,
the hardware has transformed "TO" into "TRE", and they'll naturally
compare equal.

Several general points should be made here:
- Sometimes the base data types provided by a language don't match the
  application's view of the data, and you have to choose one of the
  available data types.
- You should be aware of the limitations of the various forms of data
  representation, so that you can make a good choice.


-- 
Mike Brown, Systems Programmer		ARPANET: ad3@j.cc.Purdue.EDU
Purdue University Computing Center	BITNET:  AD3@PURCCVM
Mathematical Sciences Building		USENET:  ad3@pucc-j.UUCP
West Lafayette, IN 47907		Phone:   (317) 494-1787

tyler@drivax.UUCP (04/07/87)

In article <14681@cca.CCA.COM> g-rh@CCA.CCA.COM.UUCP (Richard Harter) writes:
>We have an iterative process which generates a floating point number,
>x, with the following iteration equation

>	x <- x + delta

>where delta is a function of x and where it is expected that the
>iteration converges to some limit point x0.  We have the generic
>problem of determining when the iterative sequence has converged,

>The iteration will have converged if the iteration equation does
>not change the value of x; ..... Is the following pseudo code acceptable?

>	if ((x+delta)==x) then terminate_iteration
>	else continue_iteration

This is unacceptable in general, because, at the limit, in our discrete 
world, you may find that the successive values for x cycle through 2 or 
more very nearly equal values.  That is, x1+delta(x1) = x2; x2+delta(x2)=x1 
where x1 and x2 are both very close together.  This could well happen 
when the mathematical limit of the sequence lies between x1 and x2.

>Alternatively should one use

>	if (delta>0.) then
>		if ((x+delta)<=x) then terminate_iteration
>		else continue_iteration
>	else if (delta<0.) then
>		if ((x+delta)>=x) then terminate_iteration
>		else continue_iteration
>	else terminate_iteration

>or is this also unacceptable?  If so, why?  Is there a best method?
>What is it, and what are the issues involved?

This method suffers from the same problem, as well as being costly to
compute.  

I don't know a best method, however if you know something about the 
direction of approach to the limit, you can do fairly well with one of 
the two following approaches:

1.  If the x values alternately are above and below the limit, just go
    till the absolute value of delta is less than your acceptable error
    level.

2.  If x monotonically increases (decreases) to the limit, go until
    EITHER two successive values are the same, OR until a change in 
    direction is observed.

If you know the convergence rate of the sequence, you can do reasonably
well by just counting iterations, especially if you have a good starting
value.  This has the additional virtue that counting iterations doesn't add
much overhead to your loop.
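
For what it's worth, here is a sketch of approach 1 plus an iteration
cap.  The names, the tolerance, and next_delta() are all made up for
illustration; the right tolerance is entirely problem-dependent:

	#include <math.h>

	#define MAXITER	100
	#define TOL	1.0e-6		/* acceptable absolute error */

	extern double next_delta();	/* whatever computes delta from x */

	double
	iterate(x)
	double x;
	{
		double delta;
		int i;

		for (i = 0; i < MAXITER; i++) {
			delta = next_delta(x);
			x += delta;
			if (fabs(delta) < TOL)
				return x;	/* approach 1: |delta| small enough */
		}
		/* the iteration never settled; the caller should treat x
		   as suspect */
		return x;
	}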

>You will be given a test on this at the end of the week.  No grade
>will be given, but, if your answers are incorrect, World War III
>will happen.

Please direct flames to me personally, rather than starting WW III.

-- 

Bill Tyler ... {seismo,hplabs,sun,ihnp4}!amdahl!drivax!tyler

gwyn@brl-smoke.UUCP (04/07/87)

In article <14680@cca.CCA.COM> g-rh@CCA.CCA.COM.UUCP (Richard Harter) writes:
>...  From a mathematicians
>viewpoint 'x + y + z', where x, y, and z are floating point numbers
>and '+' is floating point addition, is not well defined.  ...

People who use floating-point numbers practically always think of them
as modeling the mathematical "real number" system.  The point that we
agree on is that this model is inexact.  There seems to be disagreement
on whether mathematicians think that, in an expression such as "x + y + z",
the "+" symbolizes machine (approximate) addition or real number (exact)
addition.  I'm sure that the latter is what most would think of.  If the
domain has been agreed to be complex numbers, rings, or some other algebra
than the usual real number field arithmetic, then of course the "+" signs
are assumed to have corresponding meaning.  Most times that I see
discussions of machine floating-point arithmetic, some symbol such as a
circled + sign is used instead of "+" to represent the machine operation.

When people say that "mathematicians" use () to indicate the sequential
order in which operations are carried out, they're simply wrong (for most
conventional mathematics as I was taught it through graduate school).
The () indicate a logical grouping; usually there is no time sequence
implied whatsoever (there IS a logical hierarchy imposed by (), but due
to various identities any of a number of computationally distinct
expressions may be exactly equivalent mathematically).  Actually, many
mathematicians may have never given this matter any thought, because it
didn't seem necessary.  I don't argue that evaluation sequence is not
what *Fortran programmers* mean by (), but that's not the same as what
*mathematicians* mean, and I can't allow an attempt to appeal to existing
mathematical practice to be used in this debate when it is mistaken.  Next
we'll be hearing that mathematicians use "=" (or even more laughably, ":=")
to mean assignment.

In any case, X3J11 has provided the means to not specify order of
evaluation when it's unimportant, and to specify order of evaluation
when it is.  The only beef seems to be that we didn't limit C to act
"just like Fortran".  What I haven't heard are good arguments why C
should be made less flexible in this regard.  It runs counter to the
general "spirit of C" to impose such unnecessary constraints without
adding any power thereby.

tps@sdchem.UUCP (04/07/87)

In article <14680@cca.CCA.COM> g-rh@CCA.CCA.COM.UUCP (Richard Harter) writes:

>In article <5716@brl-smoke.ARPA> gwyn@brl.arpa

>>C actually conforms more
>>closely to what a mathematician means by parentheses, which is to
>>group subexpressions in order to override default rules of operator
>>precedence....

>Doug, much as we all respect your knowledge of C and Unix, this
>just doesn't wash.  The essence of the problem is that floating point
>arithmetic is commutative but not associative.  From a mathematicians
>viewpoint 'x + y + z', where x, y, and z are floating point numbers
>and '+' is floating point addition, is not well defined.  I repeat,
>NOT WELL DEFINED.  The use of 'x op y op z' where 'op' is a two place
>operator is ambiguous:  it may either express
>
>	op(op(x,y),z) or op(x,op(y,z))

So what?  Even the operation 'x + y' is not well defined.  As everyone knows,
implementations have differing numbers of bits for mantissa and exponent,
different rounding schemes, and possibly different results if calculations
are done in registers.  Besides which, are you suggesting that when faced
with 'x + y + z' the compiler should warn that the expression is not
defined?  Or that this not even be allowed?

I think that the spirit of C is that the most common constructs tell the
compiler "Do the following in whatever way you want, hopefully the best
way for the machine you are on, given this particular broad outline.  If
the details are important I'll have to specify that separately (and
perhaps more clumsily)."  For instance, "int i;" means 'i' is a word of the
handiest length for the machine.  If I need something that can hold the
difference between any two pointers, I have to say "long i;".

In this spirit, 'x + y' (if x and y are floating point) means "add x and y
together in the way customarily done on this machine".  '(x + y) + z' means
"add x, y and z in whatever way is best for this machine".  '+(x + y) + z'
means "add x and y first".

Whatever is finally decided for ANSI, I hope we do not lose the ability to
say to the compiler "do the following floating point calculation with only the
following broad outline".  Do numerical programmers really want most of their
operations done exactly in the order listed?  My experience has been that
most of the time, you don't.  It would be nice if there were a construct
which had a larger scope than does unary plus.  Some construct such as
	
	respect	expr;

where all the parentheses in "expr" are now "respected".  Or perhaps

	respect type f() { ... }

where parentheses are respected in the entire body of "f".

>If we are given an expression 'x op y op z' and 'op' is not associative
>then either we must use a predefined order of evaluation rule or the
>expression is not well defined.
>
>This is a question of mathematics, and is not, as you say, a question
>of Fortran versus C.  As it happens, the rules for evaluating expressions
>in Fortran are mathematically 'correct' in that they correctly model the
>mathematical properties of floating point numbers (in this instance),
>whereas those of C are not.

>....amount to asserting that for economic
>and historical reasons it is preferable to be living in a certain amount
>of mathematical 'sin' rather than to be mathematically 'pure' after the
>fact.  But please, do not misrepresent the mathematics of the situation.
>It is misleading to assert that the use of parentheses is simply a matter
>of grouping of common subexpressions.

I believe this is bogus because even implementations of Fortran are not
mathematically equivalent.  The question is, how much implementation-detail
do we want to force in the definition of '(x op y) op z'?  No matter how
this is chosen, numerical programs written on different machines can
produce different results.  This is not a question of mathematical sin.

|| Tom Stockfisch, UCSD Chemistry	tps%chem@sdcsvax.ucsd.edu
					or  sdcsvax!sdchem!tps

kent@xanth.UUCP (04/07/87)

I may have started this part of this discussion, and it leaves me a bit in awe
to be caught between representatives of X3J11 and the IEEE Floating Point
Standards committee.  Nevertheless, like Mr. Heinlein, I will fear no evil.
(I did my stint for 4 years on X3H3, so there!)

As promised in the summary, I would like to propose another (to me, excellent)
reason to modify the C standard to require that compilers honor the order of
evaluation which the programmer indicates by parentheses: programmer
productivity.  I'm not really (despite 25 years' experience with them) that
comfortable with floating point numbers, so leave that out of the discussion
for now.

In FORTRAN, since that seems to be the counterexample language used in these
discussions, if I write an expression, and parenthesize it, I have only one
possible evaluation order to debug; that is the one I have written.  If I
write the equivalent expression in C, I have to consider (if I'm working in
a _really_ critical application, model and validate), _every_possible_ order
of evaluation.  Guess which takes longer.  ;-)

I'm sorry to be talking to a crowd of compiler writers, to whom wringing
every last excess cycle out of a piece of generated code is both a matter of
pride and honor, the usual way your work is evaluated, and the normal
expectation of your employers, in such a tone, but it is necessary.  Within
reason, the speed of compiled code is a _very_minor_ cost factor in the
code's life cycle cost, compared to the people costs of creating the code.

We who use your compilers would create our product cheaper and better if you
who design the languages would spend more time worrying about readability,
writeability, intuitiveness, and maintainability, and less time worrying
about the efficiency of execution of the language.  It is not a _minor_
matter when a piece of code doesn't execute the way it reads, and it is not
a _minor_ matter when something like the proposed unary plus override on
parenthesis evaluation order is added to a language.  It is a
utility-destroying set of blunders by the language designers.

I speak as a user of languages.  In a checkered career, I have learned, and
used to create working code, more than 40 languages (I lost track back in
about 1977).  I have lived with such wonders of write-only code as APL, IBM
360 Assembler, PL/1 (yes, before '1' became 'I'), and LISP.  C is still the
single language with which I cannot become comfortable, and it is not just
senility, as I have learned several languages since C.  I write code in C,
it works, and I still get bitten by unexpected results of straightforward
looking computations _every_time_.  I hate it.

Please, when you make language design decisions, think of the poor boob who
has to use the resulting mess, and take the time to do it right, not just
patch the patches.  It is no argument to say that conforming compilers will
have to be revised if parenthesization order must be respected.  The ANSI
standard doesn't exist yet, it is at best a dpANS, until it is voted into
acceptance, so there are _no_ conforming compilers.  Every compiler maker
will have to go back and do massive rewrites to make existing production
compilers into conforming compilers.  What is true is that respecting
parentheses will break no code not already buggy.

Think it over.  I'll try to keep out of the XfXiXgXhXtX discussion for a few
weeks.
--
The Contradictor		Member HUP (Happily Unemployed Programmers)

Back at ODU to learn how to program better (after 25 years!)

UUCP  :  kent@xanth.UUCP   or    ...seismo!decuac!edison!xanth!kent
CSNET :  kent@odu.csnet    ARPA  :  kent@xanth.cs.odu.edu
Voice :  (804) 587-7760    USnail:  P.O. Box 1559, Norfolk, Va 23501-1559

Copyright 1987 Kent Paul Dolan.			How about if we keep the human
All Rights Reserved.				race around long enough to see
Recursive retransmission rights only.		a bit more of the universe?

jimv@omepd.UUCP (04/07/87)

Doug, without intending to beat this subject to death, I just want to
point out how the IEEE standard addresses some of the problems you
describe with floating-point arithmetic.

I don't agree with your statement that the IEEE standard does not
address the problem that most programmers are insufficiently aware of
pitfalls in floating-point arithmetic.  I grant that the IEEE standard
won't make the programmers learn more than they choose to learn, but I
maintain that the default behavior will deliver correct answers more
often, and will also deliver wrong answers less often, than today's
typical non-IEEE implementation.

In general, the purpose of what you call ``extended "values"'' is to
allow the computation to continue in those cases where the programmer
didn't provide for better handling of the exceptional situation.  By
examining the results afterwards, the user can decide whether the
answer makes any sense.  Sometimes these exceptional conditions don't
contaminate the overall computation, and sometimes they do.  The rules
for the results of operations on infinities and NaNs were defined very
carefully to guarantee finite values are returned only when everything
worked right.

In particular, infinities have 2 important uses.  First, a
divide-by-zero event can now return an exact result: infinity.  This is
very useful for computations of quotients of products, or for continued
fraction evaluations.  This is not to say that an infinity is an
absolutely necessary representation to make these sorts of calculations
well defined, but it does make many calculations "just work".  The
standard illustrative exercise is: name the simple refinement to
computing the following function to make it compute the correct value on
machines where division by zero is fatal:

	R(x) := 7 - 3/(x - 2 - 1/(x - 7 + 10/(x - 2 - 2/(x - 3))))

The answer is "obvious" once you see it, but very few people do.  (Hint
1: pick what precision your intermediate expressions are calculated
in.  Hint 2: understand why explicit numbers for the constant values
are required.)
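
Coded directly (this is just a transcription of the formula, not the
refinement the exercise asks for), the function looks like this.  On an
IEEE machine the divisions by zero that occur at the singular arguments
produce infinities that propagate through so the final value still comes
out right, where a machine that traps on zero divide simply aborts:

	double
	R(x)
	double x;
	{
		return 7.0 - 3.0 /
		    (x - 2.0 - 1.0 /
		    (x - 7.0 + 10.0 /
		    (x - 2.0 - 2.0 /
		    (x - 3.0))));
	}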

The other use for infinities is for overflows.  They do make sense
when an overflow occurs, if you look at floating-point arithmetic as
computing the exact result of the operation for its operands and then
rounding that result to fit into the destination format.  With
infinities represented in floating-point numbers, every real number has
a single, correct representation (given a particular rounding rule).
What's the alternative if infinities didn't exist and the programmer
neglected to take overflow into account?  Generally the program is
aborted and any chance of useful data coming out is gone, even if that
particular part of the calculation turned out to not affect the final
result.

And, since none of the default responses are forced, an IEEE
implementation lets the programmer provide explicit actions on exactly
those exceptional conditions that he knows how to handle.  Sounds like
a win-win situation to me.  Honestly, I don't understand why you have
a problem with infinities (or NaNs).



And while I'm responding, here are a few comments on what you
recently wrote.

Regarding simple equality checks against division by zero:
>For the most important example, simply checking that the divisor is non-
>zero does NOT guarantee against overflow in the subsequent division.
>Truly robust floating-point code must be much more carefully crafted
>than that.

Granted that overflow is not prevented by checks for zero, but division
by zero is.  So why do they make this explicit check?  I could come up with
a number of guesses without looking at the code, only one of which
is that they failed to take overflow into account.  For example,
without looking carefully at the code, I can't tell if some portion of
the numbers are guaranteed to be in range and consequently won't
overflow.

I just made a 3 minute scan over the code in Spice3 which does an
explicit equality comparison with 0.0, and I see three categories
of what is being done:
  (1)  efficiency (skipping a whole bunch of calculations where
       a multiplier is zero),
  (2)  avoiding divide-by-zero, and
  (3)  other things that were not immediately obvious.
Some of the code in category (3) looked dubious to me, but other
parts looked pretty safe.  Not knowing exactly what is being
computed, I don't know if all the operations in (3) were safe
or not.  (I wouldn't bet on it. :-))

Regarding:
>A simple list of how floating == has been used (e.g. in Spice) is
>not the same thing as a list of contexts in which its use is safe.

Very true.  It was sloppy of me to suggest that.  However, the list of
examples pulled from Spice3 and elsewhere did serve the purpose of
showing situations where == could very well be the right operation for
the job.

>I've been burned too many times by floating-point computations that
>misbehave when pushed near their extremes.  Unfortunately I can't
>offer a simple recipe for avoiding such problems, other than to
>know what can go wrong and to think carefully when designing the
>algorithm.

I agree that no simple recipe exists to avoid all floating-point
problems.

I also agree that floating-point doesn't behave well when pushed near
its extremes.  So what?  Neither does integer arithmetic.  In both
cases, it's important to try to keep the problem domain well bounded,
or to have explicit rules to handle the boundary cases.  And that's
exactly what the IEEE standard provides for.
--
Jim Valerio	{verdix,intelca!mipos3}!omepd!jimv, jimv@omepd.intel.com

m5d@bobkat.UUCP (04/07/87)

In article <14680@cca.CCA.COM> g-rh@CCA.CCA.COM.UUCP (Richard Harter) writes:
>
>Doug, much as we all respect your knowledge of C and Unix, this
>just doesn't wash.  The essence of the problem is that floating point
>arithmetic is commutative but not associative.  From a mathematician's
>viewpoint 'x + y + z', where x, y, and z are floating point numbers
>and '+' is floating point addition, is not well defined.  I repeat,
>NOT WELL DEFINED.  ...
>
>Richard Harter, SMDS Inc. [Disclaimers not permitted by company policy.]

Not well defined for floating point (machine) arithmetic, but perfectly
well defined for abstract real numbers.  When a math guy writes down an
equation on the blackboard, he thinks in terms of real numbers, not
floating point.  When he decides to incorporate the equation into a 
computer program, he is then faced with the problem of the discrete
nature of all machine arithmetic.  If he is using FORTRAN, he will know
that parentheses have a certain meaning.  He writes code accordingly.
If he is using C, he writes code accordingly, perhaps by breaking 
expressions into lists of simple expressions.  I think this is directly
analogous to the problem of developing a sorting algorithm on paper,
then implementing it: if it is done in LISP, it will be very different
from an implementation in C.
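
A sketch of what "breaking expressions into lists of simple expressions"
looks like in practice (my example):

	/* Spell out the intended order with a temporary rather than relying
	 * on the compiler to honor the grouping of an associative operator. */
	double sum3(double x, double y, double z)
	{
		double t;

		t = x + y;	/* do x + y first */
		return t + z;
	}

(Whether an optimizer is entitled to collapse the temporary back into a
single rearrangeable expression is debated further on in this thread.)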


-- 
Mike McNally, mercifully employed at Digital Lynx ---
    Where Plano Road the Mighty Flood of Forest Lane doth meet,
    And Garland fair, whose perfumed air flows soft about my feet...
uucp: {texsun,killer,infotel}!pollux!bobkat!m5d (214) 238-7474

grodberg@kodak.UUCP (04/07/87)

In article <790@xanth.UUCP> kent@xanth.UUCP (Kent Paul Dolan) writes:
>In FORTRAN, since that seems to be the counterexample language used in these
>discussions, if I write an expression, and parenthesize it, I have only one
>possible evaluation order to debug; that is the one I have written.  If I
>write the equivalent expression in C, I have to consider (if I'm working in
>a _really_ critical application, model and validate), _every_possible_ order
>of evaluation.  Guess which takes longer.  ;-)....
>
>   Within reason, the speed of compiled code is a _very_minor_ cost factor 
>in the code's life cycle cost, compared to the people costs of creating the 
>code....   It is not a _minor_
>matter when a piece of code doesn't execute the way it reads, and it is not
>a _minor_ matter when something like the proposed unary plus override on
>parenthesis evaluation order is added to a language.  It is a utility
>destroying set of blunders by the language designers.


   I must very forcefully agree with Mr Dolan.  I have written a lot of code
which necessarily works at the limit of numerical representation on the
machine I am working on.  I must be able to specify the order of evaluation
of expressions so that intermediate values do not overflow, underflow or
otherwise bomb.  On many compilers, this is what parentheses mean.  I don't
know of any compilers which now support the unary +, and it will be a long
time before code that requires an ANSI-standard compiler is considered
portable code.  To me this means that I will have to add plusses to a lot
of my expressions when shifting to an ANSI compiler.

    I also don't believe that people put parentheses in their code with the
understanding that it won't have any effect on the order of evaluation.  
One of the reasons I put parentheses in my #defines is so that I can be sure
that that will be evaluated before anything around it. 
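
For example (my illustration), a macro of the sort being described:

	#define HALF(x)	((x) / 2.0)

	/* HALF(a + b) expands to ((a + b) / 2.0): the parentheses keep the
	 * argument and the whole expansion correctly grouped against whatever
	 * surrounds the macro call -- a grouping (precedence) guarantee,
	 * whatever the compiler later does about evaluation order. */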

    In other words:
    1) Current compilers don't support unary +, so code that uses it won't
       compile on most (all?) current compilers.
    2) *People* expect parentheses to be honored, and will be confused when
       bugs occur because they are not.
    3) Existing code does not depend on the compiler deciding that the way
       an expression is parenthesized is not the right way to calculate it
       in order to achieve the correct value.
    4) The time saved in evaluation is not worth all these hassles.

     To satisfy the time critical applications, I would recommend either
a compile time command line option such as -Oe (optimize expressions) or
a reserved word #define, such as #define OPT_EXPR (which could also be
specified on the compiler line).  The latter would provide for code
transportability between standard and non-standard compilers, as well as a way
to compile old code on new compilers.

-- 
          Jeremy Grodberg

Usenet: ...rochester!kodak!grodberg
Arpa: 	grodberg@kodak or kodak!grodberg@rochester

g-rh@cca.UUCP (04/08/87)

Ah, the ignominy of it all, being lectured on the basics by an earnest
bright-eyed young lad.  Ah, well, Richard, 'tis your own fault for being
too subtle.

I raised the question of iteration equations and asked about thoughts
about various pseudocode alternatives for testing convergence.  Now, as
we all know, bad things often happen in iteration equations when they
are done in floating point.  When we get near a limit point we tend to
get oscillation in the last few bits with no final convergence.  I
(incorrectly) supposed that I had made it clear that these problems had
already been addressed -- that we had arranged our code so that there
would be a limit point (i.e. convergence is forced) -- so that the only
issue was whether or not we had reached the limit point.  To this end
I included the following paragraph:

"Typically we will find that the calculation of the delta in question
becomes numerically unstable when x is sufficiently close to x0.
Let us suppose that we have addressed this problem, so that we
are simply left with the convergence issue."

With this in mind, I posted some pseudocode for dealing
with the actual detection of convergence.  William Tyler commented
on this pseudocode by noting that it would break down if the iteration
(when implemented in floating point) does not converge and said that
was unacceptable for that reason.

Well, he's right.  If we haven't minded our p's and q's, that's exactly
what does happen.  However, if we have done our job the calculation will
have the property that there will be an interval such that, for all x
in the interval, the iteration will converge.

I am assuming that we have done this.  The issue I want to deal with
is simpler.  Given that we have arranged things so that convergence
is forced, i.e. so that we are guaranteed to reach a point such that,
for some n,

	x[n+1] = x[n] + delta[n] = x[n],

what is the proper way to test for this, if the calculations are
being done in floating point?  For example, is it not conceivable
that the following two pieces of pseudocode will yield different
results:

(1)	if ((x+delta)==x) then done = TRUE
	else                   done = FALSE

(2)	temp = x + delta
	if (temp==x) then done = TRUE
	else              done = FALSE

In (1) 'x+delta' might be stored in registers with extension bits
whereas x, also in a register, has no extension bits.  But is (2)
safe?  If not (and why not) then what is?  Again, I am assuming that
we have arranged things so that the iteration will actually converge.
This may seem picky, but it is inattention to picky little details
that gets your ass in a sling.
-- 

Richard Harter, SMDS Inc. [Disclaimers not permitted by company policy.]

rbbb@rice.EDU (04/08/87)

   We who use your compilers would create our product cheaper and better
   if you who design the languages would spend more time worrying about
   readability, writeability, intuitiveness, and maintainability, and less
   time worrying about the efficiency of execution of the language.  It is
   not a _minor_ matter when a piece of code doesn't execute the way it
   reads, and it is not a _minor_ matter when something like the proposed
   unary plus override on parenthesis evaluation order is added to a
   language.  It is a utility destroying set of blunders by the language
   designers.

Agreed.  Respecting parentheses will not BREAK any existing programs,
though they may run more slowly.  (People have been harping on and on
about taking potential optimizations away from the compiler; are there
really any C compilers that are that gung-ho about reordering
expressions?)

Suppose instead that some other form of parentheses (what, I don't know) be
designated as the optional parentheses.  These could be used in macro
definitions, thus taking care of the argument about "macros introducing
tons of parentheses" while leaving ordinary code about as readable as it
ever was.  Of course, this will make expanded macros less readable, but
this is not a day-to-day concern.

Note that this approach requires more effort to make code go fast.  The
other approach (unary + to indicate order of evaluation) requires more
effort to make code robust.  Guess which one I favor.

Before you pooh-pooh the "+() is more effort than ()", consider what is
usual in other uses of parentheses, and consider the arguments put forth
in K&R (p.17) for "="/"==" instead of ":="/"=".

Another approach is to follow the current approach strictly.  If it
happens that the operators are not commutative and associative, then
respect the parentheses.  If this is truly the case, then I have no
problem with the current approach.  Of course, floating point arithmetic
is not associative, and machine arithmetic is not associative in general
in the presence of overflow exceptions.  Too bad--I guess parentheses must
be respected after all.
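
One concrete way to see the overflow point for integer arithmetic (my
example; it assumes a 16-bit int purely for the sake of the numbers):

	void overflow_groupings(void)
	{
		int a = 30000, b = 30000, c = -30000;
		int left, right;

		left  = (a + b) + c;	/* intermediate 60000 overflows 16-bit int */
		right = a + (b + c);	/* intermediate 0; result 30000, no overflow */
	}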

David

dlnash@ut-ngp.UUCP (04/08/87)

In article <788@kodak.UUCP>, grodberg@kodak.UUCP (jeremy grodberg) writes:
>      To satisfy the time critical applications, I would recommend either
> a compile time command line option such as -Oe (optimize expressions) or
> a reserved word #define, such as #define OPT_EXPR (which could also be
> specified on the compiler line).  The latter would provide for code
> transportability between standard and non-standard compilers, as well as a way
> to compile old code on new compilers.
> 

I like the -Oe idea better than the #define OPT_EXPR idea.  Making a #define
directive affect the compiler is not a good idea, since someone else may be
using OPT_EXPR (or almost whatever else you come up with) to mean something
else.  The -Oe idea makes the most sense to me.  Existing code doesn't break,
and if you need specific evaluation order, you don't need to add stuff to your
code (unary '+') to get it.  Of course, since this is the logical solution,
there is no way it will ever get implemented. :-)

				Don Nash

UUCP:    ...!{ihnp4, allegra, seismo!ut-sally}!ut-ngp!dlnash
ARPA:    dlnash@ngp.UTEXAS.EDU
BITNET:	 CCEU001@UTADNX, DLNASH@UTADNX
TEXNET:  UTADNX::CCEU001, UTADNX::DLNASH

john@viper.UUCP (04/08/87)

In article <788@kodak.UUCP> grodberg@kodak.UUCP (Jeremy Grodberg) writes:
 >
 >    I also don't believe that people put parentheses in their code with the
 >understanding that it won't have any effect on the order of evaluation.  
 >One of the reasons I put parentheses in my #defines is so that I can be sure
 >that that will be evaluated before anything around it. 
 >

Please add my name to the growing list of people who object to making
reordering of parenthesized expressions a default standard!!!  If the
primary reason for maintaining this is the pre-existence of compilers
which do this, that is an unacceptable excuse.  The standard will
already require major changes in many compilers.  The pre-existence
of a bad idea is no excuse for maintaining it...

 >    2) *People* expect parentheses to be honored, and will be confused when
 >       bugs occur because they are not.

Exactly!  If I didn't have a reason for wanting to specify an exact order,
I wouldn't use them...  Not honoring parentheses is an inconsistency in
an otherwise what-you-write-is-what-you-get language.

One of the primary features I like about C is that it does what I tell it
to do without placing unnecessary limitations on the constructs I create.
It's not the responsibility of the designers of a compiler to decide I
mean one thing when I explicitly tell the damn fool thing I mean something
else!  That's one of the things I hate about many other languages and,
in most cases love about C.  I tend to avoid languages where the designers
of the language thought they knew more about what "should" be possible 
than I do...  Please don't standardize something that will forever
force C into the same category!!!

 >
 >     To satisfy the time critical applications, I would recommend either
 >a compile time command line option such as -Oe (optimize expressions) or
 >a reserved word #define, such as #define OPT_EXPR (which could also be
 >specified on the compiler line).  The latter would provide for code
 >transportability between standard and non-standard compilers, as well as a way
 >to compile old code on new compilers.
 >

An EXCELLENT idea!  Place the "optimization" options where they will do
the most good and the least harm.  Placing expression optimization in
the category of "options" will cause fewer pieces of code to break
when ported (if they don't rely on optimization) and will, correctly,
classify code compiled with an "optimize expressions" flag as code which
may have problems if you don't use the exact same optimizations.



(SUMMARY:  PLEASE READ THE FOLLOWING!)

  The standards committee has as one of its obligations to define rules
that will allow writing code which will work on the largest number of
compilers possible with little or no change.  Allowing the compiler
to, at its option, totally ignore the EXPLICIT wishes of the programmer
is one of the easiest ways I can think of to produce a "standard" which
unnecessarily frustrates programmers, will actually _Cause_ portability
problems, and is, in fact, not a functional "Standard" at all.

--- 
John Stanley (john@viper.UUCP)
Software Consultant - DynaSoft Systems
UUCP: ...{amdahl,ihnp4,rutgers}!{meccts,dayton}!viper!john

firth@sei.cmu.edu.UUCP (04/08/87)

In article <14813@cca.CCA.COM> g-rh@CCA.UUCP (Richard Harter) writes:
(brief excerpt)
>... Given that we have arranged things so that convergence
>is forced, i.e. so that we are guaranteed to reach a point such that,
>for some n,
>
>	x[n+1] = x[n] + delta[n] = x[n],
>
>what is the proper way to test for this, if the calculations are
>being done in floating point?

I would suggest

	abs(delta[n]/x[n]) <= eps

(omit the abs if feasible).  This gives the compiler least chance
to screw you.  If you have some way to relate 'eps' to the machine
precision, even better.
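
Spelled out in C, with eps tied to the machine precision through the
dpANS <float.h> (my sketch; the factor of 4 and the names are arbitrary):

	#include <math.h>
	#include <float.h>	/* DBL_EPSILON, from the draft standard */

	/* Relative convergence test: stop when the correction is a negligible
	 * fraction of the current iterate.  Assumes x is not itself zero. */
	int converged(double x, double delta)
	{
		return fabs(delta / x) <= 4.0 * DBL_EPSILON;
	}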

mwm@eris.BERKELEY.EDU (Mike (My watch has windows) Meyer) (04/09/87)

AAAAAARRRRRGGGHHHH!!!!

ENOUGH!

C started life as a language for building systems tools, not for doing
number mangling. I've worked with C compilers that didn't have floats,
and never noticed. About the only time I've ever used a float in C is
for examples & exercises, _never_ for a useful tool.

Floats weren't handled very well in early C compilers.  If the PDP-11
hadn't had a floating type, C might not have had floats at all.  In
spite of the problems with floats in C, people tried doing number
crunching in it.  The ANSI standard makes some concessions to people
foolish enough to want to do number crunching in C. Now they start
shouting that it's not enough, and ask for more, and start asking for
a "more readable" language.

If you find C inadequate as it is, go find another language! Better
yet, design your own, write a good compiler for it, put that in the
public domain (or at least make it freely redistributable), then write
a good textbook on how to use that language for fooling around with
floating point numbers. Not only will you have what you want, you'll
have done the world a favor, and probably have become famous.

But let the C community have a language that does what it was designed
to do as well as possible: systems-level programming. Sure, you may
think a few extra instructions in an expression is minor, but when
that expression is in the inner search loop of vi, it will chew up a
lot of machine time.

Final comment: from the looks of ANSI C, an attempt was made to add only
features that wouldn't require complete compiler rewrites, just changes
to the parser or tweaks to the optimizer (I don't do enough compiler
work to be sure, though). Making parens "specify order of
evaluation" instead of "override operator precedence" would require
major work on the one C compiler I'm familiar with. It throws out
parens in the parse as it builds the tree, and then manipulates that.
You'd need a new tree node, and....

	<mike
--
Here's a song about absolutely nothing.			Mike Meyer        
It's not about me, not about anyone else,		ucbvax!mwm        
Not about love, not about being young.			mwm@berkeley.edu  
Not about anything else, either.			mwm@ucbjade.BITNET

bader#@andrew.cmu.edu.UUCP (04/09/87)

Since no one using C these days depends on parentheses forcing order
of evaluation, and lots of people use parenthesis-loving macros, why
not introduce another set of parentheses that *does* force order of
evaluation?  That way, order-of-evaluation weenies (:-) will have
their way and their code won't look like C (since they obviously want
to be programming in fortran), and no-one else will know the
difference (and existing code will still run as fast).

I suggest for the new parentheses:  @ and $ (BUT OH MY GOD!  MAYBE
THAT WOULD BREAK VMS SYSTEMS!!!  :-O).  So the expressions (as
recently posted):
    x = ((one + x * a) - one) * 16.0;
    y = x / ((half + x * half) *((half - x) + half));
    x = (x + eight) - eight;
    while (((b+one)-b)-one == zero);
    if ((a+betam1)-a != zero)
    if ((one - a) - one != zero)
    y2 = (y/two + y) - y;
    w = zz * xsq/(den*(den-one));
    w = (x - zz) + w;
    w = ((half-zz) + half) * ((half + zz) + half);
Would look like:
    x = @@one + x * a$ - one$ * 16.0;
    y = x / @@half + x * half$ *@@half - x$ + half$$;
    x = @x + eight$ - eight;
    while (@@b+one$-b$-one == zero);
    if (@a+betam1$-a != zero)
    if (@one - a$ - one != zero)
    y2 = @y/two + y$ - y;
    w = zz * xsq/@den*@den-one$$;
    w = @x - zz$ + w;
    w = @@half-zz$ + half$ * @@half + zz$ + half$;

Sorry for the tone of this message, but I'm getting sick of hearing
the cries of people who want to break the language to suit a
community which is a small fraction of the whole.  This is C, people,
like it or not.  If you want another language, use another language.

					-Miles

gwyn@brl-smoke.ARPA (Doug Gwyn ) (04/09/87)

In article <790@xanth.UUCP> kent@xanth.UUCP (Kent Paul Dolan) writes:
>I'm sorry to be talking to a crowd of compiler writers, ...  Within
>reason, the speed of compiled code is a _very_minor_ cost factor in the
>code's life cycle cost, compared to the people costs of creating the code.

Reminder:  My opinions are not official X3J11 positions.

I agree with many of Mr. Dolan's points.  Undoubtedly X3J11 meetings are
predominately attended by compiler implementors (several of them
specializing in the IBM PC market, alas), which occasionally biases if
not the decisions then certainly the discussions.  It should be noted that
virtually all C compiler implementors are also C users, and there are a
few X3J11 members, such as myself, who represent the C user (as opposed
to implementor) viewpoint.

>... it is not
>a _minor_ matter when something like the proposed unary plus override on
>parenthesis evaluation order is added to a language.  It is a utility
>destroying set of blunders by the language designers.

I think there is a misperception that should be cleared up.  Certainly
it is true that even a nominally "small" change to a language can have
major practical repercussions.  However, the dpANS for C has not changed
the significance of parentheses; C compilers have always been allowed to
reorder theoretically-commutative and -associative operations.  The new
unary plus operator provides the control over order of evaluation that
many numeric programmers were requesting.  Since unary plus is a new
invention for C, it does not harm
existing code and has no effect unless used (when it has the desired
order-of-evaluation-forcing effect).
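
By way of illustration, here is how the proposed operator would be used,
as I read the description above (my sketch, not committee text):

	double a, b, c, x;

	void unary_plus_demo(void)
	{
		/* With the proposed unary +, the parenthesized sum is to be
		 * evaluated as written; the compiler may not regroup it. */
		x = +(a + b) + c;

		/* Without it, the nominally associative + may be regrouped,
		 * parentheses or no parentheses. */
		x = (a + b) + c;
	}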

I don't take responsibility for these X3J11 committee decisions, but I
really do think X3J11 got them right.

I'm sorry that Mr. Dolan finds C so much harder to use effectively than
other languages.  Now that I am used to C, I find it excruciating when
I have to deal with applications written in Fortran (for example).  It
would seem that programming languages, like text editors, establish a
mind-set in their users that acts as a "definition" for what such
facilities should be like.  I'm pretty happy with C for application
development; the few significant deficiencies that I see tend not to be
the points that people in this newsgroup argue about.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (04/09/87)

In article <788@kodak.UUCP> grodberg@kodak.UUCP (Jeremy Grodberg) writes:
>One of the reasons I put parentheses in my #defines is so that I can be sure
>that that will be evaluated before anything around it. 

Then you're making a mistake.  Possibly your particular compiler has these
additional semantics for (), but the C language does not and never has.

What () DO accomplish is to force the syntactic parsing to deal with the
insides of the () before the outside; this can be used to override default
operator precedence and associativity, which SOMETIMES looks like forcing
order of evaluation, but really they're two different things.
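
A small example of the distinction (mine, not Gwyn's):

	double a, b, c, x, y;

	void paren_demo(void)
	{
		/* Here the parentheses change the parse -- without them the
		 * multiplication would bind tighter -- so they clearly matter. */
		x = (a + b) * c;

		/* Here the parse is unchanged; the parentheses merely restate
		 * the grouping of an associative operator, and the language as
		 * defined (K&R and the dpANS alike) lets the compiler compute
		 * a + (b + c) instead. */
		y = (a + b) + c;
	}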

edw@ius2.cs.cmu.edu.UUCP (04/09/87)

In article <kUSjqey00WABcJA0Nr@andrew.cmu.edu>, bader#@andrew.cmu.edu (Miles Bader) writes:
> evaluation.  That way, order-of-evaluation weenies (:-) will have
> their way and their code won't look like C (since they obviously want
> to be programming in fortran), and no-one else will know the
> difference (and existing code will still run as fast).

> community which is a small fraction of the whole.  This is C, people,
> like it or not.  If you want another language, use another language.
> 
> 					-Miles

    Agree totally!!



#define X			0
#define Y			1
#define _EPSILON		.0002
#define sqr(x)			((x)*(x))
#define distance(x1,x2,y1,y2)	(sqrt((double)(sqr(x1-x2) + sqr(y1-y2))))
#define approx_equal(x,y)	(fabs((double)(x - y)) < _EPSILON)
#define same_point(x1,x2,y1,y2) (approx_equal(distance(x1,x2,y1,y2),0))

.
.
.

if (same_point(p1[X],0.0,p1[Y],0.0)) ....

  I trust that my C compiler is going to optimize the hell out of this
expression.  Is this a good enough reason to allow reordering?

-- 
					Eddie Wyatt

They say there are strangers, who threaten us
In our immigrants and infidels
They say there is strangeness, too dangerous
In our theatres and bookstore shelves
Those who know what's best for us-
Must rise and save us from ourselves

Quick to judge ... Quick to anger ... Slow to understand...
Ignorance and prejudice and fear [all] Walk hand in hand.
					- RUSH 

m5d@bobkat.UUCP (04/09/87)

In article <788@kodak.UUCP> grodberg@kodak.UUCP (Jeremy Grodberg) writes:
>	[ ... ]
>    2) *People* expect parentheses to be honored, and will be confused when
>       bugs occur because they are not.
>          Jeremy Grodberg

I don't expect parentheses to be honored (in determination of execution order).
Does that mean I'm not a *People*?   Hmm, my friends always thought I was
odd.

Seriously, I cannot think of a SINGLE case in all my life when I suffered
from expression rearrangement.  The same goes for all the programmers 
in my office (4 others).  Do we just write different kinds of programs?
Note that I'm not doubting these parentheses horror stories; I just
think it's fascinating that I've never had one.  Oh well.

-- 
Mike McNally, mercifully employed at Digital Lynx ---
    Where Plano Road the Mighty Flood of Forest Lane doth meet,
    And Garland fair, whose perfumed air flows soft about my feet...
uucp: {texsun,killer,infotel}!pollux!bobkat!m5d (214) 238-7474

firth@sei.cmu.edu.UUCP (04/09/87)

In article <1094@ius2.cs.cmu.edu> edw@ius2.cs.cmu.edu (Eddie Wyatt) writes:

>#define X			0
>#define Y			1
>#define _EPSILON		.0002
>#define sqr(x)			((x)*(x))
>#define distance(x1,x2,y1,y2)	(sqrt((double)(sqr(x1-x2) + sqr(y1-y2))))
>#define approx_equal(x,y)	(fabs((double)(x - y)) < _EPSILON)
>#define same_point(x1,x2,y1,y2) (approx_equal(distance(x1,x2,y1,y2),0))
>
>.
>.
>.
>
>if (same_point(p1[X],0.0,p1[Y],0.0)) ....
>
>  I trust that my C compiler is going to optimize the hell out of this
>expression.  Is this a good enough reason to allow reordering?

Maybe I'm missing something obvious, but, no, I DON'T see why this
is reason to allow reordering.  Doesn't it expand to:

	((fabs((double)((sqrt(((p1[X]-0.0)*(p1[X]-0.0)) + \
	((p1[Y]-0.0)*(p1[Y]-0.0)))) - 0.0)) < 0.0002))

We can then elide "-0.0", replace "p1[X]*p1[X]" by "p1[X]*ibid",
and remove the redundant coercions and parentheses.  But I can't
see any scope for serious reordering.
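
After those simplifications the whole test amounts to roughly (my
rendering of the above, not a compiler's output):

	(fabs(sqrt(p1[X]*p1[X] + p1[Y]*p1[Y])) < 0.0002)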

braner@batcomputer.UUCP (04/10/87)

[]

My two cents worth:  How about making parens sacred (in the order-of-
evaluation sense) for FP ops, but leaving it as it is for integer ops?
Thus you allow numerical considerations in scientific programming while
also allowing compiler optimizations in other kinds of code.  Now, I
expect purists to jump on me and complain that this reduces the consistency
or orthogonality or whatever of the language.  But is it worse than the '%'
(modulo) op already being for integers only (not to mention '&', etc)?
And for me, utility comes before aesthetics.  The FP programmer is always
aware of FP numbers behaving in a special way (or should be...).  Let
the compiler writer work a bit harder here too!

- Moshe Braner

guy@gorodish.UUCP (04/10/87)

>   Don, you're correct in saying that OPT_EXPR -could- have been used by
> someone prior to the standard, but in using that as a disqualifier you're
> ignoring the difference in scope. The -Oe declaration will affect the
> entire module being compiled with that flag.  Using a define allows you
> to "define" it at some point in the program and then undefine it later.

This translates as "using #defines may break existing code, but it's
more convenient".  Sorry, but I just don't see how the difference in
scope is relevant *at all*.  The fact that OPT_EXPR breaks existing
code is sufficient cause to reject it out of hand unless you can
demonstrate that the benefits that it brings are very great.

The use of "could" in your statement doesn't affect the validity of
the arguments; one could easily turn the argument around - "It's
correct to say that somebody *could* have written code that's broken
by this extra optimization, but by using that as a disqualifier
you're ignoring the fact that OPT_EXPR is a perfectly legitimate C
identifier."

>   Anyone know if the standard allows for control of compiler flags from
> within a source file??

The standard allows a compiler to implement "#pragma" directives that
can affect the compiler's behavior.  No, there's no guaranteed
standard way of saying "only do these optimizations", but nobody ever
said life was fair.  A few strategically-placed "#ifdef"s never hurt
anybody.

dlnash@ut-ngp.UUCP (04/10/87)

In article <802@viper.UUCP>, john@viper.UUCP (John Stanley) writes:
> In article <4968@ut-ngp.UUCP> dlnash@ut-ngp.UUCP (Donald L. Nash) writes:
>  >I like the -Oe idea better than the #define OPT_EXPR idea.  Making a #define
>  >directive affect the compiler is not a good idea, since someone else may be
>  >using OPT_EXPR (or almost whatever else you come up with) to mean something
>  >else.  The -Oe idea makes the most sense to me.
>  >
>  >				Don Nash
>  >
> 
>   Don, you're correct in saying that OPT_EXPR -could- have been used by
> someone prior to the standard, but in using that as a disqualifier you're
> ignoring the difference in scope.  The -Oe declaration will affect the
> entire module being compiled with that flag.  Using a define allows you
> to "define" it at some point in the program and then undefine it later.
> 
> --- 
> John Stanley (john@viper.UUCP)


You're right.  I realized that after I posted the article.  I thought
about how this could be done, and after looking through H&S, 2nd
edition, I came upon a possible answer.  The Draft Proposed ANSI C has a
new pre-processor directive, #pragma, for sending implementation
dependent stuff to the compiler.  How about something like "#pragma
optexp" to turn expression optimizing on and "#pragma nooptexp" to turn
it off?  That way, you could bracket parts of your code which should or
should not be optimized.  For compatibility with current implementations,
optexp could be the default. 
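
Something along these lines, say (the pragma names are just the ones
suggested above, not anything standard, and a compiler that doesn't
recognize them should simply ignore them):

	double a, b, c, p, q, r, y, z;

	void critical_section(void)
	{
	#pragma nooptexp	/* keep the groupings below exactly as written */
		y = (a + b) + c;
		z = (p * q) / r;
	#pragma optexp		/* back to letting the compiler rearrange */
	}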

BTW, H&S, 2nd edition is great!  I recommend it to anyone who needs a complete
C reference manual.

				Don Nash,
                                Recovering from foot-in-mouth disease....

UUCP:    ...!{ihnp4, allegra, seismo!ut-sally}!ut-ngp!dlnash
ARPA:    dlnash@ngp.UTEXAS.EDU
BITNET:	 CCEU001@UTADNX, DLNASH@UTADNX
TEXNET:  UTADNX::CCEU001, UTADNX::DLNASH

dsill@NSWC-OAS.arpa (04/10/87)

Guy Harris wrote:
>The standard allows a compiler to implement "#pragma" directives that
>can affect the compiler's behavior.

It sounds like this is just what we need.

>No, there's no guaranteed standard way of saying "only do these
>optimizations", but nobody ever said life was fair.

Surely it's not too late to devise a standard way of doing this...

>A few strategically-placed "#ifdef"s never hurt anybody.

I'm not sure I know what you mean by this.  Could you give me an example?

-Dave Sill
 dsill@nswc-oas.arpa

djfiander@watnot.UUCP (04/10/87)

In article <742@instable.UUCP> chaim@instable.UUCP (Chaim Bendelac) writes:
>First, for us, the next two sequences are of course THE SAME: 
>
>	(1)	x = (a+b)+c;
>
>	(2)	temp = a+b;
>		x = temp+c;
>
>The optimization technique which reduces (2) into (1) is called copy (or value)
>propagation. The value of 'temp' is propagated. So, if you are worried about
>the correct evaluation of (1), then writing it as (2) will not save you.

Bad news:  K&R state that if the user is really worried about order
of evaluation, then she should use something like code fragment (2).
This, rather obviously, blows any compiler which does what you describe
out of the water.

-- 
"Are you police officers?"
	"No ma'am, we're musicians."

UUCP  : {allegra,ihnp4,decvax,utzoo,clyde}!watmath!watnot!djfiander
CSNET : djfiander%watnot@waterloo.CSNET

djfiander@watnot.UUCP (04/10/87)

In article <802@viper.UUCP> john@viper.UUCP (John Stanley) writes:
>  Anyone know if the standard allows for control of compiler flags from
>within a source file??

Yes, the standard calls them 'pragmas'.

-- 
"Are you police officers?"
	"No ma'am, we're musicians."

UUCP  : {allegra,ihnp4,decvax,utzoo,clyde}!watmath!watnot!djfiander
CSNET : djfiander%watnot@waterloo.CSNET

jpn@teddy.UUCP (04/10/87)

|||      To satisfy the time critical applications, I would recommend either
||| a compile time command line option such as -Oe (optimize expressions) or
||| a reserved word #define, such as #define OPT_EXPR (which could also be
||| specified on the compiler line).
||
|| Making a #define directive affect the compiler is not a good idea ...
|
|The -Oe declaration will affect the entire module being compiled with that
|flag.  Using a define allows you to "define" it at some point in the program
|and then undefine it later.

Isn't this what #pragma is for?  Use #pragma optimize_expressions and
#pragma dont_reorder_expressions.  Microsoft C 4.0 has something like this
for stack probes.  Is this a mis-use of the #pragma directive?

bright@dataio.UUCP (04/10/87)

In article <802@viper.UUCP> john@viper.UUCP (John Stanley) writes:
-  Anyone know if the standard allows for control of compiler flags from
-within a source file??

#pragma

dik@mcvax.UUCP (04/12/87)

In article <26916@rochester.ARPA> crowl@rochester.UUCP (Lawrence Crowl) writes:
 > In article <852@bobkat.UUCP> m5d@bobkat.UUCP (Mike McNally (Man Insane)) writes:
 > >Seriously, I cannot think of a SINGLE case in all my life when I suffered
 > >from expression rearrangement.  The same goes for all the programmers 
 > >in my office (4 others).  Do we just write different kinds of programs?
Do you use floating point?
 > >Note that I'm not doubting these parentheses horror stories; I just
 > >think it's fascinating that I've never had one.  Oh well.
 > 
 > Well, I cannot say I have been bit, ...
Well, I have been bit.  Strange though, it was in Fortran.  (A - B) - B
compiled as A - (B + B).  However, I was lucky because A - B - B compiled
as (A - B) - B.
(Optimization?  No damn optimization, we just rearrange your code
and see what happens!)
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

manis@ubc-cs.UUCP (04/12/87)

In article <852@bobkat.UUCP> m5d@bobkat.UUCP (Mike McNally (Man Insane))
writes:

>Seriously, I cannot think of a SINGLE case in all my life when I suffered
>from expression rearrangement.  The same goes for all the programmers 
>in my office (4 others).  Do we just write different kinds of programs?
>Note that I'm not doubting these parentheses horror stories; I just
>think it's fascinating that I've never had one.  Oh well.

There are two reasons why one would care: ill-conditioned computations
(which can occur in either integer or floating-point operations), and access
to variables whose contents can be asynchronously changed. The second case
is handled by the 'volatile' specifier, and therefore need not be pursued.

The first case occurs of course when somebody wants to write an expression
and to impute significance to the order of operations (thus converting an
algebraic expression into a programming language construct). C of course has
a number of sequential operators, including && and ?: ; however, there's no
reason to make dyadic '+' a sequential operator, just so that a small number
of programmers don't have to worry about compilers generating code that was
too good. I think that the monadic '+' operator, coupled with the ability
to control evaluation by introducing temporary variables, does an excellent
job of allowing programmers who need sequential evaluation to specify it, in
a compiler-independent way, while letting the rest of us have better code.

Like Mike, I'd probably never ask a compiler to stop optimising (well, I
used to ask the IBM PL/I Optimising Compiler not to, but that was because
when you turned on the code improver, the generated code became not only
buggy but also sometimes slower!). 

-----
Vincent Manis                {seismo,uw-beaver}!ubc-vision!ubc-cs!manis
Dept. of Computer Science    manis@cs.ubc.cdn
Univ. of British Columbia    manis%ubc.csnet@csnet-relay.arpa  
Vancouver, B.C. V6T 1W5      manis@ubc.csnet
(604) 228-6770 or 228-3061

"BASIC is the Computer Science equivalent of 'Scientific Creationism'."

mouse@mcgill-vision.UUCP (der Mouse) (04/12/87)

In article <1283@dataio.Data-IO.COM>, bright@dataio.Data-IO.COM (Walter Bright) writes:
> How about allowing numbers to be specified in binary radix, as in
> 0b00110101 ? [...] It seems odd that decimal, octal, hex, floating
> and ascii formats are supported, but not binary in a language
> supposedly designed for bit twiddling!

Good idea.  I've often wished for this.

> As a corollary, support %b in printf and scanf.

This will get dreadfully confusing.  At least in BSD UNIX, there is
already a %b in the kernel's version of printf; it is designed for
printing device register bits.  For example, this (hypothetical) error
message

hw3c: hard error sn12345: csr=402<IE,ERR> ers=1002<CRC,HARD>

might have been produced by (yes, I know it would really call
harderr(), be quiet, this is an example!) the following printf() call:

printf( "hw%d%c: hard error sn%d: csr=%b ers=%b\n",
	unit >> 3,
	"abcdefgh"[unit&7],
	blkno,
	hwdev->csr, "\10\2IE\3GO\5READY\11ERR",
	hwdev->ers, "\10\1SOFT\2HARD\4TIMEO\12CRC\13NODATA" );

> Also, support specifying floating numbers in hex radix. This would
> avoid horrible kludges like:

> [horrible kludge, using pointer punning]

> The syntax would be:
> 		double def = 0x1234.5E678e+0x12;
> 		double abc = -0x.1B4F;
> def prescribes a mantissa of 0x12345E678 and an exponent of
> (4*16+0x12).

> Note that a + or - would be required before the exponent, to
> distinguish the e from the digit e.

Not good enough.  "double def = 0x1234e+0x12" can be read as either
"mantissa 1234, exponent 4*16+0x12"
or
"value 0x12360" (sum of 0x1234e and 0x12).

> Such syntax would be very useful to those of us writing numerical
> subroutines. Numerical analysis books frequently give the constants
> to use specified in hex or octal, so as to control the resulting
> values exactly.

And if you are on a BCD machine, rather than a binary machine?  Or,
though I doubt any such actually exist, a ternary machine?

I guess there just aren't any good numerical algorithms for BCD or
ternary machines....:-)

					der Mouse

				(mouse@mcgill-vision.uucp)

franka@mmintl.UUCP (04/15/87)

In article <14681@cca.CCA.COM> g-rh@CCA.CCA.COM.UUCP (Richard Harter) writes:
>We have a iterative process which generates a floating point number,
>x, with the following iteration equation
>
>	x <- x + delta
>
>where delta is a function of x and where it is expected that the
>iteration converges to some limit point x0.
>
>The iteration will have converged if the iteration equation does
>not change the value of x; i.e. we will have found the limit point
>of the sequence, given that we are using floating point arithmetic
>of the machine in question.  The question then is, how do we test
>for this condition?  Is the following pseudo code acceptable?
>
>	if ((x+delta)==x) then terminate_iteration
>	else continue_iteration
>
>If not, why not?  Alternatively should one use
>
>	if (delta>0.) then
>		if ((x+delta)<=x) then terminate_iteration
>		else continue_iteration
>	else if (delta<0.) then
>		if ((x+delta)>=x) then terminate_iteration
>		else continue_iteration
>	else terminate_iteration
>
>or is this also unacceptable?

As far as I can tell, based on comments made by various people, there are
machines and C compilers in existence where neither of these will work.  In
either case, x+delta may be in a register with more precision than x.

I assert that the first test should work.  When the test is for equality,
the compiler should do the extra work to ensure that there are no extra bits
floating around.  In particular, if x and y are expressions of the same
type, then x == y should be true if assignment of x and y to any variable of
that type would give the same result. There may still be cases where x<=y is
false, while x==y is true, which is ugly and potentially dangerous; but one
can't have everything.

(Actually, based on the above, the first example is still not guaranteed.
What should work is:

	y = x + delta;
	if (x == y) terminate the iteration;
	x = y;

If we compare x directly to x+delta, we will have to write the expression
(x+delta) again to update x; and the compiler may redo the computation and
get a different result.)
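
Put together as a loop, that suggestion reads something like this (my
sketch; next_delta() is a stand-in for whatever computes the correction):

	extern double next_delta(double);	/* hypothetical */

	/* The store into y is meant to discard any extra register precision
	 * before the comparison; whether a given compiler actually does that
	 * is exactly what is being argued about here. */
	double iterate(double x)
	{
		double y;

		for (;;) {
			y = x + next_delta(x);
			if (x == y)
				break;		/* adding delta no longer changes x */
			x = y;
		}
		return x;
	}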

This sort of thing is one of the reasons for, as somebody suggested, writing
the floating point libraries in assembler.

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

tps@sdchem.UUCP (04/15/87)

In article <26916@rochester.ARPA> crowl@rochester.UUCP (Lawrence Crowl) writes:
>>Seriously, I cannot think of a SINGLE case in all my life when I suffered
>>from expression rearrangement...
>>--(Mike McNally (Man Insane))

>The physical constant h_bar is very
>close to zero and a number of physical equations involve h_bar squared.  This
>value was not representable in VAX floating point at the time.  I had to
>carefully arrange my expressions so that all intermediate values were in
>range.  For example, (foo / h_bar) / h_bar works, but the faster and
>mathematically equivalent expression foo / (h_bar * h_bar) does not work.
>  Lawrence Crowl		716-275-5766	University of Rochester

So you should have measured "foo" in units of h_bar (or h_bar squared?);
then you don't even need the division.  I would find keeping track of
what might overflow in such a situation a horrible headache.

	(foo / h_bar) / h_bar

might just work, but then something else a couple of expressions later
might bomb out.  Also, you need comments all along the way to explain
your non-mathematically (as opposed to numerically) arranged expression.




|| Tom Stockfisch, UCSD Chemistry	tps%chem@sdcsvax.ucsd.edu
					or  sdcsvax!sdchem!tps

lambert@mcvax.UUCP (04/16/87)

>> For example, (foo / h_bar) / h_bar works, but the faster and
>> mathematically equivalent expression foo / (h_bar * h_bar) does not work.
> 
> So you should have measured "foo" in units of h_bar ...

Measures indicating h_bar units are hard to come by*. Of course you want
to do the computations in such units, but then you have to convert the
measurements first.
________
*I believe they keep a platinum h_bar in the Paris Bureau des Poids et
Mesures, but I understand they are loath to lend it out.

Can we stop this discussion now? (Summary: From the point of view of people
who want to do serious numerical work it was a mistake that fp operations
were handled as if they were true to the algebraic properties of their
mathematical abstractions, which is good enough for most of the people all
of the time, and for all of the people (numerical analysts are not people)
most of the time, but not ... . But don't forget that C was really designed
for systems programming and such. There was already a work-around by using
assignments to temporaries, which was really hard on the people who wanted
to control the exact sequencing of the operations, and now there is this
kludge that no-one is really happy with but that sure makes life easier for
them.  Proposals to change the C semantics for fp operations only still
leave some other rearrangement problems unresolved and are probably not a
good idea anyway.  We'll have to live with it. As a consolation I can say
that the situation has not gotten worse because of the unary + and that
this is fortunately :-) not the worst problem with C anyway, far from it.)

-- 

Lambert Meertens, CWI, Amsterdam; lambert@cwi.nl

henry@utzoo.UUCP (Henry Spencer) (04/16/87)

>   2) *People* expect parentheses to be honored...

Competent C programmers, who understand the language they are coding in,
expect parentheses to be honored in the way the language is AND HAS ALWAYS
BEEN specified:  they dictate grouping but do not force evaluation order.
This is NOT an invention of X3J11, it is the way C has always been.  See
K&R, page 185, paragraph 2.
-- 
"We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry

henry@utzoo.UUCP (Henry Spencer) (04/16/87)

>   The standards committee has as one of its obligations to define rules
> that will allow writing code which will work on the largest number of
> compilers possible with little or no change.  Allowing the compiler
> to, at its option, totally ignore the EXPLICIT wishes of the programmer
> is one of the easiest ways I can think of to produce a "standard" which
> unnecessarily frustrates programmers, will actually _Cause_ portability
> problems, and is, in fact, not a functional "Standard" at all.

If you re-read paragraph 2 on page 185 of K&R, you will find that C has
always been this way.  C programmers who expect anything different do not
understand the language they are programming in, which is a guaranteed
way to become frustrated and cause portability problems.  X3J11 has always
given high priority to preserving the functioning of existing legitimate
C programs; programs which depend on order of evaluation do not come under
this heading, since C has always allowed rearrangement.

Some languages do interpret parentheses as an expression of the programmer's
wish to force order of evaluation.  Some do not.  C does not.  The problem
is not that the compiler is ignoring the programmer's wishes, but that the
programmer does not understand how to express his wishes in C (as opposed
to Fortran).

I assure you that when you understand the language, it is not difficult to
write code that will work on the largest number of computers possible with
little or no change.
-- 
"We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry

tweten@orville.UUCP (04/19/87)

In article <7918@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>Some languages do interpret parentheses as an expression of the programmer's
>wish to force order of evaluation.  Some do not.  C does not.
                                     ^^^^^^^^^^^

This inspires what to me is an interesting question.  Which common high
level computer languages, other than C, have parentheses but don't use
them "to force order of evaluation"?

mouse@mcgill-vision.UUCP (05/06/87)

In article <1307@ames.UUCP>, tweten@orville (Dave Tweten) writes:
> In article <7918@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>> Some languages do interpret parentheses as an expression of the
>> programmer's wish to force order of evaluation.  Some do not.

> This inspires what to me is an interesting question.  Which common
> high level computer languages, other than C, have parentheses but
> don't use them "to force order of evaluation"?

Lisp.
sh/csh.
TeX (though it uses {} rather than ()).
APL

					der Mouse

				(mouse@mcgill-vision.uucp)