[comp.lang.c] Optimizing Floating-Point Expression Evaluation

dgh@dgh.UUCP (04/14/87)

SUMMARY: Some recent postings have been missing the point again.
There's an optimal way to evaluate floating-point
expressions, which is to respect parentheses.  Details follow,
but first...

	I am NOT talking about integer expression evaluation.
In C, I can readily believe that there is a significant body of
code that was written with the expectation that constants would
be combined despite parentheses.  I don't think it was wise to
code that way, but there the issue is integer overflow, which many
people understand, rather than floating-point rounding errors,
which sooner or later confuse everyone who encounters them.

	I am NOT posting to complain about the ANSI C committee,
whose members deserve credit for recognizing that the problem was 
serious enough to warrant fixing.  Like many others, 
I think using unary + to signify "I mean this" is pretty odd, 
but it's better than ignoring the problem.  Thus what follows is addressed
more to the implementors of existing C compilers, who will all
have a good way to go to meet the ANSI standard anyway if it's
approved in its present form.

	I AM interested in exposing the real costs of ignoring 
parentheses and the real benefits of respecting them.  
The purported benefit of ignoring parentheses in statements like

	w = (x + y) + z

is that sometimes faster code can be generated.  Possibly in some
cases for some architectures, but interestingly enough
this claim was actually investigated right at the time
C and Unix were being invented.

	Morven Gentleman, who was at Bell Labs in the early 1970's,
heard about this alleged optimization and investigated it empirically,
collecting statistics on programs actually being run there, to find out 
how much was saved.  As might be expected, the answer was that no significant
execution time was saved, only a slight amount of stack space.  He related to me
that he presented the evidence to Steve Johnson but that the latter had
already made up his mind otherwise.  So much for the benefits.

	What of the costs?  In carefully-written code I would think that
if the order of the additions did not matter, you would just write

	w = x + y + z

and leave it up to fate.  If you write something else you mean
something else.  Lacking a feature equivalent to ANSI-style unary +, 
to get the intended effect it's necessary to write

	t = x + y
	w = t + z

If you have just run out of floating-point registers, this may mean 
wasted extra cycles to write and read storage.  You may even have
to allocate stack space for the temporary that otherwise wouldn't have
been needed.  Now which is more efficient?
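
To make the stakes concrete, here is a minimal sketch (the values are
contrived for illustration, and the outputs assume IEEE double
arithmetic with each statement evaluated as written):

	#include <stdio.h>

	int main()
	{
		double x = 1.0e16, y = -1.0e16, z = 1.0;
		double t, w;

		printf("%g\n", (x + y) + z);	/* 1 when grouped this way         */
		printf("%g\n", x + (y + z));	/* 0: y + z rounds back to -1.0e16 */

		/* the workaround when the compiler won't honor parentheses:
		   force the grouping through a temporary */
		t = x + y;
		w = t + z;
		printf("%g\n", w);		/* 1 again, at the cost of a temporary */
		return 0;
	}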

	I think this and other aspects of optimization are encapsulated
in the following proposition:  optimizing compilers should be designed to
improve carefully-written code by means that can not be readily expressed
in the source language.  I would have thought that most experienced C
programmers would agree with that proposition, but some recent postings
suggest otherwise: that a major purpose of optimizing compilers is to clean
up after careless coders and thereby encourage careless coding.
	  

David Hough

ARPA: dhough@sun.com
UUCP: {ucbvax,decvax,allegra,decwrl,cbosgd,ihnp4,seismo}!sun!dhough

guy%gorodish@Sun.COM (Guy Harris) (04/14/87)

>	I am NOT talking about integer expression evaluation.
>In C, I can readily believe that there is a significant body of
>code that was written with the expectation that constants would
>be combined despite parentheses.  I don't think it was wise to
>code that way,

Given that a lot of code was written in this fashion because it used
macros, what's your alternative?  Macros provide a good facility for
isolating various abstractions (e.g., the number of disk blocks
required to hold N bytes), and many times an expression involving
several such macros can be evaluated more efficiently if constants
are combined despite parentheses.
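
For instance, a macro along these lines (the name and block size here
are invented for illustration, not taken from any particular system):

	#define BLKSIZE 512
	#define NBLOCKS(n)	(((n) + BLKSIZE - 1) / BLKSIZE)

	long
	blocks_for(len)		/* blocks to hold len bytes plus one more */
	long len;
	{
		/* expands to (((len + 1) + 512 - 1) / 512); a compiler that
		   combines constants despite the parentheses can reduce this
		   to (len + 512) / 512, saving an addition */
		return NBLOCKS(len + 1);
	}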

bright@dataio.Data-IO.COM (Walter Bright) (04/14/87)

In article <16646@sun.uucp> dgh%dgh@Sun.COM (David Hough) writes:
>	I think this and other aspects of optimization are encapsulated
>in the following proposition:  optimizing compilers should be designed to
>improve carefully-written code by means that can not be readily expressed
>in the source language.  I would have thought that most experienced C
>programmers would agree with that proposition, but some recent postings
>suggest otherwise: that a major purpose of optimizing compilers is to clean
>up after careless coders and thereby encourage careless coding.

I disagree completely, for the following reasons:
    o	Some ways of writing expressions work well on one CPU architecture,
	and some work well on others. Witness the recent discussions on
	various ways of coding strcpy(). Having the programmer investigate
	which of (*p++ = c) or ((*p = c),p++) is more efficient is a waste
	of time (see the sketch after this list).
    o	You imply that non-optimal code is careless. That could be true;
	however, it is frequently not. The goals of readability and
	maintainability are not the same as optimality. An example would
	be writing heavily optimized assembler. Few would dispute that
	the resulting code is usually extremely fragile and obtuse. The
	same thing happens when doing source code tuning.
	For example, I believe that:

		for (i = 0; i < MAX; i++)
			array[i] = j * k - 5 + 10;

	is more readable and maintainable than:

		temp = j * k + 5;
		p = &array[0];
		do
		{	*p = temp;
			++p;
		} while (p < &array[MAX]);

	which is what my compiler does.
    o	Trying to do register allocation by coloring by hand is a real
	pain!
    o	Source code optimization takes programmer time, which is more
	expensive than optimizer time.
    o	I have been writing C, assembly code and optimizing compilers
	for years. Everything I know I try to build into the optimizer
	and the code generator. This means that all the users of my
	compiler get the benefit of my knowledge of the 8088.
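
Here, roughly, are the two styles from the strcpy() point above
(illustrative loops of my own, not anybody's actual library code);
which one generates better code depends entirely on the target machine:

	char *
	copy1(dst, src)			/* the (*p++ = c) style */
	char *dst, *src;
	{
		register char *p = dst;

		while ((*p++ = *src++) != '\0')
			;
		return dst;
	}

	char *
	copy2(dst, src)			/* the ((*p = c), p++) style */
	char *dst, *src;
	{
		register char *p = dst;
		register char c;

		do {
			c = *src++;
			(*p = c), p++;
		} while (c != '\0');
		return dst;
	}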

henry@utzoo.UUCP (Henry Spencer) (04/16/87)

> ... optimizing compilers should be designed to
> improve carefully-written code by means that can not be readily expressed
> in the source language.  I would have thought that most experienced C
> programmers would agree with that proposition, but some recent postings
> suggest otherwise: that a major purpose of optimizing compilers is to clean
> up after careless coders and thereby encourage careless coding.
	  
Not quite.  The correct statement is that a major purpose of optimizing
compilers is to do machine-specific optimizations on machine-independent
code.  In C in particular, it is possible to express many optimizations
in the source code.  Unfortunately, when one does this, one often makes
the code more efficient on one machine at the cost of making it less
efficient on others.  Sometimes, in fact, one makes it more efficient on
one machine at the cost of making it unrunnable on others.  Almost always
one makes it more efficient on one machine at the cost of making it less
readable to human beings.  One legitimate role of optimizing compilers
is to remove the need for such penny-wise-pound-foolish "optimization".
-- 
"We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry

flaps@utcsri.UUCP (04/19/87)

David Hough writes that if you say "(a + b) + c" you mean for the evaluation
to be done in that order, and if you don't care you will say "a + b + c".
That's fine, but please explain how you can know whether or not I care about
evaluation order in the following:

	#define THING (a + 3.5)

	...

	c = (THING + 1.1) * 2.6;


How do I say that I want the 3.5 and 1.1 to be added at compile time?
Assuming I say this by #defining THING as "a + 3.5", without its own
parentheses (which is bug-causing practice, by the way), the bigger
question is how do I get you to
multiply the 2.6 into the parentheses to compile as "c = (a*2.6) + 11.96"?

-- 

Alan J Rosenthal

flaps@csri.toronto.edu, {seismo!utai or utzoo}!utcsri!flaps,
flaps@toronto on csnet, flaps at utorgpu on bitnet.

"Probably the best operating system in the world is the [operating system]
made for the PDP-11 by Bell Laboratories." - Ted Nelson, October 1977

jimv@omepd (Jim Valerio) (04/21/87)

In article <4616@utcsri.UUCP> flaps@utcsri.UUCP (Alan J Rosenthal) writes:
>please explain how you can know whether or not I care about
>evaluation order in the following:
>	#define THING (a + 3.5)
>	...
>	c = (THING + 1.1) * 2.6;
>How do I say that I want the 3.5 and 1.1 to be added at compile time?

The answer is simple: the compiler can't read your mind.  In this
particular example, assume for example that all the variables are IEEE
single precision numbers, all intermediate arithmetic is done in single
precision, and `a' has the value -4.6, correctly rounded.  In this case,
	(a + (3.5 + 1.1))
is zero and
	((a + 3.5) + 1.1)
is nonzero and a long way from underflowing.  Which answer is "right"?

Can you guarantee that `a' is never -4.6 here, so the only difference
between the two expressions is a rounding error?  When computing the
expression, in which direction do you want rounding errors to be
incurred?  How is the result going to be used?  Will errors cancel
out here, or will they accumulate?

Welcome to the world of error analysis.
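
For anyone who wants to see it happen, here is a rough test (my own
code, using draft-ANSI float constants; whether the intermediates
really stay in single precision depends on the compiler, since
old-style C promotes everything to double, so the exact outputs may
vary):

	#include <stdio.h>

	int main()
	{
		float a = -4.6f;
		float b = 3.5f, c = 1.1f;
		float t, folded, honored;

		t = b + c;		/* 3.5 + 1.1, rounded to single     */
		folded = a + t;		/* a + (3.5 + 1.1): exactly 0 here  */

		t = a + b;		/* a + 3.5                          */
		honored = t + c;	/* (a + 3.5) + 1.1: roughly 1.2e-07 */

		printf("folded %g, honored %g\n", folded, honored);
		return 0;
	}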


But I think you are asking the wrong question.  I would ask: why
partake in the reprehensible practice of hiding the floating-point
arithmetic behind a macro name if you don't want it to be taken as an
indivisible entity?  Why declare `THING' unless you want to treat it in
the program as a single variable?  If `THING' is imitating a single
variable, then the parentheses should be honored.  If `THING' is
imitating a cheap function call, then following the parentheses exactly
mimics the desired semantics.

My point is that this example either shows intent to evaluate the expression
in the parenthesized order, or is plain poor code.
--
Jim Valerio	{verdix,intelca!mipos3}!omepd!jimv, jimv@omepd.intel.com

dave@onfcanim.UUCP (04/29/87)

In article <589@omepd> jimv@omepd.UUCP (Jim Valerio) writes:
>
>But I think you are asking the wrong question.  I would ask: why
>partake in the reprehensible practice of hiding the floating-point
>arithmetic behind a macro name if you don't want it to be taken as an
>indivisible entity?  Why declare `THING' unless you want to treat it in
>the program as a single variable?  If `THING' is imitating a single
>variable, then the parentheses should be honored.  If `THING' is
>imitating a cheap function call, then following the parentheses exactly
>mimics the desired semantics.
>
>My point is that this is an example of intent to evaluate the expression
>in the parenthesized order, or plain poor code.

I strongly disagree.  For example, I have the following macros in a
piece of code that calculates the voltage levels of portions of a TV
waveform:

	#define SETUP 7.5

	#define IRE(mv)	((mv)*140.0/1000.0)
	#define MV(ire)	((ire)*1000.0/140.0)

Now, these are macros for the usual good reasons:

1) conversions from IRE units to millivolts and vice versa are "logically"
   functions, and I want the code to read that way rather than be cluttered
   by the details of the computation every time it is done - this improves
   the readability of the code a great deal.  Yet there is no need to
   make them actual function calls.

2) All of the definitions above are appropriate for NTSC, but wrong for
   one of the European colour systems, so I use an #ifdef to conditionally
   define these macros.  This seems cleaner to me than conditionally-
   compiled code scattered through the program, conditionally-compiled
   real functions, or an "if" statement in a real function.

So will you grant that I have good reasons for using a macro?

Yet, when I actually use

		y = MV(y*(100.0-SETUP) + SETUP);

I would be happy to have the compiler rearrange the entire expression
in any way that it wants, regardless of where the parentheses in the
macro happened to fall.  I just want the approximate answer.
Any expression rearrangement or constant folding that speeds execution
is fine with me; I don't care about order of evaluation because the
roundoff errors will be smaller than I care about no matter what.

In fact, there are many calculations done in computer graphics where
only the first 12 or 16 bits of the mantissa are significant, and
roundoff need not be worried about.

To restate that:  Yes, the macro is written like a function call because
I want to *think* about it as a function call.  But I don't want nor
expect it to be *evaluated* as if it were a function call, since no
macro is in fact evaluated like a function call.

Macros are *not* exactly the same as function calls - anybody who calls
a macro with parameters that have side effects has to be aware of this.
If I *really* want a function call's semantics - all arguments evaluated
first, the computation performed, then a single value returned for
further use - I'll *write* a function.  But if a macro is adequate for
what I want, I'll use it.  Please credit me with the intelligence to
determine which is appropriate.

People who expect macros to behave exactly like functions are wrong.
Let's educate them, or instruct them not to use macros at all, rather than
hobbling the language to make it fit their expectations.

jimv@omepd (Jim Valerio) (05/05/87)

In article <15296@onfcanim.UUCP> dave@onfcanim.UUCP (Dave Martindale) writes
in defense of the use of macros in floating-point calculations, and gives
an example of where the order of evaluation is potentially harmless.

I believe that 3 different questions are at issue in the discussion here:
(1) Are there valid reasons to do floating-point calculations in
    the context of a macro?
(2) Are there situations where the order of evaluation of floating-point
    sub-expressions radically affects the computed result?
(3) Should macros behave like functions rather than simple textual
    substitutions?

I don't believe there's any argument about (1) and (2): the answer to
both is clearly yes.  In both cases one can construct special situations
where the yes answers do not apply.  I believe that my attack on the
`THING' macro (see <589@omepd>) fits in this latter category.

In the same special sense, I question Dave's assertion that in the following
code,
	#define SETUP 7.5
	#define MV(ire)	((ire)*1000.0/140.0)
		y = MV(y*(100.0-SETUP) + SETUP);
he "just want[s] the approximate answer".

In particular, Dave says:
>Any expression rearrangement or constant folding that speeds execution
>is fine with me; I don't care about order of evaluation because the
>roundoff errors will be smaller than I care about no matter what.
>In fact, there are many calculations done in computer graphics where
>only the first 12 or 16 bits of the mantissa are significant, and
>roundoff need not be worried about.

Now, I don't know how the result `y' is used here, or exactly what
domain the source `y' is in, but it is trivial to construct an example
where NO bits of significance are left in the *mantissa*, all
because of the few rounding errors in this expression.  [In single
precision IEEE floating-point, look at y = -0.0810810849, where the
parenthesized expression yields -3.40597967e-06 and the constant folded
expression (y * ((100.0 - 7.5)*1000.0/140.0)) + (7.5*1000.0/140.0)
yields 0.] Of course, the absolute error here might (or might not) be
perfectly acceptable, but to find out requires looking at every place
the data is used.
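
(A rough way to check this, in my own code rather than Dave's; again,
the figures assume the intermediates are kept in single precision,
which not every compiler will do, so the outputs may not match the
numbers above bit for bit:)

	#include <stdio.h>

	int main()
	{
		float y = -0.0810810849f;
		float t, honored, folded;

		t = y * (100.0f - 7.5f) + 7.5f;		/* the macro as written */
		honored = t * 1000.0f / 140.0f;

		folded = y * ((100.0f - 7.5f) * 1000.0f / 140.0f)
			   + (7.5f * 1000.0f / 140.0f);	/* constants combined */

		printf("honored %g, folded %g\n", honored, folded);
		return 0;
	}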

Folks, let me remind you: it is NOT TRUE that rounding errors can
overwhelm a computation only if a vast number of them accumulate.  It is
also NOT TRUE that a few rounding errors can overwhelm a computation
only if accompanied by massive cancellation.


And now back to:
(3) Should macros behave like functions rather than simple textual
    substitutions?

In some sense, it is pointless to debate this question since the
language and common practice provide the simple answer that macros
are simple textual substitutions.  Should's will not change what
is.

Dave says:
>Yes, the macro is written like a function call because
>I want to *think* about it as a function call.  But I don't want nor
>expect it to be *evaluated* as if it were a function call, since no
>macro is in fact evaluated like a function call.

I guess I wish that the C preprocessor was more tightly integrated into
the language so arguments to macros would have an implied precedence.
Then I wouldn't need to introduce the extra parentheses that I wouldn't
write if I were writing out the expression longhand.  For example, if
the macro substitution were done after the expression tree for an
expression involving a macro was constructed, then
	#define islower(ch)	ch >= 'a' && ch <= 'z'
would work for
	islower(c)		/* normally wouldn't put ()'s here */
and
	islower(c & 0x7f)	/* islower def *wants* ()'s now */
(Please note that constant folding would then be done after all the
macro substitutions had been performed.)
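
To spell out why those extra parentheses are needed under today's
rules, here is the expansion (a sketch, not part of the proposal):

	/*
	 * With the unparenthesized definition above,
	 *	islower(c & 0x7f)
	 * expands to
	 *	c & 0x7f >= 'a' && c & 0x7f <= 'z'
	 * and since >= and <= bind tighter than &, that parses as
	 *	(c & (0x7f >= 'a')) && (c & (0x7f <= 'z'))
	 * which is always false, not what was meant.  The usual defense
	 * is to parenthesize everything in the definition:
	 */
	#define islower(ch)	((ch) >= 'a' && (ch) <= 'z')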

I honestly don't see any advantage, other than the historical
simplification of a piecemeal implementation of a C compiler, to
completely separating the C macro facilities from the compiler proper.
The nagging necessity of protecting against side effects in macro
arguments, and the compiler being forced, in the interest of generating
good code, to presume that parentheses carry no special significance to
the programmer, are two nits of using C that could have been avoided
with a better-integrated macro facility.
--
Jim Valerio	{verdix,intelca!mipos3}!omepd!jimv, jimv@omepd.intel.com

dsill@NSWC-OAS.arpa (05/06/87)

Jim Valerio <jimv@omepd> wrote:
>And now back to:
>(3) Should macros behave like functions rather than simple textual
>    substitutions?

No, macros are macros and they should behave like macros.

>I guess I wish that the C preprocessor was more tightly integrated into
>the language so arguments to macros would have an implied precedence.
>Then I wouldn't need to introduce the extra parentheses that I wouldn't
>write if I were writing out the expression longhand.  For example, if
>the macro substitution were done after the expression tree for an
>expression involving a macro was constructed, then
>	#define islower(ch)	ch >= 'a' && ch <= 'z'
>would work for
>	islower(c)		/* normally wouldn't put ()'s here */
>and
>	islower(c & 0x7f)	/* islower def *wants* ()'s now */

A better solution, which I saw in a magazine once upon a time, is a
mechanism using "inline" functions.  These functions are defined like
normal functions but with the storage class ``inline''.  So islower
would be declared something like:

inline islower(ch)
char ch;
{
    return (ch >= 'a' && ch <= 'z');
}

The difference is that the code for islower would be generated at each
place the function is called, thus avoiding the function call overhead.
I haven't followed the X3J11 committee very closely, so I don't know if
this was even considered.

-Dave Sill
 dsill@nswc-oas.arpa

guy@gorodish.UUCP (05/07/87)

> A better solution, which I saw in a magazine once upon a time, is a
> mechanism using "inline" functions. ... I haven't followed the X3J11
> committee very closely, so I don't know if this was even considered.

I doubt it was.  This feature is present in C++, which is probably
what the article was referring to (or, at least, alluding to).  I
suspect that if it came up at all, it was considered a Nice Idea, but
not mature enough for standardization.