[comp.lang.c] C and overflow anomalies

kent@xanth.UUCP (04/11/87)

In article <995@wanginst.EDU> mckeeman@wanginst.EDU (William McKeeman) writes:
>I am a bit frustrated by people talking past each other on this topic.  The
>order of evaluation rules in C are a blanket over several subproblems.
>Conclusion: when making a pitch for a better C definition rule, identify
>which of the problems:
>    unrestricted optimization
>    overflow anomalies
>    roundoff anomalies
>    side effect anomalies
>it is designed to solve.  If you don't intend for your rule to have
>language-wide implications, state the limited area of application.
>-- 
>W. M. McKeeman            mckeeman@WangInst
>Wang Institute            decvax!wanginst!mckeeman
>Tyngsboro MA 01879

I agree that this is an excellent idea, and, being nothing if not opinionated,
and having freely confessed my disenchantment with C in its present state in
this forum previously, I will start this discussion off with four new subjects
corresponding to (Mr., Mrs., Miss, Ms., Dr., (?)) McKeeman's plan, and then
sit back and read the responses.

A few times in 26 years of coding, I have found myself working at the limits
of a data type, and having to rearrange code to ensure that overflow or
underflow did not take place during a sound, but not robust, calculation.
Where control of execution order was needed, nothing else would do.

A previous poster has noted that a compiler writer who is on an optimization
spree (not meant to be derogatory, I have a lot of respect for this skill,
and spent a long bar evening listening to some justifiable bragging by one of
the practitioners of this bit of arcana) will not respect anything you can
do to force a certain code order, since s/he feels free to rearrange
statements, elide temporaries, rearrange loops, rearrange blocks, reduce
strength (change your multiplies to adds, for example), and recent
publications suggest that separate compilation may not be good enough,
either, to prevent unexpected interactions of two pieces of code which were
certainly separate in the programmer's mind.

Mathematical programmers (and, after its 30 years in continuous use for the
purpose, a great many of these are FORTRAN users) who turn to C for whatever
reason are occasionally astonished by the disrespect of C compilers for the
parentheses the programmer just so carefully included to control execution
order.  Perhaps the proper response is to tell the mathematical programmer
to look elsewhere for a language to use, but the problem exists for the less
demanding user, too.

C is famous (notorious) for its variety of integer types: signed and
unsigned, in byte, word, and double-word (and other) sizes.  That variety
makes the creation of portable code a task for the cautious and insightful
programmer.  The utility of these many integer sizes is manifest; it allows
system tables (structures) to be created with a minimum size by using just
the right mix of integer variable sizes.

But now, if arithmetic must be done with these variables, the problem of
overflow arises.  One solution is surely to pick the next larger integer
size when overflow threatens.  Usually, however, if the integer size is
chosen to match the range of the data to be stored, this is an inappropriate
and wasteful choice, compared to arranging the arithmetic in the proper
order to avoid overflow.
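
A tiny sketch of the kind of rearrangement I mean (names and values
invented purely for illustration; assume 16-bit ints, so the largest
int is 32767):

	int a = 30000, b = 25000, c = 20000, d;

	d = (a + c) - b;	/* a + c is 50000: overflow */
	d = (a - b) + c;	/* a - b is 5000, then 25000: no overflow */

Both lines denote the same mathematical value; only the second is safe
for data in these ranges.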

To do this, the compiler must be able to understand when the programmer
wants the code executed exactly as written, and leave it alone.

The ANSI C dpANS, I understand, has chosen to overload the unary plus with
this task.  I have two problems with this.  First, it is {counterintuitive,
ugly, unmathematical} (pick one).  I have noted earlier that to the user
coming from a secondary school environment, the unary plus is a no-op, just
as s/he learned in algebra class.  It really trashes out the appearance of
an expression containing it (although C is already in the top five, in the
running for world's ugliest major language) (need I mention that is one of
the opinions promised above?).  ;-)  Stray operators floating around don't
look much like the mathematical equations to which the new user is presumably
accustomed.
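
As I understand the proposal (hedging heavily here; I haven't seen the
draft wording myself), the idea is that something like

	d = +(a - b) + c;	/* the unary + fences off (a - b) */

tells the compiler that (a - b) must be evaluated as a unit rather than
regrouped into (a + c) - b.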

Second, this may provide insufficient or inappropriate granularity of
control against the optimizing compiler.  I may need to control the order
of execution of a series of statements which the compiler may decide to
rearrange.  The unary plus doesn't cut it here, but the compiler author
has promised to look everywhere for optimizations.  How do I stop the code
from exercising optimization where it is dangerous to the success of the
program, without turning it off where it is harmless or beneficial?  I
think the standard should include a better, prettier method of control of
optimization scope and degree.

Comments?

Kent.
--
The Contradictor	Member HUP (Happily Unemployed Programmers)    // Also
								      // A
Back at ODU to learn how to program better (after 25 years!)	  \\ // Happy
								   \// Amigan!
UUCP  :  kent@xanth.UUCP   or    ...{sun,cbosgd,harvard}!xanth!kent
CSNET :  kent@odu.csnet    ARPA  :  kent@xanth.cs.odu.edu
Voice :  (804) 587-7760    USnail:  P.O. Box 1559, Norfolk, Va 23501-1559

Copyright 1987 Kent Paul Dolan.			How about if we keep the human
All Rights Reserved.  Author grants		race around long enough to see
retransmission rights recursively only.		a bit more of the universe?

aeusemrs@csun.UUCP (04/12/87)

In article <820@xanth.UUCP> kent@xanth.UUCP (Kent Paul Dolan) writes:
[edited for space]
|A previous poster has noted that a compiler writer who is on an optimization
|spree (not meant to be derogatory, I have a lot of respect for this skill,
|and spent a long bar evening listening to some justifiable bragging by one of
|the practitioners of this bit of arcana) will not respect anything you can
|do to force a certain code order, since s/he feels free to rearrange
|statements, elide temporaries, rearrange loops, rearrange blocks, reduce
|strength (change your multiplies to adds, for example), and recent
|publications suggest that separate compilation may not be good enough,
|either, to prevent unexpected interactions of two pieces of code which were
|certainly separate in the programmer's mind.
|
|Second, this may provide insufficient or inappropriate granularity of
|control against the optimizing compiler.  I may need to control the order
|of execution of a series of statements which the compiler may decide to
|rearrange.  The unary plus doesn't cut it here, but the compiler author
|has promised to look everywhere for optimizations.  How do I stop the code
|from exercising optimization where it is dangerous to the success of the
|program, without turning it off where it is harmless or beneficial?  I
|think the standard should include a better, prettier method of control of
|optimization scope and degree.
|
|Comments?   [just a couple questions for now]

|Copyright 1987 Kent Paul Dolan.  All Rights Reserved.  Author grants
|retransmission rights recursively only.
    [I guess I must include his copyright message...]


    Does optimization mean the changing of the internal workings of something,
such that the inputs, and the results or outputs of the ``black box'' are the
SAME, and of course, the process is less consuming in some way, typically
time?  Is this definition used only in theory, or is it used in practice too?

    Can you please give me an example of the case you alluded to in your last
paragraph above?  I have never felt the need to have anything but the full,
unrepressed optimizations, global or local, of the compiler.  With that I
have one comment: I can see the use of storing values into areas of memory (or
was that i/o, :-)) declared as volatile in C.

    What type of ``unexpected interactions of two pieces of code which were
certainly separate in the programmer's mind'' were you referring to in your
first paragraph?
-- 
Mike Stump, Cal State Univ, Northridge Comp Sci Department
uucp: {sdcrdcf, ihnp4, hplabs, ttidca, psivax, csustan}!csun!aeusemrs

kent@xanth.UUCP (Kent Paul Dolan) (04/13/87)

In article <820@xanth.UUCP> kent@xanth.UUCP I wrote:

[edited, paraphrased, etc. to save space; we've seen it twice already]
...a compiler writer...feels free to [...muck about like mad several ways...],
and recent publications suggest that separate compilation may not be good
enough, either, to prevent unexpected interactions of two pieces of code which
were certainly separate in the programmer's mind.

Second, this may provide insufficient or inappropriate granularity of
control against the optimizing compiler. [...]
of execution of a series of statements which the compiler may decide to
rearrange.  The unary plus doesn't cut it here, but the compiler author
has promised to look everywhere for optimizations.  How do I stop the
[compiler]
from exercising optimization where it is dangerous to the success of the
program, without turning it off where it is harmless or beneficial?  I
think the standard should include a better, prettier method of control of
optimization scope and degree.

Comments?

In article <568@csun.UUCP> aeusemrs@csun.UUCP (Mike Stump) writes:

>{just a couple questions for now}
>
>|Copyright 1987 Kent Paul Dolan.  All Rights Reserved.  Author grants
>|retransmission rights recursively only.
>    [I guess I must include his copyright message...]
>
	I think under the fair use doctrine, you could have omitted the
	copyright, but I am no lawyer.

	This was yet another in a small groundswell of attempts to persuade
	the Stargate project not to do damage to the nature of USENet in the
	(laudable) process of attempting to save sites money on phone bills.

	At present, USENet postings may be freely passed from site to site;
	we would lose this ability for news passed via Stargate as presently
	envisioned.

	The Stargate project's present plan proposes to copyright all the
	USENet material it broadcasts as a collective work, and to insist on
	recompense from Stargate subscribers who pass the material on over
	phone lines to additional sites, beyond the normal Stargate
	subscription fees.  This problem is being adequately flamed in the
	Stargate newsgroup, please read/respond there, not here, on this
	topic.

>    Does optimization mean the changing of the internal workings of something,
>such that the inputs, and the results or outputs of the ``black box'' are the
>SAME, and of course, the process is less consuming in some way, typically
>time?  Is this definition used only in theory, or is it used in practice too?

	Optimization in non-pathological cases should be of the black-box
	type; if a=2, b=3, and c=4, then d = a-b+c has the same result as
	d = a+c-b.

	The problems being addressed in this (very, very long) discussion
	have to do with behavior in the pathological cases.  As previous
	postings have shown, these are probably the majority, certainly in
	the case of floating point numbers.  For example, if the implementation
	of floating point numbers in a particular hardware implementation is
	a signed, 4 bit exponent, and a signed, 12 bit mantissa, then the most
	positive representable number (in binary) is 11111111.1111.  This being
	the case, then given (all binary):
	
		a = 10000000.0000;
		b = 00000001.0000;
		c = 10000000.0000;
		
	then the evaluation order (a - b) + c proceeds with no problem,
	while the evaluation order (a + c) - b produces an overflow.
	If there were a hardware implementation which rewarded doing the
	add before the subtract, then an optimizing compiler might do the
	evaluation the second way, even though the programmer, knowing the
	possible data values for a, b, and c, had written the code using the
	(a - b) + c format, and expected never to see an overflow from this
	piece of code.
	
	The point of course, is that the compiler does not know what values
	will be assigned to a, b, and c at run time, and so does not know if
	that optimization is safe or not.  The reason that discussion has
	been so prolonged, is that the programmer might know, and (some of us)
	want a nice way to tell the compiler not to muck about with that
	expression.  The compiler writer, lacking guidance to the contrary,
	wants to muck about, because then the code will run faster, benchmark
	better, and the compiler will compete better in the marketplace (all
	else being equal).
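
	Here is a tiny demonstration anyone can try, using double and the
	DBL_MAX value the draft's <float.h> is supposed to provide (so this
	assumes an implementation with that header, and IEEE-style
	arithmetic where overflow quietly turns into "infinity" instead of
	trapping; the principle is the same either way):

		#include <stdio.h>
		#include <float.h>

		int main()
		{
			double a = DBL_MAX / 2.0 + DBL_MAX / 4.0;
			double b = DBL_MAX / 2.0;
			double c = DBL_MAX / 2.0;

			printf("(a - b) + c = %g\n", (a - b) + c);  /* stays in range */
			printf("(a + c) - b = %g\n", (a + c) - b);  /* a + c overflows */
			return 0;
		}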
	
>    Can you please give me an example of the case you alluded to in your last
>paragraph above?  I have never felt the need to have anything but the full,
>unrepressed optimizations, global or local, of the compiler.  With that I
>have one comment: I can see the use of storing values into areas of memory (or
>was that i/o, :-)) declared as volatile in C.

	The volatile declaration (I have no access to the dpANS, thank
	heaven, or I'd read that, too) certainly takes care of part of the
	problem, but it gives compiler writers license to break TONS of
	existing code, which depended on K&R's statement that splitting
	expressions into separate statements was enough to guarantee
	the evaluation order.  If I wrote (all "float" variables):
		temp = a - b;
		d = temp + c;
	in a piece of existing code to avoid the above overflow; the
	compiler writer is now free to elide temp from the code, then go
	back and rearrange to achieve:
		d = (a + c) - b;
	exactly what I was trying to avoid, unless I declare temp to be
	volatile.  Would _you_ like to go back through your site's existing C
	code and find all instances where that is why temp was put in the code,
	and add the appropriate volatile declaration?  ;-)
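
	(For the record, my reading of what the sanctioned fix would look
	like, if volatile is indeed the escape hatch, is something like:

		volatile float temp;

		temp = a - b;		/* the store must really happen */
		d = temp + c;		/* and so must the fetch */

	but I haven't seen the draft, so take that with a grain of salt.)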
	
	How about if a little maintenance activity has produced:
		temp = a - b;
		/* 2000 lines of code not involving temp, a, b, c, or d */
		d = temp + c;
	which the ambitious compiler writer is STILL free to optimize into:
		/* 2000 lines of code not involving temp, a, b, c, or d */
		d = (a + c) - b;
	Aaaaarargh!  ;-)

>    What type of ``unexpected interactions of two pieces of code which were
>certainly separate in the programmer's mind'' were you referring to in your
>first paragraph?

	main(){...			/* everybody floats */
		extern float foo();
		temp = a - b;
		d = foo(temp, c);	/* only call of foo in any code in
					the known universe ;-) */
		...
	}

	/* 2,000,000 lines of kernel code and several hundred separate
	source files later, we finally find foo:	;-) */

	float foo(temp, c) ...{
		temp2 = temp + c;
		return temp2;
	}

	Which the ambitious PhD student compiler writer doing a thesis on
	"AI Applications in Optimizing Compilers and Linkers in Spite of all
	Reasonable Attempts to Prevent it" is perfectly free to change, by
	putting the call to foo inline, and then converting to:
		d = (a + c) - b;
	(aw, you guessed ;-).
>-- 
>Mike Stump, Cal State Univ, Northridge Comp Sci Department
>uucp: {sdcrdcf, ihnp4, hplabs, ttidca, psivax, csustan}!csun!aeusemrs

	It's almost daylight, did any of that make more (any) sense?
Kent.

--
The Contradictor	Member HUP (Happily Unemployed Programmers)    // Also
								      // A
Back at ODU to learn how to program better (after 25 years!)	  \\ // Happy
								   \// Amigan!
UUCP  :  kent@xanth.UUCP   or    ...{sun,cbosgd,harvard}!xanth!kent
CSNET :  kent@odu.csnet    ARPA  :  kent@xanth.cs.odu.edu
Voice :  (804) 587-7760    USnail:  P.O. Box 1559, Norfolk, Va 23501-1559

Copyright 1987 Kent Paul Dolan.			How about if we keep the human
All Rights Reserved.  Author grants		race around long enough to see
retransmission rights recursively only.		a bit more of the universe?

mckeeman@wanginst.UUCP (04/13/87)

Let me focus on one particular point of the draft standard that may be read
differently by C programmers and C compiler writers.  First the raw data:

(from the July 9 draft from x3j11)
2.1.2.3 Program execution ...

Evaluation of an expression (specifically, one that involves assignment,
incrementing, decrementing, or calling a function that does any of those
operations) may produce _side_effects_, which are changes in the state of the
execution environment incidental to the evaluation.  At certain specified
points in the execution sequence, called _sequence_points_, all side effects
of previous evaluations shall be complete and no side effects of subsequent
evaluations shall have taken place.

3.6 Statements ...

Except as indicated, statements are executed in sequence.   ...  The end of a
_full_expression_ is a sequence point.

Analysis:

Point 1. the final appearance of "side" in the paragraph above should be
stricken.  In fact it is necessary that _no_ effects of subsequent evaluations
shall have taken place.  Example:

x=y; z=x;

The side effect of storing into x must precede the (non-side-effect) fetch of
x in the second statement.  I suppose that is what the authors intended.

Point 2. compilers may, in fact, discard explicit assignments

x=y; z=x;

In the case that the above is the only use of x, and absent directives such
as volatile and implicit casts, some compiler writers feel free to not
allocate storage for x and translate the above as:

z=y;

The reasoning is that the _observable_ effect of the more efficient version
is the same as the one the programmer wrote.  This is harmless enough unless
the programmer is poking around with a debugger or looking at a postmortem.
If the x3j11 document is supposed to forbid this optimization, then it had
better say something like "all assignments and {inc|dec}rements must result
in updating the indicated memory location."

Point 3. point 2 can get you (or me) into trouble

Many compiler writers would treat the following assignments likewise:

x=y+1; z=x-1;

turning into

z=y;

which is fine except where y==MAXINT.  This is an example of optimizers that
work except for expressions that overflow.  I think x3j11 is OK on this point
(the compiler writer is surely violating it) but x3j11 could explicitly
forbid optimizations that fail to preserve semantics for expressions that
{over|under}flow.  Deep in the heart and professional aspirations of some
compiler writers is an urge to ignore non-arithmetic properties of computer
arithmetic, and a suspicion that customers would rather have really fast code
than code that preserves the semantics of overflow for those very rare cases
where it matters.
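
A tiny test program makes the hazard concrete.  It assumes a two's
complement machine on which int overflow wraps silently; on such a
machine the two forms print the same number, and only the intermediate
value of x (visible in a debugger) betrays the overflow.  On a machine
that traps on overflow, the form as written blows up and the
"optimized" form does not.

#include <stdio.h>
#include <limits.h>

int main()
{
	int y = INT_MAX;
	int x, z1, z2;

	x = y + 1;	/* overflows here: wraps to INT_MIN, or traps */
	z1 = x - 1;	/* wraps back to INT_MAX on a wrapping machine */

	z2 = y;		/* the "optimized" form never overflows at all */

	printf("%d %d\n", z1, z2);
	return 0;
}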

End of Points.

The innocent examples above do not expose the subtlety of the problem for
complex expressions, but resolving the meaning of x3j11 for the above should
constrain/free the compiler writer as needed.

QUESTION:   Is it worth tinkering with the x3j11 verbiage?
QUESTION:   Should all explicit assignments be forced to memory?
QUESTION:   Should compiled code overflow exactly the same way under all
	    implementations?

-- 
W. M. McKeeman            mckeeman@WangInst
Wang Institute            decvax!wanginst!mckeeman
Tyngsboro MA 01879

dc@sdd.UUCP (04/14/87)

In article <820@xanth.UUCP> kent@xanth.UUCP (Kent Paul Dolan) writes:
>
>A few times in 26 years of coding, I have found myself working at the limits
>of a data type, and having to rearrange code to ensure that overflow or
>underflow did not take place during a sound, but not robust, calculation.
>Where control of execution order was needed, nothing else would do.
>
>A previous poster has noted that a compiler writer who is on an optimization
>spree (not meant to be derogatory, I have a lot of respect for this skill,
>and spent a long bar evening listening to some justifiable bragging by one of
>the practitioners of this bit of arcana) will not respect anything you can
>do to force a certain code order, since s/he feels free to rearrange
>statements, elide temporaries, rearrange loops, rearrange blocks, reduce
>strength (change your multiplies to adds, for example), and recent
>publications suggest that separate compilation may not be good enough,
>either, to prevent unexpected interactions of two pieces of code which were
>certainly separate in the programmer's mind.
>
>The ANSI C dpANS, I understand, has chosen to overload the unary plus with
>this task.  I have two problems with this.  First, it is {counterintuitive,
>ugly, unmathematical} (pick one).  I have noted earlier that to the user
>coming from a secondary school environment, the unary plus is a no-op, just
>as s/he learned in algebra class.  It really trashes out the appearance of
>an expression containing it (although C is already in the top five, in the
>running for world's ugliest major language) (need I mention that is one of
>the opinions promised above?).  ;-)  Stray operators floating around don't
>look much like the mathematical equations to which the new user is presumably
>accustomed.
>
>Second, this may provide insufficient or inappropriate granularity of
>control against the optimizing compiler.  I may need to control the order
>of execution of a series of statements which the compiler may decide to
>rearrange.  The unary plus doesn't cut it here, but the compiler author
>has promised to look everywhere for optimizations.  How do I stop the code
>from exercising optimization where it is dangerous to the success of the
>program, without turning it off where it is harmless or beneficial?  I
>think the standard should include a better, prettier method of control of
>optimization scope and degree.
>
>Comments?
>

	Why not utilize a #pragma NO_OPTIMIZE & a counter pragma #pragma OPTIMIZE
around code that is to be executed as is?  Perhaps we can even create various
classes of pragmas for different optimizations:  #pragma NO_OPTIMIZE LOOPS,
#pragma NO_OPTIMIZE_STRENGTH, etc.
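
For example (hypothetical pragma names only; no existing compiler is
assumed to recognize them, and the variables are borrowed from the
earlier postings):

	#pragma NO_OPTIMIZE
	temp = a - b;		/* this statement order must be kept */
	d = temp + c;
	#pragma OPTIMIZE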


											Daniel Corbett
											V.P. Engineering SDD Corp.

m5d@bobkat.UUCP (Mike McNally ) (04/16/87)

In article <827@xanth.UUCP> kent@xanth.UUCP (Kent Paul Dolan) writes:
>	                  . . .If I wrote (all "float" variables):
>		temp = a - b;
>		d = temp + c;
>	in a piece of existing code to avoid the above overflow; the
>	compiler writer is now free to elide temp from the code, then go
>	back and rearrange to achieve:
>		d = (a + c) - b;
>
>UUCP  :  kent@xanth.UUCP   or    ...{sun,cbosgd,harvard}!xanth!kent

Not a C compiler writer.  A C compiler (as I know it; I suppose my 
complete ignorance of ANSI C may show here) is not supposed to do
any non-expression-local optimizations, not even basic block type
stuff.  Thus while a Pascal compiler may do such work, or for that
matter an optimizing FORTRAN compiler, no C compiler will.



-- 
Mike McNally, mercifully employed at Digital Lynx ---
    Where Plano Road the Mighty Flood of Forest Lane doth meet,
    And Garland fair, whose perfumed air flows soft about my feet...
uucp: {texsun,killer,infotel}!pollux!bobkat!m5d (214) 238-7474

guy%gorodish@Sun.COM (Guy Harris) (04/17/87)

>	Why not utilize a #pragma NO_OPTIMIZE & a counter pragma #pragma
>OPTIMIZE around code that is to be executed as is?  Perhaps we can even create
>various classes of pragmas for different optimizations:  #pragma NO_OPTIMIZE
>LOOPS, #pragma NO_OPTIMIZE_STRENGTH, etc.

"We" can't create any sort of pragma classes.  #pragmas are
compiler-dependent.  There's no guarantee that a particular compiler
knows about the categories of optimization you know about, with the
possible exception of "#pragma OPTIMIZE" and "#pragma NO_OPTIMIZE".
The standard should do as little as possible to constrain the forms
of optimizing technology used in compilers.

jm36#@andrew.cmu.edu.UUCP (04/18/87)

> From: kent@xanth.UUCP (Kent Paul Dolan)
> Subject: Re: C and overflow anomalies
> 	The volatile declaration (I have no access to the dpANS, thank
> 	heaven, or I'd read that, too) certainly takes care of part of the
> 	problem, but it gives compiler writers license to break TONS of
> 	existing code, which depended on K&R's statement that splitting
> 	expressions into separate statements was enough to guarantee
> 	the evaluation order.  If I wrote (all "float" variables):
> 		temp = a - b;
> 		d = temp + c;
> 	in a piece of existing code to avoid the above overflow; the
> 	compiler writer is now free to elide temp from the code, then go
> 	back and rearrange to achieve:
> 		d = (a + c) - b;
> 	exactly what I was trying to avoid, unless I declare temp to be
> 	volatile.  Would _you_ like to go back through your site's existing C
> 	code and find all instances where that is why temp was put in the code,
> 	and add the appropriate volatile declaration?  ;-)
> 	

The ambitious compiler writer is NOT free to make this optimization.
There are such things as "sequence points" which occur between
statements.  The best that this could be optimized to is:

    d = +(a + c) - b;

With the additional restriction that "(a + c)" must be evaluated
before "b" if there is any possibility of side-effects between "b" and
either of "a" or "c".

				_.John