[comp.lang.c] Short circuit evaluation

chris@mimsy.UUCP (01/24/87)

>>>    x = (a | b | c);
>>>if the variable `a' contains zero, the compiler must still OR the
>>>contents of `b' and `c' to determine the result.

Almost true.

>>>These are bitwise logical operators.  Short-circuiting these makes
>>>no more sense than short-circuiting a sequence of multiplies as
>>>soon as one of the operands evaluates to `1'.
>>>Mike McNally                                    Digital Lynx Inc.

Not so!

>In article <102600001@datacube> stephen@datacube.UUCP writes:
>>This posting indicates a misunderstanding of how short-circuit evaluation
>>works. In the case of the '|' expression above, the decision not to evaluate
>>would occur when a or b is all ones, NOT when a or b is zero.

Correct.

In article <34@umich.UUCP>jtr485@umich.UUCP (Johnathan Tainter) writes:
>You mean if the '|' had been a '||', of course.

No, he means in the case of the `|' expression above.

>And actually, when a or b is NONZERO not ALL ONES.

Yes for `||', no for `|'.

C guarantees short-circuit left-to-right evaluation for `||' and
`&&'.  It makes no guarantees for `|' and `&'.  Here is a table
enumerating all possibilities.

    [For these, read `all zeroes bit pattern' for `OFF', and `all ones
     bit pattern' for `ON'.  This is a text compression device.]

      Given:	Possible methods of evaluation:
      ------	------------------------------
	&&	Left side evaluated.  If zero, result is zero,
		and evaluation stops.  If nonzero, right side evaluated;
		result is 0 if zero, 1 if nonzero.

	||	Left side evaluated.  If nonzero, result is 1,
		and evaluation stops.  If zero, right side evaluated;
		result is 1 if nonzero, 0 if zero.

	&	1.  Left side evaluated.  If OFF, evaluation stops;
		    result is OFF.  If not, right side evaluated,
		    and both results ANDed.
		2.  Left side evaluated.  Right side evaluated.
		    Results ANDed.
		3.  Right side evaluated.  If OFF, evaluation stops;
		    result is OFF.  If not, left side evaluated, and
		    both results ANDed.

	|	1.  Left side evaluated.  If ON, evaluation stops;
		    result is ON.  If not, right side evaluated,
		    and both results ORed.
		2.  Left side evaluated.  Right side evaluated.
		    Results ORed.
		3.  Right side evaluated.  If ON, evaluation stops;
		    result is ON.  If not, left side evaluated, and
		    both results ORed.
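
To make the difference concrete, here is a minimal sketch (not part of the
original article), using two hypothetical functions whose only job is to
record that they were called:

    #include <stdio.h>

    static int calls;

    static int zero(void)    { calls++; return 0; }
    static int nonzero(void) { calls++; return 0x0f; }

    int main(void)
    {
        calls = 0;
        if (zero() && nonzero())    /* && must stop after zero(); calls is 1 */
            ;
        printf("&& evaluated %d operand(s)\n", calls);

        calls = 0;
        if (zero() & nonzero())     /* & makes no such promise; in practice both
                                       are called -- whether that is guaranteed
                                       is the question in this thread */
            ;
        printf("&  evaluated %d operand(s)\n", calls);
        return 0;
    }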
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

tim@amdcad.UUCP (01/25/87)

In article <5178@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
	(OFF is all zeros, ON is all ones)
+---------------------------------------
|	&	1.  Left side evaluated.  If OFF, evaluation stops;
|		    result is OFF.  If not, right side evaluated,
|		    and both results ANDed.
|		2.  Left side evaluated.  Right side evaluated.
|		    Results ANDed.
|		3.  Right side evaluated.  If OFF, evaluation stops;
|		    result is OFF.  If not, left side evaluated, and
|		    both results ANDed.
|
|	|	1.  Left side evaluated.  If ON, evaluation stops;
|		    result is ON.  If not, right side evaluated,
|		    and both results ORed.
|		2.  Left side evaluated.  Right side evaluated.
|		    Results ORed.
|		3.  Right side evaluated.  If ON, evaluation stops;
|		    result is ON.  If not, left side evaluated, and
|		    both results ORed.
+---------------------------------------
What about if either the left side expression or the right side
expression contained a side-effect (or a procedure call, which also may
have a side-effect)?  These cannot be short-circuited when bit-wise
operators are used.

	-- Tim Olson
	Advanced Micro Devices

	"We plan products, not lunches"

tps@sdchem.UUCP (01/25/87)

In article <5178@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>...
>    [For these, read `all zeroes bit pattern' for `OFF', and `all ones
>     bit pattern' for `ON'.  This is a text compression device.]
>
>      Given:	Possible methods of evaluation:
>      ------	------------------------------
>	&	1.  Left side evaluated.  If OFF, evaluation stops;
>		    result is OFF.  If not, right side evaluated,
>		    and both results ANDed.
>		2.  Left side evaluated.  Right side evaluated.
>		    Results ANDed.
>		3.  Right side evaluated.  If OFF, evaluation stops;
>		    result is OFF.  If not, left side evaluated, and
>		    both results ANDed.
>
>	|	1.  Left side evaluated.  If ON, evaluation stops;
>		    result is ON.  If not, right side evaluated,
>		    and both results ORed.
>		2.  Left side evaluated.  Right side evaluated.
>		    Results ORed.
>		3.  Right side evaluated.  If ON, evaluation stops;
>		    result is ON.  If not, left side evaluated, and
>		    both results ORed.
>In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)

I don't think so.  "&" and "|" are like "+" and "-" in that you are not
guaranteed that the left side will be evaluated before the right.  Read
K&R sections 7.8-10, p. 190:
	"The [&|^] operator is associative and expressions involving [&|^]
	may be rearranged." 
Otherwise, how could an optimizing compiler change

	(a | b)  &  (b | c)

to

	(a & c) | b

or fold constants together as in changing

	(a | CONST1) & (a | CONST2)

to

	a | CONST3

where CONST3 is (CONST1 & CONST2).
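
For what it's worth, the regrouping above is a genuine bitwise identity;
here is a throwaway check (not from the original article) over all 3-bit
values:

    #include <stdio.h>

    int main(void)
    {
        unsigned a, b, c;

        for (a = 0; a < 8; a++)
            for (b = 0; b < 8; b++)
                for (c = 0; c < 8; c++)
                    if (((a | b) & (b | c)) != ((a & c) | b))
                        printf("differ at %u %u %u\n", a, b, c);
        printf("done\n");   /* prints only "done": the identity holds bit by bit */
        return 0;
    }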
	
I agree with you that the evaluation might be short circuited if the result
is already known.  However, I don't think this is guaranteed,
it might happen in the reverse order (right operand evaluated,
left operand skipped),
and it is guaranteed not to happen if the to-be-skipped operand has a
side effect.

|| Tom Stockfisch, UCSD Chemistry	tps%chem@sdcsvax.UCSD

chris@mimsy.UUCP (01/26/87)

In article <14479@amdcad.UUCP> tim@amdcad.UUCP (Tim Olson) writes:
>What about if either the left side expression or the right side
>expression [of an otherwise short-circuitable expression] contained
>a side-effect (or a procedure call, which also may have a side-
>effect)?  These cannot be short-circuited when bit-wise operators
>are used.

	c = *p++ & *q++;
	/* are both p and q always incremented? */

I can find no promise in K&R that bitwise expressions are not short
circuited even in the presence of side effects.  The ANSI draft
may have more to say.  In any case, I would advise not counting on
full evaluation.  It is even conceivable that a compiler might
generate

	if ((c = *p) != 0)
		c &= *q;
	p++;
	q++;

which, if p and q point to device registers, is not the same!

In general, existing C compilers will not attempt to optimise
away the AND above if *p++ (or *q++) is zero, simply because the
`C philosophy' is that if you *meant*

	c = *p++ ? p[-1] & *q++ : 0;

you would have written that.  But is it mandated?  I cannot say.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

chris@mimsy.UUCP (01/26/87)

In article <621@sdchema.sdchem.UUCP> tps@sdchem.UUCP (Tom Stockfisch) writes:
>"&" and "|" are like "+" and "-" in that you are not guaranteed that
>the left side will be [fully] evaluated before the right [or vice versa].
>Read K&R sections 7.8-10, p. 190:
>	"The [&|^] operator is associative and expressions involving [&|^]
>	may be rearranged." 

This is a good point too.  It gets worse all the time! :-)

>I agree with you that the evaluation might be short circuited if the result
>is already known.  However, I don't think this is guaranteed,

Of course not.  Intuitive languages make few guarantees.... :-)

>it might happen in the reverse order (right operand evaluated,
>left operand skipped),

That was one of the possibilities I enumerated.

>and it is guaranteed not to happen if the to-be-skipped operand has a
>side effect.

Is it?

(This is one reason optimising compilers are so difficult to write:
No one is sure just how much they are allowed to do....)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

m5d@bobkat.UUCP (01/28/87)

In article <5178@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>>>[ he's quoting me here ]
>>>>These are bitwise logical operators.  Short-circuiting these makes
>>>>no more sense than short-circuiting a sequence of multiplies as
>>>>soon as one of the operands evaluates to `1'.
>>>>Mike McNally                                    Digital Lynx Inc.
>
>Not so!
>

I know, I know, I'm sorry, I'm sorry!

> [ 
>   Chris then goes on to accurately describe the possible
>   paths of evaluation for |, &, ||, and &&, although he
>   omits ^, which is OK I think, because you can't say
>   much about ^ given an intermediate result
> ]
>
>In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
>UUCP:  seismo!mimsy!chris  ARPA/CSNet: chris@mimsy.umd.edu

I realize that my first posting(s? I can't remember how many I did)
were a little confused.  I do know what's going on, and would like to
ask a related question.

If the compiler decides to generate code that notices when part of a
sequence of sibling `|' or `&' operations have generated an inevitable
result, do side effects of sub-expressions still have to be performed?
In other words, let's look at this:

    x = f1(something) | f2(something) | f3(something) ...

If I wrote this, I would expect all the functions to be called, even if
one of them returns ~0.  Am I nuts?  Seems to me that an optimization
like that would be naughty.

As I look at that example, I realize that people might still think I'm
confused.  Try this example:

    x = f1(something) * f2(something) * f3(something) ...

Shouldn't I expect all the multiplications to be performed, even if one
function returns zero?

I don't think I've ever used a compiler that does such optimizations
(not that that really means much).  Has anyone else?

If I actually *am* lost, please don't tell my employer.

--
****                                                         ****
**** At Digital Lynx, we're almost in Garland, but not quite ****
****                                                         ****

Mike McNally (feeling sheepish nowadays)        Digital Lynx Inc.
Software (not hardware) Person                  Dallas  TX  75243
uucp: {texsun,killer,infotel}!pollux!bobkat!m5d (214) 238-7474

jpn@teddy.UUCP (01/29/87)

Why is there so much misinformation in this group?!?!  I wish people who
don't know what they are talking about would not post here!

>I can find no promise in K&R that bitwise expressions are not short
>circuited even in the presence of side effects.

60 seconds with my copy of K&R yields this (Appendix A):

  "The & operator is associative and expressions involving & may be rearranged"

  "Unlike &, && guarantees left-to-right evaluation; moreover, the second
   operand is not evaluated if the first operand is zero"

>In any case, I would advise not counting on full evaluation (of bitwise). 

True, it is not explicitly stated that & and | are not short circuited.  It
also doesn't explicitly say that + and * are not short circuited!  However
&& and || are EXPLICITLY defined as having short-circuit behavior; no other
operators are described this way, so it is safe to assume that all other
operators do NOT short-circuit.
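
A minimal example (added here, not John's) of code that leans on that
explicit && guarantee:

    /* Safe only because && is explicitly defined to stop after the first
     * operand when it is zero; nothing of the sort is promised for &.
     */
    int mean_exceeds(int total, int n, int limit)
    {
        return n != 0 && total / n > limit;
    }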

henry@utzoo.UUCP (Henry Spencer) (01/31/87)

>     x = f1(something) * f2(something) * f3(something) ...
> 
> Shouldn't I expect all the multiplications to be performed, even if one
> function returns zero?
> I don't think I've ever used a compiler that does such optimizations...
> ...  Has anyone else?

Yeah, some variants of the V7 pdp11 C compiler did, in cases where one of
the operands was a compile-time constant known to be 0.  (This can arise
legitimately when doing arithmetic on configuration-dependent #defined
constants, for example.)  Never bothered me, since I view side effects
inside expressions as being unjustifiable pornography except for some very
specific cases.  I did ask Dennis about it at one point, since strict
reading of K&R suggested it was a bug.  His reply, as I recall it, was
roughly "I think it is defensible in principle, but it caused so many
complaints that newer versions of the compiler don't do it".
-- 
Legalize			Henry Spencer @ U of Toronto Zoology
freedom!			{allegra,ihnp4,decvax,pyramid}!utzoo!henry

henry@utzoo.UUCP (Henry Spencer) (02/03/87)

> I did ask Dennis about [short-circuiting of multiplies with a 0 operand]
> ... His reply, as I recall it, was
> roughly "I think it is defensible in principle, but it caused so many
> complaints that newer versions of the compiler don't do it".

I should add that X3J11 has come down quite firmly on the pragmatic side
of this issue.  With the obvious exceptions of && and friends, which are
explicitly short-circuited, the compiler is not permitted to leave out
side effects of expressions.  (Personally, as you might have gathered from
my earlier posting, I like the purist approach to this one, but I realize
how much havoc it could cause...)
-- 
Legalize			Henry Spencer @ U of Toronto Zoology
freedom!			{allegra,ihnp4,decvax,pyramid}!utzoo!henry

jas@rtech.UUCP (02/03/87)

Apropos of whether it is legitimate to short-circuit an expression
evaluation, when a short-circuited operand has side effects, Henry
Spencer writes:
> Yeah, some variants of the V7 pdp11 C compiler did, in cases where one of
> the operands was a compile-time constant known to be 0....
> Never bothered me, since I view side effects
> inside expressions as being unjustifiable pornography except for some very
> specific cases.  I did ask Dennis about it....  His reply ... was
> roughly "I think it is defensible in principle, but it caused so many
> complaints that newer versions of the compiler don't do it".

I don't know, Henry, I think you and Dennis are on thin ice here.  In
Pascal, you're right; but C is close to being a pure expression
language, in which the only way to change machine state is by a side
effect.  Three examples:  (1) An assignment is an expression; as a side
effect, it stores the value of its right operand in the memory location
represented by the left operand.  (2) The comma is a binary operator whose
semantics are:  evaluate the left operand, discard the result, then
evaluate the right operand, whose value is the value of the whole
expression.  Obviously, this is only useful if the left operand has
side effects.  If a short-circuiting compiler decides never to evaluate
the left operand, because it will not affect the value of the whole
expression, then the comma operator becomes meaningless.  (3) The
most common form of statement in C is (syntactically), "<stmt> ::=
<expression> ';'".  The semantics associated with this production are,
"evaluate the expression and discard the result."  If a compiler
writer were to decide to short-circuit the evaluation, since the result
is being discarded, s/he would end up with a pretty useless compiler.

Now, one might say, "It is poor style for an expression with side
effects to itself be an operand of an expression other than a "comma"
expression."  (Though this prohibits the ever-popular "if ((c = getchar())
!= EOF)".)  But though this may be a valid stylistic constraint, I
would hate to see it embedded in the semantics of the language.
-- 
Jim Shankland
 ..!ihnp4!cpsc6a!\
                  rtech!jas
..!ucbvax!mtxinu!/

franka@mmintl.UUCP (02/04/87)

In article <3721@teddy.UUCP> jpn@teddy.UUCP (John P. Nelson) writes:
>60 seconds with my copy of K&R yields this (Appendix A):
>
>  "The & operator is associative and expressions involving & may be
>   rearranged"
>
>  "Unlike &, && guarantees left-to-right evaluation; moreover, the second
>   operand is not evaluated if the first operand is zero"
>
>True, it is not explicitly stated that & and | are not short circuited.  It
>also doesn't explicitly say that + and * are not short circuited!  However
>&& and || are EXPLICITLY defined as having short-circuit behavior; no other
>operators are described this way, so it is safe to assume that all other
>operators do NOT short-circuit.

As a general rule, whatever is not specified in the C description (whether K&R
or ANSI) is just that: not specified.  Just because certain operators are
described as having certain behavior does not guarantee that those operators
for which the behavior is not specified behave in the opposite way.

I would not write code which assumes that & and | are not short-circuited;
nor would I write code which assumes that + and * are not short-circuited.
(Although + really can't be.)  But then, I use side-effects of expressions
only in time-critical portions of the code, and only when the side effect
is *guaranteed* to work by the language specification or in pieces of code
marked as machine dependent.  (However, my tendency is to write machine
dependent code in assembler.)
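
One way to write code that does not depend on the question either way is to
hoist the side effects into statements of their own; a sketch (read_flags()
is a made-up side-effecting call):

    static int counter;     /* counts how many times read_flags() ran */

    static int read_flags(int which)
    {
        counter++;          /* the side effect we care about */
        return which ? 0x0f : 0xf0;
    }

    int combined_flags(void)
    {
        int f1, f2;

        /* Each call is a full statement, so its side effect happens no
         * matter how the | below might be evaluated or rearranged.
         */
        f1 = read_flags(0);
        f2 = read_flags(1);
        return f1 | f2;
    }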

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

mouse@mcgill-vision.UUCP (02/10/87)

In article <637@rtech.UUCP>, jas@rtech.UUCP (Jim Shankland) writes:
> Now, one might say, "It is poor style for an expression with side
> effects to itself be an operand of an expression other than a "comma"
> expression."  (Though this prohibits the ever-popular
> "if ((c = getchar()) != EOF)".)

It also prohibits *ptr++ (which I don't think is excessive).

					der Mouse

USA: {ihnp4,decvax,akgua,utzoo,etc}!utcsri!mcgill-vision!mouse
     think!mosart!mcgill-vision!mouse
Europe: mcvax!decvax!utcsri!mcgill-vision!mouse
ARPAnet: think!mosart!mcgill-vision!mouse@harvard.harvard.edu

hoey@nrl-aic.arpa (02/12/87)

    From: Chris Torek <mimsy!chris>
    Message-Id: <5199@mimsy.UUCP>
    Date: 26 Jan 87 02:17:54 GMT

    ...
	    c = *p++ & *q++;
	    /* are both p and q always incremented? */

    I can find no promise in K&R that bitwise expressions are not short
    circuited even in the presence of side effects.  The ANSI draft
    may have more to say.  In any case, I would advise not counting on
    full evaluation.

K&R's note that ``The order in which side-effects take place is ...
unspecified'' conspicuously fails to lend credence to this point of
view.  Why should they unspecify the order, and not unspecify whether
side-effects take place at all?  It seems to me that the idea of
bitwise operators (or other operators lacking sequence breaks) not
performing all their side-effects is so counterintuitive that it should
be outlawed.  At the very least, a prominent warning must be included
in the standard.  Since this is a contentious issue, a standard silent
on the point is no standard.

If compilers are allowed to fail to increment both p and q, I would
assume that they are also allowed to short-circuit evaluation of *, /,
%, <<, >>, >=, and <= when the result is known by evaluation of one
argument.  This may also occur in evaluation of <, >, ==, and != when
one of the arguments is being promoted from a restricted range.  Shall
we perhaps leave unspecified whether

	*p++ = 0;

increments p when the compiler can deduce that *p is already 0?

      It is even conceivable that a compiler might generate

	    if ((c = *p) != 0)
		    c &= *q;
	    p++;
	    q++;

    which, if p and q point to device registers, is not the same!

I find this perfectly acceptable, due to the above quote from K&R.  If
you are accessing device registers in C, you are expected to know
what the compiler is going to do with your code.  If you have an
optimizing compiler, you also have to worry about dead variables and
code movement for code in which pointer dereference can cause
side-effects.

    From: Henry Spencer <utzoo!henry>
    Message-Id: <7588@utzoo.UUCP>
    Date: 30 Jan 87 22:08:26 GMT

    ...  Never bothered me, since I view side effects inside expressions
    as being unjustifiable pornography except for some very specific cases.
    I did ask Dennis about it at one point, since strict reading of K&R
    suggested it was a bug.  His reply, as I recall it, was roughly "I
    think it is defensible in principle, but it caused so many complaints
    that newer versions of the compiler don't do it".

Are your cases of justifiable pornography sufficiently specific that
they should appear in the standard?  I believe the standard must be
precise here, and the ``so many complaints'' lead me to believe that
the answer must be in favor of guaranteed execution of side-effects.

Dan Hoey

chris@mimsy.umd.edu (02/12/87)

Dan Hoey <hoey@nrl-aic.arpa> writes:
>    From: Henry Spencer <utzoo!henry>
>
>    I did ask Dennis about [a PDP-11 C compiler short circuiting
>    multiplies by constant zeroes, thus discarding side effects]
>    at one point, since strict reading of K&R suggested it was a
>    bug.  His reply, as I recall it, was roughly "I think it is
>    defensible in principle, but it caused so many complaints that
>    newer versions of the compiler don't do it".
>
>Are your cases of justifiable pornography sufficiently specific that
>they should appear in the standard?  I believe the standard must be
>precise here, and the ``so many complaints'' lead me to believe that
>the answer must be in favor of guaranteed execution of side-effects.

I would prefer to see some guarantee that side effects will occur.
Guarantees as to precisely *when* are problematical, and since such
have never been made, should not be part of the standard.  So let
us say that

	if (*p++ & *q++)

and

	a = *p++ * *q++;

and even

	a = 0 * *p++ * *q++;

must always increment both p and q, but not in any particular order,
so long as the memory references use the values p and q have before
they are incremented.

Should we also say that the memory references in the third expression
must occur?  If not, the compiler can optimise this into

	p++, q++, a = 0;

but if so, must the compiler also do the multiply?  (It might
overflow, which could be important.)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

chris@mimsy.UUCP (02/12/87)

Concerning guarantees as to evaluation, in article <4391@brl-adm.ARPA>
I wrote:
>	a = 0 * *p++ * *q++;
>
>Should we also say that the memory references in the third expression
>must occur? ... if so, must the compiler also do the multiply?  (It might
>overflow, which could be important.)

Bad example; the compiler is certainly allowed to evaluate (0 * *p++)
giving 0, then 0 * *q++ giving 0, which does not overflow.  Suppose instead
we have

	double *p, r;
	...
	r = *p++ / 1.0;

Must this do the divide, possibly causing (e.g.) a reserved operand
fault on a Vax due to an invalid float value at *p?
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

john@viper.UUCP (02/13/87)

I agree totally with Dan Hoey's statement on this subject.  There is -FAR-
too much code already existing which requires non-shortcircuit evaluation
of '|' and '&' operations.  Since one of the primary considerations that
has been stated by the standards committee was to avoid "breaking" as much
existing software as possible, and since, as Dan pointed out, "a standard
which is silent on this point is no standard", it seems that at the very least
all side effects -must- be carried out.  As long as the code produces the
same results, I doubt any of us will object if the compiler creates some
bizarre method of getting things done in the least amount of time.
  What this means to me is that code written in optimized C, where side
effects are known and taken into account, won't break just because some
tight loop got short circuited and an increment got skipped.  This kind
of code exists -everywhere-.  If we allow side effects like increment
or function calls to be skipped in one "optimized" compiler and not in
the standard run-of-the-mill type compilers that most of us use, the standard
itself will have introduced portability problems by an omission...
  If the compiler people want to create code which stops evaluating the
results after a fixed result is known, I say fine.  BUT, they must take
the responsibility of picking up the pieces and making sure the incidental
increments, etc. are taken care of!

mckeeman@wanginst.UUCP (02/16/87)

The discussion about side effects is the tip of a very subtle iceberg.  I
would like to separate concerns as follows: visible side effects are those
which are reflected in specific C constructs (essentially all of these are
assignments of one kind or another imbedded within larger expressions or
hidden in arithmetic tautologies), and invisible side effects which are
properties of the underlying hardware (this is a mixed bag including setting
condition codes, causing overflow, causing normalization of floating point
numbers by known hardware hacks, going through states that cannot be 
unwound by a symbolic debugger, and so on).

The compiler writer is surely allowed any optimization that will not
change the program results.  Some will work harder on this than others
but the programmer should not care.  The question arises when an unlikely
set of circumstances would change the results (like aborting on overflow).
This is a hard problem because the compiler writer cannot know what the
programmer is looking for.  If, for instance, run-time is a legitimate
"result", then no optimization can be allowed at all.

The safest approach is to warn the programmer when the compiler is about to
throw away some of his/her carefully crafted C.  A value that is computed,
and not used is a significant programming error, not an optimization
opportunity.  With that approach, the following fragment would generate
a single store of 0 to x and several warning messages about discarded code
for the visible C commands  /, +, *, *p and ++.

x = 0; x /= 1; x += 0 * *p++;

I regard this approach as an indication that the compiler writer respects the
users of their product; the programmer would not write code unless the
programmer expected it to matter.  A numerical analyst once complained
bitterly that my throwing away the assignment
X = X + 0.0
prevented him from normalizing floating point numbers on the IBM 7090.  I was
unable to give him advice on what he could have written that I could not (at
least in theory) have optimized away.  And he didn't want to stand on his
head to get numbers normalized, either.  Enough said: compiler writers cannot
know what is in the mind of the programmer, and anyway do not want to be
forced to make all their object code worse because of rarely occurring
special cases.

The invisible side effects are more difficult to deal with at the language
definition level.  I often hear complaints, backed by debugger output, that
some assignment never happens.  True enough, the value is held in a register
for use until context exit, and never ends up in memory.  There is no
consequence except for a bedeviled programmer trying to find a bug with
optimizer-obscured clues.  In this case the program is not being looked at as
an input/output mapping, but rather as a living thing to be examined in vivo.

A solution to both problems is some sort of construct in C that says "from
here to here, do everything by the abstract machine rules".  Associate
left-to-right, evaluate operands left-to-right, do stores immediately, and so
on.  This would make the unwelcome warning messages go away (because the
compiler would not be throwing away the code after all) and allow the rare
but necessary forcing of non-optimal code.
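
No such construct exists; the nearest thing available (a sketch added here,
not McKeeman's proposal) is to route the value through a volatile object,
whose accesses may not be thrown away under the draft's rules:

    /* The fragment above, rewritten so the compiler cannot fold it into a
     * single store of 0: each read and write of x is an access to a
     * volatile object and must actually be performed, statement by
     * statement.
     */
    int keep_the_steps(int *p)
    {
        volatile int x;

        x = 0;
        x /= 1;
        x += 0 * *p++;
        return x;
    }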
-- 
W. M. McKeeman            mckeeman@WangInst
Wang Institute            decvax!wanginst!mckeeman
Tyngsboro MA 01879

mccaugh@uiucdcsm.UUCP (02/21/87)

 This is not submitted as a criticism of the foregoing erudite discussion of
 side-effects, but is rather an innocent question about C in particular: why
 does C refuse to abide by the associativity/precedence rules for expression-
 evaluation that even BASIC guarantees? I can well understand "optimization"
 as an excuse but can easily imagine cases where normally-evaluated expressions
 can crash a system when "optimized" for evaluation without the programmer's
 express consent. Isn't it a little arbitrary for C to manipulate the parts
 of an expression to its satisfaction (or whim)? In particular, this renders
 the formal verification of C-code impractical.

michael@crlt.UUCP (02/22/87)

[Line eater shouldn't eat: no leading whitespace - but why take chances?]

In article <7610@utzoo.UUCP>, henry@utzoo.UUCP writes:
> 
> I should add that X3J11 has come down quite firmly on the pragmatic side
> of this issue.  With the obvious exceptions of && and friends, which are
> explicitly short-circuited, the compiler is not permitted to leave out
> side effects of expressions.

Let me applaud X3J11 on this issue.  Regardless of whether side-effects
are allowed to be optimized out or not, conforming compilers should all
do things predictable from the standard.  Thus they should all either
squeeze them out or not.  Otherwise you can all-too-easily write code
that works with some conforming compilers and not others - which is
exactly what a standard is intended to prevent, or at least reduce and
make it easy to avoid.

When a programmer writes a statement with side-effects, it is probably
because he wanted them to happen.  Therefore, the logical choice is to
force the compiler to let them happen, rather than to force it to squeeze
them out.

(In the same vein, perhaps my biggest objection to c is the possibility
that associative operators may be rearranged, even if I try to coerce
the order with parenthesis.  This means that if, say, I want to coerce
multiplies and divides into a certain order to insure against overflow,
I must break expressions into several statements.)
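
Spelled out, that workaround looks something like this (a sketch; the
particular product is made up):

    /* The intent is a * (b * c), because b * c is known to stay in range
     * while a * b might overflow.  Since parentheses alone may not pin the
     * grouping down, the value that must be formed first gets a statement,
     * and a named object, of its own.
     */
    long scaled(long a, long b, long c)
    {
        long t;

        t = b * c;      /* forced to happen first, as its own statement */
        return a * t;
    }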

===========================================================================
  "I've got code in my node."	| UUCP:  ...!ihnp4!itivax!node!michael
				| AUDIO: (313) 973-8787
	Michael McClary		| SNAIL: 2091 Chalmers, Ann Arbor MI 48104
---------------------------------------------------------------------------
Above opinions are the official position of McClary Associates.  Customers
may have opinions of their own, which are given all the attention paid for.
===========================================================================

henry@utzoo.UUCP (Henry Spencer) (02/24/87)

>     I can find no promise in K&R that bitwise expressions are not short
>     circuited even in the presence of side effects.  The ANSI draft
>     may have more to say.  In any case, I would advise not counting on
>     full evaluation.
> 
> K&R's note that ``The order in which side-effects take place is ...
> unspecified'' conspicuously fails to lend credence to this point of
> view.  Why should they unspecify the order, and not unspecify whether
> side-effects take place at all?

You are using what is sometimes known as the "Biblical Theory of Standards":
if the standard is by definition complete, an omission is highly significant
and may be taken to imply answers to questions that the standard does not
explicitly address.  In real life, "insufficient data" is a legitimate answer
to a question.  The fact is, K&R simply doesn't discuss the issue.  Your
reasoning does *suggest* that the *intent* was to require side effects, but
the fact remains that the document DOES NOT RESOLVE THE QUESTION.

In response to this, one can take one of two approaches.  One may assume that
all implementors will read the same message between the lines, and thus it
is proper to rely on side effects being present.  Alternatively, if one has
a bit less faith in the interlineal reading ability of implementors, one has
no choice but to assume that "no specification" means "unspecified".

> It seems to me that the idea of
> bitwise operators (or other operators lacking sequence breaks) not
> performing all their side-effects is so counterintuitive that it should
> be outlawed...

Why so?  I know of at least one language which explicitly says, right up
front, that no such assumptions are allowed and the compiler is free to
short-circuit anything it pleases.  This will affect programming style in
small ways (in my view for the better), but it's not an unthinkable heresy.
The language in question is Bliss, which has been quite successful in its
limited niche.

> ...Shall we perhaps leave unspecified whether
> 	*p++ = 0;
> increments p when the compiler can deduce that *p is already 0?

Whether one thinks this is legitimate depends on fine points of semantics;
you are trying to push the line of argument beyond the point where most
people would defend it.  The crucial observations here are that (a) the
value of the = operator is not being used, so its whole purpose is a side
effect, and (b) the postfix ++ operator likewise exists solely to cause a
side effect.  This isn't quite the same sort of animal as the situations
we were originally discussing.

> I find this perfectly acceptable, due to the above quote from K&R.  If
> you are accessing device registers in C, you are expected to know
> what the compiler is going to do with your code...

Actually, the right solution to this is X3J11's "volatile" modifier for
types.  (They haven't handled it quite right, but the underlying idea is
correct.)  Then you do not need to be intimate with the compiler, because
you can explicitly tell the compiler "hands off this one".
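
A sketch of what that might look like (the register addresses are
illustrative only):

    /* With the draft's volatile qualifier, neither fetch may be skipped,
     * cached, or optimised away, so both device registers really are read.
     */
    #define CSR0 ((volatile unsigned short *)0177560)
    #define CSR1 ((volatile unsigned short *)0177564)

    unsigned poll_both(void)
    {
        return *CSR0 & *CSR1;   /* both reads must be performed */
    }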

> Are your cases of justifiable pornography sufficiently specific that
> they should appear in the standard?...

Probably not; I've never bothered pinning them down in detail.

> ...the standard must be precise here, and the ``so many complaints'' lead
> me to believe that the answer must be in favor of guaranteed execution of
> side-effects.

This is in fact what X3J11 has done.  I concede the practical necessity while
finding it aesthetically displeasing.
-- 
Legalize			Henry Spencer @ U of Toronto Zoology
freedom!			{allegra,ihnp4,decvax,pyramid}!utzoo!henry

greg@utcsri.UUCP (Gregory Smith) (02/24/87)

In article <4700004@uiucdcsm> mccaugh@uiucdcsm.UUCP writes:
>
> This is not submitted as a criticism of the foregoing erudite discussion of
> side-effects, but is rather an innocent question about C in particular: why
> does C refuse to abide by the associativity/precedence rules for expression-
> evaluation that even BASIC guarantees? I can well understand "optimization"
> as an excuse but can easily imagine cases where normally-evaluated expressions
> can crash a system when "optimized" for evaluation without the programmer's
> express consent. Isn't it a little arbitrary for C to manipulate the parts
> of an expression to its satisfaction (or whim)? In particular, this renders
> the formal verification of C-code impractical.

C does abide by associativity and precedence rules, which are laid
out in the appendix of K&R. A-B+C means (A-B)+C and not A-(B+C).
A*B+C/D means (A*B)+(C/D), and not A*((B+C)/D).

What you are complaining about is sequence of evaluation. If I write
a=b*c+d*e, do I care which multiply is done first? The reason you have
not seen anybody complaining about this in other languages, is that
these other languages do not have side-effects in expressions (at least
not to the same extent). If I write a=foo()+bar(), I may indeed care
whether foo() is called before bar(), in which case I *won't write
that*.
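
That is, when the order does matter, the calls get statements of their own
(a sketch added here; these particular definitions of foo() and bar() are
made up):

    static int first;   /* records which call ran first, for illustration */

    static int foo(void) { if (first == 0) first = 1; return 10; }
    static int bar(void) { if (first == 0) first = 2; return 20; }

    int ordered_sum(void)
    {
        int x, y;

        x = foo();      /* the statement boundary guarantees foo() runs first */
        y = bar();
        return x + y;
    }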

The problem has arisen through the notation we use in writing
expressions. An expression has a tree structure, but it is written in a
linear left to right fashion. Associativity and precedence rules serve
merely to define how an expression tree is extracted from its linear
form. Unfortunately, the linear form imposes an artificial ordering
on an expression. The '+' operator is perfectly commutative, so that A+B
and B+A are the same. Unfortunately, you have to write either A or B first.
Even an operator like / has the same problem: if there were an 'under'
operator \ you could have your choice of A/B or B\A; the justifiable lack
of a \ operator forces you to write A before B even though they are at
equal levels in the expression tree.

The philosophy of the C language is that these limitations of the linear
notation should not be allowed to restrict code generation. In an
expression where two subexpressions X and Y must be added together,
it may be vastly more efficient to evaluate Y before X. The programmer
is not able to determine this, and is nevertheless forced to write one
before the other.

Putting it another way, a tree which represents an expression may
be converted to linear form ( traversed ) in many ways. Some ways
will be more efficient for evaluation, and that is hopefully what
the compiler will generate. Some ways may be more useful for human
reading, and that is what the programmer will write ( E.g. in a divide,
the human always writes the dividend before the divisor, due only to
lack of the aforementioned '\' operator ). As long as the same tree is
involved in both cases, what's wrong with that?

It is worth noting at this point that the parentheses '(' and ')' serve
only in the process of converting a linear expression to a tree. Thus
(a+b)*c is different from a+b*c but a+(b+c) can be treated the same as
(a+b)+c. Simply allow your tree to contain a node which forms the sum of
its three, equally ranked, children, and throw in the fact that
redundant ()'s grow in C like mushrooms. Then a+(b+c) should be treated
as a+b+c or SUM(a,b,c). ( I know, there are overflow considerations).

If the expression tree contains no side-effects, it will make no
difference in which order it is evaluated (just like in BASIC).
However, if one branch of the tree changes the value of X and another
branch uses the value of X, the result will depend on which branch
is done first. Some would say, "the compiler should detect such cases
and then use the order of the original expression". However, it
is *impossible* to detect all of these cases at compile time.
So the programmer is simply warned of the cases where expressions
may be rearranged, and avoids expressions which work differently
depending on how they are arranged.
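
A tiny example of such an expression (added here, not Greg's):

    int order_dependent(void)
    {
        int a[2];
        int i = 0;

        a[0] = 10;
        a[1] = 20;
        /* One operand assigns to i, the other uses it; the language does
         * not say which happens first, so the result may be 11 or 21.
         * This is exactly the sort of expression to avoid writing.
         */
        return a[i] + (i = 1);
    }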

Note that these cases (where results are undefined) are *well defined*
by the language, so you know what to avoid.

[ I know this is an oversimplification and doesn't deal with delaying
of side-effects, or with sequence points, etc., but I wanted to explain
this stuff to those who just can't fathom what all the fuss is about and
why the damn compilers don't just do what they're told ]

-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...

john@viper.UUCP (03/02/87)

In article <4211@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
 >
 >It is worth noting at this point that the parentheses '(' and ')' serve
 >only in the process of converting a linear expression to a tree. Thus
 >(a+b)*c is different from a+b*c but a+(b+c) can be treated the same as
 >(a+b)+c. Simply allow your tree to contain a node which forms the sum of
 >its three, equally ranked, children, and throw in the fact that
 >redundant ()'s grow in C like mushrooms. Then a+(b+c) should be treated
 >as a+b+c or SUM(a,b,c). ( I know, there are overflow considerations).
 >
 >----------------------------------------------------------------------
 >Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
 >Have vAX, will hack...

The example Greg provides here is misleading.  

If you have three expressions:
	1:  a+b+c
	2: (a+b)+c
	3:  a+(b+c)
all three of them should not be evaluated the same way.  Greg implies that
they should be.  This is not so.  When an equation contains '(' and ')'
it intentionally (and explicitly) defines the parse tree structure that will
result.  The statement "redundant ()'s grow in C like mushrooms" may be true,
but it doesn't give anyone the right to arbitrarily ignore explicit cues
to the compiler.  When I don't care, I don't use them.  When I do, I do
so for a reason..........

  Since there is additional, undesirable, and unnecessary overhead in the
detection of this SUM(a1,a2,a3...,aN) special case, and since there appears
to be little or no advantage to doing so (you have to add them up in -some-
order, you might as well let the programmer decide as anyone), why bother?

--- 
John Stanley (john@viper.UUCP)
Software Consultant - DynaSoft Systems
UUCP: ...{amdahl,ihnp4,rutgers}!{meccts,dayton}!viper!john

m5d@bobkat.UUCP (03/05/87)

In article <609@viper.UUCP> john@viper.UUCP (John Stanley) writes:
>
>The example Greg provides here is misleading.  
>
>If you have three expressions:
>	1:  a+b+c
>	2: (a+b)+c
>	3:  a+(b+c)
>all three of them should not be evaluated the same way.  Greg implies that
>they should be.  This is not so.  When an equation contains '(' and ')'
>it intentionally (and explicitly) defines the parse tree structure that will
>result.  The statement "redundant ()'s grow in C like mushrooms" may be true,
>but it doesn't give anyone the right to arbitrarily ignore explicit cues
>to the compiler.  When I don't care, I don't use them.  When I do, I do
>so for a reason..........
>
>  Since there is additional, undesirable, and unnecessary overhead in the
>detection of this SUM(a1,a2,a3...,aN) special case, and since there appears
>to be little or no advantage to doing so (you have to add them up in -some-
>order, you might as well let the programmer decide as anyone), why bother?
>
>--- 
>John Stanley (john@viper.UUCP)
>Software Consultant - DynaSoft Systems
>UUCP: ...{amdahl,ihnp4,rutgers}!{meccts,dayton}!viper!john

Sorry, but the compiler writer DOES have the right to use algebraic
manipulation to improve code.  Why?  Because the specification of the
language says so.

There is some difference between "a + (b + c)" and "(a + b) + c":

        +               +
       / \             / \
      a  +            +   c
        / \          / \
       b   c        a   b

A typical code generation scheme will prefer one to the other (one
saves a register).

What happens in other languages (like FORTRAN) if a programmer doesn't
want code re-arranged?  At least in C, the compiler is not supposed to
mess with stuff across statement boundaries.  In a language like
FORTRAN or Pascal, the compiler is liable to do common subexpression
elimination across an entire basic block (or maybe even globally), not
to mention loop unravelling, code hoisting, induction variable
elimination, and removal of dead code (a real bad one for you device
driver fans).  Since lots of people seem outraged at the license given
to C compilers, perhaps they should give examples of mechanisms which
allow optimization to be prevented in other languages.  I would hope
that such mechanisms are portable.



-- 
Mike McNally, mercifully employed at Digital Lynx ---
    Where Plano Road the Mighty Flood of Forest Lane doth meet,
    And Garland fair, whose perfumed air flows soft about my feet...
uucp: {texsun,killer,infotel}!pollux!bobkat!m5d (214) 238-7474

greg@utcsri.UUCP (Gregory Smith) (03/06/87)

In article <609@viper.UUCP> john@viper.UUCP (John Stanley) writes:
>The example Greg provides here is misleading.  
>
>If you have three expressions:
>	1:  a+b+c
>	2: (a+b)+c
>	3:  a+(b+c)
>all three of them should not be evaluated the same way.  Greg implies that
>they should be.  This is not so.  When an equation contains '(' and ')'
>it intentionally (and explicitly) defines the parse tree structure that will
>result.  The statement "redundant ()'s grow in C like mushrooms" may be true,
>but it doesn't give anyone the right to arbitrarily ignore explicit cues
>to the compiler.  When I don't care, I don't use them.  When I do, I do
>so for a reason..........
>
How do you tell an explicitly defined tree from an implicit one? a+b+c
parses to the same expression tree as (a+b)+c. That could be changed
(i.e. () info could be included in the expr. tree ). But what about
(a+b)*c? There is no way to represent that *expression tree* without
parentheses. How do I then write this in such a way that the compiler is
allowed to rearrange it? My point is that ()'s serve to override the
binding strengths of operators, and thus allow arbitrary expressions to
be constructed. They cannot be used to prevent optimizations,
since they are *required* in many cases, simply to write the expression
in the first place. I want my compiler to be free to change (i+3)*4+6
to i*4+18 (more on this later). If I don't want that, there are means
of avoiding it.

Furthermore, a redundant () may very well not be "intentional", in the
applicable sense of the word, and therefore should not be construed as
"explicit". more later.

>  Since there is additional, undesirable, and unnecessary overhead in the
>detection of this SUM(a1,a2,a3...,aN) special case, and since there appears
>to be little or no advantage to doing so (you have to add them up in -some-
>order, you might as well let the programmer decide as anyone), why bother?
>
Wrong. There is an advantage to doing the sum in an arbitrary order.

First of all, it rarely makes a difference to the result (if it
does, it is the programmer's fault, by definition :-) ).

It allows constants to be folded to the end (a+2)+(b+9)+4 = (a+b)+15.
It allows fewer registers to be used. ((a+b)+(c+d))+((w+x)+(y+z)),
if done literally, requires 3 working values at one point, while a
running sum never requires more than one. The programmer cannot
determine the best order, because the programmer is not writing
code specifically for a certain machine, right? :-) the compiler
is supposed to do that.

Multiple constants in running sums tend to pop up (1) from macro
expansions (2) in expressions like (p+3)->foo.y[i-1].abc ( after the
semantics have been applied ). In the latter case the programmer can't
control them because s/he can't really see them. One could of course
distinguish the programmer-created weirdness from the internally-created
ones, but why not optimize both of them? One cannot, of course,
distinguish macro-created weirdness under the current preprocessor
paradigm.

And once you convert arbitrary additions into rearrangable running sums,
it becomes very attractive to convert things like (i+7)*4 + 2 into
i*4 + 28 + 2 into i*4 + 30. Again, this comes up a *lot* with array
operations, and again, if it breaks a program, that program was probably
doomed anyway.

For more on this sort of thing, see "A Tour through the UN*X C Compiler"
in the Version 7 books.

PS if you don't believe me about the mushrooms, I give you the
expansion of putchar(getchar()):

(--((&_iob[1]))->_cnt>=0? ((int)(*((&_iob[1]))->_ptr++=
(unsigned)((--((&_iob[0]))->_cnt>=0? *((&_iob[0]))->_ptr++&0377:
_filbuf((&_iob[0])))))):_flsbuf((unsigned)((--((&_iob[0]))->_cnt>=0?
*((&_iob[0]))->_ptr++&0377:_filbuf((&_iob[0])))),(&_iob[1])));

Most of those '()'s are in there to enforce precedence against
arbitrary parameters. E.g. if I write

#define INCH_TO_CM(x) x*2.54

then INCH_TO_CM(a+b) becomes a+b*2.54, which is wrong.
To be safe, I have to write

#define INCH_TO_CM(x) ((x)*2.54)

>John Stanley (john@viper.UUCP)
>Software Consultant - DynaSoft Systems
>UUCP: ...{amdahl,ihnp4,rutgers}!{meccts,dayton}!viper!john

-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...

root@sdd.UUCP (Root) (03/07/87)

In article <609@viper.UUCP> john@viper.UUCP (John Stanley) writes:
>
>If you have three expressions:
>	1:  a+b+c
>	2: (a+b)+c
>	3:  a+(b+c)
>all three of them should not be evaluated the same way.  Greg implies that
>they should be.  This is not so.  When an equation contains '(' and ')'
>it intentionally (and explicitly) defines the parse tree structure that will
>result.  The statement "redundant ()'s grow in C like mushrooms" may be true,
>but it doesn't give anyone the right to arbitrarily ignore explicit cues
>to the compiler.  When I don't care, I don't use them.  When I do, I do
>so for a reason..........

	NO!!!!  The new DRAFT ANSI standard explicitly says:

	"An expression involving more than one occurance of the same cummutative
	and associtive binary operator (*, +, &, ^, |) may be regrouped 
	arbitrarily, EVEN IN THE PRESENCE OF PARENTHESES, provided the types of
	the operands or of the results are not changed by this regrouping.  To
	force a particular grouping of operations, either the value of the
	expression to be grouped may be explicitly assigned to an object, or
	grouping parentheses may be preceded by a unary plus operator."


Therefore if one desires to explicitly specify an order, in the new ANSI
standard one can use a unary plus operator as follows:

	1:	+(a + b) + c
	2:	a + (+(b + c))

This should guarantee the grouping!

									Daniel Corbett
									Software Design & Development Corp
									360 Mobil, Suite 103
									Camarillo, CA 93010

drw@cullvax.UUCP (03/11/87)

john@viper.UUCP (John Stanley) writes:
> If you have three expressions:
> 	1:  a+b+c
> 	2: (a+b)+c
> 	3:  a+(b+c)
> all three of them should not be evaluated the same way.  Greg implies that
> they should be.  This is not so.  When an equation contains '(' and ')'
> it intentionally (and explicitly) defines the parse tree structure that will
> result.  The statement "redundant ()'s grow in C like mushrooms" may be true,
> but it doesn't give anyone the right to arbitrarily ignore explicit cues
> to the compiler.  When I don't care, I don't use them.  When I do, I do
> so for a reason..........

Read the opening paragraphs of section 3.3 of X3J11.  The compiler is
allowed to re-associate multiple uses of *, +, &, ^, and |.  If you
want to prevent this, you have to write:

	+(a+b) + c, etc.

The parentheses you wrote aren't explicit cues to the compiler, by
definition.  (*Why* X3J11 did this is another question...)

Dale
-- 
Dale Worley		Cullinet Software
UUCP: ...!seismo!harvard!mit-eddie!cullvax!drw
ARPA: cullvax!drw@eddie.mit.edu
Un*x (a generic name for a class of OS's) != Unix (AT&T's brand of such)

gwyn@brl-smoke.ARPA (Doug Gwyn ) (03/14/87)

In article <903@cullvax.UUCP> drw@cullvax.UUCP (Dale Worley) writes:
>The parentheses you wrote aren't explicit cues to the compiler, by
>definition.  (*Why* X3J11 did this is another question...)

But an easily answered one:
	1.  X3J11's base document was K&R Appendix A.
	2.  Existing practice.
	3.  C is not Fortran.
	4.  Many parentheses are introduced during macro
		expansion, and it is undesirable to prohibit
		optimization of these.
	5.  An explicit mechanism (unary +) was provided that
		can be used to obtain the desired functionality.

gram@uctcs.uucp (Graham Wheeler) (05/24/91)

I've had several responses to the question I posted, one or two of them
being no help at all and just short of abusive. I guess I phrased things
a bit badly, as only one response actually had an answer to the *real*
question. To recap, I wanted to know what ANSI C specified about 
	
	i) short circuit evaluation of Boolean expressions
	ii) order of evaluation of expressions

I was 99% sure that C used short circuit evaluation (thanks to those 
who confirmed this, anyway). My real question, and one which is relevant
to me as I teach a compiler course, was:

  If short circuit evaluation is allowed, can the compiler still
  rearrange expressions? If so, a check like:

	if (p && p->next)...

  could cause a segmentation violation if p is NULL, *IF* the compiler
  rearranged the expression so that the p->next was tested first.

In practice, a compiler that rearranges expressions will usually do so by
putting the cheaper expression first, especially if short circuit evaluation
is supported. The issue is that in C expressions may have side effects (by
that I include the possibility of a segmentation violation 8-) ) and so the
consequences of such a rearrangement may be serious.

All but one seemed to miss this point. The logical answer is no, the compiler
must evaluate Boolean expressions in the same order as specified in the source
program.
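
The idiom in question, spelled out against a hypothetical list type:

    struct node {
        int          value;
        struct node *next;
    };

    /* Relies entirely on the && guarantee: when p is NULL the left
     * operand is zero, evaluation stops, and p->next is never touched.
     */
    int second_value(struct node *p)
    {
        if (p && p->next)
            return p->next->value;
        return -1;
    }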

The two most useful responses I got were:

=========================================================================
From: Xing Wu <wuxing@comp.mscs.mu.EDU>

In article <1991May22.092404.25297@ucthpx.uct.ac.za> you write:
>ii) Does it say that Boolean expressions must be evaluated with short-
>	circuit evaluation?

Yes.  Take a look at A7.14 and A7.15 in K&RII (pages 207-8 in my copy).
=========================================================================

and, more pertinently:

=========================================================================
From: Richard Flamsholt S0rensen <richard@iesd.auc.DK>

  C has *always* guaranteed short-circuit evaluation - you may safely
use the practice "if (ptr != NULL && ptr->next ...) ..". This also
implies that it will *never* rearrange expressions.
=========================================================================

Thanks to those who responded. No thanks to those who were insulting.
Graham Wheeler <gram@cs.uct.ac.za> | "That which is weak conquers the strong,
Data Network Architectures Lab     | that which is soft conquers the hard.
Dept. of Computer Science          | All men know this; none practise it"
University of Cape Town            |		Lao Tzu - Tao Te Ching Ch.78