[net.lang.c] Must useless expressions be evaluated?

keesan@bbncca.ARPA (Morris Keesan) (10/13/83)

    I'm working on modifying a C compiler, and in the course of forcing some
type conversion to be generated correctly for assignment expressions I
discovered that the compiler will generate code for almost anything in the
source, whether or not it has any side effects.  The particular case in point
is the statement

    (char)(i = j);

for which my compiler generates the assignment, followed by code to truncate
(i.e. convert to type "char") the value of the assignment (which is in a
register as a byproduct of the assignment), and then the converted value gets
thrown away as the register is re-used. 
    This leads to the more general question of whether it is ever legitimate
for a C compiler to ignore operations that have no side effects, or whether I
should always generate what's asked of me.  For example:

    (i = j) + k + (++l);

Clearly, (i = j) should be performed, as should (++l).  But is there any reason
to do either addition?  Essentially, the question boils down to whether the
programmer has a right to expect every expression to be evaluated as it falls
into the control flow of the program.  I can find no guidance in Kernighan and
Ritchie on this, except in the case of the comma operator and the logical
&&, ||, and ?: operators, which explicitly cause certain operands to be
evaluated.  I'd like some opinions on this.
    The related question, if people think that some operations are legitimately
ignored when they have no side effects and their value is not used (e.g. when
they appear as expression statements), is which operations?  The list of
operations and expressions I've come up with includes casting, all unary
operations except pre and post ++ and --, pointer dereference of all
kinds ( *, ->, [], and . ), all the arithmetic, boolean, shift, and relational
binary operations, and any expression which is simply a variable name or a
constant.  For all of these, the C Reference Manual defines what the result or
value of the expression is, but not whether it needs to be evaluated. 

					Morris Keesan
					decvax!bbncca!keesan
					{genrad,allegra}!wjh12!bbncca!keesan
					keesan@BBN-UNIX.ARPA

mjs@rabbit.UUCP (10/14/83)

Various optimizers will detect `dead' scratch registers & delete the
assembly language which generates the value (if the assembly code has
no other side effects, such as auto-incrementing a different
register).  I'm not certain if the current /lib/c2 for the VAX does
this, but it should.  Yes, I suppose it would be nice if the compiler
didn't generate such code in the first place, and my guess is that it
would be completely legitimate for it to do so, however, I don't
believe that PCC can easily be made to do this.
-- 
	Marty Shannon
UUCP:	{alice,rabbit,research}!mjs
Phone:	201-582-3199

vmicro1@ucbtopaz.UUCP (10/14/83)

	I think useless expressions must be evaluated, simply because there is
no syntactic criterion to determine the uselessness of any expression. For
example,
	(i=j)+something()+k++;
is not syntactically different from
	(i=j)+1+k++;
but something() probably has side effects.

ken@turtleva.UUCP (Ken Turkowski) (10/14/83)

Another C compiler implementation issue is whether to do computations
in shorts when the destination and all the operands are shorts.  How
about multiplying two shorts giving a long, without converting the
shorts to longs and using lmul?  Testing bits by using the bit test
instruction instead of ANDing?  This last one may be an optimizer
problem, but the first two are related to K&R's assertion that all
computations be done in ints, or that to get a long result, at least
one of the operands must be long.

			Ken Turkowski
		    CADLINC, Palo Alto
		{decwrl,amd70}!turtlevax!ken

mat@hou5d.UUCP (10/15/83)

		I think useless expressions must be evaluated, simply
	because there is no syntactic criterion to determine the uselessness
	of any expression. For example,
			(i=j)+something()+k++;
	is not syntactically different from
			(i=j)+1+k++;


Not SYNTACTICALLY different, but SEMANTICALLY different.  And since the
compiler, at many levels from syntax description through parse trees
and code generation templates right out to the optimizer's linked lists
of statements or quads or basic blocks or whatever HAS got syntactic
information about live values and side effects available, it makes
sense to use that information.

					Mark Terribile
					Duke of deNet

barmar@mit-eddie.UUCP (Barry Margolin) (10/15/83)

ether there are any side effects and whether
the value of an expression is used.

PL/I (or at least the Multics version) allows the programmer to give a
function the "reducible" attribute.  A reducible function is one which
does not have any side effects and whose value is solely a function of
the arguments (i.e., not global variables can affect the value).  The
compiler is then free to throw away excess calls, i.e. it may translate
	y = foo (x) + foo (x);
to
	y = 2 * foo (x);
and it may skip over reducible function calls in boolean expressions,
i.e.
	if bool_var | foo (x) then ...
is permitted to not evaluate foo (x) if bool_var is true.
-- 
			Barry Margolin
			ARPA: barmar@MIT-Multics
			UUCP: ..!genrad!mit-eddie!barmar

kendall@wjh12.UUCP (Sam Kendall) (10/15/83)

                I think useless expressions must be evaluated, simply because
        there is no syntactic criterion to determine the uselessness of any
        expression.  For example,
		(i=j)+something()+k++;
	is not syntactically different from
		(i=j)+1+k++;
	but something() probably has side effects.

Well, `something()' is a function call, and `1' isn't.

Some (syntactic) operators may have side effects, and others never do.  If
the result of one of the former isn't used, it must still be evaluated.  If
the result of one of the latter isn't used, there is no reason to evaluate
it.  Please, let's drop this topic.

	Sam Kendall		  {allegra,ihnp4}!wjh12!kendall
	Delft Consulting Corp.	    decvax!genrad!wjh12!kendall

smh@mit-eddie.UUCP (Steven M. Haflich) (10/15/83)

I am uneasy about any attempt to make a compiler smarter than a
programmer, but can offhand think of only a couple situations in which
it would not be safe to optimize out an unused computation.  In
particular, Unix autoconfiguration code can determine whether a device
exists by peeking at one of its addresses.  If a trap results, the
device doesn't exist.  How can a clever compiler determine whether an
arbitrary reference might have an expected side effect of a trap?

Some hardware provides for interrupt on certain data fetches, e.g., the
little-used status bit in the PDP11 FPU which causes interrupt when -0
is fetched.  The idea is that the hardware can detect reference to a
variable before it is first set.  (I don't know if Vaxen have a similar
frob.)  I could imagine situations where code might want to probe a
variable using this hardware feature, throwing away the fetched value,
in order that an interrupt not occur many levels deep in an evaluator.

Lastly, a demand-paged program which wants to *attempt* real-time
response to some input might try to touch the pages it expected to
reference (e.g. some big array) to increase the probabilty they will be
in physical memory when needed.

Observe, of course, that all these examples involve explicit or
implicit traps.  C is, after all, a system programming languare, often
likened to (ugh) assemble language, and one should be very careful
about disturbing the clear mapping from C instructions into machine
instructions.

	Steve Haflich
	MIT Experimental Music Studio

kendall@wjh12.UUCP (Sam Kendall) (10/16/83)

Steven M. Haflich has some good points about possibly evaluating
expressions that seem to have no side effects, but that actually
do because of reference to special locations.  One should be
able to tell the compiler when such things may occur in a piece
of code, like you can in BLISS only less elaborate,* but anyway I
was jumping the gun when I rudely suggested this topic be dropped.
It is not so easily dismissed, and I apologize.

	Sam Kendall		  {allegra,ihnp4}!wjh12!kendall
	Delft Consulting Corp.	    decvax!genrad!wjh12!kendall

*There was extensive discussion of these points a while back.

ark@rabbit.UUCP (10/16/83)

Barry Margolin points out that if foo is a reducible function,
then the compiler is allowed to generate code for

	if bool_var | foo(x) then ...

which does not evaluate foo if bool_var is true.  This statement
is correct, but slightly misleading -- the PL/I implementation
I have used (and, if I recall correctly, the ANSI standard) permits
the compiler to use short-circuit evaluation in this context even
if foo is not reducible.  In general, the compiler is only obliged
to evaluate enough of an expression to determine its result.

tjt@kobold.UUCP (T.J.Teixeira) (10/17/83)

As others have pointed out, dereferencing a pointer might generate an
illegal memory reference.  Since this could generate a signal which
could be caught by the program, the side effect of the dereferencing
should be visible to the program.

One might even go so far as to say that using computation time is a
side effect.  If the program has access to a cpu-time clock (i.e.
times(II) in UNIX) (or less reliably, just a real-time clock) you could
write a program for a particular machine that depends on the side
effect of the execution time.

Although very few programs rely on these side effects, benchmark
programs are an important example.  How many benchmarks have you seen
that compute some numbers and then just throw them away?  If you don't
even print them out, a compiler could just decide to optimize your
program to nothing!

I believe that something like this actually did happen several years
ago in somebodys us vs. them ads: the ratio between the execution times
was something like 100000.  I believe that the particular example was
supposed to be a FORTRAN program with lots and lots of GOTO's.  The
intention was to determine how fast the machine could execute a branch
instruction.  However, one of the compilers just unravelled the
branches and threw them all away, leading to the astronomically better
apparent performance.

kurt@fluke.UUCP (Kurt Guntheroth) (10/17/83)

Of course it is legitimate to not do ANY operation which has no side
effects.  By definition, if the operation has no side effects, the presence
or absense of the operation will not change the state of execution except
perhaps by changing the amount of time execution takes.

It must be remembered though, that all the operands of a dead operation must
be evaluated if the operands cause side effects.

-- 
Kurt Guntheroth
John Fluke Mfg. Co., Inc.
{uw-beaver,decvax!microsof,ucbvax!lbl-csam,allegra,ssc-vax}!fluke!kurt

barmar@mit-eddie.UUCP (Barry Margolin) (10/18/83)

Yes, Steve Haflich has a point about special locations which have
side-effects just from being accessed.  However, there are plenty of
ways to fool a compiler into leaving the references around without
giving up a very common optimization.  For instance, you could have a
library routine, do_nothing, which takes one argument and just returns.
You could then say "do_nothing(*special_loc)".  The compiler has no way
of knowing that the parameter will be ignored, and I have never heard of
an optimizing loader (at least not that smart).

It is not too smart to depend too heavily on what you expect the
generated code to be.  For instance, *special_loc+*special_loc will
probably NOT access special_loc twice; more likely, it will be loaded
into a register, and doubled.  A good optimizer can really contort your
code in order to generate efficient code.  Perhaps there should be ways
to tell the compiler and optimizer that certain variables should never
be optimized into registers (a not_register declaration?).  Another
place to be careful is in expressions; I believe that C does not define
the order of evaluation of terms in an expression (I know PL/I doesn't).
-- 
			Barry Margolin
			ARPA: barmar@MIT-Multics
			UUCP: ..!genrad!mit-eddie!barmar

thomas@utah-gr.UUCP (Spencer W. Thomas) (10/20/83)

But all of these examples (touching and/or reading a location to cause a
"trap") can be done in a "non-useless" setting by a statement of the form
	i := loc;
Here, you are assigning the value, and so it is not "useless".  In fact,
I had exactly that problem in a sort of garbage collector thing I wrote
- I wanted to test pointers to see if they really pointed to something
before following them.  Just saying *ptr was NOT sufficient - the
compiler threw it away (I think this was in a comma-separated
expression, I'm not sure now).  I had to actually assign it to a
variable before I could get my "expected" trap (naturally I chose a
register variable).

=Spencer