[comp.lang.c] C compiler/specification stuff

rbbb@rice.EDU (04/18/87)

Comments on several of your messages:

Abstract machine--I believe that the "correct" use of an abstract machine
in a language specification is to (it is hoped) provide more meaning for a
"specification by interpretation/translation".  This can run the gamut
from the code-environment-continuation-store denotational semantic model
(where a statement is just a functional transformation on the
environment/continuation/store) to compilation to a PDP-11.  I believe
that a middle ground is more practical right now (and I would rather not
go into the details; "I know one when I see one" :-).

How is one used?

(1) it is asserted that sensible people understand the meaning of behavior
in the abstract machine (thus the need for a simple one).

(2) the language specification describes either a translation to abstract
machine code, or transformations to the abstract machine caused by each
source language element.  This specification may be VERY picky about
evaluation order, etc, because here is where the MEANING of a program is
defined.

(3) the observable behavior of the abstract machine is defined; this is
probably a subset of the total behavior of the abstract machine (this is
so that different compilers can generate different code for different
machines; the actual code generated IS NOT part of the C specification).

(4) An implementation of a language is correct if the implementation has
the same observed behavior as the abstract machine.  Notice that different
notions of "observation" are required for debugging code and for correct
interaction with device drivers.  In C, for instance, it might be claimed
that the volatile variables are observed after every statement in the
program.  This is not, however, strong enough to rule out the following
optimization:

(original)  while ((*csr & mask) == 0) ; /* busy waiting */
(optimized) if ((*csr & mask) == 0) while (1) ; /* infinite loop */

Presumably the examination of volatile variables must somehow be exposed
in the abstract machine's behavior.
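
To make the point concrete, here is a minimal sketch of a polling loop in
which every read of the status register goes through a volatile-qualified
pointer.  It assumes a compiler that treats each volatile access as
observable behavior (as later C standards require); the name poll_ready
and the spin limit are hypothetical, added only so the loop can be tested
without a real device.

```c
/* Hypothetical helper: poll a status register until some bit in `mask`
 * is set, or until `max_spins` reads have been made.
 * Returns 1 if a masked bit was seen, 0 if the spin limit ran out. */
int poll_ready(const volatile unsigned *csr, unsigned mask,
               unsigned long max_spins)
{
    unsigned long i;

    for (i = 0; i < max_spins; i++) {
        /* Volatile read: must be performed on every iteration, so the
         * test cannot be hoisted out of the loop. */
        if ((*csr & mask) != 0)
            return 1;
    }
    return 0;
}
```

With a plain (non-volatile) pointer, a compiler would be free to read *csr
once and reuse the value, which is exactly the "optimized" loop shown
above.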

There are some situations where it may be desired to leave the
interpretation/translation unspecified; for example, evaluation of
arguments to a procedure, and perhaps evaluation of subexpressions.

Note that if there are clearly no side-effects in the evaluated code, or
if the compiler happens to be smart enough to determine that two
subexpressions do not interfere with each other, the compiler can still
change the order of evaluation because the final result will be the same.
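
A small sketch of the distinction, under the assumption that argument
evaluation order is left unspecified: the two calls below have side
effects (they write to a trace), so the order is visible in the trace,
but the sum is the same either way.  All names here are hypothetical.

```c
static int trace[2];   /* records which call ran first and second */
static int ncalls;

static int left(void)  { trace[ncalls++] = 'L'; return 2; }
static int right(void) { trace[ncalls++] = 'R'; return 3; }

static int add(int a, int b) { return a + b; }

/* The compiler may call left() or right() first; the returned sum
 * does not depend on that choice, only the trace does. */
int sum_with_trace(void)
{
    ncalls = 0;
    return add(left(), right());
}

/* Reports which call was evaluated first: 'L' or 'R'. */
int first_call(void)
{
    return trace[0];
}
```

A program that only looks at the value of sum_with_trace() cannot tell
the two orders apart; a program that looks at first_call() can, and is
relying on something the specification chose not to pin down.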

What this means is that optimizations may be prohibited in the abstract
machine which are in fact legal in real compiled code; if the observable
behavior is the same, then the optimization is ok.  This brings us to:

Integer overflow--The standard OUGHT to say something about this, but I
don't know if it does.  If it doesn't, then any compiler writer for C who
has ever seen another C compiler will say

"AHA! longs/integers/shorts/chars are just integers modulo some power of two"

and you can be sure that the writer will try to re-order statements that
don't contain references to volatile variables.

My gripe about rearrangement of floating point arithmetic is that floating
point certainly does not obey algebraic rules; integer arithmetic modulo a
power of two is not "theoretically associative and commutative", it *IS*
associative and commutative ("Does this program work correctly?"
"Theoretically, yes.").
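
A quick sketch of the difference.  It uses unsigned integers, for which
wraparound modulo a power of two is well defined in standard C (whether
signed types behave this way is the open question above), and it assumes
IEEE-style doubles for the floating point half.  The function names are
hypothetical.

```c
/* Returns 1 if regrouping an unsigned sum leaves the result unchanged.
 * Unsigned arithmetic wraps modulo a power of two, so this holds for
 * all inputs, including ones that overflow. */
int unsigned_sums_agree(unsigned a, unsigned b, unsigned c)
{
    return (a + b) + c == a + (b + c);
}

/* Returns 1 if regrouping a floating point sum leaves the result
 * unchanged.  Rounding after each operation means this can fail. */
int double_sums_agree(double a, double b, double c)
{
    double lhs = (a + b) + c;
    double rhs = a + (b + c);

    return lhs == rhs;
}
```

For example, unsigned_sums_agree(~0u, 1u, 7u) holds even though the
intermediate sum wraps, while double_sums_agree(1e20, -1e20, 1.0) should
fail on typical hardware: the 1.0 is absorbed when it is added to -1e20
first, but survives when the two large terms cancel first.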

Debugging and optimizations--read the work of Polle Zellweger and the work
of John Hennessy.  Zellweger's thesis is a good place to start.  They
actually talk about debugging optimized code, what optimizations you can
do, what you cannot, and how compilers and debuggers can get along with
each other.  (I hope I am not misrepresenting their work.)

There are three good reasons (in general) not to worry about debugging of
optimized C code: (1) if you are debugging it, it is likely that you plan
to recompile several times anyway, and the time spent optimizing the
program is not likely to be recovered while running the program, (2) there
are worthwhile optimizing transformations that make it difficult to
understand what is going on when you debug a program, and (3) the bugs in
your code should be independent of the optimizations applied, except in
the case that (a) you are writing a device driver or (b) you are writing
code for execution in a concurrent environment.  Obviously, (a) calls for
use of volatile variables, and (b) calls for either volatile variables or
a new compiler, depending upon whether this sort of programming is an
occasional thing or a common thing.  There ARE optimizations that are safe
even in a concurrent context; it is not necessary to turn off all
optimization.  If your code has different behavior depending upon whether
or not it has been optimized, I suggest you run lint.

Sigh.  I hope that this makes things a little clearer for someone.
Comments?

David