[comp.lang.c] low level optimization

rkowen@violet.berkeley.edu (Dario Bressanini) (04/10/91)

>>   Consider for a moment (yes, these are not equivalent):
>>
>>           x = ++c;       vs       x == c++;
>> These can be "compiled" as:
>>
>>                                   temp_001 <-- c;
>>           c <-- c + 1;            c <-- c + 1;
>>           x <-- c;                x <-- temp_001;
>>
>> (now throw away the "x=" part (the last "instruction").)
>>
>>So, "++c" is ``cleaner'' in some pedantic sense[1], and I suppose a
>>sufficiently lacking compiler might actually produce slower code
>>for "c++;" than for "++c;".                  ^^^^^^^^^^^^^^^^^^^^

I always read this group with interest; I have learned a lot following
many discussions, but sometimes the "this statement is faster than
that one" war comes up, as in this case. I don't want to discuss
this particular case, but to make some general personal observations.
Based on my experience, I find it hard to believe that in real-world
code one can improve the performance of a program using this kind
of "optimization".
When I REALLY HAVE to optimize a program, first of all I use
a profiler to see where the bottleneck is, and THEN I try to optimize it;
probably I am biased, since I mostly write programs (90% in FORTRAN)
for scientific computation (yes, I use floating point :-) ) where usually
you have a single routine, or even a single loop, that takes 95% of
the CPU time.
In most cases the best way to gain speed was to change the algorithm
completely, not to make some very low-level optimization.
Following the latest "this is faster than that" wars I had
the impression that they were purely theoretical discussions, without any
connection to the "real world", at least to my world.

Just in case... I don't want to start the usual and useless
C vs. FORTRAN war; I would like to use C for my programs, but in most
cases I tried, the code produced by the C compiler was 2 or 3 times
slower than the Fortran code.

      dario bressanini

juul@diku.dk (Anders Juul Munch) (04/11/91)

rkowen@violet.berkeley.edu (Dario Bressanini) writes:


>>>   Consider for a moment (yes, these are not equivalent):
>>>
>>>           x = ++c;       vs       x == c++;

Of course you mean                    x = c++; - right?

>>> These can be "compiled" as:
>>>
>>>                                   temp_001 <-- c;
>>>           c <-- c + 1;            c <-- c + 1;
>>>           x <-- c;                x <-- temp_001;
>>>
>>> (now throw away the "x=" part (the last "instruction").)
>>>
>>>So, "++c" is ``cleaner'' in some pedantic sense[1], and I suppose a
>>>sufficiently lacking compiler might actually produce slower code
>>>for "c++;" than for "++c;".                  ^^^^^^^^^^^^^^^^^^^^

>I always read this group with interest; I have learned a lot following
>many discussions, but sometimes the "this statement is faster than
>that one" war comes up, as in this case. I don't want to discuss
>this particular case, but to make some general personal observations.
>Based on my experience, I find it hard to believe that in real-world
>code one can improve the performance of a program using this kind
>of "optimization".

>[stuff deleted]

You're probably right; this is hardly a worthwhile optimization.
He does have a point, though, in that post-increment/decrement
is a rather clumsy construct from a compiler's point of view.
As long as it appears in such a simple statement, any compiler can
sort out how to do it the fastest way anyway. But when the increment
is in the middle of some assignment or array index, it may not
be able to do that.

	I once saw an implementation of quicksort in a
magazine, which contained code like:

	array[index++] = something;

They stated it was written that way for efficiency.
I tried compiling that code, together with the rewrite

	array[index] = something;
	index++;

And then I looked at the code that my "real-world" compiler 
(Turbo C++ 1.0) generated for these two code fragments  -
	Identical!

The moral of the story is: don't use increment/decrement operators
within other expressions - it makes the code less readable,
and it probably doesn't give you any performance gain anyway.


- Anders Munch, juul@diku.dk
/------------------------------------------------------\
|      (Insert your disclaimer of choice here)         |
|                                                      |
\------------------------------------------------------/

andand@cia.docs.uu.se (Anders Andersson) (04/15/91)

In <1991Apr9.213601.12309@agate.berkeley.edu> rkowen@violet.berkeley.edu (Dario Bressanini) writes:


>>>   Consider for a moment (yes, these are not equivalent):
>>>
>>>           x = ++c;       vs       x == c++;
>>> These can be "compiled" as:
>>>
>>>                                   temp_001 <-- c;
>>>           c <-- c + 1;            c <-- c + 1;
>>>           x <-- c;                x <-- temp_001;
>>>
>>> (now throw away the "x=" part (the last "instruction").)
>>>
>>>So, "++c" is ``cleaner'' in some pedantic sense[1], and I suppose a
>>>sufficiently lacking compiler might actually produce slower code
>>>for "c++;" than for "++c;".                  ^^^^^^^^^^^^^^^^^^^^

>I always read this group with interest; I have learned a lot following
>many discussions, but sometimes the "this statement is faster than
>that one" war comes up, as in this case. I don't want to discuss
>this particular case, but to make some general personal observations.
>Based on my experience, I find it hard to believe that in real-world
>code one can improve the performance of a program using this kind
>of "optimization".
>When I REALLY HAVE to optimize a program, first of all I use
>a profiler to see where the bottleneck is, and THEN I try to optimize it;
>probably I am biased, since I mostly write programs (90% in FORTRAN)
>for scientific computation (yes, I use floating point :-) ) where usually
>you have a single routine, or even a single loop, that takes 95% of
>the CPU time.
>In most cases the best way to gain speed was to change the algorithm
>completely, not to make some very low-level optimization.
>Following the latest "this is faster than that" wars I had
>the impression that they were purely theoretical discussions, without any
>connection to the "real world", at least to my world.

>Just in case... I don't want to start the usual and useless
>C vs. FORTRAN war; I would like to use C for my programs, but in most
>cases I tried, the code produced by the C compiler was 2 or 3 times
>slower than the Fortran code.

To me it seems you are contradicting yourself. First you tell us that you
only find algorithmic improvement worthwhile, then you complain about the
slow code produced by C compared to Fortran.
  As you are probably well aware, the speed difference you noticed is very
dependent on the computer and compiler, but a few observations may be of interest:
You cannot do much low-level optimization yourself in Fortran, right?
So, Fortran compilers tend to have very clever optimizers, especially when
the program does number crunching, which is often quite easy for an
optimizer to handle.
  C, on the other hand, allows the programmer a lot more freedom to tune his or
her program, using register variables, pointer arithmetic and other
interesting stuff. This freedom gives optimizers a hard time (especially
pointers are a big problem, since an optimizer cannot assume that certain
values can be kept in registers if an indirect assignment can change them...)
  Fortran may therefore have an edge on number crunching, but I doubt that holds
in general (not that many successful operating systems, compilers, word processors
or whatever are written in Fortran; at least none that I know of, no need to flame).
  Further, were the C versions of the programs you compared tuned by somebody who
is used to doing such tuning?  Were they tuned at all?  If you want maximum speed
you have to do pointer arithmetic rather than array indexing on many machines/compilers.
(Since you can do this yourself in C, most optimizers don't bother with it, while
Fortran optimizers virtually always do it for you.)  When converting from Fortran
you also have to consider the column-major versus row-major storage used for two-
dimensional arrays, and so forth; see the sketch below.
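To illustrate the storage point (a sketch only; the array and
function names are invented): a Fortran column sweep transcribed
naively into C strides across whole rows, so the loops usually
need to be interchanged:

	#define N 100

	double a[N][N];

	/* Naive transcription of a Fortran column sweep: the inner
	 * loop varies the FIRST index, so each step jumps N doubles
	 * in C's row-major storage. */
	void clear_columnwise(void)
	{
	    int i, j;
	    for (j = 0; j < N; j++)
	        for (i = 0; i < N; i++)
	            a[i][j] = 0.0;
	}

	/* With the loops interchanged, memory is touched sequentially. */
	void clear_rowwise(void)
	{
	    int i, j;
	    for (i = 0; i < N; i++)
	        for (j = 0; j < N; j++)
	            a[i][j] = 0.0;
	}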
  Most of the time improvement of the algorithm is preferable, but when you have
the best known algorithm and your program is still sluggish, what do you do then?
Turn to assembly language?  I have found low-level optimizations of the discussed
kind most helpful in tight loops, sometimes speeding them up by a factor of two or so.
The only problem is that many such optimizations are machine dependent (sometimes
even compiler dependent).  For example, the MC68000 series has indirect addressing modes
with post-increment and pre-decrement, corresponding to C expressions like *(p++) and
*(--p) respectively.  There are no pre-increment or post-decrement addressing modes,
however.  Knowing this may speed up many loops when running on a 68k, but may not
have any effect whatsoever on another machine (I bet there is some machine out there
that has only pre-increment and post-decrement, on which it will run slower...)
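To make the 68k point concrete (a sketch; whether a particular
compiler really emits the (An)+ mode is an assumption you should
check against its output):

	/* On a 68000, *dst++ = *src++ can map directly onto the
	 * post-increment addressing mode (An)+, and *(--p) onto the
	 * pre-decrement mode -(An). */
	void copy_words(short *dst, short *src, int n)
	{
	    while (n-- > 0)
	        *dst++ = *src++;
	}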

-Anders Andersson      (andand@cia.docs.uu.se)

andand@cia.docs.uu.se (Anders Andersson) (04/15/91)

In <1991Apr11.150858.4242@odin.diku.dk> juul@diku.dk (Anders Juul Munch) writes:

[ stuff deleted ]

>	I once saw an implementation of quicksort in a
>magazine, which contained code like:

>	array[index++] = something;

>They stated it was written that way for efficiency.

That's crazy. If you want efficiency in QSort you should use
pointer arithmetic, like:

	*(pointer++) = something;

Then it makes a difference...
OK, OK, not true on all machines, but *absolutely* true on a
MC680x0. (Yes, I've tried it!)

>I tried compiling that code, together with the rewrite


>	array[index] = something;
>	index++;

>And then I looked at the code that my "real-world" compiler 
>(Turbo C++ 1.0) generated for these two code fragments  -
>	Identical!

I can't imagine a compiler where they wouldn't be.  (Unless there are
some strange addressing modes on the machine.)

>The moral of the story is: don't use increment/decrement operators
>within other expressions - it makes the code less readable,
>and it probably doesn't give you any performance gain anyway.

The moral of my comment is: if you want speed, you'd better not stop
halfway - go for pointer arithmetic and make your code truly
unreadable.  (But run a profiler first... You might not want it
everywhere...)  A sketch of the two styles follows.
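Here they are side by side (a sketch with invented names; on many
compilers they now generate identical code, which is exactly why you
profile before committing to the unreadable version):

	/* Indexed form: the compiler may recompute array + i each
	 * iteration, or may strength-reduce it to a moving pointer. */
	void fill_indexed(long *array, int n, long v)
	{
	    int i;
	    for (i = 0; i < n; i++)
	        array[i] = v;
	}

	/* Pointer form: the hand-optimized equivalent. */
	void fill_pointer(long *array, int n, long v)
	{
	    long *p = array;
	    long *end = array + n;
	    while (p < end)
	        *p++ = v;
	}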

[ name and disclaimer deleted ]

-Anders Andersson  (andand@cia.docs.uu.se)

gwyn@smoke.brl.mil (Doug Gwyn) (04/16/91)

In article <andand.671719926@cia.docs.uu.se> andand@cia.docs.uu.se (Anders Andersson) writes:
-In <1991Apr9.213601.12309@agate.berkeley.edu> rkowen@violet.berkeley.edu (Dario Bressanini) writes:
->Just in case... I don't want to start the usual and useless
->C vs. FORTRAN war; I would like to use C for my programs, but in most
->cases I tried, the code produced by the C compiler was 2 or 3 times
->slower than the Fortran code.
-  As you are probably well aware, the speed difference you noticed is very
-dependent on the computer and compiler, but a few observations may be of interest:
-You cannot do much low-level optimization yourself in Fortran, right?
-So, Fortran compilers tend to have very clever optimizers, especially when
-the program does number crunching, which is often quite easy for an
-optimizer to handle.

I don't think that is the proper explanation of the observed phenomenon.
For any commercial-quality system, I would be amazed if an equally
competent implementation of nearly any algorithm in Fortran and C gave
much edge to Fortran.  The one possibility for a significant speed
difference of which I am aware is the requirement for pre-ANSI C that
all floating-point computations be performed in double precision, even
if the variables are declared as single-precision.  Standard C discards
that old requirement.
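For concreteness, the kind of code affected (a sketch; the function
is invented):

	/* Under the pre-ANSI rules, x[i] and y[i] are widened to
	 * double and the multiply-add runs in double precision;
	 * Standard C permits it to be done entirely in float, which
	 * is considerably faster on some floating-point hardware. */
	float dot(float *x, float *y, int n)
	{
	    float s = 0.0f;
	    int i;
	    for (i = 0; i < n; i++)
	        s += x[i] * y[i];
	    return s;
	}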

jlg@cochiti.lanl.gov (Jim Giles) (04/16/91)

>>	array[index++] = something;
>> [...]
>
> That's crazy. If you want efficiency in QSort you should use
> pointer arithmetic, like:
>
> 	*(pointer++) = something;  

If your compiler does something which makes the first slower than
the second, it's time to get a new compiler.  At some time in the
code you had to do something like:

      pointer = &array[1];

or even
     
      pointer = array;

The compiler should be able to perform the same operation outside
any loops and achieve the same level of optimization.  The user's
code needn't contain such things.
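In other words (a sketch; the names are invented), the compiler can
make the transformation itself:

	/* What the user writes: */
	void store_indexed(int *array, int n, int something)
	{
	    int index;
	    for (index = 0; index < n; index++)
	        array[index] = something;
	}

	/* What a decent compiler can effectively turn that into, by
	 * hoisting the base address out of the loop and strength-
	 * reducing the index: */
	void store_reduced(int *array, int n, int something)
	{
	    int *pointer = array;
	    int *end = array + n;
	    while (pointer < end)
	        *pointer++ = something;
	}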

J. Giles

jlg@cochiti.lanl.gov (Jim Giles) (04/17/91)

[...]
For any commercial-quality system, I would be amazed if an equally
competent implementation of nearly any algorithm in Fortran and C gave
much edge to Fortran.  [...]

We've been through this again and again.  C can't hold a candle to
Fortran for array manipulation because Fortran is free to assume that
array arguments to procedures are _not_ aliased to each other or to
globals.  C must assume that all pointers are possibly aliased to
each other, to globals, and to any locals to which the & operator
has been applied.  There is not a single significant optimization
technique that is not inhibited to some extent or other by aliasing.
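A tiny example of the inhibition (names invented):

	/* Because out, in, and scale may all overlap, a C compiler
	 * cannot keep *scale in a register across the stores: any
	 * store to out[i] might change it.  A Fortran compiler may
	 * assume its dummy arguments are unaliased and hoist the
	 * load out of the loop. */
	void scale_all(double *out, double *in, double *scale, int n)
	{
	    int i;
	    for (i = 0; i < n; i++)
	        out[i] = *scale * in[i];
	}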

Some C implementations have local extensions (using #pragma statements)
that allow the programmer to declare that certain variables are not
aliased - all very non-portable.  Some experimental systems allow the
compiler to do restricted types of interprocedural analysis to detect
aliasing - this is non-standard since it violates separate compilation
constraints.  I've never seen a purely standard conforming C compiler
that can come close to Fortran.
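The flavor of such an extension (the pragma spelling below is
invented for illustration; every vendor spelled its own, and none
of it is portable):

	/* Hypothetical vendor pragma asserting that the pointer
	 * arguments of the next function are never aliased.  A
	 * compiler that doesn't recognize it simply ignores it. */
	#pragma no_alias(out, in)
	void daxpy(double *out, double *in, double a, int n)
	{
	    int i;
	    for (i = 0; i < n; i++)
	        out[i] += a * in[i];
	}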

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (04/17/91)

In article <21527@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
> We've been through this again and again.  C can't hold a candle to
> Fortran for array manipulation because Fortran is free to assume that
> array arguments to procedures are _not_ aliased to each other or to
> globals.

Except on vector machines, Fortran array code can be converted into C
pointer code which will run at the same speed *without* the optimizer
that Fortran requires. Aliasing does not inhibit this (usually rather
mechanical) translation. On vector machines aliasing is a problem, but
they provide adequate directives to control aliasing.
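In miniature, that mechanical translation looks like this (a sketch;
the subroutine and names are invented):

	/* Fortran:
	 *       SUBROUTINE VADD(A, B, C, N)
	 *       DOUBLE PRECISION A(N), B(N), C(N)
	 *       DO 10 I = 1, N
	 *    10 A(I) = B(I) + C(I)
	 * rendered as C pointer code: */
	void vadd(double *a, double *b, double *c, int n)
	{
	    double *end = a + n;
	    while (a < end)
	        *a++ = *b++ + *c++;
	}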

On the other hand, no matter how powerful your Fortran optimizer is,
there will be a pointer program P for which any array code translation
will be SLOWER. This is a simple application of the halting problem.
Most importantly, that pointer program P comes up frighteningly often in
practice.
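One plausible shape for such a program P (a sketch; the claim is
only that pointer-chasing code like this has no natural array
rendering, not a measured result):

	#include <stddef.h>

	/* A heap-linked structure traversed through pointers.
	 * Recasting this as Fortran-style array code means simulating
	 * the links with integer indices into parallel arrays. */
	struct node {
	    struct node *next;
	    double val;
	};

	double total(struct node *p)
	{
	    double s = 0.0;
	    while (p != NULL) {
	        s += p->val;
	        p = p->next;
	    }
	    return s;
	}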

Looks to me like C wins here.

> There is not a single significant optimization
> technique that is not inhibited to some extent or other by aliasing.

Hand optimization. Try it, and your programs will run a lot faster for
relatively little effort.

> Some experimental systems allow the
> compiler to do restricted types of interprocedural analysis to detect
> aliasing - this is non-standard since it violates separate compilation
> constraints.

The second half of that statement is incorrect. The compiler must act AS
IF the code had been separately compiled. This in no way prohibits
interprocedural optimizations. Read the standard (or any of the
available elementary introductions to it) to learn about the as-if rule.

---Dan

gwyn@smoke.brl.mil (Doug Gwyn) (04/17/91)

In article <21527@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>We've been through this again and again.

Indeed.

>C can't hold a candle to Fortran for array manipulation because
>Fortran is free to assume that array arguments to procedures are
>_not_ aliased to each other or to globals.  C must assume that all
>pointers are possibly aliased ...  There is not a single significant
>optimization technique that is not inhibited to some extent or other
>by aliasing.

The latter is a gross overgeneralization, and in any event the
assertion is simply not true.  There are SOME circumstances under
which aliasing must be assumed to be possible, but by no means ALL
circumstances.

I wasn't particularly referring to array-intensive applications
anyway, because most interesting "scientific" applications that
I have encountered require more flexibility than the use of fixed
arrays provides.  Each language should be dealt with on its own
terms, not treated as an inferior substitute for one's favorite
language.

>I've never seen a purely standard conforming C compiler
>that can come close to Fortran.

Well, I have.

jlg@cochiti.lanl.gov (Jim Giles) (04/17/91)

In article <15870@smoke.brl.mil>, gwyn@smoke.brl.mil (Doug Gwyn) writes:
|> In article <21527@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
|> [...]                                        C must assume that all
|> >pointers are possibly aliased ...  There is not a single significant
|> >optimization technique that is not inhibited to some extent or other
|> >by aliasing.
|> 
|> The latter is a gross overgeneralization, and in any event the
|> assertion is simply not true.  There are SOME circumstances under
|> which aliasing must be assumed to be possible, but by no means ALL
|> circumstances.

All array arguments to a procedure must be assumed aliased to each
other and to any globals of the same base type.  Indeed, due to the
fact that most C programmers violate the rules prohibiting type casts
of pointers to different types, all arrays should be assumed aliased
to globals and other pointers of _any_ type.  Any locally declared
pointers which are assigned to the any of the above (pointer args,
and/or the addresses of globals) must also be assumed aliased to
these things.

|> [...]
|> I wasn't particularly referring to array-intensive applications
|> anyway, because most interesting "scientific" applications that
|> I have encountered require more flexibility than the use of fixed
|> arrays provides.  Each language should be dealt with on its own
|> terms, not treated as an inferior substitute for one's favorite
|> language.

Then you shouldn't make unqualified statements that are not true.
Your claim was that C was inherently as fast or faster than Fortran
for all operations (you made the exception of _old_ C implementations
that forced double computation instead of float).  To now say that you
weren't including array-intensive applications is a little late.

Most of the array calculations I have encountered require more 
flexibility than pointers provide - like the flexibility to declare
two things _not_ to be aliased.

|> [...]
|> >I've never seen a purely standard conforming C compiler
|> >that can come close to Fortran.
|> 
|> Well, I have.


Really?  I'm sitting on the edge of my chair.  Where can I learn more
about this paragon?  It must use compilation techniques that are not
described in any of the existing literature, since I have verified
to my satisfaction that 1) aliasing _does_ inhibit _all_ major optimization
techniques, and 2) most uses of C pointers are either aliased in truth,
or are not distinguishable (by the compiler) from aliased ones.  The only
way around this difficulty is interprocedural analysis - but C requires
separate compilation. 

J. Giles

mhcoffin@tolstoy.waterloo.edu (Michael Coffin) (04/18/91)

In article <21660@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>The only way around this difficulty [detecting aliasing] is
>interprocedural analysis - but C requires separate compilation.

Where in heaven's name did you get the idea that separate compilation
precludes interprocedural analysis?  Ken Kennedy's group at Rice U.
has for years been doing extremely fancy interprocedural analysis of
Fortran programs while simultaneously providing separate compilation.
Take a look at the work done on PFC back in the late 70's and early
80's.  They operate under the same constraints that C demands---that
object code has to respect the semantics of the language, regardless of
what analysis is done.  (Note, by the way, that Fortran compiler
writers seem to find interprocedural analysis necessary for producing
efficient code from Fortran in spite of the rules against aliased
parameters.)

-mike

jlg@cochiti.lanl.gov (Jim Giles) (04/18/91)

In article <1991Apr17.190243.24691@watmath.waterloo.edu>, mhcoffin@tolstoy.waterloo.edu (Michael Coffin) writes:
|> [...]
|> Where in heaven's name did you get the idea that separate compilation
|> precludes interprocedural analysis?  [...]

Separate compilation and interprocedural analysis are (in the abstract)
mutually exclusive.  You can, of course, adopt some intermediate approach
in which _some_ interprocedural information is available at compile
time.  To the extent that you do this, you do _not_ have separate
compilation.  The only information available to the compiler about
procedures that are separately compiled are the names of routines
that the current procedure calls and the number and type of their
arguments and results (if any).  For a language which requires
separate compilation, _NO_OTHER_INFORMATION_ is known about an
external procedure.  The reason is that _all_ other characteristics
of an external procedure are allowed to _change_ without the need to
recompile any uses of that procedure.

The bottom line is that separate compilation implies that procedures
may be used together for which the _source_ of the separate routines
has never even existed in the same programming environment and only
the object code has been distributed.  Further, even the object code
of each may not be available at the time each of the other procedures
is compiled.  Hence, under separate compilation there may be no internal
information about other procedures used when a given procedure is compiled
because such information may not exist, may be speculative, or may 
subsequently change without notice.

To be sure, a great deal of research has been done on the consequences of
various compromises to separate compilation.  Some techniques require that
more information about externals be made available through interface specs
or header files.  Some techniques move some of the compilation process
into later tools (like the loader).  Indeed, by moving the code generation
and optimization phases of compilation into the loader, you can have the
benefits of full interprocedural analysis while technically remaining
compatible with language specifications which require separate compilation.
In fact, such a loader could 'inline' separately compiled procedures, do
constant folding, etc. across procedure boundaries and other amazing things.
Hypothetically, such an environment could optimize the infamous "hello
world" test code into a single system call.  _I_ have not _seen_ production
quality compiler/loader combinations that do this.

J. Giles

henry@zoo.toronto.edu (Henry Spencer) (04/18/91)

In article <21660@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>... The only
>way around this difficulty is interprocedural analysis - but C requires
>separate compilation. 

And the top traffic light has to be red - but blueberries are blue.  The
two parts of that statement have about as much to do with each other as
the two parts of yours.  There is no shortage of standard-conforming C
compilers that do vigorous interprocedural analysis if asked to; the ones
from MIPS, for example.  Even actual separate compilation does not preclude
interprocedural analysis, although it certainly makes it harder.  But the
problem at hand is nowhere near that hard, since you can simply decline
to provide both capabilities simultaneously.  You do interprocedural
analysis across all source files supplied to a single invocation of the
compiler, and if the user is supplying some modules as object files,
the interprocedural analysis simply doesn't extend over their boundaries.
-- 
And the bean-counter replied,           | Henry Spencer @ U of Toronto Zoology
"beans are more important".             |  henry@zoo.toronto.edu  utzoo!henry

gwyn@smoke.brl.mil (Doug Gwyn) (04/18/91)

In article <21660@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
-All array arguments to a procedure must be assumed aliased to each
-other and to any globals of the same base type.

No, this is not true (assuming you are talking about pointer
arguments; C does not have array arguments).  As I and others
have recently told you, sometimes this must be assumed and
other times it need not be, depending on context.

-Indeed, due to the fact that most C programmers violate the rules
-prohibiting type casts of pointers to different types, all arrays
-should be assumed aliased to globals and other pointers of _any_
-type.

We know you're a Fortran bigot, but that's ridiculous.

-|> I wasn't particularly referring to array-intensive applications
-|> anyway, because most interesting "scientific" applications that
-|> I have encountered require more flexibility than the use of fixed
-|> arrays provides.  Each language should be dealt with on its own
-|> terms, not treated as an inferior substitute for one's favorite
-|> language.
-Then you shouldn't make unqualified statements that are not true.
-Your claim was that C was inherently as fast or faster than Fortran
-for all operations (you made the exception of _old_ C implementations
-that forced double computation instead of float).  To now say that you
-weren't including array-intensive applications is a little late.

You have this annoying habit of pretending I said things that I didn't.
I didn't EXCLUDE array operations in the statement cited above, and I
didn't say that C was inherently as fast as Fortran for all operations.
What I said was:

->>> I don't think that is the proper explanation of the observed phenomenon.
->>> For any commercial-quality system, I would be amazed if an equally
->>> competent implementation of nearly any algorithm in Fortran and C gave
->>> much edge to Fortran.  The one possibility for a significant speed
->>> difference of which I am aware is the requirement for pre-ANSI C that
->>> all floating-point computations be performed in double precision, even
->>> if the variables are declared as single-precision.  Standard C discards
->>> that old requirement.

I don't see any point in continuing to argue with you, any more than
the last time you stuck your nose into the C newsgroup to run down the
language and promote Fortran for everything.  If you want a more complete
picture of the "aliasing problem" in C, you should read some of the
papers from NCEG.  "A little knowledge is a dangerous thing."

mhcoffin@tolstoy.waterloo.edu (Michael Coffin) (04/18/91)

In article <21703@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>For a language which requires
>separate compilation, _NO_OTHER_INFORMATION_ is known about an
>external procedure.  The reason is that _all_ other characteristics
>of an external procedure are allowed to _change_ without the need to
>recompile any uses of that procedure.

The C standard doesn't say Thou Must Compile Separately.  It says that
the meaning of the program---roughly speaking, the output obtained by
running the program---doesn't depend on information gleaned from
intermodule analysis.  Nothing in the standard says that a compiler
can't take advantage of all available information to produce good
code.

-mike

jlg@cochiti.lanl.gov (Jim Giles) (04/18/91)

In article <1991Apr17.225944.15261@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:
|> [...]                                           You do interprocedural
|> analysis across all source files supplied to a single invocation of the
|> compiler,  [...]

And, if you subsequently change _some_ of those source files?  What
then?  Under this model, you'd have to recompile _all_ the others
that were originally compiled together.  This violates one of the
principal reasons to have separate compilation to begin with - the
elimination of the need to recompile most of your code every time
you change a piece of it.

Note: I'm not arguing that what you suggest is a bad idea.  I think
personally that what you've said here is a good way to improve
performance.  I'm just pointing out that it _does_ violate the
standards of both Fortran and C, which are defined to be separately
compiled.  There are a lot of _non-standard_ ways to improve
performance.  But it is always a mistake to allow someone to
claim that any of these is really standard after all.

Finally, I don't have access to any production quality compilers
which do interprocedural analysis for either Fortran or C.  Fortran
doesn't need the feature as much as C does.

J. Giles

jlg@cochiti.lanl.gov (Jim Giles) (04/19/91)

In article <15881@smoke.brl.mil>, gwyn@smoke.brl.mil (Doug Gwyn) writes:
|> [...]                                         As I and others
|> have recently told you, sometimes this must be assumed and
|> other times it need not be, depending on context.

You can tell me that until you are blue in the face.  I have never seen
any of these hypothetical contexts in which anonymous pointer arguments
passed in as parameters can be safely assumed not to be aliased.
Please give an example.  Please state why you consider the example to
be representative of a common case in C.  Since I have never seen
such, I suspect that if an example exists it is _very_ obscure.

|> [... aliasing of pointers to mixed type ...]
|> 
|> We know you're a Fortran bigot, but that's ridiculous.

What's ridiculous about it?  Even the C standard says it's illegal.
Why can't I complain about it?  It is a _very_ common cause of errors.

Finally, what has this particular complaint have to do with Fortran?
I would have made the same comment if I had been comparing C to _any_
well designed language (Pascal, Ada, Scheme, ...).

J. Giles

jlg@cochiti.lanl.gov (Jim Giles) (04/19/91)

> [...]
> The C standard doesn't say Thou Must Compile Separately.  [...]

No.  As far as I know, it doesn't use the word "Thou" at all.
The standard _does_ say:

    A C program need not all be translated at the same time.  The
    text of a program is kept in units called _source_files_ in 
    this standard, and previously translated files may be preserved 
    individually or in libraries.  The separate files of a program 
    communicate by calls to functions whose identifiers have external
    linkage, and by manipulation of objects whose identifiers have
    external linkage.  Translated files may be linked with previously
    translated libraries to produce an executable program.

Note: the standard specifically says that files may be compiled
in any order and that the only standard communication between files
is through external variables and functions.

J. Giles

henry@zoo.toronto.edu (Henry Spencer) (04/19/91)

In article <21812@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>|> [...]                                           You do interprocedural
>|> analysis across all source files supplied to a single invocation of the
>|> compiler,  [...]
>
>And, if you subsequently change _some_ of those source files?  What 
>then?  Under this model, you'd have to recompile _all_ the others 
>that were originally compiled together...

Correct.  Arranging to do this is pretty trivial with something like make.
Of course, having to do it is a nuisance.  Using some sort of intermediate
form as the basis for interprocedural analysis could bypass *some* of the
overhead, but having to reprocess all the relevant code to some extent is
really rather fundamental to interprocedural optimization.

>... I'm just pointing out that it _does_ violate the
>standards of both Fortran and C, which are defined to be separately
>compiled...

I can't speak for Fortran, but please cite chapter and verse out of X3.159
when you say this about C.  You are, simply, wrong.  You are confusing the
characteristics of specific implementations with what is mandated by the
standard.  There is *nothing* in ANSI C which demands classical separate
compilation or prevents interprocedural analysis.
-- 
And the bean-counter replied,           | Henry Spencer @ U of Toronto Zoology
"beans are more important".             |  henry@zoo.toronto.edu  utzoo!henry

henry@zoo.toronto.edu (Henry Spencer) (04/19/91)

In article <21818@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>    ... Translated files may be linked with previously
>    translated libraries to produce an executable program.

Note that there is no definition of what "translation" means.  It could
mean something as trivial as tokenizing the source and storing it for
later processing.  It does not have to mean classical compilation.

>Note: the standard specifically says that files may be compiled
>in any order and that the only standard communication between files
>is through external variables and functions.

No, it says that they may be *translated* in any order, which does
not necessarily mean compilation.  As for the communication, note the
discussion in 2.1.2.3, which basically says that the implementation
can do anything it pleases so long as the externally-visible results
of running the program remain much the same.  Portable user code cannot
rely on other communication paths, but the implementation can use any
information it can get.
-- 
And the bean-counter replied,           | Henry Spencer @ U of Toronto Zoology
"beans are more important".             |  henry@zoo.toronto.edu  utzoo!henry

barmar@think.com (Barry Margolin) (04/19/91)

In article <21812@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>In article <1991Apr17.225944.15261@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:
>|> [...]                                           You do interprocedural
>|> analysis across all source files supplied to a single invocation of the
>|> compiler,  [...]
>
>And, if you subsequently change _some_ of those source files?  What
>then?  Under this model, you'd have to recompile _all_ the others
>that were originally compiled together.  This violates one of the
>principal reasons to have separate compilation to begin with - the
>elimination of the need to recompile most of your code every time
>you change a piece of it.

When one is in a heavy edit-compile-test loop, one normally disables many
of the optimizations that slow down the compile step.  So, during this
stage of development you probably wouldn't use the inter-module analysis
option, and would only have to recompile the modules that you edited.

When you get to the stage of compiling with all the optimization options,
then you might have to recompile lots of files each time you change just
one.  Hopefully, by this time in the development such changes should be
rare.

>  I'm just pointing out that it _does_ violate the
>standards of both Fortran and C, which are defined to be separately
>compiled.  There are a lot of _non-standard_ ways to improve
>performance.  But it is always a mistake to allow someone to
>claim that any of these is really standard after all.

I interpreted the section of the standard that you quoted in another
posting as *allowing* separate compilation.  This means that the programmer
isn't forced to compile everything together in order to achieve the
semantics specified by the language.  However, this doesn't prohibit an
implementation from providing extensions that compile a group of source
files together as a batch.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

jlg@lanl.gov (Jim Giles) (04/19/91)

In article <1991Apr18.190403.29049@Think.COM>, barmar@think.com (Barry Margolin) writes:
> [...]
> I interpreted the section of the standard that you quoted in another
> posting as *allowing* separate compilation.  This means that the programmer
> isn't forced to compile everything together in order to achieve the
> semantics specified by the language.  [...]

The standard _allows_ the _USER_ to decide whether to compile separately
or together.  In the standard, the word "may" applies to user choices,
the corresponding constraint on implementations is that it _must_ work
correctly when the user chooses to exercise the feature.  In this case,
if the user chooses only to recompile one file and not any of the others
(which is allowed by the "may" clause), the implementation _must_ allow
this to be linked with the previously translated files to produce a
program.

Sorry, those are the rules.  These ANSI standards describe capabilities
of the language - the requisite properties of an implementation must
be inferred from that.  And, as I said, if the standard says you _may_
do something, the implementation _must_ permit you to do it.

J. Giles

jlg@lanl.gov (Jim Giles) (04/19/91)

In article <1991Apr18.185442.7546@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:
> [...]
> Note that there is no definition of what "translation" means.  It could
> mean something as trivial as tokenizing the source and storing it for
> later processing.  It does not have to mean classical compilation.

Echoing my own previous arguments is not going to get you anywhere.
I have previously stated that if you move the "backend" processes
of compilation into a post-compilation tool (like the loader), then
you could do interprocedural analysis and still remain standard
conforming.  I also said that this is not what you were recommending,
and that I had seen no production quality compiler/loader environments
fitted to do this.

If you want now to recommend this implementation, as opposed to the
non-conforming method of dependent compilation you were previously
recommending, then I have no quarrel with you.

J. Giles

cs450a03@uc780.umd.edu (04/19/91)

Joe English writes:
   [about inter-procedural optimization]
>What happens, then, if you compile foo.c and bar.c together, and
>later change and recompile bar.c alone?  foo.o (or its equivalent)
>might have been optimized to use information that's no longer valid.
>Does the Standard say that it's OK for an implementation to require
>recompilation of foo.c at this point?

It might be worth pointing out that if a compiler was designed with
the option of doing inter-procedural analysis, and if that option were
used, there would be no need to have generated a "foo.o" in the first
place.

Note that I'm not saying that there would be no intermediate files,
just that they should not be confusable with intermediate files used
for independent compilation.

Also note that I'm not saying that the compiler couldn't generate
"foo.o" -- maybe somebody would throw in a switch to gratuiously
create independent object files even though compiling with
inter-procedural optimization turned on.

Finally, note that I know of no such compiler, and I'm not about to
write one.

Raul Rockwell

henry@zoo.toronto.edu (Henry Spencer) (04/19/91)

In article <21848@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>If you want now to recommend this implementation as opposed to the
>non-conforming method of dependent compilation you were previously
>recommending, then i have no quarrel with you.

Either method is fully, 100% conforming.  There is nothing non-conforming
about a compiler that does certain optimizations only in favorable situations,
e.g. when it is compiling all source modules involved.  It has to be able to
cope with separately-translated files, but that does not require that it
translate each and every file independently.  It is perfectly entitled to
notice that it is being asked to process more than one file, and to do its
optimizations on all of them together rather than each one separately.  The
only constraint is that certain aspects of the external behavior of the
resulting program must not vary with such optimizations.
-- 
And the bean-counter replied,           | Henry Spencer @ U of Toronto Zoology
"beans are more important".             |  henry@zoo.toronto.edu  utzoo!henry

henry@zoo.toronto.edu (Henry Spencer) (04/19/91)

In article <21846@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>if the user chooses only to recompile one file and not any of the others
>(which is allowed by the "may" clause), the implementation _must_ allow
>this to be linked with the previously translated files to produce a
>program.

Of course, it may not run as fast as if he had compiled them all together.
-- 
And the bean-counter replied,           | Henry Spencer @ U of Toronto Zoology
"beans are more important".             |  henry@zoo.toronto.edu  utzoo!henry

jlg@cochiti.lanl.gov (Jim Giles) (04/19/91)

In article <1991Apr18.233807.19552@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:
|> In article <21846@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
|> >if the user chooses only to recompile one file and not any of the others
|> >(which is allowed by the "may" clause), the implementation _must_ allow
|> >this to be linked with the previously translated files to produce a
|> >program.
|> 
|> Of course, it may not run as fast as if he had compiled them all together.

I see that I'm going to have to give a specific example.  Suppose
that functions A and B are in two separate files.  Suppose that
I compile them 'together' in some way (that is, the compiler makes
some use of internal knowledge of B while translating A and vice-versa).
Now, suppose I _change_ A and retranslate just A (I do _not_ retranslate
B).  The standard requires that I still be able to link both into a
single program.  If the translation of B made any _profitable_ use
of internal information about A, then that information is obsolete
(and probably downright _wrong_) after A has been changed.  Yet, the
standard requires that the new version of A still be usable with the
previously translated B.  Either the use the translator made of the
interprocedural analysis was trivial, or B will contain unsafe
optimizations; the only other possibility is that 'translation' does not
include code generation (which will subsequently be done by some
post-translation tool, like the loader - just as I keep saying).

I don't understand why you continue to hang on like grim death
to the fiction that dependent compilation satisfies the requirements
of the standard.

J. Giles

jeenglis@alcor.usc.edu (Joe English) (04/19/91)

henry@zoo.toronto.edu (Henry Spencer) writes:
>  There is nothing non-conforming
>about a compiler that does certain optimizations only in favorable situations,
>e.g. when it is compiling all source modules involved.  It has to be able to
>cope with separately-translated files, but that does not require that it
>translate each and every file independently.  It is perfectly entitled to
>notice that it is being asked to process more than one file, and to do its
>optimizations on all of them together rather than each one separately.

What happens, then, if you compile foo.c and bar.c together, 
and later change and recompile bar.c alone?  foo.o (or its equivalent)
might have been optimized to use information that's no longer valid.
Does the Standard say that it's OK for an implementation to 
require recompilation of foo.c at this point?

(My *guess* is that, yes, an implementation is allowed to say "you 
must recompile foo.c" or even generate an incorrect program in
a case like this; as far as I know the C Standard has very little
to say about the programming environment as far as revision control
is concerned.  Ada has rules to handle this sort of situation,
but C doesn't seem to.  Then again, I don't own, and have never even read,
the ANSI C Standard so this is all conjecture.)

>The
>only constraint is that certain aspects of the external behavior of the
>resulting program must not vary with such optimizations.

If that's the case, and given that 'make' and similar utilities
don't keep track of the compilation history, it would be extremely
difficult to do any kind of cross-file interprocedural optimizations
in a conforming C implementation for a conventional (e.g.,
Unixlike) environment.  I have to agree (as painful as it is to do
so) with Jim Giles on this point.



--Joe English

  jeenglis@alcor.usc.edu 
  Apologies in advance to Doug Gwyn et al. for stating an opinion about the 
  Standard without having read it.

bhoughto@bishop.intel.com (Blair P. Houghton) (04/19/91)

In article <21812@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>In article <1991Apr17.225944.15261@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:
>|> [...] You do interprocedural analysis across all source files
>|>supplied to a single invocation of the compiler,  [...]
>
>And, if you subsequently change _some_ of those source files?  What 
>then?  Under this model, you'd have to recompile _all_ the others 
>that were originally compiled together.  This violates one of the 
>principle reasons to have separate compilation to begin with - the 
>elimination of the needto recompile most of your code every time 
>you change a piece of it.

ANSI X3.159-1989, sec. 2.1.2.3, p. 8, l. 30:
"...issues of optimization are irrelevant."

The only reason you need to recompile (in this situation,
from the information given) is to provide the compiler
enough information to optimize a certain subset of the
syntax.  If you want optimizations of this sort, you must:

    (a) use a compiler which provides this optimization;
    (b) write a makefile that indicates the dependencies
	inherent in this optimization; and,
    (c) run make(1) instead of cc(1).

None of these things is specified by the standard, but neither are
any of them prohibited from being used in concert with a conforming
implementation of the C language.

Basically, if you want to do something outside the realm of
all sensibility, you may have to hold its hand.

				--Blair
				  "And you'll _still_ spend less time each
				   day recompiling source files than you do
				   shaking your penis dry at the urinal."

steve@groucho.ucar.edu (Steve Emmerson) (04/19/91)

In <21868@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:

>I see that I'm going to have to give a specific example.  Suppose
>that functions A and B are in two separate files.  Suppose that
>I compile them 'together' in some way (that is, the compiler makes
>some use of internal knowledge of B while translating A and vice-versa).
>Now, suppose I _change_ A and retranslate just A (I do _not_ retranslate
>B).  The standard requires that I still be able to link both into a
>single program.  ...

I don't believe the C Standard requires that a compiler capable of the
above generate the separate "object" files A.o and B.o when they are
somehow "compiled together".  My understanding is that it could instead
generate the "object" file AB.o.  Thus, your problem can't occur: you
can't link A.o against a re-compiled B.o *unless* you generated A.o
separately as well.

Steve Emmerson        steve@unidata.ucar.edu        ...!ncar!unidata!steve

bhoughto@bishop.intel.com (Blair P. Houghton) (04/19/91)

In article <21818@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>Note: the standard specifically says that files may be compiled
>in any order and that the only standard communication between files
>is through external variables and functions.

They sure may, but why is it so horrible to you to document
to the users of your optimized libraries that certain
arguments to certain functions must be carefully
non-aliased?  That kind of restriction stopped nobody from
specifying a standard function that defines and returns a
static object (thus making more than one call to the
function a risky construct, at best...)

				--Blair
				  "How does Yogi eat all that
				   honey right out of the tree
				   without getting stung?"

bhoughto@bishop.intel.com (Blair P. Houghton) (04/19/91)

In article <21868@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>I don't understand why you continue to hang on like grim death
>to the fiction that dependent compilation satisfies the requirements
>of the standard.

I haven't seen anyone state such a thing.  What I have
seen is plenty of statements that equate to the standard's
requirement that a conforming program be translatable by
a conforming implementation into a form that can be used
to produce the output associated with a given input.  It's
just tough luck if this prevents you from expecting
a certain optimization to be applied when you compile
in strictly conforming mode.

This has absolutely no effect on the form of the code in
the source files.  Exactly the opposite.  It provides for
the desired state where a program can be translated on any
number of machines without alteration of the source files.

If, however, it's provided you with extensions that allow
you to produce odd effects, and you use them, then you
are exceeding the bounds of the standard's influence, and
whatever is required by the extension you must obey or
it just won't work, apparently.  This still has no
effect on the source code.  You haven't stated that
special syntax must be provided in the source files
in order to use this optimization you so desire, so
there's no reason that my strictly conforming compiler
that doesn't optimize in this way and your specialized
compiler that does optimize in this way shouldn't be
able to produce the same output.

You're so hyped up about the fact that separate compilation
prevents use of this optimization, but you haven't been
so vocal about the case where you deliberately alias one
of your arrays, which also blows your optimization.

If you want to go out of your way, you have to go out of your way.

				--Blair
				  "If the implementation also has
				   knobby tires, a half-fairing, and a
				   bottle of diethyl ether, then it's
				   also an offroad crotch-rocket, but
				   it better do the -ansi thing for
				   `main(){printf("Hello, world.\n");}'"

barmar@think.com (Barry Margolin) (04/19/91)

In article <21846@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>The standard _allows_ the _USER_ to decide whether to compile separately
>or together.  In the standard, the word "may" applies to user choices,
>the corresponding constraint on implementations is that it _must_ work
>correctly when the user chooses to exercise the feature.  In this case,
>if the user chooses only to recompile one file and not any of the others
>(which is allowed by the "may" clause), the implementation _must_ allow
>this to be linked with the previously translated files to produce a
>program.

Once you invoke the implementation's extension that does inter-module
optimization (why does everyone keep calling this interprocedural analysis?
that usually refers to optimization across procedures within the same
file), you have ventured outside the scope of the standard, just as if you
had invoked a compiler option that enables some implementation-specific
feature.

Rather than referring specifically to files, the standard probably should
have described the behavior in terms of "translation units", which could
encompass multiple source files.  This would have allowed multifile
compilation to be defined in the standard, rather than making it an
extension, and then the semantics would be covered by the section you
quoted.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (04/19/91)

Anyone who isn't sure about the issues involved here should keep the
following in mind while reading Jim's articles: A standard-conforming
compiler can include foo.c as part of foo.o. When it links foo.o and
bar.o together to make a program, it still has foo.c and bar.c right
there inside the object files, so it can do all types of interprocedural
analysis.

In practice, compilers don't include *all* of foo.c and bar.c---only
some useful information about the procedure calls and variable use. This
still conforms with the standard.

Jim, I recommend that you look at the MIPS C compilers.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (04/19/91)

In article <21815@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
> You can tell me that until you are blue in the face.  I have never seen
> any of these hypothetical contexts in which anonymous pointer arguments
> passed in as parameters can be safely assumed not to be aliased.
> Please give an example.

(``Anonymous pointer arguments''?)

sort(table,records,n)
int *table;
struct blah *records;
int n;

This routine sorts n records, setting table[i] to the position of the
i'th smallest record. This is, in fact, a very common type of sort,
hardly a ``hypothetical'' or contrived example. I recommend that you
read Knuth to learn about sorting and its many applications.

Since neither ``int'' nor ``struct blah'' is ``char'' or ``void'', a
standard compiler may assume that the pointers are not aliased.
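A minimal body for that routine, to make the example concrete (the
key field and the simple selection strategy are invented for
illustration):

	struct blah { int key; /* ...other fields... */ };

	/* Sets table[i] to the index of the i'th smallest record.
	 * Since table points to int and records to struct blah, a
	 * standard compiler may assume the two arrays never overlap. */
	void sort(int *table, struct blah *records, int n)
	{
	    int i, j;
	    for (i = 0; i < n; i++)
	        table[i] = i;
	    for (i = 0; i < n - 1; i++)      /* simple selection sort */
	        for (j = i + 1; j < n; j++)
	            if (records[table[j]].key < records[table[i]].key) {
	                int t = table[i];
	                table[i] = table[j];
	                table[j] = t;
	            }
	}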

(I don't expect Jim to respond to this article; he always ignores
counterexamples.)

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (04/19/91)

In article <21703@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
> The only information available to the compiler about
> procedures that are separately compiled are the names of routines
> that the current procedure calls and the number and type of their
> arguments and results (if any).

This is incorrect. The standard makes no special allowance for ``the
names of routines that the current procedure calls and [etc.].'' Each
C source file has a *meaning* dependent solely upon the information in
that file and its inclusions. Nevertheless, optimizations do not change
that meaning and may depend upon other information.

> For a language which requires
> separate compilation, _NO_OTHER_INFORMATION_ is known about an
> external procedure.

This is incorrect. Again, the meaning of a source file depends solely
upon that source file, prototypes in #includes, etc., but lots of other
information may be known to the compiler.

> The reason is that _all_ other characteristics
> of an external procedure are allowed to _change_ without the need to
> recompile any uses of that procedure.

This is incorrect. Compilation consists of translation (like foo.c
becoming foo.o) and loading (like foo.o and bar.o becoming a.out).
The *translation* of one file is independent of other files, but the
*loading* depends on all the files together.

> The bottom line is that separate compilation implies that procedures
> may be used together for which the _source_ of the separate routines
> has never even existed in the same programming environment and only
> the object code has been distributed.

This is incorrect. The ANSI C standard requires separate compilation. It
does not define ``the same programming environment'' or ``distributed.''
(As I pointed out in my previous article, a conforming compiler may
produce object code that includes the entire source code, so Jim's
statement is obviously wrong.)

> Further, even the object code
> of each may not be available at the time each of the other procedures
> is compiled.

This is incorrect. Compilation includes loading, and loading requires
all object code for the program. It's true that the other objects may
not be available at *translation* time, but optimizations aren't
restricted to translation time.

> Hence, under separate compilation there may be no internal
> information about other procedures used when a given procedure is compiled
> because such information may not exist, may be speculative, or may 
> subsequently change without notice.

This is incorrect. The correct information may be carried along with
each object file.

---Dan

jlg@cochiti.lanl.gov (Jim Giles) (04/19/91)

> I don't believe the C Standard requires that a compiler capable of the
> above generate the separate "object" files A.o and B.o when they are
> somehow "compiled together".  My understanding is that it could instead
> generate the "object" file AB.o.  Thus, your problem can't occur: you
> can't link A.o against a re-compiled B.o *unless* you generated A.o
> separately as well.

Well, my example was the other way around (I postulated retranslating
A and not B), but I'll use your version just as well.  The standard says
that I _may_ link a fresh translation of B with previously translated
files (like A, for example).  It does _not_ say that I can only do this
if A was not previously translated with an older version of B, or anything
to that effect.  (Note again: anything the standard says I _may_ do,
the implementation _must_ permit me to do.)

To reiterate, the standard says I may link translated files to previously
translated ones to form programs.  It places no constraints on this.  As
long as each translated file is itself standard conforming, linking them
together is a capability required by the standard.

Now, it _does_ make linkage of translated units based on the definition
of the units as "source files".  So, I would read that to mean that
interprocedural analysis and optimization _is_ allowed between functions
defined in the _same_ source file.  This is contrary to the standard
practice of putting function definitions into separate files as much
as is possible.  Further, I've not seen any compilers which take advantage
of this method of optimization.  In fact, Pascal did not even allow 
separate compilation and I never saw a Pascal compiler which did inter-
procedural analysis either - compiler technology just hasn't quite got
to the point where such analysis is commonplace in production
compilers.  When it does become commonplace, care must be exercised
in applying it to C and still claiming standard conformance.

J. Giles

henry@zoo.toronto.edu (Henry Spencer) (04/19/91)

In article <16692@chaph.usc.edu> jeenglis@alcor.usc.edu (Joe English) writes:
>>  There is nothing non-conforming
>>about a compiler that does certain optimizations only in favorable situations,
>>e.g. when it is compiling all source modules involved...
>
>What happens, then, if you compile foo.c and bar.c together, 
>and later change and recompile bar.c alone?  foo.o (or its equivalent)
>might have been optimized to use information that's no longer valid.

Why do you assume that foo.o even exists?  The standard requires that it
exist at some point during the compilation of foo.c+bar.c, but it could
be only a compiler-internal temporary, with foobar.o being generated as
the result of the compilation.  Actually, there might not even be a .o
file, just the final a.out.  (Remember that the standard allows things
like C interpreters, so it is very carefully worded to place very few
constraints on how the results of translation show up.)

>Does the Standard say that it's OK for an implementation to 
>require recompilation of foo.c at this point?

I would tentatively interpret that as being legitimate only if foo.o was
not actually made available to the user.  Otherwise you would really have
to bend the rules to dodge around the "you can compile things separately
and link them later" part.  It's not actually breaking the rules, because
all they say is that there must be *a way* to do it, and maybe the way is
to compile files one at a time, but it's not going to endear the compiler
writer to his customers.

>>The
>>only constraint is that certain aspects of the external behavior of the
>>resulting program must not vary with such optimizations.
>
>If that's the case, and given that 'make' and similar utilities
>don't keep track of the compilation history, it would be extremely
>difficult to do any kind of cross-file interprocedural optimizations
>in a conforming C implementation for a conventional (e.g.,
>Unixlike) environment...

Why so?  When I say "external behavior", I mean of the generated code,
not of the compiler.  There are any number of ways you can contrive to
make things work, the simplest being the foobar.o approach that denies
any opportunity to make them not work.

You'd have to modify the Makefiles to feed such a compiler properly.
It's often necessary to modify Makefiles for the sake of odd compilers.
ANSI C does not place any constraints on that. :-)
-- 
And the bean-counter replied,           | Henry Spencer @ U of Toronto Zoology
"beans are more important".             |  henry@zoo.toronto.edu  utzoo!henry

jlg@cochiti.lanl.gov (Jim Giles) (04/19/91)

> [...]       (why does everyone keep calling this interprocedural analysis?
> that usually refers to optimization across procedures within the same
> file) [...]

Oh.  Quite so.  Only the common practice of C programmers putting each
function in its own separate file makes the mistake doubly easy to fall
into.  Not to mention the obvious fact that the implementation details
of interprocedure and intermodule analysis would be almost identical
from the implementor's point of view.


> Once you invoke the implementation's extension that does inter-module
> optimization [...]
> [...] you have ventured outside the scope of the standard, just as if you
> had invoked a compiler option that enables some implementation-specific
> feature.

That's what I keep trying to point out.  Thank heavens for someone 
finally joining the discussion who understands the issue.  Further,
I agree that the standard could have been just subtly reworded to 
eliminate the objection just stated.  But, the fact remains that it
was worded the way it was (and probably for a reason - vendors would
like to continue to be able to distribute only object code and if
large pieces of the object code are interdependent, this would make
incremental fixes and distributions of such much more difficult).

J. Giles

dbrooks@osf.org (David Brooks) (04/20/91)

In article <21868@lanl.gov>, jlg@cochiti.lanl.gov (Jim Giles) writes:
|> I see that I'm going to have to give a specific example.  Suppose
|> that functions A and B are in two separate files.  Suppose that
|> I compile them 'together' in some way (that is, the compiler makes
|> some use of internal knowledge of B while translating A and vice-versa).
|> Now, suppose I _change_ A and retranslate just A (I do _not_ retranslate
|> B).  The standard requires that I still be able to link both into a
|> single program.

This has never been the case.  Using pre-ANSI syntax:

B:

	some_routine(123);

A:

	some_routine(arg)
	int arg;
	{...

A after editing:

	some_routine(flag, arg)
	int flag, arg;
	{...
-- 
David Brooks				dbrooks@osf.org
Systems Engineering, OSF		uunet!osf.org!dbrooks
Donnie Wahlberg is brought to you by three billion years of evolution.

steve@groucho.ucar.edu (Steve Emmerson) (04/20/91)

In <21961@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:

>To reiterate, the standard says I may link translated files to previously
>translated ones to form programs.  It places no constraints on this.  As
>long as each translated file is itself standard conforming, linking them
>together is a capability required by the standard.

Hmm...  it appears you didn't understand  what I was trying to say so I
must not have stated it clearly enough.  Let me try again.

In my opinion the following two scenarios do not violate the standard
and yet allow for global-optimization *and* incremental
compilation (though not necessarily at the same time).

Scenario #1 (demonstration of global-optimization):

	$ cc -O3 -o foobar.o -c foo.c bar.c
	$ cc foobar.o

Scenario #2 (demonstration of the *required* functionality of incremental
compilation):

	$ cc -O -c foo.c
	$ cc -O -c bar.c
	$ cc foo.o bar.o

Since, in the above, one can't do both global-optimization and
incremental compilation at the same time, I suppose one could argue
that the above implementation doesn't conform to the standard (at least
when performing global-optimization).  This might be true, but I
consider it unimportant because, to me at least, the important issues
are

	1) whether or not the *language* disallows global-optimization 
	   (it does not); and 

	2) whether or not the standard *forbids* behavior or
	   capabilities outside its scope (it cannot).

I'll second Barry's suggestion about the MIPS C compiler.

Steve Emmerson        steve@unidata.ucar.edu        ...!ncar!unidata!steve

jlg@cochiti.lanl.gov (Jim Giles) (04/20/91)

It has been suggested that a program of the following type constitutes
a counterexample to my previous statements about pointers and aliasing:

func(i,j)
char *i;
float *j;
...

However, my original statement about pointers and aliasing was that
pointer arguments to the _same_ underlying type must be assumed to be
aliased to each other and to all globals of that type (as well as any
local object of that type whose address has possibly been assigned
to one of these visible pointers, etc.).

I admit that I also (and distinctly separately) mentioned that a 
prudent C compiler would also assume aliasing in many other cases 
because of the common practice that C programmers seem to covet of 
casting pointers in illegal ways.  It is only in that sense that 
the above type of program is even relevant to anything I have been 
saying. In fact, I do believe that a prudent C implementation should 
assume that i and j in the above are aliased.  But, since such aliasing 
is illegal, it does not trouble me that a standard conforming C compiler
is free to assume that they are _not_ aliased.  Either way, it does not
affect my main point.

J. Giles

jlg@cochiti.lanl.gov (Jim Giles) (04/20/91)

> You're so hyped up about the fact that separate compilation
> prevents use of this optimization, but you haven't been
> so vocal about the case where you deliberately alias one
> of your arrays, which also blows your optimization.

Since I _never_ deliberately alias array arguments either to each
other or to globals to which the procedure I'm calling has access,
this is not an issue to me.  In _most_ code (not just mine) _most_
of the arrays that you pass around as parameters are not aliased
and _most_ C compilers forgo a perfectly valid optimization in
order to cater to the much rarer case where someone had "deliberately"
aliased something.  Deliberate aliasing happens all the time in
processing recursive data structures (like linked lists, trees, graphs,
etc.) but for array manipulation it is almost always a serious error -
no matter how carefully the compiler works.  It would be nice if C
let me pass arrays to procedures _without_ converting them to pointers
and it would be nice if the compiler were free to assume that array
parameters were not aliased to each other or to any globals.  If they
become aliased to any locals, that could only happen through locally
visible action that the compiler can see without interprocedure
analysis.
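
To make the forgone optimization concrete, here is a small hypothetical
example (the function is invented for illustration):

	/* If dst and scale may alias, *scale must be re-read on every
	   iteration, since dst[i] might have just overwritten it.  If the
	   compiler could assume no aliasing, it would hoist the load into
	   a register once, before the loop. */
	void scale_all(double *dst, double *scale, int n)
	{
	    int i;
	    for (i = 0; i < n; i++)
	        dst[i] *= *scale;
	}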

J. Giles

jlg@cochiti.lanl.gov (Jim Giles) (04/20/91)

> [...]     but why is it so horrible to you to document
> to the users of your optimized libraries that certain
> arguments to certain functions must be carefully
> non-aliased?  [...]

It's not horrible.  You miss my point if you believe that I would
oppose that.  You give me a portable way to declare to the compiler
that two (or more) arguments are assumed not to be aliased and I will 
rejoice.

The problem is that what people are recommending is _not_ portable
and is _not_ an explicit declaration (so the end user has to guess 
which variables the compiler optimized in this way and which are 
safe to alias).

J. Giles

jlg@cochiti.lanl.gov (Jim Giles) (04/20/91)

> This has never been the case.  Using pre-ANSI syntax:
> B:
> 	some_routine(123);
>
> A:
> 	some_routine(arg)	{...
>
> A after editing:
>	some_routine(flag, arg)

Irrelevant to this discussion since it was always clearly understood
(and clearly stated at the beginning of the discussion) that the valid
means of communication between separately translated units were external
functions and external variables.  If you introduce a change which makes
these incompatible, you have stepped outside the domain of this discussion.

The main purpose of separate compilation is two-fold: to save compilation
time when only part of the code has been modified and to allow a vendor
to alter his libraries for revisions without having to distribute the
_whole_ library again or distribute source.  Presumably, incompatible
revisions require more effort than the more common compatible ones.

J. Giles

mmcg@bruce.cs.monash.OZ.AU (Mike McGaughey) (04/20/91)

bhoughto@bishop.intel.com (Blair P. Houghton) writes:

>In article <21868@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>>I don't understand why you continue to hang on like grim death
>>to the fiction that dependent compilation satisfies the requirements
>>of the standard.

>I haven't seen anyone state such a thing.  What I have
>seen is plenty of statements that equate to the standard's
>requirement that a conforming program be translatable by
>a conforming implementation into a form that can be used
>to produce the output associated with a given input.  It's
>just tough luck if this prevents you from expecting
>a certain optimization to be applied when you compile
>in strictly conforming mode.

And, in fact, it doesn't - if your system wants to do 'serious'
interprocedural analysis while still allowing separate, order
independent compilation, there's nothing to stop it from adding an
extra table to the object file describing the properties of that object
file, and using a clever linker (post-compiler?) to sort all that
information out.

Any specification that non-static names are the only 'standard'
inter-module information available applies only to C source, not object
modules - after all, who ever expected object code to be portable?  All
we know is that they must support *at least* the information required
by C.  There's no reason why .o files can't consist of pure RTL.  Or C
source code, for that matter.  Although that *would* require a better
linker :).

Not that I've read the ANSI spec - I'm too poor.

Cheers,

    Mike.

ps: the MIPS compilers (which are pretty good :) provide this option -
you can produce ucode files instead of .o files (independently); the
ucode linker provides global optimisation.
-- 
Mike McGaughey			AARNET:	mmcg@bruce.cs.monash.oz.au

"His state is kingly; thousands at his bidding speed,
 And post o'er land and ocean without rest."		- Milton.

rjc@cstr.ed.ac.uk (Richard Caley) (04/20/91)

In article <21868@lanl.gov>, Jim Giles (jg) writes:

jg> Suppose
jg> that functions A and B are in two separate files.  Suppose that
jg> I compile them 'together' in some way (that is, the compiler makes
jg> some use of internal knowledge of B while translating A and vice-versa).
jg> Now, suppose I _change_ A and retranslate just A (I do _not_ retranslate
jg> B).  The standard requires that I still be able to link both into a
jg> single program.  If the translation of B made any _profitable_ use
jg> of internal information about A, then that information is obsolete
jg> (and probably downright _wrong_) after A has been changed.  Yet, the
jg> standard requires that the new version of A still be usable with the
jg> previously translated B.  Either the use the translator made of the
jg> interprocedural analysis was trivial, or B will contain unsafe
jg> optimizations - the only third way is if 'translation' does not
jg> include code generation (which will subsequently be done by some
jg> post-translation tool: like the loader - just like I keep saying).

Fourth option:

When compiling `together' the compiler creates object files
containing two complete sets of code, data etc. The linker looks at
the object files given and if they were not all compiled together (eg
if you had recompiled one) it uses the non-optimised code.

I'm sure the net.experts can come up with much better solutions, this
is simply the one which was most obvious to me.

--
rjc@cstr.ed.ac.uk		_O_
				 |<

mmcg@bruce.cs.monash.OZ.AU (Mike McGaughey) (04/20/91)

jlg@cochiti.lanl.gov (Jim Giles) writes:

>> Once you invoke the implementation's extension that does inter-module
>> optimization [...]
>> [...] you have ventured outside the scope of the standard, just as if you
>> had invoked a compiler option that enables some implementation-specific
>> feature.

>That's what I keep trying to point out.  Thank heavens for someone 
>finally joining the discussion who understands the issue.

A pity you don't.  All that the above means is that ANSI C doesn't force
you to optimise code.  As plenty of people have pointed out, it doesn't
force you *not* to optimise code either, nor does it limit the extent
of your optimisations.  You just optimise *after* the C-to-intermediate-
code translation is done.

Mike.
-- 
Mike McGaughey			AARNET:	mmcg@bruce.cs.monash.oz.au

"His state is kingly; thousands at his bidding speed,
 And post o'er land and ocean without rest."		- Milton.

aipdc@castle.ed.ac.uk (Paul Crowley) (04/21/91)

In article <21815@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>I would have made the same comment if I had been comparing C to _any_
>well designed language (Pascal, Ada, Scheme, ...).

C is bad. Therefore Pascal is good.

Pass the sick bag.

Actually, why is it that reading comp.lang.c is the one thing sure to
put me off C and start me looking for a language that I like?  I keep
telling people it's possible to program well, and even to prove your
programs, in C, but the more I read this group the less sure I get. 
Aren't there any good languages out there?
                                         ____
\/ o\ Paul Crowley aipdc@castle.ed.ac.uk \  /
/\__/ Part straight. Part gay. All queer. \/

gwyn@smoke.brl.mil (Doug Gwyn) (04/21/91)

In article <1991Apr18.233653.19435@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>There is nothing non-conforming about a compiler that does certain
>optimizations only in favorable situations, ...

Note that WITHIN a translation unit various "interprocedural" optimizations
may be performed to good effect, including identification of alias-free
pointer arguments.  One of the C compilers I routinely use goes so far as
to expand small functions "in-line" in order to avoid function linkage
overhead.
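
For instance (a hypothetical illustration, not from any particular
compiler), a file-scope function whose every call site is visible can
be expanded in-line and its pointer parameters proven distinct:

	static void saxpy(double *y, double *x, double a, int n)
	{
	    int i;
	    for (i = 0; i < n; i++)
	        y[i] += a * x[i];   /* loads of x[i] can be scheduled
	                               freely once y and x are known
	                               to be distinct */
	}
	/* If every caller in this translation unit passes provably
	   distinct arrays, the compiler may both inline saxpy() and
	   optimize it as if there were no aliasing at all. */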

gwyn@smoke.brl.mil (Doug Gwyn) (04/21/91)

In article <3906@inews.intel.com> bhoughto@bishop.intel.com (Blair P. Houghton) writes:
>You're so hyped up about the fact that separate compilation
>prevents use of this optimization, but you haven't been
>so vocal about the case where you deliberately alias one
>of your arrays, which also blows your optimization.

Which is one reason why this is not a C-specific issue.
The fact that the Fortran standard prohibits a program from
calling subroutines with arguments that are overlapping array
sections, thereby enabling a certain class of optimizations
(loop vectorization), means that a class of useful program
constructs is also outlawed.  Any reasonable language that
supports pointer arguments needs to deal with this issue one
way or another; Fortran came down on the side of restricting
the programmer, while C came down on the side of empowering
the programmer.  Due to these stupid "Fortran vs. C" arguments,
C implementors have looked very hard at ways to perform
Fortran-like optimizations for C pointer parameter usage, and
have devised a number of techniques for enabling the
optimization in many circumstances.  Cray's Standard C
Compiler release 2.0 is an example of a C compiler that
performs extensive optimization, including loop vectorization.
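
A tiny hypothetical example of the hazard:

	/* Serial semantics: each iteration uses b[i] as it is *now*. */
	void add_one(double *a, double *b, int n)
	{
	    int i;
	    for (i = 0; i < n; i++)
	        a[i] = b[i] + 1.0;
	}
	/* The call add_one(v + 1, v, 99) makes a[i] the same object as
	   b[i+1]; run serially the loop propagates new values down the
	   array, while a vectorized (load-everything-then-store) version
	   would not.  Fortran outlaws such a call; a C compiler must
	   allow for it. */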

gwyn@smoke.brl.mil (Doug Gwyn) (04/21/91)

In article <21964@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>... the common practice of C programmers putting each function in its own
>separate file makes the mistake double easy to fall into.  ...

This is the second "common C programming practice" you have raised
in this thread that is in fact NOT a common practice among skilled
C programmers.  If your argument is that C requires a certain
degree of expertise to use WELL, then you're wasting our time,
since that has never been in question.

bhoughto@hopi.intel.com (Blair P. Houghton) (04/22/91)

In article <22021@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
[...and forgets to attribute me...]
>> prevents use of this optimization, but you haven't been
>> so vocal about the case where you deliberately alias one
>> of your arrays, which also blows your optimization.
>
>Since I _never_ deliberately alias array arguments either to each
>other or to globals to which the procedure I'm calling has access,
>this is not an issue to me.

And I don't depend on Fortrannical optimizations that can break
whenever the vendor feels like it.  If I want things linked
that closely, I'll keep them close.

[...lots of "it would be nice if C were Fortran" deleted...]

Some of us like the fact that arrays are passed as
pointers, and can be aliased rampantly.  It's a matter of
looking _both_ ways before stepping into the street, though
the traffic is _supposed_ to be traveling in only one
direction per lane...

				--Blair
				  "Go figure."

bhoughto@hopi.intel.com (Blair P. Houghton) (04/22/91)

In article <22025@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
[...and forgets to attribute me _again_...]
>> [...]     but why is it so horrible to you to document
>> to the users of your optimized libraries that certain
>> arguments to certain functions must be carefully
>> non-aliased?  [...]
>
>It's not horrible.  You miss my point if you believe that I would
>oppose that.

It's the best solution there is, and you still want
to crock a working language.

>You give me a portable way to declare to the compiler
>that two (or more) arguments are assumed not to be aliased and I will 
>rejoice.

Rejoice away:

	double *array_mollify(double *bar)
	/* Mr. Compiler, I declare that no aliasing is being done,
	 * and I've put it in the man-page so you can optimize
	 * until the whole function is but an opcode. */
	{
	    ...
	}

It's as good as any, and you can forget portable.  Your
optimization is dependent on the cpu, the OS, the compiler,
the compiler flags, and the main-memory organization.
At the least.

With all that against it, how bad would it be if it's also
dependent on the source language?  How good would it be
to understand finally that we've been telling the truth
for three days when we say it is most definitely NOT
dependent on the source language?

>The problem is that what people are recommending is _not_ portable
>and is _not_ an explicit declaration (so the end user has to guess 
>which variables the compiler optimized in this way and which are 
>safe to alias).

Are you doing this deliberately?  Nobody has to guess
a thing.  You tell them.  "Argument 1 is safe; when
argument 2 is unaliased and you use the -bletch flag,
the compiler is free to optimize the arrays in the
Fooby way; if you specify -bletch but alias argument 2,
grave danger will befall the system."

On the other hand, it's portable as long as you don't
expect it.  You can't expect it.  Optimization is
irrelevant.  Portability is given.  If I "translate" a
conforming program into sign language and get Koko the
gorilla to perform the correct operations on the data, it's
still ANSI.

				--Blair
				  "Koko knows the way to the
				   Ice cream store, and cried
				   when she was told her kitten
				   had died, but I bet she doesn't
				   do pointer-abalias-dependent
				   optimizations..."

jlg@cochiti.lanl.gov (Jim Giles) (04/22/91)

> And, in fact, it doesn't - if your system wants to do 'serious'
> interprocedural analysis while still allowing seperate, order
> independent compilation, there's nothing to stop it from adding an
> extra table to the object file describing the properties of that object
> file, and using a clever linker (post-compiler?) to sort all that
> information out.

Sounds exactly like what _I_ keep saying: the functionality _can_
be performed in a standard conforming way by the loader or some
other load-time tool.

J. Giles

jlg@cochiti.lanl.gov (Jim Giles) (04/22/91)

> Scenario #1 (demonstration of global-optimization):
> 
>	$ cc -O3 -o foobar.o -c foo.c bar.c
>	$ cc foobar.o
>
> Scenario #2 (demonstration of the *required* functionality of incremental
> compilation):
>
>	$ cc -O -c foo.c
>	$ cc -O -c bar.c
>	$ cc foo.o bar.o

The point you seem to miss is that after scenario 1, the vendor will 
ship foobar.o to the customer.  Later, the vendor may change _just_
bar.c.  He will recompile it and send bar.o to the customer.  The
customer will then link bar.o and foobar.o to yield a program with the
new version of bar in it.  The standard says the vendor _may_ do this.
Which means that the implementation _must_ allow it to be done.

Since the end user never has foo.c (and doesn't even know whether
foo was really written in C or not), scenario 2 is not relevant to
the discussion.

J. Giles

P.S.  As I keep pointing out, the _loader_ (or some load-time tool)
_can_ satisfy the standard and still do intermodule optimization.
The 'translator' (usually thought of as the compiler) cannot.

jfw@ksr.com (John F. Woods) (04/22/91)

jlg@lanl.gov (Jim Giles) writes:
>In article <1991Apr18.190403.29049@Think.COM>, barmar@think.com (Barry Margolin) writes:
>> I interpreted the section of the standard that you quoted in another
>> posting as *allowing* separate compilation.  This means that the programmer
>> isn't forced to compile everything together in order to achieve the
>> semantics specified by the language.  [...]
>The standard _allows_ the _USER_ to decide whether to compile separately
>or together.

The standard allows the user to have "translation" done in separate pieces.
For example, my compilation system generates object modules by translating
all comments to Swahili and stores the resulting C code using LZW compression.
The job of the linker is to turn the object modules, plus requested libraries,
into an executable (I usually skip the phase where it logs the comments (in
Sanskrit) to the system console, in order to save paper).  The only thing that
the standard ALLOWS users to depend on is external linkage.  As long as this
compilation system obeys the famous "AS IF" rule, it is standard conformant,
end of discussion.

> Sorry, those are the rules.

Try understanding the rules before yapping about them.  For example,

> And, as I said, if the standard says you _may_
> do something, the implementation _must_ permit you to do it.

If a compilation system waits until it knows where all the source code is
to perform code generation, what is the user not being permitted to do?
Get slow code?  No thanks!

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (04/23/91)

In article <22004@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
> It has been suggested that a program of the following type constitutes
> a counterexample to my previous statements about pointers and aliasing:
> func(i,j)
> char *i;
> float *j;

Well, no, that's not what I said, and you should learn the difference
between char/void pointers and non-char/void pointers.

> However, my original statement about pointers and aliasing was that
> pointer arguments to the _same_ underlying type must be assumed to be
> aliased to each other and to all globals of that type

Jim, I despise the way that you are trying to escape your mistake by
failing to quote what you claim to be responding to. You were arguing
with Doug Gwyn; you said ``I have never seen any of these hypothetical
contexts in which anonymous pointer arguments passed in as parameters
can be safely assumed not to be aliased.'' I showed you how a very
common type of sorting routine provided exactly such a context.

Now are you saying you've never seen such sorting routines? Or do you
admit that you simply haven't thought through the issues here?

---Dan

jlg@cochiti.lanl.gov (Jim Giles) (04/23/91)

> [...]                       Any reasonable language that
> supports pointer arguments needs to deal with this issue one
> way or another; Fortran came down on the side of restricting
> the programmer, while C came down on the side of empowering
> the programmer.  [...]

No.  Fortran restricted the programmer in one way, C restricted the
programmer in a different way.  In Fortran, if I want to do things
in an overlapping way to a single array, I must pass that array 
around by itself and make the overlap explicit everywhere it
occurs.  In return for that constraint, the compiler optimizes
my code _much_ better than it could if it had to assume all
array arguments were aliased to each other.  In C, the compiler
prevents me from telling the compiler in any way whatsoever that
pointer arguments are not aliased - even though they _usually_
aren't.  In return, I get the dubious advantage of hiding overlap
from procedures.

For a language to _empower_ the user, it must provide a portable
and explicit way of telling the compiler whether two arguments are
allowed to be aliased.  Neither language does this.  In fact, no 
language in popular use does.

J. Giles

jlg@cochiti.lanl.gov (Jim Giles) (04/23/91)

In article <RJC.91Apr20023816@brodie.cstr.ed.ac.uk>, rjc@cstr.ed.ac.uk (Richard Caley) writes:
|> In article <21868@lanl.gov>, Jim Giles (jg) writes:
|>
|> jg> [...]          the only third way is if 'translation' does not
|> jg> include code generation (which will subsequently be done by some
|> jg> post-translation tool: like the loader - just like I keep saying).
|> 
|> Fourth option:
|> 
|> When compiling `together' the compiler creates object files
|> containing two complete sets of code, data etc. The linker looks at
|> the object files given and if they were not all compiled together (eg
|> if you had recompiled one) it uses the non-optimised code.

Same as my third option.  The load-time tool actually does the 
interprocedural analysis.  Your solution merely makes the loader's
'code generation' duties simpler at the expense of more expensive
compilation and larger object files.  And, again, no implementation
does this (yet).

J. Giles

jlg@cochiti.lanl.gov (Jim Giles) (04/23/91)

>> And, as I said, if the standard says you _may_
>> do something, the implementation _must_ permit you to do it.
>
> If a compilation system waits until it knows where all the source code is
> to perform code generation, what is the user not being permitted to do?
> Get slow code?  No thanks!

If the compilation system waits until it knows where all the source
code is to perform code generation, it's got a long wait coming.  Many
vendors don't distribute source code.  Your compilation system is 
_never_ going to know where the vendor's source code is.  What you are
not permitting the user to do is one of the very things that separate
compilation is _for_ - vendor distribution of _object_ code.  If your
compiler/language requires source to get fast code, you should think
about changing languages.

J. Giles

jlg@cochiti.lanl.gov (Jim Giles) (04/24/91)

bhoughto@hopi.intel.com (Blair P. Houghton) writes:

> [...]                              How good would it be
> to understand finally that we've been telling the truth
> for three days when we say it is most definitely NOT
> dependent on the source language?

Please hurry up and contact the X3J11 committee with this news!
They have a subcommittee (the Numerical C Extensions Group - NCEG)
who are wasting all sorts of time trying to correct this aliasing
problem that C has.  Since you _know_ it's not a problem, they are
obviously all idiots.  You should hurry to set them straight.

Whoops!  Forgot to mention: NCEG is also affiliated with WG14 (the
ISO C working group).  So you better set those guys straight too.

(It couldn't be that these people know something that Blair Houghton
doesn't?  Nah - no chance.)

J. Giles

rjc@cstr.ed.ac.uk (Richard Caley) (04/24/91)

In article <22245@lanl.gov>, Jim Giles (jg) writes:

jg> Sounds exactly like what _I_ keep saying: the functionality _can_
jg> be performed in a standard conforming way by the loader or some
jg> other load-time tool.

There have been at least two postings giving methods by which a
conforming implementation can do inter-file optimisations with the
linker doing nothing but link and allowing the user to later change
and recompile one file at the expense of losing the optimisations. 

Do you read this group?

--
rjc@cstr.ed.ac.uk

martelli@cadlab.sublink.ORG (Alex Martelli) (04/24/91)

rkowen@violet.berkeley.edu (Dario Bressanini) writes:
	...
:When i REALLY HAVE to optimize a program, first of all i use 
:a profiler to see where is the bottleneck, and THEN i try to optimize it;
:probably I am biased since i mostly write programs (90% in FORTRAN)
:for scientific computation (yes, I use floating points :-) ) where usually
:you have a single routine, or even a single loop, that takes 95% of 
:the CPU time.
Ciao Dario!  In interactive tasks such as solid modeling and CAD, you
would still find (in my experience) 'single-bottlenecks' responsible for
heavy slowdown of many tasks (no, not 95%!, but 30-40% are not uncommon),
except that different tasks, and different users, would hit different
bottlenecks - as a commercial program evolves over many releases, the
various 'specific bottlenecks' are found and removed.

:In most cases the best way to gain speed was to change completely
:the algorithm, and not to make some very low level optimization.
I can't argue with that - particularly as the bottlenecks are removed
over the life of an evolving program, there are times when you must
fully redo the design of a major subsystem, including all data structures
and often even the interfaces to other subsystems, for REAL performance
improvement!

:Following the latest "this is faster than that" wars I had 
:the impression that they were pure void theoric discussions, without any
:connection with the "real world", at least to my world.
Since when do you mind 'pure void theoric discussions', DB...?-) {Personal
joke warning, me and DB are old fidonet friend/enemies!-}

:Just in case.... I don't want to start the usual and useless war
:C vs FORTRAN etc..,i would like to use C for my programs, but in most
:cases i tried, the Code produced by the C compiler was 2 or 3 times
:slower that the fortran code.
On what platforms?  On workstation-class hardware, my experience is
just the reverse: despite all the theoretical advantages of Fortran,
which SHOULD be vastly easier to optimize for numerical computation
(no aliasing allowed...), I find most Fortran-generated code quite a
bit slower than C.  I posted a benchmark once where on an incredibly pure
numerical computation, a 2D FFT over a 256x256 complex matrix, you
could even go faster by converting Fortran to C with f2c, then compiling
the C results, than by directly compiling the Fortran source!!!  That
was on Sparc workstations with C and Fortran compilers of 18 months
ago, and I hope *this* particular aberration has since been removed,
but others remain and indeed loom larger and larger (bulk unformatted
I/O for checkpointing, for example, surely an important task for any
long-running program where the operating system is not so kind as to
do checkpointing for you transparently).  On PC's, it's even worse, as
if compiler vendors barely cared for Fortran performance (particularly
in bulk I/O) but were locked in a mad race for C performance, or
something like that...
I do recall from my mainframe/supercomputing days that it USED to be
different - blazing FAST fortran-generated code, versus barely passable
C-generated code - although I DON'T recall any '2 or 3 times' difference,
possibly because I worked on machines where double precision math was
just as fast as single precision (I believe this is something of a
'trademark' of IBM machines, from 3090's down to PCs with FP coprocs,
and also including RS6000's WSs) - modern C's should allow you to use
single precision anyway where/when it matters (ANSI allows that, Sun C
while non-ANSI offers a special option for this, etc).  Anyway I'm not
surprised if your results are on supers/mainframes, since apparently
those guys are GOOD at writing Fortran compilers (maybe it is that they
CARE about it!), while their WSs colleagues appear to be either less
able or less motivated... 
If your results are instead obtained on WSs/PCs, we should exchange more
detailed notes - either you don't optimize your C well, or I don't optimize
my Fortran well...  and this DOES matter to me, since, differing from you,
I'd like to keep using Fortran for those numerically-heavy parts of our
programs where Fortran does perfectly fine, thank you, without having to
pay a performance PRICE for that, however!!!
-- 
Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 53, Bologna, Italia
Email: (work:) martelli@cadlab.sublink.org, (home:) alex@am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434; 
Fax: ++39 (51) 366964 (work only), Fidonet: 332/401.3 (home only).

rob@kaa.eng.ohio-state.edu (Rob Carriere) (04/25/91)

In article <22246@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) spouts forth:
>P.S.  As I keep pointing out, the _loader_ (or some load-time tool)
>_can_ satisfy the standard and still do intermodule optimization.
>The 'translator' (usually thought of as the compiler) cannot.

Err, yes it can.  Someone else has already explained how to do this by
including optimized and non-optimized versions of the code in the same object
file.  Since I assume that you read less selectively than you answer, I won't
repeat his scheme.

So, where does this leave us?  We now have 2 conforming implementations of
intermodular analysis.  Mister Giles started this whole brouhaha with a claim
that C couldn't be optimized as well as FORTRAN, since you needed to do IM
analysis for that and the standard didn't allow such things.  I would think
that 2 implementations count as sufficient counterexamples, so would mister
Giles either admit he was wrong or else shut up?

Nice try to divert the argument, though.

SR
---

rjc@cstr.ed.ac.uk (Richard Caley) (04/25/91)

In article <22355@lanl.gov>, Jim Giles (jg) writes:

jg> In article <RJC.91Apr20023816@brodie.cstr.ed.ac.uk>, rjc@cstr.ed.ac.uk (Richard Caley) writes:
jg> [...]          the only third way is if 'translation' does not
jg> include code generation (which will subsequently be done by some
jg> post-translation tool: like the loader - just like I keep saying).

rjc> When compiling `together' the compiler creates object files
rjc> containing two complete sets of code, data etc. The linker looks at
rjc> the object files given and if they were not all compiled together (eg
rjc> if you had recompiled one) it uses the non-optimised code.

jg> Same as my third option.  

No, you said the `linker' would have to do code analysis and code
generation, I said it would have to make a choice. If the difference
is not clear, I suggest you try to code them.  How big do you think
interprocedural analysis plus code generation is?  I think it is clear
that choosing one of two sets of definitions takes about 100 lines,
including totally paranoid levels of defensive tests.
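
A rough sketch of that choice, with invented names (no real linker is
implied):

	struct objfile {
	    long batch_stamp;   /* stamp of the "compiled together" batch */
	    /* ... symbols, code, the usual ... */
	};

	/* Keep the optimized symbol set only if every object came out of
	   the same batch; otherwise fall back to the plain set.  No code
	   analysis, no code generation - just a comparison. */
	int use_optimized(struct objfile *objs, int n)
	{
	    int i;
	    for (i = 1; i < n; i++)
	        if (objs[i].batch_stamp != objs[0].batch_stamp)
	            return 0;
	    return 1;
	}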

jg> The load-time tool actually does the interprocedural analysis.

Nope, it doesn't even need to know that the difference between the
alternate definitions _is_ interprocedural analysis.

jg> Your solution merely makes the loader's
jg> 'code generation' duties simpler at the expense of more expensive
jg> compilation and larger object files.  And, again, no implementation
jg> does this (yet).

Larger objects I'll grant you; I never said it was a commercially
viable solution, just the simplest to explain in a news message.

Ps. Re: `common practices'. The only C code I have ever seen written
	one procedure to a file was the result of letting a PL/I and
	PL6 person loose with CC. name deleted to protect the not so
	innocent :-)

--
rjc@cstr.ed.ac.uk			_O_
					 |<

pa@curly.appmag.com (Pierre Asselin) (04/26/91)

It's late.  OK, I'm-a-byt'n...

In article <1991Apr24.174057.22470@ee.eng.ohio-state.edu>
rob@kaa.eng.ohio-state.edu (Rob Carriere) writes:
>In article <22246@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) spouts forth:
>>P.S.  As I keep pointing out, the _loader_ (or some load-time tool)
>>_can_ satisfy the standard and still do intermodule optimization.
>>The 'translator' (usually thought of as the compiler) cannot.
>
>Err, yes it can.  Someone else has already explained how to do this by
>including optimized and non-optimized versions of the code in the same object
>file. [...]  I would think that 2 implementations count as sufficient
>counterexamples [...]

But does this scheme really count?  Suppose there are N modules subject
to interoptimization.  Translating any one of them leads to a
2^(N-1)-way branch as to what set of optimizations is allowed.
Hmmm...  or 2^M, where M is the number of optimization tricks the
compiler knows about.  Still too big, though.

On that thought, good night.

  --Pierre Asselin, R&D, Applied Magnetics.  I speak for me.

wilber@alice.att.com (rew) (04/26/91)

J. Giles writes:
>> [...]     but why is it so horrible to you to document
>> to the users of your optimized libraries that certain
>> arguments to certain functions must be carefully
>> non-aliased?  [...]
>
>It's not horrible.  You miss my point if you believe that I would
>oppose that.  You give me a portable way to declare to the compiler
>that two (or more) arguments are assumed not to be aliased and I will 
>rejoice.
>
>The problem is that what people are recommending is _not_ portable
>and is _not_ an explicit declaration (so the end user has to guess 
>which variables the compiler optimized in this way and which are 
>safe to alias).

I like C.  I don't like Fortran.  I find the likelihood that people will still
be coding in Fortran in the year 2020 to be depressing.  Having said that, I
must say that in my experience the criticism that J. Giles makes of C, that
problems with potential aliasing make it *very* difficult for optimizing
compilers to do their job, is a legitimate one.  I am working on a large
program that does schedule optimization.  Speed is critical.  Parts of it are
written in C, and most is written in C++, which is translated into C by cfront.
(BTW, cfront exacerbates the problems with aliasing by introducing large
numbers of spurious pointers.)  What I find is that Fortran code regularly runs
circles around carefully coded C and C++.  The problems intensify on RISC
machines, which are becoming all the rage, because of their dependence on
keeping data in registers for speed.  On an IBM RS6000 a little test program I
wrote in C runs about 6 times slower when I use pointers to scalars than when I
use local scalars directly.  One might expect that a factor of 2 could be due
to the need to dereference the pointers; the rest is because most of the
optimizations have to be turned off as soon as pointers are used.
Unfortunately, once you get past the realm of toy test programs storing all
your data in local variables is a non-option -- most of the data is lurking in
giant arrays and malloced structures, all accessed through pointers.
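
A hypothetical reconstruction of the kind of test meant here (not the
actual program):

	double sum_local(double *a, int n)
	{
	    double s = 0.0;          /* accumulator lives in a register */
	    int i;
	    for (i = 0; i < n; i++)
	        s += a[i];
	    return s;
	}

	void sum_indirect(double *a, int n, double *s)
	{
	    int i;
	    *s = 0.0;
	    for (i = 0; i < n; i++)
	        *s += a[i];          /* *s may alias a[i]: the compiler
	                                loads and stores through the
	                                pointer on every iteration */
	}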

I think that the pro-C camp has been too willing to sweep under the rug the
difficulties of making a C compiler deal with aliasing.  Sure, global
optimization across all 200 .c modules would solve many of the problems, but
how many C compilers actually do that?  Can you name 3 machines I can buy right
now that have such a beast?  And when you try to do that sort of thing, you
quickly bump up against the limits of your machine. (Can you say, "Virtual
memory exhausted"?  Can you say, "Two day compilation"?)

Cross procedural optimization within a module doesn't buy you as much as you
might hope; simply knowing that none of the calls to foo() within foo.c alias
the parameters doesn't help if foo() is declared external (the default) because
some call to foo() in another module might alias the parameters.  And I'd say
that less than 10% of all functions in typical C code are declared static.
(Even when functions aren't called outside of their modules, programmers rarely
bother to declare them static.)

So what *can* be done?  As has already been pointed out, it is not unreasonable
for a compiler to assume that non-char, non-void pointers of different types
are not aliased.  An option like -this_programmer_should_be_fired should of
course be available to turn this assumption off, for things like compiling code
taken from the net.  I would go further and allow an option called, say,
-strict_typedefs that would tell the compiler that in a case like this:

typedef int Apple;
typedef int Orange;

that an int can be an Apple or an Orange (or neither) but an Apple can't be
an Orange and an Orange can't be an Apple.  Thus in

typedef long Cents;
typedef long Quantity;

Cents calculate_costs(Cents c[], Quantity q[]);

the compiler knows that c and q aren't aliased.  This would of course have
to be an option because there is nothing in the definition of the language
that requires that typedefs should behave this way.

But what can be done about cases like this?

foo(char* p, char* q, char* s);

One can of course come up with various schemes involving macros or comments
that inform your favorite compiler that the parameters aren't aliased, but the
problem is that there are no standards for how this is to be done, so any
such information is likely to be understood *only* by your favorite compiler,
and no other.

Right now, in the Real World (TM), optimized Fortran really does run faster
than optimized C.  A situation that truly sucks.

Bob Wilber    wilber@homxb.att.com

jlg@cochiti.lanl.gov (Jim Giles) (04/26/91)

In article <1991Apr24.174057.22470@ee.eng.ohio-state.edu>, rob@kaa.eng.ohio-state.edu (Rob Carriere) rants:
|> 
|> Err, yes it can.  Someone else has already explained how to do this by
|> including optimized and non-optimized versions of the code in the same object
|> file.  Since I assume that you read less selectively than you answer, I won't
|> repeat his scheme.

Yes, and I've already pointed out that this still relies on the 
_loader_ (or some post-translation tool) to make the actual decisions
about what optimizations to use.  Since I assume that you read less
selectively than you answer, I won't repeat the discussion.

|> [...]             Mister Giles started this whole brouhaha with a claim
|> that C coudn't be optimized as well as FORTRAN, since you needed to do IM
|> analysis for that and the standard didn't allow such things. 

No, I didn't.  I started this discussion by saying that C _isn't_
optimized as well as Fortran since you need IM analysis for that 
and the standard doesn't allow such things in the _translator_ and
_no_ C implementation I've ever seen has the _loader_ (or any post-
translation tool) do the job.  Only _one_ C implementation has been
brought to my attention which does IM analysis _at_all_.  This is 
the MIPS C compiler, which does IM analysis in a non-standard way
by requiring dependent compilation.  Since I don't have access to
the MIPS compiler, this info is of only academic interest.

So - optimization of pointer arguments to procedures requires IM
analysis in C.  The translator can't do this and be standard conforming.
If the translator leaves sufficient information in the object files,
the loader (or some load-time tool) can make the optimizations (even
if this only results in selecting among several versions that the
compiler produced).  In NO present implementation that does IM analysis
is the method standard conforming.

Finally, as I have pointed out before, I consider the MIPS method
perfectly acceptable for all but commercial function libraries (which
are inherently separately compiled so the vendor doesn't have to 
distribute source).  There is nothing wrong with an implementation
which provides an interesting capability as an _extension_ to the
standard.  What I object to is the claim that the extension _is_
standard.

J. Giles

jlg@cochiti.lanl.gov (Jim Giles) (04/27/91)

In article <RJC.91Apr25134502@brodie.cstr.ed.ac.uk>, rjc@cstr.ed.ac.uk (Richard Caley) writes:
|> [...]
|> No, you said the `linker' would have to do code analysis and code
|> generation, I said it would have to make a choice. If the difference
|> is not clear, I suggest you try and code them. How big do you think
|> interprocedureal analysis plus code generation is? I think it is clear
|> that choosing one of two sets of definitions takes about 100 including
|> totally paranoid levels of defensive tests.

I'll show you what needs to be done, you show me how it can be
done without interprocedural analysis.

File1:                  File2:
   int a, b;               func1(x)
   main (){                int *x;
   ...                     {
      func1(&a);               func2(x); func3(x);
   ...                      ...
   }                       }

File3:                  File4:
   extern int a;           extern int b;
   func2(y)                func3(z)
   int *y;                 int *z;
   {                       {
      expression(a,y);         expression(b,z);
   }                       }

These four files are separately compiled.  If loaded together, they will
produce a program in which y and a are aliased in func2(); also b and z
will _not_ be aliased in func3() (because z is really a pointer to a).
Now, the optimization I want is for the final code to optimize func3()
as much as possible since there is no aliasing.  But, the solution must
also inhibit unsafe optimizations in func2() because there _is_ aliasing.

Now, you show me your specific solution and then we'll be able to decide
whether it works, whether it is standard conforming, whether it generalizes
to more than one arg/global, and whether it does interprocedural analysis
or not.  

|> [...]
|> jg> The load-time tool actually does the interprocedural analysis.
|> 
|> Nope, it doesn't even need to know that the difference between the
|> alternate definitions _is_ interprocedural analysis.

If the method you are proposing manages the above example correctly,
then the choice the load-time tool makes _is_ intermodular analysis.  Whether
it "knows" it or not is an irrelevant point.  I don't think that any
program yet written can be said to _know_ anything.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (04/27/91)

In article <RJC.91Apr25134502@brodie.cstr.ed.ac.uk> rjc@cstr.ed.ac.uk (Richard Caley) writes:
  [ quite reasonable suggestions ]

Gee, sounds like exactly the same argument as we had in comp.lang.misc.
Jim said that because C *allows* aliasing, it *forces* optimizers to
assume aliasing at every step. I said that the compiler could put
together any number of versions of each subroutine, one for each set of
anti-aliasing constraints chosen by any fixed algorithm.

It could, for instance, compile one version where no arguments are
aliased, and one where all arguments are aliased. In the case Jim cares
about (i.e., Fortran-ish array code), nothing would be aliased, so the
optimized versions would be used in every case. So the C code would be
optimized *exactly* as well as a Fortran version, with *no*
interprocedural analysis on the linker's part.
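
A sketch of what the two emitted copies might amount to, using a
made-up function (the "fast" body is what the no-alias constraints
would let the code generator produce):

	extern int a;

	int func2(int *y)            /* as written */
	{
	    a = 2;
	    *y += a;                 /* must re-read a if y might be &a */
	    return *y + a;
	}

	int func2_noalias(int *y)    /* copy valid only when y != &a */
	{
	    a = 2;                   /* store still happens, but a's value
	                                is now known to survive the *y
	                                update, so the constant is folded */
	    *y += 2;
	    return *y + 2;
	}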

How quickly Jim forgets.

Well, no, I should be fair: he never admitted to losing the argument
that time, so he probably doesn't realize that he's lost it this time.
Poor guy. Still, Caley and I came up with the solution independently,
and probably any compiler writer who doesn't have such a closed mind as
Jim can come up with it too, so I don't think Jim's prejudice will hurt
the future of C optimization.

---Dan

bhoughto@pima.intel.com (Blair P. Houghton) (04/27/91)

In article <22392@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>bhoughto@hopi.intel.com (Blair P. Houghton) writes:
>> for three days when we say it is most definitely NOT
>> dependent on the source language?
>
>Whoops!  Forgot to mention: NCEG is also affiliated with WG14 (the
>ISO C working group).  So you better set those guys straight too.
>(It couldn't be that these people know something that Blair Houghton
>doesn't?  Nah - no chance.)

The thing they know that I don't is a good reason why it
should be necessary to force certain data into a
non-aliased paradigm rather than trusting the programmer to
keep certain data from being aliased.  The best bad reason
is that standards are better at enforcement than
documentors are.

It's neither my fault, my concern, nor my problem that
they're chasing pots of dingy gold at the end of haggard
rainbows, and I certainly don't want to be taxed for it
later by having to pay an extra $20 per copy of the
Standard for 100 pages of errata explaining the contortions
one must go through to define a non-aliasing implementation.

They can elide the trigraphs sections while they're at it, too.

Their best bet is to do what we've all done:  wait for a few
examples of prior art, and then choose from the good ones.
It's a perpetual fool's chase, trying to design by committee
what should be designed by evolution.

X3.159-1989 shows in spots the effects of applying this political
methodology to science, and I'm thrilled to say that the
noalias crock isn't one of the mistakes they made.

				--Blair
				  "If you want f77(1), you
				   know where to find it."

bhoughto@pima.intel.com (Blair P. Houghton) (04/27/91)

In article <20270@alice.att.com> wilber@homxb.att.com writes:
>One can of course come up with various schemes involving macros or comments

Or #pragma's...

>that inform your favorite compiler that the parameters aren't aliased, but the
>problem is that there are no standards for how this is to be done, so any
>such information is likely to be understood *only* by your favorite compiler,
>and no other.

It's likely to be so, anyway.  If I have invented an optimization
that makes 6x gains in speed of operations on pointer-accessed
data, how fast do you expect me to propagate that to my marketplace
competitors?  Patent Office or no, it's going to take a long time
and cost someone a lot of money to get this little feature into
any more compilers.

>Right now, in the Real World (TM), optimized Fortran really does run faster
>than optimized C.  A situation that truly sucks.

Funny, that happened in '64, too.

				--Blair
				  "...everyone's figuring
				   out how old we were..."

torek@elf.ee.lbl.gov (Chris Torek) (04/27/91)

(I have tried to stay out of this, as arguing with Jim Giles is bad for
my disposition---everything he writes seems to me a vicious attack, as
if his postings were little rabid dogs, poised to bite the unwary; but
that is only my opinion.  At any rate, regardless of the merits of some
of his points [there are some merits!] I wish to make two comments.)

In article <22687@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>Only _one_ C implementation has been brought to my attention which
>does [inter-module] analysis _at_all_.  This is the MIPS C compiler ...

The MIPS compiler is definitely not the only such, though it may be the
only commercial one.  For instance, the Plan 9 compilers do much of
their optimization in the linker; there may be a special flag to
disable this in case the linker's optimizer is broken, but the default
is inter-module optimization.  (Indeed, the Plan 9 group have decided
they may have gone too far in that direction, as it takes over 30
seconds to compile the Plan 9 kernel from scratch, and if more of the
optimization were done in the `compiler' part more of it could be done
in parallel.  Would that gcc could compile BSD in under a minute, or
even under 10 ... ah well.)

>Finally, as I have pointed out before, I consider the MIPS method
>perfectly acceptable for all but commercial function libraries (which
>are inherently separately compiled so the vendor doesn't have to 
>distribute source). ...

Jim seems to be under the impression that in order to get the MIPS
Ucode optimizer to function, you must give it source.  This is not the
case.  One of the compiler options causes it to emit `.u' files (so
named simply to avoid confusion, no doubt; the only difference between
a .u and a .o file is that the latter contains much more machine code
and much less intermediate information).  These .u files can be bound
up in a library.  They are not source code (there are no syntax errors
in .u files) and they certainly could be distributed commercially.
They do require more time to load than an equivalent library of MIPS .o
files, which some commercial vendors might view as an impossible barrier,
but that would be silly (vendors could distribute both formats, one for
`test compilation' and one for `final compilation', if compilation time
is considered important).
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

sarima@tdatirv.UUCP (Stanley Friesen) (04/28/91)

In article <790@cadlab.sublink.ORG> martelli@cadlab.sublink.ORG (Alex Martelli) writes:
>rkowen@violet.berkeley.edu (Dario Bressanini) writes:
 	...
>:When i REALLY HAVE to optimize a program, first of all i use 
>:a profiler to see where is the bottleneck, and THEN i try to optimize it;
>: ...
>Ciao Dario!  In interactive tasks such as solid modeling and CAD, you
>would still find (in my experience) 'single-bottlenecks' responsible for
>heavy slowdown of many tasks (no, not 95%!, but 30-40% are not uncommon),
>except that different tasks, and different users, would hit different
>bottlenecks - as a commercial program evolves over many releases, the
>various 'specific bottlenecks' are found and removed.

Quite, I too optimize after profiling.  And most of the time there seems to
be one specific bottleneck.

One of the most interesting I have ever found was one involving union
assignment.   The program had a union somewhat like this:

union any_type {
	int    i;
	long   l;
	double d;
	char   *str;
	struct stuff s;
};

At several places in the code the original writer had used union/struct
assignment to copy the value of this union.  The compiler dutifully
inserted a tight copy loop to copy the entire union for each assignment.
As a result the program spent nearly 1/3 of its time copying the union!!

I simply replaced each assignment with a switch on the actual type and
assigned only the active member of the union - voila a *huge* improvement.
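
The replacement looked roughly like the sketch below.  (This is a
reconstruction, not the original code: the tag enum, the wrapper
struct, and the function name are invented here, since the program's
actual type tag isn't shown above.)

	/* Assumes the union above, plus a tag kept alongside it. */
	enum any_tag { T_INT, T_LONG, T_DOUBLE, T_STR, T_STUFF };

	struct tagged_any {
		enum any_tag   tag;
		union any_type u;
	};

	/* Copy only the live member, instead of letting a whole-union
	   assignment copy sizeof(union any_type) bytes every time. */
	void copy_any(struct tagged_any *dst, struct tagged_any *src)
	{
		dst->tag = src->tag;
		switch (src->tag) {
		case T_INT:    dst->u.i   = src->u.i;   break;
		case T_LONG:   dst->u.l   = src->u.l;   break;
		case T_DOUBLE: dst->u.d   = src->u.d;   break;
		case T_STR:    dst->u.str = src->u.str; break;
		case T_STUFF:  dst->u.s   = src->u.s;   break; /* still a struct copy */
		}
	}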


So, watch out for this one!  (It took me several days to figure out why
simple linear code with no (explicit) loops was using up all of the time!
I had to disassemble the binary and locate the in-line copy loop.)
-- 
---------------
uunet!tdatirv!sarima				(Stanley Friesen)

burow@cernvax.cern.ch (burkhard burow) (04/28/91)

Everybody's favourite topic .....

jlg@cochiti.lanl.gov (Jim Giles) gives us the example 

>I'll show you what needs to be done, you show me how the it can be
>done without interprocedural analysis.
>
>File1:                  File2:
>   int a, b;               func1(x)
>   main (){                int *x;
>   ...                     {
>      func1(&a)                func2(x); func3(x);  
>   ...                      ...
>   }                       }
>
>File3:                  File4:
>   extern int a;           extern int b;
>   func3(y)                func3(z)
>   int *y;                 int *z
>   {                       {
>      expression(a,y);         expression(b,z};
>   }                       }
>

Now let's rewrite this in everybody's favourite language

File1:                           File2:
       common/puke/ia,ib
       program demo                    func1(ix)
          ....                           ....
       call func1(ia)                  call func2(ix)
          ...                          call func3(ix)
       end                              ...
                                       end

File3:                           File4:
       common/puke/ia,ib                common/puke/ia,ib
       func2(iy)                        func3(iz)
       expression with ia,iy            expression with ib,iz
       end                              end

Don't laugh, this little bit of FORTRAN almost killed me.
And as with the original posting, we're all going to ignore little syntax
errors that may have crept in.

Now jlg@cochiti.lanl.gov (Jim Giles) gave this challenge:

>These four files are separately compiled.  If loaded together, they will
>produce a program in which y and a are aliased in func2(); also b and z
>will _not_ be aliased in func3() (because z is really a pointer to a).
>Now, the optimization I want is for the final code to optimize func3()
>as much as possible since there is no aliasing.  But, the solution must
>also inhibit unsafe optimizations in func2() because there _is_ aliasing.

Now I just don't get it. If the C compiler/linker/what_ever_you_call_it can't
optimize the nonalias in func3, and not screw up with the alias in func2,
what's so magical about FORTRAN? It would seem to be in the same boat.

I fear that I'm on the wrong side of ignorance, so I would appreciate an
explanation.

A bit o' irrelevant background:
I must admit that I care much more about the programmer's performance (mainly
mine!) than I do about the machine's performance. I'm a C hack swimming in a
FORTRAN world. To keep me from drowning I've created CFORTRAN, a set of
macros that 'prototype' FORTRAN functions. e.g.

When my friends do:
      CALL HBOOK1(1,'pT spectrum of pi+',100,0.,5.,0.)
I can do:
           HBOOK1(1,"pT spectrum of pi+",100,0.,5.,0.);

after I've 'prototyped' the FORTRAN function:
PROTOCCALLSFSUB6(hbook1,INT,STRING,INT,FLOAT,FLOAT,FLOAT)
#define HBOOK1(ID,CHTITLE,NX,XMI,XMA,VMX)                 \
     CCALLSFSUB6(hbook1,INT,STRING,INT,FLOAT,FLOAT,FLOAT, \
               ID,CHTITLE,NX,XMI,XMA,VMX) 

I create the prototypes (about 20 FORTRAN routines per hour, main bottleneck is
my typing speed) using info. from the FORTRAN routines' documentation. i.e. My
C code  uses 'FORTRAN' object code, without me ever looking at, or dealing with
a single line of FORTRAN source code. 

I've got CFORTRAN running, i.e. I've created a cfortran.h file with the
preprocessor directives, on RISC Mips (e.g. Silicon Graphics, DECstations) and
VAX VMS machines. I'm still trying to sell the idea to the world, but have
already generated some bootstrap interest. If you're curious, send mail.

QQQQuestion: The clarification I ask for above is actually a serious one, for
I'd like to know if with my pointer freedom in C, I can pass arguments to a
FORTRAN routine which will blow up, because of its optimizations based on
restrictions in FORTRAN.

QQQQuestion: Lots of manuals have little sections on calling and linking
different languages within the same program. How many people out there actually
do this sort of thing? e.g. How many people out there use C, handling the messy
jobs of i/o, user interface, etc., as a frontend to number crunching FORTRAN
routines? It doesn't matter whether you call the FORTRAN routines because of
performance reasons, or as in my case, because they exist and work.

hoping somebody reads this line.
burkhard                          burow%13313.hepnet@csa3.lbl.gov

rob@mckinley.eng.ohio-state.edu (Rob Carriere) (04/29/91)

In article <703@curly.appmag.com> pa@appmag.com (Pierre Asselin) writes:
[about the 2-objects-in-1 scheme of IM optimization]
>But does this sheme really count?  Suppose there are N modules subject
>to interoptimization.  Translating any one of them leads to a
>2^(N-1)-way branch as to what set of optimizations is allowed.
>Hmmm...  or 2^M, where M is the number of optimization tricks the
>compiler knows about.  Still too big, though.

Well, this is why us academic types invented the thought experiment... :-)
Seriously, the proposed implementation was just that: a proof of concept.  A
real implementation would be a lot smarter about it.  If you build an RCS-like
structure of diffs, I'm pretty sure you can get away with low space and time
wastage.  Or you could be lazy and either IM optimize fully or not at all.  The
idea was, after all, to encourage the programmer to use a compilation scheme
that provides IM optimization, but still allow compilation in the general case for
standard conformance.  In other words, the executable that results when the
linker goes `oops' can stink as long as it works well enough to conform to the
standard. 

SR
---

rob@mckinley.eng.ohio-state.edu (Rob Carriere) (04/29/91)

In article <20270@alice.att.com> wilber@homxb.att.com writes:
>Cross procedural optimization within a module doesn't buy you as much as you
>might hope; simply knowing that none of the calls to foo() within foo.c alias
>the parameters doesn't help if foo() is declared external (the default) because
>some call to foo() in another module might alias the parameters.  And I'd say
>that less than 10% of all functions in typical C code are declared static.
>(Even when functions aren't called outside of their modules, programmers rarely
>bother to declare them static.)

Well, the idea behind C has always been "the programmer knows what s/he is
doing."  Since that programmer _should_ have declared those fns static, I
don't see where I should be bothered by the fact that not doing so may inhibit
optimization.  

>Right now, in the Real World (TM), optimized Fortran really does run faster
>than optimized C.  A situation that truly sucks.

It seems to me that, like mister Giles, you are confusing properties of the
language with properties of the implementation.  It would sure be nice to have
those wonderful compilers today, but their presence or absence has nothing to
do with the merits or lack thereof of the language C.

SR
---

rob@mckinley.eng.ohio-state.edu (Rob Carriere) (04/29/91)

In article <22687@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
!Yes, and I've already pointed out that this still relies on the 
!_loader_ (or some post-translation tool) to make the actual decisions
!about what optimizations to use.  Since I assume that you read less
!selectively than you answer, I won't repeat the discussion.

Good.  It would have been totally irrelevant.  I do, however, want to
apologize about the selective reading gibe.  Your answer to the post I was
referring to got here out of sequence (to use an understatement).  I should
have realized that USENET will occasionally do such wonderful things.

!No, I didn't.  I started this discussion by saying that C _isn't_
!optimized as well as Fortran since you need IM analysis for that 
!and the standard doesn't allow such things in the _translator_ and
!_no_ C implementation I've ever seen has the _loader_ (or any post-
!translation tool) do the job.  Only _one_ C implementation has been
!brought to my attention which does IM analysis _at_all_.  This is 
!the MIPS C compiler, which does IM analysis in a non-standard way
!by requiring dependent compilation.  Since I don't have access to
!the MIPS compiler, this info is of only academic interest.

Ah.  My mistake.  I thought you were talking about C.  What, pray tell, is a
discussion about compilers doing in comp.lang.c?

SR
---

rob@mckinley.eng.ohio-state.edu (Rob Carriere) (04/29/91)

In article <5079@cernvax.cern.ch> burow@cernvax.cern.ch (burkhard burow)
writes: 
>jlg@cochiti.lanl.gov (Jim Giles) gives us the example 
[IM alias in C]
>Now lets rewrite this in everybody's favourite language
[same example in FORTRAN]
>Now I just don't get it. If the C compiler/linker/what_ever_you_call_it can't
>optimize the nonalias in func3, and not screw up with the alias in func2,
>what's so magical about FORTRAN? It would seem to be in the same boat.

Nope, it isn't.  FORTRAN simply states that such programs are illegal
('undefined behavior' is the C standard speak; I'm no longer sure of the
FORTRANesque term).  So basically, in FORTRAN you are responsible for staying out
of the compiler's way, and in C it's the other way around.  This makes for
safer programming on the C side of the fence (never thought you'd hear _that_
about C, right :-), but it also makes the C compiler's[1] job _much_ harder.

>hoping somebody reads this line.

I did. :-)

SR
[1] compiler is used here to mean translator+linker+loader.  It was
    convenient, and I feel like Humpty Dumpty today.  :-)
---

shap@shasta.Stanford.EDU (shap) (04/29/91)

Smart compilers do interprocedural optimization between files by
accepting multiple file names on the command line.  This way no magic
in the linker is required, though I grant that it helps.

mmcg@bruce.cs.monash.OZ.AU (Mike McGaughey) (04/29/91)

jlg@cochiti.lanl.gov (Jim Giles) writes:

>the MIPS C compiler, which does IM analysis in a non-standard way
>by requiring dependent compilation.

No it doesn't.

Even with full MIPS IM optimisation, compilation order is independent
(it creates .u files; if you apply their .u-to-symbolic translator to
them, you'll find that they look suspiciously like RTL).

Mike.
-- 
Mike McGaughey			AARNET:	mmcg@bruce.cs.monash.oz.au

"His state is kingly; thousands at his bidding speed,
 And post o'er land and ocean without rest."		- Milton.

jlg@cochiti.lanl.gov (Jim Giles) (04/29/91)

In article <5079@cernvax.cern.ch>, burow@cernvax.cern.ch (burkhard burow) writes:
|> Everybody's favourite topic .....
|> 
|> jlg@cochiti.lanl.gov (Jim Giles) gives us the example 
|> [...]
|> >File3:                  File4:
|> >   extern int a;           extern int b;
|> >   func3(y)                func3(z)
           ^
Did I actually say func3 in both places?  Sorry - the one in file3
should be func2.

|> Now lets rewrite this in everybody's favourite language
|> 
|> File1:                           File2:
|>        common/puke/ia,ib
|> [...]
|> File3:                           File4:
|>        common/puke/ia,ib                common/puke/ia,ib

To be the same as the C example, there would have to be two common
blocks: one with just ia and the other with just ib.  The ia one
is common with file3, the ib one with file4.

|> [...]
|> Now I just don't get it. If the C compiler/linker/what_ever_you_call_it can't
|> optimize the nonalias in func3, and not screw up with the alias in func2,
|> what's so magical about FORTRAN? It would seem to be in the same boat.

That is exactly the point I made earlier about empowerment.  C allows
both subroutines, but most implementations pessimize both func2 and
func3 because they can't tell whether their argument is aliased to
their global or not.  Fortran also restricts the user, but not in
speed.  The Fortran compiler optimizes both func2 and func3 as if
no aliasing has occurred - but then, the call in this example which
causes aliasing in func2 is _ILLEGAL_.  Since deliberate aliasing 
of this kind is rare (or can be made rare by coding around it), the
Fortran rule permits the user to get better code generated.  

_NEITHER_ language empowers the user to _explicitly_ ask for aliasing
or not as he pleases.  C _could_ be optimized anyway, by having a tool
that traces the identity of procedure arguments through the call tree.
The best place to do this would be the loader, which has to do call
tree analyses anyway.  Of course, the same trick would allow Fortran
to be extended to allow the call to func2 after all - the loader could
in this case _inhibit_ the optimizations that Fortran usually performs.

Without either an explicit declaration or a call chain analysis, there
is no general way I know of to optimize where it's safe to do so, and
not optimize when aliasing occurs.  Last fall, someone on comp.lang.misc
posted a method that relied on information about functions given in 
their .h files (this method would only work for functions which _have_
.h files associated with them).  However this method *as stated* would
only find aliasing one-level deep in the call chain.  The example I
gave here would have fooled it.

|> [...]
|> QQQQuestion: The clarification I ask for above is actually a serious one, for
|> I'd like to know if with my pointer freedom in C, I can pass arguments to a
|> FORTRAN routine which will blow up, because of its optimizations based on
|> restrictions in FORTRAN.

Calling Fortran from C is safe as long as you remember not to pass pointers
to the same object (or within the same object) as two different arguments
to the Fortran routine, and as long as you don't pass pointers to global
objects that the Fortran routine can see.  Actually, if the arguments that you
pass are not modified by the Fortran routine, it's safe to alias them.
But, it's safer not to alias anything if you can help it.
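
Concretely, something like this (a sketch of mine; the routine name and
the trailing-underscore linkage convention are assumptions and vary
from system to system):

	/* Hypothetical Fortran routine: SUBROUTINE FTNSUB(X, Y). */
	extern void ftnsub_(double *x, double *y);

	double a[100], b[100];

	void examples(void)
	{
		ftnsub_(a, b);      /* safe: distinct objects               */
		ftnsub_(a, a);      /* illegal if ftnsub_ assigns to either */
		ftnsub_(a, a + 1);  /* illegal: overlapping storage         */
	}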

J. Giles

jlg@cochiti.lanl.gov (Jim Giles) (04/29/91)

In article <4037@bruce.cs.monash.OZ.AU>, mmcg@bruce.cs.monash.OZ.AU (Mike McGaughey) writes:
|> [...]
|> Even with full MIPS IM optimisation, compilation order is independent
|> (it creates .u files; if you apply their .u-to-symbolic translator to
|> them, you'll find that they look suspiciously like RTL).

Oh.  Sorry (this will surprise those who claim that I never admit to
being wrong).  As I said, I don't have access to MIPS.  The capability,
as it was described to me, made no mention of .u files and, in fact,
claimed that the compiler did the IM analysis.  If it is done by a
post-translation tool, then it _is_ standard conforming.

J. Giles

martelli@cadlab.sublink.ORG (Alex Martelli) (04/29/91)

burow@cernvax.cern.ch (burkhard burow) writes:
	...
:QQQQuestion: The clarification I ask for above is actually a serious one, for
:I'd like to know if with my pointer freedom in C, I can pass arguments to a
:FORTRAN routine which will blow up, because of its optimizations based on
:restrictions in FORTRAN.
You can, but you can also blow up Fortran from within itself quite easily, i.e. any
'CALL ROUT(X,X)' or 'COMMON/A/X ... CALL TINE(X)' *could* cause the blowup.
Such aliasing is FORBIDDEN in Fortran, but ENFORCING the prohibition is left
to mysterious runtime errors in most cases... just like, in C, free()ing a
pointer NOT obtained via malloc(), or, in any language, using an index out
of range to address an array (indeed, some Fortran compilers will optionally
do array-index-checking, and some malloc()/free() packages will check for
sanity, so I assume somewhere there must exist a compiler that will do some
checks and output error messages on aliasing, though I know of none).

:QQQQuestion: Lots of manuals have little sections on calling and linking
:different languages within the same program. How many people out there actually
:do this sort of thing? e.g. How many people out there use C, handling the messy
:jobs of i/o, user interface, etc., as a frontend to number crunching FORTRAN
:routines? It doesn't matter whether you call the FORTRAN routines because of
:performance reasons, or as in my case, because they exist and work.
I do it the other way 'round, i.e. use C as the backend (for graphics, OS
access, all sorts of algorithms-and-data-types sort of work) for Fortran
programs; it's always Fortran calling C, never the other way around.  As the
fraction of C grows (mostly a reflection of the bad implementations of Fortran
on workstation-class machines that we observe), SOME C->Ftn cases appear; and
I would indeed like some automated/C-macro approach to such interfacing, just
as I have built for C routines intended to be called from Fortran.  The only
hard problem I see is passing CHARACTER arguments - it's a very convoluted
task in the Ftn->C direction (to keep it portable between 12+ platforms), I
just have not devoted enough work to the C->Ftn direction, so far.  I have
seen your "cfortran" stuff, one month ago, but I have not delved deeply enough
into it to see how hard/easy it would be to generalize it to Apollo, OS/2, HP,
Sun, IBM RS/6000, etc etc...
-- 
Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 53, Bologna, Italia
Email: (work:) martelli@cadlab.sublink.org, (home:) alex@am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434; 
Fax: ++39 (51) 366964 (work only), Fidonet: 332/401.3 (home only).

rkowen@violet.berkeley.edu (Dario Bressanini) (04/30/91)

    martelli@cadlab.sublink.ORG (Alex Martelli) writes:

>>:Just in case.... I don't want to start the usual and useless war
>>:C vs FORTRAN etc..,i would like to use C for my programs, but in most
>>:cases i tried, the Code produced by the C compiler was 2 or 3 times
>>:slower that the fortran code.
>On what platforms?  On workstation-class hardware, my experience is
>just the reverse: despite all the theoretical advantages of Fortran,
>which SHOULD be vastly easier to optimize for numerical computation
>(no aliasing allowed...), I find most Fortran-generated code quite a
>bit slower than C. 

Ciao Alex! I didn't know I was starting such a discussion with my
first innocent message.

Well, I tried on mini systems (Gould, Convex ...). When I used
f77 (the standard unix compiler) the performance was *VERY* bad,
but when I switched to the vendor's compiler the fortran code
was faster than the C code. I plan to try again on the RS-6000
and on a Personal IRIS to see if there is any difference. 

>>On PC's, it's even worse, as
>>if compiler vendors barely cared for Fortran performance (particularly
>>in bulk I/O) but were locked in a mad race for C performance, or
>>something like that...

I agree with you. Once I reduced the execution time of a fortran
program on an IBM PC from 18 hours to 2 by changing a single line!
It was something like

  a = (expr)**3

I changed it to

        b = expr
        a = b*b*b

et voila' (expr is a floating point expression).
If the compiler cannot do this kind of optimization, well, it is a
very poor compiler.
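
For what it's worth, the same trap exists in C. A sketch of mine, not
from the program in question:

	#include <math.h>

	double cube_slow(double x) { return pow(x, 3.0); } /* library call   */
	double cube_fast(double x) { return x * x * x; }   /* two multiplies */

A compiler that fails to reduce a small constant integer power to
multiplications pays a general-purpose library call on every
evaluation, which is exactly what ** was costing above.
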
On my Atari ST, the FASTEST language is......compiled BASIC!
(and now all comp.lang.c will flame me :-) )


>>modern C's should allow you to use
>>single precision anyway where/when it matters (ANSI allows that, Sun C
>>while non-ANSI offers a special option for this, etc).  Anyway I'm not
>>surprised if your results are on supers/mainframes, since apparently
>>those guys are GOOD at writing Fortran compilers (maybe it is that they
>>CARE about it!), while their WSs colleagues appear to be either less
>>able or less motivated... 

Yes, with ANSI C you can use single precision, but it is almost USELESS
in scientific computations (at least in my field, i.e. quantum chemistry):
if I use single precision I cannot trust the results!
FORTRAN and C belong to different worlds: C is for programmers,
while FORTRAN is for most of the scientific community. Most of the time
a C programmer doesn't give a damn about floating point performance,
while sometimes, doing scientific computations, I even have to care
about the performance of DOUBLE PRECISION COMPLEX.
I would like to use a language which combines the good features of the
two, but I guess I'll have to wait...


>>Since when do you mind 'pure void theoric discussions', DB...?-) {Personal
>>joke warning, me and DB are old fidonet friend/enemies!-}
things change....


   rob@mckinley.eng.ohio-state.edu (Rob Carriere) writes:

>>>Right now, in the Real World (TM), optimized Fortran really does run faster
>>>than optimized C.  A situation that truly sucks.
>>It seems to me that, like mister Giles, you are confusing properties of the
>>language with properties of the implementation.  It would sure be nice to have
>>those wonderful compilers today, but their presence or absence has nothing to
>>do with the merits or lack thereof of the language C.

I don't agree with this. We can talk forever about the properties of a
language, but when you write a program you have to deal with the actual
implementation. I am glad to know that a language has certain properties,
but if the implementation doesn't use them, well, they don't count for me.
If my fortran program that takes 100 hours on a CRAY would take 80 hours
rewritten in C, I'll start coding immediately; but if you say that
I should judge C by the fact that in theory I could have a program
that takes 80 hours, and not by the fact that the actual C implementation
would take longer than 100 hours, well...
 
     dario bressanini      rkowen@violet.berkeley.edu

throopw@sheol.UUCP (Wayne Throop) (04/30/91)

> jlg@cochiti.lanl.gov (Jim Giles)
> In NO present implementation that does IM analysis
> is the method standard conforming.

To quote Inigo Montoya: "You keep using that word.  I do not
think it means what you think it means."

In fact an implementation that uses inter-module analysis to get
superior optimization of cases that can be proven non-aliased may very
well be standard conforming.  Certainly the inter-module optimizations
of the MIPS C compiler aren't what (if anything) keep it from being
standard conforming. 

In just what way are inter-module optimizations supposed to extend or
violate the standard?  As far as I can tell, they do neither. 

( Note that these inter-module optimizations can be performed within
  the usual capabilities of current Unix .o and .a file formats, at the
  cost of emitting multiple, differently-optimized translations
  from the compile phase, in much the same way as the inter-module
  type checking is done within this restriction by C++. 

  Not that there aren't better ways available if you allow the
  introduction of a better intermediate form.  It's just that this 
  conservative approach (while not totally satisfactory) would work 
  and yield 99 and 44/100ths percent of the benefits Jim is looking for.  )
--
Wayne Throop  ...!mcnc!dg-rtp!sheol!throopw

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (05/01/91)

In article <22839@lanl.gov>, jlg@cochiti.lanl.gov (Jim Giles) writes:
> That is exactly the point I made earlier about empowerment.  C allows
> both subroutines, but most implementations pessimize both func2 and
> func3 because they can't tell whether their argument is aliased to
> their global or not.  Fortran also restricts the user, but not in
> speed.  The Fortran compiler optimizes both func2 and func3 as if
> no aliasing has occurred - but then, the call in this example which
> causes aliasing in func2 is _ILLEGAL_.  Since deliberate aliasing 
> of this kind is rare (or can be made rare by coding around it), the
> Fortran rule permits the user to get better code generated.  

Here I think we have come to the core of the problem.
Friends, let's face it, Jim Giles' main point (that many currently
available C compilers miss chances to generate better code because
they think pointers _might_ be aliased) IS VALID.  There are some
really good compilers out there (Plan 9 sounds _wonderful_) but there
are some not-so-good compilers,  and especially on the top end machines
that I suspect Jim Giles cares most about, Fortran compilers really do
generate much better code than C.  (There are other reasons for using
Fortran on top end machines.  For example, a certain mini-super
manufacturer making what's basically an improved DAP lets the number of
processors in the actual machine "show through" in C, but gives you
however many "logical processors" you want in Fortran.  Ok, fair enough
for C to have a way of getting at the hardware directly, but there is no
advantage to customers in forbidding "virtual processors" to C programmers.)

The key difference is
    -- C permits aliasing.  The price of this is that in the absence of
       good interfile analysis, you get inferior optimisation.  This is
       a Quality of Implementation issue, because interfile analysis
       _can_ be done without violating the standard.
    -- Fortran forbids aliasing.  The price of this is that in the
       absence of good interfile analysis, the fact that the code that
       has been generated is WRONG because there _is_ aliasing goes
       quietly un-noticed.

Let me show you a typical example of aliasing in C:

	struct ListNode { struct ListNode *next; int item; };

	void ListDelete(struct ListNode **p, int item)
	    {
		struct ListNode **v = p;
		struct ListNode *q;

		for (q = *p; q != NULL; q = q->next)
		    if (q->item != item)
			*v = q, v = &(q->next);
		*v = q;
	    }

	    ...
		ListDelete(&var, item);
	    ...

At various times we have *v and *p as names for the same variable,
**p and *q as names for the same variable, **v and *q as names for
the same variable, and so on.  This is aliasing.

C pointers would be almost useless for managing linked structures unless
"aliasing allowed" were the default.  (This is not a defence of C
pointers.  There may well be better ways of accomplishing the same ends.)
Fortran, however, _used_ not to have pointers.  I know that Fortran
Extended _has_ pointers, but not having a current draft, I don't know
what form they now take.  Can Fortran Extended pointers be used for this
sort of linked structure manipulation, and what does that do to Fortran's
prohibition of aliases?

It should also be borne in mind that the prohibition of aliases in the
Fortran standard used to be widely ignored by Fortran programmers.  For
example, consider matrix addition:

	subroutine matadd(a, b, c, m, n)
	    integer m, n
	    real a(m,n), b(m,n), c(m,n)
	    integer i, j
	    do 20 j = 1, n
		do 10 i = 1, m
		    a(i,j) = b(i,j) + c(i,j)
10		continue		    
20	    continue
	end

I've lost count of the number of books I've seen which tell you that
it is ok to call a subroutine like this as
	call matadd(a, a, c, m, n)
or	call matadd(a, b, a, m, n)
This is _precisely_ the kind of aliasing that Fortran prohibits.  The
effect of such a call is undefined.  A Fortran compiler is perfectly
entitled to generate code that starts out
	if (iaddr(a).eq.iaddr(b)) call system('rm *')

The corresponding C code is required to work.
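
For concreteness, a C transcription (mine, with the arrays flattened,
since C has no adjustable two-dimensional dummies):

	/* Must work even when called as matadd(a, a, c, m, n): the
	   compiler may not assume the three pointers are distinct. */
	void matadd(double *a, double *b, double *c, int m, int n)
	{
	    int i, j;

	    for (j = 0; j < n; j++)
		for (i = 0; i < m; i++)
		    a[j*m + i] = b[j*m + i] + c[j*m + i];
	}

Here the aliased calls happen to be harmless, since each output element
depends only on the same-index inputs; the cost is that the C compiler
must emit code that stays correct without being able to prove that.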

If Jim Giles accompanied his criticism of C for allowing aliasing
with a swipe at Fortran compiler vendors whose products do not by
default warn users when they commit the "error" of aliasing, then
I'd be impressed by his fairness.

(For what it's worth, I have no objections to using Fortran, and 
would _love_ to have access to a Fortran Extended compiler.)

-- 
Bad things happen periodically, and they're going to happen to somebody.
Why not you?					-- John Allen Paulos.

jlg@cochiti.lanl.gov (Jim Giles) (05/01/91)

In article <5475@goanna.cs.rmit.oz.au>, ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
|> [...]                     Can Fortran Extended pointers be used for this
|> sort of linked structure manipulation, [...]

Yes.

|> [...]                                  and what does that do to Fortran's
|> prohibition of aliases?

Shoots it all to hell.  But, there is a very important difference.
Fortran does _not_ automatically convert array reference arguments 
into pointers when you make a function call.  So, array calculations
can still be compiled to efficient code without intermodule analysis.

|> [...]
|> 	subroutine matadd(a, b, c, m, n)
|> [...]
|> 		    a(i,j) = b(i,j) + c(i,j)
|> [...]
|> I've lost count of the number of books I've seen which tell you that
|> it is ok to call a subroutine like this as
|> 	call matadd(a, a, c, m, n)
|> or	call matadd(a, b, a, m, n)

I've not seen any books that recommend that.  I have seen books that
recommend _against_ it.  There is an aliased call that is legal in 
Fortran:
      call matadd(a,b,b,m,n)

It's safe to alias the second and third arguments because neither of
them is assigned to by the procedure.  Most books on Fortran recommend
that you don't do even this, since you can't be sure and it's better
to be safe.  Numerical Recipes (the Fortran edition) has _one_ program
in it which recommends an unsafe aliased call - but, in their
case, so many of their algorithms need work that it is not surprising
they make such an error.

|> 
|> If Jim Giles accompanied his criticism of C for allowing aliasing
|> with a swipe at Fortran compiler vendors whose products do not by
|> default warn users when they commit the "error" of aliasing, then
|> I'd be impressed by his fairness.

Quite so.  I have always criticized Fortran implementations for this.
Especially since it does not require a run-time test.  If the compiler
were to put the proper information into the object code, the loader
could propagate the identities of procedure arguments through the 
call chain and detect all such illegal aliasing at load-time.  It is,
in fact, almost the same algorithm that I've been recommending for use
to optimize C code at load-time.

J. Giles

gwyn@smoke.brl.mil (Doug Gwyn) (05/02/91)

In article <1991Apr29.213922.27863@agate.berkeley.edu> rkowen@violet.berkeley.edu (Dario Bressanini) writes:
>Well, I tried on mini systems (Gould, Convex ...). When I used
>f77 (the standard unix compiler) the performance was *VERY* bad,
>but when I switched to the vendor's compiler the fortran code
>was faster than the C code.

Note that most UNIX "f77" and "cc" compilers use the exact same compiler
technology, including a common code generator.  For a fair comparison of
the properties of application implementation in different languages,
that is appropriate.  Certainly, switching to a more highly tuned
compiler would produce faster generated code, but that would be true for
C as well as for Fortran.  Another relevant factor may be that the UNIX
system implementors were not as concerned about obtaining optimal code
for Fortran, since the vast majority of UNIX is implemented in C;
therefore, even using the same technology the C compiler variant may
have been somewhat more highly tuned than the Fortran variant.

Anecdotal evidence comparing compiler performance is thus rather
useless in an argument about the run-time efficiency for programs coded
in different languages.  Note also that the very criterion is by no
means the most important one for selection of a programming language.

burley@pogo.gnu.ai.mit.edu (Craig Burley) (05/02/91)

In article <22983@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:

   In article <5475@goanna.cs.rmit.oz.au>, ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
   |> [...]                     Can Fortran Extended pointers be used for this
   |> If Jim Giles accompanied his criticism of C for allowing aliasing
   |> with a swipe at Fortran compiler vendors whose products do not by
   |> default warn users when they commit the "error" of aliasing, then
   |> I'd be impressed by his fairness.

   Quite so.  I have always criticized Fortran implementations for this.
   Especially since it does not require a run-time test.  If the compiler
   were to put the proper information into the object code, the loader
   could propagate the identities of procedure arguments through the 
   call chain and detect all such illegal aliasing at load-time.  It is,
   in fact, almost the same algorithm that I've been recommending for use
   to optimize C code at load-time.

Uh, maybe "all obvious cases of illegal aliasing", but not all cases.

Remember that whether a given definition occurs, even of a dummy argument, is
entirely a run-time decision.  I.e. the statement "Dummy argument X is
defined by MATADD" is not really the opposite of the statement "Dummy argument
X is not defined by MATADD".  Instead, the statements must read as follows
to be opposites: "Dummy argument X is known to be defined during any
invocation of MATADD"; "Dummy argument X is not known to be defined...".

(Another way to explain: Fortran 90 has the INTENT(IN), INTENT(OUT),
and INTENT(INOUT) specifications for dummy arguments to indicate whether
a given dummy argument is (IN) possibly referenced but never defined;
(OUT) possibly defined but never referenced prior to an initial definition;
(INOUT) possibly defined and possibly referenced.  However, the default is
not INTENT(INOUT) as might be suspected, because a given dummy is not
necessarily known to be possible referenced and/or possibly defined at
compile time, and THAT is the default.  For a default-INTENT dummy, any
actual argument may be passed (essentially); for INTENT(INOUT), only a
definable and defined actual argument may be passed, and so on.)
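
A rough C-side analogy (mine, not anything in either standard): a
const-qualified pointer parameter makes a promise similar in spirit to
INTENT(IN), though C's const is a weaker, compile-time-only guarantee.

	/* b and c are declared read-only, akin to INTENT(IN) dummies, so
	   aliasing them (as in the legal matadd(a,b,b,m,n) call cited
	   elsewhere in the thread) remains safe. */
	void vecadd(double *a, const double *b, const double *c, int n)
	{
	    int i;

	    for (i = 0; i < n; i++)
		a[i] = b[i] + c[i];
	}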

Obviously, if the decision as to whether a dummy gets defined or even
referenced is made at run-time, then it is impossible for a compiler or
linker/loader to do a complete check as to whether aliasing errors have
been coded into the application.

That having been said, I do think 99% or more of all aliasing errors committed
in Fortran code are detectable (without falsing) at the link/load phase
(i.e. more than 99% are provable and obvious).

Oh, this assumes that reporting a given invocation as violating an aliasing
requirement is not a "false" if that invocation wouldn't actually get called
during a given run of the program as a whole.  I.e. if one gets a
message that "CALL MATADD(A,B,B,M,N)" does invalid aliasing because the 3rd
argument of MATADD gets modified, it shouldn't be considered a "false" report
if the CALL itself happens to never (or rarely) get executed.  (In fact,
this is why I like compile/load-time checking, as long as it doesn't report
further levels of falses -- one gets checking that, if only performed at
run-time, might find cases one might never find by just running the program.
Cases like what happens when particular data sets are used, for example.)
--

James Craig Burley, Software Craftsperson    burley@gnu.ai.mit.edu

jlg@cochiti.lanl.gov (Jim Giles) (05/02/91)

In article <BURLEY.91May1153042@pogo.gnu.ai.mit.edu>, burley@pogo.gnu.ai.mit.edu (Craig Burley) writes:
|> [...]
|> Obviously, if the decision as to whether a dummy gets defined or even
|> referenced is made at run-time, then it is impossible for a compiler or
|> linker/loader to do a complete check as to whether aliasing errors have
|> been coded into the application.

True.  The test I had in mind would issue a message whenever an instance
of aliasing occurred which _might_ be illegal.  That is, if either of the 
aliased entities is _possibly_ assigned to within the procedure (this is a 
simple thing for the compiler to determine - the variables either appear
on the left of an assignment _somewhere_ or not).  In this respect, I simply
assume that the possible will happen.  The bottom line is that any variable
with the OUT or INOUT attribute (or, one that would have that attribute if 
you made an interface for it) must not be aliased or I issue a message.

|> [...]
|> That having been said, I do think 99% or more of all aliasing errors comitted
|> in Fortran code are detectable (without falsing) at the link/load phase
|> (i.e. more than 99% are provable and obvious).

I agree.  This is also true of all C code except where pointer assignment
has been performed.  Then the issue gets muddy indeed.  The only safe
answer is that the pointers involved are now _maybe_ aliased to all
the objects in the transitive closure of assignments to which they belong.
This is also simple for the compiler to determine, but will result in 
a lot of alias warnings that aren't actually true.  Fortran pointers
will suffer the same problem.  The best solution is to extend the
languages to allow the user to explicitly declare which pointers are
allowed to be aliased and to what.  Then the compiler can actually 
enforce the constraints (at compile time).
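
A tiny illustration of the problem (my example; the flag is
hypothetical):

	extern int some_runtime_condition;	/* hypothetical flag */

	void demo(void)
	{
		double x = 0.0, y = 0.0;
		double *p = &x;
		double *q = &y;

		if (some_runtime_condition)
			q = p;		/* q may now point at x */

		*p = 1.0;
		*q = 2.0;	/* under the assignment-tracking rule
				   above, this store may clobber x, so x
				   cannot be cached in a register across it */
	}

After the single assignment q = p, an analysis that only tracks
assignments must treat p and q as potentially aliased from then on,
which is exactly the transitive closure just described.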

This gets back to the issue of "empowerment".  Giving the user explicit
control is empowerment.  Automatically making the decision for him is
not.  Though, if an automatic method _mostly_ does what you want, you may
be satisfied to accept the default most of the time - just as long as
you are given the power to override the default when you want to.  With
respect to aliasing, neither Fortran nor C give you that power.  Using
the loader in the manner I've described would give the user feedback about
how to use that power most effectively - and maybe, therefore, convince
implementors and language designers to provide the necessary features.

J. Giles

clive@x.co.uk (Clive Feather) (05/02/91)

In article <5475@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
> The key difference is
> -- C permits aliasing.  The price of this is that in the absence of
>    good interfile analysis, you get inferior optimisation.  This is
>    a Quality of Implementation issue, because interfile analysis
>    _can_ be done without violating the standard.
> -- Fortran forbids aliasing.  The price of this is that in the
>    absence of good interfile analysis, the fact that the code that
>    has been generated is WRONG because there _is_ aliasing goes
>    quietly un-noticed.

[He's not the only one to do so, but this message was to hand when I
finally decided to contribute to this thread]

The standard (3.9.6) says:
   "The use of two parameters declared with an array type (prior to their
    adjustment to pointer type) in separate lvalues to designate the same
    object is an obsolescent feature."

Looking at this and the rationale, it seems to me that the intention of
the committee was:

    void mangle_two_arrays (double *a, double *b, size_t n)
    {
        /*
            The compiler must generate code which will work if a and b
            point to the same object, or if a == b + 1, etc. In other
            words, traditional C semantics.
            Future versions of the standard will retain these semantics.
        */
    }

    void mangle_two_arrays (double a [], double b [], size_t n)
    {
        /*
            At present this has the same semantics as above. However,
            a future version of the standard will probably change the
            semantics of this procedure to those in the following comment;
            compiler implementers should look at this as a reasonable
            extension, and programmers should use this notation when they
            want these semantics.
        */
        /*
            The compiler is entitled to assume that a and b point to
            different objects, and having them point to the same object
            will cause undefined behaviour. In other words, Fortran
            semantics.
        */
        /*
            If this procedure only accesses a [0] to a [n-1] and b [0]
            to b [n-1], it is presumably safe for b == a + n. However,
            it is undefined what happens if b == a + n - 1, even if
            the code is such as the following.
        */

        size_t i;

        for (i = 0; i < n; i++)
            b [i] = sin (a [i]) - cos (b [i]);
    }

What I am wondering is:

(1) am I reading the standard correctly ?
(2) why hasn't Chris Torek (or Doug Gwyn or Henry Spencer or Mark Brader
or ...) mentioned this before ?
(3) is Chris wrong (shock, horror; I can sense the flames coming at me
already :-) when he says that declaring a parameter as an array is
always a Bad Thing (TM) ?
-- 
Clive D.W. Feather     | IXI Limited         | If you lie to the compiler,
clive@x.co.uk          | 62-74 Burleigh St.  | it will get its revenge.
Phone: +44 223 462 131 | Cambridge   CB1 1OJ |   - Henry Spencer
(USA: 1 800 XDESK 57)  | United Kingdom      |

sef@kithrup.COM (Sean Eric Fagan) (05/03/91)

In article <22841@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>If it is done by a
>post-translation tool, then it _is_ standard conforming.

Why do you have this hangup about "post-translation"?

ANSI says *nothing* about how code is produced, yet, obviously, the code you
write must be executed somehow.  Some systems, such as Saber-C, will execute
as they translate (or, perhaps, do a quick translation and then execute the
intermediate results).  Others can require you to type in 42 commands to
produce an executable that you can then run:  each of those 42 steps is part
of the "translation" process as, jim so ignorantly puts it.

The unix linker is, essentially, part of the "translation" process for C on
unix.  So there is no "post-translation" optimisation.

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

jlg@cochiti.lanl.gov (Jim Giles) (05/03/91)

In article <1991May02.173643.9528@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
|> [...]
|> The unix linker is, essentially, part of the "translation" process for C on
|> unix.  So there is no "post-translation" optimisation.

You need to read your copy of the standard more carefully.  The C standard
_specifically_ makes a distinction between translation and linking.  Source
'files' may be 'translated' in any order and separately and subsequently
'linked' together to form a program.  Define these terms how you like, 
but don't expect any vendors to agree with you.  In the meantime, what
the standard has in mind is perfectly clear: separate compilation.

J. Giles