[comp.lang.misc] Aggressive optimization

chased@rbbb.Eng.Sun.COM (David Chase) (10/18/90)

> Aggressive optimization is the idea that restructuring a program *in
> the course of translation* is both feasible and advantageous, and
> desirable (cost effective).

I'll grant that.

> My contention is that logic restructuring optimizations are more cost
> effective instead at the source level, whether automatic or manual,
> and that often automatic is not necessary, manual is sufficient.

Contention refuted, I think.

Monday, I went to a nice talk by Michael Wolf of Stanford U on
blocking (of matrix algorithms) for (register, cache, VM) locality and
parallelism.  These transformations are well understood, have clear
conditions for correctness, and though they could be performed at the
source level, doing them by hand is extremely tedious and far more
likely to be incorrect.  Also, choices must be made based on cache
structure and array size; it can be profitable to make these choices
at RUN time.  The examples shown in the talk involved transformation
of three (nested) loops into five (nested) loops.  I believe you'd
call this "aggressive".

Performance boost: a factor of 3 or 4 for common matrix algorithms on
two different machines.  Note that the exact blocking chosen is
machine-dependent.
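A minimal sketch of the blocking transformation described above, turning the three nested loops of a matrix multiply into five; the 4x4 size and block size B are illustrative choices of mine, not figures from the talk:

```c
#include <assert.h>

#define N 4
#define B 2   /* block size; in practice chosen to fit the cache */

/* Naive triple loop. */
void matmul(const int a[N][N], const int b[N][N], int c[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            c[i][j] = 0;
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
}

/* Blocked version: the three loops become five, so each B-by-B tile
   of b is reused while it is still resident in the cache. */
void matmul_blocked(const int a[N][N], const int b[N][N], int c[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c[i][j] = 0;
    for (int jj = 0; jj < N; jj += B)
        for (int kk = 0; kk < N; kk += B)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < jj + B; j++)
                    for (int k = kk; k < kk + B; k++)
                        c[i][j] += a[i][k] * b[k][j];
}
```

Both compute the same product; only the traversal order (and hence the locality) differs, which is why the transformation is tedious but mechanical.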

Other dependence-based transformations (scalarization, software
pipelining, vectorization, parallelization) are also well understood,
have clear conditions for correctness, and yield ample (often integer
factors) increases in speed.

> My main reason for surmising so is that automatic program analysis and
> rewriting is often far more difficult than code planning and
> synthesis,

This is quite true, if (1) you ignore economies of scale and (2) you
ignore those optimizations that are not expressible in the source
language.  The tremendous effort put into writing an optimizing
compiler (and getting it correct) is paid back many times over in its
later use.  The work of a small number of people in this building
improves the performance of code written by anyone using our
compilers.

> and the benefits are not as often competitive with those of more
> limited and safer alternatives (source analysis and rewriting).

You make two mistakes here:
First, you act as if one must EITHER do manual optimization, OR do
compiler optimization, but not BOTH.

Second, compiler optimization IS more reliable than manual
optimization, especially if the manual optimizer attempts to perform
the same transformations (AND, the optimizer will optimize for
different machines, unlike the manual optimizer).  How often do you
catch an optimizer bug, and how often do you make a mistake
hand-optimizing code?

Consider, for example, a piece of code (written in C) I wrote
demonstrating a new approach to solving a problem (it's the same
example I've always used, because it's the only piece of code I've
bothered that much over; usually, good algorithms, correctness, and
compiler optimization are enough).  First we have the algorithmic
improvements -- there's about a factor of 20 - 1000 on the interesting
problems.  Then, we do some profiling to look for hot spots, and
perform some optimizations by hand (that the compilers then did not
do, but WILL do now, saving me the bother of doing it by hand and
checking my work).  Another factor of 10.  More hand optimization, and
it's 3 times faster yet.

Now, that's a factor of thirty by hand (except that the compilers
nowadays would have gotten a good chunk of that), but I started from a
very careful implementation to be sure that I got it right.  Even so,
running the code through an optimizing compiler got me another factor
of 1.67 on top of the factor of 30.  It took the optimizer a couple of
minutes to do that; it took me several weeks, spread out over a couple
of years, and collaborating with other people using (and DEBUGGING --
I made a couple of mistakes along the way) the program to get the
factor of 30.

> More shortly: when -O[2345....] will not raise substantially the
> probability of getting broken code in most compilers, when it will not
> result in huge increases in compile time or space, and when it will
> give substantially better results than hand tuned code, then I will
> BELIEVE!

If you get broken code from a compiler, the people who wrote it
probably want to know, and will almost certainly fix it in a future
release of their compiler.  I believe this holds for DEC, FSF, HP,
IBM, Metaware, MIPS, Sun, and Tartan Labs.  

The increases in compile time and space are regrettable, but that is a
choice that you get to make.  My computer, at least, doesn't need me
to hold its hand while it compiles code, and I don't run the compiler
at -O666 every day.  Choice of optimization level during development,
debugging, testing, and performance tuning is just a matter of
allocating your time sensibly (which is why I'm spending time
replying to you, eh?).  As for "substantially better" results, I
believe that is the case already -- the compiler is certainly more
often correct, and since I get to do *both* hand and compiler
optimizations, I let the compiler go first, then I look for hot spots
that it missed.  I don't know about you, but my time is more valuable
than CPU time on most computers.

David Chase
Sun Microsystems, Inc.

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/18/90)

In article <1458@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
> (AND, the optimizer will optimize for
> different machines, unlike the manual optimizer).

quickpick would remedy that, permanently.

> How often do you
> catch an optimizer bug, and how often do you make a mistake
> hand-optimizing code?

First releases of optimizers simply don't work; after they've spent a
few years on the market, their bugs are few and far between. Maybe one
bug per ten thousand lines of code, after a year of maturity. It goes
down quickly past that.

I do not remember ever making a mistake in hand optimization by the most
fundamental standard technique: adding variable x to keep track of some
intermediate quantity, and eliminating redundant variables given x. What
optimizer can do this for any but the most primitive intermediate
expressions? (Introducing a pointer to traverse an array, and
eliminating the index that it replaces, is the simplest example.)
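As a concrete illustration of the transformation Bernstein describes (the function names are mine):

```c
#include <assert.h>

/* Index form: each a[i] implies an address computation from i. */
long sum_indexed(const int *a, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-reduced form: a pointer tracks the intermediate quantity
   &a[i], and the now-redundant index i is eliminated. */
long sum_pointer(const int *a, int n) {
    long s = 0;
    for (const int *p = a; p < a + n; p++)
        s += *p;
    return s;
}
```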

It would be interesting to see a language where you could say ``Okay,
keep 4*k + j - foo(i) in a register, and use it wherever possible.''
Somehow I don't think we have the technology yet.

  [ example of code: 30x faster from better algorithms, etc., in ]
  [ several weeks, 1.7x faster from optimizer in a couple of minutes ]

But I can usually get another 2x to 5x out of lots of ``picky'' hand
optimizations, having nothing to do with the algorithms. And I'm not
going to introduce subtle bugs in weird cases with unsafe program
transformations.

---Dan

eerke@cs.kun.nl (Eerke Boiten) (10/18/90)

In article <13405:Oct1800:22:5690@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>I do not remember ever making a mistake in hand optimization by the most
>fundamental standard technique: adding variable x to keep track of some
>intermediate quantity, and eliminating redundant variables given x. What
>optimizer can do this for any but the most primitive intermediate
>expressions? (Introducing a pointer to traverse an array, and
>eliminating the index that it replaces, is the simplest example.)

A useful technique, indeed (called "strength reduction" in optimising
compilers, "finite differencing" in transformational programming).

Surprisingly :-), it's one of the first things I learned in the course
on optimising compilers I followed as a masters student. Of course,
*the* example is going through a (multi-dimensional) array.
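A sketch of that textbook example, with hypothetical names: strength reduction replaces the per-element multiply in a row-major index with a running offset that is only ever incremented.

```c
#include <assert.h>

/* Naive form: the inner loop recomputes the linear index i*cols + j,
   one multiplication per element. */
long sum2d(const int *a, int rows, int cols) {
    long s = 0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            s += a[i * cols + j];
    return s;
}

/* Strength-reduced form: a running offset replaces the multiply. */
long sum2d_reduced(const int *a, int rows, int cols) {
    long s = 0;
    int base = 0;                       /* takes values 0, cols, 2*cols, ... */
    for (int i = 0; i < rows; i++, base += cols)
        for (int j = 0; j < cols; j++)
            s += a[base + j];
    return s;
}
```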

>Somehow I don't think we have the technology yet.

Somehow I can't imagine this has never been implemented, considering how
long it's been known.

>And I'm not going to introduce subtle bugs in weird cases with unsafe
>program transformations.

Tell me, what do you mean by unsafe program transformations? 
Hand optimisations? Of course, finite differencing is relatively safe
since you introduce redundant information most of the time. It's the
process of *removing* (supposedly) redundant information that might very
well cause many bugs. By the way, there *are* systems that help you in
applying source-level optimisations. Just make sure that your
transformations are correct (only once!), and they preserve the set
of bugs in your program, when applied.

Eerke Boiten
Department of Informatics (STOP Project), K.U.Nijmegen
Toernooiveld, 6525 AD Nijmegen, The Netherlands
Tel. +31-80-612236.	Email: eerke@cs.kun.nl

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/19/90)

In article <2301@wn1.sci.kun.nl> eerke@cs.kun.nl (Eerke Boiten) writes:
  [ after my description of the most general optimization technique ]
> A useful technique, indeed (called "strength reduction" in optimising
> compilers, "finite differencing" in transformational programming).

Huh? The strength reduction and finite differencing that CS majors learn
is absolutely trivial compared to what anyone can do by hand. As I said,
walking through an array is only the simplest example.

Does anyone have a compiler that can introduce a non-linear intermediate
expression and reduce around it? If so, I'm impressed. How advanced is
the symbolic algebra system included in the compiler? Can it take
advantage of range constraints, so that if it knows that x is between 0
and 7, it might set up a table to calculate ((1<<(x+2))-1)/10 quickly?
Can it manipulate floor and ceiling expressions? Can it introduce
invariants to figure out range constraints? These are all part of that
single, fundamental optimization technique.
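The range-constraint example above can be sketched like this (the table and function names are mine; the point is that the whole shift-subtract-divide chain collapses to one load once x is known to lie in 0..7):

```c
#include <assert.h>

/* f(x) = ((1 << (x+2)) - 1) / 10, with x constrained to 0..7,
   so all eight values can be precomputed once. */
static int f_table[8];

static void init_f_table(void) {
    for (int x = 0; x < 8; x++)
        f_table[x] = ((1 << (x + 2)) - 1) / 10;
}

static int f_fast(int x) {
    return f_table[x];      /* shift, subtract, and divide all gone */
}
```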

I know, compilers are improving. Some optimizers fully automate the
dullest, most machine-dependent part of optimization---viz., figuring
out how often loops or branches are executed in practice, seeing how
long machine instructions take, and deciding on that basis whether a
given optimization is worthwhile. I really appreciate that. I won't stop
hand optimization because of it.

> >And I'm not going to introduce subtle bugs in weird cases with unsafe
> >program transformations.
> Tell me, what do you mean by unsafe program transformations? 
> Hand optimisations?

``Neither -O3 nor -O4 should be used when compiling either device
drivers, or programs that modify external variables from within signal
handlers.''

> Of course, finite differencing is relatively safe
> since you introduce redundant information most of the time.

It's exactly this attitude of ``finite differencing is the only
optimization in the world'' that leads people to think that hand
optimization is useless. Both the attitude and the conclusion are
wrong.

> By the way, there *are* systems that help you in
> applying source-level optimisations.

I'm perfectly willing to use whatever's out there. I'm not willing to
pretend that current compilers can figure out any reasonable level of
optimization for me.

---Dan

chased@rbbb.Eng.Sun.COM (David Chase) (10/19/90)

In article <13405:Oct1800:22:5690@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>First releases of optimizers simply don't work; after they've spent a
>few years on the market, their bugs are few and far between. Maybe one
>bug per ten thousand lines of code, after a year of maturity. It goes
>down quickly past that.

Do you have documentation to back this figure up?  I do not believe
it; I've used beta-test compilers from more than one vendor, and they
largely worked.  When they didn't work, ALL the bugs I found were
fixed before the compiler was released to customers.  There are test
suites in abundance for Fortran, C, and Ada -- it's elementary, though
tedious, to be sure that a compiler passes everything in a test suite.

>  [ example of code: 30x faster from better algorithms, etc., in ]
>  [ several weeks, 1.7x faster from optimizer in a couple of minutes ]
>
>But I can usually get another 2x to 5x out of lots of ``picky'' hand
>optimizations, having nothing to do with the algorithms. And I'm not
>going to introduce subtle bugs in weird cases with unsafe program
>transformations.

You didn't read the posting clearly enough (or else I deleted the
details for brevity).  Better algorithms, high-level hand optimization
(i.e., malloc, memoization), low-level hand-optimization (unrolling,
inlining, stuff that optimizing compilers will do automatically for me
today) all contributed to that speedup.  (The grand total was a factor
of 250, which has since been improved to about 400 because of other
algorithmic improvements).  Even so, there was still another factor of
1.6 to be had by the compiler's optimizations.

In general, I find that my uncaught error rate for typing is higher
than the bug rate for modern optimizing compilers.  I think this is
true for most people.  Furthermore, even if I could do the
transformations without thinking, it takes more time for me to type
in the transformations than it does for the compiler to do the
optimizations.

And, it is not the intention that optimizing compilers do unsafe
program transformations -- when they do, it is a bug.  Hold that
thought -- a transformation is not considered unless it is safe.
People don't publish papers on "transformations that usually work".
The wholesale rearrangements that expose parallelism and enhance
locality are based on sound theory, are incredibly tedious to do by
hand (the resulting program is often unrecognizable), and can yield
fabulous increases in speed.  Other transformations are completely
unavailable at the source level (can you schedule instructions between
the fetch, increment, and store required to implement "x++" on a RISC
machine?) and produce solid improvements in speed.

Perhaps I'm missing something -- I claim, I believe correctly, that
modern optimizing compilers improve code quality (at the *low level*)
faster, more effectively, and with a lower error rate than I could
ever hope to attain by hand (doing the *same* optimizations).  What's
wrong with using such an excellent tool?  What awful property of
optimizing compilers have I missed?

David Chase
Sun Microsystems

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/19/90)

In article <1493@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
> In article <13405:Oct1800:22:5690@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >First releases of optimizers simply don't work; after they've spent a
> >few years on the market, their bugs are few and far between. Maybe one
> >bug per ten thousand lines of code, after a year of maturity. It goes
> >down quickly past that.
> Do you have documentation to back this figure up?

No, just my experience backed by what others tell me. (A '60s programmer
says that for years Fortran compilers were built around the same
optimizer, which produced incorrect results on very long stretches of
code. The workaround? Stick a random label in to split up the code.
Many Fortran programs today still have unnecessary labels in strange
places. Funny thing: I had to work around a similar bug, in exactly the
same way, under an optimizer on a three-year-old machine.)

> You didn't read the posting clearly enough (or else I deleted the
> details for brevity).  Better algorithms, high-level hand optimization
> (i.e., malloc, memoization), low-level hand-optimization (unrolling,
> inlining, stuff that optimizing compilers will do automatically for me
> today) all contributed to that speedup.

Okay, I didn't realize that you were including unrolling and inlining in
your 30x figure. But I do a lot more than unroll and inline when I
optimize by hand.

> And, it is not the intention that optimizing compilers do unsafe
> program transformations -- when they do, it is a bug.

Then I don't understand the warning in your company's cc(1) about -O3
and -O4. Is the compiler buggy?

> The wholesale rearrangements that expose parallelism and enhance
> locality are based on sound theory, are incredibly tedious to do by
> hand (the resulting program is often unrecognizable), and can yield
> fabulous increases in speed. 

Most programs do not manipulate matrices. I agree, it's fine for the
compiler to do what it can. However, the most I have ever seen an
optimizer get is 3x, and *it* has all the low levels of the hardware to
use.

> Other transformations are completely
> unavailable at the source level (can you schedule instructions between
> the fetch, increment, and store required to implement "x++" on a RISC
> machine?) and produce solid improvements in speed.

I agree entirely. In fact, the most impressive optimizer I've seen gets
2x on practically any code. The underlying hardware is extremely
pipelined (much more than a Cray); there are even separate instruction
streams for integer and floating point operations. (And it takes two
instructions to read or write memory.) I don't expect the language to
support this level of instruction scheduling, so the compiler had
better do a good job. 

Note that the optimizer gets relatively little speedup out of anything
other than scheduling and trivial peephole optimizations.

> Perhaps I'm missing something -- I claim, I believe correctly, that
> modern optimizing compilers improve code quality (at the *low level*)
> faster, more effectively, and with a lower error rate than I could
> ever hope to attain by hand (doing the *same* optimizations).

Doing the *same* optimizations, yes; but there are infinitely more
optimizations that you can do by hand, because computers are still
pretty stupid. Except on hugely parallel code, current optimizers get
almost nothing out of ``intelligent'' transformations.

> What's
> wrong with using such an excellent tool?  What awful property of
> optimizing compilers have I missed?

Unsafe transformations and other optimizer bugs. As in sun% cc -O4. Not
to pick on your compiler, which is otherwise rather nice.

---Dan

gl8f@astsun7.astro.Virginia.EDU (Greg Lindahl) (10/19/90)

In article <20683:Oct1819:44:1490@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>Unsafe transformations and other optimizer bugs. As in sun% cc -O4. Not
>to pick on your compiler, which is otherwise rather nice.

Does Sun's -O4 have bugs, or is that warning in the cc man page
because C programmers leave off "volatile" where it's needed?

--
"Restraint, hell. I'm just too fucking busy." -- Bill Wisner

jlg@lanl.gov (Jim Giles) (10/19/90)

From article <13405:Oct1800:22:5690@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> I do not remember ever making a mistake in hand optimization by the most
> fundamental standard technique: adding variable x to keep track of some
> intermediate quantity, and eliminating redundant variables given x. What
> optimizer can do this for any but the most primitive intermediate
> expressions? (Introducing a pointer to traverse an array, and
> eliminating the index that it replaces, is the simplest example.)

_ALL_ production quality compilers I've ever used on mainframes can do
much better than all but the most adept programmer at this optimization.
Further, the adept programmer can usually do no better than the compiler.
Compilers have been able to do this quite well since I started programming
23 years ago.  This is one of the reasons that your claim that pointers
are more efficient than arrays is completely incorrect.

In fact, the only place a production quality compiler (the only kind
worth paying for) would fail to make this optimization is when it is
actually _more_ efficient to recompute the index than to reload the
saved value from memory.  And _this_ decision is a feedback from the
register allocation mechanism - so the user _can't_ do this unless he
wants to take _complete_ responsibility for register allocation.  But,
register allocation is machine dependent - anyway it's another of
those things that compilers are usually better than people at.

In the meantime, there _are_ types of optimization that compilers
_can't_ do yet.  The adept programmer should spend his time on those
rather than wasting it on things the compiler can do more quickly
and reliably.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/19/90)

In article <1990Oct18.212844.14728@murdoch.acc.Virginia.EDU> gl8f@astsun7.astro.Virginia.EDU (Greg Lindahl) writes:
> In article <20683:Oct1819:44:1490@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >Unsafe transformations and other optimizer bugs. As in sun% cc -O4. Not
> >to pick on your compiler, which is otherwise rather nice.
> Does Sun's -O4 have bugs, or is that warning in the cc man page
> because C programmers leave off "volatile" where it's needed?

Well, the latter can't be the explanation, because their cc doesn't even
understand volatile. Hence the former must be true.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/19/90)

In article <66071@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <13405:Oct1800:22:5690@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > I do not remember ever making a mistake in hand optimization by the most
> > fundamental standard technique: adding variable x to keep track of some
> > intermediate quantity, and eliminating redundant variables given x.
> _ALL_ production quality compilers I've ever used on mainframes can do
> much better than all but the most adept programmer at this optimization.
> Further, the adept programmer can usually do no better than the compiler.

What?! This is absolutely unbelievable. In one of my last few articles I
had a paragraph listing some of what a competent hand optimizer does
regularly. A simple example, and one responsible for a 10% speedup in
one section of code: A variable is incremented regularly, but never goes
past 2^16 - 1. Whenever it reaches a power of 2, another variable (which
is initialized appropriately) is incremented. It is the number of bits
in the first variable, though I don't know any compiler that can figure
this out. It is hence smaller than 16. In another piece of code, a loop
(which is performed at most twice---I had to unroll it by hand, because
the compiler couldn't figure it out) essentially subtracts 8 from that
variable each time around, and references (1 << x) - 1 for x a certain
expression computed from that variable. I knew x had to be between 0 and
7, so I made a table and replaced (1 << x) - 1 by a reference into that
table. This took me hardly any thought; it was one of hundreds of hand
optimizations that I performed on the same program. Those optimizations
added up to a 3x overall improvement, over and above the 1.5-2x that the
compiler got for being able to use the machine language directly.
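A hypothetical reconstruction of the first example (the names and the power-of-two test are mine, not Bernstein's): the invariant is that `bits` always equals the number of bits in `n`, maintained incrementally instead of being recomputed.

```c
#include <assert.h>

/* n only ever grows, and never past 2^16 - 1; bits tracks the number
   of bits needed to represent n. */
static unsigned n = 1, bits = 1;

static void bump(void) {
    n++;
    if ((n & (n - 1)) == 0)   /* n just reached a power of two */
        bits++;
}
```

No compiler of the era could be expected to discover that `bits` is the bit length of `n`; the programmer knows it by construction.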

A less trivial example: In a heavily hand-optimized implementation of
Nussbaumer's convolution method, I had to combine a 27-digit number's
residues mod 10^9 + {1,3,5} into its three digits radix 10^9. A certain
intermediate result was guaranteed to be between 0 and 28258, though the
compiler wouldn't be able to figure this out without some rather amazing
calculations with floor and modulus functions. I was able to take
advantage of this to rewrite a large portion of the computation in a way
that I knew was safe only because overflow was impossible. Jim, I bet
you $100 that you have used a ``production quality compiler... on
mainframes'' that cannot perform this optimization. Bet?

Now what did you mean to say?

---Dan

jdarcy@encore.com (Floating Exception) (10/19/90)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <66071@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>> _ALL_ production quality compilers I've ever used on mainframes can do
>> much better than all but the most adept programmer at this optimization.
>> Further, the adept programmer can usually do no better than the compiler.

>What?! This is absolutely unbelievable. In one of my last few articles I
>had a paragraph listing some of what a competent hand optimizer does
>regularly. A simple example
> [convoluted example]
>A less trivial example: In a heavily hand-optimized implementation of
>Nussbaumer's convolution method

Give us a break, Dan!  Both of the examples you've given represent changes
of *algorithm* based on foreknowledge of possible input values, a type of
optimization of which nobody expects compilers to be capable, at least not
in this century.  I guess in some distant future there will exist compilers
that can recognize and replace inferior algorithms with more efficient ones,
but that's irrelevant to your disagreement with Jim Giles.  What he was
talking about was an optimization that, in essence, merely rearranges the
order of evaluation of intermediate results.

I think most people admit that your obviously high opinion of your own
programming skills may not be entirely unjustified, but not everyone is
so talented.  There are many programmers out there who simply don't do
even the simplest types of hand optimizations, who write really crappy
code, and because they are so numerous I would say that it's perfectly
reasonable for compilers to optimize as aggressively as their creators
can make them.  Obviously, this should be done without sacrificing any
correctness, and may not do much for "adept" programmers, but for the
code produced by run-of-the-mill types optimization has always been a
very big winner.
--

Jeff d'Arcy, Generic Software Engineer - jdarcy@encore.com
      Nothing was ever achieved by accepting reality

chased@rbbb.Eng.Sun.COM (David Chase) (10/20/90)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
gl8f@astsun7.astro.Virginia.EDU (Greg Lindahl) writes:
>> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>> >Unsafe transformations and other optimizer bugs. As in [xxx]% cc -O4. Not
>> >to pick on [that] compiler, which is otherwise rather nice.
>> Does [Xxx]'s -O4 have bugs, or is that warning in the cc man page
>> because C programmers leave off "volatile" where it's needed?
>
>Well, the latter can't be the explanation, because their cc doesn't even
>understand volatile. Hence the former must be true.

[I'd prefer not to discuss bugs/features (the neutral word is
"behaviors") of specific products of specific companies.  Let's
confine this to a general discussion of aggressive optimization,
please.]

There is a third possibility -- that there are at least two dialects
of pre-ANSI C; in one, global variables are volatile, in the other
they are not.  Most parts of most programs are written in the second
dialect, though some very important parts of some very important
programs are written in the first dialect.  Lacking a standard, common
use dictates, and *both* dialects are in common use, regardless of
what the net.authorities say is "right".  (I know that sounds like a
slimy excuse, but remember who "is always right".  Note, too, that the
keyword added for ANSI was "volatile" for safety, and not "stable" for
speed.  This should give some indication of what is usually intended.)

And, whether you agree with the choice of dialects or not, it's got
nothing to do with "buggy optimizations" -- either global variables
are volatile (i.e., programs execute in serial order), or they aren't.
I believe that is what we were discussing in the first place.  There's no
way to "debug" the optimization of references to global variables if
those references are intended to be treated as volatile.

David Chase
Sun Microsystems

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/20/90)

In article <1559@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
> And, whether you agree with the choice of dialects or not, it's got
> nothing to do with "buggy optimizations" -- either global variables
> are volatile (i.e., programs execute in serial order), or they aren't.
> I believe that is what we were discussing in the first place.  There's no
> way to "debug" the optimization of references to global variables if
> those references are intended to be treated as volatile.

Fair enough. I'll stop calling the assumption of nonvolatility a bug,
though I still consider it a huge mistake.

There are lots of other examples of optimizer bugs, so the point stands.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/20/90)

Skip to the word ``opinion'' to see my main point.

In article <jdarcy.656338868@zelig> jdarcy@encore.com (Floating Exception) writes:
> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> > [convoluted example]
> >A less trivial example: In a heavily hand-optimized implementation of
> >Nussbaumer's convolution method
> Give us a break, Dan!  Both of the examples you've given represent changes
> of *algorithm* based on foreknowledge of possible input values,

First of all, neither change was based on any foreknowledge of possible
inputs; all constraints were expressed by actual tests in the programs.
More importantly, they weren't algorithm changes at all. See below for
further explanation.

> a type of
> optimization of which nobody expects compilers to be capable, at least not
> in this century.

Exactly. Compilers can't do this sort of simple optimization. See below.

> I think most people admit that your obviously high opinion of your own
> programming skills may not be entirely unjustified, but not everyone is
> so talented.

Wtf are you talking about? I was presenting the optimization the way a
very smart compiler might see it, step by step. Now here's how any
programmer would see it, especially after coding it in the first place:

--> Variable x is the number of bits of padding inside a byte, so it's
--> between 0 and 7. So we can compute f(x) more efficiently by
--> precomputing a table for 0 through 7. Done.

You call it ``convoluted''? No shit. The human brain works in very
convoluted ways. It can also do a lot of things infinitely better than
any existing computer. Simple hand optimizations like the one above are
well beyond the reach of current technology. The same comments apply to
the other optimization I talked about.

In any case, this optimization is in the class of optimizations that I
described: introducing new intermediate quantities (in this case, the
precomputed table) and using them to cut out redundant code (in this
case, the main computation). Jim says he has compilers that can do those
optimizations. I find that unbelievable.

> I would say that it's perfectly
> reasonable for compilers to optimize as aggressively as their creators
> can make them.  Obviously, this should be done without sacrificing any
> correctness,

Well, yes. This is the crux of the issue. I (and, I think, Piercarlo,
and lots of other programmers) see too many optimizer bugs to believe
that aggressive computer optimization is worthwhile. Other people may
think the balance swings in the opposite direction. It's just a matter
of where you draw the line.

By the way, I don't agree with one of your comments---namely, that
automatic optimization becomes less useful as hand optimization kicks
in. On the contrary: the compiler will always understand details of the
machine language that are purposefully hidden from the language used by
the programmer. The programmer and compiler optimize independently, in a
sense, as long as the compiler doesn't try to do optimizations that it
doesn't understand. So on a RISC machine I'll gladly have the compiler
give me its 2x instruction scheduling benefit, no matter how much hand
optimization time I put in.

---Dan

pcg@aber-cs.UUCP (Piercarlo Grandi) (10/22/90)

I was writing:

pcg> Aggressive optimization is the idea that restructuring a program *in
pcg> the course of translation* is both feasible and advantageous, and
pcg> desirable (cost effective).

In article <1458@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase)
writes:

chased> I'll grant that.

pcg> My contention is that logic restructuring optimizations are more cost
pcg> effective instead at the source level, whether automatic or manual,
pcg> and that often automatic is not necessary, manual is sufficient.

chased> Contention refuted, I think.

chased> Monday, I went to a nice talk by Michael Wolf of Stanford U on
chased> blocking (of matrix algorithms) for (register, cache, VM) locality
chased> and parallelism. [ ... ]

chased> These transformations are well understood, have clear conditions
chased> for correctness, and though they could be performed at the source
chased> level, doing them by hand is extremely tedious and far more likely
chased> to be incorrect.

What you are saying is that this fellow has some technology to synthesize
high performance matrix algorithms depending on the target architecture, and
that such synthesis can be done reliably by a program. This is excellent
news, but it does not necessarily mean that the technology is about
aggressive optimization. The same might be used for clever code generation,
which is absolutely OK.

chased> Also, choices must be made based on cache structure and
chased> array size; it can be profitable to make these choices
chased> at RUN time.

On-the-fly code generation, thunks. Excellent. Not related in any way to
aggressive optimization.

chased> The examples shown in the talk involved transformation of three
chased> (nested) loops into five (nested) loops.  I believe you'd call this
chased> "aggressive".

Frankly, I'd call them "crazy" if used as optimizations. Very clever and
safe if used in a high level code generator. It all depends on the
abstraction level at which you apply them.

chased> Performance boost: a factor of 3 or 4 for common matrix algorithms
chased> on two different machines.  Note that the exact blocking chosen is
chased> machine-dependent.

The real point here is that the guy has a better algorithm for matrix
operations than the standard one, so what he is doing is pattern matching
for occurrences of the standard one, and substituting his better one for it.

This raises the question: why not just write 'm * n' and let the code
generator (or the library manager) synthesize the right algorithm?

My answer: it is crazy to have programmers write specific algorithm
implementations, and then have the compiler pattern match for the
implementation, deduce the algorithm from the implementation, and then
rewrite the implementation with another one that (supposedly) implements the
same algorithm.

IMNHO it is much more effective, for example, to use a source rewriting tool
that scans the programmer's code for implementations and substitutes higher
level constructs for them. The idea is that the source should be true: it
should either contain an implementation that is literally translated *or* a
specification that the compiler can implement as it pleases, and not
something halfway. Everything else is a program with confusion in its
abstraction levels.

pcg> My main reason for surmising so is that automatic program analysis and
pcg> rewriting is often far more difficult than code planning and synthesis,

chased> This is quite true, if (1) you ignore economies of scale and (2) you

If the economies of scale are "don't touch dusty decks", there is an
unresolved argument as to whether it is more economical to rewrite the
dusty decks, manually or automatically, into a form more amenable to clever
code generation, or to leave them alone and work on the generated code.

I have seen too many dusty decks improve beyond recognition with a little
manual or automatic restructuring (as Knuth notes in vol. 1 of TAOCP), so I
tend to think that even for dusty decks explicit (automatic or manual)
restructuring is better than aggressive optimization.

If we are speaking of newly designed programs, then probably it is always
cheaper to write the programs right (at the appropriate abstraction level)
in the first place than to restructure them in the code generator.

Unless of course there is not only the "dusty deck" problem, but also the
"dusty programmer" one (the programmer can only write dusty decks, and
cannot be taught to write programs in a higher level notation, e.g. using
procedure calls :->).

    To those who think that I am joking here: well, no. I know physicists
    and mathematicians who program in Fortran without using subroutines.
    They don't understand them. I have even met a physicist who did not
    know about loops (or BLOCK DATA) -- but at least he knew about the
    local version of the NAG library, so he initialized his matrices with
    hundreds of assignment statements, and then called a library subroutine
    to do the real work.

But then we would be discussing sociological issues, not computer science.

chased> The work of a small number of people in this building
chased> improves the performance of code written by anyone using our
chased> compilers.

No, it just gives the illusion that you can have faster rerunning of dusty
decks without touching them. You would be doing a much better service to
anyone if you wrote source program restructuring tools, and then clever code
generators.

pcg> and the benefits are not as often competitive with those of more
pcg> limited and safer alternatives (source analysis and rewriting).

Because once you have turned a low level implementation into a higher level
one, it becomes magically more portable, and you can also have better
confidence in the program's correctness.

I am far more confident of the correctness of an SQL query one page long
than of the equivalent implementation in COBOL, which is probably several
dozen pages long. If the SQL code generator is clever, it is also likely
that the SQL version is faster, and on a wider range of machines than the
COBOL one.

I would not trust as much, for either portability or speed, a COBOL compiler
that deduced little by little the query the program is supposed to
implement, and then rewrote the COBOL program's logic into a different
implementation of the same query.

chased> If you get broken code from a compiler, the people who wrote it
chased> probably want to know, and will almost certainly fix it in a future
chased> release of their compiler.  I believe this holds for DEC, FSF, HP,
chased> IBM, Metaware, MIPS, Sun, and Tartan Labs.  

Maybe, but there is a thing called software physics that tells us that bugs
are strictly related to complexity, and that virtually the only way to drive
down bugs is to reduce complexity, not to report them as they occur.

chased> I don't know about you, but my time is more valuable than CPU time
chased> on most computers.

But this is a good argument for using higher level languages with clever code
generators! Or for letting expensive programmers' code be literally obeyed.

I think that if the programmer's time is valuable the best solution is not
to have him write low level implementations that are ignored by the compiler
and restructured into different but equivalent implementations of the same
specification, but to have the programmer write the specification and then
the compiler or library manager generate or select the optimal
implementation straight away. And if the programmer writes a specific
implementation, then that must be because of a very good reason, and it
ought to be respected.

Dan Bernstein and Herman Rubin present only one side of the
anti-aggressive-optimization argument: that implementations should not be
rewritten by the language translator. The other side, mine, is that
specifications should be translated in the cleverest way possible.  The
aggressive optimizers believe that implementations should be translated in
the cleverest way possible.

The Bernstein/Rubin assumption is that programmers know what they are doing
and write in a low level language which therefore ought to be translated
literally; the Chase/Giles assumption is that programmers don't know what
they are doing and still write in a low level language which therefore ought
to be translated liberally.

My argument is that programmers should use the language at the abstraction
level suitable to them and their applications, and that low level languages
ought to be translated literally and high level ones liberally, and that
translating liberally implementations in low level languages is a monstrosity.

When you have a high level specification and you generate clever code for it
you only have to show that the synthesized implementation satisfies that
specification; when you do aggressive optimization on an implementation
written in a low level language you also have to show that you have
correctly deduced the specification from the implementation you are
overriding. This is a completely different and far more difficult task, and
should not be done in a hidden way by the compiler, on safety grounds alone.

It should be done IMNHO either by the programmer himself (manual algorithm
reimplementation) or by automatic tools on the source, so that the new
implementation is documented in it. And I think that in most cases
restructuring should not be horizontal, i.e. from original inappropriate
implementation to new better implementation (three nested loops to five
nested loops), but from low level to high level (three nested loops to 'm *
n'), and separately from high level to low level ('m * n' expanded into the
locally optimal version, maybe five nested loops, either by high level code
generation or library selection).

I am not against clever code generation for high level languages -- I am
against compilers trying to second guess programmer's implementations in low
level ones. I would like aggressive optimization advocates to contend with
this point, not with the straw one that better implementations are better.

The question is not:

1) Can a compiler restructure an algorithm's implementation into a supposedly
equivalent supposedly faster implementation as a matter of course during
code analysis and generation?

but it is:

2) Isn't it rather more cost effective to write new programs in a more
abstract and portable way so that clever code generation can be applied to
them, and to restructure old programs in source form so that they are
brought to the same more abstract level either manually or automatically?

My answers are:

1) yes, it is possible to do it, but with a non-negligible loss in confidence.

2) yes, because it results in long term benefits as well as short term ones.
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

cik@l.cc.purdue.edu (Herman Rubin) (10/22/90)

In article <jdarcy.656338868@zelig>, jdarcy@encore.com (Floating Exception) writes:
> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >In article <66071@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> >> _ALL_ production quality compilers I've ever used on mainframes can do
> >> much better than all but the most adept programmer at this optimization.
> >> Further, the adept programmer can usually do no better than the compiler.
> 
> >What?! This is absolutely unbelievable. In one of my last few articles I
> >had a paragraph listing some of what a competent hand optimizer does
> >regularly. A simple example
> > [convoluted example]
> >A less trivial example: In a heavily hand-optimized implementation of
> >Nussbaumer's convolution method
> 
> Give us a break, Dan!  Both of the examples you've given represent changes
> of *algorithm* based on foreknowledge of possible input values, a type of
> optimization of which nobody expects compilers to be capable, at least not
> in this century.  I guess in some distant future there will exist compilers
> that can recognize and replace inferior algorithms with more efficient ones,
> but that's irrelevant to your disagreement with Jim Giles.  What he was
> talking about was an optimization that, in essence, merely rearranges the
> order of evaluation of intermediate results.

As I have stated before, the specific hardware configuration can have a major
effect on the relative performance of algorithms.  I am even more pessimistic
than you about automata taking into account this information.

But even other decisions, such as when and how to do weird unrolling, or
whether conditional transfers on planned exceptions can be combined, are
unlikely to be made by automata.  I can revise an algorithm to take into
account the size of an instruction stack, or interference, by doing things
which the compiler cannot know are possible.

The real solution is to have the programmer and the compiler interact.  The
compiler can systematically look at millions of options, but it cannot 
innovate.  The human mind has great flexibility, but is not particularly
fast at routine.

> I think most people admit that your obviously high opinion of your own
> programming skills may not be entirely unjustified, but not everyone is
> so talented.  There are many programmers out there who simply don't do
> even the simplest types of hand optimizations, who write really crappy
> code, and because they are so numerous I would say that it's perfectly
> reasonable for compilers to optimize as aggressively as their creators
> can make them.  Obviously, this should be done without sacrificing any
> correctness, and may not do much for "adept" programmers, but for the
> code produced by run-of-the-mill types optimization has always been a
> very big winner.

I suspect that he is not bragging but complaining.  If you teach potential
programmers that "these are the ways things are done, and these are the
concepts you can use in your programs," you make it difficult for them to
even think later.  As many have said,

		"It ain't what you don't know that hurts you, 
		 it's what you know that ain't so."

Talent can be destroyed more easily than it can be created.  To start
a potential programmer with a set of techniques designed for those who
cannot think is to at least bury the potential for thinking.  My
experience with students who have had computational courses in mathematics
and statistics is that the situation is much worse.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)	{purdue,pur-ee}!l.cc!cik(UUCP)

jlg@lanl.gov (Jim Giles) (10/23/90)

From article <2060@aber-cs.UUCP>, by pcg@aber-cs.UUCP (Piercarlo Grandi):
> [...]
>            the Chase/Giles assumption is that programmers don't know what
> they are doing and still write in a low level language which therefore ought
> to be translated liberally.

On the contrary.  My assumption is that the programmer knows very well
what he's doing and that he shouldn't have to waste that talent on
trivial things that the compiler _should_ know how to do at the present
state of the art.  To be sure, programmers who _don't_ know what they're
doing also benefit from a well written optimizing compiler, so compilers
should optimize aggressively for that reason too.

> [...]
> My argument is that programmers should use the language at the abstraction
> level suitable to them and their applications, and that low level languages
> ought to be translated literally and high level ones liberally, and that
> translating liberally implementations in low level languages is a monstrosity.

I can't find any rigid line between "high level" and "low level"
languages.  No compiler should alter the meaning of a program as defined
by the semantic description of the language used.  A lower level language
may simply place more specific operational requirements in its
definition.  Any feature that is not operationally restricted by the
semantic definition of the language should be freely optimized.

Assembly languages are the only ones I know of in which _no_
operational freedom is granted to the 'compiler'.  Most other
languages (including C - the 'as-if' rule) encourage as much
optimization as possible in their language definitions.  If you
object to this, you should seek to have the specifications of these
languages altered - or write your own language.  In the meantime,
compilers should be allowed - and even expected - to optimize as
aggressively as possible within the semantic definition of the
language.

In any case, the acceptability of an optimization should not be based
on a subjective judgement about the 'level' of the language.  The
semantic description of a language gives the only reliable and
(reasonably) objective standard by which to judge this.

J. Giles

cik@l.cc.purdue.edu (Herman Rubin) (10/23/90)

In article <9239:Oct2003:53:1890@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
			.........................

> . On the contrary: the compiler will always understand details of the
> machine language that are purposefully hidden from the language used by
> the programmer. The programmer and compiler optimize independently, in a
> sense, as long as the compiler doesn't try to do optimizations that it
> doesn't understand. So on a RISC machine I'll gladly have the compiler
> give me its 2x instruction scheduling benefit, no matter how much hand
> optimization time I put in.

I have no objection to the compiler helping to optimize, and even making 
suggestions to the programmer.  But I object to the first sentence; there
should be NO details of the machine language, instruction overlap, etc.,
hidden from the programmer.  The existence of a single machine instruction
can determine whether or not a block of code, or an algorithm, should even
be considered.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)	{purdue,pur-ee}!l.cc!cik(UUCP)

seanf@sco.COM (Sean Fagan) (10/24/90)

In article <25336:Oct1823:13:3990@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>> >Unsafe transformations and other optimizer bugs. As in sun% cc -O4. Not
>> >to pick on your compiler, which is otherwise rather nice.

The CDC Cyber 170 FORTRAN compiler has 3 levels of optimization:  O1, O2,
and O3.  O1 was, essentially, no optimization (there may have been a
difference between no O<num> and O1, but I don't think so); O2 was,
basically, full optimization.  The only difference between O2 and O3 was
that O3 would take advantage of delayed loads for array traversals.  For
example:

	REAL foobar(0:35)
	REAL sum
	sum = 0.
	DO 10 i=1,35,1
		sum = sum + foobar(i)
		foobar(i) = foobar(i-1)*sum
 10	CONTINUE

With O2, the loop would be compiled as you'd expect.  With O3, however, near
the end of the loop there would be a load of the next element of foobar.
Note that, in the final iteration, this load would access an element outside
foobar.  If foobar happened to be placed at the end of your memory segment,
this would cause a (fatal) runtime error.

On the other hand, on the top-of-the-line machines, this extra optimization
could spell the difference between 20 hours and 15 hours, which is quite a
bit.

I don't consider this buggy, since the documentation clearly told you what
it would do, and what to expect, and you could always avoid it if you didn't
want to take that chance.

I'm not saying the Sun compiler is like that, but it is a possibility...

-- 
-----------------+
Sean Eric Fagan  | "*Never* knock on Death's door:  ring the bell and 
seanf@sco.COM    |   run away!  Death hates that!"
uunet!sco!seanf  |     -- Dr. Mike Stratford (Matt Frewer, "Doctor, Doctor")
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

mjs@hpfcso.HP.COM (Marc Sabatella) (10/24/90)

>I have no objection to the compiler helping to optimize, and even making 
>suggestions to the programmer.  But I object to the first sentence; there
>should be NO details of the machine language, instruction overlap, etc.,
>hidden from the programmer.  The existence of a single machine instruction
>can determine whether or not a block of code, or an algorithm, should even
>be considered.

Perhaps that is why it took you and a graduate student two weeks to port a
sort routine (or whatever it was you were blathering about a few months ago).
If you want machine level, write in assembly.  And no I don't particularly care
to hear you rant about how existing assemblers don't have a flexible enough
notation.  Lobby for better assemblers; don't try to cripple HLL's.

cik@l.cc.purdue.edu (Herman Rubin) (10/25/90)

In article <65592@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:

			...................

> Assembly languages are the only ones I know of in which _no_
> operational freedom is granted to the 'compiler'.

It is possible to allow an assember to rearrange code for optimization.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)	{purdue,pur-ee}!l.cc!cik(UUCP)

meissner@osf.org (Michael Meissner) (10/26/90)

In article <2677@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin)
writes:

| In article <65592@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
| 
| 			...................
| 
| > Assembly languages are the only ones I know of in which _no_
| > operational freedom is granted to the 'compiler'.
| 
| It is possible to allow an assember to rearrange code for optimization.

And in fact on systems that use the MIPS chips (MIPSco, SGI,
DECstation), the assemblers already do this, primarily to fill delay
slots and such.  If I really need to know about what the assembler did
behind my back, I have to disassemble the object module.
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Do apple growers tell their kids money doesn't grow on bushes?

ted@nmsu.edu (Ted Dunning) (10/27/90)

In article <2301@wn1.sci.kun.nl> eerke@cs.kun.nl (Eerke Boiten) writes:

   In article <13405:Oct1800:22:5690@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
   >I do not remember ever making a mistake in hand optimization by the most
   >fundamental standard technique: adding variable x to keep track of some
   >intermediate quantity, and eliminating redundant variables given x. What
   >optimizer can do this for any but the most primitive intermediate
   >expressions? (Introducing a pointer to traverse an array, and
   >eliminating the index that it replaces, is the simplest example.)

   A useful technique, indeed (called "strength reduction" in optimising
   compilers, "finite differencing" in transformational programming).

if you compile the following code on a sun3 with -O4,

int a[100];

foo()
{
    int i;
    int sum;
    int max;

    sum = 0;
    max = a[0];
    for (i=0;i<100;i++) {
	sum += a[i];
	if (max < a[i]) max = a[i];
    }
    printf("%d %d\n", sum, max);
}
    
you get the following code:

	.text
	.globl	_foo
_foo:
|#PROLOGUE# 0
	link	a6,#-20
	moveml	#8432,sp@
|#PROLOGUE# 1
	moveq	#0,d5
	movl	_a,d4
	moveq	#0,d6
	movl	#_a,a5
L77003:
	movl	a5@,d7
	addl	d7,d5
	cmpl	d7,d4
	jge	L77005
	movl	d7,d4
L77005:
	addql	#1,d6		<--- increment i
	addqw	#4,a5		<--- increment a magic pointer
	moveq	#100,d7
	cmpl	d7,d6
	jlt	L77003
	movl	d4,sp@-
	movl	d5,sp@-
	pea	L25
	jbsr	_printf
|#PROLOGUE# 2
	moveml	a6@(-20),#8432
	unlk	a6
|#PROLOGUE# 3
	rts
--
I don't think the stories are "apocryphal".  I did it :-)  .. jthomas@nmsu.edu

conor@penguin.inmos.co.uk (Conor O'Neill) (10/29/90)

In article <MEISSNER.90Oct25130214@osf.osf.org> meissner@osf.org (Michael Meissner) writes:
>| > Assembly languages are the only ones I know of in which _no_
>| > operational freedom is granted to the 'compiler'.
>| 
>| It is possible to allow an assember to rearrange code for optimization.
>
>And in fact on systems that use the MIPS chips (MIPSco, SGI,
>DECstation), the assemblers already do this, primarily to fill delay
>slots and such.  If I really need to know about what the assembler did
>behind my back, I have to disassemble the object module.

When I was first taught any computer science, I was taught that an assembler
performed a one-to-one translation of assembly language mnemonics to
machine instructions. I like, and can understand, that definition.

I don't consider instruction scheduling, and the removal of no-ops in
the delay slots, as "a one-to-one translation".

Am I the only programmer who wishes that MIPS hadn't "corrupted" the word
"assembler", but had used some other term instead?

---
Conor O'Neill, Software Group, INMOS Ltd., UK.
UK: conor@inmos.co.uk		US: conor@inmos.com
"It's state-of-the-art" "But it doesn't work!" "That is the state-of-the-art".

poser@csli.Stanford.EDU (Bill Poser) (10/30/90)

In article <12175@ganymede.inmos.co.uk> conor@inmos.co.uk (Conor O'Neill) writes:
>When I was first taught any computer science, I was taught that an assembler
>performed a one-to-one translation of assembly language mnemonics to
>machine instructions. I like, and can understand, that definition.

>Am I the only programmer who wishes that MIPS hadn't "corrupted" the word
>"assembler", but had used some other term instead?

Fancy assemblers have been around for a long time. Consider the
VAX assembler (the DEC one, not the UNIX one), which had the pseudo-op
JBR which was translated into a JMP or BRanch instruction depending on
how far away the target ended up, and allowed a high-level specification
of the register save mask for subroutine calls, so you didn't have to
figure it out in binary. Or the MACRO assembler for DEC 20s, which
had all sorts of high-level stuff. It was practically an HLL.
 

diamond@tkou02.enet.dec.com (diamond@tkovoa) (10/30/90)

In article <8960018@hpfcso.HP.COM> mjs@hpfcso.HP.COM (Marc Sabatella) writes:
[looks like maybe Dr. Rubin's writing here]
||I have no objection to the compiler helping to optimize, and even making 
||suggestions to the programmer.  But I object to the first sentence; there
||should be NO details of the machine language, instruction overlap, etc.,
||hidden from the programmer.  The existence of a single machine instruction
||can determine whether or not a block of code, or an algorithm, should even
||be considered.
|Perhaps that is why it took you and a graduate student two weeks to port a
|sort routine (or whatever it was you were blathering about a few months ago).
|If you want machine level, write in assembly.  And no I don't particularly care
|to hear you rant about how existing assemblers don't have a flexible enough
|notation.  Lobby for better assemblers; don't try to cripple HLL's.

C was invented in order to be that better assembler.  If you want machine
level, you're supposed to program in C.  That is exactly the purpose for
which C was created.  If you want an HLL, use one; don't cripple C.

-- 
Norman Diamond, Nihon DEC    diamond@tkov50.enet.dec.com
                                    (tkou02 is scheduled for demolition)
We steer like a sports car:  I use opinions; the company uses the rack.

mjs@hpfcso.HP.COM (Marc Sabatella) (10/31/90)

>Fancy assemblers have been around for a long time. Consider the
>VAX assembler (the DEC one, not the UNIX one), which had the pseudo-op
>JBR which was translated into a JMP or BRanch instruction depending on
>how far away the target ended up, and allowed a high-level specification
>of the register save mask for subroutine calls, so you didn't have to
>figure it out in binary. Or the MACRO assembler for DEC 20s, which
>had all sorts of high-level stuff. It was practically an HLL.

None of these transformations violate the cardinal rule of assemblers, namely
that of providing a one-to-one translation.  That is, pseudo-ops, register
masks, and macros are all merely notational conveniences.  If I write a macro
and then use it, I know exactly what code will come out.  Conversely, if I
know I want a specific code sequence to come out, I can write the source
accordingly.  The MIPS "assembler", if the instruction scheduling or whatever
cannot be disabled, fails this test of assembler-ness.

Of course, I have nothing against this, and there is probably no real such
dictionary definition for "assembler".  It is counter-intuitive, though, in a
way that pseudo-ops, masks, and macros are not.

chased@rbbb.Eng.Sun.COM (David Chase) (10/31/90)

diamond@tkou02.enet.dec.com (diamond@tkovoa) writes:
>C was invented in order to be that better assembler.  If you want machine
>level, you're supposed to program in C.  That is exactly the purpose for
>which C was created.  If you want an HLL, use one; don't cripple C.

This may be true for (one dialect of) pre-ANSI C, but the
specification of ANSI C allows (if you don't say "volatile") a large
number of entertaining optimizing transformations, despite assertions
by you and other posters here that C is a "low level language" and
"low level languages ought not be aggressively optimized".  C *is* a
low level language, but the optimizations are 100% legal, often tend
to make the code go faster, and thus they are applied.  The
compiler-writer's claim is "I can prove that these optimizations
conform to the standard"; can you *prove* that they should not be
done?  How will it sell more computers/compilers to NOT do the
transformations?  (Lest you forget, dollars and cents is what it all
comes to in the end.)

Two things to note:

1) several optimizations could be applied much more frequently to C if
it weren't for possible aliasing in procedure pointer/array
parameters.  If you don't understand what I'm talking about, go read a
book on dependence analysis -- Wolfe's _Optimizing Supercompilers ..._
will give you a feel for what can be done to Fortran.

2) it's senseless to argue with the optimizer authors; they'll do
everything that time, the standard, and the customers permit.  If ANSI
has changed Your Favorite Language and you had a substantial
investment in software that was harmed by this, take it up with ANSI.
You might look into the history of ANSI Cobol.

David Chase
Sun Microsystems, Inc.

new@ee.udel.edu (Darren New) (10/31/90)

In article <16101@csli.Stanford.EDU> poser@csli.stanford.edu (Bill Poser) writes:
>Fancy assemblers have been around for a long time. Consider [...]
>[...] It was practically an HLL.

The "Meta" assembler on Sigma-9's under CP/V was also quite powerful.
My friend wrote a LISP compiler in assembler macros. You could do
something like

       LISP,12	(DEFUN FLUB (X) (cond (null x) R3 (t) (x R3)))
to put the definition of FLUB into register 12.
It was pretty neat, but it took forever to compile.  -- Darren
-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, 
      Formal Description Techniques (esp. Estelle), Coffee -----

grier@marx.enet.dec.com (Michael J. Grier) (10/31/90)

In article <16101@csli.Stanford.EDU>, poser@csli.Stanford.EDU (Bill Poser)
writes:

|> Fancy assemblers have been around for a long time. Consider the
|> VAX assembler (the DEC one, not the UNIX one), which had the pseudo-op
|> JBR which was translated into a JMP or BRanch instruction depending on
|> how far away the target ended up, and allowed a high-level specification
|> of the register save mask for subroutine calls, so you didn't have to
|> figure it out in binary. Or the MACRO assembler for DEC 20s, which
|> had all sorts of high-level stuff. It was practically an HLL.
|>  
|> 

   Ummm...  I don't know what assembler you're using, but on my definitive
VAX VMS macro assembler...

$ create xyz.mar
        .entry main, ^M<>

        jbr     a
a:      ret
        .end main
(^Z here)
$ macro xyz.mar
                                     0002     3         jbr     a
%MACRO-E-UNRECSTMT, Unrecognized statement             !

There were 1 error, 0 warnings and 0 information messages, on lines:
    3 (1)

MACRO XYZ.MAR

   I've only seen this pseudo-operation in Unix/Ultrix VAX assemblers, never
in the real VAX macro assembler.

   I think that these optimizing assemblers should not be called assemblers
either.  After all, they're not the 1-1 translators that assemblers are
supposed to be.  (True, I haven't seen that concrete a definition anywhere,
and it is useful to have some common instruction scheduling and
peephole optimizing for a platform done in a common place, but then it's
not an assembler, IMO.)

------------------------------------------------------------------------------
Michael J. Grier				grier@leno.dec.com
Software Engineer
Digital Equipment Corporation			but I don't speak for 'em!

thornley@cs.umn.edu (David H. Thornley) (10/31/90)

In article <16101@csli.Stanford.EDU> poser@csli.stanford.edu (Bill Poser) writes:
>In article <12175@ganymede.inmos.co.uk> conor@inmos.co.uk (Conor O'Neill) writes:
>>When I was first taught any computer science, I was taught that an assembler
>>performed a one-to-one translation of assembly language mnemonics to
>>machine instructions. I like, and can understand, that definition.
>
>>Am I the only programmer who wishes that MIPS hadn't "corrupted" the word
>>"assembler", but had used some other term instead?
>
>[VAX and DEC 20 references omitted]

If my memory serves, the IBM 650 had a Symbolic Optimal
Assembly Program (yep, they used the acronym) that rearranged
the machine-language instructions into a good order so that
they could be executed faster on the drum memory.
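The trick can be sketched concretely.  On a drum machine each
instruction named the address of its successor, and the assembler
picked that address so the drum had rotated just past the current
instruction's execution time.  A minimal sketch in C (the 50-word
band size and the one-word-time fetch cost are assumptions for
illustration, not the 650's documented timings):

```c
#include <assert.h>

#define WORDS_PER_BAND 50   /* assumed drum band size, for illustration */

/* Optimal address for the next instruction: while the instruction
   fetched from `pos` executes for `exec_time` word-times, the drum
   keeps turning, so the successor should sit at the position that
   arrives under the head just as execution finishes. */
int best_successor(int pos, int exec_time)
{
    return (pos + 1 + exec_time) % WORDS_PER_BAND;
}
```

A naive assembler placing instructions consecutively would instead
wait nearly a full revolution on every fetch.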

The concept of an assembler messing with the translation is
a lot older than some of the people reading this.

(Disclaimer:  I'm not sure it was an IBM 650, but it was on an
IBM with drum main memory, which puts it back in the '50s.
For an education in how people worked with such drum
memories, go to alt.folklore.computers and ask about Mel. :-)

DHT

wallach@motcid.UUCP (Cliff H. Wallach) (10/31/90)

In article <TED.90Oct26124614@kythera.nmsu.edu> ted@nmsu.edu (Ted Dunning) writes:

  <text deleted>

-if you compile the following code on a sun3 with -O4,
-
-int a[100];
-
-foo()
-{
-    int i;
-    int sum;
-    int max;
-
-    sum = 0;
-    max = a[0];
-    for (i=0;i<100;i++) {
-	sum += a[i];
-	if (max < a[i]) max = a[i];
-    }
-    printf("%d %d\n", sum, max);
-}
-    
-you get the following code:
-
-	.text
-	.globl	_foo
-_foo:
-|#PROLOGUE# 0
-	link	a6,#-20
-	moveml	#8432,sp@
-|#PROLOGUE# 1
-	moveq	#0,d5
-	movl	_a,d4
-	moveq	#0,d6
-	movl	#_a,a5

	movl d4,a5  is more efficient


-L77003:
-	movl	a5@,d7
-	addl	d7,d5
-	cmpl	d7,d4
-	jge	L77005
-	movl	d7,d4
-L77005:
-	addql	#1,d6		<--- increment i
-	addqw	#4,a5		<--- increment a magic pointer
-	moveq	#100,d7
-	cmpl	d7,d6
-	jlt	L77003
-	movl	d4,sp@-
-	movl	d5,sp@-
-	pea	L25
-	jbsr	_printf
-|#PROLOGUE# 2
-	moveml	a6@(-20),#8432
-	unlk	a6
-|#PROLOGUE# 3
-	rts
---
-I don't think the stories are "apocryphal".  I did it :-)  .. jthomas@nmsu.edu



Compiler-generated code is still far from hand-assembled code.  While
I have much more experience with Intel than 68k, I can see several simple
improvements.  I have not assembled this code.

	<some random prologue>

	moveq	#0,d1			; sum = 0
	lea	_a,a0			; a0 = &a[0]
	lea	400(a0),a1		; a1 = &a[100] (100 longs = 400 bytes)

	move.l	(a0),d2			; max = a[0]

l100:
	cmp.l	a1,a0			; (a dbra may be faster)
	bge	l101

	move.l	(a0)+,d0		; load a[i++]
	add.l	d0,d1			; sum += a[i]
	cmp.l	d2,d0			; if (max < a[i])
	blt	l100
	move.l	d0,d2			;   max = a[i];
	bra	l100

l101:
	<jsr printf ...>
	<some random epilogue>


Can a compiler convert an array reference to a pointer reference?
Can it restructure code to minimize jumps?
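To the first question: yes, and the Sun output above shows it being
done.  The same transformation is expressible at the source level.
A sketch of the two equivalent formulations (my own illustration,
not the Sun compiler's actual rewrite):

```c
#include <assert.h>

#define N 100
static int a[N];

/* The form the programmer wrote: each access computes a + i*4. */
int sum_indexed(void)
{
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += a[i];
    return sum;
}

/* The strength-reduced form the compiler derives: the index
   multiply becomes a pointer bumped by 4 each iteration -- the
   "magic pointer" kept in a5 in the generated code above. */
int sum_pointer(void)
{
    int sum = 0;
    for (const int *p = a, *end = a + N; p < end; p++)
        sum += *p;
    return sum;
}
```

Any compiler that performs induction-variable strength reduction
does exactly this conversion internally.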

strom@arnor.uucp (11/01/90)

In article <1990Oct30.221510.4392@cs.umn.edu>, thornley@cs.umn.edu (David H. Thornley) writes:
|> In article <16101@csli.Stanford.EDU> poser@csli.stanford.edu (Bill Poser) writes:
|> > ...
|> If my memory serves, the IBM 650 had a Symbolic Optimizing
|> Assembly Program (yep, they used the acronym) that rearranged
|> the machine-language instructions to a good order so that
|> they can be executed faster on the drum memory.
|> 
|> The concept that an assembler can mess with the translation is
|> a lot older than some of the people reading it.
|> 
|> (Disclaimer:  I'm not sure it was an IBM 650, but it was on an
|> IBM with drum main memory, which puts it back in the '50s.
|> For an education in how people worked with such drum
|> memories, go to alt.folklore.computers and ask about Mel. :-)
|> 
|> DHT

Yes, it was a 650.  

And there were arguments between advocates of ``high-level'' assembly
programming and advocates of machine language with hand optimization.
Plus ça change, ... :-)
-- 
Rob Strom, strom@ibm.com, (914) 784-7641
IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY  10958

poser@csli.Stanford.EDU (Bill Poser) (11/01/90)

In article <1990Oct30.165712@marx.enet.dec.com> grier@marx.enet.dec.com
(Michael J. Grier) points out that it is the UNIX VAX assembler that has
the JBR pseudo-op. Sorry, memory fault. It is the VMS assembler that has
the register save mask stuff.

diamond@tkou02.enet.dec.com (diamond@tkovoa) (11/01/90)

In article <1849@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
>diamond@tkou02.enet.dec.com (diamond@tkovoa) writes:
>>C was invented in order to be that better assembler.  If you want machine
>>level, you're supposed to program in C.  That is exactly the purpose for
>>which C was created.  If you want an HLL, use one; don't cripple C.
>This may be true for (one dialect of) pre-ANSI C,
For the original and several revisions; for its entire raison d'être.

>specification of ANSI C allows (if you don't say "volatile") a large
>number of entertaining optimizing transformations,
Many of these don't hurt.  Some of them do.

>despite assertions
>by you and other posters here that C is a "low level language"
And its inventors and its original users.

>The compiler-writer's claim is "I can prove that these optimizations
>conform to the standard"; can you *prove* that they should not be done?
Proofs are demonstrated by example very frequently on Usenet.

>How will it sell more computers/compilers to NOT do the transformations?
Oh THAT, obviously it won't do.  That was not the issue until now.

>1) several optimizations could be applied much more frequently to C if
>it weren't for possible aliasing in procedure pointer/array parameters.
I think this is getting mixed up with a separate thread but anyway...
>book on dependence analysis -- Wolfe's _Optimizing Supercompilers ..._
Is this a revision of his famous out-of-print book?  I'd like to take
a look at one.  [end of digression]

>2) it's senseless to argue with the optimizer authors; they'll do
>everything that time, the standard, and the customers permit.
Partial agreement here.

>If ANSI has changed Your Favorite Language and you had a substantial
>investment in software that was harmed by this, take it up with ANSI.
Yuk.  C is far from my favorite language.  But here is the problem
that ANSI has caused:  The language which was invented for the purpose
of doing machine-dependent hacks with a moderately portable syntax is
no longer suitable for that purpose, because implementors are encouraged
to take advantage of the standard's prohibition and non-definition of
various features that would be useful for machine-dependent hacks.
It is now necessary to invent a new language for machine-dependent
hacks, in order to do what C was supposed to do, or else -- as was
stated by the poster that I previously followed up to -- return to
the days of assembly language programming instead of using a moderately
portable syntax.  C used to beat assembly in many cases.  Now C is
no longer in the running.  What a shame.
-- 
Norman Diamond, Nihon DEC    diamond@tkov50.enet.dec.com
                                    (tkou02 is scheduled for demolition)
We steer like a sports car:  I use opinions; the company uses the rack.

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/01/90)

In article <8960020@hpfcso.HP.COM>, mjs@hpfcso.HP.COM (Marc Sabatella) writes:
> None of these transformations violate the cardinal rule of assemblers,
> namely that of providing a one-to-one translation.

At which ecumenical council was this cardinal rule adopted?

It's worth pointing out that the optimisations done by the MIPS
assembler _can_ be disabled.

Surely the cardinal rule for an assembler is not "one-to-one translation"
but "full access to all machine operations".  If you want a particular
sequence of bits in an executable code area, you should be able to get
them.  The MIPS assembler lets you do that.  Why is it evil to offer
optimisation as well?

(Now the assembler _I_ would gripe about is the one where I had to
write a lot of indirect branches as ".value MagicNumber" because the
assembler didn't understand "jp *%ax", although the disassembler,
debugger, hardware, and manufacturer's manual _did_ support it.)

-- 
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.

tom@stl.stc.co.uk (Tom Thomson) (11/02/90)

In article <8960020@hpfcso.HP.COM> mjs@hpfcso.HP.COM (Marc Sabatella) writes:
>None of these transformations violate the cardinal rule of assemblers, namely
>that of providing a one-to-one translation.  That is, pseudo-ops, registers
One-to-one translation?  Back in the '60s we used to talk about
the Fortran assembler; even earlier we had machines like Deuce,
where the assembler would fix the placement of instructions for
you to minimise drum latencies, which was maybe still one-to-one
but very much against the spirit of your proposed restriction on
assemblers.
>accordingly.  The MIPS "assembler", if the instruction scheduling or whatever
>cannot be disabled, fails this test of assembler-ness.
>
but behaves just like 1950s assemblers!

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/02/90)

On 30 Oct 90 21:22:38 GMT, chased@rbbb.Eng.Sun.COM (David Chase) said:

chased> diamond@tkou02.enet.dec.com (diamond@tkovoa) writes:

diamond> C was invented in order to be that better assembler.  If you
diamond> want machine level, you're supposed to program in C.  That is
diamond> exactly the purpose for which C was created.  If you want an
diamond> HLL, use one; don't cripple C.

chased> This may be true for (one dialect of) pre-ANSI C, but the
chased> specification of ANSI C allows (if you don't say "volatile") a
chased> large number of entertaining optimizing transformations, despite
chased> assertions by you and other posters here that C is a "low level
chased> language" and "low level languages ought not be aggressively
chased> optimized".

Haha. Never said that low level *languages* should not be optimized. We
are saying that *programs* in low level languages (or indeed in any
language) should not be optimized.

I have decided to stick my neck out again and try to define better what
aggressive optimization is.

When you have a language it has a set of primitive operators (+,*,...)
and of primitive operator combinators (if, ;, procedures, ...). The
language definition prescribes the effect of the primitive operators and
an algebra for such effects defined by the combinators.

In one way or another this means that each language implies a virtual
machine. The task of the code generator is to map the virtual machine
onto the real machine as well as it can, which means faithfully
(semantics) and efficiently (pragmatics).

Optimization is doing the mapping from the virtual machine defined by
the language to the real machine in a clever way (i.e. optimal for the
target real machine) instead of a straightforward one.

A program in some language, at any level of abstraction, defines a
higher-level virtual machine than that offered by the language. The
primitives of the virtual machine defined by a program (most obviously,
but only, its procedures and macros) are actually composed out of the
primitives and combinators of the language, and their meaning is defined
according to the algebra for these as given in the language definition.

Aggressive optimization is mapping from the virtual machine defined by
the program (not by the language!) to the real machine *in the code
generator*.

This is in principle possible, because the code generator can indeed
recognize combinations of primitive operators and rewrite them into
other combinations that supposedly map better onto the target real
machine, as long as the two are equivalent under the algebra of the
language definition, in that particular context.
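For concreteness, here is the kind of rewrite at issue, shown as two
C functions rather than as compiler internals (my own illustrative
example, not one from the thread): the code generator replaces the
programmer's combination of primitives with a different combination
that is equivalent under the language's algebra.

```c
#include <assert.h>

/* The combination of primitives the programmer wrote. */
long sum_below(long n)
{
    long s = 0;
    for (long i = 0; i < n; i++)
        s += i;
    return s;
}

/* An equivalent combination an aggressive code generator might
   substitute: no loop at all.  The equivalence holds under the
   language's algebra only when n*(n-1) does not overflow. */
long sum_below_rewritten(long n)
{
    return n > 0 ? n * (n - 1) / 2 : 0;
}
```

Whether such a substitution is desirable is exactly what is being
argued below.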

The problems are:

Is this desirable?

    No, because the rewriting algebra of the language may be poorly
    defined, or poorly understood, and thus it may be difficult to rely
    on the ability of the code generator to rewrite combinations of
    language primitives in semantics preserving ways. This is
    particularly true for languages like C, where a clean algebra for
    rewriting was not a design goal. ANSI C has tried to do some poor
    fixup engineering on this. We are not amused.

    No, because even assuming that the underlying algebra is clean,
    rewriting combinations of primitives is a conceptually far more
    difficult task than mapping each of them to the real machine. In
    particular it requires careful analysis. Such analysis tends to
    result in more bugs in the compiler, and the compiler is the
    bottleneck of an entire system's reliability.

    No, because even if two combinations of primitives can actually be
    shown in the given context to be equivalent according to the
    language's algebra, it ought to be assumed that the programmer
    intended to write the particular one he chose to write for some
    good reason.

    No, because the speed advantages are not that dramatic (or are
    nonexistent), *if* the author is a clever programmer who knows
    which of the equivalent formulations will be optimal on the given
    machine.

Are there alternatives?

    Yes, rewriting the combinations of primitives not in the code
    generator, but at the source code level, either manually or
    automatically.

    Yes, using a language whose primitives are at the abstraction level
    desired, i.e. the primitives offered by the language are those
    needed for the application. In this case the optimization is not
    aggressive, because the generator is only doing optimal
    implementation of hopefully well defined language primitives, not
    supposedly optimal rewriting of pieces of a user program.

Are the alternatives cost effective?

    Yes, because for *many* even if not all codes the time critical
    portion is small enough that careful automatic rewriting is
    feasible.

    Yes, because automatic rewriting can be to a higher level language,
    which can then be more easily and non-aggressively optimized, and
    automatic rewriting gives long lasting benefits in portability and
    confidence in the program.

    Yes, because using a language whose primitives are at the level of
    the problem is known to reduce costs and improve quality via reuse
    of the language's implementation or libraries, where otherwise the
    implementation of those primitives would have to be rewritten and
    re-aggressively optimized for every application.

    Yes, because it separates the compiler from the code rewriter,
    which are two modules with very different goals and very different
    reliability profiles.

chased> C *is* a low level language, but the optimizations are 100%
chased> legal, often tend to make the code go faster, and thus they are
chased> applied.

They are legal but dangerous and not cost effective, according to the
above discussion.



chased> The compiler-writer's claim is "I can prove that these
chased> optimizations conform to the standard";


STOP THE PRESSES! Compiler writer proves that he can do program
rewriting in a bug-free way! Program synthesizers must be round the
corner! Major software companies' stocks collapse!

chased> can you *prove* that they should not be done?  How will it sell
chased> more computers/compilers to NOT do the transformations?  (Lest
chased> you forget, dollars and cents is what it all comes to in the
chased> end.)

The argument is that it is not *cost effective* for *users* of compilers
to pay for program rewriting in the code generator. They may require a
little education to acknowledge this, just like smokers. In the mean
time the tobacco industry has every incentive to educate the customers
to the contrary...

--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

jlg@lanl.gov (Jim Giles) (11/02/90)

From article <PCG.90Nov1171210@odin.cs.aber.ac.uk>, by pcg@cs.aber.ac.uk (Piercarlo Grandi):
> [...]
> In a way or another this implies that each language implies a virtual
> machine. The task of the code generator is to map the virtual machine
> onto the real machine as well as it can, which means faithfully
> (semantics) and efficiently (pragmatics).

Unfortunately for your argument, there are few languages which have
definitions of this kind.  The definition of a language usually
implies a _class_ of virtual machines.  The code generator is allowed
to map any of these onto the real hardware.  Any optimizer which
generates code which faithfully implements anything within the class
of acceptable behaviours is correct.  If an optimizer leaves the
class of acceptable behaviours, it is _broken_.

You are trying to limit the compiler to code transformations that
_you_ regard as straightforward.  In truth, the compiler is only
limited by the language definition.  If you don't like some of the
optimizations allowed by the language definition, then lobby for
the definition to be changed.

J. Giles

chased@rbbb.Eng.Sun.COM (David Chase) (11/02/90)

pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>chased> The compiler-writer's claim is "I can prove that these
>chased> optimizations conform to the standard";

>STOP THE PRESSES! Compiler writer proves that he can do program
>rewriting in a bug-free way! Program synthesizers must be round the
>corner! Major software companies' stocks collapse!

Read your own posting, please, and note that what you quote did not
say what you say it said.  "I claim I can prove" is most definitely
not the same as "I have proved".  It means that the steps in
optimization are based on little proofs and sketches of proofs; it
doesn't mean that there is a big book proving the compiler correct.
Such a book *could* be written if the need existed (the existence of
dire need would be demonstrated by the exhibition of wheelbarrows full
of money to finance its writing).

Note, too, that I never claimed that the optimizations performed,
however aggressive, are anywhere near general-purpose program
rewriting or program synthesis.  I have claimed, and will continue to
claim, that the transformations performed are based on a carefully
worked-out understanding of certain program transformations.

Regarding the virtual-machine-to-physical-machine mapping, if the
standard only specifies that we must reproduce the I/O and approximate
resource consumption of the virtual machine, then that's what we'll do
if it makes the code faster.  Where does it say that we should not do
this?

And, regarding the tobacco industry analogy (an ad hominem analogy,
too -- aren't vile insults what one uses when arguing from a weak
position?), I will suggest that you have no evidence, only anecdotes,
for the alleged pervasive bugginess of optimizing compilers.
Certainly you have nothing like the evidence gathered to demonstrate a
connection between tobacco and lung disease, and only hand-waving
claims as to the cause-and-effect link.  I *am* selling my skills, but
I fail to see that there is anything pernicious about that.

I think it is also true that new compilers will be both more
aggressive and less buggy; if nothing else, the test suites only grow
larger and more comprehensive, and faster, cheaper machines allow more
tests to be run in a given amount of time, and wished-for and formerly
ad-hoc optimizations are being backed up by better theory.  The need
for better compilers will also increase with time as clock rates and
potential parallelism increase (both of these things pound harder on
memory, and increase the gains of good register allocation and
locality-enhancing transformations).

It is also unclear that your claims about the speed/risk trade-offs
are in any way connected with reality.  Speed gained for you by an
optimizing compiler is speed that you don't have to work for by
hand-coding (though both can be used together to get even more
speed-up if that is necessary), and the time saved by the programmer
is more time that can be spent debugging and testing the code.  Every
programmer I know has a bug introduction rate higher than any compiler
that I've ever used (but this is purely anecdotal -- perhaps it is
completely different for other people, eh?  do *you* find more
compiler-introduced bugs than author-introduced bugs in *your* code?),
so it is in fact more reliable to let the compiler do the optimizing,
even if you don't use the time saved for additional testing and
debugging.

Now, I will grant one important thing, which I think is also important
to you -- if I had garbage collection, I would be a more productive
programmer.  Garbage collection is available for "C", but it is
potentially broken by optimizing compilers for C, which means that one
powerful productivity and bug-avoiding aid is unavailable if you
optimize.  However, no specification for C anywhere in existence
guarantees that a garbage collector should be able to function, and
most programmers do not take advantage of garbage collectors, and some
programmers write (perfectly legal) code that is guaranteed to break a
collector at any level of optimization.  It pains me that this is the
case, but it is so, and if it increases the SPECmark to break a
non-standard piece of code, even a useful one, that piece of code will
be broken.  There's always Lisp, ML, Eiffel, Russell, Modula-3, and
Miranda.
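The "perfectly legal code that breaks a collector" deserves an
example.  A conservative collector keeps a block alive only if some
reachable word holds its address, and legal C can hide the only copy,
e.g. behind an XOR mask.  This is a sketch of the general trick, not
code from any particular collector; the uintptr_t round-trip is
well-defined, though the masked value is meaningful only within one
execution.

```c
#include <stdint.h>

#define MASK ((uintptr_t)0x5555555555555555u)

/* Disguise a pointer so no word in memory holds its bit pattern;
   a conservative collector scanning now would see no reference and
   could reclaim the block, yet reveal() recovers a usable pointer. */
uintptr_t hide(void *p)       { return (uintptr_t)p ^ MASK; }
void     *reveal(uintptr_t h) { return (void *)(h ^ MASK); }
```

An optimizer can create the same hazard without the programmer's
help, by keeping only a derived interior address in a register.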

I've said several of the things that I said in this article before --
you might seriously consider comparing the bug-introduction rate of
programmers, both skilled and ordinary, with the bug-introduction rate
of several "aggressively optimizing" compilers.  You might also check
to see if there are any trends in compiler bug rates over time, and
see how that correlates with the optimizations performed by those
compilers.  I think you'll find that your claims are not supported.

David Chase
Sun Microsystems, Inc.

barmar@think.com (Barry Margolin) (11/02/90)

In article <PCG.90Nov1171210@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>A program in some language, at any level of abstraction, defines an
>higher level virtual machine than that offered by the language. The
>primitives of the virtual machine defined by a program (most obviously,
>but only, its procedures and macros) are actually composed out of the
 ^^^^^^^^
Should this have been "but not only"?

>primitives and combinators of the language, and their meaning is defined
>according to the algebra for these as given in the language definition.
>
>Aggressive optimization is mapping from the virtual machine defined by
>the program (not by the language!) to the real machine *in the code
>generator*.

So aggressive optimization refers to optimization that does things like
interprocedural flow analysis?  Are loop unrolling, strength reduction, and
common subexpression elimination aggressive or not?

>This is in way of principle possible, because the code generator can
>indeed recognize combinations of primitive operators and rewrite them to
>other combinations that supposedly map better onto the target real
>machine, as long as the two are equivalent under the algebra of the
>language definition, in that particular context.

Code generators, and certainly unaggressive optimizers, are almost always
required to "recognize combinations of primitive operators".  The only
optimizers that are independent of the primitive operators are peephole
optimizers.  For instance, in a language with a "+" operator that accepts
arguments of either float or integer, and with conversion operators
float(x) and integer(x), the compiler would be expected to emit different
code for float(x)+float(y) than for integer(x)+integer(y).  To do this it
must recognize the combination of the conversion operators with the "+"
operator.
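That recognition step can be made concrete with a toy emitter (the
opcode mnemonics are invented for illustration): the generator must
inspect the conversion operators wrapped around "+" before it can
choose an instruction.

```c
#include <string.h>

typedef enum { T_INT, T_FLOAT } Type;

/* Pick the machine add for x + y from the operand types produced
   by the conversion operators float(x) / integer(x). */
const char *emit_add(Type x, Type y)
{
    return (x == T_FLOAT || y == T_FLOAT) ? "fadd" : "iadd";
}
```

Even this trivial emitter is already "recognizing a combination of
primitive operators" rather than translating each one in isolation.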

To some extent a compiler is simply an optimizer for an interpreter.  The
dividing lines between code generation, ordinary optimization, and
aggressive optimization are very fuzzy.

>The problems are:
>
>Is this desirable?
>
>    No, because the rewriting algebra of the language may be poorly
>    defined, or poorly understood, and thus it may be difficult to rely
>    on the ability of the code generator to rewrite combinations of
>    language primitives in semantics preserving ways. This is
>    particularly true for languages like C, where a clean algebra for
>    rewriting was not a design goal. ANSI C has tried to do some poor
>    fixup engineering on this. We are not amused.

Where does this "rewriting algebra" come from, and what makes it poorly
defined?  I don't think the ANSI C standard includes a rewriting algebra; I
presume it's a function of the language's semantics.  So, the rewriting
algebra could conceivably be as well defined as the language itself.  A
poorly defined rewriting algebra could be the result of a poorly defined
language, or because the designer of the rewriting algebra did a poor job.

>    No, because even assuming that the underlying algebra is clean,
>    rewriting combinations of primitives is a conceptually far more
>    difficult task than mapping each of them to the real machine. In
>    particular it requires careful analysis. Such analysis tends to
>    result in more bugs in the compiler, and the compiler is the
>    bottleneck of an entire system's reliability.

Many of the things we do with computers were once thought to be too complex
to program, but that doesn't mean we shouldn't have tried to implement
them.  AI is an example of a computer task that we still consider
conceptually difficult, but we keep plodding on trying to solve it.
Therefore, my interpretation of the above is that you think aggressive
optimization should still be considered a research project, not something
that should be expected in commercial products.

>    No, because even if two combinations of primitives can actually be
>    shown in the given context to be equivalent according to the
>    language's algebra, it ought to be assumed that the programmer
>    intended to write the particular one he chose to write for some
>    good reason.

Every good treatise on writing efficient programs generally starts with
something to the effect that one should write programs clearly first, and
then go back and optimize the bottlenecks.  So, the programmer is likely to
have chosen a combination of primitives that is closer to his algorithm
than to the real machine.  The programmer may not be aware of the physical
machine primitives that the language primitives map onto, or may believe he
knows but be mistaken (a particular compiler's mapping between virtual and
physical machines is not generally documented, except perhaps in the area
of data formats), so his "good reason" may actually be a "bad reason".
Also, many programs are targeted for multiple (or unspecified) physical
machines and/or compilers, so the programmer may not be able to choose
language primitives that are good for all the target environments.

>    No, because the speed advantages are not that dramatic (or are
>    nonexistent), *if* the author is a clever programmer who knows
>    which of the equivalent formulations will be optimal on the given
>    machine.

See above -- the program may not be targeted to a specific "given
machine".

>Are there alternatives?
>
>    Yes, rewriting the combinations of primitives not in the code
>    generator, but at the source code level, either manually or
>    automatically.

Are you saying that this should be done in a module that is independent of
the target machine and compiler?  I was under the impression that most
complex optimization was already being done using source-to-source
translation, although sometimes it is necessary for the target source to
include nonstandard, implementation-dependent primitives that map more
directly onto the target hardware.  For instance, strength reduction and
common subexpression elimination translate the source into source with an
additional temporary variable.
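That temporary-variable point can be shown as an actual
source-to-source rewrite (my own example): common subexpression
elimination turns the first form into the second, and both remain
ordinary legal source.

```c
/* Before: the subexpression i * 4 is written (and, naively
   compiled, computed) twice. */
int lookup_before(const int *b, const int *c, int i)
{
    return b[i * 4] + c[i * 4];
}

/* After: the rewrite introduces a temporary holding the common
   subexpression, exactly the transformation described above. */
int lookup_after(const int *b, const int *c, int i)
{
    int t = i * 4;
    return b[t] + c[t];
}
```

The rewritten form is still portable source; only the choice of
*which* subexpressions to pull out is machine- and compiler-specific.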

Also, I don't think it matters what the destination of the rewrite is.  I
think the hard part is recognizing large combinations of primitives and
reasoning about them.  Transformation bugs usually result from unexpected
interactions between seemingly unrelated portions of the code (e.g. due to
use of global variables or aliasing).  This is one reason why much of the
program transformation research has been done with functional languages:
the absence of side effects makes semantic analysis much easier.

>    Yes, using a language whose primitives are at the abstraction level
>    desired, i.e. the primitives offered by the language are those
>    needed for the application. In this case the optimization is not
>    aggressive, because the generator is only doing optimal
>    implementation of hopefully well defined language primitives, not
>    supposedly optimal rewriting of pieces of a user program.

Unfortunately, the computer industry doesn't abound with
application-specific languages.  And even if there were, finding
programmers proficient in many of them would be hard.  [Enter biased mode]
It's hard enough finding people proficient in the better general-purpose
languages; most schools just churn out Pascal and C programmers.  [Exit
biased mode]

>Are the alternatives cost effective?
>
>    Yes, because for *many* even if not all codes the time critical
>    portion is small enough that careful automatic rewriting is
>    feasible.

How does the automatic rewriter know which parts are time critical?  In
many cases, program optimization is an incremental process: you find the
bottleneck, rewrite it efficiently, find the new bottleneck, rewrite it,
etc., until the overall efficiency meets the goals.

>    Yes, because automatic rewriting can be to a higher level language,
>    which can then be more easily and non-aggressively optimized, and
>    automatic rewriting gives long lasting benefits in portability and
>    confidence in the program.

Ah, now I see that you're talking about *replacing* the source file with
this automatically rewritten source file.  I thought you were talking about
source-level transformations internal to the compiler (usually operating on
the parse-tree, not the actual source text).  Since the purpose of the
transformation is to produce code that will map better by a specific
compiler onto a particular hardware, I don't see the portability benefit of
saving the transformed code.

>    Yes, because using a language whose primitives are at the level of
>    the problem is known to reduce costs and improve quality via reuse
>    of the language's implementation or libraries, where otherwise the
>    implementation of those primitives would have to be rewritten and
>    re-aggressively optimized for every application.

This is true if you can find such a language and programmers who know it.
If the application is not very common then you will probably have to spend
quite a bit to purchase or develop the language implementation.  Subroutine
libraries for general purpose languages can often do nearly as well and are
much easier to come by.

>They are legal but dangerous and not cost effective, according to the
>above discussion.

You made the statement that these optimizations don't buy much, but didn't
back it up with any evidence.

>chased> The compiler-writer's claim is "I can prove that these
>chased> optimizations conform to the standard";
>
>STOP THE PRESSES! Compiler writer proves that he can do program
>rewriting in a bug free way! Program synthesizers must be around the
>corner! Major software companies' stocks collapse!

Very little can be *proven* about any program; there are just too many
variables and dependencies.  A compiler writer who makes such a statement
is merely asserting that he's as confident about his optimizations as he is
about the rest of his compiler.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

seanf@sco.COM (Sean Fagan) (11/02/90)

In article <8960020@hpfcso.HP.COM> mjs@hpfcso.HP.COM (Marc Sabatella) writes:
>None of these transformations violate the cardinal rule of assemblers, namely
>that of providing a one-to-one translation.  

The MIPS assembler does, I believe, provide a one-to-one translation.
However, it may also stick in NOP's, as necessary.

Remember that the original MIPS chip did not have "interlocks":  if you used
a register before the previous instruction was done with it, you would get
wrong results.  The NOP's were necessary at that point.

Consider a machine like the CDC 170-state architecture (any comp.arch
readers who knew I would bring it up somehow? 8-)).  It has/had 60-bit
words, with 15 and 30 bit instructions.  The assembler would pack 1-4
instructions per word; it provided padding (nop's, essentially) when the
next instruction wouldn't fit (e.g., 15 / 30 / 30 won't fit in one word), or
when the instruction was preceded directly by a label (since you couldn't
jump into the middle of a word).

That was about as much rewriting as the MIPS assembler (albeit on a
smaller scale 8-)), yet it was still an assembler.

-- 
-----------------+
Sean Eric Fagan  | "Quoth the raven,"
seanf@sco.COM    | "Eat my shorts!"
uunet!sco!seanf  |     -- Lisa and Bart Simpson
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/03/90)

On 2 Nov 90 00:13:15 GMT, barmar@think.com (Barry Margolin) said:

barmar> In article <PCG.90Nov1171210@odin.cs.aber.ac.uk>
barmar> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

pcg> A program in some language, at any level of abstraction, defines an
pcg> higher level virtual machine than that offered by the language. The
pcg> primitives of the virtual machine defined by a program (most obviously,
pcg> but only, its procedures and macros) are actually composed out of the
     ^^^^^^^^
barmar> Should this have been "but not only"?

Yes, a Freudian slip (was thinking of Lisp...).

pcg> Aggressive optimization is mapping from the virtual machine defined by
pcg> the program (not by the language!) to the real machine *in the code
pcg> generator*.

barmar> So aggressive optimization refers to optimization that does
barmar> things like interprocedural flow analysis?  Are loop unrolling,
barmar> strength reduction, and common subexpression elimination
barmar> aggressive or not?

They are all aggressive. Even substituting a shift for a multiplication
or division by a power of two may be considered aggressive in certain
languages where the properties of the two are not precisely identical.
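
pcg's example is worth spelling out. In C, the "obvious" strength
reduction of division by a power of two into a right shift changes the
answer for negative operands: division truncates toward zero (C89 left
the rounding direction implementation-defined; C99 mandates truncation),
while an arithmetic right shift on a two's-complement machine rounds
toward negative infinity (and right-shifting a negative value is itself
implementation-defined). A minimal sketch, with illustrative names:

```c
/* Divide by two the "obvious" two ways.  These agree only for
   non-negative operands. */
int div2(int x)
{
    return x / 2;    /* truncates toward zero: div2(-7) is -3 */
}

int shift2(int x)
{
    return x >> 1;   /* on typical two's-complement machines,
                        rounds toward negative infinity: shift2(-7) is -4 */
}
```

So the rewrite is safe only when the operand is provably non-negative,
which is exactly the kind of fact an aggressive optimizer must establish.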

barmar> Code generators, and certainly unagressive optimizers, are
barmar> almost always required to "recognize combinations of primitive
barmar> operators".

Ah yes, but not to rewrite them. And the recognition is typically simple
and well defined. It does not usually imply trying to understand the
effect of what is recognized. Languages where code generation itself (or
even parsing) is strongly context dependent make for dangerous
compilers (not to mention for hard to understand programs).

barmar> The only optimizers that are independent of the primitive
barmar> operators are peephole optimizers.

Yep, yep. Long live peephole optimizers!

barmar> To some extent a compiler is simply an optimizer for an
barmar> interpreter.  The dividing lines between code generation,
barmar> ordinary optimization, and aggressive optimization are very
barmar> fuzzy.

They are nearly the same thing, just as iteration can be said to be a
special case of recursion.  But, as in that example, their conceptual
difficulty levels are far apart.  Code generation deals with
equivalences between a virtual and a real machine; ordinary optimization
with machine dependent transformations (already hazardous enough,
because the rewriting algebras of many real machines are underspecified
and contrived); and aggressive optimization with rewriting the
programmer's code.

barmar> Where does this "rewriting algebra" come from, and what makes it
barmar> poorly defined?  I don't think the ANSI C standard includes a
barmar> rewriting algebra; I presume it's a function of the language's
barmar> semantics.  So, the rewriting algebra could conceivably be as
barmar> well defined as the language itself.

Precisely. But usually this is not true, unless special care has been
taken. Often one has a fairly workable algebra that defines the effect
of a program, and a far less well defined one (usually implicit in the
former, as you say) that allows you to "prove" that transformations
preserve semantics. Theses are written about such things.

barmar> A poorly defined rewriting algebra could be the result of a
barmar> poorly defined language, or because the designer of the
barmar> rewriting algebra did a poor job.

Or maybe because the language was not designed with that goal in mind.
Some languages have been designed so that it is relatively easy to
reason about semantics preserving transformations; but even for these I
still think that transformations ought to be in the open, as source
rewriting, rather than hidden in the code generator. But the languages
that people want to aggressively optimize were not designed with that
goal in mind, maybe because they were designed twenty or forty years ago
as autocoders or super assemblers.

pcg> No, because even assuming that the underlying algebra is clean,
pcg> rewriting combinations of primitives is a far more conceptually
pcg> difficult task than mapping each of them to the real machine. In
pcg> particular it requires careful analysis. Such analysis tends to
pcg> result in more bugs in the compiler, and the compiler is the
pcg> bottleneck of an entire system's reliability.

barmar> Many of the things we do with computers were once thought to be
barmar> too complex to program, but that doesn't mean we shouldn't have
barmar> tried to implement them.

Indeed, but such difficult things, especially for languages that have
not been designed for them, should not be put into the reliability
bottleneck of an entire system. That's why I think that algorithmic
transformation should be done in an independent module, and on the
source code. It's a plain question of responsibility. The programmer
takes responsibility for what is fed to the compiler. This ought to be
clear. It is the programmer's task, IMNHO, to ensure that the source is
OK and efficiently written -- the compiler's responsibility should be to
ensure that it is faithfully translated. If risky technology has to be
used, it should be used by the programmer, and it should be evident in
the source.

I am sick of having to disassemble compiler output to see where the
optimizer has botched it. I would rather that the effort put into the
pursuit of code generator level aggressive optimizers went into source
code level rewrite assistants, and into making compilers more stable,
efficient, and reliable at doing translation, not rewriting.

pcg> No, because even if two combinations of primitives can actually be
pcg> shown in the given context to be equivalent according to the
pcg> language's algebra, it ought to be assumed that the programmer
pcg> intended to write the particular one he chose to write for some
pcg> good reason.

barmar> Also, many programs are targeted for multiple (or unspecified)
barmar> physical machines and/or compilers, so the programmer may not be
barmar> able to choose language primitives that are good for all the
barmar> target environments.

This is a damn good reason to use higher level languages, and let the
compiler translate their primitives into locally efficient versions. It
is not a good reason for the compiler to rewrite pieces of the program.

pcg> Are there alternatives?

pcg>     Yes, rewriting the combinations of primitives not in the code
pcg>     generator, but at the source code level, either manually or
pcg>     automatically.

barmar> Are you saying that this should be done in a module that is
barmar> independent of the target machine and compiler?

Certainly of the compiler. Possibly of the machine. Such a module may be
either a source level rewrite assistant, or a librarian, or both.
Consider the Amsterdam Compiler Kit; they define a tolerably well
specified machine independent intermediate code, and then work on that.
I don't like it a lot, but it is a defensible approach. I still think
it is too hazardous: the intermediate code generated by the compiler may
well be equivalent to the source, but transformations of it may not be.
Demonstrating that the intermediate code output of the compiler remains
equivalent to the source under all possible transformations is not that
simple.

barmar> I was under the impression that most complex optimization was
barmar> already being done using source-source translation,

Not really. What happens is that the program is translated into some
very low level representation, this is analyzed and rewritten, and the
rewritten form is translated to machine code. Most of the time this very
low level representation is not terribly well defined.

barmar> For instance, strength reduction and common subexpression
barmar> elimination translate the source into source with an additional
barmar> temporary variable.

Well, no. You do not see the source. If you did, this would be a
rewriting assistant.
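
To make barmar's point concrete, here is roughly what a source level
rewriting assistant might show for common subexpression elimination --
a sketch in C, with illustrative function names:

```c
/* Before: the subexpression p + q is computed twice. */
void f_before(int p, int q, int r, int s, int *x, int *y)
{
    *x = (p + q) * r;
    *y = (p + q) * s;
}

/* After CSE: the common subexpression is hoisted into a temporary --
   exactly the "additional temporary variable" barmar describes. */
void f_after(int p, int q, int r, int s, int *x, int *y)
{
    int t = p + q;   /* the added temporary */
    *x = t * r;
    *y = t * s;
}
```

An in-compiler optimizer performs the same rewrite invisibly, on its
intermediate representation; a rewriting assistant would show you the
second form and let you keep it.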

barmar> Also, I don't think it matters what the destination of the
barmar> rewrite is.

Ah, as long as things work. But making things work is the problem -- it
is easy to say that aggressive optimization is either semantics
preserving and OK, or it is broken, as Jim Giles says. The problem is to
show that it is semantics preserving, not to assume that it can be made
so. The engineering problem is to demonstrate that it does not add to
the unreliability of a compiler, contrary to appearances, and that
hiding it in the code generator does not lower the confidence level in
the correctness of compiled programs.

I don't think I ever claimed that aggressive optimization should be
avoided even if it is bug free and cost free. The problem is whether it
can be made so, and at what price, and whether a more conservative,
step-by-step approach should be used.

barmar> I think the hard part is recognizing large combinations of
barmar> primitives and reasoning about them.

barmar> Transformation bugs usually result from unexpected interactions
barmar> between seemingly unrelated portions of the code (e.g. due to
barmar> use of global variables or aliasing).

In other words they result from an underspecified or poorly designed
language algebra. On the other hand, some languages have been designed
to ease semantics-preserving transformations:

barmar> This is one reason why much of the program transformation
barmar> research has been done with functional languages: the absence of
barmar> side effects makes semantic analysis much easier.

Precisely. But this to me is somewhat extreme. I wish there were a
better way. I have my own ideas on the subject, but let's keep them out
of the way for now.
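
The aliasing hazard barmar points to is easy to exhibit in C: a
transformation that remembers the last value stored through one pointer
is correct only if no other store can alias it, and the language gives
the analyser little help in proving that. A minimal sketch, with an
illustrative function name:

```c
/* A compiler that assumed x and y never alias could rewrite the
   return statement as "return 1", folding away the reload of *x.
   That rewrite is wrong precisely when the caller passes x == y. */
int store_then_load(int *x, int *y)
{
    *x = 1;
    *y = 2;
    return *x;
}
```

With distinct objects the function returns 1; called with the same
pointer twice it returns 2, so the "obvious" fold changes the program's
meaning.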

pcg> Yes, using a language whose primitives are at the abstraction level
pcg> desired, i.e. the primitives offered by the language are those
pcg> needed for the application.

barmar> Unfortunately, the computer industry doesn't abound with
barmar> application-specific languages.  And even if there were, finding
barmar> programmers proficient in many of them would be hard.

Ah. Here we get outside the scope of this newsgroup: we should be
discussing here what *ought* to be done. This is not
biz.compiler.marketing or soc.programming.education. I have repeatedly
conceded that the "dusty deck" or "dusty programmer" or "dusty language"
problem means that the quick fix may indeed be the tricky job of doing
aggressive optimization. I still think however that even in this common
but pathological case source level rewriting, which is a very
interesting growth industry in the commercial sector (and this is a
remark that should appear in biz.compsci.opportunities), would be
better.

pcg> Are the alternatives cost effective?

pcg> Yes, because for *many* even if not all codes the time critical
pcg> portion is small enough that careful automatic rewriting is
pcg> feasible.

Sorry, I meant to write "manual rewriting".

barmar> How does the automatic rewriter know which parts are time
barmar> critical?  In many cases, program optimization is an incremental
barmar> process: you find the bottleneck, rewrite it efficiently, find
barmar> the new bottleneck, rewrite it, etc., until the overall
barmar> efficiency meets the goals.

Precisely, precisely. That's why I meant "manual". Naturally "manual" in
this case may also mean "assisted by tools", but still under programmer
control. As to automatic rewriting, instead:

pcg> Yes, because automatic rewriting can be to a higher level language,
pcg> which can then be more easily and non aggressively optimized, and
pcg> automatic rewriting gives long lasting benefits in portability and
pcg> confidence in the program.

barmar> Ah, now I see that you're talking about *replacing* the source
barmar> file with this automatically rewritten source file.

Yes, yes.

barmar> I thought you were talking about source-level transformations
barmar> internal to the compiler (usually operating on the parse-tree,
barmar> not the actual source text).  Since the purpose of the
barmar> transformation is to produce code that will map better by a
barmar> specific compiler onto a particular hardware, I don't see the
barmar> portability benefit of saving the transformed code.

But I wrote about automatic rewriting to a higher level language, or the
same language with calls to library procedures instead of ad hoc
codes...  This solves the problem at the root. You eliminate the need
for aggressive optimization, and you get better maintainability and
portability.

pcg> Yes, because using a language whose primitives are at the level of
pcg> the problem is known to reduce costs and improve quality via reuse
pcg> of the language's implementation or libraries, where otherwise the
pcg> implementation of those primitives would have to be rewritten and
pcg> re-aggressively optimized for every application.

barmar> This is true if you can find such a language and programmers who
barmar> know it.  If the application is not very common then you will
barmar> probably have to spend quite a bit to purchase or develop the
barmar> language implementation.  Subroutine libraries for general
barmar> purpose languages can often do nearly as well and are much
barmar> easier to come by.

Indeed. That is why I say that the nice alternatives to aggressive
optimization are either a higher level language compiler or a librarian.

>They are legal but dangerous and not cost effective, according to the
>above discussion.

barmar> You made the statement that these optimizations don't buy much,
barmar> but didn't back it up with any evidence.

Bah. Hand waving hard here: the impression is that on general purpose
architectures these optimizations buy improvements in the 10-30% range.
For special purpose architectures running specific application classes
the payoff may be much higher, but there the issue of whether the
language is well matched to the application and the target machine
becomes much, much more important.

chased> The compiler-writer's claim is "I can prove that these
chased> optimizations conform to the standard";

pcg> STOP THE PRESSES! Compiler writer proves that he can do program
pcg> rewriting in a bug free way! Program synthesizers must be around
pcg> the corner! Major software companies' stocks collapse!

barmar> Very little can be *proven* about any program; there are just
barmar> too many variables and dependencies.  A compiler writer who
barmar> makes such a statement is merely asserting that he's as
barmar> confident about his optimizations as he is about the rest of his
barmar> compiler.

Which must be supported by good evidence, because reasoning says that
adding complexity to a program, especially in murky areas of language
definitions, and especially where AI-like behaviour is expected,
normally does lower its reliability. Anecdotal evidence supports the
impression that aggressive optimization in the code generator is still a
black art rather than a science.

It is up to the proponents of aggressive optimization to demonstrate
that it does not add to the complexity, or detract from the reliability,
of a program; that it is a better alternative than rewriting the
program, manually or automatically, into a higher level form more
amenable to straightforward code generation; and that the benefits so
gained are worth the effort.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/03/90)

On 1 Nov 90 22:33:35 GMT, jlg@lanl.gov (Jim Giles) said:

jlg> From article <PCG.90Nov1171210@odin.cs.aber.ac.uk>, by
jlg> pcg@cs.aber.ac.uk (Piercarlo Grandi):

pcg> In a way or another this implies that each language implies a virtual
pcg> machine. The task of the code generator is to map the virtual machine
pcg> onto the real machine as well as it can, which means faithfully
pcg> (semantics) and efficiently (pragmatics).

jlg> Unfortunately for your argument, there are few languages which have
jlg> definitions of this kind.  The definition of a language usually
jlg> implies a _class_ of virtual machines.

Yes, of course a virtual machine may be left unspecified here and there.
You are indeed allowed to view such a virtual machine as a possibly
infinite class of more precisely defined virtual machines. Nominalism.

jlg> The code generator is allowed to map any of these onto the real
jlg> hardware.  Any optimizer which generates code which faithfully
jlg> implements anything within the class of acceptable behaviours is
jlg> correct.

Again, the problem is not to show that aggressive optimization is
desirable if it is assumed bug free. Indeed this idea is a red herring:

jlg> If an optimizer leaves the class of acceptable behaviours, it is
jlg> _broken_.

Write or find a compiler, optimizing or not, for a popular language that
is not _broken_ according to your definition.

Of course you cannot. Now, the interesting question is how much of a
liability in terms of reduced confidence is aggressive optimization.
Reasoning shows that in principle it lowers the reliability of a
compiler. Show that this is not true, or that if true there are
compensating advantages, and quantify your argument.

The burden of proof is on proponents of aggressive optimization, because
they have to overcome the obvious observation that anything that
enlarges or complicates a program will tend to decrease its reliability.

jlg> You are trying to limit the compiler to code transformations that
jlg> _you_ regard as straightforward.  In truth, the compiler is only
jlg> limited by the language definition.

Only in a theoretical world. It is also limited by what is cost
effective, in the sense above. The argument here is not that aggressive
optimizations are illegal or impossible; it is whether they are the best
alternative, when compared to straightforward code generation plus
source level rewriting assistants, from an engineering point of view.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/03/90)

On 1 Nov 90 23:46:28 GMT, chased@rbbb.Eng.Sun.COM (David Chase) said:

chased> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
chased> The compiler-writer's claim is "I can prove that these
chased> optimizations conform to the standard";

pcg> STOP THE PRESSES! Compiler writer proves that he can do program
pcg> rewriting in a bug free way! Program synthesizers must be around the
pcg> corner! Major software companies' stocks collapse!

chased> Read your own posting, please, and note that what you quote did not
chased> say what you say it said.  "I claim I can prove" is most definitely
chased> not the same as "I have proved".

Ahem. Agreed, it is not the same. This is a bit too easy... :-)

chased> It means that the steps in optimization are based on little
chased> proofs, and sketches of proofs; it doesn't mean that there is a
chased> big book proving the compiler correct.

It is so easy to convince oneself that things will work...

chased> Such a book *could* be written if the need existed (the
chased> existence of dire need would be demonstrated by the exhibition
chased> of wheelbarrows full of money to finance its writing).

Point taken. However, *you* (not you personally, of course) are
proposing new, untried, and potentially hazardous technology. The burden
of proof is on you and your customers. If one believes without proof, in
the presence of arguments that counsel diffidence, amen.

chased> And, regarding, the tobacco industry analogy (an ad hominem analogy,
chased> too -- aren't vile insults what one uses when arguing from a weak
chased> position?),

I am sorry that it was perceived as insults and not grave humour. And
certainly it was not meant to be ad hominem. Hey, I actually like David
Chase (and Jim Giles). Good discussions, even if we do not necessarily
agree, are excellent learning exercises, as layers of assumptions and
different world views are exposed. And no doubt David Chase and Jim
Giles are people one can learn from. OK?

chased> I will suggest that you have no evidence, only anecdotes,
chased> for the alleged pervasive bugginess of optimizing compilers.

For starters I produce the general principle that a bigger and more
complicated program, unless proven otherwise, has to be presumed to be
buggier. There is a discipline called software physics, a branch of
software engineering, that collects such grim statistics.

chased> Certainly you have nothing like the evidence gathered to
chased> demonstrate a connection between tobacco and lung disease, and
chased> only hand-waving claims as to the cause-and-effect link.

There is a large body of evidence about a strong correlation between
complexity and size and bugginess (note that over a program's lifetime
the number of bugs seems to be related *only* to its size and
complexity, not to debugging effort, etc., for a given level of
programming expertise).

chased> I *am* selling my skills, but I fail to see that there is
chased> anything pernicious about that.

Nothing wrong about that, except that your compilers do not carry the
warning "The President of ACM reminds you that aggressive optimization
may be hazardous to your program's correctness" :-). Again you have to
prove or at least argue that the well known general principle that
bigger and hairier means buggier does not apply to your aggressive
optimizers.

If you were using the same technology in source level rewriting
assistants, things would be different -- the burden of ultimate
responsibility for the reliability of the program logic would shift from
you to the programmer.

chased> It is also unclear that your claims about the speed/risk trade-offs
chased> are in any way connected with reality.  Speed gained for you by an
chased> optimizing compiler is speed that you don't have to work for by
chased> hand-coding (though both can be used together to get even more
chased> speed-up if that is necessary), and the time saved by the programmer
chased> is more time that can be spent debugging and testing the code.

Again, what about the alternative of unbundling the risky rewriting
logic from the compiler and putting it in a source level rewriting
assistant? This is my favourite alternative. You could have as much fun
and business (and probably more) writing a source level rewriting
assistant as an aggressive optimizer, and it would be cleaner
engineering, and probably even more effective as to results.

chased> Every programmer I know has a bug introduction rate higher than
chased> any compiler that I've ever used (but this is purely anecdotal
chased> -- perhaps it is completely different for other people, eh?  do
chased> *you* find more compiler-introduced bugs than author-introduced
chased> bugs in *your* code?), so it is in fact more reliable to let the
chased> compiler do the optimizing, even if you don't use the time saved
chased> for additional testing and debugging.

and also:

chased> I've said several of the things that I said in this article before --
chased> you might seriously consider comparing the bug-introduction rate of
chased> programmers, both skilled and ordinary, with the bug-introduction rate
chased> of several "aggressively optimizing" compilers.

Ah no, this cannot stand. The compiler does a completely different job
from the programmer: the programmer must *invent* new code and show that
it respects some specification; the compiler must merely translate it,
showing that the translation is done according to hopefully well defined
rules. As long as things remain like this I am happy. Unfortunately
aggressive optimization is a step away from this, because it requires an
element of invention, and the use of less well defined rules.

chased> You might also check to see if there are any trends in compiler
chased> bug rates over time, and see how that correlates with the
chased> optimizations performed by those compilers.  I think you'll find
chased> that your claims are not supported.

This would be an interesting exercise for supporters of aggressive
optimization. Maybe they would be able to show that the general rule is
not true for aggressive optimizers, or that the increased level of
problems is compensated by the gains in resource economy. Maybe not. I
would like to remind you that I have no claims -- merely skepticism
about *their* unsupported claims, and the idea that motivated skepticism
counsels caution, much caution.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

grover@brahmand.Eng.Sun.COM (Vinod Grover) (11/03/90)

In article <PCG.90Nov2231021@athene.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>The burden of proof is on proponents of aggressive optimization, because
>they have to overcome the obvious observation that anything that
>enlarges or complicates a program will tend to decrease its reliability.

I fail to see how an optimiser can complicate a program if it performs its
transformation at an intermediate level. The source program written by the
programmer is *not modified* in any way. The translation to a complex
intermediate form, if correct, does not affect the reliability of the source
program. 

Vinod Grover
Sun Microsystems

eric@snark.thyrsus.com (Eric S. Raymond) (11/03/90)

In <3674@stl.stc.co.uk> Tom Thomson wrote:
> one to one translation?  back in the 60s we used to talk about 
> the Fortran assembler;  even earlier we had machines like Deuce 
> where the assembler would fix the placement of instructions for
> you to minimise drum latencies, which was maybe one-to-one still
> but very much against the spirit of your proposed restriction on
> assemblers.

I don't think your examples do much to challenge the previous poster's
contention; they're all neolithic-era or older! These `optimizing'
assemblers were all designed before the idea of HLLs had matured, in
response to a *very* different set of tradeoffs than drives today's
relatively hard distinction between HLLs and assemblers.

I think it's a little too restrictive to characterize assemblers as one-to-one
instruction translators. The essential functional difference is that HLL code
is *opaque*; you can't look at it and write the corresponding machine code
sequence without knowing global information such as name-to-stack-offset
mappings. Assembler code, on the other hand, is transparent; you can look
at it and see where all the bits will fall.

"But wait!", I hear you say. "What about...macroassemblers?". Yes, they're
a bit of a hybrid. But hey, when's the last time you heard of anybody *using*
one? One of the effects of maturing compiler technology for HLLs (and
especially of the spread of C) has been to `purify' assemblers; from the
point of view of a modern language designer, MACRO-10 was an overengineered
wrong turn, like a Zeppelin with stained-glass windows.

The pcc design made an important evolutionary step by actually generating
assembler source rather than object records. This completed the separation
of function -- the HLL for abstraction handling, the assembler for bashing
instruction bits into place.

I think there are all kinds of good engineering reasons to believe this
separation of function between HLL and assembler will be a stable feature
of future development-environment design for compiled languages. From that
modern point of view, the gentleman making the `one-to-one' claim was
correct in concept if not in detail.
-- 
      Eric S. Raymond = eric@snark.thyrsus.com  (mad mastermind of TMN-Netnews)

cik@l.cc.purdue.edu (Herman Rubin) (11/04/90)

In article <1849@exodus.Eng.Sun.COM>, chased@rbbb.Eng.Sun.COM (David Chase) writes:
> diamond@tkou02.enet.dec.com (diamond@tkovoa) writes:
> >C was invented in order to be that better assembler.  If you want machine
> >level, you're supposed to program in C.  That is exactly the purpose for
> >which C was created.  If you want an HLL, use one; don't cripple C.

			....................

> 1) several optimizations could be applied much more frequently to C if
> it weren't for possible aliasing in procedure pointer/array
> parameters.  If you don't understand what I'm talking about, go read a
> book on dependence analysis -- Wolfe's _Optimizing Supercompilers ..._
> will give you a feel for what can be done to Fortran.

It is possible that this was the intention for C, but I do not know of
any machine for which it succeeds in being a better assembler.  I believe
it is possible to achieve that purpose, but not in a language as limited
as C.  

The big problem with aggressive optimization is that the compiler cannot
have the necessary information to know what is safe.  The present
"solution" is either to allow the compiler to do only what is known, or
at least believed, to be safe, or to restrict the programmer so that the
unsafe conditions supposedly cannot arise.

There is a way out of this, which I do not believe has been tried.  That is,
have the compiler ask the programmer!  It may even be that the programmer
does not know, but can find out.
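
Something close to Rubin's suggestion can be approximated by having the
programmer volunteer the missing fact as a checkable assertion -- a
sketch in C, where the function name is illustrative and the comparison
of pointers into different objects is not strictly portable:

```c
#include <assert.h>
#include <string.h>

/* The programmer asserts what the compiler cannot deduce: the two
   regions do not overlap.  Given that fact, the copy below is safe to
   implement as a block move or to vectorize.  (C99 later added the
   "restrict" qualifier for exactly this kind of promise.) */
void copy_ints(int *dst, const int *src, int n)
{
    assert(dst + n <= src || src + n <= dst);  /* no overlap */
    memcpy(dst, src, (size_t)n * sizeof *dst);
}
```

The assertion is both documentation for the compiler's analysis and a
run-time check on the programmer's answer, so a wrong answer is caught
rather than silently miscompiled.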
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin(UUCP)

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/05/90)

On 3 Nov 90 04:16:26 GMT, grover@brahmand.Eng.Sun.COM (Vinod Grover) said:

grover> In article <PCG.90Nov2231021@athene.cs.aber.ac.uk>
grover> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

pcg> The burden of proof is on proponents of aggressive optimization,
pcg> because they have to overcome the obvious observation that anything
pcg> that enlarges or complicates a program will tend to decrease its
pcg> reliability.

grover> I fail to see how an optimiser can complicate a program if it
grover> performs its transfromation at an intermediate level. The source
grover> program written by the programmer is *not modified* in any way.

Sorry if it was not clear from what I wrote that the program I was
referring to was the *compiler*.

grover> The translation to a complex intermediate form, if correct, does
grover> not affect the reliability of the source program.

I intended to say that the code that effects the transformation adds to
the size and complexity of the compiler and therefore affects the
compiler's reliability. Each individual transformation may look simple
and safe, but once you consider that you also need code to do the
analysis that makes possible and drives the transformation process, and
that you must consider the interactions between transformations, which
are difficult to model in languages that have not been designed with
transformations in mind... You have a large amount of hairy code in the
compiler's bowels, and this is baaaad.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

jlg@lanl.gov (Jim Giles) (11/06/90)

From article <PCG.90Nov2231021@athene.cs.aber.ac.uk>, by pcg@cs.aber.ac.uk (Piercarlo Grandi):
> [...]                 Now, the interesting question is how much of a
> liability in terms of reduced confidence is aggressive optimization.
> Reasoning shows that in way of principle it lowers the reliability of a
> compiler. Show that this is not true, or that if true there are
> compensating advantages, and quantify your argument.

You are making the assumption that optimization techniques reduce the
reliability of the compiler.  This is certainly true for esoterica.
But, you have also attacked the old standby techniques as well.

If I were writing a compiler for _reliability_ alone, I would
certainly make a complete data flow graph and a complete analysis of
control flow.  These techniques have been around for decades, are
stable, and much theoretical work (including correctness proofs) has
been done: in short, they are the most reliable tool after automatic
parser generators that are available in compiler work.  Having done
this, it is difficult (and therefore less reliable) _NOT_ to apply
the obvious code generation - which would be code that has been
optimized in the following ways (and probably more):

   Constant folding
   dead code elimination (given above: any branch _known_ not to be taken)
   common expression elimination
   loop invariant detection
   strength reduction
   ...

The last three are all optimizations that you have strongly opposed
in the past.  Yet, in a modern compiler, it would be _less_ reliable
not to apply them.  So, don't pretend your objections are based
solely on reliability.

J. Giles

jlg@lanl.gov (Jim Giles) (11/06/90)

From article <2701@l.cc.purdue.edu>, by cik@l.cc.purdue.edu (Herman Rubin):
> [...]
> There is a way out of this, which I do not believe has been tried.  That is,
> have the compiler ask the programmer!  It may even be that the programmer
> does not know, but can find out.

And yet, Herman Rubin is one of those who opposed my proposal for
explicit "aliased" attributes together with a feedback mechanism
in the loader for indicating when the user's explicit directives
were mismatched.  The specific purpose of these was to allow the
user to safely direct that optimizations were allowed (by not
aliasing things) or to tell the compiler not to optimize specific
combinations of variables (which are aliased).  Further, the
proposal was made in a specific way so that the user would
_always_ know whether aliasing were needed or not.

Oh well

J. Giles

chased@rbbb.Eng.Sun.COM (David Chase) (11/06/90)

pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
> Point taken. However, *you* (not you personally, of course) are
> proposing new untried and potentially hazardous technology. The burden
> of proof is on you and your customers. If one believes without proof, in
> the presence of arguments that counsel diffidence, amen.

I don't think that the technology is as new or untried as you seem to
think.  The dependence-based code manglers use information that is
gathered (for Fortran, at least) in a reliable fashion (for C, it's
not clear how much useful information there is without either pragmas
or inlining), and has been used in "vectorizers" and "parallelizers"
for about the last ten years.  There have been plenty of proofs to show
that dependence information means what it claims to mean.

The set of optimizations that one hopes a modern compiler would
perform (e.g., strength reduction, constant propagation, hoisting of
loop invariant code, register allocation, reassociation, linear
function test replacement) have also all been around for some time,
and thus are not untried, and are no more "potentially hazardous" than
scanning and parsing.  (Read on for a code size comparison.)

> For starters I produce the general principle that a bigger and more
> complicated program, unless proven otherwise, has to be presumed to be
> buggier.

Agreed, generally true.  A quick glance around the programs that I run
every day reveals that the (text of the) optimizer (for C, Fortran,
Pascal, and Modula-2) is smaller (often by an order of magnitude) than
vmunix, gnuemacs, dbx, cfront, and several language front ends.  So,
should I worry more about bugs in the front ends, bugs in the editor,
bugs in the OS, bugs in the debugger, or bugs in the optimizer?

Although, if people cared that much more about reliability than they
do about speed, I think we'd see a lot more people programming in
languages other than Fortran, C, and C++.  Of course, that's another
discussion, but it probably gives some indication of what people
expect from their compilers (besides perfection, of course).

David Chase
Sun Microsystems

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)

In article <2672@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> I have no objection to the compiler helping to optimize, and even making 
> suggestions to the programmer.  But I object to the first sentence; there
> should be NO details of the machine language, instruction overlap, etc.,
> hidden from the programmer.

It is, unfortunately, impossible to standardize a language that makes
allowances for every hardware known to man.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)

In article <4950@avocado5.UUCP> wallach@motcid.UUCP (Cliff H. Wallach) writes:
> In article <TED.90Oct26124614@kythera.nmsu.edu> ted@nmsu.edu (Ted Dunning) writes:
> -if you compile the following code on a sun3 with -O4,
   [ compiled array code ]
  [ obviously faster hand code ]

I observe that when I convert the original code into pointer code, the
compiler produces very similar results to the hand version.

Sure, Jim. Arrays are as efficient as pointers. Uh-huh.

Now I know why all my array code has always seemed so slow on Suns.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)

In article <4179@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
> Surely the cardinal rule for an assembler is not "one-to-one translation"
> but "full access to all machine operations".

Exactly! As a matter of principle I'm even willing to argue that the
programmer should be able to exert control over instruction scheduling,
even though I am reasonably confident that the compiler does this well.

> Why is it evil to offer
> optimisation as well?

It's not. I hope we can get Herman to agree on this.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)

In article <1932@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
> I will suggest that you have no evidence, only anecdotes,
> for the alleged pervasive bugginess of optimizing compilers.

He says this while another comp.lang.c poster complains about an
optimizer bug, while I find that a new optimizer crashes on an empty
loop, and while someone else sees an optimizing Fortran compiler that
still produces incorrect code for overly complex control structures.

---Dan

peter@ficc.ferranti.com (Peter da Silva) (11/06/90)

What's wrong with:

	if(array_a_and_array_b_overlap)
		use_safe_code
	else
		use_fast_code_that_doesn't_work_for_aliased_arrays.

The whole alias problem is a compiler technology problem.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/07/90)

On 5 Nov 90 22:15:54 GMT, jlg@lanl.gov (Jim Giles) said:

jlg> From article <PCG.90Nov2231021@athene.cs.aber.ac.uk>, by
jlg> pcg@cs.aber.ac.uk (Piercarlo Grandi):

pcg> [...]  Now, the interesting question is how much of a liability in
pcg> terms of reduced confidence is aggressive optimization.  Reasoning
pcg> shows that in way of principle it lowers the reliability of a
pcg> compiler. Show that this is not true, or that if true there are
pcg> compensating advantages, and quantify your argument.

jlg> You are making the assumption that optimization techniques reduce
jlg> the reliability of the compiler.  This is certainly true for
jlg> esoterica.  But, you have also attacked the old standby techniques
jlg> as well.

Whether the technique is well understood or not does not matter that
much -- the *algorithm* may be perfectly safe (even if proving so in the
context of a language whose semantics have not been designed for
transformations is quite hard, even for seemingly simple things like
common subexpression elimination); the hazard comes from the amount of
code needed to implement it. The code may be buggy -- actually *it
will*. More code, more bugs. Do we want 100 thousand lines of compiler
to debug when 10 thousand would do it? No... Naturally not all those 100
thousand are involved in the critical translation business, but the
translation business had better be as streamlined as possible.

More code more bugs, no question about it. It is not an assumption.
Anything that is not related to straight translation means more bugs.

jlg> If I were writing a compiler for _reliability_ alone, I would
jlg> certainly make a complete data flow graph and a complete analysis of
jlg> control flow.  These techniques have been around for decades, are
jlg> stable, and much theoretical work (including correctness proofs) has
jlg> been done:

But these techniques require pretty long and elaborate code. Again, if
we assume (and this is indeed a crazy assumption) that we can find an
algorithm to perform hairy transformations without making mistakes and
then implement the algorithm without bugs, then aggressive optimization
is a no-loss situation (except for compile time and space), and whatever
its benefits on run time and space we should do it. It also wins if we
think the program is a "dusty deck" or the programmer is a "dusty
programmer". But in the latter case programmer's assistants at the
source level seem an even better win.

jlg> in short, they are the most reliable tool after automatic
jlg> parser generators that are available in compiler work.

This is a quite extraordinary statement -- you may be implying that
parsing C's syntax, which has been designed for easy context-free LL(1)
parsing, is something as straightforward and reliable as rewriting
sequences of C's statements, which have not been designed for it? (note
that C's *expressions* have actually been designed for rewriting,
instead)

jlg> So, don't pretend your objections are based soley on reliability.

Never claimed this -- the claim is that anything that complicates a
crucial system component like a code generator without demonstrated and
quantified cost/benefit analysis should be treated with caution,
especially if the complication is about doing something that was not
anticipated at design time.

My argument is that such potentially risky business should be a separate
program, which it is the responsibility of the programmer to invoke and
guide, and whose results the programmer has to check. This may look
like too much work, but aggressive optimization, like most any
optimization, is expected to make a significant difference only in a
few cases, and these may deserve the attention.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

chased@rbbb.Eng.Sun.COM (David Chase) (11/07/90)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>chased@rbbb.Eng.Sun.COM (David Chase) writes:
>> I will suggest that you have no evidence, only anecdotes,
>> for the alleged pervasive bugginess of optimizing compilers.
>
>He says this while another comp.lang.c poster complains about an
>optimizer bug, while I find that a new optimizer crashes on an empty
>loop, and while someone else sees an optimizing Fortran compiler that
>still produces incorrect code for overly complex control structures.

That's still just anecdotes.  You've found what, maybe one bug per
compiler?  That's not pervasive.  Note the nature of the bugs --
you've got one bug in a beta-test compiler (GCC -- this is not a slam
on GCC either; most people don't release their beta compilers for the
world to tinker with), one bug that, though annoying, won't cause you
to go debugging down a rathole, and one bug that will actually cause a
programmer to waste time.  I'd call that last bug serious.

I've never claimed that optimizers were bug-free, but I will claim
that the bug rate is acceptably low for the improved performance that
they offer, especially if compared with other methods of obtaining
that performance (i.e., hand-tweaking of assembly language,
source-to-source transformations).  People make mistakes.
Furthermore, in the sad event that a compiler has a bug in it, I can
often write down a description of the bug and put it on my list of
things to avoid/watch for(/repair?) in the future.  Human beings,
however, are creative, and make *new* mistakes.

People should do the optimizations that compilers cannot hope to get
(i.e., selection of appropriate algorithm), and leave the repetitive
twiddly stuff for compilers.  Furthermore, even after extensive
application of hand-optimizations, optimizing compilers can still
yield performance improvements by performing optimizations not
expressible (within bounds of tedium) in the source language (note
that some of these optimizations still fall into the category of
"aggressive" optimizations -- e.g., movement of non-volatile loads and
stores for scheduling, software pipelining, tail-call elimination).

Arg.  I'm sure I sound like a broken record, but I'm sure glad I
stayed out of the pointer-and-arrays fiasco.

David Chase
Sun Microsystems

jlg@lanl.gov (Jim Giles) (11/07/90)

From article <2133:Nov607:16:1090@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
< In article <4950@avocado5.UUCP> wallach@motcid.UUCP (Cliff H. Wallach) writes:
<> In article <TED.90Oct26124614@kythera.nmsu.edu> ted@nmsu.edu (Ted Dunning) writes:
<> -if you compile the following code on a sun3 with -O4,
<    [ compiled array code ]
<   [ obviously faster hand code ]
<
< I observe that when I convert the original code into pointer code, the
< compiler produces very similar results to the hand version.
<
< Sure, Jim. Arrays are as efficient as pointers. Uh-huh.

I have never made the claim that arrays _are_ as efficient as
pointers.  I have made the claim that _most_ code I've seen is
_faster_ with arrays.  I've also frequently pointed out that a
_bad_ compiler can make anything happen.  The _basic_ claim I
always make is that there is no reason, at the present state of
the art, for arrays to ever be slower than the corresponding
pointer implementation.  If a given implementation implements
arrays slower, then it isn't state of the art.

By the way, I frequently make rather strong remarks about the
bad quality of C compilers.  Recently someone has pointed out
that there is a sound reason for it.  Since the optimizations
which I have been discussing are heavily impaired by the presence
of aliasing, and since things are almost always aliased in a
C context, C compiler writers simply don't bother to do those
optimizations.  It is only sensible that a technique which is
almost always inhibited is not really worth implementing.  So,
any of you people out there writing C compilers, I apologize
if my remarks offend - there's nothing you can do about it
if the language itself makes it useless to optimize.  Still,
I would expect a truly "production quality" compiler to do
well with arrays - at least, if _I_ have to pay for it.

J. Giles

smryan@garth.UUCP (Steven Ryan) (11/07/90)

Beat me! Whip me! Make me use an optimiser!

Isn't this kind of silly? Are there really compilers that force people to
use the optimiser or is it optional? If it's optional and you don't want,
don't use. If you can buy a compiler without an optimiser (which is easy),
do so. If nobody wants an optimiser, nobody buys them. If somebody wants
one, why not let them buy it?

Leather.
Windows.
And chains.
Oh! my!
-- 
...!uunet!ingr!apd!smryan                                       Steven Ryan
...!{apple|pyramid}!garth!smryan              2400 Geng Road, Palo Alto, CA

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/07/90)

In article <2213@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
> You've found what, maybe one bug per
> compiler?

And the bugs keep coming, and coming, and coming, and coming ...

I think we've successfully reduced this argument to a simple question:
``Do optimizers gain more speed than they lose reliability?'' Again, I'm
firmly convinced that heavy optimizations are on the latter side. It's
pretty clear that you are on the former side. In any case I think the
argument is finished.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/07/90)

In article <6PX6-O@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> What's wrong with:
> 	if(array_a_and_array_b_overlap)
> 		use_safe_code
> 	else
> 		use_fast_code_that_doesn't_work_for_aliased_arrays.

Yes, that's what I've been trying to convince Jim of for most of this
year. Note that it cannot be implemented portably in C---remember my
question a while back in comp.lang.c about whether there was any way to
detect whether a pointer was within a given array[n]? The answer is that
there isn't, and you need a language extension to do it right.

However, a smart compiler can perform this optimization automatically. In
most cases it can do the test at compile time, and generate calls to
either the fast compiled version or the slow compiled version. I don't
know why Jim repeatedly insists this is impossible.

> The whole alias problem is a compiler technology problem.

Exactly.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/07/90)

In article <5079@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> I have never made the claim that arrays _are_ as efficient as
> pointers.  I have made the claim that _most_ code I've seen are
> _faster_ with arrays.  I've also frequently pointed out that a
> _bad_ compiler can make anything happen.

Are you insinuating that Sun's optimizing compiler is a bad compiler?

> If a given implementation implements
> arrays slower, then it isn't state of the art.

I'd love to see your brilliant state-of-the-art solution. You're saying
that the compiler can figure out rather general assertions about what
indices are going to be used where, and can generate code where the
pointers (arrays) are pre-indexed appropriately. Could one of you CS
types please illustrate to Jim that finding optimal addition chains in a
general computation is equivalent to solving the halting problem?

---Dan

jlg@lanl.gov (Jim Giles) (11/07/90)

From article <7177:Nov620:48:1590@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [... if-test around aliased vs. non-aliased code ...]
> However, a smart compiler can perform this optimization automatically. In
> most cases it can do the test at compile time, and generate calls to
> either the fast compiled version or the slow compiled version. I don't
> know why Jim repeatedly insists this is impossible.

The example given doesn't even contain a procedure call.  It has
nothing at all to do with the thing that I consistently have asserted
is not possible for the compiler _alone_.  As it happens, I agree that
this case probably has a purely compile-time (or compiled-in run-time)
solution.

It is your continued claim that the compiler can solve interprocedural
aliasing questions by itself (and at compile-time) that I disagree with.

J. Giles

jlg@lanl.gov (Jim Giles) (11/07/90)

From article <7659:Nov620:58:5990@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]                                            Could one of you CS
> types please illustrate to Jim that finding optimal addition chains in a
> general computation is equivalent to solving the halting problem?

Could anyone in Dan's immediate neighborhood please go over to his
place and explain what a data dependency graph is?  This is one
of the most stable and reliable tools available in the compiler
generator's kit.  It allows one to trace the use of an array index
right back to the place where the index was calculated.  Or, failing
that, to the place where the index came into the present context
(a procedure argument, for example).  In either case, the compiler can
clearly place the add of the array base to that index at _exactly_
the place where the index becomes a "live" value.  Or, the compiler
can do the add of the array base at any intervening point in the
control flow graph (also an extremely stable and reliable part of
the compiler writer's art).  If the array index becomes a "live"
value before the array reference (that is, the index is precomputed),
then the whole reference address can be precomputed by the compiler
exactly there.

J. Giles

gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) (11/07/90)

In article <2133:Nov607:16:1090@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>I observe that when I convert the original code into pointer code, the
>compiler produces very similar results to the hand version.

Maybe you should have tried gcc -- it does quite a bit better than cc
on this simple example, *which includes no alias possibilities*.

Real array code has possible aliasing all over the place.

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/08/90)

On 6 Nov 90 01:21:12 GMT, chased@rbbb.Eng.Sun.COM (David Chase) said:

chased> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

pcg> Point taken. However, *you* (not you personally, of course) are
pcg> proposing new untried and potentially hazardous technology. The
pcg> burden of proof is on you and your customers.

chased> I don't think that the technology is as new or untried as you
chased> seem to think.

Well some of it is. All the parts where one is trying to do serious
program rewriting are akin to program synthesis, which is still a long way from
anything I would depend on. Also, the point is that it is not
*necessary*; it is only used for cost/benefit reasons. It remains to be
shown that the costs are greater than the benefits. Jim Giles says that
any optimizer that is not bug free is _broken_, and thus (I guess from
his tone) irrelevant to our discussion. He therefore dismisses a large
part of the cost side of the equation, which is a bit too easy.

chased> The set of optimizations that one hopes a modern compiler would
chased> perform [ ... have been with us for a long time and thus ... ]
chased> are no more "potentially hazardous" than scanning and parsing.

Ahem. Scanning and parsing are usually context-free, they do not involve
any interpretation of the input (semantics), their algebras are very
well understood indeed and fairly simple, etc... -- I would grant though
that code generation is approaching the level of confidence of parsing
and scanning (thanks to the works of people trying to automate as much
of it as possible), but even that is subject to doubt.

But I concede that many of the optimizations that you mention are indeed
'traditional'; given sufficiently large amounts of time, talent and
effort they could be performed reliably inside a code generator.

But then my second level of argument is triggered -- they may have
become cost-effective for the user (visible performance gain, small
reliability loss), but is it true that investing in an aggressive
optimizer is the best way to achieve that user level cost-effectiveness?

Better training may give greater and longer lasting benefits, using
higher level languages may give greater and longer lasting benefits,
source level programmer's assistants may give greater and longer lasting
benefits. All such technologies are tested and proven (except for
programmer's assistants, unfortunately).

Better training and use of higher level languages do not apply to the
"dusty deck" and "dusty programmer" problems, admittedly, so there
aggressive optimization may be justified, but even then there is some
reason to think that programmer's assistants and source rewriting may be
preferable to aggressive optimization, and more research on that may pay
off better than more research on aggressive optimization.

pcg> For starters I produce the general principle that a bigger and more
pcg> complicated program, unless proven otherwise, has to be presumed to be
pcg> buggier.

chased> Agreed, generally true.  A quick glance around the programs that
chased> I run every day reveals that the (text of the) optimizer (for C,
chased> Fortran, Pascal, and Modula-2) is smaller (often by an order of
chased> magnitude) than vmunix, gnuemacs, dbx, cfront, and several
chased> language front ends.

But these comparisons are not that meaningful! Bugs in the editor and in
the debugger are not as dangerous as those in the compiler. Bugs in the
OS are almost as dangerous, though. Also, a lot of the code in these
programs is relatively trivial and unimportant -- almost all of an
aggressive optimizer is critical.

Not every type of program has the same bug intensity -- scanners and
parsers can reasonably be expected to have much lower bug intensity than
optimizers, and whatever their intrinsic complexity they are in any case
*necessary*. An optimizer is not *necessary*; especially not an
aggressive optimizer, given that there are alternatives. A code
generator, a parser and a scanner and symbol table management are
necessary.

The argument is not, I repeat here for the millionth time, that
optimizers are useless or infinitely buggy, or that using them would be
a no-loss situation if they were perfect -- it is that, since they are
not necessary, and they are very critical, they must be *proven* to add
more value (performance) than they subtract (*extra*, avoidable bugs),
and that their effect is not best achieved by other means.

chased> So, should I worry more about bugs in the front ends, bugs in
chased> the editor, bugs in the OS, bugs in the debugger, or bugs in the
chased> optimizer?

You should worry a lot about them too! I am no less unhappy about the
size of vmunix, GNU Emacs, gdb and so on than about the size of gcc and
cfront and accomplices. Indeed I am consistent in other newsgroups in
complaining about the large increases in complexity and unreliability,
with little gain in performance (or loss of it) or in functionality,
that these programs give, and offering examples of equivalent technology
that is better designed and provides the same benefits at lower cost.
RISC software!

chased> Although, if people cared that much more about reliability than
chased> they do about speed, I think we'd see a lot more people
chased> programming in languages other than Fortran, C, and C++.  Of
chased> course, that's another discussion, but it probably gives some
chased> indication of what people expect from their compilers (besides
chased> perfection, of course).

Let's be temporarily diverted to that other, sociological discussion:

If they cared about speed almost all programmers would be much better
off if they used a higher level language and left code generation to the
compiler, or to a programmer's assistant, or to the authors of a
library.  What instead they expect is to write poor code themselves and
have a magic optimizer substitute it with the better code that a higher
level language would have more economically and reliably translated
into.

To get back to the admittedly exaggerated tobacco industry analogy -- if
smokers cared about their health, they would be eating carrots instead,
but they don't.  This is maybe a good reason to be a cigarette
manufacturer (there is a market demand) but maybe not a good reason to
say that if people don't care about a problem it is irrelevant.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

jlg@lanl.gov (Jim Giles) (11/08/90)

From article <PCG.90Nov6181731@odin.cs.aber.ac.uk>, by pcg@cs.aber.ac.uk (Piercarlo Grandi):
> On 5 Nov 90 22:15:54 GMT, jlg@lanl.gov (Jim Giles) said:
>
> [...]                        Do we want 100 thousand lines of compiler
> to debug when 10 thousand would do it? No... Naturally not all those 100
> thousand are involved in the critical translation business, but the
> translation business had better be as streamlined as possible.

I don't know what 100 thousand lines you are referring to.  The
techniques I am recommending make compilers smaller and easier to
write and maintain.  They also, as a side-effect of how they work,
carry out several of the optimizations that you are on record as
opposing.  I would not trust an ad-hoc compiler of the kind you
seem to be recommending to get the "hello world" example correct.

> [...]
> More code more bugs, no question about it. It is not an assumption.
> Anything that is not related to straight translation means more bugs.

In other words, ad-hoc "straight" translation is better than structured
application of known compiler techniques.  Since most of the known
techniques are off-the-shelf the compiler writer can concentrate on
getting any differences specific to his own task working correctly.

> [...]
> But these techniques require pretty long and elaborate code.  [...]

Really?  The smallest, quickest-written Modula-2 compiler I've
seen has nearly all the techniques that I've been discussing.
The goal of the thing was to produce a simple compiler that
concentrates on getting the best code generated for the least
effort by the compiler.  I don't know where this "long and elaborate
code" you're talking about is hiding.  You must be thinking about
the esoterica that I mentioned last time (and which I specifically
said I _wasn't_ talking about).

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/08/90)

In article <1990Nov6.224410.15068@murdoch.acc.Virginia.EDU> gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) writes:
> In article <2133:Nov607:16:1090@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >I observe that when I convert the original code into pointer code, the
> >compiler produces very similar results to the hand version.
> Maybe you should have tried gcc -- it does quite a bit better than cc
> on this simple exaple, *which includes no alias possibilties*.

Weird, you're right. gcc does a lot better on array code. Sun, would you
please make the obvious additions to your optimizer?

---Dan

misha@ai.mit.edu (Mike Bolotski) (11/08/90)

In article <7659:Nov620:58:5990@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

> pointers (arrays) are pre-indexed appropriately. Could one of you CS
> types please illustrate to Jim that finding optimal addition chains in a
> general computation is equivalent to solving the halting problem?

Bzzt.  But thanks for playing, Dan.

Optimal addition is NP.  Halting problem is undecidable (ever).
You can argue practicality, but not theory.
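For reference, the problem Mike alludes to is the shortest addition chain:
the fewest additions producing n from 1, each summand drawn from earlier
chain elements.  A branch-and-bound search (my own sketch, not from any
poster; the set-of-targets variant is the one known to be NP-complete)
shows the practical content of the remark, since the search space is
exponential:

```c
static int target, best;

/* Length of the binary (square-and-multiply) chain: floor(log2 n)
   doublings plus popcount(n)-1 extra additions.  An achievable upper
   bound, so the search can start from it. */
static int binary_bound(int n) {
    int bits = 0, ones = 0;
    for (int m = n; m > 1; m >>= 1) { bits++; if (m & 1) ones++; }
    return bits + ones;
}

/* Depth-first search over strictly increasing chains.  Every element
   of an optimal chain can be assumed <= n: anything larger can never
   contribute to a sum equal to n. */
static void search(int chain[], int len) {
    if (len >= best) return;                  /* can't beat what we have */
    if (chain[len] == target) { best = len; return; }
    for (int i = len; i >= 0; i--)
        for (int j = i; j >= 0; j--) {
            int s = chain[i] + chain[j];
            if (s > chain[len] && s <= target) {
                chain[len + 1] = s;
                search(chain, len + 1);
            }
        }
}

int shortest_chain(int n) {
    int chain[64] = {1};
    target = n;
    best = binary_bound(n);                   /* known-achievable start */
    search(chain, 0);
    return best;
}
```

For example, shortest_chain(15) is 5 (via 1 2 3 6 12 15), one addition
better than the binary method's 6.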

--
Mike Bolotski          Artificial Intelligence Laboratory, MIT
misha@ai.mit.edu       Cambridge, MA

gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) (11/08/90)

In article <6PX6-O@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>What's wrong with:
>
>	if(array_a_and_array_b_overlap)
>		use_safe_code
>	else
>		use_fast_code_that_doesn't_work_for_aliased_arrays.

Of course, this won't catch every situation where aliasing can occur.
However, you might be happy to know that Cray's Fortran compiler will
insert tests into runtime code to see if particular loops are vector
or scalar, in situations where a variable determines the behavior.
I've yet to see a C compiler which does this, although some (Convex)
do some compile-time checking for aliasing in array references.

See my reply to Dan for situations where this won't catch all
Fortran-style non-aliased expressions.
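Peter's pseudo-code, fleshed out in C (identifiers mine; the `restrict`
qualifier is a later addition to C, used here as the modern way to mark
the no-alias branch):

```c
#include <stddef.h>

/* Run-time dispatch on aliasing.  The pointer comparison is technically
   implementation-defined for pointers into different objects, but does
   the job on the usual flat-memory machines. */
void scale_copy(double *dst, const double *src, size_t n) {
    if (src < dst + n && dst < src + n) {
        /* Overlap: safe code, strict element order preserved. */
        for (size_t i = 0; i < n; i++)
            dst[i] = 2.0 * src[i];
    } else {
        /* No overlap: restrict tells the compiler it may vectorize
           or reorder the loop freely. */
        double *restrict d = dst;
        const double *restrict s = src;
        for (size_t i = 0; i < n; i++)
            d[i] = 2.0 * s[i];
    }
}
```

The two loop bodies are textually identical; the point is that only the
second carries a no-alias guarantee the optimizer can exploit.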

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/09/90)

On 6 Nov 90 20:12:12 GMT, smryan@garth.UUCP (Steven Ryan) said:

smryan> If it's optional and you don't want, don't use. If you can buy a
smryan> compiler without an optimiser (which is easy), do so. If nobody
smryan> wants an optimiser, nobody buys them. If somebody wants one, why
smryan> not let them buy it?

Oh, this is sociology -- the issue is not whether there is a *market*
for optimizers or not -- it is whether the market should exist, and
which needs it should satisfy.

People who buy optimizers do so not on the basis of adverts associating
them with scantily clad gorgeous females or (even more alluring to some
people :->) 64 bit superscalar CPUs.  They (should) do so on the basis
of a cost effectiveness argument, and we have been debating this.

The argument is actually twofold: as a user, should I buy an optimizer?
as a developer, should I invest in building an optimizer?

The second problem is also very important -- the computer industry is
largely driven by the supply, not the demand side, quantitatively and
qualitatively. Users look to implementors for a sense of direction. The
decisions of implementors will be often followed without questioning,
because users do not have the experience or insight and more or less
know it.

IMNHO the main reason for which the computer industry is driven by the
supply side is that talent is very scarce, and for this reason not all
possible alternatives worth exploring can be explored.

The problem, before considering market demand, can also be expressed
thus: should David Chase and Jim Giles and Preston Briggs and Hank Dietz
(just to mention some frequent contributors to this sort of discussion)
spend their time devising ever more ingenious and complicated and hairy
code generator level hacks, or would their research and development
efforts be better spent on clever compilation of higher level languages
and source level programmer's assistants? Are there enough compiler
experts to explore both possibilities? If not, which possibility looks
the best winner? Is aggressive optimization of low level languages a
technological blind alley like autocoders? Inquiring minds want to
know...

Not simple questions, and I am presenting here an unfashionable but
hopefully amusing set of answers to them, which should not be ignored.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/09/90)

On 6 Nov 90 20:48:15 GMT, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) said:

brnstnd> In article <6PX6-O@xds13.ferranti.com> peter@ficc.ferranti.com
brnstnd> (Peter da Silva) writes:

peter> What's wrong with:
peter> 	if(array_a_and_array_b_overlap)
peter> 		use_safe_code
peter> 	else
peter> 		use_fast_code_that_doesn't_work_for_aliased_arrays.

A lot of things.  Code bloat, for one.  But it is an appealing solution
to a difficult problem in many useful cases.

brnstnd> [ ... ]
peter> The whole alias problem is a compiler technology problem.
brnstnd> Exactly.

No, and here we have our largest difference: it is a language design
problem, and a problem of choosing the suitable language.

This discussion reminds me of Dijkstra's famous address in 1968 to some
(the IFIP?)  conference, on the software crisis, that went like this:

  We have a software crisis, and we are told that we must advance the
  state of the art so as to solve it. We are only given a few constraints
  to satisfy:

	  no changes to the hardware platforms
	  no changes to the languages
	  no changes to the operating systems 
	  no changes to the education of programmers
	  no changes to the software tools
	  no changes to organizational practices
	  ...

  Well, this is RIDICULOUS!

In twenty-two years progress has not been exactly astonishing, except
for a few things, like DBMSes.  There has even been some kind of
regress towards mediocrity.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/09/90)

On 6 Nov 90 20:39:13 GMT, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) said:

brnstnd> And the bugs keep coming, and coming, and coming, and coming
brnstnd> ...  I think we've successfully reduced this argument to a
brnstnd> simple question: ``Do optimizers gain more speed than they lose
brnstnd> reliability?'' Again, I'm firmly convinced that heavy
brnstnd> optimizations are on the latter side. It's pretty clear that
brnstnd> you are on the former side. In any case I think the argument is
brnstnd> finished.

Not so easily: the argument is also whether the alternatives are, at
least in principle, as effective and safer.

Because if aggressive optimization were the only alternative, we would
only have a choice between more speed and less reliability or less speed
and more reliability; and more speed can often mean a lot, and many
people do not care about reliability (e.g. the quoted bloat in vmunix,
GNU Emacs, etc...).  But there are alternatives.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

jlg@lanl.gov (Jim Giles) (11/09/90)

From article <MISHA.90Nov7191750@just-right.ai.mit.edu>, by misha@ai.mit.edu (Mike Bolotski):
> In article <7659:Nov620:58:5990@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>> pointers (arrays) are pre-indexed appropriately. Could one of you CS
>> types please illustrate to Jim that finding optimal addition chains in a
>> general computation is equivalent to solving the halting problem?
> [...]
> Optimal addition is NP.  Halting problem is undecidable (ever).
> You can argue practicality, but not theory.

Besides, the problem under discussion originally wasn't the optimal
addition chain problem anyway.  The original question was whether
there were cases when a pointer _could_ be precomputed but a compiler
_couldn't_ find the same optimization.  So, the issue is this:

   p = expression + &x;         i = expression; /* same as pointer version */
   ...real far away...          ...real far away too...
   a = *p;                      a = x[i];

The only question is "can the compiler promote the add of the base
of array 'x' up to the point where 'i' is precomputed and therefore
get the same efficiency as the precomputed pointer?"  Turns out that
this problem is not even NP, it's O(N^2) for languages which permit
arbitrary GOTOs and it's O(N) for "structured" languages (where all
constructs are guaranteed to be properly nested, etc.).  In both
cases, the number N is the number of 'basic blocks' in the code
to be optimized.  The "liveness" analysis used is required anyway
in order to do compile-time error checking adequately.
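In C, the two columns above come out as the following pair of functions
(identifiers mine).  The "promotion" is done by hand in use_hoisted;
Jim's point is that a compiler can derive it from use_index
automatically:

```c
/* Index version, as the programmer writes it. */
double use_index(const double *x, int i, int j) {
    int ii = i + j - 17;                  /* index precomputed */
    /* ... lots of intervening code ... */
    return x[ii];
}

/* Same code after the base add is promoted to the point where the
   index was computed: the later use is a plain load through p. */
double use_hoisted(const double *x, int i, int j) {
    const double *p = x + (i + j - 17);   /* base add hoisted up too */
    /* ... lots of intervening code ... */
    return *p;
}
```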

Note: even a language with GOTOs can be analyzed in O(NlogN) if the
flow graph is "reducible".  See any good book on compiler construction.

Note2: having pointed out that the optimization is possible, I should
say that it's not always desirable.  If the array 'x' is statically
allocated, many mainframes and minis have address modes which would
do the add in the memory unit for free.  See Aho, Sethi and Ullman's
`dragon' book, chapter 9.  In chapter 10 of the same book, they
work an optimization example where this optimization is possible,
but they don't do it for this very reason.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/09/90)

In article <5351@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <MISHA.90Nov7191750@just-right.ai.mit.edu>, by misha@ai.mit.edu (Mike Bolotski):
> > In article <7659:Nov620:58:5990@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >> pointers (arrays) are pre-indexed appropriately. Could one of you CS
> >> types please illustrate to Jim that finding optimal addition chains in a
> >> general computation is equivalent to solving the halting problem?
> > [...]
> > Optimal addition is NP.  Halting problem is undecidable (ever).
> > You can argue practicality, but not theory.

(Note that I have demonstrated in another article that Mike is wrong.)

> Besides, the problem under discussion originally wasn't the optimal
> addition chain problem anyway.  The original question was whether
> there were cases when a pointer _could_ be precomputed but a compiler
> _couldn't_ find the same optimization.  So, the issue is this:
>    p = expression + &x;         i = expression /* same as pointer version*/
>    ...real far away...          ...real far away too...
>    a=*p;                        a=x[i]

That's exactly what I'm talking about! The i's and j's and other
integers used to index the arrays are *added* together. The program
could have x[i + j - 17] and lots of other weird expressions, all
evaluated conditionally through the control structure of the program.
But I didn't need even this in my undecidability proof: all I needed was
x[0], x[1], and control flow.

  [ Jim waxes prolific: ]
> The only question is "can the compiler promote the add of the base
> of array 'x' up to the point where 'i' is precomputed and therefore
> get the same efficiency as the precomputed pointer?"  Turns out that
> this problem is not even NP, it's O(N^2) for languages which permit
> arbitrary GOTOs and it's O(N) for "structured" languages

I simply don't understand where you and Mike are getting these ridiculous
assertions from.  What problems are you fantasizing you've solved here?
Where are your purported proofs?

> Note2: having pointed out that the optimization is possible, I should
> say that it's not always desireable.  If the array 'x' is statically
> allocated, many mainframes and minis have address modes which would
> do the add in the memory unit for free.

Good point. All the problems reappear, though, as soon as you have more
than one addition. In x[i + 1], do you precompute *x + 1? Or *x + i? Or
the whole thing? Or maybe *x + j, because x[j] is referenced a lot more?

---Dan

jlg@lanl.gov (Jim Giles) (11/10/90)

From article <24232:Nov905:50:0690@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> That's exactly what I'm talking about! The i's and j's and other
> integers used to index the arrays are *added* together. The program
> could have x[i + j - 17] and lots of other weird expressions, all
> evaluated conditionally through the control structure of the program.
> [...]

Ok.  So how do _you_ propose to optimize this using pointers?  Do you
plan to do the following:

   p = &x + i + j - 17;
   /*... lots of code ...*/
   a = *p

If that's what you plan, then I will do:

   ii = i + j - 17
   /*... lots of code ...*/
   a = x[ii]

And the optimizer can promote the add of the base of 'x' back to the
point where 'ii' was computed.  Well, maybe you plan to do:

   if (some_condition)
      p = &x + an_expression
   else
      p = &x + another_expression;
   /*... lots of code ...*/
   a = *p

Then I'll do this:

   if (some_condition)
      ii = an_expression
   else
      ii = another_expression;
   /*... lots of code ...*/
   a = x[ii]

And the compiler can promote the add to the place where the two
if branches reconverge - or even up each branch!  This promotion
problem is O[N] in the number of basic blocks if the language is
"structured".

The fact is, no matter _how_ you precompute the pointer value, the
array index can be precomputed _exactly_ the same way except for
the add of the array base - which can be promoted back to where
the rest of the calculation is done, if the compiler writer wants it.
Of course, in the case of static arrays, you usually don't want it
to be - at least on most mainframes.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/10/90)

In article <5491@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
  [ given any code using pointer arithmetic, the compiler can convert ]
  [ it into array code that is just as fast ]

That's true. However, the compiler cannot figure out the optimal code
either way! You're ignoring the problem once again.

You *have* to let the programmer shift his arrays manually, because no
algorithm can produce the optimal shifts for a program.

Jim, you've made a blanket assertion that arrays are as efficient as
pointers. I've proven you wrong. Give up.

---Dan

meissner@osf.org (Michael Meissner) (11/12/90)

In article <123@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:

| Beat me! Whip me! Make me use an optimiser!
| 
| Isn't this kind of silly? Are there really compilers that force people to
| use the optimiser or is it optional? If it's optional and you don't want,
| don't use. If you can buy a compiler without an optimiser (which is easy),
| do so. If nobody wants an optimiser, nobody buys them. If somebody wants
| one, why not let them buy it?

I once worked on a compiler (for DG/L at Data General) which had NO
option to turn off the optimizer.  It was 9 passes plus a control
program that ran in a 64K address space (and no segments to extend the
address space), and had some optimizations that the later DG compilers
never did, such as doing interprocedural analysis with code motion
from internal subroutines to their callers.

I've also found a few cases in finishing GCC ports for the 88k and
MIPS, where optimizing the code produced correct code, and not
optimizing produced compiler error messages.  This is because
constraints in the machine description were not quite correct, but the
optimizer smoothed things over by putting variables into registers.

I personally never compile any code, except test suites, without
enabling both debug and 'full' optimization these days, unless I'm
using a vendor's PCC derived compiler.
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?

misha@ai.mit.edu (Mike Bolotski) (11/13/90)

In article <24232:Nov905:50:0690@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
|> In article <5351@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
|> 
|> (Note that I have demonstrated in another article that Mike is wrong.)

Yeah, right.

|>   [ Jim waxes prolific: ]
|> > The only question is "can the compiler promote the add of the base
|> > of array 'x' up to the point where 'i' is precomputed and therefore
|> > get the same efficiency as the precomputed pointer?"  Turns out that
|> > this problem is not even NP, it's O(N^2) for languages which permit
|> > arbitrary GOTOs and it's O(N) for "structured" languages
|> 
|> I simply don't understand where you and Mike are getting these ridiculous
|> assertions from. What problems are you fantasizing you've solved here?
|> Where are your purported proofs?

Read a book, Dan.  You've chosen to ignore Jim's suggestion of the Aho, 
Sethi, and Ullman text, *in the very article you quote.*

Chapters 9 and 10 should clear up your confusion about reducible
flow graphs, conservative vs non-conservative optimizations, and
various aspects of aliasing.  (The rest of the book is also quite
informative and would probably save this newsgroup several megabytes
of explanations.)

-- 
Mike Bolotski          Artificial Intelligence Laboratory, MIT
misha@ai.mit.edu       Cambridge, MA

jlg@lanl.gov (Jim Giles) (11/14/90)

From article <7001:Nov1008:37:2690@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> Jim, you've made a blanket assertion that arrays are as efficient as
> pointers. I've proven you wrong. Give up.

You have not even posted _one_ counter-example.  You have only
posted cases where the compiler _could_ have optimized the array
version as well as (or better than) your hand-optimized pointer
version.  The point you seem to miss is that _all_ of your
hand-optimizations apply to the array versions as well.  The only
difference between one-d array notation and pointers is the place at
which the _base_ of the array is added.  I have amply demonstrated
that _this_ part of the optimization can _easily_ be handled by the
compiler.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/14/90)

In article <5810@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <7001:Nov1008:37:2690@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > Jim, you've made a blanket assertion that arrays are as efficient as
> > pointers. I've proven you wrong. Give up.
> You have not even posted _one_ counter-example.

As part of my proof I have posted the general form that a counterexample
would take. Namely, if you show me an algorithm that can supposedly
shift arrays optimally, I'll find a program for which you cannot detect
which of two branches is taken (independently of the input). I'll then
use x[0] (or x[a + b], or whatever) in one branch, and x[1] (or
whatever) in the other branch. You won't be able to figure out whether
it's better to precompute &x[0] or to precompute &x[1]. Such a program
must exist.
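A concrete instance of the schema (my construction, not code from Dan's
posted proof; the predicate is a stand-in for any computation that
static analysis cannot decide in general):

```c
/* Whether the Collatz iteration from k takes an even number of steps
   is exactly the kind of predicate no optimizer resolves at compile
   time -- its termination for all k is itself an open conjecture. */
static int collatz_even_steps(unsigned k) {
    int steps = 0;
    while (k != 1) {
        k = (k % 2) ? 3 * k + 1 : k / 2;
        steps++;
    }
    return steps % 2 == 0;
}

double pick(const double *x, unsigned k) {
    const double *p;
    if (collatz_even_steps(k))
        p = &x[0];        /* is THIS the address worth precomputing... */
    else
        p = &x[1];        /* ...or this one?  No algorithm always knows. */
    /* ... lots of code ... */
    return *p;
}
```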

However, no counterexample can exist for *all* algorithms, just as no
single proof can exist that *all* computer programs will fail to find.
Surely you understand this.

> The point you seem to miss is that _all_ of your hand-optimizations
> apply to the array versions as well.

You're quite right.

It's also true that a computer program can read in a rigorous, formal
proof, and in linear time either say it's correct or exhibit the error.

That is irrelevant to *finding* the proof in the first place, just as
your statement is irrelevant to *finding* the hand optimizations in the
first place.

---Dan

jlg@lanl.gov (Jim Giles) (11/14/90)

From article <5059:Nov1323:43:2290@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> As part of my proof I have posted the general form that a counterexample
> would take. Namely, if you show me an algorithm that can supposedly
> shift arrays optimally, I'll find a program for which you cannot detect
> which of two branches is taken (independently of the input). I'll then
> use x[0] (or x[a + b], or whatever) in one branch, and x[1] (or
> whatever) in the other branch. You won't be able to figure out whether
> it's better to precompute &x[0] or to precompute &x[1]. Such a program
> must exist.

Fine.  The compiler won't be able to determine this one.  Neither will
the human programmer.  If the programmer knows a priori which branch
will be taken, why is there a branch?  And, if he doesn't know a priori
which branch will be taken, then he doesn't know which pointer to
precompute.  And, since he doesn't know which to precompute, he must
guess: which is what the compiler will do as well.

Now, there are certain trade-offs about who does the guessing.  The
programmer _may_ have advance information about the relative frequency
of the branches taken.  For this reason, it might be better to let
the programmer do the guessing.  The compiler, on the other hand,
knows the register allocation and the instruction pipelining: so
it knows which of the branches offers the best chance of completely
covering the precomputing of the array index.  For this reason, it
might be best to let the compiler choose.  The best solution is
probably to let the compiler choose and give the programmer the
ability to assert information about the frequency of each branch.

In any case, this contrived example only saves a fractional add at
best (an add is saved each time you take the branch that you
precomputed for, one is lost each time you take the other branch,
the average "savings" is a fraction of an add).  The problem is,
it's possible that the programmer will pick the wrong one to
precompute.  He may pick the most frequently taken branch to
precompute for and then lose time anyway because the computation on
the other branch would have pipelined better.  On most compilers, it
will probably decide to precompute _both_ anyway - even when you use
explicit pointers.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/14/90)

In article <5839@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <5059:Nov1323:43:2290@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > As part of my proof I have posted the general form that a counterexample
> > would take. Namely, if you show me an algorithm that can supposedly
> > shift arrays optimally, I'll find a program for which you cannot detect
> > which of two branches is taken (independently of the input). I'll then
> > use x[0] (or x[a + b], or whatever) in one branch, and x[1] (or
> > whatever) in the other branch. You won't be able to figure out whether
> > it's better to precompute &x[0] or to precompute &x[1]. Such a program
> > must exist.
> Fine.  The compiler won't be able to determine this one.

Thank you for admitting it! I'm glad we agree that you were wrong when
you said that the computer can make arrays as efficient as pointers in
all cases.

> Neither will
> the human programmer.

Bullshit. You removed all my analogies to theorem-proving: the analogy
here is ``Fine. The computer won't be able to figure out proofs for this
one. Neither will the mathematician.'' Just because no single algorithm
will handle every case doesn't mean people can't!

> If the programmer knows a priori which branch
> will be taken, why is there a branch?

Maybe he didn't realize at first. But I know what you're saying, and in
the proof (which you apparently haven't read) I pointed out why that's
irrelevant: all you need is for one branch to be taken with much higher
probability than another. As I'm sure Herman will instantly agree, this
is an extremely common case in real numerical code.

> And, if he doesn't know a priori
> which branch will be taken, then he doesn't know which pointer to
> precompute.  And, since he doesn't know which to precompute, he must
> guess:

No. This is like saying ``And, since he doesn't know a priori how to
prove the theorem, he won't be able to find a proof.'' All the best
efforts of AI aside, we must assume that, for any given program, *some*
programmer will be able to optimize better than the compiler. We must
not take away his tools for doing so.

  [ a reasonable summary of programmer-versus-compiler in general ]
> The best solution is
> probably to let the compiler choose and give the programmer the
> ability to assert information about the frequency of each branch.

Some people would argue for a third solution, namely profiling; but
there are many cases when basing optimizations on profiling can be very
dangerous. (You kill your worst case in favor of your average case.)

Anyway, I don't care whether you let the programmer specify array shifts
imperatively or functionally. Just let him do it.

> In any case, this contrived example

``Simplified example,'' since you refused to think through real
examples and come to the same conclusion.

> only saves a fractional add at
> best (an add is saved each time you take the branch that you
> precomputed for, one is lost each time you take the other branch,
> the average "savings" is a fraction of an add).

Cripes. What if there are several pointers (arrays) being dealt with?
What if this is in a loop that will be executed a trillion times? We are
talking about optimizations here, so I don't understand your attitude.

> The problem is,
> it's possible that the programmer will pick the wrong one to
> precompute.

That's true. People make mistakes. So what? Why do you keep bringing up
these points that we all accept? Lots of things are possible. Saying
that the programmer shouldn't be *allowed* to make this choice because
he might make a mistake is like saying that people shouldn't program in
the first place because they might make a mistake. Who cares?

> On most compilers, it
> will probably decide to precompute _both_ anyway - even when you use
> explicit pointers.

Sometimes this isn't possible. Remember that space optimization is
important too: if the less-executed branch is only executed one
millionth of the time, the code and data space for the extra computation
is not worthwhile.

---Dan

cik@l.cc.purdue.edu (Herman Rubin) (11/14/90)

In article <5810@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> From article <7001:Nov1008:37:2690@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > [...]
> > Jim, you've made a blanket assertion that arrays are as efficient as
> > pointers. I've proven you wrong. Give up.
> 
> You have not even posted _one_ counter-example.  You have only
> posted things the the compiler _could_ have optimized the array
> version as well (or better than) your hand optimized pointer version.

Well, here is an example which will definitely not work on all machines.
It is desired to move 3 bytes into the three least significant bytes of
a 4-byte word (other sizes are appropriate), and repeat this operation.
This can be done efficiently by using pointers to words on a machine with
unaligned reads.  Even with overhead for unaligned reads, it is hard to
see how to do this efficiently otherwise.
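One way to read Herman's example in C (my reconstruction, not his code;
it assumes a little-endian machine and uses memcpy as the portable
spelling of an unaligned word load, which compilers reduce to a single
instruction where the hardware allows it):

```c
#include <string.h>
#include <stdint.h>

/* Gather n packed 3-byte records into the low 3 bytes of n 32-bit
   words.  Instead of assembling each value from three byte loads, read
   a whole (unaligned) word and mask.  The source buffer needs one byte
   of slack after the last record, since the final read overshoots. */
void pack3(const unsigned char *src, uint32_t *dst, int n) {
    for (int i = 0; i < n; i++) {
        uint32_t w;
        memcpy(&w, src + 3 * i, 4);   /* unaligned 4-byte read */
        dst[i] = w & 0xFFFFFFu;       /* keep the three low bytes */
    }
}
```

The byte-at-a-time alternative needs three loads, two shifts, and two
ORs per word, which is Herman's point about the word-pointer version.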
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin(UUCP)

jlg@lanl.gov (Jim Giles) (11/15/90)

From article <4857:Nov1405:39:3590@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> Thank you for admitting it! I'm glad we agree that you were wrong when
> you said that the computer can make arrays as efficient as pointers in
> all cases.

I admitted nothing of the sort.  I admitted that there were
optimizations the _humans_ could find that compilers couldn't.  I
_don't_ admit that pointers are in any way superior to arrays in
allowing humans to express those optimizations to the compiler.

I also pointed out that the _human_ is not in possession of important
information about the algorithm that the _compiler_ does know (like
pipelining, instruction selection, and register allocation - which
the compiler makes the final judgement about even in languages which
allow programmers to hint about things like this).  Since the human
doesn't know this stuff, his choice of which to optimize in the case
you gave is usually pure guesswork.

> [...]
>   [ a reasonable summary of programmer-versus-compiler in general ]
>> The best solution is
>> probably to let the compiler choose and give the programmer the
>> ability to assert information about the frequency of each branch.
> 
> Some people would argue for a third solution, namely profiling; but
> there are many cases when basing optimizations on profiling can be very
> dangerous. (You kill your worst case in favor of your average case.)

Which is why it is better to let the programmer give assertions.  The
human is then free to decide whether the worst case is important enough
to optimize or not.

> [...]
> Anyway, I don't care whether you let the programmer specify array shifts
> imperatively or functionally. Just let him do it.

Quite.  When did I say otherwise?

> [...]
>> The problem is,
>> it's possible that the programmer will pick the wrong one to
>> precompute.
> 
> That's true. People make mistakes. So what?  [...]

Well, since you don't seem to know, I'll tell you so what.  In this
particular case, even _very_ clever people will make mistakes at about
a rate of 50%.  The only language type which allows programmers to
make this decision with a better batting average is assembly (where
the programmer does _all_ register allocation and instruction
selection).

> [...]                                                         Saying
> that the programmer shouldn't be *allowed* to make this choice because
> he might make a mistake is like saying that people shouldn't program in
> the first place because they might make a mistake. Who cares?

Further, who said anything about not allowing the user to make the
choice?  I didn't.  I only said that the user should not be forced
to the level of pointers to have this type of control.

> [...]
>> On most compilers, it
>> will probably decide to precompute _both_ anyway - even when you use
>> explicit pointers.
> 
> Sometimes this isn't possible. Remember that space optimization is
> important too: if the less-executed branch is only executed one
> millionth of the time, the code and data space for the extra computation
> [...]

Oh, your example implied that space wasn't an issue.  You wrote
glibly about having a choice of saving &x[0] or &x[1].  Clearly, in
all programming languages I know of, &x[0] is _automatically_ saved
for the whole lifespan of the array.  Only in C is it even _possible_
to lose the base address of an array (without losing the _whole_
array at the same time).  So, if you're talking about precomputing
&x[1] and saving it _in_addition_ to &x[0], you obviously didn't
think saving space was meaningful.  If you were talking about
precomputing &x[1] and saving it _instead_of_ &x[0], I would object
that it's bad style to lose track of the base of the array.

J. Giles

jlg@lanl.gov (Jim Giles) (11/15/90)

From article <2732@l.cc.purdue.edu>, by cik@l.cc.purdue.edu (Herman Rubin):
> [...]
> Well, here is an example which will definitely not work on all machines.
> It is desired to move 3 bytes into the three least significant bytes of
> a 4-byte word (other sizes are appropriate), and repeat this operation.
> This can be done efficiently by using pointers to words on a machine with
> unaligned reads.  Even with overhead for unaligned reads, it is hard to
> see how to do this efficiently otherwise.

It's hard to see what this has to do with the array vs. pointer
issue which was being discussed.  It looks to me like you've got
a word that's being 'mapped' as an array of bytes and your loop
is filling the bottom three bytes.  Where are the pointers?

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)

Look, Jim. You were saying a few weeks ago that arrays (as in Fortran,
or Pascal, or Ada) could always be made as efficient as pointers (as in
C, or, of course, in machine language). You didn't prove your claim, and
various people disagreed with you. Now I've shown that they were right
and your claim is wrong. There is *no* way that an automatic optimizer
can always produce the optimal pointer version of pointer-free code.
Give up.

The crucial property of pointers here is that they can express array
shifts. I am perfectly willing to accept that C's pointer arithmetic may
not be the best way to express array shifts, and I'm interested in your
views on better ways to express them. But Fortran doesn't have them.
Neither does Pascal. Neither does Ada. Those languages are suboptimal.

Detailed responses follow.

In article <5950@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <4857:Nov1405:39:3590@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > Thank you for admitting it! I'm glad we agree that you were wrong when
> > you said that the computer can make arrays as efficient as pointers in
> > all cases.
> I admitted nothing of the sort.  I admitted that there were
> optimizations the _humans_ could find that compilers couldn't.

Somehow I don't see the difference, but you said it better. There are
optimizations of array code into pointer code that humans can find and
that compilers can't. Glad we agree.

> I
> _don't_ admit that pointers are in any way superior to arrays in
> allowing humans to express those optimizations to the compiler.

Huh? I defined what I meant by those terms pretty precisely in my proof.
Maybe your idea of ``arrays'' includes array shifting, or pointer
arithmetic, or whatever you want to call it. That has nothing to do with
arrays in Fortran, or Pascal, or Ada.

> I also pointed out that the _human_ is not in possession of important
> information about the algorithm that the _compiler_ does know

Sure, but that's irrelevant to this subthread.

> Which is why it is better to let the programmer give assertions.

Again, I don't care how you let the programmer do his work. However, you
are abusing the word ``assertions'' in the same way that you are abusing
``array'' above. Assertions in existing languages (and compilers) simply
don't have the power to express the level of decisionmaking that has to
go on here.

> >> The problem is,
> >> it's possible that the programmer will pick the wrong one to
> >> precompute.
> > That's true. People make mistakes. So what?  [...]
> Well, since you don't seem to know, I'll tell you so what.  In this
> particular case, even _very_ clever people will make mistakes at about
> a rate of 50%.

Obviously you like hand optimization about as much as you like C. You
have the same amount of prejudice against each. Since you never bother
to learn how to optimize by hand or to program in C, you never believe
that either can be effective. The vicious cycle continues.

> Further, who said anything about not allowing the user to make the
> choice?  I didn't.  I only said that the user should not be forced
> to the level of pointers to have this type of control.

So propose a different mechanism!

Q has a set of array slicing mechanisms more powerful than Fortran 90's.
I *think* they eliminate the need for pointer arithmetic. I've thought
so for months. But I'm not sure. I just haven't tried it on enough real
code to be absolutely sure that it does the job, and I haven't spent
enough time working it out on paper.

> If you were talking about
> precomputing &x[1] and saving it _instead_of_ &x[0], I would object
> that it's bad style to lose track of the base of the array.

Oh, great. We're talking about optimizations and now you introduce
``style.'' Gee, Jim, what do you think of the ``style'' of optimized
output from the Convex compiler? I must say it's so unfashionable.
Ultrix is much more hip.

Of course I was talking about saving &x[1] instead of &x[0]. If you had
read the proof you would understand this.

---Dan

jlg@lanl.gov (Jim Giles) (11/16/90)

From article <5494:Nov1519:06:3790@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> Look, Jim. You were saying a few weeks ago that arrays (as in Fortran,
> or Pascal, or Ada) [...]

Actually, I said arrays as in _my_ suggested language.  I used Pascal,
etc. as examples that would be familiar to the rest of the net in order
to counter your weak attempts to make pointers look good.  You have
yet to pose a _specific_ example of a case that even these languages
couldn't optimize.

There is some loudmouth on the net who (in another thread) is making
very insulting remarks about computing scientists being too theoretical
and not being practical.  He is insisting on specific practical experience
as a measure of a program's efficiency.

In this discussion, _you_ are refusing to give a specific example on
the grounds that no particular example can fool _all_ optimizers.
Fair enough.  But if the ones that fool a specific optimizer are
so _rare_ that you can't find one, then for _practical_ purposes, the
problem doesn't exist.

> [...]                    There is *no* way that an automatic optimizer
> can always produce the optimal pointer version of pointer-free code.

There is (by _your_ own earlier statements) no automatic optimizer
that can always produce the optimal pointer version of code that
_has_ pointers either.  You have been consistently pointing out that
the optimizations you are referring to must be carried out by hand.
Fine.  I prefer to make _specific_ statements to the optimizer about
what it should do.  That way, no one (like the programmer himself
at a later date) will get confused about why the code is stated in
the way it is.  So, an explicit directive to the compiler that a
path through the code is more common than another path will be
specific (and easier to verify later).  An implicit statement
of that same fact (like, by reordering the computation of pointer
expressions) will lead the later readers of the code to ask "why
did he/I do that?"

> [...]   I am perfectly willing to accept that C's pointer arithmetic may
> not be the best way to express array shifts, and I'm interested in your
> views on better ways to express them. [...]

Finally.  [...]

> [...]                                 But Fortran doesn't have them.
> Neither does Pascal. Neither does Ada. Those languages are suboptimal.

Yet, all these languages can handle the _specific_ examples that you've
posted.  Frequency assertions to the compiler would solve the rest of
the ones that you claim are "theoretically" out there.

> [...]
>> I admitted nothing of the sort.  I admitted that there were
>> optimizations the _humans_ could find that compilers couldn't.
> 
> Somehow I don't see the difference, but you said it better. There are
> optimizations of array code into pointer code that humans can find and
> that compilers can't. Glad we agree.

We don't.  The following statement is part of the same paragraph and
thought as the preceding.

>> I
>> _don't_ admit that pointers are in any way superior to arrays in
>> allowing humans to express those optimizations to the compiler.

This is what _I_ said.

> [...]
>> I also pointed out that the _human_ is not in possession of important
>> information about the algorithm that the _compiler_ does know
> 
> Sure, but that's irrelevant to this subthread.

Why is everything that doesn't support _your_ position suddenly
not relevant to the thread?  Your entire claim here is that the
human can _always_ beat the compiler at its own game.  This is
only true when the human has perfect knowledge of what the compiler's
game is.

Again, I don't oppose letting the programmer have the tools to
_try_ to beat the compiler with.  I'm just pointing out that it
_is_ relevant that the compiler has some hidden cards in its hands.

>> Which is why it is better to let the programmer give assertions.
> 
> Again, I don't care how you let the programmer do his work. However, you
> are abusing the word ``assertions'' in the same way that you are abusing
> ``array'' above. Assertions in existing languages (and compilers) simply
> don't have the power to express the level of decisionmaking that has to
> go on here.

Again, you are restricting the argument to a subset of the original
subject.  In your f77 vs. cc thread about interprocess aliasing, the
thread was begun by _you_ and was restricted to _existing_ languages.
This thread has _no_ such restrictions.

So, the fact that assertions in existing languages don't do this
particular thing (express the expected frequency of a decision
point), is _not_ an argument against using this solution.  What
_is_ an argument against using this is that there are, as yet,
no _specific_ examples which require it.

> [...]
> Obviously you like hand optimization about as much as you like C. You
> have the same amount of prejudice against each. Since you never bother
> to learn how to optimize by hand or to program in C, you never believe
> that either can be effective. The vicious cycle continues.

I hand optimize in ASSEMBLY all the time.  I can pipeline and do
register allocation better than the compilers I have (which, by the
way are _really_ good: most people _can't_ beat them - I'm bragging,
but it's really true:  I wouldn't bother to use assembly if I
couldn't beat the compiler :-).  I don't hand optimize C because the
compiler usually re-pessimizes the code.

As to whether C coding is effective, I can only judge from
experience.  Every code that I've ever recoded from C to some other
language runs _faster_ in the other language.  This is a _uniform_
observation with no exceptions _at_all_.  This _may_ be that the C
code available to me is abnormally bad.  But, it seems unlikely.  It
seems more likely that the C code I've seen is typical.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)

The saga continues: Jim pesters Dan for a *specific* example of code
that real optimizers can't handle. Dan is surprised. He thought that Jim
was reading the group while people showed specific examples that tripped
up Sun's optimizer. But just to be sure he gives an example that trips
up gcc. Life goes on.

In article <6063@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <5494:Nov1519:06:3790@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > Look, Jim. You were saying a few weeks ago that arrays (as in Fortran,
> > or Pascal, or Ada) [...]
> Actually, I said arrays as in _my_ suggested language.  I used Pascal,
> etc. as examples that would be familiar to the rest of the net in order
> to counter your weak attempts to make pointers look good.  You have
> yet to pose a _specific_ example of a case that even these languages
> couldn't optimize.

You want a specific example? Were you reading this group when somebody
pointed out how Sun's optimizer couldn't produce even near-optimal code
for a simple loop through an array? Matt pointed out that gcc did much
better. gcc, however, will produce suboptimal code for

  for (j = 0;j < 100;j++)
    a[i * j] = ...;

As you now understand, no optimizer will handle all cases. For any
optimizer you show me, there exists array code that it can't convert
into pointer code as well as possible. What matters in practice is that
real optimizers trip up on such simple code. Even good Fortran
optimizers don't do very well on moderately complex code.

> There is some loudmouth on the net who (in another thread) is making
> very insulting remarks about computing scientists being too theoretical
> and not being practical.  He is insisting on specific practical experience
> as a measure of a program's efficiency.

Yes, and I'm sure he will continue to do so. He is not a computer
scientist: he uses computers to get things done. He gets annoyed when
computer scientists make ridiculous assertions about real-world problems
just because their models aren't sufficiently accurate or because they
haven't bothered to think.

> In this discussion, _you_ are refusing to give a specific example on
> the grounds that no particular example can fool _all_ optimizers.
> Fair enough.

Thank you for saying so. So you *do* admit that you were wrong when you
said that arrays (as in Fortran) can always be made as efficient as
pointers (as in C)? Yes? You do? Stop beating around the bush. A week
ago you were saying that arrays were as efficient as pointers, and that
somewhat better compilers than we have today would suffice. Now we know
that you were wrong. Do you admit it?

By the way, I fully expect that you will chop up the above paragraph and
fail to quote the most important parts. That's your argument style.

> But if the ones that fool a specific optimizer are
> so _rare_ that you can't find one,

When did that ridiculous idea come out of the blue? I merely thought
that you had been paying attention to the examples already posted in
this group.

> > [...]                    There is *no* way that an automatic optimizer
> > can always produce the optimal pointer version of pointer-free code.
> There is (by _your_ own earlier statements) no automatic optimizer
> that can always produce the optimal pointer version of code that
> _has_ pointers either.

Where did that ridiculous idea come out of the blue? I don't know what
earlier statements you're referring to. I certainly don't agree with the
conclusion.

> >> I also pointed out that the _human_ is not in possession of important
>> information about the algorithm that the _compiler_ does know
> > Sure, but that's irrelevant to this subthread.
> Why is everything that doesn't support _your_ position suddenly
> not relevant to the thread?

Huh? It supports neither my position nor the opposite of my position in
this subthread. It's simply irrelevant. See below.

> Your entire claim here is that the
> human can _always_ beat the compiler at its own game.

Where did that ridiculous idea come out of the blue? I have never said
any such thing. I have said that it is *not* true that the compiler can
always beat the human at optimizations. Theory tells us that automatic
tools will never be good enough; in practice, programmers who put the
effort into hand optimization will get better results than optimizers.

As one specific case, Fortran arrays can't be shifted; machine language
arrays (pointers) can. Hence it is not true that the compiler can always
convert Fortran code into machine language code making the best use of
pointers. Hence the programmer should be given array shifting tools
himself---through, for instance, C pointers.

I certainly don't believe that the compiler is impotent; in fact, I said
a week ago in this group that I trust the compiler to do (low-level)
instruction scheduling. Herman disagreed; take it to him.

> Again, I don't oppose letting the programmer have the tools to
> _try_ to beat the compiler with.  I'm just pointing out that it
> _is_ relevant that the compiler has some hidden cards in its hands.

Why? I just don't see its relevance to this issue. All I've been trying
to do in this subthread is make it clear that C pointers, with their
``awful'' arithmetic, can make some code more efficient than in Fortran.
That's all.

> > Obviously you like hand optimization about as much as you like C. You
> > have the same amount of prejudice against each. Since you never bother
> > to learn how to optimize by hand or to program in C, you never believe
> > that either can be effective. The vicious cycle continues.
> I hand optimize in ASSEMBLY all the time.

Good to see you're awake. However, I think you'd find some rather
impressive gains from hand optimization in any language, if you just
took the time to practice.

---Dan

jlg@lanl.gov (Jim Giles) (11/16/90)

From article <9576:Nov1523:11:0990@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]                               Matt pointed out that gcc did much
> better. gcc, however, will produce suboptimal code for
> 
>   for (j = 0;j < 100;j++)
>     a[i * j] = ...;

Huh? This is a _trivial_ example of induction variable elimination.
I suspect that the compiler couldn't optimize it because it couldn't
tell whether 'i' was aliased to part of 'a' or not.  This only happens
in languages in which pointers and arrays are confused.  It doesn't
happen in languages which _really_ have arrays (as opposed to a shorthand
that's preprocessed into pointers).

> [...]
>> I hand optimize in ASSEMBLY all the time.
> 
> Good to see you're awake. However, I think you'd find some rather
> impressive gains from hand optimization in any language, if you just
> took the time to practice.

I do hand optimize in other languages.  The technique is just different
in each.  In high-level languages (where I don't have direct control
over instruction selection or register allocation), I don't _try_
to hand optimize in ways which require such control to be successful.
In C, I hand optimize by converting it into another language which
runs faster.

J. Giles

By the way, this is my _last_ response to Dan Bernstein.  It is not
worth the aggravation to try to discuss anything with someone whose
only discussion technique is invective.  If (a very implausible
if) Dan ever writes something with useful technical merit I will
discuss it in a general way without naming names.  I suspect that
the only reason he doesn't discuss things more rationally is that
he's afraid to admit he's wrong.  If I don't mention him directly,
maybe he won't feel personally threatened.

gl8f@astsun9.astro.Virginia.EDU (Greg Lindahl) (11/16/90)

In article <9576:Nov1523:11:0990@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>Matt pointed out that gcc did much better.

My name is spelled "Greg". Matt Crawford may also be an astronomer,
but my hair is much longer. Trivial to tell us apart.

> gcc, however, will produce suboptimal code for
>
>  for (j = 0;j < 100;j++)
>    a[i * j] = ...;

If you forget to turn on strength reduction, you deserve what you get.

>Even good Fortran
>optimizers don't do very well on moderately complex code.

I showed you code in which FORTRAN does well, and you have to
hand-optimize C to beat it. Please don't forget history, or you will
have to repeat it. The best part of "discussing" anything with you is
that you refuse to learn.

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)

In article <6096@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <9576:Nov1523:11:0990@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > [...]                               Matt pointed out that gcc did much
> > better. gcc, however, will produce suboptimal code for
> >   for (j = 0;j < 100;j++)
> >     a[i * j] = ...;
> Huh? This is a _trivial_ example of induction variable elimination.

I agree entirely. But gcc can't do it. (Sun's optimizer sort of does,
though as usual it fails to compress the j computation.)

All you wanted was real examples for real optimizers. I had thought that
you were listening when other people posted real examples. Apparently
not. Now I've posted the examples for you.

> I suspect that the compiler couldn't optimize it because it couldn't
> tell whether 'i' was aliased to part of 'a' or not.

No, there was no danger of this. Sun's optimizer can tell. gcc's can
too, when the ``i *'' is removed.

> In C, I hand optimize by converting it into another language which
> runs faster.

I think your definition of ``hand optimize'' is a bit screwy here,
though I sympathize with what you're saying. The way people optimize Ada
is to write it in a different language first, then run it through a
preprocessor.

> By the way, this is my _last_ response to Dan Bernstein.  It is not
> worth the aggravation to try to discuss anything with someone whose
> only discussion technique is invective.

You always do this. Whenever you lose an argument you first blithely
assume that you were right all along, then try to squirm out of your
mistakes, then give up communication.

> If I don't mention him directly,
> maybe he won't feel personally threatened.

Stop projecting your paranoia onto other people.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)

In article <1990Nov16.022829.19283@murdoch.acc.Virginia.EDU> gl8f@astsun9.astro.Virginia.EDU (Greg Lindahl) writes:
> In article <9576:Nov1523:11:0990@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >Matt pointed out that gcc did much better.
> My name is spelled "Greg".

Sorry, misspoke.

> > gcc, however, will produce suboptimal code for
> >  for (j = 0;j < 100;j++)
> >    a[i * j] = ...;
> If you forget to turn on strength reduction, you deserve what you get.

I turned on strength reduction; it gets in trouble with the variable
multiplication. Maybe the version here is older than yours?

> >Even good Fortran
> >optimizers don't do very well on moderately complex code.
> I showed you code in which FORTRAN does well, and you have to
> hand-optimize C to beat it.

All I meant by ``don't do very well'' is that I can easily do better in
machine language. We are, after all, talking about whether Fortran
arrays are always as efficient as pointers (as in machine language, for
instance). I didn't say anything about C.

I pointed out that hand optimization only because it seemed appropriate.
I also idly wondered whether there was any connection between the hand
optimization thread and the pointer thread, and explicitly refused to
make any philosophical comments about it. I'm still not sure what can be
said in general.

So your response really looks like a non sequitur, and a rather vapid
one at that.

---Dan

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/16/90)

Please do NOT take this seriously!

In article <6063@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> As to whether C coding is effective, I can only judge from
> experience.  Every code that I've ever recoded from C to some other
> language runs _faster_ in the other language.  This is a _uniform_
> observation with no exceptions _at_all_.

I'd _love_ you to recode some C programs into Lisp for me.

(I'd also like to see you recode my screen editor in Cobol. <grin>)

> This _may_ be that the C
> code available to me is abnormally bad.  But, it seems unlikely.  It
> seems more likely that the C code I've seen is typical.

I would argue that it is typical in being abnormally bad.
I suggest that the basic problem in the text I've quoted is that
C is a good language for people who know what they are doing, but
that there is rather more crud around in C than Sturgeon's Law would
predict, and that Jim Giles is just a better programmer than most.
Certainly someone who knows about Fortran 90 has more ways available
of thinking about a program than someone who only knows the two
nearly identical languages C and Pascal (gee, it would be nice if C
had arrays).

Bearing in mind that every time I've recoded from Pascal or Fortran
into C, I've had a speedup, and that 2:1 ratio in the speed of code
produced by different C compilers on the same machine is quite common,
may I respectfully suggest
	[here comes the serious bit]
that if anyone wants to argue "my language _is_ faster than yours"
they present actual examples (such and such a program available by
anonymous FTP from such and such a site) with hard numbers, and
that people wanting to argue "my language _could_ be faster than
yours" remember that a major factor in determining whether it _will_
be faster is the _market_ for better compilers?  (Does anyone sell
high-quality Jovial compilers for Encore Multimaxes?  I doubt it.)
	[end of serious bit]

I'm reminded of a debate we had when I was an undergraduate.
The topic was "will high-level languages effectively replace
assembler?"  Both sides were wrong, the answer was "no, but C will".

-- 
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.

seanf@sco.COM (Sean Fagan) (11/18/90)

*AAAAAAAAARRRRRRRRRRRRRRGGGGGGGGGGGGGGGGGGHHHHHHHHHHHHHHHHHHHHHHH!!!!!!!!!!!*

-- 
-----------------+
Sean Eric Fagan  | "*Never* knock on Death's door:  ring the bell and 
seanf@sco.COM    |   run away!  Death hates that!"
uunet!sco!seanf  |     -- Dr. Mike Stratford (Matt Frewer, "Doctor, Doctor")
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

meissner@osf.org (Michael Meissner) (11/20/90)

In article <9576:Nov1523:11:0990@kramden.acf.nyu.edu>
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

| You want a specific example? Were you reading this group when somebody
| pointed out how Sun's optimizer couldn't produce even near-optimal code
| for a simple loop through an array? Matt pointed out that gcc did much
| better. gcc, however, will produce suboptimal code for
| 
|   for (j = 0;j < 100;j++)
|     a[i * j] = ...;
| 
| As you now understand, no optimizer will handle all cases. For any
| optimizer you show me, there exists array code that it can't convert
| into pointer code as well as possible. What matters in practice is that
| real optimizers trip up on such simple code. Even good Fortran
| optimizers don't do very well on moderately complex code.

GCC does do the optimization on the MIPS (provided you use strength
reduction).  I suspect the array a was an integer array; since the
Sparc does not have an integer multiply instruction, gcc internally
turned the expression into a call to __mulsi3, which it no longer
recognizes as a multiplication.
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?

smryan@garth.UUCP (Steven Ryan) (11/20/90)

>Fine.  The compiler won't be able to determine this one.  Neither will
>the human programmer.  If the programmer knows a priori which branch

I haven't been following all the fascinating details (yawn) of this
discussion, but if the arguments are based on the assumption that you
need only consider what a reasonable programmer would do, they are
invalid.

Unless you have an algorithm for distinguishing reasonable and
unreasonable programs.

Regardless of whether humans are computationally more powerful than
computers, an argument must consider any possible program, whether
from a reasonable programmer or a roomful of Cobol programmers (or
monkeys, if there is any difference).

For my sake, please, just pretend to be rigorous. Like all that nonsense
that computer=fsm without any demonstration that the total external media
is a priori bounded.
-- 
...!uunet!ingr!apd!smryan                                       Steven Ryan
...!{apple|pyramid}!garth!smryan              2400 Geng Road, Palo Alto, CA

bglenden@mandrill.cv.nrao.edu (Brian Glendenning) (11/22/90)

In article <PCG.90Nov8163939@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
   This discussion reminds me of Dijkstra's famous address in 1968 to some
   (the IFIP?)  conference, on the software crisis, that went like this:

     We have a software crisis, and we are told that we must advance the
     state of the art so as to solve it. We are only given a few constraints
     to satisfy:

	     no changes to the hardware platforms
	     no changes to the languages
	     no changes to the operating systems 
	     no changes to the education of programmers
	     no changes to the software tools
	     no changes to organizational practices
	     ...

     Well, this is RIDICULOUS!

Does anyone have the correct reference for this (or even the actual
text?) Thanks!

Brian
--
       Brian Glendenning - National Radio Astronomy Observatory
bglenden@nrao.edu          bglenden@nrao.bitnet          (804) 296-0286

meeuse@idca.tds.PHILIPS.nl (Jaap A. Meeuse) (11/23/90)

In article <BGLENDEN.90Nov21143012@mandrill.cv.nrao.edu> bglenden@mandrill.cv.nrao.edu (Brian Glendenning) writes:
>In article <PCG.90Nov8163939@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>   This discussion reminds me of Dijkstra's famous address in 1968 to some
>   (the IFIP?)  conference, on the software crisis, that went like this:
>
>     We have a software crisis, and we are told that we must advance the
>     state of the art so as to solve it. We are only given a few constraints
>     to satisfy:
>
>	     no changes to the hardware platforms
>	     no changes to the languages
>	     no changes to the operating systems 
>	     no changes to the education of programmers
>	     no changes to the software tools
>	     no changes to organizational practices
>	     ...
>
>     Well, this is RIDICULOUS!
>
>Does anyone have the correct reference for this (or even the actual
>text?) Thanks!
>
>Brian
>--
>       Brian Glendenning - National Radio Astronomy Observatory
>bglenden@nrao.edu          bglenden@nrao.bitnet          (804) 296-0286

The actual text:

Dijkstra: I would like to comment on the distinction that has been made
between practical and theoretical people. I must stress that I feel this
distinction to be obsolete, worn out, inadequate and fruitless. It is just no
good, if you want to do anything reasonable, to think that you can work
with such simple notions. Its inadequacy, amongst other things, is shown by 
the fact that I absolutely refuse to regard myself as either impractical or
not theoretical.
  David expressed the idea that "We can make case studies in industry, and
then you can study the results. You can do these analyses." A probable effect
of the distinction is that if such a study is made, the output of it can be
regarded as theory and therefore ignored. What is actually happening, I am
afraid, is that we all tell each other and ourselves that software engineering
techniques should be improved considerably, because there is a crisis. But
there are a few boundary conditions which apparently have to be satisfied. I
will list them for you:
1 We may not change our thinking habits.
2 We may not change our programming tools.
3 We may not change our hardware.
4 We may not change our tasks.
5 We may not change the organizational set-up in which the work has to be done.
Now under these five immutable boundary conditions, we have to try to 
improve matters. This is utterly ridiculous. Thank you. (Applause)

This was copied from page 13 of the:

Report on a conference sponsored by the NATO SCIENCE COMMITTEE
Rome, Italy, 27th to 31st October 1969: SOFTWARE ENGINEERING TECHNIQUES

Copy of the contents of its page 2:

The present report is available from:
Scientific Affairs Division
NATO
Brussels 39
Belgium

Publishers: NATO Science Committee

Printed in England at The Kynoch Press, Birmingham

IMHO, more than 20 years later, these boundaries have shown themselves
to be not absolutely immutable: although we know that they cannot be changed
under our direct control, we see that they are changing (yes, not too
fast).
1 thinking habits:
  at least some of us are (becoming) aware that programming influences our
  thinking habits: especially programmers (like philosophers) tend to be
  utterly curious to know the exact semantics of what is said/written.
  Evidently this does not simplify their communications with other people,
  but it seems to become one of their characteristic thinking habits
  because they have to model such semantics in the results of their
  activities.
2 programming tools:
  at least some fundamental changes can be observed:
  - support for more formal approaches (VDM / Z  and Eiffel)
  - functional programming with its lazy evaluation, and logic
    programming (Prolog)
3 hardware:
  here again:
  - parallel processing and neural networks
  - integration of video (text, pictures and other data) with audio (sound)
4 tasks:
  hopefully most of us have learnt that two types of complexity should be
  distinguished:
  - the intrinsic complex tasks: those that we don't know how to divide into
    tasks which are less complex (e.g. travelling salesman)
  - the extrinsic complex tasks: those that can be broken down into less
    complex tasks.
  Fortunately, nearly all of us are working on the second type. It is just
  a shame that we do not always exploit this.
5 organizational set-up:
  yes, here too we can observe some weak signals that boundaries are changing:
  centralization is no longer always preferred over distribution (USSR,
  Philips).

-- 
###  Jaap A. Meeuse         ###  Internet:  meeuse@idca.tds.philips.nl   ###
###  Philips IS, Dept OFS   ###  UUCP    :  ..!mcsun!philapd!meeuse      ###