chased@rbbb.Eng.Sun.COM (David Chase) (10/18/90)
> Aggressive optimization is the idea that restructuring a program *in
> the course of translation* is both feasible and advantageous, and
> desirable (cost effective).

I'll grant that.

> My contention is that logic restructuring optimizations are more cost
> effective instead at the source level, whether automatic or manual,
> and that often automatic is not necessary, manual is sufficient.

Contention refuted, I think.

Monday, I went to a nice talk by Michael Wolf of Stanford U on blocking
(of matrix algorithms) for (register, cache, VM) locality and
parallelism.  These transformations are well understood, have clear
conditions for correctness, and though they could be performed at the
source level, doing them by hand is extremely tedious and far more
likely to be incorrect.  Also, choices must be made based on cache
structure and array size; it can be profitable to make these choices at
RUN time.

The examples shown in the talk involved transformation of three (nested)
loops into five (nested) loops.  I believe you'd call this "aggressive".
Performance boost: a factor of 3 or 4 for common matrix algorithms on
two different machines.  Note that the exact blocking chosen is
machine-dependent.

Other dependence-based transformations (scalarization, software
pipelining, vectorization, parallelization) are also well understood,
have clear conditions for correctness, and yield ample (often integer
factors) increases in speed.

> My main reason for surmising so is that automatic program analysis and
> rewriting is often far more difficult than code planning and
> synthesis,

This is quite true, if (1) you ignore economies of scale and (2) you
ignore those optimizations that are not expressible in the source
language.  The tremendous effort put into writing an optimizing compiler
(and getting it correct) is paid back many times over in its later use.
The work of a small number of people in this building improves the
performance of code written by anyone using our compilers.
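[Editor's note: a minimal sketch of the blocking transformation Chase
describes, not code from Wolf's talk.  The names, the size N, and the
block size BS are all illustrative; BS stands in for the
machine-dependent choice the talk was about.]

```c
#include <string.h>

#define N  64
#define BS 16   /* block size: machine-dependent, chosen to fit the cache */

/* The familiar three nested loops. */
void matmul_naive(double A[N][N], double B[N][N], double C[N][N])
{
    memset(C, 0, sizeof(double) * N * N);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* The same computation after blocking the j and k loops: five nested
 * loops, touching B in BS-by-BS tiles so each tile stays resident in
 * the cache while it is reused. */
void matmul_blocked(double A[N][N], double B[N][N], double C[N][N])
{
    memset(C, 0, sizeof(double) * N * N);
    for (int jj = 0; jj < N; jj += BS)
        for (int kk = 0; kk < N; kk += BS)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < jj + BS; j++)
                    for (int k = kk; k < kk + BS; k++)
                        C[i][j] += A[i][k] * B[k][j];
}
```

Both versions accumulate each C[i][j] in the same k order, so they
compute bit-identical results; only the memory traffic changes, which
is why the legality conditions are easy to state and why doing the
rewrite by hand for every loop nest is so tedious.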
> and the benefits are not as often competitive with those of more
> limited and safer alternatives (source analysis and rewriting).

You make two mistakes here: First, you act as if one must EITHER do
manual optimization, OR do compiler optimization, but not BOTH.  Second,
compiler optimization IS more reliable than manual optimization,
especially if the manual optimizer attempts to perform the same
transformations (AND, the optimizer will optimize for different
machines, unlike the manual optimizer).  How often do you catch an
optimizer bug, and how often do you make a mistake hand-optimizing code?

Consider, for example, a piece of code (written in C) I wrote
demonstrating a new approach to solving a problem (it's the same example
I've always used, because it's the only piece of code I've bothered that
much over; usually, good algorithms, correctness, and compiler
optimization are enough).  First we have the algorithmic improvements --
there's about a factor of 20 - 1000 on the interesting problems.  Then,
we do some profiling to look for hot spots, and perform some
optimizations by hand (that the compilers then did not do, but WILL do
now, saving me the bother of doing it by hand and checking my work).
Another factor of 10.  More hand optimization, and it's 3 times faster
yet.  Now, that's a factor of thirty by hand (except that the compilers
nowadays would have gotten a good chunk of that), but I started from a
very careful implementation to be sure that I got it right.  Even so,
running the code through an optimizing compiler got me another factor of
1.67 on top of the factor of 30.  It took the optimizer a couple of
minutes to do that; it took me several weeks, spread out over a couple
of years, and collaborating with other people using (and DEBUGGING -- I
made a couple of mistakes along the way) the program to get the factor
of 30.

> More shortly: when -O[2345....] will not raise substantially the
> probability of getting broken code in most compilers, when it will not
> result in huge increases in compile time or space, and when it will
> give substantially better results than hand tuned code, then I will
> BELIEVE!

If you get broken code from a compiler, the people who wrote it probably
want to know, and will almost certainly fix it in a future release of
their compiler.  I believe this holds for DEC, FSF, HP, IBM, Metaware,
MIPS, Sun, and Tartan Labs.

The increases in compile time and space are regrettable, but that is a
choice that you get to make.  My computer, at least, doesn't need me to
hold its hand while it compiles code, and I don't run the compiler at
-O666 every day.  Choice of optimization level during development,
debugging, testing, and performance tuning is just a matter of
allocating your time sensibly.  (Which is why I'm spending time replying
to you, eh?)

As for "substantially better" results, I believe that is the case
already -- the compiler is certainly more often correct, and since I get
to do *both* hand and compiler optimizations, I let the compiler go
first, then I look for hot spots that it missed.  I don't know about
you, but my time is more valuable than CPU time on most computers.

David Chase
Sun Microsystems, Inc.
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/18/90)
In article <1458@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David
Chase) writes:

> (AND, the optimizer will optimize for
> different machines, unlike the manual optimizer).

quickpick would remedy that, permanently.

> How often do you catch an optimizer bug, and how often do you make a
> mistake hand-optimizing code?

First releases of optimizers simply don't work; after they've spent a
few years on the market, their bugs are few and far between.  Maybe one
bug per ten thousand lines of code, after a year of maturity.  It goes
down quickly past that.

I do not remember ever making a mistake in hand optimization by the most
fundamental standard technique: adding variable x to keep track of some
intermediate quantity, and eliminating redundant variables given x.
What optimizer can do this for any but the most primitive intermediate
expressions?  (Introducing a pointer to traverse an array, and
eliminating the index that it replaces, is the simplest example.)

It would be interesting to see a language where you could say ``Okay,
keep 4*k + j - foo(i) in a register, and use it wherever possible.''
Somehow I don't think we have the technology yet.

[ example of code: 30x faster from better algorithms, etc., in ]
[ several weeks, 1.7x faster from optimizer in a couple of minutes ]

But I can usually get another 2x to 5x out of lots of ``picky'' hand
optimizations, having nothing to do with the algorithms.  And I'm not
going to introduce subtle bugs in weird cases with unsafe program
transformations.

---Dan
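[Editor's note: the "simplest example" Bernstein mentions -- introducing
a pointer to track an intermediate quantity and eliminating the index --
sketched in C.  Function and variable names are illustrative.]

```c
/* Indexed form: each iteration conceptually recomputes the address a + i. */
long sum_indexed(const long *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* After the transformation: the pointer p tracks the intermediate
 * quantity a + i, and the index i is eliminated in favor of an
 * end-pointer test.  This is exactly what induction-variable strength
 * reduction in an optimizing compiler does automatically. */
long sum_pointer(const long *a, int n)
{
    long s = 0;
    for (const long *p = a, *end = a + n; p < end; p++)
        s += *p;
    return s;
}
```

This is the easy, linear case; the thread's dispute is over how far
beyond it compilers of the day could go.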
eerke@cs.kun.nl (Eerke Boiten) (10/18/90)
In article <13405:Oct1800:22:5690@kramden.acf.nyu.edu>
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>I do not remember ever making a mistake in hand optimization by the most
>fundamental standard technique: adding variable x to keep track of some
>intermediate quantity, and eliminating redundant variables given x. What
>optimizer can do this for any but the most primitive intermediate
>expressions? (Introducing a pointer to traverse an array, and
>eliminating the index that it replaces, is the simplest example.)

A useful technique, indeed (called "strength reduction" in optimising
compilers, "finite differencing" in transformational programming).
Surprisingly :-), it's one of the first things I learned in the course
on optimising compilers I followed as a masters student.  Of course,
*the* example is going through a (multi-dimensional) array.

>Somehow I don't think we have the technology yet.

Somehow I can't imagine this has never been implemented, considering how
long it's been known.

>And I'm not going to introduce subtle bugs in weird cases with unsafe
>program transformations.

Tell me, what do you mean by unsafe program transformations?  Hand
optimisations?  Of course, finite differencing is relatively safe since
you introduce redundant information most of the time.  It's the process
of *removing* (supposedly) redundant information that might very well
cause many bugs.

By the way, there *are* systems that help you in applying source-level
optimisations.  Just make sure that your transformations are correct
(only once!), and that they preserve the set of bugs in your program
when applied.

Eerke Boiten
Department of Informatics (STOP Project), K.U.Nijmegen
Toernooiveld, 6525 AD Nijmegen, The Netherlands
Tel. +31-80-612236.  Email: eerke@cs.kun.nl
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/19/90)
In article <2301@wn1.sci.kun.nl> eerke@cs.kun.nl (Eerke Boiten) writes:

[ after my description of the most general optimization technique ]

> A useful technique, indeed (called "strength reduction" in optimising
> compilers, "finite differencing" in transformational programming).

Huh?  The strength reduction and finite differencing that CS majors
learn is absolutely trivial compared to what anyone can do by hand.  As
I said, walking through an array is only the simplest example.

Does anyone have a compiler that can introduce a non-linear intermediate
expression and reduce around it?  If so, I'm impressed.  How advanced is
the symbolic algebra system included in the compiler?  Can it take
advantage of range constraints, so that if it knows that x is between 0
and 7, it might set up a table to calculate ((1<<(x+2))-1)/10 quickly?
Can it manipulate floor and ceiling expressions?  Can it introduce
invariants to figure out range constraints?  These are all part of that
single, fundamental optimization technique.

I know, compilers are improving.  Some optimizers fully automate the
dullest, most machine-dependent part of optimization---viz., figuring
out how often loops or branches are executed in practice, seeing how
long machine instructions take, and deciding on that basis whether a
given optimization is worthwhile.  I really appreciate that.  I won't
stop hand optimization because of it.

> >And I'm not going to introduce subtle bugs in weird cases with unsafe
> >program transformations.
> Tell me, what do you mean by unsafe program transformations?
> Hand optimisations?

``Neither -O3 nor -O4 should be used when compiling either device
drivers, or programs that modify external variables from within signal
handlers.''

> Of course, finite differencing is relatively safe
> since you introduce redundant information most of the time.
It's exactly this attitude of ``finite differencing is the only
optimization in the world'' that leads people to think that hand
optimization is useless.  Both the attitude and the conclusion are
wrong.

> By the way, there *are* systems that help you in
> applying source-level optimisations.

I'm perfectly willing to use whatever's out there.  I'm not willing to
pretend that current compilers can figure out any reasonable level of
optimization for me.

---Dan
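[Editor's note: Bernstein's range-constraint example above -- using the
knowledge that x lies in 0..7 to replace ((1<<(x+2))-1)/10 by a table
lookup -- made concrete.  The function names are illustrative; the
table values were computed from the expression itself.]

```c
/* The expression as written: a shift, a subtraction, and a division. */
int f_direct(int x)                 /* only called with 0 <= x <= 7 */
{
    return ((1 << (x + 2)) - 1) / 10;
}

/* The hand optimization: since x is known to lie in 0..7, precompute
 * all eight values once and replace the arithmetic with a load. */
static const int f_table[8] = { 0, 0, 1, 3, 6, 12, 25, 51 };

int f_lookup(int x)                 /* only valid for 0 <= x <= 7 */
{
    return f_table[x];
}
```

The transformation is trivial for a human who knows the range
constraint; Bernstein's point is that the compiler would first have to
*discover* that constraint, which takes real symbolic reasoning.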
chased@rbbb.Eng.Sun.COM (David Chase) (10/19/90)
In article <13405:Oct1800:22:5690@kramden.acf.nyu.edu>
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>First releases of optimizers simply don't work; after they've spent a
>few years on the market, their bugs are few and far between. Maybe one
>bug per ten thousand lines of code, after a year of maturity. It goes
>down quickly past that.

Do you have documentation to back this figure up?  I do not believe it;
I've used beta-test compilers from more than one vendor, and they
largely worked.  When they didn't work, ALL the bugs I found were fixed
before the compiler was released to customers.  There are test suites in
abundance for Fortran, C, and Ada -- it's elementary, though tedious, to
be sure that a compiler passes everything in a test suite.

> [ example of code: 30x faster from better algorithms, etc., in ]
> [ several weeks, 1.7x faster from optimizer in a couple of minutes ]
>
>But I can usually get another 2x to 5x out of lots of ``picky'' hand
>optimizations, having nothing to do with the algorithms. And I'm not
>going to introduce subtle bugs in weird cases with unsafe program
>transformations.

You didn't read the posting clearly enough (or else I deleted the
details for brevity).  Better algorithms, high-level hand optimization
(i.e., malloc, memoization), low-level hand-optimization (unrolling,
inlining, stuff that optimizing compilers will do automatically for me
today) all contributed to that speedup.  (The grand total was a factor
of 250, which has since been improved to about 400 because of other
algorithmic improvements.)  Even so, there was still another factor of
1.6 to be had by the compiler's optimizations.

In general, I find that my uncaught error rate for typing is higher than
the bug rate for modern optimizing compilers.  I think this is true for
most people.  Furthermore, even if I could do the transformations
without thinking, it takes more time for me to type in the
transformations than it does for the compiler to do the optimizations.
And, it is not the intention that optimizing compilers do unsafe program
transformations -- when they do, it is a bug.  Hold that thought -- a
transformation is not considered unless it is safe.  People don't
publish papers on "transformations that usually work".

The wholesale rearrangements that expose parallelism and enhance
locality are based on sound theory, are incredibly tedious to do by hand
(the resulting program is often unrecognizable), and can yield fabulous
increases in speed.  Other transformations are completely unavailable at
the source level (can you schedule instructions between the fetch,
increment, and store required to implement "x++" on a RISC machine?) and
produce solid improvements in speed.

Perhaps I'm missing something -- I claim, I believe correctly, that
modern optimizing compilers improve code quality (at the *low level*)
faster, more effectively, and with a lower error rate than I could ever
hope to attain by hand (doing the *same* optimizations).  What's wrong
with using such an excellent tool?  What awful property of optimizing
compilers have I missed?

David Chase
Sun Microsystems
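[Editor's note: a conceptual picture, using C temporaries, of the
instruction scheduling Chase says is unavailable at the source level.
The real transformation happens on machine instructions; this sketch
only shows the idea, with illustrative function names.]

```c
/* Source level: two independent statements, written naturally. */
int demo_source(int x, int y)
{
    x++;            /* on a RISC machine: load x; add 1; store x */
    y = y * 3;      /* load y; multiply; store y */
    return x + y;
}

/* What a scheduling compiler does, spelled out with explicit
 * temporaries: the independent multiply is moved into the gap between
 * the fetch and the store of x, hiding load latency.  C itself gives
 * the programmer no way to express this at instruction granularity. */
int demo_scheduled(int x, int y)
{
    int t1 = x;         /* fetch x                                  */
    int t2 = y * 3;     /* independent work scheduled into the gap  */
    t1 = t1 + 1;        /* increment                                */
    x = t1;             /* store x                                  */
    y = t2;
    return x + y;
}
```

Both functions compute the same value; only the (conceptual) issue
order differs, which is precisely why this class of optimization
belongs to the compiler rather than the programmer.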
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/19/90)
In article <1493@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David
Chase) writes:

> In article <13405:Oct1800:22:5690@kramden.acf.nyu.edu>
> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >First releases of optimizers simply don't work; after they've spent a
> >few years on the market, their bugs are few and far between. Maybe one
> >bug per ten thousand lines of code, after a year of maturity. It goes
> >down quickly past that.
> Do you have documentation to back this figure up?

No, just my experience backed by what others tell me.  (A '60s
programmer says that for years Fortran compilers were built around the
same optimizer, which produced incorrect results on very long stretches
of code.  The workaround?  Stick a random label in to split up the code.
Many Fortran programs today still have unnecessary labels in strange
places.  Funny thing: I had to work around a similar bug, in exactly the
same way, under an optimizer on a three-year-old machine.)

> You didn't read the posting clearly enough (or else I deleted the
> details for brevity).  Better algorithms, high-level hand optimization
> (i.e., malloc, memoization), low-level hand-optimization (unrolling,
> inlining, stuff that optimizing compilers will do automatically for me
> today) all contributed to that speedup.

Okay, I didn't realize that you were including unrolling and inlining in
your 30x figure.  But I do a lot more than unroll and inline when I
optimize by hand.

> And, it is not the intention that optimizing compilers do unsafe
> program transformations -- when they do, it is a bug.

Then I don't understand the warning in your company's cc(1) about -O3
and -O4.  Is the compiler buggy?

> The wholesale rearrangements that expose parallelism and enhance
> locality are based on sound theory, are incredibly tedious to do by
> hand (the resulting program is often unrecognizable), and can yield
> fabulous increases in speed.

Most programs do not manipulate matrices.
I agree, it's fine for the compiler to do what it can.  However, the
most I have ever seen an optimizer get is 3x, and *it* has all the low
levels of the hardware to use.

> Other transformations are completely unavailable at the source level
> (can you schedule instructions between the fetch, increment, and store
> required to implement "x++" on a RISC machine?) and produce solid
> improvements in speed.

I agree entirely.  In fact, the most impressive optimizer I've seen gets
2x on practically any code.  The underlying hardware is extremely
pipelined (much more than a Cray); there are even separate instruction
streams for integer and floating point operations.  (And it takes two
instructions to read or write memory.)  I don't expect the language to
support this level of instruction scheduling, so the compiler had better
do a good job.  Note that the optimizer gets relatively little speedup
out of anything other than scheduling and trivial peephole
optimizations.

> Perhaps I'm missing something -- I claim, I believe correctly, that
> modern optimizing compilers improve code quality (at the *low level*)
> faster, more effectively, and with a lower error rate than I could
> ever hope to attain by hand (doing the *same* optimizations).

Doing the *same* optimizations, yes; but there are infinitely more
optimizations that you can do by hand, because computers are still
pretty stupid.  Except on hugely parallel code, current optimizers get
almost nothing out of ``intelligent'' transformations.

> What's wrong with using such an excellent tool?  What awful property
> of optimizing compilers have I missed?

Unsafe transformations and other optimizer bugs.  As in sun% cc -O4.
Not to pick on your compiler, which is otherwise rather nice.

---Dan
gl8f@astsun7.astro.Virginia.EDU (Greg Lindahl) (10/19/90)
In article <20683:Oct1819:44:1490@kramden.acf.nyu.edu>
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>Unsafe transformations and other optimizer bugs. As in sun% cc -O4. Not
>to pick on your compiler, which is otherwise rather nice.

Does Sun's -O4 have bugs, or is that warning in the cc man page because
C programmers leave off "volatile" where it's needed?

--
"Restraint, hell.  I'm just too fucking busy." -- Bill Wisner
jlg@lanl.gov (Jim Giles) (10/19/90)
From article <13405:Oct1800:22:5690@kramden.acf.nyu.edu>, by
brnstnd@kramden.acf.nyu.edu (Dan Bernstein):

> [...]
> I do not remember ever making a mistake in hand optimization by the most
> fundamental standard technique: adding variable x to keep track of some
> intermediate quantity, and eliminating redundant variables given x. What
> optimizer can do this for any but the most primitive intermediate
> expressions? (Introducing a pointer to traverse an array, and
> eliminating the index that it replaces, is the simplest example.)

_ALL_ production quality compilers I've ever used on mainframes can do
much better than all but the most adept programmer at this optimization.
Further, the adept programmer can usually do no better than the
compiler.  Compilers have been able to do this quite well since I
started programming 23 years ago.  This is one of the reasons that your
claim that pointers are more efficient than arrays is completely
incorrect.

In fact, the only place a production quality compiler (the only kind
worth paying for) would fail to make this optimization is when it is
actually _more_ efficient to recompute the index than to reload the
saved value from memory.  And _this_ decision is a feedback from the
register allocation mechanism - so the user _can't_ do this unless he
wants to take _complete_ responsibility for register allocation.  But
register allocation is machine dependent - anyway, it's another of those
things that compilers are usually better than people at.

In the meantime, there _are_ types of optimization that compilers
_can't_ do yet.  The adept programmer should spend his time on those
rather than wasting it on things the compiler can do more quickly and
reliably.

J. Giles
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/19/90)
In article <1990Oct18.212844.14728@murdoch.acc.Virginia.EDU>
gl8f@astsun7.astro.Virginia.EDU (Greg Lindahl) writes:

> In article <20683:Oct1819:44:1490@kramden.acf.nyu.edu>
> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >Unsafe transformations and other optimizer bugs. As in sun% cc -O4. Not
> >to pick on your compiler, which is otherwise rather nice.
> Does Sun's -O4 have bugs, or is that warning in the cc man page
> because C programmers leave off "volatile" where it's needed?

Well, the latter can't be the explanation, because their cc doesn't even
understand volatile.  Hence the former must be true.

---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/19/90)
In article <66071@lanl.gov> jlg@lanl.gov (Jim Giles) writes:

> From article <13405:Oct1800:22:5690@kramden.acf.nyu.edu>, by
> brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > I do not remember ever making a mistake in hand optimization by the most
> > fundamental standard technique: adding variable x to keep track of some
> > intermediate quantity, and eliminating redundant variables given x.
> _ALL_ production quality compilers I've ever used on mainframes can do
> much better than all but the most adept programmer at this optimization.
> Further, the adept programmer can usually do no better than the compiler.

What?!  This is absolutely unbelievable.  In one of my last few articles
I had a paragraph listing some of what a competent hand optimizer does
regularly.

A simple example, and one responsible for a 10% speedup in one section
of code: A variable is incremented regularly, but never goes past
2^16 - 1.  Whenever it reaches a power of 2, another variable (which is
initialized appropriately) is incremented.  It is the number of bits in
the first variable, though I don't know any compiler that can figure
this out.  It is hence smaller than 16.  In another piece of code, a
loop (which is performed at most twice---I had to unroll it by hand,
because the compiler couldn't figure it out) essentially subtracts 8
from that variable each time around, and references (1 << x) - 1 for x a
certain expression computed from that variable.  I knew x had to be
between 0 and 7, so I made a table and replaced (1 << x) - 1 by a
reference into that table.

This took me hardly any thought; it was one of hundreds of hand
optimizations that I performed on the same program.  Those optimizations
added up to a 3x overall improvement, over and above the 1.5-2x that the
compiler got for being able to use the machine language directly.
A less trivial example: In a heavily hand-optimized implementation of
Nussbaumer's convolution method, I had to combine a 27-digit number's
residues mod 10^9 + {1,3,5} into its three digits radix 10^9.  A certain
intermediate result was guaranteed to be between 0 and 28258, though the
compiler wouldn't be able to figure this out without some rather amazing
calculations with floor and modulus functions.  I was able to take
advantage of this to rewrite a large portion of the computation in a way
that I knew was safe only because overflow was impossible.

Jim, I bet you $100 that you have used a ``production quality
compiler... on mainframes'' that cannot perform this optimization.  Bet?
Now what did you mean to say?

---Dan
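[Editor's note: an illustrative reconstruction of the power-of-two
counter Bernstein describes above -- not his actual code.  The invariant
is that `bits` always equals the number of significant bits in `n`,
maintained incrementally instead of recomputed.]

```c
/* n is incremented step by step and never exceeds 2^16 - 1; bits is
 * bumped exactly when n reaches a power of two, so at every point
 * bits == number of significant bits in n.  Returns bits for n == limit. */
int bits_incremental(unsigned limit)    /* requires 1 <= limit < 65536 */
{
    unsigned n = 1;
    int bits = 1;                       /* n == 1 has one bit */
    while (n < limit) {
        n++;
        if ((n & (n - 1)) == 0)         /* n just reached a power of two */
            bits++;
    }
    return bits;
}

/* The quantity the invariant tracks, computed directly for comparison. */
int bits_direct(unsigned n)
{
    int bits = 0;
    while (n) { bits++; n >>= 1; }
    return bits;
}
```

A compiler would have to prove the power-of-two/bit-count relationship
to reason about this code, which is Bernstein's point: the invariant
lives in the programmer's head, not in anything the optimizer can see.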
jdarcy@encore.com (Floating Exception) (10/19/90)
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>In article <66071@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>> _ALL_ production quality compilers I've ever used on mainframes can do
>> much better than all but the most adept programmer at this optimization.
>> Further, the adept programmer can usually do no better than the compiler.
>What?! This is absolutely unbelievable. In one of my last few articles I
>had a paragraph listing some of what a competent hand optimizer does
>regularly. A simple example
> [convoluted example]
>A less trivial example: In a heavily hand-optimized implementation of
>Nussbaumer's convolution method

Give us a break, Dan!  Both of the examples you've given represent
changes of *algorithm* based on foreknowledge of possible input values,
a type of optimization of which nobody expects compilers to be capable,
at least not in this century.  I guess in some distant future there will
exist compilers that can recognize and replace inferior algorithms with
more efficient ones, but that's irrelevant to your disagreement with Jim
Giles.  What he was talking about was an optimization that, in essence,
merely rearranges the order of evaluation of intermediate results.

I think most people admit that your obviously high opinion of your own
programming skills may not be entirely unjustified, but not everyone is
so talented.  There are many programmers out there who simply don't do
even the simplest types of hand optimizations, who write really crappy
code, and because they are so numerous I would say that it's perfectly
reasonable for compilers to optimize as aggressively as their creators
can make them.  Obviously, this should be done without sacrificing any
correctness, and may not do much for "adept" programmers, but for the
code produced by run-of-the-mill types optimization has always been a
very big winner.

--
Jeff d'Arcy, Generic Software Engineer - jdarcy@encore.com
Nothing was ever achieved by accepting reality
chased@rbbb.Eng.Sun.COM (David Chase) (10/20/90)
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
gl8f@astsun7.astro.Virginia.EDU (Greg Lindahl) writes:

>> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>> >Unsafe transformations and other optimizer bugs. As in [xxx]% cc -O4. Not
>> >to pick on [that] compiler, which is otherwise rather nice.
>> Does [Xxx]'s -O4 have bugs, or is that warning in the cc man page
>> because C programmers leave off "volatile" where it's needed?
>
>Well, the latter can't be the explanation, because their cc doesn't even
>understand volatile.  Hence the former must be true.

[I'd prefer not to discuss bugs/features (the neutral word is
"behaviors") of specific products of specific companies.  Let's confine
this to a general discussion of aggressive optimization, please.]

There is a third possibility -- that there are at least two dialects of
pre-ANSI C; in one, global variables are volatile, in the other they are
not.  Most parts of most programs are written in the second dialect,
though some very important parts of some very important programs are
written in the first dialect.  Lacking a standard, common use dictates,
and *both* dialects are in common use, regardless of what the
net.authorities say is "right".  (I know that sounds like a slimy
excuse, but remember who "is always right".  Note, too, that the keyword
added for ANSI was "volatile" for safety, and not "stable" for speed.
This should give some indication of what is usually intended.)

And, whether you agree with the choice of dialects or not, it's got
nothing to do with "buggy optimizations" -- either global variables are
volatile (i.e., programs execute in serial order), or they aren't.  I
believe that is what we were discussing in the first place.  There's no
way to "debug" the optimization of references to global variables if
those references are intended to be treated as volatile.

David Chase
Sun Microsystems
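[Editor's note: the two dialects Chase describes, made concrete under
ANSI C rules.  Names are illustrative; the signal-handler pattern is the
one the -O3/-O4 man-page warning is about.]

```c
#include <signal.h>

/* Dialect 1 treats ALL globals like the first variable; ANSI C only
 * promises re-reads for the volatile-qualified one.  An aggressive
 * optimizer may hoist the load of done_plain out of a wait loop. */
volatile sig_atomic_t done_volatile = 0;
int done_plain = 0;

void handler(int sig)
{
    (void)sig;
    done_volatile = 1;   /* safe: async modification is declared */
    done_plain = 1;      /* whether a spinning loop sees this is
                            dialect-dependent under optimization */
}

/* A wait loop: the compiler must re-read done_volatile every
 * iteration, but is free to cache done_plain in a register. */
int wait_done(void)
{
    while (!done_volatile)
        ;                /* spin until the handler runs */
    return done_plain;
}
```

In a real program the handler would be installed with something like
`signal(SIGINT, handler)`; the point here is only that "optimize global
references" and "globals may change asynchronously" cannot both hold,
which is Chase's argument that this is a dialect choice rather than an
optimizer bug.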
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/20/90)
In article <1559@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David
Chase) writes:

> And, whether you agree with the choice of dialects or not, it's got
> nothing to do with "buggy optimizations" -- either global variables
> are volatile (i.e., programs execute in serial order), or they aren't.
> I believe that is what we were discussing in the first place.  There's
> no way to "debug" the optimization of references to global variables
> if those references are intended to be treated as volatile.

Fair enough.  I'll stop calling the assumption of nonvolatility a bug,
though I still consider it a huge mistake.  There are lots of other
examples of optimizer bugs, so the point stands.

---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/20/90)
Skip to the word ``opinion'' to see my main point.

In article <jdarcy.656338868@zelig> jdarcy@encore.com (Floating
Exception) writes:

> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> > [convoluted example]
> >A less trivial example: In a heavily hand-optimized implementation of
> >Nussbaumer's convolution method
> Give us a break, Dan! Both of the examples you've given represent changes
> of *algorithm* based on foreknowledge of possible input values,

First of all, neither change was based on any foreknowledge of possible
inputs; all constraints were expressed by actual tests in the programs.
More importantly, they weren't algorithm changes at all.  See below for
further explanation.

> a type of
> optimization of which nobody expects compilers to be capable, at least not
> in this century.

Exactly.  Compilers can't do this sort of simple optimization.  See
below.

> I think most people admit that your obviously high opinion of your own
> programming skills may not be entirely unjustified, but not everyone is
> so talented.

Wtf are you talking about?  I was presenting the optimization the way a
very smart compiler might see it, step by step.  Now here's how any
programmer would see it, especially after coding it in the first place:

--> Variable x is the number of bits of padding inside a byte, so it's
--> between 0 and 7.  So we can compute f(x) more efficiently by
--> precomputing a table for 0 through 7.  Done.

You call it ``convoluted''?  No shit.  The human brain works in very
convoluted ways.  It can also do a lot of things infinitely better than
any existing computer.  Simple hand optimizations like the one above are
well beyond the reach of current technology.  The same comments apply to
the other optimization I talked about.

In any case, this optimization is in the class of optimizations that I
described: introducing new intermediate quantities (in this case, the
precomputed table) and using them to cut out redundant code (in this
case, the main computation).
Jim says he has compilers that can do those optimizations.  I find that
unbelievable.

> I would say that it's perfectly
> reasonable for compilers to optimize as aggressively as their creators
> can make them.  Obviously, this should be done without sacrificing any
> correctness,

Well, yes.  This is the crux of the issue.  I (and, I think, Piercarlo,
and lots of other programmers) see too many optimizer bugs to believe
that aggressive compiler optimization is worthwhile.  Other people may
think the balance swings in the opposite direction.  It's just a matter
of where you draw the line.

By the way, I don't agree with one of your comments---namely, that
automatic optimization becomes less useful as hand optimization kicks
in.  On the contrary: the compiler will always understand details of the
machine language that are purposefully hidden from the language used by
the programmer.  The programmer and compiler optimize independently, in
a sense, as long as the compiler doesn't try to do optimizations that it
doesn't understand.  So on a RISC machine I'll gladly have the compiler
give me its 2x instruction scheduling benefit, no matter how much hand
optimization time I put in.

---Dan
pcg@aber-cs.UUCP (Piercarlo Grandi) (10/22/90)
I was writing:

pcg> Aggressive optimization is the idea that restructuring a program *in
pcg> the course of translation* is both feasible and advantageous, and
pcg> desirable (cost effective).

In article <1458@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David
Chase) writes:

chased> I'll grant that.

pcg> My contention is that logic restructuring optimizations are more cost
pcg> effective instead at the source level, whether automatic or manual,
pcg> and that often automatic is not necessary, manual is sufficient.

chased> Contention refuted, I think.

chased> Monday, I went to a nice talk by Michael Wolf of Stanford U on
chased> blocking (of matrix algorithms) for (register, cache, VM) locality
chased> and parallelism. [ ... ]

chased> These transformations are well understood, have clear conditions
chased> for correctness, and though they could be performed at the source
chased> level, doing them by hand is extremely tedious and far more likely
chased> to be incorrect.

What you are saying is that this guy has some technology to synthesize
high performance matrix algorithms depending on the target architecture,
and that such synthesis can be reliably done by program.  This is
excellent news, but does not necessarily mean that the technology is
about aggressive optimization.  The same might be used for clever code
generation, which is absolutely OK.

chased> Also, choices must be made based on cache structure and
chased> array size; it can be profitable to make these choices
chased> at RUN time.

On the fly code generation, thunks.  Excellent.  Not related in any way
to aggressive optimization.

chased> The examples shown in the talk involved transformation of three
chased> (nested) loops into five (nested) loops.  I believe you'd call this
chased> "aggressive".

Frankly, I'd call them "crazy" if used as optimizations.  Very clever
and safe if used in a high level code generator.  It all depends on the
abstraction level at which you apply them.
chased> Performance boost: a factor of 3 or 4 for common matrix algorithms
chased> on two different machines.  Note that the exact blocking chosen is
chased> machine-dependent.

The real point here is that the guy has a better algorithm for matrix
operations than the standard one, so what he is doing is pattern
matching for occurrences of the standard one and substituting his
better one for it. This raises the question: why not just write
'm * n' and let the code generator (or the library manager) synthesize
the right algorithm?

My answer: it is crazy to have programmers write specific algorithm
implementations, and then have the compiler pattern match for the
implementation, deduce the algorithm from the implementation, and then
rewrite the implementation with another one that (supposedly)
implements the same algorithm.

IMNHO it is much more effective, for example, to use a source
rewriting tool that scans the programmer's code for implementations
and substitutes higher level constructs for them. The idea is that the
source should be true; it should either contain an implementation that
is literally translated *or* a specification that the compiler can
implement as it pleases, and not something halfway in between.
Everything else is a program with confusion in its abstraction levels.

pcg> My main reason for surmising so is that automatic program analysis and
pcg> rewriting is often far more difficult than code planning and synthesis,

chased> This is quite true, if (1) you ignore economies of scale and (2) you

If the economies of scale are "don't touch dusty decks", there is an
unresolved argument as to whether it is more economical to rewrite the
dusty decks, manually or automatically, into a form more amenable to
clever code generation, or to leave them alone and work on the
generated code. I have seen too many dusty decks improve beyond
recognition with a little manual or automatic restructuring (as Knuth
says in vol. 1 of the ACP), so I tend to think that even for dusty
decks explicit (automatic or manual) restructuring is better than
aggressive optimization.

If we are speaking of newly designed programs, then probably it is
always cheaper to write the programs right (at the appropriate
abstraction level) in the first place than to restructure them in the
code generator.

Unless of course there is not only the "dusty deck" problem, but also
the "dusty programmer" one (the programmer can only write dusty decks,
and cannot be taught to write programs in a higher level notation,
e.g. using procedure calls :->).

To those who think that I am joking here: well, no. I know physicists
and mathematicians who program in Fortran without using subroutines.
They don't understand them. I have even met a physicist who did not
know about loops (or block datas) -- but at least he knew about the
local version of the NAG library, so he initialized his matrices with
hundreds of assignment statements, and then called a library
subroutine to do the real work. But then we would be discussing
sociological issues, not computer science.

chased> The work of a small number of people in this building
chased> improves the performance of code written by anyone using our
chased> compilers.

No, it just gives the illusion that you can have faster rerunning of
dusty decks without touching them. You would be doing a much better
service to everyone if you wrote source program restructuring tools,
and then clever code generators.

pcg> and the benefits are not as often competitive with those of more
pcg> limited and safer alternatives (source analysis and rewriting).

Because once you have turned a low level implementation into a higher
level one it has become magically more portable, and you can also have
better confidence in the program's correctness. I am far more
confident of the correctness of an SQL query one page long than of the
equivalent implementation in COBOL, which is probably several dozen
pages long.
If the SQL code generator is clever, it is also likely that the SQL
version is faster too, and on a wider range of machines than the COBOL
one. I would not trust as much, for either portability or speed, a
COBOL compiler that would deduce little by little the query the
program is supposed to implement, and then rewrite the COBOL program's
logic into a different implementation of the same query.

chased> If you get broken code from a compiler, the people who wrote it
chased> probably want to know, and will almost certainly fix it in a future
chased> release of their compiler.  I believe this holds for DEC, FSF, HP,
chased> IBM, Metaware, MIPS, Sun, and Tartan Labs.

Maybe, but there is a thing called software physics that tells us that
bugs are strictly related to complexity, and virtually the only way to
drive down bugs is to reduce complexity, not to report them as they
occur.

chased> I don't know about you, but my time is more valuable than CPU time
chased> on most computers.

But this is a good argument for using higher level languages with
clever code generators! Or for letting expensive programmers' code be
literally obeyed.

I think that if the programmer's time is valuable, the best solution
is not to have him write low level implementations that are ignored by
the compiler and restructured into different but equivalent
implementations of the same specification, but to have the programmer
write the specification and then have the compiler or library manager
generate or select the optimal implementation straight away. And if
the programmer writes a specific implementation, then that must be
because of a very good reason, and it ought to be respected.

Dan Bernstein and Herman Rubin only present one side of the
anti-aggressive-optimization argument -- that implementations should
not be rewritten by the language translator. The other side, mine, is
that specifications should be translated in the cleverest way
possible.
The aggressive optimizers believe that implementations should be
translated in the cleverest way possible. The Bernstein/Rubin
assumption is that programmers know what they are doing and write in a
low level language which therefore ought to be translated literally;
the Chase/Giles assumption is that programmers don't know what they
are doing and still write in a low level language which therefore
ought to be translated liberally.

My argument is that programmers should use a language at the
abstraction level suitable to them and their applications, that low
level languages ought to be translated literally and high level ones
liberally, and that translating liberally implementations in low level
languages is a monstrosity.

When you have a high level specification and you generate clever code
for it, you only have to show that the synthesized implementation
satisfies that specification; when you do aggressive optimization on
an implementation written in a low level language, you also have to
show that you have correctly deduced the specification from the
implementation you are overriding. This is a completely different and
far more difficult task, and it should not be done in a hidden way by
the compiler, on safety grounds alone. It should be done IMNHO either
by the programmer himself (manual algorithm reimplementation) or by
automatic tools on the source, so that the new implementation is
documented in it.

And I think that in most cases restructuring should not be horizontal,
i.e. from an original inappropriate implementation to a new better
implementation (three nested loops to five nested loops), but from low
level to high level (three nested loops to 'm * n'), and separately
from high level to low level ('m * n' expanded into the locally
optimal version, maybe five nested loops, either by high level code
generation or library selection).
I am not against clever code generation for high level languages -- I
am against compilers trying to second-guess programmers'
implementations in low level ones. I would like aggressive
optimization advocates to contend with this point, not with the straw
one that better implementations are better.

The question is not:

1) Can a compiler restructure an algorithm's implementation into a
supposedly equivalent, supposedly faster implementation as a matter of
course during code analysis and generation?

but rather:

2) Isn't it more cost effective to write new programs in a more
abstract and portable way so that clever code generation can be
applied to them, and to restructure old programs in source form so
that they are brought to the same more abstract level, either manually
or automatically?

My answers are:

1) Yes, it is possible to do it, but with a non-negligible loss in
confidence.

2) Yes, because it results in long term benefits as well as short term
ones.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
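Grandi's "write 'm * n' and let the library manager select the
implementation" alternative can be sketched as a library interface.
All names, the dispatch threshold, and the placeholder "tuned" kernel
below are invented for illustration; the point is only the division of
labor: the caller states the specification, the library chooses how.

```c
typedef void (*matmul_fn)(int n, const double *a, const double *b, double *c);

/* Straightforward triple loop; good enough for small matrices. */
static void mm_simple(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += a[i*n + k] * b[k*n + j];
            c[i*n + j] = s;
        }
}

/* Stand-in for a machine-tuned kernel (blocked, vectorized, ...);
   here it just delegates, since the tuning itself is not the point. */
static void mm_tuned(int n, const double *a, const double *b, double *c)
{
    mm_simple(n, a, b, c);
}

/* The programmer writes only the specification "c = a * b"; the
   library picks the implementation.  The size threshold is a made-up
   stand-in for a choice driven by cache geometry, possibly at run
   time as Chase suggests. */
void mat_mul(int n, const double *a, const double *b, double *c)
{
    matmul_fn f = (n < 32) ? mm_simple : mm_tuned;
    f(n, a, b, c);
}
```

Under this scheme the source never contains a hand-blocked loop nest
for the compiler to recognize and override; the implementation choice
is made once, in the library, where it is documented.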
cik@l.cc.purdue.edu (Herman Rubin) (10/22/90)
In article <jdarcy.656338868@zelig>, jdarcy@encore.com (Floating
Exception) writes:

> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >In article <66071@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> >> _ALL_ production quality compilers I've ever used on mainframes can do
> >> much better than all but the most adept programmer at this optimization.
> >> Further, the adept programmer can usually do no better than the compiler.
>
> >What?! This is absolutely unbelievable. In one of my last few articles I
> >had a paragraph listing some of what a competent hand optimizer does
> >regularly. A simple example
>
> [convoluted example]
>
> >A less trivial example: In a heavily hand-optimized implementation of
> >Nussbaumer's convolution method
>
> Give us a break, Dan!  Both of the examples you've given represent changes
> of *algorithm* based on foreknowledge of possible input values, a type of
> optimization of which nobody expects compilers to be capable, at least not
> in this century.  I guess in some distant future there will exist compilers
> that can recognize and replace inferior algorithms with more efficient ones,
> but that's irrelevant to your disagreement with Jim Giles.  What he was
> talking about was an optimization that, in essence, merely rearranges the
> order of evaluation of intermediate results.

As I have stated before, the specific hardware configuration can have
a major effect on the relative performance of algorithms. I am even
more pessimistic than you about automata taking this information into
account. But even other decisions -- when and how to do weird
unrolling, whether conditional transfers on planned exceptions can be
combined, etc. -- are unlikely to be made by automata. I can revise an
algorithm to take into account the size of an instruction stack, or
interference, by doing things which the compiler cannot know are
possible.

The real solution is to have the programmer and the compiler interact.
The compiler can systematically look at millions of options, but it
cannot innovate. The human mind has great flexibility, but is not
particularly fast at routine.

> I think most people admit that your obviously high opinion of your own
> programming skills may not be entirely unjustified, but not everyone is
> so talented.  There are many programmers out there who simply don't do
> even the simplest types of hand optimizations, who write really crappy
> code, and because they are so numerous I would say that it's perfectly
> reasonable for compilers to optimize as aggressively as their creators
> can make them.  Obviously, this should be done without sacrificing any
> correctness, and may not do much for "adept" programmers, but for the
> code produced by run-of-the-mill types optimization has always been a
> very big winner.

I suspect that he is not bragging but complaining. If you teach
potential programmers that "these are the ways things are done, and
these are the concepts you can use in your programs," you make it
difficult for them even to think later. As many have said, "It ain't
what you don't know that hurts you, it's what you know that ain't so."

Talent can be destroyed more easily than it can be created. To start a
potential programmer with a set of techniques designed for those who
cannot think is to at least bury the potential for thinking. My
experience with students who have had computational courses in
mathematics and statistics is that the situation is much worse.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317) 494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!cik (UUCP)
jlg@lanl.gov (Jim Giles) (10/23/90)
From article <2060@aber-cs.UUCP>, by pcg@aber-cs.UUCP (Piercarlo Grandi):
> [...]
> the Chase/Giles assumption is that programmers don't know what
> they are doing and still write in a low level language which therefore ought
> to be translated liberally.

On the contrary. My assumption is that the programmer knows very well
what he's doing and that he shouldn't have to waste that talent on
trivial things that the compiler _should_ know how to do at the
present state of the art. To be sure, programmers who _don't_ know
what they're doing also benefit from a well written optimizing
compiler - so languages should be aggressively optimized for that
reason too.

> [...]
> My argument is that programmers should use the language at the abstraction
> level suitable to them and their applications, and that low level languages
> ought to be translated literally and high level ones liberally, and that
> translating liberally implementations in low level languages is a monstrosity.

I can't find any rigid line between "high level" and "low level"
languages. No compiler should alter the semantic meaning of a program
as defined by the semantic description of the language used. A lower
level language may give more specific operational requirements in its
definition. Any feature that is not operationally restricted by the
semantic definition of the language should be freely optimized.

Assembly languages are the only ones I know of in which _no_
operational freedom is granted to the 'compiler'. Most other languages
(including C - the 'as-if' rule) encourage as much optimization as
possible in their language definitions. If you object to this, you
should seek to have the specifications of these languages altered - or
write your own language. In the meantime, compilers should be allowed
- and even expected - to optimize as aggressively as possible within
the semantic definition of the language.
In any case, the acceptability of an optimization should not be based
on a subjective judgement about the 'level' of the language. The
semantic description of a language gives the only reliable and
(reasonably) objective standard by which to judge this.

J. Giles
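The 'as-if' rule Giles invokes can be made concrete with a small
example (mine, not his). The standard constrains only observable
behavior, so a conforming compiler handed the first function below may
emit code shaped like the second: the two are indistinguishable to any
caller.

```c
/* As written: x * x is recomputed on every trip, but it is
   loop-invariant and touches nothing observable. */
int poly_written(int x, int n)
{
    int r = 0;
    for (int i = 0; i < n; i++)
        r += x * x + i;
    return r;
}

/* What an optimizer may produce under the as-if rule: the invariant
   multiply is hoisted out of the loop.  Same result for every input,
   so the standard treats the two as interchangeable. */
int poly_as_if(int x, int n)
{
    int r = 0;
    int xx = x * x;
    for (int i = 0; i < n; i++)
        r += xx + i;
    return r;
}
```

Had `x` been declared `volatile`-qualified through a pointer, the
hoisting would no longer be a legal as-if transformation, which is why
the rule is stated in terms of observable behavior rather than
instruction sequences.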
cik@l.cc.purdue.edu (Herman Rubin) (10/23/90)
In article <9239:Oct2003:53:1890@kramden.acf.nyu.edu>,
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
.........................
> . On the contrary: the compiler will always understand details of the
> machine language that are purposefully hidden from the language used by
> the programmer. The programmer and compiler optimize independently, in a
> sense, as long as the compiler doesn't try to do optimizations that it
> doesn't understand. So on a RISC machine I'll gladly have the compiler
> give me its 2x instruction scheduling benefit, no matter how much hand
> optimization time I put in.

I have no objection to the compiler helping to optimize, and even
making suggestions to the programmer. But I object to the first
sentence; there should be NO details of the machine language,
instruction overlap, etc., hidden from the programmer. The existence
of a single machine instruction can determine whether or not a block
of code, or an algorithm, should even be considered.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317) 494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!cik (UUCP)
seanf@sco.COM (Sean Fagan) (10/24/90)
In article <25336:Oct1823:13:3990@kramden.acf.nyu.edu>
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>> >Unsafe transformations and other optimizer bugs. As in sun% cc -O4. Not
>> >to pick on your compiler, which is otherwise rather nice.

The CDC Cyber 170 FORTRAN compiler has 3 levels of optimization: O1,
O2, and O3. O1 was, essentially, no optimization (there may have been
a difference between no O<num> and O1, but I don't think so); O2 was,
basically, full optimization. The only difference between O2 and O3
was that O3 would take advantage of delayed loads for array
traversals. For example:

      REAL foobar(35)
      REAL sum

      sum = 0.
      DO 10 i = 2, 35
         sum = sum + foobar(i)
         foobar(i) = foobar(i-1) * sum
 10   CONTINUE

With O2, the loop would be compiled as you expected. With O3, however,
near the end of the loop body there would be a load of the next
element of foobar. Note that, on the last trip through the loop, this
would access an element beyond the end of foobar. If foobar happened
to be placed at the end of your memory segment, this would cause a
(fatal) runtime error. On the other hand, on the top-of-the-line
machines, this extra optimization could spell the difference between
20 hours and 15 hours, which is quite a bit.

I don't consider this buggy, since the documentation clearly told you
what it would do, and what to expect, and you could always avoid it if
you didn't want to take that chance.

I'm not saying the Sun compiler is like that, but it is a possibility...
--
-----------------+
Sean Eric Fagan  | "*Never* knock on Death's door:  ring the bell and
seanf@sco.COM    |  run away!  Death hates that!"
uunet!sco!seanf  |      -- Dr. Mike Stratford (Matt Frewer, "Doctor, Doctor")
(408) 458-1422   | Any opinions expressed are my own, not my employers'.
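The hazard Fagan describes is easy to reproduce at the source level. A
sketch in C (mine, not the Cyber code generator's actual schedule):
the load of the next element is issued before the trip test has
established that the element exists, so the final iteration reads one
past the end of the array.

```c
/* Software analogue of the O3 delayed load: fetch a[i+1] while the
   current a[i] is being summed.  On the last trip this reads a[n] --
   one element past the end, exactly the access that could fault if
   the array sits at the end of a memory segment. */
double sum_with_lookahead(const double *a, int n)
{
    double sum = 0.0;
    double next = a[0];
    for (int i = 0; i < n; i++) {
        double cur = next;
        next = a[i + 1];    /* out of bounds when i == n - 1 */
        sum += cur;
    }
    return sum;
}
```

The transformation is profitable because the load latency of the next
element is hidden behind the add of the current one; it is dangerous
for precisely the reason the documentation warned about.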
mjs@hpfcso.HP.COM (Marc Sabatella) (10/24/90)
>I have no objection to the compiler helping to optimize, and even making
>suggestions to the programmer.  But I object to the first sentence; there
>should be NO details of the machine language, instruction overlap, etc.,
>hidden from the programmer.  The existence of a single machine instruction
>can determine whether or not a block of code, or an algorithm, should even
>be considered.

Perhaps that is why it took you and a graduate student two weeks to
port a sort routine (or whatever it was you were blathering about a
few months ago). If you want machine level, write in assembly. And no,
I don't particularly care to hear you rant about how existing
assemblers don't have a flexible enough notation. Lobby for better
assemblers; don't try to cripple HLLs.
cik@l.cc.purdue.edu (Herman Rubin) (10/25/90)
In article <65592@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
...................
> Assembly languages are the only ones I know of in which _no_
> operational freedom is granted to the 'compiler'.

It is possible to allow an assembler to rearrange code for optimization.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317) 494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!cik (UUCP)
meissner@osf.org (Michael Meissner) (10/26/90)
In article <2677@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:

| In article <65592@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
|
| > Assembly languages are the only ones I know of in which _no_
| > operational freedom is granted to the 'compiler'.
|
| It is possible to allow an assembler to rearrange code for optimization.

And in fact on systems that use the MIPS chips (MIPSco, SGI,
DECstation), the assemblers already do this, primarily to fill delay
slots and such. If I really need to know what the assembler did behind
my back, I have to disassemble the object module.
--
Michael Meissner   email: meissner@osf.org   phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Do apple growers tell their kids money doesn't grow on bushes?
ted@nmsu.edu (Ted Dunning) (10/27/90)
In article <2301@wn1.sci.kun.nl> eerke@cs.kun.nl (Eerke Boiten) writes:

   In article <13405:Oct1800:22:5690@kramden.acf.nyu.edu>
   brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
   >I do not remember ever making a mistake in hand optimization by the most
   >fundamental standard technique: adding variable x to keep track of some
   >intermediate quantity, and eliminating redundant variables given x. What
   >optimizer can do this for any but the most primitive intermediate
   >expressions? (Introducing a pointer to traverse an array, and
   >eliminating the index that it replaces, is the simplest example.)

   A useful technique, indeed (called "strength reduction" in optimising
   compilers, "finite differencing" in transformational programming).

If you compile the following code on a sun3 with -O4,

int a[100];

foo()
{
    int i;
    int sum;
    int max;

    sum = 0;
    max = a[0];
    for (i = 0; i < 100; i++) {
        sum += a[i];
        if (max < a[i]) max = a[i];
    }
    printf("%d %d\n", sum, max);
}

you get the following code:

        .text
        .globl  _foo
_foo:
|#PROLOGUE# 0
        link    a6,#-20
        moveml  #8432,sp@
|#PROLOGUE# 1
        moveq   #0,d5
        movl    _a,d4
        moveq   #0,d6
        movl    #_a,a5
L77003:
        movl    a5@,d7
        addl    d7,d5
        cmpl    d7,d4
        jge     L77005
        movl    d7,d4
L77005:
        addql   #1,d6           <--- increment i
        addqw   #4,a5           <--- increment the magic pointer into a
        moveq   #100,d7
        cmpl    d7,d6
        jlt     L77003
        movl    d4,sp@-
        movl    d5,sp@-
        pea     L25
        jbsr    _printf
|#PROLOGUE# 2
        moveml  a6@(-20),#8432
        unlk    a6
|#PROLOGUE# 3
        rts
--
I don't think the stories are "apocryphal".  I did it :-) .. jthomas@nmsu.edu
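For comparison, here is the same transformation written by hand at the
source level, as Bernstein advocates: the array index is replaced
outright by a traversal pointer. (My paraphrase of his example; note
that the Sun compiler above introduced the pointer but still kept `i`
as a trip counter, so hand and machine strength reduction need not
produce identical shapes.)

```c
/* Pointer-introduction form of Ted's example: a moving pointer p
   replaces every a[i], and the index exists only implicitly in the
   loop bound a + n. */
void foo_by_hand(const int *a, int n, int *sum_out, int *max_out)
{
    int sum = 0;
    int max = a[0];
    for (const int *p = a; p < a + n; p++) {
        sum += *p;
        if (max < *p)
            max = *p;
    }
    *sum_out = sum;
    *max_out = max;
}
```

This is precisely the rewrite Grandi would rather see recorded in the
source (or undone, back to a higher-level form) than performed
invisibly by the optimizer.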
conor@penguin.inmos.co.uk (Conor O'Neill) (10/29/90)
In article <MEISSNER.90Oct25130214@osf.osf.org> meissner@osf.org (Michael Meissner) writes:
>| > Assembly languages are the only ones I know of in which _no_
>| > operational freedom is granted to the 'compiler'.
>|
>| It is possible to allow an assembler to rearrange code for optimization.
>
>And in fact on systems that use the MIPS chips (MIPSco, SGI,
>DECstation), the assemblers already do this, primarily to fill delay
>slots and such.  If I really need to know what the assembler did
>behind my back, I have to disassemble the object module.

When I was first taught any computer science, I was taught that an
assembler performed a one-to-one translation of assembly language
mnemonics to machine instructions. I like, and can understand, that
definition. I don't consider instruction scheduling, and the removal
of no-ops in the delay slots, to be "a one-to-one translation".

Am I the only programmer who wishes that MIPS hadn't "corrupted" the
word "assembler", but had used some other term instead?
---
Conor O'Neill, Software Group, INMOS Ltd., UK.
UK: conor@inmos.co.uk   US: conor@inmos.com
"It's state-of-the-art"  "But it doesn't work!"  "That is the state-of-the-art".
poser@csli.Stanford.EDU (Bill Poser) (10/30/90)
In article <12175@ganymede.inmos.co.uk> conor@inmos.co.uk (Conor O'Neill) writes:
>When I was first taught any computer science, I was taught that an assembler
>performed a one-to-one translation of assembly language mnemonics to
>machine instructions.  I like, and can understand, that definition.
>
>Am I the only programmer who wishes that MIPS hadn't "corrupted" the word
>"assembler", but had used some other term instead?

Fancy assemblers have been around for a long time. Consider the VAX
assembler (the DEC one, not the UNIX one), which had the pseudo-op JBR
that was translated into a JMP or BRanch instruction depending on how
far away the target ended up, and which allowed a high-level
specification of the register save mask for subroutine calls, so you
didn't have to figure it out in binary. Or the MACRO assembler for DEC
20s, which had all sorts of high-level stuff. It was practically an
HLL.
diamond@tkou02.enet.dec.com (diamond@tkovoa) (10/30/90)
In article <8960018@hpfcso.HP.COM> mjs@hpfcso.HP.COM (Marc Sabatella) writes:

[looks like maybe Dr. Rubin's writing here]
||I have no objection to the compiler helping to optimize, and even making
||suggestions to the programmer.  But I object to the first sentence; there
||should be NO details of the machine language, instruction overlap, etc.,
||hidden from the programmer.  The existence of a single machine instruction
||can determine whether or not a block of code, or an algorithm, should even
||be considered.

|Perhaps that is why it took you and a graduate student two weeks to port a
|sort routine (or whatever it was you were blathering about a few months ago).
|If you want machine level, write in assembly.  And no I don't particularly care
|to hear you rant about how existing assemblers don't have a flexible enough
|notation.  Lobby for better assemblers; don't try to cripple HLL's.

C was invented in order to be that better assembler. If you want
machine level, you're supposed to program in C. That is exactly the
purpose for which C was created. If you want an HLL, use one; don't
cripple C.
--
Norman Diamond, Nihon DEC       diamond@tkov50.enet.dec.com
(tkou02 is scheduled for demolition)
We steer like a sports car: I use opinions; the company uses the rack.
mjs@hpfcso.HP.COM (Marc Sabatella) (10/31/90)
>Fancy assemblers have been around for a long time. Consider the
>VAX assembler (the DEC one, not the UNIX one), which had the pseudo-op
>JBR which was translated into a JMP or BRanch instruction depending on
>how far away the target ended up, and allowed a high-level specification
>of the register save mask for subroutine calls, so you didn't have to
>figure it out in binary. Or the MACRO assembler for DEC 20s, which
>had all sorts of high-level stuff. It was practically an HLL.

None of these transformations violates the cardinal rule of
assemblers, namely that of providing a one-to-one translation. That
is, pseudo-ops, register masks, and macros are all merely notational
conveniences. If I write a macro and then use it, I know exactly what
code will come out. Conversely, if I know I want a specific code
sequence to come out, I can write the source accordingly. The MIPS
"assembler", if the instruction scheduling or whatever cannot be
disabled, fails this test of assembler-ness.

Of course, I have nothing against this, and there is probably no real
dictionary definition of "assembler". It is counter-intuitive, though,
in a way that pseudo-ops, masks, and macros are not.
chased@rbbb.Eng.Sun.COM (David Chase) (10/31/90)
diamond@tkou02.enet.dec.com (diamond@tkovoa) writes:
>C was invented in order to be that better assembler.  If you want machine
>level, you're supposed to program in C.  That is exactly the purpose for
>which C was created.  If you want an HLL, use one; don't cripple C.

This may be true for (one dialect of) pre-ANSI C, but the
specification of ANSI C allows (if you don't say "volatile") a large
number of entertaining optimizing transformations, despite assertions
by you and other posters here that C is a "low level language" and
"low level languages ought not be aggressively optimized". C *is* a
low level language, but the optimizations are 100% legal, often tend
to make the code go faster, and thus they are applied. The
compiler-writer's claim is "I can prove that these optimizations
conform to the standard"; can you *prove* that they should not be
done? How will it sell more computers/compilers to NOT do the
transformations? (Lest you forget, dollars and cents is what it all
comes to in the end.)

Two things to note:

1) Several optimizations could be applied much more frequently to C if
it weren't for possible aliasing among pointer and array parameters to
procedures. If you don't understand what I'm talking about, go read a
book on dependence analysis -- Wolfe's _Optimizing Supercompilers ..._
will give you a feel for what can be done to Fortran.

2) It's senseless to argue with the optimizer authors; they'll do
everything that time, the standard, and the customers permit. If ANSI
has changed Your Favorite Language and you had a substantial
investment in software that was harmed by this, take it up with ANSI.
You might look into the history of ANSI Cobol.

David Chase
Sun Microsystems, Inc.
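Chase's point (1) deserves a concrete example (mine, not his). In the
C of this era there is no way to tell the compiler that two pointer
parameters do not overlap, so it must assume they might; C99 later
added the `restrict` qualifier for exactly this purpose.

```c
/* Because dst and src may point into the same array, the compiler
   must assume that each store through dst can change a value later
   read through src.  It therefore cannot freely keep src elements in
   registers across iterations, reorder the loop, or vectorize it. */
void scale_into(double *dst, const double *src, double factor, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * factor;
}
```

Calling `scale_into(a, a, 2.0, n)` is perfectly legal, which is
precisely what ties the optimizer's hands; a Fortran compiler, by
contrast, may assume that modified dummy arguments do not alias, which
is one reason the dependence-based transformations Chase cites were
developed on Fortran first.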
new@ee.udel.edu (Darren New) (10/31/90)
In article <16101@csli.Stanford.EDU> poser@csli.stanford.edu (Bill Poser) writes:
>Fancy assemblers have been around for a long time. Consider [...]
>[...] It was practically an HLL.

The "Meta" assembler on Sigma-9's under CP-V was also quite powerful.
My friend wrote a LISP compiler in assembler macros. You could do
something like

        LISP,12  (DEFUN FLUB (X) (cond (null x) R3 (t) (x R3)))

to put the definition of FLUB into register 12. It was pretty neat,
but it took forever to compile.
                      -- Darren
--
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages,
      Formal Description Techniques (esp. Estelle), Coffee -----
grier@marx.enet.dec.com (Michael J. Grier) (10/31/90)
In article <16101@csli.Stanford.EDU>, poser@csli.Stanford.EDU (Bill Poser) writes:
|> Fancy assemblers have been around for a long time. Consider the
|> VAX assembler (the DEC one, not the UNIX one), which had the pseudo-op
|> JBR which was translated into a JMP or BRanch instruction depending on
|> how far away the target ended up, ...

Ummm... I don't know what assembler you're using, but on my definitive
VAX VMS macro assembler:

$ create xyz.mar
        .entry  main, ^M<>
        jbr     a
a:      ret
        .end    main
(^Z here)
$ macro xyz.mar
     0002     3                 jbr     a
%MACRO-E-UNRECSTMT, Unrecognized statement !
There were 1 error, 0 warnings and 0 information messages, on lines:
    3 (1)
MACRO XYZ.MAR

I've only seen this pseudo-operation in Unix/Ultrix VAX assemblers,
never in the real VAX macro assembler.

I think that these optimizing assemblers should not be called
assemblers either. After all, they're not the 1-1 translators that
assemblers are supposed to be. (True, I haven't seen that concrete a
definition anywhere, and it is useful to have common instruction
scheduling and peephole optimizing for a platform done in a common
place, but then it's not an assembler, IMO.)
------------------------------------------------------------------------------
Michael J. Grier                             grier@leno.dec.com
Software Engineer
Digital Equipment Corporation                but I don't speak for 'em!
thornley@cs.umn.edu (David H. Thornley) (10/31/90)
In article <16101@csli.Stanford.EDU> poser@csli.stanford.edu (Bill Poser) writes:
>In article <12175@ganymede.inmos.co.uk> conor@inmos.co.uk (Conor O'Neill) writes:
>>When I was first taught any computer science, I was taught that an assembler
>>performed a one-to-one translation of assembly language mnemonics to
>>machine instructions.  I like, and can understand, that definition.
>
>>Am I the only programmer who wishes that MIPS hadn't "corrupted" the word
>>"assembler", but had used some other term instead?
>
>[VAX and DEC 20 references omitted]

If my memory serves, the IBM 650 had a Symbolic Optimizing Assembly
Program (yep, they used the acronym) that rearranged the
machine-language instructions into a good order so that they could be
executed faster from the drum memory.

The concept that an assembler can mess with the translation is a lot
older than some of the people reading this.

(Disclaimer: I'm not sure it was an IBM 650, but it was an IBM machine
with drum main memory, which puts it back in the '50s. For an
education in how people worked with such drum memories, go to
alt.folklore.computers and ask about Mel. :-)

DHT
wallach@motcid.UUCP (Cliff H. Wallach) (10/31/90)
In article <TED.90Oct26124614@kythera.nmsu.edu> ted@nmsu.edu (Ted Dunning) writes:
<text deleted>
-if you compile the following code on a sun3 with -O4,
-
-int a[100];
-
-foo()
-{
- int i;
- int sum;
- int max;
-
- sum = 0;
- max = a[0];
- for (i=0;i<100;i++) {
- sum += a[i];
- if (max < a[i]) max = a[i];
- }
- printf("%d %d\n", sum, max);
-}
-
-you get the following code:
-
- .text
- .globl _foo
-_foo:
-|#PROLOGUE# 0
- link a6,#-20
- moveml #8432,sp@
-|#PROLOGUE# 1
- moveq #0,d5
- movl _a,d4
- moveq #0,d6
- movl #_a,a5
movl d4,a5 is more efficient
-L77003:
- movl a5@,d7
- addl d7,d5
- cmpl d7,d4
- jge L77005
- movl d7,d4
-L77005:
- addql #1,d6 <--- increment i
- addqw #4,a5 <--- increment a magic pointer
- moveq #100,d7
- cmpl d7,d6
- jlt L77003
- movl d4,sp@-
- movl d5,sp@-
- pea L25
- jbsr _printf
-|#PROLOGUE# 2
- moveml a6@(-20),#8432
- unlk a6
-|#PROLOGUE# 3
- rts
---
-I don't think the stories are "apocryphal". I did it :-) .. jthomas@nmsu.edu
Compiler-generated code is still far from hand-assembled code. While
I have much more experience with Intel than 68k, I can see several simple
improvements. I have not assembled this code.
<some random prologue>
moveq #0,d1 ; sum =0
lea _a,a0 ; a0 = &a[0]
lea 100(a0),a1 ; a1 = &a[100]
move.l (a0)+,d2 ; max = a[0]
l100:
cmp.l a1,a0 ; (a dbra may be faster)
bge l101
move.l (a0)+,d0 ; load a[i++]
add d0,d1 ; sum += a[i]
cmp d2,d0 ; if (max < a[i])
blt l100
move d0,d2 ; max = a[i];
bra l100
l101:
<jsr printf ...>
<some random epilogue>
Can a compiler convert an array reference to a pointer reference?
Can it restructure code to minimize jumps?
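
As to Wallach's first question: yes, and the rewrite can equally be
expressed at the C source level. Here is a minimal sketch of the
array-to-pointer transformation (the function names are mine, not from
the posted example) -- the induction variable's implicit i*4 address
arithmetic is reduced to a pointer that is simply bumped each trip:

```c
#include <assert.h>

int a[100];

/* Index form, as in the posted source: each trip computes a + i*4. */
int sum_indexed(void)
{
    int i, sum = 0;
    for (i = 0; i < 100; i++)
        sum += a[i];
    return sum;
}

/* Pointer form: what the compiler's "magic pointer" in a5 amounts to. */
int sum_pointer(void)
{
    int *p = a, *end = a + 100, sum = 0;
    while (p < end)
        sum += *p++;
    return sum;
}
```

Both forms compute the same result; the second matches the addql-based
loop in the compiler output quoted above.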
strom@arnor.uucp (11/01/90)
In article <1990Oct30.221510.4392@cs.umn.edu>, thornley@cs.umn.edu (David H. Thornley) writes:
|> In article <16101@csli.Stanford.EDU> poser@csli.stanford.edu (Bill Poser) writes:
|> > ...
|> If my memory serves, the IBM 650 had a Symbolic Optimizing
|> Assembly Program (yep, they used the acronym) that rearranged
|> the machine-language instructions to a good order so that
|> they can be executed faster on the drum memory.
|>
|> The concept that an assembler can mess with the translation is
|> a lot older than some of the people reading it.
|>
|> (Disclaimer: I'm not sure it was an IBM 650, but it was on an
|> IBM with drum main memory, which puts it back in the '50s.
|> For an education in how people worked with such drum
|> memories, go to alt.folklore.computers and ask about Mel. :-)
|>
|> DHT

Yes, it was a 650. And there were arguments between advocates of
``high-level'' assembly programming and advocates of machine language
with hand optimization. Plus ca change, ... :-)
--
Rob Strom, strom@ibm.com, (914) 784-7641
IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY 10958
poser@csli.Stanford.EDU (Bill Poser) (11/01/90)
In article <1990Oct30.165712@marx.enet.dec.com> grier@marx.enet.dec.com (Michael J. Grier) points out that it is the UNIX VAX assembler that has the JBR
pseudo-op. Sorry, memory fault. It is the VMS assembler that has the register
save mask stuff.
diamond@tkou02.enet.dec.com (diamond@tkovoa) (11/01/90)
In article <1849@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
>diamond@tkou02.enet.dec.com (diamond@tkovoa) writes:
>>C was invented in order to be that better assembler. If you want machine
>>level, you're supposed to program in C. That is exactly the purpose for
>>which C was created. If you want an HLL, use one; don't cripple C.
>This may be true for (one dialect of) pre-ANSI C,

For the original and several revisions; for its entire raison d'etre.

>specification of ANSI C allows (if you don't say "volatile") a large
>number of entertaining optimizing transformations,

Many of these don't hurt. Some of them do.

>despite assertions
>by you and other posters here that C is a "low level language"

And its inventors and its original users.

>The compiler-writer's claim is "I can prove that these optimizations
>conform to the standard"; can you *prove* that they should not be done?

Proofs are demonstrated by example very frequently on Usenet.

>How will it sell more computers/compilers to NOT do the transformations?

Oh THAT, obviously it won't do. That was not the issue until now.

>1) several optimizations could be applied much more frequently to C if
>it weren't for possible aliasing in procedure pointer/array parameters.

I think this is getting mixed up with a separate thread but anyway...

>book on dependence analysis -- Wolfe's _Optimizing Supercompilers ..._

Is this a revision of his famous out-of-print book? I'd like to take a
look at one. [end of digression]

>2) it's senseless to argue with the optimizer authors; they'll do
>everything that time, the standard, and the customers permit.

Partial agreement here.

>If ANSI has changed Your Favorite Language and you had a substantial
>investment in software that was harmed by this, take it up with ANSI.

Yuk. C is far from my favorite language.
But here is the problem that ANSI has caused: the language which was
invented for the purpose of doing machine-dependent hacks with a
moderately portable syntax is no longer suitable for that purpose,
because implementors are encouraged to take advantage of the standard's
prohibition and non-definition of various features that would be useful
for machine-dependent hacks.

It is now necessary to invent a new language for machine-dependent
hacks, in order to do what C was supposed to do, or else -- as was
stated by the poster that I previously followed up to -- return to the
days of assembly language programming instead of using a moderately
portable syntax.

C used to beat assembly in many cases. Now C is no longer in the
running. What a shame.
--
Norman Diamond, Nihon DEC       diamond@tkov50.enet.dec.com
(tkou02 is scheduled for demolition)
We steer like a sports car: I use opinions; the company uses the rack.
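
An aside on the aliasing point raised by Chase above (the illustration
is mine, not from either poster): many C optimizations are blocked
because the compiler must assume that a store through one pointer
parameter can change what another points to, so nothing can be held in a
register across the store:

```c
/* If dst and src may refer to the same storage, src[0] must be
   reloaded on every iteration; hoisting it into a register would
   change the result of the (perfectly legal) call scale(a, a, n). */
void scale(int *dst, const int *src, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = src[0] * 2;
}
```

Called as scale(a, a, 3) on a = {1, 0, 0}, the write to dst[0] changes
src[0] in mid-loop, which is exactly what a hoisting optimizer must be
able to rule out before transforming the code.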
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/01/90)
In article <8960020@hpfcso.HP.COM>, mjs@hpfcso.HP.COM (Marc Sabatella) writes:
> None of these transformations violate the cardinal rule of assemblers,
> namely that of providing a one-to-one translation.

At which ecumenical council was this cardinal rule adopted? It's worth
pointing out that the optimisations done by the MIPS assembler _can_ be
disabled.

Surely the cardinal rule for an assembler is not "one-to-one
translation" but "full access to all machine operations". If you want a
particular sequence of bits in an executable code area, you should be
able to get them. The MIPS assembler lets you do that. Why is it evil
to offer optimisation as well?

(Now the assembler _I_ would gripe about is the one where I had to write
a lot of indirect branches as ".value MagicNumber" because the assembler
didn't understand "jp *%ax", although the disassembler, debugger,
hardware, and manufacturer's manual _did_ support it.)
--
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.
tom@stl.stc.co.uk (Tom Thomson) (11/02/90)
In article <8960020@hpfcso.HP.COM> mjs@hpfcso.HP.COM (Marc Sabatella) writes:
>None of these transformations violate the cardinal rule of assemblers, namely
>that of providing a one-to-one translation. That is, pseudo-ops, registers

One-to-one translation? Back in the 60s we used to talk about the
Fortran assembler; even earlier we had machines like Deuce where the
assembler would fix the placement of instructions for you to minimise
drum latencies, which was maybe one-to-one still but very much against
the spirit of your proposed restriction on assemblers.

>accordingly. The MIPS "assembler", if the instruction scheduling or whatever
>cannot be disabled, fails this test of assembler-ness.
>
but behaves just like 1950s assemblers!
pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/02/90)
On 30 Oct 90 21:22:38 GMT, chased@rbbb.Eng.Sun.COM (David Chase) said:
chased> diamond@tkou02.enet.dec.com (diamond@tkovoa) writes:
diamond> C was invented in order to be that better assembler. If you
diamond> want machine level, you're supposed to program in C. That is
diamond> exactly the purpose for which C was created. If you want an
diamond> HLL, use one; don't cripple C.
chased> This may be true for (one dialect of) pre-ANSI C, but the
chased> specification of ANSI C allows (if you don't say "volatile") a
chased> large number of entertaining optimizing transformations, despite
chased> assertions by you and other posters here that C is a "low level
chased> language" and "low level languages ought not be aggressively
chased> optimized".
Haha. Never said that low level *languages* should not be optimized. We
are saying that *programs* in low level languages (or indeed in any
language) should not be optimized.
I have decided to stick out my neck again and try to define better what
is aggressive optimization.
When you have a language it has a set of primitive operators (+,*,...)
and of primitive operator combinators (if, ;, procedures, ...). The
language definition prescribes the effect of the primitive operators and
an algebra for such effects defined by the combinators.
In one way or another this implies that each language implies a virtual
machine. The task of the code generator is to map the virtual machine
onto the real machine as well as it can, which means faithfully
(semantics) and efficiently (pragmatics).
Optimization is doing the mapping from the virtual machine defined by
the language to the real machine in a clever (i.e. optimal for the
target real machine) instead of a straightforward way.
A program in some language, at any level of abstraction, defines a
higher level virtual machine than that offered by the language. The
primitives of the virtual machine defined by a program (most obviously,
but only, its procedures and macros) are actually composed out of the
primitives and combinators of the language, and their meaning is defined
according to the algebra for these as given in the language definition.
Aggressive optimization is mapping from the virtual machine defined by
the program (not by the language!) to the real machine *in the code
generator*.
This is possible in principle, because the code generator can
indeed recognize combinations of primitive operators and rewrite them to
other combinations that supposedly map better onto the target real
machine, as long as the two are equivalent under the algebra of the
language definition, in that particular context.
The problems are:
Is this desirable?
No, because the rewriting algebra of the language may be poorly
defined, or poorly understood, and thus it may be difficult to rely
on the ability of the code generator to rewrite combinations of
language primitives in semantics preserving ways. This is
particularly true for languages like C, where a clean algebra for
rewriting was not a design goal. ANSI C has tried to do some poor
fixup engineering on this. We are not amused.
No, because even assuming that the underlying algebra is clean,
rewriting combinations of primitives is a far more conceptually
difficult task than mapping each of them to the real machine. In
particular it requires careful analysis. Such analysis tends to
result in more bugs in the compiler, and the compiler is the
bottleneck of an entire system's reliability.
No, because even if two combinations of primitives can actually be
shown in the given context to be equivalent according to the
language's algebra, it ought to be assumed that the programmer
intended to write the particular one he chose to write for some
good reason.
No, because the speed advantages are not that dramatic (or
nonexistent), *if* the author is a clever programmer that knows
which of the equivalent formulations will be optimal on the given
machine.
Are there alternatives?
Yes, rewriting the combinations of primitives not in the code
generator, but at the source code level, either manually or
automatically.
Yes, using a language whose primitives are at the abstraction level
desired, i.e. the primitives offered by the language are those
needed for the application. In this case the optimization is not
aggressive, because the generator is only doing optimal
implementation of hopefully well defined language primitives, not
supposedly optimal rewriting of pieces of a user program.
Are the alternatives cost effective?
Yes, because for *many* even if not all codes the time critical
portion is small enough that careful automatic rewriting is
feasible.
Yes, because automatic rewriting can be to a higher level language,
which can then be more easily and non aggressively optimized, and
automatic rewriting gives long lasting benefits in portability and
confidence in the program.
Yes, because using a language whose primitives are at the level of
the problem is known to reduce costs and improve quality via reuse
of the language's implementation or libraries, where otherwise the
implementation of those primitives would have to be rewritten and
re-aggressively optimized for every application.
Yes, because it separates the compiler from the code rewriter,
which are two modules with very different goals and very different
reliability profiles.
chased> C *is* a low level language, but the optimizations are 100%
chased> legal, often tend to make the code go faster, and thus they are
chased> applied.
They are legal but dangerous and not cost effective, according to the
above discussion.
chased> The compiler-writer's claim is "I can prove that these
chased> optimizations conform to the standard";
STOP THE PRESSES! Compiler writer proves that he can do program
rewriting in a bug free way! Program synthesizers must be round the
corner! Major software companies stocks collapse!
chased> can you *prove* that they should not be done? How will it sell
chased> more computers/compilers to NOT do the transformations? (Lest
chased> you forget, dollars and cents is what it all comes to in the
chased> end.)
The argument is that it is not *cost effective* for *users* of compilers
to pay for program rewriting in the code generator. They may require a
little education to acknowledge this, just like smokers. In the mean
time the tobacco industry has every incentive to educate the customers
to the contrary...
--
Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
jlg@lanl.gov (Jim Giles) (11/02/90)
From article <PCG.90Nov1171210@odin.cs.aber.ac.uk>, by pcg@cs.aber.ac.uk (Piercarlo Grandi):
> [...]
> In one way or another this implies that each language implies a virtual
> machine. The task of the code generator is to map the virtual machine
> onto the real machine as well as it can, which means faithfully
> (semantics) and efficiently (pragmatics).

Unfortunately for your argument, there are few languages which have
definitions of this kind. The definition of a language usually implies
a _class_ of virtual machines. The code generator is allowed to map any
of these onto the real hardware. Any optimizer which generates code
which faithfully implements anything within the class of acceptable
behaviours is correct. If an optimizer leaves the class of acceptable
behaviours, it is _broken_.

You are trying to limit the compiler to code transformations that _you_
regard as straightforward. In truth, the compiler is only limited by
the language definition. If you don't like some of the optimizations
allowed by the language definition, then lobby for the definition to be
changed.

J. Giles
chased@rbbb.Eng.Sun.COM (David Chase) (11/02/90)
pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>chased> The compiler-writer's claim is "I can prove that these
>chased> optimizations conform to the standard";
>STOP THE PRESSES! Compiler writer proves that he can do program
>rewriting in a bug free way! Program synthesizers must be round the
>corner! Major software companies stocks collapse!

Read your own posting, please, and note that what you quote did not say
what you say it said. "I claim I can prove" is most definitely not the
same as "I have proved". It means that the steps in optimization are
based on little proofs, and sketches of proofs; it doesn't mean that
there is a big book proving the compiler correct. Such a book *could*
be written if the need existed (the existence of dire need would be
demonstrated by the exhibition of wheelbarrows full of money to finance
its writing).

Note, too, that I never claimed that the optimizations performed,
however aggressive, are anywhere near general-purpose program rewriting
or program synthesis. I have claimed, and will continue to claim, that
the transformations performed are based on a carefully worked-out
understanding of certain program transformations.

Regarding the virtual-machine-to-physical-machine mapping, if the
standard only specifies that we must reproduce the I/O and approximate
resource consumption of the virtual machine, then that's what we'll do
if it makes the code faster. Where does it say that we should not do
this?

And, regarding the tobacco industry analogy (an ad hominem analogy, too
-- aren't vile insults what one uses when arguing from a weak
position?), I will suggest that you have no evidence, only anecdotes,
for the alleged pervasive bugginess of optimizing compilers. Certainly
you have nothing like the evidence gathered to demonstrate a connection
between tobacco and lung disease, and only hand-waving claims as to the
cause-and-effect link. I *am* selling my skills, but I fail to see that
there is anything pernicious about that.
I think it is also true that new compilers will be both more aggressive
and less buggy; if nothing else, the test suites only grow larger and
more comprehensive, faster, cheaper machines allow more tests to be run
in a given amount of time, and wished-for and formerly ad-hoc
optimizations are being backed up by better theory. The need for better
compilers will also increase with time as clock rates and potential
parallelism increase (both of these things pound harder on memory, and
increase the gains from good register allocation and locality-enhancing
transformations).

It is also unclear that your claims about the speed/risk trade-offs are
in any way connected with reality. Speed gained for you by an
optimizing compiler is speed that you don't have to work for by
hand-coding (though both can be used together to get even more speed-up
if that is necessary), and the time saved by the programmer is more time
that can be spent debugging and testing the code. Every programmer I
know has a bug introduction rate higher than any compiler that I've ever
used (but this is purely anecdotal -- perhaps it is completely different
for other people, eh? do *you* find more compiler-introduced bugs than
author-introduced bugs in *your* code?), so it is in fact more reliable
to let the compiler do the optimizing, even if you don't use the time
saved for additional testing and debugging.

Now, I will grant one important thing, which I think is also important
to you -- if I had garbage collection, I would be a more productive
programmer. Garbage collection is available for "C", but it is
potentially broken by an optimizing C compiler, which means that one
powerful productivity and bug-avoiding aid is unavailable if you
optimize.
However, no specification for C anywhere in existence guarantees that a
garbage collector should be able to function, and most programmers do
not take advantage of garbage collectors, and some programmers write
(perfectly legal) code that is guaranteed to break a collector at any
level of optimization. It pains me that this is the case, but it is so,
and if it increases the SPECmark to break a non-standard piece of code,
even a useful one, that piece of code will be broken. There's always
Lisp, ML, Eiffel, Russell, Modula-3, and Miranda.

I've said several of the things that I said in this article before --
you might seriously consider comparing the bug-introduction rate of
programmers, both skilled and ordinary, with the bug-introduction rate
of several "aggressively optimizing" compilers. You might also check to
see if there are any trends in compiler bug rates over time, and see how
that correlates with the optimizations performed by those compilers. I
think you'll find that your claims are not supported.

David Chase
Sun Microsystems, Inc.
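
Chase's remark that perfectly legal C can break a collector deserves a
concrete illustration (mine, not from the post). A conservative
collector scans memory for words that look like pointers; code like the
following keeps its only reference in disguised form, so the collector
sees no pointer to the block at all:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hide a pointer by complementing its bits: no word in the program
   now holds the pointer's value, so a scanning collector would treat
   the block as unreferenced -- yet the program can still recover it. */
uintptr_t hide(void *p)   { return ~(uintptr_t)p; }
void *unhide(uintptr_t h) { return (void *)~h; }
```

An optimizer can create the same situation behind the programmer's back,
for instance by strength-reducing a pointer into a derived value from
which the base address is no longer visible -- which is why optimization
and garbage collection for C sit uneasily together.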
barmar@think.com (Barry Margolin) (11/02/90)
In article <PCG.90Nov1171210@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>A program in some language, at any level of abstraction, defines a
>higher level virtual machine than that offered by the language. The
>primitives of the virtual machine defined by a program (most obviously,
>but only, its procedures and macros) are actually composed out of the
 ^^^^^^^^
Should this have been "but not only"?

>primitives and combinators of the language, and their meaning is defined
>according to the algebra for these as given in the language definition.
>
>Aggressive optimization is mapping from the virtual machine defined by
>the program (not by the language!) to the real machine *in the code
>generator*.

So aggressive optimization refers to optimization that does things like
interprocedural flow analysis? Are loop unrolling, strength reduction,
and common subexpression elimination aggressive or not?

>This is possible in principle, because the code generator can
>indeed recognize combinations of primitive operators and rewrite them to
>other combinations that supposedly map better onto the target real
>machine, as long as the two are equivalent under the algebra of the
>language definition, in that particular context.

Code generators, and certainly unaggressive optimizers, are almost
always required to "recognize combinations of primitive operators". The
only optimizers that are independent of the primitive operators are
peephole optimizers. For instance, in a language with a "+" operator
that accepts arguments of either float or integer, and with conversion
operators float(x) and integer(x), the compiler would be expected to
emit different code for float(x)+float(y) than for
integer(x)+integer(y). To do this it must recognize the combination of
the conversion operators with the "+" operator. To some extent a
compiler is simply an optimizer for an interpreter.
The dividing lines between code generation, ordinary optimization, and
aggressive optimization are very fuzzy.

>The problems are:
>
>Is this desirable?
>
>    No, because the rewriting algebra of the language may be poorly
>    defined, or poorly understood, and thus it may be difficult to rely
>    on the ability of the code generator to rewrite combinations of
>    language primitives in semantics preserving ways. This is
>    particularly true for languages like C, where a clean algebra for
>    rewriting was not a design goal. ANSI C has tried to do some poor
>    fixup engineering on this. We are not amused.

Where does this "rewriting algebra" come from, and what makes it poorly
defined? I don't think the ANSI C standard includes a rewriting
algebra; I presume it's a function of the language's semantics. So, the
rewriting algebra could conceivably be as well defined as the language
itself. A poorly defined rewriting algebra could be the result of a
poorly defined language, or because the designer of the rewriting
algebra did a poor job.

>    No, because even assuming that the underlying algebra is clean,
>    rewriting combinations of primitives is a far more conceptually
>    difficult task than mapping each of them to the real machine. In
>    particular it requires careful analysis. Such analysis tends to
>    result in more bugs in the compiler, and the compiler is the
>    bottleneck of an entire system's reliability.

Many of the things we do with computers were once thought to be too
complex to program, but that doesn't mean we shouldn't have tried to
implement them. AI is an example of a computer task that we still
consider conceptually difficult, but we keep plodding on trying to solve
it. Therefore, my interpretation of the above is that you think
aggressive optimization should still be considered a research project,
not something that should be expected in commercial products.
>    No, because even if two combinations of primitives can actually be
>    shown in the given context to be equivalent according to the
>    language's algebra, it ought to be assumed that the programmer
>    intended to write the particular one he chose to write for some
>    good reason.

Every good treatise on writing efficient programs generally starts with
something to the effect that one should write programs clearly first,
and then go back and optimize the bottlenecks. So, the programmer is
likely to have chosen a combination of primitives that is closer to his
algorithm than to the real machine. The programmer may not be aware of
the physical machine primitives that the language primitives map onto,
or may believe he knows but be mistaken (a particular compiler's mapping
between virtual and physical machines is not generally documented,
except perhaps in the area of data formats), so his "good reason" may
actually be a "bad reason". Also, many programs are targeted for
multiple (or unspecified) physical machines and/or compilers, so the
programmer may not be able to choose language primitives that are good
for all the target environments.

>    No, because the speed advantages are not that dramatic (or
>    nonexistent), *if* the author is a clever programmer that knows
>    which of the equivalent formulations will be optimal on the given
>    machine.

See above -- the program may not be targeted to a specific "given
machine".

>Are there alternatives?
>
>    Yes, rewriting the combinations of primitives not in the code
>    generator, but at the source code level, either manually or
>    automatically.

Are you saying that this should be done in a module that is independent
of the target machine and compiler? I was under the impression that
most complex optimization was already being done using source-to-source
translation, although sometimes it is necessary for the target source to
include nonstandard, implementation-dependent primitives that map more
directly onto the target hardware.
For instance, strength reduction and common subexpression elimination
translate the source into source with an additional temporary variable.
Also, I don't think it matters what the destination of the rewrite is.
I think the hard part is recognizing large combinations of primitives
and reasoning about them. Transformation bugs usually result from
unexpected interactions between seemingly unrelated portions of the code
(e.g. due to use of global variables or aliasing). This is one reason
why much of the program transformation research has been done with
functional languages: the absence of side effects makes semantic
analysis much easier.

>    Yes, using a language whose primitives are at the abstraction level
>    desired, i.e. the primitives offered by the language are those
>    needed for the application. In this case the optimization is not
>    aggressive, because the generator is only doing optimal
>    implementation of hopefully well defined language primitives, not
>    supposedly optimal rewriting of pieces of a user program.

Unfortunately, the computer industry doesn't abound with
application-specific languages. And even if there were, finding
programmers proficient in many of them would be hard. [Enter biased
mode] It's hard enough finding people proficient in the better
general-purpose languages; most schools just churn out Pascal and C
programmers. [Exit biased mode]

>Are the alternatives cost effective?
>
>    Yes, because for *many* even if not all codes the time critical
>    portion is small enough that careful automatic rewriting is
>    feasible.

How does the automatic rewriter know which parts are time critical? In
many cases, program optimization is an incremental process: you find the
bottleneck, rewrite it efficiently, find the new bottleneck, rewrite it,
etc., until the overall efficiency meets the goals.
>    Yes, because automatic rewriting can be to a higher level language,
>    which can then be more easily and non aggressively optimized, and
>    automatic rewriting gives long lasting benefits in portability and
>    confidence in the program.

Ah, now I see that you're talking about *replacing* the source file with
this automatically rewritten source file. I thought you were talking
about source-level transformations internal to the compiler (usually
operating on the parse tree, not the actual source text). Since the
purpose of the transformation is to produce code that will map better by
a specific compiler onto a particular hardware, I don't see the
portability benefit of saving the transformed code.

>    Yes, because using a language whose primitives are at the level of
>    the problem is known to reduce costs and improve quality via reuse
>    of the language's implementation or libraries, where otherwise the
>    implementation of those primitives would have to be rewritten and
>    re-aggressively optimized for every application.

This is true if you can find such a language and programmers who know
it. If the application is not very common then you will probably have
to spend quite a bit to purchase or develop the language implementation.
Subroutine libraries for general purpose languages can often do nearly
as well and are much easier to come by.

>They are legal but dangerous and not cost effective, according to the
>above discussion.

You made the statement that these optimizations don't buy much, but
didn't back it up with any evidence.

>chased> The compiler-writer's claim is "I can prove that these
>chased> optimizations conform to the standard";
>
>STOP THE PRESSES! Compiler writer proves that he can do program
>rewriting in a bug free way! Program synthesizers must be round the
>corner! Major software companies stocks collapse!

Very little can be *proven* about any program; there are just too many
variables and dependencies.
A compiler writer who makes such a statement is merely asserting that
he's as confident about his optimizations as he is about the rest of his
compiler.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
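
Margolin's remark that strength reduction and common subexpression
elimination amount to source-to-source rewrites with an extra temporary
can be made concrete with a toy C example (the function names are mine,
not from the post):

```c
/* Before: the subexpression (x - y) appears twice in the source. */
int dist2_before(int x, int y)
{
    return (x - y) * (x - y);
}

/* After: the rewrite a compiler (or a source-level tool) performs --
   the common subexpression is computed once into a temporary. */
int dist2_after(int x, int y)
{
    int t = x - y;
    return t * t;
}
```

The "after" form is itself legal source, which is the point of the
argument that such transformations need not be hidden in the code
generator.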
seanf@sco.COM (Sean Fagan) (11/02/90)
In article <8960020@hpfcso.HP.COM> mjs@hpfcso.HP.COM (Marc Sabatella) writes:
>None of these transformations violate the cardinal rule of assemblers, namely
>that of providing a one-to-one translation.

The MIPS assembler does, I believe, provide a one-to-one translation.
However, it may also stick in NOPs, as necessary. Remember that the
original MIPS chip did not have "interlocks": if you used a register
before the previous instruction was done with it, you would get wrong
results. The NOPs were necessary at that point.

Consider a machine like the CDC 170-state architecture (any comp.arch
readers who knew I would bring it up somehow? 8-)). It has/had 60-bit
words, with 15 and 30 bit instructions. The assembler would pack 1-4
instructions per word; it provided padding (nops, essentially) when the
next instruction wouldn't fit (e.g., 15 / 30 / 30 won't fit in one
word), or when the instruction was preceded directly by a label (since
you couldn't jump into the middle of a word). That was about as much
rewriting as the MIPS assembler does (albeit on a smaller scale 8-)),
yet it was still an assembler.
--
-----------------+
Sean Eric Fagan  | "Quoth the raven,"
seanf@sco.COM    | "Eat my shorts!"
uunet!sco!seanf  |   -- Lisa and Bart Simpson
(408) 458-1422   | Any opinions expressed are my own, not my employers'.
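
Fagan's packing rule can be modeled in a few lines of C (a sketch under
my own assumptions about the padding policy, not CDC documentation):
instructions of 15 or 30 bits fill a 60-bit word, and when the next one
will not fit, the remainder of the word is padded with nops and a fresh
word begun:

```c
/* Count the 60-bit words needed to hold a sequence of 15/30-bit
   instructions, padding whenever the next instruction won't fit. */
int words_used(const int *widths, int n)
{
    int words = 0, bits = 0, i;
    for (i = 0; i < n; i++) {
        if (bits + widths[i] > 60) {   /* pad out this word, start anew */
            words++;
            bits = 0;
        }
        bits += widths[i];
    }
    if (bits > 0)
        words++;
    return words;
}
```

Under this rule Fagan's 15 / 30 / 30 example indeed spills into a second
word, since the final 30-bit instruction cannot follow 45 bits already
placed.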
pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/03/90)
On 2 Nov 90 00:13:15 GMT, barmar@think.com (Barry Margolin) said:

barmar> In article <PCG.90Nov1171210@odin.cs.aber.ac.uk>
barmar> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

pcg> A program in some language, at any level of abstraction, defines an
pcg> higher level virtual machine than that offered by the language. The
pcg> primitives of the virtual machine defined by a program (most obviously,
pcg> but only, its procedures and macros) are actually composed out of the
     ^^^^^^^^
barmar> Should this have been "but not only"?

Yes, a Freudian slip (was thinking of Lisp...).

pcg> Aggressive optimization is mapping from the virtual machine defined by
pcg> the program (not by the language!) to the real machine *in the code
pcg> generator*.

barmar> So aggressive optimization refers to optimization that does
barmar> things like interprocedural flow analysis? Are loop unrolling,
barmar> strength reduction, and common subexpression elimination
barmar> aggressive or not?

They are all aggressive. Even substituting a shift for a multiplication
or division by a power of two may be considered aggressive in certain
languages where the properties of the two are not precisely identical.

barmar> Code generators, and certainly unaggressive optimizers, are
barmar> almost always required to "recognize combinations of primitive
barmar> operators".

Ah yes, but not to rewrite them. And the recognition is typically
simple and well defined. It does not usually imply trying to understand
the effect of what is recognized. Languages where code generation
itself (or even parsing) is strongly context dependent make for
dangerous compilers (not to mention for hard to understand programs).

barmar> The only optimizers that are independent of the primitive
barmar> operators are peephole optimizers.

Yep, yep. Long live peephole optimizers!

barmar> To some extent a compiler is simply an optimizer for an
barmar> interpreter.
The dividing lines between code generation, barmar> ordinary optimization, and aggressive optimization are very barmar> fuzzy. They are nearly the same thing, just like iteration can be said is a special case of recursion. But, as in this example, their conceptual difficulty level is far different. Code generation deals with equivalences between a virtual and a real machine, ordinary optimization with machine dependent transformations (already hazardous enough, because the rewriting algebras of many real machines are underspecified and contrived enough) and aggressive optimizations with rewriting programmer's code. barmar> Where does this "rewriting algebra" come from, and what makes it barmar> poorly defined? I don't think the ANSI C standard includes a barmar> rewriting algebra; I presume it's a function of the language's barmar> semantics. So, the rewriting algebra could conceivably be as barmar> well defined as the language itself. Precisely. But usually this is not true, unless special care has been taken. Often one has a fairly workable algebra that defines what is the effect of a program, and a far less well defined one (usually implicit in the former, as you say) that allows you to "prove" that transformations preserve semantics. Theses are written about such things. barmar> A poorly defined rewriting algebra could be the result of a barmar> poorly defined language, or because the designer of the barmar> rewriting algebra did a poor job. Or maybe because the language was not designed with that goal in mind. Some languages have been designed so that it is relatively easy to reason about semantics preserving transformations; but even for these I still think that tranformations ought to be in the open, as source rewriting, than hidden in the code generator. But languages that people want to aggressively optimize have not been designed with that goal in mind, maybe because they were designed forty or twenty years ago as autocoders or super assemblers. 
pcg> No, because even assuming that the underlying algebra is clean, pcg> rewriting combinations of primitives is a far more concpetually pcg> difficult task than mapping each of them to the real machine. In pcg> particular it requires careful analysis. Such analysis tends to pcg> result in more bugs in the compiler, and the compiler is the pcg> bottleneck of an entire system's reliability. barmar> Many of the things we do with computers were once thought to be barmar> too complex to program, but that doesn't mean we shouldn't have barmar> tried to implement them. Indeed, but such difficult things, especially for languages that have not been designed for them, should not be put into the reliability bottleneck of an entire system. That's why I think that algorithmic transformation should be done in an independent module, and on the source code. It's a plain question of responsibility. The programmer takes responsibility for what is fed to the compiler. This ought to be clear. It is the programmer's task, IMNHO, to ensure that the source is OK and efficiently written -- the compiler's responsibility should be to ensure that it is faithfully translated. If risky technology hs to be used, it should be used by the programmer, and it should be evident in the source. I am sick of having to disassemble compiler output to see where the optimizer has botched it. I'd rather like that the effort put in the pursuit of code generator level aggressive optimizers were put into source code level rewrite assistants, and in making compilers more stable, efficient, and reliable at doing translation, not rewriting. pcg> No, because even if two combinations of primitives can actually be pcg> shown in the given context to be equivaltn according to the pcg> language's algebra, it ought to be assumed that the programmer pcg> intended to write the particular one he chose to write for some pcg> good reason. 
barmar> Also, many programs are targeted for multiple (or unspecified) barmar> physical machines and/or compilers, so the programmer may not be barmar> able to choose language primitives that are good for all the barmar> target environments. This is damn good reason to use higher level languages, and let the compiler translate their primitives into locally efficient versions. It is not good reason for the compiler to rewrite pieces of program. pcg> Are there alternatives? pcg> Yes, rewriting the combinations of primitives not in the code pcg> generator, but at the source code level, either manually or pcg> automatically. barmar> Are you saying that this should be done in a module that is barmar> independent of the target machine and compiler? Certainly of the compiler. Possibly of the machine. Such a module may be either a source level rewrite assistant, or a librarian, or both. Consider the Amsterdam Compiler Kit; they define a tolerably well specified machine independent intermediate code, and then work on that. I don't like that a lot, but it is a defensible approach. I still think it is too hazardous, because the intermediate code generated by the compiler may well be equivalent to the source, but maybe not transformations of it. Demonstrating that the intermediate code output of the compiler is equivalent to the source under all possible transformations of it is not that simple. barmar> I was under the impression that most complex optimization was barmar> already beind done using source-source translation, Not really. What happens is that the program is translated into some very low level representation, this is analyzed and rewritten, and the rewritten form is translated to machine code. Most of the time this very low level representation is not terribly well defined. barmar> For instance, strength reduction and common subexpression barmar> elimination translate the source into source with an additional barmar> temporary variable. Well, no. 
You do not see the source. If you did, this would be a rewriting assistant. barmar> Also, I don't think it matters what the destination of the barmar> rewrite is. Ah, as long as things work. But making things work is the problem -- it is easy to say that aggressive optimization is either semantics preserving and OK, or it is broken, as Jim Giles says. The problem is to show that it is, not to assume that it can be made so. The engineering problem is to demonstrate that it does not add to the unreliability of a compiler, contrarily to appearances, and that hiding it in the code generator does not lower the confidence level in the correctness of compiled programs. I don't think I ever claimed that aggressive optimization should be avoided even if it is bug and cost free. The problem is whether it can be made so, and at what price, and if a more conservative and step-by-step approach should be used. barmar> I think the hard part is recognizing large combinations of barmar> primitives and reasoning about them. barmar> Transformation bugs usually result from unexpected interations barmar> between seemingly unrelated portions of the code (e.g. due to barmar> use of global variables or aliasing). In other words they result from underspecified or poorly designed language algebra. On the order hand some languages have been designed for easing semantic tranformations: barmar> This is one reason why much of the program transformation barmar> research has been done with functional languages: the absence of barmar> side effects makes semantic analysis much easier. Precisely. But this to me is somewhat extreme. I wish there were a better way. I have my own ideas on the subject, but let's keep them out of the way for now. pcg> Yes, using a language whose primitives are at the abstraction level pcg> desired, i.e. the primitives offered by the language are those pcg> needed for the application. 
barmar> Unfortunately, the computer industry doesn't abound with barmar> application-specific languages. And even if there were, finding barmar> programmers proficient in many of them would be hard. Ah. Here we get outside the scope of this newgroup: we should be discussing here what *ought* to be done. This is not biz.compiler.marketing or soc.programming.education. I have repeatedly conceded that the "dusty deck" or "dusty programmer" or "dusty language" problem means that the quick fix solution may be indeed the tricky job of doing aggressive optimization. I still think however that even in thios common but pathological case source level rewriting, which is a very interesting growth industry in the commercial sector (and this is a remark that should appear in biz.compsci.opportunities), would be better. pcg> Are the alternatives cost effective? pcg> Yes, because for *many* even if not all codes the time critical pcg> portion is small enough that careful automatic rewriting is pcg> feasible. Sorry, I meant to write "manual rewriting". barmar> How does the automatic rewriter know which parts are time barmar> critical? In many cases, program optimization is an incremental barmar> process: you find the bottleneck, rewrite it efficiently, find barmar> the new bottleneck, rewrite it, etc., until the overall barmar> efficiency meets the goals. Precisely, precisely. That's why I meant "manual". Naturally "manual" in this case may also mean "assisted by tools", but still under programmer control. As to automatic rewriting, instead: pcg> Yes, because automatic rewriting can be to a higher level language, pcg> which can then be more easily and non aggressively optimized, and pcg> automatic rewriting gives long lasting benefits in portability and pcg> confidence in the program. barmar> Ah, now I see that you're talking about *replacing* the source barmar> file with this automatically rewritten source file. Yes, yes. 
barmar> I thought you were talking about source-level transformations barmar> internal to the compiler (usually operating on the parse-tree, barmar> not the actual source text). Since the purpose of the barmar> transformation is to produce code that will map better by a barmar> specific compiler onto a particular hardware, I don't see the barmar> portability benefit of saving the transformed code. But I wrote about automatic rewriting to a higher level language, or the same language with calls to library procedures instead of ad hoc codes... This solves the problem at the root. You eliminate the need for aggressive optimization, and you get better maintainability and portability. pcg> Yes, because using a language whose primitives are at the level of pcg> the problem is known to reduce costs and improve quality via reuse pcg> of the language's implementation or libraries, where otherwise the pcg> implementation of thsoe primitives would have to be rewritten and pcg> re-aggressively optimized for every application. barmar> This is true if you can find such a language and programmers who barmar> know it. If the application is not very common then you will barmar> probably have to spend quite a bit to purchase or develop the barmar> language implementation. Subroutine libraries for general barmar> purpose languages can often do nearly as well and are much barmar> easier to come by. Indeed. That is why I say that the nice alternatives to aggressive optimization are either a higher level language compiler or a librarian. >They are legal but dangerous and not cost effective, according to the >above discussion. barmar> You made the statement that these optimizations don't buy much, barmar> but didn't back it up with any evidence. Bah. Hand waving hard here, the impression seems that on general purpose architectures we get improvements in the 10-30% range. 
For special purpose architectures running specific application classes the payoff may be much higher, but here the issue of whether the language is well matched to the application and the target machine becomes much much more important. chased> The compiler-writer's claim is "I can prove that these chased> optimizations conform to the standard"; pcg> STOP THE PRESSES! Compiler writer proves that he can do program pcg> rewriting in a bug free way! Program synthethizers must be round pcg> the corner! Major software companies stocks collapse! barmar> Very little can be *proven* about any program; there are just barmar> too many variables and dependencies. A compiler writer who barmar> makes such a statement is merely asserting that he's as barmar> confident about his optimizations as he is about the rest of his barmar> compiler. Which must be supported by good evidence, because reasoning says that adding complexity, especially about murky areas of language definitions, and especially where AI like behaviour is expected, to a program normally does lower it reliability. Anectodal evidence supports the impression that aggressive optimization in the code generator is still a black art rather than a science. It is up to the proponents of aggressive optimization to demonstrate that it does not add to the complexity or reliability of a program, and that it is a better alternative than rewriting, manually or automatically, the program in a higher level form more amenable to straightforward code generation, and that the benefits so gained are worth the effort. -- Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
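[Grandi's earlier point in this article -- that even substituting a
shift for a division by a power of two is not semantics-preserving in
all languages -- can be made concrete in C. The function names below are
illustrative; two's-complement arithmetic shift is assumed, since
right-shifting a negative int is implementation-defined in C.]

```c
#include <assert.h>

/* Integer division truncates toward zero; an arithmetic right shift
 * rounds toward minus infinity.  The two agree for non-negative
 * operands but differ for negative ones, so rewriting x/4 as x>>2 is
 * only correct when the compiler can show x is non-negative -- exactly
 * the kind of proof obligation the thread is arguing about. */
int div_by_4(int x)   { return x / 4; }
int shift_by_2(int x) { return x >> 2; }
```

For example, -7/4 is -1, while -7>>2 is -2 on a two's-complement
machine.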
pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/03/90)
On 1 Nov 90 22:33:35 GMT, jlg@lanl.gov (Jim Giles) said:

jlg> From article <PCG.90Nov1171210@odin.cs.aber.ac.uk>, by
jlg> pcg@cs.aber.ac.uk (Piercarlo Grandi):

pcg> In a way or another this implies that each language implies a virtual
pcg> machine. The task of the code generator is to map the virtual machine
pcg> onto the real machine as well as it can, which means faithfully
pcg> (semantics) and efficiently (pragmatics).

jlg> Unfortunately for your argument, there are few languages which have
jlg> definitions of this kind. The definition of a language usually
jlg> implies a _class_ of virtual machines.

Yes, of course a virtual machine may be left unspecified here and there.
You are indeed allowed to view such a virtual machine as a possibly
infinite class of more precisely defined virtual machines. Nominalism.

jlg> The code generator is allowed to map any of these onto the real
jlg> hardware. Any optimizer which generates code which faithfully
jlg> implements anything within the class of acceptable behaviours is
jlg> correct.

Again, the problem is not to show that aggressive optimization is
desirable if it is assumed bug free. Indeed this idea is a red herring:

jlg> If an optimizer leaves the class of acceptable behaviours, it is
jlg> _broken_.

Write or find a compiler, optimizing or not, for a popular language that
is not _broken_ according to your definition. Of course you cannot.

Now, the interesting question is how much of a liability, in terms of
reduced confidence, aggressive optimization is. Reasoning shows that in
principle it lowers the reliability of a compiler. Show that this is not
true, or that if true there are compensating advantages, and quantify
your argument. The burden of proof is on the proponents of aggressive
optimization, because they have to overcome the obvious observation that
anything that enlarges or complicates a program will tend to decrease
its reliability.

jlg> You are trying to limit the compiler to code transformations that
jlg> _you_ regard as straightforward. In truth, the compiler is only
jlg> limited by the language definition.

Only in a theoretical world. It is also limited by what is cost
effective, in the sense above. The argument here is not that aggressive
optimizations are illegal or impossible; it is whether they are the best
alternative, from an engineering point of view, when compared to
straightforward code generation plus source level rewriting assistants.

--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/03/90)
On 1 Nov 90 23:46:28 GMT, chased@rbbb.Eng.Sun.COM (David Chase) said:
chased> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
chased> The compiler-writer's claim is "I can prove that these
chased> optimizations conform to the standard";
pcg> STOP THE PRESSES! Compiler writer proves that he can do program
pcg> rewriting in a bug free way! Program synthesizers must be around the
pcg> corner! Major software companies stocks collapse!
chased> Read your own posting, please, and note that what you quote did not
chased> say what you say it said. "I claim I can prove" is most definitely
chased> not the same as "I have proved".
Ahem. Agreed, it is not the same. This is a bit too easy... :-)
chased> It means that the steps in optimization are based on little
chased> proofs, and sketches of proofs; it doesn't mean that there is a
chased> big book proving the compiler correct.
It is so easy to convince oneself that things will work...
chased> Such a book *could* be written if the need existed (the
chased> existence of dire need would be demonstrated by the exhibition
chased> of wheelbarrows full of money to finance its writing).
Point taken. However, *you* (not you personally, of course) are
proposing new untried and potentially hazardous technology. The burden
of proof is on you and your customers. If one believes without proof, in
the presence of arguments that counsel diffidence, amen.
chased> And, regarding, the tobacco industry analogy (an ad hominem analogy,
chased> too -- aren't vile insults what one uses when arguing from a weak
chased> position?),
I am sorry that it was perceived as insults and not grave humour. And
certainly it was not meant to be ad hominem. Hey, I actually like David
Chase (and Jim Giles). Good discussions, even if we do not necessarily
agree, are excellent learning exercises, as layers of assumptions and
different world views are exposed. And no doubt David Chase and Jim
Giles are people one can learn from. OK?
chased> I will suggest that you have no evidence, only anecdotes,
chased> for the alleged pervasive bugginess of optimizing compilers.
For starters I produce the general principle that a bigger and more
complicated program, unless proven otherwise, has to be presumed to be
buggier. There is a discipline called software physics, a branch of
software engineering, that collects such grim statistics.
chased> Certainly you have nothing like the evidence gathered to
chased> demonstrate a connection between tobacco and lung disease, and
chased> only hand-waving claims as to the cause-and-effect link.
There is a large body of evidence about a strong correlation between
complexity and size and bugginess (note that it seems that over a
program's lifetime the number of bugs seems to be related *only* to its
size and complexity, not to debugging effort, etc..., for a given level
of programming expertise).
chased> I *am* selling my skills, but I fail to see that there is
chased> anything pernicious about that.
Nothing wrong about that, except that your compilers do not carry the
warning "The President of ACM reminds you that aggressive optimization
may be hazardous to your program's correctness" :-). Again you have to
prove or at least argue that the well known general principle that
bigger and hairier means buggier does not apply to your aggressive
optimizers.
If you were using the same technology in source level rewriting
assistants things would be different -- the burden of ultimate
responsibility for the program logic's reliability shifts from you to
the programmer.
chased> It is also unclear that your claims about the speed/risk trade-offs
chased> are in any way connected with reality. Speed gained for you by an
chased> optimizing compiler is speed that you don't have to work for by
chased> hand-coding (though both can be used together to get even more
chased> speed-up if that is necessary), and the time saved by the programmer
chased> is more time that can be spent debugging and testing the code.
Again, what about the alternative of unbundling the risky rewriting
logic from the compiler and putting it in a source level rewriting
assistant? This is my favourite alternative. You could have as much fun
and business (and probably more) writing a source level rewriting
assistant as an aggressive optimizer, and it would be cleaner
engineering, and probably even more effective as to results.
chased> Every programmer I know has a bug introduction rate higher than
chased> any compiler that I've ever used (but this is purely anecdotal
chased> -- perhaps it is completely different for other people, eh? do
chased> *you* find more compiler-introduced bugs than author-introduced
chased> bugs in *your* code?), so it is in fact more reliable to let the
chased> compiler do the optimizing, even if you don't use the time saved
chased> for additional testing and debugging.
and also:
chased> I've said several of the things that I said in this article before --
chased> you might seriously consider comparing the bug-introduction rate of
chased> programmers, both skilled and ordinary, with the bug-introduction rate
chased> of several "aggressively optimizing" compilers.
Ah no, this cannot stand. The compiler does a completely different job
from the programmer -- the programmer must *invent* new code, and show
it respects some specification; the compiler must merely translate it,
showing that the translation is done according to hopefully well defined
rules. As long as things remain like this I am happy. Unfortunately
aggressive optimization is a step away from this, because it requires an
element of invention, and using less well defined rules.
chased> You might also check to see if there are any trends in compiler
chased> bug rates over time, and see how that correlates with the
chased> optimizations performed by those compilers. I think you'll find
chased> that your claims are not supported.
This would be an interesting exercise for supporters of aggressive
optimization. Maybe they would be able to show that the general rule is
not true for aggressive optimizers, or that the increased level of
problems is compensated by the gains in resource economy. Maybe not. I
would like to remind you that I have no claims -- merely skepticism
about *their* unsupported claims, and the idea that motivated skepticism
counsels caution, much caution.
--
Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
grover@brahmand.Eng.Sun.COM (Vinod Grover) (11/03/90)
In article <PCG.90Nov2231021@athene.cs.aber.ac.uk> pcg@cs.aber.ac.uk
(Piercarlo Grandi) writes:
>The burden of proof is on proponents of aggressive optimization, because
>they have to overcome the obvious observation that anything that
>enlarges or complicates a program will tend to decrease its reliability.

I fail to see how an optimiser can complicate a program if it performs
its transformation at an intermediate level. The source program written
by the programmer is *not modified* in any way. The translation to a
complex intermediate form, if correct, does not affect the reliability
of the source program.

Vinod Grover
Sun Microsystems
eric@snark.thyrsus.com (Eric S. Raymond) (11/03/90)
In <3674@stl.stc.co.uk> Tom Thomson wrote:
> one to one translation? back in the 60s we used to talk about
> the Fortran assembler; even earlier we had machines like Deuce
> where the assembler would fix the placement of instructions for
> you to minimise drum latencies, which was maybe one-to-one still
> but very much against the spirit of your proposed restriction on
> assemblers.

I don't think your examples do much to challenge the previous poster's
contention; they're all neolithic-era or older! These `optimizing'
assemblers were all designed before the idea of HLLs had matured, in
response to a *very* different set of tradeoffs than drives today's
relatively hard distinction between HLLs and assemblers.

I think it's a little too restrictive to characterize assemblers as
one-to-one instruction translators. The essential functional difference
is that HLL code is *opaque*; you can't look at it and write the
corresponding machine code sequence without knowing global information
such as name-to-stack-offset mappings. Assembler code, on the other
hand, is transparent; you can look at it and see where all the bits
will fall.

"But wait!", I hear you say. "What about... macroassemblers?". Yes,
they're a bit of a hybrid. But hey, when's the last time you heard of
anybody *using* one? One of the effects of maturing compiler technology
for HLLs (and especially of the spread of C) has been to `purify'
assemblers; from the point of view of a modern language designer,
MACRO-10 was an overengineered wrong turn, like a Zeppelin with
stained-glass windows.

The pcc design made an important evolutionary step by actually
generating assembler source rather than object records. This completed
the separation of function -- the HLL for abstraction handling, the
assembler for bashing instruction bits into place.

I think there are all kinds of good engineering reasons to believe this
separation of function between HLL and assembler will be a stable
feature of future development-environment design for compiled languages.
From that modern point of view, the gentleman making the `one-to-one'
claim was correct in concept if not in detail.

--
Eric S. Raymond = eric@snark.thyrsus.com (mad mastermind of TMN-Netnews)
cik@l.cc.purdue.edu (Herman Rubin) (11/04/90)
In article <1849@exodus.Eng.Sun.COM>, chased@rbbb.Eng.Sun.COM (David
Chase) writes:
> diamond@tkou02.enet.dec.com (diamond@tkovoa) writes:
> >C was invented in order to be that better assembler. If you want machine
> >level, you're supposed to program in C. That is exactly the purpose for
> >which C was created. If you want an HLL, use one; don't cripple C.
....................
> 1) several optimizations could be applied much more frequently to C if
> it weren't for possible aliasing in procedure pointer/array
> parameters. If you don't understand what I'm talking about, go read a
> book on dependence analysis -- Wolfe's _Optimizing Supercompilers ..._
> will give you a feel for what can be done to Fortran.

It is possible that this was the intention for C, but I do not know of
any machine for which it succeeds in being a better assembler. I believe
it is possible to achieve that purpose, but not in a language as limited
as C.

The big problem with aggressive optimization is that the compiler cannot
have the necessary information to know what is safe. The present
"solution" is either to only allow the compiler to do what is known, or
at least believed, to be safe, or to restrict the programmer so that the
unsafe conditions supposedly cannot arise.

There is a way out of this, which I do not believe has been tried. That
is, have the compiler ask the programmer! It may even be that the
programmer does not know, but can find out.

--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin(UUCP)
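[A minimal illustration of the aliasing problem under discussion; the
function is hypothetical. Because C as of this thread has no way to
declare that dst and src never overlap, the compiler must assume each
store through dst may change *src -- the information only the programmer
has, which is Rubin's "ask the programmer" point.]

```c
#include <assert.h>

/* With possible aliasing, *src must be re-read on every iteration: if
 * dst and src point into the same array, each store through dst may
 * change *src.  A compiler that hoisted *src into a register would be
 * wrong for the aliased call below. */
void scale(int *dst, const int *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = *src * 2;      /* *src reloaded each time */
}
```

Calling scale(a, a, 3) on a = {1, 0, 0} demonstrates the dependence:
the first store changes the value all later iterations read.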
pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/05/90)
On 3 Nov 90 04:16:26 GMT, grover@brahmand.Eng.Sun.COM (Vinod Grover) said:
grover> In article <PCG.90Nov2231021@athene.cs.aber.ac.uk>
grover> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
pcg> The burden of proof is on proponents of aggressive optimization,
pcg> becasue they have to overcome the obvious observation that anything
pcg> that enlarges or complicates a program will tend to decrease its
pcg> reliability.
grover> I fail to see how an optimiser can complicate a program if it
grover> performs its transfromation at an intermediate level. The source
grover> program written by the programmer is *not modified* in any way.
Sorry if it was not clear from what I wrote that the program I was
referring to was the *compiler*.
grover> The translation to a complex intermediate form, if correct, does
grover> not affect the reliability of the source program.
I intended to say that the code that effects the transformation adds to
the size and complexity of the compiler and therefore affects the
compiler's reliability. Each individual transformation may look simple
and safe, but once you consider that you also need code to do the
analysis that makes possible and drives the transformation process, and
you also need to consider the interactions between transformations,
which are difficult to model in languages that have not been designed
with transformations in mind... You have a large amount of hairy code in the
compiler's bowels, and this is baaaad.
--
Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
jlg@lanl.gov (Jim Giles) (11/06/90)
From article <PCG.90Nov2231021@athene.cs.aber.ac.uk>, by
pcg@cs.aber.ac.uk (Piercarlo Grandi):
> [...] Now, the interesting question is how much of a
> liability in terms of reduced confidence is aggressive optimization.
> Reasoning shows that in principle it lowers the reliability of a
> compiler. Show that this is not true, or that if true there are
> compensating advantages, and quantify your argument.

You are making the assumption that optimization techniques reduce the
reliability of the compiler. This is certainly true for esoterica. But
you have also attacked the old standby techniques as well.

If I were writing a compiler for _reliability_ alone, I would certainly
make a complete data flow graph and a complete analysis of control flow.
These techniques have been around for decades, are stable, and much
theoretical work (including correctness proofs) has been done: in short,
they are the most reliable tools, after automatic parser generators,
available in compiler work. Having done this, it is difficult (and
therefore less reliable) _NOT_ to apply the obvious code generation --
which would be code which has been optimized in the following ways (and
probably more):

   Constant folding
   dead code elimination (given the above: any branch _known_ not to be taken)
   common expression elimination
   loop invariant detection
   strength reduction
   ...

The last three are all optimizations that you have strongly opposed in
the past. Yet, in a modern compiler, it would be _less_ reliable not to
apply them. So, don't pretend your objections are based solely on
reliability.

J. Giles
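[For concreteness, a hedged before/after sketch of the three contested
transformations on a made-up fragment: loop invariant detection,
common (sub)expression elimination, and strength reduction. Both forms
compute the same sum, overflow aside; the "after" form is what the
optimized code amounts to, written back as source.]

```c
#include <assert.h>

/* Before: as a programmer might write it. */
int sum_scaled(const int *a, int n, int k)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i] * (k + 1) + a[i] * (k + 1);   /* repeated expression */
    return s;
}

/* After: (k + 1) is loop invariant and hoisted out; the common
 * subexpression a[i] * (k + 1) is computed once into a temporary; the
 * multiplication by two is strength-reduced to an addition. */
int sum_scaled_opt(const int *a, int n, int k)
{
    int k1 = k + 1;                 /* loop invariant hoisting */
    int s = 0;
    for (int i = 0; i < n; i++) {
        int t = a[i] * k1;          /* common subexpression elimination */
        s += t + t;                 /* strength reduction: 2*t -> t+t */
    }
    return s;
}
```

This is also exactly Margolin's earlier point about these optimizations
amounting to source-to-source rewrites with an extra temporary variable.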
jlg@lanl.gov (Jim Giles) (11/06/90)
From article <2701@l.cc.purdue.edu>, by cik@l.cc.purdue.edu (Herman Rubin):
> [...]
> There is a way out of this, which I do not believe has been tried. That is,
> have the compiler ask the programmer! It may even be that the programmer
> does not know, but can find out.

And yet, Herman Rubin is one of those who opposed my proposal for
explicit "aliased" attributes together with a feedback mechanism in the
loader for indicating when the user's explicit directives were
mismatched. The specific purpose of these was to allow the user to
safely direct that optimizations were allowed (by not aliasing things)
or to tell the compiler not to optimize specific combinations of
variables (which are aliased). Further, the proposal was made in a
specific way so that the user would _always_ know whether aliasing were
needed or not. Oh well.

J. Giles
chased@rbbb.Eng.Sun.COM (David Chase) (11/06/90)
pcg@cs.aber.ac.uk (Piercarlo Grandi) writes: > Point taken. However, *you* (not you personally, of course) are > proposing new untried and potentially hazardous technology. The burden > of proof in on you and your customers. If one believes without proof, in > the presence of arguments that counsel diffidence, amen. I don't think that the technology is as new or untried as you seem to think. The dependence-based code manglers use information that is gathered (for Fortran, at least) in a reliable fashion (for C, it's not clear how much useful information there is without either pragmas or inlining), and has been used in "vectorizers" and "parallelizers" for about the last ten years. There's been plenty of proofs to show that dependence information means what it claims to mean. The set of optimizations that one hopes a modern compiler would perform (e.g., strength reduction, constant propagation, hoisting of loop invariant code, register allocation, reassociation, linear function test replacement) have also all been around for some time, and thus are not untried, and are no more "potentially hazardous" than scanning and parsing. (Read on for a code size comparison.) > For starters I produce the general principle that a bigger and more > complicated program, unless proven otherwise, has to be presumed to be > buggier. Agreed, generally true. A quick glance around the programs that I run every day reveals that the (text of the) optimizer (for C, Fortran, Pascal, and Modula-2) is smaller (often by an order of magnitude) than vmunix, gnuemacs, dbx, cfront, and several language front ends. So, should I worry more about bugs in the front ends, bugs in the editor, bugs in the OS, bugs in the debugger, or bugs in the optimizer? Although, if people cared that much more about reliability than they do about speed, I think we'd see a lot more people programming in languages other than Fortran, C, and C++. 
Of course, that's another discussion, but it probably gives some indication of what people expect from their compilers (besides perfection, of course). David Chase Sun Microsystems
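[The standby optimizations Chase lists are simple to illustrate. A minimal C sketch (an editorial example; the function names are hypothetical, and the "after" form is what a compiler derives internally, here written out by hand) of loop-invariant hoisting and strength reduction:]

```c
/* Before: n * n is recomputed on every iteration, and the a[i]
   subscript implies a multiply-and-add to form each address. */
long sum_before(long *a, long n) {
    long s = 0;
    long i;
    for (i = 0; i < n; i++)
        s += a[i] * (n * n);    /* n * n is loop-invariant */
    return s;
}

/* After: the invariant n * n is hoisted out of the loop, and the
   subscript arithmetic is strength-reduced to a stepping pointer. */
long sum_after(long *a, long n) {
    long s = 0;
    long nn = n * n;            /* hoisted invariant */
    long *p, *end = a + n;
    for (p = a; p < end; p++)   /* reduced induction variable */
        s += *p * nn;
    return s;
}
```

[Both functions compute the same value; the question debated in this thread is whether the second form should be written by the programmer or derived by the compiler.]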
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)
In article <2672@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes: > I have no objection to the compiler helping to optimize, and even making > suggestions to the programmer. But I object to the first sentence; there > should be NO details of the machine language, instruction overlap, etc., > hidden from the programmer. It is, unfortunately, impossible to standardize a language that makes allowances for every hardware known to man. ---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)
In article <4950@avocado5.UUCP> wallach@motcid.UUCP (Cliff H. Wallach) writes: > In article <TED.90Oct26124614@kythera.nmsu.edu> ted@nmsu.edu (Ted Dunning) writes: > -if you compile the following code on a sun3 with -O4, [ compiled array code ] [ obviously faster hand code ] I observe that when I convert the original code into pointer code, the compiler produces very similar results to the hand version. Sure, Jim. Arrays are as efficient as pointers. Uh-huh. Now I know why all my array code has always seemed so slow on Suns. ---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)
In article <4179@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes: > Surely the cardinal rule for an assembler is not "one-to-one translation" > but "full access to all machine operations". Exactly! As a matter of principle I'm even willing to argue that the programmer should be able to exert control over instruction scheduling, even though I am reasonably confident that the compiler does this well. > Why is it evil to offer > optimisation as well? It's not. I hope we can get Herman to agree on this. ---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)
In article <1932@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes: > I will suggest that you have no evidence, only anecdotes, > for the alleged pervasive bugginess of optimizing compilers. He says this while another comp.lang.c poster complains about an optimizer bug, while I find that a new optimizer crashes on an empty loop, and while someone else sees an optimizing Fortran compiler that still produces incorrect code for overly complex control structures. ---Dan
peter@ficc.ferranti.com (Peter da Silva) (11/06/90)
What's wrong with: if(array_a_and_array_b_overlap) use_safe_code else use_fast_code_that_doesn't_work_for_aliased_arrays. The whole alias problem is a compiler technology problem. -- Peter da Silva. `-_-' +1 713 274 5180. 'U` peter@ferranti.com
pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/07/90)
On 5 Nov 90 22:15:54 GMT, jlg@lanl.gov (Jim Giles) said: jlg> From article <PCG.90Nov2231021@athene.cs.aber.ac.uk>, by jlg> pcg@cs.aber.ac.uk (Piercarlo Grandi): pcg> [...] Now, the interesting question is how much of a liability in pcg> terms of reduced confidence is aggressive optimization. Reasoning pcg> shows that in principle it lowers the reliability of a pcg> compiler. Show that this is not true, or that if true there are pcg> compensating advantages, and quantify your argument. jlg> You are making the assumption that optimization techniques reduce jlg> the reliability of the compiler. This is certainly true for jlg> esoterica. But, you have also attacked the old standby techniques jlg> as well. Whether the technique is well understood or not does not matter that much -- the *algorithm* may be perfectly safe (even if proving so in the context of a language whose semantics have not been designed for transformations is quite hard, even for seemingly simple things like common subexpression elimination); the hazard comes from the amount of code needed to implement it. The code may be buggy -- actually *it will be*. More code, more bugs. Do we want 100 thousand lines of compiler to debug when 10 thousand would do it? No... Naturally not all those 100 thousand are involved in the critical translation business, but the translation business had better be as streamlined as possible. More code, more bugs, no question about it. It is not an assumption. Anything that is not related to straight translation means more bugs. jlg> If I were writing a compiler for _reliability_ alone, I would jlg> certainly make a complete data flow graph and a complete analysis of jlg> control flow. These techniques have been around for decades, are jlg> stable, and much theoretical work (including correctness proofs) has jlg> been done: But these techniques require pretty long and elaborate code. 
Again, if we assume (and this is indeed a crazy assumption) that we can find an algorithm to perform hairy transformations without making mistakes and then implement the algorithm without bugs, then aggressive optimization is a no-loss situation (except for compile time and space), and whatever its benefits on run time and space we should do it. It also wins if we think the program is a "dusty deck" or the programmer is a "dusty programmer". But in the latter case programmer's assistants at the source level seem an even better win. jlg> in short, they are the most reliable tool after automatic jlg> parser generators that are available in compiler work. This is a quite extraordinary statement -- you may be implying that parsing C's syntax, which has been designed for easy context-free LL(1) parsing, is something as straightforward and reliable as rewriting sequences of C's statements, which have not been designed for it? (note that C's *expressions* have actually been designed for rewriting, instead) jlg> So, don't pretend your objections are based solely on reliability. Never claimed this -- the claim is that anything that complicates a crucial system component like a code generator without demonstrated and quantified cost/benefit analysis should be treated with caution, especially if the complication is about doing something that was not anticipated at design time. My argument is that such potentially risky business should be a separate program, one that it is the responsibility of the programmer to invoke and guide, and whose results the programmer has to check. This may look like too much work, but aggressive optimization is expected to make a significant difference, like most any optimization, in a few cases, and these may deserve attention. -- Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
chased@rbbb.Eng.Sun.COM (David Chase) (11/07/90)
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >chased@rbbb.Eng.Sun.COM (David Chase) writes: >> I will suggest that you have no evidence, only anecdotes, >> for the alleged pervasive bugginess of optimizing compilers. > >He says this while another comp.lang.c poster complains about an >optimizer bug, while I find that a new optimizer crashes on an empty >loop, and while someone else sees an optimizing Fortran compiler that >still produces incorrect code for overly complex control structures. That's still just anecdotes. You've found what, maybe one bug per compiler? That's not pervasive. Note the nature of the bugs -- you've got one bug in a beta-test compiler (GCC -- this is not a slam on GCC either; most people don't release their beta compilers for the world to tinker with), one bug that, though annoying, won't cause you to go debugging down a rathole, and one bug that will actually cause a programmer to waste time. I'd call that last bug serious. I've never claimed that optimizers were bug-free, but I will claim that the bug rate is acceptably low for the improved performance that they offer, especially if compared with other methods of obtaining that performance (i.e., hand-tweaking of assembly language, source-to-source transformations). People make mistakes. Furthermore, in the sad event that a compiler has a bug in it, I can often write down a description of the bug and put it on my list of things to avoid/watch for(/repair?) in the future. Human beings, however, are creative, and make *new* mistakes. People should do the optimizations that compilers cannot hope to get (i.e., selection of appropriate algorithm), and leave the repetitive twiddly stuff for compilers. 
Furthermore, even after extensive application of hand-optimizations, optimizing compilers can still yield performance improvements by performing optimizations not expressible (within bounds of tedium) in the source language (note that some of these optimizations still fall into the category of "aggressive" optimizations -- e.g., movement of non-volatile loads and stores for scheduling, software pipelining, tail-call elimination). Arg. I'm sure I sound like a broken record, but I'm sure glad I stayed out of the pointer-and-arrays fiasco. David Chase Sun Microsystems
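[Tail-call elimination, one of the transformations Chase mentions, can be sketched as follows (a hypothetical illustration; the compiler performs the second rewriting internally rather than in source):]

```c
/* Tail-recursive form: the recursive call is the last action,
   so no work remains to be done in the caller's frame. */
long gcd_rec(long a, long b) {
    if (b == 0)
        return a;
    return gcd_rec(b, a % b);   /* tail call */
}

/* What tail-call elimination effectively produces: the call is
   turned into a jump back to the top, reusing one stack frame. */
long gcd_loop(long a, long b) {
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}
```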
jlg@lanl.gov (Jim Giles) (11/07/90)
From article <2133:Nov607:16:1090@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein): < In article <4950@avocado5.UUCP> wallach@motcid.UUCP (Cliff H. Wallach) writes: <> In article <TED.90Oct26124614@kythera.nmsu.edu> ted@nmsu.edu (Ted Dunning) writes: <> -if you compile the following code on a sun3 with -O4, < [ compiled array code ] < [ obviously faster hand code ] < < I observe that when I convert the original code into pointer code, the < compiler produces very similar results to the hand version. < < Sure, Jim. Arrays are as efficient as pointers. Uh-huh. I have never made the claim that arrays _are_ as efficient as pointers. I have made the claim that _most_ code I've seen is _faster_ with arrays. I've also frequently pointed out that a _bad_ compiler can make anything happen. The _basic_ claim I always make is that there is no reason at the present state of the art for arrays to ever be slower than the corresponding pointer implementation. If a given implementation implements arrays slower, then it isn't state of the art. By the way, I frequently make rather strong remarks about the bad quality of C compilers. Recently someone pointed out that there is a sound reason for it. Since the optimizations which I have been discussing are heavily impaired by the presence of aliasing, and since things are almost always aliased in a C context, C compiler writers simply don't bother to do those optimizations. It seems plausible that a technique which is almost always inhibited is not really worth implementing. So, any of you people out there writing C compilers, I apologize if my remarks offend - there's nothing you can do about it if the language itself makes it useless to optimize. Still, I would expect a truly "production quality" compiler to do well with arrays - at least, if _I_ have to pay for it. J. Giles
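[For readers joining late, the two idioms at issue look like this (an illustrative sketch; the function names are invented). Giles' claim is that a state-of-the-art compiler should produce identical code for both:]

```c
/* Array-index form: the style Giles argues should never be slower. */
void scale_index(double x[], int n, double k) {
    int i;
    for (i = 0; i < n; i++)
        x[i] = x[i] * k;
}

/* Hand-written pointer form: the style the Sun cc of the time
   rewarded, per the earlier posts in this thread. */
void scale_ptr(double *x, int n, double k) {
    double *end = x + n;
    for (; x < end; x++)
        *x = *x * k;
}
```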
smryan@garth.UUCP (Steven Ryan) (11/07/90)
Beat me! Whip me! Make me use an optimiser! Isn't this kind of silly? Are there really compilers that force people to use the optimiser or is it optional? If it's optional and you don't want, don't use. If you can buy a compiler without an optimiser (which is easy), do so. If nobody wants an optimiser, nobody buys them. If somebody wants one, why not let them buy it? Leather. Windows. And chains. Oh! my! -- ...!uunet!ingr!apd!smryan Steven Ryan ...!{apple|pyramid}!garth!smryan 2400 Geng Road, Palo Alto, CA
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/07/90)
In article <2213@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes: > You've found what, maybe one bug per > compiler? And the bugs keep coming, and coming, and coming, and coming ... I think we've successfully reduced this argument to a simple question: ``Do optimizers gain more speed than they lose reliability?'' Again, I'm firmly convinced that heavy optimizations are on the latter side. It's pretty clear that you are on the former side. In any case I think the argument is finished. ---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/07/90)
In article <6PX6-O@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes: > What's wrong with: > if(array_a_and_array_b_overlap) > use_safe_code > else > use_fast_code_that_doesn't_work_for_aliased_arrays. Yes, that's what I've been trying to convince Jim of for most of this year. Note that it cannot be implemented portably in C---remember my question a while back in comp.lang.c about whether there was any way to detect whether a pointer was within a given array[n]? The answer is that there isn't, and you need a language extension to do it right. However, a smart compiler can perform this optimization automatically. In most cases it can do the test at compile time, and generate calls to either the fast compiled version or the slow compiled version. I don't know why Jim repeatedly insists this is impossible. > The whole alias problem is a compiler technology problem. Exactly. ---Dan
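[A concrete sketch of the two-version dispatch (hypothetical code). Two caveats, both consistent with the posts above: the overlap test compares pointers into possibly distinct objects, which is undefined in ISO C and only works on flat address spaces; and the `restrict` qualifier marking the fast path is C99's later answer to exactly this aliasing problem, not something available in 1990:]

```c
/* Fast path: the parameters promise no aliasing, so the compiler
   may reorder and vectorize freely (C99 restrict, an anachronism
   relative to this 1990 thread). */
static void add_noalias(double *restrict d, const double *restrict s, int n) {
    int i;
    for (i = 0; i < n; i++)
        d[i] += s[i];
}

/* Safe path: strict element-by-element order, correct even when
   the two ranges overlap. */
static void add_safe(double *d, const double *s, int n) {
    int i;
    for (i = 0; i < n; i++)
        d[i] += s[i];
}

/* Dispatch: the overlap test itself is the non-portable part. */
void add_arrays(double *d, const double *s, int n) {
    if (d < s + n && s < d + n)
        add_safe(d, s, n);
    else
        add_noalias(d, s, n);
}
```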
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/07/90)
In article <5079@lanl.gov> jlg@lanl.gov (Jim Giles) writes: > I have never made the claim that arrays _are_ as efficient as > pointers. I have made the claim that _most_ code I've seen are > _faster_ with arrays. I've also frequently pointed out that a > _bad_ compiler can make anything happen. Are you insinuating that Sun's optimizing compiler is a bad compiler? > If a given implementation implements > arrays slower, then it isn't state of the art. I'd love to see your brilliant state-of-the-art solution. You're saying that the compiler can figure out rather general assertions about what indices are going to be used where, and can generate code where the pointers (arrays) are pre-indexed appropriately. Could one of you CS types please illustrate to Jim that finding optimal addition chains in a general computation is equivalent to solving the halting problem? ---Dan
jlg@lanl.gov (Jim Giles) (11/07/90)
From article <7177:Nov620:48:1590@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein): > [... if-test around aliased vs. non-aliased code ...] > However, a smart compiler can perform this optimization automatically. In > most cases it can do the test at compile time, and generate calls to > either the fast compiled version or the slow compiled version. I don't > know why Jim repeatedly insists this is impossible. The example given doesn't even contain a procedure call. It has nothing at all to do with the thing that I consistently have asserted is not possible for the compiler _alone_. As it happens, I agree that this case probably has a purely compile-time (or compiled-in run-time) solution. It is your continued claim that the compiler can solve interprocedural aliasing questions by itself (and at compile time) that I disagree with. J. Giles
jlg@lanl.gov (Jim Giles) (11/07/90)
From article <7659:Nov620:58:5990@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein): > [...] Could one of you CS > types please illustrate to Jim that finding optimal addition chains in a > general computation is equivalent to solving the halting problem? Could anyone in Dan's immediate neighborhood please go over to his place and explain what a data dependency graph is? This is one of the most stable and reliable tools available in the compiler writer's kit. It allows one to trace the use of an array index right back to the place where the index was calculated. Or, failing that, to the place where the index came into the present context (a procedure argument, for example). In either case, the compiler can clearly place the add of the array base to that index at _exactly_ the place where the index becomes a "live" value. Or, the compiler can do the add of the array base at any intervening point in the control flow graph (also an extremely stable and reliable part of the compiler writer's art). If the array index becomes a "live" value before the array reference (that is, the index is precomputed), then the whole reference address can be precomputed by the compiler exactly there. J. Giles
gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) (11/07/90)
In article <2133:Nov607:16:1090@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >I observe that when I convert the original code into pointer code, the >compiler produces very similar results to the hand version. Maybe you should have tried gcc -- it does quite a bit better than cc on this simple example, *which includes no alias possibilities*. Real array code has possible aliasing all over the place.
pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/08/90)
On 6 Nov 90 01:21:12 GMT, chased@rbbb.Eng.Sun.COM (David Chase) said:
chased> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
pcg> Point taken. However, *you* (not you personally, of course) are
pcg> proposing new untried and potentially hazardous technology. The
pcg> burden of proof is on you and your customers.
chased> I don't think that the technology is as new or untried as you
chased> seem to think.
Well some of it is. All the parts where one is trying to do serious
program rewriting are akin to program synthesis, which is still a long way from
anything I would depend on. Also, the point is that it is not
*necessary*; it is only used for cost/benefit reasons. It remains to be
shown that the costs are greater than the benefits. Jim Giles says that
any optimizer that is not bug free is _broken_, and thus (I guess from
his tone) irrelevant to our discussion. He therefore dismisses a large
part of the cost side of the equation, which is a bit too easy.
chased> The set of optimizations that one hopes a modern compiler would
chased> perform [ ... have been with us for a long time and thus ... ]
chased> are no more "potentially hazardous" than scanning and parsing.
Ahem. Scanning and parsing are usually context-free, they do not involve
any interpretation of the input (semantics), their algebras are very
well understood indeed and fairly simple, etc... -- I would grant though
that code generation is approaching the level of confidence of parsing
and scanning (thanks to the works of people trying to automate as much
of it as possible), but even that is subject to doubt.
But I concede that many of the optimizations that you mention are indeed
'traditional'; given sufficiently large amounts of time, talent and
effort they could be performed reliably inside a code generator.
But then my second level of argument is triggered -- they may have
become cost-effective for the user (visible performance gain, small
reliability loss), but is it true that investing in an aggressive
optimizer is the best way to achieve that user level cost-effectiveness?
Better training may give greater and longer lasting benefits, using
higher level languages may give greater and longer lasting benefits,
source level programmer's assistant may give greater and longer lasting
benefits. All such technologies are tested and proven (except for
programmer's assistants, unfortunately).
Better training and use of higher level languages do not apply to the
"dusty deck" and "dusty programmer" problems, admittedly, so there
aggressive optimization may be justified, but even then there is some
reason to think that programmer's assistants and source rewriting may be
preferable to aggressive optimization, and more research on that may pay
off better than more research on aggressive optimization.
pcg> For starters I produce the general principle that a bigger and more
pcg> complicated program, unless proven otherwise, has to be presumed to be
pcg> buggier.
chased> Agreed, generally true. A quick glance around the programs that
chased> I run every day reveals that the (text of the) optimizer (for C,
chased> Fortran, Pascal, and Modula-2) is smaller (often by an order of
chased> magnitude) than vmunix, gnuemacs, dbx, cfront, and several
chased> language front ends.
But these comparisons are not that meaningful! Bugs in the editor and in
the debugger are not as dangerous as those in the compiler. Bugs in the
OS are almost as dangerous though. Also, a lot of the code in these programs is
relatively trivial and unimportant -- almost all of an aggressive
optimizer is critical.
Not every type of program has the same bug intensity -- scanners and
parsers can reasonably be expected to have much lower bug intensity than
optimizers, and whatever their intrinsic complexity, they are in any case
*necessary*. An optimizer is not *necessary*; especially not an
aggressive optimizer, given that there are alternatives. A code
generator, a parser, a scanner, and symbol table management are
necessary.
The argument is not, I repeat here for the millionth time, that
optimizers are useless or infinitely buggy or otherwise that if they are
perfect using them is a no loss situation -- it is that, since they are
not necessary, and they are very critical, they must be *proven* to add
more value (performance) than they subtract (*extra*, avoidable bugs),
and that their effect is not best achieved by other means.
chased> So, should I worry more about bugs in the front ends, bugs in
chased> the editor, bugs in the OS, bugs in the debugger, or bugs in the
chased> optimizer?
You should worry a lot about them too! I am no less unhappy about the
size of vmunix, GNU Emacs, gdb and so on than about the size of gcc and
cfront and accomplices. Indeed I am consistent in other newsgroups in
complaining about the large increases in complexity and unreliability,
with little gain in performance (or loss of it) or in functionality,
that these programs give, and offering examples of equivalent technology
that is better designed and provides the same benefits at lower cost.
RISC software!
chased> Although, if people cared that much more about reliability than
chased> they do about speed, I think we'd see a lot more people
chased> programming in languages other than Fortran, C, and C++. Of
chased> course, that's another discussion, but it probably gives some
chased> indication of what people expect from their compilers (besides
chased> perfection, of course).
Let's be temporarily diverted to that other, sociological discussion:
If they cared about speed almost all programmers would be much better
off if they used a higher level language and left code generation to the
compiler, or to a programmer's assistant, or to the authors of a
library. What instead they expect is to write poor code themselves and
have a magic optimizer substitute it with the better code that a higher
level language would have more economically and reliably translated
into.
To get back to the admittedly exaggerated tobacco industry analogy -- if
smokers cared about their health, they would be eating carrots instead,
but they don't. This is maybe a good reason to be a cigarette
manufacturer (there is a market demand) but maybe not a good reason to
say that if people don't care about a problem it is irrelevant.
--
Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
jlg@lanl.gov (Jim Giles) (11/08/90)
From article <PCG.90Nov6181731@odin.cs.aber.ac.uk>, by pcg@cs.aber.ac.uk (Piercarlo Grandi): > On 5 Nov 90 22:15:54 GMT, jlg@lanl.gov (Jim Giles) said: > > [...] Do we want 100 thousand lines of compiler > to debug when 10 thousand would do it? No... Naturally not all those 100 > thousand are involved in the critical translation business, but the > translation business had better be as streamlined as possible. I don't know what 100 thousand lines you are referring to. The techniques I am recommending make compilers smaller and easier to write and maintain. They also, as a side-effect of how they work, carry out several of the optimizations that you are on record as opposing. I would not trust an ad-hoc compiler of the kind you seem to be recommending to get the "hello world" example correct. > [...] > More code more bugs, no question about it. It is not an assumption. > Anything that is not related to straight translation means more bugs. In other words, ad-hoc "straight" translation is better than structured application of known compiler techniques. Since most of the known techniques are off-the-shelf, the compiler writer can concentrate on getting any differences specific to his own task working correctly. > [...] > But these techniques require pretty long and elaborate code. [...] Really? The smallest, quickest-written Modula-2 compiler I've seen has nearly all the techniques that I've been discussing. The goal of the thing was to produce a simple compiler that concentrates on getting the best code generated for the least effort by the compiler. Don't know where this "long and elaborate code" you're talking about is hiding. You must be thinking about the esoterica that I mentioned last time (and which I specifically said I _wasn't_ talking about). J. Giles
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/08/90)
In article <1990Nov6.224410.15068@murdoch.acc.Virginia.EDU> gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) writes: > In article <2133:Nov607:16:1090@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > >I observe that when I convert the original code into pointer code, the > >compiler produces very similar results to the hand version. > Maybe you should have tried gcc -- it does quite a bit better than cc > on this simple example, *which includes no alias possibilities*. Weird, you're right. gcc does a lot better on array code. Sun, would you please make the obvious additions to your optimizer? ---Dan
misha@ai.mit.edu (Mike Bolotski) (11/08/90)
In article <7659:Nov620:58:5990@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > pointers (arrays) are pre-indexed appropriately. Could one of you CS > types please illustrate to Jim that finding optimal addition chains in a > general computation is equivalent to solving the halting problem? Bzzt. But thanks for playing, Dan. Optimal addition is NP. Halting problem is undecidable (ever). You can argue practicality, but not theory. -- Mike -- Mike Bolotski Artificial Intelligence Laboratory, MIT misha@ai.mit.edu Cambridge, MA
gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) (11/08/90)
In article <6PX6-O@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes: >What's wrong with: > > if(array_a_and_array_b_overlap) > use_safe_code > else > use_fast_code_that_doesn't_work_for_aliased_arrays. Of course, this won't catch every situation where aliasing can occur. However, you might be happy to know that Cray's Fortran compiler will insert tests into runtime code to see if particular loops are vector or scalar, in situations where a variable determines the behavior. I've yet to see a C compiler which does this, although some (Convex) do some compile-time checking for aliasing in array references. See my reply to Dan for situations where this won't catch all Fortran-style non-aliased expressions.
pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/09/90)
On 6 Nov 90 20:12:12 GMT, smryan@garth.UUCP (Steven Ryan) said: smryan> If it's optional and you don't want, don't use. If you can buy a smryan> compiler without an optimiser (which is easy), do so. If nobody smryan> wants an optimiser, nobody buys them. If somebody wants one, why smryan> not let them buy it? Oh, this is sociology -- the issue is not whether there is a *market* for optimizers or not -- it is whether the market should exist, and which needs it should satisfy. People who buy optimizers do so not on the basis of adverts associating them with scantily clad gorgeous females or (even more alluring to some people :->) 64 bit superscalar CPUs. They (should) do so on the basis of a cost effectiveness argument, and we have been debating this. The argument is actually twofold: as a user, should I buy an optimizer? as a developer, should I invest in building an optimizer? The second problem is also very important -- the computer industry is largely driven by the supply, not the demand side, quantitatively and qualitatively. Users look to implementors for a sense of direction. The decisions of implementors will often be followed without questioning, because users do not have the experience or insight and more or less know it. IMNHO the main reason for which the computer industry is driven by the supply side is that talent is very scarce, and for this reason not all possible alternatives worth exploring can be explored. The problem, before considering market demand, can also be expressed thus: should David Chase and Jim Giles and Preston Briggs and Hank Dietz (just to mention some frequent contributors to this sort of discussion) spend their time devising ever more ingenious and complicated and hairy code generator level hacks, or would their research and development efforts be better spent on clever compilation of higher level languages and source level programmer's assistants? Are there enough compiler experts to explore both possibilities? 
If not, which possibility looks the best winner? Is aggressive optimization of low level languages a technological blind alley like autocoders? Inquiring minds want to know... Not simple questions, and I am presenting here an unfashionable but hopefully amusing set of answers to them, which should not be ignored. -- Piercarlo Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/09/90)
On 6 Nov 90 20:48:15 GMT, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) said:
brnstnd> In article <6PX6-O@xds13.ferranti.com> peter@ficc.ferranti.com
brnstnd> (Peter da Silva) writes:
peter> What's wrong with:
peter> if(array_a_and_array_b_overlap)
peter> use_safe_code
peter> else
peter> use_fast_code_that_doesn't_work_for_aliased_arrays.
A lot of things. Code bloat, for one. But it is an appealing solution to
a difficult problem in many useful cases.
brnstnd> [ ... ]
peter> The whole alias problem is a compiler technology problem.
brnstnd> Exactly.
No, and here we have our largest difference: it is a language design
problem, and a problem of choosing the suitable language.
This discussion reminds me of Dijkstra's famous address in 1968 to some
(the IFIP?) conference, on the software crisis, that went like this:
We have a software crisis, and we are told that we must advance the
state of the art so as to solve it. We are only given a few constraints
to satisfy:
no changes to the hardware platforms
no changes to the languages
no changes to the operating systems
no changes to the education of programmers
no changes to the software tools
no changes to organizational practices
...
Well, this is RIDICULOUS!
In twenty-two years progress has not been exactly astonishing, except
for a few things, like DBMSes. There has even been some kind of regress
towards mediocrity.
--
Piercarlo Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/09/90)
On 6 Nov 90 20:39:13 GMT, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) said: brnstnd> And the bugs keep coming, and coming, and coming, and coming brnstnd> ... I think we've successfully reduced this argument to a brnstnd> simple question: ``Do optimizers gain more speed than they lose brnstnd> reliability?'' Again, I'm firmly convinced that heavy brnstnd> optimizations are on the latter side. It's pretty clear that brnstnd> you are on the former side. In any case I think the argument is brnstnd> finished. Not so easily: the argument is also: are the alternatives at least in principle as effective and safer? Because if aggressive optimization were the only alternative, we would only have the choice between more speed and less reliability or less speed and more reliability, and more speed can often mean a lot, and many people do not care about reliability (e.g. the quoted bloat in vmunix, GNU Emacs, etc...). But there are alternatives. -- Piercarlo Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
jlg@lanl.gov (Jim Giles) (11/09/90)
From article <MISHA.90Nov7191750@just-right.ai.mit.edu>, by misha@ai.mit.edu (Mike Bolotski):
> In article <7659:Nov620:58:5990@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>> pointers (arrays) are pre-indexed appropriately. Could one of you CS
>> types please illustrate to Jim that finding optimal addition chains in a
>> general computation is equivalent to solving the halting problem?
> [...]
> Optimal addition is NP. Halting problem is undecidable (ever).
> You can argue practicality, but not theory.

Besides, the problem under discussion originally wasn't the optimal addition chain problem anyway. The original question was whether there were cases when a pointer _could_ be precomputed but a compiler _couldn't_ find the same optimization. So, the issue is this:

   p = expression + &x;         i = expression  /* same as pointer version */
   ...real far away...          ...real far too...
   a = *p;                      a = x[i];

The only question is "can the compiler promote the add of the base of array 'x' up to the point where 'i' is precomputed and therefore get the same efficiency as the precomputed pointer?" Turns out that this problem is not even NP, it's O(N^2) for languages which permit arbitrary GOTOs and it's O(N) for "structured" languages (where all constructs are guaranteed to be properly nested, etc.). In both cases, the number N is the number of 'basic blocks' in the code to be optimized. The "liveness" analysis used is required anyway in order to do compile-time error checking adequately.

Note: even a language with GOTOs can be analyzed in O(NlogN) if the flow graph is "reducible". See any good book on compiler construction.

Note2: having pointed out that the optimization is possible, I should say that it's not always desirable. If the array 'x' is statically allocated, many mainframes and minis have addressing modes which would do the add in the memory unit for free. See Aho, Sethi and Ullman's `dragon' book, chapter 9.
In chapter 10 of the same book, they work an optimization example where this optimization is possible, but they don't do it for this very reason. J. Giles
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/09/90)
In article <5351@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <MISHA.90Nov7191750@just-right.ai.mit.edu>, by misha@ai.mit.edu (Mike Bolotski):
> > In article <7659:Nov620:58:5990@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >> pointers (arrays) are pre-indexed appropriately. Could one of you CS
> >> types please illustrate to Jim that finding optimal addition chains in a
> >> general computation is equivalent to solving the halting problem?
> > [...]
> > Optimal addition is NP. Halting problem is undecidable (ever).
> > You can argue practicality, but not theory.

(Note that I have demonstrated in another article that Mike is wrong.)

> Besides, the problem under discussion originally wasn't the optimal
> addition chain problem anyway. The original question was whether
> there were cases when a pointer _could_ be precomputed but a compiler
> _couldn't_ find the same optimization. So, the issue is this:
>    p = expression + &x;        i = expression  /* same as pointer version */
>    ...real far away...         ...real far too...
>    a = *p;                     a = x[i];

That's exactly what I'm talking about! The i's and j's and other integers used to index the arrays are *added* together. The program could have x[i + j - 17] and lots of other weird expressions, all evaluated conditionally through the control structure of the program. But I didn't need even this in my undecidability proof: all I needed was x[0], x[1], and control flow.

[ Jim waxes prolific: ]
> The only question is "can the compiler promote the add of the base
> of array 'x' up to the point where 'i' is precomputed and therefore
> get the same efficiency as the precomputed pointer?" Turns out that
> this problem is not even NP, it's O(N^2) for languages which permit
> arbitrary GOTOs and it's O(N) for "structured" languages

I simply don't understand where you and Mike are getting these ridiculous assertions from. What problems are you fantasizing you've solved here? Where are your purported proofs?
> Note2: having pointed out that the optimization is possible, I should
> say that it's not always desirable. If the array 'x' is statically
> allocated, many mainframes and minis have addressing modes which would
> do the add in the memory unit for free.

Good point. All the problems reappear, though, as soon as you have more than one addition. In x[i + 1], do you precompute *x + 1? Or *x + i? Or the whole thing? Or maybe *x + j, because x[j] is referenced a lot more?

---Dan
jlg@lanl.gov (Jim Giles) (11/10/90)
From article <24232:Nov905:50:0690@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> That's exactly what I'm talking about! The i's and j's and other
> integers used to index the arrays are *added* together. The program
> could have x[i + j - 17] and lots of other weird expressions, all
> evaluated conditionally through the control structure of the program.
> [...]

Ok. So how do _you_ propose to optimize this using pointers? Do you plan to do the following:

   p = &x + i + j - 17;
   /* ... lots of code ... */
   a = *p;

If that's what you plan, then I will do:

   ii = i + j - 17;
   /* ... lots of code ... */
   a = x[ii];

And the optimizer can promote the add of the base of 'x' back to the point where 'ii' was computed. Well, maybe you plan to do:

   if (some_condition)
      p = &x + an_expression;
   else
      p = &x + another_expression;
   /* ... lots of code ... */
   a = *p;

Then I'll do this:

   if (some_condition)
      ii = an_expression;
   else
      ii = another_expression;
   /* ... lots of code ... */
   a = x[ii];

And the compiler can promote the add to the place where the two if branches reconverge - or even up each branch! This promotion problem is O(N) in the number of basic blocks if the language is "structured".

The fact is, no matter _how_ you precompute the pointer value, the array index can be precomputed _exactly_ the same way except for the add of the array base - which can be promoted back to where the rest of the calculation is done, if the compiler writer wants it to be. Of course, in the case of static arrays, you usually don't want it to be - at least on most mainframes.

J. Giles
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/10/90)
In article <5491@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
[ given any code using pointer arithmetic, the compiler can convert ]
[ it into array code that is just as fast ]
That's true. However, the compiler cannot figure out the optimal code
either way! You're ignoring the problem once again.
You *have* to let the programmer shift his arrays manually, because no
algorithm can produce the optimal shifts for a program.
Jim, you've made a blanket assertion that arrays are as efficient as
pointers. I've proven you wrong. Give up.
---Dan
meissner@osf.org (Michael Meissner) (11/12/90)
In article <123@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
| Beat me! Whip me! Make me use an optimiser!
|
| Isn't this kind of silly? Are there really compilers that force people to
| use the optimiser or is it optional? If it's optional and you don't want,
| don't use. If you can buy a compiler without an optimiser (which is easy),
| do so. If nobody wants an optimiser, nobody buys them. If somebody wants
| one, why not let them buy it?

I once worked on a language (DG/L at Data General) which had NO option to turn off the optimizer. It was 9 passes plus a control program that ran in a 64K address space (and no segments to extend the address space), and had some optimizations that the later DG compilers never did, such as doing interprocedural analysis with code motion from internal subroutines to their callers.

I've also found a few cases in finishing GCC ports for the 88k and MIPS where optimizing produced correct code, and not optimizing produced compiler error messages. This is because constraints in the machine description were not quite correct, but the optimizer smoothed things over by putting variables into registers.

I personally never compile any code, except test suites, without enabling both debug and 'full' optimization these days, unless I'm using a vendor's PCC-derived compiler.
--
Michael Meissner email: meissner@osf.org phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142
Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?
misha@ai.mit.edu (Mike Bolotski) (11/13/90)
In article <24232:Nov905:50:0690@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
|> In article <5351@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
|>
|> (Note that I have demonstrated in another article that Mike is wrong.)

Yeah, right.

|> [ Jim waxes prolific: ]
|> > The only question is "can the compiler promote the add of the base
|> > of array 'x' up to the point where 'i' is precomputed and therefore
|> > get the same efficiency as the precomputed pointer?" Turns out that
|> > this problem is not even NP, it's O(N^2) for languages which permit
|> > arbitrary GOTOs and it's O(N) for "structured" languages
|>
|> I simply don't understand where you and Mike are getting these ridiculous
|> assertions from. What problems are you fantasizing you've solved here?
|> Where are your purported proofs?

Read a book, Dan. You've chosen to ignore Jim's suggestion of the Aho, Sethi, and Ullman text, *in the very article you quote.* Chapters 9 and 10 should clear up your confusion about reducible flow graphs, conservative vs non-conservative optimizations, and various aspects of aliasing. (Although the rest of the book is also quite informative and would probably save this newsgroup several megabytes of explanations.)
--
Mike Bolotski Artificial Intelligence Laboratory, MIT
misha@ai.mit.edu Cambridge, MA
jlg@lanl.gov (Jim Giles) (11/14/90)
From article <7001:Nov1008:37:2690@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> Jim, you've made a blanket assertion that arrays are as efficient as
> pointers. I've proven you wrong. Give up.

You have not even posted _one_ counter-example. You have only posted things where the compiler _could_ have optimized the array version as well as (or better than) your hand-optimized pointer version. The point you seem to miss is that _all_ of your hand-optimizations apply to the array versions as well. The only difference between one-d array notation and pointers is the place at which the _base_ of the array is added. I have amply demonstrated that _this_ part of the optimization can _easily_ be handled by the compiler.

J. Giles
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/14/90)
In article <5810@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <7001:Nov1008:37:2690@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > Jim, you've made a blanket assertion that arrays are as efficient as
> > pointers. I've proven you wrong. Give up.
> You have not even posted _one_ counter-example.

As part of my proof I have posted the general form that a counterexample would take. Namely, if you show me an algorithm that can supposedly shift arrays optimally, I'll find a program for which you cannot detect which of two branches is taken (independently of the input). I'll then use x[0] (or x[a + b], or whatever) in one branch, and x[1] (or whatever) in the other branch. You won't be able to figure out whether it's better to precompute &x[0] or to precompute &x[1]. Such a program must exist.

However, no counterexample can exist for *all* algorithms, just as there is no single proof that *all* computer programs will fail to find. Surely you understand this.

> The point you seem to miss is that _all_ of you hand-optimizations
> apply to the array versions as well.

You're quite right. It's also true that a computer program can read in a rigorous, formal proof, and in linear time either say it's correct or exhibit the error. That is irrelevant to *finding* the proof in the first place, just as your statement is irrelevant to *finding* the hand optimizations in the first place.

---Dan
jlg@lanl.gov (Jim Giles) (11/14/90)
From article <5059:Nov1323:43:2290@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> As part of my proof I have posted the general form that a counterexample
> would take. Namely, if you show me an algorithm that can supposedly
> shift arrays optimally, I'll find a program for which you cannot detect
> which of two branches is taken (independently of the input). I'll then
> use x[0] (or x[a + b], or whatever) in one branch, and x[1] (or
> whatever) in the other branch. You won't be able to figure out whether
> it's better to precompute &x[0] or to precompute &x[1]. Such a program
> must exist.

Fine. The compiler won't be able to determine this one. Neither will the human programmer. If the programmer knows a priori which branch will be taken, why is there a branch? And, if he doesn't know a priori which branch will be taken, then he doesn't know which pointer to precompute. And, since he doesn't know which to precompute, he must guess: which is what the compiler will do as well.

Now, there are certain trade-offs about who does the guessing. The programmer _may_ have advance information about the relative frequency of the branches taken. For this reason, it might be better to let the programmer do the guessing. The compiler, on the other hand, knows the register allocation and the instruction pipelining: so it knows which of the branches offers the best chance of completely covering the precomputation of the array index. For this reason, it might be best to let the compiler choose. The best solution is probably to let the compiler choose and give the programmer the ability to assert information about the frequency of each branch.

In any case, this contrived example only saves a fractional add at best (an add is saved each time you take the branch that you precomputed for, one is lost each time you take the other branch, so the average "savings" is a fraction of an add).
The problem is, it's possible that the programmer will pick the wrong one to precompute. He may pick the most frequently taken branch to precompute for and then lose time anyway because the computation on the other branch would have pipelined better. Most compilers will probably decide to precompute _both_ anyway - even when you use explicit pointers.

J. Giles
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/14/90)
In article <5839@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <5059:Nov1323:43:2290@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > As part of my proof I have posted the general form that a counterexample
> > would take. Namely, if you show me an algorithm that can supposedly
> > shift arrays optimally, I'll find a program for which you cannot detect
> > which of two branches is taken (independently of the input). I'll then
> > use x[0] (or x[a + b], or whatever) in one branch, and x[1] (or
> > whatever) in the other branch. You won't be able to figure out whether
> > it's better to precompute &x[0] or to precompute &x[1]. Such a program
> > must exist.
> Fine. The compiler won't be able to determine this one.

Thank you for admitting it! I'm glad we agree that you were wrong when you said that the computer can make arrays as efficient as pointers in all cases.

> Neither will
> the human programmer.

Bullshit. You removed all my analogies to theorem-proving: the analogy here is ``Fine. The computer won't be able to figure out proofs for this one. Neither will the mathematician.'' Just because no single algorithm will handle every case doesn't mean people can't!

> If the programmer knows a priori which branch
> will be taken, why is there a branch?

Maybe he didn't realize at first. But I know what you're saying, and in the proof (which you apparently haven't read) I pointed out why that's irrelevant: all you need is for one branch to be taken with much higher probability than another. As I'm sure Herman will instantly agree, this is an extremely common case in real numerical code.

> And, if he doesn't know a priori
> which branch will be taken, then he doesn't know which pointer to
> precompute. And, since he doesn't know which to precompute, he must
> guess:

No.
This is like saying ``And, since he doesn't know a priori how to prove the theorem, he won't be able to find a proof.'' All the best efforts of AI aside, we must assume that, for any given program, *some* programmer will be able to optimize better than the compiler. We must not take away his tools for doing so.

[ a reasonable summary of programmer-versus-compiler in general ]
> The best solution is
> probably to let the compiler choose and give the programmer the
> ability to assert information about the frequency of each branch.

Some people would argue for a third solution, namely profiling; but there are many cases when basing optimizations on profiling can be very dangerous. (You kill your worst case in favor of your average case.) Anyway, I don't care whether you let the programmer specify array shifts imperatively or functionally. Just let him do it.

> In any case, this contrived example

``Simplified example.'' Since you refused to think through real examples to come to the same conclusion.

> only saves a fractional add at
> best (an add is saved each time you take the branch that you
> precomputed for, one is lost each time you take the other branch,
> the average "savings" is a fraction of an add).

Cripes. What if there are several pointers (arrays) being dealt with? What if this is in a loop that will be executed a trillion times? We are talking about optimizations here, so I don't understand your attitude.

> The problem is,
> it's possible that the programmer will pick the wrong one to
> precompute.

That's true. People make mistakes. So what? Why do you keep bringing up these points that we all accept? Lots of things are possible. Saying that the programmer shouldn't be *allowed* to make this choice because he might make a mistake is like saying that people shouldn't program in the first place because they might make a mistake. Who cares?

> On most compilers, it
> will probably decide to precompute _both_ anyway - even when you use
> explicit pointers.
Sometimes this isn't possible. Remember that space optimization is important too: if the less-executed branch is only executed one millionth of the time, the code and data space for the extra computation is not worthwhile. ---Dan
cik@l.cc.purdue.edu (Herman Rubin) (11/14/90)
In article <5810@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> From article <7001:Nov1008:37:2690@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > [...]
> > Jim, you've made a blanket assertion that arrays are as efficient as
> > pointers. I've proven you wrong. Give up.
>
> You have not even posted _one_ counter-example. You have only
> posted things where the compiler _could_ have optimized the array
> version as well as (or better than) your hand-optimized pointer version.

Well, here is an example which will definitely not work on all machines. It is desired to move 3 bytes into the three least significant bytes of a 4-byte word (other sizes are appropriate), and repeat this operation. This can be done efficiently by using pointers to words on a machine with unaligned reads. Even with overhead for unaligned reads, it is hard to see how to do this efficiently otherwise.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin (UUCP)
jlg@lanl.gov (Jim Giles) (11/15/90)
From article <4857:Nov1405:39:3590@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> Thank you for admitting it! I'm glad we agree that you were wrong when
> you said that the computer can make arrays as efficient as pointers in
> all cases.

I admitted nothing of the sort. I admitted that there were optimizations the _humans_ could find that compilers couldn't. I _don't_ admit that pointers are in any way superior to arrays in allowing humans to express those optimizations to the compiler. I also pointed out that the _human_ is not in possession of important information about the algorithm that the _compiler_ does know (like pipelining, instruction selection, and register allocation - which the compiler makes the final judgement about even in languages which allow programmers to hint about things like this). Since the human doesn't know this stuff, his choice of which to optimize in the case you gave is usually pure guesswork.

> [...]
> [ a reasonable summary of programmer-versus-compiler in general ]
>> The best solution is
>> probably to let the compiler choose and give the programmer the
>> ability to assert information about the frequency of each branch.
>
> Some people would argue for a third solution, namely profiling; but
> there are many cases when basing optimizations on profiling can be very
> dangerous. (You kill your worst case in favor of your average case.)

Which is why it is better to let the programmer give assertions. The human is then free to decide whether the worst case is important enough to optimize or not.

> [...]
> Anyway, I don't care whether you let the programmer specify array shifts
> imperatively or functionally. Just let him do it.

Quite. When did I say otherwise?

> [...]
>> The problem is,
>> it's possible that the programmer will pick the wrong one to
>> precompute.
>
> That's true. People make mistakes. So what? [...]

Well, since you don't seem to know, I'll tell you so what.
In this particular case, even _very_ clever people will make mistakes at a rate of about 50%. The only language type which allows programmers to make this decision with a better batting average is assembly (where the programmer does _all_ register allocation and instruction selection).

> [...] Saying
> that the programmer shouldn't be *allowed* to make this choice because
> he might make a mistake is like saying that people shouldn't program in
> the first place because they might make a mistake. Who cares?

Further, who said anything about not allowing the user to make the choice? I didn't. I only said that the user should not be forced to the level of pointers to have this type of control.

> [...]
>> On most compilers, it
>> will probably decide to precompute _both_ anyway - even when you use
>> explicit pointers.
>
> Sometimes this isn't possible. Remember that space optimization is
> important too: if the less-executed branch is only executed one
> millionth of the time, the code and data space for the extra computation
> [...]

Oh, your example implied that space wasn't an issue. You wrote glibly about having a choice of saving &x[0] or &x[1]. Clearly, in all programming languages I know of, &x[0] is _automatically_ saved for the whole lifespan of the array. Only in C is it even _possible_ to lose the base address of an array (without losing the _whole_ array at the same time). So, if you're talking about precomputing &x[1] and saving it _in_addition_ to &x[0], you obviously didn't think saving space was meaningful. If you were talking about precomputing &x[1] and saving it _instead_of_ &x[0], I would object that it's bad style to lose track of the base of the array.

J. Giles
jlg@lanl.gov (Jim Giles) (11/15/90)
From article <2732@l.cc.purdue.edu>, by cik@l.cc.purdue.edu (Herman Rubin):
> [...]
> Well, here is an example which will definitely not work on all machines.
> It is desired to move 3 bytes into the three least significant bytes of
> a 4-byte word (other sizes are appropriate), and repeat this operation.
> This can be done efficiently by using pointers to words on a machine with
> unaligned reads. Even with overhead for unaligned reads, it is hard to
> see how to do this efficiently otherwise.

It's hard to see what this has to do with the array vs. pointer issue which was being discussed. It looks to me like you've got a word that's being 'mapped' as an array of bytes and your loop is filling the bottom three bytes. Where are the pointers?

J. Giles
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)
Look, Jim. You were saying a few weeks ago that arrays (as in Fortran, or Pascal, or Ada) could always be made as efficient as pointers (as in C, or, of course, in machine language). You didn't prove your claim, and various people disagreed with you. Now I've shown that they were right and your claim is wrong. There is *no* way that an automatic optimizer can always produce the optimal pointer version of pointer-free code. Give up.

The crucial property of pointers here is that they can express array shifts. I am perfectly willing to accept that C's pointer arithmetic may not be the best way to express array shifts, and I'm interested in your views on better ways to express them. But Fortran doesn't have them. Neither does Pascal. Neither does Ada. Those languages are suboptimal.

Detailed responses follow.

In article <5950@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <4857:Nov1405:39:3590@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > Thank you for admitting it! I'm glad we agree that you were wrong when
> > you said that the computer can make arrays as efficient as pointers in
> > all cases.
> I admitted nothing of the sort. I admitted that there were
> optimizations the _humans_ could find that compilers couldn't.

Somehow I don't see the difference, but you said it better. There are optimizations of array code into pointer code that humans can find and that compilers can't. Glad we agree.

> I
> _don't_ admit that pointers are in any way superior to arrays in
> allowing humans to express those optimizations to the compiler.

Huh? I defined what I meant by those terms pretty precisely in my proof. Maybe your idea of ``arrays'' includes array shifting, or pointer arithmetic, or whatever you want to call it. That has nothing to do with arrays in Fortran, or Pascal, or Ada.
> I also pointed out that the _human_ is not in possession of important
> information about the algorithm that the _compiler_ does know

Sure, but that's irrelevant to this subthread.

> Which is why it is better to let the programmer give assertions.

Again, I don't care how you let the programmer do his work. However, you are abusing the word ``assertions'' in the same way that you are abusing ``array'' above. Assertions in existing languages (and compilers) simply don't have the power to express the level of decisionmaking that has to go on here.

> >> The problem is,
> >> it's possible that the programmer will pick the wrong one to
> >> precompute.
> > That's true. People make mistakes. So what? [...]
> Well, since you don't seem to know, I'll tell you so what. In this
> particular case, even _very_ clever people will make mistakes at about
> a rate of 50%.

Obviously you like hand optimization about as much as you like C. You have the same amount of prejudice against each. Since you never bother to learn how to optimize by hand or to program in C, you never believe that either can be effective. The vicious cycle continues.

> Further, who said anything about not allowing the user to make the
> choice? I didn't. I only said that the user should not be forced
> to the level of pointers to have this type of control.

So propose a different mechanism! Q has a set of array slicing mechanisms more powerful than Fortran 90's. I *think* they eliminate the need for pointer arithmetic. I've thought so for months. But I'm not sure. I just haven't tried it on enough real code to be absolutely sure that it does the job, and I haven't spent enough time working it out on paper.

> If you were talking about
> precomputing &x[1] and saving it _instead_of_ &x[0], I would object
> that it's bad style to lose track of the base of the array.

Oh, great.
We're talking about optimizations and now you introduce ``style.'' Gee, Jim, what do you think of the ``style'' of optimized output from the Convex compiler? I must say it's so unfashionable. Ultrix is much more hip. Of course I was talking about saving &x[1] instead of &x[0]. If you had read the proof you would understand this. ---Dan
jlg@lanl.gov (Jim Giles) (11/16/90)
From article <5494:Nov1519:06:3790@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> Look, Jim. You were saying a few weeks ago that arrays (as in Fortran,
> or Pascal, or Ada) [...]

Actually, I said arrays as in _my_ suggested language. I used Pascal, etc. as examples that would be familiar to the rest of the net in order to counter your weak attempts to make pointers look good. You have yet to pose a _specific_ example of a case that even these languages couldn't optimize.

There is some loudmouth on the net who (in another thread) is making very insulting remarks about computing scientists being too theoretical and not being practical. He is insisting on specific practical experience as a measure of a program's efficiency. In this discussion, _you_ are refusing to give a specific example on the grounds that no particular example can fool _all_ optimizers. Fair enough. But if the ones that fool a specific optimizer are so _rare_ that you can't find one, then for _practical_ purposes the problem doesn't exist.

> [...] There is *no* way that an automatic optimizer
> can always produce the optimal pointer version of pointer-free code.

There is (by _your_ own earlier statements) no automatic optimizer that can always produce the optimal pointer version of code that _has_ pointers either. You have been consistently pointing out that the optimizations you are referring to must be carried out by hand. Fine. I prefer to make _specific_ statements to the optimizer about what it should do. That way, no one (like the programmer himself at a later date) will get confused about why the code is stated in the way it is. So, an explicit directive to the compiler that a path through the code is more common than another path will be specific (and easier to verify later). An implicit statement of that same fact (like reordering the computation of pointer expressions) will lead later readers of the code to ask "why did he/I do that?"

> [...]
> I am perfectly willing to accept that C's pointer arithmetic may
> not be the best way to express array shifts, and I'm interested in your
> views on better ways to express them. [...]

Finally.

> [...] But Fortran doesn't have them.
> Neither does Pascal. Neither does Ada. Those languages are suboptimal.

Yet, all these languages can handle the _specific_ examples that you've posted. Frequency assertions to the compiler would solve the rest of the ones that you claim are "theoretically" out there.

> [...]
>> I admitted nothing of the sort. I admitted that there were
>> optimizations the _humans_ could find that compilers couldn't.
>
> Somehow I don't see the difference, but you said it better. There are
> optimizations of array code into pointer code that humans can find and
> that compilers can't. Glad we agree.

We don't. The following statement is part of the same paragraph and thought as the preceding.

>> I
>> _don't_ admit that pointers are in any way superior to arrays in
>> allowing humans to express those optimizations to the compiler.

This is what _I_ said.

> [...]
>> I also pointed out that the _human_ is not in possession of important
>> information about the algorithm that the _compiler_ does know
>
> Sure, but that's irrelevant to this subthread.

Why is everything that doesn't support _your_ position suddenly not relevant to the thread? Your entire claim here is that the human can _always_ beat the compiler at its own game. This is only true when the human has perfect knowledge of what the compiler's game is. Again, I don't oppose letting the programmer have the tools to _try_ to beat the compiler with. I'm just pointing out that it _is_ relevant that the compiler has some hidden cards in its hand.

>> Which is why it is better to let the programmer give assertions.
>
> Again, I don't care how you let the programmer do his work. However, you
> are abusing the word ``assertions'' in the same way that you are abusing
> ``array'' above.
> Assertions in existing languages (and compilers) simply
> don't have the power to express the level of decisionmaking that has to
> go on here.

Again, you are restricting the argument to a subset of the original
subject.  In your f77 vs. cc thread about interprocedural aliasing,
the thread was begun by _you_ and was restricted to _existing_
languages.  This thread has _no_ such restrictions.  So, the fact that
assertions in existing languages don't do this particular thing
(express the expected frequency of a decision point) is _not_ an
argument against using this solution.  What _is_ an argument against
using this is that there are, as yet, no _specific_ examples which
require it.

> [...]
> Obviously you like hand optimization about as much as you like C.  You
> have the same amount of prejudice against each.  Since you never bother
> to learn how to optimize by hand or to program in C, you never believe
> that either can be effective.  The vicious cycle continues.

I hand optimize in ASSEMBLY all the time.  I can pipeline and do
register allocation better than the compilers I have (which, by the
way, are _really_ good: most people _can't_ beat them).  I'm bragging,
but it's really true: I wouldn't bother to use assembly if I couldn't
beat the compiler :-).  I don't hand optimize C because the compiler
usually re-pessimizes the code.

As to whether C coding is effective, I can only judge from experience.
Every code that I've ever recoded from C to some other language runs
_faster_ in the other language.  This is a _uniform_ observation with
no exceptions _at_all_.  This _may_ be because the C code available to
me is abnormally bad.  But it seems unlikely.  It seems more likely
that the C code I've seen is typical.

J. Giles
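Giles's "explicit directive to the compiler that a path through the code is more common than another path" did eventually appear in real compilers. A sketch using GCC's __builtin_expect extension (added well after this thread; the function, macro, and data here are invented for illustration):

```c
/* A frequency assertion in the sense Giles describes: tell the
   compiler which branch is the common one.  __builtin_expect is a
   GCC extension (post-1990); the fallback keeps the code portable. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Count the (asserted-to-be-rare) zero entries of an array. */
int count_zeros(const int *a, int n)
{
    int zeros = 0;
    for (int i = 0; i < n; i++)
        if (UNLIKELY(a[i] == 0))   /* hint: zeros are uncommon */
            zeros++;
    return zeros;
}
```

The hint changes only expected branch layout, not semantics, so later readers see the intent spelled out rather than inferring it from a reordered computation.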
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)
The saga continues: Jim pesters Dan for a *specific* example of code
that real optimizers can't handle.  Dan is surprised.  He thought that
Jim was reading the group while people showed specific examples that
tripped up Sun's optimizer.  But just to be sure he gives an example
that trips up gcc.  Life goes on.

In article <6063@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <5494:Nov1519:06:3790@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > Look, Jim. You were saying a few weeks ago that arrays (as in Fortran,
> > or Pascal, or Ada) [...]
> Actually, I said arrays as in _my_ suggested language.  I used Pascal,
> etc. as examples that would be familiar to the rest of the net in order
> to counter your weak attempts to make pointers look good.  You have
> yet to pose a _specific_ example of a case that even these languages
> couldn't optimize.

You want a specific example?  Were you reading this group when somebody
pointed out how Sun's optimizer couldn't produce even near-optimal code
for a simple loop through an array?  Matt pointed out that gcc did much
better.  gcc, however, will produce suboptimal code for

   for (j = 0; j < 100; j++)
     a[i * j] = ...;

As you now understand, no optimizer will handle all cases.  For any
optimizer you show me, there exists array code that it can't convert
into pointer code as well as possible.  What matters in practice is
that real optimizers trip up on such simple code.  Even good Fortran
optimizers don't do very well on moderately complex code.

> There is some loudmouth on the net who (in another thread) is making
> very insulting remarks about computing scientists being too theoretical
> and not being practical.  He is insisting on specific practical experience
> as a measure of a program's efficiency.

Yes, and I'm sure he will continue to do so.  He is not a computer
scientist: he uses computers to get things done.
He gets annoyed when computer scientists make ridiculous assertions
about real-world problems just because their models aren't sufficiently
accurate or because they haven't bothered to think.

> In this discussion, _you_ are refusing to give a specific example on
> the grounds that no particular example can fool _all_ optimizers.
> Fair enough.

Thank you for saying so.  So you *do* admit that you were wrong when
you said that arrays (as in Fortran) can always be made as efficient
as pointers (as in C)?  Yes?  You do?  Stop beating around the bush.
A week ago you were saying that arrays were as efficient as pointers,
and that somewhat better compilers than we have today would suffice.
Now we know that you were wrong.  Do you admit it?

By the way, I fully expect that you will chop up the above paragraph
and fail to quote the most important parts.  That's your argument
style.

> But if the ones that fool a specific optimizer are
> so _rare_ that you can't find one,

Where did that ridiculous idea come out of the blue?  I merely thought
that you had been paying attention to the examples already posted in
this group.

> > [...] There is *no* way that an automatic optimizer
> > can always produce the optimal pointer version of pointer-free code.
> There is (by _your_ own earlier statements) no automatic optimizer
> that can always produce the optimal pointer version of code that
> _has_ pointers either.

Where did that ridiculous idea come out of the blue?  I don't know
what earlier statements you're referring to.  I certainly don't agree
with the conclusion.

> >> I also pointed out that the _human_ is not in possession of important
> >> information about the algorithm that the _compiler_ does know
> > Sure, but that's irrelevant to this subthread.
> Why is everything that doesn't support _your_ position suddenly
> not relevant to the thread?

Huh?  It supports neither my position nor the opposite of my position
in this subthread.  It's simply irrelevant.  See below.
> Your entire claim here is that the
> human can _always_ beat the compiler at its own game.

Where did that ridiculous idea come out of the blue?  I have never
said any such thing.  I have said that it is *not* true that the
compiler can always beat the human at optimizations.  Theory tells us
that automatic tools will never be good enough; in practice,
programmers who put the effort into hand optimization will get better
results than optimizers.

As one specific case, Fortran arrays can't be shifted; machine
language arrays (pointers) can.  Hence it is not true that the
compiler can always convert Fortran code into machine language code
making the best use of pointers.  Hence the programmer should be given
array shifting tools himself---through, for instance, C pointers.

I certainly don't believe that the compiler is impotent; in fact, I
said a week ago in this group that I trust the compiler to do
(low-level) instruction scheduling.  Herman disagreed; take it to him.

> Again, I don't oppose letting the programmer have the tools to
> _try_ to beat the compiler with.  I'm just pointing out that it
> _is_ relevant that the compiler has some hidden cards in its hands.

Why?  I just don't see its relevance to this issue.  All I've been
trying to do in this subthread is make it clear that C pointers, with
their ``awful'' arithmetic, can make some code more efficient than in
Fortran.  That's all.

> > Obviously you like hand optimization about as much as you like C.  You
> > have the same amount of prejudice against each.  Since you never bother
> > to learn how to optimize by hand or to program in C, you never believe
> > that either can be effective.  The vicious cycle continues.
> I hand optimize in ASSEMBLY all the time.

Good to see you're awake.  However, I think you'd find some rather
impressive gains from hand optimization in any language, if you just
took the time to practice.

---Dan
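What Dan means by "shifting" an array can be sketched in a few lines of C (the windowed sum and all names are mine, purely illustrative): a pointer is re-based so the inner loop indexes from a shifted origin, which a Fortran 77 array cannot express directly.

```c
/* Sum n elements of a starting at offset k.  The pointer b is the
   array a "shifted" by k: b[j] == a[j + k], so the loop body never
   recomputes j + k.  Fortran 77 offers no direct way to form b. */
double sum_window(const double *a, int k, int n)
{
    const double *b = a + k;   /* the shifted array */
    double s = 0.0;
    for (int j = 0; j < n; j++)
        s += b[j];
    return s;
}
```

The Fortran equivalent must write a(j+k) inside the loop and rely on the optimizer to hoist the addition; the C pointer states the shift once, explicitly.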
jlg@lanl.gov (Jim Giles) (11/16/90)
From article <9576:Nov1523:11:0990@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...] Matt pointed out that gcc did much
> better.  gcc, however, will produce suboptimal code for
>
>    for (j = 0; j < 100; j++)
>      a[i * j] = ...;

Huh?  This is a _trivial_ example of induction variable elimination.
I suspect that the compiler couldn't optimize it because it couldn't
tell whether 'i' was aliased to part of 'a' or not.  This only happens
in languages in which pointers and arrays are confused.  It doesn't
happen in languages which _really_ have arrays (as opposed to a
shorthand that's preprocessed into pointers).

> [...]
>> I hand optimize in ASSEMBLY all the time.
>
> Good to see you're awake.  However, I think you'd find some rather
> impressive gains from hand optimization in any language, if you just
> took the time to practice.

I do hand optimize in other languages.  The technique is just
different in each.  In high-level languages (where I don't have direct
control over instruction selection or register allocation), I don't
_try_ to hand optimize in ways which require such control to be
successful.  In C, I hand optimize by converting it into another
language which runs faster.

J. Giles

By the way, this is my _last_ response to Dan Bernstein.  It is not
worth the aggravation to try to discuss anything with someone whose
only discussion technique is invective.  If (a very implausible if)
Dan ever writes something with useful technical merit, I will discuss
it in a general way without naming names.  I suspect that the only
reason he doesn't discuss things more rationally is that he's afraid
to admit he's wrong.  If I don't mention him directly, maybe he won't
feel personally threatened.
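The induction-variable elimination Giles calls trivial looks like this when written out by hand (the "..." of the posted loop is replaced with a concrete store, and the function names are invented): the multiply i*j becomes a pointer stepped by i each iteration.

```c
/* The loop as posted, with a concrete right-hand side: one multiply
   per iteration. */
void fill_indexed(int *a, int i)
{
    for (int j = 0; j < 100; j++)
        a[i * j] = j;
}

/* After induction-variable elimination: the multiply is gone; the
   pointer p advances by i each time around, so p == &a[i * j]. */
void fill_pointer(int *a, int i)
{
    int *p = a;
    for (int j = 0; j < 100; j++, p += i)
        *p = j;
}
```

The two versions perform identical stores; the argument in the thread is only over whether the compiler or the programmer should perform this rewrite.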
gl8f@astsun9.astro.Virginia.EDU (Greg Lindahl) (11/16/90)
In article <9576:Nov1523:11:0990@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>Matt pointed out that gcc did much better.

My name is spelled "Greg".  Matt Crawford may also be an astronomer,
but my hair is much longer.  Trivial to tell us apart.

> gcc, however, will produce suboptimal code for
>
>    for (j = 0; j < 100; j++)
>      a[i * j] = ...;

If you forget to turn on strength reduction, you deserve what you get.

>Even good Fortran
>optimizers don't do very well on moderately complex code.

I showed you code in which FORTRAN does well, and you have to
hand-optimize C to beat it.  Please don't forget history, or you will
have to repeat it.  The best part of "discussing" anything with you is
that you refuse to learn.
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)
In article <6096@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <9576:Nov1523:11:0990@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > [...] Matt pointed out that gcc did much
> > better.  gcc, however, will produce suboptimal code for
> >    for (j = 0; j < 100; j++)
> >      a[i * j] = ...;
> Huh?  This is a _trivial_ example of induction variable elimination.

I agree entirely.  But gcc can't do it.  (Sun's optimizer sort of
does, though as usual it fails to compress the j computation.)  All
you wanted was real examples for real optimizers.  I had thought that
you were listening when other people posted real examples.  Apparently
not.  Now I've posted the examples for you.

> I suspect that the compiler couldn't optimize it because it couldn't
> tell whether 'i' was aliased to part of 'a' or not.

No, there was no danger of this.  Sun's optimizer can tell.  gcc's can
too, when the ``i *'' is removed.

> In C, I hand optimize by converting it into another language which
> runs faster.

I think your definition of ``hand optimize'' is a bit screwy here,
though I sympathize with what you're saying.  The way people optimize
Ada is to write it in a different language first, then run it through
a preprocessor.

> By the way, this is my _last_ response to Dan Bernstein.  It is not
> worth the aggravation to try to discuss anything with someone whose
> only discussion technique is invective.

You always do this.  Whenever you lose an argument you first blithely
assume that you were right all along, then try to squirm out of your
mistakes, then give up communication.

> If I don't mention him directly,
> maybe he won't feel personally threatened.

Stop projecting your paranoia onto other people.

---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)
In article <1990Nov16.022829.19283@murdoch.acc.Virginia.EDU> gl8f@astsun9.astro.Virginia.EDU (Greg Lindahl) writes:
> In article <9576:Nov1523:11:0990@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >Matt pointed out that gcc did much better.
> My name is spelled "Greg".

Sorry, misspoke.

> > gcc, however, will produce suboptimal code for
> >    for (j = 0; j < 100; j++)
> >      a[i * j] = ...;
> If you forget to turn on strength reduction, you deserve what you get.

I turned on strength reduction; it gets in trouble with the variable
multiplication.  Maybe the version here is older than yours?

> >Even good Fortran
> >optimizers don't do very well on moderately complex code.
> I showed you code in which FORTRAN does well, and you have to
> hand-optimize C to beat it.

All I meant by ``don't do very well'' is that I can easily do better
in machine language.  We are, after all, talking about whether Fortran
arrays are always as efficient as pointers (as in machine language,
for instance).  I didn't say anything about C.

I pointed out hand optimization only because it seemed appropriate.  I
also idly wondered whether there was any connection between the hand
optimization thread and the pointer thread, and explicitly refused to
make any philosophical comments about it.  I'm still not sure what can
be said in general.  So your response really looks like a non
sequitur, and a rather vapid one at that.

---Dan
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/16/90)
Please do NOT take this seriously!

In article <6063@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> As to whether C coding is effective, I can only judge from
> experience.  Every code that I've ever recoded from C to some other
> language runs _faster_ in the other language.  This is a _uniform_
> observation with no exceptions _at_all_.

I'd _love_ you to recode some C programs into Lisp for me.  (I'd also
like to see you recode my screen editor in Cobol.  <grin>)

> This _may_ be that the C
> code available to me is abnormally bad.  But, it seems unlikely.  It
> seems more likely that the C code I've seen is typical.

I would argue that it is typical in being abnormally bad.  I suggest
that the basic problem in the text I've quoted is that C is a good
language for people who know what they are doing, but that there is
rather more crud around in C than Sturgeon's Law would predict, and
that Jim Giles is just a better programmer than most.  Certainly
someone who knows about Fortran 90 has more ways available of thinking
about a program than someone who only knows the two nearly identical
languages C and Pascal (gee, it would be nice if C had arrays).

Bearing in mind that every time I've recoded from Pascal or Fortran
into C, I've had a speedup, and that a 2:1 ratio in the speed of code
produced by different C compilers on the same machine is quite common,
may I respectfully suggest [here comes the serious bit] that if anyone
wants to argue "my language _is_ faster than yours" they present
actual examples (such and such a program available by anonymous FTP
from such and such a site) with hard numbers, and that people wanting
to argue "my language _could_ be faster than yours" remember that a
major factor in determining whether it _will_ be faster is the
_market_ for better compilers?  (Does anyone sell high-quality Jovial
compilers for Encore Multimaxes?  I doubt it.)  [end of serious bit]

I'm reminded of a debate we had when I was an undergraduate.
The topic was "will high-level languages effectively replace
assembler?"  Both sides were wrong; the answer was "no, but C will".
--
The problem about real life is that moving one's knight to QB3 may
always be replied to with a lob across the net.  --Alasdair Macintyre.
seanf@sco.COM (Sean Fagan) (11/18/90)
*AAAAAAAAARRRRRRRRRRRRRRGGGGGGGGGGGGGGGGGGHHHHHHHHHHHHHHHHHHHHHHH!!!!!!!!!!!*
--
-----------------+
Sean Eric Fagan  | "*Never* knock on Death's door: ring the bell and
seanf@sco.COM    | run away!  Death hates that!"
uunet!sco!seanf  |     -- Dr. Mike Stratford (Matt Frewer, "Doctor, Doctor")
(408) 458-1422   | Any opinions expressed are my own, not my employers'.
meissner@osf.org (Michael Meissner) (11/20/90)
In article <9576:Nov1523:11:0990@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
| You want a specific example?  Were you reading this group when somebody
| pointed out how Sun's optimizer couldn't produce even near-optimal code
| for a simple loop through an array?  Matt pointed out that gcc did much
| better.  gcc, however, will produce suboptimal code for
|
|    for (j = 0; j < 100; j++)
|      a[i * j] = ...;
|
| As you now understand, no optimizer will handle all cases.  For any
| optimizer you show me, there exists array code that it can't convert
| into pointer code as well as possible.  What matters in practice is that
| real optimizers trip up on such simple code.  Even good Fortran
| optimizers don't do very well on moderately complex code.

GCC does do the optimization on the MIPS (providing you use strength
reduction).  I suspect the array a was an integer array; since the
Sparc does not have an integer multiply instruction, gcc internally
turned the multiplication into a call to __mulsi3, which it no longer
recognizes as a multiply.
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Considering the flames and intolerance, shouldn't USENET be spelled
ABUSENET?
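The __mulsi3 Meissner mentions is gcc's runtime helper for targets without a multiply instruction. A shift-and-add sketch of such a routine (an illustration of the idea only, not gcc's actual libgcc source; the unprefixed name is deliberate):

```c
/* Software multiply by shift-and-add, the kind of helper (gcc names
   its version __mulsi3) called when the target has no integer
   multiply instruction.  Illustrative, not gcc's actual code. */
unsigned mulsi3(unsigned a, unsigned b)
{
    unsigned r = 0;
    while (b != 0) {
        if (b & 1)        /* low bit set: add the shifted multiplicand */
            r += a;
        a <<= 1;
        b >>= 1;
    }
    return r;
}
```

Once a[i * j] has been lowered to a call like this, the strength reducer no longer sees a multiply to eliminate, which is Meissner's explanation for the Sparc result.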
smryan@garth.UUCP (Steven Ryan) (11/20/90)
>Fine. The compiler won't be able to determine this one. Neither will
>the human programmer. If the programmer knows a priori which branch

I haven't been following all the fascinating details (yawn) of this
discussion, but if the arguments are based on this assumption, they
are invalid.  That is, the assumption that you must only consider what
a reasonable programmer would do.  Unless you have an algorithm for
distinguishing reasonable and unreasonable programs.

Regardless of whether humans are computationally more powerful than
computers, an argument must consider any possible program, whether
from a reasonable programmer or a roomful of Cobol programmers (or
monkeys, if there is any difference).

For my sake, please, just pretend to be rigorous.  Like all that
nonsense that computer=fsm without any demonstration that the total
external media is a priori bounded.
--
...!uunet!ingr!apd!smryan                                    Steven Ryan
...!{apple|pyramid}!garth!smryan           2400 Geng Road, Palo Alto, CA
bglenden@mandrill.cv.nrao.edu (Brian Glendenning) (11/22/90)
In article <PCG.90Nov8163939@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
This discussion reminds me of Dijkstra's famous address in 1968 to some
(the IFIP?) conference, on the software crisis, that went like this:
We have a software crisis, and we are told that we must advance the
state of the art so as to solve it. We are only given a few constraints
to satisfy:
no changes to the hardware platforms
no changes to the languages
no changes to the operating systems
no changes to the education of programmers
no changes to the software tools
no changes to organizational practices
...
Well, this is RIDICULOUS!
Does anyone have the correct reference for this (or even the actual
text?) Thanks!
Brian
--
Brian Glendenning - National Radio Astronomy Observatory
bglenden@nrao.edu bglenden@nrao.bitnet (804) 296-0286
meeuse@idca.tds.PHILIPS.nl (Jaap A. Meeuse) (11/23/90)
In article <BGLENDEN.90Nov21143012@mandrill.cv.nrao.edu> bglenden@mandrill.cv.nrao.edu (Brian Glendenning) writes:
>In article <PCG.90Nov8163939@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
> [...]
>Does anyone have the correct reference for this (or even the actual
>text?)  Thanks!

The actual text:

Dijkstra: I would like to comment on the distinction that has been
made between practical and theoretical people.  I must stress that I
feel this distinction to be obsolete, worn out, inadequate and
fruitless.  It is just no good, if you want to do anything reasonable,
to think that you can work with such simple notions.  Its inadequacy,
amongst other things, is shown by the fact that I absolutely refuse to
regard myself as either impractical or not theoretical.  David
expressed the idea that "We can make case studies in industry, and
then you can study the results.  You can do these analyses."  A
probable effect of the distinction is that if such a study is made,
the output of it can be regarded as theory and therefore ignored.
What is actually happening, I am afraid, is that we all tell each
other and ourselves that software engineering techniques should be
improved considerably, because there is a crisis.
But there are a few boundary conditions which apparently have to be
satisfied.  I will list them for you:

1  We may not change our thinking habits.
2  We may not change our programming tools.
3  We may not change our hardware.
4  We may not change our tasks.
5  We may not change the organizational set-up in which the work has
   to be done.

Now under these five immutable boundary conditions, we have to try to
improve matters.  This is utterly ridiculous.  Thank you.
(Applause)

This was copied from page 13 of the:
Report on a conference sponsored by the NATO SCIENCE COMMITTEE,
Rome, Italy, 27th to 31st October 1969:
SOFTWARE ENGINEERING TECHNIQUES

Copy of the contents of its page 2:
The present report is available from:
Scientific Affairs Division
NATO
Brussels 39
Belgium
Publishers: NATO Science Committee
Printed in England at The Kynoch Press, Birmingham

IMHO, more than 20 years later, these boundaries have shown to be not
absolutely immutable: although we know that they cannot be changed
under our direct control, we see that they are changing (yes, not too
fast).

1  thinking habits: at least some of us are (becoming) aware that
   programming influences our thinking habits: especially programmers
   (like philosophers) tend to be utterly curious to know the exact
   semantics of what is said/written.  Evidently this does not
   simplify their communications with other people, but it seems to
   become one of their characteristic thinking habits because they
   have to model such semantics in the results of their activities.
2  programming tools: at least some fundamental changes can be
   observed:
   - support for more formal approaches (VDM / Z and Eiffel)
   - functional programming with its lazy evaluation, and logic
     programming (Prolog)

3  hardware: here again:
   - parallel processing and neural networks
   - integration of video (text, pictures and other data) with audio
     (sound)

4  tasks: hopefully most of us have learnt that two types of
   complexity should be distinguished:
   - the intrinsically complex tasks: those that we don't know how to
     divide into tasks which are less complex (e.g. travelling
     salesman)
   - the extrinsically complex tasks: those that can be broken down
     into less complex tasks.
   Fortunately, nearly all of us are working on the second type.  It
   is just a shame that we do not always exploit this.

5  organizational set-up: yes, here too we can observe some weak
   signals that boundaries are changing: centralization is no longer
   always preferred above distribution (USSR, Philips).
--
### Jaap A. Meeuse       ### Internet: meeuse@idca.tds.philips.nl ###
### Philips IS, Dept OFS ### UUCP    : ..!mcsun!philapd!meeuse    ###