pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) (02/02/90)
In article <2217@sunset.MATH.UCLA.EDU> pmontgom@sonia.math.ucla.edu
(Peter Montgomery) writes:

    In article <PCG.90Jan29131938@rupert.cs.aber.ac.uk>
    pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) writes:

    >The same is true for common subexpression elimination; if there
    >are common subexpressions, and it is *important* that they be
    >shared, the programmer had better make those explicit, because
    >the common subexpression usually has an interesting reason for being
    >common.

    In my programs, common subexpressions often appear as a result of
    macro expansions. Others are not visible at the source level, but
    arise when producing the object code.

Of course. But this is the result, most often, of insufficient
language design, both as to the need to use macro expansion, and as to
the hidden code problem (as I try to demonstrate later on).

Let me give one, but quite general, example: common subexpressions in
Fortran and other languages often arise from repeated subscript
calculations. Well, this most often happens because the language has
no slices or, even better, array iterators.

Consider the usual fragment for matrix product (dimensions are
[m,o] [o,n]), written in some pseudo language:

    for i in 1..m
      for j in 1..n
        c[i,j] = 0
        for k in 1..o
          c[i,j] = c[i,j] + a[i,k]*b[k,j]

Of course there are ample opportunities for common subexpression
elimination here, and other hairy optimizations. What about instead:

    for i in 1..m
      let s[] alias c[i,],
          fast ronly h[] alias a[i,]
      for j in 1..n
        let fast e alias s[j] initially 0,
            fast ronly v[] alias b[,j]
        for fast k in 1..o
          e += h[k]*v[k]

(which can be readily written in Algol 68, which was *carefully*
designed for easy, efficient code generation) which is vastly more
descriptive as well, as it makes obvious that we are using a vector
product on one hand, and that we *know* that e, h, v, k are the
variables to cache, that the others are irrelevant, and that the
compiler need not bother to "optimize"?

In a previous discussion on a similar theme, we compared C code for
this, written both ways, the first subjected to a very highly
optimizing compiler. Well, the second version was faster by two or
three times.

Then the following version

    for s[] over c[*,], ronly h[] over a[*,]
      for p over s[*], ronly v[] over b[,*]
        p = h innerprod v

    (ronly h[]) innerprod (ronly v[]) is
      let fast d initially 0
      for fast ronly h_i over h[*], fast ronly v_j over v[*]
        d += h_i*v_j;
      return d

would be in many ways less procedural and highly informative, and
would allow a lot of safe optimization (coupled with some heuristic on
expected access frequencies), especially on vector machines.

Somebody may claim that the style I advocate is "unnatural", because
the first version is more similar to the habits of a mathematically
trained person. Let me argue that:

1) As to this particular example, I think that matrix product as
   usually described uses poor notation. I am all in favour of using
   more abstract, terse notation in mathematics. There are
   unfortunately many people that love lots of hanging indexes...

2) Programming is not the same as mathematics. You often have to
   change technique *radically* when writing a program. Risch-Norman
   for symbolic integration is totally unsuitable for human execution,
   and has no resemblance to "by parts", etc... Programming languages
   of the sort used for time critical applications are not "non
   procedural" yet, and I think never will be, and *the programmer*
   must adapt to them, not vice versa.
3) If the hot spot is localized, and is *important*, one doesn't care
   much about naturalness, even if I maintain that often one can still
   do a clean job.

Now back to your example. Your code contains:

    MUL2(a1, b1, c1, d1)
    MUL2(a1, b2, c1, d2)
    MUL2(a2, b1, c2, d1)
    MUL2(a2, b2, c2, d2)

and you comment that

    On the SPARC, the macro is being defined not to produce the first
    expansion but rather is defined so MUL2(a, b, c, d) becomes

        (TRI(a+b) + TRI(c+d)) - (TRI(a) + TRI(c)) - (TRI(b) + TRI(d)),

    where TRI is a table of triangular numbers (i.e., TRI(n) = n*(n+1)/2).

Ahhhhhh.... So. You want to avoid slow SPARC multiplications and you
use a table. This is highly unobvious; you add that

    It is easy to verify that this is functionally equivalent to the
    first expansion, if the TRI table is sufficiently large and if no
    integer overflow occurs.

which means that this highly machine dependent code (because your
space vs. time *tradeoff* is such) must also be supported by easy but
non obvious boundary conditions and mathematical reasoning. In other
words, *major* wizardry, which should be shouted about, not hidden
within different MUL2 expansions.

    After the macro is expanded, the compiler sees the four expressions

        (TRI(a1+b1)+TRI(c1+d1)) - (TRI(a1)+TRI(c1)) - (TRI(b1)+TRI(d1))
        (TRI(a1+b2)+TRI(c1+d2)) - (TRI(a1)+TRI(c1)) - (TRI(b2)+TRI(d2))
        (TRI(a2+b1)+TRI(c2+d1)) - (TRI(a2)+TRI(c2)) - (TRI(b1)+TRI(d1))
        (TRI(a2+b2)+TRI(c2+d2)) - (TRI(a2)+TRI(c2)) - (TRI(b2)+TRI(d2)).

    As previously mentioned, this is a time-critical loop, and the
    macro definition has been carefully written so the subexpressions

        (TRI(a1)+TRI(c1)), (TRI(b1)+TRI(d1)),
        (TRI(a2)+TRI(c2)), (TRI(b2)+TRI(d2))

    appear twice in the resultant expansions. The module which has the
    four references to MUL2 does not know about this; it would defeat
    information hiding if I expanded everything there and removed the
    common subexpressions explicitly.

I think that a different choice of abstraction boundaries would do it;
probably it is not MUL2 that is the abstraction layer you should care
about, but rather the four successive calls. Here you are playing into
my hands; in your application, what is central is indeed the effect
achieved by the block of four calls, not that it is achieved by each
multiplication step.

    In practice, a1, a2, c1, and c2 are loop invariants (but b1, b2,
    d1, and d2 vary on each iteration).

But this is very interesting information again, which should be duly
obvious to the reader of your code. Whoever reads your code, be it
human or program, should not have to discover this important property
by him/her/itself.

[... because of expensive array indexing on SPARC maybe ...]

    I want the compiler to optimize the array indexing, perhaps by
    computing the addresses TRI, TRI+4*a1, TRI+4*a2, TRI+4*c1, TRI+4*c2
    outside the loop, with the inner loop using 4*b1, 4*b2, 4*d1, and
    4*d2 as offsets. This requires the compiler to rewrite the address
    TRI+4*(a1+b1) as (TRI+4*a1)+(4*b1), recognizing that TRI+4*a1 is a
    loop invariant and that 4*b1 is also part of the address TRI+4*b1.
    I cannot make these optimizations at the source level.

In C you can, for example: step a pointer thru the arrays. This would
be a way to make obvious that you are building your result by
sequential scan of the tables. Again, e.g. for virtual memory, if the
application is really critical, this might be a machine dependent
valuable piece of information, or not.
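The arithmetic behind the trick is compact: since TRI(a+b) - TRI(a) -
TRI(b) = a*b, the MUL2 expansion above really does yield a*b + c*d
using only table lookups and additions. Here is a minimal C sketch,
only to make that identity concrete; the table size, the use of long,
and the initialization routine are assumptions made for illustration,
not Montgomery's actual code:

    #include <assert.h>

    /* Hypothetical bound: the post only says the table must be
       "sufficiently large" for the operands actually used. */
    #define TRI_MAX 2048
    static long TRI[2 * TRI_MAX];      /* TRI[n] = n*(n+1)/2 */

    /* Since TRI(a+b) - TRI(a) - TRI(b) = a*b, this yields a*b + c*d
       without any multiply instruction. */
    #define MUL2(a, b, c, d) \
        ((TRI[(a)+(b)] + TRI[(c)+(d)]) \
         - (TRI[(a)] + TRI[(c)]) - (TRI[(b)] + TRI[(d)]))

    static void tri_init(void)
    {
        long n;
        TRI[0] = 0;
        for (n = 1; n < 2 * TRI_MAX; n++)  /* indices reach a+b, c+d */
            TRI[n] = TRI[n-1] + n;         /* n*(n+1)/2, built without '*' */
    }

    int main(void)
    {
        tri_init();
        /* spot checks of the identity for operands below TRI_MAX */
        assert(MUL2(3, 5, 7, 11) == 3*5 + 7*11);
        assert(MUL2(0, 9, 123, 45) == 0*9 + 123*45);
        return 0;
    }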
As another example, on many cached machines it is *vital* to do memory
to memory copies that are aligned on cache line boundaries in the
destination. This is something that is highly interesting, as it makes
clear some unobvious design choices.

    Even on systems using the first macro expansion (in terms of
    multiplications), there may be hidden common subexpressions. For
    example, suppose a1*b1 + c1*d1 is done by floating point hardware,
    with all variables being converted from integer to floating
    beforehand.

If this is important, I think you could well be doing it manually,
just like the avoidance of multiplications above. It would also make
things clearer to whoever reads your code, along with the hazards
involved (floating point has quite different properties from integer
arithmetic).

    This conversion to floating point does not appear in my source
    code, but I would want the compiler to recognize that variables
    appearing repeatedly in products need be converted to floating
    only once, and that a1, a2, c1, c2 can be converted outside the
    loop.

This will be most easily done if you let the readers of your program
know that those are loop invariants. Then it just takes a good code
generator to choose the most efficient strategy for carrying out your
wishes. Of course, if the application is really time critical, you
also want to second guess your compiler and architecture. Unavoidable.

In summary, I think your example is helping my argument; you take
great pains to do manual micro optimization (eschewing multiplies, for
example), but then you do this at what I deem too low a level, and you
don't include important information as to whether some variables are
invariant, and as to whether the intermediate passage to floating
point is safe. Well, it's mostly a difference of programming style.
You may not like mine, but I think it can still cope with problems
like this.

    On the other hand, I cannot make these optimizations at the source
    level.

which is a limit of current languages, and one that invites compilers
that are too clever by half (many of the optimizations you want your
compiler to do above require non obvious reasoning -- all too many
optimizers generate optimized code that is not equivalent to
unoptimized code). As such, we are all poor victims of the status quo.

Eventually, my conclusion is that if one wants smart compilers, one
had better have languages that make that easier, less procedural and
more descriptive. For languages like C or Fortran, for which the
compiler must do painful and often incorrect deductive work if second
guessing the programmer is desired, I hold that careful manual
optimization of the few relevant lines is both preferable and more
effective, and safer, and the compiler should help with that (that's
why I think that "register" in C is such a big win).

On the other hand both are expensive in terms of man power, so we see
monstrosities such as the many hundreds of millions of dollars that
have been sunk in vectorizing compilers for Fortran that try to second
guess programmers who wrote for long dead architectures. The thought
of using more appropriate language technology seems to be heretical.
Well, dusty decks have a morbid fascination. Many could not do without
their bugs (and don't remember what Knuth wisely observed on this
matter in his volume 1 about rewriting being better -- of course if
you have the manpower to do it).
-- Piercarlo "Peter" Grandi | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcvax!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
preston@titan.rice.edu (Preston Briggs) (02/02/90)
In article <PCG.90Feb1164820@rupert.cs.aber.ac.uk>
pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) writes many interesting
things... He also gives some examples of how we might notate matrix
multiplication so that a comparatively simple compiler can generate
good code.

> for i in 1..m
>   let s[] alias c[i,],
>       fast ronly h[] alias a[i,]
>   for j in 1..n
>     let fast e alias s[j] initially 0,
>         fast ronly v[] alias b[,j]
>     for fast k in 1..o
>       e += h[k]*v[k]
>
>(which can be readily written in Algol 68, which was *carefully*
>designed for easy, efficient code generation) which is vastly
>more descriptive as well, as it makes obvious we are using a
>vector product on one hand and that we *know* that e,h,v,k are
>the variables to cache, the others are irrelevant, and that the
>compiler need not bother "optimize" ?

My impression has been that the generality of Algol-68 prohibits
efficient code. Nevertheless, the example is interesting. Note that
we'd also like to see m*4 in a register (assuming row major array
ordering and 4 bytes/word) and k itself eliminated. The point being
that the "fast" hint has some perhaps unexpected side-effects.

>Then the following version
>
>    for s[] over c[*,], ronly h[] over a[*,]
>      for p over s[*], ronly v[] over b[,*]
>        p = h innerprod v
>
>    (ronly h[]) innerprod (ronly v[]) is
>      let fast d initially 0
>      for fast ronly h_i over h[*], fast ronly v_j over v[*]
>        d += h_i*v_j;
>      return d
>
>would be in many ways less procedural and highly informative,
>and allow for a lot of safe optimization (coupled with some heuristic
>on expected access frequencies), especially on vector machines.

If you'd throw out the "ronly" and "fast" hints, I'd say these forms
aren't too bad. Why do you need to say "ronly" when it's fairly
obvious to the compiler that h and h_i are not assigned? These hints
seem like non-abstract intrusions into the middle of otherwise fairly
declarative code.

Both these examples are written in a relatively high-level notation.
My approach to compiling either notation would be to translate quickly
into a fairly low-level form with very little analysis of the original
code. To simplify my life, I'd make the translation as straightforward
as possible, generating deliberately naive code and concentrating on
producing a correct translation. Then I'd optimize, using a few
powerful, general purpose optimizations (i.e. partial redundancy
elimination, strength reduction, dead code elimination, value
numbering). Finally, I'd generate object code from the optimized
intermediate code. I'd use a global register allocator to assign
registers to the important variables and temporaries.

Note the separation of concerns: enforcing the language in the front
end, improvement (without changing the meaning of the program) in the
middle, and concern for the machine details in the back end.

So what happened to the "fast" hints (and register declarations in C)?
They get thrown out. It's easier, more effective, and more complete to
discover opportunities for optimization during the latter stages of
compilation.

But let's imagine a slick compiler that can make the code you'd like
to see for either of these examples. There exist Fortran compilers
that can produce faster code. Why? They can make fuller use of the
machine's resources and they're willing to generate the ugly code
necessary. You mustn't misunderstand me and think that I code in
Fortran; but I do recognize that Fortran compilers know a lot about
DO-loops and arrays.
If other languages had similar restrictions on their arrays and loop
structures, their compilers could achieve similar levels of
optimization. It's also possible that the notation outlined by PCG
would support fairly deep analysis; I'm not sure from the examples.

Preston Briggs
preston@titan.rice.edu
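As an illustration of what one of those general purpose optimizations
buys, the sketch below shows, at the source level, the effect of
strength reduction on a naive lowering of the inner product loop. A
real compiler would perform this on its intermediate code rather than
on C source; the function names and argument conventions here are
invented for illustration only:

    /* Before: the naive translation recomputes the subscript multiply
       k*n on every trip round the loop. */
    double inner_naive(const double *h, const double *b, int j, int n, int o)
    {
        double d = 0.0;
        int k;
        for (k = 0; k < o; k++)
            d += h[k] * b[k*n + j];      /* k*n redone each iteration */
        return d;
    }

    /* After strength reduction: the multiply is replaced by a pointer
       (an induction variable) bumped by n each iteration. */
    double inner_reduced(const double *h, const double *b, int j, int n, int o)
    {
        double d = 0.0;
        const double *v = b + j;         /* address of b[0*n + j]     */
        int k;
        for (k = 0; k < o; k++, v += n)  /* v tracks b[k*n + j]       */
            d += h[k] * *v;
        return d;
    }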
jlg@lambda.UUCP (Jim Giles) (02/02/90)
From article <PCG.90Feb1164820@rupert.cs.aber.ac.uk>, by
pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi):
> [...]
> Consider the usual fragment for matrix product (dimensions are
> [m,o] [o,n]), written in some pseudo language:
>
>     for i in 1..m
>       for j in 1..n
>         c[i,j] = 0
>         for k in 1..o
>           c[i,j] = c[i,j] + a[i,k]*b[k,j]
>
> Of course there are ample opportunities for common subexpression
> elimination here, and other hairy optimizations. What about instead:
>
>     for i in 1..m
>       let s[] alias c[i,],
>           fast ronly h[] alias a[i,]
>       for j in 1..n
>         let fast e alias s[j] initially 0,
>             fast ronly v[] alias b[,j]
>         for fast k in 1..o
>           e += h[k]*v[k]
>
> (which can be readily written in Algol 68, which was *carefully*
> designed for easy, efficient code generation) which is vastly
> more descriptive as well, [...]

Yes, it is more descriptive. But, it is descriptive of all the wrong
things. Here, you are describing things for optimization purposes
which the programmer (with today's state-of-the-art) RIGHTLY assumes
the compiler should be able to find for itself. In fact, the only
reason that programmers accept the above Fortran-ish loop notation is
that (up to now) the compiler technology hasn't been capable of doing
better. The mathematical notation for this is simply:

    C = A B

Now, you can convince a user that this is ambiguous: is it
element-wise array multiplication or matrix multiplication? Well, this
requires a rewrite. The experienced mathematician will then suggest:

     n    o  n
    C  = A  B
     m    m  o

This uses a convention that upper- and lower-indices that match on the
right are summed (sometimes called the 'Einstein summation
convention'). Well, now you might convince the programmer that the
above notation is ill suited to keyboard/terminal usage since these
devices are (so far) not well suited to the two-dimensional layout
given. So, you might end up with:

    C(m,n) = A(m,o) * B(o,n)

This is fine, except that now it matches the notation for a simple
scalar multiply using the current values of m, n, and o. There needs
to be some way of explicitly representing the fact that m, n, and o
are iteration variables and not simple subscripts. Hence, the loop
notation that began this message. But, the mathematical programmer has
legitimate reason to expect that, as compiler/language technology
improves, the notation will converge back to the original mathematical
notation. So, by today's state-of-the-art, you might expect a language
to accept AND OPTIMIZE the following:

    C(#m,#n) = A(#m,%o) * B(%o,#n)

where the sharp (#) indices are iterated over and the percent (%)
indices are summed over. This may be the best we can expect until
keyboards and fonts improve (which is already happening). Still, the
correct direction for programming languages to move is toward notation
for _people_ to use, not toward notation to make (already standard)
optimizations easier.

> [...]
> 2) Programming is not the same as mathematics. You often have to
> change *radically* technique when writing a program. [...]

Quite true. But, the key phrase in what you've said is "have to". You
shouldn't change your notation (radically or not) unless you have to.
Even then you should change as little as you have to. As time goes on,
it is legitimate to expect that these occasions when I have to change
notation will become less frequent. It is toward that goal that
language design should be turned.

> [... "register" attribute in C "a big win" ...]
The only difference between a register variable and any other local
variable in C is that you can't use the address-of (&) operator on
register variables. In languages which don't have pointers (like
Fortran 77) this is a moot distinction. In languages in which pointers
can only point to dynamic memory and are the only way to access it
(like Pascal and Modula-2) this is also an irrelevant distinction. In
fact, the only use of the "register" attribute is to be able to
declare variables that can't be aliased - an irrelevant distinction in
MOST languages where variables have that property by default.

J. Giles
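A small C illustration of the address-of point above; the function and
variable names are invented. Because the address of a register
variable cannot be taken, nothing reachable through the pointer
parameter can possibly alias it:

    long sum_and_bump(int *p, int n)
    {
        register long total = 0;  /* may live in a register throughout */
        int i;

        /* Writing &total here would be rejected by the compiler: the
           address of a register variable cannot be taken, so no
           pointer -- including p -- can alias it, and the stores
           through p cannot invalidate a cached copy of total. */
        for (i = 0; i < n; i++) {
            p[i] += 1;
            total += p[i];
        }
        return total;
    }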
preston@titan.rice.edu (Preston Briggs) (02/02/90)
In article <4466@brazos.Rice.edu> preston@titan.rice.edu (Preston
Briggs) writes:
>In article <PCG.90Feb1164820@rupert.cs.aber.ac.uk>
>> for s[] over c[*,], ronly h[] over a[*,]
>>   for p over s[*], ronly v[] over b[,*]
>>     p = h innerprod v
>>
>> (ronly h[]) innerprod (ronly v[]) is
>>   let fast d initially 0
>>   for fast ronly h_i over h[*], fast ronly v_j over v[*]
>>     d += h_i*v_j;
>>   return d
>
>If you'd throw out the "ronly" and "fast" hints, I'd say these
>forms aren't too bad. Why do you need to say "ronly" when it's
>fairly obvious to the compiler that h and h_i are not assigned?

Following up my own posting, ...

I'd like a compiler to make code to make the computer do what I want.
Unfortunately, it can't read minds, so I have to write a program to
give it a hint. Since it's so stupid, I have to be fairly specific
about how it's to do things like multiplying matrices.

On the other hand, an optimizing compiler allows me to avoid too much
tedium. Strength reducers are good enough that I don't need to mess
with pointer variables. Various forms of redundancy elimination are
good enough that I needn't bother about every little common
subexpression. Vectorizers are good enough so that I need not manually
insert vector primitives. Register allocators are good enough that I
need not scatter register declarations about. Constant propagation is
strong enough so that I can afford to declare some constants in
convenient places, without bothering about folding every constant
expression by hand. And all these optimizations work exceptionally
well on the inner loops that consume so much time.

The optimizer is written once, by the compiler writer. I, on the other
hand, write many programs and I'm able to take advantage of the
optimizer in all my programs. It costs compile time, but it's cheaper
to use the computer to optimize my code than to do it by hand.

Optimizers can't do everything. They don't parallelize Fortran very
well, for example. A better language would probably be a good
alternative. But why should we do what the computer can do well?

Preston Briggs
preston@titan.rice.edu
pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) (02/06/90)
In article <14225@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:

    > [... "register" attribute in C "a big win" ...]

    The only difference between a register variable and any other
    local variable in C is that you can't use the address-of (&)
    operator on register variables.

Well, there is also actually a strong *hint* that the variable will be
heavily *dynamically* used across statement boundaries, in the current
scope. The compiler can use this hint very effectively.

A careful programmer will never declare a variable for a scope larger
than that in which it is used, and will never reuse a variable with
the same name for two different roles. Very few people avoid the
sloppy practice of declaring a bunch of variables at the beginning of
a major scope, some of which are only used in some section of it. The
compiler could do a liveness analysis on such variables, but the
source code remains obscure, because the source misrepresents to the
reader what goes on. The compiler or a human can second guess the
author, but this had better not happen, because it means the source is
not informative enough.

For example, when I heavily use for a section of code a variable that
is otherwise little used, I will create a new scope for that section
of code, with a register variable in it into which I copy the outer
one, and copy back at the end of the paragraph. In this way I am
conveying both to the human and to the compiler reading my code the
important information that I reckon that the variable is being heavily
used. A human reader can find this information just as valuable as a
compiler does. Of course this is because I am eccentric enough to
think that the performance characterization of a program is part of
its design, and pragmatics should be as obvious as semantics.

    In languages which don't have pointers (like Fortran 77) this is a
    moot distinction. In languages in which pointers can only point to
    dynamic memory and are the only way to access it (like Pascal and
    Modula-2) this is also an irrelevant distinction.

Let me differ. Fortran has had equivalence and common forever, and
even if dirty tricks are prohibited in theory, most compilers cannot
assume users are well behaved (hey, most Fortran programs depend on
subroutine local variables keeping their values between invocations!).
Pascal has 'var' parameters, and, more murkily, variant records
without discriminant.

    In fact, the only use of the "register" attribute is to be able to
    declare variables that can't be aliased

While it is true that guaranteeing the absence of aliases makes the
compiler's job easier in caching variables safely across statements,
the two things are not related. The abominable 'noalias' (and the
slightly less horrid 'volatile') proposed extension to Classic C had
the same purpose, but unsafely, in that the absence or presence of
'noalias' (and 'volatile') does change the semantics of a program,
while for 'register' this is not true; 'register' also gives a
*positive* hint on usage frequency.

A compiler could watch a function body for cases where a variable's
address is not taken, and thus deduce the 'register' attribute by
itself. 'register' is better, because it allows one pass compilation,
and documents the assumptions made by the author (e.g. frequency of
usage) to both human and compiler. With 'register' safety (no
aliasing) is a side effect, but a clever one, under more than one
aspect.

    - an irrelevant distinction in MOST languages where variables have
    that property by default.
Actually I am hard pressed to think of widely adopted languages with
side effects in which variables can be safely assumed not to be
aliased. Aleph and Euclid come to mind, but not many more (I have not
thought about Ada, and the exact semantics of 'IN OUT' parameters).
What ruins the alias show for most languages is either separate
compilation or parameter passing, even where pointers are not present.
And even if this were not true, the compiler still has to guess which
of the safe variables are actually worth caching.

Naturally this is a moot point on many of today's architectures, where
register optimization is entirely unnecessary, as there are usually
far more registers available than variables (unless you really have
something like multiple threads, e.g. superscalar, or vector, but then
you are really speaking of multiple register sets).
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
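To make the parameter passing point above concrete, a minimal C sketch
(names invented); the same situation arises with Fortran dummy
arguments or Pascal 'var' parameters when the caller passes the same
actual twice:

    /* Compiled separately, this function cannot know whether the
       caller wrote accumulate(&x, &x), in which case total and count
       alias.  Keeping *total cached in a register across the store to
       *count would then give the wrong answer, so without extra
       information the compiler must reload it. */
    void accumulate(int *total, int *count)
    {
        *total += 10;
        *count += 1;
        *total += 10;   /* must be re-read if total and count alias */
    }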
pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) (02/06/90)
In article <4466@brazos.Rice.edu> preston@titan.rice.edu (Preston
Briggs) writes:

    In article <PCG.90Feb1164820@rupert.cs.aber.ac.uk>
    pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) writes:

    He also gives some examples of how we might notate matrix
    multiplication so that a comparatively simple compiler can
    generate good code.

Here is the keyword: *simple*. I don't like complex compilers that
second guess me. I like *simple*, obedient, compilers. Simplicity is
the path to reliability and economy. These seem to be lost esoteric
arts :-(.

    My impression has been that the generality of Algol-68 prohibits
    efficient code. Nevertheless, the example is interesting.

Algol 68 is difficult to *parse*, but it has been carefully designed
to make efficient code possible, and indeed even easy. Array slicing,
for example, is there for this reason, as are other things. Too bad
that something like 'fast' is missing (but 'ronly' is there).

    Note that we'd also like to see m*4 in a register (assuming row
    major array ordering and 4 bytes/word) and k itself eliminated.
    The point being that the "fast" hint has some perhaps unexpected
    side-effects.

In C this is done by stepping a pointer thru an array. Well, I was
assuming a more conventional language. The idea that C pointer
arithmetic is automatically scaled is another win.

    If you'd throw out the "ronly" and "fast" hints, I'd say these
    forms aren't too bad. Why do you need to say "ronly" when it's
    fairly obvious to the compiler that h and h_i are not assigned?
    These hints seem like non-abstract intrusions into the middle of
    otherwise fairly declarative code.

I like them because they document assumptions by the programmer both
to the human and to the compiler reader. Both can check them and
exploit them. A program should also document such assumptions, not
just the algorithm. This is the major problem I have for example with
Dijkstra; his books are really about algorithm design, thinly veiled
as programs, not programming.

Programming is a hands on job. Economics *are* important. A
"programmer" will carefully shape an SQL query (and SQL as a language
is quite non procedural and pure) so that it will be the most
efficient of those with equivalent results, taking into account the
query optimization strategies of the front end and the performance
characterization of the DB engine and of the configuration of the
machine. The query optimizer will tend to do a good job, but for
*important* large queries, the programmer will know better, having
more information and insight (hopefully!).

The theme is where the point of diminishing returns is, not whether
the compiler can extract a further X% speedup from a source that the
programmer has not described carefully as to pragmatics. If a *simple*
compiler can get 90% of source lines fast enough, and there is a steep
cost in compiler complexity to have it optimize an extra 5%, and the
remaining 5% requires real intelligence, the tradeoff is clear.

    [ ... about compiling a high level notation into a lower level one
    and applying optimizations to this ... ]

    Note the separation of concerns. Enforcing the language in the
    front end, improvement (without changing the meaning of the
    program) in the middle, and concern for the machine details in the
    back end.

Not bad. In outline I tend to agree.

    So what happened to the "fast" hints (and register declarations in
    C)? They get thrown out. It's easier, more effective, and more
    complete to

Let me disagree. "more effective" is a strong statement.
Armchair evidence tends to point out that current highly optimizing
compilers get better results with "register" declarations (some flimsy
data points: dhrystones, the matrix multiplication example recoded in
C, etc...). This tallies with the expectation that most of the hairy
optimization is about irrelevant pieces of code anyhow, and that
current widely available optimizer technology does not take into
account *dynamic* usage, and thus does not know which are the hot
spots. There are heuristics or profiler feedback optimizers around
already, but I tend to delude myself that programmers can do the same
job more intelligently and reliably, in the few spots where it is
necessary.

    discover opportunities for optimization during the latter stages
    of compilation.

On this let me disagree too. The hints should not get thrown out; they
are valuable, *quality* information. The compiler writer should not
believe that the compiler can (reliably!) second guess the author as
to the semantic/pragmatic properties of the program. The author should
describe them as clearly as possible, and the compiler should treasure
such information.

Of course the author should be spared (not always! if it is *really*
important, machine dependent considerations should enter into the
design of a program) the work of machine dependent optimization (which
is actually just competent code generation), because there the
compiler is in charge. In other words, this means that most compiler
optimizations should be about space, because this is what a code
generator has direct control over and knowledge of, and about
instruction reordering and other architecture dependent details.

I'd claim that this approach results in clearer programs, and in
simpler, faster, more reliable compilers, and object speed need not
suffer, quite the opposite.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) (02/06/90)
In article <4467@brazos.Rice.edu> preston@titan.rice.edu (Preston Briggs) writes:
    I'd like a compiler to make code to make the computer do what I want.
    Unfortunately, it can't read minds, so I have to write a program
    to give it a hint. Since it's so stupid, I have to be fairly
    specific about how it's to do things like multiplying matrices.
Agreed. We are speaking of *implementation* languages.
    On the other hand, an optimizing compiler allows me to avoid
    too much tedium.
Let me be stroppy: I'd say "allows me sloppy, opaque coding".
    Strength reducers are good enough that I don't need to mess
    with pointer variables. Various forms of redundancy elimination
    are good enough that I needn't bother about every little common
    subexpression. Vectorizers are good enough so that I need not
    manually insert vector primitives. Register allocators are
    good enough that I need not scatter register declarations about.
About the "good enough" I am somewhat skeptical (especially about
the register allocators -- but then again, many modern machines
have more registers than variables...). Also, most of the time
these optimizations are irrelevant.
    Constant propagation is strong enough so that I can afford
    to declare some constants in convenient places, without bothering about
    folding every constant expression by hand.
This is *not* an optimization. It does not involve second
guessing the program's semantics. It is just good code
generation. Like removing redundant code, inlining, reordering
expression's operands (where allowed by language rules), etc...
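A small sketch of the kind of folding being discussed, with invented
names: the programmer declares the constants where they are
convenient, and constant propagation plus folding turns the loop bound
into a literal without any hand arithmetic:

    #define ROWS 24
    #define COLS 80

    static char screen[ROWS * COLS];    /* 1920, folded at compile time */

    void clear(void)
    {
        int cells = ROWS * COLS;        /* propagated as the constant 1920 */
        int i;
        for (i = 0; i < cells; i++)     /* loop bound becomes a literal */
            screen[i] = ' ';
    }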
    And all these optimizations work exceptionally well on the
    inner loops that consume so much time.
Not entirely true. Again, at least some armchair evidence tends
to support the idea that hand care does a lot of good. And the
inner loops are rare enough you can devote your attention to them.
    The optimizer is written once, by the compiler writer.
    I, on the other hand, write many programs and I'm able to
    take advantage of the optimizer in all my programs.
This argument remains true for good code generation, and the
machine independent programming it affords. It is even true for
SQL query optimizers, most of the time, and they do *very* high
level planning and strategies; but not always. Even with SQL (or
similar things) you have to do hand optimization in critical
cases.
    It costs compile time, but it's cheaper to use the computer
    to optimize my code than to do it by hand.
It encourages less precise coding, and makes the compiler many
times larger, and these are *very* bad news, and very costly too.
All this to optimize very little pieces of code, which could be
manually sorted out, and with improvement on their information
content for both humans and programs that read them.
--
Piercarlo "Peter" Grandi | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
preston@titan.rice.edu (Preston Briggs) (02/07/90)
In article <PCG.90Feb5183407@rupert.cs.aber.ac.uk>
pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) writes:
>Let me disagree. "more effective" is a strong statement. Armchair
>evidence tends to point out that current highly optimizing
>compilers get better results with "register" declarations (some
>flimsy data points: dhrystones, the matrix multiplication example
>recoded in C, etc...).

Which reminds me... I'm still hoping to see somebody recode matrix
multiply in C, using all the source hacks they want, so we can compare
it to what we get using Fortran and our (not mine, friends') latest
transformations. We'd run it on a MIPS M/120, with 32 integer
registers and 16 fp regs, so you've got plenty of room for register
declarations, pointer variables, and such. 64K data cache, so plenty
of room there.

The Fortran source is

      subroutine mm(A, B, C, n)
      do 1 i = 1, n
        do 1 j = 1, n
          A(i, j) = 0.0
          do 1 k = 1, n
    1       A(i, j) = A(i, j) + B(i, k) * C(k, j)
      end

Preston Briggs
preston@titan.rice.edu
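For concreteness, one way the C side of this challenge might look,
with the usual source hacks (pointer stepping, a local running sum,
register hints). Row-major layout and double precision are assumptions
made here for illustration, and whether this actually beats the
transformed Fortran on the M/120 is exactly the open question:

    /* A, B, C are n x n, stored row-major; computes A = B * C. */
    void mm(double *A, const double *B, const double *C, int n)
    {
        int i, j;
        for (i = 0; i < n; i++) {
            const double *Bi = B + i*n;             /* row i of B */
            double *Ai = A + i*n;                   /* row i of A */
            for (j = 0; j < n; j++) {
                register const double *bp = Bi;     /* walks along row i   */
                register const double *cp = C + j;  /* walks down column j */
                register double sum = 0.0;
                register int k;
                for (k = 0; k < n; k++, cp += n)
                    sum += *bp++ * *cp;
                Ai[j] = sum;
            }
        }
    }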
chase@Ozona.orc.olivetti.com (David Chase) (02/07/90)
In article <PCG.90Feb5185112@rupert.cs.aber.ac.uk> pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) writes: > Strength reducers are good enough that I don't need to mess > with pointer variables. Various forms of redundancy elimination > are good enough that I needn't bother about every little common > subexpression. Vectorizers are good enough so that I need not > manually insert vector primitives. Register allocators are > good enough that I need not scatter register declarations about. > >About the "good enough" I am somewhat skeptical (especially about >the register allocators -- but then again, many modern machines >have more registers than variables...). Also, most of the time >these optimizations are irrelevant. Interesting assertion -- do you have references to support this? Especially, do you have evidence that vectorization is usually irrelevant? >It [optimization] encourages less precise coding, and makes the >compiler many times larger, and these are *very* bad news, and very >costly too. All this to optimize very little pieces of code, which >could be manually sorted out, and with improvement on their >information content for both humans and programs that read them. (1) Large costly compilers are only "bad news" if few programs are developed -- once a compiler is widely used, the extra time spent in its development is paid back many times over. (Of course, first you have to get the compiler into wide use.) (2) Why do you assume that "precise coding" is good? Does it get a program written more quickly, or make the program more maintainable, or more likely to be correct, or increase its portability? In general, the answers are NO, NO, NO, and NO. At best, "precise coding" will make code run faster on one machine with one particular compiler. A good optimizing compiler can do just as well, with less time spent doing it. (Note -- you perhaps regard optimizers as program-manglers; for C, you have no grounds for complaint, since (last I heard) the language standard is not yet official -- if you program in a language with no official definition, you should expect some variation in the implemented semantics. Nobody forced that choice on you.) A high-powered optimizer does allow programmers to get a program running efficiently with less time spent thinking about it. This is bad? We should program more slowly? *Some* people use the extra time to worry about higher-level issues, like correctness and algorithmic complexity. Some programmers aren't that thoughtful -- would you trust such a person to hand-optimize code? David
jlg@lambda.UUCP (Jim Giles) (02/07/90)
From article <PCG.90Feb5185112@rupert.cs.aber.ac.uk>, by
pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi):
> [... constant folding ...]
> This is *not* an optimization. It does not involve second
> guessing the program's semantics. It is just good code
> generation. Like removing redundant code, inlining, reordering
> expression's operands (where allowed by language rules), etc...

Now I understand your problem. You have a confused notion of the
meaning of "semantics" and "optimization". Semantics refers to the
_meaning_ of the code. That is: for a given input, what does the
program generate as output? Optimization is the use of program
transformations to improve code performance
_WITHOUT_CHANGING_SEMANTICS_! If your optimizer changes the semantics
of your code - it's a broken optimizer.* The items in your list above
are _all_ optimizations. So are strength reduction, common
subexpression elimination, pipelining, etc.. NONE of these code
transformations involve changes to the program's semantics. ALL of
these techniques fit into the category of "just good code generation".

Footnote *: The only exception to this is the so-called associative
operators (+, *), which are not, in general, _computationally_
associative. Languages which allow the compiler to reorder such
operators generally say so _explicitly_ in their documentation and
generally also provide an efficient way to force evaluation order if
it is important. One could argue that reordering such operators
doesn't change the semantics of the code since such vagueness is part
of the semantics of the operators in the first place.

> [...]
> It costs compile time, but it's cheaper to use the computer
> to optimize my code than to do it by hand.
>
> It encourages less precise coding, and makes the compiler many
> times larger, and these are *very* bad news, and very costly too.

No, coding should _always_ be precise. Doing common subexpression
elimination by hand is _NOT_ more precise - it is merely more verbose.
Both versions would have _exactly_ the same _meaning_.

> All this to optimize very little pieces of code, which could be
> manually sorted out, and with improvement on their information
> content for both humans and programs that read them.

Manually sorting it out costs programmer time, which _may_ be more
expensive than any possible saving over the life of the program.
Further, being more verbose is not, necessarily, an improvement in
content for the human reader (or, sometimes, not even for the computer
- see my next post). In fact, the kinds of "sorting out" that you are
recommending won't save _any_ time at all on state-of-the-art
compilers. I would recommend spending your human resources on types of
optimization that compilers _CAN'T_ yet do automatically.

J. Giles
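On the footnote about the "associative" operators, a minimal
demonstration that + on floating point values is not computationally
associative (double precision assumed; the values are chosen only to
make the absorption obvious):

    #include <stdio.h>

    int main(void)
    {
        double big = 1.0e20;

        /* Mathematically both expressions are 1.0, but adding 1.0 to
           1.0e20 is absorbed in double precision, so the two
           evaluation orders give different answers. */
        printf("%g\n", (big + -big) + 1.0);   /* prints 1 */
        printf("%g\n", big + (-big + 1.0));   /* prints 0 */
        return 0;
    }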
jlg@lambda.UUCP (Jim Giles) (02/07/90)
From article <PCG.90Feb5183407@rupert.cs.aber.ac.uk>, by
pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi):
> [...]
> Here is the keyword: *simple*. [...]

I agree. But it is clear that we have a different definition of the
word "simple". Adding obfuscating detail to a program in order for the
compiler to optimize it better is _NOT_ a way of simplifying the code
- quite the reverse. It is sometimes (_only_ sometimes) necessary to
do so in order to meet efficiency criteria for the program, but I
don't have to like it.

> [...] I don't like complex compilers
> that second guess me.

If, by "second guess", you mean compilers which change the _meaning_
of your code, then I agree! This kind of defect in a compiler is
called "bad code generation" and is widely regarded as the worst
defect a compiler can have. On the other hand, if you mean that you
don't like compilers which generate efficient code in ways you hadn't
thought of - so what?

> [...] I like *simple*, obedient, compilers.

The simplest compiler I know of consists of front panel switches (ie.
NO compiler at all). I like mine a little more capable than that. As
compiler technology improves, the set of things regarded as _simple_
will increase. This is all to the good - it means I can concentrate
more on getting the code correct, portable, maintainable, etc.. It
means that the frequency of occasions that I have to hand-optimize for
efficiency will decrease. In short, it allows me to keep my source
code _simple_.

> [...]
> Simplicity is the path to reliability and economy. These seem to
> be lost esoteric arts :-(.

I agree - see above.

> [... array indexing within a loop - strength reduction ...]
> In C this is done by stepping a pointer thru an array. Well, I
> was assuming a more conventional language. The idea that C
> pointer arithmetic is automatically scaled is another win.

Consider the following two code fragments (side by side):

    int a[200][2];               int a[200][2];
                                 int *p;  int *q;
    ...                          ...
                                 p = &a[0][0];  q = p+1;
    for (i=0;i<200;i++){         for (i=0;i<200;i++){
      a[i][0] = expr1(i);          *p = expr1(i);
      a[i][1] = expr2(i);          *q = expr2(i);
    };                             p += 2;  q += 2;
                                 };

The first is clearly more readable than the second. Modern compilers
can easily find the optimizations necessary to make the first run as
fast as the second. The first also has the important advantage of
clearly stating something that the second obscures: there are no
dependencies between the two assignments! The algorithm is stepping
through two independent vectors. Compilers (and people) can easily see
this in the first example and can (for example) vectorize the first
more easily. The general problem of analysing dependencies in the
second kind of code is VERY difficult - it's clearly and explicitly
stated in the first code fragment. (I know, I made the problem harder
by looping on the first index instead of the second. But the other way
around won't make general dependency analysis any easier for the
pointered case.)

This is a case where your verbose approach actually decreases the
information content of the code for _both_ the programmer and the
compiler.

J. Giles
raulmill@usc.edu (R D R) (02/08/90)
Preston Briggs writes:
;> I'm still hoping to see somebody recode matrix multiply in C, using
;> all the source hacks they want, so we can compare it to what we get
;> using Fortran and our (not mine, friends') latest transformations.
;> . . .
While we're on the topics of recoding, and how well source code maps
onto a machine, why not consider a few other mainstream languages
besides C and Fortran??????
I mean, C++ seems to rectify a large number of the problems one sees
with C. (It allows inline functions, for example.) You still have
pointers, but g++ (gnu's C++) allows one to declare that a function
has no side effects, allowing the optimizer to fold calls to it. (gcc
has both of these features too, and for "hello world" programs
generates smaller code, but I'm getting tired of only hearing about c
vs FORTRAN.)
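For reference, the modern GCC spelling of that "no side effects"
declaration looks like the sketch below; the exact syntax accepted by
g++ in 1990 may have differed, and the function names are invented:

    /* Declares that weight() has no side effects and depends only on
       its argument, so the optimizer may treat repeated calls with the
       same argument as a common subexpression. */
    extern int weight(int) __attribute__((const));

    int total(int x)
    {
        /* a compiler honouring the attribute may call weight(x) once */
        return weight(x) + weight(x);
    }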
On the other hand, there are languages like APL which with a few
changes (like variable scoping, and function rank declaration) ought
to make very nice compiling languages. /* APL implementation of
matrix multiply: A <- B +.x C */. From what I've heard, it is
possible to DEVELOP code in APL about 6 or 7 times faster than what is
usual with FORTRAN. (Weren't the two big advantages of FORTRAN over c
supposed to be that FORTRAN is faster to write, and FORTRAN allows
more optimizations in compilation, especially as machine architectures
change???)
--
Raul Rockwell
INTERNET: raulmill@usc.edu !
UUCP: ...uunet!usc!raulmill ! 55 mph = 82 nc
U.S.SNAIL: 721 E Windsor #4, GLENDALE CA 91205 !
brnstnd@stealth.acf.nyu.edu (02/08/90)
Piercarlo is entirely correct. If you care about ``information hiding'' and the ``purity'' of your code more than efficiency, you can't expect fast code. If you can't bear to put your original code in comments and write an efficient version with temporary variables, you don't deserve fast code. In article <2217@sunset.MATH.UCLA.EDU> pmontgom@math.ucla.edu (Peter Montgomery) writes: > In my programs, common subexpressions often appear > as a result of macro expansions. Others are not visible at the > source level, but arise when producing the object code. When efficiency is important, write out the expressions. If the efficient version is different on different machines, write it out separately for each. There is a limit to what the compiler can optimize; a competent programmer assumes merely that the compiler understands scheduling and register allocation, and does the rest of the job by hand. > There was recently a discussion in comp.arch about the lack of > an integer multiply instruction on the SPARC, and how to circumvent it. > In one of my programs (for multiple precision arithmetic), the most > time-critical inner loop requires four sums of products (all variables integer) [ the expressions he wants, and how he implements them on the SPARC with macros referencing a table of triangular numbers ] Actually, the code would be faster if you used squaring rather than triangular numbers. Do you expect the compiler to figure this out, or should the programmer perform this optimization? Using squaring as the foundation for multiprecision multiplication (i.e., to multiply arrays a and b, square a+b, square a-b, subtract, and divide by 4) saves a lot of space (and code). Do you expect the compiler to figure this out, or should the programmer perform this optimization? > it would defeat information hiding if I expanded everything > there and removed the common subexpressions explicitly. Tough. Efficiency demands code space. You can leave the ``pure'' expressions in the code, commented out by #ifdef SLOW_N_PORTABLE. Supposedly optimizing compilers don't remove the burden of thought from programmers. [ more optimizations, based on loop invariants ] > I cannot make these optimizations at the source level. Again, you can and should perform this optimization by hand, sacrificing your macros. Feel free to leave the original code there, but don't pretend that the compiler can figure things out better than you can. > Even on systems using the first macro expansion > (in terms of multiplications), there may be hidden common subexpressions. So don't trust Cray's compiler to figure out that it should store the fp variables. How can the compiler tell the difference between a bottleneck and a loop that's executed only twice? How can the compiler know whether variable a is used more than variable b in a function? How can the compiler know whether you prefer code space, compiling effort, memory, or time? ---Dan
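The arithmetic behind the squaring suggestion, sketched for single
words: since (a+b)^2 - (a-b)^2 = 4ab, two squarings, a subtraction and
a division by four recover a product. The multiple precision version
applies the same identity to whole arrays, with a bignum squaring
routine (not shown here) taking the place of sq():

    #include <assert.h>

    static long sq(long x) { return x * x; }   /* stand-in for a squaring routine */

    static long mul_by_squares(long a, long b)
    {
        return (sq(a + b) - sq(a - b)) / 4;    /* exact: 4ab is divisible by 4 */
    }

    int main(void)
    {
        assert(mul_by_squares(123, 45) == 123 * 45);
        assert(mul_by_squares(-7, 19) == -7 * 19);
        return 0;
    }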
nick@lfcs.ed.ac.uk (Nick Rothwell) (02/08/90)
In article <838.18:06:33@stealth.acf.nyu.edu>, brnstnd@stealth writes: >Piercarlo is entirely correct. If you care about ``information hiding'' >and the ``purity'' of your code more than efficiency, you can't expect >fast code. If you can't bear to put your original code in comments and >write an efficient version with temporary variables, you don't deserve >fast code. This is rubbish, check out the performance of New Jersey ML (high-level functional language, efficient optimised native code generation). >---Dan Nick. -- Nick Rothwell, Laboratory for Foundations of Computer Science, Edinburgh. nick@lfcs.ed.ac.uk <Atlantic Ocean>!mcvax!ukc!lfcs!nick ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ...als das Kind, Kind war...
firth@sei.cmu.edu (Robert Firth) (02/08/90)
In article <838.18:06:33@stealth.acf.nyu.edu> brnstnd@stealth.acf.nyu.edu (Dan Bernstein) writes: >Piercarlo is entirely correct. If you care about ``information hiding'' >and the ``purity'' of your code more than efficiency, you can't expect >fast code. If you can't bear to put your original code in comments and >write an efficient version with temporary variables, you don't deserve >fast code. As somebody who in his time has had to write and maintain substantial quantities of numerical software, I emphatically disagree. We are talking here about five or ten page subroutines containing some pretty complicated algebra. Radar antenna design programs, for instance, can easily have expressions extending over three or four lines of text. This code is hard to maintain at best, but one precondition for readability and maintainability is that, as far as possible, the formula written in Algol (or whatever) look as much like the formula in the book as possible. No other way can you read it, check it, or understand it. If this means that I call sin(a[i,j-1]) six times in a single expression, so be it. Optimising the code, extracting common subexpressions, calling pure functions only once, and all that stuff, is the job of the compiler. That's why it's there. That's why the first major higher-level language was called FORmula TRANslator. Don't make people fit machines; make machines fit people.
cks@white.toronto.edu (Chris Siebenmann) (02/09/90)
brnstnd@stealth.acf.nyu.edu (Dan Bernstein) writes:
| When efficiency is important, write out the expressions. If the
| efficient version is different on different machines, write it out
| separately for each. There is a limit to what the compiler can
| optimize; a competent programmer assumes merely that the compiler
| understands scheduling and register allocation, and does the rest of
| the job by hand.

I would say instead that a competent programmer would first get it
correct, then measure things to see whether or not you need to speed
them up, and then look at the various mechanisms for doing so. Always
know where your bottlenecks are; it makes little sense to shave
milliseconds off a loop when every iteration spends half a second
doing a system call.

It's true that the compiler system can do noticeably better with hints
of some sort from the user. However, there is little need for these
hints to be "register" declarations and unrolled expressions and so
on; feed the compiler system profile run output instead (see, for
example, the DECWRL paper(s) on link-time register allocation,
especially the one that compares it to register windows).

Does anyone have numbers or hard information on the quality of current
compilers (especially C compilers) and how much one can improve things
by tweaking the source code?

--
"I shall clasp my hands together and bow to the corners of the world."
        Number Ten Ox, "Bridge of Birds"
cks@white.toronto.edu    ...!{utgpu,utzoo,watmath}!utcsri!white!cks
jlg@lambda.UUCP (Jim Giles) (02/09/90)
From article <838.18:06:33@stealth.acf.nyu.edu>, by brnstnd@stealth.acf.nyu.edu: > Piercarlo is entirely correct. If you care about ``information hiding'' "Information hiding" is NOT part of the subject matter of this thread in the newsgroup. We can talk about "information hiding" if you like, but it is a different subject. ("Information hiding" refers to various scoping rules which allow codes to share data without making the data _global_. In C, 'static' variables which are declared at the beginning of a file are shared by all the procedures within that file and are not seen outside the file - THIS is a form of information hiding.) On the other hand: if, by "information hiding", you really mean "obfuscating code" - I can't think of a quicker way of doing it than putting in the kinds of optimizations recommended by Peter Grandi. > [...] > and the ``purity'' of your code more than efficiency, you can't expect > fast code. Agreed. If clarity, correctness, and maintainability are more important than speed, then simple, clear code is the proper choice. However, I CAN still expect better speed performance from a good compiler than from the kind Peter Grandi recommends. In fact, for the types of optimizations that we have been discussing, modern compilers can be expected to do as well as the hand optimized stuff. There are, indeed, other types of optimization which compilers cannot yet do (at least not well) - these are the proper targets of programmer effort (if anything is). > [...] If you can't bear to put your original code in comments and > write an efficient version with temporary variables, you don't deserve > fast code. Temporary variables aren't, necessarily, the best way to achieve improvement. As compilers get better, they may become even _more_ irrelevant. Furthermore, fast code is _NOT_ the only criterion for measuring the quality of the programming. Everyone has heard (or should have) of the 90%/10% rule - 90% of the execution time is in 10% of the code. Programmer effort in that 10% might be cost effective. But to try to optimize the last cycle out of the whole code is almost certainly NOT cost effective. Programmers generally cost about $100-$200 a day! If it takes a day to optimize a procedure which saves an hour of SUN workstation time over the life of the code - I've WASTED a couple of hundred bucks! For MOST things, the important thing is to get the code written and correct. Speed is, almost always, a secondary consideration. (This is not to say that speed isn't important - it _usually_ isn't - but when it is, it REALLY IS. Unfortunately, when speed REALLY IS important, it is usually more cost effective to switch directly to assembly than to mess around with half-way measures in a high- level language. The speed sensitive code is usually a small part of the whole program and can be recoded in assembly with gratifying results.) J. Giles
brnstnd@stealth.acf.nyu.edu (02/09/90)
I've answered the ``maintainability'' objection before: Don't throw
out your original code after you've optimized it. Stick it inside
#ifdef SLOW_N_NATURAL or whatever your favorite language uses for
comments. If you change the original, ``natural'' code, throw away the
efficient version and rewrite it from scratch.

In article <6004@bd.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
> This code is hard to maintain at best, but one precondition for
> readability and maintainability is that, as far as possible, the
> formula written in Algol (or whatever) look as much like the formula
> in the book as possible. No other way can you read it, check it,
> or understand it.
>
> If this means that I call sin(a[i,j-1]) six times in a single
> expression, so be it.

    {
      register float t = sin(a[i][j-1]);
      x = t * foo(t,t+1) + bar(t,t-1) - t;
    }
    #ifdef SLOW_N_NATURAL
      x = sin(a[i][j-1]) * foo(sin(a[i][j-1]),sin(a[i][j-1])+1)
        + bar(sin(a[i][j-1]),sin(a[i][j-1])-1) - sin(a[i][j-1]);
    #endif

As I understand the efficient code much better than the slow 'n'
natural code, I would leave the original out entirely. In fact, I
doubt any programmer would need to duplicate more than one percent of
a program in this fashion.

> Optimising the code, extracting common
> subexpressions, calling pure functions only once, and all that
> stuff, is the job of the compiler. That's why it's there. That's
> why the first major higher-level language was called FORmula TRANslator.

The compiler is allowed to help but it simply isn't as good as a human.

> Don't make people fit machines; make machines fit people.

Unfortunately, researchers in artificial intelligence report that we
have absolutely no idea how to make machines fit people.

---Dan
preston@titan.rice.edu (Preston Briggs) (02/09/90)
In article <5453:23:28:32@stealth.acf.nyu.edu>
brnstnd@stealth.acf.nyu.edu (Dan Bernstein) writes:
>In article <2115@castle.ed.ac.uk> nick@lfcs.ed.ac.uk (Nick Rothwell) writes:
>> This is rubbish, check out the performance of New Jersey ML
>> (high-level functional language, efficient optimised native code
>> generation).
>Can it figure out, say, Knuth's algorithm for computing the delta2 table
>in Boyer-Moore string searching, starting from a dumb algorithm for the
>same job?
>
>Humans can optimize better than computers. This will be true for a long
>time. I chose the above example randomly, but it'll serve well enough as

You've made a big step from eliminating common subexpressions to
understanding Knuth (anything by Knuth). One is a trivial task; the
other requires "thought." Computers are good at trivial tasks; humans,
on the other hand, tend to fumble trivial tasks.

Optimization in a compiler isn't finding a good algorithm.
"Optimization" is a misnomer. It's transforming the program to run
more quickly. You never get asymptotic improvements. You don't see
discovery of new algorithms.

On the other hand, optimization lets me avoid having to express
tedious details that can be discovered automatically (trivially). It
lets me write matrix multiply (or whatever) in a short time, with good
performance, instead of trying to squeeze the last ounce out of the
implementation. Hopefully you can put the saved time to use
discovering better algorithms. Good old separation of concerns -- the
computer does what it's good at and I do what I'm good at.

As to "optimizing better than compilers", have you started working on
a version of matrix multiply that will beat our Fortran? Good luck!

Preston Briggs
preston@titan.rice.edu
brnstnd@stealth.acf.nyu.edu (02/09/90)
In article <2115@castle.ed.ac.uk> nick@lfcs.ed.ac.uk (Nick Rothwell) writes:
> In article <838.18:06:33@stealth.acf.nyu.edu>, brnstnd@stealth writes:
> >Piercarlo is entirely correct. If you care about ``information hiding''
> >and the ``purity'' of your code more than efficiency, you can't expect
> >fast code. If you can't bear to put your original code in comments and
> >write an efficient version with temporary variables, you don't deserve
> >fast code.
>
> This is rubbish, check out the performance of New Jersey ML
> (high-level functional language, efficient optimised native code
> generation).

Are you saying that your compiler is smarter than you are or just that
it's not as lazy?

Can it figure out, say, Knuth's algorithm for computing the delta2 table
in Boyer-Moore string searching, starting from a dumb algorithm for the
same job?

Humans can optimize better than computers. This will be true for a long
time. I chose the above example randomly, but it'll serve well enough as
a goal that I won't live to see computers reach.

---Dan
jlg@lambda.UUCP (Jim Giles) (02/09/90)
From article <5453:23:28:32@stealth.acf.nyu.edu>, by brnstnd@stealth.acf.nyu.edu:
> [... Good ML compiler ...]
> Can it figure out, say, Knuth's algorithm for computing the delta2 table
> in Boyer-Moore string searching, starting from a dumb algorithm for the
> same job?
>
> Humans can optimize better than computers. This will be true for a long
> time. I chose the above example randomly, but it'll serve well enough as
> a goal that I won't live to see computers reach.

Good example. It demonstrates the point I've been making all along!
There are optimizations that the compiler can't yet do well. There
_may_ be optimizations that compilers will _never_ be able to do. It
is these types of things that programmers should spend their effort
on.

However, your example does NOT support your claim that programmers
should do strength-reduction or array indexing by hand. These are
things that compilers of today CAN do reasonably well. They are also
things which decrease the clarity of the code.

All your example shows is that good choice of algorithm is worth more
than penny-pinching optimizations. This old platitude is the
commonplace of programming. So, what have you said that's new?

J. Giles
pcg@aber-cs.UUCP (Piercarlo Grandi) (02/10/90)
In article <48942@ricerca.UUCP> David Chase <chase@orc.olivetti.com> writes:

>About the "good enough" I am somewhat skeptical (especially about
>the register allocators -- but then again, many modern machines
>have more registers than variables...). Also, most of the time
>these optimizations are irrelevant.

Interesting assertion -- do you have references to support this?

Plenty. Several sources, most recently J. Mashey discussing exception
handling on the MIPS in comp.arch, state that it is rare to need more
than 24-28 registers to cache *all* variables (a particularly funny
example is the Pyramid, which has over fifty registers; some
researchers found that in all the programs they tried, never more than
half of them could be used). Some references (the DEC WRL ones for the
Titan) demonstrate that if you cache only frequently *used* variables,
the knee of the curve is at the 8-12 register level at most.

There are many papers that confirm the idea that most (traditionally
80%) of a program has a negligible or small enough effect on the total
running time that optimizing it is irrelevant, either by hand or
mechanically.

Especially, do you have evidence that vectorization is usually
irrelevant?

I never said so. On the other hand I think that vectorization is a
program transformation problem, and the 'compiler' proper should have
nothing to do with it. Vectorization is an issue only because for
historical reasons people want to continue using 'implementation'
languages as 'application' ones. Dusty decks rule the waves,
unfortunately. On the other hand, Knuth's observations on this
problem, in Vol. 1 of the Art of Comp. Prog., are still valid, I think
(shortly: rewrite!).

(1) Large costly compilers are only "bad news" if few programs are
developed -- once a compiler is widely used, the extra time spent in
its development is paid back many times over. (Of course, first you
have to get the compiler into wide use.)

First you have to get the compiler generating correct code. I may be
wrong :-), but the number of bugs in a complicated piece of software
grows more than linearly with its size. A compiler is possibly one of
the most critical components of your toolset. I am a (young) fogey,
and tend to think that in these cases prudence advises against making
such a crucial component as complex as your address space allows you
to.

(2) Why do you assume that "precise coding" is good? Does it get a
program written more quickly, or make the program more maintainable,
or more likely to be correct, or increase its portability? In general,
the answers are NO, NO, NO, and NO.

Let me differ. A program that requires the reader to second-guess the
author is not going to be very maintainable, is not very likely to be
correct, and is not likely to be very portable. It may be *coded* more
quickly, oh yes :-(. Making obvious, by documenting in the program
text (comments can be dangerous), all your assumptions is one of the
tenets of disciplined software engineering. I used to like the 'lint'
directive (e.g. /* NOTREACHED */) approach, for example. A great aid,
both for lint and for the human reader.

Some programmers aren't that thoughtful -- would you trust such a
person to hand-optimize code?

If you reckon that your optimizing compiler is less buggy and more
intelligent than your programmer, all that I say blows up. I have to
admit that this is usually the case, though.
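To make the 'lint' directive approach mentioned above concrete, a
classic use looks something like this. This is a minimal sketch; the
function itself is invented for illustration.

    #include <stdio.h>
    #include <stdlib.h>

    /* Print a message and abandon ship.  The NOTREACHED directive
       tells lint -- and the human reader -- that control never comes
       back from exit(), so nothing after it is missing by accident. */
    void fatal(char *msg)
    {
        fprintf(stderr, "fatal: %s\n", msg);
        exit(1);
        /* NOTREACHED */
    }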
On the other hand, I think that the problem is better solved through
programmer training (but we all know about the realities of
deskilling) or via programmer's assistants, which should not be made
part of the compiler -- for the same reason for which 'lint' was not
made part of the compiler. I would dearly love to have an "active"
lint-like tool.
--
Piercarlo "Peter" Grandi | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
pcg@aber-cs.UUCP (Piercarlo Grandi) (02/10/90)
In article <RAULMILL.90Feb7094338@alcor.usc.edu> raulmill@usc.edu (R D R) writes:
On the other hand, there are languages like APL which with a few
changes (like variable scoping, and function rank declaration) ought
to make very nice compiling languages. /* APL implementation of
matrix multiply: A <- B +.x C */. From what I've heard, it is
possible to DEVELOP code in APL about 6 or 7 times faster than what is
usual with FORTRAN.
Well, this may be too optimistic, as APL has some obscurities
as well. On the other hand, APL is an application level
language, and here I have no qualms about optimizing it, as for
SQL, in another application context.
However, even languages at the correct level of abstraction for
the application, for which the compiler can use interesting
strategies for optimization, occasionally require the
programmer to pay special attention, if the programmer knows
better than the optimizer. But thank goodness this is rarer.
There are many ways to code matrix multiplication; I'd rather
not write one and then expect the compiler to transform it into
another one that it thinks is more appropriate, but just use the
high level concept and then the compiler can, more safely, do
what it wants. Unless of course I am the implementor of the
matrix multiplication module, where I want the compiler to
respect my code obediently.
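A sketch of that split, in C terms (the interface and names are
invented for illustration): the application states only the high-level
concept, and the loops live in the one module whose author wants them
respected.

    #define N 16

    /* Implementation module: the one place where the author wants the
       compiler to respect the code as written. */
    void mat_mul(double a[N][N], double b[N][N], double c[N][N])
    {
        int i, j, k;
        double acc;

        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++) {
                acc = 0.0;
                for (k = 0; k < N; k++)
                    acc += b[i][k] * c[k][j];
                a[i][j] = acc;
            }
    }

    /* Application level: only the concept is stated; how it is carried
       out (blocking, vectorizing, ...) is not the caller's business. */
    void update_state(double a[N][N], double b[N][N], double c[N][N])
    {
        mat_mul(a, b, c);
    }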
--
Piercarlo "Peter" Grandi | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
pcg@aber-cs.UUCP (Piercarlo Grandi) (02/10/90)
In article <6004@bd.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
We are talking here about five or ten page subroutines containing
some pretty complicated algebra. Radar antenna design programs,
for instance, can easily have expressions extending over three or
four lines of text.
This code is hard to maintain at best, but one precondition for
readability and maintainability is that, as far as possible, the
formula written in Algol (or whatever) look as much like the formula
in the book as possible. No other way can you read it, check it,
or understand it.
Let me suggest that your book's math is poorly notated. The
problems I was talking about exist in maths as well -- a lot of
maths is very poorly notated, because it is hard to read and what is
going on is less obvious than it should be. Mathematicians are not
the only ones who think that a pageful of index-rich formulas is
macho. Physicists seem to be more concerned about the quality of the
notation they use, which I appreciate.
Too bad that, because of this, maths is often as full of bugs (even
typographical ones) as programs, or more so, even if it is supposedly
more 'abstract'.
If this means that I call sin(a[i,j-1]) six times in a single
expression, so be it.
Most probably that term also has a meaning in the formula in
the book, and it should have been made evident, to aid
comprehension.
Moreover, again, if all else fails, maybe a programmer's
assistant could be employed profitably. I emphatically don't
think that it should be part of the code generator, though.
--
Piercarlo "Peter" Grandi | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
pcg@aber-cs.UUCP (Piercarlo Grandi) (02/10/90)
In article <2115@castle.ed.ac.uk> nick@lfcs.ed.ac.uk (Nick Rothwell) writes:

In article <838.18:06:33@stealth.acf.nyu.edu>, brnstnd@stealth writes:
>Piercarlo is entirely correct. If you care about ``information hiding''
>and the ``purity'' of your code more than efficiency, you can't expect
>fast code. If you can't bear to put your original code in comments and
>write an efficient version with temporary variables, you don't deserve
>fast code.

This is rubbish, check out the performance of New Jersey ML
(high-level functional language, efficient optimised native code
generation).

Hey, this is not fair! You are comparing a cleanly designed, high
level language with C and Fortran here. If you can roll your own, and
make it clean, everything changes.

I don't like functional languages, but ML is pretty good. I used to
like Aleph very much, a CWI language based on CDL and on
two-level/attribute grammars. Very clever, very clean design. Algol
68, for many applications, is not bad either. APL, SQL, ICON,
etcetera.
--
Piercarlo "Peter" Grandi | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
ske@pkmab.se (Kristoffer Eriksson) (02/10/90)
In article <4616@brazos.Rice.edu> preston@titan.rice.edu (Preston Briggs) writes:
>I'm still hoping to see somebody recode matrix multiply in C, using
>all the source hacks they want, so we can compare it to what we get
>using Fortran and our (not mine, friends') latest transformations.
>
>We'd run it on a MIPS M/120, with 32 integer registers and 16 fp regs,
>The Fortran source is
>
>      subroutine mm(A, B, C, n)
>      do 1 i = 1, n
>        do 1 j = 1, n
>          A(i, j) = 0.0
>          do 1 k = 1, n
>1          A(i, j) = A(i, j) + B(i, k) * C(k, j)
>      end

I'm not a Fortran programmer, so I am not sure exactly how this routine
works. I guess that A, B, and C are arrays and n is the maximum index
in these arrays that is to participate in the multiply, and that the
array bounds are automatically passed to the routine, without explicit
mention. That is not "natural" in C, thus we have the problem of what
to do instead...

Most C compilers don't accept variable size array arguments. The
simplest way to code exactly the same functionality without that
facility, I think, is to use a macro. The compiler will have the
information about the array bounds available for the arrays the macro
is used on. Each of the three arrays can have a different size, and
they need not coincide with n. Maybe you never would use it that way,
but it is the same as what I suppose the Fortran routine does.

----cut----
#define mm(A, B, C, n) \
{ \
    register idxsize k, j, i; \
    register float acc; \
\
    for(i = 0; i < n; ++i) { \
        for(j = 0; j < n; ++j) { \
            acc = 0.0; \
            for(k = 0; k < n; ++k) \
                acc += B[i][k] * C[k][j]; \
            A[i][j] = acc; \
        } \
    } \
}

/* Simple test case */
typedef unsigned short idxsize;

float a[22][16], b[17][32], c[45][16];

main()
{
    mm(a,b,c,16);
}
----cut----

I don't bother using pointers (or reversing the looping direction). I
don't see any reason why the compiler should not have the possibility
of optimizing the array indexing on its own. Probably many compilers
don't do that, but I think it should in principle be possible for the
C compiler to optimize this routine in the same way as the Fortran
compiler does. (This being a macro, the C compiler will even generate
in-line code and use constant array sizes, even without global code
analysis.) I don't know about MIPS' compilers.

If I instead decide that n gives the array size (maximum indexes) for
all arrays, I can write a simple function for it:

----cut----
/* Simple test case */
typedef unsigned short idxsize;

float a[16][16], b[16][16], c[16][16];

main()
{
    mm(a,b,c,16);
}

/* Matrix multiply. */
mm(A, B, C, n)
register n;
register float *B, *C;
float *A;
{
    register idxsize k, j, i;
    register float acc;

    for(i = 0; i < n; ++i) {
        for(j = 0; j < n; ++j) {
            acc = 0.0;
            for(k = 0; k < n; ++k)
                acc += B[i * n + k] * C[k * n + j];
            A[i * n + j] = acc;
        }
    }
}
----cut----

I could write a function that takes differing array sizes too. It
isn't very hard, but there are many ways to do it to choose among. If
nothing else works, it should be possible to explicitly code most of
the things the Fortran compiler might do automatically. That's the
character of C, for good or worse.
--
Kristoffer Eriksson, Peridot Konsult AB, Hagagatan 6, S-703 40 Oerebro, Sweden
Phone: +46 19-13 03 60 !  e-mail: ske@pkmab.se
Fax:   +46 19-11 51 03 !  or ...!{uunet,mcvax}!sunic.sunet.se!kullmar!pkmab!ske
ske@pkmab.se (Kristoffer Eriksson) (02/10/90)
In article <14229@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>From article <PCG.90Feb5183407@rupert.cs.aber.ac.uk>, by pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi):
>> [... array indexing within a loop - strength reduction ...]
>> In C this is done by stepping a pointer thru an array. ... The idea that C
>> pointer arithmetic is automatically scaled is another win.
>
>Consider the following two code fragments (side by side):
>
>   int a[200][2];             int a[200][2]; int *p; int *q;
>   ...                        ...
>   ...                        p = &a; q = p+1;
>   for (i=0;i<200;i++){       for (i=0;i<200;i++){
>      a[i][0] = expr1(i);        *p = expr1(i);
>      a[i][1] = expr2(i);        *q = expr2(i);
>   };                            p += 2; q += 2;
>                              };
>
>The first is clearly more readable than the second. Modern compilers
>can easily find the optimizations necessary to make the first run as
>fast as the second. The first also has the important advantage of
>clearly stating something that the second obscures: there are no
>dependencies between the two assignments! The algorithm is stepping
>through two independent vectors.

Not that I necessarily agree with the first poster about always using
pointers for array loops, but why would you use *two* pointers? It
would be much more "natural" and less obscure this way:

    int a[200][2];
    int *p;

    p = &a[0][0];          /* Start from the first int of a. */
    for (i = 0; i < 200; i++) {
        *(p++) = expr1(i);
        *(p++) = expr2(i);
    };

or this way:

    int a[200][2];         /* Array of 200 arrays of two int. */
    int (*p)[2];           /* Pointer to array of two int. */

    p = a;                 /* Start with the first array of two int of a. */
    for (i = 0; i < 200; i++) {
        (*p)[0] = expr1(i);
        (*p)[1] = expr2(i);
        p++;               /* Note automatic scaling. */
    }

> (I know, I made the
>problem harder by looping on the first index instead of the second.

Did you? I'd rather think it would be the other index that would be
somewhat harder, and possibly could motivate use of two pointers.

>This is a case where your verbose approach actually decreases the
>information content

No need to be just that verbose...
--
Kristoffer Eriksson, Peridot Konsult AB, Hagagatan 6, S-703 40 Oerebro, Sweden
Phone: +46 19-13 03 60 !  e-mail: ske@pkmab.se
Fax:   +46 19-11 51 03 !  or ...!{uunet,mcvax}!sunic.sunet.se!kullmar!pkmab!ske
chase@Ozona.orc.olivetti.com (David Chase) (02/10/90)
In article <1625@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>If you reckon that your optimizing compiler is less buggy and
>more intelligent than your programmer, all that I say blows up.
>I have to admit that this is usually the case, though.

Thanks -- that's one point on which lots of this rests. I think that
one root of our apparent difference of opinion is the culture -- I've
got a background in compiling, and how optimizations Ought To Be Done
(correctly, to start with). You seem to enjoy programming in C, and
(in my experience, though things appear to be improving lately)

  (1) C compilers have bugs
  (2) the bugs go unfixed
  (3) C has been defined by its compilers, so one person's "bug" is
      another person's "undocumented feature".

IF there's someone (the vendor, perhaps?) responsible for repairing
the compiler when bugs are found, THEN it is probably more
cost-effective to repair the compiler than to (try to) educate an army
of programmers in the way of "precise coding". Now, if you live in a
world of perpetually broken and useless optimizers, then C is not such
a bad way to go, but I'd rather repair the compilers.

Note, too, that people have been writing compilers for some time, and
there have actually been improvements in technology. There's
table-driven instruction selection (not as useful on a RISC, but
there's still those Other Chips out there), and the algorithms for
doing things like (global) constant propagation and loop invariant
code motion are getting cleaner and less ad hoc. There are reasonable
algorithms for optimal instruction scheduling (Simons and someone
else, POPL 1990) for a reasonable class of machines. (Note -- latency
and block loads are one thing that will suck up a good deal of
registers. "Vectorization" isn't just for vector machines.)

The best argument I know against hand-optimizing your code is that
programmers often don't know what runs fast on the machine that
they're compiling to, and they certainly don't know what will run fast
on the machine that their program will run on 5 years from now. The
people at Ardent spent some time working on analysis to "deoptimize"
those fucked-up little loops that C programmers are so fond of
writing, so that they could vectorize them (Allen and someone, SIGPLAN
PLDI, 1988).

David
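A minimal sketch of the loop-invariant code motion mentioned above; the
function and variable names are invented for illustration.

    /* As the programmer writes it: 2.0*omega is the same on every
       trip through the loop, but it is recomputed each time. */
    void normalize(double y[], double x[], int n, double omega)
    {
        int i;

        for (i = 0; i < n; i++)
            y[i] = x[i] / (2.0 * omega);
    }

    /* What loop-invariant code motion effectively produces: the
       invariant product is hoisted out of the loop. */
    void normalize_hoisted(double y[], double x[], int n, double omega)
    {
        double d = 2.0 * omega;
        int i;

        for (i = 0; i < n; i++)
            y[i] = x[i] / d;
    }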
brnstnd@stealth.acf.nyu.edu (02/10/90)
In article <14235@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> Good example. It demonstrates the point I've been making all along!
> There are optimizations that the compiler can't yet do well. There _may_
> be optimizations that compilers will _never_ be able to do. It is these
> types of things that programmers should spend their effort on.

Finally, we agree! The argument is one of degree: how much time to
invest in optimizing various ``levels'' of a program. You trust your
compilers to do a good job of strength reduction, array indexing,
common expression elimination, and so on, so you don't worry about
them. I'm more cynical, so I write things like

  #ifdef HANDOPT
  ...
  #else
  ...
  #endif

in critical sections of a program. (As long as I remain consistently
better at this than my machine's optimizer, I'll keep doing it.) In
any case, neither of us bothers with trivial optimizations, and both
of us worry about major changes.

[ hand optimizations ]
> They are also things which
> decrease the clarity of the code.

How? In one program using FFT-related methods I use a reverse(n,bits)
function. n can be very large, up to 2^30. First I wrote out the
obvious code. Now it's in comments, replaced by very fast code and a
precomputed 128K table. You might just leave the slow version, as it
only took 10% of the program's time before optimization. How is my
version any less clear than the original? If you want to understand
the ``real'' workings of the code, the original is still there for you
to see. It's in comments, but it's there.

You shouldn't criticize hand optimization. You should criticize the
poor coding practice of throwing away the unoptimized version.

---Dan
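For concreteness, a table-driven bit reversal along the lines described
might look like this. The table layout, sizes and names are guesses for
illustration, not Dan's actual code: a 64K-entry table of reversed
16-bit values (128K bytes with 2-byte shorts) turns reverse() into two
lookups and a shift.

    /* rev16[i] holds i with its 16 bits reversed. */
    static unsigned short rev16[1L << 16];

    void init_rev16(void)
    {
        unsigned long i, r, b;

        for (i = 0; i < (1UL << 16); i++) {
            r = 0;
            for (b = 0; b < 16; b++)
                if (i & (1UL << b))
                    r |= 1UL << (15 - b);
            rev16[i] = (unsigned short) r;
        }
    }

    /* Reverse the low `bits' bits of n (n < 2^bits, bits <= 30 in the
       FFT use mentioned above): reverse all 32 bits via two table
       lookups, then shift the result down into place. */
    unsigned long reverse(unsigned long n, int bits)
    {
        unsigned long r;

        r = ((unsigned long) rev16[n & 0xffff] << 16)
          | rev16[(n >> 16) & 0xffff];
        return r >> (32 - bits);
    }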
sommar@enea.se (Erland Sommarskog) (02/11/90)
Kristoffer Eriksson (ske@pkmab.se) writes:
>Jim Giles (jlg@lambda.UUCP) writes:
>>   int a[200][2];             int a[200][2]; int *p; int *q;
>>   ...                        ...
>>   ...                        p = &a; q = p+1;
>>   for (i=0;i<200;i++){       for (i=0;i<200;i++){
>>      a[i][0] = expr1(i);        *p = expr1(i);
>>      a[i][1] = expr2(i);        *q = expr2(i);
>>   };                            p += 2; q += 2;
>>                              };
>>
>>The first is clearly more readable than the second. ...
>
>Not that I necessarily agree with the first poster about always using
>pointers for array loops, but why would you use *two* pointers? It
>would be much more "natural" and less obscure this way:
>
>    int a[200][2];
>    int *p;
>
>    p = &a[0][0];          /* Start from the first int of a. */
>    for (i = 0; i < 200; i++) {
>        *(p++) = expr1(i);
>        *(p++) = expr2(i);
>    };

OK, I don't speak C, maybe this is a standard approach once you know
the language, but I find Kristoffer's proposal to be at least as
obscure as Jim Giles' right-hand example. I fail to see any problems
with the left-hand fragment, which is the obvious way to write it. The
other two look as if the programmer's been training for that
obfuscated (or whatever the word is) C contest.

>or this way:
>
>    int a[200][2];         /* Array of 200 arrays of two int. */
>    int (*p)[2];           /* Pointer to array of two int. */
>
>    p = a;                 /* Start with the first array of two int of a. */
>    for (i = 0; i < 200; i++) {
>        (*p)[0] = expr1(i);
>        (*p)[1] = expr2(i);
>        p++;               /* Note automatic scaling. */
>    }

Slightly more understandable, but still obscure. It seems to assume
that the dimensions are stored in a certain order. Although this
probably is defined by the language, it still seems unnecessary to
rely on such a fact. Anyway, the human reader has to think a while
before he can ensure that the fragment is correct.
--
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
Unix is a virus.
ske@pkmab.se (Kristoffer Eriksson) (02/13/90)
In article <767@enea.se> sommar@enea.se (Erland Sommarskog) writes:
>Kristoffer Eriksson (ske@pkmab.se) writes:
> >Not that I necessarily agree with the first poster about always using
> >pointers for array loops, but why would you use *two* pointers? It
> >would be much more "natural" and less obscure this way:
> >
> >    int a[200][2];
> >    int *p;
> >
> >    p = &a[0][0];          /* Start from the first int of a. */
> >    for (i = 0; i < 200; i++) {
> >        *(p++) = expr1(i);
> >        *(p++) = expr2(i);
> >    };
>
>OK, I don't speak C, maybe this is a standard approach once you
>know the language,

Yes, it is.

> but I find Kristoffer's proposal to be at
>least as obscure as Jim Giles' right-hand example.

Jim complained that his own example obscured the fact that expr1() was
stored only in a...[0], and expr2() only in a...[1]; at least that is
what I think he meant. But the only thing that obscured that (in my
view) was his use of two independent pointers to step through [0] and
[1] respectively. It's hard to see that they don't step on each other.
That is what I perceived as his main point with the example. So I
showed that the example was badly written.

In my example above, one can easily see that expr1() and expr2() will
never be stored on each other, since the loop just sequentially steps
through the whole a[][] array (should be easy for any C programmer at
least). Not as easy as in Jim's left-hand example, but *that was not
the point*!

(And if you found the initial assignment to p (p = &a[0][0]) hard to
read, that's no argument against my example, because Jim's should have
read so too. Just saying "p = &a" is wrong. And besides, "&a[0][0]" is
not that hard to read. It just takes the address of the first element
of a.)

(And if you find "++" hard to read, that only shows you are not used
to C.)

> I fail to
>see any problems with the left-hand fragment which is the obvious
>way to write it.

You apparently did not read my very first sentence:

>>Not that I necessarily agree with the first poster about always using
>>pointers...

I, too, find the left-hand example slightly more readable. But Jim's
example wasn't a fair demonstration of that. Wasn't it you that in
another news group, not long ago, said you could object to unfair
arguments without necessarily disagreeing with the conclusion?...
(swnet.politik)

> The other two look as if the programmer's been
>training for that obfuscated (or whatever the word is) C contest.

You wouldn't win a bit with those examples.

> >[ second example]
>
>Slightly more understandable, but still obscure. It seems to assume
>that the dimensions are stored in a certain order.

It is basically the same as the example above it, just written another
way. And it doesn't make any more assumptions than either the one
above it or Jim's right-hand one.

> Although this probably is defined by the language,

In C arrays are written name [index1] [index2] [and_so_on], not
name [index1, index2, and_so_on], which seems to indicate a slight
difference in the conception of arrays...

> it still seems unnecessary to rely on such a fact.

This is C, after all.

> Anyway, the human reader has to think
>a while before he can ensure that the fragment is correct.

Normally, I would use Jim's right-hand example, if for no other reason
than that it uses fewer variables, and is shorter.
--
Kristoffer Eriksson, Peridot Konsult AB, Hagagatan 6, S-703 40 Oerebro, Sweden
Phone: +46 19-13 03 60 !  e-mail: ske@pkmab.se
Fax:   +46 19-11 51 03 !  or ...!{uunet,mcvax}!sunic.sunet.se!kullmar!pkmab!ske
jlg@lambda.UUCP (Jim Giles) (02/13/90)
From article <2844@pkmab.se>, by ske@pkmab.se (Kristoffer Eriksson):
> [...]
>    int a[200][2];         /* Array of 200 arrays of two int. */
>    int (*p)[2];           /* Pointer to array of two int. */
>
>    p = a;                 /* Start with the first array of two int of a. */
>    for (i = 0; i < 200; i++) {
>        (*p)[0] = expr1(i);
>        (*p)[1] = expr2(i);
>        p++;               /* Note automatic scaling. */
>    }

Still more obscure than using 'a' directly. Furthermore, using the
array notation directly removes the need for _ANY_ pointer increment
in the loop (whether it is automatically scaled or not). Other than
that, the code is now nearly identical to the code I gave as best -
the array indexing is still done _automatically_ by the compiler.
Since the supposed advantage of using pointers was _not_ to rely on
the compiler for optimizing the index calculations, this version
doesn't support Peter Grandi's claims.

J. Giles
dts@quad.uucp (David T. Sandberg) (02/13/90)
In article <767@enea.se> sommar@enea.se (Erland Sommarskog) writes:
:Kristoffer Eriksson (ske@pkmab.se) writes:
: >Jim Giles (jlg@lambda.UUCP) writes:
: >>   int a[200][2]; int *p; int *q;
: >>   ...
: >>   p = &a; q = p+1;
: >>   for (i=0;i<200;i++){
: >>      *p = expr1(i);
: >>      *q = expr2(i);
: >>      p += 2; q += 2;
: >>   };
: >
: >would be much more "natural" and less obscure this way:
: >
: >    int a[200][2];
: >    int *p;
: >
: >    p = &a[0][0];          /* Start from the first int of a. */
: >    for (i = 0; i < 200; i++) {
: >        *(p++) = expr1(i);
: >        *(p++) = expr2(i);
: >    };
:
:OK, I don't speak C, maybe this is a standard approach once you
:know the language, but I find Kristoffer's proposal to be at
:least as obscure as Jim Giles' right-hand example.

You're right: you don't speak C. Kristoffer's version is a very common
way to do operations like this in C. I've written several bits of code
nearly identical to that in the last week. And as a C programmer, I
found Kristoffer's version instantly comprehensible, whereas Jim
Giles' right-hand example required a second look. And I daresay that
Kristoffer's version would compile into more efficient code.

BTW, you don't even need the parens in the lines which do the
assignment and increment to pointer *p, because it evaluates that way
anyway, so I would have written it as:

    *p++ = expr1( i );

--
                            "Now beat it, or the red guy screams again!"
David Sandberg, dts@quad.sialis.mn.org or ..uunet!rosevax!sialis!quad!dts
root@kunivv1.sci.kun.nl (Privileged Account) (02/13/90)
(Dan Bernstein) writes:
>(Nick Rothwell) writes:
>> (Dan Bernstein) writes:
>> >Piercarlo is entirely correct. If you care about ``information hiding''
>> >and the ``purity'' of your code more than efficiency, you can't expect
>> >fast code. If you can't bear to put your original code in comments and
>> >write an efficient version with temporary variables, you don't deserve
>> >fast code.
>>
>> This is rubbish, check out the performance of New Jersey ML
>> (high-level functional language, efficient optimised native code
>> generation).
>
>Are you saying that your compiler is smarter than you are or just that
>it's not as lazy?
>
>Can it figure out, say, Knuth's algorithm for computing the delta2 table
>in Boyer-Moore string searching, starting from a dumb algorithm for the
>same job?

After some discussion with my colleague Norbert V\"olker (who doesn't
read news), I can come up with the following reply:

- first, Knuth's algorithm as published in [1] is *WRONG*. This
  already shows that not even DEK, who is to my knowledge an
  *extremely* competent programmer, was able to come up with the
  algorithm. See [2] for the improved algorithm.

- second, N. V\"olker is currently working on a (transformational)
  derivation of the delta-algorithm for BM pattern matching (with H.
  Partsch). Their version of the algorithm is going to be different
  from the delta2 algorithm in [2]; it uses more space and less time.
  It is Norbert's feeling, however, that in the future compilers will
  be able to come up with the delta2 algorithm, since all that is
  involved in a derivation by hand is finite differencing (aka
  strength reduction). The reason that current compilers are not able
  to do this is that they do not ``know'' what lists (strings,
  sequences, or whatever) *are*. Cf. [3] for the basic laws about
  strings.

My view of this whole issue is the following:

- there are two kinds of optimisations: routine ones and smart ones.
  Routine optimisations are: strength reduction, code motion, loop
  merging ('bout everything you heard about in your course on
  Optimizing Compilers). Smart optimisations are inventions of new
  algorithms, like the BM pattern matching algorithm *itself*, or the
  non-trivial sorting algorithms (cf. Darlington's paper `A Synthesis
  of Several Sorting Algorithms'). The only problem I don't know for
  sure how to place is register allocation.

- the routine optimisations can be done (eventually) by a compiler;
  all the examples of things that `you'd better do yourself because
  your compiler is too stupid' mentioned so far fall in this class.

- the smart optimisations can only be done by humans because they
  involve ``eureka's'' and design decisions; it is, however, possible
  to limit the number of eureka's by inventing strategies (cf. some
  work by D.R.Smith and A.Pettorossi) and by gaining knowledge about
  your problem domain first (e.g. lists, cf. [3]).

References:
[1] Knuth, Morris, Pratt: Fast Pattern Matching in Strings, SIAM J.
    Comput. 6, 1977.
[2] Rytter: A Correct Preprocessing Algorithm for Boyer-Moore
    String-Searching, SIAM J. Comput. 9, 1980.
[3] R.S. Bird: An Introduction to the Theory of Lists, in: M. Broy
    (ed.): Logic of Programming and Calculi of Discrete Design, ASI
    Series Vol. F36, Berlin: Springer 1987.

I'm very sorry for you, Mr. Bernstein, that you chose an example that
proves my (and Nick Rothwell's) point.

Eerke Boiten
Department of Informatics, K.U.Nijmegen
Toernooiveld, 6525 AD Nijmegen, The Netherlands. Tel. +31-80-612236.
eerke@cs.kun.nl
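For orientation only: the other, much simpler Boyer-Moore table
(delta1, the ``bad character'' table) can be built as sketched below;
delta2, the table at issue above, is the subtle one. This is a sketch
for readers unfamiliar with the tables, not taken from any of the
cited papers.

    #define ALPHABET 256

    /* The simpler of the two Boyer-Moore tables: delta1[c] is the
       pattern length minus one minus the rightmost position of c in
       the pattern, or the pattern length if c does not occur at all.
       Building this one is routine; delta2 is the hard one. */
    void build_delta1(const unsigned char *pat, int m, int delta1[ALPHABET])
    {
        int i;

        for (i = 0; i < ALPHABET; i++)
            delta1[i] = m;
        for (i = 0; i < m; i++)
            delta1[pat[i]] = m - 1 - i;
    }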
jlg@lambda.UUCP (Jim Giles) (02/14/90)
From article <2858@pkmab.se>, by ske@pkmab.se (Kristoffer Eriksson):
> [...]
> (And if you found the initial assignment to p (p = &a[0][0]) hard to
> read, that's no argument against my example, because Jim's should have
> read so too. Just saying "p = &a" is wrong. [...]

I'll have to look at my example again, but I don't think I said just:
"p = &a". What I thought I said was: "p = (int *) &a", which is
correct. If I forgot the cast, I apologize.

J. Giles
brnstnd@stealth.acf.nyu.edu (02/14/90)
The argument is whether hand optimizations are bad. My point is that
there are optimizations that current compilers can't do at all in
practice, even though there are no barriers in theory; if those must
be done by hand, what's wrong with doing simpler ones by hand? You
can't draw the line between trivial optimizations and so-called linear
programming (inductive generalization, trading space for time, or
whatever you want to call it).

I'm answering this article because the author was the only respondent
who realized that my example is, in principle, mechanically solvable.
Others tried to draw that line, saying (in essence) that only humans
could figure out Knuth's delta2 algorithm. But there's no reason that
compilers can't do linear programming---the necessary AI is old hat.
They can insert one temporary variable; why not allocate some memory
and insert n of them?

In article <1041@kunivv1.sci.kun.nl> root@kunivv1.sci.kun.nl (Privileged Account) writes:
> (Dan Bernstein) writes:
> >Can it figure out, say, Knuth's algorithm for computing the delta2 table
> >in Boyer-Moore string searching, starting from a dumb algorithm for the
> >same job?
> - first, Knuth's algorithm as published in [1] is *WRONG*.

I'm not talking about Knuth's delta2 algorithm as presented in his
addendum to the 1976 Knuth-Morris-Pratt paper on the subject. I'm
talking about Knuth's delta2 algorithm, the perfectly correct example
of a solution to a linear programming problem. Don't misinterpret. (If
you don't understand this, read Knuth's ACP historical comments on
Euclid's algorithm, and think about what we call Euclid's algorithm.)

> It is Norbert's feeling, however, that in the future compilers will be
> able to come up with the delta2 algorithm, since all that is involved in
> a derivation by hand is finite differencing (aka strength reduction).

Exactly! I agree completely with Volker. This all falls more generally
under linear programming. That's why I chose the example: Calculating
the delta2 table quickly is a (conceptually) simple job of inserting
appropriate temporary variables.

> My view of this whole issue is the following:
> - there are two kinds of optimisations:
>   routine ones and smart ones.

Volker just erased this line. Now you're turning around and redrawing
it?

> Smart optimisations are inventions of new algorithms, like the BM
> pattern matching algorithm *itself*,

I'll bet that a few hundred years from now, compilers will be able to
figure this out. After all, the basic idea is just to sort a certain
three-dimensional comparison network into an efficient order. (Didn't
think of it that way, didja.) You can't make this distinction.

> The only problem I don't know for sure how to place is register
> allocation.

I place it at the very bottom: it's the only thing that can't be done
by hand, because the language has no support for it. (This is about
the only unarguable point in this discussion.) So the compiler had
better do a good job.

> I'm very sorry for you, Mr. Bernstein, that you chose an example that
> proves my (and Nick Rothwell's) point.

Not at all. Have you lost track of what the point is? Read again.
Everything you say above ``my view is'' supports my argument; what you
say below ``my view is'' has nothing to do with my example.

---Dan
rang@cs.wisc.edu (Anton Rang) (02/15/90)
In article <449@quad.uucp> dts@quad.uucp (David T. Sandberg) writes:
>You're right: you don't speak C. Kristoffer's version is a very
>common way to do operations like this in C. I've written several
>bits of code nearly identical to that in the last week.

Of course, if the array is redeclared to a different size, you get
interesting side-effects in the pointer versions, which can be rather
annoying. For instance, adding a third element to each row (say, to
hold a precomputed function of the other two elements later on) will
screw up every loop like this.

	Anton

+---------------------------+------------------+-------------+
| Anton Rang (grad student) | rang@cs.wisc.edu | UW--Madison |
+---------------------------+------------------+-------------+
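A sketch of the failure mode being described (sizes and values are
invented for illustration): once a third column is added, the
subscripted loop stays correct while hand-stepped pointers quietly
write the wrong cells.

    void fill(void)
    {
        int a[200][3];              /* a third column was added later */
        int *p = &a[0][0];
        int *q = p + 1;
        int i;

        for (i = 0; i < 200; i++) {
            a[i][0] = i;            /* subscripted form: still correct */
            a[i][1] = 2 * i;        /* subscripted form: still correct */

            *p = i;                 /* from the second iteration on these   */
            *q = 2 * i;             /* land in the wrong cells, because the */
            p += 2;                 /* row stride is now 3, not 2           */
            q += 2;
        }
    }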
preston@titan.rice.edu (Preston Briggs) (02/15/90)
In article <3668:04:34:52@stealth.acf.nyu.edu> brnstnd@stealth.acf.nyu.edu (Dan Bernstein) writes:
>The argument is whether hand optimizations are bad. My point is that
>there are optimizations that current compilers can't do at all in
>practice, even though there are no barriers in theory; if those must
>be done by hand, what's wrong with doing simpler ones by hand? You
>can't draw the line between trivial optimizations and so-called linear
>programming (inductive generalization, trading space for time, or
>whatever you want to call it).

Sure we can draw a line. If the compiler can, in practice, save us
some work, it should. Extending your argument will have us adding on
our fingers again:

	I don't need the machine to do this!
	After all, it's not my equal in number theory.

Better seems:

	I don't need to do this!
	I've got machines to do it for me.

(Do you suppose we could modify an Eliza to carry on these arguments
for us?)

Preston Briggs
preston@titan.rice.edu
ske@pkmab.se (Kristoffer Eriksson) (02/15/90)
In article <14239@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>From article <2858@pkmab.se>, by ske@pkmab.se (Kristoffer Eriksson):
>> (And if you found the initial assignment to p (p = &a[0][0]) hard to
>
>I'll have to look at my example again, but I don't think I said just:
>"p = &a". What I thought I said was: "p = (int *) &a", which is
>correct.

Correct, yes, but you defeat any type-checking by the compiler, as the
cast overrides all checking. It hides the type of "a", both to the
human reader and to the compiler. If "a" accidentally were e.g. a
character string instead, the compiler would still accept and compile
it.

Explicitly taking the address of the first element of the array
exactly states what you are doing, and automatically gives you the
type of that element. If it doesn't match the type of the variable it
is being assigned to, the compiler will tell you.

An alternative to the complicated expression "&a[0][0]" is the simple
"a[0]", but I think that one is more confusing to the reader, as it
can mean several different things.
--
Kristoffer Eriksson, Peridot Konsult AB, Hagagatan 6, S-703 40 Oerebro, Sweden
Phone: +46 19-13 03 60 !  e-mail: ske@pkmab.se
Fax:   +46 19-11 51 03 !  or ...!{uunet,mcvax}!sunic.sunet.se!kullmar!pkmab!ske
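A small sketch of that last point (the wrong declaration is contrived
for illustration): the cast compiles silently even when "a" is not an
int array at all, while the &a[0][0] form would be rejected.

    void demo(void)
    {
        char *a = "oops";       /* accidentally a string, not the int matrix */
        int *p;

        p = (int *) &a;         /* accepted silently: the cast hides the mistake */

        /* p = &a[0][0];           would be rejected: a[0] is a char here, so
                                   a[0][0] is not even a valid expression */
    }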