chased@rbbb.Eng.Sun.COM (David Chase) (02/27/91)
(Even though I'm replying to Dan, I'll be polite. Truth is stranger than fiction, eh?)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>Observation 2: Whoever writes the optimizer for FUBAR---let's call this
>guy Natas---*could* make every single FUBAR instruction available from
>Z. All he has to do is make sure that there's *some* language construct,
>perhaps ridiculously convoluted, that will compile to each instruction.
>Observation 3: Natas rarely does this. I have yet to see any compiler
>understand any portable expression of Hamming weights, even though this
>wouldn't be very difficult. Even in the occasional case where the
>compiler writer shows some evidence of having used the assembly language
>he's compiling for, he rarely tells programmers what the optimizer can
>and can't do.
>Let's assume that Natas is not such a devil, and manages not only to
>give his optimizer some way of using FUBAR's CXi operation and some way
>of using FUBAR's DREM operation, but also to *document* (gasp) the
>corresponding expressions in Z. Now Joe can write portable Z code that
>will run fast on FUBAR, taking advantage of the chip's instructions. All
>he has to do is follow Natas's instructions.

There's one problem with this: finite resources (if you or Herman Rubin wants to pony up a dump-truck full of cash, maybe we can talk, but I'll bet your resources are finite, too). I think you'll agree that no compiler writer should spend time optimizing for the machine-specific case until most of the optimization has been done for the portable (standard-conforming) case.
So, after we've taken care of reduction in strength, global value numbering, constant propagation, redundancy elimination, loop invariant code motion, instruction selection, register allocation, scheduling, loop unrolling, loop straightening, peephole improvements, software pipelining, linear function test replacement, loop fusion, stripmining, blocking, dead code elimination, tail call elimination, leaf routine optimization -- oops, better algorithms appeared, so we have to reimplement a couple of those -- oops, new machine, time to fool around with the scheduling and instruction selection some more -- oops, time to do some fancy code placement to help out the cache -- oops, time to do interprocedural optimization. I think you get the picture.

Always, optimization efforts have to be directed towards those things that will make the largest number of present and future customers happy. When all the work is done for C, Fortran, Pascal, Modula, and C++, is it better to expose non-portable machine-specific optimizations to the programmer, or should we look at extending the optimizer to be useful to Lisp, Ada, Cobol, RPG-III, Prolog, and Jovial? Maybe we'd make people happier if the optimizer ran twice as fast, or in half the memory. Maybe we should make use of some of the dataflow algorithms to provide debugging feedback to the user ("there's no exit from this loop; perhaps it won't terminate?").

Another difficulty with the scheme that you describe is that you really don't want people to be writing their code in strange little idioms because that will engage some magic widget in the optimizer. It's portable, but weird. That doesn't help readability or debuggability, and it may not be portable into the future. If someone rewrites the optimizer, the last thing they want to do is support weird little hacks like that. If you must (and some people must), then use something based on subroutine calls.
That has the advantage that (1) if no machine support exists, it is easy to plug in portable code and (2) machine support often exists for assembly-language implementations (for instance, say "man inline" on a Sun).

Of course, there are some things that "ought" to be recognized because they are portable, but it still isn't clear that they are needed, or worth the cost. For instance, on the Sparc we *could* spill register %i7 (return PC) and use that register for other purposes in the main body of a subroutine. Fine, except that we'd break all the debuggers, including adb. That adds big costs to the final debugging that goes on before a product (ours, or some software vendor's) is shipped. Probably not worth it.

David Chase
Sun
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/27/91)
In article <8658@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
> There's one problem with this: finite resources

Look, Dave. Say you have a chip with 100 instructions, of which you use 90 in your compiler output. Pick one of the ten left: DREM, perhaps. How long would it take you to write a DREM function in portable C? So you write it just *one* way, and get your optimizer to understand that *one* way, and document that *one* way. How long does this take? An hour, perhaps, depending on the structure of your compiler. Can you spare 10 hours to make your compiler vastly more useful?

> Another difficulty with the scheme that you describe is that you
> really don't want people to be writing their code in strange little
> idioms because that will engage some magic widget in the optimizer.
> It's portable, but weird. That doesn't help readability or
> debuggability, and it may not be portable into the future.

Maybe you didn't understand what I said. By definition, everything done here is portable. If your compiler understands <shazam!> to be DREM, and <shazam!> doesn't work the same on another machine, then your compiler is simply incorrect. Programmers need to write efficient code that *will* be portable into the future, and my scheme is designed to allow this.

As for maintainability... If the programmer *needs* Hamming weights, then he's going to write the code for it anyway. Adding an equivalent section of code for a different machine won't hurt the maintainability of the original. In fact, I think a pick-this-or-that construct would *help* maintainability, as it would let the programmer display his original, comprehensible code along with each transformed version.

Do you have some religious argument against ``strange little idioms''? Well, it's the compiler writer's fault---your fault---if the idioms are strange. What's important to me is that they be portable.
If you want to cure the ``strangeness,'' publish a standard for what you consider to be the right idiom, as I suggested before.

> If you must (and some people must),
> then use something based on subroutine calls.

That's fine. ``If you write the following routine:

    int sumones(x)
    int x;
    {
        int a;
        int b;
        <blah, blah, blah>
    }

then our optimizer will convert any call to sumones() into a CX instruction. You may only change the names sumones, x, a, and b; you must use exactly the order of instructions shown here, or our optimizer will not recognize the routine.''

This is all I'm asking for. Surely your compiler can do fixed pattern-matching without trouble? Is this such a ``weird little hack''?

---Dan
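For concreteness, here is one way the Hamming-weight routine under discussion might be spelled out in portable C. This is purely an illustrative sketch: the loop body is a filled-in guess, not the fixed pattern any particular optimizer of the period documented, and a real ``recognized idiom'' would have to match its compiler's manual exactly.

```c
/* Illustrative sketch only: a portable Hamming-weight (population
   count) function.  An idiom-recognizing optimizer of the kind being
   argued about could map a documented pattern like this onto a single
   CX-style instruction; on every other machine it compiles as the
   plain loop below. */
unsigned sumones(unsigned x)
{
    unsigned count = 0;
    while (x != 0) {
        count += x & 1u;  /* add in the low-order bit */
        x >>= 1;          /* move the next bit down */
    }
    return count;
}
```

On a machine with no population-count instruction this is still correct, which is exactly the portability property being claimed for the idiom scheme.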
chased@rbbb.Eng.Sun.COM (David Chase) (02/27/91)
1) I prefer David, not Dave.

2) My time is still better spent doing general-purpose optimizations.

3) For hysterical reasons, an optimizer pre-mutilates the code, so idioms may get transformed beyond recognition in a context-dependent way.

4) There is ALREADY a tool that will do most of what you want. Say "man inline" on some nearby Sun machine running a recent rev of the operating system. "f77 -v -fast ..." might provide some guidance. Given that an 80% solution exists, that makes solving the problem again even less profitable.

5) Any "idiom" designed to trigger use of a specific instruction may be portable for correctness, but it will not be portable for speed in future releases of the compiler unless care is taken. Verifying that it continues to provide the same function originally promised to customers may well take more time than it took to add it in the first place.

6) Since we have a multi-language optimizer, we might choose to do it for some language other than C first. Fortran comes to mind.

>> If you must (and some people must),
>> then use something based on subroutine calls.
>
>That's fine. ``If you write the following routine: int sumones(x) int x;
>{ int a; int b; <blah, blah, blah> } then our optimizer will convert any
>call to sumones() into a CX instruction.

7) No, it is simpler to use the existing inliner tool to inline assembly language, and the results are more certain. If there is tremendous demand, it goes into a library. Portable into the future, and not dependent on the behavior of the optimizer.

8) Need is demonstrated with dollars. Other people back up their requests (sometimes unreasonable requests) with dollars, so they get serious attention paid to them.

David Chase
Sun
tom@ssd.csd.harris.com (Tom Horsley) (02/27/91)
>>>>> Regarding Re: What the compiler won't do for you; brnstnd@kramden.acf.nyu.edu (Dan Bernstein) adds:

brnstnd> Look, David, I cannot afford to write any large number of routines in
brnstnd> assembly language. Most of my code must work on several different
brnstnd> platforms without excessive rewriting. I have the time to write an
brnstnd> occasional routine in assembly language for a few machines, but this is
brnstnd> simply not cost-effective most of the time.

Look, Dan. Say you have to port your code to 100 machines, of which 90 don't have machine instructions for DREM anyway. How long would it take you to write a DREM function in portable C? So you write it just *one* way, and use it on those 90 machines. For the 10 machines that do have some sort of DREM instruction, you hand code an assembly routine. How long does this take? An hour, perhaps, depending on the machine instruction you need to execute. Can you spare 10 hours to make your application run on 10 machines?
--
======================================================================
domain: tahorsley@csd.harris.com       USMail: Tom Horsley
  uucp: ...!uunet!hcx1!tahorsley               511 Kingbird Circle
                                               Delray Beach, FL 33444
+==== Censorship is the only form of Obscenity ======================+
|     (Wait, I forgot government tobacco subsidies...)               |
+====================================================================+
chased@rbbb.Eng.Sun.COM (David Chase) (02/28/91)
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In the article you responded to, David, I proposed a language construct
>to let the compiler choose one of several sections of code. It is f'ing
>obvious that this makes the program ``portable for speed.'' It is f'ing
>obvious that I proposed the construct *exactly* because I am worried
>about efficiency, and because I am trying to *reduce* the work of the
>compiler writer and programmer in making code efficient. The only way
>now to make portable code efficient is to spend years communicating with
>language designers and chip writers. I want to reduce that time.

Dan, you're missing the forest for the trees, several times over. Even if the general idea that you propose were worthwhile, your scheme for accessing special-purpose assembly language instructions from portable code is too complicated. It is gratuitously difficult for both the compiler writer and the programmer. Compare:

Dan's method:
    compiler writer writes recognizer for special code sequences.
    compiler writer documents these code sequences.
    programmer reads the code sequences.
    programmer writes these code sequences.

Simpler method:
    compiler writer writes library of inlineable code.
    compiler writer documents names of routines that are in this library.
    programmer uses those names.
    if you insist, compiler writer writes portable versions of these
    subroutines, though programmer would have had to do it anyway in
    your method (working from documentation).

No need to recognize complicated code sequences, and easier to use. The inliner sucks in the code automagically. Writing the portable version is no harder than figuring out what pattern is recognized, but you don't have to write a pattern recognizer, OR impose restrictions on what transformations might be performed by the optimizer now or in the future that might complicate pattern recognition (yes, straight-line code can get oddly mutilated).
No need to promise that in the future, a special chunk of code will result in the use of a special instruction. Now, no doubt in your next twisted missive you will claim that is exactly what you proposed, since, of course, a procedure call is a special case of a code sequence. This behavior is already supported by existing compilers -- there's several people who write inlineable routines for Fortran -- so if this is what you meant, then you are ignorant of the state of current tools. Since that cannot possibly be the case, I conclude that this is not what you meant.

In addition, as I said before, there is ample evidence that time is better spent doing a good job of optimizing code in a general way. If it paid to do the things that you propose, it would have showed up in compiler tuning. People do look at the code that is generated, and try to come up with ways to make it better. Somehow, nothing like your idea even made it onto the list, let alone near the top. Perhaps you have an exaggerated opinion of your intelligence and the importance of your problems. Just a guess.

By the way, is abbreviating "fucking" into "f'ing" considered politeness on the net? Thanks for being so considerate.

Your pal,

David Chase
Sun
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/28/91)
In article <8787@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
> Simpler method:
> compiler writer writes library of inlineable code.
> compiler writer documents names of routines that are in this library.
> programmer uses those names.

I give up. The man just refuses to see that this does NOT produce portable code. Sun writes one library, and DEC writes another one, and Convex writes another one, and a programmer who uses these libraries will NOT be able to run his programs on more than one machine.

> No need to promise that
> in the future, a special chunk of code will result in the use of a
> special instruction.

Nor does he see that the compiler writer NEVER has to make such a promise. Again it is obvious that his viewpoint is twisted by his Sun-specific environment. Too bad for him that the rest of the world uses hundreds of different architectures. If David opens his eyes he will see that what I am proposing will ALWAYS improve the portability of efficient code. But because this requires a bit of work on the compiler-writer's part, his eyes will remain closed.

> In addition, as I said before, there is ample evidence that time is
> better spent doing a good job of optimizing code in a general way. If
> it paid to do the things that you propose, it would have showed up in
> compiler tuning. People do look at the code that is generated, and
> try to come up with ways to make it better. Somehow, nothing like
> your idea even made it onto the list, let alone near the top.

Actually, David, I found out from e-mail that essentially the same ideas have been published before, in many separate papers.

I still find it amazing that David can draw a line between optimizing *(p+a) into a single instruction on a Convex, optimizing c = a/b and d = a%b into a single instruction on most CISC chips, and optimizing a short Hamming weight function into a single instruction on a Cyber.

---Dan
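As an aside, the middle case in Dan's list already has a portable spelling: ANSI C's div() hands back quotient and remainder from one division, so a compiler is free to map it onto a single CISC divide. A minimal sketch (the wrapper name is an invention for illustration):

```c
#include <stdlib.h>

/* ANSI C's div() packages c = a/b and d = a%b as one call -- the
   combined result a single CISC divide instruction produces.
   divrem() is just an illustrative wrapper around it. */
void divrem(int a, int b, int *q, int *r)
{
    div_t qr = div(a, b);  /* both results of one conceptual divide */
    *q = qr.quot;
    *r = qr.rem;
}
```

Note that div() is defined to truncate toward zero, so its results are well specified even for negative operands, unlike / and % in C89.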
chased@rbbb.Eng.Sun.COM (David Chase) (02/28/91)
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <8787@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
>> Simpler method:
>> compiler writer writes library of inlineable code.
>> compiler writer documents names of routines that are in this library.
>> programmer uses those names.
>
>I give up. The man just refuses to see that this does NOT produce
>portable code. Sun writes one library, and DEC writes another one, and
>Convex writes another one, and a programmer who uses these libraries
>will NOT be able to run his programs on more than one machine.

I believe you forgot to quote the next line that said "if you insist, provide portable implementations of the subroutines in that library". Ergo, portable code. Or, the programmer can write the portable code him- or herself; that's no harder than your atrocity. No harder for programmers, simpler for compiler, easier to maintain, means BETTER.

Furthermore, note that your scheme for triggering special instructions through the use of special code sequences doesn't do anything for all the code that has already been written. No win means not better.

>Nor does he see that the compiler writer NEVER has to make such a
>promise.

If the documentation says "do this to get the compiler to emit the splartzfooie instruction", that's a promise. It seems to be the case that a non-trivial number of people with money will interpret ANY discovered behavior as a promise, so imagine the significance of documentation.

>Actually, David, I found out from e-mail that essentially the same ideas
>have been published before, in many separate papers.

References, please. I'd love to be enlightened.

>I still find it amazing that David can draw a line between optimizing
>*(p+a) into a single instruction on a Convex, optimizing c = a/b and
>d = a%b into a single instruction on most CISC chips, and optimizing a
>short Hamming weight function into a single instruction on a Cyber.

It's an easy distinction to make.
I'll bet you'd loan someone a dime, and not be too upset about not getting it back. I'll also bet that you wouldn't feel the same way about $100. Same difference -- LOTS of code can benefit from recognizing *(p+a), AND it is easy to recognize using any number of fast algorithms (it is a simple tree pattern, and no unification is required). Recognizing that two divisions can be compressed into one is not so automatic, but it matters sometimes, and it isn't too hard to catch most cases, given a basic block that's had value numbers applied to it. This last case occurs far less often, and I think it's harder to recognize (I'd have to go look up what a Hamming weight function is. If it involves a loop, it is harder). Given the low return for high investment, why bother?

So, the line is easy to draw: profitable, probably profitable, probably unprofitable. You and Herman Rubin are not a large enough market to justify more than a couple of minutes of work, especially given that there is an alternative. There are far better wild geese to chase.

[Lest anyone wonders why I persist, I'm practicing for the day when I have to deal with a two-year old.]

David Chase
Sun
preston@ariel.rice.edu (Preston Briggs) (03/01/91)
> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>>I found out from e-mail that essentially the same ideas
>>have been published before, in many separate papers.
>
>References, please. I'd love to be enlightened.

Good ref's that many people should read include papers describing the ECS project undertaken at IBM Yorktown in the mid-70's. ECS was the Experimental Compiling System. Very roughly, it worked like this:

- They defined an intermediate language (IL). This was fairly interesting by itself, and rather more extensive and complete than the norm.
- They defined (built?) an extensive IL optimizer.
- Front-ends produced IL.
- Libraries were kept in IL form.
- There was the potential for extensive inlining with reoptimization (and more inlining, ...).
- New operations could be added by providing appropriate IL definitions. With adequate inlining, constant propagation, type propagation, etc., these new operations could be extensively optimized (examples included string concatenation, which is trickier than many imagine).

I have only a few references.

"A Case Study of a New Code Generation Technique for Compilers"
    J. L. Carter, CACM, 1977

"A New Strategy for Code Generation -- The General Purpose Optimizing Compiler"
    W. Harrison, IEEE Transactions on Software Engineering, 1977

"The Experimental Compiling System"
    F. Allen, Carter, Fabri, Ferrante, Harrison, Loewner, Trevillyan,
    IBM Journal of R & D, November 1980

Unfortunately, the project apparently died after a few years. Why? I dunno. Perhaps the success of the PL.8 work. Perhaps it was too complex. Perhaps it didn't work. Perhaps funding or politics. Personally, I was frightened by the complexity of their intermediate language. I also wonder how to control the huge potential inlining. Can compile-times be controlled?

I think many people suggesting various forms of compiler extensibility can learn a lot by reading these guys.
In particular, the case study paper illustrates how operations can interact and how difficult it can be to optimize the results.

>>I still find it amazing that David can draw a line between optimizing
>>*(p+a) into a single instruction on a Convex, optimizing c = a/b and
>>d = a%b into a single instruction on most CISC chips, and optimizing a
>>short Hamming weight function into a single instruction on a Cyber.

There are significant differences between the three cases.

The 1st is simply instruction selection based on a single tree.

The 2nd is much trickier. The 2 instructions don't share a common root, so we don't have any easy place to start looking (for efficient search). They do share operands, which can be used as a big hint. Hence the commonality can often be detected with a simple extension to value numbering (like Chase suggested). The extension many people are arguing for:

    c, d = a divrem b

is much more significant (at least to languages like C or Fortran). Basically, you've suddenly defined a new language with significantly different syntax. The front-end of the compiler will probably have to be chopped up pretty extensively, though the optimizer and code generator might survive unscathed. Better than a massive hack job would be to look into languages that allow various forms of multiple-assignment (Mesa, Beta, Lisp, Clu, others?). Do some reading!

The 3rd is much harder again. Recognizing all the ways I might code a particularly complex function as being equivalent to some bizarre instruction is quite difficult -- probably impossible. It's also wrong-headed, for many reasons. Chase pointed out that there's only a small payoff. Perhaps he didn't make his argument clearly enough.

Think of all the things that are wrong with some particular compiler, and prioritize the list. First is usually that it produces incorrect code in some important case. It also compiles too slowly, uses too much memory, and produces slow code. Why is the object code slow?
Well, we didn't do good enough strength reduction and the value numbering has bugs. Our dependence analysis can't handle subscript expressions that complex, and we don't know how to block loops for the TLB yet. Or whatever. These matter a lot and occur over and over in everyone's code.

And there you sit, working on that special hamming-weight function recognizer so Preston's code will run faster. Besides being a generally undecidable problem (program equivalence), Preston suddenly decides to change his code a little or the architects redefine how the special hamming-instruction works in the first place (say, how one of the condition code bits is set). Besides, is the instruction faster than those cute little RISCish instructions anyway? Especially when the optimizer realizes that you've already done half a hamming-thing earlier and that it needn't repeat some of the instructions.

>>I give up.

Good idea. One of the cool things about the net is that it provides a way we can ask questions and get quick answers. For example, I can ask about the MIPS I-cache and usually get a precise answer from someone who knows. It isn't helpful to tell John Mashey he's wrong about how his machine works! Similarly, you aren't going to win arguments with David (or Ben) Chase about how compilers work.
khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (03/01/91)
In article <24280:Feb2802:55:4991@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
...
If David opens his eyes he will see that what I am proposing will ALWAYS
improve the portability of efficient code. But because this requires a
bit of work on the compiler-writer's part, his eyes will remain
closed.
...
I question whose eyes are closed here. Exactly how many compiler
writers do you expect to conform to your new ad hoc standard? In what
timeframe? If you rely exclusively on the kindness of compiler
writers, you will have zero portability.
Methods which start with class libraries (modules etc.), are
completely portable. Performance can be dialed in over time, and if
need be compilers altered. Portability is maximal; as even old systems
can provide the required functionality.
This is precisely why language features such as modules and operator
overloading were put into ISO DIS 1539 Fortran.
--
----------------------------------------------------------------
Keith H. Bierman kbierman@Eng.Sun.COM | khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33 | (415 336 2648)
Mountain View, CA 94043
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/01/91)
In article <KHB.91Feb28184824@chiba.Eng.Sun.COM> khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:
> I question whose eyes are closed here. Exactly how many compiler
> writers do you expect to conform to your new ad hoc standard?

Huh? It's purely a quality-of-implementation issue. Someone whose compiler *cannot* produce DREM or CX or any other available instruction has a poor compiler. And someone with a better compiler had better document the idioms that his optimizer will understand, or programmers won't be able to take advantage of the feature. Are you claiming that a compiler that *can't* produce DREM on a VAX is better than a compiler that *can*, all else being equal?

Sure, it takes a bit of time for the compiler writer to add this feature. That's the disadvantage of any improvement to software: someone has to write the code.

> If you rely exclusively on the kindness of compiler
> writers, you will have zero portability.

Huh? I've seen several responses that make the same claim. But it's entirely wrong. The recognized idioms will typically be *different* under different compilers, but since they're written in PORTABLE code this is entirely irrelevant. If I write code for one idiom and use it under a compiler that doesn't recognize the idiom, the code will still work, and I'm certainly better off than if I had used assembly language on the first machine.

> Methods which start with class libraries (modules etc.), are
> completely portable.

And this is completely useless. I've said this before, and I'll say it again: I cannot wait for language designers to catch up. I need to be able to write efficient, portable code NOW, not in ten years when a better language is widely available. Sure, it would be nice if C or Fortran had Hamming weights. But they don't. So stop talking about portable libraries to solve these problems: they don't exist.

---Dan
hrubin@pop.stat.purdue.edu (Herman Rubin) (03/01/91)
In article <5412:Mar107:10:0491@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
.........................
> And this is completely useless. I've said this before, and I'll say it
> again: I cannot wait for language designers to catch up. I need to be
> able to write efficient, portable code NOW, not in ten years when a
> better language is widely available.

I fully sympathize with you, but neither you nor I will be able to write efficient, portable code NOW. We both WANT to, but I see no reasonable prospect of this in the near future. What I believe can be done in the near future is to produce a totally non-optimizing macro translator, and let the user and compiler/assembler do the optimizing afterwards. This is within present knowledge. But if the operations one wishes to use are not even known by the language, and cannot be added to it, even reasonable code cannot be written.

Followups to comp.lang.misc, please. This is not a hardware problem, although I suspect that hardware would change to take into account these instructions, now totally disregarded.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)
{purdue,pur-ee}!l.cc!hrubin (UUCP)
thomson@hub.toronto.edu (Brian Thomson) (03/01/91)
In article <1991Feb28.181948.26958@rice.edu> preston@ariel.rice.edu (Preston Briggs) writes:
> The 3rd is much harder again. Recognizing all the ways
> I might code a particularly complex function as being equivalent
> to some bizarre instruction is quite difficult -- probably
> impossible.

All the ways, yes. But many of the ways, perhaps not. I seem to remember running across a reference, about 10 years ago (that's when I saw it, not necessarily when it was published), that dealt with recognizing common sequences of operations in APL and compiling them into particularly efficient code. I think they called it "recognition of programming idioms" or something of the sort. Does this ring a bell with anyone?

> It's also wrong-headed, for many reasons. Chase pointed out
> that there's only a small payoff.

I think that, in the APL case, the payoff was somewhat higher because APL sometimes encourages the use of powerful yet expensive operations to accomplish relatively simple tasks.
--
Brian Thomson, CSRI Univ. of Toronto
utcsri!uthub!thomson, thomson@hub.toronto.edu
wilson@uicbert.eecs.uic.edu (Paul Wilson) (03/02/91)
thomson@hub.toronto.edu (Brian Thomson) writes:
>I seem to remember running across a reference, about 10 years ago
>(that's when I saw it, not necessarily when it was published),
>that dealt with recognizing common sequences of operations in APL
>and compiling them into particularly efficient code.
>I think they called it "recognition of programming idioms" or something
>of the sort. Does this ring a bell with anyone?

I'm not sure we're talking about the same thing, but it rings bells with me. The Hewlett-Packard APL 3000 compiler did snazzy dynamic compilation, allowing it to do optimizations of program fragments that actually get executed, as opposed to working that hard on all of the dynamically possible sequences. (See Van Dyke, E.J., "A dynamic incremental compiler for an interpretive language," Hewlett-Packard Journal, July 1977, pp. 17-24.)

This is philosophically similar to Chambers & Ungar's recent work on the Self compiler, which uses aggressive inlining and interprocedural optimization. It comes up with "customized" versions of code for different kinds of call sites. This and the inlining allow a lot more type analysis of code for a (*very*) dynamically-typed language. That lets them optimize away most of the dynamic type checks, and (importantly) optimize across things they otherwise couldn't optimize across. Self is very, very fast for a dynamically-typed language.

(Self is like Smalltalk, only more so. It doesn't even have classes, as Smalltalk does; it uses prototypes instead. The implementation has an analogue of classes to get back the efficiency advantages. See Chambers & Ungar, "Customization: optimizing compiler technology for Self, a dynamically-typed object-oriented language," Proc. SIGPLAN '89, pp. 146-160.)
It seems likely that some of the stuff from the APL 3000 compiler could also be generalized to do a good job optimizing operations on parameterized types and homogeneous collection types in a strongly-typed polymorphic language like Modula-3 or Eiffel.

-- Paul
--
Paul R. Wilson
Software Systems Laboratory                 lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)     wilson@bert.eecs.uic.edu
Box 4348 Chicago, IL 60680
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/03/91)
In article <TOM.91Feb27071247@hcx2.ssd.csd.harris.com> tom@ssd.csd.harris.com (Tom Horsley) writes:
> Can
> you spare 10 hours to make your application run on 10 machines?

Oh, great, and when a library routine has a dozen things like this, and a thousand people are using the library, you can expect that a few hundred of them are using architectures that I've never seen. You want each of them to spend dozens of hours just to make a library run fast? You want the library to get slower and slower as the years go by and as architectures change? You want to have a dozen ten-way #ifdefs that every user has to worry about?

Contrast this with the situation I'm proposing. The compiler writers on those 10 machines with DREM make sure to recognize and document *some* portable idiom for the instruction. I don't have to learn any assembly languages; I just have to copy the routines straight out of the manuals. Since the language has a pick-this-or-that construct, I don't need *any* #ifdefs. Since there are only so many obvious idioms for a given operation, my code has a good chance of being perfectly efficient on an architecture I've never seen. The users don't have to worry about a thing. Even better, as time goes by, the code will get faster: compiler writers will recognize more idioms, the idioms will slowly become standardized, and the code will work efficiently without change.

---Dan
dik@cwi.nl (Dik T. Winter) (03/04/91)
Oh dear, again... In article <23481:Mar311:05:3591@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > Contrast this with the situation I'm proposing. The compiler writers on > those 10 machines with DREM make sure to recognize and document *some* > portable idiom for the instruction. I don't have to learn any assembly > languages; I just have to copy the routines straight out of the manuals. > Since the language has a pick-this-or-that construct, I don't need *any* > #ifdefs. Since there are only so many obvious idioms for a given > operation, my code has a good chance of being perfectly efficient on an > architecture I've never seen. The users don't have to worry about a > thing. Even better, as time goes by, the code will get faster: compiler > writers will recognize more idioms, the idioms will slowly become > standardized, and the code will work efficiently without change. > Compiler writers are very inventive in the idioms they recognize. Do not expect two compiler writers to support the same idiom. And even then... Peter Montgomery and Bob Silverman request two slightly different routines to do (in essence) the same operation. We have the BigNum package from INRIA/DEC, which requires a different operation again, and next we have Henry Cohen from Bordeaux with yet another set of requirements. And I am doing some stuff in this field myself, and do something different again. If the users apparently do not have a common view, how do you think the compiler writers will come to one? In view of this, I think it is not important to standardize on an interface to a DREM instruction (should it operate on unsigned only, on signed only, or should both options be present? all occur in existing hardware); it is more important to standardize on an mp library. (Yes, I am working on such a thing, but at this stage my package is far from complete, although it has been ported to some 40 platforms for fixed-size integers.) 
More about this in my follow-up to John Mashey. -- dik t. winter, cwi, amsterdam, nederland dik@cwi.nl
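The primitive that every mp package in this discussion ultimately rests on -- a multiply that yields the full double-width product -- also has a portable spelling. A hypothetical C sketch (it uses fixed-width types in the later-C99 style and is not taken from any of the packages named): promote to the wider type, multiply once, and split the product.

```c
#include <stdint.h>

/* Hypothetical portable idiom for a full-width multiply, the
 * building block of multiple-precision arithmetic.  The cast makes
 * the multiplication a single widening multiply; a compiler can map
 * it to one hardware instruction that delivers both halves of the
 * product, where such an instruction exists. */
void mul32_full(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint64_t p = (uint64_t)a * b;   /* one 32x32 -> 64 multiply */
    *hi = (uint32_t)(p >> 32);      /* high half of the product */
    *lo = (uint32_t)p;              /* low half of the product */
}
```

An mp library's inner loops then become repeated calls to this plus carry propagation, which is why the interface disagreements Dik describes all orbit this one operation.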
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/06/91)
In article <3061@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes: > Compiler writers are very inventive in the idiom they recognize. Do not > expect two compiler writers to support the same idiom. And even then... > Peter Montgomery and Bob Silverman do request two slightly different > routines, in order to do (in essence) the same operation. True. Since few compiler writers for the VAX have made the effort to provide documented DREM support, nobody agrees on idioms yet, and we all have different ideas about what good idioms would be. So what? There doesn't have to be any agreement on idioms for my plan to work. That's why it's such a crucial element of the plan that the idioms be written portably (and why it's helpful to have a pick-this-or-that control structure). At first, instead of learning five different assembly languages and figuring out how to write inlined assembly code under five different compilers, the programmer will copy five routines straight out of manuals. Even this reduction in effort would be well worth the compiler writer's time, but that's not the only advantage. As I explained in my previous article, as time goes by, idioms for a certain instruction can only become more standardized, and programs written for those idioms can only become faster. Everything will be perfectly portable to start. > Users do apparently not have a common view, how do you think > the compiler writers come to a common view? I don't, though I imagine that standards will spring up eventually. My whole point is that we should be able to write portable, efficient code *now*, even with *no* agreement on standards. All it takes is a bit of effort on the part of each compiler writer. > it is more important to > standardize on a mp library. Is there anyone else here who understands why suggestions like this are so pointless? I can't wait for the language designers to catch up and force everyone to provide an mp library. 
I think it's fine that you're writing libraries, and I fully support any efforts to standardize libraries. But standardization is SLOW. There is a much larger problem here, namely all the chip instructions that *aren't* supported by languages or other standards. Why do you refuse to admit that the programmer should be able to take advantage of those instructions without sacrificing portability? ---Dan
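Population count (Hamming weight) is the standing example in this thread of an instruction no language standard exposes. A portable routine that a compiler writer could choose to document and pattern-match might look like the following hypothetical sketch (the function name is invented; no compiler is claimed to recognize this exact form):

```c
#include <stdint.h>

/* Hypothetical portable Hamming-weight idiom.  Each iteration
 * clears the lowest set bit, so the loop runs once per one-bit.
 * A compiler that recognizes the pattern could emit a single
 * population-count instruction on hardware that has one; elsewhere
 * the loop is still correct, just slower. */
unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```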
msp33327@uxa.cso.uiuc.edu (Michael S. Pereckas) (03/06/91)
In <4267:Mar602:32:0591@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >In article <3061@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes: >> it is more important to >> standardize on a mp library. >Is there anyone else here who understands why suggestions like this are >so pointless? I can't wait for the language designers to catch up and >force everyone to provide an mp library. I think it's fine that you're >writing libraries, and I fully support any efforts to standardize >libraries. But standardization is SLOW. There is a much larger problem >here, namely all the chip instructions that *aren't* supported by >languages or other standards. Why do you refuse to admit that the >programmer should be able to take advantage of those instructions >without sacrificing portability? Do you really suppose that your idiom recognition with your pick-this-or-that and adequate documentation will become at all widespread sooner than a new library? Or even that it could be implemented faster if everyone wanted to? If the language doesn't do what you want it to, is the hack you're describing really the way to fix it? Even if it is workable, and people keep telling you that it isn't, it sure isn't elegant. It's not that we don't think you should be able to access machine instructions that can speed up your code, it's that we think that the method you describe for doing so is bad and impractical. And that it won't come any faster than a standard library. \begin{:-)} Besides, it isn't ``object oriented''. It is ``turbo,'' but turbo went out a few years ago. ``Hyper'' still has a bit of life left. If you can find a way to call it hyper, maybe you can sell it. Name it ``fuzzy'' and it will be a hit in Japan. At least this year. \end{:-)} -- Michael Pereckas * InterNet: m-pereckas@uiuc.edu * just another student... (CI$: 72311,3246) Jargon Dept.: Decoupled Architecture---sounds like the aftermath of a tornado
lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) (03/07/91)
In article <8787@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes: >brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >>In the article you responded to, David, I proposed a language construct >>to let the compiler choose one of several sections of code. : : >Dan, you're missing the forest for the trees, several times over. >Even if the general idea that you propose were worthwhile, your scheme >for accessing special-purpose assembly language instructions from >portable code is too complicated. It is gratuitously difficult for >both the compiler writer and the programmer. Compare: This may be true, but I question it. Vectorizing compilers have been around in usable form for about a decade. Before that, some compilers recognized certain loops and variations to emit specific code sequences (e.g. "Stacklib" type loops). (In unusable form, they have been around for two decades :-) ) Note that vectorizing compilers DO recognize specific multiple-statement language constructs and, sometimes, optimize these constructs down to a simple code sequence, or, in some cases, even a single instruction. >Dan's method: : >compiler writer writes recognizer for special code sequences. >compiler writer documents these code sequences. >programmer reads the code sequences. >programmer writes these code sequences. This is done every day with vectorizing compilers. Programmers write specific code sequences so that the vectorizer will recognize them. The compiler documentation, or, sometimes, a separate document, tells programmers exactly how to write code so that it is recognized by the vectorizer. Sometimes, they even write very, very specific sequences that several different vectorizers from several different vendors will recognize. Of course, now parallelizing compilers are all the rage. >Simpler method: > >compiler writer writes library of inlineable code. >compiler writer documents names of routines that are in this library. >programmer uses those names. 
>if you insist, compiler writer writes portable versions of these > subroutines, though programmer would have had to do it anyway in > your method (working from documentation). This method is also useful and important. Why do they have to be mutually exclusive? BTW, I do insist that portable versions be available. Note that Cray Research provides documentation of equivalent code for many of its scientific library subroutines. In other words, both methods have long been done, and are not news. What Dan is proposing is that scalar-oriented machines could benefit from the same technique that vector machine users have used all along. I think there probably *are* cases which could benefit from his technique, even on small workstations. In fact, as I understand it, some compilers for RISC micros *do* already recognize, for example, special code sequences that really mean: integer divide, with the remainder as well. There are probably other cases: for example, code sequences that do multiple-precision arithmetic and could take advantage of a double-precision multiply result (anyone remember the CDC 66xx/Cyber/etc multiply?) : >try to come up with ways to make it better. Somehow, nothing like >your idea even made it onto the list, let alone near the top. Of course, this is incorrect for vector machines. For scalar machines, the performance payoff is usually a lot less. But not always trivial. And, as I mentioned above, a few compilers have a few such cases. I have seen factor-of-two increases in performance with older compilers by recognizing special code sequences and doing the optimal thing. This sort of thing is a lot more common in numerical simulation codes. Running typical kernel code, or pointer-intensive stuff, the potential for payoff is very small with today's much better optimizers. 
And, since some readers of this group don't want to talk about any other kind of applications, it is no wonder that the perceived benefit is so small :-) Hugh LaMaster, M/S 233-9, UUCP: ames!lamaster NASA Ames Research Center Internet: lamaster@ames.arc.nasa.gov Moffett Field, CA 94035 With Good Mailer: lamaster@george.arc.nasa.gov Phone: 415/604-6117 #include <std.disclaimer>
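The vectorizer-directed coding style LaMaster describes amounts to writing loops in the one shape every vendor's recognizer handles: stride-1 accesses with no dependence between iterations. A hypothetical C sketch of the canonical form (vendors of the day documented the Fortran equivalent; the function name is invented):

```c
/* The canonical vectorizable loop shape: stride-1 array accesses
 * and no cross-iteration dependence.  Vendor manuals told
 * programmers to write exactly this (and to keep function calls,
 * I/O, and backward dependences out of the loop body) so the
 * vectorizer could emit vector loads, a vector multiply-add, and
 * vector stores. */
void saxpy(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; i++)
        y[i] = y[i] + a * x[i];
}
```

A vectorizer that cannot prove x and y do not overlap may still refuse the transformation, which is precisely why the manuals also documented aliasing rules and compiler directives.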
lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) (03/07/91)
In article <1991Feb28.181948.26958@rice.edu> preston@ariel.rice.edu (Preston Briggs) writes: : >> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: : >>>I found out from e-mail that essentially the same ideas >>>have been published before, in many separate papers. : >>References, please. I'd love to be enlightened. > They defined an intermediate language (IL). >Unfortunately, the project apparently died after a few years. : >Personally, I was frightened by the complexity of their >intermediate language. I also wonder how to control >the huge potential inlining. Can compile-times be controlled? I would be curious to know if any of the ILs for vectorizing compilers, commercial or not, have been published. I note that several companies, including Cray and CDC, claim to have vectorized ILs with common code generators for multiple languages (C, Fortran, et al.) I believe CDC has been doing this for about 4 years on the Cyber 800/900 series running their own O/S. Cray has done it more recently with their Fortran and C compilers. It seems that vectorization is generally viewed as a front-end, language-dependent process, contrary to what you might expect. >how his machine works! Similarly, you aren't going to win arguments >with David (or Ben) Chase about how compilers work. Maybe not. (It wasn't my argument to begin with - pardon my butting in.) But I think that there is more to this than is first apparent. P.S. I think there is, indeed, a good case for multiple assignment. *Some* of the funny optimizations that compilers now have to deal with are an unnatural artifact of languages with only a single assignment operator. > The extension many people are arguing for: > > c, d = a divrem b > > is much more significant (at least to languages like C or Fortran). > Basically, you've suddenly defined a new language with significantly > different syntax. 
The front-end of the compiler will probably Hugh LaMaster, M/S 233-9, UUCP: ames!lamaster NASA Ames Research Center Internet: lamaster@ames.arc.nasa.gov Moffett Field, CA 94035 With Good Mailer: lamaster@george.arc.nasa.gov Phone: 415/604-6117 #include <std.disclaimer>
peter@ficc.ferranti.com (Peter da Silva) (03/07/91)
In article <4267:Mar602:32:0591@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > At first, instead of learning five different assembly languages and > figuring out how to write inlined assembly code under five different > compilers, the programmer will copy five routines straight out of > manuals. How does this differ from copying 5 inline assembly examples straight out of 5 manuals, except that under your plan it's less obvious that the code is optimised for one particular compiler? > as time goes by, idioms for a certain instruction can > only become more standardized, Right. Unless there is a definite conscious effort to do so, things in general don't move in that direction. Just look at UNIX. Or code formatting in C. > Is there anyone else here who understands why suggestions like this are > so pointless? I can't wait for the language designers to catch up and > force everyone to provide an mp library. But you think it's OK to wait for them to catch up and standardise a bunch of poorly-defined idioms? > Why do you refuse to admit that the > programmer should be able to take advantage of those instructions > without sacrificing portability?

#ifdef frobco
	asm("...");
#else
	count_ones(...);
#endif

-- Peter da Silva. `-_-' peter@ferranti.com +1 713 274 5180. 'U` "Have you hugged your wolf today?"
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/09/91)
In article <VDY9J-C@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes: > In article <4267:Mar602:32:0591@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > > At first, instead of learning five different assembly languages and > > figuring out how to write inlined assembly code under five different > > compilers, the programmer will copy five routines straight out of > > manuals. > How does this differ from copying 5 inline assembly examples straight out > of 5 manuals, except that under your plan it's less obvious that the code is > optimised for one particular compiler? It differs in three important ways. First, the code will always remain portable. In your world, the vendors promise to have those library routines always do the same thing; otherwise they will break correct code. In my world, there is no chance of this. Second, the extra code will remain useful, even if the vendors go out of business or the architectures die. In your world, someone might have written code under #ifdef Z80; now that code will be almost completely useless. In my world, the idioms are not hooked to one vendor or architecture, and will make the code faster under any compiler recognizing the same idiom. Third, the #ifdefs can be dispensed with entirely. In your world, there is no way to get rid of the #ifdefs, because each compiler will blow up on the unportable library routines written for the others. In my world, if the language specifies a pick-this-or-that control structure, you can just throw away the #ifdefs and the code will work on any machine. It will also be more resilient to changes in architecture or vendor support. > > as time goes by, idioms for a certain instruction can > > only become more standardized, > Right. Unless there is a definite conscious effort to do so, things in general > don't move in that direction. Just look at UNIX. Yes, let's look at UNIX. 
Within several years, most machines will support (for example) POSIX termio, and code written for that termio will be extremely portable. I think you're confusing the standardization of *all* system facilities with the standardization of *one* system facility. BSD will always be a very different system from System V, but their common base gradually grows larger and larger. Similarly, although compilers will always support many different idioms, more and more particular idioms will become standardized. A sufficiently standardized idiom can even be added to the language. > > Is there anyone else here who understands why suggestions like this are > > so pointless? I can't wait for the language designers to catch up and > > force everyone to provide an mp library. > But you think it's OK to wait for them to catch up and standardise a bunch > of poorly-defined idioms? No, I don't. That's why my proposal works with absolutely no standardization of the idioms. Whether the idioms gradually become more standard or not, they will have the three advantages I outlined above. ---Dan
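A concrete candidate for the kind of idiom being argued over here is bit rotation: C has no rotate operator, but there is a well-known portable spelling that a pattern-matching compiler can turn into a single rotate instruction. A hypothetical sketch (the masks keep both shift counts in the range 0..31, so neither shift is undefined even when n is 0):

```c
#include <stdint.h>

/* Hypothetical portable left-rotate idiom.  The "& 31" masks keep
 * both shift amounts legal for a 32-bit type, so the expression is
 * well-defined for every n; a compiler that recognizes the pattern
 * can compile the whole body to one rotate instruction. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << (n & 31)) | (x >> (-n & 31));
}
```

The expression is exactly the sort Dan describes: portable everywhere, documented once, and no worse than two shifts and an OR on a machine with no rotate at all.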