firth@sei.cmu.edu (Robert Firth) (09/03/86)
To respond to some issues that have been raised. First, I don't believe the Assembler should reorganise code. The purpose of assembler code is to allow the user total control over what the machine will do, and the assembler has no business taking away any of that control. Moreover, why would you WANT the assembler to reorganise? Presumably you are writing in assembler to do things the compiler can't, such as access interrupts, devices &c, in which case you probably need tight control. If you are writing in assembler because you don't trust the optimising compiler, well, would you trust an optimising assembler? Filling the hole after a delayed branch, and similar transformations, are done at a late stage in the compilation process, because it is only at that late stage that the program representation has branch instructions in it. While being compiled, a program usually goes through several intermediate transformations, gradually becoming more and more like machine code. What starts as a Pascal WHILE loop becomes a tree node with the key token "while", then maybe a flowgraph node with two entries and one exit, then maybe a code node with "branch false" token. Only at the last stage is it possible to do instruction reordering. Unfortunately, this way of building compilers misses a lot of optimisations. To reuse a previous example, filling the hole by loop rotation may be impossible, because at the stage where the compiler knows about branches, it has forgotten about WHILE statements, and must laboriously reanalyse the flow by scanning the linear machine code. The lesson here is that one can get good code with successive optimisations based on partial information and heuristics, but to get truly "optimal" code one must never throw information away. Hence, the program representation must contain the fact that there is a branch, and SIMULTANEOUSLY the fact that this branch is the branch back at the end of a pure WHILE loop.
mash@mips.UUCP (John Mashey) (09/04/86)
In article <304@sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes as shown below, generally saying that a reorganizing assembler is a bad idea. First, I give some data about use of assembler at MIPS, then repond to the specific comments: SOME DATA: the 4.3BSD version of the MIPS UNIX kernel has 10105 lines of assembler, of which 5944 are IEEE floating-point emulation (for systems without fltpt hardware), and the remaining 4161 are the traditional "machine" directory routines like locore.s, etc. The 4161 equates to the VAX's 3122 lines; our coding style is less dense and has more comments. Thus, to first approximation, there is a similar amount of assembler code. Of the 10105 lines of code, about 800 (i.e., 8%) are done ".noreorder", i.e., "leave this code alone". Interestingly, the FP emulation never uses .noreorder, so of the code that uses it, 800/4161, or about 20% is .noreorder. Outside the kernel, assembler is used for the usual things. I didn't gather the stats, but I'd say there's a typical amount of assembler for a UNIX port. Diagnostics, especially AVTs (Architectural Verification Tests) often use assembler, and they use .noreorder quite heavily. Now, for comments: >To respond to some issues that have been raised. > >First, I don't believe the Assembler should >reorganise code. The purpose of assembler code >is to allow the user total control over what >the machine will do, and the assembler has no business >taking away any of that control. No. Purposes of assembler also include: a) Writing things that cannot be expressed otherwise. b) Getting the utmost speed for truly time-critical routines. The data above speaks for itself: most of our code doesn't need to turn reordering off. The comment above is like aying that one should write in machine languauge, and not let an assembler do "nasty" things like assigning addresses to symbols, or Span-Dependent-Instruction generation. > >Moreover, why would you WANT the assembler to >reorganise? Presumably you are writing in >assembler to do things the compiler can't, such >as access interrupts, devices &c, in which case you >probably need tight control. If you are writing >in assembler because you don't trust the optimising >compiler, well, would you trust an optimising >assembler? I WANT it to reorganize our code (except when we tell it not to) because: a) it runs faster. b) It suppresses a level of detail that we don't need to be bothered with all of the time. c) Sometimes it does optimizations that would be very tricky to code and maintain, i.e., when it fills a delay slot by grabbing the branch's target instruction into the delay slot and then branching 1 instruction beyond the original target. d) If the assembler doesn't do reorganization, EVERY compiler must do it. If customers want to move other compilers to our system, they can either generate U-code (intermediate code) as input to the optimizer, or assembler as input to the assembler. If the natural choice is to generate assembler, and if the assembler doesn't reorganize, then these optimizations are lost, unless the compiler is made to do it, which is usually not desirable. I.e., people have compilers with simple back-end code generation that absolutely don't want to know much about specific machien properties. People write assembler, when necessary, for the reasons above, NOT because they don't trust optimizing compilers. In our case, the assembler does all sorts of things, like generating optimal sequences for multiplies. (Is this useful? Well, I suppose that if you want multiply by 7, you could turn that into shift-left-3 & subtract yourself, but that's ugly when you have: mul $1,SIZE where SIZE is a #define shared with C programs.) > >Filling the hole after a delayed branch, and similar >transformations, are done at a late stage in the >compilation process, because it is only at that >late stage that the program representation has >branch instructions in it. While being compiled, >a program usually goes through several intermediate >transformations, gradually becoming more and more >like machine code.... Only at the last stage is it possible to >do instruction reordering. Yes. Those who do reordering in the compiler per se do it at the end. > >Unfortunately, this way of building compilers misses >a lot of optimisations. To reuse a previous example, >filling the hole by loop rotation may be impossible, >because at the stage where the compiler knows about >branches, it has forgotten about WHILE statements, >and must laboriously reanalyse the flow by scanning the ----------^^^^^^^^^^^ it's not that laborious. >linear machine code. The lesson here is that one can >get good code with successive optimisations based on >partial information and heuristics, but to get truly >"optimal" code one must never throw information away. >Hence, the program representation must contain the >fact that there is a branch, and SIMULTANEOUSLY the >fact that this branch is the branch back at the >end of a pure WHILE loop. There is some truth here, in the sense that generating truly optimal code means that some compiler phase knows "everything", and does everything at once, sort of. Unfortunately, such compilers are hard to build and get right, hence, most people don't. As it turns out, there are certainly interactions amongst optimizer/code generator/reorganizer that one must design carefully, in the same way that people design code generators knowing that there will (or will not) be a peephole optimizer tweaking the generated code. Good optimizing compilers are hard, and as usual, a reasonable engineering solution is to divide up the complexity into manageable hunks. Although I can't offer hard data, I have examined at least 10K lines of disassembled code (usually of UNIX kernels) generated this way: a) There are a few tweaks that can be done to improve the code, where interactions of optimizer, code generator, and reorganizer could be tuned up. b) There isn't much in a), however. There's maybe 2-4% improvement, at best (in terms of code space, anyway), that I could identify. That doesn't mean there isn't a bunch that I couldn't find, but I did look pretty carefully. c) I've looked very hard at pices of code that contained higher than usual fractions of unfillable delay slots. In most cases, the code was already as good as could reasonably be gotten, i.e., the nops were intrinsic to the structure of the code, and doing better would require mind-reading compilers. This discussion seems to be caught in an infinite loop: a) reorganizing assemblers are claimed to be unworkable, a Bad Idea, or otherwise awful. b) Somebody who actually uses one (me!) cites data and experience that says otherwise. A while later, a) happens again, and somebody else tells me that this kind of software doesn't work, which contradicts what I see. Maybe my comments are getting lost in the net, or maybe I'm just not communicating them properly, but it's getting depressingly repetitive. To move this discussion along, it would help to get more data from people who actually use reorganizers, whether implemented in compiler or assembler. For example, I'd love to hear from Spectrum folks, for example. -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD: 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
henry@utzoo.UUCP (Henry Spencer) (09/07/86)
> First, I don't believe the Assembler should > reorganise code. The purpose of assembler code > is to allow the user total control over what > the machine will do... John Mashey has already rebutted this in some detail. I'd like to add one thing: I think we have an unrecognized cultural difference here. The Unix community assumes that compilers generate assembler, while a lot of other people assume the compiler goes direct to binary. Reorganization at the assembler level makes rather more sense when you realize that *all* code goes through the assembler. I.e., most of the code going into the assembler was *not* written directly by humans. -- Henry Spencer @ U of Toronto Zoology {allegra,ihnp4,decvax,pyramid}!utzoo!henry