radford@calgary.UUCP (08/11/86)
Something I've wondered about: On RISC machines with delayed branches, who does the instruction rearrangement that tries to put useful instructions after the branch? Obvious possibilities are the assembler or the compiler. Does anyone know whether the compiler can do a significantly better job than the assembler? (I'm assuming the compiler generates assembler code.) Radford Neal The University of Calgary
mash@mips.UUCP (John Mashey) (08/12/86)
In article <299@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes: >On RISC machines with delayed branches, who does the instruction >rearrangement that tries to put useful instructions after the branch? >Obvious possibilities are the assembler or the compiler. Does anyone >know whether the compiler can do a significantly better job than >the assembler? (I'm assuming the compiler generates assembler code.) 1) Ours is in the assembler, and I think most others are, also. I haven't seen one that was in the compiler; maybe others have and would say so. 2) Our assembler does a lot of code selection itself to make life saner for the compilers, i.e., load immediate reg,value can be 1 or 2 instructions, and multiply by constant yields all sorts of sequences. Pipeline reorganization naturally happens after such expansions. 3) It's possible that compilers might do it better; however, I don't see much evidence of this [and I've looked at many 1000s of lines of dis-assembled code]: Either: a) The reorganizer already fills the branch delay slot. b) The branch delay is there, and there's nothing anyone could conceivably do about it. c) A modest smartening of the reorganizer would get rid of it, perhaps with also passing a little more information from earlier passes. Most are in a) or b), and my gut feeling [no data] is that compilers can't get much of c) that the assembler can't. -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD: 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
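[Archive note: Mashey's case (a), the reorganizer filling the branch delay slot, can be sketched as a small peephole pass. This is a modern illustrative toy in Python, not MIPS's actual reorganizer; the instruction records and the dependence test are invented for the example:]

```python
# Toy delay-slot filler: move the instruction preceding a branch into
# the slot after it, when that instruction does not feed the branch.
# Instruction encoding here is invented for illustration.

def fills_slot_safely(instr, branch):
    """An instruction may move into the delay slot if the branch
    does not read the register the instruction writes."""
    return instr["dest"] not in branch["reads"]

def fill_delay_slots(code):
    out = list(code)
    i = 1
    while i < len(out):
        if out[i]["op"] == "branch":
            prev = out[i - 1]
            if prev["op"] != "branch" and fills_slot_safely(prev, out[i]):
                # swap: the predecessor becomes the delay-slot instruction
                out[i - 1], out[i] = out[i], out[i - 1]
                i += 1  # skip past the now-filled slot
            else:
                # nothing safe to move: pad the slot with a no-op
                out.insert(i + 1, {"op": "nop", "dest": None, "reads": []})
                i += 1
        i += 1
    return out

code = [
    {"op": "add", "dest": "r3", "reads": ["r1", "r2"]},
    {"op": "branch", "dest": None, "reads": ["r4"]},
]
print([ins["op"] for ins in fill_delay_slots(code)])  # ['branch', 'add']
```

When the preceding instruction computes the value the branch tests, the sketch falls back to a no-op, which is Mashey's case (b): the delay is there and nothing can be done about it.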
steve@jplgodo.UUCP (Steve Schlaifer x43171 301/167) (08/13/86)
In article <299@vaxb.calgary.UUCP>, radford@calgary.UUCP (Radford Neal) writes: > Something I've wondered about: > > On RISC machines with delayed branches, who does the instruction > rearrangement that tries to put useful instructions after the branch? > > Obvious possibilities are the assembler or the compiler. Does anyone > know whether the compiler can do a significantly better job than > the assembler? (I'm assuming the compiler generates assembler code.) An optimizing assembler? What a disaster that would be. I want assemblers to leave my code exactly as I wrote it. Compilers on the other hand are translating from high to low level languages, and have a long tradition of performing all kinds of optimizations on the final results. Moving code to after delayed branches is just another kind of optimization. To answer your question, the compiler is the only place where such optimizations can reasonably be done. -- ...smeagol\ Steve Schlaifer ......wlbr->!jplgodo!steve Advance Projects Group, Jet Propulsion Labs ....logico/ 4800 Oak Grove Drive, M/S 156/204 Pasadena, California, 91109 +1 818 354 3171
quiroz@rochester.ARPA (Cesar Quiroz) (08/14/86)
>> In article <299@vaxb.calgary.UUCP>, radford@calgary.UUCP (Radford >> Neal) writes: >> > Something I've wondered about: >> > >> > On RISC machines with delayed branches, who does the instruction >> > rearrangement that tries to put useful instructions after the branch? >> > To which steve@jplgodo.UUCP replies: >> An optimizing assembler? What a disaster that would be. I want >> assemblers to leave my code exactly as I wrote it. This position assumes that there is a point in writing code at the lowest levels of a RISC architecture, and that strict adherence to the letter of such code is desirable. Although both points have some strength when applied to standard machines, they both get weaker in more complicated cases (for instance, RISCs or machines with strange synchronization requirements). The assemblers for those machines are designed mainly as postprocessors for compiler output, so they are part of the compilation process (in Unix terms, they let you put your layout for a.out in a single program, not in each compiler). Seen from this perspective, it is not outrageous that those assemblers take liberties (which might be an option anyway) with their input. On a slightly different point, there is precedent for assemblers helping in opcode selection. Span-dependent jumps and calls, for instance, can be selected by the assembler, so you don't have to keep a running location counter as you code; you just say 'j-or-b foo' and the assembler decides on a jump or a branch, depending on how far away foo ends up being. If you are forced to write code in those assemblers AND YOU WANT NO-OPS in the delayed branches, I guess you have the right to insist on keeping that as an option.
Same goes for SIMDs with software control of the pipeline interlocks, microprogrammable graphics systems, and what-have-you: if the assembler level contains any weirdness that you might not want to see, let the assembler do some clean-up, but keep straight code production as an option. -- Cesar Augusto Quiroz Gonzalez Department of Computer Science {allegra|seismo}!rochester!quiroz University of Rochester or Rochester, NY 14627 quiroz@ROCHESTER
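[Archive note: the span-dependent jump selection Quiroz mentions ("j-or-b foo") is the classic branch-relaxation fixpoint: widening one jump can push another target out of range, so the assembler iterates until sizes stabilize. A hedged Python sketch follows; the instruction sizes and the short-branch range are made up for illustration, and real assemblers of course work on actual machine encodings:]

```python
# Toy branch relaxation: start every jump short, then widen any jump
# whose target lies outside the short range; repeat until no change,
# since widening shifts later addresses. Sizes/ranges are invented.

SHORT_RANGE = 127           # hypothetical reach of a short branch, in bytes
SHORT_SIZE, LONG_SIZE = 2, 4

def relax(instrs, labels_at):
    """instrs: list of ('op',) or ('jmp', label); labels_at: label -> index."""
    sizes = [SHORT_SIZE if op == "jmp" else 1 for op, *_ in instrs]
    changed = True
    while changed:
        changed = False
        addr = [0]                      # byte address of each instruction
        for s in sizes:
            addr.append(addr[-1] + s)
        for i, (op, *rest) in enumerate(instrs):
            if op == "jmp" and sizes[i] == SHORT_SIZE:
                # displacement measured from the following instruction
                dist = addr[labels_at[rest[0]]] - addr[i + 1]
                if abs(dist) > SHORT_RANGE:
                    sizes[i] = LONG_SIZE
                    changed = True
    return sizes

# a jump over 200 one-byte instructions must be long; a jump over one stays short
print(relax([("jmp", "end")] + [("op",)] * 200, {"end": 201})[0])  # 4
print(relax([("jmp", "end"), ("op",)], {"end": 2})[0])             # 2
```

The fixpoint iteration is why the programmer can write 'j-or-b foo' and forget about a running location counter: the assembler re-derives all addresses on each pass.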
josh@polaris.UUCP (Josh Knight) (08/15/86)
In article <612@mips.UUCP> mash@mips.UUCP (John Mashey) writes: >In article <299@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes: >>On RISC machines with delayed branches, who does the instruction >>rearrangement that tries to put useful instructions after the branch? >>Obvious possibilities are the assembler or the compiler. Does anyone >>know whether the compiler can do a significantly better job than >>the assembler? (I'm assuming the compiler generates assembler code.) > >1) Ours is in the assembler, and I think most others are, also. I haven't >seen one that was in the compiler; maybe others have and would say so. I don't believe the PL.8 compiler for the 801 (or for the ROMP) generates assembly language. I.e. any code movement to take advantage of delayed branches is done by the compiler. -- Josh Knight, IBM T.J. Watson Research josh@ibm.com, josh@yktvmh.bitnet, ...!philabs!polaris!josh
patc@tekcrl.UUCP (Pat Caudill) (08/15/86)
Although I usually want an assembler to assemble what I coded, there is precedent for assemblers doing optimization: besides long vs. short jump selection, there is opcode selection for immediate, short, or long offsets, and macros. I believe SOAP (Symbolic Optimizing Assembly Program) did instruction scheduling for a machine for which this was very important and also very difficult. Code for the machine was mainly in assembly language. The machine is now somewhat dated, however. Pat Caudill Tektronix!tekcrl!patc.UUCP
johnl@ima.UUCP (John R. Levine) (08/15/86)
In the AIX C compiler for the RT PC, we handled the delayed branches in the peepholer for the compiler. I wrote a very vanilla assembler by mutating (a lot) the Sys III Vax assembler, so it did nothing more fancy than choosing instruction formats and handling long vs. short jumps. The peepholer seemed to be the best place because it could make assumptions about the kind of code emitted by the compiler. Doing the right thing was a little tricky because only about half of the RT's instructions change the condition code, and you had to look back in the instruction stream for one of those. -- John R. Levine, Javelin Software Corp., Cambridge MA +1 617 494 1400 { ihnp4 | decvax | cbosgd | harvard | yale }!ima!johnl, Levine@YALE.EDU The opinions expressed herein are solely those of a 12-year-old hacker who has broken into my account and not those of any person or organization.
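[Archive note: the backward scan Levine describes — finding which earlier instruction set the condition code the branch tests, since only some RT instructions touch it — can be sketched like this. A Python toy; the opcode names are invented, not real ROMP mnemonics:]

```python
# Toy condition-code analysis for delay-slot filling: the candidate
# moved past a conditional branch must come after the instruction
# that set the tested CC, and must not set the CC itself.
# Opcode names are hypothetical.

SETS_CC = {"cmp", "sub", "and"}   # invented set of CC-setting opcodes

def last_cc_setter(code, branch_index):
    """Index of the instruction whose CC result the branch tests."""
    for i in range(branch_index - 1, -1, -1):
        if code[i] in SETS_CC:
            return i
    return None

def slot_candidate(code, branch_index):
    """Nearest earlier instruction that can fill the delay slot:
    later than the CC setter, and CC-safe itself."""
    setter = last_cc_setter(code, branch_index)
    stop = setter if setter is not None else -1
    for i in range(branch_index - 1, stop, -1):
        if code[i] not in SETS_CC:
            return i
    return None

code = ["load", "cmp", "move", "bcond"]
print(slot_candidate(code, 3))  # 2: "move" sits after the cmp and is CC-safe
```

If the only instruction between the CC setter and the branch itself sets the CC, the scan returns nothing and a no-op must pad the slot, which is exactly the trickiness Levine points at.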
mash@mips.UUCP (John Mashey) (08/16/86)
In article <823@jplgodo.UUCP> steve@jplgodo.UUCP (Steve Schlaifer x43171 301/167) writes: >An optimizing assembler? What a disaster that would be. I want assemblers to >leave my code exactly as I wrote it.... > >To answer your question, the compiler is the only place where such >optimizations can reasonably be done. 1) This is an example of an authoritatively-stated non-fact, a thing seen all too often on the net. A better way to express this opinion might be: "It is my opinion that compilers are the place to do this. Does anybody know of real counterexamples?" 2) The MIPS assembler does this all the time, and it works fine. 3) There are occasional cases where you do want to turn it off: for example, architectural verification tests often want to generate very precisely-chosen code. -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD: 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
dms@mit-hermes.ARPA (David M. Siegel) (08/16/86)
From: steve@jplgodo.UUCP Newsgroups: net.arch Date: 13 Aug 86 16:45:20 GMT An optimizing assembler? What a disaster that would be. Actually, optimizing assemblers are fairly common, since they are very easy to write. If you have an existing compiler that doesn't optimize, and you need to boost performance a bit, fiddling directly with the assembler code output is the easiest thing to do. -- -Dave
mash@mips.UUCP (John Mashey) (08/17/86)
In article <187@ima.UUCP> johnl@ima.UUCP (John R. Levine) writes: >In the AIX C compiler for the RT PC, we handled the delayed branches in the >peepholer for the compiler. >The peepholer seemed to be the best place because it could make assumptions >about the kind of code emitted by the compiler. Doing the right thing was >a little tricky because only about half of the RT's instructions change the >condition code, and you had to look back in the instruction stream .... 1) It's good to see more data: there seems to be a consensus that reorganization is either in the assembler or very late (peephole time) in the compiler. 2) The condition code note is interesting: it's exactly this kind of thing that encouraged us to remove any trace of condition codes! -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD: 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
mash@mips.UUCP (John Mashey) (08/20/86)
In article <688@polaris.UUCP> josh@polaris.UUCP (Josh Knight) writes: ... >>>On RISC machines with delayed branches, who does the instruction >>>rearrangement that tries to put useful instructions after the branch? ... >> >>1) Ours is in the assembler, and I think most others are, also. I haven't >>seen one that was in the compiler; maybe others have and would say so. > >I don't believe the PL.8 compiler for the 801 (or for the ROMP) generates >assembly language. I.e. any code movement to take advantage of delayed >branches is done by the compiler. Thanx for the note. I reread the HP Spectrum papers and they seem to be in the compiler also, although, as far as I could tell, fairly late, i.e., peephole time. Ours passes binary between compiler and assembler, i.e., the "assembler" is really an ascii-binary conversion in front of the common assembler. Maybe the real issues are: a) When you write handcoded assembler, do you get the optimizations or not? b) Is the pipeline scheduling early in the compilation, or late? (Regardless of the implementation, most cases seem to be late, i.e., peephole time, whether inside the compiler or the assembler. Of course, careful code generation helps the pipeline scheduler by careful choices of registers and other things.) Now, does anyone have some instances where pipeline scheduling is done by, for example, the global optimization phase? -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD: 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
patcl@hammer.UUCP (Pat Clancy) (08/21/86)
In article <624@mips.UUCP> mash@mips.UUCP (John Mashey) writes: >In article <823@jplgodo.UUCP> steve@jplgodo.UUCP (Steve Schlaifer x43171 301/167) writes: >>An optimizing assembler? What a disaster that would be. I want assemblers to >>leave my code exactly as I wrote it.... >> >>To answer your question, the compiler is the only place where such >>optimizations can reasonably be done. > >2) The MIPS assembler does this all the time, and it works fine. > There's also a more fundamental issue involved in the choice of where to do reorganization; namely, the potential interdependence between register allocation and reorganization opportunities. This is especially apparent with global (e.g., coloring) allocation schemes. A particular choice of register assignment will determine (to some extent) how code may be rearranged to fill delay slots, handle interlocks, etc. Then, if the rearranged code were run once again through the register allocator, a better register assignment might be found (since the liveness ranges of values are now different). Then this output from the allocator might be run through the reorganizer again..., etc. This kind of feedback approach is impossible if register allocation and reorganization are done in different programs (compiler and assembler, respectively). Pat Clancy Tektronix
srm@iris.berkeley.edu (Richard Mateosian) (08/22/86)
>I believe the SOAP (Symbolic Optimizing Assembly Program) did instruction >scheduling. Actually not exactly. It never reordered instructions. On the IBM 650, the main memory was a 2000-word drum, and each instruction had the format op addr addr The first addr was the drum address of the operand, and the second was the drum address of the next instruction. (You thought linked lists were invented by software people?) SOAP let you omit specifying the second addr. Then by taking account of how far the drum would turn during execution of op, it found an optimal drum placement for the next instruction. Richard Mateosian ...ucbvax!ucbiris!srm 2919 Forest Avenue 415/540-7745 srm%ucbiris@Berkeley.EDU Berkeley, CA 94705
alverson@decwrl.DEC.COM (Robert Alverson) (08/22/86)
There is an aspect of this that I think people are missing. If you let the assembler take care of design-specific architectural features, then those features can be more easily changed in the future. I think the MIPS-X assembler also does the instruction scheduling. Of course, if all your programs are in a HLL, scheduling in the code generator seems as appropriate. There are some people, though, who feel that a very low level intermediate form (almost assembler) is desirable for its independence across a family of machines. Bob echo $DISCLAIMER
agr@vaxine.UUCP (Arnold Reinhold) (08/26/86)
In article <931@tekcrl.UUCP> patc@tekcrl.UUCP (Pat Caudill) writes: > > Although I usually want an assembler to assemble >what I coded, there is precedent for assemblers doing >optimization. Besides long/short jump, opcode selection >for immediate, short, long offset or macros. I believe the SOAP >(Symbolic Optimizing Assembly Program) did instruction >scheduling for a machine for which this was very important >and also very difficult. Code for the machine was mainly >in assembly language. The machine is now somewhat dated >however. > The IBM 650 was a drum memory machine with 2000 ten digit decimal words. Instructions consisted of a two digit op code, a four digit operand address and a four digit address for the next instruction. Optimization consisted of placing the next instruction so it would be just in front of one of the ten read heads when the previous instruction had finished. SOAP did this by a crude approximation of execution times; there wasn't room for a complete table. Real programmers wrote in machine language and optimized with the aid of an IBM preprinted chart of all memory locations so you could check them off as you used them. SOAP did not change the order of execution of instructions at all, however. For sanity, any assembler should have a what-you-wrote-is-what-you-get option. It would seem vital for the diagnostic programmer, if no one else. Arnold Reinhold Automatix Inc. vaxine!agr
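[Archive note: the drum placement Mateosian and Reinhold describe reduces to modular arithmetic on angular position. A toy Python model, with invented timing numbers — the real 650's word-time accounting, band layout, and head count were more involved than this:]

```python
# Toy model of SOAP-style drum optimization: treat a word's angular
# position as its address mod the band length. If the instruction at
# address A takes T word-times, the ideal next-instruction slot is
# the one arriving at the head just as execution finishes.
# All constants here are simplified for illustration.

WORDS_PER_BAND = 50

def best_next_angle(instr_addr, exec_word_times):
    """Angular slot (0-49) passing the head when execution ends."""
    return (instr_addr + exec_word_times + 1) % WORDS_PER_BAND

def place_next(instr_addr, exec_word_times, free):
    """Pick the free address with the least forward rotational
    latency from the ideal angle."""
    ideal = best_next_angle(instr_addr, exec_word_times)
    def latency(addr):
        return (addr % WORDS_PER_BAND - ideal) % WORDS_PER_BAND
    return min(free, key=latency)

# instruction at 0004 taking ~3 word-times: ideal angle is slot 8,
# so among free addresses 100-149 the best pick is 108
print(place_next(4, 3, free=[100 + a for a in range(50)]))  # 108
```

This also shows why SOAP never needed to reorder instructions, as Mateosian notes: with an explicit next-instruction address in every word, "scheduling" is purely a placement problem, not an ordering one.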