jlg@lanl.gov (Jim Giles) (07/08/89)
Every once in a while, someone in this newsgroup makes the claim
that RISC trades off hardware complexity for compiler complexity.
This is simply _NOT_ true.  It is always _easier_ to write a compiler
for a RISC machine than for a CISC machine.

Compiling is generally divided into two parts.  The "front end" parses
the input language and determines whether the program is syntactically
and semantically correct.  The "front end" then passes the program
along in some intermediate form.  The "back end" selects instructions
which implement the original program, allocates registers (and other
temporary resources), and outputs the code.  Some compilers combine the
"front" and "back end" parts, or divide labor in more unusual ways, but
this does not affect the validity of the following argument.

The "front end" processing of a compiler is identical whether the
target machine is a RISC or a CISC.  So, no trade-off of complexity can
occur there.

For a RISC machine, the only hard part of the "back end" is register
allocation.  Instruction selection is fairly simple since there is
generally only one way to perform each intermediate code operation.
On a pipelined machine, code ordering comes into play (at least if you
want optimized code).  This complicates matters, since a different code
ordering imposes different register allocation constraints.  For this
reason, optimizing can be difficult, even on a RISC machine.

On a CISC machine, instruction selection can be a real problem.  This
is because there are generally several instruction sequences possible
to implement each operation.  Instruction selection obviously affects
the register allocation phase (and the code ordering phase).  To make
matters worse, instruction selection is context sensitive.  This means
that different code ordering during optimization could invalidate the
instruction selection choice (or, at least make the selected
instructions sub-optimal).

There may indeed be a trade off between RISC and CISC.  But this trade
off is certainly _not_ in the complexity of the compiler.  In fact, as
shown above, the circumstance is quite the opposite.
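[Aside: a minimal sketch, not from Jim's post, of the kind of "back end"
instruction selection he is describing.  The intermediate opcodes,
register numbers, and mnemonics are all invented; the point is only that
with a one-to-one mapping from intermediate operations to target
instructions, selection is little more than a table lookup, whereas a
CISC selector would have to weigh several candidate sequences and
addressing modes for each operation.]

/* Hypothetical sketch: emitting target code from a tiny three-address
 * intermediate form in which each IR operation maps onto exactly one
 * instruction.  A CISC back end would instead need to choose among
 * several possible sequences (and addressing modes) per IR operation.
 */
#include <stdio.h>

typedef enum { IR_ADD, IR_SUB, IR_MUL } IrOp;

typedef struct {
    IrOp op;
    int  dst, src1, src2;      /* virtual register numbers */
} IrInsn;

static void select_risc(const IrInsn *code, int n)
{
    static const char *mnemonic[] = { "add", "sub", "mul" };
    for (int i = 0; i < n; i++)   /* one IR operation -> one instruction */
        printf("%-4s r%d, r%d, r%d\n",
               mnemonic[code[i].op], code[i].dst, code[i].src1, code[i].src2);
}

int main(void)
{
    /* t3 = t1 + t2 ; t4 = t3 * t5 */
    IrInsn block[] = { { IR_ADD, 3, 1, 2 }, { IR_MUL, 4, 3, 5 } };
    select_risc(block, 2);
    return 0;
}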
frazier@oahu.cs.ucla.edu (Greg Frazier) (07/08/89)
In article <13976@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>Every once in a while, someone in this newsgroup makes the claim
>that RISC trades off hardware complexity for compiler complexity.
>This is simply _NOT_ true.  It is always _easier_ to write a compiler
>for a RISC machine than for a CISC machine.
>
	[ deleted explanation of compiler front- and backend ]
>
>For a RISC machine, the only hard part of the "back end" is register
>allocation.  Instruction selection is fairly simple since there is
>generally only one way to perform each intermediate code operation.
>On a pipelined machine, code ordering comes into play (at least if
>you want optimized code).  This complicates matters, since a different
>code ordering imposes different register allocation constraints.
>For this reason, optimizing can be difficult, even on a RISC machine.
>
>On a CISC machine, instruction selection can be a real problem.  This
>is because there are generally several instruction sequences possible
>to implement each operation.  Instruction selection obviously
>affects the register allocation phase (and the code ordering phase).
>To make matters worse, instruction selection is context sensitive.
>This means that different code ordering during optimization could
>invalidate the instruction selection choice (or, at least make the
>selected instructions sub-optimal).
>
>There may indeed be a trade off between RISC and CISC.  But this
>trade off is certainly _not_ in the complexity of the compiler.
>In fact, as shown above, the circumstance is quite the opposite.

Actually, I believe you are wrong, Jim.  While code selection is easier
on a RISC, CISC compilers tend to avoid this by only using the simple
instructions.  On the other hand, RISCs require very good optimizers in
order to take advantage of their RISC-ness.  This is complex.  In
addition, most RISC architectures expose their pipelines, and hence
require the compiler to avoid interlocks, etc.  This is also complex.
On a related note, RISCs tend to have delayed branches, register
windows, etc., which the compiler must understand.  Finally, as you
pointed out, RISCs require sophisticated register allocation.

So, you are right, in that the reduced instruction set makes
instruction selection much simpler.  So I guess it is easier to write a
stupid compiler for a RISC.  However, because a generic RISC
architecture allows and requires a high level of optimization, its
compilers end up being very complex.

Greg Frazier
&&&&&&&&&&&&&&#########################))))))))))))))))))))
Greg Frazier                  o         Internet: frazier@CS.UCLA.EDU
CS dept., UCLA               /\         UUCP: ...!{ucbvax,rutgers}!ucla-cs!frazier
                         ----^/----
                             /
davet@oakhill.UUCP (David Trissel) (07/08/89)
In article <13976@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>Every once in a while, someone in this newsgroup makes the claim
>that RISC trades off hardware complexity for compiler complexity.
>This is simply _NOT_ true.  It is always _easier_ to write a compiler
>for a RISC machine than for a CISC machine.

False.  My group has produced essentially the same compiler for both
CISC (68K family) and RISC (88K family) and we've found the various
tradeoffs cancel each other out.  The amount of work is virtually the
same.  The code produced by both compilers is comparable.

>For a RISC machine, the only hard part of the "back end" is register
>allocation.

What about the required pairing of registers for double wide operations
such as floating-point or shifting?  That means the compiler has added
complexity in its register allocation scheme.  Never does the 68K
architecture require paired registers.

What about the lack of a 32-bit effective address?  This forces some
RISC systems to resort to Intel segment-like (UGH!!) approaches to
support access to the program's common data base.  Some RISCs don't
even have divide or multiply.  (The 88K does have them.)

For every CISC compiler hassle you come up with I'll give you a RISC
compiler hassle.  For every RISC compiler hassle I'll give you a CISC
compiler hassle.

>Instruction selection is fairly simple since there is
>generally only one way to perform each intermediate code operation.

This is a strange statement.  Since in general terms CISC instruction
sets are supersets of RISC models, why are the "extra" available CISC
instructions mandated to be used by a CISC compiler?  Indeed, one of
the arguments for RISC is the elimination of "unused" instructions from
the instruction set.  Although this may bring up important
architectural differences between RISC and CISC, it has no bearing on
the complexity of a compiler.

>On a pipelined machine, code ordering comes into play (at least if
>you want optimized code).  This complicates matters, since a different
>code ordering imposes different register allocation constraints.
>For this reason, optimizing can be difficult, even on a RISC machine.

But as CISC implementations become more advanced the applicability of
code reordering is starting to surface there as well.

>On a CISC machine, instruction selection can be a real problem.  This
>is because there are generally several instruction sequences possible
>to implement each operation.

I presume what you are driving at is that a CISC may have a shorter,
more complex instruction sequence available to it than a RISC for a
given piece of code.  Point one is that the CISC compiler does not have
to generate the more complex sequence since it can still produce the
degenerate case.  Point two is that the CISC compiler has the option of
improving the instruction stream while the RISC implementation has to
wait for a future more advanced hardware implementation to come along
which can support something like simultaneous instruction execution.

As a case in point consider the somewhat common C expression *p++.
The 68K family could issue the standard:

	mov.l	(%an),%dn
	add.l	&4,%an

or the faster

	mov.l	(%an)+,%dn

The 68K requires a routine in the compiler peephole optimizer to
"discover" and implement this optimization.  But the result is a single
16-bit instruction which (I think) executes in a single clock on the
MC68040.

>Instruction selection obviously
>affects the register allocation phase (and the code ordering phase).

True for both RISC and CISC.  No difference.
>To make matters worse, instruction selection is context sensitive.
>This means that different code ordering during optimization could
>invalidate the instruction selection choice (or, at least make the
>selected instructions sub-optimal).

True for both RISC and CISC.  No difference.

>There may indeed be a trade off between RISC and CISC.  But this
>trade off is certainly _not_ in the complexity of the compiler.

Correct.

>In fact, as shown above, the circumstance is quite the opposite.

This is counter-indicated above.  Perhaps you have some concrete
examples in mind to show otherwise.

--
Dave Trissel
Motorola Austin
tim@crackle.amd.com (Tim Olson) (07/10/89)
In article <25547@shemp.CS.UCLA.EDU> frazier@cs.ucla.edu (Greg Frazier) writes:
| Actually, I believe you are wrong, Jim.  While code selection
| is easier on a RISC, CISC compilers tend to avoid this by only
| using the simple instructions.  On the other hand, RISCs require
| very good optimizers in order to take advantage of their RISC-ness.

No, RISCs don't *require* very good optimizers; they just make it
easier to perform some optimizations.  Very simple compilers can be
used with fairly good results.

| This is complex.  In addition, most RISC architectures expose their
| pipelines, and hence require the compiler to avoid interlocks,
| etc.  This is also complex.

*Most* RISC architectures?  The Stanford MIPS processor did not have
interlocks, but nearly all other RISC processors do (a recent exception
is the i860, which exposes the pipeline in pipelined floating-point
mode).

| On a related note, RISCs tend to
| have delayed branches, register windows, etc. which the compiler
| must understand.

I would claim that the stack-cache operation of the Am29000 register
file is *easier* to generate code for than a standard register file.
Just determine the maximum number of registers required by a function
and allocate them on function entry -- register spilling & filling is
taken care of automatically by bounds checks.  Yes, delayed branches
have to be taken care of, but they aren't particularly hard -- some
systems don't put this in the compiler; it is done at assembly or link
time, along with other optimizations.

| Finally, as you pointed out, RISCs require
| sophisticated register allocation.

No, it is just useful to help reduce memory traffic.

	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)
frazier@oahu.cs.ucla.edu (Greg Frazier) (07/10/89)
In article <26247@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes:
>No, RISCs don't *require* very good optimizers; they just make it easier
>to perform some optimizations.  Very simple compilers can be used with
>fairly good results.

To take advantage of their RISC-ness, they DO require the
optimizations.  Of course, they can run lousy code - they just go slow.

>| This is complex.  In addition, most RISC architectures expose their
>| pipelines, and hence require the compiler to avoid interlocks,
>| etc.  This is also complex.
>
>*Most* RISC architectures?  The Stanford MIPS processor did not have
>interlocks, but nearly all other RISC processors do (a recent exception
>is the i860, which exposes the pipeline in pipelined floating-point mode).

When you have single-cycle instructions, talking about exposed
pipelines is rather non-sensical.  On most RISCs, the only instructions
which are multi-cycle are loads and stores, which are variable and thus
require interlocks.  The i860 is the only RISC with a significant
number of multi-cycle instructions on it (and it's still a RISC! :-) -
as more processors move in that direction, you are going to see more
exposed pipelines.  Also, if you'll allow me to stretch things a bit,
putting inst'ns in delayed branch slots does count as exposing the
pipeline, I think - it is very implementation dependent.

>I would claim that the stack-cache operation of the Am29000 register
>file is *easier* to generate code for than a standard register file.
>Just determine the maximum number of registers required by a function
>and allocate them on function entry -- register spilling & filling is
>taken care of automatically by bounds checks.  Yes, delayed branches
>have to be taken care of, but they aren't particularly hard -- some
>systems don't put this in the compiler; it is done at assembly or link
>time, along with other optimizations.

You're right - unfortunately, I was confusing in my memory someone
complaining how hard it is to write assembly code for register-window
machines.  Yes, they do make life easier for the compiler, unless you
are trying to optimize over/underflows!

>| Finally, as you pointed out, RISCs require
>| sophisticated register allocation.
>
>No, it is just useful to help reduce memory traffic.

Well, if you don't do the optimizations, and don't reduce the memory
traffic, then once again you are throwing out the advantages of a RISC
arch, and you might as well be running on a CISC.  My basic thrust,
please correct me if I'm wrong, is that RISCs are much more dependent
upon compiler optimizations for their performance, and if you're not
willing to invest in a sophisticated compiler, you're probably working
on the wrong machine.

Greg Frazier
&&&&&&&&&&&&&&&&&&&##########################%%%%%%%%%%%%%%%%%%%%%
Greg Frazier                  o         Internet: frazier@CS.UCLA.EDU
CS dept., UCLA               /\         UUCP: ...!{ucbvax,rutgers}!ucla-cs!frazier
                         ----^/----
                             /
khb@gammara.Sun.COM (gammara) (07/10/89)
In article <13976@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>This is simply _NOT_ true.  It is always _easier_ to write a compiler
>for a RISC machine than for a CISC machine.
>
>	stuff about compilers

Well, if RISC machines were simple (everything in 1 cycle, no
overlapped execution, sans multiple functional units, etc.) you would
be 100% right.  Unfortunately RISC is commonly applied to machines like
SPARC, MIPSco, i860 and others.  It is said that Metaflow is working on
a SPARC with several functional units, for example.

Even for the current chips employed by Sun, instruction scheduling is
one of the most interesting optimizations (wrt effectiveness on real
programs).  On a machine such as Metaflow is working on, instruction
scheduling could become as interesting as a TRACE/28 ....

Such compilers are NOT simpler than CISC compilers.

Keith H. Bierman    |*My thoughts are my own. Only my work belongs to Sun*
It's Not My Fault   | Marketing Technical Specialist  ! kbierman@sun.com
I Voted for Bill &  | Languages and Performance Tools.
Opus                | (* strange as it may seem, I do more engineering now *)
davecb@yunexus.UUCP (David Collier-Brown) (07/10/89)
In article <13976@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>Every once in a while, someone in this newsgroup makes the claim
>that RISC trades off hardware complexity for compiler complexity.
>This is simply _NOT_ true.  It is always _easier_ to write a compiler
>for a RISC machine than for a CISC machine.
	[well-reasoned argument for a Vax]

If one is to consider CISC machines, there are two things to remember:
  1) they come in families
  2) the popular ones aren't always the ones you want to consider.

The IBM /360 and VAXen are good examples of a particular era.  That
does not make them the only CISC machines.

The Honeywell (now Bull) DPS-8 is a conscious attempt at a
very-complex-instruction-set computer, based on the basic architecture
of the era of IBM 7090s and DEC-10s.  Given a compiler (say, FORTRAN
IV) for the basic machine language, one can add all the constructs for
PL/1 and (you should pardon the expression) COBOL to your language in
about two man-days: Honeybun put the primitives into the EISbox
(Extended Instruction Set).  One can even generate good code for them
(;-)), because they're either regular, or only exist in one variant.

The machine is still in production, is still very CISC, and actually
runs rather well.  They really did "narrow the semantic gap between the
machine language and the language understood by the compiler", which
was the reason d'etre of the vCISC machines.

Mind you, I can't recommend the underlying order code (the stuff that
preceded the EISbox) to my worst enemy.  The machine is a sorta wart
with an elegant bag on the side.  And Waterloo even has a very
standard/rather good C compiler for it.

--dave (I once worked on Honeybuns) c-b
--
David Collier-Brown,  | davecb@yunexus, ...!yunexus!davecb or
72 Abitibi Ave.,      | {toronto area...}lethe!dave
Willowdale, Ontario,  | Joyce C-B:
CANADA. 223-8968      |    He's so smart he's dumb.
jkrueger@daitc.daitc.mil (Jonathan Krueger) (07/10/89)
In article <13976@lanl.gov>, jlg@lanl (Jim Giles) writes: >Every once in a while, someone in this newsgroup makes the claim >that RISC trades off hardware complexity for compiler complexity. >This is simply _NOT_ true. It is always _easier_ to write a compiler >for a RISC machine than for a CISC machine. False. It is always easier to generate code for a simple, orthogonal instruction set. (Corollary: it is always easier to generate optimal code for a simple, orthogonal instruction set with simple, regular rules for timings.) The belief that every RISC machine exhibits these characteristics more than any CISC is not based on fact. -- Jon -- --
rec@dg.dg.com (Robert Cousins) (07/10/89)
In article <13976@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>Every once in a while, someone in this newsgroup makes the claim
>that RISC trades off hardware complexity for compiler complexity.
>This is simply _NOT_ true.  It is always _easier_ to write a compiler
>for a RISC machine than for a CISC machine.
>In fact, as shown above, the circumstance is quite the opposite.

I must agree.  Our experience on the 88K with a number of pieces of
software from gcc to an internally developed FORTH interpreter has
shown that RISCs are easier to generate code for.  I would carry this a
step further.  While the assembly language development effort is
minimal, we have had little difficulty with assembly language to the
point where I wonder if RISCs might not be easier to program at that
level, also.  If a programmer can do it easily . . . .

Concerning your optimization comment, I would like to throw out an
interesting phenomenon we have noticed (perhaps other RISCers will
comment also):  At one point, people were talking about RISC processors
needing to execute MORE instructions to do the same amount of work.
This implied that where a CISC might require 100 instructions, a RISC
might require >100 instructions, though the RISC instructions would
take less time.  We have noticed that in many cases, the instruction
count goes down.

In fact, the best example is the Dhrystone benchmark.  Since Dhrystone
is an artificial measure of integer compute power, a "Dhrystone MIPS"
is considered one VAX MIPS of compute power.  On the 88K, we need less
than 1 MIPS to generate a Dhrystone MIPS.  In other words, we get
Dhrystone MIPS > CPU clock speed.  When this first happened here, many
people were mumbling about clocks running backward and Escher
paintings . . . . :-)

Robert Cousins
Dept. Mgr, Workstation Dev't.
Data General Corp.

Speaking for myself alone.
guy@auspex.auspex.com (Guy Harris) (07/11/89)
>They really did "narrow the semantic gap between the machine language >and the language understood by the compiler", >which was the reason d'etre of the vCISC machines. As far as I can tell, so would a library routine implementing the same primitives as the EIS instructions; as such, while it may close some particular semantic gap (i.e., some piece of the semantic gap between solid-state physics and your favorite programming language), I don't know that it closed any very interesting semantic gap....
tim@crackle.amd.com (Tim Olson) (07/11/89)
In article <25562@shemp.CS.UCLA.EDU> frazier@cs.ucla.edu (Greg Frazier) writes:
| In article <26247@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes:
| >No, RISCs don't *require* very good optimizers; they just make it easier
| >to perform some optimizations.  Very simple compilers can be used with
| >fairly good results.
|
| To take advantage of their RISC-ness, they DO require the optimizations.
| Of course, they can run lousy code - they just go slow.

I don't agree with this.  Our experience with the early (PCC-based)
internal compiler for the Am29000 showed pretty good performance with
little or no standard optimizations (no load scheduling,
common-subexpression elimination, dead-code elimination, live/dead
analysis for register allocation, etc.)  The only things it performed
were those required by the architecture, i.e. delayed branches.  Of
course, the compiler utilized the features of the architecture, such as
keeping all scalar local variables and function parameters in
registers, which you may or may not want to count as "optimization".

| >| This is complex.  In addition, most RISC architectures expose their
| >| pipelines, and hence require the compiler to avoid interlocks,
| >| etc.  This is also complex.

When I first read this, I thought you were referring to architectures
without hardware interlocks -- now I see that you were probably
referring to increasing the performance by scheduling instructions,
trying to avoid the interlocks.

| When you have single-cycle instructions, talking about exposed
| pipelines is rather non-sensical.  On most RISCs, the only
| instructions which are multi-cycle are loads and stores, which
| are variable and thus require interlocks.
| The i860 is the only RISC with a significant number of multi-cycle
| instructions on it (and it's still a RISC! :-) - as more processors
| move in that direction, you are going to see more exposed pipelines.

If by "exposed pipeline" you mean that performance will be enhanced by
instruction scheduling, then I agree.  However, I don't agree that most
RISCs will expose the pipeline to software like the i860 -- most will
provide hardware interlocks.

	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)
jlg@lanl.gov (Jim Giles) (07/11/89)
From article <25547@shemp.CS.UCLA.EDU>, by frazier@oahu.cs.ucla.edu (Greg Frazier):
> [...]
> Actually, I believe you are wrong, Jim.  While code selection
> is easier on a RISC, CISC compilers tend to avoid this by only
> using the simple instructions.

In this case, you are eliminating the supposed advantage of CISC
machines - their richer instruction set.  If you don't use a fairly
sophisticated instruction selection algorithm, your instruction
selection will almost always be far short of optimal.

> [...]  Finally, as you pointed out, RISCs require
> sophisticated register allocation.

So do CISCs!!  To be sure, CISCs tend to be less reliant upon
registers.  But they also tend to have _fewer_ registers - so register
spilling occurs as often as on RISCs.  CISCs require ALL the
optimization steps (for the same performance) as RISCs, PLUS the extra
complexity of instruction selection (which feeds back into the other
optimization steps).
jlg@lanl.gov (Jim Giles) (07/11/89)
From article <2190@oakhill.UUCP>, by davet@oakhill.UUCP (David Trissel):
> In article <13976@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> [...]
>>For a RISC machine, the only hard part of the "back end" is register
>>allocation.
>
> What about the required pairing of registers for double wide operations
> such as floating-point or shifting?

In what way does a machine which requires register pairing qualify as a
RISC?  If an instruction requires 2 operands, they should be allowed to
be any two general purpose registers.  Furthermore, you are assuming
that floating point is larger than other intrinsic types.  The best
RISCs are those which only have _one_ data size.  (By the way, my model
of a reasonable RISC would be a Cray-I instruction set without vectors.
This is certainly RISCy - all data is 64 bits, all operations are reg
to reg, only one memory addressing mode, etc..)

> [...]
>>Instruction selection is fairly simple since there is
>>generally only one way to perform each intermediate code operation.
>
> This is a strange statement.  Since in general terms CISC instruction sets are
> supersets of RISC models, why are the "extra" available CISC
> instructions mandated to be used by a CISC compiler?  Indeed, one of the
> arguments for RISC is the elimination of "unused" instructions from the
> instruction set.  Although this may bring up important architectural
> differences between RISC and CISC, it has no bearing on the complexity
> of a compiler.

This is _really_ a strange statement.  Since the supposed advantage of
CISC is the richer instruction set, failure to use it would not take
advantage of the machine.  I've heard CISC designers claim that
individual instructions can be allowed to be slower than possible in
order to provide the additional instructions.  If you are not using
those extra instructions, you might as well have a RISC which provides
only the instructions you _do_ use.  The hardware designer could then
spend more time making those work faster instead of making sure that
the unused instructions work.

So, this issue _does_ have a bearing on the complexity of the compiler.
If you are not willing to provide the sophisticated compilers required
to adequately use a machine, you have wasted money (read: design
effort, chip space, etc.) on the hardware.

>>On a pipelined machine, code ordering comes into play (at least if
>>you want optimized code).  This complicates matters, since a different
>>code ordering imposes different register allocation constraints.
>>For this reason, optimizing can be difficult, even on a RISC machine.
>
> But as CISC implementations become more advanced the applicability of code
> reordering is starting to surface there as well.

_EXACTLY_!!!!!  All the optimizations required on a RISC are also
required on a CISC.  CISC just adds more complexity to the mix.

> [... example with C: *p++ ...]
>	mov.l	(%an),%dn
>	add.l	&4,%an
> or the faster
>	mov.l	(%an)+,%dn
> The 68K requires a routine in the compiler peephole optimizer to "discover"
> and implement this optimization.  But the result is a single 16-bit instruction
> which (I think) executes in a single clock on the MC68040.

Exactly my point.  There are actually several other possibilities for
instruction selection in this case.  For example, p may already be
resident in a register.  The use of the data may require it to end up
in a register.  The further use of p may require it to be left in a
register.  Etc..  Only a pretty sophisticated compiler can determine
which instructions to use in each context.
By contrast, there is only _one_ instruction sequence which will work
on most RISC machines: load p, load data, store data, increment p,
store p.  The 'peephole' optimizer need only discover the redundant
loads and stores to fit this sequence into context.  The instruction
scheduler can reorder the last four of these any way it likes.

Now, clearly 5 instructions may take longer than the 1 in your 68K
example.  But, RISC machines are easier to pipeline, easier to speed up
the clock for, easier to provide staged functional units for, etc..  I
don't know of any CISC machines with 'hardwired' instruction sets.
Microcoding slows the machine down, but is typically the only way to
fit a CISC on a chip.  All this may mean that 5 instructions on a RISC
may be _faster_ than one on a CISC.
roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (07/11/89)
In article <26257@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes:
>In article <25562@shemp.CS.UCLA.EDU> frazier@cs.ucla.edu (Greg Frazier) writes:
>| In article <26247@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes:
>| >No, RISCs don't *require* very good optimizers; they just make it easier
>| >to perform some optimizations.  Very simple compilers can be used with
>| >fairly good results.
    ^^^^^^^^^^^
>
>I don't agree with this.  Our experience with the early (PCC-based)
>internal compiler for the Am29000 showed pretty good performance with
                                          ^^^^^^^^^^^
>little or no standard optimizations (no load scheduling,
>common-subexpression elimination, dead-code elimination, live/dead
>analysis for register allocation, etc.)  The only things it performed
>were those required by the architecture, i.e. delayed branches.  Of

Anybody care to quantify this?  Just what sort of performance
improvement can I expect from no-holds-barred optimization over
only-what-I-have-to optimization?

>
>If by "exposed pipeline" you mean that performance will be
>enhanced by instruction scheduling, then I agree.  However, I don't

This term has been confusing me (too?).  Is this the (an?) "accepted"
definition of exposed pipeline?

>agree that most RISCs will expose the pipeline to software like the i860
>-- most will provide hardware interlocks.
>

Well...who will and who won't?

> -- Tim Olson
> Advanced Micro Devices
> (tim@amd.com)

--
Roelof Vuurboom  SSP/V3  Philips TDS Apeldoorn, The Netherlands  +31 55 432226
domain: roelof@idca.tds.philips.nl           uucp:  ...!mcvax!philapd!roelof
davet@oakhill.UUCP (David Trissel) (07/11/89)
In article <13980@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>> What about the required pairing of registers for double wide operations
>> such as floating-point or shifting?
>In what way does a machine which requires register pairing qualify as
>a RISC?

The point was that RISC architectures DO require the pairing.  Can you
name any that don't?

>If an instruction requires 2 operands, they should be allowed
>to be any two general purpose registers.

This is not what was being discussed.

>Furthermore, you are assuming
>that floating point is larger than other intrinsic types.

Well, unless you limit yourself to only single-precision,
floating-point IS larger than other intrinsic types.

>The best RISCs are those which only have _one_ data size.

Such an architecture would utterly fail in the marketplace.  This is so
far off base it doesn't need a rebuttal.

>(By the way, my model of a reasonable RISC would be a Cray-I instruction
>set without vectors.  This is certainly RISCy - all data is 64 bits, all
>operations are reg to reg, only one memory addressing mode, etc..)

Do you have any inkling why your ideas aren't being implemented by RISC
designers?

>So, this issue _does_ have a bearing on the complexity of the compiler.
>If you are not willing to provide the sophisticated compilers required
>to adequately use a machine, you have wasted money (read: design effort,
>chip space, etc.) on the hardware.

Nope, no bearing on the compiler.  There is no gun being pointed at the
compiler writer forcing her to utilize all the available instructions.
Your assumption that most of the instructions have to be used "to
adequately use a machine" is simply incorrect.

>All the optimizations required on a RISC are also required on a CISC.

Wrong.  68K compilers don't have to worry about register pairing.  68K
compilers don't have to worry about branch delayed slot filling,
setting up segment registers for data addressability, etc.

>CISC just adds more complexity to the mix.

I showed otherwise in that RISC has its own set of headaches.  I don't
see much difference in the total amount of work involved for either
RISC or CISC.

>Now, clearly 5 instructions may take longer than the 1 in your 68K example.

Actually RISCs only require 2 or 3 instructions to do the work of the
one in that example.

>All this may mean that 5 instructions on a RISC may be _faster_ than one on a CISC.

Not when using the latest CISCs.

--
Dave Trissel
Motorola Austin
roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (07/11/89)
In article <13980@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>that floating point is larger than other intrinsic types.  The best RISCs
>are those which only have _one_ data size.  (By the way, my model of a
                            ^^^^^
Why?

--
Roelof Vuurboom  SSP/V3  Philips TDS Apeldoorn, The Netherlands  +31 55 432226
domain: roelof@idca.tds.philips.nl           uucp:  ...!mcvax!philapd!roelof
daveh@cbmvax.UUCP (Dave Haynie) (07/11/89)
in article <13980@lanl.gov>, jlg@lanl.gov (Jim Giles) says: > From article <2190@oakhill.UUCP>, by davet@oakhill.UUCP (David Trissel): >> In article <13976@lanl.gov> jlg@lanl.gov (Jim Giles) writes: >> mov.l (%an),%dn >> add.l &4,%an >> or the faster >> mov.l (%an)+,%dn >> The 68K requires a routine in the compiler peephole optimizer to "discover" >> and implement this optimization. But the result is a single 16-bit instruction >> which (I think) executes in a single clock on the MC68040. > Now, clearly 5 instructions may take longer than the 1 in your 68K example. > But, RISC machines are easier to pipeline, easier to speed up the clock > for, easier to provide staged functional units for, etc.. I don't > know of any CISC machines with 'hardwired' instruction sets. That's probably why the 68040 was mentioned in the original example. It has a decent number of hardwired instructions, thus the aforementioned move with post increment executes in one _clock_ (at least I had the same recollection; many of the instructions execute in a single clock). You're going to be hard pressed to make a RISC do better than one instruction/clock (some do by exploiting parallel execution units, a concept not limited to RISC), and if your RISC machine needs more instructions to achieve the same result, you lose. The main advantage any RISC is going to have in the long run is its simplicity. A thing like the 68040 can adopt many if not all of the tricks you'll find in RISC machines, but it's a very large processor, and you're probably not going to see it in ECL or GaAs anytime soon. Its CMOS process is probably not good for much over 50MHz (top speed on the 68030 anyway). Folks are talking about ECL designs starting at 60MHz-80MHz and continuing on with GaAs into the 100MHz to 200MHz range. The first CPUs that get there will be extremely simple ones so that the number of chips (and thus slow external connections) can be kept to a minimum. And so they can be made in the first place with the very latest technologies, which aren't counting millions of transistors yet. -- Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests" {uunet|pyramid|rutgers}!cbmvax!daveh PLINK: D-DAVE H BIX: hazy Be careful what you wish for -- you just might get it
henry@utzoo.uucp (Henry Spencer) (07/11/89)
In article <13980@lanl.gov> jlg@lanl.gov (Jim Giles) writes: >> What about the required pairing of registers for double wide operations >> such as floating-point or shifting? > >In what way does a machine which requires register pairing qualify as >a RISC? ... Don't be too hard on the guy; he's judging all RISCs by the lousy one he's got. Only Motorola could build a RISC with register pairing... :-) >... If you are not using those >extra instructions, you might as well have a RISC which provides >only the instructions you _do_ use. The hardware designer could >then spend more time making those work faster... In fact, it is notoriously true that many CISCs run faster if the compiler writer treats them as RISCs and ignores all the fancy stuff. >...RISC machines are easier to pipeline, easier to speed up the clock >for, easier to provide staged functional units for, etc.. I don't >know of any CISC machines with 'hardwired' instruction sets... I can think of a couple of old ones. The first pdp11, the 11/20, was hardwired. (This accounts for some of the little irregularities in the 11 instruction set, in fact, like the way INC isn't quite the same as ADD #1...) And I seem to recall that the 360/75 was mostly hardwired, for speed. -- $10 million equals 18 PM | Henry Spencer at U of Toronto Zoology (Pentagon-Minutes). -Tom Neff | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
frazier@oahu.cs.ucla.edu (Greg Frazier) (07/12/89)
In article <151@ssp1.idca.tds.philips.nl> roelof@idca.tds.PHILIPS.nl (R. Vuurboom) writes:
>
>This term has been confusing me (too?).  Is this the (an?) "accepted"
>definition of exposed pipeline?

My understanding is that an exposed pipeline is one in which
interlocks, register reservations, etc. are done in software.  Thus, if
you have a 10 cycle divide and a 2 cycle addition, and have in your
machine code

	R1 <- R2 / R3
	R4 <- R1 + R2,

you are going to get an incorrect value in R4, unless you stick a bunch
of no-ops in between.  In an un-exposed pipeline, the compiler does not
have to know how long operations take; the hardware performs register
reservations and prevents such things from happening.

Thus, in my second posting, the question as to the applicability of
discussing exposed/unexposed pipelines in the context of a RISC, even
though I was the one who brought it up in the first place.  Yes, I'm a
bit of a chuckle-head :-)

Greg Frazier
******************@@@@@@@@@@@@@@@@@@@@@@@@@====================
Greg Frazier                  o         Internet: frazier@CS.UCLA.EDU
CS dept., UCLA               /\         UUCP: ...!{ucbvax,rutgers}!ucla-cs!frazier
                         ----^/----
                             /
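[Aside: a minimal sketch, not from Greg's posting, of the bookkeeping an
exposed pipeline pushes onto the compiler: with no hardware interlocks,
the code generator has to track when each result becomes available and
pad with no-ops itself.  The 10-cycle divide and 2-cycle add follow his
example; the data structures and instruction text are invented.]

/* Hypothetical software interlocking for an exposed pipeline: if an
 * instruction reads a register whose producer has not yet finished,
 * the compiler inserts NOPs (the hardware will not stall for it).
 */
#include <stdio.h>

typedef struct {
    const char *text;   /* textual form, for printing           */
    int dst;            /* register written (-1 if none)         */
    int src1, src2;     /* registers read   (-1 if none)         */
    int latency;        /* cycles until dst becomes available    */
} Insn;

static void emit_with_nops(const Insn *code, int n)
{
    int ready[32] = {0};       /* cycle at which each register is ready */
    int cycle = 0;

    for (int i = 0; i < n; i++) {
        int need = 0;
        if (code[i].src1 >= 0 && ready[code[i].src1] > cycle)
            need = ready[code[i].src1] - cycle;
        if (code[i].src2 >= 0 && ready[code[i].src2] - cycle > need)
            need = ready[code[i].src2] - cycle;

        for (int k = 0; k < need; k++) {   /* pad until operands are ready */
            puts("nop");
            cycle++;
        }
        puts(code[i].text);
        if (code[i].dst >= 0)
            ready[code[i].dst] = cycle + code[i].latency;
        cycle++;
    }
}

int main(void)
{
    /* R1 <- R2 / R3 ; R4 <- R1 + R2  (the example from the post) */
    Insn block[] = {
        { "div r1, r2, r3", 1, 2, 3, 10 },
        { "add r4, r1, r2", 4, 1, 2,  2 },
    };
    emit_with_nops(block, 2);   /* prints the divide, 9 nops, then the add */
    return 0;
}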
paul@moncam.co.uk (Paul Hudson) (07/12/89)
In article <2199@oakhill.UUCP> davet@oakhill.UUCP (David Trissel) writes:

   In article <13980@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
   >> What about the required pairing of registers for double wide operations
   >> such as floating-point or shifting?
   >In what way does a machine which requires register pairing qualify as
   >a RISC?

   The point was that RISC architectures DO require the pairing.  Can you
   name any that don't?

ARM.  (Acorn Risc Machine) last time I looked ...

   >If an instruction requires 2 operands, they should be allowed
   >to be any two general purpose registers.

   This is not what was being discussed.

But it is relevant to the complexity of the compiler.

   >Furthermore, you are assuming
   >that floating point is larger than other intrinsic types.

   Well, unless you limit yourself to only single-precision,
   floating-point IS larger than other intrinsic types.

[ omitted ]

   >So, this issue _does_ have a bearing on the complexity of the compiler.
   >If you are not willing to provide the sophisticated compilers required
   >to adequately use a machine, you have wasted money (read: design effort,
   >chip space, etc.) on the hardware.

   Nope, no bearing on the compiler.  There is no gun being pointed at the
   compiler writer forcing her to utilize all the available instructions.
   Your assumption that most of the instructions have to be used "to
   adequately use a machine" is simply incorrect.

But what's the point in having those instructions, then?  Why not use
all that silicon real-estate for something else?

   >All the optimizations required on a RISC are also required on a CISC.

   Wrong.  68K compilers don't have to worry about register pairing.  68K
   compilers don't have to worry about branch delayed slot filling,
   setting up segment registers for data addressability, etc.

Neither do the compilers for the ARM.  (except addressing literals, and
that wasn't difficult).

   >CISC just adds more complexity to the mix.

   I showed otherwise in that RISC has its own set of headaches.  I don't
   see much difference in the total amount of work involved for either
   RISC or CISC.

   >Now, clearly 5 instructions may take longer than the 1 in your 68K example.

   Actually RISCs only require 2 or 3 instructions to do the work of the
   one in that example.

   >All this may mean that 5 instructions on a RISC may be _faster_ than one on a CISC.

   Not when using the latest CISCs.

I think all this shows that it depends on the particular examples of
CISC or RISC rather than being generic.  I would argue that most RISCs
are more regular in their instruction sets than most CISCs, and that
this does make compilers significantly easier.

My experience in this is limited to writing a replacement code
generator for pcc for the Acorn Risc Machine.  The result had only
peephole optimisation and couldn't compete with a fully-optimizing
compiler, but still gave quite good results.  More importantly, it was
working quite quickly - here the regularity of register use and
instructions was a big win.
--
Paul Hudson

MAIL: Monotype ADG, Science Park, Cambridge, CB4 4FQ, UK.
PHONE: +44 (223) 420018      EMAIL: paul@moncam.co.uk,
  FAX: +44 (223) 420911            ...!ukc!acorn!moncam!paul

"/dev/null full: please empty the bit bucket"
tim@crackle.amd.com (Tim Olson) (07/12/89)
In article <151@ssp1.idca.tds.philips.nl> roelof@idca.tds.PHILIPS.nl (R. Vuurboom) writes:
| Anybody care to quantify this?  Just what sort of performance improvement can I
| expect from no-holds-barred optimization over only-what-I-have-to optimization?

Here are a couple of data points.  The only optimizations performed by
the internal pcc-derived compiler were delayed-branch slot filling,
loop-rotation, leaf-procedure optimization (no frame allocated), and
some loop-invariant code-motion (mainly constant addresses).  We can
compare this to the MetaWare High-C compiler for the Am29000, which
performs many more optimizations, including common-subexpression
elimination, dead-code elimination, constant and variable propagation,
register assignment by coloring, etc:

    Benchmark    VAX 11/780 pcc    29K pcc     29K MetaWare
    diff             3.2 s         0.208 s     0.157 s  (+32%)
    grep             2.1 s         0.193 s     0.142 s  (+35%)
    nroff            7.1 s         0.564 s     0.507 s  (+11%)

(29K simulation model was 25MHz, with separate 8Kbyte caches, which
were 2-cycle first access, single cycle burst.)

So you can see that there is a definite improvement, but it certainly
isn't the 3X - 5X implied by the assertion that "you have to use
highly-optimizing compilers with RISC, otherwise you might as well use
a CISC processor."

	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)
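[Aside: the "(+NN%)" figures above appear to be the ratio of the 29K pcc
time to the 29K MetaWare time, less one.  Working the first row as an
example: 0.208 s / 0.157 s = 1.32, i.e. the optimizing compiler's code
runs about 32% faster on that benchmark.]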
fotland@hpihoah.HP.COM (Dave Fotland) (07/12/89)
> davet@oakhill.UUCP (David Trissel) writes: >In article <13980@lanl.gov> jlg@lanl.gov (Jim Giles) writes: > >>> What about the required pairing of registers for double wide operations >>> such as floating-point or shifting? >>In what way does a machine which requires register pairing qualify as >>a RISC? >The point was that RISC architectures DO require the pairing. Can you name >any that don't? Hewlett Packard precision architecture doesn't require register pairing. Double shifts can use any two general registers. The floating point registers are separate from the general registers and each can hold either a 32 bit or 64 bit floating point value. The compiler writers were against requiring pairing and the floating point experts felt that double precision was more important than single so there was no need to try to put two singles in a double precision register. This is also why double precision floating point is only slightly slower than single on most HP-PA machines. Floating point operations are naturally slower than integer operations so it makes sense to have them handled by a coprocessor so they can overlap with integer operations. Having separate floating point registers makes the coprocessor design easier and saves communication between the IU and FPU. David Fotland fotland@hpda.hp.com
mac3n@babbage.acc.virginia.edu (Alex Colvin) (07/13/89)
> The Honeywell (now Bull) DPS-8 is a conscious attempt at a > very-complex-instruction set computer, based on the basic architecture > of the era of IBM 7090s and DEC-10s. > .... Given a compiler (say, FORTRAN > IV) for the basic machine language, one can add all the constructs for > PL/1 and (you should pardon the expression) COBOL to your language in > about two man-days: Honeybun put the primitives into the EISbox > (Extended Instruction Set). > The machine is still in production, is still very CISC, and actually > runs rather well. They really did "narrow the semantic gap between > the machine language and the language understood by the compiler", > which was the reason d'etre of the vCISC machines. Only for some languages. PL/I, COBOL. Try C. While the machine does understand character strings, it doesn't understand single characters very well. They're pretty much restricted to the EIS instructions, which are strictly memory-memory, in contrast to the rest of the instruction set, which is memory-register. > One can even generate good code for them (;-)), because they're > either regular, or only exist in one variant. Sure, it's easy to do optimal register allocation when all you've got is one good register (EAQ). > Mind you, I can't recommend the underlying order code (the stuff that > preceded the EISbox) to my worst enemy. The machine is a sorta wart > with an elegant bag on the side. "Order code" gives you an idea of the age of the basic architecture. But EIS is a mess. Seriously, I grew up on this machine, and I can say that I prefer i80*86s. Perhaps I'm biased by the early flaky implementations of EIS (such as 0-length string copy faulting) and the wild variation in instruction timing among different models.
lamaster@ames.arc.nasa.gov (Hugh LaMaster) (07/13/89)
In article <199@dg.dg.com> uunet!dg!rec (Robert Cousins) writes: >of compute power. On the 88K, we need less than 1 MIPS to generate >a Dhrystone MIPS. In other words, we get Dhrystone MIPS > CPU >clock speed. When this first happened here, many people were Could you clarify what you mean? I know of no correlation between "Dhrystones/second" and "Instructions issued/second". Also, on a lot of machines, the Dhrystone rating is quite a bit higher than a simple "MIPS"/"VAX MIPS" ratio would indicate, thus demonstrating that Dhrystone is but one of many benchmarks, and the 11/780 does better on others... Hugh LaMaster, m/s 233-9, UUCP ames!lamaster NASA Ames Research Center ARPA lamaster@ames.arc.nasa.gov Moffett Field, CA 94035 Phone: (415)694-6117
rec@dg.dg.com (Robert Cousins) (07/13/89)
In article <28471@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
>In article <199@dg.dg.com> uunet!dg!rec I wrote:
>>of compute power.  On the 88K, we need less than 1 MIPS to generate
>>a Dhrystone MIPS.  In other words, we get Dhrystone MIPS > CPU
>>clock speed.  When this first happened here, many people were
>Could you clarify what you mean?  I know of no correlation between
>"Dhrystones/second" and "Instructions issued/second".
> ....
> Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
> NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
> Moffett Field, CA 94035
> Phone:  (415)694-6117

Before going any further, I would like to point out that I am (was)
pointing out anecdotal information.  When we first began to run
benchmarks on the AViiON 88K products, we came up with Dhrystone
numbers which, when converted to VAX MIPS, came out higher than the RAW
MIPS of the machine.  The 16.67 Mhz machine has a peak raw MIPS rating
of 16.67 MIPS (1 instruction per clock).  16.67 Dhrystone MIPS is 29289
Dhrystones/second.  Since we were getting more Dhrystones/second than
this, some people were surprised.

Hugh, I agree that Dhrystone is one benchmark in a whole field of
benchmarks.  However, I feel comfortable drawing a simple conclusion
from this result:  It is not clear that a CISC processor will take
FEWER INSTRUCTIONS to solve a problem than a RISC processor.  In some
cases RISCs may require fewer instructions.

This contrasts with some early claims that RISCs could execute short
instructions in a shorter time than a CISC could execute long
instructions doing the same work.  From then on, in some circles, RISCs
have been tacitly assumed to require more instructions to do a given
job than CISCs.

Robert Cousins
Dept. Mgr, Workstation Dev't.
Data General Corp.

Speaking for myself alone.
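[Aside: the conversion used here is the conventional one of dividing
Dhrystones per second by 1757, the figure usually quoted for a VAX
11/780, and the numbers above are consistent with it:
16.67 x 1757 = 29,289 Dhrystones/second.  So "Dhrystone MIPS > clock
speed" simply means the 16.67 MHz machine turned in more than 29,289
Dhrystones/second.]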
slackey@bbn.com (Stan Lackey) (07/13/89)
In article <199@dg.dg.com> uunet!dg!rec (Robert Cousins) writes: >Concerning your optimization comment, I would like to throw out an >interesting phenomenon we have noticed (perhaps other RISCers will >comment also): At one point, people were talking about RISC processors >needing to execute MORE instructions to do the same amount of work. >THis implied that where a CISC might require 100 instructions, a >RISC might require >100 instructions, though the RISC instructions >would take less time. We have noticed that in many cases, the >instruction count goes down. I assume you mean 'instruction count goes down relative to our old architecture'. I bet going from 4 to 30 registers did something for register spills and lots of other interesting stuff... As well as byte addressability... :-) Stan
schow@bnr-public.uucp (Stanley Chow) (07/14/89)
In article <200@dg.dg.com> uunet!dg!rec (Robert Cousins) writes:
>
>Hugh, I agree that Dhrystone is one benchmark in a whole field of
>benchmarks.  However, I feel comfortable drawing a simple conclusion
>from this result:  It is not clear that a CISC processor will take
>FEWER INSTRUCTIONS to solve a problem than a RISC processor.  In some
>cases RISCs may require fewer instructions.
>

This is such a simple issue that arguing is pointless.  You can just
post the static and dynamic instruction counts for some sample CPUs and
settle the question.  Static numbers ought to be easy to get.  Dynamic
numbers can be either true dynamic numbers or just approximations from
counting inner loops.

Since you are making a "surprising" claim, you ought to come up with
the numbers.

Stanley Chow        BitNet:  schow@BNR.CA
BNR                 UUCP:    ..!psuvax1!BNR.CA.bitnet!schow
(613) 763-2831               ..!utgpu!bnr-vpa!bnr-fos!schow%bnr-public
Me? Represent other people? Don't make them laugh so hard.
acockcroft@pitstop.West.Sun.COM (Adrian Cockcroft) (07/14/89)
> In other words, we get Dhrystone MIPS > CPU
> clock speed.  When this first happened here, many people were
> mumbling about clocks running backward and Escher paintings . . . . :-)
>
> Robert Cousins

I'm interested to see what code the compiler produced for the 88k.
Could you mail me your Dhrystone source code and the resultant
assembler output (cc -S on sun)?  I'll compile your source and compare
it to the SPARC equivalent to try to see what optimisations have been
performed.  I can also run the SPARC Dhrystone through SPARCsim to get
an instruction mix analysis.  If I get anywhere I will post the results
back here.

As an aside, I used to use the Greenhills 68k C compiler a lot for
realtime work and I found that the compiler optimisations were style
dependent.  I eventually found what sort of expressions the compiler
could cope with best by tweaking the C code and looking at the
assembler produced.  In particular strength reduction optimisations
were very dependent on the way I wrote loops.
--
Adrian Cockcroft
Sun Cambridge UK TSE
sun!sunuk!acockcroft
Disclaimer: These are my own opinions
mash@mips.COM (John Mashey) (07/15/89)
In article <28471@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
>In article <199@dg.dg.com> uunet!dg!rec (Robert Cousins) writes:
>>of compute power.  On the 88K, we need less than 1 MIPS to generate
>>a Dhrystone MIPS.  In other words, we get Dhrystone MIPS > CPU
>>clock speed.  When this first happened here, many people were
>
>Could you clarify what you mean?  I know of no correlation between
>"Dhrystones/second" and "Instructions issued/second".
>
>Also, on a lot of machines, the Dhrystone rating is quite a bit higher
>than a simple "MIPS"/"VAX MIPS" ratio would indicate, thus demonstrating
>that Dhrystone is but one of many benchmarks, and the 11/780 does better
>on others...

Yes, 100%.  [I'm on the road with little bandwidth, but I just can't
pass this one up...]

1) Even with no gimmickry, Dhrystone is the benchmark of choice for
showing how much better than a VAX you are, because Dhrystone has a
lower-than-usual number of cycles / subroutine call, i.e., it seriously
penalizes anything with a slow function call sequence.  This data has
been published for years.....

2) With the current level of compiler gimmickry, you must obey Hr. Dr.
Weicker's advice about disbelieving anything unless you see the
generated code.  What does any of this mean when you can change the
numbers +40% based on a single optimization of a single statement???
--
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
hankd@pur-ee.UUCP (Hank Dietz) (07/16/89)
Let's assume that the RISC and CISC machines under consideration are identical except for instruction encoding; the CISC being microcoded and each RISC instruction corresponding to one micro instruction. CISC compilers tend to pack as many operations together in single instructions as they possibly can, but this implies a certain amount of redundancy because the complex instructions represent general case sequences of micro instructions. Initially, there will be several RISC instructions for each of the CISC instructions, perhaps 3:1. However, many of them will be optimized away. This is because the RISC compiler can recognize and remove redundancies in the CISC micro instruction sequences. Generally, RISC implies more available expressions, which makes value propagation and common subexpression elimination more productive. Taking arbitrary basic blocks of RISC code and doing rather simple analysis, we have often found that only about 1/4 of the operations remain after optimization... this is a very unlikely scenario for CISC operations. In summary, redundant "micro instructions" of the CISC operations nearly all disappear from the optimized RISC code, hence a 1:1 ratio of RISC:CISC is quite possible. How do you get better than 1:1? The only way is if some entire CISC instructions are useless; i.e., you can't do better than 1:1 unless the RISC compiler is "smarter" than the CISC compiler. -hankd PS: If the above sounds like "why nobody should ever build a CISC," let me remind folks that we're assuming instruction bandwidth isn't an issue and the code you care about really has such low-level redundancies. Of course, finer-grain (RISC) compiler control can require more instruction bits per computation, so there's still plenty to argue about. ;-)
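[Aside: a toy sketch, not from Hank's posting, of the kind of redundancy
removal he is describing.  Local value numbering over a basic block of
three-address "micro instructions" notices that two operations computing
the same function of the same values are redundant, so the second
collapses to a copy.  The register numbers and the tiny example block
are invented.]

/* Toy local value numbering over a basic block of three-address ops.
 * If two operations apply the same opcode to operands with the same
 * value numbers, the second is redundant and becomes a copy.  Purely
 * illustrative; a real pass also handles commutativity, constants,
 * copies, and memory operations.
 */
#include <stdio.h>

#define NREG   16
#define MAXVAL 64

typedef struct { char op; int dst, src1, src2; } Op;   /* dst = src1 op src2 */

static struct { char op; int vn1, vn2; } table[MAXVAL];
static int ntable;

static int lookup(char op, int vn1, int vn2)
{
    for (int i = 0; i < ntable; i++)
        if (table[i].op == op && table[i].vn1 == vn1 && table[i].vn2 == vn2)
            return i;                                 /* value seen before */
    table[ntable].op = op; table[ntable].vn1 = vn1; table[ntable].vn2 = vn2;
    return ntable++;                                  /* brand-new value   */
}

int main(void)
{
    /* r3 = r1 + r2 ; r4 = r1 + r2 ; r5 = r3 * r4
     * The second add recomputes the first and can be replaced by a copy. */
    Op block[] = { { '+', 3, 1, 2 }, { '+', 4, 1, 2 }, { '*', 5, 3, 4 } };
    int vn_of_reg[NREG];
    int reg_of_vn[MAXVAL];

    for (int r = 0; r < NREG; r++) {    /* each register starts as its own value */
        vn_of_reg[r] = ntable;
        reg_of_vn[ntable] = r;
        ntable++;
    }

    for (int i = 0; i < 3; i++) {
        int before = ntable;
        int vn = lookup(block[i].op, vn_of_reg[block[i].src1],
                                     vn_of_reg[block[i].src2]);
        if (vn < before)                /* value already computed: emit a copy */
            printf("r%d = r%d          ; redundant, was r%d %c r%d\n",
                   block[i].dst, reg_of_vn[vn],
                   block[i].src1, block[i].op, block[i].src2);
        else {
            printf("r%d = r%d %c r%d\n", block[i].dst,
                   block[i].src1, block[i].op, block[i].src2);
            reg_of_vn[vn] = block[i].dst;
        }
        vn_of_reg[block[i].dst] = vn;
    }
    return 0;
}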
lord@se-sd.NCR.COM (Dave Lord ) (07/20/89)
In article <199@dg.dg.com> uunet!dg!rec (Robert Cousins) writes: :In article <13976@lanl.gov: jlg@lanl.gov (Jim Giles) writes: ::Every once in a while, someone in this newsgroup makes the claim ::that RISC trades off hardware complexity for compiler complexity. ::This is simply _NOT_ true. It is always _easier_ to write a compiler ::for a RISC machine than for a CISC machine. : :I must agree. Our experience on the 88K with a number of pieces :of software from gcc to an internally developed FORTH interpreter :has shown that RISCs are easier to generate code for. I would :carry this a step further. While the assembly language development :effort is minimal, we have had little difficulty with assembly :language to the point where I wonder if RISCs might not be easier :to program at that level, also. If a programmer can do it easily . . . . Uh, yes and no. Compilers for _some_ languages may be easier to write for RISC but having recently been working on a COBOL compiler for the 88K I can tell you that a _huge_ amount of work is involved in handling commercial data types (decimal and variations of decimal). I will agree that contrary to popular belief writing assembly language code for the 88k is a piece of cake.
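[Aside: a rough sketch, not from the 88K COBOL work, of why commercial
decimal support is expensive when the hardware has no decimal
instructions: even unsigned packed-BCD addition has to be open-coded a
digit pair at a time, where a 370 or VAX compiler can emit a single
decimal-add instruction.  The digit width and storage layout below are
arbitrary; real COBOL decimal arithmetic adds signs, scaling, and
overflow rules on top of this.]

/* Minimal sketch of unsigned packed-BCD addition done in software,
 * two digits per byte, most-significant byte first.
 */
#include <stdio.h>

#define NDIGITS 8                       /* digits per operand */
#define NBYTES  (NDIGITS / 2)

static void bcd_add(const unsigned char *a, const unsigned char *b,
                    unsigned char *sum)
{
    int carry = 0;
    for (int i = NBYTES - 1; i >= 0; i--) {      /* low-order byte is last */
        int lo = (a[i] & 0x0F) + (b[i] & 0x0F) + carry;
        carry = lo > 9;
        if (carry) lo -= 10;
        int hi = (a[i] >> 4) + (b[i] >> 4) + carry;
        carry = hi > 9;
        if (carry) hi -= 10;
        sum[i] = (unsigned char)((hi << 4) | lo);
    }
}

int main(void)
{
    unsigned char a[NBYTES] = { 0x00, 0x12, 0x34, 0x99 };   /* 00123499 */
    unsigned char b[NBYTES] = { 0x00, 0x00, 0x06, 0x01 };   /* 00000601 */
    unsigned char sum[NBYTES];

    bcd_add(a, b, sum);
    for (int i = 0; i < NBYTES; i++)
        printf("%02X", sum[i]);                  /* prints 00124100 */
    printf("\n");
    return 0;
}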
guy@auspex.auspex.com (Guy Harris) (07/21/89)
>Uh, yes and no. Compilers for _some_ languages may be easier to >write for RISC but having recently been working on a COBOL compiler >for the 88K I can tell you that a _huge_ amount of work is involved >in handling commercial data types (decimal and variations of decimal). So, just out of curiosity: 1) how much work relative to the work involved on CISCs with decimal instructions, counting both the compiler work *and* the incremental CPU design work for the decimal instructions for *all* generations of the CISC (unless, of course, the compiler work for the RISC has to be redone for subsequent generations)? 2) how much work relative to the work involved on CISCs without decimal instructions, or at least without the whizzo ones on 370s and VAXes and the like?
davet@oakhill.UUCP (David Trissel) (07/22/89)
In article <1989Jul9.202858.27121@jarvis.csri.toronto.edu> norvell@csri.toronto.edu (Theodore Stevens Norvell) writes:
>
>But the choice of instructions is not the hard part of instruction selection
>(on a CISC).  There is also the choice of address modes.

Valid point, but that's only one side of the coin.  Indeed our 68K
compiler has extra templates for recognizing the more complex trees
(actually called 'shapes' for you pcc2 fans) that some addressing modes
require.  However, our 88K compiler template file has become just as
large as that of the 68K compiler because it requires extra templates
to catch several special cases of 3 register operand possibilities that
are missed with the simple classic stock templates one would create for
a base 88K machine.  We found it rather fascinating that the template
files for the two machines are almost the same size.

>compiling the C statement i = *p ; where i and p are both memory (not
>register) variables.  Do you:
>	(A) Emit a move using an indirect address mode.
>	(B) load p and do a memory to memory move
>	(C) load *p with an indirect address mode and store i
>	(D) load p, load *p and then store i
>	[...]
>In other words, even in a simple assignment statement, there are phase ordering
>problems between instruction selection and optimization, register allocation,
>and scheduling.

You are correct.  But then the 68K compiler doesn't have the massive
headache of trying to support data addressability segment registers
because of the RISC inability to directly access an item with a full
32-bit address.  Both are massive headaches, which gets back to my
original point some postings ago that I don't see one architecture
having a big advantage over the other when it comes to purely
'amount-of-work-to-build-a-compiler' issues.

>
>Mr. Trissel's group seem to have solved this phase ordering problem by
>putting some of the address mode selection into a peephole optimizer.
>This seems reasonable, but it moves the complexity to the peephole optimizer.

Correct again.  But once you figure in the amount of work that a RISC
compiler (or its assembler) requires for code reordering [and for some
non-Motorola RISCs the pipeline interlock worries -> :-)] is RISC
really ahead in the amount of work required?

>>>To make matters worse, instruction selection is context sensitive.
>>>This means that different code ordering during optimization could
>>>invalidate the instruction selection choice (or, at least make the
>>>selected instructions sub-optimal).
>>
>>True for both RISC and CISC.  No difference.
>
>The difference is that if there is only one good way to select the
>instructions (as is more often true on a RISC than a CISC), then there
>is no phase ordering problem.  You simply select the instructions first
>with no fear of picking a sequence that screws up the register allocation
>or scheduling.  Not because it won't, but because there probably wasn't
>another sequence that would have been better.

But it seems to me that once you take into consideration the
requirement that a RISC compiler has to support code reorder scheduling
to produce decent output, then the code reorderer must be every bit as
complex as a 68K peephole optimizer (for example).  A reorderer has to
keep up with live/dead registers, operand use/defines, have the
knowledge of what can be swapped around to where, etc., which is even
more complex in many respects than what my 68K peephole is required to
do.

--
Dave Trissel
Motorola Austin
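[Aside: a toy sketch, not from either compiler group, of the bookkeeping
Dave describes a reorderer doing.  Before two adjacent instructions can
be swapped, their register use/define sets must be checked (and memory
references treated conservatively).  The register-mask representation,
the instruction text, and the examples are all invented.]

/* Toy dependence test a code reorderer applies before swapping two
 * adjacent instructions: neither may write a register the other reads
 * or writes, and (conservatively) at most one may touch memory.
 */
#include <stdio.h>

typedef struct {
    const char  *text;
    unsigned int reads;       /* bitmask of registers read    */
    unsigned int writes;      /* bitmask of registers written */
    int          touches_mem;
} Insn;

#define R(n) (1u << (n))

static int can_swap(const Insn *a, const Insn *b)
{
    if (a->writes & (b->reads | b->writes)) return 0;   /* RAW / WAW */
    if (b->writes & a->reads)               return 0;   /* WAR        */
    if (a->touches_mem && b->touches_mem)   return 0;   /* play safe  */
    return 1;
}

int main(void)
{
    Insn load = { "ld   r1,(r2)",  R(2),        R(1), 1 };
    Insn add  = { "add  r3,r4,r5", R(4) | R(5), R(3), 0 };
    Insn use  = { "add  r6,r1,r1", R(1),        R(6), 0 };

    printf("swap '%s' and '%s'? %s\n", load.text, add.text,
           can_swap(&load, &add) ? "yes" : "no");   /* yes: independent   */
    printf("swap '%s' and '%s'? %s\n", load.text, use.text,
           can_swap(&load, &use) ? "yes" : "no");   /* no: r1 dependence  */
    return 0;
}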
mash@mips.COM (John Mashey) (07/23/89)
In article <2231@oakhill.UUCP> davet@oakhill.UUCP (David Trissel) writes: >But it seems to me that once you take into consideration the requirement that >a RISC compiler has to support code reorder scheduling to produce decent >output then the code reorderer must be every bit as complex as a 68K peephole >optimizer (for example.) A reorderer has to keep up with live/dead >registers, operand use/defines, have the knowledge of what can be swapped >around to where etc. which is even more complex in many respects than what >my 68K peephole is required to do. How about publishing some data on the size and complexity of the peephole for the 68K and the code reorderer for the 88K? This would seem to be a useful data point, esp. if the rest of the compiler system is similar. Also, estimates of staff-weeks to do this would be really nice, to help start building an element of quantitative data into this ongoing discussion. -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com DDD: 408-991-0253 or 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
preston@titan.rice.edu (Preston Briggs) (07/23/89)
In article <23868@winchester.mips.COM> mash@mips.COM (John Mashey) writes: >How about publishing some data on the size and complexity of the >peephole for the 68K and the code reorderer for the 88K? >This would seem to be a useful data point, esp. if the rest of the >compiler system is similar. Also, estimates of staff-weeks to do this >would be really nice, to help start building an element of quantitative >data into this ongoing discussion. Well, here's a data point, though for an IBM/RT. In an experimental FORTRAN compiler, I do instruction selection and choice of addressing modes via peephole optimization (ala Fraser and Davidson). The code is pretty simple and short, less than 250 lines of C. For the integer portion of the machine, there are only about 70 patterns. These give complete coverage of the (very limited) instruction set. For the floating point portion, I have more than 200 patterns, evenly split between single and double precision. However, these cover only a fraction of the available addressing mode combinations. I do code scheduling before and after register allocation, using the scheme of Gibbons and Muchnik (Sigplan 86). I found it significantly more difficult to program, at least to achieve good space and time performance. The code size is less than 450 lines of C. Unfortunately, I can't give you good guesses about the development time. Less than two years of grad student labor though! A version for the 68K would avoid the scheduling problems, but the number of peephole patterns needed for complete coverage of the instruction set would be prohibitive. (millions). Fraser and Davidson would either use a table-driven interpreter to get complete coverage, or use a large set of test programs and select a useful subset of the available pattterns. These schemes, seperately or together, would however be more difficult to program. Preston