dgh@validgh.com (David G. Hough on validgh) (09/09/90)
Stephen Spackman's recent postings recall a question I asked during the development of the SPARC instruction-set architecture at Sun: since floating-point instructions can be decomposed into simple integer operations, how can they be justified in a RISC architecture? Why is it that they don't run as fast in software? (They don't, and can't, but you might have to try it to convince yourself. All you need to do is look at 64-bit double-precision floating-point add/subtract on a 32-bit RISC architecture.)

Basically I was attacking the idea that RISC = 'a few simple instructions'. This was an overly simple definition anyway. The correct definition of RISC architecture is 'good engineering' in the sense of 'good engineering economy', although not everybody has realized this yet.

The underlying answer to the floating-point question is that while software floating point is limited by the macro-instruction cycle time, and its parallelism by the macro-instruction parallelism potential, a hardware floating-point implementation can run at a faster clock and exploit entirely different kinds of parallelism. For instance, one of the Hot Chips Symposium papers this year mentioned a floating-point addition unit that simultaneously computes the various cases that arise and picks the correct one at the end. And I mean really simultaneously. Although high-performance hardware floating point is not microcoded in the usual sense, it is often implemented in hard-wired micro steps whose clock rate isn't limited by the instruction-fetch bandwidth, as the macro clock cycle rate is.

What tells you which complex multi-macro-cycle instructions (like floating-point ops) are appropriate for inclusion in an instruction-set architecture? One issue that arises, if you want to be commercially successful, is that it's not a good idea to completely overlook any major application area, even if it's less than 1% of some "total" - especially if your competitors didn't overlook it.
Thus MIPS put in integer multiplication before SPARC, and SPARC put in floating-point sqrt before MIPS. Both oversights were remedied in the second-generation instruction-set architectures, although I think MIPS has already implemented sqrt while no SPARC vendor has implemented integer multiplication. Thus people like Silverman point out, correctly, that current SPARC implementations aren't competitive for his kinds of problems; this is embarrassing, and I used to worry that it would bother potential customers - most of whom don't depend on integer multiplication but may not know it - but it doesn't seem to be much of a problem. Sun-3 sales are down in the noise compared to Sun-4, even if Sun-3s can do some integer arithmetic problems faster at the same clock.

Another aspect of commercial success is mass marketability - generalized processors may be cheaper, more cost-effective, and faster than specialized ones because of higher run rates and more attention from vendors in getting them into the latest device technologies.

Spackman's speculation is that a totally different paradigm for non-integer calculations could be more cost-effective than conventional floating point. There are lots of candidate proposals; consult any recent proceedings of the IEEE Computer Arithmetic Symposia. But most of them are content to prove feasibility rather than cost-effectiveness. As mentioned, the issue is good engineering economy. The quantitative approach demonstrated in Hennessy and Patterson is the best basis to start from, but it's much more expensive than thought-experiments posted to news: to really test an idea you have to build a hardware simulator and a good optimizing compiler that properly exploits it, and possibly design some language extensions to express what you can do.
And even that's not enough; to avoid the kinds of embarrassments mentioned above you need to learn as much as possible about what potential customers actually do with computers and what they would do if they could. It's a lifetime undertaking. Besides Patterson, I should mention that Robert Garner, George Taylor, John Mashey, and Earl Killian have helped me sort out what RISC is all about. -- David Hough dgh@validgh.com uunet!validgh!dgh na.hough@na-net.stanford.edu
Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) (09/10/90)
>>>>> On 9 Sep 90 15:17:44 GMT, dgh@validgh.com (David G. Hough on validgh) said:
David> ...since floating-point instructions can be decomposed into simple
David> integer operations, how can they be justified in a RISC
David> architecture? Why is it that they don't run as fast in software?
David> (They don't, and can't, but you might have to try it to convince
David> yourself. All you need to do is look at 64-bit double precision
David> floating-point add/subtract on a 32-bit RISC architecture).
David> Basically I was attacking the idea that RISC = 'a few simple
David> instructions'. This was an overly simple definition anyway. The
David> correct definition of RISC architecture is 'good engineering' in the
David> sense of 'good engineering economy', although not everybody has
David> realized this yet.
Perhaps RISC does indeed stand for Reduced Instruction Set, and "good
engineering" can be, and has been, applied to CISC architectures (notably
the 80486 and the 68040).
Modern processor design is indeed indebted to the RISC pioneers who, in
order to compensate for reduced instruction sets, applied "good
engineering" to come up with some remarkable techniques for parallelism.
_Except for the reduced number of instructions_, these same techniques can
be applied to CISC (albeit some techniques with more difficulty).
If a CISC processor _averages_ close to 1 Cycle Per Instruction, what is
the advantage of removing many of those instructions? Are you claiming a
CISC processor is somehow transformed into a RISC processor because of an
improved CPI, _even though the actual instruction set has not diminished_?
(e.g. the 68040 & 80486)
In a given technology, the physics of the medium limits how fast a switch
can toggle, leaving parallelism as the route for even greater throughput.
It appears Reduced Instruction Sets and parallelism are, to a great degree,
orthogonal. Am I missing something here?
Is it possible higher silicon densities will shift (or have shifted) the
economics of processor design toward more robust parallelized instruction
sets, perhaps even toward "Super CISC"?
Just for discussion,
David> David Hough
David> dgh@validgh.com uunet!validgh!dgh na.hough@na-net.stanford.edu
#include <std/disclaimer.h>
--
Chuck Phillips MS440
NCR Microelectronics Chuck.Phillips%FtCollins.NCR.com
2001 Danfield Ct.
Ft. Collins, CO. 80525 uunet!ncrlnk!ncr-mpd!bach!chuckp
amos@taux01.nsc.com (Amos Shapir) (09/10/90)
[Quoted from the referenced article by Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips)]
>In a given technology, the physics of the medium limits how fast a switch
>can toggle, leaving parallelism as the route for even greater throughput.
>It appears Reduced Instruction Sets and parallelism are, to a great degree,
>orthogonal. Am I missing something here?

What you're missing is that CISC processors are a bitch to parallelize at the instruction level - each instruction, or part thereof, can take a different number of cycles and occupy an unpredictable number of resources; when several processors have to share these resources, a lot of effort must be put into interlocking, synchronisation, etc.
--
Amos Shapir amos@taux01.nsc.com, amos@nsc.nsc.com
National Semiconductor (Israel) P.O.B. 3007, Herzlia 46104, Israel
Tel. +972 52 522255 TWX: 33691, fax: +972-52-558322 GEO: 34 48 E / 32 10 N
mash@mips.COM (John Mashey) (09/15/90)
Newsgroups: comp.arch
Subject: Re: Why floating point hardware: micro-parallelism, micro-cycles
References: <197@validgh.com> <CHUCK.PHILLIPS.90Sep9215755@halley.FtCollins.NCR.COM>
Organization: MIPS Computer Systems, Inc.

There are a bunch of things in the following discussion that could use some clarification, or amplification, so here goes:

In article <CHUCK.PHILLIPS.90Sep9215755@halley.FtCollins.NCR.COM> Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) writes:
>>>>>> On 9 Sep 90 15:17:44 GMT, dgh@validgh.com (David G. Hough on validgh) said:
>David> ...since floating-point instructions can be decomposed into simple
>David> integer operations, how can they be justified in a RISC
>David> architecture? Why is it that they don't run as fast in software?
>David> (They don't, and can't, but you might have to try it to convince
>David> yourself. All you need to do is look at 64-bit double precision
>David> floating-point add/subtract on a 32-bit RISC architecture).
>David> Basically I was attacking the idea that RISC = 'a few simple
>David> instructions'. This was an overly simple definition anyway.
>David> The correct definition of RISC architecture is 'good engineering' in
>David> the sense of 'good engineering economy', although not everybody has
>David> realized this yet.

Dgh has this right about FP (note that on a MIPS, 64-bit FP add = 2 cycles, hard to match by sequences of integer instructions), and it is a good example of what people really do, without the confusion of counting instructions.

>Perhaps RISC does indeed stand for Reduced Instruction Set, and "good
>engineering" can, and has, been applied to CISC architectures (notably the
>80486 and the 68040).

Good engineering can of course be applied to CISCs, and has been, for years. If you track succeeding designs among, for example, the S/360 & VAX families, you will find that the designers have carefully studied the statistics of program behavior, and moved some instructions from microcode into hardware, or vice-versa, or even into software emulation. Examples include the 360/44 (didn't have decimal ops, for example) and the MicroVAX II (also didn't have decimal ops). In addition, successive designs have generally gotten more efficient pipeline designs and memory hierarchies. Certainly the 80486 is a fine implementation, and the 68040 appears to be well thought out, from the published information.

This whole process goes on, in general, amongst all competent computer designers, has for many years, and is not particularly new; nor would I expect any knowledgeable RISC designer to tell you that it was something magic and new. So what's the difference? Let's try again. RISC micros were designed from the beginning:

1) To avoid instruction complexity that would require microcode in general, which often costs you 1.5-2 : 1 if used for the simpler instructions.

2) (In better cases) with a great deal of input from software people. Since RISCs are newer, they have a lot of benefit from hindsight.
Since RISCs were designed when there was considerably more use of high-level languages and (sometimes) optimizing compilers, it was much easier to study these things and feed the results into the design. As it happens, compiler technology has taken leaps in the last decade, and the tradeoffs have changed - not surprising, since the entire nature & structure of the computer business is a lot different from 10 years ago, and unbelievably different from 20 years ago.

3) RISCs usually were designed after it was clear that caches were good things, and that let them make tradeoffs from Day 1 - tradeoffs that were not necessarily appropriate for architectures designed when caches were either unknown or not practical for the part of the design space being attacked. Also in this category are:
	a) Pure code segments
	b) Virtual memory support, if needed
In some cases, older machines allowed programs to write into their code any time they felt like it (like into the immediately succeeding instruction), or they included features that conflicted more with VM than they needed to. All of these can be worked around, but hindsight...

4) RISCs are generally designed to permit clean, simple pipelining, without requiring huge amounts of logic for special cases and such. This is certainly one of the key differences, and again, some of it comes from hindsight.

5) Avoid those instructions that can easily be simulated by sequences of simple ones AT COMPARABLE PERFORMANCE. Include those instructions, NO MATTER HOW "COMPLEX" someone thinks they are, if those instructions achieve performance that cannot be approximated otherwise, and if the tradeoffs are acceptable. (Again: include FP Add, which may well be a huge hunk of hardware, but don't include Translate&Test.)
It is interesting, as H&P point out, that never in the history of computing have a bunch of ISA designs (note: just ISAs, nothing said about architecture in general) done at the same time resembled each other as much as the current crop of RISCs do. (This is where they describe several different chips by showing their relatively minor differences from their DLX.) This doesn't mean there aren't important differences among them, but machines that have 32-bit instructions, load/store orientation, usually 32 integer registers available at once, etc., are a lot more alike than, say: IBM 1401, IBM 7074, and IBM 7094; or S/360, CDC 6600, and Univac 1108; or VAX & DG MV; or Intel 8086, Moto 68000, and NSC 32K.

>Modern processor design is indeed indebted to the RISC pioneers who, in
>order to compensate for reduced instruction sets, applied "good
>engineering" to come up with some remarkable techniques for parallelism.
>_Except for the reduced number of instructions_, these same techniques can
>be applied to CISC (albeit some techniques with more difficulty).

As noted, good engineering practice is good engineering practice, and it didn't start with RISC. However, the reduced number of instructions is the LEAST of the issues, and people keep getting confused by this. Much more relevant are issues like:
	Operand and instruction alignment, especially in VM systems
	Number and especially kinds of addressing modes - multi-level
	indirect, for example
	Number & size of operand fetches/writes caused by an instruction
	Multiple instruction sizes
	Number and kind of side-effects caused by an instruction, especially
	in VM systems
	Exception model

>If a CISC processor _averages_ close to 1 Cycle Per Instruction, what is
>the advantage of removing many of those instructions? Are you claiming a
>CISC processor is somehow transformed into a RISC processor because of an
>improved CPI, _even though the actual instruction set has not diminished_?
>(e.g.
the 68040 & 80486)

Well, so far, 80486s don't appear to average close to 1 CPI, although, as I've pointed out before, only the designers really know. On the other hand, if you approximate CPI by MHz/(Integer-VAX-mips), for machines for which that makes sense, and use SPEC integer = Integer-VAX-mips, you get numbers like these (from "Your Mileage May Vary", Issue 2.0):

	Clock	SPEC-Int	Clock/SPEC	Chip	System
	25	11.2		2.23		SPARC	Sun SS1+ w/s (64K cache)
	25	12.3		2.03		SPARC	Sun SS330 w/s (128K cache)
	25	13.3		1.88		486	Intel-reported (128K)
	25	18.3		1.37		88K	Moto 8864SP (128K)
	25	19.4		1.29		R3000	MIPS Magnum 3000 w/s (64K)
	25	19.7		1.27		R3000	MIPS M/2000, RC3260 (128K)
	25	20.2		1.24		RS/6000	IBM RS/6000 model 530 w/s (72K)

Note, of course, that there is some element of apples & oranges here, as these things are not completely contemporaneous in design, have sometimes rather different silicon budgets, etc. Still, if you believe clock/SPEC is anywhere near close to CPI for these machines (it is for MIPS, but that's the only one I can be sure of), the 486 is still off by a factor of 2. (Mainframes would get closer to 1, I think, and I suspect the '040 will do a little better also.)

Of course, doing a heavily-streamlined implementation of a VAX, X86, 68K, etc., doesn't magically make them RISC architectures, but of course, one shouldn't care much, either (except for marketing :-). The engineers are doing what they should be: making them go faster. Of course, they sometimes have to squeeze harder to get everything in. I have high respect for the implementation cleverness that has often gone into such things, because it is VERY HARD WORK to make ANYTHING go really fast, and people have to live with past decisions. Consider people who build mainframes (IBM & PCMs): they must live with decisions made 25 years ago....
-- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) (09/17/90)
>>>>> On 14 Sep 90 23:41:34 GMT, mash@mips.COM (John Mashey) said:
John> As noted, good engineering practice is good engineering practice,
John> and it didn't start with RISC.
John> However, the reduced number of instructions is the LEAST of the issues,
John> and people keep getting confused with this. Much more relevant are
John> issues like:
John> Operand and instruction alignment, especially in VM systems
John> Number and especially kinds of addressing modes, especially
John> multi-level indirect, for example.
John> Number & size of operand fetches/writes caused by an instruction
John> Multiple instruction sizes
John> Number and kind of side-effects caused by an instruction, especially
John> in VM systems
John> Exception model
Well put. So how about ditching the RISC acronym for a more descriptive
one? (e.g. LOUIS - Load/store One Uniform Instruction Size 1/2 :-)
John> -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
Ditto.
John> UUCP: mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
John> DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700
John> USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
--
Chuck Phillips MS440
NCR Microelectronics Chuck.Phillips%FtCollins.NCR.com
2001 Danfield Ct.
Ft. Collins, CO. 80525 uunet!ncrlnk!ncr-mpd!bach!chuckp
shri@ncst.ernet.in (H.Shrikumar) (09/17/90)
>>>>>> On 9 Sep 90, dgh@validgh.com (David G. Hough on validgh) said:
>David> Basically I was attacking the idea that RISC = 'a few simple
>David> instructions'. This was an overly simple definition anyway.

In the article referenced above, (Chuck.Phillips) adds:
>Perhaps RISC does indeed stand for Reduced Instruction Set, and "good
>engineering" can, and has, been applied to CISC architectures (notably the
>80486 and the 68040).

If only we defined RISC = REGULAR Instruction Set Computers .....
                          ^^^^^^^
(and (1/2 :-) CISC = Confusing Instruction Set Computers ? ;-)
--
shrikumar ( shri@ncst.in )
mash@mips.COM (John Mashey) (09/17/90)
In article <CHUCK.PHILLIPS.90Sep17040818@halley.FtCollins.NCR.COM> Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) writes:
>John> However, the reduced number of instructions is the LEAST of the issues,
>John> and people keep getting confused with this. Much more relevant are
....
>Well put. So how about ditching the RISC acronym for a more descriptive
>one? (e.g. LOUIS - Load/store One Uniform Instruction Size 1/2 :-)

1) Would be nice, but we're probably stuck with it ...
2) And besides, I'd have to rewrite all of my foils that explain why RISC (in the sense of Reduced) has confused everybody :-)

My usual sequence of acronyms is:
	Reduced Instruction Set Computer - not really
	Reusable Information Storage Computer - better (Marty Hopkins of IBM)
	Revolutionary Innovation in Science of Computing - no; Seymour's been at it 25 yrs
	Response to Inherent Shifts in Computer technology - yes
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
mash@mips.COM (John Mashey) (09/18/90)
In article <919@shakti.ncst.ernet.in>, shri@ncst.ernet.in (H.Shrikumar) writes:
> If only we defined RISC = REGULAR Instruction Set Computers .....
>                           ^^^^^^^

No, that doesn't work either. Many CISCs are as regular as RISCs, and some are more so. For instance, the VAX is pretty regular, as is the NSC 32K. Sometimes CISCs have completely regular addressing modes that RISCs don't (i.e., where you include base+index addressing, or auto-increment, on only one side of the load/store pairing).

In any case, part of the point of the last posting was that the acronym doesn't really matter much; it's hardly the case that one can draw a precise line between RISCs and CISCs anyway, and in fact, being frenzied about which label to apply is marketing, anyway. Much more relevant is to study the underlying issues about which kinds of features yield performance and which don't. You will note that Hennessy & Patterson's book doesn't waste a lot of time messing with RISC acronyms...
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086