croft@csusac.csus.edu (Steve Croft) (01/28/91)
I have heard from a non-RISC company and a magazine article that RISC is
reaching its performance limits.  Why would this be so? or not so?
Inquiring minds wanna know...

Steve Croft
stevec@water.ca.gov
rcd@ico.isc.com (Dick Dunn) (01/29/91)
croft@csusac.csus.edu (Steve Croft) writes:
> I have heard from a non-RISC company and a magazine article that RISC is
> reaching its performance limits. Why would this be so? or not so?
> Inquiring minds wanna know...

I have heard from a newspaper that a dog gave birth to a two-headed human
baby.  One of the heads looks just like Elvis.  Why would this be so? or
not so?  Enquirer minds wanna know... :-)

The burden of enumerating the reasons RISC might be reaching its
performance limits rests with the people who claim that's so.  The burden
of proof does not rest on the null hypothesis.  In other words, if someone
tells you RISC is running out of gas, have *them* back up their
statements.

There's no shortage of RISC nay-sayers.  Some of them have valid points,
but many simply have axes to grind.  (You're obviously aware of this,
since you point out that one of your sources is a "non-RISC company.")
Why not find out their explanations and toss them at comp.arch to see
what the folks here have to say about them.

I'm not sure what it means for RISC to reach its performance limits.  It
will obviously take additional approaches to wring out more speed--such as
superscalar or (more likely?) superpipelining--but these really are
additions rather than replacements.  Moreover, these techniques really
build on the RISC approach, in that RISC gives you an easier base to work
from.  (Colwell, while he was at Multiflow, was fond of pointing out that
their VLIWs took a RISC approach.)
-- 
Dick Dunn     rcd@ico.isc.com -or- ico!rcd      Boulder, CO   (303)449-2870
   ...Mr. Natural says, "Use the right tool for the job."
colwell@omews1.intel.com (Robert Colwell) (01/29/91)
In article <1991Jan28.184325.15815@ico.isc.com> rcd@ico.isc.com (Dick Dunn) writes:
>croft@csusac.csus.edu (Steve Croft) writes:
>> I have heard from a non-RISC company and a magazine article that RISC is
>> reaching its performance limits. Why would this be so? or not so?
>> Inquiring minds wanna know...
>
>I have heard from a newspaper that a dog gave birth to a two-headed human
>baby.  One of the heads looks just like Elvis.  Why would this be so? or
>not so?  Enquirer minds wanna know... :-)

The other head looks like Saddam's, oops, wrong newsgroup, sorry...

>The burden of enumerating the reasons RISC might be reaching its
>performance limits rests with the people who claim that's so.  The burden
>of proof does not rest on the null hypothesis.

True enough, but it might be fun to kick around, and it's much closer to
computer architecture than the unix interface discussion was...

>I'm not sure what it means for RISC to reach its performance limits.  It
>will obviously take additional approaches to wring out more speed--such as
>superscalar or (more likely?) superpipelining--but these really are
>additions rather than replacements.  Moreover, these techniques really
>build on the RISC approach, in that RISC gives you an easier base to work
>from.  (Colwell, while he was at Multiflow, was fond of pointing out that
>their VLIWs took a RISC approach.)

It was an easy way to dissect the still-common attitude that just because
the RISC acronym has "reduced" in it, then having a small number of
instructions was necessary and sufficient for RISCdom.  When one's
instruction word is 1024 bits wide, with 28 separate operations specified
thereby, one is forced to think a bit more thoroughly about the essence
of "reducing".  (See?  I'm still fond of pointing this out. :-))

There are still a lot of folks who think RISCs are that set of
architectures which have lots of registers.  I fought that battle for a
while, but goring other people's oxen is a thankless task at best.

Personally, I think arguing about which will "run out of gas" first, RISC
or CISC, is ultimately pointless.  It's the von Neumann paradigm that may
be maturing here.

(Do I have to say this?  "I Don't Speak For Intel.")

Bob Colwell   colwell@mipon2.intel.com   503-696-4550
Intel Corp.  JF1-19
5200 NE Elam Young Parkway
Hillsboro, Oregon 97124
msp33327@uxa.cso.uiuc.edu (Michael S. Pereckas) (01/31/91)
In <1991Jan28.232001.18009@omews63.intel.com> colwell@omews1.intel.com (Robert Colwell) writes:
>In article <1991Jan28.184325.15815@ico.isc.com> rcd@ico.isc.com (Dick Dunn) writes:
>>croft@csusac.csus.edu (Steve Croft) writes:
>>> I have heard from a non-RISC company and a magazine article that RISC is
>>> reaching its performance limits. Why would this be so? or not so?
>>> Inquiring minds wanna know...

Maybe the comparisons between RISC and CISC processors are not completely
fair, since most of the popular CISC machines have to maintain
compatibility with machines a decade or more old, while the RISC people
have been able to design their own instruction sets to suit the internal
design they decided upon.  Maybe the CISC people would show us all a
thing or two if they could design from a clean slate.  I doubt it,
though.
-- 
Michael Pereckas  *  InterNet: m-pereckas@uiuc.edu  *  just another student...
(CI$: 72311,3246)
Jargon Dept.: Decoupled Architecture---sounds like the aftermath of a tornado
jfc@athena.mit.edu (John F Carr) (01/31/91)
In article <1991Jan30.215017.2651@ux1.cso.uiuc.edu> msp33327@uxa.cso.uiuc.edu (Michael S. Pereckas) writes:
>Maybe the comparisons between RISC and CISC processors are not
>completely fair, since most of the popular CISC machines have to
>maintain compatibility with machines a decade or more old, while the
>RISC people have been able to design their own instruction sets to
>suit the internal design they decided upon.  Maybe the CISC people
>would show us all a thing or two if they could design from a clean
>slate.  I doubt it, though.

[I'm limiting the discussion to general purpose processors; not including
supercomputers, controllers, or graphics processors.]

If there are no new CISC designs to compare against, I think that is a
good sign that RISC is superior with current technology.  Sun, IBM, and
Mips have all come out with successful new RISC architectures in the past
5 years.  The most recent successful CISC architectures I can think of
are the 68000 and 8088, which were designed more than 10 years ago.
Motorola and Intel have both come out with new RISC architectures since
then.  I don't believe this is because 5 independent design groups
arbitrarily decided on RISC.  There must be some reason they chose RISC.

Question: Is there some effect of the current state of technology that
makes RISC better than CISC?  Why were the successful designs of the
1970s CISC?

Comment: The IBM RT was designed in the first half of the 1980s.  It may
mark the beginning of the current trend*.  The original version was a bit
over 1 MIP, comparable to competing VAX and 68000 based products.  A few
years later, a new implementation ran at about 3-5 MIPS**, also
comparable to VAX and 68000 designs.  IBM's new chip, designed about 5
years later, runs twice as fast as the 68040.

 * The IBM RT didn't do very well, but I think that is the fault of IBM
   sales, not the design.
** dhrystone v1 says 4.5 MIPS @ 10 MHz, but the RT compiler does a good
   job of eliminating dead code.  There is also a 12 MHz processor.
-- 
John Carr (jfc@athena.mit.edu)
jfc@athena.mit.edu (John F Carr) (02/01/91)
In article <3416@uc.msc.umn.edu> dwm@msc.edu (Don Mears) writes:
>I believe that all new processors are CALLED RISC, no matter how complex
>their design and instruction set is, because they have to use some
>RISC techniques for improved performance.
>IBM calls the RS/6000 a RISC CPU even though it fails most tests for having
>reduced anything.  It has 184 instructions and contains 2,000,000 transistors
>for logic and 4,800,000 transistors for registers and cache.  I have
>seen articles from IBM where they redefined RISC to mean "reduced instruction
>set cycles" in order to be able to call the extremely complex RS/6000 a
>RISC CPU.

I have heard this definition from IBM.  They also said that they don't
think instruction counting is a good measure of complexity.  I agree.  I
think of RISC as describing the instruction set, not the implementation.
Compare the RS/6000 opcodes and addressing modes to the VAX or 68000
instruction set to see the difference.

The RS/6000 instruction set seems reasonable to me, given a decision to
use fixed size 32 bit instructions*.  The most surprising feature to me
was the hardware string support; IBM justified this by pointing out
profiling data showing that a large fraction of system time is spent in
string compares**.  The IBM RS/6000 instruction set is more complex than
the IBM RT or MIPS [23]000***, but it is a lot closer to these machines
than to the common CISC architectures.

If you redefine RISC to describe features of implementations, not
instruction set architectures, then things shift a bit.  I haven't
programmed a 68040 or an 80486 nor have I read details about their
design, so I won't comment further on this.  I would like to hear details
about why these are described as RISC-like.  Is it reduced cycle counts
for operations on registers, or something more?

  * I'm not sure if this was the right choice.  The IBM RT instruction
    set includes 16 bit versions of the most common instructions, and
    this decreases code size a lot (on the other hand, IBM would like to
    sell you more memory).  Given the choice to use 32 bit instructions,
    enough bits become available to combine shift and mask into one
    instruction, for example.
 ** I agree with IBM that optimizing string operations is worth effort.
    The most heavily loaded servers at MIT/Athena spend much of their
    time doing string compares.  Optimizing the C library function
    strcmp() on these servers can improve system response time for
    hundreds of users.
*** I could be wrong about the MIPS products; I don't use them much.
-- 
John Carr (jfc@athena.mit.edu)
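[Editor's note: Carr's footnote about optimizing strcmp() has a purely-software
counterpart that makes the point concrete.  One classic technique (independent
of the RS/6000's hardware string instructions, and not IBM's actual code) is
to compare a machine word at a time and fall back to byte compares near the
end.  A minimal C sketch, assuming 32-bit ints and word-aligned operands; the
function names are invented for illustration.]

```c
/* Detect a zero byte inside a 32-bit word using the classic
   (w - 0x01010101) & ~w & 0x80808080 trick.  Assumes "unsigned int"
   is 32 bits; an illustrative sketch, not a production strcmp. */
static int has_zero_byte(unsigned int w)
{
    return ((w - 0x01010101U) & ~w & 0x80808080U) != 0;
}

/* Word-at-a-time string compare.  Both strings must be word-aligned;
   the routine drops back to byte compares once a word differs or may
   contain the terminating NUL.  It works on either byte order because
   the word loop only tests equality. */
int word_strcmp(const char *a, const char *b)
{
    const unsigned int *wa = (const unsigned int *)a;
    const unsigned int *wb = (const unsigned int *)b;

    while (*wa == *wb && !has_zero_byte(*wa)) {
        wa++;
        wb++;
    }
    a = (const char *)wa;
    b = (const char *)wb;
    while (*a && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}
```

The win is the same one the hardware string support buys: most of the loop
iterations touch four bytes per compare instead of one.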
preston@ariel.rice.edu (Preston Briggs) (02/01/91)
dwm@msc.edu (Don Mears) writes:
>>I believe that all new processors are CALLED RISC, no matter how complex
>>their design and instruction set is, because they have to use some
>>RISC techniques for improved performance.
>>IBM calls the RS/6000 a RISC CPU even though it fails most tests for having
>>reduced anything.  It has 184 instructions and contains 2,000,000 transistors

jfc@athena.mit.edu (John F Carr) writes:
>Compare the RS/6000 opcodes and addressing modes to the VAX or 68000
>instruction set to see the difference.

We might say that a RISC cpu reduces the path length thru the cpu.
So, simplified addressing modes and 1 cycle instructions tend to result.
But wide instructions (VLIW and superscalar) still seem reasonable.

I also think Carr is right in saying "look at the opcodes and addressing
modes."  I.e., look at the ISA (instruction set architecture), not the
implementation.  Many implementations are possible (single-chip,
multi-chip, multiboard), but the ISA remains the same.  A rose is a ...

Preston Briggs
glew@pdx007.intel.com (Andy Glew) (02/01/91)
Why is this a reply to "Re: RISC reaching its limits?!?"?  Well, in the
past I have been an advocate of RISC.  But my new position probably
indicates my current opinion.

Actually, the above is just an excuse to post my new address to this
group.  This isn't really a comp.arch posting, more of net.people, but
since many of my correspondents have contacted me through comp.arch, and
since I've been rather active in this newsgroup in the past, I thought
that I might inform comp.arch of my new whereabouts.

I am: Andy Glew (perhaps better known as "Krazy Glew", although I am
trying to live that down).

My old email address was: aglew@uiuc.edu
My new email address is:  glew@mipon2.intel.com
-- 
Andy Glew, glew@mipon2.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, Hillsboro, Oregon 97124-6497
tom@ssd.csd.harris.com (Tom Horsley) (02/01/91)
>>I believe that all new processors are CALLED RISC, no matter how complex
>>their design and instruction set is, because they have to use some
>>RISC techniques for improved performance.
>>IBM calls the RS/6000 a RISC CPU even though it fails most tests for having
>>reduced anything.  It has 184 instructions and contains 2,000,000 transistors

My two cents worth:

As far as I am concerned, the only important thing about RISC is the
design philosophy behind it:

   1) Let hardware do what hardware is good at, let compilers do what
      compilers are good at.

   2) NEVER slow down the machine simply to add some sort of really neat
      instruction like polynomial evaluation that no one ever uses.

I remember the research reports that showed an improvement in speed on
CISC machines when they re-wrote the compilers to treat them like RISC
machines.  What this really means is that for the first time the compiler
writers realized that maybe it is not a good idea to use a hardware
feature simply because it is implemented in hardware.  If you do careful
performance analysis of the architecture and always choose the speediest
technique to generate code, you find yourself not using a lot of the
features on CISC machines, but you almost always use every feature on a
RISC machine.

As far as I am concerned, "Reduced" means "Reduced extra baggage in the
hardware".  As a compiler writer I don't care how "complex" an
architecture is, as long as I can make effective use of the features and
the engineers have not wasted time, energy, and money (or worst of all:
cycles!) sticking junk that is not profitable in the architecture.

As code generation techniques evolve and improve and hardware techniques
evolve and improve, the correct balance between software and hardware
will continue to change, but the real revolution associated with RISC is
the recognition that there needs to be a balance.  Before RISC no one
ever paid any attention to this fundamental point.  Now that RISC has
arrived, I hope we never lose track of it again.

An example: When we were working on the code generator for our 68k
compilers, a member of our group did a fantastic amount of work
generating a really massive analysis of the addressing modes on the 68030
(if you have ever looked at a 68030 manual, you know that "massive" is
the only word to describe it).  I do not recall the exact percentages,
but *a lot* of the addressing modes are NEVER profitable; there are
alternate instruction sequences that will always run faster to achieve
the same results.  A lot more of them are marginally profitable,
depending on chicken and egg problems with interactions between the
register allocator and code generator.  All the work to implement these
addressing modes in the hardware was totally wasted.
-- 
======================================================================
domain: tahorsley@csd.harris.com       USMail: Tom Horsley
  uucp: ...!uunet!hcx1!tahorsley               511 Kingbird Circle
                                               Delray Beach, FL  33444
+==== Censorship is the only form of Obscenity ======================+
|     (Wait, I forgot government tobacco subsidies...)               |
+====================================================================+
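[Editor's note: Horsley's 68030 finding, that an alternate instruction
sequence can always beat some addressing modes, is the same measurement-driven
reasoning a code generator applies to individual operations.  A hypothetical C
illustration of the idea: strength-reducing a multiply by a small constant, a
transformation whose profitability likewise depends on measured cycle counts
rather than on the feature's existence.  The function names are invented.]

```c
/* A code generator that measures instruction timings may find that a
   multiply by a small constant is beaten by a shift-and-add sequence,
   the same reasoning that rules out an addressing mode which is never
   the fastest way to form an address. */

/* What the source says: */
long times10_mul(long x)
{
    return x * 10;
}

/* What a timing-aware code generator might emit instead:
   x*10 == x*8 + x*2 == (x << 3) + (x << 1). */
long times10_shift(long x)
{
    return (x << 3) + (x << 1);
}
```

On a machine with a slow multiplier the second form wins; on one with a fast
multiplier it may not.  Either way, the choice belongs to measurement, not to
the manual.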
henry@zoo.toronto.edu (Henry Spencer) (02/02/91)
In article <1991Jan31.210053.22476@athena.mit.edu> jfc@athena.mit.edu (John F Carr) writes:
>... I haven't programmed a 68040 or
>an 80486 nor have I read details about their design, so I won't comment
>further on this.  I would like to hear details about why these are described
>as RISC-like...

This is marketing excrementspeak from CISC manufacturers.  Ignore it.
-- 
If the Space Shuttle was the answer,   |  Henry Spencer at U of Toronto Zoology
what was the question?                 |  henry@zoo.toronto.edu   utzoo!henry
billms@eecs.umich.edu (Bill Mangione-Smith) (02/02/91)
In article <1991Feb1.162634.28992@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <1991Jan31.210053.22476@athena.mit.edu> jfc@athena.mit.edu (John F Carr) writes:
>>... I haven't programmed a 68040 or
>>an 80486 nor have I read details about their design, so I won't comment
>>further on this.  I would like to hear details about why these are described
>>as RISC-like...
>
>This is marketing excrementspeak from CISC manufacturers.  Ignore it.

But the terms RISC and CISC are now just marketing excrementspeak.
Certainly you know by now, Henry, that RISC=good, and CISC=bad.

bill
-- 
-------------------------------
Bill Mangione-Smith
billms@eecs.umich.edu
hrubin@pop.stat.purdue.edu (Herman Rubin) (02/02/91)
In article <TOM.91Feb1071345@hcx2.ssd.csd.harris.com>, tom@ssd.csd.harris.com (Tom Horsley) writes:
> >>I believe that all new processors are CALLED RISC, no matter how complex
> >>their design and instruction set is, because they have to use some
> >>RISC techniques for improved performance.
>
> >>IBM calls the RS/6000 a RISC CPU even though it fails most tests for having
> >>reduced anything.  It has 184 instructions and contains 2,000,000 transistors
>
> My two cents worth:
>
> As far as I am concerned, the only important thing about RISC is the
> design philosophy behind it:
>
>    1) Let hardware do what hardware is good at, let compilers do what
>       compilers are good at.
>
>    2) NEVER slow down the machine simply to add some sort of really neat
>       instruction like polynomial evaluation that no one ever uses.

1.  Compilers and languages are designed by those who do not understand
    what intelligent users can get out of hardware.  There is this great
    attempt at optimizing, but leaving out many important things which
    the current designers refuse to recognize.  It is difficult for
    someone brought up on the HLLs to realize what has been left out.

2.  If polynomial evaluation was, as was intended, faster than unrolling
    the loop, it would be heavily used.  Putting in a "neat" instruction
    which it does not pay to use is, of course, stupid.  If programming
    multiplication in terms of addition was faster than using a multiply
    unit, of course a multiply unit should not be included.  The same
    holds for anything else.  This is not RISC philosophy.  It is
    (un)common sense.

Other things, like saying number-crunching is floating-point and not
fixed-point, which seems common in this group, is very short sighted and
indicates the ignorance of the person involved.  Even microcode does not
go far enough back.  Those who cannot see the use of individual bits,
unnormalized floating point, the need for good and fast integer
arithmetic, etc., will continue to contribute to the problem.

How much more hardware would be required if unnormalized floating point
arithmetic were allowed?  I suggest that anyone who thinks that a quick
software solution is available try to produce one.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin(UUCP)
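[Editor's note: for readers who have not met it, the software sequence Rubin
is weighing the hardware instruction against is Horner's rule, which
evaluates a degree-n polynomial in n multiply-adds; the VAX's POLY
instruction implements essentially this recurrence in microcode.  A minimal C
sketch; the function name is invented.]

```c
/* Horner's rule: evaluate c[0] + c[1]*x + ... + c[n]*x^n using n
   multiply-add steps -- the unrolled software sequence that a hardware
   polynomial-evaluation instruction has to beat to earn its place. */
double poly_horner(const double *c, int n, double x)
{
    double r = c[n];
    int i;

    for (i = n - 1; i >= 0; i--)
        r = r * x + c[i];
    return r;
}
```

Rubin's test is exactly the right one: time the instruction against this
loop (unrolled, if the compiler will oblige) and keep whichever wins.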
sef@kithrup.COM (Sean Eric Fagan) (02/03/91)
In article <5040@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes:
>2. If polynomial evaluation was, as was intended, faster than unrolling
>   the loop, it would be heavily used.  Putting in a "neat" instruction
>   which it does not pay to use is, of course, stupid.

More, putting in an instruction which requires other instructions to slow
down, when doing the same thing through other methods is easy and/or
fast, is, of course, stupid.

But you don't seem to agree with that.
-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
hrubin@pop.stat.purdue.edu (Herman Rubin) (02/04/91)
In article <1991Feb03.071949.11297@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
> In article <5040@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes:
> >2. If polynomial evaluation was, as was intended, faster than unrolling
> >   the loop, it would be heavily used.  Putting in a "neat" instruction
> >   which it does not pay to use is, of course, stupid.
>
> More, putting in an instruction which requires other instructions to slow
> down, when doing the same thing through other methods is easy and/or fast,
> is, of course, stupid.
>
> But you don't seem to agree with that.

Putting in more instructions does not necessarily slow down the
instructions, especially if the additional code can be included in a way
that it is decoded while the instruction is taking place.  This would,
for example, be the case if alternate versions of quotient and remainder
were included, or the possibility of non-normalization in floating-point
arithmetic.  It may even save utilities, as there is no basic difference
between integer and floating arithmetic.  I suggest that those who think
these things are fast try coding them.

In some cases, I can find workarounds in an application by doing things
which look insane and use the explicit hardware representation of
quantities, which seems to be almost anathema to the software designers.
But on many machines, mixing integers and floats is a major headache, and
doing fixed-point (not integer) arithmetic is also a horror.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin(UUCP)
jdarcy@encore.com (Jeff d'Arcy) (02/04/91)
tom@ssd.csd.harris.com (Tom Horsley) writes:
-As far as I am concerned, the only important thing about RISC is the design
-philosophy behind it:
-
- 1) Let hardware do what hardware is good at, let compilers do what
- compilers are good at.
-
- 2) NEVER slow down the machine simply to add some sort of really neat
- instruction like polynomial evaluation that no one ever uses.
One of the principles that was repeated many times in Hennessy and Patterson's
book (yeah, I went out and read it) was "optimize for the frequent case". It
makes a lot of sense too, thanks to the 80/20 rule.
--
Jeff d'Arcy, Generic Software Engineer - jdarcy@encore.com
Contents under pressure - keep away from open flames
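[Editor's note: "optimize for the frequent case" applies to software
structure as well as hardware.  A hypothetical C sketch of the principle:
keep the common path of an operation down to a compare and a store, and push
the rare case out of line.  All names are invented, and error handling is
omitted for brevity.]

```c
#include <stdlib.h>

/* "Optimize for the frequent case": a growable byte buffer whose
   common append path (room available) is one compare and one store,
   with the rare grow case moved out of line so the fast path stays
   short enough to inline. */
struct buf {
    char  *data;
    size_t len, cap;
};

/* Rare case: doubling keeps the amortized cost per append constant.
   (A real implementation would check the realloc result.) */
static void buf_grow(struct buf *b)
{
    b->cap = b->cap ? b->cap * 2 : 16;
    b->data = realloc(b->data, b->cap);
}

/* Frequent case. */
void buf_putc(struct buf *b, char c)
{
    if (b->len == b->cap)      /* rare */
        buf_grow(b);
    b->data[b->len++] = c;     /* frequent */
}
```

The 80/20 rule shows up in the arithmetic: after warm-up, roughly one call
in cap takes the slow path, so the cost of the grow code is almost invisible
while the fast path stays minimal.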
chip@tct.uucp (Chip Salzenberg) (02/04/91)
According to hrubin@pop.stat.purdue.edu (Herman Rubin):
>1. Compilers and languages are designed by those who do not understand
>   what intelligent users can get out of hardware.

"They disagree with me, so they must not understand the situation."

>If programming multiplication in terms of addition was faster than using
>a multiply unit, of course a multiply unit should not be included.

This analysis omits dynamic performance measurement: if the addition of a
multiply instruction makes addition 1% slower, then the multiply unit
will be a good idea -- unless dynamic usage statistics show that
additions outnumber multiplications by 100 to one.  The number 100 is
arbitrary; the principle is not.

Of course, Mr. Rubin's dynamic usage statistics differ from almost
everyone else's.  It is too bad that he Hasn't Gotten The Clue.
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>
"I want to mention that my opinions whether real or not are MY opinions."
             -- the inevitable William "Billy" Steinmetz
kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) (02/05/91)
In article <1991Jan31.210053.22476@athena.mit.edu>, jfc@athena.mit.edu (John F Carr) writes:
|> In article <3416@uc.msc.umn.edu> dwm@msc.edu (Don Mears) writes:
|>
|> . . .
|>
|> >IBM calls the RS/6000 a RISC CPU even though it fails most tests for having
|> >reduced anything.
|>
|> . . .
|>
|> The most surprising feature to me was the hardware string support; IBM
|> justified this by pointing out profiling data showing that a large
|> fraction of system time is spent in string compares**.

Also, some of the marketing world's favorite benchmarks are affected
greatly by the speed of string operations.
-----------------------------------------------------------------------------
==   jeff kenton              Consulting at kenton@decvax.dec.com          ==
==   (617) 894-4508                       (603) 881-0451                   ==
-----------------------------------------------------------------------------
jgk@osc.COM (Joe Keane) (02/09/91)
In article <1991Feb1.204421.25992@news.larc.nasa.gov> kludge@grissom.larc.nasa.gov (Scott Dorsey) writes:
>PDP-11 Not RISC

But the PDP-11 has such a simple instruction set.  Every instruction is
one word, and there are only a couple formats.  Having the PC as a
general register eliminates many separate instructions and addressing
modes.  Well OK, it's not really RISC, but i don't think you could call
it a complex instruction set computer.
scu@otter.hpl.hp.com (Shankar Unni) (02/19/91)
> > In article <3416@uc.msc.umn.edu> dwm@msc.edu (Don Mears) writes:
> >
> > The most surprising feature to me was the hardware string support; IBM
> > justified this by pointing out profiling data showing that a large
> > fraction of system time is spent in string compares**.

Also known as the DHRY instruction; named after a very popular but
ultimately meaningless benchmark :-) :-).
-----
Shankar Unni
Hewlett-Packard California Language Lab.     Internet: shankar@cup.hp.com
DISCLAIMER: This response does not represent the official position of, or
statement by, the Hewlett-Packard Company.  It is a personal opinion only.
brandis@inf.ethz.ch (Marc Brandis) (02/20/91)
In article <780025@otter.hpl.hp.com> scu@otter.hpl.hp.com (Shankar Unni) writes:
>> The most surprising feature to me was the hardware string support; IBM
>> justified this by pointing out profiling data showing that a large
>> fraction of system time is spent in string compares**.
>
>Also known as the DHRY instruction; named after a very popular but
>ultimately meaningless benchmark :-) :-).

It was not only the high number of string compares, but generally the
high number of string operations that IBM used as a justification.
Whether Dhrystone is a useful benchmark has been the cause of many flame
wars and I do not want to raise this again, but indeed there are many
programs that contain a lot of string operations.  In one of the articles
of "RISC System/6000 Technology" they are talking about the compiler,
which spends a considerable time in string operations.  I do not have
profiling data, but I guess that many UNIX utilities contain a lot of
string operations too.  And you should not forget commercial data
processing applications, which are known to spend a lot of time in string
operations.

Talking about string operations and Dhrystone: some of the benefit of the
string operations is lost in the C Dhrystone on the IBM RS/6000 because
procedures are called for the string operations.  In the Pascal version,
the string operations are inlined, yielding a significant difference in
the rating.  On the IBM RS/6000 Model 530, I got the following results:

    Dhrystone 2.1, C, all optimizations turned on        46729.0
    Dhrystone 2.1, C, non-optimized                      18764.7
    Dhrystone 2.1, Pascal, all optimizations turned on   64892.9
    Dhrystone 2.1, Pascal, non-optimized                 21061.5

I had a look at the code and it was almost identical (no surprise, the
optimizer is the same), except for the string operations.

Marc-Michael Brandis
Computer Systems Laboratory, ETH-Zentrum (Swiss Federal Institute of Technology)
CH-8092 Zurich, Switzerland
email: brandis@inf.ethz.ch