david@elroy.jpl.nasa.gov (David Robinson) (04/18/91)
[I hope this doesn't start any religious wars about which RISC is "better"]

In looking at the various RISC chips that are on the market and the various integer benchmarks, it appears that all of the SPARC-based machines have a lower rating for an equivalent clock speed. I know that something like SPECint/MHz is not a great measure of worth, but it does pose a couple of questions. Most of the RISC chips (MIPS, HP-PA, RS6000) seem to have a SPECint/MHz ratio of 0.70 - 0.80, while the SPARC systems come in around 0.50.

From a buyer's point of view, if I can buy a 20 SPECint machine for $10K then I don't care if it has an internal clock of 10MHz or 100MHz. But from an architecture point of view I want to know why one chip outperforms another at the same clock speed. Is it an implementation issue such as the fab process used affecting gate delays, or the design process such as complete custom vs cell libraries, or design trade-offs such as pipeline depth? If not an implementation issue, then is it a fundamental architecture issue such as lack of integer multiply and divide, or register windows vs a large flat register file?

Has anyone compared why SPARC tends to run slower at the same clock speed as other RISC chips? Will this be a factor as the clock rate is cranked up to 100MHz and beyond?

-David
--
David Robinson david@elroy.jpl.nasa.gov {decwrl,usc,ames}!elroy!david
Disclaimer: No one listens to me anyway!
"Once a new technology rolls over you, if you're not part of the steamroller, you're part of the road." - Stewart Brand
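A quick arithmetic sketch of the SPECint/MHz comparison in this post, using only the figures quoted in it (a 20 SPECint target, ratios of roughly 0.70 - 0.80 versus 0.50). The function name is made up for illustration, and linear scaling of SPECint with clock rate is an assumption that real machines only approximate, since memory speed does not scale with the CPU clock.

```python
def clock_needed_mhz(target_specint, specint_per_mhz):
    """Clock rate (MHz) needed to reach a SPECint target at a given SPECint/MHz ratio.

    Assumes SPECint scales linearly with clock rate, which is only a rough
    approximation (memory speed does not scale with the CPU clock).
    """
    return target_specint / specint_per_mhz


# Ratios quoted in the post: ~0.70-0.80 for MIPS, HP-PA, RS6000; ~0.50 for SPARC.
TARGET = 20  # the "20 SPECint machine" from the post
for label, ratio in [("ratio 0.80", 0.80), ("ratio 0.70", 0.70), ("ratio 0.50 (SPARC)", 0.50)]:
    print(f"{label}: about {clock_needed_mhz(TARGET, ratio):.1f} MHz for {TARGET} SPECint")
```

On these figures, a 0.50 part would need about 40 MHz to match what a 0.80 part delivers at 25 MHz.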
dik@cwi.nl (Dik T. Winter) (04/18/91)
In article <1991Apr17.183822.7681@elroy.jpl.nasa.gov> david@elroy.jpl.nasa.gov (David Robinson) writes:
> [I hope this doesn't start any religious wars about which RISC is "better"]
> Is it an implementation issue such as the
> fab process used affecting gate delays, or the design process
> such as complete custom vs cell libraries, or design trade offs
> such as pipeline depth?

Some of the differences come from this. Although I cannot find it right now in the Cypress SPARC manual, I know that on the SPARC some instructions take more than 1 clock cycle (stores, for example). I think, however, that that is not the main source of the difference. But at least this difference is not architecturally defined.

> If not an implementation issue then is
> it a fundamental architecture issue such as lack of integer
> multiply and divide or register windows vs large flat register file.

There are also implementation issues working here. I do not think that register windows play a big role; I would assume that overall the use of register windows vs. the use of a flat register file would balance (in some cases one is better, in other cases the other). Lack of instructions can play a role. I do not know how important integer multiplies are for the SPEC marks, but they are decidedly slower on the SPARC (note that some future SPARCs will have integer multiply operations). In this case it is also interesting to see that the new HP PA 1.1 architecture has integer multiply, and that it gives a big speedup on some codes. On the other hand, I would think that the absence of condition codes on the MIPS might have a detrimental effect on the SPEC marks. HPPA might score well because of its calculate+conditional-annul.

Other influences are from the compiler technology. And perhaps more.

In my own, very personal, opinion, I think Acorn has the best ideas with respect to RISC architecture. But that is of course not relevant here! (Well, I think m88k comes second, amd29k third, mips fourth, sparc fifth and hp sixth.) But that is my opinion on architecture; I do not think I would advise buying workstations in this order!
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl
mslater@cup.portal.com (Michael Z Slater) (04/18/91)
David Robinson writes:
>Has anyone compared why SPARC tends to run slower at the same clock
>speed as other RISC chips? Will this be a factor as the clock rate
>is cranked up to 100MHz and beyond?

I think the primary reason SPARC implementations provide less performance at a given clock speed is that they have a single instruction/data bus between the processor and the cache, which puts a bubble in the pipe every time a load or store occurs. Future implementations will have separate on-chip instruction and data caches, and this limitation will go away. So no, I don't think this will be a factor as the clock rate is cranked up, because the higher clock rates will arrive with new implementations.

There is an interesting paper in the ASPLOS-IV proceedings that compares the MIPS and SPARC architectures and claims that SPARC actually executes significantly fewer instructions for a given set of benchmarks (SPEC). This would imply that it does not have an architectural disadvantage.

Does anyone who is familiar with that paper have any comments on it?

Michael Slater, Microprocessor Report mslater@cup.portal.com
preston@ariel.rice.edu (Preston Briggs) (04/18/91)
>David Robinson writes:
>>Has anyone compared why SPARC tends to run slower at the same clock
>>speed as other RISC chips? Will this be a factor as the clock rate
>>is cranked up to 100MHz and beyond?

mslater@cup.portal.com (Michael Z Slater) writes:
...
>There is an interesting paper in the ASPLOS-IV proceedings that compares the
>MIPS and SPARC architecture and claims that SPARC actually executes
>significantly fewer instructions for a given set of benchmarks (SPEC).
...
>Does anyone who is familiar with that paper have any comments on it?

The paper is

    An Analysis of MIPS and SPARC Instruction Set Utilization on
    the SPEC Benchmarks
    Cmelik, King, Ditzel, Kelly
    ASPLOS-IV, 1991

It was presented by Dave Ditzel, of Sun Microsystems.

First off, people should read the paper; it's pretty hard to summarize a summary of a big study! Nevertheless, here are the raw instruction counts:

Benchmark             MIPS           SPARC    M/S
------------------------------------------------------------
spice       21,569,202,673  22,878,017,309   0.94
doduc        1,613,227,089   1,303,276,485   1.24
nasa7        9,256,812,144   6,614,656,686   1.40
matrix300    2,775,967,947   1,693,589,255   1.64
fpppp        2,316,200,144   1,443,008,199   1.61
tomcatv      1,812,691,974   1,626,342,454   1.11
------------------------------------------------------------
FP Geometric Mean                            1.30
------------------------------------------------------------
gcc          1,110,816,041   1,115,986,011   0.96
espresso     2,828,804,443   2,930,860,108   0.97
li           6,022,855,076   4,661,320,853   1.29
eqntott      1,243,469,361   1,321,536,444   0.94
------------------------------------------------------------
Integer Geometric Mean                       1.03
------------------------------------------------------------
Overall Geometric Mean                       1.18

Basically, the MIPS executed more instructions, except on most of the integer benchmarks. The authors note that a fairer comparison, taking into account register window overhead, interlocks, and annulled instructions, still gives a 9% advantage to the SPARC.

The biggest contributor to the difference seemed to be that the MIPS required two instructions to load or store a DP floating-point value. Hence the wide disparity in the FP benchmarks.

Someone (Charlie Price?) from MIPS objected that the study had been carried out with an old generation of the MIPS compilers, and that newer numbers were significantly better for the MIPS. Ditzel admitted this was possible, but noted that they had used what was available on the market when they did the study.

The paper goes on to discuss the effect of libraries, load/store usage, branches, nops, integer ops, and fp ops. Lots of good ideas for both architectures. The appendix contains detailed numbers for each of the benchmarks.

BTW, the numbers were collected with pixie (mips) and spixie (sparc). One of the consistently interesting parts of the conference is the methodology used to perform experiments. Lots of good ideas here.

Preston Briggs
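For readers who want to check the summary rows, here is a minimal sketch that recomputes the geometric means from the per-benchmark M/S ratios in this post. It uses the two-decimal ratios as posted rather than the raw counts; the function and variable names are illustrative only, and the benchmark names follow the SPEC suite.

```python
import math


def geometric_mean(values):
    """Geometric mean: exp of the average of the natural logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# M/S (MIPS/SPARC) instruction-count ratios as posted, rounded to two decimals.
fp_ratios = {"spice": 0.94, "doduc": 1.24, "nasa7": 1.40,
             "matrix300": 1.64, "fpppp": 1.61, "tomcatv": 1.11}
int_ratios = {"gcc": 0.96, "espresso": 0.97, "li": 1.29, "eqntott": 0.94}

fp_mean = geometric_mean(fp_ratios.values())
int_mean = geometric_mean(int_ratios.values())
overall_mean = geometric_mean(list(fp_ratios.values()) + list(int_ratios.values()))

print(f"FP geometric mean:      {fp_mean:.2f}")
print(f"Integer geometric mean: {int_mean:.2f}")
print(f"Overall geometric mean: {overall_mean:.2f}")
```

The geometric mean is the usual choice for averaging ratios, because inverting every ratio (S/M instead of M/S) simply inverts the mean, which is not true of the arithmetic mean.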
mark@hubcap.clemson.edu (Mark Smotherman) (04/18/91)
From article <1991Apr17.183822.7681@elroy.jpl.nasa.gov>, by david@elroy.jpl.nasa.gov (David Robinson):
> Has anyone compared why SPARC tends to run slower at the same clock
> speed as other RISC chips?

As Michael Slater points out in the most recent Microprocessor Report (p. 12, vol. 5, no. 6, April 3, 1991), the current SPARC implementations exhibit lower SPECx/MHz in part because they use a unified I/D cache. The competing implementations from MIPS, HP, and IBM have split caches. Also, the early SPARCstations used only a single 4-byte write buffer. The SS2 design seems to address the write buffer problem but not the unified cache. (Maybe the SPEC configuration parameters should include #write buffers and presence or absence of a cache refill buffer and store back buffer.)

One possible explanation for the less aggressive memory system design seen in SPARC implementations is a reliance on register windows for performance. John Hennessy in the Oct. 1989 IEEE video seminar on RISC processor design noted that the register window approach was thought to substantially lower the load/store traffic (for integers) and could therefore tolerate simplified (i.e., slower) caches. However, Hennessy also noted that SPARC register windows do not help FP load/stores.

An interesting architectural comparison between SPARC and MIPS was given by Sun folks at ASPLOS-IV: R.F. Cmelik, et al., "An analysis of MIPS and SPARC instruction set utilization on the SPEC benchmarks," pp. 290-302. (They concluded that SPARC had the advantage, but the MIPS folks were quick to point out that they used current Sun compilers and year-old MIPS compilers. It will be interesting to see how we chew over this paper in comp.arch!)

The data presented in this paper showed the following MIPS/SPARC ratios for memory traffic:

    int loads (in int benchmarks)   1.07
    int stores                      1.00
    FP loads (in FP benchmarks)     1.92  (i.e. MIPS did twice as many)
    FP stores                       2.49

They suggested that the integer ratios do not show the true value of register windows since the dynamic procedure calling frequency of the SPEC benchmarks is abnormally low (see p. 293). They also noted that MIPS-I lacks dbl. prec. FP load/store but claimed that even disallowing DP FP l/s in SPARC would _not_ significantly reduce the memory traffic ratios; they attributed the large FP ratios to compiler technology.

MIPS has repeated the experiment with current compilers. Let's ask John Mashey to post the new numbers and new ratios or to publish a follow-up article in ACM Computer Architecture Newsletter.
--
Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634
INTERNET: mark@hubcap.clemson.edu UUCP: gatech!hubcap!mark
mash@mips.com (John Mashey) (04/20/91)
In article <1991Apr18.142341.23097@rice.edu> preston@ariel.rice.edu (Preston Briggs) writes: > An Analysis of MIPS and SPARC Instruction Set Utilization on > the SPEC Benchmarks > Cmelik, King, Ditzel, Kelly > ASPLOS-IV, 1991 >Basically, the MIPS executed more instructions, except on most of the integer >benchmarks. The authors note that a fairer comparison, taking into >account register window overhead, interlocks, and annulled instructions >still gives a 9% advantage to the SPARC. >The biggest contributer to the difference seemed to be that the >MIPS required two instructions to load or store a DP floating-point value. >Hence the wide disparity in the FP benchmarks. >Someone (Charlie Price?) from MIPS objected that the study had been carried >out with an old generation of the MIPS compilers, and that newer numbers >were significantly better for the MIPS. Ditzel admitted this was possible, >but noted that they had used what was available on the market when they did >the study. >The paper goes on to discuss the affect of libraries, load/store usage, >branches, nops, integer ops, and fp ops. Lots of good ideas for >both architectures. The appendix contains detailed numbers for each >of the benchmarks. > >BTW, the numbers were collected with pixie (mips) and spixie (sparc). >One the consistantly interesting parts of the conference is the methodology >used to perform experiments. Lots of good ideas here. 1) Definitely lots of good analysis here; it is heartening to see such detailed analyses done on more meaningful programs, and when everybody gets their ASPLOS proceedings, it is worth studying. A subset of the conclusions is fairly accurate, and in particular, the corrections done for plausible extensions/changes are reasonably OK, as well as most of the analysis about the effects of various features. 
***********************************************************************
*Of course, most of the overall conclusions turn out to be wrong, if you
*use contemporary MIPS compilers released the same week as the Sun
*compilers used in the study, as opposed to 1-year old MIPS compilers.
* In general, using the same approach as Sun, I get something like
* a 10% edge for MIPS (but read on to understand the numbers).
***********************************************************************

2) Sun did a perfectly reasonable thing, which is to use the compilers they could buy off-the-shelf from us for the analysis. Unfortunately, what fails to appear in the paper are the release dates of the compilers, i.e., that:
    MIPS 2.10 release was 2Q90
    Sun compilers were announced for release last week

Why that might be relevant: In the last year, I believe that Sun has devoted a large amount of analysis and tuning to their compilers, using SPEC, among other things. This is evidenced by, of course, the exhaustive analysis in this paper, as well as the improvements from 10.0 SPECmarks to 11.8 SPECmarks for 25MHz SPARCs (SS1+ in 2Q90 to SS IPC in 2Q91, with new compilers). A 25MHz MIPS Magnum went from 17.8 to 18.6 in the same time, and it actually turns out that MIPS has done some (but not huge) analysis and tuning for these benchmarks, and some of that shows up in the current compilers (2.20), which happened to have been released within a week of when the new Sun compilers were announced ..... although SPEC numbers for both MIPS and Sun compilers in Beta form were published a while ago by both.

(Note, just for the record, that there is NOTHING wrong with anybody using the SPEC benchmarks to tune their compilers; it's infinitely better for the buyers of computers out there if people do it with SPEC than with Dhrystone or Whetstone.
I.e., if Sun spends a bunch of effort analyzing SPEC to death and tuning things up, more power to them, because the tunings are much more likely to encourage optimizations that will help real programs, and actually DO something for a customer.)

3) The analysis in the paper is well worth reading, and the micro-level discussions are very useful. The conclusions mostly follow from the data; it's just that a 1-year difference in compiler choice still makes a difference, and the general conclusions end up getting wiped out by MIPS' compilers' gain in that year....

4) Let's start with the raw instruction counts (remember that these need to be corrected for nops, annuls, stalls, etc., etc.).

Instruction counts in Millions: Most numbers from Sun paper, Tables A1 and A2
Total   = raw instruction counts
Total+  = SPARC counts with adjustments for window-handling, annulled instructions, load-use stalls (i.e., a little closer comparison)
MIPS220 = equivalent of Total, but with 2.20 compilers rather than 2.10.
Notation of form (n/m) means column n divided by column m.
Most important thing to look at is differences between columns 1 and 4, and 7 and 8.

FIRST TABLE:
COL              1       2       3       4       5       6       7       8
Source         Sun     Sun     Sun    mash    mash     Sun     Sun    mash
Bench         MIPS   SPARC     M/S MIPS220     M/S   SPARC     M/S     M/S
             Total   Total   (1/2)   Total   (4/2)  Total+   (1/6)   (4/6)
---------------------------------------------------------------------------
spice       21,569  22,878    0.94  20,114    0.88  26,516    0.81    0.76
doduc        1,613   1,303    1.24   1,392    1.07   1,335    1.21    1.04
nasa7        9,257   6,615    1.40   9,186    1.39   6,719    1.38    1.37
matrx300     2,776   1,694    1.64   2,339    1.38   1,695    1.64    1.38
fpppp        2,316   1,443    1.61   2,111    1.46   1,472    1.57    1.43
tomcat       1,813   1,626    1.11   1,738    1.07   1,640    1.11    1.06
---------------------------------------------------------------------------
FP Geometric Mean             1.30            1.19            1.25    1.15
---------------------------------------------------------------------------
gcc          1,111   1,155    0.96   1,149    0.99   1,317    0.84    0.87
espresso     2,829   2,931    0.97   2,723    0.93   3,397    0.83    0.80
li           6,023   4,661    1.29   5,938    1.27   6,131    0.98    0.97
eqntott      1,243   1,322    0.94   1,244    0.93   1,458    0.85    0.85
---------------------------------------------------------------------------
Integer Geometric Mean        1.03            1.02            0.88    0.87
---------------------------------------------------------------------------
Overall Geometric Mean        1.18            1.12            1.09    1.03
---------------------------------------------------------------------------

Now, with this data, I draw several conclusions, some of which are rather different than those given in the paper. Again, please note that the above is not PERFORMANCE data, but instruction-count (Total) or Sun-adjusted-instruction-count (Total+) data, or my data on MIPS-1 with current compilers. I'm particularly looking at the rightmost column above:

1) MIPS uses more instructions on floating-point [mostly due to the lack of 64-bit FP load/stores, which is especially crucial to the linear-algebra and related benchmarks]. The FP code improved somewhat from 2.10 to 2.20, i.e., some of the effects seen in the paper were from compilers where the benchmarks had barely been looked at in any serious fashion...
2) MIPS usually uses fewer instructions on integer programs, and definitely uses fewer instruction-equivalents (i.e., from Total+). The range is from .80 to .97, with a 95% confidence interval of [0.76 to 0.99].

4) Now, let's look at the analysis at the end of the paper, which is quite interesting (and is, in fact, a good example of the kinds of analyses architects do or should do when figuring out how to design and/or evolve an architecture). What they did was: start with the MIPS Total/Total+ counts (always equal), and the SPARC Total+ counts, and then estimate the effects of adding instructions to each architecture, to make them architecturally neutral, and then compare, mostly to compare compilers, I guess.

Columns 1-3 were from Sun, and the MIPS Total+~ was computed from the earlier Total by giving MIPS a 64-bit load/store (probably the dominant effect). SPARC got the int<->fp improvements. Column 4 is computed from the actual numbers for MIPS-2 machines {R6000, R4000}, which have the 64-bit load/stores hypothesized by Sun, plus load-interlocks and annulled-branches, and sqrt, and a few other things.

It would have been interesting to have seen the Sun numbers, as modified not just by the int<->fp change, but also by the integer mul/div and anything else coming in the next-generation SPARCs; however, the paper makes the case that the mul/div issue is not a large one for this set of benchmarks (maybe on the order of a percent, at most, depending on the benchmarks, especially as the multiply left in the inner loop of matrix300 on MIPS has disappeared).
SECOND TABLE
COL              1       2       3       4       5       6
Source         Sun     Sun     Sun    mash    mash    mash
Bench         MIPS   SPARC     M/S  MIPS-2     M/S     M/S of adjusted
           Total+~ Total+~   (1/2)   Total   (4/2)     (col 4 of prev tab-adj)/2
---------------------------------------------------------------------
spice       20,211  25,095    0.81  18,429    0.73    0.75
doduc        1,358   1,301    1.04   1,056    0.81    0.87
nasa7        6,927   6,682    1.04   6,454    0.97    1.03
matrx300     2,126   1,695    1.25   2,339    0.99    1.00
fpppp        1,616   1,440    1.12   1,319    0.92    0.98
tomcat       1,377   1,607    0.86   1,283    0.80    0.81
---------------------------------------------------------------------
FP Geometric Mean             1.02            0.86    0.90
---------------------------------------------------------------------
gcc          1,111   1,262    0.88   1,122    0.89
espresso     2,829   3,397    0.83   2,648    0.81
li           6,016   5,626    1.07   5,504    0.98
eqntott      1,243   1,371    0.91   1,247    0.91
---------------------------------------------------------------------
Integer Geometric Mean        0.93            0.90
---------------------------------------------------------------------
Overall Geometric Mean        0.97            0.88

Now, more notes:

1) It might be interesting to recalibrate the Sun-computed overall Geo Mean above to allow for the MIPS 2.20 compilers. I haven't done this in detail, but note that going from 2.10 (on which columns 1 and 3 above are based) to 2.20 changed the Geo Means of Total+ in the earlier table from 1.10 to 1.03, with most of the effect being in the FP area. Thus, I'd guess that the overall number as adjusted by Sun would have come out around .94, rather than .97.
2) As noted above, it would have been really fascinating to have seen the actual next-generation SPARC numbers as column Total+~ above (but I realize that would have been a bit much to ask, since while MIPS-2 changes are guessable, having been public for over a year, the SPARC changes aren't yet public, I think (?)).

3) So, to summarize, in the paper's conclusions, it says:

a) "MIPS typically executes 18% more user-level instructions than SPARC"
This could be rewritten as "MIPS typically executes 12% more user-level instructions than SPARC", although in any case, one must be very careful about the word "typical" here, in that one can also say that MIPS usually executes fewer instructions on integer code, and more on FP code. ("typical" is not a statistical term :-)

b) "A fairer comparison which takes into account register-window overhead, load-use interlocks, and annulled instructions, still shows a 9% advantage for SPARC"
turns into: A fairer comparison shows a 3% advantage for SPARC.

c) "most significant differences are SPARC's DP load/store, and MIPS compare-and-branch"
Probably so.

d) "When architectural factors were factored out, the differences due to combined compiler/library effects were so small (3%) that neither MIPS nor SPARC has any significant advantage."
Wellll... include MIPS 2.20 compilers:
 - the 4 integer benchmarks range from 3% less to 20% less, with Geo mean = 13% less.
 - the 6 FP benchmarks are a little harder to figure, but if you take the 2.20 compiler numbers (Col 4 of first table), and subtract Sun's adjustments for 64-bit, and divide result by Sun's Total+~ column, you get the ratios shown in Column 6 above, which, not surprisingly, are in between what Sun computed, and what we actually get in MIPS-2. In any case this yields 10% less, using Sun's rules.

Anyway, one must be VERY careful to avoid over-generalization from a small number of data points, and note, all of this was instruction COUNTS, not cycles, and cycles are always more important.
Nevertheless, this kind of analysis in the Sun paper is very useful to compiler writers (to answer questions like: why are THEY beating us in that benchmark on instruction counts? are they doing something special? is it architecture, or our compilers missing an optimization?) However, the major conclusion: that the two are indistinguishable, is wrong, if you think 10% is distinguishable...
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94088-3650
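A small sketch of how the rightmost column of the first table and the quoted integer interval can be reproduced. Column 8 is column 4 divided by column 6, as the post's (n/m) notation says. The post does not state how the 95% confidence interval was computed; the code assumes a two-sided Student's t interval on the arithmetic mean of the four rounded integer ratios, which happens to give the posted [0.76 to 0.99]. Variable names are illustrative only.

```python
import math
import statistics

# Integer benchmarks, instruction counts in millions, from the first table:
# column 4 (MIPS with 2.20 compilers) and column 6 (SPARC Total+).
mips220 = {"gcc": 1149, "espresso": 2723, "li": 5938, "eqntott": 1244}
sparc_total_plus = {"gcc": 1317, "espresso": 3397, "li": 6131, "eqntott": 1458}

# Column 8 = column 4 / column 6, rounded to two decimals as in the table.
ratios = [round(mips220[b] / sparc_total_plus[b], 2) for b in mips220]

# Geometric mean of the ratios (the "Integer Geometric Mean" row).
geo_mean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Assumed method for the interval: two-sided 95% Student's t interval on the
# arithmetic mean. T_975_DF3 is the tabulated t value for 3 degrees of freedom.
T_975_DF3 = 3.182
mean = statistics.mean(ratios)
half_width = T_975_DF3 * statistics.stdev(ratios) / math.sqrt(len(ratios))

print("ratios:", ratios)
print(f"geometric mean: {geo_mean:.2f}")
print(f"95% interval: [{mean - half_width:.2f}, {mean + half_width:.2f}]")
```

With only four data points the interval is wide, which fits the post's own warning about over-generalizing from a small number of benchmarks.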
jvm@hpfcso.FC.HP.COM (Jack McClurg) (04/22/91)
> There is an interesting paper in the ASPLOS-IV proceedings that compares the
> MIPS and SPARC architecture and claims that SPARC actually executes
> significantly fewer instructions for a given set of benchmarks (SPEC).
> This would imply that it does not have an architectural disadvantage.
>
> Does anyone who is familiar with that paper have any comments on it?
>
> Michael Slater, Microprocessor Report mslater@cup.portal.com
> ----------

One thing that struck me as strange about this paper was the data for the li benchmark. SPARC executes 4.7G instructions while MIPS executes 6G instructions (29% more). Recent published SPEC results show a 25MHz R3000 executes li in 270.5 seconds while a 40MHz SPARC executes li in 267.7 seconds. This may be explained by the time difference (MIPS has improved their compilers?), but I do not think so. The authors of the paper made no attempt to explain an obvious contradiction between what could be concluded from the paper (SPARC runs li much faster) and reality (MIPS runs li much faster at the same clock speed). I could postulate that since li has greater stack depth than the other benchmarks, it shows some architectural differences between SPARC and MIPS very plainly. I was disappointed that the li results were not explained better.

Although the paper is clearly intended to concentrate on instruction utilisation, I thought it would have been improved with cycles per instruction data for each of the benchmarks. This is important for benchmarks like matrix300, where TLB misses dominate execution time and hence instruction set use is not very important.

The two things above were the few problems I had with the paper. Overall I thought that the paper was excellent. It has myriad data. The explanations for the data are quite good.

Jack McClurg
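The li figures in this post are enough for a rough cycles-per-instruction estimate, sketched below. This is back-of-the-envelope only: it assumes the whole elapsed time is CPU time at the stated clock, and it pairs the paper's instruction counts with SPEC run times that may come from different compiler releases. The function name is illustrative.

```python
def implied_cpi(seconds, clock_mhz, instructions):
    """Average cycles per instruction implied by run time, clock rate, and instruction count."""
    cycles = seconds * clock_mhz * 1e6
    return cycles / instructions


# Figures quoted in the post for the li benchmark.
mips_cpi = implied_cpi(seconds=270.5, clock_mhz=25, instructions=6.0e9)
sparc_cpi = implied_cpi(seconds=267.7, clock_mhz=40, instructions=4.7e9)

print(f"25MHz R3000: about {mips_cpi:.2f} cycles per instruction")
print(f"40MHz SPARC: about {sparc_cpi:.2f} cycles per instruction")
```

On these numbers the SPARC machine spends roughly twice as many cycles per instruction on li, which is the gap between instruction counts and run time that the post is pointing at.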
lewine@cheshirecat.webo.dg.com (Donald Lewine) (04/23/91)
In article <2500@spim.mips.COM>, mash@mips.com (John Mashey) writes:
|> [5 pages of comparison of Sun compilers and MIPS compilers deleted.]
|> However, the major conclusion: that the two are indistinguishable,
|> is wrong, if you think 10% is distinguishable...
|> --
|> -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>

The last sentence is the point! MIPS marketing (at least as retold by DEC) is that the MIPS compiler technology provides a *HUGE* advantage over other RISC architectures. If the difference is +/- 10%, that is a whole different story.

--------------------------------------------------------------------
Donald A. Lewine (508) 870-9008 Voice
Data General Corporation (508) 366-0750 FAX
4400 Computer Drive. MS D112A
Westboro, MA 01580 U.S.A.
uucp: uunet!dg!lewine Internet: lewine@cheshirecat.webo.dg.com
khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (04/24/91)
In article <1991Apr23.140140.27847@webo.dg.com> lewine@cheshirecat.webo.dg.com (Donald Lewine) writes:
.... MIPS marketing (at least as retold
by DEC) is that the MIPS compiler technology provides a *HUGE*
^^^^^^
advantage over other RISC architectures. If the difference is
+/- 10%, that is a whole different story.
That was the old _DEC_ story. DEC's newest compiler release (press
release info) claimed dramatic improvements with the new compilers ...
which are a product of DEC not based on the MIPSco base.
Marketing departments reserve the right to change stories as
circumstances dictate ;>
--
----------------------------------------------------------------
Keith H. Bierman keith.bierman@Sun.COM| khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33 | (415 336 2648)
Mountain View, CA 94043
meissner@osf.org (Michael Meissner) (04/24/91)
In article <KHB.91Apr23193227@chiba.Eng.Sun.COM> khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:
|
| In article <1991Apr23.140140.27847@webo.dg.com> lewine@cheshirecat.webo.dg.com (Donald Lewine) writes:
|
| .... MIPS marketing (at least as retold
| by DEC) is that the MIPS compiler technology provides a *HUGE*
| ^^^^^^
| advantage over other RISC architectures. If the difference is
| +/- 10%, that is a whole different story.
|
| That was the old _DEC_ story. DEC's newest compiler release (press
| release info) claimed dramatic improvements with the new compilers ...
| which are a product of DEC not based on the MIPSco base.
|
| Marketing departments reserve the right to change stories as
| circumstances dictate ;>

Maybe you should actually read the release. It says that they put a NEW front end on, and that the back end was provided by MIPSco. Also, only Fortran had dramatic improvements; C was pretty much unchanged.
--
Michael Meissner email: meissner@osf.org phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142
Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?
baum@apple.com (Allen Baum) (04/24/91)
>
> > There is an interesting paper in the ASPLOS-IV proceedings that compares the
> > MIPS and SPARC architecture and claims that SPARC actually executes
> > significantly fewer instructions for a given set of benchmarks (SPEC).
> > This would imply that it does not have an architectural disadvantage.
> >
> > Does anyone who is familiar with that paper have any comments on it?
> >
> > Michael Slater, Microprocessor Report mslater@cup.portal.com
> > ----------
>

My 2 cents:

The numbers presented by Sun indicated that they do use far fewer loads/stores than MIPS, which they attribute to reg. windows. On the other hand, this was not enough overall to change # of cycles to favor SPARC.

They indicate problems with long floats (lack of double ld/st) which hurt FP performance a lot (presumably fixed in the R4000).

They purported to show that annulled branches were superior to inserting NOPs after branching. I believe that argument is, well, flawed & misleading, shall we say. Although inserting NOPs causes more insts. to be executed, it doesn't cause more cycles to be spent. Sun argues that future versions of SPARCs won't spend an extra cycle executing after annulling branches. I say that whatever technique they use to do that can be applied to NOPs after a branch (not to mention the addition of annulling branches to the MIPS architecture, which may be in the R4000).

They indicated a big win for MIPS by allowing FP<->Int reg moves instead of going through memory.

Sun used an unreleased (at the time of the study) compiler vs. a released compiler from MIPS.

Overall: instead of +20% for SPARC, a -12% to -25%. (Sun's own numbers showed about -12%. With the equivalent generation of MIPS compilers, it could be as much as -25%.)

Is +-10% meaningful? Is +-20%? It depends who you ask. Supercomputer vendors might go to great lengths for a couple of percent. When you get to workstation prices, it's a bit more academic.
craig@netcom.COM (Craig Hansen) (04/24/91)
In article <KHB.91Apr23193227@chiba.Eng.Sun.COM>, khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:
> Marketing departments reserve the right to change stories as
> circumstances dictate ;>

IMHO, the Mips folk have been too gentle in their clarification of the "results" of the paper. The single largest difference in the results can be attributed to the fact that the Mips implementation uses two instructions, that take two cycles, to load or store a double-precision floating-point value, while the Sun implementations use one instruction, that takes three or four cycles, to do the same operation. To express this as an advantage for Sun is intellectually dishonest. The paper itself says:

"If MIPS had double-precision loads and stores, it could reduce its instruction count by 19% in the SPEC FP benchmarks."

The basic premise of the paper, to compare "architectures" rather than "implementations", rests only on the following reason:

"...because implementations change more frequently than instruction set architectures."

There is no further justification given, and it turns out that this reasoning is false. The 2.10 compiler release used to examine the MIPS code is capable of generating the "MIPS-II" instruction set, which includes load and store double. Mips has changed the instruction set architecture just as frequently as the implementation, with the exception of the R2000-to-R3000 change, which was a pin-compatible modification that did not change the width of the data interfaces.

Let me close by admitting my own obvious biases: I am a former Mips employee who had a hand in the development of both architecture and implementation there. I am typing this message on a SPARCstation 2.
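To make the instruction-count versus cycle-count point concrete, here is a toy tally using only the per-access figures given in this post. It reads "two instructions, that take two cycles" as two cycles in total for the pair, which is an assumption; the access count N and the function name are hypothetical.

```python
def dp_access_cost(n_accesses, instrs_per_access, cycles_per_access):
    """Instructions and cycles spent on n double-precision loads or stores."""
    return n_accesses * instrs_per_access, n_accesses * cycles_per_access


N = 1_000_000  # hypothetical number of double-precision loads/stores

# Per the post: MIPS uses two instructions taking two cycles in total;
# the Sun implementations use one instruction taking three or four cycles.
mips_instrs, mips_cycles = dp_access_cost(N, instrs_per_access=2, cycles_per_access=2)
sparc_instrs, sparc_cycles_lo = dp_access_cost(N, instrs_per_access=1, cycles_per_access=3)
_, sparc_cycles_hi = dp_access_cost(N, instrs_per_access=1, cycles_per_access=4)

print(f"MIPS:  {mips_instrs} instructions, {mips_cycles} cycles")
print(f"SPARC: {sparc_instrs} instructions, {sparc_cycles_lo} to {sparc_cycles_hi} cycles")
```

Counted by instructions, the one-instruction form looks twice as good; counted by cycles on these figures, it is 1.5 to 2 times worse.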
khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (04/25/91)
In article <MEISSNER.91Apr24103752@curley.osf.org> meissner@osf.org (Michael Meissner) writes:
..
Maybe you should actually read the release. It says that the put a
NEW front end on, and that the back end was provided by MIPSco. Also,
only Fortran had dramatic improvements, C was pretty much unchanged.
The material I had wasn't that specific. I stand corrected.
--
----------------------------------------------------------------
Keith H. Bierman keith.bierman@Sun.COM| khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33 | (415 336 2648)
Mountain View, CA 94043
mash@mips.com (John Mashey) (04/25/91)
In article <51942@apple.Apple.COM> baum@apple.com (Allen Baum) writes:
>The numbers presented by Sun indicated that they do use far fewer loads/stores
>than MIPs, which they attribute to reg. windows. On the other hand, this
>was not enough overall to change # of cycles to favor SPARC.
>They indicate problems with long floats (lack of double ld/st) which hurt
>FP performance a lot (presumably fixed in the R4000)

Yes, and in R6000.

>
>branch (not to mention the addition of annulling branches to the MIPs
>architecture which may be in the R4000)

Yes, and in R6000.

>
>Sun used an unreleased (at the time of the study) compiler vs. a released
>compiler from Mips.

P.S. I have no problem with them using an unreleased compiler, and certainly they couldn't get our unreleased compiler. It would have been nice had such a caveat appeared in the paper .... because, this is a perfect example of the necessity of appropriate caveats to obtain defendable conclusions, i.e., the paper in question analyzed the question "How does SPARC with current compilers compare with MIPS with 1-year old compilers?" and got some answers. I analyzed the question "How do MIPS and SPARC compare with both having current compilers?" and got a different answer.

>Overall: instead of +20% for SPARC, a -12%->25%. (Suns own numbers showed
>about -12%. With the equivalent generation of Mips compilers, it could
>be as much as -25%.
>Is +-10% meaningful? Is +-20%? It depends who you ask. Supercomputer vendors
>might go to great lengths for a couple of percent. When you get to workstation
>prices, its a bit more academic.

Note that the paper was about INSTRUCTION COUNTS, which is not the same as performance. Also, note that people used to fight pretty hard over 10-15% differences on 2-mips products, and if you get 10-15% on 20 ... or 50, you're talking about adding/subtracting a 386 or 68030's worth of performance, which is pretty amusing, when you think about it.
Why it may or may not be relevant to workstations is that sometimes interactive code has some minimum requirement for speed, or else people won't tolerate it. If a compiler boost makes the difference, then it's very relevant.

Finally, once again, it is good to see such analyses of substantial chunks of code, rather than tinker-toys. Everybody should be reminded to include careful caveats about the dates of software, to avoid making general conclusions that are clearly invalidated by software releases that happen simultaneously with the paper.
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94088-3650