[comp.arch] SPARC implementation or architecture

david@elroy.jpl.nasa.gov (David Robinson) (04/18/91)

[I hope this doesn't start any religious wars about which RISC is "better"]

In looking at the various RISC chips that are on the market and the
various integer benchmarks, it appears that all of the SPARC based
machines have a lower rating for an equivalent clock speed.  I know that
something like SPECint/MHz is not a great measure of worth, but it
does pose a couple of questions.

Most of the RISC chips (MIPS, HP-PA, RS6000) seem to have a
SPECint/MHz ratio of 0.70 - 0.80, while the SPARC systems come
in around 0.50.  From a buyer's point of view, if
I can buy a 20 SPECint machine for $10K then I don't care if it
has an internal clock of 10MHz or 100MHz.  But from an architecture
point of view I want to know why one chip outperforms another at
the same clock speed.  Is it an implementation issue, such as the
fab process used affecting gate delays, or the design process
(full custom vs. cell libraries), or design trade-offs
such as pipeline depth?  If not an implementation issue, then is
it a fundamental architecture issue, such as the lack of integer
multiply and divide, or register windows vs. a large flat register file?
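
For concreteness, the metric in question is just the SPEC rating divided by the clock rate; a quick sketch with illustrative numbers only (not measurements from any particular machine):

```python
def spec_per_mhz(specint, clock_mhz):
    """SPECint rating normalized by clock rate: a rough 'work per cycle' figure."""
    return specint / clock_mhz

# Illustrative only: the same 20-SPECint machine looks very different per-MHz
# depending on whether it needs a 25 MHz or a 40 MHz clock to get there.
print(spec_per_mhz(20, 25))   # 0.8  (the MIPS/HP-PA/RS6000 range cited above)
print(spec_per_mhz(20, 40))   # 0.5  (the SPARC range cited above)
```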

Has anyone compared why SPARC tends to run slower at the same clock
speed as other RISC chips?  Will this be a factor as the clock rate
is cranked up to 100MHz and beyond?

	-David
-- 
David Robinson	david@elroy.jpl.nasa.gov 	{decwrl,usc,ames}!elroy!david
Disclaimer: No one listens to me anyway!
"Once a new technology rolls over you, if you're not part of the steamroller,
 you're part of the road." - Stewart Brand

dik@cwi.nl (Dik T. Winter) (04/18/91)

In article <1991Apr17.183822.7681@elroy.jpl.nasa.gov> david@elroy.jpl.nasa.gov (David Robinson) writes:
 > [I hope this doesn't start any religious wars about which RISC is "better"]
 >                        Is it an implementation issue such as the
 > fab process used affecting gate delays, or the design process
 > such as complete custom vs cell libraries, or design trade offs
 > such as pipeline depth?
Some of the differences come from this.  Although I cannot find it right now
in the Cypress SPARC manual, I know that on the SPARC some instructions take
more than 1 clock cycle (stores, for example).  I think, however, that that is
not the main source of the difference.  But at least this difference is not
architecturally defined.
 >                          If not an implementation issue then is
 > it a fundamental architecture issue such as lack of integer
 > multiply and divide or register windows vs large flat register file.
There are also implementation issues working here.  I do not think that
register windows play a big role; I would assume that overall the use of
register windows vs. the use of a flat register file would balance out (in
some cases one is better, in other cases the other).  Lack of instructions
can play a role.  I do not know how important integer multiplies are for
the SPEC marks, but they are decidedly slower on the SPARC (note that some
future SPARCs will have integer multiply operations).  In this case it is
also interesting to see that the new HP PA 1.1 architecture has integer
multiply, and that it gives a big speed-up on some codes.
On the other hand, I would think that the absence of condition codes on the
MIPS might have a detrimental effect on the SPEC marks.  HP-PA might score well
because of its calculate+conditional-annul.

Other influences are from the compiler technology.  And perhaps more.

For my own, very personal, opinion, I think Acorn has the best ideas with
respect to RISC architecture.  But that is of course not relevant here!  (Well, I
think m88k comes second, amd29k third, mips fourth, sparc fifth and hp sixth.)
But that is my opinion on architecture, I do not think I would advise buying
workstations in this order!
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

mslater@cup.portal.com (Michael Z Slater) (04/18/91)

David Robinson writes:

>Has anyone compared why SPARC tends to run slower at the same clock
>speed as other RISC chips?  Will this be a factor as the clock rate
>is cranked up to 100Mhz and beyond?

I think the primary reason SPARCs provide less performance at a given clock
speed is that they have a single instruction/data bus between the processor
and the cache, which puts a bubble in the pipe every time a load or store
occurs.  Future implementations will have separate on-chip instruction and
data caches, and this limitation will go away.
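
A toy cycle-count model of that structural hazard (all numbers assumed for illustration): with one bus shared by instruction fetch and data access, every load or store steals a fetch slot and injects one bubble.

```python
def cycles(n_insts, mem_frac, split_caches):
    """Toy pipeline model: 1 cycle per instruction, plus 1 bubble per
    load/store when instruction fetch and data access share a single port."""
    bubbles = 0 if split_caches else n_insts * mem_frac
    return n_insts + bubbles

# Assume ~30% of instructions are loads/stores (an illustrative figure):
unified = cycles(1_000_000, 0.30, split_caches=False)  # 1,300,000 cycles
split   = cycles(1_000_000, 0.30, split_caches=True)   # 1,000,000 cycles
print(unified / split)                                 # 1.3x CPI penalty
```

Under that assumed load/store mix, the shared bus alone costs about 30% in cycles per instruction, which is roughly the size of the SPECint/MHz gap being discussed.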

So no, I don't think this will be a factor as the clock rate is cranked up,
because this will happen on new implementations.

There is an interesting paper in the ASPLOS-IV proceedings that compares the
MIPS and SPARC architectures and claims that SPARC actually executes
significantly fewer instructions for a given set of benchmarks (SPEC).
This would imply that it does not have an architectural disadvantage.

Does anyone who is familiar with that paper have any comments on it?

Michael Slater, Microprocessor Report  mslater@cup.portal.com

preston@ariel.rice.edu (Preston Briggs) (04/18/91)

>David Robinson writes:
>>Has anyone compared why SPARC tends to run slower at the same clock
>>speed as other RISC chips?  Will this be a factor as the clock rate
>>is cranked up to 100Mhz and beyond?

mslater@cup.portal.com (Michael Z Slater) writes:
...
>There is an interesting paper in the ASPLOS-IV proceedings that compares the
>MIPS and SPARC architecture and claims that SPARC actually executes
>significantly fewer instructions for a given set of benchmarks (SPEC).
...
>Does anyone who is familiar with that paper have any comments on it?

The paper is

	An Analysis of MIPS and SPARC Instruction Set Utilization on
	the SPEC Benchmarks
	Cmelik, King, Ditzel, Kelly
	ASPLOS-IV, 1991

It was presented by Dave Ditzel, of Sun Microsystems.
First off, people should read the paper; it's pretty hard to summarize
a summary of a big study!  Nevertheless, here are the raw instruction
counts:

	Benchmark		MIPS		SPARC		M/S
	------------------------------------------------------------
	spice		   21,569,202,673   22,878,017,309	0.94
	doduc		    1,613,227,089    1,303,276,485      1.24
	nasa7		    9,256,812,144    6,614,656,686	1.40
	matrix300	    2,775,967,947    1,693,589,255	1.64
	fpppp		    2,316,200,144    1,443,008,199	1.61
	tomcat		    1,812,691,974    1,626,342,454	1.11
	------------------------------------------------------------
	FP Geometric Mean					1.30
	------------------------------------------------------------
	gcc		    1,110,816,041    1,155,986,011	0.96
	espresso	    2,828,804,443    2,930,860,108	0.97
	li		    6,022,855,076    4,661,320,853	1.29
	eqntott		    1,243,469,361    1,321,536,444	0.94
	------------------------------------------------------------
	Integer Geometric Mean					1.03
	------------------------------------------------------------
	Overall Geometric Mean					1.18
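
(As a sanity check on the arithmetic, the geometric means can be reproduced directly from the M/S ratio column; a short sketch, with the ratios transcribed from the table above:)

```python
import math

def geo_mean(ratios):
    """Geometric mean: the n-th root of the product of n ratios."""
    return math.prod(ratios) ** (1 / len(ratios))

fp  = [0.94, 1.24, 1.40, 1.64, 1.61, 1.11]   # spice .. tomcat
intg = [0.96, 0.97, 1.29, 0.94]              # gcc .. eqntott

print(round(geo_mean(fp), 2))        # 1.30
print(round(geo_mean(intg), 2))      # 1.03
print(round(geo_mean(fp + intg), 2)) # 1.18
```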

Basically, the MIPS executed more instructions, except on most of the integer
benchmarks.  The authors note that a fairer comparison, taking into
account register-window overhead, interlocks, and annulled instructions,
still gives a 9% advantage to the SPARC.

The biggest contributor to the difference seemed to be that the
MIPS required two instructions to load or store a DP floating-point value;
hence the wide disparity in the FP benchmarks.
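
To see why that one difference dominates, consider a toy instruction-count model of a daxpy-style loop (y[i] += a * x[i] on doubles). The per-iteration budget below is an assumption for illustration, not a figure from the paper:

```python
def loop_insts(has_dp_loadstore, alu_insts=3):
    """Rough per-iteration instruction count for y[i] += a * x[i] on doubles.

    Memory traffic is 2 loads (x[i], y[i]) and 1 store (y[i]); without a
    64-bit FP load/store (as on MIPS-I), each costs two 32-bit instructions.
    alu_insts lumps together the multiply-add and loop bookkeeping (assumed).
    """
    mem_ops = 3
    insts_per_mem_op = 1 if has_dp_loadstore else 2
    return mem_ops * insts_per_mem_op + alu_insts

mips1 = loop_insts(has_dp_loadstore=False)   # 9 instructions per iteration
sparc = loop_insts(has_dp_loadstore=True)    # 6 instructions per iteration
print(mips1 / sparc)                         # 1.5, in the range of the FP ratios above
```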

Someone (Charlie Price?) from MIPS objected that the study had been carried
out with an old generation of the MIPS compilers, and that newer numbers
were significantly better for the MIPS.  Ditzel admitted this was possible,
but noted that they had used what was available on the market when they did
the study.

The paper goes on to discuss the effect of libraries, load/store usage,
branches, nops, integer ops, and FP ops.  Lots of good ideas for
both architectures.  The appendix contains detailed numbers for each
of the benchmarks.

BTW, the numbers were collected with pixie (MIPS) and spixie (SPARC).
One of the consistently interesting parts of the conference is the methodology
used to perform experiments.  Lots of good ideas here.

Preston Briggs

mark@hubcap.clemson.edu (Mark Smotherman) (04/18/91)

From article <1991Apr17.183822.7681@elroy.jpl.nasa.gov>, by david@elroy.jpl.nasa.gov (David Robinson):
> Has anyone compared why SPARC tends to run slower at the same clock
> speed as other RISC chips?

As Michael Slater points out in the most recent Microprocessor Report
(p. 12, vol. 5, no. 6, April 3, 1991), the current SPARC implementations
exhibit lower SPECx/MHz in part because they use a unified I/D cache.
The competing implementations from MIPS, HP, and IBM have split caches.
Also, the early SPARCstations used only a single 4-byte write buffer.

The SS2 design seems to address the write buffer problem but not the
unified cache.  (Maybe the SPEC configuration parameters should include
#write buffers and presence or absence of a cache refill buffer and
store back buffer.)

One possible explanation for the less aggressive memory system design
seen in SPARC implementations is a reliance on register windows for
performance.  John Hennessy in the Oct. 1989 IEEE video seminar on RISC
processor design noted that the register window approach was thought to
substantially lower the load/store traffic (for integers) and could
therefore tolerate simplified (i.e., slower) caches.  However, Hennessy
also noted that SPARC register windows do not help FP load/stores.

An interesting architectural comparison between SPARC and MIPS was
given by Sun folks at ASPLOS-IV:  R.F. Cmelik, et al., "An analysis of
MIPS and SPARC instruction set utilization on the SPEC benchmarks,"
pp. 290-302.  (They concluded that SPARC had the advantage, but the
MIPS folks were quick to point out that they used current Sun compilers
and year-old MIPS compilers.  It will be interesting to see how we
chew over this paper in comp.arch!)  The data presented in this paper
showed the following MIPS/SPARC ratios for memory traffic:

	int loads (in int benchmarks)	1.07
	int stores			1.00
	FP loads (in FP benchmarks)	1.92	(i.e. MIPS did twice as many)
	FP stores			2.49

They suggested that the integer ratios do not show the true value of
register windows since the dynamic procedure calling frequency of the
SPEC benchmarks is abnormally low (see p. 293).  They also noted that
MIPS-I lacks dbl. prec. FP load/store but claimed that even disallowing
DP FP l/s in SPARC would _not_ significantly reduce the memory traffic
ratios; they attributed the large FP ratios to compiler technology.

MIPS has repeated the experiment with current compilers.  Let's ask
John Mashey to post the new numbers and new ratios or to publish a
follow-up article in ACM Computer Architecture Newsletter.

-- 
Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634
INTERNET: mark@hubcap.clemson.edu    UUCP: gatech!hubcap!mark

mash@mips.com (John Mashey) (04/20/91)

In article <1991Apr18.142341.23097@rice.edu> preston@ariel.rice.edu (Preston Briggs) writes:
>	An Analysis of MIPS and SPARC Instruction Set Utilization on
>	the SPEC Benchmarks
>	Cmelik, King, Ditzel, Kelly
>	ASPLOS-IV, 1991

>Basically, the MIPS executed more instructions, except on most of the integer
>benchmarks.  The authors note that a fairer comparison, taking into
>account register window overhead, interlocks, and annulled instructions
>still gives a 9% advantage to the SPARC.

>The biggest contributor to the difference seemed to be that the
>MIPS required two instructions to load or store a DP floating-point value.
>Hence the wide disparity in the FP benchmarks.

>Someone (Charlie Price?) from MIPS objected that the study had been carried
>out with an old generation of the MIPS compilers, and that newer numbers
>were significantly better for the MIPS.  Ditzel admitted this was possible,
>but noted that they had used what was available on the market when they did
>the study.

>The paper goes on to discuss the effect of libraries, load/store usage,
>branches, nops, integer ops, and fp ops.  Lots of good ideas for
>both architectures.  The appendix contains detailed numbers for each
>of the benchmarks.
>
>BTW, the numbers were collected with pixie (mips) and spixie (sparc).
>One of the consistently interesting parts of the conference is the methodology
>used to perform experiments.  Lots of good ideas here.

1) Definitely lots of good analysis here; it is heartening to see such
detailed analyses done on more meaningful programs, and when everybody
gets their ASPLOS proceedings, it is worth studying.  A subset of the
conclusions is fairly accurate, and in particular, the corrections
done for plausible extensions/changes are reasonably OK, as well as
most of the analysis about the effects of various features.

***********************************************************************
*Of course, most of the overall conclusions turn out to be wrong, if you
*use contemporary MIPS compilers released the same week as the Sun
*compilers used  in the study, as opposed to 1-year old MIPS compilers.
* In general, using the same approach as Sun, I get something like
* a 10% edge for MIPS (but read on to understand the numbers).
***********************************************************************

2) Sun did a perfectly reasonable thing, which is use the compilers
they could buy off-the-shelf from us for the analysis.  Unfortunately,
what fails to appear in the paper are the release dates of the compilers,
i.e., that:
	MIPS 2.10 release was 2Q90
	Sun compilers were announced for release last week

Why that might be relevant:

In the last year, I believe that Sun has devoted a large amount of
effort to analyzing and tuning their compilers, using SPEC, among other things.
This is evidenced, of course, by the exhaustive analysis in this paper,
as well as by the improvements from 10.0 SPECmarks to 11.8 SPECmarks for
25MHz SPARCs (SS1+ in 2Q90 to SS IPC in 2Q91, with new compilers).

A 25MHz MIPS Magnum went from 17.8 to 18.6 in the same time, and it actually
turns out that MIPS has done some (but not huge) analysis and tuning for
these benchmarks, and some of that shows up in the current compilers
(2.20), which happened to have been released within a week of when the
new Sun compilers were announced ..... although SPEC numbers for
both MIPS and Sun compilers in beta form were published a while ago by
both.

(Note, just for the record, that there is NOTHING wrong with anybody
using the SPEC benchmarks to tune their compilers; it's infinitely better
for the buyers of computers out there if people do it with SPEC than
with Dhrystone or Whetstone.  I.e., if Sun spends a bunch of effort
analyzing SPEC to death and tuning things up, more power to them,
because the tunings are much more likely to encourage optimizations that
will help real programs, and actually DO something for a customer.)

3) The analysis in the paper is well worth reading, and the micro-level
discussions are very useful. The conclusions mostly follow from the data;
it's just that a 1-year difference in compiler choice still makes
a difference, and the general conclusions end up getting wiped out
by MIPS' compilers' gain in that year....

4) Let's start with the raw instruction counts (remember that these
need to be corrected for nops, annuls, stalls, etc., etc.).

Instruction counts in Millions:
Most numbers from Sun paper, Tables A1 and A2
Total = raw instruction counts
Total+ = SPARC counts with adjustments for window-handling,
annulled instructions, load-use stalls (i.e., a little closer comparison)
MIPS220 = equivalent of Total, but with 2.20 compilers rather than 2.10.
Notation of the form (n/m) means column n divided by column m.
The most important things to look at are the differences between
columns 1 and 4, and between columns 7 and 8.

FIRST TABLE:
COL	1	2	3	4	5	6	7	8
Source	Sun	Sun	Sun	mash	mash	Sun	Sun	mash
Bench	MIPS	SPARC	M/S	MIPS220	M/S	SPARC	M/S	M/S
	Total	Total	(1/2)	Total	(4/2)	Total+	(1/6)	(4/6)
---------------------------------------------------------------------
spice	21,569	22,878	0.94	20,114	0.88	26,516	0.81	0.76
doduc	 1,613	 1,303	1.24	 1,392	1.07	 1,335	1.21	1.04
nasa7    9,257	 6,615	1.40	 9,186	1.39	 6,719	1.38	1.37
matrx300 2,776   1,694	1.64	 2,339	1.38	 1,695	1.64	1.38
fpppp	 2,316	 1,443	1.61	 2,111	1.46	 1,472	1.57	1.43
tomcat	 1,813	 1,626	1.11	 1,738	1.07	 1,640	1.11	1.06
---------------------------------------------------------------------
FP Geometric Mean	1.30		1.19		1.25	1.15
---------------------------------------------------------------------
gcc	 1,111	 1,155	0.96	 1,149	0.99	1,317	0.84	0.87
espresso 2,829   2,931	0.97	 2,723	0.93	3,397	0.83	0.80
li	 6,023   4,661	1.29	 5,938	1.27	6,131	0.98	0.97
eqntott  1,243 	 1,322	0.94	 1,244	0.93	1,458	0.85	0.85
---------------------------------------------------------------------
Integer Geometric Mean	1.03		1.02		0.88	0.87
---------------------------------------------------------------------
Overall Geometric Mean	1.18		1.12		1.09	1.03
---------------------------------------------------------------------

Now, with this data, I draw several conclusions, some of which are rather
different from those given in the paper.  Again, please note that the
above is not PERFORMANCE data, but instruction-count (Total) or
Sun-adjusted-instruction-count (Total+) data, or my data on MIPS-1
with current compilers.

I'm particularly looking at the rightmost column above.

1) MIPS uses more instructions on floating-point [mostly due to the
lack of 64-bit FP load/stores, which is especially crucial to the
linear-algebra and related benchmarks].  The FP code improved somewhat
from 2.10 to 2.20, i.e., some of the effects seen in the paper were
from compilers where the benchmarks had barely been looked at in any
serious fashion...

2) MIPS usually uses fewer instructions on integer programs,
and definitely uses fewer instruction-equivalents (i.e., from Total+).
The range is from .80 to .97, with a 95% confidence interval of
[0.76 to 0.99].

5) Now, let's look at the analysis at the end of the paper, which is
quite interesting (and is, in fact, a good example of the kinds of
analyses architects do, or should do, when figuring out how to design
and/or evolve an architecture).
What they did was:
start with the MIPS Total/Total+ counts (always equal)
and the SPARC Total+ counts, then estimate the effects of adding
instructions to each architecture, to make them architecture-neutral,
and then compare, mostly to compare compilers, I guess.
Columns 1-3 were from Sun, and the MIPS Total+~ was computed from
the earlier Total by giving MIPS a 64-bit load/store (probably the
dominant effect). SPARC got the int<->fp improvements.
Column 4 is computed from the actual numbers for MIPS-2 machines
{R6000, R4000}, which have the 64-bit load/stores hypothesized by
Sun, plus load-interlocks and annulled-branches, and sqrt,
and a few other things.  It would have been interesting to have seen
the Sun numbers, as modified not just by the int<->fp change,
but also by the integer mul/div and anything else coming in the
next-generation SPARCs; however, the paper makes the case that
the mul/div issue is not a large one for this set of benchmarks
(maybe on the order of a percent, at most, depending on the benchmarks,
especially as the multiply left in the inner loop of matrix300 on MIPS
has disappeared).

SECOND TABLE
COL	1	2	3	4	5	6
Source	Sun	Sun	Sun	mash	mash	mash
Bench	MIPS	SPARC	M/S	MIPS-2	M/S	M/S of adjusted
	Total+~	Total+~	(1/2)	Total	(4/2)	(col 4 of prev tab-adj)/2
---------------------------------------------------------------------
spice	20,211	25,095	0.81	18,429	0.73	0.75
doduc	 1,358	 1,301	1.04	 1,056	0.81	0.87
nasa7    6,927	 6,682	1.04	 6,454	0.97	1.03
matrx300 2,126   1,695	1.25	 2,339	0.99	1.00
fpppp	 1,616	 1,440	1.12	 1,319	0.92	0.98
tomcat	 1,377	 1,607	0.86	 1,283	0.80	0.81
---------------------------------------------------------------------
FP Geometric Mean	1.02		0.86	0.90
---------------------------------------------------------------------
gcc	 1,111	 1,262	0.88	 1,122	0.89
espresso 2,829   3,397	0.83	 2,648	0.81
li	 6,016   5,626	1.07	 5,504	0.98
eqntott  1,243 	 1,371	0.91	 1,247	0.91
---------------------------------------------------------------------
Integer Geometric Mean	0.93		0.90
---------------------------------------------------------------------
Overall Geometric Mean	0.97		0.88

Now, more notes:
1) It might be interesting to recalibrate the Sun-computed overall
Geo Mean above to allow for the MIPS 2.20 compilers.  I haven't
done this in detail, but note that going from 2.10 (on which columns
1 and 3 above are based) to 2.20 changed the Geo Means of Total+
in the earlier table from 1.10 to 1.03, with most of the effect being
in the FP area. Thus, I'd guess that the overall number as adjusted by Sun
would have come out around .94, rather than .97.

2) As noted above, it would have been really fascinating to have seen
the actual next-generation SPARC numbers as column Total+~ above
(but I realize that would have been a bit much to ask, since while
the MIPS-2 changes are guessable, having been public for over a year,
the SPARC changes aren't yet public, I think (?)).

3) So, to summarize, in the paper's conclusions, it says:
	a) "MIPS typically executes 18% more user-level instructions
	than SPARC"
	This could be rewritten as "MIPS typically executes 12%
	more user-level instructions than SPARC", although in any case,
	one must be very careful about the word "typical" here,
	since one can also say that MIPS usually executes fewer instructions
	on integer code, and more on FP code.  ("typical" is not
	a statistical term :-)
	b) "A fairer comparison which takes into account register-window
	overhead, load-use interlocks, and annulled instructions,
	still shows a 9% advantage for SPARC"  turns into:
	A fairer comparison shows a 3% advantage for SPARC.
	c) "most significant differences are SPARC's DP load/store,
	and MIPS compare-and-branch"
	Probably so.
	d) "When architectural factors were factored out, the differences
	due to combined compiler/library effects were so small (3%)
	that neither MIPS nor SPARC has any significant advantage."
	Wellll...  include the MIPS 2.20 compilers:
		-the 4 integer benchmarks range from 3% less to 20% less,
		with Geo mean = 13% less.
		-the 6 FP benchmarks are a little harder to figure,
		but if you take the 2.20 compiler numbers (Col 4 of the
		first table), subtract Sun's adjustments for 64-bit,
		and divide the result by Sun's Total+~ column, you get
		the ratios shown in Column 6 above, which, not surprisingly,
		are in between what Sun computed and what we actually get
		in MIPS-2.
		In any case this yields 10% less, using Sun's rules.

Anyway, one must be VERY careful to avoid over-generalization from a
small number of data points, and note, all of this was instruction
COUNTS, not cycles, and cycles are always more important.
Nevertheless, this kind of analysis in the Sun paper is very useful to
compiler writers (to answer the question: why are THEY beating us in
that benchmark on instruction counts?  Are they doing something
special?  Is it architecture, or our compilers missing an optimization?)

However, the major conclusion: that the two are indistinguishable,
is wrong, if you think 10% is distinguishable... 
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94088-3650

jvm@hpfcso.FC.HP.COM (Jack McClurg) (04/22/91)

> There is an interesting paper in the ASPLOS-IV proceedings that compares the
> MIPS and SPARC architecture and claims that SPARC actually executes
> significantly fewer instructions for a given set of benchmarks (SPEC).
> This would imply that it does not have an architectural disadvantage.
> 
> Does anyone who is familiar with that paper have any comments on it?
> 
> Michael Slater, Microprocessor Report  mslater@cup.portal.com
> ----------

One thing that struck me as strange about this paper was the data for the li
benchmark.  SPARC executes 4.7G instructions while MIPS executes 6G
instructions (29% more).  Recent published SPEC results show a 25MHz R3000
executes li in 270.5 seconds while a 40MHz SPARC executes li in 267.7 seconds.
This may be explained by the time difference (MIPS has improved their
compilers?), but I do not think so.  The authors of the paper made no attempt
to explain an obvious contradiction between what could be concluded from the
paper (SPARC runs li much faster) and reality (MIPS runs li much faster at a
given clock rate).  I could postulate that since li has greater stack depth
than the other benchmarks, it shows some architectural differences between
SPARC and MIPS very plainly.  I was disappointed that the li results were not
explained better.

Although the paper is clearly intended to concentrate on instruction
utilisation, I thought it would have been improved with cycles-per-instruction
data for each of the benchmarks.  This is important for benchmarks like
matrix300, where TLB misses dominate execution time and hence instruction set
use is not very important.

The two things above were the few problems I had with the paper.  Overall I
thought that the paper was excellent.  It has a wealth of data, and the
explanations for the data are quite good.

Jack McClurg

lewine@cheshirecat.webo.dg.com (Donald Lewine) (04/23/91)

In article <2500@spim.mips.COM>, mash@mips.com (John Mashey) writes:
|> 
	[5 pages of comparison of Sun compilers and MIPS compilers
     deleted.]

|> However, the major conclusion: that the two are indistinguishable,
|> is wrong, if you think 10% is distinguishable... 
|> -- 
|> -john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
The last sentence is the point!  The MIPS marketing story (at least as retold
by DEC) is that the MIPS compiler technology provides a *HUGE*
advantage over other RISC architectures.  If the difference is
+/- 10%, that is a whole different story.

--------------------------------------------------------------------
Donald A. Lewine                (508) 870-9008 Voice
Data General Corporation        (508) 366-0750 FAX
4400 Computer Drive. MS D112A
Westboro, MA 01580  U.S.A.

uucp: uunet!dg!lewine   Internet: lewine@cheshirecat.webo.dg.com

khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (04/24/91)

In article <1991Apr23.140140.27847@webo.dg.com> lewine@cheshirecat.webo.dg.com (Donald Lewine) writes:

   ....  MIPS marketing (at least as retold
   by DEC) is that the MIPS compiler technology provides a *HUGE* 
   ^^^^^^
   advantage over other RISC architectures.  If the difference is
   +/- 10%, that is a whole different story.

That was the old _DEC_ story.  DEC's newest compiler release (press
release info) claimed dramatic improvements with the new compilers ...
which are a product of DEC, not based on the MIPSco base.

Marketing departments reserve the right to change stories as
circumstances dictate ;>
--
----------------------------------------------------------------
Keith H. Bierman    keith.bierman@Sun.COM| khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33			 | (415 336 2648)   
    Mountain View, CA 94043

meissner@osf.org (Michael Meissner) (04/24/91)

In article <KHB.91Apr23193227@chiba.Eng.Sun.COM> khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:

| 
| In article <1991Apr23.140140.27847@webo.dg.com> lewine@cheshirecat.webo.dg.com (Donald Lewine) writes:
| 
|    ....  MIPS marketing (at least as retold
|    by DEC) is that the MIPS compiler technology provides a *HUGE* 
|    ^^^^^^
|    advantage over other RISC architectures.  If the difference is
|    +/- 10%, that is a whole different story.
| 
| That was the old _DEC_ story. DEC's newest compiler release (press
| release info) claimed dramatic improvements with the new compilers ...
| which are a product of DEC not based on the MIPSco base.
| 
| Marketing departments reserve the right to change stories as
| circumstances dictate ;>

Maybe you should actually read the release.  It says that they put a
NEW front end on, and that the back end was provided by MIPSco.  Also,
only Fortran had dramatic improvements; C was pretty much unchanged.
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?

baum@apple.com (Allen Baum) (04/24/91)

> 
> > There is an interesting paper in the ASPLOS-IV proceedings that compares the
> > MIPS and SPARC architecture and claims that SPARC actually executes
> > significantly fewer instructions for a given set of benchmarks (SPEC).
> > This would imply that it does not have an architectural disadvantage.
> > 
> > Does anyone who is familiar with that paper have any comments on it?
> > 
> > Michael Slater, Microprocessor Report  mslater@cup.portal.com
> > ----------
> 

My 2 cents:

The numbers presented by Sun indicated that they do use far fewer loads/stores
than MIPS, which they attribute to register windows.  On the other hand, this
was not enough overall to change the number of cycles to favor SPARC.

They indicate problems with long floats (the lack of double ld/st), which hurt
FP performance a lot (presumably fixed in the R4000).

They purported to show that annulled branches were superior to inserting
NOPs after branching.  I believe that argument is, well, flawed and misleading,
shall we say.  Although it causes more instructions to be executed, it doesn't
cause more cycles to be spent.  Sun argues that future versions of SPARC
won't spend an extra cycle executing after annulling branches.  I say that
whatever technique they use to do that can be applied to NOPs after a
branch (not to mention the addition of annulling branches to the MIPS
architecture, which may be in the R4000).

They indicated a big win for MIPS by allowing FP<->int register moves instead of
going through memory.

Sun used an unreleased (at the time of the study) compiler vs. a released
compiler from MIPS.

Overall: instead of +20% for SPARC, it's somewhere from -12% to -25%.  (Sun's
own numbers showed about -12%; with the equivalent generation of MIPS
compilers, it could be as much as -25%.)

Is +-10% meaningful?  Is +-20%?  It depends who you ask.  Supercomputer vendors
might go to great lengths for a couple of percent.  When you get to workstation
prices, it's a bit more academic.

craig@netcom.COM (Craig Hansen) (04/24/91)

In article <KHB.91Apr23193227@chiba.Eng.Sun.COM>, khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:
> Marketing departments reserve the right to change stories as
> circumstances dictate ;>

IMHO, the Mips folk have been too gentle in their clarification
of the "results" of the paper.  The single largest difference in
the results can be attributed to the fact that the Mips
implementation uses two instructions, which take two cycles,
to load or store a double-precision floating-point value, while
the Sun implementations use one instruction, which takes three or four
cycles, to do the same operation.  To express this as an advantage
for Sun is intellectually dishonest.  The paper itself says:

	"If MIPS had double-precision loads and stores, it
	could reduce its instruction count by 19% in the SPEC
	FP benchmarks."

The basic premise of the
paper, to compare "architectures" rather than "implementations",
rests only on the following reason:

	"...because implementations change more frequently
	than instruction set architectures."

There is no further justification given, and it turns out that
this reasoning is false.

The 2.10 compiler release used to examine the MIPS code is
capable of generating the "MIPS-II" instruction set, which
includes load and store double. Mips has changed the
instruction set architecture just as frequently as the
implementation, with the exception of the R2000-to-R3000 change,
which was a pin-compatible modification that did not change
the width of the data interfaces.

Let me close by admitting my own obvious biases: I am a former
Mips employee who had a hand in the development of both the
architecture and the implementation there.  I am typing this
message on a SPARCstation 2.

khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (04/25/91)

In article <MEISSNER.91Apr24103752@curley.osf.org> meissner@osf.org (Michael Meissner) writes:

..
   Maybe you should actually read the release.  It says that they put a
   NEW front end on, and that the back end was provided by MIPSco.  Also,
   only Fortran had dramatic improvements; C was pretty much unchanged.

The material I had wasn't that specific. I stand corrected. 
--
----------------------------------------------------------------
Keith H. Bierman    keith.bierman@Sun.COM| khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33			 | (415 336 2648)   
    Mountain View, CA 94043

mash@mips.com (John Mashey) (04/25/91)

In article <51942@apple.Apple.COM> baum@apple.com (Allen Baum) writes:
>The numbers presented by Sun indicated that they do use far fewer loads/stores
>than MIPs, which they attribute to reg. windows. On the other hand, this
>was not enough overall to change # of cycles to favor SPARC.

>They indicate problems with long floats (lack of double ld/st) which hurt
>FP performance a lot (presumably fixed in the R4000)
Yes, and in R6000.
>
>branch (not to mention the addition of annulling branches to the MIPs
>architecture which may be in the R4000)
Yes, and in R6000.
>
>Sun used an unreleased (at the time of the study) compiler vs. a released
>compiler from Mips.
P.S.  I have no problem with them using an unreleased compiler,
and certainly they couldn't get our unreleased compiler.  It would have been
nice had such a caveat appeared in the paper ....  because this is a
perfect example of the necessity of appropriate caveats to obtain
defendable conclusions; i.e., the paper in question analyzed the
question "How does SPARC with current compilers compare with MIPS
with 1-year-old compilers?" and got some answers.  I analyzed the
question "How do MIPS and SPARC compare with both having current
compilers?" and got a different answer.

>Overall: instead of +20% for SPARC, a -12%->25%. (Suns own numbers showed
>about -12%. With the equivalent generation of Mips compilers, it could
>be as much as -25%.

>Is +-10% meaningful? Is +-20%? It depends who you ask. Supercomputer vendors
>might go to great lengths for a couple of percent. When you get to workstation
>prices, its a bit more academic.

Note that the paper was about INSTRUCTION COUNTS, which is not the same
as performance.  Also, note that people used to fight pretty hard over
10-15% differences on 2-mips products, and if you get 10-15% on 20 ... or 50,
you're talking about adding/subtracting a 386's or 68030's worth of
performance, which is pretty amusing, when you think about it.
Why it may or may not be relevant to workstations is that sometimes
interactive code has some minimum requirement for speed, or else people
won't tolerate it.  If a compiler boost makes the difference,
then it's very relevant.

Finally, once again, it is good to see such analyses of substantial
chunks of code, rather than tinker-toys.  Everybody should be reminded to
include careful caveats about the dates of software, to avoid making
general conclusions that are clearly invalidated by software releases
that happen simultaneously with the paper.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94088-3650