[comp.arch] SPARC vs MC68040

davet@oakhill.UUCP (David Trissel) (02/11/90)

In article <1850@cbnewsi.ATT.COM> ca@cbnewsi.ATT.COM (christopher.arnone) writes:
>
>Initial information about the 040 indicates that it will be faster than
>the SPARC at 25 MHz.  Of course, this remains to be seen.

The fastest Dhrystone 2.1 I have seen reported for SPARC (obtained from the 
report file delivered with the 4/89 Usenet distribution of the benchmark) is 
23,148 KDhrys. The MC68040 runs that benchmark at twice the speed.

 -- Dave Trissel - Motorola Semiconductor, Austin Texas

jkrueger@dgis.dtic.dla.mil (Jon) (02/12/90)

davet@oakhill.UUCP (David Trissel) writes:

>The fastest Dhrystone 2.1 I have seen reported for SPARC (obtained from the 
>report file delivered with the 4/89 Usenet distribution of the benchmark) is 
>23,148 KDhrys. The MC68040 runs that benchmark at twice the speed.

Inside Moto or elsewhere?  What compiler?

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
The Philip Morris Companies, Inc: without question the strongest
and best argument for an anti-flag-waving amendment.

wbeebe@rtmvax.UUCP (Bill Beebe) (02/12/90)

In article <2938@oakhill.UUCP> davet@oakhill.UUCP (David Trissel) writes:
>
>The fastest Dhrystone 2.1 I have seen reported for SPARC (obtained from the 
>report file delivered with the 4/89 Usenet distribution of the benchmark) is 
>23,148 KDhrys. The MC68040 runs that benchmark at twice the speed.
>
> -- Dave Trissel - Motorola Semiconductor, Austin Texas

Protestations aside, nothing will lend truth to the Moto numbers until some
independent non-Moto numbers come back from *real* working silicon and
systems. To re-quote a very tired old paraphrase, "there are lies, damn
lies, and then there are vendor benchmarks". I am intrigued by Heurikon's
(please forgive the spelling) 25 MHz 68040 VME card, for which they claim
only 14 or so VAX mips. Questions: what is the correspondence between VAX
mips and 040 mips (is it 1:1?); what board level architecture did Heurikon
use on their board?

Something else that's interesting. In the February 7th Microprocessor
Report, page 4, under new SPEC numbers, a Moto system with a 33 MHz 88K came
up with a 17.8 SPECmark. Congratulations. However, the article goes on to
note that the 88K SPECmark was only 1% over the SPARC's 17.6 SPECmark (as
well as the MIPS). I would be most interested to see SPECmarks for the
Heurikon board (or any other system) running the 040 at 25 MHz or even 33
MHz.

The old Chinese curse has indeed come true. We do indeed live in interesting
times.

mash@mips.COM (John Mashey) (02/12/90)

In article <3085@rtmvax.UUCP> wbeebe@rtmvax.UUCP (Bill Beebe) writes:
>In article <2938@oakhill.UUCP> davet@oakhill.UUCP (David Trissel) writes:
>>
>>The fastest Dhrystone 2.1 I have seen reported for SPARC (obtained from the 
>>report file delivered with the 4/89 Usenet distribution of the benchmark) is 
>>23,148 KDhrys. The MC68040 runs that benchmark at twice the speed.
>>
>> -- Dave Trissel - Motorola Semiconductor, Austin Texas
>
>Protestations aside, nothing will lend truth to the Moto numbers until some
>independent non-Moto numbers come back from *real* working silicon and
>systems. To re-quote a very tired old paraphrase, "there are lies, damn
>lies, and then there are vendor benchmarks". I am intrigued by Heurikon's
>(please forgive the spelling) 25 MHz 68040 VME card, for which they claim
>only 14 or so VAX mips. Questions: what is the correspondence between VAX
>mips and 040 mips (is it 1:1?); what board level architecture did Heurikon
>use on their board?
>
>Something else that's interesting. In the February 7th Microprocessor
>Report, page 4, under new SPEC numbers, a Moto system with a 33 MHz 88K came
>up with a 17.8 SPECmark. Congratulations. However, the article goes on to
>note that the 88K SPECmark was only 1% over the SPARC's 17.6 SPECmark (as
>well as the MIPS). I would be most interested to see SPECmarks for the
>Heurikon board (or any other system) running the 040 at 25 MHz or even 33
>MHz.

A bunch of people at various companies are busily stuffing SPEC numbers into
spreadsheets, plus published mips-ratings, and analyzing.  I'm also trying
to calibrate i486 and 68040 numbers into this scheme.

NOTE: regarding Dhrystone:
	a) A bunch of Motorola people have been working hard along with the
	rest of the SPECers to get better benchmarks, and have started getting
	good compiler gains by analyzing real programs.
	Talking about Dhrystone is a step backwards...
	b) It has been discussed many times in this newsgroup why one has to
	be careful with Dhrystone ratings.  Also, I quote from the author's
	directions:
"In any case, for serious performance evaluation, users are  advised  to
ask  for  code listings and to check them carefully."
	EVERYBODY knows that inlining strcpy&strcmp can boost the number
	strongly without giving anything like that boost on real programs.
	SO POST THE CODE WHERE THE CRUCIAL STRCPY/STRCMP calls are made;
	otherwise, the number is simply meaningless, because anybody can
	boost the performance substantially on Dhrystone by an optimization
	that has relatively little effect on real programs.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (02/12/90)

In article <35825@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>
>NOTE: regarding Dhrystone:
>	a) A bunch of Motorola people have been working hard along with the
>	rest of the SPECers to get better benchmarks, and have started getting
>	good compiler gains by analyzing real programs.
>	Talking about Dhrystone is a step backwards...

Tuning compilers for it is also a step backwards. 

In a previous life, I worked on highly optimizing compilers.
Adjusting our product to make one benchmark run better would make
others run worse.  I am convinced that compilers tuned for Dhrystone
are in fact badly tuned. We will be doing ourselves a favor by
demanding good SPECmarks instead of good Dhrystones.

-- 
Don		D.C.Lindsay 	Carnegie Mellon Computer Science

davet@oakhill.UUCP (David Trissel) (02/12/90)

In article <35825@mips.mips.COM> mash@mips.COM (John Mashey) writes:

>>> is 23,148 KDhrys. The MC68040 runs that benchmark at twice the speed.

>A bunch of people at various companies are busily stuffing SPEC numbers into
>spreadsheets, plus published mips-ratings, and analyzing.  I'm also trying
>to calibrate i486 and 68040 numbers into this scheme.

What does this have to do with Dhrystone?

>	a) A bunch of Motorola people have been working hard along with the
>	rest of the SPECers to get better benchmarks, and have started getting
>	good compiler gains by analyzing real programs.
>	Talking about Dhrystone is a step backwards...

As has been discovered, Dhrystones are an excellent way to find out how fast
string operations go and to what extent C compilers incorporate them in-line.
And the 1.1 version, since it is essentially a big NOP, can go a long way
towards indicating how good a compiler is at removing dead code.

The Dhrystone benchmarks have known weaknesses. The SPEC benchmarks have their 
own. Many people are interested in Dhrystone so it gets talked about. If you
don't care for discussions on Dhrystone then simply ignore them.

>	b) It has been discussed many times in this newsgroup why one has to
>	be careful with Dhrystone ratings.  Also, I quote from the author's
>	directions:
>"In any case, for serious performance evaluation, users are  advised  to
>ask  for  code listings and to check them carefully."

This is true for ALL benchmarks. Do you think it only applies to Dhrystone?
Do you think it does not apply to the SPEC benchmarks? I know I wouldn't
be choosing a computer architecture without looking at code the compiler
produces.

>	EVERYBODY knows that inlining strcpy&strcmp can boost the number
>	strongly without giving anything like that boost on real programs.
>	SO POST THE CODE WHERE THE CRUCIAL STRCPY/STRCMP calls are made;
>	otherwise, the number is simply meaningless, because anybody can
>	boost the performance substantially on Dhrystone by an optimization
>	that has relatively little effect on real programs.

I fail to understand your tone here. By your own admission in a posting
you made to this newsgroup on March 15, 1989:

 "Now, according to the letter of the law of Herr Doktor Weicker's
  Dhrystone 2.1 writeup, it's OK to in-line strcpy and strcmp."

and this is what the MC68040 compiler does. So just what is the problem? 
Here is one of the string copies (they all look similar) directly from the 
benchmark's .s file:

    lea.l   (12,%sp),%a5
    mov.l   %a5,%a1
    mov.l   &L%93,%a0
    mov.l   (%a0)+,(%a1)+
    mov.l   (%a0)+,(%a1)+
    mov.l   (%a0)+,(%a1)+
    mov.l   (%a0)+,(%a1)+
    mov.l   (%a0)+,(%a1)+
    mov.l   (%a0)+,(%a1)+
    mov.l   (%a0)+,(%a1)+
    mov.w   (%a0)+,(%a1)+
    mov.b   (%a0)+,(%a1)+

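For readers counting the moves in the listing above: a minimal sketch, in
Python rather than 68k assembly, of how a fixed-size copy breaks down into
long, word and byte moves. It assumes the source is the 30-character literal
from the Dhrystone 2.1 source plus its terminating NUL, 31 bytes in all; the
helper name plan_moves is hypothetical and is not taken from any compiler.

```python
def plan_moves(nbytes):
    """Split an nbytes copy into 4-, 2- and 1-byte moves, widest first."""
    moves = []
    for width in (4, 2, 1):
        count, nbytes = divmod(nbytes, width)
        moves.extend([width] * count)
    return moves


# Assumed literal: the first string Dhrystone 2.1 copies (30 characters).
literal = "DHRYSTONE PROGRAM, 1'ST STRING"
moves = plan_moves(len(literal) + 1)  # +1 for the terminating NUL

# Counts of long, word and byte moves, in that order.
print(moves.count(4), moves.count(2), moves.count(1))
```

Under those assumptions the plan comes out as seven long moves, one word move
and one byte move, which is the shape of the listing. This only works because
the length is a compile-time constant.
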
Now let's see you post the code that your MIPS compiler produces. Then tell
us what you find to be relevant about the two postings.

 -- Dave Trissel - Motorola, Austin

mash@mips.COM (John Mashey) (02/13/90)

In article <2943@oakhill.UUCP> davet@oakhill.UUCP (David Trissel) writes:
>In article <35825@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>
>>>> is 23,148 KDhrys. The MC68040 runs that benchmark at twice the speed.
>
>>A bunch of people at various companies are busily stuffing SPEC numbers into
>>spreadsheets, plus published mips-ratings, and analyzing.  I'm also trying
>>to calibrate i486 and 68040 numbers into this scheme.
>
>What does this have to do with Dhrystone?
Sorry, among other things, when you start looking at such data, you see that:
	a) Dhrystone correlates with integer performance on real benchmarks
	within machine lines, with the same compilers, at least somewhat.
	b) It has some correlation across machine lines.
	c) If one machine uses the inlining and one doesn't, the difference
	in Dhrystone performance badly mispredicts the difference on
	realistic programs.
>The Dhrystone benchmarks have known weaknesses. The SPEC benchmarks have their 
>own. Many people are interested in Dhrystone so it gets talked about. If you
>don't care for discussions on Dhrystone then simply ignore them.
Impossible: it causes too much confusion, and I have to keep explaining it to
financial analysts, and I'm tired of that.  The SPEC benchmarks have their
own weaknesses of course, but they're hardly in Dhrystone's class.

>>"In any case, for serious performance evaluation, users are  advised  to
>>ask  for  code listings and to check them carefully."

>This is true for ALL benchmarks. Do you think it only applies to Dhrystone?
>Do you think it does not apply to the SPEC benchmarks? I know I wouldn't
>be choosing a computer architecture without looking at code the compiler
>produces.
Of course, but in Dhrystone's case, if all you have is the numbers for two
machines, you know very little about their relative performance without
looking at the code; it is especially irksome that it admits an optimization
that improves its score greatly but simply does not improve
realistic programs significantly. (That doesn't mean that selective inlining
of strings is bad; in fact, if Dhrystone contained a REPRESENTATIVE set of
string operations, I wouldn't object so much, but it doesn't.)

>
>>	EVERYBODY knows that inlining strcpy&strcmp can boost the number
>>	strongly without giving anything like that boost on real programs.
>>	SO POST THE CODE WHERE THE CRUCIAL STRCPY/STRCMP calls are made;
>>	otherwise, the number is simply meaningless, because anybody can
>>	boost the performance substantially on Dhrystone by an optimization
>>	that has relatively little effect on real programs.
>
>I fail to understand your tone here. By your own admission in a posting
>you made to this newsgroup on March 15, 1989:
>
> "Now, according to the letter of the law of Herr Doktor Weicker's
>  Dhrystone 2.1 writeup, it's OK to in-line strcpy and strcmp."

Yes, but subject to the comment above about checking the listings, which most
people will not do, i.e., hardly anyone shows the code for this.  The SPEC
benchmarks were chosen to allow any optimization you like, but have the effect
that there are very few optimizations you can do that won't help lots of real
programs.
>
>and this is what the MC68040 compiler does. So just what is the problem? 
>Here is one of the string copies (they all look similar) directly from the 
>benchmark's .s file:
>
>    lea.l   (12,%sp),%a5
>    mov.l   %a5,%a1
>    mov.l   &L%93,%a0
>    mov.l   (%a0)+,(%a1)+
>    mov.l   (%a0)+,(%a1)+
>    mov.l   (%a0)+,(%a1)+
>    mov.l   (%a0)+,(%a1)+
>    mov.l   (%a0)+,(%a1)+
>    mov.l   (%a0)+,(%a1)+
>    mov.l   (%a0)+,(%a1)+
>    mov.w   (%a0)+,(%a1)+
>    mov.b   (%a0)+,(%a1)+
	Good! You at least did it correctly in the general case, unlike the i860
	that pads this to 32 bytes so it can do 2 quad-word loads & stores...
>
>Now let's see you post the code that your MIPS compiler produces. Then tell
>us what you find to be relevant about the two postings.
Dhrystone usually overpredicts VAX-relative performance; on most machines,
if I know that this inlining is being done, I can estimate that it overpredicts
it another 20-30%.  That's what's relevant.

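To make that estimate concrete: a rough sketch of the arithmetic, assuming
"overpredicts by 20-30%" means the reported ratio is 1.2 to 1.3 times what
real programs would show. That reading and the function name
discounted_ratio are assumptions for illustration, and the 2.00X input is
simply the claim under discussion.

```python
def discounted_ratio(claimed, overprediction):
    """Deflate a claimed speed ratio by a fractional overprediction (0.20 = 20%)."""
    return claimed / (1.0 + overprediction)


# Apply the low and high ends of the estimated overprediction to a 2X claim.
for pct in (0.20, 0.30):
    print(f"{pct:.0%} overprediction: 2.00X -> {discounted_ratio(2.0, pct):.2f}X")
```

On that reading, a 2X Dhrystone claim obtained with inlining corresponds to
roughly 1.5X to 1.7X before the usual VAX-relative overprediction is even
considered.
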
The numbers we use all come from:
	jal	strcpy
and I've seen the SPARC code as the equivalent; meaning, I think such things
don't overpredict as much (they still overpredict, and this has been
well-documented for years in published materials.)

And the reason (we don't inline str*) is:
	a) When you inline code it gets bigger.
	b) You might want to inline it only in those places it's called a lot.
	c) But there's a complicated set of rules for when it's really a good
	idea in general.  Among other things, MOST strcpy's aren't of constants,
	they're of pointers to things whose alignment can't be predicted,
	or at least the target is some arbitrary pointer, and then this
	optimization doesn't work very well.  The only one I've seen that looked
	like it would really pay off is inlining strcpy's of small constants
	(1-2 bytes), or ones where you happen to know the alignment, and then
	up to a few words.
	d) Remember, we actually do full-bore inlining in the general case....
	but are forbidden by the rules from using it.... and we don't.

Here's the bottom line: either Dhrystone is a good predictor of
integer performance on real programs, or it isn't.  If it is (and it once
almost used to be), then it's a Good Thing, because it's simple and easy
to use.  If it doesn't correlate well with performance on real programs,
then it's become obsolete.

Rather than replowing ground that has been plowed for years,
let's try something else, as a bottom line, and get something concrete:
QUIZ:

It is claimed that a 25MHz 68040 is 2X faster than a 25MHz SPARC on Dhrystone;
for concreteness, consider a 68040 with at least 64K external cache,

a) Will it be 2X faster on the Geometric Mean of the 4 SPEC C benchmarks?
(Using same compiler as Dhrystone.)
b) Will it be more than 2X?
c) Will it be less than 2X?
d) Will it be a lot less than 2X, in fact, maybe closer to 1X?

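For anyone working the quiz by hand, a small sketch of the geometric mean
over per-benchmark speed ratios. The four ratios below are made-up
placeholders, not measurements of any 68040 or SPARC system, and the function
name is illustrative only.

```python
import math


def geometric_mean(ratios):
    """Return the n-th root of the product of n positive ratios."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("need at least one ratio, all positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical case: 2X on one of four programs, parity on the other three.
example = [2.0, 1.0, 1.0, 1.0]
print(f"{geometric_mean(example):.3f}")
```

In this made-up case the mean is the fourth root of 2, a little under 1.2X.
That is why a large gain confined to one program moves a four-benchmark
geometric mean far less than it moves a single-program score.
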
I'd encourage anyone who posts to post an Analysis to back up their opinion,
with some data; I'm working on a Guesstimate for about a week from now.

If someone prefers other realistic benchmarks, that would be a good exercise
as well.

In any case, thanx to Mr. Trissel for properly qualifying the Dhrystone
number; this actually helps a lot.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

aglew@oberon.csg.uiuc.edu (Andy Glew) (02/13/90)

>Something else that's interesting. In the February 7th Microprocessor
>Report, page 4, under new SPEC numbers, a Moto system with a 33 MHz 88K came
>up with a 17.8 SPECmark. Congratulations. However, the article goes on to
>note that the 88K SPECmark was only 1% over the SPARC's 17.6 SPECmark (as
>well as the MIPS). I would be most interested to see SPECmarks for the
>Heurikon board (or any other system) running the 040 at 25 MHz or even 33
>MHz.

I was waiting for some poster to notice this.

I felt honour-bound to type in the first SPECmark report, even though
I knew that the 88K results for early, untuned, compilers and OS were
being compared with more mature products.  Since then I have left Motorola
and no longer receive the SPEC reports.  Would somebody care to type
in the latest report (somebody from MIPS, perhaps? :-)

Prediction: MIPS and the 88K will keep swapping places (the same way
the 80x86 and 68k families do). They are basically very similar chips,
with minor differences that are important to specific applications
(not true for 80x86 vs. 68K!).  I know that there are some
blockbusters in the 88K camp coming, but I'm sure that the same goes
for MIPS.  The real differences in ranking will come from systems
level issues: how good a cache you have, how good your memory bus
interface is, how good your compilers are.
--
Andy Glew, aglew@uiuc.edu

alan@oz.nm.paradyne.com (Alan Lovejoy) (02/13/90)

In article <AGLEW.90Feb12170628@oberon.csg.uiuc.edu> aglew@oberon.csg.uiuc.edu (Andy Glew) writes:
>Would somebody care to type
>in the latest [SPECmark] report (somebody from MIPS, perhaps? :-)

Well, I'm not from MIPS, but I typed it in and posted it this morning.

You're welcome :-).

>Prediction: MIPS and the 88K will keep swapping places (the same way
>the 80x86 and 68k families do). They are basically very similar chips,
>with minor differences that are important to specific applications

I hope this is true.  That way, it won't matter so much which architecture
"wins" (except to MIPS and Moto), which was most definitely NOT the case
in the 68k vs. x86 conflict.  The 88k and the Rx000 are both CPUs that
I can use without holding my nose.

____"Congress shall have the power to prohibit speech offensive to Congress"____
Alan Lovejoy; alan@pdn; 813-530-2211; AT&T Paradyne: 8550 Ulmerton, Largo, FL.
Disclaimer: I do not speak for AT&T Paradyne.  They do not speak for me. 
Mottos:  << Many are cold, but few are frozen. >>     << Frigido, ergo sum. >>

phil@aimt.UU.NET (Phil Gustafson) (02/21/90)

In article <43279@ames.arc.nasa.gov>, lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
> Reminds me of about 10 years ago, when I wrote some programs to test
> branch speeds.  I had to add some bogus assignments and outputs which were
> never executed, but might have been, in order to get the CDC and Cray 
> compilers of the time to create the loop.  And that was *ten years ago*.
> If someone ever puts AI into a compiler, we might as well give up on
> benchmarking :-)
> 
>   Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster

Yes.  I was responsible for helping make a [nameless] commercial benchmark
optimizer-resistant.  Each test required a bogus assignment and output.

One very simple but effective technique involved reading _all_ the constants
used by the program from an external file to make sure the compilers didn't
do the arithmetic once at compilation time and never do it again.  The 
constant values were listed in comments in the source code -- I figured that
if the compiler could optimize using the comments it deserved to win :-} .

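A minimal sketch of the read-the-constants-at-run-time idea, written in
Python purely for illustration (the benchmark in question is not named and
was presumably not in Python). The "name value" file format, the names
load_constants and kernel, and the constants themselves are all hypothetical;
io.StringIO stands in for the external file.

```python
import io


def load_constants(stream):
    """Parse one 'name value' pair per line; '#' starts a comment."""
    constants = {}
    for line in stream:
        line = line.split("#", 1)[0].strip()
        if line:
            name, value = line.split()
            constants[name] = float(value)
    return constants


def kernel(c, iterations):
    """Toy timed loop whose operands are only known at run time."""
    total = 0.0
    for _ in range(iterations):
        total += c["scale"] * c["offset"]
    return total


# Stand-in for the external constants file of a real harness.
demo = io.StringIO("scale 3.0   # value also listed in a source comment\noffset 0.5\n")
print(kernel(load_constants(demo), 4))
```

Because the operands arrive as data, no compiler can fold the multiply at
compile time. A sufficiently clever one could still hoist it out of the loop,
which is where the bogus assignment and output come in.
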
The same trick might well help with the famous dhrystone strcmp problem.

[Aside-- I keep hearing apocryphal stories about compilers that looked for
such strings as "dongarra" in source code and acted accordingly.  I'd
appreciate mail from anyone who knows of a real compiler or preprocessor
that did this.]

--
Opinions outside attributed quotations are mine alone.
Satirical material may not be labeled as such.
I don't work at this site anymore -- they just let me read their news.
--
-- 
				Phil Gustafson, Graphics/UN*X Consultant
				{uunet,ames!coherent}!aimt!phil phil@aimt.uu.net
				1550 Martin Ave, San Jose, Ca 95126