[comp.sys.nsc.32k] the NS32532

roger@nsc.UUCP (04/10/87)

References:


	Howdy ----

	First off let me introduce myself.  I'm the technical marketing
	manager for the Series 32000 product line.  The product line
	consists of the various components, software and system level
	products.  In other words, if it's Series 32000, I'm involved
	in some way.

	In the last 24 hours, two articles were posted on the net
	that drove me to learning the tricks of posting my own
	article.  If I've made an error in the process, please
	excuse.

	Yes, National has previewed the NS32532, and we have 
	done that prior to having functional silicon.  We recognize
	that this is a departure from past practices at NSC.  We didn't do this
	as a result of any prior announcements by Mot or AMD.  We did
	it because we believe that the NS32532 is really a revolutionary
	device of sufficient importance to deserve special consideration.

	We have discussed the design of the NS32532 with several
	companies over the past year BUT all of this was under non-disclosure.
	We did this as a fine tuning process to ensure that the part
	was correctly architected.  We believe we have achieved
	our architectural objective.  The design is frozen.  Now
	to ease our discussions with a broader customer base, we have
	gone public.

	Yes, it is true that all we have today are simulated results.
	But we have more than just a simulation model.  We are in
	the final stages of a 5-year effort in designing this
	new addition to the Series 32000 family.  We are running
	extensive test vectors to ensure the integrity of the design
	prior to generating our first mask set.  We are very confident
	that we will have customer samples in the fourth calendar
	quarter of 1987.  What is so impressive about this project
	is that we are STILL ON OUR ORIGINAL SCHEDULE.

	I wish I had more than just simulated results; in a few
	short months, we will.

	For my friend Landon at Amdahl, I would like to say that the
	simulated Dhrystone results we published are real.  Since
	he asked several pointed questions, I'll provide results
	du jour if you wish.

	Let me give you some details ------

	32332 @ 15 MHZ version 1.0 optimized ............ 3943.5
	32332 @ 15 MHZ version 1.1 optimized ............ 3183.0
	32332 @ 15 MHZ version 1.1 no global opt......... 2724.0


	32532 @ 30 MHZ version 1.0 optimized ............ 19800
	32532 @ 30 MHZ version 1.1 optimized ............ 16600
	32532 @ 30 MHZ version 1.1 no global opt ........ 14100

	Yes we have a new set of compilers in the works that among
	other things supports global optimizations.

	I don't care which compilers you use on which version of the
	benchmark, the 532 is more than a 5X improvement over
	today's 332 and more than 11.7X faster than the 32032.

	OK, you say ---- the 532 has a faster clock ------ let's factor that out
	of the equation. 

	      the 32532 is 2.53X the 32332 at the same frequency
	      the 32532 is 3.86X the 32032 at the same frequency
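The clock normalization above can be checked against the posted table. The Python sketch below (the dictionary layout and function names are illustrative only) divides each 32532/32332 Dhrystone ratio by the 30/15 MHz clock ratio. It assumes performance scales linearly with clock, which holds only if memory keeps up. The 32032 figures behind the 3.86X claim are not in the table, so only the 32332 comparison is reproduced here.

```python
# Dhrystones/second as posted in the table above (32532 figures are simulated).
RESULTS = {
    "32332": {"1.0 optimized": 3943.5, "1.1 optimized": 3183.0, "1.1 no global opt": 2724.0},
    "32532": {"1.0 optimized": 19800.0, "1.1 optimized": 16600.0, "1.1 no global opt": 14100.0},
}
CLOCK_MHZ = {"32332": 15.0, "32532": 30.0}


def raw_speedup(new: str, old: str, variant: str) -> float:
    """Ratio of Dhrystone ratings for the same benchmark/compiler variant."""
    return RESULTS[new][variant] / RESULTS[old][variant]


def per_clock_speedup(new: str, old: str, variant: str) -> float:
    """Raw ratio with the clock-frequency ratio divided out.

    Assumes performance is proportional to clock rate (no extra wait states).
    """
    return raw_speedup(new, old, variant) / (CLOCK_MHZ[new] / CLOCK_MHZ[old])


if __name__ == "__main__":
    for variant in RESULTS["32332"]:
        print(f"{variant:<18} raw {raw_speedup('32532', '32332', variant):.2f}X, "
              f"per clock {per_clock_speedup('32532', '32332', variant):.2f}X")
```

Depending on which row is used, this gives roughly 5.0X to 5.2X raw and roughly 2.5X to 2.6X per clock; the quoted 2.53X falls inside that range.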

	Yes, we have made significant architectural changes (hardware only)
	that provide this significant increase.  It's not imaginary,
	and Landon, it isn't marketing hype.  It's fact.  The numbers
	quoted were measured by the designers (members of the compiler
	team included) and NOT created by any of us in Marketing.


	An offer I have for you Landon ------

		       COME ON DOWN

	    I'll give you a load of documentation and we can arrange
	    a time to show you the simulator in action.

	One caveat:

	    I only have the low end simulator running on my system
	    and I don't have the latest tools so I can't show you the
	    listed results but I can get close.  I CAN show you
	    the improvement delta.

chongo@amdahl.UUCP (04/11/87)

In article <4190@nsc.nsc.com> roger@nsc.nsc.com (Roger Thompson) writes:
 >	We are in the final stages of a 5 year effort in designing this
 >	new addition to the Series 32000 family.  We are running
 >	extensive test vectors to ensure the integrity of the design
 >	prior to generating our first mask set.  We are very confident
 >	that we will have customer samples in the fourth calendar
 >	quarter of 1987.  What is so impressive about this project
 >	is that we are STILL ON OUR ORIGINAL SCHEDULE.

I seem to recall that the 32532 (which long ago was known as the 32132
back in the days when the 32016 was the 16032...) was to have been in
production last year.  I am confused about this claim of being on schedule
for 5 years.  Do you mean that the project was re-scheduled sometime since 1982
and has been on that schedule since then?

In any event, good luck.  I look forward to seeing the 32532 exist.

chongo <> /\oo/\
-- 
[views above shouldn't be viewed as Amdahl views, or as views from Amdahl, or
 as Amdahl views views, or as views by Mr. Amdahl, or as views from his house]

jgp@moscom.UUCP (04/11/87)

In article <4190@nsc.nsc.com> roger@nsc.nsc.com (Roger Thompson) writes:
>	32332 @ 15 MHZ version 1.0 optimized ............ 3943.5
>	32332 @ 15 MHZ version 1.1 optimized ............ 3183.0
>	32332 @ 15 MHZ version 1.1 no global opt......... 2724.0
>
>	32532 @ 30 MHZ version 1.0 optimized ............ 19800
>	32532 @ 30 MHZ version 1.1 optimized ............ 16600
>	32532 @ 30 MHZ version 1.1 no global opt ........ 14100
>
>	Yes we have a new set of compilers in the works that among
>	other things supports global optimizations.
Unfortunately the Dhrystone benchmark does not support global
optimizations; it says so right in the instructions.  A good
global optimizer would reduce dhrystone to a few arithmetic
instructions, 2 calls to times() and 1 to write().  As current
global optimizers get better they will approach this ideal by
eliminating more and more code that does nothing in the benchmark
but accomplishes real work in the code the benchmark is supposed
to represent.

The above numbers represent a dhrystone rating of about 14000, not
the 18000 previously quoted.

Of course, the above applies to everyone, I don't mean to single
out nsc.  If you're going to publish a Dhrystone number (or any
benchmark, for that matter), keep the global optimizers away from it.
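To make the point about global optimizers concrete, here is a toy dead-code eliminator in Python. It is hypothetical and not how any particular compiler works: a backward liveness pass over straight-line three-address code drops every instruction whose result is never read and which has no side effect, with calls treated as side-effecting. Applied to a timed region whose results are never used, only the two times() calls and the write() survive.

```python
from typing import List, Optional, Sequence, Tuple, Union

Operand = Union[str, int]
# (destination or None, operation, operands); ops starting with "call:" have side effects
Instr = Tuple[Optional[str], str, Sequence[Operand]]


def eliminate_dead_code(program: List[Instr], live_out: Sequence[str] = ()) -> List[Instr]:
    """Drop instructions whose results are never used (toy dead-code elimination)."""
    live = set(live_out)          # variables whose values are still needed later
    kept: List[Instr] = []
    for dest, op, args in reversed(program):
        has_side_effect = op.startswith("call:")
        if has_side_effect or (dest is not None and dest in live):
            kept.append((dest, op, args))
            live.discard(dest)    # this instruction defines dest
            live.update(a for a in args if isinstance(a, str))
        # else: the result is dead and there is no side effect, so drop it
    kept.reverse()
    return kept


# A Dhrystone-like timed region: the "work" computes values nobody reads.
timed_region: List[Instr] = [
    ("t0", "call:times", []),
    ("a", "add", [1, 2]),
    ("b", "mul", ["a", 3]),
    ("c", "add", ["b", "a"]),
    ("t1", "call:times", []),
    (None, "call:write", ["t0", "t1"]),
]

if __name__ == "__main__":
    for instr in eliminate_dead_code(timed_region):
        print(instr)
```

Passing `live_out=["c"]` keeps all six instructions, because the arithmetic then feeds a value that is used.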
-- 
Jim Prescott	rochester!moscom!jgp

mash@mips.UUCP (04/12/87)

In article <4190@nsc.nsc.com> roger@nsc.nsc.com (Roger Thompson) writes:
>	First off let me introduce myself.  I'm the technical marketing
>	manager for the Series 32000 product line.....
>	Yes, it is true that all we have today are simulated results.
>	But we have more than just a simulation model....
>	For my friend Landon at Amdahl, I would like to say that the
>	simulated dhrystones results we published are real.  Since
>	he asked several pointed questions, I'll provide results
>	du jour if you wish.
>	Let me give you some details ------
>	32332 @ 15 MHZ version 1.0 optimized ............ 3943.5
>	32332 @ 15 MHZ version 1.1 optimized ............ 3183.0
>	32332 @ 15 MHZ version 1.1 no global opt......... 2724.0
>
>	32532 @ 30 MHZ version 1.0 optimized ............ 19800
>	32532 @ 30 MHZ version 1.1 optimized ............ 16600
>	32532 @ 30 MHZ version 1.1 no global opt ........ 14100

Since these were volunteered to the net, let me suggest a few things
that will help credibility and let people assess the reality of these
numbers.  Simulations can be OK: people who scoff at them automatically
are being a little harsh, although the past histories of many vendors
lead to skepticism.  Here are the suggestions:

1) Since 1.0 Dhrystones have been obsolete for over a year, and clearly
labeled as such by every issue of Rick R's postings, is there some
nonobvious reason why 1.0 numbers are included?

2) This posting fixes a problem with an earlier one in this sequence
<417@nsc.nsc.com>, which claimed around 18K simulated Dhrystones:
"Just for interest, simulated Dhrystone performance of around 18,000 with 
on chip physical-address instruction and data caches, and on board demand-
paged MMU..."
Thank you for giving the correct labeling: from these numbers, it appears
that the earlier posting claimed Dhrystone figures that were 1.0 ones,
WITHOUT LABELING THEM AS SUCH.  If this is not true, please correct this
impression.  If it is true, I'd suggest that people in this newsgroup
especially have learned to be pretty skeptical for exactly this kind of
thing.

3) When publishing performance figures of microprocessors, if there is
to be ANY semblance of credibility, you MUST specify the memory system
hooked to the micro as modeled in the simulations, and give other environmental
issues.  Here is a starting list:
	cache size(s)
	data cache nature: write-thru [maybe with write buffers],
		or write-back
	cache line sizes
	number of cycles penalty for refilling the caches, refill nature
	main memory DRAM speed used in the model to achieve the above.
	environment: standalone, or in simulated virtual memory environment,
	including MMU overhead, if any, and something for OS overhead
	[like clock ticks that execute code that trashes the caches
	now and then].
[A bunch of this is very relevant to what real performance one will see
on real benchmarks: Dhrystone is amenable to small caches.]
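The checklist in point 3 could be captured as a record that travels with each published figure. The Python dataclass below is a hypothetical sketch; its field names are illustrative, not any standard reporting format. `missing()` lists the checklist items a posting has left unspecified.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MemorySystemModel:
    """Hypothetical record of the memory-system assumptions behind a simulated benchmark figure."""
    icache_bytes: Optional[int] = None
    dcache_bytes: Optional[int] = None
    dcache_policy: Optional[str] = None          # e.g. "write-through + write buffer", "write-back"
    line_bytes: Optional[int] = None
    refill_penalty_cycles: Optional[int] = None
    refill_nature: Optional[str] = None          # how a line is refilled, e.g. burst or word at a time
    dram_speed_ns: Optional[float] = None
    environment: Optional[str] = None            # "standalone", or VM with MMU and OS overhead

    def missing(self) -> List[str]:
        """Names of checklist items that were not specified."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


if __name__ == "__main__":
    # A posting that gives only a Dhrystone number specifies none of the items.
    print(MemorySystemModel().missing())
```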

4) Once again, just to introduce some reality, what this means is that
the 20MHz parts will, in 4Q87 (more-or-less, for the usual reasons), run at
about 2/3 the numbers shown above.  Hence, we'd hope to see posted to
the list some real machine numbers that, following the usual Dhrystone
rules, rate a 20MHz, unoptimized 1.1 Dhrystone at around 9400
and an optimized one at around 11000.  Hence, given the usual intervals,
maybe in mid-88 we'll see real machine numbers in the 14000 - 16000 range,
which should be respectable, if not spectacular for that time.
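The 2/3 scaling in point 4 is straightforward arithmetic; a Python sketch follows. Like the estimate itself, it assumes the Dhrystone rate is proportional to clock frequency, which ignores any change in wait states between the 30 MHz simulation and a 20 MHz part.

```python
def scale_to_clock(dhrystones: float, from_mhz: float, to_mhz: float) -> float:
    """Scale a Dhrystone rating linearly from one clock rate to another.

    Assumes the memory system keeps up at both clock rates (same wait states).
    """
    return dhrystones * to_mhz / from_mhz


if __name__ == "__main__":
    for label, rating in [("1.1 no global opt", 14100.0), ("1.1 optimized", 16600.0)]:
        print(f"{label:<18} 30 MHz {rating:>7.0f} -> 20 MHz {scale_to_clock(rating, 30, 20):>7.0f}")
```

This gives 9400 and about 11067, in line with the "around 9400" and "around 11000" figures above.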
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

amos@instable.UUCP (04/12/87)

In article <6167@amdahl.UUCP> chongo@amdahl.UUCP (Landon Curt Noll) writes:
>I seem to recall that the 32532 (which long ago was known as the 32132
>back in the days when the 32016 was the 16032...) was to have been in
>production last year.

The 32132 is a dual-cpu chip at the 32032 level, and has been in production
for years. As far as I know it has no connection to the 32532.
I wasn't here during the Great Renumbering, but the next generation CPU
after the 32032 is the 32332 (whatever it was meant to be called B.G.R.),
and it *was* in production last year.
(Apologies to anyone who has seen my previous erroneous posting about this;
it should have been canceled by now.)
-- 
	Amos Shapir
National Semiconductor (Israel)
6 Maskit st. P.O.B. 3007, Herzlia 46104, Israel  Tel. (972)52-522261
amos%nsta@nsc.com {hplabs,pyramid,sun,decwrl} 34.48'E 32.10'N

chongo@amdahl.UUCP (04/13/87)

In reply to: Message-ID: <4193@nsc.nsc.com> by Roger:
 >The demo showed the visual effects of what 3x meant.  Yes you are correct,
 >at that same meeting, both set-ups were running at the same clock frequency.  

The demo did not state that this was a simulation, but rather misled people
into thinking that the 32332 was going to be 3x a 32032.  Why do you think
some people bitched and moaned when the 32332 was showing a performance
increase of less than half that amount?

 >Yes you are correct, at that same meeting, both set-ups were running at
 >the same clock frequency.  Since the demo was I/O bounded anyway, it would 
 >hardly have mattered if the 32332 were actually 10X faster.

True.  The 3x performance gain was obtained by slowing down the 32032.  This
was done by having the 32032 perform extra work (such as non-optimal I/O).
Had there been a big sign on the demo saying "simulated performance", maybe things
would have been less misleading.

This kind of benchmark (showing raw CPU increase) is somewhat comic when you
remember that the selling pitch of "32000 vs 68000" was based on total system
performance, discounting MOT's raw-CPU performance.  But that seems to all
be "water under the melons" now, if you know what I mean.

What am I looking/hoping for?  Well:

   * What kind of system throughput increase will I see in a 32532 vs 32x32?

	Saying that going from junk C compilers to normal C compilers, jumping
	to 15MHz, and switching to a 332 gives you 3x is not the most meaningful
	way to state a performance gain.  For example, if you improve the C compiler on a
	32332, then you can do the same on a 32032 (you're code-compatible, remember?).
	The compiler improvement should be factored out.  Or, if I have XYZns
	rams in my system, increasing the clock rate may likely lead to more
	wait states than more performance.

   * How soon after the 'on-time project' delivers a working 32532 in samples 
     might I see 19800 Dhrystone systems?

   * How has NSC changed the way the 532 is being developed/built in order to
     prevent a repeat of the past problems?

	Like the 16032G to 32016N problems, lack of NSC 32032 demo system,
	confusing statements about the 32332 performance, ...

   * If I go ahead and design a system based on the NS32532, won't I be faced 
     with the problem of building with a chip line that is not as commonly used?

	That may be an unfair/misleading question.  But my point is that
	customers are going to be seeing x86 or 680x0 systems up the WAZooo.
	They might wonder about someone going with a non-common brand...

Let's try to direct this discussion away from finger pointing and more toward
what will be, and how it has been improved.  Also, I hope the NSC folks have 
not been offended at my asking pointed questions.

Best wishes toward building 19800 Dhrystone systems!

chongo <uh-clem?> /\mp/\
-- 
[views above shouldn't be viewed as Amdahl views, or as views from Amdahl, or
 as Amdahl views views, or as views by Mr. Amdahl, or as views from his house]

roger@nsc.UUCP (04/13/87)

> The demo did not state that this was a simulation, but rather misled people
> into thinking that the 32332 was going to be 3x a 32032.  Why do you think
> some people bitched and moaned when the 32332 was showing a performance
> increase of less than half that amount?
> 
May I point out that this demo was used BUT ONCE at the stockholders
meeting.  Not a single observer bitched.  Many asked questions and were
told exactly what it was doing.  Several others, as expected, just didn't
care.  This is quite often the case with financial analysts.  If
my memory serves me right, you were the only one who moaned.  Since you
were not in attendance at the meeting, I'm not sure how you can say "some".
Let me interpret this as "1".  We have many satisfied 32332 customers
today all of whom would be pleased to describe the performance increase
they got.

> 
> This kind of benchmark (showing raw CPU increase) is somewhat comic when you
> remember that the selling pitch of "32000 vs 68000" was based on total system
> performance, discounting MOT's raw-CPU performance.  But that seems to all
> be "water under the melons" now, if you know what I mean.

What you are getting to here is what you could call "useable performance".
I don't think that even you would disagree that it has always been easier
to design a memory sub-system for any 32000 cpu versus the equivalent 68000
cpu.  The 32000 has always allowed the designer more margin and the
ability to use slower and CHEAPER memories.  How much performance loss
per wait state do you get on the 68020?  Or did you pay megabucks for the
fast RAMs?

> What am I looking/hoping for?  Well:
> 
>    * What kind of system throughput increase will I see in a 32532 vs 32x32?
> 
	Let's try "external bus bandwidth":
	    532 @ 30 MHz ......... 96 Mbytes/sec
	    032 @ 10 MHz .........  8 Mbytes/sec
	using the same code as a test case.  This is, as we said before,
	an improvement of 12X.  If that doesn't count and you say that
	the CPU can't sustain that, let me just say that the internal
	hardware has a sustained throughput of 240 Mbytes/sec.  Yes, the number is
	correct.
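The 12X bus-bandwidth figure can be split into a clock component and a per-clock component. This Python sketch uses only the two bandwidths and clock rates quoted above; the decomposition itself is an illustration, not something from the original post.

```python
def bytes_per_clock(mbytes_per_sec: float, clock_mhz: float) -> float:
    """Average bytes transferred per clock cycle (Mbytes/sec divided by MHz)."""
    return mbytes_per_sec / clock_mhz


bw_532, clk_532 = 96.0, 30.0   # 532 @ 30 MHz, as quoted
bw_032, clk_032 = 8.0, 10.0    # 032 @ 10 MHz, as quoted

total_ratio = bw_532 / bw_032                  # overall bandwidth improvement
clock_ratio = clk_532 / clk_032                # contribution of the faster clock
per_clock_ratio = bytes_per_clock(bw_532, clk_532) / bytes_per_clock(bw_032, clk_032)

if __name__ == "__main__":
    print(f"total {total_ratio:.1f}X = clock {clock_ratio:.1f}X * per-clock {per_clock_ratio:.1f}X")
```

On these numbers, 3X of the 12X comes from the clock and 4X from the bus delivering 3.2 bytes per clock instead of 0.8.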

>	32332, then you can do the same on a 32032 (you're code-compatible, remember?).
> 	The compiler improvement should be factored out.  Or, if I have XYZns
> 	rams in my system, increasing the clock rate may likely lead to more
> 	wait states than more performance.

This is a two-part answer.  First, all the performance increase factors I've posted
(i.e., 11.7x or 12x above) don't INCLUDE improvements in compilers.  Second, as it
relates to RAM speed, I'll agree that using slow RAMs may cause a problem
when you integrate new CPUs.  These are trade-offs designers of systems
have to make.  Where does it say that ALL SYSTEMS must be designed with
XYZns RAMs?  Before I leave this subject: to operate at 0 wait states
at 30 MHz, the address-to-ready time is 49 nsec.  Not bad.  How much
will you get for a 20 MHz 030 when it comes out?  What's more, if you
have wait states (which in many cases is likely), you will see about 3%
performance degradation per wait state, also not bad if you compare
that to what happens today on the 020 and other processors.
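The timing figures above can be turned into a rough wait-state estimate. The Python sketch below assumes (this is not stated in the post) that each wait state stretches the bus cycle by exactly one 30 MHz clock, and it applies the quoted "about 3% per wait state" as a simple linear model. The function names and the example memory timings are hypothetical.

```python
import math

CLOCK_MHZ = 30.0
ZERO_WAIT_ADDR_TO_READY_NS = 49.0   # quoted above for 0 wait states at 30 MHz
LOSS_PER_WAIT_STATE = 0.03          # quoted "about 3%" per wait state


def clock_period_ns(mhz: float) -> float:
    return 1000.0 / mhz


def wait_states_needed(memory_ns: float, mhz: float = CLOCK_MHZ) -> int:
    """Wait states for a memory system needing memory_ns from address to ready.

    Assumption: each wait state adds exactly one clock period of time.
    """
    shortfall = memory_ns - ZERO_WAIT_ADDR_TO_READY_NS
    if shortfall <= 0:
        return 0
    return math.ceil(shortfall / clock_period_ns(mhz))


def relative_performance(wait_states: int) -> float:
    """Fraction of 0-wait-state performance under a linear 3%-per-wait-state model."""
    return max(0.0, 1.0 - LOSS_PER_WAIT_STATE * wait_states)


if __name__ == "__main__":
    print(f"clock period at {CLOCK_MHZ:.0f} MHz: {clock_period_ns(CLOCK_MHZ):.1f} ns")
    for mem_ns in (45.0, 80.0, 120.0):   # hypothetical address-to-ready times
        ws = wait_states_needed(mem_ns)
        print(f"{mem_ns:5.0f} ns memory: {ws} wait state(s), ~{relative_performance(ws):.0%} of peak")
```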

>    * How soon after the 'on-time project' delivers a working 32532 in samples 
>      might I see 19800 Dhrystone systems?

30 MHZ parts are due out the first half of CY88.

>    * How has NSC changed the way the 532 is being developed/built in order to
>      prevent a repeat of the past problems?

The original NS16032 (using the old numbers) was designed virtually by hand.
There were very few simulation tools available at the time.  As was done on
the 32332 and the 32382 (both of which are in full production), the design
of the 32532 was done with very sophisticated software tools.   

We have a complete VME-based demo system in design today and it is scheduled
out simultaneously with the silicon.  We will not sample the 532 until it
works on our own hardware.

roger@nsc.UUCP (04/13/87)

In article <951@moscom.UUCP>, jgp@moscom.UUCP (Jim Prescott) writes:

> Unfortunately the Dhrystone benchmark does not support global
> optimizations; it says so right in the instructions.  A good
> global optimizer would reduce dhrystone to a few arithmetic
> instructions, 2 calls to times() and 1 to write().  As current
> global optimizers get better they will approach this ideal by
> eliminating more and more code that does nothing in the benchmark
> but accomplishes real work in the code the benchmark is supposed
> to represent.
> 

We provided a range of numbers, in an effort to allow people to
make up their own minds. Yes, it is true that the dhrystone benchmark
as well as others used today can be defeated by good compilers.  BUT
in the end application, which would you prefer to use?


Roger Thompson

roger@nsc.nsc.com (Roger Thompson) (04/13/87)

In article <278@winchester.mips.UUCP>, mash@mips.UUCP (John Mashey) writes:
> 3) When publishing performance figures of microprocessors, if there is
> to be ANY semblance of credibility, you MUST specify the memory system
> hooked to the micro as modeled in the simulations, and give other environmental
> issues.  Here is a starting list:
> 	cache size(s)
> 	data cache nature: write-thru [maybe with write buffers],
> 		or write-back
> 	cache line sizes
> 	number of cycles penalty for refilling the caches, refill nature
> 	main memory DRAM speed used in the model to achieve the above.
> 	environment: standalone, or in simulated virtual memory environment,
> 	including MMU overhead, if any, and something for OS overhead
> 	[like clock ticks that execute code that trashes the caches
> 	now and then].
> [A bunch of this is very relevant to what real performance one will see
> on real benchmarks: Dhrystone is amenable to small caches.]

I may get corrected on this shortly.  I believe that the simulations were
done with the assumptions that external memory was all 0 wait state.

However, in support of this let me say that we have a parallel design effort
on a VME board pair that supports a 64K direct-mapped cache,
write-through with a write buffer.  The cache is organized as 4096 lines of
16 bytes.  The board also has 4 Meg of on-board ram.

In this configuration, our simulations show that the board with the 532
will be able to sustain a performance of approximately 95% of the
full rated performance of the 532.  Not bad, and one doesn't need to
pay mega bucks for the RAM.
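The board cache described above (64K, direct mapped, 4096 lines of 16 bytes, write-through with a write buffer) can be modelled in a few lines of Python. This is a sketch under stated assumptions: 32-bit physical addresses, and no-write-allocate on a write miss, which the post does not specify. The write buffer is reduced to a counter of writes passed on to memory.

```python
LINE_BYTES = 16
NUM_LINES = 4096
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 4 bits select a byte within a line
INDEX_BITS = NUM_LINES.bit_length() - 1     # 12 bits select one of 4096 lines
ADDRESS_MASK = 0xFFFFFFFF                   # assumed 32-bit physical addresses


def split_address(addr: int) -> tuple:
    """Return (tag, index, offset) for a physical address."""
    addr &= ADDRESS_MASK
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DirectMappedWriteThroughCache:
    def __init__(self) -> None:
        self.tags = [None] * NUM_LINES   # None marks an invalid line
        self.hits = 0
        self.misses = 0
        self.memory_writes = 0           # writes passed on to memory via the write buffer

    def read(self, addr: int) -> bool:
        """Look up addr; on a miss, refill the whole 16-byte line. Returns True on a hit."""
        tag, index, _ = split_address(addr)
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag
        return False

    def write(self, addr: int) -> bool:
        """Write-through: memory is always written; returns True if the line was cached.

        Assumption: no-write-allocate, so a write miss does not fill the line.
        """
        self.memory_writes += 1
        tag, index, _ = split_address(addr)
        return self.tags[index] == tag


if __name__ == "__main__":
    cache = DirectMappedWriteThroughCache()
    for a in (0x0000, 0x0004, 0x10000, 0x0000):   # 0x10000 maps to the same line as 0x0000
        cache.read(a)
    print(f"hits={cache.hits} misses={cache.misses}")
```

The example shows the usual direct-mapped trade-off: 0x0004 hits because it shares a line with 0x0000, while 0x10000 (64K away) evicts that line and forces 0x0000 to miss again.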

Roger Thompson

clif@intelca.UUCP (04/13/87)

> In article <4190@nsc.nsc.com> roger@nsc.nsc.com (Roger Thompson) writes:
> >	32332 @ 15 MHZ version 1.0 optimized ............ 3943.5
> >	32332 @ 15 MHZ version 1.1 optimized ............ 3183.0
> >	32332 @ 15 MHZ version 1.1 no global opt......... 2724.0
> >
> >	32532 @ 30 MHZ version 1.0 optimized ............ 19800
> >	32532 @ 30 MHZ version 1.1 optimized ............ 16600
> >	32532 @ 30 MHZ version 1.1 no global opt ........ 14100
> >
> >	Yes we have a new set of compilers in the works that among
> >	other things supports global optimizations.
> Unfortunately the Dhrystone benchmark does not support global
> optimizations; it says so right in the instructions.  A good
> global optimizer would reduce dhrystone to a few arithmetic
> instructions, 2 calls to times() and 1 to write().  As current
> global optimizers get better they will approach this ideal by
> eliminating more and more code that does nothing in the benchmark
> but accomplishes real work in the code the benchmark is supposed
> to represent.
> 
> The above numbers represent a dhrystone rating of about 14000, not
> the 18000 previously quoted.
> 
> Of course, the above applies to everyone, I don't mean to single
> out nsc.  If you're going to publish a Dhrystone number (or any
> benchmark, for that matter), keep the global optimizers away from it.
> -- 
> Jim Prescott	rochester!moscom!jgp

While I agree that using a global optimizing compiler is not exactly
kosher for the Dhrystone benchmark, it is sometimes necessary.  For
instance, the GreenHills C compiler is a globally optimizing compiler
which generates good Dhrystone numbers for many architectures, including
the 80386 and 68020.  Unfortunately, I cannot find a compiler
switch to turn off the global optimizer.  This leaves me with two choices:
post the numbers with the caveat that this is a global optimizing compiler,
or use the results of a mediocre compiler like CC.  I don't really
think that global optimization is a problem as long as it is clearly
labeled.
-- 
Clif Purkiser, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,idi,omsvax}!intelca!clif

These views are my own property.  However anyone who wants them can have 
them for a nominal fee.
	

chongo@amdahl.UUCP (04/14/87)

In article <4196@nsc.nsc.com> roger@nsc.nsc.com (Roger Thompson) writes:
 >May I point out that this demo was used BUT ONCE at the stock holders
 >meeting.  Not a single observer bitched.  Many asked questions and were
 >told exactly what it was doing.  Several others as expected just didn't
 >care.  This is quite often the case with financial analysts.

This demo system made the rounds.  The Microterm terminal I used at work
had its delivery to my desk delayed for several weeks because it was the 
terminal used with the 32332.  The demo ran so long and was run so much that 
the tower of Hanoi was burned into the phosphor of the screen.  This rigged 
demo was not used more than once.

 >If my memory serves me right, you were the only one who moaned.  Since you
 >were not in attendance at the meeting, I'm not sure how you can say "some".
 
I am a stockholder.  (I even cast my vote for Mike Puckett as I recall :-})

 >Let me interpret this as "1".  We have many satisfied 32332 customers
 >today all of whom would be pleased to describe the performance increase
 >they got.

How many systems use the 32332?  Where can I evaluate (other than a VR32
for reasons that I will go into via Email if you care to ask) such a system?
(assuming it is in the price/performance range I am looking for)

 >The 32000 has always allowed the designer more margin and the
 >ability to use slower and CHEAPER memories.  How much performance loss
 >per wait state do you get on the 68020?  Or did you pay megabucks for the
 >fast RAMs?

Why does the 68020 outsell the 32032, I wonder?  Maybe designers of 68020
systems don't know something?  You're in marketing; perhaps you understand
such issues.  Please explain!

 >> What am I looking/hoping for?  Well:

Thanx for your answers.

chongo <> /\oo/\
-- 
[views above shouldn't be viewed as Amdahl views, or as views from Amdahl, or
 as Amdahl views views, or as views by Mr. Amdahl, or as views from his house]

roger@nsc.UUCP (04/14/87)

In article <6237@amdahl.UUCP>, chongo@amdahl.UUCP (Landon Curt Noll) writes:

> This demo system made the rounds.  The Microterm terminal I used at work
> had its delivery to my desk delayed for several weeks because it was the 
> terminal used with the 32332.  The demo ran so long and was run so much that 
> the tower of Hanoi was burned into the phosphor of the screen.  This rigged 
> demo was not used more than once.

I humbly stand corrected.  You are correct in one respect.  The demo was used
twice ---- so sorry.  The second usage was for exactly the same purpose
at a meeting to introduce the 32332 in Europe.  It went without your
terminal, so I can't account for that.  How do I know that?  Well, you
see, your terminal at the time was not convertible to 220 volts.  Yes,
your terminal was used a lot in checking out the set-up.

>  
> I am a stockholder.  (I even cast my vote for Mike Puckett as I recall :-})
> 

Yes, age is creeping up on me; I did make one error, so I may be wrong
again, but as best as I can remember there weren't any adjustments to
the vote tally at the meeting, which means you must have mailed in your
proxy.  In any case, casting a vote for someone else is slightly ---- should
I say ------ well, hard to do unless that other person has signed over
his stock to you for voting purposes.  If that is the case ---- fine.

> How many systems use the 32332?  Where can I evaluate (other than a VR32
> for reasons that I will go into via Email if you care to ask) such a system?
> (assuming it is in the price/performance range I am looking for)

The following people have systems in a demonstrable fashion.  All are just
going into production.  Design on these projects all started last summer,
not due to any CPU-related issues but because that was when we had
our FIRST (and only) rev of the new 32382 MMU.

		       Encore, Opus, Siemens,
		       Definicon, Whitechapel,
		       and our own ISG (i.e., Portland)

 I selected the above since they all run some derivative of *nix,
 all quite nice.  Oh, by the way, Siemens shipped more Series 32000
 Unix boxes over the last 9 months than your favorite workstation
 supplier, and so did Opus.  Encore is making good inroads in providing
 more computes than your basic VAX at a very, very attractive price.

			Both Encore and ISG were
			at Uniforum as best as 
                        I can remember.

Why do so many people like the 68020????? Certainly not because of
their MMU.

Enough on that.  It's too bad you have chosen to discuss only the
past, with little interest in the 32532, which is what brought up
the above issues.

     ---- Roger

gemini@homxb.UUCP (Rick Richardson) (04/14/87)

In article <2577@intelca.UUCP>, clif@intelca.UUCP (Clif Purkiser) writes:
> While I agree that using a global optimizing compiler is not exactly
> kosher for the Dhrystone benchmark, it is sometimes necessary.  For
> instance: the GreenHills C compiler is a globally optimizing compiler
> which generates good Dhrystone numbers for many architectures including
> the 80386 and 68020.	 Unfortunately, I can not find a compiler
> switch to turn off the global optimizer.

Is this true?  I have many results using the GreenHills compiler which
are not marked as having a global optimizer turned on.  Are you sure
there's no switch to turn it off?  

I've been watching these benchmark wars for a while now, and frankly, I'm
a little upset that I put myself in the middle as referee.  With all these
new chips and super hot compilers, I've lost confidence in the validity
of many of the results that have been sent to me.  I used to get
results from Joe Engineer; now I'm getting them from Montague F. Salesman.
And with the way the optimizing technology is taking off, I expect to
see a Commodore 64 reporting 50,000 dhrystones by year's end :-).

Here's the plea: turn off the optimizer and send the results marked
no opt, turn on the optimizer and send the results w/<LEVEL> opt, where
<LEVEL> indicates peephole, global, read-programmers-mind, or whatever.
If it can't be turned off, say so.  Meanwhile, any advice on modifying
the Dhrystone for version 1.2 such that a global optimizer won't be
able to remove anything will be appreciated.

And remember, folks, that a test of compiler A/machine A versus
compiler A/machine B is valid.  Compiler A/machine A versus
compiler B/machine A is also valid.  Compiler A/machine A versus
compiler B/machine B is probably invalid if seeking the truth
about machine A/B's relative power, but may be valid if your
goal is to put 147 users on the machine whose only use is to
run the benchmark.
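The rule about which comparisons are valid amounts to changing one variable at a time. A small Python sketch with hypothetical names; it treats the optimization level as part of the compiler setting, in the spirit of the labeling plea above.

```python
from typing import NamedTuple


class BenchmarkRun(NamedTuple):
    machine: str
    compiler: str
    opt_level: str   # "no opt", "peephole", "global", ... as labelled by the submitter


def comparison_kind(a: BenchmarkRun, b: BenchmarkRun) -> str:
    """Classify what a comparison of two runs can tell you."""
    same_machine = a.machine == b.machine
    same_compiler = (a.compiler, a.opt_level) == (b.compiler, b.opt_level)
    if same_compiler and not same_machine:
        return "machine comparison"    # compiler A/machine A vs compiler A/machine B
    if same_machine and not same_compiler:
        return "compiler comparison"   # compiler A/machine A vs compiler B/machine A
    if same_machine and same_compiler:
        return "repeat run"
    return "confounded"                # both changed: machine power is not isolated


if __name__ == "__main__":
    a = BenchmarkRun("machine A", "compiler A", "no opt")
    b = BenchmarkRun("machine B", "compiler A", "no opt")
    c = BenchmarkRun("machine B", "compiler B", "global")
    print(comparison_kind(a, b), comparison_kind(a, c), sep="; ")
```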


Rick Richardson, PC Research, Inc: (201) 922-1134  ..!ihnp4!castor!pcrat!rick
	         when at AT&T-CPL: (201) 834-1378  ..!ihnp4!castor!polux!rer

[I know those compiler writers...they LOVE to change things]

mcvoy@rsch.WISC.EDU (Larry McVoy) (04/14/87)

(Roger Thompson) writes:
> all quite nice.  Oh, by the way, Siemens shipped more Series 32000
> Unix boxes over the last 9 months than your favorite workstation
> supplier, so did Opus.  

How many, how much $$, running what version of Unix, with what sort of
peripherals?

--larry

P.S.  Roger - I sent you mail requesting the widely touted documentation.
      Did you lose that mail?  Or are you all talk and no action?
-- 
Larry McVoy 	        mcvoy@rsch.wisc.edu  or  uwvax!mcvoy

"It's a joke, son! I say, I say, a Joke!!"  --Foghorn Leghorn

terryl@tekcrl.UUCP (04/15/87)

In article <219@homxb.UUCP> gemini@homxb.UUCP (Rick Richardson) writes:
+In article <2577@intelca.UUCP>, clif@intelca.UUCP (Clif Purkiser) writes:
+> While I agree that using a global optimizing compiler is not exactly
+> kosher for the Dhrystone benchmark, it is sometimes necessary.  For
+> instance: the GreenHills C compiler is a globally optimizing compiler
+> which generates good Dhrystone numbers for many architectures including
+> the 80386 and 68020.	 Unfortunately, I can not find a compiler
+> switch to turn off the global optimizer.
+
+Is this true?  I have many results using the GreenHills compiler which
+are not marked as having a global optimizer turned on.  Are you sure
+there's no switch to turn it off?  

     Well, yes and no. Since Greenhills supports symbolic debugging (at
least for a 68020 system), with the "-g" option, Mr. Richardson is right.
BUT, barring that, the only option I could find to "turn off" optimization
is the -X9 option, which is to disable the local(peephole) optimizer. So
Mr. Purkiser is right.

     Actually, there are a couple of more options to disable optimization,
like not moving frequently used procedure and data addresses into registers,
but I doubt that would cause much of a discrepancy.

tim@amdcad.UUCP (04/15/87)

In article <219@homxb.UUCP> gemini@homxb.UUCP (Rick Richardson) writes:
>                               ...  Meanwhile, any advice on modifying
>the Dhrystone for version 1.2 such that a global optimizer won't be
>able to remove anything will be appreciated.

One easy fix to Dhrystone is to package it as two (or more) separate C
files which must be compiled separately, then linked together.  This
will prevent global optimizers from looking at the entire program at one
whack, and is much more realistic (i.e.  most real-life programs are
made up of many files.) One question would be how to partition the
procedures and data declarations over multiple files realistically.  I
myself tend to use an "object oriented" approach, where data
declarations and the procedures to operate on said data exist in their
own file, and the data declarations tend to be "static" -- only visible
to the procedures within that file. 

	-- Tim Olson
	Advanced Micro Devices
	(tim@amdcad.AMD.COM)
 

clif@intelca.UUCP (04/15/87)

> In article <2577@intelca.UUCP>, clif@intelca.UUCP (Clif Purkiser) writes:
> > While, I agree that using a global optimizing compiler is not exactly
> > kosher for the dhrystone benchmark it sometimes neccessary.  For 
> > instance: the GreenHills C compiler is a globally optimizing compiler
> > which generates good Dhrystone numbers for many architectures including
> > the 80386 and 68020.	 Unfortunately, I can not find a compiler
> > switch to turn off the global optimizer.
> 
> Is this true?  I have many results using the GreenHills compiler which
> are not marked as having a global optimizer turned on.  Are you sure
> there's no switch to turn it off?  
> 
> I've been watching these benchmark wars for awhile now, and frankly, I'm
> a little upset that I put myself in the middle as referee.  With all these
> new chips and super hot compilers, I've lost confidence in the validity
> of many of the results that have been sent to me.  I used to get
> results from Joe Engineer; now I'm getting them from Montague F. Salesman.
> And with the way the optimizing technology is taking off, I expect to
> see a Commodore 64 reporting 50,000 dhrystones by years end :-).
> 
> Here's the plea: turn off the optimizer and send the results marked
> no opt, turn on the optimizer and send the results w/<LEVEL> opt, where
> <LEVEL> indicates peephole, global, read-programmers-mind, or whatever.
> If it can't be turned off, say so.  Meanwhile, any advice on modifying
> the Dhrystone for version 1.2 such that a global optimizer won't be
> able to remove anything will be appreciated.
> 
> And remember, folks, that a test of compiler A/machine A versus
> compiler A/machine B is valid.  Compiler A/machine A versus
> compiler B/machine A is also valid.  Compiler A/machine A versus
> compiler B/machine B is probably invalid if seeking the truth
> about machine A/B's relative power, but may be valid if your
> goal is to put 147 users on the machine whose only use is to
> run the benchmark.
> 
> 
> Rick Richardson, PC Research, Inc: (201) 922-1134  ..!ihnp4!castor!pcrat!rick
> 	         when at AT&T-CPL: (201) 834-1378  ..!ihnp4!castor!polux!rer
> 
> [I know those compiler writers...they LOVE to change things]

On the GreenHills compiler there is little difference between the -O, -O2,
reg, and noreg options.  I believe that the Green Hills C 386 compiler uses the
same basic switches for all architectures, so I suspect that ALL results using
GHS are global optimizing.  I am looking forward to being told that I am
incorrect about GHS.

Rick, I understand that you are not thrilled about refereeing benchmark.war 
but look at the good side:

	The Dhrystone benchmark is better (i.e. larger and closer to
modeling an application) than most other benchmarks.

	It is standardized (unlike the EDN benchmarks).

	It seems to be accepted by most of the uP vendors
(AMD, Fairchild, Intel, Motorola and National), and there exists
a very large number of results which seem to be ordered properly
(i.e. CRAYs are faster than Commodore 64s).

	I think that the work we have done on Usenet has been helpful
in providing information to engineers trying to understand the
performance claims of computer vendors.

	The only true benchmark is your application, but the Dhrystone
serves as an important data point.


-- 
Clif Purkiser, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,idi,omsvax}!intelca!clif

These views are my own property.  However anyone who wants them can have 
them for a nominal fee.
	

gnu@hoptoad.UUCP (04/17/87)

I must say that Roger Thompson's statement "We provided a range of
numbers, in an effort to allow people to make up their own minds." is a
marvel even in the bizarre world of chip marketing.

Sure -- they mention 19000 dhrystones, and 14000 dhrystones, and a
number in the middle -- people can make up their own mind.  It's just
that the 19000 number was obtained by running the invalid version 1.0
benchmark in violation of its instructions to turn off optimization --
instructions that exist so that all systems are compared under the
same conditions.  But you can make up your own mind.

I say keep at 'em Landon -- finally you are in a position where they
can't gag you when you point out their, uh, inappropriate marketing
practices.  (In case the audience doesn't know, Landon used to work
for National and got into trouble with their management for telling
good customers the truth about the parts when it conflicted with the story
NSC management wanted people to believe.)

May the best chip succeed, despite its marketing people!
-- 
Copyright 1987 John Gilmore; you can redistribute only if your recipients can.
(This is an effort to bend Stargate to work with Usenet, not against it.)
{sun,ptsfa,lll-crg,ihnp4,ucbvax}!hoptoad!gnu	       gnu@ingres.berkeley.edu

roger@nsc.nsc.com (Roger Thompson) (04/17/87)

In article <3456@rsch.WISC.EDU>, mcvoy@rsch.WISC.EDU (Larry McVoy) writes:
> 
> P.S.  Roger - I sent you mail requesting the widely touted documentation.
>       Did you lose that mail?  Or are you all talk and no action?
> -- 
> Larry McVoy 	        mcvoy@rsch.wisc.edu  or  uwvax!mcvoy

Larry --- I sent you mail asking for your address, but as yet I
haven't gotten a reply.  Possibly the mail system is holding it hostage.
If you still want a lit package, post your address and I'll send it.

		    Sorry for the delay
		     
		     Roger