[net.micro.ns32k] 32332: how much faster than the 32032 ?

ehj@mordor.ARPA (Eric H Jensen) (09/09/86)

Maybe this subject has been discussed here before, but ...

Given the same clock speeds how much faster is the 32332 than the
32032?  What are the significant contributors to the increased speed?
I expect the answer to this last question to mention "better
micro-code".  I have heard that the micro-code for the 32032 is very
poor, can someone comment on this?
-- 
eric h. jensen        (S1 Project @ Lawrence Livermore National Laboratory)
Phone: (415) 423-0229 USMail: LLNL, P.O. Box 5503, L-276, Livermore, Ca., 94550
ARPA:  ehj@angband    UUCP:   ...!decvax!decwrl!mordor!angband!ehj

chongo@amdahl.UUCP (Landon Curt Noll) (09/11/86)

In article <15218@mordor.ARPA> ehj@mordor.UUCP (Eric H Jensen) writes:
 >Given the same clock speeds how much faster is the 32332 than the
 >32032?  What are the significant contributors to the increased speed?

I can't understand why the folks at nsc have remained silent on this matter,
unless their system is down or they didn't get this message, or ...

They forced me to sign one of them 'will not disclose' papers, so I can't
give you details of what I have measured.  I can note that the statements
have ranged from >3x all the way down to 'slightly slower in some cases'.

Be careful what you read, even from the above.  Sometimes the statements
are just plain BOGUS, sometimes they are misleading and sometimes they
are close to the truth.

The first case can come up when someone wants to prop sales up, or when a
customer is upset and wants to damage the reputation of the chip.
Sad to say, but I have seen this happen with all major chip firms.

The second case can be for a few reasons:

	* the systems do not have the same number of wait states/memory speed
	* one system has 'burst mode' memory and another does not
	* the systems were running with different peripherals or kernels
	* one system has local memory, another has memory over a slower bus
	* one system used mmu X, another used mmu Y, another no mmu!
	* the compiler generates better code
	  [in the case of the 32016 vs. 32032, a better compiler will show
	   a 32016 to be closer to a 32032.  Why?  The more you stay within
	   CPU Reg's, the more the 16 bit bus does not detract from the 32016.
	   Even so, a better compiler will make BOTH run faster.  It just
	   helps the 32016 a bit more]
	  [And remember that the 32332 can operate over a 32, 16, or 8 bit bus.]
	* a benchmark uses a given instruction or addressing mode which is
	  faster/slower on one chip
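
The compiler point in the list above (staying in registers narrows the gap
between a 16 bit and a 32 bit bus) can be illustrated with a toy cycle-count
model.  This is only a sketch: the function names, the cycle counts, and the
assumption that a 32 bit data transfer costs one bus access on a 32 bit bus
and two on a 16 bit bus are all hypothetical, and the model ignores
instruction fetch, prefetch queues, wait states and the mmu.

```python
# Toy model only: every number here is hypothetical, not a measurement
# of any real 32016 or 32032 system.
CYCLES_PER_BUS_ACCESS = 4  # assumed cost of one bus access, in clock cycles


def run_cycles(compute_cycles, transfers_32bit, bus_bits):
    """Estimate total cycles for a program that moves 32-bit operands."""
    # 1 access per transfer on a 32-bit bus, 2 on a 16-bit bus
    accesses_per_transfer = 32 // bus_bits
    return compute_cycles + transfers_32bit * accesses_per_transfer * CYCLES_PER_BUS_ACCESS


def slowdown_16_vs_32(compute_cycles, transfers_32bit):
    """How many times slower the 16-bit-bus part is than the 32-bit-bus part."""
    return (run_cycles(compute_cycles, transfers_32bit, 16)
            / run_cycles(compute_cycles, transfers_32bit, 32))


# Same work in both cases, but the "better" compiler keeps more values in
# registers, so it issues fewer memory transfers (hypothetical counts).
naive = slowdown_16_vs_32(compute_cycles=1000, transfers_32bit=500)
better = slowdown_16_vs_32(compute_cycles=1000, transfers_32bit=100)

print(f"naive compiler:  16-bit bus is {naive:.2f}x slower")
print(f"better compiler: 16-bit bus is {better:.2f}x slower")
```

With these made-up counts both parts get faster under the better compiler,
and the 16 bit penalty shrinks, which is the effect described above.  Real
ratios depend on everything else in the list.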

The third case is worth talking about:

	Say you do find a benchmark that is honest.  What does it mean
	for you?  Will your 32332 based system have a greater throughput?
	Will you be able to solve a problem quicker or cheaper?

In general, I have seen a 32332 system running faster than a 32032 system
in my post-nsc employment days.  A number of factors contributed, one
of which was the cpu.

The 32332 runs Unix.  It has had MUCH, MUCH, MUCH fewer stated problems than
the 32016 did way back when it first came out.  The 32332 has a number of
performance/architecture advantages over the 32032 and 32016.

In short the 32332 is a *YUMMO* part.

chongo <for a 32332 vs. 68020 discussion, see net.flame> /\oo/\

	[This is not an Amdahl or NSC official statement]

thomson@utai.UUCP (Brian Thomson) (09/11/86)

We don't have 32032s here, but we have compared 32016s vs. 32332s
running 4.2bsd with a Fuji Eagle and have found the 10 MHz 332 to be about
twice the speed of the 016, or roughly 1.2 VAX 780s for CPU-intensive
non floating-point benchmarks.
-- 
			Brian Thomson,	    CSRG Univ. of Toronto
			{linus,ihnp4,uw-beaver,floyd,utzoo}!utcsrgv!thomson

henry@utzoo.UUCP (Henry Spencer) (09/11/86)

> ...In short the 32332 is a *YUMMO* part.

In fact, it's what the 32032 *should* have been.  If National had been
shipping this at 32032 time, the 32000 series would have been a roaring
success.  As it is, I fear the 32332 is too little too late.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry

curry@nsc.UUCP (Ray Curry) (09/12/86)

Sorry that this response was slow but we have had problems with our news
program and mail.

How much faster is the NS32332 than the NS32032?  Like any answer concerning
computers, it depends upon what environment and what tasks you're talking
about.  First, where does the speed come from?  

National obtained roughly 3 times the performance for the NS32332 based 
upon better architecture, faster clock, and better compiler technology.  
For the purpose of this discussion, let's stick to the first.  Better
microcode, faster 32 bit address calculation, better interrupt latency,
and better hardware interface (burst mode memory) are the factors.

As to the results, presently we have measured the following performance
increase for the 32332 compared to the 32032 running the same code and same 
clock (10MHz).

	dhrystone	1.3x
	sieve		1.4x
	puzzle		1.6x
	EDN's		1.25x

Overall I am seeing an average of about 50% faster, with some benchmarks 
as low as 20% and some as much as 90% faster.  The state of optimization 
impacts the performance difference with less optimized code giving greater 
difference as might be expected.  National is working on compilers supporting
faster code by optimizing the instructions used and by using registers 
more intelligently.  Nicknamed CTP for compiler technologies program, 
the compilers are a part of the move to UNIX V. They are due to be released
later this year and I have used both Fortran 77 and C.  I am still working 
on full characterization with a very large number of benchmarks, but in 
general with the new CTP compilers and the higher clock rate (15MHz), the 
overall improvement is pretty dramatic.  
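
As a rough cross-check, the four same-clock ratios listed above can be
summarized with an arithmetic and a geometric mean.  This is a minimal
sketch; the variable names are made up for illustration, and it uses only
the four figures quoted in this post, not the larger benchmark set behind
the "about 50%" overall figure.

```python
import math

# Same-clock (10MHz) 32332 vs. 32032 ratios quoted in the table above.
speedups = {"dhrystone": 1.3, "sieve": 1.4, "puzzle": 1.6, "EDN": 1.25}

values = list(speedups.values())

# The arithmetic mean is the usual "average"; the geometric mean is often
# preferred for ratios because it does not depend on which machine is
# treated as the baseline.
arithmetic_mean = sum(values) / len(values)
geometric_mean = math.prod(values) ** (1.0 / len(values))

print(f"arithmetic mean speedup: {arithmetic_mean:.2f}x")
print(f"geometric mean speedup:  {geometric_mean:.2f}x")
print(f"range: {min(values):.2f}x to {max(values):.2f}x")
```

Either mean for these four comes out a little under 1.4x, so the "about 50%"
overall figure presumably reflects benchmarks beyond the four listed.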

The dhrystones have measured at 2800 dhry/sec on a DB332 board at 15MHz
with code generated by the NSC CTP compiler on the Dhrystone Version 1.1.
The board and the V.2 Unix are production released but the compiler is still 
in QA.  This compares to 855 on the 032 with older compilers.  Some of our
customers, such as Sequent, have done some optimizations and got slightly
higher dhrystone numbers (up to around 1200-1300).
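
The dhrystone figures above can be split into rough contributions.  This is
only a back-of-the-envelope sketch with made-up variable names.  It ASSUMES
the 855 figure was measured at 10MHz (the post does not say), that the 1.3x
same-clock dhrystone ratio quoted earlier applies, and that the clock,
architecture and compiler factors simply multiply.

```python
# Figures quoted in the post.
dhry_332 = 2800.0  # dhry/sec, DB332 at 15MHz, CTP compiler
dhry_032 = 855.0   # dhry/sec, 032 with older compilers

total_ratio = dhry_332 / dhry_032

# ASSUMPTION: the 032 number was taken at 10MHz; the post does not state it.
clock_factor = 15.0 / 10.0

# Same-clock, same-code dhrystone ratio quoted earlier in the post.
arch_factor = 1.3

# Whatever is left over is attributed to the compiler.  This is a residual,
# not a measurement, and it is only valid if the factors are independent.
compiler_residual = total_ratio / (clock_factor * arch_factor)

print(f"total:             {total_ratio:.2f}x")
print(f"clock (assumed):   {clock_factor:.2f}x")
print(f"architecture:      {arch_factor:.2f}x")
print(f"compiler residual: {compiler_residual:.2f}x")
```

Under those assumptions the total is a bit over 3x, in line with the "roughly
3 times" figure, with the compiler residual being the largest single factor.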

Floating point intensive programs show less improvement because the 32081 
is still being used, but even so there is some improvement because of the 
non-floating point instruction mix and the higher clock rate (15MHz).  The 
32081 is slower because of the older 16 bit slave protocol used but still 
runs competitively in many floating point benchmarks like LINPACK.
Whetstones run about 25% faster on the 332 at the same clock frequency with 
the 081.  Early next year, there will be a NS32381 FPU that uses the 32 bit
slave protocol available on the 32332 and some instruction improvements.  
Whetstone performance will of course climb when the new FPU is available
with the expected increase to be about 2 to 1.  As is NSC's policy on the
NS32000 family, the 32381 will be code compatible with the 32081.

chongo@amdahl.UUCP (Landon Curt Noll) (09/18/86)

In article <7115@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
 >> ...In short the 32332 is a *YUMMO* part.
 >
 >In fact, it's what the 32032 *should* have been.  If National had been
 >shipping this at 32032 time, the 32000 series would have been a roaring
 >success.  As it is, I fear the 32332 is too little too late.

How True!  (note my comment was without reference to both time and/or other
more *YUMMO* parts)  So, will the 32532 suffer from the same problem?

chongo <I have my guess already> /\oo/\