[comp.arch] 80960CA v 68040 comparative benchmark

srghgcp@windy.dsir.govt.nz (10/10/90)

We are in the business of designing high performance data communications
servers, and are about to embark on new development work with FDDI. We
have been evaluating two processors (Intel 80960CA and Motorola 68040)
as compute engines. We currently use the 68020. We have completed
benchmarking the 80960CA, with the following results:

The results were obtained using Dhrystone 2.1 (we only want integer
performance), compiled under Intel iC960 version 2.0. The target is the
Intel SDM evaluation board running a 25MHz processor. The hardware can
handle a 33MHz CPU when Intel manage to make one. The board runs zero
wait state (WS) pipelined reads in SRAM and one WS writes. Both reads and
writes are burst mode. The DRAM runs 3-1-1-1 reads and 2-1-1-1 WS writes.
The compiler does not do any clever re-scheduling of instructions.
The internal instruction cache is on, and up to 6 local register sets can
be internally cached.

Running in SRAM, optimise=default=1 ......21916
Running in DRAM, optimise=default=1 ......12813

Running in SRAM, optimise=max=2 ..........22585
Running in DRAM, optimise=max=2 ..........13808

Since the hardware can handle 33MHz with no loss of performance (no more WS)
over 25MHz, we can scale the result for 33MHz: SRAM at level 2......29812!!!
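
The scaling above is a straight ratio of clock rates, and it can be reproduced with a few lines. This is only a sketch of that arithmetic; the function name is made up for illustration, and the result holds only under the stated assumption that no extra wait states appear at 33MHz.

```python
def scale_by_clock(score, measured_mhz, target_mhz):
    """Scale a benchmark score linearly with clock rate.

    Only meaningful if the memory system adds no wait states
    at the higher clock (the assumption made in the post).
    """
    return score * target_mhz / measured_mhz


sram_level2 = 22585  # measured: 25MHz, SRAM, optimise=max=2
print(round(scale_by_clock(sram_level2, 25, 33)))  # 29812
```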

For comparison we ran the same tests on a VAX and our 68020 system.
VAX 11/780......1400. The 68020 has 1 WS DRAM; the compiler was Crosscode
C with no optimisation (it doesn't seem to make a lot of difference anyway).

68020@16.5 MHz, cache on,  reg vars on ..........3242
68020@16.5 MHz, cache off, reg vars on ..........2864

We are still building our 68040 system (we have a sample 25MHz 68040) but
based on Motorola's performance figures this gives a projected figure of
around 22000 Dhrystones. A 68040@25MHz = 5(68020@16.67MHz). This assumes
no optimisation, and a zero WS 68040 system.
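
A quick arithmetic check on that projection, using only numbers quoted in this post. The function names are made up for illustration, and the post does not say which 68020 configuration Motorola's 5x ratio is measured against, so treat the baseline as an open assumption rather than a finding.

```python
# Figures quoted in the post.
M68020_CACHE_ON = 3242   # measured: 68020 @ 16.5 MHz, cache on, 1 WS DRAM
PROJECTED_68040 = 22000  # projected from Motorola's figures
RULE_OF_THUMB = 5        # "68040@25MHz = 5(68020@16.67MHz)"


def implied_baseline(projected, multiplier):
    """68020 score that the projection implies for a given multiplier."""
    return projected / multiplier


def implied_multiplier(projected, measured):
    """Speed-up that the projection implies over a measured score."""
    return projected / measured


print(implied_baseline(PROJECTED_68040, RULE_OF_THUMB))  # 4400.0
print(round(implied_multiplier(PROJECTED_68040, M68020_CACHE_ON), 2))
```

So the 22000 figure corresponds to a 68020 baseline of 4400 Dhrystones, which is higher than the 3242 measured on the 1 WS, unoptimised system described here.
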
Some other systems we have figures for (we did NOT run the tests) are:

	SUN 4/260	17800
	MIPS M/1000	22500
	SUN 3/60	4700
	IBM PS/2 M80	4400

I'd treat these figures with care, as I don't have the exact details
of how the tests were run.

Food for thought though eh? Anyone else got any figures or comment...I can
supply more details if needed.

Geoff Peck, Douglas Parker
__________________________________________________________________________
|Department of Scientific and Industrial Research                        | 
|Physical Sciences Div. | Internet: srghgcp@grv.dsir.govt.nz             |
|Information Tech Group | Bitnet:   srghgcp%grv.dsir.govt.nz@relay.cs.net|
|P.O. Box 31-311        |                                                |
|Lower Hutt             | Phone:    +64 4 666-919                        |
|New Zealand            | Fax:      +64 4 690-067                        |
--------------------------------------------------------------------------
|Disclaimer: "My views are my own not my employer's"                     |
--------------------------------------------------------------------------

schow@bcarh185.bnr.ca (Stanley T.H. Chow) (10/12/90)

In article <18653.27132b53@windy.dsir.govt.nz> srghgcp@windy.dsir.govt.nz writes:
>We are in the business of designing high performance data communications
>servers, and are about to embark on new development work with FDDI. We
>have been evaluating two processors (Intel 80960CA and Motorola 68040)
>as compute engines. We currently use the 68020. We have 
>completed benchmarking the 80960CA, with the following results;
> [Dhrystone numbers]
>Food for thought though eh? Anyone else got any figures or comment...I can
>supply more details if needed.

First of all (like everyone else), I will point out that Dhrystone
(or any other benchmark) does not count as much as your actual application.
To get a real handle, run your own code.

I have looked at the Intel i960 family in some detail and like it a lot. I
looked at it as a candidate for a high-reliability, high-performance embedded
controller for communications (yes, including FDDI). For the embedded
processor niche, the i960 family has a lot going for it.
E.g., it has good fault detection/isolation, works well with DRAM, fast
procedure calls, low cost, nice debugging/tracing support, good range of
performance in the family, good bit/bitfield handling, etc.

The major problem for us (at least with the early/cheap processors) is bus
bandwidth. Check out your cache hit rates and design your memory system
accordingly.
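
To see why hit rates matter for the memory design, here is a minimal sketch of the usual weighted-average estimate of cycles per memory access. The function name and the 1-cycle hit / 5-cycle miss costs are hypothetical, picked only for illustration; they are not measured i960 figures.

```python
def effective_cycles(hit_rate, hit_cycles, miss_cycles):
    """Average cycles per memory access for a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


# Hypothetical costs: 1 cycle on a hit, 5 cycles to go out on the bus.
for rate in (0.99, 0.95, 0.80):
    print(rate, effective_cycles(rate, 1, 5))
```

The lower the hit rate, the more the external memory timing dominates the average, which is when bus bandwidth becomes the limit.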


Stanley Chow        BitNet:  schow@BNR.CA
BNR		    UUCP:    ..!uunet!bnrgate!bcarh185!schow
(613) 763-2831               ..!psuvax1!BNR.CA.bitnet!schow
Me? Represent other people? Don't make them laugh so hard.

jesup@cbmvax.commodore.com (Randell Jesup) (10/12/90)

In article <18653.27132b53@windy.dsir.govt.nz> srghgcp@windy.dsir.govt.nz writes:
>We are in the business of designing high performance data communications
>servers, and are about to embark on new development work with FDDI. We
>have been evaluating two processors (Intel 80960CA and Motorola 68040)
>as compute engines. We currently use the 68020. We have 
>completed benchmarking the 80960CA, with the following results;

	Be careful to choose your benchmarks to reflect the sort of load that
will be put on the system.  For example, Dhrystone heavily tests a compiler's
string handling (and the string library), and some other areas do not
contribute heavily to the result.  Cache size can cause a good showing while
hiding a dramatic drop in performance for a larger test (for example, pushing
an 80286 past a 64K-byte data space).

	I would advise seeing if you can find some way to benchmark some
portion of the system you expect to use, or something similar (maybe some
TCP/IP code, or whatever).

>DRAM runs 3-1-1-1 reads and 2-1-1-1 WS writes. The compiler does not

	Do they do this at the hypothetical 33MHz?

>Since the hardware can handle 33MHz with no loss of performance (no more WS)
>over 25MHz, we can scale the result for 33MHz, SRAM@ level 2......29812!!!

	Scaling can be deceptive.
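
	One way scaling goes wrong: memory access time is fixed in nanoseconds, so a faster clock can need more wait states. A minimal sketch of that effect follows. The 80ns access time and the 2-cycle zero-wait-state access are assumptions chosen for illustration, not figures for the SDM board, and the function names are hypothetical.

```python
def bus_cycles_needed(access_ns, clock_mhz):
    """Whole clock cycles needed to cover a fixed memory access time."""
    # cycles = ceil(access_ns / cycle_ns), where cycle_ns = 1000 / clock_mhz.
    # Integer arithmetic avoids floating-point rounding surprises.
    return -(-access_ns * clock_mhz // 1000)


def wait_states(access_ns, clock_mhz, zero_ws_cycles=2):
    """Wait states beyond an assumed zero-wait-state access length."""
    return max(0, bus_cycles_needed(access_ns, clock_mhz) - zero_ws_cycles)


# Assumed 80ns memory: fine at 25MHz, one wait state at 33MHz.
print(wait_states(80, 25))  # 0
print(wait_states(80, 33))  # 1
```

	If the extra wait state shows up, a straight 33/25 scaling of the 25MHz score overstates the result.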

>For comparison we ran the same tests on a VAX and our 68020 system....
>VAX 11/780......1400. The 68020 has 1 WS DRAM, the compiler was Crosscode
>C with no optimisation (it doesn't seem to make a lot of difference anyway)
>
>68020@16.5 MHz cache on reg vars on, ............3242
>68020@16.5 MHz cache off reg vars on.............2864

	Sounds like a pretty poor C compiler.  A 14MHz '020 should be able
to do ~4500 with a reasonable microcomputer compiler (I think that was with
no reg vars), even on a running system with VBlank interrupts occurring, etc.
I'm quite sure the 80960 has a good, state-of-the-art compiler (or at least
reasonably close).

>We are still building our 68040 system (we have a sample 25MHz 68040) but
>based on Motorolas performance figures this gives a projected figure of
>around 22000 Dhrystones. A 68040@25MHz = 5(68020@16.67MHz). This assumes 
>no optimisation, and a zero WS 68040 system.

	The relation between speeds of '030's and '040's should be more
accurate than that between '020's and '040's, since '030's have data caches
and have tighter microcode.

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.cbm.commodore.com  BIX: rjesup  
Common phrase heard at Amiga Devcon '89: "It's in there!"