[net.micro.68k] 68020 benchmarks??

randy@petfe.UUCP (Randy Banton) (05/13/85)

Has anyone run the BYTE benchmarks on a 68020 based system with UNIX
System V yet?? (Motorola guys, are you listening?)

I have a benchmark paper from the Intel literature group which
claims a 6Mhz 80286 is 1.38 times a 10Mhz 68010. The 286 machines
were the Intel 286/310 and IBM PC/AT.  The 68010 machines were
a Sun 2/120 and a Masscomp workstation.

Next they determined that a 10Mhz 80286 (0 wait states) is 2.85
times the same 10Mhz 68010 machines. Note these times are all measured
on real machines (as opposed to paper calculations).  The 10Mhz
80286 is also rated as equal to the 16Mhz 68020. The 68020
assumptions were zero wait states and that a 16Mhz 68020 was 2.84
times a 10Mhz 68010 (i.e. no real 020 system).

They finally extrapolate that a 12.5Mhz 80286 (0 wait states)
is 1.27 times a 16Mhz 68020 (0 wait states).
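
For anyone who wants to see how the quoted ratios relate to each other, here is a minimal sketch. It uses only the figures above; the variable names are illustrative, and the linear scaling with clock speed from 10Mhz to 12.5Mhz is an assumption of the sketch, not something taken from the Intel report.

```python
# Ratios as quoted above, all relative to a 10 MHz 68010.
r_286_6mhz = 1.38    # 6 MHz 80286 (measured)
r_286_10mhz = 2.85   # 10 MHz 80286, 0 wait states (measured)
r_020_16mhz = 2.84   # 16 MHz 68020 (assumed in the report, no real system)

# 10 MHz 80286 relative to the 16 MHz 68020.
r_286_10_vs_020 = r_286_10mhz / r_020_16mhz

# Assumption: 80286 performance scales linearly with clock up to 12.5 MHz.
r_286_12_5_vs_020 = r_286_10_vs_020 * (12.5 / 10.0)

# Claimed speedup going from 6 MHz to 10 MHz, next to the raw clock ratio.
speedup_claimed = r_286_10mhz / r_286_6mhz
clock_ratio = 10.0 / 6.0

print(f"10 MHz 286 vs 16 MHz 020:   {r_286_10_vs_020:.3f}")
print(f"12.5 MHz 286 vs 16 MHz 020: {r_286_12_5_vs_020:.3f} (report says 1.27)")
print(f"6 -> 10 MHz speedup:        {speedup_claimed:.2f} (clock ratio {clock_ratio:.2f})")
```

The last line is the one worth looking at: the speedup from 6Mhz to 10Mhz can be compared directly against the clock ratio.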

Without arguing the merits of benchmarks, has anyone run any of
the "common" benchmarks on a 16Mhz (or 12.5Mhz) 68020 system?

(For those who haven't seen it, Intel had a two page color
ad about the results mentioned above.  It was in Electronic News,
I believe May 6, 1985. The report I mention is called "iAPX 286
High Performance Benchmark Study Report" and is dated April 1985.)


				Randy

dan@rna.UUCP (Dan Ts'o) (05/16/85)

In article <> randy@petfe.UUCP (Randy Banton) writes:
>I have a benchmark paper from the Intel literature group which
>claims a 6Mhz 80286 is 1.38 times a 10Mhz 68010. The 286 machines
>were the Intel 286/310 and IBM PC/AT.  The 68010 machines were
>a Sun 2/120 and a Masscomp workstation.

	Well I haven't had a chance to run benchmarks on a real 68020
system, but I have run benchmarks on an Intel 286/380, an IBM PC/AT and
a Masscomp and a Callan. Except for the PC/AT the results were posted
in my previous postings along with a dozen other machines all relative to
a 4.2BSD VAX 780.  If you would like to see those results, please let me know.
	Here are just the results for the four machines (VAX 780 == 1.0), all
running UNIX/XENIX.

            Intel 286    PC/AT        Masscomp     Callan

LOOP        .16          .19          .38          .40
CC LOOP     .17          .20          .38          .13
SIEVE       .56          .66          .57          .59
CC SIEVE    .19          .22          .4           .15
FLOAT       .0029        .0027        .030 (.41)   .031
GETPID      .55          .64          .76          .89
GREP        .39          .28          .4           .51
COPY        .10          .14          .25          .15
NROFF       .27          na           .4           .29
SORT        .41          .38          .5           .47

mean        .28 (.30)    .30 (.34)    .41 (.45)    .36
stddev      .19          .22          .19          .26


	For the 286 and the PC/AT, no floating point chip/support was available.
The floating point emulation was abysmal - the mean numbers in parentheses do
not include the FLOAT benchmark for these machines. The FLOAT benchmark number
for the Masscomp in parentheses is with their floating point processor, which
seems to be a considerable help. Note that in comparing the Masscomp to the
Callan one should consider that the Masscomp has a cache and a better disk.
	These results do not substantiate Intel's claims in comparing their
80286 to the 68000 or 68010 (unless you plan on running just SIEVEs). I would
therefore doubt Intel's claims in comparing the 80286 to the 68020. The 6Mhz
80286 appears to be slightly less than a 68010 at 10Mhz. I don't see how a
10Mhz 80286 could be 2.85 times a 10Mhz 68010. Nevertheless the PC/AT proves
to be quite a good price/performer.
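
For anyone who wants to rerun the summary rows, a minimal sketch follows. The scores are copied from the table above; the use of the sample standard deviation, the uniform dropping of FLOAT for the second mean, and the function and variable names are assumptions of the sketch, not a description of how the posted figures were produced.

```python
from statistics import mean, stdev

# Scores relative to a 4.2BSD VAX 780 (== 1.0), copied from the table above.
# None marks the "na" entry (NROFF on the PC/AT).
tests = ["LOOP", "CC LOOP", "SIEVE", "CC SIEVE", "FLOAT",
         "GETPID", "GREP", "COPY", "NROFF", "SORT"]
results = {
    "Intel 286": [.16, .17, .56, .19, .0029, .55, .39, .10, .27, .41],
    "PC/AT":     [.19, .20, .66, .22, .0027, .64, .28, .14, None, .38],
    "Masscomp":  [.38, .38, .57, .4, .030, .76, .4, .25, .4, .5],
    "Callan":    [.40, .13, .59, .15, .031, .89, .51, .15, .29, .47],
}

def summarize(scores, skip=()):
    """Mean and sample standard deviation, ignoring 'na' and skipped tests."""
    kept = [s for t, s in zip(tests, scores) if s is not None and t not in skip]
    return mean(kept), stdev(kept)

summary = {m: summarize(s) for m, s in results.items()}
no_float = {m: summarize(s, skip=("FLOAT",)) for m, s in results.items()}

for machine in results:
    m, sd = summary[machine]
    print(f"{machine:10s} mean {m:.2f}  stddev {sd:.2f}  "
          f"mean without FLOAT {no_float[machine][0]:.2f}")

# Ratio of the 6 MHz Intel 286 mean to the Masscomp (68010) mean,
# for comparison with the 1.38 figure claimed by Intel.
ratio = summary["Intel 286"][0] / summary["Masscomp"][0]
print(f"Intel 286 / Masscomp = {ratio:.2f}")
```

The recomputed values may differ from the posted ones in the last digit because of rounding.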


					Cheers,
					Dan Ts'o
					Dept. Neurobiology
					Rockefeller Univ.
					1230 York Ave.
					NY, NY 10021
					212-570-7671
					...cmcl2!rna!dan

dave@soph.UUCP (Dave Brownell) (05/21/85)

I agree with Dan -- I do NOT believe those Intel benchmarks.  Anyone who's
seen the 68000/8086 Benchmark Wars between Motorola and Intel has to be
skeptical about any 68020/80286 figures put out by Intel.  I was quite
surprised at an informal benchmark I came up with last week on 1 Kb core
copies on a set of machines, namely that the 286 processor came out a tad
better than a 68010.  There was only a little additional code (Ethernet
protocol processing) going on between the copies.

My numbers, using the fastest block move instructions possible:
    Intel family processors:
	8088, 4.77 MHz	130 copies/sec
	8086, 8 MHz	260 copies/sec
	80286, 6 MHz	450 copies/sec (real mode)
    Motorola ones:
	68000, 12 MHz	340 copies/sec (SLOW block move loop)
	68010, 10 MHz	425 copies/sec

My impression is that the 68010 and 80286 are roughly the same overall.

Disclaimer:  this was NOT an overall benchmark, etc., so no flames please.
No wait states except on the 68000, so far as I know.  Yes, this gives
the 80286 an advantage since it then has the fastest memory system. (?)
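
To put the copy rates on a per-clock footing, here is a rough sketch. It assumes "1 Kb" means 1024 bytes per copy, and the resulting figures include the protocol processing between copies, so they are not pure block move timings. The names are illustrative only.

```python
BLOCK_BYTES = 1024  # assumption: "1 Kb" means 1024 bytes per copy

# (processor, clock in MHz, copies per second), from the list above.
measurements = [
    ("8088", 4.77, 130),
    ("8086", 8.0, 260),
    ("80286", 6.0, 450),
    ("68000", 12.0, 340),
    ("68010", 10.0, 425),
]

def clocks_per_byte(mhz, copies_per_sec):
    """CPU clocks elapsed per byte copied, all overhead included."""
    return (mhz * 1e6) / (copies_per_sec * BLOCK_BYTES)

table = {name: clocks_per_byte(mhz, cps) for name, mhz, cps in measurements}
for name, cpb in table.items():
    print(f"{name:6s} {cpb:5.1f} clocks/byte")
```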
-- 
	Dave Brownell
	EnMasse Computer Corporation
	enmasse!dave@Harvard.ARPA
	{genrad,harvard}!enmasse!dave

phil@amdcad.UUCP (Phil Ngai) (05/22/85)

In article <155@soph.UUCP> dave@soph.UUCP (Dave Brownell) writes:
>Disclaimer:  this was NOT an overall benchmark, etc., so no flames please.
>No wait states except on the 68000, so far as I know.  Yes, this gives
>the 80286 an advantage since it then has the fastest memory system. (?)

<enter sarcasm mode>
Gosh, how could a 6 MHz 80286 do as good as a 12 MHz 68010? Well, the
68000 system designers must have been trying to make the 68000 look bad
so they designed wait states into the memory. It's not possible that Intel
designed a more efficient memory interface, of course. Everyone knows Intel
doesn't know how to design anything.
<exit sarcasm mode>

The 80286 does *not* need faster memory devices than the 68000. What the
80286 does do is use the (same cost) memory devices more efficiently than
the 68000. It's called pipelining and is a well known technique among
computer professionals. It's only unfair in the same sense that being
smarter is unfair. You wouldn't give more credit to someone who starts
a fire by rubbing two sticks together than someone who uses a match just
because he works harder, would you?


-- 
 What do you do the day after a peak experience?

 Phil Ngai (408) 749-5720
 UUCP: {ucbvax,decwrl,ihnp4,allegra}!amdcad!phil
 ARPA: amdcad!phil@decwrl.ARPA

dave@soph.UUCP (Dave Brownell) (05/24/85)

In article <> phil@amdcad.UUCP (Phil Ngai) writes:

> <enter sarcasm mode>
> Gosh, how could a 6 MHz 80286 do as good as a 12 MHz 68010?

    Ummm ... what are you talking about?  There is *no such part* as
    a 12 MHz 68010.  Let's not spread any misinformation here ...
    you are drawing some flakey conclusions from those numbers I put
    out.

> <exit sarcasm mode>

> The 80286 does *not* need faster memory devices than the 68000. What the
> 80286 does do is use the (same cost) memory devices more efficiently than
> the 68000. It's called pipelining and is a well known technique among
> computer professionals.

    Gee ... I must have hit a nerve somehow, to make a nettie resort
    to title-dropping.  Only, I thought "computer professional" was
    a term used only by "MIS Week".

Actually, the reason the 80286 chews up memory bandwidth is that it
uses a 250 ns. bus cycle (at 8 MHz) vs. a 400 ns. bus cycle for a 10 MHz
68010.  Since you don't have much time to decode the address, you either
use 120 ns. DRAM ($$$$) or add a wait state and slow it down to a 375 ns.
memory cycle.
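
The cycle times above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes a two-clock bus cycle on the 80286 and a four-clock bus cycle on the 68010, with each wait state adding one CPU clock, and the function name is made up for illustration.

```python
def bus_cycle_ns(clock_mhz, clocks_per_cycle, wait_states=0):
    """Length of one bus cycle in nanoseconds."""
    clock_period_ns = 1000.0 / clock_mhz
    return (clocks_per_cycle + wait_states) * clock_period_ns

i286_0ws = bus_cycle_ns(8, 2)      # 80286 at 8 MHz, no wait states
i286_1ws = bus_cycle_ns(8, 2, 1)   # 80286 at 8 MHz, one wait state
m68010 = bus_cycle_ns(10, 4)       # 68010 at 10 MHz, no wait states

print(f"80286  8 MHz, 0 ws: {i286_0ws:.0f} ns")
print(f"80286  8 MHz, 1 ws: {i286_1ws:.0f} ns")
print(f"68010 10 MHz, 0 ws: {m68010:.0f} ns")
```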

Pipelining has nothing to do with it; though it *does* spend a lot of cycles
filling the pipeline, this has nothing to do with the memory speed needed.
-- 
	Dave Brownell
	EnMasse Computer Corporation
	enmasse!dave@Harvard.ARPA
	{genrad,harvard}!enmasse!dave

brian@digi-g.UUCP (Merlyn Leroy) (05/28/85)

Phil Ngai writes:
>The 80286 does *not* need faster memory devices than the 68000. What the
>80286 does do is use the (same cost) memory devices more efficiently than
>the 68000...
>It's only unfair in the same sense that being
>smarter is unfair. You wouldn't give more credit to someone who starts
>a fire by rubbing two sticks together than someone who uses a match just
>because he works harder, would you?

Um, you can't light a fire with a match in a 80*86.  You see, the match is
in one segment, and the firewood is in another...

Merlyn Leroy
Segmented memory is a fractured idea.

clif@intelca.UUCP (Clif Purkiser) (05/30/85)

> 
> In article <> phil@amdcad.UUCP (Phil Ngai) writes:
> 
> > <enter sarcasm mode>
> > Gosh, how could a 6 MHz 80286 do as good as a 12 MHz 68010?
> 
>     Ummm ... what are you talking about?  There is *no such part* as
>     a 12 MHz 68010.  Let's not spread any misinformation here ...
>     you are drawing some flakey conclusions from those numbers I put
>     out.
> 

If there is no 12 MHZ 68010, why does Mot claim that a 16.6MHz 68020 is
2.5x performance of a 12 Mhz 68010?

> > <exit sarcasm mode>
> 
> > The 80286 does *not* need faster memory devices than the 68000. What the
> > 80286 does do is use the (same cost) memory devices more efficiently than
> > the 68000. It's called pipelining and is a well known technique among
> > computer professionals.
> 
>     Gee ... I must have hit a nerve somehow, to make a nettie resort
>     to title-dropping.  Only, I thought "computer professional" was
>     a term used only by "MIS Week".
> 
> Actually, the reason the 80286 chews up memory bandwidth is that it
> uses a 250 ns. bus cycle (at 8 MHz) vs. a 400 ns. bus cycle for a 10 MHz
> 68010.  Since you don't have much time to decode the address, you either
> use 120 ns. DRAM ($$$$) or add a wait state and slow it down to a 375 ns.
> memory cycle.
> 
> Pipelining has nothing to do with it; though it *does* spend a lot of cycles
> filling the pipeline, this has nothing to do with the memory speed needed.
> -- 
> 	Dave Brownell
> 	EnMasse Computer Corporation
> 	enmasse!dave@Harvard.ARPA
> 	{genrad,harvard}!enmasse!dave

	Actually, the 80286 has two types of pipelining.  The first is internal
pipelining, which causes the CPU to execute instructions faster on the 286
than on the 68010.  For example, ADD AX, #val is only 2 clocks on a 286,
compared to 8 clocks for ADD #val (4 clocks for the operation and 4 more
clocks for the effective address calculation) on a 68010.

	The other type of pipelining on the 286 is address pipelining. Address
pipelining puts the address out 1/2 clock before the start of the next bus
cycle.  This gives an 8Mhz 286  ~300 nsec of memory access time, when used with 
an interleaved DRAM design.   

	The 68010 uses a 4 clock bus, which on the surface implies that you have
400 nsec access time. However, a closer look at the 68010 timing diagram
shows that the address isn't valid until 70 nsec after the start of
a bus cycle, thus reducing the memory access time to ~330 nsec.

	So despite the 286 having a two clock bus (which speeds up instruction
execution) compared to a four clock bus for the 68010, the parts have similar
memory access times, because the 286 puts out the address 1/2 clock early and
the 68010 puts out the address over 1/2 clock late.
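
As a rough illustration of that arithmetic, the sketch below computes the window between address valid and end of bus cycle for both parts. It is an upper bound only: it ignores setup and delay parameters, which is why the 286 figure comes out somewhat above the ~300 nsec quoted. The names are illustrative.

```python
def clock_period_ns(clock_mhz):
    """Length of one CPU clock in nanoseconds."""
    return 1000.0 / clock_mhz

# 80286 at 8 MHz: two-clock bus cycle, address issued 1/2 clock early.
t286 = clock_period_ns(8)
access_286_ns = 2 * t286 + 0.5 * t286

# 68010 at 10 MHz: four-clock bus cycle, address valid 70 ns into the cycle.
t68010 = clock_period_ns(10)
access_68010_ns = 4 * t68010 - 70.0

print(f"80286  8 MHz: {access_286_ns:.1f} ns from address to end of cycle")
print(f"68010 10 MHz: {access_68010_ns:.1f} ns from address to end of cycle")
```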

	Before I receive tons of flames, I'll concede three points:

1.  An interleaved DRAM design is more complicated than a non interleaved 
    design


2.  I have not discussed several other timings needed to accurately calculate
    memory access times, because the timing parameters are similar (+/- 2ns)
    between the 68010 @10Mhz and the 286 @8Mhz.

3.  Memory access time is not the same as the speed of the DRAMs you need.

	Of course, I also have not talked about the fun the hardware engineers
have designing an MMU that runs at zero wait states with the 68010.

-- 
Clif Purkiser, Intel, Santa Clara, Ca.
HIGH PERFORMANCE MICROPROCESSORS
{pur-ee,hplabs,amd,scgvaxd,dual,idi,omsvax}!intelca!clif
	
{standard disclaimer about how these views are mine and may not reflect
the views of Intel, my boss, or USENET goes here. }

tim@callan.UUCP (Tim Smith) (06/04/85)

[ He is talking about 1K block transfers ]

> My numbers, using the fastest block move instructions possible:
>     Intel family processors:
> 	8088, 4.77 MHz	130 copies/sec
> 	8086, 8 MHz	260 copies/sec
> 	80286, 6 MHz	450 copies/sec (real mode)
>     Motorola ones:
> 	68000, 12 MHz	340 copies/sec (SLOW block move loop)
> 	68010, 10 MHz	425 copies/sec
.
.
> No wait states except on the 68000, so far as I know.  Yes, this gives

I tried this on my 10Mhz 68010 with no wait states.  I used two functions,
blt( src, dst, count ) and wblt( src, dst, count ).  The first copies
'count' bytes from *src to *dst.  The second copies 'count'/2 words from
*src to *dst, with src and dst assumed even.  They were timed from a C
program that called each one 5000 times.  Here are my results:

	68010, 10Mhz	~650 copies/sec ( arbitrary buffers )
	68010, 10Mhz   ~1600 copies/sec ( word aligned buffers )


Note that I am probably not using the fastest block transfer.  I am just
using a dbra loop, and letting the 68010 go into 'loop mode'.
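
For comparison with the timing tables, the rates above can be turned into clocks per transfer. This is a sketch under the assumption that 1K means 1024 bytes; the result includes the C call overhead of each of the 5000 calls, and the names are illustrative.

```python
CLOCK_HZ = 10_000_000  # 10 MHz 68010
BLOCK_BYTES = 1024     # assumption: 1K = 1024 bytes per copy

def clocks_per_transfer(copies_per_sec, bytes_per_transfer):
    """CPU clocks elapsed per byte (or word) moved, call overhead included."""
    transfers_per_copy = BLOCK_BYTES // bytes_per_transfer
    return CLOCK_HZ / (copies_per_sec * transfers_per_copy)

byte_loop = clocks_per_transfer(650, 1)    # blt: one byte per iteration
word_loop = clocks_per_transfer(1600, 2)   # wblt: one word per iteration

# Speedup over the 425 copies/sec quoted for the 10 MHz 68010.
blt_vs_quoted = 650 / 425
wblt_vs_quoted = 1600 / 425

print(f"blt:  {byte_loop:.1f} clocks/byte, {blt_vs_quoted:.2f}x the quoted rate")
print(f"wblt: {word_loop:.1f} clocks/word, {wblt_vs_quoted:.2f}x the quoted rate")
```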

Are you SURE you don't have wait states?

ps:  the above times agree with what the instruction timings in the back
of the book say!  something must be wrong... :-)

davet@oakhill.UUCP (Dave Trissel) (06/07/85)

In article <594@intelca.UUCP> clif@intelca.UUCP (Clif Purkiser) writes:
>> 
>> In article <> phil@amdcad.UUCP (Phil Ngai) writes:
>> 
>> > <enter sarcasm mode>
>> > Gosh, how could a 6 MHz 80286 do as good as a 12 MHz 68010?
>> 
>>     Ummm ... what are you talking about?  There is *no such part* as
>>     a 12 MHz 68010.  Let's not spread any misinformation here ...
>>     you are drawing some flakey conclusions from those numbers I put
>>     out.
>> 
>
>If there is no 12 MHZ 68010, why does Mot claim that a 16.6MHz 68020 is
>2.5x performance of a 12 Mhz 68010?
>

My my, Phil.  Where did you come up with your information?  We have been
shipping 12.5 Megahertz parts for over a year now. Several systems have
been using them such as HP, Charles River Data Systems, Wicat, Tektronix,
... the list goes on.  Look on your local distributor's shelf in the last
year?

How about giving me a penny for every 12.5 Megahertz MC68000 or MC68010 we've
shipped to make amends?   :-)

Motorola Semiconductor Inc.            Dave Trissel
Austin, Texas               {ihnp4,seismo}!ut-sally!oakhill!davet