[net.arch] 68020 Performance Revisited Again

falcone@erlang.DEC (Joe Falcone, HLO2-3/N03, dtn 225-6059) (11/06/84)

Response to redwood!rpw3

1. Your idea of cpu-intensive UNIX benchmarks sure is strange.
   Gosh, I always thought there was a fairly large I/O component to
   cc, nroff, grep, vi, mail, news, etc.  Benchmarks with significant
   I/O components measure your bus and disks, not your processor.
   And since you can put high-performance disks on most microprocessors
   these days, it is not surprising that your figures came out so high.

2. My experience with cpu-bound tasks (with little or no I/O and running 
   essentially core-resident) on the 8MHz HP Series 200 and the 
   10MHz SMI 2-170 is that the VAX-11/780 is anywhere from 2 to 10 times
   faster, and most tests fall between 3 and 5 times faster given comparable
   code quality.

3. Just as you have been able to find a benchmark which runs faster on
   the 68K, I have a benchmark which ran 100 times faster on the 780 and
   did not use floating-point - these benchmarks are meaningless because
   they don't measure the machinery, they are measuring compiler and OS
   quality.  It is very difficult to measure the real beast in these machines.

4. Your comment about the 200ns SBI is ludicrous - the 780 has a large
   cache and the SBI handles 64 bit packets, so there is no way that
   the SBI is kept busy - that is by design to allow enough bandwidth
   for other devices to do their stuff.  One of the faults of the 68K
   family is the tendency to use up nearly all available bus bandwidth
   for instruction execution, leaving very little for I/O and coprocessors.

5. I've spent 8 years working with UNIX systems.  I have yet to see
   a machine run 4.2 better than the 780 does (soon to change with the
   advent of the VAX 8600).  If you do want to get into UNIX vs. VMS
   operating system comparisons, VMS does have significantly better compilers
   and quicker I/O so a lot of benchmarks run faster on it.  While on this
   subject, no one has yet run benchmarks on the 68020 with a compiler
   that uses the extended instruction set; such a compiler should add a
   few percent to 68020 performance.

6. I would suggest that you read the article on the 68020 in IEEE Micro.
   If you had, you would not have so ridiculously over-simplified the
   performance implications of clock, bus, and cache.  No, doubling the
   clock does not double performance.  Sorry, you don't hit your instruction
   cache 100% of the time, so you'll have to wait around a bit more.  
   Too bad, your 32-bit bus saturates just as quickly as on the 68010
   (because of more 32-bit operands and the doubled clock speed fetching them).
   And multi-processors?  With 70-90% of bus bandwidth gone, you had better
   have some really bright ideas on how to get out of this one, Ollie.
   Yes, it is going to take interleave, data cache, a blazing MMU, and all
   of the things we have come to expect from mainframes - but this all comes
   at a bigger pricetag.

7. I've based my figures on 5 years of experience with VAX and 68K systems
   (most of it not at Digital) - I'm reporting on what I've seen and what
   I think you can expect from the 68020 - at best you can expect 780 
   performance given comparable compilers on CPU intensive benchmarks.
   And that ain't too bad, if you ask me.  
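
A rough sketch of the arithmetic behind point 6, not a measurement of any
real machine: if only the on-chip part of each instruction speeds up with the
clock while memory waits stay fixed, doubling the clock gives less than
double the throughput.  The function names and every number below (core
cycles per instruction, misses per instruction, memory latency) are
assumptions chosen purely for illustration.

```python
def time_per_instruction_ns(clock_mhz, core_cycles, misses_per_instr, memory_ns):
    """Average time per instruction under a deliberately simple model.

    Core cycles shrink as the clock rises; the wait for off-chip memory
    on a cache miss is assumed not to change with the CPU clock.
    """
    cycle_ns = 1000.0 / clock_mhz
    return core_cycles * cycle_ns + misses_per_instr * memory_ns


def speedup_from_doubling_clock(clock_mhz, core_cycles, misses_per_instr, memory_ns):
    """Ratio of old to new instruction time when only the clock is doubled."""
    slow = time_per_instruction_ns(clock_mhz, core_cycles, misses_per_instr, memory_ns)
    fast = time_per_instruction_ns(2 * clock_mhz, core_cycles, misses_per_instr, memory_ns)
    return slow / fast


# Hypothetical figures: 8 MHz part, 4 core cycles per instruction,
# 400 ns of memory wait per miss.
perfect_cache = speedup_from_doubling_clock(8.0, 4.0, 0.0, 400.0)
some_misses = speedup_from_doubling_clock(8.0, 4.0, 0.5, 400.0)
print("speedup with no misses:  ", perfect_cache)
print("speedup with some misses:", some_misses)
```

Under this model the speedup reaches 2 only when the miss term is zero; any
fixed memory wait pulls it below that, which is the point being made about
clock, bus, and cache.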

all in the opinion of...

Joe Falcone
Eastern Research Laboratory			decwrl!
Digital Equipment Corporation			decvax!deccra!jrf
Hudson, Massachusetts				tardis!

ian@loral.UUCP (Ian Kaplan) (11/07/84)

  I thought that Joe Falcone's original article would generate a lot of
  lively discussion and it did.  I have enjoyed it and I am sure that I
  have learned from it.

  The various sides in the discussion have different perspectives and
  objectives.  So far the discussion has covered the different design
  tradeoffs that were made in these differing environments.  Recently the
  discussion has started to heat up.  I realize that this is nothing near
  what people say to each other in other newsgroups (e.g., net.singles and
  net.flame), but I can see that the discussion is starting to drift away
  from an intellectual discussion of architectural merits.

  The humble opinion of

                           Ian Kaplan
			   Loral Data Flow Group
			   Loral Instrumentation
			   8401 Aero Dr.
			   San Diego, CA
			   92123
			   (619) 560-5888 x4812
			   ucbvax!sdcsvax!sdcc6!loral!ian

trow@uw-june (Jay Trow) (11/08/84)

Forwarded from 68000Interest^.wbst@Xerox.arpa

----------------------------------------------------------------

From: deutsch.pa
Date:  7-Nov-84 10:15:07 PST
Subject: Re: 68020 Performance Revisited Again

I am sorry that I didn't make my argument about off-chip cache performance
sufficiently clear.  ONLY THE CACHE need respond within the 120 ns window.
Since the cache is assumed to use virtual addresses, the MMU and virtual
address translation machinery only comes into play in the case of a cache
miss.  Of course, if these functions are slower, the overall system
performance will be slowed down, but only proportionately to the
frequency of cache misses.  A custom-design cache chip with a 100 ns
designed response time should be well within the capabilities of
current technology.

----------------------------------------------------------------
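
The claim in the forwarded message, that slow translation hardware costs
performance only in proportion to the miss frequency, can be sketched with
the usual average-access-time formula.  The 100 ns cache response is the
figure from the message; the miss penalty and the miss rates are assumptions
for illustration only, as are the function names.

```python
def average_access_ns(cache_ns, miss_rate, miss_penalty_ns):
    """Every reference pays the cache time; a fraction miss_rate
    additionally pays the penalty (MMU translation plus main memory)."""
    return cache_ns + miss_rate * miss_penalty_ns


def slowdown_vs_perfect_cache(cache_ns, miss_rate, miss_penalty_ns):
    """How much slower the average reference is than a reference that
    always hits the cache."""
    return average_access_ns(cache_ns, miss_rate, miss_penalty_ns) / cache_ns


CACHE_NS = 100.0         # designed response time quoted in the message
MISS_PENALTY_NS = 500.0  # hypothetical cost of translation + main memory

for miss_rate in (0.0, 0.05, 0.10, 0.20):  # hypothetical miss rates
    avg = average_access_ns(CACHE_NS, miss_rate, MISS_PENALTY_NS)
    print(f"miss rate {miss_rate:.2f}: average access {avg:.1f} ns")
```

The extra time over a perfect cache is miss_rate * miss_penalty_ns, so it
grows linearly with the miss rate, as the message argues.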

guy@rlgvax.UUCP (Guy Harris) (11/13/84)

> 1. Your idea of cpu-intensive UNIX benchmarks sure is strange;
>    Gosh, I always thought there was a fairly large I/O component to
>    cc, nroff, grep, vi, mail, news, etc.

Ever timed "cc" or "nroff"?  *VERY* CPU-intensive - at least the versions
we've got here on our 780.  One "make" rebuilding the kernel takes up
between 60 and 90% of an 11/780.
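
One way to check this sort of claim on a current Unix system is to compare
the CPU time a command consumes with the wall-clock time it takes.  This is
only a sketch: cpu_fraction is a made-up helper, and the two commands are
stand-ins for a compute-heavy job and a mostly-waiting job, not the "cc" or
"nroff" runs described above.  It relies on os.times() reporting child times
and elapsed time, which holds on Unix but not on Windows.

```python
import os
import subprocess
import sys


def cpu_fraction(argv):
    """Run argv to completion and return (user + system CPU time of the
    child) divided by elapsed wall-clock time."""
    before = os.times()
    subprocess.run(argv, check=True)
    after = os.times()
    cpu = ((after.children_user - before.children_user)
           + (after.children_system - before.children_system))
    elapsed = after.elapsed - before.elapsed
    return cpu / elapsed if elapsed > 0 else 0.0


# Stand-in workloads (assumptions, not the commands from the post):
# one that computes the whole time, one that mostly sleeps.
BUSY = [sys.executable, "-c", "x = sum(i * i for i in range(3000000))"]
IDLE = [sys.executable, "-c", "import time; time.sleep(0.5)"]

print("busy job CPU fraction:", cpu_fraction(BUSY))
print("idle job CPU fraction:", cpu_fraction(IDLE))
```

A fraction near 1 means the job was limited by the processor; a fraction
near 0 means it spent most of its time waiting on something else.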

Also, note he only referred to the aforementioned as '"real" UNIX tasks',
not "cpu-intensive UNIX benchmarks."  He referred both to CPU-intensive
and disk-intensive tasks.

> 5. I've spent 8 years working with UNIX systems.  I have yet to see
>    a machine run 4.2 better than the 780 does (soon to change with the
>    advent of the VAX 8600).

Working for a competitor who *has* a machine that runs 4.2 better than the
780 does, unless you're beating the terminals to death (our terminal mux is,
shall we say, sub-optimal), I'm a little biased here, but there do exist
superminis out there that are faster than an 11/780.  Are you willing to make
that claim about the Power 6/32, *and* the Pyramid 90x, *and* the
top-of-the-line Gould (maybe the MV/10000, too)?  (While we're at it, how about
the 11/785?  If it isn't any improvement over the 11/780 running 4.2, *somebody*
screwed up...)  (Anybody put 4.2 up on some big IBM/Amdahl/... iron? For
terminal I/O, I dunno, but I bet it's pretty good on CPU-intensive or
disk-intensive jobs.)  If you mean you've never seen any *micro* out there run
4.2 better than the 11/780, maybe.

I agree that statements of the "wow, this supermicro is faster than a
<fill in your favorite mini>!" ilk are to be taken with a grain of salt -
we had a supermicro in house whose manufacturer boasted that it was as fast
as an 11/70.  We decided, after working some with it, that it was no doubt
true, under certain circumstances.  If you dropped it off a building, it would
fall as fast as an 11/70 (modulo air drag).

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy