[net.micro.68k] 68020 performance

falcone@erlang.DEC (Joe Falcone, HLO2-3/N03, dtn 225-6059) (09/07/84)

When is 16MHz not 16MHz?

It is difficult to discuss performance of the 68020 chip "out of context",
i.e., without information about the rest of the system components.
Because of its potentially very great speed (16.67MHz), the 68020 places
significant demands on the memory and I/O portions of a system.  One can
assume that the 68020 instruction buffer, and perhaps a cache designed into
the system, can reduce the demands on the memory system; however, this
reduction is highly dependent on the workload and the design of the cache
and memory.  No two system designs incorporating the 68020 chips are likely to
be the same.

The memory demands have worried me ever since a processor called the HP 9000
claimed to be able to run at 18MHz with a relatively memory-intensive
stack architecture and no cache.  The HP 9000 got around the problems
by decoding and presenting addresses extremely early in instruction execution
to a heavily pipelined memory controller which sure enough could deliver
a word every 110ns (every other processor cycle).  The memories were
specially developed 128k nmos rams which ran fast and hot. Although
the scheme worked, it was a technological "house of cards" since every
piece was critically dependent on the performance of other components.
Although it was claimed that one could run the HP 9000 processor chips
at speeds over 30MHz in the lab, the operation of these chips at that speed
would have necessitated a faster bus, memory controller, and rams.
So even though the cpu chip remains unchanged, the rest of the system
has a fit.

Now the problem with the 68020 is that it simply does not present addresses
soon enough to the memory.  Therefore, to run with no wait states,
one must be able to provide the requested data within the few cycles
allotted.  In many current 68000 systems, the cycles allotted are
not sufficient to avoid wait states because of delays from the memory
management unit or slow rams.  Therein lies a dilemma, for all of us want
memory management of some form, and the cheaper, denser rams tend to be
a little slower until technology catches up.

My SR-50 tells me that a 16.67MHz clock gives a 60ns processor cycle.  
Assuming 4 cycles per access, that gives you 240ns for the round trip 
(address out, data in).  A 200 nanosecond multi-megabyte main memory 
and management unit would be prohibitively expensive to integrate into
a moderately priced 68020 workstation (although it is possible to do
it at a decidedly high price).  Hence, the crying need for a cache to
assist data accesses and perhaps supplement the instruction buffer.
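
The arithmetic above can be checked with a short sketch. The function names are made up for illustration, and the 40ns of slack for the memory management unit and bus overhead is an assumption chosen only to show how a 240ns window shrinks to roughly a 200ns memory requirement; the post itself does not state that split.

```python
def cycle_time_ns(clock_mhz: float) -> float:
    """Length of one processor clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def access_window_ns(clock_mhz: float, cycles_per_access: int = 4) -> float:
    """Time for one memory round trip (address out, data in)."""
    return cycles_per_access * cycle_time_ns(clock_mhz)


def memory_budget_ns(clock_mhz: float, overhead_ns: float,
                     cycles_per_access: int = 4) -> float:
    """Time left for the rams after a hypothetical fixed overhead
    (MMU translation, buffering) is taken out of the access window."""
    return access_window_ns(clock_mhz, cycles_per_access) - overhead_ns


if __name__ == "__main__":
    clock = 16.67  # MHz, the figure used in the post
    print(f"cycle time:    {cycle_time_ns(clock):.1f} ns")
    print(f"access window: {access_window_ns(clock):.0f} ns")
    # overhead_ns=40 is an assumed value, not a figure from the post
    print(f"ram budget:    {memory_budget_ns(clock, overhead_ns=40):.0f} ns")
```

Changing cycles_per_access or overhead_ns shows how quickly the budget for the rams disappears once any management unit delay is added.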

Now, unless someone designs a cache which gives 100% data and instruction
hit rates, that 16.67MHz clock will be degraded by misses, and unless
the memory subsystems are especially fast, there would be multi-cycle
waits.  Just off the top of my head, using a 90% hit rate cache, one
might be looking at performance degradation anywhere between 10 and 25%
due to miss penalties (it all depends on how big your cache is, how
fast your main memory is, and how quickly your cache cycles the main memory
to get what you need).  So the 16.67MHz clock has effectively fallen as low as 12MHz.
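
One way to put rough numbers on that estimate is the sketch below. It is a deliberately crude model, not anything from the 68020 documentation: it assumes every access takes 4 cycles on a hit, that each miss adds a fixed number of extra cycles, and that the processor is entirely memory-bound. The miss penalties of 4 and 10 cycles are hypothetical values picked because they give 10% and 25% slowdowns at a 90% hit rate.

```python
def slowdown(hit_rate: float, miss_penalty_cycles: float,
             base_cycles: float = 4.0) -> float:
    """Average cycles per access divided by the no-miss cycles per access."""
    average = base_cycles + (1.0 - hit_rate) * miss_penalty_cycles
    return average / base_cycles


def effective_clock_mhz(clock_mhz: float, hit_rate: float,
                        miss_penalty_cycles: float,
                        base_cycles: float = 4.0) -> float:
    """Clock rate of a no-wait-state machine doing the same work."""
    return clock_mhz / slowdown(hit_rate, miss_penalty_cycles, base_cycles)


if __name__ == "__main__":
    clock = 16.67  # MHz
    for penalty in (4, 10):  # assumed miss penalties, in cycles
        s = slowdown(0.90, penalty)
        mhz = effective_clock_mhz(clock, 0.90, penalty)
        print(f"penalty {penalty:2d} cycles: "
              f"{(s - 1) * 100:.0f}% slower, about {mhz:.1f} MHz")
```

Under this model a 25% slowdown corresponds to about 13.3MHz. Reading the 25% instead as a straight cut in clock rate gives 0.75 x 16.67 = 12.5MHz, which is closer to the figure quoted above.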

Sometimes I think the FTC should get involved with this stuff.

As a final note, the following is my own opinion as an educated individual.

Isn't it kind of ridiculous to compare a cpu chip set to a very large
computer system with caches and high speed I/O buses?  I'm sure there
are a lot of cpus out there that can beat a VAX-11/780 one-on-one running some
benchmark.  On the other hand, how many of them can handle the cpu,
virtual memory, and I/O demands of 20 to 40 users?  The fact is that
there is a lot of stuff in the 780 (special instructions, cache, SBI,
massbus, unibus, etc) just for handling lots of users for long periods
of time.  Unfortunately, this stuff does tend to get in the way of tests
of raw, single-user, processor performance (either directly or indirectly
because of compromises in the design process).  The 780 is not trying to
pretend to be a single-user workstation.

MORAL: If you want to compare the 68020 (or any microprocessor) to
       a VAX, wait for the figures on the forthcoming VAX chip sets (which were
       discussed at some of the chip conferences).  At the system level, the 
       microVAX line of small systems is designed and packaged more for 
       one-on-one personal use.  The microVAX and VAX chip sets offer some
       very interesting comparison opportunities for "fair fights" between 
       Digital and the competition.

In the meantime, as an exercise, one might want to examine the performance of
the grand old pdp-11/70 vs. the J-11 (11/73) chip set, and both of them
relative to other 16-bit processors.  With its memory management and floating
point support, the 11/73 performs very well as a few-user Qbus system.
But the 11/70 is clearly still the choice for large systems because of its
cache, special memory architecture, and unibus/massbus I/O.  Although the
machines have similar performance, you just would not want to put 20 users
on an 11/73 - it doesn't have all the extras to take care of all those
people.

So the next time you want to compare microprocessors, go pick on someone
your own size.  Then you will have a more valid comparison.

Joe Falcone
Eastern Research Laboratory		decvax!
Digital Equipment Corporation		decwrl!deccra!jrf
Hudson, Mass				tardis!

henry@utzoo.UUCP (Henry Spencer) (09/15/84)

Joe, I agree with most of what you say -- clock rates are not a realistic
measure until you look at memory access times, and i/o bandwidth is quite
important when evaluating multi-user machines -- but I've got a couple of
comments.

> MORAL: If you want to compare the 68020 (or any microprocessor) to
>        a VAX, wait for the figures on the forthcoming VAX chip sets ...

The question is, will we ever see those VAX chip sets?  As chip sets, not
as overpriced packaged cpus.  Existence is no guarantee of commercial
availability, and conference papers are a poor substitute for chips in hand.

>        ...  At the system level, the 
>        microVAX line of small systems is designed and packaged more for 
>        one-on-one personal use.  ...

So is the VAX 730, in my opinion.  We've had to forcibly dissuade some
local groups who've been mesmerized by DEC marketing into thinking that
a 730 could run a dozen users doing statistics, just because of that
magic word "VAX".  My lousy little 11/44 will run rings around a 730
on anything which doesn't have serious address-space problems.  And
the 44 is a quarter the size and half the price.  When address space is
not an issue, the 44 will give a 750 a good run for its money, and the
size and price comparisons *there* don't bear mentioning.

For that matter, I note that the Murray Hill research folks now (I'm told)
budget for a 750 for every 4 users.

When is DEC going to announce some reasonably-priced big-address-space
processors?  I'd sure like one.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

doug@oakhill.UUCP (Doug MacGregor) (11/01/84)

I understand the motivation behind the note authored by Joe Falcone concerning
the comparison of the 68020 and a VAX. In principle I agree with his comments.
I understand that my figures representing the performance of the 68020 can be
taken out of context by those with a marketing or sales bent to unfairly
compare a processor with a system. In all of the work that I have done, I have 
been very deliberate in avoiding comparisons with any system. Because 
comparisons of this sort are so simplistic and erroneous, I never felt that 
they were an issue to anyone who knew how computers worked.

However, I strongly disagree with some of the interpretations that Mr. Falcone
has drawn as well as some of the conclusions made. 

First, the interpretations of the 68020 performance are a bit narrow.
The note implies that there are no applications except systems equivalent
to a VAX. Non-system applications, which include real-time robotics,
graphics, and communications, are a substantial portion of the
68020 market. These applications are concerned with the performance of
the processor alone, not a system.

Second, it is implied that general purpose computer systems are not capable
of running without wait states. This is a dangerous underestimation of the
capability of the various system implementors using the 68020. The notion
of an off-chip cache is mentioned but then discarded. I do not believe
that it is possible to dispose of the various system solutions available
to the system designers, many of which are already being used on 68000 systems.

Third, the performance figures cited in the note contain several
very significant discrepancies. Assuming that we use the figure of a
VAX-11/780 having 3-5x the performance of an 8MHz 68000, I don't understand
how the table below was generated.
                    
>                "VAX MIPS"
>   
>   CPU                100ns          200ns           300ns
>   ----------------------------------------------------------
>   8MHz    68000    0.14-0.23      0.14-0.23       0.14-0.23
>   16MHz   68020*   0.42-0.70      0.30-0.50       0.24-0.40
>   16MHz   68020**  0.56-0.90      0.46-0.76       0.40-0.66
>   ----------------------------------------------------------
>   VAX-11/780                      0.7-1.0       
>   VAX-11/785                      1.0-1.5       
>   ----------------------------------------------------------
>   * I-cache disabled
>   ** 100% I-cache hit ratio

If we start with the presumption that 3-5x is valid then I assume
we use 4x for the 11/780 to 68000 comparison. This would give us
the following performance figures:

CPU                100ns          200ns           300ns
----------------------------------------------------------
8MHz    68000    0.18-0.25      0.18-0.25       0.18-0.25
----------------------------------------------------------
VAX-11/780                      0.7-1.0       
VAX-11/785                      1.0-1.5       
----------------------------------------------------------
                                                         
Using the 68000-68020 relationship shown in the performance tables
in IEEE MICRO, expressed in terms of 68000 performance, we get:

CPU                100ns          200ns           300ns
----------------------------------------------------------
8MHz  68000        0.6 (1x)       0.6 (1x)        0.6 (1x)  
16MHz 68020*       2.1 (3.5x)     1.5 (2.5x)      1.3 (2.2x)
16MHz 68020**      2.7 (4.5x)     2.3 (3.7x)      2.0 (3.3x)
----------------------------------------------------------

Combining the two tables shows a respectable cost/performance
ratio. If a microprocessor-based product performs at or above the
level of a VAX, that seems significant.
                                                       
CPU                100ns          200ns           300ns
----------------------------------------------------------
8MHz  68000      0.18-0.25      0.18-0.25       0.18-0.25
16MHz 68020*     0.61-0.88      0.44-0.63       0.39-0.55    
16MHz 68020**    0.79-1.13      0.65-0.93       0.58-0.83
----------------------------------------------------------
-------------------------
VAX-11/780       0.7-1.0       
VAX-11/785       1.0-1.5       
-------------------------
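
The combined table can be reproduced, to within rounding, from the two inputs given above: the VAX-11/780 range of 0.7-1.0 divided by 4 for the 8MHz 68000, then multiplied by the 68020/68000 ratios from the second table. The sketch below is only a check on that arithmetic; the names are made up for illustration, and the dictionary keys are simply the 100ns/200ns/300ns column headings.

```python
VAX_780_RANGE = (0.7, 1.0)   # "VAX MIPS" range quoted for the 11/780
VAX_TO_68000 = 4.0           # midpoint of the 3-5x figure

# 68020 performance relative to an 8MHz 68000, per column of the table
RATIOS = {
    "16MHz 68020*":  {100: 3.5, 200: 2.5, 300: 2.2},   # I-cache disabled
    "16MHz 68020**": {100: 4.5, 200: 3.7, 300: 3.3},   # 100% I-cache hits
}


def base_68000_range():
    """8MHz 68000 range: the 11/780 range divided by the 4x factor."""
    low, high = VAX_780_RANGE
    return low / VAX_TO_68000, high / VAX_TO_68000


def scaled_range(ratio):
    """68020 range for one column: 68000 range times the given ratio."""
    low, high = base_68000_range()
    return low * ratio, high * ratio


if __name__ == "__main__":
    low, high = base_68000_range()
    print(f"{'8MHz 68000':<15} {low:.3f}-{high:.3f}")
    for cpu, columns in RATIOS.items():
        cells = []
        for ns in sorted(columns):
            low, high = scaled_range(columns[ns])
            cells.append(f"{ns}ns {low:.3f}-{high:.3f}")
        print(f"{cpu:<15} " + "   ".join(cells))
```

The values are printed to three decimals so that they can be compared against the two-decimal entries in the table without depending on how halves are rounded.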
                                                         
Fourth, the 68020 was designed to operate at 16MHz worst case. Just as there
are 10 and 12.5MHz 68000s and 68010s available now, it is not
unrealistic to anticipate higher-frequency 68020s in the years to come.

In conclusion, we designed a chip which is a processor. We did not design
a system. When we describe the performance of our chip it is only appropriate
to describe it in terms of a processor. There are too many variables that
can make a comprehensible evaluation impossible (e.g., compiler technology,
system configuration, system access times, etc.) that don't have anything
to do directly with the performance "capability" of the 68020. For this
reason, I feel that this method of performance evaluation is not only
appropriate, but essential.

P.S.  I would be interested in seeing a performance evaluation of the
VLSI-based VAX and microVAX processors as well as the systems.  Are there
any public descriptions of these?


Doug MacGregor