[net.arch] 68020 Performance Revisited after re

bcase@uiucdcs.UUCP (11/01/84)

["...she's a line eater..." -- Hall and Oats]

Let me add my two cents to the discussion by agreeing that any sane
68020 implementation will have a data cache.  The Signetics MAC makes
the inclusion of one rather painless, I think.  And the software time
spent invalidating the cache is absolutely of no concern here:  one
cycle (instruction?).  Even a more expensive cache built out of TI's
2150 tag buffer chips would still only require one cycle to invalidate
(assuming, as has been stated, that the cache keys on virtual addresses).

And regarding Falcone's comments:  It seems to me that you are saying
that one VAX MIP is roughly equivalent to 3-4 68020 MIPS (my conclusion
is based on your final table which shows, at 200 ns memory, a 68020
performance of roughly .5 VAX MIPS, while the IEEE paper claims roughly
2.0 68020 MIPS).  This conclusion (and correct me, please, if I have
gone astray here) seems a little bogus.  The reason:  the average VAX
instruction that gets executed in the average program is not, in my
opinion, 3-4 times more powerful than the average 68020 instruction
for the same program, and the VAX is certainly not operating at 3-4
times the raw speed of the 68020.
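
To spell out the arithmetic behind that ratio (this is just a
back-of-envelope sketch restating the two numbers quoted above, not
anything taken from Falcone's actual table):

    #include <stdio.h>

    int main(void)
    {
        double vax_mips    = 0.5;  /* 68020 throughput in "VAX MIPS" at 200 ns memory */
        double native_mips = 2.0;  /* native 68020 MIPS claimed in the IEEE paper     */

        /* Implied ratio: native 68020 instructions per "VAX instruction" */
        printf("1 VAX MIP ~= %.1f 68020 MIPS\n", native_mips / vax_mips);
        return 0;
    }

which comes out at 4.0, the upper end of the 3-4x figure.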

    bcase

kissell@flairvax.UUCP (Kevin Kissell) (11/06/84)

>						And the software time
> spent invalidating the cache is absolutely of no concern here:  one
> cycle (instruction?).  Even a more expensive cache built out of TI's
> 2150 tag buffer chips would still only require one cycle to invalidate
> (assuming, as has been stated, that the cache keys on virtual addresses).

The price paid for purging a virtual-address cache when the virtual map
changes is not so much in the overhead of the act of purging itself as
in the loss of those cache lines that otherwise would have remained valid
and thus been available for use when the pre-purge context was restored.
This can be significant in systems with large caches and frequent context
switches.
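
A toy illustration of that effect (a hypothetical 64-line direct-mapped
cache simulated in C; none of the sizes here are meant to model any
real machine):

    #include <stdio.h>
    #include <string.h>

    #define LINES 64                /* hypothetical direct-mapped cache, 64 lines */

    /* Count misses for one pass over a working set of 'n' line addresses. */
    static int touch(int tag[], const int set[], int n)
    {
        int i, misses = 0;
        for (i = 0; i < n; i++) {
            int line = set[i] % LINES;
            if (tag[line] != set[i]) {   /* miss: refill the line */
                tag[line] = set[i];
                misses++;
            }
        }
        return misses;
    }

    int main(void)
    {
        int tag[LINES], a[16], b[16], i;

        for (i = 0; i < 16; i++) { a[i] = i; b[i] = 1000 + i; }

        /* Case 1: purge the whole cache at every context switch. */
        memset(tag, -1, sizeof tag);
        touch(tag, a, 16);                        /* process A runs   */
        memset(tag, -1, sizeof tag);              /* purge on switch  */
        touch(tag, b, 16);                        /* process B runs   */
        memset(tag, -1, sizeof tag);              /* purge on switch  */
        printf("A's misses after purge:   %d\n", touch(tag, a, 16));

        /* Case 2: no purge (as if the tags also carried a process id). */
        memset(tag, -1, sizeof tag);
        touch(tag, a, 16);
        touch(tag, b, 16);            /* B only displaces lines it maps onto */
        printf("A's misses without purge: %d\n", touch(tag, a, 16));
        return 0;
    }

With a purge at each switch, A refills its whole working set when it is
restored; without one, only the lines B happened to displace are lost.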

bcase@uiucdcs.UUCP (11/06/84)

> > Falcone is setting up something of a "straw man" here.  A conscientious
> > 68020 design MUST use a data cache, and the data cache keys must be
> > VIRTUAL addresses.  
> 
> Whether the cache should be virtual or real-address based would seem to 
> me to depend on a tradeoff between burning an MMU cycle for every access
> (real-address cache) and purging the cache every time the virtual map
> changes (virtual-address cache).
> 
> Kevin D. Kissell

That is correct, and it is generally agreed (at least among people I know,
and apparently among a lot of system designers) that burning an MMU cycle
on every memory reference is much worse for performance in general than
purging the (possibly only the user partition of the) cache tags on process
switches.  Slowing cache accesses down from one to two cycles is a 100%
increase, while causing a little extra cache refilling due to cold-start
conditions is only a marginal percentage overhead, on the average (since
process switches occur (much) less than 200 times a second most of the
time, especially in single-user systems).
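
As a rough sanity check on that claim, here is a back-of-envelope
comparison; every number in it is an illustrative guess (cycle time,
reference rate, refill cost), not a measurement of any particular
68020 system:

    #include <stdio.h>

    int main(void)
    {
        /* All figures below are illustrative guesses, not measurements. */
        double accesses_per_sec   = 2.0e6;   /* memory references per second         */
        double cycle_ns           = 125.0;   /* one cache cycle                      */
        double switches_per_sec   = 200.0;   /* context switches (upper end above)   */
        double refill_lines       = 256.0;   /* cold lines refilled after each purge */
        double refill_ns_per_line = 400.0;   /* main-memory line refill time         */

        /* Physical-address cache: one extra MMU cycle on every access. */
        double mmu_cost = accesses_per_sec * cycle_ns;

        /* Virtual-address cache: cold-start refills after each purge. */
        double purge_cost = switches_per_sec * refill_lines * refill_ns_per_line;

        printf("extra MMU cycles:   %.0f us/sec\n", mmu_cost   / 1000.0);
        printf("cold-start refills: %.0f us/sec\n", purge_cost / 1000.0);
        return 0;
    }

With guesses in that ballpark, the per-access MMU cycle eats an order of
magnitude more time each second than the post-purge refills do, which is
the shape of the argument above.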

    bcase

bcase@uiucdcs.UUCP (11/08/84)

/* Written  7:50 am  Nov  7, 1984 by kissell@flairvax in uiucdcs:net.arch */
>						And the software time
> spent invalidating the cache is absolutely of no concern here:  one
> cycle (instruction?).  Even a more expensive cache built out of TI's
> 2150 tag buffer chips would still only require one cycle to invalidate
> (assuming, as has been stated, that the cache keys on virtual addresses).

The price paid for purging a virtual-address cache when the virtual map
changes is not so much in the overhead of the act of purging itself as
in the loss of those cache lines that otherwise would have remained valid
and thus been available for use when the pre-purge context was restored.
This can be significant in systems with large caches and frequent context
switches.
/* End of text from uiucdcs:net.arch */

Perfectly true, but this should not be considered a *SOFTWARE* (look at
the original wording) overhead; to me, software overhead is constituted
by such things as trying to decide when to invalidate caches or page
table entries in the TLB, and then actually performing the operation.
Accordingly, I was pointing out that in modern systems a full invalidation
costs only one cycle, as opposed to one cycle per entry as in older
systems.
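
For concreteness, the distinction I mean looks roughly like this in C
(the control-register address, the tag array, and the line count are
all invented for the sketch; no particular cache controller is implied):

    #define CACHE_LINES 256

    /* Hypothetical "clear everything" control register, memory-mapped. */
    #define FLUSH_ALL   (*(volatile int *)0xFFF00000)

    /* Hypothetical per-line valid bits, for the older one-at-a-time scheme. */
    extern volatile int tag_valid[CACHE_LINES];

    /* Modern scheme: one store, roughly one cycle, and every tag is invalid. */
    void purge_fast(void)
    {
        FLUSH_ALL = 1;
    }

    /* Older scheme: the kernel walks the tag store, one cycle (or more) per entry. */
    void purge_slow(void)
    {
        int i;
        for (i = 0; i < CACHE_LINES; i++)
            tag_valid[i] = 0;
    }

The software overhead I was talking about is the difference between
those two routines, not the cold-start refills that follow either one.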