[net.arch] MMU Cache revisited

gnu@sun.uucp (John Gilmore) (08/09/85)

Someone pointed out that changing contexts in the IBM 370 doesn't cause
a big performance hit because the page table cache (called the "TLB")
contains multiple contexts' entries, indexed with a pointer into another
cache ("STO stack").  It should be mentioned that early 370's do not have
the STO hardware; it was added because context switching took too long.
(Their first virtual memory systems did not give each process its own
address space; they just let the single address space everybody shared
be larger than the physical memory.  For that they didn't need to
change the MMU context.)

It has also not been mentioned that systems where you copy page table
entries into dedicated fast RAMs need not recopy on every context
switch.  In the Sun-2 MMU, for example, 8 complete contexts can remain
in the fast RAMs, and the only reloading required is when you are
context-switching among more than 8 processes.  On a single-user system
(running Unix, where most processes die quickly anyway) this is not a
performance bottleneck.

From the hardware designs I've seen, it's a lot harder to build an MMU
with a cache than it is to build one out of RAM.  This is because
the cache is doing in hardware what would otherwise be done in software
(updating the entries in the hardware translation table).  Whether
this is worth it or not depends on the individual system and what
it will be used for.  I suspect the overhead difference between the
two is negligible in the overall system load, unless the hardware is
badly designed, so I favor the cheaper approach.

bcase@uiucdcs.Uiuc.ARPA (08/13/85)

/* Written  7:29 pm  Aug  8, 1985 by gnu@sun.uucp in uiucdcs:net.arch */
/* ---------- "Re: MMU Cache revisited" ---------- */
From the hardware designs I've seen, it's a lot harder to build an MMU
with a cache than it is to build one out of RAM.  This is because
the cache is doing in hardware what would otherwise be done in software
(updating the entries in the hardware translation table).  Whether
/* End of text from uiucdcs:net.arch */

No, an MMU built with a TLB (address translation cache) does not by
definition include hardware reload of translation entries.  Software
reload and a TLB are perfectly compatible.

uhclem@trsvax (08/22/85)

/* Written 11:03 am  Aug 13, 1985 by fortune.U!wall in trsvax:net.arch */

>...
>Anyone care to agree with that?  Anyone care to tell me what 
>reasonable application or operating system spends 50% of its time
>in loops that are smaller than 256 bytes??
>...
>    But, hey, I could be wrong. It's happened before. So let's hear
>it. Anyone who claims high cache hit rates on normal applications,
>let's hear the justification for them.

Here are a couple of examples:

1.	Machine code for screen scrolling on bit-mapped graphic systems.
	If you are doing the entire screen, a few block moves will do.
	But if you are scrolling a particular area of the screen, the
	grunt-code needed fits nicely into 256 bytes.

2.	Graphic tiling or painting on bit-mapped systems.
	Once again, the painters' algorithm will fit (or the majority
	of it will fit) into 256 bytes.

I don't have any benchmarks on this stuff on a 68020 yet, but I can
tell you how slow it runs on an iAPX 186, with its prefetch queue
flushing at the end of each scan line (end of loop) and whenever a
jump is taken.


<The above is my opinion and not that of my employer; they don't 
 even know what the question is.>
						
						"Thank you, Uh Clem."
						Frank Durda IV
						@ <trsvax!uhclem>