[comp.arch] 68851 PMMU and 68030 MMU

roy@forbrk.UUCP (Roy Gordon) (05/08/87)

(Please forgive if this has been posted twice.  The first posting
didn't show up on our own system.)

There has been some discussion of the compatibility of
the Motorola 68851 (PMMU) and the 68030 on chip PMMU.

Neither the PMMU nor the 68030 MMU constitutes a proper superset of the other.

The PMMU has instructions and registers not found in the 68030 MMU,
while the latter has registers not on the PMMU.  However,
in a typical Unix implementation little work would be needed to port
PMMU-specific code to the 68030.

The important differences between the PMMU and the 68030 MMU in a Unix
environment, as I see it, are:

	(1) The PMMU has a 64 entry cache with 8 process id tags.
	    The 68030 MMU has a 22 entry cache with no process id tags.

	(2) The 68030 has two transparent translation registers (TTRs) that
	    pass the address through untranslated.  Effectively, they are
	    cache hits not requiring cache entries.

CACHE SIZE

Limiting the cache to 22 entries and forcing a clear and reload on
every context switch seems likely to degrade performance.
As mentioned, there are no process ids attached to the cache entries.
However, with only 22 entries, omitting the tags may not be the
wrong approach.

My beliefs on the effects of the cache reduction on performance are
intuitive.  Does anyone have actual measurements on cache size with/without
process id tags? in a Unix environment? in other environments?


TRANSPARENT TRANSLATION

If kernel text and data are mapped transparently then use of the transparent
translation registers always results in cache hits for the kernel whether
or not the kernel exhibits good locality of reference.  Further, no cache
entries are required for addresses mapped by the TTRs.

We will use the TTRs to map RAM and our other board addresses.  RAM runs
from 0 up to 64 MB, but the other board space begins at 0x20000000.  So we
will use one TTR for kernel RAM, and one for the hardware addresses.

However, the TTRs are programmed with a base address and mask,
like the 68451 descriptors, as opposed to a more sensible base
address and limit, so each transparent region must be a power-of-two
size and aligned on its size.  The minimum area mapped by a TTR is 16MB.

With the PMMU a "shared globally" bit can be set in a long (8 byte)
descriptor.  The effect is that the cache entry is valid for all
process id's.  However, each such page still requires a cache
entry, so non-locality of the kernel could be an issue if
too many cache entries are used.

(I seem to remember some past discussion on kernel non-locality,
but don't remember if the effects on overall cache hit rate
within a Unix environment were reported.)

djl@mips.UUCP (Dan Levin) (05/13/87)

In article <319@forbrk.UUCP>, roy@forbrk.UUCP (Roy Gordon) writes:
> TRANSPARENT TRANSLATION
> 
> We will use the TTRs to map RAM and our other board addresses.  RAM runs
> from 0 up to 64 MB, but the other board space begins at 0x20000000.  So we
> will use one TTR for kernel RAM, and one for the hardware addresses.
> [...]
> (I seem to remember some past discussion on kernel non-locality,
> but don't remember if the effects on overall cache hit rate
> within a Unix environment were reported.)

The UNIX kernel has terrible locality, and combined with its rather 
substantial size these days, that makes for a real problem for TLB's.
If you are not careful, you can wipe out your TLB every time you
take a network input interrupt.  Since you generally don't need to
relocate the kernel, and you probably don't need to page the kernel
in these days of cheap memory, there is little reason to map it through
the TLB.

Note that there are at least three ways to attack this problem.  The Am29000
allows only for complete disabling of the TLB, with no facility for
selective mapping.  The 68030 allows for two unmapped segments, as discussed
above.  The MIPS R2000 has two unmapped segments of its address space,
one cached and one uncached.  We use the unmapped but cached space for
the kernel, and the unmapped and uncached space for memory mapped I/O. 
Should we ever decide to page a part of the OS, we can simply relink that
part to run in mapped space.  It turns out to be handy to map some kernel
data structures to facilitate copy-on-write, and for the user structure and
PTEs, so we use part of the mapped kernel address space for those.

-- 
			***dan

decwrl!mips!djl                  mips!djl@decwrl.dec.com