roy@forbrk.UUCP (Roy Gordon) (05/08/87)
(Please forgive me if this has been posted twice; the first posting didn't show up on our own system.)

There has been some discussion of the compatibility of the Motorola 68851 (PMMU) and the 68030's on-chip MMU. Neither the PMMU nor the 68030 MMU constitutes a proper superset of the other: the PMMU has instructions and registers not found in the 68030 MMU, while the latter has registers not on the PMMU. However, in a typical Unix implementation little work would be needed to port PMMU-specific code to the 68030.

The important differences between the PMMU and the 68030 MMU in a Unix environment, as I see it, are:

(1) The PMMU has a 64-entry address translation cache (ATC) with 8 process ID tags. The 68030 MMU has a 22-entry cache with no process ID tags.

(2) The 68030 has two transparent translation registers (TTRs) that pass the address through untranslated. Effectively, they are cache hits that don't require cache entries.

CACHE SIZE

Limiting the cache to 22 entries and forcing it to be cleared and reloaded on every context switch seems likely to degrade performance. As mentioned, there are no process IDs attached to the cache entries. However, with only 22 entries this may not be the wrong approach.

My beliefs about the effect of the cache reduction on performance are intuitive. Does anyone have actual measurements of the effect of cache size, with and without process ID tags, in a Unix environment or in other environments?

TRANSPARENT TRANSLATION

If kernel text and data are mapped transparently, then the transparent translation registers always produce cache hits for the kernel, whether or not the kernel exhibits good locality of reference. Further, no cache entries are required for addresses mapped by the TTRs.

We will use the TTRs to map RAM and our other board addresses. RAM runs from 0 up to 64 MB, but the other board space begins at 0x2000,0000. So we will use one TTR for kernel RAM and one for the hardware addresses.
However, the TTRs are programmed with a base address and a mask, like the 68451 descriptors, rather than with a more sensible base address and limit, so the memory map must be laid out to suit them: a TTR compares only address bits A31-A24 against its base, ignoring the bits set in its mask. The minimum area mapped by a TTR is therefore 16 MB.

With the PMMU, a "shared globally" bit can be set in a long (8-byte) descriptor. The effect is that the cache entry is valid for all process IDs. However, each such page still requires a cache entry, so the kernel's non-locality could be an issue if it uses too many cache entries. (I seem to remember some past discussion of kernel non-locality, but I don't remember whether its effect on the overall cache hit rate in a Unix environment was reported.)
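Roy asks for measurements with and without process ID tags. None are offered here, but a toy model makes the trade-off concrete. The C sketch below is a hypothetical simulator (all names are invented for illustration): a fully associative cache with LRU replacement that is either tagged with a task number, loosely like the 68851's task alias, or flushed on every context switch, as on the 68030. LRU replacement is an assumption; neither chip's actual replacement policy, nor the 68851's limit of 8 aliases, is modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 64

/* Toy fully associative translation cache with LRU replacement.
   tagged != 0: entries carry a task tag and survive context switches
   (loosely like the 68851 task alias; the 8-alias limit is not modeled).
   tagged == 0: the cache is flushed on every switch (as on the 68030). */
struct atc {
    int size;                         /* number of entries, e.g. 22 or 64 */
    int tagged;
    int n;                            /* valid entries */
    uint32_t page[MAX_ENTRIES];
    int task[MAX_ENTRIES];
    unsigned long last[MAX_ENTRIES];  /* time of last use, for LRU */
    unsigned long now, hits, misses;
};

static void atc_init(struct atc *c, int size, int tagged)
{
    assert(size > 0 && size <= MAX_ENTRIES);
    memset(c, 0, sizeof *c);
    c->size = size;
    c->tagged = tagged;
}

/* Context switch: an untagged cache must be invalidated. */
static void atc_switch(struct atc *c)
{
    if (!c->tagged)
        c->n = 0;
}

/* Look up one page for one task; on a miss, load it, evicting the LRU entry. */
static void atc_access(struct atc *c, int task, uint32_t page)
{
    int i, victim = 0;

    c->now++;
    for (i = 0; i < c->n; i++) {
        if (c->page[i] == page && (!c->tagged || c->task[i] == task)) {
            c->hits++;
            c->last[i] = c->now;
            return;
        }
    }
    c->misses++;
    if (c->n < c->size) {
        victim = c->n++;
    } else {
        for (i = 1; i < c->n; i++)
            if (c->last[i] < c->last[victim])
                victim = i;
    }
    c->page[victim] = page;
    c->task[victim] = task;
    c->last[victim] = c->now;
}

/* Two tasks take turns; in each time slice a task touches pages 0..pages-1
   (the same virtual pages, as two Unix processes would). Returns misses. */
static unsigned long trace_misses(int size, int tagged, int rounds, int pages)
{
    struct atc c;
    int r, t, p;

    atc_init(&c, size, tagged);
    for (r = 0; r < rounds; r++)
        for (t = 0; t < 2; t++) {
            atc_switch(&c);
            for (p = 0; p < pages; p++)
                atc_access(&c, t, (uint32_t)p);
        }
    return c.misses;
}
```

On this synthetic trace, with 10 rounds of 10 pages per task, an untagged 22-entry cache misses on all 200 accesses, while a tagged cache misses only on the 20 cold loads. With 15 pages per task the two working sets (30 pages) exceed 22 entries and a tagged 22-entry LRU cache misses on every access too. This says nothing about real Unix workloads; it only shows which parameters such a measurement would need to vary.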
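To show how the base/mask scheme constrains the memory map, the C helpers below (hypothetical names) pack a 68030 TT0/TT1 value for Roy's two windows. Field positions follow the MC68030 TTx register layout: logical address base in bits 31-24, address mask in bits 23-16, E (enable) in bit 15, CI (cache inhibit) in bit 10, R/W and RWM in bits 9-8, FC base in bits 6-4, FC mask in bits 2-0. Restricting both windows to supervisor function codes, cache-inhibiting the I/O window, and giving the board-space window a size of 16 MB are assumptions for illustration, not part of Roy's description.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in the MC68030 TT0/TT1 registers. */
#define TT_E        (1u << 15)  /* enable */
#define TT_CI       (1u << 10)  /* cache inhibit */
#define TT_RW       (1u << 9)   /* access type to match when RWM = 0 (unused here) */
#define TT_RWM      (1u << 8)   /* 1 = ignore R/W, match reads and writes */
#define TT_FC_SUPER_BASE 4u     /* FC base 100: FC2 set, supervisor space */
#define TT_FC_SUPER_MASK 3u     /* ignore FC1/FC0: match any supervisor FC */

/* Hypothetical helper: TT value that transparently maps `size` bytes at
   `base` for supervisor accesses only. Assumes size is a power of two of
   at least 16 MB and base is aligned to size. */
static uint32_t tt_encode(uint32_t base, uint32_t size, int cache_inhibit)
{
    assert(size >= (1u << 24));           /* 16 MB minimum */
    assert((size & (size - 1)) == 0);     /* power of two */
    assert((base & (size - 1)) == 0);     /* aligned to its size */

    uint32_t addr_base = base >> 24;          /* compared with A31-A24 */
    uint32_t addr_mask = (size >> 24) - 1;    /* A31-A24 bits to ignore */
    uint32_t v = (addr_base << 24) | (addr_mask << 16) | TT_E | TT_RWM
               | (TT_FC_SUPER_BASE << 4) | TT_FC_SUPER_MASK;

    if (cache_inhibit)
        v |= TT_CI;
    return v;
}

/* Hypothetical helper: does an access to `addr` with function code `fc`
   (0..7) fall in the window described by `tt`? The R/W check is skipped
   because tt_encode always sets RWM. */
static int tt_matches(uint32_t tt, uint32_t addr, unsigned fc)
{
    uint32_t abase = tt >> 24;
    uint32_t amask = (tt >> 16) & 0xFF;
    unsigned fbase = (tt >> 4) & 7;
    unsigned fmask = tt & 7;

    if (!(tt & TT_E))
        return 0;
    return (((addr >> 24) ^ abase) & ~amask & 0xFF) == 0
        && ((fc ^ fbase) & ~fmask & 7) == 0;
}
```

For example, `tt_encode(0, 64u << 20, 0)` yields base 0x00 with mask 0x03, so supervisor accesses from 0 up to 64 MB match and user-mode accesses still go through the page tables. A region such as 0 up to 48 MB cannot be expressed as a single base/mask pair, because the set of addresses a mask admits always has a power-of-two number of 16 MB blocks; that is the restriction Roy objects to.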
djl@mips.UUCP (Dan Levin) (05/13/87)
In article <319@forbrk.UUCP>, roy@forbrk.UUCP (Roy Gordon) writes:
> TRANSPARENT TRANSLATION
>
> We will the TTRs to map RAM and our other board addresses. Ram starts at
> 0 up to 64 MB, but the other board space begins at 0x2000,0000. So we will
> use one TTR for kernel ram, and one for the hardware addresses.
> [...]
> (I seem to remember some past discussion on kernel non-locality,
> but don't remember if the effects on overall cache hit rate
> within a Unix environment were reported.)

The UNIX kernel has terrible locality, and combined with its rather substantial size these days, that makes for a real problem for TLBs. If you are not careful, you can wipe out your TLB every time you take a network input interrupt. Since you generally don't need to relocate the kernel, and you probably don't need to page the kernel in these days of cheap memory, there is little reason to map it through the TLB.

Note that there are at least three ways to attack this problem. The Am29000 allows only for complete disabling of the TLB, with no facility for selective mapping. The 68030 allows for two unmapped segments, as discussed above. The MIPS R2000 has two unmapped segments of its address space, one cached and one uncached. We use the unmapped but cached space for the kernel, and the unmapped and uncached space for memory-mapped I/O. Should we ever decide to page part of the OS, we can simply relink that part to run in mapped space.

It turns out to be handy to map some kernel data structures, both to facilitate copy-on-write and for the user structure and PTEs, so we use part of the mapped kernel address space for those.

--
***dan
decwrl!mips!djl
mips!djl@decwrl.dec.com
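Dan's description of the R2000 address space can be made concrete with a small classifier. The segment boundaries below are those of the R2000/R3000 (kuseg, kseg0, kseg1, kseg2); the type and function names are hypothetical. kseg0 and kseg1 both reach the low 512 MB of physical memory without the TLB and differ only in whether the cache is used, which is what lets a kernel run cached from one window and touch device registers uncached through the other.

```c
#include <assert.h>
#include <stdint.h>

/* The four R2000/R3000 virtual address segments. */
enum segment { KUSEG, KSEG0, KSEG1, KSEG2 };

struct xlate {
    enum segment seg;
    int mapped;     /* 1: translated through the TLB */
    int cached;     /* for unmapped segments; mapped pages take this from the TLB entry */
    uint32_t phys;  /* physical address, valid only when mapped == 0 */
};

/* Hypothetical helper: classify a virtual address by segment. */
static struct xlate classify(uint32_t va)
{
    struct xlate x = { KUSEG, 1, 0, 0 };

    if (va < 0x80000000u) {
        x.seg = KUSEG;                     /* user space, mapped */
    } else if (va < 0xA0000000u) {
        x.seg = KSEG0;                     /* kernel, unmapped, cached */
        x.mapped = 0;
        x.cached = 1;
        x.phys = va - 0x80000000u;
    } else if (va < 0xC0000000u) {
        x.seg = KSEG1;                     /* kernel, unmapped, uncached */
        x.mapped = 0;
        x.cached = 0;
        x.phys = va - 0xA0000000u;
    } else {
        x.seg = KSEG2;                     /* kernel, mapped */
    }
    return x;
}

/* Hypothetical helpers: kernel virtual address of a physical address
   in the low 512 MB, through the cached or the uncached window. */
static uint32_t phys_to_kseg0(uint32_t pa)
{
    assert(pa < 0x20000000u);
    return pa + 0x80000000u;
}

static uint32_t phys_to_kseg1(uint32_t pa)
{
    assert(pa < 0x20000000u);
    return pa + 0xA0000000u;
}
```

For example, physical 0x1FC00000 (where the R2000 fetches its reset vector) appears at 0xBFC00000 in kseg1, uncached. Kernel RAM would be addressed through `phys_to_kseg0`, and the mapped kseg2 is where Dan's user structure and PTEs would live.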