gnu@sun.uucp (John Gilmore) (08/09/85)
Someone pointed out that changing contexts on the IBM 370 doesn't cause a big performance hit, because the page table cache (called the "TLB") contains entries from multiple contexts, indexed through a pointer into another cache (the "STO stack"). It should be mentioned that early 370s do not have the STO hardware; it was added because context switching took too long. (Their first virtual memory systems did not give each process a different address space; they just let the address space everybody shared be larger than the physical memory. For that they didn't need to change the MMU context.)

It has also not been mentioned that systems where you copy page table entries into dedicated fast RAMs need not recopy on every context switch. In the Sun-2 MMU, for example, 8 complete contexts can remain in the fast RAMs, and reloading is required only when you are context-switching among more than 8 processes. On a single-user system (running Unix, where most processes die quickly anyway) this is not a performance bottleneck.

From the hardware designs I've seen, it's a lot harder to build an MMU with a cache than it is to build one out of RAM. This is because the cache is doing in hardware what would otherwise be done in software (updating the entries in the hardware translation table). Whether this is worth it depends on the individual system and what it will be used for. I suspect the overhead difference between the two is negligible in the overall system load, unless the hardware is badly designed, so I favor the cheaper approach.
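The Sun-2 scheme described above can be sketched in C. This is a hypothetical software model, not the actual Sun-2 hardware interface: the slot count of 8 matches the Sun-2, but the entry count, the round-robin eviction, and all the names (`struct mmu`, `context_switch`) are illustrative assumptions. The point it shows is that a switch to a still-resident context is just a register write, and copying happens only on eviction.

```c
#include <string.h>

#define NCTX   8      /* hardware context slots, as in the Sun-2 MMU   */
#define NPTE  64      /* translation entries per context (illustrative) */

/* Model of a multi-context MMU: NCTX banks of fast RAM, each
 * holding one process's translation entries. */
struct mmu {
    int owner[NCTX];          /* pid resident in each slot, -1 = free */
    unsigned pte[NCTX][NPTE]; /* the fast translation RAMs            */
    int current;              /* slot selected by the context register */
    int clock;                /* trivial round-robin eviction pointer  */
};

/* Switch to `pid`, copying its entries into the fast RAMs only if
 * they are not already resident in one of the NCTX slots.
 * Returns 1 if a reload was needed, 0 if the switch was just a
 * context-register write. */
int context_switch(struct mmu *m, int pid, const unsigned *page_table)
{
    for (int i = 0; i < NCTX; i++) {
        if (m->owner[i] == pid) {   /* hit: cheap switch */
            m->current = i;
            return 0;
        }
    }
    /* miss: evict a slot and copy this process's entries in */
    int victim = m->clock;
    m->clock = (m->clock + 1) % NCTX;
    m->owner[victim] = pid;
    memcpy(m->pte[victim], page_table, sizeof m->pte[victim]);
    m->current = victim;
    return 1;
}
```

With 8 or fewer runnable processes, every call after the first takes the cheap path, which is why the reload cost disappears on a single-user system.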
bcase@uiucdcs.Uiuc.ARPA (08/13/85)
/* Written 7:29 pm Aug 8, 1985 by gnu@sun.uucp in uiucdcs:net.arch */
/* ---------- "Re: MMU Cache revisited" ---------- */

From the hardware designs I've seen, it's a lot harder to build an MMU with a cache than it is to build one out of RAM. This is because the cache is doing in hardware what would otherwise be done in software (updating the entries in the hardware translation table). Whether

/* End of text from uiucdcs:net.arch */

No, an MMU built with a TLB (address translation cache) does not by definition include hardware reload of translation entries. Software reload and a TLB are perfectly compatible.
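The combination bcase describes, a hardware TLB with software reload, can be sketched as a miss handler. This is a simplified model in the spirit of later software-reload machines, not any real machine's interface: `tlb_write()` stands in for a privileged TLB-write instruction, and the flat `page_table` array and direct-mapped slot choice are assumptions. The hardware supplies only the cache; the reload policy lives entirely in the trap handler.

```c
#define TLB_SLOTS 64
#define NPAGES  1024

struct tlb_entry { unsigned vpn, pfn; int valid; };

static struct tlb_entry tlb[TLB_SLOTS];   /* the translation cache     */
static unsigned page_table[NPAGES];       /* software page table (VPN
                                             indexed, holds the PFN)  */

/* Stand-in for a privileged TLB-write instruction. */
static void tlb_write(int slot, unsigned vpn, unsigned pfn)
{
    tlb[slot].vpn   = vpn;
    tlb[slot].pfn   = pfn;
    tlb[slot].valid = 1;
}

/* Trap handler invoked by hardware on a TLB miss: software walks
 * its own table and installs one entry.  No table-walk hardware
 * is required, yet the TLB still absorbs repeated translations. */
void tlb_miss_handler(unsigned faulting_vpn)
{
    unsigned pfn = page_table[faulting_vpn];      /* software walk */
    tlb_write(faulting_vpn % TLB_SLOTS, faulting_vpn, pfn);
}
```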
uhclem@trsvax (08/22/85)
/* Written 11:03 am Aug 13, 1985 by fortune.U!wall in trsvax:net.arch */

>...
>Anyone care to agree with that?  Anyone care to tell me what
>reasonable application or operating system spends 50% of its time
>in loops that are smaller than 256 bytes??
>...
> But, hey, I could be wrong.  It's happened before.  So let's hear
>it.  Anyone who claims high cache hit rates on normal applications,
>let's hear the justification for them.

Here are a couple of examples:

1. Machine code for screen scrolling on bit-mapped graphics systems. If you are scrolling the entire screen, a few block moves will do. But if you are scrolling a particular area of the screen, the grunt-code needed fits nicely into 256 bytes.

2. Graphic tiling or painting on bit-mapped systems. Once again, the painter's algorithm will fit (or the majority of it will fit) into 256 bytes.

I don't have any benchmarks on this stuff on a 68020 yet, but I can tell you how slow it runs on an iAPX 186, with its pipeline flushing at the end of each scan line (end of loop) and whenever a jump is needed.

<The above is my opinion and not that of my employer; they don't even know what the question is.>

"Thank you, Uh Clem."
Frank Durda IV @ <trsvax!uhclem>
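The first example above, scrolling a window of a bit-mapped screen, can be sketched as a tight copy loop. This is an illustrative C version under assumed parameters (a word-addressed frame buffer, `WORDS_PER_LINE` words per scan line, the name `scroll_up`); the real grunt-code would be hand-written machine code, but the shape is the same: the inner loop compiles to a handful of instructions, well under 256 bytes, so even a tiny instruction cache captures nearly every fetch while it runs.

```c
/* Scroll lines [first_line, last_line) of a word-addressed frame
 * buffer up by one scan line, by copying each line over the one
 * above it.  The hot inner loop is only a few instructions long. */
#define WORDS_PER_LINE 32

void scroll_up(unsigned short *fb, int first_line, int last_line)
{
    for (int line = first_line; line < last_line; line++) {
        unsigned short *dst = fb + (long)line * WORDS_PER_LINE;
        const unsigned short *src = dst + WORDS_PER_LINE;
        for (int w = 0; w < WORDS_PER_LINE; w++)   /* the hot loop */
            dst[w] = src[w];
    }
}
```

On a machine that flushes its pipeline on every taken branch, as uhclem describes for the iAPX 186, this same loop pays a flush per word copied, which is exactly where the slowdown he mentions comes from.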