zs01+@andrew.cmu.edu (Zalman Stern) (02/02/91)
A while back I posted a message about the IBM RISC System/6000 (RIOS), which elicited responses from John Carr and Tim Bray. I'd like to clear up a few points.

1) MMU. The RIOS MMU uses inverted page tables with 52 bit virtual addresses constructed from a 32 bit effective address (from the user program) and 16 OS managed segment registers. This is quite similar to the RT except that the segment registers are wider on the RIOS. The other architectural difference between the two machines is that the RIOS adds hardware support for granting lock bits. Lock bits provide per process memory protection on 128 byte "lines" of memory and are used to implement "database memory" (or "transactional memory") (*). Lock bits are selected on a per segment basis. Segments for which they are enabled are called "special segments." (The RIOS also has I/O segments with a different kind of protection mechanism. The RT has one wired segment (0xf) for I/O.)

Comparing the RT MMU implementation to that of the RIOS ignores certain issues. The RT does not have caches, nor is it a multi-chip processor. The RIOS has special support to do uncached fetches to reload the TLB. The fixed point unit and the branch chip have separate TLBs. This provides wider TLB coverage at some cost in silicon.

My objection to the RIOS MMU is that it has three different mechanisms for memory protection (four if you count segment accessibility). In addition, the lock bit support eats a significant amount of silicon and is basically useless on a UNIX workstation. There are very few applications which benefit from transactional memory. For those that do (databases and reliable filesystems) there are software techniques to accomplish the same goals. It's not clear to me that the hardware is worth the performance it buys, especially since nobody writes code for hardware like this in the UNIX marketplace. The portability hit is just too much.
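To make the address arithmetic in point 1 concrete, here is a rough Python sketch of how a 52 bit virtual address could be formed. Only the 32 bit effective address, the 16 segment registers, the 52 bit result, and the 128 byte lock line come from the description above. The use of the top 4 address bits to pick a segment register, the 24 bit segment ID (simply 52 minus 28), the 4 KB page size, and all function names are assumptions made for illustration, not taken from IBM documentation.

```python
# Sketch only: field widths marked "assumed" are inferred, not quoted from a spec.
EFFECTIVE_BITS = 32
VIRTUAL_BITS = 52
NUM_SEGMENT_REGISTERS = 16

SELECT_BITS = 4                               # assumed: log2(16) top address bits
OFFSET_BITS = EFFECTIVE_BITS - SELECT_BITS    # 28 bits, i.e. 256 MB per segment
SEGMENT_ID_BITS = VIRTUAL_BITS - OFFSET_BITS  # 24 bits, inferred as 52 - 28

LOCK_LINE_BYTES = 128                         # from the post
PAGE_BYTES = 4096                             # assumed page size


def virtual_address(effective_address, segment_registers):
    """Combine a 32 bit effective address with a segment ID into 52 bits."""
    if not 0 <= effective_address < (1 << EFFECTIVE_BITS):
        raise ValueError("effective address must fit in 32 bits")
    if len(segment_registers) != NUM_SEGMENT_REGISTERS:
        raise ValueError("expected 16 segment registers")

    # The top bits of the effective address choose a segment register.
    select = effective_address >> OFFSET_BITS
    # The remaining low bits are the byte offset inside that segment.
    offset = effective_address & ((1 << OFFSET_BITS) - 1)

    segment_id = segment_registers[select]
    if not 0 <= segment_id < (1 << SEGMENT_ID_BITS):
        raise ValueError("segment ID must fit in 24 bits")

    # Segment ID forms the high part, offset the low part.
    return (segment_id << OFFSET_BITS) | offset


def lock_line_index(effective_address):
    """Which 128 byte lock line within its (assumed 4 KB) page an address hits."""
    return (effective_address % PAGE_BYTES) // LOCK_LINE_BYTES


def lock_lines_per_page():
    """Number of independently protectable lines in one assumed 4 KB page."""
    return PAGE_BYTES // LOCK_LINE_BYTES
```

With segment register 0xF holding the made-up ID 0xABCDEF, virtual_address(0xF0001234, regs) yields 0xABCDEF0001234. Under the assumed 4 KB page, each page would carry 32 lock lines, which gives a feel for how much per-page protection state the hardware has to track.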
Even IBM got burned by this portability problem when they looked at porting the AIX Journaling Filesystem to the Intel 80386. In short, it ain't a RISC MMU.

I have not seen any *published* numbers for the TLB reload performance on the RIOS. I'm sure it would be an interesting thing to measure.

(*) See A. Chang and M. F. Mergen, "801 Storage: Architecture and Programming," ACM Transactions on Computer Systems, 6, pp. 28-50, February 1988.

2) Mach VM support. John Carr's comment that Mach is guilty of "All the world is a VAX" thinking is just wrong. Neither the RT nor the RIOS implementation of Mach 2.5 uses any data structure resembling a VAX page table. If you want to see that hack, look at IBM's 4.3 BSD port, where they had to emulate VAX page tables because the "machine independent" code makes so many assumptions about the format of PTEs.

I maintain that comparing context switch times on the RIOS and an R3000 based box, both running Mach 2.5, is a fair test. With the same operating system on both machines, it is much easier to quantify the software differences and account for them. That way one can get a fair picture of how the hardware performs. Unfortunately, I doubt IBM is interested in doing this. (There is a paper being presented at the upcoming ASPLOS conference that does such measurements using Mach on a number of architectures.)

I've also seen a comment or two that one shouldn't use page sharing because segment sharing is more efficient. Unfortunately, shared memory regions are used to represent things like object modules and message buffers. Anyone who believes that 16 of these are enough is "a fascist pig with a read-only mind." (Bill Gosper in HAKMEM item 154).

3) X Server. The X Server software was pretty immature when I was using the RIOS. It improved quite a bit with each release back then. There are hardware problems too. The low end 8 bit color frame buffer sits out on the MicroChannel. (Most workstations today place the frame buffer on the processor memory bus.)
The MicroChannel is an I/O bus and has significantly higher latency than memory. Also, all accesses to I/O space (necessary to get out to the bus) are uncached. This eliminates certain cache tricks that X servers can play for even higher performance. The mid-range IBM graphics card (based on Silicon Graphics technology) doesn't allow the processor to write to the frame buffer. This causes problems for certain X server operations. (Though it is a good design for certain other applications.)

Zalman Stern, MIPS Computer Systems, 928 E. Arques 1-03, Sunnyvale, CA 94086
zalman@mips.com OR {ames,decwrl,prls,pyramid}!mips!zalman (408) 524 8395
"``Ah, so,'' said Daruma the O-maker" -- Tom Robbins