[comp.arch] RIOS MMU/X performance, Mach VM.

zs01+@andrew.cmu.edu (Zalman Stern) (02/02/91)
A while back I posted a message about the IBM RISC System/6000 (RIOS), which
elicited responses from John Carr and Tim Bray. I'd like to clear up a few
points.

1) MMU.

The RIOS MMU uses inverted page tables with 52 bit virtual addresses
constructed from a 32 bit effective address (from the user program) and 16
OS managed segment registers. This is quite similar to the RT except that
the segment registers are wider on the RIOS. The other architectural
difference between the two machines is that the RIOS adds hardware support
for granting lock bits. Lock bits provide per process memory protection on
128 byte "lines" of memory and are used to implement "database memory" (or
"transactional memory") (*). Lock bits are selected on a per segment basis.
Segments for which they are enabled are called "special segments." (The
RIOS also has I/O segments with a different kind of protection mechanism.
The RT has one wired segment (0xf) for I/O.)
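
To make the address formation concrete, here is a rough sketch in C. It is
an illustration, not IBM's definition: with 16 segment registers the top 4
bits of the effective address must select the register, leaving a 28 bit
offset, so a 52 bit virtual address implies a 24 bit segment ID. The 4 KB
page size, the struct, and the function names are my assumptions for the
example.

```c
#include <stdint.h>

/* Field widths implied by the text: 4-bit register select + 28-bit offset
   in the effective address, 24-bit segment ID (24 + 28 = 52). */
#define SEG_SELECT_SHIFT 28
#define SEG_OFFSET_MASK  0x0FFFFFFFu
#define SEG_ID_MASK      0x00FFFFFFu
#define LINE_SHIFT       7        /* 128-byte lock "lines" */
#define PAGE_SIZE        4096u    /* assumption: 4 KB pages */

/* Hypothetical model of the 16 OS-managed segment registers. */
typedef struct {
    uint32_t seg_id[16];
} segment_regs;

/* Form the 52-bit virtual address from a 32-bit effective address. */
uint64_t virtual_address(const segment_regs *sr, uint32_t ea)
{
    uint32_t select = ea >> SEG_SELECT_SHIFT;          /* top 4 bits */
    uint64_t seg_id = sr->seg_id[select] & SEG_ID_MASK;
    return (seg_id << 28) | (ea & SEG_OFFSET_MASK);
}

/* Index of the 128-byte line within its page, i.e. which lock bit
   would cover this address in a special segment. */
unsigned line_in_page(uint32_t ea)
{
    return (ea & (PAGE_SIZE - 1)) >> LINE_SHIFT;
}
```

Under these assumptions a page holds 32 lines, so one word of lock bits per
page is enough to cover it.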

Comparing the RT MMU implementation to that of the RIOS ignores certain
issues. The RT does not have caches nor is it a multi-chip processor.  The
RIOS has special support to do uncached fetches to reload the TLB.  The
fixed point unit and the branch chip have separate TLBs.  This
provides wider TLB coverage at some cost in silicon.
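
For readers who have not worked with inverted page tables: a TLB reload
amounts to a hash lookup keyed on the virtual page number, with one table
entry per physical frame. The sketch below is a generic textbook version in
C, not the RIOS (or RT) table format. The table size, hash function, and all
names are assumptions made up for the example.

```c
#include <stdint.h>

#define IPT_ENTRIES 1024u        /* assumed: one entry per physical frame */
#define NO_FRAME    UINT32_MAX   /* end-of-chain / "not mapped" marker */

typedef struct {
    uint64_t vpage;   /* virtual page number mapped to this frame */
    uint32_t next;    /* next frame on the same hash chain, or NO_FRAME */
    int      valid;
} ipt_entry;

typedef struct {
    ipt_entry frame[IPT_ENTRIES];   /* indexed by physical frame number */
    uint32_t  anchor[IPT_ENTRIES];  /* hash bucket -> first frame on chain */
} inverted_page_table;

/* Hypothetical hash: fold the upper bits onto the lower ones. */
static uint32_t ipt_hash(uint64_t vpage)
{
    return (uint32_t)((vpage ^ (vpage >> 20)) % IPT_ENTRIES);
}

void ipt_init(inverted_page_table *t)
{
    for (uint32_t i = 0; i < IPT_ENTRIES; i++) {
        t->frame[i].valid = 0;
        t->frame[i].next = NO_FRAME;
        t->anchor[i] = NO_FRAME;
    }
}

/* Record that physical frame `frame` now holds virtual page `vpage`.
   Assumes the frame is not currently on any chain. */
void ipt_map(inverted_page_table *t, uint64_t vpage, uint32_t frame)
{
    uint32_t h = ipt_hash(vpage);
    t->frame[frame].vpage = vpage;
    t->frame[frame].valid = 1;
    t->frame[frame].next = t->anchor[h];   /* push on head of chain */
    t->anchor[h] = frame;
}

/* What a TLB reload has to do: walk the hash chain for this page.
   Returns the frame number, or NO_FRAME (a page fault) if unmapped. */
uint32_t ipt_lookup(const inverted_page_table *t, uint64_t vpage)
{
    for (uint32_t f = t->anchor[ipt_hash(vpage)]; f != NO_FRAME;
         f = t->frame[f].next) {
        if (t->frame[f].valid && t->frame[f].vpage == vpage)
            return f;
    }
    return NO_FRAME;
}
```

Each step of the chain walk is a memory reference, which is why the cost of
a reload depends on how those fetches interact with the caches.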

My objections to the RIOS MMU are that it has three different mechanisms
for memory protection (four if you count segment accessibility). In
addition, the lock bit support eats a significant amount of silicon and is
basically useless on a UNIX workstation. There are very few applications
which benefit from transactional memory. For those that do (databases and
reliable filesystems) there are software techniques to accomplish the same
goals. It's not clear to me that the hardware is worth the performance it
buys, especially since nobody writes code for hardware like this in the UNIX
marketplace. The portability hit is just too much.  Even IBM got burned on
this when they looked at porting the AIX Journaling Filesystem to the Intel
80386. In short, it ain't a RISC MMU.

I have not seen any *published* numbers for the TLB reload performance on
the RIOS. I'm sure it would be an interesting thing to measure.

(*) See A. Chang and M. F. Mergen, "801 Storage: Architecture and
Programming," ACM Transactions on Computer Systems, 6(1), pp. 28-50,
February 1988.

2) Mach VM support.

John Carr's comment that Mach is guilty of "All the world is a VAX"
thinking is just wrong. Neither the RT nor the RIOS implementation of Mach
2.5 uses any data structure resembling a VAX page table. If you want to see
that hack, look at IBM's 4.3 BSD port where they had to emulate VAX page
tables because the "machine independent" code makes so many assumptions
about the format of PTEs.

I maintain that comparing context switch times on the RIOS and an R3000
based box both running Mach 2.5 is a fair test. It is much easier to
quantify the software differences and account for them. That way one can
get a fair picture of how the hardware performs. Unfortunately, I doubt IBM
is interested in doing this. (There is a paper being presented at the
upcoming ASPLOS conference that does such measurements using Mach on a
number of architectures.)

I've also seen a comment or two that one shouldn't use page sharing because
segment sharing is more efficient. Unfortunately, shared memory regions are
used to represent things like object modules and message buffers. Anyone
who believes that 16 of these are enough is "a fascist pig with a read-only
mind" (Bill Gosper in HAKMEM item 154).

3) X Server.

The X Server software was pretty immature when I was using the RIOS. It
improved quite a bit with each release back then. There are hardware problems
too. The low end 8 bit color frame buffer sits out on the MicroChannel.
(Most workstations today place the frame buffer on the processor memory
bus.) The MicroChannel is an I/O bus and has significantly higher latency
than memory. Also, all accesses to I/O space (necessary to get out to the
bus) are uncached. This eliminates certain cache tricks that X servers can
play for even higher performance. The mid-range IBM graphics card (based on
Silicon Graphics technology) doesn't allow the processor to write to the
frame buffer. This causes problems for certain X server operations.
(Though it is a good design for certain other applications.)

Zalman Stern, MIPS Computer Systems, 928 E. Arques 1-03, Sunnyvale, CA 94086
zalman@mips.com OR {ames,decwrl,prls,pyramid}!mips!zalman     (408) 524 8395
       "``Ah, so,'' said Daruma the O-maker" -- Tom Robbins