huff@svax.cs.cornell.edu (Richard Huff) (02/26/90)
This is a long posting.  It pertains to the following threads of
discussion: Inverted Page Tables, 64-bit addressing, and read() vs.
mmap().

The following software examples come from HP's MPE XL commercial
operating system, since I have worked in their development lab during
the last 5 summers and know the OS well.  But of course, HP doesn't
even know that I am posting this note, so ... #include <std disclaimer> !
One last thing: this isn't an advertisement for HP-PA; I just don't
know of any other architecture that supports 64-bit virtual addresses
and inverted page tables.

aglew@oberon.csg.uiuc.edu (Andy Glew) writes:
----
Now, let's see.  Physical page addresses map to virtual?  Via a hash
function?  What ever happened to hash chaining? :-)  Especially if the
TLB miss is handled in software (like on MIPS).

I believe that studies have shown that the average number of links
traversed during a TLB fault is between 1 and 2, so hash chaining does
not exact a significant performance penalty.  In fact, inverted page
tables provide an extremely useful property: if a page happens to be
in physical memory, then a TLB fault on that page can be handled
without touching nonresident data.  Contrast this with conventional
multilevel page tables, which may cause a page fault while handling a
user's TLB fault.  Yuck!

johnl@esegue.segue.boston.ma.us (John R. Levine) writes:
----
The ROMP, actually its mmu chip which is named Rosetta, handles this
pretty well despite having a reverse map page table with no provision
for aliasing.  The high four bits of a memory address are looked up in
a tiny fast memory to get a segment number.  These segment numbers are
global to the entire system.  The file mapping we put into AIX maps a
file to a segment, so if you map a file into several processes there
is no aliasing problem; they use the same segment.
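For concreteness, here is a minimal sketch of the hash-chained lookup
that a software TLB-miss handler might perform against an inverted
page table.  All names, sizes, and the hash function are invented for
illustration; this models neither HP-PA nor Rosetta.

```c
#include <stdint.h>

#define NFRAMES  16         /* one table entry per physical page frame */
#define NBUCKETS 16

struct ipt_entry {
    uint64_t vpage;         /* virtual page number (space id + page index) */
    int valid;
    int next;               /* next frame on this hash chain, -1 = end */
};

static struct ipt_entry ipt[NFRAMES];   /* indexed by physical frame number */
static int bucket[NBUCKETS];            /* hash bucket -> first frame on chain */

static unsigned hash_vpage(uint64_t vpage) {
    return (unsigned)((vpage ^ (vpage >> 7)) % NBUCKETS);
}

void ipt_init(void) {
    for (int b = 0; b < NBUCKETS; b++) bucket[b] = -1;
    for (int f = 0; f < NFRAMES; f++) ipt[f].valid = 0;
}

/* Chain a resident page into the table: vpage now lives in frame f. */
void ipt_insert(uint64_t vpage, int f) {
    unsigned b = hash_vpage(vpage);
    ipt[f].vpage = vpage;
    ipt[f].valid = 1;
    ipt[f].next  = bucket[b];
    bucket[b]    = f;
}

/* TLB refill: walk the chain (typically 1-2 links) looking for vpage.
 * Returns the physical frame, or -1 meaning "take a page fault". */
int tlb_refill(uint64_t vpage) {
    for (int f = bucket[hash_vpage(vpage)]; f != -1; f = ipt[f].next)
        if (ipt[f].valid && ipt[f].vpage == vpage)
            return f;
    return -1;
}
```

Since the table has exactly one entry per physical frame, the walk in
tlb_refill() touches only memory-resident data, which is the property
claimed for inverted page tables above.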
The simpler (and cleaner) way to solve this problem is to provide a
64-bit virtual address space for the entire system, rather than a
separate 32-bit address space for each process.  For example, consider
HP-PA.  Each virtual address is composed of a 32-bit space id (sid)
and a 32-bit offset --- i.e., we have hardware-imposed segmented
addressing, but without the aliasing problems of the Intel brand of
segmentation.

At the lowest levels of the MPE XL file system, ALL files are memory
mapped.  Each file is guaranteed to be loaded at offset 0 within some
unspecified sid.  (Only the operating system itself, NL.PUB.SYS, is
guaranteed to reside in a specific sid --- sid $A.)  Since offset 0 is
guaranteed, a file can contain linked data structures via 32-bit
"short pointers" (more on this below).  Currently, I think that only
executable files (programs and shared libraries) take advantage of
this feature --- the linker/loader uses it to maintain various data
structures that support dynamic linking and loading.

A "short pointer" access is when the user does not (directly) specify
a sid; instead, the top 2 bits of the address select one of 4 special
sid registers (sr4 through sr7).  A "long pointer" access is when the
user specifies one of the 4 other space registers (sr0 through sr3)
explicitly.  The user can change sr0 through sr4; changing sr5 through
sr7 is a privileged operation.  MPE XL sets sr4 to point to the
program file's sid, sr5 to point to stack/heap space, and sr6/sr7 to
point to OS sid's.

Since a file must fit within a single space, it's limited to 4GB in
length.  If 2 processes open a file as a memory mapped file, then they
are both given the pointer sid.0.  (This generalizes the segmented
addressing used in the ROMP, as I understand John's description.)
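The short/long pointer distinction can be sketched in a few lines of
C.  The types and function names below are mine, not HP's; only the
layout described above (32-bit sid plus 32-bit offset, with the top
two bits of a short pointer selecting sr4 through sr7) is taken from
the posting.

```c
#include <stdint.h>

/* One 64-bit virtual address: 32-bit space id + 32-bit offset. */
typedef struct { uint32_t sid; uint32_t off; } vaddr;

/* A process's eight space registers (sr0..sr7). */
typedef struct { uint32_t sr[8]; } space_regs;

/* Long pointer: the code names one of sr0..sr3 explicitly. */
vaddr long_ptr(const space_regs *r, int srnum, uint32_t off) {
    vaddr v = { r->sr[srnum], off };
    return v;
}

/* Short pointer: a bare 32-bit address whose top two bits select one
 * of sr4..sr7, so no sid ever appears in the pointer itself.  This is
 * why a file mapped at offset 0 of *some* sid can hold linked 32-bit
 * structures: the sid is supplied by a register at use time. */
vaddr short_ptr(const space_regs *r, uint32_t p) {
    vaddr v = { r->sr[4 + (p >> 30)], p };
    return v;
}
```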
If a process wishes to do read() and write() calls, then the file
system will do the appropriate word-by-word copying from the file's
space into the user's buffer; thus the file system buffers that
conventional Unix uses are replaced by mapping the file into memory
--- hence we don't have a fixed (at boot time) limit on the number of
file buffers.  In addition, pages get swapped out to their associated
files, so we aren't limited by the size of a fixed swap file (as in
Unix); we can use (almost) all of the available disk space for swap
space.

Note that since we have a large address space, we can afford to tell a
user where a file resides, rather than let the user demand that the
file get mapped to some particular location in the user's address
space.  So those who argue that mmap() should map a file to a
user-specified chunk of memory would destroy the elegance, and
practicality, of the memory mapping model, solely because their
favorite CPU does NOT provide a large enough virtual address space.

Note that all forms of logically shared data fit the file sharing
model discussed above.  E.g., if you want to share a chunk of memory
without calling it a file, MPE XL will let you do so; call
create_object() --- the OS will internally manage the memory as an
invisible file.  If people were truly worried about "sharing data
pages" in a system with inverted page tables, then this hardware
architecture would solve all of their problems.

The true issue is supporting Copy-On-Write, which appears to be a Holy
COW :-) among some people; this feature does NOT involve logically
sharing data --- it involves physically sharing data pages to
represent logically distinct data, until someone breaks the illusion
with a write access.  I'd call that a hack!

johnl@esegue.segue.boston.ma.us (John R. Levine) continues:
----
If you map a file copy-on-write, the aliasing problem does exist, and
they handled that at the VRM level via soft page faults.
The same problem occurs after a fork, since the pages of the stack and
data segments are physically but not logically shared.  In that case,
they made them copy-on-touch, evidently because by the time they'd
handled the soft page fault, the extra effort involved in making a
copy of the page wasn't all that great.

The Unix fork() policy of logically copying the stack is brain dead.
If we had a general create_a_process() facility that allowed you to
specify an arbitrary procedure or program, then we would not need to
copy the stack --- nor even the process's global data.  This is the
approach that MPE XL takes.

Besides, if we had a software policy of using ONLY short pointers to
access a process's global data, then we could easily implement such a
copy-on-touch, or even a copy-on-write, policy.  The necessary
protection mechanisms exist in HP-PA, but I will omit that discussion
for now.  The main point is that you could catch the process at either
a first touch or a first write access.  Then you could just change the
appropriate space register to point to the newly allocated space with
a copy of the data.  MPE XL does NOT provide such a facility.

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) responds to johnl:
----
The segment idea could have been built on top of any viable paging
scheme, and doesn't require the inverted scheme that was used.  The
inverted table (+ hash table and chains) mostly gets in the way when
the software wants aliasing.  Since the hardware designers knew that
AIX does aliasing, could you explain why they still went with the
inverted approach?  I don't understand the net benefit, or any
benefit.

I would assert that with the large address space of HP-PA, software
would NOT need address aliasing (aside from the fact that HP-PA
explicitly does not allow 2 distinct 64-bit virtual addresses to point
to the same physical address).
Can anyone name a use for aliasing that wasn't originally designed to
get around the problems and/or limitations of:

  a) small 32-bit virtual address spaces that were logically disjoint
     between processes, or

  b) the TLB/cache flush/reload penalty that (conventional) multilevel
     page tables exact in the presence of rapid context switching
     between zillions of processes

Can anyone name a use for copy-on-write that wasn't designed to fix
one of the above two problems (or to support the brain dead policies
of fork(), as mentioned above)?

You don't see the net benefit of inverted page tables?  How about
solving the 2 problems that I mentioned above?  MPE XL does not flush
the TLB or cache upon context switching; so entering and exiting the
kernel is essentially as cheap as a procedure call.  In addition, by
banning virtual address aliasing, we can support virtually addressed
caches, which can be accessed in parallel with the TLB.

The biggest benefit is being able to efficiently support huge sparse
virtual address spaces.  The original concept of page tables requires
the tables to be linear in the size of the virtual address space.
Then multilevel page tables were proposed as a way of shrinking the
page table overhead to be closer to linear in the size of physical
memory.  Note that I said *closer* --- they don't fully achieve that
goal, but inverted page tables do.

One key feature of inverted page tables is the separation of the
virtual-->physical and virtual-->secondary translations.  The
inverted page tables handle virtual-->physical translations using a
memory resident page directory and associated hash table, each of
which is linear in the size of PHYSICALLY INSTALLED memory.  So these
tables are even independent of the size of the physical address
space, let alone the virtual address space!  The virtual-->secondary
translations are handled by software.  In MPE XL, we use a variation
on B-trees to map a virtual address to its object descriptor and
extent map.
(An "extent" is a maximally contiguous chunk of disk space that has
been allocated to a single object.)  So these B-trees are linear in
the number of extents, which again is far less than the virtual
address space.  In addition, the B-trees don't need to be memory
resident.

Can anyone name another virtual memory scheme that can support a
large sparse address space as efficiently as the inverted page table
approach that I have outlined above?  (Where "large address space"
means at least 48 bits, and "efficiently" refers to the size of the
data structures, especially with regard to memory resident data
structures.)

Finally, to learn more about HP-PA, get the architecture manual:

    HP 3000/930 and HP 9000/840 Computers
    Precision Architecture and Instruction Reference Manual
    manual part number 09740-90014

(All HP 3000 series 9xx and HP 9000 series 8xx computers use HP-PA.
The 2 models mentioned above were simply the first models for sale.)
-------
huff@svax.cs.cornell.edu (Richard Huff)
bvs@light.uucp (Bakul Shah) (02/27/90)
In article <37774@cornell.UUCP> huff@cs.cornell.edu (Richard Huff) writes:
> ...
>I would assert that with the large address space of HP-PA, software
>would NOT need address aliasing (aside from the fact that HP-PA
>explicitly does not allow 2 distinct 64 bit virtual addresses to point
>to the same physical address). Can anyone name a use for aliasing that
>wasn't originally designed to get around the problems and/or limitations
>of:
>
> a) small 32-bit virtual address spaces that were logically disjoint
>    between processes, or
>
> b) the TLB/cache flush/reload penalty that (conventional) multilevel
>    page tables exact in the presence of rapid context switching
>    between zillions of processes
>
>Can anyone name a use for copy-on-write that wasn't designed to fix one
>of the above two problems (or to support the brain dead policies of
>fork(), as mentioned above) ?

Aliasing is needed if you want to map a segment at more than one
virtual address (either in the same space or different ones).  A small
32-bit virtual address space is *not* the reason for doing so.  For
instance, I may want to access the same segment as readonly as well as
read/write from the same address space.  Not allowing multiple virtual
addresses to point to the same physical page also disallows
copy-on-write.

Copy-on-write is not used to solve a) or b) above.  It is merely an
optimization, albeit a very useful one.  It has two benefits in cases
where at least page-size data gets copied but the majority of such
data is never touched again:

 a) time efficiency: the overhead of implementing COW is a lot less
    than that of copying everything.  Only subsequently changed pages
    are copied.

 b) space efficiency: the overhead of maintaining COW data structures
    is a lot less than that of allocating pages for copied data.  Only
    subsequently changed pages are allocated real pages.

Fork() happens to fall in this category but it isn't the only one.
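The two efficiency claims reduce to a little refcount bookkeeping.
Here is a toy user-space sketch; the names are invented and nothing
below is a real kernel interface, a "page" is just a refcounted byte
array.

```c
#include <stdlib.h>
#include <string.h>

#define PAGE 64

struct page { int refs; unsigned char data[PAGE]; };

struct page *page_new(void) {
    struct page *p = calloc(1, sizeof *p);
    p->refs = 1;
    return p;
}

/* Sharing (e.g. at fork time) is O(1): bump a count, copy nothing. */
struct page *page_share(struct page *p) {
    p->refs++;
    return p;
}

/* A write access pays for a copy only if the page is still shared.
 * Returns the writer's (possibly new, now private) page. */
struct page *page_write(struct page *p, size_t off, unsigned char byte) {
    if (p->refs > 1) {                      /* break the sharing: copy */
        struct page *q = page_new();
        memcpy(q->data, p->data, PAGE);
        p->refs--;
        p = q;
    }
    p->data[off] = byte;                    /* private: write in place */
    return p;
}
```

Pages that are never written again never get copied and never consume
a second frame, which is exactly the time and space argument above.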
Other examples are read(), shared libraries, debugging a shared-text
program, IPC where you pass lotsa data, at least one way of doing
nested transactions, etc.  It is a generic lazy evaluation technique,
not just limited to virtual memory accesses (or if you prefer,
excesses :-).

COW is what you need for building a (slowly changing)
filesystem/database on WORM devices.  The editor I use does a COW of
an entire file if I change it (breaks its link).  This allows me to
maintain multiple source trees without maintaining a zillion copies of
identical files.  The paradigm here is to share everything but make a
private copy if you change something.

>Can anyone name another virtual memory scheme that can support a large
>sparse address space as efficiently as the inverted page table approach
>that I have outlined above? (Where "large address space" means at least
>48 bits, and "efficiently" refers to the size of the data structures,
>especially with regard to memory resident data structures.)

You may want to read up about Mach.  It seems to handle large sparse
address spaces just fine for a large variety of hardware page mapping
structures.
--
Bakul Shah
..!{ames,sun,ucbvax,uunet}!amdcad!light!bvs