huff@svax.cs.cornell.edu (Richard Huff) (02/26/90)
This is a long posting.  It pertains to the following threads of
discussion: Inverted Page Tables, 64-bit addressing, and read() vs.
mmap().

The following software examples come from HP's MPE XL commercial
operating system, since I have worked in their development lab during
the last 5 summers and know the OS well.  But of course, HP doesn't
even know that I am posting this note, so ... #include <std disclaimer> !
One last thing: this isn't an advertisement for HP-PA; I just don't
know of any other architecture that supports 64-bit virtual addresses
and inverted page tables.

aglew@oberon.csg.uiuc.edu (Andy Glew) writes:
----
Now, let's see.  Physical page addresses map to virtual?  Via a hash
function?  What ever happened to hash chaining? :-)  Especially if the
TLB miss is handled in software (like on MIPS).

I believe that studies have shown that the average number of links
traversed during a TLB fault is between 1 and 2, so hash chaining does
not exact a significant performance penalty.  In fact, inverted page
tables provide an extremely useful property: if a page happens to be
in physical memory, then a TLB fault on that page can be handled
without touching nonresident data.  Contrast this with conventional
multilevel page tables, which may cause a page fault while handling a
user's TLB fault.  Yuck!

johnl@esegue.segue.boston.ma.us (John R. Levine) writes:
----
The ROMP, actually its mmu chip which is named Rosetta, handles this
pretty well despite having a reverse map page table with no provision
for aliasing.  The high four bits of a memory address are looked up in
a tiny fast memory to get a segment number.  These segment numbers are
global to the entire system.  The file mapping we put into AIX maps a
file to a segment, so if you map a file into several processes there
is no aliasing problem; they use the same segment.
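For concreteness, here is a minimal sketch of the hash-chained lookup
that a software TLB-miss handler might perform against an inverted
page table.  All names, sizes, and the hash function are invented for
illustration; this models neither HP-PA nor Rosetta.

```c
#include <stdint.h>

#define NFRAMES  16         /* one table entry per physical page frame */
#define NBUCKETS 16

struct ipt_entry {
    uint64_t vpage;         /* virtual page number (space id + page index) */
    int valid;
    int next;               /* next frame on this hash chain, -1 = end */
};

static struct ipt_entry ipt[NFRAMES];   /* indexed by physical frame number */
static int bucket[NBUCKETS];            /* hash bucket -> first frame on chain */

static unsigned hash_vpage(uint64_t vpage) {
    return (unsigned)((vpage ^ (vpage >> 7)) % NBUCKETS);
}

void ipt_init(void) {
    for (int b = 0; b < NBUCKETS; b++) bucket[b] = -1;
    for (int f = 0; f < NFRAMES; f++) ipt[f].valid = 0;
}

/* Chain a resident page into the table: vpage now lives in frame f. */
void ipt_insert(uint64_t vpage, int f) {
    unsigned b = hash_vpage(vpage);
    ipt[f].vpage = vpage;
    ipt[f].valid = 1;
    ipt[f].next  = bucket[b];
    bucket[b]    = f;
}

/* TLB refill: walk the chain (typically 1-2 links) looking for vpage.
 * Returns the physical frame, or -1 meaning "take a page fault". */
int tlb_refill(uint64_t vpage) {
    for (int f = bucket[hash_vpage(vpage)]; f != -1; f = ipt[f].next)
        if (ipt[f].valid && ipt[f].vpage == vpage)
            return f;
    return -1;
}
```

Since the table has exactly one entry per physical frame, the walk in
tlb_refill() touches only memory-resident data, which is the property
claimed for inverted page tables above.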
The simpler (and cleaner) way to solve this problem is to provide a
64-bit virtual address space for the entire system, rather than a
separate 32-bit address space for each process.  For example, consider
HP-PA.  Each virtual address is composed of a 32-bit space id (sid)
and a 32-bit offset --- i.e., we have hardware-imposed segmented
addressing, but without the aliasing problems of the Intel brand of
segmentation.

At the lowest levels of the MPE XL file system, ALL files are memory
mapped.  Each file is guaranteed to be loaded at offset 0 within some
unspecified sid.  (Only the operating system itself, NL.PUB.SYS, is
guaranteed to reside in a specific sid --- sid $A.)  Since offset 0 is
guaranteed, a file can contain linked data structures via 32-bit
"short pointers" (more on this below).  Currently, I think that only
executable files (programs and shared libraries) take advantage of
this feature --- the linker/loader uses it to maintain various data
structures that support dynamic linking and loading.

A "short pointer" access is when the user does not (directly) specify
a sid; instead, the top 2 bits of the address select one of 4 special
sid registers (sr4 through sr7).  A "long pointer" access is when the
user specifies one of the 4 other space registers (sr0 through sr3)
explicitly.  The user can change sr0 through sr4; changing sr5 through
sr7 is a privileged operation.  MPE XL sets sr4 to point to the
program file's sid, sr5 to point to stack/heap space, and sr6/sr7 to
point to OS sid's.

Since a file must fit within a single space, it's limited to 4GB in
length.  If 2 processes open a file as a memory mapped file, then they
are both given the pointer sid.0.  (This generalizes the segmented
addressing used in the ROMP, as I understand John's description.)
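The short/long pointer distinction can be sketched in a few lines of
C.  The types and function names below are mine, not HP's; only the
layout described above (32-bit sid plus 32-bit offset, with the top
two bits of a short pointer selecting sr4 through sr7) is taken from
the posting.

```c
#include <stdint.h>

/* One 64-bit virtual address: 32-bit space id + 32-bit offset. */
typedef struct { uint32_t sid; uint32_t off; } vaddr;

/* A process's eight space registers (sr0..sr7). */
typedef struct { uint32_t sr[8]; } space_regs;

/* Long pointer: the code names one of sr0..sr3 explicitly. */
vaddr long_ptr(const space_regs *r, int srnum, uint32_t off) {
    vaddr v = { r->sr[srnum], off };
    return v;
}

/* Short pointer: a bare 32-bit address whose top two bits select one
 * of sr4..sr7, so no sid ever appears in the pointer itself.  This is
 * why a file mapped at offset 0 of *some* sid can hold linked 32-bit
 * structures: the sid is supplied by a register at use time. */
vaddr short_ptr(const space_regs *r, uint32_t p) {
    vaddr v = { r->sr[4 + (p >> 30)], p };
    return v;
}
```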
If a process wishes to do read() and write() calls, then the file
system will do the appropriate word-by-word copying from the file's
space into the user's buffer; thus the file system buffers that
conventional Unix uses are replaced by mapping the file into memory
--- hence we don't have a fixed (at boot time) limit on the number of
file buffers.  In addition, pages get swapped out to their associated
files, so we aren't limited by the size of a fixed swap file (as in
Unix); we can use (almost) all of the available disk space for swap
space.

Note that since we have a large address space, we can afford to tell a
user where a file resides, rather than let the user demand that the
file get mapped to some particular location in the user's address
space.  So those who argue that mmap() should map a file to a
user-specified chunk of memory would destroy the elegance, and
practicality, of the memory mapping model, solely because their
favorite CPU does NOT provide a large enough virtual address space.

Note that all forms of logically shared data fit the file sharing
model discussed above.  E.g., if you want to share a chunk of memory
without calling it a file, MPE XL will let you do so; call
create_object() --- the OS will internally manage the memory as an
invisible file.  If people were truly worried about "sharing data
pages" in a system with inverted page tables, then this hardware
architecture would solve all of their problems.

The true issue is supporting Copy-On-Write, which appears to be a Holy
COW :-) among some people; this feature does NOT involve logically
sharing data --- it involves physically sharing data pages to
represent logically distinct data, until someone breaks the illusion
with a write access.  I'd call that a hack!

johnl@esegue.segue.boston.ma.us (John R. Levine) continues:
----
If you map a file copy-on-write, the aliasing problem does exist, and
they handled that at the VRM level via soft page faults.
The same problem occurs after a fork, since the pages of the stack and
data segments are physically but not logically shared.  In that case,
they made them copy-on-touch, evidently because by the time they'd
handled the soft page fault, the extra effort involved in making a
copy of the page wasn't all that great.

The Unix fork() policy of logically copying the stack is brain dead.
If we had a general create_a_process() facility that allowed you to
specify an arbitrary procedure or program, then we would not need to
copy the stack --- nor even the process's global data.  This is the
approach that MPE XL takes.

Besides, if we had a software policy of using ONLY short pointers to
access a process's global data, then we could easily implement such a
copy-on-touch, or even a copy-on-write, policy.  The necessary
protection mechanisms exist in HP-PA, but I will omit that discussion
for now.  The main point is that you could catch the process at either
a first touch or a first write access.  Then you could just change the
appropriate space register to point to the newly allocated space with
a copy of the data.  MPE XL does NOT provide such a facility.

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) responds to johnl:
----
The segment idea could have been built on top of any viable paging
scheme, and doesn't require the inverted scheme that was used.  The
inverted table (+ hash table and chains) mostly gets in the way when
the software wants aliasing.  Since the hardware designers knew that
AIX does aliasing, could you explain why they still went with the
inverted approach?  I don't understand the net benefit, or any
benefit.

I would assert that with the large address space of HP-PA, software
would NOT need address aliasing (aside from the fact that HP-PA
explicitly does not allow 2 distinct 64-bit virtual addresses to point
to the same physical address).
Can anyone name a use for aliasing that wasn't originally designed to
get around the problems and/or limitations of:

  a) small 32-bit virtual address spaces that were logically disjoint
     between processes, or

  b) the TLB/cache flush/reload penalty that (conventional) multilevel
     page tables exact in the presence of rapid context switching
     between zillions of processes

Can anyone name a use for copy-on-write that wasn't designed to fix
one of the above two problems (or to support the brain dead policies
of fork(), as mentioned above)?

You don't see the net benefit of inverted page tables?  How about
solving the 2 problems that I mentioned above?  MPE XL does not flush
the TLB or cache upon context switching; so entering and exiting the
kernel is essentially as cheap as a procedure call.  In addition, by
banning virtual address aliasing, we can support virtually addressed
caches, which can be accessed in parallel with the TLB.

The biggest benefit is being able to efficiently support huge sparse
virtual address spaces.  The original concept of page tables requires
the tables to be linear in the size of the virtual address space.
Then multilevel page tables were proposed as a way of shrinking the
page table overhead to be closer to linear in the size of physical
memory.  Note that I said *closer* --- they don't fully achieve that
goal, but inverted page tables do.

One key feature of inverted page tables is the separation of the
virtual-->physical and virtual-->secondary translations.  The
inverted page tables handle virtual-->physical translations using a
memory resident page directory and associated hash table, each of
which is linear in the size of PHYSICALLY INSTALLED memory.  So these
tables are even independent of the size of the physical address
space, let alone the virtual address space!  The virtual-->secondary
translations are handled by software.  In MPE XL, we use a variation
on B-trees to map a virtual address to its object descriptor and
extent map.
(An "extent" is a maximally contiguous chunk of disk space that has
been allocated to a single object.)  So these B-trees are linear in
the number of extents, which again is far less than the virtual
address space.  In addition, the B-trees don't need to be memory
resident.

Can anyone name another virtual memory scheme that can support a
large sparse address space as efficiently as the inverted page table
approach that I have outlined above?  (Where "large address space"
means at least 48 bits, and "efficiently" refers to the size of the
data structures, especially with regard to memory resident data
structures.)

Finally, to learn more about HP-PA, get the architecture manual:

    HP 3000/930 and HP 9000/840 Computers
    Precision Architecture and Instruction Reference Manual
    manual part number 09740-90014

(All HP 3000 series 9xx and HP 9000 series 8xx computers use HP-PA.
The 2 models mentioned above were simply the first models for sale.)
-------
huff@svax.cs.cornell.edu (Richard Huff)
bvs@light.uucp (Bakul Shah) (02/27/90)
In article <37774@cornell.UUCP> huff@cs.cornell.edu (Richard Huff) writes:
> ...
>I would assert that with the large address space of HP-PA, software
>would NOT need address aliasing (aside from the fact that HP-PA
>explicitly does not allow 2 distinct 64 bit virtual addresses to point
>to the same physical address). Can anyone name a use for aliasing that
>wasn't originally designed to get around the problems and/or limitations
>of:
>
> a) small 32-bit virtual address spaces that were logically disjoint
>    between processes, or
>
> b) the TLB/cache flush/reload penalty that (conventional) multilevel
>    page tables exact in the presence of rapid context switching
>    between zillions of processes
>
>Can anyone name a use for copy-on-write that wasn't designed to fix one
>of the above two problems (or to support the brain dead policies of
>fork(), as mentioned above) ?

Aliasing is needed if you want to map a segment at more than one
virtual address (either in the same space or different ones).  A small
32-bit virtual address space is *not* the reason for doing so.  For
instance, I may want to access the same segment as readonly as well as
read/write from the same address space.  Not allowing multiple virtual
addresses to point to the same physical page also disallows
copy-on-write.

Copy-on-write is not used to solve a) or b) above.  It is merely an
optimization, albeit a very useful one.  It has two benefits in cases
where at least page-size data gets copied but the majority of such
data is never touched again:

 a) time efficiency: the overhead of implementing COW is a lot less
    than that of copying everything.  Only subsequently changed pages
    are copied.

 b) space efficiency: the overhead of maintaining COW data structures
    is a lot less than that of allocating pages for copied data.  Only
    subsequently changed pages are allocated real pages.

Fork() happens to fall in this category but it isn't the only one.
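The two efficiency claims reduce to a little refcount bookkeeping.
Here is a toy user-space sketch; the names are invented and nothing
below is a real kernel interface, a "page" is just a refcounted byte
array.

```c
#include <stdlib.h>
#include <string.h>

#define PAGE 64

struct page { int refs; unsigned char data[PAGE]; };

struct page *page_new(void) {
    struct page *p = calloc(1, sizeof *p);
    p->refs = 1;
    return p;
}

/* Sharing (e.g. at fork time) is O(1): bump a count, copy nothing. */
struct page *page_share(struct page *p) {
    p->refs++;
    return p;
}

/* A write access pays for a copy only if the page is still shared.
 * Returns the writer's (possibly new, now private) page. */
struct page *page_write(struct page *p, size_t off, unsigned char byte) {
    if (p->refs > 1) {                      /* break the sharing: copy */
        struct page *q = page_new();
        memcpy(q->data, p->data, PAGE);
        p->refs--;
        p = q;
    }
    p->data[off] = byte;                    /* private: write in place */
    return p;
}
```

Pages that are never written again never get copied and never consume
a second frame, which is exactly the time and space argument above.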
Other examples are read(), shared libraries, debugging a shared-text
program, IPC where you pass lotsa data, at least one way of doing
nested transactions, etc.  It is a generic lazy evaluation technique,
not just limited to virtual memory accesses (or if you prefer,
excesses :-).

COW is what you need for building a (slowly changing)
filesystem/database on WORM devices.  The editor I use does a COW of
an entire file if I change it (breaks its link).  This allows me to
maintain multiple source trees without maintaining a zillion copies of
identical files.  The paradigm here is to share everything but make a
private copy if you change something.

>Can anyone name another virtual memory scheme that can support a large
>sparse address space as efficiently as the inverted page table approach
>that I have outlined above? (Where "large address space" means at least
>48 bits, and "efficiently" refers to the size of the data structures,
>especially with regard to memory resident data structures.)

You may want to read up about Mach.  It seems to handle large sparse
address spaces just fine for a large variety of hardware page mapping
structures.
--
Bakul Shah
..!{ames,sun,ucbvax,uunet}!amdcad!light!bvs