francis@sirius.ucs.adelaide.edu.au (Francis Vaughan) (03/17/90)
A few people have recently requested info, or better, examples of external pagers. I would second that request. I didn't see any replies on the net, but if there have been some email conversations on the subject I would be most grateful if I could get a summary.

Next, a few questions on some finer points.

The Kernel Interface Manual refers to the external pager as a task. Is there any inherent reason why it cannot be a thread within the same task as the client for the externally paged memory object? (Sure, one would need to be careful not to touch the memory object from within the pager to avoid a potential infinite recursion.)

The memory_object_lock_request call allows one to lock a section of a memory object. A few questions about it.

The manual says that the kernel will not page align the offset parameter. Can this really be true? (or rather, make sense)

A memory_object_lock is applied to the cached memory (i.e. the physical memory) and the memory is locked for access from all clients. How does this relate to the use of vm_protect? Can I lock access to a section of an externally paged memory object with vm_protect from a particular set of tasks whilst leaving it at a different level of protection from others? Is memory_object_lock_request (when used to protect memory) just shorthand for vm_protect applied to all client tasks?

If I use vm_protect to deny access to part of a memory object from a particular task, when that task touches the protected area will a memory_object_data_unlock call be made on the external pager? (I can see that this could be a problem at present as there is no parameter that conveys the identity of the faulting task to the external pager.)

-------

And the reasons for my questions?

I need to catch accesses to pages of memory to build a coherent distributed persistent data space.
The catching mechanism (the external pager) needs to be able to write onto the memory object it is serving; however, it also needs to catch (when appropriate) reads and writes to the page. Hence we deny access to the page and wait for a memory_object_data_unlock call. However, the pager cannot write to the page (it is still locked), and it cannot unlock the memory without violating the timing of the coherency logic (it must finish updating the page before allowing the faulting task access to the page, and there may be more than one thread running in the faulting task). So far the external pager interface seems to just, but not quite, have enough functionality.

PS. A while ago I posted some questions about the use of the inode pager and its relative merits. To date I have not seen a word in reply. Surely someone has got something to say. I'll repost the questions if anyone wants.

Regards,
Francis Vaughan
francis@cs.ua.oz.au
Dept of Computer Science, University of Adelaide, South Australia
af@spice.cs.cmu.edu (Alessandro Forin) (03/19/90)
In article <821@sirius.ucs.adelaide.edu.au>, francis@sirius.ucs.adelaide.edu.au (Francis Vaughan) writes:
> A few people have recently requested info,or better, examples of external
> pagers. I would second that request.

The Mach Netmemoryserver is included in the 2.5 tape, and it is a good, working example of an external memory manager providing coherent distributed shared memory objects to its clients.

> Is there any inherent reason why it cannot be a thread within the same
> task as the client for the externally paged memory object?

None. I have used this myself in toy programs and it works fine. Only caveat: while debugging the program I got stung by inadvertently looking at that memory while, of course, the program was blocked. Ouch!

> The memory_object_lock_request call allows one to lock a section of a
> memory object.

Weeelll, lock is used here not in the classical mutual exclusion sense, but rather in the lock-against-read/write access sense.

> The manual says that the kernel will not page align the offset parameter.
> Can this really be true? (or rather, make sense)

The manual says "This must be page aligned", meaning the kernel will get upset otherwise and give you a bad reply.

> How does this
> relate to the use of vm_protect? Can I lock access to a section of an
> externally paged memory object with vm_protect from a particular set of
> tasks whilst leaving it at a different level of protection from others?

Vm_protect applies to an _individual_ task's view (mapping) of the memory object and has nothing to do with the memory manager locking policy. In other words, the vm_protect protection is checked first on a fault, and only if this is ok does the kernel request the page (if missing from its cache).

> Is memory_object_lock_request (when used to protect memory) just shorthand
> for vm_protect applied to all client tasks?

No, there is no trace whatsoever in Mach of VM management for groups of tasks.
And for very good reasons: just think of the embarrassment you'll have defining a precise semantics in the distributed case. [Atomicity ? Arumph..] The external memory management interface is, in a sense, just how memory looks "from the other side of the world" :-)).

> If I use vm_protect to deny access to part of a memory object from a
> particular task, when that task touched the protected area will a
> memory_object_data_unlock call be made on the external pager?

No, it will get an exception message. External memory managers only understand KERNELS and their caches; they have no idea that tasks even exist. [This is indeed the most common misunderstanding I have observed while explaining this subject to people.]

> (I can see
> that this could be a problem at present as there is no parameter that
> conveys the identity of the faulting task to the external pager.)

No problem at all: set the EXCEPTION_PORT of the specific task you want to control to some port of yours and be prepared to handle all exception messages for all threads in that task. You will indeed be able to exercise the kind of control you envision by proper use of vm_protect. Note that the exception message will tell you exactly the address the thread faulted on, which is much finer-grained information.

> I need to catch accesses to pages of memory to build a coherent distributed
> persistent data space. ....

Once again, I would advise you to look carefully at the Netmemoryserver, which should provide you with most of the functionality you need. Adding persistency will not be difficult, I think. A good description of the server is in the CMU techrep CMU-CS-88-165; reading it will certainly help you clarify your ideas and your design.

> PS. A while ago I posted some questions about the use of the inode pager
> and its relative merits. To date I have not seen a word in reply. Surely
> someone has got something to say, I'll repost the questions if anyone
> wants.
That's an internal component of the 2.5 kernels which no one (hopefully) ever indicated as an exemplary use of the _external_ memory management interface. As a matter of fact, Richard Draves recently rewrote it from scratch to turn it into a multi-threaded server [this might not be in the 2.5 tape, I think it was about version X115]. For Mach 3.0 we have completely different and crazy plans ;-)

sandro-
francis@chook.ua.oz (Francis Vaughan) (03/20/90)
Michael Young replied to my original posting by email and requested that I post his reply to comp.os.mach (as he has no posting access). Here it is.

-----------------------------

> A few people have recently requested info,or better, examples of external
> pagers. I would second that request. I didn't see any replys on the net,
> but if there has been some email conversations on the subject I would be
> most grateful if I could get a summary.

I know of two memory managers for which you can get sources:

    Mach NetMemoryServer. Implements coherent network shared memory. Distributed as part of the Mach release.

    Camelot DiskManager. Implements recoverable virtual memory. Distributed as part of the Camelot release. Camelot is a distributed transaction management facility that was developed at CMU. It makes heavy use of Mach features.

Work at CMU on out-of-kernel operating system (e.g., Unix) environments makes significant use of the external memory management interface. I don't know whether those sources are being distributed yet.

> The Kernel Interface Manual refers to the external pager as a task. Is
> there any inherent reason why it cannot be a thread within the same task as
> the client for the externally paged memory object? (Sure, one would need to
> be careful not to touch the memory object from within the pager to avoid a
> potential infinite recursion.)

There is no restriction on the structure of a user-level memory manager ("external pager"), the entity that implements a memory object. It will normally be a separate task, but it might be a thread within the same task, or it might be several tasks that work together. The task to which the documents refer is probably just the one that holds receive rights to the memory object port.

> The manual says that the kernel will not page align the offset parameter.
> Can this really be true? (or rather, make sense)

Yes, it's both true and sensible.
For example, a file server (a memory manager whose memory objects represent the contents of a file) might permit its clients to map portions of a file at unusual offsets. The implementation of the Unix "execve" call in the Mach 2.5 system makes use of this feature -- the text from an "a.out" file (at a Berkeley VM page offset, meaning 1K, into the file) gets mapped to a page boundary on the Vax architecture (and perhaps others).

A memory manager that provides one of its memory objects to clients on more than one node already has to cope with multiple page sizes. For example, a NetMemoryServer running on host A with page size 4K might provide service for a mapping on host B with page size 1K. Requests for pages coming from host B might fall on any 1K boundary, which may not be a page boundary on host A. Hosts A and B may even be of the same architecture. In the limit (pagesize(host B) => 1), this means that it's reasonable for a memory manager to accept *any* offset.

> A memory_object_lock is applied to the cached memory (ie the physical
> memory) and the memory is locked for access from all clients. How does this
> relate to the use of vm_protect? Can I lock access to a section of an
> externally paged memory object with vm_protect from a particular set of
> tasks whilst leaving it at a different level of protection from others? Is
> memory_object_lock_request (when used to protect memory) just shorthand for
> vm_protect applied to all client tasks?

The vm_protect value applies to a virtual address, the memory object lock applies to a physical address, and their results are *combined*. They are separate mechanisms -- the memory object lock is not a shorthand for calling vm_protect. You may use vm_protect as you suggest to get differential access, but the memory object lock will not help you.
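[Editorial illustration] As a rough model of how the two mechanisms combine, the C sketch below follows the ordering described earlier in the thread: the per-task vm_protect setting is checked first, and only then does the manager's lock on the cached page matter. The names ACC_READ, ACC_WRITE, fault_outcome and resolve_access are invented for this illustration and are not Mach interfaces. The sketch also assumes the page is already in the kernel's cache, so the case where the kernel must first request the data is left out.

```c
#include <assert.h>

/* Illustrative access bits; local to this sketch, not taken from Mach headers. */
enum { ACC_READ = 1, ACC_WRITE = 2 };

/* Possible outcomes of a memory access in this simplified model. */
typedef enum {
    FAULT_EXCEPTION,   /* the task's own vm_protect setting forbids the access */
    FAULT_ASK_MANAGER, /* page is cached but locked: kernel asks the manager to unlock */
    FAULT_PROCEED      /* both checks pass and the access goes ahead */
} fault_outcome;

/*
 * task_prot: access allowed by the faulting task's mapping (per task).
 * page_lock: access the memory manager has locked out on the cached page
 *            (per kernel, so shared by every task on that host).
 * wanted:    access the thread attempted.
 */
fault_outcome resolve_access(unsigned task_prot, unsigned page_lock,
                             unsigned wanted)
{
    assert(wanted != 0);

    /* Step 1: the per-task check. The memory manager never hears about this. */
    if ((task_prot & wanted) != wanted)
        return FAULT_EXCEPTION;

    /* Step 2: the per-page lock, which applies to all tasks alike. */
    if ((page_lock & wanted) != 0)
        return FAULT_ASK_MANAGER;

    return FAULT_PROCEED;
}
```

In this model, a write by a task whose mapping is read-only yields FAULT_EXCEPTION whatever the page lock says, which is why the memory manager cannot use the lock alone to tell clients apart.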
> If I use vm_protect to deny access to part of a memory object from a
> particular task, when that task touched the protected area will a
> memory_object_data_unlock call be made on the external pager? (I can see
> that this could be a problem at present as there is no parameter that
> conveys the identity of the faulting task to the external pager.)

The memory object lock value applies to all tasks, including the task that receives messages from the memory object port (which as described above, may not be the task eventually responsible for satisfying the request). The memory manager can change the memory by cleaning it. If your server doesn't change the values regularly, this may be practical.

> there may be more than one thread running in the faulting task. So far the
> external pager interface seems to just, but not quite have enough
> functionality.

The external memory management interface was intended to provide fast access to the main memory cache. Identifying clients in the calls is impractical. Providing differential access could be added to the interface by using additional "related" memory objects.

The interface intentionally avoids mentioning particular clients in the memory object initialization and request calls. Having to make a separate call for each client would be wasteful in the normal case where fully shared access to the cache is intended.

Identifying the client would require some naming trickery. The task/thread port connotes full rights to abuse that entity. If that port were used to identify clients to a memory manager, a client would have to *completely* trust *every* memory manager with which it does business.

Several people have suggested providing memory objects that are restricted forms of another memory object. For example, a file server may want to provide full access to a memory object (file) to some clients, but read-only access to others.
Creating a second memory object that is declared to contain the same data as the original, but for which all mappings are restricted, would suffice. In your example, a related memory object for which you could change the restriction at any time (e.g., change it from full access to read-only, and then back again) would seem to suffice also. This would provide 3 orthogonal protection mechanisms (vm_protect, per-page memory object lock, object-wide memory object lock).

I MUST MAKE IT CLEAR THAT THIS FEATURE IS FICTIONAL -- it is not provided by Mach 2.5, and probably won't be in Mach 3.0. At one point, I strongly opposed such a feature, but I now admit that it may fulfill a real need. It would be worth thinking through, but I'm no longer in a position to do it.
af@spice.cs.cmu.edu (Alessandro Forin) (03/21/90)
I fear that the sum of my reply and Michael's caused some confusion on one point: the page alignment restrictions. I'll try to clarify it here.

[All readers interested in the External Memory Interface to the VM system should try to get hold of Michael's Ph.D. thesis. I do not believe it is published yet, so you'll have to ask Michael personally. Another useful (but rather terse) document is the one that appeared in the 1987 Symposium on Operating System Principles, which is part of the standard Mach doc-pack.]

Michael's post provides the proper view from the user down: a user can map any memory object at any offset with the vm_map(2) call. The kernel will preserve that offset, and pass it back to the memory manager in its requests. Michael also provides the rationale for this, and how it can be used in practice. This still does not mean that Mach provides arbitrary-size operations; for instance, vm_map() will still map _at least_ a page worth of data.

I have provided the view from the memory manager's standpoint: kernels work on a page-size basis, and all requests the memory manager makes must be aware of this restriction. In particular, the code for memory_object_lock_request (vm/memory_object.c) does two things:

    1) it round_page() the size argument
    2) it page_lookup() the (object,offset)

After doing this it applies the operation to all pages in the range. For instance, if you ask a kernel that has been booted with an 8k pagesize to lock the range (123,2) of the object, you will effectively lock the page(s) that include that range (8k or even 16k). Do not forget this restriction, or you might be surprised.

I believe the Netmemoryserver is the only example of a program that deals with the issue of serving kernels with different page sizes. It does so by choosing an (arbitrary but sensible) minimum page size internally and mapping kernel requests into multiples of its internal page size.
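[Editorial illustration] The arithmetic behind the "8k or even 16k" remark can be sketched in C as follows. This is not the kernel's code: it only computes how many bytes of whole pages overlap a requested byte range, assuming a power-of-two page size. The names vm_off, page_floor, page_ceil and locked_span are invented for this sketch.

```c
#include <assert.h>

typedef unsigned long vm_off; /* local stand-in type for this sketch */

/* Round x down to a multiple of page_size (a power of two). */
static vm_off page_floor(vm_off x, vm_off page_size)
{
    return x & ~(page_size - 1);
}

/* Round x up to a multiple of page_size (a power of two). */
static vm_off page_ceil(vm_off x, vm_off page_size)
{
    return (x + page_size - 1) & ~(page_size - 1);
}

/*
 * Bytes actually covered when an operation on [offset, offset + size)
 * is applied to every whole page that the range touches.
 */
vm_off locked_span(vm_off offset, vm_off size, vm_off page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(size != 0);

    return page_ceil(offset + size, page_size) - page_floor(offset, page_size);
}
```

With an 8192-byte page, locked_span(123, 2, 8192) covers one page, while the same two bytes starting at offset 8191 straddle a page boundary and cover two.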
I was definitely mistaken in saying that the offset must be page-aligned; I was obviously thinking of the size argument and got confused by the obsolete copy of the manual that I checked. The restriction on the size argument still applies, as noted above.

sandro-