[comp.os.mach] External Pager Questions

francis@sirius.ucs.adelaide.edu.au (Francis Vaughan) (03/17/90)

A few people have recently requested info, or better, examples of external
pagers. I would second that request. I didn't see any replies on the net,
but if there have been some email conversations on the subject I would be
most grateful if I could get a summary.

Next a few questions on some finer points.

The Kernel Interface Manual refers to the external pager as a task. Is
there any inherent reason why it cannot be a thread within the same task as
the client for the externally paged memory object? (Sure, one would need to
be careful not to touch the memory object from within the pager to avoid a
potential infinite recursion.)

The memory_object_lock_request call allows one to lock a section of a
memory object. A few questions about it.

The manual says that the kernel will not page align the offset parameter.
Can this really be true? (or rather, make sense)

A memory_object_lock is applied to the cached memory (ie the physical
memory) and the memory is locked for access from all clients. How does this
relate to the use of vm_protect? Can I lock access to a section of an
externally paged memory object with vm_protect from a particular set of
tasks whilst leaving it at a different level of protection from others? Is
memory_object_lock_request (when used to protect memory) just shorthand for
vm_protect applied to all client tasks?

If I use vm_protect to deny access to part of a memory object from a
particular task, when that task touches the protected area will a
memory_object_data_unlock call be made on the external pager? (I can see
that this could be a problem at present as there is no parameter that
conveys the identity of the faulting task to the external pager.)

-------

And the reasons for my questions?

I need to catch accesses to pages of memory to build a coherent distributed
persistent data space. The catching mechanism (the external pager) needs to
be able to write onto the memory object it is serving, however it needs to
catch (when appropriate) reads and writes to the page. Hence we deny access
to the page and wait for a memory_object_data_unlock call. However it
cannot write to the page (it is still locked) and it cannot unlock the
memory without violating the timing of the coherency logic (it must finish
updating the page before allowing the faulting task access to the page, and
there may be more than one thread running in the faulting task). So far the
external pager interface seems to have almost, but not quite, enough
functionality.

PS. A while ago I posted some questions about the use of the inode pager
and its relative merits. To date I have not seen a word in reply. Surely
someone has got something to say. I'll repost the questions if anyone
wants.



						Regards,
Dept of Computer Science                        Francis Vaughan
University of Adelaide                          francis@cs.ua.oz.au
South Australia

af@spice.cs.cmu.edu (Alessandro Forin) (03/19/90)

In article <821@sirius.ucs.adelaide.edu.au>, francis@sirius.ucs.adelaide.edu.au (Francis Vaughan) writes:
> 
> A few people have recently requested info, or better, examples of external
> pagers. I would second that request.

The Mach Netmemoryserver is included in the 2.5 tape, and it is a 
good, working example of an external memory manager providing 
coherent distributed shared memory objects to its clients.

> Is there any inherent reason why it cannot be a thread within the same 
> task as the client for the externally paged memory object? 

None.  I have used this myself in toy programs and it works fine.
Only caveat: while debugging the program I got stung by inadvertently
looking at that memory while, of course, the program was blocked.  Ouch!

> The memory_object_lock_request call allows one to lock a section of a
> memory object.

Weeelll, lock is used here not in the classical mutual exclusion
sense, but rather in the lock-against-read/write access sense.

> The manual says that the kernel will not page align the offset parameter.
> Can this really be true? (or rather, make sense)

The manual says "This must be page aligned", meaning the kernel will
get upset otherwise and give you a bad reply.

> How does this
> relate to the use of vm_protect? Can I lock access to a section of an
> externally paged memory object with vm_protect from a particular set of
> tasks whilst leaving it at a different level of protection from others? 

Vm_protect applies to an _individual_ task's view (mapping)
of the memory object and has nothing to do with the memory manager
locking policy.  In other words, the vm_protect protection is checked first
on a fault and only if this is ok does the kernel request the page (if 
missing from its cache).

> Is memory_object_lock_request (when used to protect memory) just shorthand 
> for vm_protect applied to all client tasks?

No, there is no trace whatsoever in Mach of VM management for groups of tasks.
And for very good reasons: just think of the embarrassment you'll have 
defining precise semantics in the distributed case.  [Atomicity ? Arumph..]
The external memory management interface is, in a sense, just how memory looks
"from the other side of the world" :-)).  

> If I use vm_protect to deny access to part of a memory object from a
> particular task, when that task touches the protected area will a
> memory_object_data_unlock call be made on the external pager? 

No, it will get an exception message.  External memory managers only 
understand KERNELS and their caches, they have no idea that tasks even exist.
[This is indeed the most common misunderstanding I have observed
 while explaining this subject to people.]
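
The order of checks described in the answers above can be sketched as a
toy model in Python. This is only an illustration under stated
assumptions: the function name, the protection constants, and the result
strings are invented here and are not Mach interfaces. It assumes the
page lock records the kinds of access the memory manager has prohibited.

```python
# Toy model of how a page fault is resolved; this is NOT the Mach API.
# The constants and the function name are hypothetical illustrations.
VM_PROT_NONE, VM_PROT_READ, VM_PROT_WRITE = 0, 1, 2


def resolve_fault(access, map_prot, resident, page_lock):
    """Return what happens when a task attempts `access` on a page.

    access    -- access bits the faulting thread wants
    map_prot  -- protection set with vm_protect on this task's mapping
    resident  -- True if the kernel already caches the page
    page_lock -- access bits the memory manager has prohibited
    """
    # 1. The task's own mapping is checked first; the manager never sees this.
    if access & ~map_prot:
        return "exception message to the task's exception port"
    # 2. Only then does the kernel consult its cache and ask for missing data.
    if not resident:
        return "memory_object_data_request to the memory manager"
    # 3. A cached page whose lock forbids the access needs an unlock.
    if access & page_lock:
        return "memory_object_data_unlock to the memory manager"
    return "access proceeds"


# Write to a read-only mapping, then write to a write-locked cached page.
print(resolve_fault(VM_PROT_WRITE, VM_PROT_READ, True, VM_PROT_NONE))
print(resolve_fault(VM_PROT_WRITE, VM_PROT_READ | VM_PROT_WRITE, True,
                    VM_PROT_WRITE))
```

Note that nothing in the second or third outcome names the faulting task,
which matches the point that memory managers deal with kernels and their
caches only.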

> (I can see
> that this could be a problem at present as there is no parameter that
> conveys the identity of the faulting task to the external pager.)

No problem at all: set the EXCEPTION_PORT of the specific task you want to
control to some port of yours and be prepared to handle all exception
messages for all threads in that task.  You will indeed be able to
exercise the kind of control you envision by proper use of vm_protect.
Note that the exception message will tell you exactly the address the thread 
faulted on, which is much finer-grained information.

> 
> I need to catch accesses to pages of memory to build a coherent distributed
> persistent data space. ....

Once again, I would advise you to look carefully at the Netmemoryserver,
which should provide you with most of the functionality you
need.  Adding persistence will not be difficult, I think.
A good description of the server is in the CMU techrep CMU-CS-88-165;
reading it will certainly help you clarify your ideas and your design.

> PS. A while ago I posted some questions about the use of the inode pager
> and its relative merits. To date I have not seen a word in reply. Surely
> someone has got something to say. I'll repost the questions if anyone
> wants.

That's an internal component of the 2.5 kernels which no one (hopefully)
ever indicated as an exemplary use of the _external_ memory management interface.
As a matter of fact, Richard Draves recently rewrote it from scratch
to turn it into a multi-threaded server [this might not be in the 2.5
tape, I think it was about version X115].
For Mach3.0 we have completely different and crazy plans ;-)

sandro-

francis@chook.ua.oz (Francis Vaughan) (03/20/90)

Michael Young replied to my original posting by email and requested that I
post his reply to comp.os.mach (as he has no posting access). Here it is.

-----------------------------

> A few people have recently requested info, or better, examples of external
> pagers. I would second that request. I didn't see any replies on the net,
> but if there have been some email conversations on the subject I would be
> most grateful if I could get a summary.

I know of two memory managers for which you can get sources:

	Mach NetMemoryServer.  Implements coherent network shared memory.
	       Distributed as part of the Mach release.

	Camelot DiskManager.  Implements recoverable virtual memory.
	       Distributed as part of the Camelot release.  Camelot
	       is a distributed transaction management facility that
	       was developed at CMU.  It makes heavy use of Mach features.

Work at CMU on out-of-kernel operating system (e.g., Unix) environments
makes significant use of the external memory management interface.  I don't
know whether those sources are being distributed yet.

> The Kernel Interface Manual refers to the external pager as a task. Is
> there any inherent reason why it cannot be a thread within the same task as
> the client for the externally paged memory object? (Sure, one would need to
> be careful not to touch the memory object from within the pager to avoid a
> potential infinite recursion.)

There is no restriction on the structure of a user-level memory manager
("external pager"), the entity that implements a memory object.  It will
normally be a separate task, but it might be a thread within the same
task, or it might be several tasks that work together.  The task to
which the documents refer is probably just the one that holds receive rights
to the memory object port.

> The manual says that the kernel will not page align the offset parameter.
> Can this really be true? (or rather, make sense)

Yes, it's both true and sensible.  For example, a file server (a memory
manager whose memory objects represent the contents of a file) might
permit its clients to map portions of a file at unusual offsets.  The
implementation of the Unix "execve" call in the Mach 2.5 system makes
use of this feature -- the text from an "a.out" file (at a Berkeley VM
page offset, meaning 1K, into the file) gets mapped to a page boundary
on the Vax architecture (and perhaps others).

A memory manager that provides one of its memory objects to clients on
more than one node already has to cope with multiple page sizes.
For example, a NetMemoryServer running on host A with page size 4K
might provide service for a mapping on host B with page size 1K.
Requests for pages coming from host B might fall on any 1K boundary,
which may not be a page boundary on host A.  Hosts A and B may even
be of the same architecture.  In the limit (pagesize(host B) -> 1),
this means that it's reasonable for a memory manager to accept *any* offset.

> A memory_object_lock is applied to the cached memory (ie the physical
> memory) and the memory is locked for access from all clients. How does this
> relate to the use of vm_protect? Can I lock access to a section of an
> externally paged memory object with vm_protect from a particular set of
> tasks whilst leaving it at a different level of protection from others? Is
> memory_object_lock_request (when used to protect memory) just shorthand for
> vm_protect applied to all client tasks?

The vm_protect value applies to a virtual address, the memory object
lock applies to a physical address, and their results are *combined*.
They are separate mechanisms -- the memory object lock is not a shorthand
for calling vm_protect.  You may use vm_protect as you suggest to
get differential access, but the memory object lock will not help you.
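
As a rough illustration of how the two values combine, here is a toy
model in Python. It is a sketch only: the bit values and the function
name are assumptions made for the example, and the lock is assumed to
hold the access kinds the memory manager has prohibited.

```python
# Toy model, not Mach code: names and bit values are illustrative assumptions.
READ, WRITE = 1, 2


def effective_access(map_prot, page_lock):
    """Access a task gets without any further kernel/manager interaction.

    map_prot  -- per-task protection (vm_protect): bits that are allowed
    page_lock -- per-page lock from the memory manager: bits prohibited
    """
    return map_prot & ~page_lock


# Two tasks map the same page: one read/write, one read-only (vm_protect).
unlocked_a = effective_access(READ | WRITE, 0)
unlocked_b = effective_access(READ, 0)

# The manager now write-locks the page; the lock hits every mapping alike.
locked_a = effective_access(READ | WRITE, WRITE)
locked_b = effective_access(READ, WRITE)

print(unlocked_a, unlocked_b, locked_a, locked_b)
```

In this model the only per-task input is map_prot, so any difference
between tasks has to come from vm_protect; the page lock can only narrow
access for everyone at once.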

> If I use vm_protect to deny access to part of a memory object from a
> particular task, when that task touches the protected area will a
> memory_object_data_unlock call be made on the external pager? (I can see
> that this could be a problem at present as there is no parameter that
> conveys the identity of the faulting task to the external pager.)

The memory object lock value applies to all tasks, including the task
that receives messages from the memory object port (which as described
above, may not be the task eventually responsible for satisfying the
request).

The memory manager can change the memory by cleaning it.  If your server
doesn't change the values regularly, this may be practical.

> there may be more than one thread running in the faulting task. So far the
> external pager interface seems to have almost, but not quite, enough
> functionality.

The external memory management interface was intended to provide fast
access to the main memory cache.  Identifying clients in the calls
is impractical.  Differential access could be added to the
interface by using additional "related" memory objects.

The interface intentionally avoids mentioning particular clients in the
memory object initialization and request calls.  Having to make a
separate call for each client would be wasteful in the normal case
where fully shared access to the cache is intended.  Identifying the
client would require some naming trickery.  The task/thread port
connotes full rights to abuse that entity.  If that port were used
to identify clients to a memory manager, a client would have to
*completely* trust *every* memory manager with which it does business.

Several people have suggested providing memory objects that are
restricted forms of another memory object.  For example, a file server
may want to provide full access to a memory object (file) to some
clients, but read-only access to others.  Creating a
second memory object that is declared to contain the same data as
the original, but for which all mappings are restricted, would suffice.
In your example, a related memory object for which you could change
the restriction at any time (e.g., change it from full access to
read-only, and then back again) would seem to suffice also.
This would provide 3 orthogonal protection mechanisms (vm_protect,
per-page memory object lock, object-wide memory object lock).
I MUST MAKE IT CLEAR THAT THIS FEATURE IS FICTIONAL -- it is not
provided by Mach 2.5, and probably won't be in Mach 3.0.  At one
point, I strongly opposed such a feature, but I now admit that it
may fulfill a real need.  It would be worth thinking through, but
I'm no longer in a position to do it.

af@spice.cs.cmu.edu (Alessandro Forin) (03/21/90)

I fear that the sum of my reply and Michael's caused some confusion on one
point: the page alignment restrictions.  I'll try to clarify it here.
[All readers interested in the External Memory Interface to the VM system
 should try to get hold of Michael's Ph.D. thesis.  I do not believe
 it is published yet; you'll have to ask Michael personally.  Another
 useful (but rather terse) document is the one that appeared in the
 1987 Symposium on Operating System Principles, which is part of the
 standard Mach doc-pack]

Michael's post provides the proper view from the user-down: a user can map 
any memory object at any offset with the vm_map(2) call.  The kernel will
preserve that offset, and pass it back to the memory manager in its requests.
Michael also provides the rationale for this, and how it can be used in
practice.  This still does not mean that Mach provides arbitrary-size
operations, for instance vm_map() will still map _at least_ a page worth of 
data.
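
The way a preserved, unaligned offset shows up in the kernel's requests
can be sketched with some toy arithmetic in Python. This is an
illustration only: the function names, the addresses, and the 4K page
size are assumptions for the example, not values taken from Mach.

```python
# Toy arithmetic, not Mach code. All names and numbers are illustrative.

def trunc_page(addr, page_size):
    """Round an address down to the start of its page."""
    return addr - (addr % page_size)


def request_offset(fault_addr, map_addr, map_offset, page_size):
    """Object offset the kernel would pass to the memory manager.

    map_addr   -- page-aligned virtual address where the object is mapped
    map_offset -- offset into the memory object given when mapping
                  (need not be page aligned)
    """
    page_start = trunc_page(fault_addr, page_size)
    return map_offset + (page_start - map_addr)


# An object mapped at 0x10000, starting 1K into the object, 4K kernel pages.
off = request_offset(0x11234, 0x10000, 1024, 4096)
print(off, off % 4096)
```

The requested offset keeps the original 1K displacement, so the memory
manager sees offsets that are not multiples of the kernel's page size
even though the kernel still moves data a page at a time.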

I have provided the view from the memory manager's standpoint: kernels
work on a page-size basis, and all requests the memory manager
makes must be aware of this restriction.  In particular, the code for
memory_object_lock_request (vm/memory_object.c) does two things:
1) it applies round_page() to the size argument
2) it does a page_lookup() on the (object,offset) pair
After doing this it applies the operation to all pages in the range.
For instance, if you ask a kernel that has been booted with an 8k pagesize
to lock the range (123,2) of the object you will effectively lock the page(s)
that include that range (8k or even 16k).  Do not forget this restriction,
or you might be surprised.
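
One possible reading of that rounding, as a toy model in Python. This is
not the kernel's code: the helper names mirror the Mach macros, but how
the two steps combine is an assumption made here (size rounded up first,
then every page touched by the resulting range is affected).

```python
# Toy model of the rounding described above; not the kernel's actual code.

def trunc_page(x, page_size):
    """Round down to a page boundary."""
    return x - (x % page_size)


def round_page(x, page_size):
    """Round up to a page boundary."""
    return trunc_page(x + page_size - 1, page_size)


def locked_span(offset, size, page_size):
    """Return (start, end) of the bytes affected by a lock request.

    Assumption: the size is rounded up to whole pages first, and then every
    page touched by [offset, offset + rounded_size) is affected.
    """
    size = round_page(size, page_size)
    start = trunc_page(offset, page_size)
    end = round_page(offset + size, page_size)
    return start, end


start, end = locked_span(123, 2, 8192)
print(start, end, end - start)
```

Under this reading the (123,2) request on an 8k kernel comes out as a
16k span, while the same two bytes requested at offset 0 would affect a
single 8k page.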

I believe the Netmemoryserver is the only example of a program that deals
with the issue of serving kernels with different page sizes.  It does so
by choosing an (arbitrary but sensible) minimum page size internally and
mapping kernel requests into multiples of its internal page size.
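
The general idea can be sketched as follows in Python. This is a toy
model and is not taken from the Netmemoryserver sources: the 512-byte
internal page size, the function name, and the example requests are all
assumptions for illustration.

```python
# Toy model, not taken from the Netmemoryserver sources.
INTERNAL_PAGE = 512  # hypothetical minimum page size chosen by the manager


def split_request(offset, length, unit=INTERNAL_PAGE):
    """Translate a kernel request into the manager's internal page numbers."""
    if offset % unit or length % unit:
        raise ValueError("request is not a multiple of the internal page size")
    first = offset // unit
    return list(range(first, first + length // unit))


# A 4K-page kernel and a 1K-page kernel asking for overlapping data.
from_host_a = split_request(4096, 4096)  # one 4K page
from_host_b = split_request(5120, 1024)  # one 1K page on a 1K boundary
print(from_host_a)
print(from_host_b)
print(set(from_host_b) <= set(from_host_a))
```

Because both requests reduce to the same internal units, the manager can
track state once per internal page regardless of which kernel asked.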

I was definitely mistaken in saying that the offset must be page-aligned;
I was obviously thinking of the size argument and got confused by the 
obsolete copy of the manual that I checked.  The restriction on the
size argument still applies, as noted above.

sandro-