[comp.os.mach] Do NetMsgServers really extend Mach IPC to network TRANSPARENTLY?

cy01@gte.com (Che-Liang Yang) (06/12/91)

Do MSD 2.6 NetMsgServers really extend Mach IPC to network TRANSPARENTLY?
I am not sure.  Here are three examples.


Example 1:

Suppose we have two Mach machines A and B.
Task A (in Machine A) creates a port, Port A,
and gives the ownership right to Task B in Machine B.
According to the MSD 2.6 NetMsgServer implementation:
in Kernel A, Task A holds the receive right (to Port A)
while NetMsgServer A holds the ownership right;
in Kernel B, Task B holds the ownership right
while NetMsgServer B holds the receive right.

Now suppose that Task B exits.
NetMsgServer A must transfer the ownership right back to Task A
while still retaining the send right.
According to the implementation, NetMsgServer A:
1. first msg_send itself a message with the send right to Port A,
2. port_deallocate Port A, (This will transfer the ownership right
   to Task A transparently.)
3. msg_receive the message with send right to Port A.

The problem is that the new port name obtained for Port A at step 3
is not the same as the name used in step 1, while the port record
for Port A in NetMsgServer A still uses the old name.
The consequence is that now no tasks on remote machines can
send messages to Port A (because NetMsgServer A will
relay the message to an invalid port).
Further, if Task A later exits:
1. the logical name for Port A (checked in by Task A)
   will remain registered in NetMsgServer A;
2. none of the remote tasks with send rights to Port A will be notified.
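A toy simulation may make the failure easier to see. The sketch below is not the Mach 2.5 interface: it models a task's port name space as a small table, retain_send_right is a hypothetical stand-in for steps 1-3 above, and the rotate flag is an assumed allocation policy (the post does not say exactly how the MSD 2.6 kernel picks names). With a rotating allocator the send right comes back under a new name and the stored port record goes stale; with a lowest-free allocator the old name happens to be reused.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only, not the Mach 2.5 interface: a task's port name space
 * maps local names 1..MAX_NAMES-1 to global port ids (0 = name unused). */
#define MAX_NAMES 16

typedef struct {
    int port_of[MAX_NAMES];  /* port_of[name] == 0 means the name is free */
    int next_hint;           /* where the rotating allocator starts searching */
    bool rotate;             /* assumed policy: true avoids reusing just-freed names */
} name_space;

typedef struct {
    int local_name;          /* name the NetMsgServer recorded for its send right */
    int port;                /* global identity of the port */
} port_record;

void ns_init(name_space *ns, bool rotate) {
    for (int i = 0; i < MAX_NAMES; i++)
        ns->port_of[i] = 0;
    ns->next_hint = 1;
    ns->rotate = rotate;
}

/* Give the task a right to `port`; returns the local name chosen, or -1. */
int ns_insert(name_space *ns, int port) {
    int start = ns->rotate ? ns->next_hint : 1;
    for (int k = 0; k < MAX_NAMES - 1; k++) {
        int name = 1 + (start - 1 + k) % (MAX_NAMES - 1);
        if (ns->port_of[name] == 0) {
            ns->port_of[name] = port;
            ns->next_hint = (name == MAX_NAMES - 1) ? 1 : name + 1;
            return name;
        }
    }
    return -1;
}

/* Steps 1-3 from the post; returns the name the send right ends up with. */
int retain_send_right(name_space *ns, const port_record *rec) {
    int in_transit = ns->port_of[rec->local_name]; /* 1: msg_send a send right to ourselves */
    ns->port_of[rec->local_name] = 0;              /* 2: port_deallocate drops the old name */
    return ns_insert(ns, in_transit);              /* 3: msg_receive names the right again */
}

/* Does the record's stored name still denote the port it describes? */
bool record_valid(const name_space *ns, const port_record *rec) {
    return ns->port_of[rec->local_name] == rec->port;
}
```

For example, with rotate set, a record created for port 42 under name 1 goes stale because the right returns as name 2.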

I solved this problem by having NetMsgServer A
port_rename the new name back to the old name.
Although this works with the Mach 2.5 kernel,
it is not a safe solution because it assumes that
the kernel will not immediately reuse the name of a
port that has just been deallocated.
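The weakness of the rename workaround can be shown with another toy name table (again not the real kernel interface; rename_name is a hypothetical analogue of port_rename, and the lowest-free allocation policy is an assumption chosen to make name reuse visible). If any other right is inserted between steps 2 and 3, it can take the freed old name, and the rename back to that name then fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy name space (not the Mach API): port_of[name] == 0 means free. */
#define MAX_NAMES 8

typedef struct { int port_of[MAX_NAMES]; } names;

/* Lowest-free allocation, so a just-freed name is the first to be reused. */
int insert_lowest(names *t, int port) {
    for (int n = 1; n < MAX_NAMES; n++)
        if (t->port_of[n] == 0) {
            t->port_of[n] = port;
            return n;
        }
    return -1;
}

/* Hypothetical analogue of port_rename: fails if `to` is already in use. */
bool rename_name(names *t, int from, int to) {
    if (t->port_of[to] != 0)
        return false;
    t->port_of[to] = t->port_of[from];
    t->port_of[from] = 0;
    return true;
}

/* Steps 2-3 plus the rename workaround; `interloper` inserts an unrelated
 * right between steps 2 and 3. Returns true if the old name is recovered. */
bool workaround_succeeds(bool interloper) {
    names t = {{0}};
    int old_name = insert_lowest(&t, 42);   /* send right to Port A */
    t.port_of[old_name] = 0;                /* step 2: old name freed */
    if (interloper)
        insert_lowest(&t, 77);              /* another right grabs the freed name */
    int new_name = insert_lowest(&t, 42);   /* step 3: send right comes back */
    return new_name == old_name || rename_name(&t, new_name, old_name);
}
```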

So, my question is:
Under Mach 2.5 kernel, is there any safe way for a task
to transfer ownership right to another task
(which already has receive right) without sending a message
while still maintaining send right with the same name?


Example 2:

Task A creates a port, Port A,
and sends the receive/ownership right in a message, Msg A,
to Task B in Machine B.
Further, in Msg A, the "msg_type_deallocate" bit is on.

The problem is that when Task B receives Msg A (across the network),
the "msg_type_deallocate" bit was turned off by NetMsgServer B.
(NetMsgServer B has to turn it off when relaying Msg A
because it wants to retain the send right to Port A.)

Normally, this is not a big deal.
But, for applications that use MIG, this will cause the
server stub to reject request messages from remote machines.
In fact, an external pager in MSD 2.6 will never accept a
"memory_object_terminate" request from a remote kernel.
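Assuming the MIG stub compares the entire type descriptor against the one it was generated to expect, the rejection reduces to a one-bit mismatch. In the toy below the descriptor is a plain unsigned word and TOY_DEALLOCATE / TOY_PORT_RIGHT are hypothetical bits; the real msg_type_t layout is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits in a toy type-descriptor word. */
#define TOY_DEALLOCATE 0x1u
#define TOY_PORT_RIGHT 0x100u

/* Old kernels hand the sender's descriptor to a local receiver unchanged. */
unsigned old_kernel_deliver(unsigned type_word) { return type_word; }

/* NetMsgServer B clears the bit so that it keeps its own send right. */
unsigned netmsgserver_relay(unsigned type_word) { return type_word & ~TOY_DEALLOCATE; }

/* A stub with static type checking compares the whole descriptor. */
bool stub_type_check(unsigned received, unsigned expected) { return received == expected; }
```

A local request passes the check, while the same request relayed across the network fails it.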


Example 3:

Task A creates Port A and Task B creates Port B.
Task A calls port_set_backup with primary = Port A
and backup = Port B.
Now, suppose that Machine A crashes.
I am sure that Task B will not receive the receive right to Port A,
because Kernel A did not inform NetMsgServer A (and hence
NetMsgServer B) when port_set_backup was called.
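The argument rests on where the backup registration lives. As a toy illustration (the tables and function names are hypothetical, with port id 1 standing for Port A and 2 for Port B), port_set_backup records the pair only in Kernel A; NetMsgServer B, the surviving party that would have to hand the receive right to Task B, has no entry, and Kernel A's entry is lost with the crash.

```c
#include <assert.h>

/* Toy registry of (primary, backup) pairs; not the Mach 2.5 data structure. */
#define MAX_BACKUPS 4

typedef struct {
    int primary[MAX_BACKUPS];
    int backup[MAX_BACKUPS];
    int n;
} backup_table;

/* Record a backup; in this toy only the table passed in learns of it. */
void set_backup(backup_table *t, int primary, int backup) {
    if (t->n < MAX_BACKUPS) {
        t->primary[t->n] = primary;
        t->backup[t->n] = backup;
        t->n++;
    }
}

/* Returns the backup port for `primary`, or -1 if this table knows none. */
int find_backup(const backup_table *t, int primary) {
    for (int i = 0; i < t->n; i++)
        if (t->primary[i] == primary)
            return t->backup[i];
    return -1;
}

/* A host crash loses everything held only in that host's memory. */
void crash(backup_table *t) { t->n = 0; }
```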

This convinces me that the ownership abstraction is still useful
(for a task to back up a remote task) and should not be deleted.


C-L Yang

dpj@CS.CMU.EDU (Daniel Julin) (06/13/91)

In article <11355@bunny.GTE.COM> cy01@gte.com (Che-Liang Yang) writes:

> Example 1:
>
> Suppose we have two Mach machines A and B.
> Task A (in Machine A) creates a port, Port A,
> and gives the ownership right to Task B in Machine B.
> According to the MSD 2.6 NetMsgServer implementation:
> in Kernel A, Task A holds the receive right (to Port A)
> while NetMsgServer A holds the ownership right;
> in Kernel B, Task B holds the ownership right
> while NetMsgServer B holds the receive right.
>
> Now suppose that Task B exits.
> NetMsgServer A must transfer the ownership right back to Task A
> while still retaining the send right.
> According to the implementation, NetMsgServer A:
> 1. first msg_send itself a message with the send right to Port A,
> 2. port_deallocate Port A, (This will transfer the ownership right
>    to Task A transparently.)
> 3. msg_receive the message with send right to Port A.
>
> The problem is that the new port name obtained for Port A at step 3
> is not the same as the name used in step 1, while the port record
> for Port A in NetMsgServer A still uses the old name.
> The consequence is that now no tasks on remote machines can
> send messages to Port A (because NetMsgServer A will
> relay the message to an invalid port).
> Further, if Task A later exits:
> 1. the logical name for Port A (checked in by Task A)
>    will remain registered in NetMsgServer A;
> 2. none of the remote tasks with send rights to Port A will be notified.
>
> I solved this problem by having NetMsgServer A
> port_rename the new name back to the old name.
> Although this works with the Mach 2.5 kernel,
> it is not a safe solution because it assumes that
> the kernel will not immediately reuse the name of a
> port that has just been deallocated.
>
> So, my question is:
> Under Mach 2.5 kernel, is there any safe way for a task
> to transfer ownership right to another task
> (which already has receive right) without sending a message
> while still maintaining send right with the same name?

This looks like a plain bug in the netmsgserver implementation, not
any particular design problem. The fact that this bug has never been
reported until now shows how rarely ownership rights are actually used,
and partly justifies their removal from the IPC model... (see my
additional comments below on this matter)

The correct fix is not to rename the port right, but simply to update
the port record for the port in question to have the new name for the
port. Presumably, this port record is already locked while this rights
transfer operation is in progress, so there should not be any problems
with such an update.
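A minimal sketch of that fix, assuming a port record that stores the local name of the NetMsgServer's send right (the struct, the retain_fn callback, and toy_retain_shift are all hypothetical). Locking is left to the caller, on Dan's presumption that the record is already locked for the duration of the transfer.

```c
#include <assert.h>

/* Hypothetical NetMsgServer port record. */
typedef struct {
    int local_name;   /* name of our send right in the NetMsgServer's space */
    int port;         /* global identity of the port */
} port_record;

/* Stand-in for the msg_send / port_deallocate / msg_receive sequence: given
 * the old name, returns the name the retained send right ends up with. */
typedef int (*retain_fn)(int old_name, void *ctx);

/* Caller is assumed to hold the record's lock for the whole transfer. */
void return_ownership(port_record *rec, retain_fn retain, void *ctx) {
    int new_name = retain(rec->local_name, ctx);
    /* Adopt whatever name the kernel chose instead of port_rename'ing back. */
    rec->local_name = new_name;
}

/* Toy retain function for testing: the "kernel" picks old_name + *shift. */
int toy_retain_shift(int old_name, void *ctx) {
    return old_name + *(int *)ctx;
}
```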

In Mach 3.0, it is possible to transfer specific rights separately, so
that all those msg_send/msg_receive gymnastics would not be necessary
at all.  But of course, there are no ownership rights in 3.0, so this
particular issue is moot.
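The 3.0 point can be pictured with a toy name space in which each name denotes a set of rights (a bitmask here; the real 3.0 call would be mach_msg with a move-receive disposition, which this sketch does not use). Moving only the receive right leaves the send right, and therefore the name, in place, so no record ever needs updating.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy rights bits; an illustration, not the Mach 3.0 headers. */
enum { RIGHT_SEND = 1u << 0, RIGHT_RECEIVE = 1u << 1 };

#define MAX_NAMES 8
typedef struct { unsigned rights[MAX_NAMES]; } space;  /* 0 = name unused */

/* Move only the receive right out of `name`; returns false if absent. */
bool move_receive(space *s, int name) {
    if (!(s->rights[name] & RIGHT_RECEIVE))
        return false;
    s->rights[name] &= ~RIGHT_RECEIVE;  /* name survives while other rights remain */
    return true;
}

bool name_valid(const space *s, int name) { return s->rights[name] != 0; }
```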


> Example 2:
>
> Task A creates a port, Port A,
> and sends the receive/ownership right in a message, Msg A,
> to Task B in Machine B.
> Further, in Msg A, the "msg_type_deallocate" bit is on.
>
> The problem is that when Task B receives Msg A (across the network),
> the "msg_type_deallocate" bit was turned off by NetMsgServer B.
> (NetMsgServer B has to turn it off when relaying Msg A
> because it wants to retain the send right to Port A.)
>
> Normally, this is not a big deal.
> But, for applications that use MIG, this will cause the
> server stub to reject request messages from remote machines.
> In fact, an external pager in MSD 2.6 will never accept a
> "memory_object_terminate" request from a remote kernel.

This is a basic problem with the old IPC model. It has been solved in
newer kernels by always forcing the "deallocate" bits to zero before
any message is delivered to a receiver. With that change, both the
kernel and the netmsgserver do the same thing. This "deallocate" bit
is really a matter between the sender and the kernel anyway, and
should never have been made visible to receivers.
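Continuing the toy descriptor word used informally above (TOY_DEALLOCATE and TOY_PORT_RIGHT are hypothetical bits, not the real msg_type_t), the newer rule amounts to clearing the bit on every delivery. Local and relayed messages then look identical to the receiver, and a stub built to expect the bit clear accepts both.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_DEALLOCATE 0x1u
#define TOY_PORT_RIGHT 0x100u

/* Newer-kernel rule: receivers never see the deallocate bit. */
unsigned deliver(unsigned type_word) { return type_word & ~TOY_DEALLOCATE; }

/* What the NetMsgServer already had to do when relaying. */
unsigned relay_via_netmsgserver(unsigned type_word) { return type_word & ~TOY_DEALLOCATE; }

/* A stub generated to expect the bit clear. */
bool stub_accepts(unsigned received) { return received == TOY_PORT_RIGHT; }
```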

In general, there are a number of "rough edges" like this one in the
old IPC model, that the new (3.0) IPC model attempts to eliminate. See
Rich Draves's paper on IPC at the 1990 USENIX Mach workshop for
details.

I don't know exactly how the netmemory server in 2.5 gets around this
problem, but I am told that it does work. My guess is that it is
compiled with a different version of Mig, or with UseStaticMsgType or
TypeCheck turned off.


> Example 3:
>
> Task A creates Port A and Task B creates Port B.
> Task A calls port_set_backup with primary = Port A
> and backup = Port B.
> Now, suppose that Machine A crashes.
> I am sure that Task B will not receive the receive right to Port A,
> because Kernel A did not inform NetMsgServer A (and hence
> NetMsgServer B) when port_set_backup was called.
>
> This convinces me that the ownership abstraction is still useful
> (for a task to back up a remote task) and should not be deleted.

Ownership rights increase the complexity and size of the netmsgserver
as well as the kernel.  Even with the existing 2.5 netmsgserver, which
does support ownership rights, there are cases involving host crashes
where messages sent to ports with split ownership and receive rights
may be lost. Handling those cases would introduce even more complexity
and overhead, and turn the netmsgserver into something close to a
full-blown transaction facility.

Conversely, I know of only one application of ownership rights in 2.5:
to allow the service server to keep track of the name server port. In
general, ownership rights are not a very flexible mechanism on which
to base resilient distributed applications, because they are limited
to a single "backup" site for each port and allow very little
variation in how they can be used.

Therefore, it was decided that such replication services could be more
usefully implemented in various application-level facilities built on
top of the simple IPC facility, and not integrated with it.  Ownership
rights have been eliminated from the IPC model in 3.0 (and even in some
late versions of the 2.5/2.6 kernel). Backup ports have been
introduced solely for the purpose of replacing the use of ownership
rights in the service server. They are not extended over the network.



======================================================================
Daniel Julin                                            dpj@cs.cmu.edu
School of Computer Science
Carnegie Mellon University, Pittsburgh, PA 15213
======================================================================

rds+@CS.CMU.EDU (Robert Sansom) (06/13/91)

In article <DPJ.91Jun12134524@NATASHA.MACH.CS.CMU.EDU>, dpj@CS.CMU.EDU (Daniel Julin) writes:
|> 
|> In article <11355@bunny.GTE.COM> cy01@gte.com (Che-Liang Yang) writes:
|> 
|> > Example 1:
|> >
|> > Suppose we have two Mach machines A and B.
|> > Task A (in Machine A) creates a port, Port A,
|> > and gives the ownership right to Task B in Machine B.
|> > According to the MSD 2.6 NetMsgServer implementation:
|> > in Kernel A, Task A holds the receive right (to Port A)
|> > while NetMsgServer A holds the ownership right;
|> > in Kernel B, Task B holds the ownership right
|> > while NetMsgServer B holds the receive right.
|> >
|> > Now suppose that Task B exits.
|> > NetMsgServer A must transfer the ownership right back to Task A
|> > while still retaining the send right.
|> > According to the implementation, NetMsgServer A:
|> > 1. first msg_send itself a message with the send right to Port A,
|> > 2. port_deallocate Port A, (This will transfer the ownership right
|> >    to Task A transparently.)
|> > 3. msg_receive the message with send right to Port A.
|> >
|> > The problem is that the new port name obtained for Port A at step 3
|> > is not the same as the name used in step 1, while the port record
|> > for Port A in NetMsgServer A still uses the old name.
|> > The consequence is that now no tasks on remote machines can
|> > send messages to Port A (because NetMsgServer A will
|> > relay the message to an invalid port).
|> > Further, if Task A later exits:
|> > 1. the logical name for Port A (checked in by Task A)
|> >    will remain registered in NetMsgServer A;
|> > 2. none of the remote tasks with send rights to Port A will be notified.
|> >
|> > I solved this problem by having NetMsgServer A
|> > port_rename the new name back to the old name.
|> > Although this works with the Mach 2.5 kernel,
|> > it is not a safe solution because it assumes that
|> > the kernel will not immediately reuse the name of a
|> > port that has just been deallocated.
|> >
|> > So, my question is:
|> > Under Mach 2.5 kernel, is there any safe way for a task
|> > to transfer ownership right to another task
|> > (which already has receive right) without sending a message
|> > while still maintaining send right with the same name?
|> 
|> This looks like a plain bug in the netmsgserver implementation, not
|> any particular design problem. The fact that this bug has never been
|> reported until now shows how rarely ownership rights are actually used,
|> and partly justifies their removal from the IPC model... (see my
|> additional comments below on this matter)
|> 
|> The correct fix is not to rename the port right, but simply to update
|> the port record for the port in question to have the new name for the
|> port. Presumably, this port record is already locked while this rights
|> transfer operation is in progress, so there should not be any problems
|> with such an update.
|> 

OK, I admit that the original implementation took advantage of the fact that
the port would never be renamed during this "retention of send rights" hack.
I know that this used to work as I explicitly tested for it as part of my
PhD thesis work.  In retrospect, it is obvious that the code should have
checked in case the port did get renamed and then it should have done as Dan
says and updated the port record (and I guess rehashed it).
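The "rehashed it" part matters if port records are found by hashing the local port name. Assuming a chained hash table keyed on that name (our assumption; the thread does not describe the real NetMsgServer data structure, and all names below are hypothetical), the check-and-rehash could look like this:

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 16

typedef struct port_record {
    int local_name;            /* hash key: name of our send right */
    int port;                  /* global identity of the port */
    struct port_record *next;  /* bucket chain */
} port_record;

typedef struct { port_record *bucket[NBUCKETS]; } record_table;

static unsigned hash_name(int name) { return (unsigned)name % NBUCKETS; }

void table_insert(record_table *t, port_record *r) {
    unsigned h = hash_name(r->local_name);
    r->next = t->bucket[h];
    t->bucket[h] = r;
}

port_record *table_lookup(record_table *t, int name) {
    for (port_record *r = t->bucket[hash_name(name)]; r != NULL; r = r->next)
        if (r->local_name == name)
            return r;
    return NULL;
}

/* Remove `r` from its bucket; `r` must currently be in the table. */
static void table_unlink(record_table *t, port_record *r) {
    port_record **pp = &t->bucket[hash_name(r->local_name)];
    while (*pp != r)
        pp = &(*pp)->next;
    *pp = r->next;
}

/* After the retention hack: if the name changed, re-key the record. */
void record_check_rename(record_table *t, port_record *r, int name_after) {
    if (r->local_name == name_after)
        return;                 /* the case the original code relied on */
    table_unlink(t, r);
    r->local_name = name_after;
    table_insert(t, r);
}
```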

-- 
Robert Sansom, School of Computer Science
Carnegie Mellon University, Pittsburgh, PA 15213
INTERNET: sansom@cs.cmu.edu         CSNET: sansom%cs.cmu.edu@relay.cs.net
BITNET: sansom%cs.cmu.edu@cmuccvma  UUCP: ...!seismo!cs.cmu.edu!sansom

gansevle@cs.utwente.nl (Fred Gansevles) (06/18/91)

In article <1991Jun12.220121.29467@cs.cmu.edu>, rds+@CS.CMU.EDU (Robert Sansom) writes:
-- 
_________________________________________________________________________
Fred Gansevles              e-mail: gansevle@cs.utwente.nl        
                            Phone:  +31 53 89 3744
University Twente
Dept of CS
Box 217
7500 AE Enschede
Netherlands
_________________________________________________________________________