cy01@gte.com (Che-Liang Yang) (06/12/91)
Do MSD 2.6 NetMsgServers really extend Mach IPC to the network TRANSPARENTLY?
I am not sure. Here are three examples.

Example 1:

Suppose we have two Mach machines A and B.
Task A (in Machine A) creates a port, Port A,
and gives the ownership right to Task B in Machine B.
According to the implementation of the MSD 2.6 NetMsgServer:
in Kernel A, Task A has the receive right (to Port A)
while NetMsgServer A has the ownership right;
in Kernel B, Task B has the ownership right
while NetMsgServer B has the receive right.

Now suppose that Task B exits.
NetMsgServer A must transfer the ownership right back to Task A
while still retaining the send right.
According to the implementation, NetMsgServer A:
1. first does a msg_send to itself of a message carrying the send right
   to Port A,
2. does a port_deallocate on Port A (this transfers the ownership right
   to Task A transparently),
3. does a msg_receive of the message carrying the send right to Port A.

The problem is that the new port name obtained for Port A at step 3
is not the same as the name used in step 1, while the port record
for Port A in NetMsgServer A still uses the old name.
The consequence is that no task on a remote machine can now
send messages to Port A (because NetMsgServer A will
relay the message to an invalid port).
Further, if Task A later exits:
1. the logical name for Port A (checked in by Task A)
   will still hang around in NetMsgServer A;
2. none of the remote tasks with send rights to Port A will get notified.

I solved this problem by having NetMsgServer A
port_rename the new name to the old name.
Although this works for the Mach 2.5 kernel,
it is not a safe solution because it relies on my assumption that
the kernel will not immediately reuse the name assigned to a
port that was just deallocated.

So, my question is:
Under the Mach 2.5 kernel, is there any safe way for a task
to transfer the ownership right to another task
(which already has the receive right) without sending a message,
while still keeping the send right under the same name?
Example 2:

Task A creates a port, Port A,
and sends the receive/ownership right in a message, Msg A,
to Task B in Machine B.
Further, in Msg A, the "msg_type_deallocate" bit is on.

The problem is that when Task B receives Msg A (across the network),
the "msg_type_deallocate" bit has been turned off by NetMsgServer B.
(NetMsgServer B has to turn it off when relaying Msg A
because it wants to retain the send right to Port A.)

Normally, this is not a big deal.
But for applications that use MIG, this causes the
server stub to reject request messages from remote machines.
In fact, an external pager in MSD 2.6 will never accept a
"memory_object_terminate" request from a remote kernel.

Example 3:

Task A creates Port A and Task B creates Port B.
Task A calls port_set_backup with primary = Port A
and backup = Port B.
Now, suppose that Machine A crashes.
I am sure that Task B will not receive the receive right to Port A,
because when port_set_backup was called,
NetMsgServer A (and hence NetMsgServer B) was not informed by Kernel A.

This convinces me that the ownership abstraction is still useful
(for a task to back up a remote task) and should not be deleted.

C-L Yang
dpj@CS.CMU.EDU (Daniel Julin) (06/13/91)
In article <11355@bunny.GTE.COM> cy01@gte.com (Che-Liang Yang) writes:

> Example 1:
>
> Suppose we have two Mach machines A and B.
> Task A (in Machine A) creates a port, PORT A,
> and gives the ownership right to Task B in Machine B.
> According to the implementation of MSD 2.6 NetMsgServer:
> in Kernel A, Task A has receive right (to Port A)
> where NetMsgServer A has ownership right;
> in Kernel B, Task B has ownership right
> where NetMsgServer B has receive right.
>
> Now suppose that Task B exits.
> NetMsgServer A must transfer the ownership right back to Task A
> while still retaining the send right.
> According to the implementation, NetMsgServer A:
> 1. first msg_send itself a message with the send right to Port A,
> 2. port_deallocate Port A, (This will transfer the ownership right
> to Task A transparently.)
> 3. msg_receive the message with send right to Port A.
>
> The problem is that the new port name obtained for Port A at step 3
> is not the same as that in step 1 where the port record
> for Port A in NetMsgServer A still uses the old name.
> The consequence is that now no tasks on remote machines can
> send messages to Port A (because NetMsgServer A will
> relay the message to an invalid port.)
> Further, if later Task A exits:
> 1. the logical name for Port A (checked in by Task A)
> will still hang in NetMsgServer A;
> 2. none of remote tasks with send rights to Port A will get notified.
>
> I solved this problem by having NetMsgServer A
> port_rename the new name to the old name.
> Although this works for Mach 2.5 kernel,
> this is not a safe solution because I assume that
> the kernel will not immediately reuse the name assigned to a
> port just deallocated.
>
> So, my question is:
> Under Mach 2.5 kernel, is there any safe way for a task
> to transfer ownership right to another task
> (which already has receive right) without sending a message
> while still maintaining send right with the same name?

This looks like a plain bug in the netmsgserver implementation, not
any particular design problem. The fact that this bug has never been
reported until now shows how often ownership rights are actually used,
and partly justifies their removal from the IPC model... (see my
additional comments below on this matter)

The correct fix is not to rename the port right, but simply to update
the port record for the port in question to have the new name for the
port. Presumably, this port record is already locked while this rights
transfer operation is in progress, so there should not be any problems
with such an update.

In Mach 3.0, it is possible to transfer specific rights separately, so
that all those msg_send/msg_receive gymnastics would not be necessary
at all. But of course, there are no ownership rights in 3.0, so this
particular issue is moot.

> Example 2:
>
> Task A creates a port, Port A,
> and sends the receive/ownership right in a message, Msg A,
> to Task B in Machine B.
> Further, in Msg A, the "msg_type_deallocate" bit is on.
>
> The problem is that when Task B receives Msg A (across the network),
> the "msg_type_deallocate" bit was turned off by NetMsgServer B.
> (NetMsgServer B has to turn it off when relaying Msg A
> because it wants to retain the send right to Port A.)
>
> Normally, this is not a big deal.
> But, for applications that use MIG, this will cause the
> server stub to reject request messages from remote machines.
> In fact, an external pager in MSD 2.6 will never accept a
> "memory_object_terminate" request from a remote kernel.

This is a basic problem with the old IPC model. It has been solved in
newer kernels by always forcing the "deallocate" bits to zero before
any message is delivered to a receiver. With that change, both the
kernel and the netmsgserver do the same thing. This "deallocate" bit
is really a matter between the sender and the kernel anyway, and
should never have been made visible to receivers.
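
As a rough illustration of why clearing the "deallocate" bit before
delivery removes the local/remote difference, here is a small C sketch.
Everything in it is an assumption made for illustration: type_desc is a
simplified stand-in for a message type descriptor, stub_accepts mimics a
strict field-by-field check of the kind a generated server stub might
perform, and deliver_normalized models the "force the bit to zero"
delivery step. None of these names come from the Mach kernel, MIG, or
the netmsgserver sources.

```c
#include <assert.h>

/* Hypothetical, simplified type descriptor (illustration only). */
typedef struct {
    unsigned int type_name;    /* kind of data, e.g. a port right */
    unsigned int type_size;    /* size of one element, in bits */
    unsigned int type_number;  /* number of elements */
    unsigned int is_inline;    /* 1 if the data is carried in-line */
    unsigned int deallocate;   /* sender-side "deallocate" request */
} type_desc;

/* Strict check: accept only if every field matches what the stub expects. */
int stub_accepts(const type_desc *received, const type_desc *expected)
{
    return received->type_name   == expected->type_name
        && received->type_size   == expected->type_size
        && received->type_number == expected->type_number
        && received->is_inline   == expected->is_inline
        && received->deallocate  == expected->deallocate;
}

/* Delivery step that hides the sender's deallocate request from receivers. */
type_desc deliver_normalized(type_desc sent)
{
    sent.deallocate = 0;
    return sent;
}
```

With the bit always cleared on delivery, a stub that expects zero sees
the same descriptor whether the message arrived locally or was relayed,
so the strict comparison no longer depends on who delivered it.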

In general, there are a number of "rough edges" like this one in the
old IPC model, that the new (3.0) IPC model attempts to eliminate. See
Rich Draves's paper on IPC at the 1990 USENIX Mach workshop for
details.

I don't know exactly how the netmemory server in 2.5 gets around this
problem, but I am told that it does work. My guess is that it is
compiled with a different version of Mig, or with UseStaticMsgType or
TypeCheck turned off.

> Example 3:
>
> Task A creates Port A and Task B creates Port B.
> Task A calls port_set_backup with primary = Port A
> and backup = Port B.
> Now, suppose that Machine A crashes.
> I am sure that Task B will not receive the receive right to Port A
> because when port_set_backup was called,
> NetMsgServer A and then B were not informed by Kernel A.
>
> This convinces me that the ownership abstraction is still useful
> (for a task to back up a remote task) and should not be deleted.

Ownership rights increase the complexity and size of the netmsgserver
as well as the kernel. Even with the existing 2.5 netmsgserver, which
does support ownership rights, there are cases involving host crashes
where messages sent to ports with split ownership and receive rights
may be lost. Handling those cases would introduce even more complexity
and overhead, and turn the netmsgserver into something close to a
full-blown transaction facility.

Conversely, I know of only one application of ownership rights in 2.5:
to allow the service server to keep track of the name server port.

In general, ownership rights are not a very flexible mechanism on
which to base resilient distributed applications, because they are
limited to a single "backup" site for each port and allow very little
variation in how they can be used. Therefore, it was decided that such
replication services could be more usefully implemented in various
application-level facilities built on top of the simple IPC facility,
and not integrated with it.

Ownership rights have been eliminated in the IPC model in 3.0 (and
even in some late versions of the 2.5/2.6 kernel). Backup ports have
been introduced solely for the purpose of replacing the use of
ownership rights in the service server. They are not extended over the
network.

======================================================================
Daniel Julin                                        dpj@cs.cmu.edu
School of Computer Science
Carnegie Mellon University, Pittsburgh, PA 15213
======================================================================
rds+@CS.CMU.EDU (Robert Sansom) (06/13/91)
In article <DPJ.91Jun12134524@NATASHA.MACH.CS.CMU.EDU>, dpj@CS.CMU.EDU (Daniel Julin) writes:
|>
|> In article <11355@bunny.GTE.COM> cy01@gte.com (Che-Liang Yang) writes:
|>
|> > Example 1:
|> >
|> > Suppose we have two Mach machines A and B.
|> > Task A (in Machine A) creates a port, PORT A,
|> > and gives the ownership right to Task B in Machine B.
|> > According to the implementation of MSD 2.6 NetMsgServer:
|> > in Kernel A, Task A has receive right (to Port A)
|> > where NetMsgServer A has ownership right;
|> > in Kernel B, Task B has ownership right
|> > where NetMsgServer B has receive right.
|> >
|> > Now suppose that Task B exits.
|> > NetMsgServer A must transfer the ownership right back to Task A
|> > while still retaining the send right.
|> > According to the implementation, NetMsgServer A:
|> > 1. first msg_send itself a message with the send right to Port A,
|> > 2. port_deallocate Port A, (This will transfer the ownership right
|> > to Task A transparently.)
|> > 3. msg_receive the message with send right to Port A.
|> >
|> > The problem is that the new port name obtained for Port A at step 3
|> > is not the same as that in step 1 where the port record
|> > for Port A in NetMsgServer A still uses the old name.
|> > The consequence is that now no tasks on remote machines can
|> > send messages to Port A (because NetMsgServer A will
|> > relay the message to an invalid port.)
|> > Further, if later Task A exits:
|> > 1. the logical name for Port A (checked in by Task A)
|> > will still hang in NetMsgServer A;
|> > 2. none of remote tasks with send rights to Port A will get notified.
|> >
|> > I solved this problem by having NetMsgServer A
|> > port_rename the new name to the old name.
|> > Although this works for Mach 2.5 kernel,
|> > this is not a safe solution because I assume that
|> > the kernel will not immediately reuse the name assigned to a
|> > port just deallocated.
|> >
|> > So, my question is:
|> > Under Mach 2.5 kernel, is there any safe way for a task
|> > to transfer ownership right to another task
|> > (which already has receive right) without sending a message
|> > while still maintaining send right with the same name?
|>
|> This looks like a plain bug in the netmsgserver implementation, not
|> any particular design problem. The fact that this bug has never been
|> reported until now shows how often ownership rights are actually used,
|> and partly justifies their removal from the IPC model... (see my
|> additional comments below on this matter)
|>
|> The correct fix is not to rename the port right, but simply to update
|> the port record for the port in question to have the new name for the
|> port. Presumably, this port record is already locked while this rights
|> transfer operation is in progress, so there should not be any problems
|> with such an update.
|>

OK, I admit that the original implementation took advantage of the fact that
the port would never be renamed during this "retention of send rights" hack.
I know that this used to work, as I explicitly tested for it as part of my
PhD thesis work. In retrospect, it is obvious that the code should have
checked in case the port did get renamed, and then it should have done as Dan
says and updated the port record (and I guess rehashed it).

--
Robert Sansom, School of Computer Science
Carnegie Mellon University, Pittsburgh, PA 15213
INTERNET: sansom@cs.cmu.edu     CSNET: sansom%cs.cmu.edu@relay.cs.net
BITNET: sansom%cs.cmu.edu@cmuccvma      UUCP: ...!seismo!cs.cmu.edu!sansom
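
The "update the port record and rehash it" approach discussed in this
thread can be sketched in C roughly as follows. This is only a sketch
under stated assumptions: port_record, the fixed-size hash table, and
the functions record_insert, record_lookup, record_unlink and
record_rekey are all hypothetical names invented here, not the
netmsgserver's actual data structures, and the caller is assumed to
already hold whatever lock protects the record during the rights
transfer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the real netmsgserver ones. */
typedef unsigned int port_name_t;

#define TABLE_SIZE 16u
#define NAME_NULL  0u

typedef struct port_record {
    port_name_t local_name;    /* name of the right in this task */
    int network_id;            /* stand-in for the network-wide identity */
    struct port_record *next;  /* hash chain link */
} port_record;

static port_record *table[TABLE_SIZE];

unsigned int bucket(port_name_t name) { return name % TABLE_SIZE; }

/* Link a record into the table under its current local name. */
void record_insert(port_record *rec)
{
    unsigned int b = bucket(rec->local_name);
    rec->next = table[b];
    table[b] = rec;
}

/* Find the record for a local name, or NULL if none is registered. */
port_record *record_lookup(port_name_t name)
{
    port_record *r;
    for (r = table[bucket(name)]; r != NULL; r = r->next)
        if (r->local_name == name)
            return r;
    return NULL;
}

/* Remove a record from the chain for its current local name. */
void record_unlink(port_record *rec)
{
    port_record **pp = &table[bucket(rec->local_name)];
    while (*pp != NULL && *pp != rec)
        pp = &(*pp)->next;
    if (*pp == rec)
        *pp = rec->next;
    rec->next = NULL;
}

/* After the send right is received back, re-key the record if the
   name it came back under differs from the one stored in the record. */
void record_rekey(port_record *rec, port_name_t new_name)
{
    assert(new_name != NAME_NULL);
    if (rec->local_name == new_name)
        return;                 /* name unchanged: nothing to do */
    record_unlink(rec);
    rec->local_name = new_name;
    record_insert(rec);
}
```

Calling record_rekey with the name returned by the final msg_receive
keeps lookups by local name working without relying on the kernel to
hand back the old name, which is the assumption the port_rename
workaround depended on.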
gansevle@cs.utwente.nl (Fred Gansevles) (06/18/91)
In article <1991Jun12.220121.29467@cs.cmu.edu>, rds+@CS.CMU.EDU (Robert Sansom) writes:
|> OK, I admit that the original implementation took advantage of the fact that
|> the port would never be renamed during this "retention of send rights" hack.
|> I know that this used to work as I explicitly tested for it as part of my
|> PhD thesis work. In retrospect, it is obvious that the code should have
|> checked in case the port did get renamed and then it should have done as Dan
|> says and updated the port record (and I guess rehashed it).

--
_________________________________________________________________________
Fred Gansevles      e-mail: gansevle@cs.utwente.nl    Phone: +31 53 89 3744
University Twente   Dept of CS   Box 217   7500 AE Enschede   Netherlands
_________________________________________________________________________