ken@gvax.cs.cornell.edu (Ken Birman) (08/07/89)
Based on feedback to my prior posting, I am thinking of changing ISIS as follows (for V2.0):

1) pg_leave().

pg_leave() currently takes a while to run, and during the period between when you call it and when it finishes, a process can still get group-related requests. I propose to change the semantics of pg_leave() to include a "filter" that will discard messages sent to a process during this period. Note that other group members will assume the process received these discarded messages, but then will see the pg_leave occur and will presumably reassign any work that the "failed" process was doing and resume execution without it. I will also send a nullreply() on behalf of any discarded message (since the process is leaving intentionally and not due to a failure, a client that sent to it might otherwise hang waiting for a reply).

If a message is sent to a process for "several reasons", I am hesitating between several alternatives:

  a) Deliver to all listed entry points, if any.

  b) Don't deliver to any listed entry points, if one of them is due to address expansion for the groups that I am leaving.

  c) Deliver to entry points for groups to which I still belong or that mention this process explicitly, and selectively inhibit delivery for groups that I am leaving. In this case one should presumably send a nullreply() after all deliveries finish -- just in case the "dropped" entry point would normally have sent a reply.

I see option (c) as undesirable because it violates normal assumptions of addressing atomicity. Of the remaining two, I am currently leaning towards option (b), since (a) might violate potential causality by discarding one message but delivering a subsequent one.

Do people see problems with this proposal? I recognize that there is a hard problem here -- in my view, we had come down on the side of one style of solution, and people were finding it inconvenient. There seems to be no ideal solution, but perhaps this proposal will be less inconvenient to program with.

2) spooler.
I am adding a "checkpoint pointer" as well; this will point to the last checkpoint in the spool, and the idea is that a replay will replay messages between the checkpoint pointer and the replay pointer subject to the replay pattern being matched, then replay any subsequent messages if in "play through" mode. Messages with sequence numbers smaller than the checkpoint pointer would be kept until explicitly discarded. Comments would be appreciated.

By the way, we have a more solid version of the spooler for those who would like to use it (the V1.2 version was an alpha test, as I think I mentioned -- call the newer one a beta test). We also have a beta-test version of isis_connect().

Ken Birman
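
As a rough illustration of how alternatives (a), (b) and (c) in item 1 differ, the following C sketch models a message as a list of destination entries, each present either because the process was addressed explicitly or because a group address was expanded. Nothing here is ISIS code: the types, the names (dest, policy, filter_message) and the integer group ids are hypothetical. It also assumes that a null reply is sent only under (b) and (c), and only when at least one entry was actually suppressed by the leave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Why this process appears in a message's destination list (hypothetical model). */
typedef enum { DEST_EXPLICIT, DEST_GROUP } dest_kind;

typedef struct {
    dest_kind kind;  /* explicit address, or result of expanding a group address */
    int group;       /* group id; meaningful only when kind == DEST_GROUP */
    int entry;       /* entry point that would handle the message */
} dest;

typedef enum { POLICY_A, POLICY_B, POLICY_C } policy;

typedef struct {
    size_t delivered;  /* number of entry points that receive the message */
    bool nullreply;    /* whether a null reply is sent on the receiver's behalf */
} outcome;

/* True if this entry exists only because of a group the process is leaving. */
static bool hit_by_leave(const dest *d, const int *leaving, size_t n_leaving)
{
    if (d->kind != DEST_GROUP)
        return false;
    for (size_t i = 0; i < n_leaving; i++)
        if (leaving[i] == d->group)
            return true;
    return false;
}

/* Fills deliver[i] for each destination entry and reports the overall outcome. */
outcome filter_message(policy p, const dest *dests, size_t n_dests,
                       const int *leaving, size_t n_leaving, bool *deliver)
{
    assert(n_dests == 0 || (dests != NULL && deliver != NULL));

    /* Count the entries that refer to a group being left. */
    size_t hits = 0;
    for (size_t i = 0; i < n_dests; i++)
        if (hit_by_leave(&dests[i], leaving, n_leaving))
            hits++;

    outcome out = { 0, false };
    for (size_t i = 0; i < n_dests; i++) {
        bool d;
        switch (p) {
        case POLICY_A:  /* (a) deliver to every listed entry point */
            d = true;
            break;
        case POLICY_B:  /* (b) all or nothing: one hit suppresses the whole message */
            d = (hits == 0);
            break;
        default:        /* (c) suppress only the entries for groups being left */
            d = !hit_by_leave(&dests[i], leaving, n_leaving);
            break;
        }
        deliver[i] = d;
        if (d)
            out.delivered++;
    }

    /* Assumption: a null reply is needed only when something was suppressed. */
    out.nullreply = (p != POLICY_A) && hits > 0;
    return out;
}
```

For a message addressed explicitly to the process and also to two groups, one of which is being left, this model delivers to three entry points under (a), to none under (b), and to two under (c); under (b) and (c) it also marks a null reply as due.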
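
The replay rule described in item 2 can also be sketched in C. This is only a model under stated assumptions, not the spooler's actual interface: spool_msg and select_replay are hypothetical names, the "replay pattern" is reduced to matching a single integer tag, both pointers are treated as sequence numbers, and the range between them is taken as half-open (checkpoint inclusive, replay pointer exclusive), which the post does not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical spool entry: a sequence number plus a tag used for matching. */
typedef struct {
    long seq;
    int tag;
} spool_msg;

/*
 * Writes the sequence numbers that a replay would deliver into out_seq
 * (which must have room for n entries) and returns how many were selected.
 *
 *   seq <  ckpt               : kept in the spool, never replayed here
 *   ckpt <= seq < replay_ptr  : replayed only if the tag matches want_tag
 *   seq >= replay_ptr         : replayed only in "play through" mode
 */
size_t select_replay(const spool_msg *spool, size_t n,
                     long ckpt, long replay_ptr,
                     int want_tag, bool play_through, long *out_seq)
{
    assert(ckpt <= replay_ptr);

    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        const spool_msg *m = &spool[i];
        if (m->seq < ckpt)
            continue;  /* older than the last checkpoint: retained, not replayed */
        if (m->seq < replay_ptr) {
            if (m->tag == want_tag)
                out_seq[k++] = m->seq;  /* inside the window and matches */
        } else if (play_through) {
            out_seq[k++] = m->seq;      /* past the replay pointer */
        }
    }
    return k;
}
```

Messages older than the checkpoint are skipped rather than deleted, which mirrors the statement that they are kept until explicitly discarded.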