ken@gvax.cs.cornell.edu (Ken Birman) (08/07/89)
Based on feedback to my prior posting, I am thinking of changing
ISIS as follows (for V2.0):
1) pg_leave() currently takes a while to run, and during the period
between when you call it and when it finishes, a process can still
get group-related requests. I propose to change the semantics of
pg_leave() to include a "filter" that will discard messages sent to
a process during this period. Note that other group members will
assume the process received these discarded messages, but then will
see the pg_leave occur and will presumably reassign any work that
the "failed" process was doing and resume execution without it.
I will also send a nullreply() on behalf of any discarded message
(since the process is leaving intentionally and not due to a failure,
a client that sent to it might otherwise hang waiting for a reply).
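
A minimal sketch of the proposed filter, written as a small Python model rather than ISIS code. The names Message, Member, begin_leave, receive and NULL_REPLY are hypothetical and exist only for illustration; the real pg_leave() and nullreply() interfaces are not shown here.

```python
from dataclasses import dataclass, field

NULL_REPLY = object()  # hypothetical stand-in for what nullreply() would send


@dataclass
class Message:
    sender: str
    body: str
    wants_reply: bool = True


@dataclass
class Member:
    name: str
    leaving: bool = False  # True between the pg_leave() call and its completion
    delivered: list = field(default_factory=list)
    replies: list = field(default_factory=list)  # (destination, reply) pairs

    def begin_leave(self):
        """Model the start of pg_leave(): switch the filter on."""
        self.leaving = True

    def receive(self, msg):
        """Deliver msg, or discard it and send a null reply while leaving."""
        if self.leaving:
            if msg.wants_reply:
                # Keep the sender from hanging on a reply that will never come.
                self.replies.append((msg.sender, NULL_REPLY))
            return False  # discarded by the filter
        self.delivered.append(msg)
        return True
```

In this model a request that arrives before begin_leave() is delivered normally, while one that arrives afterwards is dropped and answered with the null reply.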
If a message is sent to a process for "several reasons", I am
hesitating between several alternatives:
a) Deliver to all listed entry points, if any
b) Don't deliver to any listed entry point if one
of them is due to address expansion for a group that
I am leaving
c) Deliver to entry points for groups to which I still belong
or that mention this process explicitly, but selectively
inhibit delivery for groups that I am leaving. In this
case one should presumably send a nullreply() after
all deliveries finish -- just in case the "dropped"
entry point would normally have sent a reply.
I see option (c) as undesirable because it violates normal assumptions
of addressing atomicity. As a result, I am currently leaning towards
option (b), since (a) might violate potential causality by discarding
one message but delivering a subsequent one.
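
To make the three alternatives concrete, here is a rough Python model of the delivery decision. It assumes each message carries a list of (entry point, group) pairs, with the group set to None when the process was addressed explicitly. The function name entries_to_deliver and this representation are hypothetical, not part of ISIS. The null-reply flag for option (b) follows the rule above that every discarded message gets a nullreply(); for option (c) the sketch assumes the null reply is only needed when some entry point was actually dropped.

```python
def entries_to_deliver(entries, leaving, option):
    """Decide which entry points to invoke for one incoming message.

    entries: list of (entry_point, group) pairs; group is None when the
             process was addressed explicitly, not via group expansion.
    leaving: set of groups this process is in the middle of leaving.
    Returns (entry points to invoke, whether to send a null reply).
    """
    # True for each entry that exists only because of a group being left.
    hit = [g is not None and g in leaving for _, g in entries]

    if option == "a":
        # Deliver to every listed entry point.
        return [e for e, _ in entries], False
    if option == "b":
        # All or nothing: drop the whole message if any entry is affected.
        if any(hit):
            return [], True
        return [e for e, _ in entries], False
    if option == "c":
        # Selectively inhibit only the entries for groups being left.
        kept = [e for (e, _), h in zip(entries, hit) if not h]
        return kept, any(hit)
    raise ValueError("unknown option: %r" % (option,))
```

For a message listing entries for G1, G2 and an explicit address, with the process leaving G1, option (a) invokes all three entry points, option (b) invokes none, and option (c) invokes the last two.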
Do people see problems with this proposal? I recognize that there
is a hard problem here -- in my view, we had come down on the side
of one style of solution, and people were finding it inconvenient.
There seems to be no ideal solution, but perhaps this proposal will
be less inconvenient to program with.
2) spooler. I am adding a "checkpoint pointer" as well; this will
point to the last checkpoint in the spool, and the idea is that a
replay will replay messages between the checkpoint pointer and the
replay pointer subject to the replay pattern being matched, then
replay any subsequent messages if in "play through" mode. Messages
with sequence numbers smaller than the checkpoint pointer would be
kept until explicitly discarded.
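
The replay rule can be sketched as follows, again as a Python model rather than the spooler's real interface. The spool is assumed to be a list of (sequence number, message) pairs in increasing order, and the window between the two pointers is treated as half-open (checkpoint pointer included, replay pointer excluded). That boundary choice, and the names replay and discard_before_checkpoint, are assumptions made for illustration.

```python
def replay(spool, checkpoint_ptr, replay_ptr, matches, play_through=False):
    """Return the (seqno, msg) pairs a replay would deliver.

    Messages in [checkpoint_ptr, replay_ptr) are replayed only if they
    match the replay pattern; messages at or after replay_ptr are
    replayed unconditionally, but only in "play through" mode.
    Messages older than the checkpoint pointer are skipped, not deleted.
    """
    out = []
    for seqno, msg in spool:
        if checkpoint_ptr <= seqno < replay_ptr:
            if matches(msg):
                out.append((seqno, msg))
        elif seqno >= replay_ptr and play_through:
            out.append((seqno, msg))
    return out


def discard_before_checkpoint(spool, checkpoint_ptr):
    """Explicit discard: drop everything older than the checkpoint pointer."""
    return [(s, m) for s, m in spool if s >= checkpoint_ptr]
```

Note that replay() never shrinks the spool. Entries below the checkpoint pointer stay in place until discard_before_checkpoint() is called, which mirrors the "kept until explicitly discarded" rule.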
Comments would be appreciated. By the way, we have a more solid version
of the spooler for those who would like to use it (the V1.2 version
was an alpha test, as I think I mentioned -- call the newer one a beta
test). We also have a beta-test version of isis_connect().
Ken Birman