ken@cs.cornell.edu (Ken Birman) (04/26/91)
> From stasel@cs.unc.edu Thu Apr 25 17:16:42 1991
> Subject: ISIS LOG TOOL
> To: isis-bugs@cs.cornell.edu
>
> I would appreciate your help in answering a few questions.
>
> BACKGROUND: I am running isis version 2.1 using the log manager
> with automatic logging and manual flushing (I have also tried automatic
> flushing). The problem is that when all processes of a group die, I am
> able to restart any process even if the process has an out-of-date log.
> On page 252 of the manual it seems to state that this will not happen,
> and that pg_join will die with an IE_MUSTJOIN error.
> In our class someone thought the meaning of the book was that this would
> happen (an IE_MUSTJOIN error) only if the log were really out of date.
>
> QUESTIONS:
> 1. What are the semantics of the restart from total failure in version 2.1?
> (Intended and actual).

In V2.1 and V3.0, the tool requires that the lmgr, rmgr and news (!)
subsystems be included in the isis.rc file. V2.1 further requires that
news "survive" the failure of your application. If news itself goes
down, the problem you have observed will often occur after restart.

> 2. What are the semantics of the restart from total failure in version 3.0?

In V3.0 the system handles this problem correctly even if a total
failure occurs and all of ISIS gets shut down and restarted. But we
still require that the news facility be run (in fact, we need the new
V3.0 news facility), and we get stuck if the copies of news that are up
don't include one that was up when the group was last active. The basic
idea is that news maintains "last view" information for the groups that
use this type of logging, and unless we can track down the last view,
the first process to recover will always be allowed to restart.

> 3. If I am reading the book wrong and the semantics are that IE_MUSTJOIN
> will happen only if the log is significantly out-of-date, can this be
> worked around so that it must be completely up-to-date?
Well, this is probably due to the news system having been shut down
under V2.1, and it would go away under the V3.0 solution.

> 4. If it is a programming error of mine, are there any obvious mistakes I
> could have made that you know of?

Looks like you were right and the manual was missing an important warning.

> Judith A. Stasel
> CS Grad - Chapel Hill

-- 
Kenneth P. Birman                            E-mail: ken@cs.cornell.edu
4105 Upson Hall, Dept. of Computer Science   TEL: 607 255-9199 (office)
Cornell University Ithaca, NY 14853 (USA)    FAX: 607 255-4428
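The V3.0 restart rule described in the answer to question 2 can be modeled as a small decision function. What follows is a minimal Python sketch under stated assumptions, not the ISIS API. The function name decide_restart, the RestartDecision values, and the use of integer view numbers are all hypothetical. Only the IE_MUSTJOIN name comes from the discussion above. The sketch shows why a missing "last view" record lets a stale log through: with nothing to compare against, the first process to recover is allowed to restart.

```python
from enum import Enum
from typing import Optional


class RestartDecision(Enum):
    # Hypothetical outcomes; these are not ISIS constants.
    RESTART_FROM_LOG = "restart"  # process may rebuild the group from its own log
    MUST_JOIN = "IE_MUSTJOIN"     # log is stale; pg_join should fail with IE_MUSTJOIN


def decide_restart(logged_view: int, last_view: Optional[int]) -> RestartDecision:
    """Hypothetical model of the restart check after a total failure.

    logged_view: view number recorded in the recovering process's log.
    last_view:   view number that news recorded when the group was last
                 active, or None if no surviving news copy holds that record.
    """
    if last_view is None:
        # No "last view" record survived (e.g. news went down under V2.1):
        # nothing to compare against, so the first process to recover
        # is allowed to restart, even from an out-of-date log.
        return RestartDecision.RESTART_FROM_LOG
    if logged_view < last_view:
        # The log predates the group's final view, so it is out of date.
        return RestartDecision.MUST_JOIN
    return RestartDecision.RESTART_FROM_LOG
```

For example, `decide_restart(3, 5)` yields `MUST_JOIN`, while `decide_restart(3, None)` yields `RESTART_FROM_LOG`. The second case matches the behavior reported in the original question.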