[comp.sys.isis] Description of bug...

ken@gvax.cs.cornell.edu (Ken Birman) (06/17/89)

This is in response to someone who asked for more details on
the "symptoms" of the vsync bug we just fixed.

The bug was in pr_addr.c: a line that read if(used_view)...
should have read if(!used_view)...  Somehow I deleted the '!'
operator the other day while rearranging some code.  With the
test inverted, ISIS wasn't doing read/write locking on process
group addresses, and as a result messages were sometimes
delivered in the wrong view.
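
To make the failure mode concrete, here is a minimal sketch in C of
how one dropped '!' can silently skip a lock.  This is not the actual
pr_addr.c code: group_lock, read_lock, read_unlock, resolve_fixed and
resolve_buggy are hypothetical names, and the meaning given to
used_view here (the caller already has a view in hand, so no lock is
needed) is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a read lock on a process group address. */
typedef struct {
    int readers;                /* number of read locks currently held */
} group_lock;

static void read_lock(group_lock *l)
{
    assert(l != NULL);
    l->readers++;
}

/* Returns false if the lock was not held.  A real system might
 * panic at this point instead of returning an error. */
static bool read_unlock(group_lock *l)
{
    assert(l != NULL);
    if (l->readers == 0)
        return false;
    l->readers--;
    return true;
}

/* Intended form: lock whenever a view is NOT already in use.
 * Returns true if the lock was taken. */
static bool resolve_fixed(group_lock *l, bool used_view)
{
    if (!used_view) {
        read_lock(l);
        return true;
    }
    return false;
}

/* Buggy form: the '!' is missing, so the common case
 * (used_view == false) never takes the lock. */
static bool resolve_buggy(group_lock *l, bool used_view)
{
    if (used_view) {
        read_lock(l);
        return true;
    }
    return false;
}
```

In this sketch, calling resolve_buggy(&l, false) leaves l.readers at
zero, so a later read_unlock(&l) reports that the lock was never held,
while resolve_fixed(&l, false) takes the lock as intended.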

There were several symptoms:
1) The grid programs sometimes showed different states when stopped.
   This is really awful when doing demos...
2) pg_join sometimes hung.
3) cbcast sometimes went into an infinite loop, filling a log
   with messages about "look at 3".
4) It was possible to get a panic "shr_gunlock: wasn't locked".
5) gbcast sometimes hung.

With the bug fixed, ISIS is back to its old self.  We pounded
on the system quite a bit and nothing upsetting happened.  

In a sense, this is encouraging.  ISIS itself depends rather
strongly on virtual synchrony, so anything that breaks virtual
synchrony shows up in extreme ways rather quickly.  That makes
it reasonably likely that we will notice ISIS bugs when they
occur and be able to fix them...

Ken