STODOLA@xrt.upenn.EDU (Bob Stodola) (07/02/87)
We had an unusual occurrence today with our VAXCluster.  The room flooded and
the power was shut off for a while.  Four (of five) systems rebooted and formed
a new cluster when power was restored, but the fifth came up in a funny state.
We powered the LSI-11 off and on again, and the system began a warm
restart.  This system, coincidentally, had the least memory, and apparently
retained its restartable state longer.  The first four systems recognized the
presence of a pre-existing cluster and immediately suicided (CLUEXIT).  All
of the disks had, of course, by that time been mounted by the four-system
cluster, and the new (old) one-system cluster found them to be strangers and
sent them all into mount verification.
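
To make the sequence concrete, here is a toy model of what we observed.  It is
only a sketch of the behavior described above, not the actual VMS connection
manager logic: the Cluster class, the formed_at counter, the node names A-E and
the disk names are all made up for illustration, and the rule "the older
cluster wins" is my assumption drawn from this one incident.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    formed_at: int                      # hypothetical logical formation time
    members: list = field(default_factory=list)


def resolve_conflict(a, b):
    """Assumed rule: the pre-existing (older) cluster survives and every
    member of the newer cluster exits, as the four nodes did via CLUEXIT."""
    older, newer = (a, b) if a.formed_at <= b.formed_at else (b, a)
    exited = list(newer.members)
    newer.members.clear()
    return older, exited


def mount_verification(survivor, disk_owner):
    """Disks last mounted by some other cluster look like strangers to the
    survivor, so it sends them into mount verification."""
    return sorted(d for d, owner in disk_owner.items() if owner != survivor.name)


# The fifth node warm-restarts still holding the old cluster's state.
old = Cluster("old", formed_at=1, members=["E"])
# The other four nodes have rebooted, formed a new cluster, and mounted the disks.
new = Cluster("new", formed_at=2, members=["A", "B", "C", "D"])
disk_owner = {"DISK1": "new", "DISK2": "new"}

survivor, exited = resolve_conflict(old, new)
to_verify = mount_verification(survivor, disk_owner)
print(survivor.name, exited, to_verify)
```

In this model the single warm-restarted node ends up as the only cluster
member, and every disk is in mount verification, which matches what we saw.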
I can't quite sort out how I feel about this state of affairs, but something
seems inherently wrong about this sequence.  It seems like a problem that could
arise in any cluster whose members' battery backups last for different lengths
of time.  Also,
I am not sure whether these various suicides/mount verifications actually
protected the integrity of the disks.
Reply to:
  STODOLA%XRT@CIS.UPENN.EDU             Robert K. Stodola
  Phone: (215) 728-3660                 The Fox Chase Cancer Center
                                        7701 Burholme Avenue
                                        Philadelphia, PA  19111
                                        USA