McGuire_Ed@GRINNELL.MAILNET (08/04/86)
>Date: 31 Jul 86 13:21:00 PDT
>From: "ANCHOR::PEARSON" <pearson%anchor.decnet@lll-icdc>
>Subject: VAX Cluster Failures: Summary of Replies (long).
>
>A Quorum disk must be a system disk.  However, note this warning:
>
>    "With a two-node cluster, you'll want to use a
>    quorum disk.  Choosing it could be tricky, if, as
>    I recall, a quorum disk can't be shadowed [ TRUE! ]:
>    If you make either of your system disks the
>    quorum disk, when that disk goes, both the system
>    that boots from it and the disk's quorum vote
>    goes, so the remaining system hangs waiting for
>    quorum - exactly what having a quorum disk and
>    two system disks was supposed to avoid!  So you'd
>    need some third non-shadowed disk to use as the
>    quorum disk...."

This seems to contradict my experience with a two-node cluster.  If a
disk goes offline, I/O queued to the disk will wait until the disk
comes back online or until mount verification is aborted.  Operations
on other devices, however, are not affected.  In particular, cluster
communication does not stop.

For example, system A boots from disk DA, and system B boots from DB,
and DB is the quorum disk.  There are 3 votes--quorum is 2.  If DB goes
offline, votes are reduced to 2.  Many processes on system B are liable
to stall for I/O completion on DB, but system B does not withdraw its
vote from the cluster.  If system A is configured to never do I/O to
the other node's system disk, it is not interrupted by the disk
failure.

In the case that mount verification is aborted, my information is
incomplete.  I've never had a system disk in mount verification.  If B
crashes, then votes would drop to 1, below the quorum of 2, and A would
hang.  It is up to you to be sure that mount verification is not
aborted, or to use the console method of reducing the remaining
system's quorum to 1 after the crash.  This would take several minutes.

>In a dual-HSC50 system, each disk is accessed by one HSC50 at a time.
>The path from that disk to the other HSC50 won't be used until the
>first HSC50 fails.
>This means that you generally can't be too
>confident that your backup system will work when you need it.  If you
>want to find out whether one HSC50 is working, reboot the other one. (!)

You can also use the path select buttons on each disk to test HSC
failover.  On an RA81, the A button is lit if the A path is active.
Pop the A button to prevent I/O on the A path.  The B button will light
shortly, as the other HSC takes over I/O for the disk.

McGuire_Ed%GRINNELL.MAILNET@MIT-MULTICS.ARPA
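
P.S.  The vote arithmetic in the DA/DB example can be checked with a
small sketch.  It is only an illustration, not anything VMS runs: it
assumes quorum is derived from the expected vote total as
(expected votes + 2) / 2, truncated, which agrees with the "3 votes,
quorum is 2" figure above.  It also assumes one vote each for A, B and
the quorum disk DB.  The function and variable names are made up for
the example.

```python
# Sketch of the two-node-plus-quorum-disk vote arithmetic.
# Assumption: quorum = (expected_votes + 2) // 2 (integer division).
# Assumption: A, B and the quorum disk DB contribute one vote each.

VOTES = {"A": 1, "B": 1, "DB": 1}
EXPECTED = sum(VOTES.values())  # 3 votes in total


def quorum(expected_votes: int) -> int:
    """Quorum implied by the expected vote total (assumed rule)."""
    return (expected_votes + 2) // 2


def votes_present(members) -> int:
    """Total votes contributed by the members that are still up."""
    return sum(VOTES[m] for m in members)


def has_quorum(present: int, expected_votes: int) -> bool:
    """True if the surviving votes meet or exceed quorum."""
    return present >= quorum(expected_votes)


if __name__ == "__main__":
    cases = {
        "everything up": ["A", "B", "DB"],
        "DB offline": ["A", "B"],
        "DB offline, B crashed": ["A"],
    }
    for name, members in cases.items():
        present = votes_present(members)
        state = "runs" if has_quorum(present, EXPECTED) else "hangs"
        print(f"{name}: votes={present} quorum={quorum(EXPECTED)} -> {state}")

    # Operator lowers the surviving node's quorum to 1 from the console,
    # modelled here as an expected vote total of 1.
    print("after quorum reduced:", has_quorum(votes_present(["A"]), 1))
```

With DB offline the cluster still holds 2 votes and keeps running; only
when B also crashes does A fall to 1 vote and hang until its quorum is
lowered by hand.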