gibson@rosemary.Berkeley.EDU (Garth Gibson) (08/14/89)
> From: eho@cognito.Princeton.EDU (Eric Ho)
> Newsgroups: comp.os.research
> Message-Id: <EHO.89Aug12232420@cognito.Princeton.EDU>
> Subject: raid downtime fault-tolerance & more ??
>
> Is there a way to increase the downtime fault-tolerance for RAID ?
> I mean if the server that "manages" RAID crashed or got some other hardware
> faults (or other software bugs) then the whole thing will be screwed even
> though the disk arrays are completely ok.

If the CPU hardware or software on a file server fails, then, as Eric
suggests, that server is hosed regardless of the availability of the disks
(RAID or not). The UNIX reaction is to reboot, forget what was in progress
and clean up the disks so that partially completed operations are forgotten.

Sprite has a similar approach, but if the file system is log-structured then
"cleanup" means reading in the most recently written file system "superblock".
Since the log-structured file system is (almost) non-overwrite (almost means
you have to wrap around eventually), this superblock points to a completely
clean filesystem. All the partially complete operations were written after
this superblock, but because no new superblock describing them is found,
they are forgotten. So cleanup is not a complete scan of all disks, as long
as the superblock can be found more easily (it is timestamped and lives only
at certain offsets into each "cylinder group"). This improves availability by
recovering more quickly.

To preserve availability during the server downtime, the disks can be dual
ported to another host. In SCSI this means that two hosts are connected to
each cable (though one may never use it because of cache inconsistency
issues). The problem with this scheme is again incomplete metadata. So the
other host, if it is to pick up the ball, must clean up the disks. This gives
you a problem when the failed server reboots and wants to clean its disks
itself. It needs to assume that it is no longer master of its disks and go
looking for the usurper.
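As an aside, the log-structured "cleanup" step mentioned earlier (pick the most recently written superblock from a few known locations) can be sketched roughly as follows. This is only an illustration, not Sprite code: the `Superblock` fields, the `valid` flag (standing in for whatever checksum or magic-number test a real system would use) and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Superblock:
    """Hypothetical in-memory view of one superblock copy."""
    timestamp: int   # when this copy was written
    valid: bool      # stand-in for a checksum / magic-number check
    log_tail: int    # where the log ended when this copy was written


def find_latest_superblock(
    candidates: Sequence[Optional[Superblock]],
) -> Optional[Superblock]:
    """Return the newest valid superblock, or None if none is usable.

    `candidates` holds whatever was read at each known superblock offset
    (one entry per offset); None means nothing readable was found there.
    Only these few locations are examined, never the whole disk.
    """
    latest: Optional[Superblock] = None
    for sb in candidates:
        if sb is None or not sb.valid:
            continue  # skip missing or torn copies
        if latest is None or sb.timestamp > latest.timestamp:
            latest = sb
    return latest


# Anything written to the log after latest.log_tail is simply ignored,
# which is how partially completed operations get "forgotten".
slots = [
    Superblock(timestamp=10, valid=True, log_tail=100),
    None,
    Superblock(timestamp=30, valid=False, log_tail=300),  # torn write
    Superblock(timestamp=20, valid=True, log_tail=200),
]
latest = find_latest_superblock(slots)
```

The point of the sketch is that recovery cost depends on the number of superblock locations, not on the size of the disks.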
Of course, one could journal all metadata changes to the paired host so that
it might be able to avoid cleanup, but this is rather expensive given that
failures are rare and that Sprite clients can tell the server what files they
have open and what dirty blocks they have.

In short, our main vulnerability is dirty data in the server's cache whose
write to disk has been delayed (in the hope that it will be deleted first).
Databases will require that their journal data go to disk, so it is just
your basic operations at risk. How much do you want to pay for their
protection?

garth gibson