cliff@SSD.HARRIS.COM (Cliff Van Dyke) (10/18/89)
I'm currently implementing a shadow disk capability for Harris' CX/UX version of UNIX. Shadow disk is the capability to duplicate a disk dynamically so that you have a backup copy in the event that one of the disks fails. We've taken Pyramid's basic approach of shadowing individual partitions using a virtual disk driver which sits immediately above the regular disk driver.

I'd like to draw on the collective wisdom of this group as a sanity check on a couple of issues.

Issue 1
-------
There are times when one of the shadowed copies will be deemed to be good and other shadowed copies will be deemed to be of unknown quality. For instance, after a system crash we'll pick one copy to fsck and we'll need to ensure that the other copy is up to date with the fsck'ed copy. In another instance, when a new shadow partition is brought online (for instance, after the disk drive is fixed following a failure), we'll need to populate the new partition with known good data.

Our general solution to this is to copy all of the data from the known good copy to the partition with the unknown data. We properly coordinate this updating with other I/O currently happening to the virtual disk so as to allow the updating to happen as a background activity. Nevertheless, this is a lot of I/O to do. Am I missing something?

Issue 2
-------
My current intent is to remember which disks fail in a file on the root partition. This information will be used on future reboots to reconfigure the system the same way after a reboot as before the reboot. The operator can always interactively attempt to mark a partition as failed or to bring a failed partition online.

That sure sounds like the right way to do it to me. Does it to you?

Issue 3
-------
Are there any experiences concerning shadow disk that you've had that are worth sharing with an implementor?

Please e-mail responses to me. I'll summarize to the net if it seems warranted.
--
Cliff Van Dyke                   cliff@ssd.harris.com
Harris Computer Systems          cliff%ssd.harris.com@eddie.mit.edu
2101 W. Cypress Creek Rd.        ...!{mit-eddie,uunet,novavax}!hcx1!cliff
Ft. Lauderdale, FL 33309-1892    Tel: (305) 974-1700
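
For readers following the thread, the background copy described under Issue 1 can be sketched as a toy model. This is not the CX/UX or Pyramid driver code; the class name, the `watermark` field, and the chunked `resync_step` are all assumptions made for illustration. It shows one plausible way to let normal writes proceed while a known good copy is copied over a copy of unknown quality: blocks below the watermark already match, so writes there go to both copies, while writes above it go only to the good copy and are picked up later by the copy pass.

```python
import threading


class ShadowPartition:
    """Toy model of a two-way shadowed partition (hypothetical names)."""

    def __init__(self, good, stale):
        assert len(good) == len(stale)
        self.good = good        # copy deemed good (e.g. the fsck'ed one)
        self.stale = stale      # copy of unknown quality
        self.watermark = 0      # blocks below this index are known to match
        self.lock = threading.Lock()

    def write(self, block, data):
        # Normal I/O: always update the good copy. Update the other copy
        # only in the already-synchronized region; the rest is copied later.
        with self.lock:
            self.good[block] = data
            if block < self.watermark:
                self.stale[block] = data

    def read(self, block):
        # Until the resync finishes, only the good copy is trusted.
        with self.lock:
            return self.good[block]

    def resync_step(self, chunk=4):
        # Copy one chunk from the good copy, then advance the watermark.
        # Returns True while there is still work left to do.
        with self.lock:
            end = min(self.watermark + chunk, len(self.good))
            self.stale[self.watermark:end] = self.good[self.watermark:end]
            self.watermark = end
            return self.watermark < len(self.good)
```

A background thread would call `resync_step` repeatedly until it returns False. Note that this sketch still copies every block, so it does not reduce the total amount of I/O that the post asks about; it only illustrates the coordination with concurrent writes.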
nagle@well.UUCP (John Nagle) (10/20/89)
Tandem has more experience with shadow disks than anybody else, so examining their system is probably worthwhile. They support such things as network shadowed disks, including resynchronization after a fault over the network. They also support distributed fault-tolerant multiprocessing over their net, of course.

John Nagle