dss@fatkid.UUCP (02/10/87)
In article <2592@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>
>    How hard would it be to incorporate a "shadow server" mode into
>NFS?  Imagine 2 servers serving the same file system.  When a write request
>comes in, both servers do it and the client waits to hear an ack from both
>of them.  When a read request comes in, both servers try to do it and the
>client takes the data from whichever server responds first.

First off, you've made some assumptions about a single shadow.... think,
instead, of a set of shadow 'devices' (or filesystems).

Second, if you 'broadcast' read requests to all shadow servers, you're
imposing a lot of distributed overhead for the no-error (99.9%) case.
If, instead, you attempt the read first from a 'primary' source, then you
have to decide which filesystem is primary (it could rotate, but you'd
probably lose any advantages that might be gained from buffered read-ahead).

In many ways it is simpler to have a physical disk shadow than a
filesystem shadow....there the intent is to provide some reasonable
degree of redundancy in case of hard failure.  Then, the issues are:

    Which disk do you read first?
    If you get an error writing to a shadow, was the 'write' successful?
    Are you concerned with data integrity between shadows (i.e., what
        do you do if one shadow has different data than the other)?
    How do you deal with bad-block mapping?

Since disk shadowing is fundamentally intended to reduce the number of
read errors (by providing redundancy), all the interesting decisions
must be made when an error on one shadow occurs.  When you shadow
filesystems, you dramatically increase the number of types of errors that
can occur.  Consequently, the decision matrix gets far more complex.

For instance, if you are shadowing filesystems, you cannot tolerate soft
failures (e.g., timeout) on writing to any of the shadows.  This is because
a subsequent read to that shadow may succeed where the corresponding write
failed, and so return stale data.

Of course, there are also naming-conflict problems: what if creat() works
on one filesystem but not another?  What happens if one filesystem is used
to shadow multiple clients?

But even if you ignored those problems, there's still a whole can of worms
involving failure recovery.  For example, you raised the question of what
happens if one shadow filesystem fills up before another.  Consider, also,
what happens when a read request returns end-of-file.  Do you accept this,
or do you try all the shadows to see if one got further (and if it did, is
it necessarily correct)?

Daniel Steinberg
(ihnp4|ucbvax)!sun!dss
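To make the soft-failure point concrete, here is a minimal Python sketch of
one possible policy: write to every shadow, read from the first ('primary')
shadow that answers, and stop trusting any shadow that missed a write.  The
Shadow and ShadowSet classes, the in-memory block store, and the "stale" set
are all hypothetical and invented for illustration; nothing here is NFS code
or a design anyone in this thread proposed.

```python
class Shadow:
    """In-memory stand-in for one shadow filesystem (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.fail_writes = False  # simulate a soft failure (e.g. timeout)
        self.fail_reads = False

    def write(self, block_no, data):
        if self.fail_writes:
            raise IOError(f"{self.name}: write failed")
        self.blocks[block_no] = data

    def read(self, block_no):
        if self.fail_reads:
            raise IOError(f"{self.name}: read failed")
        return self.blocks[block_no]


class ShadowSet:
    """Write to all shadows; read from the first usable one in order."""

    def __init__(self, shadows):
        self.shadows = list(shadows)
        self.stale = set()  # names of shadows that missed at least one write

    def write(self, block_no, data):
        acks = 0
        for s in self.shadows:
            if s.name in self.stale:
                continue
            try:
                s.write(block_no, data)
                acks += 1
            except IOError:
                # Assumed policy: a shadow that missed a write may hold old
                # data, so it is never read again until it is resynchronised.
                self.stale.add(s.name)
        if acks == 0:
            raise IOError("write failed on every shadow")
        return acks

    def read(self, block_no):
        for s in self.shadows:  # list order defines the 'primary'
            if s.name in self.stale:
                continue
            try:
                return s.read(block_no)
            except IOError:
                continue  # fall back to the next shadow
        raise IOError("no usable shadow could serve the read")
```

Without the stale set, a read that falls back to a shadow whose write timed
out would quietly return the old block.  The sketch deliberately leaves out
the harder questions raised above: resynchronising a stale shadow, creat()
succeeding on only one filesystem, and disagreement about end-of-file.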
jans@stalker.UUCP (02/10/87)
In article <12959@sun.uucp> dss@sun.UUCP (Daniel Steinberg) writes:
>In article <2592@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>>
>>    How hard would it be to incorporate a "shadow server" mode into
>>NFS?  Imagine 2 servers serving the same file system...
>
>In many ways it is simpler to have a physical disk shadow than a
>filesystem shadow...

Simpler, and MUCH more efficient.  The Tandem NonStop fault-tolerant
computers provide parallel writes, but split seeks on reads.  I.e., one
disk is assigned cylinders 0-405, the other cylinders 406-811, reducing
the average seek time by a factor approaching two.  The actual improvement
is tempered by accelerated seeks, out-of-area writes, and non-locality of
reference.

:::::: Artificial Intelligence Machines --- Smalltalk Project ::::::
:::::: Jan Steinman        Box 1000, MS 60-405   (w)503/685-2956 ::::::
:::::: tektronix!tekecs!jans   Wilsonville, OR 97070 (h)503/657-7703 ::::::
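The "factor approaching two" can be checked with a little arithmetic.  The
Python sketch below computes the mean seek distance, in cylinders, when the
current head position and the requested cylinder are both uniform and
independent over a cylinder range.  That uniformity is an assumption made
here for illustration, and the function name is hypothetical.  It measures
distance, not time, and it ignores writes, which must go to both disks.

```python
def mean_seek_distance(first_cyl, last_cyl):
    """Mean |from - to| over all ordered cylinder pairs in the range."""
    n = last_cyl - first_cyl + 1
    # For each distance d >= 1 there are 2 * (n - d) ordered pairs of
    # cylinders exactly d apart; pairs at distance 0 contribute nothing.
    total = sum(2 * d * (n - d) for d in range(1, n))
    return total / (n * n)


whole_disk = mean_seek_distance(0, 811)    # one disk serving every read
low_half = mean_seek_distance(0, 405)      # first disk of the split pair
high_half = mean_seek_distance(406, 811)   # second disk of the split pair

print(whole_disk, low_half, high_half, whole_disk / low_half)
```

Under these assumptions the mean distance drops from roughly 270.7 cylinders
to roughly 135.3, a ratio of almost exactly two.  Seek time does not scale
linearly with distance on drives with accelerated seeks, and every write
still moves both heads, so the real gain in time is smaller, as noted above.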
mangler@cit-vax.UUCP (02/16/87)
In article <12959@sun.uucp> dss@sun.UUCP (Daniel Steinberg) writes:
>In many ways it is simpler to have a physical disk shadow than a
>filesystem shadow...

Another variation is two servers for a set of dual-ported disks.
In this case there's no duplication of effort, and you're not
buying twice as many disks.  Eagles are so reliable that you'll
see more downtime from power outages than from broken drives.

This would be an attractive mode for us, because we've got the
hardware for it.

The hard part is getting two machines to share dual-ported disks
read-write.  The problem is caching; dual-port kits have no way
to let the other CPU know that something was written and that
something in the in-core cache should be invalidated.  Some other
way has to be provided to communicate this (Ethernet?).

The SI SIMACS controller is supposed to be able to do this - I don't
understand how.  (Maybe they just don't cache anything?  But that's
too horrible to contemplate...).

Don Speck   speck@vlsi.caltech.edu  {seismo,rutgers,ames}!cit-vax!speck
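The caching problem can be illustrated with a small Python sketch.  Two
servers share one "disk" (a dict standing in for the dual-ported drive),
each keeps its own in-core cache, and a write sends an invalidate message
to the peer over some side channel.  The Server class, the peer link, and
the notify flag are hypothetical names made up for this example; this is
not a description of how the SI SIMACS controller or any real dual-port
kit works.

```python
class Server:
    """One CPU attached to a shared disk, with a private block cache."""

    def __init__(self, disk):
        self.disk = disk    # shared dict: stands in for the dual-ported disk
        self.cache = {}     # this server's in-core cache
        self.peer = None    # the other server, reached over a side channel

    def read(self, block_no):
        if block_no not in self.cache:
            self.cache[block_no] = self.disk[block_no]
        return self.cache[block_no]

    def write(self, block_no, data, notify=True):
        self.disk[block_no] = data    # write-through to the shared disk
        self.cache[block_no] = data
        if notify and self.peer is not None:
            # Assumed side channel (e.g. a message over Ethernet).
            self.peer.invalidate(block_no)

    def invalidate(self, block_no):
        self.cache.pop(block_no, None)


disk = {7: b"old"}
a, b = Server(disk), Server(disk)
a.peer, b.peer = b, a

b.read(7)                          # b now caches block 7
a.write(7, b"mid", notify=False)   # the disk alone tells b nothing
stale = b.read(7)                  # b serves its cached copy
a.write(7, b"new")                 # this time the peer is told
fresh = b.read(7)                  # b re-reads from the shared disk
print(stale, fresh)
```

The sketch ignores ordering and failure: a real scheme would need the
invalidate to be acknowledged before the write is reported complete, and
would have to decide what to do when the side channel itself is down.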