johnk@gordian.com (John Kalucki) (03/23/91)
I'm having trouble arranging my NFS mounts to avoid halting all of my machines when one machine crashes. I have 6 Mips workstations which are all intermounted; each host has the data disk of every other host mounted. All of the filesystems are mounted -soft, and all of the mountpoints are in /n/{machine}, where they should be out of harm's way.

When one machine goes down, all the others have difficulties, ranging from simply freezing up to having pwd(1) time out. This happens even though no data is being accessed on the machine that is down (or so I'd like to think).

I'd like to be able to umount the machine that is down, perhaps in the same way as when Unix is being shut down: a forcible unmount occurs. I don't really care if any processes are confused by the forcible unmount; I'd just like to avoid the "NFS: timed out" messages (and the occasional long timeout period) to get the client machine functional again.

Another thing I've thought of is to put each server in /{machine}/{machine} to avoid having pwd stat the directory with the mountpoint unless needed, but this seems quite ugly.

Also, when a server is up but I can't unmount it because the filesystem is busy, is there an easy way to find which processes are the offenders? I've looked at the fuser command that comes on the Mips boxes, but it doesn't work with NFS.

Any hints?

	-John Kalucki
	johnk@gordian.com
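For concreteness, the cross-mount layout described above can be sketched as a dry run. The hostnames (mips1..mips6) and the /data export path are invented for illustration, and the exact soft-mount options vary between NFS implementations:

```shell
# Dry-run sketch of the cross-mount layout described above.  Hostnames
# and the export path are invented; on a real client you would run the
# printed mount commands (or put them in fstab).
cross_mounts() {
    self=$1
    for host in mips1 mips2 mips3 mips4 mips5 mips6; do
        [ "$host" = "$self" ] && continue    # a host never mounts itself
        # -o soft: a down server eventually returns EIO to the caller
        # instead of hanging the client process forever on a hard mount
        echo "mount -o soft,retrans=3 $host:/data /n/$host"
    done
}

cross_mounts mips1
```

With six hosts this generates five mounts per client, i.e. thirty NFS mounts in total, which is why one dead server is so likely to be in somebody's way.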
vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (03/24/91)
In article <118@gordius.gordian.com>, johnk@gordian.com (John Kalucki) writes:
> I'm having trouble arranging my nfs links to avoid halting all of
> my machines when one machine crashes....
> When one machine goes down, all others have difficulties, ranging
> from simply freezing up to having pwd(1) timeout....

Symbolic links from /{mach} to /mount/point could make the /{mach}/{mach} mounting scheme palatable, since there are only 6 machines and 30-36 links involved. Things change when you go from dozens to grosses....pun possibly intended.

The automounter sounds like a good bet for the situation described. It's reduced many of the dead host problems around here, where everyone has from two to several dozen mounts, with many of the most used servers several buildings, routers, and networks distant. The automounter helps, because you're more likely to get messed up only by the (network) crash of a machine you currently care about.

Unfortunately the single thread nature of the automounter makes things more painful when a server you care about does crash or become unreachable. The Sun automounter is not too smart about doing other mounts while it is waiting to give up on a dead server.

Vernon Schryver, vjs@sgi.com
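The symlink scheme might be set up along these lines. This is a sketch only: it builds the links under a scratch prefix so it can run anywhere, the host names are invented, and on a real system the links would be made at /:

```shell
# Sketch of the /{mach} -> /mount/point symlink scheme.  Uses a scratch
# prefix so it can run without touching /; drop $prefix on a real
# system.  Host names are invented.
prefix=$(mktemp -d)
for mach in mips1 mips2 mips3 mips4 mips5 mips6; do
    mkdir -p "$prefix/n/$mach"          # the actual NFS mountpoint
    ln -s "/n/$mach" "$prefix/$mach"    # short, universal name
done
# Every machine carries the same set of links, so a path like
# /mips3/src/foo.c resolves identically on all six hosts, including
# on mips3 itself.
ls -l "$prefix/mips3"
```

The point of making the same links everywhere is that symbolic links embedded in the source trees then work on clients and servers alike.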
brent@terra.Eng.Sun.COM (Brent Callaghan) (03/26/91)
In article <93325@sgi.sgi.com>, vjs@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
> In article <118@gordius.gordian.com>, johnk@gordian.com (John Kalucki) writes:
> >
> > I'm having trouble arranging my nfs links to avoid halting all of
> > my machines when one machine crashes....
> > When one machine goes down, all others have difficulties, ranging
> > from simply freezing up to having pwd(1) timeout....
>
> Symbolic links from /{mach} to /mount/point could make the /{mach}/{mach}
> mounting scheme palatable, since there are only 6 machines and 30-36
> links involved. Things change when you go from dozens to grosses....pun
> possibly intended
>
> The automounter sounds like a good bet for the situation described. It's
> reduced many of the dead host problems around here, where everyone has from
> two to several dozen mounts, with many of the most used servers several
> buildings, routers, and networks distant. The automounter helps, because
> you're more likely to get messed up only by the (network) crash of a
> machine you currently care about.
>
> Unfortunately the single thread nature of the automounter makes things
> more painful when a server you care about does crash or become unreachable.
> The Sun automounter is not too smart about doing other mounts while it is
> waiting to give up on a dead server.

The automounter can help reduce the incidence of NFS hangs through a reduction in the number of static NFS mounts that getwd might bump into, though I'd not recommend it as a solution to this problem.

Hangs due to dead servers are no surprise if the filesystem that you're dealing with *directly* becomes temporarily unavailable due to a server crash. What annoys me (and other NFS users) is when your application hangs due to a dead NFS filesystem that you have no interest in. There are two well-known mechanisms for this kind of hang:

You type a command and the $PATH search hangs on a dead directory in the PATH. A workaround here is to keep the PATH as short as possible and substitute shell aliases where possible.

Another mechanism is getwd(): this function does a file tree walk in order to construct a path from the current directory to the root. If the walk (climb?) up the tree trips over a dead NFS mountpoint, then the process that invoked getwd() is hung. This problem can be avoided by implementations that walk up only as far as the first mountpoint and prepend the remainder of the path from the /etc/mtab entry. If the /etc/mtab entries have the device id stashed in the mount options (as a "dev" or "fsid" string), then getwd shouldn't hang except when invoked within a dead filesystem.

BTW: it's not easy to hang the automounter with a dead NFS mount. After the automounter has done a mount, it doesn't give a brass razzoo about whether the server is up or not. It just points its symbolic link at the mountpoint in /tmp_mnt, and the application that asked for it follows it. If the mountpoint is dead, then the application hangs, but the automounter carries on.

In SunOS 4.0.3 the automounter could hang in unmount(2). An unmount of a dead NFS mountpoint will succeed - the kernel does not contact the server when it does an unmount() - but if the path to the mountpoint contains a dead server, then it will hang in the lookup. The 4.1 automounter carefully pings servers along the path before attempting the unmount system call.

Of course, a lot of this jiggery-pokery wouldn't be necessary if the automounter didn't have to be so paranoid about its single thread blocking. A multithreaded automounter will definitely be an improvement.

-- 
Made in New Zealand -->  Brent Callaghan @ Sun Microsystems
Email: brent@Eng.Sun.COM  phone: (415) 336 1051
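The mountpoint detection that such a getwd() implementation relies on - noticing where the device id changes on the climb up the tree - can be sketched in shell. GNU stat's `-c %d` format is an assumption here; a real getwd() would use stat(2)'s st_dev field in C:

```shell
# Walk up from a directory until the device number changes: the
# directory where it changes is the root of the filesystem containing
# the start point.  A smarter getwd() can stop climbing there and
# splice in the path recorded in /etc/mtab, never touching sibling
# (possibly dead) NFS mounts.  Assumes GNU stat's -c %d.
first_mountpoint() {
    dir=$1
    dev=$(stat -c %d "$dir")
    while [ "$dir" != / ]; do
        parent=$(dirname "$dir")
        # parent on a different device => $dir is a mount root
        [ "$(stat -c %d "$parent")" != "$dev" ] && break
        dir=$parent
    done
    echo "$dir"
}

first_mountpoint /usr/bin    # the mount root containing /usr/bin
```

The key property is that the climb only ever stats ancestors of the current directory within one filesystem, so a dead mount elsewhere in the parent directory is never consulted.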
vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (03/27/91)
In article <10472@exodus.Eng.Sun.COM>, brent@terra.Eng.Sun.COM (Brent Callaghan) writes:
(in response to an offhand complaint of mine about automounter hangs)
> ...
> BTW: it's not easy to hang the automounter with a dead NFS mount.

Some agreement, but see below.

> A multithreaded automounter will
> definitely be an improvement.

Much agreement, provided the locking hassles are safely resolved. One would not want new problems in the style of the lock demons.

Real-life example from the campus of a workstation maker on the non-solar side of the Mtn. View dump: Consider a bunch of source machines, all mounted indirectly in /net via some handy map. Imagine `ln -s /net/{foo,bar,new,dead,so_on} /` so that symbolic links within the source trees work universally. (Some people mount on /n, some in /hosts, and the symbolic links must work on the servers as well.) Now imagine what happens if:

(1) you have foo, bar, and new all properly mounted, 'dead' not mounted, and the server for dead dies or becomes unreachable;

(2) vi /dead/src/whatever.c, which is symlink-resolved into /net/dead/src/whatever.c;

(3) long pause, grumble, realize the trouble, do something about the network or the server for dead;

(4) try to do something useful while the 10GB of filesystems for 'dead' are fsck'ing, so `vi /foo/src/stuff.c`.

Now you've hung another window, as namei() tries to resolve the /foo->/net/foo link by talking to the file system mounted on /net. Until the automounter stops patiently waiting for the server for dead to answer the mount protocol, or for a timeout, you're stuck. There is no equivalent hiccup in the obvious manual NFS mounting of {foo,bar,...} on /net.

An obvious solution is to have the automounter not do the unmount when it times out. Instead, let the automounter locally unmount the filesystem, but cache the file handle etc. and not bother to tell the server. Later, when the file system is referenced, the automounter should first consult its cache of file handles etc., and do only the local part of the (re-)mount. Yes, it would need a variant mount system call.

There are few good reasons to tell the server when the automounter hits the default 5-minute timeout. It just wastes server cycles and network bandwidth to update /etc/rmtab. For many years B. Lyon has been saying /etc/rmtab is a botch that should be deleted.

Vernon Schryver, vjs@sgi.com
jay@silence.princeton.nj.us (Jay Plett) (03/28/91)
In article <93840@sgi.sgi.com>, vjs@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
< In article <10472@exodus.Eng.Sun.COM>, brent@terra.Eng.Sun.COM (Brent Callaghan) writes:
< (in response to an off hand complaint of mine about automounter hangs)
< > ...
< > BTW: it's not easy to hang the automounter with a dead NFS mount.
<
< some agreement, but see below.
<
< > A multithreaded automounter will
< > definitely be an improvement.
<
< Much agreement, providing the locking hassles are safely resolved.

Just get amd. It forks a child for potentially-blocking operations. I have seen it wedge in some very bizarre situations, but it is much more robust than Sun's automounter. And a lot more flexible/useful/powerful. It's available by anonymous ftp on usc.edu.

	...jay