tr@dduck.ctt.bellcore.com (tom reingold) (04/28/89)
The SunOS 4.x manual page for mount(8) says that a soft mount "returns an error if the server does not respond" and that a hard mount "continues the retry request until the server responds". In my experience, the choice is worse than it appears: soft mounts do not always return errors on writes; they can just garble the output without warning. And hard mounts are not interruptible, even with the "intr" option.

Further, it says that "filesystems that are mounted `rw' (read-write) should use the `hard' option". The problem with this is that the entire client and all of its processes can hang when one process tries to write to a hard-mounted filesystem while the server is down! This is a terrible situation when the client is a large time-sharing system.

What are my real choices, and how do they differ from what the manual says? What else can I read on this?

Tom Reingold                   |INTERNET: tr@ctt.bellcore.com
Bell Communications Research   |UUCP:     bellcore!ctt!tr
444 Hoes La room 1E225         |PHONE:    (201) 699-7058 [work],
Piscataway, NJ 08854           |          (201) 287-2345 [home]
ed@mtxinu.COM (Ed Gould) (04/29/89)
>And hard mounts are not interruptible, even with the "intr"
>option.

Hard mounts *are* interruptible with the "intr" option. That's precisely what the option is for, since soft mounts were always interruptible in essence.

>Further, it says that "filesystems that are mounted `rw' (read-write)
>should use the `hard' option. The problem with this is that the entire
>client and all of its processes can hang when one process is trying to
>write on a hard-mounted filesystem when the server is down! This is a
>terrible situation when the client is a large time-sharing system.

The client shouldn't hang while waiting for a server; only processes waiting for that server should block. It's easy to be careless and depend too heavily on a server, though.

--
Ed Gould    mt Xinu, 2560 Ninth St., Berkeley, CA 94710 USA
ed@mtxinu.COM    +1 415 644 0146

"I'll fight them as a woman, not a lady.
I'll fight them as an engineer."
mre@beatnix.UUCP (Mike Eisler) (05/02/89)
In article <840@mtxinu.UUCP> ed@garcia.mtxinu.COM (Ed Gould) writes:
>Hard mounts *are* interruptible with the "intr" option. That's precisely
>what the option is for, since soft mounts were always interruptible
>in essence.

They are certainly designed and coded to be interruptible in SunOS, but I know that in at least SunOS 3.2 they didn't work, or they took a loooong time to interrupt. Can't say about SunOS 4.0 though. In Lachman's NFS for System V.3, hard mounts were always instantly interruptible.

>>Further, it says that "filesystems that are mounted `rw' (read-write)
>>should use the `hard' option. The problem with this is that the entire
>>client and all of its processes can hang when one process is trying to
>>write on a hard-mounted filesystem when the server is down! This is a
>>terrible situation when the client is a large time-sharing system.

Often, NFS file systems are mounted on a directory immediately under "/". When a process does a getcwd(), this will ultimately result in the directory entries of "/" being searched and stat()ed. So if one of those directories is a mount point for an NFS filesystem, and the server is down, getcwd() will hang, and so will the process calling it. Getcwd() is called more often than one would think, so most of the interactive processes on the client end up hanging. This is particularly noticeable during login.

The fix is to never mount NFS filesystems under a directory that is likely to be a component of somebody's cwd, and "/" is in everyone's cwd. We mount all our NFS filesystems under a directory called /n.

-Mike Eisler
{uunet,sun}!elxsi!mre
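To make the getcwd() behaviour described above concrete, here is a minimal Python sketch of the classic algorithm: climb through "..", and in each parent lstat() every entry until one matches the directory just left. The function name classic_getcwd is made up for illustration, and this is not the SunOS source; it only shows why a dead server's mount point sitting next to your path can block the scan.

```python
import os

def classic_getcwd(start="."):
    """Rebuild an absolute path by climbing '..' and lstat()ing the
    entries of each parent until one matches the directory below it."""
    names = []
    here = start
    here_st = os.stat(here)
    while True:
        parent = os.path.join(here, "..")
        parent_st = os.stat(parent)
        if (parent_st.st_dev, parent_st.st_ino) == (here_st.st_dev, here_st.st_ino):
            break  # reached "/": its parent is itself
        for name in os.listdir(parent):
            try:
                # On a real client, this lstat() is the call that would block
                # if 'name' were a hard mount point for a server that is down.
                st = os.lstat(os.path.join(parent, name))
            except OSError:
                continue  # entry vanished or is unreadable; keep scanning
            if (st.st_dev, st.st_ino) == (here_st.st_dev, here_st.st_ino):
                names.append(name)
                break
        else:
            raise FileNotFoundError("no entry in parent matches this directory")
        here, here_st = parent, parent_st
    return "/" + "/".join(reversed(names))

if __name__ == "__main__":
    print(classic_getcwd())
```

The scan stops at the first matching entry, so whether a process hangs depends on directory order: a dead mount point listed before your own path component is enough.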
fletcher@cs.utexas.edu (Fletcher Mattox) (05/02/89)
In article <840@mtxinu.UUCP> ed@garcia.mtxinu.COM (Ed Gould) writes:
>Hard mounts *are* interruptible with the "intr" option. That's precisely
>what the option is for, since soft mounts were always interruptible
>in essence.

I realise that's what the manuals say. And I've looked at the code which purports to do this. But the fact is that the "intr" option just plain doesn't work much of the time. Does anybody know why this is so?

Thanks
Fletcher Mattox

PS: Oh yeah, I'm *not* complaining about it taking a long, but finite, amount of time to process the interrupt.
hedrick@geneva.rutgers.edu (Charles Hedrick) (05/03/89)
I have yet to see a case on the Sun where you couldn't ^C out of a hung NFS connection when intr is set. However, it only checks for interrupts at one point in the code. Depending upon how timeouts are set, it can take a couple of minutes to get out. I agree that this is very frustrating. My three big gripes with NFS are:

  - ^C should happen immediately
  - df should not hang if one of the servers is hung. It should simply print "server not responding" on that line and go on to the next. (Pyramid attempted to make this work, by allowing a program to specify that disk access should be treated as soft just for this program, even if the disk is mounted hard.)
  - pwd, getcwd, etc., should be coded so that they don't hang unless your current directory is actually on the hung server

NFS is very useful, but our users have gotten to hate it. (It doesn't help that SunOS 4.0 has several bugs that cause NFS to hang. They're slowly being fixed, but our users are never going to forgive me for putting them through this. Of course *I'm* to blame personally for all Sun bugs.)
alarson@pavo.SRC.Honeywell.COM (Aaron Larson) (05/03/89)
In article <May.2.15.58.31.1989.5175@geneva.rutgers.edu> hedrick@geneva.rutgers.edu (Charles Hedrick) writes:
>....
>  - pwd, getcwd, etc., should be coded so that they don't hang
>    unless your current directory is actually on the hung server

We've gotten around this particular bug (and in the process several related
ones) by hard mounting our partitions in a rather indirect manner. On our
machines /n is a directory where we (conceptually) mount all the remote
file systems. If you ACTUALLY mount remote partitions in one directory you
lose when one of the machines goes down. This happens because as the
directory is scanned looking for machine foo, stat hangs on the down machine.
We get around this by actually mounting the partitions in another directory
structure, and link to them. For example our /n dir looks like:
/n/foo -> mumble/foo/mp
/n/bar -> mumble/bar/mp
We mount foo's partitions on top of mumble/foo/mp, and reference them
through /n/foo (the actual structure is not important other than you must
be sure that only one machine's partitions are mounted in any one
directory). Now, when the /n directory is scanned, the process won't hang
because it won't actually stat a directory that has mount points until it
finds the desired link. Since doing this, we've not had any problems with
hanging processes unless they actually reference the partitions of a down
host (as df and friends do). It even appears that the time spent following
the links is more than offset by the time spent STATing the remote
partitions!
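
The layout described above can be tried out in a scratch directory. This is only a sketch: build_layout and scan_without_following are hypothetical helper names, "mumble" is the placeholder name from the post, and placing mumble beside n rather than anywhere else is an assumption. No NFS mounts are performed; the code only builds the directories and links.

```python
import os
import tempfile

def build_layout(root, hosts):
    """Create root/mumble/<host>/mp (the future mount points) and
    root/n/<host> symlinks that point at them."""
    n_dir = os.path.join(root, "n")
    os.makedirs(n_dir, exist_ok=True)
    for host in hosts:
        mount_point = os.path.join(root, "mumble", host, "mp")
        os.makedirs(mount_point, exist_ok=True)
        link = os.path.join(n_dir, host)
        if not os.path.islink(link):
            # Relative target, so the whole tree can be moved as a unit.
            os.symlink(os.path.join("..", "mumble", host, "mp"), link)
    return n_dir

def scan_without_following(directory):
    """Report, per entry, whether it is a symlink. islink() uses lstat(),
    so the link target (the mount point) is never touched."""
    return {name: os.path.islink(os.path.join(directory, name))
            for name in sorted(os.listdir(directory))}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        n_dir = build_layout(root, ["foo", "bar"])
        print(scan_without_following(n_dir))  # {'bar': True, 'foo': True}
```

Because every entry in n is a link, scanning n only inspects the links themselves, and each mumble/<host> directory holds mount points for a single machine.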
Aaron Larson MN65-2100 (612) 782-7308
Honeywell Systems & Research Center alarson@SRC.Honeywell.COM (internet)
3660 Technology Drive alarson@srcsip (uucp)
Mpls, MN 55418 {umn-cs,ems,bthpyd}!srcsip!alarson
beepy%commuter@Sun.COM (Brian Pawlowski) (05/04/89)
<I wasn't sure if this made it to the newsgroup - I'll try again - bjp>

Carl Smith and Mark Stein have given me some notes on what intr, hard and soft mean to an NFS client. This doesn't answer all your questions, but does put them in context. This information is pretty accurate for UNIX client implementations of NFS derived from the NFS/ONC reference port. The analogies are entirely my own.

An NFS client will time out a request if the server does not respond within some (user-specifiable) period. This is coupled with a retrans count and a backoff mechanism on the timeouts to deal with slow servers. A server must be able to deal with multiple, duplicate requests arising from retries caused by its tardy responses.

Mark Stein gave an enlightening talk on the timeout strategy for a UNIX client during the MVS/NFS server development:

The mount operation allows specifying a timeout value (timeo), a retry value (retrans, the number of retransmissions of the NFS operation), and whether the mount is hard or soft. The soft option returns an error if the server does not respond (as described below), whereas hard says to continue the retry request until the server responds. The intr option may be added to modify the behaviour of hard mounts; it allows keyboard interrupts to stop the retransmissions. These are described in the following pictures.
A normal NFS request (which is successful first shot) is processed as follows:

    Client          The Ether       Server
    ------          ---------       ------
    NFS --->        --------->                    |
    request                                       |
                    <------                       |
                                    server        |  Increasing
                                    responds      |  Time
    <---------                                    |
    Response                                      |
    Client                                        |
    Continues                                     |
                                                  V

The following picture shows timeouts (the timeo value is entered in tenths of seconds) up to a retry value (4) against an unresponsive server:

    Client                  The Ether       Server
    ------                  ---------       ------
    NFS --->        -       --------->                    |
    request         |                                     |
      timeo = 7     |                                     |
                    |                                     |
    NFS --->        -       --------->                    |
    request         |                                     |
      timeo = 14    |                                     |
                    |                                     |  Increasing
    NFS --->        -       --------->                    |  Time
    request         |                                     |
      timeo = 28    |                                     |
                    |                                     |
    NFS --->        -       --------->                    |
    request         |                                     |
      timeo = 56    |  retrans = 4                        |
                    |                                     |
                    -       --------->                    |
                            request                       |
    <-------                                              |
    Timeout returned                                      |
    to caller IF SOFT! - A Major Timeout                  |
    else if HARD or INTR, double                          |
    timeo and reenter loop.                               |
                                                          V

A TIMEOUT is registered on the client from NFS only after the timeo time has elapsed for the specified number (retrans) of retransmissions. The initial timeo value itself may depend on the type of operation (write vs. getattr vs. read) in a given NFS client implementation. On each retransmission, the timeo value is doubled.

If a server is mounted soft, the timeout is returned to the calling procedure or program.

If a server is mounted hard, NFS will back off (double the current timeo value on each major timeout, up to some maximum) and try again with the new, longer timeo value for the specified retrans count (with the new current timeo value doubled at each retransmission). The initial timeo on entry to each retransmission cycle has a maximum value of 30 seconds. The timeo within a retransmission sequence has a maximum value of 60 seconds. timeo is specified in tenths of seconds.

If the server is mounted intr, this is the same as hard, except that on major timeouts (current, aged timeo value times retrans count with backoff) a software interrupt may force an error return of timeout to the calling procedure or program.
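
The arithmetic behind the pictures can be sketched in a few lines of Python. This is one reading of the description, not the reference port's code: the function names are invented, the two caps are the 30 and 60 second limits quoted above, and the assumption that a hard mount doubles the starting timeo of each new cycle (7, 14, 28, ...) is mine.

```python
MAX_INITIAL = 300  # cap on timeo at the start of a cycle: 30 s, in tenths
MAX_WAIT = 600     # cap on any single wait within a cycle: 60 s, in tenths

def minor_cycle(timeo, retrans):
    """Waits, in tenths of a second, for one retransmission cycle."""
    waits = []
    for _ in range(retrans):
        waits.append(min(timeo, MAX_WAIT))
        timeo *= 2  # doubled on each retransmission
    return waits

def soft_mount_error_after(timeo, retrans):
    """Seconds until a soft mount returns the major timeout to the caller."""
    return sum(minor_cycle(min(timeo, MAX_INITIAL), retrans)) / 10

def hard_mount_cycles(timeo, retrans, cycles):
    """Successive cycles on a hard mount. ASSUMPTION: the starting timeo
    doubles after each major timeout, up to MAX_INITIAL."""
    result = []
    for _ in range(cycles):
        start = min(timeo, MAX_INITIAL)
        result.append(minor_cycle(start, retrans))
        timeo = start * 2
    return result

if __name__ == "__main__":
    print(minor_cycle(7, 4))             # [7, 14, 28, 56]
    print(soft_mount_error_after(7, 4))  # 10.5
    for cycle in hard_mount_cycles(7, 4, 3):
        print(cycle, sum(cycle) / 10, "seconds")
```

With timeo=7 and retrans=4, a soft mount under this model gives up after 10.5 seconds, while on a hard mount each later cycle takes twice as long as the one before it until the caps apply. In an implementation that honours interrupts only at major timeouts, that growth is what makes ^C seem to be ignored.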
In older implementations of NFS, an interrupt can only slip in on a major timeout, so a request that has an aged timeo value with even a small retrans count can take a mighty long time indeed to respond when a server is mounted intr. Later implementations allow the interrupt to stop retransmissions much sooner. Sometimes a mount may seem uninterruptible when in actuality the client has backed off so far that, in an older NFS implementation, the window for the interrupt to take effect is a long way off.

Now on soft mounts garbling the data: this is entirely application dependent. If applications check their errors on write()'s (mine do :-) then they will see the error and will most likely end abnormally. Most applications probably do not, so you get intermittent failures, some successes, and resulting garbled data. Now what I forget to do is to check the return values on close() - where you may see an asynchronous error from a previous write() call. This may be where you see writes returning OK - but if you check your close(), it will probably fail. Remember - the writes to the server are REALLY asynchronous to your application given the buffering inherent in UNIX (which exists between your application and NFS).

The fact that the write returns OK to the application, and may later fail (soft mounted), is consistent with normal UNIX behaviour for, say, a failing disk - where the error is detected at some time after the write() returns OK to the application, when the system actually attempts to write the buffer to disk.

That is why the safest bet, for critical data (such as the NFS files which represent your ROOT PARTITION for diskless clients), is to hard mount the file system. If you like living on the ragged edge, specify the intr option on writable partitions - then you have control over whether or not you'll trash your file writes in progress - with the same behaviour as if you'd interrupted write()'s to a local hard disk.
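
The advice above about checking write() and close() can be sketched in Python, where os.write(), os.fsync() and os.close() each raise OSError when the underlying call fails. The helper name write_checked is made up, and the fsync() call is an addition that the post does not mention: it asks the system to flush buffered data so that a delayed error surfaces before close().

```python
import os
import tempfile

def write_checked(path, data):
    """Write data to path; raise OSError if any step reports a failure."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)  # flush buffered data so delayed errors show up here
    finally:
        os.close(fd)  # close() can also report an earlier write failure

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "report.out")
        try:
            write_checked(target, b"critical data\n")
        except OSError as err:
            print("write failed, data may be incomplete:", err)
        else:
            print("all bytes written and flushed")
```

An application that lets these errors propagate, instead of ignoring return values, at least knows when a soft-mounted write has been dropped.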
An analogy is that mounting hard with the intr option makes your server most resemble a local hard disk for your applications. If you mount your active writable filesystems soft, you might consider taking up skyjumping for a hobby where you use randomly defective parachutes for that certain extra thrill.

Some work is being done for future NFS releases to implement a dynamic retransmission algorithm, which would affect the above discussion. This is pretty valid for UNIX clients out there now.

Brian Pawlowski
Manager ONC Porting

Brian Pawlowski <beepy@sun.com> <sun!beepy>
Sun Microsystems, Portable Software Products
paul@morganucodon.cis.ohio-state.edu (Paul Placeway) (05/16/89)
In article <21283@srcsip.UUCP> alarson@pavo.SRC.Honeywell.COM (Aaron Larson) writes:
>In article <May.2.15.58.31.1989.5175@geneva.rutgers.edu> hedrick@geneva.rutgers.edu (Charles Hedrick) writes:
>>....
>>  - pwd, getcwd, etc., should be coded so that they don't hang
>>    unless your current directory is actually on the hung server
>
>We get around this by actually mounting the partitions in another directory
>structure, and link to them. For example our /n dir looks like:
>/n/foo -> mumble/foo/mp
>/n/bar -> mumble/bar/mp
>We mount foo's partitions on top of mumble/foo/mp, and reference them
>through /n/foo (the actual structure is not important other than you must
>be sure that only one machine's partitions are mounted in any one
>directory).

This is quite similar to what we do here. For each client, we have a local directory /n, and in it a _local_ directory for each machine, inside of which are the NFS mount points:

    /n/dinosaur/0/paul
        ^       ^
        |       NFS mount point
        LOCAL directory containing all mount points
        for dinosaur (the staff server)

When pwd comes crawling down, it goes from the NFS mounts into a directory of mount points for just that server (which must be up to get this far), and then into /. We also have the servers mount the disks in the same places so we don't go out of our minds when on a server rather than a client.

We generally soft mount everything with timeo=20 and retrans=8. Despite what anyone from Sun may say, we haven't had any problems with soft mounting most everything (and no, none of us skydive :-) ). I have my own machine hard mount all of its own server's partitions, since if the server is down my Sun hangs anyway (no local disks here).

On the other hand, waiting forever and a day for ^C to respond is _real_ annoying. I'm surprised that Sun didn't put in a special check for NFS while they were fattening the kernel with other stuff...

-- Paul