[comp.unix.wizards] NFS, hung processes

samperi@marob.masa.com (Dominick Samperi) (07/29/89)

There seems to be no obvious way to deal with the problem of hung processes
due to a dead NFS server. The recently posted 'cknfs' program might help
somewhat, but it does not deal with the situation where a server dies after
a user logs in. Furthermore, it appears that a process may still hang
even if no reference is made to dead NFS paths. I don't know why...perhaps
the crashed machine is flooding the network with garbage packets????

Perhaps some experienced NFS users could comment on various tricks that they
have used to deal with the "NFS hang problem"?

Thanks!
-- 
Dominick Samperi -- ESCC
samperi@marob.masa.com
uunet!hombre!samperi

lamy@ai.utoronto.ca (Jean-Francois Lamy) (07/30/89)

Under SunOS 3.x, religiously enforce the following policy: never mount
two partitions from two different machines in the same directory.  That way
the mount point for the partition you are in will always be found before
getwd has to look at the directory entry for a dead machine.  You can most
simply achieve this by mounting all remote partitions on
/nfs/machine/partition and making symlinks to preserve the illusion.

Under SunOS 4.x, getwd caches results.  Poking around with trace reveals that
if all NFS mounts refer to symlinks pointing to the actual mount point you
won't see the NFS hang problem.

Jean-Francois Lamy               lamy@ai.utoronto.ca, uunet!ai.utoronto.ca!lamy
AI Group, Department of Computer Science, University of Toronto, Canada M5S 1A4

jik@athena.mit.edu (Jonathan I. Kamens) (07/31/89)

Query: any reason why this wasn't asked in comp.protocols.nfs?

In article <24D1DF49.7A5@marob.masa.com> samperi@marob.masa.com (Dominick
Samperi) writes:
>There seems to be no obvious way to deal with the problem of hung processes
>due to a dead NFS server. The recently posted 'cknfs' program might help
>somewhat, but it does not deal with the situation where a server dies after
>a user logs in. Furthermore, it appears that a process may still hang
>even if no reference is made to dead NFS paths. I don't know why...perhaps
>the crashed machine is flooding the network with garbage packets????
>
>Perhaps some experienced NFS users could comment on various tricks that they
>have used to deal with the "NFS hang problem"?

  Project Athena has well over 1,000 workstations, with over 10,000
user accounts, and every user gets his home directory over NFS.  There
are also a lot of third-party lockers that people use often that are
exported via NFS.  We therefore encounter this problem much more often
than we'd like.

  The most common way of referencing a dead NFS path even if you don't
realize you're doing it is if you have said path in your search path
and try to execute a program and/or start a new shell.  Both will
cause the search path to be scanned, and they could encounter the dead
path and hang on it.
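
  A minimal sketch of one way to find out which search-path entry is
the culprit without hanging the shell you are typing at.  Each
directory is stat()ed in a child process, so a dead server blocks the
child rather than the caller.  The names probe_dir and probe_path are
invented for this illustration, and the polling interval is arbitrary;
nothing here is part of cknfs or of any Athena tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: stat() `dir` in a child process.
 * Returns 0 if stat() succeeded, 1 if it failed promptly (or fork
 * failed), -1 if the child was still blocked after `seconds`.
 */
int probe_dir(const char *dir, int seconds)
{
    struct timespec tick;
    int status, i;
    pid_t pid, r;

    assert(dir != NULL);
    tick.tv_sec = 0;
    tick.tv_nsec = 100000000L;          /* poll every 100 ms */

    pid = fork();
    if (pid == -1)
        return 1;
    if (pid == 0) {
        struct stat st;
        _exit(stat(dir, &st) == -1 ? 1 : 0);
    }

    for (i = 0; i < seconds * 10; i++) {
        r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
        if (r == -1)
            return 1;
        nanosleep(&tick, NULL);
    }

    /*
     * Still blocked.  On a hard mount the signal may not take effect
     * until the server answers, so we never wait for the child here.
     */
    kill(pid, SIGKILL);
    return -1;
}

/*
 * Probe every directory in a colon-separated search path and print a
 * line for each.  Returns the number of entries that timed out, or -1
 * if memory could not be allocated.  Empty entries (meaning the
 * current directory) are skipped, because strtok() drops them.
 */
int probe_path(const char *path, int seconds)
{
    char *copy, *dir;
    int hung = 0;

    assert(path != NULL);
    copy = malloc(strlen(path) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, path);

    for (dir = strtok(copy, ":"); dir != NULL; dir = strtok(NULL, ":")) {
        int r = probe_dir(dir, seconds);
        printf("%-30s %s\n", dir,
               r == 0 ? "ok" : r == 1 ? "error" : "NOT RESPONDING");
        if (r == -1)
            hung++;
    }
    free(copy);
    return hung;
}
```

  Calling probe_path(getenv("PATH"), 5) from a small main() would
list each directory with its status.  A child that never returns is
simply abandoned, which is the price of not hanging yourself.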

  One solution, which is what we use, is not to hard mount anything
but the most important NFS filesystems.  We mount all user filesystems
soft with a five minute error timeout by default, so if a user's
fileserver goes down, processes will only try to access it for five
minutes.  Once the user gets his prompt back, he can carefully save
whatever work he is doing to a local hard disk or mail it to himself
to prevent it from being lost.

  The only filesystems we hard mount by default are the system
software packs, since if they go down there isn't much you can do with
the workstation anyway.

Jonathan Kamens			              USnail:
MIT Project Athena				432 S. Rose Blvd.
jik@Athena.MIT.EDU				Akron, OH  44320
Office: 617-253-4261			      Home: 216-869-6432

brent%terra@Sun.COM (Brent Callaghan) (08/01/89)

In article <13134@bloom-beacon.MIT.EDU>, jik@athena.mit.edu (Jonathan I. Kamens) writes:
> 
>   One solution, which is what we use, is not to hard mount anything
> but the most important NFS filesystems.  We mount all user filesystems
> soft with a five minute error timeout by default, so if a user's
> fileserver goes down, processes will only try to access it for five
> minutes.  Once the user gets his prompt back, he can carefully save
> whatever work he is doing to a local hard disk or mail it to himself
> to prevent it from being lost.

A problem with "soft" mounting is that a timed-out I/O will return
an error result to the user program.  Unix programs are notorious
for not checking for error returns on read(), write(), etc., and can
fail in mysterious ways.
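
As an illustration of the checking that is so often left out, here is
a minimal sketch of a copy loop that tests every read() and write()
result.  The function names write_all and copy_fd are made up for this
example.  Which errno a timed-out soft mount actually produces depends
on the system, so the code just reports whatever it gets.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write all `len` bytes of `buf` to `fd`, retrying short writes.
 * Returns 0 on success, -1 on error with errno left set by write().
 */
int write_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;               /* interrupted, try again */
            return -1;                  /* real error: tell the caller */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Copy everything from `in` to `out`.
 * Returns 0 at end of file, -1 on the first read or write error.
 */
int copy_fd(int in, int out)
{
    char buf[8192];
    ssize_t n;

    for (;;) {
        n = read(in, buf, sizeof buf);
        if (n == 0)
            return 0;                   /* end of file */
        if (n == -1) {
            if (errno == EINTR)
                continue;
            fprintf(stderr, "read: %s\n", strerror(errno));
            return -1;
        }
        if (write_all(out, buf, (size_t)n) == -1) {
            fprintf(stderr, "write: %s\n", strerror(errno));
            return -1;
        }
    }
}
```

On NFS a write error may not be reported until a later call, so a
careful program also checks the result of close().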

This can be particularly bad in the case of an executable that
is running from a dead server.  A pagein that gets an error from
a soft mount will crash the process and leave a core dump.  I prefer
to mount /usr and local executables ("/usr/local" around here) with
"hard" and set the "intr" option so that I can at least kill a hung
process with a SIGTERM if I get fed up waiting.  The "intr" should
work OK - although it can take a while since it has to wait for the
hung NFS operation to time out (can take a minute or so).

Made in New Zealand -->  Brent Callaghan  @ Sun Microsystems
			 uucp: sun!bcallaghan
			 phone: (415) 336 1051

pat@orac.pgh.pa.us (Pat Barron) (08/01/89)

In article <13134@bloom-beacon.MIT.EDU> jik@athena.mit.edu (Jonathan I. Kamens) writes:
>[...]
>Samperi) writes:
>>[...] Furthermore, it appears that a process may still hang
>>even if no reference is made to dead NFS paths. I don't know why...perhaps
>>the crashed machine is flooding the network with garbage packets????
>[...]
>  The most common way of referencing a dead NFS path even if you don't
>realize you're doing it is if you have said path in your search path
>and try to execute a program and/or start a new shell.  Both will
>cause the search path to be scanned, and they could encounter the dead
>path and hang on it.

Another possible problem, which can be particularly vexing until you
really figure out what's going on, is that getwd() can hang if there
is a dead NFS fileserver mounted within some directory above your
current directory.  That is, if your current directory is, for instance,
"/usr/users/foobar", and "/usr/local" is mounted from an NFS server
and that server is dead, then doing a getwd() [i.e., "pwd", etc...]
from /usr/users/foobar *may* hang, depending on what order the directory
entries for "users" and "local" appear in /usr.  If "users" is first
then everything is fine.  If "local" is first, then getwd() will try
to do a stat() on /usr/local, and you lose.
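
To make the order dependence concrete, here is a rough sketch of the
classic getwd() walk: climb through "..", and in each parent look for
the entry whose device and inode numbers match the directory just
left.  This is a simplification written for illustration, not the
SunOS source.  It calls lstat() on every entry, whereas a real
implementation may be able to skip that call for some entries.  The
names sketch_getwd and find_name are invented.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_MAX 4096

/*
 * Scan `parent` for the entry with the given device/inode pair.
 * Entries are visited in directory order and the scan stops at the
 * first match, so an entry for a dead server only hurts if it comes
 * before the one being looked for.
 */
static int find_name(const char *parent, dev_t dev, ino_t ino,
                     char *name, size_t size)
{
    char path[SKETCH_MAX + 300];
    struct dirent *e;
    struct stat st;
    DIR *d;
    int found = -1;

    d = opendir(parent);
    if (d == NULL)
        return -1;
    while (found == -1 && (e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        if (snprintf(path, sizeof path, "%s/%s", parent, e->d_name)
            >= (int)sizeof path)
            continue;
        /* This is the call that blocks if the entry is a mount point
         * for a server that is down. */
        if (lstat(path, &st) == -1)
            continue;
        if (st.st_dev == dev && st.st_ino == ino &&
            strlen(e->d_name) < size) {
            strcpy(name, e->d_name);
            found = 0;
        }
    }
    closedir(d);
    return found;
}

/* Build the current directory's path in `out`; 0 on success, -1 on error. */
int sketch_getwd(char *out, size_t size)
{
    char up[SKETCH_MAX] = ".";
    char result[SKETCH_MAX] = "";
    char tmp[SKETCH_MAX];
    char name[256];
    struct stat cur, par;

    assert(out != NULL && size > 1);
    if (stat(up, &cur) == -1)
        return -1;
    for (;;) {
        if (strlen(up) + 4 > sizeof up)
            return -1;
        strcat(up, "/..");              /* one level further up */
        if (stat(up, &par) == -1)
            return -1;
        if (par.st_dev == cur.st_dev && par.st_ino == cur.st_ino)
            break;                      /* "." and ".." coincide: root */
        if (find_name(up, cur.st_dev, cur.st_ino, name, sizeof name) == -1)
            return -1;
        if (snprintf(tmp, sizeof tmp, "/%s%s", name, result)
            >= (int)sizeof tmp)
            return -1;
        strcpy(result, tmp);            /* prepend the component found */
        cur = par;
    }
    if (result[0] == '\0')
        strcpy(result, "/");
    if (strlen(result) >= size)
        return -1;
    strcpy(out, result);
    return 0;
}
```

In the /usr example, the scan of /usr stops as soon as it reaches
"users", so "local" is only touched if it sorts earlier in the
directory.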

The solution to this is to make sure NFS mount points are leaf
nodes in the directory tree (i.e., make sure that no files or
directories in the local filesystem appear in the same directory
as an NFS mount point).

--Pat.
-- 
Pat Barron
Internet:  pat@orac.pgh.pa.us  - or -   orac!pat@gateway.sei.cmu.edu
UUCP:  ...!uunet!apexepa!sei!orac!pat  - or -  ...!pitt!darth!orac!pat

richard@aiai.ed.ac.uk (Richard Tobin) (08/02/89)

In article <13134@bloom-beacon.MIT.EDU> jik@athena.mit.edu (Jonathan I. Kamens) writes:
>  The most common way of referencing a dead NFS path even if you don't
>realize you're doing it is if you have said path in your search path
>and try to execute a program and/or start a new shell.  Both will
>cause the search path to be scanned, and they could encounter the dead
>path and hang on it.
>
>  One solution, which is what we use, is not to hard mount anything
>but the most important NFS filesystems.  

Another solution is to mount the filesystems in, say, /nfs, and have symbolic
links to them from the places people actually refer to.  Then you can
remove the symbolic links if the server is down.

Even better, you can have a program do it.  Here's one I wrote recently.
We've only just started using it, so it may not be bug-free.

-- Richard

/*
 * nfslink [-i interval] [-t timeout] host name mountpt [name mountpt ...]
 *
 * maintain links to mounted file systems, removing them if the
 * remote machine isn't responding.
 *
 * Copyright Richard Tobin / AIAI 1989
 * 
 * May be freely redistributed if this whole notice remains intact.
 */

#include <stdio.h>
#include <errno.h>
#include <signal.h>
#include <sys/time.h>
#include <rpc/rpc.h>
#include <rpc/clnt.h>
#include <nfs/nfs.h>
#include <setjmp.h>
#include <sys/stat.h>

main(argc, argv)
int argc;
char **argv;
{
    int c, interval = 20, timeout = 5, firsttime = 1;
    extern char *optarg;
    extern int optind, opterr;

    while((c = getopt(argc, argv, "i:t:")) != EOF)
	switch(c)
	{
	  case 'i':
	    interval = atoi(optarg);
	    break;

	  case 't':
	    timeout = atoi(optarg);
	    break;

	  case '?':
	    usage();
	    break;
	}

    if((argc - optind) < 3 || ((argc - optind) & 1) == 0)
	usage();

    while(1)
    {
	if(nfscheck(argv[optind], timeout) == 0)
	    makelinks(&argv[optind+1], firsttime);
	else
	    removelinks(&argv[optind+1], firsttime);

	firsttime = 0;
	sleep(interval);
    }
}

makelinks(links, verbose)
char **links;
int verbose;
{
    struct stat namestat;
    
    while(*links)
    {
	char *name = *links++;
	char *mountpt = *links++;

	if(lstat(name, &namestat) == -1)
	{
	    if(errno == ENOENT)
	    {
		if(symlink(mountpt, name) == -1)
		{
		    perror("nfslink: symlink");
		    fatal("can't link %s to %s\n", name, mountpt);
		}
		printf("nfslink: linked %s to %s\n", name, mountpt);
		fflush(stdout);
		continue;
	    }
	    else
	    {
		perror("nfslink: lstat");
		fatal("can't lstat %s\n", name, 0);
	    }
	}

	if((namestat.st_mode & S_IFMT) == S_IFLNK)
	{
	    if(pointsto(name, mountpt))
	    {
		if(verbose)
		{
		    printf("nfslink: %s is already linked to %s\n",
			   name, mountpt);
		    fflush(stdout);
		}
	    }
	    else
	    {
		fatal("%s is a link, but not to %s\n", name, mountpt);
	    }
	}
	else
	{
	    fatal("%s exists, but is not a symbolic link\n", name, 0);
	}
    }
}

removelinks(links, verbose)
char **links;
int verbose;
{
    struct stat namestat;
    
    while(*links)
    {
	char *name = *links++;
	char *mountpt = *links++;

	if(lstat(name, &namestat) == -1)
	{
	    if(errno == ENOENT)
	    {
		if(verbose)
		{
		    printf("nfslink: link from %s to %s is already removed\n",
			   name, mountpt);
		    fflush(stdout);
		}
		continue;
	    }
	    else
	    {
		perror("nfslink: lstat");
		fatal("can't lstat %s\n", name, 0);
	    }
	}

	if((namestat.st_mode & S_IFMT) == S_IFLNK)
	{
	    if(pointsto(name, mountpt))
	    {
		if(unlink(name) == -1)
		{
		    perror("nfslink: unlink");
		    fatal("can't remove link from %s to %s\n",
			  name, mountpt);
		}
		printf("nfslink: removed link from %s to %s\n",
		       name, mountpt);
		fflush(stdout);
	    }
	    else
	    {
		fatal("%s is a link, but not to %s\n", name, mountpt);
	    }
	}
	else
	{
	    fatal("%s exists, but is not a symbolic link\n", name, 0);
	}
    }
}

int pointsto(name, target)
char *name, *target;
{
    /* We don't use stat lest it hang, so it's not quite right */

    char buf[200];
    int len;

    len = readlink(name, buf, sizeof(buf)-1);
    if(len == -1)
    {
	perror("nfslink: readlink");
	fatal("can't read link %s\n", name, 0);
    }

    buf[len] = '\0';
    return strcmp(buf, target) == 0;
}

fatal(fmt, arg1, arg2)
char *fmt, *arg1, *arg2;
{
    fprintf(stderr, "nfslink: fatal error: ");
    fprintf(stderr, fmt, arg1, arg2);
    exit(1);
}

usage()
{
    fprintf(stderr, "usage: nfslink [-i interval] [-t timeout] host name mountpt [name mountpt ...]\n");
    exit(2);
}

static jmp_buf env;
void timedout();

int nfscheck(host, timeout)
char *host;
int timeout;
{
    int stat;
    signal(SIGALRM, timedout);

    if(setjmp(env) == 0)
    {
	alarm(timeout);
	stat = callrpc(host, NFS_PROGRAM, NFS_VERSION, RFS_NULL,
		       xdr_void, 0, xdr_void, 0);
	alarm(0);
	if(stat == 0)
	    return 0;
    }
    return -1;
}

void timedout()
{
    longjmp(env, 1);
}
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

cole@dip.cs.wisc.edu (Bruce Cole) (08/03/89)

In article <658@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>Another solution is to mount the filesystems in, say, /nfs, and have symbolic
>links to them from the places people actually refer to.  Then you can
>remove the symbolic links if the server is down.
>
>Even better, you can have a program do it.  Here's one I wrote recently.
>We've only just started using it, so it may not be bug-free.

     We considered implementing something like this.  Unfortunately, for our
usage of NFS, simply switching symbolic links is not enough.  Most of our
workstations NFS their /usr partition.  All of the processes that are
referencing the original server remain hung even after the necessary symbolic
links are changed.
     I solved this problem by implementing NFS kernel changes on the client
workstations.  When an NFS request times out, the client consults a user
specified list of equivalent servers to find an alternate server.  The NFS
request is automatically converted by the client workstation to be sent to the
alternate server.  Running programs do not even realize that a switch in
servers has been made.  The code works great for read only NFS partitions
(such as /usr.)  It is useless for handling remotely mounted home directories.
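
     The server selection step might look something like the sketch below.
This is a user-level illustration only, not the kernel code: pick_server,
probe_fn and stub_probe are invented names, and the stub simply treats any
host whose name begins with "down" as dead so the logic can be exercised
without a network.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical probe: returns 0 if `host` answers, nonzero otherwise. */
typedef int (*probe_fn)(const char *host);

/*
 * Starting with the server currently in use, return the index of the
 * first server in the list of equivalents that responds, wrapping
 * around the end of the list.  Returns -1 if none responds.
 */
int pick_server(const char *const servers[], int n, int current,
                probe_fn probe)
{
    int i;

    assert(servers != NULL && probe != NULL);
    if (n <= 0)
        return -1;
    if (current < 0 || current >= n)
        current = 0;
    for (i = 0; i < n; i++) {
        int idx = (current + i) % n;
        if (probe(servers[idx]) == 0)
            return idx;
    }
    return -1;
}

/* Stand-in probe for experiments: hosts named "down..." are dead. */
int stub_probe(const char *host)
{
    return strncmp(host, "down", 4) == 0 ? -1 : 0;
}
```

     Trying the current server first means a request only moves when its
own server has stopped answering.
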
     I also made changes to the way NFS handles interruptible file systems
so that a user can interrupt an NFS request in a realistic amount of time.
(Current NFS implementations can take several minutes to act upon an interrupt
request.)

--
Bruce Cole
Computer Sciences Dept.
U. of Wisconsin - Madison