[comp.unix.wizards] Lots of NFS cross mounts?

berger@datacube.UUCP (04/05/88)

We are planning to have about 30 Sun 3/50's and 3/60's on our network.
Our plan is to put 150Mbyte to 380Mbyte SCSI drives on all of these
instead of running them diskless. It seems to be cheaper than having a
few large nd servers. Personal and some group file systems would be on
each of these local disks and they would generally be cross NFS
mounted all around the net.

Does anyone have any experience with having many (30 or more) partitions
cross mounted on many (30 or more) machines? Are there any impacts
that we should be aware of?

One thing that makes us nervous is a problem we have seen on our
current setup. The problem is when one server is down but clients
have partitions of the downed server NFS mounted.  The clients get
bogged down even when they are not explicitly trying to access
partitions on the downed server. We are using soft mounts.... 

Please send pointers, comments, suggestions to:

				Bob Berger 

Datacube Inc. Systems / Software Group	4 Dearborn Rd. Peabody, Ma 01960
VOICE:	617-535-6644;	FAX: (617) 535-5643;  TWX: (710) 347-0125
UUCP:	berger@datacube.COM,  rutgers!datacube!berger, ihnp4!datacube!berger
	{cbosgd,cuae2,mit-eddie}!mirror!datacube!berger

randy@ncifcrf.gov (The Computer Grue) (04/05/88)

In article <106600042@datacube> berger@datacube.UUCP writes:
>	One thing that makes us nervous is a problem we have seen on our
>	current setup. The problem is when one server is down but clients
>	have partitions of the downed server NFS mounted.  The clients get
>	bogged down even when they are not explicitly trying to access
>	partitions on the downed server. We are using soft mounts.... 

    I believe I understand this problem, and it might make useful
  information for many.  When a user logs in, the login program
  automatically runs the quota program for all mounted file systems.
  This looks for the file 'quotas' in the top directory of the mounted
  file system.  This is, of course, an NFS access, and if the system
  is down it can cause login to hang for a *long* (well, relatively
  long; about a minute per fs) time.  There are two solutions.  One
  (what I would recommend) is to mount all of those file systems with
  the noquota option in fstab; this should prevent the check.  The
  other (call it the quick and dirty method) is to make /usr/ucb/quota
  a link to /bin/true.  That will sort of blow away the problem (at
  the expense of quota not being runnable any more, but you get what
  you pay for . . .)
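
    For concreteness, the fstab entry would look something like this
  (hostname and paths made up; check fstab(5) on your release for the
  exact option spelling):

	server:/home/server	/home/server	nfs	rw,soft,noquota	0 0

  and the quick and dirty version is just:

	mv /usr/ucb/quota /usr/ucb/quota.real
	ln -s /bin/true /usr/ucb/quota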

    Just so no one thinks I'm trying to take credit, both of these
  suggestions originated with Sun in their tech bulletins.  I believe
  they were actually attributed to Chuq von Rospach.  In any case,
  they are rather useful.

						-- Randy



-- 
  Randy Smith    @	NCI Supercomputer Facility
  c/o PRI, Inc.		Phone: (301) 698-5660                  
  PO Box B, Bldng. 430  Uucp: ...!uunet!ncifcrf.gov!randy
  Frederick, MD 21701	Arpa: randy@ncifcrf.gov

meissner@xyzzy.UUCP (Michael Meissner) (04/07/88)

In article <106600042@datacube> berger@datacube.UUCP writes:
| 
| We are planning to have about 30 Sun 3/50's and 3/60's on our network.
| Our plan is to put 150Mbyte to 380Mbyte SCSI drives on all of these
| instead of running them diskless. It seems to be cheaper than having a
| few large nd servers. Personal and some group file systems would be on
| each of these local disks and they would generally be cross NFS
| mounted all around the net.
| 
| Does anyone have any experience with having many (30 or more) partitions
| cross mounted on many (30 or more) machines? Are there any impacts
| that we should be aware of?
| 
| One thing that makes us nervous is a problem we have seen on our
| current setup. The problem is when one server is down but clients
| have partitions of the downed server NFS mounted.  The clients get
| bogged down even when they are not explicitly trying to access
| partitions on the downed server. We are using soft mounts.... 

I do not have direct experience with NFS on Suns, but I do have some with
Data General's port of NFS to its native Unix (DG/UX).  However, since
the majority of UNIX implementations seem to handle file lookup the same
way, this should apply to other machines as well.

When you are scanning a directory (readdir, getdirent) and running stat
on the entries, and you hit an entry that is an NFS mounted disk, you
will either time out (soft mount) or wait until the server comes up
(hard mount).  In the classical UNIX implementation, the order searched
by readdir is directory slot order (ie, in a new directory . and ..
fill the first two slots, the next entry created gets the next free
slot, deleting entries puts the slots back on the free list, and so
forth).  Thus if you are searching for an entry "fred", but a previous
slot holds "barney", which is a mount point for a server that is
currently down, you will be pended.  Thus for frequently searched file
systems (/ and /usr in particular), you can put local files and
directories ahead of remote mount points by careful moves, renames, and
deletions.  For systems that use hashing, you probably don't have that
option, and have to hope for the best.

You may be saying that that is acceptable when you have to search the
entire directory for things like ls, and it typically is.  However, the
usual implementation of pwd is to open the parent directory and do
readdir until it finds an entry whose inode corresponds to ".".  This
is repeated for successive parent directories until you come to "/".
It turns out that lots of commands do a pwd (or popen("pwd"), getcwd(),
or getwd(), which amount to the same thing).
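
Here is a sketch of that classical loop, written as a tiny stand-alone
pwd (nobody's vendor source, and no error reporting or bounds checking;
it is only meant to show why the walk can stat a dead server's mount
point even though the directories you actually care about are local):

#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	char path[2048] = "";		/* built leaf-first, back toward "/" */
	char buf[2048], tmp[2048];
	struct stat cur, parent, sb;
	struct dirent *d;
	DIR *dp;
	int found;

	for (;;) {
		if (stat(".", &cur) < 0 || stat("..", &parent) < 0)
			exit(1);
		/* "." and ".." are the same file only at the root. */
		if (cur.st_dev == parent.st_dev && cur.st_ino == parent.st_ino)
			break;

		if ((dp = opendir("..")) == NULL)
			exit(1);
		found = 0;
		while ((d = readdir(dp)) != NULL) {
			(void) sprintf(buf, "../%s", d->d_name);
			/*
			 * This stat is the painful part: if the entry we
			 * happen to hit is a mount point for a down NFS
			 * server, we sit here, even though the directory
			 * we are really looking for is purely local.
			 */
			if (stat(buf, &sb) == 0 &&
			    sb.st_dev == cur.st_dev && sb.st_ino == cur.st_ino) {
				(void) sprintf(tmp, "/%s%s", d->d_name, path);
				(void) strcpy(path, tmp);
				found = 1;
				break;
			}
		}
		(void) closedir(dp);
		if (!found || chdir("..") < 0)
			exit(1);
	}
	(void) printf("%s\n", path[0] ? path : "/");
	return 0;
}
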
-- 
Michael Meissner, Data General.		Uucp: ...!mcnc!rti!xyzzy!meissner
					Arpa/Csnet:  meissner@dg-rtp.DG.COM

karl@triceratops.cis.ohio-state.edu (Karl Kleinpaste) (04/08/88)

berger@datacube.UUCP writes:
   We are planning to have about 30 Sun 3/50's and 3/60's on our network.
   Our plan is to put 150Mbyte to 380Mbyte SCSI drives on all of these
   instead of running them diskless. It seems to be cheaper than having a
   few large nd servers. Personal and some group file systems would be on
   each of these local disks and they would generally be cross NFS
   mounted all around the net.

   Does anyone have any experience with having many (30 or more) partitions
   cross mounted on many (30 or more) machines? Are there any impacts
   that we should be aware of?

We're working in that vicinity.  My particular 3/50 has 32 NFS
filesystems mounted, plus the ND partitions / and /pub.  Our
partitions are physically resident on a mix of Sun3/180 fileservers
and Pyramids.  We have been pretty nervous about moving to soft mounts
everywhere; a lot of programs behave quite well in the presence of
write I/O failures, but a lot don't, and we have to worry quite a bit
about confused undergrads wondering what Emacs is trying to tell them
when the message "file disappeared; hope that's OK" (or whatever the
actual text is; I forget just now) appears in the minibuffer.  Shell
">" redirection is particularly uninformative when it can't
successfully open the file; again I can't remember the exact
diagnostic, but it does not intuitively connect to a missing server.
Nonetheless, on the subnet where the staff Suns live, we have most
everything soft-mounted and it seems to be doing really quite well.
I've been advocating it for some months and we will probably get there
over summer.

One of the bigger problems to explain to people who do not fully grok
your network is why they can't get at their files when particular
servers are missing.  Our network is intended to be totally uniform:
every single 3/50 can be logged into by anyone who can login to any of
them, and an identical view of the world is presented on all of them.
We on the staff do personalized diddling (my desk 3/50 mounts 6
filesystems more than normal, for example), but in general one can
expect this total uniformity.  Now, a user whose files live on the Sun
server Fish doesn't understand immediately what's wrong when he can
login to a 3/50 named Ostrich (on the Bird subnet) but can't see the
files in his home directory when Fish is down.  It would have been
much more obvious that there was a problem if he'd tried to login on
Carp, since, as an ND client of Fish, he wouldn't have gotten anywhere
at all.  This sort of partial success is what drives our more naive
users bananas (not to mention the operators and consultants who are
answering questions).

Sooner or later, we're going to run into the minor headache that the
kernel's mount table will be too small to let us mount as many
filesystems as we want.  I don't remember just now where that can be
config'd, but the procedure exists and we have to plan on putting it
to use before too long.  You will have to do that, too.

In the realm of subjective personal opinion, I think your plan for
large numbers of discs spread across more-or-less personal machines is
an administrative nightmare.  We've found that the centralized server
approach is relatively lacking in pain to manage, whereas if we had
local discs on a large fraction of the systems here, we would never
get any real work done at all.  As it is, with 11 Sun fileservers
(roughly 12-20 3/50 clients apiece) and 3 Pyramids (soon to be 5),
major client reconfigs are something we delay until breaks between
academic quarters in the hope of not screwing up the work of too many
people at once.  Software updates can be simplified some via rdist and
similar tools, but still one must wonder about the sanity of that many
discs spread across that many systems over which I expect you don't
have full physical access control.  (Our servers live in protected,
locked machine rooms, of course.)  If you want to go that route, feel
free, but I don't envy you the task.

In any event, if little 3/50s are going to be providing disc service
to substantial numbers of other systems, your CPUs are going to be
screaming for relief.  At least try to do it all with 8Mb 3/60s lest
you be stuck with 4Mb 3/50s that spend all their time swapping.  One
advantage of lots of local disc is that you'll be able to swap to disc
rather than over the ethernet, but I would consider that small
consolation against the unavoidable overhead of that much NFS traffic.

--Karl

lmb@vsi1.UUCP (Larry Blair) (04/08/88)

<In article <106600042@datacube> berger@datacube.UUCP writes:
< 
< We are planning to have about 30 Sun 3/50's and 3/60's on our network.
< 
< One thing that makes us nervous is a problem we have seen on our
< current setup. The problem is when one server is down but clients
< have partitions of the downed server NFS mounted.  The clients get
< bogged down even when they are not explicitly trying to access
< partitions on the downed server. We are using soft mounts.... 

I'm not sure why things bog down, but we always mount hard,intr,retrans=10.
This eliminates the problem of lost writes to down servers that we kept
seeing with soft mounts.
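
In /etc/fstab terms that's something like this (server and path names
made up):

	server:/usr/server	/usr/server	nfs	rw,hard,intr,retrans=10	0 0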

barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (04/08/88)

In article <106600042@datacube> berger@datacube.UUCP writes:
| 
| Does anyone have any experience with having many (30 or more) partitions
| cross mounted on many (30 or more) machines? Are there any impacts
| that we should be aware of?

Someone else mentions disabling the quota checks. YES!!!

Another thing you can do (besides wait for SunOS 4.0) is to keep the
NFS partitions out of your search path.  That is, include
/usr/local/bin in your search path, but avoid /usr/server/local/bin
in case the server is down.  (You will find it very frustrating
when you can't open a new window because your .cshrc file sets the
path to include a down NFS partition.)

Instead, make a link from /usr/local/bin/prog or $HOME/bin/prog to
/usr/server/local/bin/prog.

This way, you will only get a timeout when you EXECUTE the program.
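
For example (same made-up names as above):

	ln -s /usr/server/local/bin/prog /usr/local/bin/prog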

Same thing with .rootmenu files - being unable to pop open your root
menu is also frustrating.

Another suggestion is to change the mount point from

	/usr/server
to
	/home/server
with a symbolic link from /usr/server to /home/server


You don't have to change the /etc/fstab entry, BTW.
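
Roughly (same made-up server name; presumably you can leave fstab alone
because the mount system call just follows the /usr/server symlink):

	umount /usr/server
	mkdir /home /home/server
	rmdir /usr/server
	ln -s /home/server /usr/server
	mount /usr/server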

What does this get you?  Well, you can do

	cd /usr
	ls -l
	du

without getting hung on down NFS machines.
Also, I believe SunOS 4.0 and 4.3BSD-Tahoe are moving to a similar
structure.

I have also learned to check out disk space with either

	df -t 4.2
or
	df &

The rule is - avoid accessing 'wired' NFS mounts in your menus, search
paths, shell scripts, etc.

Current count of diskless Suns: 131

-- 
	Bruce G. Barnett 	<barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP>
				uunet!steinmetz!barnett

leres@ucbarpa.Berkeley.EDU (Craig Leres) (04/10/88)

Another way to lose is to have directories in your path that are
on nfs filesystems. If the remote system is down, your login csh
will get hung up waiting for the readdir() to time out while it's
hashing your path.

		Craig

dannyb@kulcs.uucp (Danny Backx) (04/11/88)

In article <371@ncifcrf.ncifcrf.gov> randy@ncifcrf.gov (The Computer Grue) writes:
>  information for many.  When a user logs in, the login program
>  automatically runs the quota program for all mounted file systems.
>  This looks for the file 'quotas' in the top directory of the mounted
>  file system.  This is, of course, an NFS access, and if the system
>  is down it can cause login to hang for a *long* (well, relatively
>  long; about a minute per fs) time.  There are two solutions.  One
>  (what I would recommend) is to mount all of those file systems with
>  the noquota option in fstab; this should prevent the check.  The
>  other (call it the quick and dirty method) is to make /usr/ucb/quota
>  a link to /bin/true.  That will sort of blow away the problem (at
>  the expense of quota not being runnable any more, but you get what
>  you pay for . . .)

An easy fix is to change the /bin/login program, which contains the line

	char QUOTAWARN[] = "/usr/ucb/quota" ;

into
	char QUOTAWARN[] = "/usr/ucb/quota &" ;

Advantage: you don't have to wait a few minutes before getting into the system.
Disadvantage: the output from quota appears a bit later and may clobber your
screen contents...

It is a simple fix, though, that we have been using for a few months now.

	Danny

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Danny Backx                            |  mail: Katholieke Universiteit Leuven 
 Tel: +32 16 200656 x 3058              |        Dept. Computer Science
 E-mail: dannyb@kulcs.UUCP              |        Celestijnenlaan 200 A
         ... mcvax!prlb2!kulcs!dannyb   |        B-3030 Leuven
         dannyb@blekul60.BITNET         |        Belgium     

mouse@mcgill-vision.UUCP (der Mouse) (04/23/88)

In article <23567@ucbvax.BERKELEY.EDU>, leres@ucbarpa.Berkeley.EDU (Craig Leres) writes:
> Another way to lose is to have directories in your path that are on
> nfs filesystems. If the remote system is down, your login csh will
> get hung up waiting for the readdir() to time out while it's hashing
> your path.

In my case, the shell hangs in getwd(), because there's an nfs mount
point as a sibling of one of the directories in the chain getwd follows
from my ~ to /.

This is also true whenever I start something that does a getwd(), like
emacs.  Very annoying.

What NFS needs is per-host rather than per-operation timeouts.  When an
operation times out, all operations to that host fail immediately.  The
client's kernel periodically pings the host with a no-op request, and
when it gets an answer, it cancels the host's dead status.  I'd like to
put this into our VAX kernels, but I don't have time just now.
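
A user-level sketch of the bookkeeping I have in mind (names are made
up, the periodic probe here is just letting one real request through
now and then rather than a separate no-op pinger, and real kernel code
would look nothing like this):

#include <stdio.h>
#include <time.h>

#define PROBE_INTERVAL	30	/* seconds between probes of a dead host */

struct nfs_host {
	char	*name;
	int	dead;		/* set when any request to the host times out */
	time_t	last_probe;	/* last time we let a probe through */
};

/* Any request that times out marks the whole host dead. */
void
mark_dead(struct nfs_host *h)
{
	h->dead = 1;
	h->last_probe = time((time_t *) 0);
}

/* Any reply at all marks it alive again. */
void
mark_alive(struct nfs_host *h)
{
	h->dead = 0;
}

/*
 * Checked before every request: if the host is known dead, fail right
 * away instead of waiting out another timeout, but let one request
 * through every PROBE_INTERVAL seconds so we notice when it comes back.
 */
int
host_usable(struct nfs_host *h)
{
	if (!h->dead)
		return 1;
	if (time((time_t *) 0) - h->last_probe >= PROBE_INTERVAL) {
		h->last_probe = time((time_t *) 0);
		return 1;
	}
	return 0;
}

int
main(void)
{
	struct nfs_host fish;

	fish.name = "fish";
	fish.dead = 0;
	fish.last_probe = 0;
	mark_dead(&fish);
	/* Prints 0: the request fails at once instead of hanging. */
	(void) printf("usable right after a timeout: %d\n", host_usable(&fish));
	return 0;
}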

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu