[comp.protocols.nfs] NFS Mount Point Strategy

geoff@bodleian.East.Sun.COM (Geoff Arnold @ Sun BOS - R.H. coast near the top) (12/03/90)

[Since this is of interest to folks who might not read comp.sys.apollo
and comp.unix.admin, where it was originally posted, I am reposting
this to comp.protocols.nfs with Mike's permission. Geoff]

Quoth system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) (in <1990Nov28.150038.3610@alchemy.chem.utoronto.ca>):

Here is the summary of the e-mail and posted responses to my questions
about NFS mount point strategies:

>1) What options should I use on various systems for the mount command
>   (e.g. soft vs. hard, use bg or not, use 'hard,bg', retry counts,
>   timeouts)?

Responses:

I use hard. I used soft for a little while, but one of my users lost
some data when the server crashed (his program didn't check for errors
from printf(), so the failed writes went unnoticed).

Soft.

For read-only file systems use the soft option; for read/write use hard.
Use the default retry and timeout values unless you start getting
"system_xyz not responding" messages, in which case you can increase them.

The only reason I can see for not using bg would be if the machine
depends on information on the file system to configure itself.

ALWAYS use "bg"; it just means your clients won't hang as badly on bootup
when a server is down.  I use "hard,intr" mounts for filesystems that are
writable so that I get the best data integrity while still giving people a
chance to kill a hung process.  "soft,ro" is a nice combination for stuff
like man pages and reference sources that programs don't depend on.

Summary: use 'hard,bg' where integrity matters, 'soft' where it doesn't
(probably mainly useful with 'ro').
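As a rough illustration of the summary above, here is a small C sketch of a hypothetical checker (check_nfs_opts is not part of any NFS distribution) that takes a comma-separated option list, such as the options field of an /etc/fstab entry, and flags the combinations the respondents warn against. It assumes a mount is writable unless "ro" appears, and it treats "intr" on hard mounts as one respondent's preference rather than a rule.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the comma-separated list `opts` contains exactly the option `opt`. */
static bool has_opt(const char *opts, const char *opt)
{
    size_t n = strlen(opt);
    for (const char *p = opts; *p != '\0'; ) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        if (len == n && strncmp(p, opt, n) == 0)
            return true;
        if (comma == NULL)
            break;
        p = comma + 1;
    }
    return false;
}

/* Returns NULL if the options follow the summary, else a short warning.
 * Hypothetical policy: soft only with ro, always bg, hard mounts with intr. */
const char *check_nfs_opts(const char *opts)
{
    bool soft = has_opt(opts, "soft");
    bool ro = has_opt(opts, "ro");

    if (soft && !ro)
        return "writable soft mount: a timed-out write returns an error the program may ignore";
    if (!has_opt(opts, "bg"))
        return "no bg: the client may hang at boot if the server is down";
    if (!soft && !has_opt(opts, "intr"))
        return "hard mount without intr: hung processes cannot be interrupted";
    return NULL;
}
```

For example, check_nfs_opts("soft,bg") returns the writable-soft warning, while "soft,bg,ro" and "hard,bg,intr" pass.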


>2) What directory structure is best for the actual mount points:
>   a) mount "system:/dir" on /system_dir and let the users refer to 
>      /system_dir/..... (so that the user reference point is the
>      actual mount point)?
>   b) mount "system:/dir" on /nfs/system_dir and let the users refer to 
>      /system_dir/..... where /system_dir is a link to /nfs/system_dir
>      (so that the user reference point is 1 link removed from the
>      actual mount point)?
>   c) mount "system:/dir" on /nfs/system_dir and let the users refer to 
>      /system_dir/..... where /system_dir is a link to /net/system_dir
>      and /net/system_dir is a link to /nfs/system_dir
>      (so that the user reference point is 2 links removed from the
>      actual mount point)?

Summary: almost everyone preferred option b).

Responses:

I use two technical guidelines:
1. Avoid mount points from different hosts in the same directory.
2. Avoid mounting an NFS file system on top of another.
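The two guidelines can be checked mechanically. The following is a minimal C sketch, assuming a hypothetical list of (server, local mount point) pairs with absolute paths and no trailing slashes; struct nfs_mount and count_violations are illustrative names, not an existing tool.

```c
#include <stddef.h>
#include <string.h>

struct nfs_mount {
    const char *host;   /* server name */
    const char *dir;    /* absolute local mount point, no trailing '/' */
};

/* Length of the parent-directory part of an absolute path ("/nfs/x/usr" -> 6). */
static size_t parent_len(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? (size_t)(slash - path) : 0;
}

/* True if `inner` lies strictly below `outer` ("/a" contains "/a/b"). */
static int is_below(const char *outer, const char *inner)
{
    size_t n = strlen(outer);
    return strncmp(outer, inner, n) == 0 && inner[n] == '/';
}

/* Count violations of the two guidelines over every pair of mounts. */
int count_violations(const struct nfs_mount *m, size_t n)
{
    int bad = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            size_t pi = parent_len(m[i].dir), pj = parent_len(m[j].dir);
            /* Guideline 1: mount points from different hosts in one directory. */
            if (strcmp(m[i].host, m[j].host) != 0 && pi == pj &&
                strncmp(m[i].dir, m[j].dir, pi) == 0)
                bad++;
            /* Guideline 2: one NFS mount point inside another. */
            if (is_below(m[i].dir, m[j].dir) || is_below(m[j].dir, m[i].dir))
                bad++;
        }
    }
    return bad;
}
```

For example, mounting x:/usr on /x_usr and y:/usr on /y_usr counts as one violation of the first guideline, while /nfs/x/usr and /nfs/y/usr count as none.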

The choice here depends on several criteria.
One major one is that you don't want anyone to hang on a down NFS server
if they are not using that server.  This means that the mount points
have to be `out of the way' when users are doing a pwd (which searches
backward up the directory tree to find the current position) or something similar.

We actually use /rmt/<servername>/<fsname> so it's easier to figure
out where everything is, and we have a symlink from /<fsname> to
the actual mount point.  This is really important, since getwd()
and 'pwd' can hang if they stumble across a remote mount point as
they walk up the tree looking for the right components.  If you can't
do symbolic links, you're kind of stuck, though.
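A minimal C sketch of that layout, assuming a POSIX system: make_rmt_link is a hypothetical helper that creates the empty directory /rmt/<servername>/<fsname> to mount on, plus the /<fsname> symlink pointing at it. The mount itself is still done separately (by mount or from the fstab); the root argument exists only so the layout can be tried out in a scratch directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* mkdir() that treats "already exists" as success. */
static int mkdir_ok(const char *path)
{
    return (mkdir(path, 0755) == 0 || errno == EEXIST) ? 0 : -1;
}

/* Create the empty mount point <root>/rmt/<server>/<fs> and the symlink
 * <root>/<fs> -> <root>/rmt/<server>/<fs>.  Use root = "" on a real
 * client; any other value puts the layout in a scratch directory. */
int make_rmt_link(const char *root, const char *server, const char *fs)
{
    char dir[4096], linkpath[4096];

    snprintf(dir, sizeof dir, "%s/rmt", root);
    if (mkdir_ok(dir) != 0)
        return -1;
    snprintf(dir, sizeof dir, "%s/rmt/%s", root, server);
    if (mkdir_ok(dir) != 0)
        return -1;
    snprintf(dir, sizeof dir, "%s/rmt/%s/%s", root, server, fs);
    if (mkdir_ok(dir) != 0)
        return -1;

    snprintf(linkpath, sizeof linkpath, "%s/%s", root, fs);
    return symlink(dir, linkpath);   /* 0 on success, -1 on error */
}
```

Users then refer only to /<fsname>; because the symlink lives in / but the mount point sits two levels further down, a getwd() walk through / stats the local symlink rather than the remote mount.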

Well, here at OSU, all of our machines run NFS and we ran into a BIG
problem with inconsistent naming, so we thought a bit and came up with
this: mount all partitions under /nfs/machinename/partition.

Question: Is there some advantage I'm missing to having everything mounted
under /nfs?  E.g., why not just /machinename/partition?
Reply: You'll regret it if you don't have your nfs mounts 3 levels down.  The
method used by pwd (and by the C library getwd() routine, which does
what pwd does to determine the current directory) necessitates walking
up the directory tree and doing a stat() on each directory in .. to
find out where it came from (save the inode number of . before you
move up to .. and then compare that against the inode number of each
directory in the new current directory).  When it does a stat() on an
nfs mounted directory whose nfs server is down, you'll hang.  csh
uses getwd() to initialize the csh variable $cwd, so users will hang
when logging in if one of the nfs servers is down.  Likewise, every
time you do a cd, csh uses getwd() to set $cwd.
So each nfs file system has to be mounted on a directory that is the
only directory in its parent; i.e., it must not have any "sisters" or
"brothers".
I can't remember why they have to be 3 levels down instead of only 2;
someone else can probably explain why.
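The "no sisters or brothers" rule is easy to check on a client. Here is a small C sketch, assuming POSIX opendir()/readdir(); count_siblings is a hypothetical name. readdir() only reads names, while the expensive part of getwd() is the stat() on each sibling. This helper deliberately never stats anything, so it should be safe to run even while a server is down, provided the parent directory itself is local.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Count the entries that share a parent directory with `mountpoint`
 * (ignoring ".", ".." and the mount point itself).  Zero means it has
 * no "sisters or brothers" for a getwd() walk to stat on the way up.
 * Returns -1 for a relative path, a trailing '/', or an unreadable parent. */
int count_siblings(const char *mountpoint)
{
    if (mountpoint[0] != '/')
        return -1;
    const char *slash = strrchr(mountpoint, '/');
    if (slash[1] == '\0')
        return -1;

    char parent[4096];
    if (slash == mountpoint)
        snprintf(parent, sizeof parent, "/");
    else
        snprintf(parent, sizeof parent, "%.*s", (int)(slash - mountpoint), mountpoint);
    const char *self = slash + 1;

    DIR *d = opendir(parent);
    if (d == NULL)
        return -1;
    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0 &&
            strcmp(e->d_name, self) != 0)
            n++;
    }
    closedir(d);
    return n;
}
```

A mount point directly in / will almost always report siblings (usr, etc, and so on), which is exactly the situation the reply warns about.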

One problem I can see with mounting everything in the / directory is
the performance of the getwd() function. getwd() backtracks through all the
directories via .., stat'ing the directory entries at each level, i.e.
   until root do
   {
	get i-node (and device) of current dir
	cd ..
	for every dir entry
	{
		stat it
		compare with saved i-node and device
		if they match
		{
			we have discovered part of the path: prepend it to the name
			break
		}
	}
   }
(* hope you get the idea *) 
Now if you have all the NFS mounted stuff in /, EVERY getwd means
stat'ing NFS mounted volumes. This of course works, EXCEPT if one NFS
server is down, because then the stat will hang waiting for time-outs etc.,
meaning that
# pwd 
(and any other program which uses getwd() :-() will more or less hang.

Mounting everything in /nfs means that you will only hang if you are
below /nfs: a major improvement. Still, if you use pwd in an NFS
file system, you'll have the same problem as described before, so
if one NFS server is down, getwd() is down for _ALL_ NFS file systems.

There was an interesting paper at the last EUUG/EurOpen conference
(Autumn '90) about how they solved the problem at Chalmers University.
You may want to get hold of a copy from:
   EUUG Secretariat 
   Owles Hall
   Buntingford
   Herts SG9 9PL
   UK

>Is there some advantage I'm missing to having everything mounted under /nfs?
>E.g., why not just /machinename/partition?
Aside from the problems of getwd(), there is also the simple fact that
/ gets _awfully_ cluttered if you have a lot of servers.  I have 31
servers in my fstabs; I don't want an extra 31 directories in /.

>We mount system:dir on /nfs/system/dir and have a symbolic link to this.
>This has the advantage that when getwd() searches a directory, it never
>looks at unnecessary remote mount points.
This does not necessarily fix the hanging problem for SunOS 4.0.x systems.
The getwd() algorithm was changed so that every time a mount point is
crossed, getwd checks /etc/mtab and tries to find a mount point with
the same device id. If it finds one, it prepends the path for this
mount point to the current path (derived so far). While this means that
getwd doesn't walk all the way up the tree to /, it may stat most of the
entries in /etc/mtab, which of course could make things worse. Sun uses
a getwd cache to get around this problem, which in turn leads to other
problems...
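To see why the SunOS 4.0.x approach can still hang, here is a rough C analogue of the lookup described above. It is not Sun's code: it assumes a Linux/glibc-style <mntent.h> (setmntent(), getmntent(), endmntent()), reads /etc/mtab, and falls back to /proc/self/mounts. mount_point_for_dev is a hypothetical name. Each candidate costs one stat() of a mount directory, and that stat() is what blocks when the entry is an NFS mount whose server is down.

```c
#define _GNU_SOURCE
#include <mntent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Scan the mount table for a mount directory on device `dev` and copy its
 * path into `out`.  Returns 0 if one is found, -1 otherwise. */
int mount_point_for_dev(dev_t dev, char *out, size_t size)
{
    FILE *mt = setmntent("/etc/mtab", "r");
    if (mt == NULL)
        mt = setmntent("/proc/self/mounts", "r");   /* Linux fallback */
    if (mt == NULL)
        return -1;

    struct mntent *m;
    struct stat st;
    int rc = -1;
    while ((m = getmntent(mt)) != NULL) {
        /* One stat() per mtab entry until a match: a dead server blocks here. */
        if (stat(m->mnt_dir, &st) == 0 && st.st_dev == dev) {
            snprintf(out, size, "%s", m->mnt_dir);
            rc = 0;
            break;
        }
    }
    endmntent(mt);
    return rc;
}
```

The scan stops at the first match, so entries listed before the one being looked up are always stat'ed; a down server anywhere early in the table stalls every lookup behind it.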


>3) Does the answer to 2) depend on the answer to 1), and/or the
>   reliability of the systems involved?

Response:

Where you mount a remote file system has nothing to do with the mount
options, unless you have some policy about where you mount read-only file
systems.


>4) What naming schemes are used to handle the large number of potential
>   NFS mounts (for example, Physics/Astronomy/CITA here give each
>   disk/partition a name (of a tree from the forest), and Apollo
>   suggests systemname_dir; I can see advantages of both schemes since
>   the former makes disk names consistent everywhere and users don't
>   need to know what physical systems files really reside on, whereas 
>   the latter brings some order, especially for the sysadmin)?

Responses:

I mount all the filesystems under "/n/<system>/<disk>",
i.e. "/n/shape/0", "/n/shape/1", "/n/point/0", etc. This way I
can NFS mount "/n/shape/0" on all the clients as "/n/shape/0", which
means the users don't have to worry about their home directory being
in a different place on different machines.

We use real mount points with names of the form /net/hostname/fsname and use
symlinks to point to these mount points.
We name file systems containing user home directories things like /cis/reef0
or /cis/lightning1, where reef and lightning are server names.  These names
are true mount points on the server itself, but everywhere else they are
symlinks to /net/reef/0 and /net/lightning/1.  We also have a few directories
like /cis/tmp, /cis/adm and /cis/src, which are symlinks to /net/beach/tmp,
/net/manatee/adm and /net/manatee/src.
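The /cis home-directory names encode the server in the name itself. Assuming the server name is everything before a trailing number (an inference from the examples reef0 and lightning1, not a stated rule), a hypothetical C helper can derive the /net mount point that such a symlink should point at:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Split a name such as "reef0" into server "reef" and file system "0" and
 * write "/net/reef/0" into `out`.  Returns -1 if there is no trailing
 * number, no server part, or `out` is too small. */
int cis_to_net(const char *name, char *out, size_t size)
{
    size_t len = strlen(name), i = len;
    while (i > 0 && isdigit((unsigned char)name[i - 1]))
        i--;                                  /* back over trailing digits */
    if (i == 0 || i == len)
        return -1;
    int n = snprintf(out, size, "/net/%.*s/%s", (int)i, name, name + i);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Names without a trailing number, such as /cis/tmp (which points at /net/beach/tmp), carry no server information, so those links still have to be listed by hand.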

From an administration point of view, it is very useful to have a
strict scheme for mounting file systems.  Packages that insist on being
in certain places can be pacified with an appropriate symbolic link.
The scheme used here in our network (MIPS M/120s, MIPS 3000s, DECstation
3100s, Sun 4s, Sun 3s and Vaxen) is the following:
Everything from machine x is mounted under /nfs/x.  For example, to
mount /users/cs/staff, /usr/lib/tex and /usr/local from machine x on
machine y, we use
        x:/users/cs/staff       mounted on      /nfs/x/users.cs.staff
        x:/usr                  mounted on      /nfs/x/usr
with symbolic links
        /users/cs/staff         ->              /nfs/x/users.cs.staff
        /usr/lib/tex            ->              /nfs/x/usr/lib/tex
        /usr/local              ->              /nfs/x/usr/local
(and if X11 was kept in /usr/local/X11 [as it is here])
        /usr/bin/X11            ->              /usr/local/X11/bin
        /usr/lib/X11            ->              /usr/local/X11/lib
        /usr/include/X11        ->              /usr/local/X11/include
The local aim, as regards how users access things on different systems, is
to use symbolic links to provide a single name on all computers.  If the
user wants to know where something actually resides, he can use the
df command to find out (i.e., "df ." will tell you where the current directory
physically resides).
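The naming rule in the mount table above appears to be "drop the leading slash of the remote path and turn the remaining slashes into dots". Assuming that reading (inferred from the two examples, x:/usr and x:/users/cs/staff), a hypothetical C helper can compute the local mount point for a server:path spec:

```c
#include <stdio.h>
#include <string.h>

/* Map "host:/remote/path" to "/nfs/host/remote.path".  Returns 0 on
 * success, -1 for a malformed spec or if `out` is too small. */
int nfs_mount_point(const char *spec, char *out, size_t size)
{
    const char *colon = strchr(spec, ':');
    if (colon == NULL || colon == spec || colon[1] != '/' || colon[2] == '\0')
        return -1;
    int n = snprintf(out, size, "/nfs/%.*s/%s", (int)(colon - spec), spec, colon + 2);
    if (n < 0 || (size_t)n >= size)
        return -1;
    /* Flatten only the part after "/nfs/<host>/" (6 fixed chars + host). */
    for (char *p = out + 6 + (colon - spec); *p != '\0'; p++)
        if (*p == '/')
            *p = '.';
    return 0;
}
```

Flattening keeps each remote file system exactly one level below /nfs/<host>, so no NFS mount ever sits on top of another.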

Another thing you want to think about is whether your pathnames look
the same no matter where you are (i.e., on a machine for which the fs
is local or one for which it is nfs).  If this is the case, you can
have one home directory, run yp, and everything is nice and
transparent.  On our cluster, all user filesystems, both local and nfs,
are mounted underneath /home -- I wish this were the convention for all
machines I have accounts on...

In general, make data available through:
	/<class>/<instance>
where <class> could be "home" for home directories, "vol" for volumes
(i.e. applications and other mostly read-only data), "source" for public
domain or free source code that you have collected, "project" for project
directories, "distrib" for software distributions, etc. Some examples:
	/home/john	John's home dir
	/home/mary	Mary's home dir
	/vol/frame	default FrameMaker application directory
	/vol/frame-2.1	explicit FrameMaker 2.1 application directory
	/vol/emacs	GNU Emacs
	/project/xyz	Project xyz
	/project/abc	Project abc
	/source/emacs-18.55 GNU Emacs 18.55 sources
	/source/nn-6.3	nn 6.3 sources
	/distrib/frame-2.1 FrameMaker 2.1 SunView distribution
	/distrib/framex-2.1 FrameMaker 2.1 X-Windows distribution
Get it?! I consider having the server name (or partition, for that
matter) somewhere in a path a bad thing. It means that you have to tell
everyone when you move, say, an application to another
partition or server.
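The point of the /<class>/<instance> scheme is that only one place needs to know where each name really lives. A minimal C sketch of that idea, using a made-up location table (the servers alpha and beta and their export paths are hypothetical); in practice an automounter map would typically play this role:

```c
#include <stddef.h>
#include <string.h>

/* The one place that knows which server holds each logical name. */
struct location {
    const char *logical;   /* name users see, e.g. "/vol/frame" */
    const char *remote;    /* where it currently lives, "server:/path" */
};

static const struct location table[] = {
    { "/home/john",     "alpha:/export/home/john" },
    { "/vol/frame",     "beta:/export/frame-2.1"  },   /* default version */
    { "/vol/frame-2.1", "beta:/export/frame-2.1"  },   /* explicit version */
    { "/vol/emacs",     "beta:/export/emacs"      },
};

/* Look up where a logical name currently lives; NULL if unknown. */
const char *locate(const char *logical)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].logical, logical) == 0)
            return table[i].remote;
    return NULL;
}
```

Moving FrameMaker to another server then means editing the table rows; /vol/frame and /vol/frame-2.1 keep working for every user without anyone being told.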

BTW I am planning to write an article on this and am preparing a
presentation on the what and how. Be patient; I'll try to get it posted
in this newsgroup.  Martien F. van Steenbergen

You may want to read a paper that some colleagues and I wrote
recently.  It's called "The Depot: A Framework for Sharing Software
Installation Across Organizational and Unix Platform Boundaries".
It details a mechanism we came up with here in our center for sharing
large application distributions such as X Windows, GNU software, and
FrameMaker.
You can get the paper from durer.cme.nist.gov (129.6.32.4) in a file
called ~ftp/pub/depot.lisa.ps.Z.  This paper was presented at the LISA
conference in Colorado Springs last month.  I welcome all to read it.
We use the automounter here extensively for this type of thing, and
the depot paper outlines some naming conventions we decided on here.
Walter Rowe

A couple of others and I came up with the following scheme, which
has been in use at Island Graphics for a couple of years:
Every machine with a disk has a /usr/machinename/people and
/usr/machinename/projects.  For instance, my home is
/usr/bermuda/people/daniel, and I can log into any other island machine
and access it as /usr/bermuda/people/daniel.  This simplifies things
a lot, since we can talk about a project directory (/usr/java/projects/whatever)
or something in someone's account by the same absolute name wherever
we're logged in.   It was fun throwing the /usr2 name out the window.
Another benefit is that all of us have just one big account, rather than
10 or 15 little ones (one on each machine).

We're also starting to drift towards a lot of automounting, since
all the crossmounting on our machines is getting to be a bit much.


Thanks to the following for e-mail/posting responses:

alden@shape.mps.ohio-state.edu (Dave Alden)
seeger@manatee.cis.ufl.edu (F. L. Charles Seeger III)
srm@unify.com (Steve Maraglia)
larry@stretch.cs.mun.ca (Larry Bouzane)
thurlow@convex.com (Robert Thurlow)
richard@aiai.ed.ac.uk (Richard Tobin)
larsen@prism.cs.orst.edu (Scott Larsen)
de5@ornl.gov (Dave Sill)
rusty@belch.Berkeley.EDU (rusty wright)
faustus@ygdrasil.Berkeley.EDU (Wayne A. Christopher)
mike@vlsivie.tuwien.ac.at (Michael K. Gschwind)
martien@westc.uucp (Martien F. van Steenbergen)
rowe@cme.nist.gov (Walter Rowe)
dansmith@well.sf.ca.us (Daniel Smith)
karl_kleinpaste@cis.ohio-state.edu
kseshadr@quasar.intel.com (Kishore Seshadri)
-- 
Mike Peterson, System Administrator, U/Toronto Department of Chemistry
E-mail: system@alchemy.chem.utoronto.ca
Tel: (416) 978-7094                  Fax: (416) 978-8775


-- Geoff Arnold, PC-NFS architect, Sun Microsystems. (geoff@East.Sun.COM)   --
   *** "Now is no time to speculate or hypothecate, but rather a time ***
   *** for action, or at least not a time to rule it out, though not  ***
   *** necessarily a time to rule it in, either." - George Bush       ***