geoff@bodleian.East.Sun.COM (Geoff Arnold @ Sun BOS - R.H. coast near the top) (12/03/90)
[Since this is of interest to folks who might not read comp.sys.apollo and comp.unix.admin, where it was originally posted, I am reposting this to comp.protocols.nfs with Mike's permission. Geoff]

Quoth system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) (in <1990Nov28.150038.3610@alchemy.chem.utoronto.ca>):

Here is the summary of the e-mail and posted responses to my questions about NFS mount point strategies:

>1) What options should I use on various systems for the mount command
>   (e.g. soft vs. hard, use bg or not, use 'hard,bg', retry counts,
>   timeouts)?

Responses:

I use hard - I used soft for a little while but one of my users lost some data when the server crashed (his program didn't check for errors on the printf).

Soft.

For read-only file systems use the soft option. For read/write use the hard option. Use the defaults for retry and timeouts unless you start getting "system_xyz not responding" messages; then you can increase them. The only reason I can see for not using bg would be if the machine depends on information on the file system to configure itself.

ALWAYS use "bg"; it just means your clients won't hang as badly on bootup when a server is down.

I use "hard,intr" mounts for filesystems that are writable so that I get the best data integrity while leaving people a chance to kill a hung process. "soft,ro" is a nice combination for stuff like man pages and reference sources that programs don't depend on.

Summary: use 'hard,bg' where integrity matters, 'soft' where it doesn't (probably mainly useful with 'ro').

>2) What directory structure is best for the actual mount points:
>   a) mount "system:/dir" on /system_dir and let the users refer to
>      /system_dir/..... (so that the user reference point is the
>      actual mount point)?
>   b) mount "system:/dir" on /nfs/system_dir and let the users refer to
>      /system_dir/..... where /system_dir is a link to /nfs/system_dir
>      (so that the user reference point is 1 link removed from the
>      actual mount point)?
>   c) mount "system:/dir" on /nfs/system_dir and let the users refer to
>      /system_dir/..... where /system_dir is a link to /net/system_dir
>      and /net/system_dir is a link to /nfs/system_dir
>      (so that the user reference point is 2 links removed from the
>      actual mount point)?

Summary: almost everyone preferred option b).

Responses:

I use two technical guidelines:
  1. Avoid mount points from different hosts in the same directory.
  2. Avoid mounting an NFS file system on top of another.

The choice here depends on several criteria. One major one is that you don't want anyone to hang on a down NFS server if they are not using that server. This means that the mount points have to be `out of the way' when users are doing a pwd (search backward in the directory tree to find the current position) or something similar.

We actually use /rmt/<servername>/<fsname> so it's easier to figure out where everything is, and we have a symlink from /<fsname> to the actual mount point. This is really important, since getwd() and 'pwd' can hang if they stumble across a remote mount point as they walk up the tree looking for the right components. If you can't do symbolic links, you're kind of stuck, though.

Well, here at OSU, all of our machines run NFS and we ran into a BIG problem with inconsistent naming, so we thought a bit and came up with this: mount all partitions under /nfs/machinename/partition.

Question: Is there some advantage I'm missing to having everything mounted under /nfs? E.g., why not just /machinename/partition?

Reply: You'll regret it if you don't have your nfs mounts 3 levels down. The method used by pwd (and by the C library getwd() routine, which does what pwd does to determine the current directory) necessitates walking up the directory tree and doing a stat() on each directory in .. to find out where it came from (save the inode number of . before you move up to .. and then compare that against the inode number of each directory in the new current directory). When it does a stat() on an nfs mounted directory whose nfs server is down, you'll hang. csh uses getwd() to initialize the csh variable $cwd, so users will hang when logging in if one of the nfs servers is down. Likewise, every time you do cd, csh uses getwd() to set $cwd. So each nfs mount has to be mounted on a directory that must be the only directory in its parent; i.e., it must not have any "sisters" or "brothers". I can't remember why they have to be 3 levels down instead of only 2; someone else can probably explain why.

One problem I can see with mounting everything in the / directory is the performance of the getwd() function. getwd() backtracks through the directories with .., stat'ing all the directory entries, i.e.

    until root do {
        get i-node of current dir
        cd ..
        for every dir entry {
            stat
            compare with i-node
            if they match we have discovered part of path,
                add it to name && break;
        }
    }

(* hope you get the idea *)

Now if you have all the NFS mounted stuff in /, EVERY getwd means stat'ing NFS mounted volumes. This of course works, EXCEPT if one NFS server is down, because then the stat will hang and wait for time-outs etc., meaning that

    # pwd

(and any other program which uses getwd() :-() will more or less hang. Mounting everything in /nfs means that you will only hang if you are below /nfs - a major improvement, but still: if you use pwd in an NFS file system, you'll have the same problem as described before, so if one NFS server is down, getwd() is down for _ALL_ NFS file systems.

There was an interesting paper at the last EUUG/EurOpen conference (Autumn '90) on how they solved the problem at Chalmers University. You may want to get hold of a copy from:

    EUUG Secretariat
    Owles Hall
    Buntingford
    Herts SG9 9PL
    UK
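To make the cost of that walk concrete, here is a minimal C sketch of the getwd()-style loop described above, assuming a POSIX system with opendir()/readdir()/lstat(). walk_getwd() is a hypothetical name, not the real libc getwd(); it builds "./..", "./../.." paths instead of calling chdir(), uses fixed 4096-byte buffers, and, like the pseudocode, lstat()s the entries of each parent until it finds the one whose (device, inode) pair matches. Production getwd() implementations typically avoid some of these stat() calls, so treat this as an illustration only.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>   /* chdir(), getcwd(): handy for checking the result */

#define BUF 4096      /* assumed path limit for this sketch */

/* Hypothetical helper: store the absolute path of the current directory
 * in out by walking up through "..", as classic getwd() did.
 * Returns 0 on success, -1 on any error. */
static int walk_getwd(char *out, size_t outlen)
{
    char here[BUF] = ".";   /* relative path to the directory being named */
    char result[BUF] = "";  /* absolute path, built right to left */
    struct stat cur, par;

    assert(out != NULL && outlen > 0);
    if (lstat(here, &cur) != 0)
        return -1;

    for (;;) {
        char parent[BUF];
        snprintf(parent, sizeof parent, "%s/..", here);
        if (lstat(parent, &par) != 0)
            return -1;
        /* At the root, ".." is the same directory as ".". */
        if (par.st_dev == cur.st_dev && par.st_ino == cur.st_ino)
            break;

        DIR *d = opendir(parent);
        if (d == NULL)
            return -1;
        int found = 0;
        struct dirent *e;
        while (!found && (e = readdir(d)) != NULL) {
            if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
                continue;
            char entry[BUF];
            struct stat st;
            snprintf(entry, sizeof entry, "%s/%s", parent, e->d_name);
            /* This lstat() blocks if entry is a mount point whose NFS
             * server is down: until the mount times out (soft) or
             * indefinitely (hard). */
            if (lstat(entry, &st) != 0)
                continue;
            if (st.st_dev == cur.st_dev && st.st_ino == cur.st_ino) {
                char tmp[BUF];
                snprintf(tmp, sizeof tmp, "/%s%s", e->d_name, result);
                strcpy(result, tmp);
                found = 1;
            }
        }
        closedir(d);
        if (!found)
            return -1;

        cur = par;              /* move one level up */
        strcpy(here, parent);
    }

    if (result[0] == '\0')
        strcpy(result, "/");
    if (strlen(result) >= outlen)
        return -1;
    strcpy(out, result);
    return 0;
}
```

For example, after chdir("/tmp") it should return the same string as getcwd(). Note that the loop only scans a directory's entries when the current directory lies beneath it, which is why the respondents keep remote mount points in directories of their own rather than directly in /.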
Aside from the problems of getwd(), there is also the simple fact that / gets _awfully_ cluttered if you have a lot of servers. I have 31 servers in my fstabs; I don't want an extra 31 directories in /.

>We mount system:dir on /nfs/system/dir and have a symbolic link to this.
>This has the advantage that when getwd() searches a directory, it never
>looks at unnecessary remote mount points.

This does not necessarily fix the hanging problem for SunOS 4.0.x systems. The getwd() algorithm was changed so that, every time a mount point is crossed, getwd checks /etc/mtab and tries to find a mount point with the same device id. If it does find one, it prepends the path for this mount point to the current path (derived so far...). While this means that getwd doesn't walk all the way up the tree to /, it may stat most of the entries in /etc/mtab, which of course could make things worse... Sun uses a getwd cache to get around this problem, which in turn leads to other problems...

>3) Does the answer to 2) depend on the answer to 1), and/or the
>   reliability of the systems involved?

Response: Where you mount a remote file system has nothing to do with the mount options, unless you have some policy about where you mount read-only file systems.

>4) What naming schemes are used to handle the large number of potential
>   NFS mounts (for example, Physics/Astronomy/CITA here give each
>   disk/partition a name (of a tree from the forest), and Apollo
>   suggests systemname_dir; I can see advantages of both schemes since
>   the former makes disk names consistent everywhere and users don't
>   need to know what physical systems files really reside on, whereas
>   the latter brings some order, especially for the sysadmin)?
Responses:

I mount all the filesystems under "/n/<system>/<disk>", i.e. "/n/shape/0", "/n/shape/1", "/n/point/0", etc. This way I can NFS mount "/n/shape/0" on all the clients as "/n/shape/0", which means the users don't have to worry about their home directory being in a different place on different machines.

We use real mount points with names of the form /net/hostname/fsname and use symlinks to point to these mount points. We name file systems containing user home directories things like /cis/reef0 or /cis/lightning1, where reef and lightning are server names. These names are true mount points on the server itself, but everywhere else they are symlinks to /net/reef/0 and /net/reef/1. We also have a few directories like /cis/tmp, /cis/adm and /cis/src, which are symlinks to /net/beach/tmp, /net/manatee/adm and /net/manatee/src.

From an administration point of view, it is very useful to have a strict scheme for mounting file systems. Packages that insist on being in certain places can be pacified with an appropriate symbolic link.

The scheme used here in our network (MIPS M/120s, MIPS 3000s, DECstation 3100s, Sun 4s, Sun 3s and Vaxen) is the following: everything from machine x is mounted under /nfs/x. I.e., if we wanted to mount /users/cs/staff, /usr/lib/tex and /usr/local from machine x on machine y:

    x:/users/cs/staff   mounted on   /nfs/x/users.cs.staff
    x:/usr              mounted on   /nfs/x/usr

with symbolic links

    /users/cs/staff  -> /nfs/x/users.cs.staff
    /usr/lib/tex     -> /nfs/x/usr/lib/tex
    /usr/local       -> /nfs/x/usr/local

(and if X11 was kept in /usr/local/X11 [as it is here])

    /usr/bin/X11     -> /usr/local/X11/bin
    /usr/lib/X11     -> /usr/local/X11/lib
    /usr/include/X11 -> /usr/local/X11/include

The local aim with regard to how users access things on different systems is to use symbolic links to provide a single name on all computers. If the user wants to know where something actually resides, he can use the df command to find out (i.e., "df ." will tell you where the current directory physically resides).

Another thing you want to think about is whether your pathnames look the same no matter where you are (i.e., on a machine for which the fs is local or one for which it is nfs). If this is the case, you can have one home directory, run yp, and everything is nice and transparent. On our cluster, all user filesystems, both local and nfs, are mounted underneath /home -- I wish this were the convention for all machines I have accounts on...

In general, make data available through:

    /<class>/<instance>

where <class> could be "home" for home directories, "vol" for volumes (i.e. applications and other mostly read-only data), "source" for public domain or free source code that you have collected, "project" for project directories, "distrib" for software distributions, etc. Some examples:

    /home/john            John's home dir
    /home/mary            Mary's home dir
    /vol/frame            default FrameMaker application directory
    /vol/frame-2.1        explicit FrameMaker 2.1 application directory
    /vol/emacs            GNU Emacs
    /project/xyz          Project xyz
    /project/abc          Project abc
    /source/emacs-18.55   GNU Emacs 18.55 sources
    /source/nn-6.3        nn 6.3 sources
    /distrib/frame-2.1    FrameMaker 2.1 SunView distribution
    /distrib/framex-2.1   FrameMaker 2.1 X-Windows distribution

Get it?! I consider having the server name (or partition for that matter) somewhere in a path a bad thing. It means that you have to tell all and everyone when you move, say, an application to another partition or server.

BTW I am planning to write an article on this and am preparing a presentation on the what and how. Be patient, I'll try to get it posted in this news group.

Martien F. van Steenbergen

You may want to read a paper that some colleagues and I wrote recently. It's called "The Depot: A Framework for Sharing Software Installation Across Organizational and Unix Platform Boundaries".
It details a mechanism we came up with here in our center for sharing large application distributions such as X windows, GNU software, and FrameMaker. You can get the paper from durer.cme.nist.gov (129.6.32.4) in a file called ~ftp/pub/depot.lisa.ps.Z. This paper was presented at the LISA conference in Colorado Springs last month. I welcome all to read it. We use the automounter here extensively for this type of thing, and the depot paper outlines some naming conventions we decided on here.

Walter Rowe

A couple of others and I came up with the following scheme, which has been in use at Island Graphics for a couple of years: every machine with a disk has a /usr/machinename/people and /usr/machinename/projects. For instance, my home is /usr/bermuda/people/daniel, and I can log into any other Island machine and access it as /usr/bermuda/people/daniel. This simplifies things a lot, since we can refer to a project directory (/usr/java/projects/whatever) or something in someone's account by the same absolute name wherever we're logged in. It was fun throwing the /usr2 name out the window. Another benefit is that all of us have just one big account, rather than 10 or 15 little ones (one on each machine). We're also starting to drift towards a lot of automounting, since all the crossmounting on our machines is getting to be a bit much.

Thanks to the following for e-mail/posting responses:

    alden@shape.mps.ohio-state.edu (Dave Alden)
    seeger@manatee.cis.ufl.edu (F. L. Charles Seeger III)
    srm@unify.com (Steve Maraglia)
    larry@stretch.cs.mun.ca (Larry Bouzane)
    thurlow@convex.com (Robert Thurlow)
    richard@aiai.ed.ac.uk (Richard Tobin)
    larsen@prism.cs.orst.edu (Scott Larsen)
    de5@ornl.gov (Dave Sill)
    rusty@belch.Berkeley.EDU (rusty wright)
    faustus@ygdrasil.Berkeley.EDU (Wayne A. Christopher)
    mike@vlsivie.tuwien.ac.at (Michael K. Gschwind)
    martien@westc.uucp (Martien F. van Steenbergen)
    rowe@cme.nist.gov (Walter Rowe)
    dansmith@well.sf.ca.us (Daniel Smith)
    karl_kleinpaste@cis.ohio-state.edu
    kseshadr@quasar.intel.com (Kishore Seshadri)
--
Mike Peterson, System Administrator, U/Toronto Department of Chemistry
E-mail: system@alchemy.chem.utoronto.ca
Tel: (416) 978-7094                  Fax: (416) 978-8775

--
Geoff Arnold, PC-NFS architect, Sun Microsystems. (geoff@East.Sun.COM)
--
*** "Now is no time to speculate or hypothecate, but rather a time ***
*** for action, or at least not a time to rule it out, though not ***
*** necessarily a time to rule it in, either." - George Bush ***