fredrick@acd.acd.ucar.edu (Tim Fredrick) (04/26/91)
(Regarding AIX3.1.5)

We have had this situation since we first tried to use automounter. We are starting it up like this:

    /usr/etc/automount /r /etc/auto.direct

as the last thing done in /etc/rc.tcpip. A couple of our auto.direct entries look like this:

    u1    -rw,intr,soft    pyr:/u1
    s1    -rw,intr,soft    wk1:/s1

These are a Pyramid 90x and a SPARCstation-1 (SUNOS4.1.1) respectively.

Everything seems to work just fine until the first mounting problem -- a machine goes down, or something times out because of a busy network, etc. Then our disasters begin. Doing a "ps -efgaux" doesn't show automounter running in memory. Yet a "df" command will hang almost indefinitely, sapping the IBM's CPU. Any attempt to cd to /r/s1 or /r/u1 does the same thing -- that process hangs and saps CPU time.

Attempting to stop and start the nfsd and biods doesn't work. Nor does "stop nfs" and "start nfs" in SMIT. The only thing we've found that works is rebooting the machine -- which is very inconvenient because of the number of users we have on our Risc-System 530 running long jobs.

So,

1. Has anyone else run into this problem (with automount and NFS mounts on non-IBM machines)?

2. We're running 8 nfsd's and 15 biod's -- /etc/auto.direct has about 15 entries. Does that sound right? Do you have to have a biod for each entry in /etc/auto.direct?

3. Can anyone explain what is happening to us when this occurs?

4. Is there a way to recover from this state without rebooting the machine?

5. And the biggie: Is there a way to *prevent* this problem?

6. Sun's automount doesn't do this with exactly the same configuration. Should we just forget about using IBM's implementation? Is there a public domain implementation out there (GNU automount?)?

Any help from an NFS expert or other knowledgeable person would be greatly appreciated by dozens of people around here.

Thanks in advance.

--Tim