loki@physicsa.mcgill.ca (Loki Jorgenson Rm421) (04/22/89)
Hello again. My previous plea for help referred to a possible problem with mounting unused client partitions. I am not yet convinced that the use of these partitions is unrelated, but the replies from the concerned public (i.e. you) insisted I had done nothing wrong there. I shall now try to describe what is happening to the system, and hopefully someone will understand the underlying problem.

I am running SunOS 3.5 on a 3/180 server with four 3/50 clients. Without apparent reason, the lock daemon will spontaneously die and all hell breaks loose. Similarly, the portmapper will sometimes die and other problems result. If I restart the lock daemon, the system will resume chugging along and start cleaning up the backlog of processes waiting for lock-manager service. After some minutes, it will die again. Dmesg reports only the problems caused by having no lock daemon.

The process which always seems to be present (and in trouble) is sendmail. I wonder whether sendmail is somehow responsible. Which comes first, chicken or egg? It starts setting up running queues as it tries and retries mail delivery. It eventually fills the swap space. The error associated with the undelivered mail is "service unavailable: Bad file number".

These problems first occurred three weeks after I mounted the unused client partitions. They occurred without apparent provocation. The only reason I suspect that mounting the partitions is involved is that the problems ceased the instant I removed the nd.local entries (for the unused ndl's) and the added rc.local mount command, and rebooted. Afterwards, I restored the nd.local entries (for the unused ndl's), rebooted and mounted by hand without problems. This was fine for two weeks.

Then I changed the nd.local entries so that, instead of referring to bogus client names, the partitions were claimed as extras of existing clients (as per the Sys Admin guide). Three hours later, the system was a mess.
I removed all extra nd.local references (and the mount) and it went back to normal. I haven't changed a thing since.

Note: the server system partition almost never survives the reboot fsck. It is usually corrupted. It seems as though the server system is "stepping on itself".

I know that this probably sounds pretty vague. I'm sorry for the lack of details, but I'm confused too. If you are interested in helping and can think of any pertinent questions to ask me, please do. If you have an explanation, I will be deeply gratified by your assistance.

Loki Jorgenson                 node:  loki@physicsa.mcgill.ca
Physics, McGill University     fax:   (514) 398-3733
Montreal Quebec CANADA         phone: (514) 398-6531