[comp.sys.sun] Lock daemon dying 'spontaneously'.....

loki@physicsa.mcgill.ca (Loki Jorgenson Rm421) (04/22/89)

Hello again.  My previous plea for help referred to a possible problem
with mounting unused client partitions.  I am not yet convinced that the
use of these partitions is unrelated, but the replies from the concerned
public (i.e., you) insisted I had done nothing wrong there.

I shall now try to describe what is happening to the system, and hopefully
someone will understand the underlying problem.  I am running SunOS 3.5
on a Sun 3/180 server with four 3/50 clients.

Without apparent reason, the lock daemon will spontaneously die and all
hell breaks loose.  Similarly, the portmapper will sometimes die and other
problems result.  If I restart the lock daemon, the system will resume
chugging along and start cleaning up the backlog of processes waiting for
lock-manager service.  After some minutes, it will die again.  Dmesg only
reports the problems of having no lock daemon.
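
One way to catch the moment the daemons disappear is to compare periodic
`ps` output against a list of expected daemons.  What follows is only a
minimal sketch, not something taken from this system: the daemon names
(`portmap`, `rpc.lockd`, `rpc.statd`), the sample `ps` listing, and the
assumption that the command starts in the fifth column are all
hypothetical and would need adjusting to the real machine.

```python
import os

# Hypothetical `ps ax`-style listing; the layout (PID TT STAT TIME COMMAND)
# is an assumption, not output captured from the system described here.
SAMPLE_PS = """\
  PID TT STAT  TIME COMMAND
   58 ?  IW    0:00 portmap
   71 ?  S     0:03 /usr/etc/rpc.statd
  102 ?  I     0:12 /usr/lib/sendmail -bd -q1h
"""


def missing_daemons(ps_output, wanted, command_column=4):
    """Return the names in `wanted` that do not appear as a running command.

    `command_column` is the zero-based index of the field where the command
    starts; everything from that field onward is treated as the command line.
    """
    running = set()
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, command_column)
        if len(fields) <= command_column:
            continue  # blank or truncated line
        program = fields[command_column].split()[0]
        running.add(os.path.basename(program))
    return [name for name in wanted if name not in running]


if __name__ == "__main__":
    # Assumed daemon names; substitute whatever the system actually runs.
    print(missing_daemons(SAMPLE_PS, ["portmap", "rpc.lockd", "rpc.statd"]))
```

Run from cron or a loop with real `ps` output and a timestamp, something
like this would at least show whether the lock daemon or the portmapper
goes first, and how long before the rest follows.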

The process which always seems to be present (and in trouble) is sendmail.
I wonder whether sendmail isn't somehow responsible.  Which comes first,
chicken or egg?  It starts piling up queue runs as it tries and retries
mail delivery, and it eventually fills the swap space.  The error
associated with the undelivered mail is "service unavailable: Bad file
number".

These problems occurred three weeks after I first mounted the unused
client partitions.  They occurred without apparent provocation.  The only
reason I suspect that mounting the partitions is involved is that the
problems ceased the instant I removed the nd.local entries (for the
unused ndl's) and the added rc.local mount command and rebooted.
Afterwards, I returned the nd.local entries (for the unused ndl's),
rebooted and mounted by hand without problems.  This was fine for 2 weeks.
Then I changed the nd.local entries so that, instead of referring to bogus
client names, the partitions were claimed as extras of existing clients
(as per Sys Admin guide).  Three hours later, the system was a mess.  I
removed all extra nd.local references (and the mount) and it went back to
normal.  I haven't changed a thing since.

	Note:  the server system partition almost never survives the reboot
	fsck.  It is usually corrupted.  It seems as though the server system
	is "stepping on itself".

I know that this probably sounds pretty vague.  I'm sorry for the lack of
details, but I'm confused too.  If you are interested in helping and can
think of any pertinent questions to ask me, please do.  If you have an
explanation, I will be deeply gratified by your assistance.



Loki Jorgenson				node:  loki@physicsa.mcgill.ca
Physics, McGill University	fax:   (514) 398-3733
Montreal Quebec CANADA		phone: (514) 398-6531