[comp.unix.ultrix] Machines locking up - NFS related

devin@samwise.Colorado.EDU (Yampalimardilor) (09/18/90)

Our DECstations running Ultrix 4.0 have a problem: if an NFS server
goes down while one of its disks is mounted, the machine locks up.
This happens especially if no one is logged in: Xprompter goes away.
Is there some way to tell mount that if an NFS server goes down, it
should just not access that disk?  (Note: the bg option in /etc/fstab
doesn't seem to help; it only applies when initially mounting the disk.)

Thanks.
		-Devin

***********************************************************************
* Yampalimardilor          *     a.k.a Devin Hooker (RP90) DoD # 0034 *
* Mage School, Glantri     *           University of Colorado-Boulder *
* (Master of Movement)     *           (Undergrad Student)            *
***********************************************************************

bachesta@bcstec.UUCP (Jim Bachesta) (10/04/90)

 I currently manage a computer site with a VAX 6000-440 running Ultrix
4.0 and a VAX 8800 running Ultrix 3.1. We have recently upgraded to
the VAX 6000-440 running 4.0. We have observed four instances where both
systems locked up. We are currently running NFS mounts between the two
VAXes. At the time of each lockup, users who were logged in could not
get any response, and new users trying to log in never got a login
prompt. Our only recourse was to crash the system and reboot. We also
have both systems connected to an HSC-70. The HSC console gave the
following messages:

	HOST-W Node 2 (trident) Path A has gone from good to bad.
	HOST-W Node 4 (terra) Path A has gone from good to bad.
	HOST-W VC closed with node 2 (   ) due to START received.
	HOST-W VC closed with node 4 (   ) due to START received.

	HOST-I VC open with node 2 (  ).
	HOST-I VC open with node 4 (  ).

This sequence would repeat at about 20-minute intervals. The system
errlog at /usr/adm/syserr did not log any errors during this time.
The system statistics we gather via cron (e.g. iostat, vmstat, etc.)
didn't show anything unusual.
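
For anyone comparing similar console output, here is a minimal sketch
of tallying the HSC messages per node, so that repeats of the sequence
are easier to spot. It assumes the console output has been captured
as plain text. The function name tally and the SAMPLE string are
hypothetical, and reading the letter after HOST- as a severity code
(W and I in the lines above) is an assumption, not something confirmed
by DEC documentation.

```python
import re
from collections import Counter

# Sample lines copied from the HSC console output quoted above.
SAMPLE = """\
HOST-W Node 2 (trident) Path A has gone from good to bad.
HOST-W Node 4 (terra) Path A has gone from good to bad.
HOST-W VC closed with node 2 (   ) due to START received.
HOST-W VC closed with node 4 (   ) due to START received.

HOST-I VC open with node 2 (  ).
HOST-I VC open with node 4 (  ).
"""

# Capture the HOST-x tag, then the first "node N" that follows it.
# The console prints both "Node" and "node", so match case-insensitively.
LINE_RE = re.compile(r"^\s*(HOST-[A-Z])\b.*?\bnode\s+(\d+)", re.IGNORECASE)


def tally(text):
    """Count messages per (node number, HOST-x tag) pair."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            tag = match.group(1).upper()
            node = int(match.group(2))
            counts[(node, tag)] += 1
    return counts


if __name__ == "__main__":
    for (node, tag), n in sorted(tally(SAMPLE).items()):
        print(f"node {node}: {tag} x{n}")
```

Running the same tally over a longer capture would show whether both
nodes always drop together or whether one node drops alone.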

We called DEC, and they have no record of other sites having this
problem.

Has anyone else seen this, or does anybody have an idea what may be
causing it?

We suspect the following:

	o NFS incompatibility between 4.0 and 3.1

	o HSC hardware problem, although we have had this HSC-70 for
	  about a year without any problems. This problem surfaced after
	  the upgrade. In addition, we had one occurrence of a lockup on
	  one system while the other stayed up.

Any help or comments would be greatly appreciated.


					James Bachesta
					System Manager
					Boeing P3 Update IV Project.
					bachesta@trident.boeing.com