[comp.sys.hp] HP-UX 6.5 NFS problem on cnodes.

barbour@boulder.Colorado.EDU (Jim Barbour) (06/30/89)

We have an instructional lab which has 2 clusters.  Each file server is a
9000/350 and the cnodes are 320s.  There are 9 cnodes per cluster.

Currently, we have users' files cross-mounted between clusters; i.e., cluster A
user files are remotely mounted on cluster B and cluster B user files are
remotely mounted on cluster A.  Thus, if a user logs onto a machine, his/her
home directory could very well be NFS-mounted.  So far, this has worked fine.
 
However, we are planning to upgrade from HP-UX 6.2 to 6.5 very soon.  We
discovered a very serious problem in the release notes.  Apparently, if you are
logged in on a cnode and your cwd is an NFS directory, you cannot start up csh.

This would seem to indicate that we would no longer be able to cross-mount
these user files.  Because of the configuration of the clusters -- software
availability and so on -- this is highly undesirable.

I realize that I could give each person a local home directory on
each machine.  However, can anyone suggest another workaround for this problem?

Jim Barbour (barbour@alumni.Colorado.EDU)

C.U. Boulder -- HP operations

diamant@hpfclp.SDE.HP.COM (John Diamant) (07/02/89)

> However, we are planning to upgrade from HP-UX 6.2 to 6.5 very soon.  We
> discovered a very serious problem in the release notes.  Apparently, if
> you are logged in on a cnode and your cwd is an NFS directory, you cannot
> start up csh.

The problem referred to in the release notes is not with NFS, but with
RFA (netunam).  I am running 6.5 on my cluster and just confirmed that
csh starts up just fine when my current directory is under an NFS mount point.

I believe the passage in the release notes you're referring to is the
following one:

                  It is no longer possible to start "csh(1)" when the
                  current directory is on an RFA-connected file system.
                  Both "sh(1)" and "ksh(1)" will start properly in a
                  remote directory.

RFA is the name of the HP-proprietary transparent remote file access
system (accessed via the netunam shell builtins and the netunam system
call).  It does not refer to NFS.
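
The difference is easy to demonstrate from sh (which, per the release
notes, starts fine in a remote directory).  I'm typing the netunam
syntax from memory, so check netunam(1) before trusting it; the host
and path names here are made up:

	netunam /net/remhost              # establish the RFA connection
	cd /net/remhost/users/jim         # cwd now on an RFA file system
	csh                               # fails to start under 6.5 (the bug above)

	cd /nfs/clusterB/users/jim        # cwd under an NFS mount instead
	csh                               # starts fine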

> This would seem to indicate that we would no longer be able to cross-mount
> these user files.  Because of the configuration of the clusters -- software
> availability and so on -- this is highly undesirable.

Since the problem is with RFA and not NFS, you should not have any problem
with your configuration.  By the way, the bug is supposed to be fixed in
a forthcoming release.

John Diamant
Software Engineering Systems Division
Hewlett-Packard Co.		Internet: diamant@hpfclp.sde.hp.com
Fort Collins, CO		    UUCP: {hplabs,hpfcla}!hpfclp!diamant

raveling@venera.isi.edu (Paul Raveling) (07/05/89)

In article <7540029@hpfclp.SDE.HP.COM> diamant@hpfclp.SDE.HP.COM (John Diamant) writes:
>> However, we are planning to upgrade from HP-UX 6.2 to 6.5 very soon.  We
>> discovered a very serious problem in the release notes.  Apparently, if
>> you are logged in on a cnode and your cwd is an NFS directory, you cannot
>> start up csh.
>
>The problem referred to in the release notes is not with NFS, but with
>RFA (netunam).  I am running 6.5 on my cluster and just confirmed that
>csh starts up just fine when my current directory is under an NFS mount point.

	We have no problem starting csh when all hosts are alive
	that we have NFS mounts on, but have experienced an inability
	to start csh when any of those hosts has gone down.  (Further
	qualifications follow.)  This happens if ANY of those hosts
	goes down -- it's not necessary to have a file open on the
	defunct host.  The situation cures itself when the dead host
	revives.

	What usually happens is that assorted operations within X
	windows start looking catatonic.  I usually try to su in an
	existing window or open a new xterm window to check it out;
	in both cases starting up the new csh hangs.

	However, in these circumstances csh comes up successfully as
	a login shell.  I've been able to log out & log in again as
	root, using csh, to unmount the offending file system.  I've
	also logged in as myself with csh as the login shell to bring
	up X without unmounting anything, and X comes up, but X clients
	that depend on csh hang.
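
	In case it helps anyone, the recovery step looks something
	like this (the path is made up -- substitute whichever mount
	points at the dead server):

	    # from a root console login (csh still works as a login shell)
	    umount /remote/users

	If umount complains that the file system is busy, you may
	have to kill the hung processes holding files open on it
	first.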

	It appears that our Sun users have the same problem, suggesting
	it's purely an NFS behavior.  That's second-hand info, though --
	I haven't personally looked at it on Suns.

	To confuse matters more, it appeared that for a brief time
	the problem went away, then came back.  Perhaps it's sensitive
	to some piece of setup in .cshrc files, but we haven't spotted
	it yet.

	BTW, we're not running totally diskless.  Each workstation
	has local swap space and enough local file system to hold a
	kernel, /bin, and /etc.  Practically everything else,
	including /usr and all users' home directories, is mounted
	via NFS.
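
	For concreteness, the layout looks roughly like this in
	/etc/checklist (device file, server name, and paths invented):

	    /dev/dsk/0s0    /       hfs  defaults  0 1  # local: kernel, /bin, /etc
	    server:/usr     /usr    nfs  rw        0 0  # hard mount (the default)
	    server:/users   /users  nfs  rw        0 0  # home directories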


----------------
Paul Raveling
Raveling@isi.edu

burzio@mmlai.UUCP (Tony Burzio) (07/07/89)

In article <8840@venera.isi.edu>, raveling@venera.isi.edu (Paul Raveling) writes:
> 	We have no problem starting csh when all hosts are alive
> 	that we have NFS mounts on, but have experienced an inability
> 	to start csh when any of those hosts has gone down.  (Further
> 	qualifications follow.)  This happens if ANY of those hosts
> 	goes down -- it's not necessary to have a file open on the
> 	defunct host.  The situation cures itself when the dead host
> 	revives.

Are your NFS file systems mounted "soft"?  If they are not, NFS will
sit around forever waiting for the dead system to answer the file
request.  (Note that timeo is in tenths of a second -- it's the
per-request timeout, retried a few times before a soft mount finally
gives up and returns an error.)  An example checklist entry (for a
paranoid admin with a hanging ethernet) would be:

mmlai:/users    /mmlai/users nfs soft,timeo=100 0 0 # NFS mount to MMLAI

*********************************************************************
Tony Burzio               * All right, so where's rfbackup anyway???
Martin Marietta Labs      *
mmlai!burzio@uunet.uu.net *
*********************************************************************

jack@hpindda.HP.COM (Jack Repenning) (07/11/89)

	We have no problem starting csh when all hosts are alive
	that we have NFS mounts on, but have experienced an inability
	to start csh when any of those hosts has gone down.  (Further
	qualifications follow.)  This happens if ANY of those hosts
	goes down -- it's not necessary to have a file open on the
	defunct host.  The situation cures itself when the dead host
	revives.

	..., but X clients that depend on csh hang.

Does starting up X include adding some directory to your PATH which
is actually an NFS mount from the system that's down?  As I understand
it, csh attempts to hash all directories in $PATH whenever it starts:
if it can't reach one, it hangs (it's not necessary to actually *run*
anything from the inaccessible directory; merely having it in your
PATH is enough).
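
If that's what's biting you, one workaround is to keep the path csh
hashes at startup strictly local, and add remote directories by hand
once the server is known to be up.  Something like this in ~/.cshrc
(the directory names are made up):

	# keep the startup path local so csh never hashes an NFS directory
	set path = ( /bin /usr/bin /usr/local/bin . )
	# pull in the NFS-mounted tools on demand, once the server is up
	alias addtools 'set path = ( $path /nfs/tools/bin )'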

As someone else pointed out, this (and possibly other interactions)
gets better when you do a soft mount.  Basically, "hard mount" means
"keep trying forever," while "soft mount" means "give up after a
reasonable time."  And "hard" is the default.
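
If you want to experiment before editing /etc/checklist, you can
remount a single file system soft by hand -- something like this
(option spellings as in the usual NFS implementations; check
mount(1M) on your release):

	umount /mmlai/users
	mount -o soft,timeo=100 mmlai:/users /mmlai/users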


Jack Repenning