[comp.unix.wizards] Bug in SUN Kernel RPC

dpk@BRL.ARPA (Doug Kingston) (12/17/86)

Index:  sys/rpc/clnt_kudp.c  FIX  (Gould version and others?)

Description:
	The SUN kernel-mode RPC can hang while doing a remote
RPC that should time out.  An example is NFS when you remote
mount a filesystem "soft".  Some RPCs to this filesystem
will hang.  Mount is one such RPC.

	This problem was found in our Gould kernels which contain
code almost identical to the SUN code.  I know some other vendors
using this code also have the problem.

	Specifically, the problem is that the function clntkudp_callit
does a sleep on &so->so_rcv.sb_cc.  The timeout routine, ckuwakeup(),
had an incorrect wakeup value which is corrected below.

Repeat By:
	Edit /etc/fstab to remote mount a filesystem.
	Shut down the remote system.
	Reboot your system.
	Watch your system hang when it attempts to mount
		the NFS filesystems.

Fix:
	Apply the following diff:

*** /tmp/,RCSt1016522	Tue Dec 16 20:37:15 1986
--- clnt_kudp.c	Mon Dec  8 22:34:48 1986
***************
*** 498,504 ****
  	rpc_debug(4, "cku_timeout\n");
  #endif
  	p->cku_flags |= CKU_TIMEDOUT;
! 	sbwakeup(&p->cku_sock->so_rcv);
  }
  
  /*
--- 498,504 ----
  	rpc_debug(4, "cku_timeout\n");
  #endif
  	p->cku_flags |= CKU_TIMEDOUT;
! 	sbwakeup(&p->cku_sock->so_rcv.sb_cc);
  }
  
  /*

chris@mimsy.UUCP (02/18/87)

In article <1601@brl-adm.ARPA> dpk@BRL.ARPA (Doug Kingston) suggests
changing an

	sbwakeup(&p->cku_sock->so_rcv);

to

	sbwakeup(&p->cku_sock->so_rcv.sb_cc);

Yet sbwakeup() takes a `struct sockbuf *', not an `int *', and does
a wakeup on &sb->sb_cc.  How can the above be right?
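
To make the channel question concrete, here is a minimal user-space
sketch.  It is not the kernel source: the two-field struct sockbuf and
the recording wakeup() are hypothetical stand-ins, and the real
structure has more fields whose layout may differ.  It only models the
behavior described above, namely that sbwakeup() takes a
`struct sockbuf *' and wakes &sb->sb_cc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the kernel's struct sockbuf.
 * Only sb_cc matters here. */
struct sockbuf {
	unsigned short sb_cc;		/* its address is the sleep channel */
	unsigned short sb_hiwat;
};

/* Most recent channel handed to wakeup(), kept for inspection. */
static const void *last_wakeup_chan;

/* Stand-in for the kernel wakeup(): just record the channel. */
static void wakeup(const void *chan)
{
	last_wakeup_chan = chan;
}

/* Modeled on the behavior described above: wake sleepers on &sb->sb_cc. */
static void sbwakeup(struct sockbuf *sb)
{
	assert(sb != NULL);
	wakeup(&sb->sb_cc);
}

/* Returns 1 if sbwakeup(sb) hits the channel that a sleeper on
 * &sb->sb_cc is waiting on, 0 otherwise. */
static int wakeup_matches_sleep(struct sockbuf *sb)
{
	const void *sleep_chan = &sb->sb_cc;

	sbwakeup(sb);
	return last_wakeup_chan == sleep_chan;
}
```

In this model, passing the sockbuf itself already wakes the channel the
sleeper uses, so the unpatched call matches the sleep and the patched
call merely hands sbwakeup() a pointer of the wrong type.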
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

guy@gorodish.UUCP (02/26/87)

>Yet sbwakeup() takes a `struct sockbuf *', not an `int *', and does
>a wakeup on &sb->sb_cc.  How can the above be right?

It can't.  I believe the original bug report spoke of problems with
soft mounts hanging forever; there is a bug here, but it's unrelated
to the RPC code.  When you do a mount, you make some NFS calls to do
things such as getting the attributes of the directory you're
mounting.  However, "nfsrootvp" doesn't set (or, more precisely,
clear) the "mi_hard" entry for the file system based on the mount
options until *after* all those NFS calls have been made.  Thus, if
the mount server for a host is responding but the NFS server isn't,
the "mount" will hang.
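
The ordering problem can be sketched in user space as follows.  This is
an illustration only, not the "nfsrootvp" source: struct mntinfo is cut
down to one field, and nfs_call(), mount_flag_late(), mount_flag_early()
and the CALL_WOULD_HANG sentinel are hypothetical names.  A call that
would retry forever is reported as CALL_WOULD_HANG so the sketch itself
terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down mount information. */
struct mntinfo {
	int mi_hard;	/* nonzero: retry forever; zero: give up on timeout */
};

enum { CALL_OK = 0, CALL_WOULD_HANG = -1 };

/* Hypothetical NFS call against a server that may not be responding. */
static int nfs_call(const struct mntinfo *mi, int server_up)
{
	assert(mi != NULL);
	if (server_up)
		return CALL_OK;
	return mi->mi_hard ? CALL_WOULD_HANG : ETIMEDOUT;
}

/* Order described above: mi_hard is cleared only after the NFS call. */
static int mount_flag_late(int soft, int server_up)
{
	struct mntinfo mi = { 1 };		/* starts out hard */
	int err = nfs_call(&mi, server_up);	/* e.g. get root attributes */

	mi.mi_hard = !soft;			/* too late to help the call */
	return err;
}

/* Alternative order: apply the mount option before any NFS call. */
static int mount_flag_early(int soft, int server_up)
{
	struct mntinfo mi = { 1 };

	mi.mi_hard = !soft;
	return nfs_call(&mi, server_up);
}
```

With a soft mount and an unresponsive NFS server, mount_flag_late()
reports CALL_WOULD_HANG while mount_flag_early() returns ETIMEDOUT,
which is the difference between the hang and the expected timeout.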