[comp.sys.sun] Diskless client hangs

jch@sonne.tn.cornell.edu (Jeffrey C Honig) (10/04/89)

We have several diskless 3/50s and a 3/60, all with 4M of memory, served
by a 3/160 with 8M.  All are running SunOS 4.0.3, but this problem has
been with us since 4.0.1.

Sometimes, when a client is fairly busy (X with a bunch of windows, and
maybe a compile or two), a program on the client will lock up in
short-term wait.  Sometimes this is xterm, sometimes csh, sometimes Xsun,
sometimes another program (like the editor when I haven't saved the two
hours of work I've done).  When this happens, all instances of the program
are locked up.  If it is csh, the only way to log in is with a userid
using /bin/sh; if it is Xsun, telnet still works.  A ps -axlw of the
process shows WCHAN=kernelmap STAT=D.
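
For anyone who wants to watch for the same symptom, here is a rough sketch
of a filter that picks such processes out of saved ps output.  It is not
something we run here.  It assumes a long-format listing in which the STAT
field immediately follows WCHAN; the function name and the sample listing
are made up for illustration, so adjust both to whatever your ps prints.

```python
# Hypothetical sample of long-format ps output (made up for illustration).
SAMPLE = """\
     F UID   PID  PPID CP PRI NI  SZ  RSS WCHAN     STAT TT  TIME COMMAND
 20008001 101   212   211  0   1  0  96  240 kernelmap D    p0  0:03 csh
 20008001 101   305   212  0  15  0  64  120 select    I    p1  0:00 vi notes
"""


def hung_lines(ps_text, wchan="kernelmap", state="D"):
    """Return the ps lines whose WCHAN field equals `wchan` and whose
    following field (assumed to be STAT) starts with `state`."""
    hits = []
    for line in ps_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        # Look for the wait channel followed directly by the state field.
        for i, field in enumerate(fields[:-1]):
            if field == wchan and fields[i + 1].startswith(state):
                hits.append(line)
                break
    return hits


if __name__ == "__main__":
    for line in hung_lines(SAMPLE):
        print(line)
```

Matching on the field pair rather than on fixed column offsets keeps the
sketch independent of column widths, at the cost of relying on the assumed
field order.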

I've looked at the stack of the processes that are hung, both online and
in post-mortem system dumps.  The common elements seem to be:

	idle(?)
	_sleep(..addr..,0x16) + 72
	_clntkudp_callit_addr() + 154
	_clntkudp_callit(..addr.., 0xb, ..addr.., ..addr.., ..addr..,
		..addr.., 0x4, ..addr..) + 2a
	_rfscall(..addr.., 0xb, ..addr.., ..addr.., ..addr..,
		..addr.., ..addr..) + 10e

I've talked to Sun support, who say I have to increase my swap space
because strange things happen when you run out of swap space.  They also
said that because I am running X I need at least 50M of swap for each
client.

I figure this is a bit unreasonable, but I have increased the swap area on
my client to 32M (from 16M), with the file set to allocate on demand so I
can see the high-water mark.  I haven't seen the problem since, but it only
happens about once per day, so it is too early to tell.

I don't think that X is directly related to this problem, except for
increasing the system load and memory requirements.

Has anyone seen a problem like this?  Any suggestions?

Thanks.

Jeff