[comp.sys.sun] Sun-4 slows down like molasses

sitongia@hao.ucar.edu (Leonard Sitongia) (12/16/88)

System:         Sun-4/280S
OS:             4.0
Category:       OS/ rlogind (perhaps only indirectly related)
Date:           5 Dec 88

(Other hardware installed in this system includes: Sun SCSI and Exabyte
cartridge tape drive, Integrated Solutions (xt interface), two SMD disk
drives and two SCSI disk drives, 3 ALM-II boards, Hyperchannel interface.)

The symptoms of this problem are similar to those described by Loki
Jorgenson @physicsa.mcgill.ca (Vol 7, Issue 33, message 3 of 12) in the
NFS disk wait slowdown problem, but the cause is different.

We have seen a bizarre phenomenon on this machine.  Periodically (it
happened three times today) the system appears to have died: sessions that
are already logged in, on both directly connected terminals and rlogins,
hang.  When an attempt is made to log in, usually there is no response;
sometimes the login is allowed, but no motd is printed (nor other stuff
the user may do in the .login), and then the terminal hangs.  The system
appears to have become tremendously busy.

It is possible to rsh to this machine in this situation, from other
machines.  In fact, one can "log in" by starting up csh through rsh (rsh
target /bin/csh -i).  From this we can look at what is going on on the
target.

The only unusual behavior is that the in.rlogind's associated with
existing rlogins are running away, gobbling up lots of cpu time (I mean
lots! very unusual) and the LEDs on the cpu board do their "converging"
pattern, but *very slowly*.  Also, there is often a tremendous number of
system calls per second (on the order of 3-5 THOUSAND!) occurring.

The number of interrupts per second is normal.  The load average is small
(1-5).

We can then disable in.rlogind from inetd.conf and kill all the running
in.rlogind's and the LEDs will go back to their normal speed of
"converging".  In fact, I've killed just about everything on the system,
so that vmstat, iostat, etc. show little activity, 99% idle cpu, typical
numbers of interrupts and system calls per second...

...but the system never returns to normal...

...well, I should qualify that: once it returned to normal after about 5
minutes with *no* intervention (on its own).  Perhaps this was an
unrelated type of slowdown.
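
The step of disabling in.rlogind in inetd.conf, mentioned above, could be
scripted roughly as follows.  This is only a minimal sketch in Python: the
function name, the sample configuration lines and the /usr/etc paths in
them are assumptions made for illustration, not copied from the machine in
question.  It comments out the in.rlogind entry and leaves the rsh entry
alone, since rsh is what still lets us get into the machine.

```python
def disable_service(conf_text, daemon="in.rlogind"):
    """Comment out every active inetd.conf entry that names `daemon`.

    Hypothetical helper: an entry is treated as a match when one of its
    whitespace-separated fields is exactly `daemon`.
    """
    out = []
    for line in conf_text.splitlines(keepends=True):
        stripped = line.strip()
        # Blank lines and lines that are already comments are left alone.
        active = bool(stripped) and not stripped.startswith("#")
        if active and daemon in stripped.split():
            line = "#" + line
        out.append(line)
    return "".join(out)


# Illustrative entries only; the real file on a given system may differ.
SAMPLE_CONF = (
    "shell stream tcp nowait root /usr/etc/in.rshd in.rshd\n"
    "login stream tcp nowait root /usr/etc/in.rlogind in.rlogind\n"
)

print(disable_service(SAMPLE_CONF), end="")
```

Editing the file does not by itself affect a running inetd; it normally
has to be told to reread its configuration (conventionally by sending it a
SIGHUP), and rlogind processes that are already running still have to be
killed separately.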

So we have to reboot.  Unfortunately, dumps that are generated by breaking
and then typing "g0" at the console monitor *always* lack a u-area and
trace back only to the panic (panic 0, the break).

Have others seen this problem?  Does anyone know what is causing this?

We have received the 4.0.1 patches but have not installed them.
[[ Do it.  There are quite a few bugs fixed in the upgrade.  --wnl ]]

Thank you for your time,

-Leonard E. Sitongia    System Programmer		 (303) 497-1509
USPS Mail: High Altitude Observatory P.O. Box 3000 Boulder CO  80307
Internet:               sitongia@hao.ucar.edu
SPAN:			NSFGW::"hao.ucar.edu!sitongia"	[NSFGW=9580]

nesheim@think.com (12/25/88)

> From:    sitongia@hao.ucar.edu (Leonard Sitongia)
> 
> We have seen a bizarre phenomenon on this machine.  Periodically (it
> happened three times today) the system appears to have died: sessions that
> are already logged in, on both directly connected terminals and rlogins,
> hang.  When an attempt is made to log in, usually there is no response;
> sometimes the login is allowed, but no motd is printed (nor other stuff
> the user may do in the .login), and then the terminal hangs.  The system
> appears to have become tremendously busy.
...

I've seen this happen when you run out of STREAMS buffers.  When the
machine gets into a weird state, do a "netstat -m" and look at the stream
allocation table.  If you see allocation failures you will have to
increase the stream parameters.  Look at
/usr/share/sys/sun?/YOURMACHINE/param.c for the lines that define NBLKXXX.
Up them until you no longer get stream allocation failures.
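
The check described above (look through the "netstat -m" output for
allocation failures) could be automated along these lines.  This is a
minimal sketch in Python, and it rests on an assumption about the layout:
that each data row is a label followed by four integer columns, the last
of which is the failure count.  The sample text and all the numbers in it
are invented for illustration, not captured from a real system, and the
function name is hypothetical.

```python
def rows_with_failures(netstat_m_text):
    """Return (label, failures) for each row with a nonzero failure count.

    Assumed row layout: <label...> current maximum total failures
    """
    flagged = []
    for line in netstat_m_text.splitlines():
        fields = line.split()
        counters = fields[-4:]
        # Skip headers and anything that is not label + four integers.
        if len(fields) < 5 or not all(f.isdigit() for f in counters):
            continue
        failures = int(counters[-1])
        if failures > 0:
            flagged.append((" ".join(fields[:-4]), failures))
    return flagged


# Invented sample in the assumed layout; not real output.
SAMPLE = """\
streams allocation:
                        cumulative  allocation
                current   maximum       total    failures
streams              14        18         412           0
queues               60        74        1730           0
mblks               118       245      903112          37
dblks               118       245      903112         212
"""

for label, failures in rows_with_failures(SAMPLE):
    print(label, failures)
```

Any row this reports is a candidate for a larger limit in param.c; which
NBLK define corresponds to which row has to be worked out from the file
itself.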

-- Bill Nesheim; Thinking Machines Corporation, Cambridge, MA +1 617-876-1111
   nesheim@think.com, {harvard,bloom-beacon,topaz}!think!nesheim