sitongia@hao.ucar.edu (Leonard Sitongia) (12/16/88)
System:   Sun-4/280S
OS:       4.0
Category: OS / rlogind (perhaps only indirectly related)
Date:     5 Dec 88

(Other hardware installed in this system includes: Sun SCSI and Exabyte cartridge tape drive, Integrated Solutions (xt interface), two SMD disk drives and two SCSI disk drives, 3 ALM-II boards, Hyperchannel interface.)

The symptoms of this problem are similar to those described by Loki Jorgenson @physicsa.mcgill.ca (Vol 7, Issue 33, message 3 of 12) in the NFS disk wait slowdown problem, but the cause is different.

We have seen a bizarre phenomenon on this machine. Periodically (it happened three times today) the system appears to have died: sessions already logged in on directly connected terminals and over rlogin hang, and when an attempt is made to log in there is usually no response, or login is allowed but no motd is printed (nor anything else the user may do in the .login) and then the terminal hangs. The system appears to have become tremendously busy.

It is possible to rsh to this machine in this situation from other machines. In fact, one can "log in" by starting up csh through rsh (rsh target /bin/csh -i). From this we can look at what is going on on the target. The only unusual behavior is that the in.rlogind processes associated with existing rlogins are running away, gobbling up lots of cpu time (I mean lots! very unusual), and the LEDs on the cpu board do their "converging" pattern, but *very slowly*. Also, there is often a tremendous number of system calls per second (on the order of 3-5 THOUSAND!) occurring. The number of interrupts per second is normal. The load average is small (1-5).

We can then disable in.rlogind in inetd.conf and kill all the running in.rlogind processes, and the LEDs will go back to their normal speed of "converging". In fact, I've killed just about everything on the system, so that vmstat, iostat, etc. show little activity, 99% idle cpu, typical numbers of interrupts and system calls per second...
...but the system never returns to normal...

...well, I should qualify that: once it returned to normal after about 5 minutes with *no* intervention (on its own). Perhaps this was an unrelated type of slowdown.

So we have to reboot. Unfortunately, dumps that are generated by breaking and then typing "g0" at the console monitor *always* have no u-area and only trace back to the panic (panic 0, the break).

Have others seen this problem? Does anyone know what is causing this? We have received the 4.0.1 patches but have not installed them.

[[ Do it. There are quite a few bugs fixed in the upgrade. --wnl ]]

Thank you for your time,

-Leonard E. Sitongia
System Programmer
(303) 497-1509
USPS Mail: High Altitude Observatory
           P.O. Box 3000
           Boulder CO 80307
Internet:  sitongia@hao.ucar.edu
SPAN:      NSFGW::"hao.ucar.edu!sitongia" [NSFGW=9580]
nesheim@think.com (12/25/88)
> From: sitongia@hao.ucar.edu (Leonard Sitongia)
>
> We have seen a bizarre phenomenon on this machine. Periodically (happened
> three times today) it will appear that the system has died in that
> directly connected terminals and rlogins hang up on terminals that are
> logged in and when an attempt is made to log in usually there is no
> response or maybe login is allowed but no motd is printed (or other stuff
> the user may do in the .login) and then the terminal hangs. The system
> appears to have become tremendously busy.
> ...

I've seen this happen when you run out of STREAMS buffers. When the machine gets into a weird state, do a "netstat -m" and look at the stream allocation table. If you see allocation failures you will have to increase the stream parameters. Look at /usr/share/sys/sun?/YOURMACHINE/param.c for the lines that define NBLKXXX. Raise them until you no longer get stream allocation failures.

-- Bill Nesheim; Thinking Machines Corporation, Cambridge, MA
+1 617-876-1111
nesheim@think.com, {harvard,bloom-beacon,topaz}!think!nesheim
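
[[ The "netstat -m" check suggested above could be scripted roughly as follows. This is only a sketch and not part of either poster's procedure. The function name stream_failures is hypothetical, and the filter ASSUMES that each row of the stream allocation table is a name followed by four counters, with the failure count in the last column; compare it against the real output on your release before relying on it. ]]

```shell
#!/bin/sh
# Sketch: reduce "netstat -m" output to the rows that report failures.
# ASSUMPTION (not confirmed by the thread): a table row has at least five
# whitespace-separated fields and its last field is the failure count.
# stream_failures is a hypothetical helper name.
stream_failures() {
    # Keep rows whose last field is a plain integer greater than zero.
    # Header lines are skipped because their last field is not numeric.
    awk 'NF >= 5 && $NF ~ /^[0-9]+$/ && $NF + 0 > 0 { print }'
}
```

[[ Usage on the affected machine would be "netstat -m | stream_failures", for example from a shell started with rsh as described in the original report. No output means no row matched under the assumed layout. ]]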