robinb@merlin.bhpmrl.oz (Robin Brown) (01/08/90)
Has anybody encountered a situation where the init process grows to excessive proportions and brings the system to its knees?

What follows are some ps stats on the init process on one of our machines that has this problem (a DN3500 running sr10.1):

USER       PID    SZ  RSS TTY STAT  TIME COMMAND
root         1 21600   54 ?   S <   6:05 init      (before reboot)
root         1  1376  708 ?   S <   0:23 init      (immediately after)
root         1  1408  664 ?   S <   0:23 init      (a bit later on)
root         1  2144   70 ?   S <   0:36 init      (ditto)
root         1 10272   70 ?   S <   2:48 init      (the next day)

At this point response time was significantly longer than normal and the system had to be rebooted.

Apollo(/HP) have seen this sort of thing before but they don't know what causes it. The only known solution is to rebuild the machine, which (as I'm sure you can appreciate) I don't want to have to do, at least not until I lay my hands on a copy of sr10.2.

Any help would be much appreciated.

Robin
--
 /\/\      Robin Brown (Mr), Computer Scientist
/ / /\     BHP Melbourne Research Laboratories
/ / /  \   245 Wellington Rd Mulgrave Vic 3170 AUSTRALIA
/ / / /\ \ Phone   : +61-3-560-7066
\ \/ / / / Fax     : +61-3-561-6709
 \  / / /  ACSnet  : robinb@merlin.bhpmrl.oz.au
  \/\/\/   Internet: robinb%merlin.bhpmrl.oz.au@uunet.uu.net
krowitz%richter@UMIX.CC.UMICH.EDU (David Krowitz) (01/08/90)
The problem you described with "init" under Sr10 is very similar to what we have encountered here at MIT with several systems. The problem in each case was with the /etc/ttys file (which is read by "init" at boot time or whenever it receives a "hangup" signal from /bin/kill). In our particular cases, the /etc/ttys file enabled a tty line that did not have a device attached to it, or that had an incorrectly configured /etc/getty option. In either case it seems that random noise on the tty line would cause "init" to start growing in size without bound.

-- David Krowitz

krowitz@richter.mit.edu (18.83.0.109)
krowitz%richter.mit.edu@eddie.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)
pcc@apollo.HP.COM (Peter Craine) (01/10/90)
(If this works, I'll be moderately impressed. I've not been able to post to this group for a while.)

In article <1372@merlin.bhpmrl.oz>, robinb@merlin.bhpmrl.oz (Robin Brown) writes:
>
> Has anybody encountered a situation where the init process grows to
> excessive proportions and brings the system to its knees?
>

Several possibilities (all having to do with /etc/ttys):

1) /etc/ttys has a device turned "on", but the device isn't there, and something funny is happening to DTR/CARRIER (pins 20 and 8, respectively).

2) You have the /dev/pty?? devices turned 'on' (NEVER DO THIS).

3) You have incorrectly specified the syntax for "getty".

   Proper example:
       tty01 "/etc/getty std.9600" dumb on secure
   Improper example:
       tty01 /etc/getty std.9600 dumb on secure

   Yes, the quotes are significant. Any field whose contents require whitespace MUST be enclosed in quotes.

What do all of these have in common, and, therefore, why does the system bog down? Well, for every device that has 'on' in the 4th field, init tries to launch the program in the 2nd field (with the specified arguments). When that program exits, SIGCLD (or SIGCHLD, I'm not really certain) gets delivered to init (the parent). Init sees that:

   a) a "getty" process died, and
   b) the line is turned on,

and figures that whoever logged onto the line just logged out, so it must be time to re-launch the process.

>
> Apollo(/HP) have seen this sort of thing before but they don't know
> what causes it. The only known solution is to rebuild the machine,

Who told you this? 50 lashes with a soggy noodle to whoever did.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Peter Craine                + "Sometimes you have to slap them in the face
Hewlett-Packard             +  to get their attention."
Chelmsford Response Center  + *I* don't want my opinions. Why would HP?
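
[Editor's sketch] The three /etc/ttys checks listed in the post above can be roughed out in a few lines of Python. This is only an illustration under stated assumptions, not init's actual parser: the function name check_ttys and the sample entries for tty02, tty03 and ptyp0 are made up for the example; '#' is assumed to start a comment; shell-style quoting (via shlex) is assumed to approximate how the fields are split; and a pseudo-terminal entry is assumed to have a name starting with "pty".

```python
import shlex

def check_ttys(text):
    """Return (line_number, finding) pairs for ttys-style text.

    Findings (names are hypothetical, chosen for this sketch):
      "bad-quoting" - 4th field is not on/off, so the command is probably unquoted
      "pty-on"      - a pty entry is turned on
      "enabled"     - line is on; confirm a device is really attached
    """
    findings = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        try:
            # Assumption: '#' comments and shell-like double quotes.
            fields = shlex.split(raw, comments=True)
        except ValueError:
            findings.append((lineno, "bad-quoting"))  # unbalanced quotes
            continue
        if not fields:
            continue  # blank or comment-only line
        if len(fields) < 4 or fields[3] not in ("on", "off"):
            findings.append((lineno, "bad-quoting"))
            continue
        device, status = fields[0], fields[3]
        if status == "on":
            # Assumption: pty entries are named pty??.
            kind = "pty-on" if device.startswith("pty") else "enabled"
            findings.append((lineno, kind))
    return findings

# Sample input: line 1 is the proper example from the post, line 2 the
# improper one; lines 3 and 4 are invented for illustration.
SAMPLE = '''tty01 "/etc/getty std.9600" dumb on secure
tty02 /etc/getty std.9600 dumb on secure
ptyp0 none network on
tty03 "/etc/getty std.9600" dumb off secure
'''

if __name__ == "__main__":
    for lineno, kind in check_ttys(SAMPLE):
        print(lineno, kind)
```

Without the quotes, the 4th field of the tty02 line becomes "dumb" rather than "on", which is how the sketch spots the improper form. Every line reported as "enabled" still needs a manual check that real hardware is attached.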