[comp.sys.apollo] memory hungry init?

robinb@merlin.bhpmrl.oz (Robin Brown) (01/08/90)

Has anybody encountered a situation where the init process grows to
excessive proportions and brings the system to its knees?

What follows is some ps stats on the init process on one of our
machines that has this problem (a DN3500 running sr10.1)

USER       PID   SZ  RSS TTY     STAT  TIME COMMAND  before reboot
root         1 21600   54 ?       S <   6:05 init

USER       PID   SZ  RSS TTY     STAT  TIME COMMAND  immediately after
root         1 1376  708 ?       S <   0:23 init

USER       PID   SZ  RSS TTY     STAT  TIME COMMAND  a bit later on
root         1 1408  664 ?       S <   0:23 init

USER       PID   SZ  RSS TTY     STAT  TIME COMMAND  ditto
root         1 2144   70 ?       S <   0:36 init

USER       PID   SZ  RSS TTY     STAT  TIME COMMAND  the next day
root         1 10272   70 ?       S <   2:48 init

At this point response time was significantly longer than normal and
the system had to be rebooted.

Apollo(/HP) have seen this sort of thing before but they don't know
what causes it.  The only known solution is to rebuild the machine,
which (as I'm sure you can appreciate) I don't want to have to do,
at least not until I lay my hands on a copy of sr10.2.

Any help would be much appreciated.

Robin

-- 
     /\/\       Robin Brown (Mr), Computer Scientist
    / / /\      BHP Melbourne Research Laboratories
   / / /  \     245 Wellington Rd Mulgrave Vic 3170 AUSTRALIA
  / / / /\ \    Phone   :  +61-3-560-7066
  \ \/ / / /    Fax     :  +61-3-561-6709
   \  / / /     ACSnet  :  robinb@merlin.bhpmrl.oz.au
    \/\/\/      Internet:  robinb%merlin.bhpmrl.oz.au@uunet.uu.net

krowitz%richter@UMIX.CC.UMICH.EDU (David Krowitz) (01/08/90)

The problem you described with "init" under Sr10 is very similar
to what we have encountered here at MIT with several systems.
The problem in each case was with the /etc/ttys file (which is
read by "init" at boot time or whenever it receives a "hangup"
signal from /bin/kill). In our particular cases, the /etc/ttys
file enabled a tty line that did not have a device attached to
it, or that had an incorrectly configured /etc/getty option.
In either case it seems that random noise on the tty line 
would cause "init" to start growing in size without bound.


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter.mit.edu@eddie.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)

pcc@apollo.HP.COM (Peter Craine) (01/10/90)

  (If this works, I'll be moderately impressed.  I've not been able to post
to this group for a while).

In article <1372@merlin.bhpmrl.oz>, robinb@merlin.bhpmrl.oz (Robin
Brown) writes:
> 
> Has anybody encountered a situation where the init process grows to
> excessive proportions and brings the system to its knees?
>

Several possibilities (all having to do with /etc/ttys):

    1) /etc/ttys has a device turned "on", but the device isn't
       there, and something funny is happening to DTR/CARRIER
       (pins 20 and 8, respectively).

    2) You have the /dev/pty?? devices turned 'on' (NEVER DO THIS).

    3) You have incorrectly specified the syntax for "getty".
       Proper example:
          tty01  "/etc/getty std.9600" dumb on secure
       Improper example:
          tty01  /etc/getty std.9600 dumb on secure
       Yes, the quotes are significant.  Any field whose contents require
       whitespace MUST be enclosed in quotes.
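
To see why the quotes matter, here is a minimal Python sketch that splits
the two example lines the way a quote-aware parser would.  It assumes
shell-like quoting as implemented by the standard `shlex` module; how init
on sr10 actually tokenizes /etc/ttys is not stated in this thread, and
`split_ttys_line` is a hypothetical helper name used only for illustration.

```python
import shlex

def split_ttys_line(line):
    """Split one /etc/ttys-style entry into fields, honouring double quotes.

    Assumption: shell-like quoting rules; the real init parser may differ.
    """
    return shlex.split(line)

proper = 'tty01  "/etc/getty std.9600" dumb on secure'
improper = 'tty01  /etc/getty std.9600 dumb on secure'

if __name__ == "__main__":
    for label, line in (("proper", proper), ("improper", improper)):
        fields = split_ttys_line(line)
        # Fields are numbered from 1 in the discussion, so field 2 is
        # fields[1] and field 4 is fields[3].
        print(label, "field 2:", repr(fields[1]), "field 4:", repr(fields[3]))
```

With the quoted line, field 2 is the whole getty command and field 4 is
"on".  Without the quotes every later field shifts right by one, so field 2
is just "/etc/getty" and field 4 holds "dumb" rather than the on/off status.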

What do all of these have in common, and, therefore, why does the system
bog down?  Well, for every device that has 'on' in the 4th field, init
tries to launch the program in the 2nd field (with the specified arguments).
When that program exits, SIGCLD (or SIGCHLD, I'm not really certain) gets
delivered to init (the parent).  Init sees that:
    a) a "getty" process died, and
    b) the line is turned on,
and figures that whoever logged onto the line just logged out, so it must
be time to re-launch the process.
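
The relaunch cycle described above can be sketched in a few lines of
Python.  This is only an illustration of the respawn logic under stated
assumptions, not the actual init code: `respawn_loop` and `failing_getty`
are hypothetical names, the child is a stand-in that exits at once (as a
getty might with no device or with bad arguments), and `max_restarts` is a
safety cap added for the demo.  The sketch shows the busy relaunching only;
it does not reproduce the growth in init's size.

```python
import subprocess
import sys
import time

def respawn_loop(argv, max_restarts=5):
    """Relaunch argv every time it exits, as described for an 'on' line.

    Returns (number of launches, elapsed seconds).  max_restarts is a cap
    for this demo only; the thread describes no such limit in init.
    """
    start = time.monotonic()
    launches = 0
    while launches < max_restarts:
        child = subprocess.Popen(argv)
        launches += 1
        child.wait()  # the parent learns that the child has exited
    return launches, time.monotonic() - start

# Hypothetical stand-in for a getty that fails immediately.
failing_getty = [sys.executable, "-c", "raise SystemExit(1)"]

if __name__ == "__main__":
    launches, elapsed = respawn_loop(failing_getty)
    print(f"{launches} launches in {elapsed:.2f}s")
```

If the child dies as soon as it starts, the loop never blocks for long, so
the parent spends its time launching processes instead of waiting for a
login session to end.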


> 
> Apollo(/HP) have seen this sort of thing before but they don't know
> what causes it.  The only known solution is to rebuild the machine,

Who told you this?  50 lashes with a soggy noodle to whoever did.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    Peter Craine                +  "Sometimes you have to slap them in the face
    Hewlett-Packard             +       to get their attention."
    Chelmsford Response Center  +  *I* don't want my opinions.  Why would HP?