[comp.sys.hp] Lots of zombies on HP-UX

tml@hemuli.atk.vtt.fi (Tor Lillqvist) (07/06/89)

We are experiencing lots of zombie processes on an HP9000 Series 800
running HP-UX 3.1 (the same occurred also in 2.1 and 3.01).  They are
all children of remshd (rshd in BSD) processes (which have no other
children).  All these remshds are sleeping on selwait.  We have a
configuration with a bunch of Series 300 workstations running X
clients on the 840.  Right now, for instance, there are 25 of these
zombies after three days of uptime, with perhaps ten active
workstation users.

What could be the problem?  Is there any cure, except writing a perl
script that scans ps now and then, and kills off remshd processes with
a zombie child? 
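Here is a rough version of that ps-scanning workaround. It is written as a shell function using awk rather than perl, and the name find_stuck_remshd is hypothetical. It assumes System V style `ps -ef` columns (PID in field 2, PPID in field 3) and zombies listed as `<defunct>`, so check both against your own ps first. Killing such a remshd lets init inherit and reap the zombie. It also drops that remsh connection, so it treats the symptom rather than the cause.

```shell
#!/bin/sh
# find_stuck_remshd (hypothetical name): read `ps -ef` output on stdin and
# print the PID of every remshd that has a zombie ("<defunct>") child.
# Assumes System V style columns: PID in field 2, PPID in field 3.
find_stuck_remshd() {
    awk '
        NR > 1 {                        # skip the ps header line
            line[$2] = $0               # remember each process line by PID
            if ($0 ~ /<defunct>/)
                zparent[$3] = 1         # note the parent of each zombie
        }
        END {
            for (p in zparent)
                if (line[p] ~ /remshd/) # only report remshd parents
                    print p
        }'
}

# Review the list before acting on it, e.g.:
#   ps -ef | find_stuck_remshd
#   ps -ef | find_stuck_remshd | xargs kill
```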
-- 
Tor Lillqvist
Technical Research Centre of Finland, Computing Services (VTT/ATK)
tml@hemuli.atk.vtt.fi [130.188.52.2]

jack@hpindda.HP.COM (Jack Repenning) (07/11/89)

I've seen persistent zombie children of remshd when people remsh
processes that they put into background, e.g.:

	       remsh m840 hpterm -display $DISPLAY \& &

If this is happening, you should find active processes belonging to
the same users (the hpterm, in the example).  These are the
programs started by the zombie (which was a shell).

Unfortunately, they'll be a little hard to track down from ps:
although they'll be in the same process group as the zombie, ps
doesn't show that, and they'll show a PPID of 1 (they have been
inherited by init).
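You can reproduce the hang locally without remsh, which may help when experimenting. In this sketch, the function name held is made up for illustration. `cat` stands in for remshd reading the connection, the ( ... ) subshell stands in for the remote shell, and `sleep` stands in for the backgrounded hpterm. The subshell exits at once. However, cat gets no EOF until sleep exits, because sleep inherited the pipe as its stdout. Presumably remshd behaves the same way: it stays in select() until the shell's output channels reach EOF, and until then nothing reaps the exited shell.

```shell
#!/bin/sh
# held SECONDS (hypothetical name): local model of the remsh/remshd hang.
# cat plays remshd, the subshell plays the remote shell, and the
# background sleep plays hpterm.  The subshell exits immediately, but the
# background sleep still holds the pipe as its stdout, so cat sees no EOF
# (and held does not return) until sleep exits.
held() {
    ( sleep "$1" & ) | cat
}
```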

If this is your problem, then to make these zombies go away (and the
remshd, and the remsh back on the workstation as well), you need to
arrange to close stdin, stdout, and stderr on the "real" process
(again, the hpterm in the example).
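A local model along the same lines shows the cure. Here the background job's stdin, stdout and stderr are redirected to /dev/null. Nothing holds the pipe, so cat (standing in for remshd) sees EOF as soon as the subshell exits. The commented remsh line applies the idea to the example above, but it has not been tested. The function name detached is hypothetical. The sh -c wrapper is there because the remote login shell may be csh, which does not understand 2>&1.

```shell
#!/bin/sh
# detached SECONDS (hypothetical name): cat plays remshd, the subshell
# plays the remote shell, and the background sleep plays hpterm.  Because
# the background job's stdin, stdout and stderr point at /dev/null, it does
# not inherit the pipe, and cat sees EOF as soon as the subshell exits.
detached() {
    ( sleep "$1" </dev/null >/dev/null 2>&1 & ) | cat
}

# Untested sketch of the same fix for the remsh example:
#   remsh m840 "sh -c 'hpterm -display $DISPLAY </dev/null >/dev/null 2>&1 &'"
```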

Jack Repenning
jack@hpda.hp.com

scott@grlab.UUCP (Scott Blachowicz) (07/15/89)

/ grlab:comp.sys.hp / jack@hpindda.HP.COM (Jack Repenning) /  1:19 pm  Jul 10, 1989 /
> I've seen persistent zombie children of remshd when people remsh
> processes that they put into background, e.g.:
> 
>         remsh m840 hpterm -display $DISPLAY \& &
> 
> If this is happening, you should find active processes belonging to
> the same users (the hpterm, in the example).  These will be the
> program started by the zombie (which was a shell).

This isn't exactly the same problem you're seeing, but...
I got tired of seeing remshd sitting in memory waiting on things, so
I wrote a remsh replacement (I call it qremsh) that just starts the
remote command without waiting and exits. It closes stdin/out/err
before starting the remote command (it uses inetd). It was an
exercise both in network programming and in conserving memory. I'm
not sure whether I'm missing something, but it seems to work fine
for what I want to do (hpterm, xload, etc.), since none of them care
about the stdin/out/err they inherit.

Let me know if you want it & I'll try to package it up.

Scott Blachowicz
USPS:  Graphicus                UUCP:    ...!hpubvwa!grlab!scott
       150 Lake Str S, #206     VoicePh: 206/828-4691
       Kirkland, WA 98033       FAX:     206/828-4236