[comp.unix.questions] rloginds

enteles@tahoe.unr.edu (Philip Enteles) (10/30/90)

	We are having some trouble at our site and I would like to know 
if anyone has had a similar problem. 
	We are running 4.3 BSD on a Sperry 7000-40. This system is the
backbone of our undergraduate computer science department and has about
500 users. The system itself seems to run fine; however, left to its own
devices it accumulates a number of rlogind processes that become hung.
We are not sure where these processes are coming from, but they only
affect ptys. They stay hung until they are killed. If they are not
killed, at some point they begin to hold all the ptys, and error messages
start appearing when users try to do things like use script or log in
from a dial-up port (like a Sytek box). The error message is
	'no more pty's'
When we reboot or manually kill those processes, everything is fine.
	Checking the device nodes in /dev, we find that the hung ptys have
all their permission bits cleared. The following is a partial listing:

crw-rw-rw-  1 root     wheel      9,   0 Oct 26 10:59 /dev/ttyp0
c---------  1 root     wheel      9,   1 Oct 22 15:45 /dev/ttyp1
c---------  1 root     wheel      9,   2 Oct 22 11:43 /dev/ttyp2
c---------  1 root     wheel      9,   3 Oct 20 14:15 /dev/ttyp3
c---------  1 root     wheel      9,   4 Oct 19 00:20 /dev/ttyp4
c---------  1 root     wheel      9,   5 Oct 19 00:25 /dev/ttyp5
c---------  1 root     wheel      9,   6 Oct 23 08:06 /dev/ttyp6
c---------  1 root     wheel      9,   7 Oct 25 21:44 /dev/ttyp7
crw--w----  1 fran     tty        9,   8 Oct 26 12:11 /dev/ttyp8
crw--w----  1 ed       tty        9,   9 Oct 26 11:27 /dev/ttyp9
c---------  1 root     wheel      9,  10 Oct 23 09:42 /dev/ttypa
crw--w----  1 garav    tty        9,  11 Oct 26 12:11 /dev/ttypb
c---------  1 root     wheel      9,  12 Oct 22 18:40 /dev/ttypc
crw--w----  1 melanie  tty        9,  13 Oct 26 12:05 /dev/ttypd
c---------  1 root     wheel      9,  14 Oct 24 15:40 /dev/ttype
crw-rw-rw-  1 root     wheel      9,  15 Oct 26 12:10 /dev/ttypf
c---------  1 root     wheel      9,  16 Oct 23 10:55 /dev/ttyq0
crw-rw-rw-  1 root     wheel      9,  17 Oct 26 12:11 /dev/ttyq1
crw-rw-rw-  1 root     wheel      9,  18 Oct 26 12:10 /dev/ttyq2
crw-rw-rw-  1 root     wheel      9,  19 Oct 26 12:09 /dev/ttyq3
crw-rw-rw-  1 root     wheel      9,  20 Oct 26 12:09 /dev/ttyq4
crw--w----  1 hong     tty        9,  21 Oct 26 12:11 /dev/ttyq5
crw--w----  1 cheng    tty        9,  22 Oct 26 12:11 /dev/ttyq6
crw--w----  1 woods    tty        9,  23 Oct 26 12:11 /dev/ttyq7
crw-rw-rw-  1 root     wheel      9,  24 Oct 26 12:03 /dev/ttyq8

	The lines owned by root with mode crw-rw-rw- are not being used,
the lines with user names on them are in use, and the lines owned by
root with no permissions set (c---------) are the hung ptys.
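The rule above (owner root, mode c---------) is easy to apply mechanically. As a minimal sketch, assuming that in the ls -l output the mode is the first field, the owner the third, and the device name the last (true of the listing above), the hypothetical sh function hung_ptys below prints each hung pty in the two-character form ps uses in its TT column. It is written for a POSIX sh; on a Bourne shell without functions, the awk pipeline can be used inline.

```shell
# Hypothetical helper: read "ls -l" lines for pty slaves on stdin and
# print the hung ones (owned by root, no permission bits set) in the
# two-character form ps shows in its TT column, e.g. /dev/ttyp1 -> p1.
hung_ptys() {
    awk '$1 == "c---------" && $3 == "root" {
        # last field is the device path; keep its final two characters
        print substr($NF, length($NF) - 1)
    }'
}

# Typical use on the system in question:
#   ls -l /dev/tty[pq]? | hung_ptys
```

Fed the listing above, it would print p1 through p7, pa, pc, pe and q0. Using $NF for the name rather than a fixed column keeps it working whether or not ls prints the group column.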
	An example of the process status follows:

root     11576  0.0  0.0    47     3 p5 IW   0:00 rlogind
root      5505  0.0  0.0    48     3 p2 IW   0:00  (rlogind)
root     26592  0.0  0.0    48     3 q0 IW   0:00  (rlogind)
root     22903  0.0  0.0    48     3 pa IW   0:00  (rlogind)
root     17988  0.0  0.0    48     3 p3 IW   0:00  (rlogind)
root     20143  0.0  0.0    48     3 p6 IW   0:00  (rlogind)
root     21751  0.0  0.0    48     3 pc IW   0:00  (rlogind)
root     11498  0.0  0.0    47     3 p4 IW   0:00 rlogind
Fri Oct 26 10:47:36 PDT 1990

	The processes are idle (ps shows them in state IW, idle and
swapped out), but I don't know what they are waiting for. They aren't
taking any resources except the use of a pty. As long as the system is
rebooted periodically they don't present a problem, but I would like to
know what is causing them and how to fix it, so that the system can be
allowed to run for extended periods with minimal maintenance.
	Please reply by e-mail and I will summarize for the net.
I would like to hear from anyone who has a clue about this.

	thanks
		Philip Enteles

enteles@tahoe.unr.edu

weimer@ssd.kodak.com (Gary Weimer) (11/08/90)

In article <4852@tahoe.unr.edu> enteles@tahoe.unr.edu (Philip Enteles) writes:
>
>	We are having some trouble at our site and I would like to know 
>if anyone has had a similar problem. 
>	We are running 4.3 BSD on a Sperry 7000-40. This system is the
>backbone of our undergraduate computer science department and has about
>500 users. The system itself seems to run fine; however, left to its own
>devices it accumulates a number of rlogind processes that become hung.
>We are not sure where these processes are coming from, but they only
>affect ptys. They stay hung until they are killed. If they are not
>killed, at some point they begin to hold all the ptys, and error messages
>start appearing when users try to do things like use script or log in
>from a dial-up port (like a Sytek box). The error message is
>	'no more pty's'
>When we reboot or manually kill those processes, everything is fine.


This doesn't help you with the PROBLEM, but it may work as a
temporary fix. It is a csh script (with a small awk helper) that
automatically finds and kills these errant rlogind processes, based on
the information you supplied.


Here is the main script, to be called by cron (or manually). Note that
the program runs in test mode by default (see 'set TEST'); move the
comment to the other 'set TEST' line to actually perform the kill.

------------------------- CUT HERE -------------------------
#!/bin/csh -fb
#
#     FILE: fxlogin
#     DESC: fix errant rlogind processes
#

# find directory and name for this program
# (DIR rather than PATH, so the command search path is not shadowed)
set PROGRAM = $0
set DIR     = $PROGRAM:h
set PROGRAM = $PROGRAM:t
if ("$DIR" == "$PROGRAM") set DIR = $cwd

# assume PARSER (used for awk) has same name as this prog with .parse ext
set PARSER  = $DIR/$PROGRAM.parse

set SIG    = -9                   # signal for killing processes
set TMP    = "/tmp/tmp$$"         # tmp file

set TEST   = 1                    # test run of program, don't kill processes
#set TEST   = 0                    # the real thing, DO kill processes

# find which ptys have errant rlogind's and put the list in LIST, using the
# two-character names ps prints in its TT column (e.g. /dev/ttyp1 -> p1)
set LIST = `ls -l /dev/tty[pq]? | awk '$3 == "root" && $1 == "c---------" {print substr($NF,length($NF)-1)}'`

# if no processes to kill, then exit
if ("$LIST" == "") exit

# find all rlogind processes and put them in TMP in '<tty> <pid>' format,
# sorted by <tty>
ps -aux | grep "rlogind" | grep -v "grep" | awk '{print $7 " " $2}' | sort >$TMP

# store pids of errant rlogind's in PIDS
set PIDS = `(echo $LIST; cat $TMP) | awk -f $PARSER`

rm -f $TMP

# nothing to do if no rlogind was found on a hung pty
if ("$PIDS" == "") exit

if ($TEST) then
    echo "In test mode. Would have killed pids:"
    echo "    $PIDS"
else
    kill $SIG $PIDS
endif
------------------------- CUT HERE -------------------------
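To run it from cron, a minimal sketch of an entry, assuming the 4.3BSD-style /usr/lib/crontab format (five time fields followed by a command, run as root; see cron(8) on your system) and a hypothetical install path of /usr/local/etc/fxlogin, that runs it every half hour:

	0,30 * * * * /usr/local/etc/fxlogin

Invoking it by full path also lets fxlogin find fxlogin.parse in the same directory. Remember to switch TEST to 0 first; otherwise it only reports what it would have killed.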

and here is the parser used by the above program:

------------------------- CUT HERE -------------------------
#
#     FILE: fxlogin.parse
#     DESC: awk script used by fxlogin
#
#     first line of input (nt == 0) is sorted list of tty's to kill processes on
#     remaining lines have <tty> <pid>, lines sorted in tty order
#
#     nt      = number of tty's to kill processes of
#     ct      = current tty to kill processes of
#     tty[]   = array of tty's to kill processes of
#
#     NOTE:  does not assume one process per tty

# get tty's to kill processes on
{if (nt == 0) {
		if (NF == 0) exit;
                for (nt=1; nt <= NF; nt++) {
                        tty[nt] = $nt
                }
                nt--;
                ct = 1;
                next
        }
}

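# skip past kill-list tty's that sort before this process's tty
# (this must come before the match test below; otherwise the first
# process on each newly reached tty would be missed)
{while (ct <= nt && tty[ct] < $1) ct++ }

# found a process for tty[ct]
{if (ct <= nt && tty[ct] == $1) print $2 }

# otherwise this tty is not in the kill list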
------------------------- CUT HERE -------------------------
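The parser is a merge of two sorted lists: for each '<tty> <pid>' line it first advances past kill-list entries that sort before that line's tty, then prints the pid if the tty matches. A minimal sketch of the same merge, assuming a POSIX sh and awk and wrapped in a hypothetical select_pids function so it can be tried on sample data, taking the input format fxlogin builds (kill list on the first line, then sorted '<tty> <pid>' lines):

```shell
# Hypothetical sketch of the merge done by fxlogin.parse.
# stdin line 1:  sorted list of hung ttys, e.g. "p2 p3 q0"
# stdin lines 2..: "<tty> <pid>", sorted by tty
# stdout: one pid per line for processes on hung ttys
select_pids() {
    awk '
    NR == 1 {
        # remember the kill list; an empty first line means nothing to do
        if (NF == 0) exit
        for (i = 1; i <= NF; i++) tty[i] = $i
        nt = NF
        ct = 1
        next
    }
    {
        # skip kill-list ttys that sort before the tty on this line
        while (ct <= nt && tty[ct] < $1) ct++
        # then test this same line against the (possibly new) tty[ct]
        if (ct <= nt && tty[ct] == $1) print $2
    }'
}
```

Fed the kill list p2 p3 p4 p5 p6 pa pc q0 and the rlogind processes from the ps listing in the original post, it prints the eight pids shown there, while a process on a tty outside the list (say p8) is skipped.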

I hope you've found the problem and don't need this.

Gary