[comp.lang.perl] interactive governor in Perl

lijewski@theory.TC.Cornell.EDU (Mike Lijewski) (06/24/91)

Appended below is a daemon, written in Perl, which we use to regulate the
rate at which our interactive users can burn cycles.  I thought it might be
of interest to other production sites.

Mike Lijewski  (H)607/272-0238 (W)607/254-8686
Cornell National Supercomputer Facility
ARPA: mjlx@eagle.tc.cornell.edu  BITNET: mjlx@cornellf.bitnet
SMAIL:  25 Renwick Heights Road, Ithaca, NY  14850

The readme:

Governor is an interactive monitor developed for AIX/370.  It was
designed to encourage our users to use NQS instead of running large
jobs on our interactive site.  It runs on our interactive site as a
daemon, doing a `ps' every X seconds.  It then sorts the `ps' data
and checks for non root-owned processes which have been around for
the last two `ps's.  If it finds such a process, if that process has
used more than Y% of the CPU in that interval, if the process' owner
isn't "privileged" and if the process has accumulated more than Z
seconds, it is "punished".  As configured now, the process is
killed.  If you have renice(2), you might prefer to renice the
process or something similar.  The user, as well as a list of
interested parties, is then sent a mailfile.  The governor is table
driven, with major parameters read in at startup.  Sending the
governor a SIGUSR1 will cause it to re-read it's startup file, so you
can change parameters while it is running.  To get this running on
another system you'll need to set the pathname for the startup file,
set the options to `ps' to those appropriate for your system, and
check that the parsing of `ps' lines does the "right thing" with your
version of `ps'.  The three UNIX commands it uses are `ps', `mail'
and `groups'.  If your system doesn't disconnect controlling
terminals on setpgrp(2), you might want to modify how governor
daemonizes, though since SIGINT, SIGQUIT and SIGTSTP are ignored, it
should run OK as is.


#!/bin/sh
# This is a shell archive.  Remove anything before this line,
# then feed it into a shell via "sh file" or similar. To overwrite
# existing files, type "sh file -c".
#
# Wrapped by mjlx on Sun Jun 23 16:34:04 EDT 1991
#
# Contents: governor gov.table gov.readme
#
PATH=/bin:/usr/bin:/usr/ucb; export PATH;
if test -f 'governor' -a "${1}" != "-c" ; then
  echo Will not clobber existing file 'governor'
else
  echo Extracting "governor" \(12025 bytes\)
  sed 's/^X//' > governor <<'END_OF_FILE governor'
X#!/usr/local/sys/bin/perl
X
X#
X# governor.pl 2.5   Delta'd: 15:43:06 6/10/91   Mike Lijewski, CNSF
X#
X# governor - This is a script which monitors the non-root processes
X#            for processes using "too much" cpu time in a moving
X#            window of time.  It starts itself as a daemon and writes
X#            a log of those processes it catches to standard output.
X#            Furthermore, it will kill those processes which are abusing
X#            the interactive nature of the system, provided they aren't
X#            privileged to do so.
X#
X#            To start it, updating a log file:
X#
X#                governor >> governor.log 2>&1     - under Bourne shell
X#                governor >>& governor.log         - under C shell
X#
X#            There is no need to put it in the background, as it
X#            daemonizes itself. 
X#
X
X#
X# requires
X#
Xrequire 'ctime.pl';
X
X#
X# ******************** main routine ********************
X#
X
X# turn on debugging and warnings
X$debugging = 1, $ = 1 if $ARGV[0] eq '-d';
X
X# initialization table
X$table = $debugging ? '/staff/mjlx/src/perl/gov.table' : '/usr/local/sys/lib/gov.table';
X
X#
X# Settable global variables - these variables are set in our
X# initialization table.  If the table changes, all you need to do is
X# send the "governor" a SIGUSR1 and it will re-read the table.
X#
X$sleep_interval;    # seconds to sleep between checking process statistics
X$cpu_rate;          # what we consider excessive cpu utilization
X$cpu_threshold;     # OK to kill if accumulated more cpu seconds than this
X@privileged_groups; # list of privileged groups which we never "govern"
X@privileged_users;  # list of users we never govern
X@maintainers;       # list of maintainers of this script
X@mailees;           # list of other interested parties who receive "kill" notes
X
X#
X# other globals
X#
X$ps_opts        = 'aux'; # options to "ps"
X@ps_begin       = ();    # contains "ps" at beginning of time window
X@ps_end         = ();    # contains "ps" at end of time window
X$got_sigusr1    = 0;     # set to "true" if we catch a SIGUSR1
X$slept          = 0;     # how long we slept in main 'sleep'
X
X# our path - we only need access to "ps", "mail" & "groups" currently.
X$ENV{'PATH'} = '/bin:/usr/bin:/usr/ucb/bin';
X
X# some signal handlers
X$SIG{'INT'}    = 'IGNORE';
X$SIG{'QUIT'}   = 'IGNORE';
X$SIG{'HANGUP'} = 'IGNORE';
X$SIG{'TSTP'}   = 'IGNORE';
X$SIG{'TERM'}   = 'cleanup';
X$SIG{'USR1'}   = 'IGNORE';
X
X# daemonize
Xfork && exit; setpgrp(0, $$); close STDIN; close STDOUT; chdir('/');
X
Xrequire $table;   # read initialization table
X
X#
X# If debugging, nullify these lists.  NEVER run with debugging mode
X# on as root.  You could kill some legitimate processes.
X#
Xif ($debugging) {
X    @privileged_groups = @privileged_users = @mailees = ();
X    @maintainers = ('mjlx');
X}
X
Xprint STDERR "started at ", &ctime($), "rate: ", sprintf("%4.2f", $cpu_rate),
X      ", threshold: $cpu_threshold, interval: $sleep_interval\n";
X
X&get_process_stats;  # get most recent process statistics
X
Xfor (;;) {  # do forever
X    $SIG{'USR1'}   = 'usr1_handler';
X    #
X    # This 'sleep' is the only place in the code where we will
X    # accept a SIGUSR1.  Everywhere else we IGNORE it.  If you send a
X    # SIGUSR1 and don't immediately get a "caught SIGUSR1 ..." in the
X    # logfile, wait a bit and send it again.
X    #
X    $slept = sleep $sleep_interval;
X    $SIG{'USR1'}   = 'IGNORE';
X    &reinitialize if $got_sigusr1;   # reinitialize if caught SIGUSR1
X    @ps_begin = @ps_end;             # save previous process statistics
X    &get_process_stats;              # get most recent process statistics
X    &find_abusers;                   # find and kill abusive processes
X}
Xexit 0;
X
X#
X# Two sample "ps aux" lines from AIX/370:
X#
X# msq       123195  0.0  0.7 2040 1572      qe I>    0:05 emacs Makefile
X# sgh       143050  0.0  0.1  396  192      p1 I>    0:00 -csh
X#
X# Parsing "ps" lines can be a mite tricky because, at least with
X# "aux", we aren't guaranteed that there is always a space between
X# fields.  In particular, if it happens that ps reports 100.0%
X# cpu utilization or better, this will run into the pid field.
X#
X
X#
X# These two subroutines implement a numerical sort by pid on
X# an array of "ps" output of the above form.
X#
Xsub _by_pid { $pids[$a] <=> $pids[$b]; } 
Xsub sort_by_pid
X{
X    local(@pids) = ();
X    foreach (@_) { /^\S+\s+(\d+)/ && push(@pids, $1); } 
X    @_[sort _by_pid 0..$#pids];
X}
X
X#
X# This subroutine calls ps, deletes all processes owned by root,
X# and then sorts it by pid.  It writes into the global @ps_end
X# and returns nothing useful.
X#
X# usage: &get_process_stats
X#
Xsub get_process_stats
X{
X    open (PS, "ps $ps_opts 2>/dev/null |");
X    @ps_end = <PS>;  # slurp up the ps output
X    shift @ps_end;   # chop off the header
X    @ps_end = grep(!/^root\b/, @ps_end);    # delete root processes
X    close PS;
X    #
X    # This should "really" be a die since this command should never fail
X    # in normal operation.  Unfortunately, on AIX/370, I've seen the
X    # following message in my log file: 'sh: 111615 Floating exception'.
X    # What this means to me is that 'ps' is failing somehow.  It is
X    # probably happening right before a system crash when the machine is
X    # foobar'd.  To be safe however, I'll timestamp the message and make it
X    # a warning instead.  If the machine is actually about to go down, we
X    # crash.  If however it is an obscure bug, not dying is OK since we still
X    # sort the output of the 'ps'.  If there is no output, we won't find any
X    # killable processes and we'll just go to sleep and try again later.
X    #
X    warn "error on `ps' pipe at ", &ctime(time) if $?;
X}
X
X#
X# This subroutine finds those processes using "too much" CPU time
X# and "punishes" them if they aren't appropriately privileged.
X#
X# usage: &find_abusers
X#
Xsub find_abusers
X{
X    local(@merged_pids) = &sort_by_pid(@ps_begin, @ps_end); # merge old and new
X
X    local($i,$min1,$min2,$sec1,$sec2,$cpu,$uname,$pid1,$pid2,$time1,$time2);
X    local($cpu1, $cpu2, $time, $ps_line);
X
X    ($pid1, $time1) = $merged_pids[0] =~ /^\S+\s+(\d+).{3}[.][^:]+(\d+:\d+)/;
X
X    for ($i = 1; $i <= $#merged_pids; $i++) {  # loop through lines
X
X        ($pid2,$time2)=$merged_pids[$i] =~ /^\S+\s+(\d+).{3}[.][^:]+(\d+:\d+)/;
X
X        #
X        # if pids are identical and time fields are different
X        #
X        if (($pid1 == $pid2) && ($time1 ne $time2)) {
X            #
X            # The process has run for $sleep_interval seconds and has
X            # accumulated some time in the last interval.
X            #
X            ($min1, $sec1) = $time1 =~ /(\d+):(\d+)/;
X            ($min2, $sec2) = $time2 =~ /(\d+):(\d+)/;
X            #
X            # We don't know which of $cpu1 or $cpu2 is larger because
X            # the sort is not stable.
X            #
X            $cpu1 = (60 * $min1 + $sec1);
X            $cpu2 = (60 * $min2 + $sec2);
X            $cpu = $cpu2 - $cpu1;
X            $cpu = -$cpu if $cpu < 0;  # make sure cpu is positive
X            if ($cpu / $sleep_interval  > $cpu_rate) {
X                ($uname) = $merged_pids[$i] =~ /^(\S+)\b/;
X                #
X                # Process $pid2 accumulated cputime faster than $cpu_rate
X                # in the previous interval - log it regardless of owner's
X                # privileges.
X                #
X                chop($time = &ctime(time));
X                printf STDERR "%19.19s %8s\'s process %8s used %3d seconds last interval\n",
X                       $time, $uname, $pid2, $cpu;
X                if ($cpu1 > $cpu_threshold || $cpu2 > $cpu_threshold) {
X                    #
X                    # the process is over the "total-cpu-used" threshold 
X                    #
X                    ($uname) = $merged_pids[$i] =~ /^(\S+)\b/;
X                    $ps_line = $merged_pids[$cpu1 > $cpu2 ? $i - 1 : $i];
X                    #
X                    # punish the infidel, unless privileged
X                    #
X                    &punish($pid2,$ps_line,$uname) unless &privileged($uname);
X                }
X            }
X        }
X	($pid1, $time1) = ($pid2, $time2); # update last line
X    }
X}
X
X#
X# This subroutine returns true or false depending on whether or not
X# the first argument, taken to be a uname, has privileges.
X# Specifically, if $uname is in @privileged_users or
X# @privileged_groups, they can abuse the system to their hearts content.  
X#
X# usage: &privileged($uname)
X#
Xsub privileged
X{
X    return 1 if grep(/\b$_[0]\b/, @privileged_users);
X    foreach $group (split(' ', `groups $_[0]`)) { 
X        return 1 if grep(/\b$group\b/, @privileged_groups);
X    }
X    0;
X}
X
X#
X# In this subroutine we "deal" with cpu abusers. We pass the username
X# of the process, the pid of the process and the chronologically last
X# "ps" line showing the process.
X#
X# usage: &punish($pid, $ps_line, $uname);
X#
Xsub punish
X{
X    #
X    # Kill the process and mail the abuser and interested  parties a
X    # note, if we successfully killed it.  There is always the
X    # possibility that the process has exited since the last "ps".
X    # When running in debug mode under a non root-privileged pid, this
X    # also keeps us from sending mailfiles to processes we tried to kill,
X    # but in fact didn't.  This is because 'kill' only lets us kill our
X    # own processes, unless we're root.
X    #
X    if (kill 'KILL', $_[0]) {  # if the process was successfully killed
X        &mail_note(@_);   # note to the user
X        printf STDERR "killed %8s's process %8d at %s", $_[2], $_[0], &ctime(time);
X        &mail_note($_[0], $_[1], @mailees, @maintainers);
X    }
X}    
X
X#
X# Routine which mails a note to a collection of recipients, regarding
X# a process which abused the system.  We pass the pid of the process,
X# the chronologically last "ps" line showing the process, and a list
X# of recipients.
X#
X# usage: &mail_note($pid, $ps_line, @recipients);
X#
Xsub mail_note
X{
X    open(MAIL, "| mail -s \'your process $_[0] killed\' @_[2..$#_]");
X    print MAIL "Your process $_[0] running on eagle was killed because ",
X               "it was\nconsuming a significant amount of system resources.\n",
X               "\n\'ps aux\' showed:\n\n$_[1]\n",
X               "Eagle is for interactive use only.  Do not run any large ",
X               "jobs on eagle.\n\nPlease refer to our policy on interactive ",
X               "and batch use, available online\nby issuing cnsfinfo.  ",
X               "Select the Policies menu, then the Large.Jobs submenu.\n\n",
X               "If you have any questions, please send email to ",
X               "consult@eagle.tc.cornell.edu,\nor call 607-254-8686 and ",
X               "ask for a consultant.";
X    close MAIL;
X    die "error on `mail' pipe at ", &ctime(time) if $?;
X}
X
X#
X# This routine is called if we wake up and find that we've
X# caught a SIGUSR1.  Note that sending a SIGUSR1 will interrupt
X# our `sleep'.  That is to say, if we wake up and find out that
X# $got_sigusr1 is "true", we must check $slept to see how long
X# we really slept.
X# 
Xsub reinitialize {
X    $got_sigusr1 = 0;  # reset $got_sigusr1 to "false"
X    delete $INC{$table}, require $table;    # reread initialization table
X    print STDERR "reinitializing ...rate: ", sprintf("%4.2f", $cpu_rate),
X          ", threshold: $cpu_threshold, interval: $sleep_interval\n\n";
X    if ($slept < $sleep_interval) {
X        # go to sleep until have slept $sleep_interval seconds
X        sleep $sleep_interval - $slept;
X    }
X    elsif ($slept > $sleep_interval) {
X        # in this case, we simply start over
X        &get_process_stats;
X        sleep $sleep_interval;
X    }
X}
X
X#
X# Deal with SIGUSR1 - set flag indicating that we need to
X# reread our initialization file.
X#
Xsub usr1_handler {
X    $SIG{'USR1'}   = 'IGNORE';  # in case another is sent "really" quickly
X    print STDERR "caught SIGUSR1 ... ";
X    $got_sigusr1 = 1;
X}
X    
X#
X# catch SIGTERM and print out exiting message
X#
Xsub cleanup {
X    local($time);
X    chop($time = &ctime(time));
X    print STDERR "caught SIG$_[0] at $time - exiting ...\n";
X    exit 0;
X}
END_OF_FILE governor
  if test 12025 -ne `wc -c <'governor'`; then
    echo 'governor' unpacked with wrong size!
  fi
fi
chmod 700 governor
if test -f 'gov.table' -a "${1}" != "-c" ; then
  echo Will not clobber existing file 'gov.table'
else
  echo Extracting "gov.table" \(717 bytes\)
  sed 's/^X//' > gov.table <<'END_OF_FILE gov.table'
X#
X# governor's initialization table
X#
X# each line in this file must be a legitimate perl line
X#
X# gov.table 1.3   Delta'd: 15:36:19 6/10/91   Mike Lijewski, CNSF
X#
X
X$sleep_interval = 300;  # seconds to sleep between checking process statistics
X
X$cpu_rate       = .2;   # what we consider excessive cpu utilization
X
X$cpu_threshold  = 300;  # OK to kill if accumulated more cpu seconds than this
X
X# groups we don't mess with
X@privileged_groups = ('staff','operator','root','daemon','bin','sys','adm','mail','notes','tty');
X
X# users  we don't mess with
X@privileged_users  = ();
X
X# maintainers of this script
X@maintainers = ('mjlx');
X
X# people to mail once a user is caught
X@mailees = ('consult', 'frv', 'pdb', 'len'); 
X
END_OF_FILE gov.table
  if test 717 -ne `wc -c <'gov.table'`; then
    echo 'gov.table' unpacked with wrong size!
  fi
fi
chmod 400 gov.table
if test -f 'gov.readme' -a "${1}" != "-c" ; then
  echo Will not clobber existing file 'gov.readme'
else
  echo Extracting "gov.readme" \(1481 bytes\)
  sed 's/^X//' > gov.readme <<'END_OF_FILE gov.readme'
XGovernor is an interactive monitor developed for AIX/370.  It was
Xdesigned to encourage our users to use NQS instead of running large
Xjobs on our interactive site.  It runs on our interactive site as a
Xdaemon, doing a `ps' every X seconds.  It then sorts the `ps' data
Xand checks for non root-owned processes which have been around for
Xthe last two `ps's.  If it finds such a process, if that process has
Xused more than Y% of the CPU in that interval, if the process' owner
Xisn't "privileged" and if the process has accumulated more than Z
Xseconds, it is "punished".  As configured now, the process is
Xkilled.  If you have renice(2), you might prefer to renice the
Xprocess or something similar.  The user, as well as a list of
Xinterested parties, is then sent a mailfile.  The governor is table
Xdriven, with major parameters read in at startup.  Sending the
Xgovernor a SIGUSR1 will cause it to re-read it's startup file, so you
Xcan change parameters while it is running.  To get this running on
Xanother system you'll need to set the pathname for the startup file,
Xset the options to `ps' to those appropriate for your system, and
Xcheck that the parsing of `ps' lines does the "right thing" with your
Xversion of `ps'.  The three UNIX commands it uses are `ps', `mail'
Xand `groups'.  If your system doesn't disconnect controlling
Xterminals on setpgrp(2), you might want to modify how governor
Xdaemonizes, though since SIGINT, SIGQUIT and SIGTSTP are ignored, it
Xshould run OK as is.
END_OF_FILE gov.readme
  if test 1481 -ne `wc -c <'gov.readme'`; then
    echo 'gov.readme' unpacked with wrong size!
  fi
fi
chmod 600 gov.readme
exit 0
-- 
Mike Lijewski  (H)607/272-0238 (W)607/254-8686
Cornell National Supercomputer Facility
ARPA: mjlx@eagle.tc.cornell.edu  BITNET: mjlx@cornellf.bitnet
SMAIL:  25 Renwick Heights Road, Ithaca, NY  14850