lijewski@theory.TC.Cornell.EDU (Mike Lijewski) (06/24/91)
Appended below is a daemon, written in Perl, which we use to regulate the rate at which our interactive users can burn cycles. I thought it might be of interest to other production sites. Mike Lijewski (H)607/272-0238 (W)607/254-8686 Cornell National Supercomputer Facility ARPA: mjlx@eagle.tc.cornell.edu BITNET: mjlx@cornellf.bitnet SMAIL: 25 Renwick Heights Road, Ithaca, NY 14850 The readme: Governor is an interactive monitor developed for AIX/370. It was designed to encourage our users to use NQS instead of running large jobs on our interactive site. It runs on our interactive site as a daemon, doing a `ps' every X seconds. It then sorts the `ps' data and checks for non root-owned processes which have been around for the last two `ps's. If it finds such a process, if that process has used more than Y% of the CPU in that interval, if the process' owner isn't "privileged" and if the process has accumulated more than Z seconds, it is "punished". As configured now, the process is killed. If you have renice(2), you might prefer to renice the process or something similar. The user, as well as a list of interested parties, is then sent a mailfile. The governor is table driven, with major parameters read in at startup. Sending the governor a SIGUSR1 will cause it to re-read it's startup file, so you can change parameters while it is running. To get this running on another system you'll need to set the pathname for the startup file, set the options to `ps' to those appropriate for your system, and check that the parsing of `ps' lines does the "right thing" with your version of `ps'. The three UNIX commands it uses are `ps', `mail' and `groups'. If your system doesn't disconnect controlling terminals on setpgrp(2), you might want to modify how governor daemonizes, though since SIGINT, SIGQUIT and SIGTSTP are ignored, it should run OK as is. #!/bin/sh # This is a shell archive. Remove anything before this line, # then feed it into a shell via "sh file" or similar. To overwrite # existing files, type "sh file -c". # # Wrapped by mjlx on Sun Jun 23 16:34:04 EDT 1991 # # Contents: governor gov.table gov.readme # PATH=/bin:/usr/bin:/usr/ucb; export PATH; if test -f 'governor' -a "${1}" != "-c" ; then echo Will not clobber existing file 'governor' else echo Extracting "governor" \(12025 bytes\) sed 's/^X//' > governor <<'END_OF_FILE governor' X#!/usr/local/sys/bin/perl X X# X# governor.pl 2.5 Delta'd: 15:43:06 6/10/91 Mike Lijewski, CNSF X# X# governor - This is a script which monitors the non-root processes X# for processes using "too much" cpu time in a moving X# window of time. It starts itself as a daemon and writes X# a log of those processes it catches to standard output. X# Furthermore, it will kill those processes which are abusing X# the interactive nature of the system, provided they aren't X# privileged to do so. X# X# To start it, updating a log file: X# X# governor >> governor.log 2>&1 - under Bourne shell X# governor >>& governor.log - under C shell X# X# There is no need to put it in the background, as it X# daemonizes itself. X# X X# X# requires X# Xrequire 'ctime.pl'; X X# X# ******************** main routine ******************** X# X X# turn on debugging and warnings X$debugging = 1, $ = 1 if $ARGV[0] eq '-d'; X X# initialization table X$table = $debugging ? '/staff/mjlx/src/perl/gov.table' : '/usr/local/sys/lib/gov.table'; X X# X# Settable global variables - these variables are set in our X# initialization table. If the table changes, all you need to do is X# send the "governor" a SIGUSR1 and it will re-read the table. X# X$sleep_interval; # seconds to sleep between checking process statistics X$cpu_rate; # what we consider excessive cpu utilization X$cpu_threshold; # OK to kill if accumulated more cpu seconds than this X@privileged_groups; # list of privileged groups which we never "govern" X@privileged_users; # list of users we never govern X@maintainers; # list of maintainers of this script X@mailees; # list of other interested parties who receive "kill" notes X X# X# other globals X# X$ps_opts = 'aux'; # options to "ps" X@ps_begin = (); # contains "ps" at beginning of time window X@ps_end = (); # contains "ps" at end of time window X$got_sigusr1 = 0; # set to "true" if we catch a SIGUSR1 X$slept = 0; # how long we slept in main 'sleep' X X# our path - we only need access to "ps", "mail" & "groups" currently. X$ENV{'PATH'} = '/bin:/usr/bin:/usr/ucb/bin'; X X# some signal handlers X$SIG{'INT'} = 'IGNORE'; X$SIG{'QUIT'} = 'IGNORE'; X$SIG{'HANGUP'} = 'IGNORE'; X$SIG{'TSTP'} = 'IGNORE'; X$SIG{'TERM'} = 'cleanup'; X$SIG{'USR1'} = 'IGNORE'; X X# daemonize Xfork && exit; setpgrp(0, $$); close STDIN; close STDOUT; chdir('/'); X Xrequire $table; # read initialization table X X# X# If debugging, nullify these lists. NEVER run with debugging mode X# on as root. You could kill some legitimate processes. X# Xif ($debugging) { X @privileged_groups = @privileged_users = @mailees = (); X @maintainers = ('mjlx'); X} X Xprint STDERR "started at ", &ctime($), "rate: ", sprintf("%4.2f", $cpu_rate), X ", threshold: $cpu_threshold, interval: $sleep_interval\n"; X X&get_process_stats; # get most recent process statistics X Xfor (;;) { # do forever X $SIG{'USR1'} = 'usr1_handler'; X # X # This 'sleep' is the only place in the code where we will X # accept a SIGUSR1. Everywhere else we IGNORE it. If you send a X # SIGUSR1 and don't immediately get a "caught SIGUSR1 ..." in the X # logfile, wait a bit and send it again. X # X $slept = sleep $sleep_interval; X $SIG{'USR1'} = 'IGNORE'; X &reinitialize if $got_sigusr1; # reinitialize if caught SIGUSR1 X @ps_begin = @ps_end; # save previous process statistics X &get_process_stats; # get most recent process statistics X &find_abusers; # find and kill abusive processes X} Xexit 0; X X# X# Two sample "ps aux" lines from AIX/370: X# X# msq 123195 0.0 0.7 2040 1572 qe I> 0:05 emacs Makefile X# sgh 143050 0.0 0.1 396 192 p1 I> 0:00 -csh X# X# Parsing "ps" lines can be a mite tricky because, at least with X# "aux", we aren't guaranteed that there is always a space between X# fields. In particular, if it happens that ps reports 100.0% X# cpu utilization or better, this will run into the pid field. X# X X# X# These two subroutines implement a numerical sort by pid on X# an array of "ps" output of the above form. X# Xsub _by_pid { $pids[$a] <=> $pids[$b]; } Xsub sort_by_pid X{ X local(@pids) = (); X foreach (@_) { /^\S+\s+(\d+)/ && push(@pids, $1); } X @_[sort _by_pid 0..$#pids]; X} X X# X# This subroutine calls ps, deletes all processes owned by root, X# and then sorts it by pid. It writes into the global @ps_end X# and returns nothing useful. X# X# usage: &get_process_stats X# Xsub get_process_stats X{ X open (PS, "ps $ps_opts 2>/dev/null |"); X @ps_end = <PS>; # slurp up the ps output X shift @ps_end; # chop off the header X @ps_end = grep(!/^root\b/, @ps_end); # delete root processes X close PS; X # X # This should "really" be a die since this command should never fail X # in normal operation. Unfortunately, on AIX/370, I've seen the X # following message in my log file: 'sh: 111615 Floating exception'. X # What this means to me is that 'ps' is failing somehow. It is X # probably happening right before a system crash when the machine is X # foobar'd. To be safe however, I'll timestamp the message and make it X # a warning instead. If the machine is actually about to go down, we X # crash. If however it is an obscure bug, not dying is OK since we still X # sort the output of the 'ps'. If there is no output, we won't find any X # killable processes and we'll just go to sleep and try again later. X # X warn "error on `ps' pipe at ", &ctime(time) if $?; X} X X# X# This subroutine finds those processes using "too much" CPU time X# and "punishes" them if they aren't appropriately privileged. X# X# usage: &find_abusers X# Xsub find_abusers X{ X local(@merged_pids) = &sort_by_pid(@ps_begin, @ps_end); # merge old and new X X local($i,$min1,$min2,$sec1,$sec2,$cpu,$uname,$pid1,$pid2,$time1,$time2); X local($cpu1, $cpu2, $time, $ps_line); X X ($pid1, $time1) = $merged_pids[0] =~ /^\S+\s+(\d+).{3}[.][^:]+(\d+:\d+)/; X X for ($i = 1; $i <= $#merged_pids; $i++) { # loop through lines X X ($pid2,$time2)=$merged_pids[$i] =~ /^\S+\s+(\d+).{3}[.][^:]+(\d+:\d+)/; X X # X # if pids are identical and time fields are different X # X if (($pid1 == $pid2) && ($time1 ne $time2)) { X # X # The process has run for $sleep_interval seconds and has X # accumulated some time in the last interval. X # X ($min1, $sec1) = $time1 =~ /(\d+):(\d+)/; X ($min2, $sec2) = $time2 =~ /(\d+):(\d+)/; X # X # We don't know which of $cpu1 or $cpu2 is larger because X # the sort is not stable. X # X $cpu1 = (60 * $min1 + $sec1); X $cpu2 = (60 * $min2 + $sec2); X $cpu = $cpu2 - $cpu1; X $cpu = -$cpu if $cpu < 0; # make sure cpu is positive X if ($cpu / $sleep_interval > $cpu_rate) { X ($uname) = $merged_pids[$i] =~ /^(\S+)\b/; X # X # Process $pid2 accumulated cputime faster than $cpu_rate X # in the previous interval - log it regardless of owner's X # privileges. X # X chop($time = &ctime(time)); X printf STDERR "%19.19s %8s\'s process %8s used %3d seconds last interval\n", X $time, $uname, $pid2, $cpu; X if ($cpu1 > $cpu_threshold || $cpu2 > $cpu_threshold) { X # X # the process is over the "total-cpu-used" threshold X # X ($uname) = $merged_pids[$i] =~ /^(\S+)\b/; X $ps_line = $merged_pids[$cpu1 > $cpu2 ? $i - 1 : $i]; X # X # punish the infidel, unless privileged X # X &punish($pid2,$ps_line,$uname) unless &privileged($uname); X } X } X } X ($pid1, $time1) = ($pid2, $time2); # update last line X } X} X X# X# This subroutine returns true or false depending on whether or not X# the first argument, taken to be a uname, has privileges. X# Specifically, if $uname is in @privileged_users or X# @privileged_groups, they can abuse the system to their hearts content. X# X# usage: &privileged($uname) X# Xsub privileged X{ X return 1 if grep(/\b$_[0]\b/, @privileged_users); X foreach $group (split(' ', `groups $_[0]`)) { X return 1 if grep(/\b$group\b/, @privileged_groups); X } X 0; X} X X# X# In this subroutine we "deal" with cpu abusers. We pass the username X# of the process, the pid of the process and the chronologically last X# "ps" line showing the process. X# X# usage: &punish($pid, $ps_line, $uname); X# Xsub punish X{ X # X # Kill the process and mail the abuser and interested parties a X # note, if we successfully killed it. There is always the X # possibility that the process has exited since the last "ps". X # When running in debug mode under a non root-privileged pid, this X # also keeps us from sending mailfiles to processes we tried to kill, X # but in fact didn't. This is because 'kill' only lets us kill our X # own processes, unless we're root. X # X if (kill 'KILL', $_[0]) { # if the process was successfully killed X &mail_note(@_); # note to the user X printf STDERR "killed %8s's process %8d at %s", $_[2], $_[0], &ctime(time); X &mail_note($_[0], $_[1], @mailees, @maintainers); X } X} X X# X# Routine which mails a note to a collection of recipients, regarding X# a process which abused the system. We pass the pid of the process, X# the chronologically last "ps" line showing the process, and a list X# of recipients. X# X# usage: &mail_note($pid, $ps_line, @recipients); X# Xsub mail_note X{ X open(MAIL, "| mail -s \'your process $_[0] killed\' @_[2..$#_]"); X print MAIL "Your process $_[0] running on eagle was killed because ", X "it was\nconsuming a significant amount of system resources.\n", X "\n\'ps aux\' showed:\n\n$_[1]\n", X "Eagle is for interactive use only. Do not run any large ", X "jobs on eagle.\n\nPlease refer to our policy on interactive ", X "and batch use, available online\nby issuing cnsfinfo. ", X "Select the Policies menu, then the Large.Jobs submenu.\n\n", X "If you have any questions, please send email to ", X "consult@eagle.tc.cornell.edu,\nor call 607-254-8686 and ", X "ask for a consultant."; X close MAIL; X die "error on `mail' pipe at ", &ctime(time) if $?; X} X X# X# This routine is called if we wake up and find that we've X# caught a SIGUSR1. Note that sending a SIGUSR1 will interrupt X# our `sleep'. That is to say, if we wake up and find out that X# $got_sigusr1 is "true", we must check $slept to see how long X# we really slept. X# Xsub reinitialize { X $got_sigusr1 = 0; # reset $got_sigusr1 to "false" X delete $INC{$table}, require $table; # reread initialization table X print STDERR "reinitializing ...rate: ", sprintf("%4.2f", $cpu_rate), X ", threshold: $cpu_threshold, interval: $sleep_interval\n\n"; X if ($slept < $sleep_interval) { X # go to sleep until have slept $sleep_interval seconds X sleep $sleep_interval - $slept; X } X elsif ($slept > $sleep_interval) { X # in this case, we simply start over X &get_process_stats; X sleep $sleep_interval; X } X} X X# X# Deal with SIGUSR1 - set flag indicating that we need to X# reread our initialization file. X# Xsub usr1_handler { X $SIG{'USR1'} = 'IGNORE'; # in case another is sent "really" quickly X print STDERR "caught SIGUSR1 ... "; X $got_sigusr1 = 1; X} X X# X# catch SIGTERM and print out exiting message X# Xsub cleanup { X local($time); X chop($time = &ctime(time)); X print STDERR "caught SIG$_[0] at $time - exiting ...\n"; X exit 0; X} END_OF_FILE governor if test 12025 -ne `wc -c <'governor'`; then echo 'governor' unpacked with wrong size! fi fi chmod 700 governor if test -f 'gov.table' -a "${1}" != "-c" ; then echo Will not clobber existing file 'gov.table' else echo Extracting "gov.table" \(717 bytes\) sed 's/^X//' > gov.table <<'END_OF_FILE gov.table' X# X# governor's initialization table X# X# each line in this file must be a legitimate perl line X# X# gov.table 1.3 Delta'd: 15:36:19 6/10/91 Mike Lijewski, CNSF X# X X$sleep_interval = 300; # seconds to sleep between checking process statistics X X$cpu_rate = .2; # what we consider excessive cpu utilization X X$cpu_threshold = 300; # OK to kill if accumulated more cpu seconds than this X X# groups we don't mess with X@privileged_groups = ('staff','operator','root','daemon','bin','sys','adm','mail','notes','tty'); X X# users we don't mess with X@privileged_users = (); X X# maintainers of this script X@maintainers = ('mjlx'); X X# people to mail once a user is caught X@mailees = ('consult', 'frv', 'pdb', 'len'); X END_OF_FILE gov.table if test 717 -ne `wc -c <'gov.table'`; then echo 'gov.table' unpacked with wrong size! fi fi chmod 400 gov.table if test -f 'gov.readme' -a "${1}" != "-c" ; then echo Will not clobber existing file 'gov.readme' else echo Extracting "gov.readme" \(1481 bytes\) sed 's/^X//' > gov.readme <<'END_OF_FILE gov.readme' XGovernor is an interactive monitor developed for AIX/370. It was Xdesigned to encourage our users to use NQS instead of running large Xjobs on our interactive site. It runs on our interactive site as a Xdaemon, doing a `ps' every X seconds. It then sorts the `ps' data Xand checks for non root-owned processes which have been around for Xthe last two `ps's. If it finds such a process, if that process has Xused more than Y% of the CPU in that interval, if the process' owner Xisn't "privileged" and if the process has accumulated more than Z Xseconds, it is "punished". As configured now, the process is Xkilled. If you have renice(2), you might prefer to renice the Xprocess or something similar. The user, as well as a list of Xinterested parties, is then sent a mailfile. The governor is table Xdriven, with major parameters read in at startup. Sending the Xgovernor a SIGUSR1 will cause it to re-read it's startup file, so you Xcan change parameters while it is running. To get this running on Xanother system you'll need to set the pathname for the startup file, Xset the options to `ps' to those appropriate for your system, and Xcheck that the parsing of `ps' lines does the "right thing" with your Xversion of `ps'. The three UNIX commands it uses are `ps', `mail' Xand `groups'. If your system doesn't disconnect controlling Xterminals on setpgrp(2), you might want to modify how governor Xdaemonizes, though since SIGINT, SIGQUIT and SIGTSTP are ignored, it Xshould run OK as is. END_OF_FILE gov.readme if test 1481 -ne `wc -c <'gov.readme'`; then echo 'gov.readme' unpacked with wrong size! fi fi chmod 600 gov.readme exit 0 -- Mike Lijewski (H)607/272-0238 (W)607/254-8686 Cornell National Supercomputer Facility ARPA: mjlx@eagle.tc.cornell.edu BITNET: mjlx@cornellf.bitnet SMAIL: 25 Renwick Heights Road, Ithaca, NY 14850