[comp.lang.perl] Program to look for CPU hogs

ted@evi.com (Ted Stefanik) (05/04/91)

Some time back, I wrote a Perl program to look for CPU hogs.  I was going to
post it, but got distracted by paying work.

Since lijewski@theory.tn.cornell.edu (Mike Lijewski) just posted a program
that does much the same thing, I thought I'd post mine.  (Sans full
documentation, however.)

The program, Scan_PS, looks at the "ps auxww" output and prints out any
processes that have used more than some threshold of CPU time.  It then
stores the ps info for the hogs in a "memories" log.  Thereafter, a hog is
reported again only each time it gobbles a further increment of CPU time
(a second threshold).
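
The two-threshold rule can be sketched roughly as follows.  This is only an
illustration in present-day Perl 5, not code from the archive: the sub name
should_report and its arguments are made up here, and the 600-second figures
are sample values.  A process that is not yet in the log is reported once its
total CPU time reaches the first threshold; a process already in the log is
reported again once it has used a further increment beyond the logged time.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Scan_PS): decide whether a process
# should be reported on this run.
#   $cpu_now    - CPU seconds the process has used so far
#   $cpu_logged - CPU seconds recorded in the log, or 0 if not logged
#   $first      - threshold for the first report
#   $step       - further CPU seconds needed before each later report
sub should_report {
    my ($cpu_now, $cpu_logged, $first, $step) = @_;
    my $needed = $cpu_logged == 0 ? $first : $step;
    return ($cpu_now - $cpu_logged) >= $needed ? 1 : 0;
}

# Sample values only: 10 minutes for both thresholds.
print should_report( 300,   0, 600, 600), "\n";   # under the first threshold
print should_report( 700,   0, 600, 600), "\n";   # first report
print should_report(1000, 700, 600, 600), "\n";   # only 300 more seconds used
print should_report(1400, 700, 600, 600), "\n";   # 700 more seconds used
```

With these sample numbers the four calls print 0, 1, 0 and 1.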

Scan_PS was designed to be kicked off from cron on a regular basis
(such as hourly).  The program Scan_Hour runs Scan_PS on each of our
workstations (via rsh), rolls the results into one report, and mails it out.
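
The roll-up step might look something like the sketch below.  It is an
illustration in present-day Perl 5 rather than the code in the archive: the
sub name roll_up, the shortened banner, and the sample output line are all
made up, and the real Scan_Hour gathers each host's output over rsh instead of
taking it as an argument.  The point is only that hosts with nothing to report
are left out of the combined report.

```perl
use strict;
use warnings;

# Hypothetical helper: build one report from per-host output.
# $by_host maps a host name to a reference to its list of output lines.
sub roll_up {
    my ($by_host) = @_;
    my @report;
    for my $host (sort keys %$by_host) {
        my @lines = @{ $by_host->{$host} };
        next unless @lines;                  # nothing to report on this host
        push @report, "*** On $host: ***\n", @lines, "\n";
    }
    return @report;
}

# Made-up input: sys1 is quiet, sys2 has one line of output.
my @report = roll_up({
    sys1 => [],
    sys2 => ["(sample ps line for a hog)\n"],
});
print @report;
```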

Some neat Scan_PS features:

   1) Automatic recognition of "bad guy" processes.  (Our Ultrix 3.2 leaves
      CPU-burning hung processes around; Scan_PS could be made to automatically
      gun these processes.)
   2) Different thresholds for system, normal, and "bad guy" processes.
   3) Different primary and secondary thresholds for individual programs,
      if desired.
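
Features 2 and 3 boil down to a threshold lookup, sketched here in present-day
Perl 5.  The sub name threshold_for and the string type names are made up for
this illustration (the archive uses numeric type codes); the 10-minute unit,
the multipliers, and the "perl" and "newton" figures are taken from the
constants at the top of Scan_PS.  A per-program entry, if there is one, wins
over the per-type default.

```perl
use strict;
use warnings;

my $alert_unit = 10 * 60;     # base threshold, in CPU seconds
my %mult = (normal => 1, system => 4, badguy => 1);   # per-type multiplier

# Per-program overrides (just two of the entries, as samples).
my %first = (perl => 100 * 60, newton => 50 * 60);    # first report
my %step  = (perl => 200 * 60, newton => 25 * 60);    # later reports

# Hypothetical helper: CPU seconds a process must use before it is reported.
#   $already_logged is true if the process is in the log from an earlier run.
sub threshold_for {
    my ($name, $type, $already_logged) = @_;
    my $override = $already_logged ? $step{$name} : $first{$name};
    return defined $override ? $override : $alert_unit * $mult{$type};
}

print threshold_for('perl',   'normal', 0), "\n";   # 6000
print threshold_for('newton', 'normal', 1), "\n";   # 1500
print threshold_for('rwhod',  'system', 0), "\n";   # 2400
print threshold_for('vi',     'normal', 1), "\n";   # 600
```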

Unfortunately, you must customize Scan_PS at both the top and bottom
of the program (constants at the top, and process recognizer at the bottom).
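
For a feel of what the process recognizer involves, here is a cut-down sketch
in present-day Perl 5.  The sub name classify and the string return values are
made up for this illustration; the patterns are a small subset of the ones in
Scan_PS, and you would swap in whatever counts as a system process or a known
offender at your site.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a process by the command shown by ps.
sub classify {
    my ($proc) = @_;

    # System processes: a few fixed names plus anything run from
    # /etc or /usr/lib.
    return 'system'
        if $proc eq 'swapper'
        || $proc eq 'rwhod'
        || $proc =~ m{^/etc/}
        || $proc =~ m{^/usr/lib/};

    # Known offenders: login shells shown as "-csh (csh)", "-sh (sh)", etc.
    return 'badguy' if $proc =~ m{^-[a-z/]*sh \(c?sh\)$};

    return 'normal';
}

print classify('/etc/inetd'), "\n";     # system
print classify('-csh (csh)'), "\n";     # badguy
print classify('vi notes.txt'), "\n";   # normal
```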

Scan_PS was inspired by perl/eg/scan/scan_ps in the standard Perl distribution.

I hope this is helpful to somebody!


#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file".  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  Scan_Hour Scan_PS
# Wrapped by ted@blt on Fri May  3 12:18:04 1991
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'Scan_Hour' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'Scan_Hour'\"
else
echo shar: Extracting \"'Scan_Hour'\" \(1774 characters\)
sed "s/^X//" >'Scan_Hour' <<'END_OF_FILE'
X#!/usr/bin/perl
X
X#
X# Scan_Hour - Does hourly (more or less) checks on all workstations, looking
X#             for CPU hogs.
X#
X# Created by: Ted Stefanik <ted@evi.com>
X#             February, 1991
X#
X# Copyright 1991 by Ted Stefanik and Expert Views, Inc.  All Rights Reserved.
X#
X# Permission to use, copy, modify, and distribute this software and its
X# documentation for any purpose and without fee is hereby granted, provided
X# that the above copyright notice appear in all copies.
X#
X# This code is distributed in the hope that it will be useful, but WITHOUT ANY
X# WARRANTY.  Ted Stefanik and Expert Views, Inc. disclaim all warranties with
X# regard to this software, including all implied warranties of merchantability
X# and fitness, and in no event shall be liable for any special, indirect or
X# consequential damages or any damages whatsoever resulting from loss of use,
X# data or profits arising out of or in connection with the use or performance
X# of this software.
X#
X
X
X#
X# First, set up our manifest constants
X#
X@Systems =   ("sys1", "sys2", "sys3", "sys4", "sys5", "sys6", "sys7");
X@AlertList = ("ted", "ned");
X$Master = "/usr/local/lib/perl";
X
X#
X# Next, read in the current system status
X#
Xforeach $sys (@Systems)
X{
X   if (open(Ps, "rsh $sys -n $Master/Scan_PS $Master |"))
X   {
X      @systat = <Ps>;
X      close(Ps);
X   }
X   else
X   {
X      @systat = "**** Can't run Scan_PS on $sys ****";
X   }
X
X   if ($#systat != -1)
X   {
X      push(@results,
X      "***************************** On $sys: *****************************\n",
X      @systat,
X      "********************************************************************\n",
X      "\n\n");
X   }
X}
X
Xif ($#results != -1)
X{
X   open(MH, "| /usr/ucb/mail @AlertList");
X   print MH "~sOink Alert!\n";
X   print MH @results;
X}
END_OF_FILE
if test 1774 -ne `wc -c <'Scan_Hour'`; then
    echo shar: \"'Scan_Hour'\" unpacked with wrong size!
fi
chmod +x 'Scan_Hour'
# end of 'Scan_Hour'
fi
if test -f 'Scan_PS' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'Scan_PS'\"
else
echo shar: Extracting \"'Scan_PS'\" \(6597 characters\)
sed "s/^X//" >'Scan_PS' <<'END_OF_FILE'
X#!/usr/bin/perl
X
X#
X# Scan_PS - A program to look for CPU hogs and looping processes
X#
X# Created by: Ted Stefanik <ted@evi.com>
X#             February, 1991
X#
X# Copyright 1991 by Ted Stefanik and Expert Views, Inc.  All Rights Reserved.
X#
X# Permission to use, copy, modify, and distribute this software and its
X# documentation for any purpose and without fee is hereby granted, provided
X# that the above copyright notice appear in all copies.
X#
X# This code is distributed in the hope that it will be useful, but WITHOUT ANY
X# WARRANTY.  Ted Stefanik and Expert Views, Inc. disclaim all warranties with
X# regard to this software, including all implied warranties of merchantability
X# and fitness, and in no event shall be liable for any special, indirect or
X# consequential damages or any damages whatsoever resulting from loss of use,
X# data or profits arising out of or in connection with the use or performance
X# of this software.
X#
X
X
X#
X# First, set up our manifest constants
X#
X$Master = shift(@ARGV);      # First argument is "memories" directory
X
X$AlertUnit = 10 * 60;        # 10 minutes
X
X$Normal  = 0;                # Enumeration for type of process
X$SysProc = 1;
X$BadGuys = 2;
X
X@Mult = (1, 4, 1);           # AlertUnit Multiplier for each type
X
X%Init = ("sqlexec",  100*60, # Special processes - initial hog report time
X         "isql",     50*60,
X         "gawk",     100*60,
X         "awk",      100*60,
X         "perl",     100*60,
X         "new2old",  100*60,
X         "rtgenerate", 50*60,
X         "newton",   50*60,
X         "Xmfbpmax", 100*60,
X         "Xcfbpmax", 100*60,
X         "epoch",    50*60,
X         "tee",      50*60);
X%Step = ("sqlexec",  200*60, # Special processes - incremental hog report time
X         "gawk",     200*60,
X         "isql",     50*60,
X         "awk",      200*60,
X         "perl",     200*60,
X         "new2old",  200*60,
X         "rtgenerate", 50*60,
X         "newton",   25*60,
X         "Xmfbpmax", 100*60,
X         "Xcfbpmax", 100*60,
X         "epoch",    50*60,
X         "tee",      50*60);
X
Xchop($hostname = `hostname`);
X$histfile = "oldhogs" . ".$hostname";
X
X
X#
X# Second, read in the old hogs log
X#
Xchdir("$Master/Hogs")|| die "Can't cd to memories: $!\n";
Xif (open(OH, $histfile))
X{
X   &ScanPSList(*OH, *OldPS,*OldPIDs,*OldTimes,*OldTypes,*OldNames, *OldIdx);
X}
X   
X
X#
X# Next, read in the current system status
X#
Xopen(Ps, '/bin/ps auxww |') || die "scan_ps: can't run ps: $!\n";
X&ScanPSList(*Ps, *NewPS,*NewPIDs,*NewTimes,*NewTypes,*NewNames, *NewIdx);
X
X
X#
X# Then, scan the current system status looking for hogs
X#
Xfor ($i = 1;  $i <= $#NewPIDs;  $i++)   # Element 0 is banner
X{
X   $oldTime = 0;
X
X   if (defined($OldIdx{$NewPIDs[$i]}))
X   {
X      $oldTime = $OldTimes[$OldIdx{$NewPIDs[$i]}];
X   }
X
X   $elapse = $NewTimes[$i] - $oldTime;
X   $incr = $AlertUnit * $Mult[$NewTypes[$i]];
X   if ($oldTime == 0 && defined($Init{$NewNames[$i]}))
X   {
X      $incr = $Init{$NewNames[$i]};
X   }
X   if ($oldTime != 0 && defined($Step{$NewNames[$i]}))
X   {
X      $incr = $Step{$NewNames[$i]};
X   }
X
X   if ($elapse >= $incr)
X   {
X      push(@HogList, $i);
X      push(@HogNorm, $i) if ($NewTypes[$i] == $Normal);
X      push(@HogSys,  $i) if ($NewTypes[$i] == $SysProc);
X      push(@HogBad,  $i) if ($NewTypes[$i] == $BadGuys);
X   }
X}
X
X
X#
X# Penultimately, print out the hog list
X#
Xif ($#HogList != -1)
X{
X   print "System load:\n   ", `uptime`;
X}
X&PrintBad(*HogSys,
X          "System processes.", "(We generally shouldn't have to kill these.)");
X&PrintBad(*HogBad,
X          "Previously known offenders.",
X          "(If I had permission, I'd kill them automatically.)");
X&PrintBad(*HogNorm,
X          "Normal processes.",
X"(If a particular program shows up here a lot, add it to the offender list.)");
X
X
X#
X# Lastly, save the current hogs as the "old hog log" for next time
X#
Xopen(OH, ">" . $histfile)  || die "scan_ps: can't write to history file: $!\n";
Xforeach $i (@HogList)
X{
X   print OH "$NewPS[$i]\n";
X}
XOld: foreach $i (@OldPIDs)
X{
X   next Old if (!defined($NewIdx{$i}));
X   for $j (@HogList)
X   {
X      next Old if ($NewPIDs[$j] == $i);
X   }
X   print OH "$OldPS[$OldIdx{$i}]\n";
X}
Xclose(OH);
X
X
X#
X# Read a process status list
X#
Xsub ScanPSList
X{
X   local(*fh, *lines, *pids, *times, *types, *names, *idx) = @_;
X   local($i, $pid, $elapse, $type, $proc, $cmds, @toks, @name);
X
X   for ($i = 0;  <fh>;  $i++)
X   {
X       chop;
X       $lines[$i] = $_;
X       &ScanPSEnt(*pid, *elapse, *type, *proc, $_);
X       $pids[$i] = $pid;
X       $times[$i] = $elapse;
X       $types[$i] = $type;
X       @toks = split(/[ \t]/o, $proc);
X       @name = split(m|/|o, $toks[0]);
X       $names[$i] = pop(@name);
X       $idx{$pid} = $i;
X   }
X   close(fh);
X
X   return(undef);
X}
X
X
X#
X# Decode a single process status line
X#
Xsub ScanPSEnt
X{ 
X   local(*pid, *elapse, *type, *proc, $line) = @_;
X   local($user, $cpu, $mem, $sz, $rss, $tt, $stat, $time, $min, $sec);
X
X   $user = substr($line,  0,  8);   # Use substr instead of split, because
X   $pid  = substr($line,  9,  5);   #   fields can overflow and get stuck
X   $cpu  = substr($line, 14,  5);   #   together; then split thinks that
X   $mem  = substr($line, 19,  5);   #   two fields are a single field!
X   $sz   = substr($line, 24,  5);
X   $rss  = substr($line, 29,  5);
X   $tt   = substr($line, 35,  2);
X   $stat = substr($line, 38,  3);
X   $time = substr($line, 41,  7);
X   $proc = substr($line, 49);
X
X   ($min, $sec) = split(/:/o, $time);
X   $elapse  = $min * 60 + $sec;
X
X   $type = $Normal;
X
X   if ($proc eq "swapper"                                      ||
X       $proc eq "pagedaemon"                                   ||
X       $proc =~ m|^/etc/|o                                     ||
X       $proc =~ m|^/usr/lib/|o                                 ||
X       $proc =~ m|^/usr/local/gbin/X[mc]fbpmax|o               ||
X       $proc eq "/usr/local/gbin/xdm"                          ||
X       $proc eq "/usr/local/gbin/xdsxdm"                       ||
X       $proc =~ /^-\w*:\w+(\.\w+)? \(xdm\)$/o                  ||
X       $proc =~ /^- \w+\.[0-9]+ (console|tty\w+) \(getty\)$/o  ||
X       $proc =~ /^rdump -/o                                    ||
X       $proc eq "rwhod"                                        ||
X       $proc eq "routed")
X   {
X      $type = $SysProc;
X   }
X   elsif ($proc =~ m|^-[a-z/]*sh \(c?sh\)$|o)
X   {
X      $type = $BadGuys;
X   }
X
X   return(undef);
X}
X
X
X
X#
X# Print out a hogs table
X#
Xsub PrintBad
X{
X   local(*bl, $msg1, $msg2) = @_;
X
X   return(undef) if ($#bl == -1);
X
X   print "\n\n$msg1\n   $msg2\n\n$NewPS[0]\n";
X
X   foreach $i (@bl)
X   {
X      print "$NewPS[$i]\n";
X   }
X
X   return(undef);
X}
END_OF_FILE
if test 6597 -ne `wc -c <'Scan_PS'`; then
    echo shar: \"'Scan_PS'\" unpacked with wrong size!
fi
chmod +x 'Scan_PS'
# end of 'Scan_PS'
fi
echo shar: End of shell archive.
exit 0