[mod.computers.vax] Job Queue Monitor

jmleonar@CRDC.ARPA ("Dr. Joseph M. Leonard") (05/21/86)

Increased system load has led me to consider splitting the batch queue into
a fast and a slow queue.  Before I do this, I want some kind of detached
process that can (a) determine the job queue of a batch job, (b) determine
the amount of CPU time consumed and (c) notify me of jobs that use more
than a preset time limit.  This would enable me to "enforce" the distinction
between the two queues.
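To make requirements (a) through (c) concrete, here is a minimal sketch of the check such a monitor would run on each pass. It assumes the queue name and consumed CPU time for every batch job have already been collected somehow; on VMS that would come from the queue manager and process information, which this sketch does not attempt. The `BatchJob` record, the queue names `FAST_BATCH`/`SLOW_BATCH`, and the limits are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BatchJob:
    # Hypothetical job record; a real monitor would fill these fields
    # from the system's queue and process information.
    entry: int
    user: str
    queue: str
    cpu_seconds: float


# Hypothetical per-queue CPU limits, in seconds.
QUEUE_LIMITS = {
    "FAST_BATCH": 10 * 60,        # fast queue: 10 CPU minutes
    "SLOW_BATCH": 24 * 60 * 60,   # slow queue: 24 CPU hours
}


def over_limit(jobs, limits):
    """(a) look up each job's queue, (b) compare its CPU time to that
    queue's limit, and return the jobs that exceed it."""
    offenders = []
    for job in jobs:
        limit = limits.get(job.queue)
        if limit is not None and job.cpu_seconds > limit:
            offenders.append(job)
    return offenders


def report(offenders):
    """(c) format one notification line per offending job."""
    return [
        f"Job {j.entry} ({j.user}) on {j.queue}: {j.cpu_seconds:.0f}s CPU"
        for j in offenders
    ]


if __name__ == "__main__":
    jobs = [
        BatchJob(101, "SMITH", "FAST_BATCH", 45.0),
        BatchJob(102, "JONES", "FAST_BATCH", 1800.0),
        BatchJob(103, "BROWN", "SLOW_BATCH", 7200.0),
    ]
    for line in report(over_limit(jobs, QUEUE_LIMITS)):
        print(line)
```

A detached process would wrap this in a loop that sleeps between passes. It would also remember which jobs it has already reported, so that the same job does not trigger a notification on every pass.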

If my terminology has not given me away, I'm running VMS 4.3 (with 4.4
expected in the near future).  Please reply to me directly if you have an
idea or two, and I'll summarize if there is a lot of response (or a lot of
different responses).

                                      Thanks in advance,

                                            Joe Leonard
                                         <jmleonar@crdc.arpa>

JMS@ARIZMIS.BITNET.UUCP (05/24/86)

In re: the fellow who wanted to have two batch queues, with some sort
of administrative system where the system tells him who the hogs
are, and he can squash them.

It all depends on your system philosophy.  In order to provide a
minimum of friction and let the system 'run itself,' the University
of Arizona has a time limit on batch queues.  Thus, whatever image
is executing when the time limit expires aborts, and the whole
batch job goes down the drain.  There's a nice informative message
in the batch log for jobs that hit the wall time-limit-wise.

I believe that it will only take once for batch queue users to get
this message before they begin to use the slower queue for long
jobs.

The beauty of letting VMS do the work is that the system does the correction
and not the 'system manager.'  Our experience is that the less
a 'system manager' tells the people who own the VAX what to do with
their resources, the happier the people who own the VAX (often
called 'my users' by system managers) will be.

On a related note, it's good to put some reasonable time limit on
batch queues ANYWAY; in a University environment without heavy OR
people, 24 hours is as good a limit as any.  Then, someone who lets
loose a batch job with an infinite loop in it (and doesn't know
how to kill the job, or doesn't realize it is stuck) is self-corrected.  Typically,
we establish a DEFAULT maximum CPU time of 24 hours and an
ABSOLUTE MAXIMUM CPU time of 1 week.  Thus, unless you KNOW your
job is going to run long, you don't need to pay attention to CPULIMIT switches,
and the system protects itself and you.
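The default-plus-ceiling policy described above can be sketched as a small rule. The job gets the default limit unless the user asks for a specific one, and any request is capped at the absolute maximum. This models the policy as stated in the post, not the exact VMS implementation. The function names are hypothetical, and the 24-hour and 1-week figures are the Arizona values quoted above.

```python
from typing import Optional

HOUR = 60 * 60
DEFAULT_CPU = 24 * HOUR       # default CPU limit from the post: 24 hours
MAXIMUM_CPU = 7 * 24 * HOUR   # absolute maximum from the post: 1 week


def effective_cpu_limit(requested: Optional[int],
                        default: int = DEFAULT_CPU,
                        maximum: int = MAXIMUM_CPU) -> int:
    """CPU limit (seconds) applied to a job: the default if the user
    asked for nothing, otherwise the request, never above the maximum."""
    limit = default if requested is None else requested
    return min(limit, maximum)


def should_abort(cpu_used: int, limit: int) -> bool:
    """True once the job has consumed its whole CPU allowance."""
    return cpu_used >= limit


if __name__ == "__main__":
    print(effective_cpu_limit(None))            # user ignored the switch
    print(effective_cpu_limit(3 * 24 * HOUR))   # asked for 3 days
    print(effective_cpu_limit(30 * 24 * HOUR))  # asked for 30 days, capped
```

The design point is the one the post makes. A user who never touches the switch still gets a bounded job, so a runaway loop dies after a day rather than running indefinitely.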

jms

Joel M Snyder
University of Arizona Department of MIS
Tucson, Arizona 85721  (602) 621-2748
JMS@ARIZMIS.BITNET