[bit.listserv.vmxa-l] Looking for a way to limit interactive CPU utilization

MONTANAN@EVALUN11.BITNET (Rogelio Montanyana) (02/07/90)

Dear colleagues:

We have a 3090 that is actively being used by a very wide community of
people in our University. As System Manager I am responsible for providing
a fair share of computing resources to all of them.

We are experiencing a big problem because we have some number crunchers who
will use as much CPU power as they can get, and as far as I know there is
no way in VM/XA SP2 to limit the maximum CPU time for an interactive user.

In an attempt to avoid the typical situation where every user executes
his/her program interactively (while disconnected) instead of using the batch
queues, we have modified the priorities (SHARE parameter in XA) in order to
make batch executions faster than interactive ones. This solution has the
undesired side effect of effectively reducing the performance for users
that actually do interactive work (short executions, compilations, for
instance).

I have been thinking about writing a REXX program that would run in a
disconnected service machine, and whose main task would be to have a look
at the system from time to time (every three minutes, for example); that
program would follow (using the CP INDICATE QUEUE command) those CPU bound
users, and would reduce their SHARE progressively; of course, certain users
(e.g., batch machines) would be excluded from the process.
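
A minimal sketch of the proposed policy, written in Python rather than REXX
so the logic can be checked in isolation. It is not an existing exec. The
step size, the floor, the default SHARE of 100, and the rule that a user who
stops being CPU bound is restored at once are all assumptions; the proposal
only says the SHARE is reduced progressively and that some users are
excluded. How the CPU-bound users are identified (for example by parsing CP
INDICATE QUEUE output) is left out.

```python
NORMAL_SHARE = 100  # assumed default relative SHARE
STEP = 10           # assumed reduction applied on each scan
FLOOR = 1           # assumed lowest relative SHARE ever assigned


def next_share(current, cpu_bound, excluded=False):
    """Return the relative SHARE a user should have after one scan."""
    if excluded:
        return current                     # e.g. batch machines are left alone
    if cpu_bound:
        return max(FLOOR, current - STEP)  # progressive reduction
    return NORMAL_SHARE                    # assumed: restore once well behaved


def scan(shares, cpu_bound_users, exclusions):
    """Apply one scan to a dict mapping user ID to relative SHARE."""
    return {
        user: next_share(share, user in cpu_bound_users, user in exclusions)
        for user, share in shares.items()
    }
```

Calling scan() once per interval (every three minutes in the proposal) with
the current set of CPU-bound users walks a persistent cruncher down to the
floor while leaving excluded machines untouched.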

Because I do not want to reinvent the wheel, I first want to ask if anybody
has already written something similar. I would also be very glad to
accept comments and suggestions from other people.

Thanks to everybody,
Rogelio Montanana
System Manager
Valencia University
BITNET: MONTANAN at EVALUN11

RWWMAINT@MSU.BITNET (Rich Wiggins) (02/07/90)

>
>I have been thinking about writing a REXX program that would run in a
>disconnected service machine, and whose main task would be to have a look
>at the system from time to time (every three minutes, for example); that
>program would follow (using the CP INDICATE QUEUE command) those CPU bound
>users, and would reduce their SHARE progressively; of course, certain users
>(e.g., batch machines) would be excluded from the process.
>
>Because I do not want to reinvent the wheel, I first want to ask if anybody
>has already written something similar. I would also be very glad to
>accept comments and suggestions from other people.

We have exactly the same sort of diverse mix, and exactly the same
problem you detail: If our CPU crunchers would run via batch only,
our load would be quite manageable, even though we run at 100%
CPU utilization most of the time.

Since they don't, I've implemented a simple little Rexx tool to do
exactly what you propose.  It's called QSCAN.  It scans through
IND POS lists to find people who seem to be getting lots of Q3
time, and slowly degrades their priority, eventually to be worse
than that of the batch workers.

This works fairly well.  It cuts way down on the amount of CPU
delivered to interactive crunchers.  However, some folks don't
get the hint, and on a busy day even the degraded crunchers
have an impact.

We did this under HPO 4.2.  I'll be glad to send the exec along,
especially if I get an XA-SP version back in return!

/Rich Wiggins
 Computer Lab
 Michigan State U

CMS2@ETSU.BITNET (Bill Williams) (02/08/90)

Just out of curiosity, does anybody use (or has anybody used) SCI's VMMONITOR under VM/XA
to control user resource consumption -- CPU, SIO, etc.?
If so, how's it working out?  Do you like it?  Any big problems?
Comments?
----------
    B.R.Wms

BOEHEIM@SLACVM.BITNET (Chuck Boeheim) (02/08/90)

REPLY TO 02/06/90 18:30 FROM MONTANAN@EVALUN11.BITNET "Rogelio Montanyana":
Looking for a way to limit interactive CPU utilization

We have such a system in place at SLAC.  It's more than a single
exec, and not at all documented, but you can probably understand
it from reading the execs in fairly short order.  It uses
monitor data from RTM/SMART, and so would only be of use to you
if you have that product.

The execs find out from SMART which users have exceeded a
threshold of CPU use averaged over an interval.  We use 10%
over 10 minutes with fairly good results.  This tends to
allow people to have quick spikes of activity without penalty
and catches only the ones that crunch for a while.  Under HPO it
changes the user's priority from our standard of 50 to 99.
Under XA it sets their relative SHARE from 100 to 5.  If they're
good in the next interval, it sets it back.

It also has an exclusion list, a notification list, and logging.
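
The XA rule described above could be sketched roughly as follows. This is
an illustration in Python, not the SLAC execs: the function names and the
shape of the input (a list of CPU-percent samples per user for one interval)
are assumptions, and collecting the samples from RTM/SMART is not shown.
Only the 10% threshold, the relative SHARE values of 100 and 5, and the
exclusion list come from the description.

```python
THRESHOLD_PCT = 10.0  # average CPU use that triggers the penalty
NORMAL_SHARE = 100    # standard relative SHARE under XA
PENALTY_SHARE = 5     # relative SHARE while penalized


def average(samples):
    """Mean of the samples, or 0.0 when there are none."""
    return sum(samples) / len(samples) if samples else 0.0


def review_interval(usage, exclusions=frozenset()):
    """Decide each user's relative SHARE for the next interval.

    usage maps user ID -> CPU-percent samples for one interval
    (hypothetical input format, e.g. one sample per minute).
    """
    decisions = {}
    for user, samples in usage.items():
        if user in exclusions:
            continue  # excluded users are never touched
        over = average(samples) > THRESHOLD_PCT
        decisions[user] = PENALTY_SHARE if over else NORMAL_SHARE
    return decisions
```

Because the decision uses the interval average, a user with one short spike
stays at the normal SHARE, and a penalized user returns to it after one
quiet interval.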

If you would like this, AS IS, send me mail at BOEHEIM at SLACVM.

-Chuck Boeheim

Z00NER01@AWIUNI11.BITNET (Ernst Neuwirth) (02/08/90)

Dear Mr. Montanyana,

We have written a REXX program that does the following:

It does an INDICATE QUEUE every 10 seconds and sets the relative share
of *normal* users depending on the elapsed (real) time since the start of the
transaction. For the first 4 minutes of real time a user gets 100 relative
share, then 80, 60, 50, etc. After more than 300 minutes the relative
share will be 1. Users in Q1 and Q2 get relative share 100.
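
A rough sketch of such a step schedule, in Python rather than REXX. Only the
first step (100 for the first 4 minutes), the order 80, 60, 50, the final
value of 1 after more than 300 minutes, and the Q1/Q2 rule are taken from the
description above. The intermediate breakpoints and the values 25 and 10 are
hypothetical placeholders, since the description abbreviates them as "etc."

```python
# (upper limit in elapsed minutes, relative share)
# Breakpoints 10, 30, 60, 120 and shares 25, 10 are HYPOTHETICAL.
SCHEDULE = [
    (4, 100),
    (10, 80),
    (30, 60),
    (60, 50),
    (120, 25),
    (300, 10),
]
FINAL_SHARE = 1  # after more than 300 minutes


def relative_share(elapsed_minutes, queue):
    """Relative share for a normal user, given the elapsed real time
    since the start of the transaction and the scheduler queue."""
    if queue in ("Q1", "Q2"):
        return 100  # users in Q1 and Q2 always get 100
    for limit, share in SCHEDULE:
        if elapsed_minutes <= limit:
            return share
    return FINAL_SHARE
```

For example, relative_share(2, "Q3") gives 100 and relative_share(301, "Q3")
gives 1; the values in between depend entirely on the placeholder table.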

We have used this in production since about April 1989 and it has worked fine
in moving all long production work to the batch system.

Batch machines have a relative share of 50, so they do not normally
influence interactive work.


Ernst Neuwirth, University of Vienna Computer Center.