MONTANAN@EVALUN11.BITNET (Rogelio Montanyana) (02/07/90)
Dear colleagues:

We have a 3090 that is actively used by a very wide community of people at our University. As System Manager, I am responsible for providing a fair share of computing resources to all of them. We are experiencing a big problem because some number crunchers would use as much CPU power as they can get, and as far as I know there is no way in VM/XA SP2 to limit the maximum CPU time of an interactive user.

In an attempt to avoid the typical situation where every user runs his/her program interactively (while disconnected) instead of using the batch queues, we have modified the priorities (the SHARE parameter in XA) to make batch executions faster than interactive ones. This solution has the undesired side effect of reducing performance for users who actually do interactive work (short executions and compilations, for instance).

I have been thinking about writing a REXX program that would run in a disconnected service machine and whose main task would be to look at the system from time to time (every three minutes, for example). The program would track CPU-bound users (using the CP INDICATE QUEUE command) and progressively reduce their SHARE; of course, certain users (e.g., batch machines) would be excluded from the process.

Because I do not want to reinvent the wheel, I first want to ask whether anybody has already written something similar. I would also be very glad to receive comments and suggestions from other people.

Thanks to everybody,
Rogelio Montanana
System Manager, Valencia University
BITNET: MONTANAN at EVALUN11
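The polling policy proposed above can be sketched as a single decision step per poll. This is a minimal Python illustration, not REXX and not anyone's actual exec; it assumes the list of CPU-bound users has already been extracted from CP INDICATE QUEUE output by some other step, and the function name, the halving factor, the floor of 1 and the default share of 100 are hypothetical choices. Issuing the actual CP SET SHARE commands is left out.

```python
# Hypothetical sketch: one polling cycle of a progressive SHARE reducer.
# Parsing CP INDICATE QUEUE output and issuing CP SET SHARE are assumed to
# happen elsewhere; this function only decides the new relative shares.

DEFAULT_SHARE = 100  # assumed normal relative share
FLOOR_SHARE = 1      # assumed lowest relative share to hand out


def reduce_shares(shares, cpu_bound, excluded, factor=0.5):
    """Return a new {userid: relative_share} map after one cycle.

    shares: current relative shares; users not listed have DEFAULT_SHARE
    cpu_bound: userids judged CPU bound in this cycle
    excluded: userids (e.g. batch machines) that are never touched
    factor: hypothetical multiplier applied once per cycle
    """
    new_shares = dict(shares)  # users not CPU bound keep their share
    for user in cpu_bound:
        if user in excluded:
            continue  # batch machines keep whatever share they have
        current = new_shares.get(user, DEFAULT_SHARE)
        new_shares[user] = max(FLOOR_SHARE, int(current * factor))
    return new_shares
```

Calling it three times with CRUNCH1 listed as CPU bound moves its share from 100 to 50, 25 and then 12, while an excluded BATCH01 is never changed. Whether users should also be restored once they stop crunching is left open here, as it is in the proposal.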
RWWMAINT@MSU.BITNET (Rich Wiggins) (02/07/90)
>
>I have been thinking about writing a REXX program that would run in a
>disconnected service machine, and whose main task would be to have a look
>at the system from time to time (every three minutes, for example); that
>program would follow (using the CP INDICATE QUEUE command) those CPU bound
>users, and would reduce their SHARE progressively; of course, certain users
>(i.e., batch machines) would be excluded from the process.
>
>Because I do not want to reinvent the wheel, I first want to ask if anybody
>has already written something similar. I would also be very glad to
>accept comment and suggestions from other people.

We have exactly the same sort of diverse mix, and exactly the same problem you describe: if our CPU crunchers ran only via batch, our load would be quite manageable, even though we run at 100% CPU utilization most of the time. Since they don't, I've implemented a simple little Rexx tool to do exactly what you propose. It's called QSCAN. It scans through IND POS lists to find people who seem to be getting lots of Q3 time, and slowly degrades their priority until it is eventually worse than that of the batch workers.

This works fairly well. It greatly cuts down the amount of CPU delivered to interactive crunchers. However, some folks don't get the hint, and on a busy day even the degraded crunchers have an impact. We did this under HPO 4.2. I'll be glad to send the exec along, especially if I get an XA-SP version back in return!

/Rich Wiggins
Computer Lab
Michigan State U
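A rough sketch of the QSCAN idea under HPO, where a larger priority number means worse service (SLAC's message in this thread penalizes by moving users from 50 to 99). The input is assumed to be the list of userids seen in Q3 on each scan, with the parsing of the IND POS output not shown; the class name, the step size, the number of Q3 sightings needed per step, and the batch priority of 60 are all hypothetical values, not those used at Michigan State.

```python
# Hypothetical QSCAN-like degrader for HPO-style priorities, where a larger
# number means lower priority. Parsing the IND POS output is not shown.

NORMAL_PRIORITY = 50  # assumed standard interactive priority
BATCH_PRIORITY = 60   # assumed priority of the batch workers
WORST_PRIORITY = 99   # worst priority used in this sketch (SLAC's penalty)


class Degrader:
    def __init__(self, sightings_per_step=3, step=5):
        self.sightings_per_step = sightings_per_step  # Q3 sightings per penalty step
        self.step = step                              # priority increase per step
        self.sightings = {}                           # userid: Q3 sightings so far
        self.priority = {}                            # userid: current priority

    def scan(self, q3_users):
        """Record one scan; return {userid: new_priority} for changed users."""
        changes = {}
        for user in q3_users:
            count = self.sightings.get(user, 0) + 1
            self.sightings[user] = count
            # Degrade only every sightings_per_step-th time a user is seen in Q3,
            # so the penalty grows slowly.
            if count % self.sightings_per_step == 0:
                old = self.priority.get(user, NORMAL_PRIORITY)
                new = min(WORST_PRIORITY, old + self.step)
                if new != old:
                    self.priority[user] = new
                    changes[user] = new
        return changes
```

With these placeholder values, a user seen in Q3 on nine consecutive scans goes from 50 to 55, 60 and then 65, which is worse than the assumed batch priority. The sketch never restores a priority, since the description of QSCAN does not say how that is handled.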
CMS2@ETSU.BITNET (Bill Williams) (02/08/90)
Just out of curiosity: has anybody used, or is anybody using, SCI's VMMONITOR under VM/XA to control user resource consumption (CPU, SIO, etc.)? If so, how is it working out? Do you like it? Any big problems? Comments?
----------
B.R.Wms
BOEHEIM@SLACVM.BITNET (Chuck Boeheim) (02/08/90)
REPLY TO 02/06/90 18:30 FROM MONTANAN@EVALUN11.BITNET "Rogelio Montanyana": Looking for a way to limit interactive CPU utilization

We have such a system in place at SLAC. It's more than a single exec, and not at all documented, but you can probably understand it from reading the execs in fairly short order. It uses monitor data from RTM/SMART, so it would only be of use to you if you have that product.

The execs find out from SMART which users have exceeded a threshold of CPU use averaged over an interval. We use 10% over 10 minutes with fairly good results. This lets people have quick spikes of activity without penalty and catches only the ones that crunch for a while. Under HPO it changes the user's priority from our standard of 50 to 99. Under XA it sets their relative SHARE from 100 to 5. If they stay under the threshold during the next interval, it sets it back. It also has an exclusion list, a notification list, and logging.

If you would like this, AS IS, send me mail at BOEHEIM at SLACVM.

-Chuck Boeheim
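The SLAC rule can be sketched as a pure decision function run once per interval. It assumes per-user CPU seconds for the last interval have already been obtained (at SLAC these come from RTM/SMART, whose interface is not shown), and it interprets "10%" as CPU seconds divided by the elapsed interval; that interpretation, the function name and the return shape are assumptions.

```python
# Hypothetical sketch of a per-interval SHARE penalty/restore rule.
# cpu_seconds would come from a monitor such as RTM/SMART (not shown).

NORMAL_SHARE = 100
PENALTY_SHARE = 5


def decide_shares(cpu_seconds, current, excluded,
                  interval_s=600, threshold_pct=10.0):
    """Return (changes, notify) for one interval.

    cpu_seconds: {userid: CPU seconds used during the interval}
    current: {userid: current relative share}; missing means NORMAL_SHARE
    excluded: userids that are never penalized
    """
    changes = {}
    notify = []
    # Include currently penalized users even if they used no CPU this
    # interval, so that they can be restored.
    for user in sorted(set(cpu_seconds) | set(current)):
        if user in excluded:
            continue
        pct = 100.0 * cpu_seconds.get(user, 0.0) / interval_s
        wanted = PENALTY_SHARE if pct > threshold_pct else NORMAL_SHARE
        if current.get(user, NORMAL_SHARE) != wanted:
            changes[user] = wanted
            notify.append(f"{user}: {pct:.1f}% CPU, SHARE set to {wanted}")
    return changes, notify
```

With the default 600-second interval, 90 CPU seconds is 15% and draws the penalty; a penalized user who uses 10 seconds in the next interval (about 1.7%) is set back to 100. The notify strings stand in for the notification list and logging mentioned above.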
Z00NER01@AWIUNI11.BITNET (Ernst Neuwirth) (02/08/90)
Dear Mr. Montanyana,

We have written a REXX program that does the following: it issues INDICATE QUEUE every 10 seconds and sets the relative share of *normal* users depending on the real time elapsed since the start of their transaction. For the first 4 minutes of real time a user gets a relative share of 100, then 80, 60, 50, etc. After more than 300 minutes the relative share is 1. Users in Q1 and Q2 get a relative share of 100.

We have been using this in production since about April 1989, and it has worked well in moving all long-running production work to the batch system. Batch machines have a relative share of 50, so they do not normally affect interactive work.

Ernst Neuwirth
University of Vienna Computer Center
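The Vienna schedule is essentially a step function from elapsed real time to relative share. The message gives only some of the steps (100 for the first 4 minutes, then 80, 60, 50, and so on, down to 1 after more than 300 minutes), so the intermediate time boundaries in the table below, and the share of 10 before the final step, are hypothetical placeholders, not Vienna's values. Extracting a user's queue and elapsed transaction time from INDICATE QUEUE output is assumed to happen elsewhere.

```python
# Hypothetical step schedule: elapsed real minutes since the transaction
# started -> relative share. Only the 4-minute and 300-minute boundaries
# and the shares 100, 80, 60, 50 and 1 come from the description; the
# other boundaries and the share of 10 are placeholders.

SCHEDULE = [  # (upper bound in minutes, share up to that bound)
    (4, 100),
    (10, 80),   # placeholder boundary
    (30, 60),   # placeholder boundary
    (60, 50),   # placeholder boundary
    (300, 10),  # placeholder step; the real intermediate values are elided
]
FINAL_SHARE = 1  # after more than 300 minutes


def share_for(queue, elapsed_min, schedule=SCHEDULE):
    """Relative share for a normal user in the given dispatch queue."""
    if queue in (1, 2):
        return 100  # Q1 and Q2 users always get 100
    # Other queues (assumed to be Q3 in practice) follow the schedule.
    for bound, share in schedule:
        if elapsed_min <= bound:
            return share
    return FINAL_SHARE
```

Keeping the schedule as a table makes the policy easy to retune without touching the logic; for example, a Q3 user 5 minutes into a transaction gets 80, and one 301 minutes in gets 1.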