WIZARD@RITA.ACS.WASHINGTON.EDU (The Bandit, on RITA) (12/18/87)
Michael J. Porter <mike@vax.oit.udel.edu> writes:

> We have all our users submit heavy crunch jobs as batch. The batch
> queues have a lower priority than interactive, so our interactive
> users are not affected. Putting CPU time limits on users will
> ensure that they submit jobs. We simply threatened to do this and
> our users cooperated, making life easier for all.

Here at the University of Washington, we have set the batch queue base
priority to 3, while users' normal base priority is 4. We also have a
batch job limit of three jobs in execution.

This works in most instances, but you have to realize that base priority
is not the only factor involved. The base priority (currently) only
determines when the job will get the CPU. An example will illustrate
the problem.

We had a user here who submitted three batch jobs, and the system died.
Setting the base priority to 0 did not help. Suspending the processes did.
All three batch jobs were running a BASIC program the user had written,
and all three were executed through the BASIC interpreter, rather than
being compiled and linked. The program's job was to resequence another
BASIC program. The problem was that the virtual working set for these
jobs was huge, and the jobs were causing page faults like crazy. BASIC
was installed, and page sharing wasn't the problem. The jobs just didn't
want the same pages at the same time.

Why did the user submit 3 jobs? Because they ran at a lower priority, and
thus wouldn't affect anyone. (Ha!)

Why didn't the user compile and link his program? Because he couldn't
LINK it. He was only 20,000+ blocks over his disk quota (which was about
2000 blocks). (Most languages get installed here with EXQUOTA, but not
the linker.)

Derek Haining
Academic Computing Services
University of Washington
Seattle, Washington

DEREK@UWARITA.BITNET  -or-  DEREK@RITA.ACS.WASHINGTON.EDU
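[Moderator's note: for readers setting up something similar, a queue like
the one Derek describes could be created with DCL along these lines. The
queue name SYS$BATCH is illustrative; the qualifiers are standard, but
check them against your VMS version.]

```
$ ! Create a batch queue whose jobs run at base priority 3, one level
$ ! below the default interactive base priority of 4, with at most
$ ! three jobs in execution at once.
$ INITIALIZE /QUEUE /BATCH /BASE_PRIORITY=3 /JOB_LIMIT=3 SYS$BATCH
$ START /QUEUE SYS$BATCH
```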
david@elroy.Jpl.Nasa.Gov (David Robinson) (12/18/87)
In article <UWAVM.ACS.WASHINGTON.EDU:LAGN$uXK*>,
WIZARD@RITA.ACS.WASHINGTON.EDU (The Bandit, on RITA) writes:

> Michael J. Porter <mike@vax.oit.udel.edu> writes:
>
> > We have all our users submit heavy crunch jobs as batch. The batch
> > queues have a lower priority than interactive, so our interactive
> > users are not affected. Putting CPU time limits on users will
> > ensure that they submit jobs. We simply threatened to do this and
> > our users cooperated, making life easier for all.
>
> This works in most instances, but you have to realize that base priority
> is not the only factor involved. The base priority (currently) only
> determines when the job will get the CPU. An example will illustrate
> the problem.
>
> We had a user here who submitted three batch jobs, and the system died.
> Setting the base priority to 0 did not help. Suspending the processes did.
> All three batch jobs were running a BASIC program the user had written,
> and all three were executed through the BASIC interpreter, rather than
> being compiled and linked. The program's job was to resequence another
> BASIC program. The problem was that the virtual working set for these jobs
> was huge, and the jobs were causing page faults like crazy. BASIC was
> installed, and page sharing wasn't the problem. The jobs just didn't want
> the same pages at the same time.

The one thing I have always hated about the VMS priority scheme is the
lack of granularity. There are 32 levels, and one almost never runs
anything over 16, so now you are down to 16 levels. With the priority
boosting that is given to I/O, you have considerable overlap. An example
is a priority 3 job doing raw character-at-a-time I/O: since I/O gets a
big boost, the priority 4 jobs get next to no work done.
People have mentioned it before, but having a batch queue at priority 3
and interactive users at 4 works fine if you have CPU-intensive jobs;
if you have I/O-intensive batch jobs, they will severely hurt interactive
response. A better scheme in VMS is to set the base batch priority 2 or
more levels below interactive, so that I/O-bound jobs don't get excessive
CPU. I also advocate running interactive users at 5 or 6 to allow a
bigger range for the batch queues.

--
David Robinson
elroy!david@csvax.caltech.edu     ARPA
david@elroy.jpl.nasa.gov          ARPA
{cit-vax,ames}!elroy!david        UUCP
Disclaimer: No one listens to me anyway!
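[Moderator's note: one way to implement David's suggestion is to raise the
interactive base priority in the UAF and keep the batch queue's base
priority well below it. The username SMITH and the specific values are
examples only.]

```
$ ! Raise an interactive user's base priority to 6 (default is 4).
$ RUN SYS$SYSTEM:AUTHORIZE
UAF> MODIFY SMITH /PRIORITY=6
UAF> EXIT
$ ! Keep batch 3 levels down, so even I/O-boosted batch jobs
$ ! rarely compute at or above the interactive base level.
$ INITIALIZE /QUEUE /BATCH /BASE_PRIORITY=3 SYS$BATCH
```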
dhesi@bsu-cs.UUCP (Rahul Dhesi) (12/20/87)
In article <UWAVM.ACS.WASHINGTON.EDU:LAGN$uXK*>
WIZARD@RITA.ACS.WASHINGTON.EDU (The Bandit, on RITA) writes:

> We had a user here who submitted three batch jobs, and the system died.
> Setting the base priority to 0 did not help. Suspending the processes did.
> ...The problem was that the virtual working set for these jobs
> was huge, and the jobs were causing page faults like crazy.

This sounds like a design flaw (i.e., a bug) in VMS's scheduling and/or
memory management. A zero-priority job should simply remain swapped out
in its entirety most of the time. It should get swapped in only very
rarely, and on those rare occasions it should get all the pages it needs.
It certainly ought not to bring the system down, unless the amount of
disk space set aside for swapping is not enough -- and if so, a
well-designed system will simply abort the job with an out-of-memory or
out-of-swap-space error message.

Yes, there are systems that will die if three jobs do excessive paging
(best exemplified by Primos revision 17), but I had expected VMS to be
better able to protect itself from this.

--
Rahul Dhesi         UUCP: <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi
dhesi@bsu-cs.UUCP (Rahul Dhesi) (12/20/87)
In article <UWAVM.ACS.WASHINGTON.EDU:LAGN$uXK*>
WIZARD@RITA.ACS.WASHINGTON.EDU (The Bandit, on RITA) writes:

> Why didn't the user compile and link his program? Because he couldn't LINK
> it. He was only 20,000+ blocks over his disk quota (which was about 2000
> blocks). (Most languages get installed here with EXQUOTA, but not the
> linker.)

A good solution to this is to make SYS$SCRATCH point to a device with
plenty of scratch disk space, and to have a batch job clean out old
files every 15 minutes.

--
Rahul Dhesi         UUCP: <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi
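[Moderator's note: Rahul's suggestion might be sketched as below. The
device name SCRATCH_DISK, the directory, and the one-day retention period
are all illustrative assumptions.]

```
$ ! Point SYS$SCRATCH at a disk with room to spare.
$ DEFINE /SYSTEM /EXECUTIVE_MODE SYS$SCRATCH SCRATCH_DISK:[SCRATCH]
$
$ ! CLEANUP.COM: delete scratch files older than a day,
$ ! then resubmit itself to run again in 15 minutes.
$ DELETE SCRATCH_DISK:[SCRATCH...]*.*;* /CREATED /BEFORE="-1-00:00"
$ SUBMIT /AFTER="+0:15" CLEANUP.COM
```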
WIZARD@RITA.ACS.WASHINGTON.EDU (The Bandit, on RITA) (12/21/87)
Rahul Dhesi <bsu-cs!dhesi@iuvax.cs.indiana.edu> responded to my recent
message with two comments. It seems I need to take more care to ensure
that my messages are clear.

> A good solution to this is to make SYS$SCRATCH point to a device with
> plenty of scratch disk space, and to have a batch job clean out old
> files every 15 minutes.

Although this suggestion appears to be good, and in many environments
would probably do the trick, it doesn't suit our needs. As mentioned,
the user was over his disk quota by quite a lot. Typically, our users
get a scant 750 to 1500 blocks of disk quota. This is because we have
many, many users and limited disk space. We don't have any disk with
plenty of space to spare.

Our approach is as follows. I mentioned in my earlier message that the
compilers are installed with EXQUOTA, but I failed to mention that our
editors are also installed with EXQUOTA. This allows the students to
edit and edit and edit (as well as to compile and compile and compile)
until they get a program which compiles with no errors. By doing this,
a student can save multiple backups of his/her programs in order to back
out an unwise change. However, once the program compiles with no errors,
the student MUST clean up his/her directory in order to proceed. Since
most students are learning to program as well as learning VMS, this seems
to help them the most by keeping VMS problems out of their way until they
have finished dealing with programming problems.

*******************************************************

The second comment made was:

> It [a zero-priority job] certainly ought not to bring the system down,
> unless the amount of disk space set aside for swapping is not enough--and
> if so, a well-designed system will simply abort the job with an
> out-of-memory or out-of-swap-space error message.
> ...
> but I had expected VMS to be better able to protect itself from this.

I believe my choice of the word "died" here was misleading.
I meant that these jobs brought the system to its knees; it was
essentially thrashing itself to death with page faults. VMS does handle
this situation OK, however.

I understand that in VMS V5.0 changes will be made to the job scheduler
to address this particular problem: low(er)-priority jobs will have all
of their memory taken away in order to service jobs with higher
priorities. They simply won't get any memory until there is enough idle
time to do anything for them.

Sorry for the confusion.

Derek Haining
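[Moderator's note: installing an image with EXQUOTA, as Derek's site does
for compilers and editors, is done with the INSTALL utility. The image
names below are examples only, and the exact command syntax should be
checked against your VMS version's INSTALL documentation.]

```
$ RUN SYS$SYSTEM:INSTALL
INSTALL> ADD SYS$SYSTEM:EDT.EXE /OPEN /SHARED /PRIVILEGED=(EXQUOTA)
INSTALL> ADD SYS$SYSTEM:BASIC.EXE /OPEN /SHARED /PRIVILEGED=(EXQUOTA)
INSTALL> EXIT
```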
jdc@beach.cis.ufl.edu (Jeff Capehart) (12/22/87)
In article <UWAVM.ACS.WASHINGTON.EDU:LAV35bCD*>
WIZARD@RITA.ACS.WASHINGTON.EDU (The Bandit, on RITA) writes:

> the user was over his disk quota by quite a lot. Typically, our users
> get a scant 750 to 1500 blocks of disk quota. This is because we have
> many many users, and limited disk space. We don't have any disk with
> plenty of disk space.
>
> Our approach is as follows. I mentioned in my earlier message that the
> compilers are installed with EXQUOTA, but I failed to mention that our
> editors are also installed with EXQUOTA. This allows the students to
> edit and edit and edit (as well as to compile and compile and compile)

On our VAX, users are divided into personal accounts and class accounts.
We have a cluster with an 8600 and two 780's, with 7 RA81's in use: two
system disks, one for personal accounts, two for class accounts, one
backup, and one spare. Each personal user gets 300 blocks and each class
account gets 400.

The compilers and linker are installed with EXQUOTA. The editor is not.
It was possible to use the editor to exceed disk quota by many, many
blocks very easily, just by including whatever file and then saving it.

--
Jeff Capehart           Internet: micronaut%oak.decnet@pine.circa.ufl.edu
University of Florida   UUCP: ..!ihnp4!codas!ufcsv!beach.cis.ufl.edu!jdc