[comp.os.vms] Performance and batch jobs.

WIZARD@RITA.ACS.WASHINGTON.EDU (The Bandit . . . on RITA) (12/18/87)

Michael J. Porter <mike@vax.oit.udel.edu> writes:

> We have all our users submit heavy crunch jobs as batch.  The batch
> queues have a lower priority than interactive, so our interactive
> users are not affected.  Putting CPU time limits on users will
> insure that they submit jobs.  We simply threatened to do this and
> our users cooperated making life easier for all.

Here at the University of Washington, we have set the batch queue base
priority to 3, while users' normal base priority is 4.  We also have
a batch job limit of three jobs in execution.
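
For anyone wanting to set up something similar, the queue side of this
can be done with DCL along the following lines.  This is only a sketch;
the queue name, the username SMITH, and the numbers are illustrative,
and your site will differ:

    $ ! Batch queue: base priority 3, at most three jobs executing at once
    $ INITIALIZE/QUEUE/BATCH/BASE_PRIORITY=3/JOB_LIMIT=3 SYS$BATCH
    $ START/QUEUE SYS$BATCH
    $ ! Interactive users get base priority 4 from their UAF records
    $ RUN SYS$SYSTEM:AUTHORIZE
    UAF> MODIFY SMITH/PRIORITY=4
    UAF> EXIT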

This works in most instances, but you have to realize that base priority
is not the only factor involved.  The base priority (currently) only
determines when the job will get the CPU.  An example will illustrate
the problem.

We had a user here who submitted three batch jobs, and the system died.
Setting the base priority to 0 did not help.  Suspending the processes did.
All three batch jobs were running a BASIC program the user had written,
and all three were executed through the BASIC interpreter, rather than
being compiled and linked.  The program's job was to resequence another
BASIC program.  The problem was that the virtual working set for these jobs
was huge, and the jobs were causing page faults like crazy.  BASIC was
installed, and page sharing wasn't the problem.  The jobs just didn't want
the same pages at the same time.
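
(For the record, the commands involved, issued from a privileged terminal,
look roughly like this; the process ID shown is just a placeholder:)

    $ SET PROCESS/PRIORITY=0/IDENTIFICATION=2040010C    ! did not help
    $ SET PROCESS/SUSPEND/IDENTIFICATION=2040010C       ! this did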

Why did the user submit 3 jobs?  Because they ran at a lower priority, and
thus wouldn't affect anyone.  (ha!)

Why didn't the user compile and link his program?  Because he couldn't LINK
it.  He was only 20,000+ blocks over his disk quota (which was about 2000
blocks).   (Most languages get installed here with EXQUOTA, but not the
linker.)
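
(For those unfamiliar with it, installing an image with EXQUOTA is done
with the INSTALL utility.  A rough sketch, using BASIC as the example
image; the exact command syntax varies a bit between VMS versions:

    $ RUN SYS$SYSTEM:INSTALL
    INSTALL> ADD SYS$SYSTEM:BASIC.EXE /OPEN/SHARED/PRIVILEGED=(EXQUOTA)
    INSTALL> EXIT

The linker simply has no such entry here, so LINK runs subject to quota.)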


Derek Haining
Academic Computing Services
University of Washington
Seattle, Washington

DEREK@UWARITA.BITNET
        -or-
DEREK@RITA.ACS.WASHINGTON.EDU

david@elroy.Jpl.Nasa.Gov (David Robinson) (12/18/87)

In article <UWAVM.ACS.WASHINGTON.EDU:LAGN$uXK*>, WIZARD@RITA.ACS.WASHINGTON.EDU (The Bandit . . . on RITA) writes:
> Michael J. Porter <mike@vax.oit.udel.edu> writes:
> 
> > We have all our users submit heavy crunch jobs as batch.  The batch
> > queues have a lower priority than interactive, so our interactive
> > users are not affected.  Putting CPU time limits on users will
> > insure that they submit jobs.  We simply threatened to do this and
> > our users cooperated making life easier for all.
> 
> This works in most instances, but you have to realize that base priority
> is not the only factor involved.  The base priority (currently) only
> determines when the job will get the CPU.  An example will illustrate
> the problem.
> 
> We had a user here who submitted three batch jobs, and the system died.
> Setting the base priority to 0 did not help.  Suspending the processes did.
> All three batch jobs were running a BASIC program the user had written,
> and all three were executed through the BASIC interpreter, rather than
> being compiled and linked.  The program's job was to resequence another
> BASIC program.  The problem was that the virtual working set for these jobs
> was huge, and the jobs were causing page faults like crazy.  BASIC was
> installed, and page sharing wasn't the problem.  The jobs just didn't want
> the same pages at the same time.


The one thing I have always hated about the VMS priority scheme
is the lack of granularity.  There are 32 levels, and one almost
never runs anything above 16, so in practice you are down to 16 levels.
With the priority boost that is given on I/O completion, you get
considerable overlap.  For example, take a priority 3 job doing raw
character-at-a-time I/O: since I/O gets a big boost, the priority 4
jobs get next to no work done.

People have mentioned it before, but having a batch queue at priority
3 and interactive users at 4 works fine if you have CPU-intensive
jobs; if you have I/O-intensive batch jobs, however, they will severely
hurt interactive response.  A better scheme in VMS is to set the
base batch priority 2 or more levels below interactive, so that
I/O-bound jobs don't get excessive CPU.  I also advocate running
interactive users at 5 or 6 to allow a bigger range for
the batch queues.
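
In DCL terms, what I am advocating looks something like this.  Again,
only a sketch: the queue names are made up, and the numbers are just one
example of the spread:

    $ ! Interactive accounts at base priority 6 (the DEFAULT record here;
    $ ! existing accounts would need to be modified as well)
    $ RUN SYS$SYSTEM:AUTHORIZE
    UAF> MODIFY DEFAULT/PRIORITY=6
    UAF> EXIT
    $ ! Batch queues two or more levels below that
    $ START/QUEUE/BASE_PRIORITY=4 FAST$BATCH
    $ START/QUEUE/BASE_PRIORITY=2 SLOW$BATCH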




-- 
	David Robinson		elroy!david@csvax.caltech.edu     ARPA
				david@elroy.jpl.nasa.gov	  ARPA
				{cit-vax,ames}!elroy!david	  UUCP
Disclaimer: No one listens to me anyway!

dhesi@bsu-cs.UUCP (Rahul Dhesi) (12/20/87)

In article <UWAVM.ACS.WASHINGTON.EDU:LAGN$uXK*> WIZARD@RITA.ACS.WASHINGTON.EDU 
(The Bandit . . . on RITA) writes:
>We had a user here who submitted three batch jobs, and the system died.
>Setting the base priority to 0 did not help.  Suspending the processes did.
>...The problem was that the virtual working set for these jobs
>was huge, and the jobs were causing page faults like crazy. 

This sounds like a design flaw (i.e., a bug) in VMS's scheduling and/or
memory management.  A zero-priority job should simply remain swapped-
out in its entirety most of the time.  It should get swapped in only
very rarely, and on those rare occasions, it should get all the pages
it needs.  It certainly ought not to bring the system down, unless the
amount of disk space set aside for swapping is not enough--and if so, a
well-designed system will simply abort the job with an out-of-memory or
out-of-swap-space error message.

Yes, there are systems that will die if three jobs do excessive paging
(best exemplified by Primos revision 17), but I had expected VMS to be
better able to protect itself from this.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

dhesi@bsu-cs.UUCP (Rahul Dhesi) (12/20/87)

In article <UWAVM.ACS.WASHINGTON.EDU:LAGN$uXK*> WIZARD@RITA.ACS.WASHINGTON.EDU 
(The Bandit . . . on RITA) writes:
>Why didn't the user compile and link his program?  Because he couldn't LINK
>it.  He was only 20,000+ blocks over his disk quota (which was about 2000
>blocks).   (Most languages get installed here with EXQUOTA, but not the
>linker.)

A good solution to this is to make SYS$SCRATCH point to a device with
plenty of scratch disk space, and to have a batch job clean out old
files every 15 minutes.
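
A sketch of what I have in mind (the device and directory names are made
up, and the cleanup job simply resubmits itself):

    $ ! In the system-wide login procedure:
    $ DEFINE SYS$SCRATCH SCRATCH$DISK:[SCRATCH]

    $ ! CLEANUP.COM -- submitted once, then resubmits itself every 15 minutes:
    $ DELETE/BEFORE="-0:15"/LOG SCRATCH$DISK:[SCRATCH...]*.*;*
    $ SUBMIT/AFTER="+0:15" SYS$MANAGER:CLEANUP.COM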
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

WIZARD@RITA.ACS.WASHINGTON.EDU (The Bandit . . . on RITA) (12/21/87)

Rahul Dhesi <bsu-cs!dhesi@iuvax.cs.indiana.edu> responded to my recent
message with two comments.  It seems I needed to take more care to ensure
that my messages are clear.

> A good solution to this is to make SYS$SCRATCH point to a device with
> plenty of scratch disk space, and to have a batch job clean out old
> files every 15 minutes.

Although this suggestion appears to be good, and in many environments
would probably do the trick, it doesn't suit our needs.  As mentioned,
the user was over his disk quota by quite a lot.  Typically, our users
get a scant 750 to 1500 blocks of disk quota.  This is because we have
many, many users and limited disk space.  We don't have any disk with
plenty of free space.

Our approach is as follows.  I mentioned in my earlier message that the
compilers are installed with EXQUOTA, but I failed to mention that our
editors are also installed with EXQUOTA.  This allows the students to
edit and edit and edit (as well as to compile and compile and compile)
until they get a program which compiles with no errors.  By doing this,
the student can save multiple backups of his/her programs in order to
back out an unwise change.  However, once the program compiles with no
errors, the student MUST clean up his/her directory in order to proceed.

Since most students are learning to program, as well as learning VMS,
this seems to help them the most by keeping VMS problems out of their
way until they have finished dealing with programming problems.

*******************************************************

The second comment made was:

> It [a zero-priority job] certainly ought not to bring the system down,
> unless the amount of disk space set aside for swapping is not enough--and
> if so, a well-designed system will simply abort the job with an
> out-of-memory or out-of-swap-space error message.
> ...
> but I had expected VMS to be better able to protect itself from this.

I believe my choice of the word "died" here was misleading.  I meant that
these jobs brought the system to its knees.  It was essentially thrashing
itself to death with page faults.  VMS does handle this situation OK, however.

I understand that in VMS V5.0 there are changes to be made to the job
scheduler to address this particular problem.  Low(er) priority jobs
will have all of their memory taken away in order to service jobs with
higher priorities.  They simply won't get any memory until there is enough
idle time to do anything for them.

Sorry for the confusion.

Derek Haining

jdc@beach.cis.ufl.edu (Jeff Capehart) (12/22/87)

In article <UWAVM.ACS.WASHINGTON.EDU:LAV35bCD*> WIZARD@RITA.ACS.WASHINGTON.EDU (The Bandit . . . on RITA) writes:
>the user was over his disk quota by quite a lot.  Typically, our users
>get a scant 750 to 1500 blocks of disk quota.  This is because we have
>many many users, and limited disk space.  We don't have any disk with
>plenty of disk space.
>
>Our approach is as follows.  I mentioned in my earlier message that the
>compilers are installed with EXQUOTA, but I failed to mention that our
>editors are also installed with EXQUOTA.  This allows the students to
>edit and edit and edit (as well as to compile and compile and compile)

On our VAX, users are divided into personal accounts and class accounts.
We have a cluster with an 8600 and two 780s, and there are seven RA81's in
use: two system disks, one for personal accounts, two for class accounts,
one backup, and one spare.  Each personal user gets 300 blocks and each
class account gets 400.  The compilers and linker are installed with
EXQUOTA; the editor is not.  It was possible to use the editor to exceed
disk quota by many, many blocks very easily, just by including whatever
file and then saving it.
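
The quota entries themselves are maintained with the DISKQUOTA utility.
Roughly like this, with a made-up device and UIC (check the exact syntax
on your own version):

    $ RUN SYS$SYSTEM:DISKQUOTA
    DISKQUOTA> USE DUA1:
    DISKQUOTA> ADD [210,33] /PERMQUOTA=300 /OVERDRAFT=50
    DISKQUOTA> EXIT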
--
Jeff Capehart 		Internet: micronaut%oak.decnet@pine.circa.ufl.edu
University of Florida	UUCP:	  ..!ihnp4!codas!ufcsv!beach.cis.ufl.edu!jdc