[comp.os.vms] "Flaw

BOLTHOUSE%MCOPN1@eg.ti.COM (12/21/87)
>In article <UWAVM.ACS.WASHINGTON.EDU:LAGN$uXK*> WIZARD@RITA.ACS.WASHINGTON.EDU 
>(The Bandit... on RITA) writes:
>>We had a user here who submitted three batch jobs, and the system died.
>>Setting the base priority to 0 did not help.  Suspending the processes did.
>>...The problem was that the virtual working set for these jobs
>>was huge, and the jobs were causing page faults like crazy. 

>This sounds like a design flaw (i.e., a bug) in VMS's scheduling and/or
>memory management.  A zero-priority job should simply remain swapped-
>out in its entirety most of the time.  It should get swapped in only
>very rarely, and on those rare occasions, it should get all the pages
>it needs.

Ah yes, another operating system designer.  Rather than blaming the designers
of the VMS memory management system and scheduler for behavior caused by poor
system management (you *do* maintain your own SYSGEN parameters, right?), why
not set up your system so that it can survive such applications?  The last
thing a system manager should do is give *any* user enough of any quota to
cause significant system degradation, which in this case means heavy turnover
of the modified and free page lists.

You have four alternatives.

1.  You might slap the user's hands.  This does not, generally, have the
desired effect.

2.  You might increase the size of your page caches, which will generally
eliminate the problem for a well-behaved application (i.e., one that isn't
accessing arrays in the wrong order).  Of course, this requires a fair amount
of memory, and it does not handle the well-intentioned user who submits X
large jobs at once.  If you were to allow no subprocesses and set JOBLIM for
all queues to one, this problem would be eliminated, but that's rather a
counterproductive approach to take.

3.  You might flame against DEC.  They aren't going to *significantly* alter
the memory or process management of VMS any time soon.  Believe me.  There are
some nice touches coming in a future, unmentionable release of VMS which make
the scheduler more like that of TOPS, but these are tweaks, not fundamental
changes.  Fundamental changes are very expensive.

4.  You might look into writing a quick and dirty CMKRNL program which checks
whether a batch job is executing in a specific queue (preferably one with a
JOBLIM of 1) and, if so, writes a new, higher pagefile quota into the process
header (PHD).  We do this.  I'd send out the code, but our lawyers are still
arguing about whether and how to put it into the public domain.  Suffice it to
say it took us about a man-day to write and test.  Our users run with a
pagefile quota of 10000-20000 pages on our 8800, and we have a modified page
list which can grow to 16000 pages (32 Mbytes of main memory...small for an
8800).  We have several production jobs, run in batch, which need about
60000-70000 pages each.  One fine day, a user thought it would be faster to
run three of them at a time, interactively.  That's when we wrote our CMKRNL
code.  Now, if anyone tries, they get 'insufficient virtual memory', typically
at image activation, unless they're running in the queue with the elevated
pagefile quota.  The code that checks the queue and, if appropriate, raises
the quota runs during the login sequence and cannot be bypassed without
significant privilege.
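
To illustrate the "wrong order" remark in alternative 2, here is a rough
Python sketch (not anything from VMS itself) that counts working-set faults
for one pass over an n x n array under a simple LRU working-set model.  The
512-byte page is the VAX page size; the 4-byte element size and the pure LRU
replacement are simplifying assumptions, and the model does not distinguish
faults satisfied from the free/modified page lists from faults that go to disk.

```python
from collections import OrderedDict

PAGE_BYTES = 512   # VAX page size
ELEM_BYTES = 4     # assumption: 4-byte elements (e.g. REAL*4)


def count_faults(n, working_set_pages, in_storage_order):
    """Touch every element of a contiguous n x n array once and count
    faults against a fixed-size LRU working set (a simplified model)."""
    resident = OrderedDict()  # page number -> None, oldest first
    faults = 0
    for outer in range(n):
        for inner in range(n):
            if in_storage_order:
                index = outer * n + inner  # walk memory sequentially
            else:
                index = inner * n + outer  # jump n elements per step
            page = index * ELEM_BYTES // PAGE_BYTES
            if page in resident:
                resident.move_to_end(page)  # mark as most recently used
            else:
                faults += 1
                resident[page] = None
                if len(resident) > working_set_pages:
                    resident.popitem(last=False)  # evict least recently used
    return faults
```

With n = 256 and a 64-page working set, the storage-order pass faults 512
times (once per page of the 512-page array), while the strided pass faults on
all 65536 accesses, because each sweep cycles through 256 pages, more than the
working set can hold.  Bigger page caches make those faults cheaper; they do
not make them go away.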
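
A rough model of the alternative 4 policy, sketched in Python only to show the
decision logic; the real thing is privileged kernel-mode code writing into the
PHD, which this does not attempt.  DEFAULT_PGFLQUOTA, ELEVATED_PGFLQUOTA,
ELEVATED_QUEUE, quota_at_login and activate_image are hypothetical names and
values chosen for illustration; only the 10000-20000 quota range, the
60000-70000-page jobs and the 32-Mbyte memory size come from the post.

```python
PAGE_BYTES = 512                                  # VAX page size
PHYSICAL_PAGES = 32 * 1024 * 1024 // PAGE_BYTES   # 32 Mbytes -> 65536 pages

# Hypothetical policy values: the post gives 10000-20000 as the normal range;
# the elevated value and the queue name are illustrative only.
DEFAULT_PGFLQUOTA = 20000
ELEVATED_PGFLQUOTA = 80000
ELEVATED_QUEUE = "BIG_JOBS"   # hypothetical queue, assumed to have JOBLIM = 1


class InsufficientVirtualMemory(Exception):
    """Stands in for the 'insufficient virtual memory' error users see."""


def quota_at_login(is_batch, queue_name):
    """Pick the pagefile quota for a new process: only a batch job in the
    designated queue gets the elevated value."""
    if is_batch and queue_name == ELEVATED_QUEUE:
        return ELEVATED_PGFLQUOTA
    return DEFAULT_PGFLQUOTA


def activate_image(virtual_pages, pgflquota):
    """Refuse an image whose virtual size exceeds the process quota."""
    if virtual_pages > pgflquota:
        raise InsufficientVirtualMemory(
            f"needs {virtual_pages} pages, quota is {pgflquota}")
    return virtual_pages
```

With these numbers, 32 Mbytes is 65536 VAX pages, so one 65000-page job is
nearly all of physical memory and three at once come to 195000 pages, roughly
three times what the machine has.  Under the sketched policy such an image is
refused at activation with the default quota and accepted only for a batch
job in the elevated queue; keeping that queue at JOBLIM 1 means only one job
at a time can hold the large quota.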

David L. Bolthouse
Texas Instruments Defense Electronics Information Systems VAX System Support
McKinney, Texas

Ma Bell:     214-952-2059
Internet:    bolthouse%mcopn1@eg.ti.com

"Standard disclaimer goes here..."