BOLTHOUSE%MCOPN1@eg.ti.COM (12/21/87)
>In article <UWAVM.ACS.WASHINGTON.EDU:LAGN$uXK*> WIZARD@RITA.ACS.WASHINGTON.EDU
>(The Bandit "." "." "." "", on" "RITA) writes:
>>We had a user here who submitted three batch jobs, and the system died.
>>Setting the base priority to 0 did not help. Suspending the processes did.
>>...The problem was that the virtual working set for these jobs
>>was huge, and the jobs were causing page faults like crazy.
>This sounds like a design flaw (i.e., a bug) in VMS's scheduling and/or
>memory management. A zero-priority job should simply remain swapped-out
>in its entirety most of the time. It should get swapped in only
>very rarely, and on those rare occasions, it should get all the pages
>it needs.

Ah yes, another operating system designer.

Rather than blaming the designers of the VMS memory management
system/scheduler for behavior due to poor system management (you *do*
maintain your own SYSGEN parameters, right?) why not set up your system
such that it can survive such applications? The last thing a system
manager should do is to give *any* user enough of any quota to cause
significant system degradation, in this case modified/free list turnover.

You have four alternatives.

1. You might slap the user's hands. This does not, generally, have the
desired effect.

2. You might increase the size of your page caches, which will generally
eliminate the problem for a well behaved application (i.e., it isn't
accessing arrays in the wrong order). Of course, this requires a fair
amount of memory, and does not eliminate the case of the well-intentioned
user who submits X large jobs at once. If you were to allow no
subprocesses and set JOBLIM for all queues to one, this problem would be
eliminated, but that's rather a counterproductive approach to take.

3. You might flame against DEC. They aren't going to *significantly*
alter the memory or process management of VMS any time soon. Believe me.
There are some nice touches coming in a future, unmentionable release of
VMS that make the scheduler more like that of TOPS, but these are
tweaks, not fundamental changes. Fundamental changes are very expensive.

4. You might look into writing a quick and dirty CMKRNL program which
checks to see if a batch job is executing in a specific queue (with a
JOBLIM of 1, preferably), and writes a new, higher pagefile quota value
into the process header (PHD). We do this -- I'd send out the code, but
our lawyers are still arguing about whether/how to put it into the public
domain. Suffice it to say it took us about a man-day to write and test.

Our users run with a pagefile quota of 10000 - 20000 on our 8800, and we
have a modified page list which can go to 16000 pages. (32 Mbytes main
memory...small for an 8800). We have several production jobs which need
about 60000 - 70000 pages which run in batch. One fine day, a user
thought it would be faster to run 3 of them at a time, interactively.
That's when we wrote our CMKRNL code. Now, if anyone tries, they get
'insufficient virtual memory', typically at image activation, unless
they're running in the queue with 'elevated pagefile quota'. The code to
check the queue and possibly alter quota runs during the login sequence,
and cannot be bypassed without significant privilege.

David L. Bolthouse
Texas Instruments Defense Electronics Information Systems
VAX System Support
McKinney, Texas

Ma Bell: 214-952-2059
Internet: bolthouse%mcopn1@eg.ti.com

"Standard disclaimer goes here..."
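
For anyone who wants to see the shape of the queue check in alternative 4 without the kernel-mode parts, here is a rough sketch in Python. It is only a model of the decision logic and the page arithmetic, not the CMKRNL code described above. The queue name BIG$BATCH, the elevated quota of 70000 pages, and all function names are made up for illustration. The only hard number not taken from the post is the VAX page size of 512 bytes.

```python
VAX_PAGE_BYTES = 512  # size of one VAX page

# Hypothetical names and values, chosen only for illustration.
ELEVATED_QUEUE = "BIG$BATCH"
ELEVATED_PGFLQUOTA = 70000  # pages


def mbytes_to_pages(mbytes):
    """Convert megabytes of memory to 512-byte VAX pages."""
    return mbytes * 1024 * 1024 // VAX_PAGE_BYTES


def login_pgflquota(is_batch, queue_name, normal_quota):
    """Pick the pagefile quota a process should get at login.

    Only a batch job running in the designated queue gets the
    elevated quota; everything else keeps its normal quota.
    """
    if is_batch and queue_name == ELEVATED_QUEUE:
        return max(normal_quota, ELEVATED_PGFLQUOTA)
    return normal_quota


def can_activate(pages_needed, quota):
    """Model of the 'insufficient virtual memory' check at image activation."""
    return pages_needed <= quota


if __name__ == "__main__":
    physical = mbytes_to_pages(32)
    print("32 Mbytes =", physical, "pages")
    print("3 jobs x 60000 pages =", 3 * 60000, "pages of demand")

    interactive = login_pgflquota(False, "", 20000)
    batch = login_pgflquota(True, ELEVATED_QUEUE, 20000)
    print("interactive run allowed:", can_activate(60000, interactive))
    print("elevated-queue run allowed:", can_activate(60000, batch))
```

The arithmetic is the whole argument: 32 Mbytes is 65536 pages, and three 60000-page jobs ask for 180000. With the quota left at 20000 for everyone outside the one queue, the big image fails at activation instead of thrashing the page caches, and a JOBLIM of 1 on that queue keeps it to one such job at a time.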