[comp.os.vms] Need help on memory related system tuning

RALPH@UHHEPG.BITNET (02/12/88)

Date: 11-FEB-1988 15:30:26.70
From: Ralph Becker-Szendy RALPH AT UHHEPG
To:   OPERDI,BITNET::"info-vax@kl.sri.com",RALPH
Subj: Need help on memory related system tuning
Hi everyone

I have some system memory tuning questions (I am not the system manager, but
she doesn't subscribe to INFO-VAX, and she asked me to inquire for her):

We have an 11/780 with 16 MB of memory, an RA81 as system disk connected to a
unibus-controller, and an RP07 on the massbus. We run VMS 4.2 (should become 4.7
next week). Page- and swapfile live on the RA81 (and therefore on the unibus).

Typically there are 15 to 20 users during daytime: half "administration", using
mostly Mass-11 for wordprocessing, half "scientists", using Mass-11 too,
editing programs (EDT and EVE), compiling, running programs, TeXing. During
evenings/nights that drops to 3 to 5 users. Every user has /WSDEFAULT=200,
/WSQUOTA=500, /WSEXTENT=1000.

Then we have three batch queues:
FAST     (base_prio=2, job_lim=4)
BATCH    (base_prio=1, job_lim=2)
LONG     (base_prio=1, job_lim=1), all with
/WSDEFAULT=200, /WSQUOTA=500, /WSEXTENT=1500. Typically the BATCH and LONG
queues are full of jobs all the time, with FAST being the fast lane for
10-minute jobs (restricted by cooperation only). Most batch jobs are "mostly
CPU-bound" (that is a gross generalization).

Software packages (VAXsim-Monitor, Fusion for TCP/IP and ANL-NJE for BITNET)
eat another 1000 blocks of working set. There is no clustering, but we have
DECNET (because ANL-NJE needs it; nothing is actually connected).

Performance: during the day, the response gets slow sometimes. Usually that
happens at periods of heavy page-faulting, with sometimes as many as 25 users
logged in. Note that Mass-11 is a real memory hog. Users are discouraged from
using a lot of CPU-time online during the day (and they comply, mostly). Note
that batch jobs don't get very much CPU-time during the day either, in
particular the ones in BATCH and LONG if there is a job in FAST (but that is
expected and not a hassle).

Now, there are basically three questions:
1. Does it make any sense to split the batch queues further (like FAST at
   prio=3, BATCH at prio=2, LONG at prio=1), looking at total performance (not
   just to differentiate between jobs)? Probably not.

   Does it make any sense to change certain users from priority 4 to 3 or 5
   (like run Mass-11 at priority 5, as it doesn't use a lot of CPU time) ? We
   don't think so, because the ones in the lower priority would get unbearably
   bad response.

2. Does the choice of working set parameters sound sensible? During daytime we
   have >heavy< page-faulting, but typically 10000 blocks of memory are free
   (looks very odd to me). During the evening you typically see 25000 free
   blocks of memory (which I consider a waste) with light page-faulting.

3. The unibus is (supposedly) the slower bus. So, is it sensible to have both
   page- and swapfile on the RA81 on the unibus? Maybe we should have one (or
   both) on the RP07. Or maybe have smaller ones on one disk, and a secondary
   one on the other.

Obviously the questions are sorted by importance, but in >reverse order< !
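
A rough sanity check on the numbers in question 2, as a minimal Python sketch.
It assumes the "blocks" quoted above are 512-byte VAX pages, 20 interactive
users, all seven batch slots busy (job limits 4 + 2 + 1), and every process
sitting exactly at its WSQUOTA. It ignores VMS itself, the pools, and the free
and modified page lists, so it is an estimate, not a measurement. The function
names are made up for illustration.

```python
PAGE_BYTES = 512  # a VAX page (like a disk block) is 512 bytes


def pages_in_mb(megabytes):
    """Number of 512-byte pages in the given number of megabytes."""
    return megabytes * 1024 * 1024 // PAGE_BYTES


def quota_demand(users, user_wsquota, batch_jobs, batch_wsquota, packages):
    """Pages needed if every process sits exactly at its WSQUOTA."""
    return users * user_wsquota + batch_jobs * batch_wsquota + packages


total_pages = pages_in_mb(16)
# 20 users at WSQUOTA=500, 7 batch jobs at WSQUOTA=500, 1000 pages of packages
demand = quota_demand(20, 500, 7, 500, 1000)
headroom = total_pages - demand

print("physical pages:   ", total_pages)
print("demand at quota:  ", demand)
print("left over (rough):", headroom)
```

On these assumptions the quotas alone account for under half of the 16 MB,
which would be consistent with seeing thousands of free pages while individual
processes still fault against their own limits.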

I'm sorry ... it feels like a scam to go out and ask the gurus out there,
instead of figuring out the tuning manual myself (or having the system manager
go and test all the possibilities), but we are a little helpless when it
comes to the guts of system tuning, and out here in Hawaii you can't just
go and "ask your friendly VMS-neighbour".

Thanks a lot in advance, anyhow. As usual, send mail directly, I'll write
a summary for the net.

Ralph Becker-Szendy                                     RALPH@UHHEPG.BITNET
University of Hawaii / High Energy Physics Group              (808)948-7391
Watanabe Hall #203, 2505 Correa Road, Honolulu, HI 96822
"Hawaii - it's not just for tourists. People actually live and work there."

tencati@VLSI.JPL.NASA.GOV (Ron Tencati) (02/16/88)

  
  Ralph,
  
  In your users' batch files, have them do a SHOW PROCESS/ACCOUNTING before they
  log off (your interactive users can do this too).  Look at the Virtual Page
  Count.  You may be restricting your users to too little physical memory.
  
  The number of peak virtual pages used will be your key.  If your users are 
  hitting their /wsextent value with their "peak physical memory" usage, and
  exceeding it with their "peak virtual usage", then consider upping that
  user's /wsquo and /wsextent parameters.  This will reduce the number of
  page faults at the expense of using more physical memory.
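
The rule of thumb above can be written down as a small Python sketch. The
function name and the sample figures are hypothetical; the peak values are
assumed to have been read off the SHOW PROCESS/ACCOUNTING output by hand, in
pages.

```python
def working_set_advice(peak_ws, peak_virtual, wsextent):
    """Suggest raising limits only if the process reached WSEXTENT
    and its peak virtual size shows it wanted more than that."""
    hit_extent = peak_ws >= wsextent
    wanted_more = peak_virtual > wsextent
    if hit_extent and wanted_more:
        return "consider raising WSQUOTA and WSEXTENT"
    return "limits look adequate"


# Hypothetical interactive user with /WSEXTENT=1000
print(working_set_advice(peak_ws=1000, peak_virtual=3200, wsextent=1000))
# Hypothetical user who never came near the limit
print(working_set_advice(peak_ws=420, peak_virtual=900, wsextent=1000))
```

This only flags candidates; how far to raise the limits still depends on how
much physical memory is free at the busy times.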
  
  A batch job using lots of memory (/wsquo, /wsextent) running at prio 1 will 
  lock up resources longer.  We run a job limit of 2 on our slow batch queue.
  
  Don't alter the base priority of 4.  If you feel you need this, there is a
  program called NANNY (available in source from ZAR@Hamlet.Caltech.EDU) that
  allows for automatic/dynamic adjustment of priorities based on system load
  and job requirements.  It's a pretty good program, but it also consumes its
  share of resources, albeit a small one.
  
  Check your PFRATL sysgen parameter.  If you set it to zero, you disable 
  working set "shrinking", which again at the cost of using more physical
  memory will reduce page faults (this is a dynamic parameter - you can 
  change it to zero, see how the system reacts, then change it back if
  you wish).
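
To see why PFRATL = 0 stops the shrinking, here is a deliberately simplified
Python model of one automatic working set adjustment step. It is a sketch, not
the VMS algorithm: real growth also depends on WSQUOTA and on how much memory
is free, which is left out here. The function name is made up, and the sample
values are illustrative only, not recommended settings.

```python
def adjust_working_set(size, fault_rate, pfrath, pfratl,
                       wsinc, wsdec, wsextent, awsmin):
    """One step of a simplified working set adjustment model (pages)."""
    if fault_rate > pfrath:
        # Faulting heavily: grow by WSINC, capped at WSEXTENT
        return min(size + wsinc, wsextent)
    if fault_rate < pfratl:
        # Faulting rarely: shrink by WSDEC, but not below AWSMIN
        return max(size - wsdec, awsmin)
    return size


# A quiet process (5 faults per interval) with a nonzero PFRATL shrinks...
print(adjust_working_set(400, 5, pfrath=120, pfratl=10,
                         wsinc=150, wsdec=35, wsextent=1000, awsmin=50))
# ...but with PFRATL = 0 the "fault_rate < pfratl" test can never be true
print(adjust_working_set(400, 5, pfrath=120, pfratl=0,
                         wsinc=150, wsdec=35, wsextent=1000, awsmin=50))
```

In this model a fault rate can never be below zero, so with PFRATL at zero the
shrink branch is dead and a working set only ever grows or stays put.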
  
  You're bound to get more information from the list, but I think I hit on
  the major points.
  
  Good Luck,
  
  Ron Tencati
  System Mgr, VLSI.JPL.NASA.GOV