RALPH@UHHEPG.BITNET (02/12/88)
Date: 11-FEB-1988 15:30:26.70
From: Ralph Becker-Szendy RALPH AT UHHEPG
To: OPERDI,BITNET::"info-vax@kl.sri.com",RALPH
Subj: Need help on memory related system tuning

Hi everyone,

I have some system memory tuning questions. I am not the system manager, but she doesn't subscribe to INFO-VAX and asked me to inquire for her.

We have an 11/780 with 16 MB of memory. The system disk is an RA81 connected to a Unibus controller, and there is an RP07 on the Massbus. We run VMS 4.2 (it should become 4.7 next week). The page file and swap file live on the RA81, and therefore on the Unibus.

Typically there are 15 to 20 users during the daytime. Half are "administration", mostly using Mass-11 for word processing. The other half are "scientists", who also use Mass-11 but additionally edit programs (EDT and EVE), compile, run programs and run TeX. During evenings and nights that drops to 3 to 5 users. Every user has /WSDEFAULT=200, /WSQUOTA=500, /WSEXTENT=1000.

We also have three batch queues, all with /WSDEFAULT=200, /WSQUOTA=500, /WSEXTENT=1500:

- FAST (base_prio=2, job_lim=4)
- BATCH (base_prio=1, job_lim=2)
- LONG (base_prio=1, job_lim=1)

Typically the BATCH and LONG queues are full of jobs all the time. FAST is the fast lane for 10-minute jobs, restricted by cooperation only. Most batch jobs are "mostly CPU-bound" (that is a gross generalization).

Software packages (VAXsim-Monitor, Fusion for TCP/IP and ANL-NJE for BITNET) eat another 1000 blocks of memory. There is no clustering, but we have DECNET because ANL-NJE needs it; nothing is actually connected.

Performance: during the day, the response sometimes gets slow. This usually happens during periods of heavy page faulting, sometimes with as many as 25 users logged in. Note that Mass-11 is a real memory hog. Users are discouraged from using a lot of CPU time online during the day, and they mostly comply.
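The following Python sketch is a rough back-of-the-envelope check of the working set figures above. It totals what the interactive users and batch slots would claim if every process sat at its WSQUOTA, and what they could claim at WSEXTENT.

It rests on several assumptions:

- A VAX page is 512 bytes.
- The quoted "1000 blocks" for the software packages is treated as 1000 pages in total.
- Every batch queue is counted at its full job limit (4 + 2 + 1 = 7 jobs).
- Memory used by VMS itself and pages shared between processes are both ignored.

It is an illustration of the arithmetic, not a measurement of the real system.

```python
PAGE_BYTES = 512                                  # one VAX page
PHYSICAL_PAGES = 16 * 1024 * 1024 // PAGE_BYTES   # 16 MB -> 32768 pages

# Limits as quoted in the post (values are in pages).
USER_WSQUOTA, USER_WSEXTENT = 500, 1000
BATCH_WSQUOTA, BATCH_WSEXTENT = 500, 1500
BATCH_SLOTS = 4 + 2 + 1   # job limits of FAST, BATCH and LONG
PACKAGES = 1000           # assumption: the "1000 blocks" for the software packages, as pages


def memory_demand(users: int) -> tuple[int, int]:
    """Return (pages if all processes sit at WSQUOTA, pages if all reach WSEXTENT).

    Deliberately ignores VMS's own resident memory and shared pages.
    """
    at_quota = users * USER_WSQUOTA + BATCH_SLOTS * BATCH_WSQUOTA + PACKAGES
    at_extent = users * USER_WSEXTENT + BATCH_SLOTS * BATCH_WSEXTENT + PACKAGES
    return at_quota, at_extent


if __name__ == "__main__":
    for users in (5, 20, 25):
        quota, extent = memory_demand(users)
        print(f"{users:2d} users: {quota:5d} pages at WSQUOTA, "
              f"{extent:5d} at WSEXTENT, of {PHYSICAL_PAGES} physical")
```

With 20 users this gives 14500 pages at WSQUOTA and 31500 at WSEXTENT, against 32768 physical pages. With 25 users the WSEXTENT total (36500) exceeds physical memory even before VMS's own needs are counted.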
Note that batch jobs don't get much CPU time during the day either. This applies particularly to the ones in BATCH and LONG when there is a job in FAST, but that is expected and not a hassle.

Now, there are basically three questions:

1. Does it make any sense to split the batch queues further (for example FAST at prio=3, BATCH at prio=2, LONG at prio=1), looking at total performance and not just to differentiate between jobs? Probably not. Does it make any sense to change certain users from priority 4 to 3 or 5? For example, we could run Mass-11 at priority 5, since it doesn't use a lot of CPU time. We don't think so, because the users at the lower priority would get unbearably bad response.

2. Does the choice of working set parameters sound sensible? During the daytime we have >heavy< page faulting, but typically 10000 blocks of memory are free, which looks very odd to me. During the evening you typically see 25000 free blocks of memory (which I consider a waste) with light page faulting.

3. The Unibus is (supposedly) the slower bus. Is it sensible, then, to have both the page file and the swap file on the RA81 on the Unibus? Maybe we should put one (or both) on the RP07. Or maybe we should have smaller primary files on one disk and secondary ones on the other.

Obviously the questions are sorted by importance, but in >reverse order<!

I'm sorry... it feels like a scam to go out and ask the gurus instead of figuring out the tuning manual myself, or having the system manager test all the possibilities. But we are a little helpless when it comes to the guts of system tuning, and out here in Hawaii you can't just go and "ask your friendly VMS neighbour". Thanks a lot in advance, anyhow. As usual, send mail directly; I'll write a summary for the net.

Ralph Becker-Szendy RALPH@UHHEPG.BITNET
University of Hawaii / High Energy Physics Group
(808)948-7391
Watanabe Hall #203, 2505 Correa Road, Honolulu, HI 96822
"Hawaii - it's not just for tourists. People actually live and work there."
tencati@VLSI.JPL.NASA.GOV (Ron Tencati) (02/16/88)
Ralph,

In your users' batch files, have them do a SHOW PROCESS/ACCOUNTING before they log off (your interactive users can do this too), and look at the virtual page count. You may be restricting your users to too little physical memory, and the peak number of virtual pages used will be your key. Suppose a user's "peak physical memory" usage reaches the /WSEXTENT value and the "peak virtual usage" exceeds it. In that case, consider raising that user's /WSQUOTA and /WSEXTENT. This will reduce the number of page faults at the expense of using more physical memory.

A batch job using lots of memory (large /WSQUOTA and /WSEXTENT) and running at prio 1 will lock up resources longer. We run a job limit of 2 on our slow batch queue.

Don't alter the base priority of 4. If you feel you need to, there is a program called NANNY (available in source from ZAR@Hamlet.Caltech.EDU) that adjusts priorities automatically and dynamically, based on system load and job requirements. It's a pretty good program, but it also consumes its share of resources, albeit a small one.

Check your PFRATL SYSGEN parameter. Setting it to zero disables working set "shrinking". Again at the cost of using more physical memory, this will reduce page faults. PFRATL is a dynamic parameter: you can change it to zero, see how the system reacts, then change it back if you wish.

You're bound to get more information from the list, but I think I hit on the major points.

Good Luck,
Ron Tencati
System Mgr, VLSI.JPL.NASA.GOV
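Ron's rule of thumb and the effect of PFRATL can be sketched in Python.

The first function encodes the test he describes: the peak working set reached WSEXTENT while the peak virtual size went beyond it.

The second function is a deliberately simplified model of VMS automatic working set adjustment. It assumes, beyond what the thread states, that at each adjustment point the working set limit:

- grows by WSINC when the page fault rate is above PFRATH;
- shrinks by WSDEC when the page fault rate is below PFRATL.

The model ignores borrowing beyond WSQUOTA and other details of the real mechanism. The function and field names are hypothetical, and the numbers in the test are illustrative rather than VMS defaults.

```python
from dataclasses import dataclass


@dataclass
class AccountingSnapshot:
    """Hypothetical container for two figures read off SHOW PROCESS/ACCOUNTING (pages)."""
    peak_working_set: int
    peak_virtual: int


def should_raise_limits(acct: AccountingSnapshot, wsextent: int) -> bool:
    """Rule of thumb from the reply: the working set hit WSEXTENT and virtual size went past it."""
    return acct.peak_working_set >= wsextent and acct.peak_virtual > wsextent


def awsa_step(limit: int, fault_rate: float, pfratl: float, pfrath: float,
              wsinc: int, wsdec: int, floor: int, ceiling: int) -> int:
    """One step of a simplified working set adjustment model (an assumption, not VMS code).

    A fault rate is never negative, so with pfratl == 0 the shrink branch can never fire.
    """
    if fault_rate > pfrath:
        return min(limit + wsinc, ceiling)   # faulting heavily: grow, capped at the ceiling
    if fault_rate < pfratl:
        return max(limit - wsdec, floor)     # faulting lightly: shrink, but not below the floor
    return limit                             # in between: leave the limit alone
```

The second function illustrates the sense in which setting PFRATL to zero disables shrinking. An idle process keeps its working set limit instead of giving pages back, so it faults less later but holds on to more physical memory.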