[mod.computers.vax] Tuning

rod@CHEETA.ISI.EDU.UUCP (03/25/87)

Hey Jerry, (or anyone else who can answer questions quickly, coherently,
and correctly),

	With all this talk about tuning in general, I have a specific
question. We have several 750's in a cluster with one 8650. Most of the
750's are nominally single-user machines.

	The setup:

	8 Meg of memory
	WSMAX	 12288
	WSEXTENT 12288
	WSQUO	1024
	WSDEF	150
	BORROWLIM	236
	FREEGOAL	189
	GROWLIM		188
	FREELIM		63

The primary use of these machines is a large program which likes to have
a working set of at least 10240 pages (its response is horrendous and it
pages something awful without that), but will gladly take whatever it
can get.

    With all the other activity on the machine (mail, small user processes,
Decnet, etc.) there are usually around 10-11K pages to be had, sometimes
a little more.

    The problem: The user is running along just fine, with a working set
of around 10K pages, when all of a sudden he gets knocked on his butt,
all the way back to 1024 pages. It then takes him a while to get back
up to speed again.

    What could be causing this? How can we remedy it?

    PFRATL is currently set to 0, but raising it to something low
didn't seem to correct the behavior. I think it's still probably not
a good idea to have it at zero. Comments? PFRATH is 120. The idea is to
try to keep the user in memory even if he's not pagefaulting much. Is
there a better way?

    WSINC is 150, WSDEC is 35.
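
    (For reference, PFRATL, PFRATH, WSINC, and WSDEC are all dynamic SYSGEN
parameters as far as I know, so they can be fiddled on the running system
without a reboot. This is roughly how we've been poking at them; the values
are purely illustrative:

        $ RUN SYS$SYSTEM:SYSGEN
        SYSGEN> USE ACTIVE              ! work on the active parameter set
        SYSGEN> SET PFRATH 120          ! fault rate above which the WS grows
        SYSGEN> SET PFRATL 8            ! fault rate below which the WS shrinks
        SYSGEN> SET WSINC 150           ! pages added per adjustment
        SYSGEN> SET WSDEC 35            ! pages removed per adjustment
        SYSGEN> WRITE ACTIVE            ! takes effect immediately
        SYSGEN> EXIT

Anything WRITEd to ACTIVE is lost at the next boot unless it's also WRITEd to
CURRENT.)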

    Is there anything wrong with raising WSEXTENT all the way through the
ceiling? It couldn't hurt to allow users to get more memory on the 8650
than the 750s, I don't suppose, provided no one else wants it. I would
assume the proper way to handle this is to make WSEXTENT enormous, and then
set WSMAX to something appropriate on each machine, like around the size
of physical memory minus some overhead. Am I missing anything?
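
    (Concretely, what I'm picturing is something like the following; the
account name and the numbers are made up.  On the 8650 and the 750s alike,
give the account a huge extent and a modest quota:

        $ RUN SYS$SYSTEM:AUTHORIZE
        UAF> MODIFY BIGUSER /WSEXTENT=65535 /WSQUOTA=1024
        UAF> EXIT

and then cap each machine individually with WSMAX in SYSGEN -- WSMAX isn't
dynamic, so this one wants a reboot:

        $ RUN SYS$SYSTEM:SYSGEN
        SYSGEN> USE CURRENT
        SYSGEN> SET WSMAX 12288
        SYSGEN> WRITE CURRENT
        SYSGEN> EXIT

The effective ceiling on any one machine is then the smaller of WSEXTENT and
that machine's WSMAX.)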


	    Thanks,

	    --Rod

CHRIS@ENGVAX.SCG.HAC.COM.UUCP (03/25/87)


     Sigh.  I open my mouth and find out just how spoiled I really am.  I did
not mean to get into a general tuning debate when I sent my message to INFO-VAX
describing what I have learned about working set parameters, but I guess that
I wasn't careful enough in describing where I'm coming from.

     Where I'm coming from is an environment where we have at least "enough"
memory, and a management that knows enough to listen to us when we say that we
need more *before* it's too late.  We also have a general mix of users, very
few of whom are running massive jobs (these run a couple of times a month at
best).  I think that our systems run with what might be termed by DEC a "typical"
workload environment.  I still say that in this case, where you don't have big
jobs and where you have enough memory, swapping is generally *very* bad.

     There are cases when swapping is good.  These cases *usually* occur on
systems that don't have "enough" memory or that run massive jobs (there may be
others, but they haven't been pointed out to me, yet...).  I have no experience
trying to run memory-poor systems, and I really do feel sorry for those of you
that are stuck with doing so, but if you don't have enough memory, all that I
can suggest is that you start (or keep) bitching to whomever can get you more
and that you do the best you can with what you've got.  I get no commissions,
lunches, 6 packs of Dr. Pepper, or any other compensation for you getting more
memory, but I firmly believe that having "enough" memory is akin to having
"enough" disk-space, it's just harder to measure.

     I feel that massive jobs should be run a) in a batch queue or b) on a
machine dedicated to (and tuned for) running massive jobs.  With a batch queue,
at least you can suspend the jobs in the queue during "peak" hours or set the
queue to a priority lower than interactive so that they get swapped out first
(again, in the general case, I know all about batch jobs doing a lot of I/O
screwing this model up).  With a dedicated "big-job" machine, I'm sure that
there are people running them who would be willing to give pointers to those
who might need it.  I could come up with some guesses that I think are pretty
good that are in reality dead wrong, so I won't say anything.
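
     Just as a sketch of what I mean by the batch-queue approach -- the queue
name, the file name, and the numbers here are invented, adjust to taste:

        $ INITIALIZE /QUEUE /BATCH /START -
                /BASE_PRIORITY=3 /JOB_LIMIT=1 -
                /WSDEFAULT=1024 /WSQUOTA=4096 /WSEXTENT=12288 -
                BIG_BATCH
        $ SUBMIT /QUEUE=BIG_BATCH BIGJOB.COM

The low base priority means the job gives way to interactive users, and since
it's a batch job you can stop the queue during peak hours ($ STOP/QUEUE/NEXT
BIG_BATCH) and start it again afterwards ($ START/QUEUE BIG_BATCH).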

-- Chris Yoder                  UUCP -- {allegra or ihnp4}!scgvaxd!engvax!chris
   Hughes Aircraft Company  Internet -- @ymir.bitnet:chris@engvax.scg.hac.com
                                ARPA -- chris%engvax.uucp@usc-oberon.usc.edu

JCV@CERNVM.BITNET.UUCP (03/26/87)

Your problem probably resides with '...a nominally single-user system'.
Apparently, your user with the enormous working set gets trimmed down to
his WSQUO. This is done by the swapper before starting to swap in an effort
to recover physical memory - somebody else (the non-nominal users!) requested
it. As the swapper doesn't really know how much it might need, it just cuts
your poor user down to WSQUO (which is guaranteed to every process in the
system).
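
(By the way, the victim can see the floor he'll be cut back to from his own
process: the DCL command

        $ SHOW WORKING_SET

displays the current working set limit, quota, and extent for the process, and
the quota figure is what the swapper trims him down to.)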

There are three possibilities I can see (not mutually exclusive) to help in this
situation:

1 - Make it a real single-user system, or more severely restrict the number
    of users you allow to log in.

2 - Raise the WSQUO for your special user so he is guaranteed a larger share
    of memory. You could have the login file set the actual WSQUO for the
    process to a lower value and a command file raise it to the authorized
    value just before your monster of a program is run (see the sketch after
    this list).

3 - Increase WSINC. This will allow the program to get back to full strength
    as quickly as possible when it does get trimmed, but it works on a
    system-wide basis and might cause problems for the other users.
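
A rough sketch of possibility 2 -- the file and image names are invented and
the values are only examples.  SET WORKING_SET can lower the quota and later
raise it again, but never above the value authorized in the UAF, so the trick
is to authorize the big value and back off from it in the login file:

        $ ! In LOGIN.COM: run with a modest quota for ordinary work
        $ SET WORKING_SET /QUOTA=1024

        $ ! In RUN_MONSTER.COM: restore the authorized values for the big run
        $ SET WORKING_SET /QUOTA=8192 /EXTENT=12288
        $ RUN MONSTER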

Finally, check that you aren't wasting memory elsewhere: check the size of
nonpaged pool (AUTOGEN tends to grossly oversize it, especially in such an
environment). Check that you aren't wasting memory on too large a system
working set: only install /SHARED the images which are really shared, reduce
the parameter GBLSECTIONS to its necessary minimum, and do the same for
GBLPAGES and PAGEDYN.
And last, if you're using NETSERVER heavily, consider applying a small patch
(which DEC should have supplied since VMS 3.5...) to make it purge its
working set before it goes to sleep. This typically saves 200-250 pages per
NETSERVER process.
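
To see where the memory is actually going before you start cutting, a single

        $ SHOW MEMORY /FULL

shows the physical memory use, the paged and nonpaged pool figures, and the
page and swap file usage in one display; compare the nonpaged pool numbers
against NPAGEDYN to see how much slack AUTOGEN left you.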

 -- Jan

LEICHTER-JERRY@YALE.ARPA.UUCP (03/26/87)

[The author described a 750 set up with a very large WSEXTENT and a moderate
size WSQUOTA, used mainly to run a single very large process.  Every once in
a while, a process that's been happily running using all of its WSEXTENT is
suddenly knocked down to its WSQUOTA.  The question is, why?]

Quick guess:  If I remember right, the Swapper will trim a process back to
WSQUOTA before swapping it out.  This would give the symptoms you describe.
As to WHY the Swapper would decide it was time for the process to be swapped,
I can't say.  Pushing the process count above BALSETCNT would do it, and the
extra process could be something like a network server started up to receive
some mail.

You don't mention what you have BALSETCNT set to, but given your other numbers
and your intended usage of the machine I'd guess you have it set fairly small.
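
A quick way to confirm or rule this out: compare BALSETCNT against the number
of processes you actually have.

        $ RUN SYS$SYSTEM:SYSGEN
        SYSGEN> SHOW BALSETCNT
        SYSGEN> EXIT
        $ SHOW SYSTEM

If the process count from SHOW SYSTEM ever reaches BALSETCNT (the swapper and
null processes don't need balance set slots, as I recall), then creating just
one more process -- a NETSERVER, a mail delivery, anything -- forces an
outswap.  BALSETCNT isn't dynamic, so raising it means a reboot.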

							-- Jerry
-------