[comp.unix.ultrix] Tuning up an Ultrix system? HELP

klarich@a.cs.okstate.edu (Terry Klarich) (02/16/90)

I have been asked to improve the performance of our Ultrix machine.  It is
an 8350 running Ultrix 3.1.  How would one use the information given by vmstat
and iostat to decide which kernel parameters to change to get the best
performance in our situation?  If anyone can help with this problem, I
would like to hear from you.

Thanks a bunch.
------------------------------------------------------------------------------
Terry Klarich (klarich@a.cs.okstate.edu) n5hts
A man is not complete until he is married; then he is finished.

grr@cbmvax.commodore.com (George Robbins) (02/16/90)

In article <5383@okstate.UUCP> klarich@okstate.UUCP (Terry Klarich) writes:
> 
> I have been asked to improve the performance of our Ultrix machine.  It is
> an 8350 running Ultrix 3.1.  How would one use the information given by vmstat
> and iostat to decide which kernel parameters to change to get the best
> performance in our situation?  If anyone can help with this problem, I
> would like to hear from you.

In almost all respects tuning an Ultrix system and interpreting the output
of these programs is the same as any other BSD derived operating system.
You might find asking over in comp.unix.wizards more profitable.

Generalities:

uptime:		elevated load index		buy more cpu

vmstat -s:	lots of page out / swap out	buy more core

iostat:		uneven loading			rearrange partitions
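George's three rules of thumb can be written down as a tiny decision function.  This is a sketch only: the thresholds (load above 2 per CPU, more than 5 page-outs per second, any swap-outs at all, one disk carrying more than twice the average transfer rate) are arbitrary placeholders rather than Ultrix guidance, and the rates are assumed to have been worked out by hand from uptime, from vmstat -s totals divided by the time since boot, and from the iostat transfer columns.

```python
from typing import List, Sequence


def triage(load_avg: float, ncpu: int, pageouts_per_s: float,
           swapouts_per_s: float, disk_xfers_per_s: Sequence[float],
           load_per_cpu_limit: float = 2.0, paging_limit: float = 5.0,
           imbalance_ratio: float = 2.0) -> List[str]:
    """Map crude symptoms to the three rules of thumb (thresholds are hypothetical)."""
    advice = []
    # uptime: a load average well above the number of CPUs suggests a CPU shortage.
    if load_avg / ncpu > load_per_cpu_limit:
        advice.append("elevated load: buy more cpu")
    # vmstat -s: sustained page-outs, or any swap-outs, suggest too little memory.
    if pageouts_per_s > paging_limit or swapouts_per_s > 0:
        advice.append("page/swap outs: buy more core")
    # iostat: one disk doing much more than its share suggests moving partitions.
    if disk_xfers_per_s:
        mean = sum(disk_xfers_per_s) / len(disk_xfers_per_s)
        if mean > 0 and max(disk_xfers_per_s) > imbalance_ratio * mean:
            advice.append("uneven disk load: rearrange partitions")
    return advice


print(triage(5.0, 2, 0.0, 0.0, [10, 12]))  # ['elevated load: buy more cpu']
```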

Generally, books won't get you very far when it comes to tuning; it's more a
matter of looking and learning on your own system (the game is different
when you're trying to improve the response of a moderately loaded system than
when you're trying to keep a mega-loaded student timesharing machine from
expiring).  I would expect that there have been a number of papers / sessions
at Usenix over the years, but I don't have any references...

-- 
George Robbins - now working for,     uucp:   {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing:   domain: grr@cbmvax.commodore.com
Commodore, Engineering Department     phone:  215-431-9349 (only by moonlite)

alan@shodha.dec.com ( Alan's Home for Wayward Notes File.) (02/17/90)

	The summary refers to the fact that in past years the 
	VMS answer to "How do I make it run faster?" was "Get 
	more memory."  That may still be true in your case,
	but we have to find out.

	1.  First use vmstat(1) to look at:

	    a.  How much memory you have and if you might be
		paging.
	    b.  How the CPU is spending its time.

	Sample vmstat(1) output:

 procs     memory                       page      disk  faults          cpu
 r b w   avm  fre  re at  pi  po  fr  de  sr x0 x1 x2 x3  in  sy  cs us sy id
 0 0 0   964 7576   0  0   0   0   0  36   0  0  0  0  0  12  32   6  0  1 98

							  (CPU related stuff)

	For CPU time, look at the end of the line.  The columns
	"us", "sy" and "id" are the percentages of time spent in
	user mode, in kernel mode and idle.  Since you have a two-CPU
	system you'll also want to look at the individual CPU breakdown
	with iostat(1).  If the majority of time is being spent in
	user mode and the slave processor is reasonably busy, then
	the problem is that you don't have a fast enough system.
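To make the reading of those last three columns concrete, here is a Python sketch that splits one vmstat(1) data line using the column order of the header above and returns a rough verdict.  Assumptions: the line has exactly the 22 columns of that header (four disk columns); the two "sy" columns are renamed sy_calls and sy_cpu; and the 30% cut-off for "significant" kernel time is an arbitrary placeholder.  vmstat only reports the combined CPU figures, so the per-CPU check described above still has to be done with iostat(1).

```python
from typing import Dict

# Column order of the vmstat(1) header above.  The four disk columns are
# called d0..d3 here because their names vary (x0..x3, r0 r2 x2 x3, ...),
# and "sy" appears twice: sy_calls (faults) and sy_cpu (cpu).
FIELDS = ["r", "b", "w", "avm", "fre", "re", "at", "pi", "po", "fr", "de",
          "sr", "d0", "d1", "d2", "d3", "in", "sy_calls", "cs",
          "us", "sy_cpu", "id"]


def parse_vmstat_line(line: str) -> Dict[str, int]:
    """Turn one whitespace-separated vmstat data line into a dict of ints."""
    values = [int(v) for v in line.split()]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return dict(zip(FIELDS, values))


def classify_cpu(sample: Dict[str, int], kernel_significant: int = 30) -> str:
    """Rough verdict on where CPU time goes; the threshold is hypothetical."""
    if sample["us"] > 50:
        # Mostly user mode: if iostat(1) shows the slave CPU busy as well,
        # the machine is simply not fast enough.
        return "user"
    if sample["sy_cpu"] >= kernel_significant:
        # Significant kernel time: look at in, sy_calls and cs next.
        return "kernel"
    # Idle time is left over, so the bottleneck is probably elsewhere.
    return "idle"


line = "0 0 0   964 7576   0  0   0   0   0  36   0  0  0  0  0  12  32   6  0  1 98"
print(classify_cpu(parse_vmstat_line(line)))  # idle
```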

	On the other hand, if the majority (or a significant part) of
	the time is spent in kernel mode, or there is idle time and
	the slave processor is mostly idle, look for a problem
	elsewhere.  The first place I'd look is "in", "sy" and "cs".

	These are device interrupts, system calls and context switches
	(per second).  Lots of system calls will tend to create lots
	of time spent in kernel mode.  What counts as "lots" depends
	heavily on the system, and I'm afraid I don't have a good feel
	for what a lot is on your system.  You might want to look at
	the numbers both when the system doesn't seem slow and when it
	does, to see if there is a difference.
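One way to act on the "look when it's fast and when it's slow" advice is to average the in, sy and cs columns over a few samples from each period and compare them.  A sketch, assuming the same 22-column layout as the header above; the sample lines in the usage part are copied from Stefan Brandle's output later in this thread.  On BSD-derived vmstat the first line of a run normally reports averages since boot, which is why his first line serves here as a crude "normal" baseline.

```python
from typing import Dict, Iterable

FIELDS = ["r", "b", "w", "avm", "fre", "re", "at", "pi", "po", "fr", "de",
          "sr", "d0", "d1", "d2", "d3", "in", "sy_calls", "cs",
          "us", "sy_cpu", "id"]
# Interrupts, system calls and context switches per second.
RATE_COLUMNS = ("in", "sy_calls", "cs")


def parse_vmstat_line(line: str) -> Dict[str, int]:
    """Turn one whitespace-separated vmstat data line into a dict of ints."""
    values = [int(v) for v in line.split()]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return dict(zip(FIELDS, values))


def mean_rates(lines: Iterable[str]) -> Dict[str, float]:
    """Average in / sy_calls / cs over a set of vmstat lines."""
    samples = [parse_vmstat_line(l) for l in lines]
    if not samples:
        raise ValueError("need at least one sample")
    return {k: sum(s[k] for s in samples) / len(samples) for k in RATE_COLUMNS}


def rate_ratios(normal_lines: Iterable[str],
                slow_lines: Iterable[str]) -> Dict[str, float]:
    """How many times higher each rate is when the system feels slow."""
    normal, slow = mean_rates(normal_lines), mean_rates(slow_lines)
    return {k: slow[k] / normal[k] if normal[k] else float("inf")
            for k in RATE_COLUMNS}


normal = ["2 0 0   671 5911   0  0   0   0   0   0   0  2  0  0  0  85  73  14 10 20 70"]
slow = ["1 0 0   313 5865   0  0   0   0   0   0   0  0  0  0  0 950 262   9  7 80 13",
        "1 0 0   542 5865   0  0   0   0   0   0   0  1  1  0  0 956 268  10  6 84  9"]
print(mean_rates(slow))  # {'in': 953.0, 'sy_calls': 265.0, 'cs': 9.5}
print(rate_ratios(normal, slow))
```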

 procs     memory                       page      disk  faults          cpu
 r b w   avm  fre  re at  pi  po  fr  de  sr x0 x1 x2 x3  in  sy  cs us sy id
 0 0 0   964 7576   0  0   0   0   0  36   0  0  0  0  0  12  32   6  0  1 98
	(Memory usage)

	If the CPU utilization looks "reasonable" (meaning you have
	idle time that isn't being used), look to see if you have
	enough memory.  The current version of ULTRIX tries to keep
	about 512 KB free and will start paging and perhaps even
	swapping to do this.  If "fre" is around there or below, see 
	if you have many non-zero numbers in the "re" through "sr"
	fields.  These are the paging stats and are reasonably
	described in the manual page for vmstat(1).  The fields "pi"
	and "po" represent real paging I/O, whereas "re" and "at" are
	usually "soft" page faults.
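The memory check can be sketched the same way: flag a sample whose fre is near the free-memory target and list which paging columns are non-zero, separating real paging I/O (pi, po) from the rest.  Assumptions: fre is in KB, as both posts in this thread read it; the 512 KB target is taken from the paragraph above and may differ on your release; and "around there" is arbitrarily taken to mean within 1.5 times the target.

```python
from typing import Dict

FIELDS = ["r", "b", "w", "avm", "fre", "re", "at", "pi", "po", "fr", "de",
          "sr", "d0", "d1", "d2", "d3", "in", "sy_calls", "cs",
          "us", "sy_cpu", "id"]
# The "re" through "sr" paging columns.
PAGING_COLUMNS = ("re", "at", "pi", "po", "fr", "de", "sr")


def parse_vmstat_line(line: str) -> Dict[str, int]:
    """Turn one whitespace-separated vmstat data line into a dict of ints."""
    values = [int(v) for v in line.split()]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return dict(zip(FIELDS, values))


def memory_check(sample: Dict[str, int], free_target_kb: int = 512,
                 near_factor: float = 1.5) -> Dict[str, object]:
    """Is free memory near the target, and is the system paging?"""
    return {
        # Assumes fre is in KB; near_factor is a hypothetical margin.
        "low_free": sample["fre"] <= free_target_kb * near_factor,
        "nonzero_paging": {k: sample[k] for k in PAGING_COLUMNS if sample[k]},
        # pi/po are real paging I/O; re/at are usually soft faults.
        "real_paging_io": sample["pi"] > 0 or sample["po"] > 0,
    }


blip = "1 0 0  1066 5787   4 10   5   0   0   0   0  3  1  0  0 491 205  25 19 75  7"
print(memory_check(parse_vmstat_line(blip)))
# {'low_free': False, 'nonzero_paging': {'re': 4, 'at': 10, 'pi': 5}, 'real_paging_io': True}
```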

	If you're paging there are a couple of choices.

	1.  Get more memory.
	2.  Use less memory.
	3.  If you must page, page more efficiently.

	Getting more memory is good for DEC or the company you buy 
	memory from.  Arranging to use less memory takes more work
	on the part of the system manager.  Use ps(1) to look for
	processes that are using lots of memory.  If they are user
	applications, work with the users to see if they can reduce
	the memory requirements.  If all else fails, you can start
	looking at hand scheduling them with kill -{STOP,CONT} and
	letting the page daemon reclaim their pages when they don't
	run.
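Hand scheduling amounts to sending SIGSTOP to the memory hogs and letting them take turns with SIGCONT.  Below is a minimal sketch using Python's os.kill on a modern Unix, not an Ultrix tool; the function name, the time slice and the round-robin order are hypothetical choices.  It continues every process when it finishes (or is interrupted) so nothing is left stopped, and it quietly skips processes that have already exited.

```python
import os
import signal
import time
from typing import Sequence


def _send(pid: int, sig: int) -> None:
    """Send a signal, ignoring processes that have already exited."""
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        pass


def hand_schedule(pids: Sequence[int], slice_seconds: float = 60.0,
                  rounds: int = 1) -> None:
    """Let the given processes run one at a time, round-robin.

    While a process is stopped, the page daemon is free to reclaim its pages.
    """
    for pid in pids:
        _send(pid, signal.SIGSTOP)          # like kill -STOP
    try:
        for _ in range(rounds):
            for pid in pids:
                _send(pid, signal.SIGCONT)  # like kill -CONT: its turn to run
                time.sleep(slice_seconds)
                _send(pid, signal.SIGSTOP)
    finally:
        for pid in pids:
            _send(pid, signal.SIGCONT)      # never leave anything stopped
```

For example, hand_schedule([1234, 5678], slice_seconds=300) would give two (hypothetical) PIDs alternating five-minute turns.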

	If you're still stuck with paging I/O look to see if you can
	arrange the page/swap space so that it is more efficient.
	Put the page/swap partitions on the fastest disks and
	spread them across the controllers and disks.  If you have
	the option, look at putting the page/swap partition towards
	the logical middle of the disk (this should be close to the
	physical middle).
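Centering the page/swap partition is just arithmetic: leave equal amounts of disk on each side, then round to a cylinder boundary.  A sketch with hypothetical sizes in sectors; it only computes offsets and does not touch a real partition table, and rounding the start down to a cylinder boundary is an assumption about how your partitions must be aligned.

```python
from typing import Tuple


def centered_partition(disk_sectors: int, part_sectors: int,
                       sectors_per_cyl: int = 1) -> Tuple[int, int]:
    """Return (start, end) sectors for a partition centred on the disk.

    The start is rounded down to a cylinder boundary; end is exclusive.
    """
    if not 0 < part_sectors <= disk_sectors:
        raise ValueError("partition must fit on the disk")
    start = (disk_sectors - part_sectors) // 2
    start -= start % sectors_per_cyl
    return start, start + part_sectors


print(centered_partition(1000, 200))  # (400, 600)
```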

	Have you changed the size of the buffer cache?  It might be
	better to give some memory back to the system to use for
	programs rather than as buffer cache.  You might be able to
	reduce the amount of memory the system tries to keep free
	(_lotsfree I think).

	This should give you enough to start.  You might also want
	to ask your local DEC office for a program called monitor.
	It might make collecting and looking at the vmstat(1) and
	iostat(1) style data easier.  If they haven't heard of it
	they can ask me (I also work for DEC).
-- 
Alan Rollow				alan@nabeth.enet.dec.com

stefan@wheaton.UUCP (Stefan Brandle ) (02/22/90)

In article <722@shodha.dec.com> alan@shodha.dec.com ( Alan's Home for Wayward Notes File.) writes:
>	1.  First use vmstat(1) to look at:
>
>	    a.  How much memory you have and if you might be
>		paging.
>	    b.  How the CPU is spending its time.

 procs     memory                       page      disk  faults          cpu
 r b w   avm  fre  re at  pi  po  fr  de  sr r0 r2 x2 x3  in  sy  cs us sy id
 2 0 0   671 5911   0  0   0   0   0   0   0  2  0  0  0  85  73  14 10 20 70
 1 1 0   710 5865   0  0   0   0   0   0   0  2  0  0  0 901 256  10  5 65 31
 1 0 0   313 5865   0  0   0   0   0   0   0  0  0  0  0 950 262   9  7 80 13
 1 0 0   542 5865   0  0   0   0   0   0   0  1  1  0  0 956 268  10  6 84  9
 1 0 0   476 5865   0  0   0   0   0   0   0  1  3  0  0 475 147  26  6 76 18
 1 0 0   440 5865   0  0   0   0   0   0   0  1  0  0  0 959 268  14  6 68 26
 0 0 0   542 5865   0  0   0   0   0   0   0  1  1  0  0 953 275   9  7 83 11
 0 0 0   416 5865   0  0   0   0   0   0   0  2  0  0  0 314  94   6  5 54 41
 1 0 0   410 5865   0  0   0   0   0   0   0  0  0  0  0 957 266  15  7 83 10
 1 0 0   385 5865   0  0   0   0   0   0   0  1  0  0  0 950 266   9  6 85  9
 1 0 0   513 5865   0  0   0   0   0   0   0  1  0  0  0 964 272  11  6 84 10
 1 0 0   313 5865   0  0   0   0   0   0   0  1  1  0  0 318  90  24  5 62 33
 0 0 0   542 5865   0  0   0   0   0   0   0  1  1  0  0 961 267  20  6 84  9
 1 0 0   513 5865   0  0   0   0   0   0   0  1  0  0  0 931 265  12  6 84 10
 1 0 0   477 5865   0  0   0   0   0   0   0  1  0  0  0 563 165   6  5 56 39
 1 0 0   567 5863   0  0   0   0   0   0   0  1  0  0  0 941 265   9  8 84  8
 1 2 0   770 5829   0  0   0   0   0   0   0  3  0  0  0 639 291  25 18 80  2
 1 0 0  1066 5787   4 10   5   0   0   0   0  3  1  0  0 491 205  25 19 75  7
 2 0 0   856 5658   0  0   0   0   0   0   0  2  0  0  0 920 273  11 11 70 19
 1 0 0   942 5655   0  0   0   0   0   0   0  0  0  0  0 954 267   9  8 82 10

I'm running this on a uVAX II with Ultrix 2.0.
It looks like what makes the system sluggish is all those interrupts.  The rate
frequently gets up over 1000/second.  I wonder if all that news coming in is
relevant (:-).

On the basis of what Alan said, my problem is not too much user activity.  It's
also not memory, since we're not running under 5MB free in this case.  There is
that one blip of activity in paging country, but it doesn't appear to be a big
deal at all.
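The reading above (high interrupt rates, plenty of free memory, one paging blip) can be checked mechanically over a whole run.  A sketch, assuming the same 22-column layout as the header above and, as a hypothetical simplification, treating any non-zero re, at, pi or po as a "blip".  The sample lines are four rows copied from the output above.

```python
from typing import Dict, List, Sequence

FIELDS = ["r", "b", "w", "avm", "fre", "re", "at", "pi", "po", "fr", "de",
          "sr", "d0", "d1", "d2", "d3", "in", "sy_calls", "cs",
          "us", "sy_cpu", "id"]


def parse_vmstat_line(line: str) -> Dict[str, int]:
    """Turn one whitespace-separated vmstat data line into a dict of ints."""
    values = [int(v) for v in line.split()]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return dict(zip(FIELDS, values))


def summarize_run(lines: Sequence[str]) -> Dict[str, object]:
    """Peak and mean interrupt rate, lowest free memory, rows showing paging."""
    samples = [parse_vmstat_line(l) for l in lines]
    if not samples:
        raise ValueError("need at least one sample")
    rates: List[int] = [s["in"] for s in samples]
    return {
        "max_in": max(rates),
        "mean_in": sum(rates) / len(rates),
        "min_fre": min(s["fre"] for s in samples),
        # Row indices where any of re/at/pi/po is non-zero.
        "paging_blips": [i for i, s in enumerate(samples)
                         if any(s[k] for k in ("re", "at", "pi", "po"))],
    }


run = ["1 1 0   710 5865   0  0   0   0   0   0   0  2  0  0  0 901 256  10  5 65 31",
       "1 2 0   770 5829   0  0   0   0   0   0   0  3  0  0  0 639 291  25 18 80  2",
       "1 0 0  1066 5787   4 10   5   0   0   0   0  3  1  0  0 491 205  25 19 75  7",
       "2 0 0   856 5658   0  0   0   0   0   0   0  2  0  0  0 920 273  11 11 70 19"]
print(summarize_run(run))
# {'max_in': 920, 'mean_in': 737.75, 'min_fre': 5658, 'paging_blips': [2]}
```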

 procs     memory                       page      disk  faults          cpu
 r b w   avm  fre  re at  pi  po  fr  de  sr r0 r2 x2 x3  in  sy  cs us sy id
 1 0 0  1066 5787   4 10   5   0   0   0   0  3  1  0  0 491 205  25 19 75  7

We do have a number of students using this machine and it sometimes gets rather
sluggish.  My feeling is that news and many students don't mix well on a 
uVAX II.  I can reschedule news--I know how to do that--but wonder whether 
there is anything else that can be modified kernel-wise that will make a 
significant difference.  My guess is no, but maybe somebody has ideas.

-stefan
-- 
---------------------------------------------- MA Bell: (708) 260-5019 ---------
Stefan Brandle                  UUCP: ...!{obdient,uunet!tellab5}!wheaton!stefan
Wheaton College			or	stefan@wheaton.UUCP
Wheaton, IL 60187 		"But I never claimed to be sane!"

alan@shodha.dec.com ( Alan's Home for Wayward Notes File.) (02/23/90)

In article <1872@wheaton.UUCP>, stefan@wheaton.UUCP (Stefan Brandle ) writes:

>  r b w   avm  fre  re at  pi  po  fr  de  sr r0 r2 x2 x3  in  sy  cs us sy id
>  2 0 0   671 5911   0  0   0   0   0   0   0  2  0  0  0  85  73  14 10 20 70
>  1 1 0   710 5865   0  0   0   0   0   0   0  2  0  0  0 901 256  10  5 65 31
>  1 0 0   313 5865   0  0   0   0   0   0   0  0  0  0  0 950 262   9  7 80 13
>  1 0 0   542 5865   0  0   0   0   0   0   0  1  1  0  0 956 268  10  6 84  9
>  1 0 0   476 5865   0  0   0   0   0   0   0  1  3  0  0 475 147  26  6 76 18

	I may have previously commented that I didn't have the
	experience to judge when the number of interrupts
	was "too many".  Sometimes, though, it's pretty obvious.
	The real question is where they are all coming
	from.  The first place I'd look at this point is the
	tty I/O.  Does there seem to be a relationship between
	the high number of interrupts and either tty input or tty
	output?  (Use iostat(1) to look at tty I/O.)
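A quick way to answer that question is to sample vmstat's "in" column and iostat's tty columns (tin, tout) over the same intervals and compute a correlation coefficient.  The sketch below implements the ordinary Pearson correlation by hand.  The interrupt figures in the usage lines are taken from Stefan's output; the tty figures are made up for illustration.  A strong correlation is only a hint, not proof, that tty traffic is the source.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("a constant series has no correlation")
    return cov / sqrt(vx * vy)


# vmstat "in" per interval versus hypothetical iostat tin + tout per interval.
interrupts = [901, 950, 475, 959, 314]
tty_chars = [880, 940, 450, 950, 300]
print(round(pearson(interrupts, tty_chars), 3))
```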

	What are you using the console port for?  Keep in mind that
	the console interface is a really braindead serial line.  Every
	character input or output over it causes an interrupt.
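If every character costs an interrupt, the console's share is bounded by its character rate.  A back-of-the-envelope sketch: with the common asynchronous framing of one start bit, eight data bits and one stop bit (10 bits per character, an assumption about how your console is set up), a 9600 baud line running flat out in one direction delivers at most 960 characters, and so about 960 interrupts, per second.  That is in the same range as the roughly 950 interrupts/second in Stefan's samples, which makes the console worth checking but proves nothing by itself.

```python
def console_interrupts_per_second(baud: int, bits_per_char: int = 10,
                                  directions: int = 1) -> float:
    """Upper bound on interrupts/s from a line that interrupts once per character.

    bits_per_char=10 assumes 1 start + 8 data + 1 stop bit; directions=2
    covers input and output both running flat out at the same time.
    """
    return baud / bits_per_char * directions


print(console_interrupts_per_second(9600))  # 960.0
```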

>  procs     memory                       page      disk  faults          cpu
>  r b w   avm  fre  re at  pi  po  fr  de  sr r0 r2 x2 x3  in  sy  cs us sy id
>  1 0 0  1066 5787   4 10   5   0   0   0   0  3  1  0  0 491 205  25 19 75  7

	The paging blip is probably somebody starting up a program.
> 
> I can reschedule news--I know how to do that--but wonder whether 
> there is anything else that can be modified kernel-wise that will make a 
> significant difference.  My guess is no, but maybe somebody has ideas.

	Unfortunately there is very little in the kernel that can be
	"tuned".  What you usually have to do is look at all the
	different *stat programs to find out where the bottlenecks
	or problems are and then see if you can get rid of them or
	balance around them.

-- 
Alan Rollow				alan@nabeth.enet.dec.com