klarich@a.cs.okstate.edu (Terry Klarich) (02/16/90)
I have been asked to improve the performance of our Ultrix machine. It is
an 8350 running Ultrix 3.1. How would one use the information given by
vmstat and iostat to decide which kernel parameters to change to get the
best performance given our situation? If anyone can help with this problem,
I would like to hear from you.

Thanks a bunch.

------------------------------------------------------------------------------
Terry Klarich (klarich@a.cs.okstate.edu) n5hts
A man is not complete until he is married; then he is finished.
grr@cbmvax.commodore.com (George Robbins) (02/16/90)
In article <5383@okstate.UUCP> klarich@okstate.UUCP (Terry Klarich) writes:
>
> I have been asked to improve the performance of our Ultrix machine. It is
> a 8350 running Ultrix 3.1. How would one use the information given by vmstat
> and iostat to decide what kernel parameters to change to get the best
> performance given our situation. If anyone can help with this problem, I
> would like to hear from you.

In almost all respects, tuning an Ultrix system and interpreting the output
of these programs is the same as on any other BSD-derived operating system.
You might find asking over in comp.unix.wizards more profitable.

Generalities:

    uptime:     elevated load index            buy more cpu
    vmstat -s:  lots of page out / swap out    buy more core
    iostat:     uneven loading                 rearrange partitions

Generally, books won't get you too far when it comes to tuning; it's more
a matter of looking and learning on your own system (the game being
different between trying to improve response of a moderately loaded system
and trying to keep a mega-loaded student timesharing machine from expiring).

I would expect that there would have been a number of papers / sessions at
Usenix over the years, but I don't have any references...

--
George Robbins - now working for,     uucp:   {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing:   domain: grr@cbmvax.commodore.com
Commodore, Engineering Department     phone:  215-431-9349 (only by moonlite)
alan@shodha.dec.com ( Alan's Home for Wayward Notes File.) (02/17/90)
The summary refers to the fact that in past years the VMS answer to
"How do I make it run faster?" was "Get more memory." That may still
be true in your case, but we have to find out.

1.  First use vmstat(1) to look at:

    a.  How much memory you have and if you might be
        paging.
    b.  How the CPU is spending its time.

Sample vmstat(1) output:

 procs    memory        page             disk       faults      cpu
 r b w  avm  fre re at pi po fr de sr x0 x1 x2 x3  in  sy  cs us sy id
 0 0 0  964 7576  0  0  0  0  0 36  0  0  0  0  0  12  32   6  0  1 98
                                                  (CPU related stuff)

For CPU time look at the end of the line. The columns for "us", "sy"
and "id" are times spent in user mode, kernel mode and idle
(percentages). Since you have a two CPU system you'll also want to
look at the individual CPU breakdown with iostat(1).

If the majority of time is being spent in user mode and the slave
processor is reasonably busy, then the problem is that you don't have
a fast enough system. On the other hand, if the majority (or a
significant part) of the time is spent in kernel mode, or there is
idle time and the slave processor is mostly idle, look for a problem
elsewhere.

The first place I'd look is "in", "sy" and "cs". These are device
interrupts, system calls and context switches (per second). Lots of
system calls will tend to create lots of time spent in kernel mode.
"Lots" depends a lot on the system and I'm afraid I don't have a good
feel for what a lot is on your system. You might want to look at it
when the system doesn't seem slow and when it does, to see if there is
a difference.

 procs    memory        page             disk       faults      cpu
 r b w  avm  fre re at pi po fr de sr x0 x1 x2 x3  in  sy  cs us sy id
 0 0 0  964 7576  0  0  0  0  0 36  0  0  0  0  0  12  32   6  0  1 98
       (Memory usage)

If the CPU utilization looks "reasonable" (meaning you have idle time
that isn't being used), look to see if you have enough memory. The
current version of ULTRIX tries to keep about 512 KB free and will
start paging and perhaps even swapping to do this.
If "fre" is around there or below, see if you have many non-zero
numbers in the "re" through "sr" fields. These are the paging stats
and are reasonably described in the manual page for vmstat(1). The
fields "pi" and "po" represent real paging I/O, whereas "re" and "at"
are usually "soft" page faults.

If you're paging there are a couple of choices.

    1.  Get more memory.
    2.  Use less memory.
    3.  If you must page, page more efficiently.

Getting more memory is good for DEC or the company you buy memory
from. Arranging to use less memory takes more work on the part of the
system manager. Use ps(1) to look for processes that are using lots of
memory. If they are user applications, work with the users to see if
they can reduce the memory requirements. If all else fails you can
start looking at hand scheduling them with kill -{STOP,CONT} and
letting the page daemon reclaim their pages when they don't run.

If you're still stuck with paging I/O, look to see if you can arrange
the page/swap space so that it is more efficient. Put the page/swap
partitions on the fastest disks and spread them between the
controllers and disks. If you have the option, look at putting the
page/swap partition towards the logical middle of the disk (this
should be close to the physical middle).

Have you changed the size of the buffer cache? It might be better to
give some memory back to the system to use for programs rather than as
buffer cache. You might be able to reduce the amount of memory the
system tries to keep free (_lotsfree I think).

This should give you enough to start. You might also want to ask your
local DEC office for a program called monitor. It might make
collecting and looking at the vmstat(1) and iostat(1) style data
easier. If they haven't heard of it they can ask me (I also work for
DEC).

--
Alan Rollow                             alan@nabeth.enet.dec.com
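The checklist in the post above can be sketched in a few lines of Python.
This is only an illustration, not part of the original post: the column
order is copied from the sample vmstat(1) header, the "more than 50%" cutoffs
are an arbitrary reading of "the majority of time", and "fre" is assumed to
be reported in KB so that it can be compared with the "about 512 KB" figure.
The function names are made up for the example.

```python
# Column order taken from the sample vmstat(1) header in the post above.
# "sy" appears twice (system calls, then CPU system time), so the second
# one is renamed "cpu_sy" here to keep the dictionary keys unique.
COLUMNS = ("r b w avm fre re at pi po fr de sr x0 x1 x2 x3 "
           "in sy cs us cpu_sy id").split()

# "about 512 KB" per the post; assumes "fre" is reported in KB.
FREE_TARGET_KB = 512


def parse_vmstat_line(line):
    """Turn one vmstat data line into a dict keyed by column name."""
    values = [int(v) for v in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d"
                         % (len(COLUMNS), len(values)))
    return dict(zip(COLUMNS, values))


def first_look(sample):
    """Apply the post's rough checklist; the 50% cutoffs are illustrative."""
    notes = []
    if sample["us"] > 50:
        notes.append("mostly user time: the machine may simply be too slow")
    elif sample["cpu_sy"] > 50:
        notes.append("mostly kernel time: check the in/sy/cs rates")
    if sample["fre"] <= FREE_TARGET_KB and (sample["pi"] or sample["po"]):
        notes.append("low free memory with real paging I/O")
    if not notes:
        notes.append("nothing obvious from this sample")
    return notes


# The sample line from the post: 98% idle and plenty of free memory.
sample = parse_vmstat_line(
    "0 0 0 964 7576 0 0 0 0 0 36 0 0 0 0 0 12 32 6 0 1 98")
for note in first_look(sample):
    print(note)
```

A single sample says little; as the post suggests, it is more useful to run
the same check on output collected when the system feels slow and when it
does not, and compare.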
stefan@wheaton.UUCP (Stefan Brandle ) (02/22/90)
In article <722@shodha.dec.com> alan@shodha.dec.com ( Alan's Home for Wayward Notes File.) writes:
> 1.  First use vmstat(1) to look at:
>
>     a.  How much memory you have and if you might be
>         paging.
>     b.  How the CPU is spending its time.

 procs    memory        page             disk       faults      cpu
 r b w  avm  fre re at pi po fr de sr r0 r2 x2 x3  in  sy  cs us sy id
 2 0 0  671 5911  0  0  0  0  0  0  0  2  0  0  0  85  73  14 10 20 70
 1 1 0  710 5865  0  0  0  0  0  0  0  2  0  0  0 901 256  10  5 65 31
 1 0 0  313 5865  0  0  0  0  0  0  0  0  0  0  0 950 262   9  7 80 13
 1 0 0  542 5865  0  0  0  0  0  0  0  1  1  0  0 956 268  10  6 84  9
 1 0 0  476 5865  0  0  0  0  0  0  0  1  3  0  0 475 147  26  6 76 18
 1 0 0  440 5865  0  0  0  0  0  0  0  1  0  0  0 959 268  14  6 68 26
 0 0 0  542 5865  0  0  0  0  0  0  0  1  1  0  0 953 275   9  7 83 11
 0 0 0  416 5865  0  0  0  0  0  0  0  2  0  0  0 314  94   6  5 54 41
 1 0 0  410 5865  0  0  0  0  0  0  0  0  0  0  0 957 266  15  7 83 10
 1 0 0  385 5865  0  0  0  0  0  0  0  1  0  0  0 950 266   9  6 85  9
 1 0 0  513 5865  0  0  0  0  0  0  0  1  0  0  0 964 272  11  6 84 10
 1 0 0  313 5865  0  0  0  0  0  0  0  1  1  0  0 318  90  24  5 62 33
 0 0 0  542 5865  0  0  0  0  0  0  0  1  1  0  0 961 267  20  6 84  9
 1 0 0  513 5865  0  0  0  0  0  0  0  1  0  0  0 931 265  12  6 84 10
 1 0 0  477 5865  0  0  0  0  0  0  0  1  0  0  0 563 165   6  5 56 39
 1 0 0  567 5863  0  0  0  0  0  0  0  1  0  0  0 941 265   9  8 84  8
 1 2 0  770 5829  0  0  0  0  0  0  0  3  0  0  0 639 291  25 18 80  2
 1 0 0 1066 5787  4 10  5  0  0  0  0  3  1  0  0 491 205  25 19 75  7
 2 0 0  856 5658  0  0  0  0  0  0  0  2  0  0  0 920 273  11 11 70 19
 1 0 0  942 5655  0  0  0  0  0  0  0  0  0  0  0 954 267   9  8 82 10

I'm running this on a uVAX II Ultrix 2.0. Looks like what makes me sluggish
is all those interrupts. It gets up over 1000/second frequently. Wonder if
all that news coming in is relevant (:-).

On the basis of what Alan said, my problem is not too much user activity.
It's also not memory, since we're not running under 5MB free in this case.
There is that one blip of activity in paging country, but it doesn't appear
to be a big deal at all.
 procs    memory        page             disk       faults      cpu
 r b w  avm  fre re at pi po fr de sr r0 r2 x2 x3  in  sy  cs us sy id
 1 0 0 1066 5787  4 10  5  0  0  0  0  3  1  0  0 491 205  25 19 75  7

We do have a number of students using this machine and it sometimes gets
rather sluggish. My feeling is that news and many students don't mix well on
a uVAX II. I can reschedule news--I know how to do that--but wonder whether
there is anything else that can be modified kernel-wise that will make a
significant difference. My guess is no, but maybe somebody has ideas.

-stefan
--
---------------------------------------------- MA Bell: (708) 260-5019 ---------
Stefan Brandle                  UUCP: ...!{obdient,uunet!tellab5}!wheaton!stefan
Wheaton College                 or    stefan@wheaton.UUCP
Wheaton, IL 60187               "But I never claimed to be sane!"
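One way to test the "it's the interrupts" reading of the samples above is to
pull the interrupt rate and the kernel-mode CPU percentage out of each line
and look at them side by side. The sketch below is an illustration only: it
uses the first five samples from the post, and the column positions are an
assumption based on the 22-column header shown there.

```python
# First five samples from the vmstat output posted above (uVAX II, Ultrix 2.0).
SAMPLES = """\
2 0 0 671 5911 0 0 0 0 0 0 0 2 0 0 0 85 73 14 10 20 70
1 1 0 710 5865 0 0 0 0 0 0 0 2 0 0 0 901 256 10 5 65 31
1 0 0 313 5865 0 0 0 0 0 0 0 0 0 0 0 950 262 9 7 80 13
1 0 0 542 5865 0 0 0 0 0 0 0 1 1 0 0 956 268 10 6 84 9
1 0 0 476 5865 0 0 0 0 0 0 0 1 3 0 0 475 147 26 6 76 18
"""

# Zero-based positions in the 22-column layout shown in the header.
IN_COL = 16      # "in": device interrupts per second
CPU_SY_COL = 20  # second "sy": percent of CPU time spent in kernel mode


def interrupts_vs_system_time(text):
    """Return (interrupts, system_pct) pairs, one per data line."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        # Skip header lines and anything that is not a full data row.
        if len(fields) != 22 or not fields[0].isdigit():
            continue
        pairs.append((int(fields[IN_COL]), int(fields[CPU_SY_COL])))
    return pairs


# Print the samples ordered by interrupt rate, to see whether kernel-mode
# time climbs along with it.
for interrupts, system_pct in sorted(interrupts_vs_system_time(SAMPLES)):
    print("in=%4d/s  kernel=%3d%%" % (interrupts, system_pct))
```

Feeding the whole table through the same function, rather than five lines,
gives a better picture; the header rows are skipped automatically.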
alan@shodha.dec.com ( Alan's Home for Wayward Notes File.) (02/23/90)
In article <1872@wheaton.UUCP>, stefan@wheaton.UUCP (Stefan Brandle ) writes:
>  r b w  avm  fre re at pi po fr de sr r0 r2 x2 x3  in  sy  cs us sy id
>  2 0 0  671 5911  0  0  0  0  0  0  0  2  0  0  0  85  73  14 10 20 70
>  1 1 0  710 5865  0  0  0  0  0  0  0  2  0  0  0 901 256  10  5 65 31
>  1 0 0  313 5865  0  0  0  0  0  0  0  0  0  0  0 950 262   9  7 80 13
>  1 0 0  542 5865  0  0  0  0  0  0  0  1  1  0  0 956 268  10  6 84  9
>  1 0 0  476 5865  0  0  0  0  0  0  0  1  3  0  0 475 147  26  6 76 18

I may have previously commented that I didn't have the experience
to determine when the number of interrupts was "too many". Sometimes,
though, it's pretty obvious. The real question is where they are all
coming from.

The first place I'd look at this point is what the tty I/O is like.
Does there seem to be a relationship between the high number of
interrupts and either tty input or tty output? (Use iostat(1) to look
at tty I/O.) What are you using the console port for? Keep in mind
that the console interface is a real braindead serial line. Every
character input or output over it causes an interrupt.

>  procs    memory        page             disk       faults      cpu
>  r b w  avm  fre re at pi po fr de sr r0 r2 x2 x3  in  sy  cs us sy id
>  1 0 0 1066 5787  4 10  5  0  0  0  0  3  1  0  0 491 205  25 19 75  7

The paging blip is probably somebody starting up a program.

>
> I can reschedule news--I know how to do that--but wonder whether
> there is anything else that can be modified kernel-wise that will make a
> significant difference. My guess is no, but maybe somebody has ideas.

Unfortunately there is very little in the kernel that can be "tuned".
What you usually have to do is look at all the different *stat
programs to find out where the bottlenecks or problems are, and then
see if you can get rid of them or balance around them.

>
> -stefan
> --
> ------------------------------------------- MA Bell: (708) 260-5019 ---------
> Stefan Brandle                  UUCP: ...!{obdient,uunet!tellab5}!wheaton!stefan
> Wheaton College                 or    stefan@wheaton.UUCP
> Wheaton, IL 60187               "But I never claimed to be sane!"
-- Alan Rollow alan@nabeth.enet.dec.com
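The question of whether the interrupt rate moves together with tty traffic
can be put as a number by computing a correlation coefficient over paired
samples. The sketch below is an illustration, not something from the thread.
No iostat(1) output was posted, so to show the mechanics it pairs the "in"
column with the "sy" (system calls) column from the five quoted vmstat lines;
on a real system the second series would be the tty input or output counts
collected over the same intervals. The function name `pearson` is made up
for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# "in" and "sy" (system calls) from the five vmstat lines quoted above.
# These stand in for the interrupt rate and the tty counts, which were
# not posted.
interrupts = [85, 901, 950, 956, 475]
syscalls = [73, 256, 262, 268, 147]

r = pearson(interrupts, syscalls)
print("correlation between in and sy: %.2f" % r)
```

A value near 1 means the two series rise and fall together; it does not by
itself say which one causes the other, so it narrows the search rather than
ending it.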