corey@milton.u.washington.edu (Corey Satten) (09/21/90)
Archive-name: kmem/20-Sep-90
Original-posting-by: corey@milton.u.washington.edu (Corey Satten)
Original-subject: Performance Tuning a DEC 5000 Ultrix 4.0 Risc Workstation
Reposted-by: emv@math.lsa.umich.edu (Edward Vielmetti)

[Reposted from comp.unix.ultrix,comp.sys.dec.
Comments on this service to emv@math.lsa.umich.edu (Edward Vielmetti).]

: ----- cut here ----- cut here ----- cut here ----- cut here -----
: This is a "shell archive".  Save everything after the cut mark
: in a file called thisstuff, then feed it to sh by typing sh thisstuff.
: SHAR archive format.  Archive created Thu Sep 20 09:13:16 PDT 1990
echo x - READ_ME
echo '-rw-r--r-- 2 corey 6125 Sep 20 09:12 READ_ME (as sent)'
sed 's/^-//' >READ_ME <<'+FUNKY+STUFF+'
-	   Performance Tuning a DEC 5000 Ultrix 4.0 Risc Workstation
-
-		     Corey Satten, corey@cac.washington.edu
-				    and
-		  Laurence Lundblade, lgl@cac.washington.edu
-
-		      Networks and Distributed Computing
-			   University of Washington
-			     Seattle, Washington
-			       September 1990
-
-
-History:
-
-	Until August 1990, our department was using a rather maximally
-	configured pmax (DEC 3100 running Ultrix 3.1) as a time-sharing host.
-	It had five disks, mostly Maxtor 660 megs.  It served /usr/local/bin
-	via NFS to about a dozen workstations; was the departmental electronic
-	mail machine; host to some campus-wide mailing lists; our anonymous FTP
-	server; one of two campus default domain nameservers; and also the
-	time-sharing host for about 16 X-terminals plus about a dozen other
-	users connected via telnet.  We were supporting about 150 megs of swap
-	space on some small portion of the 24 megabyte physical memory.  A
-	'ps aux' listing usually had 250-300 lines in it.
-
-	As you might guess, the machine wasn't always snappy, but it did
-	admirably.  It was clearly disk i/o limited -- mostly, we assumed,
-	because it was usually thrashing.  Still, the load average was usually
-	between 1 and 2, and it was mostly the spikes which were annoying.
-
-	In mid-August we upgraded to a 3max (DEC 5000) running Ultrix 4.0.
-	We doubled our RAM to 48 megs and increased our MIPS rating by 50-80%,
-	yet the system felt slower than ever.  According to the `ps' program
-	we were still thrashing, even though our active virtual memory was
-	less than the physical memory available to support it.  As we looked
-	more closely, we discovered that the system wasn't even paging; it was
-	swapping, and making stupid choices of what to swap, at that!
-
-Analysis:
-
-	Eventually we decided that the constants involved in the 2-handed
-	clock paging algorithm are no longer appropriate.  In particular:
-
-		lotsfree = 128	(512k)
-		desfree  = 64	(256k)
-		minfree  = 24	(96k)
-		maxpgio  = 60	(4k pages per second)
-		slowscan = 94	(computed)
-		fastscan = 47	(computed/2)
-
-	In the old days, programs were small and the extra memory needed
-	to start several could be obtained from a 512k-byte free list.  Today,
-	programs are bloated with X libraries, etc.  Our average process is
-	about 500k.  At the scan rates we were seeing (100-200 4k pages per
-	second), scanning simply couldn't keep up with the demand.  Our free
-	list hovered right around the minimum threshold which triggered
-	swapping.
-
-	We examined some old source code and discovered what factors can
-	trigger swapping.  Several of these, such as load > 2, are compiled
-	into the kernel as constants or computed into local variables; these
-	can only be changed by recompiling the kernel, something we can't do
-	until DEC releases the current source.  Fortunately, many of the
-	terms in the swapping decision are stored in global variables which
-	can be changed on a running system.  By changing a few values, we
-	believe we have virtually eliminated swapping on our system and
-	raised interactive performance substantially.
-
-	On our system we have made the following changes:
-
-		lotsfree = 1280	(5 meg)
-		desfree  = 256	(1 meg)
-		minfree  = 64	(256k)
-		maxpgio  = 125	(4k pages per second)
-		slowscan = 30
-		fastscan = 10
-
-	With these settings we try to keep 5 megs of free list so programs
-	can absorb transient loads, we can replenish the free list about 5
-	times faster than the default, and we've raised the allowable page-in
-	plus page-out rate to 125.  (I can easily make our system burst to 150
-	and sustain 125, so I don't think a rate of 125 indicates that the
-	paging system is in distress.)  When choosing your own numbers,
-	remember that vmstat displays `pi' and `po' in 1k pages, while
-	maxpgio counts 4k pages.
-
-	Since DEC has phased out adb, I wrote a program to let us make these
-	changes.  I've called it `kmem' and it works like this:
-
-	    prompt% kmem lotsfree desfree			# to read values
-	    lotsfree(0x8014ba40)	1280
-	    desfree(0x8014ba48)	256
-
-	    prompt% kmem -w lotsfree=1281 desfree=257	# to write values
-	    lotsfree(0x8014ba40)	1280 -> 1281
-	    desfree(0x8014ba48)	256 -> 257
-
-	Once you find values you're happy with, put the corresponding
-	`kmem -w' command in /etc/rc.local and be happy.  The source to
-	`kmem' is included in this directory.
-
-Final Disclaimers:
-
-	By recompiling the kernel, we expect we could do better still.  We
-	believe the clock paging algorithm still isn't working very well.
-	Even though we see better performance when paging than when swapping,
-	we suspect we aren't making very good use of physical memory, because
-	the "global page replacement" algorithm makes its decisions on very
-	local page-use data (the 2 megabyte spread between the hands).  In
-	support of this claim, our CPU usually shows substantial idle time
-	even when the load is greater than 1, and the "active real memory"
-	field we print in our "vmstat" listing (from t_arm) usually shows much
-	of our physical memory as "inactive" when we think it shouldn't be.
-
-	By increasing desfree to 640 (2.5 meg), we can partially re-enable
-	swapping, limited to "deadwood" (jobs sleeping for longer than 20
-	seconds).  We find this increases our active real memory and
-	decreases our idle CPU, but at the cost of an unacceptable
-	degradation in interactive response time.
-
-	Before I finish, I should point out that in addition to the load you
-	might expect on our system, we have 3 anomalies.  First, we have about
-	60-80 processes such as xclock, which wake up every now and then to
-	check or update something and then sleep for a short while longer.
-	Second, we have an unusually large number of very popular shell
-	scripts which start dozens of little awks, seds, greps, etc.  Third,
-	we have 3 swap disks configured, and we think we've done a good job of
-	spreading all disk requests across all the drives.
-
---------
-Corey Satten, corey@cac.washington.edu
-Networks and Distributed Computing
-University of Washington
+FUNKY+STUFF+
chmod u=rw,g=r,o=r READ_ME
ls -l READ_ME
echo x - kmem.c
echo '-rw-r--r-- 2 corey 2910 Sep 10 20:20 kmem.c (as sent)'
sed 's/^-//' >kmem.c <<'+FUNKY+STUFF+'
-/*
- * a tool to use in place of adb (on systems without adb) which lets you
- * peek and poke at the values of kernel variables in /dev/kmem
- *
- * usage: kmem [-f namelist] var1 var2 ... varN
- *    or
- * usage: kmem [-f namelist] -w var1=val1 var2=val2 ... varN=valN
- *
- * Corey Satten, corey@cac.washington.edu, 9/6/90 - Ultrix 4.0 version
- */
-#include <stdio.h>
-#include <stdlib.h>		/* malloc, calloc, exit */
-#include <string.h>		/* strlen */
-#include <unistd.h>		/* lseek, read, write */
-#include <nlist.h>
-#include <sys/file.h>		/* O_RDONLY, O_RDWR */
-
-struct nlist *nl;		/* how we find locations of names */
-int *nv;			/* the new values for each name */
-int w_flag = 0;			/* write new values? */
-char *file = "/vmunix";		/* default file to read symbols from */
-int kmem;
-
-int
-main(int argc, char *argv[])
-{
-	int f;		/* walks argv up to index of first non-flag */
-	int i;		/* walks through remaining arguments */
-	int value = 0;
-	int rc = 0;
-
-	/*
-	 * flag parsing
-	 */
-	for (f=1; f<argc && *(argv[f]) == '-'; ++f) {
-		switch(argv[f][1]) {
-		default:
-			fprintf(stderr, "%s: unknown flag -%c\n", argv[0], argv[f][1]);
-			exit(1);
-		case 'w':
-			w_flag = 1;
-			break;
-		case 'f':
-			if (f+1 >= argc) {	/* -f must be followed by a file */
-				fprintf(stderr, "%s: -f requires a file name\n", argv[0]);
-				exit(1);
-			}
-			file = argv[++f];
-			break;
-		}
-	}
-
-	/*
-	 * handle the remaining arguments as either symname or symname=value
-	 * depending on whether -w (w_flag) was specified.
-	 */
-
-	/* zeroed, so n_type stays 0 for symbols nlist() cannot find */
-	nl = (struct nlist *) calloc(argc-f+1, sizeof(*nl));
-	nv = (int *) malloc( sizeof(int) * (argc-f+1) );
-	if (!nv || !nl) {perror("malloc"); exit(1);}
-
-	for (i=0; i<argc-f; ++i) {
-		/* room for the whole argument plus its terminating NUL */
-		char *name = (char *)malloc(strlen(argv[i+f]) + 1);
-
-		if (!name) {perror("malloc"); exit(1);}
-		rc = sscanf(argv[i+f], "%[^=]=%d", name, &value);
-		/* expect 1 conversion (name) to read, 2 (name, value) to write */
-		if (rc - w_flag != 1) {
-			fprintf(stderr, "%s: bad argument: %s\n", argv[0], argv[i+f]);
-			exit(1);
-		}
-		nl[i].n_name = name;
-		nv[i] = value;
-	}
-	nl[i].n_name = "";	/* an empty name terminates the nlist array */
-
-	/*
-	 * now figure out where to read/write in /dev/kmem and do it
-	 */
-
-	if (nlist(file, nl) < 0) {
-		fprintf(stderr, "%s: cannot read namelist of %s\n", argv[0], file);
-		exit(1);
-	}
-
-	kmem = open("/dev/kmem", w_flag ? O_RDWR : O_RDONLY);
-	if (kmem < 0) {
-		perror("/dev/kmem open");
-		exit(1);
-	}
-
-	for (i=0; i<argc-f; ++i) {
-		long seekto = (long)nl[i].n_value;
-
-		if (nl[i].n_type == 0) {
-			fprintf(stderr, "%s: symbol `%s' not found in namelist of %s\n",
-				argv[0], nl[i].n_name, file);
-			/*
-			 * We promise to do all writes in command line order, so if one
-			 * is going to fail, we'd best bail out rather than continue.
-			 */
-			if (w_flag) exit(2);
-			else continue;
-		}
-		if ( lseek(kmem, seekto, 0) != seekto ) {
-			perror("/dev/kmem lseek"); exit(2);
-		}
-		if ( read(kmem, &value, sizeof(int)) != sizeof(int) ) {
-			perror("/dev/kmem read"); exit(2);
-		}
-
-		printf("%s(0x%lx)\t%d", nl[i].n_name,
-			(unsigned long)nl[i].n_value, value);
-
-		if (w_flag) {
-			if ( lseek(kmem, seekto, 0) != seekto ) {
-				perror("/dev/kmem lseek"); exit(2);
-			}
-			value = nv[i];
-			printf(" -> %d", value);
-			if ( write(kmem, &value, sizeof(int)) != sizeof(int) ) {
-				perror("/dev/kmem write"); exit(2);
-			}
-		}
-		putchar('\n');
-	}
-	return 0;
-}
+FUNKY+STUFF+
chmod u=rw,g=r,o=r kmem.c
ls -l kmem.c
echo x - Makefile
echo '-rw-r--r-- 2 corey 28 Sep 10 17:56 Makefile (as sent)'
sed 's/^-//' >Makefile <<'+FUNKY+STUFF+'
-kmem: kmem.o
-	cc -o $@ $@.o
+FUNKY+STUFF+
chmod u=rw,g=r,o=r Makefile
ls -l Makefile
exit 0