jpl@allegra.UUCP (John P. Linderman) (06/26/84)
Synopis: /etc/pstat -s does not report accurately on the swapping system Symptoms: When /etc/pstat is invoked with the -s option, it is inclined to spew out a number of bogus "swap: rmap ovflo" messages and then give the wrong results. Repeat by: Try it. Go ahead. Make your day. My output looked like this: 47566 used (2002 text), 152562 free, 18178 wasted, 2112 missing avail: swap: rmap ovflo, lost [200704,201024) swap: rmap ovflo, lost [59392,200000) 4*1024 3*512 10*256 17*128 Sometimes I'd get many more ovflo messages. Quick fix: Insert the line marked with + between the other two lines from /usr/src/etc/pstat.c. swapmap->m_name = "swap"; + swapmap->m_limit = (struct mapent *)(swapmap + nswapmap); nswap = getw(nl[SNSWAP].n_value); Clean fix: Pstat determines the amount of available swap space by copying the the swap map out of the kernel, and then allocating blocks from this copy. Of course, this gives reasonable results only if the allocation scheme that pstat uses on the copy parallels the scheme used by the kernel on the original. Pstat parallels the kernel allocation scheme because somebody copied most of /sys/sys/subr_rmap.c (where the kernel allocation code is defined) into pstat.c, then massaged it to work in that context. The slip in transcribing the algorithm is not my main problem with the way it was done; the slip can be fixed. What bothers me more is that the kernel allocation scheme might change, and pstat won't track those changes. [A suggested change to subr_rmap.c is included in this report, as if to prove that changes are not inevitable.] The "right" way to fix the problem is to make some simple changes to subr_rmap.c and pstat.c that allow the latter to #include the former, making it much easier to track changes. [Pstat can't just load the subr_rmap.o file, because it is compiled with KERNEL defined.] Fortunately, /sys/sys/subr_rmap.c is a nicely written piece of software, so the changes to both routines are relatively minor. First, the changes to subr_rmap.c. We bracket the #includes with #ifdef KERNEL so we won't get multiple definitions when subr_rmap.c is included by pstat.c. Then we add a comment pointing back to pstat, to warn people about the dependency. + #ifdef KERNEL #include "../h/param.h" #include "../h/systm.h" #include "../h/map.h" #include "../h/dir.h" #include "../h/user.h" #include "../h/proc.h" #include "../h/text.h" #include "../h/kernel.h" + #endif KERNEL ... * N.B.: the current implementation uses a dense array and does * not admit the value ``0'' as a legal address, since that is used * as a delimiter. + * + * This routine is included in pstat.c to check on swap allocation. + * If changes are made here, pstat should be remade and checked + * for compatibility. */ Here comes the "optional" change to subr_rmap.c. What's going on is that blocks of size dmmax should be allocated on dmmax boundaries when dealing with swap space and multiple swap devices. Variable "first" identifies how many bytes must be skipped to get to a dmmax boundary. If we start out on a dmmax boundary, which is not unlikely, since we will often allocate dmmax-sized chunks starting at dmmax boundaries, first gets set to dmmax, not 0. The effect is to leave a chunk of size dmmax that must be inserted into the allocation list, where it will be allocated the next time a request for dmmax bytes comes in (and force the list to collapse.) The space isn't "wasted", but the effort inserting it into the list and collapsing it out is wasted, and there may be undesirable global effects because blocks get allocated 2,1,4,3,6,5... instead of 1,2,3,4,5,6. Anyway, I put in a simple check for the [not-so-]special case, and our swap disks haven't exploded, so it works. /* * Search for a piece of the resource map which has enough * free space to accomodate the request. */ for (bp = ep; bp->m_size; bp++) { if (bp->m_size >= size) { /* * If allocating from swapmap, * then have to respect interleaving * boundaries. */ if (mp == swapmap && nswdev > 1 && #ifndef NEW - (first = dmmax - bp->m_addr%dmmax) < bp->m_size) { #else NEW + (first = dmmax - bp->m_addr%dmmax) != dmmax && + first < bp->m_size) { #endif NEW if (bp->m_size - first < size) continue; addr = bp->m_addr + first; rest = bp->m_size - first - size; bp->m_size = first; if (rest) rmfree(swapmap, rest, addr+size); return (addr); } The #ifdef that follows logically removes a piece of code that everyone seems to agree doesn't belong there to begin with. done: + #ifdef KERNEL /* * THIS IS RIDICULOUS... IT DOESN'T BELONG HERE! */ if ((mp == kernelmap) && kmapwnt) { kmapwnt = 0; wakeup((caddr_t)kernelmap); } + #endif KERNEL return; badrmfree: panic("bad rmfree"); } That's it for subr_rmap.c. Since #ifdef KERNEL is true when building the kernel, the non-optional changes are harmless there, and the optional one looks reasonable, too. With these changes in place, pstat.c can be fixed "properly" as follows. The first change makes variable "swapmap" external, the way subr_rmap expects to reference it. The second change is the quick fix from above. int dmmin, dmmax, nswdev; + struct map *swapmap; doswap() { struct proc *proc; int nproc; struct text *xtext; int ntext; - /* struct map *swapmap; */ int nswapmap; register struct proc *pp; int nswap, used, tused, free, waste; ... read(fc, swapmap, nswapmap * sizeof (struct map)); swapmap->m_name = "swap"; + swapmap->m_limit = (struct mapent *)(swapmap + nswapmap); nswap = getw(nl[SNSWAP].n_value); dmmin = getw(nl[SDMMIN].n_value); You can then take the code starting with: /* * Allocate 'size' units from the given * map. Return the base of the allocated space. * In a map, the addresses are increasing and the * list is terminated by a 0 size. * * Algorithm is first-fit. * and replace it and everything that follows it (~175 lines) with: #include </sys/sys/subr_rmap.c> panic(msg) char *msg; { printf("Bailing out with panic message `%s'\n", msg); exit(1); } to import the "real kernel allocation code" and bind up the dangling reference to panic(). Postscript: This was a very long posting to change a dozen lines of code. I hate long items, but I also get nervous about making changes from the net when I don't understand what they do. So I have attempted to explain what I thought was going on and why I changed things. I'd like to see more "tutorial" items about 4.2, and there are obviously a lot of people on the net that know pieces of 4.2 inside-out. Any good ideas how we can get that knowledge onto the net? John P. Linderman Software Sanitation Engineer allegra!jpl