[net.bugs.4bsd] 4.2 /etc/pstat -s is broken

jpl@allegra.UUCP (John P. Linderman) (06/26/84)

Synopis:
    /etc/pstat -s    does not report accurately on the swapping system

Symptoms:
    When /etc/pstat is invoked with the -s option, it is inclined to
    spew out a number of bogus "swap: rmap ovflo" messages and then
    give the wrong results.

Repeat by:
    Try it.  Go ahead.  Make your day.  My output looked like this:

    47566 used (2002 text), 152562 free, 18178 wasted, 2112 missing
    avail: swap: rmap ovflo, lost [200704,201024)
    swap: rmap ovflo, lost [59392,200000)
    4*1024 3*512 10*256 17*128

    Sometimes I'd get many more ovflo messages.

Quick fix:
    Insert the line marked with + between the other two lines from
    /usr/src/etc/pstat.c.

  	swapmap->m_name = "swap";
+ 	swapmap->m_limit = (struct mapent *)(swapmap + nswapmap);
  	nswap = getw(nl[SNSWAP].n_value);

Clean fix:
    Pstat determines the amount of available swap space by copying the
    the swap map out of the kernel, and then allocating blocks from
    this copy.  Of course, this gives reasonable results only if the
    allocation scheme that pstat uses on the copy parallels the scheme
    used by the kernel on the original.

    Pstat parallels the kernel allocation scheme because somebody
    copied most of /sys/sys/subr_rmap.c (where the kernel allocation
    code is defined) into pstat.c, then massaged it to work in that
    context.  The slip in transcribing the algorithm is not my main
    problem with the way it was done;  the slip can be fixed.  What
    bothers me more is that the kernel allocation scheme might change,
    and pstat won't track those changes.  [A suggested change to
    subr_rmap.c is included in this report, as if to prove that
    changes are not inevitable.]  The "right" way to fix the problem
    is to make some simple changes to subr_rmap.c and pstat.c that
    allow the latter to #include the former, making it much easier
    to track changes.  [Pstat can't just load the subr_rmap.o file,
    because it is compiled with KERNEL defined.]

    Fortunately, /sys/sys/subr_rmap.c is a nicely written piece of
    software, so the changes to both routines are relatively minor.
    First, the changes to subr_rmap.c.  We bracket the #includes
    with #ifdef KERNEL so we won't get multiple definitions when
    subr_rmap.c is included by pstat.c.  Then we add a comment
    pointing back to pstat, to warn people about the dependency.

    +   #ifdef	KERNEL
	#include "../h/param.h"
	#include "../h/systm.h"
	#include "../h/map.h"
	#include "../h/dir.h"
	#include "../h/user.h"
	#include "../h/proc.h"
	#include "../h/text.h"
	#include "../h/kernel.h"
    +   #endif	KERNEL

	...

	 * N.B.: the current implementation uses a dense array and does
	 * not admit the value ``0'' as a legal address, since that is used
	 * as a delimiter.
    +    *
    +    * This routine is included in pstat.c to check on swap allocation.
    +    * If changes are made here, pstat should be remade and checked
    +    * for compatibility.
	 */

    Here comes the "optional" change to subr_rmap.c.  What's going on
    is that blocks of size dmmax should be allocated on dmmax boundaries
    when dealing with swap space and multiple swap devices.  Variable
    "first" identifies how many bytes must be skipped to get to a
    dmmax boundary.  If we start out on a dmmax boundary, which is not
    unlikely, since we will often allocate dmmax-sized chunks starting
    at dmmax boundaries, first gets set to dmmax, not 0.  The effect is
    to leave a chunk of size dmmax that must be inserted into the
    allocation list, where it will be allocated the next time a request
    for dmmax bytes comes in (and force the list to collapse.)  The space
    isn't "wasted", but the effort inserting it into the list and
    collapsing it out is wasted, and there may be undesirable global
    effects because blocks get allocated 2,1,4,3,6,5... instead of
    1,2,3,4,5,6.  Anyway, I put in a simple check for the [not-so-]special
    case, and our swap disks haven't exploded, so it works.

		/*
		 * Search for a piece of the resource map which has enough
		 * free space to accomodate the request.
		 */
		for (bp = ep; bp->m_size; bp++) {
			if (bp->m_size >= size) {
				/*
				 * If allocating from swapmap,
				 * then have to respect interleaving
				 * boundaries.
				 */
				if (mp == swapmap && nswdev > 1 &&
    #ifndef NEW
    -				(first = dmmax - bp->m_addr%dmmax) < bp->m_size) {
    #else NEW
    +				(first = dmmax - bp->m_addr%dmmax) != dmmax &&
    +				first < bp->m_size) {
    #endif NEW
					if (bp->m_size - first < size)
						continue;
					addr = bp->m_addr + first;
					rest = bp->m_size - first - size;
					bp->m_size = first;
					if (rest)
						rmfree(swapmap, rest, addr+size);
					return (addr);
				}

    The #ifdef that follows logically removes a piece of code that
    everyone seems to agree doesn't belong there to begin with.

	done:
    +   #ifdef	KERNEL
		/*
		 * THIS IS RIDICULOUS... IT DOESN'T BELONG HERE!
		 */
		if ((mp == kernelmap) && kmapwnt) {
			kmapwnt = 0;
			wakeup((caddr_t)kernelmap);
		}
    +   #endif	KERNEL
		return;
	badrmfree:
		panic("bad rmfree");
	}

    That's it for subr_rmap.c.  Since #ifdef KERNEL is true when building
    the kernel, the non-optional changes are harmless there, and the
    optional one looks reasonable, too.  With these changes in place,
    pstat.c can be fixed "properly" as follows.  The first change makes
    variable "swapmap" external, the way subr_rmap expects to reference it.
    The second change is the quick fix from above.

	int dmmin, dmmax, nswdev;
    +   struct map *swapmap;

	doswap()
	{
		struct proc *proc;
		int nproc;
		struct text *xtext;
		int ntext;
    -	    /*  struct map *swapmap; */
		int nswapmap;
		register struct proc *pp;
		int nswap, used, tused, free, waste;

	...

		read(fc, swapmap, nswapmap * sizeof (struct map));
		swapmap->m_name = "swap";
    +		swapmap->m_limit = (struct mapent *)(swapmap + nswapmap);
		nswap = getw(nl[SNSWAP].n_value);
		dmmin = getw(nl[SDMMIN].n_value);

    You can then take the code starting with:

	/*
	 * Allocate 'size' units from the given
	 * map. Return the base of the allocated space.
	 * In a map, the addresses are increasing and the
	 * list is terminated by a 0 size.
	 *
	 * Algorithm is first-fit.
	 *

    and replace it and everything that follows it (~175 lines) with:

	#include </sys/sys/subr_rmap.c>

	panic(msg)
		char *msg;
	{
		printf("Bailing out with panic message `%s'\n", msg);
		exit(1);
	}

    to import the "real kernel allocation code" and bind up the
    dangling reference to panic().

Postscript:
    This was a very long posting to change a dozen lines of code.
    I hate long items, but I also get nervous about making changes
    from the net when I don't understand what they do.  So I have
    attempted to explain what I thought was going on and why I changed
    things.  I'd like to see more "tutorial" items about 4.2,
    and there are obviously a lot of people on the net that know
    pieces of 4.2 inside-out.  Any good ideas how we can get that
    knowledge onto the net?

John P. Linderman  Software Sanitation Engineer  allegra!jpl