[net.unix-wizards] misleading load average

johnsson@decwrl.UUCP (Richard Johnsson) (10/23/85)

I have a VAX 785 running Ultrix 1.1 which has been reporting a load average
just over 3 all night although there is apparently nothing going on (ps
shows nothing running except itself; mon reports no processes in the run
queues and over 90% idle time).  I believe I have also seen this on 4.2
systems here.

Can anyone explain or provide a fix?
-- 
	Richard Johnsson, DEC Western Software Lab, Palo Alto, CA
	UUCP:  {decvax,ucbvax,ihnp4}!decwrl!johnsson
	ARPA:  johnsson@decwrl.dec.com     DEC ENet: rhea::johnsson

root%bostonu.csnet@CSNET-RELAY.ARPA (BostonU SysMgr) (10/24/85)

>From: johnsson@decwrl.UUCP (Richard Johnsson)
>
>I have a VAX 785 running Ultrix 1.1 which has been reporting a load average
>just over 3 all night although there is apparently nothing going on (ps
>shows nothing running except itself; mon reports no processes in the run
>queues and over 90% idle time).  I believe I have also seen this on 4.2
>systems here.
>
>Can anyone explain or provide a fix?

It is hard to say without more information, but here is what to check.

First, the load average is basically an indication of how many jobs are
in the run queue. There are only two possible explanations for what you
see (that I can tell):

	a) something is corrupted in your kernel (not likely)
	b) ps is lying to you because things are changing too fast

I like (b). My first guess is a bad tty line with a getty dying and
respawning at a high rate, or some similar problem. Try 'iostat' to see
if any I/O is going on, and possibly 'pstat' to track down where it is
coming from. Try this:

ps augx > foo1;sleep 5;ps augx > foo2;diff foo1 foo2

If nothing is really going on, the diff should be short. Watch the TIME
column and the PID column carefully: perhaps you are missing the fact
that a process with the same name is showing up repeatedly with a
different PID.
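
The "same name, different PID" check can be automated rather than read
off a diff by eye. The following is only a sketch: the function name
`respawned` and the two-column "PID COMMAND" snapshot format are
assumptions made for illustration, not something the ps of that era
produces directly.

```sh
#!/bin/sh
# respawned BEFORE AFTER
# Hypothetical helper: each file holds one "PID COMMAND" pair per line.
# Prints "COMMAND PID" for every command name that was present in BEFORE
# and shows up in AFTER under a PID that BEFORE did not have.
respawned() {
    awk '
        # First file: remember every (name, pid) pair and every name.
        NR == FNR { seen[$2 " " $1] = 1; name[$2] = 1; next }
        # Second file: report known names that now carry a new PID.
        {
            key = $2 " " $1
            if (($2 in name) && !(key in seen))
                print $2, $1
        }
    ' "$1" "$2"
}
```

On a system whose ps accepts the POSIX -e and -o options (an assumption;
'ps augx' as used above does not take them), the snapshots could be
collected like this:

	ps -e -o pid= -o comm= > snap1; sleep 5
	ps -e -o pid= -o comm= > snap2; respawned snap1 snap2

ps itself will always be reported, since each run of ps is a new process.
Any other name that keeps appearing is a candidate for the respawning
process described above.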

Just a guess from the outside. I can believe that some of the meters are
inaccurate enough to miss rapidly created/dying processes.

	-Barry Shein, Boston University

bilbo.jbrown@ucla-locus.ARPA (Jordan Brown) (10/26/85)

The load average is calculated by examining the run queue every second,
or something like that.  This is done _after_ alarms go off, but _before_
the programs get to run, so a program which sleeps for a second, does a
little, sleeps for a second, etc., will raise the load average without
affecting actual system load significantly.  "tail -f" is one example of
this; there are other programs which look for logins and logouts which do
similar things.
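
A small simulation shows how this sampling effect could produce a steady
load of about 3 on a nearly idle machine. It is a sketch under stated
assumptions, not the kernel's actual code: the one-second sampling
interval, the one-minute exponential smoothing, and the function name
`sampled_load` are all chosen for illustration.

```sh
#!/bin/sh
# sampled_load NPROCS BUSY SAMPLES
# Models NPROCS processes that are each woken just before the sampler
# looks at the run queue, run for fraction BUSY of the interval, and then
# go back to sleep. Prints the smoothed load next to the real CPU use.
sampled_load() {
    awk -v n="$1" -v busy="$2" -v samples="$3" 'BEGIN {
        interval = 1                  # seconds between samples (assumed)
        decay = exp(-interval / 60)   # one-minute smoothing (assumed)
        load = 0
        for (i = 0; i < samples; i++) {
            runnable = n              # every sleeper has just been woken
            load = load * decay + runnable * (1 - decay)
        }
        printf "reported load %.2f, cpu busy %.0f%%\n", load, 100 * n * busy
    }'
}
```

Calling `sampled_load 3 0.01 600` models three such programs, each using
1% of the CPU, watched for ten minutes. The reported load settles near 3
while the machine stays almost entirely idle, which is the pattern
described in the original question.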