[comp.unix.wizards] Load Average graph pattern

shani@GENIUS.TAU.AC.IL (Oren Shani) (05/30/91)

Can anyone tell me why the load average graph shows definite patterns of
exponential decay?

It seems that most (by far) of the points of the LA graph lie on curves
of the form c*exp(-a*(t-t0))+b, in which 'a' is some cosmic constant (I sampled
the LA on several computers, over several time periods, and got the same 'a'
every time). Is this due to some policy of the system's time sharing mechanism,
or did I discover a new cosmic law?

O.S.

-- 
    ---  ---  Oren Shani (shani@genius.tau.ac.il)
   /  / /     Faculty of Engineering, Tel Aviv univ.
  /  /  ---   Israel
 /  /     /

mycroft@goldman.gnu.ai.mit.edu (Charles Hannum) (05/31/91)

In article <2155@ccsg.tau.ac.il> shani@GENIUS.TAU.AC.IL (Oren Shani) writes:

   Can anyone tell me why the load average graph shows definite
   patterns of exponential decay?

   It seems that most (by far) of the points of the LA graph lie
   on curves of the form c*exp(-a*(t-t0))+b, in which 'a' is some cosmic
   constant (I sampled the LA on several computers, over several time
   periods, and got the same 'a' every time). Is this due to some policy
   of the system's time sharing mechanism, or did I discover a new cosmic law?

I noticed this a long time ago, while running xload.  For some reason,
every 30 or 60 seconds, the load will suddenly jump and slowly decay
on an otherwise idle machine.

Note that the load average that xload displays is an average over the
past few minutes -- which explains the slow decay.  But why the sudden
jump?  I've always attributed it to 'update' ('syncer' on some
systems), and ignored it.

shani@GENIUS.TAU.AC.IL (Oren Shani) (06/05/91)

I think I already know what the reason is. It's simply because averaging a step
function (in this case, the length of the ready queue) over a constant period
gives a seemingly exponential pattern. In fact, it follows directly from a
geometric series... This is quite easy to show...

Geez, I didn't think I would get so much response to that :-)
-- 
    ---  ---  Oren Shani (shani@genius.tau.ac.il)
   /  / /     Faculty of Engineering, Tel Aviv univ.
  /  /  ---   Israel
 /  /     /
 --- * --- *  "And that's the last time I trust a woman!"

torek@elf.ee.lbl.gov (Chris Torek) (06/09/91)

In article <2155@ccsg.tau.ac.il> shani@GENIUS.TAU.AC.IL (Oren Shani) asks:
>Can anyone tell me why the load average graph shows definite patterns
>of exponential decay?  It seems that most (by far) of the points
>of the LA graph lie on curves of the form c*exp(-a*(t-t0))+b, in which
>'a' is some cosmic constant ...

Surprisingly, I have seen no answers to this at all, when the reason is
trivial.  This exponential decay is there because it was designed to be
there.  The load average is computed by iterations of the formula:

	average(t)  =  average(t-1) * exp(-1/k)  +  n * (1 - exp(-1/k))

where `t' is time, `n' is the instantaneous `number of runnable jobs',
and `k' is the number of discrete t's that occur per `load average time'.
Since the load average sample interval is 5 seconds, the one-minute
average has k=12 (12*5 = 60 seconds), the 5-minute average has k=60
(60*5 = 300 s = 5 min), and the 15-minute average has k=180 (180*5 =
900 s = 15 min).  When n is zero, as it typically is on workstations,
this reduces to

	average(t)  =  average(0) * exp(-t/k)

i.e., exponential decay.
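
For concreteness, here is a minimal user-level sketch of that iteration
in C.  The kernel keeps the exponential factors as precomputed fixed-point
constants, but the arithmetic is the same; the names and numbers below are
invented purely to demonstrate the decay on an idle machine.

#include <math.h>
#include <stdio.h>

#define SAMPLE_INTERVAL 5                /* seconds between samples */

/* decay constants k for the 1-, 5- and 15-minute averages */
static const double k[3] = { 12.0, 60.0, 180.0 };

static double avg[3];                    /* the three load averages */

/* one iteration of  avg = avg*exp(-1/k) + n*(1 - exp(-1/k))  */
void
update_load(int nrun)                    /* nrun = runnable jobs right now */
{
    int i;

    for (i = 0; i < 3; i++) {
        double e = exp(-1.0 / k[i]);

        avg[i] = avg[i] * e + nrun * (1.0 - e);
    }
}

int
main(void)
{
    int t;

    avg[0] = avg[1] = avg[2] = 3.0;      /* pretend the machine was busy  */
    for (t = 1; t <= 24; t++) {          /* two minutes of idle samples   */
        update_load(0);                  /* n = 0: pure exponential decay */
        printf("t=%3ds  1min=%.3f  5min=%.3f  15min=%.3f\n",
            t * SAMPLE_INTERVAL, avg[0], avg[1], avg[2]);
    }
    return 0;
}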

The reason this is consistent across many systems is that it was done
at Berkeley for 4BSD and then copied into those systems.

In article <MYCROFT.91May31031208@goldman.gnu.ai.mit.edu>
mycroft@goldman.gnu.ai.mit.edu (Charles Hannum) writes:
>I noticed this a long time ago, while running xload.  For some reason,
>every 30 or 60 seconds, the load will suddenly jump and slowly decay
>on an otherwise idle machine. ... I've always attributed [these spikes]
>to 'update' ('syncer' on some systems), and ignored it.

This is almost certainly the correct explanation (/etc/update is counted
as runnable while waiting for the sync() system call, which it typically
issues once every 30 seconds).
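
(For reference, a daemon like /etc/update amounts to little more than the
loop below.  This is a sketch, not the actual source.)

#include <unistd.h>

/*
 * Flush the buffer cache every 30 seconds, forever.  While the process
 * is waiting inside sync() it is counted as runnable, which is what
 * puts a blip on the load average of an otherwise idle machine.
 */
int
main(void)
{
    for (;;) {
        sync();                 /* schedule all dirty buffers for writing */
        sleep(30);
    }
    /* NOTREACHED */
}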

In article <MEISSNER.91May31111801@curley.osf.org> meissner@osf.org
(Michael Meissner) writes:
>Another thing could be the activity to run the various xclock
>programs, and such.  I would imagine that on timesharing systems with
>lots of xterms, this could be significant.

xclock is particularly unlikely to add to the load average (although it
does add to the machine load!) because of a design misfeature in most
Unix systems.  The problem is that the system metering---the code that
computes the load average, cpu utilization for each process, and so
on---is run off the same clock as the scheduler.  Thus, at every clock
tick (or every n'th tick), we first see what is going on---nothing---
then we schedule the clock program, which runs for a short while and
goes back to sleep.

In particular, given the usage-sensitive CPU scheduling found in most
BSD-derived schedulers (which is to say every SunOS system through at
least SunOS 3.5, and probably 4.x as well), it is possible for a
program to use the clock to drive itself just after it is sampled as
sleeping, work until just before the next sample, and then go to sleep
waiting for the next clock tick.  By doing this it appears to use no
CPU time, hence gets fairly high priority (the kernel believes that it
has not got its fair share of CPU yet) and runs immediately on the next
clock tick, and thus is asleep again by the time the clock ticks
again.  This perpetuates the cycle.  Such a process can starve out
other processes.
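
A rough illustration of such a program follows.  It assumes a 100 Hz
scheduler clock; TICK_USEC and WORK_USEC are made-up numbers, and the
real behaviour depends on how the interval timer lines up with the
metering tick.

#include <signal.h>
#include <sys/time.h>
#include <unistd.h>

#define TICK_USEC   10000L      /* assume a 100 Hz scheduler clock  */
#define WORK_USEC    8000L      /* burn ~80% of each tick, then nap */

static void wakeup(int sig) { (void)sig; }   /* SIGALRM just ends pause() */

/*
 * Wake on each clock tick (just after the metering code has sampled us
 * as asleep), compute for most of the tick, then go back to sleep
 * before the next sample.  To the CPU accounting we look idle, so the
 * scheduler keeps handing us a high priority.
 */
int
main(void)
{
    struct itimerval it;
    struct timeval start, now;

    signal(SIGALRM, wakeup);
    it.it_interval.tv_sec = 0;
    it.it_interval.tv_usec = TICK_USEC;          /* fire once per tick */
    it.it_value = it.it_interval;
    setitimer(ITIMER_REAL, &it, (struct itimerval *)0);

    for (;;) {
        pause();                        /* asleep when the tick samples us */
        gettimeofday(&start, (struct timezone *)0);
        do {                            /* busy-work for most of the tick  */
            gettimeofday(&now, (struct timezone *)0);
        } while ((now.tv_sec - start.tv_sec) * 1000000L
            + (now.tv_usec - start.tv_usec) < WORK_USEC);
    }
}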

The solution is simple but requires relatively precise clocks.
Fortunately such clocks exist on Sun SparcStations (unlike Sun-3s).
The 4BSD Sparc kernel will use them, once I get around to fixing that
part of the system.  (First I have to get running multi-user, now that
single user boots work, and write such minor [ahem] things as a frame
buffer driver and get enough going to make X run....  Sorry, Masataka,
but I intend to run X windows on *my* workstation, at least until
something better comes along. :-) )
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

shani@GENIUS.TAU.AC.IL (Oren Shani) (06/11/91)

In article <MEISSNER.91May31111801@curley.osf.org>, meissner@osf.org (Michael Meissner) writes:

|> Another thing could be the activity to run the various xclock
|> programs, and such.  I would imagine that on timesharing systems with
|> lots of xterms, this could be significant.

Yeah, I think so too. In fact, this is why I started analyzing the LA graph -
I intend to use the graph's "roughness" as a measure of the volume of activity on
the computer, i.e. the number of little vi's and xterm's, etc., running around.


|> --
|> Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
|> Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142
|> 
|> You are in a twisty little passage of standards, all conflicting.
          --- Geez! Someone still remembers ol' Zork! :-)

-- 
    __    __  Oren Shani (shani@genius.tau.ac.il) 
   /  /  /    Faculty of Engineering, Tel Aviv university
  /  /   --   Israel
 /__/ . __/ . "Hold your temper" -- The caterpillar to Alice

dhesi@cirrus.com (Rahul Dhesi) (06/12/91)

In <14081@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:

>The load average is computed by iterations of the formula:
>	average(t)  =  average(t-1) * exp(-1/k)  +  n * (1 - exp(-1/k))

Just to give credit where it is apparently due, we should note that the
concept of an exponentially-decaying measure of the number of jobs in
the ready queue was probably invented in the TENEX operating system,
which ran on DECsystem-10 machines.  BSD seems to have borrowed the
idea from there and given it a life of its own.
-- 
Rahul Dhesi <dhesi@cirrus.COM>
UUCP:  oliveb!cirrusl!dhesi

stodola@orion.fccc.edu (Robert K. Stodola) (06/12/91)

In article <14081@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:

>In article <MEISSNER.91May31111801@curley.osf.org> meissner@osf.org
>(Michael Meissner) writes:
>>Another thing could be the activity to run the various xclock
>>programs, and such.  I would imagine that on timesharing systems with
>>lots of xterms, this could be significant.

	[Much very interesting text deleted here.  Thanks Chris!]

>The solution is simple but requires relatively precise clocks. ...

One of my associates and I did a study of this a number of years ago (actually
it was with a PDP-11/70 running IAS).  We found that there was substantial
clock-synchronized usage on the system.  The solution we found didn't require
very precise clocks at all: simply one whose rate was relatively prime to
the system clock's.  We got very good results using a clock at 5x.y Hz (I don't
remember the exact speed, but it was a strange one in the 50's) on a system
driven off a 60Hz clock.  This was adequate to desynchronize the sampling
rate from the system rhythms.  Because it was a slow clock, it didn't add
much load to the system, but it did give an adequate statistical picture of
individual usage and load.
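
A user-level sampler in that spirit might look something like the sketch
below.  The 53 Hz rate is just an example (any rate relatively prime to
60 Hz does the trick), and the actual measurement is left as a comment.

#include <signal.h>
#include <sys/time.h>
#include <unistd.h>

#define SAMPLE_HZ 53    /* made-up rate; what matters is gcd(53, 60) == 1 */

static volatile sig_atomic_t go;

static void tick(int sig) { (void)sig; go = 1; }

/*
 * Because the sampling rate is relatively prime to the 60 Hz system
 * clock, the samples drift through every phase of the system's rhythm
 * instead of always landing at the same point in it.
 */
int
main(void)
{
    struct itimerval it;
    long samples = 0;

    signal(SIGALRM, tick);
    it.it_interval.tv_sec = 0;
    it.it_interval.tv_usec = 1000000L / SAMPLE_HZ;
    it.it_value = it.it_interval;
    setitimer(ITIMER_REAL, &it, (struct itimerval *)0);

    for (;;) {
        pause();
        if (!go)
            continue;
        go = 0;
        samples++;
        /* ... record the run-queue length, current process, etc. ... */
    }
}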