[comp.sys.sequent] load average

jjb@sequent.UUCP (Jeff Berkowitz) (04/28/89)

In article <67727@pyramid.pyramid.com>,
  csg@pyramid.pyramid.com (Carl S. Gutekunst) writes:
>
>There's a reason for that. Dynix divides the load average by the number of
>CPUs you have. If uptime(1) displays 1.6, and you have four CPUs, then the
>load average is really 6.4.
>

"really"? :-)

Very early in the history of DYNIX, Sequent experimented with both
alternative implementations of load average.  The existing one was
selected because it more accurately described the behavior that
users perceived.

In addition, some daemons refuse to run if the "load average"
is very high.  Since customers can also write code that checks
this, the computed load average should reflect reality; each
processor can simultaneously run a program.

How does a four processor 9845 handle load average?  I presume
from your comment that Pyramid does not divide by the number of
processors?  Does this mean performance does not scale linearly?
-- 
Jeff Berkowitz N6QOM			uunet!sequent!jjb
Sequent Computer Systems		Custom Systems Group

steve@polyslo.CalPoly.EDU (Steve DeJarnett) (04/28/89)

In article <15248@sequent.UUCP> jjb@sequent.UUCP (Jeff Berkowitz) writes:
>In article <67727@pyramid.pyramid.com>, csg@pyramid.pyramid.com (Carl S. Gutekunst) writes:
>>There's a reason for that. Dynix divides the load average by the number of
>>CPUs you have. If uptime(1) displays 1.6, and you have four CPUs, then the
>>load average is really 6.4.
>"really"? :-)
>
>Very early in the history of DYNIX, Sequent experimented with both
>alternative implementations of load average.  The existing one was
>selected because it more accurately described the behavior that
>users perceived.

	Load average was (long ago) defined to be "the average number of jobs
in the Run queue over the last 1,5,15 minutes".  To quote directly from the 
Dynix Version 3.0.4 man page for 'w':

	The load average numbers give the number of jobs
     in the run queue averaged over 1, 5 and 15 minutes.
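
The man page says "averaged", but the BSD kernels DYNIX descends from keep an exponentially damped moving average rather than a plain 1/5/15-minute window. The sketch below mirrors the BSD loadav() scheme: a sample of the number of runnable jobs every 5 seconds, decayed with exp(-5/60), exp(-5/300) and exp(-5/900). It assumes DYNIX kept the same constants, which is not confirmed here, and it leaves out any division by CPU count. The feed() helper is illustrative only.

```python
import math

# BSD-style sampling: one run-queue sample every 5 seconds, folded into three
# exponentially damped averages with 1-, 5- and 15-minute time constants.
SAMPLE_SECONDS = 5
CEXP = [math.exp(-SAMPLE_SECONDS / (60.0 * minutes)) for minutes in (1, 5, 15)]


def loadav(avg, nrun):
    """Fold one sample (number of runnable jobs) into the three averages."""
    return [c * a + nrun * (1.0 - c) for c, a in zip(CEXP, avg)]


def feed(samples, avg=(0.0, 0.0, 0.0)):
    """Hypothetical helper: apply loadav() to a sequence of samples."""
    avg = list(avg)
    for nrun in samples:
        avg = loadav(avg, nrun)
    return avg


# One hour (720 samples) with exactly 3 runnable jobs, starting from idle.
avg = feed([3] * 720)
# avg[0] is ~3.0; avg[2] is 3 * (1 - e**-4), about 2.95: the "15-minute"
# figure still carries some weight from the idle period an hour ago.
```

So "over 15 minutes" is a time constant, not a hard window: older samples fade out rather than drop off.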

Are there multiple run queues on a Balance 8000??  I've never studied the
implementation of Dynix (lack of source makes it more difficult also :-), but
I'd suspect there's one run queue, and processors grab the next job eligible
when they're free.  Am I correct here??

	If so, then the notion of dividing the # of jobs in the run queue
by the number of processors to obtain load average is in conflict with what
the manuals say you're doing.  Of course, as has been pointed out, load averages
are merely subjective measurements of your system "response".  As we all know,
system "response" depends on a great number of things.  

	So, the question boils down to this:  Do you want to generate load 
averages "like the rest of the world" that reflect how many jobs are in your
run queue, and then have some added caveat of "but we have N processors to run
these M jobs on, so the effective load (or some such term as that) is really
M/N", or do you generate load averages the way you currently do with "load 
average is dependent on the number of processors AND the number of processes
(and, oh, therefore our load averages MAY or MAY NOT compare directly with 
those of machine X)".
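
The two conventions differ only by a factor of the CPU count, so converting between them is simple arithmetic. A minimal sketch, assuming the per-CPU figure is exactly the run-queue average divided by the number of CPUs, as Carl describes it; the function names are made up for illustration.

```python
def per_cpu_load(run_queue_avg, ncpus):
    """Figure as Carl describes DYNIX reporting it: run-queue average / CPUs."""
    return run_queue_avg / ncpus


def run_queue_load(displayed, ncpus):
    """Undo the division to recover the raw run-queue average (M)."""
    return displayed * ncpus


# Carl's example: uptime shows 1.6 on a four-CPU machine.
raw = run_queue_load(1.6, 4)            # about 6.4 jobs in the run queue
# Steve's comparison: a raw 7.5 on a 10-CPU machine would be shown as 0.75,
# so the two machines' numbers only compare after converting one of them.
sequent_shown = per_cpu_load(7.5, 10)
```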

	Personally, I prefer the former.  This gives you a way of comparing
Apples to Apples (figuratively, not literally).  If there's a load average of
7.5 on my Pyramid, and a load average of 7.5 on my Sequent, I would know that 
they are measuring the same thing.  Then if I log into my Sequent and find that
response time is faster (or slower), I would have a means of direct comparison
that is quantifiable.

	I realize that in the end, this whole thing boils down to a religious
issue over what you believe is "right".  I personally (if it wasn't already
apparent) believe that "number of jobs in the run queue" is the appropriate
measure.  That's just me, though.

	One last question.  When Sequent computes their load average, do they
take into account the possibility that some of the processors might not have
been available during the last 1,5,15 minutes??  If I have 2 processors running
user processes, but the Sequent is basing its calculation of load average on
10 processors (or how many there actually are in my system), then a load 
average based on that premise is not a truly representative number.

>How does a four processor 9845 handle load average?  I presume
>from your comment that Pyramid does not divide by the number of
>processors?  Does this mean performance does not scale linearly?

	I don't think they do on a 2 processor 98x, so I doubt things are that
different on a 9845 (our machine's kernel actually believes that it's a 9810,
but that's a totally different story).  Load average on a Pyramid (correct me
if I'm wrong, Carl) is "Average # of jobs in the run queue over the last 1, 5,
and 15 minutes".  The fact that you have 4 processors there to keep things 
going makes it all the better.

	One other question springs to mind here (sorry this is getting very 
long):  Given more processors to run jobs, won't the jobs that are there finish
(hopefully) sooner than they would on a system with fewer of the same 
processors, and therefore result in there being fewer jobs in the run queue at
any given moment in time overall??  This would seem to be another argument (if
it is indeed true) against Sequent's method of load average computation.
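
Steve's point can be checked with a toy model. The sketch below is hypothetical and idealized (CPU-bound jobs that never block, one CPU per job, FIFO service, no new arrivals) and is not how DYNIX schedules; it only shows that the raw run-queue count already falls when CPUs are added.

```python
def average_runnable(ncpus, njobs, work_per_job, ticks):
    """Mean number of runnable jobs seen over `ticks` clock ticks.

    Starts with `njobs` CPU-bound jobs that never block; each needs
    `work_per_job` ticks of CPU and can use only one CPU at a time.
    """
    remaining = [work_per_job] * njobs
    total = 0
    for _ in range(ticks):
        total += len(remaining)                  # sample the run queue
        for i in range(min(ncpus, len(remaining))):
            remaining[i] -= 1                    # one tick of CPU per job
        remaining = [r for r in remaining if r > 0]
    return total / ticks


one_cpu = average_runnable(1, 8, 4, 40)    # 3.6
four_cpu = average_runnable(4, 8, 4, 40)   # 1.2
```

With the same eight jobs, four CPUs already cut the raw average from 3.6 to 1.2; dividing by the CPU count on top of that would report 0.3.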

-------------------------------------------------------------------------------
| Steve DeJarnett            | Smart Mailers -> steve@polyslo.CalPoly.EDU     |
| Computer Systems Lab       | Dumb Mailers  -> ..!ucbvax!voder!polyslo!steve |
| Cal Poly State Univ.       |------------------------------------------------|
| San Luis Obispo, CA  93407 | BITNET = Because Idiots Type NETwork           |
-------------------------------------------------------------------------------

csg@pyramid.pyramid.com (Carl S. Gutekunst) (04/28/89)

Hi Jeff!

In article <15248@sequent.UUCP> jjb@sequent.UUCP (Jeff Berkowitz) writes:
>How does a four processor 9845 handle load average?

The total number of processes in run state or in non-interruptable sleep.
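
As a sketch of the sample Carl describes: count processes that are runnable or in non-interruptible sleep, with no division by CPU count. The state names and the Proc record are hypothetical, not Pyramid's actual kernel structures.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    # Hypothetical state names, not Pyramid's (or any kernel's) actual ones.
    RUNNABLE = "runnable"          # on a CPU or waiting for one
    DISK_WAIT = "disk-wait"        # non-interruptible sleep, e.g. disk I/O
    SLEEPING = "sleeping"          # interruptible sleep, e.g. terminal read
    STOPPED = "stopped"


@dataclass
class Proc:
    pid: int
    state: State


COUNTED = {State.RUNNABLE, State.DISK_WAIT}


def load_sample(procs):
    """One load-average sample as Carl describes it: no division by CPUs."""
    return sum(1 for p in procs if p.state in COUNTED)


procs = [Proc(1, State.SLEEPING), Proc(42, State.RUNNABLE),
         Proc(43, State.DISK_WAIT), Proc(44, State.RUNNABLE),
         Proc(50, State.STOPPED)]
n = load_sample(procs)   # 3
```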

>In addition, some daemons refuse to run if the "load average" is very high.

Anything besides sendmail?

>Since customers can also write code that checks this, the computed load
>average should reflect reality; each processor can simultaneously run a program.

Any program that decides to alter its behavior based on load average *better*
make that value run-time selectable. Sendmail does.
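
A minimal sketch of a load-sensitive program with a run-time selectable threshold. This is not sendmail's code; the `--max-load` option and its default of 8.0 are made up, and os.getloadavg() is Python's modern Unix wrapper, not something available on these 1989 systems.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Defer work while the machine is busy.")
    # The threshold is a run-time option, not a compiled-in constant:
    # what counts as "busy" depends on the site and on how this kernel
    # reports load (raw run-queue length or divided by CPU count).
    parser.add_argument("--max-load", type=float, default=8.0,
                        help="1-minute load average above which to defer")
    return parser.parse_args(argv)


def should_defer(load1, max_load):
    return load1 > max_load


def main(argv=None):
    args = parse_args(argv)
    load1, _, _ = os.getloadavg()  # Unix only; raises OSError if unavailable
    if should_defer(load1, args.max_load):
        print(f"load {load1:.2f} exceeds {args.max_load}; deferring")
        return 1
    print(f"load {load1:.2f}; running")
    return 0
```

A site would then call, say, main(["--max-load", "12"]) and pick the number to suit its machine, rather than trusting a value chosen by whoever wrote the program.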

I understand that you are trying to make this relatively meaningless number
more useful and intuitive. But given per-process multiprocessing, I don't see
how "more processors" differs from "faster processors." Taken to its logical
extreme, it would seem that every vendor should divide their load average by
their VUPS rating. :-) 

It's up to the system administration to determine what an "acceptable" load
average is. This is going to vary based on the needs of the site, and the type
of machine they are using. If I add more horsepower to my machine, then in my
mind I've increased the allowable load average, regardless of whether I did it
by adding bigger processors (9805 to 9815, or Balance to Symmetry) or by
adding more processors. If the load average is divided by the number of CPUs,
then the calculation is distorted; I end up mentally multiplying the number I
see by a magic factor to turn it back into something I can use. 

On the other hand, there *is* the warm fuzzy of installing more processors
and seeing the load average drop. Me, I'm not real wild about warm fuzzies.
(A warm tribble, perhaps....)

>Does this mean performance does not scale linearly?

How is this relevant to the discussion?

To answer the question, though -- How linear is linear? :-) The Pyramid 9000
is within 5% of linear. I gather that it's not quite as flat as a Balance,
Symmetry, Multimax, or Elxsi; but it seems to be considerably more linear
than a VAX 8800.
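
One common way to put a number on "how linear" is scaling efficiency: measured speedup on N CPUs divided by N. A sketch, assuming "within 5% of linear" means an efficiency of at least 0.95; the throughput figures are hypothetical, not measurements of any machine named here.

```python
def scaling_efficiency(throughput_1, throughput_n, n):
    """Measured speedup on n CPUs as a fraction of the ideal n-fold speedup."""
    return (throughput_n / throughput_1) / n


def within_linear(throughput_1, throughput_n, n, tolerance=0.05):
    """True if the machine is within `tolerance` of perfectly linear scaling."""
    return scaling_efficiency(throughput_1, throughput_n, n) >= 1.0 - tolerance


# Hypothetical figures: 100 jobs/hour on one CPU, 384 jobs/hour on four.
eff = scaling_efficiency(100, 384, 4)   # 0.96
ok = within_linear(100, 384, 4)
```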

<csg>

arosen@hen.ulowell.edu (MFHorn) (04/30/89)

In article <10847@polyslo.CalPoly.EDU> steve@polyslo.CalPoly.EDU (Steve DeJarnett) writes:
> Are there multiple run queues on a Balance 8000??  I've never studied the
> implementation of Dynix (lack of source makes it more difficult also :-), but
> I'd suspect there's one run queue, and processors grab the next job eligible
> when they're free.  Am I correct here??

I believe the BSD kernel maintains something like 30 run queues (Chris Torek
could probably give more accurate information).  The Dynix kernel is a pretty
close clone of the 4.2 kernel, so they probably have just as many.

I do know that Dynix maintains a queue (one per processor?) of jobs that have
been 'affinitied' to a processor.  The scheduler checks this queue before the
'normal' run queues.
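
For concreteness, here is a sketch of the structure Andy describes. 4.3BSD uses 32 run queues (NQS), picking queue priority/4 and tracking non-empty queues in the bitmask whichqs; the per-CPU affinity queues checked first are a hypothetical stand-in for the DYNIX behavior he mentions, not its actual code. Splitting jobs across many queues does not change how many are waiting.

```python
from collections import deque

NQS = 32  # number of run queues in 4.3BSD


class Scheduler:
    def __init__(self, ncpus):
        self.qs = [deque() for _ in range(NQS)]
        self.whichqs = 0  # bit q is set while qs[q] is non-empty
        # Hypothetical DYNIX-style per-processor affinity queues.
        self.affinity = [deque() for _ in range(ncpus)]

    def setrq(self, proc, pri, cpu=None):
        """Queue a runnable process; lower `pri` (0..127) is better."""
        if cpu is not None:
            self.affinity[cpu].append(proc)
            return
        q = pri // 4
        self.qs[q].append(proc)
        self.whichqs |= 1 << q

    def pick(self, cpu):
        """Next process for `cpu`: affinity queue first, then best queue."""
        if self.affinity[cpu]:
            return self.affinity[cpu].popleft()
        if not self.whichqs:
            return None
        q = (self.whichqs & -self.whichqs).bit_length() - 1  # lowest set bit
        proc = self.qs[q].popleft()
        if not self.qs[q]:
            self.whichqs &= ~(1 << q)
        return proc

    def waiting(self):
        """Total queued processes, however they are split across queues."""
        return sum(map(len, self.qs)) + sum(map(len, self.affinity))
```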

> One last question.  When Sequent computes their load average, do they
> take into account the possibility that some of the processors might not have
> been available during the last 1,5,15 minutes??  If I have 2 processors running
> user processes, but the Sequent is basing its calculation of load average on
> 10 processors (or how many there actually are in my system), then a load 
> average based on that premise is not a truly representative number.

There are system calls available to find out how many processors are online
at the time of the call.  The kernel function that computes the load averages
(loadav in vm_sched.c) divides by 'nonline'.  If a processor goes offline,
the load averages won't be accurate, but after 15 minutes they will be.
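
How long the "inaccurate" period lasts depends on where the division happens. A sketch assuming (not confirmed) that DYNIX divides each 5-second sample by nonline and then applies BSD-style exponential decay: with 8 runnable jobs, going from 4 CPUs online to 2 moves the per-CPU sample from 2.0 to 4.0. The 1-minute figure catches up within minutes, but after 15 minutes the 15-minute figure has covered only about 63% (1 - 1/e) of the change. The run() helper is hypothetical.

```python
import math

# BSD-style decay constants for 5-second samples (1, 5, 15 minutes).
CEXP = [math.exp(-5 / (60.0 * minutes)) for minutes in (1, 5, 15)]


def loadav(avg, sample):
    return [c * a + sample * (1.0 - c) for c, a in zip(CEXP, avg)]


def run(nrun, nonline, minutes, avg):
    """Feed `minutes` of 5-second samples of nrun / nonline into avg."""
    for _ in range(minutes * 12):
        avg = loadav(avg, nrun / nonline)
    return avg


avg = run(8, 4, 60, [0.0, 0.0, 0.0])  # an hour with 4 CPUs: settles near 2.0
avg = run(8, 2, 15, avg)              # two CPUs go offline for 15 minutes
# avg[0] is ~4.0, but avg[2] is only about 3.25 on its way to 4.0.
```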

--
Andy Rosen           | arosen@hawk.ulowell.edu | "I got this guitar and I
ULowell, Box #3031   | ulowell!arosen          |  learned how to make it
Lowell, Ma 01854     |                         |  talk" -Thunder Road
		RD in '88 - The way it should've been