[net.unix-wizards] 4.2 scheduler

tupper@wanginst.UUCP (John Tupper) (11/21/85)

*** nothing cute here ***

We (not Wang Institute, a different we) recently switched from BSD 4.1 to
BSD 4.2. We have found 4.2 to be SLOWER than 4.1 by a lot!

Disk I/O is faster, but CPU utilization seems to be way down. Sometimes
our VAX 750 will act loaded when there's only one process running (with
several others hanging around doing nothing).

As far as I know we have the original 4.2 (have there been updates?).

Any help on the following questions would be appreciated.

	Has anyone else experienced the same thing (and what did
	you do about it).

	Are there different releases (updates) of 4.2?

	Is there any tuning we can perform to the kernel to speed
	things up?

Many thanks for whatever help you can offer.
-- 
John Tupper                              tupper@wanginst        (Csnet)
Wang Institute of Graduate Studies       wanginst!tupper        (UUCP)
Tyng Road, Tyngsboro, MA 01879           (617) 649-9731

chris@umcp-cs.UUCP (Chris Torek) (11/22/85)

The biggest slowdown in 4.2 was probably the long file names.
These slow down namei() quite a bit; kernel profiles on our
machines have shown that namei() is by far the biggest competitor
with user code for CPU time of all the non-interrupt-level code.
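
For reference, the two directory formats differ roughly as follows (a
sketch from memory: the field names approximate the historical dir.h
headers, and the types are simplified so the fragment stands alone):

	/* Sketch only; not the actual V7 or 4.2BSD headers. */

	/* V7 and 4.1BSD: every directory entry is a fixed 16-byte slot. */
	#define V7_DIRSIZ 14
	struct v7_direct {
		unsigned short d_ino;		/* inode number; 0 means a free slot */
		char d_name[V7_DIRSIZ];		/* not NUL-terminated at 14 characters */
	};

	/* 4.2BSD fast file system: variable-length records, names up to 255. */
	#define MAXNAMLEN 255
	struct ufs_direct {
		unsigned long d_ino;		/* inode number; 0 means entry unused */
		unsigned short d_reclen;	/* length of this whole record */
		unsigned short d_namlen;	/* actual length of d_name */
		char d_name[MAXNAMLEN + 1];	/* NUL-terminated name */
	};

With the V7 layout a directory scan is a fixed-stride loop and a 14-byte
compare; with the 4.2 layout namei() has to chase d_reclen from record to
record, check d_namlen before each compare, and keep track of free space
for later creates.  That is more work per entry, for every component of
every path name.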

The fix is to install 4.3 as soon as it is out.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

friesen@psivax.UUCP (Stanley Friesen) (11/26/85)

In article <1360@wanginst.UUCP> tupper@wanginst.UUCP (John Tupper) writes:
>
>We (not Wang Institute, a different we) recently switched from BSD 4.1 to
>BSD 4.2. We have found 4.2 to be SLOWER than 4.1 by a lot!
>
>As far as I know we have the original 4.2 (have there been updates?).
>
>Any help on the following questions would be appreciated.
>
>	Has anyone else experienced the same thing (and what did
>	you do about it).

	Yes, it happened here too, and there is little that can be
done except hope that 4.3 becomes available soon, since 4.3 is
essentially 4.2 with *significant* speed improvements.
>
>	Are there different releases (updates) of 4.2?

	As far as I know the only "update" is to 4.3, which is only in
beta testing right now.
>
>	Is there any tuning we can perform to the kernel to speed
>	things up?
>
	I doubt it, at least not significantly. The problem is *not*
the scheduler; it is a rather inefficient implementation of a number
of system services. The critical routine 'namei' is quite slow, due to
the need to recognise and correctly handle sockets. Also pipes now
have a few more layers of software to run them. Some of the
bookkeeping operations for the new Fast File System are rather CPU
intensive. The net result is that a 4.2 system spends 30 to 50% of
its time in the kernel (verify this by running vmstat)!! This is quite
high. As I understand it 4.3 has corrected many of these problems.
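
If you want to see the user/system split for a single workload instead
of the whole machine, a small wrapper around times() will show it.  This
is only a sketch (a stripped-down time(1); it assumes the traditional HZ
of 60 clock ticks per second used on the VAX):

	/* Sketch of a per-command user/system time report, not a polished tool. */
	#include <stdio.h>
	#include <unistd.h>
	#include <sys/times.h>
	#include <sys/wait.h>

	#ifndef HZ
	#define HZ 60			/* clock ticks per second; the usual VAX value */
	#endif

	int main(int argc, char **argv)
	{
		struct tms t;
		int pid, status;
		double user, sys;

		if (argc < 2) {
			fprintf(stderr, "usage: %s command [args]\n", argv[0]);
			return 1;
		}
		pid = fork();
		if (pid < 0) {
			perror("fork");
			return 1;
		}
		if (pid == 0) {
			execvp(argv[1], argv + 1);
			perror(argv[1]);
			_exit(127);
		}
		while (wait(&status) != pid)
			continue;
		times(&t);		/* tms_cutime/tms_cstime now cover the child */
		user = (double)t.tms_cutime / HZ;
		sys = (double)t.tms_cstime / HZ;
		printf("user %.1fs  sys %.1fs", user, sys);
		if (user + sys > 0)
			printf("  (%.0f%% of the CPU time was system time)",
			    100.0 * sys / (user + sys));
		printf("\n");
		return 0;
	}

Run it over the same compile or troff job under 4.1 and again under 4.2
and you can make the comparison for one job without having to catch
vmstat at the right moment.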

-- 

				Sarima (Stanley Friesen)

UUCP: {ttidca|ihnp4|sdcrdcf|quad1|nrcvax|bellcore|logico}!psivax!friesen
ARPA: ttidca!psivax!friesen@rand-unix.arpa

guy@sun.uucp (Guy Harris) (11/28/85)

> The critical routine 'namei' is quite slow, due to the need to recognise
> and correctly handle sockets.

Sockets?  "namei" doesn't know sockets from a hole in the ground!  "namei"
is slow on just about *any* UNIX system; it just happens to be slower with the
4.2BSD file system than with the V7 file system that all other UNIXes use,
because of the longer names and more complicated directory structure, and
because you may encounter symbolic links, which can lengthen the path you
end up searching.

> Also pipes now have a few more layers of software to run them.

To be precise, pipes now work through the networking code rather than
through the file system code.
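
Concretely: in 4.2 a pipe is built on a connected pair of UNIX-domain
stream sockets, so the data moves through socket buffers instead of
through the old block I/O path.  The user-visible equivalence looks
roughly like this (an illustration of the layering, not the kernel code
itself):

	/* Illustration: a pipe-like channel built from the primitives that
	   4.2BSD pipes themselves now sit on top of. */
	#include <stdio.h>
	#include <unistd.h>
	#include <sys/types.h>
	#include <sys/socket.h>

	int main(void)
	{
		int fd[2], n;
		char buf[64];
		static const char msg[] = "hello by way of the socket layer\n";

		if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd) < 0) {
			perror("socketpair");
			return 1;
		}
		write(fd[1], msg, sizeof msg - 1);	/* the "write end" of the pipe */
		n = read(fd[0], buf, sizeof buf - 1);	/* the "read end" */
		if (n > 0) {
			buf[n] = '\0';
			fputs(buf, stdout);
		}
		return 0;
	}

Each layer the data crosses (socket, protocol, mbuf management) costs a
little more than the 4.1 pipe path did, which is where the "few more
layers of software" come from.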

> The net result is that a 4.2 system spends 30 to 50% of its time in the
> kernel(verify this by running vmstat)!! This is quite high.

Well, it *can* spend that much time in the kernel, depending on what you're
doing.  30 to 50% isn't *that* high (I've heard MVS spends at least 50% of a
370-family machine's cycles); the paper Berkeley put out on 4.2/4.3
performance tuning indicated that 4.1 spent about 45% of its time in the
kernel, 4.2 spent about 55% of its time there, and that they'd cut it back
to 45% with their tuning.

> As I understand it 4.3 has corrected many of these problems.

Yes; they've sped up "namei" with a cache of name-to-inode translations, as
well as other changes.  They may have sped up the flow of data through the
network code, which would speed up pipes (and TCP transfers, and...).
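
The idea behind the cache is simple: remember recent (directory,
component name) to inode translations so that most lookups never touch
the directory blocks at all.  A toy version, just to show the shape of
it (the real 4.3 cache lives in the kernel, hangs its entries off the
in-core inodes, and recycles them on an LRU basis; this toy keeps one
entry per hash bucket):

	/* Toy name-to-inode cache in the spirit of the 4.3BSD namei cache;
	   not the actual kernel code. */
	#include <stdio.h>
	#include <string.h>

	#define NCHASH	64		/* hash buckets (power of two) */
	#define NAMEMAX	31		/* longest component this toy will cache */

	struct ncentry {
		unsigned long parent_ino;	/* inode number of the directory */
		unsigned long ino;		/* inode the name maps to */
		char name[NAMEMAX + 1];
		int valid;
	};

	static struct ncentry nc[NCHASH];

	static unsigned hash(unsigned long parent_ino, const char *name)
	{
		unsigned h = (unsigned)parent_ino;

		while (*name)
			h = h * 33 + (unsigned char)*name++;
		return h & (NCHASH - 1);
	}

	/* Returns the cached inode number, or 0 on a miss (0 is never valid). */
	unsigned long cache_lookup(unsigned long parent_ino, const char *name)
	{
		struct ncentry *e = &nc[hash(parent_ino, name)];

		if (e->valid && e->parent_ino == parent_ino &&
		    strcmp(e->name, name) == 0)
			return e->ino;
		return 0;
	}

	void cache_enter(unsigned long parent_ino, const char *name,
	    unsigned long ino)
	{
		struct ncentry *e = &nc[hash(parent_ino, name)];

		if (strlen(name) > NAMEMAX)
			return;		/* too long to cache; do the directory scan */
		e->parent_ino = parent_ino;
		e->ino = ino;
		strcpy(e->name, name);
		e->valid = 1;
	}

	int main(void)
	{
		cache_enter(2, "usr", 5);	/* remember /usr while scanning the root */
		printf("hit: inode %lu, miss: inode %lu\n",
		    cache_lookup(2, "usr"), cache_lookup(2, "tmp"));
		return 0;
	}

On a hit namei() skips the buffer cache and the directory scan entirely,
which is why the hit rate matters so much.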

	Guy Harris

friesen@psivax.UUCP (Stanley Friesen) (12/02/85)

In article <3043@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>
>> The net result is that a 4.2 system spends 30 to 50% of its time in the
>> kernel(verify this by running vmstat)!! This is quite high.
>
>Well, it *can* spend that much time in the kernel, depending on what you're
>doing.  30 to 50% isn't *that* high  ...

	Well, I consider 30% to be reasonable, but I find 50% and up
to be excessive; after all, the machine is really there to execute
*user* code, not the kernal!

> the paper Berkeley put out on 4.2/4.3
>performance tuning indicated that 4.1 spent about 45% of its time in the
>kernel, 4.2 spent about 55% of its time there, and that they'd cut it back
>to 45% with their tuning.
>
	Actually, on our system I noticed a greater change. Under 4.1
with a moderate load the system time averaged about 35%; under 4.2 it
jumped to the cited 55% value.

-- 

				Sarima (Stanley Friesen)

UUCP: {ttidca|ihnp4|sdcrdcf|quad1|nrcvax|bellcore|logico}!psivax!friesen
ARPA: ttidca!psivax!friesen@rand-unix.arpa

herbie@polaris.UUCP (Herb Chong) (12/05/85)

In article <3043@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>Well, it *can* spend that much time in the kernel, depending on what you're
>doing.  30 to 50% isn't *that* high (I've heard MVS spends at least 50% of a
>370-family machine's cycles); the paper Berkeley put out on 4.2/4.3
>performance tuning indicated that 4.1 spent about 45% of its time in the
>kernel, 4.2 spent about 55% of its time there, and that they'd cut it back
>to 45% with their tuning.

to set the record straight, a properly configured and tuned MVS system
running flat out will spend between 5 and 15 percent of its time in
the kernel or related system programs averaged over a 15 minute
interval.  a 4.2 bsd system properly configured and tuned will spend
about 25 to 35 percent of its time executing in the kernel.  if you
are willing to live with 45 percent system time averaged over 15
minutes, your system is too overloaded to give good response.  from
what i hear though, most people prefer to run their 4.2 systems that
way.  for a 4M 780, the 15 minute load average rising above 10 is a
warning that system limits are being reached.  any time the 1 minute
load average is over 14 for more than 30 seconds, the system is
beginning to thrash.  kernel character echo at 9600 baud is noticeably
slower and almost stops when it reaches about 20.

Herb Chong...

I'm still user-friendly -- I don't byte, I nybble....

VNET,BITNET,NETNORTH,EARN: HERBIE AT YKTVMH
UUCP:  {allegra|cbosgd|cmcl2|decvax|ihnp4|seismo}!philabs!polaris!herbie
CSNET: herbie.yktvmh@ibm-sj.csnet
ARPA:  herbie.yktvmh.ibm-sj.csnet@csnet-relay.arpa
========================================================================
DISCLAIMER:  what you just read was produced by pouring lukewarm
tea for 42 seconds onto 9 people chained to 6 Ouiji boards.

guy@sun.uucp (Guy Harris) (12/07/85)

> >Well, it *can* spend that much time in the kernel, depending on what you're
> >doing.  30 to 50% isn't *that* high  ...
> 
> 	Well, I  consider 30% to be reasonable, but I find 50% and up
> to be excessive, after all the machine is really there to execute
> *user* code, not the kernal!

Well, let me ask a couple of questions:

	1) how much work would that user code be able to do if
	   the kernel (that's kernEl, not kernAl, by the way, it's
	   spelled correctly in the passage to which you're referring
	   and it's time programmers learned to spell it correctly)
	   weren't doing opens, reads, writes, etc. for you?

	2) how much of the user code in question is actually
	   doing "support work" for the part of the user code
	   that's doing the real work?

What value was in the mode bit(s) of your processor when a particular piece
of code was executing has little to do with whether that code is performing
useful work.  If you built an OS for your machine which ran all code in
kernel mode (or its equivalent), including "user" programs, your system
would spend 100% of its time in the kernel.  If you built an OS with very
primitive kernel operations, and shoved most "kernel" functions into
userland, it would probably spend the bulk of its time in user mode.
Looking at the percentage of time spent in system vs. user mode might give
you a clue as to what piece of code needs to be sped up
(/{un,vmun,xen,whatever}ix vs. /whatever/your_application) but it's not
necessarily "wrong" to have 50% of your time spent in the kernel - it
depends on what the stuff running on your system is doing.

	Guy Harris

guy@sun.uucp (Guy Harris) (12/07/85)

> to set the record straight, a properly configured and tuned MVS system
> running flat out will spend between 5 and 15 percent of it's time in
> the kernel or related system programs averaged over a 15 minute
> interval.

You work for IBM, I don't, so I'll take your word for it.  But...

> a 4.2 bsd system properly configured and tuned will spend
> about 25 to 35 percent of it's time executing in the kernel.  if you
> are willing to live with 45 percent system time averaged over 15
> minutes, your system is overloaded for good response.

Well, I don't know what the averaging period was for the system they were
talking about in the Berkeley paper.  However, I've seen systems running a
"normal" load showing anywhere between a 90% user/ 1-10% system split to a
30% user / 60% system split ("instantaneous" figures from "vmstat").
Another person here whose opinion I respect says that the 45% figure is
reasonable.

> for a 4M 780, the 15 minute load average rising above 10 is a warning that
> system limits are being reached.  any time the 1 minute load average is
> over 14 for more than 30 seconds, the system is beginning to thrash.
> kernel character echo at 9600 baud is noticeably slower and almost
> stops when it reaches about 20.

gorodish$ uptime
  8:14pm  up 10 days,  7:57,  1 users,  load average: 20.88, 20.45, 19.71

At that time, kernel character echo was not noticeably slower than when the
system was unloaded.  Furthermore, the "uptime" command came back within a
couple of seconds after I hit the return.

The load average is just an average of the length of the run queue over a
particular period of time.  It may *indicate* how loaded the system is, but
its numerical value doesn't necessarily directly indicate anything.  In this
particular case, I kicked off 20 programs doing a "branch-to-self".  The
load average of ~20 should not be surprising.  Kernel character echo is done
at interrupt level, so user-mode activity should not greatly affect it (it
may affect interrupt latency, or toss kernel code or data from a cache or
toss page table entries from a translation lookaside buffer - admittedly,
the latter two don't apply on a Sun).
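
For anyone who wants to repeat the experiment, the load generator
amounts to something like this (a sketch of the sort of branch-to-self
program described above, not the actual code used):

	/* Sketch: start N pure-CPU spinners to inflate the load average
	   without touching memory or doing any I/O. */
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		int i, n;

		n = (argc > 1) ? atoi(argv[1]) : 20;
		for (i = 0; i < n; i++) {
			if (fork() == 0)
				for (;;)	/* the branch-to-self: burn CPU forever */
					;
		}
		printf("started %d spinners; watch uptime, then kill them by hand\n", n);
		return 0;
	}

Since the spinners never block, the run queue stays near N and the load
average climbs to match, even though interrupt-level work such as
character echo is untouched.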

Your noting the memory size of the VAX in question, and your reference to
"system" limits rather than "CPU" limits, indicate that you're thinking of a
particular job mix.  If you pile on more jobs of a sort that consumes
memory, then your paging rate will go up and cause more I/O traffic.  If the
system is spending enough time in the disk driver interrupt routine, then
yes, it could lock out terminal interrupts and slow down echoing.

If, however, you have a bunch of jobs FFTing or inverting matrices or doing
ray-tracing or some other sort of compute-intensive work, and they have
enough physical memory so that they do little or no paging, you should see
minimal impact on interrupt-driven activities (such as echoing) and it
shouldn't totally destroy interactive activities (such as relatively quick
commands) - the UNIX scheduler does give *some* weight to patterns of CPU
usage, after all.  (Yes, screen editors aren't happy, but screen editors do
eat a lot of CPU, and that's the scarce resource in this example.)

	Guy Harris

thomas@utah-gr.UUCP (Spencer W. Thomas) (12/07/85)

I think it's also a question of what the kernel is doing.  For example,
on the Mac, if you consider everything in the ROM "the kernel", the
"system" spends most of its time in the kernel for an average
application.  But this is just because it provides so much functionality
that you don't have to supply for yourself.

-- 
=Spencer   ({ihnp4,decvax}!utah-cs!thomas, thomas@utah-cs.ARPA)
	"Ask not what your kernel can do for you, but rather what you
	 can do for you kernel!"

herbie@polaris.UUCP (Herb Chong) (12/09/85)

In article <3062@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>Well, I don't know what the averaging period was for the system they were
>talking about in the Berkeley paper.  However, I've seen systems running a
>"normal" load showing anywhere between a 90% user/ 1-10% system split to a
>30% user / 60% system split ("instantaneous" figures from "vmstat").
>Another person here whose opinion I respect says that the 45% figure is
>reasonable.

okay, i admit i didn't indicate what work was being run.  try with 10
vi sessions and 8 troff's and leave 2 for trivial commands and you get
what i got.  the load average is not a good indicator of the work
being done in general on a system, but for a given process mix, it is.

BTW, i once started up 20 programs that allocate and access randomly
3Mbytes of arrays each.  i tried an uptime and it never came back even
though i waited about 10 minutes.  since i was running single user when
i was "benchamarking" the entire system, i knew that those processes
were the only other processes running besides myself.  character echo
stopped completely and never did i see anything more after i typed in
uptime.  this special case drove our 4M 780 running 4.2 into thrashing
once more than 4 of the programs were running.  running about 10
long CPU and IO bucholz benchmark scripts did the same but that was
because I/O was competing with paging I/O.
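
the memory half of that experiment is easy to approximate (a sketch of
the kind of program i described, not the code i actually ran): each
process keeps touching random pages of a 3 Mbyte array, so its working
set is the whole array and twenty of them on a 4M 780 have nowhere to
live.

	/* sketch of the paging-load generator described above: allocate about
	   3 Mbytes and touch pages at random so the whole region stays hot. */
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	#define ARENA (3L * 1024 * 1024)	/* about 3 Mbytes per process */
	#define PAGE 1024			/* touch at page-ish granularity */

	int main(void)
	{
		char *arena = malloc(ARENA);
		long i;

		if (arena == NULL) {
			perror("malloc");
			return 1;
		}
		srandom(getpid());
		for (;;) {
			i = random() % (ARENA / PAGE);
			arena[i * PAGE] += 1;	/* fault the page in and dirty it */
		}
	}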

the profile that we ran on the system later under "live" conditions
indicated that 20% of the time was spent in namei and about 35% of the
time in paging routines.  under the conditions we wanted to use the
vax, adding more memory would have helped a lot but then fairly quickly
we would have become CPU bound in namei.  hopefully, 4.3 bsd addresses
this problem successfully.

our vax that i worked on was a heavily loaded machine doing what vaxes
are not particularly good at when there's not enough memory.  under
controlled conditions, i forced the 780 into thrashing, and for the
workload we usually ran, a load average of 14 was it.  more than 15 minutes
and the system more or less rolled over permanently unless we started
renicing processes, effectively suspending them.  the load control
system posted a while back to net.sources attempts the same kind of
thing automatically on your processor and I/O hogs.  the conclusion i
drew from my analysis of our system was that more memory and more CPU
were required.  i recommended an upgrade to a 785 and adding between 4M
and 12M of memory.

a different system ran student programs that were CPU bound and tended
to fork many processes.  it rolled over at a load average of about 30.
it also was a 4M 780.  kernel echo became noticeable at about load
average 25 or so.  i don't pretend to be an expert on implementation of
any of the kernel stuff, but i do know about system modelling and
analysis and have spent quite a bit of time analyzing both MVS and
4.2bsd systems trying to squeeze as much as possible out of them.

Herb Chong...

I'm still user-friendly -- I don't byte, I nybble....

VNET,BITNET,NETNORTH,EARN: HERBIE AT YKTVMH
UUCP:  {allegra|cbosgd|cmcl2|decvax|ihnp4|seismo}!philabs!polaris!herbie
CSNET: herbie.yktvmh@ibm-sj.csnet
ARPA:  herbie.yktvmh.ibm-sj.csnet@csnet-relay.arpa
========================================================================
DISCLAIMER:  what you just read was produced by pouring lukewarm
tea for 42 seconds onto 9 people chained to 6 Ouiji boards.

chris@umcp-cs.UUCP (Chris Torek) (12/10/85)

In article <324@polaris.UUCP> herbie@polaris.UUCP (Herb Chong) writes:

> [under heavy paging load] character echo stopped completely ...

Sounds like you ran into a driver bug that hung the system at IPL
15 or higher.  Character echo should NEVER stop, and noticeable
slowdown should be EXTREMELY rare even under the worst disk load
you can generate.

> the profile that we ran on the system later under "live" conditions
> indicated that 20% of the time was spent in namei and about 35% of
> the time in paging routines.  under the conditions we wanted to
> use the vax, adding more memory would have helped a lot but then
> fairly quickly we would have become CPU bound in namei.  hopefully,
> 4.3 bsd addresses this problem successfully.

It does.  We see an `average' name cache hit rate of at least 80%,
meaning that at most 20% of the translations need to look at the buffer
cache/disks.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (12/10/85)

It is, however, true that many of the 4.2BSD kernel
"efficiency" hacks win only if CPU cycles are cheap
and the system is fairly large.  On smaller systems,
good old Thompson & Ritchie UNIX may run better.

ron@BRL.ARPA (Ron Natalie) (12/11/85)

Oh, foo.  First, the 4BSD scheduler is a VAX scheduler.  If you
are going to move it to any other machine, the porters should
adapt the scheduler to their own needs.

Frankly, the Ritchie and Thompson scheduler is obsolete even for
the elevens.  I doubt that it is a really valid suggestion.

-Ron

friesen@psivax.UUCP (Stanley Friesen) (12/11/85)

In article <3058@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>>	(me)
>> 	Well, I  consider 30% to be reasonable, but I find 50% and up
>> to be excessive, after all the machine is really there to execute
>> *user* code, not the kernal!
>
>Well, let me ask a couple of questions:
>
>	1) how much work would that user code be able to do if
>	   the kernel weren't doing opens, reads, writes, etc. for you?
>
	Except that the older versions were doing all that at an
overhead of only 30-40%, so why should I "pay" more for the same
functions? In fact 90% of our work here is compiling and editing, with
a little troff thrown in. As far as I know these utilities are little
changed between 4.1 and 4.2. Certainly they expect much the same
services out of the kernel. The net result is that for the same user
load, with the same job mix, we are getting much slower response
time.

>	2) how much of the user code in question is actually
>	   doing "support work" for the part of the user code
>	   that's doing the real work?
>
	I don't really know, but I suspect that specialized,
tailor-made support code is going to be more efficient than the
generalized support available from the kernel.
-- 

				Sarima (Stanley Friesen)

UUCP: {ttidca|ihnp4|sdcrdcf|quad1|nrcvax|bellcore|logico}!psivax!friesen
ARPA: ttidca!psivax!friesen@rand-unix.arpa

gwyn@BRL.ARPA (VLD/VMB) (12/11/85)

I would hope that 45% of the CPU is not being spent in
the scheduler!

greg@ncr-sd.UUCP (Greg Noel) (12/12/85)

In article <893@psivax.UUCP> friesen@psivax.UUCP (Stanley Friesen) writes:
>	I don't really know, but I suspect that specialized,
>tailor-made support code is going to be more efficient than the
>generalized support available from the kernel.

Er, no.  There are two reasons why not.  First, "tailor-made" support code
is often \less/ efficient than the corresponding general-but-highly-optimized
code in the kernel, simply because more attention has been paid to making
the kernel code efficient; and second, a global resource manager can often
make \better/ decisions than a local, narrowly-focused one.  That's why you
are willing to "pay" for such services -- it may not be quite as optimal for
the individual user, but it gives better system-level performance.

The concept that it can be more efficient to expend cycles in the kernel seems
to be a difficult one for a lot of people to comprehend.  I once argued this
point with Grace Hopper and could never get her to accept that anything run in
kernel mode (really, anything that turned on the supervisor-state light on the
360 panel) could be anything but overhead.  I suspect that this is because on
an IBM-class operating system, where the user program has to carry around all
the access routines and run them in user mode, most of the supervisor-state
code \is/ overhead.  'Tain't so with Unix, and to blindly apply the same methods
to measure "efficiency" is to end up comparing apples and oranges.
-- 
-- Greg Noel, NCR Rancho Bernardo    Greg@ncr-sd.UUCP or Greg@nosc.ARPA