[comp.unix.microport] V.3 + top

fox@marlow.uucp (Paul Fox) (10/13/88)

Well, I've finally done it. I've finally worked out how to relate
the 'proc' structure to the 'user' structure. I've been needing
this in order to modify the top program to work under V.3.

Boy was that a difficult job - over 8 solid hours looking at the
kernel with 'crash' and the proc.h & user.h include files !!

Now that I've done it, I think I can begin to understand why
V.3 systems are soooo much slowwwer than Xenix systems..

Just to let you know why I think this is so...
We have Uport & ISC V.3 systems running on 4MB 16&20MHz 386's. We
also have a 16MHz Xenix system with 2MB of memory. The Xenix system
beats the pants off of our 20MHz 386's. I've always wanted to
know why. I always presumed this was due to the quality of the
compiler (MSC 5) and the mods which SCO & Msoft have done to the kernel.

Well, I believe differently now.

Xenix uses the old Berkeley way of doing memory allocation: specifically,
a fixed 'u' area into which the user area of the currently running process
gets mapped. To write a 'ps' type of program you
look at the proc->p_addr field to determine where in physical
memory the 'u' area of a process resides. If one of the flags
says the process is swapped out, then p_addr instead gives the
address of the 'u' area on /dev/swap.
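
In other words, a 'ps' type program ends up doing something like
this. This is just a sketch of the idea -- p_flag, SLOAD and p_addr
are the V7/Xenix-style names, and the open()/conversion plumbing is
purely illustrative:

#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/param.h>
#include <sys/proc.h>
#include <sys/user.h>

static struct user ubuf;

/* Fetch the u area for one proc entry into ubuf. */
int
fetch_uarea(struct proc *p)
{
	int fd;
	/* ctob() here is illustrative; in-core and on-swap addresses
	   are kept in different units on some systems. */
	off_t off = (off_t)ctob(p->p_addr);

	if (p->p_flag & SLOAD)
		fd = open("/dev/mem", O_RDONLY);   /* resident: physical memory */
	else
		fd = open("/dev/swap", O_RDONLY);  /* swapped: swap device */
	if (fd < 0)
		return -1;
	if (lseek(fd, off, 0) != off ||
	    read(fd, (char *)&ubuf, sizeof ubuf) != sizeof ubuf) {
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}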

After hours of trying to understand region tables and page directory 
tables on the 386, I have it sussed.

On V.3, the proc->p_ubptbl points to two page table entries which
between them map the top and bottom halves of the
'u' area. (Two are needed since the u area is > 4K and < 8K.)
The first 4K can be ignored, since that maps onto the
kernel stack.
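
So a 'top'-style program reads a V.3 u area roughly like this. I'm
assuming the entries are ordinary 386 PTEs, with the physical frame in
the top 20 bits and a present bit in bit 0, and that memfd is a file
descriptor open on /dev/mem -- check the pte definitions on your release
before believing any of it:

#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

#define PGSZ		4096
#define PG_P		0x1		/* PTE present bit           */
#define PG_FRAME	0xfffff000L	/* PTE physical frame bits   */

/* Read one 4K page of the u area, given its page table entry. */
static int
read_upage(int memfd, unsigned long pte, char *buf)
{
	if ((pte & PG_P) == 0)
		return -1;		/* that half is not in core */
	if (lseek(memfd, (off_t)(pte & PG_FRAME), 0) < 0)
		return -1;
	return read(memfd, buf, PGSZ) == PGSZ ? 0 : -1;
}

/* ubuf must hold 2*PGSZ bytes; pte0/pte1 come from p_ubptbl. */
int
read_uarea(int memfd, unsigned long pte0, unsigned long pte1, char *ubuf)
{
	if (read_upage(memfd, pte0, ubuf) < 0)
		return -1;
	return read_upage(memfd, pte1, ubuf + PGSZ);
}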

OK, so in my system I have a proc limit of 100 processes. It seems
to me that the u area for these 100 procs pretty much resides in
physical memory all the time. Net result: 100 * 8K taken up in
'u' areas. That's 800K. Add this to the 500K disk buffers + 1MB for
the kernel. We end up with 2.3MB used up. That leaves me with 1.7MB
for my processes before we subtract the other bits needed by my kernel.
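
Spelled out (these are just the numbers quoted above for my machine,
nothing universal about them):

#include <stdio.h>

int
main(void)
{
	int nproc   = 100;	/* my NPROC limit          */
	int usize   = 8;	/* u area per process, KB  */
	int buffers = 500;	/* disk buffer cache, KB   */
	int kernel  = 1024;	/* kernel text + data, KB  */
	int ram     = 4096;	/* 4MB machine             */
	int used    = nproc * usize + buffers + kernel;

	printf("%dK used, %dK left for user processes\n", used, ram - used);
	return 0;
}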

On our Xenix system with 2MB, we have about 1.3MB free for processes.
So as you see the 1.3MB figure and 1.7MB figure are pretty close.

Add to this the mods MS & SCO have done to Xenix to speed it
up, plus the presumably smaller 'u' areas, and it's not
surprising that Xenix whizzes by.

Any comments anyone ? 


=====================
     //        o      All opinions are my own.
   (O)        ( )     The powers that be ...
  /    \_____( )
 o  \         |
    /\____\__/        Tel: +44 628 891313 x. 212
  _/_/   _/_/         UUCP:     fox@marlow.uucp

vandys@hpcupt1.HP.COM (Andrew Valencia(Seattle)) (10/21/88)

/ hpcupt1:comp.unix.microport / fox@marlow.uucp (Paul Fox) /  3:30 pm  Oct 12, 1988 /
>OK, so in my system I have a proc limit of 100 processes. It seems
>to me that the u area for these 100 procs pretty much resides in
>physical memory all the time. Net result: 100 * 8K taken up in
>'u' areas. That's 800K. Add this to the 500K disk buffers + 1MB for
>the kernel. We end up with 2.3MB used up. That leaves me with 1.7MB
>for my processes before we subtract the other bits needed by my kernel.

	I'm doing the port of V.2 regions to our 680x0 line of processors,
and have had some experience with this stuff at this point :->.  Although
my comments really only apply to the code I'm porting (which is actually
a port of V.2 to our RISC line), some of it probably applies.

	A region represents the actual set of pages backing an object.
Not all of the pages need be present.  A pregion supplies a view onto the
region.  Thus, for memory-mapped files, a region would represent the whole
file, whereas a pregion might give a window of only the first couple pages.
We use this to exec a.outs--a text region simply views the text image portion
of the a.out file.
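
	To make that concrete, here is a stripped-down sketch of the two
structures.  It is illustrative only -- the real region.h has many more
fields and somewhat different names in places:

/* Illustrative only; not the real region.h. */
struct region {
	int	r_pgsz;		/* size of the region, in pages          */
	int	r_nvalid;	/* pages currently valid (in core)       */
	int	r_refcnt;	/* how many pregions share this region   */
	/* ... page lists, the backing object (e.g. the a.out), locks ... */
};

struct pregion {
	struct region	*p_reg;		/* the shared object itself      */
	char		*p_regva;	/* where this process views it   */
	int		p_off;		/* offset into the region: the   */
					/* "window" mentioned above      */
	int		p_type;		/* text, data, stack, ...        */
};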

	For the U area, we actually allocate 3 pages--one for the U area,
one for the kernel stack, and one for the kernel red zone.  The red zone
doesn't really have a physical page behind it, and traps the kernel if it
overgrows its stack.
While a user is running, there are actually two pregions for the U area--one
which maps it in kernel virtual address space, and another to map it into
the user's address space.  This latter is done mostly for compatibility,
although some supporting code for signals (the "signal trampoline" code) is
also located there.
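
	Roughly, the block we set up per process looks like this.  The
constants and the ordering are made up for illustration; the real points
are the three pages and the unbacked guard page:

/* Illustrative layout only. */
#define UAREA_PAGES	1	/* the U area itself                       */
#define KSTACK_PAGES	1	/* kernel stack, grows toward the red zone */
#define REDZONE_PAGES	1	/* no physical page behind it: any touch   */
				/* faults, catching a kernel stack overrun */
				/* before it can corrupt the U area        */

/*
 * [ U area ][ kernel stack ][ red zone ]   <- kernel-space pregion
 *
 * plus a second pregion mapping the same U area into the user's address
 * space, mostly for compatibility and the signal trampoline code.
 */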

	Our pageout daemon (actually known as "vhand") executes a typical
"scan and age" algorithm.  It works its way through the regions of a
user's virtual address space (which actually hang off a "vas" structure),
arranging for some pages to be freed (if they are old and unmodified) or
written out (if they are old and modified).  The U area pregions hang off
the vas, and thus--you guessed it--can be paged out.  When the process later
runs, the pages are brought back in and execution continues.
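
	Stripped of the locking and list management, the aging pass itself
is nothing exotic -- in shape it is something like this (placeholder types
and helpers, not our actual vhand):

/* Shape of one "scan and age" step; placeholder types, not real vhand code. */
struct page {
	int	referenced;	/* touched since the last pass? */
	int	modified;	/* dirty?                       */
	int	age;		/* passes since last reference  */
};

void	schedule_pageout(struct page *pg);	/* placeholder: queue a write  */
void	free_page(struct page *pg);		/* placeholder: reclaim a page */

void
age_page(struct page *pg, int old_enough)
{
	if (pg->referenced) {		/* recently used: start aging over */
		pg->referenced = 0;
		pg->age = 0;
		return;
	}
	if (++pg->age < old_enough)	/* not old enough to bother yet    */
		return;
	if (pg->modified)
		schedule_pageout(pg);	/* old and dirty: write it out     */
	else
		free_page(pg);		/* old and clean: free it now      */
}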

	The performance problems we had with regions didn't have much to
do with simple memory usage.  We found some of their locking techniques
naive and wasteful.  We also found some bugs with how they managed pages
over time.  On the other hand, with some cleanup we have found the code
to be much more suitable for techniques like mapped files, shared libraries,
and other fancy VM features than, say, the 4.2 VM code.

				Andy

dave@micropen (David F. Carlson) (10/22/88)

In article <10770005@hpcupt1.HP.COM>, vandys@hpcupt1.HP.COM (Andrew Valencia(Seattle)) writes:
> comp.unix.microport / fox@marlow.uucp (Paul Fox) /  3:30 pm  Oct 12, 1988 /
> >OK, so in my system I have a proc limit of 100 processes. It seems
> >to me that the u area for these 100 procs pretty much resides in
> >physical memory all the time. Net result: 100 * 8K taken up in
> >'u' areas. That's 800K. Add this to the 500K disk buffers + 1MB for
> >the kernel. We end up with 2.3MB used up. That leaves me with 1.7MB
> >for my processes before we subtract the other bits needed by my kernel.
> 
> 	For the U area, we actually allocate 3 pages--one for the U area,
> one for the kernel stack, and one for the kernel red zone.  The red zone
> doesn't really have a page, and traps the kernel if it over-grows its stack.
> While a user is running, there are actually two pregions for the U area--one
> which maps it in kernel virtual address space, and another to map it into
> the user's address space.  This latter is done mostly for compatibility,
> although some supporting code for signals (the "signal trampoline" code) is
> also located there.

It is my impression (and I may be wrong about modern UNIXes) that the u part
of a process is distinct from the proc part of a process *solely* so that
the u part may be swapped out when the process is not running, while the proc
part is never swapped, so that scheduling does not produce a swapping
deadlock.  This swapping (or, effectively, paging) was very important for V7
on the PDP-11 or any other 16-bit address space machine.  My knowledge of the
source for the 286 port is limited, but the separation of u from proc was/is
a critical OS design decision.

-- 
David F. Carlson, Micropen, Inc.
micropen!dave@ee.rochester.edu

"The faster I go, the behinder I get." --Lewis Carroll

vandys@hpcupt1.HP.COM (Andrew Valencia(Seattle)) (10/24/88)

/ hpcupt1:comp.unix.microport / dave@micropen (David F. Carlson) / 10:05 am  Oct 21, 1988 /
>It is my impression (and I may be wrong about modern UNIXes) that the u part
>of a process is distinct from the proc part of a process *solely* so that
>the u part may be swapped out when the process is not running, while the proc
>part is never swapped, so that scheduling does not produce a swapping deadlock.

	Originally, yes.  But as with all features, cruft lands on them and
becomes part of the whole.  Originally, being swapped out was virtually
equated with moving the U area out of memory.  These days, a number of other
things rank right up there with the U area.  Page tables, for instance.
They tend to use even more space than U areas.  The U area's relationship
to the actions of the VM system has become less dominating than it once
was, as all manner of stuff gets heaped into the memory management picture.

					Andy

fox@marlow.uucp (Paul Fox) (10/31/88)

In article <10770006@hpcupt1.HP.COM> vandys@hpcupt1.HP.COM (Andrew Valencia(Seattle)) writes:
> ... Originally, being swapped out was virtually
>equated with moving the U area out of memory.  These days, a number of other
>things rank right up there with the U area.  Page tables, for instance.
>They tend to use even more space than U areas.  The U area's relationship
>to the actions of the VM system has become less dominating than it once
>was, as all manner of stuff gets heaped into the memory management picture.
>

Following on from my original posting, here's a status update.
Having got top to run I find that my system is 93% busy in the
kernel, EVEN WHEN IT'S NOT DOING ANYTHING. After a number of weeks
of fretting and watching the appalling system response, I finally
resorted to removing DOS-MERGE from my kernel. And what do you know ?
I get normal system response times. Jeeze...

Anyway system response is still not too hot, but now I've modified top
to tell me what events sleeping processes are waiting on -- and I can
see that disk intensive processes are (quite rightly) waiting on
disk buffers, etc....

What I wanted to be able to do was print out how much of a process
is resident in memory. I've worked out some of the details --
scan the process's page directory and page table entries to see which
pages are marked present -- but this approach looks potentially very
CPU intensive: the page directory has 1024 entries, each of which can
point to a page table of 1024 pages. Although most of the address space
is sparse, I still have to walk the whole directory and every present
page table each time.
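
For what it's worth, this is the brute-force walk I have in mind.
read_phys(), which would fetch a 4K frame from physical memory (via
/dev/mem, presumably), is a placeholder:

#define NPDE		1024	/* entries per page directory */
#define NPTE		1024	/* entries per page table     */
#define PG_P		0x1	/* present bit                */
#define PG_FRAME	0xfffff000L

/* Placeholder: read the 4K frame at physical address addr into buf. */
int	read_phys(unsigned long addr, unsigned long *buf);

/* pdir[] is the process's page directory, NPDE entries. */
long
count_resident(unsigned long pdir[])
{
	unsigned long pt[NPTE];
	long resident = 0;
	int i, j;

	for (i = 0; i < NPDE; i++) {
		if ((pdir[i] & PG_P) == 0)
			continue;		/* whole 4MB chunk absent */
		if (read_phys(pdir[i] & PG_FRAME, pt) < 0)
			continue;
		for (j = 0; j < NPTE; j++)
			if (pt[j] & PG_P)
				resident++;
	}
	return resident;
}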

Does anyone know of a way of easily determining how much of a process
is resident, or of some other useful kernel quantity that can give
some idea of system response ?


=====================
     //        o      All opinions are my own.
   (O)        ( )     The powers that be ...
  /    \_____( )
 o  \         |
    /\____\__/        Tel: +44 628 891313 x. 212
  _/_/   _/_/         UUCP:     fox@marlow.uucp

vandys@hpcupt1.HP.COM (Andrew Valencia(Seattle)) (11/03/88)

/ hpcupt1:comp.unix.microport / fox@marlow.uucp (Paul Fox) /  4:54 am  Oct 31, 1988 /
>Does anyone know of a way of easily determining how much of a process
>is resident, or of some other useful kernel quantity that can give
>some idea of system response ?

	Vanilla V.2.1 Regions has a field p_region in the proc structure
which points to the regions for the process.  r_nvalid is a count of the
number of valid pages under a region.  In general, look in /usr/include/sys
at region.h and proc.h--the comments will give you leads on the values of
the various fields.
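
	Something along these lines, in other words.  The pregion walk and
the exact field names vary a bit between releases, so trust your own
region.h and proc.h over this sketch:

#include <sys/types.h>
#include <sys/region.h>
#include <sys/proc.h>

/* Total the in-core pages of one process via its regions. */
long
resident_pages(struct proc *p)
{
	struct pregion *prp;
	long pages = 0;

	/* p_region points at the process's pregion table; an entry
	   with a null p_reg marks the end of the used slots. */
	for (prp = p->p_region; prp->p_reg; prp++)
		pages += prp->p_reg->r_nvalid;	/* valid == in core */

	return pages;
}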

						Andy

fox@marlow.uucp (Paul Fox) (11/13/88)

Well, the info about the region tables was exactly what I was after.
Top now displays the current resident working set (as derived from
the region table).

Now, is this a bug ? I've been trying to find out why my system
(and the ISC V.3 systems we use) is so slow compared to Xenix/386.

This is what I have done.

I have an editor, GRIEF, which edits files by reading them into memory.
I created a really big file (~1MB) and read it all in. This essentially
caused everything else to be swapped out.

I then repeat this on another screen, so that the first GRIEF swaps
out. 

I then go back to the original GRIEF and tell it to go to the middle
of the file. This should involve paging in most of the code (~100KB) +
a small fraction of the data file. However, what happens is that the
2nd GRIEF swaps out in its entirety and about 95% of the 1st one swaps
back in.

I looked thru my code to ensure that GRIEF isn't randomly walking all
over its virtual address map, and as far as I can see it doesn't.

(By the way this is a 4MB machine with 400 disk buffers.
The startup of /unix says I have avail memory = 2510848).

Anyway, so I decided to sdb GRIEF and watch what happens when it
tries to go to the middle of the file. (Going to the
middle of the file involves adding 'n' to the current line number, and
the display code working out where to find lines n..n+24).

If I single step GRIEF, then only about 10-20% of GRIEF swaps back in,
and the system performs nicely as expected.

The question is: why, when GRIEF runs at full speed, does the kernel
bring in the entire image ?

Another thing I did was to use crash to look at the regions
allocated to this process. From my understanding, a region is a description
of a contiguous piece of memory, in multiple-page units. I don't
know how V.3 'coalesces' pages into a region, but GRIEF has regions
with r_pgsz set to 30-70 pages. What I presume is that
when a page fault occurs, V.3 swaps in the entire region even though
only one or two pages may be needed. This may have been added to V.3
as an 'optimisation', for example if all the pages happen
to be in consecutive sectors in the swap space. If so, I think
this is bloody stupid.

I think the performance issue is that V.3 swaps in too much, which
defeats the whole object of virtual memory. The V.3 virtual memory
system behaves about 60-70% like a pure swapping system.

Is this a bug, or have ISC & Microport badly tuned the kernel ?

PS. Can somebody tell me what good values are for the page fault
handler's high and low water marks ? I have them set very close to
each other, to try to avoid the bdflush/vhand processes writing to
disk for long periods. However, I think setting the high water mark
higher might give a performance improvement.

Can somebody please respond. We don't have source code here, so
I can't look it up myself (I usually have to disassemble the kernel
to work out what's wrong).

Many thanks in advance.


=====================
     //        o      All opinions are my own.
   (O)        ( )     The powers that be ...
  /    \_____( )
 o  \         |
    /\____\__/        Tel: +44 628 891313 x. 212
  _/_/   _/_/         UUCP:     fox@marlow.uucp