[comp.sys.encore] Multimax thrashing

C.R.Ritson@newcastle.ac.uk (C.R. Ritson) (12/04/90)

We have an encore multimax that is overloaded.  We cannot at this time
afford more memory for it.

Every  few  days  at  a  time of high load, the physical memory in use
climbs from its normal 80-90% to 99% and stays there.  No  matter  how
hard  the system swaps (thrashes) it seems to be unable to do anything
to improve the situation.  The only solution is to reboot, and  assume
that this makes some users go away for a coffee.

The system parameters are largely set to their default values.  Does
anyone have any  hints  about  alternate  settings  that  improve  the
performance under high load?  Are there any known bugs in this area?

For your information, in case it might help, here is some information
about our system:

running O/S: Umax 4.3 (R4.1.0) APC NFS Fri Aug 24 15:17:15 1990

Megabytes of memory                                             64
Maximum number of CPUs configured                               8 (4 APCs)

Number of clock interrupts per second                           10
Maximum number of users                                         100
Maximum number of processes per user                            50
Ticks between process timeslice interrupts                      2000

Number of swap buffers                                          64
Microseconds of user virtual time between working set scans     2000000

Chris Ritson
--
PHONE: +44 91 222 8175              Computing Laboratory,
FAX  : +44 91 222 8232              University of Newcastle upon Tyne,
TELEX: uk+53654-UNINEW_G            UK, NE1 7RU

gharel@encore.com (Guy Harel) (12/12/90)

From article <1990Dec3.170300.14750@newcastle.ac.uk>, by C.R.Ritson@newcastle.ac.uk (C.R. Ritson):
> We have an encore multimax that is overloaded.  We cannot at this time
> afford more memory for it.
> 
Use 'sar -AO' to collect data. Trim down your disk buffers to a minimum.
Scan and flush more aggressively to free up pages more quickly. You should
see a net difference.

francis@cs.ua.oz.au (Francis Vaughan) (12/12/90)

In article <130064@infocenter.encore.com>, gharel@encore.com (Guy Harel)
writes:
|> From article <1990Dec3.170300.14750@newcastle.ac.uk>, by
|> C.R.Ritson@newcastle.ac.uk (C.R. Ritson):
|> > We have an encore multimax that is overloaded.  We cannot at this time
|> > afford more memory for it.
|> > 
|> Use 'sar -AO' to collect data. Trim down your disk buffers to a minimum.
|> Scan and flush more aggressively to free up pages more quickly. You should
|> see a net difference.

Que? What is this command 'sar'? It is not on our system nor in any man pages.
However, we are running UMAX 4.3 (R4_0.0); we've never seen 4_1.0 in Australia,
unlike the poster from the UK.

I wonder if some of the memory thrashing observed is due to the problem in
the memory manager that locks down copy-on-write pages. Any sign of this being
fixed? Our perception (on our machine anyway) is that this is costing us about
half the performance of our machine. We are not happy.
(Machine is 48MB, 4xXPC, EMC & MSC)

Encore memory is completely unaffordable and way over the odds on price per megabyte.
We can buy a complete dual processor Solbourne with disk and 64MB of memory
for the same price as Encore want for a single 64MB card. 

For further comparison, DG will sell us memory for their top-end 88000 machines
at less than half the price that Encore want.  They will also sell us a
complete dual-processor 88k machine for only a little more than an Encore 64MB
board; that's 48MB + 2GB disk, ready to go.  We can buy 450MB of 4MB SIMMs or
over 1200MB of 1MB SIMMs for the same money.

Where are the Encore 88k boards? Will Encore provide a sensible trade-in
agreement on 16MB cards for 64MB cards, since the 16MB cards won't work
with the 88k boards?


						Francis Vaughan.

(An otherwise happy Encore customer who believes that a good
company is going to drive itself down the gurgler if it doesn't
sharpen up its act.)

rmtodd@uokmax.ecn.uoknor.edu (Richard Michael Todd) (12/13/90)

francis@cs.ua.oz.au (Francis Vaughan) writes:

>Que? What is this command 'sar'? It is not on our system nor in any man pages.
>However, we are running UMAX 4.3 (R4_0.0); we've never seen 4_1.0 in Australia,
>unlike the poster from the UK.

I'd be rather surprised to see 'sar' show up in any 4.3 release, considering
sar is a System V command and reads all sorts of obscure little-documented
variables in the System V kernel that have no obvious counterparts in BSD.

(Disclaimer: I don't have any notion of what Encore may or may not do in
future releases; they may indeed get sar to work on their BSD release.  I
just know where 'sar' came from because I have a SysV system at home.)

>(An otherwise happy Encore customer who believes that a good
>company is going to drive itself down the gurgler if it doesn't
>sharpen up its act.)
No kidding....
-- 
Richard Todd   rmtodd@chinet.chi.il.us  or  rmtodd@uokmax.ecn.uoknor.edu  

ar@mcdd1 (Alastair Rae) (12/14/90)

francis@cs.ua.oz.au (Francis Vaughan) writes:

> ... 
> I wonder if some of the memory thrashing observed is due to the problem in
> the memory manager that locks down copy-on-write pages. Any sign of this being
> fixed? Our perception (on our machine anyway) is that this is costing us about
> half the performance of our machine. We are not happy. 
> ...

I hadn't heard of this problem but I'm very interested to find out more.
When you pay through the nose for a big *nix box, you expect big performance!
I've had lots of fun :-( trying to tune our box and had the feeling
that something was wrong somewhere.

Could you elaborate, please, Francis?

-- 
.--------------.--------------------.----------------.--------------------.
| Alastair Rae | uunet!ukc!mcdd1!ar | +44 442 272071 | *Usual disclaimer* |
`--------------^--------------------^----------------^--------------------'

francis@cs.ua.oz.au (Francis Vaughan) (12/20/90)

In article <142@mx-1>, ar@mcdd1 (Alastair Rae) writes:
|> francis@cs.ua.oz.au (Francis Vaughan) writes:
|> 
|> > ... 
|> > I wonder if some of the memory thrashing observed is due to the problem
|> > in the memory manager that locks down copy-on-write pages. Any sign of
|> > this being fixed? Our perception (on our machine anyway) is that this is
|> > costing us about half the performance of our machine. We are not happy.
|> > ...
|> 
|> I hadn't heard of this problem but I'm very interested to find out more.
|> When you pay through the nose for a big *nix box, you expect big performance!
|> I've had lots of fun :-( trying to tune our box and had the feeling
|> that something was wrong somewhere.
|> 
|> Could you elaborate, please, Francis?

No problem.

Most of this was covered in a posting from Gordon Irlam 
(gordoni@cs.adelaide.edu.au) on the 18th of September to comp.sys.encore. 
If anyone really wants the entire thing again I will either repost, if
there are a few requests, or individually forward it, if there are only a very 
few. The full posting includes a few ideas to help mitigate the impact and 
an example program to really screw your machine as well.

We reported this problem to Encore in April, and have heard nothing since. In 
conversation with our local software support people, I was told last week that
Encore was satisfied with the design of the memory system and had no intention
of fixing the problem. I would love to be told otherwise.

Interested folk with source should look in the routine ageregion_sh in the
file sys_x.x/sys/vm_pageout.c, where x.x is your release version.


This is a small precis.


Umax 4.3 release 4.0.0, and all previous releases of BSD Umax, contain
a serious bug in the virtual memory system that prevents it from being
able to page out pages of processes under certain commonly occurring
circumstances.  This degrades system performance or, equivalently,
increases the amount of physical memory needed to obtain a given level
of performance.  In more extreme cases it may cause severe performance
problems or even deadlock.

Umax 4.3 is not able to page out copy on write pages.  The meaning of
this and its ramifications are explained below.

When a process forks under Umax all of the modifiable pages of the
parent process are marked copy on write.  The same set of pages are
marked copy on write in the child process.  Because code pages are
read only they can be shared without being marked copy on write.
Marking a page copy on write means setting its protection to read
only; then, if a write to that page causes a translation fault, a
copy of the page is made, the protection on the page is set to
read-write, and the faulting instruction is re-executed.  Copy on
write pages minimize the cost of forking.

If when a copy on write fault occurs the copy on write page is no
longer shared with any other processes, say because the child has
exited, the page will be set to read-write without needing to make
a copy of the page.  Note that this final giving away of a copy on
write page is not performed as soon as the page becomes owned by a
single process, but only when the last owner of the page writes to it.
If the last owner never writes to the page it will remain copy on
write despite the fact that it is not shared with anyone else.
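
As a rough illustration (this is not Gordon's example program; the sizes and
names here are invented), a process of the troublesome kind looks something
like this -- it dirties a lot of data, forks, and from then on only reads
that data, so every one of those pages stays copy on write and, under the
bug, cannot be paged out:

    /*
     * Illustrative sketch only -- sizes and names are invented.
     */
    #include <sys/types.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define NPAGES 4096
    #define PAGESZ 4096

    int
    main(void)
    {
        char *data = malloc((size_t)NPAGES * PAGESZ);
        long  i;
        pid_t pid;

        if (data == NULL)
            return 1;

        /* Touch every page so it is resident and modifiable. */
        for (i = 0; i < (long)NPAGES * PAGESZ; i += PAGESZ)
            data[i] = 1;

        pid = fork();        /* all the parent's data pages become COW */
        if (pid == 0) {
            sleep(3600);     /* the child never writes the shared pages */
            _exit(0);
        }

        /*
         * The parent only reads from here on, so no COW fault ever
         * occurs and the pages are never handed back read-write.
         */
        for (;;) {
            long sum = 0;
            for (i = 0; i < (long)NPAGES * PAGESZ; i += PAGESZ)
                sum += data[i];
            printf("sum = %ld\n", sum);
            sleep(60);
        }
    }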

Fortunately many processes,
    1) do not fork, or
    2) fork but have a reasonably small amount of data, or
    3) shortly after forking both child and parent,
           a) exit, or
           b) exec, or
           c) modify nearly all their data pages, or
    4) only access a few data pages immediately prior to
       forking, and then only read a few data pages at any time
       subsequent to forking.

Those cases where these constraints are not met cause the most
problems, and to a certain extent case 4 can also cause problems.  In
case 4 where a process only touches a few pages immediately prior to
forking, if the system was heavily loaded at the time prior to the
fork, most pages will have been swapped out, and so will not end up
being locked down by the fork - unless they are subsequently read in.
But if the system was lightly loaded at the time of the fork then case
4 will still cause a large number of pages to be locked down.

Our experience is that we can not use much more swap space than twice
the physical memory on our machines, even though many of our processes
are idle for substantial periods of time.

We had considerable difficulty when we attempted to use a Multimax as
a server for a large number of X terminals.  The machine had
sufficient compute power, virtual memory, and physical memory for the
clients, but nearly all of the physical memory filled up with
non-pageable copy on write pages that weren't even being used.
Unfortunately the xterm binary was long lived and caused a large
number of pages to be locked down for long periods of time.

Identifying the problem is fairly easy.  Sysparam will be showing the
system paging heavily, but when you do a ps you will find some pages
of processes remain in memory, even when they are idle or stopped.  In
more severe cases all of the system's memory may end up becoming
non-pageable, preventing you from even being able to login.


						Francis Vaughan

C.R.Ritson@newcastle.ac.uk (C.R. Ritson) (12/20/90)

francis@cs.ua.oz.au (Francis Vaughan) writes:

     (in response to my complaint about a Multimax running Umax4.3
     R4.1.0 thrashing)...

>I wonder if some of the memory thrashing observed is due to the problem in
>the memory manager that locks down copy-on-write pages. Any sign of this being
>fixed? Our perception (on our machine anyway) is that this is costing us about
>half the performance of our machine. We are not happy.
>(Machine is 48MB, 4xXPC, EMC & MSC)

This did turn out to be the case.  We run X11R4, and before patching
xterm this was costing us 23 out of our 64 Mbytes as COW pages with,
by then, only one user left.  We have a patched xterm that has improved
this.  One problem is that although things are better, I have no idea
how many other utilities are locking down memory in this way.

If a page has only one user, why can't the COW status be switched
off?  This would improve things a lot, as there must be many programs
that fork and exec, but where the parent has a lot of data that it
does not modify.  Like all the shells perhaps?  I do not know if these
use vfork instead; I presume vfork is not causing similar problems.
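
As a rough illustration (the command run here is arbitrary, and this assumes
the classic 4.3BSD vfork behaviour), the two patterns look like this: with
fork() all of the parent's modifiable pages are marked COW for the brief
moment before the exec, whereas vfork() lends the parent's address space to
the child until the exec, so nothing is marked COW at all.

    /*
     * Illustrative sketch only -- the command run is arbitrary.
     */
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void
    run_with_fork(const char *cmd)
    {
        pid_t pid = fork();      /* parent's modifiable pages become COW */

        if (pid == 0) {
            execlp(cmd, cmd, (char *)NULL);
            _exit(127);          /* reached only if the exec fails */
        }
        if (pid > 0)
            waitpid(pid, (int *)NULL, 0);
    }

    static void
    run_with_vfork(const char *cmd)
    {
        pid_t pid = vfork();     /* child borrows the parent's address space */

        if (pid == 0) {
            execlp(cmd, cmd, (char *)NULL);
            _exit(127);
        }
        if (pid > 0)
            waitpid(pid, (int *)NULL, 0);
    }

    int
    main(void)
    {
        run_with_fork("date");
        run_with_vfork("date");
        return 0;
    }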

Chris Ritson.

gharel@encore.com (Guy Harel) (12/22/90)

From article <2166@sirius.ucs.adelaide.edu.au>, by francis@cs.ua.oz.au (Francis Vaughan):
> |> Use 'sar -AO' to collect data. Trim down your disk buffers to a minimum.
> |> Scan and flush more aggressively to free up pages more quickly. You should
> |> see a net difference.


My apologies! I spent a month tuning on UMAX V, but none on UMAX 4.3. I really
thought that 'sar' was available on 4.3.  It's a great tool and much more
professional than anything like it on BSD. I guess that system tuning and
trimming (to save on mem) could be equally well performed on BSD using:

	- pstat, to check on kernel data structure usage or non-usage
	- vmstat, to check on memory consumption and paging load
	- systat (diskstat?) to check on disk cache hit ratios

Guidelines to save on mem:

	1- reduce kernel tables to a minimum (users,inodes,sem..)
	2- exercise various 'flush' water marks and scan rates
	   (this is more for smoothing off thrashing effects)
	3- reduce cache size for disk to an acceptable minimum

Have fun (if you can afford it..)

paradis@acestes.UUCP (Jim Paradis) (12/27/90)

In article <2195@sirius.ucs.adelaide.edu.au> francis@cs.adelaide.edu.au writes:

>Umax 4.3 is not able to page out copy on write pages.  The meaning of
>this and its ramifications are explained below.
>
> [ very good explanation deleted ]

Francis' explanation is quite correct... but unfortunately a fix is not
trivial.  The reason for this is that when several processes reference
a copy-on-write page (e.g. a process forks off several children that
neither exit nor exec), then EACH process has a PTE that references either
the physical page in memory or (if the page were originally paged out when
we forked) the page on disk.  The problem is that there is no easy way
to propagate a change to one of these PTEs to all the other PTEs 
referencing the page.  Basically, there are no direct links from one
PTE sharing a page to another.  You CAN find them all, but it involves
rummaging through a LOT of kernel data structures and performing a lot
of tests... which means locking down a lot of stuff for a long time.
Since you can't cheaply find every affected PTE when you change (or remove)
its physical mapping, you can't reliably page a copy-on-write page out.
(Actually, you can probably get away with it if its reference count is
1; wonder what that optimization would buy you...)
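
(As a rough sketch, with invented names rather than the real Umax data
structures: the links run only from PTE to page, so finding every PTE that
maps a given shared page means searching the page tables of every process.)

    /* Invented names for illustration -- not the Umax data structures. */
    struct phys_page {
        int refcnt;                 /* how many PTEs map this page ...        */
                                    /* ... but no list of which PTEs they are */
    };

    struct pte {
        struct phys_page *page;     /* forward link only */
        unsigned          prot;     /* read-only while the page is COW */
    };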

Note that only if the page is already in physical memory will you end
up wiring down physical pages this way.  If you're so lucky as to be
swapped out 8-), then forking will only wire down the disk blocks on
the paging partition until the last one out turns out the lights...

-- 
Jim Paradis                  UUCP:  harvard!m2c!jjmhome!acestes!paradis
9 Carlstad St.               AT&T:  (508) 792-3810
Worcester, MA 01607-1569     ICBM:  42deg 13' 52",  71deg 47' 51"

lawley@cs.mu.OZ.AU (michael lawley) (12/29/90)

On 26 Dec 90 23:56:38 GMT,
paradis@acestes.UUCP (Jim Paradis) said:

	[stuff deleted]

> Note that only if the page is already in physical memory will you end
> up wiring down physical pages this way.  If you're so lucky as to be
> swapped out 8-), then forking will only wire down the disk blocks on
> the paging partition until the last one out turns out the lights...

Pages that are swapped out will become locked in physical memory as soon as
they are referenced.  So, if you don't reference the pages, you gain.
Otherwise, all you gain is a slight delay before you lose physical memory.

mike
--
 _--_|\		michael lawley (lawley@cs.mu.OZ.AU).
/      \	The Unicycling Systems Programmer,
\_.--.*/	Melbourne University, Computer Science
      v
	"She was the kind of woman who lived for others -
	 you could tell the others by their hunted look."  C.S.Lewis

alan@encore.encore.COM (Alan Langerman) (01/03/91)

In article <2195@sirius.ucs.adelaide.edu.au>, francis@cs.ua.oz.au (Francis Vaughan) writes:
|> This is a small precis.
|> 
|> 
|> Umax 4.3 release 4.0.0, and all previous releases of BSD Umax, contain
|> a serious bug in the virtual memory system that prevents it from being
|> able to page out pages of processes under certain commonly occurring
|> circumstances.

Please note that Encore's other operating system products (Mach, UmaxV)
do not suffer from this particular bug.  (Of course, they may suffer from
other bugs.)

For those who care, Encore Mach Release 1.0 looks to the user/programmer
like a 4.3BSD Tahoe system and reads/writes existing 4.3BSD filesystems.
It also has full server and client NFS.  However, it does NOT support
Umax4.3-isms like inq_stats, dread/dwrite, etc.  Release 1.0 is available now.

Alan Langerman
Mach Group, OSF/1 Project