C.R.Ritson@newcastle.ac.uk (C.R. Ritson) (12/04/90)
We have an Encore Multimax that is overloaded. We cannot at this time afford more memory for it. Every few days, at a time of high load, the physical memory in use climbs from its normal 80-90% to 99% and stays there. No matter how hard the system swaps (thrashes), it seems unable to do anything to improve the situation. The only solution is to reboot, and assume that this makes some users go away for a coffee. The system parameters are largely set to their default values. Does anyone have any hints about alternate settings that improve performance under high load? Are there any known bugs in this area?

For your information, in case it might help, here is some information about our system:

running O/S:                                  Umax 4.3 (R4.1.0) APC NFS Fri Aug 24 15:17:15 1990
Megabytes of memory                           64
Maximum number of CPUs configured             8 (4 APCs)
Number of clock interrupts per second         10
Maximum number of users                       100
Maximum number of processes per user          50
Ticks between process timeslice interrupts    2000
Number of swap buffers                        64
Microseconds of user virtual time between working set scans  2000000

Chris Ritson
--
PHONE: +44 91 222 8175    Computing Laboratory,
FAX  : +44 91 222 8232    University of Newcastle upon Tyne,
TELEX: uk+53654-UNINEW_G  UK, NE1 7RU
gharel@encore.com (Guy Harel) (12/12/90)
From article <1990Dec3.170300.14750@newcastle.ac.uk>, by C.R.Ritson@newcastle.ac.uk (C.R. Ritson):
> We have an encore multimax that is overloaded. We cannot at this time
> afford more memory for it.

Use 'sar -AO' to collect data. Trim your disk buffers down to a minimum. Scan and flush more aggressively to free up pages more quickly. You should see a net difference.
francis@cs.ua.oz.au (Francis Vaughan) (12/12/90)
In article <130064@infocenter.encore.com>, gharel@encore.com (Guy Harel) writes:
|> Use 'sar -AO' to collect data. Trim your disk buffers down to a minimum.
|> Scan and flush more aggressively to free up pages more quickly. You should
|> see a net difference.

Que? What is this command 'sar'? It is not on our system, nor in any man pages. However, we are running UMAX 4.3 (R4_0.0); we have never seen R4_1.0 in Australia, unlike the poster from the UK.

I wonder if some of the memory thrashing observed is due to the problem in the memory manager that locks down copy-on-write pages. Any sign of this being fixed? Our perception (on our machine, anyway) is that this is costing us about half the performance of our machine. We are not happy. (Machine is 48MB, 4xXPC, EMC & MSC.)

Encore memory is completely unaffordable and way over the odds in price/megabyte. We can buy a complete dual-processor Solbourne with disk and 64MB of memory for the same price as Encore wants for a single 64MB card. For further comparison, DG will sell us memory for their top-end 88000 machines at less than half the price that Encore wants. They will also sell us a complete dual-processor 88k machine for only a little more than an Encore 64MB board; that's 48MB + 2GB disk, ready to go. We can buy 450MB of 4MB SIMMs, or over 1200MB of 1MB SIMMs, for the same money.

Where are the Encore 88k boards? Will Encore provide a sensible trade-in agreement on 16MB cards for 64MB cards, since the 16MB cards won't work with the 88k boards?

Francis Vaughan.

(An otherwise happy Encore customer who believes that a good company is going to drive itself down the gurgler if it doesn't sharpen up its act.)
rmtodd@uokmax.ecn.uoknor.edu (Richard Michael Todd) (12/13/90)
francis@cs.ua.oz.au (Francis Vaughan) writes:
>Que? What is this command 'sar'? It is not on our system, nor in any man
>pages. However, we are running UMAX 4.3 (R4_0.0); we have never seen R4_1.0
>in Australia, unlike the poster from the UK.

I'd be rather surprised to see 'sar' show up in any 4.3 release, considering that sar is a System V command and reads all sorts of obscure, little-documented variables in the System V kernel that have no obvious counterparts in BSD. (Disclaimer: I don't have any notion of what Encore may or may not do in future releases; they may indeed get sar to work on their BSD release. I just know where 'sar' came from because I have a SysV system at home.)

>(An otherwise happy Encore customer who believes that a good company is
>going to drive itself down the gurgler if it doesn't sharpen up its act.)

No kidding....
--
Richard Todd   rmtodd@chinet.chi.il.us or rmtodd@uokmax.ecn.uoknor.edu
ar@mcdd1 (Alastair Rae) (12/14/90)
francis@cs.ua.oz.au (Francis Vaughan) writes:
> ...
> I wonder if some of the memory thrashing observed is due to the problem in
> the memory manager that locks down copy-on-write pages. Any sign of this
> being fixed? Our perception (on our machine, anyway) is that this is costing
> us about half the performance of our machine. We are not happy.
> ...

I hadn't heard of this problem, but I'm very interested to find out more. When you pay through the nose for a big *nix box, you expect big performance! I've had lots of fun :-( trying to tune our box, and had the feeling that something was wrong somewhere.

Could you elaborate, please, Francis?
--
.--------------.--------------------.----------------.--------------------.
| Alastair Rae | uunet!ukc!mcdd1!ar | +44 442 272071 | *Usual disclaimer* |
`--------------^--------------------^----------------^--------------------'
francis@cs.ua.oz.au (Francis Vaughan) (12/20/90)
In article <142@mx-1>, ar@mcdd1 (Alastair Rae) writes:
|> I hadn't heard of this problem but I'm very interested to find out more.
|> Could you elaborate, please, Francis?

No problem. Most of this was covered in a posting from Gordon Irlam (gordoni@cs.adelaide.edu.au) on the 18th of September to comp.sys.encore. If anyone really wants the entire thing again, I will either repost it, if there are a few requests, or forward it individually, if there are only a very few. The full posting includes a few ideas to help mitigate the impact, as well as an example program to really screw your machine.

We reported this problem to Encore in April, and have heard nothing since. In conversation with our local software support people, I was told last week that Encore was satisfied with the design of the memory system and had no intention of fixing the problem. I would love to be told otherwise.

Interested folk with source should look in the routine ageregion_sh in the file sys_x.x/sys/vm_pageout.c, where x.x is your release version.

What follows is a small precis.

Umax 4.3 release 4.0.0, and all previous releases of BSD Umax, contain a serious bug in the virtual memory system that prevents it from being able to page out pages of processes under certain commonly occurring circumstances. This degrades system performance, or, equivalently, increases the amount of physical memory needed to obtain a given level of performance.
In more extreme cases it may cause severe performance problems or even deadlock.

Umax 4.3 is not able to page out copy-on-write pages. The meaning of this and its ramifications are explained below.

When a process forks under Umax, all of the modifiable pages of the parent process are marked copy-on-write. The same set of pages are marked copy-on-write in the child process. Because code pages are read-only, they can be shared without being marked copy-on-write.

Marking a page copy-on-write means setting its protection to read-only; then, if a write to that page causes a translation fault, a copy of the page is made, the protection on the copy is set to read-write, and the faulting instruction is re-executed. Copy-on-write pages minimize the cost of forking.

If, when a copy-on-write fault occurs, the page is no longer shared with any other processes (say, because the child has exited), the page will be set to read-write without needing to make a copy of the page. Note that this final giving away of a copy-on-write page is not performed as soon as the page becomes owned by a single process, but only when the last owner of the page writes to it. If the last owner never writes to the page, it will remain copy-on-write despite the fact that it is not shared with anyone else.

Fortunately, many processes:

1) do not fork, or
2) fork, but have a reasonably small amount of data, or
3) shortly after forking, both child and parent
   a) exit, or
   b) exec, or
   c) modify nearly all their data pages, or
4) only access a few data pages immediately prior to forking, and then only read a few data pages at any time subsequent to forking.

Those cases where these constraints are not met cause the most problems, and to a certain extent case 4 can also cause problems.
In case 4, where a process only touches a few pages immediately prior to forking: if the system was heavily loaded at the time prior to the fork, most pages will have been swapped out, and so will not end up being locked down by the fork (unless they are subsequently read in). But if the system was lightly loaded at the time of the fork, then case 4 will still cause a large number of pages to be locked down.

Our experience is that we cannot use much more swap space than twice the physical memory on our machines, even though many of our processes are idle for substantial periods of time.

We had considerable difficulty when we attempted to use a Multimax as a server for a large number of X terminals. The machine had sufficient compute power, virtual memory, and physical memory for the clients, but nearly all of the physical memory filled up with non-pageable copy-on-write pages that weren't even being used. Unfortunately, the xterm binary was both long-lived and caused a large number of pages to be locked down for long periods of time.

Identifying the problem is fairly easy. Sysparam will show the system paging heavily, but when you do a ps you will find that some pages of processes remain in memory, even when those processes are idle or stopped. In more severe cases, all of the system's memory may end up becoming non-pageable, preventing you from even being able to log in.

Francis Vaughan
C.R.Ritson@newcastle.ac.uk (C.R. Ritson) (12/20/90)
francis@cs.ua.oz.au (Francis Vaughan) writes (in response to my complaint about a Multimax running Umax 4.3 R4.1.0 thrashing):

>I wonder if some of the memory thrashing observed is due to the problem in
>the memory manager that locks down copy-on-write pages. Any sign of this
>being fixed? Our perception (on our machine anyway) is that this is costing
>us about half the performance of our machine. We are not happy.
>(Machine is 48MB, 4xXPC, EMC & MSC)

This did turn out to be the case. We run X11R4, and before patching xterm this was costing us 23 out of our 64 Mbytes as COW pages, with by now only one user. We have a patched xterm that has improved this. One problem is that, although things are better, I have no idea how many other utilities are locking down memory in this way.

If a page has only one user, why can't the COW status be switched off? This would improve things a lot, as there must be many programs that fork and exec, but where the parent has a lot of data that it does not modify. Like all the shells, perhaps? I do not know if these use vfork instead; I presume vfork is not causing similar problems.

Chris Ritson.
gharel@encore.com (Guy Harel) (12/22/90)
From article <2166@sirius.ucs.adelaide.edu.au>, by francis@cs.ua.oz.au (Francis Vaughan):
> |> Use 'sar -AO' to collect data. Trim your disk buffers down to a minimum.
> |> Scan and flush more aggressively to free up pages more quickly. You
> |> should see a net difference.

My apologies! I spent a month tuning on UMAX V, but none on UMAX 4.3. I really thought that 'sar' was available on 4.3. It's a great tool, and much more professional than anything like it on BSD.

I guess that system tuning and trimming (to save on memory) could be equally performed on BSD using:

- pstat, to check on kernel data structure usage or non-usage
- vmstat, to check on memory consumption and paging load
- systat (diskstat?), to check on disk cache hit ratios

Guidelines to save on memory:

1- reduce kernel tables to a minimum (users, inodes, sem, ...)
2- exercise various 'flush' water marks and scan rates (this is more for smoothing off thrashing effects)
3- reduce the disk cache size to an acceptable minimum

Have fun (if you can afford it..)
paradis@acestes.UUCP (Jim Paradis) (12/27/90)
In article <2195@sirius.ucs.adelaide.edu.au> francis@cs.adelaide.edu.au writes:
>Umax 4.3 is not able to page out copy-on-write pages. The meaning of
>this and its ramifications are explained below.
>
> [ very good explanation deleted ]

Francis' explanation is quite correct... but unfortunately a fix is not trivial. The reason is that when several processes reference a copy-on-write page (e.g. a process forks off several children that neither exit nor exec), EACH process has a PTE that references either the physical page in memory or (if the page was paged out when we forked) the page on disk. The problem is that there is no easy way to propagate a change to one of these PTEs to all the other PTEs referencing the page. Basically, there are no direct links from one PTE sharing a page to another. You CAN find them all, but it involves rummaging through a LOT of kernel data structures and performing a lot of tests... which means locking down a lot of stuff for a long time. Since you can't find every affected PTE when you change (or remove) its physical mapping, you can't reliably page out a copy-on-write page. (Actually, you can probably get away with it if its reference count is 1; wonder what that optimization would buy you...)

Note that only if the page is already in physical memory will you end up wiring down physical pages this way. If you're so lucky as to be swapped out 8-), then forking will only wire down the disk blocks on the paging partition until the last one out turns out the lights...
--
Jim Paradis                 UUCP: harvard!m2c!jjmhome!acestes!paradis
9 Carlstad St.              AT&T: (508) 792-3810
Worcester, MA 01607-1569    ICBM: 42deg 13' 52", 71deg 47' 51"
lawley@cs.mu.OZ.AU (michael lawley) (12/29/90)
On 26 Dec 90 23:56:38 GMT, paradis@acestes.UUCP (Jim Paradis) said:
[stuff deleted]
> Note that only if the page is already in physical memory will you end
> up wiring down physical pages this way. If you're so lucky as to be
> swapped out 8-), then forking will only wire down the disk blocks on
> the paging partition until the last one out turns out the lights...

Pages that are swapped out will become locked in physical memory as soon as they are referenced. So, if you don't reference the pages, you gain. Otherwise, all you gain is a slight delay before you lose physical memory.

mike
--
 _--_|\   michael lawley (lawley@cs.mu.OZ.AU).
/      \  The Unicycling Systems Programmer,
\_.--.*/  Melbourne University, Computer Science
      v
"She was the kind of woman who lived for others - you could tell the
others by their hunted look."  C.S. Lewis
alan@encore.encore.COM (Alan Langerman) (01/03/91)
In article <2195@sirius.ucs.adelaide.edu.au>, francis@cs.ua.oz.au (Francis Vaughan) writes:
|> Umax 4.3 release 4.0.0, and all previous releases of BSD Umax, contain
|> a serious bug in the virtual memory system that prevents it from being
|> able to page out pages of processes under certain commonly occurring
|> circumstances.

Please note that Encore's other operating system products (Mach, UmaxV) do not suffer from this particular bug. (Of course, they may suffer from other bugs.)

For those who care, Encore Mach Release 1.0 looks to the user/programmer like a 4.3BSD Tahoe system and reads/writes existing 4.3BSD filesystems. It also has full server and client NFS. However, it does NOT support Umax4.3-isms like inq_stats, dread/dwrite, etc. Release 1.0 is available now.

Alan Langerman
Mach Group, OSF/1 Project