francis@chook.ua.oz.au (Francis Vaughan) (07/06/90)
In article <1990Jul06.003628.2633@esegue.segue.boston.ma.us>,
johnl@esegue.segue.boston.ma.us (John R. Levine) writes:

|> I don't get what this argument is about.  Every machine I've ever seen that
|> has regular (as opposed to inverted) page tables pages the page tables.
|> Some, like the Vax, do it explicitly by embedding the page table in another
|> pagable address space.  Most others, such as the IBM 370 and the Intel 386,
|> get exactly the same effect with two-level page tables.  In both cases, the
|> page table is broken up into chunks (which always seem to have a size of one
|> page) and each chunk can be marked resident or non-resident independently.

Umm, no.  Try the Sun SPARC MMU.  SunOS pages its copies of the page maps
(called pmegs), but the hardware keeps its own local copy of a subset of
the tables; how much depends on the machine.  SunOS maintains the illusion
of pageable page tables, but the underlying hardware is very different.

Each pmeg holds 32 entries (or, on the SS-1, 64 entries), where each entry
refers to a page.  SS-1s have MMU memory for only 128 pmegs, or 64*128 =
8192 page entries.  That means the hardware can only hold entries for 32
megabytes of virtual memory, aggregated over all processes running on the
machine.  Since that is allocated in chunks of 64*4KB = 256KB, and you
need a separate chunk (therefore a pmeg) for data, bss, and code no matter
how little of each is used, the system rapidly runs out of pmegs.

If the OS needs a pmeg and the hardware table is full, it selects one that
is in use and trashes it.  This is done by a software cache manager that
copies the kernel's version of the table into the hardware table, and you
take quite a performance hit when it happens.  The crazy thing is that
SunOS places all the pages referred to by the reused pmeg onto the free
list.  If a process has all of its pmegs reused, then all of its pages are
marked as free, and it gets marked for eager swapping by the kernel.  This
can result in processes that are actually entirely memory resident being
swapped.
An easy way to tell is when the free list is huge but the system is
thrashing.  The hit for SS-1s (and this includes SS-1+s) is that you don't
get the sort of performance improvement you would expect when adding much
more than about 16MB of memory, especially if there is little image
sharing.  The extra memory is essentially used as a disk cache.  The
bigger machines (especially the 490) have bigger slabs of MMU memory and
do not suffer as much (the 490 has 1024 pmegs of 32 entries each).

Currently I do not know whether Sun intends to fix the problem in SunOS
(it has been colloquially known as the pmeg leakage problem), but they
have announced a new product that may help: the Sun Database Excelerator.

" This product features: IMPROVED MEMORY MANAGEMENT; ..."

" This easy to install unbundled software dramatically increases Sun
SPARCserver performance with database applications ..."

"... increased system throughput and improved response times, particularly
under heavy system loads.  Maximum throughput increases up to 50% while
supporting up to five times as many concurrent users."

						Francis Vaughan
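The pmeg arithmetic in the post above can be sketched in a few lines.  The
figures (4KB pages, 64 entries per pmeg, 128 pmegs on an SS-1, three
segments per process) all come from the post; the three-pmeg minimum per
process is the lower bound the post implies for a process with separate
code, data, and bss chunks.

```python
# Back-of-the-envelope pmeg arithmetic for an SS-1, per the figures above.
PAGE_SIZE = 4 * 1024          # 4KB pages
ENTRIES_PER_PMEG = 64         # 64 entries per pmeg on the SS-1
NUM_PMEGS = 128               # SS-1 MMU memory holds 128 pmegs

entries = ENTRIES_PER_PMEG * NUM_PMEGS       # total hardware page entries
mapped = entries * PAGE_SIZE                 # total mappable virtual memory
chunk = ENTRIES_PER_PMEG * PAGE_SIZE         # granularity of one pmeg

print(entries)              # 8192 page entries
print(mapped // 2**20)      # 32 (megabytes, aggregate for all processes)
print(chunk // 2**10)       # 256 (KB mapped per pmeg)

# Every process needs at least one pmeg apiece for code, data, and bss,
# so even tiny processes each pin three pmegs:
MIN_PMEGS_PER_PROCESS = 3
print(NUM_PMEGS // MIN_PMEGS_PER_PROCESS)    # ~42 small processes exhaust the MMU
```

At roughly 42 minimal processes the hardware table is full and the
software cache manager starts recycling pmegs, which is where the trashing
described above begins.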
pcg@cs.aber.ac.uk (Piercarlo Grandi) (07/07/90)
In article <1122@sirius.ucs.adelaide.edu.au> francis@chook.ua.oz.au
(Francis Vaughan) writes:

   [ ... on the SPARC MMU and TLB, and paging page tables ... ]

   Each pmeg holds 32 entries (or on the SS-1 64 entries) where each
   entry refers to a page.  SS-1s have MMU memory for only 128 pmegs or
   64*128 = 8192 page entries.  That means it can only hold entries for
   32 megabytes of virtual memory aggregate for all processes running on
   the machine.

That would be entirely adequate if this were a TLB; a TLB with 8192
entries would be quite large, actually.  The catch, of course, is the
granularity of this TLB:

   Since that is in chunks of 64*4KB = 256KB and you need a separate
   chunk (therefore pmeg) for data, bss, code, no matter how little is
   used the system rapidly runs out of pmegs.

The Sun-3s were even worse: they cached entire page tables in one of
typically eight slots, so if there were more than eight active address
spaces you had the page tables (several dozen kilobytes -- 256MB of 8KB
pages) swapped to/from memory.

   If the OS needs a pmeg and the hardware table is full, it selects a
   used one and trashes it.  This is done with a software cache manager
   that copies the kernel version of the table into the hardware table.
   You take quite a performance hit when this happens.  The crazy thing
   is that SunOS places all the pages referred to by the reused pmeg onto
   the free list.

I don't know whether this is still true, but SunOS used to have a
round-robin scheduler, i.e. FIFO, with a page table cache replacement
algorithm approximating LRU, i.e. LIFO.  It is well known that LIFO
replacement with a FIFO access pattern results in a miss on every single
access as soon as the length of the FIFO exceeds the depth of the LIFO.
This resulted (and probably still results) in the wonderful phenomenon,
easily observed, that with 9 active address spaces and an 8-context
cache, a cache miss occurred on every single address space activation.
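Grandi's pathology is easy to demonstrate.  The sketch below (names and
structure are illustrative, not SunOS code) runs an LRU cache against a
strict round-robin context-switch pattern: with as many contexts as slots
there are only cold misses, but one extra context makes every access miss.

```python
# LRU replacement vs. a round-robin (cyclic/FIFO) access pattern.
from collections import OrderedDict

def misses(num_contexts, cache_slots, rounds=10):
    """Return (miss_count, total_accesses) for round-robin context
    switching over num_contexts with an LRU cache of cache_slots."""
    cache = OrderedDict()          # kept in order, oldest first
    miss_count = total = 0
    for _ in range(rounds):
        for ctx in range(num_contexts):     # round-robin scheduling
            total += 1
            if ctx in cache:
                cache.move_to_end(ctx)      # hit: mark most recently used
            else:
                miss_count += 1
                if len(cache) >= cache_slots:
                    cache.popitem(last=False)   # evict least recently used
                cache[ctx] = True
    return miss_count, total

print(misses(8, 8))   # (8, 80): only the 8 cold misses
print(misses(9, 8))   # (90, 90): a miss on every single activation
```

When context k comes around again, the 8 most recently used entries are
exactly the other contexts, so k has always just been evicted: the
left-hand/right-hand mismatch in one picture.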
A wonderful example of the left hand not knowing what the right hand is
up to.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
renglish@hpcupt1.HP.COM (Robert English) (07/07/90)
> / lkaplan@bbn.com (Larry Kaplan) / 6:49 am  Jul 6, 1990 /
> The idea behind copy-on-write is that a proper fork() requires NO EXTRA
> physical memory for the child process (except that required for kernel
> data structures and page tables).  You probably end up needing one stack
> page pretty soon.  Virtual memory is VIRTUAL.  Who cares if you were
> running out in one process?  You get an entire new address space in the
> next.  This implementation of fork() is what fork() was always intended
> to be, as far as I can tell.  By doing fork() correctly, the need for a
> separate vfork() disappears (as stated in the BSD man pages).

Ain't necessarily so, for a couple of reasons.

First, while vfork() was a hack intended to get around the absence of a
copy-on-write scheme, it has different semantics from a regular fork(),
and those semantics can sometimes be useful.  /bin/csh, for example, takes
advantage of them.  Such use may not be wise, but it does exist, and if
you just get rid of vfork(), there will be programs that break.  Not many,
but some.

Second, and more important from an architectural point of view, vfork-exec
is usually faster than fork-exec, even when copy-on-write is implemented.
Whenever you fork a process, you have to copy its data structures.  For a
large process, the work of setting up an entire virtual memory structure
for a process and then immediately tearing it down can be significant,
even when the process's data is not actually copied.  While one could
design a VM system so that even that work was done lazily, that would
entail a lot of complexity just to avoid implementing vfork().  It doesn't
quite sound worth it to me, but perhaps that's just prejudice.

Finally, all of this discussion misses the point that the normal
performance-critical sequence is a fork followed by some kernel data
structure manipulations and an exec, and that both performance and purity
could be achieved by providing a single system call that performs the
whole sequence.
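The second point, the bookkeeping cost of copy-on-write fork, can be shown
with a toy model.  Everything below is illustrative (these are not real
kernel data structures): a copy-on-write fork copies no data pages, but it
still visits every page-table entry to duplicate it and downgrade it to
read-only, while vfork simply lends the parent's address space to the
child.

```python
# Toy model of the fork-vs-vfork cost argument; names are hypothetical.

class AddressSpace:
    def __init__(self, num_pages):
        # one bookkeeping entry per page: (frame_number, writable)
        self.pages = [(frame, True) for frame in range(num_pages)]

def cow_fork(parent):
    """Copy-on-write fork: no data pages are copied, but every page-table
    entry is copied and marked read-only in both address spaces."""
    child = AddressSpace(0)
    child.pages = [(frame, False) for frame, _ in parent.pages]
    parent.pages = [(frame, False) for frame, _ in parent.pages]
    return child, len(child.pages)       # work done is O(pages)

def vfork(parent):
    """vfork: the child borrows the parent's address space outright."""
    return parent, 1                     # work done is O(1)

big = AddressSpace(250_000)              # ~1GB of 4KB pages
_, fork_work = cow_fork(big)
_, vfork_work = vfork(big)
print(fork_work, vfork_work)             # 250000 1
```

If the child immediately execs, all of that O(pages) setup in cow_fork is
torn down again unused, which is exactly the waste English describes.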
--bob--
renglish@hpda
mark@parc.xerox.com (Mark Weiser) (07/07/90)
In article <1122@sirius.ucs.adelaide.edu.au> francis@chook.ua.oz.au
(Francis Vaughan) writes:
>...
>The hit for SS-1s (and this includes SS-1+s) is that you don't get the sort
>of improvement you would expect in performance when adding much more than
>about 16MB of memory, especially if there is little image sharing.

Good message.  But you say that the problem is especially bad if there is
little image sharing.  It turns out this is irrelevant, or even backwards.
SunOS does not recognize the sharing in its pmeg algorithms, so sharing
does not help, and it can make things worse by causing many more little
pmegs to be allocated, because the sharing forces many different addresses
to be in use all over the place.  We have a program which shows pmeg use
by process, and shared libraries make the pmeg problem much worse: 3-4
pmegs for the shared libraries, replicated for every copy of the shared
library in use (sharing not recognized, remember?).

>" This product features: IMPROVED MEMORY MANAGEMENT; ..."
>" This easy to install unbundled software dramaticly increases Sun
>SPARCserver performance with database applications ..."
>"...increased system throughput and improved response times particularly
>under heavy system loads.  Maximum throughput increases up to 50% while
>supporting up to five times as many concurrent users."

I certainly hope Sun does something about this, and not as an extra
product but as a bug fix available to all.  Selling a 15-MIPS machine that
slows down to 4 MIPS when using more than about 16MB of memory is
certainly a bug!

-mark
--
Spoken: Mark Weiser	ARPA: weiser@xerox.com	Phone: +1-415-494-4406
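Weiser's observation also reduces to simple arithmetic.  Using the
figures from his post (3-4 pmegs per shared-library copy, unrecognized
sharing) and the SS-1's 128 pmegs from Vaughan's post, the per-process
replication alone can eat the whole MMU; the process counts below are
illustrative, not measurements.

```python
# Cost of unshared "shared" library mappings, per the figures above.
PMEGS_PER_SHLIB_COPY = 4      # upper end of the observed 3-4 pmegs
TOTAL_PMEGS = 128             # SS-1 MMU capacity

# Since SunOS replicates the mapping for every process, the cost
# scales linearly with the number of processes using the libraries:
for nproc in (10, 20, 32):
    used = nproc * PMEGS_PER_SHLIB_COPY
    pct = 100 * used // TOTAL_PMEGS
    print(f"{nproc} processes -> {used} pmegs ({pct}% of the MMU)")
```

At 32 processes the shared libraries alone account for every pmeg on an
SS-1, before any code, data, or bss segments are mapped at all.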