[comp.arch] Paging page tables & SUN4s

francis@chook.ua.oz.au (Francis Vaughan) (07/06/90)

In article <1990Jul06.003628.2633@esegue.segue.boston.ma.us>,
johnl@esegue.segue.boston.ma.us (John R. Levine) writes:
|>I don't get what this argument is about.  Every machine I've ever seen that
|>has regular (as opposed to inverted) page tables pages the page tables.
|>Some, like the Vax, do it explicitly by embedding the page table in another
|>pagable address space.  Most others, such as the IBM 370 and the Intel 386,
|>get exactly the same effect with two-level page tables.  In both cases, the
|>page table is broken up into chunks (which always seem to have a size of one
|>page) and each chunk can be marked resident or non-resident independently.

Umm, no. Try the SUN Sparc MMU. SUN-OS pages its copies of the page maps
(called pmegs). However, the hardware keeps its own local copy of a subset of
the tables; how much depends on the machine. SUN-OS maintains the illusion
of pageable page tables, but the underlying hardware is very different.

Each pmeg holds 32 entries (or on the SS-1, 64 entries), where each entry
refers to a page. SS-1s have MMU memory for only 128 pmegs, or 64*128 = 8192
page entries. That means the MMU can only hold entries for 32 megabytes of
virtual memory in aggregate, across all processes running on the machine.
Since that mapping comes in chunks of 64*4KB = 256KB, and you need a separate
chunk (therefore pmeg) for data, bss and code no matter how little of each is
used, the system rapidly runs out of pmegs.
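
To make the arithmetic concrete, here is a small stand-alone C calculation
using the SS-1 figures quoted above (the constants are the numbers from this
post, not values probed from real hardware):

#include <stdio.h>

/* SS-1 figures as quoted above: 128 pmegs of MMU memory, 64 page
 * entries per pmeg, 4KB pages.  Purely illustrative numbers. */
#define NPMEGS            128
#define ENTRIES_PER_PMEG   64
#define PAGE_SIZE        (4 * 1024)

int main(void)
{
    long entries = (long)NPMEGS * ENTRIES_PER_PMEG;        /* 8192        */
    long chunk   = (long)ENTRIES_PER_PMEG * PAGE_SIZE;     /* 256KB/pmeg  */
    long total   = entries * PAGE_SIZE;                    /* 32MB total  */

    printf("page entries in MMU memory: %ld\n", entries);
    printf("virtual memory per pmeg:    %ldKB\n", chunk / 1024);
    printf("total mappable at once:     %ldMB\n", total / (1024 * 1024));
    return 0;
}

Since a process needs at least separate pmegs for text, data and bss, even a
trivial process ties up three or more slots, so on the order of forty
processes are enough to exhaust the 128 pmegs regardless of how little memory
they actually touch.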

If the OS needs a pmeg and the hardware table is full, it selects a used
one and trashes it. This is done with a software cache manager that copies
the kernel version of the table into the hardware table. You take quite a
performance hit when this happens. The crazy thing is that SUN-OS places all
the pages referred to by the reused pmeg onto the free list. If a process
gets all of its pmegs reused, then all of its pages are marked as free and
it gets marked for eager swapping by the kernel. This can result in
processes that are actually entirely memory resident being swapped. An
easy way to tell is when the free list is huge but the system is thrashing.
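
For illustration only, here is a compilable sketch of the steal-and-reload
behaviour just described. It is not SunOS source; every name and size in it
is made up, and it only mirrors the steps in the text: pick a victim slot,
dump every page it mapped onto the free list, then reload the slot from the
kernel's pageable copy of the map.

#include <stdio.h>
#include <string.h>

#define NENT 8                     /* entries per pmeg (64 on an SS-1)  */

struct pmeg {
    int owner;                     /* owning process id                 */
    int page[NENT];                /* mapped physical page, -1 if empty */
};

static void page_free(int page)    /* stand-in for the real free list   */
{
    printf("  page %d -> free list\n", page);
}

/* Give pmeg 'victim' to process 'pid', loading 'kernel_copy' into it. */
static void steal_pmeg(struct pmeg *victim, int pid, const int *kernel_copy)
{
    /* The step the article complains about: every page the victim
     * mapped is freed, even if its owner is entirely resident and
     * will fault the pages straight back in. */
    for (int i = 0; i < NENT; i++)
        if (victim->page[i] != -1)
            page_free(victim->page[i]);

    memcpy(victim->page, kernel_copy, sizeof victim->page);
    victim->owner = pid;
}

int main(void)
{
    struct pmeg slot = { .owner = 1,
                         .page  = { 10, 11, -1, -1, 12, -1, -1, -1 } };
    int new_map[NENT] = { 40, 41, 42, -1, -1, -1, -1, -1 };

    steal_pmeg(&slot, 2, new_map);   /* process 2 evicts process 1 */
    return 0;
}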
                  
The hit for SS-1s (and this includes SS-1+s) is that you don't get the sort
of improvement you would expect in performance when adding much more than
about 16MB of memory, especially if there is little image sharing. Extra
memory is essentially used as a disk cache. 

The bigger machines (especially the 490) have bigger slabs of MMU memory 
and do not suffer as much (the 490 has 1024 pmegs of 32 entries each).

Currently I do not know whether SUN intends to fix the problem in SUN-OS
(it has become colloquially known as the pmeg leakage problem); however,
they have announced a new product that may help: the Sun Database Excelerator.

" This product features: IMPROVED MEMORY MANAGEMENT; ..."
" This easy to install unbundled software dramaticly increases Sun 
SPARCserver performance with database applications ..."
"...increased system throughput and improved response times particularly 
under  heavy system loads. Maximum throughput increases up to 50% while 
supporting up to five times as many concurrent users."


						Francis Vaughan

pcg@cs.aber.ac.uk (Piercarlo Grandi) (07/07/90)

In article <1122@sirius.ucs.adelaide.edu.au> francis@chook.ua.oz.au
(Francis Vaughan) writes:

	[ ... on the SPARC MMU and TLB, and paging page tables ... ]

   Each pmeg holds 32 entries (or on the SS-1, 64 entries) where each
   entry refers to a page. SS-1s have MMU memory for only 128 pmegs or
   64*128 = 8192 page entries. That means it can only hold entries for
   32 megabytes of virtual memory in aggregate for all processes running
   on the machine.

That would be entirely adequate if this were a TLB; a TLB with 8192
entries would be quite large, actually. The catch, of course, is the
granularity of this TLB:

   Since that is in chunks of 64*4KB = 256KB and you need a separate
   chunk (therefore pmeg) for data, bss, code, no matter how little is
   used the system rapidly runs out of pmegs.

The Sun-3s were even worse: they cached entire page tables in one of
typically eight slots; if there were more than eight active address
spaces you had the page tables (several dozen kilobytes -- 256MB of 8KB
pages) swapped to/from memory.

   If the OS needs a pmeg and the hardware table is full, it selects a
   used one and trashes it. This is done with a software cache manager
   that copies the kernel version of the table into the hardware table.
   You take quite a perfomance hit when this happens. The crazy thing
   is that SUN-OS places all the pages refered to by the reused pmeg
   onto the free list.

I don't know whether this is still true, but SunOS used to have a round
robin scheduler, i.e. FIFO, with a page table cache replacement
algorithm approximating LRU, i.e. LIFO.

It is well known that LIFO replacement with a FIFO access pattern
results in a miss on every single access as soon as the length of the
FIFO exceeds the depth of the LIFO. This resulted (and probably still
results) in the wonderful phenomenon, easily observed, that with 9
active address spaces and an 8-context cache, a cache miss occurred on
every single address space activation.

A wonderful example of the left hand not knowing what the right hand is
up to.
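
For the curious, here is a small self-contained simulation (mine, not
anything out of SunOS) of exactly this pathology: an 8-slot LRU cache walked
round-robin by 9 address spaces misses on every single activation, while 8
address spaces stop missing after warm-up.

#include <stdio.h>

/* Toy simulation of an 8-slot context cache with LRU replacement
 * driven by N address spaces activated round-robin.  Purely
 * illustrative; not SunOS code. */

#define SLOTS 8

static int slot_ctx[SLOTS];   /* which context occupies each slot     */
static int slot_age[SLOTS];   /* last activation tick that used it    */

static int activate(int ctx, int tick)
{
    int i, lru = 0;

    for (i = 0; i < SLOTS; i++)
        if (slot_ctx[i] == ctx) {           /* hit: just refresh age    */
            slot_age[i] = tick;
            return 0;
        }

    for (i = 1; i < SLOTS; i++)             /* miss: evict the LRU slot */
        if (slot_age[i] < slot_age[lru])
            lru = i;
    slot_ctx[lru] = ctx;
    slot_age[lru] = tick;
    return 1;
}

static int run(int nctx, int activations)
{
    int misses = 0, t;

    for (t = 0; t < SLOTS; t++) {           /* start with an empty cache */
        slot_ctx[t] = -1;
        slot_age[t] = -1;
    }
    for (t = 0; t < activations; t++)
        misses += activate(t % nctx, t);
    return misses;
}

int main(void)
{
    printf("8 address spaces, 1000 activations: %d misses\n", run(8, 1000));
    printf("9 address spaces, 1000 activations: %d misses\n", run(9, 1000));
    return 0;
}
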
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

renglish@hpcupt1.HP.COM (Robert English) (07/07/90)

> / lkaplan@bbn.com (Larry Kaplan) /  6:49 am  Jul  6, 1990 /

> The idea behind 
> copy-on-write is that a proper fork() requires NO EXTRA physical memory for 
> the child process (except that required for kernel data structures and page 
> tables).  You probably end up needing one stack page pretty soon.  Virtual 
> memory is VIRTUAL.  Who cares if you were running out in one process?  You get
> an entire new address space in the next.  This implementation of fork() is
> what fork() was always intended to be, as far as I can tell.  By doing fork()
> correctly, the need for a separate vfork() disappears (as stated in the BSD
> man pages).

Ain't necessarily so, for a couple of reasons.

First, while vfork() was a hack intended to get around the absence of a
copy-on-write scheme, it has different semantics from a regular fork(),
and those semantics can sometimes be useful.  /bin/csh, for example,
takes advantage of them.  Such use may not be wise, but it does exist,
and if you just get rid of vfork(), there will be programs that break.
Not many, but some.

Second, and more important from an architectural point of view,
vfork-exec is usually faster than fork-exec, even when copy-on-write
is implemented.  Whenever you fork a process, you have to copy its data
structures.  For a large process, the work of setting up an entire
virtual memory structure for a process and then immediately tearing it
down can be significant, even when the process's data is not actually
copied.  While one could design a VM system so that even that work was
done lazily, that would entail a lot of complexity just to avoid
implementing vfork().  It doesn't quite sound worth it to me, but
perhaps that's just prejudice.
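
For concreteness, the usual shape of the vfork-exec path is something like
the sketch below (error handling trimmed; /bin/ls is just a placeholder
command):

#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>

/* Typical vfork-exec idiom.  The child borrows the parent's address
 * space, so no page tables or VM structures are copied or torn down;
 * the parent is suspended until the child execs or exits.  The child
 * must do nothing but exec (or _exit) before then. */
int main(void)
{
    pid_t pid = vfork();

    if (pid == 0) {                          /* child */
        execl("/bin/ls", "ls", "-l", (char *)0);
        _exit(127);                          /* exec failed: _exit, not exit */
    } else if (pid > 0) {                    /* parent */
        int status;
        waitpid(pid, &status, 0);
    } else {
        perror("vfork");
        return 1;
    }
    return 0;
}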

Finally, all of this discussion misses the point that the normal
performance-critical sequence is a fork followed by some kernel data
structure manipulations and an exec, and that both performance and
purity could be achieved by providing a single system call that performs
the whole sequence.
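
Purely as a sketch of the idea, such a combined call might be declared along
these lines; the name spawnve() and its argument list are invented here and
are not any existing interface:

#include <sys/types.h>

/* Hypothetical interface only: one trap that performs the fork, the
 * usual file-descriptor fiddling, and the exec, so the kernel never
 * builds and immediately tears down a throwaway copy of the parent's
 * address space. */
struct spawn_ops {
    int stdin_fd, stdout_fd, stderr_fd;   /* descriptors to install      */
    const char *cwd;                      /* working directory, or NULL  */
};

pid_t spawnve(const char *path, char *const argv[], char *const envp[],
              const struct spawn_ops *ops);

/* A caller would then replace the whole fork/dup2/exec dance with a
 * single call:
 *
 *     struct spawn_ops ops = { in_fd, out_fd, 2, NULL };
 *     pid_t pid = spawnve("/bin/sort", argv, envp, &ops);
 */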

--bob--
renglish@hpda

mark@parc.xerox.com (Mark Weiser) (07/07/90)

In article <1122@sirius.ucs.adelaide.edu.au> francis@chook.ua.oz.au (Francis Vaughan) writes:

>...
>The hit for SS-1s (and this includes SS-1+s) is that you don't get the sort
>of improvement you would expect in performance when adding much more than
>about 16MB of memory, especially if there is little image sharing.

Good message.  But you say that the problem is especially bad if there
is little image sharing.  It turns out this is irrelevant, or actively
harmful.  SunOS does not recognize the sharing in its pmeg algorithms,
so sharing does not help.  And it can make things worse by causing many
more little pmegs to be allocated, because the sharing forces many
different addresses to be in use all over the place.  We have a program
which shows pmeg use by process, and shared libraries make the pmeg
problem much worse: 3-4 pmegs for the shared libraries, replicated for
every copy of the shared library in use (sharing not recognized,
remember?).

>" This product features: IMPROVED MEMORY MANAGEMENT; ..."
>" This easy to install unbundled software dramaticly increases Sun 
>SPARCserver performance with database applications ..."
>"...increased system throughput and improved response times particularly 
>under  heavy system loads. Maximum throughput increases up to 50% while 
>supporting up to five times as many concurrent users."

I certainly hope Sun does something about this, and not as an extra
product but as a bug fix available to all.  Selling a 15 MIPS machine
that slows down to 4 MIPS when using more than about 16MB of memory is
certainly a bug!

-mark
--
Spoken: Mark Weiser 	ARPA:	weiser@xerox.com	Phone: +1-415-494-4406