[comp.arch] VM Measurement

lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (03/11/91)

Taking my cache-measurement post a bit sideways:

It's difficult to find out the virtual memory behavior of a program.
At one level, that's an OS deficiency, and not the province of this
news group. On the other hand, TLBs are definitely architecture, and
today's designs are deficient in not keeping any measurements. Wasn't
an anecdote posted here, about a mysterious program that turned out
to be thrashing the ETA-10's TLB?

The MIPS chips have a unique advantage here. The MIPS TLB refill is
done by software; hence, a user could in theory boot an instrumented
version of the OS.  I suggest collecting, per page, a count of how
many times that page is faulted into the TLB.
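
A minimal sketch of that per-page counting, in C. This is not taken
from any real kernel: the table size, the 4 KB page size, and the
names count_tlb_refill() and hottest_page() are all assumptions made
for illustration. The idea is only that the software refill handler
calls a hook with the faulting virtual address before it loads the
TLB entry.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u     /* assumption: 4 KB pages */
#define NPAGES     1024u   /* hypothetical: number of pages this sketch tracks */

/* One refill counter per virtual page (zero-initialized). */
static uint32_t tlb_miss_count[NPAGES];

/*
 * Hypothetical hook, to be called from the software TLB refill path
 * with the faulting virtual address. Returns the updated count for
 * that page. Addresses beyond the table are folded in with a modulo,
 * so distinct pages can alias in this toy version.
 */
uint32_t count_tlb_refill(uint32_t bad_vaddr)
{
    uint32_t vpn = (bad_vaddr >> PAGE_SHIFT) % NPAGES;
    return ++tlb_miss_count[vpn];
}

/* Return the (folded) virtual page number with the most refills so far. */
uint32_t hottest_page(void)
{
    uint32_t best = 0;
    for (uint32_t i = 1; i < NPAGES; i++)
        if (tlb_miss_count[i] > tlb_miss_count[best])
            best = i;
    return best;
}
```

A real handler would have to keep this cheap (the refill path is hot)
and would probably index by process as well as by page; the sketch
ignores both.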

I ran some of this past a Mach-PMAX person, Alessandro Forin, and he
commented, in part:

>- "the" pages that faulted/missed	- easy
>- same, plus how many times		- easy
>- same, plus in what order		- space expensive
>- "the" pages accessed at least once	- easy
>- same, plus in what mode		- easy
>- same, plus how many times		- impossible
>- same, plus in what order		- impossible

Sandro was envisioning logic that you wouldn't want to see cast
in hardware.  However, it would still be nice if the hardware kept a
measly counter or two.



-- 
Don		D.C.Lindsay .. temporarily at Carnegie Mellon Robotics

torek@elf.ee.lbl.gov (Chris Torek) (03/11/91)

In article <12318@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu
(Donald Lindsay) writes:
>The MIPS chips have a unique advantage here. MIP's TLB refill is done
>by software, hence, a user could in theory boot an instrumented
>version of the OS.  I suggest collecting, per page, a count of how
>many times that page is faulted in to the TLB.

The Sun MMUs in the Sun-3 and Sun-4 series (the Sun-2 as well, but the
Sun-2 is essentially dead) have a somewhat similar property.  The MMU
itself is a piece of SRAM with some hardware logic around it; the SRAM
is too small to map even a single process, so you must `demand load' it
at fault time:

	fault_handler:
		va = instruction_access_fault ? pc : saved_va;
		if (mmu entry for va is out of date) {
			reload mmu entry;
			retry;
		}
		if (pagein(va) succeeds) {
			reload mmu entry;	/* if not part of pagein */
			retry;
		}
		deliver fault to process;

The `MMU entries' are actually collections of 16 (Sun-3), 32 (Sun-4), or
64 (Sun-4c) PTEs (individual PTEs are addressed much like words in
cache lines).  You can even do counting without taking faults on each
PTE (take only one fault per `PTE line').  There are reference bits in
each PTE.  If you turn them off whenever you load a new PTE line, then
when you swap out an old line and collect its reference bits, the va's
corresponding to any one PTE were used iff the ref bit is on.
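
The reference-bit trick can be sketched in C as below. This is a
simulation under stated assumptions, not Sun's MMU interface: the
struct layout, the PTE_REF bit position, and the function names are
hypothetical, and the 16-PTE line size is just the Sun-3 figure from
above.

```c
#include <assert.h>
#include <stdint.h>

#define PTES_PER_LINE 16u    /* Sun-3 figure from the text; 32 or 64 on others */
#define PTE_REF       0x1u   /* hypothetical position of the reference bit */

/* One `PTE line': the unit that is loaded into and evicted from the MMU. */
struct pte_line {
    uint32_t pte[PTES_PER_LINE];
};

/* On load: clear every reference bit, so a set bit later means
 * "this page was touched since the line was loaded". */
void pte_line_load(struct pte_line *line)
{
    for (unsigned i = 0; i < PTES_PER_LINE; i++)
        line->pte[i] &= ~PTE_REF;
}

/* On eviction: bump used_count[i] for each PTE whose reference bit is
 * set. Returns how many PTEs in the line were touched. */
unsigned pte_line_collect(const struct pte_line *line,
                          uint32_t used_count[PTES_PER_LINE])
{
    unsigned touched = 0;
    for (unsigned i = 0; i < PTES_PER_LINE; i++) {
        if (line->pte[i] & PTE_REF) {
            used_count[i]++;
            touched++;
        }
    }
    return touched;
}
```

Note that this yields "used at least once per residency", one fault
per line, and not a true access count.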
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov