lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (03/11/91)
Taking my cache-measurement post a bit sideways:

It's difficult to find out the virtual memory behavior of a program. At one level, that's an OS deficiency, and not the province of this newsgroup. On the other hand, TLBs are definitely architecture, and today's designs are deficient in not keeping any measurements. Wasn't an anecdote posted here, about a mysterious program that turned out to be thrashing the ETA-10's TLB?

The MIPS chips have a unique advantage here. MIPS's TLB refill is done by software; hence, a user could in theory boot an instrumented version of the OS. I suggest collecting, per page, a count of how many times that page is faulted in to the TLB.

I ran some of this past a Mach-PMAX person, Alessandro Forin, and he commented, in part:

>- "the" pages that faulted/missed - easy
>- same, plus how many times - easy
>- same, plus in what order - space expensive
>- "the" pages accessed at least once - easy
>- same, plus in what mode - easy
>- same, plus how many times - impossible
>- same, plus in what order - impossible

Sandro was envisioning logic that you wouldn't want to see cast in hardware. However, it would still be nice if the hardware kept a measly counter or two.
--
Don D.C.Lindsay .. temporarily at Carnegie Mellon Robotics
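The per-page refill count suggested above might look roughly like the following. This is only a sketch, not code from any real kernel: the hook name `count_tlb_refill`, the table size `NUM_VPAGES`, and the flat array indexed by virtual page number are all assumptions made for illustration, and a real refill handler would be a few instructions of hand-written assembly rather than C.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u    /* assumption: 4 KB pages */
#define NUM_VPAGES 1024u  /* hypothetical: number of virtual pages tracked */

/* One counter per virtual page: how many times it was refilled into the TLB. */
static uint32_t tlb_miss_count[NUM_VPAGES];

/* Hypothetical hook, called from the software refill path with the
 * faulting virtual address. Addresses outside the tracked range are ignored. */
static void count_tlb_refill(uint32_t va)
{
    uint32_t vpn = va >> PAGE_SHIFT;
    if (vpn < NUM_VPAGES)
        tlb_miss_count[vpn]++;
}

/* Read back the refill count for one virtual page. */
static uint32_t refills_for_page(uint32_t vpn)
{
    assert(vpn < NUM_VPAGES);
    return tlb_miss_count[vpn];
}

/* Return the virtual page number with the most refills (lowest VPN on ties). */
static uint32_t hottest_page(void)
{
    uint32_t best = 0;
    for (uint32_t vpn = 1; vpn < NUM_VPAGES; vpn++)
        if (tlb_miss_count[vpn] > tlb_miss_count[best])
            best = vpn;
    return best;
}
```

This covers only the "which pages missed, and how many times" rows of Forin's list. Recording the order of misses would need a log that grows with every refill, which is presumably why he called it space expensive.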
torek@elf.ee.lbl.gov (Chris Torek) (03/11/91)
In article <12318@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
>The MIPS chips have a unique advantage here. MIPS's TLB refill is done
>by software, hence, a user could in theory boot an instrumented
>version of the OS. I suggest collecting, per page, a count of how
>many times that page is faulted in to the TLB.

The Sun MMUs in the Sun-3 and Sun-4 series (the Sun-2 as well, but the Sun-2 is essentially dead) have a somewhat similar property. The MMU itself is a piece of SRAM with some hardware logic around it; the SRAM is too small to map even a single process, so you must `demand load' it at fault time:

    fault_handler:
        va = instruction_access_fault ? pc : saved_va;
        if (mmu entry for va is out of date) {
            reload mmu entry;
            retry;
        }
        if (pagein(va) succeeds) {
            reload mmu entry;    /* if not part of pagein */
            retry;
        }
        deliver fault to process;

The `MMU entries' are actually collections of 16 (Sun-3), 32 (Sun-4), or 64 (Sun-4c) PTEs (individual PTEs are addressed much like words in cache lines).

You can even do counting without taking faults on each PTE (take only one fault per `PTE line'). There are reference bits in each PTE. If you turn them off whenever you load a new PTE line, then when you swap out an old line and collect its reference bits, the VAs corresponding to a given PTE were used iff its ref bit is on.
--
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov
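The reference-bit trick described above can be sketched in C as follows. Everything here is illustrative: the `struct pte_line` layout, the `PTE_REF` bit position, and the function names are assumptions, not the actual Sun MMU format or SunOS code. The line size of 16 is the Sun-3 figure from the post.

```c
#include <assert.h>
#include <stdint.h>

#define PTES_PER_LINE 16u  /* Sun-3 figure from the post; 32 or 64 on Sun-4 / Sun-4c */
#define PTE_REF 0x1u       /* hypothetical position of the reference bit */

/* Hypothetical model of one `PTE line' as held in the MMU SRAM. */
struct pte_line {
    uint32_t pte[PTES_PER_LINE];
};

/* Load a line into the (modelled) MMU, clearing every reference bit so
 * that bits found set later can only come from accesses made while the
 * line was resident. */
static void load_line(struct pte_line *hw, const uint32_t *src)
{
    assert(hw != 0 && src != 0);
    for (unsigned i = 0; i < PTES_PER_LINE; i++)
        hw->pte[i] = src[i] & ~PTE_REF;
}

/* When a line is swapped out, harvest its reference bits: bump the
 * per-page counter for each PTE that was touched. Returns how many
 * PTEs in the line had the bit set. */
static unsigned harvest_line(const struct pte_line *hw, uint32_t *use_count)
{
    unsigned touched = 0;
    assert(hw != 0 && use_count != 0);
    for (unsigned i = 0; i < PTES_PER_LINE; i++) {
        if (hw->pte[i] & PTE_REF) {
            use_count[i]++;
            touched++;
        }
    }
    return touched;
}
```

Note that this yields one bit per page per residency of the line, not a true access count: a page touched once and a page touched a million times look the same until the line is next reloaded.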