[comp.arch] More on MIPSco TLB

mash@mips.UUCP (John Mashey) (05/02/87)

In article <95@ksr.UUCP> ned@ksr.UUCP (Kittlitz) writes:
>I found the Compcon article. I don't understand how EntryHi (the VPN
>part of the TLB entry) gets set, based on guess-translation of the
>opcodes in your sample.  I get an implication that k0/k1 are somehow
>not in the general register set, but I didn't see that in the article
>(or do you just not preserve two registers?)

When there is a user segment TLB-miss, the hardware stuffs the bad virtual
address into the BadVaddr register, and puts the bad VPN part into both the
EntryHi register and the Context register.  The software doesn't need to use
this if it doesn't feel like it, but it's cheap for the hardware to do,
and it saves a couple of cycles getting the VPN into EntryHi.  All this came
about by writing early versions of the UTLBMISS handler, then successively
paring them down by moving into hardware those functions that could be
done in parallel, cheaply, and without taking decision-making away from the
software.  That mostly consisted of latching useful data.
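
The division of labor described above can be sketched as a toy model in C.
This is only an illustration under stated assumptions: the struct layouts,
the 4 KB page size, the tiny TLB, the flat page table, and every function
name here are hypothetical and are not the real register encodings or the
real handler.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: field layouts and sizes are illustrative assumptions,
   not the real hardware encodings. */
#define PAGE_SHIFT 12u          /* assume 4 KB pages */
#define TLB_SIZE   8u           /* tiny TLB for the sketch */
#define NPAGES     64u          /* size of the toy linear page table */

struct cp0 {                    /* registers latched by hardware on a miss */
    uint32_t bad_vaddr;         /* full faulting virtual address */
    uint32_t entry_hi;          /* VPN, ready to be written into the TLB */
    uint32_t context;           /* here: index of the PTE for that VPN */
};

struct tlb_entry { uint32_t hi, lo; int valid; };

static uint32_t page_table[NPAGES];     /* PTE = frame number in this model */
static struct tlb_entry tlb[TLB_SIZE];
static unsigned next_victim;            /* stand-in for a replacement index */

/* What the hardware does in parallel with taking the trap. */
static void hw_latch_miss(struct cp0 *c, uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    c->bad_vaddr = vaddr;
    c->entry_hi  = vpn;
    c->context   = vpn;         /* real hardware would form a PTE address */
}

/* What is left for software: fetch the PTE and write the TLB entry.
   EntryHi is already latched, so no shifting or masking happens here. */
static void sw_utlbmiss(const struct cp0 *c)
{
    assert(c->context < NPAGES);
    struct tlb_entry *e = &tlb[next_victim];
    e->hi = c->entry_hi;
    e->lo = page_table[c->context];
    e->valid = 1;
    next_victim = (next_victim + 1u) % TLB_SIZE;
}

/* Returns 1 and stores the frame on a hit, 0 on a miss. */
static int tlb_lookup(uint32_t vaddr, uint32_t *frame)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    for (unsigned i = 0; i < TLB_SIZE; i++) {
        if (tlb[i].valid && tlb[i].hi == vpn) {
            *frame = tlb[i].lo;
            return 1;
        }
    }
    return 0;
}
```

The point of the split is that sw_utlbmiss() only loads and stores; all the
address arithmetic was done by hw_latch_miss() before the handler started.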

k0 and k1 are names for 2 regular registers.  They just aren't preserved,
and the compilers don't generate code that uses them.  User programs
may use them if they wish to consider them as pseudo-random number
generators that change at arbitrary times :-)
All this may seem strange.  The reasoning was:
	a) There was some evidence that the marginal utility of extra
	registers (for general use, not counting stack pointer, return
	address, zero) decreases strongly around 24-28 registers.
	[Our compiler group did some quick tests, giving the compilers
	options to use varying numbers of registers, to get this result.
	I've heard the same range informally from both IBM and HP;
	I'm not sure if this is published anywhere.]
	Hence it was reasonable to give the kernel 2.  We might have given it
	3, but it wouldn't really have saved us anything in a UNIX environment.
	The kernel MUST be given 1 it can trash [to do the return jump,
	restore from exception sequence].
	b) There were thoughts of more specialized constructs; as usual,
	it costs less to have 32 general registers and just use them this way
	by software convention.  HP Spectrums have (I think) 3 magic registers
	into which they save/restore 3 GP regs during their TLB miss handling.
	c) In more dedicated operating environments [e.g., controllers],
	it is quite plausible to dedicate more registers to the kernel,
	so that one can enter the kernel and do significant work [such as
	fast-path system calls or low-latency interrupt handling] without
	ever saving or restoring registers.  In such environments, you tell
	the compilers to lay their hands off more registers.  For a general
	UNIX environment, this doesn't usually make sense [we estimated
	the tradeoffs from the numbers we got above].
>
>I don't know Unix, so whether or not the reclaim list is a standard
>thing, I don't know exactly how it is used. I am familiar with several
>systems using "clock" algorithms, which operate as:
>   - is PTE used off? This page hasn't been used for a while.
>     - is modified on? time to purify the page
>     - else modified is off, this page can be re-used
>   - else PTE used is on, turn it off.
These are typical.
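
For readers who have not met it, the "clock" scan quoted above might look
roughly like this in C.  It is a minimal sketch: the pte struct, the enum,
and the function names are hypothetical, and a real pager would queue the
dirty page for cleaning rather than just skipping it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pte { bool used; bool modified; };

enum clock_action { CLEARED_USED, NEEDS_CLEAN, REUSABLE };

/* One visit of the clock hand to one PTE, following the quoted steps. */
static enum clock_action clock_visit(struct pte *p)
{
    if (p->used) {              /* used is on: turn it off, give a 2nd chance */
        p->used = false;
        return CLEARED_USED;
    }
    if (p->modified)            /* unused but dirty: must be cleaned first */
        return NEEDS_CLEAN;
    return REUSABLE;            /* unused and clean: can be re-used */
}

/* Advance the hand until a reusable page is found.  Gives up after two
   full passes and returns n; dirty pages are skipped in this sketch. */
static size_t clock_sweep(struct pte *pt, size_t n, size_t *hand)
{
    assert(n > 0);
    for (size_t step = 0; step < 2 * n; step++) {
        size_t i = *hand;
        *hand = (*hand + 1) % n;
        if (clock_visit(&pt[i]) == REUSABLE)
            return i;
    }
    return n;
}
```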
>Each manipulation which turns off a bit in the PTE has to ensure that
>it is off in all TLBs.
No: see my comment below about out-of-synchness.
The point is that you don't have to do this; all you have to do is make
sure that you never make wrong decisions, or that you catch out-of-synch
things before they do damage.  Note: this is a classic "hints" strategy
[Butler Lampson did a great paper on systems design a few years ago
that, among other principles, used "hints" that sped up performance and
could well be wrong, as long as you were careful.  I'll try to dig up the
reference if no one else gets there first.]
>
>In article <351@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
>>Multi-processors: here's what I can say:
>>		them and fix them, AS LONG AS YOU TRAP MODIFIES.
>
>
>Re b), I don't know how you can trap modifies if write-enable is on in
>some TLB.
Correct: you don't trap the modifies thru a TLB that has the 
write-enable set; you do trap all others.  In general, the first one
that attempts to write into a non-writable one will cause the PTE
to be changed; later (much later) references will see the write-enable
set [OK]; references that happen in the middle may get unwritable
copies into the PTEs, but that's an acceptable inconsistency, because
it will get fixed.
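
One way to picture why the stale copies are harmless is the following C
sketch.  It is an illustration under assumptions, not the actual kernel
code: the structs, the trap counter, and the function names are all
hypothetical, and each tlb_copy stands for one processor's cached view of
a single shared PTE.

```c
#include <assert.h>
#include <stdbool.h>

struct pte      { bool modified; };      /* shared, lives in memory */
struct tlb_copy { bool write_enable; };  /* one processor's cached hint */

static int traps;                        /* counts modify traps taken */

/* Load a TLB copy from the PTE as it looks right now. */
static struct tlb_copy tlb_refill(const struct pte *p)
{
    struct tlb_copy t = { p->modified };
    return t;
}

/* A store through one processor's TLB copy. */
static void store(struct pte *p, struct tlb_copy *t)
{
    assert(p != 0 && t != 0);
    if (!t->write_enable) {
        /* Modify trap: setting the bit is idempotent, so it does not
           matter whether another processor already set it. */
        traps++;
        p->modified = true;
        t->write_enable = true;          /* fix up this copy only */
    }
    /* the store itself proceeds here */
}
```

A copy that is stale in the "unwritable" direction costs one extra trap and
nothing else; no copy can have write-enable set unless the PTE already
records the modification, which is the only direction that could do damage.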
>
>To avoid instantaneous clearing of TLBS, it seems to me that you need
>a reference count associated with each PTE (number of TLBs that have
>this cached).  The TLB replacement algorithm would manipulate the
>counts of the victim and new entry. I expect there is some other
>method involving periodic scanning of a list of changed PTEs, followed
>by updating your TLB.

No, this is just not necessary.  Unfortunately, it's hard to show this
without an exhaustive analysis of all the cases, states, and transitions.
The only such analysis I've got needs reworking to remove some vendor-specific
references, and even it depended on knowledge of people's experiences with
MP systems, i.e., it was explained assuming detailed experience in
MP-izing UNIX systems.  If I can think of a terse explanation that makes
sense, I'll post it after I get back from my next trip.

At least some of this depends on the fact that you can always just
flush the TLB of things that might be stale, and if you're sneaky,
you can manage to drive the overhead of doing this down by only doing
it every once in a while.
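
A minimal sketch of one way to flush "only every once in a while", in C.
The batching scheme, the BATCH size, and all names are assumptions made for
illustration; nothing here is claimed to be what any particular kernel does.
The idea is that an unmapped frame is not handed out again until a flush has
happened, so stale TLB entries pointing at it can never do damage.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH 4                 /* hypothetical batch size */

static int    pending[BATCH];   /* frames unmapped, maybe still in some TLB */
static size_t npending;
static int    flushes;          /* counts whole-TLB flushes */

/* Stand-in for flushing every processor's TLB. */
static void flush_all_tlbs(void) { flushes++; }

/* Retire one frame.  Returns how many frames became safe to reuse and
   copies them into out; a flush is paid for only once per BATCH frames. */
static size_t retire_frame(int frame, int out[BATCH])
{
    assert(npending < BATCH);
    pending[npending++] = frame;
    if (npending < BATCH)
        return 0;               /* not yet: frame stays quarantined */
    flush_all_tlbs();           /* now no TLB can still map these frames */
    for (size_t i = 0; i < BATCH; i++)
        out[i] = pending[i];
    npending = 0;
    return BATCH;
}
```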
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

bjorn@alberta.UUCP (Bjorn R. Bjornsson) (05/04/87)

In article <360@winchester.UUCP>, mash@mips.UUCP (John Mashey) writes:
>	a) There was some evidence that the marginal utility of extra
>	registers (for general use, not counting stack pointer, return
>	address, zero) decreases strongly around 24-28 registers.
>	[Our compiler group did some quick tests, giving the compilers
>	options to use varying numbers of registers, to get this result.
>	I've heard the same range informally from both IBM and HP;
>	I'm not sure if this is published anywhere.]

While there is no shortage of particular applications that
can use all the registers you can throw in their general
direction,  I suspect that the above figure is closely tied
to human psychology (i.e., the manifestations of manual programming)
and the nature of the Universe (that is the laws of physics),
the former being heavily influenced by the latter.
I'm reminded of what a prof said to me when I slipped up in a
measure and integration theory oral:

	If that were [not] true I don't know what the
	Universe would look like, except it wouldn't
	be anywhere close to what we are experiencing
	now!

Ah, this is starting to sound quite metaphysical, isn't it?

			Bjorn R. Bjornsson
			{ubc-vision,ihnp4,mnetor}!alberta!bjorn