[comp.arch] TLB freezeup?

cprice@mips.COM (Charlie Price) (05/26/90)
In article <39006@mips.mips.COM> mark@mips.COM (Mark G. Johnson) writes:
>In article <26274@super.ORG> rminnich@super.UUCP (Ronald G Minnich) writes:
>  >I attended a talk once by someone who had worked on porting a popular 
>  >OS to an R2000. He mentioned a problem he had been bit by which he 
>  >called TLB freezeup. I have since wondered exactly the sequence of 
>  >events that could cause this. Can anyone out there give a good description
>  >of this phenomenon? Is there a hardware basis for this problem or is it 
>  >merely a result of bugs in the software which is responsible for loading 
>  >the TLBs? I was unclear whether it was just a deadlock or a true hardware
>  >event.
>
>
>"TLB freezeup" may be a common name for some kind of software phenomenon, in
>which case all of the hardware discussion below is misleading.  Apologies
>in advance, do not hold in hand, use only under adult supervision.
>
>
>    The System Coprocessor (aka memory management unit) of the R2000 
>    contains a 32b register called the Status Register.  Bit 21 of that
>    register is a special flag that is asserted in case the hardware
>    detects it is in danger.  It's described on page 5-7 of the R2000/
>    R3000 book by Gerry Kane.  Locally among the chip designers this bit is
>    called the "TLB Burnout" protector.  Perhaps you misremembered this as
>    "TLB freezeup".  In the book it's called "TLB Shutdown".
>
>    Suppose for a moment that the OS software went crazy and wrote utter
>    nonsense into the virtual-to-physical mapping entries.  For example,
>    what would happen if the TLB was told that (all within one ProcessID):
>          Virtual Address 37  <====> Physical Address  6
>          Virtual Address 37  <====> Physical Address 51
...
>    If this sort of silliness ever happens (and hopefully it doesn't),
>    the R2000 sets its TLB Shutdown bit, telling you that you tried to
>    use the TLB in a REALLY unexpected fashion.  The reason for doing
>    this is both selfish and altruistic: it lets S/W know something went
>    wrong, and it protects the TLB hardware from mangling itself.

Mark fails to mention what the hardware difficulty is.

A TLB is generally just a small cache of virtual-to-physical translations.
The TLB in the R2000 and R3000 is fully associative:
any translation can go into any of the cache locations.
The lookup mechanism for such a fully associative cache
is a content-addressable memory keyed on the virtual page number.
In effect, you yell out the index (the page number) and if the
translation is in any TLB entry, it shouts the physical page number back.
The problem is when more than one location answers at a time,
and that can happen if the same virtual address is mistakenly
mapped to more than one physical address.
If that happens, more than one value will be driven at the same time.
(Here is where the software-understanding-of-hardware handwaving starts).
If the hardware tries to drive a 0 and a 1 out simultaneously
for a particular bit position, the conflict causes a highly undesirable
current flow that can actually destroy the circuits involved.
The MIPS TLB detects the situation where it gets multiple
answers to a lookup and disables the TLB before it can destroy itself.
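
To make the lookup-and-shutdown behavior concrete, here is a rough software
model in Python. It is only a sketch of the logic described above, not a
description of the actual R2000 circuitry: the names ToyTLB, Entry, write,
and lookup are made up for illustration, and a real CAM compares every entry
in parallel rather than walking a list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    vpn: int  # virtual page number (the tag the CAM matches on)
    pid: int  # process ID the mapping belongs to
    pfn: int  # physical page frame number (the value shouted back)


class ToyTLB:
    """Hypothetical model of a fully associative TLB with a shutdown flag."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []
        self.shutdown = False  # stands in for the "TLB Shutdown" status bit

    def write(self, vpn: int, pid: int, pfn: int) -> None:
        # Any translation may go into any slot; nothing stops the OS
        # from loading two entries with the same (vpn, pid) tag.
        self.entries.append(Entry(vpn, pid, pfn))

    def lookup(self, vpn: int, pid: int) -> Optional[int]:
        # Once shut down, the TLB no longer answers lookups.
        if self.shutdown:
            return None
        # "Yell out" the page number: collect every entry that answers.
        hits = [e for e in self.entries if e.vpn == vpn and e.pid == pid]
        if len(hits) > 1:
            # More than one answer: disable the TLB instead of letting
            # conflicting values be driven at the same time.
            self.shutdown = True
            return None
        return hits[0].pfn if hits else None
```

Loading Mark's two conflicting mappings (virtual page 37 to physical 6 and to
physical 51, same process ID) and then looking up page 37 sets the shutdown
flag in this model, and every later lookup returns None until the model is
rebuilt.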
-- 
Charlie Price    cprice@mips.mips.com        (408) 720-1700
MIPS Computer Systems / 928 Arques Ave. / Sunnyvale, CA   94086