[net.arch] 98% hit rate on 386's TLB

clif@intelca.UUCP (Clif Purkiser) (11/19/85)

There seems to be a little bit of confusion about the 98% "typical" hit rate for
the 80386's translation lookaside buffer (TLB).  Our news feed seems to be
erratic or I would have responded sooner to this question.

First, the TLB is an address translation cache: it caches page table entries,
not instructions or data references.  A page table entry contains information
about a page such as: is it present, dirty, accessed, User/Supervisor,
read-only, etc.  This is different from the cache of the 68020, which is an
instruction-only cache.  A TLB is associated with memory management, not
program execution.  I believe that most of the flames about this
high hit rate were a result of confusion as to what a
TLB is.  If I am wrong and you understand TLBs but don't believe the
98% hit rate, then read further and I will attempt to explain how we
arrived at it.

TLB hit rates are typically higher than those of instruction and data caches:
96-99.5% for a TLB compared to an 80-95% hit rate for a big (>4K byte) cache.
Mainframe and minicomputer manufacturers, in particular DEC, and a UC Berkeley
professor have done extensive studies on the hit rates of both TLBs
and code and data caches.  (If pressed I will dig them out of the files.)

TLB hit rates depend on two factors: the TLB and paging hardware, and the
system environment.  The TLB + paging hardware contribution consists of three
factors, in order of importance.

1. The total address space mapped by the TLB hardware:  128K bytes for the 386
   (32 TLB entries * 4K byte pages = 128K bytes)

2. TLB organization:  the 386's TLB is 32 entries and 4-way set associative.

3. TLB replacement algorithm:  LRU (least recently used), MRU (most recently
   used), random, etc.

The 386 paging hardware is a known quantity; the system environment is of course
unknown.  Let us look at some system factors which would influence the TLB hit
rate.

			Primary Factors
Size of application programs.  Programs under 128K bytes would have very high 
hit rates (>99.9%).  

Characteristics of application programs:  A Lisp program would probably have a
lower hit rate than a C program, by the nature of the way Lisp programs allocate
and deallocate memory.  Caching theory basically says that programs tend to
access data and code which are relatively close together (e.g. for loops,
array accesses).

			Secondary Factors
The operating system's time slice.  A system which switches processes 100
times a second will have a slightly lower hit rate than one which only
switches 10 times a second.  (This is relatively insignificant:
if an OS switches tasks 100 times/sec, then the 386 will execute between 30,000
and 40,000 instructions per process switch.  If the application is under 128K
bytes, then a maximum of 32 TLB entries will have to be reloaded, or
32/35,000 ~= 0.1% added miss rate.)

Needless to say there are lots of other factors to consider.  So how did
Intel arrive at the 98% figure?  

We ran extensive simulations and then compared them with published studies.
We performed address traces on our IBM 370 for a variety of applications.
These traces generated information about which addresses an application will
access:  For instance, a C compiler may access locations
1000, 1004, 4010, 1000040, 4014, 4018, etc.

The application programs were all large: a MainSail compiler, Nroff, a C
compiler, and a circuit simulation program.  We took the address trace which
generated the WORST hit rate (the C compiler, surprisingly enough), and then
conducted simulations using several different page sizes and numbers of TLB
entries.

For the 386's TLB we found that the hit rate for the WORST application was
98.9%.  I am very confident that most systems will see the 98+% TLB hit rate
that is quoted in 386 presentations.

-- 
Clif Purkiser, Intel, Santa Clara, Ca.
HIGH PERFORMANCE MICROPROCESSORS
{pur-ee,hplabs,amd,scgvaxd,dual,idi,omsvax}!intelca!clif
	
{standard disclaimer about how these views are mine and may not reflect
the views of Intel, my boss, or USENET goes here. }