clif@intelca.UUCP (Clif Purkiser) (11/19/85)
There seems to be a little confusion about the 98% "typical" hit rate for the 80386's translation lookaside buffer (TLB). Our news feed seems to be erratic or I would have responded sooner to this question.

First, the TLB is an address translation cache: it caches page table entries, not instruction or data references. A page table entry contains information about a page, such as: is it present, dirty, accessed, User/Supervisor, read-only, etc. This is different from the 68020's cache, which is an instruction-only cache. A TLB is associated with memory management, not program execution. I believe that most of the flames about this high hit rate were the result of confusion as to what a TLB is. If I am wrong, and you understand TLBs but don't believe the 98% hit rate, then read further and I will attempt to explain how we arrived at it.

TLB hit rates are typically higher than instruction and data cache hit rates: 96-99.5% for a TLB, compared to 80-95% for big (>4K) caches. Mainframe and minicomputer manufacturers, in particular DEC, and a UC Berkeley professor have done extensive studies on the hit rates of both TLBs and code/data caches. (If pressed I will dig them out of the files.)

TLB hit rates depend on two things: the TLB and paging hardware, and the system environment. The TLB + paging hardware consists of three factors, in order of importance:

1. The total address space mapped by the TLB hardware: 128K bytes for the 386 (32 TLB entries * 4K byte pages = 128K bytes).
2. TLB organization: the 386's TLB is 32 entries and 4-way set associative.
3. TLB replacement algorithm: LRU, most recently used, random, etc.

The 386 paging hardware is a known quantity; the system environment is of course unknown. Let us look at some system factors which would influence the TLB hit rate.

Primary factors. The size of application programs: programs under 128K bytes would have very high hit rates (>99.9%).
The characteristics of application programs: a Lisp program would probably have a lower hit rate than a C program, by the nature of the way Lisp programs allocate and deallocate memory. Caching theory basically says that programs tend to access data and code which are relatively close together (e.g. for loops, array accesses).

Secondary factors. The operating system's time slice: a system which switches processes 100 times a second will have a slightly lower hit rate than one which only switches 10 times a second. (This is relatively insignificant: if an OS switches tasks 100 times/sec, the 386 will execute between 30,000 and 40,000 instructions per process switch. If the application is under 128K bytes, then at most 32 TLB entries will have to be reloaded, or 32/35,000 ~= .1% miss rate.) Needless to say, there are lots of other factors to consider.

So how did Intel arrive at the 98% figure? We ran extensive simulations and then compared them with published studies. We performed address traces on our IBM 370 for a variety of applications. These traces record which addresses an application accesses: for instance, a C compiler may access locations 1000, 1004, 4010, 1000040, 4014, 4018, etc. The application programs were all large: a MainSail compiler, Nroff, a C compiler, and a circuit simulation program. We took the address trace which generated the WORST hit rate (the C compiler, surprisingly enough), and then conducted simulations using several different page sizes and numbers of TLB entries. For the 386's TLB, we found that the hit rate for the WORST application was 98.9%. I am very confident that most systems will see the 98+% TLB hit rate that is quoted in 386 presentations.
--
Clif Purkiser, Intel, Santa Clara, Ca.
HIGH PERFORMANCE MICROPROCESSORS
{pur-ee,hplabs,amd,scgvaxd,dual,idi,omsvax}!intelca!clif
{standard disclaimer about how these views are mine and may not reflect the views of Intel, my boss, or USENET goes here.}