[comp.sys.ibm.pc] Different types of cache on 386's

john@wa3wbu.UUCP (John Gayman) (11/22/88)

      Can anyone tell me briefly what the difference is between "Direct
Mapped Cache" and "Two-way Set Associative Cache" and why one would be
preferred over the other?  I usually see the first being a 32K and
the second 64K.  Thanks.  This is on a 386 system.


					John


-- 
John Gayman, WA3WBU              |           UUCP: uunet!wa3wbu!john
1869 Valley Rd.                  |           ARPA: john@wa3wbu.uu.net 
Marysville, PA 17053             |           Packet: WA3WBU @ AK3P 

rmarks@KSP.Unisys.COM (Richard Marks) (11/24/88)

In article <673@wa3wbu.UUCP> john@wa3wbu.UUCP (John Gayman) writes:
>      Can anyone tell me briefly what the difference is between "Direct
>Mapped Cache" and "Two-way Set Associative Cache" and why one would be
>preferred over the other?  I usually see the first being a 32K and
>the second 64K.  Thanks.  This is on a 386 system.

I'll try.  Basically, any cache is addressed by the low bits of the address.
Cache entries are usually addressed down to the word (4-byte) level.  So you
can always get something out of the cache; the question is VALIDITY.

Along with the 32 bits of data, you get an extra 8 bits (the tag) from the
cache.  These 8 bits are compared with the next 8 bits of the address.  If
they match, you have valid cached data.  (The highest 8 bits must also match
or the cache gets reset; this is not a problem on a PC, but it is on Unix.)
So far I have described a "Direct Mapped Cache".

Now in most PC programs there are separate code and data segments.  Usually
these are addressed from the low segment offsets, so low offsets are used
more frequently.  Thus, with the above DMC algorithm, there is a good
chance of code and data accessing the same offset and thus the same
cache location.  This causes the cache to keep reloading and reduces its
effectiveness.  The two-way cache allows two segment addresses (tags) for
the same offset.
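
A rough software model of the two lookups just described may make the
mechanics clearer.  The sizes below (32K of 4-byte lines, an 8-bit tag,
64K for the two-way case) are only for illustration, and a real cache
does all of this in hardware, not in a loop:

#include <stdint.h>

#define LINES     8192                  /* 8192 lines * 4 bytes = 32K   */
#define LINE_BITS 13                    /* log2(LINES)                  */

struct line { uint8_t tag; uint32_t data; int valid; };

/* Direct mapped: the low address bits pick exactly one line. */
static struct line dm_cache[LINES];

int dm_lookup(uint32_t addr, uint32_t *out)
{
    uint32_t index = (addr >> 2) & (LINES - 1);   /* skip 2 byte-offset bits */
    uint8_t  tag   = (uint8_t)(addr >> (2 + LINE_BITS));
    struct line *l = &dm_cache[index];

    if (l->valid && l->tag == tag) { *out = l->data; return 1; }   /* hit  */
    return 0;                                                      /* miss */
}

/* Two-way set associative (64K total: two ways of 32K): the same index
 * now selects a set of two lines and both tags are checked, so code and
 * data that share the same low offset bits can coexist in the cache.    */
static struct line sa_cache[LINES][2];

int sa_lookup(uint32_t addr, uint32_t *out)
{
    uint32_t index = (addr >> 2) & (LINES - 1);
    uint8_t  tag   = (uint8_t)(addr >> (2 + LINE_BITS));
    int way;

    for (way = 0; way < 2; way++) {
        struct line *l = &sa_cache[index][way];
        if (l->valid && l->tag == tag) { *out = l->data; return 1; }
    }
    return 0;   /* miss: a replacement rule (LRU, random, ...) picks the way */
}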

I thought most 386 machines use DMC while the Motorola MMUs use two-way.
Does anyone know which 386's use two-way?  The 386 Unisys machines I know of
(the 6000/50 and PW850) use DMC.

Richard Marks
rmarks@KSP.unisys.COM

rmarks@KSP.Unisys.COM (Richard Marks) (11/24/88)

I goofed.  In my previous posting on cache, I said that our (Unisys) PW850
only has a Direct Mapped Cache.  It comes with a 32K direct mapped cache, but
you can add another 32K and then it becomes a 2-way set associative cache.

By the way, these PW850's are really good boxes and they are selling at
clone-competitive prices.

Richard Marks
rmarks@KSP.unisys.COM

darel@maccs.McMaster.CA (Darel Mesher) (11/24/88)

 There are _many_ different caching strategies (limited only 
by designer imagination); however, three of the more popular cache
organizations are:
         Associative 
         Direct Mapped
         Set Associative


  An Associative cache uses a content addressable memory (CAM):
a tag (address) is presented to the cache and all valid cache tag
entries are compared simultaneously.  In the event of a 'hit'
the data is returned; a 'miss' generates an external memory fetch
(usually a block of data is transferred [e.g. 4 words] in burst mode)
and the new data and tag displace a current cache entry.  Since 
the cache uses a CAM, the replacement mechanism can be much more 
flexible than in the Direct Mapped approach.  Typical replacement 
strategies include Least Recently Used (LRU), Random and First
In First Out (FIFO).  Needless to say, the complexity of a CAM
is great: each cell in the cache has to have its own tag comparator,
and for on-chip caches this is a luxury that is not often available.
This complexity is the price paid for one of the most efficient 
caching strategies.
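
A small software model of that lookup might look like the following (the
64-entry size and the counter-based LRU bookkeeping are only illustrative;
a real CAM does all the comparisons in parallel rather than in a loop):

#include <stdint.h>

#define ENTRIES 64

struct cam_entry { uint32_t tag; uint32_t data; int valid; unsigned last_used; };
static struct cam_entry cam[ENTRIES];
static unsigned now;                    /* access counter used to fake LRU */

int cam_lookup(uint32_t tag, uint32_t *out)
{
    int i, victim = 0;

    now++;
    /* In hardware every entry has its own comparator and all of these
       checks happen at once; this loop only models that behaviour.     */
    for (i = 0; i < ENTRIES; i++) {
        if (cam[i].valid && cam[i].tag == tag) {
            cam[i].last_used = now;
            *out = cam[i].data;
            return 1;                   /* hit */
        }
    }

    /* Miss: displace the least recently used (or an empty) entry; the
       caller would refill it with a block fetched from main memory.    */
    for (i = 1; i < ENTRIES; i++)
        if (!cam[i].valid ||
            (cam[victim].valid && cam[i].last_used < cam[victim].last_used))
            victim = i;
    cam[victim].tag       = tag;
    cam[victim].valid     = 1;
    cam[victim].last_used = now;
    return 0;                           /* miss */
}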


  Direct Mapped caches are by far the simplest and most compact 
(in terms of chip real estate); with that simplicity, however, come
drawbacks.  A Direct Mapped cache maps the address directly to 
the corresponding cache location.  There is no replacement algorithm;
the low-order address bits determine which location in the cache must
be used.  Hence aliasing is a problem:


                    ,-----.
              ,-----| = ? |-----> HIT      Tag check and data 
              |     `-----'                retrieval done 
              |        ^                   simultaneously.
              |        |N-m bits           
              |    ,--------.          Example: N= 4 and m= 2 bits 
      N-m bits|    |  TAG   |
      ,-------'    | (RAM)  |                Tag     Data  Valid
N bit/             `--------'             ----------------------
 ---<                  ^  addr        00 |        |        |    |
addr \_________________|              01 |  10    |  data  | 1  |
          m bits       |              10 |        |        |    |
          (LSB)        v  addr        11 |        |        |    |
                   ,--------.             ----------------------
                   |  DATA  |
                   | (RAM)  |
                   `--------'         Therefore addr 1001 would map
                       |              to the second entry in the 
                       |              table and in this case it is 
              data <----'              valid (bit = 1), signalling
                                       a cache 'hit'.

The problem of aliasing can be demonstrated using the above example.
If a loop in the executing code makes repeated references to different
addresses that share the same index (low-order) bits but have different
Tag fields, then that cache entry would generate a 'cache miss' each
time through the loop.  For example, if both addresses 1101 and 1001
are repeatedly referenced, then each reference would generate a cache
miss that would force an external memory fetch.  In short, this would
result in perpetual displacement of the xx01 cache entry while inside
the loop.  This scenario illustrates the limitations of Direct Mapped
caches.
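
Spelling that scenario out with the example values above (N = 4, m = 2):

     access 1001:  index 01, tag 10  ->  miss, entry 01 filled with tag 10
     access 1101:  index 01, tag 11  ->  tag mismatch, miss, entry 01 refilled
     access 1001:  index 01, tag 10  ->  tag mismatch, miss, entry 01 refilled
     access 1101:  index 01, tag 11  ->  tag mismatch, miss, and so on...

Every pass through the loop pays for two external memory fetches even
though only two words are involved.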


  Set Associative caches can be considered a compromise between
Direct Mapped (simple, reasonable performance) and Associative 
(complex, excellent performance).  An N-way Set Associative cache
provides N Direct Mapped caches which are inspected simultaneously.
Using the Direct Mapped example from above with a set size of 2, the
cache table would look like:

         Tag      Data      Valid
       --------------------------
   00 |.......|...........|......|
      |_______|___________|______|
   01 |..10...|...data....|...1..|
      |__11___|___data____|___1__|
   10 |.......|...........|......|
      |_______|___________|______|
   11 |.......|...........|......|
      |       |           |      |
       --------------------------

Therefore, there are now two entries for index 01 and, as in the 
example above, references to the two addresses 1101 and 1001 would 
both result in cache hits (after the initial fill).  The hardware is
still relatively simple and replacement strategies can be employed for
entry displacement.  It has been shown that optimum set sizes fall into
the 2 to 4 range (since a programmer rarely manipulates more than 4 data
structures simultaneously).  The 2-way Set Associative approach shows a
noticeable performance enhancement over the Direct Mapped approach.
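
Running the same reference pattern against the 2-way table above, the
cache settles down after the first pass through the loop:

     access 1001:  set 01, tag 10  ->  miss, first way of set 01 filled
     access 1101:  set 01, tag 11  ->  miss, second way of set 01 filled
     access 1001:  set 01, tag 10  ->  hit
     access 1101:  set 01, tag 11  ->  hit   (and so on for the loop)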

It is not uncommon to find CPUs with different size and types
of on-board caches.  The National Semiconductor NS32532 has
a 512 byte Direct Mapped Instruction cache, a 1024 byte 2-Way Set 
Associative Data cache and an on-board MMU with a 64 entry Associative
(CAM) Translation Lookaside Buffer.  This one CPU has all three cache
types mentioned above!


-- 
Darel Mesher				...!uunet!mnetor!maccs!darel
McMaster University			    darel@maccs.mcmaster.ca