john@wa3wbu.UUCP (John Gayman) (11/22/88)
Can anyone tell me briefly what the difference is between "Direct
Mapped Cache" and "Two-way Set Associative Cache", and why one would
be preferred over the other?  I usually see the first being a 32K and
the second 64K.  This is on a 386 system.  Thanks.

John

-- 
John Gayman, WA3WBU              |  UUCP: uunet!wa3wbu!john
1869 Valley Rd.                  |  ARPA: john@wa3wbu.uu.net
Marysville, PA 17053             |  Packet: WA3WBU @ AK3P
rmarks@KSP.Unisys.COM (Richard Marks) (11/24/88)
In article <673@wa3wbu.UUCP> john@wa3wbu.UUCP (John Gayman) writes:
> Can anyone tell me briefly what the difference is between "Direct
> Mapped Cache" and "Two-way Set Associative Cache" and why one would
> be preferred over the other?  I usually see the first being a 32K
> and the second 64K.  Thanks.  This is on a 386 system.

I'll try.  Basically, all cache is addressed by the low bits of the
address.  The address into the cache is usually at the word (4-byte)
level.  So you can always get something from the cache; the question
is VALIDITY.  You get an extra 8 bits along with the 32 bits of data
from the cache.  These 8 bits are compared with the next 8 bits of
the address.  If they match, you have cached data.  (The highest 8
bits must match or the cache gets reset.  This is not a problem on a
PC, but it is on Unix.)

So far I have described "Direct Mapped Cache".

Now, in most PC programs there are separate code and data segments.
Usually these are addressed from the low segment offsets, so low
offsets are used more frequently.  Given the above DMC algorithm,
there is thus a good chance of code and data accesses hitting the
same offset, and thus the same cache location.  This causes the cache
to keep reloading and reduces its effectiveness.  A two-way cache
allows two segment addresses for the same offset.

I thought most 386 machines use DMC, while the Motorola MMUs use
two-way.  Does anyone know which 386s use two-way?  The 386 Unisys
machines I know of (the 6000/50 and PW850) use DMC.

Richard Marks
rmarks@KSP.unisys.COM
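[Editor's note: a small Python sketch, not from the original posts, of
the direct-mapped scheme Richard describes.  The line count and field
widths are made-up illustration values, not those of any 386 cache.]

```python
# Toy direct-mapped cache: the low address bits pick a line, the
# remaining high bits are stored as the tag and checked for validity.

NUM_LINES = 256              # illustration value, not a real cache size
cache = [None] * NUM_LINES   # each line holds the tag it was loaded with

def access(addr):
    """Return 'hit' or 'miss' for a word address, updating the cache."""
    index = addr % NUM_LINES   # low bits select the cache line
    tag = addr // NUM_LINES    # high bits must match for valid data
    if cache[index] == tag:
        return "hit"
    cache[index] = tag         # miss: reload the line with the new tag
    return "miss"

# Two addresses sharing the same low bits (e.g. a code offset and a
# data offset) evict each other on every access -- the reloading
# problem described above:
a, b = 0x1040, 0x2040          # both map to index 0x40
print([access(x) for x in (a, b, a, b)])   # every access misses
```

Alternating between the two conflicting addresses never hits, which is
exactly why the two-way scheme below holds two tags per offset.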
rmarks@KSP.Unisys.COM (Richard Marks) (11/24/88)
I goofed.  In my previous posting on cache, I said that our (Unisys)
PW850 has only direct-mapped cache.  It comes with 32K of
direct-mapped cache, but you can add another 32K and then it becomes
a two-way set associative cache.

By the way, these PW850s are real good boxes, and they are selling at
clone-competitive prices.

Richard Marks
rmarks@KSP.unisys.COM
darel@maccs.McMaster.CA (Darel Mesher) (11/24/88)
There are _many_ different caching strategies (limited only by
designer imagination); however, three of the more popular cache
organizations are:

	Associative
	Direct Mapped
	Set Associative

An Associative cache uses a content addressable memory (CAM): a tag
(address) is presented to the cache and all valid cache tag entries
are compared simultaneously.  In the event of a 'hit' the data is
returned; a 'miss' generates an external memory fetch (usually a
block of data is transferred [e.g. 4 words] in burst mode), and the
new data and tag displace a current cache entry.  Since the cache
uses a CAM, the replacement mechanism can be much more efficient than
the Direct Mapped approach.  Typical replacement strategies include
Least Recently Used (LRU), Random, and First In First Out (FIFO).
Needless to say, the complexity of a CAM is great -- each cell in the
cache has to have its own tag comparator, and for on-chip caches this
is a luxury that is not often available.  This complexity is the
trade-off for one of the most efficient caching strategies.

Direct Mapped caches are by far the simplest and most compact (in
terms of chip real estate); however, with simplicity come drawbacks.
A Direct Mapped cache uses the low bits of the address to select a
cache entry directly, and stores the remaining high bits as the tag.
There is no replacement algorithm: the address alone determines which
location in the cache must be used, and the tag check and data
retrieval are done simultaneously.  Hence aliasing is a problem.

Example with N = 4 address bits, of which the m = 2 low bits (LSBs)
index the cache and the remaining N-m bits form the tag:

	           Tag    Data    Valid
	         ----------------------
	 addr 00 |     |        |     |
	 addr 01 | 10  | data   |  1  |
	 addr 10 |     |        |     |
	 addr 11 |     |        |     |
	         ----------------------

Therefore address 1001 would map to the second entry in the table,
and in this case the entry is valid (bit = 1) and the tag matches,
signalling a cache 'hit'.
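[Editor's note: the address split in the 4-bit example above can be
worked through in a few lines of Python; this sketch is an editorial
addition, not part of the original post.]

```python
# Split an N-bit address into (tag, index) per the example above:
# the low m bits index the cache, the high N-m bits are the tag.

N, M = 4, 2   # 4 address bits, 2 index bits, as in the example

def split(addr):
    index = addr & ((1 << M) - 1)   # low m bits select the line
    tag = addr >> M                 # remaining high bits are the tag
    return tag, index

print(split(0b1001))   # (0b10, 0b01): matches the valid entry above
print(split(0b1101))   # (0b11, 0b01): same line, different tag
```

Both addresses land on line 01; only the one whose tag matches the
stored 10 can hit, which sets up the aliasing problem described next.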
The problem of aliasing can be demonstrated using the above example.
If a loop in the executing code makes repeated references to
different addresses with the same index field, then that cache entry
generates a 'cache miss' each time through the loop.  For example, if
both address 1101 and address 1001 are repeatedly referenced, then
each reference generates a cache fault that forces an external memory
fetch.  In short, this results in perpetual displacement of the xx01
cache entry while inside the loop.  This scenario represents the
limitations of Direct Mapped caches.

Set Associative caches can be considered a compromise between Direct
Mapped (simple, reasonable performance) and Associative (complex,
excellent performance).  An N-way Set Associative cache provides N
Direct Mapped caches which are inspected simultaneously.  Using the
Direct Mapped example from above with a set size of 2, the cache
table would look like:

	           Tag    Data    Valid
	         ----------------------
	 addr 00 |     |        |     |
	         |     |        |     |
	 addr 01 | 10  | data   |  1  |
	         | 11  | data   |  1  |
	 addr 10 |     |        |     |
	         |     |        |     |
	 addr 11 |     |        |     |
	         |     |        |     |
	         ----------------------

Therefore, there are now two entries for index 01 and, as in the
example above, references to the two addresses 1101 and 1001 would
both result in cache hits.  The hardware is still relatively simple,
and replacement strategies can be employed for entry displacement.
It has been shown that optimum set sizes fall into the 2 to 4 range
(since a programmer rarely manipulates more than 4 data structures
simultaneously).  The 2-way Set Associative approach shows a
noticeable performance enhancement over the Direct Mapped approach.

It is not uncommon to find CPUs with different sizes and types of
on-board caches.
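[Editor's note: a minimal Python sketch of the 2-way case above,
using LRU replacement -- an editorial illustration, with the set
count and replacement policy chosen for the example rather than
taken from any particular machine.]

```python
# 2-way set associative toy cache for the 4-bit example: each index
# now holds up to two tags, replaced least-recently-used, so the
# 1001/1101 loop that thrashed the direct-mapped cache stays resident.

WAYS, M = 2, 2
sets = {}   # index -> list of stored tags, most recently used last

def access(addr):
    """Return 'hit' or 'miss', updating the set with LRU replacement."""
    index = addr & ((1 << M) - 1)   # low m bits select the set
    tag = addr >> M                 # high bits are the tag
    ways = sets.setdefault(index, [])
    if tag in ways:
        ways.remove(tag)
        ways.append(tag)            # refresh LRU position
        return "hit"
    if len(ways) == WAYS:
        ways.pop(0)                 # evict the least recently used tag
    ways.append(tag)
    return "miss"

# After the first pass, both conflicting addresses hit:
print([access(a) for a in (0b1001, 0b1101, 0b1001, 0b1101)])
# ['miss', 'miss', 'hit', 'hit']
```

Compare the direct-mapped behavior, where the same sequence misses on
every reference -- the second way absorbs the alias.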
The National Semiconductor NS32532 has a 512-byte Direct Mapped
instruction cache, a 1024-byte 2-Way Set Associative data cache, and
an on-board MMU with a 64-entry Associative (CAM) Translation
Lookaside Buffer.  This one CPU has all three cache types mentioned
above!

-- 
Darel Mesher                        ...!uunet!mnetor!maccs!darel
McMaster University                 darel@maccs.mcmaster.ca