aglew@ccvaxa.UUCP (09/30/86)
Motorola 68030 Cache Organization
---------------------------------

Can someone explain to me why the change in cache organization between
the 68020 and the 68030 is such a win? I don't need numbers, I'd just
like a rationalization that explains the mechanism. NB. I'm not talking
about the separate address/data lines to the I-cache - that's obviously
an improvement. What I refer to is the comment in _Electronics_ that
goes like this:

    To improve the likelihood of cache hits, Motorola is also
    reorganizing the 256-byte instruction cache into 16 entries of four
    long words each with 4 bytes per word. The 68020 instruction cache
    consists of 64 entries each of one long word... The reorganized
    instruction cache, along with the new burst mode addressing
    methods, should double the cache hit ratio and reduce the number of
    times the 68030 must access the system bus.

First off, reducing the number of entries that can be independently
associated seems to be a loss, not a win. But have they changed the
cache structure - is it fully associative now, where it wasn't before?
Maybe they just needed fewer entries so that they could do it fast
enough for a 1-cycle access with the separate A/D buses.

Do the entries have to be strictly aligned, on a multiple-of-16-byte
boundary, or can they be skewed? I'd suspect the former. If so, this
means that there will be advantages to aligning the top of your inner
loops on a 16-byte boundary. NOPs, anybody?

Why the emphasis on "four long words each with 4 bytes per word"? I
assume that the 4 long words reflect how the cache line is filled, by a
modulo-4 burst-mode memory access. That's probably one of the big
advantages of this cache organization - it doesn't increase the cache
hit ratio so much as decrease the time necessary to make good a cache
miss, so that you can get back to work quicker. Also, if you are
sequentially accessing memory, as you do in a linear instruction
stream, you may have obtained the next word, due to a burst-mode line
fill, before the processor asks for it - whereas if you weren't
prefetching you'd have another miss, and even if you were prefetching
but were using a slower memory access, you might not have it ready in
time.

The emphasis on `bytes' in the instruction cache probably means that it
is easier for the execution unit to pull funny-sized instructions out
of the cache. Ahh, the joys of variable-length instruction sets!

The orientation toward longer lines, filled faster by burst mode, is
probably a good thing for an instruction cache, but one wonders whether
it is so good for a data cache. It probably is for floating-point
numbers, which by themselves can fill up a cache line, or for matrix
processing or graphics where you do a lot of sequential access to data,
but maybe not so good for systems that use a lot of pointer accesses to
random fields in structures, picking out, say, only one byte on every
cache line filled. Could Motorola have given us a 64-entry,
1-word-per-line data cache, like the 68020's instruction cache?

(Oh, another thing: TLB address translation is done in parallel with
cache access. Does this mean that the cache is virtual? Does it do
invalidations according to physical addresses off the external bus, or
what?)

Summing up, I see these as the tradeoffs that came into the 68030 cache:

    LOSS  fewer independent entries
    GAIN  faster association on the fewer entries?
    GAIN  faster filling using burst mode
    Longer cache lines:
        GAIN  for instructions
        GAIN  for numerical and sequentially accessed data
        LOSS  for pointer/structure-oriented programs?
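For concreteness, here is how the address bits would split under the
two organizations. This is a back-of-the-envelope sketch in C, assuming
both caches are direct-mapped with strictly aligned lines (the very
assumption questioned above); the sizes come from the _Electronics_
quote, and everything else is illustrative, not a description of what
Motorola actually built.

    /* Sketch only: assumes direct-mapped caches with aligned lines.
     * Cache sizes are from the _Electronics_ quote; nothing here is
     * a claim about Motorola's real implementation.
     */
    #include <stdio.h>

    /* 68020 I-cache: 256 bytes = 64 entries x 1 long word (4 bytes) */
    #define C20_LINE   4
    #define C20_LINES  64

    /* 68030 I-cache: 256 bytes = 16 entries x 4 long words (16 bytes) */
    #define C30_LINE   16
    #define C30_LINES  16

    static void decompose(unsigned long addr,
                          unsigned long line, unsigned long nlines)
    {
        unsigned long offset = addr % line;            /* byte in line */
        unsigned long index  = (addr / line) % nlines; /* which entry  */
        unsigned long tag    = addr / (line * nlines); /* match field  */
        printf("addr 0x%06lx: tag 0x%lx, index %lu, offset %lu\n",
               addr, tag, index, offset);
    }

    int main(void)
    {
        unsigned long pc = 0x1234;          /* arbitrary fetch address */
        decompose(pc, C20_LINE, C20_LINES); /* 68020-style split */
        decompose(pc, C30_LINE, C30_LINES); /* 68030-style split */
        return 0;
    }

Under these assumptions, a straight-line run of 16 bytes of code
occupies four entries in the 68020 organization but only one in the
68030's, so a single burst fill covers the next three sequential
fetches. It also shows why a loop top sitting at offset 12 of a line
wastes most of its first fill - hence the 16-byte-alignment/NOP
speculation above.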
Am I missing or confused about anything?

Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms
simoni@Shasta.STANFORD.EDU (Richard Simoni) (10/02/86)
In article <5100146@ccvaxa> aglew@ccvaxa.UUCP writes:
>
>Motorola 68030 Cache Organization
>---------------------------------
>Oh, another thing: TLB address translation is done in parallel with
>cache access. Does this mean that the cache is virtual?

This doesn't necessarily follow. Address translation is often done in
parallel with cache access by using only the low-order bits of the
virtual address (i.e., the bits that indicate the offset within the
page) to address the cache. This is possible because these offset bits
do not change in the virtual-to-physical mapping. When the cache access
is complete, the tag (which is a physical page number) is compared with
the result of the address translation (which happened in parallel with
the cache access) to see if a hit occurred in the cache.

The problem with this scheme is that it can be difficult to build a
large cache, since the page size limits the number of bits that can be
used to address the cache. The size of the cache can be increased by
making the cache set-associative and/or by increasing the page size
(thereby increasing the number of bits that can address the cache). Of
course, an on-chip cache (as in the 68030 case) will not be very large,
anyway.

Rich Simoni
Center for Integrated Systems, Stanford University
simoni@sonoma.stanford.edu    ...!decwrl!glacier!shasta!simoni
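To make the overlap concrete, here is a small sketch of such a
virtually-indexed, physically-tagged lookup. The parameters (4K pages,
a page-sized direct-mapped cache, 16-byte lines) are made up for
illustration and are not the 68030's; the point is only that the index
bits lie entirely within the page offset, so the cache can be addressed
before the TLB answers.

    /* Illustration of parallel TLB lookup and cache indexing.
     * All parameters are invented: 4 KB pages, 4 KB direct-mapped
     * cache, 16-byte lines. The scheme works because the cache
     * index comes entirely from page-offset bits.
     */
    #include <stdio.h>

    #define PAGE_BITS  12                        /* 4 KB pages      */
    #define LINE_BITS  4                         /* 16-byte lines   */
    #define INDEX_BITS (PAGE_BITS - LINE_BITS)   /* 8 -> 256 lines  */

    int main(void)
    {
        unsigned long vaddr = 0x00403A78UL;

        /* These bits are identical in the virtual and physical
         * address, so the cache can be indexed before the TLB
         * finishes translating:                                    */
        unsigned long index =
            (vaddr >> LINE_BITS) & ((1UL << INDEX_BITS) - 1);

        /* Meanwhile, in parallel, the TLB maps the page number
         * (a dummy mapping stands in for the real lookup here):    */
        unsigned long vpn = vaddr >> PAGE_BITS;
        unsigned long pfn = vpn + 0x100;

        /* Hit test: compare the physical page number against the
         * tag stored with the selected line (faked as present):    */
        unsigned long stored_tag = pfn;
        printf("index %lu, vpn 0x%lx -> pfn 0x%lx, %s\n",
               index, vpn, pfn, stored_tag == pfn ? "hit" : "miss");
        return 0;
    }

If the cache grew beyond one page per way, some index bits would have
to come from the page number, and the parallelism would break down -
which is exactly the size limit described above.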
johnl@ima.UUCP (John R. Levine) (10/04/86)
In article <5100146@ccvaxa> aglew@ccvaxa.UUCP writes:
>
>Motorola 68030 Cache Organization
>---------------------------------
>
>Can someone explain to me why the change in cache organization between
>the 68020 and the 68030 is such a win? ...
>
>    To improve the likelihood of cache hits, Motorola is also
>    reorganizing the 256-byte instruction cache into 16 entries of
>    four long words each with 4 bytes per word. The 68020 instruction
>    cache consists of 64 entries each of one long word... The
>    reorganized instruction cache, along with the new burst mode
>    addressing methods, should double the cache hit ratio and reduce
>    the number of times the 68030 must access the system bus.

According to an article in Digital Design, the big win with this kind
of cache design is that it takes advantage of nibble-mode RAM chips
that can cycle four sequential bits out very fast. It means you can get
four times the data in a bus transaction in much less than four times
the time. Since much read access is sequential anyway (instruction
execution, or scanning a string or a table), it's a big win.
--
John R. Levine, Javelin Software Corp., Cambridge MA  +1 617 494 1400
{ ihnp4 | decvax | cbosgd | harvard | yale }!ima!johnl, Levine@YALE.EDU
The opinions expressed herein are solely those of a 12-year-old hacker
who has broken into my account and not those of any person or organization.
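The shape of that saving is easy to put numbers on. The cycle counts
below are invented, not taken from any 68030 or DRAM data sheet; they
only show one full address setup being amortized over four sequential
transfers.

    /* Illustrative arithmetic only -- the cycle counts are assumed,
     * not datasheet numbers. The saving comes from paying the full
     * access time once and nibble-mode time for the remaining three
     * sequential transfers.
     */
    #include <stdio.h>

    int main(void)
    {
        int first  = 4;  /* assumed cycles for a full random access  */
        int nibble = 1;  /* assumed cycles per following nibble-mode
                            transfer within the same row             */

        int four_singles = 4 * first;           /* 16 cycles         */
        int one_burst    = first + 3 * nibble;  /*  7 cycles         */

        printf("four separate fetches: %d cycles\n", four_singles);
        printf("one 4-word burst fill: %d cycles\n", one_burst);
        return 0;
    }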