hmthaker@wateng.UUCP (03/02/87)
	I have a question concerning caches.  Currently, caches are used, in both
uniprocessors and multiprocessors, because the time a microprocessor takes to
execute an instruction is much smaller than the time to reference main memory.
Thus, a cache is used to bridge the speed difference (the cache is usually as
fast as the microprocessor), and it is evident that a cache can dramatically
improve the throughput of a microprocessor.

	For multiprocessors, much research is being conducted to find
efficient algorithms for multi-cache consistency.  The multi-cache consistency
problem arises when, say, a block of data resides in the caches of processors
A, B, and C.  If processor B then writes to that block, it must inform A and C
that their copies of the block are no longer valid.

	My question is, then, with the current improvements in memory chips
(i.e. faster access and greater densities), does anyone foresee a time in the
distant future (> 3 or 4 years) when the speed of, say, a 1Mb chip will be
comparable to that of, say, a 1Kb ECL chip used in current caches?

	In other words, will all the research being conducted on the cache
coherency problem be a waste?  Could the research done for multi-cache
coherency be applied elsewhere?

	Thanks.

					Hemi Thaker
------------------------------------------------------------
UUCP : {allegra,decvax,utzoo,clyde}!watmath!wateng!hmthaker
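The write-invalidate scenario described in the post above (B writes, A and C must drop their copies) can be sketched as a toy protocol.  This is an illustration only, with made-up addresses and values, not a model of any particular machine:

```python
# Toy write-invalidate coherence: each processor has a private cache
# holding copies of memory blocks; a write by one processor marks the
# copies in all other caches invalid.

class Cache:
    def __init__(self, name):
        self.name = name
        self.blocks = {}                 # block address -> [value, valid?]

    def load(self, addr, memory):
        self.blocks[addr] = [memory[addr], True]

    def valid(self, addr):
        return addr in self.blocks and self.blocks[addr][1]

def write(writer, others, memory, addr, value):
    # Writer updates memory and its own copy, then broadcasts an
    # invalidation; stale copies elsewhere are marked invalid.
    memory[addr] = value
    writer.blocks[addr] = [value, True]
    for cache in others:
        if addr in cache.blocks:
            cache.blocks[addr][1] = False

memory = {0x100: 7}
a, b, c = Cache("A"), Cache("B"), Cache("C")
for cache in (a, b, c):
    cache.load(0x100, memory)

write(b, [a, c], memory, 0x100, 42)
print(a.valid(0x100), b.valid(0x100), c.valid(0x100))   # False True False
```

After the write, only B holds a valid copy; A and C must re-fetch the block from memory before using it again.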
marc@ucla-cs.UUCP (03/03/87)
In article <3182@wateng.UUCP> hmthaker@wateng.UUCP (Hemi M. Thaker) writes:
>	My question is, then, with the current improvements in
>memory chips (i.e. faster access and greater densities), does
>anyone foresee a time in the distant future (> 3 or 4 years)
>when the speed of, say, a 1Mb chip will be comparable to that
>of, say, a 1Kb ECL chip used in current caches?
>
>	In other words, will all the research being conducted
>for the cache coherency problems be a waste?

The question could also be stated as follows: "Will there still be some memory
hierarchy mechanism in future computers?"  Indeed, one could say that the
availability of very dense and very fast memory chips will eliminate secondary
memory, in the same way that these chips will eliminate the use of caches
between the processor and the main memory.  Or will it...  I don't think so.

First of all, there is always a price to pay when the size of a memory built
from VLSI chips is increased (regardless of the technology used).  A register
file of 16 registers introduces less delay than one of 32, right?  (Because of
longer data buses and/or longer select lines.)  In the same way, an access to
the cache will always be faster than one to main memory, because the cache is
supposedly smaller (in the case of direct mapping, for example).

Also, if a 1Mb chip (I assume the author meant CMOS) becomes as fast as an ECL
chip, it also means that the processor will be faster too, most likely being
built in the same technology.

One could argue that in the future the Read/Write time of VLSI chips will be
so short that it will not be a limiting factor anymore.  For example, in a
pipelined processor the time allowed for the processor to read/write its
operands could be so long (some picoseconds!) that there would be no need for
caches.
Well, in this case the subcycle period should probably be decreased to match
the access time of the registers (memory), and the "execute" part of the
processor should be further pipelined.

We will see VLSI memory chips replace secondary memory, though, but only
because their cost-performance ratio will be better than that of disc storage.
Once again, the hierarchy itself will remain.

In order to reduce the dependency between the size of a register file and the
delay of a Read (a larger register file introduces longer delays), we designed
(at UCLA) a register file with variable-size windows, which does not increase
the delays very much even if one doubles the size of the register file.  For
more information you can send me e-mail, or look in the proceedings of
HICSS-87.

					Marc Tremblay
					Computer Science Department
					UCLA
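Marc's point that a 16-register file is faster than a 32-register one generalizes: address-decode depth grows roughly with log2 of the number of words, so a bigger memory in the same technology is always somewhat slower.  A crude numeric sketch, with an assumed gate delay and a made-up fixed sense time (illustrative numbers only):

```python
import math

# Rough model: one gate level of decode per address bit, plus a fixed
# sense/drive time.  Not real circuit data; it only shows the trend
# that access delay grows with capacity even in a single technology.

def access_delay_ns(words, gate_delay_ns=0.5, fixed_ns=1.0):
    return fixed_ns + gate_delay_ns * math.log2(words)

for size in (16, 32, 1024, 1 << 20):        # registers ... 1M words
    print(size, round(access_delay_ns(size), 2))
```

Under this model a 1M-word memory pays several extra gate delays per access compared with a small register file or cache, regardless of how fast each individual gate is.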
dave@micropen.UUCP (03/03/87)
In article <3182@wateng.UUCP>, hmthaker@wateng.UUCP (Hemi M. Thaker) writes:
> 	In other words, will all the research being conducted
> for the cache coherency problems be a waste?  Could the
> research done for multi-cache coherency be applied elsewhere?
> 					Hemi Thaker
> UUCP : {allegra,decvax,utzoo,clyde}!watmath!wateng!hmthaker

Interesting case, but this is not a question of electronics but of economics.
For any given technology one chooses to implement a machine in, there will
*always* be a technology that is faster, smaller, cooler, or whatever.  The
only "problem" with that technology is that it will always be unfavorably
expensive to build the entire machine out of; but just a bit of it for a cache
will make the budget, if engineering promises large gains for a small
investment.  There will always be a way of making a cache memory for a system
that is faster than its main memory:

	MOS -> TTL -> ECL -> GaAs -> Josephson -> ???

So the answer is no: there will be caches until engineers no longer want to
squeeze that last bit from the technology they are using.

-- 
David F. Carlson, Micropen, Inc.
...!{seismo}!rochester!ur-valhalla!micropen!dave

"The faster I go, the behinder I get." --Lewis Carroll
kent@decwrl.UUCP (03/03/87)
Even if the performance of the memory hierarchy in the CPU (or in the main
processor assembly, in the case of a multiprocessor) becomes a moot point
because the relative performance difference is small, current research in
cache consistency algorithms will still be useful.

There's a whole range of the memory hierarchy that isn't being considered:
what about that disk you have hanging out there?  Technology has made this
portion of the memory hierarchy less and less visible (no longer do we have a
small paging drum that acts as a cache for a slower, larger paging disk, for
example), but it's still there, and still critical to good performance.

In fact, with the introduction of remote file systems in computing systems
distributed across a LAN, the relative performance of main memory to
backing/paging stores is worse than it used to be.  Much research is still
needed to apply cache technology to this portion of the memory hierarchy, and
main memory cache consistency seems to be a good place to start.  My thesis
research looked at adapting various multiprocessor cache consistency schemes
to a distributed file system; some worked out quite nicely.

chris
-- 
Chris Kent	Western Research Laboratory	Digital Equipment Corporation
kent@decwrl.dec.com	decwrl!kent	(415) 853-6639
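The analogy in the post above (multiprocessor consistency schemes applied to a distributed file system) has the same shape as invalidation: a server tracks which clients cache a file and calls back to invalidate stale copies on a write.  A hypothetical sketch, not modeled on any specific file system, with an invented file name:

```python
# Toy distributed file cache with server-driven invalidation callbacks,
# mirroring a multiprocessor write-invalidate protocol.

class Client:
    def __init__(self):
        self.cache = {}                  # filename -> cached contents

class Server:
    def __init__(self):
        self.data = {}                   # filename -> contents
        self.cachers = {}                # filename -> set of caching clients

    def read(self, client, name):
        self.cachers.setdefault(name, set()).add(client)
        client.cache[name] = self.data[name]
        return client.cache[name]

    def write(self, writer, name, contents):
        self.data[name] = contents
        # callback: every other caching client drops its stale copy
        for client in self.cachers.get(name, set()) - {writer}:
            client.cache.pop(name, None)
        self.cachers[name] = {writer}
        writer.cache[name] = contents

server = Server()
server.data["/etc/motd"] = "hello"
c1, c2 = Client(), Client()
server.read(c1, "/etc/motd")
server.read(c2, "/etc/motd")
server.write(c1, "/etc/motd", "updated")
print("/etc/motd" in c2.cache)           # c2's stale copy is gone
```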
reiter@endor.UUCP (03/04/87)
In article <3182@wateng.UUCP> hmthaker@wateng.UUCP (Hemi M. Thaker) writes:
>when the speed of, say, a 1Mb chip will be comparable to that
>of, say, a 1Kb ECL chip used in current caches?
>	In other words, will all the research being conducted
>for the cache coherency problems be a waste?

Even if processor and memory chips are made with the same technology, the
processor can access on-chip memory much faster than off-chip memory.  So,
even if CMOS becomes as fast as ECL, we may still want to make a cache out of
our processor's on-chip memory.

Also, main memory is typically shared, accessed over a bus with several
masters (DMA I/O devices as well as the CPU, and perhaps several CPUs in a
shared-memory multiprocessor), so somewhat complex (and slow) bus protocols
may be required, and memory bandwidth must be shared.  Processors may still
benefit from a private, unshared cache memory, even if the cache's memory
chips are no faster than main memory's.

I think caches will be around for quite a while!

					Ehud Reiter
					reiter@harvard	(ARPA,BITNET,UUCP)
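Reiter's second point can be put in numbers: even with identical chip speeds, cache hits avoid the shared bus, while misses pay arbitration and contention on top of the memory access.  A back-of-the-envelope model with assumed timings (the 100 ns and 20 ns figures are invented for illustration):

```python
# Hits are served from a private cache without touching the bus; misses
# pay main-memory time plus a crude contention penalty that grows with
# the number of bus masters.

def avg_access_ns(hit_rate, cache_ns, mem_ns, masters, arb_ns=20):
    miss_ns = mem_ns + arb_ns * (masters - 1)
    return hit_rate * cache_ns + (1 - hit_rate) * miss_ns

# same chip speed in cache and main memory (100 ns each), 4 bus masters:
print(avg_access_ns(0.95, 100, 100, masters=4))   # with a private cache
print(avg_access_ns(0.0,  100, 100, masters=4))   # every access on the bus
```

With equal chip speeds, the cached processor averages about 103 ns per access versus 160 ns for the uncached one in this model; the win comes entirely from staying off the shared bus.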
ron@brl-sem.UUCP (03/04/87)
In article <3182@wateng.UUCP>, hmthaker@wateng.UUCP (Hemi M. Thaker) writes:
> 	I have a question concerning caches.  Currently,
> caches are used, in both uniprocessors and multiprocessors,
> because the time a microprocessor takes to execute an
> instruction is much smaller than the time to reference
> main memory.  Thus, a cache is used to match the speed
> difference (the cache is usually as fast as the microprocessor).

The advantage of the cache isn't just that it is faster main memory, but that
it short-circuits the address decoding that must happen to access conventional
RAM.  As main memories get faster, so will caches.  Projects are underway to
build processors with large-scale "content addressable memories," which are
essentially machines whose main memory is like today's cache.

> 	My question is, then, with the current improvements in
> memory chips (i.e. faster access and greater densities), does
> anyone foresee a time in the distant future (> 3 or 4 years)
> when the speed of, say, a 1Mb chip will be comparable to that
> of, say, a 1Kb ECL chip used in current caches?

Like I said: faster memory, faster caches.

A few years ago, one of my coworkers came into my office and told me that in a
few years I would be able to get a computer to sit on my desk with the speed
of a VAX (780), and that then I'd get one and I'd be happy.  I told him (and
this has since been referred to around here as "Ron's Rule of Computing") that
when that day came I'd still want a computer "this big" (arms outstretched,
indicating the approximate size of a 780 CPU cabinet), whatever the technology
of the time could fit in that much space.  As the technology improves, so do
our needs.

-Ron
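A content-addressable memory, mentioned in the post above, answers "which locations hold this value?" rather than "what value is at this location?".  A purely functional sketch (a real CAM compares every word in parallel in one cycle; this loop only models the result, with made-up tag values):

```python
# Functional model of a content-addressable memory: store() writes a
# word at the next free address, match() returns every address whose
# contents equal the probe value.

class CAM:
    def __init__(self):
        self.words = []

    def store(self, value):
        self.words.append(value)

    def match(self, value):
        return [addr for addr, word in enumerate(self.words)
                if word == value]

cam = CAM()
for tag in (0x1f0, 0x2a4, 0x1f0):
    cam.store(tag)
print(cam.match(0x1f0))                  # [0, 2]
print(cam.match(0x999))                  # []
```

This is exactly the lookup a cache's tag store performs: present an address tag, get back the matching line (if any) without decoding an address first.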
davidsen@steinmetz.UUCP (03/04/87)
Several bits of information on caches.

First, the improvement I have noted is more like 20% than the large numbers
you have heard elsewhere.  This doesn't mean that you *can't* get large
improvements, just that they happen when memory is way too slow.

Second, putting the cache on a multi-port memory controller was not state of
the art when I first saw it in 1968.  It may come back for some designs.  It
eliminates (more or less) cycle stealing by placing N processors and N i/o
subsystems on a multiport memory, with the memory controller holding the
cache.

Third, in a Harvard architecture the instruction and data spaces are separate,
eliminating the need to worry about keeping cached code consistent (unless
someone plans to go back to self-modifying code :-}).

Finally, the IBM 4Mbit chips are said to run at 45ns, and the 16Mbit chips
being prototyped in Japan at about half that.  The 16Mbit chips are "proofs of
existence, not beta test versions."  I think your conjecture that main memory
may become fast enough is probably correct for most applications, particularly
since the majority of new machines will use at least a 32-bit data path,
cutting the number of accesses.

-- 
bill davidsen		sixhub \
      ihnp4!seismo!rochester!steinmetz ->  crdos1!davidsen
			chinet /
ARPA: davidsen%crdos1.uucp@ge-crd.ARPA (or davidsen@ge-crd.ARPA)
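The 20%-versus-dramatic distinction in the post above falls directly out of the speedup arithmetic: for a fixed hit rate, the gain from a cache depends almost entirely on how slow main memory is relative to the cache.  A quick check with assumed timings (all numbers invented for illustration):

```python
# Speedup from adding a cache = time without cache / time with cache,
# for a given hit rate and per-access times.

def speedup(hit_rate, cache_ns, mem_ns):
    with_cache = hit_rate * cache_ns + (1 - hit_rate) * mem_ns
    return mem_ns / with_cache

# memory only modestly slower than the cache: modest gain (~26%)
print(round(speedup(0.90, 50, 65), 2))
# memory "way too slow": dramatic gain (~4.7x)
print(round(speedup(0.90, 50, 400), 2))
```

The same 90% hit rate yields either a 20-30% improvement or nearly a 5x one, depending only on the memory-to-cache speed ratio, which is consistent with both davidsen's measurements and the larger numbers quoted elsewhere.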
baum@apple.UUCP (03/05/87)
--------
[]
The idea behind caching is that it is possible to make a combination of
small-fast and large-slow memory look large-fast.  Obviously, this is only
useful if you can't get large-fast for some reason (e.g. it is prohibitively
expensive).  This is generally the case: fast memories cost more than slow
memories of equivalent size, and there is no reason to believe that this
'fact of economics/nature' will change.

Another possibility is that memory will simply be fast enough, so that faster
memories will not speed up the system as a whole.  I wouldn't expect this to
be the case unless the logic or the I/O is slow.  If it is the logic, build
your logic out of memories (or memory technology).  If it is I/O that slows
the system down instead of logic, then (assuming disk I/O) you can make your
disk look faster using caching techniques as well.

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385
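The "looks large-fast" claim above is one line of arithmetic: average access time is the hit rate times the fast time plus the miss rate times the slow time.  The same formula covers baum's closing point about disks.  All numbers below are assumptions chosen for illustration:

```python
# Average access time of a two-level hierarchy: with a high hit rate,
# the combination sits near the fast memory's speed at the slow
# memory's capacity.

def amat(hit_rate, fast, slow):
    return hit_rate * fast + (1 - hit_rate) * slow

# processor level: 30 ns cache in front of 300 ns main memory (in ns)
print(round(amat(0.95, 30, 300), 1))     # close to the cache's speed
# I/O level: 1 ms of RAM buffering in front of a 30 ms disk (in ms)
print(round(amat(0.80, 1.0, 30.0), 1))   # close to the buffer's speed
```

At 95% hits the processor sees about 43.5 ns on average from a 300 ns memory; at 80% hits the "disk" averages under 7 ms instead of 30. Same trick, different level of the hierarchy.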
markp@valid.UUCP (03/05/87)
> 	I have a question concerning caches.  Currently,
> caches are used, in both uniprocessors and multiprocessors,
> because the time a microprocessor takes to execute an
> instruction is much smaller than the time to reference
> main memory.  Thus, a cache is used to match the speed
> difference (the cache is usually as fast as the microprocessor).

More precisely, a cache is used to match the throughput of the memory system
to the memory bandwidth of the processor.  Instruction boundaries are
inconsequential, except that RISC machines tend to have a memory reference to
instruction ratio of very close to 1 (actually 1.2-1.4: the instruction
reference itself, plus .2-.4 for loads/stores, more or less).

> 	Thus, it is evident that a cache can very dramatically
> improve the throughput of a microprocessor.  For multiprocessors,
> much research is being conducted to find efficient algorithms
> for multi-cache consistency.

No comment. :-)

> 	The multi-cache consistency problem arises when, say, a
> block of data resides in the caches of processors A, B, and
> C.  If processor B then writes to that block, it must inform
> A and C that their copies of the block are no longer valid.

Or update the copies in the caches of A and C (broadcast write).

> 	My question is, then, with the current improvements in
> memory chips (i.e. faster access and greater densities), does
> anyone foresee a time in the distant future (> 3 or 4 years)
> when the speed of, say, a 1Mb chip will be comparable to that
> of, say, a 1Kb ECL chip used in current caches?

Sure.  But you're comparing apples and oranges; it doesn't make sense to
compare the speeds of tomorrow's memory chips with today's processors.  Look
at it this way: CPUs and memory chips use basically the same fab processes.
It usually happens that memory chips serve as the testbeds for the new
processes (i.e. megabit DRAMs for 1u CMOS) and CPUs follow, but the CPUs
ALWAYS FOLLOW!
Therefore your comparison is irrelevant, since the memory bandwidth required
by next-generation processors will still far exceed the bandwidth of
next-generation memories.  In fact, the new processors will cycle so fast that
it will be impractical to go off-chip for cache, and we will see instruction
and data caches integrated onto the CPU chip (just like the 68030, but of
truly useful size).  Furthermore, multi-level caches will become more
important: the on-chip cache(s) may not provide a good enough hit rate to
allow going to main memory directly, so an on-board cache (say a few megabytes
or so) will be used to further reduce the average time required to satisfy a
memory reference.  Main memory, consisting of 16Mb chips or even denser, will
still serve as a last resort (not counting the disk, of course).

> 	In other words, will all the research being conducted
> for the cache coherency problems be a waste?  Could the
> research done for multi-cache coherency be applied elsewhere?
>
> 	Thanks.
>
> 					Hemi Thaker

Ooh, my blood is boiling now!  Especially since I spent 2 years working on
multiprocessor cache coherence algorithms for my MSEE, and have spent the last
2+ years helping the P896 Futurebus committee define a standard set of
facilities for implementing various multi-cache consistency protocols (of
varying complexity and performance).  The answer to your questions is "no" on
both counts, by the way, and it is likely that coherence solutions partially
enforced by software will become more and more important.  In other words, the
on-chip MMU may contain bits designating pages "potentially sharable,"
enabling consistency to be enforced on those pages only.  Otherwise,
communication between processor modules, even at >100MB/sec, is insufficient
to support the invalidation and/or update traffic, particularly in systems
with many (i.e. >10) processors.
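The "broadcast write" (update) alternative mentioned earlier in this post can be sketched the same way as invalidation: instead of marking remote copies stale, the writer pushes the new value into every cache that holds the block.  A toy illustration with invented addresses, caches modeled as plain dictionaries:

```python
# Write-update coherence: the writer broadcasts the new value; caches
# holding a copy update it in place, caches without a copy ignore it.

def broadcast_write(caches, memory, addr, value):
    memory[addr] = value
    for cache in caches:
        if addr in cache:                # only update caches with a copy
            cache[addr] = value

memory = {0x40: 1}
a, b, c = {0x40: 1}, {0x40: 1}, {}       # C never cached this block
broadcast_write([a, b, c], memory, 0x40, 99)
print(a[0x40], b[0x40], 0x40 in c)       # 99 99 False
```

Compared with invalidation, update keeps all sharers' copies usable at the cost of bus traffic on every write, which is exactly the traffic the post argues can swamp the interconnect past ten or so processors.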
Even though I can imagine a system that uses a 2GB/sec fiber-optic bus to
connect its GaAs processors to a bank of memory boards based on 256Mb CMOS
memories, a hierarchy of communication bandwidths will still exist, and caches
will still be necessary.  Also, don't forget that the speed of light will
become very important in microprocessor-based memory hierarchies in the
1990's, and this will form yet another driving force behind the effective use
of cache memories.

Now of course, if you are willing to argue that cache consistency enforcement
can be done completely in software, while providing an acceptable programming
model (i.e. reasonably transparent) for parallel programming and efficient
load balancing of processes across multiple processors, then go ahead and
prove it to me by example.  But you still need caches on your processors for
the same reasons I detailed above, and there still needs to be research done
on more efficient software-enforced cache coherence schemes.  In other words,
the field is still WIDE open, but future research should proceed thinking
about 1990's technology, not 1980's technology.

Flame off.  Phew.

				Mark Papamarcos
				Valid Logic
				[and P896 Futurebus working group/cache task group]
				{ihnp4,hplabs}!pesnta!valid!markp
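The on-chip / on-board / main-memory hierarchy Papamarcos describes is the two-level extension of the usual average-access-time arithmetic: each level's miss penalty is the average access time of the level below it.  A sketch with assumed hit rates and timings (all numbers invented for illustration):

```python
# Two-level cache hierarchy: average access time seen by the processor,
# where an L1 miss falls through to L2, and an L2 miss to main memory.

def amat2(h1, t1, h2, t2, t_mem):
    level2 = h2 * t2 + (1 - h2) * t_mem      # average cost of an L1 miss
    return h1 * t1 + (1 - h1) * level2

# on-chip cache: 10 ns, 90% hits; on-board cache: 40 ns, catches 95% of
# the remainder; main memory: 200 ns
print(round(amat2(0.90, 10, 0.95, 40, 200), 1))
```

With these numbers the processor averages 13.8 ns per reference against a 200 ns main memory, which is the whole argument for adding the intermediate on-board level rather than going from the on-chip cache straight to memory.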