[comp.arch] Will caches ever become obsolete?

hmthaker@wateng.UUCP (03/02/87)

      I have a question concerning caches.  Currently,
caches are used, in both uniprocessors and multiprocessors,
because the time a microprocessor takes to execute an
instruction is much smaller than the time to reference
main memory.  Thus, a cache is used to bridge the speed
difference (the cache is usually as fast as the microprocessor).

      Thus, it is evident that a cache can dramatically
improve the throughput of a microprocessor.  For multiprocessors,
much research is being conducted to find efficient algorithms
for multi-cache consistency.

      The multi-cache consistency problem arises when, say, a
block of data resides in the caches of processors A, B, and
C.  Then, if processor B decides to write to that block,
it must inform A and C that their copies of the block are
no longer valid.
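The invalidation step described above can be sketched as a toy write-invalidate protocol in C (the data structures here are my own illustration, not any real cache's):

```c
#include <assert.h>

#define NPROCS  3   /* processors A, B, C */
#define NBLOCKS 4

/* One "valid" bit per cached block per processor --
   a toy stand-in for real cache tags. */
static int valid[NPROCS][NBLOCKS];
static int memory[NBLOCKS];

/* Processor p writes v to block b: update memory and invalidate
   every other processor's copy (write-invalidate). */
void write_block(int p, int b, int v)
{
    memory[b] = v;
    for (int i = 0; i < NPROCS; i++)
        valid[i][b] = (i == p);   /* only the writer stays valid */
}

/* A read hits only if the local copy is still valid. */
int read_hits(int p, int b)
{
    return valid[p][b];
}
```

With A, B, and C all caching block 0, a write by B leaves only B's copy valid; A and C must refetch from memory on their next reference.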

     My question is, then, with the current improvements in
memory chips (i.e. faster access and greater densities), does
anyone foresee a time in the distant future (> 3 or 4 years)
when the speed of, say, a 1Mb chip will be comparable to that
of, say, a 1Kb ECL chip used in current caches?

     In other words, will all the research being conducted
for the cache coherency problems be a waste?  Could the
research done for multi-cache coherency be applied elsewhere?

Thanks.

             Hemi Thaker

------------------------------------------------------------
UUCP  : {allegra,decvax,utzoo,clyde}!watmath!wateng!hmthaker

marc@ucla-cs.UUCP (03/03/87)

In article <3182@wateng.UUCP> hmthaker@wateng.UUCP (Hemi M. Thaker) writes:
>
>     My question is, then, with the current improvements in
>memory chips (i.e. faster access and greater densities), does
>anyone foresee a time in the distant future (> 3 or 4 years)
>when the speed of, say, a 1Mb chip will be comparable to that
>of, say, a 1Kb ECL chip used in current caches?
>
>     In other words, will all the research being conducted
>for the cache coherency problems be a waste?  


	The question could also be stated as follows: "Will there
still be some memory hierarchy mechanism in future computers?"
Indeed, one could say that the availability of very dense and very
fast memory chips will eliminate secondary memory in the same way that
these chips will eliminate the use of caches between the processor
and the main memory. Or will it...

	I don't think so. First of all, there is always a price to pay
when the size of a memory built from VLSI chips is increased (regardless
of the technology used). A register file of 16 registers introduces less
delay than one of 32, right? (because of longer data buses and/or
longer select lines). In the same way, an access to the cache
will always be faster than one to main memory, because the cache is
smaller (in the case of direct mapping, for example).
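Marc's size/delay argument can be illustrated with a toy model (my own simplification, not the UCLA design): assume each doubling of the number of words adds one gate level of address decoding.

```c
#include <assert.h>

/* Toy model: an n-word memory needs ceil(log2(n)) levels of address
   decoding, and each level adds roughly one gate delay.  Illustrative
   only -- real delay also grows with bus and select-line length. */
int decode_levels(unsigned n)
{
    int levels = 0;
    while ((1u << levels) < n)
        levels++;
    return levels;
}
```

Under this model, a 32-register file pays one more decode level than a 16-register file, and a megabit main memory pays many more levels than a small cache; the gap never closes, whatever the technology.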

	Also, if a 1Mb chip (I assume the author meant in CMOS)
becomes as fast as an ECL chip, it also means that the processor will
be faster too, most likely using the same technology.

	One could argue that in the future the time of a Read/Write for
VLSI chips will be so short that it will not be a limiting factor 
anymore. For example in a pipeline processor the time allowed for 
the processor to read/write its operands could be so long (some 
picoseconds!) that there would be no need for caches.  Well, in
this case, the period of a subcycle should probably
be decreased to match the access time of the registers (memory),
and the "execute" part of the processor should be further pipelined.

	We will see VLSI memory chips replace secondary memory, though,
only because their cost-performance ratio will be better than that of
disc storage. But once again the hierarchy will remain the same.

	In order to reduce the dependency between the size of a register
file and the delay of a Read (a larger register file introduces longer
delays), we designed (at UCLA) a register file with variable-size windows which
does not increase the delays very much even if one doubles the size of 
the register file. For more information you can send me e-mail, or look
in the proceedings of HICSS-87.

					Marc Tremblay
					Computer Science Department
					UCLA

dave@micropen.UUCP (03/03/87)

In article <3182@wateng.UUCP>, hmthaker@wateng.UUCP (Hemi M. Thaker) writes:
> 
>      In other words, will all the research being conducted
> for the cache coherency problems be a waste?  Could the
> research done for multi-cache coherency be applied elsewhere?
>              Hemi Thaker
> UUCP  : {allegra,decvax,utzoo,clyde}!watmath!wateng!hmthaker

Interesting case, but this is not a question of electronics but of economics.
For any given technology one chooses to implement a machine in, there will
*always* be a technology faster, smaller, cooler or whatever.  The only 
"problem" with the technology is that it will always be unfavorably expensive
to build the entire machine out of -- but just a bit for cache will make
the budget if engineering promises large gains for small investment.
There will always be a way of making a cache memory for a system that 
will be faster than the main memory.  
		MOS -> TTL -> ECL -> GaAs -> Josephson -> ???

So the answer is no: there will be caches until engineers don't want to
squeeze that last bit from the technology they are using.

-- 
David F. Carlson, Micropen, Inc.
...!{seismo}!rochester!ur-valhalla!micropen!dave

"The faster I go, the behinder I get." --Lewis Carroll

kent@decwrl.UUCP (03/03/87)

Even if the performance of the memory hierarchy in the CPU (or main processor
assembly, in the case of a multiprocessor) becomes a moot point because
the relative performance difference is small, current research in cache
consistency algorithms will still be useful.

There's a whole range of memory hierarchy that isn't being considered -- what
about that disk you have hanging out there? Technology has made this
portion of the memory hierarchy less and less visible (no longer do we have
a small paging drum that acts as a cache for a slower, larger paging
disk, for example), but it's still there, and still critical to good
performance. In fact, with the introduction of remote file systems in
computing systems distributed across a LAN, the relative performance
of main memory to backing/paging stores is worse than it used to be.

Much research is still needed to apply cache technology to this portion
of the memory hierarchy, and main memory cache consistency seems to be
a good place to start. My thesis research looked at adapting various
multiprocessor cache consistency schemes to a distributed file system;
some worked out quite nicely.

chris
-- 
Chris Kent	Western Research Laboratory	Digital Equipment Corporation
kent@decwrl.dec.com	decwrl!kent			(415) 853-6639

reiter@endor.UUCP (03/04/87)

In article <3182@wateng.UUCP> hmthaker@wateng.UUCP (Hemi M. Thaker) writes:
>when the speed of, say, a 1Mb chip will be comparable to that
>of, say, a 1Kb ECL chip used in current caches?
>     In other words, will all the research being conducted
>for the cache coherency problems be a waste?

Even if processor and memory chips are made with the same technology,
the processor can access on-chip memory much faster than off-chip memory.
So, even if CMOS is as fast as ECL, we may still want to make a cache out of
our processor's on-chip memory.

Also, main memory is typically shared, accessed over a bus with several
masters (DMA I/O devices as well as the CPU, perhaps several CPUs in a
shared-memory multiprocessor), so somewhat complex (and slow) bus protocols
may be required, and memory bandwidth must be shared.  Processors may still
benefit from a private unshared cache memory, even if the cache's memory
chips are no faster than main memory's memory chips.
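Reiter's point can be put in numbers with a simple average-access-time model: the bus protocol overhead is paid only on a miss, so even equal-speed chips favor a private cache (illustrative units and parameters of my own):

```c
#include <assert.h>

/* Average access time with a private cache in front of a shared bus.
   Cache and memory use equally fast chips (t_chip), but a miss also
   pays the bus arbitration/protocol overhead t_bus.
   All numbers are illustrative, in arbitrary time units. */
double avg_access(double hit_rate, double t_chip, double t_bus)
{
    return hit_rate * t_chip + (1.0 - hit_rate) * (t_bus + t_chip);
}
```

With a 90% hit rate, one-unit chips, and four units of bus overhead, the cached processor averages about 1.4 units per reference versus 5 for an uncached one on the same bus.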

I think caches will be around for quite a while!

					Ehud Reiter
					reiter@harvard (ARPA,BITNET,UUCP)

ron@brl-sem.UUCP (03/04/87)

In article <3182@wateng.UUCP>, hmthaker@wateng.UUCP (Hemi M. Thaker) writes:

>       I have a question concerning caches.  Currently,
> caches are used, in both uniprocessors and multiprocessors,
> because the time a microprocessor takes to execute an
> instruction is much smaller than the time to reference
> main memory.  Thus, a cache is used to bridge the speed
> difference (the cache is usually as fast as the microprocessor).

The advantage of a cache isn't just that it is faster main
memory; it also short-circuits the address decoding that
must happen to access conventional RAM.   As main memories get
faster, so will caches.  Projects are underway to build processors
with large-scale "content addressable memories" -- essentially
machines whose main memory is like today's cache.
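A content-addressable memory lookup can be sketched as follows (a software stand-in for the parallel tag comparators of real CAM hardware; the structure is my own illustration):

```c
#include <assert.h>

#define CAM_SIZE 8

/* Toy content-addressable memory: data is found by matching a tag
   against every entry, not by decoding an address.  The loop here
   stands in for the parallel comparators of real CAM hardware. */
struct cam_entry { int tag; int data; int valid; };
static struct cam_entry cam[CAM_SIZE];

void cam_insert(int slot, int tag, int data)
{
    cam[slot].tag = tag;
    cam[slot].data = data;
    cam[slot].valid = 1;
}

/* Returns 1 and fills *data_out on a match, 0 on a miss. */
int cam_lookup(int tag, int *data_out)
{
    for (int i = 0; i < CAM_SIZE; i++)
        if (cam[i].valid && cam[i].tag == tag) {
            *data_out = cam[i].data;
            return 1;
        }
    return 0;
}
```

Cache tag stores work the same way: the "address" presented is a tag to be matched, not a location to be decoded.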
> 
>      My question is, then, with the current improvements in
> memory chips (i.e. faster access and greater densities), does
> anyone foresee a time in the distant future (> 3 or 4 years)
> when the speed of, say, a 1Mb chip will be comparable to that
> of, say, a 1Kb ECL chip used in current caches?
> 

Like I said: faster memory, faster caches.  A few years ago, one of my
coworkers came into my office and told me that in a few years I would be
able to get a computer the speed of a VAX (780) to sit on my desk, and
then I'd get one and I'd be happy.  I told him (and later this has been
referred to around here as "Ron's Rule of Computing") that when that day
came I'd still want a computer "this big" (arms outstretched indicating
the approximate size of a 780 CPU cabinet), whatever the technology of
the time can fit in that much space.  As the technology increases,
so do our needs.

-Ron

davidsen@steinmetz.UUCP (03/04/87)

Several bits of information on caches. First, the improvement I
have noted is more like 20% than the large numbers you have
heard elsewhere. This doesn't mean that you *can't* get large
improvements, just that they happen when memory is way too slow.
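That dependence on memory speed falls out of the usual hit-rate-weighted average; a quick model (all numbers illustrative, my own) makes it concrete:

```c
#include <assert.h>

/* Speedup from adding a cache: memory-only access time divided by
   the hit-rate-weighted average access time.  Times are in arbitrary
   units; the parameters are illustrative. */
double cache_speedup(double hit_rate, double t_cache, double t_mem)
{
    double avg = hit_rate * t_cache + (1.0 - hit_rate) * t_mem;
    return t_mem / avg;
}
```

If memory is only 25% slower than the cache, a 90% hit rate buys roughly a 20% speedup, consistent with the figure above; if memory is 10x slower, the same hit rate buys more than 5x.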

Second, putting the cache on a multi-port memory controller was
not state of the art when I first saw it in 1968. It may come
back for some designs. It eliminates (more or less) cycle
stealing by placing N processors and N i/o subsystems on a
multiport memory. The memory controller has the cache.

Third, in a Harvard architecture, the instruction and data spaces are
separate, eliminating the need to worry about syncing the
contents of the code cache (unless someone plans to go back to
self-modifying code :-}).

Finally, the IBM 4Mbit chips are said to run at 45ns, and the
16Mbit chips being prototyped in Japan are about half that. The
16Mbit chips are "proofs of existence, not beta test versions".
I think your conjecture that main memory may be fast enough
is probably right for most applications, particularly since
the majority of new machines will probably use at least a 32-bit
data path, cutting the number of accesses.
-- 
bill davidsen			sixhub \
      ihnp4!seismo!rochester!steinmetz ->  crdos1!davidsen
				chinet /
ARPA: davidsen%crdos1.uucp@ge-crd.ARPA (or davidsen@ge-crd.ARPA)

baum@apple.UUCP (03/05/87)


The idea behind caching is that it is possible to make a combination
of small-fast and large-slow memory look large-fast. Obviously, this
is only useful if you can't get large-fast for some reason (e.g. it
is prohibitively expensive). This is generally the case; fast
memories cost more than slow memories of equivalent size. There is no
reason to believe that this 'fact of economics/nature' will change.

Another possibility is that memory will be fast enough; faster
memories will not speed up the system as a whole. I wouldn't expect
this to be the case unless the logic or I/O is slow. If it is the
logic, build your logic out of memories (or memory technology). If it
is I/O that slows the system down instead of logic, then (assuming
Disk I/O) you can make your disk look faster using caching techniques
as well.


--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385

markp@valid.UUCP (03/05/87)

> 
>       I have a question concerning caches.  Currently,
> caches are used, in both uniprocessors and multiprocessors,
> because the time a microprocessor takes to execute an
> instruction is much smaller than the time to reference
> main memory.  Thus, a cache is used to bridge the speed
> difference (the cache is usually as fast as the microprocessor).

More precisely, a cache is used to match the throughput of the memory
system to the memory bandwidth of the processor.  Instruction boundaries
are inconsequential, except that RISC machines tend to have a
memory-reference-to-instruction ratio very close to 1 (actually 1.2-1.4:
the instruction fetch itself plus .2-.4 for loads/stores, more or less).
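That reference ratio translates directly into required memory bandwidth; a quick back-of-the-envelope model (the function names, and the 4-byte reference size, are my own assumptions):

```c
#include <assert.h>

/* Memory references per instruction on a toy RISC: one instruction
   fetch plus a load/store fraction (.2-.4 per the article). */
double refs_per_instr(double loadstore_frac)
{
    return 1.0 + loadstore_frac;
}

/* Bytes/sec of memory bandwidth needed to sustain `mips` million
   instructions per second, assuming 4-byte references. */
double bytes_per_sec(double mips, double loadstore_frac)
{
    return mips * 1e6 * refs_per_instr(loadstore_frac) * 4.0;
}
```

At 10 MIPS with a .25 load/store fraction, the processor already demands on the order of 50 MB/sec; this is the bandwidth the cache has to supply.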

> 
>       Thus, it is evident that a cache can dramatically
> improve the throughput of a microprocessor.  For multiprocessors,
> much research is being conducted to find efficient algorithms
> for multi-cache consistency.
> 

No comment. :-)

>       The multi-cache consistency problem arises when, say, a
> block of data resides in the caches of processors A, B, and
> C.  Then, if processor B decides to write to that block,
> it must inform A and C that their copies of the block are
> no longer valid.
> 

Or update the copies in the caches of A and C (broadcast write).
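The broadcast-write (write-update) alternative can be sketched the same way (a toy single-block model of my own, not any real protocol):

```c
#include <assert.h>

#define NPROCS 3   /* processors A, B, C */

/* One cached copy of a single shared block per processor. */
static int copy[NPROCS];

/* Write-update: instead of invalidating the other caches, the writer
   broadcasts the new value into every cached copy, so all reads
   keep hitting. */
void broadcast_write(int writer, int value)
{
    (void)writer;   /* every copy gets the update, writer included */
    for (int i = 0; i < NPROCS; i++)
        copy[i] = value;
}
```

The trade-off versus invalidation is bus traffic: every write broadcasts data, but subsequent reads by the other processors stay hits.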

>      My question is, then, with the current improvements in
> memory chips (i.e. faster access and greater densities), does
> anyone foresee a time in the distant future (> 3 or 4 years)
> when the speed of, say, a 1Mb chip will be comparable to that
> of, say, a 1Kb ECL chip used in current caches?
> 

Sure.  But you're comparing apples and oranges, and it doesn't make
sense to compare the speeds of tomorrow's memory chips with today's
processors.  Look at it this way -- CPUs and memory chips use basically
the same fab processes.  It usually happens that memory chips serve as
the testbeds for the new processes (e.g. megabit DRAMs for 1u CMOS)
and CPUs follow, but the CPUs ALWAYS FOLLOW!  Therefore your comparison
is irrelevant, since the memory bandwidth required by next-generation
processors will still far exceed the bandwidth of next-generation memories.
In fact, the new processors will cycle so fast that it will be impractical
to go off-chip for cache, and we will see instruction and data caches
integrated onto the CPU chip (just like the 68030, but of truly useful
size).  Furthermore, multi-level caches will become more important, as the
on-chip cache(s) may not provide a good enough hit rate to allow going to
main memory directly, and an on-board cache (say a few megabytes or so)
will be used to further reduce the average time required to satisfy a memory
reference.  Main memory, consisting of 16Mb chips or even denser, will still
serve as a last resort (not counting the disk, of course).
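The payoff of such a multi-level hierarchy follows from the usual average-access-time recurrence; a sketch with illustrative numbers (all parameters are made up):

```c
#include <assert.h>

/* Two-level hierarchy: on-chip cache (t1), on-board cache (t2),
   main memory (t_mem).  m1 and m2 are the miss rates of each level:
   average time = t1 + m1*(t2 + m2*t_mem).  Numbers are illustrative. */
double amat2(double t1, double m1, double t2, double m2, double t_mem)
{
    return t1 + m1 * (t2 + m2 * t_mem);
}
```

Even with a mediocre 90% on-chip hit rate, interposing an on-board cache that catches 80% of the remaining misses cuts the average reference time well below going to main memory directly on every on-chip miss.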

>      In other words, will all the research being conducted
> for the cache coherency problems be a waste?  Could the
> research done for multi-cache coherency be applied elsewhere?
> 
> Thanks.
> 
>              Hemi Thaker
> 

Ooh, my blood is boiling now!  Especially since I spent 2 years working
on multiprocessor cache coherence algorithms for my MSEE, and have spent
the last 2+ years helping the P896 Futurebus committee define a standard
set of facilities to implement various multi-cache consistency protocols
(of varying complexity and performance).  The answer to your questions is
"no" on both counts, by the way, and it is likely that coherence solutions
partially enforced by software will become more and more important.
For example, the on-chip MMU may contain bits to designate pages
"potentially sharable," enabling consistency to be enforced on those pages
only.  Otherwise, communication between processor modules, even at >100MB/sec,
is insufficient to support the invalidation and/or update traffic,
particularly in systems with many (e.g. >10) processors.  Even though I can
imagine a system that uses a 2GB/sec fiber-optic bus to connect its GaAs
processors to a bank of memory boards based on 256Mb CMOS memories, a
hierarchy of communication bandwidths will still exist, and caches will
still be necessary.  Also, don't forget that the speed of light will become
very important in microprocessor-based memory hierarchies in the 1990's, and
this will form yet another driving force behind the effective use of cache
memories.
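The page-level scheme suggested above can be sketched as a flag test on a page-table entry (the bit layout here is hypothetical, my own invention for illustration):

```c
#include <assert.h>

/* Toy page-table-entry flags with a "potentially sharable" bit.
   Coherence traffic (invalidate/update broadcasts) is generated only
   for writes to pages marked sharable; private pages are exempt.
   The bit assignment is hypothetical. */
#define PTE_SHARABLE 0x1u

int needs_coherence_traffic(unsigned pte_flags)
{
    return (pte_flags & PTE_SHARABLE) != 0;
}
```

Since most pages in most workloads are private, exempting them removes the bulk of the invalidation/update traffic that would otherwise swamp the interconnect.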

Now of course, if you are willing to argue that cache consistency
enforcement can be done completely in software while providing an acceptable
programming model (i.e. reasonably transparent) for parallel programming
and efficient load balancing of processes across multiple processors, then go
ahead and prove it to me by example.  But you still need caches on your
processors for the same reasons as I detailed above, and there still needs
to be research done in more efficient software-enforced cache coherence
schemes.

In other words, the field is still WIDE open, but future research should
proceed thinking about 1990's technology, not 1980's technology.

Flame off.  Phew.

	Mark Papamarcos
	Valid Logic [and P896 Futurebus working group/cache task group]
	{ihnp4,hplabs}!pesnta!valid!markp