[comp.arch] On-Chip Cache Survey

jlodman@beowulf.ucsd.edu (Michael Lodman) (02/08/91)

I am doing some work in which I need information regarding those
microprocessors with on-chip cache. Off the top of my head, only
the Intel 80486 comes to mind.

I would appreciate people sending to me or posting lists of micros
they know of with on-chip cache. User comments would be appreciated
as well.

Thanks in advance!

-- 
Michael Lodman	Department of Computer Science Engineering
	University of California, San Diego
jlodman@cs.ucsd.edu			(619) 672-1673

dlau@mipos2.intel.com (Dan Lau) (02/08/91)

Off the top of my head, the following chips have on-chip cache (all
on one single chip, as opposed to the Clipper, 88000 or RS6000, which
have separate cache controller/memory chips which are used as a set):

80486: 4K code, 4K data
80860: 4K code, 8K data
80960CA: 1K code, 1K "register"
68040: 8K unified
68030: 256 code

torrie@cs.stanford.edu (Evan Torrie) (02/08/91)

dlau@mipos2.intel.com (Dan Lau) writes:

>Off the top of my head, the following chips have on-chip cache (all
>on one single chip, as opposed to the Clipper, 88000 or RS6000, which
>have separate cache controller/memory chips which are used as a set):

>80486: 4K code, 4K data

>68040: 8K unified

  These should be reversed I believe...  
68040 4K code 4K data 
80486 8K unified

Also, be on the lookout for
MIPS R4000 8K data 8K code
Motorola 88110
TI Viking?

as examples of chips with on-chip cache.

-- 
------------------------------------------------------------------------------
Evan Torrie.  Stanford University, Class of 199?       torrie@cs.stanford.edu   
"If it weren't for your gumboots, where would you be?   You'd be in the
hospital, or in-firm-ary..."  F. Dagg

phil@motaus.sps.mot.com (Phil Brownfield) (02/08/91)

>68040: 8K unified
>68030: 256 code

Actually, this should read:

68040: 4K instruction, 4K data; both accessed by real addresses
68030: 256 byte instruction, 256 byte data; both virtual addressed
68020: 256 byte instruction; virtual addressed

-- 
Phil Brownfield
phil@motaus.sps.mot.com
{cs.utexas.edu!oakhill, mcdchg}!motaus!phil

edschulz@cbnewsj.att.com (edward.d.schulz) (02/08/91)

Don't forget the VL86C020 (Acorn RISC Machine):
4Kbyte unified cache, 64-way set associative.
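
A 4-Kbyte, 64-way set-associative cache has very few sets. The Python sketch below shows how an address would be split into tag, set index and byte offset, with a tag-only lookup model. The 16-byte line size and the LRU replacement policy are assumptions for illustration, not figures from this post.

```python
# Sketch: address breakdown and lookup for a 4 KB, 64-way set-associative cache.
# ASSUMPTIONS (not from the post): 16-byte lines, LRU replacement.

CACHE_BYTES = 4 * 1024
WAYS = 64
LINE_BYTES = 16                        # assumed line size

LINES = CACHE_BYTES // LINE_BYTES      # 256 lines in total
SETS = LINES // WAYS                   # 4 sets of 64 lines each

OFFSET_BITS = (LINE_BYTES - 1).bit_length()   # 4 bits select a byte in a line
INDEX_BITS = (SETS - 1).bit_length()          # 2 bits select one of 4 sets


def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, set_index, byte_offset) for an address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class SetAssociativeCache:
    """Tag-only model: each set keeps up to WAYS tags, least recently used first."""

    def __init__(self) -> None:
        self.sets: list[list[int]] = [[] for _ in range(SETS)]

    def access(self, addr: int) -> bool:
        """Look up addr; return True on a hit, False on a miss (then fill)."""
        tag, index, _ = split_address(addr)
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)       # move to the most-recently-used position
            ways.append(tag)
            return True
        if len(ways) == WAYS:
            ways.pop(0)            # evict the least recently used tag
        ways.append(tag)
        return False
```

With only four sets, each lookup has to compare the tag against up to 64 stored tags; hardware does that in parallel, while the model above just searches a list.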

-- 
Ed Schulz, AT&T, Room 2P276 200 Laurel Ave., Middletown, NJ 07748
+1 908 957 3899      Ed_Schulz@att.com    or    eds@mtdcb.att.com

kers@hplb.hpl.hp.com (Chris Dollin) (02/08/91)

The ARM3 (Acorn Risc Machine, version 3) has a 4KB cache on-chip (instructions
+ data, I believe).

Since the chip appears in machines with slow (i.e., cheap) memory (no, I can't
quote times), and clocks at around 30MHz, the cache is pretty much essential to
get a performance improvement over the 8MHz ARM2. It certainly makes one hell
of a difference on my A440.
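
The argument is the usual average-access-time one: every miss pays the full memory latency, so the faster the clock, the more cycles each miss costs. A small Python sketch with made-up numbers (the 250 ns memory access and 95% hit rate are purely illustrative; no timings are given above):

```python
# Average memory access time (in CPU cycles) = hit time + miss rate * miss penalty.
# All figures below are hypothetical, chosen only to illustrate the argument.

def memory_cycles(memory_ns: float, clock_hz: float) -> float:
    """Cycles spent waiting for one uncached memory access."""
    return memory_ns * clock_hz / 1e9


def average_access_cycles(hit_rate: float, hit_cycles: float, miss_penalty: float) -> float:
    """Expected cycles per access when a fraction hit_rate is served by the cache."""
    return hit_cycles + (1.0 - hit_rate) * miss_penalty


MEMORY_NS = 250.0                                    # hypothetical slow, cheap memory

arm2_uncached = memory_cycles(MEMORY_NS, 8e6)        # 2.0 cycles per access at 8 MHz
arm3_uncached = memory_cycles(MEMORY_NS, 30e6)       # 7.5 cycles per access at 30 MHz
arm3_cached = average_access_cycles(0.95, 1.0, arm3_uncached)   # about 1.4 cycles
```

Without the cache, 7.5 cycles at 30 MHz is the same 250 ns as 2 cycles at 8 MHz, so the faster clock would mostly be spent waiting.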
--

Regards, Kers.      | "You're better off  not dreaming of  the things to come;
Caravan:            | Dreams  are always ending  far too soon."

jab@duke.cs.duke.edu (John A. Board) (02/09/91)

In article <16444@sdcc6.ucsd.edu>, jlodman@beowulf.ucsd.edu (Michael Lodman) writes:
> ...
> I would appreciate people sending to me or posting lists of micros
> they know of with on-chip cache. User comments would be appreciated
> as well.
>
``Reliable rumor'' has it that the next generation Transputer (the H1,
due to be announced in April) has a 16 KByte unified cache that can also
be configured to act as on-chip (i.e. fast) user-addressable RAM.
I believe one can select the two 8K sections separately as Cache/RAM.
I'm not aware of any other chip that lets the user ``program'' the cache
in this manner, but then again I'm not omniscient....  Earlier Transputers
have had on-chip RAM only; no on-chip cache.  Perhaps someone from
SGS-Thomson/INMOS can comment further.
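
If the rumour is right, the two 8-Kbyte halves allow four splits between cache and directly addressed RAM. A hypothetical Python model of that setting; the class and field names are invented here, and nothing about the real programming interface is known from this post.

```python
from dataclasses import dataclass
from enum import Enum

SECTION_BYTES = 8 * 1024     # each half of the rumoured 16 KB array


class Mode(Enum):
    CACHE = "cache"
    RAM = "ram"              # user-addressable on-chip memory, fixed access time


@dataclass
class OnChipMemoryConfig:
    """Hypothetical per-section setting for a 16 KB cache/RAM array."""
    section0: Mode = Mode.CACHE
    section1: Mode = Mode.CACHE

    def cache_bytes(self) -> int:
        """Bytes of the array currently acting as cache."""
        return SECTION_BYTES * [self.section0, self.section1].count(Mode.CACHE)

    def ram_bytes(self) -> int:
        """Bytes of the array currently acting as addressable RAM."""
        return 2 * SECTION_BYTES - self.cache_bytes()
```

A hard real-time program might pick `OnChipMemoryConfig(Mode.CACHE, Mode.RAM)`: 8K of cache for general code plus 8K of RAM whose timing does not depend on hits or misses.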

John Board                                   INET: jab@dukee.egr.duke.edu
Assistant Professor                            or  jab@duke.cs.duke.edu
Dept. Electrical Eng'g and Dept. Comp. Sci.  
Duke University                              FAX:   +1 (919) 684-4860
Durham NC USA                                VOICE: +1 (919) 660-5272

enbody@ss8.cps.msu.edu (Dr Richard Enbody) (02/09/91)

M68020   128 word (64 longwords), I-cache
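
The two figures agree: a 68000-family word is 16 bits and a longword 32 bits, so 128 words = 64 longwords = 256 bytes, matching the 256 bytes quoted elsewhere in this thread. Assuming the cache is direct-mapped with one longword per entry and indexed by address bits 7..2 (an assumption for illustration; the post gives only the size), the mapping looks like this:

```python
WORD_BYTES = 2        # 68000-family word: 16 bits
LONGWORD_BYTES = 4    # longword: 32 bits
ENTRIES = 64          # one longword per entry

CACHE_BYTES = ENTRIES * LONGWORD_BYTES
assert CACHE_BYTES == 128 * WORD_BYTES == 256


def icache_slot(addr: int) -> tuple[int, int]:
    """Return (tag, index) for a direct-mapped cache of 64 longword entries.

    ASSUMPTION: index = address bits 7..2, tag = address bits 31..8.
    """
    index = (addr >> 2) & (ENTRIES - 1)
    tag = addr >> 8
    return tag, index
```

Under this mapping, instruction addresses 256 bytes apart compete for the same entry, so a loop body longer than 256 bytes keeps evicting its own instructions.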

ckp@grebyn.com (Checkpoint Technologies) (02/09/91)

In article <2369@inews.intel.com> dlau@mipos2.UUCP (Dan Lau) writes:
>80486: 4K code, 4K data

Wrong: 8K unified.  Keeps all the self-modifying DOS code running.

>68040: 8K unified

Wrong: 4K code, 4K data.  The 68020 broke any functional self-modifying
code long ago.

>68030: 256 code

Wrong: 256 code, 256 data.

Also, the 68020 has 256 bytes of code cache, no data cache.
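
Why a unified cache keeps self-modifying code running while split caches can break it is easy to show with a toy Python model. It assumes the split design does not snoop stores into the instruction cache and that stores write through to memory; both are simplifications for illustration, not descriptions of the 80486 or 68040.

```python
class SplitCaches:
    """Toy model: separate I-cache, no coherence with data stores (assumed).
    Stores write through to memory but never update the I-cache."""

    def __init__(self, memory: bytearray) -> None:
        self.memory = memory
        self.icache: dict[int, int] = {}

    def fetch(self, addr: int) -> int:
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def store(self, addr: int, value: int) -> None:
        self.memory[addr] = value          # the stale I-cache copy survives


class UnifiedCache:
    """Toy model: one cache serves both instruction fetches and data stores."""

    def __init__(self, memory: bytearray) -> None:
        self.memory = memory
        self.lines: dict[int, int] = {}

    def fetch(self, addr: int) -> int:
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr: int, value: int) -> None:
        self.lines[addr] = value           # the same copy the next fetch will see
        self.memory[addr] = value


def patch_and_refetch(cache_cls) -> int:
    """Execute an instruction, overwrite it as data, then fetch it again."""
    memory = bytearray(64)
    memory[0x10] = 0x4E          # original "instruction" byte
    cache = cache_cls(memory)
    cache.fetch(0x10)            # instruction is now cached
    cache.store(0x10, 0x60)      # program rewrites its own code
    return cache.fetch(0x10)     # what the CPU executes next time
```

In the split model the refetch returns the old byte, so software that patches code would have to invalidate the instruction cache explicitly before running the new bytes.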

-- 
First comes the logo: C H E C K P O I N T  T E C H N O L O G I E S      / /  
                                                                    \\ / /    
Then, the disclaimer:  All expressed opinions are, indeed, opinions. \  / o
Now for the witty part:    I'm pink, therefore, I'm spam!             \/

jlodman@beowulf.ucsd.edu (Michael Lodman) (02/10/91)

In article <1991Feb8.020013.22133@Neon.Stanford.EDU> torrie@cs.stanford.edu (Evan Torrie) writes:
>Motorola 88110
>
>as examples of chips with on-chip cache.

Is Motorola planning on adding a cache on-chip to the 88100? Last time I
checked the cache was contained in the 88200 CMMU.

Thanks to all who have replied so far. I've been out of the CPU business for
a while, and plain forgot about the small 68020 cache.


scott@electron.amd.com (Scott McMahon) (02/10/91)

In article <16444@sdcc6.ucsd.edu> jlodman@beowulf.ucsd.edu (Michael Lodman) writes:
| I would appreciate people sending to me or posting lists of micros
| they know of with on-chip cache. User comments would be appreciated
| as well.

The Am29000 has a 512 byte Branch Target Cache (yes, an instruction
cache variant). The Am29050 has a 1 kbyte dual mode BTC (programmable
number of sets vs block size tradeoff).
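
A branch target cache holds the first few instructions at recently taken branch targets, so the pipeline can keep issuing while the fetch from memory gets under way. At fixed capacity, trading number of entries against block size is simple division (29K-family instructions are 32 bits). The sketch below is fully associative with FIFO replacement, a simplification; the class and the example geometries are illustrative, not the actual Am29050 modes.

```python
from __future__ import annotations

from collections import OrderedDict

INSTRUCTION_BYTES = 4   # Am29000-family instructions are 32 bits


def btc_entries(capacity_bytes: int, instructions_per_entry: int) -> int:
    """Entries available when each entry caches a block of instructions."""
    block_bytes = instructions_per_entry * INSTRUCTION_BYTES
    if capacity_bytes % block_bytes:
        raise ValueError("block size must divide the capacity evenly")
    return capacity_bytes // block_bytes


class BranchTargetCache:
    """Maps a branch target address to the first instructions at that target.

    Simplified: fully associative, FIFO replacement.
    """

    def __init__(self, capacity_bytes: int, instructions_per_entry: int) -> None:
        self.block = instructions_per_entry
        self.entries = btc_entries(capacity_bytes, instructions_per_entry)
        self.blocks: OrderedDict[int, list[int]] = OrderedDict()

    def lookup(self, target: int) -> list[int] | None:
        """Instructions to issue at once on a taken branch, or None on a miss."""
        return self.blocks.get(target)

    def fill(self, target: int, instructions: list[int]) -> None:
        """Record the instructions fetched from memory after a miss."""
        if target not in self.blocks and len(self.blocks) == self.entries:
            self.blocks.popitem(last=False)      # drop the oldest entry
        self.blocks[target] = instructions[: self.block]
```

For example, 1 Kbyte holds 64 entries of four instructions or 128 entries of two.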

-Scott
--
Scott McMahon - 29k Advanced Processor Development - Advanced Micro Devices
scott@amd.com                                        (800) 531-5202 x54985

aegl@unisoft.UUCP (Tony Luck) (02/13/91)

dlau@mipos2.intel.com (Dan Lau) writes:
>Off the top of my head, the following chips have on-chip cache (all
>on one single chip, as opposed to the Clipper, 88000 or RS6000, which
>have separate cache controller/memory chips which are used as a set):

>68040: 8K unified
>68030: 256 code


No! The mc68040 has 4k i-cache, and 4k d-cache ... and they definitely
aren't "unified". The mc68030 has 256 byte i-cache and (for what little
use it is) a 256 byte d-cache.

-Tony Luck <aegl@unisoft.com>

jlodman@beowulf.ucsd.edu (Michael Lodman) (02/13/91)

The following is the list of micros with on-chip cache I've received
to date. Thanks to all who helped out!

Intel		80486
Motorola	68020 (I-cache only)
Motorola	68030
Motorola	68040
Intel		i860
Intel		i960
Mips		R4000
Fujitsu, et al	Gmicro/200 
Fujitsu, et al	Gmicro/300
Motorola	88110
National	NS32532
Weitek		XL8220
VTI		ARM
AMD		Am29000
AMD		Am29050
LSI Logic 	LR33000
IDT		R3501
IDT		R3502
Inmos		Transputer
TI		MicroExplorer



floydg@oakhill.sps.mot.com (Floyd Goodrich-HiEnd Product Eng) (02/13/91)

In article <2369@inews.intel.com> dlau@mipos2.UUCP (Dan Lau) writes:
>Off the top of my head, the following chips have on-chip cache (all
>on one single chip, as opposed to the Clipper, 88000 or RS6000, which
>have separate cache controller/memory chips which are used as a set):
>
>80486: 4K code, 4K data
>80860: 4K code, 8K data
>80960CA: 1K code, 1K "register"
>68040: 8K unified
>68030: 256 code

  About the Motorola parts, this should be:
    68040:  4K code,  4K data
    68030: 256 code, 256 data
    68020: 256 code

-------
Floyd Goodrich
Motorola Inc., Austin TX
floydg@oakhill.sps.mot.com

rogerk@mips.COM (Roger B.A. Klorese) (02/14/91)

In article <16616@sdcc6.ucsd.edu> jlodman@beowulf.ucsd.edu (Michael Lodman) writes:
>The following is the list of micros with on-chip cache I've received
>to date. Thanks to all who helped out!
>Mips		R4000

...and R6000.
-- 
ROGER B.A. KLORESE                                  MIPS Computer Systems, Inc.
MS 6-05    930 DeGuigne Dr.   Sunnyvale, CA  94088              +1 408 524-7421
rogerk@mips.COM         {ames,decwrl,pyramid}!mips!rogerk         "I'm the NLA"
"WAR: been there, done that... hated it."  -- QueerPeace/DAGGER chant

rogerk@MIPS.com (Roger B.A. Klorese) (02/15/91)

In article <574@spim.mips.COM> rogerk@mips.COM (Some Idiot) writes:
>In article <16616@sdcc6.ucsd.edu> jlodman@beowulf.ucsd.edu (Michael Lodman) writes:
>>The following is the list of micros with on-chip cache I've received
>>to date. Thanks to all who helped out!
>>Mips		R4000
>...and R6000.

Wrong.

While the R6000 has a two-level cache like the R4000, both levels are 
implemented off-chip.

grunwald@foobar.colorado.edu (Dirk Grunwald) (02/15/91)

Here's a blurb on the H1. Scan for the word ``cache''. They don't say
much more than you already have.

From: andyr@inmos.COM (Andy Rabagliati)
Newsgroups: comp.parallel
Subject: Next generation Transputer
Summary: reprint of Article in New Electronics, Sept. 1990
Keywords: transputer, H1, inmos
Date: 25 Oct 90 13:01:51 GMT
Organization: INMOS Corporation, Colorado Springs


The following is an article that appeared in New Electronics, Sept. 1990.

Cheers,  Andy Rabagliati    US Central Applications
			    EMAIL:- rabagliatia@isnet.inmos.COM

----------------------------------------------------

Inmos - H1 - architecture revealed

Since the Transputer's launch five years ago there have been no major changes
to the microprocessor.  Now, on the eve of the device's fifth birthday, Dr
Clive Dyson outlines the architecture of the H1, the next-generation
Transputer.

The revolutionary architecture of the Inmos Transputer created a storm when
it was first revealed in the mid 1980s.  A microprocessor with on chip ram
and four serial communication links was dramatically different to other 32bit
processors then appearing on the market.

But the Transputer has matured into a credible force in the microprocessor
market and according to Dataquest figures, is now the third most popular
device behind the products of Intel and Motorola.

However, the complexion of the 32bit microprocessor market is changing with
embedded systems continuing to account for a growing share.  Dataquest
predicts the market will grow to 28 million units in 1993 from 4.3 million
units in 1988.  Over this period the share taken by system cpus is predicted
to fall to 28% from 64% while that for embedded system processors is forecast
to rise to 61% from 30%.

The Transputer is well placed to take advantage of this shift away from
'conventional' processors; indeed the majority of the 200,000 Transputers
Inmos shipped worldwide in 1989 were built into embedded systems.  This is no
idle coincidence; the architecture of the Transputer was originally designed
to suit it to the demands of real time computing.

A key requirement in most programmed systems, especially embedded systems, is
the ability of the processor to switch context efficiently at an interrupt or
timeslice between processes or tasks.  These processes also have to
communicate with each other.  The Transputer is unique in providing hardware
support for process scheduling and specific instructions for inter-process
communication.
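
The Transputer does this scheduling and channel communication in hardware and microcode. Purely as an illustration of the idea, here is a software model in Python: processes are generators on a round-robin run queue, and an occam-style channel is unbuffered, so whichever side arrives first is descheduled until its partner arrives. The request tuples and names are invented for this sketch; they are not Transputer instructions.

```python
from collections import deque


class Channel:
    """Unbuffered occam-style channel: at most one side waits at a time."""

    def __init__(self) -> None:
        self.sender = None      # (process, value) of a blocked sender
        self.receiver = None    # a blocked receiving process


def run(processes) -> None:
    """Round-robin scheduler. A process yields one of:
    ("yield",)                 end of timeslice, go to the back of the queue
    ("send", channel, value)   output value on channel
    ("recv", channel)          input from channel; the value is sent back in
    Processes still blocked when the queue empties are deadlocked.
    """
    ready = deque((proc, None) for proc in processes)
    while ready:
        proc, inbound = ready.popleft()
        try:
            request = proc.send(inbound)
        except StopIteration:
            continue                                  # process finished
        op = request[0]
        if op == "yield":
            ready.append((proc, None))
        elif op == "send":
            _, chan, value = request
            if chan.receiver is not None:             # partner already waiting
                ready.append((chan.receiver, value))
                chan.receiver = None
                ready.append((proc, None))
            else:
                chan.sender = (proc, value)           # deschedule the sender
        elif op == "recv":
            _, chan = request
            if chan.sender is not None:
                sender, value = chan.sender
                chan.sender = None
                ready.append((proc, value))
                ready.append((sender, None))
            else:
                chan.receiver = proc                  # deschedule the receiver
        else:
            raise ValueError(f"unknown request {request!r}")


def producer(chan, log):
    for i in range(3):
        log.append(f"send {i}")
        yield ("send", chan, i)


def consumer(chan, log):
    for _ in range(3):
        value = yield ("recv", chan)
        log.append(f"recv {value}")
```

Running `run([producer(ch, log), consumer(ch, log)])` interleaves the two strictly: each send completes only when the matching receive has taken the value.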

Processes can be written in C or Fortran supported by a kernel or operating
system.  Alternatively programs can be written in parallel versions of C,
Fortran, Ada or in Occam in which case the scheduling capabilities of the
Transputer are used directly.  Furthermore, as the computational loads on
embedded controllers increase, the ability of the Transputer to produce
scalable multiprocessor systems is crucial.  Message passing over dedicated
point to point links will always prove more deterministic than communication
over a shared backplane.  Finally, real time embedded systems have to be
compact both in terms of a low component count and in the amount of code they
require.  The Transputer meets both these requirements.

However, the existing Transputer can be improved further to enhance its
suitability for embedded systems and a team at Inmos Bristol design centre
have been working for the past two years to do just that.  The device,
codenamed H1, will be launched early next year.

The design goal for the H1 was to establish a new standard in single
processor performance while enhancing the Transputer's position as the
premier multiprocessing microprocessor.  This had to be achieved while
maintaining compatibility with existing Transputer products.

To meet these goals a new micro-architecture has been developed which
implements the same instruction set as the existing T805 Transputer.  The H1
provides an order of magnitude increase in performance combined with enhanced
capabilities to support the emerging software standards in the embedded
systems market.

Inmos is also designing a range of network communication products to
complement the H1.  These products are based on a new 100Mbits/s link
protocol which supports the dynamic routing of messages between processors.

The key features of the H1 architecture are a pipelined superscalar processor
alu combined with on-chip cache ram and improved communications which make
multiprocessor programming easier.

The major design goal of achieving a significant performance increase, while
maintaining instruction set compatibility with the T805 Transputer, produced
a design which gives a peak performance in excess of 150mips and 20Mflops
with a sustained performance exceeding 60mips and 10Mflops.

A number of design features have contributed to this performance.  The
processor itself uses a pipelined superscalar architecture which is able to
execute up to eight instructions on each clock cycle and operates at a clock
speed of 50MHz (a consequence of a sub-micron cmos process).

The number of cycles required to execute many of the instructions such as
integer and floating point multiply and logical shift has been reduced
dramatically.  The T805, for example, requires 38 clock cycles for an integer
multiply operation; the H1 will need a small fraction of that number.

Unlike other superscalar machines the H1 architecture does not require an
advanced compiler to schedule the different functional units in the
processor.  The flow of multiple instructions through the pipeline is
controlled by hardware.  It is not necessary for existing compilers to be
modified, or for source code to be recompiled to obtain the full claimed
performance.

The triple metal layer, sub-micron cmos process has enabled 16kbytes of
on-chip cache memory to be provided.  The move to a cached architecture is a
radical change from the simple on-chip memory provided on earlier
Transputers.  It is achievable because 16kbytes is a sufficiently large cache
to result in high hit rates for most applications.  However, it is possible
to run the cache as on-chip ram for applications which only require small
amounts of memory, or which cannot tolerate the indeterminate behaviour
caused by cache line misses.

Great care has been taken to ensure that the H1 Transputer will provide this
high performance in low component count systems.  For example, there is a
64bit data bus, which can sustain high data transfer rates for cache line
refill, with a programmable memory interface.  The interface supports four
independent banks of external memory and the timing for each bank can be
configured independently.  System designers could choose, for example, to
fill two banks with dram, one bank with video ram and the other with
peripherals.  Such a system would require no external support logic.

The H1 supports, in hardware, the same scheduling algorithms as current
Transputers.  However, on the H1 each process can be augmented with a trap
handler process.  If an error such as integer overflow occurs then the trap
handler copes with the error in software before returning control to the
process.
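
The article gives no details of the trap mechanism. As a rough software analogy only: a 32-bit add that detects overflow and hands control to a per-process handler, which decides what value the process continues with. The handler functions are hypothetical examples.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def checked_add(a: int, b: int, trap_handler) -> int:
    """Add two 32-bit signed integers; on overflow, call the process's handler."""
    result = a + b
    if INT32_MIN <= result <= INT32_MAX:
        return result
    # Per the article, control passes to the trap handler, which copes with
    # the error in software; here that is modelled as a plain function call
    # whose return value the process continues with.
    return trap_handler("integer overflow", a, b)


def saturating_handler(kind: str, a: int, b: int) -> int:
    """Example handler: clamp the result to the 32-bit range."""
    return INT32_MAX if a + b > INT32_MAX else INT32_MIN


def fatal_handler(kind: str, a: int, b: int) -> int:
    """Example handler: treat the error as fatal."""
    raise OverflowError(f"{kind}: {a} + {b}")
```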

A separate user mode is also supported by the H1.  In this mode privileged
instructions (which include communications and scheduling instructions)
cannot be executed.  All memory accesses are checked and translated from the
logical to the physical address space.

The memory protection and address translation mechanisms are designed
specifically to support secure programming and debugging in embedded systems.
For dedicated (single user) systems the protection aids the detection of
programming errors.  For multiuser general purpose compute systems it allows
users and the operating system to be protected from erroneous or malicious
programs.

By concentrating on the requirements of embedded systems the protection and
translation mechanisms allow the processor to execute code in protected
(user) mode as efficiently as in normal processes but without the overhead
associated with page-based virtual memory.  Additionally, enhancements on
the H1 allow programmers to write more efficient real time kernels; the state
of the machine, the process and timer queues, timeslicing and
interruptability can be accessed and controlled.
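
The article does not say how the H1 checks and translates addresses. One common way to get protection and translation without page tables is a small set of base-and-limit regions; the Python sketch below assumes that scheme purely for illustration, and the region layout is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One contiguous logical range mapped onto physical memory."""
    logical_base: int
    size: int
    physical_base: int
    writable: bool


class ProtectionFault(Exception):
    """Raised when a user-mode access falls outside its permitted regions."""


def translate(regions: list[Region], logical: int, write: bool) -> int:
    """Check a user-mode access and translate it to a physical address."""
    for region in regions:
        if region.logical_base <= logical < region.logical_base + region.size:
            if write and not region.writable:
                raise ProtectionFault(f"write to read-only address {logical:#x}")
            return region.physical_base + (logical - region.logical_base)
    raise ProtectionFault(f"address {logical:#x} is outside every region")


# Invented layout for illustration: read-only code and writable data.
REGIONS = [
    Region(logical_base=0x0000, size=0x4000, physical_base=0x80000, writable=False),
    Region(logical_base=0x4000, size=0x4000, physical_base=0x90000, writable=True),
]
```

Each access costs a few comparisons and one addition, with no page-table walk, which is the kind of saving the article alludes to.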

One limitation of existing Transputer networks is the need to match the
algorithms to the interconnectivity in a specific machine.  This means the
software is not readily ported to other machines with different link
topologies.  The H1 eliminates this problem by providing hardware which
allows Transputers to be connected via a quick communications network.
Communication channels may be established between processes on any two
Transputers in the network.

This simplifies programming because processes can be allocated to Transputers
after the program has been written.  Different allocations can be made for
different machines and the allocation can be changed to optimise performance.
It is possible, in principle, for the allocation to be made by the compiler
effectively removing all configuration details from the program.

The H1 Transputer itself contains a separate communications processor which
multiplexes a large number of logical communications channels (virtual
channels) onto each of its four physical links.  Messages are transmitted
along virtual channels as a sequence of packets all of which, except the
last, contain 32bytes of data.  Each packet starts with a header which is
used to route the packet through the network and to identify the destination
virtual channel on the remote Transputer.
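
The packet format described above, sketched in Python: a message is cut into 32-byte pieces, each carrying the same routing header, and only the final piece may be shorter. The article does not say how the end of a message is marked, so the `last` flag is an assumption, as is the integer header.

```python
from dataclasses import dataclass

PACKET_DATA_BYTES = 32


@dataclass(frozen=True)
class Packet:
    header: int     # routes the packet and names the remote virtual channel (format assumed)
    data: bytes     # 32 bytes, except possibly in the last packet
    last: bool      # ASSUMPTION: explicit end-of-message marker


def packetize(header: int, message: bytes) -> list[Packet]:
    """Split a message into packets; an empty message still sends one packet."""
    chunks = [message[i:i + PACKET_DATA_BYTES]
              for i in range(0, len(message), PACKET_DATA_BYTES)] or [b""]
    return [Packet(header, chunk, last=(n == len(chunks) - 1))
            for n, chunk in enumerate(chunks)]


def reassemble(packets: list[Packet]) -> bytes:
    """Rebuild the message at the destination virtual channel."""
    if not packets or not packets[-1].last:
        raise ValueError("message must end with a last packet")
    return b"".join(p.data for p in packets)
```

A 70-byte message, for instance, travels as packets of 32, 32 and 6 bytes.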

The communication network is constructed using a separate routing device, the
C104.  Small numbers of H1s can be connected using a single C104.  In larger
systems a number of C104 devices can be connected together to form a
hypercube, a multi-dimensional grid or tree network.

Each C104 has 32 bidirectional links.  The header of each packet arriving on
a link input is used to determine the output link for that packet which is
then transmitted when the output link becomes free.

An algorithm called 'interval labelling' decides which link should be the
output connection for each packet.  A contiguous set of header values, an
interval, is allocated to each output link.  The header of an incoming packet
will lie within only one range and the packet will be directed out of the
associated link.  Using this algorithm it is possible to devise the optimum
labelling scheme, which is free of deadlock, for all the common network
topologies.
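
Interval labelling turns routing into a range lookup: each output link owns one contiguous interval of header values, and a packet leaves on the link whose interval contains its header. A Python sketch using a sorted table and binary search; the three-link example (a switch in a line of eight nodes) is invented and much smaller than a 32-link C104.

```python
import bisect


class IntervalRouter:
    """Routing table of (low, high_exclusive, link) intervals that must not overlap."""

    def __init__(self, intervals: list[tuple[int, int, int]]) -> None:
        self.intervals = sorted(intervals)
        for lo, hi, _ in self.intervals:
            if lo >= hi:
                raise ValueError(f"empty interval [{lo}, {hi})")
        for (_, hi1, _), (lo2, _, _) in zip(self.intervals, self.intervals[1:]):
            if lo2 < hi1:
                raise ValueError("intervals overlap")
        self.lows = [lo for lo, _, _ in self.intervals]

    def route(self, header: int) -> int:
        """Return the output link whose interval contains the header."""
        i = bisect.bisect_right(self.lows, header) - 1
        if i >= 0:
            _, hi, link = self.intervals[i]
            if header < hi:
                return link
        raise ValueError(f"header {header} is not covered by any interval")


# Hypothetical switch in a line of nodes 0..7: node 3 is on link 0,
# link 1 leads towards nodes 0..2 and link 2 towards nodes 4..7.
router = IntervalRouter([(0, 3, 1), (3, 4, 0), (4, 8, 2)])
```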

The C104 provides additional circuitry to allow networks to be connected
together and to reduce the impact of message congestion on worst case latency
and bandwidth in heavily loaded networks.

The balance of processing power, multiprocessing capabilities and support for
standard software will allow the H1 to take up the challenges of the 1990s.

Dr Clive Dyson is the Transputer development manager at Inmos, Bristol

New Electronics September 1990