[comp.arch] Harvard Architecture

kruger@16bits.dec.com (I've got 50nS memory. What did you say?) (03/02/88)

a) An architecture working at a prestigious law firm, making over $40K to start.

b) An architecture that has separate address and data busses so that
	instruction fetches don't stop the processor from doing data transfers,
	which is what it's supposed to be doing!
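A toy cycle-count model makes definition (b) concrete (the one-cycle-per-access timing is assumed purely for illustration; nothing here models a real processor):

```python
# Toy model: cycles needed to execute N instructions, each of which
# performs one instruction fetch and one data access.
# Assumed, simplified timing: every memory access occupies a bus for
# exactly one cycle.

def cycles_shared_bus(n_instructions):
    # von Neumann: fetches and data accesses serialize on one bus,
    # so each instruction costs two bus cycles
    return n_instructions * 2

def cycles_harvard(n_instructions):
    # Harvard: separate instruction and data busses run in parallel,
    # so the fetch and the data access overlap into one cycle
    return n_instructions

print(cycles_shared_bus(1000))  # shared bus
print(cycles_harvard(1000))     # separate busses
```

Under these idealized assumptions the separate busses halve the bus time, which is the whole point of the definition above.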

Motorola claims that the 68030 has an "internal Harvard architecture" by which
they mean it has separate internal instruction and data caches.

dov

brucek@hpsrla.HP.COM (Bruce Kleinman) (03/08/88)

+-------
| Motorola claims that the 68030 has an "internal Harvard architecture" by
| which they mean it has separate internal instruction and data caches.
+-------

Ahh, those massive 256-Byte caches are really going to speed this puppy up :-)



                              Bruce Kleinman
          brucek%hpnmd@hpcea.hp.com  -or-  ...hplabs!hpnmd!brucek

              Hewlett Packard - Network Measurements Division
                          Santa Rosa, California

lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) (03/09/88)

In article <3460011@hpsrla.HP.COM> brucek@hpsrla.HP.COM (Bruce Kleinman) 
 writes about the 68030:
>Ahh, those massive 256-Byte caches are really going to speed this puppy up :-)

Actually, it will. Remember, the CDC 6600 got a win from an "instruction
stack" of 480 bits!

Plus, the two caches access in parallel (versus the one cache of the 68020).
Plus, the caches now take one clock (versus 2 on the 68020).
Plus, the caches now have burst refill (if the board designer supports it,
of course.)

All in all, a clear improvement. I don't hear any suggestions as to a better
use for the silicon.
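The one-clock-versus-two point can be quantified with the standard average-access-time formula (the hit times come from the post above; the 90% hit rate and 4-clock miss penalty are assumed purely for illustration):

```python
# Average memory-access time in clocks:
#   AMAT = hit_time + miss_rate * miss_penalty
# Hit times (2 clocks on the '020, 1 on the '030) are from the post;
# the miss rate and miss penalty are illustrative assumptions only.

def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

print(amat(hit_time=2, miss_rate=0.10, miss_penalty=4))  # '020-style cache
print(amat(hit_time=1, miss_rate=0.10, miss_penalty=4))  # '030-style cache
```

Even at the same hit rate, shaving a clock off the hit time cuts the average access time substantially, independent of the burst-refill and parallel-access wins.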
-- 
	Don		lindsay@k.gp.cs.cmu.edu    CMU Computer Science

bcase@Apple.COM (Brian Case) (03/10/88)

In article <1071@PT.CS.CMU.EDU> lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) writes:
>In article <3460011@hpsrla.HP.COM> brucek@hpsrla.HP.COM (Bruce Kleinman) 
> writes about the 68030:
>>Ahh, those massive 256-Byte caches are really going to speed this puppy up :-)

Talking about 68030.

>
>Actually, it will. Remember, the CDC 6600 got a win from an "instruction
>stack" of 480 bits ! 
>
>All in all, a clear improvement. I don't hear any suggestions as to a better
>use for the silicon.

A little birdie with an EE degree told me that you can expect maybe a 20%
improvement over a 68020 at the same clock rate.  An improvement, yes, but
a better use of silicon might have been some on-chip floating point.  Or
how about more pins so as to expose the harvard architecture to the external
world?

bcase@Apple.COM (Brian Case) (03/10/88)

In article <7614@apple.Apple.Com> bcase@apple.UUCP (Brian Case) writes:
>A little birdie with an EE degree told me that you can expect maybe a 20%
>improvement over a 68020 at the same clock rate.

Oops, I should have said that the little birdie also showed me the system
running real stuff.  That is one amazing little bird (has trouble holding
the soldering iron though).

brucek@hpsrla.HP.COM (Bruce Kleinman) (03/11/88)

+-------
| >Ahh, those massive 256-Byte caches are really going to speed this
| > puppy up :-)
|
| Actually, it will. Remember, the CDC 6600 got a win from an "instruction
| stack" of 480 bits ! 
|
| Plus, the two caches access in parallel (versus the one cache of the 68020).
| Plus, the caches now take one clock (versus 2 on the 68020).
| Plus, the caches now have burst refill (if the board designer supports it,
| of course.)
| 
| All in all, a clear improvement. I don't hear any suggestions as to a better
| use for the silicon.
+-------

All in all a clear improvement?  Over the '020, perhaps ...
The 256-byte data cache is of questionable value, as the miss-rate will be
fairly high.  I would supply numbers for "fairly high," but I can't seem to
find any miss-rate data for caches of smaller than 1 Kbyte.  Furthermore, the
restricted implementation of the data cache makes it useless in multi-processor
systems as well as systems with DMA.  The data cache has no facilities for
coherency.  Solution - disable the data cache or flush, flush away.

Suggestions as to a better use for the silicon?  OK ...
Expand the I-cache.  I suspect that losing the D-cache could make room for
a 1 Kbyte I-cache.  This would offer honest improvements in almost all systems.
High-perf machines can surround the '030 with a hefty sys-cache for
increased performance.  Memory bandwidth problems, you say?  Bring the
Harvard architecture out to the pins - which is exactly what Motorola did
for the 88000.


                              Bruce Kleinman
          brucek%hpnmd@hpcea.hp.com  -or-  ...hplabs!hpnmd!brucek

              Hewlett Packard - Network Measurements Division
                          Santa Rosa, California

pv@tut.fi (Vuorimaa Petri Kalevi) (03/12/88)

From article <7614@apple.Apple.Com>, by bcase@Apple.COM (Brian Case):

> A little birdie with an EE degree told me that you can expect maybe a 20%
> improvement over a 68020 at the same clock rate.  An improvement, yes, but
> a better use of silicon might have been some on-chip floating point.

A floating-point unit takes much more space than an instruction cache.
Actually, it takes more room than the memory management unit and instruction
cache together. You must also remember that there's just not that much
more space on the chip: Motorola had to drop the number of address
translation cache entries from 64 (MC68851) to 22 (MC68030) to fit the MMU in.
At this moment it's just not possible to add an on-chip floating-point unit
to the MC68020. So, Motorola made the best choice they could.

> Or how about more pins so as to expose the harvard architecture to the
> external world?

That means nearly 200 pins (128 + 32 + 32 + control signals).
And that's much more than I myself have ever seen (or is there such a
package?).  If the address space for instructions were smaller, then maybe
it could be possible, but that would cause trouble for the applications
already using the MC68020.
-- 
Petri Vuorimaa	   Tampere University of Technology / Computer Systems Lab
pv@tut.FI          PO. BOX. 527, 33101 Tampere, Finland

bcase@Apple.COM (Brian Case) (03/15/88)

In article <2781@korppi.tut.fi> pv@tut.fi (Vuorimaa Petri Kalevi) writes:
>Floating point unit takes much more space than instruction cache.
>Actually more room than memory management unit and instruction cache
>together. You must also remember that there's just not that much
>more space on the chip: Motorola had to drop the number of address
>translation cache entries from 64 (MC68851) to 22 (MC68030) to fit MMU in.
>At this moment it's just not possible to add on-chip floating point unit
>to MC68020. So, Motorola made the best choice they could.

Yeah, there is some problem with chip area.  However, the T800 transputer
has on-chip floating point plus a 2K byte ram plus 4 links plus....  There
are some floating-point assists that might have fit.

One thing is for sure, Mot has never been known for having the best
process technology, er, I mean densest anyway.  This may be changing
somewhat.

alan@pdn.UUCP (Alan Lovejoy) (03/19/88)

In article <7614@apple.Apple.Com> bcase@apple.UUCP (Brian Case) writes:
/In article <1071@PT.CS.CMU.EDU> lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) writes:
/>In article <3460011@hpsrla.HP.COM> brucek@hpsrla.HP.COM (Bruce Kleinman) 
/> writes about the 68030:
/>>Ahh, those massive 256-Byte caches are really going to speed this puppy up :-)
/
/Talking about 68030.
/
/>
/>Actually, it will. Remember, the CDC 6600 got a win from an "instruction
/>stack" of 480 bits ! 
/>
/>All in all, a clear improvement. I don't hear any suggestions as to a better
/>use for the silicon.
/
/A little birdie with an EE degree told me that you can expect maybe a 20%
/improvement over a 68020 at the same clock rate.  An improvement, yes, but
/a better use of silicon might have been some on-chip floating point.  Or
/how about more pins so as to expose the harvard architecture to the external
/world?

The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
cache turned on (compared to its being turned off).  Apparently the
'030 gets **AT LEAST** a 30% performance boost from turning on the data
cache (so I have been told by those who have benchmarked one).

Enough said.

--alan@pdn

mash@mips.COM (John Mashey) (03/20/88)

In article <2594@pdn.UUCP> alan@pdn.UUCP (0000-Alan Lovejoy) writes:

>The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
>cache turned on (compared to its being turned off).  Apparently the
>'030 gets **AT LEAST** a 30% performance boost from turning on the data
>cache (so I have been told by those who have benchmarked one).
>
>Enough said.

1) Even small I-caches are almost always useful: the question that started
this all was whether or not small D-caches were useful, and if so, how much,
and under what circumstances.

2) One would expect (rightly or wrongly) that overall system design would
heavily influence the benefit level of a small on-chip D-cache. I.e., one would
expect that, for example, turning the D-cache on would help more in
a Sun-3/160-style design (no external cache) than in a /260 design
(well-designed, fast external cache). (Expectations could be wrong, but...)

3) Data would help: whenever 68030 systems become widely available,
and especially if there are convenient ways to turn the caches on/off,
people could do comp.arch a large service by:
	a) running large, realistic benchmarks
	b) reporting the results
	c) reporting the clock speed and overall memory system configuration.
Until then, all we've got to go on is indirect reports, having no idea what
sorts of benchmarks and configurations are being tested. Alan: can you
possibly offer more of the details, or is it still proprietary?
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

radford@calgary.UUCP (Radford Neal) (03/21/88)

In article <2594@pdn.UUCP>, alan@pdn.UUCP (Alan Lovejoy) writes:

> The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
> cache turned on (compared to its being turned off).  Apparently the
> '030 gets **AT LEAST** a 30% performance boost from turning on the data
> cache (so I have been told by those who have benchmarked one).
> 
> Enough said.

This is a bit hard to believe, seeing as the 68020 takes two cycles
to access a word from cache, and only three to access the same word
from external memory (if my memory serves me right). Of course, the
external memory might be slow, and require wait states to be added
to the three cycles. If you're willing to go for that, however, I'm 
sure I could build a system in which turning on the cache speeds 
things up by a factor of ten. (Well, actually, I'm not sure I could,
not having much experience with a soldering iron, but you know 
what I mean... :-)

What's needed are data on: (1) the benefit of the '030's data
cache assuming zero wait states (and a 32-bit bus), (2) the
benefit for various other memory configurations, (3) the overall
benefit in systems typical of various applications.
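The argument in the first paragraph can be sketched as a simple speedup model, using the 2-clock cache hit and 3-clock external access quoted above; the hit rate and wait-state counts are assumed purely for illustration:

```python
# Sketch of the argument above: how much does turning the '020's
# cache on buy, as a function of external-memory wait states?
# The 2-clock hit and 3-clock base external access are from the
# post; the 90% hit rate is an illustrative assumption.

def speedup(hit_rate, wait_states):
    external = 3 + wait_states                       # clocks per off-chip access
    cache_on = hit_rate * 2 + (1 - hit_rate) * external
    cache_off = external
    return cache_off / cache_on

print(round(speedup(hit_rate=0.9, wait_states=0), 2))  # zero-wait memory
print(round(speedup(hit_rate=0.9, wait_states=3), 2))  # slow DRAM
```

With zero wait states the model gives only about a 1.4x win, but with three wait states it reaches 2.5x, which is exactly the point: a "cache doubles the speed" claim says as much about the external memory as about the cache.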

    Radford Neal

pf@diab.UUCP (Per Fogelstr|m) (03/21/88)

In article <2594@pdn.UUCP> alan@pdn.UUCP (0000-Alan Lovejoy) writes:
>The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
>cache turned on (compared to its being turned off).  ...............
>
>--alan@pdn

From our experience, tests have shown that turning off the internal cache in
the '020 results in a 20% slowdown if the external memory doesn't have wait
states.

So your figure must be from a system with slow external memory, right?

chow@batcomputer.tn.cornell.edu (Christopher Chow) (03/24/88)

In article <373@ma.diab.UUCP> pf@ma.UUCP (Per Fogelstr|m) writes:
|In article <2594@pdn.UUCP> alan@pdn.UUCP (0000-Alan Lovejoy) writes:
||The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
||cache turned on (compared to its being turned off).  ...............
||
||--alan@pdn
|
|From our experience, tests have shown that turning off the internal cache in
|the '020 results in a 20% slowdown if the external memory doesn't have wait
|states.
|
|So your figure must be from a system with slow external memory, right?

My own tests have also shown that on a Mac II, the 68020 instruction cache
results in a 20% performance change.  (I think the Mac II has 1 wait state.)


Christopher Chow
/---------------------------------------------------------------------------\
| Internet:  chow@tcgould.tn.cornell.edu (128.84.248.35 or 128.84.253.35)   |
| Usenet:    ...{uw-beaver|ihnp4|decvax|vax135}!cornell!batcomputer!chow    |
| Bitnet:    chow@crnlthry.bitnet                                           |
| Phone:     1-607-253-6699   Address: 7122 N. Campus 7, Ithaca, NY 14853   |
| Delphi:    chow2            PAN:  chow                                    |
\---------------------------------------------------------------------------/

alan@pdn.UUCP (Alan Lovejoy) (03/25/88)

In article <373@ma.diab.UUCP> pf@ma.UUCP (Per Fogelstr|m) writes:
>In article <2594@pdn.UUCP> alan@pdn.UUCP (0000-Alan Lovejoy) writes:
>>The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
>>cache turned on (compared to its being turned off).  ...............
>>
>>--alan@pdn
>
>From our experience, tests have shown that turning off the internal cache in
>the '020 results in a 20% slowdown if the external memory doesn't have wait
>states.
>
>So your figure must be from a system with slow external memory, right?

Absolutely correct.  150ns DRAMs to be precise.  Using 45ns SRAMs, the
figure is closer to the 20% you quoted (my source gets a 30% difference with his
benchmarks and his compiler).  Somehow only the 150ns DRAM figure stuck
in my mind (perhaps because my main interest is personal computers where
45ns SRAMs are too expensive).  The numbers from the '030 also are for
relatively slow external memory and no external cache.  Sorry, but I
can't be more specific than that.

But an interesting point is raised here:  what's good for a $50,000
workstation may not be so good for a $5000 pc, and vice-versa.  What's
good for running UNIX&C may not be good for running Smalltalk, and
vice versa.  This is not new information, but the discussion in this
group tends to lose sight of it at times.

--alan@pdn