[comp.sys.apple] Harvard Architecture explanation, & how it might be done on the 65816

toddpw@tybalt.caltech.edu (Todd P. Whitesel) (03/12/90)

q4kx@vax5.cit.cornell.edu (Joel Sumner) writes:

>1.  Could someone please explain exactly what "Harvard Architecture" is and
>how it affects the 65816 (or how it is implemented in a processor)?

First, what we're used to (the CPU spits out memory addresses, and either
reads or writes data at them) is called the von Neumann architecture, after
John von Neumann, who came up with it. It's fairly simple to implement and
you can build a computer with it, but it has one limitation -- its
fundamental structure puts a bottleneck between the CPU and its memory, set
by the bus speed, because that is the top speed at which the CPU can fetch
instructions or read and write data.
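
Back-of-the-envelope version of that bottleneck (the bus rate and the
instruction mix below are made-up numbers, just to show the shape of it):

    /* Toy model of the von Neumann bottleneck: every instruction costs
       one bus access for its own fetch plus some average number of data
       accesses, and all of them share the single bus. */
    #include <stdio.h>

    int main(void)
    {
        double bus_accesses_per_sec   = 1.0e6;  /* assumed: a 1 MHz bus    */
        double data_accesses_per_insn = 0.4;    /* assumed instruction mix */

        double peak = bus_accesses_per_sec / (1.0 + data_accesses_per_insn);

        printf("peak ~ %.0f instructions/sec\n", peak);
        return 0;
    }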

To get around this, researchers at Harvard came up with the idea of
separating the instruction and data memory so that both could be accessed
in parallel by the CPU. That is the general idea behind the term 'Harvard
architecture', but it does not have to be applied to the whole system. If
you have on-chip caching of instructions and data and you separate the two,
then they can run in parallel and speed up execution on the CPU itself
without changing the von-Neumann-ness of the rest of the system. The 68030
does this, if I am not mistaken.
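
To make the parallelism concrete, here is a minimal cycle-count sketch
(again with an assumed instruction mix, and no real cache behaviour
modelled): with one shared path to memory an instruction pays for its fetch
and then its data access, while with split instruction/data paths the two
can overlap.

    #include <stdio.h>

    /* 'd' = average data accesses per instruction (assumed).  With one
       shared memory port the fetch and the data access serialize; with
       split instruction/data ports they can overlap. */
    static double cpi_unified(double d) { return 1.0 + d; }
    static double cpi_split(double d)   { return d > 1.0 ? d : 1.0; }

    int main(void)
    {
        double d = 0.4;   /* assumed instruction mix */
        printf("one shared port: %.2f cycles/instruction\n", cpi_unified(d));
        printf("split I/D ports: %.2f cycles/instruction\n", cpi_split(d));
        return 0;
    }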

How this relates to the 65816: every 65816 clock cycle is a potential bus
operation, and at low clock speeds (where memory is as fast as the CPU
clock) this is a real advantage. At higher clock speeds, though, only
static cache RAM can run that fast, and filling main memory with RAM that
keeps up with the CPU is expensive -- so you have to use a cache.
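
Rough numbers on why (ignoring address setup and data hold margins, which
only make it worse -- the clock speeds are just example values, with 2.8
MHz being the IIGS's):

    #include <stdio.h>

    int main(void)
    {
        double mhz[] = { 1.0, 2.8, 8.0, 14.0 };   /* example clocks */
        int i;

        /* One bus operation per cycle means memory has to respond
           within one clock period. */
        for (i = 0; i < 4; i++)
            printf("%5.1f MHz -> memory must respond within %6.1f ns\n",
                   mhz[i], 1000.0 / mhz[i]);
        return 0;
    }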

Many of us want to see a cache on chip, and while we're at it we could
attack the major time-waster of the 65xxx design, which Bill Mensch _says_
he wants to address but hasn't gotten to yet: the fact that many cycles
could be trimmed off by increased parallelism within the CPU. One of the
best ways to do that is to have separate instruction and data caches and
'Harvard' them; direct page and stack operations (the 65xxx's "registers",
if you will) would then run as fast as immediate operands -- almost as fast
as the real registers, of which there are precious few.
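
For a feel for the cycles at stake: LDA #imm is 2 cycles while LDA dp is 3
(the extra cycle being the data read, with an 8-bit accumulator and the
direct page register on a page boundary). The sketch below assumes -- and
this is the speculative part -- that a data cache would let that data read
hide behind the next instruction fetch:

    #include <stdio.h>

    int main(void)
    {
        int n = 1000;          /* arbitrary run of LDA dp instructions */

        /* Today: opcode fetch, dp offset fetch and data read all share
           the one bus, so each LDA dp costs its full 3 cycles. */
        int today = n * 3;

        /* With split caches: the data read is assumed to overlap the
           following instruction fetch, saving a cycle per load. */
        int harvarded = n * 2 + 1;

        printf("today:          %d cycles\n", today);
        printf("with I/D cache: %d cycles\n", harvarded);
        return 0;
    }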

As for this breaking code: I am pretty sure it could be done such that
cache conflicts (executing code out of direct page, or storing bytes into
the instruction stream) only slow the CPU down and make it behave the way
it normally would with a single cache. I may be wrong, but I don't see what
is so hard about this, and otherwise it sounds like a real win.
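
One loose sketch of the "only slows it down" behaviour: every store checks
the instruction cache, and a hit just invalidates the line, so
self-modifying code still works -- it pays a miss on the next fetch instead
of breaking. (The cache geometry below is made up.)

    #include <stdint.h>

    #define ICACHE_LINES 64            /* made-up cache geometry */
    #define LINE_BYTES    8

    struct iline { uint32_t tag; int valid; };
    static struct iline icache[ICACHE_LINES];

    void store_byte(uint32_t addr, uint8_t value)
    {
        uint32_t lineno = (addr / LINE_BYTES) % ICACHE_LINES;

        (void)value;       /* the write itself goes to the data side */

        /* If the stored-to bytes are sitting in the I-cache, drop that
           line; the next fetch from there re-reads the modified code
           from memory. */
        if (icache[lineno].valid && icache[lineno].tag == addr / LINE_BYTES)
            icache[lineno].valid = 0;
    }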

Todd Whitesel
toddpw @ tybalt.caltech.edu