[comp.sys.amiga] RISC

daveh@cbmvax.UUCP (Dave Haynie) (10/14/87)

in article <21258@ucbvax.BERKELEY.EDU>, shs@ji.Berkeley.EDU (Steve Schoettler) says:
> Keywords: RISC cache
> Summary: Risc doesn't need big fast memory, only fast cache.
> Xref: cbmvax comp.sys.atari.st:5537 comp.sys.misc:936 comp.sys.amiga:9155
> 
> In article <2473@cbmvax.UUCP> daveh@cbmvax.UUCP (Dave Haynie) writes:
>>
>>Of course you have to track the design of main memory along with all of this,
>>too.  Even if your RISC architecture can run 20 times the instruction rate
>>of your CISC, if there's no main memory to support this, the RISC may lose
>>big, since it needs to get more instructions executed per second to keep
>>up with the more work per instruction aspect of CISC.
> 
> Why does everyone assume that for a 25MHz processor, you need 25MHz RAM?
> That's what caches are for!!!!!  If you have a good cache (95% or greater hit
> rate isn't hard to get) and keep your miss penalty low, a full speed cache
> (64K or so) with a large, slower memory will get almost the same performance
> and be much cheaper.  Finding memory as fast as the processor really hasn't
> been a problem for recent microprocessors (30ns static RAM is no big deal).

Keep in mind that when you say "25MHz processor", most people think of a 
processor like the 68020 running at 25MHz.  A 25MHz 68020 is fast, but has a
memory cycle time of "only" 120ns or so, since each memory cycle takes
three clock cycles.  Certainly a cache is nice and even reasonable to build
in static RAM for a processor running in this speed range, like the Compaq
80386 machine and its static RAM cache.  

On the other hand, folks are building RISC machines to be 1 cycle machines,
not 3 cycle machines.  So now you're talking about 40ns memory cycle times
if you really want to be as fast as possible.  Which you need to be, since
that's the big (main, only, whatever) advantage of RISC over CISC.  So now
my RAM cache is ECL, or maybe GaAs.  Or maybe too slow for the CPU.
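
The arithmetic above is easy to check.  A quick sketch (the 95% hit rate is
from Steve's post; the 200ns miss penalty is my assumption for illustration):

```python
# Back-of-the-envelope numbers from the discussion above (illustrative only).

CLOCK_MHZ = 25
cycle_ns = 1000 / CLOCK_MHZ           # 40 ns per clock at 25 MHz

# A 68020-style CISC takes 3 clocks per memory cycle:
cisc_mem_cycle_ns = 3 * cycle_ns      # 120 ns
# A single-cycle RISC needs a fresh access every clock:
risc_mem_cycle_ns = 1 * cycle_ns      # 40 ns

print(f"CISC memory cycle: {cisc_mem_cycle_ns:.0f} ns")   # 120 ns
print(f"RISC memory cycle: {risc_mem_cycle_ns:.0f} ns")   # 40 ns

# Average memory access time with a cache, using the 95% hit rate claimed
# above, a 40 ns hit time, and an assumed 200 ns miss penalty:
hit_rate, hit_ns, miss_ns = 0.95, 40, 200
amat_ns = hit_rate * hit_ns + (1 - hit_rate) * miss_ns
print(f"Average access time: {amat_ns:.0f} ns")           # 48 ns
```

So the cache buys the RISC an average access time close to its 40 ns cycle,
but only while the hit rate holds up.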

>>  So you get strange
>>new architectures that are loosely termed RISC, like load/store architectures,
>>on-chip RAM, lots of registers, etc.  
> Large register banks really aren't much of a win because of the increase in
> context switch time.  In the AM29000 (which isn't really a RISC), the 192 registers
> are divided into 64 (global data) + [ 8 (tasks) * 16 (local registers per task) ]
> If you get more than 8 tasks, context switching has to go off chip and gets
> slow again.

Actually, AMD isn't using the registers as separate task frames, but more like
a task data cache.  You swap all your registers on each task swap.  But unless
you're doing something really strange, even swapping 192 registers to memory
(at full AM29000 speed, at least) shouldn't be a lot of time compared to the
time spent within any task.  If for some reason you've got to swap tasks every
couple microseconds, this could become significant.  But most OSs that I've
used swap tasks more on the order of 100s of milliseconds.  In this case, the
only significant task-swap overhead is incurred if the swap encounters page
faults.
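
A quick sketch of that claim (the 40 ns per-register transfer time is my
assumption, not a figure from these posts):

```python
# Rough arithmetic behind the paragraph above: cost of swapping all 192
# Am29000 registers, assuming one 40 ns memory cycle per register transfer.

registers = 192
cycle_ns = 40

swap_ns = 2 * registers * cycle_ns    # save old task's 192, load new task's 192
swap_us = swap_ns / 1000
print(f"Full register swap: ~{swap_us:.2f} us")

# Against a 100 ms scheduling quantum, the overhead is in the noise:
quantum_ns = 100e6
print(f"Fraction of a 100 ms quantum: {swap_ns / quantum_ns:.4%}")

# But if tasks swapped every couple of microseconds, the swap alone would
# take longer than the whole quantum:
print(swap_us > 2.0)   # True
```

About 15 microseconds against a 100-millisecond quantum is well under 0.02%
overhead, which matches the argument above.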

> What we really need is an on-chip cache with process-id-tags so context
> switching doesn't destroy the cache completely (68020 has to flush its cache with
> every context switch).

If you could keep the overhead of managing the id-tags below that of flushing
the cache, sure.  I'll use any little tweak you can give me.

>>Dave Haynie
> 
> Steve Schoettler,  humble CS researcher.

-- 
Dave Haynie     Commodore-Amiga    Usenet: {ihnp4|caip|rutgers}!cbmvax!daveh
   "The B2000 Guy"              PLINK : D-DAVE H             BIX   : hazy
    "Computers are what happen when you give up sleeping" - Iggy the Cat

oconnor@sunray.steinmetz (Dennis Oconnor) (10/15/87)

( text in square-brackets [...] is mine. DMOC )

In article <2494@cbmvax.UUCP> daveh@cbmvax.UUCP (Dave Haynie) writes:
>On the other hand, folks are building RISC machines to be 1 cycle machines,
>not 3 cycle machines.  So now you're talking about 40ns [at 25MHz] memory cycle
>times if you really want to be as fast as possible.  Which you need to be,
>since that's the big (main, only, whatever) advantage of RISC over CISC.

Basically correct, but not entirely. RISCs indeed need about 1.5(+-.3)
memory accesses per cycle ( 1 for instructions, .2-.8 for operands ).
However, this does not directly lead to a need for RAM that has an
access time lower than the cycle time. Naturally you can separate
your instruction and data memory, and then you can pipeline the
memory, and then you can use wide memories, and then you can use
interleaved banks of memory, and then you can use look-ahead systems...
It's best to design these features in from the start, they are
horrendous to add in later ( some of them ), and they usually raise
latency and complicate the hardware ( no free lunch ), but they do
allow the processor to run using slower-than-cycle-time memories
without taking a major (or even proportionate) performance hit,
even when the slower-memory is the cache.
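
A toy model of just one of those tricks, bank interleaving, shows the effect
(the bank count and access times are illustrative, not from the post):

```python
# Toy model of interleaved memory banks: RAM whose access time is several
# CPU cycles can still deliver nearly one word per cycle for sequential
# (instruction-stream) accesses, since consecutive addresses hit different
# banks.

def cycles_to_fetch(n_banks, access_cycles, n_fetches):
    """Total cycles to fetch n_fetches sequential words, issuing at most
    one request per cycle and stalling if the target bank is still busy."""
    busy_until = [0] * n_banks
    cycle = 0
    for addr in range(n_fetches):
        bank = addr % n_banks
        start = max(cycle, busy_until[bank])   # wait for the bank if busy
        busy_until[bank] = start + access_cycles
        cycle = start + 1                      # next request next cycle
    return max(busy_until)

# 4 banks of 4-cycle (e.g. 160 ns) memory behind a 1-cycle (40 ns) CPU:
print(cycles_to_fetch(4, 4, 100))   # 103 -- nearly 1 word/cycle
# The same memory unbanked stalls the CPU to 1 word every 4 cycles:
print(cycles_to_fetch(1, 4, 100))   # 400
```

Latency per word is unchanged, but throughput approaches one word per cycle,
which is what the processor actually needs for its instruction stream.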

A better name for "RISC" might be "BFMI" : _B_ig _F_ast _M_emory that's
_I_nexpensive. ( BFMI also stands for "Brute Force and Massive Ignorance" ).
Well, when you've got dynamite you don't dig ditches with a scalpel.

>So now my RAM cache is ECL, or maybe GaAs.  Or maybe too slow for the CPU.

For 40ns? Please, get up to date. 35ns CMOS static RAMs have been
around for years, in 64K and bigger. Check out Performance, Cypress,
and IDT : the state-of-the-art in production (like, delivery NOW ) 
CMOS static RAMs is 15ns, 16k by 4. I've eye-balled them, they exist.
Someone I know has tested some older 16kx4 CMOS RAMs: 12ns access time.

>... shs@ji.Berkeley.EDU (Steve Schoettler) says:
>> What we really need is an on-chip cache with process-id-tags so context
>> switching doesn't destroy the cache completely (68020 has to flush its
>>cache with every context switch).
>
>If you could keep the overhead of managing the id-tags below that of flushing
>the cache, sure.  I'll use any little tweak you can give me.

The Stanford MIPS processor has a neat scheme they call APHID ( for
Active Process Header ID ) : replace some number of the MSb's of all
the addresses generated with a number unique to each process. Each
process thinks it is accessing memory locations 0..2**n-1, when really
each process accesses (k*2**n)+(0..2**n-1). Low overhead, allows
relocation of processes easily. Kinda grainy ( every process is
allocated some power-of-two sized block of memory, which may waste
49% of memory ) but cheap to implement, gate-wise. Solves the
cache-purge problem nicely. Also gives some protection of
processes and resources from bad processes, and provides some
fault-detection ( array-out-of-bounds kinda stuff ) capability.
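
The scheme as described boils down to a few gates' worth of bit manipulation.
A hypothetical sketch (the window size N_BITS is my choice for illustration):

```python
# Sketch of the APHID idea described above: the high bits of every address
# a process emits are replaced by a per-process ID, so each process sees
# 0..2**n - 1 while physically occupying the block starting at pid * 2**n.

N_BITS = 16                       # each process's window: a 2**16-byte block
MASK = (1 << N_BITS) - 1

def translate(pid, vaddr):
    """Map a process-local address to a physical address, APHID-style."""
    if vaddr > MASK:
        # address escapes the process's block: the fault-detection
        # (array-out-of-bounds) case mentioned above
        raise MemoryError(f"process {pid} address {vaddr:#x} out of bounds")
    return (pid << N_BITS) | vaddr

# Two processes using the same local address land in different physical
# locations, so their cache lines never collide and no purge is needed:
print(hex(translate(1, 0x1234)))   # 0x11234
print(hex(translate(2, 0x1234)))   # 0x21234
```

Since the cache only ever sees post-translation addresses, a context switch
changes the prepended ID instead of invalidating the cache contents.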

>> Steve Schoettler,  humble CS researcher.
>-- 
>Dave Haynie     Commodore-Amiga    Usenet: {ihnp4|caip|rutgers}!cbmvax!daveh
>   "The B2000 Guy"              PLINK : D-DAVE H             BIX   : hazy

Sure, hey, but comp.sys.arch is the correct place for this discussion.
In fact, they probably already had it.
--
	Dennis O'Connor 	oconnor@sungoddess.steinmetz.UUCP ??
				ARPA: OCONNORDM@ge-crd.arpa
        "If I have an "s" in my name, am I a PHIL-OSS-IF-FER?"