[net.arch] RISC vs INTEL

dan@pyramid.UUCP (Danial Carl Sobotta) (02/27/86)

clif@intelca.UUCP (Clif Purkiser) writes:

>The RISC machine I designed for my Computer Architecture course
>(taught by Mr. RISC, Dave Patterson) had 24 instructions and two addressing
>modes; it didn't even have a multiple.  I was very happy at the time that
>I didn't have to try to write microcode for a lot of complex instructions!
>(The previous class had to implement the Z8000 with TTL SSI and MSI parts.)
>Still, my machine sure took a long time to do useful work.  Needless to say,
>my RISC computer was a toy compared to real RISC machines.

I would be interested to see how much of your performance degradation was
due to *your* implementation as opposed to the nature of RISC in itself.

>J. Giles's point is well taken that compiler writers have not yet figured
>out a way of taking advantage of the numerous addressing modes and
>instructions offered by CISC machines such as the 86, Z8K, 68K or 32K families.
>I would even concede that you can still have a RISC machine that takes more
>than one clock to complete an instruction.  This allows an important
>instruction such as multiple to be included in the instruction set.
                    ^^^^^^^^
Multiple what?? (block move, n-bit decimal add, ...).  Yes, there are
RISC machines that have some multiple cycle instructions.  In my experience,
though, something like a block move can usually be implemented with a loop
(especially using an efficient branch scheme) of simpler instructions with
*no loss of performance*.

>The Ascii Adjust
>instructions are used by COBOL (yes, people still use it) compilers and
>spreadsheet designers because COBOL uses BCD and these instructions speed
>this process up.  Likewise, XLAT can be used for converting ASCII to EBCDIC
>in 5 clocks.

It is indeed a shame that people still use COBOL, but nevertheless, most
of the convert type instructions can be done easily (and maybe faster!)
using a RISC architecture.  Converting ASCII to EBCDIC in 5 cycles?? Excuse
me if I'm ignorant, but wouldn't a simple table lookup do?? If this
conversion were somewhat popular, then it's likely that the RISC equivalent
routine AND the lookup-table would be in cache, where this operation could
be done in fewer than 5 cycles!
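
The table lookup suggested above can be sketched in a few lines.  This is
only an illustration, not anyone's actual routine: it assumes code page 037
(Python's built-in "cp037" codec) as a stand-in for "EBCDIC", of which there
are several variants, and the names ASCII_TO_EBCDIC and to_ebcdic are made
up for the example.  The point is that each converted byte costs one indexed
load from a 256-entry table, which is essentially what XLAT does.

```python
# Build the 256-entry table once: index = input byte, value = EBCDIC byte.
# Assumption: code page 037 (Python's "cp037" codec) stands in for EBCDIC.
ASCII_TO_EBCDIC = bytes(range(256)).decode("latin-1").encode("cp037")


def to_ebcdic(data: bytes) -> bytes:
    """Convert each byte with a single indexed load from the table."""
    return bytes(ASCII_TO_EBCDIC[b] for b in data)


if __name__ == "__main__":
    # Show the converted bytes in hex for a short sample string.
    print(to_ebcdic(b"RISC").hex())
```

Whether the loop beats a dedicated instruction depends on the table and the
loop both staying in cache, which is the assumption made in the text above.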

>One disagreement I have with the RISC proponents is the theory that
>everyone writes in a HLL.  It seems that despite years of trying to force
>everyone to write in HLLs there will always be a few assembly
>language programmers, because no matter how much performance semiconductor and
>computer companies give programmers they always want their programs to run
>faster.  So these CISC instructions, while not useful to compiler writers, are
>useful to assembly language jocks.

There are such things as MACROS, which allow assembly-language hacks to define
their OWN CISC 'instructions' (rather than have some arch designer tell them
what they can and can't have).

>This probably compiles into these RISC instructions, assuming x, a, and i
>are in registers R1, R2, and R3 respectively.
>ShiftL R4, R3, #2
>Add    R4, R4, R2
>Load   R1, [R4]
>This is a single 4 clock instruction on the 80386 vs 3 clocks for the RISC
>chip.  However, the RISC chip has had to fetch 3 instructions vs one for 
>the CISC processor.  So unless the RISC chip has a large on-chip cache it
>will be slower.    

Well, 3 cycles IS faster than 4!  Furthermore, the 3 RISC instructions
will most likely be together in cache, where they can be fetched as fast
as any microword could be.  YES, a RISC chip will have a much larger
cache (all else being equal) than a CISC chip. Plus, there should be
plenty of room left to add some performance features that the 80386
obviously doesn't have.
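
For readers who want to trace the three quoted instructions, here is a
minimal sketch of what they compute.  It is a toy model, not a simulator of
any real chip: the function name run_indexed_load and the dictionary-based
memory are invented for the example, and the 4-byte element size is inferred
from the shift count of 2 in the quoted code.

```python
def run_indexed_load(memory, base, index):
    """Trace ShiftL/Add/Load for a 4-byte-element array access (toy model)."""
    # Register assignment follows the quoted post: a in R2, i in R3, x in R1.
    regs = {"R2": base, "R3": index}
    regs["R4"] = regs["R3"] << 2          # ShiftL R4, R3, #2  (i * 4)
    regs["R4"] = regs["R4"] + regs["R2"]  # Add    R4, R4, R2  (a + i * 4)
    regs["R1"] = memory[regs["R4"]]       # Load   R1, [R4]    (x = a[i])
    return regs["R1"]


if __name__ == "__main__":
    # Hypothetical array of four words starting at byte address 100.
    mem = {100 + 4 * k: 10 * k for k in range(4)}
    print(run_indexed_load(mem, base=100, index=3))
```

The sketch only shows the data flow; it says nothing about how many clocks
each step takes on either machine.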

>I think that the good thing about the RISC philosophy is that it
>will reduce the tendency of designers to add new instructions or addressing
>modes just because they look whizzy or Brand X has them.  If a complex way of
>doing something is slower than a simple way don't put it in.

Hey! I won't argue with THAT.  How do ya think most of CISC got developed
anyway?
	

-- 


  'Out of the inkwell comes Bozo the Clown ...'
 
DISCLAIMER:  These opinions are neither mine nor my C-compiler's
       sun!pyramid!dan

hammond@petrus.UUCP (Rich A. Hammond) (03/03/86)

> clif@intelca.UUCP (Clif Purkiser) writes:
> 
> >The RISC machine I designed for my Computer Architecture course ...
to which sun!pyramid!dan responds:
> ...  Yes, there are
> RISC machines that have some multiple cycle instructions.  In my experience,
> though, something like a block move can usually be implemented with a loop
> (especially using an efficient branch scheme) of simpler instructions with
> *no loss of performance*.
> 
*NO LOSS OF PERFORMANCE*?!?  No way!  Look, a memory block move is RISC
at its worst, since the block move defeats the data cache.  An M68000
(16 bit data and instruction bus) is FASTER than the UCB RISC (32 bit
data and instruction bus) for equivalent 32 bit at a time block moves.
Even if the RISC instructions are in a cache, the data isn't and what's
worse, every other data access is a write.  Essentially, the block transfer
measures data bus bandwidth, not cache bandwidth, which is what RISCs
exploit.

Be careful when comparing "clock ticks".  First, no RISC I've
seen written up actually finishes an instruction in ONE cycle, but since
the RISC is pipelined, an instruction completes every cycle.  There
is a difference!
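
The difference between finishing an instruction in one cycle and completing
one instruction per cycle can be put in numbers.  The sketch below assumes an
idealized pipeline with no stalls or branch penalties; the function name
pipeline_cycles and the stage count used in the example are hypothetical and
not taken from any particular RISC.

```python
def pipeline_cycles(n_instructions: int, stages: int) -> int:
    """Total cycles for n instructions on an ideal pipeline with no stalls."""
    if n_instructions == 0:
        return 0
    # The first instruction needs `stages` cycles to drain through the pipe;
    # after that, one more instruction completes on every cycle.
    return stages + n_instructions - 1


if __name__ == "__main__":
    stages = 3  # hypothetical pipeline depth
    for n in (1, 10, 1000):
        total = pipeline_cycles(n, stages)
        print(n, total, total / n)  # average cycles per instruction
```

Each instruction still has a latency of `stages` cycles; only the average
cost per instruction approaches one cycle as the instruction count grows.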

Second, the 68000 family and Intel's *86 family tend to have high
frequency clocks divided on chip for micro-cycle timing.  RISC chips
often have much slower clocks, sometimes with separate phases
generated off chip.  What you need to compare is the clock rate that
allows each chip to run with main memory of the same speed.
I.e., hold the time from address valid until data returns through the
buffers to the CPU constant, then calculate the clock rate at which
that time is the CPU's maximum memory cycle time.

Rich Hammond	{allegra, ucbvax,decvax} !bellcore!hammond

david@ztivax.UUCP (03/06/86)

> hammond@petrus writes:
> sun!pyramid!dan writes:
>> ...  Yes, there are
>> RISC machines that have some multiple cycle instructions.  In my experience,
>> though, something like a block move can usually be implemented with a loop
>> (especially using an efficient branch scheme) of simpler instructions with
>> *no loss of performance*.
> 
>*NO LOSS OF PERFORMANCE*?!?  No way!  Look, a memory block move is RISC
>at its worst, since the block move defeats the data cache.  An M68000
>(16 bit data and instruction bus) is FASTER than the UCB RISC (32 bit
>data and instruction bus) for equivalent 32 bit at a time block moves.
>Even if the RISC instructions are in a cache, the data isn't and what's
>worse, every other data access is a write.  Essentially, the block transfer
>measures data bus bandwidth, not cache bandwidth, which is what RISCs
>exploit.

OK, so here we have a classic example of how a CISC instruction does
not help.  A CISC microcodes the action on-chip.  A RISC uses on-chip
cache.  Both are speed limited by the memory access times.  That is
possibly why the 68000 can do this as fast as the RISC - were the
memory access times similar?

So here, RISC at its worst, the same as CISC at its best?

David Smyth
Free and proud of it

seismo!unido!ztivax!david

hammond@petrus.UUCP (Rich A. Hammond) (03/10/86)

I pointed out that a 68000 could do block moves of 32 bit words FASTER
than the UCB RISC I or II for equivalent memory access times.
> David Smyth responded:
> 
> OK, so here we have a classic example of how a CISC instruction does
> not help.  A CISC microcodes the action on-chip.  A RISC uses on-chip
> cache.  Both are speed limited by the memory access times.  That is
> possibly why the 68000 can do this as fast as the RISC - were the
> memory access times similar?
> 
> So here, RISC at its worst, the same as CISC at its best?

NO WAY, as I pointed out, the 68000 has a 16 bit data bus (i.e. 2 memory
cycles for each read and write of 32 bit words) while the RISC I & II
have a 32 bit data bus.  IF the CISC 68000 had the same size data bus
it would save 2 memory cycles out of a total of 7, and be about 30%
faster than the RISC.  Of course this benchmark is never included for
RISC vs CISC comparisons.  However, copy loops occur much more frequently
in real code than benchmarks such as Ackermann's function.
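
The cycle arithmetic in this post can be checked with a short sketch.  The
total of 7 memory cycles per 32-bit word is taken from the post as given;
how the remaining cycles break down is not stated there, so nothing is
assumed about it.  The function names data_cycles_per_word and
fraction_saved are made up for the example.

```python
def data_cycles_per_word(word_bits: int, bus_bits: int) -> int:
    """Memory cycles to read and then write one word over a given bus width."""
    transfers = -(-word_bits // bus_bits)  # ceiling division
    return 2 * transfers                   # one read plus one write


def fraction_saved(total_cycles: int, saved_cycles: int) -> float:
    """Fraction of the original cycle count that is eliminated."""
    return saved_cycles / total_cycles


if __name__ == "__main__":
    narrow = data_cycles_per_word(32, 16)  # 68000 as built: 16 bit bus
    wide = data_cycles_per_word(32, 32)    # hypothetical 32 bit bus
    saved = narrow - wide
    print(narrow, wide, saved)
    print(fraction_saved(7, saved))        # 7 is the total given in the post
```

Saving 2 cycles out of 7 removes a little under 29% of the memory cycles,
which is where the "about 30%" figure comes from.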

Rich Hammond

aglew@ccvaxa.UUCP (03/14/86)

>/* Written  6:40 am  Mar 10, 1986 by hammond@petrus in ccvaxa:net.arch */
>I pointed out that a 68000 could do block moves of 32 bit words FASTER
>than the UCB RISC I or II for equivalent memory access times.
>> David Smyth responded:
>> ...
>> So here, RISC at its worst, the same as CISC at its best?
>
>NO WAY, as I pointed out, the 68000 has a 16 bit data bus (i.e. 2 memory
>cycles for each read and write of 32 bit words) while the RISC I & II
>have a 32 bit data bus.  IF the CISC 68000 had the same size data bus
>it would save 2 memory cycles out of a total of 7, and be about 30%
>faster than the RISC.  Of course this benchmark is never included for
>RISC vs CISC comparisons.  However, copy loops occur much more frequently
>in real code than benchmarks such as Ackermann's function.
>
>Rich Hammond

Well, let's be fair. How about looking at the benchmarks RISC did include?
Not just Ackermann's function.

In `A VLSI RISC', Computer 1982, Patterson and Sequin present SIMULATED
results. Among them:
	SED	- the UNIX stream oriented editor
	Speed   RISC I (sim) / VAX-11/780   1.1
		i.e. the VAX is 10% slower on this text processing program
		than RISC

OK, those are simulations. How about some actual results?: `Running RISCs',
Foderaro, Van Dyke, and Patterson, VLSI Design, Sept/Oct 1982. String search.
	Machine   Clock     Wait states   Time
	MC68000   8 MHz     2             4.7 ms
	RISC I    1.5 MHz   0             2.5 ms
There are a lot more recent benchmarks, but this is the one that impressed me.
The very first RISC I, with bugs, a clock rate about a third what they'd
hoped for, and still they benchmarked faster than a much more mature machine.
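
To put the string search figures on a per-clock basis, the times can be
multiplied by the clock rates.  This is a rough sketch using only the
numbers quoted above; it ignores what happens within each clock (including
the 68000's wait states), and the function name clock_cycles is made up for
the example.

```python
def clock_cycles(time_ms: float, clock_mhz: float) -> float:
    """Clock cycles elapsed: milliseconds times megahertz times 1000."""
    return time_ms * clock_mhz * 1000.0


if __name__ == "__main__":
    m68k = clock_cycles(4.7, 8.0)   # MC68000: 4.7 ms at 8 MHz
    risc = clock_cycles(2.5, 1.5)   # RISC I:  2.5 ms at 1.5 MHz
    print(round(m68k), round(risc))
    print(4.7 / 2.5)                # wall-clock ratio of the two runs
```

On these figures the RISC I finished in roughly a tenth as many clock ticks,
though, as noted earlier in the thread, clock ticks on different machines
are not directly comparable.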