[comp.sys.mac] Pierce Explains RISC. was new mac rumors

wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) (02/26/89)

> 
> This is very interesting.  I don't have the strongest background in
> hardware architecture, but, could you please explain how a processor
> could be optimized for a specific high level language?
> 
     The speed of a microprocessor is somewhat proportional to the number
of instructions it has to implement. For instance the 6502 can do every 
instruction in one clock cycle, while the 68000 can take up to 70.

     However, most HLLs use 20% of the instructions 80% of the time (the 80-20
rule). If we optimize those instructions, and allow the others to be
constructed out of the major instructions, we can get a significant speed
increase.

  The other thing that most HLLs need is lots and lots of registers. The
68000 has 16, but 1 is a stack pointer(a7), 1 points to the global area(a5), and
another points to the local area(a6), and one is used to return function values
(d0), leaving only 5 address and 7 data registers available. Some RISC chips
on the other hand have up to 25 registers, a 256byte data cache, and a program
cache. Some even have registers big enough to hold an entire 96-bit floating
point number.

   That's the basic gist of how Reduced Instruction Set CPUs work. If you want
more info, you can get the data sheets for the 88000 or one of the other new
RISC chips; they go into a lot more detail.
Pierce
-- 
____________________________________________________________________________
You can flame or laud me at:
wetter@tybalt.caltech.edu or wetter@csvax.caltech.edu or pwetter@caltech.bitnet
  (There would be a witty saying here, but my signature has to be < 4lines)

trebor@biar.UUCP (Robert J Woodhead) (02/27/89)

In article <9770@cit-vax.Caltech.Edu> wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) writes:
>     The speed of a microprocessor is somewhat proportional to the number
>of instructions it has to implement. For instance the 6502 can do every 
>instruction in one clock cycle, while the 68000 can take up to 70.

Many 6502 instructions are >1 cycle long, as a glance at any 6502 programming
manual will quickly confirm.  The true speed of any processor is

(# of cycles per second) * (how much an average instruction does)
-----------------------------------------------------------------
        (average number of cycles per instruction)

The answer to the question "how much an average instruction does" depends
on who you are talking to.  However, it is a safe bet to say that an
average 68000 instruction does more than an average 6502 instruction.

+---------------------------------------------------------------------------+
| Robert J Woodhead      !uunet!cornell!biar!trebor     CompuServe 72447,37 |
| Biar Games, Inc., 10 Spruce Lane, Ithaca NY 14850  607-257-1708,3864(fax) |
+---------------------------------------------------------------------------+
| Games written, Viruses killed   "I'm the head honcho of this here spread; |
| While U Wait.  Take a number.    I don't need no stinking disclaimers!!!" |
+---------------------------------------------------------------------------+

tim@crackle.amd.com (Tim Olson) (02/27/89)

In article <9770@cit-vax.Caltech.Edu> wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) writes:
| > 
| > This is very interesting.  I don't have the strongest background in
| > hardware architecture, but, could you please explain how a processor
| > could be optimized for a specific high level language?
| > 
|      The speed of a microprocessor is somewhat proportional to the number
| of instructions it has to implement. For instance the 6502 can do every 
| instruction in one clock cycle, while the 68000 can take up to 70.

This has more to do with the 6502 having hardwired decode, while the 68k
is microcoded. (The 6502, by the way, takes at least 2 clock cycles to
execute an instruction, and can take up to 7, depending upon addressing
modes).

|   The other thing that most HLL need is lots and lots of registers. The
| 68000 has 16, but 1 is a stack pointer(a7), 1 points to the global area(a5), and
| another points to the local area(a6), and one is used to return function values
| (d0), leaving only 5 address and 7 data registers available. Some RISC chips
| on the other hand have up to 25 registers, a 256byte data cache, and a program
			       ^^^^^^^^^^^^
Most RISC chips have at least 31 GP registers (they usually reserve 1
for a constant 0); some have more.  SPARC implementations
currently have ~120 registers, and the Am29000 has 192.

|    That's the basic gist of how Reduced Instruction Set CPUs work. If you want
| more info, you can get the data sheets for the 88000 or one of the other new
| RISC chips; they go into a lot more detail.
| Pierce

Here's another quickie explanation:


For a RISC machine to be faster than a CISC machine, it simply must take
fewer cycles to complete the overall program, even if this means
executing more instructions:

                                                     1
    Performance = 1/sec = cycles/sec * -----------------------------
                                       cycles/inst  *  [total inst]


Thus, we can improve performance by raising the cycles/sec (increasing
the clock frequency; basically a processing problem), decreasing the
total number of instructions executed (by making them complex: CISC), or
decreasing the number of cycles that an instruction requires (by making
them simple: RISC).  Note that these variables are not independent; it
is hard to make very complex instructions run fast, etc. 
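
As a rough illustration of the equation above, here is a small C sketch.
The function names, and the instruction counts, cycle counts, and clock
rate used in the note after it, are made up for illustration and are not
measurements of any real chip.

```c
#include <assert.h>

/* Seconds to run one program:
   (total instructions * cycles per instruction) / clock rate. */
double run_time_seconds(double total_inst, double cycles_per_inst,
                        double cycles_per_sec)
{
    assert(cycles_per_sec > 0.0);
    return (total_inst * cycles_per_inst) / cycles_per_sec;
}

/* Performance in the sense of the equation above: programs completed
   per second, i.e. the reciprocal of the run time. */
double performance(double total_inst, double cycles_per_inst,
                   double cycles_per_sec)
{
    assert(total_inst > 0.0 && cycles_per_inst > 0.0);
    return cycles_per_sec / (cycles_per_inst * total_inst);
}
```

With hypothetical numbers at the same 16 MHz clock, a CISC program of
1,000,000 instructions averaging 6 cycles each gives
run_time_seconds(1.0e6, 6.0, 16.0e6) = 0.375 seconds, while a RISC version
needing 1,500,000 instructions averaging 1.5 cycles each gives 0.140625
seconds: more instructions, but fewer total cycles.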


That is the view from the hardware side.  However, software
(specifically optimizing compilers) plays just as important a role in the
RISC performance picture.  One can make the argument that RISC & CISC
look very similar at the "micromachine" level, and that the fetching of
a microinstruction from the microcode on a CISC machine is somewhat like
a RISC machine fetching an instruction.  Now the CISC machine has
hard-wired microcode to execute from, while the RISC machine
instructions are "custom-tailored" by the compiler for the problem at
hand.

For example, let's look at a typical loop:

	for (i=0; i<MAX; ++i)
		a[i] = 0;

A CISC machine may have a single instruction that performs the inner
statement, by using an indexed base+offset addressing mode.  However,
each time through the loop it must fetch the 32-bit base address of the
array "a", multiply the index variable i by the size of the elements of
a, add the two values together to form an address, then store 0 out to
that location.

A highly-optimizing compiler can recognize that the base of the array
never changes (so it can be computed in a register before the loop
begins [loop-invariant code motion]), and we can increment this address
by the size of each element, rather than incrementing by 1 and then
multiplying (or shifting) [strength-reduction].  Now the loop consists
of a few, simple instructions (store, add, compare, branch), which
matches nicely with what is provided by the RISC machine (and they are
performed quickly, because they are executed directly instead of being
interpreted by another level of microcode).
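
The two transformations described above can be sketched by hand in C.
This is only an illustration of what an optimizer might produce, not the
output of any particular compiler; the function names and the use of an
int array are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* The loop as written: every pass indexes from the array base. */
void clear_indexed(int *a, size_t max)
{
    size_t i;

    assert(a != NULL);
    for (i = 0; i < max; ++i)
        a[i] = 0;           /* address = base + i * sizeof(int) each pass */
}

/* Hand-applied loop-invariant code motion and strength reduction. */
void clear_strength_reduced(int *a, size_t max)
{
    int *p;
    int *end;

    assert(a != NULL);
    p = a;                  /* invariant base address, computed once */
    end = a + max;          /* loop bound turned into an address, once */
    while (p != end) {      /* compare, branch */
        *p = 0;             /* store */
        ++p;                /* add: step by one element size, no multiply */
    }
}
```

Both functions leave the array in the same state; the second one just
spells out the store, add, compare, branch sequence in source form.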


	-- Tim Olson
	Advanced Micro Devices
	(tim@crackle.amd.com)

wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) (02/27/89)

>>     The speed of a microprocessor is somewhat proportional to the number
>>of instructions it has to implement. For instance the 6502 can do every 
>>instruction in one clock cycle, while the 68000 can take up to 70.
>
> Many 6502 instructions are >1 cycle long, as a glance at any 6502 programming
> manual will quickly confirm.  The true speed of any processor is

    Ooops, sorry. I meant many 6502 instructions take ~ 1 cycle. I obviously
didn't expect a multiply to run in one cycle.
Pierce

keith@Apple.COM (Keith Rollin) (02/27/89)

In article <9795@cit-vax.Caltech.Edu> wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) writes:
>>>     The speed of a microprocessor is somewhat proportional to the number
>>>of instructions it has to implement. For instance the 6502 can do every 
>>>instruction in one clock cycle, while the 68000 can take up to 70.
>>
>> Many 6502 instructions are >1 cycle long, as a glance at any 6502 programming
>> manual will quickly confirm.  The true speed of any processor is
>
>    Ooops, sorry. I meant many 6502 instructions take ~ 1 cycle. I obviously
>didn't expect a multiply to run in one cycle.

You're digging in deeper. To wit:

- ALL 6502 instructions take at least 2 cycles (even NOP).
- There is no multiply instruction.

------------------------------------------------------------------------------
Keith Rollin  ---  Apple Computer, Inc.  ---  Developer Technical Support
INTERNET: keith@apple.com
    UUCP: {decwrl, hoptoad, nsc, sun, amdahl}!apple!keith
"Argue for your Apple, and sure enough, it's yours" - Keith Rollin, Contusions

trebor@biar.UUCP (Robert J Woodhead) (02/27/89)

In article <9795@cit-vax.Caltech.Edu> wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) writes:
>    Ooops, sorry. I meant many 6502 instructions take ~ 1 cycle. I obviously
>didn't expect a multiply to run in one cycle.

The 6502 does not have a multiply instruction.  As I remember it (I haven't
looked up the instruction timings in years), each cycle on a 6502 is a
complete bus operation.  Thus, the only instructions
that are 1 cycle long are those without memory or immediate operands, such as
TAX or CLC.  This is a very small number of instructions.

The 6502 was, for its time, an extremely elegant architecture.  After all,
the chip really has 128 16-bit registers; they just happen to be in the
first 256 bytes of ram.  However, it was designed before, and does not even
approach, the RISC concept of 1 instruction = 1 cycle.

trebor@biar.UUCP (Robert J Woodhead) (02/28/89)

In article <165@biar.UUCP> trebor@biar.UUCP (Robert J Woodhead) [me] writes:
>The 6502 does not have a multiply instruction.  As I remember it (I haven't
>looked up the instruction timings in years), each cycle on a 6502 is a
>complete bus operation.  Thus, the only instructions
>that are 1 cycle long are those without memory or immediate operands, such as
>TAX or CLC.  This is a very small number of instructions.

Well, foot-in-mouth disease is catching.  My onboard memory not being
parity checked, I blew it.  There are no 1 cycle 6502 instructions, as
a gentleman from Apple mentions in his response to the original post.

Though come to think of it, there were times when I was writing disk II
code when I wished there were a few....

daveh@cbmvax.UUCP (Dave Haynie) (03/01/89)

in article <9795@cit-vax.Caltech.Edu>, wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) says:

>>> For instance the 6502 can do every 
>>> instruction in one clock cycle, while the 68000 can take up to 70.

>> Many 6502 instructions are >1 cycle long, as a glance at any 6502 programming
>> manual will quickly confirm.  The true speed of any processor is

>     Ooops, sorry. I meant many 6502 instructions take ~ 1 cycle. I obviously
> didn't expect a multiply to run in one cycle.
> Pierce

Well, no problem, the 6502 doesn't have a multiply instruction.  I think 
there's a little mixing of ideas here.  Without wait states, the 6502 takes 
only 1 clock cycle for each bus access.  By contrast, a 68000 takes 4, a
68020 takes 3, and a 68030 takes 2 (the IIx and SE/30 both have wait states).
Bus access is only part of it, though.  As far as actual instructions go,
the 680x0 chips can take 60 or more clock cycles to do a multiply.  The 
6502's longest instruction takes around 7 clocks, but its 32-bit multiply
routine will undoubtedly take longer at the same clock frequency than the
68030.  Perhaps a RISC processor at the same clock frequency can execute one
whole reasonable instruction in one cycle, and finish such a multiply faster
than the 60 clocks it might take the 68030.  The bottom line is who gets the
most actual work done in a particular chunk of time.  In some cases, it may
even be the 6502 that wins (I can think of one case it beats the 68000 at
1/2 the clock speed), but for the most part, don't count on it.

-- 
Dave Haynie  "The 32 Bit Guy"     Commodore-Amiga  "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: D-DAVE H     BIX: hazy
              Amiga -- It's not just a job, it's an obsession

t-stephp@microsoft.UUCP (Stephen Poole) (03/02/89)

In article <9795@cit-vax.Caltech.Edu> wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) writes:
>    Ooops, sorry. I meant many 6502 instructions take ~ 1 cycle. I obviously
>didn't expect a multiply to run in one cycle.
>Pierce

Particularly unlikely in light of the fact that the 6502 has no multiply
instruction.  

Just out of curiosity, have you ever actually written code for the 6502 or
a RISC chip?
-- 
-- Stephen D. Poole -- t-stephp@microsoft.UUCP -- Mac II Fanatic --
--                                                               --
-- I'm just an Oregon Tech Software Engineering co-op at  Micro- --
-- soft.  Believe me, nobody here pays attention to my opinions! --