wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) (02/26/89)
>
> This is very interesting. I don't have the strongest background in
> hardware architecture, but, could you please explain how a processor
> could be optimized for a specific high level language?
>

  The speed of a microprocessor is somewhat proportional to the number
of instructions it has to implement. For instance the 6502 can do every
instruction in one clock cycle, while the 68000 can take up to 70.
However, most HLLs use 20% of the instructions 80% of the time (the
80-20 rule). If we optimize those instructions, and allow the others to
be constructed out of the major instructions, we can get a significant
speed increase.

  The other thing that most HLLs need is lots and lots of registers. The
68000 has 16, but 1 is a stack pointer (a7), 1 points to the global area
(a5), another points to the local area (a6), and one is used to return
function values (d0), leaving only 5 address and 7 data registers
available. Some RISC chips, on the other hand, have up to 25 registers,
a 256-byte data cache, and a program cache. Some even have registers big
enough to hold an entire 96-bit floating point number.

  That's the basic gist of how Reduced Instruction Set CPUs work. If you
want more info, you can get the data sheets for the 88000, or one of the
other new RISC chips, and they'll go into a lot more detail.

Pierce
--
____________________________________________________________________________
You can flame or laud me at:
wetter@tybalt.caltech.edu or wetter@csvax.caltech.edu or pwetter@caltech.bitnet
  (There would be a witty saying here, but my signature has to be < 4lines)
trebor@biar.UUCP (Robert J Woodhead) (02/27/89)
In article <9770@cit-vax.Caltech.Edu> wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) writes:
> The speed of a microprocessor is somewhat proportional to the number
>of instructions it has to implement. For instance the 6502 can do every
>instruction in one clock cycle, while the 68000 can take up to 70.

Many 6502 instructions are >1 cycle long, as a glance at any 6502 programming
manual will quickly confirm. The true speed of any processor is

  (# of cycles per second) * (how much an average instruction does)
  -----------------------------------------------------------------
             (average number of cycles per instruction)

The answer to the question "how much an average instruction does" depends
on who you are talking to. However, it is a safe bet to say that an average
68000 instruction does more than an average 6502 instruction.

+---------------------------------------------------------------------------+
| Robert J Woodhead     !uunet!cornell!biar!trebor     CompuServe 72447,37  |
| Biar Games, Inc., 10 Spruce Lane, Ithaca NY 14850 607-257-1708,3864(fax)  |
+---------------------------------------------------------------------------+
| Games written, Viruses killed   "I'm the head honcho of this here spread; |
| While U Wait.  Take a number.    I don't need no stinking disclaimers!!!" |
+---------------------------------------------------------------------------+
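The ratio in the post above can be written as a small C function. This is only a sketch: the name effective_speed and its parameters are hypothetical, and "how much an average instruction does" is treated as an arbitrary unit of work chosen by the caller, since the post deliberately leaves that quantity open.

```c
#include <assert.h>

/*
 * Hypothetical helper, not from the thread.
 * Returns work units per second:
 *   (cycles per second) * (work per average instruction)
 *   ----------------------------------------------------
 *           (average cycles per instruction)
 */
double effective_speed(double cycles_per_sec,
                       double work_per_inst,
                       double cycles_per_inst)
{
    /* A processor cannot finish an instruction in zero cycles. */
    assert(cycles_per_inst > 0.0);
    return cycles_per_sec * work_per_inst / cycles_per_inst;
}
```

For instance, effective_speed(1000000.0, 1.0, 4.0) yields 250000 work units per second for a 1 MHz part averaging 4 cycles per instruction. Those inputs are made-up numbers for illustration, not measurements of any real chip.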
tim@crackle.amd.com (Tim Olson) (02/27/89)
In article <9770@cit-vax.Caltech.Edu> wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) writes:
| >
| > This is very interesting. I don't have the strongest background in
| > hardware architecture, but, could you please explain how a processor
| > could be optimized for a specific high level language?
| >
|   The speed of a microprocessor is somewhat proportional to the number
| of instructions it has to implement. For instance the 6502 can do every
| instruction in one clock cycle, while the 68000 can take up to 70.

This has more to do with the 6502 having hardwired decode, while the 68k
is microcoded. (The 6502, by the way, takes at least 2 clock cycles to
execute an instruction, and can take up to 7, depending upon addressing
modes.)

|   The other thing that most HLL need is lots and lots of registers. The
| 68000 has 16, but 1 is a stack pointer(a7), 1 points to the global area(a5), and
| another points to the local area(a6), and one is used to return function values
| (d0), leaving only 5 address and 7 data registers available. Some RISC chips
| on the other hand have up to 25 registers, a 256byte data cache, and a program
                               ^^^^^^^^^^^^
Most RISC chips have at least 31 GP registers (they usually reserve 1
for a constant 0); some have more. SPARC implementations currently have
~120 registers, and the Am29000 has 192.

|   That's the basic gist of how Reduced Instruction Set CPU's if you want more
| info you can get the data sheets for the 88000, or one of the other new risc
| chips and they'll go into a lot more detail.
| Pierce

Here's another quick explanation:

For a RISC machine to be faster than a CISC machine, it simply must take
fewer cycles to complete the overall program, even if this means
executing more instructions:

                                                   1
    Performance = 1/sec = cycles/sec * --------------------------
                                       cycles/inst * [total inst]

Thus, we can improve performance by raising the cycles/sec (increasing
the clock frequency; basically a processing problem), decreasing the
total number of instructions executed (by making them complex: CISC), or
decreasing the number of cycles that an instruction requires (by making
them simple: RISC). Note that these variables are not independent; it is
hard to make very complex instructions run fast, etc.

That is the view from the hardware side. However, software (specifically
optimizing compilers) plays just as important a role in the RISC
performance picture. One can make the argument that RISC & CISC look
very similar at the "micromachine" level, and that the fetching of a
microinstruction from the microcode on a CISC machine is somewhat like a
RISC machine fetching an instruction. Now the CISC machine has
hard-wired microcode to execute from, while the RISC machine
instructions are "custom-tailored" by the compiler for the problem at
hand.

For example, let's look at a typical loop:

    for (i=0; i<MAX; ++i)
        a[i] = 0;

A CISC machine may have a single instruction that performs the inner
statement, by using an indexed base+offset addressing mode. However,
each time through the loop it must fetch the 32-bit base address of the
array "a", multiply the index variable i by the size of the elements of
a, add the two values together to form an address, then store 0 out to
that location.
A highly-optimizing compiler can recognize that the base of the array
never changes (so it can be computed in a register before the loop
begins [loop-invariant code motion]), and that we can increment this
address by the size of each element, rather than incrementing by 1 and
then multiplying (or shifting) [strength reduction].

Now the loop consists of a few simple instructions (store, add, compare,
branch), which matches nicely with what is provided by the RISC machine
(and they are performed quickly, because they are executed directly
instead of being interpreted by another level of microcode).

	-- Tim Olson
	Advanced Micro Devices
	(tim@crackle.amd.com)
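The two transformations named in the post above can be shown by hand in C. This is a sketch only: the function names clear_indexed and clear_strength_reduced are hypothetical, the element type int is an assumption (the post does not say what "a" holds), and an optimizing compiler may well perform the same rewrite on the first form without any help.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form, as in the post: conceptually each iteration computes
 * base + i * sizeof(int) before storing. */
void clear_indexed(int *a, size_t max)
{
    assert(a != NULL);
    for (size_t i = 0; i < max; ++i)
        a[i] = 0;
}

/* Hand-applied loop-invariant code motion and strength reduction. */
void clear_strength_reduced(int *a, size_t max)
{
    assert(a != NULL);
    int *p = a;            /* base address taken once, before the loop  */
    int *end = a + max;    /* loop bound is also computed only once     */
    while (p != end) {     /* compare + branch                          */
        *p = 0;            /* store                                     */
        ++p;               /* add: advance by one element, no multiply  */
    }
}
```

Both functions zero the first max elements of the array; the second simply makes explicit the store/add/compare/branch sequence that the post describes.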
wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) (02/27/89)
>> The speed of a microprocessor is somewhat proportional to the number
>>of instructions it has to implement. For instance the 6502 can do every
>>instruction in one clock cycle, while the 68000 can take up to 70.
>
> Many 6502 instructions are >1 cycle long, as a glance at any 6502 programming
> manual will quickly confirm. The true speed of any processor is

  Ooops, sorry. I meant many 6502 instructions take ~ 1 cycle. I obviously
didn't expect a multiply to run in one cycle.

Pierce
keith@Apple.COM (Keith Rollin) (02/27/89)
In article <9795@cit-vax.Caltech.Edu> wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) writes:
>>> The speed of a microprocessor is somewhat proportional to the number
>>>of instructions it has to implement. For instance the 6502 can do every
>>>instruction in one clock cycle, while the 68000 can take up to 70.
>>
>> Many 6502 instructions are >1 cycle long, as a glance at any 6502 programming
>> manual will quickly confirm. The true speed of any processor is
>
>  Ooops, sorry. I meant many 6502 instructions take ~ 1 cycle. I obviously
>didn't expect a multiply to run in one cycle.

You're digging in deeper. To wit:

 - ALL 6502 instructions take at least 2 cycles (even NOP).
 - There is no multiply instruction.

------------------------------------------------------------------------------
Keith Rollin  ---  Apple Computer, Inc.  ---  Developer Technical Support
INTERNET: keith@apple.com
    UUCP: {decwrl, hoptoad, nsc, sun, amdahl}!apple!keith
"Argue for your Apple, and sure enough, it's yours" - Keith Rollin, Contusions
trebor@biar.UUCP (Robert J Woodhead) (02/27/89)
In article <9795@cit-vax.Caltech.Edu> wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) writes:
>  Ooops, sorry. I meant many 6502 instructions take ~ 1 cycle. I obviously
>didn't expect a multiply to run in one cycle.

The 6502 does not have a multiply instruction. As I remember it (I haven't
looked up the instruction timings in years), each cycle on a 6502 is a
complete bus operation. Thus, the only instructions that are 1 cycle long
are those without memory or immediate operands, such as TAX or CLC. This
is a very small number of instructions.

The 6502 was, for its time, an extremely elegant architecture. After all,
the chip really has 128 16-bit registers; they just happen to be in the
first 256 bytes of RAM. However, it was designed before, and does not even
approach, the RISC concept of 1 instruction = 1 cycle.
trebor@biar.UUCP (Robert J Woodhead) (02/28/89)
In article <165@biar.UUCP> trebor@biar.UUCP (Robert J Woodhead) [me] writes:
>The 6502 does not have a multiply instruction. As I remember it (I haven't
>looked up the instruction timings in years) each cycle on a 6502 is a
>complete bus operation). Thus, the only instructions
>that are 1 cycle long are those without memory or immediate operands, such as
>TAX or CLC. This is a very small number of instructions.

Well, foot-in-mouth disease is catching. My onboard memory not being
parity checked, I blew it. There are no 1 cycle 6502 instructions, as a
gentleman from Apple mentions in his response to the original post.

Though come to think of it, there were times when I was writing Disk II
code when I wished there were a few....
daveh@cbmvax.UUCP (Dave Haynie) (03/01/89)
in article <9795@cit-vax.Caltech.Edu>, wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) says:
>>> For instance the 6502 can do every
>>> instruction in one clock cycle, while the 68000 can take up to 70.

>> Many 6502 instructions are >1 cycle long, as a glance at any 6502 programming
>> manual will quickly confirm. The true speed of any processor is

>  Ooops, sorry. I meant many 6502 instructions take ~ 1 cycle. I obviously
> didn't expect a multiply to run in one cycle.

> Pierce

Well, no problem, the 6502 doesn't have a multiply instruction.

I think there's a little mixing of ideas here. Without wait states, the
6502 takes only 1 clock cycle for each bus access. By contrast, a 68000
takes 4, a 68020 takes 3, and a 68030 takes 2 (the IIx and SE/30 both
have wait states).

Bus access is only part of it, though. As far as actual instructions go,
the 680x0 chips can take 60 or more clock cycles to do a multiply. The
6502's longest instruction takes around 7 clocks, but its 32-bit
multiply routine will undoubtedly take longer at the same clock
frequency than the 68030. Perhaps a RISC processor at the same clock
frequency can execute one whole reasonable instruction in one cycle, and
finish such a multiply faster than the 60 clocks it might take the
68030.

The bottom line is who gets the most actual work done in a particular
chunk of time. In some cases, it may even be the 6502 that wins (I can
think of one case where it beats the 68000 at 1/2 the clock speed), but
for the most part, don't count on it.

--
Dave Haynie  "The 32 Bit Guy"     Commodore-Amiga  "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: D-DAVE H     BIX: hazy
              Amiga -- It's not just a job, it's an obsession
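To make the "32-bit multiply routine" point concrete, here is a sketch in C of the classic shift-and-add method that a processor without a multiply instruction has to run in software. The function name mul32_shift_add and the steps counter are hypothetical, and this is not 6502 code: a real 6502 routine would work a byte at a time with carries, so each step below would itself expand into many instructions. The counter only illustrates that the work grows with the operand width; it says nothing about actual cycle counts on any chip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Shift-and-add multiply. Returns the low 32 bits of a * b.
 * If steps is non-NULL, it receives the number of loop iterations,
 * a rough proxy for the work done (an assumption for illustration).
 */
uint32_t mul32_shift_add(uint32_t a, uint32_t b, unsigned *steps)
{
    uint32_t product = 0;
    unsigned n = 0;

    while (b != 0) {
        if (b & 1u)        /* low bit of multiplier set: add multiplicand */
            product += a;
        a <<= 1;           /* multiplicand moves up one bit position      */
        b >>= 1;           /* consume one bit of the multiplier           */
        ++n;
    }

    assert(n <= 32);       /* at most one iteration per multiplier bit    */
    if (steps != NULL)
        *steps = n;
    return product;
}
```

Calling mul32_shift_add(6, 7, &s) returns 42 and sets s to 3, since the multiplier 7 has three significant bits. A multiplier with its top bit set would take all 32 iterations.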
t-stephp@microsoft.UUCP (Stephen Poole) (03/02/89)
In article <9795@cit-vax.Caltech.Edu> wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) writes:
>  Ooops, sorry. I meant many 6502 instructions take ~ 1 cycle. I obviously
>didn't expect a multiply to run in one cycle.
>Pierce

Particularly unlikely in light of the fact that the 6502 has no multiply
instruction. Just out of curiosity, have you ever actually written code
for the 6502 or a RISC chip?

--
-- Stephen D. Poole  --  t-stephp@microsoft.UUCP  --  Mac II Fanatic  --
--                                                                    --
-- I'm just an Oregon Tech Software Engineering co-op at Micro-       --
-- soft.  Believe me, nobody here pays attention to my opinions!      --