dan@pyramid.UUCP (Danial Carl Sobotta) (02/27/86)
clif@intelca.UUCP (Clif Purkiser) writes:
>The RISC machine I designed for my Computer Architecture course
>(taught by Mr. RISC Dave Patterson) had 24 instructions and two addressing
>modes; it didn't even have a multiple. While I was very happy at the time that
>I didn't have to try to write microcode for a lot of complex instructions!
>(The previous class had to implement the Z8000 with TTL SSI and MSI parts.)
>My machine sure took a long time to do useful work. Needless to say, my RISC
>computer was a toy compared to real RISC machines.

I would be interested to see how much of your performance degradation was
due to *your* implementation as opposed to the nature of RISC itself.

>J. Giles' point is well taken that compiler writers have not yet figured out
>a way of taking advantage of the numerous addressing modes and
>instructions offered by CISC machines such as the 86, Z8K, 68K or 32K families.
>I would even concede that you can still have a RISC machine that takes more
>than one clock to complete an instruction. This allows an important
>instruction such as multiple to be included in the instruction set.
                     ^^^^^^^^
Multiple what?? (block move, n-bit decimal add, ...). Yes, there are
RISC machines that have some multiple cycle instructions. In my experience,
though, something like a block move can usually be implemented with a loop
(especially using an efficient branch scheme) of simpler instructions with
*no loss of performance*.

>The ASCII Adjust
>instructions are used by COBOL (yes, people still use it) compilers and
>spreadsheet designers because COBOL uses BCD and these instructions speed
>this process up. Likewise, XLAT can be used for converting ASCII to EBCDIC
>in 5 clocks.

It is indeed a shame that people still use COBOL, but nevertheless, most of
the convert-type instructions can be done easily (and maybe faster!) using a
RISC architecture. Converting ASCII to EBCDIC in 5 cycles?? Excuse me if I'm
ignorant, but wouldn't a simple table lookup do??
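The "simple table lookup" Dan has in mind is one indexed load per character, which is the same operation XLAT does in hardware (AL is replaced by the byte at BX+AL). A minimal sketch, assuming the EBCDIC variant is code page 037 (Python's built-in `cp037` codec); the names `build_table`, `ASCII_TO_EBCDIC` and `to_ebcdic` are illustrative, and other EBCDIC code pages differ in places.

```python
# Table-driven ASCII -> EBCDIC conversion (sketch).
# Assumption: EBCDIC code page 037, taken from Python's 'cp037' codec.

EBCDIC_SUB = 0x3F  # EBCDIC substitute character, used if a byte has no mapping

def build_table() -> bytes:
    """Build a 256-entry table: table[ascii_byte] == ebcdic_byte."""
    table = bytearray(256)
    for b in range(256):
        try:
            table[b] = chr(b).encode("cp037")[0]
        except UnicodeEncodeError:
            table[b] = EBCDIC_SUB
    return bytes(table)

ASCII_TO_EBCDIC = build_table()

def to_ebcdic(data: bytes) -> bytes:
    """Convert byte by byte: out[i] = TABLE[in[i]], one indexed load each."""
    return bytes(ASCII_TO_EBCDIC[b] for b in data)

print(to_ebcdic(b"A").hex())  # c1
```

In Python itself, `data.translate(ASCII_TO_EBCDIC)` performs the same lookup in a single call.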
If this conversion were somewhat popular, then it's likely that the RISC
equivalent routine AND the lookup table would be in cache, where this
operation could be done in fewer than 5 cycles!

>One disagreement I have with the RISC proponents is the theory that
>everyone writes in an HLL. It seems that despite years of trying to force
>everyone to write in HLL languages there will always be a few assembly
>language programmers, because no matter how much performance semiconductor and
>computer companies give programmers, they always want their programs to run
>faster. So these CISC instructions, while not useful to compiler writers, are
>useful to assembly language jocks.

There is such a thing as MACROS, which allow assembly-language hacks to define
their OWN CISC 'instructions' (rather than have some arch designer tell them
what they can and can't have).

>This probably compiles into these RISC instructions, assuming x, a, and i
>are in registers R1, R2, and R3 respectively.
>ShiftL R4, R3, #2
>Add R4, R4, R2
>Load R1, [R4]
>This is a single 4 clock instruction on the 80386 vs 3 clocks for the RISC
>chip. However, the RISC chip has had to fetch 3 instructions vs one for
>the CISC processor. So unless the RISC chip has a large on-chip cache it
>will be slower.

Well, 3 cycles IS faster than 4! Furthermore, the 3 RISC instructions will
most likely be together in cache, where they can be fetched as fast as any
microword could be. YES, a RISC chip will have a much larger cache (all else
being equal) than a CISC chip. Plus, there should be plenty of room left to
add some performance features that the 80386 obviously doesn't have.

>I think that the good thing about the RISC philosophy is that it
>will reduce the tendency of designers to add new instructions or addressing
>modes just because they look whizzy or Brand X has them. If a complex way of
>doing something is slower than a simple way, don't put it in.

Hey! I won't argue with THAT.
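Dan's MACROS point and Clif's three-instruction sequence fit together: a macro can hand the assembly programmer a CISC-style scaled-index load that expands into exactly the quoted ShiftL/Add/Load sequence. The sketch below is hypothetical throughout: the macro name `LOADX`, the assumption that R4 is free as a scratch register, 4-byte array elements, and the toy interpreter are all illustrative, not any real assembler's syntax.

```python
# Hypothetical macro expander plus toy interpreter (illustration only).
# LOADX rd, rbase, ridx  means  rd = mem[rbase + ridx*4]  (4-byte elements).

def expand_loadx(rd, rbase, ridx, tmp="R4"):
    """Expand the CISC-style LOADX into the three RISC instructions."""
    return [
        ("ShiftL", tmp, ridx, 2),   # tmp = ridx << 2   (index -> byte offset)
        ("Add", tmp, tmp, rbase),   # tmp = tmp + rbase (address of a[i])
        ("Load", rd, tmp),          # rd  = mem[tmp]
    ]

def run(program, regs, mem):
    """Execute (op, operands...) tuples against a register dict and memory dict."""
    for op, *args in program:
        if op == "ShiftL":
            dst, src, amount = args
            regs[dst] = regs[src] << amount
        elif op == "Add":
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "Load":
            dst, addr_reg = args
            regs[dst] = mem[regs[addr_reg]]
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs

# x = a[i] with x, a, i in R1, R2, R3, as in the quoted example.
program = expand_loadx("R1", "R2", "R3")
a_base = 0x1000
mem = {a_base + 4 * k: v for k, v in enumerate([10, 20, 30, 40])}
regs = run(program, {"R2": a_base, "R3": 2}, mem)
print(regs["R1"])  # 30
```

The programmer writes one "instruction"; the machine still executes three simple ones, which is the trade Dan is arguing for.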
How do ya think most of CISC got developed anyway?
--
'Out of the inkwell comes Bozo the Clown ...'
DISCLAIMER: These opinions are neither mine nor my C-compiler's
sun!pyramid!dan
hammond@petrus.UUCP (Rich A. Hammond) (03/03/86)
> clif@intelca.UUCP (Clif Purkiser) writes:
>
> >The RISC machine I designed for my Computer Architecture course ...

to which sun!pyramid!dan responds:

> ... Yes, there are
> RISC machines that have some multiple cycle instructions. In my experience,
> though, something like a block move can usually be implemented with a loop
> (especially using an efficient branch scheme) of simpler instructions with
> *no loss of performance*.

*NO LOSS OF PERFORMANCE?!? No way! Look, a memory block move is RISC
at its worst, since the block move defeats the data cache. A M68000
(16 bit data and instruction bus) is FASTER than the UCB RISC (32 bit
data and instruction bus) for equivalent 32 bit at a time block moves.
Even if the RISC instructions are in a cache, the data isn't, and what's
worse, every other data access is a write. Essentially, the block transfer
measures data bus bandwidth, not cache bandwidth, which is what RISCs
exploit.

Be careful when comparing "clock ticks". No RISC I've seen written up
actually finishes instructions in ONE cycle, but since the RISC is pipelined,
an instruction completes every cycle. There is a difference!

Second, the 68000 family and Intel's *86 family tend to have high-frequency
clocks divided on chip for micro-cycle timing. RISC chips often have much
slower clocks, sometimes with separate phases generated off chip. What you
need to use is the clock rate that lets each chip run with the same-speed
main memory; i.e., the time from address valid until data returns through
the buffers to the CPU should be the same for both. Then calculate what
clock rate each CPU needs for that time to be its maximum memory cycle time.

Rich Hammond {allegra, ucbvax,decvax} !bellcore!hammond
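Hammond's distinction between "finishes in one cycle" and "completes one per cycle" is the difference between latency and throughput in a pipeline. A minimal sketch, assuming an ideal pipeline with no stalls, branches, or cache misses; the function name `pipeline_cycles` and the 3-stage depth in the example are illustrative assumptions, not figures for any particular RISC.

```python
def pipeline_cycles(n_instructions: int, stages: int) -> int:
    """Cycles to run n instructions through an ideal `stages`-deep pipeline.

    Assumes no stalls, branches or cache misses. The first instruction takes
    `stages` cycles to come out the end; after that, one completes per cycle.
    """
    if n_instructions <= 0:
        return 0
    return stages + (n_instructions - 1)

# Each instruction still takes 3 cycles from fetch to completion (latency),
# but 100 of them finish in 102 cycles: about 1.02 cycles per instruction.
total = pipeline_cycles(100, stages=3)
print(total, total / 100)  # 102 1.02
```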
david@ztivax.UUCP (03/06/86)
> hammond@petrus writes:
> sun!pyramid!dan writes:
>> ... Yes, there are
>> RISC machines that have some multiple cycle instructions. In my experience,
>> though, something like a block move can usually be implemented with a loop
>> (especially using an efficient branch scheme) of simpler instructions with
>> *no loss of performance*.
>
>*NO LOSS OF PERFORMANCE?!? No way! Look, a memory block move is RISC
>at its worst, since the block move defeats the data cache. A M68000
>(16 bit data and instruction bus) is FASTER than the UCB RISC (32 bit
>data and instruction bus) for equivalent 32 bit at a time block moves.
>Even if the RISC instructions are in a cache, the data isn't, and what's
>worse, every other data access is a write. Essentially, the block transfer
>measures data bus bandwidth, not cache bandwidth, which is what RISCs
>exploit.

OK, so here we have a classic example of how a CISC instruction does
not help. A CISC microcodes the action on-chip. A RISC uses on-chip
cache. Both are speed limited by the memory access times. That is
possibly why the 68000 can do this as fast as the RISC: were the
memory access times similar?

So here, RISC at its worst, the same as CISC at its best?

David Smyth
Free and proud of it
seismo!unido!ztivax!david
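Both posters agree that a word-at-a-time copy is bounded by the data bus rather than the cache. A toy model makes that concrete by counting only data-bus cycles, under stated assumptions: memory is a Python list indexed by word address, source and destination do not overlap, and every instruction fetch hits in cache so only the loads and stores reach memory. The function `block_copy` and its parameters are hypothetical.

```python
import math

def block_copy(mem, src, dst, n_words, bus_bits=32, word_bits=32):
    """Copy n_words words from mem[src:] to mem[dst:], one word at a time.

    `mem` is a list indexed by word address (a simplification); the regions
    must not overlap. Returns the number of data-bus cycles used, assuming
    all instruction fetches hit in cache, so only loads and stores count.
    """
    cycles_per_access = math.ceil(word_bits / bus_bits)  # bus cycles per word
    bus_cycles = 0
    for k in range(n_words):
        word = mem[src + k]              # load: a read on the data bus
        bus_cycles += cycles_per_access
        mem[dst + k] = word              # store: a write, so reads and writes alternate
        bus_cycles += cycles_per_access
    return bus_cycles

mem32 = list(range(8)) + [0] * 8   # 8 source words followed by 8 empty words
mem16 = list(range(8)) + [0] * 8
print(block_copy(mem32, 0, 8, 8, bus_bits=32))  # 16
print(block_copy(mem16, 0, 8, 8, bus_bits=16))  # 32
```

Halving the bus width doubles the data traffic regardless of how fast the instructions are fetched, which is why the comparison hinges on bus width and memory speed.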
hammond@petrus.UUCP (Rich A. Hammond) (03/10/86)
I pointed out that a 68000 could do block moves of 32 bit words FASTER
than the UCB RISC I or II for equivalent memory access times.

> David Smyth responded:
>
> OK, so here we have a classic example of how a CISC instruction does
> not help. A CISC microcodes the action on-chip. A RISC uses on-chip
> cache. Both are speed limited by the memory access times. That is
> possibly why the 68000 can do this as fast as the RISC: were the
> memory access times similar?
>
> So here, RISC at its worst, the same as CISC at its best?

NO WAY, as I pointed out, the 68000 has a 16 bit data bus (i.e. 2 memory
cycles for each read and write of 32 bit words) while the RISC I & II
have a 32 bit data bus. IF the CISC 68000 had the same size data bus,
it would save 2 memory cycles, out of a total of 7, or be about 30%
faster than the RISC. Of course this benchmark is never included in
RISC vs CISC comparisons. However, copy loops occur much more frequently
in real code than benchmarks such as Ackermann's function.

Rich Hammond
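Hammond's "about 30%" can be checked from his own numbers. A small sketch, assuming the total of 7 memory cycles is per 32-bit word copied (the post does not break the 7 down) and that the 2 saved cycles are the second bus cycle of the read and of the write, which a 32-bit bus would no longer need.

```python
# Figures from the post; the per-word reading of "7" is an assumption.
cycles_16bit_bus = 7                                  # total memory cycles, 16-bit bus
cycles_saved = 2                                      # one fewer read cycle, one fewer write cycle
cycles_32bit_bus = cycles_16bit_bus - cycles_saved    # 5

time_saved = cycles_saved / cycles_16bit_bus          # fraction of copy time saved
throughput_gain = cycles_16bit_bus / cycles_32bit_bus # words copied per unit time, relative

print(cycles_32bit_bus, round(time_saved, 3), throughput_gain)  # 5 0.286 1.4
```

So "about 30%" matches the time-saved reading (2/7, roughly 29% less time per word); measured as throughput, the same change is a factor of 1.4.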
aglew@ccvaxa.UUCP (03/14/86)
>/* Written 6:40 am Mar 10, 1986 by hammond@petrus in ccvaxa:net.arch */
>I pointed out that a 68000 could do block moves of 32 bit words FASTER
>than the UCB RISC I or II for equivalent memory access times.
>> David Smyth responded:
>> ...
>> So here, RISC at its worst, the same as CISC at its best?
>
>NO WAY, as I pointed out, the 68000 has a 16 bit data bus (i.e. 2 memory
>cycles for each read and write of 32 bit words) while the RISC I & II
>have a 32 bit data bus. IF the CISC 68000 had the same size data bus,
>it would save 2 memory cycles, out of a total of 7, or be about 30%
>faster than the RISC. Of course this benchmark is never included in
>RISC vs CISC comparisons. However, copy loops occur much more frequently
>in real code than benchmarks such as Ackermann's function.
>
>Rich Hammond

Well, let's be fair. How about looking at the benchmarks RISC did include?
Not just Ackermann's function.

In `A VLSI RISC', Computer 1982, Patterson and Sequin present SIMULATED
results. Among them:

    SED - the UNIX stream-oriented editor
        Speed RISC I (sim) / VAX-11/780:  1.1

i.e., the VAX is 10% slower on this text processing program than RISC.

OK, those are simulations. How about some actual results?
`Running RISCs', Foderaro, Van Dyke, and Patterson, VLSI Design,
Sept/Oct 1982.

    String search
                Clock     Wait states   Time
    MC68000     8 MHz     2             4.7 ms
    RISC I      1.5 MHz   0             2.5 ms

There are a lot more recent benchmarks, but this is the one that impressed
me. The very first RISC I, with bugs and a clock rate about a third of what
they'd hoped for, still benchmarked faster than a much more mature machine.
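The string-search figures reduce to a couple of ratios. A minimal sketch using only the numbers in the table (the dictionary layout and function names are illustrative). Per Hammond's earlier caution, the 68000 divides its clock on chip, so elapsed external clock cycles are shown only to quantify the gap, not as a like-for-like measure of work per cycle.

```python
# Figures copied from the string-search table ("Running RISCs", 1982).
mc68000 = {"clock_mhz": 8.0, "time_ms": 4.7}
risc_i = {"clock_mhz": 1.5, "time_ms": 2.5}

def speedup(baseline, other):
    """How many times faster `other` ran than `baseline` (ratio of run times)."""
    return baseline["time_ms"] / other["time_ms"]

def clock_cycles(machine):
    """Elapsed external clock cycles: time (ms) * clock (MHz) * 1000."""
    return machine["time_ms"] * machine["clock_mhz"] * 1000

print(round(speedup(mc68000, risc_i), 2))  # 1.88
print(round(clock_cycles(mc68000)))        # 37600
print(round(clock_cycles(risc_i)))         # 3750
```

RISC I finished the search in a little over half the 68000's time while running from a clock about one fifth as fast.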