dillon@CORY.BERKELEY.EDU (Matt Dillon) (03/09/87)
Don't get clumsy now! Let me get it straight for everybody:

The 6502 takes one clock cycle to do an 8-bit memory fetch. The number of clock cycles required to execute an instruction is in most cases exactly the number of memory operations required to read and execute the instruction. Thus, an LDA absolute requires 4 memory fetches and thus 4 clock cycles (3 fetches for the instruction, 1 for the absolute memory operation). There are some exceptions: most single-byte instructions like TAX take 2 clock cycles even though there is only one memory fetch.

A 68000, on the other hand, takes 4 clock cycles for each memory fetch, and fetches data 16 bits at a time. Instruction execution times are, in general, related to the number of memory operations required. Most longword operations take an extra 2 clock cycles (not memory cycles) due to internal processing.

SO. In terms of basic throughput, an 8 MHz 68000 is 4 times faster than a 1 MHz 6502 (32 bits/uS vs 8 bits/uS).

HOWEVER, the 68000 allows you to do a more complex range of operations in the same time. Specifically, a 68000 can manipulate 16 and 32 bit quantities and the 6502 can only manipulate 8 bit quantities. Attempting to make the 6502 do, say, a 16-bit add immediate to memory requires about 7 instructions (CLC/LDA/ADC#/STA/LDA/ADC#/STA)=18cc with zero-page operands, whereas a 68000 can do it in a single instruction (ADD)=16cc. So the 6502 can be thought of as fast only if your program doesn't require anything beyond 8 bit quantity sizes. Even if you spent 24 hours optimizing your 6502 code, you can't really do a 16-bit add in anything less than four instructions, and that's assuming one addend is already loaded into registers A and X and the carry is set to something meaningful.

Each 68000 instruction is about 4x more powerful than a 6502 instruction. Now, a 68000 instruction is, on the average, twice as long as a 6502 instruction... And I'm being very generous to the 6502 here.

So, putting it all together:

                                            8 MHz 68000 vs 1 MHz 6502
    Basic throughput                                    4x
    Take into account power of 68000
      (16/32-bit registers & operations)                4x
    Take into account instruction size                  .5x
    Overall rating                                      8x

The gist is that the clock rating reflects the relative differences between a 6502 and a 68000. (Obviously this generalization only applies to the 6502 vs 68000.) Thus if an 8 MHz 6502 did exist, it would probably be on par with a 68000.

NOTE: the previous argument is very generous towards the 6502.... I do not take into account the large number of registers on the 68000 or its expanded address space.

Example 2: Tight loop copy 256 bytes from absolute location

    6502:   ldx #0              time ~= 256*(4+5+2+3) = 3584 clock cycles
    loop:   lda src,x           ; 4cc
            sta dest,x          ; 5cc (absolute,X stores always take 5)
            dex                 ; 2cc
            bne loop            ; 3cc when taken

    68000:  lea src,a0          time ~= 64*(20+10) = 1920 clock cycles
            lea dest,a1
            move.w #256/4-1,d0  ; dbf runs the loop count+1 times
    loop:   move.l (a0)+,(a1)+
            dbf d0,loop

    result: 8 MHz 68000 about 15x a 1 MHz 6502

Note that for the 6502 program to copy more than 256 bytes, the most efficient routine is a self-modifying code routine that has an inner loop equivalent to the above example and an outer loop which modifies the MSB address in the LDA and STA instructions. This effectively gives the same throughput.

Example 3: 16 bit add

    6502: (add .A lsb, .X msb to zero page memory)
            clc                 time = 16 cc
            adc dest
            sta dest
            txa
            adc dest+1
            sta dest+1

    68000: (add D0 to register indirect (Aztec small data model))
            add.w d0,off(Ax)    time = 16 cc

    result: 8 MHz 68000 about 8x a 1 MHz 6502

NOTE: register-register ADD takes only 4 clock cycles.
NOTE: addressing modes picked to best represent programming environment.

Example 4: 32 bit add

    6502: (add .A lsb, .X msb and zero-page src to zero page destination)
            clc                 time = 34 cc
            adc dest
            sta dest
            txa
            adc dest+1
            sta dest+1
            lda src
            adc dest+2
            sta dest+2
            lda src+1
            adc dest+3
            sta dest+3

    68000:
            add.l d0,off(Ax)    time = 24 cc

    result: 8 MHz 68000 about 11x a 1 MHz 6502

Example 5: Simple table driven PLOT x,y onto some screen. Assume we will do many plots.

    6502: plot (x, y).. max 256x256 drawing area.
            lda scanlinelsb,y   time = 33 cc
            sta zeropage
            lda scanlinemsb,y
            sta zeropage+1      (takes 3cc)
            ldy columnindex,x
            lda (zeropage),y    (takes 5cc)
            ora bittable,x      (takes 4cc)
            sta (zeropage),y    (takes 6cc)

    68000: plot (D0, D1).. max 2048 pixels on the X axis, 8192 on the Y axis.
        registers (as they would be for multiple plots):
            A0 = scanline table of longword screen addresses
            A1 = columnindex (table of bytes column convert)
            A2 = bittable (table of byte masks)

                                time = 72 cc
            asl.w  #2,D1            ;y = y * 4 to get index into longword array
            move.l 0(A0,D1.w),A3    ;get scanline
            add.w  0(A1,D0.w),A3    ;incorporate columnindex
            move.b 0(A2,D0.w),D0    ;get mask (80/40/20/10/08/04/02/01)
            or.b   D0,(A3)          ;write it to screen

    Results: 8 MHz 68000 only 3.7x a 1 MHz 6502

-Matt
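[Editorial sketch] The speed ratios in Examples 3 to 5 all come from the same arithmetic: total the cycles on each side, convert to microseconds at 1 MHz and 8 MHz, and divide. The Python below redoes that arithmetic. The 6502 per-instruction counts are the usual zero-page and indexed timings, and the totals match the post. The per-instruction split of the 72-cycle 68000 plot routine is an assumption taken from the standard 68000 timing tables, since the post gives only the total. The names `EXAMPLES`, `speedup` and `report` are made up for this illustration.

```python
# Clock rates from the post, in MHz (equivalently, cycles per microsecond).
CLOCK_6502_MHZ = 1.0
CLOCK_68000_MHZ = 8.0


def speedup(cycles_6502, cycles_68000):
    """How many times sooner the 8 MHz 68000 finishes than the 1 MHz 6502."""
    time_6502_us = cycles_6502 / CLOCK_6502_MHZ
    time_68000_us = cycles_68000 / CLOCK_68000_MHZ
    return time_6502_us / time_68000_us


# name -> (6502 cycles per instruction, 68000 cycles per instruction)
EXAMPLES = {
    # clc, adc zp, sta zp, txa, adc zp, sta zp  vs  add.w d0,off(Ax)
    "16-bit add": ([2, 3, 3, 2, 3, 3], [16]),
    # the same six, then two more lda/adc/sta groups  vs  add.l d0,off(Ax)
    "32-bit add": ([2, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3], [24]),
    # 68000 split (asl, move.l, add.w, move.b, or.b) is an assumption;
    # only its total of 72 appears in the post.
    "plot": ([4, 3, 4, 3, 4, 5, 4, 6], [10, 18, 18, 14, 12]),
}


def report():
    """Return (name, 6502 total, 68000 total, ratio) for every example."""
    rows = []
    for name, (c6502, c68000) in EXAMPLES.items():
        total_6502, total_68000 = sum(c6502), sum(c68000)
        ratio = round(speedup(total_6502, total_68000), 1)
        rows.append((name, total_6502, total_68000, ratio))
    return rows


# The rule-of-thumb rating: throughput x instruction power x instruction size.
OVERALL_RATING = 4 * 4 * 0.5

if __name__ == "__main__":
    for name, total_6502, total_68000, ratio in report():
        print(f"{name}: {total_6502}cc vs {total_68000}cc -> {ratio}x")
    print(f"overall rating: {OVERALL_RATING}x")
```

The ratios fall out as roughly 8x, 11x and 3.7x, bracketing the 8x overall rating; the plot case is where the 6502's cheap byte-table indexing narrows the gap.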
hatcher@INGRES.BERKELEY.EDU (Doug Merritt) (03/10/87)
Matt Dillon wants to straighten us all out on how much faster a 68000 is than a 6502. What he says seems to be good information, but it misses the entire point that George Robbins (and I and others) are trying to make.

The point is not how fast you can do things on one versus the other. That is different than doing an emulation. The point is how hard it is to do an emulation. Go re-read George's two postings. They are totally accurate and to the point, and do not deserve being misunderstood.

To put it another way, if you are going to write software that *TRANSLATES* a 6502 program into an equivalent 68000 program, then you can probably get a 68000 program that is as fast or faster than the 6502 program. But nobody is planning that, because it is extraordinarily difficult.

That is a vastly different thing than doing an emulator. An emulator need only understand each individual effect of each instruction of the target machine. A translator needs to be able to understand the OVERALL effect of arbitrary groups of instructions, and be able to WRITE NEW CODE in the new machine that has the same effect.

Now, I won't discourage you from doing this. I think intelligent translators are pretty cool. But don't underestimate the work involved. It is considerably harder than just writing a mere C compiler, for instance. At least, it is if you want any kind of optimized output.

I suppose that if you don't care how optimized the output is, then it's not too hard. Note that in this case, you end up with an emulator again, and not a fast one, since it just puts the emulation of each instruction in-line in the output program.

So although Matt and others have some good points, they are addressing a different subject than "what about all that 6502 software out there".

        Doug
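[Editorial sketch] To make "an emulator need only understand each individual effect of each instruction" concrete, here is a minimal interpretive core in Python for a handful of 6502 opcodes. It is only an illustration under loud assumptions: the opcode values are the standard 6502 encodings, but the N/Z/V flags, decimal mode, the stack, and every opcode not listed are left out, and the names `run`, `handlers` and the load address $0200 are made up for this example. Each handler knows one instruction and nothing about the code around it, which is exactly what a translator cannot get away with.

```python
def run(program, max_steps=1000):
    """Interpret a tiny 6502 subset; returns (cpu registers, memory)."""
    mem = bytearray(0x10000)
    start = 0x0200                      # arbitrary load address (assumption)
    mem[start:start + len(program)] = program
    cpu = {"a": 0, "x": 0, "pc": start, "carry": 0}

    def fetch():
        """Read the byte at PC and advance PC."""
        byte = mem[cpu["pc"]]
        cpu["pc"] = (cpu["pc"] + 1) & 0xFFFF
        return byte

    # One small handler per instruction; none looks at its neighbours.
    def lda_imm():
        cpu["a"] = fetch()

    def sta_zp():
        mem[fetch()] = cpu["a"]

    def tax():
        cpu["x"] = cpu["a"]

    def inx():
        cpu["x"] = (cpu["x"] + 1) & 0xFF

    def clc():
        cpu["carry"] = 0

    def adc_imm():
        total = cpu["a"] + fetch() + cpu["carry"]
        cpu["carry"] = total >> 8       # binary mode only
        cpu["a"] = total & 0xFF

    handlers = {
        0xA9: lda_imm, 0x85: sta_zp, 0xAA: tax,
        0xE8: inx, 0x18: clc, 0x69: adc_imm,
    }

    for _ in range(max_steps):
        opcode = fetch()
        if opcode == 0x00:              # BRK is treated as "stop" here
            break
        handlers[opcode]()              # KeyError on anything unimplemented
    return cpu, mem


# CLC / LDA #$F0 / ADC #$20 / STA $10 / TAX / INX / BRK
DEMO = bytes([0x18, 0xA9, 0xF0, 0x69, 0x20, 0x85, 0x10, 0xAA, 0xE8, 0x00])
```

Calling `run(DEMO)` leaves $10 in A and at zero-page address $10, $11 in X, and the carry set. The fetch and dispatch wrapped around every handler is the interpretive overhead; the unoptimized translator described above would paste the handler bodies in-line instead of dispatching to them, and would still execute the same per-instruction work.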
grr@cbmvax.UUCP (03/11/87)
In article <8703100226.AA07627@ingres.Berkeley.EDU> hatcher@INGRES.BERKELEY.EDU (Doug Merritt) writes:
>Matt Dillon wants to straighten us all out on how much faster a 68000
>is than a 6502. What he says seems to be good information, but it misses
>the entire point that George Robbins (and I and others) are trying to make.
>
>The point is not how fast you can do things on one versus the other.
>That is different than doing an emulation.

I think we can agree that an 8 MHz 68000 is n times faster than a 1 MHz 6502, with n > 4, on generic code segments where the bigger register store and 16 bit instructions pay off. However, for the fairly simple tasks that map nicely into 6502 instructions (byte moves, indexing 256 byte tables, etc.) the 68000 may only be n times faster with n < 2.

NOW! What sort of operations are you going to be doing in your interpreter? You will have to do exactly those dinky little things where you have the least performance advantage over the 6502. Also you have interpretive overhead to deal with, although hopefully the power of the 68000 helps here.

To strike the final blows, remember that the C64 has memory mapped I/O! This means for *every* memory access (possibly including reads, even I-fetches) you have to test for side effects! Sound bad? Next, since the C64 has dynamically switchable ROM/RAM overlays, you get to add either a layer of indirection or some other mapping function. Oh, pain! Count all those nice little 68000 cycles you're eating.

Hmmm, anything else? Remember C64 games typically synchronize little code fragments to raster positions and change VIC registers on the fly. So of course you're interleaving this VIC emulation somehow... Anybody got some 32 MHz 68030's?

If you had told the 6502 designers that the 6502 would be one of the most popular *general purpose* microprocessors ever, they would have laughed. It was a variation on the Motorola 6800 theme (which was more general purpose) with an instruction set to give tight, fast code and an external interface to allow optimal use as a micro-controller chip. You know, traffic lites, blenders, microwaves and that sort of thing...

--
George Robbins - now working for,     uucp: {ihnp4|seismo|rutgers}!cbmvax!grr
but no way officially representing    arpa: cbmvax!grr@seismo.css.GOV
Commodore, Engineering Department     fone: 215-431-9255 (only by moonlite)
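[Editorial sketch] The cost of memory mapped I/O and switchable ROM/RAM overlays shows up in the emulator's memory access routine: every emulated read and write has to pass through a couple of range tests before it can touch plain RAM. The Python below is a rough illustration only. The I/O window at $D000-$DFFF and the ROM at $E000-$FFFF follow the usual C64 memory map, but the two boolean flags stand in for the real bank-select logic, the ROM contents are a dummy fill, and the `Bus` class with its `io_log` and `checks` fields is invented for this example.

```python
class Bus:
    """Simplified C64-style address decoding for an emulator (assumptions above)."""

    IO_START, IO_END = 0xD000, 0xDFFF   # I/O window
    ROM_START = 0xE000                  # 8K ROM overlay at the top of memory

    def __init__(self):
        self.ram = bytearray(0x10000)
        self.rom = bytes([0xEA]) * 0x2000   # dummy ROM image, not real code
        self.io_visible = True              # stand-ins for the bank-select bits
        self.rom_visible = True
        self.io_log = []                    # side effects an I/O chip model would act on
        self.checks = 0                     # how many accesses paid for decoding

    def read(self, addr):
        addr &= 0xFFFF
        self.checks += 1
        if self.io_visible and self.IO_START <= addr <= self.IO_END:
            return self.io_read(addr)       # reads can have side effects too
        if self.rom_visible and addr >= self.ROM_START:
            return self.rom[addr - self.ROM_START]
        return self.ram[addr]

    def write(self, addr, value):
        addr &= 0xFFFF
        value &= 0xFF
        self.checks += 1
        if self.io_visible and self.IO_START <= addr <= self.IO_END:
            self.io_log.append((addr, value))   # hand off to chip emulation
            return
        self.ram[addr] = value              # writes under the ROM land in RAM

    def io_read(self, addr):
        """Placeholder: a real emulator would consult its chip models here."""
        return 0
```

Even in this stripped-down form, a plain `lda src,x` that costs a real 6502 four cycles turns into several compares and branches on the host for each of its memory accesses, before any instruction decoding or video chip emulation is counted.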