[comp.sys.amiga] 6502 Vs 68000, let's get it straight.

dillon@CORY.BERKELEY.EDU (Matt Dillon) (03/09/87)

	Don't get clumsy now!  Let me get it straight for everybody:

The 6502 takes one clock cycle to do an 8-bit memory fetch.  The number of
clock cycles required to execute an instruction is in most cases exactly
the number of memory operations required to read and execute the instruction.
Thus, a LDA absolute requires 4 memory fetches and thus 4 clock cycles.
(3 fetches for the instruction, 1 for the absolute memory operation).
There are some exceptions: most single-byte instructions like TAX take 2
clock cycles even though there is only one memory fetch.

A 68000 on the other hand takes 4 clock cycles for each memory fetch, and
fetches data 16 bits at a time.  Instruction execution times are, in general,
related to the number of memory operations required.  Most longword operations
take an extra 2 clock cycles (not memory cycles) due to internal processing.

SO.  In terms of basic throughput, an 8Mhz 68000 is 4 times faster than a 
1Mhz 6502. (32bits/uS vs 8bits/uS).  HOWEVER, the 68000 allows you to do
a more complex range of operations in the same time.  Specifically, a 68000
can manipulate 16 and 32 bit quantities and the 6502 can only manipulate 8
bit quantities.  Attempting to make the 6502 do, say, a 16bit add immediate
to memory requires about 7 instructions (CLC/LDA/ADC#/STA/LDA/ADC#/STA)=18cc 
whereas a 68000 can do it in a single instruction (ADD)=16cc.  So the 6502
can be thought of as fast only if your program doesn't require anything
beyond 8 bit quantity sizes.  Even if you spent 24 hours optimizing your 
6502 code, you can't really do a 16bit add in anything less than four 
instructions, and that's assuming one addend is already loaded into registers
A and X and the carry is set to something meaningful.  Each 68000 instruction
is about 4x more powerful than a 6502 instruction.
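
As a quick check of the raw throughput figure (32bits/uS vs 8bits/uS), here is
a small Python sketch.  The function and variable names are made up for
illustration, and it assumes back-to-back memory fetches with no idle cycles,
which is an idealization:

```python
# Raw memory bandwidth from the clock and fetch figures quoted above.
# Assumption: the bus is kept busy on every fetch slot (best case).

def bits_per_microsecond(clock_mhz, clocks_per_fetch, bits_per_fetch):
    """Bits moved per microsecond for a given clock and fetch width."""
    fetches_per_us = clock_mhz / clocks_per_fetch
    return fetches_per_us * bits_per_fetch

# 1 MHz 6502: one 8-bit fetch per clock.
bw_6502 = bits_per_microsecond(clock_mhz=1, clocks_per_fetch=1, bits_per_fetch=8)
# 8 MHz 68000: one 16-bit fetch per 4 clocks.
bw_68000 = bits_per_microsecond(clock_mhz=8, clocks_per_fetch=4, bits_per_fetch=16)

throughput_ratio = bw_68000 / bw_6502
print(bw_6502, bw_68000, throughput_ratio)
```

This only measures how fast bytes can cross the bus; it says nothing about how
much useful work each instruction does.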

Now, a 68000 instruction is, on the average, twice as long as a 6502
instruction... And I'm being very generous to the 6502 here.

So, putting it all together:	8 Mhz 68000 Vs 1 Mhz 6502
	Basic throughput 4x
	Take into account power of 68000 (16/32bit registers & operations): 4x
	Take into account instruction size: .5x

	Overall rating:	8x.
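
The overall rating is just the product of the three factors listed above.  A
minimal sketch of that arithmetic, with illustrative names and the factors
taken directly from the list:

```python
import math

# Factors from the summary above (8 MHz 68000 relative to 1 MHz 6502).
factors = {
    "basic throughput": 4.0,
    "power of 16/32-bit registers and operations": 4.0,
    "instruction size penalty": 0.5,
}

# The factors are treated as independent multipliers, which is the
# simplifying assumption behind the estimate.
overall_rating = math.prod(factors.values())
print(f"Overall rating: {overall_rating:g}x")
```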


The gist is that the clock rating reflects the relative differences between
a 6502 and a 68000. (Obviously this generalization only applies to the 6502
vs 68000).  Thus if an 8Mhz 6502 did exist, it would probably be on par with
a 68000.

NOTE: the previous argument is very generous towards the 6502.... I do not
take into account the large number of registers on the 68000 or its expanded
address space.



Example 2:  Tight loop copy 256 bytes from absolute location

6502:	ldx #0		time ~= 256*(4+5+2+3) = 3584 clock cycles
loop:	lda src,x
	sta dest,x	(takes 5cc)
	dex
	bne loop

68000:	move.l	#src,a0	time ~= 64*(20+10) = 1920 clock cycles
	move.l	#dest,a1
	move.w	#256/4-1,d0	;dbf runs the loop count+1 times
loop:	move.l	(a0)+,(a1)+
	dbf D0,loop

	result: 8Mhz 68000 about 15x a 1Mhz 6502

	Note that for the 6502 program to copy more than 256 bytes, the 
	most efficient routine is a self-modifying code routine that
	has an inner loop equivalent to the above example and an outer loop
	which modifies the MSB address in the LDA and STA instructions.  This
	effectively gives the same throughput.

Example 3:	16 bit add
6502:	(add .Alsb .Xmsb to zero page memory)	time = 16 cc
	clc
	adc dest
	sta dest
	txa
	adc dest+1
	sta dest+1

68000:	(add D0 to register indirect (Aztec small data model))
	add.w d0,off(Ax)				time = 16 cc

	result: 8Mhz 68000 about 8x a 1Mhz 6502
	NOTE: register-register ADD takes only 4 clock cycles.
	NOTE: addressing modes picked to best represent programming
	environment.
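
To make the byte-at-a-time technique in the 6502 listing concrete, here is a
small Python model of a 16 bit add built from two 8 bit add-with-carry steps.
The function names are hypothetical, and the model covers binary mode only
(decimal mode and the other status flags are ignored):

```python
def adc(accumulator, operand, carry_in):
    """8-bit add with carry. Returns (result, carry_out)."""
    total = accumulator + operand + carry_in
    return total & 0xFF, total >> 8

def add16_via_bytes(value, addend):
    """Add two 16-bit values one byte at a time, low byte first."""
    # clc / adc dest / sta dest: low bytes, carry starts cleared
    low, carry = adc(value & 0xFF, addend & 0xFF, 0)
    # txa / adc dest+1 / sta dest+1: high bytes plus carry from the low add
    high, carry = adc((value >> 8) & 0xFF, (addend >> 8) & 0xFF, carry)
    return (high << 8) | low, carry

print(hex(add16_via_bytes(0x12FF, 0x0001)[0]))
```

The carry passed from the low-byte add to the high-byte add is what the CLC
and the second ADC take care of on the 6502; the 68000 does the same work
inside a single add.w.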

Example 4:	32 bit add
6502:	(add .Alsb .Xmsb and zero-page src to zero page destination)
	clc						time = 34 cc
	adc dest
	sta dest
	txa
	adc dest+1
	sta dest+1
	lda src
	adc dest+2
	sta dest+2
	lda src+1
	adc dest+3
	sta dest+3

68000:	add.l	D0,off(Ax)				time = 24 cc
	
	result: 8Mhz 68000 about 11x a 1Mhz 6502


Example 5:	Simple table driven PLOT x,y onto some screen.  Assume
		will do many plots.

6502:	plot (x, y).. max 256x256 drawing area.
	lda scanlinelsb,y				time = 33 cc
	sta zeropage
	lda scanlinemsb,y
	sta zeropage+1		(takes 3cc)
	ldy columnindex,x
	lda (zeropage),y	(takes 5cc)
	ora bittable,x		(takes 4cc)
	sta (zeropage),y	(takes 6cc)

68000:	plot (D0, D1).. max 2048 pixels on the X axis, 8192 on the Y
	registers (as they would be for multiple plots):
		A0=scanline table of longword screen address
		A1=columnindex (table of bytes column convert)
		A2=bittable (table of byte masks)

							time = 72 cc

	asl.w	#2,D1		;y = y * 4 to get index into longword array
	move.l	0(A0,D1.w),A3	;get scanline
	add.w	0(A1,D0.w),A3	;incorporate columnindex
	move.b	0(A2,D0.w),D0	;get mask (80/40/20/10/08/04/02/01)
	or.b	D0,(A3)		;write it to screen


	Results: 8 Mhz 68000 only 3.7x a 1 Mhz 6502
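
The "Nx" results for the add and plot examples come from converting cycle
counts into wall-clock time at each chip's clock rate.  A rough sketch of that
conversion, using the cycle counts quoted in Examples 3 to 5 (the function
name and defaults are illustrative):

```python
def speedup(cycles_6502, cycles_68000, mhz_6502=1.0, mhz_68000=8.0):
    """How many times faster the 68000 finishes, given cycle counts and clocks."""
    time_6502_us = cycles_6502 / mhz_6502
    time_68000_us = cycles_68000 / mhz_68000
    return time_6502_us / time_68000_us

# (6502 cycles, 68000 cycles) as quoted in the examples above.
examples = {
    "16 bit add": (16, 16),
    "32 bit add": (34, 24),
    "table driven plot": (33, 72),
}

for name, (c6502, c68000) in examples.items():
    print(f"{name}: {speedup(c6502, c68000):.1f}x")
```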


				-Matt

hatcher@INGRES.BERKELEY.EDU (Doug Merritt) (03/10/87)

Matt Dillon wants to straighten us all out on how much faster a 68000
is than a 6502. What he says seems to be good information, but it misses
the entire point that George Robbins (and I and others) are trying to make.

The point is not how fast you can do things on one versus the other.
That is different than doing an emulation.

The point is how hard it is to do an emulation. Go re-read George's two
postings. They are totally accurate and to the point, and do not deserve
being misunderstood.

To put it another way, if you are going to write software that *TRANSLATES*
a 6502 program into an equivalent 68000 program, then you can probably
get a 68000 program that is as fast or faster than the 6502 program. But
nobody is planning that, because it is extraordinarily difficult. That is
a vastly different thing than doing an emulator.

An emulator need only understand each individual effect of each instruction
of the target machine.
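
As a rough illustration of what "each individual effect of each instruction"
means in practice, here is a minimal interpreter sketch in Python for a
handful of 6502 opcodes.  It is not a complete emulator: only the Z flag is
tracked, BRK is treated as "halt", there is no cycle counting, and the `run`
function and the test program are made up for this example.

```python
def run(memory, pc, max_steps=10_000):
    """Interpret a tiny 6502 subset until BRK. Returns (A, X)."""
    a = x = 0
    zero = False
    for _ in range(max_steps):
        opcode = memory[pc]
        if opcode == 0x00:                      # BRK (used here as halt)
            return a, x
        elif opcode == 0xA9:                    # LDA #imm
            a = memory[pc + 1]
            zero = a == 0
            pc += 2
        elif opcode == 0xA2:                    # LDX #imm
            x = memory[pc + 1]
            zero = x == 0
            pc += 2
        elif opcode == 0xCA:                    # DEX
            x = (x - 1) & 0xFF
            zero = x == 0
            pc += 1
        elif opcode == 0x8D:                    # STA abs (little-endian address)
            address = memory[pc + 1] | (memory[pc + 2] << 8)
            memory[address] = a
            pc += 3
        elif opcode == 0xD0:                    # BNE rel (signed 8-bit offset)
            offset = memory[pc + 1]
            if offset >= 0x80:
                offset -= 0x100
            pc += 2
            if not zero:
                pc += offset
        else:
            raise ValueError(f"unhandled opcode ${opcode:02X} at ${pc:04X}")
    raise RuntimeError("step limit reached")

# LDX #3 / loop: DEX / BNE loop / LDA #$2A / STA $0200 / BRK
program = bytes([0xA2, 0x03, 0xCA, 0xD0, 0xFD, 0xA9, 0x2A, 0x8D, 0x00, 0x02, 0x00])
memory = bytearray(0x10000)
memory[0x0600:0x0600 + len(program)] = program

final_a, final_x = run(memory, 0x0600)
print(final_a, final_x, memory[0x0200])
```

Each branch of the dispatch handles one instruction in isolation, which is
why this is easy to write and also why it is slow: every emulated instruction
pays for a fetch, a dispatch and flag bookkeeping on the host.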

A translator needs to be able to understand the OVERALL effect of arbitrary
groups of instructions, and be able to WRITE NEW CODE in the new machine
that has the same effect.

Now, I won't discourage you from doing this. I think intelligent translators
are pretty cool. But don't underestimate the work involved. It is considerably
harder than just writing a mere C compiler, for instance. At least, it is
if you want any kind of optimized output. I suppose that if you don't care
how optimized the output is, then it's not too hard. Note that in this case,
you end up with an emulator again, and not a fast one, since it just puts
the emulation of each instruction in-line in the output program.

So although Matt and others have some good points, they are addressing
a different subject than "what about all that 6502 software out there".
	Doug

grr@cbmvax.UUCP (03/11/87)

In article <8703100226.AA07627@ingres.Berkeley.EDU> hatcher@INGRES.BERKELEY.EDU (Doug Merritt) writes:
>Matt Dillon wants to straighten us all out on how much faster a 68000
>is than a 6502. What he says seems to be good information, but it misses
>the entire point that George Robbins (and I and others) are trying to make.
>
>The point is not how fast you can do things on one versus the other.
>That is different than doing an emulation.

I think we can agree that an 8 MHz 68000 is n:n>4 times faster than a 1 MHz
6502 on generic code segments where the bigger register store and 16 bit
instructions pay off.  However, for the fairly simple tasks that map nicely into
6502 instructions - byte moves, indexing 256 byte tables, etc. - the 68000 may
only be n:n<2 times faster.

NOW!  What sort of operations are you going to be doing in your interpreter?
You will have to do exactly those dinky little things where you have the least
performance advantage over the 6502.  Also you have interpretive overhead to
deal with, although hopefully the power of the 68000 helps here.

To strike the final blows, remember that the C64 has memory mapped I/O!  This
means for *every* memory access (possibly including reads, even I-fetches)
you have to test for side effects!  Sound bad?  Next, since the C64 has
dynamically switchable ROM/RAM overlays, you get to add either a layer of
indirection or some other mapping function.  Oh, pain!  Count all those nice
little 68000 cycles you're eating.
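
To give a feel for that per-access overhead, here is a simplified Python
sketch of the checks an emulator's memory read and write paths would need.
The address ranges and the three bank bits at $0001 follow the commonly
documented C64 memory map, not anything stated in this thread, so treat them
as assumptions.  The class and hook names are hypothetical, the ROM contents
are placeholders, and the data direction register and cartridge lines are
ignored.

```python
class C64Bus:
    """Toy model of C64 address decoding: RAM, ROM overlays, and I/O."""

    def __init__(self):
        self.ram = bytearray(0x10000)
        self.basic_rom = bytes(0x2000)     # placeholder contents
        self.kernal_rom = bytes(0x2000)    # placeholder contents
        self.char_rom = bytes(0x1000)      # placeholder contents
        self.io_reads = []                 # log of I/O side effects
        self.io_writes = []
        self.ram[0x0001] = 0x07            # LORAM, HIRAM, CHAREN all set

    def _bank_bits(self):
        port = self.ram[0x0001]
        return bool(port & 1), bool(port & 2), bool(port & 4)

    def read(self, address):
        loram, hiram, charen = self._bank_bits()
        if 0xD000 <= address <= 0xDFFF and (loram or hiram):
            if charen:
                return self.read_io(address)          # may have side effects
            return self.char_rom[address - 0xD000]
        if 0xA000 <= address <= 0xBFFF and loram and hiram:
            return self.basic_rom[address - 0xA000]
        if address >= 0xE000 and hiram:
            return self.kernal_rom[address - 0xE000]
        return self.ram[address]

    def write(self, address, value):
        loram, hiram, charen = self._bank_bits()
        if 0xD000 <= address <= 0xDFFF and charen and (loram or hiram):
            self.write_io(address, value)
        else:
            self.ram[address] = value & 0xFF          # writes land in RAM under ROM

    def read_io(self, address):
        """Hook where VIC/SID/CIA register emulation would go."""
        self.io_reads.append(address)
        return 0

    def write_io(self, address, value):
        """Hook where VIC/SID/CIA register emulation would go."""
        self.io_writes.append((address, value))


bus = C64Bus()
bus.write(0xD020, 5)          # I/O visible: goes to the I/O hook, not RAM
bus.write(0x0001, 0x04)       # LORAM=0, HIRAM=0: all-RAM configuration
bus.write(0xD020, 7)          # now an ordinary RAM write
print(bus.io_writes, bus.read(0xD020))
```

Even in this stripped-down form, every emulated load or store runs through
several comparisons before touching memory, which is exactly the cost being
described.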

Hmmm, anything else?  Remember C64 games typically synchronize little code
fragments to raster positions and change VIC registers on the fly.  So of
course you're interleaving this VIC emulation somehow...

Anybody got some 32 MHz 68030's?  If you had told the 6502 designers that the
6502 would be one of the most popular *general purpose* microprocessors ever,
they would have laughed.  It was a variation on the Motorola 6800 theme (which
was more general purpose) with an instruction set to give tight, fast code and
an external interface to allow optimal use as a micro-controller chip.  You
know, traffic lites, blenders, microwaves and that sort of thing...
-- 
George Robbins - now working for,	uucp: {ihnp4|seismo|rutgers}!cbmvax!grr
but no way officially representing	arpa: cbmvax!grr@seismo.css.GOV
Commodore, Engineering Department	fone: 215-431-9255 (only by moonlite)