[comp.sys.amiga] Memory Fill/Copy/Compare statistics + some assembly examples w/ DBcc

dillon@CORY.BERKELEY.EDU (Matt Dillon) (04/24/87)

	(I better sign my name here... this is a long article)

				-Matt

	Here is a good example of how to use a DBcc instruction, including
how to extend it to handle counts larger than 16 bits (it takes only two
more instructions).  I've tested all three of these routines.

	The routines transfer information a byte at a time.  Theoretically,
transferring a word or a long at a time would be about 2x or 4x as fast,
but you need to add code to take care of initial and trailing alignment.
If you restrict the supplied arguments to word or long boundaries, and the
size to multiples of 2 or 4 bytes, you can implement such an improvement by
simply shifting the count register right by 1 or 2 before beginning the
loop, and using move.w/move.l (cmpm.w/cmpm.l for BCMP) instead of the byte
forms in the code below.
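
For what the "initial and trailing alignment" code amounts to, here is a
rough sketch in C rather than assembly.  It is not one of the routines in
this article: fill_long is a made-up name, and memcpy stands in for a
move.l so the sketch stays portable C.  Bytes are stored until the pointer
is long aligned, then count >> 2 longs, then the 0 to 3 bytes left over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the article): fill n bytes at p with value,
 * a long at a time where possible.  Returns p, like BSET does. */
void *fill_long(void *p, size_t n, unsigned char value)
{
    unsigned char *b = p;
    uint32_t pattern = (uint32_t)value * 0x01010101u; /* value in all 4 bytes */
    size_t longs;

    /* Head: byte stores until the pointer sits on a 4-byte boundary. */
    while (n > 0 && ((uintptr_t)b & 3u) != 0) {
        *b++ = value;
        n--;
    }
    assert(n == 0 || ((uintptr_t)b & 3u) == 0);

    /* Middle: count shifted right by 2 gives the number of long stores. */
    for (longs = n >> 2; longs > 0; longs--) {
        memcpy(b, &pattern, 4);   /* stands in for move.l D1,(A0)+ */
        b += 4;
    }

    /* Tail: the 0..3 bytes that did not make up a whole long. */
    for (n &= 3u; n > 0; n--)
        *b++ = value;

    return p;
}
```

With the restriction described above (aligned pointer, size a multiple of
4), the head and tail loops do nothing and only the middle loop is left.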

	The best example for the use of DBcc is the BCMP() routine, which
employs an actual condition (it uses DBNE) to exit the loop on compare
failure.

68000 specs, given as clock-cycles-per-loop/MBytes-per-second on the Amiga,
where the MBsec figure is the total transfer rate.  E.g. if you want to zero
396K, it will take a second.  If you want to move 324K from one place to
another, it will take a second.

68000 @ 7.14 MHz
				BSET/BZERO	BMOV		BCMP

	as is:			18/0.396 MBsec	22/0.324 MBsec	22/0.324 MBsec
	mod for	word at a time	18/0.793 MBsec	22/0.649 MBsec	22/0.649 MBsec
	mod for long at a time	22/1.298 MBsec	30/0.952 MBsec	30/0.952 MBsec
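
The MBsec figures follow from the cycle counts by simple arithmetic: clock
rate in MHz, divided by cycles per loop, times bytes moved per loop.  A
small C sketch of that calculation, where mb_per_sec is a made-up name and
7.14 is the clock figure used in the table:

```c
#include <assert.h>

/* Hypothetical helper: transfer rate in MBytes/sec for a loop that takes
 * `cycles` clocks and handles `bytes_per_loop` bytes each time around. */
double mb_per_sec(double clock_mhz, int cycles, int bytes_per_loop)
{
    assert(cycles > 0);
    return clock_mhz / cycles * bytes_per_loop;
}
```

For example, mb_per_sec(7.14, 18, 1) comes to about 0.3967 (the byte fill
entry) and mb_per_sec(7.14, 30, 4) to 0.952 (the long move entry).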


Transferring a long at a time is about 3x faster than transferring a byte at
a time.  Going by the long-at-a-time figures, a 68010 can do memory
fills/moves/compares from 1.3x to 1.6x faster than a 68000 can (1.6x for
fills, 1.3x for moves/compares).

-----------------------------------------------------------

Oh yeah.. extending the count beyond 16 bits.  Just look at the assembly.
Since DBcc only affects the lower word of a data register, you can still use
a single data register to hold the 32-bit 'count' you want, and simply add
a two-instruction outer loop after the DBcc inner loop.
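
If the outer loop trick is not obvious from the assembly, this C model of
the control flow may help.  It is only a sketch: dbf() and
count_iterations() are made-up names, and the gotos mirror the loop/drop
labels in the routines.  It counts how many times the loop body would run
for a given 32-bit count.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "DBF Dn,label": decrement the low word of the register only,
 * and report whether the branch is taken (low word did not become $FFFF). */
static int dbf(uint32_t *d)
{
    uint16_t low = (uint16_t)(*d & 0xFFFFu);
    low--;                                   /* 0 wraps around to 0xFFFF */
    *d = (*d & 0xFFFF0000u) | low;
    return low != 0xFFFFu;
}

/* Hypothetical model of the BSET loop structure: returns how many times
 * the loop body executes for a count of n. */
uint32_t count_iterations(uint32_t n)
{
    uint32_t d0 = n;
    uint32_t iterations = 0;

    goto drop;                               /* bra   drop            */
loop:
    iterations++;                            /* move.b D1,(A0)+       */
drop:
    if (dbf(&d0))                            /* dbf   D0,loop         */
        goto loop;
    assert((d0 & 0xFFFFu) == 0xFFFFu);       /* low word is $FFFF now */
    d0 -= 0x10000u;                          /* sub.l #$10000,D0      */
    if ((d0 & 0x80000000u) == 0)             /* bpl   loop            */
        goto loop;
    return iterations;
}
```

Because the low word is $FFFF whenever the DBF falls through, branching
back to loop after the subtract runs the body once and then 65535 more
times, i.e. exactly 65536 bytes per unit of the upper word.  Note that the
BPL test treats the count as signed, so in this model a count with bit 31
set can stop early ($FFFFFFFF, for one, does).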


--- Just for kicks, this is what a 68010 would give you ---

Since the inner loop in every case is just two instructions (a one-word
instruction followed by a DBcc that branches straight back to it), a 68010
will enter loop mode operation, which gives significantly faster results
(clock cycle times are for the meat of the loop):

68010 @ 7.14 MHz
				BSET/BZERO	BMOV		BCMP

	as is:			10/0.714 MBsec	14/0.510 MBsec	14/0.510 MBsec
	mod for	word at a time	10/1.428 MBsec	14/1.020 MBsec	14/1.020 MBsec
	mod for long at a time	14/2.040 MBsec	22/1.298 MBsec	22/1.298 MBsec



NOTE: All passed arguments are 32 bits each.
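
To pin down the calling conventions, here is a plain C sketch of what the
three routines are meant to do, as read from the assembly in the archive.
The ref_ names are made up for illustration and are not part of the
archive.  Note the argument order of BMOV (source first) and that BCMP
returns 1 on a match, the opposite sense from the BSD bcmp().  Since every
argument is 32 bits, a compiler with 16-bit ints would presumably need the
counts and the fill value passed as longs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* BCMP(p1,p2,n): 1 if the first n bytes match, 0 if any byte differs. */
int ref_bcmp(const void *p1, const void *p2, size_t n)
{
    const unsigned char *a = p1, *b = p2;
    while (n-- > 0)
        if (*a++ != *b++)
            return 0;
    return 1;
}

/* BMOV(src,dest,bytes): overlap-safe copy, returns dest. */
void *ref_bmov(const void *src, void *dest, size_t n)
{
    const unsigned char *s = src;
    unsigned char *d = dest;

    assert(n == 0 || (src != NULL && dest != NULL));
    if ((uintptr_t)d == (uintptr_t)s)
        return dest;                          /* trivial case            */
    if ((uintptr_t)d < (uintptr_t)s) {
        while (n-- > 0)                       /* forward, dest < src     */
            *d++ = *s++;
    } else {
        s += n;                               /* backward, dest > src    */
        d += n;
        while (n-- > 0)
            *--d = *--s;
    }
    return dest;
}

/* BSET(ptr,bytes,value): fill with the low byte of value, returns ptr. */
void *ref_bset(void *p, size_t n, long value)
{
    unsigned char *b = p;
    while (n-- > 0)
        *b++ = (unsigned char)value;
    return p;
}

/* BZERO(ptr,bytes): same as BSET with a value of 0. */
void *ref_bzero(void *p, size_t n)
{
    return ref_bset(p, n, 0L);
}
```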

#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create:
#	bcmp.asm
#	bmov.asm
#	bset.asm
# This archive created: Thu Apr 23 20:56:55 1987
export PATH; PATH=/bin:/usr/bin:$PATH
echo shar: "extracting 'bcmp.asm'" '(590 characters)'
if test -f 'bcmp.asm'
then
	echo shar: "will not over-write existing file 'bcmp.asm'"
else
cat << \!Funky!Stuff! > 'bcmp.asm'

;BCMP.ASM
;	using byte operations
;
;   BCMP(p1,p2,n)   return 0=failed, 1=compare ok

     xdef  _bcmp

_bcmp:
	movem.l	4(A7),A0/A1	;A0 = ptr1, A1 = ptr2
	move.l	12(A7),D1	;# bytes
	clr.l	D0		;def. return value is false, also sets Z bit
	bra	drop		;drop into the DBF loop
loop	cmpm.b	(A0)+,(A1)+
drop	dbne.w	D1,loop		;until count exhausted or compare failed
	bne	end
	sub.l	#$10000,D1	;for buffers >65535
	bpl	loop		;branch	to loop	because	D1.W now is FFFF
	addq.l	#1,D0		;return	TRUE
end	rts


!Funky!Stuff!
fi  # end of overwriting check
echo shar: "extracting 'bmov.asm'" '(504 characters)'
if test -f 'bmov.asm'
then
	echo shar: "will not over-write existing file 'bmov.asm'"
else
cat << \!Funky!Stuff! > 'bmov.asm'

;BMOV.ASM
;      4    8	12
;BMOV(src,dest,bytes)
;

	xdef  _bmov

_bmov
	move.l	4(A7),A0	;source
	move.l	8(A7),A1	;destination
	move.l	12(A7),D0	;bytes
	cmp.l	A0,A1
	beq	end		;trivial case
	bls	dropfwd		;forward copy  (dest < src)
	add.l	D0,A0		;backward copy (dest > src)
	add.l	D0,A1
	bra	dropbck

loopfwd	move.b	(A0)+,(A1)+
dropfwd	dbf.w	D0,loopfwd
	sub.l	#$10000,D0
	bpl	loopfwd
	bra	end

loopbck	move.b	-(A0),-(A1)
dropbck	dbf.w	D0,loopbck
	sub.l	#$10000,D0
	bpl	loopbck
end	move.l	8(A7),D0
	rts


!Funky!Stuff!
fi  # end of overwriting check
echo shar: "extracting 'bset.asm'" '(593 characters)'
if test -f 'bset.asm'
then
	echo shar: "will not over-write existing file 'bset.asm'"
else
cat << \!Funky!Stuff! > 'bset.asm'

;BSET.ASM
;BZERO.ASM
;

     xdef  _bset
     xdef  _bzero

_bzero
	clr.l	D1
	bra	begin
_bset
	move.b	15(A7),D1	;12(A7)-> msb .	. lsb	(D1.B = data)
begin
	move.l	4(A7),A0	;A0 = pointer to memory
	move.l	8(A7),D0	;D0 = bytes to set
	bra	drop		;drop into the DBF loop
loop	move.b	D1,(A0)+
drop	dbf.w	D0,loop		;remember, only	affects	lower word
	sub.l	#$10000,D0	;for buffers >65535
	bpl	loop		;branch	to loop	because	D0.W now is FFFF
	move.l	4(A7),D0	;return	pointer	to buffer start
	rts


!Funky!Stuff!
fi  # end of overwriting check
exit 0
#	End of shell archive