dillon@CORY.BERKELEY.EDU (Matt Dillon) (04/24/87)
(I better sign my name here... this is a long article)
-Matt
Here is a good example of how to use a DBcc instruction, including
how to extend it to handle counts larger than 16 bits (only two more
instructions). I've tested all three of these routines.
The routines transfer information a byte at a time. Theoretically,
transferring information a word or a long at a time would be about 2x and 4x
as fast, but you need to add code to take care of initial and trailing
alignment. If you made a restriction that the supplied pointers were on
word or long boundaries, and the size a multiple of 2 or 4 bytes, you
could implement such an improvement by simply shifting the count register
right by 1 or 2 before beginning the loop, and using move.w/move.l
(cmpm.w/cmpm.l for BCMP) instead of move.b/cmpm.b in the code below. For
BSET, the fill byte would also have to be replicated into every byte of D1
first.
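A minimal C sketch of the long-at-a-time idea, assuming the pointer is long-aligned and the size is a multiple of 4 (the restriction described above). The name bset_longs is hypothetical and not one of the archived routines; it only shows the byte count being shifted right by 2 and the fill byte being replicated into a 32-bit pattern, which a move.l version of BSET would also need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical long-at-a-time fill. Assumes dest is long-aligned and
 * nbytes is a multiple of 4, so the count can simply be shifted right. */
void bset_longs(void *dest, size_t nbytes, uint8_t value)
{
    uint32_t *p = dest;
    uint32_t pattern = value * 0x01010101u;  /* replicate the byte into all 4 bytes */
    size_t count;

    assert(((uintptr_t)dest & 3u) == 0);     /* long-aligned start */
    assert((nbytes & 3u) == 0);              /* whole number of longs */

    count = nbytes >> 2;                     /* "shift the count right by 2" */
    while (count--)
        *p++ = pattern;                      /* analogous to move.l D1,(A0)+ */
}
```

The same shape works for words: shift by 1 and replicate with 0x0101.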
The best example of DBcc in use is the BCMP() routine, which tests
a real condition (it uses DBNE) to exit the loop as soon as a compare
fails.
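To make BCMP's exit logic concrete, here is a C sketch of the same control flow (bcmp_like is a hypothetical name). The loop stops either when the count runs out or on the first mismatch, which is what DBNE gives you in a single instruction. Like BCMP, it returns 1 for a match and 0 for a mismatch, the opposite sense from the Unix bcmp(), which returns zero when the buffers are equal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of BCMP: 1 = all n bytes match, 0 = mismatch. */
int bcmp_like(const void *p1, const void *p2, size_t n)
{
    const unsigned char *a = p1, *b = p2;

    assert(n == 0 || (p1 != NULL && p2 != NULL));
    while (n-- > 0) {              /* DBNE: decrement while condition is false */
        if (*a++ != *b++)          /* cmpm.b (A0)+,(A1)+ sets NE */
            return 0;              /* DBNE falls through, BNE END, D0 = 0 */
    }
    return 1;                      /* count exhausted: addq.l #1,D0 */
}
```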
68000 figures on the Amiga, given as clock cycles per loop pass / MBytes per
second, where MB/sec is the total transfer rate. E.g. if you want to zero
396K, it will take about a second. If you want to move 324K from one place
to another, it will take about a second.
68000 @ 7.14MHz
                          BSET/BZERO       BMOV             BCMP
as is (byte at a time)    18/0.396 MB/sec  22/0.324 MB/sec  22/0.324 MB/sec
mod for word at a time    18/0.793 MB/sec  22/0.649 MB/sec  22/0.649 MB/sec
mod for long at a time    22/1.298 MB/sec  30/0.952 MB/sec  30/0.952 MB/sec
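The table entries follow from simple arithmetic: rate = clock / cycles per pass x bytes per pass. A small C helper (mb_per_sec is a hypothetical name; MB here means 10^6 bytes, matching the "396K in a second" example) reproduces them from the 7.14 MHz clock used above.

```c
#include <assert.h>

/* Transfer rate in MB/sec (10^6 bytes) for a loop that takes `cycles`
 * clocks per pass and moves `bytes_per_pass` bytes per pass. */
double mb_per_sec(double clock_mhz, double cycles, double bytes_per_pass)
{
    assert(cycles > 0.0);
    return clock_mhz / cycles * bytes_per_pass;
}
```

For example, mb_per_sec(7.14, 18, 1) is about 0.397 (byte fill) and mb_per_sec(7.14, 30, 4) is 0.952 (long move).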
Transferring a long at a time is about 3x faster than transferring a byte at
a time. A 68010 can do memory fills/moves/compares about 1.4x to 1.8x faster
than a 68000 can (1.8x for byte or word fills, about 1.6x for long fills and
byte or word moves/compares, and about 1.4x for long moves/compares).
-----------------------------------------------------------
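Oh yah... extending the count beyond 16 bits. Just look at the assembly.
Since DBcc only affects the lower word of a data register, you can still use
a single data register to hold the 32-bit 'count' you want, and simply add
a two-instruction outer loop after the DBcc inner loop. When the low word
runs out, DBcc falls through with it at $FFFF; SUB.L #$10000 then decrements
the upper word, and BPL jumps back into the loop for another 65536 passes
until the upper word goes negative.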
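The outer loop is easiest to convince yourself of by simulating it. This C sketch (dbf_body_count is a hypothetical name) models a 32-bit register whose low word is the only part DBF touches; goto is used deliberately so the control flow mirrors the assembly line for line. The assertion reflects a limit implied by the code: if the upper word starts above $8000, the first SUB.L leaves the sign bit set and BPL exits early.

```c
#include <assert.h>
#include <stdint.h>

/* Simulates:      bra   drop
 *          loop   <body>
 *          drop   dbf.w D0,loop
 *                 sub.l #$10000,D0
 *                 bpl   loop
 * and returns how many times <body> runs. */
uint32_t dbf_body_count(uint32_t d0)
{
    uint32_t runs = 0;

    assert((d0 >> 16) <= 0x8000u);        /* larger counts make BPL exit early */
    goto drop;                             /* bra drop */
loop:
    runs++;                                /* one pass of the loop body */
drop:
    {
        /* dbf: decrement the low word only, branch unless it became -1 */
        uint16_t low = (uint16_t)((d0 & 0xFFFFu) - 1u);
        d0 = (d0 & 0xFFFF0000u) | low;
        if (low != 0xFFFFu)
            goto loop;
    }
    d0 -= 0x10000u;                        /* sub.l #$10000,D0: step the upper word */
    if ((d0 & 0x80000000u) == 0)           /* bpl: result still non-negative */
        goto loop;                         /* low word is $FFFF: 65536 more passes */
    return runs;
}
```

For example, a count of 65537 runs the body once before the low word underflows, then 65536 more times after the outer loop re-enters with the low word at $FFFF.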
--- Just for kicks, this is what a 68010 would give you ---
Since each loop is a single one-word instruction followed by a DBcc that
branches straight back to it, a 68010 will enter its loop mode, which gives
significantly faster results (clock cycle times are for the meat of the loop):
68010 @ 7.14MHz
                          BSET/BZERO       BMOV             BCMP
as is (byte at a time)    10/0.714 MB/sec  14/0.510 MB/sec  14/0.510 MB/sec
mod for word at a time    10/1.428 MB/sec  14/1.020 MB/sec  14/1.020 MB/sec
mod for long at a time    14/2.040 MB/sec  22/1.298 MB/sec  22/1.298 MB/sec
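Because both tables use the same 7.14 MHz clock and the same bytes per pass within a row, the 68010's advantage over the 68000 is just the ratio of the per-pass cycle counts. A trivial C helper (speedup_68010 is a hypothetical name) makes the comparison explicit.

```c
#include <assert.h>

/* Speedup of the 68010 over the 68000 for the same loop, given the
 * per-pass cycle counts from the two tables (same clock, same bytes). */
double speedup_68010(double cycles_68000, double cycles_68010)
{
    assert(cycles_68010 > 0.0);
    return cycles_68000 / cycles_68010;
}
```

For example, speedup_68010(18, 10) is 1.8 (byte fill) and speedup_68010(30, 22) is about 1.36 (long move).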
NOTE: All passed arguments are 32 bits each
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create:
# bcmp.asm
# bmov.asm
# bset.asm
# This archive created: Thu Apr 23 20:56:55 1987
export PATH; PATH=/bin:/usr/bin:$PATH
echo shar: "extracting 'bcmp.asm'" '(590 characters)'
if test -f 'bcmp.asm'
then
echo shar: "will not over-write existing file 'bcmp.asm'"
else
cat << \!Funky!Stuff! > 'bcmp.asm'
;BCMP.ASM
; using byte operations
;
; BCMP(p1,p2,n) return 0=failed, 1=compare ok
xdef _bcmp
_bcmp:
movem.l 4(A7),A0/A1 ;A0 = ptr1, A1 = ptr2
move.l 12(A7),D1 ;# bytes
clr.l D0 ;def. return value is false, also sets Z bit
bra drop ;drop into the DBF loop
loop cmpm.b (A0)+,(A1)+
drop dbne.w D1,loop ;until count exhausted or compare failed
bne end
sub.l #$10000,D1 ;for buffers >65535
bpl loop ;branch to loop because D1.W now is FFFF
addq.l #1,D0 ;return TRUE
end rts
!Funky!Stuff!
fi # end of overwriting check
echo shar: "extracting 'bmov.asm'" '(504 characters)'
if test -f 'bmov.asm'
then
echo shar: "will not over-write existing file 'bmov.asm'"
else
cat << \!Funky!Stuff! > 'bmov.asm'
;BMOV.ASM
; 4 8 12
;BMOV(src,dest,bytes)
;
xdef _bmov
_bmov
move.l 4(A7),A0 ;source
move.l 8(A7),A1 ;destination
move.l 12(A7),D0 ;bytes
cmp.l A0,A1
beq end ;trivial case
bls dropfwd ;forward copy (dest < src)
add.l D0,A0 ;backward copy (dest > src)
add.l D0,A1
bra dropbck
loopfwd move.b (A0)+,(A1)+
dropfwd dbf.w D0,loopfwd
sub.l #$10000,D0
bpl loopfwd
bra end
loopbck move.b -(A0),-(A1)
dropbck dbf.w D0,loopbck
sub.l #$10000,D0
bpl loopbck
end move.l 8(A7),D0
rts
!Funky!Stuff!
fi # end of overwriting check
echo shar: "extracting 'bset.asm'" '(593 characters)'
if test -f 'bset.asm'
then
echo shar: "will not over-write existing file 'bset.asm'"
else
cat << \!Funky!Stuff! > 'bset.asm'
;BSET.ASM
;BZERO.ASM
;
xdef _bset
xdef _bzero
_bzero
clr.l D1
bra begin
_bset
move.b 15(A7),D1 ;12(A7)-> msb . . lsb (D1.B = data)
begin
move.l 4(A7),A0 ;A0 = pointer to memory
move.l 8(A7),D0 ;D0 = bytes to set
bra drop ;drop into the DBF loop
loop move.b D1,(A0)+
drop dbf.w D0,loop ;remember, only affects lower word
sub.l #$10000,D0 ;for buffers >65535
bpl loop ;branch to loop because D0.W now is FFFF
move.l 4(A7),D0 ;return pointer to buffer start
rts
!Funky!Stuff!
fi # end of overwriting check
exit 0
# End of shell archive
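For testing the assembly against something portable, here is a hypothetical C reference for BMOV and BSET/BZERO (the *_ref names are assumptions, not part of the archive). The argument order follows the stack offsets the assembly reads: BMOV(src, dest, bytes), BSET(ptr, bytes, value), BZERO(ptr, bytes). The return values match what each routine leaves in D0: dest for BMOV, the buffer start for BSET/BZERO. Note that BMOV's argument order is the reverse of memmove()'s, and BSET puts the count before the value, unlike memset().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Overlap-safe copy: forward when dest is below src, backward otherwise. */
void *bmov_ref(const void *src, void *dest, size_t n)
{
    const unsigned char *s = src;
    unsigned char *d = dest;

    assert(n == 0 || (src != NULL && dest != NULL));
    if ((uintptr_t)d == (uintptr_t)s)
        return dest;                     /* trivial case */
    if ((uintptr_t)d < (uintptr_t)s) {
        while (n--)                      /* like move.b (A0)+,(A1)+ */
            *d++ = *s++;
    } else {
        d += n;                          /* start from the ends, */
        s += n;                          /* like move.b -(A0),-(A1) */
        while (n--)
            *--d = *--s;
    }
    return dest;
}

/* Fill n bytes with value; returns the buffer start. */
void *bset_ref(void *ptr, size_t n, unsigned char value)
{
    unsigned char *p = ptr;
    while (n--)
        *p++ = value;
    return ptr;
}

/* BZERO is BSET with a zero fill byte, as in the assembly. */
void *bzero_ref(void *ptr, size_t n)
{
    return bset_ref(ptr, n, 0);
}
```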