[comp.sys.amiga] Blitter / Assembly block clear benchmarks

dillon@CORY.BERKELEY.EDU (Matt Dillon) (05/03/87)

	-Open a 1 bit plane 320x206 low-res screen and use its bitmap
	 plane 0 pointer for the array to clear.
	-Zero N bytes of it using (A) the blitter, and (B) the fastest
	 assembly routine possible (DBcc inner loop using long operations)

	NOTES: Cleared memory is in CHIPMEM.  The assembly routine is only
	very slightly faster when clearing FASTMEM.
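
The assembly routine itself is not shown in the post. As a rough C
rendering of a "long operations in a counted loop" clear (the name
clear_longs is hypothetical, and a compiler will not necessarily emit a
DBcc loop for it):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C analogue of the post's assembly clear: store zero one
 * 32-bit long word at a time.  nbytes is assumed to be a multiple of 4
 * and buf suitably aligned, as a bitplane pointer would be. */
void clear_longs(uint32_t *buf, size_t nbytes)
{
    size_t n;

    assert(nbytes % 4 == 0);
    n = nbytes / 4;             /* number of long words to store */
    while (n--)
        *buf++ = 0;             /* one long-word store per pass */
}
```

The per-pass loop overhead is what keeps this style of clear well below
the bus limit, which is the point the follow-up post picks up on.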

Results:	(throughput based on many iterations of the block set)
	(1) BltClear() w/ flag = 1  (waits for blit to finish before returning)
	(2) BltClear() w/ flag = 0  (doesn't wait for blitter to finish)

			      (MBytes/sec)
		assembly	blitter		blitter
blocksize	throughput	throughput (1)	throughput (2)

8192		1.167 		3.084		3.153
4096		1.158		2.814		2.995
2048		1.146		2.723		2.953
1024		1.121		2.356		2.759
 512		1.054		1.941		2.496
 256		 .958		1.436		2.076
 128		 .806		 .945		1.360
  64		 .617		 .609		 .738
  32		 .412		 .345		 .369
  16		 .252		 .173		 .185
   8		 .140 		 .086		 .092

for comparison, a 68010 can zero a large block @ 2.04 MBytes/sec

---------------------------------------------------------------------
	So it looks like the blitter's setup overhead stops dominating at
around the 64-128 byte block size: at 64 bytes the assembly loop and
BltClear() are roughly even, and for anything larger the blitter clears
memory faster.
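
To make the crossover explicit, here is a small sketch that scans the
table above for the smallest block size at which each blitter column
beats the assembly loop. The struct and function names are made up for
illustration; the numbers are copied from the table.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the benchmark table from the post (MBytes/sec). */
struct row { int blocksize; double cpu, blit_wait, blit_nowait; };

static const struct row results[] = {
    {8192, 1.167, 3.084, 3.153}, {4096, 1.158, 2.814, 2.995},
    {2048, 1.146, 2.723, 2.953}, {1024, 1.121, 2.356, 2.759},
    { 512, 1.054, 1.941, 2.496}, { 256, 0.958, 1.436, 2.076},
    { 128, 0.806, 0.945, 1.360}, {  64, 0.617, 0.609, 0.738},
    {  32, 0.412, 0.345, 0.369}, {  16, 0.252, 0.173, 0.185},
    {   8, 0.140, 0.086, 0.092},
};

/* Smallest block size at which the chosen blitter column beats the CPU
 * loop.  wait = 1 selects column (1), wait = 0 selects column (2).
 * Returns 0 if the blitter never wins. */
int crossover(int wait)
{
    int best = 0;
    size_t i, n = sizeof results / sizeof results[0];

    assert(wait == 0 || wait == 1);
    for (i = 0; i < n; i++) {
        double blit = wait ? results[i].blit_wait : results[i].blit_nowait;
        if (blit > results[i].cpu)
            best = results[i].blocksize;   /* rows run large to small */
    }
    return best;
}
```

With these figures the waiting variant first wins at 128 bytes and the
non-waiting variant at 64 bytes, which matches the 64-128 byte range
quoted above.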

	There are several considerations when using BltClear() to clear 
memory:

	(1) You can only clear CHIP memory
	(2) You can concurrently do a blitter clear and calculations.

	That's right folks, you can give BltClear()'s flag = 0 and the
call will proceed as follows:
		-Wait for any previous blits to finish
		-Start our clear-memory blit
		-return immediately

	Which means that if you are updating a double buffered display, you
can start the clear operation on one of the buffers, and while the blitter
is working on that you can calculate the points list (or whatever ... say
you were doing 3D animation).  You don't have to worry about starting
your line drawing before the blitter finishes the clear because the line
drawing uses the blitter (and will automatically wait for it to become idle).
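
The ordering described above can be sketched as follows. This is a
host-side mock, not Amiga code: start_clear stands in for a BltClear()
call with flag = 0, and compute_points, plot_points and render_frame are
hypothetical names. On real hardware the clear would still be running
during step 2; here memset simply finishes at once.

```c
#include <assert.h>
#include <string.h>

#define PLANE_BYTES 8000    /* 320 x 200 x 1 bitplane */

static unsigned char buffers[2][PLANE_BYTES];  /* stand-ins for CHIP planes */

/* Stand-in for BltClear(plane, bytes, 0): the real call returns while
 * the blitter is still clearing. */
static void start_clear(unsigned char *plane, size_t bytes)
{
    assert(bytes <= PLANE_BYTES);
    memset(plane, 0, bytes);
}

/* Stand-in for the CPU work that overlaps the clear, e.g. transforming
 * a points list.  Produces n x-coordinates in the range 0..319. */
static int compute_points(int frame, int *xs, int n)
{
    int i;
    for (i = 0; i < n; i++)
        xs[i] = (frame * 7 + i * 13) % 320;
    return n;
}

/* Stand-in for blitter drawing, which on the Amiga waits for the
 * blitter to go idle.  Sets one pixel per point on row 0. */
static void plot_points(unsigned char *plane, const int *xs, int n)
{
    int i;
    for (i = 0; i < n; i++)
        plane[xs[i] / 8] |= (unsigned char)(0x80 >> (xs[i] % 8));
}

/* Render one frame into the back buffer and return its index. */
int render_frame(int frame)
{
    int xs[4];
    int back = frame & 1;
    int n;

    start_clear(buffers[back], PLANE_BYTES);  /* 1. kick off the clear   */
    n = compute_points(frame, xs, 4);         /* 2. CPU work overlaps it */
    plot_points(buffers[back], xs, n);        /* 3. drawing comes last   */
    return back;
}
```

The only thing the sketch is meant to show is the order of the three
steps; the time saved comes from step 2 running while the blitter is
busy.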

	You save a whole 2.6 ms!! for a 320x200x1 clear.   Hmmm... that isn't
very much.  If we were wild and crazy and used a 640x400x2 double buffered
display, we could save about 20 ms! by running the BltClear() concurrently.
Still not all that much.  The BltClear() is just too fast... hey, C-A, why
don't you slow it down... just kidding.
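
The savings figures can be checked with a little arithmetic. The sketch
below assumes the 8192-byte throughput from column (1) of the table
(3.084 MBytes/sec) and 1 MByte = 10^6 bytes; the post does not say which
definition of MByte its table uses, so treat the results as approximate.
The function name clear_ms is made up for illustration.

```c
#include <assert.h>

/* Milliseconds needed to clear a width x height x depth display at the
 * given throughput.  Assumes 1 MByte = 10^6 bytes and one bit per pixel
 * per plane. */
double clear_ms(long width, long height, long depth, double mbytes_per_sec)
{
    long bytes;

    assert(width % 8 == 0);
    assert(mbytes_per_sec > 0.0);
    bytes = width / 8 * height * depth;
    return bytes / (mbytes_per_sec * 1e6) * 1e3;
}
```

Under those assumptions clear_ms(320, 200, 1, 3.084) comes to roughly
2.6 ms and clear_ms(640, 400, 2, 3.084) to roughly 20.8 ms, in line with
the numbers quoted above.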

				-Matt

rokicki@rocky.UUCP (05/05/87)

dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
> for comparison, a 68010 can zero a large block @ 2.04 MBytes/sec

Actually, it's possible to clear memory pretty quickly with a
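bare bones 68000.  For block sizes larger than 256 bytes, you can do
something like this (after setup: d1-d7/a1-a6 zeroed, a0 pointing just
past the end of the block, d0 holding the number of 256-byte chunks
minus one):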

loop:
	movem.l	d1-d7/a1-a6,-(a0)	; 116
	movem.l	d1-d7/a1-a6,-(a0)	; 116
	movem.l	d1-d7/a1-a6,-(a0)	; 116
	movem.l	d1-d7/a1-a6,-(a0)	; 116
	movem.l	d1-d7/a1-a5,-(a0)	; 108
	dbra	d0,loop			;  10
					_____
					  582 cycles to clear 256 bytes

On a 7.18 MHz processor, that yields a throughput of

	( 7.18 * 10^6 * 256 ) / ( 582 * 1048576 )

or 3.01 MBytes per second.  (Gotta run it to be sure.)
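
The same arithmetic as a small sketch, using the per-instruction cycle
counts exactly as quoted in the listing above (they are not verified
here against the 68000 timing tables). Function names are hypothetical.

```c
#include <assert.h>

#define CLOCK_HZ 7.18e6         /* clock rate used in the post */
#define MBYTE    1048576.0      /* the post divides by 2^20 */

/* Cycles for one pass of the unrolled loop: four 13-register movem.l,
 * one 12-register movem.l, and the dbra. */
int loop_cycles(void)
{
    return 4 * 116 + 108 + 10;
}

/* Bytes stored per pass: each register pushed is one 4-byte long word. */
int loop_bytes(void)
{
    return (4 * 13 + 12) * 4;
}

/* Throughput in MBytes/sec for a loop moving `bytes` in `cycles`. */
double throughput_mbytes(int bytes, int cycles)
{
    assert(cycles > 0);
    return CLOCK_HZ * bytes / (cycles * MBYTE);
}
```

loop_cycles() gives 582, loop_bytes() gives 256, and
throughput_mbytes(256, 582) works out to about 3.01.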

Of course, the bandwidth of the 68000 is:

	( 7.18 * 10^6 ) / ( 2 * 1048576 )

or 3.42 MBytes per second, so we are not too far off.
The blitter can function at twice that:

	( 7.18 * 10^6 ) / 1048576

or 6.85 MBytes per second, if you turn the display off and
get rid of all of the other overhead.
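
Both ceilings follow from the same formula: bytes moved per bus access
divided by clock cycles per access. The sketch below assumes a 16-bit
(2-byte) access every 4 clocks for the 68000, and one every 2 clocks for
the blitter; the second figure is inferred from the formula above rather
than taken from hardware documentation. The name peak_mbytes is made up.

```c
#include <assert.h>

#define MBYTE 1048576.0         /* the post divides by 2^20 */

/* Peak transfer rate in MBytes/sec for a bus master that moves
 * bytes_per_access bytes every cycles_per_access clock cycles. */
double peak_mbytes(double clock_hz, int bytes_per_access,
                   int cycles_per_access)
{
    assert(cycles_per_access > 0);
    return clock_hz * bytes_per_access / (cycles_per_access * MBYTE);
}
```

peak_mbytes(7.18e6, 2, 4) gives about 3.42 for the 68000 and
peak_mbytes(7.18e6, 2, 2) about 6.85 for the blitter. The movem loop's
3.01 MBytes/sec is therefore close to 88% of the 68000 ceiling.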

Note that the 68000 calculations above assume the 68000 sees no
contention for the bus.  Actually, the 68000 *does* see a small
amount of contention, even with only a high-res 2 bit plane screen
like the workbench screen, but it's not much.

Now, what are you going to do with memory you clear so fast?

dillon@CORY.BERKELEY.EDU.UUCP (05/05/87)

:dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
:> for comparison, a 68010 can zero a large block @ 2.04 MBytes/sec
:
:Actually, it's possible to clear memory pretty quickly with a
:loop:
:	movem.l	d1-d7/a1-a6,-(a0)	; 116
: etc.

	Ahhh... I didn't think of that!  Marvelous!  Well worth the register
saving required (at least for large block sizes)...

				-Matt