dillon@CORY.BERKELEY.EDU (Matt Dillon) (05/03/87)
-Open a 1-bitplane 320x206 low-res screen and use its bitmap plane 0 pointer as the array to clear.
-Zero N bytes of it using (A) the blitter and (B) the fastest assembly routine possible (DBcc inner loop using long operations).

NOTES: Cleared memory is in CHIPMEM. The assembly routine is only very slightly faster when clearing FASTMEM.

Results (throughput based on many iterations of the block set):

(1) BltClear() w/ flag = 1 (waits for the blit to finish before returning)
(2) BltClear() w/ flag = 0 (doesn't wait for the blitter to finish)

                      throughput (MBytes/sec)
blocksize     assembly    blitter (1)    blitter (2)
(bytes)
  8192         1.167        3.084          3.153
  4096         1.158        2.814          2.995
  2048         1.146        2.723          2.953
  1024         1.121        2.356          2.759
   512         1.054        1.941          2.496
   256          .958        1.436          2.076
   128          .806         .945          1.360
    64          .617         .609           .738
    32          .412         .345           .369
    16          .252         .173           .185
     8          .140         .086           .092

for comparison, a 68010 can zero a large block @ 2.04 MBytes/sec
---------------------------------------------------------------------
So it looks like the blitter setup overhead starts to matter at around the 64-128 byte block size: the blitter still beats the assembly loop at 128-byte blocks with flag = 1, and at 64-byte blocks with flag = 0. Above that size the blitter clears memory faster; below it, the assembly loop wins.

There are several considerations when using BltClear() to clear memory:

(1) You can only clear CHIP memory.
(2) You can concurrently do a blitter clear and calculations.

That's right folks, you can give BltClear() flag = 0 and the call will proceed as follows:

 -Wait for any previous blits to finish
 -Start our clear-memory blit
 -Return immediately

This means that if you are updating a double-buffered display, you can start the clear operation on one of the buffers, and while the blitter is working on that you can calculate the points list (or whatever ... say you were doing 3D animation). You don't have to worry about starting your line drawing before the blitter finishes the clear, because the line drawing uses the blitter (and will automatically wait for it to become idle). You save a whole 2.6 ms!! for a 320x200x1 clear.
Hmmm... that isn't very much. If we were wild and crazy and used a 640x400x2 double buffered display, we could save about 20 mS! by running the BltClear() concurrently. Still not all that much. The BltClear() is just too fast... hey, C-A, why don't you slow it down... just kidding. -Matt
rokicki@rocky.UUCP (05/05/87)
dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
> for comparison, a 68010 can zero a large block @ 2.04 MBytes/sec

Actually, it's possible to clear memory pretty quickly with a bare-bones 68000. For block sizes larger than 256 bytes, you can do something like this (after setup):

; assumed setup: d1-d7/a1-a6 = 0, a0 = end of block,
;                d0 = (block size / 256) - 1
loop:
        movem.l d1-d7/a1-a6,-(a0)   ; 116
        movem.l d1-d7/a1-a6,-(a0)   ; 116
        movem.l d1-d7/a1-a6,-(a0)   ; 116
        movem.l d1-d7/a1-a6,-(a0)   ; 116
        movem.l d1-d7/a1-a5,-(a0)   ; 108
        dbra    d0,loop             ; 10
                                    _____
                                      582 cycles to clear 256 bytes

On a 7.18 MHz processor, that yields a throughput of

    (7.18 * 10^6 * 256) / (582 * 1048576) = 3.01 MBytes per second.

(Gotta run it to be sure.) Of course, the bandwidth of the 68000 is

    (7.18 * 10^6) / (2 * 1048576) = 3.42 MBytes per second,

so we are not too far off. The blitter can function at twice that,

    (7.18 * 10^6) / 1048576 = 6.85 MBytes per second,

if you turn the display off and get rid of all the other overhead.

Note that the 68000 calculations above assume no bus contention for the 68000. Actually, the 68000 *does* see a small amount of contention, even with only a high-res 2-bitplane screen like the Workbench screen, but it's not much.

Now, what are you going to do with memory you clear so fast?
dillon@CORY.BERKELEY.EDU.UUCP (05/05/87)
:dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
:> for comparison, a 68010 can zero a large block @ 2.04 MBytes/sec
:
:Actually, it's possible to clear memory pretty quickly with a
:loop:
: movem.l d1-d7/a1-a6,-(a0) ; 116
: etc.
Ahhh... I didn't think of that! Marvelous! Well worth the register
saving it requires (at least for large block sizes)...
-Matt