[comp.sys.amiga.tech] How do I blit this?

bevis@ee.ecn.purdue.edu (Jeff Bevis) (01/07/91)

Maybe it's just me, but I'm having a heck of a time trying to find a way to
perform a specific blit operation.  I'm accessing the hardware directly
for my purpose.  The problem goes something like this:

I've got a 320x200 bitplane, and I want to blit out a 16x16 bit rectangle.
I want it to go into a buffer area the size of 16 words.  Sounds fine.  It's
easy if the source rectangle is on a word boundary, too.  But supposing the
source rectangle is shifted off the word boundary, I'll have to use the bit
shifter in the blitter.  No problem there.  Let me draw a picture of the
hypothetical source:

0123456789abcdef 0123456789abcdef
......xxxxxxxxxx xxxxxx..........

Here, the 16-bit wide block is shifted right by 6 bits.  So, I see that if I
blit two words per line, with a shift of 10, the blitter will align the source
information in the second word it reads on each line.  The two words the
blitter copies into the destination (for each line) will look like this:

0123456789abcdef 0123456789abcdef
cccccccccc...... xxxxxxxxxxxxxxxx

(the c's are carried from the previous line;  they could be masked out)
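To see where the junk first word comes from, here is a small C model of the B-channel shifter as described above. It is a simplification of the visible behaviour, not register-level code, and `shift_line` and its parameters are made-up names. Each output word takes the low `shift` bits of the previously fetched word as its high bits, and that carry is not cleared between lines, which is where the c's come from.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the blitter B-channel barrel shifter in ascending
 * mode (hypothetical helper, not real hardware logic). Each output word is
 * the low `shift` bits of the previously fetched word, followed by the
 * current word shifted right by `shift`. *carry holds the previously
 * fetched word and is NOT reset between lines. */
static void shift_line(const uint16_t *src, uint16_t *dst, int words,
                       unsigned shift, uint16_t *carry)
{
    assert(shift < 16);
    for (int i = 0; i < words; i++) {
        uint16_t cur = src[i];
        if (shift == 0)
            dst[i] = cur;
        else
            dst[i] = (uint16_t)(((uint32_t)*carry << (16 - shift)) |
                                (cur >> shift));
        *carry = cur;               /* carried into the next output word */
    }
}
```

With the source line {0x03FF, 0xFC00} (pixels 6..21 set) and a shift of 10, the second output word comes out as 0xFFFF, while the first holds only the carried-in bits plus the dots from pixels 0..5.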

The question is, how do I get the blitter to not WRITE the first word on
each line?  For a two-word wide blit, I'll need a two-word wide place to put
the result.  I only want to use my 1-word by 16-line storage for the result.
The first word is really junk that I don't need or want.  According to what
I see, I need to reserve space for the unused portion of the blit.

If someone could tell me what the OS does to accomplish this (and it
apparently does), or if you've got a clue as to what I'm doing wrong, please
email me asap!  

Thanks!
-------------------------------------------------------------------------------
    "Three is never equal to four, except for very large values of three."
-------------------------------------------------------------------------------
Jeff Bevis		     Purdue University School of Electrical Engineering
bevis@en.ecn.purdue.edu	  	   	       Give me Amiga or nothing at all.
-------------------------------------------------------------------------------

rokicki@Neon.Stanford.EDU (Tomas G. Rokicki) (01/09/91)

>I've got a 320x200 bitplane, and I want to blit out a 16x16 bit rectangle.
>I want it to go into a buffer area the size of 16 words.  Sounds fine.  It's
>easy if the source rectangle is on a word boundary, too.  But supposing the
>source rectangle is shifted off the word boundary, I'll have to use the bit
>shifter in the blitter.  No problem there.  Let me draw a picture of the

>The question is, how do I get the blitter to not WRITE the first word on
>each line?  For a two-word wide blit, I'll need a two-word wide place to put
>the result.  I only want to use my 1-word by 16-line storage for the result.

The simplest way is to work from the bottom up, starting at the last line
and using negative modulos to step back one line at a time.  With this, you
will still write the first word of each line, but it will be overwritten
later with the correct value.  (Note that you do not use descending mode,
just negative modulos.)
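A toy model of the bottom-up idea, assuming a plain copy (D = B), a blit two words wide and 16 lines deep. `bottom_up_copy`, `junk` and `good` are hypothetical names standing in for the two shifted words of each line. Within a line the blitter still writes at ascending addresses; a destination modulo of -3 words (-6 bytes) then moves back to the line above. Nothing is read back through C, so the write pipeline should not matter here.

```c
#include <assert.h>
#include <stdint.h>

#define LINES 16
#define WIDTH 2   /* words per blit line */

/* Hypothetical model of the bottom-up, negative-modulo copy.
 * mem[0] is one spare word before the 16-word buffer mem[1..16];
 * the wanted word for row r belongs in mem[1 + r]. */
static void bottom_up_copy(uint16_t mem[LINES + 1],
                           const uint16_t junk[LINES],
                           const uint16_t good[LINES])
{
    int d = LINES - 1;               /* start one word before the last slot */
    const int dmod = -(WIDTH + 1);   /* -3 words = -6 bytes */
    for (int row = LINES - 1; row >= 0; row--) {
        assert(d >= 0 && d + WIDTH - 1 <= LINES);
        mem[d++] = junk[row];        /* first word: junk, lands on row-1's slot */
        mem[d++] = good[row];        /* second word: the data we want */
        d += dmod;                   /* back up to the line above */
    }
}
```

In this model each junk word is overwritten by the next line processed, except for the top line, whose junk word still lands in mem[0], just before the buffer. So the sketch suggests keeping one spare word there.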

Otherwise, you can set something up like

   D = C . ~A + B . A

and use the masking capabilities of the blitter to restore whatever was
there originally in the destination.  The only difficulty with this is
that the blitter is pipelined.  Say your destination is short a[];
the blitter will:

First `line' (two words):

fetch a[-1], compute new a[-1], store nothing
fetch a[0], compute new a[0], store a[-1]

Next `line' (two words)

fetch *old* a[0], compute `new' incorrect a[0], store pipelined a[0]
fetch a[1], compute new a[1], store new incorrect a[0].

I don't think the blitter flushes the pipeline at the end of each line.
If it did, this wouldn't be a problem.  But using negative modulos can
do magic . . .
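The trace above can be replayed with a toy model. It assumes C and D both start at a[-1] (here mem[0]), two words per line stepping back one word per line, BLTAFWM = $0000 and BLTALWM = $FFFF, and that each result is stored only after the next word's C operand has been fetched. `masked_blit` and its flag are made-up names, and this is not a cycle-exact description of the hardware.

```c
#include <assert.h>
#include <stdint.h>

#define LINES 16

/* Toy model of a 2-word-wide blit with D = C&~A | B&A.
 * mem[0] is a[-1]; mem[1..16] is the 16-word buffer a[0..15].
 * With `pipelined` set, each result is stored one step late, i.e. after
 * the next word's C operand has already been read. */
static void masked_blit(uint16_t mem[LINES + 1], const uint16_t src[LINES],
                        int pipelined)
{
    int have_pending = 0, pending_addr = 0;
    uint16_t pending_val = 0;

    for (int row = 0; row < LINES; row++) {
        for (int w = 0; w < 2; w++) {
            int addr = row + w;                        /* C and D address */
            assert(addr >= 0 && addr <= LINES);
            uint16_t a = (w == 0) ? 0x0000 : 0xFFFF;   /* first/last word masks */
            uint16_t b = (w == 0) ? 0x0000 : src[row]; /* first word: don't care */
            uint16_t c = mem[addr];                    /* fetch C */
            uint16_t d = (uint16_t)((c & ~a) | (b & a));

            if (!pipelined) {
                mem[addr] = d;                         /* ideal: store at once */
                continue;
            }
            if (have_pending)
                mem[pending_addr] = pending_val;       /* store previous result */
            pending_addr = addr;
            pending_val = d;
            have_pending = 1;
        }
    }
    if (pipelined && have_pending)
        mem[pending_addr] = pending_val;               /* final store */
}
```

Without the pipeline every buffer word receives its source word. With it, the first word of each line reads the old value before the new one has been stored and writes it back, so a[0..14] end up holding their original contents and only the last word changes.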

-tom

ccplumb@rose.uwaterloo.ca (Colin Plumb) (01/11/91)

In article <9101061957.AA20737@en.ecn.purdue.edu> bevis@ee.ecn.purdue.edu (Jeff Bevis) writes:
>Maybe it's just me, but I'm having a heck of a time trying to find a way to
>perform a specific blit operation.  I'm accessing the hardware directly
>for my purpose.  The problem goes something like this:
>
>I've got a 320x200 bitplane, and I want to blit out a 16x16 bit rectangle.
>I want it to go into a buffer area the size of 16 words.  Sounds fine.  It's
>easy if the source rectangle is on a word boundary, too.  But supposing the
>source rectangle is shifted off the word boundary, I'll have to use the bit
>shifter in the blitter.  No problem there.  Let me draw a picture of the
>hypothetical source:
>
>0123456789abcdef 0123456789abcdef
>......xxxxxxxxxx xxxxxx..........
>
>Here, the 16-bit wide block is shifted right by 6 bits.  So, I see that if I
>blit two words per line, with a shift of 10, the blitter will align the source
>information in the second word it reads on each line.  The two words the
>blitter copies into the destination (for each line) will look like this:
>
>0123456789abcdef 0123456789abcdef
>cccccccccc...... xxxxxxxxxxxxxxxx
>
>(the c's are carried from the previous line;  they could be masked out)
>
>The question is, how do I get the blitter to not WRITE the first word on
>each line?  For a two-word wide blit, I'll need a two-word wide place to put
>the result.  I only want to use my 1-word by 16-line storage for the result.
>The first word is really junk that I don't need or want.  According to what
>I see, I need to reserve space for the unused portion of the blit.

Well, I can get the blitter to read the first word and write it back unchanged.
This isn't quite as good as not touching it (there is a noticeable delay
between the read and the write, so unless the word is protected during the
blit there is a small chance that the processor will change it, only to have
its original value stomped back in -> BUG), but it's close.

What we need to use is the ubiquitous cookie-cutter function, D = A*B + !A*C.
C is the destination, B is the source, and A is the mask.  Since the mask
is just a fixed-length row of bits, we don't need to load it from memory
and can just write $FFFF to BLTADAT (which will never be overwritten if we
don't enable DMA for channel A).  Program a shift of 0, a $0000 BLTAFWM and
a $FFFF BLTALWM, and then load $FFFF into BLTADAT (the internal latch is
*after* the shifter, so programming the shifter after writing doesn't do
anything).

For source B, use the appropriate shift (10 bits, in your example) and modulo
(320 bits = 40 bytes per row, less the 4 bytes read on each line, is 36 bytes).

Source C and destination D should point to the word before the 16-word
buffer, with a modulo of -2 (-1 word).
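The two modulo values come from the same arithmetic: the blitter adds the modulo (in bytes) to a channel's pointer at the end of each line, so it is the distance between successive lines minus the bytes the blit consumed. A minimal sketch; `blit_modulo` is a hypothetical helper, not an OS call. For B a 320-pixel row is 40 bytes; for C and D successive lines start just one word (2 bytes) apart.

```c
#include <assert.h>

/* Hypothetical helper: blitter modulo in bytes.
 * line_stride_bytes: distance between the starts of successive lines
 *                    for this channel.
 * width_words:       blit width in words (each word is 2 bytes). */
static int blit_modulo(int line_stride_bytes, int width_words)
{
    assert(width_words > 0);
    return line_stride_bytes - 2 * width_words;
}
```

So `blit_modulo(320 / 8, 2)` gives 36 for source B, and `blit_modulo(2, 2)` gives the -2 used for C and D.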

Then program the minterm, set the DMA control to USEB, USEC and USED, and
start a blit two words wide and 16 deep.  The first word of the first line
will be fetched from B and shifted down so that it holds no significant
bits.  The bits which are one in ($FFFF (BLTADAT) & $0000 (BLTAFWM)) are
none of them, so no bits of source C are replaced with the corresponding
bits of source B, and the result (the word originally read from C) is
written back to D.
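The blitter's logic function is an 8-bit value in the low byte of BLTCON0, where bit number (A<<2 | B<<1 | C) says what D is for that combination of input bits. This sketch builds that byte from D = A*B + !A*C and applies it bit by bit, showing why a $0000 mask leaves C alone and a $FFFF mask passes B through. `make_lf`, `cookie_cut` and `apply_lf` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the 8-bit logic-function byte: bit (a<<2 | b<<1 | c) is set
 * when f(a, b, c) is true. */
static uint8_t make_lf(int (*f)(int a, int b, int c))
{
    assert(f != NULL);
    uint8_t lf = 0;
    for (int i = 0; i < 8; i++) {
        int a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
        if (f(a, b, c))
            lf |= (uint8_t)(1u << i);
    }
    return lf;
}

/* Cookie cutter: A is the mask, B the source, C the old destination. */
static int cookie_cut(int a, int b, int c)
{
    return (a && b) || (!a && c);
}

/* Apply a logic-function byte to three 16-bit words, one bit at a time. */
static uint16_t apply_lf(uint8_t lf, uint16_t a, uint16_t b, uint16_t c)
{
    uint16_t d = 0;
    for (int bit = 0; bit < 16; bit++) {
        int idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) |
                  ((c >> bit) & 1);
        if ((lf >> idx) & 1)
            d |= (uint16_t)(1u << bit);
    }
    return d;
}
```

`make_lf(cookie_cut)` yields $CA, the usual cookie-cut value. With A = $0000 (first word) the result equals C; with A = $FFFF (last word) it equals B.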

Ka-boom, I just realized what's wrong.  The blitter has an internal pipeline,
so the result of one operation isn't written until the data for the next one
has been read.  When it starts a new line, the first word it fetches through
C is the word whose new value (computed as the second word of the previous
line) is still waiting to be written.  So it reads the stale value, passes it
through unchanged, and writes it back over the new one.  The upshot is that
only the last word will be changed.

You have to turn this whole thing around, going to descending mode, to
avoid the problem.  The same basic approach works, though.

HOWEVER, I'd like to suggest that doing it with the processor (load 32
bits, shift, and store 16) would be faster than setting up the blitter
and waiting for it to finish.  In C,

#include <exec/types.h>

void blit16(char *srcmap, USHORT *dstptr, USHORT x, USHORT y)
{
	register ULONG *baseaddr;
	register short shiftamt;
#define ROWSIZE  40		/* bytes per 320-pixel row; RASSIZE() would be better */
#define ROWLONGS (ROWSIZE/4)	/* the same row stride counted in ULONGs */

	/* 32 bits starting at the (even) word that holds pixel x */
	baseaddr = (ULONG *)(srcmap + ROWSIZE*y + ((x >> 3) & ~1));
	/* pixels x..x+15 sit 16-(x&15) bits above the bottom of that long */
	shiftamt = 16 - (x & 15);
	*dstptr++ = (USHORT)(baseaddr[ 0*ROWLONGS] >> shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 1*ROWLONGS] >> shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 2*ROWLONGS] >> shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 3*ROWLONGS] >> shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 4*ROWLONGS] >> shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 5*ROWLONGS] >> shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 6*ROWLONGS] >> shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 7*ROWLONGS] >> shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 8*ROWLONGS] >> shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 9*ROWLONGS] >> shiftamt);
	*dstptr++ = (USHORT)(baseaddr[10*ROWLONGS] >> shiftamt);
	*dstptr++ = (USHORT)(baseaddr[11*ROWLONGS] >> shiftamt);
	*dstptr++ = (USHORT)(baseaddr[12*ROWLONGS] >> shiftamt);
	*dstptr++ = (USHORT)(baseaddr[13*ROWLONGS] >> shiftamt);
	*dstptr++ = (USHORT)(baseaddr[14*ROWLONGS] >> shiftamt);
	*dstptr   = (USHORT)(baseaddr[15*ROWLONGS] >> shiftamt);
}

It was written this way to make the assembler code pretty obvious
so even a stupid compiler should manage to make it optimal.
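The load-32/shift/store-16 idea can be checked off the Amiga with a portable sketch. A PC is little-endian, so this version assembles each longword from bytes in 68000 (big-endian) order instead of casting to ULONG *. `grab16`, `read_be32` and `ROWBYTES` are made-up names. The shift is 16 minus the bit offset within the word, which gives the shift of 10 for the 6-bit offset in the original question.

```c
#include <assert.h>
#include <stdint.h>

#define ROWBYTES 40   /* 320-pixel plane, 8 pixels per byte */

/* Read a big-endian 32-bit value, as a 68000 would see it in memory. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Copy the 16x16 block whose top-left pixel is (x, y) into dst[0..15].
 * The long read starts at the word holding pixel x, so for x >= 304 it
 * touches the 2 bytes just past the end of each row (and past the end of
 * the plane on its last row); the caller must allow for that. */
static void grab16(const uint8_t *plane, uint16_t dst[16],
                   unsigned x, unsigned y)
{
    assert(x + 16 <= ROWBYTES * 8);
    const uint8_t *base = plane + ROWBYTES * y + ((x >> 3) & ~1u);
    unsigned shift = 16 - (x & 15);          /* 1..16 */
    for (int row = 0; row < 16; row++)
        dst[row] = (uint16_t)(read_be32(base + ROWBYTES * row) >> shift);
}
```

With a solid 16x16 block drawn at (6, 3), `grab16(plane, out, 6, 3)` fills `out` with 0xFFFF, while grabbing at (0, 3) returns 0x03FF in each row.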
-- 
	-Colin