[comp.compression] Fast 68000 copy routine.

ross@spam.ua.oz.au (Ross Williams) (06/19/91)

>In article <845@spam.ua.oz> ross@spam.ua.oz.au (Ross Williams) writes:
>>Compressor heads who are using the 68000 version of LZRW1 or who want
>>a 68000 fast memory block copy (using unrolled loops and so forth) may
>>be interested in my fast block memory routine written in 68000 machine code.
>
>Isn't this called "re-inventing the wheel"?

Yes - sorry.

>How is your code better than the other 97 implementations of
>"fast block copy"?  It's been about 10 years since the idea of
>using MOVEM.L in an unrolled loop was invented; have you compared
>your implementation to the standard routines?

No, I haven't. Worse still, I didn't spot the MOVEM.L instruction! I
just used MOVE.L/.W/.B in an unrolled loop! I calculate that mine will
be approximately 13% slower than one that uses MOVEM.L. However, my
routine will quickly move blocks that are not word-aligned relative to
each other (one starts at an odd address, the other at an even one), if
that is any help!
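
For readers who would rather see the shape of the idea in C than in
68000 code, here is a minimal sketch. It is not the posted routine: the
names sketch_copy and move_long are hypothetical, the unroll factor of
four is arbitrary, and a fixed 4-byte memcpy stands in for a single
MOVE.L so that the C stays portable. It assumes the two blocks do not
overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for one MOVE.L (A0)+,(A1)+: copy 4 bytes, advance both pointers.
   The fixed-size memcpy is typically compiled to a single load and store. */
static void move_long(unsigned char **d, const unsigned char **s)
{
    memcpy(*d, *s, 4);
    *d += 4;
    *s += 4;
}

/* Hypothetical sketch: copy n bytes between non-overlapping blocks. */
void sketch_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    if ((((uintptr_t)d ^ (uintptr_t)s) & 1u) != 0) {
        /* Relatively misaligned: whenever one pointer is even the other
           is odd, so word or long moves would fault on a 68000.
           Fall back to unrolled byte moves. */
        while (n >= 4) {
            d[0] = s[0];
            d[1] = s[1];
            d[2] = s[2];
            d[3] = s[3];
            d += 4;
            s += 4;
            n -= 4;
        }
    } else {
        /* Same parity: one leading byte puts both pointers on an even
           address, after which long moves are safe. */
        if (((uintptr_t)s & 1u) != 0 && n > 0) {
            *d++ = *s++;
            n--;
        }
        /* Unrolled body: four long moves (16 bytes) per iteration. */
        while (n >= 16) {
            move_long(&d, &s);
            move_long(&d, &s);
            move_long(&d, &s);
            move_long(&d, &s);
            n -= 16;
        }
    }

    /* Whatever is left over, byte by byte. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

The only decision that matters is the parity test at the top: if the
source and destination addresses differ in their low bit, no amount of
leading byte copies will ever make both even at once, so that case gets
its own byte-wide unrolled loop.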

The main reason that I posted the code was that it is an important
part of the 68000 assembler implementation of my LZRW1 algorithm.
Those using the algorithm up to now have had to find their own copy
routine. Now there is an "official" one for the algorithm.

I wrote the code in the first place because the movmem routine in my
Macintosh Lightspeed C compiler's library was slow. I assumed that if
fast routines were commonplace then the library of this fairly
mainstream compiler would have used one.

Ross Williams
ross@spam.ua.oz.au

basker@diku.dk (Tom Thuneby) (06/21/91)

ross@spam.ua.oz.au (Ross Williams) writes:

>>How is your code better than the other 97 implementations of
>>"fast block copy"?  It's been about 10 years since the idea of
>>using MOVEM.L in an unrolled loop was invented; have you compared
>>your implementation to the standard routines?

I'm not a skilled M68000 programmer, but anyway:

If you are only considering MC68000, then read no further. Otherwise:
I believe the MC68010 (and possibly '20 & up) have a special 'loop-mode'
for these cases: when a one-word move instruction is followed by a
decrement-and-branch instruction (branching to the move), the MC68010
will enter loop-mode. No instructions are fetched; only data transfers
take place. The instructions are still executed, but they reside in the
decode register and the prefetch queue, and the processor only has to
fetch them twice (not once, don't ask me why).

I don't have the instruction timings here, but it might outdo the MOVEM.L
routines, especially on smallish blocks.
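
To make the loop shape concrete, here is a small C model of a move
instruction followed by a decrement-and-branch. It is only a sketch:
the function name dbra_copy_longs and the register choices in the
comment are illustrative, and C cannot show the loop mode itself
(that is a property of how the 68010 fetches instructions), only the
control flow that would trigger it. The one detail worth modelling is
that DBRA counts in a 16-bit register, so a large block needs an outer
loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Loop being modelled (registers are illustrative):
 *
 *     loop:  move.l  (a0)+,(a1)+
 *            dbra    d0,loop
 *
 * DBRA decrements the low word of D0 and branches back until the
 * result is -1, so the body runs (D0 + 1) times: at most 65536
 * iterations per inner loop.
 */
void dbra_copy_longs(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(count == 0 || (dst != NULL && src != NULL));

    while (count > 0) {
        /* Outer loop: split the job into chunks a 16-bit counter can hold. */
        size_t chunk = count > 65536u ? 65536u : count;
        uint16_t d0 = (uint16_t)(chunk - 1);   /* DBRA wants count - 1 */
        count -= chunk;

        do {
            *dst++ = *src++;                   /* move.l (a0)+,(a1)+ */
        } while (d0-- != 0);                   /* dbra d0,loop       */
    }
}
```

Because the inner loop is just two instructions, setup cost is tiny,
which fits the suggestion that this approach might pay off on smallish
blocks.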

This is probably not the appropriate group for this, so please mail me
any follow-ups (I probably don't subscribe to an appropriate group :).

		Tom Thuneby (basker@diku.dk)