[comp.sys.m68k] 68030 usage questions

meulenbr@cst.prl.philips.nl (Frans Meulenbroeks) (06/11/91)

Hi,

I'm moving software from a 68000 system to a 68030 system. 
While doing so two questions came up.
On the 68000 my longs and pointers (in C) are word aligned.
Would it boost performance if they were longword aligned?
Significantly??

The code to be moved also contains a copy loop in assembler to copy
substantial chunks of data (512 bytes).
Since this copy operation is done very often we tried to make
it as fast as possible. The solution used was to 
dump all registers (except a7) on the stack, load the src address
in a0 and the destination address in a1, and then copy by
filling the registers with movem.l and writing them to the other
memory area with another movem.l. This is done as often as needed
(not in a loop, it is inline).
At the end the old registers are restored.
The advantage is that there are few opcode fetches, so a lot of copying is
done with little overhead.
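
The burst idea described above can be sketched in C as follows. This is only an illustration of the load-many/store-many pattern, not the poster's assembler: the burst size of 8 longs and the name copy_bursts are assumptions (the post does not give the exact register list), and the original unrolls the bursts inline instead of looping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed burst size; the post does not state how many registers are used. */
#define BURST_LONGS 8u

/* Copy n_longs 32-bit words from src to dst, BURST_LONGS at a time.
 * n_longs must be a multiple of BURST_LONGS. */
void copy_bursts(uint32_t *dst, const uint32_t *src, size_t n_longs)
{
    uint32_t r[BURST_LONGS];   /* stands in for the CPU registers */
    size_t i, k;

    assert(n_longs % BURST_LONGS == 0);

    for (i = 0; i < n_longs; i += BURST_LONGS) {
        for (k = 0; k < BURST_LONGS; k++)   /* like movem.l (a0)+,<regs> */
            r[k] = src[i + k];
        for (k = 0; k < BURST_LONGS; k++)   /* like movem.l <regs>,offset(a1) */
            dst[i + k] = r[k];
    }
}
```

A 512-byte block is 128 longs, so copy_bursts(dst, src, 128) performs 16 bursts with this burst size.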

However, I was wondering if this can be done faster on the 68030.
I could use a dbf loop here and copy a long at a time. Would this
be faster than my 68000 movem solution? I don't know the cost of
a move.l/dbf loop when it is in the cache, and the part describing
timing is not the most readable part of the 030 manual. 
Does anyone have an idea which alternative is better? Or is there even
a better solution??

Thanks!
--
Frans Meulenbroeks        (meulenbr@prl.philips.nl)
	Centre for Software Technology

k2@bl.physik.tu-muenchen.de (Klaus Steinberger) (06/12/91)

meulenbr@cst.prl.philips.nl (Frans Meulenbroeks) writes:

>Hi,

>I'm moving software from a 68000 system to a 68030 system. 
>While doing so two questions came up.
>On the 68000 my longs and pointers (in C) are word aligned.
>Would it boost performance if they were longword aligned?
>Significantly??
Yes. On a 32-bit bus the 68030 has to make two bus accesses for a
longword that is not longword aligned, but only one if it is.
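
To make the "two bus accesses" point concrete, here is a small C sketch. It is a simplified model under stated assumptions: bus_cycles and struct record are hypothetical names, and the model ignores the caches and burst fills and just counts how many port-wide slots an operand touches. The struct shows one way (C11 alignas) to ask a compiler for longword alignment; older 68000 compilers typically had their own options for this.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model: number of bus cycles needed to move `size` bytes
 * starting at `addr` through a port that is `port_bytes` wide (1, 2 or 4).
 * Caches and burst transfers are ignored. */
unsigned bus_cycles(uint32_t addr, unsigned size, unsigned port_bytes)
{
    uint32_t first = addr / port_bytes;                /* first slot touched */
    uint32_t last  = (addr + size - 1u) / port_bytes;  /* last slot touched  */
    return (unsigned)(last - first + 1u);
}

/* Hypothetical record: a 16-bit tag followed by a 32-bit value that is
 * forced onto a 4-byte boundary. */
struct record {
    uint16_t tag;
    alignas(4) uint32_t value;
};
```

In this model a long at a word-aligned address such as 0x1002 costs two cycles on a 32-bit port, while the same long at 0x1000 costs one.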

>The code to be moved also contains a copy loop in assembler to copy
>substantial chunks of data (512 bytes).
>Since this copy operation is done very often we tried to make
>it as fast as possible. The solution used was to 
>dump all registers (except a7) on the stack, load the src address
>in a0 and the destination address in a1, and then copy by
>filling the registers with movem.l and writing them to the other
>memory area with another movem.l. This is done as often as needed
>(not in a loop, it is inline).
>At the end the old registers are restored.
>The advantage is that there are few opcode fetches, so a lot of copying is
>done with little overhead.
>However, I was wondering if this can be done faster on the 68030.
>I could use a dbf loop here and copy a long at a time. Would this
>be faster than my 68000 movem solution? I don't know the cost of
>a move.l/dbf loop when it is in the cache, and the part describing
>timing is not the most readable part of the 030 manual. 
>Does anyone have an idea which alternative is better? Or is there even
>a better solution??
As I understand the Motorola documentation, a movem.l; dbf loop will
overlap completely in the pipeline. Thanks to the instruction cache,
there will be no more opcode fetches from memory after the initial ones.
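
For readers unfamiliar with dbf, the long-at-a-time loop from the quoted question can be modelled in C like this. It is a sketch of the loop's semantics only: copy_dbf_style is a hypothetical name, and a C compiler is not guaranteed to emit move.l/dbf for it. The point it shows is that dbf counts down a 16-bit register and falls through when the counter goes from 0 to -1, so the counter is loaded with the iteration count minus one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n_longs longwords (1..65536) using a DBF-style 16-bit down counter. */
void copy_dbf_style(uint32_t *dst, const uint32_t *src, size_t n_longs)
{
    uint16_t count;

    assert(n_longs >= 1 && n_longs <= 65536u);
    count = (uint16_t)(n_longs - 1);   /* the loop body runs count + 1 times */

    for (;;) {
        *dst++ = *src++;               /* like move.l (a0)+,(a1)+ */
        if (count-- == 0)              /* like dbf d0,loop: exit on 0 -> -1 */
            break;
    }
}
```

For the 512-byte case this would be called as copy_dbf_style(dst, src, 128), which corresponds to loading the counter with 127.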

But make sure the cache is really enabled. On some hardware,
accesses in the ROM address range assert the CI (cache inhibit) line,
so code running from ROM gets no benefit from the cache. That's even
more annoying if you have ROMs with only byte-wide access instead of
word or longword. In that case, the loop will be as slow as on a 68000!

I ran into this problem with ELTEC's Eurocom 6. My memory initialisation
routine was annoyingly slow. I solved it by temporarily copying the
assembler part of the initialisation into RAM and running it from there.

Sincerely,
Klaus Steinberger

--
Klaus Steinberger               Beschleunigerlabor der TU und LMU Muenchen
Phone: (+49 89)3209 4287        Hochschulgelaende
FAX:   (+49 89)3209 4280        D-8046 Garching, Germany
BITNET: K2@DGABLG5P             Internet: k2@bl.physik.tu-muenchen.de