meulenbr@cst.prl.philips.nl (Frans Meulenbroeks) (06/11/91)
Hi,

I'm moving software from a 68000 system to a 68030 system.
While doing so two questions came up.

On the 68000 my longs and pointers (in C) are word aligned.
Would it boost performance if they were longword aligned?
Significantly??

The code to be moved also contains a copy loop in assembler to copy
substantial chunks of data (512 bytes).
Since this copy operation is done very often we tried to make
it as fast as possible. The solution used was to
dump all registers (except a7) on the stack, load the source address
in a0 and the destination address in a1, and then copy by
filling the registers with movem.l and writing them to the other
memory area with another movem.l. This is repeated as often as needed
(not in a loop, it is inline).
At the end the old registers are restored.
The advantage is that there are few opcode fetches, so a lot of copying
is done with little overhead.

However, I was wondering if this can be done faster on the 68030.
I could use a dbf loop here and copy a long at a time. Would this
be faster than my 68000 movem solution? I don't know the cost of
a move.l/dbf loop when it is in the cache, and the part describing
timing is not the most readable part of the 030 manual.
Does anyone have an idea which alternative is better? Or is there even
a better solution??

Thanks!
--
Frans Meulenbroeks (meulenbr@prl.philips.nl)
Centre for Software Technology
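For reference, the "copy a long at a time" alternative asked about above can be sketched in C. This is only an illustration under assumptions, not the original assembler routine: the name copy_longs is hypothetical, both pointers are assumed to be longword aligned, and the two regions are assumed not to overlap. Whether a given compiler turns this loop into a move.l/dbf pair on the 68030 depends on the compiler and is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy nlongs 32-bit longs from src to dst,
 * one long per iteration. Assumes dst and src are longword aligned
 * and that the two regions do not overlap. */
void copy_longs(uint32_t *dst, const uint32_t *src, size_t nlongs)
{
    while (nlongs--) {
        *dst++ = *src++;   /* one 32-bit read and one 32-bit write */
    }
}
```

For the 512-byte blocks described in the post, the call would be copy_longs(dst, src, 128), since 512 bytes is 128 longs.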
k2@bl.physik.tu-muenchen.de (Klaus Steinberger) (06/12/91)
meulenbr@cst.prl.philips.nl (Frans Meulenbroeks) writes:

>Hi,
>I'm moving software from a 68000 system to a 68030 system.
>While doing so two questions came up.
>On the 68000 my longs and pointers (in C) are word aligned.
>Would it boost performance if they were longword aligned?
>Significantly??

Yes, because the 68030 has to make two bus accesses if a long is not
longword aligned.

>The code to be moved also contains a copy loop in assembler to copy
>substantial chunks of data (512 bytes).
>Since this copy operation is done very often we tried to make
>it as fast as possible. The solution used was to
>dump all registers (except a7) on the stack, load the src address
>in a0 and the destination address in a1, and then copy by
>filling the register with movem.l and writing them to the other
>memory part with another movem.l. This is done as often as needed
>(not in a loop, it is inline).
>At the end the old registers are restored.
>Advantage is that there are few opcode fetches, so a lot of copying is
>done with little overhead.
>However, I was wondering if this can be done faster on the 68030.
>I could use a dbf loop here and copy a long at a time. Would this
>be faster than my 68000 movem solution. I don't know the cost of
>a move.l/dbf loop when it is in the cache, and the part describing
>timing is not the most readable part of the 030 manual.
>Does anyone have an idea which alternative is better? Or is there even
>a better solution??

As I understand the Motorola documentation, a movem.l; dbf loop will
overlap completely in the pipeline. Because of the cache, there will be
no more opcode fetches after the initial ones.

But make sure the cache is really enabled. On some hardware, the ROM
address range asserts the CI (cache inhibit) line, so you get no benefit
from the cache. That is even more annoying if you have ROMs with only
byte access instead of word or longword access. In that case, the loop
will be as slow as on a 68000!

I ran into this problem with ELTEC's Eurocom 6. My memory initialisation
routine was annoyingly slow. I solved the problem by temporarily copying
the assembler part of the initialisation into RAM.

Sincerely,
Klaus Steinberger
--
Klaus Steinberger               Beschleunigerlabor der TU und LMU Muenchen
Phone: (+49 89)3209 4287        Hochschulgelaende
FAX:   (+49 89)3209 4280        D-8046 Garching, Germany
BITNET: K2@DGABLG5P             Internet: k2@bl.physik.tu-muenchen.de
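The alignment answer in this reply can be illustrated with a deliberately simplified model in C. The function names are hypothetical, and the model assumes a 32-bit wide memory port and ignores the data cache and any wait states; it only restates the point that a long that is not longword aligned costs two bus accesses instead of one.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if addr is a multiple of 4. */
int is_long_aligned(uintptr_t addr)
{
    return (addr & 3u) == 0;
}

/* Simplified model (an assumption, not a timing table): number of bus
 * accesses needed to read or write one 32-bit long at addr on a 32-bit
 * port. An aligned long fits in one access; any other address makes
 * the long straddle two longword locations. */
int long_access_cycles(uintptr_t addr)
{
    return is_long_aligned(addr) ? 1 : 2;
}
```

Under this model, a long at address 0x1002 (word aligned, as on the 68000 layout described in the question) needs two accesses, while the same long at 0x1000 or 0x1004 needs one.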