mj@myrias.UUCP (Michal Jaegermann) (03/14/89)
Seeing David Brooks' posting, relayed by Steve Yelvington, gave me enough
incentive to look a little closer at the lmemcpy code as well, as
distributed with dLibs 1.2. Sure enough, I found that it exhibits the same
set of problems as memcpy. So here is replacement code, quite obviously
derived from David's code.

* lmemcpy.s - replacement for the code in dLibs 1.2
*
* After memcpy by David Brooks - Michal Jaegermann, 11 Mar 89
*
*       char *lmemcpy(dest, source, len)
*               char *dest              at 4(sp)
*               char *source            at 8(sp)
*               unsigned long len       at 12(sp)
*
* Avoid "btst #n,dn" because of a bug in the Sozobon assembler.
* If the parity of both addresses is the same - copies in 16 byte blocks.
* One has to stop loop unrolling somewhere.

        .text
        .globl  _lmemcpy
_lmemcpy:
        lea     4(a7),a2        ; Point to argument list
        move.l  (a2)+,a1        ; a1 = dest
        move.l  (a2)+,a0        ; a0 = source
        move.l  (a2),d2         ; d2 = len
        beq     lmemcpy6        ; return if zero

        move.l  a0,d1           ; Check for odd/even alignment
        add.w   a1,d1           ; This is really eor.w on the lsb.  Really.
        asr.w   #1,d1           ; Get lsb into C.  If it's 1, alignment is off.
        bcs     lmemcpy7        ; Go do it slowly

        move.l  a0,d1           ; Check for initial odd byte
        asr.w   #1,d1           ; Get lsb
        bcc     lmemcpy1
        subq.l  #1,d2           ; Move initial byte
        move.b  (a0)+,(a1)+
lmemcpy1:
        moveq.l #15,d1          ; Split into a 16-byte block count and remainder
        and.w   d2,d1           ; d1 = len mod 16
        lsr.l   #4,d2           ; d2 = number of 16-byte blocks
        move.l  d2,d0           ; a second counter for dbra
        swap    d0              ; d0.w = high word of the block count
        bra     lmemcpy3        ; Words (!!) d2 and d0 could equal 0.
lmemcpy2:
        move.l  (a0)+,(a1)+     ; Copy 16 bytes
        move.l  (a0)+,(a1)+
        move.l  (a0)+,(a1)+
        move.l  (a0)+,(a1)+
lmemcpy3:
        dbra    d2,lmemcpy2
        dbra    d0,lmemcpy2
        bra     lmemcpy5        ; Enter final loop.  Again d1 could equal 0.
lmemcpy4:
        move.b  (a0)+,(a1)+     ; Up to 15 trailing bytes
lmemcpy5:
        dbra    d1,lmemcpy4
lmemcpy6:
        move.l  4(a7),d0        ; stick return value into d0
        rts                     ; All done.

lmemcpy7:                       ; Handle the mismatched (odd/even) alignment case
        move.l  a1,d0           ; d0 = dest, ready to return
        subq.l  #1,d2           ; long subtract; here d2 was positive!
        move.l  d2,d1
        swap    d1              ; second dbra counter
lmemcpy8:
        move.b  (a0)+,(a1)+     ; move byte-by-byte
        dbra    d2,lmemcpy8
        dbra    d1,lmemcpy8
        rts                     ; and exit normally

lmemcpy is used internally by realloc. Since malloc has to return nicely
aligned addresses, we lucked out in this case: the old version of lmemcpy
will work for realloc, maybe even a little bit faster. :-) Unless your
program passes some funny argument to realloc.

And one more thing. The dLibs documentation claims that the length
parameter to memcpy and lmemcpy is either int or long. The code documents
and works with unsigned quantities, which is more logical but at some
variance with the U*IX library. According to the ANSI standard, the length
parameter to memcpy has to be of type size_t, where size_t is an unsigned
integral type chosen by the implementation. This means that unsigned int
or unsigned long is OK, if the implementation says so. Watch out for these
until things get fixed.

  Michal Jaegermann
  Myrias Research Corporation
  Edmonton, Alberta, CANADA
  ...{alberta,ncc}!myrias!mj
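
For readers who do not read 68000 assembly, the copy strategy of the listing can be sketched in C. This is only a rough model under stated assumptions, not dLibs code: the names lmemcpy_model and lmemcpy_model_selfcheck are made up for illustration, a 16-byte memcpy stands in for the four move.l instructions, the paired dbra counters are replaced by ordinary loops, and the assert is a sanity check that the assembly does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the copy strategy; not part of dLibs. */
char *lmemcpy_model(char *dest, const char *source, unsigned long len)
{
    char *d = dest;
    const char *s = source;
    unsigned long blocks, tail;

    if (len == 0)
        return dest;                      /* nothing to copy */
    assert(dest != NULL && source != NULL); /* sketch-only sanity check */

    /* Different parity: the pointers can never both be even,
       so copy byte by byte (label lmemcpy7 in the listing). */
    if ((((uintptr_t)s ^ (uintptr_t)d) & 1) != 0) {
        while (len-- != 0)
            *d++ = *s++;
        return dest;
    }

    /* Same parity: copy one leading byte if the source is odd. */
    if (((uintptr_t)s & 1) != 0) {
        *d++ = *s++;
        len--;
    }

    blocks = len >> 4;                    /* number of 16-byte blocks */
    tail = len & 15;                      /* up to 15 trailing bytes  */

    while (blocks-- != 0) {
        memcpy(d, s, 16);                 /* stands in for 4 x move.l */
        d += 16;
        s += 16;
    }
    while (tail-- != 0)
        *d++ = *s++;

    return dest;
}

/* Returns 1 if the model agrees with memcpy for every tested
   combination of source offset, destination offset and length. */
int lmemcpy_model_selfcheck(void)
{
    static char src[128], dst[128], ref[128];
    size_t so, doff, n, i;

    for (i = 0; i < sizeof src; i++)
        src[i] = (char)((i * 7 + 1) & 0x7f);

    for (so = 0; so < 4; so++)
        for (doff = 0; doff < 4; doff++)
            for (n = 0; n <= 100; n++) {
                memset(dst, 0, sizeof dst);
                memset(ref, 0, sizeof ref);
                if (lmemcpy_model(dst + doff, src + so, n) != dst + doff)
                    return 0;
                memcpy(ref + doff, src + so, n);
                if (memcmp(dst, ref, sizeof dst) != 0)
                    return 0;
            }
    return 1;
}
```

The offsets 0 to 3 in the self-check exercise both the matching-parity and the mismatched-parity paths, including the leading odd byte. Like the assembly, the model makes no attempt to handle overlapping buffers.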