andrew@frip.gwd.tek.com (Andrew Klossner) (11/19/88)
[An earlier posting under this title went out with some egregious bugs. I cancelled it; my apologies if you saw it. The lesson: never post something when you're due in a meeting in five minutes.]

For comparison, here's an 88k routine to do the bitwise inversion using a 256-byte lookup table. If the entire routine and the entire table are in cache (a reasonable assumption if the routine is heavily used; the smallest I and D cache sizes are each 16k), then the routine takes 17 cycles, including the return-to-caller.

As my colleagues did, I point out that this was a ten-minute hack, and I'd welcome suggestions for improvement ... we will actually need a routine like this in a few months for graphics-intensive work.

; end-for-end routine
; Register usage:
;       r2 = parameter (and return value)
;       r3 -> 256-byte table of inverted bytes
;       r4 = inversion of least significant parameter byte
;       r5 = inversion of second parameter byte
;       r6 = inversion of third parameter byte
;       r7 = inversion of most significant parameter byte
; All of these registers are caller-saved.

        global  _end_for_end
_end_for_end:                                   ; Cycle count

        ; Load address of inverted byte table.
        or.u    r3,r0,hi16(end_for_end_table)   ; 1
        or      r3,r3,lo16(end_for_end_table)   ; 2

        ; Start loading the inverse of the first byte.
        extu    r4,r2,8<0>                      ; 3
        ld.bu   r4,r3,r4                        ; 4

        ; Start loading the inverse of the second byte.
        extu    r5,r2,8<8>                      ; 5
        ld.bu   r5,r3,r5                        ; 6

        ; Start loading the inverse of the third byte.
        extu    r6,r2,8<16>                     ; 7
        ld.bu   r6,r3,r6                        ; 8

        ; Now the data pipeline's full.
        ; Compute the address of the fourth byte inversion.
        extu    r7,r2,8<24>                     ; 9

        ; Stall waiting for the first inverse to come in.
        mak     r2,r4,8<24>                     ; 10 (r4 stall)

        ; Start loading the inverse of the fourth byte.
        ld.bu   r7,r3,r7                        ; 11

        ; Stall on, then assemble the remaining bytes into the return value.
        mak     r5,r5,8<16>                     ; 12 (r5 stall)
        or      r2,r2,r5                        ; 13
        mak     r6,r6,8<8>                      ; 14 (r6 stall)
        or      r2,r2,r6                        ; 15
        jmp.n   r1                              ; 16
        or      r2,r2,r7                        ; 17 (r7 stall)

  -=- Andrew Klossner   (uunet!tektronix!hammer!frip!andrew)       [UUCP]
                        (andrew%frip.gwd.tek.com@relay.cs.net)     [ARPA]
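
The posting does not show the contents of end_for_end_table. For readers who want to check the routine's logic, here is a minimal C sketch of the same table-lookup technique. It assumes that "bitwise inversion" means reversing the order of all 32 bits, and that table entry i holds byte i with its bits reversed. The helper names reverse_byte, build_end_for_end_table, and table_ready are hypothetical and do not come from the original post; the C version says nothing about 88k cycle counts.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: entry i holds the byte i with its 8 bits in reverse order. */
static uint8_t end_for_end_table[256];
static int table_ready = 0;   /* hypothetical guard, not in the original */

/* Reverse the bits of a single byte, one bit at a time (used only at setup). */
static uint8_t reverse_byte(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1u));
        b >>= 1;
    }
    return r;
}

/* Fill the 256-byte table once, before the first call to end_for_end(). */
void build_end_for_end_table(void)
{
    for (int i = 0; i < 256; i++)
        end_for_end_table[i] = reverse_byte((uint8_t)i);
    table_ready = 1;
}

/* Mirror of the assembly: extract each byte (extu), look up its reversal
 * (ld.bu), shift it to the opposite end of the word (mak), and OR the
 * four pieces together. */
uint32_t end_for_end(uint32_t x)
{
    assert(table_ready);
    uint32_t r;
    r  = (uint32_t)end_for_end_table[x         & 0xFFu] << 24; /* r4 */
    r |= (uint32_t)end_for_end_table[(x >> 8)  & 0xFFu] << 16; /* r5 */
    r |= (uint32_t)end_for_end_table[(x >> 16) & 0xFFu] << 8;  /* r6 */
    r |= (uint32_t)end_for_end_table[(x >> 24) & 0xFFu];       /* r7 */
    return r;
}
```

Applying end_for_end twice should give back the original word, which makes a convenient sanity check for both the table and the byte placement.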