andrew@frip.gwd.tek.com (Andrew Klossner) (11/19/88)
[An earlier posting under this title went out with some egregious bugs.
I cancelled it; my apologies if you saw it. The lesson: never post something
when you're due in a meeting in five minutes.]
For comparison, here's an 88k routine to do the bitwise inversion using
a 256-byte lookup table. If the entire routine and the entire table
are in cache (a reasonable assumption if the routine is heavily used;
the smallest I and D cache sizes are each 16k), then the routine takes
17 cycles, including the return-to-caller.
As my colleagues did, I point out that this was a ten-minute hack, and
I'd welcome suggestions for improvement ... we will actually need a
routine like this in a few months for graphics-intensive work.
; end-for-end routine
; Register usage:
; r2 = parameter (and return value)
; r3 -> 256-byte table of inverted bytes
; r4 = inversion of least significant parameter byte
; r5 = inversion of second parameter byte
; r6 = inversion of third parameter byte
; r7 = inversion of most significant parameter byte
; All of these registers are caller-saved.
global _end_for_end
_end_for_end: ; Cycle count
; Load address of inverted byte table.
or.u r3,r0,hi16(end_for_end_table) ; 1
or r3,r3,lo16(end_for_end_table) ; 2
; Start loading the inverse of the first byte.
extu r4,r2,8<0> ; 3
ld.bu r4,r3,r4 ; 4
; Start loading the inverse of the second byte.
extu r5,r2,8<8> ; 5
ld.bu r5,r3,r5 ; 6
; Start loading the inverse of the third byte.
extu r6,r2,8<16> ; 7
ld.bu r6,r3,r6 ; 8
; Now the data pipeline's full.
; Compute the address of the fourth byte inversion.
extu r7,r2,8<24> ; 9
; Stall waiting for the first inverse to come in.
mak r2,r4,8<24> ; 10 (r4 stall)
; Start loading the inverse of the fourth byte.
ld.bu r7,r3,r7 ; 11
; Stall on, then assemble the remaining bytes into the return value.
mak r5,r5,8<16> ; 12 (r5 stall)
or r2,r2,r5 ; 13
mak r6,r6,8<8> ; 14 (r6 stall)
or r2,r2,r6 ; 15
jmp.n r1 ; 16
or r2,r2,r7 ; 17 (r7 stall)
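
For anyone who wants to check the assembly against something easier to read, here is a rough C model of the same table-lookup approach. It is only a sketch under assumptions: the table contents are not shown in this posting, so the C version builds them on first use with a hypothetical helper (build_table, plus a table_ready flag), and it assumes each table entry is simply its index with the eight bits in reverse order. It says nothing about cycle counts.

```c
#include <stdint.h>

/* Assumed stand-in for end_for_end_table: entry i holds the byte i
   with its eight bits in reverse order. */
static uint8_t end_for_end_table[256];
static int table_ready = 0;   /* hypothetical flag, not in the 88k code */

/* Hypothetical helper: fill the table one entry at a time. */
static void build_table(void)
{
    for (unsigned i = 0; i < 256; i++) {
        uint8_t r = 0;
        for (unsigned bit = 0; bit < 8; bit++) {
            if (i & (1u << bit))
                r |= (uint8_t)(0x80u >> bit);   /* bit n moves to bit 7-n */
        }
        end_for_end_table[i] = r;
    }
    table_ready = 1;
}

/* Reverse all 32 bits: invert each byte via the table, then swap the
   byte positions, mirroring the extu / ld.bu / mak / or sequence. */
uint32_t end_for_end(uint32_t x)
{
    if (!table_ready)
        build_table();
    return ((uint32_t)end_for_end_table[x & 0xFF] << 24)
         | ((uint32_t)end_for_end_table[(x >> 8) & 0xFF] << 16)
         | ((uint32_t)end_for_end_table[(x >> 16) & 0xFF] << 8)
         |  (uint32_t)end_for_end_table[(x >> 24) & 0xFF];
}
```

With this model, end_for_end(0x00000001) gives 0x80000000, and applying the function twice returns the original value, which makes a convenient sanity check for the assembly version.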
-=- Andrew Klossner (uunet!tektronix!hammer!frip!andrew) [UUCP]
(andrew%frip.gwd.tek.com@relay.cs.net) [ARPA]