[comp.arch] IBM-801 Bytestring Move

baum@Apple.COM (Allen J. Baum) (09/09/88)

[]
>In article <22859@amdcad.AMD.COM> tim@delirun.amd.com (Tim Olson) writes:
>We use an interesting trick in the Am29000 memcpy routine for
>source/destination misalignment.  In this case, we set up the alignment
>difference in the funnel-count register, read in two source words, and
>"extract" a destination word using the funnel-shifter's ability to
>extract any 32-bit word from a 64-bit double-word in a single cycle. 
>The inner loop then consists of shifting the low source word to the high
>source word, reading a new low source word, extracting a destination and
>storing it (well, there is also the overhead of counting down the
>correct number of "word" moves, but you get the idea.)
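
For anyone who wants to see the quoted inner loop spelled out, here is a rough
C sketch. It is not the Am29000 routine itself: the names funnel_extract and
copy_misaligned are made up for illustration, words are treated as big-endian
32-bit values, and the source array is assumed to hold one more word than is
written to the destination.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a funnel shifter: treat hi:lo as one 64-bit value and return the
   32 bits that start shift_bits below the top of hi (shift_bits in 0..31). */
uint32_t funnel_extract(uint32_t hi, uint32_t lo, unsigned shift_bits)
{
    uint64_t pair = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(pair >> (32 - shift_bits));
}

/* Copy n_words destination words from a source that starts byte_offset
   (0..3) bytes into src[0]. Assumption: src has at least n_words + 1 words. */
void copy_misaligned(uint32_t *dst, const uint32_t *src,
                     size_t n_words, unsigned byte_offset)
{
    unsigned shift = 8 * byte_offset;   /* the "funnel count" */
    uint32_t hi = src[0];               /* first source word */

    for (size_t i = 0; i < n_words; i++) {
        uint32_t lo = src[i + 1];       /* read a new low source word */
        dst[i] = funnel_extract(hi, lo, shift);
        hi = lo;                        /* low word becomes the high word */
    }
}
```

Each pass does one load, one extract, and one store; the register-to-register
move at the bottom of the loop is the part the 801 variant folds away.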

IBM was very interested in making this work real fast, so one of the variations
of the 801's shift instruction was:
  Funnel shift MQ,Reg1 by amt->Mem(reg2++), Reg1->MQ
This did the alignment, the store, and the address update all in one
instruction, and then moved the source to MQ so that you could do it again
without moving registers around or unrolling. Furthermore, there were
variations that looked at the low 2 bits of reg2 (= 'n'), the address pointer,
and would do things like store only the first 'n' bytes or the last '-n' bytes
of memory in order to handle the beginning and ending cases. You could write
code that moved 4 bytes in 3 cycles, which is awfully damn quick
(Load, Decrement length & Branch, Align & Store in the delay slot).
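
As a rough illustration only, the combined instruction and the partial-store
variations might be modelled in C as below. This is a sketch from the
description above, not a statement of the real 801 encoding or semantics: the
struct, the function names, and the explicit byte-count parameters are all
hypothetical, and words are treated as big-endian 32-bit values.

```c
#include <stdint.h>

/* Hypothetical machine state for the copy loop. */
struct copy_state {
    uint32_t  mq;   /* previous source word (the MQ register) */
    uint32_t *dst;  /* word-aligned destination pointer (reg2) */
};

/* One step: funnel shift MQ:reg1 by shift_bits (0..31), store the result
   through dst, bump dst, then move reg1 into MQ for the next step. */
void align_store_step(struct copy_state *s, uint32_t reg1, unsigned shift_bits)
{
    uint64_t pair = ((uint64_t)s->mq << 32) | reg1;
    *s->dst++ = (uint32_t)(pair >> (32 - shift_bits));
    s->mq = reg1;
}

/* Partial store for the ending case: replace only the first (high-order)
   count bytes of old_word with those of new_word, count in 0..4. */
uint32_t merge_first_bytes(uint32_t old_word, uint32_t new_word, unsigned count)
{
    uint32_t mask = (count == 0) ? 0u : (0xFFFFFFFFu << (32 - 8 * count));
    return (new_word & mask) | (old_word & ~mask);
}

/* Partial store for the beginning case: replace only the last (low-order)
   count bytes of old_word with those of new_word, count in 0..4. */
uint32_t merge_last_bytes(uint32_t old_word, uint32_t new_word, unsigned count)
{
    uint32_t mask = (count == 0) ? 0u : (0xFFFFFFFFu >> (32 - 8 * count));
    return (new_word & mask) | (old_word & ~mask);
}
```

In the hardware the byte count would come from the low 2 bits of the address
pointer rather than from a separate argument; passing it explicitly here just
keeps the sketch simple.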

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385