[net.unix-wizards] movc3 is NOT always faster!

tbray@mprvaxa.UUCP (Tim Bray) (03/13/84)

When you gaily issue a movc3, there's a LOT of microcode that
starts swishing around, and also if you're assembler hacking,
you might have to push all the registers that movc* steps on
(r0 through r5).

An internal DEC benchmark I saw once suggested that movc3 becomes
a win at about 100 bytes - fewer bytes than that and a tight
mov, aobleq loop is better.
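
A minimal sketch of that crossover idea in C, not anyone's actual
library code. The name copy_bytes and the constant
BLOCK_MOVE_THRESHOLD are hypothetical, the 100-byte cutoff is only the
figure quoted above, and memcpy stands in for movc3; the real break-even
point would have to be measured on the machine in question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical crossover point. The post quotes roughly 100 bytes for
   movc3; treat it as a tunable, not a known constant. */
#define BLOCK_MOVE_THRESHOLD 100

/* Copy n bytes from src to dst. The regions must not overlap. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (n < BLOCK_MOVE_THRESHOLD) {
        /* Short copy: a plain byte loop avoids the fixed setup cost
           of the block-move primitive. */
        unsigned char *d = dst;
        const unsigned char *s = src;

        while (n-- > 0)
            *d++ = *s++;
        return dst;
    }

    /* Long copy: hand off to the block move (memcpy stands in for
       movc3 in this sketch). */
    return memcpy(dst, src, n);
}
```

A caller uses it exactly like memcpy, e.g. copy_bytes(buf, line, len).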

Tim Bray ...decvax!microsoft!ubc-vision!mprvaxa!tbray
         ...ihnp4!alberta!ubc-vision!mprvaxa!tbray

rehmi@umcp-cs.UUCP (03/16/84)

Just out of curiosity, does anyone know if any version of movc[35]
shovels more than a byte at a time? What I'm thinking is, mightn't it
be faster (on large transfers) to movb up to 7 bytes to quad align
with respect to something, and then do the remainder with movq?
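
The scheme asked about above can be sketched in C roughly as follows.
This is an illustration under assumptions, not a claim about what movc3
does: the function name copy_quad_aligned is hypothetical, only the
destination is brought to an 8-byte boundary (the source may still be
unaligned), and the fixed-size memcpy calls are just a portable way to
spell one 8-byte load and one 8-byte store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes from src to dst. The regions must not overlap. */
void *copy_quad_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: at most 7 single-byte moves, until dst sits on an
       8-byte boundary. */
    while (n > 0 && ((uintptr_t)d & 7) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: 8 bytes per iteration (the "movq" part). */
    while (n >= 8) {
        uint64_t q;

        memcpy(&q, s, 8);   /* 8-byte load from the source */
        memcpy(d, &q, 8);   /* 8-byte store to the aligned destination */
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Tail: whatever is left, fewer than 8 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Aligning on the destination is a design choice made for this sketch; the
question leaves "with respect to something" open, and aligning on the
source instead would be an equally valid reading.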

-- 
Uucp:      ..!seismo!umcp-cs!rehmi        By the fork, spoon, and
CsNet:     rehmi.umcp-cs@csnet-relay      exec of Lord Basfour's
InterNet:  rehmi@{maryland,umd-csd}       Publick High Guardian

rbbb%rice@sri-unix.UUCP (03/19/84)

From:  David Chase <rbbb@rice>

Probably true, but (on a 750, with the 4.1 version of cc) when benchmarking a
suitably packaged movc3 against various versions of a C loop doing the
same thing, the movc3 begins to win at 20 bytes (about 2 times as fast, it
seems).  On 100-byte moves it is winning by a factor of 7.  Also, since
everyone uses CALLS or CALLG, the registers get pushed at the procedure
call anyway.
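
A comparison of this kind could be set up along the following lines.
This is a rough sketch, not the benchmark described above: time_copies,
loop_copy and copy_fn are hypothetical names, memcpy stands in for the
packaged movc3, and clock() is a coarse timer, so the repetition count
has to be large for the numbers to mean anything.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* The "C loop" contender: one byte per iteration. */
static void *loop_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n-- > 0)
        *d++ = *s++;
    return dst;
}

/* Return the CPU seconds spent doing `reps` copies of `n` bytes with
   `fn`. Sizes above 4096 are clamped to the buffer size. */
double time_copies(copy_fn fn, size_t n, long reps)
{
    static unsigned char src[4096], dst[4096];
    volatile unsigned char sink = 0;  /* keeps the copies from being optimized out */
    clock_t start, end;
    long i;

    if (n > sizeof src)
        n = sizeof src;

    start = clock();
    for (i = 0; i < reps; i++) {
        src[0] = (unsigned char)i;    /* vary the input on each pass */
        fn(dst, src, n);
        sink ^= dst[0];
    }
    end = clock();
    (void)sink;

    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Comparing time_copies(loop_copy, 20, reps) with
time_copies(memcpy, 20, reps), and again at 100 bytes, gives the same
kind of ratio quoted above, though the values will differ by machine
and compiler.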

drc

dmmartindale@watcgl.UUCP (Dave Martindale) (03/20/84)

Movc[35] normally does 32-bit writes to memory, at least on the 780.
Reads are always 64 bits due to the cache.  The actual data transfer
is slower than doing a movq, since 4 SBI cycles are required for two 32-bit
writes vs. 3 for one 64-bit write.  Also, if you have the old MS780C
memory controller, it has to do a read-modify-write cycle for any write
smaller than 64 bits, and that includes each of these 32-bit writes.
The new controllers don't have this problem.
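
To put the SBI figures above in concrete terms, here is a small
worked calculation in C. It is a back-of-the-envelope sketch only: the
function names are hypothetical, the transfer is assumed to be an
aligned multiple of 8 bytes, and only the write cycles quoted above are
counted (no reads, no read-modify-write penalty on the old controller).

```c
#include <stddef.h>

/* Cycle counts taken from the figures quoted above. */
enum {
    SBI_CYCLES_TWO_LONGWORD_WRITES = 4,  /* two 32-bit writes */
    SBI_CYCLES_ONE_QUADWORD_WRITE  = 3   /* one 64-bit write  */
};

/* Write cycles to store nbytes as pairs of 32-bit writes.
   Assumes nbytes is a multiple of 8. */
unsigned long sbi_cycles_longword(size_t nbytes)
{
    return (unsigned long)(nbytes / 8) * SBI_CYCLES_TWO_LONGWORD_WRITES;
}

/* Write cycles to store nbytes as 64-bit writes.
   Assumes nbytes is a multiple of 8. */
unsigned long sbi_cycles_quadword(size_t nbytes)
{
    return (unsigned long)(nbytes / 8) * SBI_CYCLES_ONE_QUADWORD_WRITE;
}
```

For a 4096-byte transfer this gives 2048 cycles with 32-bit writes
against 1536 with 64-bit writes, i.e. the 64-bit path needs three
quarters of the write cycles.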