tbray@mprvaxa.UUCP (Tim Bray) (03/13/84)
x <-- USENET insecticide

When you gaily issue a movc3, there's a LOT of microcode that starts swishing around, and also if you're assembler hacking, you might have to push all the registers that movc* steps on. An internal DEC benchmark I saw once suggested that movc3 becomes a win at about 100 bytes - fewer bytes than that and a tight mov, aobleq loop is better.

Tim Bray
...decvax!microsoft!ubc-vision!mprvaxa!tbray
...ihnp4!alberta!ubc-vision!mprvaxa!tbray
rehmi@umcp-cs.UUCP (03/16/84)
Just out of curiosity, does anyone know if any version of movc[35] shovels more than a byte at a time? What I'm thinking is, mightn't it be faster (on large transfers) to movb up to 7 bytes to quad align with respect to something, and then do the remainder with movq?

--
Uucp:     ..!seismo!umcp-cs!rehmi           By the fork, spoon, and
CsNet:    rehmi.umcp-cs@csnet-relay         exec of Lord Basfour's
InterNet: rehmi@{maryland,umd-csd}          Publick High Guardian
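The align-then-widen idea in the post above can be sketched in portable C. This is only an illustration of the technique, not VAX code and not anything posted in the thread: the name copy_quad_aligned is hypothetical, the choice to align on the destination is an assumption (the post says only "with respect to something"), and the 8-byte memcpy calls stand in for movq.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy n bytes, one byte at a
 * time until dst is 8-byte aligned, then 8 bytes at a time. */
void *copy_quad_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);

    /* Head: at most 7 single-byte moves (the "movb" phase). */
    while (n > 0 && ((uintptr_t)d & 7u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: one 8-byte move per iteration (the "movq" phase).
     * A fixed-size memcpy keeps this legal C even when src is not
     * aligned; compilers usually reduce it to one load and one store. */
    while (n >= 8) {
        uint64_t q;
        memcpy(&q, s, 8);
        memcpy(d, &q, 8);
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Tail: whatever is left over, again at most 7 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Only one of the two pointers can be aligned this way when source and destination differ in their offset within a quadword, so whether the wide phase actually pays off depends on how the hardware handles the unaligned side.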
rbbb%rice@sri-unix.UUCP (03/19/84)
From: David Chase <rbbb@rice>

Probably true, but (on a 750, 4.1 version of cc) when benchmarking a suitably packaged movc3 against various versions of a C loop doing the same thing, the movc3 begins to win on 20 bytes (about 2 times as fast, it seems). On 100 byte moves it is winning by a factor of 7. Also, since everyone uses CALLS or CALLG, the registers get pushed at the procedure call.

drc
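A comparison of this kind could be set up roughly as follows. This is a minimal sketch, not the benchmark described in the post: loop_copy, block_copy, and time_copy are hypothetical names, memcpy stands in for a "suitably packaged movc3", and clock() is used as a coarse timer. A modern optimizing compiler may turn the byte loop into a block move itself, so any numbers it produces say nothing about a 750.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void (*copy_fn)(char *dst, const char *src, size_t n);

/* Plain C loop, one byte per iteration. */
void loop_copy(char *dst, const char *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Block move packaged as a procedure; memcpy is the stand-in here. */
void block_copy(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Run fn over n bytes reps times; return elapsed CPU time in seconds. */
double time_copy(copy_fn fn, size_t n, long reps)
{
    static char src[4096], dst[4096];
    volatile char sink = 0;   /* keeps the copies from being optimized away */
    clock_t start, end;
    long r;

    assert(reps > 0);
    if (n > sizeof src)
        n = sizeof src;       /* clamp to the size of the static buffers */

    start = clock();
    for (r = 0; r < reps; r++) {
        src[0] = (char)r;     /* vary the input a little on every pass */
        fn(dst, src, n);
        sink ^= dst[0];
    }
    end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy(loop_copy, 20, reps) and time_copy(block_copy, 20, reps), then repeating at 100 bytes, gives the two data points mentioned in the post. The procedure-call overhead is deliberately inside the timed region for both variants, matching the point about CALLS and CALLG.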
dmmartindale@watcgl.UUCP (Dave Martindale) (03/20/84)
Movc[35] normally does 32-bit writes to memory, at least on the 780. Reads are always 64 bits due to the cache. The actual data transfer is slower than doing a movq, since 4 SBI cycles are required for two 32-bit writes vs. 3 for one 64-bit write. Also, the old MS780C memory controller has to do a read-modify-write cycle for any write smaller than 64 bits, so each 32-bit write costs extra there. The new controllers don't have this problem.
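The cycle counts quoted above can be turned into a rough per-transfer estimate. The sketch below uses only the figures from the post (4 SBI cycles for two 32-bit writes, 3 for one 64-bit write); splitting the 4 cycles evenly into 2 per longword write is an assumption, as are the function names, and reads and any read-modify-write penalty are left out entirely.

```c
#include <assert.h>

/* Figures taken from the post; the even 2-per-write split is assumed. */
enum {
    CYCLES_PER_LONG_WRITE = 2,  /* 4 SBI cycles for two 32-bit writes */
    CYCLES_PER_QUAD_WRITE = 3   /* 3 SBI cycles for one 64-bit write  */
};

/* SBI write cycles to store nbytes using 32-bit writes only. */
unsigned long write_cycles_long(unsigned long nbytes)
{
    assert(nbytes % 8 == 0);    /* keep the comparison like-for-like */
    return (nbytes / 4) * CYCLES_PER_LONG_WRITE;
}

/* SBI write cycles to store nbytes using 64-bit writes only. */
unsigned long write_cycles_quad(unsigned long nbytes)
{
    assert(nbytes % 8 == 0);
    return (nbytes / 8) * CYCLES_PER_QUAD_WRITE;
}
```

For a 512-byte transfer this comes to 256 write cycles with longword writes against 192 with quadword writes, the same 4:3 ratio as in the post.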