rpw3@fortune.UUCP (02/08/84)
#R:kobold:-27200:fortune:16200020:000:1178
fortune!rpw3 Feb 8 02:25:00 1984
And of course (?) everyone knows by now (?) that you can get even better
with a 68000 by using the move-multiple-long (register load/store)
instructions to eat and spew big gulps.
1. Save a few regs
2. While bunches left to do
   a. gulp into the regs
   b. spew out to memory
   c. adjust indices
3. Copy the odd few words.
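The steps above can be sketched in C roughly as follows. This is an illustration only, not the "blt" code: the name gulp_copy, the GULP size of 10 longs, and the local array standing in for the saved registers are all assumptions, and a real 68000 version would use a pair of moveml instructions where the two inner loops appear.

```c
#include <stddef.h>
#include <stdint.h>

#define GULP 10  /* longs per gulp (assumed: 10 x 4 = 40 bytes) */

/* Copy nlongs 32-bit words from src to dst, one gulp at a time.
 * The buffers must not overlap. */
void gulp_copy(uint32_t *dst, const uint32_t *src, size_t nlongs)
{
    uint32_t regs[GULP];  /* 1. stands in for the saved registers */
    size_t i;

    while (nlongs >= GULP) {        /* 2. while bunches left to do */
        for (i = 0; i < GULP; i++)  /*    a. gulp into the regs    */
            regs[i] = src[i];
        for (i = 0; i < GULP; i++)  /*    b. spew out to memory    */
            dst[i] = regs[i];
        src += GULP;                /*    c. adjust indices        */
        dst += GULP;
        nlongs -= GULP;
    }
    while (nlongs-- > 0)            /* 3. copy the odd few words   */
        *dst++ = *src++;
}
```

Whether a C compiler turns the two inner loops into multi-register moves is up to the compiler; the sketch only shows the control structure.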
Now, that strategy doesn't compare well against loop-unrolled move-long,
since the move long takes care of the indices (movl a1@+,a2@+) and
the moveml doesn't, but the moveml's can be loop-unrolled too! In that
case, each load/store pair has a higher address offset word in the
instruction ("moveml <d0-d6,a0-a4>,a5(offset1)"), and you fix up the
whole loop with two adds at the end. In the limiting case (which you
can get close to attaining while doing buffer-block moves), you only
fetch 8 bytes of instructions for each 40 bytes of data copied (note
that's 80 bytes touched), or just over 10% overhead.
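As a hedged illustration of the unrolled form, here is a C sketch in which each load/store pair works at a fixed offset from unchanged base pointers, and the pointers are bumped only once per pass. The names pair and gulp_copy_unrolled, the 10-long gulp, and the unroll factor of 4 are assumptions made for the example; they are not taken from "blt".

```c
#include <stddef.h>
#include <stdint.h>

#define GULP   10  /* longs per load/store pair (assumed: 40 bytes) */
#define UNROLL 4   /* load/store pairs per loop pass (assumed)      */

/* One load/store pair at a constant offset from the base pointers. */
static void pair(uint32_t *dst, const uint32_t *src, size_t off)
{
    uint32_t regs[GULP];  /* stands in for the register set */
    size_t i;

    for (i = 0; i < GULP; i++)
        regs[i] = src[off + i];
    for (i = 0; i < GULP; i++)
        dst[off + i] = regs[i];
}

/* Copy nlongs 32-bit words; the buffers must not overlap. */
void gulp_copy_unrolled(uint32_t *dst, const uint32_t *src, size_t nlongs)
{
    while (nlongs >= GULP * UNROLL) {
        pair(dst, src, 0 * GULP);  /* each pair has a higher offset */
        pair(dst, src, 1 * GULP);
        pair(dst, src, 2 * GULP);
        pair(dst, src, 3 * GULP);
        src += GULP * UNROLL;      /* the two adds at the end       */
        dst += GULP * UNROLL;
        nlongs -= GULP * UNROLL;   /* loop count bookkeeping        */
    }
    while (nlongs-- > 0)           /* leftover words, one at a time */
        *dst++ = *src++;
}
```

The more pairs per pass, the smaller the share of the pointer adds and the branch, which is how the instruction-fetch overhead approaches the limiting figure.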
(See the code for "blt" that comes with the MIT "C" compiler.)
Rob Warnock
UUCP: {sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD: (415)595-8444
USPS: Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065
jab@uokvax.UUCP (02/15/84)
#R:kobold:-27200:uokvax:3000016:000:825
uokvax!jab Feb 12 19:58:00 1984
kobold!tjt proposed a byte-copying mechanism that was basically a loop
that copied N bytes for every iteration of the loop instead of one byte
per iteration. I'd say that if you get to that particular point, what
you do is create a library routine called "copybytes" that is as
machine-dependent as need be to get the speed you want, but that is
retargeted on a per-machine basis, similar to the "strlen" and
"index/rindex" code I've seen written for the VAX-11 already.

I don't like the idea of adding "magic" control structures in order to
outsmart the compiler, and I like the "asm" hack even less. Why not
segregate the parts of the program that need to run FAST into separate
modules and work on making those modules fast? The above library routine
might be an example; there are many others.
Jeff Bowles
Lisle, IL
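A minimal sketch of what such a retargetable "copybytes" module might look like in C. The build flag COPYBYTES_HAVE_ASM and the exact signature are hypothetical, invented for this example; the point is only that callers see one portable interface while each machine may supply its own tuned body.

```c
#include <stddef.h>

/* COPYBYTES_HAVE_ASM is a hypothetical per-machine build flag. When it
 * is defined, a hand-tuned copybytes is expected to be linked in from a
 * separate machine-dependent module. */
#ifdef COPYBYTES_HAVE_ASM
extern void copybytes(char *dst, const char *src, size_t n);
#else
/* Portable fallback: one byte per iteration; buffers must not overlap. */
void copybytes(char *dst, const char *src, size_t n)
{
    while (n-- > 0)
        *dst++ = *src++;
}
#endif
```

Callers just write copybytes(to, from, count) and never see which version was built, so the fast code stays in one module per machine.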
wescott@ncrcae.UUCP (Mike Wescott) (02/21/84)
[]
> And of course (?) everyone knows by now (?) that you can get even better
> with a 68000 by using the move-multiple-long (register load/store)
> instructions to eat and spew big gulps.
> .
> .
> .
> (See the code for "blt" that comes with the MIT "C" compiler.)
You can also get into trouble if you're not careful. The version of
"blt" we used here worked just fine, until it pointed out a microcode
bug in the MC68000 for us. The movem instruction, when fetching from
memory into registers, does an extra word read. The extra word is thrown
away and causes no problems, except when the extra read crosses into an
unreadable segment, in which case you get an unwanted "segmentation
violation" or "panic: kernel memory management error."

The situation shows up (if my memory is correct) when the length of the
move is a multiple of 48 bytes and abuts an unreadable segment ("blt"
uses a 12 register movem). I recoded it and still managed to pick up a
little performance.
-Mike Wescott
NCR Corporation, W. Columbia SC
mcnc!ncsu!ncrcae!wescott
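The post does not say how the routine was recoded, so the following C sketch is only one possible guard, not Mike Wescott's fix. It assumes the stray read touches only the word just past the block fetched, and it keeps the wide gulps away from the final bytes of the source by leaving the last 1 to 48 bytes to ordinary moves. The names gulp_bytes and guarded_copy are made up for the example, and memcpy stands in for a 12-register movem pair.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 48  /* bytes per 12-register movem, as in the post */

/* How many leading bytes may be moved in CHUNK-sized gulps so that the
 * last gulp never ends exactly at the end of the source. */
size_t gulp_bytes(size_t n)
{
    return n > 0 ? ((n - 1) / CHUNK) * CHUNK : 0;
}

/* Copy n bytes; the buffers must not overlap. */
void guarded_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t wide = gulp_bytes(n);
    size_t i;

    for (i = 0; i < wide; i += CHUNK)     /* wide gulps, kept clear of */
        memcpy(dst + i, src + i, CHUNK);  /* the end of the source     */
    for (; i < n; i++)                    /* final 1..CHUNK bytes with */
        dst[i] = src[i];                  /* plain moves               */
}
```

With this split a 96-byte copy does one gulp and 48 bytes of plain moves, so a length that is a multiple of 48 no longer ends on a gulp.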