rpw3@fortune.UUCP (02/08/84)
#R:kobold:-27200:fortune:16200020:000:1178
fortune!rpw3 Feb 8 02:25:00 1984

And of course (?) everyone knows by now (?) that you can get even better
with a 68000 by using the move-multiple-long (register load/store)
instructions to eat and spew big gulps.

1. Save a few regs
2. While bunches left to do
   a. gulp into the regs
   b. spew out to memory
   c. adjust indices
3. copy the odd few words.

Now, that strategy doesn't compare well against loop-unrolled move-long,
since the move-long takes care of the indices (movl a1@+,a2@+) and the
moveml doesn't, but the moveml's can be loop-unrolled too! In that case,
each load/store pair has a higher address offset word in the instruction
("moveml <d0-d6,a0-a4>,a5(offset1)"), and you fix up the whole loop with
two adds at the end.

In the limiting case (which you can get close to attaining while doing
buffer-block moves), you only fetch 8 bytes of instructions for each
40 bytes of data copied (note that's 80 bytes touched), or just over
10% overhead.

(See the code for "blt" that comes with the MIT "C" compiler.)

Rob Warnock
UUCP: {sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD: (415)595-8444
USPS: Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065
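The three-step outline in this post can be sketched in portable C. This is only an illustration under stated assumptions, not the MIT "blt" code: the name gulp_copy and the constant GULP_LONGS are hypothetical, and ten local variables stand in for the registers a moveml would load (ten longs being the 40-byte gulp the post mentions). Step 1, saving registers, is left to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: 10 longs = the 40-byte gulp mentioned in the post. */
#define GULP_LONGS 10

/* Copy n 32-bit longs from src to dst. The buffers must not overlap. */
void gulp_copy(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    /* Step 2: while whole bunches are left to do. */
    while (n >= GULP_LONGS) {
        /* a. gulp into locals (a compiler may keep these in registers) */
        uint32_t r0 = src[0], r1 = src[1], r2 = src[2], r3 = src[3],
                 r4 = src[4], r5 = src[5], r6 = src[6], r7 = src[7],
                 r8 = src[8], r9 = src[9];

        /* b. spew out to memory */
        dst[0] = r0; dst[1] = r1; dst[2] = r2; dst[3] = r3; dst[4] = r4;
        dst[5] = r5; dst[6] = r6; dst[7] = r7; dst[8] = r8; dst[9] = r9;

        /* c. adjust indices */
        src += GULP_LONGS;
        dst += GULP_LONGS;
        n   -= GULP_LONGS;
    }

    /* Step 3: copy the odd few words. */
    while (n-- > 0)
        *dst++ = *src++;
}
```

Whether a given compiler turns the gulp into anything like a moveml is not guaranteed; the sketch only shows the control structure.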
jab@uokvax.UUCP (02/15/84)
#R:kobold:-27200:uokvax:3000016:000:825
uokvax!jab Feb 12 19:58:00 1984

kobold!tjt proposed a byte-copying mechanism that was basically a loop
that copied N bytes for every iteration of the loop instead of one byte
per iteration.

I'd say that if you get to that particular point, what you do is create
a library routine called "copybytes" that is as machine-dependent as need
be to get the speed you want, but that is retargeted on a per-machine
basis, similar to the "strlen" and "index/rindex" code I've seen written
for the VAX-11 already.

I don't like the idea of adding "magic" control structures in order to
outsmart the compiler, and I like the "asm" hack even less. Why not
segregate the parts of the program that need to run FAST into separate
modules and work on making those modules fast? The above library routine
might be an example; there are many others.

Jeff Bowles
Lisle, IL
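A rough sketch of what such a "copybytes" module might look like in C follows. The post names only the routine; the signature, the build flag COPYBYTES_USE_LIBC, and the fallback loop are all assumptions made for illustration. The point is that callers see one interface, and each port swaps in whatever body is fastest on that machine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One interface for every caller; the body is retargeted per machine.
 * The buffers must not overlap. */
void copybytes(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

#ifdef COPYBYTES_USE_LIBC
    /* Hypothetical build flag: a port that trusts its tuned C library. */
    memcpy(dst, src, n);
#else
    /* Portable fallback: one byte per iteration. A port with a faster
     * machine-specific routine would replace this branch. */
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n-- > 0)
        *d++ = *s++;
#endif
}
```

Keeping the machine-dependent part behind one function means the rest of the program needs no unrolled loops or "asm" of its own.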
wescott@ncrcae.UUCP (Mike Wescott) (02/21/84)
[]
> And of course (?) everyone knows by now (?) that you can get even better
> with a 68000 by using the move-multiple-long (register load/store)
> instructions to eat and spew big gulps.
> .
> .
> .
> (See the code for "blt" that comes with the MIT "C" compiler.)

You can also get into trouble if you're not careful. The version of "blt"
we used here worked just fine, until it pointed out a microcode bug in the
MC68000 for us. The movem instruction, when fetching from memory into
registers, does an extra word read. The extra word is thrown away and
causes no problems, except when the extra read crosses into an unreadable
segment, in which case you get an unwanted "segmentation violation" or
"panic: kernel memory management error."

The situation shows up (if my memory is correct) when the length of the
move is a multiple of 48 bytes and abuts an unreadable segment ("blt" uses
a 12 register movem).

I recoded it and still managed to pick up a little performance.

-Mike Wescott
NCR Corporation, W. Columbia SC
mcnc!ncsu!ncrcae!wescott
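The post does not say how "blt" was recoded, so the following is only one possible guard, sketched in C with hypothetical names (BLOCK_LONGS, safe_blocks, guarded_copy). The idea is to use the 12-long block transfer only while more than one block's worth of data remains. After each block read there is then still source data beyond it, so a stray extra word read stays inside the source buffer. The last 1 to 12 longs always go through the plain one-at-a-time loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 12 registers x 4 bytes = the 48-byte block described in the post. */
#define BLOCK_LONGS 12

/* How many block transfers may be done for n longs so that at least
 * one long of source always remains after each block read. */
size_t safe_blocks(size_t n)
{
    return n == 0 ? 0 : (n - 1) / BLOCK_LONGS;
}

/* Copy n 32-bit longs; the buffers must not overlap. */
void guarded_copy(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    size_t blocks = safe_blocks(n);

    while (blocks-- > 0) {
        /* Stands in for the movem load/store pair. */
        for (size_t i = 0; i < BLOCK_LONGS; i++)
            dst[i] = src[i];
        src += BLOCK_LONGS;
        dst += BLOCK_LONGS;
        n   -= BLOCK_LONGS;
    }

    /* Tail: 1 to 12 longs (or none if n was 0), one at a time. */
    while (n-- > 0)
        *dst++ = *src++;
}
```

With this split, a move of exactly 48 bytes (12 longs) does no block transfer at all, and a move of 96 bytes does one block plus a 12-long tail.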