[net.lang.c] Casting Pointers -- fast *portab

rpw3@fortune.UUCP (02/08/84)

#R:kobold:-27200:fortune:16200020:000:1178
fortune!rpw3    Feb  8 02:25:00 1984

And of course (?) everyone knows by now (?) that you can get even better
with a 68000 by using the move-multiple-long (register load/store)
instructions to eat and spew big gulps.

	1. Save a few regs
	2. While bunches left to do
	   a. gulp into the regs
	   b. spew out to memory
	   c. adjust indices
	3. copy the odd few words.
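
The steps above can be sketched in C, with local variables standing in for the saved registers. This is only an illustration, not the "blt" code: the name gulp_copy and the constant GULP_WORDS are hypothetical, and the choice of 12 longwords per gulp is an assumption taken from the register list d0-d6,a0-a4 mentioned in the post. Whether a compiler actually turns this into move-multiple instructions is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 12 longwords per gulp, matching a d0-d6,a0-a4 register list. */
#define GULP_WORDS 12

/* Copy nwords 32-bit words from src to dst. The regions must not overlap. */
void gulp_copy(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    /* Step 1 (saving registers) is left to the compiler in C. */

    /* Step 2: while bunches are left to do. */
    while (nwords >= GULP_WORDS) {
        /* 2a. Gulp into locals, which stand in for registers. */
        uint32_t r0 = src[0], r1 = src[1], r2  = src[2],  r3  = src[3];
        uint32_t r4 = src[4], r5 = src[5], r6  = src[6],  r7  = src[7];
        uint32_t r8 = src[8], r9 = src[9], r10 = src[10], r11 = src[11];

        /* 2b. Spew out to memory. */
        dst[0] = r0; dst[1] = r1; dst[2]  = r2;  dst[3]  = r3;
        dst[4] = r4; dst[5] = r5; dst[6]  = r6;  dst[7]  = r7;
        dst[8] = r8; dst[9] = r9; dst[10] = r10; dst[11] = r11;

        /* 2c. Adjust indices. */
        src += GULP_WORDS;
        dst += GULP_WORDS;
        nwords -= GULP_WORDS;
    }

    /* Step 3: copy the odd few words. */
    while (nwords-- > 0)
        *dst++ = *src++;
}
```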

Now, that strategy doesn't compare well against loop-unrolled move-long,
since the move long takes care of the indices (movl a1@+,a2@+) and
the moveml doesn't, but the moveml's can be loop-unrolled too! In that
case, each load/store pair has a higher address offset word in the
instruction ("moveml <d0-d6,a0-a4>,a5(offset1)"), and you fix up the
whole loop with two adds at the end. In the limiting case (which you
can get close to attaining while doing buffer-block moves), you only
fetch 8 bytes of instructions for each 40 bytes of data copied (note
that's 80 bytes touched), or just over 10% overhead.
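
A rough C sketch of the unrolled variant follows: each gulp/spew pair addresses memory at a fixed offset from unchanged base pointers, and the pointers are fixed up once per pass with two adds. The names gulp_at and gulp_copy_unrolled are hypothetical, an unroll factor of 2 is chosen only to keep the example short, and 12 longwords per gulp is assumed from the register list in the example instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions: 12 longwords per gulp, 2 gulps per loop pass. */
enum { GULP = 12, UNROLL = 2 };

/* One gulp/spew pair at a fixed offset from the base pointers,
 * playing the role of the offset word in the instruction. */
static void gulp_at(uint32_t *dst, const uint32_t *src, size_t off)
{
    uint32_t r[GULP];   /* stand-in for the register set */
    size_t i;

    for (i = 0; i < GULP; i++)      /* gulp */
        r[i] = src[off + i];
    for (i = 0; i < GULP; i++)      /* spew */
        dst[off + i] = r[i];
}

/* Copy nwords 32-bit words; the regions must not overlap. */
void gulp_copy_unrolled(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    while (nwords >= (size_t)GULP * UNROLL) {
        gulp_at(dst, src, 0);       /* offset 0 */
        gulp_at(dst, src, GULP);    /* higher offset, same base pointers */

        /* Fix up the whole pass with two adds at the end. */
        src += GULP * UNROLL;
        dst += GULP * UNROLL;
        nwords -= (size_t)GULP * UNROLL;
    }

    /* Leftover words that do not fill a whole unrolled pass. */
    while (nwords-- > 0)
        *dst++ = *src++;
}
```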

(See the code for "blt" that comes with the MIT "C" compiler.)

Rob Warnock

UUCP:	{sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD:	(415)595-8444
USPS:	Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065

jab@uokvax.UUCP (02/15/84)

#R:kobold:-27200:uokvax:3000016:000:825
uokvax!jab    Feb 12 19:58:00 1984

kobold!tjt proposed a byte-copying mechanism that was basically a loop
that copied N bytes for every iteration of the loop instead of one byte
per iteration.

I'd say that if you get to that particular point, what you do is create
a library routine called "copybytes" that is as machine-dependent as need
be to get the speed you want, but that is retargeted on a per-machine basis,
similar to the "strlen" and "index/rindex" code I've seen written for the
VAX-11 already.

I don't like the idea of adding "magic" control structures in order to
outsmart the compiler, and I like the "asm" hack even less. Why not segregate
the parts of the program that need to run FAST into separate modules and work
on making those modules fast? The above library routine might be an example;
there are many others.
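
One way such a routine might be laid out is sketched below: callers see a single "copybytes" entry point, and each port may substitute a tuned version behind it. The macro COPYBYTES_HAVE_NATIVE and the function copybytes_native are hypothetical names invented for this sketch; the portable byte loop is only the fallback for a machine with no tuned version.

```c
#include <stddef.h>

/* Hypothetical hook: a port that has a hand-tuned routine defines
 * COPYBYTES_HAVE_NATIVE and supplies copybytes_native, for example
 * in a separate assembly-language module. */
#ifdef COPYBYTES_HAVE_NATIVE
void copybytes_native(char *dst, const char *src, size_t n);
#endif

/* Copy n bytes from src to dst. The regions must not overlap. */
void copybytes(void *dst, const void *src, size_t n)
{
#ifdef COPYBYTES_HAVE_NATIVE
    /* Machine-dependent fast path, retargeted per machine. */
    copybytes_native(dst, src, n);
#else
    /* Portable fallback: one byte per iteration. */
    char *d = dst;
    const char *s = src;

    while (n-- > 0)
        *d++ = *s++;
#endif
}
```

The rest of the program calls copybytes and never needs "asm" or special control structures of its own.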

	Jeff Bowles
	Lisle, IL

wescott@ncrcae.UUCP (Mike Wescott) (02/21/84)

[]
> And of course (?) everyone knows by now (?) that you can get even better
> with a 68000 by using the move-multiple-long (register load/store)
> instructions to eat and spew big gulps.
>        .
>        .
>        .
> (See the code for "blt" that comes with the MIT "C" compiler.)

You can also get into trouble if you're not careful. The version
of "blt" we used here worked just fine, until it pointed out a
microcode bug in the MC68000 for us. The movem instruction, when
fetching from memory into registers, does an extra word read. The
extra word is thrown away and causes no problems, except when the
extra read crosses into an unreadable segment, in which case you
get an unwanted "segmentation violation" or "panic: kernel memory
management error."

The situation shows up (if my memory is correct) when the length
of the move is a multiple of 48 bytes and abuts an unreadable
segment ("blt" uses a 12 register movem).
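
One possible way to sidestep the problem is to limit how many gulps the fast loop performs, so that the extra read always lands inside the source block, and to finish the rest with ordinary moves. The sketch below is not the recoded "blt"; the name safe_gulps is hypothetical, and it assumes the extra read is a single 16-bit word located immediately after the last longword loaded.

```c
#include <stddef.h>

#define GULP_BYTES     48   /* 12 registers of 4 bytes each */
#define OVERREAD_BYTES 2    /* assumption: one extra 16-bit word is read */

/* Return how many 48-byte gulps may be done with movem-style loads
 * such that the extra word read after the final gulp still falls
 * inside the nbytes-long source block. The remaining bytes are
 * expected to be copied with plain moves. */
size_t safe_gulps(size_t nbytes)
{
    if (nbytes < OVERREAD_BYTES)
        return 0;
    return (nbytes - OVERREAD_BYTES) / GULP_BYTES;
}
```

For a 96-byte block this allows one gulp and leaves the last 48 bytes to the ordinary move loop, which is exactly the multiple-of-48 case described above.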

I recoded it and still managed to pick up a little performance.

-Mike Wescott
 NCR Corporation, W. Columbia SC
 mcnc!ncsu!ncrcae!wescott