bb@wjh12.UUCP (byer) (07/11/83)
[ Reference to article umcp-cs.674 ]
Ferst, we shuddn't be so hastey (* see below) to criticize (destructively,
no less) other people's code (``... that routine ... sucks''); as Chris
is surely aware, during development of a large & complex program, the
author is correct to concentrate on the larger problem and leave the
`trivial' details until later. After all, the routine works properly
on the 11 & Vax, and those are the architectures for which Bell licenses
it. If anyone wishes to sling stones (or feces), they should aim at
Unisoft, for they did not bother to verify the compatibility of their
implementation before shipping it. Enough dirt; on to the meat...
I have made further improvements to Chris Torek's submission; the
resulting routine is about 17% faster. Since I don't believe in
`fast enough', I have fiddled with the Vax assembly code for an
additional 25% savings. Considering the many megabytes passing
through this routine daily, such anality might be warranted.
-----
* 3 spelling errors - prerequisite for a news submission, no??
-----
For those with 11's, there are savings to be gained, but smaller.
Only with mods to the assembly code could I get anything really
worthwhile (20%).
Timings:  microsecs per call of chksum(buf, 128)

              original   C. Torek's   below (C)   below (as)
VAX-11/780      2460        2050         1720        1300
PDP-11/44       2500         --          2400        2140
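The post does not say how the figures above were measured. One plausible way to get microseconds per call is to time a large batch of calls and divide by the count. The following is a minimal sketch in modern C, assuming the standard clock() from <time.h>; the helper usec_per_call and the stand-in routine byte_sum are hypothetical names introduced only so the block compiles on its own.

```c
#include <time.h>

/* Stand-in routine with the same shape as chksum(buf, n); NOT the real checksum. */
int byte_sum(const unsigned char *s, int n)
{
    int sum = 0;

    while (n-- > 0)
        sum += *s++;
    return sum;
}

/* Average CPU time per call, in microseconds, over `calls` invocations. */
double usec_per_call(int (*fn)(const unsigned char *, int),
                     const unsigned char *buf, int n, long calls)
{
    volatile unsigned sink = 0;   /* keeps the calls from being optimized away */
    clock_t start, end;
    long i;

    start = clock();
    for (i = 0; i < calls; i++)
        sink += (unsigned)fn(buf, n);
    end = clock();
    (void)sink;
    return (double)(end - start) * 1e6 / (double)CLOCKS_PER_SEC / (double)calls;
}
```

Passing the routine under test in place of byte_sum, with n = 128 and a call count large enough to swamp the clock resolution, should give numbers comparable in kind (not in magnitude) to the table.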
The modified code follows:
For the Vax:
chksum (s, n)
register char *s;
register n;
{
              register sum, x;
              register unsigned t;

              sum = -1;
              x = 0;
              do {
                  /* Rotate left, copying bit 15 to bit 0 */
                  sum <<= 1;
/* NOTE #1 */     if (sum & 0x10000) {
                      sum &= 0xffff;
                      sum++;
                  }
                  t = sum & 0xffff;       /* low 16 bits only: x can leave */
                                          /* stray high bits in sum */
/* NOTE #2 */     sum += (*s++ & 0377);
/* NOTE #2 */     sum &= 0xffff;
                  x += sum ^ n;
                  if ((unsigned)sum <= t) /* (unsigned) not necessary */
                      sum ^= x;           /* but doesn't hurt */
              } while (--n > 0);
/* NOTE #3             ^^ */
              return (int) (short) sum;
}
NOTES:
1. All the savings are here; you figure it out.
2. This simplification didn't make any difference on the Vax,
but probably would on other architectures/optimizers.
3. Surprisingly, this very common loop terminator generated
less than optimal code. (dec r10; jgtr Lxx instead of
sobgtr r10,Lxx )
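NOTE #1 leaves the reader to work out the trick. One reading is that sum is kept in a full-width register, so the rotate becomes a plain shift plus a test of bit 16, instead of a sign test on a 16-bit quantity. The sketch below, in modern C with <stdint.h> types, compares the two forms. The names chksum16, chksum_wide, to_signed16 and check_equivalence are hypothetical, and the 16-bit version is a reconstruction of the rotate-then-add logic, not a copy of the original routine. Note that t is masked to 16 bits in the wide form, since x is allowed to grow past 16 bits and can leak high bits into sum.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low 16 bits as a signed short, like (int)(short)sum. */
static int to_signed16(uint32_t v)
{
    v &= 0xffffu;
    return v >= 0x8000u ? (int)v - 0x10000 : (int)v;
}

/* 16-bit form: rotate left by one, bit 15 wraps around to bit 0. */
int chksum16(const unsigned char *s, int n)
{
    uint16_t sum = 0xffff, x = 0, t;

    do {
        sum = (uint16_t)((sum << 1) | (sum >> 15));
        t = sum;
        sum = (uint16_t)(sum + *s++);
        x = (uint16_t)(x + (sum ^ (uint16_t)n));
        if (sum <= t)
            sum ^= x;
    } while (--n > 0);
    return to_signed16(sum);
}

/* Wide form: shift, then fold the bit that landed in position 16 back in. */
int chksum_wide(const unsigned char *s, int n)
{
    uint32_t sum = 0xffffffffu, x = 0, t;

    do {
        sum <<= 1;
        if (sum & 0x10000u) {
            sum &= 0xffffu;
            sum++;
        }
        t = sum & 0xffffu;              /* compare on the low 16 bits only */
        sum = (sum + *s++) & 0xffffu;
        x += sum ^ (uint32_t)n;
        if (sum <= t)
            sum ^= x;                   /* may set bits above 15; harmless */
    } while (--n > 0);
    return to_signed16(sum);
}

/* Compare both forms on pseudo-random buffers of every length up to 128. */
void check_equivalence(void)
{
    unsigned char buf[128];
    uint32_t seed = 1;
    int round, i, n;

    for (round = 0; round < 200; round++) {
        for (i = 0; i < 128; i++) {
            seed = seed * 1103515245u + 12345u;
            buf[i] = (unsigned char)(seed >> 16);
        }
        for (n = 1; n <= 128; n++)
            assert(chksum16(buf, n) == chksum_wide(buf, n));
    }
}
```

Calling check_equivalence() from a test driver exercises both forms; both assume 1 <= n < 65536.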
Warning: hard-core hacking below -- (but it's worth an extra 25%)
[ cf: cc -S -O chksum.c (extracted from pk0.c) ]
        .align  1
        .globl  _chksum
        .set    L25,0xf80
        .data
        .text
_chksum:.word   L25
        movl    4(ap),r11
        movl    8(ap),r10
        mnegl   $1,r9
        clrl    r8
L31:    ashl    $1,r9,r9
        jbc     $16,r9,L32
        movzwl  r9,r9
        incl    r9
L32:    movl    r9,r7
        movzbl  (r11)+,r0
        addw2   r0,r9
        xorl3   r10,r9,r0
        addl2   r0,r8
        cmpl    r9,r7
        jgtru   L30
        xorl2   r8,r9
L30:    sobgtr  r10,L31
        cvtwl   r9,r0
        ret
( Can you squeeze another microsecond out? )
------
For the 11:
In C, the only change to be made is in the loop control, so as to
execute the sob (an instruction, not a cry for capital punishment).
Replace
        do {
            ...
        } while (--n > 0);
with
        for (n++; --n > 0; ) {
            ...
        }
That's worth a measly 4%
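Any change to the loop control has to keep the trip count (and the values of n seen by the body) the same, or the checksum changes. The following is a small sketch in modern C with hypothetical helper names that counts body executions for each form. For n >= 1, the do/while form and a for loop tested with `> 0` both run the body n times, while a `>= 0` test after the initial n++ runs it n + 1 times.

```c
/* Body executions of: do { ... } while (--n > 0); */
int iterations_do_while(int n)
{
    int count = 0;

    do {
        count++;
    } while (--n > 0);
    return count;
}

/* Body executions of: for (n++; --n > 0; ) { ... } */
int iterations_for_gt(int n)
{
    int count = 0;

    for (n++; --n > 0; )
        count++;
    return count;
}

/* Body executions of: for (n++; --n >= 0; ) { ... }  (one extra pass) */
int iterations_for_ge(int n)
{
    int count = 0;

    for (n++; --n >= 0; )
        count++;
    return count;
}
```

The `> 0` form also matches the listing's inc/sob pair, which falls out of the loop as soon as the counter reaches zero.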
For a 20% saving, use the following assembly code:
[ cf: cc -S -O chksum.c ]
        .globl  _chksum
_chksum:
~~chksum:
        jsr     r5,csv
        mov     4(r5),r4
~s=r4
        mov     6(r5),r3
~n=r3
        sub     $4,sp
~sum=r2
~t=r1
~x=177766
        mov     $-1,r2
        clr     -12(r5)
        inc     r3
        jbr     L13
L20005: tst     r2
        jge     L15
        asl     r2
        inc     r2
        jbr     L16
L15:    asl     r2
L16:    mov     r2,r1
        movb    (r4)+,r0
        bic     $-400,r0
        add     r0,r2
        mov     r3,r0
        xor     r2,r0
        add     r0,-12(r5)
        cmp     r1,r2
        jlo     L13
        mov     -12(r5),r0
        xor     r0,r2
L13:    sob     r3,L20005
        mov     r2,r0
        jmp     cret
-------
Brent Byer ``I think we're all bozos on this bus.''
Textware Intl. (decvax!genrad!wjh12!textware!brent)

padpowell@wateng.UUCP (PAD Powell[Admin]) (07/13/83)
When a hacker is stung, he really can do it. Personally, I wonder if
anybody is going to go out and discover that the checksum thing can be
done by peeking in a register in an HDLC chip, and then building a
board... Sigh.
Patrick Powell