bb@wjh12.UUCP (byer) (07/11/83)
[ Reference to article umcp-cs.674 ]

Ferst, we shuddn't be so hastey (* see below) to criticize
(destructively, no less) other people's code (``... that routine ...
sucks''); as Chris is surely aware, during development of a large &
complex program, the author is correct to concentrate on the larger
problem and leave the `trivial' details until later.  After all, the
routine works properly on the 11 & Vax, and those are the architectures
for which Bell licenses it.  If anyone wishes to sling stones (or
feces), they should aim at Unisoft, for they did not bother to verify
the compatibility of their implementation before shipping it.

Enough dirt; on to the meat...

I have made further improvements to Chris Torek's submission; the
resulting routine is about 17% faster.  Since I don't believe in `fast
enough', I have fiddled with the Vax assembly code for an additional
25% savings.  Considering the many megabytes passing through this
routine daily, such anality might be warranted.

-----
* 3 spelling errors - prerequisite for a news submission, no??
-----

For those with 11's, there are savings to be gained, but smaller.
Only with mods to the assembly code could I get anything really
worthwhile (20%).

Timings:   microsecs per call of chksum(buf, 128)

                original   C. Torek's   below (C)   below (as)
VAX-11/780        2460        2050         1720        1300
PDP-11/44         2500         --          2400        2140

The modified code follows:

For the Vax:

chksum (s, n)
register char *s;
register n;
{
        register sum, x;
        register unsigned t;

        sum = -1;
        x = 0;
        do {
                /* Rotate left, copying bit 15 to bit 0 */
                sum <<= 1;                      /* NOTE #1 */
                if (sum & 0x10000) {
                        sum &= 0xffff;
                        sum++;
                }
                t = sum;                        /* NOTE #2 */
                sum += (*s++ & 0377);           /* NOTE #2 */
                sum &= 0xffff;
                x += sum ^ n;
                if ((unsigned)sum <= t)         /* (unsigned) not necessary */
                        sum ^= x;               /* but doesn't hurt */
        } while (--n > 0);
      /* NOTE #3 ^^ */
        return (int) (short) sum;
}

NOTES:
  1. All the savings are here; you figure it out.
  2. This simplification didn't make any difference on the Vax,
     but probably would on other architectures/optimizers.
  3. Surprisingly, this very common loop terminator generated
     less than optimal code.
       (dec r10; jgtr Lxx   instead of   sobgtr r10,Lxx )

Warning: hard-core hacking below -- (but it's worth an extra 25%)

[ cf: cc -S -O chksum.c   (extracted from pk0.c) ]

        .align  1
        .globl  _chksum
        .set    L25,0xf80
        .data
        .text
_chksum:.word   L25
        movl    4(ap),r11
        movl    8(ap),r10
        mnegl   $1,r9
        clrl    r8
L31:    ashl    $1,r9,r9
        jbc     $16,r9,L32
        movzwl  r9,r9
        incl    r9
L32:    movl    r9,r7
        movzbl  (r11)+,r0
        addw2   r0,r9
        xorl3   r10,r9,r0
        addl2   r0,r8
        cmpl    r9,r7
        jgtru   L30
        xorl2   r8,r9
L30:    sobgtr  r10,L31
        cvtwl   r9,r0
        ret

( Can you squeeze another microsecond out? )

------

For the 11:

In C, the only change to be made is in the loop control, so as to
execute the sob (an instruction, not a cry for capital punishment).

Replace
        do { ... } while (--n > 0);
with
        for (n++; --n >= 0; ) { ... }

That's worth a measly 4%

For a 20% saving, use the following assembly code:

[ cf: cc -S -O chksum.c ]

.globl  _chksum
_chksum:
~~chksum:
        jsr     r5,csv
        mov     4(r5),r4
~s=r4
        mov     6(r5),r3
~n=r3
        sub     $4,sp
~sum=r2
~t=r1
~x=177766
        mov     $-1,r2
        clr     -12(r5)
        inc     r3
        jbr     L13
L20005: tst     r2
        jge     L15
        asl     r2
        inc     r2
        jbr     L16
L15:    asl     r2
L16:    mov     r2,r1
        movb    (r4)+,r0
        bic     $-400,r0
        add     r0,r2
        mov     r3,r0
        xor     r2,r0
        add     r0,-12(r5)
        cmp     r1,r2
        jlo     L13
        mov     -12(r5),r0
        xor     r0,r2
L13:    sob     r3,L20005
        mov     r2,r0
        jmp     cret

-------
Brent Byer      ``I think we're all bozos on this bus.''
Textware Intl.  (decvax!genrad!wjh12!textware!brent)
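
NOTE #1 above leaves the rotate as an exercise.  What follows is only a
sketch, in present-day C, of what the shift-and-test form appears to
rely on: after shifting a 16-bit value left inside a wider register,
bit 16 holds the old bit 15, so a single test plus an increment
completes the rotate.  The names rotl16_plain, rotl16_shift_test,
chksum_sketch and self_check are hypothetical, and the fixed-width
types are an assumption made for portability.  It keeps every
intermediate value to 16 bits, so it is not claimed to reproduce the
posted routine (or pk0.c) bit for bit.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional rotate-left-by-one of a 16-bit value. */
static uint16_t rotl16_plain(uint16_t v)
{
    return (uint16_t)((v << 1) | (v >> 15));
}

/* Shift-then-test form: shift in a wider register, then look at bit 16,
 * which now holds the old bit 15.  If it is set, mask back to 16 bits
 * and add one to bring the bit around into position 0. */
static uint16_t rotl16_shift_test(uint16_t v)
{
    uint32_t w = (uint32_t)v << 1;

    if (w & 0x10000u) {
        w &= 0xffffu;
        w++;
    }
    return (uint16_t)w;
}

typedef uint16_t (*rot_fn)(uint16_t);

/* Hypothetical rotate-and-add checksum with the same loop shape as the
 * routine in the post, but with all state held in 16-bit variables.
 * Like the posted loop, it assumes n >= 1. */
static int chksum_sketch(const unsigned char *s, int n, rot_fn rot)
{
    uint16_t sum = 0xffffu;
    uint16_t x = 0;
    uint16_t t;

    do {
        sum = rot(sum);                 /* rotate left one bit */
        t = sum;
        sum = (uint16_t)(sum + *s++);   /* add the next byte, mod 2^16 */
        x = (uint16_t)(x + (sum ^ (uint16_t)n));
        if (sum <= t)                   /* the add wrapped, or the byte was 0 */
            sum ^= x;
    } while (--n > 0);

    return (int)(int16_t)sum;
}

/* Exhaustive check that the two rotate forms agree on every input. */
static void self_check(void)
{
    uint32_t v;

    for (v = 0; v <= 0xffffu; v++)
        assert(rotl16_plain((uint16_t)v) == rotl16_shift_test((uint16_t)v));
}
```

Calling self_check() and then running chksum_sketch() over the same
buffer with each rotate function should give identical results.  That
only shows the two rotate forms are interchangeable; it says nothing
about the speed of either on a given compiler.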
padpowell@wateng.UUCP (PAD Powell[Admin]) (07/13/83)
When a hacker is stung, he really can do it.  Personally, I wonder if
anybody is going to go out and discover that the checksum thing can be
done by peeking in a register in an HDLC chip, and then building a
board...

Sigh.

Patrick Powell