[comp.protocols.tcp-ip] TCP checksum unrolling

geof@imagen.UUCP (Geof Cooper) (10/07/87)

Here is one, undebugged, that illustrates the concept.  It
also uses the trick that (if you have 32-bit arithmetic) you can
add the 16-bit words with ordinary 32-bit two's complement addition
and add all the carries back in at the end (another trick that is
sometimes faster is to generate a 32-bit one's complement sum and
then add the top and bottom halves together to get the 16-bit sum).
Some C compilers won't accept the weird syntax below; or maybe I
should point out, as you retch on the floor, that there is at least
ONE C compiler that DOES accept this syntax.

It is trivial to code it for all C compilers -- but
what you really want to do is code the exact intent of the
following into assembly language.  That makes it a lot faster
to add the two halves of a 32-bit word.
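
For comparison, the "all C compilers" version might look like the sketch below: a plain loop with no unrolling, using the same 32-bit accumulator and final fold as the routine that follows. The name checksum_plain and the use of <stdint.h> types are assumptions made for illustration; they are not from the original post.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference version: sum nwords 16-bit words in a 32-bit
 * accumulator, then fold the carries back into the low 16 bits.
 * Assumes nwords is small enough (at most 65536) that the 32-bit
 * accumulator cannot overflow.
 */
uint16_t checksum_plain(const uint16_t *p, size_t nwords)
{
    uint32_t sum = 0;

    while (nwords-- > 0)
        sum += *p++;

    /* The second fold catches a carry produced by the first. */
    sum = (sum >> 16) + (sum & 0xffff);
    sum = (sum >> 16) + (sum & 0xffff);

    return (uint16_t)sum;
}
```

Like the routine in the post, this returns the folded sum itself, not its complement.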

These tricks don't work for XNS checksums.  Our experience is
that this difference alone makes our XNS implementation a little
slower than our TCP implementation on a 68000.

- Geof

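unsigned short
checksum(p, n)
    unsigned short *p;      /* data to sum, as 16-bit words */
    short n;                /* number of 16-bit words */
{
    short nloop;            /* passes through the unrolled loop */
    short nrem;             /* leftover words, added on the first pass */
    unsigned long sum;      /* 32-bit accumulator; carries collect in the top half */

    sum = 0;
    if ( n > 0 ) {
        nloop = (n >> 3) + 1;
        nrem  = n & 7;

        /* Jump into the loop body so the first pass adds only nrem words. */
        switch ( nrem ) {

            do {
                    sum += *p++;
                case 7:
                    sum += *p++;
                case 6:
                    sum += *p++;
                case 5:
                    sum += *p++;
                case 4:
                    sum += *p++;
                case 3:
                    sum += *p++;
                case 2:
                    sum += *p++;
                case 1:
                    sum += *p++;
                case 0:
                    ;       /* a label must be followed by a statement */
            } while ( --nloop > 0 );
        }
    }

    /* Fold the carries back in; the second fold catches a carry from the first. */
    sum = (sum >> 16) + (sum & 0xffff);
    sum = (sum >> 16) + (sum & 0xffff);

    return ( (unsigned short) sum );
}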
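
The other trick mentioned above (build a 32-bit one's complement sum, then add the top and bottom halves) could be sketched roughly as follows. The names oc_add32 and oc_sum16_wide are hypothetical, and portable C has to detect the carry by comparison; in assembly this would normally be an add-with-carry instruction. Treat it as an illustration of the idea, not a tuned implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* One's complement 32-bit add: a carry out of bit 31 is added back in. */
static uint32_t oc_add32(uint32_t a, uint32_t b)
{
    uint32_t s = a + b;

    return s + (s < a);     /* s < a means the addition wrapped */
}

/*
 * Hypothetical sketch: consume the data two 16-bit words at a time as
 * one 32-bit value, keep a 32-bit one's complement sum, then add the
 * two halves together to get the 16-bit sum.
 */
uint16_t oc_sum16_wide(const uint16_t *p, size_t nwords)
{
    uint32_t sum = 0;

    while (nwords >= 2) {
        uint32_t w = ((uint32_t)p[0] << 16) | p[1];

        sum = oc_add32(sum, w);
        p += 2;
        nwords -= 2;
    }
    if (nwords > 0)         /* odd word left over */
        sum = oc_add32(sum, *p);

    /* Add top and bottom halves; repeat once to absorb the carry. */
    sum = (sum >> 16) + (sum & 0xffff);
    sum = (sum >> 16) + (sum & 0xffff);

    return (uint16_t)sum;
}
```

Which half a word lands in does not matter, because 2^16 is congruent to 1 modulo 2^16 - 1, so the folded result agrees with summing the words one at a time.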

henry@utzoo.UUCP (10/21/87)

> ... Some C compilers won't accept the weird syntax below; or maybe I
> should point out, as you retch on the floor, that there is at least ONE
> C compiler that DOES accept this syntax.

This particular piece of ugliness (switch labels inside a loop, for loop
unrolling) is known as Duff's Device, and is legitimate C that any correct
C compiler is supposed to accept.
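
A stripped-down illustration of the construct, unrolled four ways instead of eight, might look like this. The function duff_sum is a hypothetical example written for this purpose; it is not taken from either post.

```c
#include <stddef.h>

/*
 * Hypothetical example: sum n ints with a 4-way unrolled loop.
 * The switch jumps into the middle of the do-while body, so the
 * first pass handles the n % 4 leftover elements and every later
 * pass handles a full group of four.
 */
long duff_sum(const int *p, size_t n)
{
    long sum = 0;
    size_t passes = (n + 3) / 4;    /* trips through the do-while */

    if (n == 0)
        return 0;

    switch (n % 4) {
    case 0: do { sum += *p++;
    case 3:      sum += *p++;
    case 2:      sum += *p++;
    case 1:      sum += *p++;
            } while (--passes > 0);
    }

    return sum;
}
```

The case labels are ordinary labeled statements inside the switch body, which is why they may sit inside the nested loop; each label here is followed by a statement, as the language requires.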

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry