[comp.protocols.tcp-ip] Unrolling TCP Checksum Loop

MAB@CORNELLC.BITNET (Mark Bodenstein) (10/06/87)
I received quite a few responses to my question about unrolling the TCP
checksum loop.  Thanks to all who replied.

Probably the most elegant solution came from Al Marshall at Proteon.
I've taken the liberty of including it here:

--------------------
Mark,
The simple way to unroll the loops (which is what I am doing now for the Novell
drivers for ProNET-10) is to divide the input packet size (from the registers)
by the length of the unrolled loop.  The quotient is the number of full
iterations to run AFTER jumping into the appropriate point of the loop to
handle the remainder.  The "appropriate point of the loop" is found from the
remainder times the length in bytes of each "move" instruction.

There are other ways to accomplish the same thing, such as doing the whole
number part first and then running a short one-item-per-pass loop for the
remainder.

Hope this helps you.  The actual speed improvement is significant because of
the way the 8086, 8088, 286, and 386 pre-fetch instruction bytes.  Each time
a branch is taken the pre-fetch queue is dumped and has to be refilled, so a
tight loop pays that cost on every pass.  You could see as much as 5 or 10
times the speed for things like this.
    -Al Marshall, Proteon
--------------------
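
For anyone who wants to see the shape of this, here is a rough C sketch of
the jump-into-the-loop scheme Al describes.  It is not his code: the function
names sum_words_unrolled and finish_checksum, the unroll factor of 8, and the
32-bit accumulator are all my own assumptions for illustration.  In C the
switch statement stands in for the computed jump; in assembler you would
compute the entry offset from the remainder yourself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum 16-bit words into a 32-bit accumulator, unrolled 8 times.
 * The switch plays the role of the computed jump: it enters the
 * unrolled body at the case that handles the remainder, and every
 * later pass runs the full body.  (Hypothetical helper, assumes
 * fewer than 65537 words so the accumulator cannot overflow.) */
uint32_t sum_words_unrolled(const uint16_t *p, size_t nwords)
{
    uint32_t sum = 0;
    size_t passes;

    assert(p != NULL || nwords == 0);
    if (nwords == 0)
        return 0;

    passes = (nwords + 7) / 8;      /* full passes plus one partial pass */
    switch (nwords % 8) {           /* remainder selects the entry point */
    case 0: do { sum += *p++;
    case 7:      sum += *p++;
    case 6:      sum += *p++;
    case 5:      sum += *p++;
    case 4:      sum += *p++;
    case 3:      sum += *p++;
    case 2:      sum += *p++;
    case 1:      sum += *p++;
            } while (--passes > 0);
    }
    return sum;
}

/* Fold the carries back into the low 16 bits and complement the result. */
uint16_t finish_checksum(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

With 10 words, for example, the switch enters at "case 2", adds two words,
and then makes one full pass of eight.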
Additional refinements came from CLYNN at BBN and from David C. Plummer
at Symbolics:

From CLYNN (paraphrased):

  - Use as much accumulator width as you've got, to process as many
    bytes at a time as possible.

  - For retransmissions, don't recompute the entire checksum, just
    subtract out the old header checksum and add in the new.
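
Both of these suggestions can be sketched in C roughly as follows.  This is
my own illustration, not CLYNN's code: the names fold16, sum16_narrow,
sum16_wide and checksum_update are made up, the buffer length is assumed to
be even, and a 64-bit accumulator is assumed to be available.  The update
step uses the ones'-complement identity HC' = ~(~HC + ~old + new), where
"old" and "new" are the folded sums of the part that changed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a wide ones'-complement sum down to 16 bits. */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* Baseline: add one 16-bit word per step. */
uint16_t sum16_narrow(const unsigned char *buf, size_t len)
{
    uint64_t acc = 0;
    uint16_t w16;

    assert(len % 2 == 0);
    for (; len >= 2; buf += 2, len -= 2) {
        memcpy(&w16, buf, 2);       /* avoids alignment assumptions */
        acc += w16;
    }
    return fold16(acc);
}

/* Same sum, taken 32 bits per step into a 64-bit accumulator.
 * Folding at the end makes the result equal to the narrow version. */
uint16_t sum16_wide(const unsigned char *buf, size_t len)
{
    uint64_t acc = 0;
    uint32_t w32;
    uint16_t w16;

    assert(len % 2 == 0);
    while (len >= 4) {
        memcpy(&w32, buf, 4);
        acc += w32;
        buf += 4;
        len -= 4;
    }
    if (len == 2) {                 /* one trailing 16-bit word */
        memcpy(&w16, buf, 2);
        acc += w16;
    }
    return fold16(acc);
}

/* New checksum from the old checksum field plus the folded sums of the
 * old and new header: HC' = ~(~HC + ~old + new). */
uint16_t checksum_update(uint16_t old_check, uint16_t old_hdr_sum,
                         uint16_t new_hdr_sum)
{
    uint32_t sum = (uint16_t)~old_check;

    sum += (uint16_t)~old_hdr_sum;
    sum += new_hdr_sum;
    return (uint16_t)~fold16(sum);
}
```

The point of checksum_update is that a retransmission only has to sum the
header again; the data bytes, which are usually most of the segment, are
never touched.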

From David C. Plummer at Symbolics (paraphrased):

   - Make the unrolled loop length a power of 2.  Then the calculations
     of the quotient and remainder (see above) become a "shift" and an
     "and", respectively.
---------
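
To make the power-of-2 point concrete, here is a small C sketch, again my
own and not David's code.  The names UNROLL_SHIFT, UNROLL and sum_words_pow2
are hypothetical, and the unroll factor of 8 is just an example.  It uses
the other arrangement Al mentions: full unrolled passes first, then a short
loop for the leftover words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNROLL_SHIFT 3                      /* log2 of the unroll factor */
#define UNROLL       (1u << UNROLL_SHIFT)   /* 8 adds per pass */

/* Sum 16-bit words; quotient and remainder come from a shift and an
 * and instead of a divide.  (Assumes fewer than 65537 words so the
 * 32-bit accumulator cannot overflow.) */
uint32_t sum_words_pow2(const uint16_t *p, size_t nwords)
{
    uint32_t sum = 0;
    size_t passes = nwords >> UNROLL_SHIFT;     /* quotient:  nwords / 8 */
    size_t rest   = nwords & (UNROLL - 1);      /* remainder: nwords % 8 */

    assert(p != NULL || nwords == 0);

    while (passes-- > 0) {                      /* full unrolled passes */
        sum += p[0]; sum += p[1]; sum += p[2]; sum += p[3];
        sum += p[4]; sum += p[5]; sum += p[6]; sum += p[7];
        p += UNROLL;
    }
    while (rest-- > 0)                          /* leftover words */
        sum += *p++;
    return sum;
}
```

On a processor where divide is slow, replacing it with a shift and a mask
keeps the setup cost from eating into what the unrolling saves on short
packets.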

Mark Bodenstein    (mab%cornellc.bitnet@wiscvm.wisc.edu)
Cornell University