MAB@CORNELLC.BITNET (Mark Bodenstein) (10/06/87)
I received quite a few responses to my question about unrolling the TCP checksum loop. Thanks to all who replied.

Probably the most elegant solution came from Al Marshall at Proteon. I've taken the liberty of including it here:

--------------------
Mark,

The simple way to unroll the loops (which is what I am doing now for the Novell drivers for ProNET-10) is to divide the input packet size (from the registers) by the length of the unrolled loop. The quotient is the number of full iterations to run AFTER jumping into the appropriate point of the loop to handle the remainder. The "appropriate point of the loop" is found from the remainder times the length in bytes of each "move" instruction.

There are other ways to accomplish the same thing, such as doing the whole-number part first and then running a short one-item-per-pass loop for the remainder.

Hope this helps you. The actual speed improvement is significant because of the way the 8086, 8088, 286, and 386 prefetch instruction bytes. On each branch instruction the prefetch queue is dumped and has to be refilled before execution continues. You could see as much as 5 or 10 times the speed for things like this.

-Al Marshall, Proteon
--------------------

Additional refinements came from CLYNN at BBN and from David C. Plummer at Symbolics:

From CLYNN (paraphrased):

- Use as much accumulator width as you've got, to process as many bytes at a time as possible.
- For retransmissions, don't recompute the entire checksum; just subtract out the old header checksum and add in the new.

From David C. Plummer at Symbolics (paraphrased):

- Make the unrolled loop length a power of 2. Then the calculation of the quotient and remainder (see above) becomes a "shift" and an "and", respectively.

---------
Mark Bodenstein (mab%cornellc.bitnet@wiscvm.wisc.edu)
Cornell University
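
For readers who want to see the suggestions above in one place, here is a rough C sketch. It is not code from any of the respondents, and every name in it (cksum_unrolled, cksum_replace_word, UNROLL_SHIFT, and so on) is made up for illustration. It assumes the data is a whole number of 16-bit words already in the order they should be summed, and it returns the folded ones-complement sum without the final complement. The jump into the middle of the loop is done with a C switch (a variant of the idiom usually called Duff's device) rather than a computed jump address as in the assembly version Al describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Unroll length is a power of 2, so quotient and remainder are a
 * shift and an AND. The value 8 is an arbitrary choice for this sketch. */
enum { UNROLL_SHIFT = 3, UNROLL = 1 << UNROLL_SHIFT };

/* Fold a wide running sum down to 16 bits with end-around carry. */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Reference version: one add and one branch per word. */
uint16_t cksum_simple(const uint16_t *p, size_t nwords)
{
    uint64_t sum = 0;
    while (nwords-- > 0)
        sum += *p++;
    return fold16(sum);
}

/* Unrolled version: jump into the loop body to handle the remainder,
 * then run the full body once per unit of the quotient. The wide
 * accumulator lets carries pile up and be folded once at the end. */
uint16_t cksum_unrolled(const uint16_t *p, size_t nwords)
{
    uint64_t sum = 0;
    size_t passes = nwords >> UNROLL_SHIFT;   /* quotient  */
    size_t rem = nwords & (UNROLL - 1);       /* remainder */

    switch (rem) {
    case 0:
        while (passes-- > 0) {
                 sum += *p++;
        case 7:  sum += *p++;
        case 6:  sum += *p++;
        case 5:  sum += *p++;
        case 4:  sum += *p++;
        case 3:  sum += *p++;
        case 2:  sum += *p++;
        case 1:  sum += *p++;
        }
    }
    return fold16(sum);
}

/* Incremental update: in ones-complement arithmetic, subtracting the
 * old word is the same as adding its complement. Note that 0x0000 and
 * 0xFFFF both represent zero, so the result can differ from a full
 * recomputation in that one representation. */
uint16_t cksum_replace_word(uint16_t sum, uint16_t old_word, uint16_t new_word)
{
    return fold16((uint64_t)sum + (uint16_t)~old_word + new_word);
}

/* Usage example: returns the number of mismatches against the
 * reference loop (expected to be 0). Test data is arbitrary. */
int cksum_demo(void)
{
    uint16_t buf[21];
    int bad = 0;

    for (size_t i = 0; i < 21; i++)
        buf[i] = (uint16_t)(0x1234u * (i + 1) + 7u);
    for (size_t n = 0; n <= 21; n++)
        bad += (cksum_unrolled(buf, n) != cksum_simple(buf, n));

    uint16_t before = cksum_simple(buf, 21);
    uint16_t old_word = buf[3];
    buf[3] = 0xBEEFu;
    bad += (cksum_replace_word(before, old_word, buf[3]) != cksum_simple(buf, 21));
    return bad;
}
```

A real TCP checksum routine would also have to deal with an odd trailing byte, the pseudo-header, and byte order, all of which are left out here. Whether the unrolled form is actually faster depends on the processor and compiler, so it is worth timing both versions before committing to one.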