rhorn@infinet.UUCP (Rob Horn) (04/15/88)
The practical engineering decision that CRCs are needed rests on three sets of assumptions: how SLIP will be used, the environment it will be used in, and cost (time, money, etc.).

To restate the often discovered and forgotten: IP checksums are mostly a network performance optimization. Transmission errors that escape IP checksums usually escape TCP checksums as well, and vice versa. (Note: I am already assuming that those concerned with reliability use TCP. This is only 95% correct.) So adding checksums to SLIP must be justified on network performance, not error detection. My concern is error detection.

I think Rick Adams' major mistake is in assuming that SLIP will only be used over voice channels, and hence over modems. The errors characteristic of voice channel + modem are dropouts and random error bursts. A checksum is almost as good as a CRC at detecting these, so TCP works well.

My assumption is that SLIP will be used over not only voice channels, but also muxes of many kinds, data PBXs, etc. The errors characteristic of these are dropouts, random error bursts, duplications, and transpositions. Checksums are very vulnerable to transpositions. Any even-stride byte transposition will escape detection, because the checksum is a sum of 16-bit words and swapping two bytes that occupy the same half of their words leaves that sum unchanged. Much more serious, TCP will probably also fail to detect it. I have experienced such errors with checksummed links through failing multiplexors. I think that a failure which escapes TCP is very serious, and an option to detect it is very important. CRCs will catch transpositions.

I only ask that CRC be available as a checking option. I see no need to burden a simple system with LAPB or whatever. If the CRC fails, just trash the packet. The cost of this is minor: a few days of programming and a few instructions per byte.

If you look at costs further, a more cost-effective way to enhance reliability is forward error correction (FEC). Instead of retransmitting, spend cycles to fix damaged packets.
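
To make the transposition point above concrete, here is a minimal sketch in Python. It is an illustration, not code from any SLIP implementation. The function names and the sample packet are hypothetical, and the CRC used is the 16-bit CCITT CRC that Python's standard `binascii.crc_hqx` provides, which is only one possible choice of CRC.

```python
import binascii

def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum of the kind used by IP and TCP."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:               # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def swap_bytes(data: bytes, i: int, j: int) -> bytes:
    """Return a copy of data with the bytes at offsets i and j exchanged."""
    buf = bytearray(data)
    buf[i], buf[j] = buf[j], buf[i]
    return bytes(buf)

# Hypothetical 32-byte packet with all-distinct bytes, so every swap changes it.
packet = bytes(range(1, 33))
even = swap_bytes(packet, 2, 6)      # stride 4: both bytes are high bytes of a word
odd = swap_bytes(packet, 2, 5)       # stride 3: a high byte trades with a low byte

for name, damaged in (("even stride", even), ("odd stride", odd)):
    checksum_catches = internet_checksum(damaged) != internet_checksum(packet)
    crc_catches = binascii.crc_hqx(damaged, 0xFFFF) != binascii.crc_hqx(packet, 0xFFFF)
    print(name, "checksum catches:", checksum_catches, "CRC catches:", crc_catches)
```

In this sketch the even-stride swap leaves the checksum untouched while the CRC changes, which is the failure mode a CRC option would guard against.
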
FEC is worthwhile whenever its incremental cost is less than the cost of a retransmission. For a small system, time is what counts. Ten thousand instructions is nothing when compared to a retransmission. For a multi-user host, the tradeoff is not obvious.

I have been contemplating a two-way interleaved (255,253) Reed-Solomon FEC with assumed nulls for message length matching. This code would fix:

a) any single erroneous byte
b) any two erroneous bytes with odd stride (including transposition)

Other errors would result in either incorrect fixes or error detection. The incorrect fixes will trigger TCP checksum error detection, including transposition cases.

My first complexity estimate is that this FEC would take about twice as much CPU as a 16-bit CRC (for undamaged packets) and would add 32 bits of FEC per 506 bytes of message instead of 16 bits of CRC. More robust FECs exist if someone has good data on what errors need correction.

This FEC alone will make a very big difference to a marginal voice or asynchronous digital channel. Letting the CPU spin a dozen extra instructions per byte can be cheaper than the price difference between an error-correcting modem and a normal modem.

I have implemented LAPB on asynchronous links, and I think that FEC+IP+TCP would be faster, more reliable, and less software.

--
Rob Horn
UUCP: ...harvard!adelie!infinet!rhorn
      ...ulowell!infinet!rhorn, ..decvax!infinet!rhorn
Snail: Infinet, 40 High St., North Andover, MA
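
As a rough illustration of the two-way interleaved (255,253) code described in the post, the following Python sketch shows why odd-stride byte pairs become two independent single-byte errors. It is a sketch under stated assumptions, not the author's design: the field polynomial (0x11D), the choice of code roots (alpha^0 and alpha^1), the placement of the parity bytes, the wire layout, and all function names are assumptions made for the example. Short messages are handled by treating the missing bytes as nulls, which contribute nothing to the syndromes.

```python
# GF(2^8) antilog/log tables. The primitive polynomial 0x11D is an assumed choice.
EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 510):
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_div(a, b):                       # b must be nonzero
    return 0 if a == 0 else EXP[LOG[a] - LOG[b] + 255]

def syndromes(word):
    """S0 = sum of bytes, S1 = sum of byte * alpha^position (sums are XOR)."""
    s0 = s1 = 0
    for pos, byte in enumerate(word):
        s0 ^= byte
        s1 ^= gf_mul(byte, EXP[pos])
    return s0, s1

def encode(data):
    """Prepend two parity bytes so that both syndromes of the codeword are zero."""
    assert len(data) <= 253
    a, b = syndromes(bytes(2) + bytes(data))    # parity slots held at zero
    p1 = gf_div(a ^ b, 1 ^ EXP[1])              # solves p0^p1 = a, p0^alpha*p1 = b
    return bytes([a ^ p1, p1]) + bytes(data)

def correct(word):
    """Fix at most one bad byte; return (word, status)."""
    s0, s1 = syndromes(word)
    if s0 == 0 and s1 == 0:
        return bytes(word), "clean"
    if s0 == 0 or s1 == 0:
        return bytes(word), "uncorrectable"
    pos = (LOG[s1] - LOG[s0]) % 255             # single error: S1 / S0 = alpha^pos
    if pos >= len(word):
        return bytes(word), "uncorrectable"
    fixed = bytearray(word)
    fixed[pos] ^= s0                            # S0 is the error value
    return bytes(fixed), "corrected"

def add_fec(message):
    """Even-offset bytes go to one codeword, odd-offset bytes to the other."""
    assert len(message) <= 506
    a, b = encode(message[0::2]), encode(message[1::2])
    return bytes(message) + a[:2] + b[:2]       # 4 parity bytes = 32 bits

def repair(frame):
    message, par = frame[:-4], frame[-4:]
    a, st_a = correct(par[:2] + message[0::2])
    b, st_b = correct(par[2:] + message[1::2])
    out = bytearray(len(message))
    out[0::2] = a[2:]
    out[1::2] = b[2:]
    return bytes(out), (st_a, st_b)

message = bytes(range(1, 101))                  # hypothetical 100-byte message
frame = bytearray(add_fec(message))
frame[10], frame[13] = frame[13], frame[10]     # odd-stride transposition
recovered, status = repair(bytes(frame))
print(recovered == message, status)
```

Because bytes at an odd distance always land in different codewords, each codeword sees only one bad byte and repairs it. An even-stride swap puts both bad bytes in the same codeword; in this sketch that case yields S0 = 0 with S1 nonzero, so it is flagged rather than repaired.
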