[comp.protocols.tcp-ip] SLIP CRC's and other reliability

rhorn@infinet.UUCP (Rob Horn) (04/15/88)

The practical engineering decision on whether CRC's are needed
rests on three sets of assumptions: how SLIP will be used, the
environment it will be used in, and cost (time, money, etc.).
To restate the often discovered and forgotten: IP
checksums are mostly a network performance optimization.  Transmission
errors that escape IP checksums usually escape TCP and vice versa.
(Note: I am already assuming that those concerned with reliability use
TCP.  This is only 95% correct.)  So adding checksums to SLIP must be
justified on network performance, not error detection.

My concern is error detection.  I think Rick Adams' major mistake
is in assuming that SLIP will only be used over voice channels,
hence using modems.  The errors characteristic of voice channel +
modem are dropouts and random error bursts.  A checksum is almost
as good as a CRC at detecting these, so TCP works well.

My assumption is that SLIP will be used over not only voice
channels, but also muxes of many kinds, data PBX's, etc.  The
errors characteristic of these are dropouts, random error bursts,
duplications, and transpositions.  Checksums are very vulnerable
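to transpositions.  Any even-stride byte transposition (two bytes
swapped an even number of positions apart) will escape detection.
Much more seriously, TCP will probably also fail to detect it,
since its checksum is the same kind of 16-bit sum.  I have
experienced such errors with checksummed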
links through failing multiplexors.
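
The even-stride blind spot can be sketched in a few lines.  This
is an illustration only, assuming the usual 16-bit one's-complement
sum over big-endian words; the helper names and the test packet are
hypothetical.  Two bytes an even distance apart sit in the same
half of their 16-bit words, so swapping them leaves the sum
unchanged.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a null
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:               # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def swap_bytes(data: bytes, i: int, j: int) -> bytes:
    """Return a copy of data with the bytes at offsets i and j exchanged."""
    buf = bytearray(data)
    buf[i], buf[j] = buf[j], buf[i]
    return bytes(buf)


packet = bytes(range(1, 33))              # hypothetical 32-byte payload
even_stride = swap_bytes(packet, 3, 9)    # offsets 6 apart: same half of the word
odd_stride = swap_bytes(packet, 3, 4)     # adjacent bytes: different halves

print(internet_checksum(packet) == internet_checksum(even_stride))
print(internet_checksum(packet) == internet_checksum(odd_stride))
```

The first comparison is true (the damage goes unnoticed) and the
second is false (the damage is caught).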

I think that a failure which escapes TCP is very serious and an
option to detect it is very important.  CRC's will catch
transpositions.  I only ask that CRC be available as a checking
option.  I see no need to burden a simple system with LAPB or
whatever.  If the CRC fails, just trash the packet.  The cost of
this is minor:  a few days of programming and a few instructions
per byte.
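
A minimal sketch of the "check and trash" option follows.  It
assumes the CCITT polynomial x^16 + x^12 + x^5 + 1 with an initial
value of 0xFFFF and a two-byte big-endian trailer; none of those
choices comes from this posting, and the function names are
hypothetical.  There is no retry logic: a bad frame is simply
dropped and recovery is left to TCP.

```python
from typing import Optional


def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16 using the CCITT polynomial 0x1021."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def frame(payload: bytes) -> bytes:
    """Append the CRC of the payload as a 2-byte trailer."""
    return payload + crc16_ccitt(payload).to_bytes(2, "big")


def receive(framed: bytes) -> Optional[bytes]:
    """Return the payload if the CRC matches, else None (packet trashed)."""
    if len(framed) < 2:
        return None
    payload, trailer = framed[:-2], framed[-2:]
    if crc16_ccitt(payload).to_bytes(2, "big") != trailer:
        return None
    return payload


payload = bytes(range(1, 33))      # hypothetical 32-byte packet
damaged = bytearray(frame(payload))
damaged[3], damaged[9] = damaged[9], damaged[3]   # even-stride transposition

print(receive(frame(payload)) == payload)   # intact frame is accepted
print(receive(bytes(damaged)) is None)      # transposed frame is dropped
```

A table-driven version would bring the per-byte cost down to a
handful of operations; the bitwise loop is used here only because
it is short.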

If you look at costs further, a more cost-effective way to
enhance reliability is forward error correction (FEC).  Instead
of retransmitting, spend cycles to repair damaged packets in place.
This is worthwhile whenever the incremental cost of FEC is less
than the cost of retransmission.  For a small system, time is
what counts.  Ten thousand instructions is nothing when compared
to a retransmission.  For a multi-user host, the tradeoff is not
obvious.
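
The small-system case can be put into rough numbers.  In this
sketch the 10,000 instructions and the 506-byte packet are taken
from this posting; the 9600 bps line, the 10 bits per asynch byte,
and the 1 MIPS CPU are purely hypothetical figures chosen for
illustration, and the function names are made up.

```python
def retransmit_seconds(packet_bytes: int, line_bps: int,
                       bits_per_byte: int = 10) -> float:
    """Line time to resend one packet (asynch framing assumed: 10 bits/byte)."""
    return packet_bytes * bits_per_byte / line_bps


def fec_seconds(instructions: int, instructions_per_second: float) -> float:
    """CPU time to repair one damaged packet."""
    return instructions / instructions_per_second


def fec_pays_off(packet_bytes: int, line_bps: int,
                 instructions: int, instructions_per_second: float) -> bool:
    """True when repairing a damaged packet is cheaper than resending it."""
    return (fec_seconds(instructions, instructions_per_second)
            < retransmit_seconds(packet_bytes, line_bps))


# Hypothetical small system: 9600 bps line, 1 MIPS CPU.
print(retransmit_seconds(506, 9600))             # about half a second of line time
print(fec_seconds(10_000, 1_000_000))            # about a hundredth of a second
print(fec_pays_off(506, 9600, 10_000, 1_000_000))
```

This ignores round-trip delay and timeout waits, which only make
retransmission look worse.  On a multi-user host the CPU time is
taken from other users, so a comparison of elapsed time alone does
not settle the question.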

I have been contemplating a two-way interleaved (255,253)
Reed-Solomon FEC with assumed nulls for message length matching. 
This code would fix:
  a) any single erroneous byte
  b) any two erroneous bytes with odd stride (including
     transposition)
Other errors would result in either incorrect fixes or error
detection.  The incorrect fixes will trigger TCP checksum error
detection, including transposition cases.  My first complexity
estimate is that this FEC would take about twice as much CPU as a
16-bit CRC (for undamaged packets) and would add 32 bits of FEC
per 506 bytes of message instead of 16 bits of CRC.  More robust
FEC's exist if someone has good data on what errors need
correction.  This FEC alone will make a very big difference to a
marginal voice or asynch digital channel.  Letting the CPU spend
a dozen extra instructions per byte can be cheaper than the price
difference between an error-correcting modem and a normal modem.
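
One possible reading of the scheme above, as a sketch rather than
a finished design.  Everything not stated in the posting is an
assumption: the field polynomial 0x11D, the code roots alpha^0 and
alpha^1, parity bytes carried at positions 0 and 1 of each
codeword, and all function names.  Short messages are handled by
treating the unsent positions as nulls.  The overhead works out as
stated: 2 codewords x 253 data bytes = 506 bytes, and 2 codewords
x 2 parity bytes = 32 bits.

```python
from typing import Optional

# GF(2^8) log/antilog tables; primitive polynomial 0x11D, alpha = 2 (assumed).
EXP = [0] * 510
LOG = [0] * 256
value = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = value
    LOG[value] = i
    value <<= 1
    if value & 0x100:
        value ^= 0x11D


def gf_div(a: int, b: int) -> int:
    """Divide a by a nonzero b in GF(2^8)."""
    return 0 if a == 0 else EXP[LOG[a] - LOG[b] + 255]


def syndromes(word: bytes):
    """S0 = sum of bytes, S1 = sum of byte * alpha^position."""
    s0 = s1 = 0
    for pos, byte in enumerate(word):
        s0 ^= byte
        if byte:
            s1 ^= EXP[LOG[byte] + pos]
    return s0, s1


def rs_encode(data: bytes) -> bytes:
    """Shortened RS(255,253): prepend two parity bytes that zero S0 and S1."""
    assert len(data) <= 253
    d0, d1 = syndromes(b"\x00\x00" + data)
    p1 = gf_div(d0 ^ d1, 3)          # 3 = 1 + alpha
    p0 = d0 ^ p1
    return bytes([p0, p1]) + data


def rs_decode(word: bytes) -> Optional[bytes]:
    """Fix at most one bad byte; return the data, or None if not fixable."""
    s0, s1 = syndromes(word)
    buf = bytearray(word)
    if s0 or s1:
        if not (s0 and s1):
            return None              # cannot be a single-byte error
        pos = (LOG[s1] - LOG[s0]) % 255
        if pos >= len(buf):
            return None              # points into the assumed nulls
        buf[pos] ^= s0
    return bytes(buf[2:])


def fec_encode(message: bytes) -> bytes:
    """Two-way interleave: even-offset bytes in one codeword, odd in the other."""
    assert len(message) <= 506
    a, b = rs_encode(message[0::2]), rs_encode(message[1::2])
    out = bytearray(len(a) + len(b))
    out[0::2], out[1::2] = a, b
    return bytes(out)


def fec_decode(stream: bytes) -> Optional[bytes]:
    """Undo the interleave, correcting up to one byte per codeword."""
    a, b = rs_decode(stream[0::2]), rs_decode(stream[1::2])
    if a is None or b is None:
        return None
    out = bytearray(len(a) + len(b))
    out[0::2], out[1::2] = a, b
    return bytes(out)


sent = bytearray(fec_encode(b"SLIP over a failing multiplexor"))
sent[8], sent[9] = sent[9], sent[8]          # adjacent-byte transposition
print(fec_decode(bytes(sent)))
```

Because the two codewords are interleaved byte by byte, two bad
bytes an odd distance apart always land in different codewords, so
each codeword sees only a single error.  Two bad bytes an even
distance apart land in the same codeword and are either detected
or "fixed" incorrectly, which is the case left for the TCP
checksum to catch.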

I have implemented LAPB on asynch lines, and I think that
FEC+IP+TCP would be faster, more reliable, and need less software.

-- 
				Rob  Horn
	UUCP:	...harvard!adelie!infinet!rhorn
		...ulowell!infinet!rhorn, ..decvax!infinet!rhorn
	Snail:	Infinet,  40 High St., North Andover, MA