fdc@WATSUN.CC.COLUMBIA.EDU (Frank da Cruz) (04/13/89)
In all the literature I've encountered on the Cyclic Redundancy Check, proofs of its effectiveness have not discussed the length of the data being checked. The proofs generally run something like this: if there is a single-bit error, then blah blah; if there is a double-bit error, then ...; if there is an odd number of bit errors, then ...; if there is an error burst shorter than the checking polynomial, then ...; if there is an error burst equal in length to the checking polynomial, then ...; etc.

But it seems to me that the longer the data, the more likely it is that there will be (for instance) several error bursts, in different places, and there is some chance that these errors will cancel each other out in the CRC. Does the probability that this will happen increase with the length of the data, given a certain average bit error rate?

And this leads to the question: is there an optimum message length for a given checking polynomial and error rate? Obviously, the longer the message, the more efficient the use of the medium, provided retransmissions can be kept to a minimum...
rpw3@amdcad.AMD.COM (Rob Warnock) (04/13/89)
fdc@WATSUN.CC.COLUMBIA.EDU (Frank da Cruz) writes:
+---------------
| In all the literature I've encountered on the Cyclic Redundancy Check, proofs
| of its effectiveness have not discussed the length of the data being checked.
| ... And this leads to the question, is there an optimum message length for
| a given checking polynomial and error rate? Obviously, the longer the
| message, the more efficient the use of the medium provided retransmissions can
| be kept to a minimum...
+---------------
Yes, CRCs are designed with an upper limit on message length. The upper limit for CRC-16 is 2^15 - 1 = 32767 bits, or about 4095 bytes. (But note that the CRC itself must be included in that count.) And multiple errors are *much* more likely to be detected if the block is much smaller than the maximum. Of course, the probability of multiple errors depends on the error rate and the block length.

More interestingly, there is an optimum block length which depends strongly on the error rate, the retransmission scheme, and the link delay -- but *not* particularly on the CRC, assuming it's strong enough for the maximum block length. This is often shown in books on error-correcting codes as a family of throughput-vs-blocklength curves; a typical curve rises steadily at first, peaks, and then tails off into a steep dive back down. Look in some of the older books for this, as the newer ones tend to concentrate on error correction rather than ARQ. [Sorry I don't have any numbers for you.] Selective retransmit helps, as do full-duplex channels, forward-error-correction codes, and other "advanced" features.

Two practical notes:

1. CRC-32 (the "Ethernet" checksum) is very easy to do by table lookup, and should probably be used instead of CRC-16 for any new protocol. It requires a lookup table of 256 32-bit words. Typical C code for each byte is:

	crc = (crc << 8) ^ CRC32table[ (crc >> 24) ^ new_char];

   Code has been posted to the net for this (several times!).
2. If the error rate is fairly high, and the link delay large, you may want to consider some limited error correction. There is a simple Reed-Solomon code used in the IBM 3370 disk [described on pp. 528-531 of "Error Control Coding" by Lin & Costello] which should be really easy to do in software by table lookup. You need two 256-byte tables, and two table lookups and three XORs per data byte. This code will correct an entire byte being zapped. (You should probably interleave it 3-4 ways.) If the "data" also includes an internal CRC-32, you'll have pretty good protection against mis-correction. I'm considering using this for a version of SLIP. [News at 11...]

Rob Warnock
Systems Architecture Consultant
UUCP: {amdcad,fortune,sun}!redwood!rpw3
DDD: (415)572-2607
USPS: 627 26th Ave, San Mateo, CA 94403