[comp.protocols.misc] Table driven CRC evaluation

dg@lakart.UUCP (David Goodenough) (09/08/88)

I have recently implemented an XMODEM file transfer program on my CP/M machine
at home. Since I evaluate the CRC "on the fly" -- i.e. transmit a character,
then update the CRC -- I found that even at 9600 bps I could keep up doing it
the "slow" way, i.e. bit by bit (it takes a hair over 100 us to update the CRC
for ONE BYTE on a 4 MHz Z80, so I spend about 90% of my time idle). However,
back to the question. I have seen table-driven CRC software that does each
byte in no time flat by looking in a table. However there is one major
difference: in my software I have to "finish" the CRC evaluation by pushing
a pair of zero bytes into it, whereas the table version doesn't need this.
How does the table version get away without the extra pair of bytes? Both
systems get it right for any number of bytes, 128 and 1024 being the
obvious pair; as an aside, I've tried it with many other byte counts too.

E-mail responses, as I don't usually read this newsgroup.

		Thanks in advance,
-- 
	dg@lakart.UUCP - David Goodenough		+---+
							| +-+-+
	....... !harvard!cca!lakart!dg			+-+-+ |
						  	  +---+

rpw3@amdcad.AMD.COM (Rob Warnock) (09/10/88)

In article <238@lakart.UUCP> dg@lakart.UUCP (David Goodenough) writes:
+---------------
| ...the question. I have seen table driven CRC software, that does each
| byte in no time flat by looking in a table. However there is one major
| difference: In my software I have to "finish" the crc evaluation by pushing
| a pair of zero bytes into it, whereas the table version doesn't need this.
| How does the table get away without the extra pair of bytes...
+---------------

Actually, you could rewrite your bit-by-bit algorithm to avoid this as well.
See below. [Warning: somewhat long, with the answer at the end.]

Look in a good book on error-correcting codes (they usually have coverage
of CRC's too) for exact details, but it goes somewhat like this:

There are two ways to look at CRC generation (let's take CRC-16 to be specific,
though it can be any cyclic code). From one view, you take the message (treated
as a binary polynomial), multiply it by x^16 (thus adding 16 zeroes at the end),
then divide it by the check polynomial, and replace the 16 zeroes with the
remainder of the division, such that when an error-free received message is
in turn divided by the same check polynomial the remainder will be zero.
(From your question, this is clearly what your program does.)
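As a concrete illustration of this first view, here is a bit-serial sketch in C: shift the message through a long-division register, followed by 16 zero bits (David's "pair of zero bytes"), and keep the remainder. The polynomial 0x1021 is the CRC-CCITT/XMODEM generator -- an assumption on my part about which CRC-16 is meant -- and the function name is made up for the example:

```c
#include <stdint.h>
#include <stddef.h>

#define POLY 0x1021  /* assumed: CRC-CCITT generator x^16 + x^12 + x^5 + 1 */

/* First view: long-divide message * x^16 by the check polynomial.
 * The "+ 2" below pushes the two finishing zero bytes through. */
uint16_t crc16_divide(const uint8_t *msg, size_t len)
{
    uint32_t crc = 0;                     /* needs at least 17 bits */
    for (size_t i = 0; i < len + 2; i++) {
        uint8_t byte = (i < len) ? msg[i] : 0;   /* trailing zero bytes */
        for (int b = 0; b < 8; b++) {
            crc = (crc << 1) | ((byte >> (7 - b)) & 1);  /* shift in bit */
            if (crc & 0x10000)            /* bit shifted out of register? */
                crc ^= 0x10000 | POLY;    /* do feedback terms */
        }
    }
    return (uint16_t)crc;                 /* the remainder = the CRC */
}
```

With these parameters (zero initial register, no bit reflection) the function reproduces the standard CRC-16/XMODEM check value.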

From the other point of view, you are *multiplying* the message polynomial
by the generator matrix of the particular cyclic code to get a codeword
of that code, such that when the codeword (i.e., the transmitted message)
is *multiplied* by the parity matrix of the same code you get an all-zero
syndrome vector. This view is actually more fundamental to coding theory
than the first, but is rarely used directly in practice.

These two views are reconciled by noting that since all this arithmetic
is going on in a Galois Field (finite field), all non-zero polynomials
have a multiplicative inverse modulo x^(order of the field) - 1 (or
something similar... please excuse fuzziness on details). This means
that instead of dividing, you can multiply by the inverse, and vice-versa.
Thus the dividing by the check polynomial is equivalent to multiplying
by the generator polynomial. (The parity and generator *matrices* are
related in a simple way to the generator *polynomial*.)

Moreover, when realizing the needed Galois Field arithmetic in hardware,
there are various ways to go about it. The straightforward implementation
of the decoder -- multiplication by the parity check matrix -- uses a 16-bit
shift register with its output fed back into itself with XOR gates at various
places [not surprisingly, those places correspond to the non-zero terms of
the check polynomial], and the received codeword XOR'd in at the input end
of the shift register. The syndrome (final contents of the shift register)
will be zero if there was no error.

The same circuit can be used to *encode* the message -- i.e., compute the
CRC when transmitting -- but has the problem, as you noted, that you need
to shift in 16 zeroes after the message is all in to correctly generate
the check bits. Those 16 "extra" zeros hold the place of the check bits
while encoding. In essence, to encode this way you are *decoding* the
message assuming zero check bits, looking to see what CRC error that would
produce, and then sending as the actual check bits exactly what would cancel
out the received "error". (The fact that the check bits you send happen to
be the same as the "error" bits when "decoding" the message_plus_16_zeros
is due to the fact that you are using modulo-2 arithmetic, i.e., XOR gates.
In any other number system you have to send the *complement* of the "error"
bits, which for binary just happens to be the same.) Anyway, the only problem
with doing it this way is that you have to take a "burp" of processing
(16 shifts) between the time the last message bit goes out and the time
you can send the first CRC bit.

You can get the effect of pre-multiplying the message by x^16 (appending
those zeros) by changing the shift register connections very slightly as
follows: Instead of XOR'ing the incoming message with the feedback to the
*input* stage of the shift register, XOR the message with the *output* of
the shift register to produce the feedback to all the stages (including
the input stage). That is, if in pseudo-C your original shift step was:

	message_bit = get_next_msg_bit();
	crc = (crc << 1) | message_bit;		/* shift in new bit */
	if (crc & 0x10000)			/* if bit shifted out, */
		crc ^= CHECK_POLYNOMIAL;	/* do feedback terms */

replace it by:

	message_bit = get_next_msg_bit();
	crc = (crc << 1) ^ (message_bit << 16); /* XOR shifted-out bit with
						   message bit */
	if (crc & 0x10000)			/* if need feedback, */
		crc ^= CHECK_POLYNOMIAL;	/* do feedback terms */

(Note that in both cases CHECK_POLYNOMIAL includes the 0x10000 bit to save
explicitly having to turn it off. I also assumed that "crc" could hold at
least 17 bits. I assume you can adjust the above code if "crc" is a short.)
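
Wrapped up as a complete function, that second form might look like the sketch below (same assumed XMODEM polynomial 0x1021, invented function name). Note that no trailing zero bytes are pushed in, yet it produces the same result as the divide-then-append method:

```c
#include <stdint.h>
#include <stddef.h>

#define POLY 0x1021  /* assumed: CRC-CCITT generator x^16 + x^12 + x^5 + 1 */

/* Second view: XOR each message bit against the register *output*,
 * which pre-multiplies the message by x^16 -- no finishing zeros needed. */
uint16_t crc16_premult(const uint8_t *msg, size_t len)
{
    uint32_t crc = 0;                     /* needs at least 17 bits */
    for (size_t i = 0; i < len; i++) {
        for (int b = 0; b < 8; b++) {
            uint32_t message_bit = (msg[i] >> (7 - b)) & 1;
            crc = (crc << 1) ^ (message_bit << 16);  /* XOR with output */
            if (crc & 0x10000)            /* if need feedback, */
                crc ^= 0x10000 | POLY;    /* do feedback terms */
        }
    }
    return (uint16_t)crc;
}
```

The last message bit out of this loop is immediately followed by valid check bits -- no 16-shift "burp" at the end.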

Now, and here comes your answer after all the above verbiage, in the byte-
at-a-time table lookup CRC routines you refer to, the table simply stores
the pre-computed effect of the input byte on the feedback terms, and if
you examine the effect of a single bit of input on the CRC, you'll see
that they're using the *second* method. That is, they take the *high* byte
of the CRC and XOR it with a byte of input, use that to look up the effective
feedback bits, and XOR that with the old CRC shifted up one byte. In C:

	msg_byte = get_next_msg_byte();
	crc = (crc << 8) ^ CRC_table[(crc >> 8) ^ msg_byte];

If they seem to be doing it the other way, watch closely to see which byte
of the CRC they *send* first. Whichever byte is sent first is the "high" byte,
regardless of whether it's stored in the high or low byte of the "crc"
variable. If they are storing the "high" byte in the low 8 bits of "crc"
in order to make the XOR easier, then the table is also byte-swapped.
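
For the curious, one plausible way to build such a table is simply to run each possible high-byte value through eight steps of the bit-at-a-time feedback and record the result -- this is a sketch under my earlier XMODEM-polynomial assumption, not any particular published implementation:

```c
#include <stdint.h>
#include <stddef.h>

#define POLY 0x1021  /* assumed: CRC-CCITT generator x^16 + x^12 + x^5 + 1 */

uint16_t CRC_table[256];

/* Precompute the feedback effect of every possible (high byte XOR
 * message byte) value: eight bit-steps of the second method. */
void build_crc16_table(void)
{
    for (int t = 0; t < 256; t++) {
        uint32_t r = (uint32_t)t << 8;   /* byte sits in the high 8 bits */
        for (int b = 0; b < 8; b++) {
            r <<= 1;
            if (r & 0x10000)
                r ^= 0x10000 | POLY;     /* do feedback terms */
        }
        CRC_table[t] = (uint16_t)r;
    }
}

/* Byte-at-a-time update, exactly as in the text above. */
uint16_t crc16_table(const uint8_t *msg, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = (uint16_t)((crc << 8) ^ CRC_table[((crc >> 8) ^ msg[i]) & 0xFF]);
    return crc;
}
```

Because each table entry is just eight pre-run feedback steps, the byte loop gives bit-for-bit the same CRC as the bit-serial second method.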
By the way, it's a lot easier to see what's going on in the CRC-32 version
where you have *four* bytes of CRC. In C:

	msg_byte = get_next_msg_byte();
	crc32 = (crc32 << 8) ^ CRC32_table[(crc32 >> 24) ^ msg_byte];
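
Here is a sketch of that CRC-32 version with the table built the same way. The polynomial 0x04C11DB7 is the usual CRC-32 generator; I run it MSB-first with a zero initial register purely for illustration (real protocols such as Ethernet preset the register to all ones and reflect the bits), and the names are invented. A bit-at-a-time version is included just to show the two agree:

```c
#include <stdint.h>
#include <stddef.h>

#define POLY32 0x04C11DB7UL  /* the usual CRC-32 generator polynomial */

uint32_t CRC32_table[256];

/* Each entry: eight pre-run feedback steps for one high-byte value. */
void build_crc32_table(void)
{
    for (uint32_t t = 0; t < 256; t++) {
        uint32_t r = t << 24;            /* byte sits in the high 8 bits */
        for (int b = 0; b < 8; b++)
            r = (r & 0x80000000UL) ? (r << 1) ^ POLY32 : (r << 1);
        CRC32_table[t] = r;
    }
}

/* Byte-at-a-time update, as in the text above. */
uint32_t crc32_msb(const uint8_t *msg, size_t len)
{
    uint32_t crc32 = 0;
    for (size_t i = 0; i < len; i++)
        crc32 = (crc32 << 8) ^ CRC32_table[((crc32 >> 24) ^ msg[i]) & 0xFF];
    return crc32;
}

/* Bit-at-a-time reference (second, pre-multiplied form) for comparison. */
uint32_t crc32_msb_bitwise(const uint8_t *msg, size_t len)
{
    uint32_t crc = 0;
    for (size_t i = 0; i < len; i++)
        for (int b = 0; b < 8; b++) {
            uint32_t out = (crc >> 31) ^ ((msg[i] >> (7 - b)) & 1U);
            crc <<= 1;
            if (out)
                crc ^= POLY32;           /* do feedback terms */
        }
    return crc;
}
```

With four bytes of CRC in play, the "take the high byte, XOR in the message byte, look up the feedback" pattern is much easier to spot.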

Hope the above has helped, and apologies for the length...


Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
ATTmail:  !rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403