dg@lakart.UUCP (David Goodenough) (09/08/88)
I have recently implemented an XMODEM file transfer program on my CP/M machine
at home. Since I evaluate the CRC "on the fly", i.e. transmit a character,
then update the CRC, I found that even at 9600 BPS I could keep up doing it
the "slow" way, i.e. bit by bit (it takes a hair over 100 microseconds to
update ONE BYTE of CRC on a 4 MHz Z80, so I spend about 90% of my time idle).
However, back to the question. I have seen table driven CRC software that
does each byte in no time flat by looking in a table. However, there is one
major difference: in my software I have to "finish" the CRC evaluation by
pushing a pair of zero bytes into it, whereas the table version doesn't need
this. How does the table version get away without the extra pair of bytes?
Both systems get it right for any number of bytes, 128 and 1024 being the
obvious pair; as an aside, I've tried it with many other byte counts.
E-mail responses, as I don't usually read this newsgroup.
Thanks in advance,
--
dg@lakart.UUCP - David Goodenough          +---+
                                           | +-+-+
....... !harvard!cca!lakart!dg             +-+-+ |
                                             +---+
rpw3@amdcad.AMD.COM (Rob Warnock) (09/10/88)
In article <238@lakart.UUCP> dg@lakart.UUCP (David Goodenough) writes:
+---------------
| ...the question. I have seen table driven CRC software, that does each
| byte in no time flat by looking in a table. However there is one major
| difference: In my software I have to "finish" the crc evaluation by pushing
| a pair of zero bytes into it, whereas the table version doesn't need this.
| How does the table get away without the extra pair of bytes...
+---------------

Actually, you could rewrite your bit-by-bit algorithm to avoid this as well.
See below. [Warning: somewhat long, with the answer at the end.]

Look in a good book on error-correcting codes (they usually have coverage of
CRC's too) for exact details, but it goes somewhat like this:

There are two ways to look at CRC generation (let's take CRC-16 to be
specific, though it can be any cyclic code). From one view, you take the
message (treated as a binary polynomial), multiply it by x^16 (thus adding
16 zeroes at the end), then divide it by the check polynomial, and replace
the 16 zeroes with the remainder of the division, such that when an
error-free received message is in turn divided by the same check polynomial
the remainder will be zero. (From your question, this is clearly what your
program does.)

From the other point of view, you are *multiplying* the message polynomial
by the generator matrix of the particular cyclic code to get a codeword of
that code, such that when the codeword (i.e., the transmitted message) is
*multiplied* by the parity matrix of the same code you get an all-zero
syndrome vector. This view is actually more fundamental to coding theory
than the first, but is rarely used directly in practice.

These two views are reconciled by noting that since all this arithmetic is
going on in a Galois Field (finite field), all non-zero polynomials have a
multiplicative inverse modulo x^(order of the field) - 1 (or something
similar... please excuse fuzziness on details).
This means that instead of dividing, you can multiply by the inverse, and
vice-versa. Thus dividing by the check polynomial is equivalent to
multiplying by the generator polynomial. (The parity and generator
*matrices* are related in a simple way to the generator *polynomial*.)

Moreover, when realizing the needed Galois Field arithmetic in hardware,
there are various ways to go about it. The straightforward implementation of
the decoder -- multiplication by the parity check matrix -- uses a 16-bit
shift register with its output fed back into itself with XOR gates at
various places [not surprisingly, those places correspond to the non-zero
terms of the check polynomial], and the received codeword XOR'd in at the
input end of the shift register. The syndrome (final contents of the shift
register) will be zero if there was no error.

The same circuit can be used to *encode* the message -- i.e., compute the
CRC when transmitting -- but it has the problem, as you noted, that you need
to shift in 16 zeroes after the message is all in to correctly generate the
check bits. Those 16 "extra" zeroes hold the place of the check bits while
encoding. In essence, to encode this way you are *decoding* the message
assuming zero check bits, looking to see what CRC "error" that would
produce, and then sending as the actual check bits exactly what would cancel
out the received "error". (The fact that the check bits you send happen to
be the same as the "error" bits when "decoding" the message_plus_16_zeros is
due to the fact that you are using modulo-2 arithmetic, i.e., XOR gates. In
any other number system you would have to send the *complement* of the
"error" bits, which for binary just happens to be the same.)

Anyway, the only problem with doing it this way is that you have to take a
"burp" of processing (16 shifts) between the time the last message bit goes
out and the time you can send the first CRC bit.
You can get the effect of pre-multiplying the message by x^16 (appending
those zeroes) by changing the shift register connections very slightly, as
follows: instead of XOR'ing the incoming message with the feedback to the
*input* stage of the shift register, XOR the message with the *output* of
the shift register to produce the feedback to all the stages (including the
input stage). That is, if in pseudo-C your original shift step was:

        message_bit = get_next_msg_bit();
        crc = (crc << 1) | message_bit;         /* shift in new bit */
        if (crc & 0x10000)                      /* if bit shifted out, */
                crc ^= CHECK_POLYNOMIAL;        /* do feedback terms */

replace it by:

        message_bit = get_next_msg_bit();
        crc = (crc << 1) ^ (message_bit << 16); /* XOR message with shifted-out bit */
        if (crc & 0x10000)                      /* if need feedback, */
                crc ^= CHECK_POLYNOMIAL;        /* do feedback terms */

(Note that in both cases CHECK_POLYNOMIAL includes the 0x10000 bit, to save
explicitly having to turn it off. I also assumed that "crc" can hold at
least 17 bits; I assume you can adjust the above code if "crc" is a short.)

Now -- and here comes your answer after all the above verbiage -- in the
byte-at-a-time table lookup CRC routines you refer to, the table simply
stores the pre-computed effect of the input byte on the feedback terms, and
if you examine the effect of a single bit of input on the CRC, you'll see
that they're using the *second* method. That is, they take the *high* byte
of the CRC, XOR it with a byte of input, use that to look up the effective
feedback bits, and XOR those with the old CRC shifted up one byte. In C:

        msg_byte = get_next_msg_byte();
        crc = (crc << 8) ^ CRC_table[(crc >> 8) ^ msg_byte];

If they seem to be doing it the other way, watch closely to see which byte
of the CRC they *send* first. Whichever byte is sent first is the "high"
byte, regardless of whether it's stored in the high or low byte of the "crc"
variable.
If they are storing the "high" byte in the low 8 bits of "crc" in order to
make the XOR easier, then the table is also byte-swapped.

By the way, it's a lot easier to see what's going on in the CRC-32 version,
where you have *four* bytes of CRC. In C:

        msg_byte = get_next_msg_byte();
        crc32 = (crc32 << 8) ^ CRC32_table[(crc32 >> 24) ^ msg_byte];

Hope the above has helped, and apologies for the length...

Rob Warnock
Systems Architecture Consultant

UUCP:     {amdcad,fortune,sun}!redwood!rpw3
ATTmail:  !rpw3
DDD:      (415)572-2607
USPS:     627 26th Ave, San Mateo, CA 94403