djones@megatest.UUCP (Dave Jones) (10/31/90)
Recently, in connection with the topic of hashing, Ron Irvine posted a C algorithm for calculating the CRC-16 number for a null-terminated string. I decided to grab it and put it in my utilities library. His routine was based on two sixteen-entry tables, because, he said, they would fit in cache. Well, one machine I use has a much larger cache than that, and the other has none, so I decided just for the fun of it to convert the algorithm to one based on one 256-entry table. I did that, and testing has verified that it produces the same results.

Now then, for purposes of documentation, to verify the original algorithm (and for my own education, I guess), I decided to figure out what the thing actually does. I borrowed a book, _Computer Networks, Second Edition_, by Andrew S. Tanenbaum (Simon & Schuster), and dug in. I coded up the algorithm they gave, using the example on page 210, fig. 4-7.(*) My code printed out intermediate results exactly matching those in the example. So far so good.

Now I changed the two constants that in theory should change the example checksum into the CRC-16 checksum. The result was that I got different numbers from my algorithm than from the posted one.

Could this be because of some big-endian/little-endian kind of disagreement? Does the posted algorithm in effect shift the bits out least-significant bit first, for example? Also, the book specifies that the input is first tagged at the end with 16 zero-bits before doing the calculation. Does the algorithm published on the net effectively do that? It does not appear to, but I confess I have not yet figured out why the posted algorithm does anything similar to what the book describes.

I guess what I am asking for is the "official" definition of a CRC-16 checksum as it applies to byte-streams rather than bit-streams, and why I don't get the same numbers that the posted algorithm gets.
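The conversion goes something like this (a from-scratch sketch of the technique, not my actual library code; the function names are mine): generate the 256-entry table once from the CRC-16 polynomial constant used in the posted routine, then consume one table lookup per input byte instead of eight bit-steps.

```c
#define POLY_CRC16 0xa001  /* x^16 + x^15 + x^2 + 1, lsb-first form */

static unsigned short crc16_tab[256];

/* Fill the 256-entry table: entry n is the 8-bit-step result for byte n. */
void
crc16_init(void)
{
	unsigned int n, k, c;

	for (n = 0; n < 256; n++) {
		c = n;
		for (k = 0; k < 8; k++)
			c = (c & 1) ? (c >> 1) ^ POLY_CRC16 : (c >> 1);
		crc16_tab[n] = (unsigned short)c;
	}
}

/* One table lookup per input byte; initial value 0, as in the
 * posted bit-at-a-time crc_16().
 */
unsigned short
crc16_byte(const char *in)
{
	unsigned short crc = 0;

	while (*in)
		crc = crc16_tab[(crc ^ (unsigned char)*in++) & 0xff] ^ (crc >> 8);
	return crc;
}
```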
In another posting, you'll find a shar of the files under discussion here, including the "bit-shifting" toy algorithm based on what I read in the book.

(*) Actually, I had to change it a little, because the algorithm as described was obviously wrong. It says, "A divisor is said to 'go into' a dividend if the dividend has as many bits as the divisor." I interpreted that to mean instead, "if the dividend and divisor have the same base-two log," which is consistent with the example, and with the statement that the algorithm can be implemented with "a simple shift register circuit."
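The book's procedure amounts to the following (a reconstruction of the idea, not the shar'd toy program; names are mine): treat the message as a bit-stream, most significant bit of each byte first, append 16 zero bits, and xor in the 17-bit generator wherever it "goes into" the running dividend. Note that this shifts the message bits msb first, which is one obvious way it can disagree with a routine that takes them lsb first.

```c
/* Long division of the message times x^16 by the generator, msb first.
 * GEN is the full 17-bit generator: 0x18005L is x^16 + x^15 + x^2 + 1,
 * the CRC-16 polynomial, in the book's msb-first bit order.
 */
#define GEN 0x18005L

unsigned short
crc_longdiv(const unsigned char *msg, unsigned long len)
{
	unsigned long rem = 0;			/* low 17 bits in use */
	unsigned long nbits = len * 8 + 16;	/* message plus 16 appended zeros */
	unsigned long i;
	int b;

	for (i = 0; i < nbits; i++) {
		/* next dividend bit: message bits msb first, then 16 zeros */
		b = (i < len * 8) ? (int)((msg[i / 8] >> (7 - i % 8)) & 1) : 0;
		rem = (rem << 1) | b;
		if (rem & 0x10000L)	/* the divisor "goes into" the dividend */
			rem ^= GEN;
	}
	return (unsigned short)rem;	/* the remainder is the checksum */
}
```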
ron@scocan.sco.COM (Ron Irvine) (11/06/90)
> in my utilities library. His routine was based on two sixteen-entry
> tables, because, he said, they would fit in cache. Well one
> machine I use has a much larger cache than that, and the other
> has none, so I decided just for the fun of it to convert the algorithm
> to one based on one 256-entry table. I did that, and testing
> has verified that it produces the same results.

Yes, a 256-entry table can be faster if it is in cache. But if you are hashing a string for lookup in a table you may boot out some other useful data in the cache. The nibble tables (2 * 16) have a better chance of cache hits than the sparse 256-entry table, and less chance of booting out other useful data, but they take more work to calculate a crc. Your mileage may vary.

> Now then, for purposes of documentation, and to verify the original
> algorithm, (and for my own education I guess), I decided to figure
> out what the thing actually does.

There are two popular 16-bit crc calculations. The one I used is the crc16 used for DDCMP and Bisync communications. Its polynomial is x^16 + x^15 + x^2 + 1, with an initial value of 0. I call it crc16(). The other common crc is used in ADCCP, HDLC and SDLC; its polynomial is x^16 + x^12 + x^5 + 1, with an initial value of -1. I call it ccitt(). If you are using the crc to hash you may want to play with both or invent your own.

----------- Here is some more code -----------------

/* Computing a POLY number from the crc equation.
 * Crc's are usually expressed as a polynomial expression such as:
 *
 *	x^16 + x^12 + x^5 + 1
 *
 * Since the highest order term is to the power 16 this is a
 * 16 bit crc.  The POLY number is determined by setting bits
 * corresponding to the power terms in the polynomial equation
 * in an integer.  To do this we number the bits of the integer
 * with the most significant bit = bit 0.  CAUTION: this is the
 * opposite of some bit numbering schemes.  This is due to the
 * least-significant-bit-first convention of crc calculations.
 *
 * The above equation becomes:
 *
 *	  1   0   0   0   0   1   0   0   0   0   0   0   1   0   0   0
 *	|___.___.___.___|___.___.___.___|___.___.___.___|___.___.___.___|
 *	  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 *	 msb                                                         lsb
 *
 * Note: the "1" term is equivalent to "x^0" and the "x^16"
 * term is ignored (it does determine the length to be 16 bits).
 * Thus the POLY number is 0x8408 (ccitt crc).
 */

/* crc16 - DDCMP and Bisync (16 bit)
 *	x^16 + x^15 + x^2 + 1
 */
#define POLY_CRC16	0xa001
#define INIT_CRC16	0

/* CCITT - ADCCP, HDLC, SDLC (16 bit)
 *	x^16 + x^12 + x^5 + 1
 */
#define POLY_CCITT	0x8408
#define INIT_CCITT	-1

/* a bit at a time */
unsigned
crc_16(in)
	char *in;
{
	register unsigned short crc, bit;
	register int i;

	crc = INIT_CRC16;
	while (*in) {
		for (i = 1; i <= 0x80; i <<= 1) {
			bit = (((*in) & i) ? 0x8000 : 0);
			if (crc & 1)
				bit ^= 0x8000;
			crc >>= 1;
			if (bit)
				crc ^= (int)(POLY_CRC16);
		}
		++in;
	}
	return crc;
}

/* For even more fun - ccitt() crc routine */

/* nibble tables for CCITT crc */
unsigned int ccitt_h[] = {
	0x0000, 0x1081, 0x2102, 0x3183, 0x4204, 0x5285, 0x6306, 0x7387,
	0x8408, 0x9489, 0xa50a, 0xb58b, 0xc60c, 0xd68d, 0xe70e, 0xf78f,
};
unsigned int ccitt_l[] = {
	0x0000, 0x1189, 0x2312, 0x329b, 0x4624, 0x57ad, 0x6536, 0x74bf,
	0x8c48, 0x9dc1, 0xaf5a, 0xbed3, 0xca6c, 0xdbe5, 0xe97e, 0xf8f7,
};

unsigned short
ccitt(in)
	register unsigned char *in;
{
	register unsigned int n;
	register unsigned short crc;	/* must be 16 bits, or crc >> 8
					 * drags garbage into the result */

	crc = INIT_CCITT;
	while (*in) {
		n = *in++ ^ crc;
		crc = ccitt_l[n & 0x0f] ^ ccitt_h[(n >> 4) & 0x0f] ^ (crc >> 8);
	}
	return (crc);
}
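Incidentally, the two nibble tables fold directly into the single 256-entry table discussed above: the per-byte crc step is linear over the input byte, so entry n of the big table is just ccitt_l[n & 0x0f] ^ ccitt_h[n >> 4]. A sketch (table values copied from ccitt() above; the fold and the byte-at-a-time routine are my names, not part of the posted code):

```c
/* Nibble tables for the CCITT crc, copied from the posting above. */
static const unsigned short ccitt_h[16] = {
	0x0000, 0x1081, 0x2102, 0x3183, 0x4204, 0x5285, 0x6306, 0x7387,
	0x8408, 0x9489, 0xa50a, 0xb58b, 0xc60c, 0xd68d, 0xe70e, 0xf78f,
};
static const unsigned short ccitt_l[16] = {
	0x0000, 0x1189, 0x2312, 0x329b, 0x4624, 0x57ad, 0x6536, 0x74bf,
	0x8c48, 0x9dc1, 0xaf5a, 0xbed3, 0xca6c, 0xdbe5, 0xe97e, 0xf8f7,
};

static unsigned short ccitt_tab[256];

/* The per-byte step is linear over the input byte, so the 256-entry
 * table entry for n is the xor of the two nibble-table entries.
 */
void
ccitt_fold(void)
{
	int n;

	for (n = 0; n < 256; n++)
		ccitt_tab[n] = ccitt_l[n & 0x0f] ^ ccitt_h[(n >> 4) & 0x0f];
}

/* Same checksum as the nibble-table ccitt(): one lookup per byte. */
unsigned short
ccitt_byte(const unsigned char *in)
{
	unsigned short crc = 0xffff;	/* INIT_CCITT (-1) in 16 bits */

	while (*in)
		crc = ccitt_tab[(*in++ ^ crc) & 0xff] ^ (crc >> 8);
	return crc;
}
```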