NJG@CORNELLA.BITNET (07/24/86)
I received the following from Craig Watkins, the author of Jnet. I will also be uploading this information to PROB CRC-FAIL on VMSHARE.

We are also following up on a suspicion expressed by a couple of sources that the reason these RF broadband modems are generating this class of error is related to the scrambling polynomial they use. The basic analysis is that if the scrambling polynomial and the CRC polynomial have factors in common, then we would see the sort of problems we are currently seeing. I'll upload more information as this suspicion is firmed up.

-NJGimbrone
-------------------------------------------------------------------------
In the following, I will probably tell you much that you already know, since it is obvious that you have put a lot of work into this problem. Please forgive any of my explanation that you may already know.

I first took the blocks of trace data that you sent me, ran them through some software, and calculated the CRCs on them in the exact manner that your communications processor would. This shows us that your modems are indeed creating errors that CRC16 is unable to catch and that your hardware is working correctly.

I also performed CRC calculations on the data that you had in your posting. Each of the pairs of (short) data streams showed the same resultant CRC, again confirming that these were errors that are undetected with CRC16.

I then tried to characterize what type of error represents something that will be undetected. A very basic understanding of the math behind CRCs is necessary. A transmitted bit stream can be thought of as a very large binary number. A CRC polynomial, such as CRC16 (X^16 + X^15 + X^2 + 1), can also be represented as the binary number 11000000000000101. The "CRC" code is the remainder resulting from the division of the bit stream by the CRC polynomial. To divide these binary numbers, "modulo mathematics" (modulo-2 arithmetic) is used. Involved in modulo mathematics is the use of exclusive-or'ing instead of the traditional additions and subtractions.
This method makes the hardware simpler, as well as the concepts.

It is fairly clear that if we divide G by P and get remainder X, we will also get X as a remainder if we divide G+P by P. Further, we will get remainder X when we divide G+nP by P, where n is an integer. In the process of calculating a CRC, G is the binary bit stream, P is the CRC polynomial and X is the CRC check code.

What the above suggests is that we can add any multiple of the polynomial to the actual bit stream and yield the same CRC check code. Since we are using modulo math, our adds will take on the form of exclusive-or's. The logical operation "exclusive-or" can be thought of as a bit-toggle, or in our particular analysis, an error.

What I am suggesting is that you can take the CRC polynomial in the format of 11000000000000101 and exclusive-or it into the good bit stream, producing an erroneous bit stream which will produce the same CRC check code. Further, this mask can be exclusive-or'ed into the bit stream in any position, as many times as desired, and the resultant CRC check code will remain the same.

You will note that in the five examples in your posting, each one was an instance of the single mask 11000000000000101 being XOR'ed into the bit stream. While looking at a hex dump, it is not immediately obvious that this is so. One must realize that each byte is sent LSB first. As an example, 52 7A B5 would be sent as:

    (first bit sent this end) 0100 1010 0101 1110 1010 1101 (last bit sent)

Thus, it seems reasonable to assume that for some reason your RF modems tend to produce errors in the pattern of the CRC16 polynomial.

Comments on specifics you mentioned in your posting:

> We have seen cases of two such errors occurring in a buffer. In all
> cases that we have looked at even closer the total number of incorrect
> bits is even and 4 or more. We have seen 1 and 2 bit errors in
> nibbles. We tend to see mostly a total of 4 bits in error, but have
> identified cases of 6 and 8 bits in error in the RSCS buffer.

Four-bit errors are the smallest number of bit errors that will be passed undetected because there are four terms in the CRC16 polynomial. Likewise, following the above discussion of exclusive-or'ing variously positioned masks into the bit stream, you will also note that it is impossible to achieve an odd number of bit errors that go undetected. Your observations seem to be correct.

> There has been statements made that the CRC-16 will tend to perform
> better when small blocks are checked. We have been told that 742 is
> one of the 'magic' numbers. Unfortunately RSCS V1.3's DMTVMB driver
> will only allow block sizes as small as 824... Its NJI driver, which
> goes down to 300, will not talk to another copy of itself. However we
> have tried a modified version of the VMB driver with a block size of
> 400 and have seen no change in the symptoms.

For certain types of errors, it may be true that CRCs perform better on smaller blocks, but for your type of error, I don't believe the size of the block has much (if any) significance. Given that your RF modems seem to produce errors in the same pattern as the CRC16 polynomial, and that the error's placement in the block is not significant, varying the block size would seem to me to have little or no effect.

Also, in regard to this, when you tell RSCS to send a buffer size of 742, RSCS sends blocks up to the size of the buffer and hardly ever fills the buffer to perfect capacity.

> None of RSCS's drivers support block sizes this small. The VMB driver
> does not use ITBs to force the re-calculation of the CRC on smaller
> quantities.

This is true. VMB always uses a single ITB in each block.

> It is not yet clear to us if the NJI driver uses ITBs to
> improve the performance of the CRC and yet keep the line performance up
> by allowing large blocks to be sent before turning the line around.

NJI doesn't use ITBs.

I hope this information helps somewhat.
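
For anyone who wants to check the argument above for themselves, here is a minimal sketch in Python. It is not the software used for the analysis in this letter; the function names and the sample buffer are made up for illustration. It assumes the check code is a plain modulo-2 remainder (zero starting value, no final inversion), which is what the explanation above describes. It shows the LSB-first bit ordering of 52 7A B5 and shows that exclusive-or'ing the mask 11000000000000101 into a bit stream leaves the remainder unchanged.

```python
# Illustrative sketch only: names and sample data are hypothetical.
POLY = 0b11000000000000101  # X^16 + X^15 + X^2 + 1, as a 17-bit number


def bits_lsb_first(data: bytes) -> list[int]:
    """Expand bytes into line order, least significant bit of each byte first."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def mod2_remainder(bits: list[int], poly: int = POLY) -> int:
    """Remainder of the bit stream divided by poly, using XOR for subtraction."""
    degree = poly.bit_length() - 1
    rem = 0
    for bit in bits:
        rem = (rem << 1) | bit
        if rem >> degree:      # top bit set: "subtract" the polynomial
            rem ^= poly
    return rem


def xor_mask(bits: list[int], offset: int, poly: int = POLY) -> list[int]:
    """Return a copy of bits with the polynomial mask XOR'ed in at offset."""
    mask = [int(c) for c in format(poly, "b")]
    out = list(bits)
    for i, m in enumerate(mask):
        out[offset + i] ^= m
    return out


def bits_changed(a: list[int], b: list[int]) -> int:
    """Count the positions where two equal-length bit streams differ."""
    return sum(x != y for x, y in zip(a, b))


# The byte-ordering example from the text: 52 7A B5 as sent on the line.
print("".join(map(str, bits_lsb_first(bytes([0x52, 0x7A, 0xB5])))))

# Arbitrary test buffer (not taken from any trace data).
good = bits_lsb_first(bytes(range(40)))
bad = xor_mask(good, 37)                 # one mask, arbitrary position
worse = xor_mask(xor_mask(good, 0), 1)   # two overlapping masks

for name, stream in (("one mask", bad), ("two masks", worse)):
    print(name, "bits in error:", bits_changed(good, stream),
          "same remainder:", mod2_remainder(stream) == mod2_remainder(good))
```

Overlapping masks cancel some of each other's bits, which is one way to arrive at totals such as 6 bits in error rather than 8.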