cs450a03@uc780.umd.edu (03/25/91)
This is to follow up to another posting by Ray something-or-other (in
another newsgroup).  Last time I dealt with this stuff was in '89, so
feel free to jump on me where I screw up.

Hamming codes, in their basic form, are a specialized application of
parity.  Essentially, bits can be grouped together, and parity can be
computed for each grouping, so that if some error flips one of the bits
(either data or parity) it leaves a signature which can spot the
impact point.

Example: take an 8 bit word:

   bit 1, bit 2, bit 3, bit 4, ... bit 8

and assign 4 parity bits, call them bit 9, bit 10, bit 11, and bit 12:

   call bits 1 3 5 7 and 9  a grouping, and set bit 9  for odd parity
   call bits 2 3 6 7 and 10 a grouping, and set bit 10 for odd parity
   call bits 4 5 6 7 and 11 a grouping, and set bit 11 for odd parity
   call bits 8 thru 12      a grouping, and set bit 12 for odd parity

That feels a bit lumpy, but never mind...  Now an error flips a bit,
and wango, you get the bit's address in the parity bits (bits 9 and 11
go off; read as bits 12 11 10 9 that's 0 1 0 1: error in bit 5).

Now what if you want to correct two-bit errors?  I think you can just
repeat the process (make a Hamming code for a 12-bit word), and go
through two iterations of correction and make out all right most of
the time (all of the time?).  But that's not even close to rigorous.
Somebody want to bail me out here, before I reveal myself as a total
idiot?

Thanks,  Raul Rockwell

P.S.  remember: the underlying assumption is we're just considering
bit-flipping, no framing errors.
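[For concreteness, here is the grouping scheme above sketched in Python.
The group memberships and odd-parity choices are exactly as listed in
the post; the names encode/syndrome are illustrative, not from the
post.]

```python
# The post's scheme: 8 data bits (1..8) plus 4 parity bits (9..12),
# each parity bit chosen so that its grouping has odd parity.
GROUPS = {            # parity bit -> the grouping it belongs to
    9:  (1, 3, 5, 7, 9),
    10: (2, 3, 6, 7, 10),
    11: (4, 5, 6, 7, 11),
    12: (8, 9, 10, 11, 12),
}

def encode(data):
    """data: dict mapping bit numbers 1..8 to 0/1."""
    word = dict(data)
    for p in (9, 10, 11, 12):      # bit 12's group uses bits 9..11,
        others = [b for b in GROUPS[p] if b != p]   # so set them first
        word[p] = 1 ^ (sum(word[b] for b in others) & 1)  # force odd parity
    return word

def syndrome(word):
    """0 if every grouping still has odd parity; otherwise the signature
    read off as bits 12 11 10 9 (e.g. 0101 -> bit 5)."""
    s = 0
    for k, p in enumerate((9, 10, 11, 12)):
        if sum(word[b] for b in GROUPS[p]) % 2 == 0:  # parity broken
            s |= 1 << k
    return s
```

Flipping data bit 5 breaks the bit-9 and bit-11 groupings, giving the
signature 0101 = 5 as in the post.  Note that with this particular
grouping a flip of parity bit 11 breaks the bit-11 and bit-12 groupings
and so reads back as 12, not 11 (a later reply in the thread points at
exactly this weakness of the first posting).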
mayne@zeta.cs.fsu.edu (William Mayne) (03/26/91)
In article <25MAR91.01330289@uc780.umd.edu> cs450a03@uc780.umd.edu writes:
[Description and example of a Hamming code omitted.]

Here is an alternative description of a Hamming code which I think is
easier to understand and generalize.  The purpose of the code is to
detect and correct 1 bit errors in blocks of bits.  Number the bits
1,2,3,...n, including both data and check bits.  The trick is to devise
a check function which will be 0 if no bit errors have occurred and
will indicate the location of any single bit error.  Any block
containing a one bit error can then be corrected by changing the
indicated bit.

The required check function is based on the bit wise exclusive or of
the binary representation of the position of each bit which is 1.  The
check bits are all those whose indexes are powers of 2.  It is
convenient to number the bits of a block so that the check bits are all
collected together at the end, thus:

   [data] 3,5,6,7,9,...n  [check] int(log2(n)),...,4,2,1

This assumes that n itself is not a power of 2.  Otherwise it would be
a check bit.  The method would still work, but is easier to explain if
n is not a power of 2, and is actually most efficient when n=2**m-1.

The values of the check bits can be computed by computing the check
value of the data bits.  The binary representation of the result gives
the bit string for the check bits (right justified).  It is easy to see
that this makes the check value of the whole block 0, and that if one
bit is changed the new check value will be the position of that bit,
regardless of whether it is a data bit or a check bit.

This is the most efficient possible coding which will detect and
correct one bit errors.  Since only int(log2(n)) extra bits are
required for n-int(log2(n)) data bits the efficiency increases with the
block size, but so does the chance of a 2 bit error.  Implementation
may be easier and reliability improved if several checks are performed
in parallel.
For example, given a stream of 8 bit bytes to be encoded, compute 8
check values, so each byte contributes just one bit to each of the
check values.  Besides being arguably easier to program (less bit
shifting), this will correct errors in up to 8 consecutive bits, an
advantage if noise is likely to occur in short bursts.

>Now what if you want to correct two-bit errors?  I think you can just
>repeat the process (make a hamming code for a 12-bit word), and go
>through two iterations of correction and make out all right most of
>the time (all of the time?).  But that's not even close to rigorous.

I'm not sure about this.  I don't think error correcting for two bit
errors is that simple, but I haven't pursued it.  Any other ideas or
better yet informed answers?

Bill Mayne (mayne@cs.fsu.edu)
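[A small Python sketch of the description above, not from the post
itself; the helper names are illustrative.  The check value is the XOR
of the positions of the 1 bits, the check bits live at the power-of-2
positions, and a non-zero check value names the flipped bit.]

```python
def check(word):
    """XOR of the positions (numbered from 1) of all bits that are 1."""
    c = 0
    for pos, bit in word.items():
        if bit:
            c ^= pos
    return c

def encode(databits, n):
    """Lay databits into the non-power-of-2 positions 1..n, then set the
    power-of-2 check bits so the whole block's check value is 0."""
    word = {i: 0 for i in range(1, n + 1)}
    data = iter(databits)
    for i in range(1, n + 1):
        if i & (i - 1):                # i is NOT a power of 2: data slot
            word[i] = next(data)
    c = check(word)                    # check value of the data bits alone
    p = 1
    while p <= n:                      # spread the binary representation
        word[p] = 1 if c & p else 0    # of c across the check positions
        p <<= 1
    return word

def correct(word):
    """Return 0 (clean) or the position of the single flipped bit,
    after repairing it in place."""
    c = check(word)
    if c:
        word[c] ^= 1
    return c
```

With n=7 the data positions are 3, 5, 6, 7 and the check positions are
1, 2, 4, so four data bits get three check bits, the classic case where
n = 2**m - 1.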
rjc@geech.gnu.ai.mit.edu (Ray Cromwell) (03/26/91)
I don't want to offend the poster in any way, but the first explanation
of hamming codes was easier for me to understand.  It was short and
simple and used example diagrams.  The second explanation was valuable,
but I got lost in the int(log()) stuff and had to re-read it a few
times before I caught on.  This hamming code discussion is very
educational for me since I always wondered how they worked, so if
anyone else adds to this thread (correction of 2 bit errors) try to use
examples of bit streams.
--
/~\_______________________________________________________________________/~\
|n| rjc@albert.ai.mit.edu    Amiga, the computer for the creative mind.   |n|
|~|                                .-. .-.                                |~|
|_|________________________________| |_| |________________________________|_|
cs450a03@uc780.umd.edu (03/26/91)
William Mayne writes
> Me
>> [... careful explanation of Hamming codes elided ...]
>>Now what if you want to correct two-bit errors?  I think you can just
>>repeat the process (make a hamming code for a 12-bit word), and go
>>through two iterations of correction and make out all right most of
>>the time (all of the time?).  But that's not even close to rigorous.
>
>I'm not sure about this.  I don't think error correcting for two bit
>errors is that simple, but I haven't pursued it.  Any other ideas or
>better yet informed answers?

First off, it's occurred to me that unless you permute the bits, a
second iteration of Hamming coding would be pretty silly.  If you do
permute them (say reverse order), well...  I still need to think about
that.

Second off, it should be noted that some extra corrective redundancy
can be had by applying Hamming to a smaller word.  But as the word gets
smaller you approach pure redundancy.  Here permuting might be used to
get you away from errors which hit several adjacent bits.

And, I suppose the most general way of correcting N bit errors is with
redundancy greater than 1+2N.  Still, I think the original question
that Ray (or whoever it was) was posing was how do you get a Hamming
distance greater than 2 between the encodings of a word of data.  I'm
inclined to say take a Gray code, and choose every n-th number (where n
is a number that depends on the number of bits distance you want).

   0 0 0 0 0 0    start: 0 bits distance
   0 0 0 0 0 1    n=1   1 bit distance  (straight encoding)
   0 0 0 0 1 1    n=2   2 bit distance  (analogous to hamming code)
   0 0 0 0 1 0
   0 0 0 1 1 0
   0 0 0 1 1 1    n=5   3 bit distance ????? (won't wrap around properly)
   0 0 0 1 0 1

and I think n=10 is 4 bits distance.  I dunno, I still feel like I'm
missing something pretty basic.

(Thanks for posting Bill, I think you did capture the exact definition
of Hamming code.)

Raul Rockwell
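[One way to put the every-n-th-Gray-code idea to the test, sketched in
Python (not from the posts): compute the minimum Hamming distance over
*all* pairs of chosen codewords, since that, not just the distance
between neighbours in the sequence, is what determines correcting
power.]

```python
def gray(i):
    """i-th binary-reflected Gray code."""
    return i ^ (i >> 1)

def hamming_distance(a, b):
    """Number of bit positions at which a and b differ."""
    return bin(a ^ b).count("1")

def min_distance(codewords):
    """Minimum pairwise Hamming distance of a candidate code."""
    return min(hamming_distance(a, b)
               for i, a in enumerate(codewords)
               for b in codewords[i + 1:])
```

Every 2nd Gray code over 4 bits gives minimum distance 2 (those words
are exactly the even-weight ones, i.e. a parity code), but every 5th
Gray code over 6 bits already contains a pair at distance 2, which
backs up the "won't wrap around properly" suspicion about n=5.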
cs450a03@uc780.umd.edu (03/26/91)
Please note, in my gauche analogy of Gray code to Hamming, that parity
has a bit distance of 2, Hamming has a bit distance of 3 (and is not
distributed linearly), and "2 bit error correcting" ought to have a bit
distance of at least 5 (a distance of 4 only lets you correct one bit
while detecting, but not correcting, two-bit errors).  *sigh*

Also note that coming up with a distribution is not sufficient.  There
must also exist a method to correct the errors that can occur.

Raul Rockwell
mayne@delta.cs.fsu.edu (William Mayne) (03/26/91)
In article <26MAR91.00093317@uc780.umd.edu> cs450a03@uc780.umd.edu writes:
>[... careful explanation of Hamming codes elided ...]

Thanks, but as I think back on it it wasn't so careful.  When I gave
the indexes of the check bits it should obviously have been:
2**(int(log2(n))),...,4,2,1

>>I'm not sure about this.  I don't think error correcting for two bit
>>errors is that simple, but I haven't pursued it.  Any other ideas or
>>better yet informed answers?
>
>First off, it's occurred to me that unless you permute the bits, a
>second iteration of Hamming coding would be pretty silly.  If you do
>permute them (say reverse order), well...  I still need to think about
>that.

You are right, at least about the first part.  Also I can't think of
any way to permute the bits for a second application of a Hamming code
to correct 2 bit errors.  But surprisingly (to me) the number of
redundant bits needed to correct errors in up to 2 bits is only about
twice that required for 1 bit correction.  I would have expected more
than that.

>Second off, it should be noted that some extra corrective redundancy
>can be had by applying Hamming to a smaller word.  But as the word
>gets smaller you approach pure redundancy.  Here permuting might be
>used to get you away from errors which hit several adjacent bits.

The method I described of slicing words of a message into bits and
applying Hamming coding to corresponding bits from consecutive words
will work to correct errors in n consecutive bits, where n is the word
size, whether or not the n or fewer consecutive corrupted bits cross a
word boundary.

>And, I suppose the most general way of correcting N bit errors is with
>redundancy greater than 1+2N.

The exact amount of redundancy needed to correct up to m bad bits in a
block of n bits (where n counts both bits of useful data and check
bits) is:

   m=1 -> log2(n+1)          [This proves the Hamming code is the most
                              efficient possible to correct 1 bit errors.]
   m=2 -> log2(n(n-1)/2+n+1) [Only about twice as many check bits.]
   m=3 -> log2(n(n-1)(n-2)/6+n(n-1)/2+n+1)

[Unless I've made an algebraic mistake, which is not unlikely.]

The general rule for this is derived as follows:  For n bits there are
2**n distinct configurations.  If there are up to m bits which may be
corrupted there are f(m,n) different ways that one intended message can
be coded, where f(m,n) is as defined below.  So n bits with up to m
errors can contain (2**n)/f(m,n) different messages, or log2 of that
many bits of data.  Thus the number of bits devoted to error correction
is log2(f(m,n)).

To construct f(m,n) we have to ask how many configurations there are
for a given message with Hamming distance less than or equal to m.  For
m=1 this is n+1 (the correct version plus one for each possible error
position).  For m=2 it is C(n,2)+n+1 = n(n-1)/2+n+1, i.e. n(n-1)/2 ways
to choose 2 bad bits, n with 1 bad bit, and 1 with no bad bits.  For
m=3 it is C(n,3)+C(n,2)+n+1.  In general you just keep adding binomial
terms: f(m,n) is the sum of C(n,i) for i=0 to m.  Obviously the result
will be O(n**m).

This establishes the lower limit on redundancy.  The Hamming code for
m=1 is optimal and the specific coding used is elegant since it is so
easy to understand and compute.  Unfortunately I don't know of any
elegant codes for m>1, or if there is a general rule to construct them
in an elegant way.

CAVEAT: This isn't really my specialty.  I've had only minimal formal
instruction in error correcting codes and derived the formula and proof
that Hamming codes are optimal myself.  If I've made some gross error
either in my thinking or in typing it in I expect I'll hear about it.
Asbestos suit on.

>[Suggestion based on Gray code omitted.]

I believe the idea is correct, if I understand you correctly.  But this
doesn't answer the question of how to actually map data to and from the
redundant representation needed.  It would be nice if there were a
simple algorithm such as the one for the Hamming code which could be
applied for any number of errors.  Unfortunately I don't have time to
pursue that.

Bill Mayne (mayne@cs.fsu.edu)
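[The bound just discussed can be checked numerically.  A Python sketch,
using the standard binomial count of error patterns, C(n,i) ways to
choose i bad positions out of n; the function names are mine.]

```python
from math import comb, floor, log2

def sphere_size(n, m):
    """Number of n-bit blocks within Hamming distance m of a codeword:
    sum of C(n,i) for i = 0..m."""
    return sum(comb(n, i) for i in range(m + 1))

def max_data_bits(n, m):
    """Sphere-packing bound: the most data bits an n-bit block can carry
    while still correcting up to m bit errors."""
    return floor(n - log2(sphere_size(n, m)))
```

For m=1 the bound is met exactly when n = 2**r - 1: a 7-bit block
carries 4 data bits (the classic Hamming (7,4) code) and a 15-bit block
carries 11.  For m=2 a 15-bit block can carry at most 8 data bits, so
roughly twice as many check bits as for m=1, matching the observation
above.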
ddr@cs.edinburgh.ac.uk (Doug Rogers) (03/29/91)
The first posting on this was flawed in that it did not take account of
the check bits themselves being faulty.

I've always thought of Hamming codes in terms of the binary value of
the bit address.  Example for 16 bits:

   Bit  address  Purpose
    0   0000     overall parity check for 2 bit error detection
    1   0001     check bit
    2   0010     check bit
    3   0011     data
    4   0100     check bit
    5   0101     data
    6   0110     data
    7   0111     data
    8   1000     check bit
    9   1001     data
    A   1010     data
    B   1011     data
    C   1100     data
    D   1101     data
    E   1110     data
    F   1111     data

By making the check sum for all bits with one of their address bits set
at 1 be even, then the offending error is found by anding the addresses
of the check bits that fail on receipt.

For actual implementation the addresses may be shuffled to combine the
check bits at one end of the message.  This though illustrates that:

   s' := log2(s') + s + 1

where s' is the total number of bits needed and s is the number of bits
available for data, which is not soluble to give an easy equation for
the number of check bits needed.
--
Douglas Rogers                  JANET: ddr@uk.ac.ed.lfcs
Department of Computer Science  UUCP:  ..!mcvax!ukc!lfcs!ddr
University of Edinburgh         ARPA:  ddr%lfcs.ed.ac.uk@nsfnet-relay.ac.uk
Edinburgh EH9 3JZ, UK.          Tel:   031-650 5172 (direct line)
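[The relation is implicit in s', but easy to solve by iteration.  A
Python sketch (mine, following the layout above: r Hamming check bits
plus the overall parity bit at address 0; r must satisfy
2**r >= s + r + 1 so that every data and check bit gets a distinct
non-zero address):]

```python
def total_bits(s):
    """Smallest block size holding s data bits: r Hamming check bits
    (the smallest r with 2**r >= s + r + 1) plus one overall parity
    bit at address 0."""
    r = 0
    while (1 << r) < s + r + 1:
        r += 1
    return s + r + 1
```

For the 16-bit layout above, total_bits(11) gives 16: 11 data bits,
4 check bits at addresses 1, 2, 4, 8, and the parity bit at address 0.
The same rule gives 13 bits for a byte of data and 72 bits for a 64-bit
word.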
mayne@delta.cs.fsu.edu (William Mayne) (03/30/91)
In article <8380@skye.cs.ed.ac.uk> ddr@cs.edinburgh.ac.uk (Doug Rogers) writes:
>The first posting on this was flawed in that it did not take account
>of the check bits themselves being faulty.
>
>I've always thought of hamming codes in terms of the binary value of
>the bit
>
>Example for 16 bits of which
>
>   Bit  address  Purpose
>    0   0000     overall parity check for 2 bit error detection
>    1   0001     check bit
>[Rest of table omitted]

For brevity and generality, the check bits are exactly those whose bit
addresses are powers of 2, namely 1, 2, 4, 8...  The property of
interest for the purpose of understanding the check calculation is that
the binary representation has just one 1.

Numbering a bit 0 and using it for overall parity is separate from the
Hamming code.  It is a good idea since the Hamming code, which starts
numbering bits with 1, is most efficient when the word size is a
Mersenne number (2^n-1) and common word sizes are 2^n.  In other words
that extra bit would be useless in a pure Hamming code: 2^n bits can
contain no more data bits than 2^n-1, so the extra bit in a word size
of 2^n might as well be used as a parity bit as suggested.  Otherwise
it would go to waste.  The reason is clear: if the Hamming code were
applied over the whole word the last bit would be numbered 2^n and that
would be another check bit.

>By making the check sum for all bits with one of their address bits
>set at 1 be even, then the offending error is found by anding the
>addresses of the check bits that fail on receipt.

I find this confusing and probably wrong.  I say "probably wrong"
because I frankly don't understand it clearly enough to apply and test
it.  I suggest the following alternative description.

The receiver's calculation of the check, which uses only the bits
numbered 1 and higher, is this:

   check := 0
   for i = 1 to n                     /* n = highest bit address */
      if bit(i) = 1 then
         check := check XOR i         /* bitwise exclusive or of binary addresses */
   if check not = 0 then              /* check = address of the error bit */
      bit(check) := bit(check) XOR 1  /* correct bit(check) */

The key to the Hamming coding is that the check bits are calculated to
make the check "sum" come out 0 if there are no errors.  The sender
does this by taking the XOR of the bit positions of all the data bits
which are 1.  This will be a number <= n.  That number is then used to
set the check bits.  Say the check on the data bits only came out to be
6 (binary 0110).  Then set check bits 4 and 2 (4+2=6).  Now the check
over all the bits (except the parity bit) will be 0.  If any one of the
data or check bits is changed the check sum will be the address of that
bit.

The added parity check bit is calculated by the sender after the check
bits are set.  Either even or odd parity can be used for this.  The
receiver performs the Hamming correction shown above before checking
the parity, which is intended to detect two bit errors.  (No pun
intended, though I don't think this would be a pun in the U.K. anyway.
I note that the previous post was from Scotland.)  It is only possible
to detect a two bit error, not to repair it.

By the way, note the one single-bit error the Hamming check alone never
sees: a flip of the parity bit itself, which plays no part in the
Hamming code.  It shows up as a zero Hamming check with bad parity.
Since any two bit error leaves a non-zero Hamming check (the XOR of two
distinct addresses, at least one of them non-zero), this case can only
be the flipped parity bit, so even it can be repaired.

The two bit error detection works because any two bit error not
involving bit 0 will cause the calculated check to be non-zero, causing
the Hamming check to change 1 bit (certainly not one of the two which
were wrong, since the XOR of two distinct non-zero addresses differs
from both).  The parity would have been correct before, so the Hamming
"correction" will cause the resulting parity to be wrong.  A two bit
error involving bit 0 will be detected because the Hamming correction
will correctly repair the other bad bit, but the parity bit itself will
not be corrected by the Hamming correction.

>For actual implementation the addresses may be shuffled to combine the
>check bits at one end of the message.  This though illustrates that:
>
>   s' := log2(s') + s + 1
>
>where s' is the total number of bits needed and s is the number of
>bits available for data, which is not soluble to give an easy equation
>for the number of check bits needed.

This is actually easier than it seems.  Of s' bits exactly log2(s')
will be used for check bits, which is to say just the number of bits
whose addresses are powers of 2.  The rest are available for data, so
s = s' - log2(s'), or s' = s + log2(s').  This ignores the extra 1 for
the parity bit which, as shown above, isn't actually part of the
Hamming code scheme.

I wouldn't say that it is difficult to find s' given s.  The log2
calculation asks only for the largest integer in log2(s').  As a first
approximation try substituting log2(s).  Chances are, when truncated,
log2(s) and log2(s') will be the same.  Next calculate s' = s +
log2(s).  If it happens that log2(s') is not actually equal to log2(s),
just increase s' by 1.  If s' - log2(s') is bigger than the s you
started with, it just means that for the log2(s') check bits you get
more data bits.  If you don't need them, subtract the extra bits from
s' for the actual number of total bits you need to encode s data bits.
(This might be clearer if I renamed s and s' at each step, but I find
two names confusing rather than helpful.)

Theory: The trick to correcting a block containing up to n error bits
is to map the data bit string onto a larger bit string containing check
bits in such a way that the Hamming distance between the encodings of
any two bit strings is greater than 2n.  The Hamming distance between
any two bit strings of equal size is just the number of bit positions
at which the two strings are different, i.e. the number of 1s in the
bitwise XOR of the strings.  Then if n or fewer bits are changed the
result cannot be within distance n of the proper image of any other
data bits.  Correction is accomplished by determining which data would
yield a coded string whose Hamming distance from the coded string
actually received is n or less.  This is guaranteed to be unique.
Another way of looking at it is to find a way to convert the corrupted
block to a valid one by changing n or fewer bits.

The great elegance of the Hamming code described above for the case of
n=1 is that it is the most efficient possible (i.e. it requires the
fewest possible check bits for any given number of data bits) and it is
easy to calculate.  I don't know how to generalize it for n>1, or if
any similarly elegant algorithms for larger n are known.

Bill Mayne (mayne@cs.fsu.edu)
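[Putting the whole scheme together, here is a Python sketch of the
single-error-correcting, double-error-detecting code described above
(bit numbering as in the posts, even overall parity at address 0; the
function names and the 'ok'/'corrected'/'double' labels are mine).]

```python
def hamming_check(word):
    """XOR of the addresses (>= 1) of all bits that are 1."""
    c = 0
    for pos, bit in word.items():
        if pos >= 1 and bit:
            c ^= pos
    return c

def encode(databits, n):
    """Data in the non-power-of-2 addresses 1..n, check bits at the
    powers of 2, overall (even) parity at address 0."""
    word = {i: 0 for i in range(0, n + 1)}
    data = iter(databits)
    for i in range(1, n + 1):
        if i & (i - 1):                 # not a power of 2: data slot
            word[i] = next(data)
    c = hamming_check(word)
    p = 1
    while p <= n:
        word[p] = 1 if c & p else 0     # make the block's check come out 0
        p <<= 1
    word[0] = sum(word[i] for i in range(1, n + 1)) & 1
    return word

def decode(word):
    """Hamming correction first, then the parity verdict, as described.
    Repairs word in place; returns 'ok', 'corrected', or 'double'
    (a two bit error: detected but not repairable)."""
    c = hamming_check(word)
    if c:
        word[c] ^= 1                    # tentative single-bit repair
    parity_ok = (sum(word.values()) & 1) == 0
    if parity_ok:
        return "ok" if c == 0 else "corrected"
    if c == 0:
        word[0] ^= 1                    # lone flip of the parity bit itself
        return "corrected"
    return "double"                     # non-zero check AND bad parity
```

Flipping any single bit, including a check bit or the parity bit, gets
repaired; flipping two bits is reported as 'double' (the tentative
Hamming repair then lands on a third, innocent bit, which is exactly
why the parity comes out wrong).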