[comp.theory] Hamming codes 101

cs450a03@uc780.umd.edu (03/25/91)

This is to follow up to another posting by Ray something-or-other (in
another newsgroup).  Last time I dealt with this stuff was in '89, so
feel free to jump on me where I screw up.

Hamming codes, in their basic form, are a specialized application of
parity.  Essentially, bits can be grouped together, and parity can be
computed for each grouping, so that if some error flips one of the
bits (either data or parities) it leaves a signature which can spot
the impact point.

Example, take an 8 bit word: bit 1, bit 2, bit 3, bit 4, ... bit 8
assign 4 parity bits, call them bit 9, bit 10, bit 11, and bit 12

call bits 1 3 5 7 and 9 a grouping, and set bit 9 for odd parity
call bits 2 3 6 7 and 10 a grouping, and set bit 10 for odd parity
call bits 4 5 6 7 and 11 a grouping, and set bit 11 for odd parity
call bits 8 thru 12 a grouping, and set bit 12 for odd parity

That feels a bit lumpy, but nevermind...

Now, an error flips a bit, and wango, you get the bit's address in the
parity bits (bits 9 and 11 go off; that's 12 11 10 9 : 0 1 0 1 : error in
bit 5).

Now what if you want to correct two-bit errors?  I think you can just
repeat the process (make a hamming code for a 12-bit word), and go
through two iterations of correction and make out all right most of
the time (all of the time?).  But that's not even close to rigorous.  

Somebody want to bail me out here, before I reveal myself as a total
idiot?

Thanks,

Raul Rockwell

P.S. remember: underlying assumption is we're just considering
bit-flipping, no framing errors.

mayne@zeta.cs.fsu.edu (William Mayne) (03/26/91)

In article <25MAR91.01330289@uc780.umd.edu> cs450a03@uc780.umd.edu writes:

[Description and example of a Hamming code omitted.]

Here is an alternative description of a Hamming code which I think is
easier to understand and generalize.  The purpose of the code is to
detect and correct 1 bit errors in blocks of bits. Number the bits
1,2,3,...n, including both data and check bits. The trick is to devise
a check function which will be 0 if no bit errors have occurred and
will indicate the location of any single bit error. Any block containing
a one bit error can then be corrected by changing the indicated bit.

The required check function is based on the bit wise exclusive or of
the binary representation of the position of each bit which is 1.
The check bits are all those whose indexes are powers of 2. It is
convenient to number the bits of a block so that the check bits are
all collected together at the end, thus:

[data]  3,5,6,7,9,...n
[check] int(log2(n)),...,4,2,1

This assumes that n itself is not a power of 2. Otherwise it would
be a check bit. The method would still work, but is easier to explain
if n is not a power of 2 and is actually most efficient when n=2**m-1.

The values of the check bits can be computed by computing the check
value of the data bits. The binary representation of the result gives
the bit string for the check bits (right justified). It is easy to see
that this makes the check value of the whole block 0, and that if one
bit is changed the new check value will be the position of that bit,
regardless of whether it is a data bit or a check bit. This is the most
efficient possible coding which will detect and correct one bit errors.
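
To make the description concrete, here is a minimal C sketch (my own
illustration, not part of the original post; the block size and array
layout are arbitrary assumptions).  It fills the check bits at the
power-of-2 positions from the check value of the data bits, then shows
that flipping any single bit makes the check value point at it.

/* Minimal sketch of the check calculation described above.
 * Bits are numbered from 1; check bits occupy positions 1, 2, 4, 8. */
#include <stdio.h>

#define N 15
static int bit[N + 1];                  /* bit[1..N]; index 0 unused */

static int check_value(void)            /* XOR of the positions holding a 1 */
{
    int check = 0;
    for (int i = 1; i <= N; i++)
        if (bit[i]) check ^= i;
    return check;
}

int main(void)
{
    bit[3] = bit[6] = bit[11] = 1;      /* some example data bits */

    /* Sender: write the check value of the data bits into the check
     * positions so that the check over the whole block becomes 0.   */
    int c = check_value();              /* only data bits are set so far */
    for (int k = 0; (1 << k) <= N; k++)
        bit[1 << k] = (c >> k) & 1;
    printf("check over the whole block: %d\n", check_value());  /* 0 */

    bit[11] ^= 1;                       /* flip any one bit, data or check */
    printf("check now points at bit %d\n", check_value());      /* 11 */
    return 0;
}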

Since only about log2(n) extra bits are required for the remaining
data bits (int(log2(n))+1 of them, to be exact), the proportion of
overhead shrinks rapidly as the block size grows, but so does the
chance of a 2 bit error.

Implementation may be easier and reliability improved if several
checks are performed in parallel. For example, given a stream of
8 bit bytes to be encoded, compute 8 check values, so each byte
contributes just one bit to each of the check values. Besides
being arguably easier to program (less bit shifting), this will
correct errors in up to 8 consecutive bits, an advantage if noise
is likely to occur in short bursts.
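
As a sketch of that interleaving (my own illustration; the block length
and byte values are arbitrary assumptions), all eight check values can
be computed in one pass: XOR the byte number into check[j] wherever bit
j of that byte is set, so each check value is an ordinary Hamming check
over one bit plane.

/* Eight Hamming checks run in parallel over a stream of bytes.
 * Byte i (numbered from 1) contributes its bit j to check[j], so a
 * burst of up to 8 consecutive bad bits touches each bit plane at
 * most once and every plane can still correct its single error
 * (given matching check bytes, omitted in this sketch).            */
#include <stdio.h>

#define BYTES 11                        /* example block, bytes 1..BYTES */

int main(void)
{
    unsigned char block[BYTES + 1] =
        {0, 0x17, 0x2a, 0x00, 0x3c, 0x51, 0x08, 0x7f, 0x90, 0xe3, 0x44, 0x65};
    int check[8] = {0};

    for (int i = 1; i <= BYTES; i++)
        for (int j = 0; j < 8; j++)
            if ((block[i] >> j) & 1)
                check[j] ^= i;          /* XOR the byte number into plane j */

    for (int j = 0; j < 8; j++)
        printf("check value for bit plane %d: %d\n", j, check[j]);
    return 0;
}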

>Now what if you want to correct two-bit errors?  I think you can just
>repeat the process (make a hamming code for a 12-bit word), and go
>through two iterations of correction and make out all right most of
>the time (all of the time?).  But that's not even close to rigorous.

I'm not sure about this. I don't think error correcting for two bit
errors is that simple, but I haven't pursued it. Any other ideas or
better yet informed answers?

Bill Mayne (mayne@cs.fsu.edu)

rjc@geech.gnu.ai.mit.edu (Ray Cromwell) (03/26/91)

   I don't want to offend the poster in any way, but the first explanation
of Hamming codes was easier for me to understand. It was short and simple
and used example diagrams. The second explanation was valuable, but I got
lost in the int(log()) stuff and had to re-read it a few times
before I caught on. This Hamming code discussion is very educational
for me since I always wondered how they worked, so if anyone else
adds to this thread (correction of 2 bit errors), try to use examples
of bit streams.



--
/~\_______________________________________________________________________/~\
|n|   rjc@albert.ai.mit.edu   Amiga, the computer for the creative mind.  |n|
|~|                                .-. .-.                                |~|
|_|________________________________| |_| |________________________________|_|

cs450a03@uc780.umd.edu (03/26/91)

William Mayne writes        >
Me                          >>
  [... careful explanation of Hamming codes elided ...]
>>Now what if you want to correct two-bit errors?  I think you can just
>>repeat the process (make a hamming code for a 12-bit word), and go
>>through two iterations of correction and make out all right most of
>>the time (all of the time?).  But that's not even close to rigorous.
>
>I'm not sure about this. I don't think error correcting for two bit
>errors is that simple, but I haven't pursued it. Any other ideas or
>better yet informed answers?

First off, it's occurred to me that unless you permute the bits, a
second iteration of Hamming coding would be pretty silly.  If you do
permute them (say reverse order), well... I still need to think about
that. 

Second off, it should be noted that some extra corrective redundancy
can be had by applying Hamming to a smaller word.  But as the word
gets smaller you approach pure redundancy.  Here permuting might be
used to get you away from errors which hit several adjacent bits.

And, I suppose the most general way of correcting N bit errors is with
redundancy greater than 1+2N.  

Still, I think the original question that Ray (or whoever it was) was
posing was how do you get a Hamming distance greater than 2 between
the encodings of a word of data.  I'm inclined to say take a Gray
code, and choose every n-th number (where n is a number that depends
on the number of bits distance you want).

0 0 0 0 0 0    start: 0 bits distance 
0 0 0 0 0 1    n=1    1 bit distance  (straight encoding)
0 0 0 0 1 1    n=2    2 bit distance  (analogous to hamming code)
0 0 0 0 1 0
0 0 0 1 1 0
0 0 0 1 1 1    n=5    3 bit distance ????? (won't wrap around properly)
0 0 0 1 0 1

and I think n=10 is 4 bits distance.

I dunno, I still feel like I'm missing something pretty basic.

(Thanks for posting Bill, I think you did capture the exact definition
of Hamming code.)

Raul Rockwell

cs450a03@uc780.umd.edu (03/26/91)

Please note, in my gauche analogy of Gray code to Hamming, that parity
has a bit distance of 2, Hamming has a bit distance of 3 (and is not
distributed linearly), and "2 bit error correcting" ought to have a
bit distance of at least 5 (a distance of 4 would let you correct one
error and detect a second, but not correct it).

*sigh*

Also note that coming up with a distribution is not sufficient.  There
must also exist a method to correct the errors that can occur.

Raul Rockwell

mayne@delta.cs.fsu.edu (William Mayne) (03/26/91)

In article <26MAR91.00093317@uc780.umd.edu> cs450a03@uc780.umd.edu writes:
>  [... careful explanation of Hamming codes elided ...]

Thanks, but as I think back on it it wasn't so careful.
When I gave the indexes of the check bits it should obviously have been:

2**(log2(n)),...,4,2,1

>>I'm not sure about this. I don't think error correcting for two bit
>>errors is that simple, but I haven't pursued it. Any other ideas or
>>better yet informed answers?
>
>First off, it's occurred to me that unless you permute the bits, a
>second iteration of Hamming coding would be pretty silly.  If you do
>permute them (say reverse order), well... I still need to think about
>that.

You are right, at least about the first part. Also I can't think of
any way to permute the bits for a second application of a Hamming code
to correct 2 bit errors. But surprisingly (to me) the number of
redundant bits needed to correct errors in up to 2 bits is only about
twice that required for 1 bit correction. I would have expected more
than that.

>Second off, it should be noted that some extra corrective redundancy
>can be had by applying Hamming to a smaller word.  But as the word
>gets smaller you approach pure redundancy.  Here permuting might be
>used to get you away from errors which hit several adjacent bits.

The method I described of slicing words of a message into bits and
applying Hamming coding to corresponding bits from consecutive words
will work to correct errors in n consecutive bits, where n is the
word size, whether or not the n or fewer consecutive corrupted bits
cross a word boundary.

>And, I suppose the most general way of correcting N bit errors is with
>redundancy greater than 1+2N.

The minimum amount of redundancy needed for a word size of n (where n
counts both bits of useful data and check bits) is:

m=1 -> log2(n+1)  [This proves the Hamming code is the most efficient
                  possible to correct 1 bit errors.]

m=2 -> log2(n(n-1)/2 + n + 1) [Only about twice as many check bits.]

m=3 -> log2(n(n-1)(n-2)/6 + n(n-1)/2 + n + 1) [Unless I've made an
                               algebraic mistake, which is not unlikely.]

The general rule for this is derived as follows:

For n bits there are 2**n distinct configurations.
If up to m bits may be corrupted there are f(m,n)
different ways that one intended message can arrive, where f(m,n)
is as defined below. So n bits with up to m errors can carry at most
(2**n)/f(m,n) different messages, or log2 of that many bits of
data. Thus at least log2(f(m,n)) of the bits must be devoted to
error correction.

To construct f(m,n) we have to ask how many received configurations
there are within Hamming distance less than or equal to m of a given
message. For m=1 this is n+1 (the correct version plus one for each
possible error position). For m=2 it is C(n,2)+n+1 = n(n-1)/2+n+1,
i.e. C(n,2) ways to place 2 bad bits, n with 1 bad bit, and 1 with no
bad bits. For m=3 it is C(n,3)+C(n,2)+n+1. It is hard to represent the
general form in ascii, but you just keep adding the binomial terms
C(n,i) for i up to m. Obviously the result will be O(n**m).
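
As a small sanity check of this counting (my own sketch; the block size
and error budget are arbitrary), f(m,n) is the sum of the binomial
terms C(n,i) for i = 0..m, and the number of check bits it forces is
the smallest r with 2**r >= f(m,n).

/* Sphere-packing count f(m,n) and the check bits it demands. */
#include <stdio.h>

static unsigned long choose(int n, int k)   /* C(n,k), exact in integers */
{
    unsigned long c = 1;
    for (int i = 1; i <= k; i++)
        c = c * (n - k + i) / i;
    return c;
}

int main(void)
{
    int n = 15, m = 2;                      /* example: 15 bit block, 2 errors */
    unsigned long f = 0;
    for (int i = 0; i <= m; i++)
        f += choose(n, i);

    int r = 0;
    while ((1UL << r) < f)                  /* smallest r with 2**r >= f(m,n) */
        r++;

    printf("f(%d,%d) = %lu, so at least %d of the %d bits must be check bits\n",
           m, n, f, r, n);
    return 0;
}

For n=15 and m=2 this prints f=121 and at least 7 check bits, which is
consistent with "about twice" the 4 check bits needed for m=1.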

This establishes the lower limit on redundancy. The Hamming code for
m=1 is optimal and the specific coding used is elegant since it is
so easy to understand and compute. Unfortunately I don't know of any
elegant codes for m>1, or whether there is a general rule to construct
them in an elegant way.

CAVEAT: This isn't really my specialty. I've had only minimal
formal instruction in error correcting codes and derived the
formula and proof that Hamming codes are optimal myself. If
I've made some gross error either in my thinking or in typing
it in I expect I'll hear about it. Asbestos suit on.

>[Suggestion based on Gray code omitted.]

I believe the idea is correct, if I understand you correctly.
But this doesn't answer the question of how to actually map
data to and from the redundant representation needed. It would be
nice if there is a simple algorithm such as the one for the Hamming
code which could be applied for any number of errors. Unfortunately
I don't have time to pursue that.

Bill Mayne (mayne@cs.fsu.edu)

ddr@cs.edinburgh.ac.uk (Doug Rogers) (03/29/91)

The first posting on this was flawed in that it did not take account of the check bits themselves being faulty.

I've always thought of Hamming codes in terms of the binary value of the bit's address.

Example for 16 bits, of which:

        Bit address   Purpose
        0  0000       overall parity check for 2 bit error detection
        1  0001       check bit
        2  0010       check bit
        3  0011       data
        4  0100       check bit
        5  0101       data
        6  0110       data
        7  0111       data
        8  1000       check bit
        9  1001       data
        A  1010       data
        B  1011       data
        C  1100       data
        D  1101       data
        E  1110       data
        F  1111       data



By making the check sum for all bits with one of their address bits set at 1 be 
even, then the offending error is found by anding the addresses of the check bits that fail on receipt. 

For actual implementation the addresses may be shuffled to combine the check bits at one end of the message. This though illustrates that :
   s':=lg2(s')+s+1

where s' is the total number of bits needed and s is the number of bits available for
data.
which is not solvable to give an easy equation for the number of check bits needed.
-- 
Douglas Rogers                     JANET: ddr@uk.ac.ed.lfcs
Department of Computer Science     UUCP:  ..!mcvax!ukc!lfcs!ddr
University of Edinburgh            ARPA:  ddr%lfcs.ed.ac.uk@nsfnet-relay.ac.uk
Edinburgh EH9 3JZ, UK.             Tel:   031-650 5172 (direct line)

mayne@delta.cs.fsu.edu (William Mayne) (03/30/91)

In article <8380@skye.cs.ed.ac.uk> ddr@cs.edinburgh.ac.uk (Doug Rogers) writes:
>The first posting on this was flawed in that it did not take account of the check bits themselves being faulty.
>
>I've always thought of Hamming codes in terms of the binary value of the bit's address.
>
>Example for 16 bits, of which:
>
>        Bit address   Purpose
>        0  0000       overall parity check for 2 bit error detection
>        1  0001       check bit
>[Rest of table omitted]

For brevity and generality, the check bits are exactly those whose bit
addresses are powers of 2, namely 1, 2, 4, 8...
The property of interest for the purpose of understanding the check
calculation is that the binary representation has just one 1.

Numbering a bit 0 and using it for overall parity is separate from
the Hamming code. It is a good idea since the Hamming code which
starts numbering bits with 1 is most efficient when the word size
is a Mersenne number (2^n-1) and common word sizes are 2^n.
In other words that extra bit would be useless in a pure Hamming
code: 2^n bits can contain no more data bits than 2^n-1 bits can,
so the extra bit in a word size of 2^n might as well be used as a
parity bit as suggested. Otherwise it would go to waste. The reason
is clear: if the Hamming code were applied over the whole word the
last bit would be numbered 2^n and that would be another check bit.
 
>By making the check sum for all bits with one of their address bits set at 1 be 
>even, then the offending error is found by anding the addresses of the check bits that fail on receipt. 

I find this confusing and probably wrong. I say "probably wrong" because I
frankly don't understand it clearly enough to apply and test it. I suggest
the following alternative description.

The receiver's calculation of the check, which uses only the bits
numbered 1 and higher, is this:

check := 0;
for i := 1 to n                 /* n = highest bit address */
  if bit(i) = 1 then check := check XOR i  /* XOR of the binary bit position */
if check not = 0 then           /* check = address of error bit */
  bit(check) := bit(check) XOR 1           /* correct bit(check) */

The key to the Hamming coding is that the check bits are calculated
to make the check "sum" come out 0 if there are no errors. The sender
does this by taking the XOR of the bit positions of all the data bits
which are 1. This will be a number <= n. That number is then used to
set the check bits. Say the check on the data bits only came out to
be 6 (binary 0110). Then set check bits 4 and 2 (4+2=6). Now the
check over all the bits (except the parity bit) will be 0. If
any one of the data or check bits is changed the check sum will be
the address of that bit.

The added parity check bit is calculated by the sender after the
check bits are set. Either even or odd parity can be used for this.

The receiver performs the Hamming correction shown above before
checking the parity, which is intended to detect two bit errors.
(No pun intended, though I don't think this would be a pun in
the U.K. anyway. I note that the previous post was from Scotland.)

It is only possible to detect a two bit error, not to repair it.
By the way, it is ironic to note that there is a 1 bit error which
cannot be distinguished from a two bit error and therefore can't
be repaired. That is if the parity bit, which played no part in the
Hamming code, is flipped.

The two bit error detection works because any two bit error not
involving bit 0 will cause the calculated check to be non-zero,
causing the Hamming check to change 1 bit (and certainly not one of
the two which were wrong, since the XOR of two distinct nonzero
positions can equal neither of them). The parity would have been
correct before, so the Hamming correction will cause the resulting
parity to be wrong. A two bit error involving bit 0 will be detected
because the Hamming correction will correctly repair the other bad
bit, but the parity bit itself will not be corrected by the Hamming
correction.
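
Pulling the sender's and receiver's steps together, here is a small C
sketch of the procedure as described (my own illustration; the block
size, storage layout, and helper names are assumptions, not anything
from the posts).  Bit 0 carries the overall even parity, the check bits
sit at the power-of-2 positions, and the receiver applies the Hamming
correction before testing the parity.

/* bits[0] is the overall parity bit; bits 1..N form the Hamming block
 * with check bits at the power-of-2 positions.                        */
#include <stdio.h>

#define N 15
static int bits[N + 1];

static int hamming_check(void)          /* XOR of positions of all 1 bits */
{
    int check = 0;
    for (int i = 1; i <= N; i++)
        if (bits[i]) check ^= i;
    return check;
}

static int overall_parity(void)         /* XOR of every bit, 0..N */
{
    int p = 0;
    for (int i = 0; i <= N; i++)
        p ^= bits[i];
    return p;
}

static void encode(void)
{
    int c = 0;
    for (int i = 1; i <= N; i++)        /* check value over the data bits only */
        if (bits[i] && (i & (i - 1)) != 0)
            c ^= i;
    for (int k = 0; (1 << k) <= N; k++) /* place it in the check positions */
        bits[1 << k] = (c >> k) & 1;
    bits[0] = 0;
    bits[0] = overall_parity();         /* make the overall parity even */
}

/* Returns 0 = clean, 1 = single error corrected, 2 = error detected only. */
static int decode(void)
{
    int syndrome = hamming_check();     /* Hamming correction first ...       */
    if (syndrome)
        bits[syndrome] ^= 1;
    if (overall_parity() == 0)          /* ... then the parity test            */
        return syndrome ? 1 : 0;
    return 2;                           /* two bad bits, or a lone flip of bit 0 */
}

int main(void)
{
    bits[3] = bits[5] = bits[9] = 1;    /* some example data bits */
    encode();

    bits[9] ^= 1;                       /* one flipped bit: corrected */
    printf("single error -> %d\n", decode());

    bits[6] ^= 1; bits[11] ^= 1;        /* two flipped bits: detected */
    printf("double error -> %d\n", decode());
    return 0;
}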

>For actual implementation the addresses may be shuffled to combine the check bits at one end of the message. This though illustrates that :
>   s':=lg2(s')+s+1
>
>where s' is the total number of bits needed and s is the number of bits available for
>data.
>which is not solvable to give an easy equation for the number of check bits needed.

This is actually easier than it seems. Of s' bits, exactly int(log2(s'))+1
are used for check bits, which is to say just the number of bit addresses
that are powers of 2 (1, 2, 4, ...). The rest are available for data, so
s = s'-int(log2(s'))-1, or s' = s+int(log2(s'))+1. The parity bit at
address 0 is an extra bit on top of s' which, as shown above, isn't
actually part of the Hamming code scheme.

I wouldn't say that it is difficult to find s' given s. The log2 calculation
asks only for the integer part of log2(s'). As a first approximation try
substituting log2(s); chances are that, truncated, log2(s) and log2(s') will
be the same. Next calculate s' = s+int(log2(s))+1. If it happens that
int(log2(s')) is not actually equal to int(log2(s)), just increase s' by 1.

If s'-int(log2(s'))-1 is bigger than the s you started with, it just means
that with int(log2(s'))+1 check bits you have room for more data bits. If
you don't need them, subtract the extra bits from s' to get the actual
total number of bits you need to encode s data bits. (This might be clearer
if I renamed s and s' at each step, but I find too many names confusing
rather than helpful.)
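
For what it's worth, here is a tiny C sketch of that calculation (my
own; it uses the equivalent rule that r check bits must be able to name
"no error" or any one of the s+r bit positions, i.e. 2**r >= s+r+1):

/* Check bits and total block size needed for s data bits. */
#include <stdio.h>

int main(void)
{
    for (int s = 1; s <= 64; s *= 2) {
        int r = 1;
        while ((1 << r) < s + r + 1)    /* smallest r with 2**r >= s+r+1 */
            r++;
        printf("s = %2d data bits -> %d check bits, s' = %2d"
               " (plus 1 more for the overall parity bit)\n", s, r, s + r);
    }
    return 0;
}

For s = 4 this gives the familiar 7 bit block with 3 check bits, and
for s = 64 a 71 bit block with 7 check bits.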

Theory: The trick to correcting a block containing up to n error bits is
to map the data bit string on to a larger bit string containing check
bits in such a way that the Hamming distance between the encodings of
any two data strings is greater than 2n. The Hamming distance between
any two bit strings of equal size is just the number of bit positions
at which the two strings are different, i.e. the number of 1s in the
bitwise XOR of the strings. Then if n or fewer bits are changed the
result is still closer to the intended coded string than to the coded
form of any other data, so correction is accomplished by determining
which data would yield a coded string whose Hamming distance from the
coded string actually received is n or less. This is guaranteed to be
unique. Another way of looking
at it is to find a way to convert the corrupted block to a valid one
by changing n or fewer bits. The great elegance of the Hamming code
described above for the case of n=1 is that it is the most efficient
possible (i.e. it requires the fewest possible check bits for any given
number of data bits) and it is easy to calculate. I don't know how
to generalize it for n>1, or whether any similarly elegant algorithms
for larger n are known.
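
As a tiny illustration of the distance measure just described (my own
sketch), the Hamming distance between two equal-length words is simply
the number of 1s in their bitwise XOR:

/* Hamming distance = number of differing bit positions. */
#include <stdio.h>

static int hamming_distance(unsigned a, unsigned b)
{
    unsigned x = a ^ b;
    int d = 0;
    while (x) {                 /* clear the lowest set bit until none remain */
        x &= x - 1;
        d++;
    }
    return d;
}

int main(void)
{
    printf("%d\n", hamming_distance(0x0F, 0x05));   /* prints 2 */
    return 0;
}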

Bill Mayne (mayne@cs.fsu.edu)