[comp.dcom.modems] What are the details of MNP-5

MMK102@psuvm.psu.edu (06/06/91)

Can someone please explain to me the details of this form of compression.
Thank you.
-------------------------------------------------------------------------
Michael Kercsmar                        |
MMK102 @ PSUVM (BitNet)                 |      Post no bills
mmk102 @ psuvm.psu.edu (Internet)       |
                                        |

tnixon@hayes.uucp (06/06/91)

In article <91156.161304MMK102@psuvm.psu.edu>,
<MMK102@psuvm.psu.edu> writes: 

> Can someone please explain to me the details of this form of compression.

MNP5 is a modified adaptive Huffman-based compression scheme.  The 
most frequently-occurring characters are encoded using fewer bits, 
and the less frequently-occurring characters are encoded using more 
bits.  

An MNP5 implementation basically keeps three tables:  (1) the number
of times each of the 256 possible characters has occurred (one entry
in this table is incremented for each character sent), (2) the rank 
order of characters based on frequency (which is sorted after each 
character is sent), and (3) translation from rank order back to 
character (which is kept aligned with the second table).  The tables 
are initialized with all counts 0, and the characters in ascending
order by their binary values.

When a character is received from the DTE, it is first encoded 
according to the current state of the tables, and then the tables 
are updated to reflect the fact that that character occurred in the 
data.  The receiver decodes the character using its current tables, 
and then updates its tables the same way.  Thus, the transmitter and 
receiver tables are always kept in synchronization.

The encoding scheme is pretty simple:  three bits for length 
(indicating the number of bits, minus one, that follow the length 
code), followed by 1 to 8 data bits.  The most frequently occurring 
character is encoded as 0000, next as 0001, next as 00100, then 
00101, etc., up to 11111111110; 1111111111 is followed by 1 more 
bit, so that you have 256 total codes, not 255.  If you calculate 
this out, you find that the AVERAGE token size is 9 bits, which is 
why MNP5 is said to expand random data by 12%.  There is no way to 
turn off compression once it is on (which there IS in V.42bis), so 
you're stuck sending the expanded data if this happens.

A token can be split across frames (part in one frame, part in the 
next), and there is a way to flush out the last few bits of the last 
character if the line goes idle.

If you need more information than this, I suggest that you contact 
Microcom for the MNP5 documentation.  If you're thinking of 
implementing it, I caution you that even though Microcom has made 
the MNP5 specification public, it is protected by at least one 
patent that you would need to license.

-- 
Toby Nixon, Principal Engineer    | Voice   +1-404-840-9200  Telex 151243420
Hayes Microcomputer Products Inc. | Fax     +1-404-447-0178  CIS   70271,404
P.O. Box 105203                   | UUCP uunet!hayes!tnixon  AT&T    !tnixon
Atlanta, Georgia  30348  USA      | Internet       hayes!tnixon@uunet.uu.net