[comp.dcom.modems] Data Compression

cbn@mace.cc.purdue.edu (Robert S. Unoki) (07/11/90)

I'm currently considering purchasing a modem with data compression for my own
personal use.  However, I am unclear on exactly how the data compression works.
I understand that all compression is done at the hardware level to effectively
increase throughput by factors of 2:1 (MNP5) or 4:1 (V.42bis).  I also realize
that modems on both ends of the connection must share the same compression 
algorithms.

I intend to purchase a 2400 modem that operates using either of the above 
compression schemes.  Does this mean that I will basically have a 4800 baud
connection using MNP5 or 9600 baud using V.42bis?  Would these be the settings
of my communications software?  

Thanks in advance...

Rob Unoki
cbn@mace.cc.purdue.edu
Programmer, C-SPAN Public Affairs Video Archives

tnixon@hsfmsh.UUCP (Toby Nixon) (07/12/90)

In article <5084@mace.cc.purdue.edu>, Robert S. Unoki writes:

- I'm currently considering purchasing a modem with data compression
- for my own personal use.  However, I am unclear on exactly how the
- data compression works. I understand that all compression is done at
- the hardware level to effectively increase throughput  by factors of
- 2:1 (MNP5) or 4:1 (V.42bis).  I also realize that modems on both
- ends of the connection must share the same compression algorithms. 

Briefly, MNP5 works by keeping counts of the frequency of occurrence 
of individual characters in the data stream.  A table is kept sorted 
so that the most frequently-occurring characters appear at the 
beginning of the table.  To transmit a character, you transmit its 
POSITION in this table, Huffman-coded (the lowest values take four 
bits to send; the highest values take 12 bits to send).  If your 
data is made up of only characters which appear so frequently that 
their position can be sent in 4 bits, then you get 2-to-1 
compression.  In reality, English text compresses to an average of 
about 1.6-to-1 with MNP5, but when you combine this with the
stripping of start and stop bits done by MNP4 (and V.42 LAPM), you
can see 2-to-1 throughput (but it's dependent on the data you're 
sending).
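
The adaptive-table idea described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the names
`code_length_bits` and `mnp5_like_bits` are made up for this example, and the
length schedule (4 bits for the top ranks, growing to 11 bits) is a
hypothetical prefix-code schedule, not the real MNP5 code table. It counts
bits instead of producing an actual bit stream, and it leaves out run-length
encoding.

```python
def code_length_bits(rank: int) -> int:
    """Hypothetical schedule (an assumption, not the MNP5 table):
    ranks 0-1 cost 4 bits, 2-3 cost 5, 4-7 cost 6, ... 128-255 cost 11."""
    return 4 + max(0, rank.bit_length() - 1)


def mnp5_like_bits(data: bytes) -> int:
    """Count the bits an adaptive rank coder would emit for `data`."""
    counts = [0] * 256          # frequency of each byte value seen so far
    total_bits = 0
    for sym in data:
        # Rank = number of byte values that currently sort ahead of `sym`
        # (higher count first; ties broken by lower byte value).
        rank = sum(
            1
            for other in range(256)
            if counts[other] > counts[sym]
            or (counts[other] == counts[sym] and other < sym)
        )
        total_bits += code_length_bits(rank)
        counts[sym] += 1        # sender and receiver update identically
    return total_bits


if __name__ == "__main__":
    sample = b"a" * 100
    bits = mnp5_like_bits(sample)
    print(bits, "bits for", 8 * len(sample), "bits of input")
```

Because no character can cost fewer than 4 bits in this scheme, the ratio can
approach 2-to-1 on very skewed data but never exceed it.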

V.42bis uses an entirely different technique, commonly known as 
Lempel-Ziv-Welch.  It builds a tree-structured linked list of
strings of characters, constantly adding new characters to extend
the length of existing strings and "pruning" infrequently-referenced
"leaf nodes" to recover places to put them.  A string is transmitted
by sending the position in the tree of the LAST character in the 
string; the receiver recovers the data by following the links up the 
tree to the "root node" (first character).  A string can be from 1 
to 250 characters in length, and in normal English text, depending 
on the maximum number of nodes you have storage for, you can get an 
average string length of somewhere around 4-5 characters, giving 
4-to-1 compression when combined with start- and stop-bit stripping 
(it takes about 12 bits to send the position in the dictionary).
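
For readers who want to see the dictionary idea in action, here is a minimal
sketch of plain textbook LZW in Python. It is not the V.42bis procedure
itself: it uses a flat dictionary instead of a linked tree, never prunes
entries (it simply stops adding once the dictionary is full), and does not
enforce the 250-character string limit. The function names and the
`max_codes` parameter (4096, i.e. 12-bit codewords) are assumptions made for
illustration.

```python
def lzw_compress(data: bytes, max_codes: int = 4096) -> list:
    """Return the list of dictionary positions (codewords) for `data`."""
    table = {bytes([i]): i for i in range(256)}   # single characters
    out = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate                   # keep extending the match
        else:
            out.append(table[current])            # send the longest match
            if len(table) < max_codes:
                table[candidate] = len(table)     # learn the longer string
            current = bytes([byte])
    if current:
        out.append(table[current])
    return out


def lzw_decompress(codes: list, max_codes: int = 4096) -> bytes:
    """Rebuild the original bytes, growing the same dictionary in step."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The sender used a string it had only just added.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid codeword")
        out.append(entry)
        if len(table) < max_codes:
            table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


if __name__ == "__main__":
    text = b"abababab"
    codes = lzw_compress(text)
    print(codes, lzw_decompress(codes) == text)
```

Each codeword stands for a variable-length string, so the more repetition the
data contains, the more characters each fixed-size codeword carries.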

I can go into this in more detail if you like.  But remember this 
essential difference:  MNP5 takes FIXED-LENGTH objects and sends 
them using a VARIABLE-LENGTH code; V.42bis takes VARIABLE-LENGTH 
objects and sends them using a FIXED-LENGTH code.

- I intend to purchase a 2400 modem that operates using either of the
- above compression schemes.  Does this mean that I will basically
- have a 4800 baud connection using MNP5 or 9600 baud using V.42bis? 
- Would these be the settings of my communications software? 

The actual throughput you see is dependent on the redundancy 
(compressibility) of the data.  The 2-to-1 and 4-to-1 are for 
English text (like this; lower-case, lots of spaces, fairly normal 
vocabulary, etc.)  If you're sending binary files or 
previously-compressed data (like news feeds), you won't see that 
level of compression (if any).  But for interactive work, the 
compression definitely is an advantage.
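
The arithmetic behind the 2-to-1 and 4-to-1 figures can be worked through as
a rough sketch. The assumptions here are mine: plain asynchronous operation
costs 10 bits per character (start bit, 8 data bits, stop bit), an
error-corrected link costs 8, and protocol framing overhead is ignored. The
function and constant names are illustrative only.

```python
ASYNC_BITS_PER_CHAR = 10   # start bit + 8 data bits + stop bit
SYNC_BITS_PER_CHAR = 8     # start/stop bits stripped by MNP4 or V.42 LAPM


def lzw_data_ratio(avg_string_len: float, codeword_bits: int) -> float:
    """Compression ratio when each codeword carries `avg_string_len` chars."""
    return avg_string_len * 8 / codeword_bits


def effective_cps(line_bps: float, compression_ratio: float = 1.0) -> float:
    """Characters per second on an error-corrected, compressed link."""
    return line_bps / SYNC_BITS_PER_CHAR * compression_ratio


def speedup_over_plain_async(line_bps: float, compression_ratio: float) -> float:
    """Throughput gain relative to an uncompressed asynchronous link."""
    plain_cps = line_bps / ASYNC_BITS_PER_CHAR
    return effective_cps(line_bps, compression_ratio) / plain_cps


if __name__ == "__main__":
    # MNP5 at about 1.6-to-1 on English text, 2400 bps line
    print(speedup_over_plain_async(2400, 1.6))
    # V.42bis: about 4.5 characters per 12-bit codeword
    print(speedup_over_plain_async(2400, lzw_data_ratio(4.5, 12)))
```

Under these assumptions the MNP5 case works out to about 2 times the plain
240 characters per second, and the V.42bis case to a little under 4 times,
which matches the figures quoted above for English text. Less redundant data
gives a smaller ratio and proportionally less gain.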

	-- Toby

-----------------------------------------------------------------------------
Toby Nixon, Principal Engineer     Fax:    +1-404-441-1213  Telex: 6502670805
Hayes Microcomputer Products Inc.  Voice:  +1-404-449-8791  CIS:    70271,404
Norcross, Georgia, USA             BBS:    +1-404-446-6336  MCI:       TNIXON
                                   Telemail: T.NIXON/HAYES  AT&T:     !tnixon
UUCP:   ...!uunet!hayes!tnixon     Internet:        hayes!tnixon@uunet.uu.net
MHS:    C=US / AD=ATTMAIL / PN=TOBY_L_NIXON / DD=TNIXON
-----------------------------------------------------------------------------

roy@esp.ics.uci.edu (John Roy) (07/13/90)

In <3505@hsfmsh.UUCP> tnixon@hsfmsh.UUCP (Toby Nixon) writes:
>I can go into this in more detail if you like.  But remember this
>essential difference:  MNP5 takes FIXED-LENGTH objects and sends
>them using a VARIABLE-LENGTH code; V.42bis takes VARIABLE-LENGTH
>objects and sends them using a FIXED-LENGTH code.

>The actual throughput you see is dependent on the redundancy
>(compressibility) of the data.  The 2-to-1 and 4-to-1 are for
>English text (like this; lower-case, lots of spaces, fairly normal
>vocabulary, etc.)  If you're sending binary files or
>previously-compressed data (like news feeds), you won't see that
>level of compression (if any).  But for interactive work, the
>compression definitely is an advantage.

>	-- Toby

So if I'm sending ASCII digits would I do better with MNP5 or V.42bis?
I would guess MNP5, but...

Yes, I know this could be done by translating into binary or even BCD
and then compressing.  But if I can get similar results by just
purchasing the right modem, I'd be happy.

jmar
--
John M.A. Roy 714/856-5039			TRINTECH USA 714/757-7757
ICS Dept., Univ. Calif., Irvine CA 92714	18500 Von Karman, #410
Internet: roy@ics.uci.edu  			Irvine, CA 92715

tnixon@hsfmsh.UUCP (Toby Nixon) (07/16/90)

In article <269DE3BD.28935@ics.uci.edu>, John Roy asks:

- So if I'm sending ASCII digits would I do better with MNP5 or V.42bis?
- I would guess MNP5, but...

No!  Remember, MNP5 compresses individual characters into, at a 
minimum, 4 bits.  This means that the absolute BEST it can do is 
2-to-1 compression (unless you're sending long strings of repeating 
characters, in which case its run-length-encoding feature kicks in, 
but you rarely send files with frequent long strings of the same 
character). 
But with V.42bis, the algorithm can compress from 1 to 250 
characters into a single "codeword" (a codeword is from 9 to, 
usually, 12 bits, depending on the maximum "dictionary size" 
supported).  V.42bis typically achieves an average of about 4-to-1 
compression on English text files.  You will always (unless you 
purposely construct a file that doesn't compress well with V.42bis) 
get better compression with V.42bis than with MNP5.
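
The best-case limits mentioned above follow directly from the numbers in the
post, as this small sketch shows. It assumes 8-bit characters and ignores
MNP5's run-length encoding; the function names are made up for illustration.

```python
def mnp5_best_ratio(min_code_bits: int = 4) -> float:
    """Best case when every 8-bit character is sent in the shortest code."""
    return 8 / min_code_bits


def v42bis_best_ratio(max_string_len: int = 250, codeword_bits: int = 12) -> float:
    """Best case when every codeword stands for the longest allowed string."""
    return max_string_len * 8 / codeword_bits


if __name__ == "__main__":
    print(mnp5_best_ratio())
    print(v42bis_best_ratio())
    print(v42bis_best_ratio(codeword_bits=9))
```

These are ceilings only. Real data lands far below the V.42bis ceiling, but
the per-character ceiling of 2-to-1 for MNP5 holds no matter how repetitive
the data is, apart from the run-length case.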

- Yes, I know this could be done by translating into binary or even BCD
- and then compressing.  But if I can get similar results by just
- purchasing the right modem, I'd be happy.

You'll be better off with a V.42bis modem.  All of the ones on the 
market now also support MNP5, so you'll have backward compatibility 
with the installed base.  It uses MNP5 when talking to modems that 
don't have V.42bis, but automatically uses V.42bis for better 
performance when connected with another V.42bis modem.

	-- Toby

urlichs@smurf.sub.org (Matthias Urlichs) (07/17/90)

In comp.dcom.modems, article <3547@hsfmsh.UUCP>,
  tnixon@hsfmsh.UUCP (Toby Nixon) writes:
< In article <269DE3BD.28935@ics.uci.edu>, John Roy asks:
< 
< - So if I'm sending ASCII digits would I do better with MNP5 or V.42bis?
< - I would guess MNP5, but...
< 
< No!  Remember, MNP5 compresses individual characters into, at a 
< minimum, 4 bits.  [...]
<  V.42bis typically achieves an average of about 4-to-1 
< compression on English text files.  You will always (unless you 
< purposely construct a file that doesn't compress well with V.42bis) 
< get better compression with V.42bis than with MNP5.
< 
Grossly generalized: MNP5 relies on a limited character set, and V.42bis
relies on recurring sequences of characters.

Standard ASCII text has about 6.5 bits per character, so MNP5 gives you about
15% compression. On the other hand, any given character (in English text) can
only be followed by certain others; the cumulative effect of this is a rather
good compression. Program code is even better.
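
A bits-per-character figure like the one above can be estimated for any
sample as the zero-order Shannon entropy of its byte frequencies. The sketch
below is one way to measure it; the function name is illustrative, and the
result depends heavily on the sample. It only captures single-character
frequency, the kind of redundancy a per-character coder can exploit, and says
nothing about which characters tend to follow which.

```python
import math
from collections import Counter


def bits_per_char(sample: bytes) -> float:
    """Zero-order entropy of `sample`, in bits per character."""
    if not sample:
        return 0.0
    n = len(sample)
    return -sum(
        (count / n) * math.log2(count / n)
        for count in Counter(sample).values()
    )


if __name__ == "__main__":
    text = b"any given character can only be followed by certain others"
    print(round(bits_per_char(text), 2), "bits per character")
```

Dividing the result by 8 gives the best ratio a coder working on single
characters could hope for on that sample; anything beyond that has to come
from recurring sequences.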

So just about anything compresses better with V.42bis, including executable
code, but excluding (this is from experience) digitized sound. Pictures are a
borderline case, depending on whether they're dithered.

-- 
Matthias Urlichs -- urlichs@smurf.sub.org -- urlichs@smurf.ira.uka.de
Humboldtstrasse 7 - 7500 Karlsruhe 1 - FRG -- +49+721+621127(Voice)/621227(PEP)