cbn@mace.cc.purdue.edu (Robert S. Unoki) (07/11/90)
I'm currently considering purchasing a modem with data compression
for my own personal use. However, I am unclear on exactly how the
data compression works. I understand that all compression is done at
the hardware level to effectively increase throughput by factors of
2:1 (MNP5) or 4:1 (V.42bis). I also realize that modems on both
ends of the connection must share the same compression algorithms.
I intend to purchase a 2400 modem that operates using either of the
above compression schemes. Does this mean that I will basically
have a 4800 baud connection using MNP5 or 9600 baud using V.42bis?
Would these be the settings of my communications software?
Thanks in advance...
Rob Unoki cbn@mace.cc.purdue.edu
Programmer, C-SPAN Public Affairs Video Archives
tnixon@hsfmsh.UUCP (Toby Nixon) (07/12/90)
In article <5084@mace.cc.purdue.edu>, Robert S. Unoki writes:
- I'm currently considering purchasing a modem with data compression
- for my own personal use. However, I am unclear on exactly how the
- data compression works. I understand that all compression is done at
- the hardware level to effectively increase throughput by factors of
- 2:1 (MNP5) or 4:1 (V.42bis). I also realize that modems on both
- ends of the connection must share the same compression algorithms.
Briefly, MNP5 works by keeping counts of the frequency of occurrence
of individual characters in the data stream. A table is kept sorted
so that the most frequently-occurring characters appear at the
beginning of the table. To transmit a character, you transmit its
POSITION in this table, Huffman-coded (the lowest values take four
bits to send; the highest values take 12 bits to send). If your
data is made up of only characters which appear so frequently that
their position can be sent in 4 bits, then you get 2-to-1
compression. In reality, English text compresses to an average of
about 1.6-to-1 with MNP5, but when you combine this with the
stripping of start and stop bits done by MNP4 (and V.42 LAPM), you
can see 2-to-1 throughput (but it's dependent on the data you're
sending).
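As a rough illustration of the ranking scheme described above, here is a minimal Python sketch. The names `code_length` and `mnp5_like_bits` are hypothetical, and the bit-length schedule is an assumption chosen only so that the top-ranked entries cost 4 bits and lower ranks cost progressively more. It is not the actual MNP5 code table, and run-length encoding is left out.

```python
def code_length(rank: int) -> int:
    """Bits needed to send a table position (ASSUMED schedule, not MNP5's).

    Ranks 0-1 cost 4 bits; each doubling of the rank costs one more bit.
    """
    return 3 + max(1, rank.bit_length())


def mnp5_like_bits(data: bytes) -> int:
    """Total bits sent when each byte is replaced by its adaptive rank."""
    counts = [0] * 256
    table = list(range(256))      # table[rank] = byte value at that rank
    total = 0
    for value in data:
        rank = table.index(value)
        total += code_length(rank)
        counts[value] += 1
        # Move the byte toward the front while it is now more frequent
        # than the entry just ahead of it, keeping the table sorted.
        while rank > 0 and counts[table[rank - 1]] < counts[value]:
            table[rank - 1], table[rank] = table[rank], table[rank - 1]
            rank -= 1
    return total


sample = b"this is a short sample of lower-case english text " * 20
print(8 * len(sample) / mnp5_like_bits(sample))   # compression ratio
```

In this sketch a stream made of one repeated byte value settles at 4 bits per character, which is the 2-to-1 ceiling mentioned above; mixed text lands below that because most characters sit further down the table.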
V.42bis uses an entirely different technique, commonly known as
Lempel-Ziv-Welch. It builds a tree-structured linked list of
strings of characters, constantly adding new characters to extend
the length of existing strings and "pruning" infrequently-referenced
"leaf nodes" to recover places to put them. A string is transmitted
by sending the position in the tree of the LAST character in the
string; the receiver recovers the data by following the links up the
tree to the "root node" (first character). A string can be from 1
to 250 characters in length, and in normal English text, depending
on the maximum number of nodes you have storage for, you can get an
average string length of somewhere around 4-5 characters, giving
4-to-1 compression when combined with start- and stop-bit stripping
(it takes about 12 bits to send the position in the dictionary).
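The dictionary idea can be sketched with textbook LZW in Python. This is only an approximation of what is described above: it assumes a fixed 12-bit codeword, stops adding strings when the dictionary is full instead of pruning leaf nodes, and does not enforce the 250-character limit. The function names are hypothetical. The decoder stores each string as a (parent, last character) link and rebuilds it by walking up to the root, as in the description.

```python
def lzw_encode(data: bytes, max_codes: int = 4096) -> list:
    """Return a list of integer codewords for data (textbook LZW)."""
    dictionary = {bytes([i]): i for i in range(256)}   # single-byte roots
    codes = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate                  # keep extending the match
        else:
            codes.append(dictionary[current])
            if len(dictionary) < max_codes:
                dictionary[candidate] = len(dictionary)   # add a new leaf
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decode(codes: list, max_codes: int = 4096) -> bytes:
    """Rebuild the data by following parent links from leaf to root."""
    parent = [(None, i) for i in range(256)]     # index = codeword

    def expand(code):
        out = bytearray()
        while code is not None:
            code, last = parent[code]            # step one node up the tree
            out.append(last)
        out.reverse()                            # collected leaf-to-root
        return bytes(out)

    if not codes:
        return b""
    prev = codes[0]
    result = bytearray(expand(prev))
    for code in codes[1:]:
        if code < len(parent):
            text = expand(code)
        else:
            # Codeword the decoder has not defined yet: it must be the
            # previous string extended by its own first character.
            head = expand(prev)
            text = head + head[:1]
        result += text
        if len(parent) < max_codes:
            parent.append((prev, text[0]))
        prev = code
    return bytes(result)


sample = b"the rain in spain stays mainly in the plain " * 20
codes = lzw_encode(sample)
assert lzw_decode(codes) == sample
print(len(sample) / len(codes))              # average string length
print(8 * len(sample) / (12 * len(codes)))   # ratio at 12 bits per codeword
```

The first printed figure corresponds to the "average string length" above; the second is the compression ratio before any gain from start- and stop-bit stripping.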
I can go into this in more detail if you like. But remember this
essential difference: MNP5 takes FIXED-LENGTH objects and sends
them using a VARIABLE-LENGTH code; V.42bis takes VARIABLE-LENGTH
objects and sends them using a FIXED-LENGTH code.
- I intend to purchase a 2400 modem that operates using either of the
- above compression schemes. Does this mean that I will basically
- have a 4800 baud connection using MNP5 or 9600 baud using V.42bis?
- Would these be the settings of my communications software?
The actual throughput you see is dependent on the redundancy
(compressibility) of the data. The 2-to-1 and 4-to-1 are for
English text (like this; lower-case, lots of spaces, fairly normal
vocabulary, etc.). If you're sending binary files or
previously-compressed data (like news feeds), you won't see that
level of compression (if any). But for interactive work, the
compression definitely is an advantage.
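To put numbers on the 4800/9600 question, the figures quoted in this thread can be combined in a short Python calculation. This is a back-of-the-envelope sketch: `chars_per_second` is a hypothetical helper, the 4.5-character average is simply the midpoint of the "4-5 characters" mentioned earlier, and error-control framing overhead is ignored.

```python
def chars_per_second(line_bps, bits_per_char, compression_ratio=1.0):
    """Characters delivered per second over the phone line.

    bits_per_char is 10 for plain async (start + 8 data + stop bits) and
    8 once the error-control protocol strips the start and stop bits.
    Framing overhead is ignored (an assumption of this sketch).
    """
    return line_bps * compression_ratio / bits_per_char


plain = chars_per_second(2400, 10)              # no error control
mnp5 = chars_per_second(2400, 8, 1.6)           # 1.6-to-1 on English text
v42bis = chars_per_second(2400, 8, 4.5 * 8 / 12)  # 4.5 chars per 12-bit code

print(plain, mnp5 / plain, v42bis / plain)
```

With these inputs MNP5 comes out at about twice the uncompressed rate and V.42bis at about 3.75 times, which is where the rounded 2-to-1 and 4-to-1 figures come from. Both multipliers shrink as the data becomes less compressible.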
-- Toby
-----------------------------------------------------------------------------
Toby Nixon, Principal Engineer Fax: +1-404-441-1213 Telex: 6502670805
Hayes Microcomputer Products Inc. Voice: +1-404-449-8791 CIS: 70271,404
Norcross, Georgia, USA BBS: +1-404-446-6336 MCI: TNIXON
Telemail: T.NIXON/HAYES AT&T: !tnixon
UUCP: ...!uunet!hayes!tnixon Internet: hayes!tnixon@uunet.uu.net
MHS: C=US / AD=ATTMAIL / PN=TOBY_L_NIXON / DD=TNIXON
-----------------------------------------------------------------------------
roy@esp.ics.uci.edu (John Roy) (07/13/90)
In <3505@hsfmsh.UUCP> tnixon@hsfmsh.UUCP (Toby Nixon) writes:
>I can go into this in more detail if you like. But remember this
>essential difference: MNP5 takes FIXED-LENGTH objects and sends
>them using a VARIABLE-LENGTH code; V.42bis takes VARIABLE-LENGTH
>objects and sends them using a FIXED-LENGTH code.
>The actual throughput you see is dependent on the redundancy
>(compressibility) of the data. The 2-to-1 and 4-to-1 are for
>English text (like this; lower-case, lots of spaces, fairly normal
>vocabulary, etc.) If you're sending binary files or
>previously-compressed data (like news feeds), you won't see that
>level of compression (if any). But for interactive work, the
>compression definitely is an advantage.
> -- Toby
So if I'm sending ASCII digits would I do better with MNP5 or V.42bis?
I would guess MNP5, but...
Yes, I know this could be done by translating into binary or even BCD
and then compressing. But if I can get similar results by just
purchasing the right modem, I'd be happy.
jmar
--
John M.A. Roy 714/856-5039                 TRINTECH USA 714/757-7757
ICS Dept., Univ. Calif., Irvine CA 92714   18500 Von Karman, #410
Internet: roy@ics.uci.edu                  Irvine, CA 92715
tnixon@hsfmsh.UUCP (Toby Nixon) (07/16/90)
In article <269DE3BD.28935@ics.uci.edu>, John Roy asks:
- So if I'm sending ASCII digits would I do better with MNP5 or V.42bis?
- I would guess MNP5, but...
No! Remember, MNP5 compresses individual characters into, at a
minimum, 4 bits. This means that the absolute BEST it can do is
2-to-1 compression (unless you're sending long strings of repeating
characters, in which case its run-length-encoding feature kicks in;
but you rarely send files with frequent long strings of the same
character).
But with V.42bis, the algorithm can compress from 1 to 250
characters into a single "codeword" (a codeword is from 9 to,
usually, 12 bits, depending on the maximum "dictionary size"
supported). V.42bis typically achieves an average of about 4-to-1
compression on English text files. You will always (unless you
purposely construct a file that doesn't compress well with V.42bis)
get better compression with V.42bis than with MNP5.
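The limits in this argument reduce to simple arithmetic, sketched here in Python. The helper names are hypothetical, and the calculation looks only at the codes themselves (8-bit characters in, code bits out), ignoring run-length encoding and start/stop-bit stripping.

```python
def mnp5_ceiling(min_code_bits=4):
    """Best case when every character goes out in the shortest code."""
    return 8 / min_code_bits


def v42bis_ratio(string_length, codeword_bits):
    """Ratio when one codeword stands for string_length characters."""
    return 8 * string_length / codeword_bits


print(mnp5_ceiling())                 # the 2-to-1 ceiling
for bits in (9, 12):                  # codeword sizes quoted above
    for length in (1, 3, 5, 250):     # characters per codeword
        print(bits, length, v42bis_ratio(length, bits))
```

With 12-bit codewords, a dictionary scheme matches the MNP5 ceiling once strings average 3 characters and passes it beyond that. The length-1 rows show the opposite extreme: data in which no string ever repeats costs more bits than it started with, which is the "purposely constructed" case.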
- Yes, I know this could be done by translating into binary or even BCD
- and then compressing. But if I can get similar results by just
- purchasing the right modem, I'd be happy.
You'll be better off with a V.42bis modem. All of the ones on the
market now also support MNP5, so you'll have backward compatibility
with the installed base. It uses MNP5 when talking to modems that
don't have V.42bis, but automatically uses V.42bis for better
performance when connected with another V.42bis modem.
-- Toby
urlichs@smurf.sub.org (Matthias Urlichs) (07/17/90)
In comp.dcom.modems, article <3547@hsfmsh.UUCP>,
tnixon@hsfmsh.UUCP (Toby Nixon) writes:
< In article <269DE3BD.28935@ics.uci.edu>, John Roy asks:
<
< - So if I'm sending ASCII digits would I do better with MNP5 or V.42bis?
< - I would guess MNP5, but...
<
< No! Remember, MNP5 compresses individual characters into, at a
< minimum, 4 bits. [...]
< V.42bis typically achieves an average of about 4-to-1
< compression on English text files. You will always (unless you
< purposely construct a file that doesn't compress well with V.42bis)
< get better compression with V.42bis than with MNP5.
<
Grossly generalized: MNP5 relies on a limited character set, and V.42bis
relies on recurring sequences of characters.
Standard ASCII text has about 6.5 bits per character, so MNP5 gives you about
15% compression. On the other hand, any given character (in English text) can
only be followed by certain others; the cumulative effect of this is a rather
good compression. Program code is even better.
So, about anything compresses better with V.42bis, including executable code,
but excluding (this is from experience) digitized sound. Pictures are a border
case, depending on whether they're dithered.
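The two effects described above (a skewed character set versus predictable successors) can be measured on any sample with a short Python sketch. The function names are hypothetical, and the sample text is only a stand-in; results on a short sample understate the figures for real text, since the successor statistics are estimated from very little data.

```python
import math
from collections import Counter


def order0_entropy(data: bytes) -> float:
    """Bits per character if each character is coded independently."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def order1_entropy(data: bytes) -> float:
    """Bits per character when the previous character is already known."""
    pairs = Counter(zip(data, data[1:]))   # (previous, current) counts
    firsts = Counter(data[:-1])            # how often each previous occurs
    n = len(data) - 1
    return -sum(c / n * math.log2(c / firsts[prev])
                for (prev, _), c in pairs.items())


sample = (b"any given character in english text can only be followed "
          b"by certain others and the cumulative effect is considerable ")
print(order0_entropy(sample), order1_entropy(sample))
```

The first figure is what a character-by-character coder has to work with; the second is lower because knowing the preceding character narrows the choices, and longer contexts (which a string dictionary exploits) narrow them further.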
--
Matthias Urlichs -- urlichs@smurf.sub.org -- urlichs@smurf.ira.uka.de
Humboldtstrasse 7 - 7500 Karlsruhe 1 - FRG -- +49+721+621127(Voice)/621227(PEP)