[comp.protocols.iso.x400] DATA Compression and X400 standards

JPALME@qz.qz.se (Jacob Palme QZ) (10/28/90)

One way of introducing "data compression" into X.400 would be
to introduce a new body part type, e.g. "IA5 compressed according
to compression algorithm X". Conversion between this new body
part and ordinary IA5 would be simple, and would have to be done
when transferring a message into a system which cannot understand
this new body type.

What I describe above is in fact what we are already doing in the
SuperKOM message system. We do, however, always convert from
"compressed IA5" to ordinary "IA5" whenever a message is
transmitted to a non-SuperKOM system, since we do not assume
that any other X.400 system uses compression just now.
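A minimal sketch of the conversion Palme describes, using zlib as a
stand-in for "compression algorithm X" (the body part type names and
the dictionary layout here are illustrative assumptions, not anything
defined by the X.400 standards or by SuperKOM):

```python
import zlib

# Hypothetical body part type labels; illustrative only.
IA5 = "ia5-text"
COMPRESSED_IA5 = "ia5-text-compressed"  # "IA5 compressed according to algorithm X"

def compress_body_part(text):
    """Convert an ordinary IA5 body part to the compressed body part type."""
    return {"type": COMPRESSED_IA5, "data": zlib.compress(text.encode("ascii"))}

def to_plain_ia5(part):
    """Downgrade to ordinary IA5 when the receiving system cannot
    understand the compressed body part type (as SuperKOM does when
    transmitting to a non-SuperKOM system)."""
    if part["type"] == COMPRESSED_IA5:
        return {"type": IA5, "data": zlib.decompress(part["data"]).decode("ascii")}
    return part

original = "Hello, X.400 world! " * 50
part = compress_body_part(original)
restored = to_plain_ia5(part)
```

The conversion is lossless in both directions, which is what makes it
safe to apply at a gateway to a system that only understands plain IA5.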

Eppenberger@verw.switch.ch (Urs Eppenberger) (10/29/90)

From my point of view data compression should not be handled within the
framework of X.400. This is purely a matter of the lower layers.
We already have a mess with all the body parts; I can't see any reason
to add compressed ones.
If some X.400 implementation stores messages in a compressed format on disk,
I do not care, and I also see no need for standardisation.
Perhaps this view is too simple?

Kind regards,
Urs.

vcerf@NRI.Reston.VA.US (10/29/90)

Urs,

your view strikes me as potentially off the mark in the
sense that compression methods may vary in their usefulness
depending on the kind of material and encoding employed.
As a result, it may be important to perform the compression
with knowledge of the type of content. This tends to place
the application of compression rather high in the protocol
architecture rather than below the level of X.400, for
example.

Just to give you one example, I recently got a message
advising I could pick up a compressed postscript file
via FTP from Germany. The compression took place "above"
the level of FTP and decompression is applied after
receipt of the file. I suppose you could argue this
should have somehow been done at a lower layer, but I
think the argument is not convincing on its surface.

I appreciate your apparent distress with all the various
body types. I suppose this will get worse over time as
people want to convey proprietary objects. Best guess
is that things will settle down as we discover particular
encodings and object types which seem to be the most useful.

Vint Cerf

Eppenberger@verw.switch.ch (Urs Eppenberger) (10/29/90)

Dear Vint,

you are perfectly right about your FTP example.
But I was talking about ISO standards, where the same mistakes should not
be repeated.
There are two reasons for compression:
1. To save disk space.
It is up to the UA how it stores information on disk; standardisation is not needed.
2. For faster transmission.
Here it is the job of the lower layers to compress the protocol units.
Users should not be bothered with that at all!

Kind regards,

Urs Eppenberger

ms6b+@andrew.cmu.edu (Marvin Sirbu) (10/29/90)

Urs,

You miss the point that Vint was trying to make.  Shannon's theory of
information says that the more you know about the message set, the more
effectively you can compress it.  Thus, if I send a multi-media message,
I want to use two dimensional run length encoding to compress the image
portion, but a very different scheme (LZW?) to compress the text.
With images alone, I would use a different encoding table if the image is
scanned at 600 dpi than I would if it were scanned at 200 dpi.  In fact,
using an inappropriate encoding scheme can actually _increase_ the number of
bits a message takes.
The inefficiency of doing encoding only at the lower layers is well illustrated
by the problem of telephone circuit encoding.  If I intend to use the circuit
only for voice traffic, I can easily encode it at 16 kbps or 8 kbps.  If
I want the channel to carry any kind of 3300 Hz bandwidth information
(e.g. modem traffic as well), then the best I can do is ADPCM at 32 kbps.

While it may appear simpler to use a single compression scheme at a layer
below the application, such an approach may sacrifice substantial
potential efficiency gains in transmission.
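Sirbu's point is easy to demonstrate with a toy experiment: a run-length
coder shrinks a fax-like scanline dramatically but roughly doubles
ordinary text, while a dictionary coder does the opposite (zlib stands in
for LZW below; the RLE implementation is a deliberately naive illustration,
not any standard scheme):

```python
import zlib

def rle_encode(data: bytes) -> bytes:
    """Naive run-length encoding: (count, byte) pairs, runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

# A fax-like scanline: long runs of white with a few black pixels.
scanline = b"\x00" * 400 + b"\xff" * 8 + b"\x00" * 400
# English-like text: almost no adjacent repeated bytes.
text = b"the quick brown fox jumps over the lazy dog " * 20

rle_scan = rle_encode(scanline)  # collapses to a handful of (count, byte) pairs
rle_text = rle_encode(text)      # every run has length 1, so output doubles
dict_text = zlib.compress(text)  # a dictionary coder exploits the repetition
```

Choosing the coder per content type wins in both cases; a single
type-blind scheme loses on at least one of them.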


Marvin Sirbu
CMU

vcerf@NRI.Reston.VA.US (10/30/90)

Urs,

I gather we are considering different reasons for compression and
different layers in which it might be practiced. I agree that
with respect to local compression inside a UA, standardization
is less necessary - although I suppose even there some standards
might be welcome if they led to widely available hardware assistance.

Vint

neufeld@cs.ubc.ca (Gerald Neufeld) (10/30/90)

It seems to me that you can still get the advantage of different compression
algorithms based on knowledge of the data (as Marvin points out) and still do
the compression below the application layer.  Couldn't this be done by using
a different transfer syntax for each of the different types of data?  The
compression could then be done at the presentation layer.

Gerald
UBC

enag@ifi.uio.no (10/31/90)

In article <531*Eppenberger@verw.switch.ch>, Urs Eppenberger writes:

   From my point of view data compression should not be handled within
   the framework of X.400. This is purely a matter of the lower
   layers.  We have already a mess with all the body parts, I can't
   see any reason to add compressed ones.

   If some X.400 implementation store messages in a compressed format
   on disk, I do not care and see also no need for standardisation.
   Perhaps this view is too easy?

Aren't all those layers supposed to make things easier?

Seriously, does anybody know of attempts to standardize compression
schemes so they can be negotiated by the (re)presentation entities at
connection establishment time?

CCITT has recommended compression schemes at the data link layer for
low-speed PSTN connections, i.e. in the V-series (V.42, I believe).  I
don't know whether it is possible to negotiate this at a higher level,
and whether it is possible to propagate PDU boundaries so that the
data link layer algorithm does not reduce the average transmission
speed in the presence of quick turnarounds, small windows, etc.

Just curious.

--
[Erik Naggum]		Naggum Software; Gaustadalleen 21; 0371 OSLO; NORWAY
	I disclaim,	<erik@naggum.uu.no>, <enag@ifi.uio.no>
  therefore I post.	+47-295-8622, +47-256-7822, (fax) +47-260-4427
--

pww@uunet.uu.NET (Peter Whittaker) (10/31/90)

In article <Qb=4CfO00VADA1N41e@andrew.cmu.edu> ms6b+@andrew.cmu.edu (Marvin Sirbu) writes:
>Shannon's theory of
>information says that the more you know about the message set, the more
>effectively you can compress it.  Thus, if I send a multi-media message,

(a bit deleted)

>While it may appear simpler to use a single compression scheme at a layer
>below the application, such an approach may sacrifice substantial
>potential efficiency gains in transmission.
>
>

Can't help but agree that compression should be higher in the stack,
and for a variety of reasons (number 3 is the most important, IMHO).

1) As Marvin states, you get better compression when you know what you
   are compressing.  Compressing data of unknown type/origin seems kinda
   silly, especially as it could already be in its most space-efficient
   form; please correct me if I'm wrong, but compressing it again could
   lead to pathological behaviour where the 'compressed' data is bulkier
   than the original.

2) Compress higher in the stack, and the lower layers have less data to move,
   i.e. less memory to manipulate, less room for transmission/reception/
   allocation/deallocation errors.

3) Compression is an example of manipulation of user data:  from the OSI
   purists perspective (I'm a purist on odd-numbered days - Happy Hallowe'en)
   the last (lowest numbered) layer to touch user data is the presentation
   layer (layer 6).  Once it gets further down, the OSI stack assumes it's
   safe to ship.  It can't assume that it's in best form to ship, but it's
   bound to heed the 'prerogative' of layer 6:  that's where ASN.1 is made,
   and where the BER are applied.

   Not to mention that when data is compressed, it has to be decompressed
   (trivial, right?).  But how does the other side of the connection know
   the data is compressed?  It seems to me that compression vs non-compression
   would be part of the context negotiations at session establishment:
   the initiator and responder would have to agree on what set of compression
   routines to use, if any, and how to indicate to one another that compression
   had been applied.

   My understanding of layers 4 and below (I work on the upper 3-4 layers,
   depending on how you define the application stack) is that peer-to-peer
   communication does not provide any services for such negotiation.
   (Please correct me if I'm wrong....)

   There are some more practical considerations too (NOTE:  OSI purists
   may go into conniption fits :@} ).

   The presentation layer (layer 6) is responsible for translation between
   network-independent and host-specific data representations.  It is also
   the last layer that 'knows' what data types it's handling (all that layer
   5 and below see are bits).  So, the presentation layer is the last layer
   that can determine the most efficient compression
   routine to apply to a certain body type (or generic data type).

   Furthermore, when compressing the data, are you compressing to save
   local disk space and memory, or to save network resources?

   In the former case, X.400 (at layer 7) could call a presentation service
   element and ask it to perform some compression on a body type before
   transmission (i.e. in the case of a store and forward node:  receive the
   data, identify the data type, compress it, then store it till it's
   time to forward it.  All this depends on the store-time, of course
   (is it worth processing 10 pages of g3fax if you're only going to store
    it for ten minutes?)).

   In the latter case, the network may benefit from having compression
   applied to the machine dependent data representation (i.e. compress, then
   encode as ASN.1) or it might benefit from compression after encoding.

   The only way to know which to do is to have compression routines available
   to the presentation layer (for use before or after ASN.1 encoding), and
   to experiment and collect some metrics.  In time, we'll (hopefully)
   have a body of experimental evidence of what-works-best-when-in-most-cases
   (or maybe somebody can work it all out in theory:  theories are easier to
   program to than experimental data).
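The "pathological behaviour" mentioned in point 1 above is easy to
demonstrate: applying a general-purpose compressor to data that is
already compressed makes it larger, because the first pass has removed
the redundancy the second pass needs (a quick illustrative check, not
tied to any particular OSI layer):

```python
import zlib

text = b"This line of IA5 text repeats over and over. " * 100

once = zlib.compress(text)   # highly redundant text shrinks a lot
twice = zlib.compress(once)  # the output of the first pass is high-entropy,
# so a second, type-blind pass only adds header and framing overhead
# instead of saving any bits.
```

This is the argument for compressing where the data type is known,
rather than blindly at a layer that sees only octets.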

--
Peter Whittaker         [~~~~~~~~~~~~~~~~~~~~~~~~~~]    Open Systems Integration
pww@bnr.ca              [                          ]    Bell Northern Research
Ph: +1 613 765 2064     [                          ]    P.O. Box 3511, Station C
FAX:+1 613 763 3283     [__________________________]    Ottawa, Ontario, K1Y 4H7

anand@ka (Govindaraj Padmanaban) (11/02/90)

One thing everyone seems to agree on is that "data compression"
is really useful.  (I hope so.)

In that case, the next question is: who should do the compression?
Where is the best place to put it?

If the burden is put on the MTA, every MTA on the route has to parse
the message to get at the body parts and do the decompression/compression.

The message format received by the MTA looks something like this.

P1 envelope  +---------------------------------+\
             |  UMPDUEnvelope (P1 envelope)    | \
             |  /*                             |  \
             |   * This is the only part each  |    MTA can use and modify
             |   * MTA has to parse and change.|    only this portion
             |   */                            |   /
             |  Origin, ContentType, Recipients|  /
             |  Trace Information etc...       | /
             +---------------------------------+
P2 message   | UMPDUContent  (P2 message)      | \
             | +-----------------------------+ |  \
             | | IM-UAPDU                    | |   \
             | |                             | |   |
             | |  +--------------------------+ |   |
             | |  | Heading                  | |   |
             | |  |  IPmsgid, originator     | |   |
             | |  +--------------------------+ |   |
             | |  |  Body Part 1             | |   |
             | |  |                          | |   |
             | |  +--------------------------+ |   UA formats this part
             | |  |  Body Part 2             | |   and submits to the MTA
             | |  |                          | |   for sending. Only UA or
             | |  +--------------------------+ |   the Gateway MTA parses it.
             | |  |            o             | |   |
             | |  |            o             | |   |
             | |  |            o             | |   |
             | |  |            o             | |   |
             | |  |                          | |   |
             | |  +--------------------------+ |   |
             | |  |  Body Part n             | |   |
             | |  |                          | |  /
             | +--+--------------------------+ | /
             +---------------------------------+/

The relaying MTA is NOT supposed to bear the burden of compression/decompression,
because it is not supposed to MODIFY the P2 message.

Only the UA knows about the body-part boundaries.  Each body part
can be of a different type (say text, exe, GIF image, etc.).  A single
compression algorithm may not work for all the body parts.
Each body part also has to carry information about the algorithm
used to compress it.

Compression can't be handled by the lower layers.

In the article, <Qb=4CfO00VADA1N41e@andrew.cmu.edu> pww@uunet.uu.NET
(Peter Whittaker) writes,

>3) Compression is an example of manipulation of user data:  from the OSI
>   purists perspective (I'm a purist on odd-numbered days - Happy Hallowe'en)
>   the last (lowest numbered) layer to touch user data is the presentation
>   layer (layer 6).  Once it gets further down, the OSI stack assumes it's
>   safe to ship.
>   But how does the other side of the connection know
>   data is compressed?  It seems to me that compression vs non-compression
>   would be part of the context negotiations at session establishment:
>   the iniatiator and responder would have to agree on what set of compression
>   routines to use, if any, and how to indicate to one another that compression
>   had been applied.

This calls for all the compression algorithms used by the body parts
to be known beforehand in order to negotiate the connection, and I disagree
with that.  It also needs a basic change in the message format to carry the
compression information.  Moreover, with a single negotiated connection
you can't transfer all the messages to the next MTA, and you can't
negotiate a new connection for every message either...

                                  Route 1
origin UA  ==>  MTA-10 ==>MTA-11 ==> .....   ==> MTA-1N  ==> UA (recipient)
                   |                               |
                   |                               |
                   |              route 2          |
                MTA-20 ==>MTA-21 ==> .....   ==> MTA-2M

If one MTA doesn't know the compression method used in the message,
the routing decisions for transferring the message are affected.  I hate
it when this happens.

I really vote for the sending and receiving UAs to do the compression.
It calls for a little bit of intelligence on the UA's part.  It also
makes sense.  The message originator knows about the types of body parts
and the compression algorithm used.  The receiving UA has to do the
decompression.  The job of the MTA is to transfer the message rather
than changing/modifying it on the fly.  If the receiving UA can handle
the new compression method, the MTAs in the route need not change at all.
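Anand's scheme can be sketched end to end: each body part carries a tag
naming the algorithm used, the sending UA picks an algorithm per part,
and the receiving UA undoes whatever each part declares, while relaying
MTAs never look inside (the field names and the use of zlib are
illustrative assumptions, not a proposed standard):

```python
import zlib

def ua_submit(body_parts):
    """Sending UA: compress each body part with a suitable algorithm and
    record which one was used, so relaying MTAs never touch the content."""
    message = []
    for kind, data in body_parts:
        if kind == "text":
            message.append({"type": kind, "alg": "zlib", "data": zlib.compress(data)})
        else:
            # e.g. a GIF image is already compressed, so leave it alone.
            message.append({"type": kind, "alg": None, "data": data})
    return message

def ua_receive(message):
    """Receiving UA: undo whatever compression each body part declares."""
    return [(p["type"], zlib.decompress(p["data"]) if p["alg"] == "zlib" else p["data"])
            for p in message]

parts = [("text", b"plain text body " * 40), ("gif", b"GIF89a...binary...")]
assert ua_receive(ua_submit(parts)) == parts
```

Note that only the two UAs need to understand the `alg` tag; the MTAs
relay the message unchanged, which is exactly the property argued for above.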

Any suggestions welcome.

Anand
+------------------------------------------------------------------------+
| We have not inherited the earth from our parents,                      |
|                                     but borrowed it from our children. |
| Govindaraj A Padmanaban - Novell Inc. 408-473-8643(w) 408-263-7055(h)  |
| Email: anand@novell.COM {ames | apple | mtxinu | leadsv }!excelan!anand|
+------------------------------------------------------------------------+

Christian.Huitema@mirsa.inria.fr (Christian Huitema) (11/05/90)

Peter,

I agree with your general remark that compression could be done at the
presentation layer. In fact, we at INRIA played with presentation layer
compression for two years, and came to the conclusion that defining a
presentation transfer syntax as e.g. the stacking of LZ or Huffman coding over
BER (or faster) coding rules is both feasible and useful. There are a couple of
problems to solve, essentially relating to the limited negotiation capabilities
of the presentation protocol (how do you negotiate the size of the LZ
dictionary?) and also to the "tree" structure of the encoding (how do you
handle an EXTERNAL quoted within a compressed syntax?). The case of X.400
is much harder to solve, however:

* The bulk of the data is within the "content", which is carried as an "octet
string".

* The exchange of messages between UAs is "connectionless".

Using presentation layer compression for X.400 would be done within a single
context, that of the envelope. Not very interesting... And the octet string
"content" is only typed by a "content identifier", pointing in principle to some
ASN.1 content description, e.g. P2. There is no place to indicate that something
other than ASN.1 BER was used for the encoding -- and the same is true for the
use of the EXTERNAL construct in the absence of presentation negotiation.
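The "stacking" Huitema describes amounts to composing two encoding steps:
ordinary encoding first, then compression applied to the resulting octets
as a distinct transfer syntax. A toy sketch (the TLV encoder below is a
mock stand-in for BER, which is far more involved, and the negotiation
problems he lists are deliberately not addressed):

```python
import zlib

def mock_ber_encode(tag: int, value: bytes) -> bytes:
    """Toy tag-length-value encoding standing in for BER
    (short-form length only, so values must be under 128 octets)."""
    assert len(value) < 128
    return bytes([tag, len(value)]) + value

def compressed_transfer_syntax(tag: int, value: bytes) -> bytes:
    """A 'compressed' transfer syntax: encode normally, then apply an
    LZ-style coder to the whole encoded octet stream."""
    return zlib.compress(mock_ber_encode(tag, value))

pdu = compressed_transfer_syntax(0x04, b"short octet string")
decoded = zlib.decompress(pdu)
```

The receiver must know, via negotiation, that the compressed syntax was
used before it can recover the inner encoding, which is precisely where
X.400's untyped octet-string content causes trouble.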

Christian Huitema

jwagner@princeton.edu (11/05/90)

> I really vote for the Sending and Receiving UAs to do the compression.
> It calls for a little bit of intelligence on the UA part.  It also
> makes sense.  The message originator knows about the types of body-parts
> and the compression algorithm used.  The receiving UA has to do the
> uncompress operation.  The job of MTA is to transfer the message rather
> than changing/modifying it on the fly.  If the receiving UA can handle
> the new compress method, all the MTAs in the route need not change at

While this approach seems attractive to me also, how does the sending UA know
that the receiving UA can handle the compression?  This becomes even more
important if the sending UA is isolated (for example, on a BITNET node).
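One pragmatic answer is for the sending UA to fall back to plain IA5
whenever it cannot positively confirm the recipient's capabilities.
A sketch under that assumption (the capability table is a hypothetical
stand-in for whatever directory or out-of-band agreement exists; an
isolated UA would simply have an empty table):

```python
import zlib

# Hypothetical, locally configured knowledge about recipients.
KNOWN_CAPABILITIES = {"alice@example": {"ia5-compressed"}}

def prepare_body(recipient: str, text: str):
    """Compress only when we positively know the recipient can cope;
    otherwise always send ordinary IA5, which everyone understands."""
    caps = KNOWN_CAPABILITIES.get(recipient, set())
    if "ia5-compressed" in caps:
        return ("ia5-compressed", zlib.compress(text.encode("ascii")))
    return ("ia5", text.encode("ascii"))
```

The cost of this conservatism is that isolated senders never get the
transmission savings, which is exactly the concern raised above.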

   John Wagner

Stef@ICS.UCI.EDU (Einar Stefferud) (11/06/90)

Compression of Body Parts in X.400 P2 envelopes is very much like
adoption of any other special body part, like WordPerfect, or MSWord, or
DCA, or LOTUS Spread Sheet, or EXCEL, etc, et al.

There are two basic problems:

1.  Establish a standard definition and make it widely available and
widely implemented so many UAs can install and use it.

2.  Figure out who can handle the defined object as a body part.

Item 2 is really hard to resolve for the global community, without
requiring a global directory to hold specific information on exactly
what body part types every UA in the world can handle so every potential
originator can simply ask "the directory" if an intended recipient can
handle a given body part type.  I shudder at the task of setting up and
operating such a global directory of UA capabilities, and at the Quality
of Service aspects when individuals fail to keep their UA entries
current in the global directory.  Some people and organizations will
even regard this as private information, not to be disclosed to the
public.

As I see it, this grand global directory is only a dream.  Maybe a
"pipedream".

So, the only fall back we have is for the originator to ask the intended
UA owner if the target UA can handle the body part type that the
originator wants to send.  This is actually cheaper than demanding that
everyone in the world inform everyone in the world what body part types
their UA can handle.

I don't see any other way around this dilemma.  Best...\Stef

mhsc@oce.nl (Maarten Schoonwater) (11/14/90)

The discussion on data compression makes it very clear that there is a need
for standard data representations and formats for interchange. Now that the
global community is coming closer and closer together, we must learn to speak
common languages.

In ODA (Office Document Architecture) the problems signalled here are
solved to a great extent. ODA defines different Document Application
Profiles for different levels of interchange. The simplest form contains
only text; levels 2 and 3 also include graphics. The bitmap graphics in ODA
documents can be compressed according to the T.4 and T.6 compression
standards (i.e. fax groups 3 and 4). When you send an ODA body part over
X.400 you declare the content type (the profile level). The receiving X.400
system can thus check whether it can decode this level of ODA and
refuse the message if desired.

ODA only solves the problem for documents, there should be additional
standards for other applications. For pure bitmaps there are already the
CCITT facsimile standards that can be used and declared in the various body
parts. Compression is therefore already solved.

Maarten Schoonwater
Oce-Nederland BV

pv@Eng.Sun.COM (Peter Vanderbilt) (11/16/90)

Assuming that compression of P2 body parts is a good thing, is there a
standard mechanism to use for identifying compression?

The simplest mechanism is to just use an external body part with a
different object id (OID) for each different compression.

Alternatively, one could use the parameters part with a field to
indicate what kind of compression is used, where each compression
algorithm is assigned an OID.

The first mechanism has the problem that it requires "M*N" OIDs -- an
OID has to be allocated (and configured) for each pair of data type and
compression algorithm.  The second mechanism only requires "M+N" OIDs
-- one for each data type and one for each compression algorithm.  But
the second mechanism has the problem that it requires widespread
implementation to achieve the desired independence -- which seems like
a major hurdle.
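The difference in registry size between the two mechanisms is easy to
see with a toy count (the type and algorithm names are fabricated
placeholders, not registered identifiers):

```python
# With M data types and N compression algorithms:
data_types = ["ia5-text", "g3fax", "oda", "gif"]   # M = 4
algorithms = ["none", "lzw", "t4", "t6"]           # N = 4

# Mechanism 1: one OID per (data type, algorithm) pair -> M*N identifiers.
pair_oids = {(t, a) for t in data_types for a in algorithms}

# Mechanism 2: one OID per data type plus one per algorithm -> M+N
# identifiers, with the body part's parameters naming the algorithm.
split_oids = set(data_types) | set(algorithms)
```

The gap widens quickly: ten types and ten algorithms means 100 OIDs
under the first mechanism but only 20 under the second, which is why
the second looks attractive despite its deployment hurdle.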

Does anybody have info on whether the standards people considered body
part compression and, if so, how they expected it to be implemented?
Is anyone implementing it currently and, if so, how?

(Along the lines of the second mechanism, in practice it appears to be
useful to carry an identifying string with a body part -- is there any
hope that implementors would agree to a standard way to carry labels in
the parameters part?)

Pete