[comp.compression] Proposed standard -- to block or not to block.

jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) (06/21/91)

From article <859@spam.ua.oz>, by ross@spam.ua.oz.au (Ross Williams):
> DATA COMPRESSION INTERFACE STANDARD

> 2.2.1 A conforming  procedure must have a parameter  list that conveys
> no more  and no less information  than that conveyed by  the following
> "model" parameter list.
> 
>    IN    action    - The action requested of the data compression procedure.
>    INOUT memory    - A block of memory for use by the algorithm.
>    IN    fresh     - Is this the start of a new context block?
>    IN    inblock   - An array of input  bytes.
>    OUT   outblock  - An array of output bytes.
>    OUT   identity  - Record specifying attributes of the algorithm.
>

One of the biggest questions I have about this proposal is the requirement
that the calls to the algorithm be at the block level.  For many
applications, a stream oriented interface would work better, and it
would be easy to provide a standard stream-to-block adapter for those
applications which prefer a block oriented interface such as the above.

For some inherently block-oriented compression algorithms, it might be
equally useful to have a standard block-to-stream adapter.  As in many
object-oriented environments, we run into the classic problem here of
which interface belongs where in the hierarchy.  For some implementations
of the compression abstraction, the block oriented interface is natural;
for other implementations, the stream oriented interface will be more
natural.  In either case, it will be useful to have a set of adapter
cables that allow applications written in one form to make effective
use of algorithms coded in the other form.
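
To make the adapter-cable idea concrete, here is roughly what one direction,
a block-to-stream adapter sitting on top of a block-oriented algorithm, might
look like in C.  Everything here is hypothetical: compress_block() stands in
for a conforming block-level procedure modeled loosely on the parameter list
quoted above, put_compressed() stands in for whatever consumes the output,
and the buffer sizes are arbitrary.

  /* Hypothetical stand-in for a conforming block-level procedure,
     modeled loosely on the "model" parameter list quoted above.  It
     compresses inlen bytes of inblock into outblock and returns the
     number of output bytes produced. */
  extern long compress_block(void *memory, int fresh,
                             const unsigned char *inblock, long inlen,
                             unsigned char *outblock, long outmax);

  /* Hypothetical consumer of the compressed output, supplied by the
     application. */
  extern void put_compressed(const unsigned char *block, long len);

  #define CONTEXT_BLOCK 16384    /* illustrative context-block size */

  static unsigned char inbuf[CONTEXT_BLOCK];
  static unsigned char outbuf[CONTEXT_BLOCK + CONTEXT_BLOCK / 2];
                         /* a real adapter would size outbuf from the
                            algorithm's declared MAX_EXPANSION        */
  static long incount = 0;
  static int  fresh   = 1;

  /* Stream-level call: append one byte; compress when a block fills. */
  void put_byte(void *memory, int c)
  {
      inbuf[incount++] = (unsigned char) c;
      if (incount == CONTEXT_BLOCK) {
          long n = compress_block(memory, fresh, inbuf, incount,
                                  outbuf, (long) sizeof outbuf);
          put_compressed(outbuf, n);
          fresh = 0;
          incount = 0;
      }
  }

  /* Stream-level call: end the current context block, flushing any
     partly filled buffer. */
  void end_block(void *memory)
  {
      if (incount > 0) {
          long n = compress_block(memory, fresh, inbuf, incount,
                                  outbuf, (long) sizeof outbuf);
          put_compressed(outbuf, n);
          incount = 0;
      }
      fresh = 1;
  }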

What I have in mind is a stream interface that has the following components:

  start_compress()     -- called to indicate the start of a new context
                            block.

  compress(IN byte)    -- called to append a byte to the stream of data
                            being input to the compression algorithm.

  flush_compress()     -- called to mark the end of a context block, causing
                            any partially accumulated compressed data to be
                            output.

  transmit(IN byte)    -- provided by the user and called by compress to
                            append one byte to the compressed data stream.

  flush_transmit()     -- provided by the user and called by flush_compress.

----

  start_expand()       -- called to indicate the start of a new context
                            block.

  expand(OUT byte)     -- called by the user to get the next byte of the
                            reconstructed input stream.

  receive(OUT byte)    -- provided by the user and called by expand to
                            get the next byte of the compressed data stream.

Note that none of the bookkeeping for MAX_EXPANSION shows up at this level;
with a stream interface there is no caller-supplied output block whose size
has to be bounded in advance.
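
For concreteness, here is one way the compression side might be rendered in
C, with the user-supplied routines simply resolved at link time.  The
byte-in-an-int convention follows putchar() and getchar(), and anticipates
point 2 below.  The "algorithm" here is a do-nothing identity stub, just so
the sketch is self-contained; a real implementation would keep state
between calls.

  /* A do-nothing "compressor" that simply copies its input, standing
     in for a real algorithm, together with the user-side glue that
     drives it.  The shape of the calls is the point here, not the
     compression. */

  #include <stdio.h>

  /* Provided by the compression algorithm. */
  void start_compress(void);    /* begin a new context block       */
  void compress(int byte);      /* feed one byte of input          */
  void flush_compress(void);    /* end the block, flush any output */

  /* Provided by the user, called back by the algorithm. */
  void transmit(int byte);      /* accept one byte of output       */
  void flush_transmit(void);    /* end of the compressed stream    */

  /* Placeholder algorithm: the identity transform. */
  void start_compress(void) { /* nothing to reset in this stub */ }
  void compress(int byte)   { transmit(byte); }
  void flush_compress(void) { flush_transmit(); }

  /* User side: "compress" standard input to standard output. */
  void transmit(int byte)   { putchar(byte); }
  void flush_transmit(void) { fflush(stdout); }

  int main(void)
  {
      int c;

      start_compress();
      while ((c = getchar()) != EOF)
          compress(c);
      flush_compress();
      return 0;
  }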

Two things might need to be done to this:

 1) for some applications, a compression or expansion context may be a
    useful add-on parameter to all of these routines.  This would allow
    programs to deal with multiple independent streams at the same time.

 2) End-of-stream during expansion isn't addressed here.  The C approach
    would be to have expand and receive return integers with the byte
    in the low order bits and a value of -1 in case of end-of-stream.
    The Ada approach would be to raise an end-of-stream exception.
    (A short C sketch combining this with the context idea from
    point 1 appears below.)
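
Putting the two refinements together, the expansion side might come out
looking something like the following in C.  The context record makes
independent streams possible, and expand() borrows getchar()'s habit of
returning the byte in an int, with -1 signalling end-of-stream.  The names
and the layout of the record are only illustrative, and start_expand() and
expand() are left as prototypes.

  #include <stdio.h>

  #define END_OF_STREAM (-1)

  /* One of these per independent expansion stream (point 1).  The
     field names and layout are purely illustrative. */
  struct expand_context {
      void *memory;                     /* algorithm's working storage */
      int (*receive)(void *user_data);  /* user's source of compressed
                                           bytes; returns -1 at the end */
      void *user_data;                  /* passed through to receive() */
      /* ... whatever per-stream state the algorithm needs ... */
  };

  /* Begin a new context block on the given stream. */
  void start_expand(struct expand_context *ctx);

  /* Return the next reconstructed byte in the low order bits of an
     int, or END_OF_STREAM once the compressed data is exhausted
     (point 2). */
  int expand(struct expand_context *ctx);

  /* Typical use: copy one reconstructed stream to standard output. */
  void expand_to_stdout(struct expand_context *ctx)
  {
      int c;

      start_expand(ctx);
      while ((c = expand(ctx)) != END_OF_STREAM)
          putchar(c);
  }
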
					Doug Jones
					jones@cs.uiowa.edu
-----------------
Seen on a T-Shirt recently:  Remember!
                             Spam isn't just for breakfast anymore!