[comp.compression] Reply to Jones's criticism of standard.

ross@spam.ua.oz.au (Ross Williams) (06/21/91)

Thanks for your response, Douglas. Here is my reply.

CRIT_001: USE OF A SINGLE PROCEDURE
-----------------------------------
>The idea of passing the desired action to one omnibus routine that can
>both compress, expand, and perhaps wash cars is odious.  It means that
>the entire baggage of compression must be included for applications
>which only decompress, and vice versa.
>
>I have yet to see any non-artificially constructed examples where I would
>want to use either the compress or expand operations at the same point
>in a program, depending on some variable.  Thus, it would seem quite
>practical to use two separate procedures, one for compressing a block,
>and one for expanding an already compressed block.

The reason for using only one procedure is that a single routine can
be manipulated as a first class object in many languages. For example,
in C a pointer-to-function type can be defined, and variables of that
type can point to a particular data compression algorithm. This allows
systems that wish to choose between a variety of algorithms to store
an array of pointers to such functions and try out each algorithm in
turn. If more than one function is declared for each algorithm, this
becomes much messier.

The point about having to include (say) all the compression baggage in
a decompressor hardly matters as it  is only the compression CODE that
must  be  lugged  about,  not  the  working  memory  required  by  the
compression code. As most compression  algorithms will use only a very
small amount of code, I don't see this as a problem.

I do not view the carwash problem as serious, as there are only three
operations and two of them use the same set of parameters. The third
operation introduces only one extra parameter.

CRIT_002: BLOCK VS STREAM
-------------------------
>> 2.2.1 A conforming  procedure must have a parameter  list that conveys
>> no more  and no less information  than that conveyed by  the following
>> "model" parameter list.
>>
>>    IN    action    - The action requested of the data compression procedure.
>>    INOUT memory    - A block of memory for use by the algorithm.
>>    IN    fresh     - Is this the start of a new context block?
>>    IN    inblock   - An array of input  bytes.
>>    OUT   outblock  - An array of output bytes.
>>    OUT   identity  - Record specifying attributes of the algorithm.
>>
>
>One of the biggest questions I have about this proposal is the requirement
>that the calls to the algorithm be at the block level.  For many
>applications, a stream oriented interface would work better, and it
>would be easy to provide a standard stream-to-block adapter for those
>applications which prefer a block oriented interface such as the above.

I agree 100%; it took a lot of agony to arrive at the block interface.
However there are two very strong arguments for the block interface:

   EFFICIENCY: A stream interface requires one procedure call per byte
   processed.  For high speed applications this is unacceptable.

   DIRECTNESS: A stream interface will force many algorithms to become
   involved in unnecessary buffering.

In contrast, a stream interface  can always be efficiently constructed
over the top  of a block interface. The stream  interface is then also
in a better position to choose buffer sizes.

>For some inherently block-oriented compression algorithms, it might be
>equally useful to have a standard block-to-stream adapter.  As in many
>object-oriented environments, we run into the classic problem here of
>which interface belongs where in the hierarchy.  For some implementations
>of the compression abstraction, the block oriented interface is natural;
>for other implementations, the stream oriented interface will be more
>natural.  In either case, it will be useful to have a set of adapter
>cables that allow applications written in one form to make effective
>use of algorithms coded in the other form.

Agreed 100%. However,  I really believe that there is  an asymmetry in
the  stream/block  schemes (for  the  reasons  outlined above).  I  am
strongly  in favour  of creating  a stream  standard, but  I think  it
should be  constructed as a  layer above  the block standard.  (If you
want another example, look at UNIX IO :-).

>What I have in mind is a stream interface that has the following components:
>
>  start_compress()     -- called to indicate the start of a new context
>                            block.
>
>  compress(IN byte)    -- called to append a byte to the stream of data
>                            being input to the compression algorithm.
>
>  flush_compress()     -- called to mark the end of a context block, causing
>                            any partially accumulated compressed data to be
>                            output.
>
>  transmit(IN byte)    -- provided by the user and called by compress to
>                            append one byte to the stream of compressed data.
>
>  flush_transmit()     -- provided by the user and called by flush_compress.
>
>----
>
>  start_expand()       -- called to indicate the start of a new context
>                            block.
>
>  expand(OUT byte)     -- called by the user to get the next byte of the
>                            reconstructed input stream.
>
>  receive(OUT byte)    -- provided by the user and called by expand to
>                            get the next byte of the compressed data stream.

The interface above is a nice stream interface. However, it introduces
many of the problems that I was trying to avoid in my standard:

   * Making the user have to hand over the routines "transmit",
     "flush_transmit" and "receive" introduces a host of problems in
     many programming languages. The problems relate to:
        * Function types.
        * Function parameter list type conformance.
        * Functions as parameters to other functions.
        * Global variables and IO state.
        * Error handling (what happens if the IO operation in
          "transmit" fails?)
     In contrast, the memory-block interface is extremely simple,
     portable, and guaranteed not to fail.

     You might think that I am contradicting myself: I criticise the
     passing of functions above, yet I favour a single-function
     interface precisely so that it can be used in a first class
     manner. Not so. I am in favour of having one function in the
     interface so as to ALLOW functions to be easily passed about in
     languages that permit it. The stream suggestion above FORCES it,
     making things difficult in languages that do not support
     functions as first class objects.

   * This interface gives the compression algorithm the right to write
     however many bytes it feels like, whenever it feels like it. This
     makes it potentially unmanageable (see below).

   * By having lots of interface functions, this alternative does not
     easily allow algorithms to be manipulated as first class
     objects.

>At this level, bookkeeping for MAX_EXPANSION isn't present.

I agree, and this is an advantage in the stream interface where one is
copying  streams.  However, if  the  stream  interface is  being  used
internally to compress  from memory to memory (and  the programmer has
dutifully written  and handed  over functions to  read and  write from
input and output memory buffers)  then exactly the same problem arises
except that  one now has no  guarantees of how much  memory one should
allow in the output block.

On a separate note, I think that it is a healthy thing to set a limit
to the expansion that a compression algorithm can yield (although this
may be restrictive in experimental contexts).

>Two things might need to be done to this:
>
> 1) for some applications, a compression or expansion context may be a
>    useful add-on parameter to all of these routines.  This would allow
>    programs to deal with multiple independent streams at the same time.
>
> 2) End-of-stream during expansion isn't addressed here.  The C approach
>    would be to have expand and receive return integers with the byte
>    in the low order bits and a value of -1 in case of end-of-stream.
>    The Ada approach would be to raise an end-of-stream exception.

The memory  block interface  does not  have a  difficulty with  end of
stream as it leaves that problem to the user.

------

Once again, thanks for your criticism, Douglas. I look forward to more
comments from comp.compression heads.

Ross Williams
ross@spam.ua.oz.au

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (06/22/91)

In article <861@spam.ua.oz> ross@spam.ua.oz.au (Ross Williams) writes:
> However there are two very strong arguments for the block interface:
>    EFFICIENCY: A stream interface requires one procedure call per byte
>    processed.  For high speed applications this is unacceptable.

Huh? UNIX's read() is a stream interface, and it sure doesn't require
one procedure call per byte processed.

>    DIRECTNESS: A stream interface will force many algorithms to become
>    involved in unnecessary buffering.

Your block interface forces *all* algorithms to become involved in
unnecessary blocking. Again it sounds like you're concentrating on
zero-delay modem compression. I don't think compressors should be forced
into that model.

> In contrast, a stream interface  can always be efficiently constructed
> over the top  of a block interface.

This is not true, especially given your requirement that the block
compressor be memoryless.

---Dan