[comp.compression] Standard: Memory problems continued.

ross@spam.ua.oz.au (Ross Williams) (06/22/91)

CRIT_004: CONTIGUOUS MEMORY AND TYPING (CONTINUED)
--------------------------------------------------
Sender: jloup@chorus.fr
Date: 22-Jun-1991.

|| One  problem  here  is  that   many  algorithms  allocate  heap  items
|| incrementally as  they build their data  model. It seems silly  for an
|| algorithm that is told that it can have three megabytes to make 100000
|| allocation calls only to later find out that it only has to process 1K
|| of  data! If  the standard  has to  be changed  so that  the algorithm
|| allocates its own memory  then I am in favour of allowing  it to do so
|| incrementally, but  to ensure that  it does not  use more than  it has
|| been allowed.
|
|I also agree that incremental allocation is important for some algorithms.
|But I don't think that you need a bullet proof protection against
|algorithms which do not follow the rules. If you cannot prove from
|the source of your algorithm that the amount of required memory is bounded by
|a certain value you are in trouble anyway. Even in your original proposal
|you could not guard against malicious use of the stack or heap without
|checking the source code.

One  of  the  troubles  with  allowing an  algorithm  to  do  its  own
allocations is  that you run  into heap fragmentation  problems. There
may be enough  memory left, but if it is  fragmented and the algorithm
wants another  slightly largish lump then  you may get a  failure half
way through a compression. That is why  I decided to add in the result
status  for all  data compression  procedure  calls when  I agreed  to
allowing algorithms to allocate their own memory on the fly.

I see no harm in creating a bullet proof standard even if it cannot be
automatically enforced. The  Ada language definition has  a concept of
"erroneous" which is a bit like this.

|| I agree with your analysis and propose the following:
||
||    1. The identity record specifies the maximum amount of memory to be
||       used.
||    2. This memory may be allocated from the heap only. (However, the
||       procedure can still use up to 1024 bytes of other memory.)
|
|I think you mean that the memory must not be allocated before the
|"Initialize data structures" call and must be deallocated no later
|than the "Destroy data structures" call. This is sufficient to
|disallow static data. There is no need to make an explicit reference
|to "heap". I am not convinced that you must disallow stack usage.
|Many architectures do not have small limits on the stack size, and
|stack space is cheaper than heap space. Of course the part of
|the data structures which must live across compression calls must
|be in the heap and accessed through the MEMORY parameter, but temporary
|data could be on the stack. (For example the LZW uncompress requires a stack
|to reverse the bytes of a code string.)

OK OK I'm being a martinet again.  Agreed. The standard has to be more
flexible  than I  have made  it about  where memory  comes from.  I am
getting uneasy about how the standard is loosening up though. I wanted
to  place a  bound  on the  stack so  that  procedures wouldn't  cause
trouble on machines with a small stack space (e.g. some microprocessor
embedded systems).

I like your  definition for disallowing static data.  However, it does
allow multiple activated instantiations to communicate with each other
so extra conditions might have to be added.

||    4. The memory parameter will become an INOUT generic pointer in which
||       the algorithm should place a reference to the root of its
||       data structure during an init call.
|
|Please allow this pointer to have a meaningful type for strongly
|typed languages. In other words, the type may be different for
|different algorithms. In C terminology allow (struct foo *) instead of
|(void *).

This is  up to the writer  of the child standard  for each programming
language.  The  generic standard  is  concerned  only with  the  rough
information flows and protocol.

|| Jean: Thanks  for your  comments. Do  you agree  with the  proposal as
|| given above?
|
|Yes. In particular it is nice to allow incremental allocation.

At, as  I mentioned earlier, the  cost of needing a  result status for
all  calls (because  of fragmentation).  However, the  status is  also
useful for allowing the algorithm to  notify of data corruptions so it
is probably worth it.

|Jean-loup
|jloup@chorus.fr
|
|PS: My only strong disagreement is to be named Jean :-) Hyphenated
|french names must not be split.

OK.

|| PS: Please don't  think I'm a typeless troglodyte.  My SAKDC algorithm
|| is written in Ada and is  full of rock-solid highly typed abstractions
|| using great  big heap based data  structures. It's just that  my brain
|| has been warped by coding in C recently.
|
|I have myself fallen from the Ada language design team down to earth (C++) :-)
|But I'd like to get your SAKDC.

I'm impressed. I think that you've  created a big monster (I have been
involved  with safety-critical  systems  and  sympathize with  Hoare's
criticism) but  I have  found Ada to   be a  friendly big  monster who
looks after you most of the time. I like it. Well done.

Yet another vote for digging up SAKDC. I might even do it soon.

Ross Williams
ross@spam.ua.oz.au

ian@airs.com (Ian Lance Taylor) (06/22/91)

ross@spam.ua.oz.au (Ross Williams) writes:

>OK OK I'm being a martinet again.  Agreed. The standard has to be more
>flexible  than I  have made  it about  where memory  comes from.  I am
>getting uneasy about how the standard is loosening up though. I wanted
>to  place a  bound  on the  stack so  that  procedures wouldn't  cause
>trouble on machines with a small stack space (e.g. some microprocessor
>embedded systems).

Your standard seems to be driving at two different goals at once:

    1) Allow programs to use several different compression subroutines.

    2) Require compression subroutines to be portable between
       machines.

The point you are making above seems to relate only to goal 2.  I
think the real benefit of the standard will be goal 1.  It's going to
be awfully hard to make all compression routines portable to all
machines anyhow.  That seems to me to be, as the C language standard
would say, a ``quality of implementation'' issue for the compression
routine itself.  Furthermore, an implementation on a machine with lots
of stack space could use the stack; an implementation on a machine
without stack space could use some other memory.  There shouldn't be a
requirement that the same compression code work on different machines;
only that if a different machine is supported, the same functionality
is provided.

Allowing programs to use different compression routines with a
standard interface seems to me to be a worthy goal.  Specifying
details about where the memory comes from seems needlessly machine
specific.
-- 
Ian Taylor                   ian@airs.com                 uunet!airs!ian
First person to identify this quote wins a free e-mail message:
``If he could have moved, he would have gotten up and gone after the man
to thank him for wearing something so marvelously interesting.''