[comp.sw.components] Real-time Garbage Collection

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/18/89)

From gateley@m2.csc.ti.com (John Gateley):
> There exists at least one real-time garbage collection algorithm 
> (don't have the reference right now, email if you want it).

    I recently read of a linear-time algorithm (ignoring constant
    factors, of course) -- the only problem is that it required 
    sixteen times the amount of space actually needed by the program!!!

    The point is, any system which uses GC is going to be considerably
    less efficient (and constant factors are important here!) than a
    competing system in which the fact that an object is no longer
    needed is directly communicated to the storage management system.

    Now the advocates of GC frequently claim that space management is
    too heavy a burden to be placed on the application programmer, and
    this may well be true.  However, there is a much better way to deal
    with this problem.  All that is necessary is to encapsulate the
    details of storage management within abstract data types, such that
    the destruction algorithm is integrated into the block exit mechanism.

    By doing this, we place a level of abstraction between the application
    programmer and the details of storage management.  The use of pointers
    is confined to the ADT's implementation, and the application programmer
    is shielded from all the resulting complexities.  The interface also
    simplifies the work of the ADT's implementor; the destruction algorithm
    is usually quite straightforward.
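
    To make this concrete, here is a minimal sketch (in C++-style code,
    with purely illustrative names) of an ADT whose internal storage is
    released automatically by the block exit mechanism:

        #include <cstddef>

        class IntStack {            // pointers live ONLY inside the ADT
        public:
            IntStack() : data(new int[16]), cap(16), len(0) {}
            ~IntStack() { delete [] data; }   // destruction at block exit
            void push(int x) {
                if (len == cap) grow();
                data[len++] = x;
            }
        private:
            void grow() {           // expand internal storage as needed
                int* bigger = new int[cap * 2];
                for (std::size_t i = 0; i < len; ++i) bigger[i] = data[i];
                delete [] data;
                data = bigger;
                cap *= 2;
            }
            int* data;
            std::size_t cap, len;
        };

        void client() {
            IntStack s;             // application code: no pointers in sight
            s.push(42);
        }                           // s's storage is reclaimed here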

    Worrying about GC is essentially barking up the wrong tree; what we
    need is better reuse mechanisms.  We would then also gain the added
    benefit of exploiting the implicitly supplied information that a
    given variable is no longer needed, information which is produced
    naturally (with no special effort on the part of the application
    programmer) upon each and every block exit.  Thus, we can have our
    programming convenience WITHOUT eating our memory or CPU cycles, and
    thereby keep everyone happy (except perhaps those hackers who get a
    charge out of playing with pointers).


    Bill Wolfe, wtwolfe@hubcap.clemson.edu

tynor@prism.gatech.EDU (Steve Tynor) (09/18/89)

In article <6488@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu writes:
...
>    By doing this, we place a level of abstraction between the application
>    programmer and the details of storage management.  The use of pointers
>    is confined to the ADT's implementation, and the application programmer
>    is shielded from all the resulting complexities.  The interface also
>    simplifies the work of the ADT's implementor; the destruction algorithm
>    is usually quite straightforward.

Freeing the memory _is_ usually straightforward. Knowing when it's safe to 
do so _isn't_. That's when GC (or reference counting) becomes necessary.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If the facts do not conform to the theory, they must be disposed of.
                     
    Steve Tynor
    Georgia Tech Research Institute
    Artificial Intelligence Branch
    tynor@prism.gatech.edu

ted@nmsu.edu (Ted Dunning) (09/18/89)

i find that i am mostly in agreement with mr. wolfe when 

In article <6488@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) writes:


       The point is, any system which uses GC is going to be considerably
       less efficient (and constant factors are important here!) than a
       competing system in which the fact that an object is no longer
       needed is directly communicated to the storage management
       system.

this is not always true.  if the objects are heavily shared, or if the
point in time at which an object becomes unreferenceable is obscure,
then knowing when to release it is a difficult problem.  people have
used reference counting in the past to solve this problem, but the
general consensus (on lisp implementations) has always been that
devoting enough bits to reference counting was impractical (it
certainly is if every value with pointer semantics must be tagged) and
that reference counting collection was more expensive than stop and
copy.  in addition, reference counting does not compact storage as the
various forms of stop and copy do.

       All that is necessary is to encapsulate the
       details of storage management within abstract data types, such that
       the destruction algorithm is integrated into the block exit
       mechanism.

this is not sufficient.  we also need to handle assignment (the old
value of the assignee becomes less referenced), and formal argument
passing (where we suddenly have one more reference).  in c++ there is
also all the complexity of implicit and explicit type conversion.
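
a sketch (in c++, with made-up names) of just how much machinery
assignment and argument passing drag in:

    // a counted handle: every copy, assignment, and by-value argument
    // pass must touch the count -- this is the traffic described above.
    class handle {
    public:
        handle() : rep(new rep_t) {}
        handle(const handle& h) : rep(h.rep) { ++rep->count; }  // arg pass
        handle& operator=(const handle& h) {                    // assignment
            ++h.rep->count;                     // new value gains a reference
            if (--rep->count == 0) delete rep;  // old value loses one
            rep = h.rep;
            return *this;
        }
        ~handle() { if (--rep->count == 0) delete rep; }
    private:
        struct rep_t { int count; rep_t() : count(1) {} };
        rep_t* rep;
    };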

trickier is the issue of continuations and first class functional
objects.  somehow, every reference-counted variable that is captured
by a continuation or by a functional object must be marked as still
referenced, and must not be reclaimed on block exit.

       By doing this, we place a level of abstraction between the application
       programmer and the details of storage management.

certainly the goal.  

       The use of pointers
       is confined to the ADT's implementation, and the application programmer
       is shielded from all the resulting complexities.

again, motherhood and apple pie.

       The interface also
       simplifies the work of the ADT's implementor; the destruction algorithm
       is usually quite straightforward.

this is only straightforward in a language that does not support
advanced concepts (like ada and c++).  once you talk about a language
as powerful as scheme, you have a much more interesting problem, one
that is, incidentally, easily handled by conventional garbage
collectors.

in fact, reference counting is not the only possibility in a language
like c++ which provides some knowledge of scope entry and exit for the
programmer.  instead, it is possible to keep the equivalent of an
obarray, pushing and popping references to variables as needed.  then
when gc is desired, one can do a stop and copy or mark/sweep with an
extra pass to correct all the pointers.
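
a sketch of that obarray-ish idea (hypothetical, in c++):

    // a shadow stack of roots: constructors push, destructors pop, so
    // registrations vanish on scope exit.  a collector can scan
    // roots[0..nroots) and, after moving objects, fix each registered
    // pointer in place.
    const int MAX_ROOTS = 10000;
    void** roots[MAX_ROOTS];
    int nroots = 0;

    struct root {                   // declare one per gc-visible pointer
        root(void** p) { roots[nroots++] = p; }    // scope entry
        ~root() { --nroots; }                      // scope exit
    };

    void example() {
        void* my_list = 0;          // imagine this points into the heap
        root r(&my_list);           // my_list is now visible to the gc
        // ... a collection here may move the object and update my_list
        // through the registered address &my_list ...
    }                               // registration popped automatically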

       Worrying about GC is essentially barking up the wrong tree; what we
       need is better reuse mechanisms, so that we can also gain the added
       benefit of exploiting the implicitly supplied information regarding 
       the fact that a given variable is no longer needed which is naturally
       produced (with no special effort on the part of the application
       programmer) upon each and every block exit.

this _is_ garbage collection.  it just isn't stop and do x garbage
collection. 



it should be kept in mind that in general, collection overhead increases
as you go from generational collection to stop and copy to mark sweep
to reference counting to `real time' collection methods (almost all of
which are based indirectly on baker's algorithm in the april '78
issue of cacm).  if you care about average throughput, use stop and
copy, and maybe a generational copier.  if you want a very simple
system, use ref counting.  nobody really uses the real time methods
yet because they are expensive and complex.

--
ted@nmsu.edu
			Most of all, he loved the fall
			when the cottonwood leaves turned gold
			and floated down the trout streams
			under the clear blue windswept skies.

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/19/89)

From article <1895@hydra.gatech.EDU>, by tynor@prism.gatech.EDU (Steve Tynor):
> Freeing the memory _is_ usually straightforward. Knowing when it's safe to 
> do so _isn't_. That's when GC (or reference counting) becomes necessary.

   It's safe to do so:

     1) When the variable goes out of scope

     2) When the programmer specifies that it's to be destroyed sooner.

   So assuming that we use abstract data types like good little 
   programmers, where's the need for reference counting?


   Bill Wolfe, wtwolfe@hubcap.clemson.edu

rang@cs.wisc.edu (Anton Rang) (09/19/89)

In article <6493@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) writes:
>From article <1895@hydra.gatech.EDU>, by tynor@prism.gatech.EDU (Steve Tynor):
>> Freeing the memory _is_ usually straightforward. Knowing when it's safe to 
>> do so _isn't_. That's when GC (or reference counting) becomes necessary.
>
>   It's safe to do so:
>
>     1) When the variable goes out of scope
>
>     2) When the programmer specifies that it's to be destroyed sooner.

  If there's any explicit pointer manipulation, these conditions
aren't sufficient, because an object can be pointed to in more than
one place.  For instance, if I have two objects (A and B) sharing
pointers to object X, then object X should be destroyed (assuming no
other references) when both A and B have been destroyed.
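
  A tiny C++-style illustration of the hazard (types are hypothetical):

    struct X { int payload; };
    struct Holder { X* x; };        // A and B each hold a pointer to X

    void demo() {
        X* x = new X;
        Holder a; a.x = x;
        Holder b; b.x = x;          // two places now point at x
        delete a.x;                 // "A is destroyed, so free its data"
        // b.x now dangles; freeing it again would corrupt the heap.
        // Only a reference count or a collector can tell us that x
        // was still shared when A went away.
    }
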
  Personally, I prefer controlling my own data structures...leaving it
up to a garbage collector has always seemed a little sloppy to me.
   
+----------------------------------+------------------+
| Anton Rang (grad student)        | "VMS Forever!"   |
| University of Wisconsin--Madison | rang@cs.wisc.edu |
+----------------------------------+------------------+

scc@cl.cam.ac.uk (Stephen Crawley) (09/19/89)

The most important distinction between ordinary and real-time programs
is that the real-time programs need to execute in predictable time.  
An ordinary synchronous garbage collector that stops execution for a 
non-trivial amount of time is clearly not acceptable in this context.
But an asynchronous garbage collector that allows GC to be done in
short bursts during periods of inactivity MAY be ok for real-time
programming.

The key question that determines whether or not asynch GC will work in
real-time is whether there is enough idle time to keep up with garbage
collection.  In many cases the answer should be yes!  And if the answer
is no, it may be possible to MAKE more idle time by 1) using a faster
processor or 2) moving the task of garbage collection onto a second
processor.  Only when we are pushing the limits of the hardware will
the answer be no.
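
As a sketch (hypothetical names, C++ syntax), such a collector might be
driven from the idle path of the application's main loop:

    #include <queue>

    struct Event { int id; };
    static std::queue<Event> pending;      // stand-in event source

    static void handle(const Event&) { /* time-critical work */ }
    static void gc_increment(int /*max_microseconds*/) {
        // stand-in for one short, bounded burst of collector work
    }

    void main_loop() {
        for (;;) {
            if (!pending.empty()) {        // real-time work comes first ...
                handle(pending.front());
                pending.pop();
            } else {
                gc_increment(500);         // ... GC runs only when idle,
            }                              // in short bounded bursts
        }
    }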

A second question that affects the viability of GC in a real-time program
is how much garbage gets generated.  I claim that for most of the real-time
programs being written today, the answer is "not a lot".  If a programmer
can decide to free a piece of dynamic storage on exiting a scope block,
then a smart compiler could usually make the same decision.
Similarly a smart compiler should be able to optimise storage reclamation
in other cases.  [It is my guess that the main reason that compilers that
optimise GC don't exist is that the industry is prejudiced against GC]

There are of course cases that cannot be solved by freeing stuff on
scope exit, or by other static tricks.  In such cases, the non-GC'ed
solution MUST be handled by the application.  Bill Wolfe seems to be saying
that such situations almost never arise in real-time problems.  This may
be true right now, but it won't be long before the military start asking
for real-time AI!

Finally, I'd like to rebut the idea that ADT's that do their own storage
management are appropriate for use in a GC'ed environment.  If a type 
does its own management, then in order to get it to do its stuff, it must
export an explicit destructor operation or its equivalent.  How / when does 
the destructor get called?  There are 2 options as I see it:

1)	A type that refers to a non-GC'ed type becomes non-GC'ed, and must
	have a destructor operation.  We soon find that large parts of our
	application have to understand storage management policy. 
	
2)	We arrange that the garbage collector provides hooks for calling
	an object's destructor operation when the object becomes garbage
	in the conventional sense.  This makes garbage collection more
	complicated.  For example, when the GC finds some garbage instances 
	of ADT's with a destructor, it must decide on the correct order to 
	destroy them.
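
For option 2, the hook might look something like this (a sketch, not any
real collector's interface):

    // A destructor hook for a hypothetical collector.
    struct Finalizable {
        virtual ~Finalizable() {}
        // called by the GC just before the object's memory is reclaimed
        virtual void finalize() = 0;
    };

    // The ordering problem in miniature: if garbage objects A and B are
    // found in the same collection and A's finalize() touches B, the
    // collector must run A's hook first and keep B alive until it is done.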

In either case, we end up with the problems we were trying to avoid by 
using garbage collection; complication for the application, and lots of 
potential for storage leaks and other bugs. 

In summary, we need to recognise 3 things:

o  A GC'ed language is the only sane approach to many hard programming
   problems.  Some of these problems WILL also be real-time problems.

o  A GC'ed language is not necessarily inappropriate for real-time
   problems.  It depends on the problem, the tools (GC and compiler)
   and on the hardware.

o  GC'ed languages and non-GC'ed packages don't mix, so we should
   not try to mix them.  This is a case where software reuse is NOT
   appropriate.
   
-- Steve

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/19/89)

From rang@cs.wisc.edu (Anton Rang):
>>   It's safe to do so:
>>     1) When the variable goes out of scope
>>     2) When the programmer specifies that it's to be destroyed sooner.
> 
>   If there's any explicit pointer manipulation, these conditions
> aren't sufficient, because an object can be pointed to in more than
> one place.  

   True, but the basic thrust of the position is that the programmer
   should generally not use pointers explicitly; rather, the use of 
   pointers should be made safe via encapsulation within a secure ADT.
 
>   Personally, I prefer controlling my own data structures...leaving it
> up to a garbage collector has always seemed a little sloppy to me.

   Same here.  I welcome GC algorithms as an ADT *debugging* tool 
   (raising an exception if the algorithm ever succeeds in finding 
   any loose storage), but it just costs way too much at RUN time.


   Bill Wolfe, wtwolfe@hubcap.clemson.edu
 

tynor@prism.gatech.EDU (Steve Tynor) (09/19/89)

In article <6497@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu writes:
>From rang@cs.wisc.edu (Anton Rang):
>>>   It's safe to do so:
>>>     1) When the variable goes out of scope
>>>     2) When the programmer specifies that it's to be destroyed sooner.
>> 
>>   If there's any explicit pointer manipulation, these conditions
>> aren't sufficient, because an object can be pointed to in more than
>> one place.  
>
>   True, but the basic thrust of the position is that the programmer
>   should generally not use pointers explicitly; rather, the use of 
>   pointers should be made safe via encapsulation within a secure ADT.

Whether a reference to a shared object is a pointer, or is hidden via some
clever encapsulation, I can't see how the programmer can be expected to know
when to explicitly deallocate the reference without maintaining reference
counts (whether he does it, or the ADT - it's still reference counting) or
relying on GC (or assuming that references are _never_ shared - which seems
like a simplistic assumption).

I am (as we speak) trying to implement an ADT (in Ada) which contains objects
whose lifetimes are not lexically scoped and to which many other objects may
refer.  I'd be very happy to avoid reference counts and GC if someone can
suggest a way to do so!

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
WITH disclaimer; USE disclaimer;
                     
    Steve Tynor
    Georgia Tech Research Institute
    Artificial Intelligence Branch
    tynor@prism.gatech.edu

johnson@p.cs.uiuc.edu (09/19/89)

> Written  6:21 am  Sep 18, 1989 by billwolf%hazel.c@hubcap.clemson.edu 
>  Now the advocates of GC frequently claim that space management is
>  too heavy a burden to be placed on the application programmer, and
>  this may well be true.  However, there is a much better way to deal
>  with this problem.  All that is necessary is to encapsulate the
>  details of storage management within abstract data types, such that
>  the destruction algorithm is integrated into the block exit mechanism.

>  By doing this, we place a level of abstraction between the application
>  programmer and the details of storage management.  The use of pointers
>  is confined to the ADT's implementation, and the application programmer
>  is shielded from all the resulting complexities.  The interface also
>  simplifies the work of the ADT's implementor; the destruction algorithm
>  is usually quite straightforward.

Bill Wolfe is being hopelessly idealistic.  I've used C++ a lot, which
takes the approach that he advocates, and there are many times when it
is very hard for an ADT to figure out when to destroy an object.  Moreover,
memory management is the source of many bugs and takes a lot of programmer
time to get right.  In general, we want to avoid programming language
features that encourage bugs and cause programmers to spend a lot of time.
It is probably partly for this reason that C++ programmers tend to
minimize their use of dynamic memory management.

Moreover, there are times when memory management would be more efficient
if it were implemented by the compiler instead of by the programmer.
For example, C++ programmers use reference counting a lot, and there are
a number of algorithms by which even stupid compilers can eliminate a
lot of reference counting.  Of course, programmers can perform these
optimizations by hand, but I believe that compilers should be in charge
of optimization, not programmers.  Most languages with garbage collection
are implemented by interpreters or by very simple compilers, and so we
haven't seen much optimization of memory management, but we will in the
future, as more and more languages include garbage collection.
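
As a sketch (a hypothetical counted handle; this is not from any
particular compiler), one such optimization is replacing a by-value
parameter with a reference when the callee cannot outlive the caller's
reference:

    struct Handle {                 // hypothetical reference-counted type
        int* count;
        explicit Handle(int* c) : count(c) { ++*count; }
        Handle(const Handle& h) : count(h.count) { ++*count; }
        ~Handle() { --*count; }
    };

    void use_by_value(Handle)      {}   // ++ on entry, -- on exit
    void use_by_ref(const Handle&) {}   // no count traffic at all

    void demo() {
        int n = 0;
        Handle h(&n);
        use_by_value(h);    // cost: one increment and one decrement
        use_by_ref(h);      // cost: nothing; safe because the caller's
    }                       // reference outlives the call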

Many garbage collection systems are quite efficient.  For example,
some generation scavenging systems have a 3% to 5% overhead.  The
problem is that the efficient systems are not real-time, and the
real-time garbage collection algorithms are not efficient.  The
algorithm by Appel et al. in the '88 SIGPLAN programming language
implementation conference is supposed to be both, but I am not sure
that it has ever been used in a real-time environment.  In any
case, there were no timings given, and "real-time" is always relative.

Ralph Johnson

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/19/89)

From scc@cl.cam.ac.uk (Stephen Crawley):
> The key question that determines whether or not asynch GC will work in
> real-time is whether there is enough idle time to keep up with garbage 
> collection.  [...] Only when we are pushing the limits of the hardware
> will the answer be no. 

   Another key question is "How can we make this product such that it
   uses as little memory and CPU as possible, so that we might underbid
   our competitors?"; the reuse of components which do their own space
   management will permit the elimination of GC, thus providing an edge.
 
> Finally, I'd like to rebut the idea that ADT's that do their own storage
> management are appropriate for use in a GC'ed environment.  [...]
> [how to handle interactions between managed and unmanaged storage?]

   I'm not an expert on how this is done, but I'd guess that memory is
   divided into two regions: managed and unmanaged.  The GC system could 
   *read* the managed region (to detect pointers into the unmanaged area),
   but only for the purpose of operating upon the unmanaged region; it
   must refrain from taking any actions with respect to managed storage. 

   Probably there are considerably more sophisticated techniques 
   involved; I'd suggest porting this thread into comp.lang.ada 
   if someone is seriously interested in the question, since Ada
   compilers which provide GC are also required to enforce the
   CONTROLLED pragma (which specifies that this type manages its 
   own storage). 


   Bill Wolfe, wtwolfe@hubcap.clemson.edu
 

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/19/89)

From tynor@prism.gatech.EDU (Steve Tynor):
>>>>   It's safe to do so:
>>>>     1) When the variable goes out of scope
>>>>     2) When the programmer specifies that it's to be destroyed sooner.
> 
> Whether a reference to a shared object is a pointer, or is hidden via some
> clever encapsulation, I can't see how the programmer can be expected to know
> when to explicitly deallocate the reference without maintaining reference
> counts (whether he does it, or the ADT - it's still reference counting) or
> relying on GC (or assuming that references are _never_ shared - which seems
> like a simplistic assumption).

   OK, first let me clarify my position.  The primary use for pointers
   is simply to implement abstract data types -- lists, trees, graphs, etc.;
   by shielding the user from the details of implementation, we can keep
   the user from having to worry about manipulating pointers.

   Now your question, I believe, is "What if the user wants to store
   pointers in a container?"... the response is that the user probably
   should seriously consider the possibility that s/he is making an 
   inappropriate use of pointers.

   Let's suppose the user has an object with lots of "inertia", which
   causes the user to initially think that pointers are a good way to
   "move" this object as necessary.  This can be addressed with a
   technique which applies to an even harder problem: the case in
   which the object is not even locally present.  The solution is
   to assign unique identification numbers to objects of the type,
   keep the objects in a database (which may of course be a distributed
   database), and then deal with the identification numbers instead.
   
   Identification numbers have many nice properties: they can be arbitrary
   abstractions (e.g., encoding certain properties of their client objects),
   they are totally machine-independent, and they are not subject to any
   a priori limitations.  Information regarding the maximum lifetime of the 
   object is a particularly good property to encode into the identification
   number, since the holder can then quickly check whether the object has
   expired without even consulting the database.  Where objects can have
   an infinite lifetime, a protocol can be devised whereby the database
   must be checked at least once per some arbitrary time period by all
   users, which will permit identification numbers to be recycled after
   one full time period of nonexistence.  Other protocols might be used
   depending upon the characteristics of the specific object involved.
   The techniques for managing the active range of identification numbers
   can also be quite sophisticated.
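
   As a sketch (all names and layouts purely illustrative), an
   identification number with an encoded expiration might look like:

       #include <ctime>

       // A self-describing object ID: the expiration travels with the
       // number itself, so a holder can test it with no database access.
       struct ObjectId {
           unsigned long serial;    // database-assigned, machine-independent
           std::time_t   expires;   // encoded maximum lifetime

           bool expired() const { return std::time(0) >= expires; }
       };

       // Holder-side check: cheap, local, no database traffic.
       bool still_usable(const ObjectId& id) { return !id.expired(); }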

> I am (as we speak) trying to implement an ADT (in Ada) which contains 
> objects whose lifetimes are not lexically scoped and to which many 
> other objects may refer.  I'd be very happy to avoid reference counts
> and GC if someone can suggest a way to do so!

   Consider it done. 


   Bill Wolfe, wtwolfe@hubcap.clemson.edu

tynor@prism.gatech.EDU (Steve Tynor) (09/19/89)

In article <6506@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu writes:
...
>   expired without even consulting the database.  Where objects can have
>   an infinite lifetime, a protocol can be devised whereby the database
>   must be checked at least once per some arbitrary time period by all
>   users, which will permit identification numbers to be recycled after
>   one full time period of nonexistence.  Other protocols might be used

Gee. This sounds an awful lot like garbage collection to me.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Progress means replacing something wrong with something more subtly wrong.
                     
    Steve Tynor
    Georgia Tech Research Institute
    Artificial Intelligence Branch
    tynor@prism.gatech.edu

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/20/89)

From article <1929@hydra.gatech.EDU>, by tynor@prism.gatech.EDU (Steve Tynor):
>>   expired without even consulting the database.  Where objects can have
>>   an infinite lifetime, a protocol can be devised whereby the database
>>   must be checked at least once per some arbitrary time period by all
>>   users, which will permit identification numbers to be recycled after
>>   one full time period of nonexistence.  Other protocols might be used
> 
> Gee. This sounds an awful lot like garbage collection to me.

   No... GC would involve verifying (at tremendous cost) that no more
   occurrences of a given identification number exist anywhere.  This
   protocol specifies that identification numbers EXPIRE AUTOMATICALLY 
   unless certain conditions are met.  GC is restricted to a particular
   machine; this protocol will function even in a distributed environment.
   
   Usually, the semantics of the object will allow even better controls,
   such as a fixed expiration date, but this protocol covers the worst case.


   Bill Wolfe, wtwolfe@hubcap.clemson.edu
 

scc@cl.cam.ac.uk (Stephen Crawley) (09/20/89)

Bill Wolf (?) writes:
>>> It's safe to [reclaim a dynamically allocated ADT]:
>>>  1) When the variable goes out of scope
>>>  2) When the programmer specifies that it's to be destroyed sooner.

Anton Rang replies:
>>  If there's any explicit pointer manipulation, these conditions
>>  aren't sufficient, because an object can be pointed to in more than
>>  one place.

In addition:
1) doesn't work if a given scope doesn't exit before you've run
out of space OR if the value in a variable is passed out of the scope.
2) doesn't work unless your programmer writes perfect code.

Bill Wolf replies:
>   True, but the basic thrust of the position is that the programmer
>   should generally not use pointers explicitly; rather, the use of 
>   pointers should be made safe via encapsulation within a secure ADT.
   
Anton is absolutely right.  It makes no difference whether references
to an ADT are implicit or explicit.  The ADT cannot know whether or not
the application that uses it still holds a reference to a given instance.
In all but the simplest cases the application must take a hand in the ADT's
storage management, either by invoking the ADT's deallocator explicitly
or by telling the ADT when to increment/decrement a reference count.
 

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/20/89)

From scc@cl.cam.ac.uk (Stephen Crawley):
>>>> It's safe to [reclaim a dynamically allocated ADT]:
>>>>  1) When the variable goes out of scope
>>>>  2) When the programmer specifies that it's to be destroyed sooner.
>>   [...] the basic thrust of the position is that the programmer
>>   should generally not use pointers explicitly; rather, the use of 
>>   pointers should be made safe via encapsulation within a secure ADT.
>    
> It makes no difference whether references to an ADT are implicit 
> or explicit.  The ADT cannot know whether or not the application 
> that uses it still holds a reference to a given instance.

    The pointers are INTERNAL to the ADT, and the user will never
    see them.  When the ADT goes out of scope, it dies just like 
    any predefined type would.  We are considering here an ADT which 
    is declared at the beginning of a (procedure, function, block).
    This ADT will do dynamic allocation INTERNALLY so as to facilitate
    its own expansion and contraction as necessary.  Upon block exit, 
    the ADT's destroy procedure will be invoked, and all the space
    occupied by the ADT will be released.

    
    Bill Wolfe, wtwolfe@hubcap.clemson.edu

scc@cl.cam.ac.uk (Stephen Crawley) (09/21/89)

From Bill Wolfe:
>>>>> It's safe to [reclaim a dynamically allocated ADT]:
>>>>>  1) When the variable goes out of scope
>>>>>  2) When the programmer specifies that it's to be destroyed sooner.
>>>   [...] the basic thrust of the position is that the programmer
>>>   should generally not use pointers explicitly; rather, the use of 
>>>   pointers should be made safe via encapsulation within a secure ADT.
>>    
>> It makes no difference whether references to an ADT are implicit 
>> or explicit.  The ADT cannot know whether or not the application 
>> that uses it still holds a reference to a given instance.
>
> The pointers are INTERNAL to the ADT, and the user will never
> see them.  When the ADT goes out of scope, it dies just like 
> any predefined type would.  We are considering here an ADT which 
> is declared at the beginning of a (procedure, function, block).
> This ADT will do dynamic allocation INTERNALLY so as to facilitate
> its own expansion and contraction as necessary.  Upon block exit, 
> the ADT's destroy procedure will be invoked, and all the space
> occupied by the ADT will be released.


OK ... in the trivial case your scheme does work.  The conditions
for it to work include:

a)	NO references to the ADT are passed out of scope.

b)	the scope exits before you run short of space.

c)	the programmer (i.e. the application!) does not incorrectly tell 
	the ADT to release an instance too soon.

If any one of a), b) or c) fails to hold, the program dies.  Condition a)
can be shown to be true, provided that you can trace all of the side-effects
that occur within the scope and PROVE that they do not violate the
condition.  Condition b) is hard to reason about formally, except to the
extent that it is possible to place an upper bound on the amount of
memory allocated within the scope.  Condition c) is virtually impossible
to reason about ... unless you do something drastic like forbidding
all side-effects.

There are other problems with your scheme:

  o  Point 2) is in fact the application taking a hand in an ADT's
     storage management ... something that Bill has been saying is
     unnecessary.  [If you leave out 2), then your scheme is equivalent
     to stack-based storage management with an extra stack!]

  o  GC'ed memory management can reclaim junk (heap nodes that are no longer
     needed) whenever it runs out of space, whereas a scope based scheme
     can in general only identify and hence reclaim junk on scope exit.
     In other words, scope based reclamation of memory will in general 
     need more memory.

Finally, don't forget the case where condition a) is not true; i.e.
some of the dynamically allocated ADT instances are passed out of scope
either explicitly (as a result) or implicitly (by side-effect).

----
  
This whole GC versus non-GC debate seems to hinge on the following trade-off.
Do we want programs that

  are slower (by say 25%), but are intrinsically free of a large class of 
  storage management errors.
  
or

  are usually faster (not always), but may use more storage, may have 
  potential storage leaks and other storage management bugs, and which 
  arguably take longer to develop.
  
----

I would recommend that people read Chapter 16 (Memory Management)
of Bertrand Meyer's book "Object Oriented Software Construction" for a
calm and rational analysis of the pros and cons of various methods of
storage management.

scc@cl.cam.ac.uk (Stephen Crawley) (09/21/89)

[Sorry about the Re:^n's ... bugs in my news reader]

Bill Wolfe proposes a scheme for getting around the need for garbage 
collection by storing objects in a database, using object expiration 
times to avoid proper garbage collection.  Unfortunately this scheme
doesn't achieve its aim.

Bill says: 

  "Where objects can have an infinite lifetime, a protocol can be devised 
   whereby the database must be checked at least once per some arbitrary 
   time period by all users, which will permit identification numbers 
   to be recycled after one full time period of nonexistence."

Exactly.  In order to check each object that it still refers to, each 
"user" (actually application) must traverse the graph of its objects in 
the database.  This process is exactly equivalent to the mark phase of 
a garbage collection, except that the application is doing it!!!

One other point that Bill Wolfe misses is that the CPU overheads of 
managing objects in this way can be orders of magnitude greater
than the cost of using ordinary pointers.

Don't get me wrong, I've been doing research in this particular area for 
a number of years, and I strongly believe that object databases are the 
way of the future.  But they ain't cheap, and they ain't a substitute
for garbage collection.

-- Steve

scc@cl.cam.ac.uk (Stephen Crawley) (09/21/89)

Bill Wolfe writes:

> OK, first let me clarify my position.  The primary use for pointers
> is simply to implement abstract data types -- lists, trees, graphs, etc.;
> by shielding the user from the details of implementation, we can keep
> the user from having to worry about manipulating pointers.

Frankly, I can't see how this is possible.  Suppose I have an ADT 'A'
that has a generator operation NEW and a destructor function DISPOSE.
Suppose I want a generic ADT LIST[t: type] that can be used in the
following way.

a: A
aa: A

  while true do
    a := A$NEW(...)
    aa := A$NEW(...)
    
    l: LIST[A] := LIST[A]$MAKE_EMPTY()
  
    LIST[A]$INSERT(l, a)
    LIST[A]$INSERT(l, A$NEW(...))
    LIST[A]$INSERT(l, a)
    
    aa := LIST[A]$REMOVE_HEAD(l)
    
    etc
    
  end while
  
  
The generation of instances of A must be under the control of the 
application and assignment of A must be handled.  I must be able to
put an A instance into a list any number of times.

How do I implement the LIST ADT so that it invokes the A$DISPOSE
operation at the right time?  When is the right time?  What happens
if I want to put an instance of A into 2 LIST's at the same time?

djones@megatest.UUCP (Dave Jones) (09/22/89)

A very wise man once said, "Floating point representation is a mistake.
If you don't know enough about the problem to scale the quantities, you
don't know enough about the problem."

May he rest in peace. He was martyred by an angry horde of Fortran
application programmers.

If he had lived, today he might say, "Garbage collection is a mistake.
If you don't know enough about the problem to free the unreferenced
memory, you don't know enough about the problem."

If I had lived, I probably would agree with him.

I have some library packages, oops -- I meant to say "ADT's" --
which help.  I can declare Stacks, which can be marked.  Later you can
deallocate all the memory above the mark in one very fast pop. That
covers many many situations. I also have Heaps which contain blocks
all of one size. Not only is it quicker than your typical malloc(),
sometimes by a factor of several hundred, but if you are through with
all the records in a Heap, just destroy the entire Heap. Both of these,
like all my librar.. er ADT's, grow in size indefinitely, as required.
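
For flavor, a sketch of the mark/pop idea (illustrative code only; my
real package is more careful -- growth and bounds checking are omitted
here):

    #include <cstddef>

    // A markable stack allocator: allocation bumps a pointer, and
    // popping to a mark frees everything above it in one step.
    class Stack {
    public:
        explicit Stack(std::size_t bytes) : base(new char[bytes]), top(0) {}
        ~Stack() { delete [] base; }

        void* alloc(std::size_t n) { void* p = base + top; top += n; return p; }
        std::size_t mark() const   { return top; }
        void pop_to(std::size_t m) { top = m; }    // one very fast pop
    private:
        char* base;
        std::size_t top;
    };

    void demo() {
        Stack s(1 << 16);
        std::size_t m = s.mark();
        s.alloc(128); s.alloc(64);  // many allocations ...
        s.pop_to(m);                // ... all reclaimed in one step
    }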

tynor@prism.gatech.EDU (Steve Tynor) (09/22/89)

In article <8222@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:
...
>If he had lived, today he might say, "Garbage collection is a mistake.
>If you don't know enough about the problem to free the unreferenced
>memory, you don't know enough about the problem."
>
>If I had lived, I probably would agree with him.
...
>all the records in a Heap, just destroy the entire Heap. Both of these,
>like all my librar.. er ADT's, grow in size indefinitely, as required.
                                ^^^^^^^^^^^^^^^^^^^^^^^^^
I assume this was intended to be followed by a ":-)". 

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
No problem is so formidable that you can't just walk away from it.
                     
    Steve Tynor
    Georgia Tech Research Institute
    Artificial Intelligence Branch
    tynor@prism.gatech.edu

giuliano@ashtate (Giuliano Carlini) (09/23/89)

From article <6519@hubcap.clemson.edu>, by billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ):
> 
>                                                      Upon block exit, 
>     the ADT's destroy procedure will be invoked, and all the space
>     occupied by the ADT will be released.
> 
>     
>     Bill Wolfe, wtwolfe@hubcap.clemson.edu

This can't work.  What if the ADT is returned as the result of the
block?  In order to handle this you need either reference counting or
garbage collection.

-- 
giuliano
Net:  ...!uunet!ashtate!giuliano
Home: 1322 27th San Pedro, CA 90731 (213)538-6554
Work: Ashton-Tate, 20101 Hamilton Ave., Torrance CA 90502-1319 (213)538-7683

ds@hollin.prime.com (09/24/89)

I believe you are both correct.

Most created objects can be managed with explicit creation/destruction code,
placed in abstract data type specifications or otherwise part of the
application, because their lifetimes are naturally connected to the algorithms
(or messages) that use them.  Sometimes, however, dynamic reclamation (garbage
collection) is truly necessary (to avoid unnaturalness or inefficiency).  A
useful parallel is the need for both stack and heap data storage in
general-purpose programming languages.

When an application can use explicit creation and destruction for most of its
object management, the garbage collection overhead is reduced, and this has
all sorts of good consequences.

David Spector
Prime Computer, Inc.
ds@primerd.prime.com  (until my layoff, expected soon)

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/24/89)

From scc@cl.cam.ac.uk (Stephen Crawley):
> Bill Wolfe proposes a scheme for getting around the need for garbage 
> collection by storing objects in a database, using object expiration 
> times to avoid proper garbage collection.  Unfortunately this scheme
> doesn't achieve its aim. 
>
>   "Where objects can have an infinite lifetime, a protocol can be devised 
>    whereby the database must be checked at least once per some arbitrary 
>    time period by all users, which will permit identification numbers 
>    to be recycled after one full time period of nonexistence."
>
> Exactly.  In order to check each object that it still refers to, each 
> "user" (actually application) must traverse the graph of its objects in 
> the database.  This process is exactly equivalent to the mark phase of 
> a garbage collection, except that the application is doing it!!!

   Not exactly.  To start with, the user does not have to do any work
   at all if the identification number's expiration period is also used
   as the period over which the user will lose interest in the object if
   it is not further utilized.  In this case, the identification number
   will be allowed to expire and the user need only throw it away.

   If the user does make use of the number within the expiration period,
   then the number is automatically revalidated for another period unless
   the database replies that the object has been destroyed, in which case
   the identification number expires as well.

   In the worst case, the user will have to do a database access just to
   verify that the object still exists and that the identification number
   remains valid for another expiration period.  Since many objects have
   a finite lifetime, this burden is borne only by those users who make
   use of infinite-lifetime objects.  The task of maintaining the validity
   of the identification number can be done automatically if desired using
   a suitably designed identification-number abstraction which carries a 
   user-settable automatic revalidation on-off switch.  

   There are no reference counts kept at the database; the database only
   maintains the object, the associated identification number, and any
   authorization protocols which may be required of users who wish to
   create, read, modify, or destroy.  The expiration period protocol
   allows identification numbers to be distributed without limitation
   (to systems which are unknown to the database, possibly not even
   reachable).  It is fault-tolerant, allowing users to die with their
   identification numbers intact without interfering with the destruction
   of the associated object.  If the user wakes up, the user can either
   continue to utilize the identification number (if it has not expired),
   or throw it away due to its expiration. 
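
   A sketch of the holder-side protocol (interfaces are hypothetical;
   the database query is a stand-in):

       #include <ctime>

       struct ObjectId {
           unsigned long serial;
           std::time_t   expires;
       };

       // Stand-in for the (possibly remote) database existence check.
       static bool db_object_exists(unsigned long) { return true; }

       const long PERIOD = 60 * 60 * 24;   // one expiration period (seconds)

       // Returns false when the number has lapsed or the object was
       // destroyed; otherwise extends validity for one more period.
       bool revalidate(ObjectId& id) {
           if (std::time(0) >= id.expires) return false;   // expired
           if (!db_object_exists(id.serial)) return false; // destroyed
           id.expires = std::time(0) + PERIOD;             // revalidated
           return true;
       }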

> One other point that Bill Wolfe misses is that the CPU overheads of 
> managing objects in this way can be orders of magnitude greater
> than the cost of using ordinary pointers.

   Where the situation permits (as it usually does), pointers which are
   encapsulated within ADT implementations will give better performance.
 
   We are managing objects in this way because they present worst-case
   properties which do not permit us to use more efficient techniques.

 
   Bill Wolfe, wtwolfe@hubcap.clemson.edu

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/24/89)

From ds@hollin.prime.com:
> Sometimes, [...] dynamic reclamation (garbage collection) is 
> truly necessary (to avoid unnaturalness or inefficiency).  A
> useful parallel is the need for both stack and heap data storage in
> general-purpose programming languages.

    I think we've established that managed and unmanaged storage
    paradigms can coexist, and that components which manage their
    own storage can avoid the inefficiencies of garbage collection.

    We also know that the user MUST be involved in storage management,
    if for no other reason than to decide which data structures to
    throw away in the event of a storage crisis.  

    What I'd like to know is: Is there a hard counterexample which 
    will demonstrate that situations exist in which the inefficiencies
    of garbage collection cannot be avoided?  Someone has already outlined
    the properties that such a counterexample should have, and mentioned
    a "gut feeling" that such a counterexample exists.  Unfortunately,
    I have a similar "gut feeling" that GC is never worth the cost. 


    Bill Wolfe, wtwolfe@hubcap.clemson.edu

scc@cl.cam.ac.uk (Stephen Crawley) (09/25/89)

I wrote:
>> Bill Wolfe proposes a scheme for getting around the need for garbage 
>> collection by storing objects in a database, using object expiration 
>> times to avoid proper garbage collection.  Unfortunately this scheme
>> doesn't achieve its aim. 
>>
>>   "Where objects can have an infinite lifetime, a protocol can be devised 
>>    whereby the database must be checked at least once per some arbitrary 
>>    time period by all users, which will permit identification numbers 
>>    to be recycled after one full time period of nonexistence."
>>
>> Exactly.  In order to check each object that it still refers to, each 
>> "user" (actually application) must traverse the graph of its objects in 
>> the database.  This process is exactly equivalent to the mark phase of 
>> a garbage collection, except that the application is doing it!!!

Bill Wolfe replies
>  Not exactly.  To start with, the user does not have to do any work
>  at all if the identification number's expiration period is also used
>  as the period over which the user will lose interest in the object if
>  it is not further utilized.  In this case, the identification number
>  will be allowed to expire and the user need only throw it away.

In other words the objects don't have a long lifetime.  This is a red 
herring!!  The case we were discussing is the one where the objects DO 
have long lifetimes.

Bill continues:
>  If the user does make use of the number within the expiration period,
>  then the number is automatically revalidated for another period ...

Granted.  But unless there is a GUARANTEE that EVERY object of interest 
is touched periodically by the application, there is a risk that objects
of interest to the user will be destroyed prematurely.  

> ... unless the database replies that the object has been destroyed, 
> in which case the identification number expires as well.

From the user's point of view, this is unacceptable behaviour by the
application / database cabal.  

Bill continues:
>  In the worst case, the user will have to do a database access just to
>  verify that the object still exists and that the identification number
>  remains valid for another expiration period.  Since many objects have
>  a finite lifetime, this burden is borne only by those users who make
>  use of infinite-lifetime objects.

Not true.  There is no way for an application to intuit the lifetime of a 
given set of database entries.  If it makes a guess and is wrong, it has 
committed the sin of destroying user data.  We can't expect the user to 
specify an accurate lifetime either.  The user can't accurately predict the 
future any more than an application can.  It would be totally unreasonable to 
then come back to a user and say "21 of your objects expired and were 
deleted.  Oh, you didn't want that?  Well tough."  

In this day and age, users are more and more naive, and systems must 
be correspondingly forgiving.  Therefore, the burden of storage management
must be borne by ALL applications that deal with POTENTIALLY long lived
data items.  In practice this means ALL applications that store data in
the database.

Bill continues:
> The task of maintaining the validity of the identification number can 
> be done automatically if desired using a suitably designed identification-
> number abstraction which carries a user-settable automatic revalidation 
> on-off switch.

I'm not sure, but this sounds like Bill is finally admitting that the 
application needs to include a GC style marker.  

However ... user-settable object refreshing is like giving a hand grenade 
to a two year old.  I can just imagine it now:

    User: 	I came in this morning and half of my PC's database
  		has disappeared.
    Support:	Hmmm ... why is object refreshing turned off???
    User:	Joe told me that turning off object refreshing would
  		make the database go faster.

Bill continues to describe an object access control scheme, which is
pretty much beside the point ...

I wrote:
>> One other point that Bill Wolfe misses is that the CPU overheads of 
>> managing objects in this way can be orders of magnitude greater
>> than the cost of using ordinary pointers.

Bill replies:
> We are managing objects in this way because they present worst-case
> properties which do not permit us to use more efficient techniques.

What about garbage collection???  

Surely Bill's scheme with object numbers and expiration times and (as 
Bill finally admits) object refreshing in application code is MUCH MORE
EXPENSIVE than an efficient garbage collector!  Bear in mind that a 
previous article (sorry no ref.) claimed measurements of overheads of 
only 3-5% for a modern garbage collector.

-- Steve

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/27/89)

From scc@cl.cam.ac.uk (Stephen Crawley):
>> ... unless the database replies that the object has been destroyed, 
>> in which case the identification number expires as well.
> 
> From the user's point of view, this is unacceptable behaviour by the
> application / database cabal.  

    The model involves a database which stores objects which can be
    created, destroyed, read, and written by various users.  The users
    might be partitioned into different groups (e.g., those authorized
    to destroy vs. those who cannot).  Now from the perspective of a
    user who can read and write but not destroy, the object's lifetime
    is potentially infinite, depending upon the whims of those who have
    the power to destroy.  By revalidating the identification number,
    such users receive continued assurances that the object still exists.

    If the object is destroyed, the identification numbers will then all
    expire automatically, regardless of how widely they were distributed.
 
> Bill replies:
>> We are managing objects in this way because they present worst-case
>> properties which do not permit us to use more efficient techniques.
> 
> What about garbage collection???  

    Won't work under the conditions I described (distributed environment).


    Bill Wolfe, wtwolfe@hubcap.clemson.edu

johnson@p.cs.uiuc.edu (09/27/89)

> by Bill Wolfe

>    I think we've established that managed and unmanaged storage
>    paradigms can coexist, and that components which manage their
>    own storage can avoid the inefficiencies of garbage collection.

In fact, I don't believe that managed and unmanaged storage paradigms
should coexist in one program.  You should either use components that
use automatic garbage collection or components that do not.  While it
is possible to link C code to Lisp or Smalltalk, it is always tricky
and error prone.

>    We also know that the user MUST be involved in storage management,
>    if for no other reason than to decide which data structures to
>    throw away in the event of a storage crisis.  

Automatic garbage collection prevents storage crises.  The system I
use generates an exception if it runs out of memory in spite of a
garbage collection, but I wouldn't dream of trying to handle that
automatically.  Moreover, it only happens during debugging, and
usually means an infinite loop.

Bill Wolfe keeps making statements to the effect that garbage collection
is expensive.  That is totally false.  Garbage collection is cheap.
Anyone who is worried by 5% should be considering assembly language.
Garbage collection is cheap and is going to be cheaper.  For example,
there has been little work on using optimizing compilers to reduce
the cost of garbage collection.  I bet that a good compiler can make
automatic garbage collection cheaper than doing it yourself.

The problem with garbage collection is that the efficient algorithms are
not real-time.  There is work being done in this area, and perhaps we
will soon find a way to make real-time programming compatible with
automatic memory management, but I don't think it is there yet.
However, the problem with garbage collection is NOT efficiency when
you measure the cost of garbage collection over the life of a program.

Ralph Johnson

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/27/89)

From johnson@p.cs.uiuc.edu:
> In fact, I don't believe that managed and unmanaged storage paradigms
> should coexist in one program.  You should either use components that
> use automatic garbage collection or components that do not.  While it
> is possible to link C code to Lisp or Smalltalk, it is always tricky
> and error prone.

    Since Ada components can coexist with no difficulty, this is
    not a true statement in general.

> Automatic garbage collection prevents storage crises.  The system I
> use generates an exception if it runs out of memory in spite of a
> garbage collection, but I wouldn't dream of trying to handle that
> automatically.  

   Those who write embedded systems would tend to disagree.
 
> [The cost is only 5%]

   At what cost in terms of extra space requirements?

> [Real-time GC is coming, Real Soon Now]

   When it gets there, let me know.


   Bill Wolfe, wtwolfe@hubcap.clemson.edu

scc@cl.cam.ac.uk (Stephen Crawley) (09/27/89)

From wtwolfe@hubcap.clemson.edu:
> From johnson@p.cs.uiuc.edu:
>> In fact, I don't believe that managed and unmanaged storage paradigms
>> should coexist in one program.  You should either use components that
>> use automatic garbage collection or components that do not.  While it
>> is possible to link C code to Lisp or Smalltalk, it is always tricky
>> and error prone.
>
> Since Ada components can coexist with no difficulty, this is
> not a true statement in general.

Suppose we garbage collect an object that contains a pointer to a 
non-GC'able object X.  The garbage collector cannot reclaim X and
it cannot tell the ADT for X that the pointer has been blown away.
In general the object X is likely to be lost to a storage leak.  

The only possible escape is if the garbage collection mechanism includes 
some scheme for applying a "destructor" to an object that is about to
be reclaimed by the GC.  I have not come across such a scheme described
in the Ada RM. 

-- Steve