[comp.lang.ada] Garbage Collection

g-rh@cca.CCA.COM (Richard Harter) (03/31/88)

In article <628@ra.rice.edu> boehm@titan.rice.edu (Hans Boehm) writes:

>  My experience is that any attempt at manipulation of interesting linked
>structures in a language that doesn't support real automatic storage
>management will either fail, or waste large amounts of debugging time.
>(My experience includes a (working) 40,000 line compiler, written in C, that
>manipulates a reference counted syntax "tree", or more accurately, a
>reference counted syntax graph.)  Normally, it is extremely difficult
>to track down bugs created by premature deallocation.  When such bugs are
>finally removed, the resulting programs normally include a substantial
>number of storage leaks.
>  Some recent experiments by Mark Weiser and myself with retrofitting a
>garbage collector to existing C programs, verify the latter point.
>(The garbage collector should never free anything since that was the
>programmers responsibility.  It does.  In other people's code.
>Our paper on this should appear in Software P&E shortly.)
>Mike Caplinger reported similar results for another application at the last
>USENIX conference, I believe.  We have resurrected C code with storage
>management bugs by linking it against a garbage collector (which in the case
>of C doesn't always work either, but it has a better track record than manual
>storage management).

	There is problably something I am missing, but I don't quite see
how you can implement garbage collection in C without collateral assumptions
-- how is the garbage collector supposed to know that that a block is free?

	Your main point is certainly true - manual allocation and deallocation
with the programmer being responsible for ensuring that all comes out right
really doesn't work very well.  This was a critical issue for us -- our
software has to run on systems which do not have pseudo-infinite memory
and it sometimes has to run for very long runs.  We can't afford memory
leaks (or premature deallocation).  What we did was to write our own
storage allocation package.  Salient features:

The allocation routine takes two arguments, a size and an ID.  The latter
is a unique integer specifying that particular call.  (We manage the ID list
manually.)  The allocator creates a node for each request block.  In this
node are sundry links, the ID, and (in effect) a date stamp.  There is also
a hash table that contains the address of every allocated block.  All allocator
control information is stored in a separate space from allocated memory itself.
(This eliminates a large class of mystery bugs associated with overwriting
allocator control data.)

The deallocation routine takes one argument, the address of the block being
returned.  The deallocation routine validates the address as being an
allocated address.  This eliminates another class of bugs caused by passing
an improper or stale address to the deallocation routine.

There is a facility for printing out a complete map of allocated memory
including call ID's and date stamps.  We use this, from time to time, to
check the code for storage leaks.

This scheme is not as good as automatic garbage collection, but we have
found it to be satisfactory.  
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

boehm@titan.rice.edu (Hans Boehm) (04/01/88)

In reference to g-rh@cca.CCA.COM (Richard Harter)'s question:
>        There is problably something I am missing, but I don't quite see
>how you can implement garbage collection in C without collateral assumptions
>-- how is the garbage collector supposed to know that that a block is free?

  This is usually, but not always, possible.  The idea is to use a traditional
non-compacting mark-sweep collector.  On a UN*X system, we start by scanning
the registers, stack, data, and (statically allocated) bss segments for
any addresses of valid allocated objects.  We subsequently scan allocated
objects we find in the same way.  For a typical C program, all reachable
objects can be found in this manner.  (Storing exclusive or'ed pointer
values, tagged pointers, and similiar programming tricks, as well as some
rarely used (for C) compiler optimization techniques may break this scheme.)
  In theory, we may end up marking a few unreachable objects as reachable
as well, since integers may be misinterpreted as pointers.  The collector
can be designed so that the only result of this is excess storage retention.
(This does require that we don't compact.)  In my experience, I have never
seen any indication of such unnecessary retention in practice.  There is
an argument to be made that no garbage collector is completely immune
from this phenomenon.
  The fact that the collector has to perform a fairly complicated (but still
O(1)) check for pointer validity does slow it down a little.  On a Sun 3/260,
with a 2 Meg heap, half of which is in use, we normally see collection times of
about 3 seconds.  (This assumes longword aligned pointers and small data and
static bss areas.)  All of this is largely irrelevant if the
collector is used as a debugging tool to detect storage leaks.
  More details can be found in Mark Weiser's and my upcoming Software P&E
article entitled "Garbage Collection in an Uncooperative Environment".

Hans-J. Boehm
boehm@rice.edu

g-rh@cca.CCA.COM (Richard Harter) (04/01/88)

In article <632@ra.rice.edu> boehm@titan.rice.edu (Hans Boehm) writes:
>>        There is problably something I am missing, but I don't quite see
>>how you can implement garbage collection in C without collateral assumptions
>>-- how is the garbage collector supposed to know that that a block is free?

>  This is usually, but not always, possible.  The idea is to use a traditional
>non-compacting mark-sweep collector.  On a UN*X system, we start by scanning
>the registers, stack, data, and (statically allocated) bss segments for
>any addresses of valid allocated objects.  We subsequently scan allocated
>objects we find in the same way.  For a typical C program, all reachable
>objects can be found in this manner.  (Storing exclusive or'ed pointer
>values, tagged pointers, and similiar programming tricks, as well as some
>rarely used (for C) compiler optimization techniques may break this scheme.)

I see.  My block was in not thinking of the entire address space of the
program as accessible.  For our purposes it is not (we live in portability
land and are not allowed to do things like that :-)).  That is one reason
we did our own allocator (in C) -- it allows us to track down memory problems
portably.

I still don't see how a garbage collector helps with premature deallocation,
unless you scrap deallocation entirely, and rely entirely on garbage
collection.

The problem with garbage collectors in the old days was that they did
garbage collection when the available space was allocated.  This works
well enough until the actual amount of space allocated starts to approach
the amount of available space, and then the garbage collector starts
thrashing.  Is this a problem?  If not, how do you deal with it?
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

boehm@titan.rice.edu (Hans Boehm) (04/02/88)

  Richard Harter writes:
>I still don't see how a garbage collector helps with premature deallocation,
>unless you scrap deallocation entirely, and rely entirely on garbage
>collection.

>The problem with garbage collectors in the old days was that they did
>garbage collection when the available space was allocated.  This works
>well enough until the actual amount of space allocated starts to approach
>the amount of available space, and then the garbage collector starts
>thrashing.  Is this a problem?  If not, how do you deal with it?

  Our current version of the collector doesn't help you with
premature deallocation.  Further instrumenting it might help.  For example,
you could use the marking algorithm to determine whether "free"d objects
really were inaccessible.  (This requires a coding style in which
unused references are explicitly cleared.  Replacing "free" by a
macro that clears its argument takes care of a lot of that and,
by itself, occasionally helps track down bugs.)  I don't have much experience
with this though.
  In non-real-time applications, we tend to use explicit deallocation where
it's easy to do, and let the collector take care of the error-prone cases.
  You are right in that the garbage collector code is not completely portable.
It should really be viewed as part of the run-time library, which is rarely
completely portable anyway.  Actually porting it isn't terribly hard.  There
is only about a screenful of machine dependent code in the collector.
The longest part of that deals with marking from the machine registers.
  Frequent collections in small address spaces can become a problem.
We operate in a virtual memory environment, so we deal with it by expanding
the heap size whenever a collection fails to return enough memory.  This
requires allocating a bit of extra memory.  For debugging, this hardly
matters.  (Memory is supposedly cheap.) In a production environment,
especially without virtual memory, it could be a problem.  On the other hand,
for some existing code, this can be much less significant than memory
otherwise lost to leaks.

Hans-J. Boehm  (boehm@rice.edu)

nevin1@ihlpf.ATT.COM (00704a-Liber) (04/05/88)

[followups to comp.lang.misc]

In article <26358@cca.CCA.COM> g-rh@CCA.CCA.COM.UUCP (Richard Harter) writes:
>	There is problably something I am missing, but I don't quite see
>how you can implement garbage collection in C without collateral assumptions
>-- how is the garbage collector supposed to know that that a block is free?

The only way to do this is to allocate a giant heap with malloc()
and do *all* the memory management on your own.  It is no different
than implementing in C or C++ a language that does automatic garbage
collection.


The problem I found with languages that do automatic garbage collection is
that there is no way for me to modify the implementation of the class (or
the type, for those of you unfamiliar with C++ lingo :-)) to make it better
suit my needs.

For applications where the programmer cost is greater than the run-time
cost (such as prototypes, personal tools, one-shot programs), I tend to use
languages with built in memory management.  For other applications, however,
I would rather use a language like C++ which allows me to change the
implementation of a class without affecting the rest of my code.
-- 
 _ __			NEVIN J. LIBER	..!ihnp4!ihlpf!nevin1	(312) 510-6194
' )  )				"The secret compartment of my ring I fill
 /  / _ , __o  ____		 with an Underdog super-energy pill."
/  (_</_\/ <__/ / <_	These are solely MY opinions, not AT&T's, blah blah blah

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/12/88)

From article <124000026@inmet>, by ron@inmet.UUCP:
> 
> Garbage collection certainly does not come for free, but it is extremely
> useful.  It frees the programmer from the need to repeatedly write
> sophisticated ADT deallocate routines and to deal with the potentially
> massive book-keeping requirements for determining whether an object is in
> fact garbage.  In general, lack of GC forces a programmer to invest a lot of
> effort addressing issues that are not fundamentally related to the problem
> domain.  My experience has also shown that problems of "slow heap leakage"
> are among the hardest errors to correct.

   "Repeatedly" write sophisticated ADT deallocate routines??  The basic 
   idea is to write an ADT once and use it forever (i.e., until the next 
   language modification :-) ); where does "repeatedly" come in?

   Dealing with the details of allocation requires as much effort as  
   dealing with deallocation.  Perhaps you would say that any maintenance 
   of the structure used to represent your ADT is not fundamentally
   related to the problem domain.  If so, I will have to disagree.

   ADTs, and computer programs in general, are characterized by both
   time complexity and space complexity.  Making the tradeoffs between
   these forms of complexity is one of the most fundamental questions
   a programmer must deal with.  This is the topic of much, if not most, 
   of the literature regarding ADTs.

   Also, I disagree with the claim that a lot of effort is involved;
   in my experience, DESTROY routines are among the easiest to code.
   Usually, the ADT is defined recursively, lending itself to a very
   easy recursive destruction procedure; after that, just destroy
   any sub-ADTs in your descriptor, and from there it only takes one
   line of code to blow away the descriptor.  About the hardest part
   of the whole process is the tedium of looking through to see what
   fields are user-defined sub-ADTs, since they have to be explicitly
   destroyed due to Ada's lack of integration between UNCHECKED_DEALLOCATION
   and user-defined DESTROY procedures.


                                          Bill Wolfe

                                   wtwolfe@hubcap.clemson.edu

gateley@m2.csc.ti.com (John Gateley) (12/12/88)

In article <3832@hubcap.UUCP> wtwolfe@hubcap.clemson.edu writes:
>From article <124000026@inmet>, by ron@inmet.UUCP:
>> 
>> Garbage collection certainly does not come for free, but it is extremely
>> useful.  It frees the programmer from the need to repeatedly write
>> sophisticated ADT deallocate routines and to deal with the potentially
>> [...]
>   "Repeatedly" write sophisticated ADT deallocate routines?? [...]
>    where does "repeatedly" come in?

The repeatedly comes in because you write more than one ADT, each requiring
a/many deallocation routines.

>   Dealing with the details of allocation requires as much effort as  
>   dealing with deallocation. [...]

This is not true. When you allocate an object, all you need is "free"
memory. The allocator does not have to worry about whether or not the
object is in use etc. Deallocation, when done correctly, requires some
sort of checking to make sure no currently in use object refers to the
object being deallocated. This is a much more complicated problem.

>   [...]
>   Also, I disagree with the claim that a lot of effort is involved;
>   in my experience, DESTROY routines are among the easiest to code.
>   Usually, the ADT is defined recursively, lending itself to a very
>   easy recursive destruction procedure; [...]

This method is fine, as long as no structures are shared. But when you
share structure, then the problem is more complex. A sub adt-object
can be used in more than one object. Deleting an object which contains
this sub-object may or may not require deleteing the sub object. One
answer is "reference counting", adding a field to each object which says
how many people refer to it. Automating this gives reference count
garbage collection, which is slower than the GC techniques in current
use. So if your programming style never uses shared structure, then you
don't need GC. If it does, then GC is a valuable time saving tool.

>                                          Bill Wolfe
>                                   wtwolfe@hubcap.clemson.edu

John Gateley
gateley@tilde.csc.ti.com

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/13/88)

From article <65475@ti-csl.CSNET>, by gateley@m2.csc.ti.com (John Gateley):
>>   Also, I disagree with the claim that a lot of effort is involved;
>>   in my experience, DESTROY routines are among the easiest to code.
>>   Usually, the ADT is defined recursively, lending itself to a very
>>   easy recursive destruction procedure; [...]
> 
> This method is fine, as long as no structures are shared. [...] 
> [...] if your programming style never uses shared structure, then you
> don't need GC. 

    Precisely.  

    Structural sharing is nothing more than a euphemism for hacking.  
    It is the spatial equivalent of what hackers enjoy doing with time 
    in the name of efficiency, in their unmitigated zeal to violate every
    form of abstraction, to throw all traces of readability and reliability 
    to the winds.  

    If GC were prohibited, it would infuriate all those who enjoy hacking
    with space as opposed to time.  Good.  Let them use C.  Ada is the
    language of those who appreciate that "abstraction is the fundamental
    tool with which complexity can be effectively managed", and recognize
    that to violate an abstraction (rather than design another which is
    more appropriate to their needs) is to be penny wise but pound foolish.

    Folks, space management in Ada is NOT that difficult.  I had problems with
    space management in Pascal, but Ada-based ADTs have basically made it
    into a non-issue.  The real challenges which remain are problems which
    are inherent in the language itself, such as getting local ADTs to be 
    properly destroyed as an automatic consequence of a block exit.

    I agree that existing tools for detecting space-wasting ADTs are 
    inadequate, but let's call for better debugging tools rather than
    going about advocating the practice of spewing garbage. 


    
                                        Bill Wolfe

                                 wtwolfe@hubcap.clemson.edu
 

klaiber@june.cs.washington.edu (Alexander Klaiber) (12/13/88)

In article <3848@hubcap.UUCP> wtwolfe@hubcap.clemson.edu writes:
>From article <65475@ti-csl.CSNET>, by gateley@m2.csc.ti.com (John Gateley):
>> This method is fine, as long as no structures are shared. [...] 
>> [...] if your programming style never uses shared structure, then you
>> don't need GC. 
>    Precisely.  
>    Structural sharing is nothing more than a euphemism for hacking.  
>    It is the spatial equivalent of what hackers enjoy doing with time 
>    in the name of efficiency, in their unmitigated zeal to violate every
>    form of abstraction, to throw all traces of readability and reliability 
>    to the winds.  

Apart from the amount of cheap flames you've thrown in, do you think you
REALLY have thought this out?

Sure, structure-sharing *is* sometimes used with the goal of conserving
memory, but there are cases where it is vital; especially when you're doing
object-oriented like programming.

Assume I maintain mailing-lists of people and I represent the lists by
pointers to objects representing one person each; i.e.

type person is (some adt)
type mailinglist is new list(person);

Now if I have more than one mailing-list, obviously I have structure-sharing: 
a given person may appear on more than one list. And it is *NOT* very wise
to create one copy of each person for every mailing list that person appears
on: people can change the addresses and by having just *one* instance of
the person around, there is only *one* place where it needs to be changed
-- these changes will be completely transparent to the mailing lists or,
for that matter, *FOR ANY OTHER REFERENCES TO THE PERSON* that I might
later add to the system!

In a system such as this, I will need *some* form of keeping track just 
how many references to a given object (a person) exist, so I know when
I can safely deallocate it --- now one method is to have the programmer
take care of it, but that might require making the (mailinglist) and 
(person) know about each other, thus creating additional dependencies and
limiting further extensibility. The other method is to supply GC.

Now you *can* do without structure sharing by creating "deep copies" of
each reference to any given object --- but you will have to live with the
"updating problem" which is well-known in databases where multiple copies
of objects are kept.

The bottom line is that structure-sharing can be a valuable abstraction
tool, and one you apparently are not aware of.

>    If GC were prohibited, it would infuriate all those who enjoy hacking
>    with space as opposed to time.  Good.  Let them use C.  Ada is the

Why do you want to explicitly *prohibit* it?

>    Folks, space management in Ada is NOT that difficult.  

Depends on what kinds of applications you are writing.

Note that I do not ask that GC be *required* (obviously that would be rather
foolish in real-time applications), but it is sure nice to have it, since
it can greatly help to reduce the complexity of your programs by
eliminating all these storage-management chores and allowing you to
concentrate on the real problems.

It sure eliminates a *lot* of possibilities for errors and actually makes
programs much more reliable and readable. Maybe *you* are throwing
readability and reliability in the wind...



Alexander Klaiber
klaiber@june.cs.washington.edu


P.S.:
For complex applications as these, I tend to programm in Smalltalk
rather than in Ada, so I might be somewhat biased. But I *have* done
o-o programming in C++, and believe me, storage management can be
*nasty*!

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/14/88)

From article <6702@june.cs.washington.edu>, by klaiber@june.cs.washington.edu (Alexander Klaiber):
> Assume I maintain mailing-lists of people and I represent the lists by
> pointers to objects representing one person each; i.e.
> 
> type person is (some adt)
> type mailinglist is new list(person);
> 
> Now if I have more than one mailing-list, obviously I have structure-sharing: 
> a given person may appear on more than one list. 

    No, this isn't structural sharing, because you are explicitly manipulating 
    a list of pointers.  Structural sharing occurs when you have two objects, 
    A and B, and making some modification to B causes a modification to A as 
    well, because they share a portion of their structure.  The structure
    of a pointer is simply the address field, and the address fields are not
    being shared.  Now if we had two ADTs, both implemented by *hidden*
    pointers, and assignment failed to result in a deep copy, then THAT 
    would be structural sharing. 

    In this particular instance, it would be best to store the key by which
    a person could be identified (social security number, for example) in
    the list, and then using the key to retrieve the current address from
    the Person database.  Thus, a mailing list would be a list of social
    security numbers. 
   

                                            Bill Wolfe

                                    wtwolfe@hubcap.clemson.edu
 

sommar@enea.se (Erland Sommarskog) (12/14/88)

Bill Wolfe (wtwolfe@hubcap.clemson.edu) refuses to see the point with
garbage collection:
>   "Repeatedly" write sophisticated ADT deallocate routines??  The basic 
>   idea is to write an ADT once and use it forever (i.e., until the next 
>   language modification :-) ); where does "repeatedly" come in?

But you have to write the same boring code for every ADT you implement!
That was what ron@inmet was talking about.

>   Also, I disagree with the claim that a lot of effort is involved;
>   in my experience, DESTROY routines are among the easiest to code.
>   Usually, the ADT is defined recursively, lending itself to a very
>   easy recursive destruction procedure; after that, just destroy
>...

If it is just as easy as that. Let's say that you write a package for
double-linked lists. You provide a procedure for taking an element
out of a list. Should you free the memory allocated? You don't know
because the caller may or may not still keep an reference to the
element. He may be totally uninterested in it, but he may also want
to insert it in another list.  
  So you end up providing another procedure for deallocating the element,
which the user must to remember to call, unless he wants his memory
resources run low.
  And same goes for every ADT you define. You must provide a deallocation
routine that the user must remember to call. What worse is, the user  
may notice if he omits the call, because the implementation of the 
ADT is a static type, and the deallocation is void. And when he is
ready with his tests, you change the implementation, and his program
blows the memory.

Although you refuse to see it, garbage collection saves the programming
community from a lot of headache and true object-oriented programming
would be impossible without it.



-- 
Erland Sommarskog
ENEA Data, Stockholm
sommar@enea.se
"Frequently, unexpected errors are entirely unpredictable" - Digital Equipment

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/14/88)

From article <6702@june.cs.washington.edu>, by klaiber@june.cs.washington.edu (Alexander Klaiber):
> 
> Why do you want to explicitly *prohibit* it [garbage collection]?
> 

    Basically, I charge garbage collection with the same crime that
    the GOTO was charged with: its sole function is to facilitate
    UNDISCIPLINED programming.  From Tremblay and Sorenson, 1985:

       A common thought among proponents of the GOTO is: "I just
       might need it; something might come up".  The answer to
       this appears to be: "Nothing ever does".

    If this charge can be successfully prosecuted, then garbage
    collection will face the same penalty: the total elimination 
    of the facility, in favor of a more disciplined environment.



                                          Bill Wolfe

                                  wtwolfe@hubcap.clemson.edu

klaiber@june.cs.washington.edu (Alexander Klaiber) (12/14/88)

In article <8812131536.AA08238@galaxy.compass.com> worley@compass.UUCP (Dale Worley) writes:
>I will add that garbage collection is one of the greatest aids to
>readabilty and reliability, because it takes a complicated and
>error-prone part of programming (reclaiming storage) and eliminates it
>completely.  It probably shouldn't be required in Ada, but as an aid
>in programmability, it's great.

Thank you very much, exactly my point. At least I'm not alone :-)

==========================================================================
In article <3861@hubcap.UUCP> wtwolfe@hubcap.clemson.edu writes:
>    No, this isn't structural sharing, because you are explicitly manipulating 

Call it what you want, the fact is that multiple references exist to one
object, thus making explicit (i.e. by the programmer) storage management
a nightmare and an unnecessary chore. GC deals with this situation easily.

>    In this particular instance, it would be best to store the key by which
>    a person could be identified (social security number, for example) in
>    the list, and then using the key to retrieve the current address from
>    the Person database.  

That is, if you are willing to willing to pay the cost for the additional
data base lookup (usually an O(log n) operation), as well as the additional
complexity introduced in your program. 
I didn't say it couldn't be done without (what I call) structure sharing, 
I just wanted to point out that this would be a very intuitive approach 
and would probably lead to a very readable code.

The method of using key searches is, I believe, an unnecessary hack to
avoid multiple references. Why bother to include a full B-tree package
or such when we can do without? 

Clearly, there is a tradeoff between readability and efficiency. In my
version, I reduce efficiency by requiring garbage collection (although
there exist rather powerful and fast GC algorithms). Your proposal
would definitely increase the complexity of the program and, due to the
overhead of a DB-lookup, might actually run slower than the version with
GC --- depending on circumstances such as frequency of lookups in the 
database, size of the database (your program) vs. time required for GC, 
amount of garbage produced etc. (my program).

In my opinion, people tend to put too much emphasis on plain efficiency of
their programs and too little on issues such as readability, reliability
and extensibility --- and I contend that (depending on the application)
garbage collection *CAN* help to improve all of these.

Hence, it is *wrong* to flatly ignore GC; it sure has its place and,
especially in o-o systems, should be available as an option to the 
programmer. Now whether it should be required in *Ada* is a question of
just where Ada9x will go and not really my concern.



	Alexander Klaiber
	klaiber@june.cs.washington.edu

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/14/88)

From article <6713@june.cs.washington.edu>, by klaiber@june.cs.washington.edu (Alexander Klaiber):
> In article <8812131536.AA08238@galaxy.compass.com> worley@compass.UUCP (Dale Worley) writes:
>>I will add that garbage collection is one of the greatest aids to
>>readabilty and reliability, because it takes a complicated and
>>error-prone part of programming (reclaiming storage) and eliminates it
>>completely.  

    I don't see storage reclamation as being complicated or error-prone,
    but maybe that's because I'm so accustomed to dealing with it that
    it comes almost automatically.  Probably the most important reason,
    though, is that the ADT paradigm provides such a powerful mechanism
    for simplifying the situation.  I'm doing an ADT implementation
    right now (mergeable priority queues implemented as binomial forests),
    and I find it hard to believe that anyone could sit there and do an
    ADT implementation without being able to visualize the structure
    in question as the code is being generated.  Given that the implementor
    can visualize the structure, the process of destroying it seems trivial.

> In article <3861@hubcap.UUCP> wtwolfe@hubcap.clemson.edu writes:
>>    No, this isn't structural sharing, because you are explicitly manipulating 
> 
> Call it what you want, the fact is that multiple references exist to one
> object, thus making explicit (i.e. by the programmer) storage management
> a nightmare and an unnecessary chore. GC deals with this situation easily.

    There's an even easier way to deal with it:

       *** Never let a pointer out of your sight ***

    Given that one is doing this, the question is reduced to managing
    local pointers to a local structure, which is quite easy.

>>    In this particular instance, it would be best to store the key by which
>>    a person could be identified (social security number, for example) in
>>    the list, and then using the key to retrieve the current address from
>>    the Person database.  
> 
> That is, if you are willing to willing to pay the cost for the additional
> data base lookup (usually an O(log n) operation)

     Not necessarily.  If all you're doing is insertions, deletions,
     and retrievals, a hashing implementation would give better results.

> I didn't say it couldn't be done without (what I call) structure sharing, 
> I just wanted to point out that this would be a very intuitive approach 
> and would probably lead to a very readable code.

     I'd contest both claims; as a maintainer, I wouldn't want anything
     to get in the way of being able to pin down the state of that program
     precisely, with regard to both time AND SPACE.  For one who is
     accustomed to being able to nail down the precise state of a program,
     code from which it is impossible to "complete the picture" is
     highly counterintuitive, and will probably wind up being rewritten.

> The method of using key searches is, I believe, an unnecessary hack to
> avoid multiple references. Why bother to include a full B-tree package
> or such when we can do without? 

     How can you possibly analyze the space complexity of your program
     when you can't pin down the precise status of every entity???  You
     can't just wave your hands and say "Well, I'm using this much space,
     plus there's a bunch of space that may or may not be occupied..."!!

     And let's not forget the pure hell of doing a time analysis given
     that you can be interrupted unpredictably for garbage collection,
     the timing and duration of which is totally beyond your control,
     particularly in that it depends on how much memory happens to be
     currently installed on the executing system!!

> In my opinion, people tend to put too much emphasis on plain efficiency of
> their programs and too little on issues such as readability, reliability
> and extensibility 

     At last, something we can ALL agree on.  (I hope...!)


                                     
                                       Bill Wolfe

                                wtwolfe@hubcap.clemson.edu

ok@quintus.uucp (Richard A. O'Keefe) (12/14/88)

In article <3865@hubcap.UUCP> billwolf@hubcap.clemson.edu writes:
>    Basically, I charge garbage collection with the same crime that
>    the GOTO was charged with: its sole function is to facilitate
>    UNDISCIPLINED programming.  From Tremblay and Sorenson, 1985:

This is completely back-to-front.  Doing your own storage management
is like doing your own stack management instead of using procedure calls.
If you want to compare garbage collection with a control structure,
compare it with resursive procedures.

>    If this charge can be successfully prosecuted, then garbage
>    collection will face the same penalty: the total elimination 
>    of the facility, in favor of a more disciplined environment.

I think it would be very hard to accuse functional programming languages
like ML of providing an _un_disciplined environment.  Yet automatic storage
control is vital to them.  A word about the history of ML may make this
clearer.  ML was originally the MetaLanguage of Edinburgh LCF, a system
for doing proofs about computations.  It is strongly typed, because it
was important that anything the system claimed to be a proof should be a
valid proof.  There is a basic set of proof-forming operations which are
known to be valid.  The language cannot permit changes to elements of a
data structure, otherwise the arguments to a proof might be changed after
the proof was constructed, rendering it invalid.  "deallocate" simply
doesn't make sense in that kind of environment.  Storage management is
particularly easy in a language without side effects to data structures.

leake@cme.nbs.gov (Stephe Leake) (12/15/88)

In article <6713@june.cs.washington.edu> klaiber@june.cs.washington.edu (Alexander Klaiber) writes:

   Clearly, there is a tradeoff between readability and efficiency. In my
   version, I reduce efficiency by requiring garbage collection (although
   there exist rather powerful and fast GC algorithms). Your proposal
   would definitely increase the complexity of the program and, due to the
   overhead of a DB-lookup, might actually run slower than the version with
   GC --- depending on circumstances such as frequency of lookups in the 
   database, size of the database (your program) vs. time required for GC, 
   amount of garbage produced etc. (my program).

You are not reducing the complexity of the program; you are merely
hiding it in the vendor-supplied memory management package. If the
program is written correctly, the complexity can be hidden in a
user-supplied memory management package. Then there is no difference
in the readability of the application code, and the programmer has the
chance to tailor the garbage collection algorithm to the application.
Since there are many garbage collection algorithms (each posting here
seems to mention another one), it is clear that each will be suited to
certain applications. Better to require the programmer to choose the
algorithm, than to be tempted to "live with" the single vendor
supplied one. There is no way the vendor can take into account the
trade-offs you mention, but you can!

Stephe Leake 	(301) 975-3431 		leake@cme.nbs.gov
National Institute of Standards and Technology
(formerly National Bureau of Standards)
Rm. B-124, Bldg. 220
Gaithersburg, MD  20899

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/15/88)

From article <864@quintus.UUCP>, by ok@quintus.uucp (Richard A. O'Keefe):
> In article <3865@hubcap.UUCP> billwolf@hubcap.clemson.edu writes:
>>    Basically, I charge garbage collection with the same crime that
>>    the GOTO was charged with: its sole function is to facilitate
>>    UNDISCIPLINED programming.  From Tremblay and Sorenson, 1985:
> 
> This is completely back-to-front.  Doing your own storage management
> is like doing your own stack management instead of using procedure calls.

     Procedure calls are used to manage the complexity of creating the
     structure and of maintaining it.  They should also be used to
     manage its destruction.  This is clear and consistent, and promotes
     readable and reliable operations on a well-defined structure with
     well-defined space/time properties.

> I think it would be very hard to accuse functional programming languages
> like ML of providing an _un_disciplined environment.  [...] 
> Storage management is particularly easy in a language without 
> side effects to data structures.

     OK, try getting a referentially transparent language to
     handle systems which are not static, but change over time.

     With increased power comes increased responsibility.   


                                      Bill Wolfe

                              wtwolfe@hubcap.clemson.edu

gateley@m2.csc.ti.com (John Gateley) (12/15/88)

A good example of when garbage collection is needed for readability:
An infinite precision integer arithmatic package (or ADT, or module,
or whatever you want to call it). Infinite precision integers will
take up an unknown amount of memory to represent. Either you have to
explicitly deallocate them when you are done, or let GC take care of them.
But now consider the uses of integers in programs, they are used all over
the place, with no regard for careful deallocation. Converting a program
to use infinite precision integers will be very difficult, since it is
hard to figure exactly when the program is 'finished' with a particular
number.

John

tpmsph@ecsvax.uncecs.edu (Thomas P. Morris) (12/15/88)

In article <6713@june.cs.washington.edu>, klaiber@june.cs.washington.edu (Alexander Klaiber) writes:
> The method of using key searches is, I believe, an unnecessary hack to
> avoid multiple references. Why bother to include a full B-tree package
> or such when we can do without? 
> 
You mean to tell us that you =intend= to keep your "mailing" lists
=permanently in memory=? You're =never= going to write them to a permanent
file system/disk? Or that your never going to keep a list larger than can
be kept in memory? Really!  The method of using key searches is a useful
aid to designing a program to read from a database larger than you can
keep in memory, which needs to be permanent, and which ought to be well-
behaved in terms of no =clobbering= other processes' performance!

:-) Yeah, I know that the mailing list abstraction was only for an example.
But a fairly poor example if you wish to justify structure sharing, which
*has* its proper place...

> Clearly, there is a tradeoff between readability and efficiency. In my
> version, I reduce efficiency by requiring garbage collection (although
> there exist rather powerful and fast GC algorithms). Your proposal
> would definitely increase the complexity of the program and, due to the
> overhead of a DB-lookup, might actually run slower than the version with
> GC --- depending on circumstances such as frequency of lookups in the 
> database, size of the database (your program) vs. time required for GC, 
> amount of garbage produced etc. (my program).
> 
For something like a database system (i.e. mailing lists, again), I would
have to disagree that a keyed-search DB-lookup paradigm is necessarily
less readable or understandable or complex than having to pre-process
the databases (to create your structure sharing) to have direct access,
which might not be reasonable or possible with a *large* database.

Your points about speed of access are well taken, but consider the end
user aspects: how many folks are going to wait 10, or 20, or 30 minutes
for you to read in the external data, build your structure sharing
structures, and =then= process their mailing list (personnel records,
widget manufacturing data, etc), rather than using a relational database
paradigm, using keyed access to those external permanent files?

> In my opinion, people tend to put too much emphasis on plain efficiency of
> their programs and too little on issues such as readability, reliability
> and extensibility --- and I contend that (depending on the application)
> garbage collection *CAN* help to improve all of these.

  Yes and yes. The mailing list example is and unfortunate choice perhaps.
And some of us put also put an emphasis on how well a system works for
its intended end-users!
-----------------------------------------------------------------------------
Tom Morris                                 BITNET: TOM@UNCSPHVX
UNC School of Public Health                UUCP  : ...!mcnc!ecsvax!tpmsph
-----------------------------------------------------------------------------
-- 
-----------------------------------------------------------------------------
Tom Morris                                 BITNET: TOM@UNCSPHVX
UNC School of Public Health                UUCP  : ...!mcnc!ecsvax!tpmsph
-----------------------------------------------------------------------------

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/16/88)

From article <4159@enea.se>, by sommar@enea.se (Erland Sommarskog):
> But you have to write the same boring code for [destroying]
> every ADT you implement!

     If there were an ability to iterate over the fields of a
     generic record parameter, standard generics for recursively
     destroying certain classes of structures (e.g., n-ary trees)
     could be written, thus automating the process to some extent.  

     The reason we must suffer writing the same code repeatedly is that 
     Ada does not permit us to abstract over arbitrary record structures. 

> Let's say that you write a package for double-linked lists. You provide 
> a procedure for taking an element out of a list. Should you free the 
> memory allocated? You don't know because the caller may or may not still 
> keep an reference to the element. He may be totally uninterested in it, 
> but he may also want to insert it in another list.  

    When the list is instantiated, the user tells us what type of object
    is to be kept in the list, and provides us with procedures for
    assignment and equality.  Objects are copied into and out of the list.  
    When an object is to be removed from the list, the local copy of the
    object which is stored within the list is destroyed.  Now if the
    user wants, a pointer type can be supplied as the type of object
    to be listed, and the user-supplied equality function can compare
    the objects pointed to rather than the pointers themselves.  The
    user then bears all responsibility for managing the "external storage",
    and can track his/her own references.

    There is no implementor's dilemma here whatsoever.


                                          Bill Wolfe

                                    wtwolfe@hubcap.clemson.edu
 

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/16/88)

From article <65713@ti-csl.CSNET>, by gateley@m2.csc.ti.com (John Gateley):
> A good example of when garbage collection is needed for readability:
> An infinite precision integer arithmatic package (or ADT, or module,
> or whatever you want to call it). Infinite precision integers will
> take up an unknown amount of memory to represent. Either you have to
> explicitly deallocate them when you are done, or let GC take care of them.
> But now consider the uses of integers in programs, they are used all over
> the place, with no regard for careful deallocation. Converting a program
> to use infinite precision integers will be very difficult, since it is
> hard to figure exactly when the program is 'finished' with a particular
> number.

     The deallocation of every object in the local environment is
     performed as an automatic service when a procedure, function,
     or local block is exited.  This is not garbage collection,
     because the programmer has implicitly directed that the 
     destruction be performed.  

     Unfortunately, Ada does not provide a means of integrating ADT 
     destruction algorithms into the mechanism providing this service.  
     This is Language Issue 35 of the Ada Language Issues Working Group; 
     it is presently under editorial review, a step which is usually followed 
     by approval and then by submission to ADA-COMMENT, according to the 
     ALIWG handout I obtained at Tri-Ada '88.


                                         Bill Wolfe

                                 wtwolfe@hubcap.clemson.edu

sommar@enea.se (Erland Sommarskog) (12/19/88)

Sworn enemy to garbage collection Bill Wolfe writes:
>     The deallocation of every object in the local environment is
>     performed as an automatic service when a procedure, function,
>     or local block is exited.  This is not garbage collection,
>     because the programmer has implicitly directed that the 
>     destruction be performed.  

I read this as "on block exit all memory allocated to variables
declared in that block should be deallocated". 
  Isn't this very dangerous? What if programmer copied the object to
a variable declared in a outer block? Or stored it in a table of some
sort in a subprogram call made in block? 
-- 
Erland Sommarskog
ENEA Data, Stockholm
sommar@enea.se
"Frequently, unexpected errors are entirely unpredictable" - Digital Equipment

gateley@m2.csc.ti.com (John Gateley) (12/20/88)

In article <3918@hubcap.UUCP> billwolf@hubcap.clemson.edu writes:
>From article <65713@ti-csl.CSNET>, by gateley@m2.csc.ti.com (John Gateley):
>> [I say infinite precision arithmetic is a good example of why GC is needed
>>  sometimes].
>
>     The deallocation of every object in the local environment is
>     performed as an automatic service when a procedure, function,
>     or local block is exited.  This is not garbage collection,
>     because the programmer has implicitly directed that the 
>     destruction be performed.  

But, integers are not always deallocated. They may be returned as the
result of a function, assigned to global variables, placed in the 
heap as parts of other data objects etc. When you try and do the same
with infinite precision integers, then the deallocation you describe
does not work! You can always copy them for these purposes, but this
can be very space/time inefficient. It also requires extra effort (remembering
to copy them), unless Ada provides some technique for doing this automatically.
So, GC is a good way to handle this problem.

>     Unfortunately, Ada does not provide a means of integrating ADT 
>     destruction algorithms into the mechanism providing this service.  
>     This is Language Issue 35 of the Ada Language Issues Working Group; 
>     ...

Integers, however, are not deallocated when a block is exited (if they
have been saved elsewhere). So, I don't think this language issue
applies to the infinite precision integer case.

John
gateley@tilde.csc.ti.com

wtwolfe@hubcap.UUCP (Bill Wolfe) (12/21/88)

From article <4176@enea.se>, by sommar@enea.se (Erland Sommarskog):
> Sworn enemy to garbage collection Bill Wolfe writes:
>>     The deallocation of every object in the local environment is
>>     performed as an automatic service when a procedure, function,
>>     or local block is exited.  This is not garbage collection,
>>     because the programmer has implicitly directed that the 
>>     destruction be performed.  
> 
> I read this as "on block exit all memory allocated to variables
> declared in that block should be deallocated". 
>   Isn't this very dangerous? What if programmer copied the object to
> a variable declared in a outer block? Or stored it in a table of some
> sort in a subprogram call made in block? 

   Not at all.  If a programmer assigns the value stored in some local
   object to some nonlocal object, then we simply have a new value for
   a nonlocal object, assuming we have not engaged in the foul practice
   of structural sharing.  Those who engage in structural sharing will
   get the run-time errors they deserve for engaging in such space-hacking. 

   Now there is, of course, the problem that the practice of assigning to
   nonlocal objects is somewhat distasteful, since this amounts to a
   "side effect" of a procedure or function.  The rap on side effects
   is that they frequently are not properly documented, and thus make
   a major contribution to the difficulty of understanding a program.

   In addition, optimizing compilers find it difficult to completely 
   define the relationship between procedures or functions and their
   enclosing environment, thus making the relevant optimizations costly.

   So... language designers to the rescue!!!  The solution to this
   problem has been discussed in recent issues of Sigplan Notices,
   and it is this:  Provide a scheme whereby procedures and functions
   have, by default, no relationship to their surrounding environment
   other than that which is explicitly documented in the parameter
   structure.  Then provide a facility whereby one can explicitly
   "poke holes in the shield" for selected external objects.  

   The maintainers are happy, because for once they can read the
   specification of a procedure or function and be certain that
   it does nothing which is not documented in that specification.

   The writers of optimizing compilers are absolutely ecstatic.  

   And the programmers can do their debugging faster as well.  

   But sadly for the programmers who pride themselves on obscure code, 
   there is now one less form of sloppiness available for their use.

   Oh, well, perhaps they'll find some comfort in C...


                                  
                                       Bill Wolfe

                               wtwolfe@hubcap.clemson.edu
 

wtwolfe@hubcap.UUCP (Bill Wolfe) (12/21/88)

In article <65914@ti-csl.CSNET>, gateley@m2.csc.ti.com (John Gateley) writes:
> >> [I say infinite precision arithmetic is a good example of why GC is needed
> >>  sometimes].
> >
> >     The deallocation of every object in the local environment is
> >     performed as an automatic service when a procedure, function,
> >     or local block is exited.  This is not garbage collection,
> >     because the programmer has implicitly directed that the 
> >     destruction be performed.  
> 
> But, integers are not always deallocated. They may be returned as the
> result of a function, assigned to global variables, placed in the 
> heap as parts of other data objects etc. 

    What we are (presumably) discussing here is the space management 
    applying to literals and/or anonymous values of some arbitrary type.  
    During compilation, the compiler will evaluate each literal as 
    an anonymous value of the appropriate type.  Anonymous values differ 
    from named values only in that the programmer cannot explicitly refer 
    to them by name.

    Typically, literals are used in the construction of anonymous
    expressions which will ultimately serve as a value to be assigned.
    They are then being supplied as "in" parameters, and we have already
    discussed the proper method by which compilers should handle anonymous
    values passed as "in" parameters: they should be passed by handoff;
    since no external name exists, there is no need to set any external
    object to "undefined".  Since anonymous values cannot simply exist
    in mid-air, and must always be consumed in the course of some statement,
    we know that they will always be consumed.  By the rules of Ada, they
    cannot be supplied as "in out" or "out" parameters; hence, the mechanism
    described for handling anonymous values which are being passed as "in"
    parameters in a space-efficient manner serves to cover all cases, if
    we treat assignment as a procedure requiring an "in" parameter for the
    source value; otherwise, the extension to this case is straightforward. 

    Thus, we have completely described the space management applicable to
    literals and other anonymous values, such that these values can be
    managed effectively without any need for recourse to garbage collection.

    All that is needed is a tighter definition of what happens to an 
    anonymous value which is passed as an "in" parameter.  Note that
    this analysis does not depend upon the amount of space which must
    be used to contain any given anonymous value.



                                           Bill Wolfe

                                   wtwolfe@hubcap.clemson.edu

sommar@enea.se (Erland Sommarskog) (12/27/88)

Bill Wolfe (wtwolfe@hubcap.UUCP) first said:
>>>     The deallocation of every object in the local environment is
>>>     performed as an automatic service when a procedure, function,
>>>     or local block is exited.  This is not garbage collection,
>>>     because the programmer has implicitly directed that the 
>>>     destruction be performed.  

I tried to understand:
>> I read this as "on block exit all memory allocated to variables
>> declared in that block should be deallocated". 
>>   Isn't this very dangerous? What if programmer copied the object to
>> a variable declared in a outer block? Or stored it in a table of some
>> sort in a subprogram call made in block? 

And Bill explains:
>   Not at all.  If a programmer assigns the value stored in some local
>   object to some nonlocal object, then we simply have a new value for
>   a nonlocal object, assuming we have not engaged in the foul practice
>   of structural sharing.  Those who engage in structural sharing will
>   get the run-time errors they deserve for engaging in such space-hacking. 


And I still don't understand. Let's say we have a tree handler
   Generic 
      Data_type is limited private;
      Procedure -- Assign, "<" and ">" 
   Package Tree_handler
      Tree_type is private;
Assume further that we instantiate this package with Some_access_type, 
it could be an imported limited type, it could be a locally defined.
Then we have the procedure:

    Procedure Insert_data_to_tree(Tree : in out Tree_type;
                                  Data_of_some_sort : in Any_type); 
    Reference : Some_access_type;
    Begin
       Reference := new Some_type;  -- Or subprogram call for external type
       Prepare(Reference, Data_of_some_sort);
       Insert(Tree, Reference);
    End Insert_data_to_tree;
    
According to Bill the data allocated to Reference should be freed when 
we exit this procedure, but if Insert does what we think this is of
course a severe error. Where did I go wrong? Did I plead guilty to
structure sharing just because I inserted a pointer to a tree? 
  Or does Bill mean that when Reference is inserted to the tree that 
Reference.all should be inserted to the tree, not the pointer itself?
But what if I also insert the reference into another tree (or some other 
structure) and then modify the referred data (some counter, for instance) 
and want this change to appear in both places?
  Note also that in this example, there are no side effects. We are only
dealing with local variables and paramerters. And still deallocation
on block exit is completely wrong. (And there is no way the compiler
could deduce that Insert really saves Reference somewhere.)

Several times in this discussion Bill Wolfe has claimed that 
garbage collection is a trouble-maker for maintainers. In what 
way? I can only see one: If the system due to a heavy use of memory
is spending a lot of time on garbage collecttion, something must be 
done. But in this case the problem is often quite easy to spot.
  Much worse for maintainence is when the system dies with "System 
access violation" or "segmentation fault" from time to time and you 
don't know why. The reason may be that an allocated area was erroneously 
released and then reused and the original data, that someone appearently 
is relying in, has been over-written. Or because an already deallocated 
was deallocated a second time.
  Bill Wolfe is trying to outline some alternative, but I
think he has to think more about, until it is equally safe.
(And really, I don't think he should. He is just reinventing
the wheel.)
  
Bill Wolfe said earlier that garbage collection encourages  
sloppy programming. Whether it is sloppy or not, I don't care,
but obviously garbage collection is taking a lot of the burden
of the programmer's shoulder and the maintainer's too. Why
should a programmer waste time bothering about something that
the computer could do much safer for him? After all, the  
programmer does not work for free.

So to conclude: I think Ada should require garbage collection
that could be disabled with a pragma for critical applications.
As it is now a portable application cannot rely on it, with
all the troubles explicit deallocation implies.
-- 
Erland Sommarskog
ENEA Data, Stockholm              This signature is not to be quoted.
sommar@enea.se

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/28/88)

From article <4195@enea.se>, by sommar@enea.se (Erland Sommarskog):

 [Erland describes a situation in which a user instantiates a tree 
  structure over an access type, and then considers the insertion
  procedure with which the user inserts an object into the tree]

> According to Bill the data allocated to Reference [in the example,
> a pointer to the user's object] should be freed when 
> we exit this procedure, but if Insert does what we think this is of
> course a severe error. Where did I go wrong? Did I plead guilty to
> structure sharing just because I inserted a pointer to a tree? 
>   Or does Bill mean that when Reference is inserted to the tree that 
> Reference.all should be inserted to the tree, not the pointer itself?
> But what if I also insert the reference into another tree (or some other 
> structure) and then modify the referred data (some counter, for instance) 
> and want this change to appear in both places?
>   Note also that in this example, there are no side effects. We are only
> dealing with local variables and paramerters. And still deallocation
> on block exit is completely wrong. (And there is no way the compiler
> could deduce that Insert really saves Reference somewhere.)

   The user has supplied the limited private type STORED_OBJECT, and
   defined assignment over this object.  Let's assume that the tree
   is implemented in terms of a record structure, one field of which
   is a STORED_OBJECT.  In the INSERT_ITEM procedure, the user sends
   "in out TREE_TYPE; in STORED_OBJECT".  The tree is modified as follows:
   a new record is created to represent a new node of the tree.  The 
   statement ASSIGN (NEW_NODE.STORED_ITEM, NEW_OBJECT) is executed,
   along with whatever pointer manipulations are necessary to maintain
   the structure of the tree.  

   Now we proceed to block exit.  Since the tree and the new object are
   *parameters* and not local variables, they are not subject to the
   automatic destruction which applies to the block's local variables.

   Since the user supplied an access type as the type of object to be
   stored in the tree, the user bears full responsibility for making
   responsible use of the fact that he/she is dealing with a tree of pointers.

> Several times in this discussion Bill Wolfe has claimed that 
> garbage collection is a trouble-maker for maintainers. In what way? 

      When there is great difficulty deciding whether a given object 
      exists or not, the maintainer experiences great difficulty 
      pinning down the precise state of the program.  
 
>   Much worse for maintainence is when the system dies with "System 
> access violation" or "segmentation fault" from time to time and you 
> don't know why. The reason may be that an allocated area was erroneously 
> released and then reused and the original data, that someone appearently 
> is relying in, has been over-written. Or because an already deallocated 
> was deallocated a second time.

      There are basically two cases in which this can happen:

         1) When an inexperienced ADT designer is in the process 
            of acquiring the mental discipline required of anyone
            who programs pointer-based structures.  To avoid this,
            purchase ADTs from professional houses or use experienced
            ADT designers.  Otherwise, it's a cost of professional training.

         2) When a user has engaged in S T R U C T U R A L  S H A R I N G.

                   *** Don't engage in structural sharing ***

> Bill Wolfe said earlier that garbage collection encourages  
> sloppy programming. Whether it is sloppy or not, I don't care,

    Obviously.

> Why should a programmer waste time bothering about something that
> the computer could do much safer for him? After all, the  
> programmer does not work for free.

    DESTROY procedures are easy to write, need only be written once
    per ADT, and can be reused indefinitely.  GC costs us the
    computer time required to repeatedly determine which storage
    is in use and which is not, and this cost must be paid repeatedly
    AT RUN TIME.  Given that this PERMANENT, RUN_TIME COST can be
    avoided by simply communicating the fact that a particular object
    is no longer needed, GC is a rather costly crutch for lazy programmers. 



                                             Bill Wolfe

                                     wtwolfe@hubcap.clemson.edu

rjh@cs.purdue.edu (Bob Hathaway) (12/28/88)

>According to Bill the data allocated to Reference should be freed when 
>we exit this procedure, but if Insert does what we think this is of
>course a severe error. Where did I go wrong? Did I plead guilty to

I think Bill meant deallocation should only occur if an object is no
longer accessible at the end of a subprogram or block, and Reference is 
accessible.  Subprogram and block exit is a very convenient time to perform 
clean-up of access types.  If Ada had an Adt construct for the "completion of
the Adt paradigm", we could specify a termination procedure for Adts which
could also be called at scope exit.  I agree that Adts/objects are an important
enough programming development to justify a new language construct.  I've 
worked with over 200,000 lines of code which used the older data-structure
oriented design and found the greatest problem in understanding and working
with these programs was caused by the use of global variables and the 
indescriminant access and modification of data-structures which were not 
implemented as Adts.  Such data-structures are poorly defined and lead to
code which is hard to read.

For my own opinion on the garbage collection discussion, I think a destroy 
procedure is desirable.  Adts need to be explicitly deallocated for long 
running daemons or loops even while objects are still accessible.  While 
calling a procedure or declaring objects in a local block from within a loop
allows automatic deallocation of objects, I'd rather use scope rules and 
subprograms for creating environments and performing actions and not for 
storage (de)allocation.  Also, I think Adt operations should include a destroy
procedure for completeness. If space management is not a problem the destroy 
procedure does not have to be called.  Concerning working sets, if a page is 
not accessed it will not reside in memory for long and should not cause a 
virtual space management problem.

I strongly agree with assignment operator overloading, and another
powerful extension is to allow any name/operator to be overloadable.

For yet another Ada 9X extension, I propose procedural variables.  As in 
Modula procedural variables can be limited to top level procedures but formal
parameter names must be included for named parameter association.  Here is an
example declaration:
    type insertNode_procedure = procedure (adt : in out structure,
					   node : in node);
Variables of type insertNode_procedure can be assigned to any procedure with
the same parameter type structure (type domain).  Procedural variables can
avoid the redundant use of case statements by allowing an operation to be 
stored within an Adt.  It also allows individually parameterizable and 
reprogrammable Adts since operations can be provided to alter the Adts actions
or structure.  Generic subprogram parameters can only allow Adts to be set 
once for any instantiation.  I use procedural variables and function pointers
in Adts frequently when programming in languages other than Ada and am 
convinced they are an elegant way to model Adt actions.

                                         Bob Hathaway
                                         rjh@purdue.edu

snorri@rhi.hi.is (Snorri Agnarsson) (12/29/88)

From article <3985@hubcap.UUCP>, by billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,):
>       When there is great difficulty deciding whether a given object 
>       exists or not, the maintainer experiences great difficulty 
>       pinning down the precise state of the program.  

 Exactly - and garbage collection is precisely what you need to get rid
 of this difficulty.
 When you have garbage collection there is no difficulty in determining
 whether an object exists or not.  It exists if you have some way to
 refer to it, otherwise not.

>     DESTROY procedures are easy to write, need only be written once
>     per ADT, and can be reused indefinitely.  GC costs us the
>     computer time required to repeatedly determine which storage
>     is in use and which is not, and this cost must be paid repeatedly
>     AT RUN TIME.  Given that this PERMANENT, RUN_TIME COST can be
>     avoided by simply communicating the fact that a particular object
>     is no longer needed, GC is a rather costly crutch for lazy programmers. 

 The PERMANENT, RUN_TIME COST is very often much worse if you use the
 methods Mr. Wolfe dogmatically demands.  For example if you have a program
 which does calculations with indefinite size integers, then each new integer
 value computed is most appropriately stored in a new area in the heap.
 If you disallow sharing then each assignment must create a new area.
 This is costly.  Each time you stop using an integer variable you must
 deallocate it.  This is costly.
 If you have a simple copying garbage collector and allow sharing, which
 by the way is not dangerous at all in this case at least, you don't have
 the abovementioned costly operations, but instead you have a simple garbage
 collection which will in my experience be infrequent (e.g. every ten seconds
 for a program doing intensive calculations with 10000 digit integers) and 
 fast (less than a fraction of a second for each collection).  And this
 is on a machine with small memory (a PC with 640KB memory).

 Memory management is one of the most difficult things for programmers to
 do correctly and efficiently.  Garbage collection certainly takes this
 burden from the programmer and allows him to think in more abstract terms.

 In my opinion reusable software components can not be adequately created
 in languages that do not have good automatic memory management.
 In this respect Ada is a failure unless garbage collection is required.
 The C++ approach of using destructors is costly both in memory and time
 and does not really solve the problem, although it is better than nothing.

 It is regrettable in my opinion that the issue of garbage collection
 is so much misunderstood.  It is also regrettable that proponents of
 the various interesting new programming methodologies do not push
 garbage collection as an issue.  Functional programming, logic
 programming and object oriented programming have only one thing in
 common; reliance on garbage collection.  None of those methodologies
 would have much use in my opinion if it were not for garbage collection.
 And, I repeat: reusable modules (that includes generic packages) are
 almost useless in languages with no garbage collection compared to
 the use you can get in languages having garbage collection.

 
-- 
Snorri Agnarsson                        --  snorri@rhi.is
Raunvisindastofnun Haskolans      
Dunhaga 3
IS-107 Reykjavik ICELAND

sommar@enea.se (Erland Sommarskog) (12/29/88)

I gave an example where I stored pointers in a tree, and tried
to make Bill Wolfe explain how this should work with his idea 
with deallocation on block exit. He answered, but I wouldn't 
say I'm satisfied... I still don't see where he is heading.

>   Since the user supplied an access type as the type of object to be
>   stored in the tree, the user bears full responsibility for making
>   responsible use of the fact that he/she is dealing with a tree of pointers.

Well, assume that Assign the procedure I write for Some_access_type 
(the type I store in the tree) is simply A := B. With your scheme I 
can't but see that the object I allocated is deallocated when I exit  
my procedure, although I still have a reference to it, stored in
the tree.
  Bill, tell me, what is wrong here? My understanding of your proposal,
or the program I have written? If you think the latter, explain to me
how the compiler should detect the error I'm doing. Or do you really 
think such an error should remain undiscovered until run-time? And
does that rhyme with your eagerness for simplify maintenance?

>> Several times in this discussion Bill Wolfe has claimed that 
>> garbage collection is a trouble-maker for maintainers. In what way? 
>
>      When there is great difficulty deciding whether a given object 
>      exists or not, the maintainer experiences great difficulty 
>      pinning down the precise state of the program.  

You have just given an excellent argument for garbage collection.
With garbage collection, only the objects that have references
exist. Without, there may be other objects that are unreachable.
With garbage collection you are sure of that all initiated non-null
references point to actual objects. Without, you may by mistake
be pointing straight into free space, or right in the middle of
some completely other kind of object. 
  The only incertainty you have with garbage collection is whether 
a non-referred object has yet been returned to free memory, depending
on wether the garbage collector has come there or not. But since 
the object no longer is part of the system, this question is of 
little interest.

>>   Much worse for maintainence is when the system dies with "System 
>> access violation" or "segmentation fault" from time to time and you 
>> don't know why. The reason may be that an allocated area was erroneously 
>> released and then reused and the original data, that someone appearently 
>> is relying in, has been over-written. Or because an already deallocated 
>> was deallocated a second time.

I forgot: When the system chokes with "heap full" and you as a maintainer
is trying to find where all space is being wasted. With garbage collection
you at least know that all this memory is being used. Without, you don't 
know if it is just memory leakage or real overuse.

>    DESTROY procedures are easy to write, need only be written once
>    per ADT, and can be reused indefinitely.  GC costs us the
>    computer time required to repeatedly determine which storage
>    is in use and which is not, and this cost must be paid repeatedly
>    AT RUN TIME.  Given that this PERMANENT, RUN_TIME COST can be
>    avoided by simply communicating the fact that a particular object
>    is no longer needed, GC is a rather costly crutch for lazy programmers. 

1) You have several times stated that the callers of your ADTs 
are responsible for that their destroy procedures are called.
So it's not only to write them. You must remember to call them
too. And when you forget, it takes time find out where.

2) Much of this talk reminds of what hear from C programmers when
range and type checking is being discussed. And the same arguments  
apply. If all programmers were skilled enough to do everything
right we wouldn't need Ada or high-level langauges at all. Actually,
if everyone else is taking help of computers these days, why shouldn't
programmers do?
  Just like range and type checking, garbage collection is a device
that increases our thrust in that the system does what we intended.

3) I recommend you to read "Object-Oriented Software Construction"
by Bertrand Meyer. There he explains why garbage collection is
necessary in an object-oriented system. He also describes how the
garbage collector is implemented in the Eiffel system. (Eiffel is
an object-oriented language, devised by Mr. Meyer, and comemercially
available.) 
-- 
Erland Sommarskog
ENEA Data, Stockholm              This signature is not to be quoted.
sommar@enea.se

wtwolfe@hubcap.UUCP (Bill Wolfe) (12/30/88)

From article <692@krafla.rhi.hi.is>, by snorri@rhi.hi.is (Snorri Agnarsson):
> From article <3985@hubcap.UUCP>, by billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,):
>>       When there is great difficulty deciding whether a given object 
>>       exists or not, the maintainer experiences great difficulty 
>>       pinning down the precise state of the program.  
> 
>  Exactly - and garbage collection is precisely what you need to get rid
>  of this difficulty.
>  When you have garbage collection there is no difficulty in determining
>  whether an object exists or not.  It exists if you have some way to
>  refer to it, otherwise not.

     So basically you have to sit there and total up the references to
     every piece of space manually.  Sounds like fun.  Got an algorithm
     for doing this in constant time?  If not, there is "great difficulty
     pinning down the precise state of the program". 
 
>>     DESTROY procedures are easy to write, need only be written once
>>     per ADT, and can be reused indefinitely.  GC costs us the
>>     computer time required to repeatedly determine which storage
>>     is in use and which is not, and this cost must be paid repeatedly
>>     AT RUN TIME.  Given that this PERMANENT, RUN_TIME COST can be
>>     avoided by simply communicating the fact that a particular object
>>     is no longer needed, GC is a rather costly crutch for lazy programmers. 
> 
>  The PERMANENT, RUN_TIME COST is very often much worse if you use the
>  methods Mr. Wolfe dogmatically demands.  For example if you have a program
>  which does calculations with indefinite size integers, then each new integer
>  value computed is most appropriately stored in a new area in the heap.
>  If you disallow sharing then each assignment must create a new area.

      Consider what happens to the storage associated with the old value
      of the target of the assignment statement.  The ASSIGN procedure
      can reuse it directly rather than deallocating and reallocating it.
      
>  Memory management is one of the most difficult things for programmers to
>  do correctly and efficiently.  Garbage collection certainly takes this
>  burden from the programmer and allows him to think in more abstract terms.

     Space is an important part of the abstraction.  And I challenge the
     claim that memory management is difficult.  One need only deal with
     a single structural model within the ADT; in the course of writing
     that ADT's implementation, one will become thoroughly familiar with
     that structure, and there should be no difficulty deciding when a
     part of it is no longer needed.

>  In my opinion reusable software components can not be adequately created
>  in languages that do not have good automatic memory management.

    In my opinion, reliance upon garbage collection is the single best
    way to create an inferior ADT implementation which will be totally
    rejected by anyone who is programming critical applications.  

>  In this respect Ada is a failure unless garbage collection is required.

    In this respect garbage collection is a total failure and should
    NEVER be required.  In the interests of seeing to it that portability
    to the critical-application environment is always possible, GC should
    be PROHIBITED.

>  It is regrettable in my opinion that the issue of garbage collection
>  is so much misunderstood.  

    Particularly by people who have no understanding of the issues
    applying to proper ADT design.

>  It is also regrettable that proponents of
>  the various interesting new programming methodologies do not push
>  garbage collection as an issue.  Functional programming, logic
>  programming and object oriented programming have only one thing in
>  common; reliance on garbage collection.  None of those methodologies
>  would have much use in my opinion if it were not for garbage collection.

    They don't have much use anyway.  See David Harland's "Concurrency
    and Programming Languages".


                                            Bill Wolfe

                                     wtwolfe@hubcap.clemson.edu

wtwolfe@hubcap.UUCP (Bill Wolfe) (12/30/88)

From article <4200@enea.se>, by sommar@enea.se (Erland Sommarskog):
>>   Since the user supplied an access type as the type of object to be
>>   stored in the tree, the user bears full responsibility for making
>>   responsible use of the fact that he/she is dealing with a tree of pointers.
> 
> Well, assume that Assign the procedure I write for Some_access_type 
> (the type I store in the tree) is simply A := B. With your scheme I 
> can't but see that the object I allocated is deallocated when I exit  
> my procedure, although I still have a reference to it, stored in
> the tree.

    One more time.  A exists within the tree.  The tree is a PARAMETER.
    The tree is NOT a local variable.  Hence, upon exit from the ASSIGN
    procedure, A will not suffer deallocation.  Now what about B?  It's
    a parameter too, and also will not be deallocated.
 
>   Bill, tell me, what is wrong here? My understanding of your proposal,
> or the program I have written? 

    Your understanding of my proposal.  I'm having a really hard time
    trying to figure out what you're thinking of here.  I thought everything
    had been clearly defined in terms that any Pascal/Modula-2/Ada programmer
    would understand.

> I forgot: When the system chokes with "heap full" and you as a maintainer
> is trying to find where all space is being wasted. With garbage collection
> you at least know that all this memory is being used. Without, you don't 
> know if it is just memory leakage or real overuse.

     Advanced debugging tools should be used to address this issue.

>>    DESTROY procedures are easy to write, need only be written once
>>    per ADT, and can be reused indefinitely.  GC costs us the
>>    computer time required to repeatedly determine which storage
>>    is in use and which is not, and this cost must be paid repeatedly
>>    AT RUN TIME.  Given that this PERMANENT, RUN_TIME COST can be
>>    avoided by simply communicating the fact that a particular object
>>    is no longer needed, GC is a rather costly crutch for lazy programmers. 
> 
> 1) You have several times stated that the callers of your ADTs 
> are responsible for that their destroy procedures are called.
> So it's not only to write them. You must remember to call them
> too. And when you forget, it takes time find out where.

   I'm TRYING to get Ada to integrate the automatic destruction of local
   variables which occurs during block exit with ADT destruction procedures,
   so the user can rely on the implicit deallocation request associated
   with block exit.  Let me give a concrete example:

      procedure EXAMPLE (A : in out ADT);

         B : ADT;
         C : INTEGER;

      [body of procedure EXAMPLE]

    Now when the body of procedure EXAMPLE has finished executing,
    in current Ada C will be properly destroyed, but B will not.
    If B is implemented as a pointer to something, that pointer will
    be destroyed, and everything it pointed to will become garbage.
    By integrating the DESTROY procedure into block exit, everything
    B pointed to will be deallocated as well.  Since A is a parameter,
    it is not part of the locally defined environment; it is instead
    part of the caller's environment, and will stay that way.  In the
    revised version, the "right" DESTROY procedure will apply to B
    as well as C, thus enabling the use of implicit destruction,
    which is the most frequently used method of destruction. 

> 3) I recommend you to read "Object-Oriented Software Construction"
> by Bertrand Meyer. There he explains why garbage collection is
> necessary in an object-oriented system. He also describes how the
> garbage collector is implemented in the Eiffel system. (Eiffel is
> an object-oriented language, devised by Mr. Meyer, and comemercially
> available.) 

   I'm quite aware of Eiffel, and I have just shown (in another article)
   how reliance upon garbage collection is a sure way to get your ADT
   rejected by users who are programming critical applications.  This
   same argument applies to Eiffel and the other members of the peanut
   gallery; they are unsuitable for critical applications, among their
   many other flaws.  May I suggest David Harland's "Concurrency and 
   Programming Languages"?

   Representation of classes of types is better accomplished through a
   synthesis approach, rather than an inheritance approach; however, this
   is presently a research topic.  Followups on this to comp.lang.sigplan
   or e-mail, please.  


                                        Bill Wolfe

                                 wtwolfe@hubcap.clemson.edu

sommar@enea.se (Erland Sommarskog) (12/31/88)

I'm trying to get Bill Wolfe to explain how he his deallocation-on-
block-exit scheme would work in an example I had. For brevity I
deleted my example in my last article. Apparently that was a fatal
mistake, because the answer I got wasn't even close the question.
So, my apologies, I repeat the exmample with some clarifcations.

We have:
   Generic 
      Data_type is limited private;
      Procedure -- Assign, "<" and ">" 
   Package Tree_handler
      Tree_type is private;
      
Assume further that we instantiate this package with Some_access_type, 
it could be an imported limited type, it could be a locally defined.
Then we have the procedure:

    Procedure Insert_data_to_tree(Tree : in out Tree_type;
                                  Data_of_some_sort : in Any_type); 
    Reference : Some_access_type;
    Begin
       Reference := new Some_type;  -- Or subprogram call for external type
       Prepare(Reference, Data_of_some_sort);
       Insert(Tree, Reference);
    End Insert_data_to_tree;
    
The Assign procedure which we provide when we instantiate the tree handler 
is simply:
   Procedure Assign(A : in out Some_access_type; B : in Some_access_type) is
   Begin
      A := B;
   End Assign;
   
Now, what happens with Reference.all on exit of Insert_to_data_tree, with    
your scheme? (Assume there is no user-written DESTROY procedure.) Would
Reference.all be deallocated although there still is a living reference  
to it in Tree? Or will it be ignored in any case? Or are you having a
reference count hidden somewhere to save you? 

Another interesting issue is: When Tree drops from the stack what happens
with our little object at this occassion? The Tree's DESTROY procedure
cannot know anything about it, so we have to traverse the tree and deallocate
all referd objects ourselves. (And pray we haven't inserted some of them
in a queue somewhere else too!) With garbage collection...

>> I forgot: When the system chokes with "heap full" and you as a maintainer
>> is trying to find where all space is being wasted. With garbage collection
>> you at least know that all this memory is being used. Without, you don't 
>> know if it is just memory leakage or real overuse.
>
>     Advanced debugging tools should be used to address this issue.

So why can't these "advanced debugging tools" be used to "pin down
the exact state of the system" when we have garbage collection?

>   I'm quite aware of Eiffel, and I have just shown (in another article)
>   how reliance upon garbage collection is a sure way to get your ADT
>   rejected by users who are programming critical applications.  This

I assume with critical mean time-critical here. If the issue is 
reliability or space, garbage collection is only there to help you.

There is good chance that your ADT will be rejected anyway. The critical 
developper looks at you and says: "You use Unchecked_dellocation? Sorry, 
then I can't use your ADT. Memory fragmention will cause too great a 
time penalty when the system have lived for a while." So you return and 
implement a free-list and try again. The developper looks at you: "You 
have an internal task for the free-list?" "Eh, no" "You must have that,
else all will be a mess if two tasks access the list simultaneously."
So you end up with with adding this task, and your critical developper
is satisfied. (Reservation: My experience of tasking in Ada is not
that great. Possibly the SHARED pragma could help, but I doubt.)
  Then you come to me who is writing an interactive data-base application 
for VAX/VMS and give me your ADT. It looks nice so I use it. Some time
later my users complains: the reponse time is too long. What have
you done? Being equipped with good tools I re-link the system with 
PCA (a profiler for VMS) to find the bottle-neck. And I to my surprise 
I find that rendez-vous are taking place in the system. "Ah, that why 
it's slow, but where do they come from?" Soon your ADT is found to be 
guilty. And just a little later your ADT is edited to be taskless, and my 
users are happy again. (Until they get dumps because memory is full,
because I forgot to call one of your damned Destroy routines somewhere.)

The lesson to learn is that it is very hard to satisfy everyone. 
And you end up with two versions anyway. Given that there are
many shops who never will write an application were garbage
collection is unacceptable, ADTs relying on its presence will
still have a wide use. Probably more use than ADTs using tasks
to protect free lists. (I have reponse times to think of, but garbage 
collection is no problem. With the collector running as coroutine 
like in the Eiffel system, it can work while I'm waiting for input.)

There is a solution that gives us a possibility to use the same 
implementation of the ADT above. And that solution is, yes you guessed
it: garbage collection. You provide your ADT with a Destroy procedure
that your critical developper uses. (As long your block-exit scheme 
doesn't work properly, he has to call it explicitly. I doubt he 
would accept your idea anyway.) I, in my turn, give a damn about it. 
All I want is that the task should not be used, which you can have
a special call for. Then I just set the garbage_collection flag in 
my system configuration file. 

-- 
Erland Sommarskog
ENEA Data, Stockholm              This signature is not to be quoted.
sommar@enea.se

ryer@inmet.UUCP (01/03/89)

Currently, Ada neither requires nor precludes garbage collection.  For 
example, the Symbolics Ada compiler provides it (as well they should), but
none of the eight cross-compilers for the 1750A computer does (as they 
shouldn't).

What are you (any of you) proposing?  That ALL Ada compilers should have
garbage collection?  Than no Ada compilers should be allowed to have it?
Pragmas?

Each day I find a new collection of comments on this note, debating the
goodness of garbage collection in the abstract.  I think that the Ada 
language has already taken the only defensible position, and would be
happy to explain why if anyone wants to take a different concrete position.

Mike "speaking only for myself" Ryer
Intermetrics, Inc.

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/05/89)

From article <4206@enea.se>, by sommar@enea.se (Erland Sommarskog):
> I'm trying to get Bill Wolfe to explain how he his deallocation-on-
> block-exit scheme would work in an example I had. [...] 
> I repeat the exmample with some clarifcations.
> 
> We have:
>    Generic 
>       Data_type is limited private;
>       Procedure -- Assign, "<" and ">" 
>    Package Tree_handler
>       Tree_type is private;
>       
> Assume further that we instantiate this package with Some_access_type, 
> it could be an imported limited type, it could be a locally defined.
> Then we have the procedure:
> 
>     Procedure Insert_data_to_tree(Tree : in out Tree_type;
>                                   Data_of_some_sort : in Any_type); 
                                                           ^^^^^^^^

                                        You mean Data_type, right?

>     Reference : Some_access_type;

                     How about "access Data_Type"?  It is illegal to
                     reference Some_access_type directly, since it
                     is not a generic formal parameter.  We can only 
                     reference Data_Type, which in this particular 
                     instantiation will refer to Some_access_type.

>     Begin
>        Reference := new Some_type;  -- Or subprogram call for external type

           Let's say "Reference := new Data_Type;", to make things legal.

>        Prepare(Reference, Data_of_some_sort);

           Prepare is not a generic parameter; I frankly don't see
           why you need a call to such a procedure anyway.  I'll replace
           it with "Assign (Reference.all, Data_of_some_sort);". 
 
>        Insert(Tree, Reference);

           What exactly is Insert doing?  It would make more sense to write:

            begin
              -- find the node for which the new value is to
              --   become either the leftmost or rightmost child,
              --   create a new child, and assign the new data value
              --   to the appropriate field of the new node.  At some
              --   point in this process, there may be some pointer 
              --   manipulations; it really doesn't matter where.  But
              --   let's assume the tree is initially empty.  Then, 
              TREE := new BINARY_TREE_NODE;
              ASSIGN (NEW_NODE.DATA_VALUE, DATA_OF_SOME_SORT);
              -- since pointer types are automatically initialized to
              --   null, we don't have to worry about initializing
              --   the left child and right child fields, so we're done.
           end INSERT;
 
>     End Insert_data_to_tree;
>     
> The Assign procedure which we provide when we instantiate the tree handler 
> is simply:
>    Procedure Assign(A : in out Some_access_type; B : in Some_access_type) is
>    Begin
>       A := B;
>    End Assign;
>    
> Now, what happens with Reference.all on exit of Insert_to_data_tree, with    
> your scheme? (Assume there is no user-written DESTROY procedure.) Would
> Reference.all be deallocated although there still is a living reference  
> to it in Tree? 

      First of all, I always require the user to pass in a DESTROY procedure.
      So my initial reaction is that this is a malformed ADT.  But let's
      assume the existence of such a malformed ADT.  We'll assume further
      that we have a pointer variable such as Reference.  Destruction of
      a pointer consists simply of destroying the pointer, and does not
      imply destruction of what it points to.  Destruction of an ADT
      which happens to be implemented via an initial pointer (as in Tree)
      does imply destroying the transitive closure of what it points to.

>>> I forgot: When the system chokes with "heap full" and you as a maintainer
>>> is trying to find where all space is being wasted. With garbage collection
>>> you at least know that all this memory is being used. Without, you don't 
>>> know if it is just memory leakage or real overuse.
>>
>>     Advanced debugging tools should be used to address this issue.
> 
> So why can't these "advanced debugging tools" be used to "pin down
> the exact state of the system" when we have garbage collection?

     They can, but only the exact state of one particular execution,
     and not the precise state of the theoretical system in general.

     A debugging tool exists simply to help you find errors which show
     up in one particular situation.  It does not help you understand
     the piece of software as a theoretical structure, which is essential
     if one is to perform nontrivial modifications to a system.  It is
     this theoretical imprecision which causes problems.

> 
>>   I'm quite aware of Eiffel, and I have just shown (in another article)
>>   how reliance upon garbage collection is a sure way to get your ADT
>>   rejected by users who are programming critical applications.  This
> 
> There is good chance that your ADT will be rejected anyway. The critical 
> developper looks at you and says: "You use Unchecked_dellocation? Sorry, 
> then I can't use your ADT. Memory fragmention will cause too great a 
> time penalty when the system have lived for a while." 

     Allocation/deallocation routines exists which automatically
     collapse adjacent blocks of free storage into larger blocks,
     which minimizes the problem.  In general, this is the *compaction*
     question, which is orthogonal to the GC question.  It is properly
     directed at the allocation/deallocation system and not at the
     ADT designer.  It is conceivable that an allocation/deallocation
     system could be designed in such a way as to spread the costs of
     compaction evenly over each allocation/deallocation request, thus
     meeting all requirements at the same time. 


                                            Bill Wolfe

                                    wtwolfe@hubcap.clemson.edu

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/05/89)

From article <124000028@inmet>, by ryer@inmet.UUCP:
> Currently, Ada neither requires nor precludes garbage collection.  [...] 
> What are you (any of you) proposing?  That ALL Ada compilers should have
> garbage collection?  Than no Ada compilers should be allowed to have it?
> Each day I find a new collection of comments on this note, debating the
> goodness of garbage collection in the abstract.  I think that the Ada 
> language has already taken the only defensible position, and would be
> happy to explain why if anyone wants to take a different concrete position.

    OK, I'll take the position that Ada should be defined such that
    no Ada compilers should be allowed to have garbage collection.

    I charge that GC encourages the production of sloppy code which
    is difficult to maintain and which compiles into permanently inefficient
    programs, that code can be written by space-conscious programmers
    at least as fast as by those who use GC as a crutch, and that such
    code will require less debugging time due to the more complete 
    understanding of the problem which is possessed by a programmer 
    who comprehends all aspects (both space and time) of his/her code,
    and that GC retards portability by making code non-reuseable from
    the perspective of real-time programmers.

    For all these reasons, I conclude that GC should be prohibited, 
    just as the GOTO should be (but unfortunately, is not yet) prohibited. 
    
    Your turn.



                                            Bill Wolfe

                                    wtwolfe@hubcap.clemson.edu

sommar@enea.se (Erland Sommarskog) (01/06/89)

Mike Ryer (ryer@inmet.UUCP) writes:
>Currently, Ada neither requires nor precludes garbage collection.  For 
>example, the Symbolics Ada compiler provides it (as well they should), but
>none of the eight cross-compilers for the 1750A computer does (as they 
>shouldn't).
>
>What are you (any of you) proposing?  That ALL Ada compilers should have
>garbage collection?  Than no Ada compilers should be allowed to have it?
>Pragmas?

I know about nothing about 1750, but I assume it's typically used
for those "embedded systems" that Ada was intended for. I can agree
that garbage collection is virtually of no interest on such a machine
and not worth the development costs.
  On the other hand if I want to port my interactive program from VAX/VMS
to SunOS, I can't with the rules of today rely on garbage collection, 
but have to stick to Unchecked_deallocation.
  So it would be nice if garbage collection was required to be available, 
or at least recommended. How it should be controlled (on or off) is a more 
tricky question. Pragmas is maybe not the best of ideas since to some extent 
it applies to the entire system.
  Of course, if practice turns out that all compilers except those for 
strict real-time use have garbage collection, then such a rule in the
LRM is unnecessary. Unfortunately, this doesn't seems to be the case
today.
-- 
Erland Sommarskog
ENEA Data, Stockholm              This signature is not to be quoted.
sommar@enea.se

ryer@inmet.UUCP (01/07/89)

Why some Ada compilers should be allowed to perform garbage collection:

a. Because they are to generate code that coexists in an environment with
languages like LISP where garbage collection is assumed.  They want to
operate on data structures shared with other languages.

b. Because the current world supply of "reusable" Ada components have
not all been written carefully for space management, and in prototyping
any reusable component (that works) may be better than none.

c. Because some users are very unconcerned with the efficiency or
reusability of their programs.  This applies to the run-only-once
programs such as in a training environment, or where a computer is used
as an oversized desk calculator just to obtain numeric answers.  It may
be that all computer science students should become careful implementers
of ADTs.  I doubt that all particle physicists need to develop this
discipline even though they may write substantial programs.

d. It Is FEASIBLE for the computer to do a good job of garbage collection
automatically and it does reduce the size of the source code which does
tend to reduce the life cycle cost of owning that code.  It may increase
execution time but some users will prefer this tradeoff.

Therefore, garbage collection should not be prohibited, even though it
is inappropriate and harmful in some cases.  Different applications
can benefit from different compilers.

Mike Ryer
Intermetrics

sommar@enea.se (Erland Sommarskog) (01/07/89)

For the third time I asked Bill Wolfe how his idea with deallocation
on block exit. With his last answer, I realized that due to unclarity
from my side, he may have misunderstood my example. My apologies to
you all, and Bill in particular. 

So I try again, and see if I can get it right:
We have:
     Generic
        Data_type is limited private;
        with Assign, "<", ">"    -- What you expect 
     Package Binary_trees is
        Tree_type is private;
        Procedure Insert(Tree in out : Tree_type; Data : in Data_type);
        ...
     End Binary_trees;
     
     Package Use_trees is  -- Could be a procedure. Could be generic.
        Type Any_type;     -- Could also be generic parameter, or imported.
        Type Some_type;
        Type Some_access_type is access Some_type;
        Procedure Assign(A : in out Some_access_type; 
                         B : in     Some_access_type) is 
        Begin A := B End Assign; 
        -- Also add "<" ">" to mean comparison on Any_type here.
        Package Tree_handler is new Binary_trees(Some_access_type, Assign);
        ...
        Procedure Put_something_in_tree(Tree : in out Tree_type;
                                        Anything : Any_type) is
        Reference : Some_access_type;                                
        Begin                                 
           Reference := new Some_type;
           Do_something(Reference.all, Anything);
           Tree_handler.Insert(Tree, Reference);
        End;
        ...
    End Use_trees;
    
Question: What happens with Reference.all when Put_something_in_tree
is exited? I read your proposal as that it should be deallocated, 
which would be disastrous. Or is this a misunderstanding from my 
side? If it is not: is this code erroneous with your scheme? 
  Side note: you said that the tree package should take a Destroy
procedure. For what benefit? You can't automatically call it when
your Destroy_tree is called. The caller might refer to the stored 
objects in some other place. Or does your Destroy_tree have a special
parameter to control this? Seems like adding two parameters (one
generic and one routine) that aren't necessary, but clutters up the
user's code. The user can always traverse the tree and destroy what
he has there.
-- 
Erland Sommarskog
ENEA Data, Stockholm              This signature is not to be quoted.
sommar@enea.se

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/09/89)

From article <4222@enea.se>, by sommar@enea.se (Erland Sommarskog):
> [Still discussing Erland's example regarding automatic destruction
>  of a local environment upon block exit.  Erland now states precisely
>  that he is instantiating a tree which stores objects of type pointer;
>  in other words, we are dealing with a tree of pointers.  The example
>  continues with a situation in which a pointer is being inserted into
>  the tree.]
>
>         Procedure Put_something_in_tree(Tree : in out Tree_type;
>                                         Anything : Any_type) is
>         Reference : Some_access_type;                                
>         Begin                                 
>            Reference := new Some_type;
>            Do_something(Reference.all, Anything);
>            Tree_handler.Insert(Tree, Reference);
>         End;
>     
> Question: What happens with Reference.all when Put_something_in_tree
> is exited? I read your proposal as that it should be deallocated, 
> which would be disastrous. Or is this a misunderstanding from my 
> side? If it is not: is this code erroneous with your scheme? 

   It is a misunderstanding from your side.  I stated in the preceding
   version of the example (or maybe the first version, or both) that
   since Reference is a simple pointer, destruction of Reference consists
   only of eliminating the space occupied by the pointer, and does not
   imply destruction of what Reference points to.  Now let's consider: 

      procedure EXAMPLE (INPUT : SOME_ADT) is

         TEMP : SOME_ADT;    -- now TEMP happens to be *implemented*
                             --  as a pointer to TEMP's descriptor,
                             --  which in turn contains other pointers
                             --  to TEMP's value.  But since TEMP is
                             --  limited private, our user doesn't know that.

         ACCESS_AN_INTEGER : access INTEGER;   -- a local object
         
      begin
         ASSIGN (TEMP, SOME_OPERATION (SOME_ADT)); 
         ACCESS_AN_INTEGER := new INTEGER := SIZE_OF (TEMP);    
            -- notice the allocation...
         DO_SOMETHING_WITH (TEMP, SOME_ADT, ACCESS_AN_INTEGER);
         DO_SOMETHING_ELSE_WITH (TEMP, ACCESS_AN_INTEGER);
            --
            -- notice failure to deallocate ACCESS_AN_INTEGER.all...
            --
      end EXAMPLE;  -- OK, now at this point we are done with TEMP.  It is a 
                    --   local variable, and therefore its DESTROY procedure 
                    --   will be automatically invoked.  We are also done with 
                    --   ACCESS_AN_INTEGER, so ACCESS_AN_INTEGER will also be 
                    --   destroyed.  But since ACCESS_AN_INTEGER is only a 
                    --   pointer, this leaves the integer value to become 
                    --   garbage!  We conclude that our user requires 
                    --   more programming experience, so that he/she 
                    --   might more completely comprehend the characteristics 
                    --   of the programming tool known as the pointer.

>   Side note: you said that the tree package should take a Destroy
> procedure. For what benefit? You can't automatically call it when
> your Destroy_tree is called. The caller might refer to the stored 
> objects in some other place. 

    You're forgetting that the caller completely controls what happens
    inside that caller-supplied DESTROY procedure!  If the caller so
    desires, the DESTROY procedure can consist of a null statement;
    then only the routine destruction of the space occupied by the
    pointer will occur, as an effect of the call to UNCHECKED_DEALLOCATION
    which destroys the space occupied by the record describing a given node.

    Secure in the knowledge that the caller is responsible for prescribing
    the destruction procedure, Destroy_Tree will in fact call Destroy
    automatically on each and every instance of a stored object.



                                        Bill Wolfe

                                wtwolfe@hubcap.clemson.edu

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/09/89)

From article <124000031@inmet>, by ryer@inmet.UUCP:
> 
> Why some Ada compilers should be allowed to perform garbage collection:
> 
> a. Because they are to generate code that coexists in an environment with
> languages like LISP where garbage collection is assumed.  They want to
> operate on data structures shared with other languages.

     Assume that the Ada code cleans up after itself.  Then that code
     can exist perfectly in a Lisp environment; since it does not
     contribute to the garbage problem, the Ada code will be a
     very welcome guest indeed.  

> b. Because the current world supply of "reusable" Ada components have
> not all been written carefully for space management, and in prototyping
> any reusable component (that works) may be better than none.

     However, I'll bet that most have.  At any rate, once New Ada
     emerges, you can bet there will be an immediate upgrading
     taking place among all the component vendors; the upgraded
     components will be available long before the New Ada compilers.

> c. Because some users are very unconcerned with the efficiency or
> reusability of their programs.  This applies to the run-only-once
> programs such as in a training environment, or where a computer is used
> as an oversized desk calculator just to obtain numeric answers.  It may
> be that all computer science students should become careful implementers
> of ADTs.  I doubt that all particle physicists need to develop this
> discipline even though they may write substantial programs.

     Let's face it: with generics, multitasking, and so on, Ada is
     a language for professionals.  Most nonprofessionals should be using
     application-specific languages, which are implemented in... Ada. 

     Particle physicists usually require high-performance software,
     and they should therefore turn to professional application developers.

> d. It Is FEASIBLE for the computer to do a good job of garbage collection
> automatically and it does reduce the size of the source code which does
> tend to reduce the life cycle cost of owning that code.  It may increase
> execution time but some users will prefer this tradeoff.

     The reduction in size is trivial.  Additionally, I contend that there
     are positive side effects with regard to a more complete programmer
     understanding of the problem, which leads to better overall code.

     Furthermore, the use of the ADT paradigm, in conjunction with 
     the automatic destruction of local environments, means that
     application programmers will almost never have to deal with deallocation
     on an explicit basis anyway.  This will provide positive incentive
     to use professionally designed ADTs, leading once again to higher-
     quality code.

> Therefore, garbage collection should not be prohibited, even though it
> is inappropriate and harmful in some cases. 

     I'll heartily agree that GC is inappropriate and harmful, but
     you still haven't proven your case.


                                     Bill Wolfe

                              wtwolfe@hubcap.clemson.edu

barmar@think.COM (Barry Margolin) (01/09/89)

In article <4041@hubcap.UUCP> billwolf@hubcap.clemson.edu writes:
>         ASSIGN (TEMP, SOME_OPERATION (SOME_ADT)); 
>         ACCESS_AN_INTEGER := new INTEGER := SIZE_OF (TEMP);    
>            -- notice the allocation...
>         DO_SOMETHING_WITH (TEMP, SOME_ADT, ACCESS_AN_INTEGER);
>         DO_SOMETHING_ELSE_WITH (TEMP, ACCESS_AN_INTEGER);
>            --
>            -- notice failure to deallocate ACCESS_AN_INTEGER.all...

But what if DO_SOMETHING_WITH and/or DO_SOMETHING_ELSE_WITH makes a
copy of its third parameter in some permanent data structure?  If
ACCESS_AN_INTEGER.all were to be deallocated at this point, you would
end up with a dangling pointer.  Or should these routines be required
to also copy THIRD_ARGUMENT.all, too?

My understanding is that the Ada designers didn't originally espouse
manual deallocation (hence the name UNCHECKED_DEALLOCATION) because it
is an easy way to write erroneous programs, and there's no way to
statically check for this error at compile time.  Forgetting to
deallocate something also produces an incorrect program (in non-GC
environments), but the failure modes are less frequent.  Also, there's
a relatively simple way to fix the out-of-memory problem that results:
add more memory; finding dangling references is often much harder.

I've used both GC and non-GC languages.  I've done lots of programming
in PL/I, and now I'm a Lisp programmer.  Personally, I've had few
problems with dangling pointers in PL/I, but most of the programming
didn't make use of ADT-style programming.  The one ADT-style PL/I
program I was involved with (the Multics mail system utilities) solved
its problem by using reference counts (luckily, there are no potential
circular reference chains) in all the data structures.

But I've also seen the freedom allowed by a garbage collector in the
Symbolics Lisp Machine software.  It means that unrelated programs can
share data structures without preplanning.  A fundamental assumption
of manual deallocation is that when the allocator of a structure is
through with it, so is everyone else.  But when references to these
structures are passed around the references may be copied.  Unless you
want to require everyone to copy those structures you can't assume
that the allocator knows when it's safe to deallocate.

By the way, for systems programming and performance purposes, the
Symbolics environment provides a number of manual
allocation/deallocation mechanisms.  A facility called "resources"
allows you to maintain a free-list of similar structures, and allocate
and deallocate from that list instead of from the heap.  This tends to
be used for large structures that are allocated and become garbage at
a very high rate, such as network packets and disk buffers.  There's
even a primitive for deallocating a structure from the heap, although
in the current implementation it does nothing if the structure isn't
the youngest object in its region of the heap, because it simply moves
the region's allocation pointer to the beginning of the object (if it
were to do this with a non-youngest object, it would deallocate
younger objects in the process); I've never actually seen this
facility used, though, and I think it's only there for compatibility
with PDP-10 MacLisp (the MacLisp GC put garbage onto a free-list, and
there was a primitive to add an arbitrary object to this).

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/10/89)

From article <35278@think.UUCP>, by barmar@think.COM (Barry Margolin):
> In article <4041@hubcap.UUCP> billwolf@hubcap.clemson.edu writes:
>>         ASSIGN (TEMP, SOME_OPERATION (SOME_ADT)); 
>>         ACCESS_AN_INTEGER := new INTEGER := SIZE_OF (TEMP);    
>>            -- notice the allocation...
>>         DO_SOMETHING_WITH (TEMP, SOME_ADT, ACCESS_AN_INTEGER);
>>         DO_SOMETHING_ELSE_WITH (TEMP, ACCESS_AN_INTEGER);
>>            --
>>            -- notice failure to deallocate ACCESS_AN_INTEGER.all...
> 
> But what if DO_SOMETHING_WITH and/or DO_SOMETHING_ELSE_WITH makes a
> copy of its third parameter in some permanent data structure?  If
> ACCESS_AN_INTEGER.all were to be deallocated at this point, you would
> end up with a dangling pointer.  Or should these routines be required
> to also copy THIRD_ARGUMENT.all, too?

    Some of you appear to be laboring under the false impression that
    when one passes a pointer into a procedure, one will have no idea
    what is to be done with that pointer.  Let me assure you that before
    I pass a pointer into *any* procedure, that procedure had damn well
    better declare its intentions with respect to that pointer.  Some
    possibilities are:

       -- The pointer must be the sole pointer to the object in question.
       -- This procedure then claims sole possession of that object;
       -- the pointer you pass in will therefore be read and then set to null.

       -- The pointer will be maintained as a reference to the object in qu
       -- question.  The caller is responsible for ensuring that the object
       -- is not destroyed without first making a call to REMOVE_OBJECT
       -- in order to destroy this reference to the object in question.

       -- The pointer must be the sole pointer to the object in question.
       -- This procedure will read the object pointed to, and then destroy it;
       -- the pointer you pass in will therefore be read and then set to null.

   ***********************************************
   ** The answer, my friends, is DOCUMENTATION. **
   ***********************************************
 
> My understanding is that the Ada designers didn't originally espouse
> manual deallocation (hence the name UNCHECKED_DEALLOCATION) because it
> is an easy way to write erroneous programs, and there's no way to
> statically check for this error at compile time.  

    Failure to deallocate is a logic error; like other logic errors,
    they are best detected by desk checking (primary line of defense),
    and by using a debugger as a last resort.

> I've used both GC and non-GC languages.  I've done lots of programming
> in PL/I, and now I'm a Lisp programmer.  Personally, I've had few
> problems with dangling pointers in PL/I, but most of the programming
> didn't make use of ADT-style programming.  

     Personally, I've had no problems with dangling pointers in Ada,
     due to the complexity management provided by the ADT paradigm.

> But I've also seen the freedom allowed by a garbage collector in the
> Symbolics Lisp Machine software.  It means that unrelated programs can
> share data structures without preplanning.  

     You mean "without documentation".  Thanks, but no thanks. 

A fundamental assumption
> of manual deallocation is that when the allocator of a structure is
> through with it, so is everyone else.  But when references to these
> structures are passed around the references may be copied.  Unless you
> want to require everyone to copy those structures you can't assume
> that the allocator knows when it's safe to deallocate.

     Sure I can, provided the lost art of documentation is practiced
     from time to time...  (Gee, I sure hope I don't ever have to 
     program with YOUR colleagues...)



                                        Bill Wolfe

                                 wtwolfe@hubcap.clemson.edu

barmar@think.COM (Barry Margolin) (01/10/89)

In article <4050@hubcap.UUCP> billwolf@hubcap.clemson.edu writes:
>A fundamental assumption
>> of manual deallocation is that when the allocator of a structure is
>> through with it, so is everyone else.  But when references to these
>> structures are passed around the references may be copied.  Unless you
>> want to require everyone to copy those structures you can't assume
>> that the allocator knows when it's safe to deallocate.
>
>     Sure I can, provided the lost art of documentation is practiced
>     from time to time...  (Gee, I sure hope I don't ever have to 
>     program with YOUR colleagues...)

I had a feeling you'd say that.  Unfortunately, at the time, I
couldn't think of a better example.  Here's one (I hope):

There are several programs running in a common address space.  Call
them COMMAND_PROCESSOR, COMPILER, and EDITOR.  There is a global data
structure, COMPILER_WARNINGS; the operations on this database include
adding warnings, selecting warnings based on various criteria, and
deleting selected warnings from the database.  COMPILER adds warnings,
as one would expect.  EDITOR has operations that make use of the data
in COMPILER_WARNINGS; for instance, you can display the warnings for
the current file and have the editor position you to the appropriate
source line.  And COMMAND_PROCESSOR has a command to clear out
COMPILER_WARNINGS (which selects all warnings and then deletes them).

The natural way to implement the COMPILER_WARNINGS database is as an
array or list of pointers to warning structures.  The ADD operation
copies the pointer it's given, and is documented so that the caller
knows not to try to deallocate the object it added.  SELECT simply
returns newly allocated pointers to the selected warnings.  DELETE
objviously should deallocate the pointer and remove it from
COMPILER_WARNINGS, but what about pointer.all?

If someone has called SELECT and it has returned a pointer to a
particular warning, DELETE should not deallocate pointer.all, because
there is another handle to it.  EDITOR should be permitted to continue
to display and manipulate selected warnings even if they've since been
deleted from the global database.

Reference counts can be used to make this work.  Every time a warning
is selected, its reference count can be incremented, and it can be
decremented when it is DELETEd, and DELETE can deallocate pointer.all
when it drops below 0.  Callers of SELECT are required to DELETE all
the selected warnings when they are done with them.  There would also
need to be a different DELETE operation to specify that it should be
removed from the COMPILER_WARNINGS database, to be used by the
COMMAND_PROCESSOR command.  But I don't think this is what you are
advocating.

Can this be solved with only documentation?  Or is this an ugly
example that only a Lisp afficionado would think is clean?

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/10/89)

From article <35300@think.UUCP>, by barmar@think.COM (Barry Margolin):
> There are several programs running in a common address space.  Call
> them COMMAND_PROCESSOR, COMPILER, and EDITOR.  There is a global data
> structure, COMPILER_WARNINGS; the operations on this database include
> adding warnings, selecting warnings based on various criteria, and
> deleting selected warnings from the database.  COMPILER adds warnings,
> as one would expect.  EDITOR has operations that make use of the data
> in COMPILER_WARNINGS; for instance, you can display the warnings for
> the current file and have the editor position you to the appropriate
> source line.  And COMMAND_PROCESSOR has a command to clear out
> COMPILER_WARNINGS (which selects all warnings and then deletes them).
> 
% The natural way to implement the COMPILER_WARNINGS database is as an
% array or list of pointers to warning structures.  The ADD operation
% copies the pointer it's given, and is documented so that the caller
% knows not to try to deallocate the object it added.  SELECT simply
% returns newly allocated pointers to the selected warnings.  DELETE
% objviously should deallocate the pointer and remove it from
% COMPILER_WARNINGS, but what about pointer.all?
% 
% If someone has called SELECT and it has returned a pointer to a
% particular warning, DELETE should not deallocate pointer.all, because
% there is another handle to it.  EDITOR should be permitted to continue
% to display and manipulate selected warnings even if they've since been
% deleted from the global database.   [...]
% 
% Can this be solved with only documentation?  Or is this an ugly
% example that only a Lisp afficionado would think is clean?

    It's in severe need of a cleanup.

    If I understand the problem correctly, we have a command processor,
    a compiler, an editor, and some set of files upon which the compiler
    and editor are to operate.  When a file is sent through the compiler,
    the compiler generates a warning structure with respect to that file.
    The editor then makes use of these warning structures during debugging.
    Furthermore, a command exists whereby the warning structures for given 
    files are destroyed, and this command must not interfere with editor
    processes which are still alive.

    Let's assume the existence of a suitably implemented warning container,
    which has been appropriately hardened for use as a shared variable.
    Our compiler proceeds by read-locking the file and write-locking the
    warning structure; having acquired the locks, the warning structure
    for that particular file is updated, and the locks are then released.
    Our editor proceeds by write-locking the file and read-locking the
    warning structure, updating the file, and then releasing all locks.
    Our warning destroyer proceeds by write-locking and then destroying
    the specified warning structure(s).

    That was almost too easy.  Did I misunderstand the problem?



                                       Bill Wolfe

                                wtwolfe@hubcap.clemson.edu

sommar@enea.se (Erland Sommarskog) (01/11/89)

This is where this discussion ends for my part. It has been going
on far too long. It seems quite obvious that no one here, at least
not me, will succeed in converting Bill Wolfe to believe that garbage 
collection is good. And it does not seem like he will convert anyone 
either.

I'm glad to hear that Reference.all will survive the block exit.
I am still a little puzzled, though. Assume that instead of a "simple 
pointer" we actually have a linked list, declared as (limited) 
private in some package. Then, it's Destroy routine will be called
at block exit. Now, I don't know how you folks write your Assign
procedures for linked lists, but I do simple pointer copying. (And
I assume that Bill Wolfe does the same, since he is so keen on that
his ADTs will be useful in a real-time environment. Copying value
and allocating memory for the new lists would be as harmful as
garbage collection, if not worse.) This means of course that my
implicit Destroy procedure must be just NULL(*). Then I have an another
for the user to call when is tired of his list.
  Excuse me, but is call-destroy-on-block-exit really a useful device? 
Seems like adding something to the langauge which you almost should never 
use. 

And same goes for the user to supply a Destroy procedure to the 
generic package. Since its body often will be NULL, its appearance
in the instantiating package is just white noise.

(*) This isn't really true. If the reference assignment also includes
updating of reference counters, we can check them in the Destroy
procedure. We have now drifted in to the land of poor man's garbage
collection, where to have to very careful everytime so we are not
doing something stupid.

Finally, Bill Wolfe has claimed that not having garbage collection
helps the programmer to understands all aspects of his code, both
time and space. Eh, there are some more aspects on the code than
so. For example, that it really does what it supposed to. It may very
elegantly sweep away all used memory and that, but tell us that
2 + 2 = 5. No good program. Fact is, in many applications, space is
seldom an important problem, if at all. All you want to be sure of
is that you really got rid all of you allocated, so that your interactive
user donesn't get a dump after five hours work.

And, oh, the world is not divided into ADT implementors and application
programmers. It just doesn't look that way.
-- 
Erland Sommarskog
ENEA Data, Stockholm              This signature is not to be quoted.
sommar@enea.se

barmar@think.COM (Barry Margolin) (01/11/89)

In article <4054@hubcap.UUCP> billwolf@hubcap.clemson.edu writes:
>    Let's assume the existence of a suitably implemented warning container,
>    which has been appropriately hardened for use as a shared variable.
>    Our compiler proceeds by read-locking the file and write-locking the
>    warning structure; having acquired the locks, the warning structure
>    for that particular file is updated, and the locks are then released.
>    Our editor proceeds by write-locking the file and read-locking the
>    warning structure, updating the file, and then releasing all locks.
>    Our warning destroyer proceeds by write-locking and then destroying
>    the specified warning structure(s).
>
>    That was almost too easy.  Did I misunderstand the problem?

First of all, whether the file is locked is immaterial to the
discussion (I never actually said that the compiler compiles files --
in Lisp the compiler can also be invoked on in-core interpreted
functions, and many Lisp programming environments allow in-core editor
buffers to be compiled).

However, I did imagine locking of the COMPILER_WARNINGS database.  I
expected that it would be read-locked during the operation of the
SELECT operation, and write-locked during ADD and DELETE.  But once
SELECT returns, the lock would be released.  The lock serves to
protect the collection structure (either an array or linked list,
probably), but not the individual elements (since there are no
operations that modify an existing warning object).

So, I envision that the editor proceeds by read-locking the warnings
structure, selecting the appropriate warnings, and then releasing the
lock.  It then displays the selected warnings to the user and allows
him to perform operations such as "go to line referenced in next
warning".

In many traditional systems the warnings database is a text file
containing the output of the compiler, and the SELECT operation works
by reading this file and parsing it into a set of warnings structures
internal to the editor.  The command processor's DELETE_WARNINGS
operation is then just a Delete File command.  However, this mechanism
effectively involves a copy operation (the contents of the warning
file are copied into the editor's memory), and it is this copy that
I'm trying to avoid by having the editor and compiler in the same
address space and allowing them to share the warning object
structures.

If we do put read and write locks on the individual warning objects,
we have effectively implemented reference counts.  The multiple-reader
locking schemes I'm familiar with increment a counter or add an
element to some list every time a reader grabs the lock, and do the
opposite when unlocking, so that they will know when all the readers
are gone.  So, if your answer is that reference counts are the right
solution, just say so.  It's possibly the case that all well-designed
applications can be implemented using either manual or ref-count-based
deallocation.  However, reference counts add a significant fixed
overhead to every assignment of a ref-count-protected data type; what
was probably a single instruction becomes:
	if target not null
	   decrement target.refcount
	   if target.refcount = 0 then deallocate target.all
	increment source.refcount
	target = source

Most modern GC schemes have time overhead that is a function (often
linear) of the frequency of allocation.  Since assignments are
always more frequent than allocations, and I suspect usually MUCH more
frequent, this difference is important.

Ada prevents many of the worst types of pointer problems by only
allowing pointers to be created by the heap allocation primitive
(NEW).  In other languages, where pointers to existing objects may be
created (C's "&", PL/I's "ADDR") things can get pretty bad without a
GC.  I ran into such a problem last year when I was trying to use a
subroutine library originally designed for display of a database in a
facility for editing that database.  The database is implemented as a
text file, with each line representing a record, and each field of the
record separated by a delimiter character.  The programs are in C, and
the subroutine for reading the database into memory read each line
into a newly allocated string, replaced each delimiter with a null
character (C's string terminator), and allocated structures containing
pointers to the first character of each field.  This was fine for the
read-only applications, as they could simply deallocate all the
strings and record structures that were allocated (they didn't
actually bother, since the programs run on Unix and they depended on
everything being deallocated when the program terminated).  I tried to
write a program that reads in this database and then allows the user
to edit fields.  If the new field value is shorter than the original,
I simply overwrote the original value; if not, I allocated a new
string for the new value, and changed the record's field pointer to
point there instead of into the middle of the original line.  But when
it came time to deallocate, I needed to know whether individual fields
needed to be deallocated independently of the lines.  Had I gotten
around to finishing this program, I probably would have added a
parallel data structure containing flags indicating whether each field
had been reallocated.

In general, what I see coming out of this is that manual deallocation
may always be possible, but it frequently requires a non-trivial
amount of extra coding to provide bookkeeping to let the program know
when it is safe to deallocate things.  Remember, every additional line
of code in a program increases its complexity and is another potential
failure point.  And details such as storage management are generally
uninteresting to the application programmer, who is probably more
interested in the problem domain, so it is likely that he won't be as
careful when designing, coding, and testing them (storage management
is also difficult to test, unless you or the runtime library include
extra mechanisms just to improve debuggability).  Garbage collectors,
on the other hand, are usually implemented by system programmers who
find them interesting and can be expected to care enough to do the
very best.  And if you don't think these psychological factors are
important, then you're working with programmers of a different species
than I'm familiar with (that reminds me, I wonder where my "Psychology
of Computer Programming" is -- probably in my parents' house,
unfortunately, or I'd look up references), and I wouldn't want to work
with them!

In conclusion, I expect that I can manage storage as well as you can.
And I could also manually convert a number to a character string and
send these characters to an I/O device.  Every major language provides
runtime support for the latter so we don't all have to waste our time
writing it.  What's so special about storage management that we
should all be burdened with it?

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/12/89)

From article <35328@think.UUCP>, by barmar@think.COM (Barry Margolin):
> First of all, whether the file is locked is immaterial to the
> discussion (I never actually said that the compiler compiles files --
> in Lisp the compiler can also be invoked on in-core interpreted
> functions, and many Lisp programming environments allow in-core editor
> buffers to be compiled).

    Whatever is being processed, be it a file or something else,
    would have to be locked.  If it is already inaccessible to
    all other processes, then it is continuously locked already.

> [discussion of copying vs. read-locking]

    All editors I know of work by copying the targeted file into
    memory, holding it there while it's being modified, and then
    writing it back to the file.  Another approach is the locking
    mechanism.  Since compiler warnings generally amount to very
    small text files, I'd probably go the copying route if locking
    /unlocking consumed too much time.  However, a good argument 
    can also be made that there should not be 3 million people 
    simultaneously editing and/or compiling the same file anyway, 
    so it probably makes little difference which method is chosen. 

> Most modern GC schemes have time overhead that is a function (often
> linear) of the frequency of allocation.  Since assignments are
> always more frequent than allocations, and I suspect usually MUCH more
> frequent, this difference is important.

    No, not just the frequency of allocation.  GC's performance also
    depends upon the frequency of running out of memory.  Furthermore,
    GC is a global mechanism, and it wastes much time scanning space
    which is already being properly managed.  
 
> the subroutine for reading the database into memory read each line
> into a newly allocated string, replaced each delimiter with a null
> character (C's string terminator), and allocated structures containing
> pointers to the first character of each field.  This was fine for the
> read-only applications, as they could simply deallocate all the
> strings and record structures that were allocated (they didn't
> actually bother, since the programs run on Unix and they depended on
> everything being deallocated when the program terminated).  I tried to
> write a program that reads in this database and then allows the user
> to edit fields.  If the new field value is shorter than the original,
> I simply overwrote the original value; if not, I allocated a new
> string for the new value, and changed the record's field pointer to
> point there instead of into the middle of the original line.  But when
> it came time to deallocate, I needed to know whether individual fields
> needed to be deallocated independently of the lines.  Had I gotten
> around to finishing this program, I probably would have added a
> parallel data structure containing flags indicating whether each field
> had been reallocated.

     Why was each line read into a newly allocated monolithic string,
     with pointers into this string?  It would seem far more sensible
     to read each *field* into a newly allocated string; then when we
     need to revise a field to a larger value, deallocate the old string 
     and allocate a new one.  Flags are not necessary.

> In general, what I see coming out of this is that manual deallocation
> may always be possible, but it frequently requires a non-trivial
> amount of extra coding to provide bookkeeping to let the program know
> when it is safe to deallocate things.  Remember, every additional line
> of code in a program increases its complexity and is another potential
> failure point.  And details such as storage management are generally
> uninteresting to the application programmer, who is probably more
> interested in the problem domain

     Which is why application programmers should make use of ADTs,
     which encapsulate and hide the details of storage management.

> In conclusion, I expect that I can manage storage as well as you can.
> And I could also manually convert a number to a character string and
> send these characters to an I/O device.  Every major language provides
> runtime support for the latter so we don't all have to waste our time
> writing it.  What's so special about storage management that we
> should all be burdened with it?

    Conversion of a number to a character string will perform identically
    whether we use a system routine or code that same routine ourselves.

    Deallocation of storage is a one-time cost (and a small one at that)
    if done by the programmer.  Given the implicit destruction of local
    environments and the use of ADTs, application programmers will 
    practically never have to do any explicit deallocation anyway.  
    When it is necessary, it's not that difficult.  In the database example,
    a single line of code would suffice.



                                     Bill Wolfe

                               wtwolfe@hubcap.clemson.edu

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/12/89)

From article <4232@enea.se>, by sommar@enea.se (Erland Sommarskog):
> I'm glad to hear that Reference.all will survive the block exit.
> I am still a little puzzled, though. Assume that instead of a "simple 
> pointer" we actually have a linked list, declared as (limited) 
> private in some package. Then, it's Destroy routine will be called
> at block exit. Now, I don't know how you folks write your Assign
> procedures for linked lists, but I do simple pointer copying. 

    I copy the values, whatever they are.  If they are pointers,
    then pointers will be copied.  If they are ADTs, then ADTs
    will be copied.  If the user can't tolerate the costs of copying,
    the user is free to make his/her list a list of pointers rather
    than a list of objects.  

    Pointer copying constitutes structural sharing, which is unacceptable;
    if a user updates the copy, that must not propagate back to the original.

> Finally, Bill Wolfe has claimed that not having garbage collection
> helps the programmer to understands all aspects of his code, both
> time and space. Eh, there are some more aspects on the code than
> so. For example, that it really does what it supposed to. It may very
> elegantly sweep away all used memory and that, but tell us that
> 2 + 2 = 5. No good program. Fact is, in many applications, space is
> seldom an important problem, if at all. All you want to be sure of
> is that you really got rid all of you allocated, so that your interactive
> user donesn't get a dump after five hours work.

    Precisely why implicit destruction is a valuable aid to programmers,
    leaving them secure in the knowledge that whatever local variables
    they create will be automatically destroyed upon block exit, and free
    to concentrate on the other aspects of the application such as how
    much space the procedure will consume while it is active, which then
    feeds into the overall order-of-magnitude figure for space consumption.  



                                  Bill Wolfe

                           wtwolfe@hubcap.clemson.edu

barmar@think.COM (Barry Margolin) (01/12/89)

In article <4066@hubcap.UUCP> billwolf@hubcap.clemson.edu writes:
>From article <35328@think.UUCP>, by barmar@think.COM (Barry Margolin):
>> First of all, whether the file is locked is immaterial to the
>> discussion (I never actually said that the compiler compiles files --
>> in Lisp the compiler can also be invoked on in-core interpreted
>> functions, and many Lisp programming environments allow in-core editor
>> buffers to be compiled).
>
>    Whatever is being processed, be it a file or something else,
>    would have to be locked.  If it is already inaccessible to
>    all other processes, then it is continuously locked already.

The reason I said it was immaterial was because we are discussing
garbage collection and structure sharing, not file management.

>
>> [discussion of copying vs. read-locking]
>
>    All editors I know of work by copying the targeted file into
>    memory, holding it there while it's being modified, and then
>    writing it back to the file.  Another approach is the locking
>    mechanism.  Since compiler warnings generally amount to very
>    small text files, I'd probably go the copying route if locking
>    /unlocking consumed too much time.  However, a good argument 
>    can also be made that there should not be 3 million people 
>    simultaneously editing and/or compiling the same file anyway, 
>    so it probably makes little difference which method is chosen. 

And you're still harping on how the editor manages the file.  The
whole point of my example was on how the editor, compiler and command
processor, running in the same address space (the compiler could be
invoked as a subroutine by the editor, for instance), share the
in-memory warnings database.  3 million people aren't editing and
compiling the same file, but one user is, and he may tell the command
processor to delete the compiler warnings before he's finished
scanning through them in the editor (because he doesn't want them
cluttering up the database if he asks for other compiler warnings to
be displayed).

>> Most modern GC schemes have time overhead that is a function (often
>> linear) of the frequency of allocation.  Since assignments are
>> always more frequent than allocations, and I suspect usually MUCH more
>> frequent, this difference is important.
>
>    No, not just the frequency of allocation.  GC's performance also
>    depends upon the frequency of running out of memory.  Furthermore,
>    GC is a global mechanism, and it wastes much time scanning space
>    which is already being properly managed.  

Real-time GC mechanisms are defined to do a bounded amount of work PER
ALLOCATION.  Decent GC mechanisms can be told to ignore manually
managed space.  And generational and ephemeral GC mechanisms
concentrate their efforts on memory most likely to contain garbage.

>     Why was each line read into a newly allocated monolithic string,
>     with pointers into this string?  It would seem far more sensible
>     to read each *field* into a newly allocated string; then when we
>     need to revise a field to a larger value, deallocate the old string 
>     and allocate a new one.  Flags are not necessary.

Probably because the language's runtime library provides operations
for reading by lines and for searching strings.  A program that uses
standard library routines is easier to read than one that uses its
own.

I admit that the database package was not written wonderfully (the guy
who wrote it is primarily a Lisp programmer, but he was one of the few
people in our company willing to write C utilities at the time).  It's
a locally-used facility, not part of any product, so high quality was
not the top priority.

>     Which is why application programmers should make use of ADTs,
>     which encapsulate and hide the details of storage management.

No.  The application programmer must still know when to call the ADT's
DESTROY procedure, or know to call the LOCK and UNLOCK procedures,
etc.

>    Deallocation of storage is a one-time cost (and a small one at that)
>    if done by the programmer.  Given the implicit destruction of local
>    environments and the use of ADTs, application programmers will 
>    practically never have to do any explicit deallocation anyway.  
>    When it is necessary, it's not that difficult.  In the database example,
>    a single line of code would suffice.

You continue with this "one-time cost" fallacy.  It is a cost that
must be paid at least every time an ADT is designed and implemented,
and it also adds complexity that must be borne by the users of the
ADT.  GC is truly a one-time cost (per runtime implementation).


Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/12/89)

From article <35340@think.UUCP>, by barmar@think.COM (Barry Margolin):
> 3 million people aren't editing and
> compiling the same file, but one user is, and he may tell the command
> processor to delete the compiler warnings before he's finished
> scanning through them in the editor (because he doesn't want them
> cluttering up the database if he asks for other compiler warnings to
> be displayed).

    Why doesn't the editor do the deletion of warnings?  If the editor
    is already providing a "search for next warning" function, it should
    provide the "delete warning" function as well.  Then we wouldn't have
    to deal with shared variables at all, although your later comments do
    make the whole question moot.

> Real-time GC mechanisms are defined to do a bounded amount of work PER
> ALLOCATION.  [....]  And generational and ephemeral GC mechanisms
> concentrate their efforts on memory most likely to contain garbage.

    Could you send me a detailed explanation of these mechanisms?

> Decent GC mechanisms can be told to ignore manually managed space.  

    If this means that I can specify that my allocation requests carry
    the condition that the GC mechanism will stay away from the space
    I receive, then I'll be happy to change my position on garbage collection.

    (Erland, you may have spoken too soon!)

    HOWEVER.  There are still things that need changing.  There still needs
    to be implicit destruction of local environments via calls to DESTROY
    routines.  Parameter passing is still ambiguous with respect to storage
    utilization.  In general, I can live with GC being required IFF I can
    also exert total control over when, how, and for how long every piece
    of space is occupied, and I can determine exactly how much space is
    being used implicitly and for how long.  

    If what you say is true, then nothing prevents me from designing an 
    entire system in which GC is completely absent and for which I can make 
    firm statements regarding the system's time and space characteristics; 
    even though the compiler is required to provide GC for those who wish
    to use it, the programmer can require the compiler to provide a
    local environment in which GC is prohibited.  This I can live with.  



                                              Bill Wolfe

                                       wtwolfe@hubcap.clemson.edu

barmar@think.COM (Barry Margolin) (01/15/89)

In article <4073@hubcap.UUCP> wtwolfe@hubcap.clemson.edu writes:
>    Why doesn't the editor do the deletion of warnings?  If the editor
>    is already providing a "search for next warning" function, it should
>    provide the "delete warning" function as well.  Then we wouldn't have
>    to deal with shared variables at all, although your later comments do
>    make the whole question moot.

Who said that the editor doesn't provide for deletion of warnings,
too?  But that shouldn't prevent the command processor from also
deleting them.

>> Real-time GC mechanisms are defined to do a bounded amount of work PER
>> ALLOCATION.  [....]  And generational and ephemeral GC mechanisms
>> concentrate their efforts on memory most likely to contain garbage.
>
>    Could you send me a detailed explanation of these mechanisms?

The canonical reference for real-time GC is a paper, about 10 years
old, by Henry Baker.  I don't know the exact reference, but the title
included the phrases "Garbage Collection" and "Real-time", and it was
probably in something like a proceedings of a POPL conference.  The
general idea of these mechanisms is that every time an object is
allocated up to N words are scanned for non-garbage objects to be
copied.

My reference for ephemeral GC is David Moon's paper, approximately
titled "Garbage Collection in Large Virtual Memory Systems", which is
in the proceedings of the first (or maybe second) ACM Lisp Conference.
I don't know what the references for generational GC's are; recent
Lisp and Functional Programming Conference proceedings are probably a
good place to look.

>> Decent GC mechanisms can be told to ignore manually managed space.  
>
>    If this means that I can specify that my allocation requests carry
>    the condition that the GC mechanism will stay away from the space
>    I receive, then I'll be happy to change my position on garbage collection.

Well, the GC implementation that I'm most familiar with is the
Symbolics 3600.  This architecture organizes the address space into
logical entities called "areas".  Each area can be flagged as either
dynamic or static, the latter indicating that garbage should not be
collected there.  The GC still has to scan through static areas to see
whether they contain references to objects in dynamic areas, though;
otherwise the GC might create a dangling reference.

In the Symbolics system, the primary use of static areas is to prevent
pure code pages that are paged directly out of load files from being
copied into swap space.  They are also used for holding I/O buffers,
which are manually reused and therefore never become garbage.  The
features they are missing for your purposes are sophisticated
allocation and deallocation primitives; the only primitives the Lisp
Machine provides always allocate and deallocate space at the end of an
area, because the system depends on the GC to free up interior space.
But you could easily build a sophisticated allocator that manages its
own space inside a static area.

>    HOWEVER.  There are still things that need changing.  There still needs
>    to be implicit destruction of local environments via calls to DESTROY
>    routines.  Parameter passing is still ambiguous with respect to storage
>    utilization.  In general, I can live with GC being required IFF I can
>    also exert total control over when, how, and for how long every piece
>    of space is occupied, and I can determine exactly how much space is
>    being used implicitly and for how long.  

Lisp-like languages provide for this by providing macros that
encapsulate the allocation and deallocation of objects.  For example,
in Common Lisp one can write:

	(WITH-OPEN-FILE (variable file-name options...)
	  body...)

This opens the specified file, assigns the stream to the variable,
executes the body, and then closes the file.  And it guarantees that
the file will be closed no matter how the environment is exited.  This
is generally implemented using Lisp's UNWIND-PROTECT primitive, which
allows arbitrary code to be executed when an environment is being
exited.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar