[comp.sw.components] Garbage Collection & ADTs

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/21/89)

From scc@cl.cam.ac.uk (Stephen Crawley):
>> This ADT will do dynamic allocation INTERNALLY so as to facilitate
>> its own expansion and contraction as necessary.  Upon block exit, 
>> the ADT's destroy procedure will be invoked, and all the space
>> occupied by the ADT will be released.
>
> OK ... in the trivial case your scheme does work.  The conditions
> for it to work include:
> 
> a)	NO references to the ADT are passed out of scope.

   In keeping with the general principle that application programmers
   should not be using pointers; appropriate ADTs should be used instead. 

> b)	the scope exits before you run short of space.

   Or you have programmed your application to properly handle the
   exception STORAGE_ERROR.  The sequence (sketched in Ada after the steps) is then as follows:

    1) An ADT request comes in for which the ADT decides it will
       need more space.  The ADT attempts to obtain the space, but
       STORAGE_ERROR is raised.  Fortunately, the ADT is programmed
       so as to guarantee to the user that the ADT will remain in
       its original state (i.e., the transaction will abort).  

    2) The ADT now either directly propagates STORAGE_ERROR or 
       something equivalent (Unable_To_Comply, for example) to 
       the user, in accordance with the ADT's specification.  

    3) The user can then take whatever steps are deemed appropriate
       in order to handle the exception.  Note carefully that only 
       the user is in a position to decide what data structures 
       to throw away in such a crisis.  
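
   [For concreteness, a minimal sketch in Ada of the sequence above.  The
   generic package Simple_Set, its Add operation, and the exception
   Unable_To_Comply are illustrative names, not from any posted code; the
   point is only that the allocator can raise STORAGE_ERROR before the
   representation has been touched, so the "transaction" aborts cleanly.]

      generic
         type Item is private;
      package Simple_Set is
         type Set is limited private;
         Unable_To_Comply : exception;
         procedure Add (S : in out Set; X : in Item);
      private
         type Node;
         type Node_Ptr is access Node;
         type Node is record
            Value : Item;
            Next  : Node_Ptr;
         end record;
         type Set is record
            Head  : Node_Ptr;
            Count : Natural := 0;
         end record;
      end Simple_Set;

      package body Simple_Set is
         procedure Add (S : in out Set; X : in Item) is
            P : Node_Ptr;
         begin
            P := new Node'(Value => X, Next => S.Head);  -- may raise STORAGE_ERROR
            S.Head  := P;                                -- S is modified only after
            S.Count := S.Count + 1;                      -- the allocation succeeds
         exception
            when Storage_Error =>
               raise Unable_To_Comply;                   -- step 2: propagate; S unchanged
         end Add;
      end Simple_Set;

      with Simple_Set;
      procedure Demo is                                  -- step 3: the user recovers
         package Integer_Sets is new Simple_Set (Item => Integer);
         Table : Integer_Sets.Set;
      begin
         Integer_Sets.Add (Table, 42);
      exception
         when Integer_Sets.Unable_To_Comply =>
            null;  -- decide which structure to Destroy, then retry the Add
      end Demo;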

> c)	the programmer (i.e. the application!) does not incorrectly tell 
> 	the ADT to release an instance too soon.

   Since the default is that the ADT will die automatically upon going
   out of scope, the user will presumably refrain from invoking Destroy
   without a fairly good reason.

>   o  GC'ed memory management can reclaim junk (heap nodes that are no longer
>      needed) whenever it runs out of space, whereas a scope based scheme
>      can in general only identify and hence reclaim junk on scope exit.
>      In other words, scope based reclamation of memory will in general 
>      need more memory.

   Not true.  Consider -- what precisely is junk?  If we define it as
   "memory that is no longer needed due to deallocate operations", then
   the ADT will automatically free up all such memory by virtue of the
   semantics of the deallocation process.  An order to deallocate
   constitutes a guarantee from the ADT that it is completely safe to do
   so; the ADT knows this because it has received an order from the user
   (e.g., Remove_Item) which removes the need for the space, and since
   the ADT is in total control of the space involved, it can issue the
   deallocation order and thereby reduce the size of its representation.
 
   If we define junk as "memory which is strictly not necessary at this 
   point in the program", then we require a "Read Programmer's Mind" 
   instruction.  Not happening to have such an instruction, we must rely
   on the programmer to communicate the fact that a data structure is 
   no longer necessary via the Destroy command.

   Incidentally, the ADT can provide as one of its services a function
   which indicates the amount of space currently occupied by the ADT, 
   which the user can make use of when it becomes necessary to decide 
   which data structures to throw away.  This is made possible by the
   'SIZE attribute in Ada; the ADT simply adds up the 'SIZE of all the
   space it is occupying, and returns that value to the user.  The user
   can then simply select the data structure for which the cost function 
   (Size/Value) is maximal and order it Destroyed, and then proceed to 
   retry the aborted operation.
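
   [A hedged sketch of such a size-reporting service, continuing the
   hypothetical Simple_Set package sketched earlier in this posting.  Note
   that 'SIZE yields a value in bits, so the sketch converts to storage
   units; space held inside the stored Items themselves is not counted and
   would have to be reported by their own ADTs.]

      --  added to the visible part of Simple_Set:
      function Storage_Used (S : in Set) return Natural;

      --  added to the package body, whose context clause must name System:
      function Storage_Used (S : in Set) return Natural is
         Bits_Per_Node : constant Natural := Node'Size;
         Unit          : constant Natural := System.Storage_Unit;
      begin
         return S.Count * ((Bits_Per_Node + Unit - 1) / Unit);
      end Storage_Used;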
 
> Finally, don't forget the case where condition a) is not true; i.e.
> some of the dynamically allocated ADT instances are passed out of scope
> either explicitly (as a result) or implicitly (by side-effect).

   Which can't happen unless either the applications programmer is 
   using pointers directly (naughty naughty), or something called by the
   application programmer (in effect, an extension of the application
   programmer) is doing it on his/her behalf.  In the latter case,
   this presumably is in accordance with the called unit's specification, 
   and hence remains the fault of the applications programmer.

   The rule is simple: Leave the use of pointers to ADT implementors,
   and simply reuse the code they produce.  Exploit to the absolute 
   fullest the hard guarantees (proper space management, serializability, 
   recoverability, concurrent processing, etc.) that the ADT implementors
   worked so hard to provide you with, and thy Manager will be pleased.
   
> This whole GC versus non-GC debate seems to hinge on the following trade-off.
> Do we want programs that are slower (by say 25%), but are intrinsically 
> free of a large class of storage management errors, or are usually faster
> (not always), but may use more storage, may have potential storage leaks 
> and other storage management bugs, and which arguably take longer to 
> develop?

    All of the negatives you cite (except "may use more storage", which 
    is bogus) are overcome by the reuse paradigm.  The ADT is developed
    with extremely high quality at a high cost, but this cost is then 
    spread over an extremely large number of users until the end of time;
    the result is very much an economic win-win situation.


    Bill Wolfe, wtwolfe@hubcap.clemson.edu
 

maslen@Neon.Stanford.EDU (Thomas Maslen) (09/22/89)

In the last week or so there has been an interesting, er, discussion on
the desirability of GC in production code.  Bill "ADA is the one true
way" Wolfe and perhaps Anton Rang have argued the negative case, while
Ian Hopper, Scott Henninger, John Gateley, Steve Tynor, Rob MacLachlan,
Stephen Crawley, Ted Dunning and Ralph Johnson have been on the
affirmative [my apologies if I've mis-characterized anyone's position].
No prizes for guessing my biases:  I think the people arguing for GC
have the right idea.

The paragraph above added exactly no technical content to this
discussion.  What we need is something that does.  If we're to have
any hope of convincing the ADAholics that maybe, just maybe, languages
need GC in order to be useful, we need a crisp example of a problem
that:
      -	requires various unrelated parts of a program to use one
	instance of an ADT (either because it's too large to copy, or
	because we really do want updates to apply to the one
	instance),
      -	doesn't have any obvious point at which we can say "we're sure
	we've finished with it now, but we definitely couldn't have
	got rid of it much earlier",
      -	has sufficiently stringent efficiency requirements that we
	can't pass around magic numbers, or whatever it was that
	Bill W was proposing a few days ago, which have to be
	translated before we can actually use the ADT,
      - is sufficiently tricky that we can't rely on the programmer
	magically remembering to "do the right thing",
      -	is arguably representative of a non-trivial class of real
	problems.
I don't have such a problem handy, just a gut feeling that the insides
of real compilers are rich with suitable examples.  Any volunteers?
If nobody has a decent example, then maybe Bill W has a point.

Since this is likely to involve non-negligible volumes of code, and
some of us try to digest lunch while reading this group, I'd like to
suggest that E-mail may be more appropriate than clogging up the
group.

Next, a side bet:

In article <6530@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu writes:
[mucho stuff deleted]
>   In keeping with the general principle that application programmers
>   should not be using pointers; appropriate ADTs should be used instead. 
[more stuff deleted]
>   The rule is simple: Leave the use of pointers to ADT implementors,
>   and simply reuse the code they produce.  Exploit to the absolute 
>   fullest the hard guarantees (proper space management, serializability, 
>   recoverability, concurrent processing, etc.) that the ADT implementors
>   worked so hard to provide you with, and thy Manager will be pleased.

My guess is that any ADA solutions we see to my hypothetical problem
will involve auxiliary "appropriate ADTs" which are just limited
private types containing a pointer to the reference-counted real ADT.

Finally, has anyone made use of the (Boehm et al) GC-for-C that was
recently posted to comp.sources.unix?  Any comments?

Thomas Maslen					    maslen@polya.stanford.edu

mwolf@Teknowledge.COM (Michael Wolf) (09/22/89)

In article <11885@polya.Stanford.EDU> maslen@Polya.Stanford.EDU (Thomas Maslen) writes:
>My guess is that any ADA solutions we see to my hypothetical problem
>will involve auxiliary "appropriate ADTs" which are just limited
>private types containing a pointer to the reference-counted real ADT.

I don't see what the problem is with passing around private types
with pointers to reference-counted ADT's.  It adds a small amount of
overhead to pointer references, and a couple bytes per object for
reference counts.  If you've got a language which enforces the
"private" part, you're pretty damn sure that your reference count
is correct, and it's then easy to tell when it is or isn't okay
to deallocate an object.  The only thing you need is a destructor
routine called on each "private" pointer when it goes out of scope
or gets deallocated.
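
[A sketch of such a scheme in Ada terms, since that is the language under
discussion; all names are hypothetical.  The spec enforces the "private"
part, so only this package ever touches the count.  The catch, taken up in
the follow-up below, is that Ada 83 will not call Release automatically on
scope exit -- those calls remain the programmer's responsibility.]

   generic
      type Real_ADT is private;
   package Counted_Handles is
      type Handle is limited private;
      procedure Create  (H : in out Handle; Initial : in Real_ADT);
      procedure Assign  (Target : in out Handle; Source : in Handle);
      procedure Release (H : in out Handle);   -- the "destructor" routine
   private
      type Counted is record
         Count : Natural := 0;
         Data  : Real_ADT;
      end record;
      type Counted_Ptr is access Counted;
      type Handle is record
         Ref : Counted_Ptr;                    -- defaults to null
      end record;
   end Counted_Handles;

   with Unchecked_Deallocation;
   package body Counted_Handles is
      procedure Free is new Unchecked_Deallocation (Counted, Counted_Ptr);

      procedure Create (H : in out Handle; Initial : in Real_ADT) is
      begin
         Release (H);                          -- drop any previous referent
         H.Ref := new Counted'(Count => 1, Data => Initial);
      end Create;

      procedure Assign (Target : in out Handle; Source : in Handle) is
      begin
         if Target.Ref /= Source.Ref then
            if Source.Ref /= null then
               Source.Ref.Count := Source.Ref.Count + 1;
            end if;
            Release (Target);
            Target.Ref := Source.Ref;
         end if;
      end Assign;

      procedure Release (H : in out Handle) is
      begin
         if H.Ref /= null then
            H.Ref.Count := H.Ref.Count - 1;
            if H.Ref.Count = 0 then
               Free (H.Ref);                   -- reclaims the real ADT
            else
               H.Ref := null;
            end if;
         end if;
      end Release;
   end Counted_Handles;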

The nice thing about this approach is that you KNOW that garbage
collection isn't going to unpredictably impact your performance.

Now, if you can find a GC algorithm that's as efficient, I'll consider
switching.  Also, I don't see that writing such a reference counting
ADT is all that much tougher than writing the same ADT without
the reference counting, so it doesn't take too much extra development
time.

There may be algorithms which require so many pointer references, and
so little memory allocation/deallocation that a little GC would be
more efficient, but I'm betting they're in the minority.  Of course,
it's possible to get applications where you can't afford the space
overhead of a couple bytes per object, and then GC would be the
way to go.

Michael Wolf            These opinions are bootleg from Taiwan, so it comes
mwolf@Teknowledge.COM   as no surprise to find that they're defective.

scc@cl.cam.ac.uk (Stephen Crawley) (09/23/89)

Michael Wolf writes:

> I don't see what the problem is with passing around private types
> with pointers to reference-counted ADT's.  It adds a small amount of
> overhead to pointer references, and a couple bytes per object for
> reference counts.  If you've got a language which enforces the
> "private" part, you're pretty damn sure that your reference count
> is correct, and it's then easy to tell when it is or isn't okay
> to deallocate an object.

Ok so far ... sort of

> The only thing you need is a destructor
> routine called on each "private" pointer when it goes out of scope
> or gets deallocated.

And there's the problem with this scheme.

As I tried to explain before, scope based deallocation works if and
only if:

  1) the scope DOES exit. [The global scope and the scope of main() 
     NEVER exit for the sake of this argument.]
  2) the scope exits BEFORE we start running out of space
  3) and we can afford the extra storage overhead of remembering
     that we allocated object X in this scope Y.  [You either need
     to build a chain of objects allocated at each scope level or
     a new heap for each scope level ...]
     
In general, one of the above criteria does not hold.  This means that
we must rely on some other mechanism to tell the ADT to apply the
destructor to a given instance at the appropriate time.  

I claim that this means that the application programmer needs to deal
with this in a significant percentage of cases.  This is possible,
but it is very difficult to get correct when (for instance) the private
pseudo-pointer type is itself passed around the place in complicated
ways or when it is stored in dynamic data structures.

Bill Wolfe maintains that the problem can be solved with a higher level 
library ADTs that implement generic list, tree, graphs etc.  I claim he 
is incorrect and have presented an example (see <902@scaup.cl.cam.ac.uk>) 
which I believe demonstrates this. 

Let's try not to rehash old arguments ...

-- Steve

scc@cl.cam.ac.uk (Stephen Crawley) (09/23/89)

Bill Wolfe wrote:
>>> This ADT will do dynamic allocation INTERNALLY so as to facilitate
>>> its own expansion and contraction as necessary.  Upon block exit, 
>>> the ADT's destroy procedure will be invoked, and all the space
>>> occupied by the ADT will be released.

I replied:
>> OK ... in the trivial case your scheme does work.  The conditions
>> for it to work include:
>> 
>> a)	NO references to the ADT are passed out of scope.

Bill replied:
>   In keeping with the general principle that application programmers
>   should not be using pointers; appropriate ADTs should be used instead.
 
And I claim that it is IMPOSSIBLE to build sufficiently general library 
ADT's!  See article <902@scaup.cl.cam.ac.uk> for an example.  [And I've 
a better one up my sleeve ;-)]

I wrote:
>> b)	the scope exits before you run short of space.

Bill replies:
>  Or you have programmed your application to properly handle the
>  exception STORAGE_ERROR.  The sequence is then as follows:
>
> [Bill then describes a scheme where the application catches a 
> STORAGE_ERROR exception, causes the current transaction to be 
> aborted and throws away some (global) data structures after 
> asking the user.]

This is unrealistic.  It assumes that there are enough large global 
data structures to be thrown.  It also assumes that the application
contains the necessary code for interacting with the end user; i.e. 
asking questions that he/she has a chance of understanding.  Neither
will be true in general.

I wrote:
>> c)	the programmer (i.e. the application!) does not incorrectly tell 
>> 	the ADT to release an instance too soon.

Bill replies:
> Since the default is that the ADT will die automatically upon going
> out of scope, the user will presumably refrain from invoking Destroy
                    ^^^^
        [he means programmer I think ...]
		    
> without a fairly good reason.

In this case the *good reason* is that the scope does not exit soon 
enough; if the programmer doesn't reclaim the ADT before scope exit, 
the program will run out of space and die.

I wrote:
>> GC'ed memory management can reclaim junk (heap nodes that are no longer
>> needed) whenever it runs out of space, whereas a scope based scheme
>> can in general only identify and hence reclaim junk on scope exit.
>> In other words, scope based reclamation of memory will in general 
>> need more memory.

Bill replied:
> Not true.  Consider -- what precisely is junk?  

I define junk to mean dynamically allocated space which was allocated
in the current scope (or passed down from a higher scope) that is no
longer accessible from an in-scope variable anywhere in the program.

In a complex program, such junk may be generated in such a way that 
it is IMPOSSIBLE for any application code or any library ADT to keep track 
of its accessibility ... without implementing a garbage collector!

Bill continues:
> If we define it as "memory that is no longer needed due to 
> deallocate operations" [...]

No, that's not what I meant ...

Bill continues:
> If we define junk as "memory which is strictly not necessary at this 
> point in the program" 

This is approximately what I mean ...

> then we require a "Read Programmer's Mind" instruction.

Oh no we don't!!!!  A garbage collector can (and will) reclaim a lot of 
this junk by tracing all in-scope variables etc.  The garbage collector
can do a "perfect" job of this.  Storage reclamation code in an ADT or 
the application can only do a half-hearted job in difficult cases.
Such a difficult case might involve detecting that a subgraph containing
cycles has become detached from its root ... the case where reference
counting does not work.  My argument is that the difference between a
perfect job and an imperfect job may be enough to cause the program to die.
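
[For concreteness, a small Ada fragment of the situation being described;
the types are made up.  Once Head is cleared, A and B still hold a count
of one on each other, so a count-based scheme never reclaims them, while
a tracing collector would.]

   procedure Cycle_Demo is
      type Node;
      type Node_Ptr is access Node;
      type Node is record
         Count : Natural := 0;   -- reference count maintained by hand
         Next  : Node_Ptr;
      end record;
      Head, A, B : Node_Ptr;
   begin
      A := new Node;   B := new Node;
      A.Next := B;   B.Count := B.Count + 1;   -- A references B
      B.Next := A;   A.Count := A.Count + 1;   -- B references A: a cycle
      Head := A;     A.Count := A.Count + 1;   -- the root reference
      Head := null;  A.Count := A.Count - 1;   -- root dropped
      -- Both counts are still 1, yet neither node is reachable; only a
      -- tracing collector (or an explicit Destroy of the whole subgraph)
      -- can reclaim them.
   end Cycle_Demo;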

I therefore maintain that my "may use more storage" point is CORRECT.

Finally (sigh) I wrote:
>> Finally, don't forget the case where condition a) is not true; i.e.
>> some of the dynamically allocated ADT instances are passed out of scope
>> either explicitly (as a result) or implicitly (by side-effect).

Bill replies:
> Which can't happen unless either the applications programmer is 
> using pointers directly (naughty naughty), or something called by the
> application programmer (in effect, an extension of the application
> programmer) is doing it on his/her behalf.  In the latter case,
> this presumably is in accordance with the called unit's specification, 
> and hence remains the fault of the applications programmer.

Sigh.

> The rule is simple: Leave the use of pointers to ADT implementors,
> and simply reuse the code they produce.  Exploit to the absolute 
> fullest the hard guarantees (proper space management, serializability, 
> recoverability, concurrent processing, etc.) that the ADT implementors
> worked so hard to provide you with, and thy Manager will be pleased.

Except that ADT implementors haven't because it is impossible.  And my
and their Managers had damn well better not start asking the impossible, 
'cos it will be on HIS head when we can't deliver!  (Don't worry, I will
have asked for a transfer ... or resigned by then.)

-- Steve

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/25/89)

From scc@cl.cam.ac.uk (Stephen Crawley):
>> [Bill then describes a scheme where the application catches a 
>> STORAGE_ERROR exception, causes the current transaction to be 
>> aborted and throws away some (global) data structures after 
>> asking the user.]
> 
> This is unrealistic.  It assumes that there are enough large global 
> data structures to be thrown.  It also assumes that the application
> contains the necessary code for interacting with the end user; i.e. 
> asking questions that he/she has a chance of understanding.  Neither
> will be true in general.

   This hasn't even a remote resemblance to what I said.

   The ADT aborts the transaction because it can't satisfy the
   request.  The user receives an exception to that effect.  The user
   does something to alleviate the problem (e.g., selecting and
   destroying a data structure) in the process of handling the
   exception, and proceeds to retry the operation.

   Incidentally, the questions you are asking (and this is NOT,
   repeat NOT, intended to reflect negatively on Mr. Crawley) in
   this and at least one previous article indicate that you do not
   really understand generics or exception handling at all, and I
   would therefore suggest that you investigate some basic Ada texts
   in order to understand these basic concepts and better participate
   in the discussion. 

> I define junk to mean dynamically allocated space which was allocated
> in the current scope (or passed down from a higher scope) that is no
> longer accessible from an in-scope variable anywhere in the program.
> In a complex program, such junk may be generated in such a way that 
> it is IMPOSSIBLE for any application code or any library ADT to keep track 
> of its accessibility ... without implementing a garbage collector!

   And I contend that if the program is that complex, then it is
   badly written; the programmer probably has not made use of the
   basic principle that "abstraction is the fundamental means by
   which the computer scientist can combat complexity".  Abstract
   data types are an integral part of this process. 

>> then we require a "Read Programmer's Mind" instruction.
> 
> Oh no we don't!!!!  A garbage collector can (and will) reclaim a lot of 
> this junk by tracing all in-scope variables etc.  The garbage collector
> can do a "perfect" job of this.  Storage reclamation code in an ADT or 
> the application can only do a half-hearted job in difficult cases.
> Such a difficult case might involve detecting that a subgraph containing
> cycles has become detached from its root ... the case where reference
> counting does not work.  My argument is that the difference between a
> perfect job and imperfect job may be enough to cause the program to die.

    Not a difficult case at all, if we have (for example) a generic
    Simple_Directed_Graph ADT.   Again, the argument here reflects 
    a lack of understanding of the basic ideas behind generic software
    components.  
 
>> The rule is simple: Leave the use of pointers to ADT implementors,
>> and simply reuse the code they produce.  Exploit to the absolute 
>> fullest the hard guarantees (proper space management, serializability, 
>> recoverability, concurrent processing, etc.) that the ADT implementors
>> worked so hard to provide you with, and thy Manager will be pleased.
> 
> Except that ADT implementors haven't because it is impossible.  

    It is quite possible; start by reading Booch's "Software Components
    with Ada", and once you understand that you will be in a position
    to consider some more advanced techniques.  Then if you still want
    to argue against my position, you will at least understand it well 
    enough to launch a reasonable attack.


    Bill Wolfe, wtwolfe@hubcap.clemson.edu
 

grichard@cherokee.cis.ohio-state.edu (Golden Richard) (09/25/89)

In article <909@scaup.cl.cam.ac.uk> scc@cl.cam.ac.uk (Stephen Crawley) writes:
[pro-GC arguments deleted]

I find the current GC wars rather interesting, but I find myself siding
naturally with Bill simply because I have *never* encountered a situation
that absolutely demanded GC.   Even without automatic scope-exit destructors
for ADTs, programmer-controlled storage management isn't difficult.
I'd be most interested in seeing some concrete example where GC was the 
'only way to go'.

-=-
Golden Richard III        OSU Department of Computer and Information Science
grichard@cis.ohio-state.edu         "I'm absolutely positive!  ...or not."    

scc@cl.cam.ac.uk (Stephen Crawley) (09/25/89)

Golden Richard III writes:

> I find the current GC wars rather interesting, but I find myself siding
> naturally with Bill simply because I have *never* encountered a situation
> that absolutely demanded GC.   Even without automatic scope-exit destructors
> for ADTs, programmer-controlled storage management isn't difficult.
> I'd be most interested in seeing some concrete example where GC was the 
> 'only way to go'.

Here are some problems where programmer-controlled storage management 
would be very difficult or impossible.

  Interactive editing of cyclic graph data structures.  You have a 
  heterogeneous cyclic graph data structure and the editor must be able
  to make arbitrary sequences of changes to the data structure.  How
  do you detect that a subgraph has become detached?   
  
This sort of problem arises in many real-life settings such as graphical 
editors and various programming environment tools.  In practice, many
such applications don't try very hard to reclaim storage.  When the
application has leaked so much space that the processor starts to thrash, 
the user can always save his work and restart ...

  Dynamic loading & unloading of modules in a running program.  An
  application consists of a static set of core modules and a number
  of other modules that are dynamically link-loaded on demand.  When
  the programmer recompiles a dynamic module, he may want to reload
  it and start using it without shutting down the entire application.
  But to do this, the dynamic loader needs to know if the module is
  currently being 'used' ... e.g. if some thread is executing a 
  procedure exported by the module or if some other part of the
  application has salted away a reference into the module (e.g. a 
  procedure pointer)

This problem arose when I was using Mesa / XDE to implement my entity system,
a persistent object system.  I did get the dynamic loading to work ... and 
it made a big performance difference, but dynamic unloading could not be 
done safely, since the Mesa / XDE environment has no garbage collector.

[Incidentally, the lack of garbage collection caused so many reliability 
problems that I was forced to give up using Mesa / XDE altogether
about 2 years ago.  The new version of the entity system uses MIT CLU 
as the base language.  CLU is garbage collected, and though the GC I am 
using is slow and has some unpleasant attributes, and though I've had
to do some extensive butchery on unsavoury parts of the CLU runtime system,
I do not regret making the switch]

-- Steve

scc@cl.cam.ac.uk (Stephen Crawley) (09/25/89)

Bill Wolfe writes:
> Incidentally, the questions you are asking (and this is NOT,
> repeat NOT, intended to reflect negatively on Mr. Crawley) in
> this and at least one previous article indicate that you do not
> really understand generics or exception handling at all, and I
> would therefore suggest that you investigate some basic Ada texts
> in order to understand these basic concepts and better participate
> in the discussion.

Nice one Bill.

I'll have you know Mr Wolfe that I've been using languages with exception
handling (CLU and Mesa) and generic data types (CLU) for many years.  I
understand the concepts quite adequately thank you.  I've also implemented
a variety of storage managers, including garbage collectors both for short
term and long term data structures, and I've done quite a bit of work on
hybrid (i.e. partly GC'ed, partly application controlled) storage management 
solutions.

Frankly I don't see why it is necessary to understand ADA intimately
to talk about software reuse.  The fact that the two issues seem to be so
firmly associated in your mind is very sad, since it seems to blind 
you to the possibility of non-Ada solutions.

Mr Wolfe:  I used to believe fervently (like you) that the overhead of
garbage collection was unacceptable.  Many years of bitter experience 
trying to get by without it have led me to change my mind.  Perhaps if 
you had had my experience your views would be different.  You never know, 
you might even be wrong!

I'll answer your "technical" points later.

-- Steve

wright@hsi.UUCP (Gary Wright) (09/25/89)

In article <62342@tut.cis.ohio-state.edu> Golden Richard <grichard@cis.ohio-state.edu> writes:
>In article <909@scaup.cl.cam.ac.uk> scc@cl.cam.ac.uk (Stephen Crawley) writes:
>[pro-GC arguments deleted]
>
>I find the current GC wars rather interesting, but I find myself siding
>naturally with Bill simply because I have *never* encountered a situation
>that absolutely demanded GC.   Even without automatic scope-exit destructors
>for ADTs, programmer-controlled storage management isn't difficult.
>I'd be most interested in seeing some concrete example where GC was the 
>'only way to go'.

Well, I have *never* encountered a situation that absolutely demanded using
a high level language.   That is why I still do all my programming in
assembly. :-)

If our criterion for deciding whether GC is better than explicit memory
management is based on absolutes, we will never come to a conclusion.

Explicit memory management is, theoretically speaking, equivalent to GC
(we are all using Turing machines, aren't we?).  What we need to
determine is if a language that supports GC allows programs to be
expressed in a form that is "better" than a language without GC.
Clearly, it will be hard to come to a conclusion about which form of
expression is "better" than another.  We also need to determine if the
benefits derived from the ease of expression outweigh any performance
degradation.

This trade off has been made in the past.  A hand-coded assembly program
can probably be made to be more efficient in time and space than a program
generated by a high-level language compiler.  Yet most programmers 
don't use assembly.   Perhaps this analogy holds for the current debate
regarding GC.

It is interesting to note the advances that have been made in 
optimizing compilers.  Perhaps a sufficiently smart compiler for a language
that supports GC can figure out when GC can be avoided?  That is to say,
the compiler can notice when the last reference to an object will be lost 
and can explicitly "dispose" of the object.  In general, the compiler won't
be able to determine this (thus GC) but for the simple cases, why not?

I have seen a number of people claim that GC can not be used in a
real-time system and then conclude that GC is a bad thing that should
not be used.  If this isn't a non sequitur I don't know what is.  There
will always be special cases that will require special techniques.
What I am interested in are language features that make my life easier
for the majority of applications that don't fall under the category of
"special case".  I also would like compilers that can do the dirty,
tedious work of deciding when to use special techniques instead of a
more general one.  Areas in which I see this now or in the future are:

	register usage
	parameter passing techniques
	dynamic vs. static binding (in the context of OOP)
	function inlining
	GC vs non-GC

I am sure others can come up with more examples.

My prediction is that more and more people will start using languages
with GC but there will always remain those who think it is a "bad thing",
and there will always be situations in which GC should not be used.

There are people today who claim that we don't need anything more than
assemblers, and there are situations today in which it is the only 
reasonable solution.
-- 
Gary Wright 					...!uunet!hsi!wright
Health Systems International                    wright@hsi.com

grichard@hockey.cis.ohio-state.edu (Golden Richard) (09/26/89)

In article <599@hsi86.hsi.UUCP> wright@hsi.com (Gary Wright) writes:
>In article <62342@tut.cis.ohio-state.edu> Golden Richard <grichard@cis.ohio-state.edu> writes:
>>In article <909@scaup.cl.cam.ac.uk> scc@cl.cam.ac.uk (Stephen Crawley) writes:
>>[pro-GC arguments deleted]
>
>Well, I have *never* encountered a situation that absolutely demanded using
>a high level language.   That is why I still do all my programming in
>assembly. :-)
>
>If our criterion for deciding whether GC is better than explicit memory
>management is based on absolutes, we will never come to a conclusion.
>

I never intended to sound as if I were daring someone to come up with an
application which I couldn't handle with programmer-managed reclamation.
The "absolutely demanded" phrase in my article was prompted by references
to (unnamed) applications which supposedly require GC to ease the burden
on the programmer.

To rephrase my statement, I have never encountered an application where
GC was necessarily preferable to programmer-managed reclamation.
Furthermore, I can't imagine a properly designed application where
storage management is so complicated that the usual techniques for 
programmer-managed storage reclamation aren't adequate.   If the
programmer can't define when a certain object can go poof, I suspect
a serious lack of understanding of the problem at hand.

-=-
Golden Richard III        OSU Department of Computer and Information Science
grichard@cis.ohio-state.edu         "I'm absolutely positive!  ...or not."    

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/26/89)

From wright@hsi.UUCP (Gary Wright):
> I have seen a number of people claim that GC can not be used in a
> real-time system and then conclude that GC is a bad thing that should
> not be used.  

   You're missing several important steps here having to do with
   performance degradation and the violation of the general principle
   that one should never do at run time that which can be done instead
   at compile time, design time, etc.; in general, the sooner a problem
   is handled (and similarly, the sooner an error is detected), the less
   costly it will be to handle it.  Why pay and pay and pay at run time
   when an efficient self-managing solution (valid forevermore) can be 
   used (and reused) instead?


   Bill Wolfe, wtwolfe@hubcap.clemson.edu

amull@Morgan.COM (Andrew P. Mullhaupt) (09/26/89)

	The debate raging over garbage collection in this newsgroup is an
interesting contrast to the "effect of free()" thread in comp.lang.c; there
are many there who argue that they need to examine pointers after de-allocating
them(!) One argument advanced is that even though it's a bad thing, since it's
usually OK, it should become standard practice. This is to my mind in character
for C programmers. In this newsgroup, we find other programmers arguing that
programmers should have no direct use for pointers, and so the language can
determine the correct lifetime of dynamic data.

	I suggest it is going to prove necessary for the languages which have
this "Pascalian" point of view to close all loopholes (typecasts to integer,
etc.) to prevent the widespread use of C style perfidious trickery to escape
from sensible programming practice. (If you give them an inch...)

	I would also point out that, despite the claims of its adherents, APL
(A Pointerless Language), having no explicit pointers, does not do a
great job with storage management. Often space is allocated, then de-allocated
as soon as possible, only to be re-allocated for the same object, somewhere
else in the workspace, with the attendant risks of fragmentation and garbage
collection. The programmer is often forced to write peculiar code (even for APL)
in order to try and provide memory management cues to the interpreter. There
is often no relief for the problem when storage efficiency requires re-use of
a local variable for more than one purpose. At least they have the cute idea
of always garbage collecting immediately after printing the last response to
the user. (This has the effect of making garbage collection less noticeable
when working in the typical software engineer's paradise of a tiny test problem,
and excruciatingly painful when you get a trivial syntax error in a workspace
where you just started a huge operation.) The point is that it makes a lot of
difference when garbage collection is performed, and whether the programmer can
control it.

	Another APL story: When calling FORTRAN from APL, you must allocate
the storage for the FORTRAN subprogram in the APL workspace. There are many
important FORTRAN subprograms which involve two calls sharing data in an
auxiliary array residing in the APL workspace (i.e. at the mercy of the APL
garbage collector).  It can happen that APL decides it must perform garbage
collection between the two calls to FORTRAN and the auxiliary array is moved
to a new home. As can be expected in cases like this, the manual guarantees
unpredictable results. The classic example of this is a Fast Fourier Transform
which sets up all the trig calls in one call and performs the transform in a
second. Repeated transforms can then be done by repeating the second call only,
if you're calling from FORTRAN, (or C, or Pascal etc.) but this is too daring
from APL. And since there is no way to force APL to leave your array alone, the
only hope, and common practice, is to ensure that the garbage collection is
performed before the first call to FORTRAN, and to look at the size of the
available workspace to make sure the arrays to be transformed are not large
enough to trigger the calamity again. This means that you always suffer the
performance hit of garbage collection, and you always have to essentially manage
your own storage. If writing an FFT in APL wasn't so slow, and the highly
vectorized FORTRAN FFT so fast, and the data arrays so large as to endanger
garbage collection, this wouldn't be much of a payoff. But they are.

	This last example falls into the category of "application programmer
doing something naughty (by extension through library routine) with (what are
essentially) pointers." The interesting part is that *BOTH* APL and FORTRAN get
to accuse you of this: APL doesn't want you to rely on the array being in any
given place, and FORTRAN doesn't want you to be able to move it. This is the
kind of unforeseen conflict which prevents "RE-USABILITY" in many cases. From
their separate points of view, both the FORTRAN and APL implementors have
correct and sensible implementations, but their products are being used (to
very good result) in a way which neither expected, and through silly tricks
which make for highly unreadable and difficult to maintain code. 

	The moral of the story is: 

1. Put in garbage collection, but also put in sufficient controls for the
programmer.

2. If you need to make assertions about scope of dynamic data, FORCE them.

3. Programmers (even us Pascal lovers) will not stop at anything when the
big push comes to the big shove, and we'll likely get around sensible 
and normal guidelines which are not supported by adequate sensible facilities.

Later,
Andrew Mullhaupt.

Disclaimer:
Any opinions here are not necessarily those of my employer.

(Did you know that Morgan Stanley & Co. Inc., is the world's largest APL
  shop? IBM told us so...)

jans@tekgvs.LABS.TEK.COM (Jan Steinman) (09/26/89)

<62342@tut.cis.ohio-state.edu (Golden Richard)>
<<wright@hsi.UUCP (Gary Wright)>>

<...I can't imagine a properly designed application where storage management is 
so complicated that the usual techniques for  programmer-managed storage 
reclamation aren't adequate.   If the programmer can't define when a certain 
object can go poof, I suspect a serious lack of understanding of the problem at 
hand.>

One who cannot imagine a need for garbage collection other than to correct for 
programmer deficiencies simply lacks imagination.

The situation is not so simple in dynamic-bound systems.  Are you claiming that 
Smalltalk, Lisp, Eiffel, Objective C, Gnu Emacs, et al. have garbage 
collectors simply because those who program in them lack serious understanding 
of their problems?

In particular, systems designed for rapid prototyping may have storage systems 
that have completely non-deterministic usage patterns.  I'm certain those who 
have used non-GC systems to build things similar to the NeXT InterfaceBuilder 
or the Smalltalk Browser have come to appreciate the need for GC.

To bring it back to the newsgroup, the problem of reusable components is 
intimately concerned with automatic storage management.  The need to document 
storage procedures is yet another reason truly reusable components are rare.

I really agree with Gary: <If our criterion for deciding whether GC is better 
than explicit memory management is based on absolutes, we will never come to 
a conclusion.>  Once 
the religious give up their stubborn, blind opposition to GC, we can all begin 
to discover when to use it, when not to, and how to control it so that it stays 
out of the way when not used.  I can pound most nails with the handle of a 
screwdriver, but I'm a fool if I declare there will never be a hammer in my 
toolbox!

In particular, I don't think the combination of asynchronous GC with programmer 
storage handling has been adequately explored.  Creation and destruction of 
many data can be handled by a compiler, and most real-time systems can have 
scheduled "idle time" in which to perform overhead activities.  (A SONAR 
project I worked on performed GC during the few milliseconds when the beam was 
sweeping through the hull.)

							   Jan Steinman - N7JDB
						  Electronic Systems Laboratory
					Box 500, MS 50-370, Beaverton, OR 97077
						(w)503/627-5881 (h)503/657-7703

grichard@cherokee.cis.ohio-state.edu (Golden Richard) (09/26/89)

In article <5995@tekgvs.LABS.TEK.COM> jans@tekgvs.LABS.TEK.COM (Jan Steinman) writes:
><62342@tut.cis.ohio-state.edu (Golden Richard)>

[stuff deleted]

>The situation is not so simple in dynamic-bound systems.  Are you claiming that 
>Smalltalk, Lisp, Eiffel, Objective C, Gnu Emacs, et. al. have garbage 
>collectors simply because those who program in them lack serious understanding 
>of their problems?
>

I am claiming no such thing.   In languages where the choice for GC has been
made *for* the programmer, little other choice remains. As a matter of fact,
I do not even intend to say that GC is inferior to programmer-managed
storage reclamation.   The fact that I am relatively ignorant of applications
best suited to using GC is why I'm asking the pro-GC spokespersons to please
give a concrete example where one would truly need or want GC over the
alternative.   

I'm certainly not of the "assembly language" school that believes that
everything should be explicitly and intricately hacked out by the
programmer.  In particular, automatic scope entry/exit creation/destruction
is a godsend.  I just don't see the point in making things more complicated
than they need to be.


-=-
Golden Richard III        OSU Department of Computer and Information Science
grichard@cis.ohio-state.edu         "I'm absolutely positive!  ...or not."    

grichard@cherokee.cis.ohio-state.edu (Golden Richard) (09/26/89)

If anyone received multiple (somewhat mangled and incomplete) copies of my 
previous article, I would appreciate email.  I was having some technical
problems.   Thanks in advance.    ...and now back to the storage wars...

-=-
Golden Richard III        OSU Department of Computer and Information Science
grichard@cis.ohio-state.edu         "I'm absolutely positive!  ...or not."    

twl@brunix (Ted "Theodore" (W) Leung) (09/26/89)

In article <599@hsi86.hsi.UUCP> wright@hsi.com (Gary Wright) writes:
>It is interesting to note the advances that have been made in 
>optimizing compilers.  Perhaps a sufficiently smart compiler for a language
>that supports GC can figure out when GC can be avoided?  That is to say,
>the compiler can notice when the last reference to an object will be lost 
>and can explicitly "dispose" of the object.  In general, the compiler won't
>be able to determine this (thus GC) but for the simple cases, why not?

There has been some work done in this area.....

@incollection{NDJSSM81,
	author = "Neil D. Jones and Steven S. Muchnick",
	title = "Flow Analysis and Optimization of LISP-like Structures",
	booktitle = "Program Flow Analysis: Theory and Applications",
	publisher = "Prentice Hall",
	year = "1981",
	editor = "Steven S. Muchnick and Neil D. Jones",
	chapter = "4",
	pages = "102-131"
}


Jones and Muchnick present some techniques which allow exactly the
kinds of determinations which you are talking about.
--------------------------------------------------------------------
Internet/CSnet: twl@cs.brown.edu 	| Ted Leung
BITNET: twl@BROWNCS.BITNET		| Box 1910, Brown University
UUCP: uunet!brunix!twl			| Providence, RI 02912

wright@hsi.UUCP (Gary Wright) (09/26/89)

In article <6579@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu writes:
>From wright@hsi.UUCP (Gary Wright):
>> I have seen a number of people claim that GC can not be used in a
>> real-time system and then conclude that GC is a bad thing that should
>> not be used.  
>
>   You're missing several important steps here having to do with
>   performance degradation and the violation of the general principle
>   that one should never do at run time that which can be done instead
>   at compile time, design time, etc.; in general, the sooner a problem
>   is handled (and similarly, the sooner an error is detected), the less
>   costly it will be to handle it.  

In my posting I tried to emphasize the fact that there is probably a
trade-off between using a language that supports GC and one that
doesn't.  It is not clear to me that the relative costs are such as to
render GC useless.  I do believe that it is harder to measure the costs
associated with non-GC languages. Please provide some reasoning for your 
statements which indicate that you feel differently.  Simply claiming that 
the costs do not support GC is not enough.  I will try to provide some
reasoning for GC below.

I agree that as much as possible should be done at compile or 
design time.  I would prefer that the compiler did most of the dirty work.
Also, in the context of reusable software, design decisions should, as much
as possible, not limit later reuse of the software (e.g. a clear violation
of this is C++'s "virtual" and "inline" keywords).

>   Why pay and pay and pay at run time
>   when an efficient self-managing solution (valid forevermore) can be 
>   used (and reused) instead?

It is not clear to me that garbage collection entails a particularly
harsh performance penalty.  It is also not clear to me that the
self-managing solution is simple enough to allow such components to be
easily created and modified (i.e. modification via inheritance, which
Ada does not support).

After reading your posting, I went back and read the sections in
"Object-oriented Software Construction" regarding memory management.
What follows is mostly Bertrand Meyer's ideas regarding memory
management.  I tend to agree, so I will simply present his ideas with
my comments.

Bertrand Meyer claims that programmer-controlled deallocation "is
unacceptable for two reasons: security and complication of program
writing."

By security, Meyer means that programmers are mortals, and they will
make mistakes and will dispose of objects that still have active 
references.  Complication refers to the fact that simply disposing
of an object is not sufficient; all internal objects must also
be disposed of.  Meyer calls this the "recursive dispose problem":

	This means that a specific release procedure must be
	written for any type describing objects that may refer
	to other objects.  The result will be a set of mutually
	recursive procedures of great complication.

	[...]
	Instead of concentrating on what should be his job -
	solving an application problem - the programmer turns 
	into a bookkeeper, or garbage collector (whichever 
	metaphor you prefer).  The complexity of programs
	is increased, hampering readability and hence other
	qualities such as ease of error detection and
	ease of modification.  This further affects security: 
	the more complex a program, the more likely it is to 
	contain errors.

This is the trade-off I was talking about. You can do without GC,
but the increased complexity of your programs has its own
penalty also.

Meyer goes on to describe an alternative, the "Self-Management Approach"
in which the designer of an ADT takes care of the storage management
associated with the ADT.  This is the approach I believe Bill Wolfe
is advocating.  This is indeed a "responsible alternative" to leaving
everything to the programmer.  Meyer advocates the use of this technique
for languages that do not support GC.  In his example, all objects to
be added to a linked list, are copied into storage that is managed
by the linked list.  This would seem to be quite inefficient for large
objects.  I'm not sure how storage could be safely managed when an ADT
only holds references to objects.  

In his example, procedures that add nodes to or delete nodes from the
linked list cause new nodes to be created or destroyed.  If an ADT
provides direct procedures to add or destroy objects within the ADT,
then the ADT has not really hidden the problem from the programmer at
all.  At the most, it has hidden the recursive dispose problem.

This approach does not solve the previously mentioned problems of
security and complication.  Instead of the applications programmer
worrying about storage, the ADT designer must worry about it.  My hunch
is that the distinction between an ADT designer and an applications
programmer is clear for objects like linked lists, stacks etc. but that
it is not so clear the farther away from the basic data structures you
get.  I wonder if the overhead related to having ADT's managing storage
via copying, and redundant code (all the recursive dispose procedures)
doesn't come close to the overhead for GC.

It should be noted that Meyer advocates a GC facility that is
incremental in nature (not bursty), and can be explicitly turned on or
off when necessary (e.g. when real-time constraints exist).

Your turn. :-)
-- 
Gary Wright 					...!uunet!hsi!wright
Health Systems International                    wright@hsi.com

wright@hsi.UUCP (Gary Wright) (09/26/89)

In article <62546@tut.cis.ohio-state.edu> 
Golden Richard <grichard@cis.ohio-state.edu> writes:

>If the
>programmer can't define when a certain object can go poof, I suspect
>a serious lack of understanding of the problem at hand.

But I wonder, if the code is structured so that a programmer can
define when a certain object can go "poof", whether the program
can be understood by anyone other than the original programmer
and whether the code can be easily reused.

Unfortunately, I don't have a good example at hand (a major problem
with this discussion in general, as has been noted by others).
-- 
Gary Wright 					...!uunet!hsi!wright
Health Systems International                    wright@hsi.com

ram@wb1.cs.cmu.edu (Rob MacLachlan) (09/27/89)

From: wright@hsi.UUCP (Gary Wright)
Subject: Re:  Re: Garbage Collection & ADTs
Date: 25 Sep 89 15:14:32 GMT
[...]
>What we need to determine is if a language that supports GC allows programs
>to be expressed in a form that is "better" than a language without GC.

You are right on target here.  It is silly to claim that "problem X can only
be solved in language Y"; the interesting question is whether language Y
admits a solution to X that is:
 -- More efficient
 -- More robust
 -- More easily (quickly) written
 -- Easier to understand (to maintain)
 -- Aids code reuse [token concession to the nominal subject]
 -- More likely to be correct, etc.

>Perhaps a sufficiently smart compiler for a language that supports GC can
>figure out when GC can be avoided?  That is to say, the compiler can notice
>when the last reference to an object will be lost and can explicitly
>"dispose" of the object.

For a long time Lisp compilers have attempted to recognize when some objects
can be allocated according to a stack discipline, rather than being heap
allocated.  This has been done primarily with numbers, lists and closures.
The ML compiler has an interesting optimization along these lines: if all
references to an allocated object can be identified as being in the same
function, then the allocation can be replaced with a collection of temporary
variables.

Of course, both of these allocation optimizations are aimed at eliminating
cons-and-throw-away behavior within a single function.  Programmers used to
thinking of storage allocation as expensive would never write such code.
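
[The examples above are from Lisp and ML, but the same cons-and-throw-away
pattern is easy to write in Ada terms; Sum_Squares is a made-up
illustration.  Temp never escapes the function, so a compiler doing the
analysis described here could put the cells on the stack, or in registers,
instead of on the heap.]

   function Sum_Squares (N : Natural) return Natural is
      type Cell;
      type Cell_Ptr is access Cell;
      type Cell is record
         Value : Natural;
         Next  : Cell_Ptr;
      end record;
      Temp  : Cell_Ptr := null;   -- a throwaway list: built, summed, dropped
      Total : Natural  := 0;
   begin
      for I in 1 .. N loop
         Temp := new Cell'(Value => I * I, Next => Temp);
      end loop;
      while Temp /= null loop
         Total := Total + Temp.Value;
         Temp  := Temp.Next;
      end loop;
      return Total;
   end Sum_Squares;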

 Rob

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/27/89)

From scc@cl.cam.ac.uk (Stephen Crawley):
> Here are some problems where programmer-controlled storage management 
> would be very difficult or impossible.
> 
>   Interactive editing of cyclic graph data structures.  You have a 
>   heterogeneous cyclic graph data structure and the editor must be able
>   to make arbitrary sequences of changes to the data structure.  How
>   do you detect that a subgraph has become detached?   

   The fact that a subgraph has become detached does not imply that
   it is no longer part of a graph.  Not all graphs are fully connected.
   Therefore, it isn't possible to release the storage used until there
   is some sort of directive which indicates that a given set of nodes
   and/or arcs is to be destroyed. 

>   Dynamic loading & unloading of modules in a running program.  
>   [...] the dynamic loader needs to know if the module is
>   currently being 'used' ... e.g. if some thread is executing a 
>   procedure exported by the module or if some other part of the
>   application has salted away a reference into the module (e.g. a 
>   procedure pointer)
> 
> This problem arose when I was using Mesa / XDE to implement my entity system;
> a persistent object system.  I did get the dynamic loading to work ... and 
> it made a big performance difference, but dynamic unloading could not be 
> done safely, since the Mesa / XDE environment has no garbage collector.

   This is an operating-system problem, not an application problem.
   As such, the situation is not "applicable to a wide range of
   applications".  The solution is operating-system dependent; one
   approach might be to use idle time to scan the processes in the
   process queue to determine if there are any processes which were
   started while the old module was in effect which made use of it.

   Operating systems do a lot of things (pointer arithmetic, etc.)
   which application systems should not be doing; this falls into
   exactly that category.


   Bill Wolfe, wtwolfe@hubcap.clemson.edu

tynor@prism.gatech.EDU (Steve Tynor) (09/27/89)

In article <6589@hubcap.clemson.edu>
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu writes:
...


>From scc@cl.cam.ac.uk (Stephen Crawley):
>> Here are some problems where programmer-controlled storage management 
>> would be very difficult or impossible.
>> 
>>   Interactive editing of cyclic graph data structures.  You have a 
>>   heterogeneous cyclic graph data structure and the editor must be able
>>   to make arbitrary sequences of changes to the data structure.  How
>>   do you detect that a subgraph has become detached?   
>
>   The fact that a subgraph has become detached does not imply that
>   it is no longer part of a graph.  Not all graphs are fully connected.
>   Therefore, it isn't possible to release the storage used until there
>   is some sort of directive which indicates that a given set of nodes
>   and/or arcs is to be destroyed. 

But what if _in_this_case_, a detached subgraph is garbage and should be
deallocated? G.C. surely simplifies the program and would enhance readability.
If you can easily detect when to deallocate - more power to you. But, it's 
not always easy.

...
>>   Dynamic loading & unloading of modules in a running program.  
>>   [...] the dynamic loader needs to know if the module is
>>   currently being 'used' ... e.g. if some thread is executing a 
>>   procedure exported by the module or if some other part of the
>>   application has salted away a reference into the module (e.g. a 
>>   procedure pointer)
...
>   This is an operating-system problem, not an application problem.
>   As such, the situation is not "applicable to a wide range of
>   applications".  The solution is operating-system dependent; one
>   approach might be to use idle time to scan the processes in the
>   process queue to determine if there are any processes which were
>   started while the old module was in effect which made use of it.

Again, this sounds like garbage collection to me (this time the OS is garbage
collecting the unloaded module...)

>   Operating systems do a lot of things (pointer arithmetic, etc.)
>   which application systems should not be doing; this falls into
>   exactly that category.

Are compilers not mini-operating systems (certainly, the Ada run-time
environment is!)? I'd much rather have my compiler doing my pointer arithmetic
(and storage management) than to expect my application to do so. I'm not
against explicit allocation/deallocation - I just want an alternative for the
cases when it's either too convoluted or impossible.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Eschew Obfuscation.
                     
    Steve Tynor
    Georgia Tech Research Institute
    Artificial Intelligence Branch
    tynor@prism.gatech.edu

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/27/89)

From wright@hsi.UUCP (Gary Wright):
> Bertrand Meyer claims that programmer-controlled deallocation "is
> unacceptable for two reasons: security and complication of program
> writing."  By security, Meyer means that programmers [...] will
> make mistakes and will dispose of objects that still have active 
> references.  

     Unless they leave the task to an ADT instead.
 
> Complication refers to the fact that simply disposing
> of an object is not sufficient, all internal objects must also
> be disposed.  Meyer calls this the "recursive dispose problem":
> 
> 	This means that a specific release procedure must be
> 	written for any type describing objects that may refer
> 	to other objects.  The result will be a set of mutually
> 	recursive procedures of great complication.

   Not true.  The user supplies a Destroy procedure for whatever
   is being stored.  The ADT, in the course of destroying itself,
   will call upon the user's Destroy procedure to handle the user's
   data type.  The writer of the Destroy procedure need only consider
   destroying his ADT, since the user's Destroy procedure can be relied
   upon to destroy all lower levels.  It's really rather simple.
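
   In C++ terms, a minimal sketch of the idea (all names here are
   hypothetical, not taken from any particular component library):
   the container's Destroy worries only about its own nodes, and
   delegates each stored item to the Destroy routine supplied by the
   user at instantiation.

   // destroy.cc -- hypothetical sketch of level-at-a-time destruction

   typedef void (*DestroyProc)(void* item);     // user-supplied cleanup

   struct ListNode { void* item; ListNode* next; };

   struct List {
       ListNode*   head;
       DestroyProc destroy_item;                // supplied when the List is built

       List(DestroyProc d) : head(0), destroy_item(d) {}

       // The List's Destroy handles only the List's own level; the
       // user's routine is relied upon for everything below it.
       void Destroy() {
           while (head) {
               ListNode* n = head;
               head = n->next;
               destroy_item(n->item);           // user's Destroy handles lower levels
               delete n;
           }
       }
   };

   A List of Lists works the same way: the outer List simply receives
   a routine that invokes the inner List's Destroy as its item-cleanup
   procedure.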

> Instead of the applications programmer worrying about storage, the ADT
> designer must worry about it.  

   Precisely.  As an ADT designer, I consider the task of storage
   management to be among the least of my problems.  The difficult
   part is providing all the concurrency-related hard guarantees,
   which languages like Eiffel manage to avoid by not providing
   multitasking in the first place. 

> My hunch is that the distinction between an ADT designer and an 
> applications programmer is clear for objects like linked lists, 
> stacks etc. but that it is not so clear the farther away from the 
> basic data structures you get.  

   The distinction is clear: the ADT specification separates the
   application programmer from the ADT implementor, regardless of
   whether you perceive the structure as "basic" or not. 

> I wonder if the overhead related to having ADT's managing storage
> via copying, and redundant code (all the recursive dispose procedures)
> doesn't come close to the overhead for GC.

   Probably not, since this approach can exploit more information;
   it doesn't have to worry about figuring out whether or not a
   piece of space can be released, since it already knows it can.
   The Destroy procedures convey information which does not have
   to be rediscovered at great and continuing expense. 

> It should be noted, that Meyer advocates a GC facility that is
> incremental in nature (not bursty), and can be explicitly turned on or
> off when necessary (e.g. when real-time constraints exist).
> 
> Your turn. :-)

   We know that GC can coexist with managed components, as in Ada,
   where components can declare that the GC system (if any) must
   not operate on objects of a managed type.  We can easily agree
   that if GC is provided, there must be a way for managed types
   to turn it off as far as they are concerned, while leaving it 
   potentially on for others.  If a given piece of code contains 
   only managed types, then the compiler should notice this fact 
   and exclude GC code from the executable.

   The original point was that not all users can tolerate GC
   (real-time users in particular); on the other hand, all GC users
   can use managed components with no problems.  Therefore, if
   a component is to be designed for the widest possible audience,
   that component must manage its own storage.  If components are
   used which meet the highest possible standards, then we don't 
   have to worry about whether our components will stop working 
   when we do maintenance (which might introduce multitasking, 
   real-time operation, etc.) on our application; using GC introduces
   such weaknesses, in addition to giving an *avoidable* run-time cost. 


   Bill Wolfe, wtwolfe@hubcap.clemson.edu

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/27/89)

From amull@Morgan.COM (Andrew P. Mullhaupt):
> In this newsgroup, we find other programmers arguing that
> programmers should have no direct use for pointers, and so the language can
> determine the correct lifetime of dynamic data.

    Not exactly.  The argument is that pointers should only be 
    used by ADT implementors, not by application programmers.

> I suggest it is going to prove necessary for the languages which have
> this "Pascalian" point of view to close all loopholes (typecasts to 
> integer, etc.) to prevent the widespread use of C style perfidious 
> trickery to escape from sensible programming practice. (If you give 
> them an inch...)

    The real problem is not the elimination of mechanisms such as
    Ada's Unchecked_Conversion, but the maintenance of a software
    engineering perspective; bad code can be written in any language.


    Bill Wolfe, wtwolfe@hubcap.clemson.edu

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/27/89)

From tynor@prism.gatech.EDU (Steve Tynor):
>>   The fact that a subgraph has become detached does not imply that
>>   it is no longer part of a graph.  Not all graphs are fully connected.
>>   Therefore, it isn't possible to release the storage used until there
>>   is some sort of directive which indicates that a given set of nodes
>>   and/or arcs is to be destroyed. 
> 
> But what if _in_this_case_, a detached subgraph is garbage and should
> be deallocated? 

    OK, assume that we do want to kill detached subgraphs.  Which of
    the two (or more) connected components is the one we wish to keep? 
    
    If we assume that there is some distinguished node or arc which
    can never be deleted and which also serves to mark the "permanent"
    connected component, then presumably our graph abstraction will 
    allow us to mark one such node or arc as being endowed with these 
    magic properties.  Now we need an oracle which will tell us if
    a connected component contains the magic node or arc.  The ability
    to determine whether one node (or arc) is reachable from another
    is a generally useful capability which should probably be provided
    to the user anyway, particularly since the algorithm need only be
    looked up in a standard text on graph theory and typed in.  Since
    any service which is not explicitly used will be optimized away,
    the user pays no real price for having the service provided.
    Having provided the service to the user anyway, we can now use it
    ourselves to determine which component to throw away.  Now this is
    obviously a special-case response to this particular problem, but
    the problem is also a special case itself.
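
    For concreteness, here is a small sketch of such a reachability
    service.  The C++ notation is used only because it is compact; the
    Node layout and all names are hypothetical, not taken from any
    real graph component.

    // reach.cc -- hypothetical sketch: is `target' reachable from `start'?

    struct Node {
        enum { MAX_ARCS = 8 };
        Node* arcs[MAX_ARCS];   // outgoing (or undirected) arcs
        int   n_arcs;
        int   visited;          // scratch mark, cleared before each search
        Node() : n_arcs(0), visited(0) {}
    };

    // Straight depth-first search out of any graph theory text.
    int reachable(Node* start, Node* target)
    {
        if (start == target) return 1;
        start->visited = 1;
        for (int i = 0; i < start->n_arcs; ++i)
            if (!start->arcs[i]->visited && reachable(start->arcs[i], target))
                return 1;
        return 0;
    }

    The graph abstraction clears the marks, asks whether the magic node
    is reachable from a representative of each component, and destroys
    any component for which the answer is no.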

    Even if computing it ourselves has the same computational complexity
    as computing it with a garbage collection algorithm, there is still
    a gain in that we can search a specific area at a specific time and
    therefore improve the predictability of our service.  We know that
    there will be no operation whose time complexity will be totally out
    of proportion to the size of our graph because we are paying for all
    garbage everywhere on the machine to be collected.  

    Finally, there is the problem of the GC system repeatedly being
    invoked (at great expense) when the amount of storage remaining
    becomes fairly small.  With a managed system, the user is given
    an opportunity to strategically throw away data structures, and
    thereby free up some space that the garbage collector could not
    possibly take because it is still "in use".
 
 
    Bill Wolfe, wtwolfe@hubcap.clemson.edu
 

djones@megatest.UUCP (Dave Jones) (09/27/89)

From article <62546@tut.cis.ohio-state.edu>, by grichard@hockey.cis.ohio-state.edu (Golden Richard):
>
> If the programmer can't define when a certain object can go poof, I suspect
> a serious lack of understanding of the problem at hand.
> 

Hear, hear.

But as I pointed out earlier, it is also useful to have stack-objects
and heap-objects that allow you to reclaim a whole bunch of objects all
at one go.
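
To make that concrete, here is a bare-bones arena (or pool) sketch in
C++, with hypothetical names and nothing taken from a real library.
Everything allocated from the arena is reclaimed by a single call, or
automatically when the arena goes out of scope:

// arena.cc -- hypothetical sketch of "reclaim a whole bunch at one go"

#include <stdlib.h>

class Arena {
    struct Block { Block* next; };
    Block* blocks;                      // chain of raw allocations
public:
    Arena() : blocks(0) {}
    ~Arena() { release_all(); }         // stack-like: dies with its scope

    void* alloc(unsigned int sz) {      // carve a chunk and remember it
        Block* b = (Block*) malloc(sizeof(Block) + sz);
        b->next = blocks;
        blocks = b;
        return (void*) (b + 1);         // (alignment refinements omitted)
    }

    void release_all() {                // one call frees the whole bunch
        while (blocks) {
            Block* b = blocks;
            blocks = b->next;
            free(b);
        }
    }
};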

djones@megatest.UUCP (Dave Jones) (09/27/89)

From article <599@hsi86.hsi.UUCP>, by wright@hsi.UUCP (Gary Wright):
..
> 
> Well, I have *never* encountered a situation that absolutely demanded using
> a high level language.   That is why I still do all my programming in
> assembly. :-)
> 

Sorry, but I'm too tired of this assembler-hll analogy for a smiley to save
you. Grumble. Ggrrrrrrrrrrrrmpf.  Okay, okay, here: :-)  (Sort of.)

The analogy is particularly inappropriate in this context, because
Assembler and HLL are two techniques for preparing a program, whereas
automatically garbage-collected and explicitly garbage-collected
programs are different end-results.*

Automated garbage collection is runtime overhead.

It will take quite a bit of convincing to make me believe
that it is universally necessary or desirable.  I'm not saying it's
impossible, only that it's going to take better arguments than I've seen
so far. So in the mean time, I want the option of not using it. So
far I have used it approximately, now let's see.. approximately zero
times, give or take once.


*  The principal advantage of high-level languages over assembler
   is usually that it makes the program portable. It is a myth that assembly
   language programming is necessarily tedious and slow. Back in the mid to
   late seventies, there were macro-assembler and macro packages that made
   assembly language programming virtually as easy as programming in high level
   languages. I wrote one once. It handled procedure call/return, nested
   loops, structures, and all that sort of languagy kind of stuff quite nicely.
   Those macro packages were the evolutionary antecedents of HLLs.

   Languages like C++ do automate some rather tedious name-space manipulation,
   and that is another win. But many of the even "higher" level languages kind
   of miss the boat by making the syntax of the language reflect library-routine
   functionality.  I like new AWK, for example, but I really
   wish it were written as C++ classes, so that I could integrate it with
   my own stuff.

jesup@cbmvax.UUCP (Randell Jesup) (09/27/89)

In article <2079@hydra.gatech.EDU> tynor@prism.gatech.EDU (Steve Tynor) writes:
>In article <6589@hubcap.clemson.edu>
>>From scc@cl.cam.ac.uk (Stephen Crawley):
>>> Here are some problems where programmer-controlled storage management 
>>> would be very difficult or impossible.
>>> 
>>>   Interactive editing of cyclic graph data structures.  You have a 
>>>   heterogeneous cyclic graph data structure and the editor must be able
>>>   to make arbitrary sequences of changes to the data structure.  How
>>>   do you detect that a subgraph has become detached?   
...
>But what if _in_this_case_, a detached subgraph is garbage and should be
>deallocated? G.C. surely simplifies the program and would enhance readability.
>If you can easily detect when to deallocate - more power to you. But, it's 
>not always easy.

	But GC is unlikely to reclaim that storage.  Yes, the graph no longer
has pointers to the disconnected subgraph.  However, the subgraph may be
circular (you said it was cyclic), so all the nodes of the subgraph may still
have references to them.  Can current GC systems recognize such a loop and
remove it in its entirety?  How complex can such a loop be and still be
recognized?

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.cbm.commodore.com  BIX: rjesup  
Common phrase heard at Amiga Devcon '89: "It's in there!"

scc@cl.cam.ac.uk (Stephen Crawley) (09/27/89)

Andrew P. Mullhaupt writes:
>> In this newsgroup, we find other programmers arguing that
>> programmers should have no direct use for pointers, and so the
>> language can determine the correct lifetime of dynamic data.

Bill Wolfe replies:
> Not exactly.  The argument is that pointers should only be 
> used by ADT implementors, not by application programmers.

Some of the best-designed programming languages don't have explicit 
pointers at all!!  CLU, Eiffel and ML are good examples.

-- Steve

scc@cl.cam.ac.uk (Stephen Crawley) (09/27/89)

From wtwolfe@hubcap.clemson.edu:
>From tynor@prism.gatech.EDU (Steve Tynor):
>>From wtwolfe@hubcap.clemson.edu:
>>> The fact that a subgraph has become detached does not imply that
>>> it is no longer part of a graph.  Not all graphs are fully connected.
>>> Therefore, it isn't possible to release the storage used until there
>>> is some sort of directive which indicates that a given set of nodes
>>> and/or arcs is to be destroyed. 
>> 
>> But what if _in_this_case_, a detached subgraph is garbage and should
>> be deallocated? 
>
> OK, assume that we do want to kill detached subgraphs.  Which of
> the two (or more) connected components is the one we wish to keep? 
>    
> If we assume that there is some distinguished node or arc which
> can never be deleted and which also serves to mark the "permanent"
> connected component, then presumably our graph abstraction will 
> allow us to mark one such node or arc as being endowed with these 
> magic properties.  Now we need an oracle which will tell us if
> a connected component contains the magic node or arc.  
> [...] Having provided the service to the user anyway, 
> we can now use it ourselves to determine which component to throw 
> away.  Now this is obviously a special-case response to this 
> particular problem, but the problem is also a special case itself.

In YOUR experience, the problem may be a special case.  In my
experience and that of a number of other posters this problem arises 
frequently in one form or another.

> Even if computing it ourselves has the same computational complexity
> as computing it with a garbage collection algorithm, 

.. which is not necessarily true.  You are assuming that we have all
of the options open to the garbage collector guru; e.g. access to 
VM page tables, micro-code, etc.   Also, you are assuming that the
grungy tricks that sophisticated GC algorithms play can be expressed
in Ada and translated by the Ada compiler into efficient code.

Please don't imagine that it is easy to get garbage collection overheads
down to 5%!!

> [...] there is still
> a gain in that we can search a specific area at a specific time and
> therefore improve the predictability of our service.

There is nothing to stop the programmer from inserting a pragma to
force the GC to reclaim some space at a given point in a program.  

> We know that
> there will be no operation whose time complexity will be totally out
> of proportion to the size of our graph because we are paying for all
> garbage everywhere on the machine to be collected.

You are making invalid assumptions about the garbage collector again.

> Finally, there is the problem of the GC system repeatedly being
> invoked (at great expense) when the amount of storage remaining
> becomes fairly small.

Modern garbage collectors can avoid this problem in all but
extreme cases.  Anyway, it is a bad idea to try to run a program 
that allocates memory dynamically with "just enough" memory,
since under some circumstances "just enough" might actually be 
"just not enough".

> ... With a managed system, the user is given
> an opportunity to strategically throw away data structures, and
> thereby free up some space that the garbage collector could not
> possibly take because it is still "in use".

Bogus argument!  The case you are talking about is that where the 
programmer has a pointer to a data structure in a variable that is 
still in scope, but is not conceptually in use.  You suggest that
in the case of non-GC'ed storage the programmer could free the space
early by calling the deallocator.  Well in the GC'ed case, the 
programmer can arrange that the GC can free the data structure by
assigning NIL to the variable.    
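
In a C-family notation, a trivial (and hypothetical) sketch of the same
thing:

    struct Graph { int nodes[100]; };

    void done_with_it()
    {
        Graph* g = new Graph;   // the variable is still in scope ...
        // ... but we are conceptually finished with the graph, so:
        g = 0;                  // the collector is now free to reclaim it
    }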

-- Steve

wright@hsi.UUCP (Gary Wright) (09/27/89)

In article <6591@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu writes:
>From wright@hsi.UUCP (Gary Wright):
>> Bertrand Meyer claims that programmer-controlled deallocation "is
>> unacceptable for two reasons: security and complication of program
>> writing."  By security, Meyer means that programmers [...] will
>> make mistakes and will dispose of objects that still have active 
>> references.  
>
>     Unless they leave the task to an ADT instead.

We all agree that writing an ADT that correctly manages its own storage
is a difficult task.  One of the benefits gained by using ADT's is that
of reuse: the user of an ADT does not have to re-create solutions to
problems that have been solved once and for all by the ADT provider.
Why not use the same reasoning to state that GC solves the problem
once and for all, so that each ADT designer does not have to re-create
solutions for memory management?

> 
>> Complication refers to the fact that simply disposing
>> of an object is not sufficient, all internal objects must also
>> be disposed.  Meyer calls this the "recursive dispose problem":
>> 
>> 	This means that a specific release procedure must be
>> 	written for any type describing objects that may refer
>> 	to other objects.  The result will be a set of mutually
>> 	recursive procedures of great complication.
>
>   Not true.  The user supplies a Destroy procedure for whatever
>   is being stored.  The ADT, in the course of destroying itself,
>   will call upon the user's Destroy procedure to handle the user's
>   data type.  The writer of the Destroy procedure need only consider
>   destroying his ADT, since the user's Destroy procedure can be relied
>   upon to destroy all lower levels.  It's really rather simple.

So you agree that there is a recursive dispose problem but you believe that 
it isn't that complicated?  Writing these dispose procedures is
straightforward but tedious and *boring*.  This is exactly the type
of situation that leads to errors.

>
>> Instead of the applications programmer worrying about storage, the ADT
>> designer must worry about it.  
>
>   Precisely.  As an ADT designer, I consider the task of storage
>   management to be among the least of my problems.  The difficult
>   part is providing all the concurrency-related hard guarantees,
>   which languages like Eiffel manage to avoid by not providing
>   multitasking in the first place. 
>

Let's stick to GC and leave concurrency for another day.  Granted, this
is an important issue and will become more important, but we should
keep this discussion focused.  Also, I will admit to not being
as versed in this area as I would like to be.

>> My hunch is that the distinction between an ADT designer and an 
>> applications programmer is clear for objects like linked lists, 
>> stacks etc. but that it is not so clear the farther away from the 
>> basic data structures you get.  
>
>   The distinction is clear: the ADT specification separates the
>   application programmer from the ADT implementor, regardless of
>   whether you perceive the structure as "basic" or not. 
>

You seem to be implying that there are only two layers to any given
program, the applications layer, and the ADT's used by the application.
In general, there can be any number of layers.  If you look at a language
like Eiffel or Smalltalk, all components of a program are classes (ADTs).
There is no language distinction between how a stack is constructed and how
any other component of the program is constructed.  The problems that
an ADT designer has to deal with are not unique to the ADT's at the bottom
of the hierarchy.

>   The original point was that not all users can tolerate GC
>   (real-time users in particular);

I have stated that GC may not be appropriate in certain cases
(real-time).  Others have said that there is work being done on
real-time GC.  Are you saying that some users can "tolerate" GC?  If
so, in support of what type of users have your arguments against GC
been made?  If you have been arguing against GC for real-time users
exclusively, that has not been clear.  If you have in mind another set
of users who cannot tolerate GC, please be specific.

>   on the other hand, all GC users
>   can use managed components with no problems.  

This is your opinion.  I have described some of the problems with
managed components regarding the complexity of constructing ADTs,
which amounts to the complexity of the entire program if you consider
a program to be simply a collection of related ADTs.  Others have
described problems with general graph structures.

An area that hasn't been discussed here is reusing components via 
inheritance.  I really haven't come to any conclusions regarding
the interaction of memory management and inheritance.  Anybody else
have some ideas?

Another area that concerns me that hasn't been discussed is the interaction
between exceptions and memory management.  My understanding is that
when an exception is raised, execution may resume after any number of
"scopes" have been exited.  Will the objects in those scopes be
guaranteed to be destroyed?  If exceptions need to be handled in the
scope in which they were raised, partly because of this memory
leakage, has not the lack of GC caused the program to become more
complex?  And if an exception is always handled in the scope in which
it is raised, why bother with the exception mechanism at all?  (I am
referring to programmer-defined exceptions such as stack underflow or
overflow, as opposed to hardware exceptions etc., which should probably
be handled differently.)
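
For example, here is a hypothetical sketch, assuming a C++-like language
with exceptions and explicit deallocation (none of this is from a real
program):

    struct Table { int cells[1000]; };

    void transform()
    {
        Table* scratch = new Table;   // explicitly managed scratch storage
        // ... work that discovers it cannot proceed ...
        throw "underflow";            // unwinds out of this scope,
        delete scratch;               // so this is never reached: a leak
    }

    int main()
    {
        try { transform(); }
        catch (const char*) {
            // the handler is a scope (or several) away; without GC,
            // nothing reclaims the Table allocated in transform()
        }
        return 0;
    }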

>   Therefore, if
>   a component is to be designed for the widest possible audience,
>   that component must manage its own storage.  If components are
>   used which meet the highest possible standards, then we don't 
>   have to worry about whether our components will stop working 
>   when we do maintenance (which might introduce multitasking, 
>   real-time operation, etc.) on our application; using GC introduces
>   such weaknesses, in addition to giving an *avoidable* run-time cost. 

Perhaps GC does introduce problems with reuse with regard to
multi-tasking and real-time operations, perhaps not; I'm not sure.  I
am anxious to see how multi-tasking will be handled in Eiffel.  ISE has
indicated that they are working on incorporating it into the OOP
framework.  The run-time cost of GC *may* be avoided in certain cases
by using ADT-controlled memory management.

The point is that there *are* tradeoffs.  Not using GC "might introduce"
reuse problems also.  What is the cost associated with "avoiding" GC?
-- 
Gary Wright 					...!uunet!hsi!wright
Health Systems International                    wright@hsi.com

ted@nmsu.edu (Ted Dunning) (09/28/89)

In article <6589@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) writes:

   From scc@cl.cam.ac.uk (Stephen Crawley):
   > Here are some problems where programmer-controlled storage management 
   > would be very difficult or impossible.
   > 
   >   Interactive editing of cyclic graph data structures.  You have a 
   >   heterogeneous cyclic graph data structure and the editor must be able
   >   to make arbitrary sequences of changes to the data structure.  How
   >   do you detect that a subgraph has become detached?   

      The fact that a subgraph has become detached does not imply that
      it is no longer part of a graph.  Not all graphs are fully connected.
      Therefore, it isn't possible to release the storage used until there
      is some sort of directive which indicates that a given set of nodes
      and/or arcs is to be destroyed. 

good point.  of course, languages which have true garbage collection
can handle this problem, too, while bill wolfe seems to think he
couldn't handle it in his paradigm.

   >   Dynamic loading & unloading of modules in a running program.  
   >   [...] the dynamic loader needs to know if the module is
   >   currently being 'used' ... e.g. if some thread is executing a 
   >   procedure exported by the module or if some other part of the
   >   application has salted away a reference into the module (e.g. a 
   >   procedure pointer)
	...

      This is an operating-system problem, not an application problem.
      As such, the situation is not "applicable to a wide range of
      applications".  The solution is operating-system dependent; one
      approach might be to use idle time to scan the processes in the
      process queue to determine if there are any processes which were
      started while the old module was in effect which made use of it.

this won't work in most cases since there will be at least one
ancestral process that started very early and runs essentially forever.

      Operating systems do a lot of things (pointer arithmetic, etc.)
      which application systems should not be doing; this falls into
      exactly that category.

but of course, this problem is trivial to handle in all the scheme
dialects where continuations are available as first class objects, and
where the garbage collector understands how to collect references from
inside continuations.  not only that, but the solution is portable.


--
ted@nmsu.edu
			remember, when extensions and subsets are outlawed,
			only outlaws will have extensions or subsets

ted@nmsu.edu (Ted Dunning) (09/28/89)

In article <8012@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:


   But GC is unlikely to reclaim that storage.  Yes, the graph no
   longer has pointers to the disconnected subgraph.  However, the
   subgraph may be circular (you said it was cyclic), so all the nodes
   of the subgraph may still have references to them.  Can current GC
   systems recognize such a loop and remove it in its entirety?  How
   complex can such a loop be and still be recognized?

a modern gc only worries about references from currently accessible
variables, and thus an unreferenced cycle can be arbitrarily complex
and still be reclaimed.  reference counting systems have trouble with
loops and since they usually have only a few bits of reference count,
they also tend to saturate easily.  for this reason, they will
occasionally stop and do a normal garbage collection.
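
a toy sketch of why tracing collectors don't care about cycles (purely
illustrative, with made-up names; no real collector is this naive):

    // toy mark-and-sweep; marking starts from the roots only

    #include <stdio.h>

    struct Obj {
        Obj* refs[2];     // up to two outgoing references
        int  marked;
        Obj* next;        // every object lives on one allocation list
    };

    static Obj* all_objects = 0;

    static Obj* alloc_obj()
    {
        Obj* o = new Obj;
        o->refs[0] = o->refs[1] = 0;
        o->marked = 0;
        o->next = all_objects;
        all_objects = o;
        return o;
    }

    static void mark(Obj* o)              // trace from a root
    {
        if (!o || o->marked) return;
        o->marked = 1;
        mark(o->refs[0]);
        mark(o->refs[1]);
    }

    static int sweep()                    // reclaim everything unmarked
    {
        int freed = 0;
        Obj** p = &all_objects;
        while (*p) {
            Obj* o = *p;
            if (o->marked) { o->marked = 0; p = &o->next; }
            else           { *p = o->next; delete o; ++freed; }
        }
        return freed;
    }

    int main()
    {
        Obj* root = alloc_obj();          // the only root
        Obj* a = alloc_obj();
        Obj* b = alloc_obj();
        a->refs[0] = b; b->refs[0] = a;   // a cycle with no path from the root
        mark(root);
        printf("reclaimed %d objects\n", sweep());   // prints 2
        return 0;
    }

the cycle between a and b is never marked because marking starts at the
root, so the sweep reclaims both of them no matter how tangled the cycle is.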


--
ted@nmsu.edu
			remember, when extensions and subsets are outlawed,
			only outlaws will have extensions or subsets

dl@g.g.oswego.edu (Doug Lea) (09/28/89)

First a warning,

    I do quite a lot of C++ programming

and a perhaps slightly contentious definition,

    I take OO programming to be that which is in part or whole concerned
              -              -
             |  construction  |
    with the |  mutation      | of objects
             |  inspection    |  
             |  destruction   |
              -              -

    (And, of course, OO design to be concerned with the specification,
    definition, interrelationships, etc., of such objects...)


Only half-facetiously, one could make the vicious claim that GC
removes programmer control over 1/4 of the OO programming paradigm: In
C++, at least, the notion of destruction of an object may well require
some structured, active, resource deallocation, and need not just
consist of passive reclamation of memory cells.  The classic example
where such control is useful is a Window object that needs to erase
itself from view upon destruction. In a GC-based system, the
mechanisms for implementing such actions seem less than natural --
either the programmer would need to `logically' destroy the window,
and then let the collector sometime later reclaim space, or perhaps
the collector could be designed to do all kinds of storage reclamation
associated with the object at some indeterminate time.

My points are just that

    Structured destruction is a good idea. (As is structured construction.)

    Garbage collection can get in the way of this.

    And again, half-facetiously:

        GC is sometimes a byproduct of functional programming methodologies
        that allow programmers to pay attention only to `values', and
        not the objects required in order to arrive at them.

All that said, I think GC is a very useful tool! Really!
It is a serious contender if
    Destruction consists only of releasing storage
and
    The lifetimes of objects are so unstructured/complex that manual
    reclamation is either slower, or more error-prone than GC.

This kind of situation comes up a lot.

But often a redesign that forces a more structured lifetime discipline
supporting simpler techniques is at least as attractive. I take that as
Wolfe's main point. I think it's a good one.

How about a real example?  For the amusement of C++ fans following
this discussion, I'll append one below.  It's a fun variation of a
classic kind of situation where GC is commonly used. There are Cells
that may be linked together to create computational pipelines. While I
set it up to use GC, other choices are possible. Here are some of the
not-terribly-profound issues that came to my mind while writing it as
a classroom example:

    * The main() doesn't really exercise the collector. One chain
        is created and destroyed during the run. 

        * If this were a one-shot closed design, culminating only in a
            cute but slow prime number sieve program, the best
            solution is probably to do nothing, letting all
            deletion occur at program termination, since there are
            no other useful aspects of destruction.

        * If this were still self-contained, but a variation of
            the current main() were used as a callable function,
            then it would make sense to insert code to mark the
            base of the chain, use a stack-based allocator, and
            release it on destruction via modified constructor/
            destructor code. You'd also want to insert code to
            ensure that at most one chain were active at a time.

    * In fact, the *current* code using Sieve consists only of single 
        chains of Cells. If only such applications were supported,

        * One could delete space via a virtual no-op destructor
            in Cell, and one in Filter like 
                ~Filter() { delete src; }
            to cause a chain of deletions to occur on destruction.
    
        * Except that the Cells do not know if their sources were
            dynamically allocated or not, so `delete src' can be an
            error if the source is a local. Perhaps, Cells could
            refuse to support `local' creation, and only exist
            dynamically. Alternatively some bookkeeping could be added
            inside each Cell to record whether it was dynamically
            allocated, and constructors and destructors would need to
            use it. The former is probably a better strategy, since
            the implementation is not set up to cleanly support
            by-value argument passing of Cells, etc., anyway, and
            there seems to be every reason not to do so. 

            * Digression: Actually, this is all a bit of a pain in
                C++.  In order to avoid clients having to say `new
                Whatever', and thereafter dealing with pointers, you'd
                need to set up a parallel set of classes, each of
                which served as pointers to the `real' objects of the
                corresponding classes, and then took care of the
                operator `new' stuff on creation, etc.  It's a case
                where the convention taken in Eiffel, Smalltalk, etc.,
                that names are by default just bound to references to
                objects, not the objects themselves, would be more
                convenient. In these kinds of situations, C++
                programmers often settle for requiring clients to use
                pointers or references explicitly.  But even this has
                drawbacks, since, while you can force clients to only
                create objects via `new', the only means for doing this
                leaves the restriction as a mostly run-time, not
                compile-time error.

    * But it is very possible for someone to make two Filters
        point to the same source, or to create new subclasses with
        multiple sources or sinks.

        * The simple GC solution would continue to work in such cases.

        * In order to deal with this and still keep virtual
            destructors, some form of reference counting would be
            required.  Even this won't suffice if forms of circular
            paths that give reference counters problems were created,
            but such possibilities seem extremely unlikely, and
            defensible as a documented restriction, akin to the
            built-in limitation that only, say, 32767 references are
            allowed before an object is considered permanent.  The
            work involved in adapting this code to use reference
            counts is not great, but is greater than using GC.  The
            overhead for reference counting vs GC would probably be
            close, depending on the quality of the implementations. In
            either case, storage management would probably amount to
            a tiny percentage of execution time.

        * Except that it makes no sense in *this* code for anyone
            to hook up two Filters to the same source, so perhaps
            Cells should record whether they've been used as sources
            and refuse to comply if they are re-attached. But this
            is reference-counting, just used in a different way.

        * Except that maybe someone might want to make a subclass of
            Cells (e.g., random number generators) that would be
            useful to multiply tap.

        * In either case, reference-counting might be more flexible,
            since it could be used both for allocation and to guard
            against some kinds of Cells being multiply tapped when
            this would be an error.

    * What about programmers creating subclasses that display
        themselves in action on a screen, or record their
        actions in a log file? In such cases the reference-counting
        version seems necessary, since destruction would require
        real-time screen updates or file manipulation. 

So, perhaps, reference-counting would be more appropriate, since it
may better serve the goal of a robust, reusable, extensible design.
I'll leave implementation as an exercise to the reader (in other
words, I'm too lazy to do it out right now!)
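
(For anyone who does want to try it, the counting machinery I have in
mind would look roughly like this; hypothetical, and not wired into the
sieve code below:)

    class RCCell {                        // reference-counted analogue of Cell
        int refs;                         // how many holders point at me
    public:
        RCCell() : refs(0) {}
        virtual ~RCCell() {}              // virtual: destruction may do real work

        static RCCell* attach(RCCell* c) { if (c) ++c->refs; return c; }
        static void    detach(RCCell* c) { if (c && --c->refs == 0) delete c; }
    };

    class RCFilter : public RCCell {      // a Filter that holds onto its source
    protected:
        RCCell* src;
    public:
        RCFilter(RCCell* s) : src(RCCell::attach(s)) {}
        ~RCFilter() { RCCell::detach(src); }   // detaching chains up the pipeline
    };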

The example has no particular deep moral, except to reflect my feeling
that thinking about the full lifetimes of objects is good for you! It
helps focus attention on various aspects of the semantics of new types
in ways you otherwise might not have thought much about. And further,
that GC-oriented issues and reusability *do* interact, but that GC need
not always be chosen on such grounds.

I suppose it's worth a mention that the issues based on extensibility
via inheritance (i.e., design reuse) don't much come up in Ada.
        
Perhaps others could offer other analyses of this or other examples.

------------------------------- sieve.cc

// Fun with the prime number sieve.
// Based loosely on R. Sethi ``Programming Languages'', sec 6.7


// link to Boehm's comp.sources.unix C GC routines ...

   extern "C" void  gc_init(); 
   extern "C" void* gc_malloc(unsigned int);

class GC      // ... but wrap gc routines as a C++ class
{
public:
               GC()                   { gc_init(); }
         void* alloc(unsigned int sz) { return gc_malloc(sz); }
};



class Cell     // an object with state and a way to update and report it
{
private:
  static   GC  gc;            // one collector for all Cells
protected:
          int  val;           // the value to output next
  virtual void update() {}    // the updater; default as no-op

public:
               Cell(int initial=0) :val(initial) {} 

          int  next()         { update(); return val; } // do one step

                              // set class allocation ops to use gc
         void* operator new(unsigned int sz) { return gc.alloc(sz); }
         void  operator delete(void*) {}
};

GC Cell::gc = GC();          // static class member initialization
                             // must be done outside class decl in C++


class Counter : public Cell  // output consecutive integers
{
protected:
         void update()       { ++val; }

public:
              Counter(int initial=0) :Cell(initial) {}
};


class Filter : public Cell  // transform input from another Cell
{
protected:
         Cell* src;          // input source
          int  inp;          // last input
         void  get()         { inp = src->next(); }

public:
               Filter(Cell* s, int initial=0) :Cell(initial), src(s), inp(0) {}
};


class ModFilter : public Filter // hold last input not divisible by divisor
{
private:
         int  divisor;
         int  divides()      { return inp % divisor == 0; }
public:
              ModFilter(Cell* s, int d) : Filter(s), divisor(d) {}
         void update()       { do get(); while (divides()); val = inp; }

};


class Sieve : public Filter  // Serve as terminal node of a prime number sieve
{
public:
              Sieve()        :Filter(new Counter(1)) {}
         void update()       { get(); src = new ModFilter(src, val = inp); }
};


#include <stream.h>
#include <stdlib.h>          // for atoi(), abs(), exit()

int main(int argc, char** argv) // exercise everything a little
{
  if (argc < 2) 
  { 
    cerr << "error: enter desired number of primes as program argument\n"; 
    exit(1); 
  }

  int n = abs(atoi(argv[1]));
  Sieve s;
  for (int i = 0; i < n; ++i) cout << s.next() << "\n";
}

--
Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2367
email: dl@oswego.edu              or dl%oswego.edu@nisc.nyser.net
UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl

scc@cl.cam.ac.uk (Stephen Crawley) (09/28/89)

Randell Jesup writes:

> But GC is unlikely to reclaim that storage.  Yes, the graph no longer
> has pointers to the disconnected subgraph.  However, the subgraph may be
> circular (you said it was cyclic), so all the nodes of the subgraph may still
> have references to them.  Can current GC systems recognize such a loop and
> remove it in its entirety?

Of course.  Only brain-dead garbage collectors that rely exclusively on
reference counts can't get rid of cycles.  (Reference counting garbage
collectors are bad news anyhow, since they entail the overhead of a ref
count inc/dec on every assignment.)

> How complex can such a loop be and still be recognized?

Arbitrarily complex.

-- Steve

wright@hsi.UUCP (Gary Wright) (09/28/89)

In article <8309@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:
>From article <599@hsi86.hsi.UUCP>, by wright@hsi.UUCP (Gary Wright):
>..
>> 
>> Well, I have *never* encountered a situation that absolutely demanded using
>> a high level language.   That is why I still do all my programming in
>> assembly. :-)
>> 
>
>Sorry, but I'm too tired of this assembler-hll analogy for a smiley to save
>you. Grumble. Ggrrrrrrrrrrrrmpf.  Okay, okay, here: :-)  (Sort of.)
>

But you missed the point.  It doesn't matter what specific analogy is
used.  I was trying to point out the logical flaw in the argument.  The
point is that just because you have never used a technique, or have
never found a situation for which your current techniques were
insufficient, doesn't mean that another technique couldn't be used or
might not make your job easier.

[comments regarding powerful macro assemblers omitted. GRW]

>   Those macro packages were the evolutionary antecedents of HLLs.

So you were using new techniques to make your job easier at *perhaps* a loss
in generality and raw performance.  That is the analogy I was trying to make.

-- 
Gary Wright 					...!uunet!hsi!wright
Health Systems International                    wright@hsi.com

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/29/89)

From scc@cl.cam.ac.uk (Stephen Crawley):
>> ... With a managed system, the user is given
>> an opportunity to strategically throw away data structures, and
>> thereby free up some space that the garbage collector could not
>> possibly take because it is still "in use".
> 
% Bogus argument!  The case you are talking about is that where the 
% programmer has a pointer to a data structure in a variable that is 
% still in scope, but is not conceptually in use.  You suggest that
% in the case of non-GC'ed storage the programmer could free the space
% early by calling the deallocator.  Well in the GC'ed case, the 
% programmer can arrange that the GC can free the data structure by
% assigning NIL to the variable.    

    No, I was referring to space that is still in use, but which
    has marginal value and could be chucked if necessary.  

    
    Bill Wolfe, wtwolfe@hubcap.clemson.edu
 

madany@m.cs.uiuc.edu (09/29/89)

/* Written  8:29 pm  Sep 26, 1989 by jesup@cbmvax.UUCP in m.cs.uiuc.edu:comp.sw.components */
>> 	But GC is unlikely to reclaim that storage.  Yes, the graph no longer
>> has pointers to the disconnected subgraph.  However, the subgraph may be
>> circular (you said it was cyclic), so all the nodes of the subgraph may still
>> have references to them.  Can current GC systems recognize such a loop and
>> remove it in its entirety?  How complex can such a loop be and still be
>> recognized?
/* End of text from m.cs.uiuc.edu:comp.sw.components */

GC does reclaim the storage.  That's one of the big reasons why people 
advocate it.  A good garbage collector will reclaim loops of ANY complexity
IFF they are garbage.

I don't have the best reference handy, but page 682 of "Smalltalk-80
The Language and its Implementation" may give you a hint as to how
such GC works.

jans@tekgvs.LABS.TEK.COM (Jan Steinman) (09/30/89)

<..A good garbage collector will reclaim loops of ANY complexity IF they are 
garbage.  I don't have the best reference handy, but page 682 of "Smalltalk-80 
The Language and its Implementation" may give you a hint as to how such GC 
works.>

Smalltalk-80, as described in the "Blue Book", uses a combination of reference 
counting and mark-sweep.  Few, if any, modern implementations use such an 
inefficient scheme.  "Generation scavenging" seems to be the latest and 
greatest, and it also reclaims arbitrarily long cyclic garbage.  It has some 
drawbacks if loops contain garbage of widely differing ages, but evidence 
suggests (at least in Smalltalk) that cyclic garbage tends to be contemporary.

* Caudill, P "A Third Generation Smalltalk-80 Implementation", OOPSLA 86 
Proceedings, Oct 1986.
* Deutsch, LP "An Efficient Incremental Automatic Garbage Collector", CACM Sep 
1976.
* Lieberman, H "A Real-Time Garbage Collector Based on the Lifetimes of 
Objects", CACM, Jun 1983.
* Ungar, D "Generation Scavenging: A Non-disruptive High Performance Storage 
Reclamation Algorithm", SIGSOFT/SIGPLAN Proceedings, Apr 1984.

							   Jan Steinman - N7JDB
						  Electronic Systems Laboratory
					Box 500, MS 50-370, Beaverton, OR 97077
						(w)503/627-5881 (h)503/657-7703