billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/18/89)
From gateley@m2.csc.ti.com (John Gateley):
> There exists at least one real-time garbage collection algorithm
> (don't have the reference right now, email if you want it).

   I recently read of a linear-time algorithm (ignoring constant
   factors, of course) -- the only problem is that it required sixteen
   times the amount of space actually needed by the program!!!

   The point is, any system which uses GC is going to be considerably
   less efficient (and constant factors are important here!) than a
   competing system in which the fact that an object is no longer
   needed is directly communicated to the storage management system.

   Now the advocates of GC frequently claim that space management is
   too heavy a burden to be placed on the application programmer, and
   this may well be true.  However, there is a much better way to deal
   with this problem.  All that is necessary is to encapsulate the
   details of storage management within abstract data types, such that
   the destruction algorithm is integrated into the block exit
   mechanism.  By doing this, we place a level of abstraction between
   the application programmer and the details of storage management.
   The use of pointers is confined to the ADT's implementation, and
   the application programmer is shielded from all the resulting
   complexities.  The interface also simplifies the work of the ADT's
   implementor; the destruction algorithm is usually quite
   straightforward.

   Worrying about GC is essentially barking up the wrong tree; what we
   need is better reuse mechanisms, so that we can also exploit the
   information that a given variable is no longer needed, which is
   supplied implicitly (with no special effort on the part of the
   application programmer) upon each and every block exit.  Thus, we
   can have our programming convenience WITHOUT eating our memory or
   CPU cycles, and thereby keep everyone happy (except perhaps those
   hackers who get a charge out of playing with pointers).
Bill Wolfe, wtwolfe@hubcap.clemson.edu
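[The block-exit mechanism Wolfe describes maps directly onto C++ destructors. A minimal sketch, with illustrative names rather than any particular library, of an ADT that does all dynamic allocation INTERNALLY and whose destruction algorithm runs automatically at block exit:]

```cpp
#include <cassert>
#include <cstddef>

// An ADT whose pointers never escape its implementation.  The
// destructor is the "destruction algorithm integrated into the block
// exit mechanism": when an IntStack leaves scope, its storage goes too.
class IntStack {
    int*   data;
    std::size_t cap;
    std::size_t len;
public:
    IntStack() : data(new int[8]), cap(8), len(0) {}
    ~IntStack() { delete[] data; }        // runs on block exit
    IntStack(const IntStack&) = delete;   // forbid hidden sharing of the
    IntStack& operator=(const IntStack&) = delete;  // internal pointer
    void push(int v) {
        if (len == cap) {                 // expand internally as needed
            std::size_t ncap = cap * 2;
            int* ndata = new int[ncap];
            for (std::size_t i = 0; i < len; ++i) ndata[i] = data[i];
            delete[] data;
            data = ndata;
            cap = ncap;
        }
        data[len++] = v;
    }
    int pop() { return data[--len]; }
    std::size_t size() const { return len; }
};
```

[The enclosing braces of any block play the role of the (procedure, function, block) scope; no explicit deallocation call appears in the application code.]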
tynor@prism.gatech.EDU (Steve Tynor) (09/18/89)
In article <6488@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu writes:
...
> By doing this, we place a level of abstraction between the application
> programmer and the details of storage management.  The use of pointers
> is confined to the ADT's implementation, and the application programmer
> is shielded from all the resulting complexities.  The interface also
> simplifies the work of the ADT's implementor; the destruction algorithm
> is usually quite straightforward.

Freeing the memory _is_ usually straightforward.  Knowing when it's
safe to do so _isn't_.  That's when GC (or reference counting) becomes
necessary.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If the facts do not conform to the theory, they must be disposed of.

Steve Tynor
Georgia Tech Research Institute
Artificial Intelligence Branch
tynor@prism.gatech.edu
ted@nmsu.edu (Ted Dunning) (09/18/89)
i find that i am mostly in agreement with mr. wolfe when
In article <6488@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) writes:
The point is, any system which uses GC is going to be considerably
less efficient (and constant factors are important here!) than a
competing system in which the fact that an object is no longer
needed is directly communicated to the storage management
system.
this is not always true.  if the objects are heavily shared, or if the
point in time at which an object becomes unreferenceable is obscure, then
this is a difficult problem. people have used reference counting in
the past to solve this problem, but the general consensus has always
been (on lisp implementations) that devoting enough bits to reference
counting was impractical (it certainly is if every value with pointer
semantics must be tagged) and that reference counting collection was
more expensive than stop and copy. in addition, reference counting
does not compact storage as does any of the various forms of stop and
copy.
All that is necessary is to encapsulate the
details of storage management within abstract data types, such that
the destruction algorithm is integrated into the block exit
mechanism.
this is not sufficient. we also need to handle assignment (the old
value of the assignee becomes less referenced), and formal argument
passing (where we suddenly have one more reference). in c++ there is
also all of the complexity of implicit and explicit type conversion.
trickier is the issue of continuations and first class functional
objects. somehow all variables that support reference counting styles
of storage management which are referenced by a continuation, or by a
functional object, must be marked as referenced and not reclaimed on
block exit.
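[the cases ted names -- assignment, where the old value of the assignee becomes less referenced, and copying, where we suddenly have one more reference -- can be made concrete in c++. a sketch of a hand-rolled counted handle, with illustrative names; std::shared_ptr automates exactly this bookkeeping in modern c++:]

```cpp
#include <cassert>

// the shared object, carrying its own reference count.
struct Node { int value; int refs; };

// a counted handle: every copy bumps the count, every assignment
// releases the old target before acquiring the new one, and the
// destructor reclaims the node when the last reference goes away.
class Handle {
    Node* p;
    void acquire(Node* q) { p = q; if (p) ++p->refs; }
    void release() { if (p && --p->refs == 0) delete p; }
public:
    explicit Handle(int v) : p(new Node{v, 1}) {}
    Handle(const Handle& h) { acquire(h.p); }        // one more reference
    Handle& operator=(const Handle& h) {
        if (p != h.p) {                              // old assignee value
            release();                               // becomes less referenced
            acquire(h.p);
        }
        return *this;
    }
    ~Handle() { release(); }
    int refs()  const { return p->refs; }
    int value() const { return p->value; }
};
```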
By doing this, we place a level of abstraction between the application
programmer and the details of storage management.
certainly the goal.
The use of pointers
is confined to the ADT's implementation, and the application programmer
is shielded from all the resulting complexities.
again, motherhood and apple pie.
The interface also
simplifies the work of the ADT's implementor; the destruction algorithm
is usually quite straightforward.
this is only straightforward in a language (like ada or c++) that does
not support advanced concepts.  once you talk about a
language as strong as scheme, then you have a much more interesting
problem.  one that is, incidentally, easily handled by conventional
garbage collectors.
in fact, reference counting is not the only possibility in a language
like c++ which provides some knowledge of scope entry and exit for the
programmer. instead, it is possible to keep the equivalent of an
obarray, pushing and popping references to variables as needed. then
when gc is desired, one can do a stop and copy or mark/sweep with an
extra pass to correct all the pointers.
Worrying about GC is essentially barking up the wrong tree; what we
need is better reuse mechanisms, so that we can also gain the added
benefit of exploiting the implicitly supplied information regarding
the fact that a given variable is no longer needed which is naturally
produced (with no special effort on the part of the application
programmer) upon each and every block exit.
this _is_ garbage collection. it just isn't stop and do x garbage
collection.
it should be kept in mind that in general, collection overhead increases
as you go from generational collection to stop and copy to mark sweep
to reference counting to `real time' collection methods (almost all of
which are based indirectly on baker's algorithm in the april '78
issue of cacm). if you care about average throughput, use stop and
copy, and maybe a generational copier. if you want a very simple
system, use ref counting. nobody really uses the real time methods
yet because they are expensive and complex.
--
ted@nmsu.edu
Most of all, he loved the fall
when the cottonwoods leaves turned gold
and floated down the trout streams
under the clear blue windswept skies.
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/19/89)
From article <1895@hydra.gatech.EDU>, by tynor@prism.gatech.EDU (Steve Tynor):
> Freeing the memory _is_ usually straightforward.  Knowing when it's safe to
> do so _isn't_.  That's when GC (or reference counting) becomes necessary.

   It's safe to do so:

     1) When the variable goes out of scope

     2) When the programmer specifies that it's to be destroyed sooner.

   So assuming that we use abstract data types like good little
   programmers, where's the need for reference counting?

   Bill Wolfe, wtwolfe@hubcap.clemson.edu
rang@cs.wisc.edu (Anton Rang) (09/19/89)
In article <6493@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) writes:
>From article <1895@hydra.gatech.EDU>, by tynor@prism.gatech.EDU (Steve Tynor):
>> Freeing the memory _is_ usually straightforward.  Knowing when it's safe to
>> do so _isn't_.  That's when GC (or reference counting) becomes necessary.
>
>   It's safe to do so:
>
>     1) When the variable goes out of scope
>
>     2) When the programmer specifies that it's to be destroyed sooner.

If there's any explicit pointer manipulation, these conditions aren't
sufficient, because an object can be pointed to in more than one
place.  For instance, if I have two objects (A and B) sharing pointers
to object X, then object X should be destroyed (assuming no other
references) when both A and B have been destroyed.

Personally, I prefer controlling my own data structures...leaving it
up to a garbage collector has always seemed a little sloppy to me.

+----------------------------------+------------------+
| Anton Rang (grad student)        | "VMS Forever!"   |
| University of Wisconsin--Madison | rang@cs.wisc.edu |
+----------------------------------+------------------+
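[Anton's A/B/X scenario can be sketched with C++'s std::shared_ptr standing in for a counted reference (an illustration added here; the names A, B, X follow his example): X stays alive until the LAST holder is destroyed, a decision neither holder's scope can make alone.]

```cpp
#include <cassert>
#include <memory>

// The shared object X, with a live-instance counter so the test can
// observe exactly when it is destroyed.
struct X {
    static int live;
    X()  { ++live; }
    ~X() { --live; }
};
int X::live = 0;

// Plays the role of both A and B: an object holding a counted
// reference to a shared X.
struct Holder {
    std::shared_ptr<X> x;
};
```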
scc@cl.cam.ac.uk (Stephen Crawley) (09/19/89)
The most important distinction between ordinary and real-time programs
is that real-time programs need to execute in predictable time.  An
ordinary synchronous garbage collector that stops execution for a
non-trivial amount of time is clearly not acceptable in this context.
But an asynchronous garbage collector that allows GC to be done in
short bursts during periods of inactivity MAY be ok for real-time
programming.

The key question that determines whether or not asynch GC will work in
real-time is whether there is enough idle time to keep up with garbage
collection.  In many cases the answer should be yes!  And if the
answer is no, it may be possible to MAKE more idle time by 1) using a
faster processor or 2) moving the task of garbage collection onto a
second processor.  Only when we are pushing the limits of the hardware
will the answer be no.

A second question that affects the viability of GC in a real-time
program is how much garbage gets generated.  I claim that for most of
the real-time programs being written today, the answer is "not a lot".
If a programmer can decide to free a piece of dynamically allocated
storage on exiting a scope block, then a smart compiler could make the
same decision in many cases.  Similarly, a smart compiler should be
able to optimise storage reclamation in other cases.  [It is my guess
that the main reason that compilers that optimise GC don't exist is
that the industry is prejudiced against GC]

There are of course cases that cannot be solved by freeing stuff on
scope exit, or by other static tricks.  In such cases, the non-GC'ed
solution MUST be handled by the application.  Bill Wolfe seems to be
saying that such situations almost never arise in real-time problems.
This may be true right now, but it won't be long before the military
start asking for real-time AI!

Finally, I'd like to rebut the idea that ADT's that do their own
storage management are appropriate for use in a GC'ed environment.
If a type does its own management, then in order to get it to do its
stuff, it must export an explicit destructor operation or its
equivalent.  How / when does the destructor get called?  There are 2
options as I see it:

  1) A type that refers to a non-GC'ed type becomes non-GC'ed, and
     must have a destructor operation.  We soon find that large parts
     of our application have to understand storage management policy.

  2) We arrange that the garbage collector provides hooks for calling
     an object's destructor operation when the object becomes garbage
     in the conventional sense.  This makes garbage collection more
     complicated.  For example, when the GC finds some garbage
     instances of ADT's with a destructor, it must decide on the
     correct order in which to destroy them.

In either case, we end up with the problems we were trying to avoid by
using garbage collection: complication for the application, and lots
of potential for storage leaks and other bugs.

In summary, we need to recognise 3 things:

  o A GC'ed language is the only sane approach to many hard
    programming problems.  Some of these problems WILL also be
    real-time problems.

  o A GC'ed language is not necessarily inappropriate for real-time
    problems.  It depends on the problem, the tools (GC and compiler)
    and on the hardware.

  o GC'ed languages and non-GC'ed packages don't mix, so we should not
    try to mix them.  This is a case where software reuse is NOT
    appropriate.

-- Steve
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/19/89)
From rang@cs.wisc.edu (Anton Rang):
>>   It's safe to do so:
>>     1) When the variable goes out of scope
>>     2) When the programmer specifies that it's to be destroyed sooner.
>
> If there's any explicit pointer manipulation, these conditions
> aren't sufficient, because an object can be pointed to in more than
> one place.

   True, but the basic thrust of the position is that the programmer
   should generally not use pointers explicitly; rather, the use of
   pointers should be made safe via encapsulation within a secure ADT.

> Personally, I prefer controlling my own data structures...leaving it
> up to a garbage collector has always seemed a little sloppy to me.

   Same here.  I welcome GC algorithms as an ADT *debugging* tool
   (raising an exception if the algorithm ever succeeds in finding any
   loose storage), but it just costs way too much at RUN time.

   Bill Wolfe, wtwolfe@hubcap.clemson.edu
tynor@prism.gatech.EDU (Steve Tynor) (09/19/89)
In article <6497@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu writes:
>From rang@cs.wisc.edu (Anton Rang):
>>> It's safe to do so:
>>>   1) When the variable goes out of scope
>>>   2) When the programmer specifies that it's to be destroyed sooner.
>>
>> If there's any explicit pointer manipulation, these conditions
>> aren't sufficient, because an object can be pointed to in more than
>> one place.
>
> True, but the basic thrust of the position is that the programmer
> should generally not use pointers explicitly; rather, the use of
> pointers should be made safe via encapsulation within a secure ADT.

Whether a reference to a shared object is a pointer, or is hidden via
some clever encapsulation, I can't see how the programmer can be
expected to know when to explicitly deallocate the reference without
maintaining reference counts (whether he does it, or the ADT - it's
still reference counting) or relying on GC (or assuming that
references are _never_ shared - which seems like a simplistic
assumption).

I am (as we speak) trying to implement an ADT (in Ada) which contains
objects whose lifetimes are not lexically scoped and to which many
other objects may refer.  I'd be very happy to avoid reference counts
and GC if someone can suggest a way to do so!

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
WITH disclaimer; USE disclaimer;

Steve Tynor
Georgia Tech Research Institute
Artificial Intelligence Branch
tynor@prism.gatech.edu
johnson@p.cs.uiuc.edu (09/19/89)
> Written 6:21 am Sep 18, 1989 by billwolf%hazel.c@hubcap.clemson.edu
> Now the advocates of GC frequently claim that space management is
> too heavy a burden to be placed on the application programmer, and
> this may well be true.  However, there is a much better way to deal
> with this problem.  All that is necessary is to encapsulate the
> details of storage management within abstract data types, such that
> the destruction algorithm is integrated into the block exit mechanism.
> By doing this, we place a level of abstraction between the application
> programmer and the details of storage management.  The use of pointers
> is confined to the ADT's implementation, and the application programmer
> is shielded from all the resulting complexities.  The interface also
> simplifies the work of the ADT's implementor; the destruction algorithm
> is usually quite straightforward.

Bill Wolfe is being hopelessly idealistic.  I've used C++ a lot, which
takes the approach that he advocates, and there are many times when it
is very hard for an ADT to figure out when to destroy an object.
Moreover, memory management is the source of many bugs and takes a lot
of programmer time to get right.  In general, we want to avoid
programming language features that encourage bugs and cause
programmers to spend a lot of time.  C++ programmers tend to minimize
their use of dynamic memory management, and it is probably partly for
this reason.

Moreover, there are times when memory management would be more
efficient if it were implemented by the compiler instead of by the
programmer.  For example, C++ programmers use reference counting a
lot, and there are a number of algorithms by which even stupid
compilers can eliminate a lot of reference counting.  Of course,
programmers can perform these optimizations by hand, but I believe
that compilers should be in charge of optimization, not programmers.
Most languages with garbage collection are implemented by interpreters
or by very simple compilers, and so we haven't seen much optimization
of memory management, but we will in the future, as more and more
languages include garbage collection.

Many garbage collection systems are quite efficient.  For example,
some generation scavenging systems have a 3% to 5% overhead.  The
problem is that the efficient systems are not real-time, and the
real-time garbage collection algorithms are not efficient.  The
algorithm by Appel et al. in the '88 SIGPLAN programming language
implementation conference is supposed to be both, but I am not sure
that it has ever been used in a real-time environment.  In any case,
there were no timings given, and "real-time" is always relative.

Ralph Johnson
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/19/89)
From scc@cl.cam.ac.uk (Stephen Crawley):
> The key question that determines whether or not asynch GC will work in
> real-time is whether there is enough idle time to keep up with garbage
> collection.  [...]  Only when we are pushing the limits of the hardware
> will the answer be no.

   Another key question is "How can we make this product such that it
   uses as little memory and CPU as possible, so that we might underbid
   our competitors?"; the reuse of components which do their own space
   management will permit the elimination of GC, thus providing an
   edge.

> Finally, I'd like to rebut the idea that ADT's that do their own storage
> management are appropriate for use in a GC'ed environment.  [...]
> [how to handle interactions between managed and unmanaged storage?]

   I'm not an expert on how this is done, but I'd guess that memory is
   divided into two regions: managed and unmanaged.  The GC system
   could *read* the managed region (to detect pointers into the
   unmanaged area), but only for the purpose of operating upon the
   unmanaged region; it must refrain from taking any actions with
   respect to managed storage.  Probably there are considerably more
   sophisticated techniques involved; I'd suggest porting this thread
   into comp.lang.ada if someone is seriously interested in the
   question, since Ada compilers which provide GC are also required to
   enforce the CONTROLLED pragma (which specifies that this type
   manages its own storage).

   Bill Wolfe, wtwolfe@hubcap.clemson.edu
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/19/89)
From tynor@prism.gatech.EDU (Steve Tynor):
>>>>   It's safe to do so:
>>>>     1) When the variable goes out of scope
>>>>     2) When the programmer specifies that it's to be destroyed sooner.
>
> Whether a reference to a shared object is a pointer, or is hidden via some
> clever encapsulation, I can't see how the programmer can be expected to know
> when to explicitly deallocate the reference without maintaining reference
> counts (whether he does it, or the ADT - it's still reference counting) or
> relying on GC (or assuming that references are _never_ shared - which seems
> like a simplistic assumption).

   OK, first let me clarify my position.  The primary use for pointers
   is simply to implement abstract data types -- lists, trees, graphs,
   etc.; by shielding the user from the details of implementation, we
   can keep the user from having to worry about manipulating pointers.

   Now your question, I believe, is "What if the user wants to store
   pointers in a container?"... the response is that the user probably
   should seriously consider the possibility that s/he is making an
   inappropriate use of pointers.  Let's suppose the user has an
   object with lots of "inertia", which causes the user to initially
   think that pointers are a good way to "move" this object as
   necessary.  This can be addressed with a technique which applies to
   an even harder problem: the case in which the object is not even
   locally present.  The solution is to assign unique identification
   numbers to objects of the type, keep the objects in a database
   (which may of course be a distributed database), and then deal with
   the identification numbers instead.  Identification numbers have
   many nice properties: they can be arbitrary abstractions (e.g.,
   encoding certain properties of their client objects), they are
   totally machine-independent, and they are not subject to any a
   priori limitations.
   Information regarding the maximum lifetime of the object is a
   particularly good property to encode into the identification
   number, since the holder can then quickly check whether the object
   has expired without even consulting the database.  Where objects
   can have an infinite lifetime, a protocol can be devised whereby
   the database must be checked at least once per some arbitrary time
   period by all users, which will permit identification numbers to be
   recycled after one full time period of nonexistence.  Other
   protocols might be used depending upon the characteristics of the
   specific object involved.  The techniques for managing the active
   range of identification numbers can also be quite sophisticated.

> I am (as we speak) trying to implement an ADT (in Ada) which contains
> objects whose lifetimes are not lexically scoped and to which many
> other objects may refer.  I'd be very happy to avoid reference counts
> and GC if someone can suggest a way to do so!

   Consider it done.

   Bill Wolfe, wtwolfe@hubcap.clemson.edu
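[Wolfe's identification-number scheme can be sketched in C++ (an illustration with invented names: `ObjectTable`, `Record`, the tick encoding): objects live in a table standing in for the database, each ID encodes its expiration tick, and a holder can test expiry without consulting the table at all.]

```cpp
#include <cassert>
#include <unordered_map>

struct Record { int payload; };

// The "database": clients hold plain integer IDs, never pointers.
// The expiry tick is encoded in the high digits of the ID itself,
// so checking for expiration needs no table lookup.
class ObjectTable {
    std::unordered_map<long, Record> table;
    long next_serial = 0;
public:
    long now = 0;  // current tick, advanced by the caller

    long create(int payload, long lifetime) {
        long id = (now + lifetime) * 1000000 + next_serial++;
        table[id] = Record{payload};
        return id;
    }
    // "quickly check whether the object has expired without even
    // consulting the database"
    bool expired(long id) const { return id / 1000000 < now; }

    Record* lookup(long id) {
        if (expired(id)) {
            table.erase(id);   // reclaim lazily; the ID can be recycled
            return nullptr;
        }
        auto it = table.find(id);
        return it == table.end() ? nullptr : &it->second;
    }
};
```

[Note the design choice this illustrates: reclamation is driven by expiry encoded in the ID, not by proving that no copies of the ID remain anywhere, which is exactly the distinction Wolfe draws below.]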
tynor@prism.gatech.EDU (Steve Tynor) (09/19/89)
In article <6506@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu writes:
...
> expired without even consulting the database.  Where objects can have
> an infinite lifetime, a protocol can be devised whereby the database
> must be checked at least once per some arbitrary time period by all
> users, which will permit identification numbers to be recycled after
> one full time period of nonexistence.  Other protocols might be used

Gee.  This sounds an awful lot like garbage collection to me.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Progress means replacing something wrong with something more subtly wrong.

Steve Tynor
Georgia Tech Research Institute
Artificial Intelligence Branch
tynor@prism.gatech.edu
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/20/89)
From article <1929@hydra.gatech.EDU>, by tynor@prism.gatech.EDU (Steve Tynor):
>> expired without even consulting the database.  Where objects can have
>> an infinite lifetime, a protocol can be devised whereby the database
>> must be checked at least once per some arbitrary time period by all
>> users, which will permit identification numbers to be recycled after
>> one full time period of nonexistence.  Other protocols might be used
>
> Gee.  This sounds an awful lot like garbage collection to me.

   No... GC would involve verifying (at tremendous cost) that no more
   occurrences of a given identification number exist anywhere.  This
   protocol specifies that identification numbers EXPIRE AUTOMATICALLY
   unless certain conditions are met.  GC is restricted to a
   particular machine; this protocol will function even in a
   distributed environment.  Usually, the semantics of the object will
   allow even better controls, such as a fixed expiration date, but
   this protocol covers the worst case.

   Bill Wolfe, wtwolfe@hubcap.clemson.edu
scc@cl.cam.ac.uk (Stephen Crawley) (09/20/89)
Bill Wolf (?) writes:
>>> It's safe to [reclaim a dynamically allocated ADT]:
>>>   1) When the variable goes out of scope
>>>   2) When the programmer specifies that it's to be destroyed sooner.

Anton Rang replies:
>> If there's any explicit pointer manipulation, these conditions
>> aren't sufficient, because an object can be pointed to in more than
>> one place.

In addition:

  1) doesn't work if a given scope doesn't exit before you've run out
     of space, OR if the value in a variable is passed out of the
     scope.

  2) doesn't work unless your programmer writes perfect code.

Bill Wolf replies:
> True, but the basic thrust of the position is that the programmer
> should generally not use pointers explicitly; rather, the use of
> pointers should be made safe via encapsulation within a secure ADT.

Anton is absolutely right.  It makes no difference whether a reference
to an ADT is implicit or explicit.  The ADT cannot know whether or not
the application that uses it still holds a reference to a given
instance.  In all but the simple cases the application must take a
hand in the ADT's storage management, either by invoking the ADT's
deallocator explicitly or by telling the ADT when to increment or
decrement a reference count.
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/20/89)
From scc@cl.cam.ac.uk (Stephen Crawley):
>>>> It's safe to [reclaim a dynamically allocated ADT]:
>>>>   1) When the variable goes out of scope
>>>>   2) When the programmer specifies that it's to be destroyed sooner.
>
>> [...] the basic thrust of the position is that the programmer
>> should generally not use pointers explicitly; rather, the use of
>> pointers should be made safe via encapsulation within a secure ADT.
>
> It makes no difference whether a reference to an ADT is implicit
> or explicit.  The ADT cannot know whether or not the application
> that uses it still holds a reference to a given instance.

   The pointers are INTERNAL to the ADT, and the user will never see
   them.  When the ADT goes out of scope, it dies just like any
   predefined type would.  We are considering here an ADT which is
   declared at the beginning of a (procedure, function, block).  This
   ADT will do dynamic allocation INTERNALLY so as to facilitate its
   own expansion and contraction as necessary.  Upon block exit, the
   ADT's destroy procedure will be invoked, and all the space occupied
   by the ADT will be released.

   Bill Wolfe, wtwolfe@hubcap.clemson.edu
scc@cl.cam.ac.uk (Stephen Crawley) (09/21/89)
From Bill Wolfe:
>>>>> It's safe to [reclaim a dynamically allocated ADT]:
>>>>>   1) When the variable goes out of scope
>>>>>   2) When the programmer specifies that it's to be destroyed sooner.
>
>>> [...] the basic thrust of the position is that the programmer
>>> should generally not use pointers explicitly; rather, the use of
>>> pointers should be made safe via encapsulation within a secure ADT.
>>
>> It makes no difference whether a reference to an ADT is implicit
>> or explicit.  The ADT cannot know whether or not the application
>> that uses it still holds a reference to a given instance.
>
> The pointers are INTERNAL to the ADT, and the user will never
> see them.  When the ADT goes out of scope, it dies just like
> any predefined type would.  We are considering here an ADT which
> is declared at the beginning of a (procedure, function, block).
> This ADT will do dynamic allocation INTERNALLY so as to facilitate
> its own expansion and contraction as necessary.  Upon block exit,
> the ADT's destroy procedure will be invoked, and all the space
> occupied by the ADT will be released.

OK ... in the trivial case your scheme does work.  The conditions for
it to work include:

  a) NO references to the ADT are passed out of scope.

  b) the scope exits before you run short of space.

  c) the programmer (i.e. the application!) does not incorrectly tell
     the ADT to release an instance too soon.

If any of a), b) or c) does not hold, the program dies.

Condition a) can be shown to be true, provided that you can trace all
of the side-effects that occur within the scope and PROVE that they do
not violate the condition.  Condition b) is hard to reason about
formally, except to the extent that it is possible to place an upper
bound on the amount of memory allocated within the scope.  Condition
c) is virtually impossible to reason about ... unless you do something
drastic like forbidding all side-effects.
There are other problems with your scheme:

  o Point 2) is in fact the application taking a hand in an ADT's
    storage management ... something that you have been saying is
    unnecessary.  [If you leave out 2), then your scheme is equivalent
    to stack based storage management with an extra stack!]

  o GC'ed memory management can reclaim junk (heap nodes that are no
    longer needed) whenever it runs out of space, whereas a scope
    based scheme can in general only identify and hence reclaim junk
    on scope exit.  In other words, scope based reclamation of memory
    will in general need more memory.

Finally, don't forget the case where condition a) is not true; i.e.
some of the dynamically allocated ADT instances are passed out of
scope either explicitly (as a result) or implicitly (by side-effect).

----

This whole GC versus non-GC debate seems to hinge on the following
trade-off.  Do we want programs that

  are slower (by say 25%), but are intrinsically free of a large
  class of storage management errors,

or

  are usually faster (not always), but may use more storage, may have
  potential storage leaks and other storage management bugs, and
  which arguably take longer to develop.

----

I would recommend that people read Chapter 16 (Memory Management) of
Bertrand Meyer's book "Object Oriented Software Construction" for a
calm and rational analysis of the pros and cons of various methods of
storage management.
scc@cl.cam.ac.uk (Stephen Crawley) (09/21/89)
[Sorry about the Re:^n's ... bugs in my news reader]

Bill Wolfe proposes a scheme for getting around the need for garbage
collection by storing objects in a database, using object expiration
times to avoid proper garbage collection.  Unfortunately this scheme
doesn't achieve its aim.  Bill says:

  "Where objects can have an infinite lifetime, a protocol can be
  devised whereby the database must be checked at least once per some
  arbitrary time period by all users, which will permit identification
  numbers to be recycled after one full time period of nonexistence."

Exactly.  In order to check each object that it still refers to, each
"user" (actually application) must traverse the graph of its objects
in the database.  This process is exactly equivalent to the mark phase
of a garbage collection, except that the application is doing it!!!

One other point that Bill Wolfe misses is that the CPU overheads of
managing objects in this way can be orders of magnitude greater than
the cost of using ordinary pointers.

Don't get me wrong, I've been doing research in this particular area
for a number of years, and I strongly believe that object databases
are the way of the future.  But they ain't cheap, and they ain't a
substitute for garbage collection.

-- Steve
scc@cl.cam.ac.uk (Stephen Crawley) (09/21/89)
Bill Wolfe writes:
> OK, first let me clarify my position.  The primary use for pointers
> is simply to implement abstract data types -- lists, trees, graphs,
> etc.; by shielding the user from the details of implementation, we
> can keep the user from having to worry about manipulating pointers.

Frankly, I can't see how this is possible.  Suppose I have an ADT 'A'
that has a generator operation NEW and a destructor function DISPOSE.
Suppose I want a generic ADT LIST[t: type] that can be used in the
following way.

  a: A
  aa: A
  while true do
    a := A$NEW(...)
    aa := A$NEW(...)
    l: LIST[A] := LIST[A]$MAKE_EMPTY()
    LIST[A]$INSERT(l, a)
    LIST[A]$INSERT(l, A$NEW(...))
    LIST[A]$INSERT(l, a)
    aa := LIST[A]$REMOVE_HEAD(l)
    etc
  end while

The generation of instances of A must be under the control of the
application, and assignment of A must be handled.  I must be able to
put an A instance into a list any number of times.  How do I implement
the LIST ADT so that it invokes the A$DISPOSE operation at the right
time?  When is the right time?  What happens if I want to put an
instance of A into 2 LISTs at the same time?
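[For comparison, here is how a counted-reference answer to Crawley's questions looks, sketched with C++'s std::shared_ptr standing in for a handle to A (names follow his example; the live counter is added for demonstration): the "right time" for A$DISPOSE is when the LAST holder lets go, whether that holder is a variable or a list slot, and inserting the same instance twice, or into two lists, is harmless.]

```cpp
#include <cassert>
#include <list>
#include <memory>

// The ADT 'A'.  Its destructor plays the role of A$DISPOSE, and a
// live-instance counter lets the test observe when disposal happens.
struct A {
    static int live;
    A()  { ++live; }
    ~A() { --live; }   // A$DISPOSE, invoked by the last release
};
int A::live = 0;

// LIST[A] is then just a list of counted handles: insertion and
// removal adjust reference counts; the list never calls DISPOSE
// directly and never disposes an instance someone else still holds.
using ListOfA = std::list<std::shared_ptr<A>>;
```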
djones@megatest.UUCP (Dave Jones) (09/22/89)
A very wise man once said, "Floating point representation is a mistake. If you don't know enough about the problem to scale the quantities, you don't know enough about the problem." May he rest in peace. He was martyred by an angry horde of Fortran application programmers. If he had lived, today he might say, "Garbage collection is a mistake. If you don't know enough about the problem to free the unreferenced memory, you don't know enough about the problem." If I had lived, I probably would agree with him. I have some library packages, oops -- I meant to say "ADT's" -- which help. I can declare Stacks, which can be marked. Later you can deallocate all the memory above the mark in one very fast pop. That covers many, many situations. I also have Heaps which contain blocks all of one size. Not only is it quicker than your typical malloc(), sometimes by a factor of several hundred, but if you are through with all the records in a Heap, just destroy the entire Heap. Both of these, like all my librar.. er ADT's, grow in size indefinitely, as required.
tynor@prism.gatech.EDU (Steve Tynor) (09/22/89)
In article <8222@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes: ... >If he had lived, today he might say, "Garbage collection is a mistake. >If you don't know enough about the problem to free the unreferenced >memory, you don't know enough about the problem." > >If I had lived, I probably would agree with him. ... >all the records in a Heap, just destroy the entire Heap. Both of these, >like all my librar.. er ADT's, grow in size indefinitely, as required. ^^^^^^^^^^^^^^^^^^^^^^^^^ I assume this was intended to be followed by a ":-)". =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= No problem is so formidable that you can't just walk away from it. Steve Tynor Georgia Tech Research Institute Artificial Intelligence Branch tynor@prism.gatech.edu
giuliano@ashtate (Giuliano Carlini) (09/23/89)
From article <6519@hubcap.clemson.edu>, by billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ): > > Upon block exit, > the ADT's destroy procedure will be invoked, and all the space > occupied by the ADT will be released. > > > Bill Wolfe, wtwolfe@hubcap.clemson.edu This can't work. What if the ADT is returned as the result of the block? In order to handle this, you need either reference counting or garbage collection. giuliano uunet!ashton!giuliano -- giuliano Net: ...!uunet!ashtate!giuliano Home: 1322 27th San Pedro, CA 90731 (213)538-6554 Work: Ashton-Tate, 20101 Hamilton Ave., Torrance CA 90502-1319 (213)538-7683
ds@hollin.prime.com (09/24/89)
I believe you are both correct. Most created objects can be managed with explicit creation/destruction code, placed in abstract data type specifications or otherwise part of the application, because their lifetimes are naturally connected to the algorithms (or messages) that use them. Sometimes, however, dynamic reclamation (garbage collection) is truly necessary (to avoid unnaturalness or inefficiency). A useful parallel is the need for both stack and heap data storage in general-purpose programming languages. When an application can use explicit creation and destruction for most of its object management, the garbage collection overhead is reduced, and this has all sorts of good consequences. David Spector Prime Computer, Inc. ds@primerd.prime.com (until my layoff, expected soon)
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/24/89)
From scc@cl.cam.ac.uk (Stephen Crawley): > Bill Wolfe proposes a scheme for getting around the need for garbage > collection by storing objects in a database, using object expiration > times to avoid proper garbage collection. Unfortunately this scheme > doesn't achieve its aim. > > "Where objects can have an infinite lifetime, a protocol can be devised > whereby the database must be checked at least once per some arbitrary > time period by all users, which will permit identification numbers > to be recycled after one full time period of nonexistence." > > Exactly. In order to check each object that it still refers to, each > "user" (actually application) must traverse the graph of its objects in > the database. This process is exactly equivalent to the mark phase of > a garbage collection, except that the application is doing it!!! Not exactly. To start with, the user does not have to do any work at all if the identification number's expiration period is also used as the period over which the user will lose interest in the object if it is not further utilized. In this case, the identification number will be allowed to expire and the user need only throw it away. If the user does make use of the number within the expiration period, then the number is automatically revalidated for another period unless the database replies that the object has been destroyed, in which case the identification number expires as well. In the worst case, the user will have to do a database access just to verify that the object still exists and that the identification number remains valid for another expiration period. Since many objects have a finite lifetime, this burden is borne only by those users who make use of infinite-lifetime objects. The task of maintaining the validity of the identification number can be done automatically if desired using a suitably designed identification-number abstraction which carries a user-settable automatic revalidation on-off switch. 
There are no reference counts kept at the database; the database only maintains the object, the associated identification number, and any authorization protocols which may be required of users who wish to create, read, modify, or destroy. The expiration period protocol allows identification numbers to be distributed without limitation (to systems which are unknown to the database, possibly not even reachable). It is fault-tolerant, allowing users to die with their identification numbers intact without interfering with the destruction of the associated object. If the user wakes up, the user can either continue to utilize the identification number (if it has not expired), or throw it away due to its expiration. > One other point that Bill Wolfe misses is that the CPU overheads of > managing objects in this way can be orders of magnitude greater > than the cost of using ordinary pointers. Where the situation permits (as it usually does), pointers which are encapsulated within ADT implementations will give better performance. We are managing objects in this way because they present worst-case properties which do not permit us to use more efficient techniques. Bill Wolfe, wtwolfe@hubcap.clemson.edu
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/24/89)
From ds@hollin.prime.com: > Sometimes, [...] dynamic reclamation (garbage collection) is > truly necessary (to avoid unnaturalness or inefficiency). A > useful parallel is the need for both stack and heap data storage in > general-purpose programming languages. I think we've established that managed and unmanaged storage paradigms can coexist, and that components which manage their own storage can avoid the inefficiencies of garbage collection. We also know that the user MUST be involved in storage management, if for no other reason than to decide which data structures to throw away in the event of a storage crisis. What I'd like to know is: Is there a hard counterexample which will demonstrate that situations exist in which the inefficiencies of garbage collection cannot be avoided? Someone has already outlined the properties that such a counterexample should have, and mentioned a "gut feeling" that such a counterexample exists. Unfortunately, I have a similar "gut feeling" that GC is never worth the cost. Bill Wolfe, wtwolfe@hubcap.clemson.edu
scc@cl.cam.ac.uk (Stephen Crawley) (09/25/89)
I wrote: >> Bill Wolfe proposes a scheme for getting around the need for garbage >> collection by storing objects in a database, using object expiration >> times to avoid proper garbage collection. Unfortunately this scheme >> doesn't achieve its aim. >> >> "Where objects can have an infinite lifetime, a protocol can be devised >> whereby the database must be checked at least once per some arbitrary >> time period by all users, which will permit identification numbers >> to be recycled after one full time period of nonexistence." >> >> Exactly. In order to check each object that it still refers to, each >> "user" (actually application) must traverse the graph of its objects in >> the database. This process is exactly equivalent to the mark phase of >> a garbage collection, except that the application is doing it!!! Bill Wolfe replies > Not exactly. To start with, the user does not have to do any work > at all if the identification number's expiration period is also used > as the period over which the user will lose interest in the object if > it is not further utilized. In this case, the identification number > will be allowed to expire and the user need only throw it away. In other words the objects don't have a long lifetime. This is a red herring!! The case we were discussing is the one where the objects DO have long lifetimes. Bill continues: > If the user does make use of the number within the expiration period, > then the number is automatically revalidated for another period ... Granted. But unless there is a GUARANTEE that EVERY object of interest is touched periodically by the application, there is a risk that objects of interest to the user will be destroyed prematurely. > ... unless the database replies that the object has been destroyed, > in which case the identification number expires as well. From the user's point of view, this is unacceptable behaviour by the application / database cabal. 
Bill continues: > In the worst case, the user will have to do a database access just to > verify that the object still exists and that the identification number > remains valid for another expiration period. Since many objects have > a finite lifetime, this burden is borne only by those users who make > use of infinite-lifetime objects. Not true. There is no way for an application to intuit the lifetime of a given set of database entries. If it makes a guess and is wrong, it has committed the sin of destroying user data. We can't expect the user to specify an accurate lifetime either. The user can't accurately predict the future any more than an application can. It would be totally unreasonable to then come back to a user and say "21 of your objects expired and were deleted. Oh, you didn't want that? Well tough." In this day and age, users are more and more naive, and systems must be correspondingly forgiving. Therefore, the burden of storage management must be borne by ALL applications that deal with POTENTIALLY long lived data items. In practice this means ALL applications that store data in the database. Bill continues: > The task of maintaining the validity of the identification number can > be done automatically if desired using a suitably designed identification- > number abstraction which carries a user-settable automatic revalidation > on-off switch. I'm not sure, but this sounds like Bill is finally admitting that the application needs to include a GC style marker. However ... user-settable object refreshing is like giving a hand grenade to a two year old. I can just imagine it now: User: I came in this morning and half of my PC's database has disappeared Support: Hmmm ... why is object refreshing turned off??? User: Joe told me that turning off object refreshing would make the database go faster. Bill continues to describe an object access control scheme, which is pretty much beside the point ... 
I wrote: >> One other point that Bill Wolfe misses is that the CPU overheads of >> managing objects in this way can be orders of magnitude greater >> than the cost of using ordinary pointers. Bill replies: > We are managing objects in this way because they present worst-case > properties which do not permit us to use more efficient techniques. What about garbage collection??? Surely Bill's scheme with object numbers and expiration times and (as Bill finally admits) object refreshing in application code is MUCH MORE EXPENSIVE than an efficient garbage collector! Bear in mind that a previous article (sorry no ref.) claimed measurements of overheads of only 3-5% for a modern garbage collector. -- Steve
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/27/89)
From scc@cl.cam.ac.uk (Stephen Crawley): >> ... unless the database replies that the object has been destroyed, >> in which case the identification number expires as well. > > From the user's point of view, this is unacceptable behaviour by the > application / database cabal. The model involves a database which stores objects which can be created, destroyed, read, and written by various users. The users might be partitioned into different groups (e.g., those authorized to destroy vs. those who cannot). Now from the perspective of a user who can read and write but not destroy, the object's lifetime is potentially infinite, depending upon the whims of those who have the power to destroy. By revalidating the identification number, such users receive continued assurances that the object still exists. If the object is destroyed, the identification numbers will then all expire automatically, regardless of how widely they were distributed. > Bill replies: >> We are managing objects in this way because they present worst-case >> properties which do not permit us to use more efficient techniques. > > What about garbage collection??? Won't work under the conditions I described (distributed environment). Bill Wolfe, wtwolfe@hubcap.clemson.edu
johnson@p.cs.uiuc.edu (09/27/89)
> by Bill Wolfe > I think we've established that managed and unmanaged storage > paradigms can coexist, and that components which manage their > own storage can avoid the inefficiencies of garbage collection. In fact, I don't believe that managed and unmanaged storage paradigms should coexist in one program. You should either use components that use automatic garbage collection or components that do not. While it is possible to link C code to Lisp or Smalltalk, it is always tricky and error prone. > We also know that the user MUST be involved in storage management, > if for no other reason than to decide which data structures to > throw away in the event of a storage crisis. Automatic garbage collection prevents storage crises. The system I use generates an exception if it runs out of memory in spite of a garbage collection, but I wouldn't dream of trying to handle that automatically. Moreover, it only happens during debugging, and usually means an infinite loop. Bill Wolfe keeps making statements to the effect that garbage collection is expensive. That is totally false. Garbage collection is cheap. Anyone who is worried by 5% should be considering assembly language. Garbage collection is cheap and is going to be cheaper. For example, there has been little work on using optimizing compilers to reduce the cost of garbage collection. I bet that a good compiler can make automatic garbage collection cheaper than doing it yourself. The problem with garbage collection is that the efficient algorithms are not real-time. There is work being done in this area, and perhaps we will soon find a way to make real-time programming compatible with automatic memory management, but I don't think it is there yet. However, the problem with garbage collection is NOT efficiency when you measure the cost of garbage collection over the life of a program. Ralph Johnson
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/27/89)
From johnson@p.cs.uiuc.edu: > In fact, I don't believe that managed and unmanaged storage paradigm > should coexist in one program. You should either use components that > use automatic garbage collection or components that do not. While it > is possible to link C code to Lisp or Smalltalk, it is always tricky > and error prone. Since Ada components can coexist with no difficulty, this is not a true statement in general. > Automatic garbage collection prevents storage crises. The system I > use generate an exception if it runs out of memory in spite of a > garbage collection, but I wouldn't dream of trying to handle that > automatically. Those who write embedded systems would tend to disagree. > [The cost is only 5%] At what cost in terms of extra space requirements? > [Real-time GC is coming, Real Soon Now] When it gets there, let me know. Bill Wolfe, wtwolfe@hubcap.clemson.edu
scc@cl.cam.ac.uk (Stephen Crawley) (09/27/89)
From wtwolfe@hubcap.clemson.edu: > From johnson@p.cs.uiuc.edu: >> In fact, I don't believe that managed and unmanaged storage paradigm >> should coexist in one program. You should either use components that >> use automatic garbage collection or components that do not. While it >> is possible to link C code to Lisp or Smalltalk, it is always tricky >> and error prone. > > Since Ada components can coexist with no difficulty, this is > not a true statement in general. Suppose we garbage collect an object that contains a pointer to a non-GC'able object X. The garbage collector cannot reclaim X and it cannot tell the ADT for X that the pointer has been blown away. In general the object X is likely to be lost to a storage leak. The only possible escape is if the garbage collection mechanism includes some scheme for applying a "destructor" to an object that is about to be reclaimed by the GC. I have not come across such a scheme described in the ADA RM. -- Steve