billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/21/89)
From scc@cl.cam.ac.uk (Stephen Crawley):

>> This ADT will do dynamic allocation INTERNALLY so as to facilitate
>> its own expansion and contraction as necessary.  Upon block exit,
>> the ADT's destroy procedure will be invoked, and all the space
>> occupied by the ADT will be released.
>
> OK ... in the trivial case your scheme does work.  The conditions
> for it to work include:
>
>  a) NO references to the ADT are passed out of scope.

   In keeping with the general principle that application programmers
   should not be using pointers, appropriate ADTs should be used
   instead.

>  b) the scope exits before you run short of space.

   Or you have programmed your application to properly handle the
   exception STORAGE_ERROR.  The sequence is then as follows:

     1) An ADT request comes in for which the ADT decides it will need
        more space.  The ADT attempts to obtain the space, but
        STORAGE_ERROR is raised.  Fortunately, the ADT is programmed
        so as to guarantee to the user that the ADT will remain in its
        original state (i.e., the transaction will abort).

     2) The ADT now either directly propagates STORAGE_ERROR or
        something equivalent (Unable_To_Comply, for example) to the
        user, in accordance with the ADT's specification.

     3) The user can then take whatever steps are deemed appropriate
        in order to handle the exception.  Note carefully that only
        the user is in a position to decide what data structures to
        throw away in such a crisis.

>  c) the programmer (i.e. the application!) does not incorrectly tell
>     the ADT to release an instance too soon.

   Since the default is that the ADT will die automatically upon going
   out of scope, the user will presumably refrain from invoking
   Destroy without a fairly good reason.

> o GC'ed memory management can reclaim junk (heap nodes that are no
>   longer needed) whenever it runs out of space, whereas a scope based
>   scheme can in general only identify and hence reclaim junk on
>   scope exit.
> In other words, scope based reclamation of memory will in general
> need more memory.

   Not true.  Consider -- what precisely is junk?  If we define it as
   "memory that is no longer needed due to deallocate operations",
   then the ADT will automatically free up all such memory by virtue
   of the semantics of the deallocation process; an order to
   deallocate constitutes a guarantee from the ADT that it is
   completely safe to do so.  It knows this because it has received an
   order from the user (e.g., Remove_Item) which removes the need for
   the space, and since the ADT is in total control of the space
   involved, it can proceed to issue the deallocation order and
   thereby reduce the size of the ADT's representation.

   If we define junk as "memory which is strictly not necessary at
   this point in the program", then we require a "Read Programmer's
   Mind" instruction.  Not happening to have such an instruction, we
   must rely on the programmer to communicate the fact that a data
   structure is no longer necessary via the Destroy command.

   Incidentally, the ADT can provide as one of its services a function
   which indicates the amount of space currently occupied by the ADT,
   which the user can make use of when it becomes necessary to decide
   which data structures to throw away.  This is made possible by the
   'SIZE attribute in Ada; the ADT simply adds up the 'SIZE of all the
   space it is occupying, and returns that value to the user.  The
   user can then simply select the data structure for which the cost
   function (Size/Value) is maximal, order it Destroyed, and then
   proceed to retry the aborted operation.

> Finally, don't forget the case where condition a) is not true; i.e.
> some of the dynamically allocated ADT instances are passed out of
> scope either explicitly (as a result) or implicitly (by side-effect).
   Which can't happen unless either the applications programmer is
   using pointers directly (naughty naughty), or something called by
   the application programmer (in effect, an extension of the
   application programmer) is doing it on his/her behalf.  In the
   latter case, this presumably is in accordance with the called
   unit's specification, and hence remains the fault of the
   applications programmer.

   The rule is simple: Leave the use of pointers to ADT implementors,
   and simply reuse the code they produce.  Exploit to the absolute
   fullest the hard guarantees (proper space management,
   serializability, recoverability, concurrent processing, etc.) that
   the ADT implementors worked so hard to provide you with, and thy
   Manager will be pleased.

> This whole GC versus non-GC debate seems to hinge on the following
> trade-off.  Do we want programs that are slower (by say 25%), but
> are intrinsically free of a large class of storage management
> errors, or programs that are usually faster (not always), but may
> use more storage, may have potential storage leaks and other storage
> management bugs, and which arguably take longer to develop.

   All of the negatives you cite (except "may use more storage", which
   is bogus) are overcome by the reuse paradigm.  The ADT is developed
   with extremely high quality at a high cost, but this cost is then
   spread over an extremely large number of users until the end of
   time; the result is very much an economic win-win situation.


   Bill Wolfe, wtwolfe@hubcap.clemson.edu
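The self-managing ADT the thread keeps returning to can be sketched in C++ terms (the discussion itself is in Ada; the type and operation names below are invented, and the C++ destructor plays the role of the Destroy procedure invoked on block exit):

```cpp
#include <cstddef>
#include <vector>

// Sketch of an ADT that does all dynamic allocation INTERNALLY.
// The application never touches a pointer; on block exit the
// destructor releases every byte the ADT holds.
class IntSet {
public:
    void insert(int v) { items.push_back(v); }   // may grow storage internally
    void remove_item(int v) {                    // user's deallocation order
        for (std::size_t i = 0; i < items.size(); ++i)
            if (items[i] == v) { items.erase(items.begin() + i); return; }
    }
    std::size_t size() const { return items.size(); }  // cf. summing 'SIZE in Ada
private:
    std::vector<int> items;  // internal representation, freed by the destructor
};

std::size_t scope_demo() {
    IntSet s;                 // lives only in this block
    s.insert(1); s.insert(2);
    s.remove_item(1);         // ADT shrinks its own representation
    return s.size();
}                             // block exit: all of s's space is released
```

The sketch shows only the trivial case Stephen concedes works: no reference to the set ever leaves `scope_demo`.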
maslen@Neon.Stanford.EDU (Thomas Maslen) (09/22/89)
In the last week or so there has been an interesting, er, discussion
on the desirability of GC in production code.  Bill "ADA is the one
true way" Wolfe and perhaps Anton Rang have argued the negative case,
while Ian Hopper, Scott Henninger, John Gateley, Steve Tynor, Rob
MacLachlan, Stephen Crawley, Ted Dunning and Ralph Johnson have been
on the affirmative [my apologies if I've mis-characterized anyone's
position].  No prizes for guessing my biases: I think the people
arguing for GC have the right idea.

The paragraph above added exactly no technical content to this
discussion.  What we need is something that does.  If we're to have
any hope of convincing the ADAholics that maybe, just maybe, languages
need GC in order to be useful, we need a crisp example of a problem
that:

  - requires various unrelated parts of a program to use one instance
    of an ADT (either because it's too large to copy, or because we
    really do want updates to apply to the one instance),

  - doesn't have any obvious point at which we can say "we're sure
    we've finished with it now, but we definitely couldn't have got
    rid of it much earlier",

  - has sufficiently stringent efficiency requirements that we can't
    pass around magic numbers, or whatever it was that Bill W was
    proposing a few days ago, which have to be translated before we
    can actually use the ADT,

  - is sufficiently tricky that we can't rely on the programmer
    magically remembering to "do the right thing",

  - is arguably representative of a non-trivial class of real
    problems.

I don't have such a problem handy, just a gut feeling that the insides
of real compilers are rich with suitable examples.  Any volunteers?
If nobody has a decent example, then maybe Bill W has a point.

Since this is likely to involve non-negligible volumes of code, and
some of us try to digest lunch while reading this group, I'd like to
suggest that E-mail may be more appropriate than clogging up the
group.
Next, a side bet:

In article <6530@hubcap.clemson.edu>
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu writes:
[mucho stuff deleted]
> In keeping with the general principle that application programmers
> should not be using pointers; appropriate ADTs should be used instead.
[more stuff deleted]
> The rule is simple: Leave the use of pointers to ADT implementors,
> and simply reuse the code they produce.  Exploit to the absolute
> fullest the hard guarantees (proper space management, serializability,
> recoverability, concurrent processing, etc.) that the ADT implementors
> worked so hard to provide you with, and thy Manager will be pleased.

My guess is that any ADA solutions we see to my hypothetical problem
will involve auxiliary "appropriate ADTs" which are just limited
private types containing a pointer to the reference-counted real ADT.

Finally, has anyone made use of the (Boehm et al) GC-for-C that was
recently posted to comp.sources.unix?  Any comments?

Thomas Maslen				maslen@polya.stanford.edu
mwolf@Teknowledge.COM (Michael Wolf) (09/22/89)
In article <11885@polya.Stanford.EDU> maslen@Polya.Stanford.EDU
(Thomas Maslen) writes:
>My guess is that any ADA solutions we see to my hypothetical problem
>will involve auxiliary "appropriate ADTs" which are just limited
>private types containing a pointer to the reference-counted real ADT.

I don't see what the problem is with passing around private types with
pointers to reference-counted ADTs.  It adds a small amount of
overhead to pointer references, and a couple of bytes per object for
reference counts.  If you've got a language which enforces the
"private" part, you're pretty damn sure that your reference count is
correct, and it's then easy to tell when it is or isn't okay to
deallocate an object.  The only thing you need is a destructor routine
called on each "private" pointer when it goes out of scope or gets
deallocated.

The nice thing about this approach is that you KNOW that garbage
collection isn't going to unpredictably impact your performance.  Now,
if you can find a GC algorithm that's as efficient, I'll consider
switching.  Also, I don't see that writing such a reference-counting
ADT is all that much tougher than writing the same ADT without the
reference counting, so it doesn't take too much extra development
time.

There may be algorithms which require so many pointer references, and
so little memory allocation/deallocation, that a little GC would be
more efficient, but I'm betting they're in the minority.  Of course,
it's possible to get applications where you can't afford the space
overhead of a couple of bytes per object, and then GC would be the way
to go.

Michael Wolf            These opinions are bootleg from Taiwan, so it comes
mwolf@Teknowledge.COM   as no surprise to find that they're defective.
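The reference-counted private type described above can be sketched in C++ (all names invented): the count travels with the object, handle copies bump it, handle destruction drops it, and the object is deleted exactly when the count reaches zero, with no collector involved.

```cpp
// Body is the real object; Handle is the opaque "private type" the
// application passes around.  The raw pointer never escapes Handle.
struct Body {
    int value;
    int refs;
};

class Handle {
public:
    explicit Handle(int v) : b(new Body{v, 1}) {}
    Handle(const Handle& o) : b(o.b) { ++b->refs; }  // copy: count += 1
    Handle& operator=(const Handle&) = delete;       // kept out of the sketch
    ~Handle() { if (--b->refs == 0) delete b; }      // destructor: count -= 1
    int refs() const { return b->refs; }
private:
    Body* b;   // the only raw pointer, hidden behind the private part
};

int count_with_two_handles() {
    Handle h(42);
    Handle h2(h);        // second handle to the same Body
    return h.refs();     // both handles are still in scope here
}                        // h2 and h die; count hits 0; Body is freed
```

Assignment is deleted only to keep the sketch short; a full version would handle it the same way as copy construction.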
scc@cl.cam.ac.uk (Stephen Crawley) (09/23/89)
Michael Wolf writes:
> I don't see what the problem is with passing around private types
> with pointers to reference-counted ADT's.  It adds a small amount of
> overhead to pointer references, and a couple bytes per object for
> reference counts.  If you've got a language which enforces the
> "private" part, you're pretty damn sure that your reference count
> is correct, and it's then easy to tell when it is or isn't okay
> to deallocate an object.

Ok so far ... sort of.

> The only thing you need is a destructor
> routine called on each "private" pointer when it goes out of scope
> or get's deallocated.

And there's the problem with this scheme.  As I tried to explain
before, scope based deallocation works if and only if:

 1) the scope DOES exit.  [The global scope and the scope of main()
    NEVER exit for the sake of this argument.]

 2) the scope exits BEFORE we start running out of space.

 3) we can afford the extra storage overhead of remembering that we
    allocated object X in this scope Y.  [You either need to build a
    chain of objects allocated at each scope level or a new heap for
    each scope level ...]

In general, one of the above criteria does not hold.  This means that
we must rely on some other mechanism to tell the ADT to apply the
destructor to a given instance at the appropriate time.  I claim that
this means that the application programmer needs to deal with this in
a significant percentage of the cases.  This is possible, but it is
very difficult to get correct when (for instance) the private
pseudo-pointer type is itself passed around the place in complicated
ways, or when it is stored in dynamic data structures.

Bill Wolfe maintains that the problem can be solved with a higher
level library of ADTs that implement generic lists, trees, graphs etc.
I claim he is incorrect, and have presented an example (see
<902@scaup.cl.cam.ac.uk>) which I believe demonstrates this.  Let's
try not to rehash old arguments ...

-- Steve
scc@cl.cam.ac.uk (Stephen Crawley) (09/23/89)
Bill Wolfe wrote:
>>> This ADT will do dynamic allocation INTERNALLY so as to facilitate
>>> its own expansion and contraction as necessary.  Upon block exit,
>>> the ADT's destroy procedure will be invoked, and all the space
>>> occupied by the ADT will be released.

I replied:
>> OK ... in the trivial case your scheme does work.  The conditions
>> for it to work include:
>>
>> a) NO references to the ADT are passed out of scope.

Bill replied:
> In keeping with the general principle that application programmers
> should not be using pointers; appropriate ADTs should be used instead.

And I claim that it is IMPOSSIBLE to build sufficiently general
library ADTs!  See article <902@scaup.cl.cam.ac.uk> for an example.
[And I've a better one up my sleeve ;-)]

I wrote:
>> b) the scope exits before you run short of space.

Bill replies:
> Or you have programmed your application to properly handle the
> exception STORAGE_ERROR.  The sequence is then as follows:
>
> [Bill then describes a scheme where the application catches a
> STORAGE_ERROR exception, causes the current transaction to be
> aborted and throws away some (global) data structures after
> asking the user.]

This is unrealistic.  It assumes that there are enough large global
data structures to be thrown away.  It also assumes that the
application contains the necessary code for interacting with the end
user; i.e. asking questions that he/she has a chance of understanding.
Neither will be true in general.

I wrote:
>> c) the programmer (i.e. the application!) does not incorrectly tell
>>    the ADT to release an instance too soon.

Bill replies:
> Since the default is that the ADT will die automatically upon going
> out of scope, the user will presumably refrain from invoking Destroy
                    ^^^^ [he means programmer I think ...]
> without a fairly good reason.

In this case the *good reason* is that the scope does not exit soon
enough; if the programmer doesn't reclaim the ADT before scope exit,
the program will run out of space and die.
I wrote:
>> GC'ed memory management can reclaim junk (heap nodes that are no
>> longer needed) whenever it runs out of space, whereas a scope based
>> scheme can in general only identify and hence reclaim junk on scope
>> exit.  In other words, scope based reclamation of memory will in
>> general need more memory.

Bill replied:
> Not true.  Consider -- what precisely is junk?

I define junk to mean dynamically allocated space which was allocated
in the current scope (or passed down from a higher scope) that is no
longer accessible from an in-scope variable anywhere in the program.
In a complex program, such junk may be generated in such a way that it
is IMPOSSIBLE for any application code or any library ADT to keep
track of its accessibility ... without implementing a garbage
collector!

Bill continues:
> If we define it as "memory that is no longer needed due to
> deallocate operations" [...]

No, that's not what I meant ...

Bill continues:
> If we define junk as "memory which is strictly not necessary at this
> point in the program"

This is approximately what I mean ...

> then we require a "Read Programmer's Mind" instruction.

Oh no we don't!!!!  A garbage collector can (and will) reclaim a lot
of this junk by tracing all in-scope variables etc.  The garbage
collector can do a "perfect" job of this.  Storage reclamation code in
an ADT or the application can only do a half-hearted job in difficult
cases.  Such a difficult case might involve detecting that a subgraph
containing cycles has become detached from its root ... the case where
reference counting does not work.  My argument is that the difference
between a perfect job and an imperfect job may be enough to cause the
program to die.  I therefore maintain that my "may use more storage"
point is CORRECT.

Finally (sigh) I wrote:
>> Finally, don't forget the case where condition a) is not true; i.e.
>> some of the dynamically allocated ADT instances are passed out of
>> scope either explicitly (as a result) or implicitly (by side-effect).

Bill replies:
> Which can't happen unless either the applications programmer is
> using pointers directly (naughty naughty), or something called by the
> application programmer (in effect, an extension of the application
> programmer) is doing it on his/her behalf.  In the latter case, this
> presumably is in accordance with the called unit's specification,
> and hence remains the fault of the applications programmer.

Sigh.

> The rule is simple: Leave the use of pointers to ADT implementors,
> and simply reuse the code they produce.  Exploit to the absolute
> fullest the hard guarantees (proper space management, serializability,
> recoverability, concurrent processing, etc.) that the ADT implementors
> worked so hard to provide you with, and thy Manager will be pleased.

Except that ADT implementors haven't, because it is impossible.  And
my and their Managers had damn well better not start asking the
impossible, 'cos it will be on HIS head when we can't deliver!  (Don't
worry, I will have asked for a transfer ... or resigned by then.)

-- Steve
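The hard case named above, a cyclic subgraph that outlives its last handle, can be made concrete. A C++ sketch (the Node type is invented; std::shared_ptr stands in for the reference-counted handles under discussion, and the weak_ptr is only a probe to observe the leak):

```cpp
#include <memory>

// Two nodes that refer to each other hold each other's count above
// zero, so reference counting never reclaims them even after the
// program drops its last handle to the pair.
struct Node {
    std::shared_ptr<Node> next;
};

bool cycle_survives_scope_exit() {
    std::weak_ptr<Node> probe;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->next = a;             // cycle: a <-> b
        probe = a;
    }                            // both local handles are gone here...
    return !probe.expired();     // ...yet the nodes are still alive
}
```

A tracing collector would reclaim both nodes at this point, since nothing in scope can reach them; the counts alone never can.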
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/25/89)
From scc@cl.cam.ac.uk (Stephen Crawley):
>> [Bill then describes a scheme where the application catches a
>> STORAGE_ERROR exception, causes the current transaction to be
>> aborted and throws away some (global) data structures after
>> asking the user.]
>
> This is unrealistic.  It assumes that there are enough large global
> data structures to be thrown.  It also assumes that the application
> contains the necessary code for interacting with the end user; i.e.
> asking questions that he/she has a chance of understanding.  Neither
> will be true in general.

   This hasn't even a remote resemblance to what I said.  The ADT
   aborts the transaction because it can't satisfy the request.  The
   user receives an exception to that effect.  The user does something
   to alleviate the problem (e.g., selecting and destroying a data
   structure) in the process of handling the exception, and proceeds
   to retry the operation.

   Incidentally, the questions you are asking (and this is NOT, repeat
   NOT, intended to reflect negatively on Mr. Crawley) in this and at
   least one previous article indicate that you do not really
   understand generics or exception handling at all, and I would
   therefore suggest that you investigate some basic Ada texts in
   order to understand these basic concepts and better participate in
   the discussion.

> I define junk to mean dynamically allocated space which was allocated
> in the current scope (or passed down from a higher scope) that is no
> longer accessible from an in-scope variable anywhere in the program.
> In a complex program, such junk may be generated in such a way that
> it is IMPOSSIBLE for any application code or any library ADT to keep
> track of its accessibility ... without implementing a garbage
> collector!

   And I contend that if the program is that complex, then it is badly
   written; the programmer probably has not made use of the basic
   principle that "abstraction is the fundamental means by which the
   computer scientist can combat complexity".
   Abstract data types are an integral part of this process.

>> then we require a "Read Programmer's Mind" instruction.
>
> Oh no we don't!!!!  A garbage collector can (and will) reclaim a lot
> of this junk by tracing all in-scope variables etc.  The garbage
> collector can do a "perfect" job of this.  Storage reclamation code
> in an ADT or the application can only do a half-hearted job in
> difficult cases.  Such a difficult case might involve detecting that
> a subgraph containing cycles has become detached from its root ...
> the case where reference counting does not work.  My argument is
> that the difference between a perfect job and imperfect job may be
> enough to cause the program to die.

   Not a difficult case at all, if we have (for example) a generic
   Simple_Directed_Graph ADT.  Again, the argument here reflects a
   lack of understanding of the basic ideas behind generic software
   components.

>> The rule is simple: Leave the use of pointers to ADT implementors,
>> and simply reuse the code they produce.  Exploit to the absolute
>> fullest the hard guarantees (proper space management, serializability,
>> recoverability, concurrent processing, etc.) that the ADT implementors
>> worked so hard to provide you with, and thy Manager will be pleased.
>
> Except that ADT implementors haven't because it is impossible.

   It is quite possible; start by reading Booch's "Software Components
   with Ada", and once you understand that you will be in a position
   to consider some more advanced techniques.  Then if you still want
   to argue against my position, you will at least understand it well
   enough to launch a reasonable attack.


   Bill Wolfe, wtwolfe@hubcap.clemson.edu
grichard@cherokee.cis.ohio-state.edu (Golden Richard) (09/25/89)
In article <909@scaup.cl.cam.ac.uk> scc@cl.cam.ac.uk (Stephen Crawley) writes:
[pro-GC arguments deleted]
I find the current GC wars rather interesting, but I find myself siding
naturally with Bill simply because I have *never* encountered a situation
that absolutely demanded GC. Even without automatic scope-exit destructors
for ADTs, programmer-controlled storage management isn't difficult.
I'd be most interested in seeing some concrete example where GC was the
'only way to go'.
-=-
Golden Richard III OSU Department of Computer and Information Science
grichard@cis.ohio-state.edu "I'm absolutely positive! ...or not."
scc@cl.cam.ac.uk (Stephen Crawley) (09/25/89)
Golden Richard III writes:
> I find the current GC wars rather interesting, but I find myself siding
> naturally with Bill simply because I have *never* encountered a situation
> that absolutely demanded GC.  Even without automatic scope-exit destructors
> for ADTs, programmer-controlled storage management isn't difficult.
> I'd be most interested in seeing some concrete example where GC was the
> 'only way to go'.

Here are some problems where programmer-controlled storage management
would be very difficult or impossible.

Interactive editing of cyclic graph data structures.  You have a
heterogeneous cyclic graph data structure, and the editor must be able
to make arbitrary sequences of changes to it.  How do you detect that
a subgraph has become detached?  This sort of problem arises in many
real-life applications such as graphical editors and various
programming environment tools.  In practice, many such applications
don't try very hard to reclaim storage.  When the application has
leaked so much space that the processor starts to thrash, the user can
always save his work and restart ...

Dynamic loading & unloading of modules in a running program.  An
application consists of a static set of core modules and a number of
other modules that are dynamically link-loaded on demand.  When the
programmer recompiles a dynamic module, he may want to reload it and
start using it without shutting down the entire application.  But to
do this, the dynamic loader needs to know if the module is currently
being 'used' ... e.g. if some thread is executing a procedure exported
by the module, or if some other part of the application has salted
away a reference into the module (e.g. a procedure pointer).

This problem arose when I was using Mesa / XDE to implement my entity
system, a persistent object system.  I did get the dynamic loading to
work ...
and it made a big performance difference, but dynamic unloading could
not be done safely, since the Mesa / XDE environment has no garbage
collector.

[Incidentally, the lack of garbage collection caused so many
reliability problems that I was forced to give up using Mesa / XDE
altogether about 2 years ago.  The new version of the entity system
uses MIT CLU as the base language.  CLU is garbage collected, and
though the GC I am using is slow and has some unpleasant attributes,
and though I've had to do some extensive butchery on unsavoury parts
of the CLU runtime system, I do not regret making the switch.]

-- Steve
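One way to answer "how do you detect that a subgraph has become detached?" is to trace reachability from the roots, exactly as a collector's mark phase does: anything the trace never visits is junk, even a cycle whose members still point at one another. A C++ sketch (all names invented; this is an illustration of the mark idea, not the editor described above):

```cpp
#include <set>
#include <vector>

// A bare-bones graph node: just outgoing edges.
struct GNode {
    std::vector<GNode*> out;
};

// Depth-first mark: record every node reachable from n.
void mark(GNode* n, std::set<GNode*>& seen) {
    if (n == nullptr || !seen.insert(n).second) return;  // null or already marked
    for (GNode* m : n->out) mark(m, seen);
}

bool finds_detached_cycle() {
    GNode root, kept, c1, c2;
    root.out = {&kept};
    c1.out = {&c2};
    c2.out = {&c1};            // a cycle no longer hanging off the root
    std::set<GNode*> seen;
    mark(&root, seen);
    // The trace keeps what the root reaches and misses the orphan cycle,
    // which reference counting alone could never identify as junk.
    return seen.count(&kept) == 1 && seen.count(&c1) == 0;
}
```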
scc@cl.cam.ac.uk (Stephen Crawley) (09/25/89)
Bill Wolfe writes:
> Incidentally, the questions you are asking (and this is NOT,
> repeat NOT, intended to reflect negatively on Mr. Crawley) in
> this and at least one previous article indicate that you do not
> really understand generics or exception handling at all, and I
> would therefore suggest that you investigate some basic Ada texts
> in order to understand these basic concepts and better participate
> in the discussion.

Nice one Bill.

I'll have you know, Mr Wolfe, that I've been using languages with
exception handling (CLU and Mesa) and generic data types (CLU) for
many years.  I understand the concepts quite adequately, thank you.
I've also implemented a variety of storage managers, including garbage
collectors both for short term and long term data structures, and
I've done quite a bit of work on hybrid (i.e. partly GC'ed, partly
application controlled) storage management solutions.

Frankly, I don't see why it is necessary to understand ADA intimately
to talk about software reuse.  The fact that the two issues seem to be
so firmly associated in your mind is very sad, since it seems to blind
you to the possibility of non-Ada solutions.

Mr Wolfe: I used to believe fervently (like you) that the overhead of
garbage collection was unacceptable.  Many years of bitter experience
trying to get by without it have led me to change my mind.  Perhaps if
you had had my experience your views would be different.  You never
know, you might even be wrong!

I'll answer your "technical" points later.

-- Steve
wright@hsi.UUCP (Gary Wright) (09/25/89)
In article <62342@tut.cis.ohio-state.edu> Golden Richard
<grichard@cis.ohio-state.edu> writes:
>In article <909@scaup.cl.cam.ac.uk> scc@cl.cam.ac.uk (Stephen Crawley) writes:
>[pro-GC arguments deleted]
>
>I find the current GC wars rather interesting, but I find myself siding
>naturally with Bill simply because I have *never* encountered a situation
>that absolutely demanded GC.  Even without automatic scope-exit destructors
>for ADTs, programmer-controlled storage management isn't difficult.
>I'd be most interested in seeing some concrete example where GC was the
>'only way to go'.

Well, I have *never* encountered a situation that absolutely demanded
using a high level language.  That is why I still do all my
programming in assembly.  :-)

If our criteria for deciding whether GC is better than explicit memory
management are based on absolutes, we will never come to a conclusion.
Explicit memory management is, theoretically speaking, equivalent to
GC (we are all using Turing machines, aren't we?).  What we need to
determine is whether a language that supports GC allows programs to be
expressed in a form that is "better" than a language without GC.
Clearly, it will be hard to come to a conclusion about which form of
expression is "better" than another.  We also need to determine if the
benefits derived from the ease of expression outweigh any performance
degradation.

This trade-off has been made in the past.  A hand-coded assembly
program can probably be made to be more efficient in time and space
than a program generated by a high-level language compiler.  Yet most
programmers don't use assembly.  Perhaps this analogy holds for the
current debate regarding GC.

It is interesting to note the advances that have been made in
optimizing compilers.  Perhaps a sufficiently smart compiler for a
language that supports GC can figure out when GC can be avoided?  That
is to say, the compiler can notice when the last reference to an
object will be lost and can explicitly "dispose" of the object.
In general, the compiler won't be able to determine this (thus GC),
but for the simple cases, why not?

I have seen a number of people claim that GC can not be used in a
real-time system and then conclude that GC is a bad thing that should
not be used.  If this isn't a non sequitur, I don't know what is.
There will always be special cases that require special techniques.
What I am interested in are language features that make my life easier
for the majority of applications that don't fall under the category of
"special case".  I also would like compilers that can do the dirty,
tedious work of deciding when to use special techniques instead of a
more general one.  Areas in which I see this now or in the future are:

	register usage
	parameter passing techniques
	dynamic vs. static binding (in the context of OOP)
	function inlining
	GC vs non-GC

I am sure others can come up with more examples.

My prediction is that more and more people will start using languages
with GC, but there will always remain those who think it is a "bad
thing", and there will always be situations in which GC should not be
used.  There are people today who claim that we don't need anything
more than assemblers, and there are situations today in which it is
the only reasonable solution.
-- 
Gary Wright 				...!uunet!hsi!wright
Health Systems International            wright@hsi.com
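The simple case above, where the point of last use is statically evident, can be sketched in C++ (the Buffer type and its counter are invented for illustration; std::unique_ptr stands in for a reference the compiler can prove is the only one, so the dispose lands at a point known at compile time):

```cpp
#include <memory>

// A type that counts its own live instances, so the demo can observe
// exactly when storage is released.
struct Buffer {
    static int live;           // how many Buffers exist right now
    Buffer()  { ++live; }
    ~Buffer() { --live; }
};
int Buffer::live = 0;

int live_after_handoff() {
    auto a = std::make_unique<Buffer>();   // sole owner of the Buffer
    auto b = std::move(a);                 // the one reference moves; a is empty
    return Buffer::live;                   // still exactly one Buffer alive
}                                          // b's scope exit frees it: no collector,
                                           // the dispose point is fixed statically
```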
grichard@hockey.cis.ohio-state.edu (Golden Richard) (09/26/89)
In article <599@hsi86.hsi.UUCP> wright@hsi.com (Gary Wright) writes:
>In article <62342@tut.cis.ohio-state.edu> Golden Richard <grichard@cis.ohio-state.edu> writes:
>>In article <909@scaup.cl.cam.ac.uk> scc@cl.cam.ac.uk (Stephen Crawley) writes:
>>[pro-GC arguments deleted]
>
>Well, I have *never* encountered a situation that absolutely demanded using
>a high level language.  That is why I still do all my programming in
>assembly.  :-)
>
>If our criteria for deciding whether GC is better than explicit memory
>management is based on absolutes, we will never come to a conclusion.

I never intended to sound as if I were daring someone to come up with
an application which I couldn't handle with programmer-managed
reclamation.  The "absolutely demanded" phrase in my article was
prompted by references to (unnamed) applications which supposedly
require GC to ease the burden on the programmer.

To rephrase my statement, I have never encountered an application
where GC was necessarily preferable to programmer-managed reclamation.
Furthermore, I can't imagine a properly designed application where
storage management is so complicated that the usual techniques for
programmer-managed storage reclamation aren't adequate.  If the
programmer can't define when a certain object can go poof, I suspect a
serious lack of understanding of the problem at hand.
-=-
Golden Richard III        OSU Department of Computer and Information Science 
grichard@cis.ohio-state.edu         "I'm absolutely positive!  ...or not."
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/26/89)
From wright@hsi.UUCP (Gary Wright):
> I have seen a number of people claim that GC can not be used in a
> real-time system and then conclude that GC is a bad thing that should
> not be used.

   You're missing several important steps here, having to do with
   performance degradation and the violation of the general principle
   that one should never do at run time that which can be done instead
   at compile time, design time, etc.; in general, the sooner a
   problem is handled (and similarly, the sooner an error is
   detected), the less costly it will be to handle it.  Why pay and
   pay and pay at run time when an efficient self-managing solution
   (valid forevermore) can be used (and reused) instead?


   Bill Wolfe, wtwolfe@hubcap.clemson.edu
amull@Morgan.COM (Andrew P. Mullhaupt) (09/26/89)
The debate raging over garbage collection in this newsgroup is an
interesting contrast to the "effect of free()" thread in comp.lang.c;
there are many there who argue that they need to examine pointers
after de-allocating them(!)  One argument advanced is that even though
it's a bad thing, since it's usually OK, it should become standard
practice.  This is, to my mind, in character for C programmers.

In this newsgroup, we find other programmers arguing that programmers
should have no direct use for pointers, so that the language can
determine the correct lifetime of dynamic data.  I suggest it is going
to prove necessary for the languages which take this "Pascalian" point
of view to close all loopholes (typecasts to integer, etc.) to prevent
the widespread use of C-style perfidious trickery to escape from
sensible programming practice.  (If you give them an inch...)

I would also point out that despite the claims of its adherents, APL
(A Pointerless Language) does not do a great job with storage
management even though it has no explicit pointers.  Often space is
allocated, then de-allocated as soon as possible, only to be
re-allocated for the same object somewhere else in the workspace, with
the attendant risks of fragmentation and garbage collection.  The
programmer is often forced into peculiar code (even for APL) in order
to try to provide memory management cues to the interpreter.  There is
often no relief when storage efficiency requires re-use of a local
variable for more than one purpose.

At least they have the cute idea of always garbage collecting
immediately after printing the last response to the user.  (This has
the effect of making garbage collection less noticeable when working
in the typical software engineer's paradise of a tiny test problem,
and excruciatingly painful when you get a trivial syntax error in a
workspace where you just started a huge operation.)
The point is that it makes a lot of difference when garbage collection is performed, and whether the programmer can control it. Another APL story: when calling FORTRAN from APL, you must allocate the storage for the FORTRAN subprogram in the APL workspace. There are many important FORTRAN subprograms which involve two calls sharing data in an auxiliary array residing in the APL workspace (i.e., at the mercy of the APL garbage collector). It can happen that APL decides it must perform garbage collection between the two calls to FORTRAN, and the auxiliary array is moved to a new home. As can be expected in cases like this, the manual guarantees unpredictable results. The classic example of this is a Fast Fourier Transform which sets up all the trig calls in one call and performs the transform in a second. Repeated transforms can then be done by repeating the second call only, if you're calling from FORTRAN (or C, or Pascal, etc.), but this is too daring from APL. And since there is no way to force APL to leave your array alone, the only hope, and common practice, is to ensure that the garbage collection is performed before the first call to FORTRAN, and to look at the size of the available workspace to make sure the arrays to be transformed are not large enough to trigger the calamity again. This means that you always suffer the performance hit of garbage collection, and you always have to essentially manage your own storage. If writing an FFT in APL weren't so slow, and the highly vectorized FORTRAN FFT so fast, and the data arrays so large as to endanger garbage collection, this wouldn't be much of a payoff. But they are. This last example falls into the category of "application programmer doing something naughty (by extension through library routine) with (what are essentially) pointers." 
The interesting part is that *BOTH* APL and FORTRAN get to accuse you of this: APL doesn't want you to rely on the array being in any given place, and FORTRAN doesn't want you to be able to move it. This is the kind of unforeseen conflict which prevents "RE-USABILITY" in many cases. From their separate points of view, both the FORTRAN and APL implementors have correct and sensible implementations, but their products are being used (to very good result) in a way which neither expected, and through silly tricks which make for highly unreadable and difficult to maintain code. The moral of the story is: 1. Put in garbage collection, but also put in sufficient controls for the programmer. 2. If you need to make assertions about scope of dynamic data, FORCE them. 3. Programmers (even us Pascal lovers) will not stop at anything when the big push comes to the big shove, and we'll likely get around sensible and normal guidelines which are not supported by adequate sensible facilities. Later, Andrew Mullhaupt. Disclaimer: Any opinions here are not necessarily those of my employer. (Did you know that Morgan Stanley & Co. Inc., is the world's largest APL shop? IBM told us so...)
jans@tekgvs.LABS.TEK.COM (Jan Steinman) (09/26/89)
<62342@tut.cis.ohio-state.edu (Golden Richard)> <<wright@hsi.UUCP (Gary Wright)>> <...I can't imagine a properly designed application where storage management is so complicated that the usual techniques for programmer-managed storage reclamation aren't adequate. If the programmer can't define when a certain object can go poof, I suspect a serious lack of understanding of the problem at hand.> One who cannot imagine a need for garbage collection other than to correct for programmer deficiencies simply lacks imagination. The situation is not so simple in dynamic-bound systems. Are you claiming that Smalltalk, Lisp, Eiffel, Objective C, Gnu Emacs, et al. have garbage collectors simply because those who program in them lack serious understanding of their problems? In particular, systems designed for rapid prototyping may have storage systems that have completely nondeterministic usage patterns. I'm certain those who have used non-GC systems to build things similar to the NeXT InterfaceBuilder or the Smalltalk Browser have come to appreciate the need for GC. To bring it back to the newsgroup, the problem of reusable components is intimately concerned with automatic storage management. The need to document storage procedures is yet another reason truly reusable components are rare. I really agree with Gary: <If our criteria for deciding whether GC is better than explicit memory management is based on absolutes, we will never come to a conclusion.> Once the religious give up their stubborn, blind opposition to GC, we can all begin to discover when to use it, when not to, and how to control it so that it stays out of the way when not used. I can pound most nails with the handle of a screwdriver, but I'm a fool if I declare there will never be a hammer in my toolbox! In particular, I don't think the combination of asynchronous GC with programmer storage handling has been adequately explored. 
Creation and destruction of much data can be handled by a compiler, and most real-time systems can have scheduled "idle time" in which to perform overhead activities. (A SONAR project I worked on performed GC during the few milliseconds when the beam was sweeping through the hull.) Jan Steinman - N7JDB Electronic Systems Laboratory Box 500, MS 50-370, Beaverton, OR 97077 (w)503/627-5881 (h)503/657-7703
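The scheduled-idle-time discipline Jan describes can be sketched as a pool in which releasing a block is a cheap O(1) operation on the time-critical path, while all real reclamation work waits for an explicit idle() call. This is a hypothetical illustration in modern Python, nothing like what a 1989 SONAR system actually ran; all class and method names are invented:

```python
class IdleReclaimingPool:
    """Allocator that defers reclamation work to explicitly scheduled idle time."""

    def __init__(self):
        self._free = []       # blocks ready for immediate reuse
        self._pending = []    # released blocks awaiting idle-time processing

    def allocate(self):
        if self._free:
            return self._free.pop()
        return bytearray(64)  # fresh 64-byte block

    def release(self, block):
        # Cheap O(1) operation on the time-critical path.
        self._pending.append(block)

    def idle(self, budget=10):
        # Do at most `budget` units of reclamation work, e.g. while
        # the beam is sweeping through the hull.
        work = 0
        while self._pending and work < budget:
            block = self._pending.pop()
            block[:] = bytes(len(block))  # scrub the block before reuse
            self._free.append(block)
            work += 1
        return work
```

The `budget` parameter is the point: the application, not the collector, decides how much overhead work fits in each idle window.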
grichard@cherokee.cis.ohio-state.edu (Golden Richard) (09/26/89)
In article <5995@tekgvs.LABS.TEK.COM> jans@tekgvs.LABS.TEK.COM (Jan Steinman) writes: ><62342@tut.cis.ohio-state.edu (Golden Richard)> [stuff deleted] >The situation is not so simple in dynamic-bound systems. Are you claiming that >Smalltalk, Lisp, Eiffel, Objective C, Gnu Emacs, et. al. have garbage >collectors simply because those who program in them lack serious understanding >of their problems? > I am claiming no such thing. In languages where the choice for GC has been made *for* the programmer, little other choice remains. As a matter of fact, I do not even intend to say that GC is inferior to programmer-managed storage reclamation. The fact that I am relatively ignorant of applications best suited to using GC is why I'm asking the pro-GC spokespersons to please give a concrete example where one would truly need or want GC over the alternative. I'm certainly not of the "assembly language" school that believes that everything should be explicitly and intricately hacked out by the programmer. In particular, automatic scope entry/exit creation/destruction is a godsend. I just don't see the point in making things more complicated than they need to be. -=- Golden Richard III OSU Department of Computer and Information Science grichard@cis.ohio-state.edu "I'm absolutely positive! ...or not."
grichard@cherokee.cis.ohio-state.edu (Golden Richard) (09/26/89)
If anyone received multiple (somewhat mangled and incomplete) copies of my previous article, I would appreciate email. I was having some technical problems. Thanks in advance. ...and now back to the storage wars... -=- Golden Richard III OSU Department of Computer and Information Science grichard@cis.ohio-state.edu "I'm absolutely positive! ...or not."
twl@brunix (Ted "Theodore" (W) Leung) (09/26/89)
In article <599@hsi86.hsi.UUCP> wright@hsi.com (Gary Wright) writes: >It is interesting to note the advances that have been made in >optimizing compilers. Perhaps a sufficiently smart compiler for a language >that supports GC can figure out when GC can be avoided? That is to say, >the compiler can notice when the last reference to an object will be lost >and can explicity "dispose" of the object. In general, the compiler won't >be able to determine this (thus GC) but for the simple cases, why not? There has been some work done in this area...

@incollection{NDJSSM81,
  author    = "Neil D. Jones and Steven S. Muchnick",
  title     = "Flow Analysis and Optimization of LISP-like Structures",
  booktitle = "Program Flow Analysis: Theory and Applications",
  publisher = "Prentice Hall",
  year      = "1981",
  editor    = "Steven S. Muchnick and Neil D. Jones",
  chapter   = "4",
  pages     = "102-131"
}

Jones and Muchnick present some techniques which allow exactly the kinds of determinations which you are talking about. -------------------------------------------------------------------- Internet/CSnet: twl@cs.brown.edu | Ted Leung BITNET: twl@BROWNCS.BITNET | Box 1910, Brown University UUCP: uunet!brunix!twl | Providence, RI 02912
wright@hsi.UUCP (Gary Wright) (09/26/89)
In article <6579@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu writes: >From wright@hsi.UUCP (Gary Wright): >> I have seen a number of people claim that GC can not be used in a >> real-time system and then conclude that GC is a bad thing that should >> not be used. > > You're missing several important steps here having to do with > performance degradation and the violation of the general principle > that one should never do at run time that which can be done instead > at compile time, design time, etc.; in general, the sooner a problem > is handled (and similarly, the sooner an error is detected), the less > costly it will be to handle it. In my posting I tried to emphasize the fact that there is probably a trade-off between using a language that supports GC vs. one that doesn't. It is not clear to me that the relative costs are such as to render GC useless. I do believe that it is harder to measure the costs associated with non-GC languages. Please provide some reasoning for your statements which indicate that you feel differently. Simply claiming that the costs do not support GC is not enough. I will try to provide some reasoning for GC below. I agree that as much as possible should be done at compile or design time. I would prefer that the compiler did most of the dirty work. Also, in the context of reusable software, design decisions should, as much as possible, not limit later reuse of the software (e.g. a clear violation of this is C++'s "virtual" and "inline" keywords). > Why pay and pay and pay at run time > when an efficient self-managing solution (valid forevermore) can be > used (and reused) instead? It is not clear to me that garbage collection entails a particularly harsh performance penalty. It is also not clear to me that the self-managing solution is simple enough to allow such components to be easily created and modified (i.e. modification via inheritance, which Ada does not support). 
After reading your posting, I went back and read the sections in "Object-oriented Software Construction" regarding memory management. What follows is mostly Bertrand Meyer's ideas regarding memory management. I tend to agree, so I will simply present his ideas with my comments. Bertrand Meyer claims that programmer-controlled deallocation "is unacceptable for two reasons: security and complication of program writing." By security, Meyer means that programmers are mortals, and they will make mistakes and will dispose of objects that still have active references. Complication refers to the fact that simply disposing of an object is not sufficient, all internal objects must also be disposed. Meyer calls this the "recursive dispose problem": This means that a specific release procedure must be written for any type describing objects that may refer to other objects. The result will be a set of mutually recursive procedures of great complication. [...] Instead of concentrating on what should be his job - solving an application problem - the programmer turns into a bookkeeper, or garbage collector (whichever metaphor you prefer). The complexity of programs is increased, hampering readability and hence other qualities such as ease of error detection and ease of modification. This further affects security: the more complex a program, the more likely it is to contain errors. This is the trade-off I was talking about. You can do without GC, but the increased complexity of your programs has its own penalty also. Meyer goes on to describe an alternative, the "Self-Management Approach" in which the designer of an ADT takes care of the storage management associated with the ADT. This is the approach I believe Bill Wolf is advocating. This is indeed a "responsible alternative" to leaving everything to the programmer. Meyer advocates the use of this technique for languages that do not support GC. 
In his example, all objects to be added to a linked list are copied into storage that is managed by the linked list. This would seem to be quite inefficient for large objects. I'm not sure how storage could be safely managed when an ADT only holds references to objects. In his example, procedures that add nodes to or delete nodes from the linked list cause new nodes to be created or destroyed. If an ADT provides direct procedures to add or destroy objects within the ADT, then the ADT has not really hidden the problem from the programmer at all. At the most, it has hidden the recursive dispose problem. This approach does not solve the previously mentioned problems of security and complication. Instead of the applications programmer worrying about storage, the ADT designer must worry about it. My hunch is that the distinction between an ADT designer and an applications programmer is clear for objects like linked lists, stacks etc. but that it is not so clear the farther away from the basic data structures you get. I wonder if the overhead related to having ADT's managing storage via copying, and redundant code (all the recursive dispose procedures) doesn't come close to the overhead for GC. It should be noted that Meyer advocates a GC facility that is incremental in nature (not bursty), and can be explicitly turned on or off when necessary (e.g. when real-time constraints exist). Your turn. :-) -- Gary Wright ...!uunet!hsi!wright Health Systems International wright@hsi.com
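Meyer's "recursive dispose problem" is easy to make concrete. Without GC, a container's release procedure must invoke a release procedure on everything it contains, which must do the same in turn; for nested containers the dispose procedures become mutually recursive. A sketch (in Python, which of course really is garbage-collected, so the freeing of cells is only simulated; all names are invented):

```python
class Node:
    """A list cell; in a non-GC language this would be explicitly heap-allocated."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

class LinkedList:
    def __init__(self):
        self.head = None

    def prepend(self, value):
        self.head = Node(value, self.head)

    def dispose(self, dispose_value):
        # Walk the list, disposing each element's contents via the
        # caller-supplied procedure, then "free" the cell itself.
        node = self.head
        while node is not None:
            dispose_value(node.value)
            next_node = node.next
            node.next = None          # stands in for returning cell storage
            node = next_node
        self.head = None
```

A list of lists must then be disposed with something like outer.dispose(lambda lst: lst.dispose(dispose_item)), and so on down the levels: exactly the bookkeeping Meyer says turns the programmer into a garbage collector.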
wright@hsi.UUCP (Gary Wright) (09/26/89)
In article <62546@tut.cis.ohio-state.edu> Golden Richard <grichard@cis.ohio-state.edu> writes: >If the >programmer can't define when a certain object can go poof, I suspect >a serious lack of understanding of the problem at hand. But I wonder if the code is structured so that a programmer can define when a certain object can go "poof", whether the program can be understood by anyone other than the original programmer and whether the code can be easily reused. Unfortunately, I don't have a good example at hand (a major problem with this discussion in general, as has been noted by others). -- Gary Wright ...!uunet!hsi!wright Health Systems International wright@hsi.com
ram@wb1.cs.cmu.edu (Rob MacLachlan) (09/27/89)
From: wright@hsi.UUCP (Gary Wright) Subject: Re: Re: Garbage Collection & ADTs Date: 25 Sep 89 15:14:32 GMT [...] >What we need to determine is if a language that supports GC allows programs >to be expressed in a form that is "better" than a language without GC. You are right on target here. It is silly to claim that "problem X can only be solved in language Y"; the interesting question is whether language Y admits a solution to X that is: -- More efficient -- More robust -- More easily (quickly) written -- Easier to understand (to maintain) -- Aids code reuse [token concession to the nominal subject] -- More likely to be correct, etc. >Perhaps a sufficiently smart compiler for a language that supports GC can >figure out when GC can be avoided? That is to say, the compiler can notice >when the last reference to an object will be lost and can explicitly >"dispose" of the object. For a long time Lisp compilers have attempted to recognize when some objects can be allocated according to a stack discipline, rather than being heap allocated. This has been done primarily with numbers, lists and closures. The ML compiler has an interesting optimization along these lines: if all references to an allocated object can be identified as being in the same function, then the allocation can be replaced with a collection of temporary variables. Of course, both of these allocation optimizations are aimed at eliminating cons-and-throw-away behavior within a single function. Programmers used to thinking of storage allocation as expensive would never write such code. Rob
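The ML optimization Rob mentions amounts to the transformation below: when every reference to an allocated object provably stays inside one function, the object's slots can become plain temporaries and the allocation disappears. Illustrated in Python with invented function names; a real compiler performs this rewrite silently, the programmer writes only the first form:

```python
# Before: cons-and-throw-away. A "pair cell" is allocated purely to
# shuttle two values around inside a single function.
def hypot_squared_allocating(x, y):
    pair = (x * x, y * y)       # heap-allocates a pair
    return pair[0] + pair[1]    # the last reference to it dies here

# After: since no reference to the pair escapes the function, the
# allocation is replaced by a collection of temporary variables.
def hypot_squared_optimized(x, y):
    t0 = x * x                  # the pair's slots become locals
    t1 = y * y
    return t0 + t1
```

Both forms compute the same result; the second simply never touches the heap.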
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/27/89)
From scc@cl.cam.ac.uk (Stephen Crawley): > Here are some problems where programmer-controlled storage management > would be very difficult or impossible. > > Interactive editing of cyclic graph data structures. You have a > heterogeneous cyclic graph data structure and the editor must be able > to make arbitrary sequences of changes to the data structure. How > do you detect that a subgraph has become detached? The fact that a subgraph has become detached does not imply that it is no longer part of a graph. Not all graphs are fully connected. Therefore, it isn't possible to release the storage used until there is some sort of directive which indicates that a given set of nodes and/or arcs is to be destroyed. > Dynamic loading & unloading of modules in a running program. > [...] the dynamic loader needs to know if the module is > currently being 'used' ... e.g. if some thread is executing a > procedure exported by the module or if some other part of the > application has salted away a reference into the module (e.g. a > procedure pointer) > > This problem arose when I was using Mesa / XDE to implement my entity system; > a persistent object system. I did get the dynamic loading to work ... and > it made a big performance difference, but dynamic unloading could not be > done safely, since the Mesa / XDE environment has no garbage collector. This is an operating-system problem, not an application problem. As such, the situation is not "applicable to a wide range of applications". The solution is operating-system dependent; one approach might be to use idle time to scan the processes in the process queue to determine if there are any processes which were started while the old module was in effect which made use of it. Operating systems do a lot of things (pointer arithmetic, etc.) which application systems should not be doing; this falls into exactly that category. Bill Wolfe, wtwolfe@hubcap.clemson.edu
tynor@prism.gatech.EDU (Steve Tynor) (09/27/89)
In article <6589@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu writes: ... >From scc@cl.cam.ac.uk (Stephen Crawley): >> Here are some problems where programmer-controlled storage management >> would be very difficult or impossible. >> >> Interactive editing of cyclic graph data structures. You have a >> heterogeneous cyclic graph data structure and the editor must be able >> to make arbitrary sequences of changes to the data structure. How >> do you detect that a subgraph has become detached? > > The fact that a subgraph has become detached does not imply that > it is no longer part of a graph. Not all graphs are fully connected. > Therefore, it isn't possible to release the storage used until there > is some sort of directive which indicates that a given set of nodes > and/or arcs is to be destroyed. But what if _in_this_case_, a detached subgraph is garbage and should be deallocated? G.C. surely simplifies the program and would enhance readability. If you can easily detect when to deallocate - more power to you. But, it's not always easy. ... >> Dynamic loading & unloading of modules in a running program. >> [...] the dynamic loader needs to know if the module is >> currently being 'used' ... e.g. if some thread is executing a >> procedure exported by the module or if some other part of the >> application has salted away a reference into the module (e.g. a >> procedure pointer) ... > This is an operating-system problem, not an application problem. > As such, the situation is not "applicable to a wide range of > applications". The solution is operating-system dependent; one > approach might be to use idle time to scan the processes in the > process queue to determine if there are any processes which were > started while the old module was in effect which made use of it. Again, this sounds like garbage collection to me (this time the OS is garbage collecting the unloaded module...) 
> Operating systems do a lot of things (pointer arithmetic, etc.) > which application systems should not be doing; this falls into > exactly that category. Are compilers not mini-operating systems (certainly, the Ada run-time environment is!)? I'd much rather have my compiler doing my pointer arithmetic (and storage management) than to expect my application to do so. I'm not against explicit allocation/deallocation - I just want an alternative for the cases when it's either too convoluted or impossible. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Eschew Obfuscation. Steve Tynor Georgia Tech Research Institute Artificial Intelligence Branch tynor@prism.gatech.edu
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/27/89)
From wright@hsi.UUCP (Gary Wright): > Bertrand Meyer claims that programmer-controlled deallocation "is > unacceptable for two reasons: security and complication of program > writing." By security, Meyer means that programmers [...] will > make mistakes and will dispose of objects that still have active > references. Unless they leave the task to an ADT instead. > Complication refers to the fact that simply disposing > of an object is not sufficient, all internal objects must also > be disposed. Meyer calls this the "recursive dispose problem": > > This means that a specific release procedure must be > written for any type describing objects that may refer > to other objects. The result will be a set of mutually > recursive procedures of great complication. Not true. The user supplies a Destroy procedure for whatever is being stored. The ADT, in the course of destroying itself, will call upon the user's Destroy procedure to handle the user's data type. The writer of the Destroy procedure need only consider destroying his ADT, since the user's Destroy procedure can be relied upon to destroy all lower levels. It's really rather simple. > Instead of the applications programmer worrying about storage, the ADT > designer must worry about it. Precisely. As an ADT designer, I consider the task of storage management to be among the least of my problems. The difficult part is providing all the concurrency-related hard guarantees, which languages like Eiffel manage to avoid by not providing multitasking in the first place. > My hunch is that the distinction between an ADT designer and an > applications programmer is clear for objects like linked lists, > stacks etc. but that it is not so clear the farther away from the > basic data structures you get. The distinction is clear: the ADT specification separates the application programmer from the ADT implementor, regardless of whether you perceive the structure as "basic" or not. 
> I wonder if the overhead related to having ADT's managing storage > via copying, and redundant code (all the recursive dispose procedures) > doesn't come close to the overhead for GC. Probably not, since this approach can exploit more information; it doesn't have to worry about figuring out whether or not a piece of space can be released, since it already knows it can. The Destroy procedures convey information which does not have to be rediscovered at great and continuing expense. > It should be noted, that Meyer advocates a GC facility that is > incremental in nature (not bursty), and can be explicitly turned on or > off when necessary (e.g. when real-time constraints exist). > > Your turn. :-) We know that GC can coexist with managed components, as in Ada, where components can declare that the GC system (if any) must not operate on objects of a managed type. We can easily agree that if GC is provided, there must be a way for managed types to turn it off as far as they are concerned, while leaving it potentially on for others. If a given piece of code contains only managed types, then the compiler should notice this fact and exclude GC code from the executable. The original point was that not all users can tolerate GC (real-time users in particular); on the other hand, all GC users can use managed components with no problems. Therefore, if a component is to be designed for the widest possible audience, that component must manage its own storage. If components are used which meet the highest possible standards, then we don't have to worry about whether our components will stop working when we do maintenance (which might introduce multitasking, real-time operation, etc.) on our application; using GC introduces such weaknesses, in addition to giving an *avoidable* run-time cost. Bill Wolfe, wtwolfe@hubcap.clemson.edu
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/27/89)
From amull@Morgan.COM (Andrew P. Mullhaupt): > In this newsgroup, we find other programmers arguing that > programmers should have no direct use for pointers, and so the language can > determine the correct lifetime of dynamic data. Not exactly. The argument is that pointers should only be used by ADT implementors, not by application programmers. > I suggest it is going to prove necessary for the languages which have > this "Pascalian" point of view to close all loopholes (typecasts to > integer, etc.) to prevent the widespread use of C style perfidious > trickery to escape from sensible programming practice. (If you give > them an inch...) The real problem is not the elimination of mechanisms such as Ada's Unchecked_Conversion, but the maintenance of a software engineering perspective; bad code can be written in any language. Bill Wolfe, wtwolfe@hubcap.clemson.edu
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/27/89)
From tynor@prism.gatech.EDU (Steve Tynor): >> The fact that a subgraph has become detached does not imply that >> it is no longer part of a graph. Not all graphs are fully connected. >> Therefore, it isn't possible to release the storage used until there >> is some sort of directive which indicates that a given set of nodes >> and/or arcs is to be destroyed. > > But what if _in_this_case_, a detached subgraph is garbage and should > be deallocated? OK, assume that we do want to kill detached subgraphs. Which of the two (or more) connected components is the one we wish to keep? If we assume that there is some distinguished node or arc which can never be deleted and which also serves to mark the "permanent" connected component, then presumably our graph abstraction will allow us to mark one such node or arc as being endowed with these magic properties. Now we need an oracle which will tell us if a connected component contains the magic node or arc. The ability to determine whether one node (or arc) is reachable if we start at another node or arc is a generally useful capability which should probably be provided to the user anyway, particularly since the algorithm need only be looked up in a standard text on graph theory and typed in. Since any service which is not explicitly used will be optimized away anyway, the user pays no real price for having the service provided. Having provided the service to the user anyway, we can now use it ourselves to determine which component to throw away. Now this is obviously a special-case response to this particular problem, but the problem is also a special case itself. Even if computing it ourselves has the same computational complexity as computing it with a garbage collection algorithm, there is still a gain in that we can search a specific area at a specific time and therefore improve the predictability of our service. 
We know that no operation's time complexity will be totally out of proportion to the size of our graph, as it can be when we must pay for all garbage everywhere on the machine to be collected. Finally, there is the problem of the GC system repeatedly being invoked (at great expense) when the amount of storage remaining becomes fairly small. With a managed system, the user is given an opportunity to strategically throw away data structures, and thereby free up some space that the garbage collector could not possibly take because it is still "in use". Bill Wolfe, wtwolfe@hubcap.clemson.edu
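The reachability "oracle" Wolfe describes is indeed a textbook algorithm. A sketch, assuming an adjacency-list representation and one distinguished "magic" node marking the permanent component; the function and parameter names are invented for illustration:

```python
def reachable_from(root, neighbors):
    """Depth-first reachability: the oracle described above.

    `neighbors` maps each node to a list of adjacent nodes; for an
    undirected graph each edge must be listed in both directions.
    """
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(neighbors[node])
    return seen

def collect_detached(nodes, neighbors, magic_node):
    """Partition the graph into the component to keep (the one holding
    the magic node) and the detached garbage to be destroyed."""
    keep = reachable_from(magic_node, neighbors)
    return keep, set(nodes) - keep
```

This searches only the one graph, at a time the abstraction chooses: the predictability argument made above, as opposed to a collector scanning the whole heap.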
djones@megatest.UUCP (Dave Jones) (09/27/89)
From article <62546@tut.cis.ohio-state.edu>, by grichard@hockey.cis.ohio-state.edu (Golden Richard): > > If the programmer can't define when a certain object can go poof, I suspect > a serious lack of understanding of the problem at hand. > Hear, hear. But as I pointed out earlier, it is also useful to have stack-objects and heap-objects that allow you to reclaim a whole bunch of objects all at one go.
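The stack-objects and heap-objects Dave mentions correspond to what is elsewhere called region or arena allocation: tie a group of objects to one arena and reclaim them all with a single release. A minimal sketch in Python (the class and method names are invented; a real implementation would hand out raw storage, not Python objects):

```python
class Arena:
    """Region allocator: every object created through the arena is
    reclaimed together when the arena is released, all at one go."""

    def __init__(self):
        self._objects = []
        self.live = True

    def new(self, factory, *args):
        # Construct an object and tie its lifetime to this arena.
        obj = factory(*args)
        self._objects.append(obj)
        return obj

    def release(self):
        # One bulk operation stands in for freeing every object here.
        self._objects.clear()
        self.live = False
```

No per-object bookkeeping and no tracing: the cost of reclamation is one operation per arena, which is the appeal of the scheme.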
djones@megatest.UUCP (Dave Jones) (09/27/89)
From article <599@hsi86.hsi.UUCP>, by wright@hsi.UUCP (Gary Wright): .. > > Well, I have *never* encountered a situation that absolutely demanded using > a high level language. That is why I still do all my programming in > assembly. :-) > Sorry, but I'm too tired of this assembler-hll analogy for a smiley to save you. Grumble. Ggrrrrrrrrrrrrmpf. Okay, okay, here: :-) (Sort of.) The analogy is particularly inappropriate in this context, because Assembler and HLL are two techniques for preparing a program, whereas automatically garbage-collected and explicitly garbage-collected programs are different end-results.* Automated garbage collection is runtime overhead. It will take quite a bit of convincing to make me believe that it is universally necessary or desirable. I'm not saying it's impossible, only that it's going to take better arguments than I've seen so far. So in the mean time, I want the option of not using it. So far I have used it approximately, now let's see.. approximately zero times, give or take once. * The principal advantage of high-level languages over assembler is usually that it makes the program portable. It is a myth that assembly language programming is necessarily tedious and slow. Back in the mid to late seventies, there were macro-assemblers and macro packages that made assembly language programming virtually as easy as programming in high level languages. I wrote one once. It handled procedure call/return, nested loops, structures, and all that sort of languagy kind of stuff quite nicely. Those macro packages were the evolutionary antecedents of HLLs. Languages like C++ do automate some rather tedious name-space manipulation, and that is another win. But many of the even "higher" level languages kind of miss the boat by making the syntax of the language reflect library-routine functionality. I like new AWK, for example, but I really wish it were written as C++ classes, so that I could integrate it with my own stuff.
jesup@cbmvax.UUCP (Randell Jesup) (09/27/89)
In article <2079@hydra.gatech.EDU> tynor@prism.gatech.EDU (Steve Tynor) writes: >In article <6589@hubcap.clemson.edu> >>From scc@cl.cam.ac.uk (Stephen Crawley): >>> Here are some problems where programmer-controlled storage management >>> would be very difficult or impossible. >>> >>> Interactive editing of cyclic graph data structures. You have a >>> heterogeneous cyclic graph data structure and the editor must be able >>> to make arbitrary sequences of changes to the data structure. How >>> do you detect that a subgraph has become detached? ... >But what if _in_this_case_, a detached subgraph is garbage and should be >deallocated? G.C. surely simplifies the program and would enhance readability. >If you can easily detect when to deallocate - more power to you. But, it's >not always easy. But GC is unlikely to reclaim that storage. Yes, the graph no longer has pointers to the disconnected subgraph. However, the subgraph may be circular (you said it was cyclic), so all the nodes of the subgraph may still have references to them. Can current GC systems recognize such a loop and remove it in its entirety? How complex can such a loop be and still be recognized? -- Randell Jesup, Keeper of AmigaDos, Commodore Engineering. {uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.cbm.commodore.com BIX: rjesup Common phrase heard at Amiga Devcon '89: "It's in there!"
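For what it's worth, tracing collectors do reclaim such loops: a mark-and-sweep collector marks outward from the roots, so a detached cycle is swept no matter how many internal references it holds or how complex it is, since marking never consults reference counts (it is naive reference counting that cycles defeat). A toy sketch, with invented names:

```python
class Cell:
    """A heap cell holding outgoing references."""
    def __init__(self):
        self.refs = []

def mark_sweep(roots, heap):
    # Mark phase: trace everything reachable from the roots.
    marked = set()
    stack = list(roots)
    while stack:
        cell = stack.pop()
        if id(cell) in marked:
            continue
        marked.add(id(cell))
        stack.extend(cell.refs)
    # Sweep phase: anything unmarked is garbage, cyclic or not.
    return [cell for cell in heap if id(cell) in marked]

# A two-cell cycle with no path from any root:
a, b = Cell(), Cell()
a.refs.append(b)
b.refs.append(a)
root = Cell()
heap = [root, a, b]
survivors = mark_sweep([root], heap)   # the a<->b cycle is not among them
```

The internal references within the cycle are simply never visited, so they cannot keep it alive.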
scc@cl.cam.ac.uk (Stephen Crawley) (09/27/89)
Andrew P. Mullhaupt writes:
>> In this newsgroup, we find other programmers arguing that
>> programmers should have no direct use for pointers, and so the
>> language can determine the correct lifetime of dynamic data.

Bill Wolfe replies:
> Not exactly.  The argument is that pointers should only be
> used by ADT implementors, not by application programmers.

Some of the best designed programming languages don't have explicit
pointers at all!!  CLU, Eiffel and ML are good examples.

-- Steve
scc@cl.cam.ac.uk (Stephen Crawley) (09/27/89)
From wtwolfe@hubcap.clemson.edu:
>From tynor@prism.gatech.EDU (Steve Tynor):
>>From wtwolfe@hubcap.clemson.edu:
>>> The fact that a subgraph has become detached does not imply that
>>> it is no longer part of a graph.  Not all graphs are fully
>>> connected.  Therefore, it isn't possible to release the storage used
>>> until there is some sort of directive which indicates that a given
>>> set of nodes and/or arcs is to be destroyed.
>>
>> But what if _in_this_case_, a detached subgraph is garbage and should
>> be deallocated?
>
> OK, assume that we do want to kill detached subgraphs.  Which of
> the two (or more) connected components is the one we wish to keep?
>
> If we assume that there is some distinguished node or arc which
> can never be deleted and which also serves to mark the "permanent"
> connected component, then presumably our graph abstraction will
> allow us to mark one such node or arc as being endowed with these
> magic properties.  Now we need an oracle which will tell us if
> a connected component contains the magic node or arc.
> [...] Having provided the service to the user anyway,
> we can now use it ourselves to determine which component to throw
> away.  Now this is obviously a special-case response to this
> particular problem, but the problem is also a special case itself.

In YOUR experience, the problem may be a special case.  In my
experience, and that of a number of other posters, this problem arises
frequently in one form or another.

> Even if computing it ourselves has the same computational complexity
> as computing it with a garbage collection algorithm,

.. which is not necessarily true.  You are assuming that we have all of
the options open to the garbage collector guru, e.g. access to VM page
tables, microcode, etc.  Also, you are assuming that the grungy tricks
that sophisticated GC algorithms play can be expressed in Ada and
translated by the Ada compiler into efficient code.
Please don't imagine that it is easy to get garbage collection
overheads down to 5%!!

> [...] there is still
> a gain in that we can search a specific area at a specific time and
> therefore improve the predictability of our service.

There is nothing to stop the programmer from inserting a pragma to
force the GC to reclaim some space at a given point in a program.

> We know that
> there will be no operation whose time complexity will be totally out
> of proportion to the size of our graph because we are paying for all
> garbage everywhere on the machine to be collected.

You are making invalid assumptions about the garbage collector again.

> Finally, there is the problem of the GC system repeatedly being
> invoked (at great expense) when the amount of storage remaining
> becomes fairly small.

Modern garbage collectors can avoid this problem in all but extreme
cases.  Anyway, it is a bad idea to try to run a program that allocates
memory dynamically with "just enough" memory, since under some
circumstances "just enough" might actually be "just not enough".

> ... With a managed system, the user is given
> an opportunity to strategically throw away data structures, and
> thereby free up some space that the garbage collector could not
> possibly take because it is still "in use".

Bogus argument!  The case you are talking about is that where the
programmer has a pointer to a data structure in a variable that is
still in scope, but is not conceptually in use.  You suggest that in
the case of non-GC'ed storage the programmer could free the space early
by calling the deallocator.  Well, in the GC'ed case the programmer can
arrange that the GC can free the data structure by assigning NIL to the
variable.

-- Steve
wright@hsi.UUCP (Gary Wright) (09/27/89)
In article <6591@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu writes:
>From wright@hsi.UUCP (Gary Wright):
>> Bertrand Meyer claims that programmer-controlled deallocation "is
>> unacceptable for two reasons: security and complication of program
>> writing."  By security, Meyer means that programmers [...] will
>> make mistakes and will dispose of objects that still have active
>> references.
>
> Unless they leave the task to an ADT instead.

We all agree that writing an ADT that correctly manages its own storage
is a difficult task.  One of the benefits gained by using ADTs is that
of reuse; the user of an ADT does not have to re-create solutions to
problems that have been solved once and for all by the ADT provider.
Why not use the same reasoning to state that GC solves the problem once
and for all, so that each ADT designer does not have to re-create
solutions for memory management?

>> Complication refers to the fact that simply disposing
>> of an object is not sufficient; all internal objects must also
>> be disposed.  Meyer calls this the "recursive dispose problem":
>>
>>     This means that a specific release procedure must be
>>     written for any type describing objects that may refer
>>     to other objects.  The result will be a set of mutually
>>     recursive procedures of great complication.
>
> Not true.  The user supplies a Destroy procedure for whatever
> is being stored.  The ADT, in the course of destroying itself,
> will call upon the user's Destroy procedure to handle the user's
> data type.  The writer of the Destroy procedure need only consider
> destroying his ADT, since the user's Destroy procedure can be relied
> upon to destroy all lower levels.  It's really rather simple.

So you agree that there is a recursive dispose problem, but you believe
that it isn't that complicated?  Writing these dispose procedures is
straightforward but tedious and *boring*.  This is exactly the type of
situation that leads to errors.
>> Instead of the applications programmer worrying about storage, the
>> ADT designer must worry about it.
>
> Precisely.  As an ADT designer, I consider the task of storage
> management to be among the least of my problems.  The difficult
> part is providing all the concurrency-related hard guarantees,
> which languages like Eiffel manage to avoid by not providing
> multitasking in the first place.

Let's stick to GC and leave concurrency for another day.  Granted, this
is an important issue, and it will become more important, but we should
keep this discussion focused.  Also, I will admit to not being as
versed in this area as I would like to be.

>> My hunch is that the distinction between an ADT designer and an
>> applications programmer is clear for objects like linked lists,
>> stacks etc. but that it is not so clear the farther away from the
>> basic data structures you get.
>
> The distinction is clear: the ADT specification separates the
> application programmer from the ADT implementor, regardless of
> whether you perceive the structure as "basic" or not.

You seem to be implying that there are only two layers to any given
program: the applications layer, and the ADTs used by the application.
In general, there can be any number of layers.  If you look at a
language like Eiffel or Smalltalk, all components of a program are
classes (ADTs).  There is no language distinction between how a stack
is constructed and how any other component of the program is
constructed.  The problems that an ADT designer has to deal with are
not unique to the ADTs at the bottom of the hierarchy.

> The original point was that not all users can tolerate GC
> (real-time users in particular);

I have stated that GC may not be appropriate in certain cases
(real-time).  Others have said that there is work being done on
real-time GC.  Are you saying that some users can "tolerate" GC?  If
so, in support of what type of users have your arguments against GC
been made?
If you have been arguing against GC for real-time users exclusively,
that has not been clear.  If you have in mind another set of users who
cannot tolerate GC, please be specific.

> on the other hand, all GC users
> can use managed components with no problems.

This is your opinion.  I have described some of the problems related to
managed components regarding the complexity of constructing ADTs, which
means the complexity of the entire program, if you consider a program
to simply be a collection of related ADTs.  Others have described
problems with general graph structures.

An area that hasn't been discussed here is reusing components via
inheritance.  I really haven't come to any conclusions regarding the
interaction of memory management and inheritance.  Anybody else have
some ideas?

Another area that concerns me that hasn't been discussed is the
interaction between exceptions and memory management.  My understanding
is that when an exception is raised, execution may resume after any
number of "scopes" have been exited.  Will the objects in those scopes
be guaranteed to be destroyed?  If exceptions need to be handled in the
scope in which they were raised, due in part to this memory leakage,
has not the lack of GC caused the program to become more complex?  And
if an exception is always handled in the scope in which it is raised,
why bother with using the exception mechanism?  (I am referring to
programmer-defined exceptions such as stack underflow or overflow, as
opposed to hardware exceptions etc., which should probably be handled
differently.)

> Therefore, if
> a component is to be designed for the widest possible audience,
> that component must manage its own storage.  If components are
> used which meet the highest possible standards, then we don't
> have to worry about whether our components will stop working
> when we do maintenance (which might introduce multitasking,
> real-time operation, etc.)
> on our application; using GC introduces
> such weaknesses, in addition to giving an *avoidable* run-time cost.

Perhaps GC does introduce problems with reuse in regard to multitasking
and real-time operation, perhaps not; I'm not sure.  I am anxious to
see how multitasking will be handled in Eiffel.  ISE has indicated that
they are working on incorporating it into the OOP framework.

The run-time cost of GC *may* be avoided in certain cases by using ADT
controlled memory management.  The point is that there *are* tradeoffs.
Not using GC "might introduce" reuse problems also.  What is the cost
associated with "avoiding" GC?

-- 
Gary Wright                                     ...!uunet!hsi!wright
Health Systems International                    wright@hsi.com
ted@nmsu.edu (Ted Dunning) (09/28/89)
In article <6589@hubcap.clemson.edu> billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) writes:

   From scc@cl.cam.ac.uk (Stephen Crawley):
   > Here are some problems where programmer-controlled storage
   > management would be very difficult or impossible.
   >
   > Interactive editing of cyclic graph data structures.  You have a
   > heterogeneous cyclic graph data structure and the editor must be
   > able to make arbitrary sequences of changes to the data structure.
   > How do you detect that a subgraph has become detached?

   The fact that a subgraph has become detached does not imply that
   it is no longer part of a graph.  Not all graphs are fully
   connected.  Therefore, it isn't possible to release the storage used
   until there is some sort of directive which indicates that a given
   set of nodes and/or arcs is to be destroyed.

good point.  of course, languages which have true garbage collection
can handle this problem, too, while bill wolfe seems to think he
couldn't handle it in his paradigm.

   > Dynamic loading & unloading of modules in a running program.
   > [...] the dynamic loader needs to know if the module is
   > currently being 'used' ... e.g. if some thread is executing a
   > procedure exported by the module or if some other part of the
   > application has salted away a reference into the module (e.g. a
   > procedure pointer) ...

   This is an operating-system problem, not an application problem.
   As such, the situation is not "applicable to a wide range of
   applications".  The solution is operating-system dependent; one
   approach might be to use idle time to scan the processes in the
   process queue to determine if there are any processes which were
   started while the old module was in effect which made use of it.

this won't work in most cases, since there will be at least one
ancestral process that started very early and runs essentially forever.

   Operating systems do a lot of things (pointer arithmetic, etc.)
   which application systems should not be doing; this falls into
   exactly that category.

but of course, this problem is trivial to handle in all the scheme
dialects where continuations are available as first class objects, and
where the garbage collector understands how to collect references from
inside continuations.  not only that, but the solution is portable.

-- 
ted@nmsu.edu
remember, when extensions and subsets are outlawed,
only outlaws will have extensions or subsets
ted@nmsu.edu (Ted Dunning) (09/28/89)
In article <8012@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:
   But GC is unlikely to reclaim that storage.  Yes, the graph no
   longer has pointers to the disconnected subgraph.  However, the
   subgraph may be circular (you said it was cyclic), so all the nodes
   of the subgraph may still have references to them.  Can current GC
   systems recognize such a loop and remove it in its entirety?  How
   complex can such a loop be and still be recognized?
a modern gc only worries about references from currently accessible
variables, and thus an unreferenced cycle can be arbitrarily complex
and still be reclaimed. reference counting systems have trouble with
loops and since they usually have only a few bits of reference count,
they also tend to saturate easily. for this reason, they will
occasionally stop and do a normal garbage collection.
--
ted@nmsu.edu
remember, when extensions and subsets are outlawed,
only outlaws will have extensions or subsets
dl@g.g.oswego.edu (Doug Lea) (09/28/89)
First a warning: I do quite a lot of C++ programming.  And a perhaps
slightly contentious definition: I take OO programming to be that which
is in part or whole concerned

              --              --
             |  construction    |
   with the  |  mutation        |  of objects
             |  inspection      |
             |  destruction     |
              --              --

(And, of course, OO design to be concerned with the specification,
definition, interrelationships, etc., of such objects...)

Only half-facetiously, one could make the vicious claim that GC removes
programmer control over 1/4 of the OO programming paradigm: In C++, at
least, the notion of destruction of an object may well require some
structured, active resource deallocation, and need not just consist of
passive reclamation of memory cells.  The classic example where such
control is useful is a Window object that needs to erase itself from
view upon destruction.  In a GC-based system, the mechanisms for
implementing such actions seem less than natural: either the programmer
would need to `logically' destroy the window and then let the collector
sometime later reclaim space, or perhaps the collector could be
designed to do all kinds of storage reclamation associated with the
object at some indeterminate time.

My points are just that:

* Structured destruction is a good idea.  (As is structured
  construction.)
* Garbage collection can get in the way of this.

And again, half-facetiously: GC is sometimes a byproduct of functional
programming methodologies that allow programmers to pay attention only
to `values', and not the objects required in order to arrive at them.

All that said, I think GC is a very useful tool!  Really!  It is a
serious contender if:

* Destruction consists only of releasing storage, and
* The lifetimes of objects are so unstructured/complex that manual
  reclamation is either slower or more error-prone than GC.

This kind of situation comes up a lot.  But often a redesign that
forces a more structured lifetime discipline supporting simpler
techniques is at least as attractive.
I take that as Wolfe's main point.  I think it's a good one.

How about a real example?  For the amusement of C++ fans following this
discussion, I'll append one below.  It's a fun variation of a classic
kind of situation where GC is commonly used.  There are Cells that may
be linked together to create computational pipelines.  While I set it
up to use GC, other choices are possible.  Here are some of the
not-terribly-profound issues that came to my mind while writing it as a
classroom example:

* The main() doesn't really exercise the collector.  One chain is
  created and destroyed during the run.

* If this were a one-shot closed design, culminating only in a cute but
  slow prime number sieve program, the best solution is probably to do
  nothing, letting all deletion occur at program termination, since
  there are no other useful aspects of destruction.

* If this were still self-contained, but a variation of the current
  main() were used as a callable function, then it would make sense to
  insert code to mark the base of the chain, use a stack-based
  allocator, and release it on destruction via modified constructor/
  destructor code.  You'd also want to insert code to ensure that at
  most one chain were active at a time.

* In fact, the *current* code using Sieve consists only of single
  chains of Cells.  If only such applications were supported,

  * One could delete space via a virtual no-op destructor in Cell, and
    one in Filter like

        ~Filter() { delete src; }

    to cause a chain of deletions to occur on destruction.

  * Except that the Cells do not know if their sources were dynamically
    allocated or not, so `delete src' can be an error if the source is
    a local.  Perhaps Cells could refuse to support `local' creation,
    and only exist dynamically.  Alternatively, some bookkeeping could
    be added inside each Cell to record whether it was dynamically
    allocated, and constructors and destructors would need to use it.
  The former is probably a better strategy, since the implementation is
  not set up to cleanly support by-value argument passing of Cells,
  etc., anyway, and there seems to be every reason not to do so.

* Digression: Actually, this is all a bit of a pain in C++.  In order
  to avoid clients having to say `new Whatever', and thereafter dealing
  with pointers, you'd need to set up a parallel set of classes, each
  of which served as pointers to the `real' objects of the
  corresponding classes, and then took care of the operator `new' stuff
  on creation, etc.  It's a case where the convention taken in Eiffel,
  Smalltalk, etc., that names are by default just bound to references
  to objects, not the objects themselves, would be more convenient.  In
  these kinds of situations, C++ programmers often settle for requiring
  clients to use pointers or references explicitly.  But even this has
  drawbacks, since, while you can force clients to only create objects
  via `new', the only means for doing this leaves the restriction as a
  mostly run-time, not compile-time, error.

* But it is very possible for someone to make two Filters point to the
  same source, or to create new subclasses with multiple sources or
  sinks.

  * The simple GC solution would continue to work in such cases.

  * In order to deal with this and still keep virtual destructors, some
    form of reference counting would be required.  Even this won't
    suffice if forms of circular paths that give reference counters
    problems were created, but such possibilities seem extremely
    unlikely, and defensible as a documented restriction, akin to the
    builtin limitation that only, say, 32767 references are allowed
    before an object is considered permanent.  The work involved in
    adapting this code to use reference counts is not great, but is
    greater than using GC.  The overhead for reference counting vs. GC
    would probably be close, depending on the quality of the
    implementations.
  In either case, storage management would probably amount to a tiny
  percentage of execution time.

* Except that it makes no sense in *this* code for anyone to hook up
  two Filters to the same source, so perhaps Cells should record
  whether they've been used as sources and refuse to comply if they are
  re-attached.  But this is reference-counting, just used in a
  different way.

* Except that maybe someone might want to make a subclass of Cells
  (e.g., random number generators) that would be useful to multiply
  tap.

* In either case, reference-counting might be more flexible, since it
  could be used both for allocation and to guard against some kinds of
  Cells being multiply tapped when this would be an error.

* What about programmers creating subclasses that display themselves in
  action on a screen, or record their actions in a log file?  In such
  cases the reference-counting version seems necessary, since
  destruction would require real-time screen updates or file
  manipulation.

So, perhaps, reference-counting would be more appropriate, since it may
better serve the goal of a robust, reusable, extensible design.  I'll
leave implementation as an exercise to the reader (in other words, I'm
too lazy to do it out right now!)

The example has no particular deep moral, except to reflect my feeling
that thinking about the full lifetimes of objects is good for you!  It
helps focus attention on various aspects of the semantics of new types
in ways you otherwise might not have thought much about.  And further,
that GC-oriented issues and reusability *do* interact, but that GC need
not always be chosen on such grounds.  I suppose it's worth a mention
that the issues based on extensibility via inheritance (i.e., design
reuse) don't much come up in Ada.

Perhaps others could offer other analyses of this or other examples.

------------------------------- sieve.cc

// Fun with the prime number sieve.
// Based loosely on R. Sethi, ``Programming Languages'', sec 6.7

// link to Boehm's comp.sources.unix C GC routines ...
extern "C" void  gc_init();
extern "C" void* gc_malloc(unsigned int);

#include <stream.h>

class GC        // ... but wrap gc routines as a C++ class
{
public:
    GC()                         { gc_init(); }
    void* alloc(unsigned int sz) { return gc_malloc(sz); }
};

class Cell      // an object with state and a way to update and report it
{
private:
    static GC gc;                // one collector for all Cells
protected:
    int val;                     // the value to output next
    virtual void update() {}     // the updater; default as no-op
public:
    Cell(int initial = 0) : val(initial) {}
    int next() { update(); return val; }    // do one step

    // set class allocation ops to use gc
    void* operator new(unsigned int sz) { return gc.alloc(sz); }
    void  operator delete(void*) {}
};

GC Cell::gc = GC();  // static class member initialization
                     // must be done outside class decl in C++

class Counter : public Cell      // output consecutive integers
{
protected:
    void update() { ++val; }
public:
    Counter(int initial = 0) : Cell(initial) {}
};

class Filter : public Cell       // transform input from another Cell
{
protected:
    Cell* src;                   // input source
    int   inp;                   // last input
    void get() { inp = src->next(); }
public:
    Filter(Cell* s, int initial = 0) : Cell(initial), src(s), inp(0) {}
};

class ModFilter : public Filter  // hold last input not divisible by divisor
{
private:
    int divisor;
    int divides() { return inp % divisor == 0; }
public:
    ModFilter(Cell* s, int d) : Filter(s), divisor(d) {}
    void update() { do get(); while (divides()); val = inp; }
};

class Sieve : public Filter  // serve as terminal node of a prime number sieve
{
public:
    Sieve() : Filter(new Counter(1)) {}
    void update() { get(); src = new ModFilter(src, val = inp); }
};

main(int argc, char** argv)      // exercise everything a little
{
    if (argc < 2) {
        cerr << "error: enter desired number of primes as program argument\n";
        exit(1);
    }
    int n = abs(atoi(argv[1]));
    Sieve s;
    for (int i = 0; i < n; ++i)
        cout << s.next() << "\n";
}

-- 
Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126
(315)341-2367
email: dl@oswego.edu or dl%oswego.edu@nisc.nyser.net
UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl
scc@cl.cam.ac.uk (Stephen Crawley) (09/28/89)
Randell Jesup writes:
> But GC is unlikely to reclaim that storage.  Yes, the graph no longer
> has pointers to the disconnected subgraph.  However, the subgraph may
> be circular (you said it was cyclic), so all the nodes of the subgraph
> may still have references to them.  Can current GC systems recognize
> such a loop and remove it in its entirety?

Of course.  Only brain-dead garbage collectors that rely exclusively on
reference counts can't get rid of cycles.  (Reference counting garbage
collectors are bad news anyhow, since they entail the overhead of a ref
count inc/dec on every assignment.)

> How complex can such a loop be and still be recognized?

Arbitrarily complex.

-- Steve
wright@hsi.UUCP (Gary Wright) (09/28/89)
In article <8309@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:
>From article <599@hsi86.hsi.UUCP>, by wright@hsi.UUCP (Gary Wright):
>> Well, I have *never* encountered a situation that absolutely demanded
>> using a high level language.  That is why I still do all my
>> programming in assembly. :-)
>
>Sorry, but I'm too tired of this assembler-hll analogy for a smiley to
>save you.  Grumble.  Ggrrrrrrrrrrrrmpf.  Okay, okay, here: :-)
>(Sort of.)

But you missed the point.  It doesn't matter what specific analogy is
used; I was trying to point out the logical flaw in the argument.  The
point is that just because you have never used a technique, or have
never found a situation for which your current technique was not
sufficient, doesn't mean that another technique couldn't be used, or
might not even make your job easier.

[comments regarding powerful macro assemblers omitted. GRW]

> Those macro packages were the evolutionary antecedents of HLLs.

So you were using new techniques to make your job easier, at *perhaps*
a loss in generality and raw performance.  That is the analogy I was
trying to make.

-- 
Gary Wright                                     ...!uunet!hsi!wright
Health Systems International                    wright@hsi.com
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/29/89)
From scc@cl.cam.ac.uk (Stephen Crawley):
>> ... With a managed system, the user is given
>> an opportunity to strategically throw away data structures, and
>> thereby free up some space that the garbage collector could not
>> possibly take because it is still "in use".
>
> Bogus argument!  The case you are talking about is that where the
> programmer has a pointer to a data structure in a variable that is
> still in scope, but is not conceptually in use.  You suggest that in
> the case of non-GC'ed storage the programmer could free the space
> early by calling the deallocator.  Well, in the GC'ed case the
> programmer can arrange that the GC can free the data structure by
> assigning NIL to the variable.

No, I was referring to space that is still in use, but which has
marginal value and could be chucked if necessary.

Bill Wolfe, wtwolfe@hubcap.clemson.edu
madany@m.cs.uiuc.edu (09/29/89)
/* Written 8:29 pm Sep 26, 1989 by jesup@cbmvax.UUCP in m.cs.uiuc.edu:comp.sw.components */
>> But GC is unlikely to reclaim that storage.  Yes, the graph no longer
>> has pointers to the disconnected subgraph.  However, the subgraph may
>> be circular (you said it was cyclic), so all the nodes of the
>> subgraph may still have references to them.  Can current GC systems
>> recognize such a loop and remove it in its entirety?  How complex can
>> such a loop be and still be recognized?
/* End of text from m.cs.uiuc.edu:comp.sw.components */

GC does reclaim the storage.  That's one of the big reasons why people
advocate it.  A good garbage collector will reclaim loops of ANY
complexity IFF they are garbage.  I don't have the best reference
handy, but page 682 of "Smalltalk-80: The Language and its
Implementation" may give you a hint as to how such GC works.
jans@tekgvs.LABS.TEK.COM (Jan Steinman) (09/30/89)
<..A good garbage collector will reclaim loops of ANY complexity IF
they are garbage.  I don't have the best reference handy, but page 682
of "Smalltalk-80 The Language and its Implementation" may give you a
hint as to how such GC works.>

Smalltalk-80, as described in the "Blue Book", uses a combination of
reference counting and mark-sweep.  Few, if any, modern implementations
use such an inefficient scheme.  "Generation scavenging" seems to be
the latest and greatest, and it also reclaims arbitrarily long cyclic
garbage.  It has some drawbacks if loops contain garbage of widely
differing ages, but evidence suggests (at least in Smalltalk) that
cyclic garbage tends to be contemporary.

* Caudill, P., "A Third Generation Smalltalk-80 Implementation",
  OOPSLA '86 Proceedings, Oct 1986.
* Deutsch, L.P., "An Efficient Incremental Automatic Garbage
  Collector", CACM, Sep 1976.
* Lieberman, H., "A Real-Time Garbage Collector Based on the Lifetimes
  of Objects", CACM, Jun 1983.
* Ungar, D., "Generation Scavenging: A Non-disruptive High Performance
  Storage Reclamation Algorithm", SIGSOFT/SIGPLAN Proceedings, Apr 1984.

Jan Steinman - N7JDB
Electronic Systems Laboratory
Box 500, MS 50-370, Beaverton, OR 97077
(w)503/627-5881  (h)503/657-7703