g-rh@cca.CCA.COM (Richard Harter) (03/31/88)
In article <628@ra.rice.edu> boehm@titan.rice.edu (Hans Boehm) writes:
> My experience is that any attempt at manipulation of interesting linked
>structures in a language that doesn't support real automatic storage
>management will either fail, or waste large amounts of debugging time.
>(My experience includes a (working) 40,000 line compiler, written in C, that
>manipulates a reference counted syntax "tree", or more accurately, a
>reference counted syntax graph.)  Normally, it is extremely difficult
>to track down bugs created by premature deallocation.  When such bugs are
>finally removed, the resulting programs normally include a substantial
>number of storage leaks.
> Some recent experiments by Mark Weiser and myself with retrofitting a
>garbage collector to existing C programs verify the latter point.
>(The garbage collector should never free anything, since that was the
>programmer's responsibility.  It does.  In other people's code.
>Our paper on this should appear in Software P&E shortly.)
>Mike Caplinger reported similar results for another application at the last
>USENIX conference, I believe.  We have resurrected C code with storage
>management bugs by linking it against a garbage collector (which in the case
>of C doesn't always work either, but it has a better track record than manual
>storage management).

There is probably something I am missing, but I don't quite see how you
can implement garbage collection in C without collateral assumptions --
how is the garbage collector supposed to know that a block is free?

Your main point is certainly true -- manual allocation and deallocation,
with the programmer being responsible for ensuring that it all comes out
right, really doesn't work very well.  This was a critical issue for us:
our software has to run on systems which do not have pseudo-infinite
memory, and it sometimes has to run for very long periods.  We can't
afford memory leaks (or premature deallocation).
What we did was to write our own storage allocation package.  Salient
features:

 - The allocation routine takes two arguments, a size and an ID.  The
   latter is a unique integer specifying that particular call.  (We
   manage the ID list manually.)

 - The allocator creates a node for each request block.  In this node
   are sundry links, the ID, and (in effect) a date stamp.  There is
   also a hash table that contains the address of every allocated block.

 - All allocator control information is stored in a separate space from
   allocated memory itself.  (This eliminates a large class of mystery
   bugs associated with overwriting allocator control data.)

 - The deallocation routine takes one argument, the address of the block
   being returned.  It validates the address as being an allocated
   address.  This eliminates another class of bugs caused by passing an
   improper or stale address to the deallocation routine.

 - There is a facility for printing out a complete map of allocated
   memory, including call IDs and date stamps.  We use this, from time
   to time, to check the code for storage leaks.

This scheme is not as good as automatic garbage collection, but we have
found it to be satisfactory.

--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS Inc.
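The package described above might be sketched roughly as follows.  This
is a minimal illustration of the idea only, not SMDS's actual code; all
names (`id_alloc`, `id_free`, `id_map`) and the hashing details are
hypothetical:

```c
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

/* One tracking node per allocated block.  Control data lives in its own
   nodes, never adjacent to user memory, so overwriting a block cannot
   corrupt the allocator's bookkeeping. */
struct alloc_node {
    void *addr;              /* address handed to the caller */
    size_t size;
    int id;                  /* unique integer naming the call site */
    long stamp;              /* "date stamp": allocation sequence number */
    struct alloc_node *next; /* hash-chain link */
};

#define NBUCKETS 1024
static struct alloc_node *table[NBUCKETS];
static long seq = 0;

static unsigned hash(void *p) {
    return (unsigned)(((unsigned long)p >> 3) % NBUCKETS);
}

/* Allocate `size` bytes, tagged with the caller's ID. */
void *id_alloc(size_t size, int id) {
    struct alloc_node *n = malloc(sizeof *n);
    void *block = malloc(size);
    if (n == NULL || block == NULL) { free(n); free(block); return NULL; }
    n->addr = block; n->size = size; n->id = id; n->stamp = ++seq;
    n->next = table[hash(block)];
    table[hash(block)] = n;
    return block;
}

/* Free a block; returns 0 on success, -1 if the address was never
   allocated or was already freed (the "improper or stale address"
   class of bugs). */
int id_free(void *block) {
    struct alloc_node **pp;
    for (pp = &table[hash(block)]; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->addr == block) {
            struct alloc_node *n = *pp;
            *pp = n->next;
            free(n->addr);
            free(n);
            return 0;
        }
    }
    return -1;
}

/* Print a map of all live blocks -- IDs and date stamps point straight
   at the call site responsible for a leak. */
void id_map(void) {
    for (int i = 0; i < NBUCKETS; i++)
        for (struct alloc_node *n = table[i]; n != NULL; n = n->next)
            printf("id %d, %lu bytes, stamp %ld\n",
                   n->id, (unsigned long)n->size, n->stamp);
}
```

The validation in `id_free` is what turns a silent double-free or wild
free into an immediately detectable error.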
boehm@titan.rice.edu (Hans Boehm) (04/01/88)
In reference to g-rh@cca.CCA.COM (Richard Harter)'s question:
> There is probably something I am missing, but I don't quite see
>how you can implement garbage collection in C without collateral assumptions
>-- how is the garbage collector supposed to know that a block is free?

This is usually, but not always, possible.  The idea is to use a
traditional non-compacting mark-sweep collector.  On a UN*X system, we
start by scanning the registers, stack, data, and (statically allocated)
bss segments for any addresses of valid allocated objects.  We
subsequently scan allocated objects we find in the same way.  For a
typical C program, all reachable objects can be found in this manner.
(Storing exclusive or'ed pointer values, tagged pointers, and similar
programming tricks, as well as some rarely used (for C) compiler
optimization techniques, may break this scheme.)

In theory, we may end up marking a few unreachable objects as reachable
as well, since integers may be misinterpreted as pointers.  The
collector can be designed so that the only result of this is excess
storage retention.  (This does require that we don't compact.)  In my
experience, I have never seen any indication of such unnecessary
retention in practice.  There is an argument to be made that no garbage
collector is completely immune from this phenomenon.

The fact that the collector has to perform a fairly complicated (but
still O(1)) check for pointer validity does slow it down a little.  On a
Sun 3/260, with a 2 Meg heap, half of which is in use, we normally see
collection times of about 3 seconds.  (This assumes longword aligned
pointers and small data and static bss areas.)  All of this is largely
irrelevant if the collector is used as a debugging tool to detect
storage leaks.

More details can be found in Mark Weiser's and my upcoming Software P&E
article entitled "Garbage Collection in an Uncooperative Environment".

Hans-J. Boehm
boehm@rice.edu
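The conservative marking idea can be illustrated with a deliberately
simplified model: a fixed arena of equal-sized objects, with any
word-aligned value that falls inside the arena treated as a pointer.
This is a toy, not the actual Boehm-Weiser collector (whose validity
check works against real block headers), and all names are made up:

```c
#include <stdint.h>
#include <assert.h>

/* Toy "heap": NOBJ objects of OBJ_WORDS words each, one mark bit per
   object. */
#define NOBJ 64
#define OBJ_WORDS 4
static uintptr_t heap[NOBJ * OBJ_WORDS];
static unsigned char marked[NOBJ];

/* If w looks like a word-aligned address inside the heap, return the
   index of the object it falls in; otherwise -1.  An integer that
   happens to look like a heap address is conservatively treated as a
   pointer -- the "excess retention" case described above. */
static int find_object(uintptr_t w) {
    uintptr_t lo = (uintptr_t)heap;
    uintptr_t hi = (uintptr_t)(heap + NOBJ * OBJ_WORDS);
    if (w < lo || w >= hi || (w - lo) % sizeof(uintptr_t) != 0)
        return -1;
    return (int)((w - lo) / (OBJ_WORDS * sizeof(uintptr_t)));
}

/* Mark every object referenced by a word in [from, to), then scan the
   newly marked objects themselves the same way -- the "scan allocated
   objects we find" step.  Real roots would be registers, stack, data,
   and bss. */
void mark_region(const uintptr_t *from, const uintptr_t *to) {
    for (const uintptr_t *p = from; p < to; p++) {
        int i = find_object(*p);
        if (i >= 0 && !marked[i]) {
            marked[i] = 1;
            mark_region(&heap[i * OBJ_WORDS], &heap[(i + 1) * OBJ_WORDS]);
        }
    }
}
```

A sweep phase would then free every unmarked object; because nothing is
ever moved, a misidentified "pointer" costs only retained storage.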
g-rh@cca.CCA.COM (Richard Harter) (04/01/88)
In article <632@ra.rice.edu> boehm@titan.rice.edu (Hans Boehm) writes:
>> There is probably something I am missing, but I don't quite see
>>how you can implement garbage collection in C without collateral assumptions
>>-- how is the garbage collector supposed to know that a block is free?
> This is usually, but not always, possible.  The idea is to use a traditional
>non-compacting mark-sweep collector.  On a UN*X system, we start by scanning
>the registers, stack, data, and (statically allocated) bss segments for
>any addresses of valid allocated objects.  We subsequently scan allocated
>objects we find in the same way.  For a typical C program, all reachable
>objects can be found in this manner.  (Storing exclusive or'ed pointer
>values, tagged pointers, and similar programming tricks, as well as some
>rarely used (for C) compiler optimization techniques may break this scheme.)

I see.  My block was in not thinking of the entire address space of the
program as accessible.  For our purposes it is not (we live in
portability land and are not allowed to do things like that :-)).  That
is one reason we did our own allocator (in C) -- it allows us to track
down memory problems portably.

I still don't see how a garbage collector helps with premature
deallocation, unless you scrap deallocation entirely and rely entirely
on garbage collection.

The problem with garbage collectors in the old days was that they did
garbage collection when all the available space had been allocated.
This works well enough until the actual amount of space allocated starts
to approach the amount of available space, and then the garbage
collector starts thrashing.  Is this a problem?  If not, how do you deal
with it?

--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS Inc.
boehm@titan.rice.edu (Hans Boehm) (04/02/88)
Richard Harter writes:
>I still don't see how a garbage collector helps with premature deallocation,
>unless you scrap deallocation entirely, and rely entirely on garbage
>collection.
>The problem with garbage collectors in the old days was that they did
>garbage collection when the available space was allocated.  This works
>well enough until the actual amount of space allocated starts to approach
>the amount of available space, and then the garbage collector starts
>thrashing.  Is this a problem?  If not, how do you deal with it?

Our current version of the collector doesn't help you with premature
deallocation.  Further instrumenting it might help.  For example, you
could use the marking algorithm to determine whether "free"d objects
really were inaccessible.  (This requires a coding style in which unused
references are explicitly cleared.  Replacing "free" by a macro that
clears its argument takes care of a lot of that and, by itself,
occasionally helps track down bugs.)  I don't have much experience with
this though.  In non-real-time applications, we tend to use explicit
deallocation where it's easy to do, and let the collector take care of
the error-prone cases.

You are right in that the garbage collector code is not completely
portable.  It should really be viewed as part of the run-time library,
which is rarely completely portable anyway.  Actually porting it isn't
terribly hard.  There is only about a screenful of machine dependent
code in the collector.  The longest part of that deals with marking from
the machine registers.

Frequent collections in small address spaces can become a problem.  We
operate in a virtual memory environment, so we deal with it by expanding
the heap size whenever a collection fails to return enough memory.  This
requires allocating a bit of extra memory.  For debugging, this hardly
matters.  (Memory is supposedly cheap.)  In a production environment,
especially without virtual memory, it could be a problem.
On the other hand, for some existing code, this can be much less significant than memory otherwise lost to leaks. Hans-J. Boehm (boehm@rice.edu)
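The free-clearing macro mentioned in the post above is easy to sketch
(the name `FREE` is made up for the example):

```c
#include <stdlib.h>
#include <assert.h>

/* Route every deallocation through a macro that both frees the block
   and nulls the variable holding the last reference.  A collector's
   marker can then check whether a "free"d object was in fact still
   reachable from elsewhere; even without a collector, the nulled
   pointer turns a use-after-free into an immediate, reproducible crash
   instead of silent corruption. */
#define FREE(p) do { free(p); (p) = NULL; } while (0)
```

Note the `do { ... } while (0)` wrapper, which lets the macro behave
like a single statement after an `if` with no braces.  It only clears
the one variable named in the call; other copies of the pointer are the
coding-style obligation the post refers to.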
nevin1@ihlpf.ATT.COM (00704a-Liber) (04/05/88)
[followups to comp.lang.misc]

In article <26358@cca.CCA.COM> g-rh@CCA.CCA.COM.UUCP (Richard Harter) writes:
> There is probably something I am missing, but I don't quite see
>how you can implement garbage collection in C without collateral assumptions
>-- how is the garbage collector supposed to know that a block is free?

The only way to do this is to allocate a giant heap with malloc() and do
*all* the memory management on your own.  It is no different than
implementing, in C or C++, a language that does automatic garbage
collection.

The problem I have found with languages that do automatic garbage
collection is that there is no way for me to modify the implementation
of the class (or the type, for those of you unfamiliar with C++ lingo
:-)) to make it better suit my needs.  For applications where the
programmer cost is greater than the run-time cost (such as prototypes,
personal tools, one-shot programs), I tend to use languages with
built-in memory management.  For other applications, however, I would
rather use a language like C++ which allows me to change the
implementation of a class without affecting the rest of my code.

--
 _  __   NEVIN J. LIBER  ..!ihnp4!ihlpf!nevin1  (312) 510-6194
' )  )   "The secret compartment of my ring I fill
 /  / _ , __o  ____   with an Underdog super-energy pill."
/  (_</_\/ <__/ / <_  These are solely MY opinions, not AT&T's, blah blah blah
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/12/88)
From article <124000026@inmet>, by ron@inmet.UUCP:
> Garbage collection certainly does not come for free, but it is extremely
> useful.  It frees the programmer from the need to repeatedly write
> sophisticated ADT deallocate routines and to deal with the potentially
> massive book-keeping requirements for determining whether an object is in
> fact garbage.  In general, lack of GC forces a programmer to invest a lot of
> effort addressing issues that are not fundamentally related to the problem
> domain.  My experience has also shown that problems of "slow heap leakage"
> are among the hardest errors to correct.

"Repeatedly" write sophisticated ADT deallocate routines??  The basic
idea is to write an ADT once and use it forever (i.e., until the next
language modification :-) ); where does "repeatedly" come in?

Dealing with the details of allocation requires as much effort as
dealing with deallocation.  Perhaps you would say that any maintenance
of the structure used to represent your ADT is not fundamentally related
to the problem domain.  If so, I will have to disagree.  ADTs, and
computer programs in general, are characterized by both time complexity
and space complexity.  Making the tradeoffs between these forms of
complexity is one of the most fundamental questions a programmer must
deal with.  This is the topic of much, if not most, of the literature
regarding ADTs.

Also, I disagree with the claim that a lot of effort is involved; in my
experience, DESTROY routines are among the easiest to code.  Usually,
the ADT is defined recursively, lending itself to a very easy recursive
destruction procedure; after that, just destroy any sub-ADTs in your
descriptor, and from there it only takes one line of code to blow away
the descriptor.
About the hardest part of the whole process is the tedium of looking through to see what fields are user-defined sub-ADTs, since they have to be explicitly destroyed due to Ada's lack of integration between UNCHECKED_DEALLOCATION and user-defined DESTROY procedures. Bill Wolfe wtwolfe@hubcap.clemson.edu
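The recursive DESTROY style described above can be illustrated in C for
a simple binary tree (names are hypothetical; `live_nodes` is
instrumentation added purely for the example).  As the replies in this
thread point out, this is only safe when the structure is a true tree:
if nodes were shared, the recursion would free them twice:

```c
#include <stdlib.h>
#include <assert.h>

struct node {
    int key;
    struct node *left, *right;
};

static long live_nodes = 0;   /* instrumentation for the example */

struct node *make_node(int key, struct node *l, struct node *r) {
    struct node *n = malloc(sizeof *n);
    n->key = key;
    n->left = l;
    n->right = r;
    live_nodes++;
    return n;
}

/* The recursive destruction procedure: destroy the sub-ADTs first,
   then one line of code blows away the descriptor itself. */
void destroy(struct node *n) {
    if (n == NULL)
        return;
    destroy(n->left);
    destroy(n->right);
    free(n);
    live_nodes--;
}
```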
gateley@m2.csc.ti.com (John Gateley) (12/12/88)
In article <3832@hubcap.UUCP> wtwolfe@hubcap.clemson.edu writes:
>From article <124000026@inmet>, by ron@inmet.UUCP:
>> Garbage collection certainly does not come for free, but it is extremely
>> useful.  It frees the programmer from the need to repeatedly write
>> sophisticated ADT deallocate routines and to deal with the potentially
>> [...]
> "Repeatedly" write sophisticated ADT deallocate routines??  [...]
> where does "repeatedly" come in?

The "repeatedly" comes in because you write more than one ADT, each
requiring one or more deallocation routines.

> Dealing with the details of allocation requires as much effort as
> dealing with deallocation.  [...]

This is not true.  When you allocate an object, all you need is "free"
memory.  The allocator does not have to worry about whether or not the
object is in use, etc.  Deallocation, when done correctly, requires some
sort of checking to make sure no currently in-use object refers to the
object being deallocated.  This is a much more complicated problem.

> [...]
> Also, I disagree with the claim that a lot of effort is involved;
> in my experience, DESTROY routines are among the easiest to code.
> Usually, the ADT is defined recursively, lending itself to a very
> easy recursive destruction procedure; [...]

This method is fine, as long as no structures are shared.  But when you
share structure, the problem is more complex.  A sub-object can be used
in more than one object, so deleting an object which contains the
sub-object may or may not require deleting the sub-object.  One answer
is "reference counting": adding a field to each object which says how
many people refer to it.  Automating this gives reference count garbage
collection, which is slower than the GC techniques in current use.

So if your programming style never uses shared structure, then you don't
need GC.  If it does, then GC is a valuable time-saving tool.

> Bill Wolfe
> wtwolfe@hubcap.clemson.edu

John Gateley
gateley@tilde.csc.ti.com
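The manual reference-counting scheme Gateley mentions might look like
this in C (a minimal sketch; the `rc_` names are invented for the
example).  Every object carries a count of the references to it, and
only the release that drops the count to zero actually frees:

```c
#include <stdlib.h>
#include <assert.h>

struct rc_obj {
    int refs;    /* how many people refer to this object */
    int value;   /* stand-in for the object's real payload */
};

/* Create an object with one reference (the caller's). */
struct rc_obj *rc_new(int value) {
    struct rc_obj *o = malloc(sizeof *o);
    o->refs = 1;
    o->value = value;
    return o;
}

/* Record one more reference, e.g. when a second ADT links the object. */
struct rc_obj *rc_retain(struct rc_obj *o) {
    o->refs++;
    return o;
}

/* Drop one reference; free only when no references remain.
   Returns 1 if the object was actually freed, 0 otherwise. */
int rc_release(struct rc_obj *o) {
    if (--o->refs == 0) {
        free(o);
        return 1;
    }
    return 0;
}
```

Keeping the counts right at every assignment is exactly the bookkeeping
burden that automating the scheme (reference-count GC) removes, at the
runtime cost Gateley notes; counts also cannot reclaim cyclic structures.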
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/13/88)
From article <65475@ti-csl.CSNET>, by gateley@m2.csc.ti.com (John Gateley):
>> Also, I disagree with the claim that a lot of effort is involved;
>> in my experience, DESTROY routines are among the easiest to code.
>> Usually, the ADT is defined recursively, lending itself to a very
>> easy recursive destruction procedure; [...]
>
> This method is fine, as long as no structures are shared. [...]
> [...] if your programming style never uses shared structure, then you
> don't need GC.

Precisely.

Structural sharing is nothing more than a euphemism for hacking.  It is
the spatial equivalent of what hackers enjoy doing with time in the name
of efficiency, in their unmitigated zeal to violate every form of
abstraction, to throw all traces of readability and reliability to the
winds.

If GC were prohibited, it would infuriate all those who enjoy hacking
with space as opposed to time.  Good.  Let them use C.  Ada is the
language of those who appreciate that "abstraction is the fundamental
tool with which complexity can be effectively managed", and recognize
that to violate an abstraction (rather than design another which is more
appropriate to their needs) is to be penny wise but pound foolish.

Folks, space management in Ada is NOT that difficult.  I had problems
with space management in Pascal, but Ada-based ADTs have basically made
it into a non-issue.  The real challenges which remain are problems
which are inherent in the language itself, such as getting local ADTs to
be properly destroyed as an automatic consequence of a block exit.  I
agree that existing tools for detecting space-wasting ADTs are
inadequate, but let's call for better debugging tools rather than going
about advocating the practice of spewing garbage.

Bill Wolfe
wtwolfe@hubcap.clemson.edu
klaiber@june.cs.washington.edu (Alexander Klaiber) (12/13/88)
In article <3848@hubcap.UUCP> wtwolfe@hubcap.clemson.edu writes:
>From article <65475@ti-csl.CSNET>, by gateley@m2.csc.ti.com (John Gateley):
>> This method is fine, as long as no structures are shared. [...]
>> [...] if your programming style never uses shared structure, then you
>> don't need GC.
> Precisely.
> Structural sharing is nothing more than a euphemism for hacking.
> It is the spatial equivalent of what hackers enjoy doing with time
> in the name of efficiency, in their unmitigated zeal to violate every
> form of abstraction, to throw all traces of readability and reliability
> to the winds.

Apart from the amount of cheap flames you've thrown in, do you think you
REALLY have thought this out?  Sure, structure-sharing *is* sometimes
used with the goal of conserving memory, but there are cases where it is
vital, especially when you're doing object-oriented-like programming.

Assume I maintain mailing lists of people, and I represent the lists by
pointers to objects representing one person each; i.e.

    type person is (some adt)
    type mailinglist is new list(person);

Now if I have more than one mailing list, obviously I have
structure-sharing: a given person may appear on more than one list.  And
it is *NOT* very wise to create one copy of each person for every
mailing list that person appears on: people can change their addresses,
and by having just *one* instance of the person around, there is only
*one* place where it needs to be changed -- these changes will be
completely transparent to the mailing lists or, for that matter, *FOR
ANY OTHER REFERENCES TO THE PERSON* that I might later add to the
system!
In a system such as this, I will need *some* form of keeping track of
just how many references to a given object (a person) exist, so I know
when I can safely deallocate it.  Now, one method is to have the
programmer take care of it, but that might require making the
(mailinglist) and (person) know about each other, thus creating
additional dependencies and limiting further extensibility.  The other
method is to supply GC.

Now you *can* do without structure sharing by creating "deep copies" of
each reference to any given object --- but you will have to live with
the "updating problem" which is well known in databases where multiple
copies of objects are kept.  The bottom line is that structure-sharing
can be a valuable abstraction tool, and one you apparently are not aware
of.

> If GC were prohibited, it would infuriate all those who enjoy hacking
> with space as opposed to time.  Good.  Let them use C.  Ada is the

Why do you want to explicitly *prohibit* it?

> Folks, space management in Ada is NOT that difficult.

Depends on what kinds of applications you are writing.  Note that I do
not ask that GC be *required* (obviously that would be rather foolish in
real-time applications), but it is sure nice to have it, since it can
greatly help to reduce the complexity of your programs by eliminating
all these storage-management chores and allowing you to concentrate on
the real problems.  It sure eliminates a *lot* of possibilities for
errors and actually makes programs much more reliable and readable.
Maybe *you* are throwing readability and reliability to the wind...

Alexander Klaiber
klaiber@june.cs.washington.edu

P.S.: For complex applications such as these, I tend to program in
Smalltalk rather than in Ada, so I might be somewhat biased.  But I
*have* done o-o programming in C++, and believe me, storage management
can be *nasty*!
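Klaiber's mailing-list argument can be made concrete in a few lines of C
(names hypothetical, and the "lists" reduced to one-element arrays to
keep the sketch minimal): two lists hold pointers to the *same* person
record, so one update is seen through every reference, with no
copy-synchronization problem:

```c
#include <string.h>
#include <assert.h>

struct person {
    char name[32];
    char address[64];
};

/* Update the single shared record; every list that points to it sees
   the change with no further work -- the benefit of sharing.  The cost
   is that nobody can free the record until *all* lists have dropped it,
   which is where reference counts or GC come in. */
void set_address(struct person *p, const char *addr) {
    strncpy(p->address, addr, sizeof p->address - 1);
    p->address[sizeof p->address - 1] = '\0';
}
```

The deep-copy alternative would store a separate `struct person` in each
list and re-create exactly the database "updating problem" the post
describes.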
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/14/88)
From article <6702@june.cs.washington.edu>, by klaiber@june.cs.washington.edu
(Alexander Klaiber):
> Assume I maintain mailing-lists of people and I represent the lists by
> pointers to objects representing one person each; i.e.
>
>     type person is (some adt)
>     type mailinglist is new list(person);
>
> Now if I have more than one mailing-list, obviously I have
> structure-sharing: a given person may appear on more than one list.

No, this isn't structural sharing, because you are explicitly
manipulating a list of pointers.  Structural sharing occurs when you
have two objects, A and B, and making some modification to B causes a
modification to A as well, because they share a portion of their
structure.  The structure of a pointer is simply the address field, and
the address fields are not being shared.  Now if we had two ADTs, both
implemented by *hidden* pointers, and assignment failed to result in a
deep copy, then THAT would be structural sharing.

In this particular instance, it would be best to store the key by which
a person can be identified (social security number, for example) in the
list, and then use the key to retrieve the current address from the
Person database.  Thus, a mailing list would be a list of social
security numbers.

Bill Wolfe
wtwolfe@hubcap.clemson.edu
sommar@enea.se (Erland Sommarskog) (12/14/88)
Bill Wolfe (wtwolfe@hubcap.clemson.edu) refuses to see the point of
garbage collection:
> "Repeatedly" write sophisticated ADT deallocate routines??  The basic
> idea is to write an ADT once and use it forever (i.e., until the next
> language modification :-) ); where does "repeatedly" come in?

But you have to write the same boring code for every ADT you implement!
That was what ron@inmet was talking about.

> Also, I disagree with the claim that a lot of effort is involved;
> in my experience, DESTROY routines are among the easiest to code.
> Usually, the ADT is defined recursively, lending itself to a very
> easy recursive destruction procedure; after that, just destroy
>...

If only it were as easy as that.  Let's say that you write a package for
doubly-linked lists.  You provide a procedure for taking an element out
of a list.  Should you free the memory allocated?  You don't know,
because the caller may or may not still keep a reference to the element.
He may be totally uninterested in it, but he may also want to insert it
in another list.  So you end up providing another procedure for
deallocating the element, which the user must remember to call, unless
he wants his memory resources to run low.

And the same goes for every ADT you define: you must provide a
deallocation routine that the user must remember to call.  What is
worse, the user may not notice if he omits the call, because the current
implementation of the ADT is a static type and the deallocation is a
no-op.  Then, when he is done with his tests, you change the
implementation, and his program blows the memory.

Although you refuse to see it, garbage collection saves the programming
community from a lot of headache, and true object-oriented programming
would be impossible without it.

--
Erland Sommarskog
ENEA Data, Stockholm
sommar@enea.se
"Frequently, unexpected errors are entirely unpredictable" - Digital Equipment
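Sommarskog's dilemma is easy to make concrete in C (names hypothetical):
because "remove" cannot know whether the caller still wants the element,
the cleanest it can do is unlink the node and hand it back, leaving the
caller with exactly the extra deallocation obligation the post
complains about:

```c
#include <stdlib.h>
#include <assert.h>

struct elem {
    int value;
    struct elem *prev, *next;
};

/* Unlink e from whatever list it is in and return it.  Ownership moves
   to the caller, who may re-insert it elsewhere -- or must remember to
   free it.  The package cannot decide which. */
struct elem *list_remove(struct elem *e) {
    if (e->prev != NULL) e->prev->next = e->next;
    if (e->next != NULL) e->next->prev = e->prev;
    e->prev = e->next = NULL;
    return e;
}
```

With GC, `list_remove` could simply drop the links and the element would
live exactly as long as some reference to it survived.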
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/14/88)
From article <6702@june.cs.washington.edu>, by klaiber@june.cs.washington.edu
(Alexander Klaiber):
> Why do you want to explicitly *prohibit* it [garbage collection]?

Basically, I charge garbage collection with the same crime that the GOTO
was charged with: its sole function is to facilitate UNDISCIPLINED
programming.  From Tremblay and Sorenson, 1985:

    A common thought among proponents of the GOTO is: "I just might
    need it; something might come up".  The answer to this appears to
    be: "Nothing ever does".

If this charge can be successfully prosecuted, then garbage collection
will face the same penalty: the total elimination of the facility, in
favor of a more disciplined environment.

Bill Wolfe
wtwolfe@hubcap.clemson.edu
klaiber@june.cs.washington.edu (Alexander Klaiber) (12/14/88)
In article <8812131536.AA08238@galaxy.compass.com> worley@compass.UUCP
(Dale Worley) writes:
>I will add that garbage collection is one of the greatest aids to
>readability and reliability, because it takes a complicated and
>error-prone part of programming (reclaiming storage) and eliminates it
>completely.  It probably shouldn't be required in Ada, but as an aid
>in programmability, it's great.

Thank you very much, exactly my point.  At least I'm not alone :-)

==========================================================================

In article <3861@hubcap.UUCP> wtwolfe@hubcap.clemson.edu writes:
> No, this isn't structural sharing, because you are explicitly manipulating

Call it what you want; the fact is that multiple references exist to one
object, thus making explicit (i.e. by the programmer) storage management
a nightmare and an unnecessary chore.  GC deals with this situation
easily.

> In this particular instance, it would be best to store the key by which
> a person could be identified (social security number, for example) in
> the list, and then use the key to retrieve the current address from
> the Person database.

That is, if you are willing to pay the cost of the additional database
lookup (usually an O(log n) operation), as well as the additional
complexity introduced in your program.  I didn't say it couldn't be done
without (what I call) structure sharing; I just wanted to point out that
sharing would be a very intuitive approach and would probably lead to
very readable code.  The method of using key searches is, I believe, an
unnecessary hack to avoid multiple references.  Why bother to include a
full B-tree package or such when we can do without?

Clearly, there is a tradeoff between readability and efficiency.  In my
version, I reduce efficiency by requiring garbage collection (although
there exist rather powerful and fast GC algorithms).
Your proposal would definitely increase the complexity of the program
and, due to the overhead of a DB lookup, might actually run slower than
the version with GC --- depending on circumstances such as frequency of
lookups in the database and size of the database (your program), versus
time required for GC and amount of garbage produced (my program).

In my opinion, people tend to put too much emphasis on plain efficiency
of their programs and too little on issues such as readability,
reliability and extensibility --- and I contend that (depending on the
application) garbage collection *CAN* help to improve all of these.
Hence, it is *wrong* to flatly ignore GC; it sure has its place and,
especially in o-o systems, should be available as an option to the
programmer.  Now whether it should be required in *Ada* is a question of
just where Ada 9X will go, and not really my concern.

Alexander Klaiber
klaiber@june.cs.washington.edu
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/14/88)
From article <6713@june.cs.washington.edu>, by klaiber@june.cs.washington.edu
(Alexander Klaiber):
> In article <8812131536.AA08238@galaxy.compass.com> worley@compass.UUCP
> (Dale Worley) writes:
>>I will add that garbage collection is one of the greatest aids to
>>readability and reliability, because it takes a complicated and
>>error-prone part of programming (reclaiming storage) and eliminates it
>>completely.

I don't see storage reclamation as being complicated or error-prone, but
maybe that's because I'm so accustomed to dealing with it that it comes
almost automatically.  Probably the most important reason, though, is
that the ADT paradigm provides such a powerful mechanism for simplifying
the situation.  I'm doing an ADT implementation right now (mergeable
priority queues implemented as binomial forests), and I find it hard to
believe that anyone could sit there and do an ADT implementation without
being able to visualize the structure in question as the code is being
generated.  Given that the implementor can visualize the structure, the
process of destroying it seems trivial.

> In article <3861@hubcap.UUCP> wtwolfe@hubcap.clemson.edu writes:
>> No, this isn't structural sharing, because you are explicitly manipulating
>
> Call it what you want, the fact is that multiple references exist to one
> object, thus making explicit (i.e. by the programmer) storage management
> a nightmare and an unnecessary chore.  GC deals with this situation easily.

There's an even easier way to deal with it:

    *** Never let a pointer out of your sight ***

Given that one is doing this, the question is reduced to managing local
pointers to a local structure, which is quite easy.

>> In this particular instance, it would be best to store the key by which
>> a person could be identified (social security number, for example) in
>> the list, and then using the key to retrieve the current address from
>> the Person database.
> That is, if you are willing to pay the cost of the additional
> data base lookup (usually an O(log n) operation)

Not necessarily.  If all you're doing is insertions, deletions, and
retrievals, a hashing implementation would give better results.

> I didn't say it couldn't be done without (what I call) structure sharing,
> I just wanted to point out that this would be a very intuitive approach
> and would probably lead to very readable code.

I'd contest both claims; as a maintainer, I wouldn't want anything to
get in the way of being able to pin down the state of that program
precisely, with regard to both time AND SPACE.  For one who is
accustomed to being able to nail down the precise state of a program,
code from which it is impossible to "complete the picture" is highly
counterintuitive, and will probably wind up being rewritten.

> The method of using key searches is, I believe, an unnecessary hack to
> avoid multiple references.  Why bother to include a full B-tree package
> or such when we can do without?

How can you possibly analyze the space complexity of your program when
you can't pin down the precise status of every entity???  You can't just
wave your hands and say "Well, I'm using this much space, plus there's a
bunch of space that may or may not be occupied..."!!  And let's not
forget the pure hell of doing a time analysis given that you can be
interrupted unpredictably for garbage collection, the timing and
duration of which is totally beyond your control, particularly in that
it depends on how much memory happens to be currently installed on the
executing system!!

> In my opinion, people tend to put too much emphasis on plain efficiency of
> their programs and too little on issues such as readability, reliability
> and extensibility

At last, something we can ALL agree on.  (I hope...!)

Bill Wolfe
wtwolfe@hubcap.clemson.edu
ok@quintus.uucp (Richard A. O'Keefe) (12/14/88)
In article <3865@hubcap.UUCP> billwolf@hubcap.clemson.edu writes:
> Basically, I charge garbage collection with the same crime that
> the GOTO was charged with: its sole function is to facilitate
> UNDISCIPLINED programming.  From Tremblay and Sorenson, 1985:

This is completely back-to-front.  Doing your own storage management is
like doing your own stack management instead of using procedure calls.
If you want to compare garbage collection with a control structure,
compare it with recursive procedures.

> If this charge can be successfully prosecuted, then garbage
> collection will face the same penalty: the total elimination
> of the facility, in favor of a more disciplined environment.

I think it would be very hard to accuse functional programming languages
like ML of providing an _un_disciplined environment.  Yet automatic
storage control is vital to them.

A word about the history of ML may make this clearer.  ML was originally
the MetaLanguage of Edinburgh LCF, a system for doing proofs about
computations.  It is strongly typed, because it was important that
anything the system claimed to be a proof should be a valid proof.
There is a basic set of proof-forming operations which are known to be
valid.  The language cannot permit changes to elements of a data
structure, otherwise the arguments to a proof might be changed after the
proof was constructed, rendering it invalid.  "Deallocate" simply
doesn't make sense in that kind of environment.

Storage management is particularly easy in a language without side
effects to data structures.
leake@cme.nbs.gov (Stephe Leake) (12/15/88)
In article <6713@june.cs.washington.edu> klaiber@june.cs.washington.edu (Alexander Klaiber) writes:
Clearly, there is a tradeoff between readability and efficiency. In my
version, I reduce efficiency by requiring garbage collection (although
there exist rather powerful and fast GC algorithms). Your proposal
would definitely increase the complexity of the program and, due to the
overhead of a DB-lookup, might actually run slower than the version with
GC --- depending on circumstances such as frequency of lookups in the
database, size of the database (your program) vs. time required for GC,
amount of garbage produced etc. (my program).
You are not reducing the complexity of the program; you are merely
hiding it in the vendor-supplied memory management package. If the
program is written correctly, the complexity can be hidden in a
user-supplied memory management package. Then there is no difference
in the readability of the application code, and the programmer has the
chance to tailor the garbage collection algorithm to the application.
Since there are many garbage collection algorithms (each posting here
seems to mention another one), it is clear that each will be suited to
certain applications. Better to require the programmer to choose the
algorithm, than to be tempted to "live with" the single vendor
supplied one. There is no way the vendor can take into account the
trade-offs you mention, but you can!
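Leake's suggestion, hiding the storage policy in a user-supplied memory management package so the application code never changes, can be sketched as follows. This is a hypothetical C++ free-list pool with invented names, not anything from the thread; it stands in for the "user-supplied package" whose policy the programmer tailors to the application:

```cpp
#include <cstddef>
#include <vector>

// A minimal user-supplied allocator: a LIFO free list of fixed-size blocks.
// Callers only see Acquire/Release; the policy behind them (here, a free
// list) can be swapped for another algorithm without touching callers.
class FixedPool {
public:
    explicit FixedPool(std::size_t block_size)
        : block_size_(block_size < sizeof(void*) ? sizeof(void*) : block_size) {}
    ~FixedPool() {
        for (void* p : owned_) ::operator delete(p);  // final cleanup
    }
    void* Acquire() {
        if (free_list_) {                      // reuse a released block
            void* p = free_list_;
            free_list_ = *static_cast<void**>(free_list_);
            return p;
        }
        void* p = ::operator new(block_size_); // otherwise grow the pool
        owned_.push_back(p);
        return p;
    }
    void Release(void* p) {                    // push onto the free list
        *static_cast<void**>(p) = free_list_;
        free_list_ = p;
    }
private:
    std::size_t block_size_;
    void* free_list_ = nullptr;
    std::vector<void*> owned_;                 // every block ever allocated
};
```

This is the shape of Leake's argument: the complexity has not vanished, but it lives in one replaceable package rather than being smeared across the application.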
Stephe Leake (301) 975-3431 leake@cme.nbs.gov
National Institute of Standards and Technology
(formerly National Bureau of Standards)
Rm. B-124, Bldg. 220
Gaithersburg, MD 20899
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/15/88)
From article <864@quintus.UUCP>, by ok@quintus.uucp (Richard A. O'Keefe): > In article <3865@hubcap.UUCP> billwolf@hubcap.clemson.edu writes: >> Basically, I charge garbage collection with the same crime that >> the GOTO was charged with: its sole function is to facilitate >> UNDISCIPLINED programming. From Tremblay and Sorenson, 1985: > > This is completely back-to-front. Doing your own storage management > is like doing your own stack management instead of using procedure calls. Procedure calls are used to manage the complexity of creating the structure and of maintaining it. They should also be used to manage its destruction. This is clear and consistent, and promotes readable and reliable operations on a well-defined structure with well-defined space/time properties. > I think it would be very hard to accuse functional programming languages > like ML of providing an _un_disciplined environment. [...] > Storage management is particularly easy in a language without > side effects to data structures. OK, try getting a referentially transparent language to handle systems which are not static, but change over time. With increased power comes increased responsibility. Bill Wolfe wtwolfe@hubcap.clemson.edu
gateley@m2.csc.ti.com (John Gateley) (12/15/88)
A good example of when garbage collection is needed for readability: an infinite precision integer arithmetic package (or ADT, or module, or whatever you want to call it). Infinite precision integers will take up an unknown amount of memory to represent. Either you have to explicitly deallocate them when you are done, or let GC take care of them. But now consider the uses of integers in programs: they are used all over the place, with no regard for careful deallocation. Converting a program to use infinite precision integers will be very difficult, since it is hard to figure out exactly when the program is 'finished' with a particular number. John
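The difficulty Gateley describes can be made concrete with a hedged C++ sketch (invented names, not the package he has in mind). Reference counting stands in for full garbage collection here, and only suffices because these values are immutable and acyclic: the count reaches zero exactly when the program is "finished" with the number, so no explicit deallocate call is ever written.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// A big integer whose digit storage is shared and reference counted.
// Copying a BigInt shares the digits; the storage is reclaimed
// automatically when the last reference disappears.
struct BigInt {
    std::shared_ptr<const std::vector<std::uint32_t>> digits;
};

BigInt make_bigint(std::vector<std::uint32_t> ds) {
    return BigInt{std::make_shared<const std::vector<std::uint32_t>>(std::move(ds))};
}

// How many live references share this number's storage.
long use_count(const BigInt& n) { return n.digits.use_count(); }
```

The point matches the post: since values flow freely through expressions, assignments, and returns, no single site can know it holds the last use; the count (or a collector) has to discover that.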
tpmsph@ecsvax.uncecs.edu (Thomas P. Morris) (12/15/88)
In article <6713@june.cs.washington.edu>, klaiber@june.cs.washington.edu (Alexander Klaiber) writes:

> The method of using key searches is, I believe, an unnecessary hack to
> avoid multiple references. Why bother to include a full B-tree package
> or such when we can do without?

You mean to tell us that you =intend= to keep your "mailing" lists =permanently in memory=? You're =never= going to write them to a permanent file system/disk? Or that you're never going to keep a list larger than can be kept in memory? Really! The method of using key searches is a useful aid to designing a program to read from a database larger than you can keep in memory, which needs to be permanent, and which ought to be well-behaved in terms of not =clobbering= other processes' performance! :-) Yeah, I know that the mailing list abstraction was only for an example. But it is a fairly poor example if you wish to justify structure sharing, which *has* its proper place...

> Clearly, there is a tradeoff between readability and efficiency. In my
> version, I reduce efficiency by requiring garbage collection (although
> there exist rather powerful and fast GC algorithms). Your proposal
> would definitely increase the complexity of the program and, due to the
> overhead of a DB-lookup, might actually run slower than the version with
> GC --- depending on circumstances such as frequency of lookups in the
> database, size of the database (your program) vs. time required for GC,
> amount of garbage produced etc. (my program).

For something like a database system (i.e. mailing lists, again), I would have to disagree that a keyed-search DB-lookup paradigm is necessarily less readable or understandable or complex than having to pre-process the databases (to create your structure sharing) to have direct access, which might not be reasonable or possible with a *large* database.
Your points about speed of access are well taken, but consider the end-user aspects: how many folks are going to wait 10, or 20, or 30 minutes for you to read in the external data, build your structure-sharing structures, and =then= process their mailing list (personnel records, widget manufacturing data, etc.), rather than using a relational database paradigm, with keyed access to those external permanent files?

> In my opinion, people tend to put too much emphasis on plain efficiency of
> their programs and too little on issues such as readability, reliability
> and extensibility --- and I contend that (depending on the application)
> garbage collection *CAN* help to improve all of these.

Yes and yes. The mailing list example is an unfortunate choice perhaps. And some of us also put an emphasis on how well a system works for its intended end-users!
-----------------------------------------------------------------------------
Tom Morris BITNET: TOM@UNCSPHVX UNC School of Public Health UUCP : ...!mcnc!ecsvax!tpmsph
-----------------------------------------------------------------------------
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/16/88)
From article <4159@enea.se>, by sommar@enea.se (Erland Sommarskog):

> But you have to write the same boring code for [destroying]
> every ADT you implement!

If there were an ability to iterate over the fields of a generic record parameter, standard generics for recursively destroying certain classes of structures (e.g., n-ary trees) could be written, thus automating the process to some extent. The reason we must suffer writing the same code repeatedly is that Ada does not permit us to abstract over arbitrary record structures.

> Let's say that you write a package for double-linked lists. You provide
> a procedure for taking an element out of a list. Should you free the
> memory allocated? You don't know, because the caller may or may not still
> keep a reference to the element. He may be totally uninterested in it,
> but he may also want to insert it in another list.

When the list is instantiated, the user tells us what type of object is to be kept in the list, and provides us with procedures for assignment and equality. Objects are copied into and out of the list. When an object is to be removed from the list, the local copy of the object which is stored within the list is destroyed. Now if the user wants, a pointer type can be supplied as the type of object to be listed, and the user-supplied equality function can compare the objects pointed to rather than the pointers themselves. The user then bears all responsibility for managing the "external storage", and can track his/her own references. There is no implementor's dilemma here whatsoever.

Bill Wolfe
wtwolfe@hubcap.clemson.edu
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/16/88)
From article <65713@ti-csl.CSNET>, by gateley@m2.csc.ti.com (John Gateley):

> A good example of when garbage collection is needed for readability:
> an infinite precision integer arithmetic package (or ADT, or module,
> or whatever you want to call it). Infinite precision integers will
> take up an unknown amount of memory to represent. Either you have to
> explicitly deallocate them when you are done, or let GC take care of them.
> But now consider the uses of integers in programs: they are used all over
> the place, with no regard for careful deallocation. Converting a program
> to use infinite precision integers will be very difficult, since it is
> hard to figure out exactly when the program is 'finished' with a
> particular number.

The deallocation of every object in the local environment is performed as an automatic service when a procedure, function, or local block is exited. This is not garbage collection, because the programmer has implicitly directed that the destruction be performed. Unfortunately, Ada does not provide a means of integrating ADT destruction algorithms into the mechanism providing this service. This is Language Issue 35 of the Ada Language Issues Working Group; it is presently under editorial review, a step which is usually followed by approval and then by submission to ADA-COMMENT, according to the ALIWG handout I obtained at Tri-Ada '88.

Bill Wolfe
wtwolfe@hubcap.clemson.edu
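The block-exit service Wolfe describes corresponds closely to what C++ later standardized as destructors (which post-dates this exchange as a stable feature; names below are invented). A minimal sketch showing that cleanup runs automatically, and deterministically, at block exit:

```cpp
// Scope-exit destruction: when the enclosing block is left, the local
// object's cleanup routine runs without any explicit call. The counter
// records that the "DESTROY" actually happened.
struct Tracked {
    explicit Tracked(int* counter) : counter_(counter) {}
    ~Tracked() { ++*counter_; }  // cleanup fires at block exit
    int* counter_;
};

int destroyed_in_block() {
    int count = 0;
    {
        Tracked t(&count);  // local to the inner block
    }                       // <- destructor runs here
    return count;           // cleanup has already happened
}
```

This is precisely the hook (an ADT destruction algorithm tied to the automatic block-exit service) that Wolfe says Ada lacked and Language Issue 35 asked for.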
sommar@enea.se (Erland Sommarskog) (12/19/88)
Sworn enemy to garbage collection Bill Wolfe writes:

> The deallocation of every object in the local environment is
> performed as an automatic service when a procedure, function,
> or local block is exited. This is not garbage collection,
> because the programmer has implicitly directed that the
> destruction be performed.

I read this as "on block exit all memory allocated to variables declared in that block should be deallocated". Isn't this very dangerous? What if the programmer copied the object to a variable declared in an outer block? Or stored it in a table of some sort during a subprogram call made in the block?
-- 
Erland Sommarskog ENEA Data, Stockholm sommar@enea.se "Frequently, unexpected errors are entirely unpredictable" - Digital Equipment
gateley@m2.csc.ti.com (John Gateley) (12/20/88)
In article <3918@hubcap.UUCP> billwolf@hubcap.clemson.edu writes:

>From article <65713@ti-csl.CSNET>, by gateley@m2.csc.ti.com (John Gateley):
>> [I say infinite precision arithmetic is a good example of why GC is needed
>> sometimes].
>
> The deallocation of every object in the local environment is
> performed as an automatic service when a procedure, function,
> or local block is exited. This is not garbage collection,
> because the programmer has implicitly directed that the
> destruction be performed.

But integers are not always deallocated. They may be returned as the result of a function, assigned to global variables, placed in the heap as parts of other data objects, etc. When you try to do the same with infinite precision integers, the deallocation you describe does not work! You can always copy them for these purposes, but this can be very space/time inefficient. It also requires extra effort (remembering to copy them), unless Ada provides some technique for doing this automatically. So, GC is a good way to handle this problem.

> Unfortunately, Ada does not provide a means of integrating ADT
> destruction algorithms into the mechanism providing this service.
> This is Language Issue 35 of the Ada Language Issues Working Group;
> ...

Integers, however, are not deallocated when a block is exited (if they have been saved elsewhere). So, I don't think this language issue applies to the infinite precision integer case.

John gateley@tilde.csc.ti.com
wtwolfe@hubcap.UUCP (Bill Wolfe) (12/21/88)
From article <4176@enea.se>, by sommar@enea.se (Erland Sommarskog): > Sworn enemy to garbage collection Bill Wolfe writes: >> The deallocation of every object in the local environment is >> performed as an automatic service when a procedure, function, >> or local block is exited. This is not garbage collection, >> because the programmer has implicitly directed that the >> destruction be performed. > > I read this as "on block exit all memory allocated to variables > declared in that block should be deallocated". > Isn't this very dangerous? What if programmer copied the object to > a variable declared in a outer block? Or stored it in a table of some > sort in a subprogram call made in block? Not at all. If a programmer assigns the value stored in some local object to some nonlocal object, then we simply have a new value for a nonlocal object, assuming we have not engaged in the foul practice of structural sharing. Those who engage in structural sharing will get the run-time errors they deserve for engaging in such space-hacking. Now there is, of course, the problem that the practice of assigning to nonlocal objects is somewhat distasteful, since this amounts to a "side effect" of a procedure or function. The rap on side effects is that they frequently are not properly documented, and thus make a major contribution to the difficulty of understanding a program. In addition, optimizing compilers find it difficult to completely define the relationship between procedures or functions and their enclosing environment, thus making the relevant optimizations costly. So... language designers to the rescue!!! The solution to this problem has been discussed in recent issues of Sigplan Notices, and it is this: Provide a scheme whereby procedures and functions have, by default, no relationship to their surrounding environment other than that which is explicitly documented in the parameter structure. 
Then provide a facility whereby one can explicitly "poke holes in the shield" for selected external objects. The maintainers are happy, because for once they can read the specification of a procedure or function and be certain that it does nothing which is not documented in that specification. The writers of optimizing compilers are absolutely ecstatic. And the programmers can do their debugging faster as well. But sadly for the programmers who pride themselves on obscure code, there is now one less form of sloppiness available for their use. Oh, well, perhaps they'll find some comfort in C... Bill Wolfe wtwolfe@hubcap.clemson.edu
wtwolfe@hubcap.UUCP (Bill Wolfe) (12/21/88)
In article <65914@ti-csl.CSNET>, gateley@m2.csc.ti.com (John Gateley) writes: > >> [I say infinite precision arithmetic is a good example of why GC is needed > >> sometimes]. > > > > The deallocation of every object in the local environment is > > performed as an automatic service when a procedure, function, > > or local block is exited. This is not garbage collection, > > because the programmer has implicitly directed that the > > destruction be performed. > > But, integers are not always deallocated. They may be returned as the > result of a function, assigned to global variables, placed in the > heap as parts of other data objects etc. What we are (presumably) discussing here is the space management applying to literals and/or anonymous values of some arbitrary type. During compilation, the compiler will evaluate each literal as an anonymous value of the appropriate type. Anonymous values differ from named values only in that the programmer cannot explicitly refer to them by name. Typically, literals are used in the construction of anonymous expressions which will ultimately serve as a value to be assigned. They are then being supplied as "in" parameters, and we have already discussed the proper method by which compilers should handle anonymous values passed as "in" parameters: they should be passed by handoff; since no external name exists, there is no need to set any external object to "undefined". Since anonymous values cannot simply exist in mid-air, and must always be consumed in the course of some statement, we know that they will always be consumed. By the rules of Ada, they cannot be supplied as "in out" or "out" parameters; hence, the mechanism described for handling anonymous values which are being passed as "in" parameters in a space-efficient manner serves to cover all cases, if we treat assignment as a procedure requiring an "in" parameter for the source value; otherwise, the extension to this case is straightforward. 
Thus, we have completely described the space management applicable to literals and other anonymous values, such that these values can be managed effectively without any need for recourse to garbage collection. All that is needed is a tighter definition of what happens to an anonymous value which is passed as an "in" parameter. Note that this analysis does not depend upon the amount of space which must be used to contain any given anonymous value. Bill Wolfe wtwolfe@hubcap.clemson.edu
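Wolfe's "handoff" of anonymous values has a loose modern analogue in C++ move semantics (an anachronism relative to this thread, offered only as a sketch with invented names): a temporary has no name the caller can use afterwards, so its storage can be handed onward to the consumer instead of copied, and there is no external object to mark "undefined".

```cpp
#include <string>
#include <utility>
#include <vector>

// A global consumer standing in for "assignment as a procedure taking
// an 'in' parameter". The anonymous argument's storage travels by
// handoff (move), never by a second copy the caller could observe.
std::vector<std::string> sink;

void consume(std::string s) {       // receives the "in" parameter by value
    sink.push_back(std::move(s));   // hand the storage onward
}

void demo() {
    consume(std::string("anonymous"));  // a temporary: no name survives the call
}
```

As in Wolfe's analysis, the key property is that an anonymous value is always consumed by exactly one statement, so ownership can be transferred at that point without any collector.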
sommar@enea.se (Erland Sommarskog) (12/27/88)
Bill Wolfe (wtwolfe@hubcap.UUCP) first said:

>>> The deallocation of every object in the local environment is
>>> performed as an automatic service when a procedure, function,
>>> or local block is exited. This is not garbage collection,
>>> because the programmer has implicitly directed that the
>>> destruction be performed.

I tried to understand:

>> I read this as "on block exit all memory allocated to variables
>> declared in that block should be deallocated".
>> Isn't this very dangerous? What if the programmer copied the object to
>> a variable declared in an outer block? Or stored it in a table of some
>> sort during a subprogram call made in the block?

And Bill explains:

> Not at all. If a programmer assigns the value stored in some local
> object to some nonlocal object, then we simply have a new value for
> a nonlocal object, assuming we have not engaged in the foul practice
> of structural sharing. Those who engage in structural sharing will
> get the run-time errors they deserve for engaging in such space-hacking.

And I still don't understand. Let's say we have a tree handler:

   Generic
      Type Data_type is limited private;
      -- Procedures for Assign, "<" and ">"
   Package Tree_handler is
      Type Tree_type is private;

Assume further that we instantiate this package with Some_access_type; it could be an imported limited type, or it could be locally defined. Then we have the procedure:

   Procedure Insert_data_to_tree(Tree : in out Tree_type;
                                 Data_of_some_sort : in Any_type) is
      Reference : Some_access_type;
   Begin
      Reference := new Some_type;  -- Or subprogram call for external type
      Prepare(Reference, Data_of_some_sort);
      Insert(Tree, Reference);
   End Insert_data_to_tree;

According to Bill, the data allocated to Reference should be freed when we exit this procedure, but if Insert does what we think, this is of course a severe error. Where did I go wrong? Did I plead guilty to structure sharing just because I inserted a pointer into a tree?
Or does Bill mean that when Reference is inserted into the tree, Reference.all should be inserted, not the pointer itself? But what if I also insert the reference into another tree (or some other structure) and then modify the referred data (some counter, for instance) and want this change to appear in both places? Note also that in this example there are no side effects. We are only dealing with local variables and parameters. And still deallocation on block exit is completely wrong. (And there is no way the compiler could deduce that Insert really saves Reference somewhere.)

Several times in this discussion Bill Wolfe has claimed that garbage collection is a trouble-maker for maintainers. In what way? I can only see one: if the system, due to heavy use of memory, is spending a lot of time on garbage collection, something must be done. But in this case the problem is often quite easy to spot. Much worse for maintenance is when the system dies with "System access violation" or "segmentation fault" from time to time and you don't know why. The reason may be that an allocated area was erroneously released and then reused, and the original data, that someone apparently is relying on, has been over-written. Or because an already deallocated area was deallocated a second time. Bill Wolfe is trying to outline some alternative, but I think he has to think more about it, until it is equally safe. (And really, I don't think he should. He is just reinventing the wheel.)

Bill Wolfe said earlier that garbage collection encourages sloppy programming. Whether it is sloppy or not, I don't care, but obviously garbage collection is taking a lot of the burden off the programmer's shoulders, and the maintainer's too. Why should a programmer waste time bothering about something that the computer could do much safer for him? After all, the programmer does not work for free.
So to conclude: I think Ada should require garbage collection that could be disabled with a pragma for critical applications. As it is now a portable application cannot rely on it, with all the troubles explicit deallocation implies. -- Erland Sommarskog ENEA Data, Stockholm This signature is not to be quoted. sommar@enea.se
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (12/28/88)
From article <4195@enea.se>, by sommar@enea.se (Erland Sommarskog):

[Erland describes a situation in which a user instantiates a tree structure over an access type, and then considers the insertion procedure with which the user inserts an object into the tree]

> According to Bill the data allocated to Reference [in the example,
> a pointer to the user's object] should be freed when
> we exit this procedure, but if Insert does what we think this is of
> course a severe error. Where did I go wrong? Did I plead guilty to
> structure sharing just because I inserted a pointer into a tree?
> Or does Bill mean that when Reference is inserted into the tree,
> Reference.all should be inserted, not the pointer itself?
> But what if I also insert the reference into another tree (or some other
> structure) and then modify the referred data (some counter, for instance)
> and want this change to appear in both places?
> Note also that in this example, there are no side effects. We are only
> dealing with local variables and parameters. And still deallocation
> on block exit is completely wrong. (And there is no way the compiler
> could deduce that Insert really saves Reference somewhere.)

The user has supplied the limited private type STORED_OBJECT, and defined assignment over this object. Let's assume that the tree is implemented in terms of a record structure, one field of which is a STORED_OBJECT. In the INSERT_ITEM procedure, the user sends "in out TREE_TYPE; in STORED_OBJECT". The tree is modified as follows: a new record is created to represent a new node of the tree. The statement ASSIGN (NEW_NODE.STORED_ITEM, NEW_OBJECT) is executed, along with whatever pointer manipulations are necessary to maintain the structure of the tree. Now we proceed to block exit. Since the tree and the new object are *parameters* and not local variables, they are not subject to the automatic destruction which applies to the block's local variables.
Since the user supplied an access type as the type of object to be stored in the tree, the user bears full responsibility for making responsible use of the fact that he/she is dealing with a tree of pointers.

> Several times in this discussion Bill Wolfe has claimed that
> garbage collection is a trouble-maker for maintainers. In what way?

When there is great difficulty deciding whether a given object exists or not, the maintainer experiences great difficulty pinning down the precise state of the program.

> Much worse for maintenance is when the system dies with "System
> access violation" or "segmentation fault" from time to time and you
> don't know why. The reason may be that an allocated area was erroneously
> released and then reused, and the original data, that someone apparently
> is relying on, has been over-written. Or because an already deallocated
> area was deallocated a second time.

There are basically two cases in which this can happen:

1) When an inexperienced ADT designer is in the process of acquiring the mental discipline required of anyone who programs pointer-based structures. To avoid this, purchase ADTs from professional houses or use experienced ADT designers. Otherwise, it's a cost of professional training.

2) When a user has engaged in S T R U C T U R A L S H A R I N G. *** Don't engage in structural sharing ***

> Bill Wolfe said earlier that garbage collection encourages
> sloppy programming. Whether it is sloppy or not, I don't care,

Obviously.

> Why should a programmer waste time bothering about something that
> the computer could do much safer for him? After all, the
> programmer does not work for free.

DESTROY procedures are easy to write, need only be written once per ADT, and can be reused indefinitely. GC costs us the computer time required to repeatedly determine which storage is in use and which is not, and this cost must be paid repeatedly AT RUN TIME.
Given that this PERMANENT, RUN_TIME COST can be avoided by simply communicating the fact that a particular object is no longer needed, GC is a rather costly crutch for lazy programmers. Bill Wolfe wtwolfe@hubcap.clemson.edu
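A DESTROY procedure of the kind Wolfe means is indeed short to write. A hedged C++ sketch for a toy binary tree (invented names, not an Ada ADT from the thread); it is written once for the ADT and called whenever the client declares the structure is no longer needed:

```cpp
// A minimal pointer-based tree node.
struct Node {
    int value;
    Node* left;
    Node* right;
};

Node* MakeNode(int v, Node* l, Node* r) { return new Node{v, l, r}; }

// The once-per-ADT DESTROY: recursively frees the whole structure.
// Returns the number of nodes freed, purely so a caller can check it.
int Destroy(Node* n) {
    if (!n) return 0;
    int freed = Destroy(n->left) + Destroy(n->right);
    delete n;
    return freed + 1;
}
```

What the sketch does not show, and what the other side of the thread keeps pointing out, is the hard part: the client still has to call Destroy at the right moment, exactly once, on a structure nothing else still references.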
rjh@cs.purdue.edu (Bob Hathaway) (12/28/88)
>According to Bill the data allocated to Reference should be freed when
>we exit this procedure, but if Insert does what we think this is of
>course a severe error. Where did I go wrong? Did I plead guilty to

I think Bill meant deallocation should only occur if an object is no longer accessible at the end of a subprogram or block, and Reference is accessible. Subprogram and block exit is a very convenient time to perform clean-up of access types. If Ada had an Adt construct for the "completion of the Adt paradigm", we could specify a termination procedure for Adts which could also be called at scope exit. I agree that Adts/objects are an important enough programming development to justify a new language construct. I've worked with over 200,000 lines of code which used the older data-structure oriented design and found the greatest problem in understanding and working with these programs was caused by the use of global variables and the indiscriminate access and modification of data-structures which were not implemented as Adts. Such data-structures are poorly defined and lead to code which is hard to read. For my own opinion on the garbage collection discussion, I think a destroy procedure is desirable. Adts need to be explicitly deallocated for long-running daemons or loops even while objects are still accessible. While calling a procedure or declaring objects in a local block from within a loop allows automatic deallocation of objects, I'd rather use scope rules and subprograms for creating environments and performing actions, and not for storage (de)allocation. Also, I think Adt operations should include a destroy procedure for completeness. If space management is not a problem, the destroy procedure does not have to be called. Concerning working sets: if a page is not accessed, it will not reside in memory for long and should not cause a virtual space management problem.
I strongly agree with assignment operator overloading, and another powerful extension is to allow any name/operator to be overloadable. For yet another Ada 9X extension, I propose procedural variables. As in Modula, procedural variables can be limited to top-level procedures, but formal parameter names must be included for named parameter association. Here is an example declaration:

   type insertNode_procedure = procedure (adt : in out structure, node : in node);

Variables of type insertNode_procedure can be assigned to any procedure with the same parameter type structure (type domain). Procedural variables can avoid the redundant use of case statements by allowing an operation to be stored within an Adt. They also allow individually parameterizable and reprogrammable Adts, since operations can be provided to alter the Adt's actions or structure. Generic subprogram parameters can only allow an Adt's operations to be set once, at instantiation. I use procedural variables and function pointers in Adts frequently when programming in languages other than Ada and am convinced they are an elegant way to model Adt actions.

Bob Hathaway rjh@purdue.edu
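Hathaway's procedural variables can be sketched with C++'s std::function (an anachronism relative to the thread, with invented names): the operation stored inside the ADT can be re-assigned at run time, which a generic parameter fixed at instantiation cannot be.

```cpp
#include <functional>
#include <vector>

// An ADT carrying a "procedural variable": its insertion behaviour is a
// data member, so it can be reprogrammed after the ADT is created.
struct Adt {
    std::vector<int> data;
    std::function<void(Adt&, int)> insertNode;  // reassignable operation
};

// Two interchangeable implementations of the operation.
void append_insert(Adt& a, int v)  { a.data.push_back(v); }
void prepend_insert(Adt& a, int v) { a.data.insert(a.data.begin(), v); }
```

A client sets a.insertNode = append_insert, later swaps in prepend_insert, and every call site through the ADT picks up the new behaviour; this is the "reprogrammable Adt" the post argues generics alone cannot express.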
snorri@rhi.hi.is (Snorri Agnarsson) (12/29/88)
From article <3985@hubcap.UUCP>, by billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,): > When there is great difficulty deciding whether a given object > exists or not, the maintainer experiences great difficulty > pinning down the precise state of the program. Exactly - and garbage collection is precisely what you need to get rid of this difficulty. When you have garbage collection there is no difficulty in determining whether an object exists or not. It exists if you have some way to refer to it, otherwise not. > DESTROY procedures are easy to write, need only be written once > per ADT, and can be reused indefinitely. GC costs us the > computer time required to repeatedly determine which storage > is in use and which is not, and this cost must be paid repeatedly > AT RUN TIME. Given that this PERMANENT, RUN_TIME COST can be > avoided by simply communicating the fact that a particular object > is no longer needed, GC is a rather costly crutch for lazy programmers. The PERMANENT, RUN_TIME COST is very often much worse if you use the methods Mr. Wolfe dogmatically demands. For example if you have a program which does calculations with indefinite size integers, then each new integer value computed is most appropriately stored in a new area in the heap. If you disallow sharing then each assignment must create a new area. This is costly. Each time you stop using an integer variable you must deallocate it. This is costly. If you have a simple copying garbage collector and allow sharing, which by the way is not dangerous at all in this case at least, you don't have the abovementioned costly operations, but instead you have a simple garbage collection which will in my experience be infrequent (e.g. every ten seconds for a program doing intensive calculations with 10000 digit integers) and fast (less than a fraction of a second for each collection). And this is on a machine with small memory (a PC with 640KB memory). 
Memory management is one of the most difficult things for programmers to do correctly and efficiently. Garbage collection certainly takes this burden from the programmer and allows him to think in more abstract terms. In my opinion reusable software components can not be adequately created in languages that do not have good automatic memory management. In this respect Ada is a failure unless garbage collection is required. The C++ approach of using destructors is costly both in memory and time and does not really solve the problem, although it is better than nothing. It is regrettable in my opinion that the issue of garbage collection is so much misunderstood. It is also regrettable that proponents of the various interesting new programming methodologies do not push garbage collection as an issue. Functional programming, logic programming and object oriented programming have only one thing in common; reliance on garbage collection. None of those methodologies would have much use in my opinion if it were not for garbage collection. And, I repeat: reusable modules (that includes generic packages) are almost useless in languages with no garbage collection compared to the use you can get in languages having garbage collection. -- Snorri Agnarsson -- snorri@rhi.is Raunvisindastofnun Haskolans Dunhaga 3 IS-107 Reykjavik ICELAND
sommar@enea.se (Erland Sommarskog) (12/29/88)
I gave an example where I stored pointers in a tree, and tried to make Bill Wolfe explain how this should work with his idea of deallocation on block exit. He answered, but I wouldn't say I'm satisfied... I still don't see where he is heading.

> Since the user supplied an access type as the type of object to be
> stored in the tree, the user bears full responsibility for making
> responsible use of the fact that he/she is dealing with a tree of pointers.

Well, assume that Assign, the procedure I write for Some_access_type (the type I store in the tree), is simply A := B. With your scheme I can only conclude that the object I allocated is deallocated when I exit my procedure, although I still have a reference to it, stored in the tree. Bill, tell me, what is wrong here? My understanding of your proposal, or the program I have written? If you think the latter, explain to me how the compiler should detect the error I'm making. Or do you really think such an error should remain undiscovered until run time? And does that rhyme with your eagerness to simplify maintenance?

>> Several times in this discussion Bill Wolfe has claimed that
>> garbage collection is a trouble-maker for maintainers. In what way?
>
> When there is great difficulty deciding whether a given object
> exists or not, the maintainer experiences great difficulty
> pinning down the precise state of the program.

You have just given an excellent argument for garbage collection. With garbage collection, only the objects that have references exist. Without it, there may be other objects that are unreachable. With garbage collection you can be sure that all initialized non-null references point to actual objects. Without it, you may by mistake be pointing straight into free space, or right into the middle of some entirely different kind of object.
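The rule that "only the objects that have references exist" can be made concrete with a toy mark-and-sweep collector in C. This is a deliberately simplified sketch (fixed pool, one outgoing reference per object, explicit root array -- none of it resembles a production collector): an object survives a collection exactly when it is reachable from some root.

```c
#include <stddef.h>

#define POOL 8

typedef struct Obj {
    struct Obj *ref;   /* at most one outgoing reference, for brevity */
    int alive;         /* is this slot currently allocated? */
    int marked;
} Obj;

static Obj pool[POOL];

/* Allocate from the fixed pool; NULL when exhausted (a real
 * collector would trigger a collection here instead). */
static Obj *gc_alloc(void) {
    for (int i = 0; i < POOL; i++)
        if (!pool[i].alive) {
            pool[i].alive = 1;
            pool[i].ref = NULL;
            return &pool[i];
        }
    return NULL;
}

/* Mark everything reachable from the roots, then sweep the rest.
 * Returns the number of objects reclaimed. */
static int gc_collect(Obj **roots, int nroots) {
    for (int i = 0; i < POOL; i++) pool[i].marked = 0;
    for (int r = 0; r < nroots; r++)
        for (Obj *o = roots[r]; o && !o->marked; o = o->ref)
            o->marked = 1;
    int freed = 0;
    for (int i = 0; i < POOL; i++)
        if (pool[i].alive && !pool[i].marked) {
            pool[i].alive = 0;
            freed++;
        }
    return freed;
}
```

Dropping the last reference to an object is the whole deallocation protocol; the collector discovers the garbage, so the dangling-pointer and double-free failure modes discussed above cannot arise.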
The only uncertainty you have with garbage collection is whether a non-referenced object has yet been returned to free memory, depending on whether the garbage collector has gotten there or not. But since the object is no longer part of the system, this question is of little interest.

>> Much worse for maintenance is when the system dies with "System
>> access violation" or "segmentation fault" from time to time and you
>> don't know why. The reason may be that an allocated area was erroneously
>> released and then reused, and the original data, that someone apparently
>> is relying on, has been overwritten. Or because an already deallocated
>> area was deallocated a second time.

I forgot: when the system chokes with "heap full" and you as a maintainer are trying to find where all the space is being wasted. With garbage collection you at least know that all this memory is being used. Without it, you don't know if it is just memory leakage or real overuse.

> DESTROY procedures are easy to write, need only be written once
> per ADT, and can be reused indefinitely.  GC costs us the
> computer time required to repeatedly determine which storage
> is in use and which is not, and this cost must be paid repeatedly
> AT RUN TIME.  Given that this PERMANENT, RUN_TIME COST can be
> avoided by simply communicating the fact that a particular object
> is no longer needed, GC is a rather costly crutch for lazy programmers.

1) You have several times stated that the callers of your ADTs are responsible for seeing that their destroy procedures are called. So it's not enough just to write them. You must remember to call them too. And when you forget, it takes time to find out where.

2) Much of this talk reminds me of what we hear from C programmers when range and type checking is being discussed. And the same arguments apply. If all programmers were skilled enough to do everything right we wouldn't need Ada or high-level languages at all.
Actually, if everyone else is enlisting the help of computers these days, why shouldn't programmers? Just like range and type checking, garbage collection is a device that increases our trust that the system does what we intended.

3) I recommend you read "Object-Oriented Software Construction" by Bertrand Meyer. There he explains why garbage collection is necessary in an object-oriented system. He also describes how the garbage collector is implemented in the Eiffel system. (Eiffel is an object-oriented language, devised by Mr. Meyer, and commercially available.)

--
Erland Sommarskog
ENEA Data, Stockholm
This signature is not to be quoted.
sommar@enea.se
wtwolfe@hubcap.UUCP (Bill Wolfe) (12/30/88)
From article <692@krafla.rhi.hi.is>, by snorri@rhi.hi.is (Snorri Agnarsson):
> From article <3985@hubcap.UUCP>, by billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,):
>> When there is great difficulty deciding whether a given object
>> exists or not, the maintainer experiences great difficulty
>> pinning down the precise state of the program.
>
> Exactly - and garbage collection is precisely what you need to get rid
> of this difficulty. When you have garbage collection there is no
> difficulty in determining whether an object exists or not. It exists
> if you have some way to refer to it, otherwise not.

So basically you have to sit there and total up the references to every piece of space manually. Sounds like fun. Got an algorithm for doing this in constant time? If not, there is "great difficulty pinning down the precise state of the program".

>> DESTROY procedures are easy to write, need only be written once
>> per ADT, and can be reused indefinitely.  GC costs us the
>> computer time required to repeatedly determine which storage
>> is in use and which is not, and this cost must be paid repeatedly
>> AT RUN TIME.  Given that this PERMANENT, RUN_TIME COST can be
>> avoided by simply communicating the fact that a particular object
>> is no longer needed, GC is a rather costly crutch for lazy programmers.
>
> The PERMANENT, RUN_TIME COST is very often much worse if you use the
> methods Mr. Wolfe dogmatically demands. For example if you have a program
> which does calculations with indefinite size integers, then each new integer
> value computed is most appropriately stored in a new area in the heap.
> If you disallow sharing then each assignment must create a new area.

Consider what happens to the storage associated with the old value of the target of the assignment statement. The ASSIGN procedure can reuse it directly rather than deallocating and reallocating it.
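The reuse argument above can be sketched in C (the Bignum type and field names are invented for illustration): an assignment procedure can recycle the target's existing buffer, growing it only when the new value will not fit, so the common case involves neither deallocation nor allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical indefinite-size integer with a reusable buffer. */
typedef struct {
    size_t capacity;        /* digits allocated */
    size_t ndigits;         /* digits in use */
    unsigned char *digits;
} Bignum;

/* ASSIGN in the style advocated above: reuse the target's old storage
 * when it is big enough; reallocate only when the value will not fit. */
static void assign(Bignum *dst, const Bignum *src) {
    if (dst->capacity < src->ndigits) {
        dst->digits = realloc(dst->digits, src->ndigits);
        dst->capacity = src->ndigits;
    }
    memcpy(dst->digits, src->digits, src->ndigits);
    dst->ndigits = src->ndigits;
}
```

In a long-running computation where values stay roughly the same size, the buffer reaches a steady state and assignments stop touching the allocator entirely.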
> Memory management is one of the most difficult things for programmers to
> do correctly and efficiently. Garbage collection certainly takes this
> burden from the programmer and allows him to think in more abstract terms.

Space is an important part of the abstraction. And I challenge the claim that memory management is difficult. One need only deal with a single structural model within the ADT; in the course of writing that ADT's implementation, one will become thoroughly familiar with that structure, and there should be no difficulty deciding when a part of it is no longer needed.

> In my opinion reusable software components can not be adequately created
> in languages that do not have good automatic memory management.

In my opinion, reliance upon garbage collection is the single best way to create an inferior ADT implementation which will be totally rejected by anyone who is programming critical applications.

> In this respect Ada is a failure unless garbage collection is required.

In this respect garbage collection is a total failure and should NEVER be required. In the interests of seeing to it that portability to the critical-application environment is always possible, GC should be PROHIBITED.

> It is regrettable in my opinion that the issue of garbage collection
> is so much misunderstood.

Particularly by people who have no understanding of the issues applying to proper ADT design.

> It is also regrettable that proponents of
> the various interesting new programming methodologies do not push
> garbage collection as an issue. Functional programming, logic
> programming and object oriented programming have only one thing in
> common: reliance on garbage collection. None of those methodologies
> would have much use in my opinion if it were not for garbage collection.

They don't have much use anyway. See David Harland's "Concurrency and Programming Languages".

Bill Wolfe
wtwolfe@hubcap.clemson.edu
wtwolfe@hubcap.UUCP (Bill Wolfe) (12/30/88)
From article <4200@enea.se>, by sommar@enea.se (Erland Sommarskog):
>> Since the user supplied an access type as the type of object to be
>> stored in the tree, the user bears full responsibility for making
>> responsible use of the fact that he/she is dealing with a tree of pointers.
>
> Well, assume that Assign, the procedure I write for Some_access_type
> (the type I store in the tree), is simply A := B. With your scheme I
> can't but see that the object I allocated is deallocated when I exit
> my procedure, although I still have a reference to it, stored in
> the tree.

One more time. A exists within the tree. The tree is a PARAMETER. The tree is NOT a local variable. Hence, upon exit from the ASSIGN procedure, A will not suffer deallocation. Now what about B? It's a parameter too, and also will not be deallocated.

> Bill, tell me, what is wrong here? My understanding of your proposal,
> or the program I have written?

Your understanding of my proposal. I'm having a really hard time trying to figure out what you're thinking of here. I thought everything had been clearly defined in terms that any Pascal/Modula-2/Ada programmer would understand.

> I forgot: when the system chokes with "heap full" and you as a maintainer
> are trying to find where all the space is being wasted. With garbage
> collection you at least know that all this memory is being used. Without,
> you don't know if it is just memory leakage or real overuse.

Advanced debugging tools should be used to address this issue.

>> DESTROY procedures are easy to write, need only be written once
>> per ADT, and can be reused indefinitely.  GC costs us the
>> computer time required to repeatedly determine which storage
>> is in use and which is not, and this cost must be paid repeatedly
>> AT RUN TIME.  Given that this PERMANENT, RUN_TIME COST can be
>> avoided by simply communicating the fact that a particular object
>> is no longer needed, GC is a rather costly crutch for lazy programmers.
> > 1) You have several times stated that the callers of your ADTs
> >    are responsible for seeing that their destroy procedures are called.
> >    So it's not enough just to write them. You must remember to call
> >    them too. And when you forget, it takes time to find out where.

I'm TRYING to get Ada to integrate the automatic destruction of local variables which occurs during block exit with ADT destruction procedures, so the user can rely on the implicit deallocation request associated with block exit. Let me give a concrete example:

   procedure EXAMPLE (A : in out ADT) is
      B : ADT;
      C : INTEGER;
   begin
      [body of procedure EXAMPLE]
   end EXAMPLE;

Now when the body of procedure EXAMPLE has finished executing, in current Ada C will be properly destroyed, but B will not. If B is implemented as a pointer to something, that pointer will be destroyed, and everything it pointed to will become garbage. By integrating the DESTROY procedure into block exit, everything B pointed to will be deallocated as well. Since A is a parameter, it is not part of the locally defined environment; it is instead part of the caller's environment, and will stay that way. In the revised version, the "right" DESTROY procedure will apply to B as well as C, thus enabling the use of implicit destruction, which is the most frequently used method of destruction.

> 3) I recommend you read "Object-Oriented Software Construction"
>    by Bertrand Meyer. There he explains why garbage collection is
>    necessary in an object-oriented system. He also describes how the
>    garbage collector is implemented in the Eiffel system. (Eiffel is
>    an object-oriented language, devised by Mr. Meyer, and commercially
>    available.)

I'm quite aware of Eiffel, and I have just shown (in another article) how reliance upon garbage collection is a sure way to get your ADT rejected by users who are programming critical applications.
This same argument applies to Eiffel and the other members of the peanut gallery; they are unsuitable for critical applications, among their many other flaws. May I suggest David Harland's "Concurrency and Programming Languages"? Representation of classes of types is better accomplished through a synthesis approach rather than an inheritance approach; however, this is presently a research topic.

Followups on this to comp.lang.sigplan or e-mail, please.

Bill Wolfe
wtwolfe@hubcap.clemson.edu
sommar@enea.se (Erland Sommarskog) (12/31/88)
I'm trying to get Bill Wolfe to explain how his deallocation-on-block-exit scheme would work in an example I had. For brevity I deleted my example in my last article. Apparently that was a fatal mistake, because the answer I got wasn't even close to the question. So, my apologies, I repeat the example with some clarifications. We have:

   generic
      type Data_type is limited private;
      with Assign, "<", ">";   -- What you expect
   package Tree_handler is
      type Tree_type is private;
      ...
   end Tree_handler;

Assume further that we instantiate this package with Some_access_type; it could be an imported limited type, or it could be locally defined. Then we have the procedure:

   procedure Insert_data_to_tree(Tree              : in out Tree_type;
                                 Data_of_some_sort : in     Any_type) is
      Reference : Some_access_type;
   begin
      Reference := new Some_type;  -- Or subprogram call for external type
      Prepare(Reference, Data_of_some_sort);
      Insert(Tree, Reference);
   end Insert_data_to_tree;

The Assign procedure which we provide when we instantiate the tree handler is simply:

   procedure Assign(A : in out Some_access_type; B : in Some_access_type) is
   begin
      A := B;
   end Assign;

Now, what happens with Reference.all on exit of Insert_data_to_tree, with your scheme? (Assume there is no user-written DESTROY procedure.) Would Reference.all be deallocated although there still is a living reference to it in Tree? Or will it be ignored in any case? Or do you have a reference count hidden somewhere to save you?

Another interesting issue: when Tree drops from the stack, what happens with our little object on this occasion? The Tree's DESTROY procedure cannot know anything about it, so we have to traverse the tree and deallocate all referred objects ourselves. (And pray we haven't inserted some of them in a queue somewhere else too!) With garbage collection...

>> I forgot: when the system chokes with "heap full" and you as a maintainer
>> are trying to find where all the space is being wasted. With garbage
>> collection you at least know that all this memory is being used.
Without, you don't
>> know if it is just memory leakage or real overuse.
>
> Advanced debugging tools should be used to address this issue.

So why can't these "advanced debugging tools" be used to "pin down the exact state of the system" when we have garbage collection?

> I'm quite aware of Eiffel, and I have just shown (in another article)
> how reliance upon garbage collection is a sure way to get your ADT
> rejected by users who are programming critical applications.

I assume "critical" means time-critical here. If the issue is reliability or space, garbage collection is only there to help you.

There is a good chance that your ADT will be rejected anyway. The critical developer looks at you and says: "You use Unchecked_deallocation? Sorry, then I can't use your ADT. Memory fragmentation will cause too great a time penalty when the system has lived for a while." So you return and implement a free list and try again. The developer looks at you: "You have an internal task for the free list?" "Eh, no." "You must have that, or else everything will be a mess if two tasks access the list simultaneously." So you end up adding this task, and your critical developer is satisfied. (Reservation: my experience of tasking in Ada is not that great. Possibly the SHARED pragma could help, but I doubt it.)

Then you come to me, who am writing an interactive database application for VAX/VMS, and give me your ADT. It looks nice, so I use it. Some time later my users complain: the response time is too long. What have you done? Being equipped with good tools, I re-link the system with PCA (a profiler for VMS) to find the bottleneck. And to my surprise I find that rendezvous are taking place in the system. "Ah, that's why it's slow, but where do they come from?" Soon your ADT is found to be guilty. And just a little later your ADT is edited to be taskless, and my users are happy again.
(Until they get dumps because memory is full, because I forgot to call one of your damned Destroy routines somewhere.)

The lesson to learn is that it is very hard to satisfy everyone. And you end up with two versions anyway. Given that there are many shops who will never write an application where garbage collection is unacceptable, ADTs relying on its presence will still have wide use. Probably more use than ADTs using tasks to protect free lists. (I have response times to think of, but garbage collection is no problem. With the collector running as a coroutine, as in the Eiffel system, it can work while I'm waiting for input.)

There is a solution that gives us the possibility of using the same implementation of the ADT above. And that solution is, yes, you guessed it: garbage collection. You provide your ADT with a Destroy procedure that your critical developer uses. (As long as your block-exit scheme doesn't work properly, he has to call it explicitly. I doubt he would accept your idea anyway.) I, for my part, don't give a damn about it. All I want is that the task should not be used, which you can have a special call for. Then I just set the garbage_collection flag in my system configuration file.

--
Erland Sommarskog
ENEA Data, Stockholm
This signature is not to be quoted.
sommar@enea.se
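The "internal task for the free list" that Erland's hypothetical critical developer demands corresponds, in C terms, to a free list protected by a lock so that concurrent callers cannot corrupt it. A minimal sketch (all names invented; Ada would use a task and rendezvous, or today a protected object, rather than a mutex):

```c
#include <pthread.h>
#include <stddef.h>

typedef struct Block {
    struct Block *next;
} Block;

static Block *free_list = NULL;
static pthread_mutex_t free_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Pop a recycled block, or NULL if the list is empty. */
static Block *fl_get(void) {
    pthread_mutex_lock(&free_list_lock);
    Block *b = free_list;
    if (b) free_list = b->next;
    pthread_mutex_unlock(&free_list_lock);
    return b;
}

/* Push a block back onto the list.  The lock round-trip is the cost
 * every caller pays, contended or not -- the overhead Erland's
 * profiler traced back to the ADT's internal synchronization. */
static void fl_put(Block *b) {
    pthread_mutex_lock(&free_list_lock);
    b->next = free_list;
    free_list = b;
    pthread_mutex_unlock(&free_list_lock);
}
```

A single-threaded client pays the synchronization cost for nothing, which is exactly why the taskless variant gets forked off in the story above.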
ryer@inmet.UUCP (01/03/89)
Currently, Ada neither requires nor precludes garbage collection. For example, the Symbolics Ada compiler provides it (as well it should), but none of the eight cross-compilers for the 1750A computer does (as they shouldn't).

What are you (any of you) proposing? That ALL Ada compilers should have garbage collection? That no Ada compilers should be allowed to have it? Pragmas?

Each day I find a new collection of comments on this note, debating the goodness of garbage collection in the abstract. I think that the Ada language has already taken the only defensible position, and would be happy to explain why if anyone wants to take a different concrete position.

Mike "speaking only for myself" Ryer
Intermetrics, Inc.
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/05/89)
From article <4206@enea.se>, by sommar@enea.se (Erland Sommarskog):
> I'm trying to get Bill Wolfe to explain how his deallocation-on-
> block-exit scheme would work in an example I had. [...]
> I repeat the example with some clarifications.
>
> We have:
>    generic
>       type Data_type is limited private;
>       with Assign, "<", ">";
>    package Tree_handler is
>       type Tree_type is private;
>
> Assume further that we instantiate this package with Some_access_type;
> it could be an imported limited type, or it could be locally defined.
> Then we have the procedure:
>
>    procedure Insert_data_to_tree(Tree              : in out Tree_type;
>                                  Data_of_some_sort : in     Any_type) is
                                                              ^^^^^^^^
You mean Data_type, right?

>       Reference : Some_access_type;

How about "access Data_Type"? It is illegal to reference Some_access_type directly, since it is not a generic formal parameter. We can only reference Data_Type, which in this particular instantiation will refer to Some_access_type.

>    begin
>       Reference := new Some_type;  -- Or subprogram call for external type

Let's say "Reference := new Data_Type;", to make things legal.

>       Prepare(Reference, Data_of_some_sort);

Prepare is not a generic parameter; I frankly don't see why you need a call to such a procedure anyway. I'll replace it with "Assign (Reference.all, Data_of_some_sort);".

>       Insert(Tree, Reference);

What exactly is Insert doing? It would make more sense to write:

   begin
      -- find the node for which the new value is to
      -- become either the leftmost or rightmost child,
      -- create a new child, and assign the new data value
      -- to the appropriate field of the new node. At some
      -- point in this process, there may be some pointer
      -- manipulations; it really doesn't matter where. But
      -- let's assume the tree is initially empty.
Then,

      TREE := new BINARY_TREE_NODE;
      ASSIGN (TREE.DATA_VALUE, DATA_OF_SOME_SORT);
      -- since pointer types are automatically initialized to
      -- null, we don't have to worry about initializing
      -- the left child and right child fields, so we're done.
   end INSERT;

> End Insert_data_to_tree;
>
> The Assign procedure which we provide when we instantiate the tree
> handler is simply:
>    procedure Assign(A : in out Some_access_type; B : in Some_access_type) is
>    begin
>       A := B;
>    end Assign;
>
> Now, what happens with Reference.all on exit of Insert_data_to_tree, with
> your scheme? (Assume there is no user-written DESTROY procedure.) Would
> Reference.all be deallocated although there still is a living reference
> to it in Tree?

First of all, I always require the user to pass in a DESTROY procedure. So my initial reaction is that this is a malformed ADT. But let's assume the existence of such a malformed ADT. We'll assume further that we have a pointer variable such as Reference. Destruction of a pointer consists simply of destroying the pointer, and does not imply destruction of what it points to. Destruction of an ADT which happens to be implemented via an initial pointer (as in Tree) does imply destroying the transitive closure of what it points to.

>>> I forgot: when the system chokes with "heap full" and you as a maintainer
>>> are trying to find where all the space is being wasted. With garbage
>>> collection you at least know that all this memory is being used. Without,
>>> you don't know if it is just memory leakage or real overuse.
>>
>> Advanced debugging tools should be used to address this issue.
>
> So why can't these "advanced debugging tools" be used to "pin down
> the exact state of the system" when we have garbage collection?

They can, but only the exact state of one particular execution, and not the precise state of the theoretical system in general. A debugging tool exists simply to help you find errors which show up in one particular situation.
It does not help you understand the piece of software as a theoretical structure, which is essential if one is to perform nontrivial modifications to a system. It is this theoretical imprecision which causes problems.

>> I'm quite aware of Eiffel, and I have just shown (in another article)
>> how reliance upon garbage collection is a sure way to get your ADT
>> rejected by users who are programming critical applications.
>
> There is a good chance that your ADT will be rejected anyway. The critical
> developer looks at you and says: "You use Unchecked_deallocation? Sorry,
> then I can't use your ADT. Memory fragmentation will cause too great a
> time penalty when the system has lived for a while."

Allocation/deallocation routines exist which automatically collapse adjacent blocks of free storage into larger blocks, which minimizes the problem. In general, this is the *compaction* question, which is orthogonal to the GC question. It is properly directed at the allocation/deallocation system and not at the ADT designer. It is conceivable that an allocation/deallocation system could be designed in such a way as to spread the costs of compaction evenly over each allocation/deallocation request, thus meeting all requirements at the same time.

Bill Wolfe
wtwolfe@hubcap.clemson.edu
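The coalescing allocators alluded to above keep the free list in address order and merge physically adjacent blocks at deallocation time. A minimal C sketch (block layout and sizes are invented for illustration; real allocators use boundary tags and per-size bins):

```c
#include <stddef.h>

typedef struct Free {
    size_t size;         /* bytes in this free block, header included */
    struct Free *next;   /* next free block, in address order */
} Free;

static Free *free_head = NULL;

/* Return a block to the free list, merging it with a physically
 * adjacent successor and/or predecessor to undo fragmentation. */
static void coalescing_free(Free *blk) {
    Free **p = &free_head;
    while (*p && *p < blk)            /* keep the list in address order */
        p = &(*p)->next;
    blk->next = *p;
    *p = blk;
    /* merge with the successor if it starts right where we end */
    if (blk->next && (char *)blk + blk->size == (char *)blk->next) {
        blk->size += blk->next->size;
        blk->next = blk->next->next;
    }
    /* merge with the predecessor if we start right where it ends */
    if (p != &free_head) {
        Free *prev = (Free *)((char *)p - offsetof(Free, next));
        if ((char *)prev + prev->size == (char *)blk) {
            prev->size += blk->size;
            prev->next = blk->next;
        }
    }
}
```

Each free costs a list walk plus at most two merges, which is one way of "spreading the cost of compaction over each deallocation request", though it defragments the free list rather than moving live data the way a true compactor does.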
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/05/89)
From article <124000028@inmet>, by ryer@inmet.UUCP:
> Currently, Ada neither requires nor precludes garbage collection. [...]
> What are you (any of you) proposing? That ALL Ada compilers should have
> garbage collection? That no Ada compilers should be allowed to have it?
> Each day I find a new collection of comments on this note, debating the
> goodness of garbage collection in the abstract. I think that the Ada
> language has already taken the only defensible position, and would be
> happy to explain why if anyone wants to take a different concrete position.

OK, I'll take the position that Ada should be defined such that no Ada compilers should be allowed to have garbage collection.

I charge that GC encourages the production of sloppy code which is difficult to maintain and which compiles into permanently inefficient programs; that code can be written by space-conscious programmers at least as fast as by those who use GC as a crutch; that such code will require less debugging time, due to the more complete understanding of the problem which is possessed by a programmer who comprehends all aspects (both space and time) of his/her code; and that GC retards portability by making code non-reusable from the perspective of real-time programmers. For all these reasons, I conclude that GC should be prohibited, just as the GOTO should be (but unfortunately, is not yet) prohibited.

Your turn.

Bill Wolfe
wtwolfe@hubcap.clemson.edu
sommar@enea.se (Erland Sommarskog) (01/06/89)
Mike Ryer (ryer@inmet.UUCP) writes:
> Currently, Ada neither requires nor precludes garbage collection. For
> example, the Symbolics Ada compiler provides it (as well it should), but
> none of the eight cross-compilers for the 1750A computer does (as they
> shouldn't).
>
> What are you (any of you) proposing? That ALL Ada compilers should have
> garbage collection? That no Ada compilers should be allowed to have it?
> Pragmas?

I know next to nothing about the 1750A, but I assume it's typically used for those "embedded systems" that Ada was intended for. I can agree that garbage collection is of virtually no interest on such a machine and not worth the development costs. On the other hand, if I want to port my interactive program from VAX/VMS to SunOS, I can't with the rules of today rely on garbage collection, but have to stick to Unchecked_deallocation. So it would be nice if garbage collection were required to be available, or at least recommended.

How it should be controlled (on or off) is a trickier question. Pragmas are maybe not the best of ideas, since to some extent this applies to the entire system. Of course, if practice turns out to be that all compilers except those for strict real-time use have garbage collection, then such a rule in the LRM is unnecessary. Unfortunately, this doesn't seem to be the case today.

--
Erland Sommarskog
ENEA Data, Stockholm
This signature is not to be quoted.
sommar@enea.se
ryer@inmet.UUCP (01/07/89)
Why some Ada compilers should be allowed to perform garbage collection:

a. Because they are to generate code that coexists in an environment with languages like LISP where garbage collection is assumed. They want to operate on data structures shared with other languages.

b. Because the current world supply of "reusable" Ada components has not all been written carefully for space management, and in prototyping any reusable component (that works) may be better than none.

c. Because some users are very unconcerned with the efficiency or reusability of their programs. This applies to run-only-once programs, such as in a training environment, or where a computer is used as an oversized desk calculator just to obtain numeric answers. It may be that all computer science students should become careful implementers of ADTs. I doubt that all particle physicists need to develop this discipline, even though they may write substantial programs.

d. Because it IS FEASIBLE for the computer to do a good job of garbage collection automatically, and it does reduce the size of the source code, which does tend to reduce the life-cycle cost of owning that code. It may increase execution time, but some users will prefer this tradeoff.

Therefore, garbage collection should not be prohibited, even though it is inappropriate and harmful in some cases. Different applications can benefit from different compilers.

Mike Ryer
Intermetrics
sommar@enea.se (Erland Sommarskog) (01/07/89)
For the third time I ask Bill Wolfe how his idea of deallocation on block exit would work. From his last answer, I realized that due to unclarity on my side, he may have misunderstood my example. My apologies to you all, and Bill in particular. So I try again, and see if I can get it right. We have:

   generic
      type Data_type is limited private;
      with Assign, "<", ">";   -- What you expect
   package Binary_trees is
      type Tree_type is private;
      procedure Insert(Tree : in out Tree_type;
                       Data : in     Data_type);
      ...
   end Binary_trees;

   package Use_trees is   -- Could be a procedure. Could be generic.
      type Any_type;      -- Could also be a generic parameter, or imported.
      type Some_type;
      type Some_access_type is access Some_type;

      procedure Assign(A : in out Some_access_type;
                       B : in     Some_access_type) is
      begin
         A := B;
      end Assign;
      -- Also add "<" and ">" to mean comparison on Any_type here.

      package Tree_handler is new Binary_trees(Some_access_type, Assign);
      ...
      procedure Put_something_in_tree(Tree     : in out Tree_type;
                                      Anything : in     Any_type) is
         Reference : Some_access_type;
      begin
         Reference := new Some_type;
         Do_something(Reference.all, Anything);
         Tree_handler.Insert(Tree, Reference);
      end Put_something_in_tree;
      ...
   end Use_trees;

Question: what happens with Reference.all when Put_something_in_tree is exited? I read your proposal as saying that it should be deallocated, which would be disastrous. Or is this a misunderstanding on my side? If it is not: is this code erroneous with your scheme?

Side note: you said that the tree package should take a Destroy procedure. For what benefit? You can't automatically call it when your Destroy_tree is called. The caller might refer to the stored objects in some other place. Or does your Destroy_tree have a special parameter to control this? It seems like adding two parameters (one generic and one routine) that aren't necessary, but clutter up the user's code. The user can always traverse the tree and destroy what he has there.

--
Erland Sommarskog
ENEA Data, Stockholm
This signature is not to be quoted.
sommar@enea.se
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/09/89)
From article <4222@enea.se>, by sommar@enea.se (Erland Sommarskog):
> [Still discussing Erland's example regarding automatic destruction
> of a local environment upon block exit. Erland now states precisely
> that he is instantiating a tree which stores objects of type pointer;
> in other words, we are dealing with a tree of pointers. The example
> continues with a situation in which a pointer is being inserted into
> the tree.]
>
>    procedure Put_something_in_tree(Tree     : in out Tree_type;
>                                    Anything : in     Any_type) is
>       Reference : Some_access_type;
>    begin
>       Reference := new Some_type;
>       Do_something(Reference.all, Anything);
>       Tree_handler.Insert(Tree, Reference);
>    end Put_something_in_tree;
>
> Question: what happens with Reference.all when Put_something_in_tree
> is exited? I read your proposal as saying that it should be deallocated,
> which would be disastrous. Or is this a misunderstanding on my
> side? If it is not: is this code erroneous with your scheme?

It is a misunderstanding on your side. I stated in the preceding version of the example (or maybe the first version, or both) that since Reference is a simple pointer, destruction of Reference consists only of eliminating the space occupied by the pointer, and does not imply destruction of what Reference points to. Now let's consider:

   procedure EXAMPLE (INPUT : SOME_ADT) is
      TEMP : SOME_ADT;
         -- now TEMP happens to be *implemented*
         -- as a pointer to TEMP's descriptor,
         -- which in turn contains other pointers
         -- to TEMP's value. But since TEMP is
         -- limited private, our user doesn't know that.
      ACCESS_AN_INTEGER : access INTEGER;   -- a local object
   begin
      ASSIGN (TEMP, SOME_OPERATION (INPUT));
      ACCESS_AN_INTEGER := new INTEGER'(SIZE_OF (TEMP));
         -- notice the allocation...
      DO_SOMETHING_WITH (TEMP, INPUT, ACCESS_AN_INTEGER);
      DO_SOMETHING_ELSE_WITH (TEMP, ACCESS_AN_INTEGER);
         --
         -- notice the failure to deallocate ACCESS_AN_INTEGER.all...
         --
   end EXAMPLE;
   -- OK, now at this point we are done with TEMP.
It is a -- local variable, and therefore its DESTROY procedure -- will be automatically invoked. We are also done with -- ACCESS_AN_INTEGER, so ACCESS_AN_INTEGER will also be -- destroyed. But since ACCESS_AN_INTEGER is only a -- pointer, this leaves the integer value to become -- garbage! We conclude that our user requires -- more programming experience, so that he/she -- might more completely comprehend the characteristics -- of the programming tool known as the pointer. > Side note: you said that the tree package should take a Destroy > procedure. For what benefit? You can't automatically call it when > your Destroy_tree is called. The caller might refer to the stored > objects in some other place. You're forgetting that the caller completely controls what happens inside that caller-supplied DESTROY procedure! If the caller so desires, the DESTROY procedure can consist of a null statement; then only the routine destruction of the space occupied by the pointer will occur, as an effect of the call to UNCHECKED_DEALLOCATION which destroys the space occupied by the record describing a given node. Secure in the knowledge that the caller is responsible for prescribing the destruction procedure, Destroy_Tree will in fact call Destroy automatically on each and every instance of a stored object. Bill Wolfe wtwolfe@hubcap.clemson.edu
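[The distinction Bill is drawing -- destroying a local pointer destroys only the pointer, never what it points to -- can be sketched in C. All names here are invented stand-ins for the Ada example above:]

```c
#include <assert.h>
#include <stdlib.h>

static int size_of_temp(void) { return 42; }   /* stand-in for SIZE_OF (TEMP) */

/* Leaky version: at the closing brace the local pointer is destroyed,
   but the heap-allocated integer it referenced becomes unreachable
   garbage -- nobody can ever free it now. */
void example_leaky(void)
{
    int *access_an_integer = malloc(sizeof *access_an_integer);
    *access_an_integer = size_of_temp();
    /* DO_SOMETHING_WITH (...);  DO_SOMETHING_ELSE_WITH (...); */
}

/* Corrected version: deallocate explicitly before block exit. */
int example_correct(void)
{
    int *access_an_integer = malloc(sizeof *access_an_integer);
    *access_an_integer = size_of_temp();
    int result = *access_an_integer;
    free(access_an_integer);      /* the programmer's responsibility */
    return result;
}
```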
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/09/89)
From article <124000031@inmet>, by ryer@inmet.UUCP: > > Why some Ada compilers should be allowed to perform garbage collection: > > a. Because they are to generate code that coexists in an environment with > languages like LISP where garbage collection is assumed. They want to > operate on data structures shared with other languages. Assume that the Ada code cleans up after itself. Then that code can exist perfectly in a Lisp environment; since it does not contribute to the garbage problem, the Ada code will be a very welcome guest indeed. > b. Because the current world supply of "reusable" Ada components have > not all been written carefully for space management, and in prototyping > any reusable component (that works) may be better than none. However, I'll bet that most have. At any rate, once New Ada emerges, you can bet there will be an immediate upgrading taking place among all the component vendors; the upgraded components will be available long before the New Ada compilers. > c. Because some users are very unconcerned with the efficiency or > reusability of their programs. This applies to the run-only-once > programs such as in a training environment, or where a computer is used > as an oversized desk calculator just to obtain numeric answers. It may > be that all computer science students should become careful implementers > of ADTs. I doubt that all particle physicists need to develop this > discipline even though they may write substantial programs. Let's face it: with generics, multitasking, and so on, Ada is a language for professionals. Most nonprofessionals should be using application-specific languages, which are implemented in... Ada. Particle physicists usually require high-performance software, and they should therefore turn to professional application developers. > d. 
It Is FEASIBLE for the computer to do a good job of garbage collection > automatically and it does reduce the size of the source code which does > tend to reduce the life cycle cost of owning that code. It may increase > execution time but some users will prefer this tradeoff. The reduction in size is trivial. Additionally, I contend that there are positive side effects with regard to a more complete programmer understanding of the problem, which leads to better overall code. Furthermore, the use of the ADT paradigm, in conjunction with the automatic destruction of local environments, means that application programmers will almost never have to deal with deallocation on an explicit basis anyway. This will provide positive incentive to use professionally designed ADTs, leading once again to higher-quality code. > Therefore, garbage collection should not be prohibited, even though it > is inappropriate and harmful in some cases. I'll heartily agree that GC is inappropriate and harmful, but you still haven't proven your case. Bill Wolfe wtwolfe@hubcap.clemson.edu
barmar@think.COM (Barry Margolin) (01/09/89)
In article <4041@hubcap.UUCP> billwolf@hubcap.clemson.edu writes:
>    ASSIGN (TEMP, SOME_OPERATION (SOME_ADT));
>    ACCESS_AN_INTEGER := new INTEGER'(SIZE_OF (TEMP));
>    -- notice the allocation...
>    DO_SOMETHING_WITH (TEMP, SOME_ADT, ACCESS_AN_INTEGER);
>    DO_SOMETHING_ELSE_WITH (TEMP, ACCESS_AN_INTEGER);
>    --
>    -- notice failure to deallocate ACCESS_AN_INTEGER.all...

But what if DO_SOMETHING_WITH and/or DO_SOMETHING_ELSE_WITH makes a copy of its third parameter in some permanent data structure? If ACCESS_AN_INTEGER.all were to be deallocated at this point, you would end up with a dangling pointer. Or should these routines be required to copy THIRD_ARGUMENT.all, too?

My understanding is that the Ada designers didn't originally espouse manual deallocation (hence the name UNCHECKED_DEALLOCATION) because it is an easy way to write erroneous programs, and there's no way to statically check for this error at compile time. Forgetting to deallocate something also produces an incorrect program (in non-GC environments), but the failure modes are less frequent. Also, there's a relatively simple way to fix the out-of-memory problem that results: add more memory; finding dangling references is often much harder.

I've used both GC and non-GC languages. I've done lots of programming in PL/I, and now I'm a Lisp programmer. Personally, I've had few problems with dangling pointers in PL/I, but most of the programming didn't make use of ADT-style programming. The one ADT-style PL/I program I was involved with (the Multics mail system utilities) solved its problem by using reference counts in all the data structures (luckily, there are no potential circular reference chains).

But I've also seen the freedom allowed by a garbage collector in the Symbolics Lisp Machine software. It means that unrelated programs can share data structures without preplanning. A fundamental assumption of manual deallocation is that when the allocator of a structure is through with it, so is everyone else.
But when references to these structures are passed around the references may be copied. Unless you want to require everyone to copy those structures you can't assume that the allocator knows when it's safe to deallocate. By the way, for systems programming and performance purposes, the Symbolics environment provides a number of manual allocation/deallocation mechanisms. A facility called "resources" allows you to maintain a free-list of similar structures, and allocate and deallocate from that list instead of from the heap. This tends to be used for large structures that are allocated and become garbage at a very high rate, such as network packets and disk buffers. There's even a primitive for deallocating a structure from the heap, although in the current implementation it does nothing if the structure isn't the youngest object in its region of the heap, because it simply moves the region's allocation pointer to the beginning of the object (if it were to do this with a non-youngest object, it would deallocate younger objects in the process); I've never actually seen this facility used, though, and I think it's only there for compatibility with PDP-10 MacLisp (the MacLisp GC put garbage onto a free-list, and there was a primitive to add an arbitrary object to this). Barry Margolin Thinking Machines Corp. barmar@think.com {uunet,harvard}!think!barmar
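[Barry's "resources" can be sketched in C: a free-list of similar structures, reused in preference to the heap. The packet type and names are invented for illustration:]

```c
#include <assert.h>
#include <stdlib.h>

/* A "resource" in the Symbolics sense: retired packets go onto a
   free-list and are handed back out before the heap is consulted. */
typedef struct packet {
    struct packet *next_free;    /* link used only while on the free-list */
    char data[1500];
} packet;

static packet *free_list = NULL;

packet *allocate_packet(void)
{
    if (free_list != NULL) {
        packet *p = free_list;          /* reuse a retired packet... */
        free_list = p->next_free;
        return p;
    }
    return malloc(sizeof(packet));      /* ...falling back to the heap */
}

void deallocate_packet(packet *p)
{
    p->next_free = free_list;           /* push back onto the free-list */
    free_list = p;
}
```

As Barry says, this pays off for structures that are allocated and become garbage at a very high rate, since the common case never touches the heap allocator at all.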
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/10/89)
From article <35278@think.UUCP>, by barmar@think.COM (Barry Margolin):
> In article <4041@hubcap.UUCP> billwolf@hubcap.clemson.edu writes:
>>    ASSIGN (TEMP, SOME_OPERATION (SOME_ADT));
>>    ACCESS_AN_INTEGER := new INTEGER'(SIZE_OF (TEMP));
>>    -- notice the allocation...
>>    DO_SOMETHING_WITH (TEMP, SOME_ADT, ACCESS_AN_INTEGER);
>>    DO_SOMETHING_ELSE_WITH (TEMP, ACCESS_AN_INTEGER);
>>    --
>>    -- notice failure to deallocate ACCESS_AN_INTEGER.all...
>
> But what if DO_SOMETHING_WITH and/or DO_SOMETHING_ELSE_WITH makes a
> copy of its third parameter in some permanent data structure? If
> ACCESS_AN_INTEGER.all were to be deallocated at this point, you would
> end up with a dangling pointer. Or should these routines be required
> to also copy THIRD_ARGUMENT.all, too?

Some of you appear to be laboring under the false impression that when one passes a pointer into a procedure, one will have no idea what is to be done with that pointer. Let me assure you that before I pass a pointer into *any* procedure, that procedure had damn well better declare its intentions with respect to that pointer. Some possibilities are:

   -- The pointer must be the sole pointer to the object in question.
   -- This procedure then claims sole possession of that object;
   -- the pointer you pass in will therefore be read and then set to null.

   -- The pointer will be maintained as a reference to the object in
   -- question. The caller is responsible for ensuring that the object
   -- is not destroyed without first making a call to REMOVE_OBJECT
   -- in order to destroy this reference to the object in question.

   -- The pointer must be the sole pointer to the object in question.
   -- This procedure will read the object pointed to, and then destroy it;
   -- the pointer you pass in will therefore be read and then set to null.

   ***********************************************
   **  The answer, my friends, is DOCUMENTATION. **
   ***********************************************

> My understanding is that the Ada designers didn't originally espouse
> manual deallocation (hence the name UNCHECKED_DEALLOCATION) because it
> is an easy way to write erroneous programs, and there's no way to
> statically check for this error at compile time.

Failure to deallocate is a logic error; like other logic errors, it is best detected by desk checking (the primary line of defense), and by using a debugger as a last resort.

> I've used both GC and non-GC languages. I've done lots of programming
> in PL/I, and now I'm a Lisp programmer. Personally, I've had few
> problems with dangling pointers in PL/I, but most of the programming
> didn't make use of ADT-style programming.

Personally, I've had no problems with dangling pointers in Ada, due to the complexity management provided by the ADT paradigm.

> But I've also seen the freedom allowed by a garbage collector in the
> Symbolics Lisp Machine software. It means that unrelated programs can
> share data structures without preplanning.

You mean "without documentation". Thanks, but no thanks.

> A fundamental assumption
> of manual deallocation is that when the allocator of a structure is
> through with it, so is everyone else. But when references to these
> structures are passed around the references may be copied. Unless you
> want to require everyone to copy those structures you can't assume
> that the allocator knows when it's safe to deallocate.

Sure I can, provided the lost art of documentation is practiced from time to time... (Gee, I sure hope I don't ever have to program with YOUR colleagues...)

Bill Wolfe
wtwolfe@hubcap.clemson.edu
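[The first of Bill's documented conventions -- "this procedure claims sole possession; the pointer you pass in will be read and then set to null" -- might look like this in C, with invented names:]

```c
#include <assert.h>
#include <stdlib.h>

static int *possession = NULL;     /* the callee's private reference */

/* Contract (documented, as Bill insists): the argument must be the
   sole pointer to the object; this procedure takes sole possession,
   and the caller's pointer is set to null. */
void take_sole_possession(int **ref)
{
    possession = *ref;
    *ref = NULL;    /* the caller can no longer reach -- or free -- it */
}
```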
barmar@think.COM (Barry Margolin) (01/10/89)
In article <4050@hubcap.UUCP> billwolf@hubcap.clemson.edu writes: >A fundamental assumption >> of manual deallocation is that when the allocator of a structure is >> through with it, so is everyone else. But when references to these >> structures are passed around the references may be copied. Unless you >> want to require everyone to copy those structures you can't assume >> that the allocator knows when it's safe to deallocate. > > Sure I can, provided the lost art of documentation is practiced > from time to time... (Gee, I sure hope I don't ever have to > program with YOUR colleagues...) I had a feeling you'd say that. Unfortunately, at the time, I couldn't think of a better example. Here's one (I hope): There are several programs running in a common address space. Call them COMMAND_PROCESSOR, COMPILER, and EDITOR. There is a global data structure, COMPILER_WARNINGS; the operations on this database include adding warnings, selecting warnings based on various criteria, and deleting selected warnings from the database. COMPILER adds warnings, as one would expect. EDITOR has operations that make use of the data in COMPILER_WARNINGS; for instance, you can display the warnings for the current file and have the editor position you to the appropriate source line. And COMMAND_PROCESSOR has a command to clear out COMPILER_WARNINGS (which selects all warnings and then deletes them). The natural way to implement the COMPILER_WARNINGS database is as an array or list of pointers to warning structures. The ADD operation copies the pointer it's given, and is documented so that the caller knows not to try to deallocate the object it added. SELECT simply returns newly allocated pointers to the selected warnings. DELETE obviously should deallocate the pointer and remove it from COMPILER_WARNINGS, but what about pointer.all?
If someone has called SELECT and it has returned a pointer to a particular warning, DELETE should not deallocate pointer.all, because there is another handle to it. EDITOR should be permitted to continue to display and manipulate selected warnings even if they've since been deleted from the global database. Reference counts can be used to make this work. Every time a warning is selected, its reference count can be incremented, and it can be decremented when it is DELETEd, and DELETE can deallocate pointer.all when it drops below 0. Callers of SELECT are required to DELETE all the selected warnings when they are done with them. There would also need to be a different DELETE operation to specify that it should be removed from the COMPILER_WARNINGS database, to be used by the COMMAND_PROCESSOR command. But I don't think this is what you are advocating. Can this be solved with only documentation? Or is this an ugly example that only a Lisp aficionado would think is clean? Barry Margolin Thinking Machines Corp. barmar@think.com {uunet,harvard}!think!barmar
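[Barry's reference-counted warnings can be sketched in C. Names are invented, and this sketch uses the common "deallocate when the count reaches zero" convention, counting the database's own handle as one reference:]

```c
#include <assert.h>
#include <stdlib.h>

typedef struct warning {
    int refcount;
    const char *text;
} warning;

warning *add_warning(const char *text)    /* ADD: the database's own handle */
{
    warning *w = malloc(sizeof *w);
    w->refcount = 1;
    w->text = text;
    return w;
}

warning *select_warning(warning *w)       /* SELECT: a new handle, e.g. EDITOR's */
{
    w->refcount++;
    return w;
}

int delete_warning(warning *w)            /* DELETE: returns 1 if deallocated */
{
    if (--w->refcount == 0) {
        free(w);
        return 1;
    }
    return 0;     /* someone still holds a handle; the warning survives */
}
```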
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/10/89)
From article <35300@think.UUCP>, by barmar@think.COM (Barry Margolin): > There are several programs running in a common address space. Call > them COMMAND_PROCESSOR, COMPILER, and EDITOR. There is a global data > structure, COMPILER_WARNINGS; the operations on this database include > adding warnings, selecting warnings based on various criteria, and > deleting selected warnings from the database. COMPILER adds warnings, > as one would expect. EDITOR has operations that make use of the data > in COMPILER_WARNINGS; for instance, you can display the warnings for > the current file and have the editor position you to the appropriate > source line. And COMMAND_PROCESSOR has a command to clear out > COMPILER_WARNINGS (which selects all warnings and then deletes them). > % The natural way to implement the COMPILER_WARNINGS database is as an % array or list of pointers to warning structures. The ADD operation % copies the pointer it's given, and is documented so that the caller % knows not to try to deallocate the object it added. SELECT simply % returns newly allocated pointers to the selected warnings. DELETE % obviously should deallocate the pointer and remove it from % COMPILER_WARNINGS, but what about pointer.all? % % If someone has called SELECT and it has returned a pointer to a % particular warning, DELETE should not deallocate pointer.all, because % there is another handle to it. EDITOR should be permitted to continue % to display and manipulate selected warnings even if they've since been % deleted from the global database. [...] % % Can this be solved with only documentation? Or is this an ugly % example that only a Lisp aficionado would think is clean? It's in severe need of a cleanup. If I understand the problem correctly, we have a command processor, a compiler, an editor, and some set of files upon which the compiler and editor are to operate. When a file is sent through the compiler, the compiler generates a warning structure with respect to that file.
The editor then makes use of these warning structures during debugging. Furthermore, a command exists whereby the warning structures for given files are destroyed, and this command must not interfere with editor processes which are still alive. Let's assume the existence of a suitably implemented warning container, which has been appropriately hardened for use as a shared variable. Our compiler proceeds by read-locking the file and write-locking the warning structure; having acquired the locks, the warning structure for that particular file is updated, and the locks are then released. Our editor proceeds by write-locking the file and read-locking the warning structure, updating the file, and then releasing all locks. Our warning destroyer proceeds by write-locking and then destroying the specified warning structure(s). That was almost too easy. Did I misunderstand the problem? Bill Wolfe wtwolfe@hubcap.clemson.edu
sommar@enea.se (Erland Sommarskog) (01/11/89)
This is where this discussion ends for my part. It has been going on far too long. It seems quite obvious that no one here, at least not me, will succeed in converting Bill Wolfe to the belief that garbage collection is good. And it does not seem like he will convert anyone either.

I'm glad to hear that Reference.all will survive the block exit. I am still a little puzzled, though. Assume that instead of a "simple pointer" we actually have a linked list, declared as (limited) private in some package. Then its Destroy routine will be called at block exit. Now, I don't know how you folks write your Assign procedures for linked lists, but I do simple pointer copying. (And I assume that Bill Wolfe does the same, since he is so keen on his ADTs being useful in a real-time environment. Copying the values and allocating memory for the new list would be as harmful as garbage collection, if not worse.) This means of course that my implicit Destroy procedure must be just NULL(*). Then I have another one for the user to call when he is tired of his list. Excuse me, but is call-destroy-on-block-exit really a useful device? It seems like adding something to the language which you should almost never use. And the same goes for having the user supply a Destroy procedure to the generic package. Since its body will often be NULL, its appearance in the instantiating package is just white noise.

(*) This isn't really true. If the reference assignment also includes updating of reference counters, we can check them in the Destroy procedure. But then we have drifted into the land of poor man's garbage collection, where you have to be very careful every time so that you are not doing something stupid.

Finally, Bill Wolfe has claimed that not having garbage collection helps the programmer to understand all aspects of his code, both time and space. Eh, there are more aspects to the code than those. For example, that it really does what it is supposed to do. It may very elegantly sweep away all unused memory and all that, but tell us that 2 + 2 = 5. No good program. Fact is, in many applications, space is seldom an important problem, if at all. All you want to be sure of is that you really got rid of all you allocated, so that your interactive user doesn't get a dump after five hours' work. And, oh, the world is not divided into ADT implementors and application programmers. It just doesn't work that way.
--
Erland Sommarskog
ENEA Data, Stockholm
This signature is not to be quoted.
sommar@enea.se
barmar@think.COM (Barry Margolin) (01/11/89)
In article <4054@hubcap.UUCP> billwolf@hubcap.clemson.edu writes: > Let's assume the existence of a suitably implemented warning container, > which has been appropriately hardened for use as a shared variable. > Our compiler proceeds by read-locking the file and write-locking the > warning structure; having acquired the locks, the warning structure > for that particular file is updated, and the locks are then released. > Our editor proceeds by write-locking the file and read-locking the > warning structure, updating the file, and then releasing all locks. > Our warning destroyer proceeds by write-locking and then destroying > the specified warning structure(s). > > That was almost too easy. Did I misunderstand the problem? First of all, whether the file is locked is immaterial to the discussion (I never actually said that the compiler compiles files -- in Lisp the compiler can also be invoked on in-core interpreted functions, and many Lisp programming environments allow in-core editor buffers to be compiled). However, I did imagine locking of the COMPILER_WARNINGS database. I expected that it would be read-locked during the SELECT operation, and write-locked during ADD and DELETE. But once SELECT returns, the lock would be released. The lock serves to protect the collection structure (either an array or linked list, probably), but not the individual elements (since there are no operations that modify an existing warning object). So, I envision that the editor proceeds by read-locking the warnings structure, selecting the appropriate warnings, and then releasing the lock. It then displays the selected warnings to the user and allows him to perform operations such as "go to line referenced in next warning". In many traditional systems the warnings database is a text file containing the output of the compiler, and the SELECT operation works by reading this file and parsing it into a set of warning structures internal to the editor.
The command processor's DELETE_WARNINGS operation is then just a Delete File command. However, this mechanism effectively involves a copy operation (the contents of the warning file are copied into the editor's memory), and it is this copy that I'm trying to avoid by having the editor and compiler in the same address space and allowing them to share the warning object structures. If we do put read and write locks on the individual warning objects, we have effectively implemented reference counts. The multiple-reader locking schemes I'm familiar with increment a counter or add an element to some list every time a reader grabs the lock, and do the opposite when unlocking, so that they will know when all the readers are gone. So, if your answer is that reference counts are the right solution, just say so. It's possibly the case that all well-designed applications can be implemented using either manual or ref-count-based deallocation. However, reference counts add a significant fixed overhead to every assignment of a ref-count-protected data type; what was probably a single instruction becomes:

   if target not null
      decrement target.refcount
      if target.refcount = 0 then deallocate target.all
   increment source.refcount
   target = source

Most modern GC schemes have time overhead that is a function (often linear) of the frequency of allocation. Since assignments are always more frequent than allocations, and I suspect usually MUCH more frequent, this difference is important. Ada prevents many of the worst types of pointer problems by only allowing pointers to be created by the heap allocation primitive (NEW). In other languages, where pointers to existing objects may be created (C's "&", PL/I's "ADDR") things can get pretty bad without a GC. I ran into such a problem last year when I was trying to use a subroutine library originally designed for display of a database in a facility for editing that database.
The database is implemented as a text file, with each line representing a record, and each field of the record separated by a delimiter character. The programs are in C, and the subroutine for reading the database into memory read each line into a newly allocated string, replaced each delimiter with a null character (C's string terminator), and allocated structures containing pointers to the first character of each field. This was fine for the read-only applications, as they could simply deallocate all the strings and record structures that were allocated (they didn't actually bother, since the programs run on Unix and they depended on everything being deallocated when the program terminated). I tried to write a program that reads in this database and then allows the user to edit fields. If the new field value is shorter than the original, I simply overwrote the original value; if not, I allocated a new string for the new value, and changed the record's field pointer to point there instead of into the middle of the original line. But when it came time to deallocate, I needed to know whether individual fields needed to be deallocated independently of the lines. Had I gotten around to finishing this program, I probably would have added a parallel data structure containing flags indicating whether each field had been reallocated. In general, what I see coming out of this is that manual deallocation may always be possible, but it frequently requires a non-trivial amount of extra coding to provide bookkeeping to let the program know when it is safe to deallocate things. Remember, every additional line of code in a program increases its complexity and is another potential failure point. 
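[Barry's parallel-flags workaround, sketched in C with invented names: beside each record's field pointers, keep a flag saying whether the field still points into the original line buffer or owns a freshly allocated string:]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FIELDS 8

typedef struct record {
    char *line;                     /* the original malloc'd line */
    char *fields[MAX_FIELDS];       /* pointers into line, or fresh strings */
    int   reallocated[MAX_FIELDS];  /* 1 if fields[i] must be freed itself */
    int   nfields;
} record;

static char *dup_str(const char *s)
{
    char *d = malloc(strlen(s) + 1);
    strcpy(d, s);
    return d;
}

void set_field(record *r, int i, const char *value)
{
    if (strlen(value) <= strlen(r->fields[i])) {
        strcpy(r->fields[i], value);      /* fits: overwrite in place */
    } else {
        if (r->reallocated[i])
            free(r->fields[i]);           /* retire a previous fresh string */
        r->fields[i] = dup_str(value);    /* longer: allocate a new string */
        r->reallocated[i] = 1;
    }
}

void free_record(record *r)
{
    for (int i = 0; i < r->nfields; i++)
        if (r->reallocated[i])
            free(r->fields[i]);           /* free only the fields we own */
    free(r->line);
}
```

The flag array is exactly the "extra coding" Barry complains about: it exists only to tell the deallocator what it is safe to free.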
And details such as storage management are generally uninteresting to the application programmer, who is probably more interested in the problem domain, so it is likely that he won't be as careful when designing, coding, and testing them (storage management is also difficult to test, unless you or the runtime library include extra mechanisms just to improve debuggability). Garbage collectors, on the other hand, are usually implemented by system programmers who find them interesting and can be expected to care enough to do the very best. And if you don't think these psychological factors are important, then you're working with programmers of a different species than I'm familiar with (that reminds me, I wonder where my "Psychology of Computer Programming" is -- probably in my parents' house, unfortunately, or I'd look up references), and I wouldn't want to work with them! In conclusion, I expect that I can manage storage as well as you can. And I could also manually convert a number to a character string and send these characters to an I/O device. Every major language provides runtime support for the latter so we don't all have to waste our time writing it. What's so special about storage management that we should all be burdened with it? Barry Margolin Thinking Machines Corp. barmar@think.com {uunet,harvard}!think!barmar
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/12/89)
From article <35328@think.UUCP>, by barmar@think.COM (Barry Margolin): > First of all, whether the file is locked is immaterial to the > discussion (I never actually said that the compiler compiles files -- > in Lisp the compiler can also be invoked on in-core interpreted > functions, and many Lisp programming environments allow in-core editor > buffers to be compiled). Whatever is being processed, be it a file or something else, would have to be locked. If it is already inaccessible to all other processes, then it is continuously locked already. > [discussion of copying vs. read-locking] All editors I know of work by copying the targeted file into memory, holding it there while it's being modified, and then writing it back to the file. Another approach is the locking mechanism. Since compiler warnings generally amount to very small text files, I'd probably go the copying route if locking/unlocking consumed too much time. However, a good argument can also be made that there should not be 3 million people simultaneously editing and/or compiling the same file anyway, so it probably makes little difference which method is chosen. > Most modern GC schemes have time overhead that is a function (often > linear) of the frequency of allocation. Since assignments are > always more frequent than allocations, and I suspect usually MUCH more > frequent, this difference is important. No, not just the frequency of allocation. GC's performance also depends upon the frequency of running out of memory. Furthermore, GC is a global mechanism, and it wastes much time scanning space which is already being properly managed. > the subroutine for reading the database into memory read each line > into a newly allocated string, replaced each delimiter with a null > character (C's string terminator), and allocated structures containing > pointers to the first character of each field.
This was fine for the > read-only applications, as they could simply deallocate all the > strings and record structures that were allocated (they didn't > actually bother, since the programs run on Unix and they depended on > everything being deallocated when the program terminated). I tried to > write a program that reads in this database and then allows the user > to edit fields. If the new field value is shorter than the original, > I simply overwrote the original value; if not, I allocated a new > string for the new value, and changed the record's field pointer to > point there instead of into the middle of the original line. But when > it came time to deallocate, I needed to know whether individual fields > needed to be deallocated independently of the lines. Had I gotten > around to finishing this program, I probably would have added a > parallel data structure containing flags indicating whether each field > had been reallocated. Why was each line read into a newly allocated monolithic string, with pointers into this string? It would seem far more sensible to read each *field* into a newly allocated string; then when we need to revise a field to a larger value, deallocate the old string and allocate a new one. Flags are not necessary. > In general, what I see coming out of this is that manual deallocation > may always be possible, but it frequently requires a non-trivial > amount of extra coding to provide bookkeeping to let the program know > when it is safe to deallocate things. Remember, every additional line > of code in a program increases its complexity and is another potential > failure point. And details such as storage management are generally > uninteresting to the application programmer, who is probably more > interested in the problem domain Which is why application programmers should make use of ADTs, which encapsulate and hide the details of storage management. > In conclusion, I expect that I can manage storage as well as you can. 
> And I could also manually convert a number to a character string and > send these characters to an I/O device. Every major language provides > runtime support for the latter so we don't all have to waste our time > writing it. What's so special about storage management that we > should all be burdened with it? Conversion of a number to a character string will perform identically whether we use a system routine or code that same routine ourselves. Deallocation of storage is a one-time cost (and a small one at that) if done by the programmer. Given the implicit destruction of local environments and the use of ADTs, application programmers will practically never have to do any explicit deallocation anyway. When it is necessary, it's not that difficult. In the database example, a single line of code would suffice. Bill Wolfe wtwolfe@hubcap.clemson.edu
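[Bill's alternative, sketched in C with invented names: if every field is read into its own allocation in the first place, revising a field is uniformly "deallocate the old string, allocate the new one", and no flags are needed:]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char *dup_str(const char *s)
{
    char *d = malloc(strlen(s) + 1);
    strcpy(d, s);
    return d;
}

/* Every field owns its own storage, so replacement is always the
   same single gesture regardless of the new value's length. */
void replace_field(char **field, const char *value)
{
    free(*field);
    *field = dup_str(value);
}
```

The trade-off, of course, is one allocation per field at read time instead of one per line.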
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/12/89)
From article <4232@enea.se>, by sommar@enea.se (Erland Sommarskog): > I'm glad to hear that Reference.all will survive the block exit. > I am still a little puzzled, though. Assume that instead of a "simple > pointer" we actually have a linked list, declared as (limited) > private in some package. Then, it's Destroy routine will be called > at block exit. Now, I don't know how you folks write your Assign > procedures for linked lists, but I do simple pointer copying. I copy the values, whatever they are. If they are pointers, then pointers will be copied. If they are ADTs, then ADTs will be copied. If the user can't tolerate the costs of copying, the user is free to make his/her list a list of pointers rather than a list of objects. Pointer copying constitutes structural sharing, which is unacceptable; if a user updates the copy, that must not propagate back to the original. > Finally, Bill Wolfe has claimed that not having garbage collection > helps the programmer to understands all aspects of his code, both > time and space. Eh, there are some more aspects on the code than > so. For example, that it really does what it supposed to. It may very > elegantly sweep away all used memory and that, but tell us that > 2 + 2 = 5. No good program. Fact is, in many applications, space is > seldom an important problem, if at all. All you want to be sure of > is that you really got rid all of you allocated, so that your interactive > user donesn't get a dump after five hours work. Precisely why implicit destruction is a valuable aid to programmers, leaving them secure in the knowledge that whatever local variables they create will be automatically destroyed upon block exit, and free to concentrate on the other aspects of the application such as how much space the procedure will consume while it is active, which then feeds into the overall order-of-magnitude figure for space consumption. Bill Wolfe wtwolfe@hubcap.clemson.edu
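[The copying ASSIGN Bill describes -- no structural sharing, so updating the copy can never propagate back to the original -- might look like this for a linked list in C (type and names invented):]

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *next;
} node;

/* Deep-copy assignment: the result shares no nodes with the source. */
node *assign(const node *src)
{
    node *head = NULL;
    node **tail = &head;
    for (; src != NULL; src = src->next) {
        *tail = malloc(sizeof(node));   /* fresh node per element */
        (*tail)->value = src->value;
        (*tail)->next = NULL;
        tail = &(*tail)->next;
    }
    return head;
}
```

As Bill notes, a user who cannot afford the copying is free to build a list of pointers instead of a list of objects.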
barmar@think.COM (Barry Margolin) (01/12/89)
In article <4066@hubcap.UUCP> billwolf@hubcap.clemson.edu writes:
>From article <35328@think.UUCP>, by barmar@think.COM (Barry Margolin):
>> First of all, whether the file is locked is immaterial to the
>> discussion (I never actually said that the compiler compiles
>> files -- in Lisp the compiler can also be invoked on in-core
>> interpreted functions, and many Lisp programming environments
>> allow in-core editor buffers to be compiled).
>
>   Whatever is being processed, be it a file or something else,
> would have to be locked. If it is already inaccessible to all
> other processes, then it is continuously locked already.

The reason I said it was immaterial was that we are discussing
garbage collection and structure sharing, not file management.

>> [discussion of copying vs. read-locking]
>
>   All editors I know of work by copying the targeted file into
> memory, holding it there while it's being modified, and then
> writing it back to the file. Another approach is the locking
> mechanism. Since compiler warnings generally amount to very small
> text files, I'd probably go the copying route if locking/unlocking
> consumed too much time. However, a good argument can also be made
> that there should not be 3 million people simultaneously editing
> and/or compiling the same file anyway, so it probably makes little
> difference which method is chosen.

And you're still harping on how the editor manages the file. The
whole point of my example was how the editor, compiler, and command
processor, running in the same address space (the compiler could be
invoked as a subroutine by the editor, for instance), share the
in-memory warnings database. 3 million people aren't editing and
compiling the same file, but one user is, and he may tell the
command processor to delete the compiler warnings before he's
finished scanning through them in the editor (because he doesn't
want them cluttering up the database if he asks for other compiler
warnings to be displayed).
>> Most modern GC schemes have time overhead that is a function
>> (often linear) of the frequency of allocation. Since assignments
>> are always more frequent than allocations, and I suspect usually
>> MUCH more frequent, this difference is important.
>
>   No, not just the frequency of allocation. GC's performance also
> depends upon the frequency of running out of memory. Furthermore,
> GC is a global mechanism, and it wastes much time scanning space
> which is already being properly managed.

Real-time GC mechanisms are defined to do a bounded amount of work
PER ALLOCATION. Decent GC mechanisms can be told to ignore manually
managed space. And generational and ephemeral GC mechanisms
concentrate their efforts on the memory most likely to contain
garbage.

>   Why was each line read into a newly allocated monolithic string,
> with pointers into this string? It would seem far more sensible to
> read each *field* into a newly allocated string; then when we need
> to revise a field to a larger value, deallocate the old string and
> allocate a new one. Flags are not necessary.

Probably because the language's runtime library provides operations
for reading by lines and for searching strings. A program that uses
standard library routines is easier to read than one that uses its
own. I admit that the database package was not written wonderfully
(the guy who wrote it is primarily a Lisp programmer, but he was one
of the few people in our company willing to write C utilities at the
time). It's a locally used facility, not part of any product, so
high quality was not the top priority.

>   Which is why application programmers should make use of ADTs,
> which encapsulate and hide the details of storage management.

No. The application programmer must still know when to call the
ADT's DESTROY procedure, or know to call the LOCK and UNLOCK
procedures, etc.

>   Deallocation of storage is a one-time cost (and a small one at
> that) if done by the programmer. Given the implicit destruction of
> local environments and the use of ADTs, application programmers
> will practically never have to do any explicit deallocation anyway.
> When it is necessary, it's not that difficult. In the database
> example, a single line of code would suffice.

You continue with this "one-time cost" fallacy. It is a cost that
must be paid at least every time an ADT is designed and implemented,
and it also adds complexity that must be borne by the users of the
ADT. GC is truly a one-time cost (per runtime implementation).

Barry Margolin
Thinking Machines Corp.
barmar@think.com
{uunet,harvard}!think!barmar
billwolf@hubcap.clemson.edu (William Thomas Wolfe,2847,) (01/12/89)
From article <35340@think.UUCP>, by barmar@think.COM (Barry Margolin):
> 3 million people aren't editing and compiling the same file, but
> one user is, and he may tell the command processor to delete the
> compiler warnings before he's finished scanning through them in
> the editor (because he doesn't want them cluttering up the
> database if he asks for other compiler warnings to be displayed).

   Why doesn't the editor do the deletion of warnings? If the editor
is already providing a "search for next warning" function, it should
provide the "delete warning" function as well. Then we wouldn't have
to deal with shared variables at all, although your later comments
do make the whole question moot.

> Real-time GC mechanisms are defined to do a bounded amount of work
> PER ALLOCATION. [....] And generational and ephemeral GC mechanisms
> concentrate their efforts on memory most likely to contain garbage.

   Could you send me a detailed explanation of these mechanisms?

> Decent GC mechanisms can be told to ignore manually managed space.

   If this means that I can specify that my allocation requests
carry the condition that the GC mechanism will stay away from the
space I receive, then I'll be happy to change my position on garbage
collection. (Erland, you may have spoken too soon!)

   HOWEVER. There are still things that need changing. There still
needs to be implicit destruction of local environments via calls to
DESTROY routines. Parameter passing is still ambiguous with respect
to storage utilization. In general, I can live with GC being
required IFF I can also exert total control over when, how, and for
how long every piece of space is occupied, and I can determine
exactly how much space is being used implicitly and for how long.

   If what you say is true, then nothing prevents me from designing
an entire system in which GC is completely absent and for which I
can make firm statements regarding the system's time and space
characteristics; even though the compiler is required to provide GC
for those who wish to use it, the programmer can require the
compiler to provide a local environment in which GC is prohibited.
This I can live with.

Bill Wolfe
wtwolfe@hubcap.clemson.edu
barmar@think.COM (Barry Margolin) (01/15/89)
In article <4073@hubcap.UUCP> wtwolfe@hubcap.clemson.edu writes:
>   Why doesn't the editor do the deletion of warnings? If the editor
> is already providing a "search for next warning" function, it
> should provide the "delete warning" function as well. Then we
> wouldn't have to deal with shared variables at all, although your
> later comments do make the whole question moot.

Who said that the editor doesn't provide for deletion of warnings,
too? But that shouldn't prevent the command processor from also
deleting them.

>> Real-time GC mechanisms are defined to do a bounded amount of work
>> PER ALLOCATION. [....] And generational and ephemeral GC
>> mechanisms concentrate their efforts on memory most likely to
>> contain garbage.
>
>   Could you send me a detailed explanation of these mechanisms?

The canonical reference for real-time GC is a paper, about 10 years
old, by Henry Baker. I don't know the exact reference, but the title
included the phrases "Garbage Collection" and "Real-time", and it
was probably in something like a proceedings of a POPL conference.
The general idea of these mechanisms is that every time an object is
allocated, up to N words are scanned for non-garbage objects to be
copied.

My reference for ephemeral GC is David Moon's paper, approximately
titled "Garbage Collection in Large Virtual Memory Systems", which
is in the proceedings of the first (or maybe second) ACM Lisp
Conference. I don't know what the references for generational GCs
are; recent Lisp and Functional Programming Conference proceedings
are probably a good place to look.

>> Decent GC mechanisms can be told to ignore manually managed space.
>
>   If this means that I can specify that my allocation requests
> carry the condition that the GC mechanism will stay away from the
> space I receive, then I'll be happy to change my position on
> garbage collection.

Well, the GC implementation that I'm most familiar with is the
Symbolics 3600.
This architecture organizes the address space into logical entities
called "areas". Each area can be flagged as either dynamic or
static, the latter indicating that garbage should not be collected
there. The GC still has to scan through static areas to see whether
they contain references to objects in dynamic areas, though;
otherwise the GC might create a dangling reference.

In the Symbolics system, the primary use of static areas is to
prevent pure code pages, which are paged directly out of load files,
from being copied into swap space. They are also used for holding
I/O buffers, which are manually reused and therefore never become
garbage. The feature they are missing for your purposes is
sophisticated allocation and deallocation primitives; the only
primitives the Lisp Machine provides always allocate and deallocate
space at the end of an area, because the system depends on the GC to
free up interior space. But you could easily build a sophisticated
allocator that manages its own space inside a static area.

>   HOWEVER. There are still things that need changing. There still
> needs to be implicit destruction of local environments via calls
> to DESTROY routines. Parameter passing is still ambiguous with
> respect to storage utilization. In general, I can live with GC
> being required IFF I can also exert total control over when, how,
> and for how long every piece of space is occupied, and I can
> determine exactly how much space is being used implicitly and for
> how long.

Lisp-like languages provide for this with macros that encapsulate
the allocation and deallocation of objects. For example, in Common
Lisp one can write:

    (WITH-OPEN-FILE (variable file-name options...)
      body...)

This opens the specified file, assigns the stream to the variable,
executes the body, and then closes the file. And it guarantees that
the file will be closed no matter how the environment is exited.
This is generally implemented using Lisp's UNWIND-PROTECT primitive,
which allows arbitrary code to be executed when an environment is
being exited.

Barry Margolin
Thinking Machines Corp.
barmar@think.com
{uunet,harvard}!think!barmar