pcg@cs.aber.ac.uk (Piercarlo Grandi) (06/29/90)
I have posted some article advising somebody running a simulation with a probable memory leak either to get a tracing malloc, or to use automatic storage management (garbage collection). Having received a few requests for references, I think many will appreciate seeing them posted. Here they are, from my home News archives: From dld@F.GP.CS.CMU.EDU 13 Feb 89 23:30:30 GMT Path: aber-cs!dcl-cs!ukc!mcvax!uunet!lll-winken!ames!mailrus!cornell!rochester!pt.cs.cmu.edu!pt!dld From: dld@F.GP.CS.CMU.EDU (David Detlefs) Newsgroups: comp.lang.c++ Subject: RE: Garbage Collection Message-ID: <DLD.89Feb13183030@F.GP.CS.CMU.EDU> Date: 13 Feb 89 23:30:30 GMT Organization: CMU CS Department Lines: 42 Status: OR << Kevin Sullivan asks about garbage collection for C++. >> I've been doing a literature search in this area recently, and have found the following relevant references: Boehm & Weiser: "Garbage Collection in an Uncooperative Environment" Software Practice & Experience, September 88 Caplinger: "A Memory Allocator with Garbage Collection for C" Winter 88 Usenix Both of these present mark & sweep algorithms fairly compatible with the existing malloc/free/etc interface. Bartlett: "Compacting Garbage Collection with Ambiguous Roots" DEC WRL Research Report 88/2 Outlines an approach applicable to a copying collector for C. (To get, send a message to wrl-techreports@decwrl.dec.com, with body "help.") All of these methods are applicable, mutas mutandis, to C++. There is also a rumor that Chris Torek of Maryland is working on something along these lines (Chris, if you read this, could you confirm or deny to me, along with a pointer to a reference, or a description of what you're doing? Thanks...) Hopefully I've done a service. Perhaps the collective consciousness out there can help me in return. I would like to find some emprical evidence on just how much more reliable garbage collection makes programs. The ideal reference would be one that states "in project X, we found Y bugs, Z% of which were caused by improper explicit storage management. These took W% of the time to debug..." Any pointers to such a paper would be greatly appreciated. Thanks! -- Dave Detlefs Any correlation between my employer's opinion Carnegie-Mellon CS and my own is statistical rather than causal, dld@cs.cmu.edu except in those cases where I have helped to form my employer's opinion. (Null disclaimer.) -- From bartlett@decwrl.dec.com Sun Dec 10 18:14:34 1989 From: bartlett@decwrl.dec.com (Joel Bartlett) Newsgroups: comp.lang.c++ Subject: Generational, Compacting Garbage Collection for C++ Date: 8 Dec 89 16:53:38 GMT Distribution: comp.lang.c++ Organization: DEC Western Research Lab Status: OR A technical report describing a generational, compacting garbage collector for C++ is available from Digital's Western Research Lab: WRL Technical Note TN-12 Mostly-Copying Garbage Collection Picks Up Generations and C++ Joel Bartlett October 1989 The "mostly-copying" garbage collection algorithm provides a way to perform compacting garbage collection in spite of the presence of ambiguous pointers in the root set. As originally defined, each collection required almost all accessible objects to be moved. While adequate for many applications, programs that retained a large amount of storage spent a significant amount of time garbage collecting. To improve performance of these applications, a generational version of the algorithm has been designed. This note reports on this extension of the algorithm, and its application in collectors for Scheme and C++. For instructions on how to order a hard copy report or have the Postscript e-mailed to you, send a message to our technical report server at Wrl-Techreports@decwrl.dec.com with the word "help" in the subject line. jfb From chase@Ozona.UUCP 16 Nov 88 19:30:05 GMT Path: aber-cs!dcl-cs!ukc!mcvax!uunet!husc6!mailrus!ncar!ames!oliveb!Ozona!chase From: chase@Ozona.orc.olivetti.com (David Chase) Newsgroups: comp.lang.c Subject: C garbage collection Message-ID: <32696@oliveb.olivetti.com> Date: 16 Nov 88 19:30:05 GMT References: <8841@smoke.BRL.MIL> <1988Nov11.232629.15414@utstat.uucp> <20588@apple.Apple.COM> <32596@oliveb.olivetti.com> <276@esosun.UUCP> Sender: news@oliveb.olivetti.com Reply-To: chase@Ozona.UUCP (David Chase) Organization: Olivetti Research Center, Menlo Park, CA Lines: 88 Status: OR In article <276@esosun.UUCP> jackson@freyja (Jerry Jackson) writes: >(I wrote) > >>(Yes, a garbage collector for C -- >> ~ftp/sun-source/gc.shar@titan.rice.edu >>It works on Sun3s, Sun4s, Vaxes. Send mail with subject "help" to >>"archive-server@rice.edu" if you lack FTP access.) >> > >I've written garbage collectors for lisp and have a pretty good idea >what is involved... I can't imagine what this does, but I'm pretty sure >it's something very different. Could someone please explain what this >program does? Sure (I thought you'd never ask :-). The collector was written by Hans-J. Boehm, Alan J. Demers, with contributions from Mark Weiser at Xerox and suggestions from me here. The algorithms are described in Boehm, H., and M. Weiser, "Garbage Collection in an Uncooperative Environment", Software Practice & Experience, September 1988. The collector/allocator segregates objects by size (Big Bags of Pages) and keeps a free list for each size. Objects too large to fit in a page, of course, are allocated using a different strategy. The mark bits for each piece of allocated memory are stored in a "bit array" at the beginning of each page; there are no "header words" prepended to allocated memory. Pages of objects can also be tagged as "non-pointer-containing" to improve the performance of the collector. After collection, pages are scanned to rebuild free lists and empty pages. Collection itself is just trace-and-sweep, using process stack to store temporary marking information. The collector makes certain assumptions about pointer use that don't hold for all C programs; however, they hold for most C programs. These are: 1) BSS, stack, registers, and data are always "live". 2) If something "live" contains a pointer to the *beginning* of a block of allocated memory, then that object is "live". 3) Linked lists store their "next" pointer in the first word of an object (this is just an important optimization). Obviously this collector will sometimes fail to reclaim an object, because some integers will look like pointers. In practice, this is not a problem. Obviously, this collector can be tricked into reclaiming something to which a pointer is retained if the pointer is encoded. Knuth's pointer XORing trick in lists is not compatible with this collector, and adding 4 to every pointer will also confuse it. Note, however, that there is nothing wrong with encoded pointers to an object AS LONG AS AT LEAST ONE pointer to the beginning remains. Thus, this collector can be used with a "front-end allocator" if the front-end allocator keeps its pools linked together. Most C programs do retain pointers to the beginning of objects, so this usually works. How is this used with C? Typically, I replace "malloc" with "garbage-collected malloc" and "free" with "do-nothing". The garbage collector supports freeing of allocated structures, but I don't bother. If free is do-nothing, then a trivial realloc maintains compatibility with the old-style realloc that people seem to worry about so much (this is so because if you can "realloc" a pointer, then you must have a copy of it somewhere, and if you have a copy of it somewhere, then in the garbage collector's eyes it hasn't really been freed, and thus hasn't been overwritten or reused. Note that with BIBOP allocation a realloc must almost always perform a new allocation.) What have we used this with? Code, lots of it. (Read the SP&E article; they ran it with lots of stuff, also.) Thousands of lines of C and Modula-2+, generated by tools, compiled by three different compilers (M2+, cc, gcc), taken from the X library, and recycled from years ago. We are perhaps not a typical group, since M2+ requires garbage collection, and all code written recently (by us) was written with the knowledge that a collector would be used. We've only had two problems with it: 1) if you don't put your linked-list links in the first word and have BIG lists, you'll gobble up lots and lots of stack. Note that signals are turned off during collection to avoid unpleasant surprises. 2) because there are no header words attached to objects, this allocator exposes overwriting bugs that otherwise go unnoticed until memory is freed (and if memory isn't freed, then they just go unnoticed). For more information, obtain the collector and read the README file. The shar file is about 67K large. David From bet@bent 10 Jun 89 23:11:06 GMT Path: aber-cs!dcl-cs!ukc!mcvax!uunet!cs.utexas.edu!rutgers!mcnc!duke!bet From: bet@bent (Bennett Todd) Newsgroups: comp.unix.wizards Subject: Re: Need debugging version of malloc()/free() Message-ID: <14721@duke.cs.duke.edu> Date: 10 Jun 89 23:11:06 GMT References: <850@pcsbst.UUCP> Sender: news@duke.cs.duke.edu Reply-To: bet@bent (Bennett Todd) Organization: Diagnostic Physics, Raddiology, DUMC Lines: 22 Posted: Sat Jun 10 19:11:06 1989 In-reply-to: jkh@pcsbst.UUCP (jkh) Status: OR My error-checking wrappers library (libbent) does this in its wrappers around malloc/free/realloc (emalloc, efree, and erealloc, respectively). I sometimes regret having done this, since as a result of this decision, the pointers handled by emalloc/efree/erealloc aren't interchangeable with malloc/free/realloc, whereas the other cookies that are passed about (file descriptors, FILE pointers, and so forth) are interchangeable between my e* wrappers and the lower level functions. On the other hand, I haven't gotten into trouble as a result of mixing yet, and the additional checking has helped me catch some evil bugs quickly. My code has emalloc over-allocate by twice the length of a header structure, which includes a magic number and the allocation length. One copy is prepended, and one appended, and the pointer to the middle space is returned. On efree and erealloc the headers and trailers are checked; on efree they are mangled in a predictable fasion, so that double freeing can be detected. This catches double freeing, freeing bogus pointers, and running off either end of the array. I use these in all my production code. Although they don't catch problems as quickly as some other more ambitious packages out there, they also don't inflict enough overhead to make me want to leave them out of production code. -Bennett bet@orion.mc.duke.edu From roy@phri.UUCP 8 Jun 89 01:35:09 GMT Path: aber-cs!dcl-cs!ukc!mcvax!uunet!cs.utexas.edu!rutgers!cmcl2!phri!roy From: roy@phri.UUCP (Roy Smith) Newsgroups: comp.unix.wizards Subject: Re: Need debugging version of malloc()/free() Message-ID: <3795@phri.UUCP> Date: 8 Jun 89 01:35:09 GMT References: <850@pcsbst.UUCP> Reply-To: roy@phri.UUCP (Roy Smith) Organization: Public Health Research Inst. (NY, NY) Lines: 226 Status: OR jkh@meepmeep.pcs.com ( Jordan K. Hubbard ) writes: > Has anyone written a version of malloc/free that keeps extra information > around about each chunk allocated and free'd so that garbage things free'd > (or legitimate data free'd multiple times) can be detected and reported? I have two such packages, both of which were posted to the net some time ago. Michael Schwartz's maltrace package (slightly reformatted README file included below) is for tracing memory leaks; not exactly what Jordan asked for, but useful in its own right. I don't have much experience with it personally. Another package, from Brandon Allbery (see excerpt below), is useful for the kind of stuff Jordan wants to do; ckecking for corrupted malloced()'d memory, bad or redundant free()'s, etc. I've used it a lot and think it's wonderful. If you have BSD source, you should be able to recompile your standard malloc package to do similar checking, but I've never done so, I just use Brandon's. See the headers below to find out where and when the articles were posted originally. If you can't find them in some standard archive site (they are probably both on uunet, I would guess) let me know and I could mail them to you. > An additional plus would be for malloc() to somehow detect when you'd gone > off the end (either end) of a malloc'd block, but I can see how this would > be a lot harder to do, if not impossible in some cases. I've never tried this, but it seems possible to automatically generate a call to a checking function after each statement using a procedure like what dbx's "next" function does to single-step through a program to gain control. Each time you malloc a block, you could generate a few extra words before and after the block and fill them with magic numbers. The checking function could make sure all the magic numbers are as they should be. Of course, you still want to have free() be extra careful about checking to make sure it's freeing something it's supposed to. You might also have free zero out every byte that it frees and check to make sure that they stay zero (or some magic number, preferably an odd number which is not likely to be a valid pointer) each time the check function is called. This will catch programs which still use free()'d memory. This would all be slow as shit, but at least it would catch most malloc errors. Perhaps you could have switches all over the place to trade off speed for exhaustiveness -- maybe only check on every Nth statement execution, maybe not to the free'd zero checking, etc. Another thing which might be nice to have is a flag to malloc to automatically make each allocated block N bytes longer than is asked for. If your program has mysterious crashes which go away when you allocate an extra 4 bytes on each block, you might want to start looking for off-by-one errors at the ends of malloc'ed blocks. ---------------------- Newsgroups: comp.sources.misc Subject: maltrace -- trace un-free()'d space with dbx (maybe others?) Message-ID: <2835@ncoast.UUCP> Date: 10 Jul 87 02:00:34 GMT X-Archive: comp.sources.misc/8707/39 Malloc Leak Trace Package by Michael Schwartz University of Washington Computer Science Department Seattle, Washington, USA schwartz@cs.washington.edu (ARPANET) schwartz%cs.washington.edu@relay.cs.net (CSNET) ...{allegra,caip,ihnp4,nike}!uw-beaver!schwartz (UUCP) April 1987 1. Description This package is designed to help trace dynamic memory allocation leaks. It works by providing the standard malloc/free/realloc interface, but at the same time keeping track of all malloc'd buffers, including their size, order of call, and address. That way, at any point during the execution of your program, you can see what malloc'd buffers haven't yet been freed. It is particularly useful when your program performs many allocations before reaching some steady state, and hence you want to ignore the initial allocations and concentrate on where steady-state leaks occur. The idea is that you have some code (usually a server) that looks as follows: initialization code; do { ... } while (1); /* main loop */ There might be some dynamic allocation during the initialization, but this isn't where the memory leak is, since it's a one-shot allocation (i.e., at worst the initialization wastes some memory, but doesn't continually leak it). There might also be some dynamic allocation in the first few iterations of the main loop, until a "steady state" is reached (e.g., until some cache gets filled). In both cases (initialization and pre-steady state iterations), there may be many allocation calls, but you don't really want to look at them; rather, you want to look at what memory isn't being free'd once steady state has been reached. This package helps you to see the state of memory allocation after steady state has been reached. Bug reports and suggestions for improvements are welcome. 2. Instructions To use this package, take your favorite malloc/free/realloc code, and change the routine names as follows: malloc -> mmalloc free -> ffree realloc -> rrealloc You'll probably also need to add the following line to the beginning of your malloc.c: char *malloc(); (because realloc still calls malloc, but malloc is no longer defined in that file). Then link the program to be leak-traced with maltrace.o, btree.o, and (your modified) malloc.o. I would have included my malloc.c, but it's the copyrighted BSD 4.3 code, and besides, there are plenty of public domain malloc's available (e.g., in volume 6 of mod.sources). To trace a leak, take the example program skeleton, and augment it as follows: extern int MalTraceEnbld; extern int MalBrkNum; initialization code; do { ... if ( steady state reached) MalTraceEnbld = 1; ... at end of first steady-state cycle: PMal(-10); /* print last 10 (say) malloc's that haven't yet been free'd */ } while (1); /* main loop */ Then compile the program with -g, and run it. At the end of the first cycle, PMal will print a list of the last 10 malloc's that haven't yet been freed. (PMal(n) will print the first n entries if n > 0, the last -n entries if n < 0, and all entries if n == 0). Note the sequence number of one of these mallocs, and then go into dbx on the program, and put a breakpoint somewhere in the initialization code, and run the program. When you hit the breakpoint, use dbx to set MalBrkNum to the number of the malloc you just noticed, and set a break in MalBreak. Then, continue the program. When the malloc call in question is reached, MalBreak will get called, breaking, and giving you a chance to examine the state of the program at the time of this (potentially leaking) malloc call. In case this call is still within the steady-state setup (it is sometimes difficult to find where the setup ends), you can use dbx to call NextMal, to set MakBrkNum to be the next traced malloc call. 2.1 Usage Details This technique is not applicable to situations where the steady state allocation behavior (i.e., order and size of malloc requests) exhibits variation, e.g., due to pseudo-randomization or interaction with other processes via non-deterministically ordered message exchanges. In such situations you can sometimes inhibit the variation during debugging (e.g., by forcing interactions to occur in the same order each time). Alternatively, you can use dbx to set MalBreakSize to be the size of the malloc request at which to have MalBreak called, to reach a breakpoint (similar to the MalBrkNum scheme described above). This can be useful when the order of mallocs isn't fixed, but a particular size keeps showing up in the list of malloc's that haven't yet been free'd. There is also a routine called UntracedFree that gets called when a free call is made on an address that was not malloc'd with tracing enabled (again, this routine is present to allow one to set dbx breakpoints for this event). This could either indicate a free call on an address that isn't malloc'd (a bug) or a free call on an address that was malloc'd with tracing disabled. You can determine if it was of the former nature by going through the standard malloc code. For example, in the BSD code, you can set the switches -DDEBUG and -DRCHECK to check for this and other types of bugs. Alternatively, you can enable tracing from the very beginning of your program, and then any time UntracedFree gets called, it indicates a free call on an addresss that isn't malloc'd. 3. Interactive Demo You can try out this package interactively by making the program 'test'. Note that if you tell it to free some memory that was not malloc'd (with MalTraceEnbld = 1), it will give you a warning and then try to free the address anyway (for the reasons explained earlier). This may or may not cause malloc/free to get into a bad state; in BSD malloc this can cause a core dump, for instance. 4. Acknoledgements, History Thanks to Richard Hellier from the University of Kent at Canterbury (rlh@ukc.UUCP) for the btree package (which I modified slightly for the current package). I probably could have implemented my trace package more efficiently than it works currently (e.g., by incorporating the linked-list and btree nodes into the malloc header nodes), but I was more into hacking something together quickly that would solve my problems than making efficient code. Besides, this code doesn't need to be efficient, since it's only plugged in during debugging. ---------------- From: allbery@ncoast.UUCP (Brandon S. Allbery) Newsgroups: comp.sources.misc Subject: malloc package with debugging Message-ID: <3268@ncoast.UUCP> Date: 21 Jul 87 01:32:07 GMT X-Archive: comp.sources.misc/8707/59 This is my malloc() debugging package. Response to my small note was rather startling, so I'm posting it. The basic idea is that malloc'ed and free'd blocks are kept in doubly-linked lists. Every time an allocation function (malloc, free, calloc, realloc, or _mallchk) is called, the lists are checked to make sure the pointers have not been overwritten and the sizes are valid. They catch the majority of malloc'ed array overflows, and print dumps on segmentation and bus errors in order to determine if a memory overwrite was involved. They aren't perfect (an interpreter or other form of full runtime checking of every assignment would be needed for that), but they're pretty good. One warning: you can't depend on a free()'d block still being available, it will sbrk() backwards if possible. It also doesn't coalesce adjacent free blocks or do other kinds of "optimum" memory management; I consider this unimportant, since this is a debugging package rather than a full replacement for malloc. It's also slower than the "real" malloc. The code is included below; it's probably heavily 68000 dependent, but I've done my best to reduce such dependencies to a miminum. -- Roy Smith, Public Health Research Institute 455 First Avenue, New York, NY 10016 {allegra,philabs,cmcl2,rutgers,hombre}!phri!roy -or- roy@alanine.phri.nyu.edu "The connector is the network" -- Piercarlo "Peter" Grandi | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
dld@F.GP.CS.CMU.EDU (David Detlefs) (07/01/90)
Just to follow up on Piercarlo's post -- after doing the literature I mentioned, I did some work of my own, which can be read in @TechReport(Detlefs90, Author = "Detlefs, David L.", Key = "Detlefs", Title = "Concurrent Garbage Collection for C++", Institution = "Carnegie Mellon School of Computer Science", Number = "CMU-CS-90-119", Month = "May", Year = "1990" ) This extends Bartlett's work somewhat, and makes the collection concurrent (the collector and the program proper can run at the same time.) The code itself probably wouldn't be useful to too many people, since 1) it relies on features of the Mach operating system, and 2) it requires the use of a modified AT&T cfront 1.2 compiler. If anybody wants a copy of the tech report, please send me a note. Dave -- Dave Detlefs Any correlation between my employer's opinion Carnegie-Mellon CS and my own is statistical rather than causal, dld@cs.cmu.edu except in those cases where I have helped to form my employer's opinion. (Null disclaimer.)
pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/19/90)
[ I have added comp.lang.c++ to the newsgroup list, because this is is
discussion mostly on C++ implementation, not on general OO
implementation, but I have ensured that followups are to comp.object to
avoid fragmenting the thread across two newsgroups ]
On 18 Sep 90 13:13:07 GMT, sakkinen@tukki.jyu.fi (Markku Sakkinen) said:
sakkinen> In article <PCG.90Sep17152746@odin.cs.aber.ac.uk>
sakkinen> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
pcg> The definition of "in use" being that its address as if returned by the
pcg> storage allocator appears in one or more pointers in storage. If you are
pcg> using any other definition of "in use" you are making your own rules,
pcg> and we are no longer speaking of storage allocation or reclamation as is
pcg> described in the current academic literature on the subject.
sakkinen> Aha, we have been playing with different rules.
This I seemed to understand. Unfortunately your rules are somewhat
questionable; your arguments are based on the assumption that the
programmer and the storage reclaimer may use a different definition of
'in-use', by using different notions of pointer.
Under such an assumption then you show that storage reclamation is not
safe. This is not an interesting result :-).
sakkinen> I thought we were talking about the applicability of GC to
sakkinen> _general_ C++ programmes. You in contrast are intending only
sakkinen> code that obeys certain additional constraints.
The idea that programmer and storage reclaimer use the same notion of
'in-use' and thus of pointer is not an additional constraint, under any
commonly accepted interpretation of the word.
sakkinen> I have no objection to the statement that "GC is applicable to
sakkinen> C++ programmes as far as they obey rules that make them
sakkinen> susceptible to GC."
The correct wording is "Storage reclamation of any sort is only possible
in C++, as in any language, as far as C++ programmes agree with the
storage reclaimer on what is a pointer".
sakkinen> The constraint that you suggest above seems extremely severe
sakkinen> for C++.
It does not seem so to me; it merely excludes all erroneous programs,
i.e. all those that violate the interface contract with the storage
manager. The interface contract on operator delete, the C++ manual
storage reclaimer, is even more severe, as in:
sakkinen> For instance, suppose we have a case of multiple inheritance
sakkinen> where class D is some other base class than the first, of class E.
Nitpicking: well, it could be *any* base class -- there is no reason for
which MI has to be implemented by strict prefixing. The problem is valid
in general.
sakkinen> If we now create an object by 'new E' and cast the resulting pointer
sakkinen> to the type 'D*', it will no more point to the same address that
sakkinen> the allocator returned (at least in some implementations of C++).
Well, in that case you cannot use that pointer as operand to operator
delete either, so if you are doing manual storage reclamation your
object has become unreclaimable -- unless you keep around a pointer to
the 'E*', in which case no problem arises under any reclamation policy.
C++ (like most any language I can remember) has a rule that says that
you can only apply operator delete to the same pointer, in value *and*
type, that was returned by some call to operator new. A (conservative or
not) garbage collector need not be that restrictive.
sakkinen> This is legal, sound, and commonplace C++ usage, but according
sakkinen> to your rule the object would no more be "in use".
It would be 'in-use' as long as the resulting pointer did not have a
different *format* (I very carefully wrote that) from that returned by
the storage allocator and returned by the storage deallocator; in your
example the format is not changed, just the value of the pointer is.
For example, this is virtually exactly equivalent to have a pointer to a
structure member, or an array element, in C, as in:
struct s { int a,b,c; }; int *ip;
s *s1 = (struct s *) malloc(sizeof (struct s));
ip = (int *) &s->b; s1 = 0;
Well, after this, a conservative garbage collector will *not* reclaim
the relevant storage area. The storage for fields 'a' and 'c' will be
wasted of course, but this is nearly unavoidable, without type
information and a very sophisticated reclaimer.
Note that the conservative reclaimer will do one better than even a
manual one using free(3) or operator delete would, because if these were
used either one would have to reclaim all of the structure, thus making
'*ip' meaningless, or could not reclaim the structure even if the
reference to 's1->b' were lost, because
free((void *) ip); ip = 0;
would of course be illegal (by the rule mentioned above that you can
only deallocate an object with the same pointer that was returned by a
call to the storage allocator).
Let's switch to C++ and consider your example
class C { ... };
class D { ... };
class E : C,D { ... };
E *e = new (E);
D *d = (D *) d;
Now, as in the example above, if you lose the pointer contained in 'e',
the object allocated by operator new is unreclaimable, because you
cannot write
e = 0;
delete d;
Of course you can write
e = 0;
delete (E *) d;
but here you are making use of information that you have not given to
the compiler. By contrast a conservative g.c. that uses a BIBOP style
storage allocator will correctly not reclaim that space as in
e = 0;
Reclaim()
because 'd' still protect the entire object, but will effciently reclaim
the space when no longer protected by either 'e' or 'd' as in:
e = 0; d = 0;
Reclaim();
Knowing this, I did not say that the conservative storage reclaimer
would only work with pointers returned by the allocator, but with
pointers *of the same format*, which is weaker than the equivalent
interface contract on manual storage reclamation.
In other words, in similar situations, the conservative g.c.
will safely reclaim *more* space than manual reclamation.
Even illegal pointers are allowed (i.e. those denoting an invalid
address), as long as they have the same format as those returned by the
allocator component of the storage manager.
pcg> Even free(3) will fail if you pass it an encrypted or XOR-ed or
pcg> otherwise encoded form of a pointer.
sakkinen> (Since we are speaking about C++, we should refer to the
sakkinen> 'delete' operator instead of the 'free' function, which is
sakkinen> unknown to C++ standards unless it has surfaced in Release
sakkinen> 2.1.)
Ahhh. Here then we merely need to make the storage reclaimer understand
the format of pointers accepted by operator delete -- if you are using
an underlying storage reclaimer that cannot accept all pointer formats
that the programmer can supply to operator delete, you are using the
wrong reclaimer.
It is an interesting question whether it is always possible to implement
C++ in such a way that all pointer formats used by the implementation
can be fairly easily identified as such by examining a snapshot of
memory, without using type information.
In other words: can C++ be implemented without modification in
such a way that the interface contract with the storage manager
allows for conservative garbage collection?
This is certainly true for C, and almost any other language I can
remember. I think it is also true for C++, because all pointers in it
must point either to the beginning or within a valid object (or array
thereof, except for the harmless end-of-array exception), and a BIBOP
style storage allocator means that one has no problem with pointers to
the middle of an object (like in your example above).
I think that the only problem seems to be with pure based or relative
pointers; but probably this is not a problem, because while a based or
relative pointer is essentially undetectable in a memory snapshot, it is
not meaningful if there is no absolute pointer to the whole object
somewhere, and this absolute pointer will protect the object.
sakkinen> Certainly, but 'delete' does not require that you have
sakkinen> preserved the pointer exactly in its original form, without
sakkinen> interruption, all the time from 'new'. That's the difference!
Here you seem to say that with manual storage reclamation the programmer
can temporarily change the format of a pointer and leave the
corresponding object "unprotected", because reclamation will only be
triggered manually.
Well, an automatic storage reclaimer need not be triggered at unexpected
times -- indeed any decent automatic storage reclaimer offers you the
option either to be only activated explicitly or to defer its invocation
till the end of a critical region, if you are temporarily violating the
interface contract with it.
pcg> It is *much* safer for application programmers to leave reclamation in
pcg> the hands of a purpose built reclaimer, than to try to design their own
pcg> reclamation policy and algorithm, and even system programmers, that know
pcg> the problems, find manual storage reclamation difficult to get right.
sakkinen> I agree, but my conclusion is: programmers should prefer
sakkinen> cleaner OO languages that make things like this possible, and
sakkinen> not C++.
I beg to differ inasmuch C++ is involved -- the advantages of cleaner
languages than C++ may be:
1) they are simpler to understand and use than C++, as you have
so often demonstrated.
2) they usually do not require a conservative storage reclaimer,
so that they can reclaim all garbage.
Automatic storage reclamation is *possible* and *safe* in C++, even if
less efficient than in language that provides type information to the
storage manager.
But again, as to this: the loss of space to garbage is often
minimal, and I know of languages where, even if possible, the
implementation does not bother to provide type tables to the
storage manager, and uses conservative g.c. instead.
--
Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
dsouza@optima.cad.mcc.com (Desmond Dsouza) (09/29/90)
In article <PCG.90Sep23020738@teachb.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes: > pallas> Now you're not making sense. Either you honor interior pointers > pallas> or you don't. If you do, your collector cannot reclaim anything > ..... > > Sakkinen proposed an example in which a composite object is protected > only by a pointer to an object part of it. The composite object *cannot* > ^^^^^^^^ > be reclaimed manually, because a pointer to one of its parts cannot be > used with operator delete, because it has the wrong type (if nothing > else). You are right about not being able to reclaim the composite object with a pointer to a *data* member (More on this later) I hope this is not too obvious, but: For an object of a class with multiple base classes (C++), would you say that a Base pointer pointing to the Base class portion of the object is pointing to a *part* of it? If so, you certainly can reclaim the entire object manually by calling a *virtual* delete operator on the Base pointer. > As I said, it takes a sophisticated storage reclaimer, manual or > automatic, and a type table at runtime, to be able to reclaim > only the unreachable subojects of a composite object, > leaving the reachable subobjects alone. operator delete is not > even allowed to do it, because it must be presented with a > pointer with the same type and value as one returned by a ^^^^^^^^^^^^^^^^^^^^^^^ > call to operator new. > Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Sorry, but as long as operator delete is *virtual* in Base, the following is legal. It calls Derived::~Derived, and reclaims a Derived : Derived* dp = new Derived; // "new" returns a Derived* Base* bp = dp; ... delete bp; // delete a Base*: different type, // possibly different value from dp ---------------------------------------- NOW, just for grins, how about this definition of data members: If class A *CONTAINS* a data member of class B, define deletion of the B member as implying deletion of the containing A, provided B::~B is virtual. i.e. destroying a part of an object destroys the object itself. Could this be implemented ? Well, ... : 1. When a B is constructed: *if* it was *contained* in an A, then the dynamic value of B's destructor is the same as the dynamic value of that A's destructor. (This recursive definition accounts for transitive containment). All necessary information is available at compile time, including containment, offsets, etc. The underlying mechanism is much like class derivation, though there will clearly be more versions of B's virtual table -- or equivalent dynamic dispatch mechanism -- due to B being potentially contained (transitively) in several classes. class B { virtual ~B(); }; class A { virtual ~A(); B b; }; B* bp1 = new B; // delete bp1 ==> B::~B B* bp2 = (* new A).b; // delete bp2 ==> A::~A with offsets etc. 2. When an A is constructed: if it was *not* contained in another class, then A's dynamic destructor would be decided by normal class derivation rules. i.e. A::~A if it was an A being created, C::~C if it was constructed as part of a derived class C. class C : public A {}; A* ap = new C; // delete ap ==> C::~C B* bp = &ap->b; // delete bp ==> C::~C Note that since reference data members and constant pointers cannot be re-assigned, the definition of *CONTAINS* could be extended as below to include them, but the implementation might be very difficult. Defns: - B data member: A does contain B - B* data member: can be re-assigned: A does *not* contain B - (B* const) member: re-assign with casts subverts const semantics: A *does* contain B - (const B*) member: re-assign freely: A does *not* contain B - B& data member: cannot be re-assigned: A *does* contain B ---------------------------------------- Personally, I would like a vendor's implementation to provide the information a garbage collector needs to work efficiently. Since there are situations when this is unacceptible it should be a specifiable option, possibly on a per-class basis. Reference counting can get very inefficient very quickly, besides being yet another detail to have to worry about. And manual reclamation, while essential for performance *within* well bounded modules, can seriously complicate re-use from other modules. ...still looking for a compacting, non-conservative gc for c++ :-) Desmond. -- ------------------------------------------------------------------------------- Desmond D'Souza, MCC CAD Program | ARPA: dsouza@mcc.com | Phone: [512] 338-3324 Box 200195, Austin, TX 78720 | UUCP: {uunet,harvard,gatech,pyramid}!cs.utexas.edu!milano!cadillac!dsouza