tmb@ai.mit.edu (Thomas M. Breuel) (04/24/91)
In article <1991Apr23.000632.9175@neon.Stanford.EDU>
hoelzle@neon.Stanford.EDU (Urs Hoelzle) writes in response to an
earlier message by tmb@ai.mit.edu (Thomas M. Breuel):

>>[Any kind of garbage collection other than reference counting] will have
>>much higher overhead [in the case of non-recursive data structures].

This is quoted incorrectly, and it does not represent what I was saying.

>What is the reason why this must be so? Do you have any numbers to
>show that reference counting is superior in this case? (Read: I don't
>believe it.)

Neither do I. In fact, I was surprised to find that using the
conservative garbage collector due to Hans Boehm and Mark Weiser (and
described in Software Practice and Experience in 1988?) instead of
doing reference counting myself was _considerably_ faster in a C++
implementation which used many dynamically allocated arrays. If my
memory serves me correctly, the running times dropped to about 60% of
the original ones. (And, yes, the application ran long enough to incur
several garbage collections!)

Reference counting incurs costs at each copy and assignment operation
involving references to data. For most applications, that makes
reference counting inferior to scanning garbage collectors.

However, while the incremental cost of collecting another object with
a scanning garbage collector is considerably smaller than the
bookkeeping cost required for reference counting that object, the
startup cost for a scanning garbage collector is much higher than the
zero startup cost of a reference counting system. This is the case
because you have to scan through a considerable number of objects to
determine what is alive and what is dead before you can start any
storage reclamation (generational schemes reduce, but do not
eliminate, this overhead).

Now, if you have some resource of which you can allocate only a small
number of items (say 10-100) before you need to garbage collect, the
overhead associated with scanning garbage collectors can be much more
costly than the overhead associated with updating reference counts.
Typical examples of such situations are numerical applications written
in a functional style and data parallel languages like *Lisp (other
examples are file descriptors, network ports, communication ports, and
other kinds of very limited hardware resources). In a serial numerical
program written in a functional style, you will easily generate
garbage at rates of 10M/sec, meaning that you have to garbage collect
very frequently. The rates are much higher for data level parallel
languages.

Such languages have the additional useful feature that those objects
for which reference counting makes sense can also never be circular,
which means that reference counting not only is the most efficient
garbage collection technique, but that it also is completely
sufficient. Reference counting will most likely give you better
locality of reference as well, since recently freed objects are likely
to have been accessed recently.

For C++, the implication of this is that, while it may make sense to
use scanning garbage collectors for most of the garbage, reference
counting is a sensible technique for large floating point and integer
vectors, which is all I was claiming to begin with.
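To make the bookkeeping cost concrete, here is a minimal sketch (an
editorial illustration, not code from any of the postings; the class
name RCVector and its layout are invented) of a reference-counted
handle for a large vector. Every copy, assignment, and destruction
touches a count, which is exactly the per-operation overhead being
weighed against a scanning collector's startup cost:

    #include <stddef.h>

    // Hypothetical reference-counted handle for a large double vector.
    // The count lives with the storage; copies never copy the data.
    class RCVector {
        struct Rep {
            double* data;
            size_t  length;
            int     count;
        };
        Rep* rep;

        void drop() {                     // bookkeeping on every release
            if (--rep->count == 0) {
                delete [] rep->data;      // reclaimed at once, no GC pause
                delete rep;
            }
        }
    public:
        RCVector(size_t n) : rep(new Rep) {
            rep->data = new double[n];
            rep->length = n;
            rep->count = 1;
        }
        RCVector(const RCVector& v) : rep(v.rep) { ++rep->count; }  // cost per copy
        RCVector& operator=(const RCVector& v) {                    // cost per assignment
            ++v.rep->count;               // increment first: safe for self-assignment
            drop();
            rep = v.rep;
            return *this;
        }
        ~RCVector() { drop(); }
        double& operator[](size_t i) { return rep->data[i]; }
        size_t length() const { return rep->length; }
    };

    int main() {
        RCVector a(1000000);              // ~8 Mb payload, one count word
        RCVector b = a;                   // no copy of the data, just ++count
        a = b;                            // two count updates, nothing scanned
        return (int) b.length();
    }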
gyro@kestrel.edu (Scott Layson Burson) (04/25/91)
In article <TMB.91Apr24033353@volterra.ai.mit.edu> tmb@ai.mit.edu (Thomas M. Breuel) writes:

>Such languages have the additional useful feature that those objects
>for which reference counting makes sense can also never be circular,
>which means that reference counting not only is the most efficient
>garbage collection technique, but that it also is completely
>sufficient. Reference counting will most likely give you better
>locality of reference as well, since recently freed objects are likely
>to have been accessed recently.

I must challenge the latter point. I guess you're referring to the
fact that a garbage collection requires scanning many objects which
may not have been touched recently, and that's true: the garbage
collection process *itself* does not have very good locality.
However, I think you are overlooking the fact that a well-designed
generational garbage collector can make very important improvements in
the locality of the objects as it relocates them, in such a way as to
substantially improve the paging behavior of the application itself.

This is very definitely a concern I have about C++: that when people
start manipulating 60 Mb worth of objects in one program (and I have
already seen 30 Mb process sizes), the locality problem is going to
bring their machines to their knees, even though they would have had
no trouble running the same algorithms on the same data in Lisp. I
don't know of any allocation algorithms for C++ that pay any attention
to locality, nor do I really see how there *can* be any, because the
information about what other objects an object X points to or is
pointed to by is not available at the time X is created. Furthermore,
there is no way of doing compaction of live objects.

As I say, I am very concerned about this. I think the C++ community is
not prepared for the magnitude of the problem.

Note, by the way, that while class-specific `operator new' and
`operator delete' might sometimes be of help, in the general case --
in all the cases I have seen -- reference locality is orthogonal to
class boundaries. The "placement" argument to `new' (ARM 5.3.3) is
more likely to be of significant value.

>For C++, the implication of this is that, while it may make sense
>to use scanning garbage collectors for most of the garbage, reference
>counting is a sensible technique for large floating point and integer
>vectors, which is all I was claiming to begin with.

However, with these caveats about locality, I agree with your
conclusion.

-- Scott
Gyro@Reasoning.COM
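For readers who have not used it, here is a minimal sketch (an
editorial illustration, not code from the posting; the Arena class and
its interface are invented, and deallocation of individual objects is
ignored) of how the "placement" argument to `new' (ARM 5.3.3) can be
used to keep the pieces of one data structure adjacent in memory,
independent of class boundaries:

    #include <stddef.h>

    // Hypothetical arena: one contiguous block plus a bump pointer.
    // Objects placed here end up next to each other, so a structure
    // that is traversed together stays on the same few pages.
    class Arena {
        char*  base;
        size_t used, capacity;
    public:
        Arena(size_t n) : base(new char[n]), used(0), capacity(n) {}
        ~Arena() { delete [] base; }
        void* allocate(size_t n) {
            if (used + n > capacity) return 0;  // no fallback in this sketch
            void* p = base + used;
            used += (n + 7) & ~size_t(7);       // keep the next object aligned
            return p;
        }
    };

    // Placement operator new keyed by an arena.
    void* operator new(size_t n, Arena& a) { return a.allocate(n); }

    struct Node { Node* next; double value; };

    int main() {
        Arena graph_arena(1 << 20);          // one region per data structure
        Node* a = new (graph_arena) Node;    // ARM 5.3.3 placement syntax
        Node* b = new (graph_arena) Node;    // lands right next to a
        a->next = b; a->value = 1.0;
        b->next = 0; b->value = 2.0;
        return 0;
    }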
tmb@ai.mit.edu (Thomas M. Breuel) (04/25/91)
In article <1991Apr24.192424.23745@kestrel.edu> gyro@kestrel.edu (Scott Layson Burson) writes:

>In article <TMB.91Apr24033353@volterra.ai.mit.edu> tmb@ai.mit.edu (Thomas M. Breuel) writes:
>
>>Such languages have the additional useful feature that those objects
>>for which reference counting makes sense can also never be circular,
>>which means that reference counting not only is the most efficient
>>garbage collection technique, but that it also is completely
>>sufficient. Reference counting will most likely give you better
>>locality of reference as well, since recently freed objects are likely
>>to have been accessed recently.
>
>I must challenge the latter point. I guess you're referring to the
>fact that a garbage collection requires scanning many objects which
>may not have been touched recently, and that's true: the garbage
>collection process *itself* does not have very good locality.
>However, I think you are overlooking the fact that a well-designed
>generational garbage collector can make very important improvements in
>the locality of the objects as it relocates them, in such a way as to
>substantially improve the paging behavior of the application itself.

This is true of compacting garbage collectors but is not relevant to
the case I was discussing. I was talking about large heaps (many
Mbytes) and large objects (about 1 Mbyte each or more). You cannot
afford to compact (copy) such objects just because you might improve
locality of reference (apart from the fact that compacting GC in C++
is tricky anyway).

Compacting, generational, scanning garbage collectors are better than
non-compacting reference counting garbage collectors for many
allocation problems. However, for large monotype arrays, reference
counting has lower overhead, and it is much simpler to implement than
a solid generational scheme. Keep this in mind before you rush out to
switch all your C++ memory management to a scanning garbage collector.

Thomas.
cwhitley@mole.gnu.ai.mit.edu (Cecil H. Whitley) (04/25/91)
To: popkin@cs.odu.edu
Subject: Re: Turbo Questions (Turbo Vision, Objects, Etc)
Newsgroups: comp.lang.pascal,comp.lang.c,comp.lang.c++,comp.lang.misc
In-Reply-To: <1991Apr24.165725.29020@cs.odu.edu>
Organization: The Internet

Hi,

TC++ (now named Borland C++) doesn't have Turbo Vision yet. It does,
however, have an external program that does some of what TV does. It
also doesn't have unit files; it has something better! It uses .obj
files, which are pre-compiled sources from which you can use either
some or all functions/subroutines. It also has VROOM (Virtual Run-time
Object Oriented Memory management), Borland's overlay manager. You
will find it to be more flexible than Turbo Pascal, especially for
large programs with a lot of data.

Cecil
marti@mint.inf.ethz.ch (Robert Marti) (04/25/91)
In article <TMB.91Apr24033353@volterra.ai.mit.edu> tmb@ai.mit.edu (Thomas M. Breuel) writes:

>In article <1991Apr23.000632.9175@neon.Stanford.EDU>
>hoelzle@neon.Stanford.EDU
>(Urs Hoelzle) writes in response to an earlier message by tmb@ai.mit.edu
>(Thomas M. Breuel):
>
>>>[Any kind of garbage collection other than reference counting] will have
>>>much higher overhead [in the case of non-recursive data structures].
>
>This is quoted incorrectly, and it does not represent what I was saying.

First off, I (and not Urs Hoelzle) was the one who was paraphrasing
what I perceived Thomas Breuel as saying. Sorry if I misunderstood and
therefore misquoted your views, Thomas.

[Lengthy explanations deleted.]

>For C++, the implication of this is that, while it may make sense
>to use scanning garbage collectors for most of the garbage, reference
>counting is a sensible technique for large floating point and integer
>vectors, which is all I was claiming to begin with.

What exactly does "sensible" mean? All _I_ was saying is that in my
C++ application, I had "large" bit vectors and large integer vectors
and found that running times using a mark-and-sweep collector were
about 60% of those I got by doing reference counting and deallocation
myself. I am a little unsure if this means that ref counting is
sensible in this very situation ...

Robert Marti                      | Phone:   +41 1 254 72 60
Institut fur Informationssysteme  | FAX:     +41 1 262 39 73
ETH-Zentrum                       | E-Mail:  marti@inf.ethz.ch
CH-8092 Zurich, Switzerland       |
tmb@ai.mit.edu (Thomas M. Breuel) (04/26/91)
>>For C++, the implication of this is that, while it may make sense
>>to use scanning garbage collectors for most of the garbage, reference
>>counting is a sensible technique for large floating point and integer
>>vectors, which is all I was claiming to begin with.
>
>What exactly does "sensible" mean? All _I_ was saying is that in my
>C++ application, I had "large" bit vectors and large integer vectors
>and found that running times using a mark-and-sweep collector were
>about 60% of those I got by doing reference counting and deallocation
>myself. I am a little unsure if this means that ref counting is
>sensible in this very situation ...

And I have had exactly the opposite experience. Perhaps our notions of
"large" and "frequent deallocation" are different. As I said before, I
am concerned with arrays that are a significant fraction of heap size
(about 1M out of 16M) and that are written and read only a few times
during their lifetime. Maybe you could be more specific about your
application.
jimad@microsoft.UUCP (Jim ADCOCK) (05/04/91)
In article <1991Apr24.192424.23745@kestrel.edu> gyro@kestrel.edu (Scott Layson Burson) writes:

|This is very definitely a concern I have about C++: that when people
|start manipulating 60 Mb worth of objects in one program (and I have
|already seen 30 Mb process sizes), the locality problem is going to
|bring their machines to their knees, even though they would have had
|no trouble running the same algorithms on the same data in Lisp. I
|don't know of any allocation algorithms for C++ that pay any
|attention to locality, nor do I really see how there *can* be any,
|because the information about what other objects an object X points to
|or is pointed to by is not available at the time X is created.
|Furthermore, there is no way of doing compaction of live objects.
|
|As I say, I am very concerned about this. I think the C++ community
|is not prepared for the magnitude of the problem.

Agreed. I see two major impediments to solving these problems:

1) C++ presently has inadequate support for moveable objects [although
   a number of vendors have C/C++ compiler extensions for moveable
   objects]

2) C++ presently has inadequate support for GC [although the problem
   is very similar to exceptions, so why not kill two birds with one
   stone?]

|Note, by the way, that while class-specific `operator new' and
|`operator delete' might sometimes be of help, in the general case --
|in all the cases I have seen -- reference locality is orthogonal to
|class boundaries. The "placement" argument to `new' (ARM 5.3.3) is
|more likely to be of significant value.

A more fundamental problem is that any fixed placement strategy
assumes that the patterns of usage of objects are going to remain
fixed, but this is not true of any non-trivial app. Thus, the only way
to maintain good locality of reference over the long run is to have
moveable objects. As an extreme, see Jul's thesis on Emerald, where
objects are moved "automatically" over the net to the process where
they are needed.

I agree with your concerns, but I don't think it's too hard to fix
these problems with some small changes to C++. It's just that some
people don't yet seem to be interested in solving these problems.

[In addition to all this, note that overloaded operator dot would make
one small contribution towards allowing moveable objects, since such
overloading allows C++ programs to abstract the notion of an object
distinct from the byte address used to store that object.]
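As an illustration of the abstraction the last paragraph alludes to,
here is a minimal sketch (an editorial illustration, not code from the
posting; the object table, the MatrixHandle class, and the manual
"move" in main are all invented) of approximating movable objects
today with a double indirection. With an overloadable operator dot the
indirection could be hidden completely; with current C++ it has to be
spelled operator->:

    #include <stddef.h>

    // Hypothetical object table: a collector that moves an object only
    // has to update the table slot; clients never hold the raw address.
    const size_t TABLE_SIZE = 1024;
    static void* object_table[TABLE_SIZE];

    struct Matrix { int rows, cols; };

    // A handle names an object by table index, not by byte address, so
    // it stays valid across moves.
    class MatrixHandle {
        size_t index;
    public:
        MatrixHandle(size_t i) : index(i) {}
        Matrix* operator->() const { return (Matrix*) object_table[index]; }
    };

    int main() {
        Matrix a = { 100, 100 };
        object_table[0] = &a;          // install the object
        MatrixHandle h(0);
        int r = h->rows;               // every access goes through the table

        Matrix b = a;                  // pretend the collector moved the object
        object_table[0] = &b;          // ...and updated its table slot
        r = h->cols;                   // the handle is still valid
        return r;
    }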
boehm@parc.xerox.com (Hans Boehm) (05/08/91)
jimad@microsoft.UUCP (Jim ADCOCK) writes:

>In article <1991Apr24.192424.23745@kestrel.edu> gyro@kestrel.edu (Scott Layson Burson) writes:
>|This is very definitely a concern I have about C++: that when people
>|start manipulating 60 Mb worth of objects in one program (and I have
>|already seen 30 Mb process sizes), the locality problem is going to
>|bring their machines to their knees, even though they would have had
>|no trouble running the same algorithms on the same data in Lisp. I
>|don't know of any allocation algorithms for C++ that pay any
>|attention to locality, nor do I really see how there *can* be any,
>|because the information about what other objects an object X points to
>|or is pointed to by is not available at the time X is created.
>|Furthermore, there is no way of doing compaction of live objects.
>|
>|As I say, I am very concerned about this. I think the C++ community
>|is not prepared for the magnitude of the problem.
>
>Agreed. ...
>
>A more fundamental problem is that any fixed placement strategy
>assumes that the patterns of usage of objects are going to remain
>fixed, but this is not true of any non-trivial app. Thus, the only way
>to maintain good locality of reference over the long run is to have
>moveable objects. As an extreme, see Jul's thesis on Emerald, where
>objects are moved "automatically" over the net to the process where
>they are needed.

I agree that in a distributed memory system this makes sense. In the
uniprocessor case, the issues are less clear. There is some reason to
believe that objects allocated close together in time are likely to be
accessed together. (I believe Paul Wilson's paper in the upcoming
SIGPLAN conference makes this point, but I'm sure there are better
references.) Fixed placement allocators typically allocate
successively allocated objects close to each other. (If they don't,
they should.) Thus there is some reason to believe that fixed
placement allocators don't do that badly with respect to locality.

Moving collectors uniformly win in that they avoid fragmentation. But,
at least in our environment (Cedar with 30 MB or so heap and some
long-lived sessions), this isn't much of an issue even with fixed
allocation.

A compacting trace-and-sweep collector obviously wins over a
noncompacting one in terms of locality. (Actually, in the presence of
nonuniform sized objects, even that's not completely clear, but it
seems likely.) But it tends to be expensive in other respects.

Copying collectors have their own problems. You have to be fairly
clever to get any reasonable locality. (My intuition is that
breadth-first copying is significantly worse than fixed allocation. I
don't have numbers to either back that up or to refute it.) Copying on
reference wins substantially for at least one application (see
Johnson's paper in ASPLOS '91), but that's hard to do well on a
conventional machine. A full copying collection is likely to be a
disaster, given the current ratio of cpu to disk speeds. Moving
collectors don't get along with some programming techniques (e.g.
hashing on addresses).

A comparison of some of these approaches can be found in John
DeTreville, "Experience with Concurrent Garbage Collectors for
Modula-2+", DEC SRC report no. 64. Ben Zorn's paper in the 1990 Lisp
conference also compares copying and trace-and-sweep collection in one
particular context. If anybody knows of any other real data (that's
not completely obsolete), I would love to know about it.

Hans
(boehm@xerox.com)
gyro@kestrel.edu (Scott Layson Burson) (05/11/91)
In article <boehm.673720634@siria> boehm@parc.xerox.com (Hans Boehm) writes:

>Fixed placement allocators typically allocate successively allocated
>objects close to each other. (If they don't, they should.)

They don't, at least not when used simplistically. When objects are
freed, allocators attempt to reuse the space they took up. The
strategies for doing this vary, of course, but the bottom line is that
after a lot of allocations and freeings, successively allocated
objects get less and less likely to be close to each other.

One factor that exacerbates the problem greatly is allocating
short-lived and long-lived objects in the same space. So it would seem
that the first thing to do would be to try to segregate objects by
expected lifetime. To whatever extent that can be done in a particular
application, I think it's a good start on the problem, but it isn't
always possible to do it very well.

Bartlett's generational collector may be a lifesaver for certain
applications people I know are working on. I haven't yet heard of
anyone who's done a SPARC/G++ port, however.

-- Scott
Gyro@Reasoning.COM
boehm@parc.xerox.com (Hans Boehm) (05/14/91)
gyro@kestrel.edu (Scott Layson Burson) writes:

>In article <boehm.673720634@siria> boehm@parc.xerox.com (Hans Boehm) writes:
>>Fixed placement allocators typically allocate successively allocated
>>objects close to each other. (If they don't, they should.)
>
>They don't, at least not when used simplistically. When objects are
>freed, allocators attempt to reuse the space they took up. The
>strategies for doing this vary, of course, but the bottom line is that
>after a lot of allocations and freeings, successively allocated
>objects get less and less likely to be close to each other.
>
>One factor that exacerbates the problem greatly is allocating
>short-lived and long-lived objects in the same space. So it would seem
>that the first thing to do would be to try to segregate objects by
>expected lifetime.

Granted, things will get a little worse as more objects are
reallocated, but not terribly so. Let's be a bit more precise. This
issue is of interest primarily for small (< 1 page) objects. As an
example, let's assume we're allocating 8 byte cons cells on 4K pages.
Let's further assume that pages contain only objects of one size.
(This is likely to be a good choice for a nonmoving allocator.) Let's
further assume the heap is 3/4 full.

When I garbage collect, I will find that an average page contains
512/4 = 128 free objects. These will end up next to each other on the
size 8 free list. Thus the probability of two subsequent allocations
residing on the same page is about 127/128, assuming all of memory is
equally densely populated. This is worse than what I would get for
newly allocated memory from a copying collector (which would get
511/512). But it is probably better than what I would get by
considering those same list nodes after they've been breadth-first
copied a few times.

Reality is of course more complicated. Different object sizes
interact, though this is probably not a major effect, since the number
of different object sizes in a typical data structure is probably
small. In our experience, pages tend to be very nonuniformly
populated, even after a system has been up for quite a while. This
helps the nonmoving allocator, since it can choose not to allocate on
nearly full pages. Breadth-first copying is of course not optimal. But
other strategies seem to make it harder to build concurrent collectors
on conventional hardware.

The whole point is still that the tradeoffs are nontrivial.

Hans
(boehm@xerox.com)
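Hans's arithmetic is easy to recheck; the following small program (an
editorial addition, not part of the post) just recomputes the numbers
quoted above for the assumed configuration of 8-byte cells, 4K pages,
and a heap that is 3/4 full:

    #include <stdio.h>

    int main() {
        const double page_size   = 4096.0;  // 4K pages
        const double object_size = 8.0;     // 8-byte cons cells
        const double occupancy   = 0.75;    // heap assumed 3/4 full

        double cells_per_page = page_size / object_size;           // 512
        double free_per_page  = cells_per_page * (1 - occupancy);  // 128

        // Probability that the next allocation lands on the same page,
        // given that a page's free cells sit next to each other on the
        // size-8 free list.
        double p_nonmoving = (free_per_page - 1) / free_per_page;    // 127/128
        double p_copying   = (cells_per_page - 1) / cells_per_page;  // 511/512

        printf("cells per page:              %.0f\n", cells_per_page);
        printf("free cells per page:         %.0f\n", free_per_page);
        printf("same-page prob (nonmoving):  %f\n", p_nonmoving);
        printf("same-page prob (copying):    %f\n", p_copying);
        return 0;
    }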
jimad@microsoft.UUCP (Jim ADCOCK) (05/17/91)
In article <1991May11.054648.2483@kestrel.edu> gyro@kestrel.edu (Scott Layson Burson) writes:

|In article <boehm.673720634@siria> boehm@parc.xerox.com (Hans Boehm) writes:
|>Fixed placement allocators typically allocate successively allocated
|>objects close to each other. (If they don't, they should.)
|
|They don't, at least not when used simplistically. When objects are
|freed, allocators attempt to reuse the space they took up. The
|strategies for doing this vary, of course, but the bottom line is that
|after a lot of allocations and freeings, successively allocated
|objects get less and less likely to be close to each other.
|
|One factor that exacerbates the problem greatly is allocating
|short-lived and long-lived objects in the same space. So it would seem
|that the first thing to do would be to try to segregate objects by
|expected lifetime.

If you think about it, the problem is not where to create objects that
are being born, but where one should have created the object that is
currently dying. Because without moving objects, the only way to
conveniently reuse the hole left by a dying object is for that hole to
be at "exactly" the right location for a new object to use. I don't
see any good way to predict in advance when an object is going to die,
thereby allowing it to be placed well at time of creation.

Some people claim the right thing to do is to let the end programmer
specify where the object will be created, either by specifying how
long the programmer guesses the object will live:

    FOO* pYoungFoo = new (SHORT_LIVED) foo(23);
    FOO* pOldFoo = new (LONG_LIVED) foo(123);

or by specifying a neighborhood of objects that are all going to die
at about the same time:

    FOO* pYoungFool = new (BELLEVUE) fool(34);
    FOO* pOldFool = new (SEATTLE) fool(92);
    FOO* pMiddleAgedFool = new (MERCER_ISLAND) fool(45);

I think this is a cop-out, in that the end user [if writing "OOP"
classes] is seldom better equipped to guess the lifetime of objects
than the person writing the memory management routines. In particular,
if one is writing general-purpose reusable classes, one typically has
no idea at all how objects of those classes are going to be used. So,
I claim writing reusable classes with explicit placement parameters is
troublesome, at best.

If one cannot, then, successfully predict the lifetime of objects, so
that the hole they leave behind is well placed, one needs to be able
to move that hole to a more useful location. The only way I can see to
do that is either to move objects around, implicitly moving
left-behind holes too, or by building hardware with virtual memory
paging hardware more in tune with the needs of OOP [how often do we
make 4K-sized objects?]

So, I'd like to see C++ have better support for movable objects.
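For concreteness, the SHORT_LIVED/LONG_LIVED tags in the example above
could be nothing more than an enum selecting between two separately
managed pools through a placement operator new. The sketch below is an
editorial illustration of that interface only; the fixed pool sizes,
the bump allocation, and the complete absence of reclamation are
simplifications, not part of Jim's proposal:

    #include <stddef.h>
    #include <stdlib.h>

    enum Lifetime { SHORT_LIVED, LONG_LIVED };   // hypothetical tags

    // Two crude pools, one per guessed lifetime; nothing is ever
    // reclaimed and overflow is not handled in this sketch.
    static char*  pool_mem[2]  = { (char*) malloc(1 << 20),
                                   (char*) malloc(1 << 20) };
    static size_t pool_used[2] = { 0, 0 };

    // Placement operator new keyed by the guessed lifetime (ARM 5.3.3).
    void* operator new(size_t n, Lifetime life) {
        void* p = pool_mem[life] + pool_used[life];
        pool_used[life] += (n + 7) & ~size_t(7);  // keep objects aligned
        return p;
    }

    struct foo { int x; foo(int v) : x(v) {} };

    int main() {
        foo* pYoungFoo = new (SHORT_LIVED) foo(23);   // first pool
        foo* pOldFoo   = new (LONG_LIVED)  foo(123);  // second pool
        return pYoungFoo->x + pOldFoo->x;
    }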
daniel@fern.ucsc.edu (Daniel Edelson) (05/17/91)
In article <72384@microsoft.UUCP> jimad@microsoft.UUCP (Jim ADCOCK) writes:

>So, I'd like to see C++ have better support for movable objects.

Then lobby X3J16 not to forbid taking the address of ``this'' within a
member function.

-daniel
daniel@cis.ucsc.edu
jimad@microsoft.UUCP (Jim ADCOCK) (05/21/91)
In article <boehm.674163589@siria> boehm@parc.xerox.com (Hans Boehm) writes:

|Reality is of course more complicated. Different object sizes
|interact, though this is probably not a major effect, since the number
|of different object sizes in a typical data structure is probably
|small. In our experience, pages tend to be very nonuniformly
|populated, even after a system has been up for quite a while. This
|helps the nonmoving allocator, since it can choose not to allocate on
|nearly full pages.

In which case you're ending up with a situation with old pages
containing un-reclaimed holes. Eventually pages fill up with old
long-lived objects, excepting a few holes. Either one ignores those
holes, failing to reclaim memory, in which case you don't have a
stable persistent system, or one is forced to fill the holes in old
pages, in which case you've got new objects spread out across
predominantly old pages. The only way around this dilemma is to move
objects.

Of course, for non-stable systems or even quasi-stable systems, it
might be acceptable to "leak" a little memory by never plugging the
holes in old pages with new objects. But, unless one can predict a
finite lifetime for one's app, you've got to go back and fill those
holes eventually.
robertk@lotatg.lotus.com (Robert Krajewski) (05/21/91)
In article <72384@microsoft.UUCP> jimad@microsoft.UUCP (Jim ADCOCK) writes:

>Some people claim the right thing to do
>is to let the end programmer specify where the object will be created,
>either by specifying how long the programmer guesses the object will live:
>
>    FOO* pYoungFoo = new (SHORT_LIVED) foo(23);
>    FOO* pOldFoo = new (LONG_LIVED) foo(123);
>
>or by specifying a neighborhood of objects that are all going to die
>at about the same time:
>
>    FOO* pYoungFool = new (BELLEVUE) fool(34);
>    FOO* pOldFool = new (SEATTLE) fool(92);
>    FOO* pMiddleAgedFool = new (MERCER_ISLAND) fool(45);
>
>I think this is a cop-out...

... but even so, even one of the examples of ultimate Right Thing object
storage implementations, the Lisp Machine, had a similar strategy for
programmers that wanted to optimize locality. Lisp Machine storage
was divided into areas -- there was a normal working storage area,
but one could also direct allocation into other areas, either through
explicit arguments in special versions of CONS or MAKE-ARRAY, or
through binding a special implicit parameter to the allocator.
The Lisp Machine compiler used areas in a particularly nasty way:
after compilation was complete, the area was *reset*, that is, totally
deallocated. That saved a lot of the work for the garbage collector,
but modification of the compiler could be very dangerous if you left
things outside pointing to the compiler area. After the
generation-based GC was moved in, the compiler area was retained, but
the reset code was taken out. With this change, compilation time *sped up*
because the generational GC had a chance to reclaim new objects before
the compiler was finished; that storage was then freed up for other
objects, and the machine did not have to page to get more unused
pages.
In general, with the ephemeral GC in place, areas were much less
important than before. However, they still provide some information
about locality, so there is a place for them.

>The only way I can see to do that
>is either to move objects around, implicitly moving left-behind holes too,
>or by building hardware with virtual memory paging hardware more in tune
>with the needs of OOP [how often do we make 4K-sized objects?]

Well, it's been done, although it's more of a question of low-level
software (object storage) conventions than anything else. There's very
little about a Lisp Machine that makes it a *Lisp* machine per se. I
don't think there is ever going to be a version of C++ that's going to
allow similar object architectures to fly at full speed, since there
are so many programs that manipulate addresses in a
non-object-oriented fashion. What is more reachable is getting enough
reasonable features and latitude in spec so that, if one wanted to,
one could write code in a clean dialect of C++ that would not be
hostile to a true object architecture.
boehm@parc.xerox.com (Hans Boehm) (05/23/91)
jimad@microsoft.UUCP (Jim ADCOCK) writes:

>In article <boehm.674163589@siria> boehm@parc.xerox.com (Hans Boehm) writes:
>|Reality is of course more complicated. Different object sizes
>|interact, though this is probably not a major effect, since the number
>|of different object sizes in a typical data structure is probably
>|small. In our experience, pages tend to be very nonuniformly
>|populated, even after a system has been up for quite a while. This
>|helps the nonmoving allocator, since it can choose not to allocate on
>|nearly full pages.
>
>In which case you're ending up with a situation with old pages
>containing un-reclaimed holes. Eventually pages fill up with old
>long-lived objects, excepting a few holes. Either one ignores those
>holes, failing to reclaim memory, in which case you don't have a
>stable persistent system, or one is forced to fill the holes in old
>pages, in which case you've got new objects spread out across
>predominantly old pages. The only way around this dilemma is to move
>objects. Of course, for non-stable systems or even quasi-stable
>systems, it might be acceptable to "leak" a little memory by never
>plugging the holes in old pages with new objects. But, unless one can
>predict a finite lifetime for one's app, you've got to go back and
>fill those holes eventually.

Our system does not fill holes on pages that are 3/4 occupied by old
objects. (The choice of 3/4 is rather arbitrary. I don't have numbers
to back up that choice.) This does nothing to prevent the system from
being stable. In the (unlikely) worst case, it will prevent 1/4 of the
available memory from getting used. The decision about whether a page
is 3/4 full of old objects is recomputed at every full collection.
Thus holes are likely to get filled eventually, without adversely
affecting my original argument about locality.

Thus stability is not really the issue. We're wasting at most an
amount of space proportional to the amount of live space. This is a
concern. But it's usually even more of a concern with copying
collectors. (Again, compacting in place wins, but ...)

Hans
(boehm@xerox.com)
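The page-selection heuristic being described can be written down in a
few lines. This is an editorial sketch, not code from the collector
Hans describes; the Page record and the exact comparison are
assumptions made only for illustration:

    #include <stddef.h>

    // Hypothetical per-page bookkeeping for one object size class.
    struct Page {
        size_t cells;       // total cells on the page
        size_t old_cells;   // cells holding objects that survived a prior GC
        size_t free_cells;  // cells currently free
    };

    // Recomputed at every full collection: a page at least 3/4 occupied
    // by old objects is skipped, so new objects are not scattered across
    // mostly-old pages; its holes can still be used once the ratio drops.
    bool should_fill_holes(const Page& p) {
        return p.old_cells * 4 < p.cells * 3;   // old fraction below 3/4
    }

    // Only pages that pass the test contribute their holes to the free
    // list.
    size_t usable_free_cells(const Page* pages, size_t n) {
        size_t total = 0;
        for (size_t i = 0; i < n; ++i)
            if (should_fill_holes(pages[i]))
                total += pages[i].free_cells;
        return total;
    }

    int main() {
        Page heap[2] = {
            { 512, 400, 112 },   // mostly old: skipped for now
            { 512, 100, 300 },   // mixed page: its holes get reused
        };
        return (int) usable_free_cells(heap, 2);   // 300 in this example
    }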