[comp.object] C++ and garbage collection

sakkinen@tukki.jyu.fi (Markku Sakkinen) (09/16/90)

In article <PCG.90Sep14160845@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>On 12 Sep 90 07:41:54 GMT, sakkinen@tukki.jyu.fi (Markku Sakkinen) said:
> ...
>sakkinen> Because of pointer arithmetic, it is not *possible* to use
>sakkinen> garbage collection safely with C++ (or any C-based language).
>
>Or perhaps you misinterpret the C/C++ language -- pointer arithmetic is
>only allowed within an array. As such it is exactly equivalent to array
>indexing in other languages. Indeed in C/C++ array indexing and pointer
>arithmetic are *exactly* equivalent. Any other use of pointer arithmetic
>is outside the language. Therefore legal C/C++ programs do not special
>problems because of pointer arithmetic.

That may be, but unfortunately there is no way to check whether
a C/C++ module is legal or not, so your point is somewhat theoretic.
Many "illegal" idioms work "as expected" in practically any implementation;
the situation is similar to Fortran.
I suspect that a good part of the C code in the software that we
happily use every day is more or less illegal.

To go for more nit-picking, one might quarrel about the "exact
equivalence".  For instance, I suppose it is at least legal for
a pointer variable to advance just beyond the last element of an array,
otherwise C/C++ programmers couldn't employ their cherished
"*p++" idiom.

> ...
>sakkinen> (I have read Boehm's article in Software - Practice and
>sakkinen> Experience, [...]

Sorry for the inexact reference: the paper has also a second author,
Mark Weiser. It appeared in the September 1988 issue.

>If we are to speak about actual, and not just _legal_ C/C++ programs,
>*some* programs do things that are illegal (far fewer than is commonly
>thought of), but those same C++ programs will not fail under
>conservative garbage collection, they will just accumulate more
>unreclaimed garbage, and on average not much of it.

Wrong! It is fully possible that illegal code will reference an area
of storage that has looked as garbage to the most conservative GC routine.
(I exclude only the absolutely conservative one that _never_ reclaims
any piece of garbage.) This was even admitted in Boehm and Weiser's
paper.

> ...

Markku Sakkinen
Department of Computer Science and Information Systems
University of Jyvaskyla (a's with umlauts)
Seminaarinkatu 15
SF-40100 Jyvaskyla (umlauts again)
Finland
          SAKKINEN@FINJYU.bitnet (alternative network address)

pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/17/90)

On 16 Sep 90 09:02:26 GMT, sakkinen@tukki.jyu.fi (Markku Sakkinen) said:

sakkinen> In article <PCG.90Sep14160845@odin.cs.aber.ac.uk>
sakkinen> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

pcg> If we are to speak about actual, and not just _legal_ C/C++ programs,
pcg> *some* programs do things that are illegal (far fewer than is commonly
pcg> thought of), but those same C++ programs will not fail under
pcg> conservative garbage collection, they will just accumulate more
pcg> unreclaimed garbage, and on average not much of it.

sakkinen> Wrong! It is fully possible that illegal code will reference
sakkinen> an area of storage

You could support this statement by producing a *single* example where a
storage manager using conservative automatic storage reclamation will
deallocate as garbage an area of storage that is in use.

The definition of "in use" being that its address as if returned by the
storage allocator appears in one or more pointers in storage. If you are
using any other definition of "in use" you are making your own rules,
and we are no longer speaking of storage allocation or reclamation as is
described in the current academic literature on the subject.

You are allowed any means of producing a pointer word in the same
representation as that returned by the storage allocator (e.g.
malloc(3)) and and understood by the deallocator (e.g. free(3)) -- any
form of pointer arithmetic, whether legal or illegal.

Naturally you are *not* allowed to change the representation of pointers
from the one used by the language and the storage manager -- for example
you are not allowed to encrypt pointers, or to write pointers in a
different byte order than the standard machine byte order, or to use
strings as pointers, or to keep pointers in a file on disc.

sakkinen> that has looked as garbage to the most conservative GC
sakkinen> routine.

Notice that garbage collectors could not care less about the contents of
data space areas (except where they contain pointers, of course) -- no
storage reclamation algorithm looks at the contents of a given data area
to determine whether that data area can be reclaimed or not.

Also, there is no degree of conservativeness in a storage reclaimer --
either it is conservative, and it will consider as a pointer any word
whose numeric value can be interpreted as an address into the heap, or
it is not conservative, and it *knows* which words are pointers.

sakkinen> This was even admitted in Boehm and Weiser's paper.

This is a misunderstanding of what they say: they say that storage
reclamation, in general, whether manual or automatic, is only possible
when the user program does not change the representation of pointers.
Even free(3) will fail if you pass it an encrypted or XOR-ed or
otherwise encoded form of a pointer.



I would insist in my belief that most application programmers should not
and would not care to understand how the storage manager works, in both
its allocator and reclaimer components. Just as most users of malloc(3)
do not know or understand which is the allocation algorithm and policy
used by the allocator, they should leave reclamation to the storage
manager's automatic reclaimer, to avoid all the pitfalls that only a
system programmer will have carefully avoided.

It is *much* safer for application programmers to leave reclamation in
the hands of a purpose built reclaimer, than to try to design their own
reclamation policy and algorithm, and even system programmers, that know
the problems, find manual storage reclamation difficult to get right.

Of course, for particularly important applications, in which particular
allocation and reclamation policies may be beneficial, programmers may
well acquire system programmer knowledge and roll their own storage
manager implementation, automatic or manual that may be -- this is one
of the great advantages of C/C++ as SILs/MOHLLs, that you can override
the default storage manager, something that is not possible in languages
like Pascal or Ada, but here I guess I am preaching to the converted :-).
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

jimad@microsoft.UUCP (Jim ADCOCK) (09/18/90)

In article <1990Sep16.090226.11051@tukki.jyu.fi> sakkinen@jytko.jyu.fi (Markku Sakkinen) writes:
>To go for more nit-picking, one might quarrel about the "exact
>equivalence".  For instance, I suppose it is at least legal for
>a pointer variable to advance just beyond the last element of an array,
>otherwise C/C++ programmers couldn't employ their cherished
>"*p++" idiom.

This is covered on pgs 72-73 of E&S.  Both pointers and array indexing
allow one to refer to the location *one* beyond the high end of an array --
as long as you do not actually dereference that location.

char a[10];
char* p1 = &a[10];
char* p2 = a+10;

The assignments to p1 and p2 are both legal, and "equivalent."  ....Which has
little or nothing to do with C/C++ pointer hackery as actually practiced....

sakkinen@tukki.jyu.fi (Markku Sakkinen) (09/18/90)

In article <PCG.90Sep17152746@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
> ...
>The definition of "in use" being that its address as if returned by the
>storage allocator appears in one or more pointers in storage. If you are
>using any other definition of "in use" you are making your own rules,
>and we are no longer speaking of storage allocation or reclamation as is
>described in the current academic literature on the subject.

Aha, we have been playing with different rules.
I thought we were talking about the applicability of GC to _general_
C++ programmes. You in contrast are intending only code that
obeys certain additional constraints.
I have no objection to the statement that "GC is applicable to
C++ programmes as far as they obey rules that make them susceptible
to GC."

The constraint that you suggest above seems extremely severe for C++.
For instance, suppose we have a case of multiple inheritance
where class D is some other base class than the first, of class E.
If we now create an object by 'new E' and cast the resulting pointer
to the type 'D*', it will no more point to the same address that
the allocator returned (at least in some implementations of C++).
This is legal, sound, and commonplace C++ usage,
but according to your rule the object would no more be "in use".

In fact, I am afraid that the above rule would not allow completely
safe GC to be implemented even for Pascal, although its pointer
handling is so much more restricted. (I can try to construct
an example if requested.)

> ...
>Even free(3) will fail if you pass it an encrypted or XOR-ed or
>otherwise encoded form of a pointer.

(Since we are speaking about C++, we should refer to the 'delete' operator
instead of the 'free' function, which is unknown to C++ standards unless
it has surfaced in Release 2.1.)
Certainly, but 'delete' does not require that you have preserved
the pointer exactly in its original form, without interruption,
all the time from 'new'.
That's the difference!

> ...
>It is *much* safer for application programmers to leave reclamation in
>the hands of a purpose built reclaimer, than to try to design their own
>reclamation policy and algorithm, and even system programmers, that know
>the problems, find manual storage reclamation difficult to get right.

I agree, but my conclusion is:
programmers should prefer cleaner OO languages that make things like this
possible, and not C++.

Markku Sakkinen
Department of Computer Science and Information Systems
University of Jyvaskyla (a's with umlauts)
Seminaarinkatu 15
SF-40100 Jyvaskyla (umlauts again)
Finland
          SAKKINEN@FINJYU.bitnet (alternative network address)

jimad@microsoft.UUCP (Jim ADCOCK) (09/19/90)

In article <PCG.90Sep17152746@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

>Also, there is no degree of conservativeness in a storage reclaimer --
>either it is conservative, and it will consider as a pointer any word
>whose numeric value can be interpreted as an address into the heap, or
>it is not conservative, and it *knows* which words are pointers.

While this statement is probably only intended as a definition: "A reclaimer
is conservative if it always assumes any numerical value could be a pointer
to [into] an object to be retained" -- in practice things are not so black
and white.

You can have hybrid systems where some locations are known to be or not be
pointers, and other words may or may not be pointers, requiring a conservative
approach to those pointers only.

You can have systems where the requirement is that interior pointers "don't
count" towards keeping an object alive, or where they do.

You can have systems where referred-to objects must be quad-aligned to
be kept alive -- objects on two-byte alignment don't count.

You can have systems where objects must be allocated in known pools to count.
Or where the references have to be kept in pools.

You can have systems where the reference must be quad-aligned to count -- 
pointers on two-byte alignment doesn't count.

etc.

Seems to me for any GC system to work, there are a set of rules on fair usage
of objects and references that must be obeyed.  "Conservative" systems [in
whatever shade on black, white or grey] are no different in that regard.
Successful GC systems will have a set of rules that are not onerous for
programmers to obey, either in complexity of programming, nor in the 
execution speed or size of the resulting code.

pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/19/90)

On 18 Sep 90 17:50:14 GMT, jimad@microsoft.UUCP (Jim ADCOCK) said:

jimad> In article <PCG.90Sep17152746@odin.cs.aber.ac.uk>
jimad> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

pcg> Also, there is no degree of conservativeness in a storage reclaimer --
pcg> either it is conservative, and it will consider as a pointer any word
pcg> whose numeric value can be interpreted as an address into the heap, or
pcg> it is not conservative, and it *knows* which words are pointers.

jimad> While this statement is probably only intended as a definition:
jimad> "A reclaimer is conservative if it always assumes any numerical
jimad> value could be a pointer to [into] an object to be retained" --
jimad> in practice things are not so black and white.

Rather the definition goes the other way round: a non conservative
storage reclaimer (whether manual or automatic, ref.counting or g.c.) is
one that knows which values are pointers -- one that has to guess is by
definition conservative, because it has to err on the safe side,
assuming that certain values are pointers when in fact they are not.

There are degrees of efficiency, and maybe this is what Jim Adcock is
referring to; efficiency is inversely related to the percentage of
values mistaken for pointers when they are not.

Note that there is no degree in safety instead -- un unsafe storage
reclaimer is one that will reclaim _any_ 'in-use' storage. In the case
of a conservative storage reclaimer it will be unsafe if even a single
pointer value is not recognized as such (as long as the storage manager
and the programmer agree on what is a pointer).

jimad> [ ... a number of special cases, e.g.: ... ]

jimad> You can have systems where referred-to objects must be
jimad> quad-aligned to be kept alive -- objects on two-byte alignment
jimad> don't count.

jimad> [ ... a number of other special cases ... ]

But really all these special cases relate to the definition of what is
'in-use' or, equivalently, what is a pointer, and this has nothing to do
with whether storage reclamation is manual or automatic, or if g.c. or
ref.counting is used, or whether the g.c. is conservative or not.

jimad> Seems to me for any GC system to work, there are a set of rules
jimad> on fair usage of objects and references that must be obeyed.
jimad> "Conservative" systems [in whatever shade on black, white or
jimad> grey] are no different in that regard.

Absolutely not! There is no need of ad-hoc rules for GC systems. There
must be a definition of 'in use' that both the programmer and the
storage reclaimer agree upon, but this is an essential requisite of
*any* (automatic,manual,ref.counting or g.c. based) storage reclamation
scheme that does not use telepathy (for reading the programmer's mind)
or foresight (to know which storage area will never actually be used
again).

If your program locates employee records using the employee's name
represented as a string, and then an employee retires, and you try to
dispose of the relevant record by saying

	free("A.Retiree");

in a C program, you have written a wrong program, period.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

sakkinen@tukki.jyu.fi (Markku Sakkinen) (09/19/90)

In article <1990Sep18.131307.984@tukki.jyu.fi> I wrote:
>In article <PCG.90Sep17152746@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>> ...
>>The definition of "in use" being that its address as if returned by the
>>storage allocator appears in one or more pointers in storage. If you are
>>using any other definition of "in use" you are making your own rules,
>>and we are no longer speaking of storage allocation or reclamation as is
>>described in the current academic literature on the subject.
>
>Aha, we have been playing with different rules.
> ...
>In fact, I am afraid that the above rule would not allow completely
>safe GC to be implemented even for Pascal, although its pointer
>handling is so much more restricted. (I can try to construct
>an example if requested.)
> ...

To the disappoinment of all readers, here is an example although
I have not been requested:

type bucket = record ... end;
     pail = ... ;
     container = record
         first: pail;
         second: bucket;
         third: ...
         end;

var mystore: ^container;

procedure doit (var vessel: bucket);
    var ... ;
    begin
        mystore := nil;
        ...
        end;

begin
    mystore := new (container);
    doit (mystore^.second);
    ...

After the assignment statement in 'doit' has been executed,
there is nowhere a pointer as returned by the storage allocator,
yet the 'second' part of the allocated record is in use to the
end of the procedure. (This example can be easily rewritten in C or C++.)

Another example is still a little simpler, but I am not 100% sure about
if it is legal standard Pascal. (There is no C or C++ equivalent.)
Assuming the same type and variable definitions as above:

begin
    mystore := new (container);
    with mystore^.second do begin
        mystore := nil;
        ... (* references to components of 'bucket' can be used here *)
        end;
    ...

BTW, now the mouse is already chasing the cat.
I originally claimed that garbage collection is not applicable
to C++ (generally).  Piercarlo Grandi disagreed, and we had some
rounds of dispute.  In his last posting, he presented the rule
quoted above; it is very evident that C++ does not qualify for GC
according to it, but I protested that the rule was too strict.
Why do I want to prevent him from shooting his own foot?

Markku Sakkinen
Department of Computer Science and Information Systems
University of Jyvaskyla (a's with umlauts)
Seminaarinkatu 15
SF-40100 Jyvaskyla (umlauts again)
Finland
          SAKKINEN@FINJYU.bitnet (alternative network address)

pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/19/90)

[ I have added comp.lang.c++ to the newsgroup list, because this is is
discussion mostly on C++ implementation, not on general OO
implementation, but I have ensured that followups are to comp.object to
avoid fragmenting the thread across two newsgroups ]

On 18 Sep 90 13:13:07 GMT, sakkinen@tukki.jyu.fi (Markku Sakkinen) said:

sakkinen> In article <PCG.90Sep17152746@odin.cs.aber.ac.uk>
sakkinen> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

pcg> The definition of "in use" being that its address as if returned by the
pcg> storage allocator appears in one or more pointers in storage. If you are
pcg> using any other definition of "in use" you are making your own rules,
pcg> and we are no longer speaking of storage allocation or reclamation as is
pcg> described in the current academic literature on the subject.

sakkinen> Aha, we have been playing with different rules.

This I seemed to understand. Unfortunately your rules are somewhat
questionable; your arguments are based on the assumption that the
programmer and the storage reclaimer may use a different definition of
'in-use', by using different notions of pointer.

Under such an assumption then you show that storage reclamation is not
safe. This is not an interesting result :-).

sakkinen> I thought we were talking about the applicability of GC to
sakkinen> _general_ C++ programmes. You in contrast are intending only
sakkinen> code that obeys certain additional constraints.

The idea that programmer and storage reclaimer use the same notion of
'in-use' and thus of pointer is not an additional constraint, under any
commonly accepted interpretation of the word.

sakkinen> I have no objection to the statement that "GC is applicable to
sakkinen> C++ programmes as far as they obey rules that make them
sakkinen> susceptible to GC."

The correct wording is "Storage reclamation of any sort is only possible
in C++, as in any language, as far as C++ programmes agree with the
storage reclaimer on what is a pointer".

sakkinen> The constraint that you suggest above seems extremely severe
sakkinen> for C++.

It does not seem so to me; it merely excludes all erroneous programs,
i.e. all those that violate the interface contract with the storage
manager.  The interface contract on operator delete, the C++ manual
storage reclaimer, is even more severe, as in:

sakkinen> For instance, suppose we have a case of multiple inheritance
sakkinen> where class D is some other base class than the first, of class E.

Nitpicking: well, it could be *any* base class -- there is no reason for
which MI has to be implemented by strict prefixing. The problem is valid
in general.

sakkinen> If we now create an object by 'new E' and cast the resulting pointer
sakkinen> to the type 'D*', it will no more point to the same address that
sakkinen> the allocator returned (at least in some implementations of C++).

Well, in that case you cannot use that pointer as operand to operator
delete either, so if you are doing manual storage reclamation your
object has become unreclaimable -- unless you keep around a pointer to
the 'E*', in which case no problem arises under any reclamation policy.

C++ (like most any language I can remember) has a rule that says that
you can only apply operator delete to the same pointer, in value *and*
type, that was returned by some call to operator new. A (conservative or
not) garbage collector need not be that restrictive.

sakkinen> This is legal, sound, and commonplace C++ usage, but according
sakkinen> to your rule the object would no more be "in use".

It would be 'in-use' as long as the resulting pointer did not have a
different *format* (I very carefully wrote that) from that returned by
the storage allocator and returned by the storage deallocator; in your
example the format is not changed, just the value of the pointer is.

For example, this is virtually exactly equivalent to have a pointer to a
structure member, or an array element, in C, as in:

	struct s { int a,b,c; }; int *ip;
	s *s1 = (struct s *) malloc(sizeof (struct s));
	ip = (int *) &s->b; s1 = 0;

Well, after this, a conservative garbage collector will *not* reclaim
the relevant storage area. The storage for fields 'a' and 'c' will be
wasted of course, but this is nearly unavoidable, without type
information and a very sophisticated reclaimer.

Note that the conservative reclaimer will do one better than even a
manual one using free(3) or operator delete would, because if these were
used either one would have to reclaim all of the structure, thus making
'*ip' meaningless, or could not reclaim the structure even if the
reference to 's1->b' were lost, because

	free((void *) ip); ip = 0;

would of course be illegal (by the rule mentioned above that you can
only deallocate an object with the same pointer that was returned by a
call to the storage allocator).

Let's switch to C++ and consider your example

	class C { ... };
	class D { ... };

	class E : C,D { ... };

	E *e = new (E);
	D *d = (D *) d;

Now, as in the example above, if you lose the pointer contained in 'e',
the object allocated by operator new is unreclaimable, because you
cannot write

	e = 0;
	delete d;

Of course you can write

	e = 0;
	delete (E *) d;

but here you are making use of information that you have not given to
the compiler. By contrast a conservative g.c. that uses a BIBOP style
storage allocator will correctly not reclaim that space as in

	e = 0;
	Reclaim()

because 'd' still protect the entire object, but will effciently reclaim
the space when no longer protected by either 'e' or 'd' as in:

	e = 0; d = 0;
	Reclaim();

Knowing this, I did not say that the conservative storage reclaimer
would only work with pointers returned by the allocator, but with
pointers *of the same format*, which is weaker than the equivalent
interface contract on manual storage reclamation.

	In other words, in similar situations, the conservative g.c.
	will safely reclaim *more* space than manual reclamation.

Even illegal pointers are allowed (i.e. those denoting an invalid
address), as long as they have the same format as those returned by the
allocator component of the storage manager.

pcg> Even free(3) will fail if you pass it an encrypted or XOR-ed or
pcg> otherwise encoded form of a pointer.

sakkinen> (Since we are speaking about C++, we should refer to the
sakkinen> 'delete' operator instead of the 'free' function, which is
sakkinen> unknown to C++ standards unless it has surfaced in Release
sakkinen> 2.1.)

Ahhh. Here then we merely need to make the storage reclaimer understand
the format of pointers accepted by operator delete -- if you are using
an underlying storage reclaimer that cannot accept all pointer formats
that the programmer can supply to operator delete, you are using the
wrong reclaimer.

It is an interesting question whether it is always possible to implement
C++ in such a way that all pointer formats used by the implementation
can be fairly easily identified as such by examining a snapshot of
memory, without using type information.

	In other words: can C++ be implemented without modification in
	such a way that the interface contract with the storage manager
	allows for conservative garbage collection?

This is certainly true for C, and almost any other language I can
remember. I think it is also true for C++, because all pointers in it
must point either to the beginning or within a valid object (or array
thereof, except for the harmless end-of-array exception), and a BIBOP
style storage allocator means that one has no problem with pointers to
the middle of an object (like in your example above).

I think that the only problem seems to be with pure based or relative
pointers; but probably this is not a problem, because while a based or
relative pointer is essentially undetectable in a memory snapshot, it is
not meaningful if there is no absolute pointer to the whole object
somewhere, and this absolute pointer will protect the object.

sakkinen> Certainly, but 'delete' does not require that you have
sakkinen> preserved the pointer exactly in its original form, without
sakkinen> interruption, all the time from 'new'.  That's the difference!

Here you seem to say that with manual storage reclamation the programmer
can temporarily change the format of a pointer and leave the
corresponding object "unprotected", because reclamation will only be
triggered manually.

Well, an automatic storage reclaimer need not be triggered at unexpected
times -- indeed any decent automatic storage reclaimer offers you the
option either to be only activated explicitly or to defer its invocation
till the end of a critical region, if you are temporarily violating the
interface contract with it.

pcg> It is *much* safer for application programmers to leave reclamation in
pcg> the hands of a purpose built reclaimer, than to try to design their own
pcg> reclamation policy and algorithm, and even system programmers, that know
pcg> the problems, find manual storage reclamation difficult to get right.

sakkinen> I agree, but my conclusion is: programmers should prefer
sakkinen> cleaner OO languages that make things like this possible, and
sakkinen> not C++.

I beg to differ inasmuch C++ is involved -- the advantages of cleaner
languages than C++ may be:

	1) they are simpler to understand and use than C++, as you have
	so often demonstrated.

	2) they usually do not require a conservative storage reclaimer,
	so that they can reclaim all garbage.

Automatic storage reclamation is *possible* and *safe* in C++, even if
less efficient than in language that provides type information to the
storage manager.

	But again, as to this: the loss of space to garbage is often
	minimal, and I know of languages where, even if possible, the
	implementation does not bother to provide type tables to the
	storage manager, and uses conservative g.c. instead.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/19/90)

On 18 Sep 90 17:50:14 GMT, jimad@microsoft.UUCP (Jim ADCOCK) said:

jimad> Seems to me for any GC system to work, there are a set of rules
jimad> on fair usage of objects and references that must be obeyed.

Just to be more precise: any program that uses a reclaimer has an
interface contract with it, which must be respected. The interesting
question is not whether the contract is more or less strict or fair, but
whether the contract is more or less limiting than the rules of the
language in which the program is written.

It is my contention that automatic reclamation in the guise of
conservative g.c. requires a *less* restrictive interface contract than
the manual reclaimer for a language like C++.

Just to make things clearer, I think it has come the time for me to
summarize my understanding of the storage reclamation problem, so that
at least my own "hidden assumptions" on which I base my reasoning are
made explicit.

Storage reclamation is based on the assumption that it is possible to
identify all portions of data space that will be used in the future by
the program. As a proxy to this unimplementable notion we use instead
the stronger one that it is presumed that a portion of data space will
be used in the future if there are valid references to it, i.e. if it
just *possible* for the program to still reach it.

Conceptually therefore storage reclamation consists of two conceptual
activities, identifying which data areas are still reachable, and
putting back in the pool of free space any data space that is not
reachable.

	Note that because at least of the latter requirement, the
	storage	reclaimer's deallocator component and the allocator must
	be connected; the three together form the storage manager.

Automatic storage reclamation is the case where identification of
reachable data space is done by a generalized, program independent,
algorithm; manual reclamation is the case where each program does the
identification process, and is given direct access to the storage
deallocator.

A garbage collecting automatic reclaimer is the case where
identification of reachable data space is performed (at least in
concept) by examining a snapshot of program memory; a reference counting
automatic reclaimer is one (I cannot think of others though) case where
identification is done by (at least in concept) examining dynamically
how a program references data space.

Normally there are several storage managers supporting a program's
execution (just consider PL/1 or Modula-3, for example); for example
often automatic variables are allocated on a stack associated with a
very simplified reference counting storage manager (in Algol 68 the user
is, unusually, given direct access to the stack manager's storage
allocator, called LOC).

Here we are interested in heap (i.e. arbitrary, non nested scope)
storage managers, and in particular in those that manage their heaps
using garbage collection as an automatic storage reclamation policy.

We also are interested in the further subset of garbage collectors for
which a "reference" is a storage address, or pointer.

	It is conceivable, e.g. to support referential integrity in a
	database system, to have garbage collectors to which a reference
	is a key represented as a string -- we are not discussing any of
	there here, even if it would be nearly the same, for as long as the
	definition of "reference" for both program and storage manager
	is the same, the problem does not really change.

So, what does a garbage collector needs to know?

1)	the extent of the heap(s) it must work upon

2)	the pointers from outside the heap into the heap

3)	which object a pointer points *into*

4)	the size of an object given a pointer to it

5)	which "words" in an object hold pointers

Problems 2), 4) and 5) are often solved by having a type table available
at run time to the storage manager and tagging each object as to its
type; problem 1) is usually solved by having the compiler tag pointers
in the stack and static areas (or any others) as well; problem 3), which
is among the thorniest, is usually solved by forbidding pointers in the
middle or an object, by using for that purpose a pointer+offset pair, of
by forbidding pointers to the middle of an object altogether (Simula-67,
Smalltalk, Eiffel...), and only using whole-object references
throughout.

Unfortunately C/C++ compilers do not produce any runtime information for
the storage reclaimer. Fortunately a conservative garbage collector can
collect *most* garbage using weaker technology than that required in
points 1)-5).

A conservative garbage collector storage reclaimer will be associated
with a BIBOP style storage allocator; there will be a separate heap for
each different size of object, and the collector will have access to the
table describing the extent of each heap, and the size of object
contained therein. Given a pointer it thus becomes easy to know whether
it points into some heap, which heap, and which object within that heap
it denotes.

The only problems that remain are therefore 2) and 5), knowing which
pointers point into the heap, from outside or outside.

Here we have conservativeness: we scan all of non heap memory, and every
word (we assume that pointers will be aligned) whose numeric value
interpreted as an address lies between the bounds of an heap section is
taken to be one. The same is done for every object so found; all words
in every object that contain a numeric value that could be an address
into a heap are assumed to hold pointers, and so on recursively.

Notice that this process does not depend on how pointers were generated
at all; it will only and always collect garbage, because all pointers to
an heap object will have been correctly identified as such. It will not
collect *all* garbage, because some words that contain integers or
strings or floating point numbers will have been conservatively mistaked
for addresses into the heap, and will have protected some garbage from
being reclaimed.

Notice that this is in practice extremely rare, taking a little bit of
care, because it is not difficult to ensure that pointer into the heaps
take numeric values that rarely occur as integers, strings, floats.

The efficiency of conservative garbage collection, measured as the
percentage of garbage that is not reclaimed, has been proven to be quite
high; this is because on many common os/architecture pairs only few
words that are not pointers into the heap look like they were, even
without taking a lot of care, so that little garbage is left
unreclaimed.

	Why? because on many machines the heaps are segments in fairly
	high storage addresses, the typical heap address being often a
	bit pattern like

		0x0a00d1fc0

	Most integers are not like this (almost all integers are < 100),
	neither are most floats (this looks like being denormalized),
	neither are most strings (strings are usually made up of
	alphanumerical characters, and are often longer), etc...

Readily accesible papers on conservative garbage collection are those of
Boehm, Bertlett and Edelson. I have already posted a compilation of
information on this subject a while ago.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

moss@cs.umass.edu (Eliot Moss) (09/20/90)

Modula-3 allows safe garbage collection, and it can be thought of as an
extension to Pascal. The rule that M3 adds is that a variable must at all
times contain a valid value of the variable's type. In general, this means
that newly allocated pointers must be set to NIL before the program can see
them. Appropriately sophisticated communication from the compiler to the
run-time system (in the form of tables), plus good data-flow analysis based
optimizations (e.g., dead store removal, etc.) will virtually eliminate any
overhead this would incur, except for cases where the program fails to
initialize items that appear to require initialization.

This is exactly the approach we're taking in merging true garbage collection
with heavy and sophisticated optimizations for Modula-3. I believe this can be
done with any type safe language. Unfortunately, C and C++ are not type safe.
--

		J. Eliot B. Moss, Assistant Professor
		Department of Computer and Information Science
		Lederle Graduate Research Center
		University of Massachusetts
		Amherst, MA  01003
		(413) 545-4206; Moss@cs.umass.edu

thomasw@hpcupt1.HP.COM (Thomas Wang) (09/20/90)

It is possible to use a non-conservative GC with C++.  The requirement is
that the concept of a 'GC pointer' need to be encapsulated into a C++ class.

I have done such an implementation as part of a Master's thesis at
Cal Poly.  It worked with single inheritance only.  Unfortunately C++
multiple inheritance is incompatible even with the 'GC pointer' encapsulation
scheme.

I think the thesis can be ordered at the California Polytechnic University
library.  The thesis is called 'The "MM" Garbage Collector for C++'.


 -Thomas Wang
              (Everything is an object.)
                                                     wang@hpdmsjlm.cup.hp.com
                                                     thomasw@hpcupt1.cup.hp.com

jimad@microsoft.UUCP (Jim ADCOCK) (09/21/90)

In article <PCG.90Sep19135618@teacha.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
|Just to make things clearer, I think it has come the time for me to
|summarize my understanding of the storage reclamation problem, so that
|at least my own "hidden assumptions" on which I base my reasoning are
|made explicit.
....
|The efficiency of conservative garbage collection, measured as the
|percentage of garbage that is not reclaimed, has been proven to be quite
|high; this is because on many common os/architecture pairs only few
|words that are not pointers into the heap look like they were, even
|without taking a lot of care, so that little garbage is left
|unreclaimed.
|
|	Why? because on many machines the heaps are segments in fairly
|	high storage addresses, the typical heap address being often a
|	bit pattern like
|
|		0x0a00d1fc0
|
|	Most integers are not like this (almost all integers are < 100),
|	neither are most floats (this looks like being denormalized),
|	neither are most strings (strings are usually made up of
|	alphanumerical characters, and are often longer), etc...

"Just to make things clearer" -- you're assuming you're running on machines
with 32-bit integers, and 32-bit linear addresses.  There's at least one
large interesting class of machines on which these assumptions fail miserably.

pallas@red-dwarf.Eng.Sun.COM (Joseph Pallas) (09/21/90)

In <PCG.90Sep19132024@teacha.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

pcg> The definition of "in use" being that its address as if returned by the
pcg> storage allocator appears in one or more pointers in storage. 

pcg> It merely excludes all erroneous programs, i.e. all those that
pcg> violate the interface contract with the storage manager.

This is a novel interpretation, I should say.  The C++ language most
definitely does not include such a constraint in the language
definition.

sakkinen> For instance, suppose we have a case of multiple inheritance
sakkinen> where class D is some other base class than the first, of class E.
sakkinen> If we now create an object by 'new E' and cast the resulting pointer
sakkinen> to the type 'D*', it will no more point to the same address that
sakkinen> the allocator returned (at least in some implementations of C++).

pcg> Well, in that case you cannot use that pointer as operand to operator
pcg> delete either, so if you are doing manual storage reclamation your
pcg> object has become unreclaimable -- unless you keep around a pointer to
pcg> the 'E*', in which case no problem arises under any reclamation policy.

This is a substantial constraint you are suggesting: automatic storage
reclamation that works correctly only if the program can be
transformed, by just the addition of delete operations, into one that
correctly reclaims all garbage with manual storage reclamation.  We
seem to agree that the set of such programs is smaller than the set of
valid/legal programs.

pcg> It would be 'in-use' as long as the resulting pointer did not
pcg> have a different *format* (I very carefully wrote that) from that
pcg> returned by the storage allocator and returned by the storage
pcg> deallocator; in your example the format is not changed, just the
pcg> value of the pointer is.

No, you did not very carefully write that.  What you very carefully
wrote (above) was that the address of the object (i.e., the VALUE, not
just the format) appears in a pointer in storage.

You seem to be assuming your conservative collector honors all
interior pointers.  But that is not what you are writing.  It is no
wonder that this is causing confusion.

pcg> Note that the conservative reclaimer will do one better than even
pcg> a manual one using free(3) or operator delete would, because if
pcg> these were used either one would have to reclaim all of the
pcg> structure, ... or could not reclaim the structure even if the
pcg> reference ... were lost ....

Now you're not making sense.  Either you honor interior pointers or
you don't.  If you do, your collector cannot reclaim anything in this
example.  If you don't, your above claims are invalidated.  Which is
it?

joe

pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/23/90)

On 21 Sep 90 00:48:10 GMT, pallas@red-dwarf.Eng.Sun.COM (Joseph Pallas) said:

pallas> In <PCG.90Sep19132024@teacha.cs.aber.ac.uk> pcg@cs.aber.ac.uk
pallas> (Piercarlo Grandi) writes:

pcg> The definition of "in use" being that its address as if returned by the
pcg> storage allocator appears in one or more pointers in storage. 

pcg> It merely excludes all erroneous programs, i.e. all those that
pcg> violate the interface contract with the storage manager.

pallas> This is a novel interpretation, I should say.  The C++ language
pallas> most definitely does not include such a constraint in the
pallas> language definition.

I am sorry, but I cannot understand your comment. The definition was not
specific in any way to C++. As understood in academic circles, the
garbage collection problem is: given a memory snapshot with recognizable
pointers and object boundaries, and a list of root pointers, identify
all objects reachable from that list, and reclaim all the others.

In this context the language definition here is irrelevant; what is
relevant, I am fairly tired here of repeating it, is the language
implementation. Can we have a faithful and efficient language
implementation (for any given language X) such that conservative g.c. is
possible? For Pascal and C, definitely yes. But for obscure details that
can turn up yet, this is also possible for C++.

Again: a conservative g.c. storage reclaimer cannot work with a language
implementation that uses a format for pointers different from that for
which the storage manager has been written. This is not, as I understand
it, a constraint. It is simply a condition for the implementation to
work at all, i.e. not to be erroneous.

Using a conservative g.c. would be a 'constraint' if it required changes
in the language definition, not just respect of its interface contract
from the other components of the implementation.

sakkinen> For instance, suppose we have a case of multiple inheritance
sakkinen> where class D is some other base class than the first, of class E.
sakkinen> If we now create an object by 'new E' and cast the resulting pointer
sakkinen> to the type 'D*', it will no more point to the same address that
sakkinen> the allocator returned (at least in some implementations of C++).

pcg> Well, in that case you cannot use that pointer as operand to operator
pcg> delete either, so if you are doing manual storage reclamation your
pcg> object has become unreclaimable -- unless you keep around a pointer to
pcg> the 'E*', in which case no problem arises under any reclamation policy.

pallas> This is a substantial constraint you are suggesting: automatic storage
pallas> reclamation that works correctly only if the program can be
pallas> transformed, by just the addition of delete operations, into one that
pallas> correctly reclaims all garbage with manual storage reclamation.  We
pallas> seem to agree that the set of such programs is smaller than the set of
pallas> valid/legal programs.

Again this seems very obscure to me -- you seem to be suggesting that a
program that does not reclaim storage, either totally or partially, is
valid and legal.

It may be true in the sense that a program that does no storage
reclamation gives the same results as one that does (modulo destructors,
that is) -- storage consumption after all is part of pragmatics, not
semantics, so it may seem you have a point.

    Incidentally I am quite disappointed that nobody so far has raised
    the thorniest problem with C++ and automatic storage reclamation,
    especially conservative g.c., which is indeed destructors.

The problem however we are discussing is *reclamation*, not just
semantics, in the sense that we have already granted that pragmatics
*are* relevant to our discussion, because we want to do reclamation.
Sakkinen has explicitly stated that we are not interested in the null
reclaimer. We are interested in comparing a manual reclamation policy
with one based on a conservative g.c., I have so far presumed.

So the question is whether conservative g.c. can be implemented in such
a way that it will indeed identify and *reclaim* as much non in use
storage, and no in-use storage, without altering a program's behaviour,
as would a manual reclaimer, or better.

But even assuming that leaving storage unreclaimed is not a mistake,
programmers can always disable the storage reclaimer for a critical
section, just as they can refrain from deallocating manually an object
if still holding pointers to its members. Nobody *forces* you to always
have the collector enabled either.

It is true that refraining from calling operator delete is not the same
as calling say 'DontReclaim()' (or 'DontReclaim(object)' for a more
selective disabling), but consider that these are pretty marginal cases
on one hand, and on the other that in general the program texts that are
designed for different storage management policies need not be the same,
as long as they are equivalent (just for the sake of example, if the
storage allocation policy changes, so may the number of arguments to
operator new -- for example we may add to each call a binary value to
select fast/slow storage). Typically the program logic that depends on
the details of the interface contract to the storage manager will, in
well written programs, be relegated to some low level module. This will
be particularly important in the case of manual storage reclamation.

pcg> It would be 'in-use' as long as the resulting pointer did not have
pcg> a different *format* (I very carefully wrote that) from that
pcg> returned by the storage allocator and returned by the storage
pcg> deallocator; in your example the format is not changed, just the
pcg> value of the pointer is.

pallas> No, you did not very carefully write that.

I did, I did. You quoted me:

pcg> address *as if* returned by the storage allocator

I think I also used *format* explicitly in some other of my excessively
long articles.

pallas> What you very carefully wrote (above) was that the address of
pallas> the object (i.e., the VALUE, not just the format) appears in a
pallas> pointer in storage.

I do not understand you again here. Of course a pointer, which must be
in the same format as that of pointers to objects returned by the
allocator, must contain a value. Which value is irrelevant -- I have
even allowed explicitly illegal addresses, addresses to members of a
composite object, obtained with any form of arithmetic on pointers,
etc... Or maybe I did assume here that it was understood that 'object'
in general also denotes an object that is part (member or inherited) of
another object, as in the example by Sakkinen.

This is why the *as if* above -- a pointer to an object part of another
object will not be actually one of those returned by the storage
collector, but one derived by pointer arithmetic from one of those, but
as long as it looks *as if* it had been returned by the storage
allocator (i.e. it has the same format), the value of a pointer word is
irrelevant.

pallas> You seem to be assuming your conservative collector honors all
pallas> interior pointers.  But that is not what you are writing.

I am sorry I did not make myself understood here -- as I have repeated
to exhaustion, if you use a BIBOP style storage allocator, a pointer to
the middle of a composite object trivially implies the pointer to the
beginning of the (whole) object, and will protect the entire object. It
is true however that in general, i.e.  without a BIBOP style storage
allocator, or the use of type tables, a pointer to an object that is
part of another is not equivalent, storage reclamation wise, to a
pointer to the whole composite object in which it sits.

pcg> Note that the conservative reclaimer will do one better than even
pcg> a manual one using free(3) or operator delete would, because if
pcg> these were used either one would have to reclaim all of the
pcg> structure, ... or could not reclaim the structure even if the
pcg> reference ... were lost ....

pallas> Now you're not making sense.  Either you honor interior pointers
pallas> or you don't.  If you do, your collector cannot reclaim anything
pallas> in this example.  If you don't, your above claims are
pallas> invalidated.  Which is it?

I do not understand again what you are writing. I am quite tired in
effect, so that maybe why. I will try to express myself better, maybe
this will help:

Sakkinen proposed an example in which a composite object is protected
only by a pointer to an object part of it. The composite object *cannot*
be reclaimed manually, because a pointer to one of its parts cannot be
used with operator delete, because it has the wrong type (if nothing
else).

A conservative g.c. (that cooperates with a BIBOP style storage
allocator) will not reclaim the composite object as long as a pointer to
an object that is part of it is still around; when no such pointers
remain, the entire composite object will be reclaimed. This looks to me
an improvement, and a safe one as to that, over manual reclamation.

	As I said, it takes a sophisticated storage reclaimer, manual or
	automatic, and a type table at runtime, to be able to reclaim
	only the unreachable subojects of a composite object,
	leaving the reachable subobjects alone. operator delete is not
	even allowed to do it, because it must be presented with a
	pointer with the same type and value as one returned by a
	call to operator new.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

pcg@aber-cs.UUCP (Piercarlo Grandi) (09/23/90)

In article <57605@microsoft.UUCP> jimad@microsoft.UUCP (Jim ADCOCK) writes:

  |The efficiency of conservative garbage collection, measured as the
  |percentage of garbage that is not reclaimed, has been proven to be quite
  |high; this is because on many common os/architecture pairs only few
  |words that are not pointers into the heap look like they were, even
  |without taking a lot of care, so that little garbage is left
  |unreclaimed.
  |
  |	Why? because on many machines the heaps are segments in fairly
  |	high storage addresses, the typical heap address being often a
  |	bit pattern like
  |
  |		0x0a00d1fc0
  
  "Just to make things clearer" -- you're assuming you're running on machines
  with 32-bit integers, and 32-bit linear addresses.

You are assuming this -- I have written 'on many common os/architecture
pairs'.

  There's at least one large interesting class of machines on which these
  assumptions fail miserably.

Maybe one can be clever enough to find a storage managemen scheme by which
integer, string or float values are not easily mistaken with addresses also
for this one large interesting (for marketing departments only, I surmise
:->) class of machines -- or may be not.

I guess that for the large interesting class of machines and pseudo os
architectures you might be thinking about this can be done with fairly good
efficiency. Especially if you can make the storage allocator and compiler
cooperate.
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/25/90)

I had written, after stating that conservative g.c. in practice, on many
popular machine/os combinations, leaves unreclaimed a fairly small
percentage of garbage:

pcg>   Why? because on many machines the heaps are segments in fairly
pcg>   high storage addresses, the typical heap address being often a
pcg>   bit pattern like

pcg>   		0x0a00d1fc0

In article <57605@microsoft.UUCP> jimad@microsoft.UUCP (Jim ADCOCK)
comments:

jimad> "Just to make things clearer" -- you're assuming you're running
jimad> on machines with 32-bit integers, and 32-bit linear addresses.
jimad> There's at least one large interesting class of machines on which
jimad> these assumptions fail miserably.

To this I replied:

pcg> I guess that for the large interesting class of machines and pseudo
pcg> os architectures you might be thinking about this can be done with
pcg> fairly good efficiency. Especially if you can make the storage
pcg> allocator and compiler cooperate.

"Just to be clearer": _if_ Jim Adcock was referring to that class of
machines and operating systems where each memory location has 4096
(2^(32-20)) equivalent addresses, one can conventionally use one of
these synonyms to indicate a heap pointer. So for example non-heap
pointers may systematically use the 0th synonym, and heap pointers would
use the 2048th. Hints of tagging...

G.c. support for SIL/MOHLL OO languages (a singleton set) need not be
confined to linear address architectures.

	Naturally there is some problem with mixed model programs, i.e.
	those that use both far and near pointers at the same time. But
	using them is always your choice, isn't it, and they are not in
	C++ anyhow...
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

jimad@microsoft.UUCP (Jim ADCOCK) (09/25/90)

In article <2030@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
|  There's at least one large interesting class of machines on which these
|  assumptions fail miserably.
|
|Maybe one can be clever enough to find a storage managemen scheme by which
|integer, string or float values are not easily mistaken with addresses also
|for this one large interesting (for marketing departments only, I surmise
|:->) class of machines -- or may be not.

I take it you don't find solving hard problems for a large number of
real world customers "interesting" ?

-- In any case, I think you mistake the problem.  On a few machines/os pairs
that are interesting from a "marketing" point of view, including Mac, Dos,
OS/2, Windows, etc, the pointer model chosen is often a 16:16 segment:offset
pair, rather than a 0:32-bit linear address.  This would not in itself be a 
fatal problem, except that segment and offset are frequently stored separately,
making "find all values that might be a valid object address" an N^2 problem.

|.... Especially if you can make the storage allocator and compiler
|cooperate.

If one has compiler cooperation, *why* would one use conservative approaches?

pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/25/90)

On 24 Sep 90 21:22:55 GMT, jimad@microsoft.UUCP (Jim ADCOCK) said:

jimad> In article <2030@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo
jimad> Grandi) writes:

pcg> .... Especially if you can make the storage allocator and compiler
pcg> cooperate.

jimad> If one has compiler cooperation, *why* would one use conservative
jimad> approaches?

Because it may entail less overhead. It may be easier just to sweep
memory en-masse than to pause to consult (possibly large, complex) type
tables. I have been told of one Algol 68 compiler of some years ago,
generating C code, that instead of generating type tables as C data
structures would use conservative g.c.

The idea being that for many common machines the garbage uncollected
would cost less than the overhead of really knowing which words are
pointers. Not to mention that a conservative g.c. is probably far
simpler to commission than one that uses tagging and type tables.


As to the other problem raised by Jim Adcock:

jimad> On a few machines/os pairs that are interesting from a
jimad> "marketing" point of view, including Mac, Dos, OS/2, Windows,
jimad> etc, the pointer model chosen is often a 16:16 segment:offset
jimad> pair, rather than a 0:32-bit linear address.  This would not in
jimad> itself be a fatal problem, except that segment and offset are
jimad> frequently stored separately, making "find all values that might
jimad> be a valid object address" an N^2 problem.

You have a point, one cannot always win. If you mix near and far
pointers, you get into some trouble (and not just because of
conservative g.c.). On the other hand the compiler could help a bit.
But then the collector becomes hybridly conservative, I guess. After
all, we are back to our previous theme: any storage manager design will
not work under any possible condition.

Also, I think that it is not terribly difficult to rework those
applications so that they use only near or only far pointer, and at a
probably tolerable cost in (space) efficiency. After all, nothing forces
you to use near and far pointers in the same program; I would even say
that strictly speaking this is not possible in C++.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

jimad@microsoft.UUCP (Jim ADCOCK) (09/26/90)

In article <PCG.90Sep24181007@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
|"Just to be clearer": _if_ Jim Adcock was referring to that class of
|machines and operating systems where each memory location has 4096
|(2^(32-20)) equivalent addresses, one can conventionally use one of
|these synonyms to indicate a heap pointer. So for example non-heap
|pointers may systematically use the 0th synonym, and heap pointers would
|use the 2048th. Hints of tagging...

Nah, because such address synonyms only apply to one obsolescent member of
the family.  Making use of such address puns is the moral equivalent of
people who play with the high bits in a 32 bit linear pointer -- because they
"know" their particular CPU board only implements 24 address lines. In any case,
making explicit use of address synonyms doesn't make much progress on 
the underlying problem, which is that segment and offset parts of a 
pointer are commonly stored separately, such that a "conservative" approach
requires the consideration of n^2 segment:offset word-pairs as "possible 
addresses of objects to be kept alive."

tom@ssd.csd.harris.com (Tom Horsley) (09/26/90)

>>>>> Regarding Re: C++ and garbage collection; pcg@cs.aber.ac.uk (Piercarlo Grandi) adds:
pcg> Also, I think that it is not terribly difficult to rework those
pcg> applications so that they use only near or only far pointer, and at a
pcg> probably tolerable cost in (space) efficiency. After all, nothing
pcg> forces you to use near and far pointers in the same program; I would
pcg> even say that strictly speaking this is not possible in C++.

To say that "nothing forces you to use near and far pointers in the same
program" is absolute nonsense. Performance is not `nothing'. When dealing
with insane and brain damaged architectures produced by the twisted
minds of Intel engineers it is often *absolutely necessary* to mix far
and near pointers in order to get a program that even comes close to
having an acceptable amount of overhead. Saying, "OK, well I'll just use
far pointers for everything" is a sure-fire way to make a 30 second
task take 2 hours.

Furthermore, it is not only possible to mix far and near pointers in C++, it
is actually much more convenient than in C. With C++ you get better type
checking and you can use overloading to provide different versions of "the
same" routine which accept far pointers in some case and near pointers in
another. (Not that it is any fun, of course. Personally I am so sick of
trying to code on this ridiculous architecture that I am about to buy a DG
AviioN, get a real machine with a real address space, and never write 'far'
or 'near' again...).
--
======================================================================
domain: tahorsley@csd.harris.com       USMail: Tom Horsley
  uucp: ...!uunet!hcx1!tahorsley               511 Kingbird Circle
                                               Delray Beach, FL  33444
+==== Censorship is the only form of Obscenity ======================+
|     (Wait, I forgot government tobacco subsidies...)               |
+====================================================================+

chased@rbbb.Eng.Sun.COM (David Chase) (09/27/90)

One thing I haven't noticed yet is mention of transformations applied
to pointers by an optimizing compiler.  I've also got some experience
using the Boehm-Weiser collector, both in "conservative" mode
(ignoring interior pointers) and in "hyper-conservative" mode
(treating interior pointers as pointers).


INTERIOR POINTERS

Treating interior pointers as pointers is not a trivial matter.  In
one application (a Modula-3 compile server) the collector failed to
reclaim large amounts of storage.  This required changes to the
program to manually recycle certain pieces of storage.


OPTIMIZATION AND THE COLLECTOR

If your C++ "compiler" is just a translator to C (or a compiler
written in ignorance of the needs of a garbage collector) then there
is no guarantee at all that the compiler will maintain pointers in the
format required by the collector.  I'll provide some contrived
examples:
    
    char * x;

    for (i = A; i < B; i++)
       {
        ... x[i + C] ... /* C is constant in the loop */
       }
    /* No subsequent uses of x */

It is quite likely that the expression "&(x[C])" could be
recognized as loop invariant and hoisted out of the loop.
There is no guarantee that C is greater than zero, and none is
necessary since the language only requires that "i + C" be a valid
index for the array.


Edelson doesn't rely on conservative pointer location; instead, he
links pointer roots together in lists to aid in location.  This avoids
the problems with conservative collection, but may not work correctly
in (say) a multi-threaded system on a machine with instruction
scheduling in the compiler (e.g., MIPS, Sun, IBM RS/6000), and
preemptive scheduling in the OS (e.g., Mach).  The example goes
something like this:

   int f()
   {
      R_t x;      /* Really
                     x.ptr = NULL;
                     x.link = head.root_list;
                     head.root_list = & x;
                   */
      register int y;
      ...
      y = x -> m; /* Really overloads and inlines into "x.ptr -> m" */
      x = NULL;   /* Really "x.ptr = NULL */

     return  y;
     /* Implicitly, head.root_list = x.link at procedure exit. */
   }

It may not improve the speed of the program (but it might) to first
form the address of "x.ptr -> m" (in a temporary, "t"), then assign
NULL to x.ptr, and then dereference "t" to get the value for "y".  "X"
is local, and not declared volatile, and contains no integers, so a
compiler might reasonably assume that it is safe to reorder the
assignment to "x.ptr" and the load from the "t".  However, if this
thread is suspended after the assignment, but before the load, then
the garbage collector could erronously recycle the memory referenced
by "x".


Now, I haven't gone to the trouble of trying to provoke a compiler
into doing this, but most RISC optimizers fool around with their
instruction schedules, and super-scalar machines often benefit by
getting some separation between their memory operations; it seems to
me that there is some burden on the proponents of garbage collection
in C++ to demonstrate (dare I say, "prove"?) that their collector will
work with a gc-ignorant optimizing compiler, or else I must charge to
the collector the time wasted by running unoptimized code.  Of course,
this needs to be reproved with each new release of the compiler.

In the short term, many of these problems can be solved by the use of
"volatile", though that won't help my program's speed much either.

David Chase
Sun Microsystems

thomasw@hpcupt1.HP.COM (Thomas Wang) (09/28/90)

/ hpcupt1:comp.object / chased@rbbb.Eng.Sun.COM (David Chase) /  2:45 pm  Sep 26, 1990 /
>One thing I haven't noticed yet is mention of transformations applied
>to pointers by an optimizing compiler.

>INTERIOR POINTERS

>OPTIMIZATION AND THE COLLECTOR

>it seems to
>me that there is some burden on the proponents of garbage collection
>in C++ to demonstrate (dare I say, "prove"?) that their collector will
>work with a gc-ignorant optimizing compiler, or else I must charge to
>the collector the time wasted by running unoptimized code.  Of course,
>this needs to be reproved with each new release of the compiler.

If one think a pointer is a 32 bit entity, then you are limited to
C++ GC implementations that have some undesirable properties.

But in today's world, where memory is relatively cheap, there is no reason why
a pointer cannot be encapsulated into an abstract data type that understands
operations on garbage collection.

With this approach, then you need 3 extra requirements for this scheme to
work.

(1)  Every structure must know how many GC pointer it contains, and where.
     Some virtual GC member functions need to be written for each structure.

Manually doing this is tedious, so it is possible to write a template class
generator to make this task semi-automatic.

(2)  GC pointers inside a structure must be initialized to NULL before its
     constructor is called.

This is required, because GC may be triggered during object construction.
The implementation is a little bit tricky; I had to use a macro for object
creation.

(3)  There must be a way to know automatically if a GC pointer is root pointer
     or not.

I assume every GC pointer is root, until they become not root.  For GC pointers
inside a structure, the constructor for that structure will turn all its
GC pointers to non-root.

The result is a non-conservative C++ collector that is also portable.


 -Thomas Wang
              (Everything is an object.)
                                                     wang@hpdmsjlm.cup.hp.com
                                                     thomasw@hpcupt1.cup.hp.com

pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/28/90)

On 26 Sep 90 21:45:14 GMT, chased@rbbb.Eng.Sun.COM (David Chase) said:

chased> One thing I haven't noticed yet is mention of transformations
chased> applied to pointers by an optimizing compiler.

Point one: I do not like optimizing compilers :->. Point two:
conservative g.c. *may* require compiler cooperation.

Just repeat with me: no storage reclamation policy works under an
arbitrary interface contract; what matters is whether the required
interface contract is more or less restrictive than the *language*
definition, not the interface contract of some existing compiler X.

If the interface contracts of a particular conservative g.c.
implementation and of a particular compiler X are not compatible, tough.

chased> Treating interior pointers as pointers is not a trivial matter.
chased> In one application (a Modula-3 compile server) the collector
chased> failed to reclaim large amounts of storage.

I must confess I am extremely surprised. Hyper-conservative (as you call
it) collection tests for possible pointerhood by saying (if value
interpreted as an address is within the heap), conservative adds (...
and is on an object address boundary). I would tend to think that the
first test alone is sufficient to screen out virtually all non-pointers,
if the heap addresses are suitably arranged. Maybe you had your heap in
low storage (e.g. you were using a heap in the BSS of a BSD derived
UNIX) -- to work well conservative g.c. must *at least* count on a
cooperative storage allocator design.

chased> it seems to me that there is some burden on the proponents of
chased> garbage collection in C++ to demonstrate (dare I say, "prove"?)
chased> that their collector will work with a gc-ignorant optimizing
chased> compiler,

There is no such burden -- repeat again: if the compiler has an
interface contract, because of optimization or otherwise, that is not
compatible with that of the storage reclaimer, tough. What needs to
be considered is whether there are no alternative implementation
strategies that provide overall (that is compiler + runtime system)
better performance. And even if there is a storage reclamation policy
that has equal or better performance than g.c. under a certain
compilation policy, which is equal or better than any other, one may
still prefer conservative g.c. under a not as good compiler, because it
is *convenient*.    

chased> or else I must charge to the collector the time wasted by
chased> running unoptimized code.  Of course, this needs to be reproved
chased> with each new release of the compiler.  In the short term, many
chased> of these problems can be solved by the use of "volatile", though
chased> that won't help my program's speed much either.

Ahem -- let's switch the burden of proof: let the proponents of manual
storage reclamation, or automatic storage reclamation that uses some
policy different from g.c., prove that *their* alternative overall
involves less overheads than even 'volatilized' conservative g.c.,
in a *system wide* comaprison (speed of compiled code + overhead of
storage reclamation).

You cannot just point at the problems with g.c. in one context and say
"ha ha". Yo want to compare the alternatives, which are reclamation done
otherwise, or no reclamation at all.

Lacking hard data, let's turn to educated guesses (handwaving :->): the
leading alternative to conservative g.c., manual or automatic reference
counting, seems likely to have a much larger performance impact than the
loss of some (dubious IMNHO) compiler optimization; this both in CPU
overheads, and in catastrophic worsening of locality under VM, even when
the best algorithms are considered.

--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/28/90)

>>>>> Regarding Re: C++ and garbage collection; pcg@cs.aber.ac.uk
>>>>> (Piercarlo Grandi) adds:

pcg> Also, I think that it is not terribly difficult to rework those
pcg> applications so that they use only near or only far pointer, and at a
pcg> probably tolerable cost in (space) efficiency. After all, nothing
pcg> forces you to use near and far pointers in the same program; I would
pcg> even say that strictly speaking this is not possible in C++.

On 26 Sep 90 11:54:55 GMT, tom@ssd.csd.harris.com (Tom Horsley) said:

A lot of agreeable things (Go for the AviiON!), but also:

tom> Saying, "OK, well I'll just use far pointers for everything" is a
tom> sure-fire way to make a 30 second task take 2 hours.

OK, we are no longer discussing on the high plane points of language
definitions and interface contracts etc...; here we are discussing howe
to fight with an hostile environment and how to play it dirty safely.

So, you can consider three points:

1) if using far/near pointers makes you use a vastly inferior storage
reclaimer, as a naive reference counting one may be, then the saving
from conservative g.c. may be greater than the overhead of using far
pointers alone.

2) it is my experience that the exclusive use of far pointers slows down
(or grows in space) even pointer intensive programs by something around
30%; a far pointer dereferencing operation takes about double the cycles
of a near pointer dereference, plus a few extra instructions to laod the
segment register, so at most the speed is impacted by 2-3 times if the
program does *only* pointer dereferences.

3) if you use near/far pointers you are playing it dirty, because you
are specifically accepting the idea that you are doing special case
programming for performance reason. So, let me observe that in most
every case you use near pointers as offsets (relative pointers) in some
data structure to which you have somewhere a far pointer; it is from
this far pointer that you get the segment selector after all. Well, the
far pointer will protect the whole data structure from reclamation. Now,
if you also want to reclaim storage *within* that data structure, there
yopu have problems, and need some compiler assistance. You don't have
it? Tough. Complain to your comnpiler vendor that ehri compiler does not
support g.c. (conservative or otherwise) and so deprives you of
consistent performance and safety benefits.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

chased@rbbb.Eng.Sun.COM (David Chase) (09/29/90)

In article <PCG.90Sep27200339@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>On 26 Sep 90 21:45:14 GMT, chased@rbbb.Eng.Sun.COM (David Chase) said:
>
>chased> One thing I haven't noticed yet is mention of transformations
>chased> applied to pointers by an optimizing compiler.
>
>Point one: I do not like optimizing compilers :->. Point two:
>conservative g.c. *may* require compiler cooperation.
...
> Let's switch the burden of proof: let the proponents of manual
>storage reclamation ... prove that *their* alternative overall
>involves less overheads than even 'volatilized' conservative g.c.,
>in a *system wide* comaprison (speed of compiled code + overhead of
>storage reclamation).

I agree that conservative GC may require compiler cooperation, but in
practice anyone using a translator to C is not getting it.  I also
agree that time spent in the garbage collector may be paid back in
other ways (faster allocation, increased locality of reference).
These effects can be significant; Edelson shows that allocation (that
is, allocation and initialization in the building of an expression
tree) using his compacting collector is over twice as fast as
allocation using the system-supplied malloc, and over three times
faster than allocation using reference-counting reclamation.

However, the advantage is not so clear when the time for garbage
collection is added; it varies from twice as fast (all objects
discarded) to slightly slower (no objects discarded).  The
conventional wisdom for Lisp and Smalltalk is that most objects are
discarded, implying that a figure of "twice as fast" ought to hold.

However
(1) Conventional wisdom for Lisp/Smalltalk may not hold for C/C++;
    Lisp and Smalltalk rely on garbage collection for activation
    records and real numbers, for instance, and C does not
    (optimization of Lisp cleans this up somewhat).

(2) Edelson's benchmark compares the *standard* unix allocator with
    his garbage collected allocator.  It is (unfortunately) fairly
    standard practice to interpose user-managed storage caches for
    high-traffic storage sizes, and thus run-time expense of manual
    storage reclamation is easily avoided without turning off the 
    optimizer, and without pauses for garbage collection.

(3) On one benchmark of C (*) that is already heavily hand-optimized,
    and making heavy use of memory allocation (allocating 3-10 meg),
    unoptimized code takes 40% longer than unoptimized code.  I expect
    that the costs will only get larger in the future; new machines
    are not being designed with unoptimized code in mind.

(4) I've used Hans Boehm's collector in place of malloc with this very
    same benchmark, compiled with full optimization, in conservative
    (not hyper-conservative) mode.  It ran 10% slower.

(5) Profiling indicates that memory management takes less than 3% of
    the time; thus, reducing its cost to zero would not offset the
    additional cost of running unoptimized code.

Now, I've said nothing about time wasted by the programmer in writing
the manual storage management code, and nothing about the likelihood
of bugs in the two systems.  Those two costs put me strongly in favor
of garbage collection.

However, those two costs also put me strongly in favor of compiler
optimizations -- as a rule, the compiler does a faster, more correct,
more effective, and more reliable job of micro-optimization than I do,
and though I prefer garbage collection, I do not like the cost of
turning off the optimizer, and I think many people would like it even
less than I do.

David Chase
Sun Microsystems

(*) The benchmark is a bottom-up tree pattern matcher generator.  You
can read about the algorithmic improvements in the 1987 POPL
proceedings.  The code has been heavily hand-optimized across Vax
11/750, Sun 3, and IBM RT.  I can send the code to anyone who cares; I
wrote it as a student.

jimad@microsoft.UUCP (Jim ADCOCK) (09/29/90)

In article <PCG.90Sep25174317@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
|jimad> On a few machines/os pairs that are interesting from a
|jimad> "marketing" point of view, including Mac, Dos, OS/2, Windows,
|jimad> etc, the pointer model chosen is often a 16:16 segment:offset
|jimad> pair, rather than a 0:32-bit linear address.  This would not in
|jimad> itself be a fatal problem, except that segment and offset are
|jimad> frequently stored separately, making "find all values that might
|jimad> be a valid object address" an N^2 problem.
|
|You have a point, one cannot always win. If you mix near and far
|pointers, you get into some trouble (and not just because of
|conservative g.c.). On the other hand the compiler could help a bit.
|But then the collector becomes hybridly conservative, I guess. After
|all, we are back to our previous theme: any storage manager design will
|not work under any possible condition.
|
|Also, I think that it is not terribly difficult to rework those
|applications so that they use only near or only far pointer, and at a
|probably tolerable cost in (space) efficiency. After all, nothing forces
|you to use near and far pointers in the same program; I would even say
|that strictly speaking this is not possible in C++.

Sorry, I think I have still not made clear my point.  The N^2'd-ness of
the problem is not due to programmer's mixing near and far pointers,
but rather due to compilers forced to target machines with separate 
registers for the segment and offset parts of a far pointer.  When a 
compiler needs to reuse one of those registers, the isolated segment,
or offset part of the far pointer in that particular register is spilled
to a temp on stack.  Thus, it's the spilling of the individual registers
making up a segment:offset pair that causes the N^2'd-ness.  If you were
willing to write a new compiler that was willing to take the performance
and code size hit of always keeping and spilling segment:offset registers
in pairs, then you could keep it down to a size N problem.

But, while you could claim that compilers could be rewritten to spill the 
segment:offset registers as a pair, todays optimizing compilers has good 
understanding of how to optimize on shared segments between pointer variables, 
so that forcing the compiler to maintain segment:offset registers as pairs 
would be a significant optimization hit.  Not to mention a compiler rewrite.

Hence my claim that conservative GC makes some sense for 0:32 machines,
but little for 16:16 machines.  I've used Boehms GC on 0:32 machines, and
was quite happy with it.  Just doesn't work on 16:16 machines with 
present compilers, though.

dsouza@optima.cad.mcc.com (Desmond Dsouza) (09/29/90)

In article <PCG.90Sep23020738@teachb.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:


   >   pallas> Now you're not making sense.  Either you honor interior pointers
   >   pallas> or you don't.  If you do, your collector cannot reclaim anything
   >   .....
   >
   >   Sakkinen proposed an example in which a composite object is protected
   >   only by a pointer to an object part of it. The composite object *cannot*
   >   								   ^^^^^^^^
   >   be reclaimed manually, because a pointer to one of its parts cannot be
   >   used with operator delete, because it has the wrong type (if nothing
   >   else).

You are right about not being able to reclaim the composite object
with a pointer to a *data* member (More on this later)

I hope this is not too obvious, but:
For an object of a class with multiple base classes (C++), would you
say that a Base pointer pointing to the Base class portion of the
object is pointing to a *part* of it? If so, you certainly can reclaim
the entire object manually by calling a *virtual* delete operator on
the Base pointer.

   >       As I said, it takes a sophisticated storage reclaimer, manual or
   >	   automatic, and a type table at runtime, to be able to reclaim
   >	   only the unreachable subojects of a composite object,
   >	   leaving the reachable subobjects alone. operator delete is not
   >	   even allowed to do it, because it must be presented with a
   >	   pointer with the same type and value as one returned by a
   			^^^^^^^^^^^^^^^^^^^^^^^
   >	   call to operator new.

   >   Piercarlo "Peter" Grandi           | ARPA:  pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk


Sorry, but as long as operator delete is *virtual* in Base, the
following is legal. It calls Derived::~Derived,  and reclaims a Derived :

Derived* dp = new Derived;	// "new" returns a Derived*
Base* bp = dp;
...
delete bp;			// delete a Base*: different type,
				// possibly different value from dp



----------------------------------------

NOW, just for grins, how about this definition of data members:

If class A *CONTAINS* a data member of class B, define
deletion of the B member as implying deletion of the containing
A, provided B::~B is virtual. i.e. destroying a part of an object
destroys the object itself.

Could this be implemented ? Well, ... :

1. When a B is constructed: *if* it was *contained* in an A, then the
   dynamic value of B's destructor is the same as the dynamic value of
   that A's destructor. (This recursive definition accounts for
   transitive containment). All necessary information is available at
   compile time, including containment, offsets, etc. The underlying
   mechanism is much like class derivation, though there will
   clearly be more versions of B's virtual table -- or equivalent
   dynamic dispatch mechanism -- due to B being potentially contained
   (transitively) in several classes. 

	class B { virtual ~B(); };
	class A { virtual ~A();
		  B b; };
	
	B* bp1 = new B;      	// delete bp1 ==> B::~B

	B* bp2 = (* new A).b;	// delete bp2 ==> A::~A with offsets etc.
	

2. When an A is constructed: if it was *not* contained in another
   class, then A's dynamic destructor would be decided by normal class
   derivation rules. i.e. A::~A if it was an A being created, C::~C if
   it was constructed as part of a derived class C.

	class C : public A {};

	A* ap = new C; 		// delete ap ==> C::~C
	B* bp = &ap->b;		// delete bp ==> C::~C


Note that since reference data members and constant pointers cannot be
re-assigned, the definition of *CONTAINS* could be extended as below
to include them, but the implementation might be very difficult.
  Defns:
  - B data member: A does contain B
  - B* data member: can be re-assigned: A does *not* contain B
  - (B* const) member: re-assign with casts subverts const semantics:
     A *does* contain B
  - (const B*) member: re-assign freely: A does *not* contain B
  - B& data member: cannot be re-assigned: A *does* contain B

----------------------------------------

Personally, I would like a vendor's implementation to provide the
information a garbage collector needs to work efficiently. Since there
are situations when this is unacceptible it should be a specifiable
option, possibly on a per-class basis.

Reference counting can get very inefficient very quickly, besides
being yet another detail to have to worry about. And manual
reclamation, while essential for performance *within* well bounded
modules, can seriously complicate re-use from other modules. 


...still looking for a compacting, non-conservative gc for c++ :-)
 
Desmond.
--

-------------------------------------------------------------------------------
 Desmond D'Souza, MCC CAD Program | ARPA: dsouza@mcc.com | Phone: [512] 338-3324
 Box 200195, Austin, TX 78720 | UUCP: {uunet,harvard,gatech,pyramid}!cs.utexas.edu!milano!cadillac!dsouza

bobatk@microsoft.UUCP (Bob ATKINSON) (10/01/90)

(David Chase) writes:
>However
>(1) Conventional wisdom for Lisp/Smalltalk may not hold for C/C++;
>    Lisp and Smalltalk rely on garbage collection for activation
>    records and real numbers, for instance, and C does not
>    (optimization of Lisp cleans this up somewhat).

Modern implemenations of Smalltalk use processor stack frames for LIFO
activation records, and therefore have little interaction with the 
allocator, let alone the reclaimer. They still, to my knowledge, use
heap-allocated real numbers, but *not* for small integers, which are
far more important.

jimad@microsoft.UUCP (Jim ADCOCK) (10/02/90)

In article <TOM.90Sep26075455@hcx2.ssd.csd.harris.com> tom@ssd.csd.harris.com (Tom Horsley) writes:
|To say that "nothing forces you to use near and far pointers in the same
|program" is absolute nonsense. Performance is not `nothing'. When dealing
|with insane and brain damaged architectures produced by the twisted
|minds of Intel engineers it is often *absolutely necessary* to mix far
|and near pointers in order to get a program that even comes close to
|having an acceptable amount of overhead. Saying, "OK, well I'll just use
|far pointers for everything" is a sure-fire way to make a 30 second
|task take 2 hours.

In partial defense of those twisted minds:  386sx, 386, and 486 machines
are perfectly capable of programming in 0:32 flat model mode, and without
any 10X performance hit.  The real problem is those few millions of customers
out there who keep buying 286 machines.  386, 486 also support 16:32 mode,
which I believe will prove interesting for persistent objects.  Where are
0:32 architectures going to go when 32 bit pointers prove too limiting?
Will they go to 0:48 pointers, and lose backwards compatibility, or will
they also move to 16:32 pointers?  Or is someone going to start making a 
commercially successful OO CPU ?

[[The real problem with Intel is they never include enough GD registers!]]

craig@Neon.Stanford.EDU (Craig D. Chambers) (10/02/90)

In article <57850@microsoft.UUCP> bobatk@microsoft.UUCP (Bob ATKINSON) writes:
>(David Chase) writes:
>>However
>>(1) Conventional wisdom for Lisp/Smalltalk may not hold for C/C++;
>>    Lisp and Smalltalk rely on garbage collection for activation
>>    records and real numbers, for instance, and C does not
>>    (optimization of Lisp cleans this up somewhat).
>
>Modern implemenations of Smalltalk use processor stack frames for LIFO
>activation records, and therefore have little interaction with the 
>allocator, let alone the reclaimer. They still, to my knowledge, use
>heap-allocated real numbers, but *not* for small integers, which are
>far more important.

Our Self implementation supports 30-bit immediate floats, by stealing
2 bits of a normal IEEE exponent for a tag field.  This decreases the
range of immediate floats to 1/4th of what it was, and makes
translations between tagged and untagged format a bit slow, but it
does preserve the precision (the most important thing according to
Velvel Kahan), and drastically reduces the load on the allocator for
floating point calculations.  Larger ranges could be implemented by
supporting 32- or 64-bit (or larger) heap-allocated floats.

See our OOPSLA'89 paper for more details.

-- Craig Chambers

chased@rbbb.Eng.Sun.COM (David Chase) (10/02/90)

In article <-283749998@hpcupt1.HP.COM> thomasw@hpcupt1.HP.COM (Thomas Wang) writes:
>/ hpcupt1:comp.object / chased@rbbb.Eng.Sun.COM (David Chase) /  2:45 pm  Sep 26, 1990 /
>>One thing I haven't noticed yet is mention of transformations applied
>>to pointers by an optimizing compiler.

>...But in today's world, where memory is relatively cheap, there is no reason why
>a pointer cannot be encapsulated into an abstract data type that understands
>operations on garbage collection.
...
>I assume every GC pointer is root, until they become not root.  For GC pointers
>inside a structure, the constructor for that structure will turn all its
>GC pointers to non-root.
>
>The result is a non-conservative C++ collector that is also portable.

Can you elaborate on this somewhat?  Is this a compacting collector,
a non-compacting collector, or a conservatively (Bartlett-style)
compacting collector?  Do you believe that your code is safe in a
preemptively scheduled multi-threaded system (if you don't say
"volatile" somewhere, I'll claim that it isn't.)  Do you deal with the
full language, or do you impose restructions on casting or use of
reference parameters (operations that can generate internal offset
pointers).

Sorry to seem like such a nay-sayer -- I really do like garbage
collection -- but I get the impression that many proponents of garbage
collection in C++ haven't put enough effort into generation of
pathological counter-examples, or (informal) proofs of correctness
against the language *as it is specified* (as opposed to how it is
impemented today).  

Remember that eventually all that fancy syntax turns into pointer
arithmetic and dereferencing, and that optimizing compilers are free
to take great liberties in reordering these operations.

David Chase
Sun Microsystems

thomasw@hpcupt1.HP.COM (Thomas Wang) (10/03/90)

/ hpcupt1:comp.object / chased@rbbb.Eng.Sun.COM (David Chase) /  5:35 pm  Oct  1, 1990 /

>Can you elaborate on this somewhat?  Is this a compacting collector,
>a non-compacting collector, or a conservatively (Bartlett-style)
>compacting collector?  Do you believe that your code is safe in a
>preemptively scheduled multi-threaded system (if you don't say
>"volatile" somewhere, I'll claim that it isn't.)  Do you deal with the
>full language, or do you impose restructions on casting or use of
>reference parameters (operations that can generate internal offset
>pointers).

It's a non-compacting collector.  It is not conservative.  The code is
not safe in a multi-threaded system unless some locks or semaphores are used.
I imagine there are some critical sections in the GC code.  I have not
looked at adapting the GC code in a multi-threaded system yet.

I impose restrictions on the programmer to use my pointer ADT only.
This way, every GC pointer reference in the system can be tracked.

GC operations are hidden in ADTs, the implementation does not use pointer
tricks, or scanning of stack space.  So the implementation will continue to
work independent of any compiler change.  The disadventage with this collector
is that you have to re-write code to use the pointer ADT.

You can mail the library for a thesis request information.

California Polytechnic University Library
San Luis Obispo, CA 93407
USA

"The 'MM' Garbage Collector for C++", master thesis, computer science, 1989.


 -Thomas Wang
              (Everything is an object.)
                                                     wang@hpdmsjlm.cup.hp.com
                                                     thomasw@hpcupt1.cup.hp.com