[comp.lang.modula3] optimization and garbage collection

goldberg@parc.xerox.com (David Goldberg) (01/03/91)

This is on SPARCs.  I compiled the "m3compile" program using cc -O, and
when I used the resulting compiler, I got random behavior: when
repeatedly compiling the same program, sometimes that program would
compile correctly, and sometimes not.  When I patched
collection_allowed to be 0, compiles always worked.  This is true with
both gcc (1.37.1) and the Sun C compiler (SparcCompiler 1.1 beta).

Has anybody tried compiling m3compile with optimization and gotten
it to work reliably?

	David Goldberg

chased@rbbb.Eng.Sun.COM (David Chase) (01/03/91)

goldberg@parc.xerox.com (David Goldberg) writes:
> [with "cc -O" of m3compile, I got random behavior until I turned
> off the garbage collector.]

Please note that this is very probably not a bug in the optimizer;
such behavior was predicted over 3 years ago.  This is one of the
hazards of using C as an intermediate language.  (I'm trying very hard
not to gloat too much.)

As a workaround, you might try (for Sun's compiler) "cc -O1".

David Chase
Sun Microsystems, Inc.

rminnich@super.org (Ronald G Minnich) (01/05/91)

In article <5142@exodus.Eng.Sun.COM>, chased@rbbb.Eng.Sun.COM (David Chase) writes:

|> Please note that this is very probably not a bug in the optimizer;
|> such behavior was predicted over 3 years ago.  This is one of the
|> hazards of using C as an intermediate language.  (I'm trying very hard
|> not to gloat too much.)

Any objection a more detailed explanation? 
I haven't looked at what this is doing, so am curious.
thanks,
ron

-- 
"Socialism is the road from capitalism to communism, but we never promised to 
                 feed you on the way!"-- old Russian saying
"Socialism is the torturous road from capitalism to 
                  capitalism" -- new Russian saying (Wash. Post 9/16)

chased@rbbb.Eng.Sun.COM (David Chase) (01/05/91)

rminnich@super.org (Ronald G Minnich) writes:
>David Chase wrote::
>|> Please note that this is very probably not a bug in the optimizer;
>|> such behavior was predicted over 3 years ago.  

>Any objection a more detailed explanation? 

Nope (and this is the second request, counting e-mail queries, so....)

What happens (in broad terms) is this: The C compiler is not required
to maintain the variables, as you write them, with the values that you
expect to see in them.  The variables may be folded into
common/loop-invariant/loop-inductive expressions (because it makes the
code go faster) and the registers containing the original variables
(now dead) are reused for temporaries.  These temporaries may contain
positive or negative offsets from pointers, or even the difference of
two pointers.

The garbage collector goes looking for pointers to the beginning
of an object, finds none, and reclaims the object (incorrectly).
Any temporaries referencing the object are now dangling pointers.

Note well that this is correct, standard-conforming, behavior on the
part of the (C) optimizer.  The garbage collector is relying on
artifacts of a particular implementation of C.  Those artifacts are
not required to exist, were not intended by the compiler writers when
they wrote the compiler, and are not checked by a single line of test
code.  This problem exists whenever you use C as an intermediate
language, optimize it, and make use of a conservative
stack/register-scanning garbage collector; it is not peculiar to
Modula-3.

For example, suppose you wrote your own copy of strcpy:

char * strcpy (char * s1, char * s2)
{
  int i;
  for (i = 0; s2[i] != 0; i++)
     s1[i] = s2[i];
  s1[i] = 0;
  return s1;
}

This might reasonably be optimized to

char * strcpy (char * s1, char * s2)
{
  char * t = s1;
  while (*s2 != 0)
     {  
     *s1 = *s2;
     s1++; s2++;
     }
  *s1 = 0;
  return t;
}

No memory allocation takes place in the inner loop, so it
appears to be ok to toss the pointers to the beginning of
the strings.  However, the same optimization applies if
a procedure is called within the loop, as in:

char * tweaked_strcpy (char * s1, char * s2, char (*f)(char))
{
  char * t = s1;
  while (*s2 != 0)
     {  
     *s1 = (*f)(*s2);
     s1++; s2++;
     }
  *s1 = 0;
  return t;
}

The procedure f can do whatever it chooses, including allocate memory.
And, with preemptively scheduled threads, a garbage collection can
occur at any time.

David Chase
Sun Microsystems

muller@src.dec.com (Eric Muller) (01/10/91)

David Chase argues that an aggressive C optimizer can disturb Modula-3
programs (in the case of a Modula-3 compiler that generates C, of
course). In particular, garbage collection can be a real trouble, as
roots can be "hidden".

While such occurrences are possible, I would not rule out a bug in the
Modula-3 compiler or in the C optimizer to conclude right away in
favor of an unwanted side-effect of aggressive optimization.  If I
were to give probabilities to:

   1. SRC Modula-3 generates incorrect C code
   2. the C compiler is incorrect
   3. it is one of those things we cannot do because we generate C

I would say: p(1) = 99.9%, p(2) = 0.099%, p(3) = 0.001%.

Of course, we make assumptions when generating code and in the garbage
collector, and they may not be satisfied. As for the generated code, we
try to use no more than what ANSI-C says. For the garbage collector,
we assume that if a traced heap object (i.e. a traced OBJECT or REF)
is accessible by the program, then there is somewhere in the stacks or
the registers a reference to one of the pages that contains that
object. If it turns out that this is not reasonable, we can increase
the level of conservatism, for example that saying that the pointer
can be as far as x bytes from object.

However, we have yet to see an example of 3. In the mean time, keep
these bug reports coming so that we can achieve p(1) = 50%.

Eric Muller.
------------------------------------------------------------------------------
System Research Center - 130 Lytton Av. - Palo Alto, CA 94301 - (415) 853 2193

urlichs@smurf.sub.org (Matthias Urlichs) (01/12/91)

In comp.lang.modula3, article <5293@exodus.Eng.Sun.COM>,
  chased@rbbb.Eng.Sun.COM (David Chase) writes:
< 
< The garbage collector goes looking for pointers to the beginning
< of an object, finds none, and reclaims the object (incorrectly).
< Any temporaries referencing the object are now dangling pointers.
< 
It might be more reasonable to mark the object in use if there is any pointer
anywhere into the object.

Has anyone tried this and watched whether, for a test case which exhibits the
problem, said problem goes away?

-- 
Matthias Urlichs -- urlichs@smurf.sub.org -- urlichs@smurf.ira.uka.de     /(o\
Humboldtstrasse 7 - 7500 Karlsruhe 1 - FRG -- +49+721+621127(0700-2330)   \o)/

chased@rbbb.Eng.Sun.COM (David Chase) (01/14/91)

urlichs@smurf.sub.org (Matthias Urlichs) writes:
>In comp.lang.modula3, article <5293@exodus.Eng.Sun.COM>,
>  chased@rbbb.Eng.Sun.COM (David Chase) writes:

>< The garbage collector goes looking for pointers to the beginning
>< of an object, finds none, and reclaims the object (incorrectly).
>< Any temporaries referencing the object are now dangling pointers.

>It might be more reasonable to mark the object in use if there is any pointer
>anywhere into the object.

If you think about this a bit, you'll see that you get reference
counting, sort of (what if there is more than one pointer "into" the
object?).  Second problem is, how do you find these objects?  Sounds
like one more sweep step (since the optimizer hid them from the
collector).  "Into" is not really "into"; derived pointers may exist
outside the object.  Optimizing C compilers do this.  There is no
useful bound on how far "outside" the object the pointer may "point",
and no reason that there should be.

I think the answer is to get a little bit of help from the compiler;
unfortunately (in the short term) you won't get any help from a
compiler (or back-end/optimizer) for C.

David

hagerman@ece.cmu.edu (John Hagerman) (01/15/91)

David Detlefs just completed his PhD on garbage collection for C++
which addresses the problem of handling member pointers.  It might
address many of the points being raised here.
--
hagerman@ece.cmu.edu