[comp.sys.mac.programmer] MPW compiler bugs

gstein@oracle.com (Greg Stein) (02/14/90)

> From: jln@acns.nwu.edu (John Norstad)
> In article <1046@watserv1.waterloo.edu> bmwrkshp@watserv1.waterloo.edu ( 
> Wrkshp Id - Sys Design ) writes:
> > But I'm sure THINK C is still an order of magnitude
> > faster. Don't know what that linker is doing. Sure takes its own
> > sweet time even on Mac II's.
> 
> As I understand it, one of the reasons is that MPW's linker is much 
> smarter.  For example, it lets you direct individual routines within a 
> single module to different segments, and it eliminates dead code.  I'm not 
> sure about all this, though.
> 
> > I would probably use MPW 99% of the time if it weren't for the fact
> > that the C compiler is broken. Even the updated C compiler
> > included with the C++ package still has bugs.
> 
> I've heard this from other places too (e.g., the guys at Wolfram).  But 
> I'm up to 16,000 lines of pretty hairy C code in Disinfectant now, and 
> I've yet to encounter a single compiler bug.  Maybe because I'm a reformed 
> Pascal fanatic, I don't write as ugly code as the rest of you guys :-)
> 
> John Norstad
> Northwestern University
> jln@acns.nwu.edu
> 

I have a nice bug from V3.1b3e19 of MPW C.  Type in this little code
fragment, compile it, and look at its output:

int main()
{
    char *a, *b = 0L;
    int c = 0L;		/* assignment to prevent warnings */

    if (a = &b[c])
	return(1);
    return(2);
}

The relevant part is this:

    ...
    LEA   $00(A3,D7.L),A4	; all vars are in registers
    BEQ.S @1
    ...

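Note that the conditional branch is performed without any test being
done: on the 68000, LEA does not affect the condition codes, so the BEQ
branches on whatever flags the previous instruction left behind.  (Yes,
the bug has been reported to Apple.)  Don't ask me how I found the bug,
but it is repeatable.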

Things like this make me wonder just how stable the compiler is.  I
realize that I have a Beta version and that the line of code above is
sheer idiocy for most purposes, but still...

John, I will agree with you, though, about the Linker.  Think C
includes code a whole file at a time -- it won't pick out individual
routines from a file.  MPW C will do this.  What I don't get is why
Think C hasn't changed this by now -- they've had a few versions to do
it.  Heck, in high school when I wrote the Rascal compiler for the
Mac, I wrote a linker that did dead code analysis and picked out a
routine at a time.  Surely, Mike Kahl could do it.
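At its core, a routine-at-a-time linker does a reachability pass over the call graph. It starts at the entry point, marks everything that point references, and drops whatever never gets marked. The sketch below only illustrates that idea. `CallGraph`, `mark_live`, and `count_live` are hypothetical names, and the graph is a toy adjacency matrix, not anything read from real object files.

```c
#define MAX_ROUTINES 16

/* Hypothetical in-memory call graph: calls[i][j] != 0 means routine i
   references routine j.  A real linker would build this from the
   references recorded in each object module. */
typedef struct {
    int count;
    int calls[MAX_ROUTINES][MAX_ROUTINES];
    int live[MAX_ROUTINES];
} CallGraph;

/* Depth-first marking: everything reachable from `root` is live.
   The early return on already-live routines also stops cycles. */
static void mark_live(CallGraph *g, int root)
{
    int j;

    if (g->live[root])
        return;
    g->live[root] = 1;
    for (j = 0; j < g->count; j++)
        if (g->calls[root][j])
            mark_live(g, j);
}

/* Returns how many routines a routine-granularity linker would keep,
   starting from `entry`.  Routines left unmarked are dead code. */
static int count_live(CallGraph *g, int entry)
{
    int i, n = 0;

    for (i = 0; i < g->count; i++)
        g->live[i] = 0;
    mark_live(g, entry);
    for (i = 0; i < g->count; i++)
        n += g->live[i];
    return n;
}
```

A file-at-a-time linker would instead keep every routine in any file that contains at least one marked routine. That is the coarser behavior described above for THINK C.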

Well, John, I don't know whether you write ugly code or not, but
Disinfectant 1.6 is a very nice piece of work.  Thanx.

Greg Stein	-- This posting bears no relation to my employer
Arpa: gstein%oracle.uucp@apple.com
UUCP: ..!{uunet,apple}!oracle!gstein

amanda@mermaid.intercon.com (Amanda Walker) (02/15/90)

In article <1990Feb14.004350.14475@oracle.com>, gstein@oracle.com (Greg Stein)
writes:
> Things like this make me wonder just how stable the compiler is.


It seems pretty good for "most" code.  In particular, all of the bugs we've
found have been code generation bugs where the condition codes are not
tracked properly.

Well, that and scribbling on the SCSI driver :-)...

--
Amanda Walker
InterCon Systems Corporation

"Many of the truths we cling to depend greatly upon our own point of view."
	--Obi-Wan Kenobi in "Return of the Jedi"

anders@penguin (Anders Wallgren) (02/15/90)

Also, try the following:

#include <stdio.h>

enum test_enum {
  enum_1	= 1,
  enum_2	= 2,
  enum_3	= 3,
  enum_4	= 4		/* no trailing comma: C89 does not allow one */
  };

typedef struct {
	int val_int;
	unsigned char *val_string;
} TEST_STRUCT;

int main(void)
{
  enum test_enum type;
  TEST_STRUCT uval,*uvalp;
  int val;
	
  type = enum_4;
  uval.val_int = 1;
  uvalp = &uval;
  val = 1;
	
  /* every case should print 1; type is reset to enum_4 in between */
  type = (enum test_enum)(uval.val_int);
  printf("struct and cast: type is %d (this should be 1)\n", type);
  type = enum_4;

  type = uval.val_int;
  printf("struct, no cast: type is %d (this should be 1)\n", type);
  type = enum_4;
	
  type = (enum test_enum)(uvalp->val_int);
  printf("struct and cast through pointer: type is %d (this should be 1)\n", 
	type);
  type = enum_4;

  type = uvalp->val_int;
  printf("struct, no cast, through pointer: type is %d (this should be 1)\n",
	 type);
  type = enum_4;
  return 0;
}


This won't work in 3.1b1 or 3.1 final.

billkatt@mondo.engin.umich.edu (billkatt) (02/15/90)

In article <1990Feb14.004350.14475@oracle.com> you write:
>> From: jln@acns.nwu.edu (John Norstad)
>> In article <1046@watserv1.waterloo.edu> bmwrkshp@watserv1.waterloo.edu ( 
>> Wrkshp Id - Sys Design ) writes:
>> > But I'm sure THINK C is still an order of magnitude
>> > faster. Don't know what that linker is doing. Sure takes its own
>> > sweet time even on Mac II's.
>> 
>> As I understand it, one of the reasons is that MPW's linker is much 
>> smarter.  For example, it lets you direct individual routines within a 
>> single module to different segments, and it eliminates dead code.  I'm not 
>> sure about all this, though.
>> 
>John, I will agree with you, though, about the Linker.  Think C
>includes code a whole file at a time -- it won't pick out individual
>routines from a file.  MPW C will do this.  What I don't get is why
>Think C hasn't changed this by now -- they've had a few versions to do
>it.  Heck, in high school when I wrote the Rascal compiler for the
>Mac, I wrote a linker that did dead code analysis and picked out a
>routine at a time.  Surely, Mike Kahl could do it.

Think C DOES indeed remove dead code, ever since version 3.0.  It doesn't seem
to do quite as good a job as MPW 3.0, but a good job nonetheless.  To
remove dead code, just check the 'Smart Link' check box when you build your
app/DA/whatever.  You can go out and prove it to yourself by writing a
program which uses one small routine from the ANSI library, and building it to
disk.  Whereas the ANSI library is 27K long, your program will come out to
about 5 or 6K.

-Steve Bollinger
billkatt@mondo.engin.umich.edu

anders@penguin (Anders Wallgren) (02/16/90)

Of course we shouldn't forget that MPW C and C++ are not compatible.
CFront does not optimize enums, whereas the C compiler does, which
causes a lot of problems.  According to Apple there is no way to get
around this:

"When CFront compiles your code, although it uses the same C compiler,
it has already tokenized its input and performed its own version of
optimization at that level.  ...it would be a good idea if all C
compilers on that same platform produced interchangeable optimized
code but this is unfortunately not the case here...  You will not be
able to mix code between CFront and MPW C at this time without adding
glue code to fix up the differences.  Your comments will, however, be
brought to the attention of the compilers group and, hopefully, this
incompatibility will not always be true."

1.  Perhaps it should be suggested to Apple's compiler group that they
    should spend more time testing their compilers and less time
    inventing infantile error messages.

2.  Better yet, perhaps they should provide switches to turn off these
    optimizations so that we can get around bugs and
    'incompatibilities.'

3.  The thing that really steams me is that there's a switch to CFront
    to turn off the optimization that it never does, but no switch to
    the MPW C compiler to turn off the optimization that it always
    does.

Anders

gstein@oracle.com (02/16/90)

Steve Bollinger writes:
>Greg Stein writes:
>>John, I will agree with you, though, about the Linker.  Think C
>>includes code a whole file at a time -- it won't pick out individual
>>routines from a file.  MPW C will do this.  What I don't get is why
>>Think C hasn't changed this by now -- they've had a few versions to do
>>it.  Heck, in high school when I wrote the Rascal compiler for the
>>Mac, I wrote a linker that did dead code analysis and picked out a
>>routine at a time.  Surely, Mike Kahl could do it.
>
>Think C DOES indeed remove dead code, ever since version 3.0.  It doesn't seem
>to do quite as good a job as MPW 3.0, but a good job none the less.  To
>remove dead code, just check the 'Smart Link' check box when you build your
>app/DA/whatever.  You can go out and prove it to yourself by writing a
>program which uses one small routine from the ANSI library, and building it to
>disk.  Whereas the ANSI library is 27K long, your program will come out to
>about 5 or 6K.
>

Hmm...

I had thought that Think C works on a file by file basis.  If you
include a routine from a file, the WHOLE file is pulled in.  Note,
though, that this doesn't count for projects/libraries: in these it
picks the file out of the project.  As for the ANSI library, Think has
split up the source into a bunch of tiny files.  Since you use one
routine, you get one (little) file.  I know that using "Run..."
doesn't do any dead code analysis and you pick up whole libraries/
projects (e.g. MacTraps), but building is different.

Maybe I'm brain dead and they've fixed this.  Unfortunately, I have to
wait until I get home to check cuz we don't use Think C here at work :-(

Happy hacking...
Greg Stein	-- This posting bears no relation to my employer
Arpa: gstein%oracle.uucp@apple.com
UUCP: ..!{uunet,apple}!oracle!gstein

ph@cci632.UUCP (Pete Hoch) (02/17/90)

In article <10898@zodiac.ADS.COM>, anders@penguin (Anders Wallgren) writes:
> Of course we shouldn't forget that MPW C and C++ are not compatible.
> 
> You will not be
> able to mix code between CFront and MPW C at this time without adding
> glue code to fix up the differences.

What do you mean?  What are the differences?  I am currently linking
C with C++, assembler and Pascal.  There are no link errors and no
run time errors. (That I have traced to the compilers :-)  So what
is it that is not compatible?

Thanks,
Pete Hoch

CXT105@psuvm.psu.edu (Christopher Tate) (02/17/90)

In article <1990Feb16.012322.19895@oracle.com>, gstein@oracle.com says:

>Hmm...
>
>I had thought that Think C works on a file by file basis.  If you
>include a routine from a file, the WHOLE file is pulled in.  Note,
>though, that this doesn't count for projects/libraries: in these it
>picks the file out of the project.  As for the ANSI library, Think has
>split up the source into a bunch of tiny files.  Since you use one
>routine, you get one (little) file.  I know that using "Run..."
>doesn't do any dead code analysis and you pick up whole libraries/
>projects (e.g. MacTraps), but building is different.

I'm not sure about this, but I think that THINK C does its dead-code removal
on a per-SEGMENT basis.  If you have several routines in the same segment in
your project (which you are including as a library), and you use any one
of them, you'll wind up with all of them in your final build.  But, you
won't have any code from other segments of the same included project.

Note that this applies to *projects* which are included in other projects,
not to actual "libraries."

-------
Christopher Tate                    |  "And as I watch the drops of rain
                                    |   Weave their weary paths and die,
cxt105@psuvm.psu.edu                |   I know that I am like the rain;
{...}!psuvax1!psuvm.bitnet!cxt105   |   There but for the grace of you go I."
cxt105@psuvm.bitnet                 |          -- Simon & Garfunkel

minich@a.cs.okstate.edu (MINICH ROBERT JOHN) (02/17/90)

From article <1990Feb16.012322.19895@oracle.com>, by gstein@oracle.com:
> Hmm...
> 
> I had thought that Think C works on a file by file basis.  If you
> include a routine from a file, the WHOLE file is pulled in.  Note,
> though, that this doesn't count for projects/libraries: in these it
> picks the file out of the project.  As for the ANSI library, Think has
> split up the source into a bunch of tiny files.  Since you use one
> routine, you get one (little) file.  I know that using "Run..."
> doesn't do any dead code analysis and you pick up whole libraries/
> projects (e.g. MacTraps), but building is different.
> 
> Maybe I'm brain dead and they've fixed this.  Unfortunately, I have to
> wait until I get home to check cuz we don't use Think C here at work :-(
> Greg Stein

  I am not sure about THINK C, but I know how THINK Pascal works: it
does include entire files when you're debugging, but when you "Build..."
and check "Smart Linking", it goes routine by routine. I've had a 
couple of occasions where I picked a couple of routines out of each of many
separate files. The total of all the files wouldn't fit into one segment
unless I did the Build... with Smart Linking. Of course, I then went and
deleted the majority of the source, which I wasn't using, so I could
debug the darn thing!

Robert Minich
minich@a.cs.okstate.edu

anders@penguin (Anders Wallgren) (02/18/90)

The MPW C compiler optimizes the size of each enum based on the values
it takes on: if an enum has, say, ten values, all of which fit in a
smaller data size such as a byte, then it makes the enum that size.
There is no way to turn this off.

When CFront compiles your code, it makes all your enums ints,
regardless of what values they hold.  Apparently Apple wanted it to
optimize enums like their C compiler does, because there is a flag to
CFront (-z6, I think) to tell it NOT to optimize enums, but not
optimizing is in fact what it does all the time.

This causes BIG PROBLEMS.  For example, if you have a struct with a
member that is an enum, and then write code in both C _and_ C++ that
uses this struct, the two compilers won't agree on its size or layout,
and will overwrite each other's fields.  Of course there is no warning
from either compiler, since they both think they know what they are
doing, and no warning from the linker since you're just passing
pointers to structs, and not the structs themselves.  The following
three-file test program will demonstrate this pretty clearly:

test.h
------
enum type { type1, type2, type3 };

typedef struct foo {
	enum type t;
	short	  s;
	long	  l; 
}  FOO;

int kernel(FOO *f);

test.c
------
#include "test.h"

int
kernel(FOO *f)
{
   int i = sizeof(FOO);		/* struct size as MPW C sees it */
   f->t = type2;
   f->s = 0x0101;
   f->l = 0x69696969;
   return i;
}

test.cp
-------
#include <stdio.h>

extern "C" {
#include "test.h"
}

int main(void)
{
	char	c[256];		/* guard buffers around f */
	FOO f;
	char	d[256];
	int	i = sizeof(f);		/* struct size as CFront sees it */
	int j = kernel(&f);	/* C code fills in f using its own layout */
	printf("sizeof foo (c++): %d\n", i);
	printf("sizeof foo (c): %d\n", j);
	printf("foo.t (2): %d, foo.s (x0101): %x, foo.l (x69696969): %lx\n",
		(int)f.t, f.s, f.l);
	return 0;
}
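You may prefer a broken build to a layout mismatch discovered at run time. If so, one option is a compile-time size check in the shared header. This sketch is not from the thread. The typedef name `enum_type_must_be_int` is made up, and the check uses the negative-array-size trick because C89 has no built-in static assertion. If MPW C shrinks a three-value enum as described above, the C build should fail on this line while CFront accepts it.

```c
/* Hypothetical guard for a header shared by MPW C and CFront.  Both
   compilers must agree on sizeof(enum type); if one of them shrinks the
   enum below an int, the array size becomes -1 and compilation fails. */
enum type { type1, type2, type3 };

/* C89 has no static_assert, so use the negative-array-size trick. */
typedef char enum_type_must_be_int[(sizeof(enum type) == sizeof(int)) ? 1 : -1];
```

A failing check means the header needs a fix before it can be shared between the two compilers.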

rmh@apple.com (Rick Holzgrafe) (02/20/90)

In article <10936@zodiac.ADS.COM> anders@penguin (Anders Wallgren) writes:
> The MPW C compiler optimizes the size of enums depending on what
> values they take on - if it only has ten values, all of which are less
> than the maximum value that a data size can hold, then it will make
> the enum that size.  There is no way to turn this off.
> 
> When CFront compiles your code, it makes all your enums int's,
> irregardless of what values they hold.  Apparently Apple wanted it to
> optimize enums like their C compiler does, because there is a flag to
> CFront (-z6, I think) to tell it to NOT optimize enum, but this is in
> fact what it does all the time.

I *know* it's ugly and offensive, but could you do something like:
    enum type { type1, type2, type3, typeDummy=0x7fffffff };
to force MPW C to use 32-bit values for enums? If so,
    #define FORCE_LONG_ENUMS    ,typeDummy=0x7fffffff 
    enum type { type1, type2, type3 FORCE_LONG_ENUMS};
could be used for the duration, then easily compiled out when the 
compilers get their acts together.
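The macro as written has a catch. Every enum that uses `FORCE_LONG_ENUMS` gets an enumerator named `typeDummy`, and enumeration constants share the ordinary identifier scope in C. A second use in the same file is therefore a duplicate definition. A variant that pastes the enum's tag into the dummy name avoids this. This is only a sketch: `FORCE_LONG_ENUM` and the `_forceLong` suffix are hypothetical names. Whether MPW C then actually picks a 32-bit representation is the untested assumption from the post above, not something confirmed here.

```c
/* Hypothetical variant of FORCE_LONG_ENUMS: ## pastes the tag into the
   dummy enumerator's name, so each enum gets its own (type_forceLong,
   color_forceLong) and the macro can be used more than once per file. */
#define FORCE_LONG_ENUM(tag)    , tag##_forceLong = 0x7fffffff

/* 0x7fffffff only fits in a 32-bit (or wider) type, which is the point:
   a compiler that sizes enums by their range cannot shrink these. */
enum type  { type1, type2, type3 FORCE_LONG_ENUM(type) };
enum color { red, green, blue FORCE_LONG_ENUM(color) };

/* To compile it out once the compilers agree:
   #define FORCE_LONG_ENUM(tag)                                          */
```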

==========================================================================
Rick Holzgrafe              |    {sun,voder,nsc,mtxinu,dual}!apple!rmh
Software Engineer           | AppleLink HOLZGRAFE1          rmh@apple.com
Apple Computer, Inc.        |  "All opinions expressed are mine, and do
20525 Mariani Ave. MS: 67-B |    not necessarily represent those of my
Cupertino, CA 95014         |        employer, Apple Computer Inc."

anders@penguin (Anders Wallgren) (02/21/90)

In article <6792@internal.Apple.COM>, rmh@apple (Rick Holzgrafe) writes:
>I *know* it's ugly and offensive, but could you do something like:
>    enum type { type1, type2, type3, typeDummy=0x7fffffff };
>to force MPW C to use 32-bit values for enums? If so,
>    #define FORCE_LONG_ENUMS    ,typeDummy=0x7fffffff 
>    enum type { type1, type2, type3 FORCE_LONG_ENUMS};
>could be used for the duration, then easily compiled out when the 
>compilers get their acts together.

Yeah, we did something like:

typedef enum {
	foo,
	bar
} ENUM_DEF(FOO_BAR);

using a macro which was conditionally defined based on what platform
we compiled the code on:

#ifdef MACOS
#define ENUM_DEF(name) _ ## name; typedef short name
#else
#define ENUM_DEF(name) name
#endif

This way we didn't have to modify the members of the enums, and could
rely on every enum-typed field being the same size (a short on the Mac)
in both C and C++.
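Here is the same macro applied to a struct shared between C and C++ code. It is a sketch with two assumptions: `MACOS` is defined by hand to stand in for the platform check, and `SHARED_REC` is a hypothetical struct, not one from the code above.

```c
/* Assumption for illustration: MACOS is normally set by the Mac build. */
#define MACOS 1

#ifdef MACOS
#define ENUM_DEF(name) _ ## name; typedef short name
#else
#define ENUM_DEF(name) name
#endif

/* Under MACOS this expands to:
   typedef enum { foo, bar } _FOO_BAR; typedef short FOO_BAR; */
typedef enum {
    foo,
    bar
} ENUM_DEF(FOO_BAR);

/* Hypothetical shared struct: `kind` is declared as FOO_BAR, which is a
   plain short on the Mac, so MPW C and CFront give it the same layout. */
typedef struct {
    FOO_BAR kind;
    short   s;
    long    l;
} SHARED_REC;
```

The field's declared type is `short` rather than the enum itself, so neither compiler's enum-sizing rule affects the struct layout. The enumerators `foo` and `bar` remain available as constants.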

Anders