[comp.lang.c++] Debugging on the GCC and G++ compilers

rfg@ICS.UCI.EDU (12/12/89)

edsr!jupiter!cheeks@UUNET.UU.NET (Mark Costlow) writes:

> Well, I personally haven't gotten anywhere in looking for this bug.  Although,
> I have come up with a question:  A LOT of the code for cc1plus (and cc1 too,
> I'm sure) seems to be made up of macros designed to access pieces of large 
> structures.  This does make the code look pretty nice, but it seems to render
> debugging near impossible for anybody that isn't intimately familiar with the
> code.  (I suppose it could be argued that those of us who aren't intimate
> with it shouldn't be meddling with it anyway :-).  So, my question is "How
> do you guys debug it?"  Some tips would help.

I have never understood why rms used such an immense number of MACROS().
They are definitely good for some things, but...

It seems to me that the morass of MACROS() in both the GCC and G++ compilers
does, as Mark points out, make debugging of these compilers nearly
impossible for us mere mortals who cannot always remember that
LANG_FOOBAR (TREE_UNUSUAL_MANGLE(x)) is equivalent (in the debugger)
to x->uncommon.mangle->language_foobar (or whatever).

I personally think that this excessive use of macros is made completely
indefensible just based on rms' own statement (in the GCC manual) that:
"An inline function is as fast as a macro."

If that is the case, and if both the GCC and G++ compilers end up being
compiled (eventually) with GCC anyway, what possible reason is there to
*not* have all of the little MACROS() that diddle parts of tree nodes
and/or rtl nodes be inline functions?  It would be just as fast, and
it would be *debuggable*  !!!

As a sort of experiment in object-oriented re-engineering, I decided some
time ago to see if I could perform this kind of a change to GCC 1.36.
I only did it for the rtl data structures, and not the tree stuff, but
I did manage to get it going.

Anyway, what I did was this.

I took all of the types (from rtl.h) that define the rtl data structures and
(just to be sure I did a complete job) I "hid" them in a new file I
created called rtl-low.c.  I then changed all of the rtl manipulation macro
definitions in rtl.h into extern function declarations for functions with
the same names.  (Yes, using CAPITAL LETTERS and everything.)

I then wrote functional equivalents for all of the rtl manipulation macros
and put them into rtl-low.c.

Finally, I wrote a program (called macrophage :-) that goes through *all*
of the GCC source code and finds cases where a manipulation macro was
used on the left hand side of an assignment (there are lots of these)
and changes them from:

	FOO(x) = y

into:

	PUT_FOO(x, y)

Note that I built all of the necessary PUT_ functions also.

Anyway, I had to make a lot of other little changes to make this work,
but I did get GCC to build and run after I was done.  And guess what!
Mere mortals could debug the finished product (at least for rtl related
problems) without having to carry around this massive specialized vocabulary.

The next logical step (which I didn't do) would be to insert lots of
assertions into the macro replacements, so that you would be instantly
alerted when a pointer that should not have been null was null, and when
the thing pointed to was not a valid rtl node (in some sense).  This could
save HOURS and HOURS of debugging time (especially when doing a new port).

The last step of course would be to move all the data structure definitions
and manipulation functions back into rtl.h and declare them as inline.  Of
course that would require that all first-level bootstrap compiles of GCC
(using a native compiler that doesn't grok inline) would have to use -Dinline=""
but who cares?

The long and the short of it is that it is possible to re-engineer GCC/G++
for better information hiding and better debuggability and better portability,
but I doubt very much that this will be accepted.  Old habits die hard.

Nonetheless, I would love to hear the comments of various GNU'ers out there
regarding the type of changes I have described.

After the first proof that it could be done, I dropped the whole idea
because I realized that a change this significant would not be well
received, and thus would never become a permanent part of GCC/G++.

Note that for a full blown application of these techniques, the *other* major
GCC data structure (trees) would have to be converted also.

If anybody else out there is dumb enough to want to play with these
patches, I might be able to dredge them up.  Send me mail.

// rfg

mike@umn-cs.CS.UMN.EDU (Mike Haertel) (12/13/89)

Actually, the Right Thing to do would be to include macro definitions
in the -g debugging information in the object file.  This would be icky
because macros don't obey the same scope rules as the rest of the
language (among other things).
-- 
Mike Haertel <mike@ai.mit.edu>
"Everything there is to know about playing the piano can be taught
 in half an hour, I'm convinced of it." -- Glenn Gould

throopw@sheol.UUCP (Wayne Throop) (12/17/89)

> rfg@ICS.UCI.EDU
> I have never understood why rms used such an immense number of MACROS().
> [...] I personally think that this excessive use of macros is made completely
> indefensible just based on rms' own statement (in the GCC manual) that:
> "An inline function is as fast as a macro."

But there are things that can be done with macros that cannot be done
with functions in pure C, such as producing an lvalue.  There are other
things that cannot be done even in pure C++, such as multiple by-name
evaluations when desired.  This was even pointed out in the description
of how to convert the one to the other and sidestep the general problem
raised here (the problem of adequate debugging support in the presence
of macros).

So, rather than railing against necessity, I'd suggest adjusting the
preprocessor to leave around some breadcrumbs so that the debugger
can evaluate the macros that the preprocessor had defined.  This scheme
could get arbitrarily complicated and hard to implement, I admit, but
the simple case of a macro defined once and left defined until the end
of the compilation would solve about 99% of the problems of debugging
with macros.

The breadcrumbs could be left in a file in the filesystem, or inserted
somehow in the compilation unit so that it would find its way into the
executable image's symbol table, or simply cause the source to be
scanned for #defines on an attempt to evaluate a macro, or any number of
other possibilities.  Thus, with much more confidence than nowadays, one
could utter most anything one sees in source code, and have it
evaluated, macros and all. 

I'm sure one of you folks tinkering with the GCC or G++ compilers could
add this in one form or another to the preprocessor.  Why not do it?
Surely it is no more difficult than replacing all macro references
in the compiler with functions and patching non-function-like macros.
And it sure would have a bigger benefit.
--
Wayne Throop <backbone>!mcnc!rti!sheol!throopw or sheol!throopw@rti.rti.org

pcg@aber-cs.UUCP (Piercarlo Grandi) (12/20/89)

In article <0294@sheol.UUCP> throopw@sheol.UUCP (Wayne Throop) writes:
    > rfg@ICS.UCI.EDU
    > I have never understood why rms used such an immense number of MACROS().
    > [...] I personally think that this excessive use of macros is made completely
    > indefensible just based on rms' own statement (in the GCC manual) that:
    > "An inline function is as fast as a macro."

    But there are things that can be done with macros that cannot be done
    with functions in pure C, such as producing an lvalue.  There are other
    things that cannot be done even in pure C++, such as multiple by-name
    evaluations when desired.  This was even pointed out in the description
    of how to convert the one to the other and sidestep the general problem
    raised here (the problem of adequate debugging support in the presence
    of macros).

Macros are useful for syntax extensions, and inlines cannot be used for that.
On the other hand, both syntax extensions and inlines should be used sparingly
(syntax extensions especially so) and should be very small, otherwise they
don't buy you a lot.

Rms writes very opaque code because he uses macros either as inlines,
even for large pieces of code, or as syntax extensions to implement a
degree of data abstraction over structures; the latter is simply not
C, where one uses suitably named struct members or whatever to achieve
code readability.

The use of macros for inlines is also overdone, IMNHO, in gcc, because
macros/inlines buy you something in speed ONLY when the code is going to
execute often AND is small; code that is not going to execute often is not
worth optimizing, procedure-call overheads are irrelevant for code that
is not small, and inlining large pieces of code increases paging.

If there is a possibility of serious code simplification with
macros/inlining, things change a bit, but this is both rare and typically
only concerns a fast path in the macro/inline; the rest can be offlined.

The excessive macros/inlines in gcc and other programs (most notably X11R3)
also mean that code size gets large, and this is bad for programs that run
under virtual memory. Even small improvements in locality reduce page fault
rates dramatically...
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk