[gnu.g++.help] name mangler

vaughan%cadillac.cad.mcc.com@MCC.COM (Paul Vaughan) (04/02/91)

Several people including myself have asked for a name mangler. Michael
Tiemann has replied that such a thing exists in the g++ file
cplus-method.c, but I haven't figured out any way to use that just
yet.  I've been looking into the problem a bit and have come to a
sticky question: Exactly what input strings should a name mangler
accept?


Michael's mangling function is called as part of compiling code and is
geared strongly toward that purpose. It appears to accept, in parse
tree form, any function declaration (or the start of a definition)
that is legal in the current context.  Any types mentioned must
already have been declared, default argument expressions are admitted,
explicit type declarations (e.g. void foo(struct Bar);) are allowed,
and so on. This makes for a very complex grammar for function
declarations; for instance, it subsumes the expression grammar of C++.

I looked into the way that gdb handles name mangling. It avoids the
issue by only doing name demangling. That is, when you type in a C++
function name (not a complete declaration) like "foo" to set a
breakpoint, it looks through all symbols that start with foo,
demangles any matches and compares the demangled base name (the
demangler has an option to return only the base name, instead of a
full declaration with argument specs) against the given base name. (As
an aside, this code doesn't quite work for ordinary functions in gdb
3.6, and I don't understand how it is intended to work when a function
is overloaded.)
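
The lookup-by-demangling idea described above can be sketched roughly as
follows. This is not gdb's code. The helper names base_name and
find_candidates are hypothetical, and the assumed symbol shape
"_<name>__<signature>" is inferred only from the single g++-1.39
example quoted in this thread, so treat it as an illustration rather
than a description of any real symbol table.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: recover the base name from a mangled symbol that
// is ASSUMED to look like "_<name>__<signature>". A symbol without "__"
// is returned with only its leading underscore removed.
std::string base_name(const std::string& symbol) {
    std::string s = symbol;
    if (!s.empty() && s[0] == '_') s.erase(0, 1);
    std::string::size_type cut = s.find("__");
    if (cut != std::string::npos) s.erase(cut);
    return s;
}

// Return every symbol whose base name equals `wanted`.
std::vector<std::string> find_candidates(const std::vector<std::string>& symbols,
                                         const std::string& wanted) {
    std::vector<std::string> hits;
    const std::string prefix = "_" + wanted;
    for (const std::string& sym : symbols) {
        // Cheap prefix filter first, as in "all symbols that start with foo".
        if (sym.compare(0, prefix.size(), prefix) != 0) continue;
        // Then compare the recovered base name exactly.
        if (base_name(sym) == wanted) hits.push_back(sym);
    }
    return hits;
}
```

Note that an overloaded function yields more than one candidate, and
the base name alone cannot choose between them, which is the
overloading question raised above.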

The reason I wanted a name mangler was in connection with dynamic
linking. I'd like to be able to specify a full declaration (but not in
any context of typedefed names, and without the return type) and get
out the mangled symbol.  For instance,


"foo(Foo, Bar*, int, int)"

would give 

"_foo__FG3FooP3Barii"

for g++-1.39
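
To make the question concrete, here is a minimal sketch of such a
restricted mangler. It is emphatically not the g++ algorithm: the three
encoding rules (int becomes "i", a class passed by value becomes "G"
plus length plus name, a pointer to a class becomes "P" plus length plus
name) are guessed from the one example above, and the name mangle_spec
and its helpers are hypothetical. Anything the example does not cover
is rejected rather than guessed at.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    std::string::size_type b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    std::string::size_type e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// True if s is a single C-style identifier.
static bool is_identifier(const std::string& s) {
    if (s.empty()) return false;
    if (!(std::isalpha(static_cast<unsigned char>(s[0])) || s[0] == '_')) return false;
    for (char c : s)
        if (!(std::isalnum(static_cast<unsigned char>(c)) || c == '_')) return false;
    return true;
}

// Encode one parameter using rules ASSUMED from the single example:
//   int -> "i",  Name -> "G<len>Name",  Name* -> "P<len>Name".
static std::string encode_param(const std::string& raw) {
    std::string t = trim(raw);
    if (t == "int") return "i";
    bool pointer = !t.empty() && t.back() == '*';
    if (pointer) t = trim(t.substr(0, t.size() - 1));
    if (!is_identifier(t) || t == "int")
        throw std::invalid_argument("unsupported parameter type: " + raw);
    return (pointer ? "P" : "G") + std::to_string(t.size()) + t;
}

// Hypothetical entry point: "name(type, type, ...)" -> mangled symbol.
std::string mangle_spec(const std::string& spec) {
    std::string::size_type open = spec.find('(');
    std::string::size_type close = spec.rfind(')');
    if (open == std::string::npos || close == std::string::npos || close < open)
        throw std::invalid_argument("expected name(args)");
    std::string name = trim(spec.substr(0, open));
    if (!is_identifier(name)) throw std::invalid_argument("bad function name");
    std::string args = spec.substr(open + 1, close - open - 1);
    if (trim(args).empty())
        throw std::invalid_argument("empty argument list is not covered by the example");
    std::string out = "_" + name + "__F";
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type comma = args.find(',', start);
        out += encode_param(args.substr(
            start, comma == std::string::npos ? std::string::npos : comma - start));
        if (comma == std::string::npos) break;
        start = comma + 1;
    }
    return out;
}
```

With these assumed rules, mangle_spec("foo(Foo, Bar*, int, int)")
reproduces the symbol shown above, and a parameter written as
"struct Bar*" is refused because it is not a bare identifier.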

I'm wondering, is this even feasible? Would it be necessary to have
built up the context of typedef'ed names? Suppose that certain
restrictions on the input format of unmangled names, such as
prohibiting

	foo(Foo, struct Bar*, int, int)

were in effect. Then would there exist a 1:1 mapping between legal
mangled and unmangled names?

Does anyone have a specification for the mangler that is simpler than
ferreting it out of the demangler in cplus-dem.cc?

How many people would be interested in having a bison grammar based
mangler and demangler?

tiemann@CYGNUS.COM (Michael Tiemann) (04/02/91)

    Does anyone have a specification for the mangler that is simpler than
    ferreting it out of the demangler in cplus-dem.cc?

You need to do the whole job because of typedefs.  I.e.,

	typedef int foo;
	typedef int bar;

	foo f (bar);

mangles to the same thing that

	bar f (foo);

mangles to.
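
The typedef point can be checked at compile time in present-day C++.
The sketch below assumes C++11 or later (decltype, static_assert and
<type_traits> all postdate this thread) and is offered only as an
illustration: because both typedefs are aliases for int, the two
declarations name one and the same function type, so no mangler working
from spelling alone could tell them apart.

```cpp
#include <type_traits>

typedef int foo;
typedef int bar;

foo f(bar);  // first declaration
bar f(foo);  // not an overload: redeclares the same function, int f(int)

// Both spellings denote the function type int(int).
static_assert(std::is_same<decltype(f), int(int)>::value,
              "f has type int(int) regardless of typedef spelling");
static_assert(std::is_same<foo(bar), bar(foo)>::value,
              "typedef names do not create distinct function types");
```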

Michael

vaughan%cadillac.cad.mcc.com@MCC.COM (Paul Vaughan) (04/03/91)

	You need to do the whole job because of typedefs.  I.e.,

		typedef int foo;
		typedef int bar;

		foo f (bar);

I was thinking that in the "function specification language", typedefs
in this sense would not be allowed. Even though foo might be declared

foo f(bar);

in some source code, it would have to be declared

int f(int)

in a function specification to be accepted by the mangler. Note that
there are other differences between this function specification
language and C++.  For instance, 

class Foo  {
  int foo(Foo*);
};

int Foo::foo(Foo*);

isn't a valid declaration in C++. (Oooh, speaking of valid C++, note
that the above is accepted by g++-1.39 but not by cfront 2.0. A bug?)
I was thinking that any identifier (name other than reserved words,
symbols, or basic types) would be assumed to directly name a user
defined type.  Typedefed aliases or full anonymous struct definitions
would not be accepted.

It seems clear that one requirement would be that the mangler be able
to accept any output generated by the demangler and vice versa. I
think the simplifications I'm making are consistent with that. However
it's not clear what other requirements exist for a useful tool. For
instance, these specs would not necessarily let you directly use
pieces of source code or output from the compiler as input to the
mangler. I don't see any way of creating such a mangler without
analyzing the entire source code for a module and that's significantly
more work than I want the mangler to do. Aside from compilation, it
seems the reason most people have cited for wanting a mangler is for
dynamic loading. I'm not sure if these simplifications would
adequately support that.

tiemann@CYGNUS.COM (Michael Tiemann) (04/03/91)

I think the way to handle dynamic loading is related to the way that
parameterized types must be handled.  I would like to see discussion
about how the linker and compiler should communicate to handle both
jobs with equal facility.  If we can get this working, then using the
name mangler that comes with the compiler will be a simple application
of software reuse.

Michael