[comp.lang.c] cpp macro expansion

ms@security.UUCP (04/15/87)

Sorry if this has been discussed before in reference to X3J11 discussions,
or other C problems encountered, but I haven't been reading this newsgroup
very long.

I have a question concerning the expansion of a #define by cpp on a
Sun 3/160 running UNIX 4.2BSD version 3.2 from Sun Microsystems.
Consider the following macro (this is not the actual code used, but will
serve as a typical example):

#define	MACRO(first,last)	(\
first\
_\
last)

The intent of the developers was that when cpp expands the code, the two
macro arguments would be concatenated together into one token for the
compiler, i.e:	MACRO(holy,cow) yields (holy_cow).
However, our cpp under 3.2 insists on replacing the escaped newlines in the
macro with spaces, i.e., (holy _ cow), which the compiler then rejects.

I tried this same example on Sun 3/160 running 3.0, and a Vax 11/780 running
4.2BSD, and both expanded the macro without the spaces.  A quick glance through
K&R did not yield any insight into which expansion is correct, but I may have
overlooked something.

So, my questions are:
Which is the correct expansion, or is it left to the cpp implementors?
Is there a problem with the (our?) Sun version 3.2 cpp?
What (if anything) does the new standard say about this?

Many thanks for any assistance you can provide.
				Jay W. Davison (Mistress Account)
				decvax!linus!security!ms
				jwd@mitre-bedford.arpa

gwyn@brl-smoke.ARPA (Doug Gwyn) (04/16/87)

In article <2857@linus.UUCP> ms@security.uucp (Mistress Account) writes:
>The intent of the developers was that when cpp expands the code, the two
>macro arguments would be concatenated together into one token for the
>compiler, i.e:	MACRO(holy,cow) yields (holy_cow).

Unfortunately C a la K&R did not provide any way to do such
"token pasting".  People using the Reiser CPP (found on most
UNIX systems) typically resorted to the following trick:
	#define GLUE(a,b) a/**/b
However, this is not guaranteed to work and, indeed, is
guaranteed NOT to work in X3J11-compliant compilers.  The
X3J11 invention for token pasting is:
	#define GLUE(a,b) a ## b
You will probably not find many compilers implementing this
yet, since this part of the spec kept changing.
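
As a minimal sketch of the ## form, assuming a compiler that implements
the X3J11 operator as described (the names holy_cow and read_glued are
hypothetical, chosen only for illustration):

```c
#include <assert.h>

/* X3J11 token pasting: the operands of ## are joined into one token. */
#define GLUE(a,b) a ## b

/* Hypothetical variable, named only for this illustration. */
static int holy_cow = 42;

/* GLUE(holy_, cow) expands to the single identifier holy_cow.
   The blank before "cow" is not part of the argument under ANSI rules. */
int read_glued(void)
{
    int value = GLUE(holy_, cow);
    assert(value == holy_cow);
    return value;
}
```

Calling read_glued() from main() should return the value stored in
holy_cow.  A Reiser-style preprocessor would most likely reject the
definition, since it does not know ##.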

gemini@homxb.UUCP (04/17/87)

In article <5764@brl-smoke.ARPA>, gwyn@brl-smoke.UUCP writes:
> Unfortunately C a la K&R did not provide any way to do such
> "token pasting".  People using the Reiser CPP (found on most
> UNIX systems) typically resorted to the following trick:
> 	#define GLUE(a,b) a/**/b
> However, this is not guaranteed to work and, indeed, is
> guaranteed NOT to work in X3J11-compliant compilers.  The
> X3J11 invention for token pasting is:
> 	#define GLUE(a,b) a ## b
> You will probably not find many compilers implementing this
> yet, since this part of the spec kept changing.

And we of course have a third method here at AT&T.  The
preprocessor distributed with 4th generation make uses:
 	#define GLUE(a,b) a\+b
Can you say non-compliant?  I knew you could.

Maybe X3J11 should consider BOTH latter methods, especially since
I'm in a love/hate relationship with 4th generation make, and have
	GLUE(me,==nonportable)

Rick Richardson, PC Research, Inc: (201) 922-1134  ..!ihnp4!castor!pcrat!rick
	         when at AT&T-CPL: (201) 834-1378  ..!ihnp4!castor!polux!rer

guy%gorodish@Sun.COM (Guy Harris) (04/17/87)

>#define	MACRO(first,last)	(\
>first\
>_\
>last)
>

>Which is the correct expansion, or is it left to the cpp implementors?

The ANSI C standard indicates that backslash-newline should be
completely stripped from source code fairly early in the translation
process.  This means that inserting blanks is incorrect; however, it
also indicates that substituting for "first" and "last" is incorrect,
because this macro definition should be treated identically to

	#define	MACRO(first,last)	(first_last)

>Is there a problem with the (our?) Sun version 3.2 cpp?

There is a problem with the System V "cpp", from which the 3.2 "cpp"
is derived.  There is a technique more likely to work on various
versions of UNIX:

	#define	MACRO(first,last)	(first/**/_/**/last)

However, *this* is not guaranteed to work on all C implementations,
either.

Also note that *neither* technique will work with the version of the
preprocessor used in many UNIX C implementations if you call the macro as

	MACRO(foo, bar)

since the blank in front of "bar" is considered part of the
argument.

Both of these facts argue against widespread use of this technique,
since it isn't guaranteed to work and since it breaks if you make
changes to the source code that one would think safe.

>What (if anything) does the new standard say about this?

It says you should write the macro like:

	#define	MACRO(first,last)	(first##_##last)

which will cause the "first", the "_", and the "last" to be glued
together into one token.  It also says (see the "debug" macro in the
example on pages 80 and 81 of the October 1, 1986 draft) that blanks
in the argument list should not be considered part of the argument.

Of course, this is a draft standard, and is subject to change.
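
The difference between the two definitions can be sketched as follows,
assuming a preprocessor that follows the draft's rules.  The variables
holy_cow and first_last, and the function names, are hypothetical and
exist only to make the two expansions observable:

```c
#include <assert.h>

/* Backslash-newline pairs are deleted before tokenization, so this
   definition is equivalent to (first_last): one identifier, in which
   the parameters first and last never appear as separate tokens. */
#define SPLICED(first,last) (\
first\
_\
last)

/* The ANSI form: ## joins first, _ and last into one token. */
#define MACRO(first,last) (first##_##last)

static int holy_cow = 7;     /* hypothetical, for illustration */
static int first_last = 99;  /* hypothetical, for illustration */

int via_paste(void)  { return MACRO(holy, cow); }    /* (holy_cow)   */
int via_splice(void) { return SPLICED(holy, cow); }  /* (first_last) */

/* Checks that each macro names the variable the comments claim. */
void check_expansions(void)
{
    assert(via_paste() == holy_cow);
    assert(via_splice() == first_last);
}
```

Note that the continuation lines of SPLICED must start in the first
column; any leading white space would survive the splice and separate
the tokens.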

herndon@umn-cs.UUCP (Robert Herndon) (04/17/87)

In article <2857@linus.UUCP>, ms@security.uucp (Mistress Account) writes:
> #define	MACRO(first,last)	(\
> first\
> _\
> last)
> The intent of the developers was that when cpp expands the code, the two
> macro arguments would be concatenated together into one token for the
> compiler, i.e:	MACRO(holy,cow) yields (holy_cow).

Try:
#define	CONCAT(first,last) first/**/_/**/last

I think this has worked for me on various suns -- the preprocessor
does the expansion, stripping out the comments in the definition,
leaving no spaces.

				Robert Herndon

jbuck@epimass.UUCP (04/18/87)

In article <5764@brl-smoke.ARPA>, gwyn@brl-smoke.UUCP writes:
>> "token pasting".  People using the Reiser CPP (found on most
>> UNIX systems) typically resorted to the following trick:
>> 	#define GLUE(a,b) a/**/b
>> However, this is not guaranteed to work and, indeed, is
>> guaranteed NOT to work in X3J11-compliant compilers.  The
>> X3J11 invention for token pasting is:
>> 	#define GLUE(a,b) a ## b
>> You will probably not find many compilers implementing this
>> yet, since this part of the spec kept changing.

In article <229@homxb.UUCP> gemini@homxb.UUCP (Rick Richardson) writes:
>And we of course have a third method here at AT&T.  The
>preprocessor distributed with 4th generation make uses:
> 	#define GLUE(a,b) a\+b
>Can you say non-compliant?  I knew you could.

I use

#define QUOTE(x) x
#define GLUE(x,y) QUOTE(x)y

Well, it works for me, and it's prettier than a/**/b.  Can't tell
whether ANSI breaks this: I suppose a conforming compiler could
"tokenize" things before the preprocessor is invoked (or there
may be no separate preprocessor pass) -- but I like it.

-- 
- Joe Buck    {hplabs,ihnp4,sun,ames}!oliveb!epimass!jbuck
	      seismo!epiwrl!epimass!jbuck  {pesnta,tymix,apple}!epimass!jbuck

gwyn@brl-smoke.ARPA (Doug Gwyn) (04/18/87)

In article <1062@epimass.UUCP> jbuck@epimass.UUCP (Joe Buck) writes:
>#define QUOTE(x) x
>#define GLUE(x,y) QUOTE(x)y
>
>Well, it works for me, and it's prettier than a/**/b.  Can't tell
>whether ANSI breaks this...

You're assuming that preprocessing is done at a character level
rather than a token level, but in fact it is more natural to do
it at a token level since that's how macro names and arguments
are treated anyway.  A tokenizing preprocessor would have no
reason to glue together adjacent tokens; that's why X3J11
invented an explicit operator for specifying token pasting.
There are a lot of tokenizing C preprocessors already in existence.
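
One way to observe this, as a sketch that assumes a token-level (ANSI
style) preprocessor; minus_minus is a hypothetical helper:

```c
#include <assert.h>

#define QUOTE(x) x
#define GLUE(x,y) QUOTE(x)y

/* With token-level preprocessing, GLUE(-,-) yields two separate
   minus tokens, so the expression is a - (-b), not a decrement. */
int minus_minus(int a, int b)
{
    int result = a GLUE(-,-) b;
    assert(result == a + b);
    return result;
}
```

A character-level preprocessor would instead be expected to produce
"a -- b", which does not compile, so the fact that this builds at all
shows that the two tokens were not glued together.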

By the way, I object to the frequent use of "ANSI breaks this".
X3J11 is trying to establish standard meanings for previously
underspecified parts of the language that have caused portability
problems in the past.  ANY assumption you have been making that is
not true of ALL C implementations will cause your code to "break"
when it's moved to another environment that had different
interpretations of the rules.  This is not X3J11's doing!  It is
unavoidable that the eventual ANSI specified C environment will
differ in some way from virtually all existing implementations,
since they have all come up with mutually incompatible flavors
of the language.  The hope is that in the long run the ANSI C
environment will be both powerful enough and flexible enough to
be provided by almost all C implementors and used by almost all
C programmers (perhaps as a subset of a wider, system-dependent
environment such as POSIX).

henry@utzoo.UUCP (Henry Spencer) (04/19/87)

> Which is the correct expansion, or is it left to the cpp implementors?

This is one of the (numerous) areas where K&R and such just were not quite
explicit enough to provide solid guidance.  The interface between cpp and
the rest of the compiler is a real minefield, since historically cpp was
a separate pass with very limited understanding of C syntax.

> Is there a problem with the (our?) Sun version 3.2 cpp?

No, its behavior is legitimate.  The programmer who assumed that cpp would
remove the backslashed newlines entirely and then combine things into one
token was relying on an undocumented property of one particular cpp.  There
is *no* portable pre-ANSI way to get this effect.

> What (if anything) does the new standard say about this?

It may be kosher in X3J11, given their slightly-expanded view of the meaning
of a backslashed newline, but I'd have to study the draft standard very
carefully to be sure.  I suspect that the backslashed newlines drop out at
once, so the thing becomes one token *before* macro substitution, so it
still doesn't work.  They have defined a way to get the desired effect,
but with different and more explicit syntax.
-- 
"If you want PL/I, you know       Henry Spencer @ U of Toronto Zoology
where to find it." -- DMR         {allegra,ihnp4,decvax,pyramid}!utzoo!henry

neville@ads.arpa (04/21/87)

There is a potentially non-portable preprocessor feature that most of
the GLUE(a,b) macros that have been suggested suffer from.

Most such macros that people use look like
 	#define PASTE_IT(left,right)  left/**/right

What some people may not realize is that comments are sort of "doubly-
defined" in that they are defined as part of the C language itself,
but most C preprocessors go ahead and strip comments themselves.  There
is no reason that I can see to expect that all preprocessors will do
this.  If a C comment gets passed through to the *compiler*, what
happens?  You just can't count on hacks like this.

							-neville
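
For comparison, a sketch of what an ANSI-style translator is expected to
do with this macro, on the assumption that each comment is replaced by
a single space before macro expansion; plus_plus is a hypothetical
helper:

```c
#include <assert.h>

#define PASTE_IT(left,right)  left/**/right

/* The comment is treated as one space, so PASTE_IT(+,+) gives two
   plus tokens: a + (+b), not an increment. */
int plus_plus(int a, int b)
{
    int result = a PASTE_IT(+,+) b;
    assert(result == a + b);
    return result;
}
```

Under a preprocessor that deletes the comment outright, the same line
would likely become "a ++ b" and fail to compile.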
