[comp.lang.c] ANS C Macro Processing

turner@sdti.UUCP (Prescott K. Turner) (10/20/87)

I have been working with C macro processing, as described in the draft
proposed standard (henceforth the "standard").  In doing so, I have come
across some features which may be of interest because they are vaguely
specified or obscure.  Examples are provided, since these aspects are not
covered by the examples in the standard. 

The standard is clear in 3.8.3.4 about what is rescanned after macro
replacement and parameter substitution.  Most of the obscurities arise
because (as discussed in the rationale) "the Committee agreed simply to turn
off the definition of a macro for the duration of the expansion of that
macro."  Having tried to understand the pertinent paragraph, I believe that
the task of turning off a macro definition is not at all simple.

The first example deals with the term "nested replacement".  A macro
definition, when suppressed within the macro's replacement, remains
suppressed in further nested replacements.  But how do you tell whether the
invocation of a function-like macro is nested?  Is it the macro name that
counts, or must the parenthesized argument list also be nested?  The latter
is perhaps a more intuitive interpretation of "nested replacement", but is
not consistent with the way a macro invocation is suppressed (which is based
only on where the name appears).  It also leads to some obscurities
which I won't get into here.  So I prefer an interpretation where "if any
other macro name is found during such a scan of the replacement list, and is
expanded, the definition of the original macro is suppressed in the other
macro's replacement".

Here's a contrived example in which the definition of "nested replacement"
makes a difference.

in the context of       #define S T
                        #define T(x) S
start with              S(1)
replacing S             T(1)              where S is suppressed in "T"
replacing T(1)          S                 ***

*** I consider the replacement of "T(1)" by "S" as nested in the first
replacement "S".  Therefore both S and T are suppressed in the second
replacement "S".  (In the other interpretation, this would be further
expanded to "T".)


Other interesting cases are more complex.  The above example is interesting
because the invocation of T spans the boundary of S's suppression.  What if T
has S as an argument?  The argument appears in a position where the
definition of S is not directly suppressed.  Is processing of the macro name
S as an argument considered part of the "nested replacement" and therefore
not expanded?  I lean toward suppressing the definition of S in this
context for the sake of simplicity.

in the context of       #define S T
                        #define T(x) x
start with              S(S)
replacing S             T(S)              where S is suppressed in "T"
processing T(S)
expanding the argument S as part of the processing of T (no expansion??)
replacing T(S)          S                 where S and T are suppressed in "S"


The rest of my examples deal with "nonreplaced macro name preprocessing
tokens".  Fortunately, the standard is less open to varying interpretations
than in the above cases.  It also gives an example, from which the relevant
piece is: 

in the context of 
   #define z z[0]
   #define f(a) f(2 * (a))
start with              
   f(f(z))
      identify invocation f(f(z))
      process argument
         f(z)
            identify invocation f(z)
            process argument
               z
                  identify invocation z
                  replace z
               z[0]        
                  rescan, with definition of z suppressed
               z[0]                           The "z" is marked "nonreplaced".
            replace f(z)
         f(2 * (z[0]))                        The "z" is marked "nonreplaced".
            rescan, with definition of f suppressed ***
         f(2 * (z[0]))               The "f" and "z" are marked "nonreplaced".
      replace f(f(z))
   f(2 * (f(2 * (z[0]))))     The "z" and second "f" are marked "nonreplaced".
       rescan, with definition of f suppressed ***
   f(2 * (f(2 * (z[0]))))           The "z" and "f"s are marked "nonreplaced".
done

Only the steps marked *** illustrate the point: at those steps the definition
of z is no longer suppressed, so the sole reason "z" is not expanded is that
it has already been examined and not replaced.


An area left unexplored by this example is that there may be other reasons
for not expanding a macro name (in particular, it may be a function-like macro
not followed by "(").  Even when such a reason applies in addition to the
suppression of the macro definition, one would still consider the token
"nonreplaced".  For example:

in the context of       #define z(x) +z
                        #define f(a) f(a(3))
start with              
   f(z(1))
      identify invocation f(z(1))
      process argument
         z(1)
            identify invocation z(1)
            replace z(1)
         +z        
            rescan, with definition of z suppressed
         +z                               *1* The "z" is marked "nonreplaced".
      replace f(z(1))
   f(+z(3))                                   The "z" is marked "nonreplaced".
      rescan, with definition of f suppressed *2*
   f(+z(3))                          The "f" and "z" are marked "nonreplaced".
done

The point of the above example is that at *1* "z" is marked "nonreplaced"
even though no "(" is present.  Subsequently at *2* "z(3)" is not expanded
even though it was never rejected in this form.


Macro expansion has other obscure yet surprising aspects.  If z expands to y
during argument processing, then during rescan y can have a replacement in
which z expands again.  As an example:

in the context of       #define z +y
                        #define y(b) z
                        #define f(a) f(a(3))
start with              
   f(z)
      identify invocation f(z)
      process argument
         z
            identify invocation z
            replace z
         +y
            rescan, with definition of z suppressed
      replace f(z)
   f(+y(3))
      rescan, with definition of f suppressed
      identify invocation y(3)
      process argument
         3
      replace y(3)
   f(+z)
      rescan starting at "+", with definitions of f and y suppressed 
      identify invocation z
      replace z
   f(++y)
      rescan, starting at 2nd "+", with definitions of f, y, and z suppressed
   f(++y)
done
--
Prescott K. Turner, Jr.
Software Development Technologies, Inc.
375 Dutton Rd., Sudbury, MA 01776 USA        (617) 443-5779
UUCP:necntc!necis!mrst!sdti!turner

minow@decvax.UUCP (Martin Minow) (10/22/87)

In article <167@sdti.UUCP> turner%sdti@harvard.harvard.edu
(Prescott K. Turner, Jr.) notes that the Draft Ansi C Standard rules
for macro rescanning are either vague, obscure, or both.

I believe their intent can best be illustrated by an example:

#if debug > 0
#define exit(status)	(printf("Exit, status = %d\n", status), exit(status))
#endif

If the inner exit were itself rescanned and replaced, the expansion would loop
forever.  Instead, the inner exit(status) is left alone and compiles as a call
to the "normal" function.  Hope this is clearer.

Martin Minow
decvax!minow

jagardner@orchid.UUCP (10/23/87)

Wow! I was just about to post a similar article!

The example I've been toying with is:

#define f(a)  a+g
#define g(a)  f(a)
f(1)(2)

The expansion follows as:
	"f(1)(2)" -> "1+g(2)" -> "1+f(2)"
Do I now expand f or not? I am currently thinking not. But consider the
following (with the above definitions): f(1)(2)(3)(4)(5)(6)
If I do re-expand f, this leads to "1+2+3+4+5+6+g", which is kind of neat-o.
To prevent the re-expansion, I put a special token into the token sequence
following the replacement list of f. When I start processing the replacement
list of f, I turn f off, and then don't turn it on again until I see that
special token (basically, it says "turn f back on" - it could also say
"turn g back on" and so forth). The catch is that the routine that expands
macros does not activate these special tokens; it moves them to the end of the
replacement. So with my special token represented as # followed by the name
of the macro to turn back on (e.g. #f), the replacement looks like:

f(1)(2)
	replace f(1) => 1 + g #f	/* f now turned off */
	rescan: 1 + g #f ( 2 )
		I see g, and that it is a valid macro name, so I call the
		replacement routine on g.
		The replacement routine sees the #f token, remembers it but
		does not activate it yet. Therefore it sees 1 + g ( 2 )
	replace g(2) => 1 + f(2) #g	/* g now turned off */
		append the #f to get 1 + f(2) #g #f
	rescan: 1 + f(2) #g #f
		f is off, so we get 1 + f ( 2 )
		see the #g, so turn g back on
		see the #f so turn f back on

When I asked co-workers, 2 said the expansion should be "1+2+g", and 1
said "1+f(2)". I thought "1+f(2)" because the phrase in the standard
"the resulting preprocessing sequence is rescanned with the rest of the
source file's preprocessing tokens" (section 3.8.3.4) vaguely suggests that
this is the case, but I think the wording should be made clear.

Another not well defined point has to do with the ## CPP operator.

#define g(a) #a
#define f(a,b) g(#a ## #b)
f(hi,there)

Is the resulting C string "\"hi\"\"there\"" or "hithere"? I.e., what does 
it mean to ## 2 strings? The phases of translation says that string
concatenation by adjacency comes after macro expansion, so it's a question
of whether ## should join the 2 strings. I favour it joining the strings
because you can have them not joined by just saying 

#define f(a,b) g(#a #b)

although this results in an extra space in the final expansion of g.

David Tanguay, Software Development Group, University of Waterloo

jagardner@orchid.UUCP (10/23/87)

In article <167@sdti.UUCP> turner%sdti@harvard.harvard.edu (Prescott K. Turner, Jr.) writes:
}in the context of       #define z +y
}                        #define y(b) z
}                        #define f(a) f(a(3))
}start with              
}   f(z)
}      identify invocation f(z)
}      process argument
}         z
}            identify invocation z
}            replace z
}         +y
}            rescan, with definition of z suppressed
}      replace f(z)
}   f(+y(3))
}      rescan, with definition of f suppressed
}      identify invocation y(3)
}      process argument
}         3
}      replace y(3)
}   f(+z)
}      rescan starting at "+", with definitions of f and y suppressed 
}      identify invocation z
}      replace z
}   f(++y)
}      rescan, starting at 2nd "+", with definitions of f, y, and z suppressed
}   f(++y)
}done

Just a little point about the above: 
The result is displayed as "f(++y)", but at no point was a single "++" token
ever generated. The question I have is whether rescanning implies that
preprocessing tokens are converted back into text and re-lexed when rescanned.
The difference is whether the above represents the token sequence
"f" "(" "++" "y" ")" or the sequence "f" "(" "+" "+" "y" ")". I think that
the tokens are lexed once, and that the latter sequence is the correct
result, but I don't think I could make a good case for it (or rather, a good
case forbidding the other).

David Tanguay, Software Development Group, University of Waterloo

daveb@geac.UUCP (10/26/87)

In article <11323@orchid.waterloo.edu> datanguay@watbun.waterloo.edu (David Tanguay) writes:
>In article <167@sdti.UUCP> turner%sdti@harvard.harvard.edu (Prescott K. Turner, Jr.) writes:
>}in the context of       #define z +y
>}                        #define y(b) z
>}                        #define f(a) f(a(3))

  I suspect the reason the ANSI committee came out with the
"rescanning turned off" was history:
  The C pre-processor was and is an improper subset of m4, the
Unix[tm] general-purpose preprocessor.  CPP was small and simple,
had a processing cost comparable to "cat" and didn't try to do
everything. M4 was more expensive and did more, and one used it when
one needed more, especially a quoting mechanism to control
rescanning.

  If the semantics of the current cpp can be achieved without
rescanning of X within expansion of X, then they may well be trying
for simplicity and compatibility...

-- 
 David Collier-Brown.                 {mnetor|yetti|utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind)
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.

lewisd@homxc.UUCP (David Lewis) (10/26/87)

In article <11322@orchid.waterloo.edu>, jagardner@orchid.waterloo.edu (Jim Gardner) writes:
> Wow! I was just about to post a similar article!
> 
> The example I've been toying with is:
> 
> #define f(a)  a+g
> #define g(a)  f(a)
> f(1)(2)
> 
> The expansion follows as:
> 	"f(1)(2)" -> "1+g(2)" -> "1+f(2)"

How do you determine what the pre-processor expands a macro to be?
Is there any way to take a look at the results of such an expansion?
Can you see the intermediate file produced after all #includes and 
#ifdefs and #defines are evaluated?
-- 

David B. Lewis    {ihnp4,allegra,ulysses}!homxc!lewisd
201-615-5306 EDT

jagardner@orchid.UUCP (10/30/87)

In article <1880@homxc.UUCP> lewisd@homxc.UUCP (David Lewis) writes:
>How do you determine what the pre-processor expands a macro to be?
>Is there any way to take a look at the results of such an expansion?
>Can you see the intermediate file produced after all #includes and 
>#ifdefs and #defines are evaluated?

I'm not sure exactly what you mean. One way to see what CPP expands a macro
to be is to run a source file with the macro in it through CPP and look at
the output. In some environments this involves either invoking C with
appropriate arguments or calling the cpp pass directly. Another way
is to carefully read the dpANS C section on CPP and pretend you're it.
The latter will place your sanity in jeopardy.

David Tanguay, Software Development Group, University of Waterloo

gwyn@brl-smoke.ARPA (Doug Gwyn ) (10/30/87)

In article <1880@homxc.UUCP> lewisd@homxc.UUCP (David Lewis) writes:
>How do you determine what the pre-processor expands a macro to be?
>Is there any way to take a look at the results of such an expansion?

Why ask the list; you do have a manual, don't you?
See if it says that your "cc" command supports a -P or -E option.

If you have a separate text-to-text preprocessor, it is probably
called /lib/cpp and you can also run it by itself and type stuff
at it to see what it does.

This is all implementation-dependent; not all C compilers have an
explicit preprocessing phase.  Sometimes it's bundled into the
lexical analyzer.

rbutterworth@orchid.UUCP (10/30/87)

In article <11446@orchid.waterloo.edu>, jagardner@orchid.waterloo.edu (Jim Gardner) writes:
> In article <1880@homxc.UUCP> lewisd@homxc.UUCP (David Lewis) writes:
> > How do you determine what the pre-processor expands a macro to be?
> > Is there any way to take a look at the results of such an expansion?
> > Can you see the intermediate file produced after all #includes and 
> > #ifdefs and #defines are evaluated?
> I'm not sure exactly what you mean. One way to see what CPP expands a macro
> to be is to run a source file with the macro in it through CPP and look at
> the output. In some environments this involves either invoking C with
> appropriate arguments or calling the cpp pass directly. Another way
> is to carefully read the dpANS C section on CPP and pretend you're it.
> The latter will place your sanity in jeopardy.
> 
> David Tanguay, Software Development Group, University of Waterloo

But ANSI only defines how the compiler in total will work.
It says nothing about the existence of a separate CPP program
or a compiler option that will show the preprocessed output.

I don't think it is hard to imagine a compiler that does the preprocessing
as part of the compilation itself and not as a separate step.

For instance, consider the ANSI compiler produced by the
Software Development Group (i.e. you).  It comes without
a CPP and I don't see any option mentioned in the expl
file for the compiler that would produce CPP-like output.