turner@sdti.UUCP (Prescott K. Turner) (10/20/87)
I have been working with C macro processing, as described in the draft
proposed standard (henceforth the "standard"). In doing so, I have come
across some features which may be of interest because they are vaguely
specified or obscure. Examples are provided, since these aspects are not
covered by the examples in the standard.

The standard is clear in 3.8.3.4 about what is rescanned after macro
replacement and parameter substitution. Most of the obscurities arise
because (as discussed in the rationale) "the Committee agreed simply to
turn off the definition of a macro for the duration of the expansion of
that macro." After an attempt to understand the pertinent paragraph, it
is my belief that the task of turning off a macro definition is not at
all simple.

The first example deals with the term "nested replacement". A macro
definition, when suppressed within the macro's replacement, continues
suppressed in further nested replacements. But how do you tell whether
the invocation of a function-like macro is nested? Is it the macro name
that counts, or must the parenthesized argument list also be nested? The
latter is perhaps a more intuitive interpretation of "nested
replacement", but is not consistent with the way a macro invocation is
suppressed (which is based only on where the name appears). It also
leads to some obscurities which I won't get into here. So I prefer an
interpretation where "if any other macro name is found during such a
scan of the replacement list, and is expanded, the definition of the
original macro is suppressed in the other macro's replacement".

Here's a contrived example in which the definition of "nested
replacement" makes a difference.

in the context of   #define S T
                    #define T(x) S
start with
        S(1)
    replacing S
        T(1)        where S is suppressed in "T"
    replacing T(1)
        S           ***

*** I consider the replacement of "T(1)" by "S" as nested in the first
replacement "S". Therefore both S and T are suppressed in the second
replacement "S".
(In the other interpretation, this would be further expanded to "T".)

Other interesting cases are more complex. The above example is
interesting because the invocation of T spans the boundary of S's
suppression. What if T has S as an argument? The argument appears in a
position where the definition of S is not directly suppressed. Is
processing of the macro name S as an argument considered part of the
"nested replacement" and therefore not expanded? I lean toward
suppressing the definition of S in this context for the sake of
simplicity.

in the context of   #define S T
                    #define T(x) x
start with
        S(S)
    replacing S
        T(S)        where S is suppressed in "T"
    processing T(S)
    expanding the argument
        S           as part of the processing of T (no expansion??)
    replacing T(S)
        S           where S and T are suppressed in "S"

The rest of my examples deal with "nonreplaced macro name preprocessing
tokens". Fortunately, the standard is less open to varying
interpretations than in the above cases. It also gives an example, from
which the relevant piece is:

in the context of   #define z z[0]
                    #define f(a) f(2 * (a))
start with
        f(f(z))
    identify invocation f(f(z))
    process argument
        f(z)
    identify invocation f(z)
    process argument
        z
    identify invocation z
    replace z
        z[0]
    rescan, with definition of z suppressed
        z[0]                    The "z" is marked "nonreplaced".
    replace f(z)
        f(2 * (z[0]))           The "z" is marked "nonreplaced".
    rescan, with definition of f suppressed
 ***    f(2 * (z[0]))           The "f" and "z" are marked "nonreplaced".
    replace f(f(z))
        f(2 * (f(2 * (z[0]))))  The "z" and second "f" are marked "nonreplaced".
    rescan, with definition of f suppressed
 ***    f(2 * (f(2 * (z[0]))))  The "z" and "f"s are marked "nonreplaced".
done

Only the steps marked *** illustrate the point, because the sole reason
"z" is not expanded is that it has been examined and not replaced.

An area left unexplored by this example is that there may be other
reasons for not expanding a macro name (in particular it may be a
function-like macro not followed by "(").
In case there are other reasons, in addition to the suppression of the
macro definition, one would still consider the token "nonreplaced". For
example:

in the context of   #define z(x) +z
                    #define f(a) f(a(3))
start with
        f(z(1))
    identify invocation f(z(1))
    process argument
        z(1)
    identify invocation z(1)
    replace z(1)
        +z
    rescan, with definition of z suppressed
 *1*    +z                      The "z" is marked "nonreplaced".
    replace f(z(1))
        f(+z(3))                The "z" is marked "nonreplaced".
    rescan, with definition of f suppressed
 *2*    f(+z(3))                The "f" and "z" are marked "nonreplaced".
done

The point of the above example is that at *1* "z" is marked
"nonreplaced" even though no "(" is present. Subsequently at *2* "z(3)"
is not expanded even though it was never rejected in this form.

Macro expansion has other obscure yet surprising aspects. If z expands
to y during argument processing, then during rescan y can have a
replacement in which z expands again. As an example:

in the context of   #define z +y
                    #define y(b) z
                    #define f(a) f(a(3))
start with
        f(z)
    identify invocation f(z)
    process argument
        z
    identify invocation z
    replace z
        +y
    rescan, with definition of z suppressed
    replace f(z)
        f(+y(3))
    rescan, with definition of f suppressed
    identify invocation y(3)
    process argument
        3
    replace y(3)
        f(+z)
    rescan starting at "+", with definitions of f and y suppressed
    identify invocation z
    replace z
        f(++y)
    rescan, starting at 2nd "+", with definitions of f, y, and z suppressed
        f(++y)
done
--
Prescott K. Turner, Jr.
Software Development Technologies, Inc.
375 Dutton Rd., Sudbury, MA 01776 USA (617) 443-5779
UUCP:necntc!necis!mrst!sdti!turner
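
For readers who want to check the f(f(z)) trace against their own
compiler, here is a minimal sketch in C. It is not part of the original
post: STR, XSTR, strip_blanks and ffz_matches_trace are names made up
for illustration, and it assumes a compiler that implements the #
operator. Blanks are stripped before comparing because the spacing of
stringified tokens can vary between implementations.

```c
#include <string.h>

#define STR(x)  #x       /* stringify the argument as written */
#define XSTR(x) STR(x)   /* expand the argument first, then stringify */

/* The two definitions from the trace. */
#define z z[0]
#define f(a) f(2 * (a))

/* Copy src to dst without blanks; dst must be at least as large as src. */
static void strip_blanks(char *dst, const char *src)
{
    for (; *src != '\0'; src++)
        if (*src != ' ')
            *dst++ = *src;
    *dst = '\0';
}

/* Returns 1 if f(f(z)) expands to the final line of the trace. */
int ffz_matches_trace(void)
{
    char buf[64];

    strip_blanks(buf, XSTR(f(f(z))));
    return strcmp(buf, "f(2*(f(2*(z[0]))))") == 0;
}

/* Keep the single-letter macros from leaking into later code. */
#undef f
#undef z
```

If ffz_matches_trace() returns 1, the compiler stopped where the trace
stops: the inner "f" and "z" were left alone rather than expanded again.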
minow@decvax.UUCP (Martin Minow) (10/22/87)
In article <167@sdti.UUCP> turner%sdti@harvard.harvard.edu
(Prescott K. Turner, Jr.) notes that the Draft ANSI C Standard rules
for macro rescanning are either vague, obscure, or both.
I believe their intent can best be illustrated by an example:
#if debug > 0
#define exit(status) (printf("Exit, status = %d\n", status), exit(status))
#endif
If the macro's own name were expanded again during rescanning, the
expansion would loop forever. Instead, the inner exit(status) is left
alone and compiled as a call to the "normal" function. Hope this is clearer.
Martin Minow
decvax!minow
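
A minimal sketch of the same idea, using a function that returns a value
so the effect can be checked. The names square, call_count and
traced_square are hypothetical and not from the post. The point is only
that the inner "square" in the replacement list is not expanded again,
so it reaches the real function.

```c
#include <stdio.h>

static int call_count = 0;   /* counts calls that reach the real function */

/* The real function, defined before the macro so its own name is not expanded. */
static int square(int x)
{
    call_count++;
    return x * x;
}

/* Same shape as the exit() wrapper: the inner "square" is not re-expanded. */
#define square(x) (printf("square(%d)\n", (x)), square(x))

/* Goes through the macro, which logs and then calls the real function once. */
int traced_square(int v)
{
    return square(v);
}
```

Note that the macro argument appears twice in the replacement list, so an
argument with side effects would be evaluated twice. That is a property
of this sketch, not of the rescanning rule.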
jagardner@orchid.UUCP (10/23/87)
Wow! I was just about to post a similar article!

The example I've been toying with is:

    #define f(a) a+g
    #define g(a) f(a)
    f(1)(2)

The expansion follows as:
    "f(1)(2)" -> "1+g(2)" -> "1+f(2)"
Do I now expand f or not? I'm currently thinking not. But consider the
following (with the above definitions):
    f(1)(2)(3)(4)(5)(6)
If I do re-expand f, this leads to "1+2+3+4+5+6+g", which is kind of
neat-o.

To prevent the re-expansion, I put a special token into the token
sequence following the replacement list of f. When I start processing
the replacement list of f, I turn f off, and then don't turn it on again
until I see that special token (basically, it says "turn f back on"; it
could also say "turn g back on" and so forth). The catch is that the
routine that expands macros does not trigger these special tokens; it
moves them to the end of the replacement. So with my special token
represented as # followed by the name of the macro to turn back on
(e.g. #f), the replacement looks like:

    f(1)(2)
    replace f(1) => 1 + g #f            /* f now turned off */
    rescan: 1 + g #f ( 2 )

I see g, and that it is a valid macro name, so I call the replacement
routine on g. The replacement routine sees the #f token, remembers it
but does not activate it yet. Therefore it sees 1 + g ( 2 )

    replace g(2) => 1 + f(2) #g         /* g now turned off */
    append the #f to get 1 + f(2) #g #f
    rescan: 1 + f(2) #g #f
        f is off, so we get 1 + f ( 2 )
        see the #g, so turn g back on
        see the #f so turn f back on

When I asked co-workers, 2 said the expansion should be "1+2+g", and 1
said "1+f(2)". I thought "1+f(2)" because the phrase in the standard
"the resulting preprocessing sequence is rescanned with the rest of the
source file's preprocessing tokens" (section 3.8.3.4) vaguely suggests
that this is the case, but I think the wording should be made clear.

Another not well defined point has to do with the ## CPP operator.

    #define g(a) #a
    #define f(a,b) g(#a ## #b)
    f(hi,there)

Is the resulting C string "\"hi\"\"there\"" or "hithere"? I.e., what
does it mean to ## 2 strings? The phases of translation say that string
concatenation by adjacency comes after macro expansion, so it's a
question of whether ## should join the 2 strings. I favour it joining
the strings because you can have them not joined by just saying

    #define f(a,b) g(#a #b)

although this results in an extra space in the final expansion of g.

David Tanguay, Software Development Group, University of Waterloo
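
For what it is worth, later editions of the C standard appear to leave
this very case open: they give an example with "#define f(a) a*g" and
"#define g(a) f(a)" and say that f(2)(9) may expand to either 2*9*g or
2*f(9). Below is a small sketch for finding out which way a given
compiler goes. STR, XSTR, expansion_of_f12 and f_was_reexpanded are
illustrative names, not from the post, and the sketch deliberately does
not assume either answer.

```c
#include <string.h>

#define STR(x)  #x       /* stringify the argument as written */
#define XSTR(x) STR(x)   /* expand the argument first, then stringify */

/* The two definitions from the post. */
#define f(a) a+g
#define g(a) f(a)

/* The fully expanded form of f(1)(2), as text. */
const char *expansion_of_f12(void)
{
    return XSTR(f(1)(2));
}

/* 1 if the compiler re-expanded f (giving 1+2+g), 0 if it left 1+f(2). */
int f_was_reexpanded(void)
{
    return strchr(expansion_of_f12(), 'f') == NULL;
}

/* Keep the single-letter macros from leaking into later code. */
#undef f
#undef g
```

Printing expansion_of_f12() shows the actual text. Since the result may
differ between compilers, portable code should not depend on it.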
jagardner@orchid.UUCP (10/23/87)
In article <167@sdti.UUCP> turner%sdti@harvard.harvard.edu (Prescott K. Turner, Jr.) writes:
}in the context of   #define z +y
}                    #define y(b) z
}                    #define f(a) f(a(3))
}start with
}        f(z)
}    identify invocation f(z)
}    process argument
}        z
}    identify invocation z
}    replace z
}        +y
}    rescan, with definition of z suppressed
}    replace f(z)
}        f(+y(3))
}    rescan, with definition of f suppressed
}    identify invocation y(3)
}    process argument
}        3
}    replace y(3)
}        f(+z)
}    rescan starting at "+", with definitions of f and y suppressed
}    identify invocation z
}    replace z
}        f(++y)
}    rescan, starting at 2nd "+", with definitions of f, y, and z suppressed
}        f(++y)
}done

Just a little point about the above: the result is displayed as
"f(++y)", but at no point was a single "++" token ever generated. The
question I have is whether rescanning implies that preprocessing tokens
are converted back into text and re-lexed when rescanned. The difference
is whether the above represents the token sequence
    "f" "(" "++" "y" ")"
or the sequence
    "f" "(" "+" "+" "y" ")".
I think that the tokens are lexed once, and that the latter sequence is
the correct result, but I don't think I could make a good case for it
(or rather, a good case forbidding the other).

David Tanguay, Software Development Group, University of Waterloo
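
On the re-lexing question: in the standard as later published, macro
replacement works on preprocessing tokens formed before expansion, so
adjacent tokens produced by replacement are not glued into a new token.
A minimal sketch that makes the difference observable, with NEG and
negate_twice as made-up names:

```c
#define NEG -   /* replacement list is the single token "-" */

/* Expands to the tokens  -  -  x  (two unary minuses), not to  --  x. */
int negate_twice(int x)
{
    return -NEG x;
}
```

With tokens lexed once, negate_twice(5) yields 5. A preprocessor that
turned the result back into text and re-lexed it would see "--x" and
yield 4 instead.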
daveb@geac.UUCP (10/26/87)
In article <11323@orchid.waterloo.edu> datanguay@watbun.waterloo.edu (David Tanguay) writes:
>In article <167@sdti.UUCP> turner%sdti@harvard.harvard.edu (Prescott K. Turner, Jr.) writes:
>}in the context of   #define z +y
>}                    #define y(b) z
>}                    #define f(a) f(a(3))

I suspect the reason the ANSI committee came out with the "rescanning
turned off" was history: The C pre-processor was and is an improper
subset of m4, the Unix[tm] general-purpose preprocessor. CPP was small
and simple, had a processing cost comparable to "cat" and didn't try to
do everything. M4 was more expensive and did more, and one used it when
one needed more, especially a quoting mechanism to control rescanning.

If the semantics of the current cpp can be achieved without rescanning
of X within expansion of X, then they may well be trying for simplicity
and compatibility...
--
 David Collier-Brown.                 {mnetor|yetti|utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind)
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.
lewisd@homxc.UUCP (David Lewis) (10/26/87)
In article <11322@orchid.waterloo.edu>, jagardner@orchid.waterloo.edu (Jim Gardner) writes:
> Wow! I was just about to post a similar article!
>
> The example I've been toying with is:
>
> #define f(a) a+g
> #define g(a) f(a)
> f(1)(2)
>
> The expansion follows as:
> "f(1)(2)" -> "1+g(2)" -> "1+f(2)"

How do you determine what the pre-processor expands a macro to be?
Is there any way to take a look at the results of such an expansion?
Can you see the intermediate file produced after all #includes and
#ifdefs and #defines are evaluated?
--
David B. Lewis {ihnp4,allegra,ulysses}!homxc!lewisd
201-615-5306 EDT
jagardner@orchid.UUCP (10/30/87)
In article <1880@homxc.UUCP> lewisd@homxc.UUCP (David Lewis) writes:
>How do you determine what the pre-processor expands a macro to be?
>Is there any way to take a look at the results of such an expansion?
>Can you see the intermediate file produced after all #includes and
>#ifdefs and #defines are evaluated?

I'm not sure exactly what you mean. One way to see what CPP expands a
macro to be is to run a source file with the macro in it through CPP and
look at the output. In some environments this involves either invoking C
with appropriate arguments or calling the cpp pass directly. Another way
is to carefully read the dpANS C section on CPP and pretend you're it.
The latter will place your sanity in jeopardy.

David Tanguay, Software Development Group, University of Waterloo
gwyn@brl-smoke.ARPA (Doug Gwyn ) (10/30/87)
In article <1880@homxc.UUCP> lewisd@homxc.UUCP (David Lewis) writes:
>How do you determine what the pre-processor expands a macro to be?
>Is there any way to take a look at the results of such an expansion?

Why ask the list; you do have a manual, don't you? See if it says that
your "cc" command supports a -P or -E option. If you have a separate
text-to-text preprocessor, it is probably called /lib/cpp and you can
also run it by itself and type stuff at it to see what it does.

This is all implementation-dependent; not all C compilers have an
explicit preprocessing phase. Sometimes it's bundled into the lexical
analyzer.
rbutterworth@orchid.UUCP (10/30/87)
In article <11446@orchid.waterloo.edu>, jagardner@orchid.waterloo.edu (Jim Gardner) writes:
> In article <1880@homxc.UUCP> lewisd@homxc.UUCP (David Lewis) writes:
> > How do you determine what the pre-processor expands a macro to be?
> > Is there any way to take a look at the results of such an expansion?
> > Can you see the intermediate file produced after all #includes and
> > #ifdefs and #defines are evaluated?
> I'm not sure exactly what you mean. One way to see what CPP expands a macro
> to be is to run a source file with the macro in it through CPP and look at
> the output. In some environments this involves either invoking C with
> appropriate arguments or calling the cpp pass directly. Another way
> is to carefully read the dpANS C section on CPP and pretend you're it.
> The latter will place your sanity in jeopardy.
>
> David Tanguay, Software Development Group, University of Waterloo

But ANSI only defines how the compiler in total will work. It says
nothing about the existence of a separate CPP program or a compiler
option that will show the preprocessed output. I don't think it is hard
to imagine a compiler that does the preprocessing as part of the
compilation itself and not as a separate step.

For instance, consider the ANSI compiler produced by the Software
Development Group (i.e. you). It comes without a CPP and I don't see
any option mentioned in the expl file for the compiler that would
produce CPP-like output.