peterson@Compass.COM (Richard Peterson) (03/21/90)
Michael Platoff's inquiry regarding work to incorporate the semantics of C preprocessor directives within a program representation used for source code transformation yielded a response from Jean-Marie Larcheveque citing the line number and file name directives generated by text-to-text preprocessors. I believe Michael's point, though, was that having a line-for-line correspondence between the source input to the preprocessor and the source input to the parser doesn't really help much in analyzing the program structure in terms of the original identifiers (sometimes macros) and expressions (sometimes #if expressions) by which the program is understood and maintained.

We have implemented a C (ANSI plus some extensions) front end which includes an integrated preprocessor and generates an AST-like program representation. The primary motives for the integrated preprocessor were to eliminate the overhead of performing lexical analysis twice, and to allow for well-integrated error reporting. Informational, warning, and error messages from preprocessing, parsing, and semantics all show the original source line with a flag on the offending token, whether it's a preprocessing directive, a macro invocation, or a token encountered in excluded text. Although not implemented currently, the design allows for flagging error-causing token(s) within a macro expansion in addition to flagging the macro invocation in the original source file.

We have not attempted to represent preprocessing information other than true source position in the AST, and the phrase structure grammar input to our LALR(1) frontend generator has no knowledge of preprocessing constructs. Knowledge of preprocessing is confined to the lexical grammar input to the frontend generator, and to hand-coded routines in the screener. The parser calls the screener for each token and is unaware of preprocessing actions, because they are not useful in the normal process of compilation.
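
To make the parser/screener split concrete, here is a minimal sketch in C of a screener that the parser pulls from one token at a time. It is not the Compass front end: every name in it (Token, Screener, next_token, cond_depth) is hypothetical, it handles only #ifdef, #else and #endif, and it assumes each input entry is one source line holding either a single directive or a single token. What it illustrates is that directives are consumed inside the screener, so the parser never sees them, while each token still carries its true source line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: names and layout are assumptions for illustration. */
#define MAX_COND_DEPTH 32

typedef struct {
    const char *text;   /* token spelling, or NULL at end of input */
    int line;           /* original source line of the token */
    int cond_depth;     /* open #ifdef levels when the token was produced */
} Token;

typedef struct {
    const char **lines;          /* one directive or one token per entry */
    size_t count;                /* number of entries in lines */
    size_t pos;                  /* index of the next entry to read */
    const char **defined;        /* NULL-terminated list of defined names */
    int active[MAX_COND_DEPTH];  /* conditional inclusion stack */
    int depth;                   /* number of open conditionals */
} Screener;

static int is_defined(const Screener *s, const char *name)
{
    for (const char **d = s->defined; *d != NULL; d++)
        if (strcmp(*d, name) == 0)
            return 1;
    return 0;
}

/* Text is included when no conditional is open or the innermost is active. */
static int including(const Screener *s)
{
    return s->depth == 0 || s->active[s->depth - 1];
}

/* Called by the parser for each token; directives never reach the caller. */
Token next_token(Screener *s)
{
    while (s->pos < s->count) {
        const char *ln = s->lines[s->pos];
        int line = (int)++s->pos;

        if (strncmp(ln, "#ifdef ", 7) == 0) {
            assert(s->depth < MAX_COND_DEPTH);
            int on = including(s) && is_defined(s, ln + 7);
            s->active[s->depth++] = on;
        } else if (strcmp(ln, "#else") == 0) {
            assert(s->depth > 0);
            int parent = s->depth < 2 || s->active[s->depth - 2];
            s->active[s->depth - 1] = parent && !s->active[s->depth - 1];
        } else if (strcmp(ln, "#endif") == 0) {
            assert(s->depth > 0);
            s->depth--;
        } else if (including(s)) {
            Token t = { ln, line, s->depth };
            return t;
        }
        /* anything else is excluded text and is skipped */
    }
    Token end = { NULL, (int)s->pos + 1, s->depth };
    return end;
}
```

Given the entries x, =, a, #ifdef BIG, +, b, #else, -, c, #endif, ; with BIG defined, repeated calls return x = a + b ; and then the end marker. The + token reports line 5 and cond_depth 1, so the position and the conditional context are available at the moment the token is handed over, even though the parser in this sketch has no reason to look at them.
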

However, at the time that the parser calls the screener there is a significant amount of information available about preprocessing, because the preprocessor really produces one token at a time. An #ifdef that appears within an expression is processed right after the preceding token is handed to the parser, and each token that reaches the parser does so in the presence of the conditional inclusion stack maintained by the preprocessor. Even within macro expansions, at the time the parser gets a token generated by a macro, there exists a data structure representing the derivation of that token through nested macro calls, parameter expansions, and token pasting or stringization operations.

So it's certainly possible for a C language processor to be aware of the effects of preprocessing in some sense. Storing the information in a program representation that would be useful to the kinds of tools described sounds pretty intractable, at least in the general case, for the reasons Michael listed. Perhaps with a suitable set of (checkable) constraints imposed on the use of the preprocessor by programs to be represented, some useful capabilities could be developed.

Rich Peterson                 Internet: peterson@compass.com
Compass, Inc.                 UUCP: {think,encore,cvbnet}!compass!peterson
550 Edgewater Drive           Phone: (617) 245-9540
Wakefield, MA 01880, USA.
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue.  Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.