osd7@homxa.UUCP (Orlando Sotomayor-Diaz) (03/07/85)
From: Orlando Sotomayor-Diaz (The Moderator) <cbosgd!std-c>

mod.std.c Digest        Thu, 7 Mar 85        Volume 4 : Issue 6

Today's Topics:
                        preprocessor issues
----------------------------------------------------------------------

Date:
From: attunix!lr (Larry Rosler)
Subject: preprocessor issues
To: houxm!homxa!osd7

As a member of the AT&T organization that supports the UNIX System V cpp,
I sympathize with the position represented by Ken Arnold's submissions, and
argued as strongly as I could for scanning strings for formal parameters.
However, as chairman of the Language Subcommittee responsible for the
preprocessor portion of the draft, I also must try to justify the position
of the majority of the committee on this matter.

It is true that "... many ... systems have scanned strings in macros for
parameter substitution since parameter macros were invented...," but this
is in direct contravention to the C Reference Manual, p. 207: "Text inside
a string or a character constant is not subject to replacement." The
accurate history is that in 1978 John Reiser rewrote the original cpp (by
Mike Lesk) for efficiency, and threw in several dozen "features" along the
way. These were NEVER documented. (The ANSI Committee does not consider
/usr/src/cmd/cpp/README to be documentation!)

Most implementors of C used the description in K&R as the specification,
omitting all of these undocumented features. They have been persuaded by me
and the other UNIX-related representatives to accept those features that
were theoretically sound, and even some obvious enhancements (such as
#elif).

Two "features" have been the subject of many days of heated debate because
of theoretical deficiencies.

The concatenation of two tokens into one by a "disappearing comment"
grossly violates the explicit statement in K&R that comments serve as
"white space," hence separate tokens. Furthermore, programs that use that
misfeature fail when linted, as lint uses comments as pragmas and runs cpp
with the -C flag.
(Yes, I know it could be hacked to strip empty comments, but it wasn't.)
So first the committee had to be convinced that concatenating tokens was
worthwhile, and then an acceptable method had to be INVENTED.

The scanning of strings for embedded identifiers was something the
committee simply could not accept. A string is a token, and what is inside
a string has no grammar from which an identifier can be derived in any
definable way. Once again the committee had to be convinced that
substituting macro arguments in strings was worthwhile, and then an
acceptable method had to be INVENTED.

I am as aware as anyone of the number of existing "working" programs that
will not be maximally portable according to the Standard. But these
programs are not broken in their own environments, unless implementors
remove capabilities from their own preprocessors, which is not anticipated.

Lest any reader get the misconception that the Reiser techniques are
*better* than those proposed by ANSI, consider the following, which is the
closest I could come to implementing the example in the ANSI draft using
the UNIX version of cpp:

	#define debug(s, t) printf("x-s= %d, x-t= %s", \
		x/**/s, x/**/t)
	debug(1, 2)

which results in

	printf("x-1=%d, x- 2= %2", x1, x 2)

The surprises are fairly apparent. Of course, on my second try I wrote:

	#define debug(S, T) printf("x-S= %d, x-T= %s",\
		x/**/S, x/**/T)
	debug(1,2)

and obtained

	printf("x-1=%d, x-2= %s", x1, x2)

This, at least, comes close. But trial-and-error is hardly the way to
define the behavior of a macro processor.

This lengthy note at best conveys only the flavor of MANY hours of vehement
confrontation in the committee. Despite the views expressed on the net in
recent months, we are very "savvy" about the preprocessor, indeed about
many preprocessors. I hope I have encouraged further discussion of these
issues, but on a more enlightened level.

Larry Rosler, {allegra,ihnp4}!attunix!lr

------------------------------

End of mod.std.c Digest - Thu, 7 Mar 85 08:22:20 EST
******************************
USENET -> posting only through cbosgd!std-c.
ARPA -> ... through cbosgd!std-c@BERKELEY.ARPA (NOT to INFO-C)
In all cases, you may also reply to the author(s) above.
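
For comparison with the Reiser-style macros in the note above, here is a
minimal sketch of the same debug macro written with the stringizing (#) and
token-pasting (##) operators as they were eventually standardized in ANSI C.
It is not taken from the note, and the exact syntax of the 1985 draft may
have differed. The extra buf and n parameters, the use of snprintf (a C99
function), and the name demo_debug are assumptions added only so that the
result can be inspected.

```c
#include <stdio.h>
#include <string.h>

/*
 * Sketch only: "debug" follows the shape of the macro in the note, but
 * buf and n are hypothetical extra parameters so the text goes into a
 * buffer instead of stdout.
 *
 *   #s     turns the argument s into a string literal ("1")
 *   x##s   pastes x and the argument into one identifier (x1)
 *
 * Adjacent string literals are concatenated by the compiler, so the
 * format becomes "x1= %d, x2= %s". The "%s" inside the literal is NOT
 * rescanned, even though a parameter is named s.
 */
#define debug(buf, n, s, t) \
    snprintf((buf), (n), "x" #s "= %d, x" #t "= %s", x##s, x##t)

/* Hypothetical driver: formats x1 and x2 into buf and returns the
 * value snprintf returns (the number of characters it would write). */
int demo_debug(char *buf, size_t n)
{
    int x1 = 42;             /* named by x ## 1 */
    const char *x2 = "two";  /* named by x ## 2 */

    return debug(buf, n, 1, 2);
}
```

Called with a 64-byte buffer, demo_debug leaves the text "x1= 42, x2= two"
in it. Unlike the first Reiser-style attempt, the parameter names s and t
cannot disturb the %d and %s conversions, because nothing inside a string
literal is examined for parameters.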