[mod.std.c] mod.std.c Digest Volume 4 : Issue 6

osd7@homxa.UUCP (Orlando Sotomayor-Diaz) (03/07/85)

From: Orlando Sotomayor-Diaz (The Moderator) <cbosgd!std-c>


mod.std.c Digest            Thu,  7 Mar 85       Volume 4 : Issue   6 

Today's Topics:
                         preprocessor issues
----------------------------------------------------------------------

Date: 
From: attunix!lr (Larry Rosler)
Subject: preprocessor issues
To: houxm!homxa!osd7

As a member of the AT&T organization that supports the UNIX System V cpp,
I sympathize with the position represented by Ken Arnold's submissions,
and argued as strongly as I could for scanning strings for formal
parameters.  However, as chairman of the Language Subcommittee
responsible for the preprocessor portion of the draft, I also must
try to justify the position of the majority of the committee on this
matter.

It is true that "... many ... systems have scanned
strings in macros for parameter substitution since parameter
macros were invented...," but this is in direct contravention
of the C Reference Manual, p. 207: "Text inside a string or a
character constant is not subject to replacement."

The accurate history is that in 1978 John Reiser rewrote the original
cpp (by Mike Lesk) for efficiency, and threw in several dozen "features"
along the way.  These were NEVER documented.  (The ANSI Committee does
not consider /usr/src/cmd/cpp/README to be documentation!)  Most
implementors of C used the description in K&R as the specification,
omitting all of these undocumented features.  They have been persuaded
by me and the other UNIX-related representatives to accept those
features that were theoretically sound, and even some obvious
enhancements (such as #elif).

Two "features" have been the subject of many days of heated debate
because of theoretical deficiencies.  The concatenation of two tokens
into one by a "disappearing comment" grossly violates the explicit
statement in K&R that comments serve as "white space," hence separate
tokens.  Furthermore, programs that use that misfeature fail when
linted, because lint uses comments as pragmas and therefore runs cpp
with the -C flag, which keeps comments in its output; the empty comment
is then not removed, and the two tokens stay separate.  (Yes, I know
lint could be hacked to strip empty comments, but it wasn't.)  So
first the committee had to be convinced that
concatenating tokens was worthwhile, and then an acceptable method
had to be INVENTED.
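In the Standard as finally adopted (ANSI X3.159-1989), the invented method is the ## operator.  The following is a minimal sketch in Standard C that contrasts it with the empty-comment trick; STR, XSTR, OLD_PASTE, PASTE, pasted_value, check_pasting and the variable x1 are hypothetical names chosen only for illustration.  Stringizing each expansion shows the tokens it produced: a conforming preprocessor replaces the empty comment with a space, so OLD_PASTE yields two tokens where PASTE yields one.

```c
#include <assert.h>
#include <string.h>

/* STR makes a string literal from its argument; XSTR expands any macros
   in the argument first, so XSTR(m) shows the tokens that m produced. */
#define STR(s)  #s
#define XSTR(s) STR(s)

/* Reiser-cpp style: an empty comment between the operands.  A Standard C
   preprocessor replaces every comment with one space, so this yields the
   two tokens  x 1  rather than the single identifier x1. */
#define OLD_PASTE(a, b) a/**/b

/* Standard C: ## fuses its two operands into a single token. */
#define PASTE(a, b) a ## b

static int x1 = 42;  /* hypothetical variable named by PASTE(x, 1) */

int pasted_value(void)
{
    return PASTE(x, 1);  /* expands to the identifier x1 */
}

void check_pasting(void)
{
    assert(strcmp(XSTR(OLD_PASTE(x, 1)), "x 1") == 0);  /* two tokens */
    assert(strcmp(XSTR(PASTE(x, 1)), "x1") == 0);       /* one token */
    assert(pasted_value() == 42);
}
```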

The scanning of strings for embedded identifiers was something the
committee simply could not accept.  A string is a token, and what
is inside a string has no grammar from which an identifier can be
derived in any definable way.  Once again the committee had to be
convinced that substituting macro arguments in strings was worthwhile,
and then an acceptable method had to be INVENTED.
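The method the Standard eventually adopted is the # operator, which turns a macro argument into a string literal, used together with the concatenation of adjacent string literals.  A minimal sketch, assuming a C99 or later compiler (for snprintf); LABEL_OLD, LABEL_NEW, FORMAT_VAR, check_stringizing and the variable count are hypothetical names for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reiser-cpp style: expects the preprocessor to replace the v inside the
   string.  A Standard C preprocessor never looks inside string literals,
   so LABEL_OLD(count) is just "v". */
#define LABEL_OLD(v) "v"

/* Standard C: # turns the spelling of the argument into a string literal. */
#define LABEL_NEW(v) #v

/* Adjacent string literals are concatenated, so #v can be spliced into a
   longer format string. */
#define FORMAT_VAR(buf, size, v) snprintf((buf), (size), #v " = %d", (v))

void check_stringizing(void)
{
    int count = 3;  /* hypothetical variable for the demonstration */
    char buf[32];

    assert(strcmp(LABEL_OLD(count), "v") == 0);
    assert(strcmp(LABEL_NEW(count), "count") == 0);

    FORMAT_VAR(buf, sizeof buf, count);
    assert(strcmp(buf, "count = 3") == 0);
}
```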

I am as aware as anyone of the number of existing "working" programs
that will not be maximally portable according to the Standard.  But
these programs are not broken in their own environments, unless
implementors remove capabilities from their own preprocessors,
which is not anticipated.

Lest any reader get the misconception that the Reiser techniques
are *better* than those proposed by ANSI, consider the following,
which is the closest I could come to implementing the example in
the ANSI draft using the UNIX version of cpp:

	#define debug(s, t)	printf("x-s= %d, x-t= %s", \
					x/**/s, x/**/t)
	debug(1, 2)

which results in

	printf("x-1=%d, x- 2= %2", x1, x 2)

The surprises are fairly apparent.  Of course, on my second try
I renamed the parameters in upper case and dropped the space after
the comma in the call:

	#define debug(S, T)	printf("x-S= %d, x-T= %s",\
					x/**/S, x/**/T)
	debug(1,2)

and obtained

	printf("x-1=%d, x-2= %s", x1, x2)

This, at least, comes close.  But trial-and-error is hardly the
way to define the behavior of a macro processor.
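For comparison, here is a sketch of the same debug macro written with the Standard operators.  It follows the form of the example printed in the final C89 Standard, adapted to the "x-" labels used above; the text of the 1985 draft may differ, and DEBUG_FMT, show_debug, check_debug and the values of x1 and x2 are hypothetical.  No trial and error is needed: the parameters may keep lower-case names, and the space after the comma in debug(1, 2) does not leak into the result.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* #s and #t turn the arguments into string literals, and adjacent string
   literals merge into one format string.  The "%s" inside a literal is never
   scanned for the parameter s, and leading white space is not part of an
   argument, so debug(1, 2) and debug(1,2) behave the same. */
#define DEBUG_FMT(s, t) "x-" #s "= %d, x-" #t "= %s"

/* x ## s and x ## t paste x onto each argument, giving x1 and x2. */
#define debug(s, t) printf(DEBUG_FMT(s, t), x ## s, x ## t)

/* Hypothetical caller; x1 and x2 exist only for this illustration. */
void show_debug(void)
{
    int x1 = 7;
    const char *x2 = "two";

    debug(1, 2);  /* printf("x-1= %d, x-2= %s", x1, x2): prints x-1= 7, x-2= two */
    putchar('\n');
}

void check_debug(void)
{
    assert(strcmp(DEBUG_FMT(1, 2), "x-1= %d, x-2= %s") == 0);
    assert(strcmp(DEBUG_FMT(1, 2), DEBUG_FMT(1,2)) == 0);
}
```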

This lengthy note at best conveys only the flavor of MANY hours of
vehement confrontation in the committee.  Despite the views expressed
on the net in recent months, we are very "savvy" about the preprocessor,
indeed about many preprocessors.  I hope I have encouraged further
discussion of these issues, but on a more enlightened level.

Larry Rosler, {allegra,ihnp4}!attunix!lr

------------------------------

End of mod.std.c Digest - Thu,  7 Mar 85 08:22:20 EST
******************************
USENET -> posting only through cbosgd!std-c.
ARPA -> ... through cbosgd!std-c@BERKELEY.ARPA (NOT to INFO-C)
In all cases, you may also reply to the author(s) above.