[net.lang.c] ANSI C and the C Pre-Processor

hansen@pegasus.UUCP (09/06/84)

One of the minor(?) changes being proposed for the C standard is a change in
the C Pre-Processor to change it from a totally character oriented processor
closer to a token processor. It already does this to a certain degree by
recognizing comments as separate tokens that don't get scanned for text to
be replaced. However, this idea is being extended to include strings and
character constants as tokens that don't get scanned for replacement text.
The idea is to prevent bugs similar to the following:

#define foo(d,g)	printf("%d,%d", d, g)

This would expand 

	foo(f,e);

to

	printf("%f,%f", f, e);

and suddenly your integer variables f and e would be getting printed out in
%f format rather than %d format. Under the new rules, the expansion would be:

	printf("%d,%d", f, e);
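
A minimal sketch of the case above, assuming a compiler whose preprocessor
treats a string literal as a single token (the rule ANSI C eventually
adopted). The helper show_foo is hypothetical, and foo is changed to format
into a buffer with snprintf instead of calling printf, only so that the
result can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same shape as the macro above: the parameter d also appears inside the
   format string. buf must be an array, because sizeof is applied to it. */
#define foo(buf, d, g) snprintf(buf, sizeof buf, "%d,%d", d, g)

/* Hypothetical helper: call foo with variables named f and e. */
static void show_foo(void)
{
    char buf[32];
    int f = 3, e = 4;

    foo(buf, f, e);
    /* With token-based scanning the format is still "%d,%d". */
    assert(strcmp(buf, "3,4") == 0);
    puts(buf);
}
```

Calling show_foo() from main prints 3,4 on such a compiler, because the d
inside the quotes is never treated as the parameter d.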

This certainly solves the above problem. However, I have seen plenty of
programs which use some of the following constructs:

#define libpath(x)		"/usr/lib/x"
#define CTRL(x)			('x'&037)
#define PRINT1(format,arg)	printf("arg=%format.\n", arg);

A common place to find the libpath construct is in uparm.h used by (among
many others) the vi, curses, terminfo and termcap packages on both System Vr2
machines and 32V/BSD machines.

I don't know of any system code that depends on the CTRL example, but I know
of a number of people who have used it in the past.

Those of you who have read the C Puzzle Book will realize that NONE of Alan
Feuer's programs will work anymore!
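
What "stop working" means for the libpath and CTRL constructs can be
sketched as follows, assuming a preprocessor that leaves string and
character constants alone and an ASCII character set. The function
check_constructs is hypothetical and exists only to state the results.

```c
#include <assert.h>
#include <string.h>

/* The two macros as posted. Under token-based scanning the x inside the
   string and inside the character constant is not the parameter x. */
#define libpath(x) "/usr/lib/x"
#define CTRL(x)    ('x' & 037)

/* Hypothetical check of what the macros yield on such a compiler.
   Assumes ASCII, where 'x' is 0170 and 0170 & 037 is 030 (control-X). */
static void check_constructs(void)
{
    /* The argument is simply discarded. */
    assert(strcmp(libpath(termcap), "/usr/lib/x") == 0);
    /* Every use yields control-X, whatever the argument was. */
    assert(CTRL(a) == 030);
    assert(CTRL(c) == 030);
}
```

The code still compiles, which is part of the problem: the macros quietly
produce a different value instead of an error.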


The questions are: Should this change be endorsed? If so, what should be
done to bring back the lost functionality? If not, how would you make CPP
more regular in its scanning rather than that which is the de-facto standard
from Reiser?

One possibility would be to introduce a new construct #sdefine which has the
special property that strings and character constants would also be scanned
for replacement; otherwise it would be identical to #define. The programs
which use the above constructs would have to be changed to use #sdefine
instead of #define, but no other changes would have to be made.

Without something to replace the lost functionality, I feel that the number
of programs which would have to be changed would be major.

					Tony Hansen
					pegasus!hansen

hamilton@uiucuxc.UUCP (09/08/84)

uiucuxc!hamilton    Sep  7 21:03:00 1984

c'mon, is it all that hard to avoid using "d" as a formal parameter for
your "foo" macro?  why break so many working programs to "fix" such a
nonproblem?
	wayne ({decvax,ucbvax}!pur-ee!uiucdcs!uiucuxc!)hamilton

henry@utzoo.UUCP (Henry Spencer) (09/18/84)

> ............ However, this idea is being extended to include strings and
> character constants as tokens that don't get scanned for replacement text.

K+R, section 12.1:  "Text inside a string or a character constant is
not subject to replacement."  In other words, this is not something new:
the language has always been specified to behave that way.

> ........................................ However, I have seen plenty of
> programs which use some of the following constructs:
> 
> #define libpath(x)		"/usr/lib/x"
> #define CTRL(x)			('x'&037)
> #define PRINT1(format,arg)	printf("arg=%format.\n", arg);

Such programs are broken and unportable.  Most non-Unix C compilers have
been implemented "by the book", which means that none of the above things
will work on them unless the implementors had a lot of Unix experience
or had a Unix system to compare against.

> The questions are: Should this change be endorsed? If so, what should be
> done to bring back the lost functionality? If not, how would you make CPP
> more regular in its scanning rather than that which is the de-facto standard
> from Reiser?

Of course it should be endorsed, since it's not really a change at all.
The standard is the documentation, not Reiser's code.

As for what should be done to bring back the lost functionality...  the
ANSI C folks have basically said "if you want a general-purpose macro
processor, use m4".  The programs that this "change" will break are
broken already, and should be fixed to do it right.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

kpmartin@watmath.UUCP (Kevin Martin) (09/22/84)

>> ........................................ However, I have seen plenty of
>> programs which use some of the following constructs:
>> 
>> #define libpath(x)		"/usr/lib/x"
>> #define CTRL(x)			('x'&037)
>> #define PRINT1(format,arg)	printf("arg=%format.\n", arg);
>Such programs are broken and unportable.
>
>> The questions are: Should this change be endorsed? If so, what should be
>> done to bring back the lost functionality? If not, how would you make CPP
>> more regular in its scanning rather than that which is the de-facto standard
>> from Reiser?
>Of course it should be endorsed, since it's not really a change at all.
>				Henry Spencer @ U of Toronto Zoology


For a change, I disagree with Henry. However, there are two questions
here, and I am not sure everyone is making the distinction:
1) Should strings be scanned for token replacement (i.e. look for #define'd
   names and replace them with their expansion)? (I call this "token
   replacement" or "macro substitution")
2) When a #define'd token is being inserted, and its expansion contains
   a string, should that string be scanned for formal parameters to the
   macro? (I call this "parameter substitution")
It is fairly evident that the answer to (1) is NO. Otherwise, no string
would be safe. You couldn't have the name 'putc' in a string, for instance.
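
A small sketch of question (1) as ANSI-style preprocessors answer it. The
macro name LIMIT and both functions are made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define LIMIT 10   /* hypothetical macro name, for illustration only */

/* Returns a message that mentions the macro by name. */
static const char *limit_message(void)
{
    /* The literal is one token; LIMIT inside it is left alone. */
    return "LIMIT exceeded";
}

static void check_question_one(void)
{
    assert(strcmp(limit_message(), "LIMIT exceeded") == 0);
    assert(LIMIT == 10);   /* outside a string the name is replaced */
}
```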

I think the answer to (2) is YES. It is often useful to have the formal
parameters substituted into string or character constants, and it is not only
possible but EASY for the programmer to avoid using any formal parameters
which match tokens in any string in the expansion.
e.g. it is easy to avoid #define f(d,x) printf( "%d %d", d, x )

The borderline between (1) and (2) is the size of the area which must be
examined for conflicting identifiers. For (1), you must check every include
file (and the source file up to the occurrence of the string in question).
For (2), you only have to check that the formal parameters don't clash.
And correcting clashes is far easier for (2) than for (1).
                          Kevin Martin, UofW Software Development Group

joemu@tekecs.UUCP (Joe Mueller) (09/27/84)

> As for what should be done to bring back the lost functionality...  the
> ANSI C folks have basically said "if you want a general-purpose macro
> processor, use m4".  The programs that this "change" will break are
> broken already, and should be fixed to do it right.

As Henry stated, the X3J11 committee (ANSI C), felt that the preprocessor
was not intended to be a general purpose macro processor, BUT, we did
acknowledge that there was a large body of code that used these types
of "features". The committee is currently considering proposals for

a) token concatenation operations within the preprocessor. It will
   definitely NOT be startoftoken/**/argument. Currently it looks like
   the # will be used like this: startoftoken#argument. I don't believe
   we have definitely decided the syntax for the operation. I think that
   the committee did decide that the functionality was needed.

b) "stringizing" (I didn't make up this term, someone else did) arguments
   is also under consideration. One proposal is to do the substitution
   if the argument name is the only thing within the quotes. i.e.
   #define foo(bar) printf("bar")
   will expand bar within the quotes where
   #define foo(bar) printf("the argument was bar")
   will not expand bar.
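
For comparison, a sketch using the spellings that the published ANSI C
standard ended up with, which differ from the proposals described above:
## joins two tokens, and a # in front of a parameter turns the argument
into a string literal. The names PASTE, STR and counter1 are made up for
illustration.

```c
#include <assert.h>
#include <string.h>

/* ANSI C token pasting: the operands of ## are joined into one token. */
#define PASTE(start, arg) start##arg

/* ANSI C stringizing: #bar becomes a literal spelling the argument. */
#define STR(bar) #bar

static int counter1 = 42;   /* illustrative variable */

static void check_paste_and_stringize(void)
{
    /* PASTE(counter, 1) forms the single identifier counter1. */
    assert(PASTE(counter, 1) == 42);
    /* STR(counter1) forms the literal "counter1". */
    assert(strcmp(STR(counter1), "counter1") == 0);
}
```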

The committee is not as dogmatic as it sounds on the net. It is our
intention to produce a standard that will allow someone to do
serious work without resorting to non-portable extensions.

Please continue to discuss concerns about the developing standard
on the net. I know several committee members (including myself) read
it regularly.

If you have alternate proposals for the machinery to do the above
operations, let me know.

mwm@ea.UUCP (09/28/84)

/***** ea:net.lang.c / utzoo!henry /  4:20 am  Sep 25, 1984 */
As for what should be done to bring back the lost functionality...  the
ANSI C folks have basically said "if you want a general-purpose macro
processor, use m4".  The programs that this "change" will break are
broken already, and should be fixed to do it right.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry
/* ---------- */

Great. Another tool that's nearly vital for writing C, but not available
on most (all) non-Unix systems. Anybody got pointers to a public domain
m4?

	<mike

kre@mulga.OZ (Robert Elz) (09/29/84)

From Henry Spencer:
|
| > ............ However, this idea is being extended to include strings and
| > character constants as tokens that don't get scanned for replacement text.
| 
| K+R, section 12.1:  "Text inside a string or a character constant is
| not subject to replacement."  In other words, this is not something new:
| the language has always been specified to behave that way.

I think it instructional to consider the wording of the proposed
(draft) standard.  [This is from the July version, I doubt that it's
changed in the Sept one].

Sect 9.2: ..... Character constants and strings in the token sequence or
in the rest of the program are not scanned for defined identifiers or
formal parameters. ....

Now consider the wording in the April version (it was sect 9.1 then)

Sect 9.1: ..... Character strings in the token sequence or in the
rest of the program are not scanned for defined identifiers. ....

Note the difference.  K&R was never clear on this point - its
wording on this point (and others) was ambiguous.  That is,
a perfectly viable interpretation, taken by Reiser, was that
strings in the token sequence could be scanned for parameters.
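
The two readings can be sketched with a toy substitution routine. This is
only a model written for illustration, not Reiser's code or any real
preprocessor: the function substitute and its scan_strings flag are
hypothetical, it handles a single parameter, and it ignores character
constants and escaped quotes.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Can c appear in an identifier? */
static int is_ident(int c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Toy model: copy body to out, replacing each identifier equal to param
   with arg. With scan_strings == 0, text inside "..." is left alone;
   with 1 it is scanned as well.
   Returns 0 on success, -1 if out is too small. */
static int substitute(const char *body, const char *param, const char *arg,
                      int scan_strings, char *out, size_t outsize)
{
    size_t n = 0;
    int in_string = 0;

    if (outsize == 0)
        return -1;
    while (*body != '\0') {
        const char *piece = body;   /* start of the next piece */
        size_t len;

        if (is_ident(*body)) {
            while (is_ident(*body))
                body++;
        } else {
            if (*body == '"')
                in_string = !in_string;   /* toy: no \" handling */
            body++;
        }
        len = (size_t)(body - piece);

        if (is_ident(*piece) && len == strlen(param)
            && strncmp(piece, param, len) == 0
            && (scan_strings || !in_string)) {
            piece = arg;
            len = strlen(arg);
        }
        if (n + len + 1 > outsize)
            return -1;
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
    return 0;
}

/* Hypothetical check using a one-parameter macro body. */
static void check_both_readings(void)
{
    const char *body = "printf(\"%d\", d)";
    char out[64];

    substitute(body, "d", "f", 1, out, sizeof out);   /* strings scanned */
    assert(strcmp(out, "printf(\"%f\", f)") == 0);

    substitute(body, "d", "f", 0, out, sizeof out);   /* strings skipped */
    assert(strcmp(out, "printf(\"%d\", f)") == 0);
}
```

The only difference between the two calls is the flag, which is the whole
of the disagreement: whether text between the quotes counts as part of the
token sequence to be scanned for formal parameters.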

There are (as has been pointed out many times) many reasons
for allowing this.  The ONLY one for denying it, that I can
see, is that some people get confused (don't understand what's
happening).  The right way to solve that problem is to clearly
document what happens - no-one will have any problems with
it if it's made clear what will happen.

Henry continues:
| 
| > The questions are: Should this change be endorsed?
| 
| Of course it should be endorsed, since it's not really a change at all.
| The standard is the documentation, not Reiser's code.

The problem is that K&R is *not* a standard.  If it was, we wouldn't
need X3J11.  In the absence of a standard, and in the presence
of ambiguous documentation, the only place to look is in the
implementations.  Henry also stated (quote omitted) that most
non unix C compilers adopted the restrictive approach.  So,
now we have a conflict - no immediate practical reason (in terms
of broken code) for jumping one way or the other.  In short,
nearly the ideal situation for adopting the best solution.

If C were a language for amateur programmers, beginners, etc,
I would tend to favour the restricted approach.  But that's not
what C is.  It's a dangerous language, filled with dangerous
features.  It's for professionals.  We should adopt the most
useful approach - the one that gives the greatest power to
the programmer - that is clearly the liberal approach.

Pragmatically too, it will be much easier to convert programs
broken by this strategy (those in which macro replacement text
contains strings containing "accidental" references to parameters)
than those broken by the current draft proposed standard
(those that use replacement inside strings to good effect).
In the former case, all that needs to be done is to rename
the formal parameter.  In the latter, some whole new mechanism
needs to be devised - possibly requiring changes in the source.
I also suspect that fewer programs would be broken by the former.

Henry again:
| 
| As for what should be done to bring back the lost functionality...  the
| ANSI C folks have basically said "if you want a general-purpose macro
| processor, use m4".  The programs that this "change" will break are
| broken already, and should be fixed to do it right.

No-one is asking for a full blown macro processor, just that subset
that is really useful for C programs.  If the committee were to
take the "use m4" attitude, they would logically have to standardize
m4 as a (possibly optional) part of the C compiler.  Otherwise
all those programs that go to the trouble of adopting their
recommendation, and use m4, will stop being portable, which can
hardly be the aim.

Joe Mueller replied:
|
| As Henry stated, the X3J11 committee (ANSI C), felt that the preprocessor
| was not intended to be a general purpose macro processor, BUT, we did
| acknowledge that there was a large body of code that used these types
| of "features". The committee is currently considering proposals for
| 
| a) token concatenation operations within the preprocessor. It will
|    definitely NOT be startoftoken/**/argument. Currently it looks like
|    the # will be used like this: startoftoken#argument. I don't believe
|    we have definitely decided the syntax for the operation. I think that
|    the committee did decide that the functionality was needed.

I agree that this is needed - while I regret the need to alter
some of my source (I am a xxx/**/yyy user) I admit that this
is a revolting way of forming tokens, something better, anything
better, would be welcome.  [No, please don't tell me about your
favourite revolting way of avoiding xxx/**/yyy, I've seen most
of them, none of the existing ones is clearly better.]
The '#' operator proposal looks reasonable to me.  When you're
considering this, please also remember to do something about
the problems of blanks in the actual parameter strings - are
they significant, or not?  That is spaces between the preceding
comma or '(' and the start of the replacement text, and blanks after
the text before the ')' or next comma.  I would prefer that the
standard make it clear that these should not be included as
part of the replacement text.
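
As a point of reference, the published ANSI C standard did settle this
question, and the effect can be observed through its # (stringizing)
operator. The sketch below assumes a conforming compiler; STR and
check_argument_blanks are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define STR(arg) #arg   /* hypothetical name for a stringizing macro */

static void check_argument_blanks(void)
{
    /* Blanks after '(' or ',' and before ',' or ')' are dropped. */
    assert(strcmp(STR(   abc   ), "abc") == 0);
    /* Runs of blanks between tokens collapse to one space. */
    assert(strcmp(STR(a    +    b), "a + b") == 0);
}
```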

Joe:
|
| b) "stringizing" (I didn't make up this term, someone else did) arguments
|    is also under consideration. One proposal is to do the substitution
|    if the argument name is the only thing within the quotes. i.e.
|    #define foo(bar) printf("bar")
|    will expand bar within the quotes where
|    #define foo(bar) printf("the argument was bar")
|    will not expand bar.

Ugh!  How could you justify that!  I appreciate, that combined with
constant string concatenation, it would give all the functionality
that is needed - the second example could be rephrased as:
	#define foo(bar) printf("the argument was ""bar")
but that's going to be a nasty distinction to try to explain
to anyone.  And that would break ALL existing implementations.

Seems to me that in this case, adopting the Reiser interpretation
is the better thing to do.  Document it clearly, so people aren't
trapped, and that should end the problems.

Robert Elz				decvax!mulga!kre

henry@utzoo.UUCP (Henry Spencer) (09/30/84)

>   ............................ One proposal is to do the substitution
>   if the argument name is the only thing within the quotes. i.e.
>   #define foo(bar) printf("bar")
>   will expand bar within the quotes where
>   #define foo(bar) printf("the argument was bar")
>   will not expand bar.

Some folks may not understand why the approach Joe describes is a full
solution to "stringizing".  Remember that the draft standard specifies
that consecutive string constants are concatenated at compile time, so
you could say something like

	#define foo(bar) printf("the argument was " "bar")

to get the effect of substitution within a string.
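
The same idea can be tried on a compiler that follows the published ANSI C
standard, with one difference from the proposal discussed here: the
standard spells the stringized argument as #bar rather than as a quoted
"bar". The sketch below is an illustration under that assumption; describe
and check_concatenation are hypothetical names, and PRINT1 is changed to
format into a buffer so that the result can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Adjacent string literals are concatenated by the compiler, so the
   stringized argument is glued onto the fixed text. */
#define libpath(x)    "/usr/lib/" #x
#define describe(bar) "the argument was " #bar

/* PRINT1 rebuilt the same way; buf must be an array, because sizeof
   is applied to it. */
#define PRINT1(buf, format, arg) \
    snprintf(buf, sizeof buf, #arg "=%" #format ".\n", arg)

static void check_concatenation(void)
{
    char buf[32];
    int count = 7;

    assert(strcmp(libpath(terminfo), "/usr/lib/terminfo") == 0);
    assert(strcmp(describe(x + 1), "the argument was x + 1") == 0);

    PRINT1(buf, d, count);          /* format becomes "count=%d.\n" */
    assert(strcmp(buf, "count=7.\n") == 0);
}
```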
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (09/30/84)

Hurray for Robert Elz!

henry@utzoo.UUCP (Henry Spencer) (10/02/84)

> ....................  K&R was never clear on this point - its
> wording on this point (and others) was ambiguous.  That is,
> a perfectly viable interpretation, taken by Reiser, was that
> strings in the token sequence could be scanned for parameters.

Point taken, K&R was never entirely clear on this, but it *sounded*
clear enough that a lot of people just assumed "no substitution
inside strings".  Including a lot of implementors.

> The problem is that K&R is *not* a standard.  If it was, we wouldn't
> need X3J11...

On the contrary, ask most C compiler implementors outside Bell and they
will tell you that K&R is the standard they worked from.  It is true
that K&R is not precise enough or complete enough to be an ANSI-quality
standard, but anyone who denies that K&R has been a *de facto* standard
for quite some time is kidding himself.  A poor one, yes, but a standard.

> No-one is asking for a full blown macro processor, just that subset
> that is really useful for C programs...

The trouble is that "really useful" is a subjective judgement.  My
personal view is that both token concatenation and in-string substitution
are useless junk.  I am quite aware that other people feel otherwise.
Anyone who has looked at the implementation of the S statistics language
has some idea of just how far "really useful" can be pushed.  (Much of
the S stuff needs *several passes* through m4 before compilation!)  In
practice, one has to draw the line somewhere.  The question is not "is
this feature useful?" but "is this feature useful *enough* to force
*everyone* to implement it?".  Saying "if you want fancy stuff, use m4"
is not a cop-out, it is a statement that the committee is not going to
solve all the world's problems.

> Seems to me that in this case, adopting the Reiser interpretation
> is the better thing to do.  Document it clearly, so people aren't
> trapped, and that should end the problems.

My impression is that the committee's biggest problem with this is that
retrofitting the Reiser interpretation into existing compilers is not
necessarily easy.  A good many compilers do *not* do the preprocessing
as a separate text-manipulation step first; their "preprocessors" are
integrated into the scanner, or following it.  Pulling tokens apart
again into text, and then reassembling them, isn't trivial.  The current
committee notion (substitution only on whole strings) avoids much of the
complexity of this.

One can argue that issues of current implementation are not significant,
that the future users should be the primary consideration.  This ignores
a nasty pragmatic consideration:  if the standard is going to fly,
standard-conforming implementations are going to have to be common.  It
would really be nice if existing compilers didn't have to be rewritten
from scratch.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

henry@utzoo.UUCP (Henry Spencer) (10/02/84)

> Great. Another tool that's nearly vital for writing C, but not available
> on most (all) non-Unix systems. Anybody got pointers to a public domain
> m4?

Gee, I've never found any of the stuff under discussion (token concatenation
and substitution inside strings) either "nearly vital" or even particularly
useful.  The new string-constant-concatenation feature would answer the one
or two places where I've wanted to use such things.

My point is not that these features aren't useful in some sense -- it is
quite possible that I just haven't encountered the particular situations
where they are useful -- but that they are not, in any realistic sense,
"nearly vital" for writing C.  The existence of active C programmers who
have never used them and don't miss them is notable, as is the existence
and continuing use of C compilers that don't implement them.

I don't know of a public-domain m4, but there are public-domain macro
processors of other kinds.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

bsa@ncoast.UUCP (Brandon Allbery) (10/04/84)

> From: mwm@ea.UUCP

> /***** ea:net.lang.c / utzoo!henry /  4:20 am  Sep 25, 1984 */
> As for what should be done to bring back the lost functionality...  the
> ANSI C folks have basically said "if you want a general-purpose macro
> processor, use m4".  The programs that this "change" will break are
> broken already, and should be fixed to do it right.
> -- 
> 				Henry Spencer @ U of Toronto Zoology
> 				{allegra,ihnp4,linus,decvax}!utzoo!henry
> /* ---------- */
> 
> Great. Another tool that's nearly vital for writing C, but not available
> on most (all) non-Unix systems. Anybody got pointers to a public domain
> m4?
> 
> 	<mike

Anybody got pointers to a sane ANSI committee?  We just got a C compiler
on CSUOHIO.BITNET (VM/370) and I intend to port quite a few of my compatible
(i.e. not based on Unix peculiarities) programs.  If the ANSI committee
thinks I'm going to use m4 on Unix and lose ALL portability, they've another
think coming.

--bsa

henry@utzoo.UUCP (Henry Spencer) (10/07/84)

>> Great. Another tool that's nearly vital for writing C, but not available
>> on most (all) non-Unix systems. Anybody got pointers to a public domain
>> m4?
>> 
>> 	<mike
>
>Anybody got pointers to a sane ANSI committee?  We just got a C compiler
>on CSUOHIO.BITNET (VM/370) and I intend to port quite a few of my compatible
>(i.e. not based on Unix peculiarities) programs.  If the ANSI committee
>thinks I'm going to use m4 on Unix and lose ALL portability, they've another
>think coming.

My personal impression is that the committee is saner than most of the
people flaming on this issue.  If they say "if you want a general-purpose
macro processor, use m4", all this means is that they are not able to solve
all the world's problems.  At some point, it is necessary to give up and
say "the tool we are trying to settle on is not powerful enough to solve
your problem".  Otherwise they never produce a standard, since the number
and complexity of problems that people would *like* their tool to solve
tends to grow without bound.

The committee, as nearly as I can see, is *not* crazy and is quite concerned
about portability.  They have simply judged that the problems that are under
discussion are (a) sufficiently uncommon, (b) sufficiently ill-understood,
and (c) sufficiently difficult, that attempting to solve them in the C
standard is inappropriate.  I agree.

Bear in mind that we do not **WANT** a C standard committee that is bent
on solving every possible problem.  The result would look nothing like C.
This has happened to other languages; ever looked at some of the recent
output from the ANSI BASIC effort?  If you are a serious C user, it is
appropriate for you to thank whatever gods you believe in that the ANSI
C committee hasn't gone that way.
-- 
	"If you ask for the moon, you may get the shaft instead."

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry