[net.lang.c] Is this correct action for the c compiler/preprocessor ??

mike@WISDOM.BITNET (Mike Trachtman) (10/30/85)

Consider the following short program.

#define BAD_SEGMENT     29
#define ERROR(number)   printf("Error number %d\n",number);

main() {
        /* typically there would be some condition test here:
        if (something)
        */

        ERROR(BAD_SEGMENT);
}

what should this program do ????


On the compilers I have tried (VAX 4.2, VMS C, Sun 2.0),
the output is:

Error BAD_SEGMENT 29

rather than

Error number 29

which I would have expected.

Is this correct ???

I would think that anything inside double quotes is protected
from any/all substitution, and that the preprocessor would
not do the parameter replacement.

Mike

                                Mike Trachtman
My address:

        mike@wisdom                             (BITNET)
        mike%wisdom.bitnet@wiscvm.ARPA          (ARPA/CSNET)
        mike%wisdom.bitnet@berkley              (ARPA/CSNET)
and if all else fails (ONLY for VERY short items)
        ...!decvax!humus!wisdom!mike            (UUCP)

dfuller.wbst@Xerox.ARPA (Dave) (10/31/85)

It also fails to work correctly on the C compiler supplied with Un*x
System V for the AT&T Un*x PC, but it does work with the optional LPI C
compiler. But don't ask me why.

< Dave >

arnold@ucsfcgl.UUCP (Ken Arnold%CGL) (11/02/85)

In article <2667@brl-tgr.ARPA> dfuller.wbst@Xerox.ARPA (Dave) writes:
>It also fails to work correctly on the C compiler supplied with Un*x
>system V for the AT&T Un*x PC but it does work with the optional LPI C
>compiler; But don't ask me why?
>
>< Dave >

It does not "fail to work correctly" -- that's how it works.  It
allows you to say something like

	# define	Pval(var)	printf("var = %d\n", var)

	...
	Pval(Count);
	Pval(Errors);
	...

and get something like

	Count = 112
	Errors = 0

Being able to insert literal text in strings is very useful.  Even the
standards committee agreed, although they refused to continue to use
this method, since some preprocessors do *not* scan strings.  But
scanning strings for replacements is a feature, albeit a somewhat
unpopular one, which is purposefully used by many programs.

		Ken Arnold

P.S.  This was discussed at some length when several of us argued about
whether the standards committee was right to reject this method.  I
thought not, but others thought they were, and I don't want to start it
up again.

mouse@mcgill-vision.UUCP (der Mouse) (11/03/85)

> Consider the following short program.

> #define BAD_SEGMENT     29
> #define ERROR(number)   printf("Error number %d\n",number);
> main() {
>        ERROR(BAD_SEGMENT);
> }

> what should this program do ????

     This  is open  to interpretation.  All current preprocessors I know
of substitute  "number"  inside the double  quotes as  well.  Of course,
this is not serious; merely rewrite ERROR as

#define ERROR(n) printf("Error number %d\n",n);

     PS.  There really should be no  semicolon after the definition;  as
it stands the result of expanding

 ERROR(BAD_SEGMENT);

is

  printf("Error BAD_SEGMENT %d\n",BAD_SEGMENT);;

which makes a difference  if it  is the then clause of  an  if statement
without  braces (yes, Virginia,  there are people who write that sort of
statement).

     This seems to be a  common  stumbling  block.   There  seems to  be
confusion  between   *expanding  macros*  inside  a  quoted  string  and
*replacing  macro  formals*  inside  a  quoted  string.    The  compiler
designers originally  chose (perhaps not consciously)  to replace formals
inside strings, which is in fact useful for debugging:

#define LOGINTEGER(i) printf("i = %d\n",i)

which not  only prints out the value desired but  prints  the expression
producing it, as in

 LOGINTEGER(table[index]);

which expands into

  printf("table[index] = %d\n",table[index]);

a nice feature.  Of course, the statement

 printf("LOGINTEGER(foo)\n");

will not be touched.  Macros are not expanded  inside double quotes; but
inside double quotes in the replacement string, formals  get changed  to
actuals.  This actually has more frequent use in a definition like

#define CTRLCHAR(c) ('c'&0x1f)

(used on  ASCII machines).   Here, it is necessary that c be substituted
inside the (single) quotes,  so  for uniformity if nothing else the same
should happen with double quotes.

> Is this correct ???

     Apparently it is de-facto correct.  Anyone with  a copy of the ANSI
C standard want to answer this?
-- 
					der Mouse

{ihnp4,decvax,akgua,etc}!utcsri!mcgill-vision!mouse
philabs!micomvax!musocs!mcgill-vision!mouse

Hacker: One responsible for destroying /
Wizard: One responsible for recovering it afterward

rcd@opus.UUCP (Dick Dunn) (11/04/85)

The question was whether the C preprocessor should substitute for an
occurrence of a macro formal within a string within the body of the
macro...
> >It also fails to work correctly on the C compiler supplied with Un*x
> >system V for the AT&T Un*x PC but it does work with the optional LPI C
> >compiler; But don't ask me why?
>...
> It does not "fail to work correctly" -- that's how it works.  It
> allows you to say something like
> 
> 	# define	Pval(var)	printf("var = %d\n", var)
> 	...
> 	Pval(Count);
> 	Pval(Errors);
> ...
> Being able to insert literal text in strings is very useful.

The fact that a feature is "useful" is not sufficient argument that it is
correct.

Ken Arnold (>) continues with a discussion about what happened in the
standards committee--apparently they found it useful but didn't accept it.
Leaving aside what happened there, and leaving aside the usefulness of the
feature, the problem stems from the fact that the definition that most of
us use these days (K&R) says one thing:
	Text inside a string or a character constant is not subject to
	replacement.
...which is pretty explicit, but the compiler that a lot of us use
substitutes inside strings.  I would like to have an authoritative
definition and a correct compiler in accord with the definition.  Lacking
this, let's not throw too many stones; the compiler may not be wrong, but
it's not clearly right.

I'd like to know how the discrepancy came about--anyone care to fill me in
(by email, preferably).
-- 
Dick Dunn	{hao,ucbvax,allegra}!nbires!rcd		(303)444-5710 x3086
   ...Never attribute to malice what can be adequately explained by stupidity.

henry@utzoo.UUCP (Henry Spencer) (11/05/85)

> I would think that anything inside double quotes is protected
> from any/all substitution, and that the preprocessor would
> not do the parameter replacement.

This is an ill-documented quirk of the Unix C preprocessor.  It is not
portable because many other C compilers don't do it, and X3J11 has
decided to provide the capability but with a different syntax.  Until
that happy day when a randomly-chosen C compiler has a high probability
of conforming to the ANSI soon-to-be-standard, the only safe thing to do
is to avoid writing macros in which something that looks like one of the
parameter names appears inside a string.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

sra@oddjob.UUCP (Scott R. Anderson) (11/05/85)

In article <326@mcgill-vision.UUCP> mouse@mcgill-vision.UUCP (der Mouse) writes:
>> #define BAD_SEGMENT     29
>> #define ERROR(number)   printf("Error number %d\n",number);
>> main() {
>>        ERROR(BAD_SEGMENT);
>> }
>
>All current preprocessors I know
>of substitute  "number"  inside the double  quotes as  well.  Of course,
>this is not serious; merely rewrite ERROR as
>
>#define ERROR(n) printf("Error number %d\n",n);

Maybe you want a formal argument besides "n"?  This expands into

       printf("Error number %d\BAD_SEGMENT",    29);;	(:-)
-- 

					Scott Anderson
					ihnp4!oddjob!kaos!sra

bc@cyb-eng.UUCP (Bill Crews) (11/07/85)

> The question was whether the C preprocessor should substitute for an
> occurrence of a macro formal within a string within the body of the
> macro...
>
> > Being able to insert literal text in strings is very useful.
> 
> The fact that a feature is "useful" is not sufficient argument that it is
> correct.
> 
>                                                the definition that most of
> us use these days (K&R) says one thing:
> 	Text inside a string or a character constant is not subject to
> 	replacement.
> ...which is pretty explicit, but the compiler that a lot of us use
> substitutes inside strings.  I would like to have an authoritative
> definition and a correct compiler in accord with the definition.
> -- 
> Dick Dunn

As I read K&R, one could define the following:

#define	COMMENT	/*
#define	TNEMMOC	*/

COMMENT
	This tells all about my great program.
TNEMMOC

Not that I would particularly WANT to, but it DOES seem like C compilers
should allow this kind of thing if K&R allows it.  It is interesting that
this kind of thing never came up as a problem so long as de facto standard
Unix compilers were used.  Only when a lot of people started writing compilers
to the K&R specification was it discovered (or considered significant) that
Unix compilers didn't always conform either.  My guess is that the weight of
experience with Unix compilers is greater than K&R's these days.
-- 
	- bc -

..!{seismo,topaz,gatech,nbires,ihnp4}!ut-sally!cyb-eng!bc  (512) 835-2266

lr@sftig.UUCP (L.Rosler) (11/16/85)

> The question was whether the C preprocessor should substitute for an
> occurrence of a macro formal within a string within the body of the
> macro...
>
> > Being able to insert literal text in strings is very useful.
> 
> The fact that a feature is "useful" is not sufficient argument that it is
> correct.
> 
>                                                the definition that most of
> us use these days (K&R) says one thing:
> 	Text inside a string or a character constant is not subject to
> 	replacement.
> ...which is pretty explicit, but the compiler that a lot of us use
> substitutes inside strings.  I would like to have an authoritative
> definition and a correct compiler in accord with the definition.
> -- 
> Dick Dunn

Having been involved in many aspects of this fiasco, I'll give a
capsule history.

The original C preprocessor, designed and implemented by Mike
Lesk of AT&T Bell Labs for the PDP-11, did not substitute inside strings
(hence, the disclaimer in K&R).

The preprocessor distributed with VAX UN*X, hence picked up by
UCBerkeley, was implemented by John Reiser.  In addition to being
much faster than the original, it included many "features"
which were documented only in a file /usr/src/cmd/cpp/README,
dated August 25, 1978 (after the publication of K&R).
The file is still there, though updated -- look and see!

Among the features included without a great deal of review
were the "magic disappearing comment" used to glue tokens
together (despite K&R p. 179 "...comments...serve to separate tokens")
and the issue at hand of substituting within strings
(and character constants, for that matter, though no one seems
to pay much attention to this part of the issue).  The only
justification for the latter seems to be K&R p. 207:
"Each occurrence of an identifier mentioned in the formal
parameter list of the definition is replaced by the corresponding
token string from the call."

When I championed these features before the ANSI X3J11 C Committee
(most of whom had implemented a preprocessor according to the K&R
description, not the UN*X code), I first had to convince the
Committee that they were useful.  Several UN*X headers and
Alan Feuer's "The C Puzzle Book" helped here.

But I could not convince the Committee that the way the
features were implemented was acceptable, despite the tons of
code that incorporated them.  Reliance on undocumented
(what README file?!?) capabilities of a particular implementation
which contravened the clear sense of the de facto standard did
not fall under the purview of the Committee's goal of not
breaking existing "valid" code.

Several syntaxes were proposed, some of which were as simple
to implement as a new directive "#defines," meaning in THIS
macro, substitute for identifiers inside strings.
But they all foundered on the simple point that there ARE
no identifiers inside strings!  Strings and identifiers are
each "tokens," and writing a grammar to parse strings into
tokens was considered too outrageous.

(Note that "tokens" can turn up in surprising places:

#define PRINT(s) printf("%s", s)

produces remarkable results on UN*X compilers.)

So the Committee resorted to invention: # identifier
meaning "stringize" the argument token-string substituted for
the identifier; and token1 ## token2
meaning concatenate the two tokens nearest the ## after
all other substitutions.  The latter will be easy to substitute
mechanically for /**/, but the former will require some work.
Each of them has some advantages over the UN*X way,
not the least of which is that they don't do violence to
the rest of the language.

Even though I'm not happy with the idea of standards
committees inventing solutions that invalidate existing
solutions, I buy into this case.  As Henry Spencer warns,
don't use the UN*X features, and wait for the ANSI Standard
to provide better ways.

Sorry to be so long-winded, but this history HAD to be told.

Larry Rosler, AT&T Information Systems
(Editor, ANSI X3J11 C Standards Committee)
ihnp4!attunix!lr, 201-522-5086