[comp.lang.c] how widespread is this cpp bug?

markhall@pyramid.pyramid.com (Mark Hall) (12/01/88)

The following code compiles and runs on pyramid, att-3b2, and sun3:

	#include <stdio.h>
	main() 
	{
		prin/* comment in the middle */tf ( "Hello World.\n" );
	}

But, according to K&R pg. 179:

	``... comments [...] as described below
	are ignored except as they serve to separate tokens.''

So the above program is actually in error, as `prin' and `tf' should
be two separate tokens.  I was going to fix our cpp until I realized 
how pervasive the bug is.   Looking into the June draft of the standard
I see that they have addressed this problem explicitly, and mention that
all comments are to be replaced with a single space.  I hear that
this `feature' is used for gluing together tokens, as in:

#define VERSION 2
main() {
	proc/**/VERSION( a,b,c );
}

which, given the buggy cpp, will produce:

main() {
	proc2( a,b,c );
}

Does your cpp have this `feature'?  Anyone know the history?  I
suspect that AT&T and SUN know about this, but have chosen not to fix it.
Anyone know why?

-Mark Hall (smart mailer): markhall@pyramid.pyramid.com
	   (uucp paths): {ames|decwrl|sun|seismo}!pyramid!markhall

kchen@Apple.COM (Kok Chen) (12/01/88)

In article <49179@pyramid.pyramid.com> markhall@pyramid.UUCP (Mark Hall) writes:
>The following code compiles and runs on pyramid, att-3b2, and sun3:
>
>	#include <stdio.h>
>	main() 
>	{
>		prin/* comment in the middle */tf ( "Hello World.\n" );
>	}
> ...
>
>Does your cpp have this `feature'?  Anyone know the history?  I
>suspect that AT&T and SUN know about this, but have chosen not to fix it.
>Anyone know why?
>

Boy, have I seen this "feature" abused! (Greenhills 68k accepts it, the 
last I looked.)  The worst abuse was of the form:

#define	FOO( x,y )    foo/**/x( y )

main()
{
    FOO( bar, 1 ) ;
    FOO( baz, 2 ) ;
}

foobar( z )
{
}

foobaz( z )
{
}

Readability?  Half of the folks who encountered that segment of code
asked what it did.  When they found out, they questioned the sanity of
the author.


Kok Chen			{decwrl,sun}!kchen
Apple Computer, Inc. 

gandalf@csli.STANFORD.EDU (Juergen Wagner) (12/01/88)

Sun4 (SunOS 4.0):		bug!
HP 9000/320 (HP-UX 6.0):	bug!
VAX 8700 (Ultrix):		bug!

So I tried one of our old TOPS-20 machines, which happily reported
    Error at main+4, line 6 of x.c:
      prin/*comment*/tf(
    Undefined symbol: "prin"

    Error at main+4, line 6 of x.c:
      prin/*comment*/tf(
    Expected token (semicolon) not found

    Error at main+6, line 8 of x.c:
      proc/**/VERSION
    Undefined symbol: "proc"

    Error at main+6, line 8 of x.c:
      proc/**/VERSION
    Expected token (semicolon) not found
    ?4 error(s) detected

People relying on this bug should change their habits. There are better ways
to concatenate tokens.

-- 
Juergen Wagner		   			gandalf@csli.stanford.edu
						 wagner@arisia.xerox.com

kchen@Apple.COM (Kok Chen) (12/02/88)

In <6625@csli.STANFORD.EDU> wagner@arisia.xerox.com (Juergen Wagner) writes:
>Sun4 (SunOS 4.0):		bug!
>HP 9000/320 (HP-UX 6.0):	bug!
>VAX 8700 (Ultrix):		bug!
>
>So I tried one of our old TOPS-20 machines, which happily reported
>    Error at main+4, line 6 of x.c:
>      prin/*comment*/tf(
>    Undefined symbol: "prin"
>...

It should be pointed out that there are at least two C compilers in common
use on TOPS-20.  One is the Utah port of pcc that probably propagated this
"bug."  The other one started out as a "home grown" at Stanford (nowadays
called "kcc") and was not based on pcc but simply on what was gleaned
off of K&R 1st Ed.  This latter was probably the one reported above (the 
format of the error messages looked familiar...).

The Stanford one did not even have a separate preprocessor (cpp) phase.  A 
single symbol table manager handled everything - macros, reserved words, 
identifiers, types, etc.  That is probably what caused it to reject the 
"prin/**/tf" construct (i.e., the lexical analyser stopped scanning 
"prin/**/tf" when it saw the first slash, returning the lexeme "prin").
I know for a fact :-) that nothing special was done to purposely reject the
"prin/**/tf" hack.

(The original version did, however, *purposely disallow* goto's! :-) :-)


Kok Chen				{decwrl,sun}!apple!kchen
Apple Computer, Inc.

gwyn@smoke.BRL.MIL (Doug Gwyn ) (12/02/88)

In article <6625@csli.STANFORD.EDU> wagner@arisia.xerox.com (Juergen Wagner) writes:
>      proc/**/VERSION
>People relying on this bug should change their habits. There are better ways
>to concatenate tokens.

No, for Reiser-based preprocessors there aren't any better ways.
ANSI-style token pasting is fairly new, and many C implementations
in current use do not support it.

henry@utzoo.uucp (Henry Spencer) (12/02/88)

In article <49179@pyramid.pyramid.com> markhall@pyramid.UUCP (Mark Hall) writes:
>		prin/* comment in the middle */tf ( "Hello World.\n" );
>
>...Does your cpp have this `feature'?  Anyone know the history?  I
>suspect that AT&T and SUN know about this, but have chosen not to fix it.
>Anyone know why?

This token-concatenation technique is a quirk (quirk, n:  accidental and
unintended behavior that is not clearly a bug and may be useful) of the
Reiser cpp implementation, which is universal in AT&T-derived compilers
and virtually nonexistent elsewhere.  It has seen enough use to make
folks who already have it reluctant to drop support for it, but it is
a quirk of specific compilers and was never documented as a property of
the language.

X3J11 has provided the same capability in a cleaner and more portable way
(the Reiser trick does not work in tokenizing preprocessors) with their ##
operator.  Ugh.
-- 
SunOSish, adj:  requiring      |     Henry Spencer at U of Toronto Zoology
32-bit bug numbers.            | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

khera@romeo.cs.duke.edu (Vick Khera) (12/02/88)

In article <49179@pyramid.pyramid.com> markhall@pyramid.UUCP (Mark Hall) writes:
>The following code compiles and runs on pyramid, att-3b2, and sun3:
> ...
>But, according to K&R pg. 179:
>
>	``... comments [...] as described below
>	are ignored except as they serve to separate tokens.''
>
> ...
>I hear that
>this `feature' is used for gluing together tokens, as in:
>
>#define VERSION 2
>main() {
>	proc/**/VERSION( a,b,c );
>}
>
>-Mark Hall (smart mailer): markhall@pyramid.pyramid.com
>	   (uucp paths): {ames|decwrl|sun|seismo}!pyramid!markhall


I have used this ``feature'' to simplify having to write a bunch of
duplicate code with a macro. what i needed was a bunch of buttons that had
a particular label and when pressed, would call the function with a name
based on the button label. for example, the button labeled ``inc'' would
call inc_proc().  the comment is used to delimit the tokens as far as the
pre-processor is concerned, but when the compiler gets it, it needs to be
one token.  how else would this macro be constructed? 

excerpts from a sunview application:

-----

#define BUTTON_WIDTH 8	/* width for command buttons */
#define cmd_button(fnc) \
	panel_create_item(bs_panel, PANEL_BUTTON, \
		PANEL_LABEL_IMAGE,	panel_button_image(bs_panel, \
						"fnc",BUTTON_WIDTH,0), \
		PANEL_LABEL_BOLD,	TRUE, \
		PANEL_NOTIFY_PROC,	fnc/**/_proc, \
		0)


main(argc,argv)
int argc;
char *argv[];
{

[ bunches of window creating code deleted... ]

	cmd_button(ali);	/* create the actual buttons */
	cmd_button(comp);
	cmd_button(forw);
	cmd_button(inc);
	cmd_button(msgchk);
	cmd_button(next);
	cmd_button(prev);
	cmd_button(refile);
	cmd_button(repl);
	cmd_button(rmm);
	cmd_button(scan);
	cmd_button(show);
	cmd_button(sortm);
	cmd_button(folders);
	cmd_button(rne);	/* rmm;next */
	cmd_button(rpr);	/* rmm;prev */

[ bunches more code deleted. ]

}
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
ARPA:	khera@cs.duke.edu		Department of Computer Science
CSNET:	khera@duke        		Duke University
UUCP:	decvax!duke!khera		Durham, NC 27706

guy@auspex.UUCP (Guy Harris) (12/02/88)

>I hear that this `feature' is used for gluing together tokens...

Yes, it is.

>Does your cpp have this `feature'?

If your "cpp" is based on the "Reiser" "cpp", which first appeared
publicly in V7 and is the basis of the C preprocessor code in most UNIX
C implementations, it probably works that way.

>I suspect that AT&T and SUN know about this, but have chosen not to fix
>it.

Berkeley definitely knows about it, since they make use of it in some
places.  Sun knows about it as well, and makes use of it in places where
they inherited it from Berkeley.  AT&T may well know about it as well,
and plenty of other organizations probably do as well (I suspect most,
if not all, of the ones who started with BSD do).

>Anyone know why?

Probably because it's one of the more convenient ways to glue tokens
together if you don't have the dpANS "#" and "##" operators.  It, and
other non-dpANS-conformant Reiserisms, are unlikely to disappear until
dpANS-conformant compilers, or ANSI C compilers once the standard is
official, become more common; when they do, those Reiserisms will
disappear (except maybe in "compatibility mode"), precisely because they
will *be* non-conformant (barring major surprises in the evolution of
ANSI C).

gvb@tnoibbc.UUCP (Gerlach van Beinum) (12/02/88)

	One way we use this 'bug' is in Fortran callable C-programs.
	Most Fortran implementations in unix make external names
	of subroutines and functions by adding an '_' at the end.
	So if you want to write a Fortran callable subroutine foo
	you have to call it foo_(). You can use the bug in the
	following way:

	#define F77_TRAILER	_

	foo/**/F77_TRAILER()
	{
	....
	....
	}

	By changing the define F77_TRAILER you can remove the
	trailer from all the names.

		Gerlach van Beinum
		     TNO-IBBC
		   gvb@tnoibbc

daveh@marob.MASA.COM (Dave Hammond) (12/02/88)

In article <49179@pyramid.pyramid.com> markhall@pyramid.UUCP (Mark Hall) writes:
>The following code compiles and runs on pyramid, att-3b2, and sun3:
>
>	#include <stdio.h>
>	main() 
>	{
>		prin/* comment in the middle */tf ( "Hello World.\n" );
>	}
>
>But, according to K&R pg. 179:
>
>	``... comments [...] as described below
>	are ignored except as they serve to separate tokens.''
>
>So the above program is actually in error, as `prin' and `tf' should
>Does your cpp have this `feature'?

On Xenix 386 (SCO 2.3.1), cpp gets it right:

------------------------------ snip snip ------------------------------
#include <stdio.h>

main(argc, argv)
int argc; char *argv[];
{
prin/*comment*/tf ("hello, world\n");
}
------------------------------ snip snip ------------------------------

$ cc foo.c
foo.c
foo.c(7) : error 65: 'prin' : undefined
foo.c(7) : error 61: syntax error : identifier 'tf'
$

--
Dave Hammond
...!uunet!masa.com!{marob,dsix2}!daveh

cjc@ulysses.homer.nj.att.com (Chris Calabrese[mav]) (12/02/88)

In article <12967@duke.cs.duke.edu>, khera@romeo.cs.duke.edu (Vick Khera) writes:
| I have used this ``feature'' to simplify having to write a bunch of
| duplicate code with a macro. what i needed was a bunch of buttons that had
| a particular label and when pressed, would call the function with a name
| based on the button label. for example, the button labeled ``inc'' would
| call inc_proc().  the comment is used to delimit the tokens as far as the
| pre-processor is concerned, but when the compiler gets it, it needs to be
| one token.  how else would this macro be constructed? 
| 
| excerpts from a sunview application:
| 
| #define BUTTON_WIDTH 8	/* width for command buttons */
| #define cmd_button(fnc) \
| 	panel_create_item(bs_panel, PANEL_BUTTON, \
| 		PANEL_LABEL_IMAGE,	panel_button_image(bs_panel, \
| 						"fnc",BUTTON_WIDTH,0), \
| 		PANEL_LABEL_BOLD,	TRUE, \
| 		PANEL_NOTIFY_PROC,	fnc/**/_proc, \
| 		0)
| 
| main(argc,argv)
| int argc;
| char *argv[];
| {
| 
| [ bunches of window creating code deleted... ]
| 
| 	cmd_button(ali);	/* create the actual buttons */
| 	cmd_button(rne);	/* rmm;next */
| 	cmd_button(rpr);	/* rmm;prev */
| 
| [ bunches more code deleted. ]

You do this with pointers to functions of course.

void	cmd_button(char	*label, void	(*function)())
	{
	panel_create_item(bs_panel, PANEL_BUTTON,
		PANEL_LABEL_IMAGE,
		panel_button_image(bs_panel, label, BUTTON_WIDTH,0),
		PANEL_LABEL_BOLD,	TRUE,
		PANEL_NOTIFY_PROC,	function,
		0);
	}

main() {
	extern	void	ali_proc();
	...
	cmd_button("ali", ali_proc);
	...
	}

You could also do it as a macro, or you could also use
an optimizing compiler which will put the call to cmd_button
inline if you tell it to.
-- 
	Christopher J. Calabrese
	AT&T Bell Laboratories
	att!ulysses!cjc		cjc@ulysses.att.com

daveh@cbmvax.UUCP (Dave Haynie) (12/03/88)

in article <49179@pyramid.pyramid.com>, markhall@pyramid.pyramid.com (Mark Hall) says:
> Summary: ``whitespace separates tokens''

> 
> 	#include <stdio.h>
> 	main() {
> 		prin/* comment in the middle */tf ( "Hello World.\n" );
> 	}

[...]

> Does your cpp have this `feature'?  Anyone know the history?  I
> suspect that AT&T and SUN know about this, but have chosen not to fix it.

I found it to be present on two Amiga compilers, Manx V3.6a and Lattice
C++ V1.0 (which is based on AT&T's cfront V1.1a, though the cpp program
itself could be based on Lattice's or AT&T's, that's not made clear in
the documentation).

> -Mark Hall (smart mailer): markhall@pyramid.pyramid.com


-- 
Dave Haynie  "The 32 Bit Guy"     Commodore-Amiga  "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: D-DAVE H     BIX: hazy
              Amiga -- It's not just a job, it's an obsession

dg@lakart.UUCP (David Goodenough) (12/03/88)

From article <9026@smoke.BRL.MIL>, by gwyn@smoke.BRL.MIL (Doug Gwyn ):
> In article <6625@csli.STANFORD.EDU> wagner@arisia.xerox.com (Juergen Wagner) writes:
>>      proc/**/VERSION
>>People relying on this bug should change their habits. There are better ways
>>to concatenate tokens.
> 
> No, for Reiser-based preprocessors there aren't any better ways.
> ANSI-style token pasting is fairly new, and many C implementations
> in current use do not support it.

% cat snark.c
#define	proc()	proc

#define	grunt(baz)	proc()baz

main()
 {
    grunt(snarf);
 }

%cc -E snark.c
# 1 "snark.c"

main()
 {
    		procsnarf;
 }

%
Works for me.

Now, how it's used is open to discussion: I have seen it used when grunt()
becomes a procedure like macro that actually is a template for generating
a lot of copies of a structure.

Just out of idle curiosity, what is a Reiser-based preprocessor?
-- 
	dg@lakart.UUCP - David Goodenough		+---+
							| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@harvard.harvard.edu	  	  +---+

rick@kimbal.UUCP (Rick Kimball) (12/04/88)

From article <408@marob.MASA.COM>, by daveh@marob.MASA.COM (Dave Hammond):
> In article <49179@pyramid.pyramid.com> markhall@pyramid.UUCP (Mark Hall) writes:
>The following code compiles and runs on pyramid, att-3b2, and sun3:
>
>	#include <stdio.h>
>	main() 
>	{
>		prin/* comment in the middle */tf ( "Hello World.\n" );
>	}
>
On AT&T UNIX-PC SYSTEM V V3.51 the standard cpp gets it wrong;
however, GCC V1.30 gets it right:

$ gcc -O 6449.c -o 6449
6449.c: In function main:
6449.c:7: undeclared variable `prin' (first use here)
6449.c:7: parse error before `tf'

-- 
____________________________________________________________________________
Rick Kimball | Mac Source BBS, Altamonte Springs, FL     DATA (407) 862-6214
             |                                          VOICE (407) 788-6875
UUCP: rick@kimbal ..!gatech!fabscal!kimbal!rick ..!ucf-cs!sdgsun!kimbal!rick

ath@helios.prosys.se (Anders Thulin) (12/05/88)

In article <49179@pyramid.pyramid.com> markhall@pyramid.UUCP (Mark Hall) writes:
> [deleted stuff about "prin/* ... */tf" feature]
>
>Does your cpp have this `feature'?  Anyone know the history?  I
>suspect that AT&T and SUN know about this, but have chosen not to fix it.
>Anyone know why?

The first release of Norcroft ANSI C compiler for the Acorn Archimedes
did not have it. The second release supports it as an option. The help
text says something about PCC compatibility ...



-- 
Anders Thulin			INET : ath@prosys.se
ProgramSystem AB		UUCP : ...!{uunet,mcvax}!enea!prosys!ath
Teknikringen 2A			PHONE: +46 (0)13 21 40 40
S-583 30 Linkoping, Sweden	FAX  :

vfm6066@dsacg3.UUCP (John A. Ebersold) (12/05/88)

Compiles and runs on a Gould 9050 (UTX 32 1.2)



-- 
John A. Ebersold,     Defense Logistics Agency, DSAC-FM         Autovon 850-5923
      of              3990 E Broad St		       Commercial 1-614-238-5923
Unify Corporation at  Columbus, Ohio 43216-5002    osu-cis!dsacg1!dsacg3!vfm6066
						   lll-tis/

john@frog.UUCP (John Woods) (12/06/88)

In article <12967@duke.cs.duke.edu>, khera@romeo.cs.duke.edu (Vick Khera) writes:
> >#define VERSION 2
> >main() {
> >	proc/**/VERSION( a,b,c );
> >}
> I have used this ``feature'' to simplify having to write a bunch of
> duplicate code with a macro... how else would this macro be constructed? 
> 
In ANSI C, you use the ## operator.  In some existing C's, you just simply
can't construct such a macro, and if you ever want to use one of those C's,
you're out of luck.

The last time I wanted to do such a thing, I used M4 to generate the macros.
It was much more flexible, anyway.

-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

Go be a `traves wasswort.		- Doug Gwyn
