[comp.lang.c] Macro parameters getting substituted into strings

gnu@hoptoad.uucp (John Gilmore) (03/27/88)

I am not saying that I like the idea of macro parameter names being
replaced even inside strings.  My complaint is that an ANSI C compiler
does not have a capability that Unix C compilers have, and which many
Unix programs depend upon.  This is the capability to turn a single-
character argument into a character constant.  MOVING TO ANSI C REQUIRES
CHANGING ALL THE **USES**, AS WELL AS ALL THE **DEFINITIONS**, OF MACROS
THAT NEED THIS CAPABILITY!

In the BSD sources, Keith Bostic and I had to change more than 50 files
to deal with this.  The CPP on my binary A/UX, supposedly a System V,
replaces macro parameters inside strings.  I would like someone who has
System V sources to grep the sources for the CTRL macro -- I bet you
will find it there, using this technique.  If it's there, this ANSI
change is a "Quiet Change" to the compilers of both major Unix variants
(effectively, to ALL Unix implementations) and breaks dozens of the
application programs in both variants.  Can you say "Codifying existing
practice", boyz and goils?

My preferred way to fix this would be for ANSI C to allow the *
(indirection) and [] (subscripting) operators on string literals in
constant expressions.  Then the new ANSI "#" operator can be used to
create a character string, and * or [] can pull a character out of it,
all in a constant expression, e.g.:

#define CTRL(x) (# x [0]&0x1F)
...
	case CTRL(q):		turns to:	case ("q"[0]&0x1F):

I noticed that integer constant expressions are permitted to contain
subscripting in the new ANSI draft (Jan 88), but they aren't allowed to
contain string literals!  Inserting "string literals, " in lines 25
and 35 on page 56, and adding "except the values of string literals"
at the end of line 1 of page 57, would fix this.
-- 
{pyramid,ptsfa,amdahl,sun,ihnp4}!hoptoad!gnu			  gnu@toad.com
		"Watch me change my world..." -- Liquid Theatre

gwyn@brl-smoke.ARPA (Doug Gwyn ) (03/28/88)

In article <4253@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>In the BSD sources, Keith Bostic and I had to change more than 50 files
>to deal with this.

Don't say that Berkeley wasn't informed of this nonportable usage
many years ago, before X3J11 in fact.  I recall that several of us
pointed it out, to no avail.  (In my System V emulation, when I
needed macros like Berkeley's _IOR, I defined mine the right way.)

>Can you say "Codifying existing practice", boyz and goils?

Can you say, "codifying existing correct practice"?

Macro substitution inside string literals and character constants was
explicitly disallowed by K&R, and many independent implementations of
C followed the rule.  For such implementations, requiring the Reiser
cpp behavior would have been a change to existing practice with
visible adverse effects on existing correct code, and there is more
support for their point of view than for that of Reiser cpp abusers.

>My preferred way to fix this would be for ANSI C to allow the *
>(indirection) and [] (subscripting) operators on string literals in
>constant expressions.

I could support this, or some char-ize operator, but I don't think
they'll make it into the standard at this point.  Char-ize had been
proposed before in several forms; I forget whether subscripted
string literals in constant expressions had ever been proposed.

Send it in.

ado@elsie.UUCP (Arthur David Olson) (03/28/88)

In article <4253@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:
> I am not saying that I like the idea of macro parameter names being
> replaced even inside strings.  My complaint is that an ANSI C compiler
> does not have a capability that Unix C compilers have, and which many
> Unix programs depend upon.  This is the capability to turn a single-
> character argument into a character constant.  MOVING TO ANSI C REQUIRES
> CHANGING ALL THE **USES**, AS WELL AS ALL THE **DEFINITIONS**, OF MACROS
> THAT NEED THIS CAPABILITY!

Here's an example (taken from "/usr/include/sys/ttychars.h") of how we coped
with the problem here at elsie; this avoids the need to change *most* uses;
you'll get complaints about the ones that do need special attention.

	#if !defined __STDC__ && !defined __DECUS_CPP__
	#define	CTRL(c)	('c'&037)
	#else /* defined __STDC__  || defined __DECUS_CPP__ */
	#ifndef LETR_a
	#define LETR_a	'a'
	#define LETR_b	'b'
	#define LETR_c	'c'
	#define LETR_d	'd'
	#define LETR_e	'e'
	#define LETR_f	'f'
	#define LETR_g	'g'
	#define LETR_h	'h'
	#define LETR_i	'i'
	#define LETR_j	'j'
	#define LETR_k	'k'
	#define LETR_l	'l'
	#define LETR_m	'm'
	#define LETR_n	'n'
	#define LETR_o	'o'
	#define LETR_p	'p'
	#define LETR_q	'q'
	#define LETR_r	'r'
	#define LETR_s	's'
	#define LETR_t	't'
	#define LETR_u	'u'
	#define LETR_v	'v'
	#define LETR_w	'w'
	#define LETR_x	'x'
	#define LETR_y	'y'
	#define LETR_z	'z'
	#define LETR_A	'A'
	#define LETR_B	'B'
	#define LETR_C	'C'
	#define LETR_D	'D'
	#define LETR_E	'E'
	#define LETR_F	'F'
	#define LETR_G	'G'
	#define LETR_H	'H'
	#define LETR_I	'I'
	#define LETR_J	'J'
	#define LETR_K	'K'
	#define LETR_L	'L'
	#define LETR_M	'M'
	#define LETR_N	'N'
	#define LETR_O	'O'
	#define LETR_P	'P'
	#define LETR_Q	'Q'
	#define LETR_R	'R'
	#define LETR_S	'S'
	#define LETR_T	'T'
	#define LETR_U	'U'
	#define LETR_V	'V'
	#define LETR_W	'W'
	#define LETR_X	'X'
	#define LETR_Y	'Y'
	#define LETR_Z	'Z'
	#endif /* !LETR_a */
	#define CTRL(c)	((LETR_ ## c) & 037)
	#endif /* defined __STDC__  || defined __DECUS_CPP__ */
-- 
olson@ncifcrf.gov	". . .that lucky ol' Sun ain't got nothin' to do. . ."

gnu@hoptoad.uucp (John Gilmore) (03/28/88)

ado@elsie.UUCP (Arthur David Olson) wrote:
> Here's an example (taken from "/usr/include/sys/ttychars.h") of how we coped
> with the problem here at elsie...
> 
> 	#if !defined __STDC__ && !defined __DECUS_CPP__
> 	#define	CTRL(c)	('c'&037)
> 	#else /* defined __STDC__  || defined __DECUS_CPP__ */
> 	#ifndef LETR_a
> 	#define LETR_a	'a'
> 	#define LETR_b	'b'
> 	#define LETR_c	'c'
>... 	#define LETR_y	'y'
> 	#define LETR_z	'z'
> 	#define LETR_A	'A'
> 	#define LETR_B	'B'
>... 	#define LETR_Z	'Z'
> 	#endif /* !LETR_a */
> 	#define CTRL(c)	((LETR_ ## c) & 037)
> 	#endif /* defined __STDC__  || defined __DECUS_CPP__ */

When I was 14 years old I wrote a program in an early BASIC (which
did not support characters at all, just numbers), that would mess around
with words (each character stored as a number) and print them out.
The printout routine looked like:

4000	if (i = 1) print "A";
4010	if (i = 2) print "B";
4020	if (i = 3) print "C";
4030	if (i = 4) print "D";
and so on...

I was hoping to never have to do that sort of thing again.
-- 
{pyramid,ptsfa,amdahl,sun,ihnp4}!hoptoad!gnu			  gnu@toad.com
		"Watch me change my world..." -- Liquid Theatre

jss@hector.UUCP (Jerry Schwarz) (03/30/88)

In article <4253@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>
>In the BSD sources, Keith Bostic and I had to change more than 50 files
>to deal with this.  The CPP on my binary A/UX, supposedly a System V,
>replaces macro parameters inside strings.  I would like someone who has
>System V sources to grep the sources for the CTRL macro -- I bet you
>will find it there, using this technique.  If it's there, this ANSI
>change is a "Quiet Change" to the compilers of both major Unix variants
>(effectively, to ALL Unix implementations) and breaks dozens of the
>application programs in both variants.  Can you say "Codifying existing
>practice", boyz and goils?

Once again.  The adoption of the standard will not break anything.
New compilers will break things.  But compiler vendors can support
compatibility modes for as long as they choose.

I grepped some old Sys V source.  The abominable CTRL occurred 62
times, all but 6 of them in vi.

Yes, it is a quiet change.  The Rationale notes the quiet change that
substitution does not occur in strings.  It doesn't mention character
constants.  I don't know why.

karl@haddock.ISC.COM (Karl Heuer) (03/30/88)

In article <8035@elsie.UUCP> ado@elsie.UUCP (Arthur David Olson) writes:
>Here's an example (taken from "/usr/include/sys/ttychars.h") of how we coped
>with the problem [of ANSI C not having a charizing operator] here at elsie...
>	#define LETR_a	'a'	[ditto for all of 'a'..'z', 'A'..'Z']
>	#define CTRL(c)	((LETR_ ## c) & 037)

Since you have to edit the source file anyway, why don't you just do it right
in the first place?
	1,$s/CTRL(\(.\))/CTRL('\1')/g
	/^#define CTRL/s/.*/#define CTRL(c) ((c) & 037)/

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
(Actually I prefer ((c)^0x40), but that's irrelevant to this article.)

henry@utzoo.uucp (Henry Spencer) (03/31/88)

> ado@elsie.UUCP (Arthur David Olson) wrote:
> ...
> I was hoping to never have to do that sort of thing again.

But you don't have to, of course, because Arthur did it for you! :-)
-- 
"Noalias must go.  This is           |  Henry Spencer @ U of Toronto Zoology
non-negotiable."  --DMR              | {allegra,ihnp4,decvax,utai}!utzoo!henry

flaps@dgp.toronto.edu (Alan J Rosenthal) (04/05/88)

Discussing the removal of macro replacement inside strings, John
Gilmore complains that this will break dozens of application programs,
both in sysV and in BSD.

In article <10194@ulysses.homer.nj.att.com> jss@hector (Jerry Schwarz) writes:
>Once again.  The adoption of the standard will not break anything.
>New compilers will break things.  But compiler vendors can support
>compatibility modes for as long as they choose.

In this area, the compiler cannot support backward compatibility.  Given
a definition:

	#define a(b)"a b c"

what does a(x) expand to?  ANSI says "a b c"; existing practice
says "a x c"; there is no way to accommodate both.

ajr

-- 
"Comment, Spock?"
"Very bad poetry, Captain."

mouse@mcgill-vision.UUCP (der Mouse) (04/10/88)

In article <7566@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> In article <4253@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>> [stuff about Reiser behavior and its (ab)use]
>> My preferred way to fix this would be for ANSI C to allow the *
>> (indirection) and [] (subscripting) operators on string literals in
>> constant expressions.
> I could support this, or some char-ize operator, but I don't think
> they'll make it into the standard at this point.

Then we are faced with a loss of functionality.  Suddenly there is no
way to define _IO or CTRL that is compatible with existing usage.  It
is reasonable to require us to rewrite our macro definitions; it is not
reasonable to require us to rewrite all our uses of the macro.  The
committee apparently recognized this when they provided # and ##, to
preserve the functionality of Reiser cpp substitution in strings and
/**/, but we need either a charize operator analogous to # or (my
preference) making subscripted string literals into constant
expressions in order to preserve the functionality of Reiser
substitution within ''.

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

gwyn@brl-smoke.ARPA (Doug Gwyn ) (04/10/88)

In article <1039@mcgill-vision.UUCP> mouse@mcgill-vision.UUCP (der Mouse) writes:
>Then we are faced with a loss of functionality.  Suddenly there is no
>way to define _IO or CTRL that is compatible with existing usage.

There never was a guaranteed way to do this.  Some people found that
they could exploit a bug in the Reiser preprocessor so they did.  I
don't have a lot of sympathy for them, since when I was faced with
this choice I did it the portable way even though I was using only
Reiser preprocessors at the time.

jss@hector.UUCP (Jerry Schwarz) (04/11/88)

In article <1039@mcgill-vision.UUCP> mouse@mcgill-vision.UUCP writes:
>
>Then we are faced with a loss of functionality.  Suddenly there is no
>way to define _IO or CTRL that is compatible with existing usage.  It
>is reasonable to require us to rewrite our macro definitions; it is not
>reasonable to require us to rewrite all our uses of the macro.  

It's been several months since I made my comment: "the standard does
not break programs, new compilers break programs".  It seems
relevant here.  If there is a loss of functionality, it is because a
compiler vendor who provided this functionality in the past is
failing to provide it now.  If you want a portable program you can't
use this functionality, but that is not a change since in the past not
all compilers provided it.

Jerry Schwarz
Bell Labs, Murray Hill

mouse@mcgill-vision.UUCP (der Mouse) (04/12/88)

In article <3222@haddock.ISC.COM>, karl@haddock.ISC.COM (Karl Heuer) writes:
> In article <8035@elsie.UUCP> ado@elsie.UUCP (Arthur David Olson) writes:
>>	#define LETR_a	'a'	[ditto for all of 'a'..'z', 'A'..'Z']
>>	#define CTRL(c)	((LETR_ ## c) & 037)
> Since you have to edit the source file anyway, why don't you just do
> it right in the first place?
> 	1,$s/CTRL(\(.\))/CTRL('\1')/g
> 	/^#define CTRL/s/.*/#define CTRL(c) ((c) & 037)/

Because there is no "the" source file.  There may be many source files
involved.  Arthur's fix involves changing only the file in which CTRL
is defined; your fix involves changing not only that file but also all
files which use it.  This may be impractical or even impossible.

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu