[comp.lang.c] char constant?

chris@mimsy.UUCP (Chris Torek) (04/15/88)

In article <5206@ihlpg.ATT.COM> tainter@ihlpg.ATT.COM (Tainter) writes:
-Is
-    "ABCD"[0]
-legal ANSI C?

Yes.  It is not, however, a constant expression (see section 3.4).

-If so then
-    (#x[0])
-becomes a 'charize' expression equivalent to a direct charize operator for
-all intents and purposes.

No.  For instance, "C"[0] is not legal as a case label.
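A minimal sketch of the distinction. It assumes a hypothetical CHARIZE macro built from the (#x[0]) idiom above. The result works anywhere an ordinary char value is wanted, but a case label needs a true integer constant expression such as 'C':

```c
/* Hypothetical macro built from the (#x[0]) idiom: stringize the
   argument, then index the resulting string literal. */
#define CHARIZE(x) (#x[0])

/* Legal: "ABCD"[0] is an ordinary (non-constant) expression of type char. */
int first_of_abcd(void)
{
	return "ABCD"[0];
}

int classify(int c)
{
	switch (c) {
	case 'C':		/* OK: 'C' is an integer constant expression */
		return 1;
	/* case "C"[0]:		not allowed: indexing a string literal does
				not give a constant expression */
	/* case CHARIZE(C):	not allowed, for the same reason */
	default:
		return 0;
	}
}
```

Outside a context that demands a constant, CHARIZE(C) still evaluates to 'C' at run time.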
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

tainter@ihlpg.ATT.COM (Tainter) (04/18/88)

In article <11072@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> In article <5206@ihlpg.ATT.COM> tainter@ihlpg.ATT.COM (Tainter) writes:
> -Is "ABCD"[0] legal ANSI C?
> Yes.  It is not, however, a constant expression (see section 3.4).

Well then, fix the standard!  Quick before it gets cast in concrete!

There is no possible excuse for this not being a constant expression.
Every component is a constant, why isn't the result a constant?

chris@trantor.umd.edu (Chris Torek) (04/18/88)

>>In article <5206@ihlpg.ATT.COM> tainter@ihlpg.ATT.COM (Tainter) asks:
>>-Is "ABCD"[0] legal ANSI C?

>In article <11072@mimsy.UUCP> I answered:
>>Yes.  It is not, however, a constant expression (see section 3.4).

In article <5214@ihlpg.ATT.COM> tainter@ihlpg.ATT.COM (Tainter) replies:
>Well then, fix the standard!  Quick before it gets cast in concrete!

>Every component is a constant, why isn't the result a constant?

To a large extent, I happen to agree.  Constants should (almost?)
always be reduced to their simplest form at compile time.  There
is one argument against converting "x"[0] to 'x' at compile time;
it is rather weak:  Some C compilers---notably PCC variants---do
not carry strings about within the compiler.  Instead, they work
as follows:

When the lexical analyser sees a double-quote (`"') character, it
calls a special routine to collect strings.  If the compiler has
just begun the initialisation of an array of char, this routine
gathers each byte and drops it in the main initialised data space
(`.data' or `.data 0').  If not, it switches to the alternate
initialised data space (`.data 1') and generates a label, then
drops each byte in this space, then resumes the previous space
(instruction or main data).  This can be seen in the compiler output
for, e.g.,

	f() {
		static char str[] = "main data";
		char *p = "alternate data";
	}

	# saw function declaration, so generate prologue for f
	_f:
		.word	L12
	# (end-of-function code moved here by peephole optimiser)
		subl2	$4,sp
		.set	L12,0x0

	# saw `static char str[] = "': generate static local variable `str'
		.data		# main data space
	L16:			# str is L16
		.long	0x6e69616d	# "main"
		.long	0x74616420	# " dat"
		.long	0x61		# "a\0"
	# end of string

	# saw `char *p = "': generate anonymous string constant
		.data	1	# alternate data space
	L17:			# anonymous string is L17
		.ascii	"alternate data\0"
		.data		# resume previous space
		.text		# finish p = "..." initialisation:
		moval	L17,-4(fp) # p = L17

		ret		# end of f()
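The string-collection step described above can be sketched in C. This is a hypothetical reconstruction, not PCC source. The names collect_string and next_label are invented, and so is the starting label number. The sketch emits `.ascii' in both cases, whereas the listing above uses `.long' for the main-data copy. It also assumes the string contains no quote or backslash characters, so it does no escaping.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical label counter; PCC's real numbering differs. */
static int next_label = 17;

/*
 * If `array_init' is nonzero we have just begun initialising a char
 * array, so the bytes go straight into the current (main) data space.
 * Otherwise switch to the alternate space `.data 1', emit a fresh label
 * and the bytes, then resume `resume' (the space we were in before).
 * Returns the label used (0 for the array case), or -1 if `out' is
 * too small to hold the generated assembler text.
 */
int collect_string(char *out, size_t cap, const char *s,
		   int array_init, const char *resume)
{
	int n, label = 0;

	if (array_init) {
		n = snprintf(out, cap, "\t.ascii\t\"%s\\0\"\n", s);
	} else {
		label = next_label++;
		n = snprintf(out, cap,
			     "\t.data\t1\nL%d:\n\t.ascii\t\"%s\\0\"\n\t%s\n",
			     label, s, resume);
	}
	return (n < 0 || (size_t)n >= cap) ? -1 : label;
}
```

Because each call writes its text out immediately and forgets the string, nothing later in the compiler can look at the bytes again.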

This method of handling anonymous aggregates, while expedient (the
compiler never carries more than one `thing' in its `head'), has
several unpleasant side effects.  One is that "text"[0] generates
code like

		cvtbl	L17,r11

rather than simply

		movl	$116,r11	# 116 == 't'

Another is that

	char *p1 = "hello", *p2 = "hello";

generates two separate strings with the same text, rather than making
only one `hello\0' and pointing both p1 and p2 at it.
Finally,

	f() { return (sizeof("hello")); }

compiles to the following mess:

	# some junk deleted
		.data	1
	L16:
		.ascii	"hello\0"
		.text
	_f:
		.word	0
		movl	$6,r0
		ret

Although the string itself is never used, it is still generated.
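Unlike "text"[0], sizeof applied to a string literal is already an integer constant expression. The literal's type (char[6] for "hello") fixes the value without reference to its bytes. A small illustration follows; HELLO_SIZE and hello_size are hypothetical names:

```c
/* sizeof of a string literal is an integer constant expression, so it
   may appear where a constant is required, such as an enumerator. */
enum { HELLO_SIZE = sizeof("hello") };	/* 5 characters + '\0' = 6 */

int hello_size(void)
{
	return (sizeof("hello"));	/* same value as f() above */
}

/* By contrast, the following is not allowed, since "hello"[0] is not
   an integer constant expression:
	enum { HELLO_FIRST = "hello"[0] };
*/
```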

The latter two problems can be cured without changing the basic
anonymous aggregate string builder; the first cannot.
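Curing the first problem requires a folding step that can still see the literal's bytes while the expression is being built. This is a minimal sketch. It assumes a hypothetical compiler that keeps each string literal in memory as bytes plus a length. The struct strlit type and fold_string_index are invented names, not PCC routines.

```c
#include <stddef.h>

/* Hypothetical in-compiler form of a string literal; `len' counts the
   terminating NUL, so "text" has len 5. */
struct strlit {
	const char *bytes;
	size_t len;
};

/*
 * Try to fold lit[index] to a character constant.  Returns 1 and stores
 * the value in *out when the index is in range, otherwise returns 0 and
 * leaves the expression for the code generator (or a diagnostic).
 * Assigning the char to an int keeps whatever signedness plain char
 * has, matching what the run-time load would produce.
 */
int fold_string_index(const struct strlit *lit, long index, int *out)
{
	if (index < 0 || (size_t)index >= lit->len)
		return 0;
	*out = lit->bytes[index];
	return 1;
}
```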

chris@mimsy.UUCP (Chris Torek) (04/21/88)

In article <2569@umd5.umd.edu> chris@trantor.umd.edu (I, using another
machine over the weekend since this one had a bad cache address board) wrote:
>[PCC's] method of handling anonymous aggregates, while expedient (the
>compiler never carries more than one `thing' in its `head'), has
>several unpleasant side effects. ... Another is that

> 	char *p1 = "hello", *p2 = "hello";

>generates two separate strings that have the same text ....  [This] can
>be cured without changing the basic anonymous aggregate string builder ....

Oops.  This is almost absurd.  One could fix it in /lib/c2, perhaps, or
in a separate utility specially constructed for hacking over `.data 1'
constructs, but to do it in the compiler requires either backing up
the output file, or `carrying the string in its head' (perhaps in a
temporary file).
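"Carrying the string in its head" amounts to keeping a table of the literals seen so far and reusing a label when the same text reappears. ANSI C leaves it unspecified whether identical literals share storage, so such merging is permitted. This is a minimal in-memory sketch. The fixed table size, the label numbering and intern_string are all hypothetical, and a real compiler might need a growable table or the temporary file mentioned above.

```c
#include <string.h>

#define MAXSTRS 64			/* hypothetical fixed capacity */

static const char *strtab[MAXSTRS];	/* text of each distinct literal */
static int labtab[MAXSTRS];		/* label assigned to that text */
static int nstrs;
static int next_strlabel = 16;		/* hypothetical starting label */

/*
 * Return the label for string literal `s', assigning a new one only if
 * this text has not been seen before.  `s' must stay valid for the life
 * of the table.  Uses strcmp, so literals with embedded NULs are not
 * handled.  Returns -1 if the table is full.  Emitting each distinct
 * string once (e.g. into `.data 1' at end of file) is not shown.
 */
int intern_string(const char *s)
{
	int i;

	for (i = 0; i < nstrs; i++)
		if (strcmp(strtab[i], s) == 0)
			return labtab[i];
	if (nstrs == MAXSTRS)
		return -1;
	strtab[nstrs] = s;
	labtab[nstrs] = next_strlabel++;
	return labtab[nstrs++];
}
```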