[comp.lang.c] Literal Strings in C

mack@inco.UUCP (06/10/87)

	There has recently been some discussion of literal strings
	in C in this group, and I thought I'd confuse the issue by
	pointing out a couple of real peculiarities. Both of the
	following statements are legal, executable C, at least to
	the Sun 3.2 C compiler, which is presumably based on PCC.

	c = "literal string"[i];

	"literal string"[i] = c;

	The first form is not unreasonable (saves a character pointer,
	anyway.)

	The second statement seems utterly useless. 

	Make of this what you will. Can anybody out there imagine a
	case where something like the second statement would be
	useful?

	Does the ANSI standard address this sort of thing?

	"C is not merely stranger than we imagine; it is stranger than
	we *can* imagine."
	-- 
------------------------------------------------------------------------------
  Dave Mack  (from Mack's Bedroom :<)
  McDonnell Douglas-Inco, Inc. 		DISCLAIMER: The opinions expressed
  8201 Greensboro Drive                 are my own and in no way reflect the
  McLean, VA 22102			views of McDonnell Douglas or its
  (703)883-3911				subsidiaries.
  ...!seismo!sundc!hadron!inco!mack
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

mcdaniel@uicsrd.UUCP (06/14/87)

> 	pointing out a couple of real peculiarities. Both of the
> 	following statements are legal, executable C, at least to
> 	the Sun 3.2 C compiler, which is presumably based on PCC.
>
> 	c = "literal string"[i];
>
> 	"literal string"[i] = c;

It's worse than that.  According to K&R, "[]" is not really an
operator;  it is an abbreviation:
	a[b] is equivalent to *(a+b)
and vice versa.  In other words, it's an abbreviation for pointer
arithmetic.  In C's arithmetic model + is commutative, so
	a[b] is equivalent to b[a]
I just compiled and ran this program:
	#include <stdio.h>
	main()
	{
		int i;
		i = 5;
		fprintf(stderr, "%c\n", "0123456789"[i]);
		fprintf(stderr, "%c\n", "0123456789"[5]);
		fprintf(stderr, "%c\n", i["0123456789"]);
		fprintf(stderr, "%c\n", 5["0123456789"]);
	}
and got output of
	5
	5
	5
	5
as expected.

--
Tim, the Bizarre and Oddly-Dressed Enchanter
Center for Supercomputing Research and Development
at the University of Illinois at Urbana-Champaign

UUCP:	 {ihnp4,seismo,pur-ee,convex}!uiucdcs!uicsrd!mcdaniel
ARPANET: mcdaniel%uicsrd@a.cs.uiuc.edu
CSNET:	 mcdaniel%uicsrd@uiuc.csnet
BITNET:	 mcdaniel@uicsrd.csrd.uiuc.edu

guy@sun.UUCP (06/14/87)

> 	Both of the following statements are legal, executable C, at least to
> 	the Sun 3.2 C compiler, which is presumably based on PCC.

It is.  They are, in fact, legal C according to both K&R and the ANSI C draft,
although the second statement may not be executable C according to
the ANSI C draft.

> 	"literal string"[i] = c;
> 
> 	(This) seems utterly useless.

That particular statement is unlikely to be useful, since no other
occurrence of "literal string" will be modified, and thus the
newly-modified value can't be accessed.

The fact that you *can* do that is a consequence of the definition of
character strings in C - they are just arrays of characters, and can
thus be treated just like any other array - just like the fact that
you can do

	c = i["literal string"];

(means the same thing as

	c = "literal string"[i];

) is a consequence of the definition of subscripting in C.

> 	Does the ANSI standard address this sort of thing?

3.1.4 String literals

	...

	   A string literal has static storage duration and type
	``array of "char"'', ...

	   ...If the program attempts to modify a string literal, the
	behavior is undefined.

Since it doesn't say ``"const" array of "char"'', I presume this
means that statements of that sort are allowed, although the
implementation is not required to make them work.
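
A minimal sketch of the portable alternative, not taken from the draft: copy the literal into storage the program owns and modify the copy instead. The helper name patch_copy and its parameters are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the text into caller-owned storage and
 * change one character there.  The literal itself is never written. */
void patch_copy(char *dst, size_t dstsize, size_t i, char c)
{
    static const char src[] = "literal string";

    assert(dstsize >= sizeof src);   /* room for the text and its '\0' */
    assert(i < sizeof src - 1);      /* index must name a real character */
    memcpy(dst, src, sizeof src);    /* sizeof includes the terminator */
    dst[i] = c;
}
```

Called as patch_copy(buf, sizeof buf, 0, 'L') with a sufficiently large char buf[], this leaves "Literal string" in buf and works the same whether or not the implementation makes literals writable.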

It might be nice if C compilers were to offer an option that not only
attempted to put string literals in a non-writable portion of the
address space, but assigned them type "const char []", so that
attempts to modify them will be caught at compile time.  The function
"mktemp" in UNIX overwrites the template argument it is given, but
people sometimes do

	mktemp("/tmp/fooXXXXXX")

which overwrites the string literal and returns a pointer to it (so in
this case you *can* use the modified string elsewhere).  That won't work
very well at all if you can't write on string literals.  However, if
"mktemp" were declared as

	char *mktemp(char *template);

and string literals were of type "const char []", the compiler would
rightfully complain about the conversion of "const char *" (which is
what the "const char []" expression "/tmp/fooXXXXXX" would be
converted to) to "char *".
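
A rough sketch of the caller-side fix, under the assumption that the function really does overwrite its argument. fill_template below is a hypothetical stand-in for "mktemp", written only so the example is self-contained; it is not the library routine and does not check that the name is unused.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for mktemp(): overwrites the trailing run of
 * 'X' characters in place with decimal digits of id and returns its
 * argument.  The parameter is plain "char *" because it is written. */
char *fill_template(char *tmpl, unsigned id)
{
    size_t n;

    assert(tmpl != NULL);
    n = strlen(tmpl);
    while (n > 0 && tmpl[n - 1] == 'X') {
        tmpl[--n] = (char)('0' + id % 10);
        id /= 10;
    }
    return tmpl;
}

/* Caller side: the template lives in a writable array initialised
 * from the literal, so the in-place overwrite is well defined. */
const char *example_temp_name(void)
{
    static char name[] = "/tmp/fooXXXXXX";

    return fill_template(name, 4711u);
}
```

The same caller pattern applies to the real "mktemp": hand it an array, not the literal.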
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

karl@haddock.UUCP (Karl Heuer) (06/15/87)

In article <4257@caip.rutgers.edu> brisco@caip.rutgers.edu (Thomas Paul Brisco) writes:
>[mack@inco.UUCP (Dave Mack) writes:]
>>c = "literal string"[i];
>>"literal string"[i] = c;
>>
>>The first form is not unreasonable (saves a character pointer, anyway.)
>>The second statement seems utterly useless. 
>
>The first form tends to be more than useful, it saves you not only the char*,
>but in the case of a series of string constants can be downright useful; such
>as:    #define TTYS "/dev/ttya\0/dev/ttyb"
>... Although I've (personally) used the second form as following
>	#define SCCDEV "/dev/scc?"
>	SCCDEV[strlen(SCCDEV) - 1] = inputdev;
>it should be noted as "non-portable" (for all that's worth).

This last usage is certainly dangerous, since a compiler may (and in a strict
pre-ANSI implementation, must) store each instance of the string literal in a
different location.

More importantly, it isn't necessary -- even for efficiency reasons.  Dave and
Thomas (and others, I think) have stated that using the string literal "saves
a pointer".  This is true if your alternative is to write
	char *SCCDEV = "/dev/scc?";
but the best way to write this is simply
	static char SCCDEV[] = "/dev/scc?";
which should be identical to the "#define" version, except that it forces the
strings to occupy the same storage (and is thus *more* efficient).  The only
"waste" is in the symbol table.

The version with the embedded \0 can also be written this way.  (I'll remain
silent on the issue of whether it should be written at all.)
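
As a sketch of the static-array form of SCCDEV recommended above: the helper name scc_device is invented here, and the index arithmetic assumes the '?' is the last character before the terminator.

```c
#include <assert.h>

static char SCCDEV[] = "/dev/scc?";   /* one writable copy, shared by every use */

/* Hypothetical helper: replace the final character with the chosen
 * device letter.  sizeof counts the terminating '\0', so the last
 * visible character sits at index sizeof SCCDEV - 2. */
const char *scc_device(char inputdev)
{
    assert(inputdev != '\0');          /* would truncate the name */
    SCCDEV[sizeof SCCDEV - 2] = inputdev;
    return SCCDEV;
}
```

Because sizeof is evaluated at compile time, this also avoids the run-time strlen() call in the "#define" version.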

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

peter@sugar.UUCP (Peter DaSilva) (06/18/87)

In article <212@inco.UUCP>, mack@inco.UUCP (Dave Mack) writes:
> 	c = "literal string"[i];
> 
> 	"literal string"[i] = c;
> 
> 	The second statement seems utterly useless. 

The (in)famous Ken Arnold actually used the second form in an incredibly
complex macro in (I think) an early version of curses. I'll not
embarrass him or me by attempting to accurately reproduce it, but it
looked something like:

	#define ASCII(c) (c<' ')?("^"[1]=c+'A',""-2):(""[0]=c,""-1)

> 	"C is not merely stranger than we imagine; it is stranger than
> 	we *can* imagine."

Maybe you, bubba, but not the veterans of the Great Self-reproducing War.

henry@utzoo.UUCP (Henry Spencer) (06/18/87)

> It might be nice if C compilers were to offer an option that not only
> attempted to put string literals in a non-writable portion of the
> address space, but assigned them type "const char []", so that
> attempts to modify them will be caught at compile time...

A sensible idea at first glance, and in fact at least one earlier draft
of X3J11 tried that.  The trouble is that making this work consistently is
hard:  people routinely assign the addresses of string literals to "char *"
pointers, so complaining about unconsting (to coin a word... ugh) will
produce a zillion complaints unless one is, somehow, very selective about
it.  As I recall, the situation now is that unconsting, except by explicit
cast, is illegal, which makes it impractical to make string literals const.

Actually, I suspect that "egrep mktemp" will pick up the vast majority of
the problem cases.
-- 
"There is only one spacefaring        Henry Spencer @ U of Toronto Zoology
nation on Earth today, comrade."   {allegra,ihnp4,decvax,pyramid}!utzoo!henry

amodeo@dataco.UUCP (Roy Amodeo) (10/18/90)

In article <2466@ux.acs.umn.edu> edh@ux.acs.umn.edu (Eric D. Hendrickson) writes:
>Basically, what I want to do is take a string of upper/lower case, and make
>it all upper case.  Here is a first try at it,
>
>#include <ctype.h>
>main()
>{
>	char *duh = "Hello";
	....
>		if (islower(*duh)) *duh = toupper(*duh);
	....

In the above segment of code, the literal string pointed to by 'duh' is
being modified in place. Is this portable according to the ANSI standard?

For our embedded system, we've asked the nice cross-compiler to put the
literal strings with the code and the const data because literal strings
are rarely modified. Since our code, const, and string area resides in
a memory location where writing is verboten in user state, any user program
that attempts to modify a literal string on our system will be shot for
trespassing. The above program would compile, but not run. To make it run,
the routine would actually have to copy the string being upcased into a
buffer:

	char*	from	= "Hello";
	char	buf[ sizeof( "Hello" ) ];
	char*	to	= buf;

	for( ; *from; from += 1, to += 1 )
		if ( islower( *from ) )
			*to = toupper( *from );
		else
			*to = *from;
	*to = '\0';

( Apologies if my coding style is offensive. It's designed to compensate
for my marginal observational skills. )
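
If the original string need not be preserved, a shorter route is to keep the text in an array and upcase it in place. This is only a sketch; the function name upcase is an assumption, not something from the original program.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: upcase a writable, NUL-terminated string in
 * place and return it.  The casts to unsigned char keep the <ctype.h>
 * call defined for characters that are negative as plain char. */
char *upcase(char *s)
{
    char *p;

    assert(s != NULL);
    for (p = s; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);
    return s;
}
```

The caller declares the text as "char duh[] = "Hello";" rather than "char *duh = "Hello";", so the array holds its own writable copy and upcase(duh) never touches the literal.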

Another reason to not modify literal strings is that the compiler may be
smart enough to collapse identical literal strings:

	char*	s1	= "hello";
	char*	s2	= "hello";

In this case, s1 and s2 could have identical values. If literal strings are
modifiable, this space optimization is a bad idea. ( In practice, it doesn't
seem to gain a lot of space anyway, so I wouldn't be surprised if most
compilers don't. However I seem to remember a UNIX utility that you could
run on a program to coalesce identical literal strings in this fashion
if you wanted this optimization. )
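
A small probe, offered as a sketch, for checking what a given compiler does with identical literals. The function name is made up, and no particular result is implied: either answer is allowed.

```c
#include <assert.h>
#include <string.h>

/* Reports whether two identical literals ended up at the same address.
 * Returns 1 if they are shared, 0 if they are separate; the result
 * depends on the compiler and the options it was given. */
int literals_are_shared(void)
{
    const char *s1 = "hello";
    const char *s2 = "hello";

    assert(strcmp(s1, s2) == 0);   /* the contents always match */
    return s1 == s2;               /* the addresses may or may not */
}
```

Code that never writes through a pointer to a literal behaves the same whichever value this returns.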

What does current practice dictate on this?

>			Eric Hendrickson
>-- 
>/----------"Oh carrots are divine, you get a dozen for dime, its maaaagic."--
>|Eric (the "Mentat-Philosopher") Hendrickson	  Academic Computing Services
>|edh@ux.acs.umn.edu	   The game is afoot!	      University of Minnesota
>\-"What does 'masochist' and 'amnesia' mean?   Beats me, I don't remember."--

rba iv		- signatures? We don't need no stinkin' signatures!

amodeo@dataco

karl@haddock.ima.isc.com (Karl Heuer) (10/19/90)

In article <256@dcsun21.dataco.UUCP> amodeo@dcsun03.UUCP (Roy Amodeo,DC ) writes:
>>	char *duh = "Hello"; ... *duh = ...
>
>In the above segment of code, the literal string pointed to by 'duh' is
>being modified in place. Is this portable according to the ANSI standard?

No.  String literals may be shared and read-only.  (K&R 1 explicitly said
otherwise, which was a botch; I choose to interpret it as having described a
particular implementation rather than the language itself.)

>What does current practice dictate on this?

Some compilers make them writable and separate; others make them read-only and
shared.  The latter compilers usually have an option to yield the former
behavior, since some existing code depends on it.

Karl W. Z. Heuer (karl@ima.isc.com or uunet!ima!karl), The Walking Lint