mack@inco.UUCP (06/10/87)
There has recently been some discussion of literal strings in C in this group, and I thought I'd confuse the issue by pointing out a couple of real peculiarities. Both of the following statements are legal, executable C, at least to the Sun 3.2 C compiler, which is presumably based on PCC.

    c = "literal string"[i];

    "literal string"[i] = c;

The first form is not unreasonable (saves a character pointer, anyway.) The second statement seems utterly useless.

Make of this what you will. Can anybody out there imagine a case where something like the second statement would be useful? Does the ANSI standard address this sort of thing?

"C is not merely stranger than we imagine; it is stranger than we *can* imagine."

--
Dave Mack (from Mack's Bedroom :<)
McDonnell Douglas-Inco, Inc.
8201 Greensboro Drive
McLean, VA 22102
(703)883-3911
...!seismo!sundc!hadron!inco!mack

DISCLAIMER: The opinions expressed are my own and in no way reflect the views of McDonnell Douglas or its subsidiaries.
mcdaniel@uicsrd.UUCP (06/14/87)
> pointing out a couple of real peculiarities. Both of the
> following statements are legal, executable C, at least to
> the Sun 3.2 C compiler, which is presumably based on PCC.
>
>     c = "literal string"[i];
>
>     "literal string"[i] = c;

It's worse than that. According to K&R, "[]" is not really an operator; it is an abbreviation:

    a[b]

is equivalent to

    *(a+b)

and vice versa. In other words, it's an abbreviation for pointer arithmetic. In C's arithmetic model + is commutative, so

    a[b]

is equivalent to

    b[a]

I just compiled and ran this program:

    #include <stdio.h>

    main()
    {
        int i;

        i = 5;
        fprintf(stderr, "%c\n", "0123456789"[i]);
        fprintf(stderr, "%c\n", "0123456789"[5]);
        fprintf(stderr, "%c\n", i["0123456789"]);
        fprintf(stderr, "%c\n", 5["0123456789"]);
    }

and got output of

    5
    5
    5
    5

as expected.

--
Tim, the Bizarre and Oddly-Dressed Enchanter
Center for Supercomputing Research and Development
at the University of Illinois at Urbana-Champaign

UUCP:    {ihnp4,seismo,pur-ee,convex}!uiucdcs!uicsrd!mcdaniel
ARPANET: mcdaniel%uicsrd@a.cs.uiuc.edu
CSNET:   mcdaniel%uicsrd@uiuc.csnet
BITNET:  mcdaniel@uicsrd.csrd.uiuc.edu
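The read-only form of literal subscripting has at least one everyday use: a lookup table with no named variable. A minimal sketch, assuming nothing beyond standard C; the function names `hex_digit` and `hex_digit_swapped` are invented for illustration and appear in none of the posts.

```c
/* Return the lowercase hex digit for the low four bits of n by
 * subscripting a string literal directly. The literal is only read,
 * never written. */
char hex_digit(unsigned n)
{
    return "0123456789abcdef"[n & 0xFu];
}

/* The same lookup with the operands swapped. Because a[b] means
 * *(a+b), the index may be written on the left of the brackets. */
char hex_digit_swapped(unsigned n)
{
    return (n & 0xFu)["0123456789abcdef"];
}
```

Both functions return the same character for every input; the swapped spelling is legal but is shown only to mirror the equivalence described above.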
guy@sun.UUCP (06/14/87)
> Both of the following statements are legal, executable C, at least to
> the Sun 3.2 C compiler, which is presumably based on PCC.

It is. They are, in fact, legal C according to both K&R and the ANSI C draft, although the second statement may not be executable C according to the ANSI C draft.

>     "literal string"[i] = c;
>
> (This) seems utterly useless.

That particular statement is unlikely to be useful, since no other occurrence of "literal string" will be modified, and thus the newly-modified value can't be accessed. The fact that you *can* do that is a consequence of the definition of character strings in C - they are just arrays of characters, and can thus be treated just like any other array - just like the fact that you can do

    c = i["literal string"];

(means the same thing as

    c = "literal string"[i];

) is a consequence of the definition of subscripting in C.

> Does the ANSI standard address this sort of thing?

    3.1.4 String literals

    ...

    A string literal has static storage duration and type ``array of "char"'', ...

    ...If the program attempts to modify a string literal, the behavior is undefined.

Since it doesn't say ``"const" array of "char"'', I presume this means that statements of that sort are allowed, although the implementation is not required to make them work.

It might be nice if C compilers were to offer an option that not only attempted to put string literals in a non-writable portion of the address space, but assigned them type "const char []", so that attempts to modify them will be caught at compile time.

The function "mktemp" in UNIX overwrites the template argument it is given, but people sometimes do

    mktemp("/tmp/fooXXXXXX")

which will overwrite the string and return a pointer to it (this means you *can* use the value of that string elsewhere), which won't work very well at all if you can't write on string literals. However, if "mktemp" were declared as

    char *mktemp(char *template);

and string literals were of type "const char []", the compiler would rightfully complain about the conversion of "const char *" (which is what the "const char []" expression "/tmp/fooXXXXXX" would be converted to) to "char *".

--
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
guy@sun.com (or guy@sun.arpa)
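The "mktemp" hazard described above can be avoided by handing the function a writable array instead of a literal. The sketch below does not call the real "mktemp"; `fill_template` is a hypothetical stand-in, invented here, that overwrites its argument in the same spirit so the effect can be shown without touching the filesystem.

```c
#include <string.h>

/* Hypothetical stand-in for mktemp(): overwrites the trailing run of
 * 'X' characters in tmpl with the decimal digits of id, padding with
 * leading zeros, and returns tmpl. Like mktemp(), it writes into its
 * argument, so the caller must pass writable storage. */
char *fill_template(char *tmpl, unsigned id)
{
    size_t i = strlen(tmpl);

    while (i > 0 && tmpl[i - 1] == 'X') {
        tmpl[i - 1] = (char)('0' + id % 10);
        id /= 10;
        i--;
    }
    return tmpl;
}
```

A safe call site declares `char name[] = "/tmp/fooXXXXXX";` and passes `name`: the array is a private, writable copy initialised from the literal. If a literal is instead held in a `const char *`, a conforming compiler must diagnose any attempt to pass it where a `char *` is expected, which is the compile-time check the post wishes for.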
karl@haddock.UUCP (Karl Heuer) (06/15/87)
In article <4257@caip.rutgers.edu> brisco@caip.rutgers.edu (Thomas Paul Brisco) writes:
>[mack@inco.UUCP (Dave Mack) writes:]
>>c = "literal string"[i];
>>"literal string"[i] = c;
>>
>>The first form is not unreasonable (saves a character pointer, anyway.)
>>The second statement seems utterly useless.
>
>The first form tends to be more than useful, it saves you not only the char*,
>but in the case of a series of string constants can be downright useful; such
>as: #define TTYS "/dev/ttya\0/dev/ttyb"
>... Although I've (personally) used the second form as following
>    #define SCCDEV "/dev/scc?"
>    SCCDEV[strlen(SCCDEV) - 1] = inputdev;
>it should be noted as "non-portable" (for all that's worth).

This last usage is certainly dangerous, since a compiler may (and in a strict pre-ANSI implementation, must) store each instance of the string literal in a different location. More importantly, it isn't necessary -- even for efficiency reasons.

Dave and Thomas (and others, I think) have stated that using the string literal "saves a pointer". This is true if your alternative is to write

    char *SCCDEV = "/dev/scc?";

but the best way to write this is simply

    static char SCCDEV[] = "/dev/scc?";

which should be identical to the "#define" version, except that it forces the strings to occupy the same storage (and is thus *more* efficient). The only "waste" is in the symbol table.

The version with the embedded \0 can also be written this way. (I'll remain silent on the issue of whether it should be written at all.)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
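The static-array version recommended above might be used as follows. This is only a sketch: the array declaration is taken from the post, while the helper name `scc_device` and its `unit` parameter are assumptions made for illustration.

```c
#include <string.h>

/* One writable array, initialised once from the literal. Every use of
 * SCCDEV refers to this same storage. */
static char SCCDEV[] = "/dev/scc?";

/* Hypothetical helper: patch the final character of the device name
 * and return the shared buffer. */
const char *scc_device(char unit)
{
    SCCDEV[strlen(SCCDEV) - 1] = unit;
    return SCCDEV;
}
```

Because there is exactly one array, the patched name is visible wherever `SCCDEV` is used afterwards, which is what the "#define" version cannot guarantee.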
peter@sugar.UUCP (Peter DaSilva) (06/18/87)
In article <212@inco.UUCP>, mack@inco.UUCP (Dave Mack) writes:
> c = "literal string"[i];
>
> "literal string"[i] = c;
>
> The second statement seems utterly useless.

The (in)famous Ken Arnold actually used the second form in an incredibly complex macro in (I think) an early version of curses. I'll not embarrass him or me by attempting to accurately reproduce it, but it looked something like:

    #define ASCII(c) (c<' ')?("^"[1]=c+'A',""-2):(""[0]=c,""-1)

> "C is not merely stranger than we imagine; it is stranger than
> we *can* imagine."

Maybe you, bubba, but not the veterans of the Great Self-reproducing War.
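For comparison, the job that macro appears to be doing (producing a printable form of a character, with control characters shown as "^" plus a letter) can be sketched without writing into any literal. Everything here is an assumption: the name `ascii_repr`, the caller-supplied buffer, and the use of `'@'` as the offset (conventional caret notation on an ASCII system) instead of the `'A'` in the half-remembered macro.

```c
/* Hypothetical replacement for the remembered ASCII(c) macro: write a
 * printable form of c into buf (at least 3 bytes) and return buf.
 * Control characters below ' ' become caret notation, so 1 gives "^A".
 * Assumes an ASCII character set. */
char *ascii_repr(int c, char buf[3])
{
    if (c >= 0 && c < ' ') {
        buf[0] = '^';
        buf[1] = (char)(c + '@');
        buf[2] = '\0';
    } else {
        buf[0] = (char)c;
        buf[1] = '\0';
    }
    return buf;
}
```

Passing the buffer in keeps the function independent of how the compiler lays out or shares string literals, which the macro relies on.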
henry@utzoo.UUCP (Henry Spencer) (06/18/87)
> It might be nice if C compilers were to offer an option that not only
> attempted to put string literals in a non-writable portion of the
> address space, but assigned them type "const char []", so that
> attempts to modify them will be caught at compile time...

A sensible idea at first glance, and in fact at least one earlier draft of X3J11 tried that. The trouble is that making this work consistently is hard: people routinely assign the addresses of string literals to "char *" pointers, so complaining about unconsting (to coin a word... ugh) will produce a zillion complaints unless one is, somehow, very selective about it. As I recall, the situation now is that unconsting, except by explicit cast, is illegal, which makes it impractical to make string literals const.

Actually, I suspect that "egrep mktemp" will pick up the vast majority of the problem cases.

--
"There is only one spacefaring nation on Earth today, comrade."
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,decvax,pyramid}!utzoo!henry
amodeo@dataco.UUCP (Roy Amodeo) (10/18/90)
In article <2466@ux.acs.umn.edu> edh@ux.acs.umn.edu (Eric D. Hendrickson) writes:
>Basically, what I want to do is take a string of upper/lower case, and make
>it all upper case. Here is a first try at it,
>
>#include <ctype.h>
>main()
>{
>    char *duh = "Hello";
....
>        if (islower(*duh)) *duh = toupper(*duh);
....

In the above segment of code, the literal string pointed to by 'duh' is being modified in place. Is this portable according to the ANSI standard?

For our embedded system, we've asked the nice cross-compiler to put the literal strings with the code and the const data because literal strings are rarely modified. Since our code, const, and string area resides in a memory location where writing is verboten in user state, any user program that attempts to modify a literal string on our system will be shot for trespassing.

The above program would compile, but not run. To make it run, the routine would actually have to copy the string being upcased into a buffer:

    char* from = "Hello";
    char buf[ sizeof( "Hello" ) ];
    char* to = buf;

    for( ; *from; from += 1, to += 1 )
        if ( islower( *from ) )
            *to = toupper( *from );
        else
            *to = *from;
    *to = '\0';

( Apologies if my coding style is offensive. It's designed to compensate for my marginal observational skills. )

Another reason to not modify literal strings is that the compiler may be smart enough to collapse identical literal strings:

    char* s1 = "hello";
    char* s2 = "hello";

In this case, s1 and s2 could have identical values. If literal strings are modifiable, this space optimization is a bad idea. ( In practice, it doesn't seem to gain a lot of space anyway, so I wouldn't be surprised if most compilers don't. However I seem to remember a UNIX utility that you could run on a program to coalesce identical literal strings in this fashion if you wanted this optimization. )

What does current practice dictate on this?

>        Eric Hendrickson
>--
>/----------"Oh carrots are divine, you get a dozen for dime, its maaaagic."--
>|Eric (the "Mentat-Philosopher") Hendrickson    Academic Computing Services
>|edh@ux.acs.umn.edu    The game is afoot!    University of Minnesota
>\-"What does 'masochist' and 'amnesia' mean? Beats me, I don't remember."--

rba iv - signatures? We don't need no stinkin' signatures!
amodeo@dataco
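The copy-into-a-buffer approach above can be packaged as a reusable function. A minimal sketch under stated assumptions: the name `upcase_copy`, the explicit capacity parameter, and the truncation behaviour are choices made here, not part of the post. The casts to `unsigned char` are added because the `<ctype.h>` functions are only defined for values representable as `unsigned char` (or `EOF`).

```c
#include <ctype.h>
#include <stddef.h>

/* Copy src into dst, upper-casing each character on the way.
 * dst has room for n bytes; the result is truncated if src does not
 * fit and is NUL-terminated whenever n > 0. Returns dst. The source
 * string is only read, so it may safely be a string literal. */
char *upcase_copy(char *dst, size_t n, const char *src)
{
    size_t i = 0;

    if (n == 0)
        return dst;
    for (; src[i] != '\0' && i + 1 < n; i++)
        dst[i] = (char)toupper((unsigned char)src[i]);
    dst[i] = '\0';
    return dst;
}
```

A call such as `upcase_copy(buf, sizeof buf, "Hello")` with `char buf[sizeof "Hello"];` leaves the literal untouched, so it works whether or not literals live in read-only memory.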
karl@haddock.ima.isc.com (Karl Heuer) (10/19/90)
In article <256@dcsun21.dataco.UUCP> amodeo@dcsun03.UUCP (Roy Amodeo,DC ) writes:
>> char *duh = "Hello"; ... *duh = ...
>
>In the above segment of code, the literal string pointed to by 'duh' is
>being modified in place. Is this portable according to the ANSI standard?

No. String literals may be shared and read-only. (K&R 1 explicitly said otherwise, which was a botch; I choose to interpret it as having described a particular implementation rather than the language itself.)

>What does current practice dictate on this?

Some compilers make them writable and separate; others make them read-only and shared. The latter compilers usually have an option to yield the former behavior, since some existing code depends on it.

Karl W. Z. Heuer (karl@ima.isc.com or uunet!ima!karl), The Walking Lint
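Given that answer, the smallest portable change to the original program is to make 'duh' an array instead of a pointer to the literal. A sketch, with the function name `upcase_in_place` invented here for illustration:

```c
#include <ctype.h>

/* Upper-case s in place and return it. s must point to writable
 * storage, such as an array initialised from a literal, and not to
 * the literal itself. */
char *upcase_in_place(char *s)
{
    char *p;

    for (p = s; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);
    return s;
}
```

With `char duh[] = "Hello";` the array holds its own writable copy of the characters, so `upcase_in_place(duh)` is well defined. With `char *duh = "Hello";` the same call attempts to modify the literal, which is the undefined behavior discussed in this thread.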