[comp.lang.c] string assignment in C

romwa@gpu.utcs.toronto.edu (Mark Dornfeld) (10/11/88)

I've been reading a book on /rdb recently.  In the book they
have a small example where they assign strings.

	char *p1="first";
	char *p2;

	main( argc, argv )
	int argc;
	char *argv[];

	{
	  p2 = " is:";
	}

The assignment to p1 makes sense to me, because the compiler
could set aside the size of the string being assigned.  The
second case baffles me.  I always thought that you had to give
a string a "maximum size" and then use strcpy or sprintf for
assignment.  Isn't the assignment of p2 a dangerous thing to
do since the compiler has (presumably) only left enough space
for the pointer and not for the string.  I tried this example
out on QuickC and everything worked.  i.e. printf'ing p2 gives
'is:'.

Could someone please shed some light on this for me.  Could
you also please respond via e-mail, since I am borrowing someone's
account in order to post this.

advTHANKSance

Pavneet Arora
...!utgpu!rom!pavneet

Royal Ontario Museum
100 Queen's Park
Toronto, Ontario
M5S 2C6
(416) 585-5626

john@chinet.chi.il.us (John Mundt) (10/13/88)

In article <1988Oct11.143728.28627@gpu.utcs.toronto.edu> romwa@gpu.utcs.toronto.edu (Mark Dornfeld) writes:
>	char *p1="first";
>	char *p2;
>	main(  )
>	{
>	  p2 = " is:";
>	}
>Isn't the assignment of p2 a dangerous thing to
>do since the compiler has (presumably) only left enough space
>for the pointer and not for the string.


The two are the same.  Each is a pointer to char.  Each string, 
"first" and " is:" are reserved by the compiler as unnamed strings 
somewhere in memory.  Both p1 and p2 are pointers that are set to point
to these strings.  You could reassign p1 or p2 to any other string as
well.  In other words, you could say p1 = p2 or p2 = (char *) 0.
Apply sizeof to either of them and both will yield
a value equal to sizeof(char *).

Now, this would be different:

char p[] = { 't','h','i','s',' ','a',' ','s','t','r','i','n','g','\n' };

main()
{
	printf(p);
}

Here, p is a fixed array of characters and cannot be reassigned.
Trying to say p = (char *) 0 would be illegal.

Further, sizeof(p) would be the number of characters in the array
(14 here) rather than the size of a character pointer.
-- 
---------------------
John Mundt   Teachers' Aide, Inc.  P.O. Box 1666  Highland Park, IL
(312) 432-8860	-998-5007 Voice  ||  -432-5386 Modem  

kyriazis@rpics (George Kyriazis) (10/13/88)

In article <6777@chinet.chi.il.us> john@chinet.chi.il.us (John Mundt) writes:
>  ... stuff deleted ...
>Each string,
>"first" and " is:" are reserved by the compiler as unnamed strings
>somewhere in memory. ......

My question is:  Are strings like " is:" volatile or not?
When you say p2 = " is:", are you sure that the string will remain in
memory or the optimizer will decide to put something else there since
the string is basically a constant used only once??
I also have the idea that if you say
        p1 = "abc";
        p2 = "abc";
p1 and p2 will have different value, since the strings are not the same
(they have the same contents, but physically should be different).
Is that a right assumption?



  George Kyriazis
  kyriazis@turing.cs.rpi.edu
------------------------------

peter@ficc.uu.net (Peter da Silva) (10/14/88)

In article <6777@chinet.chi.il.us>, john@chinet.chi.il.us (John Mundt) writes:
> Now, this would be different:

> char p[] = { 't','h','i','s',' ','a',' ','s','t','r','i','n','g','\n' };

> main()
> {
> 	printf(p);
> }

Different all right. You forgot to terminate your string. Your mileage
will vary, but I got:

% a.out | vcat
this a string
l&^Z'.^R% []

[] is the cursor.
-- 
Peter da Silva  `-_-'  Ferranti International Controls Corporation.
"Have you hugged  U  your wolf today?"            peter@ficc.uu.net

levy@ttrdc.UUCP (Daniel R. Levy) (10/16/88)

In article <6777@chinet.chi.il.us>, john@chinet.chi.il.us (John Mundt) writes:
> char p[] = { 't','h','i','s',' ','a',' ','s','t','r','i','n','g','\n' };
> 
> main()
> {
> 	printf(p);
> }

Be careful with this kind of declaration: without the terminating '\0' there
is no guarantee where the "string" p[] ends.  In fact, try this on a VAX, 3B,
or machine of similar architecture:

char p[] = { 't','h','i','s',' ','a',' ','s','t','r','i','n','g',' ',
	     ' ','\n' };	/* 16 bytes, NO NULL TERMINATOR */
char q[] = { 't','h','i','s',' ','g','a','r','b','a','g','e','\n','\0' };

main()
{
	printf(p);
}

The output of the program will be:

this a string  
this garbage

The 16 bytes in p[] are there to make it end just before the word boundary on
which q[] begins.  Otherwise the system will probably pad p[] with nulls and
you won't notice the lack of an explicit null terminator.
-- 
|------------Dan Levy------------|  THE OPINIONS EXPRESSED HEREIN ARE MINE ONLY
| Bell Labs Area 61 (R.I.P., TTY)|  AND ARE NOT TO BE IMPUTED TO AT&T.
|        Skokie, Illinois        | 
|-----Path:  att!ttbcad!levy-----|

nagel@paris.ics.uci.edu (Mark Nagel) (10/17/88)

In article <1414@imagine.PAWL.RPI.EDU>, kyriazis@rpics (George Kyriazis) writes:
|In article <6777@chinet.chi.il.us> john@chinet.chi.il.us (John Mundt) writes:
|>Each string,
|>"first" and " is:" are reserved by the compiler as unnamed strings
|>somewhere in memory. ......
|
|My question is:  Are strings like " is:" volatile or not?
|When you say p2 = " is:", are you sure that the string will remain in
|memory or the optimizer will decide to put something else there since
|the string is basically a constant used only once??

Of course not!  That would be analogous to having the 2 in:

        x = 2;

change at some point in the program since it is "basically a constant."  
Also, *all* code has the potential for being executed more than once 
through function calls (well, not global variable initialization, but 
all code in functions).

|I also have the idea that if you say
|        p1 = "abc";
|        p2 = "abc";
|p1 and p2 will have different value, since the strings are not the same
|(they have the same contents, but physically should be different).
|Is that a right assumption?

No.  Some compilers will treat them as different objects (i.e. you get
two distinct pointer values for the two string constants).  Others will
optimize the program's space usage by sorting and uniq'ing all of the
strings so that different string constant references all have the same
physical address.  This is fine since constant strings should be
read-only objects.

Mark D. Nagel
  UC Irvine - Dept of Info and Comp Sci | The probability of someone
  nagel@ics.uci.edu             (ARPA)  | watching you is proportional to
  {sdcsvax|ucbvax}!ucivax!nagel (UUCP)  | the stupidity of your action.

gandalf@csli.STANFORD.EDU (Juergen Wagner) (10/17/88)

In article <1414@imagine.PAWL.RPI.EDU> George Kyriazis writes:
...
>        p1 = "abc";
>        p2 = "abc";
>p1 and p2 will have different value, since the strings are not the same
>(they have the same contents, but physically should be different).

Hmmm....

>Is that a right assumption?

I don't think so. Look, what is the same, is not the pointer to those strings.
The contents of the respective memory locations are the same (happen to be).

There is a difference between e.g. ints and those strings: ints fit into
a register and can be 'in-line' coded, strings can't. If the compiler finds
a line
	foo = 6;
and another line
	bar = 6;
then these values might be transformed into instructions loading the value
6 directly into the locations of foo and bar (i.e. without evaluating a lot).

On the other hand, the lines
	p1 = "abc";
	p2 = "abc";
do not allow that in general. The optimization is to assign a fixed
memory location to each of those strings, and optimize the use of their
addresses. Usually, strings like these are stored in the static area of the
data space. They have to be distinct unless the compiler can make assumptions
like "static data are read-only and can therefore be merged into the text
space".  If you want to share them, use xstr(1) to get shared strings.

-- 
Juergen "Gandalf" Wagner,		   gandalf@csli.stanford.edu
Center for the Study of Language and Information (CSLI), Stanford CA

gwyn@smoke.ARPA (Doug Gwyn ) (10/17/88)

In article <790@paris.ics.uci.edu> nagel@paris.ics.uci.edu (Mark Nagel) writes:
>|        p1 = "abc";
>|        p2 = "abc";
>|p1 and p2 will have different value, since the strings are not the same
>optimize the program's space usage by sorting and uniq'ing all of the
>strings so that different string constant references all have the same
>physical address.  This is fine since constant strings should be
>read-only objects.

The key is that you are allowed to portably compare pointers only in two
cases:  at least one pointer is a null pointer, or both pointers are
pointers into the same object.  This means that the fact that p1==p2 for
pointers to distinct objects is not a problem, since such comparison is
"undefined".  (Otherwise, the two string objects would have to be given
unique addresses.)  As it stands, if p1 and p2 are pointers to const
char, the storage may be shared, but not if they are pointers to char
(unless the compiler can determine that the pointers and all aliases to
them are not used to modify the contents of the string literals).  Thus
the automatic "dumb" crunching together of string literals is not
permitted in a standard-conforming implementation.

You can accomplish this coalescing yourself in your source code:
	static char abc_str[] = "abc";
	...
	char *p1 = abc_str;
	char *p2 = abc_str;
Whether it is a good idea or not depends on how the pointers are used.

knudsen@ihlpl.ATT.COM (Knudsen) (10/18/88)

In article <1414@imagine.PAWL.RPI.EDU>, kyriazis@rpics (George Kyriazis) writes:
> My question is:  Are strings like " is:" volatile or not?
> When you say p2 = " is:", are you sure that the string will remain in
> memory or the optimizer will decide to put something else there since
> the string is basically a constant used only once??

Nope.  p2 will continue to point to " is:" unless you reassign p2,
which the compiler has no idea if or when you might do.
So as long as p2 exists, that constant string can't be recycled.
Worse yet, even if p2 is an auto variable, the " is:" has to stick around
in the text/code segment so that p2 can be initialized again if that
function is re-entered.

> I also have the idea that if you say
>         p1 = "abc";
>         p2 = "abc";
> p1 and p2 will have different value, since the strings are not the same
> (they have the same contents, but physically should be different).

Some compilers are fancy enough to detect identical string constants
and merge them, so indeed p1==p2 there.  I doubt many compilers
go this far, however.
-- 
Mike Knudsen  Bell Labs(AT&T)   att!ihlpl!knudsen
"Lawyers are like handguns and nuclear bombs.  Nobody likes them,
but the other guy's got one, so I better get one too."

karl@haddock.ima.isc.com (Karl Heuer) (10/20/88)

In article <8696@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn) writes:
>... As it stands, if p1 and p2 are pointers to const char, the storage may be
>shared, but not if they are pointers to char....  Thus the automatic "dumb"
>crunching together of string literals is not permitted in a standard-
>conforming implementation.

True in K&R C, but fixed in dpANS: "Identical string literals ... need not be
distinct.  If the program attempts to modify a string literal ... the behavior
is undefined." [3.1.4]  "This specification allows implementations to share
copies of [identical] strings." [R3.1.4]

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

gwyn@smoke.BRL.MIL (Doug Gwyn ) (10/20/88)

In article <9710@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:
-True in K&R C, but fixed in dpANS: "Identical string literals ... need not be
-distinct.  If the program attempts to modify a string literal ... the behavior
-is undefined." [3.1.4]  "This specification allows implementations to share
-copies of [identical] strings." [R3.1.4]

True.  I was thinking of string-literal initialized char arrays,
not pointers to string literals.  Sorry.