[net.lang.c] sizeof "string"

jack@rlgvax.UUCP (Jack Waugh) (01/27/84)

I have seen at least one compiler (I forget which) that  gave
2  (the  size  of a pointer on that machine) as the size of a
string.  So it isn't a  reliable  portable  practice  to  use
sizeof on strings.

Jack Waugh

mark@elsie.UUCP (01/27/84)

There is a subtle difference between arrays and pointers to arrays. Take the
following test program:

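    #include <stdio.h>

    char		dbuf[] = "1234567890";	/* array: 10 chars plus '\0' */

    int main()
    {
    char		buf[10];		/* array of 10 chars */
    char		*cbuf = "1234567890";	/* pointer to the string */

    /* sizeof does not yield an int, so cast to match %d */
    printf("bufsz= %d, cbufsz= %d, dbufsz= %d\n",
	    (int) sizeof buf, (int) sizeof cbuf, (int) sizeof dbuf);
    return 0;
    }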

outputs: "bufsz= 10, cbufsz= 4, dbufsz= 11" (we have a 32 bit machine).
"buf[10]" allocates 10 bytes of space, with "buf" pointing to the first
location. "*cbuf" allocates a pointer, cbuf, that points to the string
"1234567890"; "sizeof cbuf" thus is the size of the pointer. "dbuf[] = .."
(which must be an external to be initialized) is an array containing the
string "1234567890\0" (the NUL at the end of the string is the 11th byte).
dbuf and buf are the same, while cbuf is different, at least as far as
sizeof is concerned. Note also that "&cbuf" is a meaningful construct while
"&buf" and "&dbuf" are not. The important point to note here is that while
arrays and pointers to arrays are very similar, they are not the same.


-- 
UUCP:	decvax!harpo!seismo!rlgvax!cvl!elsie!mark
Phone:	(301) 496-5688

woods@hao.UUCP (01/28/84)

  Of course it isn't a good idea to use sizeof on strings! It should properly
return the size of a char pointer. That is why "strlen" is in the stdio
library!

		    GREG
-- 
{ucbvax!hplabs | allegra!nbires | decvax!kpno | harpo!seismo | ihnp4!kpno}
       		        !hao!woods

stevens@inuxh.UUCP (W Stevens) (01/31/84)

On our system (Vax 11/780 running UNIX 5.0, a.k.a. System V), the
program:

#include <stdio.h>
#include <stdlib.h>

int main()
{
	/* 12 characters plus the terminating '\0' */
	printf("size of \"Hello world!\" is %d\n",
	 (int) sizeof("Hello world!"));
	exit(0);
}

prints the value 13.  This agrees with K&R page 181: "A string has type
'array of characters' and storage class static ... and is initialized
with the given characters."

--
Scott Stevens
AT&T Consumer Products Laboratories
Indianapolis, Indiana, USA
UUCP: inuxh!stevens

The difficult didn't get done yesterday, so the impossible will have to wait.

wolfe@mprvaxa.UUCP (Peter Wolfe) (01/31/84)

Of course you got the size of a pointer as a result of doing sizeof "string".
"string" is in fact a "pointer" to a constant string.  I believe that sizeof
is the only portable way to dtermine how large data types (eg. char's) are in C.
-- 

    Peter Wolfe
    Microtel Pacific Research
    ..decvax!microsoft!ubc-vision!mprvaxa!wolfe

jas@druxy.UUCP (ShanklandJA) (02/01/84)

A number of incorrect answers to this relatively simple question have
been posted (as well as a roughly equal number of correct ones).  This
is an attempt to clarify the issue.  Please, for those of you who
disagree with what follows, READ THE PERTINENT SECTIONS OF K&R BEFORE
POSTING YET ANOTHER INCORRECT ANSWER TO THE NET!  All page numbers given
below are references to K&R.

Peter Wolfe (mprvaxa!wolfe) says:

    Of course you got the size of a pointer as a result of
    doing sizeof "string".  "string" is in fact a "pointer"
    to a constant string.  I believe that sizeof is the
    only portable way to dtermine [sic] how large data types
    (eg. char's) are in C.

And Greg Woods (hao!woods) says:

    Of course it isn't a good idea to use sizeof on strings!
    It should properly return the size of a char pointer.
    That is why "strlen" is in the stdio library!

And Jack Waugh (rlgvax!jack) says:

    I have seen at least one compiler (I forget which)
    that  gave 2  (the  size  of a pointer on that
    machine) as the size of a string.  So it isn't a
    reliable  portable  practice  to  use sizeof on
    strings.

Folks, "string" is NOT in fact a pointer to a constant string.  "string"
has type array of char and storage class static (pg. 181).  When a
string is referenced in an expression, it, like any other array,
is converted to a pointer to the first element of the array (pp. 94, 185).
But sizeof, when applied to an array, yields the size of the array (pg. 188).
sizeof the array "hello, world" is 13:  enough room for the 12 characters
in quotes plus a terminating '\0', which is what the compiler initializes
the array to.

Yes, sizeof is the only portable way to determine how large data types
are, but that has nothing to do with the question at hand.

strlen's presence in the stdio library has nothing to do with sizeof;
strlen returns the size of a particular string:  in essence, the distance
in bytes from the pointer it is passed to the first occurrence of
a null character ('\0').  strlen is a function called at run-time;
sizeof is resolved at compile time.  strlen( "hello\0, world" ) will
return 5 (note the \0 in the middle); sizeof( "hello\0, world" ) should
be compiled to the constant 14.

Finally, the fact that one C compiler somewhere said that
sizeof( "hello, world" ) was 2 does not mean much; that is just a
compiler bug.  Saying that therefore, using sizeof( <string> ) is
not reliable portable practice makes as much sense as saying that
using the '+' operator is not reliable portable practice because
someone once wrote a C compiler that incorrectly implemented addition.

Jim Shankland
..!ihnp4!druxy!jas

slb@inuxh.UUCP (Stephen Browning) (02/02/84)

After reading the concise and well referenced explanation of
sizeof("Hello\0,world."), I was reminded of a question raised by
the people at Ecosoft here in Indy while they were writing their
C compiler.

	Is "Hello\0, world." a string, or is it two strings?
	Put another way, is '\0' a legal character to embed
	within a string?

Remember, just because a compiler accepts it doesn't make it
right!  Any takers on this one?

	Stephen L. Browning
	AT&T CPL
	inuxh!slb

chris@umcp-cs.UUCP (02/03/84)

"Hello\0, world" is one string.  If I say

	write (1, "Hello\0, world", 13);

I expect 13 characters to be written, with NUL and all.

(One might actually have a reason to depend on this -- say you were
storing ``very small'' integers in a char array.  You could use:

	char permute[] = "\2\1\0\9\8\7\6\3\4\5";

to get a weird permutation table.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris.umcp-cs@CSNet-Relay

wolfe@mprvaxa.UUCP (Peter Wolfe) (02/03/84)

As usually happens when you type first and investigate and think later,
problems result.  My apologies for that ridiculous bit about sizeof "string".
Of course the compiler that produces "2" for sizeof "string" is in error.

Resolved: think first, find out the truth, then type.

PS.
	Enough about NULL and 0 and pointers etc.  The language isn't 
	perfect.  How about somebody who knows more about the proposed
	ANSI standard giving us more details.

-- 

    Peter Wolfe
    Microtel Pacific Research
    ..decvax!microsoft!ubc-vision!mprvaxa!wolfe

geoff@proper.UUCP (Geoff Kuenning) (02/06/84)

>	char permute[] = "\3\2\1\9\8\7...

Naughty, naughty!  In the FIRST place, \9 is NOT a legal octal value.  In
the second place, if you are initializing a "char" array to binary numbers,
rather than characters, you should use:

	char permute[] = {3, 2, 1, 9, 8, 7...

chris@umcp-cs.UUCP (02/08/84)

Ok, so "\9\8\7" is a bit weird.  (It just happens to work though.)
But in fact you might want to put a \0 in a real string.  How about
a new example:

char translate[] = "these\0those\0this\0that\0";

The two "\0"s in a row are an end marker.  Someone might then write
something to translate words ("these" => "those", "this" => "that").
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris.umcp-cs@CSNet-Relay

keesan@bbncca.ARPA (Morris Keesan) (02/10/84)

----------------------------

    slb@inuxh.UUCP (Stephen Browning) asks,

>	Is "Hello\0, world." a string, or is it two strings?
>	Put another way, is '\0' a legal character to embed
>	within a string?

The answer is "Yes, of course," to both questions.  Section 2.5 of the C
Reference Manual (p. 181 of K&R) says "A string is a sequence of characters
surrounded by double quotes," and "The compiler places a null byte \0 at the end
of each string so that programs which scan the string can find its end." This
puts absolutely no restrictions on the contents of a string, except that the
last character will always be '\0'.  The idea that a NUL character always
indicates the end of a string is strictly a matter of convention.  One should
not confuse definitions used by library routines with the definition of a
language.  In particular, the manual page string(3), which says, "The arguments
. . . point to strings (arrays of characters terminated by a null character),"
should be ignored for the purposes of any discussion of the definition of C. 
-- 
					Morris M. Keesan
					{decvax,linus,wjh12}!bbncca!keesan
					keesan @ BBN-UNIX.ARPA