jack@rlgvax.UUCP (Jack Waugh) (01/27/84)
I have seen at least one compiler (I forget which) that gave 2 (the size of a pointer on that machine) as the size of a string. So it isn't a reliable portable practice to use sizeof on strings. Jack Waugh
mark@elsie.UUCP (01/27/84)
There is a subtle difference between arrays and pointers to arrays. Take the
following test program:
char dbuf[] = "1234567890";
main()
{
    char buf[10];
    char *cbuf = "1234567890";
    printf("bufsz= %d, cbufsz= %d, dbufsz= %d\n",
        sizeof buf, sizeof cbuf, sizeof dbuf);
}
outputs: "bufsz= 10, cbufsz= 4, dbufsz= 11" (we have a 32 bit machine).
"buf[10]" allocates 10 bytes of space, with "buf" pointing to the first
location. "*cbuf" allocates a pointer, cbuf, that points to the string
"1234567890"; "sizeof cbuf" thus is the size of the pointer. "dbuf[] = .."
(which must be an external to be initialized) is an array containing the
string "1234567890\0" (the NUL at the end of the string is the 11th byte).
dbuf and buf are the same, while cbuf is different, at least as far as
sizeof is concerned. Note also that "&cbuf" is a meaningful construct while
"&buf" and "&dbuf" are not. The important point to note here is that while
arrays and pointers to arrays are very similar, they are not the same.
--
UUCP: decvax!harpo!seismo!rlgvax!cvl!elsie!mark
Phone: (301) 496-5688
woods@hao.UUCP (01/28/84)
Of course it isn't a good idea to use sizeof on strings! It should properly return the size of a char pointer. That is why "strlen" is in the stdio library! GREG -- {ucbvax!hplabs | allegra!nbires | decvax!kpno | harpo!seismo | ihnp4!kpno} !hao!woods
stevens@inuxh.UUCP (W Stevens) (01/31/84)
On our system (Vax 11/780 running UNIX 5.0, a.k.a. System V), the
program:
main()
{
    printf("size of \"Hello world!\" is %d\n",
        sizeof("Hello world!"));
    exit(0);
}
prints the value 13. This agrees with K&R page 181: "A string has type
'array of characters' and storage class static ... and is initialized
with the given characters."
--
Scott Stevens
AT&T Consumer Products Laboratories
Indianapolis, Indiana, USA
UUCP: inuxh!stevens
The difficult didn't get done yesterday, so the impossible will have to wait.
wolfe@mprvaxa.UUCP (Peter Wolfe) (01/31/84)
Of course you got the size of a pointer as a result of doing sizeof "string". "string" is in fact a "pointer" to a constant string. I believe that sizeof is the only portable way to dtermine how large data types (eg. char's) are in C. -- Peter Wolfe Microtel Pacific Research ..decvax!microsoft!ubc-vision!mprvaxa!wolfe
jas@druxy.UUCP (ShanklandJA) (02/01/84)
A number of incorrect answers to this relatively simple question have been posted (as well as a roughly equal number of correct ones). This is an attempt to clarify the issue. Please, for those of you who disagree with what follows, READ THE PERTINENT SECTIONS OF K&R BEFORE POSTING YET ANOTHER INCORRECT ANSWER TO THE NET! All page numbers given below are references to K&R.

Peter Wolfe (mprvaxa!wolfe) says:

> Of course you got the size of a pointer as a result of doing sizeof "string". "string" is in fact a "pointer" to a constant string. I believe that sizeof is the only portable way to dtermine [sic] how large data types (eg. char's) are in C.

And Greg Woods (hao!woods) says:

> Of course it isn't a good idea to use sizeof on strings! It should properly return the size of a char pointer. That is why "strlen" is in the stdio library!

And Jack Waugh (rlgvax!jack) says:

> I have seen at least one compiler (I forget which) that gave 2 (the size of a pointer on that machine) as the size of a string. So it isn't a reliable portable practice to use sizeof on strings.

Folks, "string" is NOT in fact a pointer to a constant string. "string" has type array of char and storage class static (pg. 181). When a string is referenced in an expression, it, like any other array, is converted to a pointer to the first element of the array (pp. 94, 185). But sizeof, when applied to an array, yields the size of the array (pg. 188). sizeof the array "hello, world" is 13: enough room for the 12 characters in quotes plus a terminating '\0', which is what the compiler initializes the array to.

Yes, sizeof is the only portable way to determine how large data types are, but that has nothing to do with the question at hand.

strlen's presence in the stdio library has nothing to do with sizeof; strlen returns the size of a particular string: in essence, the distance in bytes from the pointer it is passed to the first occurrence of a null character ('\0'). strlen is a function called at run-time; sizeof is resolved at compile time. strlen( "hello\0, world" ) will return 5 (note the \0 in the middle); sizeof( "hello\0, world" ) should be compiled to the constant 14.

Finally, the fact that one C compiler somewhere said that sizeof( "hello, world" ) was 2 does not mean much; that is just a compiler bug. Saying that therefore, using sizeof( <string> ) is not reliable portable practice makes as much sense as saying that using the '+' operator is not reliable portable practice because someone once wrote a C compiler that incorrectly implemented addition.

Jim Shankland
..!ihnp4!druxy!jas
slb@inuxh.UUCP (Stephen Browning) (02/02/84)
After reading the concise and well referenced explanation of sizeof("Hello\0,world."), I was reminded of a question raised by the people at Ecosoft here in Indy while they were writing their C compiler. Is "Hello\0, world." a string, or is it two strings? Put another way, is '\0' a legal character to embed within a string? Remember, just because a compiler accepts it doesn't make it right! Any takers on this one?

Stephen L. Browning
AT&T CPL
inuxh!slb
chris@umcp-cs.UUCP (02/03/84)
"Hello\0, world" is one string. If I say

    write (1, "Hello\0, world", 13);

I expect 13 characters to be written, with NUL and all. (One might actually have a reason to depend on this -- say you were storing ``very small'' integers in a char array. You could use:

    char permute[] = "\2\1\0\9\8\7\6\3\4\5";

to get a weird permutation table.)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci
UUCP: {seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: chris.umcp-cs@CSNet-Relay
wolfe@mprvaxa.UUCP (Peter Wolfe) (02/03/84)
As usually happens when you type first and investigate and think later, problems result. My apologies for that ridiculous bit about sizeof "string". Of course the compiler that produces "2" for sizeof "string" is in error. Resolved: think first, type later, then find out the truth.

PS. Enough about NULL and 0 and pointers etc. The language isn't perfect. How about somebody who knows more about the proposed ANSI standard giving us more details?
--
Peter Wolfe
Microtel Pacific Research
..decvax!microsoft!ubc-vision!mprvaxa!wolfe
geoff@proper.UUCP (Geoff Kuenning) (02/06/84)
> char permute[] = "\3\2\1\9\8\7...
Naughty, naughty! In the FIRST place, \9 is NOT a legal octal value. In
the second place, if you are initializing a "char" array to binary numbers,
rather than characters, you should use:
char permute[] = {3, 2, 1, 9, 8, 7...
chris@umcp-cs.UUCP (02/08/84)
Ok, so "\9\8\7" is a bit weird. (It just happens to work though.) But in fact you might want to put a \0 in a real string. How about a new example:

    char translate[] = "these\0those\0this\0that\0";

The two "\0"s in a row are an end marker. Someone might then write something to translate words ("these" => "those", "this" => "that").
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci
UUCP: {seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: chris.umcp-cs@CSNet-Relay
keesan@bbncca.ARPA (Morris Keesan) (02/10/84)
----------------------------
slb@inuxh.UUCP (Stephen Browning) asks,

> Is "Hello\0, world." a string, or is it two strings?
> Put another way, is '\0' a legal character to embed
> within a string?

The answer is "Yes, of course," to both questions. Section 2.5 of the C Reference Manual (p. 181 of K&R) says "A string is a sequence of characters surrounded by double quotes," and "The compiler places a null byte \0 at the end of each string so that programs which scan the string can find its end." This puts absolutely no restrictions on the contents of a string, except that the last character will always be '\0'.

The idea that a NUL character always indicates the end of a string is strictly a matter of convention. One should not confuse definitions used by library routines with the definition of a language. In particular, the manual page string(3), which says, "The arguments . . . point to strings (arrays of characters terminated by a null character)," should be ignored for the purposes of any discussion of the definition of C.
--
Morris M. Keesan
{decvax,linus,wjh12}!bbncca!keesan
keesan @ BBN-UNIX.ARPA