[comp.lang.c] Initializing arrays of char

nreadwin@micrognosis.co.uk (Neil Readwin) (10/04/90)

 Can someone tell me why the following initializer is legal inside a 
 structure, but not outside it ? Or is it a compiler bug ?

struct foo {
	char x[5];
	} bar = {"12345"};

char baz[5] = "12345";

The VMS compiler barfs on the second one with 
 %CC-W-TRUNCSTRINIT, String initializer for "baz" contains
		too many characters to fit;  truncated.
		At line number 5 in CASSIUS:[NREADWIN.TMP]ZZ.C;4.

The SunOS compiler agrees
 "zz.c", line 5: too many initializers

gcc seems quite happy with both. 

I was unable to decrypt what K&R had to say on the matter - should the null
character appended to the string count as an initializer in both cases ?

 Disclaimer: 818  Phone: +44 71 528 8282  E-mail: nreadwin@micrognosis.co.uk
 W Westfield: Abstractions of hammers aren't very good at hitting real nails

poser@csli.Stanford.EDU (Bill Poser) (10/05/90)

In article <1990Oct4.152756.6850@micrognosis.co.uk> nreadwin@micrognosis.co.uk (Neil Readwin) writes:
>
> Can someone tell me why the following initializer is legal inside a 
> structure, but not outside it ? Or is it a compiler bug ?
>
>struct foo {
>	char x[5];
>	} bar = {"12345"};
>
>char baz[5] = "12345";

I would say that both are erroneous. The reason that you can't assign
"12345" to baz is that baz is an array of FIVE chars and the string "12345"
requires SIX characters, five for the five digits, and one for the
terminating null. The largest string (in the sense of "sequence of characters
terminated by a null") that you can put in baz is one four characters long.
For this reason, the structure initialization shouldn't work either.
Padding of the structure may allocate an additional byte so that
the assignment doesn't actually trash anything, but I don't see why the
compiler isn't checking the declared array size.

edgincd2@mentor.cc.purdue.edu (Chris Edgington *Computer Science Major*) (10/05/90)

In article <1990Oct4.152756.6850@micrognosis.co.uk>, nreadwin@micrognosis.co.uk (Neil Readwin) writes:
> 
>  Can someone tell me why the following initializer is legal inside a 
>  structure, but not outside it ? Or is it a compiler bug ?
> 
> struct foo {
> 	char x[5];
> 	} bar = {"12345"};
> 
> char baz[5] = "12345";
> 
When you request the compiler to allocate n chars in the array, you really
should only use the first 4 if you are going to be using the array as a 
string because one of the chars allocated is used for the NULL, which tells
the compiler where the end of the string is.  If you write over the NULL and 
then try to print the string, the compiler [runtime code] will just continue
printing until it encounters a NULL, signifying the end of the string.  
Therefore, to allocate ample space for your string "12345", you need to have
char baz[6].


   __                     __
  /  ) /                 /  `    /                _/_
 /    /_  __  o _       /--   __/ _,  o ____  _,  /  ________
(__/ / /_/ (_<_/_)_    (___, (_/_(_)_<_/ / <_(_)_<__(_) / / <_
                                  /|          /|
                                 |/          |/
Chris Edgington      edgincd2@mentor.cc.purdue.edu       Purdue University

poser@csli.Stanford.EDU (Bill Poser) (10/05/90)

Regarding the assignment of "12345" to char x[5] and struct{char x[5]},
I spoke too soon. K&R2 contains a detail I hadn't noticed, and am not
sure that I approve of. On p.219, in the discussion of initialization
of fixed size arrays by string constants, it states:

	...the number of characters in the string, NOT COUNTING
	THE TERMINATING NULL CHARACTER, must not exceed the
	size of the array. [emphasis mine]

This means that the assignment of "12345" to an array of five characters,
is legal. If K&R2 here reflects the standard, then both initializations
are legitimate.

This seems to me to be a bad idea. Everywhere else, one has to take
into account the terminating null. For example, x[5] = 'a' is
an error. Not counting the terminating null here is inconsistent.
Can anyone explain this decision?

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/05/90)

In article <14796@mentor.cc.purdue.edu> edgincd2@mentor.cc.purdue.edu (Chris Edgington *Computer Science Major*) writes:
> In article <1990Oct4.152756.6850@micrognosis.co.uk>, nreadwin@micrognosis.co.uk (Neil Readwin) writes:
> > char baz[5] = "12345";
  [ explanation ]

More to the point, an alert reader will notice that you haven't
accounted for the NULL. Whether or not that's legal, you should always
treat non-string (i.e., non-NULL-terminated) character arrays as real
arrays with no relation between consecutive characters. Something like

  char baz[5] = { '1', '2', '3', '4', '5' };

This expresses your intent much more clearly.

> Therefore, to allocate ample space for your string "12345", you need to have
> char baz[6].

Only if you really do mean it that way---but from your article you
obviously know how many characters to allocate for a NULL-terminated
string, so you wouldn't be asking if that were the answer. (For those
new to C, the easy way to allocate a string is char baz[] = "12345";.)

Am I reading your mind correctly? :-)

---Dan

bengsig@oracle.nl (Bjorn Engsig) (10/05/90)

re:  char mesg[5] = "help!"; /* what about the null terminator? */

The ANSI standard says (3.5.7):

"Successive characters of the character string literal (including the 
terminating null character if there is room or if the array is of unknown size)
initialize the elements of the array."

and the rationale mentions:

"(Some widely used implementations provide precedent.)"

further, it fits well with the way strncpy() works.
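
The strncpy() comparison can be made concrete. This is a minimal sketch, assuming an ANSI compiler; the function name is invented for illustration. When the source is at least n characters long, strncpy() writes exactly n bytes and no terminator, which is the same result the exact-fit initializer gives:

```c
#include <assert.h>
#include <string.h>

/* Fills copy[0..4] from "12345" with strncpy and reports whether the
   result matches an exact-fit initializer byte for byte. */
int strncpy_matches_initializer(void)
{
    char init[5] = "12345";     /* no room, so no '\0' is stored */
    char copy[6];

    copy[5] = 'X';              /* sentinel just past the 5 copied bytes */
    strncpy(copy, "12345", 5);  /* source length == n: no '\0' written */

    return memcmp(init, copy, 5) == 0 && copy[5] == 'X';
}
```

The sentinel byte staying 'X' is what shows that strncpy() did not append a terminator.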
-- 
Bjorn Engsig,	Domain:		bengsig@oracle.nl, bengsig@oracle.com
		Path:		uunet!mcsun!orcenl!bengsig
		From IBM:	auschs!ibmaus!cs.utexas.edu!uunet!oracle!bengsig

volpe@underdog.crd.ge.com (Christopher R Volpe) (10/05/90)

In article <15674@csli.Stanford.EDU>, poser@csli.Stanford.EDU (Bill
Poser) writes:
|>Regarding the assignment of "12345" to char x[5] and struct{char x[5]},
|>I spoke too soon. K&R2 contains a detail I hadn't noticed, and am not
|>sure that I approve of. On p.219, in the discussion of initialization
|>of fixed size arrays by string constants, it states:
|>
|>	...the number of characters in the string, NOT COUNTING
|>	THE TERMINATING NULL CHARACTER, must not exceed the
|>	size of the array. [emphasis mine]
|>
|>This means that the assignment of "12345" to an array of five characters,
|>is legal. If K&R2 here reflects the standard, then both initializations
|>are legitimate.
|>
|>This seems to me to be a bad idea. Everywhere else, one has to take
|>into account the terminating null. For example, x[5] = 'a' is
|>an error. Not counting the terminating null here is inconsistent.
|>Can anyone explain this decision?

The fact that x[5] = 'a' is an error has nothing to do with 
any terminating null. It's an error because x[5] doesn't exist. 
The array has elements x[0] through x[4]. 

There may be situations where you just want an array of characters, and
DON'T want a "string" (null terminated). Thus, you have the capability of
creating a five-byte array of char and initializing it with "abcde" and
a six-byte string and initializing it with "abcde" also. If you don't
like having to remember to allocate space for the terminating null when
declaring the array, let the compiler do it for you:
      char x[] = "abcde";
will create an array of six chars and initialize it, including the
terminating null.                 
==================
Chris Volpe
G.E. Corporate R&D
volpecr@crd.ge.com

chris@mimsy.umd.edu (Chris Torek) (10/05/90)

In article <15674@csli.Stanford.EDU> poser@csli.Stanford.EDU
(Bill Poser) writes:
>Regarding the assignment of "12345" to char x[5] ... [K&R 2 says]
>	...the number of characters in the string, NOT COUNTING
>	THE TERMINATING NULL CHARACTER, must not exceed the
>	size of the array. [emphasis mine]
>Can anyone explain [why the ending '\0' is not counted]?

This is a change in New (ANSI) C.  In Classic (K&R-1) C, a
double-quoted string in an initializer context%, when setting the
initial value of a character array, was treated uniformly as if it were
a bracketed initializer consisting of all the characters, including
the terminating NUL, in the string.  That is,

	char x[5] = "12345";

meant exactly the same thing as

	char x[5] = { '1', '2', '3', '4', '5', '\0' };

(and was therefore in error, having too many characters).

The X3J11 committee decided# that this was overly restrictive, and
relaxed the rule to `is equivalent to a bracketed initializer
consisting of all the characters, including the terminating NUL if it
fits'.  Thus

	char x[] = "12345";

means the same as

	char x[] = { '1', '2', '3', '4', '5', '\0' };

or

	char x[6] = { '1', '2', '3', '4', '5', '\0' };

but

	char x[5] = "12345";

now means the same as

	char x[5] = { '1', '2', '3', '4', '5' };

If the declaration is changed to

	char x[4] = "12345";

it is once again in error.
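
The cases above can be checked with sizeof and memcmp(). This is a minimal sketch, assuming an ANSI-conforming compiler; the variable and function names are invented for illustration:

```c
#include <assert.h>
#include <string.h>

/* Under the ANSI rule the '\0' is stored only when it fits. */
static char open_size[]  = "12345";  /* 6 elements: digits plus '\0' */
static char exact_fit[5] = "12345";  /* 5 elements: no '\0' stored */
static char by_hand[5]   = { '1', '2', '3', '4', '5' };

/* Returns 1 if the three declarations behave as described. */
int ansi_rule_holds(void)
{
    return sizeof open_size == 6
        && open_size[5] == '\0'
        && sizeof exact_fit == 5
        && memcmp(exact_fit, by_hand, sizeof by_hand) == 0;
}
```

A Classic C compiler would be expected to reject the exact_fit line rather than return 0 here.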

-----
% Note that here (in an initializer context) and as an argument to
  sizeof (e.g., `sizeof "abc"') are the only two places that a double
  quoted string does not undergo the usual `array degenerates into
  pointer' rule.  All other legal occurrences of a double-quoted string
  are in a value context, and therefore change from `array N of char'
  to `pointer to char', pointing to the first character in the string.

# This wording is not meant to imply judgement as to this decision.
  (When I do not take a stand on some aspect of the language I use
  weasel-wording like `seems to be' or merely present bare facts.)
  Since I use old compilers, I have not made up my mind on this.  I
  am leaning towards the `not a bad idea after all' faction.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

jim@jagmac2.gsfc.nasa.gov (Jim Jagielski) (10/05/90)

In article <9418:Oct503:06:2790@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <14796@mentor.cc.purdue.edu> edgincd2@mentor.cc.purdue.edu (Chris Edgington *Computer Science Major*) writes:
>> In article <1990Oct4.152756.6850@micrognosis.co.uk>, nreadwin@micrognosis.co.uk (Neil Readwin) writes:
>> > char baz[5] = "12345";
>  [ explanation ]
>
>> Therefore, to allocate ample space for your string "12345", you need to have
>> char baz[6].
>
>Only if you really do mean it that way---but from your article you
>obviously know how many characters to allocate for a NULL-terminated
>string, so you wouldn't be asking if that were the answer. (For those
>new to C, the easy way to allocate a string is char baz[] = "12345";.)
>                                               --------------------
                                                   ^
Let's clarify the above ---------------------------|


You forget that doing "char baz[] = "12345";" is the same as:

	char baz[] = { '1','2','3','4','5','\0' };

It appears in the above that you imply that doing char baz[] = "12345";
would result in a non-NULL-terminated string -- this is not correct.

Of course, I may not be reading your mind right :)

In any case, recall that C will always append the '\0' to any string constant.
If you don't want \0 in there, either copy up to the NULL (strncpy) or 
use characters ('a', etc...)
--
=======================================================================
#include <std/disclaimer.h>
                                 =:^)
           Jim Jagielski                    NASA/GSFC, Code 711.1
     jim@jagmac2.gsfc.nasa.gov               Greenbelt, MD 20771

"Kilimanjaro is a pretty tricky climb. Most of it's up, until you reach
 the very, very top, and then it tends to slope away rather sharply."

defaria@hpclapd.HP.COM (Andy DeFaria) (10/05/90)

>/ hpclapd:comp.lang.c / poser@csli.Stanford.EDU (Bill Poser) /  6:16 pm  Oct  4, 1990 /
>Regarding the assignment of "12345" to char x[5] and struct{char x[5]},
>I spoke too soon. K&R2 contains a detail I hadn't noticed, and am not
>sure that I approve of. On p.219, in the discussion of initialization
>of fixed size arrays by string constants, it states:
>
>	...the number of characters in the string, NOT COUNTING
>	THE TERMINATING NULL CHARACTER, must not exceed the
>	size of the array. [emphasis mine]
>
>This means that the assignment of "12345" to an array of five characters,
>is legal. If K&R2 here reflects the standard, then both initializations
>are legitimate.
>
>This seems to me to be a bad idea. Everywhere else, one has to take
>into account the terminating null. For example, x[5] = 'a' is
>an error. Not counting the terminating null here is inconsistent.
>Can anyone explain this decision?
>----------

It seems to me (and I am by no stretch of the imagination a C expert) that
K&R C is saying "Sure you can use all 5 characters for a legitimate string.
You can manipulate them any way you want.  You might be using it to contain
a fixed length string of 5 characters.  But don't you  ever  try  to use it
with any string procedures (strlen, or even printf's %s operator) or expect
to get burned!"
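
If the array really is a fixed-length field rather than a string, the length has to travel with it. A sketch under that assumption (field_len and field_text are hypothetical helpers, not library functions): memchr() bounds the search for a null, and a printf precision bounds how many bytes %s will read.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char baz[5] = "12345";   /* fixed-length field, not a string */

/* Length of the text in a fixed field: stops at '\0' or at the end. */
size_t field_len(const char *field, size_t size)
{
    const char *nul = memchr(field, '\0', size);
    return nul ? (size_t)(nul - field) : size;
}

/* Returns the field as a real, terminated string in a static buffer. */
const char *field_text(const char *field, size_t size)
{
    static char out[64];
    size_t n = field_len(field, size);

    if (n > sizeof out - 1)
        n = sizeof out - 1;
    /* The precision caps how many bytes %s reads from field. */
    sprintf(out, "%.*s", (int)n, field);
    return out;
}
```

With these, field_text(baz, sizeof baz) yields "12345" without ever reading past baz[4], whereas strlen(baz) would.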

henry@zoo.toronto.edu (Henry Spencer) (10/06/90)

In article <1990Oct4.152756.6850@micrognosis.co.uk> nreadwin@micrognosis.co.uk (Neil Readwin) writes:
> Can someone tell me why the following initializer is legal inside a 
> structure, but not outside it ? Or is it a compiler bug ?
>
>struct foo {
>	char x[5];
>	} bar = {"12345"};

It's a compiler bug.  ANSI C, 3.5.7 (emphasis added):

	An array of character type may be initialized by a character
	string literal, optionally enclosed in braces.  Successive
	characters of the character string literal (including the
	terminating null character *if there is room* or if the array
	is of unknown size) initialize the elements of the array.

Your compilers are assuming that "12345" has six characters in it, which
is correct in general, but for this oddball special case in initializers
the terminating null is present only if there is room for it.
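
For the original question, the struct member and the plain array should come out the same. This is a small sketch, assuming a compiler that follows the quoted 3.5.7 wording; same_inside_and_outside is an invented name:

```c
#include <assert.h>
#include <string.h>

struct foo {
    char x[5];
};

/* Both initializers fill all five elements and store no '\0'. */
static struct foo bar = { "12345" };
static char baz[5] = "12345";

/* Returns 1 when member and plain array hold the same five bytes. */
int same_inside_and_outside(void)
{
    return sizeof bar.x == sizeof baz
        && memcmp(bar.x, baz, sizeof baz) == 0;
}
```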
-- 
Imagine life with OS/360 the standard  | Henry Spencer at U of Toronto Zoology
operating system.  Now think about X.  |  henry@zoo.toronto.edu   utzoo!henry

henry@zoo.toronto.edu (Henry Spencer) (10/06/90)

In article <15674@csli.Stanford.EDU> poser@csli.stanford.edu (Bill Poser) writes:
>This means that the assignment of "12345" to an array of five characters,
>is legal. If K&R2 here reflects the standard, then both initializations
>are legitimate.

It does; they are.

>This seems to me to be a bad idea. Everywhere else, one has to take
>into account the terminating null...
>... Not counting the terminating null here is inconsistent.
>Can anyone explain this decision?

It's a special case because this form of initializer is a special case.
Normally, assigning a string of any length to an array of char would be
illegal.

jbickers@templar.actrix.co.nz (John Bickers) (10/06/90)

Quoted from - poser@csli.Stanford.EDU (Bill Poser):
> 	...the number of characters in the string, NOT COUNTING
> 	THE TERMINATING NULL CHARACTER, must not exceed the
> 	size of the array. [emphasis mine]

> an error. Not counting the terminating null here is inconsistent.
> Can anyone explain this decision?

    Sounds like this is intended to allow a nice way to initialize
    character arrays that aren't necessarily strings.

    Like, say, a 4 character ID in a structure, that is meant to be
    compared and writ with things like mem... or strn...

    Consider that a character array is not necessarily going to be used as
    a "string", and since C doesn't distinguish between the two with any
    sort of type keyword, it's better to provide for the more general case.
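
    A sketch of the 4-character ID case follows. The struct layout, the
    "FORM" tag and the chunk_is name are all hypothetical, chosen only
    to illustrate; the point is that the tag is compared with memcmp()
    over a known length and is never treated as a string.

```c
#include <assert.h>
#include <string.h>

struct chunk {
    char id[4];             /* four-character tag, never '\0'-terminated */
    unsigned long length;
};

/* "FORM" fills id exactly, so no terminating null is stored. */
static struct chunk header = { "FORM", 0 };

/* Compares the tag against a name of at least 4 characters. */
int chunk_is(const struct chunk *c, const char *name)
{
    return memcmp(c->id, name, sizeof c->id) == 0;
}
```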

    Does lint warn about this sort of thing?
--
*** John Bickers, TAP, NZAmigaUG.         jbickers@templar.actrix.co.nz ***
***          "All I can do now is wait for the noise." - Numan          ***

cpcahil@virtech.uucp (Conor P. Cahill) (10/06/90)

In article <1990Oct4.152756.6850@micrognosis.co.uk> nreadwin@micrognosis.co.uk (Neil Readwin) writes:
>
> Can someone tell me why the following initializer is legal inside a 
> structure, but not outside it ? Or is it a compiler bug ?

While the "legality" is questionable, so is the "correct" behaviour.

My pcc compiler accepts it, but only takes the first 5 items (of course,
this may not be obvious in a test because of alignment considerations, but
when you use 8 as the dimension and "12345678" as the initializer, you 
will see a problem).

For example:

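	#include <stdio.h>

	char b[8] = "12345678";	/* exact fit: no terminating null stored */
	char c[8] = "1234";
	int main(void)
	{
		/* %s on b runs off the end of b; that is the point here */
		printf("b = 0x%lx (%s), c = 0x%lx (%s)\n",
		    (unsigned long)b, b, (unsigned long)c, c);
		return 0;
	}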

The output of which is:

	b = 0x400acc (123456781234), c = 0x400ad4 (1234)

Anyway, the compiler *should* complain about it in both cases, but in many
cases will silently do the truncation.

Playing with it a bit more shows that both GCC and pcc will complain 
about it if the next byte in the initialization string is not null.

For example:

 char baz[5] = "123456";

will get a warning about the initializer string being too long (from gcc)
or "non-null byte ignored in string initializer" from pcc.

>I was unable to decrypt what K&R had to say on the matter - should the null
>character appended to the string count as an initializer in both cases ?

No cases should copy the null terminator.  They should not copy any 
more bytes than is specified in the array dimension.  The fact that you
chose a count of 5 will usually result in some alignment bytes between
each variable/structure and hence it appears that the null was copied. 
This is not the case.  Only the first 5 bytes would be copied.

-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170 

cpcahil@virtech.uucp (Conor P. Cahill) (10/06/90)

In article <15674@csli.Stanford.EDU> poser@csli.stanford.edu (Bill Poser) writes:
>This means that the assignment of "12345" to an array of five characters,
>is legal. If K&R2 here reflects the standard, then both initializations
>are legitimate.

While it is "legal" it still should get a warning since it is doing something
that you may not expect.

>This seems to me to be a bad idea. Everywhere else, one has to take
>into account the terminating null. For example, x[5] = 'a' is
>an error. Not counting the terminating null here is inconsistent.

This has nothing to do with a terminating null.  x[5] is illegal because
you are accessing an element beyond the end of the array (assuming it
was declared as char x[5]).

>Can anyone explain this decision?

Probably because that was the existing standard (the way C has worked all
along).  

Another way to look at this is that "char x[dim];" declares an array
of characters, not a character string.  So the null need not be there and
without this rule you couldn't initialize the last element of the 
array to be a non-null.


dan@kfw.COM (Dan Mick) (10/06/90)

In article <9418:Oct503:06:2790@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>More to the point, an alert reader will notice that you haven't
>accounted for the NULL. 
                   ^^^^
Argh.  Dan, I'm shocked.

That's NUL.  NULL is a pointer.  NUL is a character.

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/06/90)

In article <1990Oct6.011240.8538@kfw.COM> dan@kfw.com (Dan Mick) writes:
> In article <9418:Oct503:06:2790@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >More to the point, an alert reader will notice that you haven't
> >accounted for the NULL. 
> Argh.  Dan, I'm shocked.
> That's NUL.  NULL is a pointer.  NUL is a character.

Only for people who think in C. I learned from Knuth, and I still write
/\ (well, can't really do a capital lambda on a non-APL keyboard) when I
think of the null/nil/meaningless pointer.

The null character is 0. Meaning 3 in my dictionary...

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/06/90)

In article <3568@dftsrv.gsfc.nasa.gov> jim@jagmac2.gsfc.nasa.gov (Jim Jagielski) writes:
> You forget that doing "char baz[] = "12345";" is the same as:
> 	char baz[] = { '1','2','3','4','5','\0' };

That's what I said. Null-terminated character arrays are called strings,
and my point was that the original poster was *not* asking about them.

Again, the right way to initialize a five-element character array is to
list the five characters explicitly:

  char baz[5] = { '1', '2', '3', '4', '5' } ;

If you use "12345", you'll confuse the reader (not to mention any old
compilers) into thinking that you really want a (0-terminated) string.

---Dan

henry@zoo.toronto.edu (Henry Spencer) (10/07/90)

In article <21149:Oct604:52:2190@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>> That's NUL.  NULL is a pointer.  NUL is a character.
>
>Only for people who think in C. I learned from Knuth, and I still write
>/\ (well, can't really do a capital lambda on a non-APL keyboard) when I
>think of the null/nil/meaningless pointer.
>
>The null character is 0. Meaning 3 in my dictionary...

The capitalization here is significant.  NULL is a name for 0, used when
discussing null [note lower case] pointers in C.  NUL is the official
ASCII name for the character with bit pattern 0000000, often used as a
null character.

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (10/07/90)

In article <26860@mimsy.umd.edu>, chris@mimsy.umd.edu (Chris Torek) writes:
[char x[5] = "12345";]
> is a change in New (ANSI) C.
and provides a lucid explanation.  He further says
>   Since I use old compilers, I have not made up my mind on this.  I
>   am leaning towards the `not a bad idea after all' faction.

Data point:  the annotated C++ reference manual explicitly says that this
feature has _not_ been accepted for C++.  I don't know what the C++
standard will say; I'm sure there will be big fights over whether it is
better to be close to the C++ base document or the C standard.  At any
rate, for now, C code using this feature will not port to C++.

-- 
Fear most of all to be in error.	-- Kierkegaard, quoting Socrates.

msb@sq.sq.com (Mark Brader) (10/08/90)

> Again, the right way

(as a point of style, he means)

> to initialize a five-element character array is to
> list the five characters explicitly:
>      char baz[5] = { '1', '2', '3', '4', '5' } ;
> If you use "12345", you'll confuse the reader (not to mention any old
> compilers) into thinking that you really want a (0-terminated) string.

This may be true if you're dealing with 5-character arrays, but it
fails as soon as there are too many initializers to count by eye.
Suppose it was:

	char parity[64] = 
	    "EOOEOEEOOEEOEOOEOEEOEOOEEOOEOEEOOEEOEOOEEOOEOEEOEOOEOEEOOEEOEOOE";

It's obvious from its content that this is not a string to be printed,
so the absence of a trailing null* should not cause confusion.  You could
add a one-line comment if you really must.  But I would (mildly) prefer
to see the one line above than four lines of 'E', 'O', 'O', 'E', 'O', ...
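
The table reads as the bit parity of its own index (E for an even number of 1 bits, O for odd); that interpretation is an assumption, since the post does not say what the table is for. Under it, a short sketch can confirm that the 64-character literal fills the 64-element array exactly and that every entry is right. The names slow_parity and parity_table_ok are invented:

```c
#include <assert.h>

/* 64 elements, 64 characters: the literal fills the array exactly,
   so no terminating null is stored. */
static char parity[64] =
    "EOOEOEEOOEEOEOOEOEEOEOOEEOOEOEEOOEEOEOOEEOOEOEEOEOOEOEEOOEEOEOOE";

/* Parity of the low six bits of n, computed the slow way. */
char slow_parity(unsigned n)
{
    int ones = 0;
    int bit;

    for (bit = 0; bit < 6; bit++)
        ones += (int)((n >> bit) & 1u);
    return (ones % 2) ? 'O' : 'E';
}

/* Returns 1 if every table entry agrees with slow_parity(). */
int parity_table_ok(void)
{
    unsigned n;

    for (n = 0; n < sizeof parity; n++)
        if (parity[n] != slow_parity(n))
            return 0;
    return 1;
}
```

A check like this also catches a miscounted literal, which is the real risk with a table too long to count by eye.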

*or NUL, or '\0', but, please, never NULL.
-- 
Mark Brader			"Metal urgy.  The urge to use metals.
SoftQuad Inc., Toronto		 That was humans, all right."
utzoo!sq!msb, msb@sq.com			-- Terry Pratchett: Truckers

This article is in the public domain.

bengsig@oracle.nl (Bjorn Engsig) (10/08/90)

Article <26860@mimsy.umd.edu> by chris@mimsy.umd.edu (Chris Torek) says:
|
|  In Classic (K&R-1) C, a
|double-quoted string in an initializer context%, when setting the
|initial value of a character array, was treated uniformly as if it were
|a bracketed initializer consisting of all the characters, including
|the terminating NUL, in the string.
Yes, it seems to me that K&R1 says so - even if I would say it didn't.  The
rationale for ANSI C says that accepting 'char x[2] = "ab"' (omitting the NUL)
is due to widely existing practice.  This is in fact true, at least I have 
seen many Classic C compilers that allowed it and didn't warn about it.
Since K&R1 seems to be clear, how come the compilers accepted it?  Or
does K&R1 actually hide it somewhere?

As a comment to another note in this thread that string functions shouldn't be
used with non NUL terminated strings; strncpy is actually designed to work
with non NUL terminated fixed length strings, and you will normally use
'x[0]=0; strncat(x,s,n)' if you want a limited NUL terminated copy of strings,
whereas 'strncpy(x,s,n)' may yield surprises.
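
The difference between the two calls can be sketched as follows. bounded_copy and copy_demo are invented names, and the destination is assumed to have room for n + 1 bytes:

```c
#include <assert.h>
#include <string.h>

/* Copies at most n characters of s into x and always terminates.
   x must have room for n + 1 bytes. */
char *bounded_copy(char *x, const char *s, size_t n)
{
    x[0] = '\0';
    return strncat(x, s, n);
}

/* Returns 1 if the two calls leave the results the comments claim. */
int copy_demo(void)
{
    char a[6], b[6];

    memset(a, 'X', sizeof a);
    memset(b, 'X', sizeof b);
    bounded_copy(a, "123456789", 5);  /* "12345" plus '\0' */
    strncpy(b, "123456789", 5);       /* "12345", b[5] is still 'X' */

    return a[5] == '\0' && strcmp(a, "12345") == 0
        && memcmp(b, "12345", 5) == 0 && b[5] == 'X';
}
```

The surprise with strncpy() is the untouched b[5]: the copy is truncated but not terminated.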
-- 
Bjorn Engsig,	       E-mail: bengsig@oracle.com, bengsig@oracle.nl
ORACLE Corporation   From IBM: auschs!ibmaus!cs.utexas.edu!uunet!oracle!bengsig

	"Stepping in others footsteps, doesn't bring you ahead"

flint@gistdev.gist.com (Flint Pellett) (10/08/90)

chris@mimsy.umd.edu (Chris Torek) writes:

>In article <15674@csli.Stanford.EDU> poser@csli.Stanford.EDU
>(Bill Poser) writes:
>>Regarding the assignment of "12345" to char x[5] ... [K&R 2 says]
>>	...the number of characters in the string, NOT COUNTING
>>	THE TERMINATING NULL CHARACTER, must not exceed the
>>	size of the array. [emphasis mine]
>>Can anyone explain [why the ending '\0' is not counted]?

>This is a change in New (ANSI) C.  In Classic (K&R-1) C, a
>double-quoted string in an initializer context%, when setting the
>initial value of a character array, was treated uniformly as if it were
>a bracketed initializer consisting of all the characters, including
>the terminating NUL, in the string.  That is,

>	char x[5] = "12345";

>meant exactly the same thing as

>	char x[5] = { '1', '2', '3', '4', '5', '\0' };

>(and was therefore in error, having too many characters).

On AT&T 3B2 machines about 2-3 years ago, it did not produce a compile
error: I know, I lived through it.  See story below.

>The X3J11 committee decided# that this was overly restrictive, and
>relaxed the rule to `is equivalent to a bracketed initializer
>consisting of all the characters, including the terminating NUL if it
>fits'.  Thus

IMHO the committee blew it: their decision lets a programmer who will
only use a string in a non-null terminated manner (like with strncpy)
save 1 lousy byte, and opens the door for a ton of mistakes to get through.
I imagine their main motivation was compatibility, but I think this is
still a mistake: if I write it as a double quoted string, _I_ mean that
I want it null terminated.

Here is a real life example of the impact of this decision: for
about a week we had a 3B2 machine which kept crashing about once an
hour because of this!  We finally traced the problem through this chain,
at a cost of 20 minutes per reboot and anywhere from 10 minutes to several
hours chasing the problem at each step.
1. It always crashed because it ran out of swap space.
2. It was incorrectly set up so that one user could use up all the swap.
3. One particular program was always running when it crashed.
4. Performance hit bottom when that program was run, and you couldn't
   abort the program without killing it from another terminal.
5. Only certain functions within the program caused the crash.
6. We were able to keep the system from crashing by retuning, but we
   still had performance problems, and this program wasn't working: it
   appeared to be in an infinite loop.
7. The critical routine that killed us (reduced to the part that mattered)
   eventually was this:

char foo[5] = "abcde";	/* NOTE: no room for terminating '\0' char */
char bar[]  = "fghi";	/* NOTE: declared immediately behind the foo array */
sprintf(bar,"%s",foo);	/* copy foo into bar: other tweaking omitted */

The problem was introduced by a maintenance change correction to the
string in foo, making it 1 longer but forgetting to fix the length of 5.
That, coupled with the fact that array bar followed immediately behind
array foo, which no longer was NUL terminated, turned the sprintf into
an infinite loop chasing its own tail.

If C thinks this feature is useful, they __at least__ ought to generate
a warning message, because 99 times out of 100 it's going to be a bug,
not an intended use, and it is VERY hard to spot an error of this nature
when looking at the code-- it "looks" right.
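
Short of a compiler warning, the mistake can be caught by comparing sizes at compile time. This is only a sketch: FITS_WITH_NUL, the array names and the typedef trick are assumptions for illustration, not something any compiler of the day provided. The array contents follow the story above (a 4-character string that grew to 5).

```c
#include <assert.h>
#include <string.h>

/* Nonzero when literal lit, including its '\0', fits in array arr.
   Both operands must be real arrays, not pointers. */
#define FITS_WITH_NUL(arr, lit) (sizeof (lit) <= sizeof (arr))

static char foo[5]  = "abcd";    /* original string: fits with its '\0' */
static char foo2[5] = "abcde";   /* after the edit: '\0' silently dropped */
static char foo3[]  = "abcde";   /* letting the compiler count avoids it */

/* Compile-time guard: a negative array size stops the build. */
typedef char foo_has_room[FITS_WITH_NUL(foo, "abcd") ? 1 : -1];

/* Returns 1 if the three cases come out as the comments claim. */
int guard_demo(void)
{
    return FITS_WITH_NUL(foo, "abcd")
        && !FITS_WITH_NUL(foo2, "abcde")
        && FITS_WITH_NUL(foo3, "abcde")
        && memchr(foo2, '\0', sizeof foo2) == NULL;
}
```

Writing the same typedef for foo2 with "abcde" would fail to compile, which is the warning asked for above.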
-- 
Flint Pellett, Global Information Systems Technology, Inc.
1800 Woodfield Drive, Savoy, IL  61874     (217) 352-1165
uunet!gistdev!flint or flint@gistdev.gist.com

lerman@stpstn.UUCP (Ken Lerman) (10/08/90)

In article <15674@csli.Stanford.EDU> poser@csli.stanford.edu (Bill Poser) writes:
->Regarding the assignment of "12345" to char x[5] and struct{char x[5]},
->I spoke too soon. K&R2 contains a detail I hadn't noticed, and am not
->sure that I approve of. On p.219, in the discussion of initialization
->of fixed size arrays by string constants, it states:
->
->	...the number of characters in the string, NOT COUNTING
->	THE TERMINATING NULL CHARACTER, must not exceed the
->	size of the array. [emphasis mine]
->
->This means that the assignment of "12345" to an array of five characters,
->is legal. If K&R2 here reflects the standard, then both initializations
->are legitimate.
->
->This seems to me to be a bad idea. Everywhere else, one has to take
->into account the terminating null. For example, x[5] = 'a' is
->an error. Not counting the terminating null here is inconsistent.
->Can anyone explain this decision?

I can't explain the decision, but I can understand that it might be
useful.  It does make sense to have an array of characters in the same
sense that one has an array of integers.  In that case, if one knows
the length, there should be no requirement that a character with the
value 0 be stored to signify the end.

It does seem to be an opportunity for error, though.

Ken

karl@haddock.ima.isc.com (Karl Heuer) (10/09/90)

In article <26860@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
>[Allowing `char x[5]="12345";' is new to ANSI C.]

True, and therefore the answer to the original question is "failure to accept
this is a compiler bug iff your compiler claims ANSI conformance".

I opposed this feature (preferring to leave it a Common Extension, which was
its pre-ANSI status) because I had a counterproposal (enclosed for your
reading pleasure) that I think was cleaner and more general.  Unfortunately, I
didn't have existing practice on my side, and it was rejected.

Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint
--------cut here--------
Proposal #1

Add new escape sequence \c.

Summary

This proposal cleans up two warts in the language: initializing a
character  array without adding a null character, and terminating
a  hexadecimal  escape  which  might  be  followed  by  a   valid
hexadecimal  digit.   It  also  allows  the  user  to  explicitly
document   when   a   null   character   is   unnecessary,   e.g.
write(1,"\n\c",1).

Justification

I presume the Committee is already aware of  the  need  for  non-
null-terminated character arrays, since the January Draft makes a
special case for them in S3.5.7.  However, the mechanism requires
the  user  to  count the characters himself in order to make sure
that he doesn't leave room for the null  characters;  this  is  a
maintenance   nightmare.    My  proposal  is  a  cleaner  way  to
accomplish this.

It has been suggested that although an  escape  to  suppress  the
null  character  is useful, the termination of hex escapes is not
an issue because it is handled by string literal pasting.

String  pasting  is  useful   for   line   continuation   without
backslash-newline,   and  for  constructing  string  literals  in
macros, but using it to indicate the end of a  hex  escape  is  a
botch.  This is nearly as bad as suggesting that the whole string
be written in hex.

Moreover, it's very C-specific; one could not advertise a program
that  `accepts all the C escapes' as input, without first solving
the hex-termination problem all over again.

Also, it doesn't handle  character  constants.   The  example  in
S3.1.3.4  is  clearly  a  kludge--it  suggests  replacing the hex
escape  with  octal.   This  won't  always  be  possible  on   an
architecture with 12-bit bytes, for example.

Finally,  if  the  \c  escape  is  added  anyway  for  the  null-
suppression  feature,  the additional change of insisting that it
be a no-op in other contexts is minor.
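
For comparison, the two workarounds that ANSI C did end up with can be
sketched as follows.  This assumes 8-bit chars; the names pasted,
octal, pasted_size, and same_as_octal are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Escapes are converted before adjacent literals are pasted, so the
   hex escape ends at the closing quote of the first piece. */
static const char pasted[] = "\x12" "3";

/* An octal escape takes at most three digits, so this is \022
   followed by the character '3'. */
static const char octal[] = "\0223";

/* Number of elements, including the appended null. */
size_t pasted_size(void)
{
    return sizeof pasted;
}

/* Returns 1 if both spellings produce the same bytes. */
int same_as_octal(void)
{
    return sizeof pasted == sizeof octal
        && memcmp(pasted, octal, sizeof pasted) == 0;
}
```

Neither form applies to a character constant, which is the gap the
proposal points at.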

Specific changes

In S3.1.3.4, page 29, line 10, add \c to  the  list  of  escapes.
Add  the  description:  `The  \c  escape  at  the end of a string
literal  suppresses  the  trailing  null  character  that   would
normally  be appended.  If \c appears in a character constant, or
anywhere in a string literal other than at the end,  then  it  is
ignored, but may serve to separate an octal or hexadecimal escape
from a following digit.'

In S3.1.3.4, page 30, line 35, change '\0223' to '\x12\c3'.

In S3.1.4, page 31, line 29, after  `A  null  character  is  then
appended'  add `unless the string literal ended with \c'.  Make a
similar change to line 31.  Add  the  sentence  `If  a  character
string  literal  or  a  wide  string literal has zero length, the
behavior is undefined'.  Add to footnote 16 the text `or  it  may
lack a trailing null character because of \c'.

In S3.1.4, page 31, line 41, add `This string may also be denoted
by "\x12\c3"'.

In S3.5.7, page 73, line 23, replace `if there is room or if  the
array  is of unknown size' with `if it has one'.  (The ability to
initialize a non-null-terminated array without using  \c  may  be
listed as a Common Extension.)

henry@zoo.toronto.edu (Henry Spencer) (10/10/90)

In article <1009@nlsun1.oracle.nl> bengsig@oracle.nl (Bjorn Engsig) writes:
>Since K&R1 seems to be clear, how come the compilers accepted it?  Or
>does K&R1 actually hide it somewhere?

It is worth remembering that K&R1 was obsolete in details very quickly.
For example, it does not mention structure assignment or enums, both of
which were de facto part of old C.  People implementing old-C compilers
had to work fairly hard to figure out exactly what language they were
implementing.  (Harbison & Steele, the best pre-ANSI reference manual,
came out of one company's effort to pin down an exact definition of C.)
Various little extensions became common practice without ever being
blessed by K&R1.
-- 
Imagine life with OS/360 the standard  | Henry Spencer at U of Toronto Zoology
operating system.  Now think about X.  |  henry@zoo.toronto.edu   utzoo!henry

scs@adam.mit.edu (Steve Summit) (10/16/90)

In article <1017@gistdev.gist.com> flint@gistdev.gist.com (Flint Pellett) writes:
>IMHO the committee blew it: their decision lets a programmer who will
>only use a string in a non-null terminated manner (like with strncpy)
>save 1 lousy byte, and opens the door for a ton of mistakes to get through.

Anyone who wants character arrays initialized with "regular"
strings should always be using

	char a[] = "hello";

Both

	char a[6] = "hello";
and
	char a[5] = "hello";

are risky, and both "open the door for a ton of mistakes to get
through."  Neither should be used in the normal case, but in the
abnormal case, when you've taken character counting upon yourself
for whatever reason and are prepared to live with the
consequences, either seems appropriate (depending, of course, on
your needs, which should be well documented and understood).
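
The three forms side by side, as a rough sketch.  The array names and
the helper bounded_length are invented here; the helper is one way to
measure the text without assuming a terminator is present.

```c
#include <stddef.h>
#include <string.h>

static const char counted[]    = "hello";  /* 6 elements, null included */
static const char with_room[6] = "hello";  /* 6 elements, null included */
static const char exact_fit[5] = "hello";  /* 5 elements, no null stored */

/* Length of the text in buf, never reading past its n elements. */
size_t bounded_length(const char *buf, size_t n)
{
    const char *end = memchr(buf, '\0', n);
    return end ? (size_t)(end - buf) : n;
}

/* Returns 1 if the compiler sized the three arrays as described. */
int sizes_as_expected(void)
{
    return sizeof counted == 6
        && sizeof with_room == 6
        && sizeof exact_fit == 5;
}
```

bounded_length(exact_fit, sizeof exact_fit) gives 5, where strlen
would run off the end of the array.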

If anything, I'd say that non-nul-terminated strings are a bit
closer to the elusive "spirit of C."  The fact that the compiler
politely appends \0 has always seemed microscopically odd to me,
since nothing in the language proper assumes or depends on it.
(Yes, the standard libraries are now essentially part of the
"language proper;" so this statement is less true today.)  To be
sure, having the compiler append \0 is monumentally handy, and
I'm not saying that it shouldn't, but since when has the C
compiler held your hand?

(I'm actually not being terribly sarcastic here, but please don't
flame this opinion, in either direction, if you disagree -- it's
not a major point.)

Given that the implicit appending of \0 is a little bit "out of
band," I am pleased that there is a way for a programmer who
needs to disable it to do so explicitly.  This seems very much in the
spirit of C (and Unix).  (Granted, counting characters is
upsetting.  See Karl Heuer's recent post for an alternative
mechanism, which happened not to be adopted by the committee.)

>Here is a real life example of the impact of this decision: for
>about a week we had a 3B2 machine which kept crashing about once an
>hour because of this!
>1. It always crashed because it ran out of swap space.
>2. It was incorrectly set up so that one user could use up all the swap.
>3. One particular program was always running when it crashed.
[the program contained an inadvertent non-nul-terminated string
due to the above mechanism which turned into]
>an infinite loop chasing its own tail.

Sorry to be unsympathetic, but if a system can be brought to its
knees by a user program grabbing all available swap space and/or
cpu cycles, then that's the bug, pure and simple.

Should "features" such as

	while(1);				/* don't try */
or
	while(malloc(1) != NULL);		/* these at */
or
	while(fork() >= 0);			/* home, kids */

be disallowed for the same reason?

                                            Steve Summit
                                            scs@adam.mit.edu

msb@sq.sq.com (Mark Brader) (10/18/90)

> Anyone who wants character arrays initialized with "regular"
> strings should always be using
> 	char a[] = "hello";
>
> Both	char a[6] = "hello";
> and	char a[5] = "hello";
> are risky, and both "open the door for a ton of mistakes to get
> through."  Neither should be used in the normal case, but in the
> abnormal case, when you've taken character counting upon yourself
> for whatever reason and are prepared to live with the
> consequences, either seems appropriate  ...

Agreed.  A particularly tricky case, though, is this one:

	#include "foo.h"

	char bar[BAR_LEN] = "initial bar";

which gives surprising results if you think that strlen("initial bar")
is safely less than BAR_LEN, and it really is equal to it.  However,
this is a fairly rare form of initialization, and I wouldn't give
much weight to the point I just raised.  X3J11 chose rightly, I think.
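
One way to catch that case at compile time, sketched under some
assumptions: BAR_LEN is defined inline here as a stand-in for the
value from "foo.h", and BAR_INIT, bar_init_fits, and bar_spare_bytes
are hypothetical names.

```c
#include <stddef.h>

#define BAR_LEN  16              /* stand-in for the value in "foo.h" */
#define BAR_INIT "initial bar"   /* hypothetical: text written once */

/* Compile-time guard: the array size becomes -1, which the compiler
   must reject, unless the literal and its null both fit. */
typedef char bar_init_fits[sizeof(BAR_INIT) <= BAR_LEN ? 1 : -1];

char bar[BAR_LEN] = BAR_INIT;

/* Elements left over after the text and its terminating null. */
size_t bar_spare_bytes(void)
{
    return BAR_LEN - sizeof(BAR_INIT);
}
```

With BAR_LEN set to 11 the typedef fails to compile, instead of bar
silently losing its terminator.  A C11 compiler can express the same
check with _Static_assert.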
-- 
Mark Brader		   "I don't care HOW you format   char c; while ((c =
SoftQuad Inc., Toronto	    getchar()) != EOF) putchar(c);   ... this code is a
utzoo!sq!msb, msb@sq.com    bug waiting to happen from the outset." --Doug Gwyn

This article is in the public domain.

flint@gistdev.gist.com (Flint Pellett) (10/23/90)

scs@adam.mit.edu (Steve Summit) writes:

>Sorry to be unsympathetic, but if a system can be brought to its
>knees by a user program grabbing all available swap space and/or
>cpu cycles, then that's the bug, pure and simple.

You won't get any argument from me on that: you're right.  (The
configuration in which this occurred happened to be the default
unmodified configuration right from the vendor: after it happened
I told them the same thing you just said, but not as politely.)

>Should "features" such as

>	while(1);				/* don't try */
>or
>	while(malloc(1) != NULL);		/* these at */
>or
>	while(fork() >= 0);			/* home, kids */

>be disallowed for the same reason?

Obviously not.  They are different from the thing being discussed,
because in the above, everything that is happening is EXPLICIT.
That's my whole argument: I dislike seeing things happen
IMPLICITLY, or inconsistently, which is clearly what is going on when
char a[2] = "a"; does add a '\0' and char a[2] = "ab"; does not.
That's why I like Karl's idea of adding the \c : because it makes
everything that is happening very explicit, so you don't miss it.

Have we beaten this to death yet?
-- 
Flint Pellett, Global Information Systems Technology, Inc.
1800 Woodfield Drive, Savoy, IL  61874     (217) 352-1165
uunet!gistdev!flint or flint@gistdev.gist.com