nreadwin@micrognosis.co.uk (Neil Readwin) (10/04/90)
Can someone tell me why the following initializer is legal inside a
structure, but not outside it ? Or is it a compiler bug ?
struct foo {
char x[5];
} bar = {"12345"};
char baz[5] = "12345";
The VMS compiler barfs on the second one with
%CC-W-TRUNCSTRINIT, String initializer for "baz" contains
too many characters to fit; truncated.
At line number 5 in CASSIUS:[NREADWIN.TMP]ZZ.C;4.
The SunOS compiler agrees
"zz.c", line 5: too many initializers
gcc seems quite happy with both.
I was unable to decrypt what K&R had to say on the matter - should the null
character appended to the string count as an initializer in both cases ?
Disclaimer: 818 Phone: +44 71 528 8282 E-mail: nreadwin@micrognosis.co.uk
W Westfield: Abstractions of hammers aren't very good at hitting real nails
poser@csli.Stanford.EDU (Bill Poser) (10/05/90)
In article <1990Oct4.152756.6850@micrognosis.co.uk> nreadwin@micrognosis.co.uk (Neil Readwin) writes:
>
> Can someone tell me why the following initializer is legal inside a
> structure, but not outside it ? Or is it a compiler bug ?
>
>struct foo {
>    char x[5];
>    } bar = {"12345"};
>
>char baz[5] = "12345";

I would say that both are erroneous. The reason that you can't assign
"12345" to baz is that baz is an array of FIVE chars and the string
"12345" requires SIX characters: five for the five digits, and one for
the terminating null. The largest string (in the sense of "sequence of
characters terminated by a null") that you can put in baz is one four
characters long.

For this reason, the structure initialization shouldn't work either.
Padding of the structure may allocate an additional byte so that the
assignment doesn't actually trash anything, but I don't see why the
compiler isn't checking the declared array size.
edgincd2@mentor.cc.purdue.edu (Chris Edgington *Computer Science Major*) (10/05/90)
In article <1990Oct4.152756.6850@micrognosis.co.uk>, nreadwin@micrognosis.co.uk (Neil Readwin) writes:
>
> Can someone tell me why the following initializer is legal inside a
> structure, but not outside it ? Or is it a compiler bug ?
>
> struct foo {
>     char x[5];
>     } bar = {"12345"};
>
> char baz[5] = "12345";
>

When you ask the compiler to allocate n chars in the array, you really
should use only the first n-1 (here, 4) if you are going to use the
array as a string, because one of the chars allocated is used for the
NULL, which marks the end of the string. If you write over the NULL and
then try to print the string, the compiler [runtime code] will just
continue printing until it encounters a NULL, signifying the end of the
string.

Therefore, to allocate ample space for your string "12345", you need to
have char baz[6].

Chris Edgington
edgincd2@mentor.cc.purdue.edu
Purdue University
poser@csli.Stanford.EDU (Bill Poser) (10/05/90)
Regarding the assignment of "12345" to char x[5] and struct{char x[5]},
I spoke too soon. K&R2 contains a detail I hadn't noticed, and am not
sure that I approve of. On p.219, in the discussion of initialization
of fixed size arrays by string constants, it states:

    ...the number of characters in the string, NOT COUNTING
    THE TERMINATING NULL CHARACTER, must not exceed the
    size of the array. [emphasis mine]

This means that the assignment of "12345" to an array of five characters,
is legal. If K&R2 here reflects the standard, then both initializations
are legitimate.

This seems to me to be a bad idea. Everywhere else, one has to take
into account the terminating null. For example, x[5] = 'a' is
an error. Not counting the terminating null here is inconsistent.
Can anyone explain this decision?
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/05/90)
In article <14796@mentor.cc.purdue.edu> edgincd2@mentor.cc.purdue.edu (Chris Edgington *Computer Science Major*) writes:
> In article <1990Oct4.152756.6850@micrognosis.co.uk>, nreadwin@micrognosis.co.uk (Neil Readwin) writes:
> > char baz[5] = "12345";
  [ explanation ]

More to the point, an alert reader will notice that you haven't
accounted for the NULL. Whether or not that's legal, you should always
treat non-string (i.e., non-NULL-terminated) character arrays as real
arrays with no relation between consecutive characters. Something like

    char baz[5] = { '1', '2', '3', '4', '5' };

This expresses your intent much more clearly.

> Therefore, to allocate ample space for your string "12345", you need to have
> char baz[6].

Only if you really do mean it that way---but from your article you
obviously know how many characters to allocate for a NULL-terminated
string, so you wouldn't be asking if that were the answer. (For those
new to C, the easy way to allocate a string is char baz[] = "12345";.)

Am I reading your mind correctly? :-)

---Dan
bengsig@oracle.nl (Bjorn Engsig) (10/05/90)
re: char mesg[5] = "help!"; /* what about the null terminator? */

The ANSI standard says (3.5.7):

    "Successive characters of the character string literal (including
    the terminating null character if there is room or if the array is
    of unknown size) initialize the elements of the array."

and the rationale mentions:

    "(Some widely used implementations provide precedent.)"

Further, it fits well with the way strncpy() works.
-- 
Bjorn Engsig, Domain: bengsig@oracle.nl, bengsig@oracle.com
              Path:   uunet!mcsun!orcenl!bengsig
From IBM: auschs!ibmaus!cs.utexas.edu!uunet!oracle!bengsig
volpe@underdog.crd.ge.com (Christopher R Volpe) (10/05/90)
In article <15674@csli.Stanford.EDU>, poser@csli.Stanford.EDU (Bill Poser) writes:
|>Regarding the assignment of "12345" to char x[5] and struct{char x[5]},
|>I spoke too soon. K&R2 contains a detail I hadn't noticed, and am not
|>sure that I approve of. On p.219, in the discussion of initialization
|>of fixed size arrays by string constants, it states:
|>
|>    ...the number of characters in the string, NOT COUNTING
|>    THE TERMINATING NULL CHARACTER, must not exceed the
|>    size of the array. [emphasis mine]
|>
|>This means that the assignment of "12345" to an array of five characters,
|>is legal. If K&R2 here reflects the standard, then both initializations
|>are legitimate.
|>
|>This seems to me to be a bad idea. Everywhere else, one has to take
|>into account the terminating null. For example, x[5] = 'a' is
|>an error. Not counting the terminating null here is inconsistent.
|>Can anyone explain this decision?

The fact that x[5] = 'a' is an error has nothing to do with any
terminating null. It's an error because x[5] doesn't exist. The array
has elements x[0] through x[4].

There may be situations where you just want an array of characters, and
DON'T want a "string" (null terminated). Thus, you have the capability
of creating a five-byte array of char and initializing it with "abcde",
and a six-byte string and initializing it with "abcde" also. If you
don't like having to remember to allocate space for the terminating null
when declaring the array, let the compiler do it for you:

    char x[] = "abcde";

will create an array of six chars and initialize it, including the
terminating null.

==================
Chris Volpe
G.E. Corporate R&D
volpecr@crd.ge.com
chris@mimsy.umd.edu (Chris Torek) (10/05/90)
In article <15674@csli.Stanford.EDU> poser@csli.Stanford.EDU (Bill Poser) writes:
>Regarding the assignment of "12345" to char x[5] ... [K&R 2 says]
>    ...the number of characters in the string, NOT COUNTING
>    THE TERMINATING NULL CHARACTER, must not exceed the
>    size of the array. [emphasis mine]
>Can anyone explain [why the ending '\0' is not counted]?

This is a change in New (ANSI) C. In Classic (K&R-1) C, a
double-quoted string in an initializer context%, when setting the
initial value of a character array, was treated uniformly as if it were
a bracketed initializer consisting of all the characters, including
the terminating NUL, in the string. That is,

    char x[5] = "12345";

meant exactly the same thing as

    char x[5] = { '1', '2', '3', '4', '5', '\0' };

(and was therefore in error, having too many characters).

The X3J11 committee decided# that this was overly restrictive, and
relaxed the rule to `is equivalent to a bracketed initializer
consisting of all the characters, including the terminating NUL if it
fits'. Thus

    char x[] = "12345";

means the same as

    char x[] = { '1', '2', '3', '4', '5', '\0' };

or

    char x[6] = { '1', '2', '3', '4', '5', '\0' };

but

    char x[5] = "12345";

now means the same as

    char x[5] = { '1', '2', '3', '4', '5' };

If the declaration is changed to

    char x[4] = "12345";

it is once again in error.
-----
% Note that here (in an initializer context) and as an argument to
  sizeof (e.g., `sizeof "abc"') are the only two places that a double
  quoted string does not undergo the usual `array degenerates into
  pointer' rule. All other legal occurrences of a double-quoted string
  are in a value context, and therefore change from `array N of char'
  to `pointer to char', pointing to the first character in the string.

# This wording is not meant to imply judgement as to this decision.
  (When I do not take a stand on some aspect of the language I use
  weasel-wording like `seems to be' or merely present bare facts.)
  Since I use old compilers, I have not made up my mind on this. I
  am leaning towards the `not a bad idea after all' faction.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain: chris@cs.umd.edu    Path: uunet!mimsy!chris
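A minimal sketch of the ANSI rule Torek describes, assuming a compiler
that follows it (a pre-ANSI compiler, or a C++ compiler, may reject the
first declaration). The array names and the helper same_bytes() are
hypothetical and do not come from the thread.

```c
#include <assert.h>
#include <string.h>

/* Five elements: the literal's '\0' does not fit, so it is dropped. */
static const char from_string[5] = "12345";
static const char from_braces[5] = { '1', '2', '3', '4', '5' };

/* Unknown size: the compiler makes room for the '\0' as well. */
static const char sized_by_compiler[] = "12345";
static const char six_by_hand[6] = { '1', '2', '3', '4', '5', '\0' };

/* Returns 1 if the two arrays have the same size and the same bytes. */
static int same_bytes(const void *a, size_t na, const void *b, size_t nb)
{
    assert(a != NULL && b != NULL);
    return na == nb && memcmp(a, b, na) == 0;
}
```

Comparing from_string with from_braces, and sized_by_compiler with
six_by_hand, using sizeof for the lengths, should report a match in both
cases; only the second pair is safe to pass to strlen() or "%s".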
jim@jagmac2.gsfc.nasa.gov (Jim Jagielski) (10/05/90)
In article <9418:Oct503:06:2790@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <14796@mentor.cc.purdue.edu> edgincd2@mentor.cc.purdue.edu (Chris Edgington *Computer Science Major*) writes:
>> In article <1990Oct4.152756.6850@micrognosis.co.uk>, nreadwin@micrognosis.co.uk (Neil Readwin) writes:
>> > char baz[5] = "12345";
>  [ explanation ]
>
>> Therefore, to allocate ample space for your string "12345", you need to have
>> char baz[6].
>
>Only if you really do mean it that way---but from your article you
>obviously know how many characters to allocate for a NULL-terminated
>string, so you wouldn't be asking if that were the answer. (For those
>new to C, the easy way to allocate a string is char baz[] = "12345";.)
>                                               --------------------
                                                         ^
Let's clarify the above ---------------------------------|

You forget that doing "char baz[] = "12345";" is the same as:

    char baz[] = { '1','2','3','4','5','\0' };

It appears in the above that you imply that doing char baz[] = "12345";
would result in a non-NULL-terminated string -- this is not correct.
Of course, I may not be reading your mind right :)

In any case, recall that C will always append the '\0' to any string
constant. If you don't want \0 in there, either copy up to the NULL
(strncpy) or use characters ('a', etc...)
-- 
=======================================================================
#include <std/disclaimer.h>
                                 =:^)
Jim Jagielski                    NASA/GSFC, Code 711.1
jim@jagmac2.gsfc.nasa.gov        Greenbelt, MD 20771

"Kilimanjaro is a pretty tricky climb. Most of it's up, until you reach
the very, very top, and then it tends to slope away rather sharply."
defaria@hpclapd.HP.COM (Andy DeFaria) (10/05/90)
>/ hpclapd:comp.lang.c / poser@csli.Stanford.EDU (Bill Poser) / 6:16 pm Oct 4, 1990 /
>Regarding the assignment of "12345" to char x[5] and struct{char x[5]},
>I spoke too soon. K&R2 contains a detail I hadn't noticed, and am not
>sure that I approve of. On p.219, in the discussion of initialization
>of fixed size arrays by string constants, it states:
>
>    ...the number of characters in the string, NOT COUNTING
>    THE TERMINATING NULL CHARACTER, must not exceed the
>    size of the array. [emphasis mine]
>
>This means that the assignment of "12345" to an array of five characters,
>is legal. If K&R2 here reflects the standard, then both initializations
>are legitimate.
>
>This seems to me to be a bad idea. Everywhere else, one has to take
>into account the terminating null. For example, x[5] = 'a' is
>an error. Not counting the terminating null here is inconsistent.
>Can anyone explain this decision?
>----------

It seems to me (and I am by no stretch of the imagination a C expert)
that K&R C is saying "Sure, you can use all 5 characters for a legitimate
string. You can manipulate them any way you want. You might be using it
to contain a fixed length string of 5 characters. But don't you ever try
to use it with any string procedures (strlen, or even printf's %s
operator) or expect to get burned!"
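One way to work with such a fixed-length field without getting burned is
to carry its size along explicitly. The sketch below is only an
illustration: field_len() and format_field() are hypothetical helpers,
and it assumes a C99 library for snprintf(). The precision in "%.*s"
caps how many bytes printf-style functions will read from the array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Five characters in five elements: no '\0' is stored. */
static const char id[5] = "12345";

/* Length of the text in a fixed-size field: stops at '\0' or at n. */
static size_t field_len(const char *a, size_t n)
{
    const char *end;

    assert(a != NULL);
    end = memchr(a, '\0', n);
    return end ? (size_t)(end - a) : n;
}

/* Formats the field into out as "[text]"; never reads past a[n-1]. */
static int format_field(char *out, size_t outsize, const char *a, size_t n)
{
    return snprintf(out, outsize, "[%.*s]", (int)field_len(a, n), a);
}
```

Calling format_field(buf, sizeof buf, id, sizeof id) is safe where
printf("%s", id) is not, because the array size travels with the pointer.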
henry@zoo.toronto.edu (Henry Spencer) (10/06/90)
In article <1990Oct4.152756.6850@micrognosis.co.uk> nreadwin@micrognosis.co.uk (Neil Readwin) writes:
> Can someone tell me why the following initializer is legal inside a
> structure, but not outside it ? Or is it a compiler bug ?
>
>struct foo {
>    char x[5];
>    } bar = {"12345"};

It's a compiler bug. ANSI C, 3.5.7 (emphasis added):

    An array of character type may be initialized by a character
    string literal, optionally enclosed in braces. Successive
    characters of the character string literal (including the
    terminating null character *if there is room* or if the array
    is of unknown size) initialize the elements of the array.

Your compilers are assuming that "12345" has six characters in it, which
is correct in general, but for this oddball special case in initializers
the terminating null is present only if there is room for it.
-- 
Imagine life with OS/360 the standard  | Henry Spencer at U of Toronto Zoology
operating system.  Now think about X.  |  henry@zoo.toronto.edu   utzoo!henry
henry@zoo.toronto.edu (Henry Spencer) (10/06/90)
In article <15674@csli.Stanford.EDU> poser@csli.stanford.edu (Bill Poser) writes:
>This means that the assignment of "12345" to an array of five characters,
>is legal. If K&R2 here reflects the standard, then both initializations
>are legitimate.

It does; they are.

>This seems to me to be a bad idea. Everywhere else, one has to take
>into account the terminating null...
>... Not counting the terminating null here is inconsistent.
>Can anyone explain this decision?

It's a special case because this form of initializer is a special case.
Normally, assigning a string of any length to an array of char would be
illegal.
-- 
Imagine life with OS/360 the standard  | Henry Spencer at U of Toronto Zoology
operating system.  Now think about X.  |  henry@zoo.toronto.edu   utzoo!henry
jbickers@templar.actrix.co.nz (John Bickers) (10/06/90)
Quoted from - poser@csli.Stanford.EDU (Bill Poser):
>    ...the number of characters in the string, NOT COUNTING
>    THE TERMINATING NULL CHARACTER, must not exceed the
>    size of the array. [emphasis mine]

> an error. Not counting the terminating null here is inconsistent.
> Can anyone explain this decision?

Sounds like this is intended to allow a nice way to initialize
character arrays that aren't necessarily strings. Like, say, a 4
character ID in a structure, that is meant to be compared and written
with things like mem... or strn...

Consider that a character array is not necessarily going to be used
as a "string", and since C doesn't distinguish between the two with
any sort of type keyword, it's better to provide for the more general
case.

Does lint warn about this sort of thing?
-- 
*** John Bickers, TAP, NZAmigaUG.       jbickers@templar.actrix.co.nz ***
***       "All I can do now is wait for the noise." - Numan          ***
cpcahil@virtech.uucp (Conor P. Cahill) (10/06/90)
In article <1990Oct4.152756.6850@micrognosis.co.uk> nreadwin@micrognosis.co.uk (Neil Readwin) writes:
>
> Can someone tell me why the following initializer is legal inside a
> structure, but not outside it ? Or is it a compiler bug ?

While the "legality" is questionable, so is the "correct" behaviour. My
pcc compiler accepts it, but only takes the first 5 items. (Of course,
this may not be obvious in a test because of alignment considerations,
but when you use 8 as the dimension and "12345678" as the initializer,
you will see a problem.) For example:

    #include <stdio.h>

    char b[8] = "12345678";   /* no room for the terminating '\0' */
    char c[8] = "1234";

    int main(void)
    {
        /* %s on b deliberately runs off the end of b to show the problem */
        printf("b = 0x%lx (%s), c = 0x%lx (%s)\n",
               (unsigned long)b, b, (unsigned long)c, c);
        return 0;
    }

The output of which is:

    b = 0x400acc (123456781234), c = 0x400ad4 (1234)

Anyway, the compiler *should* complain about it in both cases, but in
many cases will silently do the truncation.

Playing with it a bit more shows that both GCC and pcc will complain
about it if the next byte in the initialization string is not null. For
example:

    char baz[5] = "123456";

will get a warning about the initializer string being too long (from gcc)
or "non-null byte ignored in string initializer" from pcc.

>I was unable to decrypt what K&R had to say on the matter - should the null
>character appended to the string count as an initializer in both cases ?

No cases should copy the null terminator. They should not copy any more
bytes than is specified in the array dimension. The fact that you chose
a count of 5 will usually result in some alignment bytes between each
variable/structure, and hence it appears that the null was copied. This
is not the case. Only the first 5 bytes would be copied.
-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170
cpcahil@virtech.uucp (Conor P. Cahill) (10/06/90)
In article <15674@csli.Stanford.EDU> poser@csli.stanford.edu (Bill Poser) writes:
>This means that the assignment of "12345" to an array of five characters,
>is legal. If K&R2 here reflects the standard, then both initializations
>are legitimate.

While it is "legal", it still should get a warning, since it is doing
something that you may not expect.

>This seems to me to be a bad idea. Everywhere else, one has to take
>into account the terminating null. For example, x[5] = 'a' is
>an error. Not counting the terminating null here is inconsistent.

This has nothing to do with a terminating null. x[5] is illegal because
you are accessing an element beyond the end of the array (assuming it
was declared as char x[5]).

>Can anyone explain this decision?

Probably because that was the existing standard (the way C has worked
all along).

Another way to look at this is that "char x[dim];" declares an array of
characters, not a character string. So the null need not be there, and
without this rule you couldn't initialize the last element of the array
to be a non-null.
-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170
dan@kfw.COM (Dan Mick) (10/06/90)
In article <9418:Oct503:06:2790@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>More to the point, an alert reader will notice that you haven't
>accounted for the NULL.
                   ^^^^
Argh. Dan, I'm shocked.
That's NUL. NULL is a pointer. NUL is a character.
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/06/90)
In article <1990Oct6.011240.8538@kfw.COM> dan@kfw.com (Dan Mick) writes:
> In article <9418:Oct503:06:2790@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >More to the point, an alert reader will notice that you haven't
> >accounted for the NULL.
> Argh. Dan, I'm shocked.
> That's NUL. NULL is a pointer. NUL is a character.

Only for people who think in C. I learned from Knuth, and I still write
/\ (well, can't really do a capital lambda on a non-APL keyboard) when I
think of the null/nil/meaningless pointer.

The null character is 0. Meaning 3 in my dictionary...

---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/06/90)
In article <3568@dftsrv.gsfc.nasa.gov> jim@jagmac2.gsfc.nasa.gov (Jim Jagielski) writes:
> You forget that doing "char baz[] = "12345";" is the same as:
>     char baz[] = { '1','2','3','4','5','\0' };

That's what I said. Null-terminated character arrays are called strings,
and my point was that the original poster was *not* asking about them.

Again, the right way to initialize a five-element character array is to
list the five characters explicitly:

    char baz[5] = { '1', '2', '3', '4', '5' } ;

If you use "12345", you'll confuse the reader (not to mention any old
compilers) into thinking that you really want a (0-terminated) string.

---Dan
henry@zoo.toronto.edu (Henry Spencer) (10/07/90)
In article <21149:Oct604:52:2190@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>> That's NUL. NULL is a pointer. NUL is a character.
>
>Only for people who think in C. I learned from Knuth, and I still write
>/\ (well, can't really do a capital lambda on a non-APL keyboard) when I
>think of the null/nil/meaningless pointer.
>
>The null character is 0. Meaning 3 in my dictionary...

The capitalization here is significant. NULL is a name for 0, used when
discussing null [note lower case] pointers in C. NUL is the official
ASCII name for the character with bit pattern 0000000, often used as
a null character.
-- 
Imagine life with OS/360 the standard  | Henry Spencer at U of Toronto Zoology
operating system.  Now think about X.  |  henry@zoo.toronto.edu   utzoo!henry
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (10/07/90)
In article <26860@mimsy.umd.edu>, chris@mimsy.umd.edu (Chris Torek) writes:
    [char x[5] = "12345";]
> is a change in New (ANSI) C.

and provides a lucid explanation. He further says

> Since I use old compilers, I have not made up my mind on this. I
> am leaning towards the `not a bad idea after all' faction.

Data point: the annotated C++ reference manual explicitly says that
this feature has _not_ been accepted for C++. I don't know what the
C++ standard will say; I'm sure there will be big fights over whether
it is better to be close to the C++ base document or the C standard.
At any rate, for now, C code using this feature will not port to C++.
-- 
Fear most of all to be in error. -- Kierkegaard, quoting Socrates.
msb@sq.sq.com (Mark Brader) (10/08/90)
> Again, the right way
                       (as a point of style, he means)
> to initialize a five-element character array is to
> list the five characters explicitly:
>     char baz[5] = { '1', '2', '3', '4', '5' } ;
> If you use "12345", you'll confuse the reader (not to mention any old
> compilers) into thinking that you really want a (0-terminated) string.

This may be true if you're dealing with 5-character arrays, but it fails
as soon as there are too many initializers to count by eye. Suppose it was:

    char parity[64] =
        "EOOEOEEOOEEOEOOEOEEOEOOEEOOEOEEOOEEOEOOEEOOEOEEOEOOEOEEOOEEOEOOE";

It's obvious from its content that this is not a string to be printed,
so the absence of a trailing null* should not cause confusion. You could
add a one-line comment if you really must. But I would (mildly) prefer
to see the one line above than four lines of 'E', 'O', 'O', 'E', 'O', ...

*or NUL, or '\0', but, please, never NULL.
-- 
Mark Brader                  "Metal urgy. The urge to use metals.
SoftQuad Inc., Toronto        That was humans, all right."
utzoo!sq!msb, msb@sq.com             -- Terry Pratchett: Truckers

This article is in the public domain.
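A short sketch of how a table like Brader's might be used. Brader does
not say what the table encodes; reading entry i as the parity of the
number of 1 bits in the index i ('E' even, 'O' odd) is an assumption
made here, and is_odd_parity() is a hypothetical helper. Because the
array is indexed, never printed, the missing '\0' is harmless.

```c
#include <assert.h>

/* 64 characters in a 64-element array: the literal's '\0' is not stored. */
static const char parity[64] =
    "EOOEOEEOOEEOEOOEOEEOEOOEEOOEOEEOOEEOEOOEEOOEOEEOEOOEOEEOOEEOEOOE";

/* Returns 1 if the table marks the 6-bit value v as odd, else 0. */
static int is_odd_parity(unsigned v)
{
    assert(v < sizeof parity);
    return parity[v] == 'O';
}
```

Under that reading, is_odd_parity(1) and is_odd_parity(7) are true,
while is_odd_parity(0) and is_odd_parity(3) are false.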
bengsig@oracle.nl (Bjorn Engsig) (10/08/90)
Article <26860@mimsy.umd.edu> by chris@mimsy.umd.edu (Chris Torek) says:
|
|  In Classic (K&R-1) C, a
|double-quoted string in an initializer context%, when setting the
|initial value of a character array, was treated uniformly as if it were
|a bracketed initializer consisting of all the characters, including
|the terminating NUL, in the string.

Yes, it seems to me that K&R1 says so - even if I would say it didn't.
The rationale for ANSI C says that accepting 'char x[2] = "ab"' (omitting
the NUL) is due to widely existing practice. This is in fact true, at
least I have seen many Classic C compilers that allowed it and didn't
warn about it.

Since K&R1 seems to be clear, how come the compilers accepted it? Or
does K&R1 actually hide it somewhere?

As a comment to another note in this thread that string functions
shouldn't be used with non NUL terminated strings: strncpy is actually
designed to work with non NUL terminated fixed length strings, and you
will normally use 'x[0]=0; strncat(x,s,n)' if you want a limited NUL
terminated copy of strings, whereas 'strncpy(x,s,n)' may yield surprises.
-- 
Bjorn Engsig, E-mail: bengsig@oracle.com, bengsig@oracle.nl
ORACLE Corporation
From IBM: auschs!ibmaus!cs.utexas.edu!uunet!oracle!bengsig

            "Stepping in others footsteps, doesn't bring you ahead"
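The two idioms Engsig contrasts can be sketched as follows. The wrapper
names fill_field() and copy_terminated() are hypothetical; strncpy() and
strncat() are the standard library functions, and the comments describe
their standard behaviour.

```c
#include <assert.h>
#include <string.h>

/*
 * Fill a fixed-width field of n bytes.  strncpy() pads with '\0' when
 * src is shorter than n, and stores NO terminator when src has n or
 * more characters, which is exactly what a fixed-length field wants.
 */
static void fill_field(char *field, size_t n, const char *src)
{
    assert(field != NULL && src != NULL);
    strncpy(field, src, n);
}

/*
 * Bounded copy that always yields a terminated string.  strncat()
 * appends at most n characters and then a '\0', so dst needs room
 * for n + 1 bytes.
 */
static void copy_terminated(char *dst, size_t n, const char *src)
{
    assert(dst != NULL && src != NULL);
    dst[0] = '\0';
    strncat(dst, src, n);
}
```

The "surprise" with strncpy(x,s,n) is the first case: when s has n or
more characters, x is left without a terminator and must not be treated
as a string.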
flint@gistdev.gist.com (Flint Pellett) (10/08/90)
chris@mimsy.umd.edu (Chris Torek) writes:
>In article <15674@csli.Stanford.EDU> poser@csli.Stanford.EDU
>(Bill Poser) writes:
>>Regarding the assignment of "12345" to char x[5] ... [K&R 2 says]
>>    ...the number of characters in the string, NOT COUNTING
>>    THE TERMINATING NULL CHARACTER, must not exceed the
>>    size of the array. [emphasis mine]
>>Can anyone explain [why the ending '\0' is not counted]?

>This is a change in New (ANSI) C. In Classic (K&R-1) C, a
>double-quoted string in an initializer context%, when setting the
>initial value of a character array, was treated uniformly as if it were
>a bracketed initializer consisting of all the characters, including
>the terminating NUL, in the string. That is,
>    char x[5] = "12345";
>meant exactly the same thing as
>    char x[5] = { '1', '2', '3', '4', '5', '\0' };
>(and was therefore in error, having too many characters).

On AT&T 3B2 machines about 2-3 years ago, it did not produce a compile
error: I know, I lived through it. See story below.

>The X3J11 committee decided# that this was overly restrictive, and
>relaxed the rule to `is equivalent to a bracketed initializer
>consisting of all the characters, including the terminating NUL if it
>fits'. Thus

IMHO the committee blew it: their decision lets a programmer who will
only use a string in a non-null terminated manner (like with strncpy)
save 1 lousy byte, and opens the door for a ton of mistakes to get through.
I imagine their main motivation was compatibility, but I think this is
still a mistake: if I write it as a double quoted string, _I_ mean that
I want it null terminated.

Here is a real life example of the impact of this decision: for about a
week we had a 3B2 machine which kept crashing about once an hour because
of this! We finally traced the problem through this chain, at a cost of
20 minutes per reboot and anywhere from 10 minutes to several hours
chasing the problem at each step.

1. It always crashed because it ran out of swap space.
2. It was incorrectly set up so that one user could use up all the swap.
3. One particular program was always running when it crashed.
4. Performance hit bottom when that program was run, and you couldn't
   abort the program without killing it from another terminal.
5. Only certain functions within the program caused the crash.
6. We were able to keep the system from crashing by retuning, but we
   still had performance problems, and this program wasn't working:
   it appeared to be in an infinite loop.
7. The critical routine that killed us (reduced to the part that
   mattered) eventually was this:

    char foo[5] = "abcde";  /* NOTE: no room for terminating '\0' char */
    char bar[] = "fghi";    /* NOTE: declared immediately behind the foo array */

    sprintf(bar,"%s",foo);  /* copy foo into bar: other tweaking omitted */

The problem was introduced by a maintenance change correction to the
string in foo, making it 1 longer but forgetting to fix the length of 5.
That, coupled with the fact that array bar followed immediately behind
array foo, which no longer was NUL terminated, turned the sprintf into
an infinite loop chasing its own tail.

If C thinks this feature is useful, they __at least__ ought to generate a
warning message, because 99 times out of 100 it's going to be a bug, not
an intended use, and it is VERY hard to spot an error of this nature when
looking at the code-- it "looks" right.
-- 
Flint Pellett, Global Information Systems Technology, Inc.
1800 Woodfield Drive, Savoy, IL 61874     (217) 352-1165
uunet!gistdev!flint or flint@gistdev.gist.com
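A sketch of how a maintenance slip like the one Pellett describes could
be caught. Nothing here is from his program: FOO_TEXT, foo_fixed and
is_terminated() are hypothetical names, and _Static_assert assumes a C11
or later compiler, which postdates this thread.

```c
#include <assert.h>
#include <string.h>

#define FOO_TEXT "abcde"

/* Sized by the compiler, so editing FOO_TEXT can never drop the '\0'. */
static const char foo[] = FOO_TEXT;

/* If a fixed size is required, check at compile time that the '\0' fits. */
static const char foo_fixed[6] = FOO_TEXT;
_Static_assert(sizeof FOO_TEXT <= sizeof foo_fixed,
               "foo_fixed has no room for the terminating NUL");

/* Run-time form of the same check, for arrays that are filled elsewhere. */
static int is_terminated(const char *a, size_t n)
{
    assert(a != NULL);
    return memchr(a, '\0', n) != NULL;
}
```

With the array declared as foo_fixed[5] instead, the _Static_assert
would stop the build rather than leave an unterminated array for
sprintf() to run through.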
lerman@stpstn.UUCP (Ken Lerman) (10/08/90)
In article <15674@csli.Stanford.EDU> poser@csli.stanford.edu (Bill Poser) writes:
->Regarding the assignment of "12345" to char x[5] and struct{char x[5]},
->I spoke too soon. K&R2 contains a detail I hadn't noticed, and am not
->sure that I approve of. On p.219, in the discussion of initialization
->of fixed size arrays by string constants, it states:
->
-> ...the number of characters in the string, NOT COUNTING
-> THE TERMINATING NULL CHARACTER, must not exceed the
-> size of the array. [emphasis mine]
->
->This means that the assignment of "12345" to an array of five characters,
->is legal. If K&R2 here reflects the standard, then both initializations
->are legitimate.
->
->This seems to me to be a bad idea. Everywhere else, one has to take
->into account the terminating null. For example, x[5] = 'a' is
->an error. Not counting the terminating null here is inconsistent.
->Can anyone explain this decision?
I can't explain the decision, but I can understand that it might be
useful. It does make sense to have an array of characters in the same
sense that one has an array of integers. In that case, if one knows
the length, there should be no requirement that a character with the
value 0 be stored to signify the end.
It does seem to be an opportunity for error, though.
Ken
karl@haddock.ima.isc.com (Karl Heuer) (10/09/90)
In article <26860@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
>[Allowing `char x[5]="12345";' is new to ANSI C.]

True, and therefore the answer to the original question is "failure to
accept this is a compiler bug iff your compiler claims ANSI conformance".

I opposed this feature (preferring to leave it a Common Extension, which
was its pre-ANSI status) because I had a counterproposal (enclosed for
your reading pleasure) that I think was cleaner and more general.
Unfortunately, I didn't have existing practice on my side, and it was
rejected.

Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint
--------cut here--------
Proposal #1

    Add new escape sequence \c.

Summary

    This proposal cleans up two warts in the language: initializing a
    character array without adding a null character, and terminating a
    hexadecimal escape which might be followed by a valid hexadecimal
    digit. It also allows the user to explicitly document when a null
    character is unnecessary, e.g. write(1,"\n\c",1).

Justification

    I presume the Committee is already aware of the need for non-null-
    terminated character arrays, since the January Draft makes a special
    case for them in S3.5.7. However, the mechanism requires the user to
    count the characters himself in order to make sure that he doesn't
    leave room for the null character; this is a maintenance nightmare.
    My proposal is a cleaner way to accomplish this.

    It has been suggested that although an escape to suppress the null
    character is useful, the termination of hex escapes is not an issue
    because it is handled by string literal pasting. String pasting is
    useful for line continuation without backslash-newline, and for
    constructing string literals in macros, but using it to indicate the
    end of a hex escape is a botch. This is nearly as bad as suggesting
    that the whole string be written in hex. Moreover, it's very
    C-specific; one could not advertise a program that `accepts all the
    C escapes' as input, without first solving the hex-termination
    problem all over again. Also, it doesn't handle character constants.

    The example in S3.1.3.4 is clearly a kludge--it suggests replacing
    the hex escape with octal. This won't always be possible on an
    architecture with 12-bit bytes, for example.

    Finally, if the \c escape is added anyway for the null-suppression
    feature, the additional change of insisting that it be a no-op in
    other contexts is minor.

Specific changes

    In S3.1.3.4, page 29, line 10, add \c to the list of escapes. Add
    the description: `The \c escape at the end of a string literal
    suppresses the trailing null character that would normally be
    appended. If \c appears in a character constant, or anywhere in a
    string literal other than at the end, then it is ignored, but may
    serve to separate an octal or hexadecimal escape from a following
    digit.'

    In S3.1.3.4, page 30, line 35, change '\0223' to '\x12\c3'.

    In S3.1.4, page 31, line 29, after `A null character is then
    appended' add `unless the string literal ended with \c'. Make a
    similar change to line 31. Add the sentence `If a character string
    literal or a wide string literal has zero length, the behavior is
    undefined'. Add to footnote 16 the text `or it may lack a trailing
    null character because of \c'.

    In S3.1.4, page 31, line 41, add `This string may also be denoted
    by "\x12\c3"'.

    In S3.5.7, page 73, line 23, replace `if there is room or if the
    array is of unknown size' with `if it has one'. (The ability to
    initialize a non-null-terminated array without using \c may be
    listed as a Common Extension.)
henry@zoo.toronto.edu (Henry Spencer) (10/10/90)
In article <1009@nlsun1.oracle.nl> bengsig@oracle.nl (Bjorn Engsig) writes:
>Since K&R1 seems to be clear, how come the compilers accepted it?  Or
>does K&R1 actually hide it somewhere?

It is worth remembering that K&R1 was obsolete in details very quickly.
For example, it does not mention structure assignment or enums, both of
which were de facto part of old C.  People implementing old-C compilers
had to work fairly hard to figure out exactly what language they were
implementing.  (Harbison & Steele, the best pre-ANSI reference manual,
came out of one company's effort to pin down an exact definition of C.)
Various little extensions became common practice without ever being
blessed by K&R1.
--
Imagine life with OS/360 the standard  | Henry Spencer at U of Toronto Zoology
operating system.  Now think about X.  |  henry@zoo.toronto.edu   utzoo!henry
scs@adam.mit.edu (Steve Summit) (10/16/90)
In article <1017@gistdev.gist.com> flint@gistdev.gist.com (Flint Pellett) writes:
>IMHO the committee blew it: their decision lets a programmer who will
>only use a string in a non-null terminated manner (like with strncpy)
>save 1 lousy byte, and opens the door for a ton of mistakes to get through.

Anyone who wants character arrays initialized with "regular"
strings should always be using

        char a[] = "hello";

Both    char a[6] = "hello";
and     char a[5] = "hello";
are risky, and both "open the door for a ton of mistakes to get
through."  Neither should be used in the normal case, but in the
abnormal case, when you've taken character counting upon yourself
for whatever reason and are prepared to live with the
consequences, either seems appropriate (depending, of course, on
your needs, which should be well documented and understood).

If anything, I'd say that non-nul-terminated strings are a bit
closer to the elusive "spirit of C."  The fact that the compiler
politely appends \0 has always seemed microscopically odd to me,
since nothing in the language proper assumes or depends on it.
(Yes, the standard libraries are now essentially part of the
"language proper;" so this statement is less true today.)

To be sure, having the compiler append \0 is monumentally handy,
and I'm not saying that it shouldn't, but since when has the C
compiler held your hand?  (I'm actually not being terribly
sarcastic here, but please don't flame this opinion, in either
direction, if you disagree -- it's not a major point.)

Given that the implicit appending of \0 is a little bit "out of
band," I am pleased that there is a way for the programmer who
needs to to explicitly disable it.  This seems very much in the
spirit of C (and Unix).  (Granted, counting characters is
upsetting.  See Karl Heuer's recent post for an alternative
mechanism, which happened not to be adopted by the committee.)

>Here is a real life example of the impact of this decision: for
>about a week we had a 3B2 machine which kept crashing about once an
>hour because of this!
>1. It always crashed because it ran out of swap space.
>2. It was incorrectly set up so that one user could use up all the swap.
>3. One particular program was always running when it crashed.

[the program contained an inadvertent non-nul-terminated string due
to the above mechanism which turned into]

>an infinite loop chasing its own tail.

Sorry to be unsympathetic, but if a system can be brought to its
knees by a user program grabbing all available swap space and/or
cpu cycles, then that's the bug, pure and simple.  Should
"features" such as

        while(1);                       /* don't try */
or
        while(malloc(1) != NULL);       /* these at */
or
        while(fork() >= 0);             /* home, kids */

be disallowed for the same reason?

                                            Steve Summit
                                            scs@adam.mit.edu
msb@sq.sq.com (Mark Brader) (10/18/90)
> Anyone who wants character arrays initialized with "regular"
> strings should always be using
>         char a[] = "hello";
>
> Both    char a[6] = "hello";
> and     char a[5] = "hello";
> are risky, and both "open the door for a ton of mistakes to get
> through."  Neither should be used in the normal case, but in the
> abnormal case, when you've taken character counting upon yourself
> for whatever reason and are prepared to live with the
> consequences, either seems appropriate ...

Agreed.  A particularly tricky case, though, is this one:

        #include "foo.h"
        char bar[BAR_LEN] = "initial bar";

which gives surprising results if you think that strlen("initial bar")
is safely less than BAR_LEN, and it really is equal to it.

However, this is a fairly rare form of initialization, and I wouldn't
give much weight to the point I just raised.  X3J11 chose rightly, I think.
--
Mark Brader, SoftQuad Inc., Toronto
utzoo!sq!msb, msb@sq.com

"I don't care HOW you format   char c; while ((c = getchar()) != EOF)
putchar(c);   ... this code is a bug waiting to happen from the outset."
                                                        --Doug Gwyn

This article is in the public domain.
flint@gistdev.gist.com (Flint Pellett) (10/23/90)
scs@adam.mit.edu (Steve Summit) writes:
>Sorry to be unsympathetic, but if a system can be brought to its
>knees by a user program grabbing all available swap space and/or
>cpu cycles, then that's the bug, pure and simple.

You won't get any argument from me on that: you're right.  (The
configuration in which this occurred happened to be the default
unmodified configuration right from the vendor: after it happened I told
them the same thing you just said, but not as politely.)

>Should "features" such as
>        while(1);                       /* don't try */
>or
>        while(malloc(1) != NULL);       /* these at */
>or
>        while(fork() >= 0);             /* home, kids */
>be disallowed for the same reason?

Obviously not.  They are different from the thing being discussed,
because in the above, everything that is happening is EXPLICIT.  That's
my whole argument, that I dislike to see things happening IMPLICITLY, or
inconsistently, which is clearly what is going on when

        char a[2] = "a";

does add a '\0' and

        char a[2] = "ab";

does not.  That's why I like Karl's idea of adding the \c : because it
makes everything that is happening very explicit, so you don't miss it.

Have we beaten this to death yet?
--
Flint Pellett, Global Information Systems Technology, Inc.
1800 Woodfield Drive, Savoy, IL 61874 (217) 352-1165
uunet!gistdev!flint or flint@gistdev.gist.com