kpmartin@watmath.UUCP (Kevin Martin) (10/25/84)
The following article refers to the entire C environment, including the compiler, the linker, and the operating system.

There seem to be four alternatives for what to do with externs and statics which are not explicitly initialized:

1) Have their value be undefined (i.e. garbage).
   Disadvantages:
      Breaks many current programs. It could be argued that well-written programs (as opposed to 'correct' programs) would not be broken, since a well-written program initializes variables explicitly if it cares about the initial value.
      Arrays of unknown size become effectively impossible to initialize at all (see note 1).
   Advantages:
      Consistent behaviour with autos and malloc'ed space.
      Consistent with normal reason (i.e. the variable contains a predictable value ONLY IF it has been initialized in the C source).
      Tends to encourage easy-to-read code: the reader can tell (or *should* be able to tell, if coded cleanly) if there is initialization *code* somewhere. e.g. you are sure that in
            int x;
            int y = 5;
      there is initialization code (somewhere) for 'x' but not for 'y'.
      Makes object and a.out files smaller, thus program load time is also reduced (note 2)(note 4).
      Allows the programmer to get genuine "bss" (un-initialized) space. This becomes especially important if overlays are being used, since it may be desired that an overlay be loaded without re-initializing all the variables it contains (note 4).

2) Have their value be the 0 bit pattern.
   Disadvantages:
      Programs which don't explicitly initialize their pointers and floats would not port to any more machines than they currently do (note 3).
      Arrays of unknown size containing floats, doubles or pointers cannot be initialized (note 1).
   Advantages:
      This is the current method (i.e. inertia reigns).
      Makes object and a.out files smaller, thus program load time is also reduced (note 4).

3) Have their value set to a zero of the appropriate type.
   Disadvantages:
      Requires a somewhat arbitrary rule on "what is the appropriate type for a union?"
      Generates larger object files, etc (note 4).
      The programmer cannot signal to the reader that a variable is deliberately being left un-initialized.
      Arrays of unknown size cannot be initialized if they contain non-zero values.
   Advantages:
      Allows old code to be ported to new machines (note 3).

4) A combination of (1) and (2): Un-initialized variables start off as zero in the first overlay that is loaded. Subsequent overlays get whatever was left in the storage location by previous overlays.
   Disadvantages:
      Same as for (1), except that existing programs are not broken.
   Advantages:
      Same as (1), except that sloppy coding has a better chance of running.

Note 1: By "array of unknown size", I mean, for example, an array whose size is a #define'd constant. There is currently no method of giving explicit initializers to such an array in its entirety, unless the source file is heavily modified each time the #define'd constant is changed. Note that the improved CPP facilities (#eval and genuine macros) which I described in an earlier article would allow such arrays to be initialized to *any* value (not just zero bit pattern or zero of the appropriate type), thus making the variations on this disadvantage go poof.

Note 2: Since most systems clear the memory before a program is loaded, for security purposes, method (1) often flukes out to be method (2).

Note 3: If the purpose of the standard does not include porting existing (old) programs to new C implementations on "hostile" hardware, this advantage/disadvantage does not exist. I believe that it is the case that the new standard should allow NEW programs to be written portably, and that old programs continue to work, but *only on machines on which they already work*.

Note 4: These features (reduced object or a.out size, and overlays) may or may not exist on any particular system, and they may be non-issues to many users (because they have lots of disk space, or they think overlays are for the birds). However, these features *do* exist on some systems, and the users *do* find them useful, and it would be desirable that the standard *not* be written such that a compiler has to be non-conforming to take advantage of such features.

If overlays are going to be ignored, (2) and (4) are equivalent.

Ignoring the problems of upward compatibility and lazy programming styles, choice (1) is the winner. However, given that old programs must continue to work, Choice (4) looks like the best one.

The only bad problem with (4) is that of array initialization. As mentioned above, this can be solved much more generally with an improved CPP.

This standard will probably not include such features, or a method of choosing which union member to initialize. But there will be more C standards down the road, and these features may appear, making (1), (2) or (4) the clear winning choices.

If the committee goes for choice (3) now, this will only encourage code which doesn't explicitly initialize things, and make for an even larger base of software to break when the next standard tries to go back to choice (1) or (2).

I consider (4) with improved CPP to be the long-range goal, and the implementation of (3) in the current standard prevents changing to (4) in the next standard. We can either let it sit as is for now, and fix it properly when the facilities become available, or we can (for the feeble reason of porting old shit code to new machines) paint ourselves into yet another corner by fixing it poorly immediately.

                Kevin Martin, UofW Software Development Group
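
[Editorial sketch, not part of the original post.] Note 1 above turns on what happens to array elements that receive no explicit initializer. The fragment below is a minimal illustration, assuming a present-day C compiler: a short initializer list leaves the remaining elements zero, so giving every element of a #define-sized array a non-zero value still takes initialization code at run time. The names TABLE_SIZE, weights, count_equal and fill are hypothetical and do not come from the post.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 8            /* hypothetical size; change it freely */

/* Only the first element has an explicit initializer;
   the remaining TABLE_SIZE - 1 elements default to zero. */
static double weights[TABLE_SIZE] = { 1.5 };

/* Count how many of the n elements of v compare equal to value. */
size_t count_equal(const double *v, size_t n, double value)
{
    size_t i, hits = 0;

    assert(v != NULL);
    for (i = 0; i < n; i++)
        if (v[i] == value)
            hits++;
    return hits;
}

/* Explicit initialization code: store value into all n elements of v. */
void fill(double *v, size_t n, double value)
{
    size_t i;

    assert(v != NULL);
    for (i = 0; i < n; i++)
        v[i] = value;
}
```

Calling fill(weights, TABLE_SIZE, 1.5) early in the program keeps working when TABLE_SIZE changes, which is exactly the explicit initialization *code* the post wants the reader to be able to find.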
henry@utzoo.UUCP (Henry Spencer) (10/25/84)
It is definitely much too late to remove default initialization from C; far, far too much code depends on it, including the Unix kernel. Adding features is one thing. Taking them away is another. Note that "not breaking existing correct code" is a major objective of the ANSI committee.

This means that default initialization must be present, and must follow either the zero-bit-pattern or as-if-integer-0 rule. The oddball machines are the only ones that pay a penalty for the wrong choice, so it comes down to a simple question of whether the people with such machines would rather maximize portability of older software to their machines, or maximize the efficiency (object-module size and load time) of new code. This is clearly a case where those of us with un-oddball machines should keep quiet; it is presumptuous of us to tell the oddball-machine people "we know what you ought to want". I suspect that really odd machines may end up with a compiler option to settle the matter; probably the default setting should be "portable".

If overlays are being used mostly to get more code into a limited space, then clearly they should not affect the data. Such overlays are logically just an implementation technique for fitting lots of code into a small space, and (ideally) should not be visible on the language level at all. If it is specifically desired that overlays overlay the data space as well (i.e. act like exec()), then there's no problem. If what you have is something in between, then I think the only practical answer is that your techniques and the problems associated with them are implementation-dependent and are not a standards issue. Depending on what sort of overlays you have, the data gets left alone, or trashed, or re-initialized wholly or partly; I see no reason for an ANSI standard to try to bless one kind of overlays and condemn the others.

> If the committee goes for [integer-0] now, this will only encourage code
> which doesn't explicitly initialize things, and make for an even larger
> base of software to break when the next standard tries to go back to
> [no-default-initialization] or [zero-bit-pattern].

Why in the world would the next standard do anything so stupid? You are setting up a straw man. Of course there will be hell to pay if the next standard goes out of its way to be incompatible with the current one, but that's true regardless.
--
                Henry Spencer @ U of Toronto Zoology
                {allegra,ihnp4,linus,decvax}!utzoo!henry
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/29/84)
I have omitted much of the original text to keep the size down:

> There seem to be four alternatives for what to do with externs and statics
> which are not explicitly initialized:
>
> 1) Have their value be undefined (i.e. garbage).
>    Disadvantages:
>       Breaks many current programs.

Very important from an economic standpoint.

>    Advantages:
>       Allows the programmer to get genuine "bss" (un-initialized) space.
>       This becomes especially important if overlays are being used,
>       since it may be desired that an overlay be loaded without re-
>       initializing all the variables it contains (note 4).

Any overlay system that reinitializes variables is WRONG.

> 2) Have their value be the 0 bit pattern.

As has been pointed out in another discussion, the 0 bit pattern may not be appropriate for pointers on some machines.

>    Disadvantages:
>       Arrays of unknown size containing floats, doubles or pointers
>       cannot be initialized (note 1).

If all elements are to be initialized to the same value, then this statement is false. The last explicit initializer is repeated as necessary to fill out the array.

> 3) Have their value set to a zero of the appropriate type.
>    Disadvantages:
>       Requires a somewhat arbitrary rule on "what is the appropriate type
>       for a union?"

Easy; the type of the first member.

>       The programmer cannot signal to the reader that a variable is
>       deliberately being left un-initialized.

Sure he can. Nothing prevents you from specifying initializers you care about and letting the rest default. I do this anyway.

>       Arrays of unknown size cannot be initialized if they contain
>       non-zero values.

See above.

> 4) A combination of (1) and (2): Un-initialized variables start off as
> zero in the first overlay that is loaded. Subsequent overlays get whatever
> was left in the storage location by previous overlays.

The C language says nothing about overlays. This is an implementation issue that must be addressed by the overlay system designer, but it does not belong in the language specification.

> Ignoring the problems of upward compatibility and lazy programming
> styles, choice (1) is the winner. However, given that old
> programs must continue to work, Choice (4) looks like the best one.

> If the committee goes for choice (3) now, this will only encourage code
> which doesn't explicitly initialize things, and make for an even larger
> base of software to break when the next standard tries to go back to
> choice (1) or (2).

I vote for choice (3). I don't see that arrays or overlays have anything to do with the choice among these alternatives. (3) is cleanest.
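
[Editorial sketch, not part of the original post.] Choice (3) with the "type of the first member" rule for unions can be tried directly. As far as I know, the C standards from C89 on give statics the zero of their own type, and C99 spells out that a union's first named member is the one that gets it; treat that as background rather than as something this thread establishes. The names NSLOTS, union value, calls, scale, table and the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4                /* hypothetical table size */

union value {                   /* hypothetical union; first member is a pointer */
    char  *text;
    int    count;
    double ratio;
};

/* None of these has an explicit initializer. */
static int         calls;
static double      scale;
static union value table[NSLOTS];

/* Returns 1 if the scalar statics equal the zero of their own type. */
int scalars_are_typed_zero(void)
{
    return calls == 0 && scale == 0.0;
}

/* Returns 1 if slot i holds a null pointer in its first member. */
int slot_is_null(size_t i)
{
    assert(i < NSLOTS);
    return table[i].text == NULL;
}
```

On a machine whose null pointer is not all-bits-zero, a compiler following this rule has to emit initialized data for table instead of leaving it in "bss", which is the object-size cost the thread argues about.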
kpmartin@watmath.UUCP (Kevin Martin) (10/30/84)
>>I wrote:
>Doug Gwyn writes:
>> There seem to be four alternatives for what to do with externs and statics
>> which are not explicitly initialized:
>>
>> 1) Have their value be undefined (i.e. garbage).
>>    Disadvantages:
>>       Breaks many current programs.
>
>Very important from an economic standpoint.

Yes. That is why I am not advocating this alternative.

>>    Advantages:
>>       Allows the programmer to get genuine "bss" (un-initialized) space.
>>       This becomes especially important if overlays are being used,
>>       since it may be desired that an overlay be loaded without re-
>>       initializing all the variables it contains (note 4).
>Any overlay system that reinitializes variables is WRONG.

Depends what you want. But as I said, I am not advocating this alternative for C.

>> 2) Have their value be the 0 bit pattern.
>As has been pointed out in another discussion, the 0 bit pattern may
>not be appropriate for pointers on some machines.

Whether it is appropriate for some machines does not affect its existence as an alternative. The point I made was to consider whether it is ANSI's goal to support old code on new machines. I realize there are such machines; I have to write a C compiler for at least two of them. Complete with non-zero NULL pointers.

>>    Disadvantages:
>>       Arrays of unknown size containing floats, doubles or pointers
>>       cannot be initialized (note 1).
>If all elements are to be initialized to the same value, then this
>statement is false. The last explicit initializer is repeated as necessary
>to fill out the array.

Oh yeah? Since when? K&R appendix A section 8.6 paragraph #5 says missing items get zeroed.

>> 3) Have their value set to a zero of the appropriate type.
>>    Disadvantages:
>>       Requires a somewhat arbitrary rule on "what is the appropriate type
>>       for a union?"
>Easy; the type of the first member.

As I said, an arbitrary rule. I didn't say it would be difficult to come up with one. Besides, who says I always want the same variant of the union initialized? What if one array uses the 'int' element (so I want int zeros), and another array uses the pointer element (so I want NULLs)?

>
>>       The programmer cannot signal to the reader that a variable is
>>       deliberately being left un-initialized.
>Sure he can. Nothing prevents you from specifying initializers you care
>about and letting the rest default. I do this anyway.

If you do this anyway, why do you care about this particular discussion? That's fine & dandy if I can be sure that whoever wrote the code was as diligent as you and I are about initializing things we really care about (and drank enough coffee that day, etc.). I agree, he can signal deliberately *initialized* ones. But given a declaration like
        int x;
I can't tell if he really cares that there is a zero there, if he forgot to initialize it, or if he doesn't care what is in it. This is what I mean by deliberate *lack of initialization*.

It is (partly, at least) a statement of my opinion (and yours, judging from your comment above) that a programmer should always use explicit initializers if the initial value matters. If that practice were followed by everyone, this discussion would hardly be as exciting. Unfortunately, it isn't always possible, if you have a union or an array.

>>       Arrays of unknown size cannot be initialized if they contain
>>       non-zero values.
>See above.

Yes, please do.

>
>> 4) A combination of (1) and (2): Un-initialized variables start off as
>> zero in the first overlay that is loaded. Subsequent overlays get whatever
>> was left in the storage location by previous overlays.
>The C language says nothing about overlays. This is an implementation
>issue that must be addressed by the overlay system designer, but it
>does not belong in the language specification.

The language should not make the "overlay system designer"'s job impossible. The language spec doesn't even have to mention overlays. It merely has to say what happens to un-initialized variables.

>
>> Ignoring the problems of upward compatibility and lazy programming
>> styles, choice (1) is the winner. However, given that old
>> programs must continue to work, Choice (4) looks like the best one.
>
>> If the committee goes for choice (3) now, this will only encourage code
>> which doesn't explicitly initialize things, and make for an even larger
>> base of software to break when the next standard tries to go back to
>> choice (1) or (2).
>
>I vote for choice (3). I don't see that arrays or overlays have anything
>to do with the choice among these alternatives. (3) is cleanest.

As far as 'cleanliness' goes, I disagree, but I admit that this is just an opinion.

What I was trying to state is that there are two ways to solve the problem of hostile machines (without them, this problem wouldn't exist). One method is better CPP; it requires extensions to CPP, and the programmer will have to type (as in keyboard, not as in typedef) more, but this approach is far more powerful (for going *beyond* a minimal solution to the original problem). It even lets you initialize all the rest of the array elements to a non-zero value (as you seem to think C can already do).

The other method is this implicit zero-of-the-right-type initialization. It solves *only the immediate problem* and is otherwise a dead end.

These two solutions are not incompatible, but implementing both of them (eventually...) will give no advantages over just the CPP solution, but will have the disadvantages of both.

I doubt that any CPP changes are forthcoming in this standard, but I don't like stopgap solutions, either. I would rather see it stay as is for now (zero bit pattern).

Bear in mind that I use such hostile machines in my regular work, and have encountered no problems with the current rules (other than not being able to tell deliberate un-initialized from deliberate but implicit zero, a problem which is not solved by typed-zero initialization).

                Kevin Martin, UofW Software Development Group
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/02/84)
> >... The last explicit initializer is repeated as necessary
> >to fill out the array.
> Oh yeah? Since when? k&r appendix A section 8.6 paragraph #5 says missing
> items get zeroed.

Well, I tried this on a UNIX System V PCC and what Kevin says does describe its behavior. I wonder where I got the other idea (which I suggest is a better rule, but incompatible with current behavior).

> ... Besides, who says I always want the same variant of the union
> initialized. What if one array uses the 'int' element (so I want int zeros),
> and another array uses the pointer element (so I want NULL's)?

In many years of C programming I have never had such a requirement. Unions are pretty much a kludge for things like memory allocation. I can think of a general way to specify this type of initialization:

        union   {
                int     a;
                double  b;
                char    *c;
        }       foo =   {
                        ,
                        3.14159,

                };

using explicitly empty members in the initializer (this would apply to structs as well as unions). The only incompatibility with current C that I see here is the slightly different meaning of the final , in the initializer list. This solution avoids the ambiguity of using just a type specifier (which would work for unions but not for structs) and having to supply explicit member names for initializers (which calls for a significant change to existing compilers).

> >>    The programmer cannot signal to the reader that a variable is
> >>    deliberately being left un-initialized.
> >Sure he can. Nothing prevents you from specifying initializers you care
> >about and letting the rest default. I do this anyway.
> I agree, he can signal deliberately *initialized* ones. But given a declaration
> like
>       int x;
> I can't tell if he really cares that there is a zero there, if he forgot
> to initialize it, or if he doesn't care what is in it...

Unless you outlaw
        int x;
altogether, or require that there be an explicit initializer SOMEWHERE among all the load modules (à la Whitesmiths), you still won't be able to tell if he wanted the default initialization according to the rules (assuming a "non-junk" rule), if he doesn't care, or if he forgot. Using the same method I suggested above for struct/unions,
        int x = { };
is an explicitly empty initialization showing that the programmer has thought about the matter and decided that he didn't care what's there.

> The language should not make the "overlay system designer"'s job impossible.
> The language spec doesn't even have to mention overlays. It merely has to
> say what happens to un-initialized variables.

I agree that something definite should be said about the initial contents of un-initialized variables. If an overlay system designer finds that there is no practical way of avoiding clobbering variables (by reloading their initial values), then he has to give up the idea of transparent overlay facilities, since it is clear that non-auto/register variables are intended to retain whatever is stored into them until the program explicitly stores something else there. This is perfectly reasonable and any violation needs to be announced loudly to the user of that particular overlay system (which should not get in the way when the user elects NOT to use overlays). Very few overlay systems (including the one Ron Natalie and I did for JHU/BRL PDP-11 UNIX) are COMPLETELY transparent at the source code level, although that is certainly a desirable goal.

> Bear in mind that I use such hostile machines in my regular work, and
> have encountered no problems with the current rules (other than not
> being able to tell deliberate un-initialized from deliberate but implicit
> zero, a problem which is not solved by typed-zero initialization).

If the rule is that uninitialized data is filled with proper-typed zero, then it seems that you wouldn't have to care which was intended (since the value of deliberately un-initialized "don't care" storage cannot be correctly used until it is stored into). The problems would appear to be due to trying to follow different rules, for example using specially-tagged "illegal data" values or "not defined" memory manager traps for uninitialized data instead of zero. By the way, I think we should beat on the hardware designers who keep dreaming up these "helpful" features without checking with compiler/OS implementers to see what their effects will be. If possible, buy more reasonable hardware and TELL the loser of the competition just what's wrong with his fancy design.

The reason for zero bit pattern is clearly because that is what UNIX does automatically for "bss" storage. Not all OSes allow one to use tricks like this, although the C runtime startoff module could be a fast loop to initialize "bss" to a zero bit pattern. Typed zero, though, in general would have to be initialized by the compiler or by a rather smart link editor (I can think of some other, incredibly ugly, kludges).

I think I will modify my position: IF uninitialized data HAS to have some valid value, then I would (still) recommend 0 of the appropriate type rather than a 0 bit pattern. This seems to be compatible with currently portable C code. However, if one is willing to drop the compatibility requirement (apparently the ANSI committee is not), then I would have uninitialized data contents UNKNOWN, possibly trap-causing, if they are used before being defined. That would help stamp out sloppy coding practices (nothing will completely solve this problem).
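
[Editorial sketch, not part of the original post.] The "fast loop" a startup module might use to give "bss" a zero bit pattern could look roughly like the function below. This is only an illustration under assumptions: the name clear_region is hypothetical, and in a real startup module the bounds would come from the linker rather than from the caller.

```c
#include <assert.h>

/* Store a zero byte into every location in [start, end).
   In a real startup module the bounds would be supplied by the linker;
   here they are ordinary pointers so the loop can be tried anywhere. */
void clear_region(unsigned char *start, unsigned char *end)
{
    assert(start <= end);
    while (start < end)
        *start++ = 0;
}
```

A loop like this knows nothing about types, so it yields the zero bit pattern only. Typed zero on a machine with a non-zero null pointer would need per-object information, which is why the post hands that job to the compiler or link editor.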
kpmartin@watmath.UUCP (Kevin Martin) (11/03/84)
>               foo =   {
>                       ,
>                       3.14159,
>
>               };
>using explicitly empty members in the initializer (this would apply to
>structs as well as unions). The only incompatibility with current C
>that I see here is the slightly different meaning of the final , in
>the initializer list.

This implies that the ordering of union elements is significant, which is currently not the case. But the 'zero-according-to-the-type-of-the-first-union-element' rule also does this. I would still prefer to name the element when I am explicitly initializing it, but that is a different discussion.

>If the rule is that uninitialized data is filled with proper-typed zero,
>then it seems that you wouldn't have to care which [don't-care about the
>value vs. wanting a zero] was intended...

When I go to modify code which was written by someone else (or by myself in the long-forgotten past), I do care. If it was deliberately left un-initialized, but I don't realize this, and I now give it an explicit initializer (non-zero) for some purpose, it will have no effect, because of the initialization code which I failed to look for. And I can't trust other people to have explicitly initialized *every* variable they cared about.

>The problems would appear to
>be due to trying to follow different rules, for example using specially-
>tagged "illegal data" values or "not defined" memory manager traps for
>uninitialized data instead of zero. By the way, I think we should beat
>on the hardware designers who keep dreaming up these "helpful" features
>without checking with compiler/OS implementers to see what their effects
>will be. If possible, buy more reasonable hardware and TELL the loser
>of the competition just what's wrong with his fancy design.

I always did like simple machines like the Nova... there were so few ways of doing anything that most of the code was already optimal! Unfortunately, in the cases I am working with, guess who is paying me (indirectly). At least I don't have the funny undefined values and tags you mention.

>
>I think I will modify my position: IF uninitialized data HAS to have
>some valid value, then I would (still) recommend 0 of the appropriate
>type rather than a 0 bit pattern. This seems to be compatible with
>currently portable C code. However, if one is willing to drop the
>compatibility requirement (apparently the ANSI committee is not), then
>I would have uninitialized data contents UNKNOWN, possibly trap-causing,
>if they are used before being defined. That would help stamp out sloppy
>coding practices (nothing will completely solve this problem).
>Doug Gwyn

That seems to reflect my feelings for the *eventual* resolution of this question. Once having chosen one of these paths, there will be no backing out, which is why I am loath to choose either of them in the first standard.

I think we've run out of things to discuss... (I hear the entire net breathe a sigh of relief)

                Kevin Martin, UofW Software Development Group
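
[Editorial sketch, not part of the original post.] Naming the union element being initialized, as preferred above, was not possible in the C of 1984. The fragment below assumes a C99 or later compiler, where designated initializers provide it. The kind tag, the struct tagged wrapper and the function tagged_as_double are hypothetical additions for illustration; only the members a, b, c and the name foo echo the thread.

```c
#include <assert.h>
#include <stddef.h>

enum kind { KIND_INT, KIND_DOUBLE, KIND_STRING };

struct tagged {
    enum kind kind;             /* records which union member is in use */
    union {
        int    a;
        double b;
        char  *c;
    } u;
};

/* C99 designated initializers name the member being initialized,
   so the order of the union members no longer matters here. */
static struct tagged foo = { .kind = KIND_DOUBLE, .u = { .b = 3.14159 } };

/* Return the double stored in t; t must currently hold a double. */
double tagged_as_double(const struct tagged *t)
{
    assert(t != NULL && t->kind == KIND_DOUBLE);
    return t->u.b;
}
```

With the member named in the source, a reader can see which variant was intended without knowing the declaration order, which addresses the objection to the "first member" rule for explicit initializers, though not for implicit ones.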
henry@utzoo.UUCP (Henry Spencer) (11/06/84)
> ... Once having chosen one of these paths, there will be no backing
> out, which is why I am loath to choose either of them in the first standard.

I hate to mention this, Kevin, but the first standard was the first edition of DMR's C Reference Manual. The ANSI standardization process is different only in degree. There is some cause for debate about which of the two paths should be chosen, but the decision to choose one of the two was made perhaps a decade ago. Like C's revolting switch syntax, it's far too late to reconsider this now.
--
                Henry Spencer @ U of Toronto Zoology
                {allegra,ihnp4,linus,decvax}!utzoo!henry