jagardner@watmath.UUCP (Jim Gardner) (08/21/86)
The following (huge) document comments on the latest proposal for a C standard. It is paginated, but does not contain tabs.

COMMENTS ON THE DRAFT PROPOSED STANDARD
(Dated July 21, 1986)

- prepared by -
The Software Development Group
University of Waterloo
Waterloo, Ontario

Our comments are based on the _Draft Proposed American National Standard for Information Systems -- Programming Language C_, Doc.No. X3J11/86-104, dated July 21, 1986. In addition, we make a number of comments on the Rationale for the standard, Doc.No. X3J11/86-099, dated July 7, 1986.

This document supersedes previous submissions from the Software Development Group, which were submitted in comment on previous drafts of the standard.

Generally, we will use the same order of presentation as the standard itself. Our section headings correspond to the appropriate sections in the standard. However, we will begin with some general observations.

General Note 1: Reserved Words:

The standard defines many new symbols, particularly #defined names in header files. These are effectively reserved words, since programs that use the symbols for other things will get into trouble sooner or later. We count a total of 255 effectively reserved words:

    32  language keywords
    44  implementation limits
    179 library-related names

In contrast, Cobol only has 227 reserved words!

To avoid a jungle of symbols that are effectively reserved, we strongly urge that the committee follow one of its own principles: symbols that begin with an underscore are not for the programmer's use. This is a simple rule that gets around most of the pitfalls. _All newly introduced symbols should begin with an underscore._ This means that the symbols in <limits.h> should be

    _CHAR_BIT
    _CHAR_MAX
    _SCHAR_MAX
    /* etc. */

- 1 - University of Waterloo August, 1986

The same holds for other new symbols: "size_t" should become "_size_t", "ptrdiff_t" should become "_ptrdiff_t", and so on. Of course, the old stand-bys like NULL and "errno" will stay as they are, even if we might wish differently.

Note that we would recommend against reserving the prefixes to_, SIG, str, mem, and is. This sort of rule would prevent implementations from supporting common operations that aren't in the standard. For example, it actively rules out "isascii" and "isodigit" since these are not recognized by the standard. This will break a great deal of code. Besides, the fewer "reserved word" rules a programmer has to remember, the better.

General Note 2: Portability:

In Section 1.2, the designers state the principle

    Make it fast, even if it is not guaranteed to be portable.

We do not argue with this principle in general, but we think it should be counterbalanced by other considerations. When there are only a few popular alternative behaviors, the standard should provide both a _fast_ operation (with implementation-defined behavior) and one or more possibly slower operations with well-defined behavior. The whole point of a language standard is to allow program portability. The standard should ensure that there is _some_ way to write a portable program.

For example, consider the ">>" operator. The standard does not indicate whether ">>" shifts arithmetically (propagating the sign bit) or logically (inserting zeros). What this effectively states is that the operation is only defined on a subset of the possible range of operands (i.e. when the operand to be shifted is positive). It is not rigorously defined outside the range, but implementations are expected to support the operation outside the range. This sort of situation crops up in many places in the standard.
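To make the ">>" example concrete, here is a minimal sketch of how both behaviors can be obtained with well-defined results, whatever the implementation does for negative operands. The names "arith_rs" and "logical_rs" are hypothetical and come from neither the standard nor its Rationale; the sketch assumes the shift count is smaller than the width of an int.

```c
/* Hypothetical helpers, not part of any standard header. */

/* Arithmetic right shift: returns floor(a / 2^n) for every int a.
 * Only non-negative values are ever shifted, so the result does not
 * depend on how the implementation shifts negative operands. */
int arith_rs(int a, unsigned n)
{
    if (a >= 0)
        return a >> n;
    /* a < 0: -(a + 1) is non-negative and cannot overflow. */
    return -((-(a + 1)) >> n) - 1;
}

/* Logical right shift: zeros are inserted at the top. */
unsigned logical_rs(int a, unsigned n)
{
    return (unsigned)a >> n;
}
```

With these, a programmer who needs a particular kind of shift asks for it by name, and a plain ">>" is left for cases where the operand is known to be non-negative.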

In order to make it possible to write portable programs, we suggest that the standard should provide new extended-range operators corresponding to each limited-range operation. The standard can allow ">>" to work in an implementation-defined way outside of its defined range, but it should define additional functions, macros, or operations to handle the full range of operands.

For example, you might have

    _ARITH_RS(A,B)

which works like "A>>B" when A is positive, but which always performs an arithmetic shift when A is negative. If the programmer wants to be sure of a _logical_ right shift, the operand to be shifted can be cast to unsigned. In this way, the programmer could always dictate whether an arithmetic or logical shift was desired.

A second way to make portable programs possible is to make appropriate definitions in <stdefs.h> or some other header. In most cases where behavior is implementation-defined, there are a limited number of possibilities. For example, an implementation could define a symbol _ARITH_SHIFT to indicate that right shifts were done arithmetically and _LOG_SHIFT to indicate that right shifts were done logically. With appropriate #ifdef directives, source code could be adapted to either possibility. Similar symbols would tell how "A%B" works when B is negative, how integer division works in the same situation, and so on. Note that this technique adds _no_ extra expense at execution time to determine how an implementation behaves.

General Note 3: The 90% Rule:

The portability of a program is influenced by two factors: how it uses C code, and how it uses the library functions. If a program is ported from system A to system B, the implementation on B will usually report places where code is used incorrectly. However, it usually will not report situations where B's implementation of a library function differs significantly from A's implementation.
Thus, compatibility of library functions is of major concern in porting programs, and therefore in design of a standard for writing portable programs.

Our philosophy is that a function on system A should not have the same name as a function on system B, unless the A function is at least 90% the same as the B function. If the two functions are not almost identical in functionality, pretending that they _are_ the same by giving them the same name is just asking for trouble.

In the context of the C standard, the 90% rule suggests that the standard library functions should behave in a manner that is almost identical on all systems. It is a mistake, for example, to make the definition of a binary file loose enough to encompass a widely divergent set of I/O devices and file formats. We would rather see the definition restricted to allow operations that could reasonably be regarded as portable, and nothing more. If a particular system had special file formats that needed to be supported, the implementation on that system could provide additional I/O routines to deal with such formats.

If a program is written using special routines for system-dependent I/O, porting the program is actually simpler. When the program is taken to a new system, the C implementation will issue diagnostic messages indicating the special I/O routines that are not available on the new system, and the programmer finds out what has to be changed. When porting a program written only with "standard" routines, the programmer must laboriously track down system dependencies that were disguised by using the "standard" routines and this is usually a great deal more work.

In general, then, we believe that the standard library should _not_ be designed to conceal the system dependencies that exist on a particular machine.
Instead, it should provide support for features that are common to _all_ machines, leaving it up to the individual implementation to support dependencies.

General Note 4: The Correct Answer:

At times, efficiency has been put ahead of correctness. A good example of this occurs with mixed signed and unsigned operations. Consider the following code.

    short i;
    unsigned short u;
    ...
    if ( i < u ) ...

Since the committee allows this to result in either a signed or an unsigned comparison (depending on whether sizeof(short) is less than sizeof(int)), it is possible that a negative value of "i" could be found greater than a positive value of "u". Correctness has been sacrificed to efficiency.

This situation should be avoided. If a signed integer is negative, it should be less than all unsigned integers. All other considerations of lengthening or converting arguments are secondary. Note that efficiency doesn't enter into this -- a compiler or interpreter has a much better chance of generating efficient code than a user checking for the situation explicitly by writing if statements or conditional expressions.

If a programmer really wants the possibility of a negative integer being larger than an unsigned one, he or she can always use explicit casts, as in

    if ( (unsigned)i < (unsigned)u ) ...

Similarly, if a programmer knows that a particular form of comparison is more efficient than the default, he or she can always use explicit casting to ask for the more efficient comparison method. When it _is_ possible for the programmer to control efficiency, why give a more naive programmer the wrong answer?

General Note 5: Existing Implementations:

The Rationale states that existing code is important, but existing implementations are not.
We agree with the principle, but must point out that many popular C programs are intimately connected to a particular implementation. Adopting practices contrary to the way popular implementations work (e.g. the UNIX C compilers) may indeed break programs in subtle ways.

For this reason, our comments sometimes state, "Implementation X does this differently." In such cases, we are not saying that the standard should be changed to do things X's way; we simply want to point out that some important compiler does not behave in the given manner, and one should expect some code to break as a result.

General Note 6: Meaningful Sloppy Code:

In a few instances, the designers have made it necessary for conforming implementations to support "sloppy" programming practices. For example, the following are supposed to be equivalent.

    extern int i;
    int extern i;

When a sloppy practice is well-established, the designers are justified, because existing programs should continue to work. However, the above practice is virtually unknown (at least to the people we have talked to and in the programs we have examined), and requiring all implementations to support it is surprising.

More to the point, making sloppy code meaningful has undesirable side effects. A user who accidentally writes such code receives no diagnostic message, because the code is correct...even when the code is probably not what the user intended to write. The program may behave in an unexpected way because a typing mistake is accepted.

In addition, the implementation that is forced to accept sloppy code has more difficulty generating error messages. It has less chance of identifying precisely where the code went wrong, because the programmer has so much more leeway. Consequently, the diagnostic facilities of the implementation are degraded for _all_ users, in order to support the sloppy few.

The standard should not require implementations to support "loose" language constructs that are seldom used. Obviously, there are instances where the designers must decide whether a construct is or isn't used and different people may have different opinions on the matter. Still, the basic principle should be, "No sloppiness, unless required by common practice."

General Note 7: Rationale:

The purpose of the Rationale document should be to explain why particular decisions were made. All too often, the Rationale is used to explain what the standard says. Obviously then, the standard itself should be made more clear, with more examples and illustrations.

Specific Section Comments:

The rest of this document talks about specific sections of the standard and the Rationale.

1.6 Compliance:

A conforming freestanding implementation should provide the standard header <stddef.h> in addition to <limits.h> and <float.h>. The <stddef.h> should _not_ include a declaration for "errno". For more on "errno", see our comments on Section 4.1.1.

2.1.1.2 Translation Phases:

Since the process of linking translated source files is described in this section, one might believe that linking must take place in the translation environment. The standard should explicitly state that linking can take place in either the translation environment or the execution environment (or in some other environment, for that matter).

2.2.4.2 Numerical Limits:

We do not understand why so many #defined names are missing their vowels. For example, why use SHRT instead of SHORT? The difference in keystrokes is minimal, and the standard guarantees that #defined names can be 31 characters long. This criticism applies to many of the names chosen by the designers.

Technically speaking, the definition of FLT_ROUNDS is incorrect.
The beginning of the section states that each macro must be at least the given value. The given value for FLT_ROUNDS is 0, but the value -1 is also said to be meaningful.

Also, the alternatives for this value are "rounds", "chops", and "indeterminate". This overlooks the fact that "chops" could mean "truncate towards zero" or "truncate towards negative infinity". We conclude that there should actually be four alternatives for FLT_ROUNDS, not just three.

3.1.2.1 Scopes of Identifiers:

The Rationale says that the behavior is undefined if you use an identifier outside its scope. The standard itself says nothing about this possibility.

3.1.2.2 Linkages of Identifiers:

According to the rules for declarations that include the keyword extern, it is not possible to declare

    extern i;
    ...
    static i;

inside a file and have the first declaration of "i" refer to the static (internal linkage) "i". The Rationale says this decision was made in order to allow one-pass compilers, but in fact, one-pass compilers are possible without this restriction. All that the one-pass compiler needs to make this work is a loader with a bit of intelligence.

We believe that this is contrary to the principle stated in Section 1.2 of the Rationale:

    Existing code is important, existing implementations are not.

Ruling out the above construct will break a good many existing UNIX programs, since the existing UNIX compilers allow extern declarations to be resolved to static objects that have file scope and internal linkage. Therefore, we believe the above construct should be made legal.

We note also that if the extern definition occurs in a function and the static outside a function, we have a different situation. For example, consider

    f()
    {
        extern int i;
        ...
    }
    static float i;

According to the second paragraph of the Semantics section in 3.5.1, an extern definition inside a function refers to an object that is defined somewhere with file scope. It cannot refer to the static definition (because that comes later), so it must refer to some definition with external linkage. As soon as the static declaration is encountered however, all subsequent references in the file refer to the static variable. This is odd, to say the least.

3.1.2.3 Name Space of Identifiers:

The Rationale states that the intention is to _permit_ as many separate name spaces as possible. In fact, we believe it _requires_ as many separate name spaces as possible.

The standard says that all tags (structure, union, and enum tags) should be folded together. We don't see why this is necessary. Distinguishing the different types of tags will not break any existing programs, but folding them together may break programs that were written for an implementation that _did_ distinguish the different tags.

3.1.2.5 Types:

An unsigned and signed integer take up the same amount of memory. The standard should also state they have the same alignment requirements. This assumption is true of all machines we know, and allows simpler coding of portable programs.

There is also the implication at various points in the standard that an integral zero consists of all 0-bits. For example, Footnote 71 (to "calloc") implies that every zero except pointers and floating point types consists entirely of 0-bits. Furthermore, the range of values available to the unsigned type overlaps the range of non-negative values for the signed type. This argues that the document should state that all values which can be represented by both signed and unsigned integers (i.e. the non-negative integers that can be represented by signed int) have the same bit pattern.
This is true for all common representation schemes: one's complement, two's complement, and signed magnitude. We believe that the signed-unsigned algorithm stated in 3.2.1.2 tacitly assumes that this equivalence is true. The equivalence also legitimizes many of the bit operations that take place inside existing C programs.

The explanation of pointer types should be considerably expanded. Our reading of the standard shows several assumptions about various pointer types that are never stated explicitly. We believe that users would understand the language better if these assumptions were stated explicitly in this section. For example, footnote 36 to section 3.5.2.2 assumes that the alignment and size of all pointers to structure/union types will be the same. We believe this assumption is valid, but it should be stated explicitly in 3.1.2.5.

Similarly, if A is a pointer to type T, it should be true that

    (char *) (A + 1) == ((char *) A) + sizeof(T)

(If this were not true, "malloc" would be in serious trouble.) This should be stated explicitly.

As another example, given that the alignment of signed and unsigned integers is equal, and given that arrays are made up of contiguous objects, a statement like the following is true.

    int *p;
    (int *) ( (unsigned *)p + 10) == p+10

It would be helpful if the standard or the Rationale actually pointed this out.

Also, we cannot find an explicit definition of the phrase "pointer to object". We assume that it means a pointer type which is not a pointer to a function or void, but we could not find such a definition.

3.1.3.2 Integer Constants:

According to the standard, an unsuffixed octal or hex integer constant can be interpreted as either signed or unsigned. Certain constants will be interpreted as signed on some machines and unsigned on others, because of differences in machine word size. Due to the drastic effect an unsigned operand may have (e.g. in a comparison operation), there must be some way to ensure that a number is taken as signed. We suggest an "s" suffix.

Since the sign is not part of the definition of a constant, the "number" -32768 will be treated as a long integer, even though it fits into 16 bits. This will surprise programmers who use it as the smallest possible short integer.

3.2.2.1 Arrays, functions, and pointers:

The second paragraph of this section states

    Except when used as an operand that may or shall be a function locator, an identifier declared as "function returning type" is converted to an expression that has type "pointer to function returning type".

The way we read this, it appears that we can say something like

    extern int f();
    (*f)();

Since the "*" operator may not take a function locator, the function locator is regarded as a pointer, and therefore the "*" operator accepts it. By applying recursion, it then seems legal to say

    (**f)()
    (***f)()
    (****f)()

and so on.

The Rationale should point out that function pointers cannot be cast into other pointer types, and that the only thing that can be assigned to a function pointer is a pointer of the same type or (void *) 0.

We believe the document has a built-in assumption that any pointer cast to (void *) yields a unique value. (If this assumption is not true, a function like "memcpy" could not work.) We think this assumption should be stated explicitly.

3.3.2.2 Function Calls:

The second paragraph of the Semantics section should be changed to the following:

    If the postfix expression preceding the parentheses in a function call consists solely of an identifier, and if no declaration is in scope for this identifier, the identifier is implicitly declared exactly as if, in the innermost block containing the function call, the declaration

        extern int identifier;

    appeared.

This prevents implicit declaration when the function call has a form like

    (f)()

or

    (*f)()

3.3.3.2 Address and Indirection Operators:

Consider an array declared with

    int A[10];

By 3.3.3.4, we have

    sizeof(A) == sizeof(int) * 10

We also have

    (char *)(A+1) == (char *)A + sizeof(int)

What is the value of

    (char *)( (&A) + 1)

Is it

    ( (char *)A ) + sizeof(int)

or

    ( (char *)A ) + 10 * sizeof(int)

3.3.3.2 implies the second (i.e. that &A is a pointer to an array of 10 ints), but does not state it precisely.

3.3.4 Cast Operators:

This section says that a pointer to type char has the least strict alignment. It should also make some comment saying that a pointer to void is the most _general_ pointer type, and therefore shares the least strict alignment with char.

3.3.6 Additive Operators:

A very close reading of this section indicates that arithmetic with (void *) pointers is illegal. However, the point is very subtle and could easily be missed. We suggest that it be emphasized. The same point should be made in 3.3.8 (on relational operators).

3.3.15 Conditional Expression:

The standard states that you can have expressions of the form

    i ? p : v

where "p" is a pointer type and "v" is a (void *). The result of this expression is said to be a (void *). It seems to us that this is the wrong way around. Instead, the result of the expression should have the type of the pointer "p". For example, consider

    char *cp;
    int *ip;
    ...
    ip = cp ? cp : malloc(10);

Since the result of "malloc" is (void *) the result of the right hand side of the assignment will be (void *). This will be quietly assigned to "ip", even if the actual value of the expression is "cp". To avoid such quiet problems, the result should be the pointer type that is not (void *).
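The quiet assignment described for 3.3.15 can be written out as a complete function. This is only a sketch, assuming a compiler that gives the conditional expression the type (void *) as the draft describes; the function name "pick" is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical function: stores the result of the conditional
 * expression in an int pointer, exactly as in the fragment above. */
int *pick(char *cp)
{
    int *ip;

    /* One operand is (char *), the other (void *); the result is
     * (void *), so this assignment needs no cast and draws no
     * diagnostic even when the value really is "cp". */
    ip = cp ? cp : malloc(10);
    return ip;
}
```

Under the rule we propose, the expression would have type (char *), and the assignment to "ip" would require an explicit cast.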

3.3.16.1 Simple Assignment:

The standard must be more clear on assignments of "pointers to functions". Suppose A and B are both pointers to functions returning int but the functions have different prototypes (or one function has a prototype and the other doesn't). Is A=B legal? Guidelines for compatibility between function pointers should be established. We believe the guidelines should follow the rules for type equivalence given in 3.5.5.

The standard says that assigning overlapping objects to one another is undefined (and therefore illegal). While we recognize that there are many instances when assigning overlapping objects to one another cannot be done safely (e.g. when objects are referenced with pointers), there are some instances where we believe it is a mistake to say the operation is illegal. In particular, many of our own C programs use the operation

    union {
        float f;
        int i;
    } u;
    ...
    u.f = u.i;

According to the standard, this operation will become illegal.

We might point out the odd effect that

    u.f = (float) u.i;

would still seem to be legal, even if the assignment without the cast is not. The cast operation presumably takes the value of "u.i", converts it, and stores it in some temporary storage, so assigning it to "u.f" causes no overlap. If this really is intended, the standard or the Rationale should comment on it.

The difference between char, unsigned char, and signed char must be discussed. If a program declares

    char *p;
    unsigned char *u;
    signed char *s;

is it possible to make assignments like

    p = u;  u = p;
    u = s;  s = u;
    p = s;  s = p;

This question arises because char may be signed in some implementations and unsigned in others. As a result, some of the above assignments will be valid on some machines but not on others.
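Until the rules are settled, a program can at least find out which way plain char goes and avoid depending on it. The following is a minimal sketch, assuming the CHAR_MIN macro found in present-day <limits.h> and 8-bit characters; both function names are hypothetical.

```c
#include <limits.h>

/* Hypothetical helper: nonzero if plain char behaves as a signed
 * type on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: reads a character through an
 * (unsigned char *) so the result is never negative, whichever way
 * plain char goes.  The cast is explicit because the uncast pointer
 * assignment may be rejected. */
int char_code(const char *p)
{
    return *(const unsigned char *)p;
}
```

The explicit cast in "char_code" is exactly the kind of clutter that the relaxed assignment rules would make unnecessary.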

We suggest that the assignment rules be changed to allow the (uncast) assignment of char to unsigned char and vice versa. The same should apply to pointers to these types.

Note that people writing portable programs will never use the char type; they will use signed char when they are using the value arithmetically and unsigned char when they are using the character as a character. Using the plain char type will be non-portable. However, this runs into other problems. In particular, suppose someone writes

    unsigned char a[] = "string";
    unsigned char *cp;
    ...
    cp = "abc";

These operations will work on a system where char is unsigned, but not if char is signed. To make such operations possible, it must be possible to intermix plain char and unsigned char types in the ways shown above.

3.3.16.2 Compound Assignment:

According to the standard, an operation like

    int i;
    i /= 3.5;

would be performed using floating point division. However, the Berkeley C compiler uses integer division. For this reason, this should be marked as a quiet change.

3.3.17 Comma Operator:

The standard states that the comma operator is a sequence point, but it is not clear what point of the comma operation is _the_ point. For example, consider the expression

    A = ((B=1),B) + ((B=2),B) + ((B=3),B);

What should A equal? (We note that the Berkeley C compiler assigns the value 9 to A in the expression above.) Does the sequence point take place at the comma (i.e. when only the left half of the expression has been evaluated) or does it take place when both sides of the comma have been evaluated? Are there actually two sequence points? The same sort of problem obviously occurs with

    func( (b=1,b) , (b=2,b) );

In fact, the question generalizes. Several operators (e.g. "&&", "||") are said to be sequence points, when the operation actually has several "points" to it. The standard should be more explicit, e.g.

    There is a sequence point after the evaluation of the left operand.

or

    There is a sequence point after the evaluation of the result of the operator.

The second paragraph of 3.3 makes some effort to address this problem, but it is too nebulous to be much help.

3.4 Constant Expressions:

We point out that if the "offsetof" macro is implemented as suggested in the Rationale, it will not be a constant expression according to the rules of this section. Since we like the suggested implementation, we suggest the definition of constant expressions be modified.

In particular, the standard should say that the implementation's behavior is _undefined_ if a constant expression does not comply with the given rules. This gives an implementation the freedom to support an expanded definition of constant expressions if desired.

3.5 Declarations:

For the sake of readability, it should not be legal to enclose an entire declarator in parentheses, as in

    int (x);

A function prototype containing such a declaration is very deceptive. For example,

    int f(int (x));

means that "f" has an integer parameter named "x"...unless "x" happens to be the name of a type as declared in a typedef statement, in which case the argument of "f" is a function that takes an argument of type "x" and returns an integer. Confusion can be avoided if such extraneous parentheses are not allowed.

We were surprised that the standard allows storage class specifiers to be intermixed with type specifiers, as in

    const int extern long a;

We were even more surprised to discover that the Berkeley C compiler already supports such constructs. We don't really understand why it is necessary to support this sort of thing -- we would be surprised if any existing programs make use of it.
An implementation that accepts this kind of code has a good deal of trouble generating comprehensible error messages, since it cannot be so rigid in its approach to parsing. _All_ programmers will receive poorer diagnostic messages in the interests of catering to the very few programmers who would want to ignore very well-established code-writing conventions.

3.5.1 Storage-Class Specifiers:

The semantic description of the register storage class should be reworded to the following:

    A declaration with storage-class specifier register is an auto declaration with a suggestion that the object will be frequently accessed, and thus that the compiler should attempt to speed up access to the object. One restriction applies to an object declared with storage-class specifier register: the unary "&" (address-of) operator must not be applied to it. Since the program cannot legitimately generate a pointer to an object with the storage-class specifier register, a frequently-used optimization is to keep the object in fast storage which cannot be accessed through a pointer, e.g. a hardware register.

By rephrasing the definition this way, you give the register storage-class more meaning. In particular, you open the door to compilers that perform global optimizations using the fact that register variables can never have their values changed by indirection through a pointer. The compiler can optimize the use of register variables because it can always know when the register values are used and changed.

As currently defined in the standard, register is an all-or-nothing optimization. We feel that machines which can't give "all" (due to a shortage of registers) shouldn't be forced to give "nothing".
The standard might also make some statement on what implementations should do if there are several rrreeegggiiisssttteeerrr declarations and only some of these can be used for optimizations. We propose that the standard say that the declarations which come lexically first will be optimized first. This gives a programmer some way of indicating preference of optimization. _3._5._2._1 _S_t_r_u_c_t_u_r_e _a_n_d _U_n_i_o_n _S_p_e_c_i_f_i_e_r_s: We suggest that the definition for "struct-declaration" be changed to - 16 - University of Waterloo August, 1986 struct-declaration: type-specifier-list struct-declarator-list; struct-or-union-specifier; The added possibility lets you define an unnamed element of this type. The sub-elements will appear as first-level ele- ments in the enclosing structure. Using this scheme, the example in 3.3.2.3 could become struct { int type; union { int intnode; double doublenode; }; } u; /* ... */ u.type = 1; u.doublenode = 3.14; /* ... */ if (u.type == 1) /* ... */ sin(u.doublenode) /* ... */ _3._5._2._2 _S_t_r_u_c_t_u_r_e _a_n_d _U_n_i_o_n _T_a_g_s: The form struct y; now has a special meaning. Suppose we define typedef struct y z; Does the code z; have the same effect as struct y; We note that you can use pointers to structures without having to define the structure itself. Do you ever have to define a structure's contents in a particular source file? _3._5._2._3 _E_n_u_m_e_r_a_t_i_o_n _T_y_p_e_s: If we have - 17 - University of Waterloo August, 1986 enum E1 { e1 } var; enum E2 { e2 }; is it legal to say var = e2; The answer is almost certainly yes...but we would be happy if we were allowed to give a warning or an error message for the operation, if there is no explicit cast. Similarly, we would like to give a warning for things like var = e2 + 1; _3._5._2._4 _c_o_n_s_t _a_n_d _v_o_l_a_t_i_l_e: According to our reading of the standard, the following code is illegal. f1() { extern const x; ... } f2() { extern x; ... 
    }
    int x;

On the other hand, it would be very convenient if one function could declare an object const while another did not. This would let a function indicate when it did not intend to change the value of an external object, and thereby allow local optimizations. The actual definition of the object would establish whether or not the object really was const (and therefore suitable for allocation in read-only memory). The same principle would hold for volatile.

3.5.3.3 Function Declarators:

The last sentence of the Semantics section reads

    If the list is empty in a function declaration that is part of a function definition, the function has no parameters.

What does this say about a function definition like

    int (*F(int a))() {...

Since the empty identifier list appears as part of a function definition, the function pointed to by F's return value takes no arguments. This rules out returning a pointer to an arbitrary integer function.

3.5.5 Type Definitions and Type Equivalence:

The standard should discuss structs that have the same tag but different internal structures.

We also have some questions about the situation where a typedef declares a named type with the same name as a variable defined in an enclosing scope. Inside the scope of the named type, is the variable completely invisible? Or can the variable be visible in contexts where the compiler can clearly determine that the named type is not valid?

As another questionable construction, consider the code

    typedef int X;
    typedef X *Y;
    f(void) {
        typedef char X;
        Y b;

The definition of X inside the function clearly supersedes the external definition of X. However, it is not clear if "b" is a pointer to an integer (using the definition of X at the time Y was defined) or a pointer to a character (as X was defined at the time "b" was declared).
An even more subtle situation is

    struct X { /* definition 1 */ };
    typedef struct X *Y;
    ...
    f(void) {
        struct X { /* new definition */ };
        Y Z;
        ...

Is Z a pointer to the old X structure or the new one? We believe that most people would expect Z to be a pointer to the old X structure. However, a strict reading of the definition of typedef suggests otherwise. The standard says that a typedef type is not a new type; it is a name for a type that could be defined in another way. Since a pointer to the old X structure could _not_ be defined in another way after the declaration of the new X structure, the typedef type Y would have to refer to the new structure. To avoid such hair-splitting, the standard should state precisely what happens in such a case.

3.5.6 Initialization:

Is the following initialization legal?

    int f(int a) {
        const int b = a*2;

Consider

    struct X { int a,b; };
    f() {
        struct X Z;
        int junk = (Z.a=1,Z.b=2,7);
        int more_declarations;

Is this allowed? Can we initialize an auto structure in this way? Can we use the side effects in an initializer to initialize another object? (We note that the designers of the standard ruled out the use of non-constant expressions to initialize auto aggregates precisely because of the problem of side effects. The above example shows that side effects are still possible.)

Can auto initializers make use of external variables with the same name as the symbol being initialized? For example, is the following valid?

    int i = 1;
    f() {
        char i = i * 2;
        ...

This sort of construction is allowed and used in Berkeley C code.

It appears that the standard says that the following is legal.

    int i = {{{{{10}}}}};

Is this really intended?

The paragraph beginning at line 542 (about initialization of subaggregates inside aggregates) is very confusing. At the very least, it should be reworded to be more clear.
We also believe that it might not say what you mean it to say, but it's too hard to construe for us to be sure. For example, how is the following interpreted?

    int a[4][5][6] = { { 1, 2 }, { 3, 4, 5 }, { 6, 7, 8, 9 } };

3.6.4.2 The switch Statement:

The Rationale says that ranges in case labels were rejected because many current compilers would generate excessive amounts of code. This does not seem to be a good reason for rejecting something that could be quite useful. Making a compiler generate the equivalent if code for a switch range is trivial compared with (for example) requiring both signed and unsigned characters. This is indeed a minor extension.

It is our belief that a compiler is usually able to produce better code for case ranges than a programmer trying to do it by hand using if statements. Therefore efficiency is actually improved by supporting case ranges.

If you do not want to sanctify case ranges as part of the standard, the committee should still recognize that case ranges are likely to be common extensions to the language and should be listed in Section 5.6.4. More to the point, the committee should develop some syntax for case labels so that implementations that want to offer the extension can do so in a consistent way. The ".." notation mentioned in the Rationale is not acceptable because of the tokenizing rules: 1..3 will be interpreted as the two floating point numbers 1.0 and 0.3. We would suggest using the tilde as the case range separator, as in

    case 1~10: ...

This does not introduce a new operator and yet it is easy to parse because tilde has no binary meaning. It also looks good (i.e. the visual appearance suggests its meaning).
To improve switch statements even more, we would recommend provisions for "open-ended" case ranges as well, of the form

    case >n:
    case >=n:
    case <n:
    case <=n:

These avoid the example

    case 0..65535:

given in the Rationale, since the case just becomes

    case >=0:

Moreover, _this_ form is completely portable, since it's independent of the number of bits in the switch variable.

3.7.1 Function Definitions:

The standard seems to allow extraneous declarations in a function heading, as in

    f(a,b,c)
    int a;
    typedef struct X ...;
    int b;
    int c;
    { ...

Was this the intention? It strikes us as a poor idea.

Also note that the UNIX C compiler currently allows the form

    int (*f())(a,b,c) {...

in function definitions, but the standard will require

    int (*f(a,b,c))() {...

We believe this is a quiet change.

3.8.1 Conditional Inclusion:

The directive #elif should be renamed to the more mnemonic #elseif. As an alternative, the compiler might recognize the following.

    #else if
    #else ifdef
    #else ifndef

If an undefined identifier appears in an #if expression, _and if its value was needed_, an error should be given. Thus if A is not defined,

    #if A

gives an error. However, if B is defined and non-zero

    #if (B||A)

does not give an error, because the value of A is irrelevant.

Note that the confusion of an undefined symbol meaning "zero" does not arise from something simple like

    #if UNDEF_SYMBOL

but from code like

    #define X (5*y)
    int y = 0;
    ...
    printf("%d ",X);
    #if X
    printf("is non-zero");
    #else
    printf("is zero");
    #endif

In the #if directive, the X is replaced with "(5*y)". The preprocessor then checks to see if "y" is a #defined symbol. It isn't, so it turns into zero and the final result of the #if condition is zero, even though X itself was defined. If the definition of X is changed to some different expression (e.g.
a simple constant), the #if condition suddenly becomes true.

If the user really wants to assume "undefined" means zero, he or she should write

    #if defined(A)&&A

or

    #ifndef A
    #define A 0
    #endif

3.8.3 Macro Replacement:

What happens if a macro with parameters is invoked with the wrong number of parameters or no parameters? Is it an error, or is the text preserved?

The fourth paragraph on page 80 (lines 7 through 12) is very difficult to understand. An example would certainly help clarify what it is trying to say.

Section 4: General Notes:

It should be stated that if a program mixes macro and non-macro invocations of the same library functions, the results are unpredictable. For example, characters written with the "putchar" macro and the "putchar" function intermixed may come out in the wrong order (or perhaps won't come out at all).

Many functions return pointers to values that are created by the system. For example, "strerror" returns a pointer to a string that the system sets up. Are such values placed in static storage areas or in memory that has been dynamically allocated (by "malloc")? The answer to this question must be stated exactly in each case to make for uniformity across systems. The difference is important, since static storage makes a function dangerous to use in exception handlers. In addition, storage allocated through "malloc" can be freed if it is no longer needed, while static storage cannot be.

4.1.1 Terms and Common Definitions:

The file <stddef.h> should be required for stand-alone operation. However, it should not mention the "errno" value. All the other contents of <stddef.h> are characteristics of the hardware and the implementation. The "errno" value is related to the library and should have its own header <errno.h>.
The standard implies that headers which need symbol definitions that are "officially" in other headers will redefine the symbols. For example, <stdlib.h> needs to use "size_t" in function prototypes, so it will include its own definition of "size_t". We feel that it makes more sense for <stdlib.h> to explicitly #include the header that defines "size_t" rather than giving its own definition. Multiple definitions of the same symbol always mean trouble.

A similar problem is raised with the functions "strtod" and "strtol". Their definition implies that including <stdlib.h> is all you have to do to use the functions. However, the user may also need to use the symbols HUGE_VAL, ERANGE, LONG_MAX, LONG_MIN, and "errno". Should the <stdlib.h> file make these available (by defining the values directly or including the appropriate header files) or should the user have to include the appropriate headers explicitly? The standard should answer this question.

4.1.2 Headers:

The first paragraph contains the sentence

    If the program redefines a reserved external identifier, even with a semantically equivalent form, the behavior is implementation-defined.

The term "implementation-defined" should be changed to "undefined". By definition, "implementation-defined" refers to behavior of a correct program construct. We believe it is too broad-sweeping to say that redefinition of a reserved external identifier should always be allowed; therefore, "undefined" is the better term, giving implementations the choice to accept or not accept the construct. Also, "implementation-defined" implies that the implementation must document how it behaves. The ramifications of redefining a library symbol may be too unpredictable to document.

4.3.1.9 The isspace Function:

The "isspace" function should also test for the line-feed character if it is not identical with the new-line.
4.5.4.6 The modf Function:

We feel "modf" should behave in the same way as float to integer conversions. This means that "*iptr" should have the same value as

    (double)(long) value

when this operation does not cause an overflow. This definition is more consistent in the (-1,0) range than the definition proposed in the standard.

Even when the "(double)(long)" conversion would cause an overflow, "modf" should still behave as if it is performing this sort of conversion, in the interests of consistency.

If the integer part of "value" is exactly equal to the most negative long integer, a problem arises. The "(double)(long)" approach is likely to give one lower than the most negative integer. The "modf" code should recognize this problem and issue an EDOM error in such cases.

4.5.6.5 The fmod Function:

"fmod" should follow the same principle as "modf". In the expression

    x == i*y + f

the sign of "f" should be such that

    i == (long) (x/y)

Alternatively, you might declare that "f" is always positive. Either alternative is better than declaring that "f" has the same sign as "x".

4.7 Signal Handling:

Does the SIGABRT signal catch other abnormal terminations besides one raised by "raise" or "abort"? We believe it should not.

4.7.2.1 The raise Function:

Must "raise" be able to generate _every_ valid signal, or is the implementation allowed to restrict the sort of signals that "raise" can send? Is it allowed to issue more than the standard signals?

4.8.1 Variable Argument List Access Macros:

The standard does not explain why these routines should be implemented as macros. We realize that the reason is that the parameters aren't necessarily expressions, but the standard should say this; otherwise, it just sounds like a petty rule.
4.9.1 I/O Introduction:

What happens if the BUFSIZ default value depends on the type of device that is connected to the I/O stream? Making this a fixed constant may be inadvisable.

4.9.2 Streams:

The sentence beginning at line 59 should read

    Data read in from a text stream will not necessarily compare equal to the data that were earlier written out to that stream, unless the data consist only of complete _non-null_ lines, _with no trailing blanks_, and composed only of printable characters and the control characters horizontal tab, new-line, vertical tab, and form feed.

Also, we do not know why the backspace was excluded from the set of characters that could be safely written and read on a text stream.

The committee obviously believes that binary files will map into some machine-dependent idea of what a binary file is. This is not necessarily so. For example, it is not obvious how to map the binary file concept into a record-based file system. Such systems can have random access to records, but if records do not have a fixed length, there is no simple relationship between the UNIX concept of random access and the file system's.

The committee says that the contents of a binary file stream will be exactly what is written with an implementation-defined number of NUL characters appended. This is a curious change on existing UNIX file system concepts. One of the most important principles of binary file streams on UNIX is that you can write a file, then read it and get back _exactly_ what was written. The addition of extra NUL characters violates this principle.

Evidently, the designers allowed the extra NUL characters in order to accommodate systems that might need to pad files out to a certain length. However, it is not clear that the freedom to add NUL characters is sufficient to satisfy arbitrary file system requirements.
The file system may be just as upset at extra NUL characters as it would be with data that was not padded to some appropriate boundary. For this reason, we feel that the standard should simply state that reading from a binary file stream gives precisely what was written to the file stream, and leave it up to the implementation to figure out how to provide such a service.

It is not the business of a portable standard to describe how to perform non-portable operations. In particular, we believe it is a mistake to encourage the use of binary streams when creating files in system-specific formats. A program that builds formatted files in a byte-by-byte manner will certainly not be portable to systems that use different file formats. If someone does try to port such a program, it is better for the program to fail in a very obvious way than to write out a distorted version of some other system's file format. If an implementation believes users will need to create certain kinds of system-specific files, the implementation should provide its own routines to accomplish such tasks.

4.9.6.1 The fprintf Function:

The description of the "%f" specifier says that the output should have six decimal places (if there is no precision field) and that the number should be widened to the appropriate number of digits. Since the IEEE floating point standards indicate that floating point numbers may be as great as 10**308, the standard may result in widening a floating point number to as many as 314 (308+6) digits. We recommend that implementations be allowed to use scientific notation ("%e" format) in cases where the other approach would widen the value beyond the maximum possible number of significant digits. This would probably require the definition of a macro in <float.h> to indicate the maximum number of significant digits.

The standard explicitly states that the "#" qualifier has no effect on "%s".
We see no reason why this is necessary. In fact, we believe that a natural interpretation of "%#s" would be to print out a string using escape sequences for non-printable characters. While this behavior need not be required by the standard, we don't see why it should be explicitly ruled out when it would clearly be a useful facility. The same point applies to "%#c". All things being considered, it would be easier to say that the use of "#" in "%c", "%d", "%i", "%s", and "%u" is implementation-defined.

The Environmental Limit section reads

    The minimum value for the number of characters produced by any single conversion shall be at least 509.

Obviously, what you really mean is

    Implementations may place a maximum on the number of characters produced by any single conversion, but this maximum cannot be less than 509.

It seems perverse that long double conversion specifiers must use an upper case 'L' while long ones must use lower case. It is more sensible to allow either upper or lower case in both instances.

4.9.6.2 The fscanf Function:

The last sentence of the first paragraph seems redundant. The excess arguments will obviously be evaluated before they are passed to "fscanf". What you mean to say is that no error occurs if too many arguments are specified, but the excess arguments are ignored.

It seems odd that "fscanf" returns EOF if input items cannot be read. EOF is conceptually a special character value (though of course, it is an integer). Since "fscanf" returns an integer in all other cases, it would make more sense for "fscanf" to return -1.

4.9.6.7-9 vfprintf, vprintf, vsprintf:

The Rationale states that a format for variable-length argument lists was rejected because the functions "vfprintf", etc. were "more controlled". This comment confuses us, because we don't understand what "more controlled" means.
Very clearly, the "vfprintf" approach offers less freedom and therefore is less useful.

We suggest that "printf" and friends obtain a new specifier "%v", which accepts two arguments: a new format string and a "va_list" of items to format. This is similar to the existing "%r" construct on UNIX systems. Given the "%v" specifier, writing functions to perform the work of "vprintf" and friends is trivial. However, the opposite is _not_ true -- "vprintf" and friends have significant difficulty in simulating many of the results that are possible with "%v".

The "%v" approach is simply faster, more readable, and more versatile than using "vprintf" and friends. For example, a call to "printf" could take several normal arguments, followed by a "va_list" argument pointing to a variable list, followed by more normal arguments. This avoids the problem of having to make three calls, one for the normal arguments, one for the variable list, and one for the remaining normal arguments.

4.9.10.2 The feof Function:

The semantics of the EOF "indicator" are based on the UNIX stream I/O implementation. Not all systems treat end-of-file in this manner, so we suggest adopting the following simple and consistent rule: "feof" should return TRUE if and only if the next "getchar" will return EOF and the most recent "getchar" also returned EOF. (The second part of the provision is needed to avoid Pascal's problem of having to read ahead.) Thus "fseek" should _not_ clear the EOF indicator; instead, it should re-evaluate it. After a call like

    ungetc(non_EOF_character);

"feof" should return FALSE. If a program reaches end-of-file, then another program grows the file, it should be possible to continue reading without explicitly clearing the EOF indicator.

4.10.1.4 The strtod Function:

What do "strtod" and related functions assign to "*endptr" if there is a range error?
4.10.3 Memory Management Functions:

The standard states that pointer values returned by "malloc" et al may be assigned to a pointer to any type of object, then used to access such an object in the space allocated. We suggest that this be changed to read "may be assigned to a pointer to any type of object _whose size is less than the amount of memory requested_". This allows greater efficiency of memory allocation, especially on machines that have a high alignment requirement for some data types. For example, some machines require 32-byte alignment for their highest precision floating point, but it is silly to hand out memory in 32 byte chunks when the user only requests a few bytes.

It would also be useful to have a library function/macro similar to "malloc" that would take both a length and an alignment as arguments. This would allow for finer allocation of memory, to shorter alignment boundaries.

In order to make such a function/macro useful in portable programs, an alignof operator would be very convenient. This operator would behave in much the same way as sizeof: it would return an integral value indicating the alignment of a type or object. For example, if a machine has words containing four bytes and a particular type must be aligned on a word boundary, the result of alignof would be 4 (indicating four-byte alignment). The actual type of the result of alignof would be implementation-defined like "size_t".

Note that alignof would allow programs to write their own efficient portable memory allocators. Memory could be "nibbled" away in alignments suitable to whatever data object needed the storage. It would not be necessary to get the largest possible alignment for _every_ object.
4.10.4.3 The getenv Function:

The description of this function should read as follows.

    The "getenv" function searches an _environment list_, provided by the host environment, for an entry identified by the string pointed to by "name". The set of environment names and the method for altering the environment list are implementation-defined. The "getenv" function returns a pointer to a string containing the value associated with the given name.

Our point is that the name=value format is strictly a UNIX concept and need not be grafted onto other techniques for handling environment variables.

The standard should decide whether the returned value is stored in a static storage area or in storage obtained through "malloc".

4.10.4.4 The onexit Function:

Why isn't the "onexit" defined as

    int onexit(void (*f)(void));

This simplifies the definition considerably.

4.10.4.5 The system Function:

The explanation of "system" should be expanded to make it more clear that passing a null pointer is a query about the existence of a command processor.

4.10.6.2 The div Function:

We certainly recognize the need to implement a well-specified integer division and remainder operation, but we do not believe the given "div" function suits the need.

First, "div" is an inappropriate name for a function that performs both a division and a remainder operation. In fact, we believe that the function should _not_ perform both operations. Instead, you should have

    int _div(int numer,int denom);
    int _rem(int numer,int denom);

This approach has several advantages.

(a) You do not have the overhead of calculating the remainder when you want the quotient, and vice versa. While it is true that many machines generate a quotient and remainder simultaneously, this practice is far from universal. VAX machines, for example, can only perform division.
To calculate A%B, the machine must make the calculation A-(B*(A/B)). It is expensive to calculate this number when it may not even be needed.

(b) On some machines, the two functions could be implemented as macros. With a single function returning a structure, macros could never be used, even if the hardware did the division and remainder operations in the prescribed manner.

We also note that the operation prescribed by the standard's "div" function is the less useful of the two alternatives. In our experience, the operation that you usually want to perform is the one that always gives a positive remainder. For example, it is much more common to want (-2)/3 to have a quotient of -1 and a remainder of +1 than to have a quotient of 0 and a remainder of -1. You almost always want to move negative quotients towards negative infinity, not towards zero.

4.11.3.2 The strncat Function:

It seems odd that "strncat" always adds a trailing '\0' but "strncpy" does not.

4.11.4 Comparison Functions:

In the interests of portability, we believe that character comparisons for "memcmp", "strcmp", and "strncmp" should be made using unsigned char instead of the implementation-defined approach specified in the standard.

4.11.5.6 The strspn Function:

For greater uniformity, the name of this function should be changed to "strpspn". This emphasizes the way it parallels "strpbrk".

4.11.6.2 The strerror Function:

The standard should be more explicit about the connection between the "errnum" argument for "strerror" and the possible values of "errno".

4.12.1 Components of Time:

Again, we wonder why vowels have fallen into disrepute. CLK_TCK could easily be named _CLK_TICK or _CLOCK_TICK.
It should be explicitly stated that values of type "time_t" may not represent time in meaningful units and may not even give values that are uniformly distributed.

4.12.2.1 The clock Function:

"clock" is a poor name for a function that returns processor time. A name like "processor_time" would be better.

The description of "clock" says it returns processor time used since some point in time related only to program invocation. We believe that it should instead return processor time accumulated since some previous point in time, e.g. the time when the user logged on. To time a particular program, the user would make two calls to "clock": one at the beginning of execution and one at the end (or whenever a time check is required).

The reason for our suggestion is that many non-UNIX systems have no system call to get per-process timings. Instead, many just keep track of total session time. If implementations are forced to support "clock" as it is now described, many implementations will have to put "time check" code into the set-up routine for every C program. This seems very inefficient, especially because "clock" is not the sort of function that will be used frequently.

If a program calls another process using the "system" function, it may be more efficient on some systems for the processor time of the child process to be included in the parent's time, while on other systems it is more efficient not to include the child's CPU time. Thus, this behavior should be implementation-defined.

4.12.2.4 The time Function:

The standard states that "time" returns ((time_t)-1) if the current time is not available. However, -1 may well be a valid time value on many systems. If you are going to select a reserved value arbitrarily, choosing 0 makes more sense, since it allows tests of the form

    if (time(p)) ...
A better solution would be to create a macro named _TIME_UNAVAILABLE with

    #define _TIME_UNAVAILABLE ( (time_t) X )

where X is some implementation-defined value. "time" would return this value if the time was undefined.

4.12.3 Time Manipulation Functions:

It has always been a nuisance to get the current time of day in string format because you must declare your own variable of type "time_t". The library needs a function that behaves like "ctime" but which is declared with

    char *timefunc(time_t timer);

We could then use

    timefunc( time( (time_t *) 0 ) )

to get the current time-of-day string.

Summary:

In order to avoid a deluge of reserved words, all newly introduced symbols should follow a simple rule, e.g. beginning with an underscore. Ambiguities in the definitions of structures, unions, and typedef constructs should be clarified or eliminated.

If you have any questions or comments about any of the material in this document, please contact Peter Fraser, manager of the Software Development Group, at (519) 888-4546.
guy@sun.uucp (Guy Harris) (08/28/86)
> According to the standard, an operation like
>
>         int i;
>         i /= 3.5;
>
> would be performed using floating point division.  However,
> the Berkeley C compiler uses integer division.  For this
> reason, this should be marked as a quiet change.

*Which* "Berkeley C compiler"? The 4.3BSD one performs this operation correctly; it uses floating-point division. The 4.2BSD compiler (and probably the System III compiler it was derived from) had a bug which made them do integer division in this case. As such, there is no need to make note of this, as it is not a change.

> By rephrasing the definition this way, you give the "register"
> storage-class more meaning.  In particular, you open the
> door to compilers that perform global optimizations using
> the fact that "register" variables can never have their values
> changed by indirection through a pointer.  The compiler can
> optimize the use of "register" variables because it can always
> know when the register values are used and changed.

Well, maybe. What about something of storage class "static" or "external" that is never aliased? Should "external register foo" be allowed? If it is, should the compiler actually try (somehow) to place it into a register? What about something that is a member of a structure or array? It may give the "register" storage class more meaning, but the meaning isn't what people think of when they think "register". If such a construct is truly necessary, some other keyword should be provided. I'm not convinced there aren't better ways of accomplishing this goal.

> If we have
>
>         enum E1 { e1 } var;
>         enum E2 { e2 };
>
> is it legal to say
>
>         var = e2;
>
> The answer is almost certainly yes...but we would be happy
> if we were allowed to give a warning or an error message for
> the operation, if there is no explicit cast.  Similarly, we
> would like to give a warning for things like
>
>         var = e2 + 1;

Yes. 100 votes for this.
> On the other hand, it would be very convenient if one function
> could declare an object "const" while another did not.
> This would let a function indicate when it did not intend to
> change the value of an external object, and thereby allow
> local optimizations.

Presumably, an optimizing compiler can figure this out without the programmer's help.

1) "const" is supposed to be part of the type of an object, so that attempts to modify a "const" object can be detected at compile time. Allowing this sort of thing muddies the waters somewhat.

2) It might be tricky to implement with some loaders.

> The last sentence of the Semantics section reads
>
>      If the list is empty in a function declaration
>      that is part of a function definition, the function
>      has no parameters.
>
> What does this say about a function definition like
>
>      int (*F(int a))() {...
>
> Since the empty identifier list appears as part of a function
> definition, the function pointed to by F's return value
> takes no arguments.

This was probably not what they intended; presumably, the intent was that

     int foo() { ... }

was to declare a function with no arguments. The term "part of" is poorly chosen; some more specific term should be used. (Or C++ should be used, where "int foo();" also declares a function with no arguments; unfortunately, this can't be changed in C. Yet.)

> Can auto initializers make use of external variables
> with the same name as the symbol being initialized? For
> example, is the following valid?
>
>      int i = 1;
>      f() {
>           char i = i * 2;
>           ...
>
> This sort of construction is allowed and used in Berkeley C
> code.

Where is this code used? Sounds like it should be cleaned up to me. This kind of construct is going to cause somebody to trip over it when reading code.

> The Rationale states that a format for variable-length
> argument lists was rejected because the functions
> "vfprintf", etc. were "more controlled".
> This comment confuses us, because we don't understand what
> "more controlled" means. Very clearly, the "vfprintf" approach
> offers less freedom and therefore is less useful.

Can "%v" and the like be implemented on any system on which "vfprintf" can be implemented? If not, "vfprintf" will be more widely available and is therefore more useful.

> We suggest that "printf" and friends obtain a new
> specifier "%v", which accepts two arguments: a new format
> string and a "va_list" of items to format. This is similar
> to the existing "%r" construct on UNIX systems.

*Some* UNIX systems. It's not in 4.2BSD or System V, and it wasn't documented in V7 (I don't remember whether it was there or not).

> It seems odd that "strncat" always adds a trailing '\0'
> but "strncpy" does not.

It may be considered odd, but that's the way UNIX works. Too late to change it now, unless you want to give the new function a different name.

(I think the intent can best be described by discussing two character string types: null-terminated strings, and null-or-end-of-buffer-terminated strings. The latter appear, for example, in directories on UNIX systems using the V6 or V7 file systems, where a 14-character name has no null terminator. "strncpy" copies one null-or-end-of-buffer-terminated string to another, while "strncat" appends a null-or-end-of-buffer-terminated string to a null-terminated string. If you want a version of "strncpy" that copies a null-or-end-of-buffer-terminated string to a null-terminated string, clear out the null-terminated string and "strncat" the other string to it.)

> For greater uniformity, the name of this function
> should be changed to "strpspn". This emphasizes the way it
> parallels "strpbrk".

For greater compatibility with existing UNIX implementations, the name of the function should be left as "strspn". This emphasizes the fact that it does the same thing as the UNIX function with that name.
-- Guy Harris {ihnp4, decvax, seismo, decwrl, ...}!sun!guy guy@sun.com (or guy@sun.arpa)
karl@haddock (09/03/86)
watmath!jagardner writes:
>The following (huge) document comments on the latest proposal
>for a C standard. It is paginated, but does not contain tabs.

In fact, it was so huge that the last 15% got truncated here. Would you care to repost or mail from 4.10.3 onward? Perhaps it should be in mod.std.c (mail to cbosgd!std-c)? I'm planning to summarize/repost/followup my earlier ANSI ramblings there, Real Soon Now.

Karl W. Z. Heuer (ima!haddock!karl; karl@haddock.isc.com), The Walking Lint
hamilton@uiucuxc.CSO.UIUC.EDU (09/09/86)
>> We suggest that "printf" and friends obtain a new
>> specifier "%v", which accepts two arguments: a new format
>> string and a "va_list" of items to format. This is similar
>> to the existing "%r" construct on UNIX systems.
>
>*Some* UNIX systems. It's not in 4.2BSD or System V, and it wasn't
>documented in V7 (I don't remember whether it was there or not).

i remember having to scavenge the old v6 printf.s when our switch to v7 broke all my %r's...

wayne hamilton
U of Il and US Army Corps of Engineers CERL
UUCP: {ihnp4,pur-ee,convex}!uiucdcs!uiucuxc!hamilton
ARPA: hamilton%uiucuxc@a.cs.uiuc.edu
USMail: Box 476, Urbana, IL 61801
CSNET: hamilton%uiucuxc@uiuc.csnet
Phone: (217)333-8703
CIS: [73047,544]
PLink: w hamilton