gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/08/87)
In response to Richard Stallman's article <2144@brl-adm.ARPA>, which contained many thoughtful comments concerning the draft proposed standard, I offer these remarks, which are not intended as official X3J11 position statements. I hope that they may aid subsequent newsgroup discussion and perhaps indicate where the proposals could benefit from modification (such as stronger justification).

>Support for arbitrary casts and arithmetic in static initializers
>also requires changes to linkers. Consider
>	int foo = ((int)&bar * 3) % 5001 | (int)&baz;

There would seem to be the problem that RMS describes only if the implementation chooses to support casts of pointers to arithmetic types. However, the draft standard does not require this (3.3.4, Semantics). Any use of such address arithmetic is system-dependent. It would perhaps be worth adding a footnote to 3.4, Constraints or 3.3.4, Semantics to the effect that support for such pointer casts implies full address arithmetic support for external linkage; certainly this should be pointed out in the Rationale.

>ITEM 2, 2.2.4.2. Use an underscore prefix for library `#define's.

This keeps being proposed, but it keeps being voted down, primarily on the grounds of conflict with existing practice. Macro names such as CHAR_BIT were taken directly from the /usr/group 1984 standard and its continuation via IEEE 1003.

>Unfortunately, these lists are long and programmers cannot know them
>all. It is not practical for programmers to avoid all the names in
>all the system header files.

They can easily do this, as indeed they have to anyway under existing practice, if they are provided with a list of reserved words. I expect many such lists to start appearing.

>... But, in fact, any program that uses the standard
>input facilities `getchar' or `fread' will also get the `read' system
>call which they use, and any attempt in the program to define `read'
>in some other fashion will conflict.
This is true in any case, as I have discussed in other recent postings. Implementors are constrained to NOT refer to extraneous visible symbols (other than those beginning with an underscore) in implementing their ANSI C libraries. This causes a bit of rewriting of C library sources (e.g., change read to _read), that's all. For extensions such as POSIX (IEEE 1003.1), the reserved name space becomes larger, but the same principle applies.

>The C standard is also defining many new names. ...

Not really very many. I suppose it depends what your reference point is; compared with UNIX (not BSD) as extended by the /usr/group 1984 standard, the X3J11-added names are: ??* trigraphs (which I hope are gotten rid of), the \a character, SCHAR_MAX, SCHAR_MIN, UCHAR_MAX, USHRT_MAX, UINT_MAX (was USI_MAX), ULONG_MAX, the <float.h> macros (modeled on the names in the Fortran 8x standard) FLT_*, DBL_*, and LDBL_*, const, volatile, #error, #pragma, ptrdiff_t, offsetof(), the <locale.h> macros LC_*, setlocale(), HUGE_VAL (was HUGE), sig_atomic_t, raise(), fpos_t, the three SEEK_* macros, remove(), rename(), the b modifier in fopen() mode strings, %p and %n in fprintf()/fscanf() format strings, fgetpos(), fsetpos(), div_t, ldiv_t, RAND_MAX, strtoul(), atexit(), div(), labs(), ldiv(), memmove(), strcoll(), strxfrm() (voted in last meeting, not in public draft yet), strstr(), strerror(), clock_t, difftime(), mktime(), and strftime().

Compared to the vast number of names already in the C environment, this is a rather short list, most of them of sufficient use in programming that C programmers should learn about them anyway. BSD programmers have a larger number of names to learn, since BSD hasn't provided many of the standard UNIX functions that have been around for a long time already, so those too would appear to be new to them.

>I propose that *all* names defined by the standard be renamed with the
>addition of an initial underscore, with the exception of `NULL'.
This proposal doesn't have any chance of being adopted. One of the main goals of X3J11 was to standardize existing practice, so that relatively little change would be required over the large body of existing code. This proposal would require changes to virtually all existing code; if interpreted to apply just to names that are explicitly macros, it would still affect a tremendous amount of code.

>Yes, this is brutal, but I don't see any other way to avoid chaos.
>Can you find another way?

Sure. Keep a list of reserved names nearby while programming, much as many of us do with the table on p. 49 of K&R. Note that DMD programmers already do this (the DMD C environment has many visible names that weren't necessary). Actually, experienced C programmers generally remember practically all the names to be avoided without having to refer to such a list. Routine use of "lint" or equivalent external linkage checking (such as in a smart linker) will catch many oversights in this regard.

>ITEM 3, 3.8.3.4. Nested macro problems.
>... It therefore needs to be clarified.

I think Dave Prosser was going to publish the pseudo-code we've been using internally to discuss the preprocessing algorithm.

>But with this rule, it becomes very difficult to implement a
>character-based preprocessor. Such a preprocessor has no way to
>distinguish an `h' that should not be replaced from another `h' that
>may be replaced.

Sure it does. One can have quoting "marks" (or "colors") on characters in preprocessing. Indeed, I think they are necessary. I should point out that the power and semantics of preprocessing have been the subject of considerable discussion in X3J11. Some of the cleanest powerful approaches that have been suggested were argued against on the grounds that even small changes in macro processing rules can lead to large surprises in consequences.
I think the current model (with slight tweaks from the last meeting) is a fairly good compromise, although some of us would prefer more power, as could be obtained by proposed simplifications of the model preprocessing algorithm mentioned above. There have also been nice proposals for improved pasting, stringizing, and charizing operators that have not been adopted, due I think to members being uneasy with making radical changes to these.

>... floating constants are not in the list of what may appear in
>an integral constant expression.
>...
>Perhaps I have misunderstood 3.4. It also says that casts
>from arithmetic types to integer types are allowed, which appears
>to imply that there may be subexpressions whose type are arithmetic
>but not integral. Yet the list of allowed constructs does not
>include any subexpressions that could have floating type, making
>it superfluous to list the possibility of such casts.

Since the operands can include casts of arithmetic types to integral types, and because floating constants have some floating-point type (3.1.3.1, Semantics), (int)3.5 is okay.

>... serious problem because it implies that expressions such as
>`((float)((int)&foo | 38))' (where `foo' is static) are valid.

I discussed this above. The draft standard does not require support for casting pointers to other types.

>forbidding preprocessors to allow equivalent definitions
>with different argument spellings serves no useful purpose.

The intent is to permit "benign" redefinition only, such as may occur when including the same header file twice, and to catch other cases as possibly erroneous redefinitions. The only constraint that made sense for this was the spelling apart from whitespace.

>... A preprocessor that ignores the argument spellings
>when comparing definitions actually fits this spirit better than
>what is currently required by the standard.

I don't see this, and would appreciate some examples.

>ITEM 6, 4.9.6.5.
>`sprintf' is unsuitable for robust programs.

Yes, that's true in many (not all) situations. A similar problem is fscanf() into a too-short data type (text is to be added to emphasize that this is undefined).

>	int snprintf (char *s, int len, const char *format, ...);

This is a good idea, which should probably be added rather than replacing the function (sprintf()) already in wide use. This is analogous to having both strcpy() and strncpy().

>ITEM 7, 3.3.4. Allow casts to union type.

The problem is, a union is not a scalar type. In general, it cannot be. To make this suggestion work, we would have to distinguish between scalar-unions and aggregate-unions, and for symmetry the same should be done for structures. This is a lot of invention just to support rare operations on unions, which are inherently kludges anyhow. (I forget when the last time I needed a union was; they're seldom necessary or advisable.)

>ITEM 8, 3.3.8 and 3.3.9. Allow comparison of types such as `int *'
>and `const int *' that differ only in the presence or absence of
>`const' or `volatile' in the type pointed to.

I don't believe that there is any constraint with regard to this now. Certainly const and volatile are not part of the object's type, and the constraint is only that the pointers be to objects of the same type.

>ITEM 9, 3.5.3.2. Allow the length of an array to be zero.

This has been suggested before, but only a few X3J11 committee members like 0-sized objects. If we were to allow this, there are several other parts of the draft standard that would also need to be adjusted. As a proposal, it needs strong evidence of utility in order to overcome the X3J11 position against it.

>The examples above show why zero-length arrays are useful; the burden
>of proof is now on whoever wishes to show there is a reason not to
>allow them.

This runs afoul of the common idiom for the number of elements in an array: sizeof array / sizeof array[0], since sizeof array[0] might be 0.
(This could realistically occur for an array of structures whose contents were just a 0-length array, "details not yet determined", during software development.)

>I do not propose any change to the specifications of `malloc', etc.,
>... The currently specified behavior
>for size zero is adequate even when there are types of size zero.

For logical consistency, malloc(0) should return an actual storage address of a zero-sized object (furthermore, each such object should be at a different address). We couldn't get the committee to go along with that; whether a NULL or something useful is returned is now up to the implementor (so at least UNIX can have reasonable semantics for this), as of last meeting.

>... no strictly conforming program can use #pragma.

Use of #pragma is not permitted to alter the virtual-machine semantics of the code, so it isn't totally unsafe. However, I agree with the spirit of the argument, that it isn't as useful in portable programming as one might wish. I would be happy to see #pragma removed from the standard.

>... Yet it is
>desirable to be able to state frequency information in strictly
>conforming programs.

I don't know that this is a burning necessity. The problem with "feature X would be nice" is that, although in any one case it doesn't hurt to adopt such a feature, by the time we gave equal weight to the zillion such features that have been proposed, the language would be hopelessly out of control. New features, for the most part, have been added only when their logical or practical necessity has been demonstrated.

>... serving only as a declaration that its body will be executed an
>average of NN times as often as the smallest containing `if', `while', ...

"Average" is a pretty slippery concept. The simple arithmetic-mean count is often not an appropriate measure of the importance of optimizing a section of code.
Even worse, the usual number of iterations is determined at run time, not compile time, so a simple constant is inadequate. It seems premature to try to invent a standard way of specifying this sort of predictive information. (It also doesn't seem terribly important to me.)

>ITEM 11, 2.1.2.3. The standard ought to say more explicitly when
>aliasing can validly take place in a strictly conforming C program.

It isn't clear that it is feasible to completely specify the circumstances under which C pointers might address overlapping objects. Some "Safe C"-like products do this, though, for a limited subset of situations.

>I am not sure whether the standard implies that
>
>	short in, out;
>	{
>	    char *inptr, *outptr;
>	    int i;
>
>	    inptr = (char *) &in;
>	    outptr = (char *) &out;
>
>	    for (i = 0; i < sizeof (short); i++)
>		outptr[i] = inptr[i];
>	}
>
>is defined and equivalent to
>
>	short out, in;
>
>	out = in;

No, this can't be guaranteed. For example, there may be bits in the short that are not covered by its chars.

>If this is not done, implementors will search, separately, for valid
>rules, thus duplicating effort. Some of these rules will make certain
>C programs not work as intended, and then users and implementors will
>argue inconclusively over whether the actual behavior violates the
>standard.

They will do that anyway. Not conforming to the operation of the abstract machine described in the draft standard is definitely a compiler bug; if some characteristic of the abstract machine is left undefined, then it is folly to rely on it. I also wonder just how much typical code would be sped up by the sort of optimizer you're concerned about. I think this is another case of worrying about "microefficiency", which is usually misplaced concern.

>Note that the first example in section 2.1.2.3 of the Rationale gives
>an example where this issue is relevant.
>If the variable `sum' is
>normally held in memory, keeping its value in a register during the
>loop will give incorrect results if `a' is equal to `&sum - 1'.

This doesn't seem much different to me from what happens for i += i; The wise programmer avoids writing code that could fall into such situations. If this isn't sufficient for pointers, I would like to know why. Note that not-specifying something is sometimes a deliberate decision intended to avoid unduly constraining implementations in non-crucial areas.

>However, when 2.1.1.3 is added to these decisions, it has the effect
>of forbidding any subsequent art that would ever shed light on the
>matter.

The current wording was carefully chosen. Forbidden extensions could be enabled by a compiler switch, for example, with the understanding that use of the switch is in effect use of a separate, non-conforming, compiler. I suspect many vendors will do this, at least for a transition period.

>2. A preprocessor directive that allows defining a macro so
>that each time it is called it appends some text to the definition
>of another macro.

I thought this could be done with the defined facilities.

>... Thus, invalid syntax typically violates
>no particular rule, but rather fails to correspond to any rule.

This is a subtle point, but correct. The wording should be improved, perhaps to "a violation of syntax rules or any constraint".

>ITEM 14, 2.1.1.3. The idea of erroneous program has been misapplied.

I frankly don't know why 2.1.1.3 is present, since this seems to be what X3J11 refers to among themselves as a "quality of implementation" issue.

>ITEM 15, 2.2.1.1. Eliminate the ?? trigraphs.

Yes, absolutely! I suspect that few implementations will exactly reproduce the shapes of the graphics printed in 2.2.1; what is the distinction between an apostrophe that looks like a vertical single quote or a vertical bar that looks like a deformed colon, and a curly bracket that looks like an umlauted O?
This is just a matter of degree, and even it can be much improved for more recent ISO character set developments that include the necessary glyphs in the full 8-bit set.

>ITEM 16, 3.5.6. Allow variable elements in aggregate initializers.

Lack of prior art. This is another of many nice possible extensions to C, which are outside the scope of the standards committee. I don't think order-of-evaluation was a major factor in leaving this out. It is of course possible that such extensions may be blessed in a future revision of the standard, but they are prohibited by this one.

>ITEM 17, 3.8.3. Don't say whether keywords can be #defined.

This is necessarily excluded by some implementations that bundle preprocessing into the lexical analyzer, so it can't be officially blessed behavior.

>... making it undefined
>greatly preferable to forbidding it.

This is worth reconsidering, although "implementation-defined" might be preferable, but there is a natural reluctance to say this about too many aspects of the language.

>ITEM 18, 2.1.1.2. Converting preprocessor tokens.

+ is already an operator, therefore a preprocessing token, therefore in step 7 it is converted to a normal token, by itself (not combined with other pp tokens such as =).

>ITEM 19, 2.2.2. The wording of the definition of `\f' should be changed.

I'm not at all happy with trying to build a model of display device operation into the language. Note, however, that 2.2.2 is phrased in terms of the INTENT of these control characters. That has no legal significance as a constraint.

>	\f (form feed) Is regarded as dividing a document into pages.

Whatever that means...

>The example of float.h values for IEEE standard floating point
>does not define FLOAT_ROUNDS. Is this an omission?

Yes, it has to be defined as something. Editorial change needed.

>ITEM 21, 3.1.2.5 says that `int' and `long' are different types even if
>they are identical in range.

This is a legalism necessary due to wording throughout the standard.
Your example is correct and relevant.

>ITEM 22, 3.2.2.1. This says that arrays are coerced to pointers
>"where an lvalue is not permitted".

This is not only correct, it is absolutely essential. Sizeof is indeed one situation requiring this. We didn't provide a list because experience has taught us that such lists almost always are incorrect.

>ITEM 23, 3.3, paragraph 3. It would be natural to allow
>associative regrouping for `+' and `-' together, as in
>`a - (b - c)' => `(a - b) + c' and `a - (b + c)' => `(a - b) - c'.
>But the wording in use seems to rule this out.

The regrouping issue is important only for operands that have side-effects, and for access to volatile objects. Otherwise, a compiler is free to rearrange its parse tree so long as the abstract machine semantics are obeyed.

>However, at the end of section 3.3 in the Rationale it says that a
>decision was made against "extending grouping semantics [of unary
>plus] to the unary minus operator". This would seem to mean that
>regrouping through unary minus is permitted.

Unary + inhibits regrouping of subexpressions of its operand with subexpressions outside it; this is intentional. Checking the wording in 3.3.3.3 about unary -, it does appear that the sentence in the Rationale is unnecessary and therefore confusing.

>In addition, it is conceptually simple to regard `a-b' as equivalent
>to `a+-b'.

This may be true in mathematics, but it isn't recognized by the draft standard, so it would be an unwarranted assumption. We really didn't want to pursue this path, since then overflow on negation (2's complement machines) would have to be dealt with, etc., greatly complicating the specification.

>	extern char x[];
>	... sizeof x ...

Oops! Something needs to be added to prohibit this. Thanks for discovering this oversight.

>ITEM 25, 3.5.2.1. It is not clear whether an `unsigned int' bit field
>of fewer bits than the width of an `int' undergoes an integral
>promotion of type to type `int'.
>3.2.1.1 suggests that it does.
>3.5.2.1 suggests that it does not.

Each bit field already has a type: int, unsigned int, or signed int, as specified in 3.5.2.1, Semantics. Conversions in expressions occur by the usual rules given for these types.

>ITEM 26, 3.5.5 says, "two types are the same if they have the same ordered set
>of type specifiers and abstract declarators".
>
>Does this mean that `long unsigned int' and `unsigned long int' might
>be different? Must be different?

I think 3.1.2.5 was meant to enumerate the complete set of basic types, so both the above are canonically `unsigned long int', but you're right in pointing out that this part of the specification appears to be incomplete and should be beefed up. I don't think `const' and `volatile' were meant to be part of the object's official type, but this is indeed unclear.

>ITEM 27, 3.7.6.

3.7.1? I seem to remember agreement at the last meeting that there was a problem with these examples, to be fixed editorially.

>ITEM 28, 3.8. The constraints in this section appear to imply that
>comments are not allowed on preprocessor lines except at the end.

It should indeed state that this constraint applies just after translation phase 3 (see 2.1.1.2). This should be clarified.

>As I understand it, whitespace other than newlines is not significant
>at the stage of preprocessor tokens, so this statement is misleading.

No, they lose significance at the point where pp tokens become normal tokens (2.1.1.2).

>What is worse is that the only way a separate preprocessor can
>make sure that two preprocessor tokens don't convert to one normal token
>is to output whitespace between them.

I don't think this is correct. At this phase of translation, tokens do not get "glued" to each other; that occurs only during the macro expansion phase, and there are ways of blocking inadvertent gluing there (I think).
Certainly the committee decided to avoid pasting with things to the left of the replacement string, in order to limit rescanning scope.

>ITEM 30, 3.8.3. Is there some motivation for not standardly supporting
>the use of empty macro arguments?

Some of us want that facility, as well as other additional preprocessing power, but we couldn't get the whole committee to buy into it. Perhaps some of the counter-arguments could be placed in the Rationale.

>ITEM 31, 3.8.8. This would appear to forbid the common practice of
>predefining some macro names that identify the type of hardware
>and software that are in use.

That's absolutely correct. Conforming implementations are prohibited from doing this. Note that a lot of us have not been very happy at having symbols like `sel' and `sun' turned into `1' in our code.

>A.6.5.13 says that this practice
>is expected to continue. A.6.5 says that only names beginning
>with an underscore could be predefined.
>
>No two of the above can be true.

Sure they can. Actually, A.6.5.13 documents a current extension in wide use, and A.6.5 notes that such an extension renders the implementation nonconforming. No conforming implementation can continue the practice described in A.6.5.13. And high time!

>Actually, the predefined names currently in use are undesirable
>because they do not begin with underscore.

PRECISELY.

>However, the need for some way to indicate to the C program what
>kind of environment it is being compiled for is a great one and
>has to be filled somehow.
>
>It would be better to predefine names that begin with underscore for
>this purpose so as to avoid conflict with names chosen by applications
>programmers.

This is what we anticipate will occur. There really ought to be a standard registry for such things, but X3J11/ANSI isn't interested in acting as one. I would like to suggest that everyone use the prefix "_sys_" for OS environment and "_mach_" for architecture for any future such predefined macro names.
For example, the machine I'm preparing this response on would predefine _sys_bsd43 and _mach_vax. I'm willing to act as a clearing house for such names, so long as it is understood that no legal liability is involved.

>ITEM 32, 4.9.4.1. The Rationale says that `remove' was defined
>because the definition of `unlink' is too Unix-specific.

Actually it is the NAME "unlink" that is UNIX-specific. My UNIX implementation of remove() just calls unlink(). By providing a separate name, room was given to IEEE 1003 to specify additional semantics for unlink() without worrying about conflict with X3J11. We had a lengthy struggle to resolve the other overlap conflicts (all taken care of now).

>The definition of `remove' could be read as requiring that the file
>(not just one of its names) actually disappear immediately unless it
>is currently open.

Well, since "causing the file to be removed" is not explained, one could apply UNIX semantics to this, where "file" is taken to mean "link". However, I agree that consideration should be given to clarifying this.

>Elsewhere it is stated that it is not defined whether files created
>by `tmpfile' are removed on abnormal termination. This is a good
>specification, but it needs to be stated with the definition of
>`tmpfile'. The definition could now be read as implying that the
>file will be deleted on normal termination.

Definitely the temp file must be auto-deleted on normal termination. It should be arranged, if possible, for it to be removed also upon abnormal termination, but of course we can't insist on this (on UNIX it is easy). 4.9.4.3 should probably be changed to say NORMAL program termination.

>ITEM 34, A.6.3.3. What does the "order of bits in a character" mean?

I don't know why this is in A.6.3.3 at all, but since we insist on a binary numeration system for integers (in order to make sense of the bitwise operators), the fact that the bit order is undefined might be worth noting.
However, since there is no particular guarantee about character set encoding, it is probably a waste of ink to include this.

>I know about the difference between byte ordering among machines,
>but I don't see that it constitutes any ordering of the bits within
>a byte.

The arithmetically least significant bit is definitely in there somewhere, so there IS a well-defined bit order. Serial transmission also implies a bit order. Anyway, this will be a moot point if the useless item is deleted.

I'd like to thank RMS for his helpful comments and suggestions!
wcs@ho95e.UUCP (#Bill.Stewart) (01/12/87)
<Richard Stallman submitted a long commentary on the current ANSI C draft standard. Doug Gwyn replied.>

One of the topics of discussion was the standard's disparagement of the practice of #define constants for machine names not starting with _. (This results in variable names like "sun" getting trashed into "1".) Doug's suggestion was to have such constants start with _sys_ and _mach_, e.g. _sys_43bsd or _mach_vax.

As an alternative, why not let _sys_ and _machine_ be preprocessor variables, leading to

	#define _sys_ SysVR2
	#define _machine_ u3b2

and

	#if _machine_ = vax

instead of

	#ifdef _mach_vax
-- 
# Bill Stewart, AT&T Bell Labs 2G-202, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs
karl@haddock.UUCP (01/16/87)
In article <1258@ho95e.UUCP> wcs@ho95e.UUCP (Bill Stewart) writes:
>Doug's suggestion was to have such constants start with _sys_ and _mach_,
>e.g. _sys_43bsd or _mach_vax. As an alternative, why not let
>_sys_ and _machine_ be preprocessor variables, leading to
>	#define _sys_ SysVR2
>	#define _machine_ u3b2
>and
>	#if _machine_ = vax

[I presume that was a typo for "==". --kwzh]

Your suggestion is incomplete. In cpp as currently implemented, your "#if" would expand into "#if u3b2 == vax", which would be true (!) since tokens with no definitions are evaluated as zero. Perhaps you're assuming a hook in cpp to recognize such a comparison and use the token names instead of their values. (Perhaps you even intended "=", as opposed to "==", to distinguish this operation? I hope not.) I think such a rule is too confusing.

Someone else already suggested a variant that I think is more viable:

	#define _machine_ "u3b2"

could be tested by some comparison operator. Either "==" or "strcmp(,)==0" could be used, but since the interpretation is not identical to that of C, I will suggest a new preprocessor operator "#==".

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
rbutterworth@watmath.UUCP (01/16/87)
In article <303@haddock.UUCP>, karl@haddock.UUCP (Karl Heuer) writes:
>Someone else already suggested a variant that I think is more viable:
>	#define _machine_ "u3b2"
>could be tested by some comparison operator. Either "==" or "strcmp(,)==0"
>could be used, but since the interpretation is not identical to that of C, I
>will suggest a new preprocessor operator "#==".

Even that extension isn't good enough. What if I want to test for "vax" in some places and "vax/780" in others? Using #ifdef is still the best solution, and it doesn't require any changes to cpp. e.g.

	#if defined(_TS_UNIX)
	#  if defined(_TS_UNIX_BSD) && !defined(_TS_UNIX_BSD_4_3)
	...

You can't do such things (assuming one really wants to do such things) using a single _operating_system_ symbol. The only "change" is that X3J11 should reserve some prefixes such as "_TS_" and "_TM_" for local definitions of the target operating system and target machine.