henry@utzoo.UUCP (Henry Spencer) (10/14/84)
It looks like many people are getting an incorrect impression of the ANSI C standards folks over this business of identifier-length limits. Let's see if I can clarify things a bit. (Note that I am not a member of the committee, just an interested observer; I cannot claim to speak for them in any official way.)

Just in case anybody has *really* misunderstood: the problem at hand is not imposing a limit on the actual length of identifiers (shades of Fortran!). The only major question is whether external identifiers are distinct in only their first N characters, and what the value of N should be. (There is a lesser question of whether uppercase letters are distinct from lowercase letters within those first N.) Henceforth when I refer to "length", I mean "significant length".

It should be noted that, historically, most implementations have imposed rather severe limits, and most existing C programs were written in such environments. The 7-character, case-sensitive environment found on the original PDP-11 Unix was a de facto standard for quite a while. Arbitrary-length names ("flexnames") are a recent arrival, found in Berklix and in System V.2.

Essentially everyone involved in the debate, including (I think) pretty well all the members of the ANSI C committee, agrees that limits on length are bad. *NOBODY* is seriously suggesting that new compilers in new environments should impose arbitrary, artificial limits. Nor is anyone suggesting that existing environments should be changed to be more restrictive. Even the existing drafts, which do specify length limits, list flexnames as a "widely used" extension in their "common extensions" section.

Nobody is suggesting any serious limit on the length of non-external identifiers, including (notably) preprocessor identifiers.
The 21 Aug draft specifies a 31-character limit on internal identifiers, on the grounds that existing software often wants *some* limit for dimensioning tables; it is easier to raise a limit than to remove it altogether. It WOULD be nice to remove the internal-identifier limit completely, and this may yet happen. I doubt that a 31-character limit would inconvenience too many people. (Yes, of course, there'll be some...)

Henceforth when I refer to "names" or "identifiers", there is an implicit "external" in front.

The big problem is What To Do about all the existing environments where it is essentially impossible to retrofit flexnames. It is easy to suggest defining a new object-module format, but few manufacturers in their right minds will agree either to (1) convert all their old object modules, or (2) support two object-module formats simultaneously. Anyone who seriously proposes either of these notions has no concept of the problems involved. It's just not a viable solution; they *won't* do it. (People who persist in arguing for the feasibility of this approach should be required to demonstrate it by going out and convincing, say, IBM or DEC to agree to do it if the committee votes that way. This is *not* an unrealistic demand, because the Unix gurus who can rewrite a linker at the drop of a hat are *not* the problem area. Good luck.)

The obvious "best" solution is to define the C standard to specify flexnames as the standard, with anything else as a subset. Repeated attempts to get this past the committee have failed, perhaps largely because the committee has heavy representation from outfits with existing non-flexname compilers. (If you think this is awful and unfair, why aren't you on the committee? ANSI standards committees are required to be open to anyone willing to invest the money [mostly travel expenses and such] and time [more than you think].
These people have votes on the committee because they've demonstrated motivation and involvement, not because they've been picked by God for the job.)

Actually, it looks like the next draft will go halfway towards this view anyway: the wording is likely to change to strongly imply that flexnames are the default approach, with implementation-defined limits possibly restricting things further. Note that when the standard provides for an implementation-defined limit, having one is not a violation of the standard, merely a specific instantiation of it. This still allows standard-conforming implementations with length limits, but it does make things a bit more explicit. To be standard-conforming, an implementation *is* required to document such implementation-defined behavior.

The discussion of what implementation-defined limits might be imposed will probably mention a possible "six characters, monocase" limit on external-identifier significance. It is not possible to do any better than this without making conformance impossible on many major systems. Let's be realistic, folks: this standard has a much better chance of surviving if major manufacturers like it enough to live with it. A standard that everyone ignores is worthless. Defining the standard in such a way that folks like DEC can't meet it may be fun, and might be a pleasant form of revenge, but it is a terrible mistake if you want to see the standard widely known and adopted.

Why "six characters, monocase"? Because it's the best you can do without incurring the problems just described. It *IS* possible to do worse; the committee is not going for the lowest common denominator. At least one of the manufacturers represented (I believe) on the committee has a large investment in an environment with a five-character limit. You can find still worse if you really work at it. But the major breakpoint, where lots of existing systems fail, is just beyond "six characters, monocase".
There is little point in imposing a larger limit, because most of the people who can't provide full flexnames can't provide much more than "six characters, monocase".

In practice, if things go the way they appear to be going (flexnames implied as preferred, but "six characters, monocase" permitted), the probable result is that all new or easily changed environments will provide flexnames, while acceptance of the standard will be much wider than if it *required* flexnames. The only people who will need to worry about the "six characters, monocase" limit, apart from those working on archaic systems, will be those who are either:

(A) Trying for maximal portability, including portability to old/defective systems. Such people will need to worry about such limits anyway, since many of these old systems will *not* change no matter what the standard says. This is why "lint -p" enforces "six characters, monocase", and always has.

(B) Defining portable library specs. "strncpy" used to be named "strcpyn"; guess why it changed? Again, this cannot be avoided if maximal portability is an important objective.

This strikes me as an acceptable situation, and probably the best that we can *realistically* hope for. It is time for compromise and pragmatism, not ideological purism at any cost.

Can we drop this issue and get back to real problems, like what to do about the outstanding preprocessor difficulties? [I know what *I* would do about them, but we don't have a consensus by any means.]
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry