[net.lang.c] six-char names known harmful, but...

henry@utzoo.UUCP (Henry Spencer) (10/14/84)
It looks like many people are getting an incorrect impression of the
ANSI C standards folks over this business of identifier length limits.
Let's see if I can clarify things a bit.  (Note that I am not a member
of the committee, just an interested observer; I cannot claim to speak
for them in any official way.)

Just in case anybody has *really* misunderstood, the problem at hand is
not imposing a limit on the actual length of identifiers; shades of
Fortran!  The only major question is whether external identifiers are
distinct in only the first N characters, and what the value of N should
be.  (There is a lesser question of whether the uppercase letters are
distinct from lowercase letters within those first N.)  Henceforth when
I refer to "length", I mean "significant length".
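
To make "significant length" concrete, here is a minimal sketch in C.
It is not taken from any real compiler or linker; the function name
names_collide and its parameters are invented for illustration.  It
reports whether two names would be treated as the same external symbol
when only the first n characters count, optionally ignoring case.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if an environment that keeps only the
 * first `n` characters of a name (folding case when `monocase` is
 * nonzero) would treat `a` and `b` as the same symbol, 0 otherwise. */
int names_collide(const char *a, const char *b, size_t n, int monocase)
{
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];

        if (monocase) {
            ca = (unsigned char)tolower(ca);
            cb = (unsigned char)tolower(cb);
        }
        if (ca != cb)
            return 0;   /* differ inside the significant prefix */
        if (ca == '\0')
            return 1;   /* both names ended before n characters */
    }
    return 1;           /* first n characters all matched */
}
```

With n = 6 and case folding, "print_table" and "print_total" collide,
since both reduce to "print_"; with n = 31 they are distinct.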

It should be noted that, historically, most implementations have imposed
rather severe limits, and most existing C programs were written in such
environments.  The 7-character case-sensitive environment found on the
original PDP-11 Unix was a de facto standard for quite a while.
Arbitrary-length names ("flexnames") are a recent arrival, found in
Berklix and in System V.2.

Essentially everyone involved in the debate, including (I think) pretty
well all the members of the ANSI C committee, agrees that limits on
length are bad.  *NOBODY* is seriously suggesting that new compilers
in new environments should arbitrarily impose artificial limits.  Nor
is anyone suggesting that existing environments should be changed to be
more restrictive.  Even in the existing drafts, which do specify length
limits, the "common extensions" section of the draft lists flexnames as
a "widely used" extension.

Nobody is suggesting any serious limit on the length of non-external
identifiers, including (notably) preprocessor identifiers.  The 21 Aug
draft specifies a 31-character limit for these, on the argument that
existing software often wants to see *some* limit for table dimensioning;
it is easier to make the limit larger than to remove it altogether.  It
WOULD be nice to
remove the internal-identifiers limit completely, and this may yet happen.
I doubt that a 31-character limit would inconvenience too many people.
(Yes, of course, there'll be some...)  Henceforth when I refer to "names"
or "identifiers", there is an implicit "external" on the front.
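
For anyone wondering what "table dimensioning" means in practice, here
is a rough sketch of the sort of fixed-size symbol table that motivates
wanting *some* limit.  Only the number 31 comes from the draft; the
names MAX_IDENT, TABLE_SIZE, struct symbol, and intern are made up for
illustration, and no particular compiler is being described.

```c
#include <string.h>

#define MAX_IDENT  31   /* significance limit from the 21 Aug draft */
#define TABLE_SIZE 64   /* arbitrary size, purely for illustration */

/* Each entry has room for exactly MAX_IDENT characters plus the NUL. */
struct symbol {
    char name[MAX_IDENT + 1];
};

static struct symbol table[TABLE_SIZE];
static int nsyms = 0;

/* Hypothetical lookup-or-enter routine.  Characters beyond MAX_IDENT
 * are ignored, so two names that agree in their first 31 characters
 * get the same slot.  Returns the slot index, or -1 if the table is
 * full. */
int intern(const char *name)
{
    char key[MAX_IDENT + 1];
    int i;

    strncpy(key, name, MAX_IDENT);  /* keep the significant prefix */
    key[MAX_IDENT] = '\0';

    for (i = 0; i < nsyms; i++)
        if (strcmp(table[i].name, key) == 0)
            return i;               /* already present */

    if (nsyms == TABLE_SIZE)
        return -1;                  /* no room left */

    strcpy(table[nsyms].name, key);
    return nsyms++;
}
```

Raising the limit here means changing one constant; removing it means
replacing the fixed array with dynamically allocated storage, which is
the asymmetry the argument rests on.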

The big problem is, What To Do about all the existing environments where
it is essentially impossible to retrofit flexnames.  It is easy to suggest
defining a new object-module format, but few manufacturers in their right
minds will agree to either (1) change all their old object modules over,
or (2) support two object-module formats simultaneously.  Anyone who
seriously proposes either of these notions has no concept of the problems
involved.  It's just not a viable solution; they *won't* do it.

(People who persist in arguing for the feasibility of this approach should
be required to demonstrate it by going out and convincing, say, IBM or DEC
to agree to do it if the committee votes that way.  This is *not* an
unrealistic demand, because the Unix gurus who can rewrite a linker at the
drop of a hat are *not* the problem area.  Good luck.)

The obvious "best" solution is to define the C standard to specify flexnames
as the standard and anything else as a subset.  Repeated attempts to get
this past the committee have failed, perhaps largely because the committee
has heavy representation from outfits with existing non-flexname compilers.

(If you think this is awful and unfair, why aren't you on the committee?
ANSI standards committees are required to be open to anyone willing to
invest the money [mostly travel expenses and such] and time [more than
you think].  These people have votes on the committee because they've
demonstrated motivation and involvement, not because they've been picked
by God for the job.)

Actually, it looks like the next draft will go halfway towards this view
anyway; the wording is likely to change to strongly imply that flexnames
are the default approach, with implementation-defined limits possibly
restricting things further.  Note that when an implementation-defined
limit is provided for by the standard, having one is not a violation of
the standard, merely a specific instantiation of it.  This still allows
standard-conforming implementations with length limits, but it does make
things a bit more explicit.  To be standard-conforming, an implementation
*is* required to document such implementation-defined behavior.

The discussion on what implementation-defined limits might be imposed
will probably mention a possible "six characters, monocase" limit on
external-identifier significance.  It is impossible to do any better
than this without making conformance impossible on many major systems.
Let's be realistic, folks:  this standard has a much better chance of
surviving if major manufacturers like it enough to live with it.  A
standard that everyone ignores is worthless.  Defining the standard
in such a way that folks like DEC can't meet it may be fun, and might
be a pleasant form of revenge, but it is a terrible mistake if you
want to see the standard widely known and adopted.

Why "six characters, monocase"?  Because it's the best you can do
without incurring the problems just described.  It *IS* possible to
do worse; the committee is not going for the lowest common denominator.
At least one of the manufacturers represented (I believe) on the
committee has a large investment in an environment with a five-character
limit.  You can find still worse if you really work at it.  But the
major breakpoint, where lots of existing systems fail, is just beyond
"six characters, monocase".  There is little point in imposing a larger
limit, because most of the people who can't provide full flexnames can't
provide much more than "six characters, monocase".

In practice, if things go the way they look to go -- flexnames implied
as preferred, but "six characters, monocase" optional -- the probable
result is that all new or easily-changed environments will provide
flexnames, while acceptance of the standard will be much wider than if
it *required* flexnames.  The only people who will need to worry about
the "six characters, monocase" limit, apart from those working on archaic
systems, will be those who are either:

	(A) Trying for maximal portability, including portability to
	old/defective systems.  Such people will need to worry about
	such limits anyway, since many of these old systems will *not*
	change no matter what the standard says.  This is why "lint -p"
	enforces "six characters, monocase", and always has.

	(B) Defining portable library specs.  "strncpy" used to be
	named "strcpyn"; guess why it changed?  Again, this cannot be
	avoided if maximal portability is an important objective.
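
The kind of check that cases (A) and (B) call for can be sketched in a
few lines of C.  This is only an illustration under the "six characters,
monocase" assumption, not how "lint -p" is actually implemented; the
names SIG_LEN, external_key, and count_collisions are invented here.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SIG_LEN 6   /* assumed limit: six significant characters */

/* Reduce a name to what a six-character, monocase environment would
 * keep: the first SIG_LEN characters, folded to upper case. */
void external_key(const char *name, char key[SIG_LEN + 1])
{
    size_t i;

    for (i = 0; i < SIG_LEN && name[i] != '\0'; i++)
        key[i] = (char)toupper((unsigned char)name[i]);
    key[i] = '\0';
}

/* Count the pairs of names in the list whose reduced keys are equal. */
size_t count_collisions(const char *const names[], size_t count)
{
    size_t i, j, hits = 0;
    char ki[SIG_LEN + 1], kj[SIG_LEN + 1];

    for (i = 0; i < count; i++) {
        external_key(names[i], ki);
        for (j = i + 1; j < count; j++) {
            external_key(names[j], kj);
            if (strcmp(ki, kj) == 0)
                hits++;
        }
    }
    return hits;
}
```

Given "strcpy" and "strcpyn", both reduce to "STRCPY" and the pair is
counted as a collision; "strcpy" and "strncpy" reduce to "STRCPY" and
"STRNCP" and are not.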

This strikes me as an acceptable situation, and probably the best that
we can *realistically* hope for.  It is time for compromise and pragmatism,
not ideological purism at any cost.

Can we drop this issue and get back to real problems, like what to do
about the outstanding preprocessor difficulties?  [I know what *I* would
do about them, but we don't have a consensus by any means.]
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry