[comp.lang.c] Six, no, seven, no, make that eight, mono, no, dual, wait...

chris@mimsy.UUCP (Chris Torek) (10/14/87)

In article <104@aimt.UUCP> breck@aimt.UUCP (Robert Breckinridge Beatie)
asked:
>>>Hmmm...  why is a 6 character limit necessary in any environment?

In article <8992@mimsy.UUCP> I answered that the limit is left over
from FORTRAN.  (FORTRAN was the first `high level language' for a
large number of machines, and it allowed six characters, mono case.
Many systems were written with this limit; so other languages
followed suit.  If FORTRAN had had dual case identifiers, more
machines would have supported this, and it is likely that Wirth
would have had dual case identifiers in Pascal, and that MS-DOS
would default to case significance, and so forth.  Or so I believe.)
I concluded with

>>`If it was good enough for the 1950s, it is good enough for you'

(and then the comment `(bleah)'.)

In article <11599@labrea.STANFORD.EDU> mason@Pescadero.stanford.edu
(Tony Mason) writes:
>It wasn't so long ago that that limitation was very real.  In
>Version 7 (not really the 50's)

(Mid 70s.  The limits it imposed held into the early 80s.)

>the limitation exists in the linker that a symbol name be no more
>than eight ascii characters.

Right so far...

>With one for the '\0' at the end,

No:  Eight character symbols had no terminating NUL.

>and one for the '_' at the beginning

Whitesmith's (or is it Whitesmiths'?) compiler for the 8080 used
a clever trick:  It *ap*pended the special character (in this case
`.') so as to separate the namespace from that of the assembler,
without taking away one of those precious eight characters.  It
would have been easy to modify V7 to do the same.

>(remember this was a fix so variable names wouldn't clash with
>register names in the UNIX assembler.)

V7 had a seven character dual case system, with the potential for
eight (one tiny change...).  Here are some numbers:

	Character set		Max sym length	# possible symbols [note 1]

	[a-z_][a-z0-9_]*	6		      1 924 294 806
	[a-z_][a-z0-9_]*	7		     71 198 907 849
	[a-z_][a-z0-9_]*	8		  2 634 359 590 440

	[A-Za-z_][A-Za-z0-9_]*	6		     53 447 509 952
	[A-Za-z_][A-Za-z0-9_]*	7		  3 367 193 127 029
	[A-Za-z_][A-Za-z0-9_]*	8		212 133 167 002 880

	(and just for fun)
	[A-Za-z_][A-Za-z0-9_]*	31	51 456 614 975 406 020 687 179-
					   378 939 503 261 597 730 789-
					   013 233 371 509

There are fewer than two thousand million possible symbols using
six monocase characters.  How many of these can be meaningful, I
wonder?  Adding one more character multiplies the space by 37, well
over an order of magnitude; an eighth character multiplies it by 37
again, about three orders of magnitude above the six character
space.  V7's seven character dual case system had three orders of
magnitude more symbol space than the old FORTRAN limit.

In more direct, if less definite, terms, the *feeling* of six
monocase characters is like that of a straightjacket.  Having used
systems with all sorts of different limits, I would say that dual
case adds no `warm fuzzies' (to borrow a phrase), but that seven
significant characters, with more allowed, changes the straightjacket
to an occasional major annoyance; eight characters is actually
about the same; fourteen is an occasional minor annoyance, and 31
or more never gets noticed.  (More than 31 are still needed, for
systems like C++.)

Nonetheless, I believe that we *are* stuck with a six character
monocase limit, for whatever reasons.  I need not like it, but I
will accept its existence---though not without complaint.

>... Why did the ANSI committee leave this temporary limitation in?

(Temporary?  Is that not what they said in the 1950s?  Perhaps too
few people complained :-) .)

>My guess would be it had little to do with FORTRAN compilers

(No more---but that is what started it.)

>but rather with machines that are still in use, still running UNIX
>programs, and still doing so in a limited address space.

Were this the case, we would be at 7 characters, dual case, which
is (in my opinion) *much* better.  The least common denominator is
lower than V7 Unix.  It *is* still out there, still doing work ...
but I will not---not without great incentive, at any rate---use
it. [note 2]

-----
Notes:

[1] A `bc' program to calculate namespace sizes.

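	/* f(a, b, n): identifiers of length 1 through n, with a choices
	   for the first character and b for each later one */
	define f(a, b, n) {
		auto s, i
		s = 0
		for (i = 0; i < n; i++) {
			s += a*(b^i)	/* identifiers of length i+1 */
		}
		return (s)
	}

	f(27, 37, 6)	/* [a-z_][a-z0-9_]*, 6 chars: prints 1924294806 */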

Function f takes the number of characters that can appear in the
first position (a), the number that can appear in each of the second
through n'th positions (b), and the maximum length (n), and computes
the size of the symbol space by adding up the counts for every length
from 1 through n (brute force method).

[2] In first person, `shall' is a prediction, `will' carries more
the force of a promise.  In second and third person, this is
reversed.  English is a crazy language....
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

crowl@cs.rochester.edu (Lawrence Crowl) (10/14/87)

In article <8997@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>[Re: the symbol space of fixed length identifiers composed of a letter
>followed by some number of letters or digits.]  Here are some numbers:
>    [six mono-case 2e9]     [thirty-one dual-case 5e55]
>... In more direct, if less definite, terms, the *feeling* of six monocase
>characters is like that of a straightjacket. ... and 31 or more never gets
>noticed. ... I would say that dual case adds no `warm fuzzies', ...

The space of possible symbols should not be measured in terms of possible
character combinations, but in terms of possible word combinations.  (Accepted
abbreviations are fine too.)  With this measure, we can see clearly why six
characters is such a straightjacket.  There are not a lot of meaningful
combinations of words which will fit into six characters.  There are a lot more
that will fit in 31 characters.  This measure also explains why dual case does
not help much.
-- 
  Lawrence Crowl		716-275-9499	University of Rochester
		      crowl@cs.rochester.edu	Computer Science Department
...!{allegra,decvax,rutgers}!rochester!crowl	Rochester, New York,  14627

franka@mmintl.UUCP (Frank Adams) (10/21/87)

In article <8997@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <104@aimt.UUCP> breck@aimt.UUCP (Robert Breckinridge Beatie)
>asked:
>>>>Hmmm...  why is a 6 character limit necessary in any environment?
>
>In article <8992@mimsy.UUCP> I answered that the limit is left over
>from FORTRAN.

FORTRAN may be responsible for the 6 character limit, but it is not to blame
for the mono-case requirement.  FORTRAN's restriction to mono-case was an
accurate reflection of the hardware available at the time -- not so much the
computers themselves, though some of them did not support lower case -- but
the other devices.  Keypunches, printers, and communications devices (some
derived from teletypes) initially did not support dual case, or supported it
only with great difficulty.

Things could be worse.  BASIC, as originally defined, supported one (upper
case) letter and an optional digit.  And it doesn't have the excuse that
nobody knew better at the time.  It's hard to imagine a two-character
standard emerging, but four has a certain grisly plausibility to it.
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108