[net.lang.c] 6 char externs and the ANSI standard

joemu@tekecs.UUCP (10/04/84)

Here's another hot issue in the committee. Should the minimum character
limit for external symbols be longer than 6 chars, case indistinct?

Arguments for 6 char:
1. The standard's purpose is to define a portable language that may be supported
   on most if not all machines. Some older machines can not support more than
   this limit. To ignore this limitation would be contrary to its purpose.
2. The limit is a minimum NOT a maximum, your compilers are free to support
   longer identifiers, but PORTABLE programs should not depend on them.
3. Prelinking is perceived to be a time consuming kludge.

Arguments against 6 char:
1. leads to cryptic names
2. external and internal identifiers have differing significant lengths
3. compiler systems on machines with smaller limits may "prelink" their
   objects similar to the mechanism proposed for ada. The prelinker would
   map these longer names to shorter names and provide the data necessary
   for symbolic debuggers to convert the names back and forth.

I know this topic has been discussed before in this forum, but the
committee really needs to get a clear sense of the user community on how
acceptable this limitation is.

warren@tikal.UUCP (warren) (10/04/84)

[This is not a sentence]

	Six character identifiers are much too short for significant
	software.  There is much to be gained by defining a larger
	minimum, we all know the benefits.  The worst single result is
	unintended synonyms created when the first 6 letters in an
	identifier match, as in

	int record_count;
	int record_type;

	The worst error in the C language was in allowing identifier names to
	be longer than the number of significant characters.  This is an 
	invitation to disaster.  If possible the ANSI committee should remove
	this "feature".  Compilers should be forced to at least produce
	warnings.  Although if a compiler can produce warnings, it can handle
	longer names...  

	If a minimum must be specified, let it be 16 or 20, at least.

						teltone!warren

jeff@alberta.UUCP (C. J. Sampson) (10/05/84)

I say NO!  I do not believe that a 6 character limit on external 
variables would be good at all.  For programs to be truly "portable",
they have to run on the vast majority of machines with little or no
change.  If you are going to have to make changes, it is much easier
to do it with programs that are well structured and have names like
"last_article_read" rather than "lstard".  If we are going to have a
good language, let's not limit it in a silly way like this.
-- 
------------------------------------------------------------------------
C. J. Sampson	Snail Canada: #712 11135-83rd ave.	***DISCLAIMER***
ihnp4!alberta!jeff		Edmonton, Alberta	+--------------+
ubc-vision!alberta!jeff		CANADA  T6G 2C8		| These may    |
sask!alberta!jeff					| be opinions. |
		 					+--------------+
"He who spends the storm beneath a tree, takes life with a grain of TNT."

hammond@mouton.UUCP (10/05/84)

Since the problem for older computers is the
linker and not the compiler, the compiler should generate errors
whenever it sees identifiers which should be distinct but won't
be to the linker, either in number of sig. chars or in case!

i.e:   int this_is_one_identifier; int this_is_another;

should generate errors on compilers whose linkers only distinguish
the first 8 chars or less.

Also: int THIS_IS_ONE_IDENTIFIER; int this_is_one_identifier;

should generate errors if the linker is not case-sensitive.

Then it is OK to specify a 6 char/case insensitive minimum.
After all, if you're compiling, you do have the source and can edit
it IF the compiler warns you about problems.

Rich Hammond, Bell Communications Research

mike@RICE.ARPA (10/07/84)

From:  Mike Caplinger <mike@RICE.ARPA>

I can't believe that anyone is seriously debating this.  Come on!  It's
1984!  The only reason variable names were ever limited to any
particular length is because linker writers in the 1960s and 70s were
sleazes.  How long are we going to pay that price?  How long are we
going to look at object modules as 80-column card images ala OS/360?

Please, please!  Don't put a limit on external identifiers!  I
guarantee you that at best it will just cause a flurry of "standard
extensions".  If C is to become portable, don't make it conform to the
lowest common denominator.  That's what has killed every other ANSI
language standard.  For example, when was the last time you wrote a
program in ANSI PL/I?  And nobody around here writes in Fortran 77
because X3J3 couldn't decide on the syntax of the WHILE loop!

I just hope somebody who makes a difference actually reads this list.

	Mike Caplinger
	Computer Science Department, Rice University

jim@ism780b.UUCP (10/08/84)

> 1. The standard's purpose is to define a portable language that may be supported
>    on most if not all machines. Some older machines can not support more than
>    this limit. To ignore this limitation would be contrary to its purpose.

How can you talk about portability when you consider the large number of
existing otherwise portable programs this would break?  The standard should
protect the large number of existing programs, and demand that implementors
deal with it.  Protecting implementors with weak linkers but screwing existing
code is not the greatest good for the greatest number.  And implementors
should, at the very least, be required to provide a tool that detects
identifiers that will be treated identically even though they are not
textually identical within, say, 32 characters.

One method that pushes portability but might be acceptable would be for an
implementor to provide a tool that maps identifiers into shorter names,
either creating new files or through a list of preprocessor defines.  This
would allow programs to compile and run properly, although debugging would be
more difficult due to the name changes.  But the standard makes no demands
upon the quality of the debugging facilities.  Note that such tools have
already been published on the net.

In light of such methods, I do not think the committee is correct to hobble
the quality of the language for the sake of antiquated implementations.

-- Jim Balter, INTERACTIVE Systems (ima!jim)

hansen@pegasus.UUCP (Tony L. Hansen) (10/08/84)

I have a very serious problem with the proposed minimum standard of 6 char,
mono-case external variables. Say I'm working for a small company that wants
to write a new C compiler and they want me to write it fast without throwing
in anything fancy! What is my boss going to tell me when I ask him/her how
many external characters have to be supported? S/he'll certainly say "what's
the standard say?", to which I'd have to reply "the minimum is 6 characters
mono-case." Of course s/he'll say "Well write it by the standard!"

Come on folks, anytime you set a minimum like that it's going to be followed
in new compilers/linkers as well as the cranky old linkers that won't change
to keep pace with the world.

The last time that this subject came up, the consensus seemed to agree with
my position that the 6 character/mono-case limit should be a SUBSET of the
standard rather than having the standard be the minimum case. Make the
standard something sensible and then recognize those sub-standard compilers
for what they are. There's already a section on valid extensions to the
language; add a section on recognized subsets.

Think again about the above scenario. If my reply to my boss' question were
"external variables can be N characters, multi-case, but there's a subset
which only allows 6 characters, mono-case", my boss would certainly say
"We're not writing a new compiler just to be a subset! Write it by the
standard!"

----
On to specifics:

< Arguments for 6 char:
< 1. The standard's purpose is to define a portable language that may be supported
<   on most if not all machines. Some older machines can not support more than
<   this limit. To ignore this limitation would be contrary to its purpose.

So don't ignore them but recognize them as a subset from the language proper.

< 2. The limit is a minimum NOT a maximum, your compilers are free to support
<   longer identifiers, but PORTABLE programs should not depend on them.

My scenario above indicates that the minimum often becomes the maximum when
people refuse to go beyond the minimum requirements. Besides, PORTABLE
programs based on such minimum standards are very difficult to maintain
without resorting to some trickery using pre-processing of some sort.

< 3. Prelinking is perceived to be a time consuming kludge.

Only on the systems which can't be upgraded to the standard. The rest of us
get something reasonable.

----

If we allow this to get through now, we're going to be the ones that will
have to live with it in the future. Here's my vote for sanity. Please add
yours while there's still a chance!


					Tony Hansen
					pegasus!hansen

jwp@sdchema.UUCP (John Pierce) (10/09/84)

In article <4095@tekecs.UUCP> joemu@tekecs.UUCP writes:
> Here's another hot issue in the committee. Should the minimum character
> limit for external symbols be longer than 6 chars, case indistinct?
> ...
> I know this topic has been discussed before in this forum, but the
> committee really needs to get a clear sense of the user community on how
> acceptable this limitation is.

It isn't acceptable at all.  Though I don't *really* care.  I've got better tools
than that, and I'm loyal enough to the people who pay me that I will use those
tools if they're necessary to reduce the production and maintenance costs of
what I do.  If that means that what I write doesn't meet the standard, that's
too bad.  That doesn't mean I'll deliberately violate it; it just means that
no time will be wasted on worrying about it.  I doubt we will ever retreat to
the 11/40 and "pre-phototypesetter" compiler we started with.

And you're right.  It has been discussed here before.  Endlessly.  I don't
understand why we're going through it again, since it's basically a waste of
disk space and transmission time.  The committee will make whatever decision
is necessary so that everyone can say "My favorite toy, the Blah C Compiler,
meets the ANSI standard - so buy it", without having to spend any money making
it (and its associated linker) into a decent product.

				John Pierce
				{decvax,sdcsvax}!sdchema!jwp

radford@calgary.UUCP (Radford Neal) (10/10/84)

I think the standard should specify INDEFINITE length identifiers, both
internal and external. If this is thought to be too much, about 16 is
the minimum. SIX is right out.

Since C does not have a "package" mechanism, libraries of routines should
really prefix all external names to avoid conflicts with application
routines and with other packages. For example, I have a library in which
all externals are prefixed by "j_" in order to avoid conflicts. Two 
characters is about minimum for a prefix, so the proposal would leave
only FOUR significant characters in an external name, which is nowhere
near enough to be meaningful.

Six character names are downright archaic. If the standard specifies this,
the temptation to make use of a local extension to the length will be
irresistible (and rightly so). This defeats the whole purpose of the
exercise. Rewriting a brain-damaged loader is not all that difficult.

Whatever you do, the one place where indefinite-length identifiers
ABSOLUTELY MUST be allowed is in the pre-processor. As long as this is
the case, one can at least rename all one's long identifiers as short
ones with a .h file.

      Radford Neal
      Dept. of Computer Science
      University of Calgary

gordon@uw-june (Gordon Davisson) (10/10/84)

I think you should definitely have the standard allow long external variable
names. Limiting them to 6 characters would have a number of bad effects:

  1) Compiler writers would follow the standard. This means that there
     would be even more obsolete implementations out there next time
     someone designs a language, and they'll have to deal with this
     question all over again. The 6 character limit is older than I am,
     and it's high time it was retired.

  2) Compiler writers would *not* follow the standard. The problem with
     this is that everyone will violate it in a different way. Some
     compilers will support 31 char limits, some 64, some 255. In other
     words, the standard will not be used and would therefore be a
     failure. This is not nearly as serious as 1, but it is a problem.

  3) Programmers would follow the standard. This leads to unreadable,
     unmaintainable, and generally ugly programs.

  4) Programmers would *not* follow the standard. Since 6 character
     variables are such a pain in the *ss, programmers using extended
     compilers will tend to take advantage of the ability to use
     understandable names, and proceed to write non-portable programs.

The main problem with long externs is that some compilers will have to
work with existing linkers that can't handle long names, but I think the
best way to deal with this is to define a "standard subset". I don't
want to see such limited compiler/linker setups advertised as "Full
ANSI C" as if they weren't 10 years out of date.

--
Human:    Gordon Davisson
USnail:   5008 12th NE, Seattle, WA, 98105
UUCP:     {ihnp4,decvax,tektronix}!uw-beaver!uw-june!gordon

henry@utzoo.UUCP (Henry Spencer) (10/10/84)

> How can you talk about portability when you consider the large number of
> existing otherwise portable programs this would break?  The standard should
> protect the large number of existing programs, and demand that implementors
> deal with it.  Protecting implementors with weak linkers but screwing existing
> code is not the greatest good for the greatest number.  

One of the ANSI committee's basic goals is the protection of existing
*correct* code.  Note that the previous de facto standard, K&R, quite
explicitly specified an 8-character limit.  Pre-ANSI code which depends
on long names is not portable, regardless of fraudulent claims to the
contrary by Berklocentric implementors.

Don't get me wrong; I am entirely in favor of long names, and I tend to
agree with the suggestion that the 6-character limit on significance of
external names should be listed as a "subset" feature.  But people who
wrote long-name programs long before there was any standard along those
lines, and then had the gall to call them "portable", have no cause to
complain about portability problems.

*MOST* existing C programs were written in environments with a 7-character
limit or something similar.
-- 

	"Yes, Virginia, there is life outside Berkeley."

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

hammond@mouton.UUCP (10/11/84)

I don't think you need a pre-processor, why not a post-processor?
I.e. only when you do a "cc *.o -o executable" do you need to
worry about the name clashes, and by then the compiler ought to
have eliminated most parsing problems, since you're dealing with
C specific objects.

Advantages:  C compiler can support arbitrary length identifiers.
	     The post processor only has to worry about extern symbols.
	     Can still support debugging by providing the mapping info.

Disadvantages: Would not be able to link C ".o" files with other compiler
		files.
	     Would require a "cc *.o" not an "ld *.o", but then most
	     sensible makefiles I have seen do that anyway, since
	     cc ought to know the library and loader flags needed.
Rich Hammond

PS As some of you kindly pointed out, my proposal to have the compiler
check for symbol clashes does not work for separate compilation.

rcd@opus.UUCP (Dick Dunn) (10/11/84)

> Here's another hot issue in the committee. Should the minimum character
> limit for external symbols be longer than 6 chars, case indistinct?
                                                     ^^^^ ^^^^^^^^^^

This one almost got past me.  What's this?  It seems that the matter of
case distinction is thornier than the 6-char business, unless I'm missing
something obvious.  What can the standard say?  I see:

	1.  Case matters in externals.  Breaks some systems which ignore
	case; standard-conforming programs would fail on these systems.
	2.  Case doesn't matter.  Breaks more systems which distinguish
	case; standard-conforming program would fail on these systems.
	3.  Case is not allowed to matter--i.e., a program is not standard-
	conforming if it contains externals which are distinguished only by
	case.

I suppose (3) is what we'll get?  I would certainly prefer (1) just on the
basis of C's case-sensitivity (about which I have mixed feelings but I'd
like the (foolish) consistency).
-- 
Dick Dunn	{hao,ucbvax,allegra}!nbires!rcd		(303)444-5710 x3086
   ...Relax...don't worry...have a homebrew.

chuqui@nsc.UUCP (Zonker T. Chuqui) (10/12/84)

I have to agree. 6 char externs are ugly. But look at reality for a second.
there are many, many, MANY systems out there with this (or similar)
restrictions inbred into the system software. Manufacturers who would want
to implement a standard C compiler would have to change ALL of their
software to meet that standard. For example, the DEC people would not only
have to change/'fix' vax-11C, but cobol, pascal, fortran, bliss, logo,
smalltalk, euclid, macro, and the kitchen sink. that's a LOT of man years.
then they give it to their customers, and suddenly all the object files
break. And the object only programs you bought stop working. And... And.
And. 

what would really happen, of course, is that the manufacturers who have
these kinds of restrictions wouldn't fix them, they'd document it as a
slight variation from the standard, in small print, in the manual they
forgot to ship. And the software that is 'standard' would not work, or be
flakey, or work most of the time. Much as I hate this restriction, I think
it has to be considered or we'll simply leave a loophole in the standard
that'll come back to bite us later.

chuq (upward compatibility sucks, but the alternative is worse)
-- 
From the Department of Bistromatics:                   Chuq Von Rospach
{cbosgd,decwrl,fortune,hplabs,ihnp4,seismo}!nsc!chuqui  nsc!chuqui@decwrl.ARPA

How about 'reason for living?'

gino@voder.UUCP (Gino Bloch) (10/12/84)

bugeat

Count me as another vote AGAINST 6-char externals.
-- 
Gene E. Bloch (...!nsc!voder!gino)

qwerty@drutx.UUCP (10/12/84)

As one who was just bit by trying to port software (my own, unfortunately)
from a machine that understood externs > 6 characters to a modern Vax 11/780
that didn't, I am all for encouraging anything that will get Dec's software
out of the dark ages molded around 6 bit ASCII notation and label syntax
carried up from OS-8 operating systems.  POX on 6 character externs.

					Brian Jones

chris@umcp-cs.UUCP (Chris Torek) (10/13/84)

*	From: chuqui@nsc.UUCP (Zonker T. Chuqui)

	But look at reality for a second.  there are many, many, MANY
	systems out there with this (or similar) restrictions [short
	external names] inbred into the system software. Manufacturers
	who would want to implement a standard C compiler would have to
	change ALL of their software to meet that standard. For example,
	the DEC people would not only have to change/'fix' vax-11C, but
	cobol, pascal, fortran, bliss, logo, smalltalk, euclid, macro,
	and the kitchen sink.

Wait a minute.  First of all, doesn't VMS support 31-character names?
But more important, no one would have to change ALL their software to
meet a new standard.  They have lots of options: don't support it (or
the "full" version); write a new linker that can be used with (probably
only with) the C compiler; come up with funny hash/name-translation
schemes, etc.

(By the way, speaking of ``break zoop'' vs ``goto out'' -- when you get
right down to it, it amounts to the same thing, so the ``structuredness''
should be the same, given a reasonable definition (like flowgraph
reducibility).  The ONLY REALLY important thing is how it affects you
humans :-).)
-- 
(This mind accidentally left blank.)

In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

dat@hpcnoe.UUCP (dat) (10/14/84)

	As far as case distinction, I think that it should be up
to the programmer to define a consistent standard.   For example,
in ANY language, my constants are always all uppercase, and my
'junk' variables (like loop counters) are always in lowercase.

	The compiler recognizing differences in cases can lead
to such evils as;
	
	int	i,I;
	char	word, Word;

and then code lines like

	if (i < 20) I=2;

	word = Word;

very evil.

	Of course the converse argument also holds true; the more that
you limit a language, the less functional it becomes for 'real' tasks.
Look at Pascal for an example of this.  Any version of Pascal that I 
have ever done any significant programs on has always been a superset
of the original J&W Pascal.  In fact here at HP we have a clone called
ModCal which is purported to be a cross between Modula-2 and Pascal!!!

	I would opt for the language ignoring case, though, since it is
easier conceptually for the programmer (the person who counts in the end
ANYWAY) to not worry about the case of a variable than to figure out 
errors like;

	#include <stdio.h>

	int main(void)
	{
		int i, I = 0;

		scanf("%d", &i);       /* <-- lower case i */
		printf("i = %d\n", I); /* <-- upper case i: prints 0, not the input */
		return 0;
	}

			somewhat wishy-washy on the subject,

				Dave (dAVE) Taylor (tAYLOR)


<pipe THAT through crypt and decode it!>

padpowell@wateng.UUCP (PAD Powell) (10/14/84)

I don't think that having two different object code formats
(short name, long name) would be hard to handle.  For example,
I have done just that,   and wrote an entire loader, that was tested,
correct,  and so forth, in a month.  I will root around and see if I can
find the code, and post it to the net.  I think that supporting other
object code formats should be a piece of cake.  The big problems are
in speeding things up.  Symbol table searching, which is the biggest
impact on the different formats, was less than 10% of the source code,
and about 30% of the execution time.

Patrick Powell

chuqui@nsc.UUCP (Zonker T. Chuqui) (10/15/84)

> Wait a minute.  First of all, doesn't VMS support 31-character names?
> But more important, no one would have to change ALL their software to
> meet a new standard.  They have lots of options: don't support it (or
> the "full" version); write a new linker that can be used with (probably
> only with) the C compiler; come up with funny hash/name-translation
> schemes, etc.

Ok, my apologies to DEC, I used a bad example-- I cut my teeth on RSTS way
back when and it shows... The example doesn't invalidate the problem,
though-- many manufacturers have significant bases of software that would
have to be 'fixed' to support the standard. There are situations in
generating standards when a less than perfect LCD must be used because
making the standard 'right' would cause a number of people to have to
deviate from the standard or ignore it completely. I'd much rather be able
to write software to the standard that IS standard than write software to
the standard that is going to run into deviations from the standard and
break. Let manufacturers extend the standard, not restrict it.
-- 
From the Department of Bistromatics:                   Chuq Von Rospach
{cbosgd,decwrl,fortune,hplabs,ihnp4,seismo}!nsc!chuqui  nsc!chuqui@decwrl.ARPA

How about 'reason for living?'

jwp@sdchema.UUCP (John Pierce) (10/16/84)

Tony L. Hansen writes [in part]:

 > ... anytime you set a minimum like that it's going to be followed in new
 > compilers/linkers as well as the cranky old linkers that won't change
 > to keep pace with the world.

That's exactly correct.  Bad money drives out good money.  Or as with milk:
There's a minimum standard for butter fat content milk must meet for it to be
marketed as "whole" milk.  So what do producers do?  They strip *all* the
butter fat from milk, then add back in just enough to meet the standard.

 > ... my position that the 6 character/mono-case limit should be a SUBSET of
 > the standard rather than having the standard be the minimum case. Make the
 > standard something sensible and then recognize those sub-standard compilers
 > for what they are.

This is far and away the most rational comment on this subject I've seen.  The
rest of Mr Hansen's comments are worth noting, also - see 1802@pegasus.UUCP if
you don't remember them.
-----------------------------------------------------------------------------
Comments on other people's comments:

 > ... Ansi is not proposing limiting variable names to 6 characters, they are
 > just saying they should be unique in 6 characters...

Whatever length and case (or lack thereof) is chosen, *all* characters should
be significant.  Talk about ways of promoting subtle, mind destroying bugs...

 > ... The ANSI C standard ... will allow extensions.  The standard is simply
 > to specify what a program must be like if it hopes to be 100% portable ...

It seems to me the standard should specify what a compiler must be able to
accept if it is to be considered a standard C compiler.  Otherwise, it is a
substandard one, and see Tony Hansen's remarks quoted above.

 > ... implementors are not discouraged from adding to the language ...

Thus one achieves portability?  I would much rather see a language defined that
we can live with without "extensions" (i.e., "unportable constructs").

				John Pierce, Chemistry, UC San Diego
				{decvax,sdcsvax}!sdchema!jwp

atbowler@watmath.UUCP (Alan T. Bowler [SDG]) (10/19/84)

We all know we want long external names.  The fact remains that the
loader format is the single hardest thing to change on a system.
I know of one case where the loader (and its object format)
have survived about 20 years.  There have been 3 complete rewrites
of the operating system, but the object deck is essentially the same.
(although in one version of the operating system the limit on names
went from 6 to 8).  It took almost 10 years for even the Unix
loader to allow names longer than 7 in C programs.
    It is easy for people to say, "let the manufacturer write a new loader
that handles C's long names".  There is an implicit assumption here
that the compiler author is the manufacturer.  This is frequently
not the case.  An independent software house writing a C compiler
must make it work with the manufacturer's loader, or the compiler will
simply not sell.  Supplying another loader is also not a viable option;
we have seen this tried, and the new loader (and compiler) are dismissed
as simply an academic exercise because it fails to provide all
the baggage functionality that the old loader has accumulated over
the years.  All you can do is produce a compiler that produces
the old object deck (probably by postprocessing a more reasonable
new deck), and lobby the manufacturer for a brand new more modern
loader.  In the meantime you have to live with the 6 character limit.
(The suggestion about a post-compiling step that remaps names
 falls on its face on any reasonable sized program: 200-400
 routines spread over as many source files.  You have to use
 the manufacturer's loader and library formats.)
     The best thing the ANSI committee can do is put in the line
that says that a standard conforming program must have its external
names unique in the first 6 characters (case ignored).  The wording
needs to be done carefully, so that it is clear that a compiler
loader environment that uses longer names is also standard conforming.
I.e. it is NOT being required to consider "abcdef1" equivalent to "abcdef2".
Furthermore, it is legitimate for programmers to write long names,
but if they are not unique enough, the program may not port to some
other systems.   If this is not done C will not be implemented
on these systems, the customers will continue to use Fortran and Cobol
and the manufacturer will never see why he needs a new loader.
(Everyone programs in Fortran and Cobol don't they? :-)).
If on the other hand the standard allows the 6 character restriction,
C will be implemented on these machines, people will write programs
that use longer names, and they will try to port programs that use
long names from systems that properly support them.  They will
of course get into some problems, but it will be obvious that the
problem is the loader.   This way complaints and requests for change
come to the manufacturer from the sales side, and that carries a lot more
weight with the guys that make the decisions than the most logical
arguments from any compiler writer.
      In summary, there are a number of loaders out there with
short name restrictions.  The languages on these systems do not
use long names, and will not benefit if the loader supported long
names, so there is no customer pressure for a loader that takes
long names.  If the C standard does not recognize these systems
then C will simply not get implemented (or not get widely distributed
because it lacks the official blessing of being a standard conforming
implementation).  Since C doesn't get added to the set of languages
on the system, there is still no pressure to change the loader.
If the standard does recognize these systems, then C will get implemented and
distributed over the customer base. Since it will be obvious that fixing
the loader will make C better, this will generate pressure from the
customers to fix the loader, and the loaders will finally get fixed,
and we will finally be rid of the 6 character restriction
(and won't have to fight THIS battle on the next language).

padpowell@wateng.UUCP (PAD Powell) (10/23/84)

>From: dat@hpcnoe.UUCP (dat)
>Subject: Re: 6 char externs and the ANSI standard
>	As far as case distinction, I think that it should be up
>to the programmer to define a consistent standard.   For example,
>in ANY language, my constants are always all uppercase, and my
>'junk' variables (like loop counters) are always in lowercase.
>
>	( ugly example inserted )
>	Of course the converse argument also holds true; the more that
>you limit a language, the less functional it becomes for 'real' tasks.
>Look at Pascal for an example of this.  Any version of Pascal that I 
>have ever done any significant programs on has always been a superset
>of the original J&W Pascal.  In fact here at HP we have a clone called
>ModCal which is purported to be a cross between Modula-2 and Pascal!!!
>
>	I would opt for the language ignoring case, though, since it is
>easier conceptually for the programmer (the person who counts in the end
>ANYWAY) to not worry about the case of a variable than to figure out 
>errors like;
>	scanf("%d",&i); /* <-- lower case i */
>	printf("i = %d\n", I); /* <-- upper case i */
>
>				Dave (dAVE) Taylor (tAYLOR)
>
>
><pipe THAT through crypt and decode it!>

ARRRGGHHH!  why not go all the way, and insist that all characters
outside strings will be forced into lower case? ALA: Pascal/Fortran, etc.

This has to be one of the most noxious things that I encounter, when porting
code:  you no longer have a WYSIWYG: What you see is What you get.

Fewer KLUDGES.  Less "intimate" knowledge.  Upper case and lower case
SHOULD BE DISTINGUISHED.

By the way,  I think that this should hold for external variables.

ESPECIALLY for external variables.

Patrick ("Yep, sure does look like FORTRAN") Powell

geoff@desint.UUCP (Geoff Kuenning) (10/28/84)

AAAAAAAAAAAAARRRRRRRRRRRRRRRRRRRGGGGGGGGGGGGGGGGGGGGGGGGGGGGGHHHHHHHHHHHHHH!!!!

We have hashed, rehashed, and rerehashed (no, re*hashed) this subject for
months now.  The consensus appeared in the first week:  about 80% of the
votes are for no limitations;  20% for 6-chars.

So alright already!  We all agree;  let's make sure the ANSI committee knows
and move on to something else.  I wouldn't be surprised to learn they've
already caved in to the overwhelming public opinion.
-- 

	Geoff Kuenning
	First Systems Corporation
	...!ihnp4!trwrb!desint!geoff

henry@utzoo.UUCP (Henry Spencer) (10/30/84)

> So alright already!  We all agree;  let's make sure the ANSI committee knows
> and move on to something else.  I wouldn't be surprised to learn they've
> already caved in to the overwhelming public opinion.

Caved in?  Not really; there is too much at stake in regard to getting wide
acceptance of the standard.  But the latest draft (17 Oct) has indeed had
the wording changed in the way I reported earlier:  the language itself no
longer defines any limits, but specific implementations are allowed to set
limits (provided they are no more severe than "six characters, monocase").
With any luck, this achieves the desired result:  getting sensible systems
to adopt arbitrary-length names, while encouraging adoption of the standard
by making it possible to produce standard-conforming implementations even
on systems with brain-damaged compatibility constraints.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

ralph@uf-csg.UUCP (Ralph Kuntz) (11/01/84)

[this line intentionally left blank]

Come on folks!!  Let's move out of the ``stone-age'' of compilers.  The
proposed standard allows 6 mono-case characters for externs.  If a program is
to conform to the standard it too cannot depend on more than 6 significant
characters.  The alternative is to allow an ``unlimited'' number of significant
characters, ala 4.2bsd C, and bring all of the out-of-date loaders into the
'80s.  Writing compiler-assemblers is no longer the chore it used to be.
Loader technology should also be brought up to date.
-- 
	From the dungeon at the University of Florida

  .......     .......     ...   ,..,   VOICE:	Ralph Kuntz
 /HHHHHHH\    |HHHHHH\    |H|  /HH/    UUCP:	..!akgua!uf-csv{!uf-csg}!ralph
|HH/   \HH|   |HH/ \HH|   |H| /HH/     USNAIL:	512 Weil Hall
|H|           |HH\ /HH|   |H|/HH/      		Computer and Information Science
|H|   [HHHH]  |HHHHHH/    |H|\HH\      		University of Florida
|HH\   /HH/   |H| \HH\    |H| \HH\     		Gainesville, FL. 32611
 \HHHHHHH/    |H|  \HH\   |H|  \HH\    AT&T:	(904) 392-2371
  """""""     """   `""`  """   `""`   CSNET:	ralph@ufl

[The preceding does not necessarily represent the opinion of anyone.]

henry@utzoo.UUCP (Henry Spencer) (11/06/84)

> ... and bring all of the out-of-date loaders into the
> '80s.  Writing compiler-assemblers is no longer the chore it used to be.
> Loader technology should also be brought up to date.

If you've been reading the earlier discussions, you know that the problem
is compatibility, not the difficulty of rewriting the loaders.  Vendors are
generally stuck with being backward compatible with all previous mistakes
forever.  If you disagree, please prove it by convincing at least one
major manufacturer (IBM would be a good choice) to abandon compatibility.
Please do this *before* flaming about how easy it is.
-- 
"Ancient principle of hacking:  he who complains, gets to fix it."

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

david@ukma.UUCP (David Herron) (11/07/84)

> 
> Having read the mail, I'm changing my vote...I'd like to
> see the standard require longer externals than 6 characters.
> The suggestion that compilers check for potential collisions
> isn't bad, but wouldn't it make more sense to let lint do the
> checking?  That's where portability problems have traditionally
> been detected.  Also, it's lint, rather than the compiler, that
> gets to see all the programs together and can detect externals
> from different files that wouldn't be distinguished. I'd like
> to see two kinds of output: lint should give a "high water
> mark" ("The linker must examine the first XXX characters if
> cases are distinct and the first YYY characters if they are not."),
> and should flag externals (or internals, for that matter) that
> don't differ within some settable number of characters.
> 			- Jim Van Zandt
> 
> 

Who actually uses lint?  Around here we never got used to using
it because our only machines were two 11/23's and an 11/10 and it
wouldn't fit.

I am actually curious how useful it is.  I have written one program
using lint regularly while writing it, and all the things lint found
were somewhat silly.  So, I went to my other programs (written over
the last few years) and ran lint on them....same result.  Do I just
write good code or what?
-----------------------------------------
David Herron
Phone:	(606) 257-4244 (work, phone will usually be answered as "Vax Lab").
	(606) 254-7820

        Arpa-Net-----\
		      \   (or cbosgd!hasmed!qusavx!ukma!david)
	unmvax----\    \
	research   \____\____ anlams!ukma!david
	boulder    /      /
	ucbvax----/      /
                        /
	decvax!ucbvax--/

For arpa-net, anlams has the name ANL-MCS.  I have been having trouble
getting mail from arpa-net through anlams so maybe try a different route
or the user name "s".

geoff@desint.UUCP (Geoff Kuenning) (11/08/84)

In article <9477@watmath.UUCP> atbowler@watmath.UUCP (Alan T. Bowler) writes:

>We all know we want long external names.  The fact remains that the
>loader format is the single hardest thing to change on a system.
>I know of one case where the loader (and its object format)
>have survived about 20 years.  There have been 3 complete rewrites
>of the operating system, but the object deck is essentially the same.
>(although in one version of the operating system the limit on names
>went from 6 to 8).  It took almost 10 years for even the Unix
>loader to allow names longer than 7 in C programs.

Harumph.  That's called lack of foresight.  I seriously doubt those 3
complete rewrites took place on a "let's rewrite the OS from scratch"
basis.  More likely it got done piece by piece.  If, when you do one piece,
you leave the hooks for longer names, over time you will have hooks in
nearly everything, and the conversion is then easy.  This is proven by the
fact that it *was* possible to upgrade the limit from 6 to 8.  The
best time to do this is when you are rewriting the single hardest-to-change
part, which is usually either the loader or the part of the OS that
initiates tasks/programs/processes/jobs.

>    It is easy for people to say, "let the manufacturer write a new loader
>that handles C's long names".  There is an implicit assumption here
>that the compiler author is the manufacturer.  This is frequently
>not the case.  An independent software house writing a C compiler
>must make it work with the manufacturer's loader, or the compiler will
>simply not sell.  Supplying another loader is also not a viable option;
>we have seen this tried, and the new loader (and compiler) are dismissed
>as simply an academic exercise because it fails to provide all
>the baggage functionality that the old loader has accumulated over
>the years.

Again, harumph.  My employer gets most of its sales from the fact that our
linker provides all the ugly features you find on Intel's.  While our C
compiler must work reasonably well with the Intel loader, it is perfectly
acceptable in my opinion to have a switch that says "produce old-style
object modules" to achieve that goal.

>(the suggestion about a post compiling step that remaps names
> falls on its face on any reasonable sized program (200-400
> routines spread over as many source files.)

Why?  Don't think of it as a post-compiling step, think of it as a
pre-linking step.  Big difference in binding times.

-- 

	Geoff Kuenning
	First Systems Corporation
	...!ihnp4!trwrb!desint!geoff

brad@gcc-opus.ARPA (Brad Parker) (11/08/84)

In article <237@uf-csg.UUCP> ralph@uf-csg.UUCP (Ralph Kuntz) flames (-::
>Come on folks!!  Let's move out of the ``stone-age'' of compilers.  The
>proposed standard allows 6 mono-case characters for externs.  If a program is
>to conform to the standard it too cannot depend on more than 6 significant
>characters.  The alternative is to allow an `unlimited' number of significant
>characters, ala 4.2bsd C, and bring all of the out-of-date loaders into the
>'80s.  Writing compiler-assemblers is no longer the chore it used to be.
>Loader technology should also be brought up to date.
>-- 

I'd have to agree with this. After a very painful port of "portable" C code
from Whitesmiths C on one machine (Z-80) to Whitesmiths C on a PDP-11
running RSTS (rat-sh*t time sharing), and discovering that the linker only
accepted symbols of 6 chars or less, I have paid my dues. Moving from 8
or more significant chars in an external symbol to 6 can cause premature
balding. This is a special case, I realize, but one that should be considered
before adopting a standard. 

dhp@ihnp3.UUCP (Douglas H. Price) (11/12/84)

>In article <9477@watmath.UUCP> atbowler@watmath.UUCP (Alan T. Bowler) writes:
>
>> ... The fact remains that the loader format is the single hardest thing
>> to change on a system. ...
>
>Harumph.  That's called lack of foresight.  I seriously doubt those 3
>complete rewrites took place on a "let's rewrite the OS from scratch"
>basis.  More likely it got done piece by piece.  If, when you do one piece,
>you leave the hooks for longer names, over time you will have hooks in
>nearly everything, and the conversion is then easy.  This is proven by the
>fact that it was possible to upgrade the limit from 6 to 8.  The
>best time to do this is when you are rewriting the single hardest-to-change
>part, which is usually either the loader or the part of the OS that
>initiates tasks/programs/processes/jobs.
>
	All well and good, but manufacturers have very little interest
	in touching what already works just fine, thank-you, for their
	own software.  Why should a manufacturer risk the good-will of
	their customers by fielding a completely new version of such a
	key tool (the loader)?  Why reintroduce all of the bugs that have
	been shaken out over the life of the product?  To anticipate the
	argument, this is NOT the same as normal product enhancement.  Make
	all of the demands you like, the fact is that only new systems will
	have long symbol names, and only normal attrition will get rid of old
	systems.

>>(the suggestion about a post compiling step that remaps names
>> falls on its face on any reasonable sized program (200-400
>> routines spread over as many source files.)
>
>Why?  Don't think of it as a post-compiling step, think of it as a
>pre-linking step.  Big difference in binding times.
>
	Ever try to debug a program that has had its symbols remapped?
	The defense rests..

-- 
						Douglas H. Price
						Analysts International Corp.
						@ AT&T Bell Laboratories
						..!ihnp4!ihnp3!dhp

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/13/84)

> Who actually uses lint?  ...
> 
> I am actually curious how useful it is.  ...

I have made a practice of writing C code that passes entirely
unscathed through "lint" (SVR2 version) and find that it helps
tremendously when I have accidentally omitted an & from a struct
parameter to a function or some other hard-to-find mistake.
Often when someone brings me buggy code to help with, "lint"
shows their problem.  This is quite apart from its use in making
one's code portable.

The only thing I have not been able to keep "lint" happy about
is the use of a pointer returned by malloc() for some non-(char *)
datum.  This problem should go away with the advent of (void *).

henry@utzoo.UUCP (Henry Spencer) (11/13/84)

> Who actually uses lint?  ...

Most anybody who wants portable programs.  It's indispensable.

> I am actually curious how useful it is.  ... [Some of its complaints
> look silly.]

Very useful.  Even things that look "silly" sometimes have non-trivial
implications for portability, although it may just be that your code
*is* well-written and lint *is* being silly (which it is at times).
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

guy@rlgvax.UUCP (Guy Harris) (11/14/84)

> Who actually uses lint?  Around here we never got used to using
> it because our only machines were two 11/23's and an 11/10 and it
> wouldn't fit.

Well, the 11/10 is a pain, considering it can't even run V6, much less
V7, but 2.9BSD on the 11/23 might run lint with overlaying (address-space-
map switching, not on-disk - the non-split-I&D 11s' major problem is with
virtual address space, not physical).

We use lint quite a bit here.  If you have a machine with 16-bit "int"s
and 32-bit pointers, you learn to do so.  Quickly.  It catches attempts
to pass an unadorned NULL or 0 to a routine that expects a (32-bit) pointer
instead of a (16-bit) int - and those will nail you to the wall.

> I am actually curious how useful it is.  I have written one program
> using lint regularly while writing it, and all the things lint found
> were somewhat silly.  So, I went to my other programs (written over
> the last few years) and ran lint on them....same result.  Do I just
> write good code or what?

Probably you just write good code.  There's tons of code out there that
"lint" would just toss its cookies on, from Bell, Berkeley, and other
places.  Somebody at Bell seems to be putting the pressure on to "lint",
as a lot of stuff in System V seems to have been changed to quiet "lint".
A lot of 4.2BSD stuff seems to have been "lint"ed also.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

kpmartin@watmath.UUCP (Kevin Martin) (11/18/84)

>Sorry, I lost the intermediate author... It was whoever that is who likes
>clearing his throat before typing...
>
>>In article <9477@watmath.UUCP> atbowler@watmath.UUCP (Alan T. Bowler) writes:
>>
>>> ... The fact remains that the loader format is the single hardest thing
>>> to change on a system. ...
>>
>>Harumph.  That's called lack of foresight.  I seriously doubt those 3
>>complete rewrites took place on a "let's rewrite the OS from scratch"
>>basis.  More likely it got done piece by piece.
Sorry, you lose. They were all re-writes, essentially top to bottom.
Only (some of) the module names remain.


I think the older manufacturers are more likely to fix their loaders if
they can first sell an "ANSI standard C compiler" using the current loader.
When customers start complaining that the brain damaged loader doesn't
let them bring in programs from other systems, they might actually fix it.

On the other hand, the manufacturers would also be quite happy NOT to have
an ANSI standard C (if this would require re-writing the loader). Then no one
buys it, and it never gets fixed.

Sort of two "vicious" circles... which one you get depends on whether the
initial compiler can be called "standard-conforming".

Perhaps the standard should require minimum 6 char caseless externals,
but the implementation of anything less than arbitrary length case-distinct
is, as they say, 'deprecated'
              Kevin Martin, UofW Software Development Group.


P.S. The Random House College Dictionary defines:
dep-re-cate: v.t. 1. To express earnest disapproval of.
   2. To protest against (a scheme, purpose, etc.).
   3. To depreciate or belittle.
   4. (archaic) To pray for deliverance from.
[ From the Latin, deprecat(us), "prayed against, warded off" ]

I often find the 4th definition appropriate :-)

geoff@desint.UUCP (Geoff Kuenning) (11/20/84)

In article <120@ihnp3.UUCP> dhp@ihnp3.UUCP (Douglas H. Price) writes:

>	All well and good, but manufacturers have very little interest
>	in touching what already works just fine, thank-you, for their
>	own software.  Why should a manufacturer risk the good-will of
>	their customers by fielding a completely new version of such a
>	key tool (the loader)?  Why reintroduce all of the bugs that have
>	been shaken out over the life of the product?  To anticipate the
>	argument, this is NOT the same as normal product enhancement.  Make
>	all of the demands you like, the fact is that only new systems will
>	have long symbol names, and only normal attrition will get rid of old
>	systems.

Because, over 15 years, very little of that important old software will be
satisfactory.  Most major OS utilities have a lifespan of ten years or less;
even in that time frame they may undergo several near-rewrites.  All are
eventually rewritten for one reason or another.

>	Ever try to debug a program that has had its symbols remapped?
>	The defense rests..

Ah, yeah, I have to confess I did that just today.  The nasty old compiler
took my nice mnemonic symbols and remapped them to *BINARY NUMBERS*!  How
uncooperative of it.  Fortunately, my debugger has access to a table giving
my original name and the mapped name.  It is no harder to set up a symbol
table for names that have been remapped to shorter ones.  Have you ever
tried to debug a program that has *truncated* names?

The prosecution rests...
-- 

	Geoff Kuenning
	First Systems Corporation
	...!ihnp4!trwrb!desint!geoff

joemu@tekecs.UUCP (Joe Mueller) (11/21/84)

>>(the suggestion about a post compiling step that remaps names
>> falls on its face on any reasonable sized program (200-400
>> routines spread over as many source files.)
>
>Why?  Don't think of it as a post-compiling step, think of it as a
>pre-linking step.  Big difference in binding times.
>
<<	Ever try to debug a program that has had its symbols remapped?
<<	The defense rests..
<
I believe your defense is a weak one. There is no reason that debugging
a program that has had its symbols remapped should be any more
difficult than debugging a "normal" file. The post compiler just
generates a table that the debugger (assembly and/or symbolic) reads
and translates the symbols back and forth. It only adds one fairly
simple level of complexity to the debugger and solves all the problems
I've seen so far. I don't see any big deal in designing either the
post processor or in modifying the debuggers. The main question is
whether it's appropriate for the Standard to mandate the limit.

My personal preference is to set the internal and external identifier
limits to be identical so the poor slob that has to maintain the code
doesn't have to try to keep the two separate. The current standard
states that internal identifiers are significant to 31 chars, case
distinct, and I feel externals should follow the same rules.

						Joe Mueller
UUCP:	...!{ucbvax or decvax}!tektronix!tekecs!joemu
ARPA:	tekecs!joemu.tektronix @ udel-relay

henry@utzoo.UUCP (Henry Spencer) (11/21/84)

> ...  It is no harder to set up a symbol
> table for names that have been remapped to shorter ones.

Sounds like we have a volunteer to head an ANSI committee to standardize
symbol tables.  Good luck; you'll need it.

And no, I am not just being facetious.  Not entirely, anyway.  Mapping
symbols to short names automatically loses badly if every compiler has
to do it independently, and none of them agree on the conventions or the
symbol-table format.  If you have the luxury of having support systems
(like debuggers) custom-built for one language, that's fine.  Otherwise...
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

henry@utzoo.UUCP (Henry Spencer) (11/22/84)

> My personal preference is to set the internal and external identifier
> limits to be identical so the poor slob that has to maintain the code
> doesn't have to try to keep the two separate. The current standard
> states that internal identifiers are significant to 31 chars, case
> distinct, and I feel externals should follow the same rules.

As I've mentioned before, the problem is backward compatibility with
systems that *cannot* provide more than 6 chars monocase for external
identifiers (barring kludges like remapping).  The current draft of
the standard says that identifiers are theoretically unlimited in
length, but that implementations may impose limitations, with worst
allowed limits of 31 dualcase for internals, 6 monocase for externals.
Do you really suggest that implementations on brain-damaged systems should
limit internal identifiers as well to 6 chars monocase?  I understand
the desire for consistency, but if I can't have long identifiers for
externals, I at least want them for internals.  Note that you can't
use the preprocessor to do remapping if it doesn't take long identifiers
(although the desirability of preprocessor remapping is another story...).
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry