jeff@alberta.UUCP (C. J. Sampson) (12/31/84)
> > and object file format don't accept long or case-sensitive names. What
> > does the proposed ANSI standard say about the issue?
>
> The current draft says that the length limit (if any) and treatment
> of case in external identifiers are "implementation-defined", which
> means that implementors can do things as they wish but must document
> their decisions. Also, the length limit may not be shorter than 6.

Gads! When are they going to figure out that 6 or 8 characters is *not*
enough? I spent three hours porting ogre to an Altos 586 running some
ancient version of Xenix, and most of that was spent changing function
names because I had only 7 significant characters. I think that the standard
should enforce a minimum of 32 characters. We will make programs more
portable and readable.

-------------------------------------------------------------------
C. J. Sampson                  Snail Canada: #712 11135-83rd ave.
ihnp4!       \                 Edmonton, Alberta
ubc-vision!   |- alberta!jeff  CANADA T6G 2C8
sask!        /                 Phone: (403) 439-6851
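The porting problem described above arises when a linker compares only the first few characters of an external name. What follows is a minimal sketch of how a list of externals might be checked for clashes before porting. It assumes a limit of 7 significant characters, and the helper functions and the names in the usage note are invented for illustration; none of them come from ogre.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if two external names would be
 * indistinguishable to a linker that keeps only `sig` significant
 * characters, 0 otherwise. */
static int names_clash(const char *a, const char *b, size_t sig)
{
    assert(a != NULL && b != NULL && sig > 0);
    return strncmp(a, b, sig) == 0;
}

/* Counts the clashing pairs in a list of n external names. */
static size_t count_clashes(const char *const names[], size_t n, size_t sig)
{
    size_t i, j, clashes = 0;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (names_clash(names[i], names[j], sig))
                clashes++;
    return clashes;
}
```

For example, the made-up names display_map and display_unit clash at 7 significant characters but not at 32.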
david@ukma.UUCP (David Herron, NPR Lover) (01/02/85)
> From: jeff@alberta.UUCP (C. J. Sampson)
> Newsgroups: net.lang.c
> Subject: Re: length of external names
> Message-ID: <380@alberta.UUCP>
> Date: Sun, 30-Dec-84 21:38:19 EST
>
> > The current draft says that the length limit (if any) and treatment
> > of case in external identifiers are "implementation-defined", which
> > means that implementors can do things as they wish but must document
> > their decisions. Also, the length limit may not be shorter than 6.
>
> Gads! When are they going to figure out that 6 or 8 characters is *not*
> enough. I spent three hours porting ogre to an Altos 586 running some
> ancient version of Xenix and most of that was spent changing function
> names because I had only 7 significant characters. I think that the standard
> should enforce a minimum of 32 characters. We will make programs more
> portable and readable.

But if we enforce a minimum size then they will be portable only within
the systems that support that size. I think "implementation-defined" is
the way to go. At least for now.

--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:-
David Herron; ARPA-> "ukma!david"@ANL-MCS
(Try the arpa address w/ and w/o the quotes, I have had much trouble with both.)

UUCP -:--:--:--:--:--:--:--:--:- (follow one of these routes)

{ucbvax,unmvax,boulder,research} ! {anlams,anl-mcs} -----\
                                     vvvvvvvvvvv          >-!ukma!david
{cbosgd!hasmed,mcvax!qtlon,vax135,mddc} ! qusavx    -----/
                                     ^^^^^^^^^^^
henry@utzoo.UUCP (Henry Spencer) (01/02/85)
> > The current draft says that the length limit (if any) and treatment
> > of case in external identifiers are "implementation-defined", which
> > means that implementors can do things as they wish but must document
> > their decisions. Also, the length limit may not be shorter than 6.
>
> Gads! When are they going to figure out that 6 or 8 characters is *not*
> enough. I spent three hours porting ogre to an Altos 586 running some
> ancient version of Xenix and most of that was spent changing function
> names because I had only 7 significant characters. I think that the standard
> should enforce a minimum of 32 characters. We will make programs more
> portable and readable.

Oh lord, not this again... This topic was discussed *to death* a few
months ago. To summarize the major points that emerged:

- There are many systems which are doomed to live with old, brain-damaged
  linker formats. Manufacturers have too big a commitment to the old
  formats to change, and their users have no say in the matter. It is
  politically vital for the acceptance of the standard that
  standard-conforming implementations be possible on such machines. This
  is regrettable but impossible to avoid.

- Trying to pick a number other than 6 is silly. People who have a choice
  about the number can just as easily opt for no limit at all, which is
  clearly the right decision. People who do not have a choice about the
  number generally are stuck with a rather low number, typically 6.

- Software which relies on long names is not fully portable, regardless
  of claims to the contrary.

- It is generally agreed that the situation is unsatisfactory and painful.

- I repeat a challenge I made at the time: if you think a mandatory
  bigger number is appropriate despite the problems this will cause for
  the more backward systems, prove your point by convincing DEC or IBM to
  agree with you.
--
                                Henry Spencer @ U of Toronto Zoology
                                {allegra,ihnp4,linus,decvax}!utzoo!henry
jeff@alberta.UUCP (C. J. Sampson) (01/03/85)
> > [ Names are not long enough, etc. etc. ]
>
> But if we enforce a minimum size then they will be portable only
> within the systems that support that size. I think "implementation-defined"
> is the way to go. At least for now.

The idea is that all standard systems will support a minimum size that is
reasonably large. "Implementation-defined" sizes will make C programs no more
portable than they are now in that respect. What is the point of a standard
if it does not make programs written to the standard more portable? I still
say that we should have minimum 32 character externs. Porting ~500 lines an
hour just because of this is very expensive as well as very stupid.

=====================================================================
Curt Sampson            ihnp4!alberta!jeff
---------------------------------------------------------------------
"It looked like something resembling white marble, which was probably
what it was: something resembling white marble."
cottrell@nbs-vms.ARPA (01/04/85)
/*
> Gads! When are they going to figure 6 or 8 chars is not enuf?
i hate long names. what is this anyway, cobol? why say
'social_security_number' when you can say 'ssn'? personally, i would like
to see variable names restricted to 3 chars exactly :-)
*/
henry@utzoo.UUCP (Henry Spencer) (01/04/85)
> The idea is that all standard systems will support a minimum size that is
> reasonably large.

If you can convince people like IBM and DEC to go along with this and
change their object-module formats to match, the entire C community will
be forever indebted to you. That is *not* a facetious comment; we cannot
afford to ignore the major manufacturers when we are talking about making
something really standard. If they don't accept it, then we have a
two-level standard in practice even if it's not so in theory. And I see no
chance whatsoever that they are going to change their object-module
formats at this late date. None, zero. Give it up, it's hopeless. And just
how much acceptance would you expect for a standard that none of the major
manufacturers can comply with?

> "implementation-defined" sizes will make C programs no more
> portable than they are now in that respect. What is the point of a standard
> if it does not make programs written to the standard more portable?

I hate to tell you this, but the current drafts have a substantial
appendix listing all the "implementation-defined" characteristics of a
conforming implementation. Identifier length is only one of a longish
list.

Even if the standard does not make programs more portable -- and it will,
it will -- it prevents future compiler writers from making them still
*less* portable. "The difference between bad and worse is much sharper
than the difference between good and better."

> I still
> say that we should have minimum 32 character externs. Porting ~500 lines an
> hour just because of this is very expensive as well as very stupid.

Imagine how some of us PDP11 people feel when we try to port 4BSD programs
to our machines; the stupidities are not limited to long names. ("Malloc
never fails, so we needn't bother checking its return value.") ("Of course
ints are 32 bits, everywhere. The whole world's a VAX, after all.") ("I
know I'm really supposed to use %ld to print a long, but who cares? It's
the same size as an int, so I'll just use %d.") The only cure for this
sort of malignant imbecility is more care by the original author. Porting
unportably-written software is always going to be hard.
--
                                Henry Spencer @ U of Toronto Zoology
                                {allegra,ihnp4,linus,decvax}!utzoo!henry
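Each of the three habits quoted above has a straightforward portable alternative. A minimal sketch follows; the function name, buffer size and message text are illustrative assumptions, not code from any 4BSD program, and snprintf postdates this discussion.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum { COUNT_BUF_SIZE = 32 };   /* assumed to be ample for any long */

/* Hypothetical example: format a byte count into a freshly allocated
 * string. Returns NULL if allocation fails; the caller frees the result. */
static char *format_count(long count)
{
    /* Use long, not int, for values that may not fit in 16 bits:
     * int is only guaranteed to hold -32767..32767. */
    char *buf = malloc(COUNT_BUF_SIZE);
    int n;

    if (buf == NULL)            /* malloc can fail; always check */
        return NULL;

    /* %ld matches a long argument; %d would be wrong wherever
     * long and int differ in size. */
    n = snprintf(buf, COUNT_BUF_SIZE, "%ld bytes", count);
    assert(n > 0 && n < COUNT_BUF_SIZE);    /* output was not truncated */
    return buf;
}
```

A caller would test the result against NULL before using it and free it afterwards.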
Paul Schauble <Schauble@MIT-MULTICS.ARPA> (01/06/85)
I'm not sure that I should post this to the net, but I can't resist..

Henry Spencer, who seems to be one of the chief exponents of short
external names, just posted a convincing explanation of the need to not
break existing linkers. I understand why and the issues involved. I even
mostly agree.

In a previous incarnation I worked on COBOL and PL/1 for a manufacturer
that had the same problem: a language that required long names and a
linker that only handled short ones.

The solution that was used, and worked, was to have the COMPILER use the
external "name" to store a hashed value. During the recent net
discussion I posted a description of this technique and some analysis of
the chance and cost of collisions.

This is done entirely in the compiler, and has no effect on the linker.

I have not seen any reasonable statement of why this would not be
workable. The only objection that I can recall was that having to look
up the name translation during debugging was extra work. True, but
consider... Would you rather have the extra work on the few occasions
that you need to look up a symbol on the load map, or on the many more
frequent occasions that you are dealing with C source and have to guess
what "dtfmdu" or something means? You know which way I will vote.

More recent discussion prompts me to post a small modification of the
technique. Several people have pointed out the desirability of a
language feature that would have the internal and external names of a
global item be different, e.g.

    extern int date_and_time() entry "SYS$TIME";
    extern int memory_size entry "CSYS$MEMSIZ";

I like this, other languages have it, it's useful, and it would have
saved me having to write a number of assembler routines whose only
purpose was to change names.

It also allows me to suggest a modification of the hashing technique.
Note that this only applies to systems with deficient linkers.

  If the declaration contains an entry clause, use that as an external
  name.
  Otherwise, if the item name is short enough, use the item name.
  Otherwise, hash the item name and use the result as the external name.

This allows programming using the full names, and using the entry clause
for those cases where you really care what the external name is, or in
the rare cases when the hash causes a duplication of external names.

----------------------------------------------------------------------

Now, my questions:

To the standards committee people:

 1. Suppose that the standard required longer names and suggested the
    hashing technique as an implementation technique, you would force
    manufacturers to update either linker or compiler to meet the
    standard. Is this politically possible?

 2. In some other areas, I am told, the standard described a relatively
    high level language, rather than the minimum of implementations.
    This will prevent some present compilers from meeting the standard.
    Why should it pick the minimum here?

 3. How can I get a copy of the draft standard?

 4. Is this an adequate method of getting comments and questions to the
    committee? If not, what is a useful channel?

To the net at large:

 1. What are specific objections to the hashing technique?

 2. Are there any machines where it won't work, and why?

Please copy me on any answers. Service from the list has been erratic
lately.

Thanks for all the fish...
Paul Schauble@MIT-Multics.ARPA
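The three-rule selection described above could be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the 6-character linker limit, the particular hash function (a djb2 variant), and the encoding of the hash as upper-case letters. The post does not say which function the COBOL and PL/1 compilers used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINK_MAX 6   /* assumed linker limit on external name length */

/* Illustrative string hash; not the function used by any real compiler. */
static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;

    while (*s != '\0')
        h = (h * 33 + (unsigned char)*s++) & 0xFFFFFFFFUL;
    return h;
}

/* Chooses the external name for an identifier. `entry` is the name
 * given in a (hypothetical) entry clause, or NULL if there is none. */
static void external_name(const char *ident, const char *entry,
                          char *out, size_t outsize)
{
    static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    unsigned long h;
    size_t i;

    assert(ident != NULL && out != NULL);

    if (entry != NULL) {                        /* rule 1: entry clause wins */
        assert(strlen(entry) < outsize);
        strcpy(out, entry);
    } else if (strlen(ident) <= LINK_MAX) {     /* rule 2: short enough */
        assert(strlen(ident) < outsize);
        strcpy(out, ident);
    } else {                                    /* rule 3: hash the long name */
        assert(outsize > LINK_MAX);
        h = hash_name(ident);
        for (i = 0; i < LINK_MAX; i++) {
            out[i] = letters[h % 26];
            h /= 26;
        }
        out[LINK_MAX] = '\0';
    }
}
```

Because the hash depends only on the spelling of the identifier, separately compiled modules arrive at the same external name without consulting each other, which is what lets the scheme live entirely in the compiler.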
henry@utzoo.UUCP (Henry Spencer) (01/08/85)
> Henry Spencer, who seems to be one of the chief exponents of short
> external names, just posted a convincing explanation of the need to not
> break existing linkers. ...

To rebut a misconception: I don't like short external names. I merely
think that (a) some provision for them in the standard is inevitable, and
(b) annoying though this is, we can live with it, which is a passing grade
for a standard that has to apply to everyone.

> [A] solution that was used, and worked, was to have the COMPILER use the
> external "name" to store a hashed value. During the recent net
> discussion I posted a description of this technique and some analysis of
> the chance and cost of collisions.

I don't recall seeing the previous posting about this, but the problem of
collisions is definitely a nasty one. Bearing in mind that
separately-compiled modules must agree on the object-file (i.e. short)
name under which an identifier is known, the possibility of collisions is
a major flaw in a hashing scheme. I've worked with compilers that did
similar things (first 4 and last 3 chars of the identifier, as I recall)
and one had to be careful about collisions; it really wasn't much better
than short identifiers. If the algorithm used is really a hashing function
rather than a systematic "cut and paste" rearrangement of the original
identifier, collisions become (a) less likely, and (b) harder to spot and
deal with.

Note that hashing *demands* a way to force an internal-to-external
correspondence, like the proposed "entry" clause, for linking to system
services and other languages.

I like the idea of using an "entry" clause to manage correspondences
between internal/long and external/short names, although if you ignore the
issue of identifiers containing funny characters, you can do exactly the
same thing with #define. (Note that preprocessor identifiers are internal,
hence must be long.)

I am not a member of the committee, but will comment on some of the
suggestions addressed to them...

> 1. Suppose that the standard required longer names and suggested the
>    hashing technique as an implementation technique, you would force
>    manufacturers to update either linker or compiler to meet the
>    standard. Is this politically possible?

I don't know. If the problem of collisions can be shown to be a non-issue,
and the "entry" clause or something like it can be introduced, it might be
viable. It depends on how manufacturers feel about hashing.

> 2. In some other areas, I am told, the standard described a relatively
>    high level language, rather than the minimum of implementations.
>    This will prevent some present compilers from meeting the standard.
>    Why should it pick the minimum here?

Because the problems go much farther than the compiler. Object-module
formats are visible system-wide, making changes much harder.

> 3. How can I get a copy of the draft standard?

I believe the draft has gone to ANSI for publication for formal public
comment; it should be available from CBEMA (don't have the address handy)
shortly. The price will be unpleasant, though, knowing CBEMA. I don't know
whether the older informal channels are still open.

> 4. Is this an adequate method of getting comments and questions to the
>    committee? If not, what is a useful channel?

Some of the committee folks definitely do read this newsgroup. If you want
to be forceful about something, though, the recommended course is to write
(on a piece of paper) to them. The transition to ANSI
formal-public-comment phase may have altered this, though.
--
                                Henry Spencer @ U of Toronto Zoology
                                {allegra,ihnp4,linus,decvax}!utzoo!henry
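The "first 4 and last 3 chars" scheme recalled above shows why a cut-and-paste shortening still needs care. A minimal sketch, assuming that names of 7 characters or fewer pass through unchanged (the post does not say how short names were handled); the function name and the identifiers in the usage note are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "cut and paste" shortening: names of up to 7 characters
 * are kept as they are; longer ones become their first 4 characters
 * followed by their last 3. `out` must hold at least 8 bytes. */
static void cut_and_paste(const char *ident, char *out)
{
    size_t len;

    assert(ident != NULL && out != NULL);
    len = strlen(ident);
    if (len <= 7) {
        strcpy(out, ident);
    } else {
        memcpy(out, ident, 4);
        memcpy(out + 4, ident + len - 3, 3);
        out[7] = '\0';
    }
}
```

Under this scheme the made-up names read_buffer_len and read_packet_len both shorten to readlen, so they collide. The collision is at least easy to predict by eye, which is the trade-off against a true hash noted in the post.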
jans@mako.UUCP (Jan Steinman) (01/08/85)
In article <6951@brl-tgr.ARPA> cottrell@nbs-vms.ARPA writes:
>...personally, i would like to see variable names restricted to 3 chars
>exactly :-)

Or better yet, a single upper-case character, ('I' preferred) followed by
exactly four digits! ::-)
--
:::::: Jan Steinman           Box 1000, MS 61-161    (w)503/685-2843 ::::::
:::::: tektronix!tekecs!jans  Wilsonville, OR 97070  (h)503/657-7703 ::::::
Paul Schauble <Schauble@MIT-MULTICS.ARPA> (01/09/85)
One other comment on the hashing technique: When I made the original
posting I assumed the linker model I was most familiar with: one
external definition and a series of references. For this model, having
two C symbols that hash to the same external is not very much of a
problem. The linker will see two different definitions of its symbol
and should complain.

The numbers given also assumed the shortest linker name space I was
aware of, 30 bits. For anything larger (GCOS, 36 bits; MS-DOS, 64
bits), the probability of collision is too small to compute on the
equipment I have at hand.

Paul
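To put rough numbers on the collision risk for the name spaces mentioned above, one can use the standard pairwise (union) bound: with n names hashed uniformly into 2^bits values, the chance of any collision is at most n(n-1)/2 divided by 2^bits. The sketch below is an illustration only. It assumes an ideal uniform hash, and the figure of 1000 external names used in the note after it is an assumption, not a number from the original analysis.

```c
#include <assert.h>

/* Upper bound on the probability that at least two of n distinct names
 * collide when hashed uniformly into a space of 2^bits values. */
static double collision_bound(unsigned long n, unsigned bits)
{
    double space = 1.0;
    double pairs;
    unsigned i;

    assert(n >= 1);
    for (i = 0; i < bits; i++)
        space *= 2.0;                   /* builds 2^bits */

    pairs = (double)n * (double)(n - 1) / 2.0;   /* number of name pairs */
    return pairs / space;
}
```

For 1000 names this gives a bound of roughly 4.7e-4 for a 30-bit space, 7.3e-6 for 36 bits, and 2.7e-14 for 64 bits.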
bright@dataio.UUCP (Walter Bright) (01/09/85)
> The solution that was used, and worked, was to have the COMPILER use the
> external "name" to store a hashed value. During the recent net
> discussion I posted a description of this technique and some analysis of
> the chance and cost of collisions.
>
> This is done entirely in the compiler, and has no effect on the linker.
>
> To the net at large:
>
> 1. What are specific objections to the hashing technique?

a) Reading linker maps would be terrible.

b) All the other tools that depend on the global symbol table would be
   messed up. So, you say, rewrite the tools so they inverse hash the
   symbols. So then it would be easier to just fix the linker, and
   we're back where we started.

My solution to the external symbol dilemma is that it should be
implementation-defined, since the behavior is determined by the linker
and the compiler writer typically has no control over the linker. If
code is being ported to a machine with a smaller linker, the programmer
could 'hash' the overly long externals himself with macros.
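The macro approach mentioned above might look like the following when porting to a linker with short names. This is a sketch only: the 6-character limit and all the names are invented for illustration, and in practice the #define lines would live in a header included by every source file.

```c
/* Hypothetical porting header: map long, readable names onto short
 * external names chosen by hand. Preprocessor names are internal to
 * the compiler, so the long spellings never reach the linker. */
#define initialize_display_table  indstb
#define initialize_display_list   indsls

/* The source keeps using the long names; the object file only ever
 * contains the symbols indstb and indsls. Without the macros, these
 * two functions would be identical in their first 6 characters. */
int initialize_display_table(void)
{
    return 1;
}

int initialize_display_list(void)
{
    return 2;
}
```

The programmer chooses the short names, so collisions are avoided by inspection and the linker map stays readable, at the cost of maintaining the table by hand.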
wdr@faron.UUCP (William D. Ricker) (01/10/85)
>From: Paul Schauble <Schauble@MIT-MULTICS.ARPA>
>.
>:
>The solution that was used, and worked, was to have the COMPILER use the
>external "name" to store a hashed value. During the recent net
>discussion I posted a description of this technique and some analysis of
>the chance and cost of collisions.
>This is done entirely in the compiler, and has no effect on the linker.
>.
>:
>More recent discussion prompts me to post a small modification of the
>technique. Several people have pointed out the desirability of a
>language feature that would have the internal and external names of a
>global item be different, ...
>If the declaration contains an entry clause, use that as an external
>name.
>Otherwise, if the item name is short enough, use the item name.
>Otherwise, hash the item name and use the result as the external name.
-----------
One interpretative language I'm familiar with uses a similar hashing
scheme. (This ties in with the suggestion of 7chars & length, as was used
in PL/I.) The length, initial three characters, and a hash-code of 1-31
character identifiers were used as the internal name. In the special case
of length=4, the hash-code is the fourth character, also compressed. In
reality, the length and 4 characters are compressed into 4 bytes. This is
possible due to the limited character set for identifiers.

The interpreter unpacked the length, initials and hash when the structure
was displayed (in debugging or listing what routines were loaded). It even
altered the format to distinguish visually between
        4 frob
        7 fro b
to emphasize that "frob" hashes to itself, "4frob", but "frobble" hashes
to "7frob".

I'm not sure what the character set was, nor bit assignments. (I could
look it up at home if anyone cares.) It might have been 5 bits for the
length (1-31) and 6 bits for the compressed initials and hash--but it
wasn't the SIXBIT character set. For some reason, I think the number of
bits for the 3-char compression (perfect hash) was not divisible by three,
though, and the 4th char/hash was compressed separately. 17 bits would
represent 128k combinations, which would represent 3 characters from a
50-character font; 16 bits suffices for 3 alphamerics (40-character font:
[A-Z0-9@$#_]).
--
William Ricker
wdr@MITRE-Bedford.ARPA (MIL)
wdr@faron.UUCP (UUCP)
decvax!genrad!linus!faron!wdr (UUCP)
{allegra,ihnp4,utzoo,philabs,uw-beaver}!linus!faron!wdr (UUCP)

Opinions are my own and not necessarily anyone else's.
Likewise the "facts".
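The arithmetic behind "16 bits suffices for 3 alphamerics" is that 40^3 = 64000, which is below 2^16 = 65536 (and likewise 50^3 = 125000 is below 2^17 = 131072). A minimal sketch of such a packing follows, assuming the 40-character set given above in an arbitrary order. The actual bit assignments of the interpreter are not known, so this shows the idea and does not reconstruct the original.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The 40-character identifier alphabet; the ordering is an assumption. */
static const char ALPHABET[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789@$#_";

/* Packs exactly three characters into one value below 40^3 = 64000,
 * which fits in 16 bits. */
static unsigned pack3(const char *s)
{
    unsigned value = 0;
    int i;

    for (i = 0; i < 3; i++) {
        const char *p = (s[i] != '\0') ? strchr(ALPHABET, s[i]) : NULL;

        assert(p != NULL);      /* every character must be in the alphabet */
        value = value * 40 + (unsigned)(p - ALPHABET);
    }
    return value;
}

/* Reverses pack3; `out` must hold at least 4 bytes. */
static void unpack3(unsigned value, char *out)
{
    int i;

    assert(value < 64000u);
    for (i = 2; i >= 0; i--) {
        out[i] = ALPHABET[value % 40];
        value /= 40;
    }
    out[3] = '\0';
}
```

Since every distinct triple maps to a distinct value, this part of the scheme is lossless, which matches the "perfect hash" description; only the fourth character or hash code discards information.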
mike@enmasse.UUCP (Mike Schloss) (01/11/85)
> One other comment on the hashing technique: When I made the original
> posting I assumed the linker model I was most familiar with: one
> external definition and a series of references. For this model, having
> two C symbols that hash to the same external is not very much of a
> problem. The linker will see two different definitions of its symbol
> and should complain.

This will work fine for one object module and one or more libraries,
but what about multiple object modules? Like when you compile a kernel,
shell, or other large (multi source file) utility.

> The numbers given also assumed the shortest linker name space I was
> aware of, 30 bits. For anything larger (GCOS, 36 bits, MS-DOS, 64
> bits), the probability of collision is too small to compute on the
> equipment I have at hand.

The probability is too small... Does this mean it will never occur?
Would you like to find the bug that an unreported collision will cause
if (when) it does happen? Or, would this be the first place you would
look if you did have an elusive bug?

P.S. Assuming hashing is used... A possible solution to finding this
rare bug would be to recompile everything (libraries and all) using an
alternate hashing function.
mike@enmasse.UUCP (Mike Schloss) (01/11/85)
> >...personally, i would like to see variable names restricted to 3 chars
> >exactly :-)
>
> Or better yet, a single upper-case character, ('I' preferred) followed by
> exactly four digits! ::-)
> --
> :::::: Jan Steinman           Box 1000, MS 61-161    (w)503/685-2843 ::::::
> :::::: tektronix!tekecs!jans  Wilsonville, OR 97070  (h)503/657-7703 ::::::

Or how about a single upper-case character only for numbers and
a '$' followed by a single upper-case character only for characters?
jack@vu44.UUCP (Jack Jansen) (01/13/85)
>> >...personally, i would like to see variable names restricted to 3 chars
>> >exactly :-)
>> Or better yet, a single upper-case character, ('I' preferred) followed by
>> exactly four digits! ::-)
>Or how about a single upper-case character only for numbers and
>a '$' followed by a single upper-case character only for characters?

Yeah! This sounds great! And let's add *mandatory* line numbers, so it
will be much simpler to discuss programs!
And let's call this discussion "Beauty And Style Into 'C'!!".

\ \ :-) / /
--
        Jack Jansen, {seismo|philabs|decvax}!mcvax!vu44!jack
        or ...!vu44!htsa!jack
Help! How do I make a cup of tee while working with an acoustic modem?
Schauble@MIT-MULTICS.ARPA (Paul Schauble) (11/06/86)
A couple of months back, I was involved in a fairly active tirade about
the length of external names in the C standard. I believed then, and
still do, that the proposed standard's length of 8 characters is
inadequate. This minimum will become a maximum for anyone wanting to
write portable code.

Now, I don't want to reopen the argument here. I am very curious,
however, as to why that limit was established. The only reason I can
come up with is to accommodate limitations in somebody's linker. But
who? The last machine I am aware of that had a short name restriction in
the linker was Honeywell's GCOS line. They now have a new linker with a
500 character limit.

I have reason to suspect that there are no current machines and
operating systems with a very short limit. The reason being that the
COBOL standard requires 30 character names, and that forced most
manufacturers to update their linkers.

So, I am asking for information. Are there any current production
machines and operating systems with a linker that will not accept 30
character external names? By current production I mean one that is
actively supported by new software, such that one could reasonably
expect it to get an ANSI C compiler.

Please reply directly to me. I will post results in two weeks. If you
know of such a machine, please provide me my counterexample.

Thanks,
Paul
Schauble at MIT-Multics.ARPA