[comp.lang.c] six-character extern id limit

dhesi@bsu-cs.UUCP (Rahul Dhesi) (09/19/88)

I said that I thought Doug Gwyn exaggerated in saying that "many" C
implementors were not in a position to improve the linker that would
"of necessity" be used with the output from their compiler.  The
context was a discussion of ANSI's guaranteeing no more than 6
significant characters in external names.

Both Doug Gwyn and Henry Spencer disagree.  But although I have been
following this newsgroup for some time, I don't recall any specific
cases being described of linkers that can't handle more than
6-character externals and that will of necessity be used to link C
code.  Are there more than just a few?  (Remember, we're talking about
a 6-character limit, not 7 or 8, which are more common.)
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

ra@isncr.is.se (Robert Andersson) (09/19/88)

In article <4003@bsu-cs.UUCP>, dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
> 
> But although I have been
> following this newsgroup for some time, I don't recall any specific
> cases being described of linkers that can't handle more than
> 6-character externals and that will of necessity be used to link C
> code.  Are there more than just a few?  (Remember, we're talking about
> a 6-character limit, not 7 or 8, which are more common.)
> -- 

The linker on the Honeywell Bull DPS6 minicomputers is limited to 
6-character uppercase-only externals, and the C-compiler has to live
with this severe limitation. In order to make life a bit easier (though
sometimes it just makes it worse, see below), the C-compiler does the
following with externals. Flames to Honeywell Bull, not to me :-)
1. All characters are changed to uppercase.
2. All underscores are removed.
3. If more than six characters remain, vowels are eliminated from right
   to left until either (1) there are only six characters left, or
   (2) there are no more vowels.
4. If there are still more than six characters left, the excess is truncated
   right to left.

I remember I had a program with a function called strcomp(). By rule 3 
above it got reduced to strcmp(), and the linker didn't complain that
this symbol was defined both in my program and in the C-library. All
calls to strcmp() thus ended up calling my strcomp(), not exactly what
I wanted.
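
The four rules above can be sketched in C roughly as follows. This is only
an illustration reconstructed from the description, not the vendor's code;
the names mangle_external, same_external and EXT_LIMIT are hypothetical, and
the sketch assumes "vowels" means A, E, I, O and U.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define EXT_LIMIT 6   /* assumed linker limit on external name length */

/* Assumption: only A, E, I, O, U count as vowels. */
static int is_vowel(char c)
{
    return c != '\0' && strchr("AEIOU", c) != NULL;
}

/* Writes the mangled form of name into out.
 * out must have room for strlen(name) + 1 characters. */
static void mangle_external(const char *name, char *out)
{
    size_t len = 0;
    size_t i;

    assert(name != NULL && out != NULL);

    /* Rules 1 and 2: uppercase everything, drop underscores. */
    for (i = 0; name[i] != '\0'; i++) {
        if (name[i] != '_')
            out[len++] = (char)toupper((unsigned char)name[i]);
    }
    out[len] = '\0';

    /* Rule 3: delete vowels from right to left while too long. */
    for (i = len; i > 0 && len > EXT_LIMIT; i--) {
        if (is_vowel(out[i - 1])) {
            /* Shift the tail (including the terminator) left by one. */
            memmove(&out[i - 1], &out[i], len - i + 1);
            len--;
        }
    }

    /* Rule 4: cut off whatever still exceeds the limit. */
    if (len > EXT_LIMIT)
        out[EXT_LIMIT] = '\0';
}

/* Returns 1 if two source names end up as the same linker symbol.
 * The fixed buffers limit this helper to names under 128 characters. */
static int same_external(const char *a, const char *b)
{
    char ma[128], mb[128];

    assert(strlen(a) < sizeof ma && strlen(b) < sizeof mb);
    mangle_external(a, ma);
    mangle_external(b, mb);
    return strcmp(ma, mb) == 0;
}
```

With this sketch, mangle_external("strcomp", buf) leaves "STRCMP" in buf,
the same result as for "strcmp", so same_external("strcomp", "strcmp")
returns 1. That is the collision described above.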
-- 
Robert Andersson, International Systems, Oslo, Norway
Internet:         ra@isncr.is.se 
UUCP:             ...!{uunet,mcvax,enea}!isncr.is.se!ra
UUCP in Norway:   ...!ndosl!ifi!naggum!isncr!ra

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (09/20/88)

In article <4003@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
| I said that I thought Doug Gwyn exaggerated in saying that "many" C
| implementors were not in a position to improve the linker that would
| "of necessity" be used with the output from their compiler.  The
| context was a discussion of ANSI's guaranteeing no more than 6
| significant characters in external names.
| 
| Both Doug Gwyn and Henry Spencer disagree.  But although I have been
| following this newsgroup for some time, I don't recall any specific
| cases being described of linkers that can't handle more than
| 6-character externals and that will of necessity be used to link C
| code.  Are there more than just a few?  (Remember, we're talking about
| a 6-character limit, not 7 or 8, which are more common.)

  The key here is politics and technology. Almost all of the 36-bit
machines use six-char BCD names for at least some of their linkers. The
consensus was to make 31-char case-sensitive names a "future direction" to
avoid any opposition from vendors who would have to write new linkers
for their C. Single case is even more prevalent; the IBM hacks tell me
that most IBM systems still use it, even in EBCDIC.

  This is really a valid compromise. If a vendor is faced with either
spending a lot of money on an operating system or having competitors
saying stuff like "we have full ANSI C and {so-and-so} only has a
subset," we felt that vendors might either ignore C or actually oppose
it. To do something which might be seen as giving one vendor a marketing
advantage over another is not appropriate for the standard, since the
intent was to "codify current practice."

  Hopefully in C96 we will be able to start with a premise of "codifying
widely used extensions" or something. Being on a standards committee is
an educational experience, and I feel that I learned a lot from it.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

karl@haddock.ima.isc.com (Karl Heuer) (09/24/88)

In article <169@isncr.is.se> ra@isncr.is.se (Robert Andersson) writes:
>The linker on the Honeywell Bull DPS6 minicomputers is limited to
>6-character uppercase-only externals, and the C-compiler [removes underscores
>and vowels in order to compress to six letters, thus unintentionally mapping
>strcomp() to the same name as strcmp().]

Fortunately, such an algorithm would not be legal in an ANSI C compiler.  An
implementation is allowed to restrict the significance to six characters, but
"any identifiers that differ in a significant character are different
identifiers" [3.1.2].

I hadn't noticed this before, but the rules imply that underscores must be
significant in external identifiers.  This would be a potential problem if the
linker doesn't allow underscores.  Are there any linkers that require symbol
names to be alphanumeric only?
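
As a rough illustration of the rule quoted from 3.1.2, the check below
treats two names as necessarily distinct when they differ somewhere in the
first six characters. It is a sketch under the assumption that six
characters are significant and that case folding is not in play; the names
must_be_distinct and SIGNIFICANT are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SIGNIFICANT 6   /* assumed minimum significance for external names */

/* Returns 1 if a and b differ in a significant character, meaning a
 * conforming implementation would have to keep them apart. Returns 0 if
 * they agree in the first SIGNIFICANT characters, in which case the
 * implementation is allowed (not required) to treat them as one name. */
static int must_be_distinct(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strncmp(a, b, SIGNIFICANT) != 0;
}
```

Under this reading, must_be_distinct("strcomp", "strcmp") is 1, because the
fifth characters differ ('o' versus 'm'), and must_be_distinct("str_cmp",
"strcmp") is 1 as well, because the underscore falls in the fourth
position. A pair such as "strcompare" and "strcomp" gives 0, since the
first six characters agree.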

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
Followups to comp.std.c.