[net.lang.c] $ in identifiers -- poll

bobl@aeolus.UUCP (Bob Lewis) (12/03/84)

I'd like to bring up the matter of using '$' in C identifiers.  Here are
several points of view I've come across:

Ritchie's "The C Programming Language -- Reference Manual":
While he states that '_' counts as a letter, he says nothing about '$'.

4.2bsd "cc":
'$' counts as a letter.

VAX/VMS "CC":
'$' counts as a letter, but the documentation warns:

	The dollar sign should be used only in identifiers for VAX/VMS
	global symbols.  Identifiers that contain dollar signs may not
	be portable.

What does the standard say about this?  '$' is not used anywhere else in C.
I think its use as a letter should be officially permitted.  (It makes a
nice "package" identifier, cf. VMS.)

I'm conducting an informal poll on what other C compilers do with '$'.
If you're interested, please send me your findings and I will summarize.
(Deadline: 12/10/84).

	- Bob L.
	  ...!tektronix!teklds!bobl

jss@sftri.UUCP (J.S.Schwarz) (12/05/84)

-- 

> I'd like to bring up the matter of using '$' in C identifiers.  Here are
> several points of view I've come across:
> 
> What does the standard say about this?  '$' is not used anywhere else in C.
> I think its use as a letter should be officially permitted.  (It makes a
> nice "package" identifier, cf. VMS.)
 
The latest (Oct 31) ANSI draft that I have does not include '$' as
a character in identifiers.  

It should be pointed out that many preprocessors (such as YACC) use
'$' in some way to distinguish "special" identifiers.  It would
cause confusion if '$' was now made legal in ordinary C identifiers.

Looking at my keyboard, the characters not currently used by C (outside
of comments and strings) are '@', '$', and backquote(`).  If you really 
need a new nonalphabetic character in identifiers I would suggest backquote.
Identifiers like xyz`p look nice to me. (But personally, I see no
need for such a modification to the language definition.)

	Jerry Schwarz
	ihnp4!btlunix!jss

henry@utzoo.UUCP (Henry Spencer) (12/06/84)

'$' is, alas, commonly used as an escape from C, for example in yacc.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

steveg@hammer.UUCP (Steve Glaser) (12/07/84)

In article <260@sftri.UUCP> jss@sftri.UUCP (J.S.Schwarz) writes:
>-- 
>
>> I'd like to bring up the matter of using '$' in C identifiers.  Here are
>> several points of view I've come across:
>> 
>> What does the standard say about this?  '$' is not used anywhere else in C.
>> I think its use as a letter should be officially permitted.  (It makes a
>> nice "package" identifier, cf. VMS.)
> 
>The latest (Oct 31) ANSI draft that I have does not include '$' as
>a character in identifiers.  
>
>It should be pointed out that many preprocessors (such as YACC) use
>'$' in some way to distinguish "special" identifiers.  It would
>cause confusion if '$' was now made legal in ordinary C identifiers.
>
>Looking at my keyboard, the characters not currently used by C (outside
>of comments and strings) are '@', '$', and backquote(`).  If you really 
>need a new nonalphabetic character in identifiers I would suggest backquote.
>Identifiers like xyz`p look nice to me. (But personally, I see no
>need for such a modification to the language definition.)
>
>	Jerry Schwarz
>	ihnp4!btlunix!jss

Backquote is already used in the GCOS version of C.  Try this on your
favorite C compiler (4.2 in this case, but it's in every version of PCC
I've checked).

	% cat test2.c
	main()
	{
		char c[] = `hi there`;
	}
	% cc test2.c
	"test2.c", line 3: no automatic aggregate initialization
	"test2.c", line 3: BCD constant exceeds 6 characters
	"test2.c", line 3: gcos BCD constant illegal
	"test2.c", line 3: illegal left operand of assignment operator
	%

Seriously though, I can't see adding any more letters to the C language
identifier set just "'cause it'd be nice and besides VMS does it".  If
we have to change that area, let it be in the area of somehow allowing
the various national variants of the ISO character code (of which US
ASCII is just one).  That's the only character set issue that I can see
that's important enough to warrant changing the language for.
Unfortunately, with the overloading of characters ('|' is a letter in
some countries), I don't see an easy solution emerging.

	Steve Glaser
	steveg.tektronix@csnet-relay
	tektronix!steveg

smh@mit-eddie.UUCP (Steven M. Haflich) (12/09/84)

Another very ugly but very practical reason for not allowing additional
alphameric characters in identifiers is portability.  Regardless of what
the C Standard eventually says, not all machine/OS combinations support
all C-ASCII characters in identifiers (especially externals), and some
support non-C-ASCII characters.  There is little a standards committee
or net.lang.c can do about this [except, I suppose, flame :-)].

My intuition is that languages and OS/linkers most commonly allow
exactly *one* legal nonalphameric character in identifiers, and this
character is most often overloaded as an informal package flag:  `_' as
an external prefix char in Unix/C, `$' in lots of Big Blue systems, etc.
When porting code either direction, the simple one-to-one mapping of
these characters saves a lot of grief.  Let's not make it tougher to
*ex*port Unix/C to other systems by trying to make it very occasionally
easier to *im*port foreign code.

It is my opinion, by the way, that the traditional availability of these
informal package-flag chars inside identifiers was a portability botch,
mostly impeding exporting code, not importing.  But it is only recently
that vendors and their captive language designers have come to realize
that exportability of code can sell machines just as well as
importability.

henry@utzoo.UUCP (Henry Spencer) (12/11/84)

For what it's worth, the current ANSI C draft (12 Nov 1984) says that
dollar signs aren't in C's vocabulary at all (except for the usual
exemption for comments and strings), but adds the following in the
"Common extensions" discussion in appendix E:

	E.4.4.1 Specialized identifiers
	
	Characters other than the underscore _, letters, and digits, that
	are not defined in the required source character set (such as the
	dollar sign $) may appear in an identifier.

Note that this is identified as a "common extension", not as part of the
standard proper, and "[is] not portable to all implementations".
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

85488116@sdcc3.UUCP (Oliver Boliver Butt) (12/12/84)

> For what it's worth, the current ANSI C draft (12 Nov 1984) says that
> dollar signs aren't in C's vocabulary at all (except for the usual
> exemption for comments and strings), but adds the following in the
> "Common extensions" discussion in appendix E:
> 
> 	E.4.4.1 Specialized identifiers
> 	
> 	Characters other than the underscore _, letters, and digits, that
> 	are not defined in the required source character set (such as the
> 	dollar sign $) may appear in an identifier.

Yacc* uses $$ and $n to allow manipulation of its parsing stack.  In this
case, $$ and $n are used as pseudo-identifiers for the stack in the C code
actions which get executed when a rule is reduced.  Therefore, there is an
ambiguity between yacc and the proposed $ identifiers when you try to
compile the code yacc generates.  So you either have to change yacc, or
forget about $ in identifiers.

Paul van de Graaf	U. C. San Diego		sdcsvax!sdcc3!85488116

* If you don't know, yacc stands for "yet another compiler compiler".  It is
  a tool which generates a parser given an LALR(1) grammar and some
  supporting code that the programmer writes.

robert@gitpyr.UUCP (Robert Viduya) (12/14/84)

><

Perhaps what C really needs is a way to define a separate external name for
an identifier when declaring that identifier.  For example, the following:

    extern sys_read "SYS$READ" ();

would tell the compiler/linker to call the entrypoint  "SYS$READ"  whenever
"sys_read"  was called in the source.  "sys_read" would be the source level
name of the procedure and "SYS$READ" would be the object level name of  the
procedure.  This should also be applicable to ints and other data
objects.  The definition of the object level name should also be
optional.

Of course, the syntax is only a suggestion.

                        robert

-- 
Robert Viduya
Office of Computing Services
Georgia Institute of Technology, Atlanta GA 30332
Phone:  (404) 894-4669

...!{akgua,allegra,amd,hplabs,ihnp4,masscomp,ut-ngp}!gatech!gitpyr!robert
...!{rlgvax,sb1,uf-cgrl,unmvax,ut-sally}!gatech!gitpyr!robert

geoff@desint.UUCP (Geoff Kuenning) (12/18/84)

In article <422@gitpyr.UUCP> robert@gitpyr.UUCP (Robert Viduya) writes:

>    extern sys_read "SYS$READ" ();
>
>would tell the compiler/linker to call the entrypoint  "SYS$READ"  whenever
>"sys_read"  was called in the source.

Bravo for this idea!  The syntax, however, conflicts with the "old-style
initializer" syntax.  Anybody got ideas for a parseable syntax?
-- 

	Geoff Kuenning
	...!ihnp4!trwrb!desint!geoff

garys@bunker.UUCP (12/19/84)

> In article <422@gitpyr.UUCP> robert@gitpyr.UUCP (Robert Viduya) writes:
> 
> >    extern sys_read "SYS$READ" ();
> >
> >would tell the compiler/linker to call the entrypoint  "SYS$READ"  whenever
> >"sys_read"  was called in the source.
> 
> Bravo for this idea!  The syntax, however, conflicts with the "old-style
> initializer" syntax.  Anybody got ideas for a parseable syntax?
> -- 
> 
> 	Geoff Kuenning

OK, how about:

	extern sys_read() = "SYS$READ";

Gary Samuelson

joe@fluke.UUCP (Joe Kelsey) (12/20/84)

>From: henry@utzoo.UUCP (Henry Spencer)
>'$' is, alas, commonly used as an escape from C, for example in yacc.

I ported yacc to VMS, where identifiers can contain $, and encountered
absolutely no problems!  yacc only uses identifiers of the form $n,
where n is some small number, so as long as you stay away from
identifiers like that, there is no problem.  I see no real reason to
lose sleep over the incompatibility with yacc, as there is no problem
in practice.

/Joe

tim@cmu-cs-k.ARPA (Tim Maroney) (12/20/84)

Re the "new" idea for resolving lexical mismatches between an OS and C by
introducing some construct such as

OSid text_limit "text$space$top";

(making the compiler put out the identifier "text$space$top" [illegal in C]
in place of all occurrences of "text_limit"), I am very much in favor of this
idea.  I would like to point out, though, that I suggested this last Spring
on this very newsgroup and was met by thundering silence.  It is not a "new"
idea at all.
-=-
Tim Maroney, Carnegie-Mellon University Computation Center
ARPA:	Tim.Maroney@CMU-CS-K	uucp:	seismo!cmu-cs-k!tim
CompuServe:	74176,1360	audio:	shout "Hey, Tim!"

"Remember all ye that existence is pure joy; that all the sorrows are
but as shadows; they pass & are done; but there is that which remains."
Liber AL, II:9.

sde@Mitre-Bedford (12/21/84)

Actually, HP Pascal has had that feature, with slightly different syntax,
for years, and for exactly that reason.

mab@druxp.UUCP (BlandMA) (12/22/84)

There have been several suggestions to resolve the $ identifier problem
by adding new constructs to the language.  Perhaps a better (at least
different) place would be to have the linker/loader (whatever you call it)
do the name translation.  I seem to recall an IBM link editor capable of
something like this.

For example, to make a call to the sys$whatever function, you would
write your code using a syntactically valid C identifier, such as
sys_whatever.  The load command would include some option to resolve
the symbol "sys_whatever" to "sys$whatever".  There would probably be
a file on the system somewhere that contains the commonly used translations
for that particular system.

I would prefer this solution over changing every compiler for
every language that currently doesn't allow $ in identifiers.
-- 
Alan Bland
{ihnp4, allegra}!druxp!mab
AT&T Information Systems Labs, Denver

jack@vu44.UUCP (Jack Jansen) (12/26/84)

If the idea to make a construct like
extern sys_read "SYS$READ" ();
cannot be implemented like this because of the conflict with
old-style initialization, why not use the "entry" keyword?
It is still a reserved word (well, according to my K&R, at least),
and I've never heard of an implementation using it.
Besides that,
extern sys_read entry "SYS$READ" ();
looks even more intelligible (to me).
-- 
	Jack Jansen, {seismo|philabs|decvax}!mcvax!vu44!jack
	or				       ...!vu44!htsa!jack

david@ukma.UUCP (David Herron, NPR Lover) (12/28/84)

I've got some bad news.  Using "entry" to mark external
identifiers that have bad names seems (on the surface) to
be a good idea.  However, entry is not in the standard as
a reserved word (this was as of Oct. 17).  So it would
have to be added.

I have never heard of a C compiler that uses it.  Are there
any?

David Herron.

kpmartin@watmath.UUCP (Kevin Martin) (12/28/84)

>How about:
>
>	extern sys_read() = "SYS$READ";
>
>Gary Samuelson


That only works for functions... If the identifier is an object, this looks
like an initializer.
                             Kevin Martin, UofW Software Development Group.

kpmartin@watmath.UUCP (Kevin Martin) (12/28/84)

>For example, to make a call to the sys$whatever function, you would
>write your code using a syntactically valid C identifier, such as
>sys_whatever.  The load command would include some option to resolve
>the symbol "sys_whatever" to "sys$whatever".  There would probably be
>a file on the system somewhere that contains the commonly used translations
>for that particular system.
This still leaves the problem of getting at the UNcommonly-used funny-named
variables. Perhaps the user has to supply another translation file for
these...

>I would prefer this solution over changing every compiler for
>every language that currently doesn't allow $ in identifiers.
>Alan Bland
>{ihnp4, allegra}!druxp!mab
>AT&T Information Systems Labs, Denver


I would prefer being able to tell what is happening just from reading the C
code. Having to search in (several) places to find which names map to what
would be an ongoing cost, compared to the fixed cost of having the compiler
translate the names. Having many symbols automatically mapped could easily
cause external name clashes too...

A C compiler which allows re-naming like this could also be used to port
long-name programs to systems with 6-character linkers with minimal effort.

It should be noted that '$' is not the only offending character. '.' is also
popular, and there is *no way* of including it in C identifiers.
                         Kevin Martin, UofW Software Development Group.

henry@utzoo.UUCP (Henry Spencer) (12/30/84)

> ...  However entry is not in the standard as 
> a reserved word (this was as of Oct. 17).  So it would
> have to be added.
> 
> I have never heard of a C compiler that uses it.  Are there
> any?

Actually, "put back" is more accurate than "added".  It has been a
"reserved for future use" keyword in C for a long time, but the ANSI
committee decided (in my opinion, correctly) that it did not seem to
have a future and thus could be deleted.  As far as I know, nobody
has ever done anything with it.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

bsa@ncoast.UUCP (Brandon Allbery (the tame hacker on the North Coast)) (01/02/85)

How about this construct:

	extern sys_read() : "SYS$READ";

About the only possible clash is with bitfields, and this wouldn't be
useful where bitfield declarations are legal (i.e. inside structures).
It also wouldn't eat up the `entry' keyword.

--bsa
-- 
  Brandon Allbery @ decvax!cwruecmp!ncoast!bsa (..ncoast!tdi1!bsa business)
	6504 Chestnut Road, Independence, Ohio 44131   (216) 524-1416
    Who said you had to be (a) a poor programmer or (b) a security hazard
			     to be a hacker?

jack@vu44.UUCP (Jack Jansen) (01/03/85)

> How about this construct:
> 
> 	extern sys_read() : "SYS$READ";
> 
I think that it should be closer to the identifier. It is the
identifier that is modified, after all.
If you don't want to use 'entry', then use ':', but keep it next to
the identifier, something like
	extern sys_read:"SYS$READ"();

This is also much more readable in the case of *defining*
funny names, which hasn't been discussed yet, but which can
be just as useful, for instance when you're writing a
library and you want to hide your internal routines.

By the way, I'm still in favor of using 'entry' instead of
Yet Another Funny Char. This usage won't even make the entry
symbol unusable, since the use of entry for defining entrypoints
is presumably somewhere inside the *code*, not the declarations.
-- 
	Jack Jansen, {seismo|philabs|decvax}!mcvax!vu44!jack
	or				       ...!vu44!htsa!jack
If *this* is my opinion, I wasn't sober at the time.

elbaum@reed.UUCP (Daniel Elbaum) (01/06/85)

.
	Actually, OASIS systems use this keyword in their C compilers.
It's kind of useful, since you can use it to enter values at compile
time, thereby aiding debugging.

				-Daniel Elbaum

{decvax, ucbvax, pur-ee, uw-beaver, masscomp, cbosg,
 mit-ems, psu-cs, uoregon, orstcs, ihnp4, uf-cgrl}!tektronix
						  teneron----\
						  ogcvax------+-!reed!elbaum
						  muddcs-----/
						  cadic-----/
						  oresoft--/
						  grpwre--+