[net.unix-wizards] Prepending _ in C external names necessary?

davel@hpda.UUCP (Dave Lennert) (11/09/85)

We're considering changing our C compiler to *not* prepend an underscore
at the beginning of all external names (functions, variables).  Will this
break things?  Are there reasons (technical/religious) that we should not
do this?  Please respond via MAIL; our news connection is frequently
flaky.  I will summarize to the net.

I recall seeing a discussion of this topic before.  If someone has an
archive of it, please *mail* it to me.

Thanks!

    Dave Lennert                {ucbvax, hplabs}!hpda!davel     [UUCP]
    Hewlett-Packard - 47UX      ihnp4!hpfcla!hpda!davel         [UUCP]
    19447 Pruneridge Ave.       hpda!davel@ucb-vax.ARPA         [ARPA]
    Cupertino, CA  95014        (408) 447-6325                  [AT&T]

thomas@kuling.UUCP (Thomas Hämeenaho) (11/21/85)

In article <1232@hpda.UUCP> davel@hpda.UUCP (Dave Lennert) writes:
>We're considering changing our C compiler to *not* prepend an underscore
>at the beginning of all external names (functions, variables).  Will this
>break things?  Are there reasons (technical/religious) that we should not
>do this?

I say don't do it!

With one particular C compiler I've seen, you can write perfectly good
C code that doesn't work when compiled!

Consider the following small example from a 68K machine running V7:

main(){
   extern long a7;	/* perfectly legal declaration */
   a7 = 0x1234;
}

As the variable a7 is declared as external the compiler doesn't allocate
any space for it but assumes it will be defined at link time.
If I compile this into assembler the resulting code looks something
like this:

main:
	link	a6,#0
	move.l	#0x1234,a7
	unlk	a6
	rts

Since a7 is the stack pointer on a 68K, the result will be unpredictable
at best in any real program!  Needless to say, the behaviour is the same
for any legal register name.
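Here is a minimal C sketch of the clash described above. It assumes a hypothetical table of names that a 68K assembler predefines: registers d0-d7 and a0-a7, plus sp and pc. `mangle()` and `clashes()` are illustrative helpers, not part of any real compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical set of names the assembler predefines on a 68K. */
static const char *const asm_reserved[] = {
    "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7",
    "a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7",
    "sp", "pc", NULL
};

/* Build the assembler symbol for a C external name, optionally
 * prepending '_' as traditional UNIX C compilers do. */
static void mangle(const char *cname, int prepend, char *out, size_t outsz)
{
    assert(cname != NULL && outsz > 0);
    snprintf(out, outsz, "%s%s", prepend ? "_" : "", cname);
}

/* Return 1 if sym is one of the assembler's predefined names. */
static int clashes(const char *sym)
{
    for (size_t i = 0; asm_reserved[i] != NULL; i++)
        if (strcmp(sym, asm_reserved[i]) == 0)
            return 1;
    return 0;
}
```

With prepending turned on, the C name a7 becomes the symbol _a7. That symbol cannot match any of the predefined names, since none of them starts with '_'.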

-- 
Thomas Hameenaho, Dept. of Computer Science, Uppsala University, Sweden
Phone: +46 18 138650
UUCP: thomas@kuling.UUCP (...!{seismo,mcvax}!enea!kuling!thomas)

guy@sun.uucp (Guy Harris) (11/28/85)

> >We're considering changing our C compiler to *not* prepend an underscore
> >at the beginning of all external names (functions, variables).
> 
> I say don't do it!
>
> (Example where a variable is given the same name as the name used by a
> register in the assembler, and all h*ll breaks loose when the code is
> assembled, linked, and run)

This was (according to DMR) the reason this was done in the first place.  I
believe the 3B assemblers do not predefine names like "r0" for the
registers (I think you say something like "%0"), so they can get away with
it.  Just make sure that, if you don't prepend an underscore, your assembler
has a symbol table completely devoid of built-in symbols when it's started
up (i.e., don't put opcodes, registers, etc. into the symbol table).  If you
do this, the only thing you're likely to break is programs that use "nlist"
to get symbol names from the kernel or something like that - S5 programs
which use "nlist" have "#if u3b"-type stuff around the declaration of the
namelist.
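The "#if u3b" style workaround can be sketched roughly as follows. NO_UNDERSCORE is a hypothetical build flag standing in for whatever machine test a real program uses. "proc" and "nproc" are only example kernel symbols; a real program would pass these strings to nlist().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag: define NO_UNDERSCORE for a compiler that does not
 * prepend '_' to external names. */
#ifdef NO_UNDERSCORE
#define CSYM(name) name
#else
#define CSYM(name) "_" name   /* adjacent string literals concatenate */
#endif

/* Example kernel symbols a program might look up with nlist(). */
static const char *const wanted[] = { CSYM("proc"), CSYM("nproc") };

/* Return the i-th symbol name in the form the object file uses. */
static const char *wanted_name(size_t i)
{
    assert(i < sizeof wanted / sizeof wanted[0]);
    return wanted[i];
}
```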

	Guy Harris

speck@cit-vlsi.arpa (Don Speck) (12/02/85)

>   We're considering changing our C compiler to *not* prepend an underscore
>   at the beginning of all external names (functions, variables).

    The purpose of the underscore is merely to make it impossible for
external names to collide with the assembler's predefined identifiers.
This purpose is served just as well by appending, rather than prepending,
the underscore.  The former approach gives you a free extra character
of significance in identifiers.
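The sketch below shows the extra character, assuming a hypothetical 8-character significance limit. Two names that differ only in their eighth character collide when the underscore is prepended, but stay distinct when it is appended.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limit on significant characters in an assembler symbol. */
#define SIGNIFICANT 8

/* Mangle cname by prepending (append == 0) or appending (append != 0)
 * an underscore, keeping only the first SIGNIFICANT characters as a
 * fixed-width symbol table would.  out must hold SIGNIFICANT + 1. */
static void significant_symbol(const char *cname, int append,
                               char out[SIGNIFICANT + 1])
{
    assert(cname != NULL);
    /* snprintf stores at most SIGNIFICANT characters plus the NUL. */
    snprintf(out, SIGNIFICANT + 1, append ? "%s_" : "_%s", cname);
}
```

With prepending, counter1 and counter2 both come out as _counter. With appending, they stay counter1 and counter2.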

	Don Speck	speck@cit-vlsi.arpa

andy@altos86.UUCP (Andy Hatcher) (12/03/85)

I've finally decided to put in my two cents.
It is not necessary for C variables to be prepended with _.

In fact, the C compiler used on our 68020 box does not prepend _,
and it works fine.  I am not a compiler wiz, but I do know that
the assembler requires register names to be prepended with %
and immediate data with &.

This has caused very few (and very minor) problems,
and we are pretty happy with the compiler.

The compiler is a system V coff format compiler.
Is this treatment of _ generic to System V?
Getting one extra character of significance in identifiers
was not the reason it was done, (the compiler already
supports long names).

	Ramblingly,
		Andy Hatcher
		Altos Computer Systems

spw2562@ritcv.UUCP (Fishhook) (12/04/85)

In article <200@brl-tgr.ARPA> speck@cit-vlsi.arpa (Don Speck) writes:
>This purpose is served just as well by appending, rather than prepending,
>the underscore.  The former approach gives you a free extra character
>of significance in identifiers.
>
>	Don Speck	speck@cit-vlsi.arpa

Some assemblers (like ours) recognize only a limited number of characters (6?).
If identifier names in the source already exceed this length, appending
an underscore won't make any difference.  For example:

	  Function name:         With underscore:     Resulting symbol:
prepend:  dosomegoodstuff()      _dosomegoodstuff     _dosom
append:   dosomeotherstuff()     dosomeotherstuff_    dosome 

Now suppose there's a symbol dosome defined by the assembler.  By prepending
you avoid a conflict; with appending, you don't.
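This sketch reproduces the table above. It assumes a 6-character limit and a hypothetical list of assembler-defined names that includes dosome. The helper names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names already defined at the assembler level;
 * "dosome" is the example from the post above. */
static const char *const predefined[] = { "dosome", "r0", "sp", NULL };

/* Mangle cname by prepending (append == 0) or appending (append != 0)
 * an underscore, then keep only the first `limit` characters. */
static void limited_symbol(const char *cname, int append, size_t limit,
                           char *out, size_t outsz)
{
    assert(cname != NULL && limit < outsz);
    /* snprintf stores at most limit characters plus the NUL. */
    snprintf(out, limit + 1, append ? "%s_" : "_%s", cname);
}

/* Return 1 if sym equals a name the assembler already defines. */
static int is_predefined(const char *sym)
{
    for (size_t i = 0; predefined[i] != NULL; i++)
        if (strcmp(sym, predefined[i]) == 0)
            return 1;
    return 0;
}
```

The prepended form _dosom cannot collide with dosome. The appended form, however, truncates to exactly dosome.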

==============================================================================
        Steve Wall @ Rochester Institute of Technology
        USnail: 6675 Crosby Rd, Lockport, NY 14094, USA
        Usenet: ..!rochester!ritcv!spw2562 (Fishhook)   Unix 4.2 BSD
        BITNET: SPW2562@RITVAXC (Snoopy)                VAX/VMS 4.2
        Voice:  Yell "Hey Steve!"

    Disclaimer:  What I just said may or may not have anything to do
                 with what I was actually thinking...

spw2562@ritcv.UUCP (Fishhook) (12/04/85)

I don't know if this is true for all systems, but one system I worked
on passed parameters to system calls in specific registers.  Parameters
to C functions are passed on the stack.  Therefore, for the compiler to
call both user functions and system calls, it would have to know what all
the system calls are.
UNLESS...
The system I worked on treated all system calls as function calls and
had function stubs which moved the stack parameters to the proper registers
then performed the actual system call.  Naturally, these stubs are written
in assembly.  For this to be possible, the system call has to be different
from the function call.  Answer?  Prepend every C function call with an
underline, prepend all the function stubs with an underline, DON'T prepend
system calls.  Therefore,

    ...
    write(fd,&data,bytes) in C becomes
    ...

      ...
      pushl fd,-@sp           or whatever, push the args to stack
      ...
      calls _write            and call the function
      ...

    And the library _write function stub is

      ...
    _write:                   address of function
      movl +@sp,d1            or whatever, move function args from
      ...                     stack to proper registers
      calls write             do the actual system call - no '_'
      ...                     and do the stuff to return the value it gets
                              from the system call

Without underscore prepending this would not be possible.
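Here is a rough model of that naming split, written in C rather than assembly. The two functions only stand in for the real stub and the real system-call entry. The symbol table is a hypothetical stand-in for what the linker sees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef long (*entry_fn)(int fd, const void *buf, long n);

/* Stand-in for the raw system-call entry, which on the system described
 * takes its arguments in registers; here it just returns n. */
static long raw_write(int fd, const void *buf, long n)
{
    (void)fd;
    (void)buf;
    return n;
}

/* Stand-in for the assembly stub that moves stacked arguments into
 * registers before trapping; here it simply forwards the call. */
static long stub_write(int fd, const void *buf, long n)
{
    return raw_write(fd, buf, n);
}

/* Hypothetical symbol table: the bare name is the system call,
 * the '_'-prefixed name is the stub that C code reaches. */
static const struct { const char *sym; entry_fn fn; } symtab[] = {
    { "write",  raw_write  },
    { "_write", stub_write },
};

/* Resolve a call written in C: the compiler prepends '_' first. */
static entry_fn resolve_c_call(const char *cname)
{
    char sym[64];
    assert(cname != NULL);
    snprintf(sym, sizeof sym, "_%s", cname);
    for (size_t i = 0; i < sizeof symtab / sizeof symtab[0]; i++)
        if (strcmp(symtab[i].sym, sym) == 0)
            return symtab[i].fn;
    return NULL;
}
```

Because the compiler always adds '_', no C call can ever resolve to the bare system-call name.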

thomas@kuling.UUCP (Thomas Hämeenaho) (12/05/85)

In article <3040@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>> >We're considering changing our C compiler to *not* prepend an underscore
>> >at the beginning of all external names (functions, variables).
>> 
>> I say don't do it!
>>
>This was (according to DMR) the reason this was done in the first place.  I
>believe the 3Bs assemblers do not pre-define names like "r0" for the
>registers (I think you say something like "%0"), so they can get away with
>it.

I don't like this method of separating register names from identifiers.

In the SysV port for the 68K from Motorola they do just that. The net result
is that while you might save a few _'s, you lose much of the readability of the
assembler code. The silly %'s make the code almost unreadable.
For C programs it doesn't really matter but when you're forced to do something
in assembler it's a pain.

I can't see why anyone would rather use the verrry frequent %'s
in favor of the rare _'s.

ken@rochester.UUCP (Ipse dixit) (12/05/85)

In article <9109@ritcv.UUCP> spw2562@ritcv.UUCP (Fishhook) writes:
>now suppose there's a symbol dosome defined by the assembler.  By prepending
>you avoid a conflict.  with appending, you don't.

Ah, but all the assembler symbols like r6 are short. If the C symbol is
long enough to exceed the significance limit it certainly won't clash
either. Unless you define some long symbols in assembler, that is.
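Ken's argument can be made precise. After '_' is appended and the result is truncated to the limit, a symbol either ends in '_' or is exactly as long as the limit. The sketch below checks that condition, assuming the predefined names are given as a NULL-terminated list; `append_is_safe()` is an illustrative helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An appended-and-truncated symbol can only equal a predefined name
 * that is at least `limit` long or that ends in '_'.  Return 1 when
 * no name in the NULL-terminated list has either shape. */
static int append_is_safe(const char *const *predefined, size_t limit)
{
    assert(predefined != NULL && limit > 0);
    for (size_t i = 0; predefined[i] != NULL; i++) {
        size_t n = strlen(predefined[i]);
        if (n >= limit || (n > 0 && predefined[i][n - 1] == '_'))
            return 0;
    }
    return 1;
}
```

Short register names like r0 through r6 pass the check. A 6-character name such as dosome under a 6-character limit is exactly the "long symbol in assembler" exception.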
-- 
UUCP: ..!{allegra,decvax,seismo}!rochester!ken ARPA: ken@rochester.arpa
USnail:	Dept. of Comp. Sci., U. of Rochester, NY 14627. Voice: Ken!

radzy@calma.UUCP (Tim Radzykewycz) (12/05/85)

In article <9113@ritcv.UUCP> spw2562@ritcv.UUCP (Fishhook) writes:
>I don't know if this is tru for all systems, but one system I worked
>on passed parameters to system calls using specific registers.  Paremeters
>to C functions are passed on the stack.  Therefore, for the compiler to
>call user functions and system calls, it has to know what all the system
>calls are.

Many flavors of UNIX, and at least some other OS's I know of use
special code to make system calls.  Generally, this is something
like "trap", "emt", "sintr" or the equivalent.  These are treated
similarly to interrupts, in that they cause the state of the processor
to change from user to system, change from user stack to system
stack, and several other things.  With this kind of architecture,
the system calls really do have parameters passed on the stack,
and the OS grabs the info off of the user stack whenever there is
a system call.  This eliminates the need for having a separate pair
of subroutines, one of which massages the locations of parameters
for the other.

>The system I worked on treated all system calls as function calls and
>had function stubs which moved the stack parameters to the proper registers
>then performed the actual system call.  Naturally, these stubs are written
>in assembly.  For this to be possible, the system call has to be different
>from the function call.  Answer?  Prepend every C function call with an
>underline, prepend all the function stubs with an underline, DON'T prepend
>system calls.  Therefore,

>[examples given]

>Without underscore prepending this would not be possible.

The only system I'm familiar with which has this kind of setup is XINU,
although I'm certain there are others (perhaps OS-9?).  The real point
of this followup, though, is that it is still possible to get the
functionality that Mr. Wall specified, simply by reversing the scheme:
thus, the massaging routine would be named "write", and the underlying
system call would be named "_write" (or something similar).  This
does have the disadvantage of making it possible for someone to write
a subroutine whose name conflicts with a system call, but it might
be possible to set the name to something which is illegal in C, such
as "$write", instead.  Of course, that's assuming that the '$' character
*is* legal for the assembler.
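The "$write" idea relies on '$' never appearing in a C identifier. A sketch of the strict ISO C rule follows, using the "C" locale character classes. Note the assumption: some compilers (GCC among them) accept '$' in identifiers as an extension, which would reopen the conflict.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if s is a valid identifier under the strict ISO C rules:
 * letters, digits and '_', not starting with a digit. */
static int is_strict_c_identifier(const char *s)
{
    assert(s != NULL);
    if (*s == '\0' || isdigit((unsigned char)*s))
        return 0;
    for (; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s) && *s != '_')
            return 0;
    return 1;
}
```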

No, underscore prepending isn't the only answer, but it seems to be
the most common.  I don't see why it's such a big deal.
-- 
Tim (radzy) Radzykewycz, The Incredible Radical Cabbage
	calma!radzy@ucbvax.ARPA
	{ucbvax,sun,csd-gould}!calma!radzy

spw2562@ritcv.UUCP (12/10/85)

In article <13697@rochester.UUCP> ken@rochester.UUCP (Ipse dixit) writes:
>Ah, but all the assembler symbols like r6 are short. If ...
>-- 
>UUCP: ..!{allegra,decvax,seismo}!rochester!ken ARPA: ken@rochester.arpa
You're considering strictly assembler symbols.  I was also referring to
global names from libraries, et al., that are currently defined by (in?)
the assembler.  For example, say assembler ZZZ has 4 characters of significance.
What will differentiate a call to 'exit' from a call to 'exit_'?
'exit' and '_exit' are easy to differentiate.  I think that pre-pending
is much less likely to fail than appending.
But don't take that as a guarantee.. 8-)