[comp.lang.c] compilers and linkers

pal@calmasd.GE.COM (Peter Lawrence) (11/13/87)

   It appears that in Unix derived C compilers global names get an
underscore prepended to them before they end up as symbols in the object file.
It also appears that in Unix derived Fortran compilers global names get an
underscore appended to them. I am sure there is going to be a bad reason for
this but I have to know: why the underscores, and why in different places?
Most other compilers don't do this kind of thing, but how in Unix environments
does one link C and/or Fortran with Pascal, for example?
-- 
pal@calmasd.GE.COM   or  ...!sdcsvax!calmasd!pal

limes%ouroborous@Sun.COM (Greg Limes) (11/14/87)

In article <2522@calmasd.GE.COM> pal@calmasd.GE.COM (Peter Lawrence) writes:
>... C global names get an underscore prepended to them ...
>... FORTRAN global names get an underscore appended to them.
>how does one link C and or FORTRAN with Pascal?

Peter, if C and FORTRAN globals did not get slightly different names,
one might be tempted to assume that they used the same calling and
storage conventions. In at least one FORTRAN compiler that I know of,
parameters are passed by address to subroutines. Thus, if you call a
FORTRAN subroutine from a C program, you have to pass pointers.
Likewise, if your C routine is called by a FORTRAN routine, what you
get is pointers to the FORTRAN parameters. Note that, in this case,
the interface conversion burden is entirely on the C language.

Thus, a C function callable by FORTRAN would be declared

	int
	foo_ (barp, bazp)
		int	*barp;
		int	*bazp;
	{
	}

and would be called from fortran as:

	CALL _FOO (IBAR, IBAZ)

Note the careful modification of the name "foo" so that each
can talk to the other; the assembler thinks they are both
using the global symbol "_foo_".

** You will probably have to contact a guru about your particular
C, FORTRAN, and Pascal implementations. Be sure to ask him about
nice things like naming of routines, calling sequences, return
values, and so on.

-- Greg [are we there yet?] Limes (limes@sun.com)

DISCLAIMER: If I spoke for SUN, who would speak for me?

rab@mimsy.UUCP (Bob Bruce) (11/14/87)

In article <2522@calmasd.GE.COM> pal@calmasd.GE.COM (Peter Lawrence) writes:
>   It appears that in Unix derived C compilers global names get an
>underscore prepended to them before they end up as symbols in the object file.
>It also appears that in Unix derived Fortran compilers global names get an
>underscore appended to them. I am sure there is going to be a bad reason for
>this but I have to know: why the underscores ...

There are several reasons for this.  Most Unix compilers produce 
assembly code at some point during compilation.  Imagine what
the assembler will think if you use symbols such as `sp', `r0',
`push', `pop', `add', `and', etc.  Since these symbols may
have some type of special meaning to the assembler, such as
register names or instruction mnemonics, they cannot be used as
variable names.   If you pass symbols directly to the assembler
then you must restrict the name space available to C programmers.
By prepending an underscore this problem is avoided.

Another advantage of prepending the underscore is that when
you write assembly language routines, as long as you don't
prepend underscores you can create global names that are
inaccessible from C code.  This makes for more modular code.
This is useful if you want to write optimized assembly
routines that don't use the same procedure call protocol
as C.  (E.g., the Sun C compiler uses this technique to call
`ldivt', `lmult', etc. to emulate 32 bit arithmetic when
generating 010 executables.)

> ... and why in different places.

The C and Fortran compilers put the underscores in different places so that
you can have functions with the same name available in each
(e.g., sin, cos, etc), and still be able to link C and fortran
code with each function being extracted from the appropriate library.
(i.e. _sin from the C math library, _sin_ from the fortran library).
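A minimal sketch of the naming scheme just described, assuming a C compiler that prepends one underscore and a Fortran compiler that both prepends and appends one. The helpers c_symbol, fortran_symbol and symbols_match are hypothetical names made up for illustration; real compilers do this internally, and the exact rules (including case folding) vary by system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the object-file symbol for a C global `name`,
 * assuming the compiler prepends a single underscore. */
void c_symbol(const char *name, char *out, size_t outsize)
{
    assert(name != NULL && out != NULL);
    snprintf(out, outsize, "_%s", name);
}

/* Hypothetical helper: the object-file symbol for a Fortran global `name`,
 * assuming the compiler prepends and appends a single underscore.
 * The name is assumed to be in lower case already. */
void fortran_symbol(const char *name, char *out, size_t outsize)
{
    assert(name != NULL && out != NULL);
    snprintf(out, outsize, "_%s_", name);
}

/* Returns nonzero if a C global and a Fortran global would resolve to the
 * same linker symbol under the two assumed conventions. */
int symbols_match(const char *c_name, const char *fortran_name)
{
    char csym[64];
    char fsym[64];

    c_symbol(c_name, csym, sizeof csym);
    fortran_symbol(fortran_name, fsym, sizeof fsym);
    return strcmp(csym, fsym) == 0;
}
```

Under these assumptions, `sin` in C and `sin` in Fortran map to different symbols, so each is taken from its own library, while a C global deliberately spelled with a trailing underscore lines up with the Fortran name that lacks it.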

>Most other compilers don't do this kind of thing, but how in Unix environments
>does one link C and/or Fortran with Pascal, for example?

If you want to reference a Fortran variable in C then you append
an underscore to the variable's name in your C program.  So `INTEGER X' in
fortran is `int X_' in C.  (There is, of course, no guarantee that
an `int' in C is the same size, or referenced in the same manner,
as an `INTEGER' in fortran.)

gp3147@sdcc15.UUCP (stockfisch) (11/14/87)

In article <33890@sun.uucp>, limes%ouroborous@Sun.COM (Greg Limes) writes:
> In article <2522@calmasd.GE.COM> pal@calmasd.GE.COM (Peter Lawrence) writes:
> >... C global names get an underscore prepended to them ...
> >... FORTRAN global names get an underscore appended to them.

And Fortran globals also get an underscore prepended.

> >how does one link C and or FORTRAN with Pascal?
> 
> ...In at least one FORTRAN compiler that I know of, parameters are
> passed by address to subroutines.

This is true of ALL Fortran compilers.

> Thus, a C function callable by FORTRAN would be declared
> 
> 	int
> 	foo_ (barp, bazp)
> 		int	*barp;
> 		int	*bazp;
> 	{
> 	}

If your C function has the synopsis "int foo( barp, bazp ) int barp,
bazp;" (that is, it takes its arguments by value), then I would write an
interface routine just like you have above, but would add between the braces:

	return	foo( *barp, *bazp );

> and would be called from fortran as:
> 
> 	CALL _FOO (IBAR, IBAZ)

This is completely wrong.  Identifiers are not allowed to have
underscores in Fortran.  The underscore is not needed anyway, since
the fortran compiler prepends and postpends an underscore.  The C 
compiler prepends, so "foo_" in a C source is matched to "foo" in a
Fortran source.
All of this is for any UNIX(supply your own footnote) system; I
couldn't say how universal it is otherwise.
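Put together, the interface routine might look like the sketch below. The body of foo is a made-up placeholder (the thread never says what foo does), and the code uses ANSI-style prototypes rather than the old-style declarations quoted above. It assumes the Fortran compiler passes the addresses of two default INTEGERs that have the same size as a C int, which is not guaranteed.

```c
#include <assert.h>

/* Ordinary C function taking its arguments by value.
 * The body is a placeholder chosen only for illustration. */
int foo(int bar, int baz)
{
    return bar + baz;
}

/* Interface routine for Fortran callers. The trailing underscore in the
 * C name is what lets a Fortran reference to FOO resolve to it, assuming
 * the prepend/append convention discussed in this thread. Fortran hands
 * over addresses, so the wrapper dereferences them before calling foo. */
int foo_(int *barp, int *bazp)
{
    assert(barp != 0 && bazp != 0);
    return foo(*barp, *bazp);
}
```

On the Fortran side the call would then be written with the plain name, for instance as an INTEGER function reference FOO(IBAR, IBAZ), with no underscores in the source.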

|| Tom Stockfisch, UCSD		tps@sdchemf.UCSD.EDU

gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/14/87)

In article <2522@calmasd.GE.COM> pal@calmasd.GE.COM (Peter Lawrence) writes:
>   It appears that in Unix derived C compilers global names get an
>underscore prepended to them before they end up as symbols in the object file.
>It also appears that in Unix derived Fortran compilers global names get an
>underscore appended to them. I am sure there is going to be a bad reason for
>this but I have to know: why the underscores and why in different places.
>Most other compilers don't do this kind of thing, but how in Unix environments
>does one link C and/or Fortran with Pascal, for example?

Not all UNIX-derived C compilers prepend an underscore to globals,
but many do.  The reason for this should be obvious by considering:
	double R0;
	void iterate(double r) { R0 = r * R0 * (1 - R0); }
What assembly code would this generate?  If the assembler knows "R0"
as the name of register #0, the appearance of the global names unmodified
as operands in the assembly language could be confused with the registers
and the code would be broken.  Some assemblers do not have this problem,
and for them the C compiler normally would not prepend an underscore.

Fortran subprograms are given different names from C functions because
the two languages do not have the same linkage conventions (calling
sequences), so to prevent C library functions from intruding in Fortran's
external name space, the postpended-_ convention ensures that they are
not meaningful names to Fortran.  (Note that Fortran globals normally
also have prepended underscores in environments where that is done for
C globals, for the same reasons.)

For most UNIX Fortrans it is fairly easy to write a C function that
can be called from Fortran, once the Fortran linkage conventions are
understood.  The C function name would of course be the same as its
Fortran counterpart (probably mapped to lower-case) with an underscore
appended.

Chris Torek periodically posts an article on C/Fortran interfacing,
the last posting being not very long ago as I recall.  You should also
be able to find information about this in your Programmer's Guide.

pd1h+@andrew.cmu.EDU (Philip H. Dye) (11/17/87)

Does anyone out there have a C compiler for a Motorola 6801 or 6811?

I could really use one.

Thanks,

Philip Dye

gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/17/87)

In article <387@sdcc15.UUCP> gp3147@sdcc15.UUCP (stockfisch) writes:
>> ...In at least one FORTRAN compiler that I know of, parameters are
>> passed by address to subroutines.
>This is true of ALL Fortran compilers.

Not so -- read the Fortran-77 spec.

turner@sdti.UUCP (Prescott K. Turner) (11/17/87)

In article <387@sdcc15.UUCP>, gp3147@sdcc15.UUCP (Tom Stockfisch) writes:
> In article <33890@sun.uucp>, limes%ouroborous@Sun.COM (Greg Limes) writes:
> > ...In at least one FORTRAN compiler that I know of, parameters are
> > passed by address to subroutines.
> This is true of ALL Fortran compilers.

Fortran would be better off if this were so.  However, both the Fortran 77
standard and the new proposed Fortran restrict the kinds of aliasing which
a program can use, so that compilers can handle parameters by 
copy-in/copy-out.  Some important implementations actually do it this way.
I believe IBM's mainframe compilers are examples, and pass REAL scalars
this way.

--
Prescott K. Turner, Jr.
Software Development Technologies, Inc.
375 Dutton Rd., Sudbury, MA 01776 USA        (617) 443-5779
UUCP:necntc!necis!mrst!sdti!turner

atbowler@orchid.waterloo.edu (Alan T. Bowler [SDG]) (11/20/87)

In article <180@sdti.UUCP> turner%sdti@harvard.harvard.edu (Prescott K. Turner, Jr.) writes:
>In article <387@sdcc15.UUCP>, gp3147@sdcc15.UUCP (Tom Stockfisch) writes:
>> In article <33890@sun.uucp>, limes%ouroborous@Sun.COM (Greg Limes) writes:
>> > ...In at least one FORTRAN compiler that I know of, parameters are
>> > passed by address to subroutines.
>> This is true of ALL Fortran compilers.
>
>Fortran would be better off if this were so.  However, both the Fortran 77
>standard and the new proposed Fortran restrict the kinds of aliasing which
>a program can use, so that compilers can handle parameters by 
>copy-in/copy-out.
>

The fact that the subroutine handles the passed value with a copy-in/copy-out
rule does not change the fact that the parameter was passed to the
subroutine by address.  The act of making a local copy of the parameter
is done by the subroutine (and on the IBM compilers can be overridden),
for efficiency.  The subroutine linkage used by the caller just passes
addresses for everything.
   One could imagine an implementation where the choice was made by
the caller between copy-in/copy-out, and reference, but it would have
to be a strange architecture for it not to be less efficient in both
memory and time.  Furthermore, it would not allow some relatively
common (though questionable) practices such as passing a scalar as
a 1 element vector.  As far as I know, the statement that ALL
Fortran compilers pass parameters by address is true.
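The distinction drawn here can be sketched in C, as a model only. The routine names incr_ and vsum_ are hypothetical, and the code merely imitates what a Fortran compiler might generate: the caller passes nothing but addresses, and any copy-in/copy-out happens inside the callee.

```c
#include <assert.h>

/* Callee-side copy-in/copy-out. The caller still passes only an address;
 * the local copy is the callee's own business. */
void incr_(int *kp)
{
    int k;

    assert(kp != 0);
    k = *kp;        /* copy in */
    k = k + 1;      /* work on the local copy */
    *kp = k;        /* copy out on return */
}

/* A routine written to take a vector and its length, both by address. */
double vsum_(double *v, int *n)
{
    double s = 0.0;
    int i;

    assert(v != 0 && n != 0);
    for (i = 0; i < *n; i++)
        s += v[i];
    return s;
}
```

Because vsum_ receives only an address, a caller can hand it the address of a single scalar together with a length of 1, which is the "scalar as a 1 element vector" practice mentioned above. A scheme in which the caller passed the value itself would not permit this.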

gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/23/87)

In article <11772@orchid.waterloo.edu> atbowler@orchid.waterloo.edu (Alan T. Bowler [SDG]) writes:
>As far as I know, the statement that ALL
>Fortran compilers pass parameters by address is true.

Try the following experiment:
	PROGRAM TEST
	LOGICAL FUNC
	INTEGER I
	DO 10 I = 1, 2
10	WRITE (*,*) FUNC(3)
	END
	LOGICAL FUNCTION FUNC(K)
	INTEGER K
	IF (K .EQ. 3) THEN
		K = 5
		FUNC = .TRUE.
	ELSE
		FUNC = .FALSE.
	ENDIF
	END
If all parameters are truly passed by address, then the constant "3"
would be changed for I.EQ.1 and therefore the second invocation of
FUNC would return .FALSE.  In fact, this did happen for many older
implementations of FORTRAN!  I doubt that it happens for any current
FORTRAN-77 implementation.
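For readers without a Fortran compiler at hand, the two outcomes can be modelled in C. This is only a sketch: func_ stands in for FUNC, and the two callers are hypothetical models of what a compiler might emit for the reference FUNC(3), one passing the address of a single shared, writable cell holding the constant, the other passing the address of a fresh temporary on each call.

```c
#include <assert.h>

/* Stand-in for FUNC: receives K by address and may overwrite it. */
int func_(int *kp)
{
    assert(kp != 0);
    if (*kp == 3) {
        *kp = 5;
        return 1;   /* .TRUE. */
    }
    return 0;       /* .FALSE. */
}

/* Model of the older behavior: one shared, writable cell holds the
 * constant 3, and its address is passed on every call. */
static int shared_three = 3;

int call_with_shared_constant(void)
{
    return func_(&shared_three);
}

/* Model of the alternative: each call passes the address of a fresh
 * temporary initialized to 3, so the constant itself is never modified. */
int call_with_temporary(void)
{
    int tmp = 3;

    return func_(&tmp);
}
```

Calling call_with_shared_constant twice yields 1 and then 0, matching the .TRUE./.FALSE. sequence described above, whereas call_with_temporary yields 1 every time.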

This somehow no longer seems like a comp.lang.c issue.