[comp.lang.fortran] C subroutine calls from FORTRAN

still@usceast.UUCP (02/18/87)

	Here is an interesting problem:

When I try to compile and link the following routine (and a few others
which are insignificant for this particular query), I get a bunch of
unresolved external reference messages from the linker. This is apparently
because when you call a routine from f77, an underscore (_) is appended to
the fcn identifier both in front (as cc does) and BEHIND. How does one get
around this problem ? Any ideas would be greatly appreciated.

FORTRAN program using the specified routines...

	c     PROGRAM MAIN
	      integer ovcnt
	      integer jmpbuf(10),rv
	      common /sjmp/ jmpbuf,rv
	      external onbrk
	      call trapov(50)
	      rv = 0
	      call setjmp(jmpbuf)
	      call sgset(2,onbrk)
	      call matlab(rv)
	      k = ovcnt(0)
	      if (k .gt. 0) write(6,13) k
	   13 format('total overflows ',i9/)
	      stop
	      end

The exact messages produced when executing the "link" command...


	f77 -o matlab unixsys.o lib.o mat.o handler.o -lm -lc

	Undefined:
	_setjmp_
	_sgset_
	_onbrk_

As you can see, there is something going on here. Apparently, the above
program worked just fine under BSD4.1 (or so it was claimed); and the above
routine was part of the original distribution of the MATLAB tape (for
"mainframes") dated May 1982. Any suggestions (of a friendly nature) will
be greatly appreciated.

In fact, I notice that if I try to link a C subroutine with a FORTRAN program,
I get the exact same situation (so perhaps there is no dependence on the fact
that I am trying to use some of the system C fcns).

woods@hao.UUCP (02/19/87)

In article <2324@usceast.UUCP> still@usceast.UUCP (Bert Still) writes:
>
>	Here is an interesting problem:
>
>because when you call a routine from f77, an underscore (_) is appended to
>the fcn identifier both in front (as cc does) and BEHIND. How does one get
>around this problem ?

>	      call setjmp(jmpbuf)

  I know this is awful, but it's the only way. You also link with a C file that
contains something like:

void setjmp_(jmpbuf) int *jmpbuf;
{
  (void) setjmp(jmpbuf);
}

  We've got a whole library of stuff like this here.

>Apparently, the above program worked just fine under BSD4.1 (or so it was
>claimed)

  I find that hard to believe, but the above C code fragment should have worked
under 4.1 too.

>routine was part of the original distribution of the MATLAB tape 

  They probably assumed you had some kind of library for the system routines.
Some of them already have such stubs in the 3F library, but as you found
out, not all of them do.

--Greg
-- 
UUCP: {hplabs, seismo, nbires, noao}!hao!woods
CSNET: woods@ncar.csnet  ARPA: woods%ncar@CSNET-RELAY.ARPA
INTERNET: woods@hao.ucar.edu

tps@sdchem.UUCP (02/19/87)

In article <2324@usceast.UUCP> still@usceast.UUCP (Bert Still) writes:
>When I try to compile and link the following routine (and a few others
>which are insignificant for this particular query), I get a bunch of
>unresolved external reference messages from the linker. This is apparently
>because when you call a routine from f77, an underscore (_) is appended to
>the fcn identifier both in front (as cc does) and BEHIND. How does one get
>around this problem ? Any ideas would be greatly appreciated.

The answer is simple.  You can't call C routines directly from fortran unless
their names end in an underscore.  THIS IS DEFINITELY A FEATURE.  Fortran calls
by address, C by value, so 7 times out of 8 you wouldn't want to call the
C routine directly.  The answer is, for each C routine you want to call, to
write a front-end routine in C which the fortran code actually calls, which
ends in underscore, and which properly translates to call-by-value.  For
instance, isatty() has the synopsis

	isatty( fd )
		int	fd;

Write a front end which (in the C code) you call isatty_():

	isatty_( fd )
		int	*fd;	/* a POINTER, which is what fortran passes */
	{
		return	isatty( *fd );
	}

Now in your fortran code you just do
	
	integer isatty
	integer ifd, ttyinput

	ifd =	0
	ttyinput =	isatty(ifd)
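
The same front-end pattern extends to routines that take several
arguments.  What follows is only a sketch: clamp() is a hypothetical C
routine made up for illustration, not part of any library, and the code
assumes that a fortran INTEGER and a C int have the same size on the
machine in question.

```c
/* Hypothetical C routine with ordinary by-value arguments. */
int clamp(int v, int lo, int hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * Front end for f77: the name ends in an underscore and every
 * argument arrives as a pointer, because fortran passes addresses.
 * The wrapper dereferences each one and calls the real routine.
 */
int clamp_(int *v, int *lo, int *hi)
{
	return clamp(*v, *lo, *hi);
}
```

On the fortran side one would declare `integer clamp' and write
something like `k = clamp(i, lo, hi)'; the compiler emits a reference
to `_clamp_', which the front end satisfies.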

|| Tom Stockfisch, UCSD Chemistry	tps%chem@sdcsvax.UCSD

chris@mimsy.UUCP (02/20/87)

Time to send out another copy of this, I see.  Good thing I started
saving these a while back. . . .

Date: 3 Aug 86 21:34:31 GMT

I am going to attempt to produce the definitive article on mixed
language linking under 4BSD Vax Unix.  Stand back!

First, there is the matter of names:  The symbols in the object
files must match, that the linker may resolve the right references.
Yet each compiler has its own methods for mapping from source to
object.  Within one language we may safely ignore this mapping;
but when mixing tongues, it becomes important indeed.

The C compiler takes any global symbol and prepends an underscore
character, `_'.  Names are not limited in length---though in fact
there is a limit of about a thousand characters, no one seems to
run into it.  Thus

	int global_var;

	char *
	somefunc()
	{
		...

generates the symbols `_global_var' and `_somefunc'.

The F77 compiler limits names to six characters, then prepends and
appends an underscore:

	subroutine sub
	integer var
	common /com/ var
	...

names the subroutine `_sub_' and creates a global `variable'
containing one integer.  The `variable' is called `_com_'.  Variables
that are not part of a common block do not have global names.  F77
does not allow underscores in source-level names: `subroutine sub_1'
is illegal.  The compiler also ignores any PROGRAM name:

	program prog
	...

creates the symbol `_MAIN_'.
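
The F77 mapping for subroutine and common-block names can be sketched
as a small C helper.  The function name f77_symbol is hypothetical.
Two behaviours are assumptions rather than things stated above: that
over-long names are truncated to six characters (rather than rejected),
and that letters are folded to lower case, which is what the mixed
language example later in this article implies.  The special case of
the main program (`_MAIN_') is not handled.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Build the object-file symbol predicted for an F77 subroutine or
 * common-block name: at most six characters (assumed truncation),
 * folded to lower case (assumed), with an underscore on each end.
 * Returns 0 on success, -1 if `out' is too small.
 */
int f77_symbol(const char *name, char *out, size_t outsize)
{
	size_t n = 0, o = 0;

	while (name[n] != '\0' && n < 6)
		n++;
	if (outsize < n + 3)	/* two underscores plus the NUL */
		return -1;
	out[o++] = '_';
	for (size_t i = 0; i < n; i++)
		out[o++] = (char)tolower((unsigned char)name[i]);
	out[o++] = '_';
	out[o] = '\0';
	return 0;
}
```

Given "sub" this fills the buffer with `_sub_', and given "COM" with
`_com_', matching the symbols named above.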

The Berkeley Pascal compiler strings together the names of all
nested procedures to concoct unique global names.  Only variables
defined in the `program' part are global (no surprise here), and
these names are constructed in the same way as C's globals.  However,
the program name is ignored, and the compiler uses the name
`_program':

	program foo;			{ symbol _program }
	var v: integer;			{ symbol _v }
	    procedure proc;		{ symbol _proc }
		function func;		{ symbol _proc_func }
		begin func := 0 end;	{ end proc's func }
	    begin end;			{ end proc }
	begin end.			{ end program }

generates the symbols `_program', `_v', `_proc', and `_proc_func'.
(It also generates the names `___proc_func' and `___proc', but we
shall ignore these for the moment.)  The Pascal compiler does not
permit source-level names to contain `_': i.e., `procedure proc_a'
is illegal.

It should be clear at this point that C programs can call any F77
or Pascal subroutines (procedures) or functions, and that Pascal
can call many C routines---not all, as names with underscores are
out of reach---while F77 routines can call only specially-named C
routines, namely those that end with an underscore, have fewer than
seven other characters, and contain no internal underscores.  F77
and Pascal routines can never call each other directly.
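
The three conditions on C names reachable from F77 can be written down
as a predicate.  This is a sketch only; the name f77_callable is
hypothetical, and it checks nothing beyond the three rules just stated
(it does not, for instance, look at letter case).

```c
#include <string.h>

/*
 * Return 1 if a C function with this source-level name can be called
 * directly from F77 under the rules above, 0 otherwise: the name must
 * end in an underscore, have one to six characters before it, and
 * contain no other underscore.
 */
int f77_callable(const char *c_name)
{
	size_t len = strlen(c_name);

	if (len < 2 || len > 7)
		return 0;
	if (c_name[len - 1] != '_')
		return 0;
	/* any underscore before the final one puts the name out of reach */
	return memchr(c_name, '_', len - 1) == NULL;
}
```

So `isatty_' and `csub_' qualify, while `isatty' (no trailing
underscore) and `a_b_' (internal underscore) do not.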

Even with a compatible set of names, the task is not yet done.
There remain two problems, each bound up with the other.  Every
program must have an entry point (`main'); and every language has
its libraries.  C's is the simplest of the three, for its main
looks like every other C routine and needs no libraries not used
by both F77 and Pascal as well.  F77's main is a C routine that
initialises its I/O system, traps signals, and calls the program's
_MAIN_.  Pascal's main is similar to F77's, but does not trap
signals and calls _program, not _MAIN_.  Both F77's and Pascal's
mains also save argc and argv, F77's in `_xargc' and `_xargv' and
Pascal's in __argc and __argv.

Now if you intend to call C routines from F77 or Pascal, and assuming
that these routines are entirely self-contained, all you need do
is compile the C code to object, and mention the `.o' file in the
linking command.  Of course, you must also use the proper parameter
passing conventions---but I anticipate.  Calling F77 or Pascal
routines from C, however, is a bit more difficult.  If the routines
will do no I/O, you can simply compile the routines to object, and
mention them in the linking command.  If they may do I/O, you will
need not only to initialise the I/O system, but also to clean up
afterward.  This becomes quite tricky and is best avoided whenever
possible.

F77's I/O system is initialised by the C routine `f_init' (F77's
support library is written almost entirely in C) and torn down
by the routine `f_exit'.  Both take no parameters.  Indeed, the
F77 main program consists mainly of the three lines

	f_init();
	MAIN_();	/* recall that C prepends an underscore */
	f_exit();

though there is much other code dealing with signals, and of course
with argc and argv.

Pascal's I/O system is initialised by the C routine `PCSTART'.
(Yes, Pascal's support library too is written in C.  I find it
amusing to note that other language libraries can be written in C,
but C's language libraries cannot, for the most part, be written
in the other languages.)  Pascal's main can be written in C as

	extern int _argc;
	extern char **_argv;

	main(argc, argv)
		int argc;
		char **argv;
	{

		PCSTART(0);
		_argc = argc;
		_argv = argv;
		program();
		PCEXIT(0);
		/*NOTREACHED*/
	}

---though the compiler in fact generates this directly, eliminating
an unnecessary return instruction.  PCEXIT, unfortunately, terminates
the program as well as flushing any pending output.

As to the various libraries themselves, there are many:

	Library		Used by
	-------		-------
	-lF77		F77
	-lI77		F77
	-lU77		F77
	-lpc		Pascal
	-lm		F77, Pascal
	-lc		C, F77, Pascal

In other words, all the linking commands pass `-lc' to the linker
`ld'; the others depend on the command.  `f77' calls ld with all
except `-lpc'; `pc' calls ld with `-lpc -lm -lc'.  `cc' calls ld
with only `-lc', so to use an F77 routine with a C main, one must
link with

	cc main.o f77sub.o -lF77 -lI77 -lU77 -lm

Moreover, the order of the libraries specified is also important.
`-lF77' builds on `-lI77', and `-lI77' builds on `-lU77'; all build
on `-lm' and `-lc'.  `-lpc' builds on `-lm' and `-lc'.  Thus `-lpc'
may be put anywhere with respect to `-lI77', for example; but both
must appear before `-lm'.

You should now be able to compile and link mixed language sources.
But this is not the whole story:  There is still the matter of
parameter passing.  The F77 compiler uses call by reference; the
Pascal compiler uses call by value or call by reference, depending
on the declaration of the called routine.  The C compiler invariably
uses call by value, but the language is powerful enough to simulate
other parameter mechanisms using only call by value.  One thing
that can be done in Pascal but not C is to pass arrays by value.
(This can be simulated in C using structures.)
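
As a sketch of that simulation: wrapping the array in a structure
makes C copy the whole thing at each call.  The names vec3 and
sum_and_clear are hypothetical, chosen only for illustration.

```c
/* Wrapping the array in a structure makes C copy it on each call. */
struct vec3 {
	int a[3];
};

/*
 * The callee receives its own copy: the stores below change only
 * that copy, as with a Pascal value parameter of array type.
 */
int sum_and_clear(struct vec3 v)
{
	int s = v.a[0] + v.a[1] + v.a[2];

	v.a[0] = v.a[1] = v.a[2] = 0;
	return s;
}
```

After a call such as `sum_and_clear(v)' the caller's `v' still holds
its original elements, whereas a bare `int a[3]' parameter would have
been passed as a pointer and the caller's array would have been zeroed.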

Pretty words, those: but what do they mean?  For a strict definition
I will tell you only to consult any good compiler book; but here are
some examples:

	[f77sub.f]
		SUBROUTINE SUB (A)
		INTEGER A
		DOUBLE PRECISION D

	C Mixed mode arithmetic is legal in Unix F77
		D = A + 2.0
		CALL CSUB(D)
		RETURN
		END

	[psub.p]
	{ declare external C subroutine }
	procedure csub2(i: integer); external;

	procedure psub(var i: integer);
	begin i := 3 end;

	function pfunc(i: integer): integer;
	begin
	    pfunc := i + 2;
	    csub2(i)
	end;

	[cmain.c]
	/*ARGSUSED*/
	main(argc, argv)
		int argc;
		char **argv;
	{
		int i;

		psub(&i); /* call Pascal subroutine with var parameter */
		sub_(&i); /* call F77 subroutine: call by reference */
		i = pfunc(7); /* call Pascal function with value parameter */
		exit(0);
	}

	/* called from F77: call by reference */
	csub_(d) double *d; { printf("%g\n", *d); }

	/* called from Pascal by value */
	csub2(i) int i; { printf("%d\n", i); }

Fortunately, function return values are all done the same way for
simple-valued functions.  Structure-valued functions should simply
be avoided.

Since the above example does no I/O in its F77 and Pascal routines,
and in fact calls no F77 or Pascal intrinsics, this can be compiled
with the commands:

	f77 -c f77sub.f
	pc -c psub.p
	cc -c cmain.c
	cc -o example cmain.o psub.o f77sub.o

Appending `-lF77 -lI77 -lU77 -lpc -lm' to the last command would not
hurt, and might be required in more complex cases.

There is one remaining trick in linking Pascal and C/F77 routines,
and that has to do with nested procedures and functions, and nonlocal
variable access.  Neither C nor F77 has these, and there is no
provision in the runtime environment for them.  Pascal, however,
uses something called a `display' to be able to get at nonlocal
variables.  The display manipulation is normally compiled in-line;
for procedure parameters, the compiler uses those `extra' names.
In the earlier example, these were `___proc' and `___proc_func'.
These routines do display winding for entry to _proc and _proc_func.
The unwinding after procedure parameter calls is generated in-line.
If you never use nested procedures, or nonlocal variables, you can
safely ignore this.  If you do, but do not know what a display is
all about, again I will tell you only to consult a good compiler
book.  Look at the assembly code generated by `pc -S' for details
on the display format.  Indeed, looking at the assembly code is
a good way to determine just what the compiler is really doing
for all three of these compilers.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu