[net.lang.c] 4.2 extern a DISASTER

kempf@hplabsc.UUCP (Jim Kempf) (07/14/85)

I find the 4.2 C compiler's treatment of extern a total and
complete DISASTER. I have so far been completely unsuccessful
at using extern, though with the 4.1 compiler (which was more
or less K & R standard), there was no trouble. I have traced
loading, and find that, if I specify no initializations for
the variables, the compiler generates TWO separate common
blocks for the declarations in two separate files, and the
linker refuses to resolve the two as a reference to the
same variable. If I specify initialization, the compiler
generates an external declaration all right, but no matter
what I do, I can't avoid getting an undefined external
reference from the linker. It makes no difference whether
I put the extern keyword before the declaration, or whether
the declaration is at the top level or in the variable
declarations for the block in which I intend to use the variable.
The linker tells me that the reference is to an undefined
extern.

Listen: I know extern variables are an undesirable feature
from the software engineering viewpoint, but sometimes they
are handy and, in fact, make code easier to understand or
reduce the amount of code one must write. I certainly don't
want some anonymous compiler writer making it impossible
for me to use them!
		jim kempf	kempf@hplabs

PS: The explanation of extern for the 4.2 compiler on pg.
82 of Harbison & Steele is WRONG, or, if not wrong, then
just incomplete enough to be useless.

mjs@eagle.UUCP (M.J.Shannon) (07/15/85)

> I find the 4.2 C compiler's treatment of extern a total and
> complete DISASTER. I have so far been completely unsuccessful
> at using extern....
> 		jim kempf	kempf@hplabs

From your note, it's unclear what you've done, but here's how the language
defines what must happen:

Correct:	(file0.c)		|		(file1.c)

extern int gronk;				int gronk;

Correct:

int gronk;					extern int gronk;

Incorrect: /* linker should complain about gronk being undefined */

extern int gronk;				extern int gronk;

Incorrect: /* but most/many/some compilers accept this */

int gronk;					int gronk;


In all but the last case, the definition (declaration without extern) may
include an initialization of gronk.  The last case is somewhat special.  The
historical interpretation is that each file will result in a common area (of
size (sizeof (int))) being defined for gronk, and the linker will resolve both
names to the same address.  Strictly speaking (according to K&R), if there are
N files with declarations of gronk, N-1 of them must have extern.  Most/many/
some compilers allow at most 1 initialization of gronk, and resolve all
references to the initialized data.  Does this answer your question?
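
A minimal single-file sketch of the declaration/definition split described
above, assuming a compiler that accepts ANSI-style prototypes.  The function
names read_gronk and show_gronk are hypothetical; in the two-file layout the
extern line would sit in file0.c and the definition in file1.c.

```c
#include <stdio.h>

extern int gronk;       /* declaration only: no storage is allocated here */

/* Refers to gronk before its definition has been seen; the extern
   declaration above is all the compiler needs. */
int read_gronk(void)
{
    return gronk;
}

int gronk = 42;         /* the one definition; it may carry an initializer */

/* Prints the value that read_gronk() sees. */
void show_gronk(void)
{
    printf("gronk = %d\n", read_gronk());
}
```

Calling show_gronk() from any main() exercises both the declaration and the
definition.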
-- 
	Marty Shannon
UUCP:	ihnp4!eagle!mjs
Phone:	+1 201 522 6063

mjs@eagle.UUCP (M.J.Shannon) (07/15/85)

An addendum to my previous note.  Dennis Ritchie has commented in this forum on
the suitability of the strict interpretation of my last example.  In a
nutshell, he said that it was a mistake to put it in the manual at all, and
it was there for a single machine (Honeywell running GCOS) whose linker had
arbitrary restrictions on the use of common symbols.  I'm not sure exactly
what the ANSI (draft) standard says on this matter, as I don't have my copy
handy.
-- 
	Marty Shannon
UUCP:	ihnp4!eagle!mjs
Phone:	+1 201 522 6063

gwyn@BRL.ARPA (VLD/VMB) (07/15/85)

Here is how to use external functions and data in C:

All data etc. may have its type DECLARED as often as necessary
to get correct code generated but must be DEFINED precisely
once.  In every DECLARATION of external data etc., use the
keyword "extern".  In the sole place that each datum etc. is
DEFINED (i.e., allocated storage if data, code if function),
omit the "extern" keyword.

If you do not specify an explicit initializer in the
DEFINITION (note that you cannot initialize a DECLARATION),
then external data is pre-initialized to zero (just what this
means has been subject to various interpretations; it is best
to explicitly initialize all extern storage except for that
which will always be stored into before being fetched).  An
external function that has never been DEFINED (by exhibiting
its source code instructions) will be flagged by the linker
as an unresolved global reference.

Of course, the standard C library provides several external
functions so that any references to them will be satisfied at
link time.

Some C compilers cheat a bit on the above rules; notably, most
UNIX C compilers allow multiple external definitions of the
same datum so long as not more than one of them explicitly
initializes the datum.  But you should not rely on this.

EXAMPLE:

Contents of file main.c:

	extern void sub();	/* may be in another file */
	extern int count;	/* ditto */

	main( argc, argv )
		int argc;
		char *argv[];
	{	/* definition of "main" starts here */
		count = 1;	/* stores into datum which
				   has storage allocated by
				   some other file	*/
		sub();		/* invokes function defined
				   elsewhere		*/
		return 0;
	}

Contents of file sub.c:

	int count /* = 0 */ ;	/* defines "count"; initialization
				   redundant in this case, since
				   0 is the default if not
				   specified */
	void sub()
	{	/* definition of "sub" starts here */
		(void)printf( "%d\n", count );
		/* because "printf" was not declared, it is
		   assumed to be an extern int function;
		   this is one of C's magic default rules */
	}

chris@umcp-cs.UUCP (Chris Torek) (07/16/85)

Er, um, I hate to tell you this, but the treatment of ``extern''
didn't change between 4.1 and 4.2.  Are you sure someone didn't
just break ld?
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

faustus@ucbcad.UUCP (Wayne A. Christopher) (07/16/85)

> I find the 4.2 C compiler's treatment of extern a total and
> complete DISASTER. I have so far been completely unsuccessful
> at using extern, though with the 4.1 compiler (which was more
> or less K & R standard), there was no trouble.

You ought to include a sample of code in which you find externs not
to work. I have no idea what you are talking about, and since hundreds
of thousands of lines of C have compiled perfectly with the 4.2 compiler,
I think it is more likely that either your local compiler is broken or
you are confused.

	Wayne

george@mnetor.UUCP (George Hart) (07/17/85)

> An addendum to my previous note.  Dennis Ritchie has commented in this forum on
> the suitability of the strict interpretation of my last example.  In a
> nutshell, he said that it was a mistake to put it in the manual at all, and
> it was there for a single machine (Honeywell running GCOS) whose linker had
> arbitrary restrictions on the use of common symbols.  I'm not sure exactly
> what the ANSI (draft) standard says on this matter, as I don't have my copy
> handy.
> -- 
> 	Marty Shannon
> UUCP:	ihnp4!eagle!mjs
> Phone:	+1 201 522 6063

Excerpts from the 84 04 26 Draft ANSI C Standard.
(It seems as if the standard is saying the compilers which resolve
the variables in your last example to the same location are doing
it right!  Please correct me if I'm misinterpreting things.)

=============================

Section 7.2, "External data definition" 

  "A declaration of the identifier of an object with an initializer
   constitutes the DEFINITION of the object (i.e., that declaration for
   which the storage is allocated).

   A declaration of the identifier of an object without an initializer and
   -----------------------------------------------------------------------
   without the keyword "extern" constitutes a TENTATIVE DEFINITION.  If a
   ----------------------------------------------------------------------
   definition is encountered, all tentative definitions are taken to be
   --------------------------------------------------------------------
   declarations of the same object (subject to the linkage rules of
   ----------------------------------------------------------------
   Section 2.2.1).  If no subsequent definition is encountered, the first
   ----------------------------------------------------------------------
   tentative definition is taken to be a definition with initializer equal
   ---------------------------------------------------------------------
   to 0.
   -----

   A declaration of the identifier of an object without an initializer and
   with the keyword "extern" requires that somewhere in the entire program
   there must be exactly one definition for the identifier."


From Section 2.2.1 "Scopes and linkages of identifiers"

   ...In the set of source files and libraries that constitutes an entire
   program, every instance of an identifier with external linkage denotes
	    -------------------------------------------------------------
   the same function or object....
   ----------------------------

   ...For an identifier of a function or object declared with file scope, if
   the lexically first declaration in the source file contains the keyword
   "static", the identifier has internal linkage.  Otherwise the
						   -------------
   identifier has external linkage....
   --------------------------------

==============================
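
The tentative-definition rule quoted above can be tried within a single
source file.  This is a minimal sketch, assuming a compiler that follows the
draft wording; the names total, never_set, get_total and get_never_set are
hypothetical and do not come from the draft.

```c
int total;              /* tentative definition: no initializer, no extern */
int total;              /* a second tentative definition of the same object */
int total = 7;          /* the definition; the two lines above are now
                           treated as plain declarations */

int never_set;          /* tentative only: with no later definition, it is
                           taken as a definition with initializer 0 */

int get_total(void)
{
    return total;
}

int get_never_set(void)
{
    return never_set;
}
```

How the same spelling behaves across two separate files is the linkage
question that the thread is arguing about; this sketch covers only the
single-file part of the rule.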
-- 


Regards,

George Hart, Computer X Canada Ltd.
UUCP: {allegra|decvax|linus|ihnp4}!utzoo!mnetor!george
BELL: (416)475-8980

kempf@hplabsc.UUCP (Jim Kempf) (07/24/85)

This is where the problem is occurring:

file1.c

int foo;

file2.c

int uses_foo()

{
  extern int foo;

   ...
}

According to the K&R standard, any references to foo within uses_foo()
should refer to the external variable in file1.c, and the linker
should take care of making the connection (unless, of course, there
is a static within file2.c's file scope which masks the external
definition). 

Not according to the 4.2 compiler, however. If I try to run this, 
I get various bus errors, etc. (it links OK). 

Furthermore, according to Haberson & Steele (and I've verified this
by using the verbose mode of ld), if one defines foo as above, 
a FORTRAN like COMMON block is allocated for foo, and one could 
declare foo in file2.c *without* declaring the storage class static
and *without* declaring foo extern and the linker will *not* make the
connection between the two.

The only way to get foo declared as a true external is to include
an initialization:

file1.c

int foo=0;

file2.c

extern int foo;

uses_foo()

{
  ...

}

This will work. Notice that I did NOT declare foo external within the
scope of uses_foo(). This, also, does NOT work with the 4.2 compiler.
I find this particular point an abomination, since I like to declare
all the externs that I use in a particular function at the top of
the function, where the locals are declared. This makes it easy for
someone reading the program to simply glance at the function header
to see how a particular variable is used, rather than having to
page backward several pages through the listing for the whole file.

My conclusion from all this is that the 4.2 compiler has taken something
which was fairly easy and intuitive to use and made it so complicated
that it is no longer useful.

Unfortunately, the ANSI standard doesn't look like much of an improvement.
I do have their Feb. document, and they essentially support the "two
external storage classes" model: FORTRAN-like COMMON if no initialization
is given, and external definition if the variable is initialized.

	jim kempf hplabs!kempf

chris@umcp-cs.UUCP (Chris Torek) (07/28/85)

What you have just described is NOT the 4.2 compiler.  (In fact it
sounds a lot like the Whitesmiths compiler, but not quite the same.)

Here is your example of something that (you claim) does not work:

>This is where the problem is occurring:
>
>file1.c
>
>int foo;
>
>file2.c
>
>int uses_foo()
>{
>  extern int foo;
>   ...
>}

This code is fine, and works the way you expect (uses_foo() in file2.c
has a variable called "foo" available to it, and it is the same "foo"
as in file1.c).

>If I try to run this, I get various bus errors, etc. (it links OK). 

I would suggest you find the bug in your code, or find out who broke
the compiler and/or linker.

>Furthermore, according to Haberson [sic] & Steele (and I've verified this
>by using the verbose mode of ld), if one defines foo as above, 
>a FORTRAN like COMMON block is allocated for foo, and one could 
>declare foo in file2.c *without* declaring the storage class static
>and *without* declaring foo extern and the linker will *not* make the
>connection between the two.

What "verbose mode of ld"?  Sounds to me like you haven't got a
4.2BSD ld, but some other strange beast.  Also, it's not quite
clear what you are trying to say is happening or is supposed to
happen; but given the following two files, the "foo" in f1() and
the "foo" in f2() are the same variable:

	file1.c:
	int foo;
	f1() { ... }

	file2.c:
	int foo;
	f2() { ... }

In some compilers/linkers, one of the two definitions must be preceded
by "extern" in order to avoid a "multiple definition" linker error
(or more generally, there must be exactly one occurrence of a global
definition that does not use "extern"); in 4.2BSD Unix (indeed in
every currently available Unix, so far as I know) this is not
necessary.

Returning to "...one could declare foo in file2.c *without* declaring
the storage class static...", if you were to write

	file1.c:
	int foo;
	f1() { ... }

	file2.c:
	f2() { int foo; ... }

you would not be referencing the same piece of storage, since in
this case f2()'s "foo" is an automatic variable, created at function
entry.
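
The contrast between a block-scope "extern" and a plain local can be
sketched in one file.  This assumes ANSI-style prototypes; the file-scope
foo stands in for the one in file1.c, and the function names local_foo,
shared_foo and current_foo are hypothetical.

```c
int foo = 10;           /* stands in for the external foo of file1.c */

/* Plain "int foo" inside a function is an automatic variable: a new
   object created at function entry that hides the external one. */
int local_foo(void)
{
    int foo = 99;
    foo = foo + 1;
    return foo;         /* the external foo is untouched */
}

/* "extern int foo" inside a function allocates nothing; it names the
   external object defined at file scope. */
int shared_foo(void)
{
    extern int foo;
    foo = foo + 1;
    return foo;
}

/* Reports the current value of the external foo. */
int current_foo(void)
{
    return foo;
}
```

Calling local_foo() leaves current_foo() unchanged, while each call to
shared_foo() increments the value that current_foo() reports.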
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

gwyn@BRL.ARPA (VLD/VMB) (07/28/85)

I don't suppose it has occurred to you that if the 4.2BSD C compiler
and linker really had botched the "extern" linkage issue, 4.2BSD
would not run?  The "common" approach to externs is just fine; in
fact Dennis Ritchie has expressed a preference for it over the
"official" (K&R or X3J11) def/ref model.  The way that "ld" handles
named common blocks is, of course, to allocate a single storage
area of size equal to the largest of the blocks having the same
name.  This area can either be explicitly initialized (.data in
effect overrides .lcom) or left uninitialized, in which case the
operating system is obliged to ensure that the area is 0-filled
upon program initialization (on UNIX, the kernel does this upon
exec).

If you are getting core dumps, etc., either your system is
terribly buggy or else you have errors in your code.  In any
case, most releases of UNIX have implemented externs this way.
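
The zero-fill guarantee can be observed from within a program, though the
linker's handling of common blocks cannot.  This is a minimal sketch that
shows only the visible result; TABLE_SIZE, table and table_is_zeroed are
hypothetical names.

```c
#define TABLE_SIZE 64

int table[TABLE_SIZE];  /* external data with no explicit initializer */

/* Returns 1 if every element of table is still 0, otherwise 0.  Meant to
   be called before anything has stored into table. */
int table_is_zeroed(void)
{
    int i;

    for (i = 0; i < TABLE_SIZE; i++) {
        if (table[i] != 0)
            return 0;
    }
    return 1;
}
```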

faustus@ucbcad.UUCP (Wayne A. Christopher) (07/28/85)

> This is where the problem is occurring:
> 
> file1.c
> 
> int foo;
> 
> file2.c
> 
> int uses_foo()
> 
> {
>   extern int foo;
> 
>    ...
> }
> 
> If I try to run this, I get various bus errors, etc. (it links OK). 

It has nothing to do with your declarations of foo. If this didn't
work, do you think that anything in 4.2 would compile and run?

> The only way to get foo declared as a true external is to include
> an initialization:
> 
> file1.c
> 
> int foo=0;

This is the default.

> My conclusion from all this is that the 4.2 compiler has taken something
> which was fairly easy and intuitive to use and made it so complicated
> that it is no longer useful.

It works the same as it has always worked. You are confused. Stop blaming
the compiler when your code doesn't work.

	Wayne

boyce@daemen.UUCP (Doug Boyce) (07/31/85)

> This is where the problem is occurring:
> 
> file1.c
> 
> int foo;
> 
> file2.c
> 
> int uses_foo()
> 
> {
>   extern int foo;
> 
>    ...
> }
> 
> According to the K&R standard, any references to foo within uses_foo()
> should refer to the external variable in file1.c, and the linker
> should take care of making the connection (unless, of course, there
> is a static within file2.c's file scope which masks the external
> definition). 
> 
> Not according to the 4.2 compiler, however. If I try to run this, 
> I get various bus errors, etc. (it links OK). 
> 

I tried that and didn't come across any problems.


Script started on Tue Jul 30 18:26:02 1985
# cat a.c
int foo;


main()
{
        usefoo();
        foo = 3;
        printf("main: foo: %d\n",foo);
}
# cat b.c
usefoo()
{
        extern  int foo;
        printf("usefoo: foo: %d\n",foo);
}
# cc a.c b.c
a.c:
b.c:
# a.out
usefoo: foo: 0
main: foo: 3
# 
script done on Tue Jul 30 18:26:49 1985