[net.lang.c] extern declaration inconsistency

gary@mit-eddie.UUCP (Gary Samad) (04/04/84)

I spent hours debugging this--anyone know why it is a problem?
Is it a 'feature' or is it a 'bug'?

in file foo.c:
	char ch[32];
	foo()
	{
	    strcpy(ch,"string");
	    printf("&ch=%x\n",ch);
	}

in file ref.c:
	extern char *ch;
	ref()
	{
	    printf("&ch=%x\n",ch);
	    printf(ch);
	}

in file main.c:
	main()
	{
	    foo();
	    ref();
	}

The program prints:
	&ch=2eb4
	&ch=64abc
	Segmentation fault (core dumped)

(The addresses are approximately correct.)
The compiler didn't resolve the extern char correctly!
Replacing the 'extern char *ch' with 'extern char ch[]' fixes it!

Does anyone know why?
By the way, this is with the 4.1BSD compiler running under Eunice.

		Gary Samad
		decvax!genrad!mit-eddie!gary

gwyn@brl-vgr.ARPA (Doug Gwyn ) (04/04/84)

The compiler is just fine.  You are confusing a char[] with a char *.
They are NOT the same thing (try printing their sizeofs, for example).
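Gwyn's sizeof test can be sketched in a single file. The names ch_array and ch_pointer are hypothetical stand-ins, since one file cannot define both as ch. The array's size is fixed at 32, while the pointer's size depends on the machine (typically 4 or 8).

```c
#include <stddef.h>
#include <stdio.h>

/* An array: 32 chars of storage belong to this name. */
char ch_array[32];

/* A pointer: storage for one address only. */
char *ch_pointer;

size_t array_size(void)   { return sizeof ch_array; }   /* always 32 */
size_t pointer_size(void) { return sizeof ch_pointer; } /* machine-dependent */

void print_sizes(void)
{
    /* %zu is the C99 conversion for size_t */
    printf("sizeof ch_array   = %zu\n", array_size());
    printf("sizeof ch_pointer = %zu\n", pointer_size());
}
```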

P.S.  "lint" would have caught this.
P.P.S.  You shouldn't print a (char *) with an int format specifier
unless you cast the (char *) to an int.  "lint" will NOT catch this,
although "Safe C" is supposed to.
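The cast-to-int advice dates from before ANSI C. Since C89, printf has a %p conversion for pointers, which expects a void *. A minimal sketch, assuming a hypothetical helper format_address that writes into a caller-supplied buffer:

```c
#include <stdio.h>

char ch[32];

/* Formats the address that ch converts to in an expression.
   %p takes a void *, hence the explicit cast. */
int format_address(char *buf, size_t n)
{
    return snprintf(buf, n, "&ch=%p", (void *)ch);
}
```

The exact text produced by %p is implementation-defined, but unlike %x it does not assume that a pointer fits in an int.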

ark@rabbit.UUCP (Andrew Koenig) (04/04/84)

If, in one file, you say

	char ch[32];

and in another file, you say

	char *ch;

then your program won't work.  Reason:  in the first file you have
asked for memory to be associated with the external name "ch" that
should contain a 32-character array, and in the second you have asked
for the same memory to be associated with a character pointer.  In
those implementations with which I am familiar, the first four (or two)
characters in the array will be interpreted as the address pointed to
by the pointer.

Arrays and pointers are simply different, though they can be used
interchangeably in a few contexts.
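A few of those shared contexts, sketched in one file with a hypothetical helper same_through_both: a pointer initialized from the array name, subscripting, and pointer arithmetic all behave alike.

```c
#include <stddef.h>

char ch[32] = "string";

/* Returns 1 if the array name and a pointer made from it reach the
   same character at index i (i must be less than 32). */
int same_through_both(size_t i)
{
    char *p = ch;                       /* ch in an expression becomes &ch[0] */
    return p == &ch[0]                  /* the conversion yields the first element's address */
        && p[i] == ch[i]                /* subscripting works on both */
        && *(ch + i) == *(p + i);       /* so does pointer arithmetic */
}
```

What does not carry over is the storage: ch owns 32 bytes, while p owns only room for one address, which is exactly what the mismatched extern declaration confuses.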

jas@drutx.UUCP (04/04/84)

To paraphrase the question:

     Defining a global array "char ch[ 32 ];" in one file and
     declaring it externally as "extern char *ch" in another
     file causes bad craziness ("core dump").  Is this a bug
     or a feature?

It is most emphatically a feature.  An array of characters is not
the same thing as a character pointer.  Lying to the compiler about
the type of an external variable will result in severe retribution.

To wit:  a reference to ch in the file in which it was defined as
an array of 32 chars will be automatically converted by the compiler
to the address of the first element of the array, because arrays are
converted this way whenever they appear in an expression.
A reference to ch in a file in which it was declared as "extern char *"
will cause the compiler to issue code retrieving the POINTER VALUE STORED
AT THAT LOCATION, i.e., to take the first several chars in the array
("several" usually = 2 or 4, depending on the machine), and interpret
them as a pointer to a character somewhere.  Interpreting the CONTENTS
of "Hi, Mom!" as a character pointer will usually make you point at 
something you later wish you hadn't pointed at.
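The load described above can be simulated without actually following the bogus pointer. This sketch, with a hypothetical helper bits_read_as_pointer, copies the first sizeof(char *) bytes of the array into an integer, which is what code compiled against "extern char *ch" would treat as an address. It assumes uintptr_t is at least as wide as a char *, as on common machines.

```c
#include <stdint.h>
#include <string.h>

char ch[32] = "Hi, Mom!";

/* ASSUMPTION: uintptr_t can hold all the bytes of a char *. */
_Static_assert(sizeof(uintptr_t) >= sizeof(char *), "uintptr_t too narrow");

/* Mimics the mismatched declaration: take the bytes stored at ch and
   treat them as an address. Only the bits are returned; nothing is
   dereferenced. */
uintptr_t bits_read_as_pointer(void)
{
    uintptr_t bits = 0;
    memcpy(&bits, ch, sizeof(char *));
    return bits;
}
```

On a little-endian machine with 8-byte pointers the result is 0x216d6f4d202c6948, the characters of "Hi, Mom!" read as one number. Dereferencing an "address" like that is the kind of access that ends in a segmentation fault.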

Jim Shankland
..!ihnp4!druxy!jas

ks@ecn-ee.UUCP (04/05/84)

ecn-ee!ks    Apr  4 15:58:00 1984

There is an important distinction between the following:

	extern char ch[];
	extern char *ch;

ch[] indicates that you have reserved space elsewhere for some number of
characters and you can use ch as the address of the first reserved space.
*ch indicates that you reserved space for a pointer, which may or may not
hold a valid value pointing to real storage that holds some characters.
In the first case, ch is a "constant"; in the second case, ch is a variable.
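One way to see the "constant" versus "variable" distinction: for an array, the object's address and the address it converts to coincide, so there is nothing extra to load. A pointer is a separate object whose own address differs from the address it holds. The names buf, ptr and the two helpers are hypothetical.

```c
char buf[32];        /* the name denotes the storage itself */
char *ptr = buf;     /* a separate object that holds an address */

/* The array and its first element start at the same place. */
int array_is_its_own_storage(void)
{
    return (void *)&buf == (void *)buf;
}

/* The pointer variable lives somewhere other than the characters
   it points at. */
int pointer_is_separate_object(void)
{
    return (void *)&ptr != (void *)ptr;
}
```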

The confusion persists because when either form of ch is passed to a function
as a parameter, it is passed by value.  (The value of an array is the address
of its first element.)  All function parameters can be modified as if they
were automatic variables, so both forms are equivalent only for function
parameter declarations.
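The parameter case can be sketched with two hypothetical length functions. The one declared with array syntax still receives a plain char *, which is why it can increment its parameter; a genuine array could not be incremented.

```c
#include <stddef.h>

/* Array syntax, but s is really a char * holding a copy of the
   caller's address. */
size_t length_array_style(char s[])
{
    size_t n = 0;
    while (*s++ != '\0')   /* legal only because s is a pointer variable */
        n++;
    return n;
}

/* The same function with pointer syntax; the two are equivalent. */
size_t length_pointer_style(char *s)
{
    size_t n = 0;
    while (*s++ != '\0')
        n++;
    return n;
}
```

Either function accepts an array argument or a pointer argument, since the array is converted to the address of its first element at the call.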

In my opinion, this is a very elegant way of doing things, even if it
is confusing at first.

The REAL problem is that many C loaders do not flag the extern declaration
as an error.  They just load incorrect code.  So, the moral of the story is:

			>>>>	USE LINT    <<<<

The compiler is not meant to check for every little inconsistency.
That is why lint is around.
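A compiler sees one file at a time, so it cannot compare the definition in foo.c with the declaration in ref.c. One common way to get the check from the compiler itself, sketched here under the assumption of a shared header (the name "ch.h" is hypothetical), is to put the single extern declaration in a header included by both the defining file and the using files. The block below shows what the defining file then contains.

```c
#include <string.h>

/* What the hypothetical shared header "ch.h" would contain. */
extern char ch[32];

/* The definition, as in foo.c. Because the declaration above is in the
   same file, the compiler checks the two against each other; had the
   header said `extern char *ch;`, this would be a compile-time type
   conflict (wording varies by compiler) instead of a runtime crash. */
char ch[32];

void fill_ch(void)
{
    strcpy(ch, "string");
}
```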

					Kirk Smith
					Purdue EE

robert@erix.UUCP (Robert Virding) (04/06/84)

This is one instance when a pointer is not the same as an array. When the
compiler sees

extern char *ch;

it assumes that ch is a pointer to a string of char. However when the
compiler sees

extern char ch[];

it assumes that ch IS the actual string, not a pointer. The difference in
the code generated is how it actually references the external variable ch.

I have also come across this feature in writing programs.

			Robert Virding  @ L M Ericsson, Stockholm

colonel@sunybcs.UUCP (George Sicherman) (04/09/84)

It's just what you'd expect.  The extern char *ch expects to find
a character address stored in a global word.  Instead, it finds
an array of bytes.  If you call it extern char ch[] the problem
should go away.

	Col. G. L. Sicherman
...seismo!rochester!rocksvax!sunybcs!colonel

john@edai.UUCP (John Hallam) (04/16/84)

Article 185 refers to the problems that arise when a name is declared

char ch[...];		/* in the first file */

extern char *ch;	/* in another file */

It appears that the name is not correctly linked.

------------------

The answer to the question raised is that it is not a compiler bug, but it
might be called a feature, depending on your definitions of those terms. It is
part of the language definition!

For those who know about l-values and r-values the explanation is this:

In C, array names (and function names) denote an r-value CONSTANT which is
determined at link editing time; most other variable names denote l-values
and implicit contents coercion is done when necessary.  Thus in the above,
the name 'ch' is first defined as an array (I use first in the sense that this
declaration actually allocates space for the array) and denotes the address
of the storage in which the characters will go.  The second declaration informs
the compiler that 'ch' is an l-value, i.e. the name now denotes the address
of storage in which a pointer value can be put.  Thus accessing 'ch' under the
second declaration actually gives the value of the first word of the array!

The problem encountered here stems more from a faulty explanation of identifier
semantics in K&R than from anything else.  In an r-value context (when used in
expressions)
the two declarations give the same TYPE, but the pointer declaration implies a
contents coercion (which in this case fetches the first word of the array)
and the array declaration implies no contents coercion (because the name 
already denotes an r-value). Conversely, you can assign to the pointer declared
name 'ch', because it is an l-value (this is just what l-value means -- can
stand on the left of assignments), but if you try to assign to the array
declared name 'ch' you'll get a message 'Lvalue required' or something like it
from the compiler.
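The assignment difference can be sketched directly. (ANSI C later phrased this as: an array is an lvalue but not a modifiable one, and in most expressions it converts to a pointer.) The names arr, other, ptr and the two helpers are hypothetical.

```c
#include <string.h>

char arr[32] = "first";
char other[32] = "second";
char *ptr = arr;

/* A pointer is a modifiable l-value: it can refer to new storage. */
void repoint(void)
{
    ptr = other;           /* stores a new address in ptr */
}

/* An array name cannot be assigned; its contents must be copied. */
void overwrite_array(void)
{
    /* arr = other; */     /* would not compile: arr is not assignable */
    strcpy(arr, other);    /* copies the characters instead */
}
```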

I hope this makes things a little clearer.

		John Hallam.
			(edai!john).