[comp.lang.c] extern char *foo vs. extern char foo[]

unhd (Roger Gonzalez ) (05/30/90)

A little history: I know C inside and out, and don't need any "I think"s.
I have been aware of this "feature" for quite a while, and know how to
program around it.  I am posting this to find out *why* things are done this
way, since it seems to violate what K&R say. 

According to K&R, there should be no difference between the two extern
defs in my subject header.  In fact, the second form (char foo[]) should
get translated into the first.  Unfortunately, it doesn't.  Here's a
more detailed example of what I mean:

FILE #1:

     char hello[100];
     main()
     {
         strcpy(hello, "Hello, world.\n");
         printf("In main(), hello is at %06X\n", hello);
         foo();
     }

FILE #2:

     extern char *hello;
     foo()
     {
         printf("In foo(), hello is at %06X\n", hello);
     }

Now, theoretically, since C is not strongly typed, the two values printed
should be the same.  They aren't.  Try it if you don't believe me.  If
you change file #2 so that the extern declaration is

     extern char hello[];

it works.  It also will work properly if in file #1, hello is declared
to be a pointer to char, and is malloc-ated to 100 bytes.  Well, I poked
around a little further; I stopped the output at the assembler stage, and
looked at the .s files for file #2 (using the first method of correction,
namely declaring extern char hello[]), did a diff, and guess what I found?

>
mov hello, (sp%)              I know, I know. This isn't *quite* what I
<                             found, but I'm typing it from memory, and it's
mov &hello, (sp%)             close enough for government work.

Aha!  Zees eez very wrong!

This is the same output I got on 3 different machines, running Unisoft
and Green Hills compilers.  So.  Why are my compilers so stupid?  Do others
behave the same way, or am I just unfortunate enough to have lousy software?
If this is more widespread, WHY?  Is this a manifestation of some other
C rule that takes precedence over the "proper" behavior?  Please clue
me in.  Please.


-Roger
-- 
UUCP:   ..!uunet!unhd!rg      | USPS: Marine Systems Engineering Laboratory
BITNET: r_gonzalez at unhh    |       University of New Hampshire
PHONE:  (603) 862-4600        |       Marine Programs Building
FAX:    (603) 862-4399        |       Durham, NH  03824-3525

dkeisen@Gang-of-Four.Stanford.EDU (Dave Eisen) (05/30/90)

In article <1990May30.001219.23564@uunet!unhd> rg@unhd.unh.edu.UUCP (Roger Gonzalez ) writes:
>
>According to K&R, there should be no difference between the two extern
>defs in my subject header.  In fact, the second form (char foo[]) should
>get translated into the first.  Unfortunately, it doesn't.  Here's a
>more detailed example of what I mean:
>
>FILE #1:
>
>     char hello[100];
>FILE #2:
>
>     extern char *hello;


No, these are not the same thing. char [] and char * are two different
types and funny things happen when you declare variables incorrectly.

An array is treated as a pointer to its first element when used in an
expression or when passed as a function argument, but this does not
mean that an array and a pointer are the same thing. And (as you pointed
out) you cannot declare a variable as one and hope to get the other.


--
Dave Eisen                      	    Home: (415) 323-9757
dkeisen@Gang-of-Four.Stanford.EDU           Office: (415) 967-5644
1447 N. Shoreline Blvd.
Mountain View, CA 94043

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (05/30/90)

In article <1990May30.001219.23564@uunet!unhd>, rg@uunet!unhd (Roger Gonzalez ) writes:
> According to K&R, there should be no difference between
  [	extern char *foo;
    and	extern char foo[];
  ]
> In fact, the second form (char foo[]) should get translated into the first.

No, it shouldn't, and it hasn't done since UNIX V6 at least.

Given	extern char *foo;
foo is a variable of type pointer-to-char with external linkage.
Foo's type is complete, (sizeof foo) is allowed.  You can assign to foo:
	foo = NULL;
and it is possible that
	(foo == NULL)
might be true.

Given	extern char baz[];
baz is a variable of type array-UNKNOWN-of-char with external linkage.
This is an "incomplete" type, so you can't take (sizeof baz).
You cannot assign to baz, and although "baz" will decay into a pointer
in most contexts, it is not possible for
	(baz == NULL)
ever to be true.

Now, an 'extern' declaration is supposed to match up with one definition.
	extern char *foo;	matches		char *foo;
				DOES NOT MATCH	char foo[10];

	extern char baz[];	matches		char baz[10];
				DOES NOT MATCH	char *baz;

> This is the same output I got on 3 different machines, running Unisoft
> and Green Hills compilers.  So.  Why are my compilers so stupid?

Because they got it right.  The two forms of extern declaration are not
and were not equivalent.
-- 
"A 7th class of programs, correct in every way, is believed to exist by a
few computer scientists.  However, no example could be found to include here."

diamond@tkou02.enet.dec.com (diamond@tkovoa) (05/30/90)

In article <1990May30.001219.23564@uunet!unhd> rg@unhd.unh.edu.UUCP (Roger Gonzalez ) writes:
>A little history: I know C inside and out, and don't need any "I think"s.
This is a lie.
flame = m * pow(c, 2.0);

>According to K&R, there should be no difference between the two extern
>defs in my subject header.  In fact, the second form (char foo[]) should
>get translated into the first.
You mean declarations, not defs.  Let's assume the correction.  Now:
Where does K&R say that a declaration of anything other than a parameter
gets translated in this manner?

>Unfortunately, it doesn't.
Bingo!  This time, you don't need an "I think".  Indeed, for starters,
an array of 100 characters has size 100, while a pointer to a character
has the size that a pointer-to-char has.  Therefore the two external
objects cannot even begin to be the same.

>I stopped the output at the assembler stage, and
>mov hello, (sp%)
>mov &hello, (sp%)
>Aha!  Zees eez very wrong!
No.  It is very right.

>This is the same output I got on 3 different machines, running Unisoft
>and Green Hills compilers.  So.  Why are my compilers so stupid?
Usually it's because they do what you tell them.

>Do others behave the same way,
Mostly.  Many compilers have bugs, but most of them will get this right.

>or am I just unfortunate enough to have lousy software?
I guess so, if your brain is soft.

>If this is more widespread, WHY?  Is this a manifestation of some other
>C rule that takes precedence over the "proper" behavior?
Well, there is a C rule that says that a pointer is a pointer.  I don't
think this is ever overridden.
Another C rule says that an array is an array.  This is overridden for
declarations of formal parameters, and for rvalues in executable
expressions, neither of which is the case in your example.
Yeah, I guess C rules take precedence over your imagination.
Common sense also takes precedence over your imagination.
"Proper" behavior is to ignore your coding, ignore the language you're
programming in, and just guess what results you wanted, eh?

m = 0; exit(UINT_MAX);
-- 
Norman Diamond, Nihon DEC     diamond@tkou02.enet.dec.com
Proposed group comp.networks.load-reduction:  send your "yes" vote to /dev/null.

steve@taumet.COM (Stephen Clamage) (05/30/90)

In article <1990May30.001219.23564@uunet!unhd> rg@unhd.unh.edu.UUCP (Roger Gonzalez ) writes:
>
>A little history: I know C inside and out, and don't need any "I think"s.

Evidently not.

>According to K&R, there should be no difference between the two extern
>defs in my subject header.  In fact, the second form (char foo[]) should
>get translated into the first.

Not true.

>     char hello[100];
>     extern char hello[];
In both declarations, hello means the address of an array.
hello is a (link-time) constant and may not be assigned to.

>     extern char *hello;
hello means the address of a variable which *contains* the address of an
array (or of a variable, which would be an array of size 1).
hello may be assigned to, so as to point to another array.

>So.  Why are my compilers so stupid?  Do others
>behave the same way, or am I just unfortunate enough to have lousy software?
Other compilers better behave the same way, since the language definition
requires it.

It is a common misunderstanding that arrays and pointers are in some sense
"the same thing".  This is one case where they are not the same thing.  The
difference is discussed in Chapter 5 of old original K&R, and this difference
has not changed.  It is still a difference in ANSI C.
-- 

Steve Clamage, TauMetric Corp, steve@taumet.com

pejn@wolfen.cc.uow.oz (Paul Nulsen) (05/31/90)

In article <1990May30.001219.23564@uunet!unhd> rg@unhd.unh.edu.UUCP (Roger Gonzalez ) writes:
>According to K&R, there should be no difference between the two extern
>defs in my subject header. ... Here's a
>more detailed example of what I mean:
>
>FILE #1:
>
>     char hello[100];
>     main()
>     {
>         strcpy(hello, "Hello, world.\n");
>         printf("In main(), hello is at %06X\n", hello);
>         foo();
>     }
>
>FILE #2:
>
>     extern char *hello;
>     foo()
>     {
>         printf("In foo(), hello is at %06X\n", hello);
>     }
>
>... I stopped the output at the assembler stage, and
>looked at the .s files for file #2 (using the first method of correction,
>namely declaring extern char hello[]), did a diff, and guess what I found?
>
>>
>mov hello, (sp%)              I know, I know. This isn't *quite* what I
><                             found, but I'm typing it from memory, and it's
>mov &hello, (sp%)             close enough for government work.
>
>Aha!  Zees eez very wrong!

The declaration:

	char *hello;

tells the compiler to reserve storage for one pointer to char. It also
announces that that (global) storage space will be referred to by the name
hello. The declaration:

	extern char *hello;

announces that this has been done somewhere else in your program.

The statement:

	char hello[100];

tells the compiler to reserve storage for 100 chars and that this storage is
to be referred to by the name hello. If you inspect the assembler output you
will see that this is indeed how it has interpreted your program. The two
are not equivalent.

The confusion over the difference between char * and char [] is very common.
I believe that it is due to the way that C handles the passing of array
arguments to functions. As a rule C passes arguments by value, but it passes
an array argument by reference. This means that passing a char [] to a
function results in the creation of a char * on the stack. As a result,
the programmer is permitted to refer to the passed array as either char * or
char [], and the two types of reference are treated in much the same way
within the function.

Another anomaly that we get from this is illustrated by the fragment:

char *pointer_array[] = {"One", "Two", "Three"};
char two_d_array[3][10];

main ()
{
	int i,j;
	for (i = 0; i < 3; i++) {
		strcpy (two_d_array[i], pointer_array[i]);
	}

/* Now (with i,j in bounds) pointer_array[i][j] == two_d_array[i][j] */
/* but these two references are quite different and generate different */
/* code */

	...
}

cliffhanger@cup.portal.com (Cliff C Heyer) (05/31/90)

I've noticed this too!! But instead of
investigating, I changed my declarations
as you did to make it work.

As usual with these things, I'm sure there
is a simple explanation....
Cliff

mark@isi.UUCP (Mark Bailey) (05/31/90)

In article <1990May30.001219.23564@uunet!unhd>, rg@uunet!unhd (Roger Gonzalez ) writes:
> 
> According to K&R, there should be no difference between the two extern
> defs in my subject header.  In fact, the second form (char foo[]) should
> get translated into the first.  Unfortunately, it doesn't.
> FILE #1:
> 
>      char hello[100];
> 
> FILE #2:
> 
>      extern char *hello;
> 


In one case, you are allocating an array of 100 characters.  The name of the
array refers to the base address of the array:  (K&R 1, p. 24.)

     When the name of an array is used as an argument, the value passed
     to the function is actually the location or address of the
     beginning of the array.

In the other case, you are passing the value of an uninitialized character
pointer.  The result of looking at this is machine/compiler dependent.

It is the declarations:
     char     foo[];  and
     char     *foo;  

which are identical.  (K&R 1, p. 95)

     As formal parameters in a function definition,
               char    s[];
                                    and
               char   *s;
     are exactly equivalent;  which one should be written is determined
     largely by how expressions will be written in the function.

The reference manual appears to allow such a declaration which is not
used as a formal parameter (K&R 1, p. 194).

Mark Bailey                                 (I didn't really say this.)
via:  ...!uunet!pyrdc!isi!mark              ------Have a  8-|  day!!!!!

rjw@atti07.ATT.COM (Ralph J. Winslow x7774) (06/01/90)

In article <1755@tkou02.enet.dec.com> diamond@tkou02.enet.dec.com (diamond@tkovoa) writes:
>In article <1990May30.001219.23564@uunet!unhd> rg@unhd.unh.edu.UUCP (Roger Gonzalez ) writes:
>>A little history: I know C inside and out, and don't need any "I think"s.
>This is a lie.
>flame = m * pow(c, 2.0);
>-- 
>Norman Diamond, Nihon DEC     diamond@tkou02.enet.dec.com
>Proposed group comp.networks.load-reduction: send your "yes" vote to /dev/null.

Norman, you weren't very kind to this poor soul, but I guess when you lead
off a question usually expected in the first 2 days of an introductory C
course with "I know C inside out" you really can't expect kid glove treatment.
This note is really just to express my appreciation for your .sig. Loverly!

----------
			Ralph Winslow
			ulysses!attibr!atti07!rjw

karl@haddock.ima.isc.com (Karl Heuer) (06/05/90)

In article <6263@wolfen.cc.uow.oz> pejn@wolfen.UUCP (Paul Nulsen) writes:
>The confusion over the difference between char * and char [] is very common.
>I believe that it is due to the way that C handles the passing of array
>arguments to functions. As a rule C passes arguments by value, but it passes
>an array argument by reference.

Actually, function arguments in C are *always* passed by value, but sometimes
the value in question is a pointer, which can give the same result as a call
by reference.  Thus `void f(int *pi); ... f(&i);' can be considered either a
pointer-to-int passed by value, or an int passed by reference.

In the case of an array, strictly speaking, a true call by reference in this
sense would be `void f(int (*pa)[N]); ... f(&a);', where the user explicitly
passes the address of the array (and the caller expects it).  The more common
usage `void f(int *pi); ... f(a);' is not an array passed by reference.  Also,
this effect is not in any way an exception to the usual rules.  In *any*
rvalue context, an array-valued expression `a' decays into the pointer-valued
expression `&a[0]'.

The wart in the language is not that `arrays are passed by reference and
scalars are passed by value', but rather that, in the one special case of
declaring a formal parameter to a function, the language permits the user to
write a pointer declaration using brackets instead of an asterisk.  This
often misleads people into believing that they've declared an array, when in
fact they've declared a pointer.  This is probably the cause of the original
confusion in this thread.

Karl W. Z. Heuer (karl@ima.ima.isc.com or harvard!ima!karl), The Walking Lint

pejn@wolfen.cc.uow.oz (Paul Nulsen) (06/06/90)

In article <16788@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:
>
>Actually, function arguments in C are *always* passed by value ...
>
>In the case of an array, strictly speaking, a true call by reference in this
>sense would be `void f(int (*pa)[N]); ... f(&a);', where the user explicitly
>passes the address of the array (and the caller expects it)...

My argument is not that there is internal inconsistency in the treatment of
arrays by C. The inconsistency is between the treatment of arrays and other
types. If the syntax of C was entirely consistent an `a' in a
function argument would push the entire array onto the stack. For example,
	int a[10];
	printf ("%d\n", (int) sizeof(a));
produces 10 * sizeof(int). For any other type (including struct's) the thing
pushed onto the stack is the value of the entire entity, using a space equal
to sizeof (entity). This is not meant to be a criticism of C, just an
observation about the source of confusion.

You have produced yet another illustration of the confusion caused by the
handling of arrays. C compilers will ignore the & in f(&a). This is because,
within the scope of the array definition, there is no data location where
the address of a is kept. If the compiler was pedantic (and did not concede
some inconsistency) it should reject this as a syntax error.

steve@taumet.com (Stephen Clamage) (06/06/90)

In article <6273@wolfen.cc.uow.oz> pejn@wolfen.cc.uow.edu.au (Paul Nulsen) writes:
>My argument is not that there is internal inconsistency in the treatment of
>arrays by C.

Ah, but there is an internal inconsistency.  The problem is that in C,
arrays are not first-class data types.  As you note, there are
contexts in which a mention of an array is converted to a pointer to
its first element, and contexts in which this does not occur.  By me,
that's inconsistent, and is also the source of all the confusion.
-- 

Steve Clamage, TauMetric Corp, steve@taumet.com

karl@haddock.ima.isc.com (Karl Heuer) (06/06/90)

In article <6273@wolfen.cc.uow.oz> pejn@wolfen.cc.uow.edu.au (Paul Nulsen) writes:
>If the syntax of C was entirely consistent an [array] `a' in a function
>argument would push the entire array onto the stack.

Yes.  I recently discussed this proposed extension here.

>You have produced yet another illustration of the confusion caused by the
>handling of arrays.  C compilers will ignore the & in f(&a).

This is true only in Classic C, and only for certain compilers.  (Technically
it was illegal to apply `&' to an array, but these compilers would treat it as
a warning only.)

>This is because, within the scope of the array definition, there is no data
>location where the address of a is kept.

No, it's because there are a couple of lines in the compiler source that go to
extra trouble to forbid the construct.  There's no logical reason not to allow
it; ANSI compilers *must* allow it; and it can be fixed in pcc by simply
deleting those two lines in the compiler source.

Note that if `a' has type `int [3]', then `&a' has type `int (*)[3]' (pointer
to array), whereas `&a[0]' (the value to which the expression `a' decays when
used in an rvalue context) has type `int *' (pointer to int).  Part of the
confusion here is that it's very common for someone to say `pointer to array'
when they really mean `pointer to element of array'.  (Chris Torek uses
`pointer *at* array' for this latter construct; I prefer `pointer *into*
array').

The only sense in which `a' and `&a' are the same is that they will compare
equal if they are converted to a common type.  (This is also true of `&s' and
`&s.firstmember' for a struct `s'.)  To demonstrate that they are not
equivalent, try the enclosed program.

Karl W. Z. Heuer (karl@ima.ima.isc.com or harvard!ima!karl), The Walking Lint
--------cut here--------
#include <stdio.h>
#define p(x) printf("%lu %lu\n", (unsigned long)x, (unsigned long)(x+1))
int main() {
    int a[3][5];
    p(a); p(a[0]); p(&a[0]); p(&a[0][0]);
    return 0;
}

karl@haddock.ima.isc.com (Karl Heuer) (06/08/90)

In article <586@isi.UUCP> mark@isi.UUCP (Mark Bailey) writes:
>It is the declarations:
>     char     foo[];  and
>     char     *foo;
>which are identical.  (K&R 1, p. 95)

This is true *only* for the declaration of a pointer as a formal parameter,
as the K&R quote says.

>The reference manual appears to allow such a declaration which is not
>used as a formal parameter (K&R 1, p. 194).

Yes, but then they're no longer identical.  In this context `char foo[];'
declares an array (and it had better be an extern if you're leaving out the
size), while `char *foo' declares a pointer.  See the FAQ list.

Karl W. Z. Heuer (karl@ima.ima.isc.com or harvard!ima!karl), The Walking Lint

reso%sevihc@sharkey.cc.umich.edu (Dennis Reso) (06/13/90)

I found it much easier to deal with pointers and "arrays" when I
stopped looking at "[]" as meaning "array", and treated it like
the postfix "offset from pointer" operator. Thus the notation
"int a[5][10]" rather than "int a[5,10]", where "[10]" operates
on the expression "a[5]" which yields a pointer. Think of it as
just another shorthand notation:

                "a[5][10]"      == "*(*(a+5) + 10)"
            as  "record->field" == "(*record).field"

The confusion is not from mishandling of "arrays" by not passing
their entirety in pass-by-value situations:

            int a[10];
            foo(a);

but in the use of "[10]" to inform the compiler to set aside
storage in the first place.  This is yet another shorthand
notation that perpetuates the concept of "arrays" as they
exist in other languages.  Don't get me wrong...I appreciate
the existence of this shorthand. I would hate to declare
pointers and malloc() everything.


The reason you can't take the address of a pointer as in
"foo(&a)"  (or say "a++") is that, because of its declaration,
you did not request a storage location to hold the address as
in "int *a;".  The compiler is then free to replace all
occurrences of "a" with *expressions* relative to the stack
pointer for automatic variables, or the beginning of global 
storage for globals.  (I wouldn't say "&(3+5)", either - do
some compilers do something useful with this?)

Separate the concepts of data storage and the storage of a
pointer to data storage and the confusion disappears.

________________________________________________________________________
Dennis Reso
At home in Ypsilanti, MI USA       {sharkey|itivax}!sevihc!reso
Ford Motor Company, Dearborn       reso@pms415.ford.com  [128.5.220.115]

r91400@memqa.uucp (06/13/90)

In article <1990Jun13.030910.11593@sharkey.cc.umich.edu%sevihc>, reso%sevihc@sharkey.cc.umich.edu (Dennis Reso) writes:
> I found it much easier to deal with pointers and "arrays" when I
> stopped looking at "[]" as meaning "array", and treated it like
> the postfix "offset from pointer" operator. Thus the notation
> "int a[5][10]" rather than "int a[5,10]", where "[10]" operates
> on the expression "a[5]" which yields a pointer. Think of it as
> just another shorthand notation:
> 
>                 "a[5][10]"      == "*(*(a+5) + 10)"
>             as  "record->field" == "(*record).field"

I thought that, in this case, a[5] does not yield a pointer, but rather
the fifth element in the array.  Is that not why people prefer sometimes
to create their arrays dynamically, so that a[5] WOULD return a pointer?

...something like
	int *a[6],b[120];
	a[0]=&b[0]; a[1]=&b[20]; a[2]=&b[40]; 
	a[3]=&b[60]; a[4]=&b[80]; a[5]=&b[100];
Indeed, I wouldn't do it quite this way (it may not even be syntactically
correct), but it is an illustration of what I mean.  This construction
does support the syntax a[5][10], and a[5] does return a pointer, and
it saves on a multiplication.  Simply declaring int a[6][20] would not
do this.

So, while the illustration he gave is insightful, C doesn't do things
that way unless you tell it to.

Michael C. Grant
Motorola Inc., Austin, Texas
(Motorola does not care what I think.)

karl@haddock.ima.isc.com (Karl Heuer) (06/14/90)

In article <1990Jun13.030910.11593@sharkey.cc.umich.edu%sevihc> reso%sevihc@sharkey.cc.umich.edu (Dennis Reso) writes:
>The reason you can't take the address of [an array] as in "foo(&a)" [is that]
>you did not request a storage location to hold the address...

Wrong.  I just finished debunking this myth; see <16804@haddock.ima.isc.com>
earlier in this same thread.

Karl W. Z. Heuer (karl@ima.ima.isc.com or harvard!ima!karl), The Walking Lint