[net.lang.c] Pointers and Arrays

chris@umcp-cs.UUCP (Chris Torek) (06/29/86)

In article <2201@umcp-cs.UUCP> chris@maryland.UUCP (Chris Torek) writes:
>Perhaps I just have an odd mind, but all this pointer/array stuff
>never really bothered me.

Or perhaps I simply read K&R, chapter 5, Pointers and Arrays.  I
needed to refer to K&R recently (see article <2204@umcp-cs.UUCP>),
and while I was looking at it, I just happened to stumble across
some text in this chapter that seems to me quite clear.  Let me
give some excerpts, with commentary.  (Suggestion: while reading
this, imagine me grinning teasingly at points.  I hope the tone
comes across properly, but I have spent enough time revising this
now---great grief, an hour and a half now!)

  It is also necessary to declare the variables that participate
  in all of this:

	int x, y;
	int *px;

  The declaration of x and y is what we've seen all along.  The
  declaration of the pointer px is new.

	int *px;

  is intended as a mnemonic; it says that the combination *px is
  an int, that is, if px occurs in the context *px, it is equivalent
  to a variable of type int.  In effect, the syntax of the declaration
  for a variable mimics the syntax of expressions in which the
  variable might appear.  This reasoning is useful in all cases
  involving complicated declarations.  For example

	double atof(), *dp;

  says that in an expression atof() and *dp have values of type
  double.

So much for understanding declarations.  K&R said it all, eight
years ago.
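
A minimal sketch of the "declaration mimics use" rule quoted above.
It is not from K&R; the function names half and
declaration_mirrors_use are invented for illustration, and the
definitions use the later ANSI prototype style rather than the style
of this thread.

```c
#include <assert.h>

/* Hypothetical stand-in for atof(): the expression half(v) has type
   double, just as "double atof(), *dp;" says of atof(). */
double half(double v)
{
    return v / 2.0;
}

/* Each declarator reads like an expression that yields the base type. */
double declaration_mirrors_use(void)
{
    int x = 7, y = 0;
    int *px = &x;               /* "*px is an int" */
    double d = 1.5, *dp = &d;   /* "*dp is a double" */

    y = *px + 1;                /* *px sits wherever an int could */
    assert(y == 8);
    return half(3.0) + *dp;     /* double + double */
}
```

Calling declaration_mirrors_use() from a small main() is enough to
try it out.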

  ... Any operation which can be achieved by array subscripting
  can also be done with pointers.  The pointer version will in
  general be faster but, at least to the uninitiated, somewhat
  harder to grasp immediately.

K&R seem to have a gift for understatement.

  The correspondence between indexing and pointer arithmetic is
  evidently very close.  In fact, a reference to an array is
  converted by the compiler to a pointer to the beginning of the
  array.  The effect is that an array name *is* a pointer expression.
  ...

(Note `expression', not `variable'.  The above does not apply to
sizeof.)
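
A small sketch of the correspondence just described, including the
sizeof exception.  The function name subscript_matches_pointer and
the array contents are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if subscripting and pointer arithmetic agree everywhere. */
int subscript_matches_pointer(void)
{
    int a[5] = { 10, 20, 30, 40, 50 };
    int *pa = a;        /* a is converted to a pointer to a[0] */
    int i;

    for (i = 0; i < 5; i++)
        if (a[i] != *(a + i) || a[i] != *(pa + i) || a[i] != pa[i])
            return 0;

    /* The exception: as an operand of sizeof, a is not converted. */
    assert(sizeof a == 5 * sizeof(int));
    assert(sizeof pa == sizeof(int *));
    return 1;
}
```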

  There is one difference between an array name and a pointer that
  must be kept in mind.  A pointer is a variable, so pa=a and pa++
  are sensible operations.  But an array name is a *constant*, not
  a variable: constructions like a=pa or a++ or p=&a are illegal.

`p = &a' is much like `p = &3': illegal by fiat, not because it
cannot be done.  If it were legal, `&a' would have type `pointer to
<type of a>' (compare with `a', which has type `pointer to <type of
a[0]>').

  When an array name is passed to a function, what is passed is the
  location of the beginning of the array.  Within the called function,
  this argument is a variable, just like any other variable, and so
  an array name argument is truly a pointer, that is, a variable
  containing an address.  ...

  As formal parameters in a function definition,

	char s[];

  and

	char *s;

  are exactly equivalent; ...

This is all in the context of singly-dimensioned arrays, but with
the proper mindset applies to multi-dimensional arrays without
trouble.  (With the wrong mindset it leads to much confusion.)
K&R will have more to say about this later.

Note that this is where sizeof starts acting odd:  A compiler
treats the following as equivalent:

	   array		  pointer
	   -----		  -------
	f(arr)			f(ap)
	int arr[];		int *ap;
	{			{
		...			...

	f(a2)			f(a2p)
	int a2[][5];		int (*a2p)[5];
	{			{
		...			...

The second equivalent pointer version is neither `int **a2p' nor
`int *a2p'; nor for that matter is it `int *a2p[5]'.  This is
consistent, if (painfully apparently, given recent net.lang.c
articles) confusing.
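
The second equivalence can be checked with a sketch like the one
below, written with ANSI-style prototypes (an assumption; the thread
itself uses the older definition style).  The names sum_array_style,
sum_pointer_style, and same_type_check are invented.  Both functions
can be stored in one function pointer only because their parameter
types are the same type.

```c
#include <assert.h>

/* Array-style parameter: adjusted by the compiler to int (*)[5]. */
int sum_array_style(int a2[][5], int rows)
{
    int r, c, total = 0;

    for (r = 0; r < rows; r++)
        for (c = 0; c < 5; c++)
            total += a2[r][c];
    return total;
}

/* Pointer-style parameter: the same type, written out explicitly. */
int sum_pointer_style(int (*a2p)[5], int rows)
{
    int r, c, total = 0;

    for (r = 0; r < rows; r++)
        for (c = 0; c < 5; c++)
            total += a2p[r][c];
    return total;
}

/* Returns 1 if both functions fit the same pointer type and agree. */
int same_type_check(void)
{
    int grid[2][5] = { { 1, 2, 3, 4, 5 }, { 6, 7, 8, 9, 10 } };
    int (*fn)(int (*)[5], int);

    fn = sum_array_style;
    if (fn(grid, 2) != 55)
        return 0;
    fn = sum_pointer_style;
    return fn(grid, 2) == 55;
}
```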

  5.7  Multi-Dimensional Arrays

  C provides for rectangular multi-dimensional arrays, though in
  practice they tend to be much less used than arrays of pointers. ...

  ... In C, by definition a two-dimensional array is really a one-
  dimensional array, each of whose elements is an array.  Hence
  subscripts are written as

	day_tab[i][j]

  rather than

	day_tab[i, j]

  as in most languages. ...

What they do *not* mention is that day_tab[i,j] is a valid expression,
and tends to surprise people.  Lint does not, unfortunately, warn
about these.
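
A sketch of why day_tab[i,j] is valid but surprising: the comma
operator discards i, so the expression names row j.  The table holds
the usual month lengths; the function name comma_subscript_is_row is
invented.  Many current compilers do warn about the unused left
operand.

```c
#include <assert.h>

int day_tab[2][13] = {
    { 0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 },
    { 0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }
};

/* Returns 1 if day_tab[i, j] is the same row as day_tab[j].
   Both i and j must be valid row numbers (0 or 1). */
int comma_subscript_is_row(int i, int j)
{
    /* "i, j" evaluates i, throws it away, and yields j.  The result
       is a whole row, which converts to a pointer to int. */
    int *row = day_tab[i, j];

    return row == day_tab[j];
}
```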

  If a two-dimensional array is to be passed to a function, the
  argument declaration in the function *must* include the column
  dimension; the row dimension is irrelevant, since what is passed
  is, as before, a pointer.

What did I tell you?

Note that this *is* consistent.  One cannot pass an array as an
argument to a function.  Pointers, however, are fine, *including
pointers to arrays*.  Given a two or more dimensional array,
the array `constant' is converted to a pointer to an array of
one fewer dimensions.  This is now a *pointer*, and remains a
pointer until dereferenced.  For example, in

	int day_tab[2][13] = { ... };

the following are type-correct calls:

	f2d(p) int (*p)[13]; { ...  }

	f1d(p) int *p; { ...  }

	proc()
	{
					/* argument types: */
		f2d(day_tab);		/* pointer to array 13 of int */
		f2d(&day_tab[0]);	/* pointer to array 13 of int */

		f1d(day_tab[0]);	/* pointer to int */
		f1d(&day_tab[0][0]);	/* pointer to int */
	}

Calling f2d(&day_tab[0][0]) passes the right *value* but the wrong
*type*.  That it happens to work is not an excuse to do it.  If C 
were different, it would be different, but it is not, so it is not.
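
The same four calls, as a sketch with ANSI prototypes so that the
compiler checks the argument types.  This assumes the later ANSI
rules, under which &day_tab[0] is accepted; the extra leap parameter
and the return values are additions made so the example can be run.

```c
#include <assert.h>

int day_tab[2][13] = {
    { 0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 },
    { 0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }
};

/* Takes a pointer to a 13-int row; p[leap] selects the row. */
int f2d(int (*p)[13], int leap)
{
    int m, days = 0;

    for (m = 1; m <= 12; m++)
        days += p[leap][m];
    return days;
}

/* Takes a pointer to int: one row, already chosen by the caller. */
int f1d(int *p)
{
    int m, days = 0;

    for (m = 1; m <= 12; m++)
        days += p[m];
    return days;
}

/* Returns 1 if all four type-correct calls give the expected totals. */
int proc(void)
{
    int ok = 1;

    ok &= f2d(day_tab, 0) == 365;       /* pointer to array 13 of int */
    ok &= f2d(&day_tab[0], 1) == 366;   /* pointer to array 13 of int */
    ok &= f1d(day_tab[1]) == 366;       /* pointer to int */
    ok &= f1d(&day_tab[0][0]) == 365;   /* pointer to int */
    /* f2d(&day_tab[0][0], 0) would draw a diagnostic here: the value
       is right, but int * is not int (*)[13]. */
    return ok;
}
```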

To return to K&R:

  5.10 Pointers vs. Multi-dimensional [sic] Arrays

(So they are not consistent with capitalisation in section names.)

  Newcomers to C are sometimes confused about the difference between
  a two-dimensional array and an array of pointers, ...

Ah, a gift indeed.

  Given the declarations

	int a[10][10];
	int *b[10];

  the usage of a and b may be similar, in that a[5][5] and b[5][5]
  are both legal references to a single int.  But a is a true array:
  all 100 storage cells have been allocated, and the conventional
  rectangular subscript calculation is done to find any given
  element.  For b, however, the declaration only allocates 10
  pointers; each must be set to point to an array of integers.
  Assuming that each does point to a ten-element array, then there
  will be 100 storage cells set aside, plus the ten cells for the
  pointers.  Thus the array of pointers uses slightly more space,
  and may require an explicit initialization step.  But it has two
  advantages:  accessing an element is done by indirection through
  a pointer rather than by a multiplication and addition, and the
  rows of the array may be of different lengths.  That is, each
  element of b need not point to a ten-element vector; some may
  point to two elements, some to twenty, and some to none at all.
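
A sketch of the contrast drawn in that passage.  The arrays short_row
and long_row and the function ragged_demo are invented backing
storage and scaffolding, not anything from K&R.

```c
#include <assert.h>
#include <stddef.h>

int a[10][10];      /* 100 ints in one block */
int *b[10];         /* 10 pointers; their targets are set up separately */

/* Hypothetical rows of different lengths for b to point at. */
int short_row[2] = { 1, 2 };
int long_row[20];

int ragged_demo(void)
{
    int i;

    for (i = 0; i < 10; i++)
        b[i] = 0;           /* "some to none at all" */
    b[0] = short_row;       /* two elements */
    b[5] = long_row;        /* twenty elements */

    a[5][5] = 42;           /* location found by 5*10 + 5 */
    b[5][5] = 42;           /* location found by fetching b[5] first */

    assert(sizeof a == 100 * sizeof(int));
    assert(sizeof b == 10 * sizeof(int *));
    return a[5][5] + b[5][5] + b[0][1];
}
```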

Now for some even more horrid examples of my own, all type-correct:

	/* declare st as array 1 of array 5 of pointer to char */
	char *st[1][5] = { { "fee", "fie", "foo", "fum", "foobar" } };

	/* declare x as pointer to array 5 of pointer to char */
	char *(*x)[5] = st;

	/* declare y as array 1 of array 3 of array 4 of pointer to
	   array 5 of pointer to char */
	char *(*y[1][3][4])[5] = { {
		{ st, 0, 0, st },
		{ 0, st, st, 0 },
		{ 0, 0, st, st }
	} } ;

	/* declare p as array 2 of pointer to array 3 of array 4
	   of pointer to array 5 of pointer to char */
	char *(*(*p[2])[3][4])[5] = { y, 0 };

It does take some trickery to do this.  Given the declaration

	char *strings[5] = { ... };

the type of `strings' is `array 5 of pointer to char', which, when
used in an expression, becomes `pointer to pointer to char' (by
changing the first `array of' to `pointer to'), but for `x' and
`y' I wanted a type of `pointer to array 5 of pointer to char'.
It might be nice if I could write `&strings' to get this, but I
cannot; however, I can use the declaration above for `st' to get
`array 1 of array 5 of pointer to char'.  Changing the first `array
of' yields `pointer to array 5 of pointer to char', which was what
I wanted.

Likewise, for `p' I wanted `y' to evaluate to `pointer to array
3 of array 4 of pointer to array 5 of pointer to char'; in order
to get that, I again used a `fake' [1] in the declaration.
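
For comparison, a sketch that puts the `st' trick next to the direct
form.  The line using &strings assumes a compiler following the later
ANSI C rules, where & applied to an array is permitted; the function
name pointer_to_row_demo is invented.

```c
#include <assert.h>
#include <string.h>

char *strings[5] = { "fee", "fie", "foo", "fum", "foobar" };
char *st[1][5] = { { "fee", "fie", "foo", "fum", "foobar" } };

/* Returns 1 if all three pointers reach the expected strings. */
int pointer_to_row_demo(void)
{
    char *(*x)[5] = st;         /* the trick: st converts to &st[0] */
    char *(*w)[5] = &strings;   /* direct form (ANSI C assumption) */
    char **pp = strings;        /* usual conversion: char ** */

    return strcmp((*x)[4], "foobar") == 0
        && strcmp((*w)[1], "fie") == 0
        && strcmp(pp[2], "foo") == 0;
}
```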

	`You can hack anything you want,
	 with pointers and funny C . . .'
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

throopw@dg_rtp.UUCP (Wayne Throop) (07/07/86)

> chris@umcp-cs.UUCP (Chris Torek)

Good article overall.  However, I have a nit to pick with one of the
examples:

>         int day_tab[2][13] = { ... };
>
> the following are type-correct calls:
>
>         f2d(p) int (*p)[13]; { ...  }
>
>         f1d(p) int *p; { ...  }
>
>         proc()
>         {
>                                         /* argument types: */
>                 f2d(day_tab);           /* pointer to array 13 of int */
>                 f2d(&day_tab[0]);       /* pointer to array 13 of int */
>
>                 f1d(day_tab[0]);        /* pointer to int */
>                 f1d(&day_tab[0][0]);    /* pointer to int */
>         }

There is one problem here.  The expression (&day_tab[0]) is illegal.
Given this simplified example:

     1  int aa[2][3];
     2
     3  void f1(pa) int (*pa)[3]; { }
     4
     5  void f2(){
     6      f1(aa);
     7      f1(&aa[0]);
     8  }

lint has this to say (among other things):

        (7)  warning: & before array or function: ignored

Granted, things are very strange here, since the [] operator is always
supposed to yield an lvalue.  However, as Chris pointed out, values of
type ([]) are always coerced to expressions of type (*) in all contexts
except sizeof.  Thus, the subscription yields a (potentially) lvalued
expression of array type, and it is coerced to a non-lvalued pointer
type, and thus the address-of is illegal.  (I think.)

Note well: The problem isn't with the type of aa.  Lint is *not*
complaining about the fact that aa follows the "&".  This peculiarity
arises because (aa[0]) has type (int [3]), and is immediately coerced to
an expression of type (int *) in all contexts except sizeof.  Thus,
while (&aa[0]) is illegal, (&aa[0][0]) is quite legal indeed.  Of
course, the latter operation doesn't yield a pointer of the correct type
to be passed to function f1 above.

--
>       `You can hack anything you want,
>        with pointers and funny C . . .'

Sung to "Alice's Restaurant", I presume?

(We'll wait 'til it comes around, then join in...)
(That's what we're doing now... waiting for it to come around...)
(Here it comes...)

        You can hack anything you want,
        With pointers and funny C.
        You can hack anything you want,
        With pointers and funny C.

        Dive right in, if you feel the need,
        It's a great language but it's gone to seed.
        You can hack anything you want,
        With pointers and funny C (Excepting Ritchie...)

(That was pitiful.)

--
"I wanna *HACK*!  I wanna *HACK*!!!
 I wanna feel *BITS* between my teeth!"
-- 
Wayne Throop      <the-known-world>!mcnc!rti-sel!dg_rtp!throopw

davidsen@steinmetz.UUCP (07/17/86)

After thinking about the discussions about using the address operator
on an array by name, I come to the reluctant conclusion that it SHOULD
be allowed in the new ANSI standard. I have read and understood the
arguments against it, I have spent hours teaching the language and
convincing students that they should not do it, but now I have to
reluctantly say that there are some fairly good reasons why it should
be allowed.

Reason 1: "codification of existing practice"
  If three ports of SysV, 2 of 4.2BSD and three of the largest selling
C compiler for PCDOS represent current practice, then it is legal. I
admit that a few compilers BITCHED about it, but they compiled it.

Reason 2: "modularity and information hiding"
  If I am writing a modular program in which I have the typedefs in an
include file used by the programmers writing the modules, there is no
way to allow them to take the address of an item which is defined by
typedef unless they know that the item is (or isn't) an array.

Example:
  typedef int PARTA[10];
  typedef struct { int x,y; float t[40]; } PARTB;

in a module...
  PARTA source, dest, *whead, work[2];
  PARTB *head, workb;

If PARTA is an array, I must say:
  whead = &dest[0];
while if it's not, I say:
  whead = &dest;

This means that the beauty of having the content of types changeable at
some time is no longer present, and every programmer who works with
them, even if s/he never uses the contents (passes addresses, etc, like
FILES) must know the type.
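
A sketch of what the proposal would buy, assuming a compiler that
follows the later ANSI C rules (where &dest is accepted and has type
pointer-to-PARTA).  The typedefs come from the example above; the
function name address_of_typedef_demo is invented.

```c
#include <assert.h>

typedef int PARTA[10];
typedef struct { int x, y; float t[40]; } PARTB;

PARTA dest;
PARTB workb;

int address_of_typedef_demo(void)
{
    PARTA *whead = &dest;   /* int (*)[10]: address of the whole array */
    PARTB *head = &workb;   /* the same spelling works for the struct */

    (*whead)[3] = 7;        /* reach an element through the pointer */
    head->x = 5;
    return dest[3] + workb.x;
}
```

The caller writes & in both cases without knowing which typedef
hides an array.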

Reason 3: "common sense"
  After five years of teaching C, I have to agree with my students that
it makes no sense to forbid this construct. To take the address of
something use the address operator. I have seen this mistake made by
students from major universities, and graduates of courses taught by
high priced consultants, so it's not just my students.

Moreover, there is already a major peculiarity in the way array names
are handled, as compared to the way pointers work. This is in the
operation of the sizeof operator, which gives the size of a pointer or
sizeof an entire array.

Conclusion: I don't find this very desirable, I just think that it
makes more sense to allow it than not allow it. Hopefully the next
language will do away with arrays, and eliminate the whole problem :>

-- 
	-bill davidsen

  ihnp4!seismo!rochester!steinmetz!--\
                                       \
                    unirot ------------->---> crdos1!davidsen
                          chinet ------/
         sixhub ---------------------/        (davidsen@ge-crd.ARPA)

"Stupidity, like virtue, is its own reward"

bs@alice.UucP (Bjarne Stroustrup) (07/19/86)

For what it is worth: in C++ it is allowed (and encouraged practice) to
use the address-of operator & explicitly for array and function names.

ark@alice.UucP (Andrew Koenig) (07/19/86)

>   After five years of teaching C, I have to agree with my students that
> it makes no sense to forbid this construct. To take the address of
> something use the address operator. I have seen this mistake made by
> students from major universities, and graduates of courses taught by
> high priced consultants, so it's not just my students.

All right, tell me:  What is the type of the address of an array?

That is, suppose I write:

	int a[10];

What type is &a?  Don't tell me it's "pointer to integer" because
that is the type of &a[0], and a and a[0] are different things.

rgenter@BBN-LABS-B.ARPA (Rick Genter) (07/20/86)

     If a is declared as:

	int	a[10];

then &a is of type "pointer to array [10] of int", or

	(int (*) [10])

in cast terminology.  (Or at least it should be)

(Cdecl is wonderful).
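
A sketch of that type in use, assuming a compiler that accepts &a
(as the later ANSI C rules do).  The function name
pointee_sizes_differ is invented.

```c
#include <assert.h>
#include <stddef.h>

int a[10];

/* Returns how many ints fit in what the array pointer points at. */
size_t pointee_sizes_differ(void)
{
    int (*whole)[10] = &a;      /* pointer to array 10 of int */
    int *first = &a[0];         /* pointer to int */

    assert(sizeof *whole == 10 * sizeof(int));
    assert(sizeof *first == sizeof(int));
    return sizeof *whole / sizeof *first;
}
```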
--------
Rick Genter 				BBN Laboratories Inc.
(617) 497-3848				10 Moulton St.  6/512
rgenter@labs-b.bbn.COM  (Internet new)	Cambridge, MA   02238
rgenter@bbn-labs-b.ARPA (Internet old)	linus!rgenter%BBN-LABS-B.ARPA (UUCP)

guy@sun.uucp (Guy Harris) (07/21/86)

> All right, tell me:  What is the type of the address of an array?
> 
> That is, suppose I write:
> 
> 	int a[10];
> 
> What type is &a?  Don't tell me it's "pointer to integer" because
> that is the type of &a[0], and a and a[0] are different things.

No, it's "pointer to array of 10 'int's", a type that is already in C;
consider

	int a[10][10];

and ask what the type of "&a[5]" is.  PCC, at least, even handles this sort
of type correctly; you can declare such a pointer, assign a value to it
(even though it's only possible to construct such a value by taking the
address of a subarray, at least in C as she currently is spoke), and select
an element from the array that it points to by the obvious method.

This brings up an interesting problem.  The ANSI C draft I have (Aug 11, 1985)
says

	C.2.2.1 Arrays, functions, and pointers

	   Except when used as the operand of the "sizeof" operator or
	when a string literal is used to initialize an array of "char"s,
	an expression that has type "array of 'type'" is converted to
	an expression that has type "pointer to 'type'" and that points
	to the initial member of the array object.

This is a generalization of what "The C Reference Manual" says, which is:

	7.1 Primary expressions

	...

	An identifier is a primary expression, provided that it has
	been suitably declared as discussed below.  Its type is specified
	by its declaration.  If the type of the identifier is
	"array of ...", however, then the value of the identifier-expression
	is a pointer to the first object in the array, and the type of
	the expression is "pointer to ...".

This is silent about "array-valued expressions", except that it implies that
the name of an array is not an array-valued expression.  It later (in 8.7
Type names) acknowledges the existence of the type "pointer to array of
...", but doesn't indicate what happens if it encounters an expression of
that type.

The ANSI C statement seems to be the obvious way of correcting this
omission.  However, it now makes it harder to construct a value of this
type.

Neither K&R C nor ANSI C allow you to construct a pointer to an array that
is not a member of another array (if you declare "int a[10]", "&a" is
illegal and "a" is a pointer to the first member of the array, not to the
array itself).  However, K&R C does not explicitly *forbid* putting an "&"
in front of an expression that is a member of an array of arrays.  E.g., if
you declare "int a[10][10]", it doesn't forbid "&a[3]".  (Our PCC, and
probably most, if not all PCCs, *will* complain about this; I don't know if
this is plugging a loophole in the rules, or just an accident of the
implementation.)

ANSI C, however, says that *any* expression of type "array of 'type'" is
converted to a pointer to the first element of that array (hence of type
"pointer to 'type'").  This means that the expression "&a[3]" is invalid,
since "a[3]" is an array-valued expression referring to the fourth member of
"a", and this is converted to a pointer to the first member of the fourth
member of "a"; this expression cannot have its address taken.

You can get a pointer to the *first* member of "a"; the expression "a" is
converted to such a pointer.  You can then get a pointer to other members
with pointer arithmetic; i.e., "a + 1" is a pointer to the second member of
"a" (which is another 10-element array of "int").  Unfortunately, this means
something that works for arrays of types that are not arrays won't work for
arrays of types that are.  If you have "int a[10]", "&a[5]" is a pointer to
"a"'s sixth element; if you have "int a[10][10]", however, "&a[5]" is illegal.

This is a bit of a rough spot in C's type system.  It would be preferable if
the operand of the "&" operator, like the operand of the "sizeof" operator,
were not converted from "array of 'type'" to "pointer to first element of
array of 'type'".  If this were the case, "&a" would be legal, regardless of
the type of "a" (except, possibly, if "a" were of type "function returning
'type'", and perhaps even that could be allowed).  This would make pointers
to arrays more useful, and would permit a routine that took a pointer to an
array to be written as such, instead of using the subterfuge of declaring
the argument in question to be a pointer to an element of such an array.

One would presumably be allowed to declare a pointer of type "pointer to
array of 'type' of unspecified size", thus permitting a function to take
arrays of arbitrary size as arguments.  The Aug 11, 1985 ANSI C draft seems
to forbid this; an array size specifier "must be present, except that the
first size may be omitted when an array is being declared as a formal
parameter of a function, or when the array declaration has storage-class
specifier 'extern' and the definition that actually allocates storage is
given elsewhere."  One is currently allowed to do so by PCC, at least; K&R
doesn't forbid it, although the only contexts in which it discusses such
array specifiers are the two mentioned by the ANSI C draft.

If "&" is to be changed to work like "sizeof", the rules for type specifiers
should also be changed in this fashion.  Yes, there will be a problem with
pointer arithmetic involving pointers to arrays of unspecified size; this
will have to be forbidden.  However, ANSI C already has objects like this;
consider a pointer to a structure of unspecified shape.  One can declare
such pointers - this is needed to deal with mutually-recursive structures,
where an object of type "struct a" contains a pointer to an object of type
"struct b", and *vice versa* - and the language must somehow forbid pointer
arithmetic on such pointers, at least until the structure's shape is
declared.
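
A sketch of both kinds of incomplete type mentioned above, assuming a
compiler that follows the later ANSI C rules (which accept a pointer
to an array of unspecified size, and & applied to an array).  The
names sum_unsized and incomplete_types_demo are invented.

```c
#include <assert.h>

struct b;                               /* shape not yet known */
struct a { struct b *peer; int tag; };  /* pointer to it is still fine */
struct b { struct a *peer; int tag; };

/* p points at an array of int whose length the type does not record. */
int sum_unsized(int (*p)[], int n)
{
    int i, total = 0;

    for (i = 0; i < n; i++)
        total += (*p)[i];   /* indirection and subscripting are fine */
    /* p + 1 or p++ would be rejected: the pointee size is unknown. */
    return total;
}

int incomplete_types_demo(void)
{
    int v[4] = { 1, 2, 3, 4 };
    struct a first;
    struct b second;

    first.peer = &second;
    first.tag = 1;
    second.peer = &first;
    second.tag = 2;

    return sum_unsized(&v, 4) + first.peer->tag + second.peer->tag;
}
```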

If anyone wonders why the type "pointer to array of 'type'" would be useful,
and is not swayed by arguments involving the completeness of type systems or
the relative merits of using "pointer to array of 'type'" to point to
something of type "array of 'type'" rather than using "pointer to 'type'",
consider a program stepping through an array of "vectors", defined as arrays
of three "double"s, computing the norm of each one.  It *could* do so by
stepping an index of integral type, but one reason why C pointers work the
way they do is so you can step through an array by stepping a pointer into
that array!  (These arguments sound somewhat similar to the arguments about
taking the address of a "jmp_buf" using "&", since forbidding "&" to be
applied to an array forces a programmer to know whether "jmp_buf" is
implemented as an array or a structure.  In both cases, one is forced to
treat arrays differently from other sorts of objects, and it seems
unnecessary to require this.)
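
The "vectors" case might look like the sketch below.  It sums squared
norms rather than norms, purely to avoid needing the math library;
the names sum_norm_squared and vectors_demo are invented.

```c
#include <assert.h>

/* Steps a pointer-to-array through count vectors of three doubles. */
double sum_norm_squared(double (*v)[3], int count)
{
    double (*end)[3] = v + count;   /* one past the last vector */
    double total = 0.0;

    for (; v < end; v++)            /* v++ advances by three doubles */
        total += (*v)[0] * (*v)[0]
               + (*v)[1] * (*v)[1]
               + (*v)[2] * (*v)[2];
    return total;
}

double vectors_demo(void)
{
    double vecs[2][3] = { { 3.0, 4.0, 0.0 }, { 1.0, 2.0, 2.0 } };

    return sum_norm_squared(vecs, 2);   /* 25 + 9 */
}
```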
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

throopw@dg_rtp.UUCP (07/21/86)

> davidsen@steinmetz.UUCP (Davidsen)

> After thinking about the discussions about using the address operator
> on an array by name, I come to the reluctant conclusiont that it SHOULD
> be allowed in the new ANSI standard. [...]
>
> Reason 1: "codification of existing practice"

Well, maybe.  However, this argument means that the standard should say
that the compiler ought to warn about it, yet compile it.  An odd thing
for a standard to say.

> Reason 2: "modularity and information hiding"

Unfortunately this argument doesn't hold up, for two reasons.  First,
&array (when it is allowed) currently most often evaluates to the
address of the first element of the array, not to the address of the
whole array.  Thus, you can't hide the array-ness anyhow, since this is
different than other applications of the & operator.

Second, the inability to hide information (in particular, the allowable
operators for a type) is not unique to arrays in any event.  Integers
can be "+"ed, but not structures, structures can be "."ed, but not
pointers, etc etc etc, and none of this can be hidden.  And even if "&"
were allowed, the assignment would not be.  I'll admit that "&" is a
peculiar operator to not be mostly universal, but I'm still not
convinced that C's peculiar treatment of arrays makes taking their
address sensible.  In effect, it adds yet-another-special-case, rather
than regularizing things.

> Reason 3: "common sense"
>   After five years of teaching C, I have to agree with my students that
> it makes no sense to forbid this construct. To take the address of
> something use the address operator.

I have a great deal of sympathy for this view.  But NOTE WELL, that it
should yield the address of the WHOLE ARRAY, and NOT the address of the
first element of the array.  This is DIFFERENT than current usage.  Note
that it would make

        int actual[10];
        void f(formal) int formal[10];{}
        void g(){ f(&actual); }

type-incorrect, since the formal is expecting type (int *), and gets
type (int (*)[]) instead.

Also note that "to take the address of something, use the address
operator" is overly simplistic, even if arrays could be "&"ed.  There
are many "somethings" that cannot be "&"ed, such as register variables,
bit fields, expressions, and so on.  Arrays happen currently to be one
of these.


To sum up, I wouldn't be absolutely aghast if ANSI legislated that
&array should work.  But NOTE WELL that it would constitute YASC, and it
would be a crime against reason to make it work as it does in some
compilers now, such that &array gives a conceptually different type than
&non_array.  And, on balance, I'd say it isn't really that good an idea.

--
The string is a stark data structure and everywhere it is passed there
is much duplication of process.  It is a perfect vehicle for hiding
information.
                                --- Alan J. Perlis
-- 
Wayne Throop      <the-known-world>!mcnc!rti-sel!dg_rtp!throopw

rbj@icst-cmr (Root Boy Jim) (07/23/86)

	> davidsen@steinmetz.UUCP (Davidsen)
	
	> Reason 2: "modularity and information hiding"
	
	Unfortunately this argument doesn't hold up, for two reasons.  First,
	&array (when it is allowed) currently most often evaluates to the
	address of the first element of the array, not to the address of the
	whole array.  Thus, you can't hide the array-ness anyhow, since this is
	different than other applications of the & operator.

Hopefully, the first element is at the same address as the entire array.
	
	Second, the inability to hide information ...

I pretty much agree, but then we're talking data type. I don't care
what I get back from localtime; I just want to pass it to ctime.
	
	> Reason 3: "common sense"
	> After five years of teaching C, I have to agree with my students that
	> it makes no sense to forbid this construct. To take the address of
	> something use the address operator.
	
	I have a great deal of sympathy for this view.  But NOTE WELL, that it
	should yield the address of the WHOLE ARRAY, and NOT the address of the
	first element of the array.  This is DIFFERENT than current usage.  Note
	that it would make
	
	        int actual[10];
	        void f(formal) int formal[10];{}
	        void g(){ f(&actual); }
	
	type-incorrect, since the formal is expecting type (int *), and gets
	type (int (*)[]) instead.

If you think about it, a pointer to an int can be used (and is) as a pointer
to an array of ints. Unless you apply ++ to it, they are the same thing.
(I can already feel the flames approaching).
	
	Also note that "to take the address of something, use the address
	operator" is overly simplistic, even if arrays could be "&"ed.  There
	are many "somethings" that cannot be "&"ed, such as register variables,
	bit fields, expressions, and so on.  Arrays happen currently to be one
	of these.
	
Taking a larger view, while I think that it's relatively harmless, except
for macros, there's no compelling reason to clamor for this either. On the
other hand, compilers should warn about but ignore the `&' as that's what
the guy probably meant anyhow, and he shouldn't have to recompile just for that.

	To sum up, I wouldn't be absolutely aghast if ANSI legislated that
	&array should work.  But NOTE WELL that it would constitute YASC, and it
	would be a crime against reason to make it work as it does in some
	compilers now, such that &array gives a conceptually different type than
	&non_array.  And, on balance, I'd say it isn't really that good an idea.

Yeah. Who cares?
	
	Wayne Throop      <the-known-world>!mcnc!rti-sel!dg_rtp!throopw

	(Root Boy) Jim Cottrell		<rbj@icst-cmr.arpa>
	I hope I bought the right relish...zzzzzzzzz..."

tainter@ihlpg.UUCP (Tainter) (07/24/86)

> >   After five years of teaching C, I have to agree with my students that
> > it makes no sense to forbid this construct. To take the address of
> > something use the address operator. I have seen this mistake made by
> > students from major universities, and graduates of courses taught by
> > high priced consultants, so it's not just my students.
> 
> All right, tell me:  What is the type of the address of an array?
> 
> That is, suppose I write:
> 
> 	int a[10];
> 
> What type is &a?  Don't tell me it's "pointer to integer" because
> that is the type of &a[0], and a and a[0] are different things.

The answer is : It doesn't have one.  That isn't valid C.  Compilers will
give you warnings about this and interpret it as &a[0], or will give you
an error message (or are broken!).
--j.a.tainter

throopw@dg_rtp.UUCP (Wayne Throop) (07/26/86)

> rbj%icst-cmr@smoke.UUCP ((Root Boy) Jim Cottrell)
>> throopw@dg_rtp.UUCP (Wayne Throop)
>>> davidsen@steinmetz.UUCP (Davidsen)

>>> [arguments for allowing (&array)]
>>> Reason 2: "modularity and information hiding"
>> [Still have problem, since current practice is type-anomalous]
> Hopefully, the first element is at the same address as the entire array.

Yes this is often true, and is necessary in C.  But only if you are
looking at "addresses" as typeless entities.  Which they are not, at
least not in C.

>>> Reason 3: "common sense"
>> [Agreed, but make sure that the type is (int (*)[]), not (int *)]
> If you think about it, a pointer to an int can be used (and is) as a pointer
> to an array of ints. Unless you apply ++ to it, they are the same thing.
> (I can already feel the flames approaching).

I assume Jim really means "unless you apply ++, --, [], *, +, -, +=, or
-=" (unless I'm overlooking one).  That is, unless you use it in
arithmetic, subscripting, or indirection.  Sort of covers what you can
do with a pointer, doesn't it?

Remember, a type is not just an interpretation of a pattern of bits.  It
also has to do with what operations are legal on those bits, and what
their effects are.  Thus, just because a pointer to the first int in an
array has the same bit pattern as a pointer to the whole array does NOT
indicate that they are "the same thing", any more than the fact that an
integer zero and a floating point zero often have the same bit pattern
indicates that these are "the same thing".
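
A sketch of that distinction: two pointers that compare equal as
addresses but step by different amounts.  It assumes a compiler that
accepts &a[0] on a row (as the later ANSI C rules do); the function
name same_address_different_step is invented.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the pointers start together but step differently. */
int same_address_different_step(void)
{
    int a[2][10];
    int *elem = &a[0][0];       /* pointer to the first int */
    int (*row)[10] = &a[0];     /* pointer to the first row */
    int same_start;
    ptrdiff_t elem_step, row_step;

    same_start = (void *)elem == (void *)row;

    /* Adding 1 moves by one int in one case, ten ints in the other. */
    elem_step = (char *)(elem + 1) - (char *)elem;
    row_step = (char *)(row + 1) - (char *)row;

    return same_start
        && elem_step == (ptrdiff_t)sizeof(int)
        && row_step == (ptrdiff_t)(10 * sizeof(int));
}
```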

--
C types require, when pointers pair,
Conversions which are never there.
They aren't there again today,
Please, Dennis, make them go away.
-- 
Wayne Throop      <the-known-world>!mcnc!rti-sel!dg_rtp!throopw

jsdy@hadron.UUCP (Joseph S. D. Yao) (08/05/86)

I have seen several references to the address of an array vs.
the address of the first element of the array.  Would someone
care to address what they think this difference is, aside from
data type?  I.e., it is clear that the types *int and *(int[])
should be different.  But the values should be the same:
	int countdown[] = { 10, 9, 8, ... };
		gives something like
	_countdown:
	=>	.word 10
		.word 9
		.word 8
		...
The values of both addresses should be the address of the word
'10'.

Well, yes, in some theoretical architectures I've heard tell of,
pointers include arbitrary information on, e.g., the size of the
object.  Are any of these actually implemented?
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}
			jsdy@hadron.COM (not yet domainised)

jon@amdahl.UUCP (Jonathan Leech) (08/07/86)

In article <513@hadron.UUCP>, jsdy@hadron.UUCP (Joseph S. D. Yao) writes:
> I have seen several references to the address of an array vs.
> the address of the first element of the array.  Would someone
> care to address what they think this difference is, aside from
> data type?  I.e., it is clear that the types *int and *(int[])
> should be different.  But the values should be the same:
> 	int countdown[] = { 10, 9, 8, ... };
> 		gives something like
> 	_countdown:
> 	=>	.word 10
> 		.word 9
> 		.word 8
> 		...
> The values of both addresses should be the address of the word
> '10'.
> 
> Well, yes, in some theoretical architectures I've heard tell of
> pointers include arbitrary information on e.g. the size of the
> object.  Any of these actually implemented?

    You could implement pointers as a triple:

    (low address, length, offset of current member)

    for range checking. Doesn't the Symbolics machine do something like
this? I recall a reference in a C compiler manual for the Symbolics but
have never actually used the machine or compiler.

    -- Jon Leech (...seismo!amdahl!jon)
    UTS Products / Amdahl Corporation
    __@/

mouse@mcgill-vision.UUCP (08/09/86)

[ > through >>>> re &array ]
>> If you think about it, a pointer to an int can be used (and is) as a
>> pointer to an array of ints. Unless you apply ++ to it, they are the
>> same thing.  (I can already feel the flames approaching).
> I assume Jim really means "unless you apply ++, --, [], *, +, -, +=, or
> -=" (unless I'm overlooking one).  That is, unless you use it in
> arithmetic, subscripting, or indirection.  Sort of covers what you can
> do with a pointer, doesn't it?

You are overlooking something important.  They are the same thing UNTIL
the size of what the pointer points to becomes important.  These
situations are:
	) ++
	) --
	) [] with a non-zero subscript
	) +
	) -
	) +=
	) -=
As for what they SHOULD be....a pointer to an array should be just
that; indirecting off it should result in an array.  There are good
reasons this isn't done; I have yet to hear an implementation suggested
that doesn't have worse flaws than the flaw currently under discussion.
-- 
					der Mouse

USA: {ihnp4,decvax,akgua,utzoo,etc}!utcsri!mcgill-vision!mouse
     think!mosart!mcgill-vision!mouse
Europe: mcvax!decvax!utcsri!mcgill-vision!mouse
ARPAnet: utcsri!mcgill-vision!mouse@uw-beaver.arpa

"Come with me a few minutes, mortal, and we shall talk."
			- Thanatos (Piers Anthony's Bearing an Hourglass)

throopw@dg_rtp.UUCP (Wayne Throop) (08/11/86)

Several interesting points are raised about just what is the difference
between a pointer to an array, and a pointer to the first element of an
array.

> jsdy@hadron.UUCP (Joseph S. D. Yao)
> I have seen several references to the address of an array vs.
> the address of the first element of the array.  Would someone
> care to address what they think this difference is, aside from
> data type?  I.e., it is clear that the types *int and *(int[])
> should be different.  But the values should be the same:

What is really the case is that the first addressable element of the
array is the same as the first addressable element of the first element
of the array.  (This is not true in all languages, but must be so in C,
unless I'm overlooking something quite tricky.)

But one *MUST* keep in mind that this is *NOT* the same thing as saying
that the bit patterns of these pointers must be the same.  Nor is it
meaningful to say that "they have the same value".  The most one should
say is that they have the same bit-pattern on most common architectures.

Consider an analogous argument.  "I have seen several references to
floating point zero vs integer zero.  It is clear that the types (float)
and (int) should be different.  But the values should be the same."
(And, just as the two pointers above are "really" indicating the same
storage element, the two values are "really" indicating the same point
on the "number line".)

Most people see that this is bogus (I hope). And it is bogus for
*precisely* the same reason that the same statement about pointers is
bogus.  The fact that the bit patterns of these objects are the same on
most common architectures is *irrelevant*.  Because the same operations
performed on these objects give different results, and because of this
the bit patterns are best thought of as having a *different*
*interpretation*.  Given that the interpretation is different, it is at
best meaningless and at worst dangerously misleading to say that they
"have the same value".

(If this is what Joe meant by "only different in datatype", then I agree
 with him.  But I disagree that this is a good phrase to use to describe
 this difference.)

(I agree that there are interesting distinctions between the (float) vs
 (int) case and the (int *) vs (int (*)[]) case.  But these distinctions
 are (I believe) not relevant to this point.)

(And lastly, if I thought "a and &a[0] have the same value" meant
 "((void *)a)==((void *)&a[0]) is true", I'd agree.  But I think
 the meaning of "same value" in common use means more than this,
 so I don't agree.)
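
The float-versus-int analogy can be made concrete.  This is only a
sketch, and it assumes a machine where int and float have the same
size and float zero is stored as all-zero bits (true of IEEE 754
systems, but not something C guarantees).  The function names are
made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if int zero and float zero are stored as identical bytes.
   Assumption: only meaningful when sizeof(int) == sizeof(float);
   otherwise this reports 0. */
int zero_bits_match(void)
{
    int   i = 0;
    float f = 0.0f;

    if (sizeof i != sizeof f)
        return 0;
    return memcmp(&i, &f, sizeof i) == 0;
}

/* The same expression, (x + 1) / 2, applied to each kind of zero. */
int   half_next_int(int x)     { return (x + 1) / 2; }   /* integer division */
float half_next_float(float x) { return (x + 1) / 2; }   /* floating division */

/* The zeros compare equal, yet the same operation gives different results. */
void zero_demo(void)
{
    assert(0 == 0.0f);
    assert(half_next_int(0) == 0);
    assert(half_next_float(0.0f) == 0.5f);
}
```

Where zero_bits_match() reports 1, the two objects share a bit pattern
and still behave differently under the same expression, which is the
sense in which the interpretation differs.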

> jon@amdahl.UUCP (Jonathan Leech)
>> jsdy@hadron.UUCP (Joseph S. D. Yao)

>> Well, yes, in some theoretical architectures I've heard tell of
>> pointers include arbitrary information on e.g. the size of the
>> object.  Any of these actually implemented?
>     You could implement pointers as a triple:
>     (low address, length, offset of current member)
>     for range checking.

The interesting thing here is that the address of an array and the
address of the first element, under the above scheme, would *still* have
exactly the same bitwise value!

This odd result depends on my interpretation of "length", and is derived
from the fact that the length is used for range checking.  The point
about range checking leads to the conclusion that the length is used to
regulate pointer arithmetic, and is thus *not* the length of the item
the pointer denotes, but rather the length of valid addressing
arithmetic from the "low address".  Now, in the case of an array
declared at top level, this range is the length of the array.  But, in
C, the valid addressing range for the first element of an array declared
at top level is *still* *the* *length* *of* *the* *array*!  Thus, in
terms of the above triple, the address of the array

        char a[10];

ought to be (whatever,10,0), and the address of the first element of
this array ought also to be (whatever,10,0).

> mouse@mcgill-vision.UUCP (der Mouse)
> As for what they SHOULD be....a pointer to an array should be just
> that; indirecting off it should result in an array.  There are good
> reasons this isn't done;

I'm a little confused by this.  On most compilers I've used, indirecting
a pointer to an array yields an array, so I'm not sure what is meant by
saying that it "isn't done".  Perhaps it means that this isn't common
usage?  I'll agree with that.  But nevertheless, most implementations
get this particular fine point right.
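
A short sketch of what "indirecting yields an array" looks like,
assuming an ANSI-or-later compiler.  The array length 4 and the
function names are arbitrary choices for the example.

```c
#include <assert.h>
#include <stddef.h>

/* sizeof applied to *pa: the operand has type int[4], so this is the
   size of the whole array, not of a pointer. */
size_t deref_size(int (*pa)[4])
{
    return sizeof *pa;
}

/* (*pa) is the array itself; [i] then selects one element. */
int element(int (*pa)[4], size_t i)
{
    assert(i < 4);
    return (*pa)[i];
}
```

In any other context *pa is converted to a pointer to its first
element, so comparing *pa with &a[0] (for an array a that pa points
to) is legal and true.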

> I have yet to hear an implementation suggested
> that doesn't have worse flaws than the flaw currently under discussion.

I presume this means that most implementations of C have worse bugs than
that of allowing (&array) and returning the address of the first element
instead.  I won't argue with that.  But what I was objecting to was
elevating this common bug to a "standard feature".  I still think it
would be wrong to do so.  If it means anything (and currently it does
*NOT*), (&array) should indicate the address of the whole array, not the
address of its first element.

--
God made integers, all else is the work of man.
                                --- Leopold Kronecker {1823-1891}
-- 
Wayne Throop      <the-known-world>!mcnc!rti-sel!dg_rtp!throopw

guy@sun.uucp (Guy Harris) (08/11/86)

> As for what they SHOULD be....a pointer to an array should be just
> that; indirecting off it should result in an array.  There are good
> reasons this isn't done; I have yet to hear an implementation suggested
> that doesn't have worse flaws than the flaw currently under discussion.

I suspect the implementors of PCC would be interested in hearing those "good
reasons", since PCC *does* implement the type "pointer to array", and if you
dereference something of that type it does yield an array (which gets
converted to a pointer into its first element, along the lines mentioned in
the ANSI C draft).  In fact, they had no choice *but* to implement that
type, since K&R clearly indicates that "pointer to array" is a valid type
(an example is given of a pointer of that type in the C Reference Manual).
It is a nuisance to generate a *value* of that type to assign to a variable
of that type, but that's another matter.

-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

karl@haddock (08/13/86)

hadron!jsdy writes:
>I have seen several references to the address of an array vs.
>the address of the first element of the array.  Would someone
>care to address what they think this difference is, aside from
>data type?

On most machines, as you imply, &a and &a[0] do indeed have the same
bit-pattern, and will compare equal if you cast them to a common type.
If, however, you want to *do* something with the pointer (*, [], +, -,
++, --, etc.) you'd better have the correct type as well as value.  In
particular, the effect of ++p on an int(*)[] is not the same as on an int*.
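
A sketch of the difference, assuming an ANSI-or-later compiler in
which &countdown has type pointer-to-array.  The array is given an
explicit length of 4 (the fourth value is made up) because, as far as
standard C goes, arithmetic is only defined on a pointer to a
complete type, so a pointer to an unsized array cannot be stepped.

```c
#include <assert.h>
#include <stddef.h>

static int countdown[4] = { 10, 9, 8, 7 };

/* Bytes skipped by adding 1 to a pointer to the whole array. */
size_t step_whole_array(void)
{
    int (*pa)[4] = &countdown;
    return (size_t)((char *)(pa + 1) - (char *)pa);
}

/* Bytes skipped by adding 1 to a pointer to the first element. */
size_t step_first_element(void)
{
    int *pe = &countdown[0];
    return (size_t)((char *)(pe + 1) - (char *)pe);
}

/* The two addresses compare equal once cast to a common type. */
int same_address(void)
{
    return (void *)&countdown == (void *)&countdown[0];
}

void step_demo(void)
{
    assert(same_address());
    assert(step_whole_array() == 4 * step_first_element());
}
```

The pointers start at the same place, but one step moves the first
past all four ints and the second past only one.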

Btw, someone suggested earlier that ANSI C doesn't interpret &a as pointer
to array.  I think it does: "The operand of the unary & operator shall be
a function locator or an lvalue [other than bit-field or register].  ...
The result ... is a pointer to the object [and has type] `pointer to type'."
(3.3.3.2, 01-May-1986 draft.)  Arrays are not mentioned as a special case.
And yes, arrays *are* (non-modifiable) lvalues in X3J11.

Karl W. Z. Heuer (ihnp4!ima!haddock!karl), The Walking Lint

karl@haddock (08/16/86)

dg_rtp!throopw (Wayne Throop) writes:
>What I was objecting to was elevating this common bug [interpreting &a as
>&a[0]] to a "standard feature".  I still think it would be wrong to do so.
>If it means anything (and currently it does *NOT*), (&array) should indicate
>the address of the whole array, not the address of its first element.

As I mentioned before, X3J11 seems to have accepted the "correct" meaning.
In the case of the other "optional ampersand", if f is a function locator,
then "&f" and "f" are equivalent.  I personally think the first is more
meaningful (language purity and all that; "f" should denote the function as a
whole, even if you can't do anything with it other than "&" or "()"), but I
don't use it because lint prefers the second.  I also detest the usage of
"pf()" for "(*pf)()", but X3J11 has blessed this as well.  (In fact, they
defined "()" to *always* operate on a function *pointer* (possibly obtained
from the implied "&" on a function locator), so now the first form is more
"correct"!)

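The spellings discussed above can be put side by side.  A minimal
sketch, assuming an ANSI-or-later compiler; twice and all_calls_agree
are names invented for the example.

```c
#include <assert.h>

static int twice(int x)
{
    return 2 * x;
}

/* Returns 1 if every spelling of the call gives the same result. */
int all_calls_agree(int x)
{
    int (*pf)(int) = twice;     /* implied & on the function name */
    int (*pg)(int) = &twice;    /* explicit & */
    int expected = twice(x);

    assert(pf == pg);           /* both hold the same function address */
    return (*pf)(x) == expected     /* the older, explicit spelling */
        && pf(x) == expected        /* the form X3J11 also allows */
        && (&twice)(x) == expected; /* calling through an explicit & */
}
```

All three calls compile and agree, which is consistent with "()"
being defined to operate on a function pointer.
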
Karl W. Z. Heuer (ihnp4!ima!haddock!karl), The Walking Lint

throopw@dg_rtp.UUCP (Wayne Throop) (08/16/86)

> karl@haddock (Karl W. Z. Heuer)

Ghak!!  It's a good thing I said I wouldn't be aghast if ANSI C made
&array legal, since they already *have* made it legal!  Karl raised a
point:

> Btw, someone suggested earlier that ANSI C doesn't interpret &a as pointer
> to array.  I think it does: "The operand of the unary & operator shall be
> a function locator or an lvalue [other than bit-field or register].  ...
> The result ... is a pointer to the object [and has type] `pointer to type'."
> (3.3.3.2, 01-May-1986 draft.)  Arrays are not mentioned as a special case.
> And yes, arrays *are* (non-modifiable) lvalues in X3J11.

And, looking up the references, I found in addition to the above points
another that I don't know how I missed before:

    C.2.2.1 Except when used as an operand that may or shall be an
    lvalue, [...] an expression that has type "array of *type*" is
    converted to an expression that has type "pointer to *type*" and
    that points to the initial member of the array object.

Interestingly, the "may or shall be an lvalue" is a little strange.  I
assume that what is meant here is that the conversion is not done iff an
lvalue is required.  (After all, an lvalue *may* be used anywhere a
value may be used, but maybe I'm missing some subtle point.)  Note that
this is *different* than what K&R say.  K&R say that arrays are not
lvalues, and the only anomaly is for "sizeof" (and this anomaly is
listed along with sizeof, not with arrays).
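
A sketch of the conversion rule and its two exceptions as they behave
under an ANSI-or-later compiler.  The array a and the function names
are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

static char a[10];

/* Operand of sizeof: the array is not converted, so this is 10. */
size_t size_of_array(void)
{
    return sizeof a;
}

/* Ordinary value context: a is converted to a pointer to a[0], so the
   initializer below is accepted and sizeof reports a pointer's size. */
size_t size_after_conversion(void)
{
    char *p = a;
    return sizeof p;
}

/* Operand of &: no conversion either, so &a is a pointer to the whole
   array and dereferencing it gives back an object of 10 bytes. */
size_t size_behind_address(void)
{
    char (*pa)[10] = &a;
    assert((void *)pa == (void *)a);
    return sizeof *pa;
}
```

The initializer in size_behind_address is accepted without a cast
only because &a has the type char (*)[10].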

H&S agree with K&R (on page 97, for example):

    [array of T is converted to pointer to T].  This rule is one of the
    usual unary conversions.  The only exception to the conversion rule
    is when an array identifier is used as an operand of the sizeof
    operator,

So, when ANSI-compliant compilers hit the streets, arrays will be
lvalues, &array will be legal, and it will even behave reasonably.  They
even made the rule for array promotion reasonably simple, though the
"sizeof" is still a separate special case, and is still listed
separately.  And it's still a little difficult to get an array-typed
rvalue, so assignment still doesn't work, even aside from the fact that
ANSI doesn't make array-typed lvalues modifiable.

--
"They couldn't hit an elephant at this dist......"
        --- The last words of General John Sedgwick,
            at the battle of Spotsylvania, 1864
-- 
Wayne Throop      <the-known-world>!mcnc!rti-sel!dg_rtp!throopw

karl@haddock (08/20/86)

dg_rtp!throopw (Wayne Throop) writes:
>And [in ANSI C] it's still a little difficult to get an array-typed
>rvalue, so assignment still doesn't work, even aside from the fact that
>ANSI doesn't make array-typed lvalues modifiable.

I've got some ideas about that, but the first step is to deprecate the
"feature" that allows you to write "f(int a[])" for "f(int *a)".  (I refer
here to the declaration of the function, not its call.)  In my mind, since
arrays may not currently be passed as arguments, the declaration is an error,
and the compiler is "politely" figuring out what you must have *meant*.  As
has already been pointed out, "sizeof(a)" gives you "sizeof(int *)" in this
context, so the apparent acceptance of the declaration tends to be confusing.
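
A sketch of why the declaration misleads.  The prototype uses array
syntax and the definition uses pointer syntax; a compiler accepts the
pair because it adjusts the array parameter to a pointer.  The names
param_size and caller_size are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Declared with array syntax... */
size_t param_size(int a[]);

/* ...and defined with pointer syntax.  The two are the same type. */
size_t param_size(int *a)
{
    return sizeof a;            /* size of a pointer, not of any array */
}

/* In the caller the array is still an array, so sizeof sees all of it. */
size_t caller_size(void)
{
    int v[8] = { 0 };

    assert(param_size(v) == sizeof(int *));
    return sizeof v;
}
```

The length 8 is visible only in the caller; inside param_size nothing
of it survives.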

Karl W. Z. Heuer (ihnp4!ima!haddock!karl), The Walking Lint