[net.lang.c] Pointers and arrays explained at length

msb@lsuc.UUCP (Mark Brader) (02/18/85)
I post this article with a certain reluctance; this whole matter is
basically a Frequently Asked Question, like the matter of (char *)NULL
which I discoursed upon not too long ago.  But there seem to have been
enough confused items posted lately that I think it's worth saying
all of this once again.

I will start with the very basic stuff, and work up from there.
In order to provide an "escape" from C terminology, I will assume that
a "long" occupies 4 bytes while a pointer occupies 2 bytes, and I will give
actual addresses in decimal.

Okay, now suppose you're declaring an external variable.

	long AA;	/* This reserves 4 bytes, say locations 1000-1003. */
	long *BB;	/* This reserves 2 bytes, 1004-1005. */
	long CC[5];	/* This reserves 20 bytes, 1006-1025. */
	long DD[] = { 1L, 2L, 3L, 4L, 5L };
			/* This also reserves 20 bytes, 1026-1045. */
	long EE[];	/* This doesn't reserve any bytes at all. */

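Now, the next thing to remember is that although CC names an array,
the TYPE of the expression CC is pointer, or to be specific, long *.
(The one notable exception is sizeof CC, which still yields the size of
the whole array, 20 bytes here.)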

If that seems peculiar, remember that a subscript expression, say CC[1],
BY DEFINITION means *(CC+1).  Now the operand of * is obviously a pointer;
if *(CC+1) is a long, then (CC+1) must be a pointer to a long, i.e. a long *.
But the only way we can add two things to get a pointer is if one of them
is a pointer and the other is some type of integer.  1 is an integer, so
CC must be a pointer.

In fact, the expression CC is a CONSTANT of type long *, just the same as &AA
is.  Just as &AA is a constant equal to 1000, so CC is a constant equal to
1006.  There is no memory allocated to store this number 1006.  If you say

	BB = CC;

then the value 1006 gets stored at location 1004.  Now you can write the
expression BB[1] and it will refer to the same memory location as CC[1],
i.e., 1010-1013.  But notice that the actual code used to evaluate BB[1]
and CC[1] is different.  Evaluating CC[1] goes like this:  1 is multiplied
by sizeof(long) (i.e. 4) and added to the constant CC, which is 1006, giving
1010, and a long is read from that location.  On the other hand, BB[1]
works this way:  1 is multiplied by 4 as before, then a value is read
from location &BB, which is 1004; the value is 1006, and this is added to
the 4 giving 1010, and a long is read from that location.
Simple, right?

Now the critical thing that confuses a lot of people is that some
of the declaration forms above don't always mean the same thing.

IF THE VARIABLES ARE FORMAL PARAMETERS OF A FUNCTION,

afunctn (AAA, BBB, CCC, DDD, EEE)
long AAA;	/* This still reserves 4 bytes, say SP+10 to SP+13. */
long *BBB;	/* This still reserves 2 bytes, SP+14 to SP+15. */
long CCC[5];	/* This ALSO reserves only 2 bytes, SP+16 to SP+17. */
long DDD[] = { 1L, 2L, 3L, 4L, 5L };
		/* This is illegal: you can't initialize a parameter. */
long EEE[];	/* This also reserves 2 bytes. */
	{
	function body
	}

Of course these variables go on the stack, so I have given their addresses
relative to the Stack Pointer.

IN THIS CONTEXT ONLY, the declaration of CCC and EEE is treated as declaring
a pointer, again a long *, and NOT an array.  The 5 in CCC[5] is simply
ignored.  The result is the same as if you declared "long *CCC, *EEE;".
But this applies only to formal parameter declarations!

Suppose we called this function like this:  X = afunctn (AA,BB,CC,DD,EE);
assuming DDD has been declared legally.  Then the value 1006 that we
assigned above would be read from BB, i.e. from location 1004, and
stored into BBB, i.e. at location SP+14.  And the CONSTANT value 1006
of CC would be stored into CCC, at SP+16.  Then, in the function
afunctn, both BBB[1] and CCC[1] would refer to the same array element
at location 1010-1013, and they would be computed in the same manner.

Although CCC's declaration looks like CC's and not like BBB's, the object
CCC is of the same type as BBB and not CC.  Used as expressions, though,
all three names have the same type, long *.

A character string constant in double quotes is also a pointer-valued
constant, of type char *.  There is no difference between

	char	ST[] = { 'a', 'b', 'c', 'd', '\0' };
	printf (ST);

and

	printf ("abcd");

except that in the first example we can use the constant pointer ST
again, and in the second we can't use the pointer again because we
didn't give it a name.



At this point I'd better mention the syntax for arrays of pointers.
This is:

	long *FF[2];
or
	long *(FF[2]);

The two forms are equivalent, and the first is usual.  Assuming this is
not a formal parameter declaration, it allocates 4 bytes of memory
for 2 pointers.  I'll talk about how you use it later.



Now, I haven't yet used the term "pointer to array".  This is deliberate.
Just because BB or CCC points to the beginning of an object that is an
array does NOT give it the type "pointer to array".  Its type is "pointer
to long", or long *.  To talk about what a "pointer to an array" means,
it's easiest if I digress into 2-dimensional arrays.

If you're declaring external variables...

	long	KK[2][5];	/* Allocates 40 bytes, 2000-2039. */

In C a 2x5 array such as this is really an array of 2 arrays of 5 longs.
KK[1][2] refers to location 2028-2031 and is a long.  KK[1] is a pointer-
valued expression whose value is 2020 and is of type long *, and KK is
a pointer-valued constant whose value is 2000.

Now, notice how we evaluate that expression KK[1][2].  We start with KK,
the constant equal to 2000.  The subscript 1 counts rows of the 2-dimensional
array, so we multiply it by 20, the size of a row in bytes, and add
to get 2020 as the value of KK[1].  Then we multiply the subscript 2
by 4, the size of a long, and add that in to get 2028.

But what is this "size of a row in bytes"?  Well, this is the size of an
ARRAY of 5 longs.  You see, the type of KK is "pointer to array of 5 longs".
This can be written as: long (*)[5].   THIS is what a "pointer to array"
type is.

You can declare a variable to be a "pointer to array of 5 longs" this way:

	long	(*LL)[5];

This reserves just 2 bytes.  You can now say LL = KK, and then LL[1][2]
will have the same value as KK[1][2].  But just like CC and BB above,
it will not be computed the same way.

And once again, these meanings change if you are declaring formal
parameters.  In that case only, we might have:

bfunctn (KKK, LLL, MMM)
long	KKK[2][5];	/* The 2 is ignored.  2 bytes are allocated. */
long	(*LLL)[5];	/* 2 bytes are allocated. */
long	MMM[][5];	/* 2 bytes are allocated. */
	{
	...
	}

All three variables are of the same type, "pointer to array of 5 longs".
When using the array-like declaration as in KKK and MMM, the value in
the first subscript can be omitted since it will be ignored anyway.
But there must be a value in the second subscript (and each further
one in a 3-dimensional or higher case), because this becomes part of
the type, and it will be needed in order to interpret what MMM[1][2] means!

Again, if you call this function by Z = bfunctn (KK, LL, LL);
after assigning LL = KK, then KKK[1][2], LLL[1][2], and MMM[1][2] all
refer to the same location 2028-2031.



Having explained pointers to arrays, I turn back to arrays of pointers.
Suppose you declared, as above,

	long *FF[2];

as an external.  This would reserve 4 bytes, say 1500-1503.
Now suppose that CC and DD are declared as above:

	long CC[5];
	long DD[] = { 1L, 2L, 3L, 4L, 5L };

and they occupy 1006-1025 and 1026-1045 respectively.  Then, remember,
the expressions CC and DD are constants with the types long * and the
values 1006 and 1026.

Now FF is an array of long *'s.  So FF[0] is a long *.  So you can say
FF[0] = CC; FF[1] = DD;    ...now the value 1006 is stored at location
1500-1501, and 1026 at 1502-1503.

So how do you use this?  Well, suppose you refer to FF[1][2].  This means
to take the subscript 1 and multiply by the size of a pointer, or 2 bytes,
and add this to the constant FF or 1500, giving 1502.  Then we read location
1502 to get 1026.  We take the subscript 2, multiply by 4, the size of a
long, and add it in, giving 1034.  So FF[1][2] refers to a long at location
1034-1037, i.e., the [2] element of DD.

Notice that this gives the effect of a 2-dimensional array, but the rows
do not have to be regularly spaced in memory as with the declared array.

The type of the expression FF is, of course, long **.  If the object FF
is an array of long *, the name itself must be a pointer to a long *,
i.e., a long **.

So you could declare

	long **GG;
and say
	GG = FF;

and then you could refer to GG[1][2] which would also be the same as DD[2].
It would be evaluated in yet another way, which I leave as an exercise.

Finally I observe that, as usual, the array syntax used for FF does
not mean the same thing when used for a formal parameter, but instead
it yields an object like GG, a pointer to a pointer.

cfunctn (FFF, GGG)
long *FFF[];
long **GGG;
	{
	...
	}

These objects are of the same type as each other, the same as GG.
This is also what the "main" argument usually called "argv" is like,
except that it's a char ** instead of a long **.


Mark Brader