[net.lang.c] My pointer stuff: C caught me again

allbery@ncoast.UUCP (Brandon Allbery) (06/28/86)

Quoted from <418@dg_rtp.UUCP> ["Re: Pointers vs. arrays:  another dumb question..."], by throopw@dg_rtp.UUCP (Wayne Throop)...
+---------------
| > Okay, I've another dumb question for everyone:
| 
| A-Hah!  Not a dumb question at all!  It is a question that cuts right to
| the center of the confusion about pointers in C.
+---------------

Believe me, I'm aware of it.  Anyone have a PD Modula 2 compiler for a 68000?

+---------------
| There are two specific casts that are asked about, (struct foo (*)[])
| and (struct foo *).  As I understand it, the question is, where should
| each be used, and why.  No specific examples of the use of either cast
| are given, but I conjecture from previous postings that the first cast
| is used like so:
| 
|     int (*a[N])[];
|     a[M] = (int (*)[]) malloc( (unsigned) O*sizeof(int) );
| 
| This is, of course, correct.  The second cast's use is much more
| obscure... the only thing said about it is:
| 
| > But, if it's in initialized data, you can't do it that way:
| > you can't take a ``pointer to an array''.
| > So the cast is: (struct foo *)
| 
| Lacking an example of where this cast must be used, some questions
| immediately spring to mind.  What "it" is in initialized data?  The
| array or one of the arrays pointed to?  What "it" is it that
| "can't be done that way".  The cast?  The assignment (or maybe an
| initialization)?  The malloc?  Finally, what "way" is being talked about
| here?  The format of the assignment/initialization?  The method of
| allocating storage?  Zen Buddhism?
+---------------

I may have messed up, but C's damnable pointer/array stuff has me so confused
I don't know for sure.  The basic idea as I understand it is the C array
versus a malloc'ed one.

The code in question is two analogous sections:

-------- section 1 ---------

struct sfld (*__cursf)[] = (struct sfld (*)[]) 0;

if ((__cursf = (struct sfld (*)[]) calloc(n, sizeof (struct sfld)))
	== (struct sfld (*)[]) 0) ...

----------------------------

This was intended to allocate an array and assign it to a variable of type
``pointer to array of (struct sfld)''.  I suspect the type is wrong but I'm not
sure how to declare such a beastie; I suspect that it *does* *not* *exist*
*at* *all* in C, now that I've played with it.

The other section looks like this:

-------- section 2 ---------

struct menu {
	int m_rec;
	struct cmd *m_cmd;
};

struct menu cmdtab[] = {
	orders,		ocmdarr,
	customer,	ccmdarr,
	-1,		(struct cmd *) 0,
};

----------------------------

The dichotomy between these otherwise identical sections (as far as the
``pointer to an array'' is concerned) is that an array DECLARED in C causes
the array name to become a CONSTANT.  Whereas the malloc()'ed one is a POINTER
VARIABLE.  This could easily have been done correctly:

int array[3];	-- should declare a pointer followed by 3 integers, with the
		   pointer initialized to the 3 integers
int array[];	-- should declare a pointer.

The ``pointers'' I am talking about here are the assembly-language constructs;
C should treat ``int array[]'' as a different type from ``int *ptr'', and
while ``int array[3]'' and ``int array[]'' are the same type, the sized
array's pointer should be treated as a constant.  (This may be arguable.)
BTW, the (struct foo (*)[]) was confusion on my part; it's just plain wrong
for what I was doing.

I have become thoroughly sick of C pointers-vs.-arrays.  Anyone with a
replacement?  If this continues I may go back to programming in BASIC (yes,
it's THAT bad!).

+---------------
| Anyhow, the insightful stuff follows:
| 
| > BUT:  the arrangement in memory is identical!
+---------------

Not for that cast it wasn't.  The actual problem comes from C's closeness to
the machine hardware:

	the malloc()'ed one is type (int *), to the C compiler (to me, int [])
	the declared one is type (int []), to the C compiler
		(which defines (int []) as (int *))
	--btw, what REALLY threw me was the idea of a cast to (int []) --
		huh?  I wholeheartedly agree with your flame re: declaration-
		mirrors-use; that cast is ridiculous!  ((int (*)[]) is worse!)

and they are in fact identical in memory, so the C compiler treats them as
identical period.  Boo hiss; just because on my computer a (long) is the same
size as an (int) doesn't mean I can mix them with impunity.  C (and, more
importantly, lint) deals with (long)->(int)->(short), but NOWHERE is there a
utility to catch misuse of * and [].

+---------------
| "Why isn't the correct type of an int array name (int [])?"
| 
| *GOOD* question.  *VERY* good question.  The answer is "just because".
+---------------

AMEN, HALLELUJAH!!!

+---------------
| Or, if you want to be insulting, because DMR slipped a cog.  This is
| *THE* *MOST* *CONFUSING* thing about C, by far.  An array name, when
| evaluated, does *NOT* *NOT* *NOT* yield an array type.  This is the
| single greatest lunacy of the C language.  It might be argued that this
| is the *ONLY* great lunacy in C, although the declaration-mirrors-use
| rule probably ought to be considered a great lunacy as well.  (In case
| you can't tell, these nasty remarks about array name evaluation in C are
| my opinions only, and only about 60% serious.  Others differ with me.
| However, it is objective fact that this one point causes more confusion
| and error than any other construct in C.  By far.)
+---------------

This one feature is the only one that has me posting confusing (and wrong)
ideas about C on the net.  Abolish it.  If I had lint source I would change it
to force arrays to be (int []) and pointers (int *); of course, malloc() would
have to be ``known'' for this to work, so the size allocated could be checked
and the correct type assigned.  (malloc(sizeof int) shouldn't have to be cast
to (int []), since it's valid for ``int *foo''.)  Meaning, not possible.  C
loses again.  (HELP!)

+---------------
| > That
| > would make much more clear the meaning of the pointer, and would avoid many of
| > the pointer-vs.-array confusions.
| 
| Yes, yes, yes!!!  However, the fact that array names evaluate to the
| address of the first element in the array means that the types "pointer
| to foo" and "pointer to array of foo" *must* indicate the same storage
| layout in C, and this glitch is so deeply ingrained in C that to "fix"
| it would simply yield a new language, not a better C language.  Note
| that this glitch, coupled with the definition of subscripting in terms
| of pointer arithmetic, makes the type "pointer to foo" an irresistibly
| convenient near-synonym for "pointer to unknown sized array of foo", and
| thus nearly everybody uses the simpler form.
+---------------

I'm in the process of rewriting programs to use [] where [] is meant and *
where * is meant.

Come to think of it -- can malloc() or similar be typed right anyway?  I
suspect this is why Pascal uses the ``new(pointer)'' construct, known to the
compiler; it's type-able at compile time.  But catching the allocation of an
(int []) (vs. an (int)) from malloc() and forcing the former to be assigned to
a variable of type (int []) and the latter to an (int *) is nearly
impossible even when the language considers (int []) and (int *) to be
different.

Chuck it out & start over, please!

--Brandon (confusion (*)[])
-- 
ihnp4!sun!cwruecmp!ncoast!allbery ncoast!allbery@Case.CSNET ncoast!tdi2!brandon
(ncoast!tdi2!root for business) 6615 Center St. #A1-105, Mentor, OH 44060-4101
Phone: +01 216 974 9210      CIS 74106,1032      MCI MAIL BALLBERY (part-time)

chris@umcp-cs.UUCP (Chris Torek) (06/29/86)

Perhaps I just have an odd mind, but all this pointer/array stuff
never really bothered me.

In article <1267@ncoast.UUCP> allbery@ncoast.UUCP (Brandon Allbery) writes:
>struct sfld (*__cursf)[] = (struct sfld (*)[]) 0;
>
>if ((__cursf = (struct sfld (*)[]) calloc(n, sizeof (struct sfld)))
>	== (struct sfld (*)[]) 0) ...
>
>This was intended to allocate an array and assign it to a variable of type
>``pointer to array of (struct sfld)''.  I suspect the type is wrong but I'm not
>sure how to declare such a beastie; I suspect that it *does* *not* *exist*
>*at* *all* in C, now that I've played with it.

Why not simply use a `pointer to struct sfld'?  If you intend to
use this as `__cursf[i].field', that is what you need.
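
A minimal sketch of that suggestion, assuming a modern (ANSI or later)
compiler where calloc is declared in <stdlib.h>.  The `width' member and
the function names are invented for illustration; the thread never shows
what struct sfld contains.

```c
#include <stdlib.h>

/* Hypothetical layout: the real members of struct sfld are not shown. */
struct sfld {
    int width;
};

/* Allocate n zeroed records; returns a pointer to the first one, or NULL. */
struct sfld *alloc_fields(size_t n)
{
    return calloc(n, sizeof(struct sfld));
}

/* Store a width in record i using ordinary subscripting on the pointer. */
void set_width(struct sfld *cursf, size_t i, int width)
{
    cursf[i].width = width;     /* same as (*(cursf + i)).width */
}
```

The caller keeps a plain `struct sfld *' and frees it with free() when done;
no pointer-to-array type is needed for `cursf[i].field' style access.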

>The other section looks like this:
>
>struct menu {
>	int m_rec;
>	struct cmd *m_cmd;
>};
>
>struct menu cmdtab[] = {
>	orders,		ocmdarr,
>	customer,	ccmdarr,
>	-1,		(struct cmd *) 0,
>};

This looks reasonable to me.

>The dichotomy

What dichotomy?  Using my declarations everything is identical;
in

	/* given `int a[N];' */
	a;

the type of the expression `a' is `pointer to int'.

>between these otherwise identical sections (as far as the
>``pointer to an array'' is concerned)

You should have a two-dimensional array in mind in the first
place before using `pointer to array N'.  In

	/* int b[M][N]; */
	b;

the type of the expression `b' is `pointer to array N of int'.
(Note that if this is dereferenced, it becomes `array N of int',
which in a normal expression is then immediately converted to
`pointer to int'.  `normal' here means `not a target of sizeof'.)
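
One way to see these types at work is to measure how far pointer
arithmetic moves.  The sketch below assumes an ANSI or later compiler;
the sizes M and N and the function names are arbitrary choices made for
the illustration.

```c
#include <stddef.h>

enum { M = 3, N = 5 };

static int b[M][N];

/* Bytes between b and b + 1.  In this expression b has type
   `pointer to array N of int', so one step covers a whole row. */
size_t row_step(void)
{
    return (size_t)((char *)(b + 1) - (char *)b);
}

/* Bytes between *b and *b + 1.  *b is `array N of int', which is
   converted to `pointer to int', so one step covers a single int. */
size_t element_step(void)
{
    return (size_t)((char *)(*b + 1) - (char *)*b);
}

/* sizeof is the exception: its operand is not converted to a pointer,
   so this is the size of a whole row, not of a pointer. */
size_t row_size(void)
{
    return sizeof *b;
}
```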

>is that an array DECLARED in C causes the array name to become
>a CONSTANT.

Not quite, but close.  When used as an rvalue the constant has
type `pointer to' whatever one element of that array might be.

>Whereas the malloc()'ed one is a POINTER VARIABLE.

No, it is a pointer expression, with type `pointer to' whatever
one element of that array might be.  Once it has been assigned to
a pointer variable, then that is indeed a pointer variable.

There are certainly other ways of handling the typing of arrays;
C does it by making arrays second class objects, which is occasionally
regrettable, but not too hard to deal with.

>+---------------
>| Anyhow, the insightful stuff follows:
>| 
>| > BUT:  the arrangement in memory is identical!
>+---------------

The arrangement in memory of any array of any dimension is flat.
`int a[2][5]' is, aside from typing information, identical to
`int a[10]'.  This has never bothered me.
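
A small sketch that checks the flat layout by comparing byte offsets,
assuming an ANSI or later compiler.  The function names are made up for
the example.

```c
#include <stddef.h>

/* Byte offset of a[i][j] from the start of `int a[2][5]'. */
size_t offset_2d(size_t i, size_t j)
{
    static int a[2][5];
    return (size_t)((char *)&a[i][j] - (char *)a);
}

/* Byte offset of flat[k] from the start of `int flat[10]'. */
size_t offset_flat(size_t k)
{
    static int flat[10];
    return (size_t)((char *)&flat[k] - (char *)flat);
}
```

Element a[i][j] sits at the same offset as flat[5*i + j]; only the typing
information differs.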

>The actual problem comes from C's closeness to the machine hardware:
>
>	the malloc()'ed one is type (int *), to the C compiler (to me, int [])

Yes.

>	the declared one is type (int []), to the C compiler

Yes.  Note that the first dimension of the array is unimportant after
allocation, so the type *is* (int []), not (int [5]) or whatnot.
(Again, `sizeof' is peculiar; ignore it.)

>		(which defines (int []) as (int *))

Only in `most places' (this is perhaps what bothers people; `sizeof'
is `peculiar', and so are declarations of formals).

>+---------------
>| "Why isn't the correct type of an int array name (int [])?"
>| 
>| *GOOD* question.  *VERY* good question.  The answer is "just because".
>| Or, if you want to be insulting, because DMR slipped a cog.  This is
>| *THE* *MOST* *CONFUSING* thing about C, by far.  An array name, when
>| evaluated, does *NOT* *NOT* *NOT* yield an array type.  This is the
>| single greatest lunacy of the C language.  It might be argued that this
>| is the *ONLY* great lunacy in C, although the declaration-mirrors-use
>| rule probably ought to be considered a great lunacy as well.  (In case
>| you can't tell, these nasty remarks about array name evaluation in C are
>| my opinions only, and only about 60% serious.  Others differ with me.
>| However, it is objective fact that this one point causes more confusion
>| and error than any other construct in C.  By far.)
>+---------------

Again, it has never bothered me.  Arrays are second class objects;
you cannot quite name one outside a data declaration.  Functions
are likewise second class: a function name, when evaluated, does
not yield a function type, but rather a function pointer.  Lunacy?
I guess you should reserve a place in the nut-house for me (though
it is arguable that at UMCP, I am already there :-) ).
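
The function case can be sketched the same way.  This assumes an ANSI or
later compiler (it uses prototypes); `twice' and the other names are
invented for the example.

```c
/* A trivial function whose name we can evaluate. */
int twice(int x)
{
    return 2 * x;
}

/* Call through a pointer obtained from the bare function name. */
int call_by_name(int x)
{
    int (*fp)(int) = twice;     /* the name yields a pointer to function */
    return fp(x);
}

/* Returns 1 if `twice' and `&twice' yield the same pointer value. */
int name_equals_address(void)
{
    int (*by_name)(int) = twice;
    int (*by_addr)(int) = &twice;
    return by_name == by_addr;
}
```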

Incidentally,

	int (*p)[];

is not really a useful declaration.  Pretend you are a compiler:
tell me how to find p[3][1] (or, if you prefer, (*(p+3))[1]).
Try again with

	int (*p)[5];

and see if that makes a difference.

[answers below]



The rule for pointer addition is `multiply the integer value by
the size (in bytes) of the pointed-to object, then add that to the
address given by the pointer.'  Given `int (*p)[]', we want to find
p[3][1].  This is equivalent to *((*(p+3))+1).  Do the innermost
expression first: p+3.  Following the pointer addition rule, multiply
3 by the size of whatever p points to.  p points to `int []'.  How
big is this?  Got me.  It is *not* (sizeof (int *)).  See what your
compiler says about `sizeof (int [])'.

For `int (*p)[5]' and the same reference, we take the size of
whatever p points to, and p points to `int [5]'.  How big is this?
Well, it depends on your machine, but let us suppose you have a
Vax; we get 5*4 = 20 bytes.  Multiplying by 3 gives 60 bytes.  We
take the address of location (p + 60 bytes), not the contents, as
the type of *(p+3) is `int []' (the first subscript drops out of
any array type), and since this is used in another expression,
convert the type to `int *'.  We now want to add one to this
pointer, so again we follow the pointer addition rule and take the
size of the type of the pointed-to object (int), which is four
bytes, and multiply by 1 (remember, we are now doing
*(<thing> + 1)).  <thing> happens to be (p + 60 bytes), to which
we add 4 bytes.  The location of p[3][1] is thus (p + 64 bytes),
and the type is `int'.  If `p' is a register (call it r11), the
expression

	i = p[3][1];

should compile to

	movl    64(r11),_i

and indeed it does.
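
The pointer addition rule can also be checked without reading assembly.
The sketch below, assuming an ANSI or later compiler, computes the byte
offset of p[3][1] once by hand and once by letting the compiler do it.
The function names and the backing array are made up for the example.

```c
#include <stddef.h>

/* Byte offset of p[3][1] from p, applying the rule by hand. */
size_t offset_by_rule(void)
{
    size_t row = sizeof(int[5]);    /* size of what p points to */
    size_t elem = sizeof(int);      /* size of what *(p + 3) points to */
    return 3 * row + 1 * elem;      /* 16 ints' worth of bytes */
}

/* The same offset as the compiler computes it for `int (*p)[5]'. */
size_t offset_by_compiler(void)
{
    static int storage[4][5];
    int (*p)[5] = storage;
    return (size_t)((char *)&p[3][1] - (char *)p);
}
```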
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

guy@sun.uucp (Guy Harris) (06/29/86)

> The code in question is two analogous sections:
> 
> -------- section 1 ---------
> 
> struct sfld (*__cursf)[] = (struct sfld (*)[]) 0;
> 
> if ((__cursf = (struct sfld (*)[]) calloc(n, sizeof (struct sfld)))
> 	== (struct sfld (*)[]) 0) ...
> 
> ----------------------------
> 
> This was intended to allocate an array and assign it to a variable of type
> ``pointer to array of (struct sfld)''.  I suspect the type is wrong but I'm
> not sure how to declare such a beastie; I suspect that it *does* *not*
> *exist* *at* *all* in C, now that I've played with it.

Wrongo.  "struct sfld (*cursf)[]" *is* a declaration of a pointer to an
array of "struct sfld".  However, it is not possible to generate a value
with that type by taking the address of an object which is an array of
"struct sfld".  You *can* generate a value of that type by using the name of
an array of arrays of "struct sfld"; such a name has the type of a pointer
to an element of that array, and hence the type "pointer to array of 'struct
sfld'".

(By the way, the casts of "0" are not necessary; the compiler knows that the
LHS of the "=" operator in the declaration, and the "==" operator in the
"if", is a pointer, and thus knows that it must coerce the "0" into a null
pointer of the appropriate type.)
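
A short sketch of the same declaration and test with the casts dropped.
The `width' member is a placeholder, since the real contents of struct
sfld are not shown, and the function name is invented.

```c
/* Hypothetical member; only the pointer type matters here. */
struct sfld {
    int width;
};

/* Pointer to array of unknown size of struct sfld, initialized with
   a plain 0, which is converted to a null pointer of that type. */
static struct sfld (*cursf)[] = 0;

/* Returns 1 if cursf is a null pointer; the 0 needs no cast here either. */
int cursf_is_null(void)
{
    return cursf == 0;
}
```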

The "malloc" here *allocates* an *array* of "struct sfld"; however, it
*returns* a pointer to the first element of that array.

> This could easily have been done correctly:
> 
> int array[3];	-- should declare a pointer followed by 3 integers, with the
> 		   pointer initialized to the 3 integers
> int array[];	-- should declare a pointer.

No, NO, *NO*, ***N*O****,


	N     N   OOOOO   !
	NN    N  O     O  !
	N N   N  O     O  !
	N  N  N  O     O  !
	N   N N  O     O  !
	N    NN  O     O
	N     N   OOOOO   !

"int array[3]" does not, and should not, declare any sort of pointer.  It should
reserve storage for three "int"s - PERIOD!  "int array[]" should, if "array"
is initialized, declare an array with as many members as appear in the
initialization; if it's not initialized, it should either be an error or be
considered an "extern" declaration of an array whose size is specified (and
whose storage is reserved) in another module.  The only pointers involved
should be the *constant expression* "array", which has type "pointer to
'int'" when it appears in an expression.  NO storage should be reserved to
hold this "pointer", because no storage NEEDS to be reserved to hold this
pointer - any more than storage needs to be reserved (except, possibly, in
the instruction stream, or maybe in a literal pool) for the "3" in the
expression "x + 3".
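
A minimal sketch of that point.  It assumes an ANSI or later compiler,
since it takes `&array', which compilers of this era reject; the function
names are invented for the example.

```c
#include <stddef.h>

static int array[3];

/* Storage reserved for `array': three ints and nothing else. */
size_t array_bytes(void)
{
    return sizeof array;
}

/* Returns 1 if the array object begins at the address of its first
   element, i.e. no pointer cell is stored in front of the ints. */
int starts_at_first_element(void)
{
    return (void *)&array == (void *)&array[0];
}
```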

> C should treat ``int array[]'' as a different type from ``int *ptr'',

It does.  That's what people have been trying to tell you!

> and while ``int array[3]'' and ``int array[]'' are the same type, the sized
> array's pointer should be treated as a constant.  (This may be arguable.)

Damn straight it's arguable.  NEITHER array has a "pointer" in the sense of
a location of memory which holds a pointer to that array.  The name "array"
is, when used in an expression, a *constant* pointer to the first member of
that array - in *both* cases.

> 	the malloc()'ed one is type (int *), to the C compiler (to me, int [])
> 	the declared one is type (int []), to the C compiler
> 		(which defines (int []) as (int *))

No, it doesn't.  You haven't been listening.  *Start* listening.  To the C
compiler, "int []" declares an array of "int"s, which is normally
implemented as a consecutive block of locations holding "int"s.  However, an
array can *not* be used as an object in an expression.  You can't do array
assignment, you can't add two arrays, you can't pass arrays to functions as
arguments, and you can't have a function which returns an array.  When the
name of an array is used in an expression, it is *reinterpreted* as a
*constant* pointer to the first element of that array.

The "malloc()'ed one" is type "int []"; however, "malloc" returns a pointer
to the first element of that array.  This is not much stranger than

	int *x;
	x = (int *) malloc(sizeof int);

"malloc" can't very well return an "int" here, it can *only* return a
*pointer* to what it has allocated.  You *have* to declare a "pointer to
'int'" here, even though the object which "malloc" has allocated is an
"int", not a "pointer to 'int'".  The same is almost true of arrays, except
that you declare a pointer to an object of type <whatever>, rather than of
type "array of <whatever>", when "malloc"ing an array.

> and they are in fact identical in memory, so the C compiler treats them as
> identical period.

Bullshit.  A pointer to "int" and an array of "int" are in NO WAY identical
in memory.
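
The difference in memory can be sketched as follows, assuming an ANSI or
later compiler.  The variable and function names are made up for the
illustration.

```c
#include <stddef.h>

static int array[8];
static int *pointer = array;    /* a separate cell holding an address */

/* An array of 8 ints occupies 8 ints of storage. */
size_t size_of_array(void)
{
    return sizeof array;
}

/* A pointer occupies one pointer's worth, whatever it points at. */
size_t size_of_pointer(void)
{
    return sizeof pointer;
}

/* Returns 1 if the pointer object lives somewhere other than the
   array it points to. */
int pointer_is_separate_object(void)
{
    return (void *)&pointer != (void *)pointer;
}
```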

> Come to think of it -- can malloc() or similar be typed right anyway?  I
> suspect this is why Pascal uses the ``new(pointer)'' construct, known to the
> compiler; it's type-able at compile time.  But catching the allocation of an
> (int []) (vs. an (int)) from malloc() and forcing the former to be assigned
> to a variable of type (int []) and the latter to an (int *) is nearly
> impossible even when the language considers (int []) and (int *) to be
> different.

No, no, no!  If you "malloc" an array, you don't assign the result of
"malloc" to a variable of type "int []".  What you want is to be able to
assign it to a variable of type "pointer to array of 'int'" and use that
pointer to refer to that array.  If you "malloc" an "int", you don't assign
the result to a variable of type "int", do you?

The problem here is that you don't deal with pointers to arrays in the
following fashion:

	int (*pointer_to_array)[];

	pointer_to_array =
	    (int (*)[]) malloc(number_of_array_elements * sizeof int);
	third_element_of_malloced_array = (*pointer_to_array)[2];

If arrays had been first-class types in C, this would have been how you
would have done it.  Instead, you have to do:

	int *pointer_to_first_element_of_array;

	pointer_to_first_element_of_array =
	    (int *)malloc(number_of_array_elements * sizeof int);
	third_element_of_malloced_array =
	    pointer_to_first_element_of_array[2];
	/* or *(pointer_to_first_element_of_array + 2) */

This is the source of infinite confusion for some C programmers, and I agree
with Wayne that it was, in balance, a mistake.  It *can't* be fixed now,
however fervently one might wish to do so.  It's *too late*.  C is *already
out there*, and changing it now would break too many programs.  If you
change it, you'll have to call the resulting language D (or P).
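
For reference, a sketch of the customary pointer-to-first-element form as
it might look with an ANSI or later compiler, where malloc is declared in
<stdlib.h> and needs no cast.  The function name and its parameters are
invented for the example.

```c
#include <stdlib.h>

/* Allocate n ints and fill them with start, start + 1, ...
   Returns a pointer to the first element, or NULL on failure. */
int *make_sequence(size_t n, int start)
{
    int *first = malloc(n * sizeof *first);
    size_t i;

    if (first == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        first[i] = start + (int)i;      /* same as *(first + i) */
    return first;
}
```

The caller subscripts the returned pointer directly and releases the
storage with free().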
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

guy@sun.uucp (Guy Harris) (06/29/86)

> The problem here is that you don't deal with pointers to arrays in the
> following fashion:
> 
> 	int (*pointer_to_array)[];
> 
> 	pointer_to_array =
> 	    (int (*)[]) malloc(number_of_array_elements * sizeof int);
> 	third_element_of_malloced_array = (*pointer_to_array)[2];

Well, after thinking about it, I realized that you *can* deal with pointers
to arrays in that fashion, if you want.  The following bit of code compiles,
runs, and even passes "lint" (modulo "possible pointer alignment problem"
messages):

	main()
	{
		extern char *malloc();
		register int i;
		register int (*pointer_to_array)[];

		pointer_to_array =
		    (int (*)[]) malloc(3 * sizeof(int));

		for (i = 0; i < 3; i++)
			(*pointer_to_array)[i] = i + 4;

		for (i = 0; i < 3; i++)
			printf("array[%d] = %d\n", i, (*pointer_to_array)[i]);
	}

(Note that I got the syntax of the "sizeof" wrong in my previous article -
it should be "sizeof(int)", not "sizeof int".)

It's not customary to do so, however, and you still can't do

	register int (*pointer_to_array)[];
	int array[666];

	pointer_to_array = &array;

since the compiler will bitch about "&array", and probably ignore the "&"
and treat this as

	pointer_to_array = array;

and then bitch that the LHS is of type "pointer to array of 'int'" and the
RHS is of type "pointer to int".

For similar reasons, this trick won't work for a function which takes an
argument which is a pointer to an array, since "lint" (and the compiler,
once function prototypes are generally available) will complain if you try
to pass an array to that function, for the same reason (think of a function
call as containing assignments of the actual parameters to the formal
parameters).
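
The restriction described above reflects compilers of this era.  Under
the later ANSI C standard, `&array' is accepted and has the type "pointer
to array of N 'int'", so the following sketch should compile cleanly
with an ANSI or later compiler.  COUNT and the function names are
invented for the example.

```c
enum { COUNT = 666 };

/* Takes a pointer to a whole array of COUNT ints. */
int last_element(int (*pointer_to_array)[COUNT])
{
    return (*pointer_to_array)[COUNT - 1];
}

/* &array has type `pointer to array COUNT of int', so it matches the
   parameter type without a cast. */
int demo(void)
{
    static int array[COUNT];

    array[COUNT - 1] = 42;
    return last_element(&array);
}
```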
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)