[net.unix] My pointer stuff: C caught me again

allbery@ncoast.UUCP (Brandon Allbery) (06/28/86)

Quoted from <418@dg_rtp.UUCP> ["Re: Pointers vs. arrays:  another dumb question..."], by throopw@dg_rtp.UUCP (Wayne Throop)...
+---------------
| > Okay, I've another dumb question for everyone:
| 
| A-Hah!  Not a dumb question at all!  It is a question that cuts right to
| the center of the confusion about pointers in C.
+---------------

Believe me, I'm aware of it.  Anyone have a PD Modula 2 compiler for a 68000?

+---------------
| There are two specific casts that are asked about, (struct foo (*)[])
| and (struct foo *).  As I understand it, the question is, where should
| each be used, and why.  No specific examples of the use of either cast
| are given, but I conjecture from previous postings that the first cast
| is used like so:
| 
|     int (*a[N])[];
|     a[M] = (int (*)[]) malloc( (unsigned) O*sizeof(int) );
| 
| This is, of course, correct.  The second cast's use is much more
| obscure... the only thing said about it is:
| 
| > But, if it's in initialized data, you can't do it that way:
| > you can't take a ``pointer to an array''.
| > So the cast is: (struct foo *)
| 
| Lacking an example of where this cast must be used, some questions
| immediately spring to mind.  What "it" is in initialized data?  The
| array or one of the arrays pointed to?  What "it" is it that
| "can't be done that way".  The cast?  The assignment (or maybe an
| initialization)?  The malloc?  Finally, what "way" is being talked about
| here?  The format of the assignment/initialization?  The method of
| allocating storage?  Zen Buddhism?
+---------------

I may have messed up, but C's damnable pointer/array stuff has me so confused
I don't know for sure.  The basic idea as I understand it is the C array
versus a malloc'ed one.

The code in question is two analogous sections:

-------- section 1 ---------

struct sfld (*__cursf)[] = (struct sfld (*)[]) 0;

if ((__cursf = (struct sfld (*)[]) calloc(n, sizeof (struct sfld)))
	== (struct sfld (*)[]) 0) ...

----------------------------

This was intended to allocate an array and assign it to a variable of type
``pointer to array of (struct sfld)''.  I suspect the type is wrong but I'm not
sure how to declare such a beastie; I suspect that it *does* *not* *exist*
*at* *all* in C, now that I've played with it.

The other section looks like this:

-------- section 2 ---------

struct menu {
	int m_rec;
	struct cmd *m_cmd;
};

struct menu cmdtab[] = {
	orders,		ocmdarr,
	customer,	ccmdarr,
	-1,		(struct cmd *) 0,
};

----------------------------

The dichotomy between these otherwise identical sections (as far as the
``pointer to an array'' is concerned) is that an array DECLARED in C causes
the array name to become a CONSTANT.  Whereas the malloc()'ed one is a POINTER
VARIABLE.  This could easily have been done correctly:

int array[3];	-- should declare a pointer followed by 3 integers, with the
		   pointer initialized to the 3 integers
int array[];	-- should declare a pointer.

The ``pointers'' I am talking about here are the assembly-language constructs;
C should treat ``int array[]'' as a different type from ``int *ptr'', and
while ``int array[3]'' and ``int array[]'' are the same type, the sized
array's pointer should be treated as a constant.  (This may be arguable.)
BTW, the (struct foo (*)[]) was confusion on my part; it's just plain wrong
for what I was doing.

I have become thoroughly sick of C pointers-vs.-arrays.  Anyone with a
replacement?  If this continues I may go back to programming in BASIC (yes,
it's THAT bad!).

+---------------
| Anyhow, the insightful stuff follows:
| 
| > BUT:  the arrangement in memory is identical!
+---------------

Not for that cast it wasn't.  The actual problem comes from C's closeness to
the machine hardware:

	the malloc()'ed one is type (int *), to the C compiler (to me, int [])
	the declared one is type (int []), to the C compiler
		(which defines (int []) as (int *))
	--btw, what REALLY threw me was the idea of a cast to (int []) --
		huh?  I wholeheartedly agree with your flame re: declaration-
		mirrors-use; that cast is ridiculous!  ((int (*)[]) is worse!)

and they are in fact identical in memory, so the C compiler treats them as
identical period.  Boo hiss; just because on my computer a (long) is the same
size as an (int) doesn't mean I can mix them with impunity.  C (and, more
importantly, lint) deals with (long)->(int)->(short), but NOWHERE is there a
utility to catch misuse of * and [].

+---------------
| "Why isn't the correct type of an int array name (int [])?"
| 
| *GOOD* question.  *VERY* good question.  The answer is "just because".
+---------------

AMEN, HALLELUJAH!!!

+---------------
| Or, if you want to be insulting, because DMR slipped a cog.  This is
| *THE* *MOST* *CONFUSING* thing about C, by far.  An array name, when
| evaluated, does *NOT* *NOT* *NOT* yield an array type.  This is the
| single greatest lunacy of the C language.  It might be argued that this
| is the *ONLY* great lunacy in C, although the declaration-mirrors-use
| rule probably ought to be considered a great lunacy as well.  (In case
| you can't tell, these nasty remarks about array name evaluation in C are
| my opinions only, and only about 60% serious.  Others differ with me.
| However, it is objective fact that this one point causes more confusion
| and error than any other construct in C.  By far.)
+---------------

This one feature is the only one that has me posting confusing (and wrong)
ideas about C on the net.  Abolish it.  If I had lint source I would change it
to force arrays to be (int []) and pointers (int *); of course, malloc() would
have to be ``known'' for this to work, so the size allocated could be checked
and the correct type assigned.  (malloc(sizeof int) shouldn't have to be cast
to (int []), since it's valid for ``int *foo''.)  Meaning, not possible.  C
loses again.  (HELP!)

+---------------
| > That
| > would make much more clear the meaning of the pointer, and would avoid many of
| > the pointer-vs.-array confusions.
| 
| Yes, yes, yes!!!  However, the fact that array names evaluate to the
| address of the first element in the array means that the types "pointer
| to foo" and "pointer to array of foo" *must* indicate the same storage
| layout in C, and this glitch is so deeply ingrained in C that to "fix"
| it would simply yield a new language, not a better C language.  Note
| that this glitch, coupled with the definition of subscripting in terms
| of pointer arithmetic, makes the type "pointer to foo" an irresistibly
| convenient near-synonym for "pointer to unknown sized array of foo", and
| thus nearly everybody uses the simpler form.
+---------------

I'm in the process of rewriting programs to use [] where [] is meant and *
where * is meant.

Come to think of it -- can malloc() or similar be typed right anyway?  I
suspect this is why Pascal uses the ``new(pointer)'' construct, known to the
compiler; it's type-able at compile time.  But catching the allocation of an
(int []) (vs. an (int)) from malloc() and forcing the former to be assigned to
a variable of type (int []) and the latter to an (int *) is nearly
impossible even when the language considers (int []) and (int *) to be
different.

Chuck it out & start over, please!

--Brandon (confusion (*)[])
-- 
ihnp4!sun!cwruecmp!ncoast!allbery ncoast!allbery@Case.CSNET ncoast!tdi2!brandon
(ncoast!tdi2!root for business) 6615 Center St. #A1-105, Mentor, OH 44060-4101
Phone: +01 216 974 9210      CIS 74106,1032      MCI MAIL BALLBERY (part-time)

guy@sun.uucp (Guy Harris) (06/29/86)

> The code in question is two analogous sections:
> 
> -------- section 1 ---------
> 
> struct sfld (*__cursf)[] = (struct sfld (*)[]) 0;
> 
> if ((__cursf = (struct sfld (*)[]) calloc(n, sizeof (struct sfld)))
> 	== (struct sfld (*)[]) 0) ...
> 
> ----------------------------
> 
> This was intended to allocate an array and assign it to a variable of type
> ``pointer to array of (struct sfld)''.  I suspect the type is wrong but I'm
> not sure how to declare such a beastie; I suspect that it *does* *not*
> *exist* *at* *all* in C, now that I've played with it.

Wrongo.  "struct sfld (*cursf)[]" *is* a declaration of a pointer to an
array of "struct sfld".  However, it is not possible to generate a value
with that type by taking the address of an object which is an array of
"struct sfld".  You *can* generate a value of that type by using the name of
an array of arrays of "struct sfld"; such a name has the type of a pointer
to an element of that array, and hence the type "pointer to array of 'struct
sfld'".

(By the way, the casts of "0" are not necessary; the compiler knows that the
LHS of the "=" operator in the declaration, and the "==" operator in the
"if", is a pointer, and thus knows that it must coerce the "0" into a null
pointer of the appropriate type.)

The "malloc" here *allocates* an *array* of "struct sfld"; however, it
*returns* a pointer to the first element of that array.

> This could easily have been done correctly:
> 
> int array[3];	-- should declare a pointer followed by 3 integers, with the
> 		   pointer initialized to the 3 integers
> int array[];	-- should declare a pointer.

No, NO, *NO*, ***N*O****,


	N     N   OOOOO   !
	NN    N  O     O  !
	N N   N  O     O  !
	N  N  N  O     O  !
	N   N N  O     O  !
	N    NN  O     O
	N     N   OOOOO   !

"int array[3]" does not, and should not, declare any sort of pointer.  It should
reserve storage for three "int"s - PERIOD!  "int array[]" should, if "array"
is initialized, declare an array with as many members as appear in the
initialization; if it's not initialized, it should either be an error or be
considered an "extern" declaration of an array whose size is specified (and
whose storage is reserved) in another module.  The only pointers involved
should be the *constant expression* "array", which has type "pointer to
'int'" when it appears in an expression.  NO storage should be reserved to
hold this "pointer", because no storage NEEDS to be reserved to hold this
pointer - any more than storage needs to be reserved (except, possibly, in
the instruction stream, or maybe in a literal pool) for the "3" in the
expression "x + 3".

> C should treat ``int array[]'' as a different type from ``int *ptr'',

It does.  That's what people have been trying to tell you!

> and while ``int array[3]'' and ``int array[]'' are the same type, the sized
> array's pointer should be treated as a constant.  (This may be arguable.)

Damn straight it's arguable.  NEITHER array has a "pointer" in the sense of
a location of memory which holds a pointer to that array.  The name "array"
is, when used in an expression, a *constant* pointer to the first member of
that array - in *both* cases.

> 	the malloc()'ed one is type (int *), to the C compiler (to me, int [])
> 	the declared one is type (int []), to the C compiler
> 		(which defines (int []) as (int *))

No, it doesn't.  You haven't been listening.  *Start* listening.  To the C
compiler, "int []" declares an array of "int"s, which is normally
implemented as a consecutive block of locations holding "int"s.  However, an
array can *not* be used as an object in an expression.  You can't do array
assignment, you can't add two arrays, you can't pass arrays to functions as
arguments, and you can't have a function which returns an array.  When the
name of an array is used in an expression, it is *reinterpreted* as a
*constant* pointer to the first element of that array.

The "malloc()'ed one" is type "int []"; however, "malloc" returns a pointer
to the first element of that array.  This is not much stranger than

	int *x;
	x = (int *) malloc(sizeof int);

"malloc" can't very well return an "int" here, it can *only* return a
*pointer* to what it has allocated.  You *have* to declare a "pointer to
'int'" here, even though the object which "malloc" has allocated is an
"int", not a "pointer to 'int'".  The same is almost true of arrays, except
that you declare a pointer to an object of type <whatever>, rather than of
type "array of <whatever>", when "malloc"ing an array.

> and they are in fact identical in memory, so the C compiler treats them as
> identical period.

Bullshit.  A pointer to "int" and an array of "int" are in NO WAY identical
in memory.

> Come to think of it -- can malloc() or similar be typed right anyway?  I
> suspect this is why Pascal uses the ``new(pointer)'' construct, known to the
> compiler; it's type-able at compile time.  But catching the allocation of an
> (int []) (vs. an (int)) from malloc() and forcing the former to be assigned
> to a variable of type (int []) and the latter to an (int *) is nearly
> impossible even when the language considers (int []) and (int *) to be
> different.

No, no, no!  If you "malloc" an array, you don't assign the result of
"malloc" to a variable of type "int []".  What you want is to be able to
assign it to a variable of type "pointer to array of 'int'" and use that
pointer to refer to that array.  If you "malloc" an "int", you don't assign
the result to a variable of type "int", do you?

The problem here is that you don't deal with pointers to arrays in the
following fashion:

	int (*pointer_to_array)[];

	pointer_to_array =
	    (int (*)[]) malloc(number_of_array_elements * sizeof int);
	third_element_of_malloced_array = (*pointer_to_array)[2];

If arrays had been first-class types in C, this would have been how you
would have done it.  Instead, you have to do:

	int *pointer_to_first_element_of_array;

	pointer_to_first_element_of_array =
	    (int *)malloc(number_of_array_elements * sizeof int);
	third_element_of_malloced_array =
	    pointer_to_first_element_of_array[2];
	/* or *(pointer_to_first_element_of_array + 2) */

This is the source of infinite confusion for some C programmers, and I agree
with Wayne that it was, on balance, a mistake.  It *can't* be fixed now,
however fervently one might wish to do so.  It's *too late*.  C is *already
out there*, and changing it now would break too many programs.  If you
change it, you'll have to call the resulting language D (or P).
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

guy@sun.uucp (Guy Harris) (06/29/86)

> The problem here is that you don't deal with pointers to arrays in the
> following fashion:
> 
> 	int (*pointer_to_array)[];
> 
> 	pointer_to_array =
> 	    (int (*)[]) malloc(number_of_array_elements * sizeof int);
> 	third_element_of_malloced_array = (*pointer_to_array)[2];

Well, after thinking about it, I realized that you *can* deal with pointers
to arrays in that fashion, if you want.  The following bit of code compiles,
runs, and even passes "lint" (modulo "possible pointer alignment problem"
messages):

	main()
	{
		extern char *malloc();
		register int i;
		register int (*pointer_to_array)[];

		pointer_to_array =
		    (int (*)[]) malloc(3 * sizeof(int));

		for (i = 0; i < 3; i++)
			(*pointer_to_array)[i] = i + 4;

		for (i = 0; i < 3; i++)
			printf("array[%d] = %d\n", i, (*pointer_to_array)[i]);
	}

(Note that I got the syntax of the "sizeof" wrong in my previous article -
it should be "sizeof(int)", not "sizeof int".)

It's not customary to do so, however, and you still can't do

	register int (*pointer_to_array)[];
	int array[666];

	pointer_to_array = &array;

since the compiler will bitch about "&array", and probably ignore the "&"
and treat this as

	pointer_to_array = array;

and then bitch that the LHS is of type "pointer to array of 'int'" and the
RHS is of type "pointer to int".

For similar reasons, this trick won't work for a function which takes an
argument which is a pointer to an array, since "lint" (and the compiler,
once function prototypes are generally available) will complain if you try
to pass an array to that function, for the same reason (think of a function
call as containing assignments of the actual parameters to the formal
parameters).
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)