[comp.lang.c] Help me cast this!: Ultrix 2.x bug

chris@mimsy.UUCP (Chris Torek) (05/04/88)

context:
struct outfile (*output)[] = <<cast>>malloc(sizeof(struct outfile) * 3);
---what to use for <<cast>>:

In article <1451@iscuva.ISCS.COM> carlp@iscuva.ISCS.COM (Carl Paukstis) writes:
>	output = (struct outfile **)malloc(sizeof(struct outfile) * 3);
>
>works on Ultrix 2.something ....

... but only because of a compiler bug.  Obviously DEC have
not installed Guy Harris's fixes to chkpun() in mip/trees.c.

[lint complains]
>"possible pointer alignment problem" ...

4.3BSD `man lint':

	BUGS
	     There are some things you just can't get lint to shut up
	     about.

This is one of them.  malloc() returns a maximally-aligned pointer, but
lint does not know, and cannot be told.  (There may be smarter lints
that can be told; ours can also be improved.  Meanwhile, I tend to run
all lint output through `grep -v "possible pointer alignment problem"'.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

root@mfci.UUCP (SuperUser) (05/05/88)

In article <11344@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>context:
>struct outfile (*output)[] = <<cast>>malloc(sizeof(struct outfile) * 3);
>...
>>	output = (struct outfile **)malloc(sizeof(struct outfile) * 3);
>>
>>works on Ultrix 2.something ....
>
>... but only because of a compiler bug.  Obviously DEC have
>not installed Guy Harris's fixes to chkpun() in mip/trees.c.

Since "type (*)[...]" and "type **" are clearly incompatible types, I
consider the above example to be terrible style.  However, pcc compilers
don't give a warning, and I was once told that Dennis Ritchie considers
it to be perfectly legal C.  So even though I consider it to be brain
damaged to treat * and [] as equivalent other than at the top level, I
am currently under the impression that this is standard K&R C.

From your posting, it sounds like Guy Harris added a warning for this
case, and in fact I noticed that Suns also warn.  I certainly agree with
the motivation for giving a warning, but I'm not convinced that it is
considered correct to do so.

Anyway, here's my personal view of pointers in C:

% cat apt.c
int     a;
int     b[5];
int     c[4][5];
int     d[3][4][5];

int    *pa;
int   (*pb)[5];
int   (*pc)[4][5];
int   (*pd)[3][4][5];

int    *qa;
int    *qb;
int   (*qc)[5];
int   (*qd)[4][5];

int     x;
int    *px;
int   **ppx;
int   (*xx)[10];

main()
{
    a          = 1;
    b[2]       = 2;
    c[2][3]    = 3;
    d[2][3][4] = 4;


    /**
    *** Cumbersome purist style:
    **/

    pa = &a;
    pb = (int (*)[5]) b;            /* &b */
    pc = (int (*)[4][5]) c;         /* &c */
    pd = (int (*)[3][4][5]) d;      /* &d */

    printf("%d %d %d %d\n", *pa, (*pb)[2], (*pc)[2][3], (*pd)[2][3][4]);


    /**
    *** Normal C style:
    **/

    qa = &a;
    qb = b;
    qc = c;
    qd = d;

    printf("%d %d %d %d\n", *qa, qb[2], qc[2][3], qd[2][3][4]);


    /**
    *** Brain damaged style.  I've been told that Ritchie claims this is
    *** legal C.  If so then I consider it to be a bug in the language.
    **/

    pa = &a;
    pb = (int **) b;        /* brain damage */
    pc = (int ***) c;       /* brain damage */
    pd = (int ****) d;      /* brain damage */

    printf("%d %d %d %d\n", *pa, (*pb)[2], (*pc)[2][3], (*pd)[2][3][4]);


    /**
    *** Example of brain damage in action:
    **/

    x   = 123;
    px  = &x;
    ppx = &px;
    xx  = &px;              /* brain damage */

    printf("%d %d %d\n", **ppx, **xx == (int) &x, *(int *)**xx);
}
% cc apt.c -o apt
% apt
1 2 3 4
1 2 3 4
1 2 3 4
123 1 123
%

chris@mimsy.UUCP (Chris Torek) (05/06/88)

In article <386@m3.mfci.UUCP> root@mfci.UUCP (SuperUser) writes:
-... "type (*)[...]" and "type **" are clearly incompatible types....
-However, pcc compilers [without Guy Harris's fix, or equivalent]
-don't give a warning, and I was once told that Dennis Ritchie considers
-it to be perfectly legal C.

Told by whom?

-So even though I consider it to be brain damaged to treat * and []
-as equivalent other than at the top level, I am currently under the
-impression that this is standard K&R C.

I think not.  It is certainly not true in dpANS C.

root@mfci.UUCP (SuperUser) (05/10/88)

In article <11371@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <386@m3.mfci.UUCP> root@mfci.UUCP (SuperUser) writes:
>-... "type (*)[...]" and "type **" are clearly incompatible types....
>-However, pcc compilers [without Guy Harris's fix, or equivalent]
>-don't give a warning, and I was once told that Dennis Ritchie considers
>-it to be perfectly legal C.
>
>Told by whom?

By Bjarne Stroustrup, whom I assume simply asked him.  This was the
result of a mail conversation I was having with him several years ago
over what &a should mean when a is an array.  Of course, &a is not
legal K&R C, but Bjarne thought it should be treated just like a,
i.e., that &a should yield a pointer to the first element of a.  I
argued that this was inconsistent and illogical, that if &a were legal
then it should obviously yield a pointer to a itself, not merely its
first element.  That is to say, it should have the same value as the
other form, but a different type.  Here is the example I used, which
I maintained was perfectly reasonable:

    int a[5][7];
    int (*p)[5][7];

    p = &a;

    (*p)[2][4] = 123;

This can still be accomplished in K&R C, but it is necessary to write the
assignment to p as:

    p = (int (*)[5][7]) a;

My point was that if a has type TYPE [...], then &a should have type
TYPE (*)[...] and should be equivalent to ((TYPE (*)[...]) a), so
that *&a has type TYPE [...], rather than type TYPE as he would have
had it.

Anyway, in a postscript to a reply to one of my many messages on the
subject, he wrote the following.

    PS Dennis claims that this is C:
    main()
    {
            int a[5][7] ;
            int (*p)[5][7];
            p = (int***) a;                         /* no & */
            printf("a %d p %d *p %d\n",a,p,*p);     /* a == p == *p !!! */
            (*p)[2][4] = 123 ;
            printf("%d\n",a[2][4]);                 /* 123 */
    }
    It works! Amazing!

guy@gorodish.Sun.COM (Guy Harris) (05/11/88)

First: mail to "root" at "mfci" failed, so I'll post this; could the person
maintaining netnews at Multiflow please try to arrange that not all messages
from there have a "From:" line listing "root@mfci.UUCP" as the poster?  The
*real* poster's name appears in the "Reply-To:" line, so the information *is*
available.

Second:

> >-However, pcc compilers [without Guy Harris's fix, or equivalent]
> >-don't give a warning, and I was once told that Dennis Ritchie considers
> >-it to be perfectly legal C.

> >Told by whom?
> 
> By Bjarne Stroustrup, whom I assume simply asked him.  This was the
> result of a mail conversation I was having with him several years ago
> over what &a should mean when a is an array.  Of course, &a is not
> legal K&R C, but Bjarne thought it should be treated just like a,
> i.e., that &a should yield a pointer to the first element of a.

This is *not* the same as the "it" referred to above, which is an assignment of
a value of type "struct outfile **" to a variable of type
"struct outfile (*)[]".  The latter is not valid; "array of <type>" and
"pointer to <type>" are inequivalent types, and therefore "pointer to array
of <type>" and "pointer to pointer to <type>" are inequivalent types.  I would
be *EXTREMELY* surprised if Dennis Ritchie felt they were equivalent.

If it's not clear why they must be inequivalent, here's a specific example.
Consider that, if all pointers are represented in a particular implementation
as pointers to the first byte of an object, then

	p++

causes the address contained in "p" to be incremented by the size of the
object.  Now, the size of "struct outfile *" might, say, be 4 on a machine
with 8-bit bytes and 32-bit pointers.  However, the size of
"struct outfile [23]", for example, is 23*sizeof (struct outfile), and the
size of "struct outfile []" is unknown (in effect, zero).

As such, if you have:

	struct outfile **p;
	struct outfile (*output)[];

on an implementation of the sort listed above, with 8-bit bytes and 32-bit
pointers, the expression "p++" will increment the address contained in "p" by 4
bytes (as "sizeof (struct outfile *)" is 4) and the expression "output++" will
probably elicit a complaint from the compiler (as
"sizeof (struct outfile [])" is unknown).

In (old) K&R C (the new K&R presumably describes the ANSI rules), it is
considered incorrect to put "&" before an array or function.  In almost all
contexts, an expression of type "array of <type>" or "function returning
<type>" is converted to type "pointer to <type>" or "pointer to function
returning <type>".  The pointer-valued expressions in question are not lvalues,
and thus cannot be preceded with "&", just as you can't say "&3".  Some
*compilers* permit an "&" to be placed before expressions of this type, and
treat it as redundant.

Some compilers also appear to consider "pointer to <type>" and "array of
<type>" to be equivalent.  Unfortunately, this causes some invalid programs to
compile without complaint; those programs fail later.  In fact, one such
program *did* fail; somebody posted something to "comp.lang.c" about it
(actually to "net.lang.c", if I remember correctly, which indicates how long
ago this was!), which was what got me to look for and find the PCC bug in
question.

> PS Dennis claims that this is C:
> main()
> {
> 	int a[5][7] ;
> 	int (*p)[5][7];
>
> 	p = (int***) a; /* no & */
> 	printf("a %d p %d *p %d\n",a,p,*p); /* a == p == *p !!!  */
> 	(*p)[2][4] = 123 ;
>	printf("%d\n",a[2][4]); /* 123 */
> }
> It works!  Amazing!

"a" is of type "array of 5 arrays of 7 'int's".  "p" is of type "pointer to
array of 5 arrays of 7 'int's."  There is no way in K&R C to type-correctly
assign a pointer to "a" to "p".  The "(int ***)" cast is incorrect; "p" does
*NOT* have type "pointer to pointer to pointer to 'int'."

The fact that "a", "p", and "*p" all yield the same value should not be
surprising.  In almost all contexts, an array-valued expression is converted to
a pointer to the first element of the array.  "*p" is an array-valued
expression and gets so converted; in effect, "*p" is equivalent to "a" in
almost all contexts.  The fact that the expressions "a" and "p" have the same
numeric value is a consequence of the fact that *most* C implementations
represent pointers by the address of the first addressable unit of the object
pointed to.  As such, the addresses represented by "a" and "p" are the same.

If Dennis considers the above valid C, either by K&R rules or by ANSI C rules,
I would like to see his reasoning.  Everything *except for* the
"p = (int ***)a" is valid K&R C and valid ANSI C.  (Actually, if one wants to
be *extremely* fussy, one can complain about:

	the "printf" - there is no guarantee in K&R that *any* particular
	"printf" format specifier can be used to print any particular pointer,
	and ANSI C guarantees only that "%p" can be used to print "void *";

	the lack of certain #includes, such as "#include <stdio.h>";

	the lack of declaration of arguments for "main()";

but none of those are germane to this particular discussion.)

The following *would* be valid K&R C (modulo the other stuff):

main()
{
	int a[5][7];
	int (*p)[7];

	p = a; /* no &, no cast */
	printf("a %d p %d *p %d\n",a,p,*p); /* a == p == *p !!!  */
	p[2][4] = 123;
	printf("%d\n",a[2][4]); /* 123 */
}

Note that "p" is of type "pointer to array of 7 'int's."  "a" is of type "array
of 5 arrays of 7 'int's."  In most contexts, the expression "a" is converted to
a pointer to the first element of "a"; this first element is of type "array of
7 'int's," so a pointer to it is of type "pointer to array of 7 'int's," which
is the same type as "p".

The above is also valid ANSI C.  The following would be valid ANSI C (modulo
the other stuff), but not valid K&R C:

main()
{
	int a[5][7];
	int (*p)[5][7];

	p = &a; /* no cast */
	printf("a %d p %d *p %d\n",a,p,*p);
		/* "a", "p", and "*p" have the same numeric value */
		/* however, "p" and "*p" are *NOT* equivalent */
		/* "p" is a pointer to "a", "*p" is a pointer to "a[0]" */
	(*p)[2][4] = 123;
	printf("%d\n",a[2][4]); /* 123 */
}

chris@mimsy.UUCP (Chris Torek) (05/11/88)

>>In article <386@m3.mfci.UUCP> root@mfci.UUCP (SuperUser) wrote:
>>>... "type (*)[...]" and "type **" are clearly incompatible types....
>>>However, pcc compilers [without Guy Harris's fix, or equivalent]
>>>don't give a warning, and I was once told that Dennis Ritchie considers
>>>it to be perfectly legal C.

>In article <11371@mimsy.UUCP> I asked:
>>Told by whom?

In article <392@m3.mfci.UUCP> root@mfci.UUCP (SuperUser) answered:
>By Bjarne Stroustrup, whom I assume simply asked him.

Well, at least we have Big Names :-) .  Perhaps Dennis has changed his
mind since then, or maybe we just disagree.

[rest deleted; read the parent article.]

In any case, the draft proposed standard makes `type (*)[N]' and
`type **' different types, and treats &array (where array is declared
with `type array[N]') as a value of type `type (*)[N]', which is the
way root@mfci.UUCP and I both think things should be.

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/11/88)

There is an argument for allowing &a.  If you have a typedef whose inner
workings the user should not need to know about, the user would otherwise
have to know whether the typedef is an array, just to prevent warnings.

Consider:
  typedef int ary[10];		/* system dependent header	*/

  ary mystuff;			/* user declaration		*/
  do_init(&mystuff);		/* warning here			*/

If the user just uses 'mystuff' without the '&', the program must be
rewritten if the internal structure of type 'ary' changes. The solution
is to force a structure, thus:
  typedef struct { int vect[10]; } ary;

Then either the item or the address can be used, and only the procedures
which work with the contents of the structure need to know its contents.

-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

karzes@mfci.UUCP (Tom Karzes) (05/12/88)

In article <52684@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>First: mail to "root" at "mfci" failed, so I'll post this; could the person
>maintaining netnews at Multiflow please try to arrange that not all messages
>from there have a "From:" line listing "root@mfci.UUCP" as the poster?  The
>*real* poster's name appears in the "Reply-To:" line, so the information *is*
>available.

Unfortunately, no one here is responsible for maintaining netnews.  However,
the problem is that the From field isn't being set by the machine on which
the message originates, so it ends up being set to root.  In theory this
won't happen if the From field is set manually.  If I ever have the time I'll
probably fix it, but until then we'll see if this works.

>This is *not* the same as the "it" referred to above, which is an assignment
>of a value of type "struct outfile **" to a variable of type
>"struct outfile (*)[]".  The latter is not valid; "array of <type>" and
>"pointer to <type>" are inequivalent types, and therefore "pointer to array
>of <type>" and "pointer to pointer to <type>" are inequivalent types.  I would
>be *EXTREMELY* surprised if Dennis Ritchie felt they were equivalent.

This is a slight oversimplification.  In many contexts, "array of <type>"
will be converted to "pointer to <type>", so that things like the following
are legal:

    int     a[10];
    int    *p;

    p = a;

The types "int *" and "int [10]" are inequivalent, but in the above context
"int [10]" is coerced to "int *".  The point of contention is whether similar
coercion should take place at deeper levels, as in:

    int   (*p)[10];
    int   **q;

    p = q;

This is obviously bad because p requires one level of physical indirection
while q requires two levels of physical indirection, although both contain
two levels of "virtual" indirection.  I don't like it any more than you
or Chris do, and if it were up to me I'd make it illegal.  The only
reasons I can cite for considering it legal C are as follows:

    1.  Code such as this has historically compiled without warning in
        pcc-derived compilers, which inevitably leads to people using
        (type **) as a cast where they should really use (type (*)[N]),
        because they didn't know the correct way to form the cast.

    2.  Bjarne once told me that Dennis told him that this was C.  I have
        no knowledge of this other than what I sent in my previous posting.

In spite of this, I still think it's probably worth giving a warning,
since it's most likely a bug if someone writes such code.  If it's not
a bug, then they either used the wrong cast or were just lazy.

>If it's not clear why they must be inequivalent, here's a specific example.
[deleted example]

Yes, it's clear why they should be incompatible types.  It always has been.

throopw@xyzzy.UUCP (Wayne A. Throop) (05/14/88)

> guy@gorodish.Sun.COM (Guy Harris)
>> PS Dennis claims that this is C:
>> main()
>> {
>> 	int a[5][7] ;
>> 	int (*p)[5][7];
>>
>> 	p = (int***) a; /* no & */
>> 	printf("a %d p %d *p %d\n",a,p,*p); /* a == p == *p !!!  */
>> 	(*p)[2][4] = 123 ;
>>	printf("%d\n",a[2][4]); /* 123 */
>> }
>> It works!  Amazing!
> The "(int ***)" cast is incorrect; "p" does
> *NOT* have type "pointer to pointer to pointer to 'int'."

Guy, of course, is correct.  But I'd like to elaborate on this central
point, and explain it slightly differently than does Guy.

On a system where pointers all have the same format regardless of type,
the assignment of p would work as outlined above if it were rendered in
ANY of these ways:

        p = (int *)a;
        p = (int **)a;
        p = (int ****)a;
        p = (char *)a;
        p = (void *)a;
        p = a;

On many systems it would even work as

        p = (int)a;
        p = (long)a;

and so on and on, because everything that follows the assignment depends
upon the type of p AND NOT on the type of the expression assigned to p.

In particular, think of what would happen if you were to replace the "p"
in the assignment 

        (*p)[2][4] = 123;

with the supposedly identical ((int ***)a).  That is, what would you
expect to happen in this program:


main(){
    int a[5][7] ;
    (*((int ***)a))[2][4] = 123 ;
    printf("%d\n",a[2][4]);
}

Careful study of this altered example should convince you that (int ***)
is not anything NEAR the same type as (int (*)[5][7]).

--
You will not see a monster {at Loch Ness}
just as millions before you have not.
                        --- Charles Kuralt
-- 
Wayne Throop      <the-known-world>!mcnc!rti!xyzzy!throopw