[net.lang.c] C'mon, guys!

allbery@ncoast.UUCP (Brandon Allbery) (05/25/86)

Quoted from <357@dg_rtp.UUCP> ["Re: allocating arrays"], by throopw@dg_rtp.UUCP (Wayne Throop)...
+---------------
| > david@ukma.UUCP (David Herron)
| >> throopw@dg_rtp.UUCP (Wayne Throop)
| >>> allbery@ncoast.UUCP (Brandon Allbery)
| >>> | ???
| >>> | Consider an array of 15 pointers to arrays of doubles:
| >>> |     double (*parray[15])[];
| 
| >>> double (*parray[15])[]; means:
| >>>       an indefinitely-sized array of (or a pointer to)
| >>>               an array of 15
| >>>                       (double *)
| 
| >>Wrong.  It means just what the original poster said it meant.  It is an
| >>array of 15 pointers to arrays of double.
| 
| > I respectively submit that you're full of it and that Brandon is correct.
| 
| I exasperatedly submit that I'm quite empty of it, and that Brandon is
| as wrong as wrong can be.  Also, after saying I'm wrong, your examples
| go on to support my position strongly, so you've succeeded in puzzling
| me no end, as well as prolonging this discussion beyond its natural
| span.
+---------------

I concede.  But it wasn't pointer-vs.-array that threw me; f[] and *f are
identical, whereas f[5] and *f are NOT and neither are f[] = {3} and *f.
What threw me was getting my insides and outsides confused.  C declarations
are giving me gray hairs!  Anyone for Modula-2?

The declaration is correct, the cast should be to (double **), and MSC is as
screwed up as everything else I've ever seen from Microsoft.  (So what's
new?)  I'm interested in knowing why your sys5 passed it without an illegal
pointer combo message, though.
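
For reference, a minimal sketch of the declaration under discussion, assuming an ANSI-style compiler. The helper names parray_slots and first_of_slot0, and the local array row, are made up for illustration and are not from the original posting. It only shows that parray is one array of 15 pointers, each of which points at an array of double:

```c
#include <stddef.h>

double (*parray[15])[];      /* array of 15 pointers to arrays of double */

/* Number of pointer slots in parray (hypothetical helper). */
size_t parray_slots(void)
{
    return sizeof parray / sizeof parray[0];
}

/* Point slot 0 at a real array and read its first element back. */
double first_of_slot0(void)
{
    static double row[4] = {1.5, 2.5, 3.5, 4.5};

    parray[0] = &row;        /* &row is a pointer to an array of double */
    return (*parray[0])[0];  /* follow the pointer, then subscript */
}
```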

--Brandon
-- 
decvax!cwruecmp!ncoast!allbery  ncoast!allbery@Case.CSNET  ncoast!tdi2!brandon
(ncoast!tdi2!root for business) 6615 Center St. #A1-105, Mentor, OH 44060-4101
Phone: +01 216 974 9210      CIS 74106,1032      MCI MAIL BALLBERY (part-time)
PC UNIX/UNIX PC - which do you like best?  See <1129@ncoast.UUCP> in net.unix.

guy@sun.UUCP (05/27/86)

> I'm interested in knowing why your sys5 passed it without an illegal
> pointer combo message, though.

I don't know what "it" refers to here, but if "it" is an assignment of a
pointer to an array of X to a pointer to a pointer to X, or something like
that, the reason why the PCC in question (and most other PCCs) passed it is
that PCC has a bug in it.  The type checking code in "chkpun" is completely
wrong.  It treats pointers and arrays as equivalent everywhere.  See
net.bugs.v7/net.bugs.2bsd/net.bugs.4bsd and net.bugs.usg for fixes.
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.arpa

greg@utcsri.UUCP (Gregory Smith) (05/27/86)

In article <1194@ncoast.UUCP> allbery@ncoast.UUCP (Brandon Allbery) writes:
>I concede.  But it wasn't pointer-vs.-array that threw me; f[] and *f are
>identical, whereas f[5] and *f are NOT and neither are f[] = {3} and *f.
 ^^^^^^^^^
Wrong. They shouldn't be, anyway.

Let's shed a little reality on this:
	int nul_ary[];
	int *p;
	main(){
		int i;
		i = nul_ary[0];
		i = p[0];
	}
------------------ output: ( vax, trimmed )----
	.data
	.comm	_nul_ary,0	; 0 bytes for nul_ary
	.comm	_p,4		; 4 bytes for the pointer
	.text
....
	movl	_nul_ary,-4(fp)	; get i = word @ _nul_ary
	movl	_p,r0		; get p in r0
	movl	(r0),-4(fp)	; get i = word pointed to by p
----------------------
nul_ary is treated as an array that happens to have 0 elements, so
*any* subscript is out of range. It can be seen that `i=nul_ary[0]'
will actually do i=(int)p, since the assembler symbols '_nul_ary' and
'_p' are at the same address. Like I said, out of range.

All four compilers I tried behaved the same way (tho I think 3 are pcc-based).

There are some compilers that represent ( in the symbol table ) an array in
the same way as a pointer. The two are distinguished by setting the array
size to 0 to indicate a pointer. After all, pointers don't have array sizes,
and *nobody* uses 0-sized arrays, right? *wrong*. This is the 'clean' way
of declaring an external array of unspecified size:

extern int array[];	/* or array[0] */

So some compilers will 'accidentally' give you a pointer for a
declaration like this, simply because the dimension in the symbol table is
set to zero, and the resulting entry is the same as that for a pointer.
I.e. The generated code will behave as if you had said `extern int *array;'
Obviously, if the external variable is an array, it ain't gonna woik.

If your compiler produces the same code for both variables in the above
sample prog ( with p and nul_ary ), then it is suffering from this bug.
This should not be construed as a language feature, it is an implementation
botch. Some compilers may get the storage declaration right, and then
get the code generation wrong, so check both.

The only compiler that I know for sure has this botch is C/80 for 8080,
which is not full C anyway. From the way this discussion has been going
there must be others. So if you still think f[] and *f are the same, please
check the generated code. Let me/us know about compilers with this bug.

BTW, if you say f[]={3,4}; the size of the array is set to 2 elements by
counting initializers, so there will be no problem.
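
A minimal sketch of that last point, assuming an ANSI-style compiler; the helper name f_count is made up for illustration:

```c
#include <stddef.h>

/* The size is fixed at 2 by counting the initializers. */
static int f[] = {3, 4};

/* Number of elements in f, worked out by the compiler. */
size_t f_count(void)
{
    return sizeof f / sizeof f[0];
}
```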
-- 
"We demand rigidly defined areas of doubt and uncertainty!" - Vroomfondel
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg

throopw@dg_rtp.UUCP (Wayne Throop) (05/30/86)

> allbery@ncoast.UUCP (Brandon Allbery)

> I concede.  But it wasn't pointer-vs.-array that threw me; f[] and *f are
> identical, whereas f[5] and *f are NOT and neither are f[] = {3} and *f.
> What threw me was getting my insides and outsides confused.  C declarations
> are giving me gray hairs!  Anyone for Modula-2?

I'm for Modula-2 too.  But you don't concede enough, apparently, since
you still think that f[] and *f are the same thing, which they are not,
and you go on to say:

> The declaration is correct, the cast should be to (double **), and MSC is as
> screwed up as everything else I've ever seen from Microsoft.  (So what's
> new?)  I'm interested in knowing why your sys5 passed it without an illegal
> pointer combo message, though.

Which is WRONG, WRONG, WRONG.  The cast should be (double (*)[]).  Many
folks still seem to think that declaring arrays of unknown size is the
same as declaring a pointer.  It is NOT so.  Apparently, I have not
convinced you yet.  Let's see what various tools say about this example:

    1    char *malloc();
    2    void f(){
    3        int (*a)[], (*b)[], (*c)[];
    4        a = (int **)    malloc( (unsigned)sizeof(int)*10 );
    5        b = (int (*)[]) malloc( (unsigned)sizeof(int)*10 );
    6        c = (int *)     malloc( (unsigned)sizeof(int)*10 );
    7    }

Well, what ought to happen here?  The assignment on line 5 is the only
one which has matching types, so everybody ought to complain about the
other two.

Our compiler doesn't raise a peep for any of the three, but this doesn't
surprise me. C compilers most often take a "parts is parts" attitude,
and ignore minor type issues if the intent is "clear".

Our local typechecker says (compressing some whitespace):

    4   inconsistent types discovered
            Types are:  (:POINTER_TO (:ARRAY_OF (:INT) () ()))
            and:        (:POINTER_TO (:POINTER_TO (:INT)))
    6   inconsistent types discovered
            Types are:  (:POINTER_TO (:ARRAY_OF (:INT) () ()))
            and:        (:POINTER_TO (:INT))

Lint, on the other hand, only complains about the assignment on line 6,
saying

    warning: illegal pointer combination
        (6)

So, much to my disgust, lint doesn't catch what I claim is a blatant
error.  So who're ya  gonna believe, me or a stupid program?  :-)
Let's see if we can't induce lint to see a difference between a pointer
and an array with unknown bounds.  Let's consider *this* example:

    1    #include "stdio.h"
    2    void f(){
    3        int (*a)[], **b;
    4        printf( "%d %d\n",
    5            sizeof *a,
    6            sizeof *b );
    7    }

This time, our compiler says

    Error 276 severity 3 beginning on line 5
    You cannot take the size of an array with unknown bounds.

Our local typechecker doesn't raise a peep (it doesn't attempt to
evaluate sizeofs).

And lint, glory be, says

    (5)  warning: sizeof returns value less than or equal to zero

So, lint *does* know the difference between an array of unknown bounds
and a pointer (it correctly complained about applying sizeof to the
array, and allowed the sizeof of the pointer), it just doesn't complain
if you try to assign a pointer-to-a-pointer to a pointer-to-an-array.
That is, lint has a bug, which answers the question as to why lint
doesn't call "illegal pointer combo" on line 4 of the first example.
Further supporting the contention that lint has a bug, it doesn't
complain about this example, which *everyone* should agree is incorrect:

    char *malloc();
    void f(){
        int (*a)[10];
        a = (int **) malloc( (unsigned) sizeof(int)*10 );
    }

In this case, lint apparently thinks that (int (*)[10]) is the same type
as (int **), clearly wrong.

Let's look at one more example.

    #include "stdio.h"
    void main(){
        int ia[10] = {1}, (*a)[] = (int (*)[])ia;
        int i = 2, *pi = &i, **b = &pi;
        printf( "%x %x  %d      %x %x  %d\n",
                 a, *a, **a,    b, *b, **b );
    }

This program, when run, prints

    70000a18 70000a18  1      70000a14 70000a12  2

So, a shows the contents of the pointer a, *a is an array name, and
hence shows the address of the first element pointed to by a, and **a is
an integer, the first one in the array *a.  On the other hand b shows
the contents of the pointer b, *b is a pointer, and hence shows the
contents of a second pointer, and **b is an integer, the one pointed to
by *b.

This shows that (*a)[] is talking about two chunks of storage, one being
a pointer to the other, and the other being an array of integers (of
unknown size).  **b, on the other hand, is talking about *three* chunks
of storage, one being a pointer to the second, the second being a
pointer to the third, the third being an integer (or, implicitly, an
array of integers of unknown size).  Note that in both of these cases,
only the *first* of the chunks of storage being talked about is
allocated by the definition of a or b.
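
The same distinction can be checked without printing addresses. This is only a sketch, assuming an ANSI-style compiler; the function names are hypothetical. For the pointer-to-array, a and *a are the same address (two chunks of storage); for the pointer-to-pointer, b and *b are different addresses (three chunks):

```c
/* Returns 1 if a and *a name the same address (pointer to array). */
int array_ptr_same_address(void)
{
    int ia[10] = {1};
    int (*a)[10] = &ia;      /* one pointer, pointing at the array itself */

    return (void *)a == (void *)*a && **a == 1;
}

/* Returns 1 if b and *b are different addresses (pointer to pointer). */
int ptr_ptr_distinct(void)
{
    int i = 2;
    int *pi = &i;            /* the intermediate pointer needs its own storage */
    int **b = &pi;

    return (void *)b != (void *)*b && **b == 2;
}
```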

Now, have I convinced you all that (*)[] is not the same thing as **,
or must I get out.........  THE RACK!  HA HA HAAAAAA!!!

--
"I didn't expect the bloody Spanish Inquisition."
                                --- Monty Python
-- 
Wayne Throop      <the-known-world>!mcnc!rti-sel!dg_rtp!throopw

rde@ukc.ac.uk (R.D.Eager) (05/31/86)

This discussion should move to net.lang.
-- 
           Bob Eager

           rde@ukc.UUCP
           rde@ukc
           ...!mcvax!ukc!rde

           Phone: +44 227 66822 ext 7589

guy@sun.uucp (Guy Harris) (06/01/86)

> That is, lint has a bug...

Yup.  See a recent posting in net.bugs.<various flavors of UNIX> for a fix
(unless some of you people out there *want* PCC and "lint" to concur in your
mistaken belief that pointers and arrays are equivalent).  Also note that
this fix permits "lint" to check that you aren't trying to convert a
pointer to an array of, say, 10 "int"s to a pointer to an array of 20
"int"s; that is as questionable as converting a pointer to an "int" to
a pointer to a "double".
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.arpa

root@ucsfcca.UUCP (Computer Center) (06/03/86)

> 
> (unless some of you people out there *want* PCC and "lint" to concur in your
> mistaken belief that pointers and arrays are equivalent).  Also note that

Or do we have confusion between name and object?

K&R, p. 186

      An array name is a pointer expression.

     p. 210

      Every time an identifier of array type appears in an
      expression, it is converted into a pointer to the first
      member of the array.

So an array is not a pointer but a reference to an array
in an expression is a pointer.
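
A small sketch of the name/object distinction, assuming an ANSI-style compiler (the function names are hypothetical). The array name is converted to a pointer when it is used as a value, but sizeof still sees the whole array object:

```c
#include <stddef.h>

/* Returns 1 if the array name, used as a value, is &a[0]. */
int decays_to_first_element(void)
{
    int a[10];
    int *p = a;              /* a is converted to &a[0] here */

    return p == &a[0];
}

/* Returns 1 if sizeof sees the array object, not a pointer. */
int sizeof_sees_array(void)
{
    int a[10];
    int *p = a;

    return sizeof a == 10 * sizeof(int) && sizeof p == sizeof(int *);
}
```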

Thos Sumner    (...ucbvax!ucsfcgl!ucsfcca.UCSF!thos)

guy@sun.uucp (Guy Harris) (06/04/86)

> Or do we have confusion between name and object?
> ...
> So an array is not a pointer but a reference to an array
> in an expression is a pointer.

That's part of it, but I strongly suspect that people have been confused by
the fact that when you use an array name in an expression it really means a
pointer to the first member of the array; they interpret this limited
interchangeability as meaning pointers and arrays are equivalent.

There's been a discussion in net.lang.c of features to be added to C;
sometimes I think pointers should be *subtracted* from C, since people seem
to get very confused about them.  There's plenty of code out there which
indicates that the author thinks that, because the second argument to "stat"
is a "struct stat *", that you *have* to pass it a variable of type "struct
stat *" as its second argument; they declare such a variable, set it to
point to a declared "struct stat", and pass the variable to "stat" instead
of just "&<the struct>".  Of course, there's always the code which doesn't
declare the "struct stat", and doesn't initialize the pointer, but just
passes a pointer which points nowhere to "stat"; this pops up every so often
on net.lang.c with the question "Why doesn't this work?" attached.
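
For the record, a sketch of the straightforward way to call "stat", assuming a POSIX-style system. The wrapper name file_size and its size_out parameter are hypothetical, introduced only for this example:

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Returns 0 and stores the size on success, -1 if stat() fails. */
int file_size(const char *path, long *size_out)
{
    struct stat sb;               /* the struct itself, not a pointer to one */

    if (stat(path, &sb) != 0)     /* pass its address directly */
        return -1;
    *size_out = (long)sb.st_size;
    return 0;
}
```

No separate "struct stat *" variable is needed; "&sb" already has that type.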

OK, maybe they're useful, and shouldn't be removed.  C classes should spend
*lots* more time on them than they are doing, though, given these problems;
better that people learn in class before they start writing code, than be
forced to ask several thousand people on USENET why their code doesn't work.
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

djfiander@watnot.UUCP (David J. Fiander) (06/06/86)

>> So an array is not a pointer but a reference to an array
>> in an expression is a pointer.
>
>That's part of it, but I strongly suspect that people have been confused by
>the fact that when you use an array name in an expression it really means a
>pointer to the first member of the array; they interpret this limited
>interchangeability as meaning pointers and arrays are equivalent.

I discovered a very simple way to think of the name of an array: think of
it as a constant.  That's how the compiler treats it at least, and that is
why the expression:
		
			{
			int a[10], *ptr;

			ptr = &a;
			}

is invalid; you can't take the address of a macro (which the name of
the array is to the compiler).
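
A sketch of what was presumably intended, with a hypothetical function name. Note that ANSI-style compilers do accept "&a", but its type is "int (*)[10]" (pointer to the whole array), so assigning it to an "int *" is still a type mismatch; the plain array name is what yields a pointer to the first element:

```c
/* Returns 1 if ptr ends up pointing at a[0]. */
int points_at_first(void)
{
    int a[10], *ptr;

    ptr = a;                 /* the array name already yields &a[0] */
    a[0] = 42;
    return ptr == &a[0] && *ptr == 42;
}
```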

-- 
    UUCP  : {allegra,ihnp4,decvax,utzoo,clyde}!watmath!watnot!djfiander
    CSNET : djfiander%watnot@waterloo.CSNET
    ARPA  : djfiander%watnot%waterloo.csnet@csnet-relay.ARPA
    BITNET: djfiande@watdcs

gilbert@aimmi.UUCP (Gilbert Cockton) (06/10/86)

In article <3904@sun.uucp> guy@sun.UUCP writes:
>
>There's been a discussion in net.lang.c of features to be added to C;
>sometimes I think pointers should be *subtracted* from C, since people seem
>to get very confused about them.  
>
>OK, maybe they're useful, and shouldn't be removed.  C classes should spend
>*lots* more time on them than they are doing, though, given these problems;
>better that people learn in class before they start writing code, than be
>forced to ask several thousand people on USENET why their code doesn't work.

Despite reading three years of discussion on and off on the pointer-array
equivalence topic, as a casual user of C, I've never been able to come
up with a clear view on when pointers and arrays are equivalent.
I'm sure much of my confusion has actually been caused by reading the
news.

During some debates, large indigestible tracts have appeared. I
haven't got the time to wade through these and synthesise an idiot's
guide to array-pointer equivalences.

Any volunteers for a simple set of statements that get the message
across? There must be many C compiler experts out there.

All I can start with is a straw man, as I'm no expert. 

* given an array of dimensions a x b x c .. x n,
  the array name is a pointer to array[0][0][0]..[0]

* the only time this is any real use is when passing arrays by reference
  as `array' is easier and safer to write than

	  &(array[0][0][0]..[0])

  as you don't need to bear the array dimensions in mind.

peters@cubsvax.UUCP (Peter S. Shenkin) (06/12/86)

In article <aimmi.769> gilbert@aimmi.UUCP (Gilbert Cockton) writes:
>Despite reading three years of discussion on and off on the pointer-array
>equivalence topic, as a casual user of C, I've never been able to come
>up with a clear view on when pointers and arrays are equivalent.
>guide to array-pointer equivalences.
>    ...
>Any volunteers for a simple set of statements that get the message
>across? There must be many C compiler experts out there.
>
>All I can start with is a straw man, as I'm no expert. 
>
>* given an array of dimensions a x b x c .. x n,
>  the array name is a pointer to array[0][0][0]..[0]
>
>* the only time this is any real use is when passing arrays by reference
>  as `array' is easier and safer to write than
>
>	  &(array[0][0][0]..[0])
>
>  as you don't need to bear the array dimensions in mind.

I'm no expert either, but figured I could submit my own straw man as
more ammunition for the gurus:

	When a function is called with an array as its argument, what 
	is passed is a pointer to the first element of the array.  
	That's all there is, there ain't no more.

Peter S. Shenkin	 Columbia Univ. Biology Dept., NY, NY  10027
{philabs,rna}!cubsvax!peters		cubsvax!peters@columbia.ARPA

friesen@psivax.UUCP (Stanley Friesen) (06/13/86)

In article <769@aimmi.UUCP> gilbert@aimmi.UUCP (Gilbert Cockton) writes:
>
>Despite reading three years of discussion on and off on the pointer-array
>equivalence topic, as a casual user of C, I've never been able to come
>up with a clear view on when pointers and arrays are equivalent.
>I'm sure much of my confusion has actually been caused by reading the
>news.
>
>Any volunteers for a simple set of statements that get the message
>across? There must be many C compiler experts out there.
>

	OK, I'll try. First statement

 * A two-dimensional array is an array of one-dimensional arrays, and
   multi-dimensional arrays are produced by repeating this composition
   process.

>All I can start with is a straw man, as I'm no expert. 
>
>* given an array of dimensions a x b x c .. x n,
>  the array name is a pointer to array[0][0][0]..[0]
>
      * Close, the array name is a *constant* pointer to the first
	b x c x .. x n dimensional *subarray*, i.e. to array[0].

>* the only time this is any real use is when passing arrays by reference
>  as `array' is easier and safer to write than
>
>	  &(array[0][0][0]..[0])
>
>  as you don't need to bear the array dimensions in mind.

      * I would say its real use is to allow writing algorithms which
	may be applied to any one of several similar arrays by using
	a pointer instead of an array and initializing the pointer to
	the first element of the array. (especially since 'array' is
	equivalent to '&(array[0])' rather than the longer form you
	suggest)


	To summarize, given the declaration:

TYPE array[a][b][c]...[n];

	the following two usages are equivalent:
	array		&array[0]
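
A sketch of that summary for a two-dimensional case, assuming an ANSI-style compiler; the function names and the 3 x 4 dimensions are made up for illustration. The array name yields a pointer to the first *subarray*, so pointer arithmetic on it steps a whole row at a time:

```c
#include <stddef.h>

/* Returns 1 if the array name is a pointer to the first 4-int subarray. */
int name_is_first_subarray(void)
{
    int array[3][4] = {{0}};
    int (*row)[4] = array;   /* array yields &array[0], a pointer to int[4] */

    return row == &array[0] && sizeof *row == 4 * sizeof(int);
}

/* Returns 1 if adding 1 to that pointer skips one whole row. */
int step_is_one_row(void)
{
    int array[3][4] = {{0}};
    int (*row)[4] = array;

    return (char *)(row + 1) - (char *)row == (ptrdiff_t)(4 * sizeof(int));
}
```
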
-- 

				Sarima (Stanley Friesen)

UUCP: {ttidca|ihnp4|sdcrdcf|quad1|nrcvax|bellcore|logico}!psivax!friesen
ARPA: ??

guy@sun.uucp (Guy Harris) (06/14/86)

> 	When a function is called with an array as its argument, what 
> 	is passed is a pointer to the first element of the array.  
> 	That's all there is, there ain't no more.

Try

	When an array name is used in an expression, it is treated as
	a pointer to the first element of the array.

It doesn't just happen when an array name is used as an argument; that's
just a special case, since it's an expression.

The *only* other form of pointer/array interchangeability is that a
declaration of a formal argument as an array of X is treated as a
declaration of that argument as a pointer to X.  No other declarations of
arrays are equivalent to declarations of pointers; this is the one that
seems to bite people.  If you declare an array of 20 "int"s in the module
that defines that array, and you want to make an external reference from
another module, do NOT call it a pointer to "int".  Call it an array of an
unspecified number of "int"s ("int x[]") if that module doesn't know the
size of the array.  Better still, call it an *external* array of an
unspecified number of "int"s ("extern int x[]") and your code will build
even on systems which don't have UNIX's "common block" model of external
definitions and references.
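
A sketch of both points, squeezed into one file for brevity; in real code the definition and the reference would live in separate modules. The names sum and sum_x are hypothetical:

```c
extern int x[];              /* reference: array of unspecified size */
int x[20];                   /* definition, normally in another module */

/* A formal argument written as an array is really a pointer,
   so it can be incremented like one. */
int sum(int v[], int n)
{
    int total = 0;

    while (n-- > 0)
        total += *v++;
    return total;
}

/* Fill x with 0..19 and add it up through the array-typed argument. */
int sum_x(void)
{
    int i;

    for (i = 0; i < 20; i++)
        x[i] = i;
    return sum(x, 20);
}
```
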
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

throopw@dg_rtp.UUCP (Wayne Throop) (06/14/86)

> gilbert@aimmi.UUCP (Gilbert Cockton)

> * given an array of dimensions a x b x c .. x n,
>   the array name is a pointer to array[0][0][0]..[0]

Not quite.  It isn't quite right to say that the array name "is" a
pointer... this is what causes all the confusion.  The array name, when
*evaluated*, *yields* an address (except when the object of "sizeof").
Note that here I am using the term "address" to mean a non-lvalue
address expression, and "pointer" to mean an lvalue address expression.
Using this terminology, arrays are *never* pointers (except the usually
cited kludge of an array formal argument.)

> * the only time this is any real use is when passing arrays by reference
>   as `array' is easier and safer to write than
>
>           &(array[0][0][0]..[0])
>
>   as you don't need to bear the array dimensions in mind.

Huh?  Since the operation e1[e2] is *defined* in C to be (*(e1+e2)), the
fact that an array name evaluates to the address is used *even* *when*
*an* *array* *is* *subscripted*.
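
A sketch of that definition at work, with a hypothetical function name. Each comparison is the same address arithmetic written a different way:

```c
/* Returns 1 if every spelling of the subscript agrees. */
int subscript_forms_agree(void)
{
    int a[5] = {10, 20, 30, 40, 50};
    int *p = a;

    return a[3] == *(a + 3)      /* array name yields an address */
        && p[3] == *(p + 3)      /* pointer variable, same arithmetic */
        && 3[a] == a[3];         /* e1[e2] is symmetric, since + is */
}
```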


I'm not sure just what you are unclear about.  It seems to me that most
people are confused on two points:

     1) They think that the fact that pointers and arrays can each be
        indirected and subscripted means that they are "the same thing".
        This is not the case.  They both yield address values when
        evaluated (except when object of sizeof), but they are
        describing an entirely different runtime storage arrangement.
     2) If they are unconfused on point (1), they think that arrays of
        unknown size are pointers.  This is *not* the case.  This
        mistake is probably due to the fact that array formal arguments
        are equivalent to pointers, but this is a special case exception
        and does *not* make them equivalent in the general case.

If you get straight these things:

  - how pointers/arrays are *evaluated*
  - what *runtime storage* their declaration implies
  - what *operations* can be done on addresses, and how subscripting is
    defined in terms of address arithmetic

then you have a chance of understanding arrays and pointers in C.  Most
people are confused on one or more of these points.
-- 
Wayne Throop      <the-known-world>!mcnc!rti-sel!dg_rtp!throopw

bzs@bu-cs.UUCP (Barry Shein) (06/18/86)

Guy Harris writes:
>...sometimes I think pointers should be *subtracted* from C, since people seem
>to get very confused about them...
>...OK, maybe they're useful, and shouldn't be removed.  C classes should spend
>*lots* more time on them than they are doing, though, given these problems;
>better that people learn in class before they start writing code

Being as I teach a couple of C courses a year here at BU I thought I
would comment on this, perhaps it would be of some use to language
designers (or make them give up entirely!)

It's not 'C', I've had the same basic problems with students when
teaching IBM assembler, PL/I and Pascal.

There seems to be some fundamental problem with distinguishing
"the thing" from "the thing contained"*. In english (and probably
most other natural languages) we interchange these notions very
loosely as in "The White House said today..." or "she packed a
great lunch" (it's a little subtle, think about it, "I called 411"
is a similar conceptual variant.)

Somewhere in that mess they have to retrain themselves to carefully
distinguish between these two notions and not confuse a 'box' with
its 'contents', an address with a value contained therein, to remember
two things for one object.

It's similar to the exasperating problem of teaching recursion (another
concept we are taught in language is actually an *error*, "don't define
things in terms of themselves!")

That is, it's more fundamental than simple language design although
design can affect the learning process. I believe C is actually
superior to, say, Pascal in this regard as Pascal never removes the
mystery from pointers (one exercise I will give a student having
conceptual problems is to printf some pointers in a meaningful
way as the program runs, such as when stepping through an array
with a pointer, try that in Pascal, you'll probably get a compile-time
error from the write() statement.)
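
A sketch of the sort of exercise meant here, assuming an ANSI-style compiler with "%p" support; the function name walk_and_print and the array contents are made up. It prints the pointer (the box) next to what it points at (the contents) on every step:

```c
#include <stdio.h>

/* Steps a pointer through an array, printing address and value;
   returns the number of steps taken. */
int walk_and_print(void)
{
    int a[4] = {10, 20, 30, 40};
    int *p;
    int steps = 0;

    for (p = a; p < a + 4; p++) {
        printf("p = %p   *p = %d\n", (void *)p, *p);
        steps++;
    }
    return steps;
}
```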

Practice helps *IF* they can get past the first conceptual hurdles,
many never do. But then again, those that don't aren't writing to
net.lang.c, they're all in Law School...

	-Barry Shein, Boston University

* An amusing anecdote I've heard attributed to Marvin Minsky (who knows)
involves him saying angrily "YOU'RE CONFUSING THE 'THING' WITH 'ITSELF'!".

phil@osiris.UUCP (Philip Kos) (06/19/86)

Barry Shein writes:
> Guy Harris writes:
> >...sometimes I think pointers should be *subtracted* from C, since people
> >seem to get very confused about them...
> >...OK, maybe they're useful, and shouldn't be removed.  C classes should
> >spend *lots* more time on them than they are doing, though, given these
> >problems; better that people learn in class before they start writing code
> 
> Being as I teach a couple of C courses a year here at BU I thought I
> would comment on this, perhaps it would be of some use to language
> designers (or make them give up entirely!)
> 
> It's not 'C', I've had the same basic problems with students when
> teaching IBM assembler, PL/I and Pascal.
> 
> There seems to be some fundamental problem with distinguishing
> "the thing" from "the thing contained"*.....
> 
> 	-Barry Shein, Boston University

I'm sure that this is the problem with many programmers, but not
all.  I just ran into a situation yesterday here at the Hospital where
one of our programmers was falsely attributing a bug to the C compiler
because the following declaration caused the program to die of a
segmentation violation:

	static char **ptrarray = {
		"",
		"str1",
		"str2",
		"etc."
	};

The programmer found that the alternate declaration

	static char *ptrarray[] = {
		"",
		"etc."
	};

ran just fine.  The confusion was compounded by the fact that this
programmer was taught in a supposedly reputable C class that *the two
declarations were always identical*.  I explained the situation in
which the two are effectively identical (formal parameter to a
function, natch) and the reason why the first just wouldn't work the
way it was intended, and things are OK now, but I'd like to point out
here that it's not just student C programmers who get confused about
pointers, and much more care needs to be taken on both ends of the
learning exchange process to ensure that the concept is transferred
completely and correctly.
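
A sketch of the working form, assuming an ANSI-style compiler; the helper names ptrarray_count and nth_string are hypothetical. The array declaration allocates four pointers and initializes each one, whereas "char **ptrarray" allocates a single pointer, which a brace list of strings cannot sensibly initialize. A "char **" is still fine for walking the table once it exists:

```c
#include <stddef.h>

static char *ptrarray[] = {
    "",
    "str1",
    "str2",
    "etc."
};

/* Number of strings in the table. */
size_t ptrarray_count(void)
{
    return sizeof ptrarray / sizeof ptrarray[0];
}

/* Fetch entry n through a char **, which the array name converts to. */
char *nth_string(size_t n)
{
    char **pp = ptrarray;    /* array name yields &ptrarray[0] */

    return pp[n];
}
```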

(Of course there really *is* a bug in the compiler, since it let the
original declaration through without even a peep; it's just not the bug
imagined by the programmer.  For everyone's info, the compiler is cc on
OSx v3.1 [Pyramid Computer Corp.] and I have suggested that the company
be notified of this bug.  In the meantime, can anyone tell me whether
this is a known pcc bug, or is it specific to Pyramid's cc?)


Phil Kos			  ...!decvax!decuac
The Johns Hopkins Hospital                         > !aplcen!osiris!phil
Baltimore, MD			...!allegra!umcp-cs

"People say I'm crazy, dreaming my life away..."  - J. Lennon

peters@cubsvax.UUCP (Peter S. Shenkin) (06/19/86)

In article <bu-cs.811> bzs@bu-cs.UUCP (Barry Shein) writes:
>
>Guy Harris writes:
>>...sometimes I think pointers should be *subtracted* from C, since people seem
>>to get very confused about them...
>>...OK, maybe they're useful, and shouldn't be removed.  C classes should spend
>>*lots* more time on them than they are doing, though, given these problems;
>>better that people learn in class before they start writing code
>
>Being as I teach a couple of C courses a year here at BU I thought I
>would comment on this, perhaps it would be of some use to language
>designers (or make them give up entirely!)
>
>It's not 'C', I've had the same basic problems with students when
>teaching IBM assembler, PL/I and Pascal.

Just a quick remark.  When I was learning C, I understood that "*pi" meant "the
contents of pi," but somehow had difficulty conceptualizing why the declaration
"int *pi;" declares pi as a pointer to an int;  that is, I knew it was a
convention I had to memorize, but it didn't seem mnemonic to me.  Then, about
a month ago, revelation!:  read this as "the contents of pi is an integer;"
which implies, "pi is that which contains (or points to)" an integer.  Somehow 
it made thinking about the declarations easier.  It's occurred to me that maybe
everyone else in the world sees this from day 1, but for us dumb folks, having
this reading pointed out would probably make the learning process easier....

Peter S. Shenkin	 Columbia Univ. Biology Dept., NY, NY  10027
{philabs,rna}!cubsvax!peters		cubsvax!peters@columbia.ARPA

jrv@siemens.UUCP (06/20/86)

Peter Shenkin writes:

>Just a quick remark.  When I was learning C, I understood that "*pi" meant "the
>contents of pi," but somehow had difficulty conceptualizing why the declaration
>"int *pi;" declares pi as a pointer to an int;  that is, I knew it was a
>convention I had to memorize, but it didn't seem mnemonic to me.  Then, about
>a month ago, revelation!:  read this as "the contents of pi is an integer;"
>which implies, "pi is that which contains (or points to)" an integer.  Somehow

	Interesting how the same words have completely different meanings
	for two people. The revelation which helps Peter keep pointers
	straight uses the exact words which help me make the *distinction*
	between pointers and what they point to.

	For me contents of pi is a *pointer* to an integer. I do not make
	the association of "contains (or points to)" as the same thing. In
	fact I emphasize the difference.

	I agree that the declaration of pointers and their use is not
	consistent. Two statements which to the novice C programmer look
	very similar have different results:

		In a declaration:

			int *pi = abcd;

		will store a value in the pointer variable pi.
		While in the middle of code:

			*pi = wxyz;

		will store a value where pi points.
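
A sketch of the two cases side by side. The names target and value are made up, standing in for abcd and wxyz, and the function name is hypothetical:

```c
/* Returns the final value of target. */
int declaration_vs_assignment(void)
{
    int target = 1, value = 2;
    int *pi = &target;   /* initializer: stores an address in pi itself */

    *pi = value;         /* assignment: stores a value where pi points */
    return target;       /* target was changed through pi */
}
```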

	In the C classes I have taught this inconsistency did not appear to
	be a major stumbling point. When asked at the end of the semester
	what was the hardest topic in the course the overwhelming response
	is, "POINTERS, POINTERS, POINTERS". It was their general use not so
	much the declaration/initialization. Each semester I have added more
	material but have still not gotten to the point where the confusion
	level has been reduced to my satisfaction.

	I don't think that this is
	peculiar (sp?) to C. The extra level of indirection is a difficult
	concept to grasp whether it is in C or PASCAL or whatever. It is
	like riding a bicycle: once you know how you never forget; likewise
	once you understand pointers you know them for life. (There are
	probably several concepts which fall into this category. Recursion
	is another one that comes to mind.)

	(Jim, stop, the semester ended over a month ago!  In a minute, the
	bell hasn't rung yet. :-)  The one technique which I use often when I
	am attempting to keep a pointer straight is the "cover-up" technique.
	If I want to know what data type a variable is I go back to the place
	where it is declared and cover over what I am interested in and what
	remains is the data type of the object. For a simple example:

		int *pi;

		given 'pi' cover this over in the declaration and you get
			'int *'. So 'pi' is a pointer to an integer. 'pi'
			contains a pointer to an integer.

		or given '*pi' cover it in the declaration and you get 'int'.
			So '*pi' is an integer. '*pi' contains an integer.
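
The cover-up reading can be checked against the compiler by assigning each covered expression to a variable of the type that remains. This is only a sketch; the variable names and the second declaration (an array of char pointers) are assumptions added for illustration.

```c
#include <assert.h>

int cover_up_demo(void)
{
    int value = 7;
    int *pi = &value;                /* the declaration under study */

    int *what_pi_is = pi;            /* cover 'pi'  -> 'int *' remains */
    int what_star_pi_is = *pi;       /* cover '*pi' -> 'int' remains   */

    /* The same trick on a slightly larger declaration. */
    char *names[2] = { "ab", "cd" };
    char *elem = names[1];           /* cover 'names[1]'  -> 'char *' */
    char first = *names[1];          /* cover '*names[1]' -> 'char'   */

    assert(what_pi_is == &value);
    assert(elem[1] == 'd');
    assert(first == 'c');
    return what_star_pi_is;
}
```

If the cover-up gave the wrong type, the corresponding initialization would draw a diagnostic from the compiler.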
	
	Enough.

>it made thinking about the declarations easier.  It's occurred to me that maybe
>everyone else in the world sees this from day 1, but for us dumb folks, having
>this reading pointed out would probably make the learning process easier....
>
>Peter S. Shenkin	 Columbia Univ. Biology Dept., NY, NY  10027
>{philabs,rna}!cubsvax!peters		cubsvax!peters@columbia.ARPA
>


Jim Vallino
Siemens Research and Technology Lab.
Princeton, NJ
{allegra,ihnp4,seismo,philabs}!princeton!siemens!jrv

phaedrus@eneevax.UUCP (Praveen Kumar) (06/20/86)

I believe that a lot of the notation in C is derived from PDP assembly
language.  I think (it has been a long time since I mucked around with
PDPs) that the increment, "++", and the dereferencing, "*" operators are
straight out of PDP assembly.  That is, they work the same way in
PDP assembly. 

pk
-- 
"Everybody wants a piece of pie, today," he said.
"You gotta watch the ones who always keep their hands clean."

phaedrus@eneevax.umd.edu or {seismo,allegra}!umcp-cs!eneevax!phaedrus

chris@umcp-cs.UUCP (Chris Torek) (06/21/86)

[Warning: this is not an article about `C', but rather an article
about `about C'.  Nothing truly technical is contained herein.]

In article <748@eneevax.UUCP> phaedrus@eneevax.UUCP (Praveen Kumar) writes:
>I believe that a lot of the notation in C is derived from PDP assembly
>language.  I think (it has been a long time since I mucked around with
>PDPs) that the increment, "++", and the dereferencing, "*" operators are
>straight out of PDP assembly.

This is not really for me to say, for I was not in on the creation
of the C language, yet I feel I should answer this.  (If I do a
good enough job, perhaps I can even provoke DMR into a few minor
corrections. :-) )  Was the C notation derived from PDP-11 assembly?
I think the answer here is both no and yes.  Much C notation was
certainly influenced by '11 assembly; but I think `derived' is too
strong.

DEC PDP-11 assemblers use `@', not `*', but let us assume that Ken
Thompson had been using `*' with whatever assembler he was using.
(The 4BSD Vax assembler uses `*', so it is reasonable to guess that
this was handed down from an earlier era.)  First contrast

	mov	*(r4)+,-(r5)

with

	*--p = **q++;

(if I have not botched the '11 assembly; I have never used an '11).
Close?  Well, somewhat: I can see a resemblance, at any rate.
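
For readers who want to watch the C statement work, here is a small sketch. It assumes the UNIX-assembler reading in which `*(r4)+` is autoincrement deferred and `-(r5)` is autodecrement, so that q plays the part of r4 and p the part of r5. The arrays and the function name are hypothetical.

```c
#include <assert.h>

int copy_via_autoincrement(void)
{
    int a = 10, b = 20;
    int *sources[2] = { &a, &b };  /* q walks upward through this table */
    int stack[2] = { 0, 0 };       /* p walks downward, like a stack    */

    int **q = sources;
    int *p = stack + 2;            /* one past the end; --p makes it valid */

    *--p = **q++;   /* stack[1] = a, and q moves on to sources[1]   */
    *--p = **q++;   /* stack[0] = b, and q moves past sources[1]    */

    assert(p == stack);
    assert(q == sources + 2);
    assert(stack[1] == 10 && stack[0] == 20);
    return stack[0];
}
```

Note that `**q++` groups as `*(*(q++))`: the increment applies to q itself, and the two indirections apply to its old value.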

Now step back a bit and consider the notation in and of itself.
We have here three basic operations: `--p', `q++', and `*'.  From
early mathematics notation we can take `-' as `subtract' and `+'
as `add'.  `*' is an aberration; it looks more like one of the
generic binary operation symbols used in group theory than anything
else (though this may depend on your terminal's font).  As for why
there are two each of `+' and `-', I think we can put that down to
the exigencies of parsing.  Now we have `-p' and `q+'---but what
might these mean?  Well, if `-' is subtract and `+' is add, then
we have `subtracted p' and `q added'.  There is nothing explicitly
being subtracted or added, so it is perhaps reasonable to assume
one of the classical computer science numbers, namely `zero', `one',
and `many'.  Adding and subtracting zero is useless, and adding
and subtracting many is ambiguous, so we will add and subtract one.
I think it is also a small step to say that the `-' is `before'
`p', and the `+' is `after' `q', so we should do the subtraction
`before' and the addition `after'.  Before and after what?  Here
I resort to fiat and say `before and after *, which we define to
mean indirection'.

Of course, all this does is demonstrate that the PDP-11 assembly
notation was in some respects `reasonable', and not that the notation
appears in C for that particular reason.  In order to refute the
quoted statement above, I must find `a lot of C notation' that does
not seem to be `derived from PDP assembly'.  So let us consider
some more C notation, in particular in expressions.

1.  Arithmetic.  C arithmetic seems to be quite conventional for
    post-FORTRAN languages.  `a + b * (c - d)' does not look much
    like a series of `sub', `mul', and `add' instructions to me.

2.  Structures.  Structure member access via `.' is again very
    conventional; it looks like PL/I, among others.  Pointer
    member access is a little different.  `p->member' can indeed
    be done with a single '11 instruction in many cases, yet
    the `->' notation itself does not appear in '11 assembly.

3.  Logical operations.  `&&' and `||' have no direct counterpart
    in '11 assembly, and must be implemented with rather complex
    series of tests and branches.
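
To make point 3 concrete, here is a sketch of `&&` next to a hand-written version using explicit tests and branches. It is ordinary C, not '11 assembly, and the helper names and the evaluation counter are assumptions added for illustration.

```c
#include <assert.h>

int evaluations;   /* counts operand evaluations; hypothetical bookkeeping */

static int is_positive(int x)
{
    evaluations++;
    return x > 0;
}

/* The operator as written in C. */
int and_operator(int a, int b)
{
    return is_positive(a) && is_positive(b);
}

/* The same logic spelled out as tests and branches. */
int and_branches(int a, int b)
{
    if (!is_positive(a))
        goto is_false;      /* left operand false: skip the right one */
    if (!is_positive(b))
        goto is_false;
    return 1;
is_false:
    return 0;
}

/* Both forms agree on the result and on how many operands they evaluate. */
void check_agreement(int a, int b)
{
    int r1, n1, r2, n2;

    evaluations = 0; r1 = and_operator(a, b); n1 = evaluations;
    evaluations = 0; r2 = and_branches(a, b); n2 = evaluations;
    assert(r1 == r2 && n1 == n2);
}
```

The second form is roughly the shape a compiler has to produce, which is why no single instruction corresponds to the operator.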

No doubt more examples can be found by those cleverer than I;
but I think this much is sufficient.  I think I will close by
saying that the notation used in C is simply a well-coordinated
set of notations borrowed from other places and languages,
including but not limited to PDP-11 assembly, and modified as
appropriate to obtain that coordination.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

campbell@sauron.UUCP (Mark Campbell) (06/21/86)

> Guy Harris writes:
> >...sometimes I think pointers should be *subtracted* from C, since people seem
> >to get very confused about them...
> > [...]
> [...] 
> It's not 'C', I've had the same basic problems with students when
> teaching IBM assembler, PL/I and Pascal.
> [...]
> Somewhere in that mess they have to retrain themselves to carefully
> distinguish between these two notions and not confuse a 'box' with
> its 'contents', an address with a value contained therein, to remember
> two things for one object.
> [...] 	-Barry Shein, Boston University

Niklaus Wirth, in "Algorithms + Data Structures = Programs", makes the observation
that the "goto" control structure and the pointer are analogous constructs.  It's
interesting reading.
-- 

Mark Campbell    Phone: (803)-791-6697     E-Mail: !ncsu!ncrcae!sauron!campbell

jso@edison.UUCP (John Owens) (06/26/86)

In article <487@cubsvax.UUCP>, peters@cubsvax.UUCP writes:
> Just a quick remark.  When I was learning C, I understood that "*pi"
> meant "the contents of pi," but somehow had difficulty conceptualizing
> why the declaration "int *pi;" declares pi as a pointer to an int;
> that is, I knew it was a convention I had to memorize, but it didn't
> seem mnemonic to me.  Then, about a month ago, revelation!: read this
> as "the contents of pi is an integer;" which implies, "pi is that
> which contains (or points to)" an integer.
> Peter S. Shenkin	 Columbia Univ. Biology Dept., NY, NY  10027

Maybe it's my machine-language heritage showing, but I've always found
it least confusing to think of "pi" as "the contents of pi" (which is
a pointer), "*pi" as "that which (the contents of) pi points to",
"int i" as declaring i to contain an int, and "int *pi" as declaring
pi to contain a pointer to an int.

The crucial difference is in the notions of "to contain" and "to be".
The trick is that "int i" does not mean that i *is* an int, but that i
*is* a variable which contains an int.  The expression "i" is, of
course, an int: the int which i contains.

As far as the syntax of the declaration goes, this has proven useful:

	int  pi;	(the expression)   pi    is an int
	int *pi;	(the expression)  *pi    is an int
	int  pi[2];	(the expression)   pi[i] is an int
	int *pi();	(the expression)  *pi()  is an int
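
The table can be exercised directly: each expression in the right-hand column should be usable wherever an int is wanted. A sketch follows, with the four objects given distinct, hypothetical names since they cannot all be called pi in one scope.

```c
#include <assert.h>

static int storage = 5;

/* Declared in the style of "int *pi();": the expression *pf() is an int. */
static int *pf(void)
{
    return &storage;
}

int declaration_mirrors_use(void)
{
    int  plain = 1;           /* the expression  plain   is an int */
    int *ptr = &plain;        /* the expression  *ptr    is an int */
    int  arr[2] = { 2, 3 };   /* the expression  arr[i]  is an int */
    int  i = 1;

    /* Every term below has type int, exactly as the declarations say. */
    int sum = plain + *ptr + arr[i] + *pf();

    assert(sum == 1 + 1 + 3 + 5);
    return sum;
}
```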

	John Owens @ General Electric Company	(+1 804 978 5726)
	edison!jso%virginia@CSNet-Relay.ARPA		[old arpa]
	edison!jso@virginia.EDU				[w/ nameservers]
	jso@edison.UUCP					[w/ uucp domains]
	{cbosgd allegra ncsu xanth}!uvacs!edison!jso	[roll your own]

pete@valid.UUCP (Pete Zakel) (07/03/86)

> I'm sure that this is the problem with many programmers, but not
> all.  I just ran into a situation yesterday here at the Hospital where
> one of our programmers was falsely attributing a bug to the C compiler
> because the following declaration caused the program to die of a
> segmentation violation:
> 
> 	static char **ptrarray = {
> 		"",
> 		"str1",
> 		"str2",
> 		"etc."
> 	};
> 
> The programmer found that the alternate declaration
> 
> 	static char *ptrarray[] = {
> 		"",
> 		"etc."
> 	};
> 
> ran just fine.
> 
> (Of course there really *is* a bug in the compiler, since it let the
> original declaration through without even a peep; it's just not the bug
> imagined by the programmer.  For everyone's info, the compiler is cc on
> OSx v3.1 [Pyramid Computer Corp.] and I have suggested that the company
> be notified of this bug.  In the meantime, can anyone tell me whether
> this is a known pcc bug, or is it specific to Pyramid's cc?)
> 
> Phil Kos			  ...!decvax!decuac

Our pcc based compiler said "warning: illegal pointer combination, op ="
and "compiler error: initialization alignment error" to the first
declaration.
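
For reference, a sketch of the working form next to a legitimate use of `char **`. The first declaration asked for a single pointer and then tried to initialize it with a list of strings; the array form gives each string its own `char *` slot. The function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An array of four pointers; each initializer is a char *. */
static char *ptrarray[] = { "", "str1", "str2", "etc." };

/* A char ** is one pointer; here it points at the array's first element. */
static char **ptr = ptrarray;

size_t ptrarray_count(void)
{
    return sizeof ptrarray / sizeof ptrarray[0];
}

/* Returns the index of s in the table, or -1 if it is absent. */
int find_entry(const char *s)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; i < ptrarray_count(); i++)
        if (strcmp(ptr[i], s) == 0)
            return (int)i;
    return -1;
}
```

Note that `sizeof` is applied to the array, not to ptr; applied to ptr it would give the size of one pointer.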
-- 
-Pete Zakel (..!{hplabs,amd,pyramid,ihnp4}!pesnta!valid!pete)

pete@valid.UUCP (Pete Zakel) (07/03/86)

> Just a quick remark.  When I was learning C, I understood that "*pi" meant "the
> contents of pi," but somehow had difficulty conceptualizing why the declaration
> "int *pi;" declares pi as a pointer to an int;  that is, I knew it was a
> convention I had to memorize, but it didn't seem mnemonic to me.  Then, about
> a month ago, revelation!:  read this as "the contents of pi is an integer;"
> which implies, "pi is that which contains (or points to)" an integer.  Somehow 
> it made thinking about the declarations easier.  It's occurred to me that maybe
> everyone else in the world sees this from day 1, but for us dumb folks, having
> this reading pointed out would probably make the learning process easier....
> 
> Peter S. Shenkin

I find it a lot easier to think of "int i" as "i is an integer", "int *pi" as
"pi is a pointer to an integer", "pi" as "the address of an integer", and
"*pi" as "the integer that pi points to".  Of course, being an assembly
programmer at heart, I tend to understand things the way that they
are actually used by the machine.  If one thinks of "pi" as the name of a
box, then the contents of that box is the address of an integer, NOT the
integer itself.  I would think of "*pi" as the box that resides at the
address contained in "pi", which contains an integer.

In the wording used above, I think "the contents of what pi points to"
would be a much better definition of "*pi" than "the contents of pi".
-- 
-Pete Zakel (..!{hplabs,amd,pyramid,ihnp4}!pesnta!valid!pete)

peters@cubsvax.UUCP (Peter S. Shenkin) (07/05/86)

In article <edison.811> jso@edison.UUCP (John Owens) writes:
>In article <487@cubsvax.UUCP>, peters@cubsvax.UUCP writes:
>> Just a quick remark.  When I was learning C, I understood that "*pi"
>> meant "the contents of pi," but somehow had difficulty conceptualizing
>> why the declaration "int *pi;" declares pi as a pointer to an int;
>> that is, I knew it was a convention I had to memorize, but it didn't
>> seem mnemonic to me.  Then, about a month ago, revelation!: read this
>> as "the contents of pi is an integer;" which implies, "pi is that
>> which contains (or points to)" an integer.
>> Peter S. Shenkin	 Columbia Univ. Biology Dept., NY, NY  10027

>
>Maybe it's my machine-language heritage showing, but I've always found
>it least confusing to think of "pi" as "the contents of pi" (which is
>a pointer), "*pi" as "that which (the contents of) pi points to",
>"int i" as declaring i to contain an int, and "int *pi" as declaring
>pi to contain a pointer to an int.
>

Like many revelations of mine, I just discovered that this one was right
there in K&R from the first;  "...I guess I was too young / to realize...":

p90:		int *px;
	is intended as a mnemonic;  it says that the combination *px is
	an int...

I also note with amazement that I used the word "mnemonic," which also
occurs in this K&R passage, in my original posting.  I rarely use that
word, yet I hadn't read that section of K&R for about two years (that I can
recall) and its purport evidently eluded me at the time.  People have
been accused of plagiarism for less....

Peter S. Shenkin	 Columbia Univ. Biology Dept., NY, NY  10027
{philabs,rna}!cubsvax!peters		cubsvax!peters@columbia.ARPA