[net.lang.c] mixing pointers and arrays

cobb@csu-cs.UUCP (07/28/83)

Consider the two routines:

extern char *yytext;
main ()
{
stat ();
printf ("%s\n",*yytext);
}

and

char yytext [10];
stat ()
{
yytext [0] = 'y';
yytext [1] = 'e';
yytext [2] = 'a';
yytext [3] = 'r';
yytext [4] = '\0';
}

These two routines exist in different modules, and are compiled 
separately and then linked and loaded. However, every time I try to
execute the routine, I get an execution error. From reading the C
manual, the implication is that yytext is implemented as a pointer 
to some storage locations, and can therefore be treated as such. This
is reinforced by the legal program segment
            .
            .
            .
     char ch [n];
     char *chptr;
     chptr = ch;
            .
            .
which results in chptr being assigned the base location of the array
'ch'. Why then do the first routines blow up?
                                          Thanks
                                            S. Cobb
                                            !hao!csu-cs!cobb

tom@rlgvax.UUCP (07/30/83)

Concerning the following:

extern char *x;
main() {
	foo();
	printf("%s", *x);
}
-----------------------------<new file>----------
char x[10];
foo() {
	x[0] = 'a';
	x[1] = '\0';
}

First, the reason the program won't work is that it should be
"printf("%s", x);" WITHOUT the indirection on x.  What was originally
written passes a single character to printf() where a pointer (address)
was expected.

Second, the declaration "extern char *x;" is incorrect.  "x" is an array,
NOT a pointer, and must be declared as such.  The compiler thinks that
x is a pointer (i.e. a storage location containing an address).  When
you pass x to printf(), x is evaluated, i.e. the contents of that location
are pushed onto the stack.  However, in reality x is an ARRAY!  This means
that "the contents of that location" are in fact a bunch of characters.
So, you are passing a bunch of characters which are being interpreted as
an address.  So, you blow up.  Solution: in main, declare "extern char x[];".

The example you mention:

	char *foo, bar[10];
	foo = bar;

works because using "bar" by itself is exactly the same as "&(bar[0])"
by definition in C, not because you can treat pointers and arrays
indiscriminately.  This next example also works for the same reason.

	main() {
	char foo[10];
	subr(foo);
	}
	subr(bar) char *bar; {
	}

In my experience, most new programmers find this second example confusing,
to say the least.

- Tom Beres
{seismo, allegra, brl-bmd, mcnc, we13}!rlgvax!tom

dixon@ihuxa.UUCP (07/31/83)

1.) The routine stat() placed the binary values for 'y', 'e', 'a', and 'r'
    (NOT necessarily in that order) in the global storage that was defined
    for the array yytext.  In the printf() statement, the *yytext argument
    instructed the system to fetch the contents of the location whose
    address is the value stored in yytext.  Read as an address, the binary
    values of 'y', 'e', 'a', and 'r' form a VERY large number (assuming
    character pointers are unsigned), much larger than the amount of memory
    you are apt to have on a VAX 780 system.

2.) If you wanted to print a string of characters using a pointer to that
    string, then you should use

      printf("%s\n",yytext);           rather than

      printf("%s\n",*yytext);

    since yytext has been declared to be a pointer to a character string.

alan@allegra.UUCP (07/31/83)

Consider the following fragments:

	--- file 1 ---

	extern char *foo;

	func() {
		printf("%c", *foo);
	}

	--- file 2 ---

	char foo[SIZE];

	---

The declaration

	extern char *foo;

says that foo is a variable holding the address of a character (in
this case, the first character of an array).   This is not correct.
The variable foo doesn't refer to a pointer to an array, but to an
array itself.

On the other hand,

	main() {
		char foo[SIZE];

		func(foo);
	}

	func(foo)
		char *foo;
	{
		printf("%c", *foo);
	}

will work just fine, although it would have been clearer to declare
foo as

	char foo[];

in func().

The key here is that in the first example, the foo in file one is the
same variable as the foo in file two, while in the second example, we
have two different foo's.   By saying

	func(foo);

we pass a character pointer, the address of the array's first element.
In func(), foo now refers to a location on the stack which holds the
address of the first character of the array known by the name foo in main().

Like so many other blunders, this is all done in the name of efficiency.
As far as I'm concerned, it's one of the ugliest parts of the C language.

						Alan Driscoll
						BTL, Murray Hill

mp@mit-eddie.UUCP (Mark Plotnick) (08/01/83)

ihuxa!dixon recommended the following:

    If you wanted to print a string of characters using a pointer to that
    string, then you should use

      printf("%s\n",yytext);           rather than

      printf("%s\n",*yytext);

    since yytext has been declared to be a pointer to a character string.

Well, this is almost right.  Since the loader (at least on 4.1bsd and
System 3) has deemed that the external symbol _yytext is equal to
something like 10600 (the start of the 10-element array), if you do
printf("%s", yytext), you're going to get either a segmentation fault or
a lot of garbage, since you're passing printf the contents of 4 bytes
starting at 10600, which is 16230262571 (octal) on our machine.  If you do
printf("%s", *yytext), you'll have an even better chance of core
dumping.

The solution is to use compatible declarations - in this case, yytext
should be declared as
	extern char yytext[];
in the first program, not
	extern char *yytext;
THEN you can use dixon's suggested fix.

The System V loader presumably catches conflicting declarations such as
this, and may even catch such favorite constructs as multiple external
definitions.  The 3B20 loader of several years ago caught these errors
and I had a wonderful time while porting some Berkeley software in which
every module #included a header file full of global definitions.

	Mark (genrad!mit-eddie!mp, eagle!mit-vax!mp)

wyse@ihuxp.UUCP (08/02/83)

I suggest that you all reread section 5.3 in "The C Programming Language",
by our friends BWK and DMR.  Given that I have

	int a[10];

the reference a[i] is equivalent to *(a+i).  The difference between an array
name and a pointer is that the array name is a constant, i.e., you can't
assign to it.

When an array name is used as an argument to a function, the address of the
beginning of the array is passed and, as somebody else pointed out, it is
truly a pointer.  However, you can declare it in the function as
either
	char *s;
or
	char s[];
as they are equivalent and which one is used depends on how the expressions
involving s will be written in the function.

		Neal Wyse
		ihnp4!ihuxp!wyse
		Bell Labs, Naperville Ill.

tom@rlgvax.UUCP (Tom Beres) (08/06/83)

If you realize that "char *p;" and "char p[];" are different and you
don't need an explanation why, skip this.  If not, read on.
NEAL WYSE at ihuxp, I think you need to read this.


Just because you can interchange "char *p;" and "char p[];" when
declaring a formal parameter does NOT mean you can confuse these
two everywhere else.  They are NOT the same!

char foo[10];
foo[1] = '\0';
	means:  Take the ADDRESS "foo" and add 1 to it to get a new address.
	Then put a '\0' in the memory location indicated by that address.

char *foo;
foo[1] = '\0';
	means:  Look at the ADDRESS "foo" and take the VALUE you find in
	there.  Add 1 to that VALUE.  Now use that VALUE as an address,
	and put a '\0' in the memory location indicated by that address.
	NOTE: I realize that "foo" should first be set to point to someplace
	legit.

In both cases you can use indexing (i.e. "xxx[yyy]" is legit), but DIFFERENT
things happen in the two cases!  In order for the compiler to generate the
right code, it must know whether "foo" is an array or a pointer.  In:

file 1:			|	file 2:
                        |
char array[10];		|	extern char *array;

You are giving the wrong information to the compiler in File 2.  The compiler
will oblige by giving you back wrong code.  You are stating in File 2 that
"array" is the address of a character pointer, so the code produced will
fit the 2nd description above.  But File 1 says that "array" is the
address of the first of 10 sequentially stored characters, so File 1
will produce code to suit the first description above.

Now for the question 'why can you use "char foo[];" and "char *foo;"
interchangeably when declaring a formal parameter to a subroutine?'.
The reason is that, as we all know, when you pass an array in C, what
really gets passed to the subroutine is the address of the first element,
and the parameter variable will be initialized to that address.  So, the
parameter is really an honest-to-goodness pointer variable.  C knows that,
too, and what's more is a bit lenient about it.  If you declare the parameter
as "char foo[];" the compiler recognizes that the whole array really won't
be passed, that only the address will be, and that the parameter needs to be
a pointer, so C correctly interprets the slightly misstated declaration.

This leniency of interpretation is provided only when declaring parameters,
nowhere else!

- Tom Beres
{seismo, allegra, brl-bmd, mcnc, we13}!rlgvax!tom

dmmartindale@watcgl.UUCP (Dave Martindale) (08/11/83)

Since C now allows structures to be passed by value, I think that it would
be an interesting change if the language started supporting the passing of
arrays by value as well.  Then the meanings of "array" and "&array[0]"
would always be different, and would correspond to the way structures
are handled.

Unfortunately, this would break the majority of already-existing C
programs, so no one is ever likely to do it.  Still, an interesting
idea.  Comments?

Dave Martindale, {allegra,decvax!watmath}!watcgl!dmmartindale

alan@allegra.UUCP (08/12/83)

Someone asked for comments about the idea of being allowed to pass
arrays (real arrays, not pointers) as arguments to functions.

I love it!

I would like to say things like

	main() {
		int array1[10], array2[10];

		array1 = array2;

		f(&array1);	/* Call f with the address of array1. */
		g(array2);	/* Call g with array2. */
	}

Arrays should behave like structures.  After all, an array is just a
structure whose elements are of the same type.  (Or, conversely, a
structure is just an array whose elements are not of the same type.)

Unfortunately, I can't see this ever happening, even though it would
make the language more consistent, more powerful, and more elegant.
Too bad.


	Alan Driscoll
	Bell Labs, Murray Hill

guy@rlgvax.UUCP (Guy Harris) (08/13/83)

I agree that if you're going to treat structures as (mostly) first-class
citizens, you should treat arrays the same way.  It would give you a
nice way to put "block moves" into code without having to use the USG UNIX
5.0 "memcpy"... routines as a side effect.  (If you want to do this you
can always declare a structure whose only member is an array, but this
is a ghastly kludge).

One benefit of structure-valued arguments and returns is that if you
want to pass several arguments to a function that are *really* all part
of the same data structure, you can do it more cleanly with a structure-
valued argument.  More importantly, it provides a way to get multiple
return values back from a function, which there is no other truly clean way
to do (you have to pass pointers to the return values otherwise).  One
major nuisance with C (and 99% of all programming languages) is that a
routine can't return both a value and a status.  The "read"/"write" system
calls, for example, must either return the number of bytes read/written
OR return an "abnormal termination" status - not both.  So, you have the
problem of trying to have "write" on a magtape say "Well, I wrote all 5120
bytes, but I hit the EOT marker while doing it" or a "write" on a terminal
saying "Well, I got 100 of those bytes out before the guy hit the interrupt
key".  The only way it could be done with only one return value is to have
something like the convention that "errno" is cleared by all system calls unless
there is some abnormal condition; instead of testing for a return value of
"-1" (which is also a legitimate return value for some system calls) you
test for "errno != 0".  An alternative would be to have the return value
from a particular system call declared as a structure containing the
returned value and the status code.

The disadvantage is that some present compilers aren't set up to do them
so they don't do them very well.  PCC does some strange and non-reentrant
things for returning structure values, and the code in the compiler to
implement it is painful.  Whether this is saying that the architecture of
C compilers should not now be the one used for PCC or that C should not
be extended to make aggregates first-class citizens is left as a question
for posterity to answer (as is the question of how UNIX system calls
should indicate abnormal status - note, not all abnormal statuses are really
errors, especially with I/O).

	Guy Harris
	{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy

guy@rlgvax.UUCP (Guy Harris) (08/13/83)

P.S.  I also think that the equivalence between array names and
array addresses wasn't necessarily a good idea, given the problems it
seems to cause.  (Please, no "UNIX wasn't intended for novices" flames;
it's going to be used by novices whether you like it or not.)  I certainly
don't make use of it unless forced to do so by "lint"'s complaining about

	int foo[10];
	int *bar;

	bar = &foo;

I think that the current rules mix two concepts (pointers and arrays) in
a way that is often unclear and occasionally dangerous.  I would have
preferred it if "foo" had stood for the array, and "&foo" had stood for the
address of the array (the '&' would remind the reader of the code that it was
indeed an address without relying on their contextual knowledge of C to
fill that detail in) and, using the statements in the previous example,
you could refer to "foo[5]" either as "foo[5]" or "(*bar)[5]" (or, as I
usually do in such circumstances, *(bar + 5)).  This would have no effect
on what computations could be expressed, nor would it have any effect on
how those computations were implemented (i.e., you wouldn't get further
removed from the machine), it would just be syntactic sugar.  As such,
it probably wouldn't have done any harm but also isn't worth doing at this
point given that it would change the rules in the middle of the game.

	Guy Harris
	{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy

chris@umcp-cs.UUCP (08/13/83)

Well, you can always write

struct array {
	int x[10];
};

main () {
	struct array a1, a2;

	foo (&a1);		/* call foo with address of array a1 */
	bar (a2);		/* call bar with array a2 */
}

A bit of a kludge, but it works in the current C compilers....

				- Chris
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris.umcp-cs@UDel-Relay

mark@cbosgd.UUCP (08/14/83)

Having arrays passed by value instead of by reference would
indeed clean up the language.  However, let's take a look at
some of the consequences of this move:

(1) Almost all existing programs would have to be rewritten,
    since the change would not be upward compatible.
(2) If all arrays are passed by value, then character strings
    must be passed by value as well (after all, they are arrays
    of characters).  This means
	(a) programs will be slower, so that the copy can be made,
	(b) programs will be bigger, for a place to copy to,
	(c) subroutines that modify a character string in place
	    (like strcpy) will stop working.
(3) The above comments also apply to big arrays of things.  In a
    language such as Pascal, arrays being passed by value are one
    of the worst causes of very slow programs (especially when the
    author just didn't realize the array was going to be copied).
(4) The whole paradigm that arrays and pointers are interchangeable
    would have to be redone, no doubt losing upward compatibility.

Remember that C is not a general purpose high level applications
programming language, it's a systems implementation language.  This
means it's closer to the machine, and you are supposed to be aware
of what's going on in the underlying machine.  The fact that so many
people are using C for applications shows not that C is well suited
for applications, but that it's usable for applications and nothing
else is supported as well on UNIX.

	Mark Horton

guy@rlgvax.UUCP (Guy Harris) (08/15/83)

I agree that even though passing arrays by value would be cleaner, it
wouldn't be desirable (interesting note about Pascal programmers).  I
don't know if DMR or any of the other founding fathers of C read this
newsgroup, but what is the general consensus about structure assignment
and passing structures by value?  I know it's a major mess to handle
them inside PCC, and some versions of PCC generate very nasty code to
handle them (the push/copy of structures isn't done by a loop - the loop
is unrolled at compile time).  I rarely use them, and plan not to use
them in the future given that our MIT-based 68000 PCC generates the
aforementioned nasty code.  (The fix is fairly straightforward, I just haven't
felt motivated to put it in.)

	Guy Harris
	{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy

grahamr@bronze.UUCP (Graham Ross) (08/15/83)

What are arrays just like?

In Pascal, arrays are just like records except all the elements are the
same type.  In C, arrays are just like pointers except they reserve storage
and they don't work as lvalues.  This is a part of the "pointer rift"
separating Pascal and its relatives from C.

An array cannot be passed at all in C, without putting it in a struct.
When we do:
	f(a);
we are passing a pointer to the 0th element of a (C Ref Man Sec 10.1).
It is not a pointer to the array!  f sees it the same as this:
	p = a; f(p);
It makes no C sense to pass a pointer to the base element of a struct,
since the elements are of varying size.

	Graham Ross
	teklabs!tekmdp!grahamr

alan@allegra.UUCP (08/15/83)

I think my ideas about arrays being passed by value were misunderstood,
at least by one person.  A follow-up article made these points:
	
	(1) Almost all existing programs would have to be rewritten,
	    since the change would not be upward compatible.
	(2) If all arrays are passed by value, then character strings
	    must be passed by value as well (after all, they are arrays
	    of characters).  This means
		(a) programs will be slower, so that the copy can be made,
		(b) programs will be bigger, for a place to copy to,
		(c) subroutines that modify a character string in place
		    (like strcpy) will stop working.

Yes, this fix to the language would break a lot of programs, but if the
change were made, and the broken programs fixed, they would not be any
less efficient than they were before, and routines like strcpy would still
work just fine.

	len = strlen(s);

would just be replaced with

	len = strlen(&s);

I'm certainly not saying arrays should always be passed by value.  People
pass pointers to structures all the time, even though they could pass the
structure by value, if they chose.  I would just like the same choice with
arrays.  It would make the language simpler, clearer, and more consistent.


	Alan Driscoll
	Bell Labs, Murray Hill

mjl@ritcv.UUCP (Mike Lutz) (08/15/83)

Personally I like the idea of array names being equivalent to the
address of the first item in the array.  Philosophically, an array name
is simply a pointer constant, just like 87 is an integer constant.  One
of the biggest wins for this interpretation is that C can handle
strings conveniently (I won't stretch the point to say "elegantly").
Since this is something of a religious issue, I will try to be tolerant
of the heathens and infidels who don't believe in the tenets of the One
True Language.

I do have a complaint about the interpretation of structure names, in
that these should be identical to array names.  That is, if "foo" is a
structure, then "foo" by itself should be a pointer to the structure.
In this instance, my complaint is based on the desirability of
consistency rather than the emotional issues of what an array/structure
name should denote.  I would like to treat arrays and structures
identically.

By the way, I think the Whitesmith's compiler does (or previous
versions of it did) interpret structure names as pointers.

Mike Lutz {allegra,seismo}!rochester!ritcv!mjl

andrew@orca.UUCP (Andrew Klossner) (08/16/83)

Here's a reactionary note:  I don't think that passing and returning
structures is a good idea, and should be backed out of the language.
It conflicts with the rest of C in that it can be a very "expensive"
operation, while the other C operations are "frugal".

However, I recognize that structure return values are vital in
producing YACC parsers when multiple attributes are in use.  Perhaps
this was the motivating force for this language enhancement?

  -- Andrew Klossner   (decvax!tektronix!tekecs!andrew)  [UUCP]
                       (andrew.tektronix@rand-relay)     [ARPA]

ken@turtleva.UUCP (Ken Turkowski) (08/17/83)

Now, wait a minute, Andrew Klossner, there are some reasons for passing
structures, and they are very natural.  Suppose your numbers were
rational numbers, that is, each number is composed of a numerator and
denominator.  Then it makes sense to pass the structure { int num, den; }
to processing routines.  Similarly with complex numbers { double re, im; },
two dimensional coordinates { int x, y; }, polar coordinates
{ double r, theta; }.
			Ken Turkowski
		    CADLINC, Palo Alto
		{decwrl,amd70}!turtlevax!ken

stuart@rochester.UUCP (Stuart Friedberg) (08/17/83)

I recently ran afoul of the difference in treatment of structures and
arrays. I had an array which was passed to an image processing program
to be filled with the locations of all the boundary points of an image.
The call was BoundaryPoints(Image, Bounds). (Image and Bounds both arrays).

I then decided to have BoundaryPoints calculate a few related values like
centroid and "mass" of the interesting part of the image. I changed Bounds
to be a struct including the earlier array and adding variables for the
new info. I updated all the references to Bounds, but left the call alone.
Surprise, surprise! It all compiled, but previously working code crashes.
Well, it rapidly became clear that the structure was being passed by value,
not reference, but I did think that it was an unnecessary "gotcha".

Shades of the uniform reference problem!!!

						Stu Friedberg

ka@spanky.UUCP (08/17/83)

The PDP-11 C compiler accepts the construct &array to mean the address
of the array (as opposed to the address of the first element of the array).
The problem with using this feature, aside from the obvious one that it is
nonportable, is that it doesn't allow variable length arrays to be handled
without using lots of type casts.  If you have a pointer to an array you
must know the size of the array at compile time.  If you refer to an array
using the address of its first element, you can figure out the size at run
time.
					Kenneth Almquist

arnold@gatech.UUCP (08/17/83)

	I think the way to change C to copy arrays by value is to add a
new operator to the language. I suggest using the ` (back quote). This is
one of 3 characters which are not used in C at all (others are $ and @).
It would work something like this:

char x[10],y[10];	/* declaration of two arrays */

`x = `y;	/* copy array y to array x, equivalent to strcpy(x,y) */

foo(`y);     /* call procedure foo() with the entire array y sent by value */

	A procedure receiving an array passed by value would declare it
as follows:

foo(array)
char `array[];	/*  the [] says 'array', the ` says it is passed by value */
{
	/* code */
}


	So far, this would allow array assignment, without changing the meaning
of 'array name as pointer to first element'. It is also upward compatible,
and would not break any existing programs. The use of an explicit operator makes
it 100% clear that array operations are going on.
	The only problem with this is assignment of character strings; one
would have to do something ugly like:

	`x = ` "string constant";

however, the language could probably special-case it to make the ` before
the constant unnecessary, the same way that it special-cases initialization
of character arrays to allow using "string" instead of the more verbose
{ 's', 't', 'r', 'i', 'n', 'g', '\0' }. As with initialization, where the
braces form is allowed, the ` in front of a string constant would also be
valid (i.e. no error messages), just not necessary.

	An array concatenation operator, probably $, might also be useful.
For example:

#define DIM	/* whatever */
char x[DIM], y[DIM], z[DIM];

x = y $ z;		/* same as sprintf(x, "%s%s", y, z) */

x $= y;		/* the more common strcat(x,y); */

	Both the ` and the $ should work for arrays of any type, not
just character arrays, but character arrays would be where they were the
most useful. They should not work for arrays of mixed type.
	New operators would make it easy for lint to catch mistakes as
well, for instance  strcat(x, `y)  would be a type error, an array passed
instead of a pointer.
	C could use some extensions for dealing with arrays, we just have
to be *very* careful about how well they fit in with the rest of the language.
-- 
Arnold Robbins
Arnold @ GATech  (CS Net)	Arnold.GaTech @ UDel-Relay  (ARPA)
...!{sb1, allegra}!gatech!arnold  (uucp)
...!decvax!cornell!allegra!gatech!arnold

rgh@inmet.UUCP (08/20/83)

inmet!rgh    Aug 19 08:46:00 1983

    The basic problem with passing arrays as values in C is that in
general you don't know how big the array is.  Consider:

	int a[20];		/* here you know */
	extern int b[];		/* no telling */
	f(c)
	    int c[];		/* can't tell */
	{
	}


    Pascal treats the length of an array as part of its type.  However,
this is everybody's least favorite feature of the language, since it
makes general array-handling subroutines impossible to write.
    The length of a structure is determinable from its declaration, so this
problem doesn't arise for structures.

					Randy Hudson
					{harpo,ima}!inmet!rgh

ken@turtleva.UUCP (Ken Turkowski) (08/22/83)

Is there any logical reason at all why the C compiler will not accept
constructs like &array, especially in subroutine calls, or even in
assignments?  I believe that version 6 allowed either.  The &array
construct is closer to what actually happens.

			Ken Turkowski
		    CADLINC, Palo Alto
		{decwrl,amd70}!turtlevax!ken