cobb@csu-cs.UUCP (07/28/83)
Consider the two routines:
extern char *yytext;
main ()
{
stat ();
printf ("%s\n",*yytext);
}
and
char yytext [10];
stat ()
{
yytext [0] = 'y';
yytext [1] = 'e';
yytext [2] = 'a';
yytext [3] = 'r';
yytext [4] = '\0';
}
These two routines exist in different modules, and are compiled
separately and then linked and loaded. However, every time I try to
execute the routine, I get an execution error. From reading the C
manual, the implication is that yytext is implemented as a pointer
to some storage locations, and can therefore be treated as such. This
is reinforced by the legal program segment
.
.
.
char ch [n];
char *chptr;
chptr = ch;
.
.
which results in chptr being assigned the base location of the array
'ch'. Why then do the first routines blow up?
Thanks
S. Cobb
!hao!csu-cs!cobb
tom@rlgvax.UUCP (07/30/83)
Concerning the following:
extern char *x;
main() {
sub();
printf("%s", *x);
}
-----------------------------<new file>----------
char x[10];
sub() {
x[0] = 'a';
x[1] = '\0';
}
First, the reason the program won't work is that it should be
"printf("%s", x);" WITHOUT the indirection on x. What was originally
written passes a single character to printf() where a pointer (address)
was expected.
Second, the declaration "extern char *x;" is incorrect. "x" is an array,
NOT a pointer, and must be declared as such. The compiler thinks that
x is a pointer (i.e. a storage location containing an address). When
you pass x to printf(), x is evaluated, i.e. the contents of that location
are pushed onto the stack. However, in reality x is an ARRAY! This means
that "the contents of that location" are in fact a bunch of characters.
So, you are passing a bunch of characters which are being interpreted as
an address. So, you blow up. Solution: in main, declare "extern char x[];".
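As a minimal sketch in modern ANSI C (not the 1983 dialect), the corrected pair of modules can be squeezed into one file. fill_yytext() and print_yytext() are hypothetical names; the original stat() is renamed because stat is also a standard UNIX library function. In a real program the extern declaration would live in a header included by both modules, so the compiler can check it against the definition.

```c
#include <stdio.h>

/* What "module 2" defines: storage for 10 chars. */
char yytext[10];

/* What "module 1" should declare: an array of char of unspecified
   size, NOT "char *". Ideally this line lives in a shared header. */
extern char yytext[];

/* Hypothetical replacement for the original stat(). */
void fill_yytext(void)
{
    yytext[0] = 'y';
    yytext[1] = 'e';
    yytext[2] = 'a';
    yytext[3] = 'r';
    yytext[4] = '\0';
}

/* Pass yytext itself (it converts to &yytext[0]), not *yytext. */
void print_yytext(void)
{
    printf("%s\n", yytext);
}
```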
The example you mention:
char *foo, bar[10];
foo = bar;
works because using "bar" by itself is exactly the same as "&(bar[0])"
by definition in C, not because you can treat pointers and arrays
indiscriminately. This next example also works for the same reason.
main() {
char foo[10];
subr(foo);
}
subr(bar) char *bar; {
}
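Tom's point can be checked directly. The sketch below (modern C; received() and array_name_is_first_address() are hypothetical names) confirms that both the assignment and the function call hand over &foo[0], not the array itself.

```c
#include <stdbool.h>

/* The parameter is declared as a pointer; callers may pass an array. */
static const char *received(const char *bar)
{
    return bar;                 /* the address the callee was handed */
}

/* True if both the assignment and the call saw &foo[0]. */
bool array_name_is_first_address(void)
{
    char foo[10] = "";
    char *p = foo;              /* exactly the same as p = &foo[0] */
    return p == &foo[0] && received(foo) == &foo[0];
}
```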
I find that most new programmers find this second example confusing, to say the
least.
- Tom Beres
{seismo, allegra, brl-bmd, mcnc, we13}!rlgvax!tom
dixon@ihuxa.UUCP (07/31/83)
1.) The routine stat() placed the binary values for 'y', 'e', 'a', and 'r'
(NOT necessarily in that order) in the storage that was defined for the
global array yytext. In the printf() statement
the *yytext argument instructed the system to get the contents at the
location given by the value in yytext. For the binary values of 'y', 'e',
'a', and 'r' this is a VERY large number (assuming unsigned for character
pointers). Much larger than the amount of memory you are apt to have on
a VAX 780 system.
2.) If you wanted to print a string of characters using a pointer to that
string, then you should use
printf("%s\n",yytext); rather than
printf("%s\n",*yytext);
since yytext has been declared to be a pointer to a character string.
alan@allegra.UUCP (07/31/83)
Consider the following fragments:
--- file 1 ---
extern char *foo;
func() {
printf("%c", *foo);
}
--- file 2 ---
char foo[SIZE];
---
The declaration
extern char *foo;
says that foo is a variable holding the address of a character (in
this case, the first character of an array). This is not correct.
The variable foo doesn't refer to a pointer to an array, but to an
array itself.
On the other hand,
main() {
char foo[SIZE];
func(foo);
}
func(foo)
char *foo;
{
printf("%c", *foo);
}
will work just fine, although it would have been clearer to declare
foo as
char foo[];
in func().
The key here is that in the first example, the foo in file one is the
same variable as the foo in file two, while in the second example, we
have two different foo's. By saying
func(foo);
we create a character pointer. In func(), foo now refers to a location
on the stack which holds the address of the first character of the array
known by the name foo in main().
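Because the foo in func() is a separate pointer variable, it can even be incremented, something main()'s array name foo could never be (foo++ there would not compile). A minimal sketch, with count_chars() as a hypothetical name:

```c
#include <stddef.h>

/* foo here is a fresh pointer variable initialized to the caller's
   &foo[0]; advancing it does not affect the caller's array. */
size_t count_chars(char foo[])
{
    size_t n = 0;
    while (*foo++ != '\0')      /* legal: the parameter is a real pointer */
        n++;
    return n;
}
```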
Like so many other blunders, this is all done in the name of efficiency.
As far as I'm concerned, it's one of the ugliest parts of the C language.
Alan Driscoll
BTL, Murray Hill
mp@mit-eddie.UUCP (Mark Plotnick) (08/01/83)
ihuxa!dixon recommended the following:
If you wanted to print a string of characters using a pointer to that
string, then you should use
printf("%s\n",yytext); rather than
printf("%s\n",*yytext);
since yytext has been declared to be a pointer to a character string.
Well, this is almost right. Since the loader (at least on 4.1bsd and
System 3) has deemed that the external symbol _yytext is equal to
something like 10600 (the start of the 10-element array), if you do
printf("%s", yytext), you're going to get either a segmentation fault or
a lot of garbage, since you're passing printf the contents of 4 bytes
starting at 10600, which is 16230262571 (octal) on our machine. If you do
printf("%s", *yytext), you'll have an even better chance of core
dumping.
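To see where such a number comes from, here is a sketch in modern C that reads the first four bytes of the array the way the mismatched declaration makes main() read them. bytes_as_address() is a hypothetical name, and 32-bit pointers and ASCII are assumed.

```c
#include <stdint.h>
#include <string.h>

/* Read the first four bytes of a character array as a 32-bit word,
   which is what code compiled with "extern char *yytext;" does on a
   machine with 32-bit pointers such as the VAX. */
uint32_t bytes_as_address(const char text[4])
{
    uint32_t word;
    memcpy(&word, text, sizeof word);   /* plain byte copy */
    return word;
}
```

Called as bytes_as_address("year"), it yields 0x72616579 on a little-endian machine such as the VAX, which is 16230262571 in octal, the number quoted above; a big-endian machine would give 0x79656172. Either way, the "address" has nothing to do with where the array actually lives.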
The solution is to use compatible declarations - in this case, yytext
should be declared as
extern char yytext[];
in the first program, not
extern char *yytext;
THEN you can use dixon's suggested fix.
The System V loader presumably catches conflicting declarations such as
this, and may even catch such favorite constructs as multiple external
definitions. The 3B20 loader of several years ago caught these errors
and I had a wonderful time while porting some Berkeley software in which
every module #included a header file full of global definitions.
Mark (genrad!mit-eddie!mp, eagle!mit-vax!mp)
wyse@ihuxp.UUCP (08/02/83)
I suggest that you all reread section 5.3 in "The C Programming Language",
by our friends BWK and DMR. Given that I have
int a[10];
the reference a[i] is equivalent to *(a+i). The difference between an
array name and a pointer is that the array name is a constant, i.e., you
can't assign to it. When an array name is used as an argument to a
function, the address of the beginning of the array is passed and, as
somebody else pointed out, it is truly a pointer. However, you can
declare it in the function as either
char *s;
or
char s[];
as they are equivalent, and which one is used depends on how the
expressions involving s will be written in the function.
Neal Wyse
ihnp4!ihuxp!wyse
Bell Labs, Naperville Ill.
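The equivalence of a[i] and *(a+i) can be illustrated with two accessor functions (hypothetical names, modern C) that the compiler must treat identically:

```c
/* By definition a[i] means *(a + i), so these two agree for every i. */
int by_subscript(const int *a, int i)
{
    return a[i];
}

int by_pointer(const int *a, int i)
{
    return *(a + i);
}
```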
tom@rlgvax.UUCP (Tom Beres) (08/06/83)
If you realize that "char *p;" and "char p[];" are different and you
don't need an explanation why, skip this. If not, read on.
NEAL WYSE at ihuxp, I think you need to read this.
Just because you can interchange "char *p;" and "char p[];" when
declaring a formal parameter does NOT mean you can confuse these
2 everywhere else. They are NOT the same!
char foo[10];
foo[1] = '\0';
means: Take the ADDRESS "foo" and add 1 to it to get a new address.
Then put a '\0' in the memory location indicated by that address.
char *foo;
foo[1] = '\0';
means: Look at the ADDRESS "foo" and take the VALUE you find in
there. Add 1 to that VALUE. Now use that VALUE as an address,
and put a '\0' in the memory location indicated by that address.
NOTE: I realize that "foo" should first be set to point to someplace
legit.
In both cases you can use indexing (i.e. "xxx[yyy]" is legit), but DIFFERENT
things happen in the two cases! In order for the compiler to generate the
right code, it must know whether "foo" is an array or a pointer. In:
file 1: | file 2:
|
char array[10]; | extern char *array;
You are giving the wrong information to the compiler in File 2. The compiler
will oblige by giving you back wrong code. You are stating in File 2 that
"array" is the address of a character pointer, so the code produced will
fit the 2nd description above. But File 1 says that "array" is the
address of the first of 10 sequentially stored characters, so File 1
will produce code to suit the first description above.
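A small sketch of the two cases above, kept in one file so the declarations agree (modern C; all names hypothetical). sizeof exposes that one name is ten bytes of storage and the other is a single address; both subscripts reach the same byte only because pointer was first set to point into array.

```c
#include <stddef.h>

static char array[10];          /* the name denotes 10 chars of storage   */
static char *pointer = array;   /* the name denotes storage for one address */

/* sizeof shows what each name really refers to. */
size_t array_size(void)   { return sizeof array; }    /* 10 */
size_t pointer_size(void) { return sizeof pointer; }  /* sizeof(char *) */

/* Same source text, different code: array[1] is a fixed address, while
   pointer[1] first loads the address stored in pointer, then adds 1. */
void set_via_array(char c)   { array[1] = c; }
void set_via_pointer(char c) { pointer[1] = c; }
```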
Now for the question 'why can you use "char foo[];" and "char *foo;"
interchangeably when declaring a formal parameter to a subroutine?'.
The reason is that, as we all know, when you pass an array in C, what
really gets passed to the subroutine is the address of the first element,
and the parameter variable will be initialized to that address. So, the
parameter is really an honest-to-goodness pointer variable. C knows that,
too, and, what's more, is a bit lenient about it. If you declare the parameter
as "char foo[];" the compiler recognizes that the whole array really won't
be passed, that only the address will be, and that the parameter needs to be
a pointer, so C correctly interprets the slightly misstated declaration.
This leniency of interpretation is provided only when declaring parameters,
nowhere else!
- Tom Beres
{seismo, allegra, brl-bmd, mcnc, we13}!rlgvax!tom
dmmartindale@watcgl.UUCP (Dave Martindale) (08/11/83)
Since C now allows structures to be passed by value, I think that it would
be an interesting change if the language started supporting the passing of
arrays by value as well. Then the meanings of "array" and "&array[0]"
would always be different, and would correspond to the way structures
are handled.
Unfortunately, this would break the majority of already-existing C
programs, so no one is ever likely to do it. Still, an interesting
idea. Comments?
Dave Martindale, {allegra,decvax!watmath}!watcgl!dmmartindale
alan@allegra.UUCP (08/12/83)
Someone asked for comments about the idea of being allowed to pass
arrays (real arrays, not pointers) as arguments to functions.
I love it!
I would like to say things like
main() {
int array1[10], array2[10];
array1 = array2;
f(&array1); /* Call f with the address of array1. */
g(array2); /* Call g with array2. */
}
Arrays should behave like structures. After all, an array is just a
structure whose elements are of the same type. (Or, conversely, a
structure is just an array whose elements are not of the same type.)
Unfortunately, I can't see this ever happening, even though it would
make the language more consistent, more powerful, and more elegant.
Too bad.
Alan Driscoll
Bell Labs, Murray Hill
guy@rlgvax.UUCP (Guy Harris) (08/13/83)
I agree that if you're going to treat structures as (mostly) first-class
citizens, you should treat arrays the same way. It would give you a
nice way to put "block moves" into code without having to use the USG UNIX
5.0 "memcpy"... routines as a side effect. (If you want to do this you
can always declare a structure whose only member is an array, but this
is a ghastly kludge).
One benefit of structure-valued arguments and returns is that if you
want to pass several arguments to a function that are *really* all part
of the same data structure, you can do it more cleanly with a structure-
valued argument. More importantly, it provides a way to get multiple
return values back from a function, which there is no other truly clean way
to do (you have to pass pointers to the return values otherwise). One
major nuisance with C (and 99% of all programming languages) is that a
routine can't return both a value and a status. The "read"/"write" system
calls, for example, must either return the number of bytes read/written
OR return an "abnormal termination" status - not both. So, you have the
problem of trying to have "write" on a magtape say "Well, I wrote all 5120
bytes, but I hit the EOT marker while doing it" or a "write" on a terminal
saying "Well, I got 100 of those bytes out before the guy hit the interrupt
key". The only way it could be done with only one return value is to have
something like the convention that "errno" is cleared by all system calls unless
there is some abnormal condition; instead of testing for a return value of
"-1" (which is also a legitimate return value for some system calls) you
test for "errno != 0". An alternative would be to have the return value
from a particular system call declared as a structure containing the
returned value and the status code.
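A minimal sketch of that alternative in modern C, assuming a simulated tape of fixed capacity. io_status, io_result and tape_write() are hypothetical names, not part of any real UNIX interface. The single returned structure can say "I wrote all 5120 bytes, but I hit the EOT marker while doing it".

```c
#include <stddef.h>

/* Hypothetical status codes. */
enum io_status { IO_OK, IO_END_OF_TAPE };

/* One return value carrying both a count and a status. */
struct io_result {
    size_t count;            /* bytes actually written */
    enum io_status status;   /* abnormal condition, if any */
};

/* Simulated tape write: *space_left is the room remaining on the tape. */
struct io_result tape_write(size_t len, size_t *space_left)
{
    struct io_result r;
    r.count = len < *space_left ? len : *space_left;
    *space_left -= r.count;
    r.status = (*space_left == 0) ? IO_END_OF_TAPE : IO_OK;
    return r;
}
```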
The disadvantage is that some present compilers aren't set up to do them
so they don't do them very well. PCC does some strange and non-reentrant
things for returning structure values, and the code in the compiler to
implement it is painful. Whether this is saying that the architecture of
C compilers should not now be the one used for PCC or that C should not
be extended to make aggregates first-class citizens is left as a question
for posterity to answer (as is the question of how UNIX system calls
should indicate abnormal status - note, not all abnormal statuses are really
errors, especially with I/O).
Guy Harris
{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy
guy@rlgvax.UUCP (Guy Harris) (08/13/83)
P.S. I also think that the equivalence between array names and
array addresses wasn't necessarily a good idea, given the problems it
seems to cause. (Please, no "UNIX wasn't intended for novices" flames;
it's going to be used by novices whether you like it or not.) I certainly
don't make use of it unless forced to do so by "lint"'s complaining about
int foo[10];
int *bar;
bar = &foo;
I think that the current rules mix two concepts (pointers and arrays) in
a way that is often unclear and occasionally dangerous. I would have
preferred it if "foo" had stood for the array, and "&foo" had stood for the
address of the array (the '&' would remind the reader of the code that it was
indeed an address without relying on their contextual knowledge of C to
fill that detail in) and, using the statements in the previous example,
you could refer to "foo[5]" either as "foo[5]" or "(*bar)[5]" (or, as I
usually do in such circumstances, *(bar + 5)). This would have no effect
in what computations could be expressed, nor would it have any effect on
how those computations were implemented (i.e., you wouldn't get further
removed from the machine), it would just be syntactic sugar. As such,
it probably wouldn't have done any harm but also isn't worth doing at this
point given that it would change the rules in the middle of the game.
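ANSI C later adopted part of this: &foo is legal and has type "pointer to array of 10 int". Assigning it to an int * as in the example above still draws a diagnostic, but a correctly typed bar supports exactly the (*bar)[5] notation wished for here. A minimal sketch (the function name is hypothetical):

```c
#include <stdbool.h>

bool array_pointer_demo(void)
{
    int foo[10];
    for (int i = 0; i < 10; i++)
        foo[i] = i * 10;

    int (*bar)[10] = &foo;   /* pointer to the whole array */
    int *first = foo;        /* pointer to its first element */

    return (void *)bar == (void *)first      /* same address...          */
        && (*bar)[5] == 50                   /* ...the (*bar)[5] form    */
        && *(first + 5) == 50                /* ...the *(p + 5) form     */
        && sizeof *bar == 10 * sizeof(int);  /* but *bar is all 10 ints  */
}
```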
Guy Harris
{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy
chris@umcp-cs.UUCP (08/13/83)
Well, you can always write
struct array {
int x[10];
};
main () {
struct array a1, a2;
foo (&a1); /* call foo with address of array a1 */
bar (a2); /* call bar with array a2 */
}
A bit of a kludge, but it works in the current C compilers....
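Chris's kludge in modern C, with hypothetical bodies for foo() and bar() (the post leaves them undefined), shows the copy semantics that make it work: bar() changes only its private copy, foo() changes the caller's structure, and plain assignment copies all ten ints.

```c
/* Wrapping an array in a struct makes it a first-class value. */
struct array {
    int x[10];
};

/* Receives the address: changes are visible to the caller. */
void foo(struct array *a)
{
    a->x[0] = 42;
}

/* Receives a copy: changes are invisible to the caller. */
int bar(struct array a)
{
    a.x[0] = 99;
    return a.x[0];
}
```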
- Chris
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci
UUCP: {seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: chris.umcp-cs@UDel-Relay
mark@cbosgd.UUCP (08/14/83)
Having arrays passed by value instead of by reference would
indeed clean up the language. However, let's take a look at
some of the consequences of this move:
(1) Almost all existing programs would have to be rewritten,
since the change would not be upward compatible.
(2) If all arrays are passed by value, then character strings
must be passed by value as well (after all, they are arrays
of characters). This means
(a) programs will be slower, so that the copy can be made,
(b) programs will be bigger, for a place to copy to,
(c) subroutines that modify a character string in place
(like strcpy) will stop working.
(3) The above comments also apply to big arrays of things. In a
language such as Pascal, arrays being passed by value are one
of the worst causes of very slow programs (especially when the
author just didn't realize the array was going to be copied).
(4) The whole paradigm that arrays and pointers are interchangeable
would have to be redone, no doubt losing upward compatibility.
Remember that C is not a general purpose high level applications
programming language, it's a systems implementation language. This
means it's closer to the machine, and you are supposed to be aware
of what's going on in the underlying machine. The fact that so many
people are using C for applications shows not that C is well suited
for applications, but that it's usable for applications and nothing
else is supported as well on UNIX.
Mark Horton
guy@rlgvax.UUCP (Guy Harris) (08/15/83)
I agree that even though passing arrays by value would be cleaner, it
wouldn't be desirable (interesting note about Pascal programmers). I
don't know if DMR or any of the other founding fathers of C read this
newsgroup, but what is the general consensus about structure assignment
and passing structures by value? I know it's a major mess to handle
them inside PCC, and some versions of PCC generate very nasty code to
handle them (the push/copy of structures isn't done by a loop - the loop
is unrolled at compile time). I rarely use them, and plan not to use
them in the future, given that our MIT-based 68000 PCC generates the
aforementioned nasty code. (The fix is fairly straightforward; I just haven't
felt motivated to put it in.)
Guy Harris
{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy
grahamr@bronze.UUCP (Graham Ross) (08/15/83)
What are arrays just like?
In Pascal, arrays are just like records except all the elements are
the same type.
In C, arrays are just like pointers except they reserve storage and
they don't work as lvalues.
This is a part of the "pointer rift" separating Pascal and its
relatives from C. An array cannot be passed at all in C without
putting it in a struct. When we do:
f(a);
we are passing a pointer to the 0th element of a (C Ref Man Sec 10.1).
It is not a pointer to the array! f sees it the same as this:
p = a;
f(p);
It makes no C sense to pass a pointer to the base element of a struct,
since the elements are of varying size.
Graham Ross
teklabs!tekmdp!grahamr
alan@allegra.UUCP (08/15/83)
I think my ideas about arrays being passed by value were misunderstood,
at least by one person. A follow-up article made these points:
(1) Almost all existing programs would have to be rewritten,
since the change would not be upward compatible.
(2) If all arrays are passed by value, then character strings
must be passed by value as well (after all, they are arrays
of characters). This means
(a) programs will be slower, so that the copy can be made,
(b) programs will be bigger, for a place to copy to,
(c) subroutines that modify a character string in place
(like strcpy) will stop working.
Yes, this fix to the language would break a lot of programs, but if
the change were made, and the broken programs fixed, they would not be
any less efficient than they were before, and routines like strcpy
would still work just fine.
len = strlen(s);
would just be replaced with
len = strlen(&s);
I'm certainly not saying arrays should always be passed by value.
People pass pointers to structures all the time, even though they
could pass the structure by value, if they chose. I would just like
the same choice with arrays. It would make the language simpler,
clearer, and more consistent.
Alan Driscoll
Bell Labs, Murray Hill
mjl@ritcv.UUCP (Mike Lutz) (08/15/83)
Personally I like the idea of array names being equivalent to the
address of the first item in the array. Philosophically, an array name
is simply a pointer constant, just like 87 is an integer constant. One
of the biggest wins for this interpretation is that C can handle
strings conveniently (I won't stretch the point to say "elegantly").
Since this is something of a religious issue, I will try to be tolerant
of the heathens and infidels who don't believe in the tenets of the One
True Language.
I do have a complaint about the interpretation of structure names, in
that these should be identical to array names. That is, if "foo" is a
structure, then "foo" by itself should be a pointer to the structure.
In this instance, my complaint is based on the desirability of
consistency rather than the emotional issues of what an array/structure
name should denote. I would like to treat arrays and structures
identically.
By the way, I think the Whitesmith's compiler does (or previous
versions of it did) interpret structure names as pointers.
Mike Lutz {allegra,seismo}!rochester!ritcv!mjl
andrew@orca.UUCP (Andrew Klossner) (08/16/83)
Here's a reactionary note: I don't think that passing and returning
structures is a good idea, and should be backed out of the language.
It conflicts with the rest of C in that it can be a very "expensive"
operation, while the other C operations are "frugal".
However, I recognize that structure return values are vital in
producing YACC parsers when multiple attributes are in use. Perhaps
this was the motivating force for this language enhancement?
-- Andrew Klossner (decvax!tektronix!tekecs!andrew) [UUCP]
(andrew.tektronix@rand-relay) [ARPA]
ken@turtleva.UUCP (Ken Turkowski) (08/17/83)
Now, wait a minute, Andrew Klossner, there are some reasons for passing
structures, and they are very natural. Suppose your numbers were
rational numbers, that is, each number is composed of a numerator and
denominator. Then it makes sense to pass the structure { int num, den; }
to processing routines. Similarly with complex numbers { double re, im; },
two dimensional coordinates { int x, y; }, polar coordinates
{ double r, theta; }.
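A minimal sketch of the rational-number case in modern C (rat_add() is a hypothetical name, and the result is deliberately not reduced to lowest terms). The structure is just two ints, so passing and returning it by value should cost roughly what passing two separate arguments would.

```c
/* A rational number passed and returned by value. */
struct rational { int num, den; };

/* a/b + c/d = (a*d + c*b) / (b*d); no reduction, no overflow check. */
struct rational rat_add(struct rational a, struct rational b)
{
    struct rational r = { a.num * b.den + b.num * a.den, a.den * b.den };
    return r;
}
```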
Ken Turkowski
CADLINC, Palo Alto
{decwrl,amd70}!turtlevax!ken
stuart@rochester.UUCP (Stuart Friedberg) (08/17/83)
I recently ran afoul of the difference in treatment of structures and
arrays. I had an array which was passed to an image processing program
to be filled with the locations of all the boundary points of an image.
The call was BoundaryPoints(Image, Bounds) (Image and Bounds both
arrays). I then decided to have BoundaryPoints calculate a few related
values, like the centroid and "mass" of the interesting part of the
image. I changed Bounds to be a struct including the earlier array and
adding variables for the new info. I updated all the references to
Bounds, but left the call alone.
Surprise, surprise! It all compiled, but previously working code
crashed. Well, it rapidly became clear that the structure was being
passed by value, not by reference, but I did think that it was an
unnecessary "gotcha". Shades of the uniform reference problem!!!
Stu Friedberg
ka@spanky.UUCP (08/17/83)
The PDP-11 C compiler accepts the construct &array to mean the address
of the array (as opposed to the address of the first element of the
array). The problem with using this feature, aside from the obvious one
that it is nonportable, is that it doesn't allow variable length arrays
to be handled without using lots of type casts. If you have a pointer
to an array, you must know the size of the array at compile time. If
you refer to an array using the address of its first element, you can
figure out the size at run time.
Kenneth Almquist
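The trade-off can be sketched in modern C (sum10() and sum_n() are hypothetical names): a pointer-to-array parameter bakes the length into its type, while a pointer-to-first-element parameter takes the length at run time and accepts arrays of any size.

```c
#include <stddef.h>

/* Pointer-to-array parameter: only arrays of exactly 10 int fit. */
int sum10(int (*a)[10])
{
    int s = 0;
    for (size_t i = 0; i < 10; i++)
        s += (*a)[i];
    return s;
}

/* Pointer-to-first-element parameter: length supplied at run time. */
int sum_n(const int *a, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```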
arnold@gatech.UUCP (08/17/83)
I think the way to change C to copy arrays by value is to add a
new operator to the language. I suggest using the ` (back quote). This is
one of 3 characters which are not used in C at all (others are $ and @).
It would work something like this:
char x[10],y[10]; /* declaration of two arrays */
`x = `y; /* copy array y to array x, equivalent to strcpy(x,y) */
foo(`y); /* call procedure foo() with the entire array y sent by value */
A procedure receiving an array passed by value would declare it
as follows:
foo(array)
char `array[]; /* the [] says 'array', the ` says it is passed by value */
{
/* code */
}
So far, this would allow array assignment, without changing the meaning
of 'array name as pointer to first element'. It is also upward compatible,
and would not break any existing programs. The use of an explicit operator makes
it 100% clear that array operations are going on.
The only problem with this is assignment of character strings; one
would have to do something ugly like:
`x = ` "string constant";
however, the language could probably special-case it to make the ` before
the constant un-necessary, the same way that it special-cases initialization
of character arrays to allow using "string" instead of the more verbose
{ 's', 't', 'r', 'i', 'n', 'g', '\0' }. As with initialization, where the
braces form is allowed, the ` in front of a string constant would also be
valid (i.e. no error messages), just not necessary.
An array concatenation operator, probably $, might also be useful.
For example:
#define DIM /* whatever */
char x[DIM], y[DIM], z[DIM];
x = y $ z; /* same as sprintf(x, "%s%s", y, z) */
x $= y; /* the more common strcat(x,y); */
Both the ` and the $ should work for arrays of any type, not
just character arrays, but character arrays would be where they were the
most useful. They should not work for arrays of mixed type.
New operators would make it easy for lint to catch mistakes as
well, for instance strcat(x, `y) would be a type error, an array passed
instead of a pointer.
C could use some extensions for dealing with arrays, we just have
to be *very* careful about how well they fit in with the rest of the language.
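Until something like that exists, the proposed operations have to be spelled with library calls. A minimal sketch in modern C, assuming a concrete DIM of 16 (the post leaves it open) and hypothetical function names; unlike the proposed operators, these calls cannot check that the arrays really have DIM elements.

```c
#include <stdio.h>
#include <string.h>

#define DIM 16   /* assumed size; the post leaves DIM unspecified */

/* Today's spelling of the proposed `x = `y for any element type. */
void array_assign(int x[DIM], const int y[DIM])
{
    memcpy(x, y, DIM * sizeof y[0]);
}

/* Today's spelling of the proposed x = y $ z, bounded to DIM bytes. */
void array_concat(char x[DIM], const char *y, const char *z)
{
    snprintf(x, DIM, "%s%s", y, z);
}
```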
--
Arnold Robbins
Arnold @ GATech (CS Net) Arnold.GaTech @ UDel-Relay (ARPA)
...!{sb1, allegra}!gatech!arnold (uucp)
...!decvax!cornell!allegra!gatech!arnold
rgh@inmet.UUCP (08/20/83)
inmet!rgh Aug 19 08:46:00 1983
The basic problem with passing arrays as values in C is that in
general you don't know how big the array is. Consider:
int a[20]; /* here you know */
extern int b[]; /* no telling */
f(c)
int c[]; /* can't tell */
{
}
Pascal treats the length of an array as part of its type. However,
this is everybody's least favorite feature of the language, since it
makes general array-handling subroutines impossible to write.
The length of a structure is determinable from its declaration, so this
problem doesn't arise for structures.
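The usual workaround is to compute the length where the array's declaration is visible and pass it along explicitly. A sketch in modern C (NELEM is a hypothetical macro name for a common idiom): NELEM(a) works for int a[20], would fail to compile for extern int b[] because its type is incomplete, and inside f() it would silently give the wrong answer, since c there is just a pointer.

```c
#include <stddef.h>

/* Element count of a true array; wrong (not an error) on a pointer. */
#define NELEM(arr) (sizeof (arr) / sizeof (arr)[0])

int a[20];   /* here you know: NELEM(a) == 20 */

/* Inside f, c is just an int *, so the caller supplies the length. */
long f(const int c[], size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += c[i];
    return sum;
}
```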
Randy Hudson
{harpo,ima}!inmet!rgh
ken@turtleva.UUCP (Ken Turkowski) (08/22/83)
Is there any logical reason at all why the C compiler will not accept
constructs like &array, especially in subroutine calls, or even in
assignments? I believe that version 6 allowed either. The &array
construct is closer to what actually happens.
Ken Turkowski
CADLINC, Palo Alto
{decwrl,amd70}!turtlevax!ken