cobb@csu-cs.UUCP (07/28/83)
Consider the two routines:

    extern char *yytext;
    main ()
    {
        stat ();
        printf ("%s\n",*yytext);
    }

and

    char yytext [10];
    stat ()
    {
        yytext [0] = 'y';
        yytext [1] = 'e';
        yytext [2] = 'a';
        yytext [3] = 'r';
        yytext [4] = '\0';
    }

These two routines exist in different modules, and are compiled separately and then linked and loaded. However, every time I try to execute the routine, I get an execution error. From reading the C manual, the implication is that yytext is implemented as a pointer to some storage locations, and can therefore be treated as such. This is reinforced by the legal program segment

    . . .
    char ch [n];
    char *chptr;
    chptr = ch;
    . .

which results in chptr being assigned the base location of the array 'ch'. Why then do the first routines blow up?

Thanks
S. Cobb
!hao!csu-cs!cobb
tom@rlgvax.UUCP (07/30/83)
Concerning the following:

    extern char *x;
    main()
    {
        foo();
        printf("%s", *x);
    }
    -----------------------------<new file>----------
    char x[10];
    foo()
    {
        x[0] = 'a';
        x[1] = '\0';
    }

First, the reason the program won't work is that it should be "printf("%s", x);" WITHOUT the indirection on x. What was originally written passes a single character to printf() where a pointer (address) was expected.

Second, the declaration "extern char *x;" is incorrect. "x" is an array, NOT a pointer, and must be declared as such. The compiler thinks that x is a pointer (i.e. a storage location containing an address). When you pass x to printf(), x is evaluated, i.e. the contents of that location are pushed onto the stack. However, in reality x is an ARRAY! This means that "the contents of that location" is in fact a bunch of characters. So, you are passing a bunch of characters which are being interpreted as an address. So, you blow up. Solution: in main, declare "extern char x[];".

The example you mention:

    char *foo, bar[10];
    foo = bar;

works because using "bar" by itself is exactly the same as "&(bar[0])" by definition in C, not because you can treat pointers and arrays indiscriminately. This next example also works for the same reason.

    main()
    {
        char foo[10];
        subr(foo);
    }
    subr(bar)
    char *bar;
    {
    }

I find that most new programmers find this second example confusing to say the least.

- Tom Beres
{seismo, allegra, brl-bmd, mcnc, we13}!rlgvax!tom
dixon@ihuxa.UUCP (07/31/83)
1.) The routine stat() placed the binary values for 'y', 'e', 'a', and 'r' (NOT necessarily in that order) in the global storage location that was defined for yytext. In the printf() statement the *yytext argument instructed the system to get the contents at the location given by the value in yytext. For the binary values of 'y', 'e', 'a', and 'r' this is a VERY large number (assuming unsigned for character pointers). Much larger than the amount of memory you are apt to have on a VAX 780 system.

2.) If you wanted to print a string of characters using a pointer to that string, then you should use

    printf("%s\n",yytext);

rather than

    printf("%s\n",*yytext);

since yytext has been declared to be a pointer to a character string.
alan@allegra.UUCP (07/31/83)
Consider the following fragments:

    --- file 1 ---
    extern char *foo;
    func()
    {
        printf("%c", *foo);
    }
    --- file 2 ---
    char foo[SIZE];
    ---

The declaration

    extern char *foo;

says that foo is a variable holding the address of a character (in this case, the first character of an array). This is not correct. The variable foo doesn't refer to a pointer to an array, but to an array itself.

On the other hand,

    main()
    {
        char foo[SIZE];
        func(foo);
    }
    func(foo)
    char *foo;
    {
        printf("%c", *foo);
    }

will work just fine, although it would have been clearer to declare foo as

    char foo[];

in func(). The key here is that in the first example, the foo in file one is the same variable as the foo in file two, while in the second example, we have two different foo's. By saying

    func(foo);

we create a character pointer. In func(), foo now refers to a location on the stack which holds the address of the first character of the array known by the name foo in main().

Like so many other blunders, this is all done in the name of efficiency. As far as I'm concerned, it's one of the ugliest parts of the C language.

Alan Driscoll
BTL, Murray Hill
mp@mit-eddie.UUCP (Mark Plotnick) (08/01/83)
ihuxa!dixon recommended the following:

    If you wanted to print a string of characters using a pointer to
    that string, then you should use
        printf("%s\n",yytext);
    rather than
        printf("%s\n",*yytext);
    since yytext has been declared to be a pointer to a character string.

Well, this is almost right. Since the loader (at least on 4.1bsd and System 3) has deemed that the external symbol _yytext is equal to something like 10600 (the start of the 10-element array), if you do printf("%s", yytext), you're going to get either a segmentation fault or a lot of garbage, since you're passing printf the contents of 4 bytes starting at 10600, which is 16230262571 on our machine. If you do printf("%s", *yytext), you'll have an even better chance of core dumping.

The solution is to use compatible declarations - in this case, yytext should be declared as

    extern char yytext[];

in the first program, not

    extern char *yytext;

THEN you can use dixon's suggested fix.

The System V loader presumably catches conflicting declarations such as this, and may even catch such favorite constructs as multiple external definitions. The 3B20 loader of several years ago caught these errors and I had a wonderful time while porting some Berkeley software in which every module #included a header file full of global definitions.

Mark (genrad!mit-eddie!mp, eagle!mit-vax!mp)
wyse@ihuxp.UUCP (08/02/83)
I suggest that you all reread section 5.3 in "The C Programming Language", by our friends BWK and DMR. Given that I have

    int a[10];

the reference a[i] is equivalent to *(a+i). The difference between an array name and a pointer is that the array name is a constant, i.e., you can't assign to it. When an array name is used as an argument to a function, the address of the beginning of the array is passed and as somebody else pointed out, it is truly a pointer. However, you can declare it in the function as either

    char *s;

or

    char s[];

as they are equivalent and which one is used depends on how the expressions involving s will be written in the function.

Neal Wyse
ihnp4!ihuxp!wyse
Bell Labs, Naperville Ill.
tom@rlgvax.UUCP (Tom Beres) (08/06/83)
If you realize that "char *p;" and "char p[];" are different and you
don't need an explanation why, skip this. If not, read on.
NEAL WYSE at ihuxp, I think you need to read this.
Just because you can interchange "char *p;" and "char p[];" when
declaring a formal parameter does NOT mean you can confuse these
2 everywhere else. They are NOT the same!
    char foo[10];
    foo[1] = '\0';
means: Take the ADDRESS "foo" and add 1 to it to get a new address.
Then put a '\0' in the memory location indicated by that address.
    char *foo;
    foo[1] = '\0';
means: Look at the ADDRESS "foo" and take the VALUE you find in
there. Add 1 to that VALUE. Now use that VALUE as an address,
and put a '\0' in the memory location indicated by that address.
NOTE: I realize that "foo" should first be set to point to someplace
legit.
In both cases you can use indexing (i.e. "xxx[yyy]" is legit) but DIFFERENT
things happen in both cases! In order for the compiler to generate the
right code, it must know whether "foo" is an array or a pointer. In:
    file 1:                 |   file 2:
                            |
    char array[10];         |   extern char *array;
You are giving the wrong information to the compiler in File 2. The compiler
will oblige by giving you back wrong code. You are stating in File 2 that
"array" is the address of a character pointer, so the code produced will
fit the 2nd description above. But File 1 says that "array" is the
address of the first of 10 sequentially stored characters, so File 1
will produce code to suit the first description above.
Now for the question 'why can you use "char foo[];" and "char *foo;"
interchangeably when declaring a formal parameter to a subroutine?'.
The reason is that, as we all know, when you pass an array in C, what
really gets passed to the subroutine is the address of the first element,
and the parameter variable will be initialized to that address. So, the
parameter is really an honest-to-goodness pointer variable. C knows that,
too, and what's more is a bit lenient about it. If you declare the parameter
as "char foo[];" the compiler recognizes that the whole array really won't
be passed, that only the address will be, and that the parameter needs to be
a pointer, so C correctly interprets the slightly misstated declaration.
This leniency of interpretation is provided only when declaring parameters,
nowhere else!
- Tom Beres
{seismo, allegra, brl-bmd, mcnc, we13}!rlgvax!tom
dmmartindale@watcgl.UUCP (Dave Martindale) (08/11/83)
Since C now allows structures to be passed by value, I think that it would be an interesting change if the language started supporting the passing of arrays by value as well. Then the meanings of "array" and "&array[0]" would always be different, and would correspond to the way structures are handled. Unfortunately, this would break the majority of already-existing C programs, so no one is ever likely to do it. Still, an interesting idea. Comments?

Dave Martindale, {allegra,decvax!watmath}!watcgl!dmmartindale
alan@allegra.UUCP (08/12/83)
Someone asked for comments about the idea of being allowed to pass
arrays (real arrays, not pointers) as arguments to functions.
I love it!
I would like to say things like
    main() {
        int array1[10], array2[10];
        array1 = array2;
        f(&array1);   /* Call f with the address of array1. */
        g(array2);    /* Call g with array2. */
    }
Arrays should behave like structures. After all, an array is just a
structure whose elements are of the same type. (Or, conversely, a
structure is just an array whose elements are not of the same type.)
Unfortunately, I can't see this ever happening, even though it would
make the language more consistent, more powerful, and more elegant.
Too bad.
Alan Driscoll
Bell Labs, Murray Hill
guy@rlgvax.UUCP (Guy Harris) (08/13/83)
I agree that if you're going to treat structures as (mostly) first-class citizens, you should treat arrays the same way. As a side effect, it would give you a nice way to put "block moves" into code without having to use the USG UNIX 5.0 "memcpy"... routines. (If you want to do this you can always declare a structure whose only member is an array, but this is a ghastly kludge).

One benefit of structure-valued arguments and returns is that if you want to pass several arguments to a function that are *really* all part of the same data structure, you can do it more cleanly with a structure-valued argument. More importantly, it provides a way to get multiple return values back from a function, which there is no other truly clean way to do (you have to pass pointers to the return values otherwise).

One major nuisance with C (and 99% of all programming languages) is that a routine can't return both a value and a status. The "read"/"write" system calls, for example, must either return the number of bytes read/written OR return an "abnormal termination" status - not both. So, you have the problem of trying to have "write" on a magtape say "Well, I wrote all 5120 bytes, but I hit the EOT marker while doing it" or a "write" on a terminal saying "Well, I got 100 of those bytes out before the guy hit the interrupt key". The only way it could be done with only one return value is to have something like the convention that "errno" is cleared by all system calls unless there is some abnormal condition; instead of testing for a return value of "-1" (which is also a legitimate return value for some system calls) you test for "errno != 0". An alternative would be to have the return value from a particular system call declared as a structure containing the returned value and the status code.

The disadvantage is that some present compilers aren't set up to do them so they don't do them very well. PCC does some strange and non-reentrant things for returning structure values, and the code in the compiler to implement it is painful. Whether this is saying that the architecture of C compilers should not now be the one used for PCC or that C should not be extended to make aggregates first-class citizens is left as a question for posterity to answer (as is the question of how UNIX system calls should indicate abnormal status - note, not all abnormal statuses are really errors, especially with I/O).

Guy Harris
{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy
guy@rlgvax.UUCP (Guy Harris) (08/13/83)
P.S. I also think that the equivalence between array names and array addresses wasn't necessarily a good idea, given the problems it seems to cause. (Please, no "UNIX wasn't intended for novices" flames; it's going to be used by novices whether you like it or not.) I certainly don't make use of it unless forced to do so by "lint"'s complaining about

    int foo[10];
    int *bar;
    bar = &foo;

I think that the current rules mix two concepts (pointers and arrays) in a way that is often unclear and occasionally dangerous. I would have preferred it if "foo" had stood for the array, and "&foo" had stood for the address of the array (the '&' would remind the reader of the code that it was indeed an address without relying on their contextual knowledge of C to fill that detail in) and, using the statements in the previous example, you could refer to "foo[5]" either as "foo[5]" or "(*bar)[5]" (or, as I usually do in such circumstances, *(bar + 5)).

This would have no effect in what computations could be expressed, nor would it have any effect on how those computations were implemented (i.e., you wouldn't get further removed from the machine), it would just be syntactic sugar. As such, it probably wouldn't have done any harm but also isn't worth doing at this point given that it would change the rules in the middle of the game.

Guy Harris
{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy
chris@umcp-cs.UUCP (08/13/83)
Well, you can always write

    struct array {
        int x[10];
    };
    main () {
        struct array a1, a2;
        foo (&a1);   /* call foo with address of array a1 */
        bar (a2);    /* call bar with array a2 */
    }

A bit of a kludge, but it works in the current C compilers....

- Chris
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci
UUCP: {seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet: chris@umcp-cs
ARPA: chris.umcp-cs@UDel-Relay
mark@cbosgd.UUCP (08/14/83)
Having arrays passed by value instead of by reference would indeed clean up the language. However, let's take a look at some of the consequences of this move:

(1) Almost all existing programs would have to be rewritten, since the change would not be upward compatible.
(2) If all arrays are passed by value, then character strings must be passed by value as well (after all, they are arrays of characters). This means
    (a) programs will be slower, so that the copy can be made,
    (b) programs will be bigger, for a place to copy to,
    (c) subroutines that modify a character string in place (like strcpy) will stop working.
(3) The above comments also apply to big arrays of things. In a language such as Pascal, arrays being passed by value are one of the worst causes of very slow programs (especially when the author just didn't realize the array was going to be copied).
(4) The whole paradigm that arrays and pointers are interchangeable would have to be redone, no doubt losing upward compatibility.

Remember that C is not a general purpose high level applications programming language, it's a systems implementation language. This means it's closer to the machine, and you are supposed to be aware of what's going on in the underlying machine. The fact that so many people are using C for applications shows not that C is well suited for applications, but that it's usable for applications and nothing else is supported as well on UNIX.

Mark Horton
guy@rlgvax.UUCP (Guy Harris) (08/15/83)
I agree that even though passing arrays by value would be cleaner, it wouldn't be desirable (interesting note about Pascal programmers). I don't know if DMR or any of the other founding fathers of C read this newsgroup, but what is the general consensus about structure assignment and passing structures by value? I know it's a major mess to handle them inside PCC, and some versions of PCC generate very nasty code to handle them (the push/copy of structures isn't done by a loop - the loop is unrolled at compile time). I rarely use them, and plan not to use them in the future given that our MIT-based 68000 PCC generates the aforementioned nasty code. (The fix is fairly straightforward, I just haven't felt motivated to put it in.)

Guy Harris
{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy
grahamr@bronze.UUCP (Graham Ross) (08/15/83)
What are arrays just like? In Pascal, arrays are just like records except all the elements are the same type. In C, arrays are just like pointers except they reserve storage and they don't work as lvalues. This is a part of the "pointer rift" separating Pascal and its relatives from C.

An array cannot be passed at all in C, without putting it in a struct. When we do:

    f(a);

we are passing a pointer to the 0th element of a (C Ref Man Sec 10.1). It is not a pointer to the array! f sees it the same as this:

    p = a;
    f(p);

It makes no C sense to pass a pointer to the base element of a struct, since the elements are of varying size.

Graham Ross
teklabs!tekmdp!grahamr
alan@allegra.UUCP (08/15/83)
I think my ideas about arrays being passed by value were misunderstood, at least by one person. A follow-up article made these points:

    (1) Almost all existing programs would have to be rewritten, since
        the change would not be upward compatible.
    (2) If all arrays are passed by value, then character strings must
        be passed by value as well (after all, they are arrays of
        characters). This means
        (a) programs will be slower, so that the copy can be made,
        (b) programs will be bigger, for a place to copy to,
        (c) subroutines that modify a character string in place
            (like strcpy) will stop working.

Yes, this fix to the language would break a lot of programs, but if the change were made, and the broken programs fixed, they would not be any less efficient than they were before, and routines like strcpy would still work just fine.

    len = strlen(s);

would just be replaced with

    len = strlen(&s);

I'm certainly not saying arrays should always be passed by value. People pass pointers to structures all the time, even though they could pass the structure by value, if they chose. I would just like the same choice with arrays. It would make the language simpler, clearer, and more consistent.

Alan Driscoll
Bell Labs, Murray Hill
mjl@ritcv.UUCP (Mike Lutz) (08/15/83)
Personally I like the idea of array names being equivalent to the address of the first item in the array. Philosophically, an array name is simply a pointer constant, just like 87 is an integer constant. One of the biggest wins for this interpretation is that C can handle strings conveniently (I won't stretch the point to say "elegantly"). Since this is something of a religious issue, I will try to be tolerant of the heathens and infidels who don't believe in the tenets of the One True Language.

I do have a complaint about the interpretation of structure names, in that these should be identical to array names. That is, if "foo" is a structure, then "foo" by itself should be a pointer to the structure. In this instance, my complaint is based on the desirability of consistency rather than the emotional issues of what an array/structure name should denote. I would like to treat arrays and structures identically.

By the way, I think the Whitesmith's compiler does (or previous versions of it did) interpret structure names as pointers.

Mike Lutz
{allegra,seismo}!rochester!ritcv!mjl
andrew@orca.UUCP (Andrew Klossner) (08/16/83)
Here's a reactionary note: I don't think that passing and returning structures is a good idea, and it should be backed out of the language. It conflicts with the rest of C in that it can be a very "expensive" operation, while the other C operations are "frugal".

However, I recognize that structure return values are vital in producing YACC parsers when multiple attributes are in use. Perhaps this was the motivating force for this language enhancement?

-- Andrew Klossner (decvax!tektronix!tekecs!andrew) [UUCP]
(andrew.tektronix@rand-relay) [ARPA]
ken@turtleva.UUCP (Ken Turkowski) (08/17/83)
Now, wait a minute, Andrew Klossner, there are some reasons for passing structures, and they are very natural. Suppose your numbers were rational numbers, that is, each number is composed of a numerator and denominator. Then it makes sense to pass the structure { int num, den; } to processing routines. Similarly with complex numbers { double re, im; }, two dimensional coordinates { int x, y; }, polar coordinates { double r, theta; }.

Ken Turkowski
CADLINC, Palo Alto
{decwrl,amd70}!turtlevax!ken
stuart@rochester.UUCP (Stuart Friedberg) (08/17/83)
I recently ran afoul of the difference in treatment of structures and arrays. I had an array which was passed to an image processing program to be filled with the locations of all the boundary points of an image. The call was BoundaryPoints(Image, Bounds). (Image and Bounds were both arrays.) I then decided to have BoundaryPoints calculate a few related values like centroid and "mass" of the interesting part of the image. I changed Bounds to be a struct including the earlier array and adding variables for the new info. I updated all the references to Bounds, but left the call alone.

Surprise, surprise! It all compiled, but previously working code crashed. Well, it rapidly became clear that the structure was being passed by value, not reference, but I did think that it was an unnecessary "gotcha". Shades of the uniform reference problem!!!

Stu Friedberg
ka@spanky.UUCP (08/17/83)
The PDP-11 C compiler accepts the construct &array to mean the address of the array (as opposed to the address of the first element of the array). The problem with using this feature, aside from the obvious one that it is nonportable, is that it doesn't allow variable length arrays to be handled without using lots of type casts. If you have a pointer to an array you must know the size of the array at compile time. If you refer to an array using the address of its first element, you can figure out the size at run time. Kenneth Almquist
arnold@gatech.UUCP (08/17/83)
I think the way to change C to copy arrays by value is to add a
new operator to the language. I suggest using the ` (back quote). This is
one of 3 characters which are not used in C at all (others are $ and @).
It would work something like this:
    char x[10],y[10];   /* declaration of two arrays */
    `x = `y;   /* copy array y to array x, equivalent to strcpy(x,y) */
    foo(`y);   /* call procedure foo() with the entire array y sent by value */
A procedure receiving an array passed by value would declare it
as follows:
    foo(array)
    char `array[];   /* the [] says 'array', the ` says it is passed by value */
    {
        /* code */
    }
So far, this would allow array assignment, without changing the meaning
of 'array name as pointer to first element'. It is also upward compatible,
and would not break any existing programs. The use of an explicit operator makes
it 100% clear that array operations are going on.
The only problem with this is assignment of character strings; one
would have to do something ugly like:
    `x = ` "string constant";
however, the language could probably special-case it to make the ` before
the constant unnecessary, the same way that it special-cases initialization
of character arrays to allow using "string" instead of the more verbose
{ 's', 't', 'r', 'i', 'n', 'g', '\0' }. As with initialization, where the
braces form is allowed, the ` in front of a string constant would also be
valid (i.e. no error messages), just not necessary.
An array concatenation operator, probably $, might also be useful.
For example:
    #define DIM /* whatever */
    char x[DIM], y[DIM], z[DIM];
    x = y $ z;   /* same as sprintf(x, "%s%s", y, z) */
    x $= y;      /* the more common strcat(x,y); */
Both the ` and the $ should work for arrays of any type, not
just character arrays, but character arrays would be where they were the
most useful. They should not work for arrays of mixed type.
New operators would make it easy for lint to catch mistakes as
well, for instance strcat(x, `y) would be a type error, an array passed
instead of a pointer.
C could use some extensions for dealing with arrays, we just have
to be *very* careful about how well they fit in with the rest of the language.
--
Arnold Robbins
Arnold @ GATech (CS Net) Arnold.GaTech @ UDel-Relay (ARPA)
...!{sb1, allegra}!gatech!arnold (uucp)
...!decvax!cornell!allegra!gatech!arnold
rgh@inmet.UUCP (08/20/83)
inmet!rgh Aug 19 08:46:00 1983
The basic problem with passing arrays as values in C is that in
general you don't know how big the array is. Consider:
    int a[20];        /* here you know */
    extern int b[];   /* no telling */
    f(c)
    int c[];          /* can't tell */
    {
    }
Pascal treats the length of an array as part of its type. However,
this is everybody's least favorite feature of the language, since it
makes general array-handling subroutines impossible to write.
The length of a structure is determinable from its declaration, so this
problem doesn't arise for structures.
Randy Hudson
{harpo,ima}!inmet!rgh
ken@turtleva.UUCP (Ken Turkowski) (08/22/83)
Is there any logical reason at all why the C compiler will not accept constructs like &array, especially in subroutine calls, or even in assignments? I believe that version 6 allowed either. The &array construct is closer to what actually happens. Ken Turkowski CADLINC, Palo Alto {decwrl,amd70}!turtlevax!ken