timg@jpl-devvax.JPL.NASA.GOV (Tim Graham) (08/18/88)
Hello again everyone, From a couple of the responses which I have received (thanks, by the way, to those who took the trouble to respond), it has become fairly clear that I didn't express myself completely clearly in my original posting. I am not under the impression that C multi-dimensional arrays are non-rectangular; it is precisely because of their rectangularity that it appears so easy to implement initialization in a manner analogous to 1-D static array (without size declarator) initialization. For example, if we have the declaration int foo[][4] = { { 1, 2, 3, 4 }, { 5, 8 }, { 9, 10, 12 } }; then the unitialized elements in the 3x4 array are filled with zeros. Is it really that much harder for it to be possible to implement a declaration like int foo[][] = { { 1, 2, 3, 4 }, { 5, 8 }, { 9, 10, 12 } }; and have it do exactly the same thing without the bother of me having to know in advance what the largest number of elements to appear in any row is going to be? It seems as though it would be possible for a compiler to calculate how large each of the dimensions would need to be and then just allocate the space. There's not even any memory wasted since, whether I have to declare in advance how large each dimension is or not, the resultant array is the same size. So, what I meant to ask in my original posting was why are things not implemented in this way? Thanks again to those who responded to my query, and thanks in advance to those who will respond. Tim Graham ================================================================================ "Freedom without order and order without freedom are equally destructive" - Teddy Roosevelt Jet Propulsion Laboratory/California Institute of Technology 4800 Oak Grove Drive Pasadena, CA. 91109 MS: 301-260A (818) 354-1448 UUCP: ...cit-vax!elroy!jpl-devvax!timg ARPA: elroy!timg@jpl-devvax ================================================================================
chris@mimsy.UUCP (Chris Torek) (08/18/88)
In article <2682@jpl-devvax.JPL.NASA.GOV> timg@jpl-devvax.JPL.NASA.GOV (Tim Graham) writes: >For example ... > int foo[][4] = { { 1, 2, 3, 4 }, > { 5, 8 }, > { 9, 10, 12 } }; >... the uninitialized elements in the 3x4 array are filled with zeros. Is >it really that much harder for it to be possible to implement a declaration >like > > int foo[][] = { { 1, 2, 3, 4 }, > { 5, 8 }, > { 9, 10, 12 } }; > >and have it do exactly the same thing without the bother of me having to >know in advance what the largest number of elements to appear in any row >is going to be? Yes. Consider the situation from the compiler's point of view, and in particular, the case where the first row is not the longest: int foo[][?] = { { 1 }, { 2, 3, 4, 5 }, ... }; If the `?' is filled in with (e.g.) 4, the compiler can generate assembly or object code as: .dataseg foo_: .int 1 .int 0 .int 0 .int 0 .int 2 .int 3 .int 4 .int 5 If, on the other hand, no value is supplied for `?', the compiler must defer emission of any of these constants until it has deduced the appropriate `?'-value. There is a way to do this, if the assembler or object format is powerful enough, viz: .dataseg foo_: .int 1 .space S0 # (assuming this zero-fills) .int 2 .int 3 .int 4 .int 5 .space S1 # etc .set S0,6 # row 0 needs 3 ints, 2 bytes each .set S1,0 # row 1 needs 0 ints # etc >So, what I meant to ask in my original posting was why are things not >implemented in this way? Basically, it is to make one-pass compilers easier to write. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
tainter@ihlpb.ATT.COM (Tainter) (08/23/88)
In article <13060@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes: > In article <2682@jpl-devvax.JPL.NASA.GOV> timg@jpl-devvax.JPL.NASA.GOV >(Tim Graham) writes: >>For example ... >> int foo[][4] = { { 1, 2, 3, 4 }, { 5, 8 }, { 9, 10, 12 } }; >>... the uninitialized elements in the 3x4 array are filled with zeros. Is >>it really that much harder for it to be possible to implement >> int foo[][] = { { 1, 2, 3, 4 }, { 5, 8 }, { 9, 10, 12 } }; > Yes. Consider the situation from the compiler's point of view, > and in particular, the case where the first row is not the longest: > In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) > Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris It isn't just an issue of the initializer. Consider code referencing the above entity foo. As long as only one dimension is unknown C can generate code to manipulate the structure without requiring patching at link time. Note: Defering the resolution of this type of symbol would probably complicate using an assembler as a back end for the c ompiler. --j.a.tainter ihlpb!tainter
gregg@ihlpb.ATT.COM (Wonderly) (08/23/88)
From article <8584@ihlpb.ATT.COM>, by tainter@ihlpb.ATT.COM (Tainter): > In article <13060@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes: >> In article <2682@jpl-devvax.JPL.NASA.GOV> timg@jpl-devvax.JPL.NASA.GOV >>(Tim Graham) writes: >>>For example ... >>> int foo[][4] = { { 1, 2, 3, 4 }, { 5, 8 }, { 9, 10, 12 } }; >>>... the uninitialized elements in the 3x4 array are filled with zeros. Is >>>it really that much harder for it to be possible to implement >>> int foo[][] = { { 1, 2, 3, 4 }, { 5, 8 }, { 9, 10, 12 } }; > >> Yes. Consider the situation from the compiler's point of view, >> and in particular, the case where the first row is not the longest: > > It isn't just an issue of the initializer. Consider code referencing the > above entity foo. As long as only one dimension is unknown C can generate > code to manipulate the structure without requiring patching at link time. This looks to be a simple ilef vector. e.g. one might construct the equivalent by writing the following C code. int **foo; foo = malloc (3*sizeof(int*)); foo[0] = malloc (4*sizeof(int)); foo[0][0] = 1; foo[0][1] = 2; foo[0][2] = 3; foo[0][3] = 4; foo[1] = malloc (2*sizeof(int)); foo[1][0] = 5; foo[1][1] = 8; etc. or in assembler (as the compiler sees it) int foo[][] Compiler says OHHHH there are missing dimensions, I can't possibly know how to do the multiplication, so lets use pointers instead. foo equ $ .addr 1$ .addr 2$ .addr 3$ 1$ equ $ .long 1 .long 2 .long 3 .long 4 2$ equ % .long 5 .long 8 Now there is some (possibly huge) amount of context that needs to be saved in order to count the rows. Alternatively the assembler could support psect's by name that would allow the compiler to just call it as it sees it foo psect foo .addr 1$ psect foo. 1$ equ $ .long 1 .long 2 .long 3 .long 4 psect foo .addr 2$ psect foo. 2$ equ $ .long 5 .long 8 or some slighty more coherent syntax. This would save a lot of time in building sparse matrices. When referencing such a beast, the same assumption as was made above should be made; i.e. if there is more than one missing dimension, each of those is accessed through a pointer from the previous dimension. Gregg Wonderly AT&T Bell Laboratories DOMAIN: gregg@ihlpb.att.com IH2D217 - (312) 979-2794 UUCP: att!ihlpb!gregg -- Gregg Wonderly AT&T Bell Laboratories DOMAIN: gregg@ihlpb.att.com IH2D217 - (312) 979-2794 UUCP: ihnp4!ihlpb!gregg
chris@mimsy.UUCP (Chris Torek) (08/23/88)
[questions and answers about when array sizes can be elided deleted] In article <8584@ihlpb.ATT.COM> tainter@ihlpb.ATT.COM (Tainter) writes: >It isn't just an issue of the initializer. Consider code referencing the >above entity foo. As long as only one dimension is unknown C can generate >code to manipulate the structure without requiring patching at link time. The question was about initialised arrays---specifically, about the cases where the compiler is able to deduce the dimensions. Hence *none* of the dimensions are unknown; they are merely left up to the compiler to count. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
tainter@ihlpb.ATT.COM (Tainter) (08/24/88)
In article <13149@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes: >[questions and answers about when array sizes can be elided deleted] >In article <8584@ihlpb.ATT.COM> tainter@ihlpb.ATT.COM (Tainter) writes: >>It isn't just an issue of the initializer. Consider code referencing the >>above entity foo. As long as only one dimension is unknown C can generate >>code to manipulate the structure without requiring patching at link time. >The question was about initialised arrays---specifically, about the >cases where the compiler is able to deduce the dimensions. Hence >*none* of the dimensions are unknown; they are merely left up to the >compiler to count. Which is all well and good for the module allocating the object and initializing it, but what about other modules declaring it extern? Those are the ones that end up needing the patching at link time. Thus for security in the code you should explicitly declare the dimension in both places (extern declaration and the definition) using a macro, included into both files. Therefore, to encourage this, the language should, and does require all lower dimensions. Or better yet use a typedef! Of additional interest, what does one do with function parameters if the lesser dimensions are not explicitly declared? Again one wants to encourage use of a common dimension definition. >In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) --j.a.tainter ihlpb!tainter
chris@mimsy.UUCP (Chris Torek) (08/26/88)
>In article <13149@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes: >>The question was about initialised arrays---specifically, about the >>cases where the compiler is able to deduce the dimensions. Hence >>*none* of the dimensions are unknown; they are merely left up to the >>compiler to count. [Note: this is not legal C, although there is, in my opinion, no really compelling reason for that status. Read on:] In article <8596@ihlpb.ATT.COM> tainter@ihlpb.ATT.COM (Tainter) writes: >Which is all well and good for the module allocating the object and >initializing it, but what about other modules declaring it extern? >Those are the ones that end up needing the patching at link time. Those are illegal. >Thus for security in the code you should explicitly declare the dimension >in [all] places ... [using some form of common definition]. Therefore, >to encourage this, the language should, and does require all lower dimensions. But if all the code that manipulates the array is in one module, why not allow the compiler to do the counting? If we decide that getting the proper dimensions out of one module and into another is too hard to assign to the compiler, yes, that leaves the language inconsistent in a way, but is it not now inconsistent in a similar way in that it allows you to elide some dimension, but not any number of dimensions? >Of additional interest, what does one do with function parameters if the >lesser dimensions are not explicitly declared? One generates an error, and refuses to compile the code. % cat foo.c f(a) int a[][]; {} % cc foo.c "foo.c", line 1: null dimension % -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
ok@quintus.uucp (Richard A. O'Keefe) (08/26/88)
Various people have been arguing back and forth about declarations like
int a[][] = {{1}, {2,3,4}, {5,6}};
suggesting that the compiler should figure out the bounds or giving
reasons why it doesn't.
I'd just like to point out that there are lots of things I'd like in an
initialisation sublanguage which aren't there (in particular, calls to
functions like sin() are not allowed, oh the pain), but that that
restriction needn't get in the way. There is no law that says every
character in the input to the C compiler had to be written by hand.
It's quite straightforward to put together a little mini-language
(using the C pre-processor) where you can write e.g.
DEF_TWO_D("int", "a")
ROW IVAL(1) END_ROW
ROW IVAL(2) IVAL(3) IVAL(4) END_ROW
ROW IVAL(5) IVAL(6) END_ROW
END_TWO_D
and compile it with a C compiler, then run it to generate a data.c
and data.h file. Hint: the heading turns into something like
{ static int INI_rmax = 0, INI_rcur = 0;
static int INI_cmax = 0, INI_ccur = 0;
if (pass == 2) {
fprintf(C_file, "%s %s[%d][%d] = {\n",
"int", "a", INI_rmax, INI_cmax);
fprintf(H_file, "extern %s %s[%d][%d];\n",
"int", "a", INI_rmax, INI_cmax);
}
Using a mini-language like this lets you compute arbitrary numeric
initial values, and makes absolutely sure that your .c and .h file
are in agreement. You can even do things like
ROW for (i = 0; i < N; i++) IVAL(f(i)) END_ROW