[comp.lang.c] Multidimensional Static Array Initialization Follow-up

timg@jpl-devvax.JPL.NASA.GOV (Tim Graham) (08/18/88)

Hello again everyone,

From a couple of the responses which I have received (thanks, by the way, to
those who took the trouble to respond), it has become fairly clear that I 
didn't express myself completely clearly in my original posting.

I am not under the impression that C multi-dimensional arrays are
non-rectangular; it is precisely because of their rectangularity
that it appears so easy to implement initialization in a manner
analogous to 1-D static array (without size declarator) initialization.

For example, if we have the declaration

	int foo[][4] = { { 1, 2,  3,  4  },
		 	 { 5, 8          },
		 	 { 9, 10, 12     } };

then the unitialized elements in the 3x4 array are filled with zeros.  Is
it really that much harder for it to be possible to implement a declaration
like
	
	int foo[][] = { { 1, 2,  3,  4  },
		 	 { 5, 8          },
		 	 { 9, 10, 12     } };

and have it do exactly the same thing without the bother of me having to
know in advance what the largest number of elements to appear in any row
is going to be?  It seems as though it would be possible for a compiler
to calculate how large each of the dimensions would need to be and then
just allocate the space.  There's not even any memory wasted since, whether
I have to declare in advance how large each dimension is or not, the resultant
array is the same size.

So, what I meant to ask in my original posting was why are things not
implemented in this way?

Thanks again to those who responded to my query, and thanks in advance to
those who will respond.


Tim Graham

================================================================================
"Freedom without order and order without freedom are equally destructive"
							       - Teddy Roosevelt
Jet Propulsion Laboratory/California Institute of Technology
4800 Oak Grove Drive  Pasadena, CA. 91109  MS: 301-260A           (818) 354-1448
UUCP: ...cit-vax!elroy!jpl-devvax!timg               ARPA: elroy!timg@jpl-devvax
================================================================================

chris@mimsy.UUCP (Chris Torek) (08/18/88)

In article <2682@jpl-devvax.JPL.NASA.GOV> timg@jpl-devvax.JPL.NASA.GOV
(Tim Graham) writes:
>For example ...
>	int foo[][4] = { { 1, 2,  3,  4  },
>		 	 { 5, 8          },
>		 	 { 9, 10, 12     } };
>... the uninitialized elements in the 3x4 array are filled with zeros.  Is
>it really that much harder for it to be possible to implement a declaration
>like
>	
>	int foo[][] = { { 1, 2,  3,  4  },
>		 	{ 5, 8          },
>		 	{ 9, 10, 12     } };
>
>and have it do exactly the same thing without the bother of me having to
>know in advance what the largest number of elements to appear in any row
>is going to be?

Yes.  Consider the situation from the compiler's point of view,
and in particular, the case where the first row is not the longest:

	int foo[][?] = { { 1	      },
			 { 2, 3, 4, 5 },
			 ... };

If the `?' is filled in with (e.g.) 4, the compiler can generate assembly
or object code as:

		.dataseg
	foo_:
		.int	1
		.int	0
		.int	0
		.int	0
		.int	2
		.int	3
		.int	4
		.int	5

If, on the other hand, no value is supplied for `?', the compiler
must defer emission of any of these constants until it has deduced
the appropriate `?'-value.

There is a way to do this, if the assembler or object format is
powerful enough, viz:

		.dataseg
	foo_:
		.int	1
		.space	S0	# (assuming this zero-fills)
		.int	2
		.int	3
		.int	4
		.int	5
		.space	S1
		# etc
		.set	S0,6	# row 0 needs 3 ints, 2 bytes each
		.set	S1,0	# row 1 needs 0 ints
		# etc

>So, what I meant to ask in my original posting was why are things not
>implemented in this way?

Basically, it is to make one-pass compilers easier to write.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

tainter@ihlpb.ATT.COM (Tainter) (08/23/88)

In article <13060@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> In article <2682@jpl-devvax.JPL.NASA.GOV> timg@jpl-devvax.JPL.NASA.GOV
>(Tim Graham) writes:
>>For example ...
>>  int foo[][4] = { { 1, 2,  3,  4  }, { 5, 8          }, { 9, 10, 12     } };
>>... the uninitialized elements in the 3x4 array are filled with zeros.  Is
>>it really that much harder for it to be possible to implement
>>  int foo[][] = { { 1, 2,  3,  4  }, { 5, 8          }, { 9, 10, 12     } };

> Yes.  Consider the situation from the compiler's point of view,
> and in particular, the case where the first row is not the longest:

> In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
> Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris


It isn't just an issue of the initializer.  Consider code referencing the
above entity foo.  As long as only one dimension is unknown C can generate
code to manipulate the structure without requiring patching at link time.
Note:  Defering the resolution of this type of symbol would probably
complicate using an assembler as a back end for the c ompiler.

--j.a.tainter
ihlpb!tainter

gregg@ihlpb.ATT.COM (Wonderly) (08/23/88)

From article <8584@ihlpb.ATT.COM>, by tainter@ihlpb.ATT.COM (Tainter):
> In article <13060@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
>> In article <2682@jpl-devvax.JPL.NASA.GOV> timg@jpl-devvax.JPL.NASA.GOV
>>(Tim Graham) writes:
>>>For example ...
>>>  int foo[][4] = { { 1, 2,  3,  4  }, { 5, 8          }, { 9, 10, 12     } };
>>>... the uninitialized elements in the 3x4 array are filled with zeros.  Is
>>>it really that much harder for it to be possible to implement
>>>  int foo[][] = { { 1, 2,  3,  4  }, { 5, 8          }, { 9, 10, 12     } };
> 
>> Yes.  Consider the situation from the compiler's point of view,
>> and in particular, the case where the first row is not the longest:
> 
> It isn't just an issue of the initializer.  Consider code referencing the
> above entity foo.  As long as only one dimension is unknown C can generate
> code to manipulate the structure without requiring patching at link time.

This looks to be a simple ilef vector.  e.g.  one might construct the
equivalent by writing the following C code.

int **foo;

foo = malloc (3*sizeof(int*));
foo[0] = malloc (4*sizeof(int));
foo[0][0] = 1;
foo[0][1] = 2;
foo[0][2] = 3;
foo[0][3] = 4;
foo[1] = malloc (2*sizeof(int));
foo[1][0] = 5;
foo[1][1] = 8;
etc.

or in assembler (as the compiler sees it)

int foo[][]     Compiler says OHHHH there are missing dimensions, I
can't possibly know how to do the multiplication, so lets use pointers
instead.

foo	equ	$
	.addr	1$
	.addr	2$
	.addr	3$
1$	equ	$
	.long	1
	.long	2
	.long	3
	.long	4
2$	equ	%
	.long	5
	.long	8

Now there is some (possibly huge) amount of context that needs to be saved
in order to count the rows.  Alternatively the assembler could support
psect's by name that would allow the compiler to just call it as it sees it

foo	psect	foo
	.addr	1$
	psect	foo.
1$	equ	$
	.long	1
	.long	2
	.long	3
	.long	4
	psect	foo
	.addr	2$
	psect	foo.
2$	equ	$
	.long	5
	.long	8

or some slighty more coherent syntax.  This would save a lot of time in
building sparse matrices.  When referencing such a beast, the same
assumption as was made above should be made;  i.e. if there is more than
one missing dimension, each of those is accessed through a pointer from
the previous dimension.

Gregg Wonderly
AT&T Bell Laboratories                   DOMAIN: gregg@ihlpb.att.com
IH2D217 - (312) 979-2794                 UUCP:   att!ihlpb!gregg
-- 
Gregg Wonderly
AT&T Bell Laboratories                   DOMAIN: gregg@ihlpb.att.com
IH2D217 - (312) 979-2794                 UUCP:   ihnp4!ihlpb!gregg

chris@mimsy.UUCP (Chris Torek) (08/23/88)

[questions and answers about when array sizes can be elided deleted]

In article <8584@ihlpb.ATT.COM> tainter@ihlpb.ATT.COM (Tainter) writes:
>It isn't just an issue of the initializer.  Consider code referencing the
>above entity foo.  As long as only one dimension is unknown C can generate
>code to manipulate the structure without requiring patching at link time.

The question was about initialised arrays---specifically, about the
cases where the compiler is able to deduce the dimensions.  Hence
*none* of the dimensions are unknown; they are merely left up to the
compiler to count.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

tainter@ihlpb.ATT.COM (Tainter) (08/24/88)

In article <13149@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>[questions and answers about when array sizes can be elided deleted]

>In article <8584@ihlpb.ATT.COM> tainter@ihlpb.ATT.COM (Tainter) writes:
>>It isn't just an issue of the initializer.  Consider code referencing the
>>above entity foo.  As long as only one dimension is unknown C can generate
>>code to manipulate the structure without requiring patching at link time.

>The question was about initialised arrays---specifically, about the
>cases where the compiler is able to deduce the dimensions.  Hence
>*none* of the dimensions are unknown; they are merely left up to the
>compiler to count.

Which is all well and good for the module allocating the object and
initializing it, but what about other modules declaring it extern?
Those are the ones that end up needing the patching at link time.

Thus for security in the code you should explicitly declare the dimension
in both places (extern declaration and the definition) using a macro,
included into both files.  Therefore, to encourage this, the language should,
and does require all lower dimensions.  Or better yet use a typedef!

Of additional interest, what does one do with function parameters if the
lesser dimensions are not explicitly declared?  Again one wants to encourage
use of a common dimension definition.

>In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)

--j.a.tainter
ihlpb!tainter

chris@mimsy.UUCP (Chris Torek) (08/26/88)

>In article <13149@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>The question was about initialised arrays---specifically, about the
>>cases where the compiler is able to deduce the dimensions.  Hence
>>*none* of the dimensions are unknown; they are merely left up to the
>>compiler to count.

[Note: this is not legal C, although there is, in my opinion, no
really compelling reason for that status.  Read on:]

In article <8596@ihlpb.ATT.COM> tainter@ihlpb.ATT.COM (Tainter) writes:
>Which is all well and good for the module allocating the object and
>initializing it, but what about other modules declaring it extern?
>Those are the ones that end up needing the patching at link time.

Those are illegal.

>Thus for security in the code you should explicitly declare the dimension
>in [all] places ... [using some form of common definition].  Therefore,
>to encourage this, the language should, and does require all lower dimensions.

But if all the code that manipulates the array is in one module, why
not allow the compiler to do the counting?  If we decide that getting
the proper dimensions out of one module and into another is too hard to
assign to the compiler, yes, that leaves the language inconsistent in a
way, but is it not now inconsistent in a similar way in that it allows
you to elide some dimension, but not any number of dimensions?

>Of additional interest, what does one do with function parameters if the
>lesser dimensions are not explicitly declared?

One generates an error, and refuses to compile the code.

	% cat foo.c
	f(a) int a[][]; {}
	% cc foo.c
	"foo.c", line 1: null dimension
	%
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

ok@quintus.uucp (Richard A. O'Keefe) (08/26/88)

Various people have been arguing back and forth about declarations like
    int a[][] = {{1}, {2,3,4}, {5,6}};
suggesting that the compiler should figure out the bounds or giving
reasons why it doesn't.

I'd just like to point out that there are lots of things I'd like in an
initialisation sublanguage which aren't there (in particular, calls to
functions like sin() are not allowed, oh the pain), but that that
restriction needn't get in the way.  There is no law that says every
character in the input to the C compiler had to be written by hand.

It's quite straightforward to put together a little mini-language
(using the C pre-processor) where you can write e.g.
	DEF_TWO_D("int", "a")
	    ROW IVAL(1) END_ROW
	    ROW IVAL(2) IVAL(3) IVAL(4) END_ROW
	    ROW IVAL(5) IVAL(6) END_ROW
	END_TWO_D
and compile it with a C compiler, then run it to generate a data.c
and data.h file.  Hint: the heading turns into something like
	{ static int INI_rmax = 0, INI_rcur = 0;
	  static int INI_cmax = 0, INI_ccur = 0;
	  if (pass == 2) {
	    fprintf(C_file, "%s %s[%d][%d] = {\n",
		"int", "a", INI_rmax, INI_cmax);
	    fprintf(H_file, "extern %s %s[%d][%d];\n",
		"int", "a", INI_rmax, INI_cmax);
	   }
Using a mini-language like this lets you compute arbitrary numeric
initial values, and makes absolutely sure that your .c and .h file
are in agreement.  You can even do things like
	ROW for (i = 0; i < N; i++) IVAL(f(i)) END_ROW