[comp.lang.c] vector initialization

chad@lakesys.UUCP (Chad Gibbons) (06/06/89)

...at the risk of being burned at the stake;

	After scanning a few books, I've been unable to find an answer 
either way to my question, so I must resort to the help of the net.
Question: is it possible to initialize a vector?

	I tried using the same syntax as an array of pointers, i.e.
		char *foo[] = { "one", "two", "three", NULL };
but this would not compile.  Since I am initializing it, it would seem an
array of pointers would suffice.  However, in the context of something
along these lines (from a command list vector):

typedef struct _com {
    char **words;
    int (*fcn)();
    short flags;
} COM;

...and then initializing the structure in another module by:

COM foo[] = {
    { "one", "two", "three", NULL }, do_num, 0,
    { "exit", "quit", NULL }, quit, 0
};

That would be nice, but something tells me it isn't C, or anything else
like it.  I may resort to allowing only a single word per structure, but
it would be more advantageous to use a list associated with a single
structure entry.

	Any way to achieve this?  Perhaps the actual structure definition
is the problem here; I'm not sure.
-- 
D. Chadwick Gibbons, chad@lakesys.lakesys.com, ...!uunet!marque!lakesys!chad

chris@mimsy.UUCP (Chris Torek) (06/07/89)

In article <687@lakesys.UUCP> chad@lakesys.UUCP (Chad Gibbons) writes:
>Question: is it possible to initialize a vector?
[where `vector' means `object of type pointer to pointer to char']

Yes.  The only legal data type that cannot be initialised is a union
(and even that can have an initialiser, albeit restricted, in pANS C).

>	I tried using the same syntax as an array of pointers, i.e.
>		char *foo[] = { "one", "two", "three", NULL };
>but this would not compile.

The definition above for `foo' is legal: `foo' is an object of type
`array ? of pointer to char' (array 4, after the initialisation is
complete) and the initialiser is of type `array 4 of pointer to char'.
The way the aggregate---everything between the left and right braces
---acquires this type is somewhat convoluted%: it picks up the `array'
part from the variable being initialised, gets the `4' from the actual
number of elements, and gets the `pointer to char' part from both the
variable and the values themselves.  The type of a double-quoted string
is `array N of char', where N is the length of the string including the
final \0 character; this reverts to an object of type `pointer to char'
in this initialiser context, so the types of the individual array
elements match.  After solidifying foo[] as an `array 4', the types
of both sides of the equal sign match and the whole thing is declared
sane (even if you are no longer, after reading this paragraph).

-----
% Some might say `Byzantine' or `Baroque', but I prefer `Rococo' :-)
-----
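
A minimal sketch of the sizing rule described above, assuming a hosted
C compiler.  The macro FOO_COUNT and the helper count_words() are
hypothetical names added for illustration; only the definition of `foo'
comes from the post.

```c
#include <assert.h>
#include <stddef.h>

/* Same definition as in the post: the array size comes from the
 * number of initialisers, so foo ends up as `array 4'. */
char *foo[] = { "one", "two", "three", NULL };

/* Element count, computed at compile time (counts the NULL too). */
#define FOO_COUNT (sizeof foo / sizeof foo[0])

/* Hypothetical helper: count the words before the NULL endmarker.
 * The parameter is char ** because an array name decays to a
 * pointer to its first element when passed to a function. */
size_t count_words(char **v)
{
    size_t n = 0;

    assert(v != NULL);
    while (v[n] != NULL)
        n++;
    return n;
}
```

Here FOO_COUNT is 4 while count_words(foo) is 3, and sizeof "three"
is 6 because the string's array type includes the final \0.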

>Since I am initializing it, it would seem an array of pointers would
>suffice.

You need an object of type `pointer to pointer to char'.  An array of
pointer to char would suffice, if you could create one without giving
it a name, because it would decay into a pointer to pointer to char.
But you cannot create one without naming it---the initialisation for
`foo' above is legal only because there is a context allowing the array
(namely the initialiser for `foo').  C does not have `naked arrays',
with the one exception of double-quoted strings---you cannot, in the
midst of some expression, write

	({ "foo", "bar" })[i]			/* illegal */

to select either "foo" or "bar" depending on i==0 or i==1.  (You
*can* write this in GCC, by using an `initialised cast':

	((char *[]){ "foo", "bar" })[i]		/* GCC-specific */

The cast provides the shape for the aggregate.)  Normally, the only
way to construct an aggregate object (structure or array) is as an
initialiser for a named variable, and the variable provides the shape.
The variable (or, in GCC, the cast) must have an aggregate type,
not a simple pointer type.  In essence,

	char **p = { "a", "b", "c" };		/* illegal */

provides the wrong context---the compiler thinks, `Aha, we need a
pointer to pointer to char, and we have a left brace which means an
aggregate ... oops, something is wrong, help me Spock....'  It
cannot go through its type-matching waltz without first being told
`array, array!'%%.

-----
%% or `Ole, ole!'; but this works only in Spanish-speaking C compilers.
-----
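
The GCC `initialised cast' shown above was later adopted by the C99
standard under the name `compound literal', so the sketch below should
compile on any C99 or later compiler, not only GCC.  It is an
illustration under that assumption; the helper pick() and the extra
NULL endmarker are additions, not part of the post.

```c
#include <assert.h>
#include <stddef.h>

/* Illegal, as the post explains: a brace list needs an array or
 * structure shape, and `char **' is neither.
 *
 *     char **p = { "a", "b", "c" };
 */

/* Legal with compound literals: the (char *[]) prefix supplies the
 * array shape, and the unnamed array then decays to char **.  At
 * file scope the unnamed array has static storage duration. */
char **p = (char *[]){ "a", "b", "c", NULL };

/* Hypothetical helper: select "foo" or "bar" from an unnamed array. */
const char *pick(int i)
{
    assert(i == 0 || i == 1);
    return ((char *[]){ "foo", "bar" })[i];
}
```

Inside a function, a compound literal has automatic storage, so pick()
returns one of the string pointers rather than the address of the
temporary array itself.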

Now that I have told you why you cannot do it (but not in 50 words
or less), here is the rest of the story:

>... something along these lines (from a command list vector):
>
>typedef struct _com {
>    char **words;
>    int (*fcn)();
>    short flags;
>} COM;
>
>...and then initializing the structure in another module by:
>
>COM foo[] = {
>    { "one", "two", "three", NULL }, do_num, 0,
>    { "exit", "quit", NULL }, quit, 0
>};

For each of the two COM objects foo[0] and foo[1], you need a constant
value of type `char **'.  So what we *can* do is this:

	char *xxx0[] = { "one", "two", "three", NULL };
	char *xxx1[] = { "exit", "quit", NULL };

	COM foo[] = {
		{ xxx0, do_num, 0 },
		{ xxx1, quit, 0 },
	};

(I have put back the optional braces in the initialiser for foo[],
and added the optional extra comma after foo[1]'s initialiser.)  Each
xxx array is an object of type (char *[]), which degenerates into one
of type (char **), and which is a constant expression after this
degeneration (because it is the address of a global variable).  If
we hated making up names, and did not mind restricting ourselves to
GCC, we could instead use

	COM foo[] = {
		{ (char *[]){ "one", "two", "three", NULL }, do_num, 0 },
		{ (char *[]){ "exit", "quit", NULL }, quit, 0 },
	};

but the former approach is portable, if somewhat ugly.
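
For reference, here is the portable form filled out into a sketch that
compiles on its own.  Several details are assumptions made for
illustration: the handler bodies, the (void) prototypes, the tag name
`com' (the original `_com' starts with an underscore, which is reserved
at file scope), the array names, the use of static, and the helper
find_command().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct com {
    char **words;          /* NULL-terminated list of synonyms */
    int (*fcn)(void);
    short flags;
} COM;

/* Hypothetical handlers; the thread never shows their bodies. */
static int do_num(void) { return 1; }
static int quit(void)   { return 2; }

/* Named, file-local word lists.  Each array name decays to char **,
 * and its address is a constant expression. */
static char *num_words[]  = { "one", "two", "three", NULL };
static char *quit_words[] = { "exit", "quit", NULL };

static COM foo[] = {
    { num_words,  do_num, 0 },
    { quit_words, quit,   0 },
};

/* Hypothetical helper: linear search for the entry that lists `word'.
 * Returns NULL when no entry matches. */
const COM *find_command(const char *word)
{
    size_t i;
    char **w;

    assert(word != NULL);
    for (i = 0; i < sizeof foo / sizeof foo[0]; i++)
        for (w = foo[i].words; *w != NULL; w++)
            if (strcmp(*w, word) == 0)
                return &foo[i];
    return NULL;
}
```

A caller would check the result before dispatching, for example
`c = find_command(input); if (c != NULL) c->fcn();'.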
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

scs@adam.pika.mit.edu (Steve Summit) (06/08/89)

In article <687@lakesys.UUCP> chad@lakesys.UUCP (Chad Gibbons) writes:
>Question: is it possible to initialize a vector?
>...in the context of something
>along these lines (from a command list vector):
>
>typedef struct _com {
>    char **words;
>    int (*fcn)();
>    short flags;
>} COM;
>
>...and then initializing the structure in another module by:
>
>COM foo[] = {
>    { "one", "two", "three", NULL }, do_num, 0,
>    { "exit", "quit", NULL }, quit, 0
>};
>
>That would be nice, but something tells me it isn't C, or anything else
>like it.  I may resort to allowing only a single word per structure, but
>it would be more advantageous to use a list associated with a single
>structure entry.

In article <17940@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:

>You need an object of type `pointer to pointer to char'.  An array of
>pointer to char would suffice, if you could create one without giving
>it a name, because it would decay into a pointer to pointer to char.
>But you cannot create one without naming it...
>For each of the two COM objects foo[0] and foo[1], you need a constant
>value of type `char **'.  So what we *can* do is this:
>
>	char *xxx0[] = { "one", "two", "three", NULL };
>	char *xxx1[] = { "exit", "quit", NULL };
>
>	COM foo[] = {
>		{ xxx0, do_num, 0 },
>		{ xxx1, quit, 0 },
>	};

The other way is to modify the declaration, so that you don't
need a pointer:

	typedef struct {
		char *words[MAXWORDS];
		int (*fcn)();
		short flags;
	} COM;

	COM foo[] = {
		{{ "one", "two", "three", NULL }, do_num, 0,},
		{{ "exit", "quit", NULL }, quit, 0,},
	};

This has the advantage that you don't need to think of names for
the placeholder pointers.  It has the disadvantages that the
number of entries is limited, and space is wasted if there are
many lists with considerably fewer than MAXWORDS entries.

Note that the initialization of this latter form is exactly as
Chad speculated (although I have added extra braces and commas,
as Chris did).  In fact, you can leave out the explicit NULLs as
list endmarkers, because uninitialized char *'s (in this case,
the entries out to MAXWORDS) are guaranteed to be set to null
pointers.  (For God's sake, if you don't understand this, don't
post a followup!)  The code scanning the list would be
well-advised to terminate after either seeing a NULL or
exhausting MAXWORDS words, whichever comes first.  Accessing the
words list is otherwise (syntactically) identical to the full
pointer version.
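
A sketch of the embedded-array form with the double termination test
described above.  The value of MAXWORDS, the handler bodies, the (void)
prototypes, the extra word "four" and the helper has_word() are all
assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXWORDS 4   /* assumed limit; the post does not give a value */

typedef struct {
    char *words[MAXWORDS];   /* unused trailing slots are null pointers */
    int (*fcn)(void);
    short flags;
} COM;

/* Hypothetical handlers. */
static int do_num(void) { return 1; }
static int quit(void)   { return 2; }

static COM foo[] = {
    /* Completely full: there is no room for a NULL endmarker. */
    { { "one", "two", "three", "four" }, do_num, 0 },
    /* Two words given; the remaining two slots are set to null. */
    { { "exit", "quit" }, quit, 0 },
};

/* Hypothetical helper: stop at a null pointer or after MAXWORDS
 * slots, whichever comes first. */
int has_word(const COM *c, const char *word)
{
    size_t i;

    assert(c != NULL && word != NULL);
    for (i = 0; i < MAXWORDS && c->words[i] != NULL; i++)
        if (strcmp(c->words[i], word) == 0)
            return 1;
    return 0;
}
```

The first entry shows why the MAXWORDS bound matters: a scan that
relied on the null pointer alone would run off the end of that array.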

The method using dummy arrays, which Chris illustrated, is
generally preferable, especially if you don't mind making up
names.  I often write preprocessors for application-specific
aggregate initializations of this sort, which generate the
intermediate arrays and dummy names automatically.

                                            Steve Summit
                                            scs@adam.pika.mit.edu

chad@lakesys.UUCP (Chad Gibbons) (06/08/89)

In article <11883@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit)
writes:

|The other way is to modify the declaration, so that you don't
|need a pointer:
|	typedef struct {
|		char *words[MAXWORDS];
|		int (*fcn)();
|		short flags;
|	} COM;
|	COM foo[] = {
|		{{ "one", "two", "three", NULL }, do_num, 0,},
|		{{ "exit", "quit", NULL }, quit, 0,},
|	};
|This has the advantage that you don't need to think of names for
|the placeholder pointers.  It has the disadvantages that the
|number of entries is limited, and space is wasted if there are
|many lists with considerably fewer than MAXWORDS entries.

	Of course this disadvantage (wasted space) takes away from the
reason for using the char ** type to begin with.  A single entry might
have as few as one word and as many as _fifty_--lots and lots of
wasted space there, unfortunately.  It would be quite nice to be able to
specify the list the same way one does when initializing an array of
pointers, but the world doesn't always work the way you want it to.

|The method using dummy arrays, which Chris illustrated, is
|generally preferable, especially if you don't mind making up
|names.

	Yes; the extra names don't make a difference to me--they can be
declared static in a given module, and then you wouldn't have to worry
about someone "forgetting" the name you used and winding up using
it for something else, not that this would matter or anything.

	 Of course, once you find out your compiler only supports 4K of
literal strings, and you need at least 16K worth, this doesn't matter
any more. That is somewhat of a good thing; imagine trying to do a 
linear-type search through a 16K list _all the time_.  If I have to store
them external to the code, at least I can throw them into some type of
structure where searching goes quickly.
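
One possible shape for such a faster lookup, sketched here with the
standard library's bsearch() over a table kept in sorted order.  This
is only an illustration, not what was actually used: the table layout,
the `cmd' index and the names word_entry, word_table and
lookup_command() are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One row per word; `cmd' is a hypothetical index into a command table. */
struct word_entry {
    const char *word;
    int cmd;
};

/* Must stay sorted by `word' in strcmp() order for bsearch() to work. */
static const struct word_entry word_table[] = {
    { "exit",  1 },
    { "one",   0 },
    { "quit",  1 },
    { "three", 0 },
    { "two",   0 },
};

/* bsearch() passes the key first and a table element second. */
static int compare_entry(const void *key, const void *elem)
{
    const struct word_entry *e = elem;

    return strcmp((const char *)key, e->word);
}

/* Returns the command index for `word', or -1 if it is not listed. */
int lookup_command(const char *word)
{
    const struct word_entry *e;

    assert(word != NULL);
    e = bsearch(word, word_table,
                sizeof word_table / sizeof word_table[0],
                sizeof word_table[0], compare_entry);
    return e != NULL ? e->cmd : -1;
}
```

The search cost grows with the logarithm of the table size rather than
linearly, at the price of keeping the rows sorted.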
-- 
D. Chadwick Gibbons, chad@lakesys.lakesys.com, ...!uunet!marque!lakesys!chad