[comp.lang.c] initialization of unions

wsmith@uiucdcsb.cs.uiuc.edu (10/26/87)

I have a question about initialization of variables in C.
Kernighan & Ritchie do not say what happens when you try to
initialize a union.  The 4 compilers available to me disagree on
what is legal.

K&R say that if an initialization list is incomplete, the remaining fields
are initialized to zero, which works on all 4 compilers, if for example the
union is the last element of a structure and I do not leave a value for
that place. The difference occurs when I try to place a value into the
union field.  2 consider this to be an error, one bluntly saying: "unions
can not be initialized".  The other two allow the initialization to take
place if the first option of the union matches the value being placed in.

Which two are correct according to the new standard?

struct foo  {
	union {
		char * pch;
		int a;
		} j;
	int l;
	} unionvar = { "hello\n", 37 };

This ^^^  works on two and not on the other two.

struct foo2 {
	int l;
	union {
		char * pch;
		int a;
		} j;
	} unionvar = { 37 };

This ^^^  works on all four.

Bill Smith
ihnp4!uiucdcs!wsmith
wsmith@a.cs.uiuc.edu

rmasters@bbn.COM (Bob Masters) (10/27/87)

In article <165600017@uiucdcsb> wsmith@uiucdcsb.cs.uiuc.edu writes:
>
>I have a question about initialization of variables in C.
>Kernighan & Ritchie do not say what happens when you try to
>initialize a union.  The 4 compilers available to me disagree on
>what is legal.
 
ANSI C has a specific way of initializing unions, which, as I recall, is so
ambiguously worded as to be almost useless (allow 20 minutes to decipher :-)

As I recall (this was from several months ago, when I was writing a book on
Turbo-C . . . they didn't follow the standard . . .), unions are initialized
as if you were initializing their first element.  So . . .

union u1 {
    long   along;
    int    anint[2];
  } myunion = { 0x12345678L };
  
union u2 {
    int    anint[2];
    long   along;
  } yourunion = { 0x5678, 0x1234 };


. . . are both valid declarations, and (on a little-endian machine with
16-bit ints and 32-bit longs, such as an '86) are pretty much equivalent.

>
>struct foo  {
>	union {
>		char * pch;
>		int a;
>		} j;
>	int l;
>	} unionvar = { "hello\n", 37 };
>

I believe that this should work on an ANSI-compatible compiler.

-kdg

chip@killer.UUCP (10/29/87)

In article <4237@ccv.bbn.COM> kgregory@ccy.bbn.com (Keith D Gregory) writes:
> ...unions are initialized
> as if you were initializing their first element...

I've never understood why this is the case.  Is there some reason I'm
overlooking why the compiler couldn't let you cast it to one of the
following elements?

-- 
Chip Rosenthal, Dallas Semiconductor, (214) 450-0400
This message courtesy of ``The UNIX Connection BBS'' in Dallas.
Neither they nor my employer are responsible for my stupidity.

chris@mimsy.UUCP (Chris Torek) (10/29/87)

>In article <4237@ccv.bbn.COM> kgregory@ccy.bbn.com (Keith D Gregory) writes:
>>...unions are initialized as if you were initializing their first element...

In article <1936@killer.UUCP> chip@killer.UUCP (Chip Rosenthal) writes:
>... Is there some reason I'm overlooking why the compiler couldn't let
>you cast it to one of the following elements?

Yes.  Casts do not carry enough information if you work under
`expression' rules, where types char, short, and float get short
shrift, being immediately promoted to longer types.  Moreover, you
cannot cast aggregates, so that

	union {
		struct intnode {
			int type;
			int i;
		} intnode;
		struct floatnode {
			int type;
			float f;
		} floatnode;
		...
	} x = {
		(struct floatnode) { FLOATTYPE, 4.5 }
	};

is not possible.  If, however, you were to announce that, because
there is no clean and simple solution, unions were never to
be initialised at all, you would not be able to pin down the
initial value of an uninitialised union:

	union { int i; char *p; } global_x;

Yet all global variables are supposed to be initialised to `zero',
even if that results in a nonzero bit pattern:

	char *p;
	char *q = 0;

are the same if both variables are global (or static), while it
is conceivable that both have the bit pattern that corresponds
to 0xc0000000.  (This pattern guarantees an error if dereferenced
on a Vax.)  What, then, is the value of `global_x' above?

The X3J11 committee has decided that the value is such that the
first element of the union will compare equal to zero, so on
our hypothetical Vax compiler that makes zero `char *' values
0xc0000000, global_x.i is 0, not 0xc0000000.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gwyn@brl-smoke.ARPA (Doug Gwyn ) (10/30/87)

In article <4237@ccv.bbn.COM> kgregory@ccy.bbn.com (Keith D Gregory) writes:
>ANSI C has a specific way of initializing unions, which, as I recall, is so
>ambiguously worded as to be almost useless (allow 20 minutes to decipher :-)

There is nothing ambiguous about initializing unions specifically;
however, many people have had trouble understanding the rules for
incompletely { } bracketed initializer lists for aggregates in
general, and that part of the wording in the draft proposed Standard
is being revised.  (The problem is that it is not clear from the
former rules whether the initializer list should be parsed "bottom
up" or "top down".)

>... unions are initialized as if you were initializing their first element.

Yes.  First member, actually.

nextuid@xyzzy.UUCP (Next User-id to allocate) (10/30/87)

In article <165600017@uiucdcsb> wsmith@uiucdcsb.cs.uiuc.edu writes:
| 
| K&R say that if an initialization list is incomplete, the remaining fields
| are initialized to zero, which works on all 4 compilers, if for example the
| union is the last element of a structure and I do not leave a value for
| that place. The difference occurs when I try to place a value into the
| union field.  2 consider this to be an error, one bluntly saying: "unions
| can not be initialized".  The other two allow the initialization to take
| place if the first option of the union matches the value being placed in.

ANSI says that it is legal to initialize a union, and by doing so, you use
the first element of the union as the type to use when filling in the bits.

Note, there have been some people who wrote in and said they thought the
first member business was for the birds, and proposed various means to
specify which member gets initialized.  My personal feeling is the first
member is more in the spirit of K&R C, and that a much better scheme would
be to allow you to specify the name of the field being initialized a la Ada.
I don't know how many times I have changed the order of fields in a structure
and then had to go find all cases of initialization and fix them as well.

guy%gorodish@Sun.COM (Guy Harris) (10/31/87)

"Next User-id to allocate"?

> Note, there have been some people who wrote in and said they thought the
> first member business was for the birds, and proposed various means to
> specify which member gets initialized.  My personal feeling is the first
> member is more in the spirit of K&R C, and that a much better scheme would
> be to allow you to specify the name of the field being initialized a la Ada.
> I don't know how many times I have changed the order of fields in a structure
> and then had to go find all cases of initialization and fix them as well.

Not to mention the fact that, when initializing an array of unions or objects
containing unions, the "first member" rule is useless unless for *every*
element of the array the first member of *all* unions in *every* array element
is the one that you want to initialize.  The only thing the "first member"
rule is good for is giving a meaning to the *implicit* initialization of
static objects.  It's better than nothing, as long as 1) it doesn't preclude
doing something better in the future and 2) doesn't give anybody the false
impression that the problem of initializing unions has been solved - it hasn't
been, it's just that one immediate need for a solution has been postponed by
the first member kludge.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

henry@utzoo.UUCP (Henry Spencer) (11/01/87)

> > ...unions are initialized
> > as if you were initializing their first element...
> 
> I've never understood why this is the case.  Is there some reason I'm
> overlooking why the compiler couldn't let you cast it to one of the
> following elements?

Oh oh.  This is another one of those subjects that tends to produce long
flaming debates, with increasingly elaborate "solutions" proposed and
vigorously championed by people who have never used or implemented them.

As I understand it (I do not speak for X3J11), the main reason that the
ANSI draft says *anything* about the matter is a desire to have some sort
of well-defined value in a static union variable at startup time.  While
there is general agreement that a more flexible approach could be useful
at times, it falls to the usual objections:  there is no dire need for it
and no implementation experience with it.  The first-element approach, by
the way, *is* used in some existing implementations and has worked all
right (i.e. it's better than nothing).

Using casts for this is thoroughly ugly, incidentally -- it may *look*
like a minimal solution, but it introduces a major special case into the
language because it is now the only place where casts are mandatory and
the usual implicit conversions are not applicable.  (Relaxing either of
these rules leads to other problems.)

If you have a wonderful idea on the matter, remember that ANSI standards
do come up for revision regularly.  You are much too late to influence the
current standardization effort, but it's time to get moving if you hope to
get your idea into the next revision (which is maybe 5 years away).  What
you should do is:

	(a) implement your idea in your favorite C compiler
	(b) get a fair number of users using the resulting compiler
	(c) get feedback from them on how useful the feature is and
		whether it causes other problems
	(d) using that information as ammunition, prepare a formal
		proposal in time for the revision process

No, you cannot omit step (a); nobody will take the idea seriously unless
you have DEMONSTRATED its usefulness (as opposed to speculating about it).
In C, not some other language.  Better start now, (a) and (b) will take time.
-- 
PS/2: Yesterday's hardware today.    |  Henry Spencer @ U of Toronto Zoology
OS/2: Yesterday's software tomorrow. | {allegra,ihnp4,decvax,utai}!utzoo!henry