[comp.lang.c] Union type conversions

tada@athena.mit.edu (Michael Zehr) (06/13/88)

I have a question about type conversions and portability.  (please don't tell
me to buy & read a manual -- i tried and still don't know if this is legal
or not)

I have a function that sometimes needs an int and sometimes needs a float.
(At the function end, it will be able to tell how to treat the arguments.)
The question I have concerns passing the values.  I made a union and 
a function:

typedef union { int integer_value; float float_value; } INTEGER_OR_FLOAT;
func(INTEGER_OR_FLOAT a);

What i'm wondering is how to call this function.  should it be:

int i;
float f;
func( (INTEGER_OR_FLOAT) i);
func( (INTEGER_OR_FLOAT) f);

or should it be:

INTEGER_OR_FLOAT temp;
temp.integer_value = i;
func(temp);
temp.float_value = f;
func(temp);


So what it boils down to, is whether casting into a union type is
legal and portable (i don't want to just play with it til it works and
then discover that it only works by accident on my compiler :-).
Thanks in advance for any help.

-michael j zehr

henry@utzoo.uucp (Henry Spencer) (06/17/88)

> So what it boils down to, is whether casting into a union type is
> legal and portable...

No.  You have to use the temporary union variable and assign to one of
its members, as in your second example.
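
A minimal sketch of that second form follows.  The kind argument and
the value_kind enum are assumptions added for illustration, standing in
for however func learns which member is live; the original post says
only that the function can tell.

```c
typedef union {
    int   integer_value;
    float float_value;
} INTEGER_OR_FLOAT;

/* Hypothetical tag, not in the original post: tells func which
   member the caller stored. */
enum value_kind { KIND_INT, KIND_FLOAT };

/* Reads only the member that the caller says it stored. */
static double func(enum value_kind kind, INTEGER_OR_FLOAT a)
{
    if (kind == KIND_INT)
        return (double)a.integer_value;
    return (double)a.float_value;
}

/* Fills one temporary union through the matching member before each
   call, as in the second example, and adds up the two results. */
static double demo_sum(int i, float f)
{
    INTEGER_OR_FLOAT temp;
    double total;

    temp.integer_value = i;
    total = func(KIND_INT, temp);

    temp.float_value = f;
    total += func(KIND_FLOAT, temp);
    return total;
}
```

The same temporary is reused for both calls; each call reads back only
the member that was stored last.
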
-- 
Man is the best computer we can      |  Henry Spencer @ U of Toronto Zoology
put aboard a spacecraft. --Von Braun | {ihnp4,decvax,uunet!mnetor}!utzoo!henry

am@cl.cam.ac.uk (Alan Mycroft) (06/23/88)

In article <1988Jun16.182158.2424@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>> So what it boils down to, is whether casting into a union type is
>> legal and portable...
>
>No.  You have to use the temporary union variable and assign to one of
>its members, as in your second example.

Yes, but have you ever seen a compiler which deals with this efficiently?
(Not to mention the human overhead.)

mouse@mcgill-vision.UUCP (der Mouse) (06/25/88)

In article <5754@bloom-beacon.MIT.EDU>, tada@athena.mit.edu (Michael Zehr) writes:
> So what it boils down to, is whether casting into a union type is
> legal and portable

Ouch.  I just searched through K&R V2 for a description of what may be
cast to what.  Nowhere did I find anything that comes right out and
*says* you can't cast to an aggregate type.  However, I also found
nothing explicitly requiring it to even compile, much less work.

Casting a type to a union which has a member of that type is certainly
a reasonable operation.  However, so are many other things which aren't
allowed, such as comparing aggregate types for equality....

Legal or not, I think we can be confident it isn't portable.

Could someone with a copy of the draft standard say just how much
latitude it allows on this point?

> (i don't want to just play with it til it works and then discover
> that it only works by accident on my compiler :-).

If only everyone were so sensible.

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

leo@philmds.UUCP (Leo de Wit) (06/27/88)

In article <231@gannet.cl.cam.ac.uk> am@cl.cam.ac.uk (Alan Mycroft) writes:
|In article <1988Jun16.182158.2424@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
||| So what it boils down to, is whether casting into a union type is
||| legal and portable...
||
||No.  You have to use the temporary union variable and assign to one of
||its members, as in your second example.
|
|Yes, but have you ever seen a compiler which deals with this efficiently?
|(Not to mention the human overhead.)

Yep, I see 'em every day. A good compiler SHOULD handle this
efficiently; the temp. variable would get optimized away (not being
used elsewhere). As far as the overhead is concerned, this seems the
same argument that programming languages handled when 'making it
easier' for the programmer by

    1) AUTOMATIC declaration (and even initialization) of variables in
       FORTRAN,BASIC.
    2) AUTOMATIC declaration of types in C (functions becoming int, 
       parameters becoming int).
    3) AUTOMATIC casting of ints to pointers in C (the compiler should
       at least warn you when doing so; not all do).

The gain that you might have from these features you lose - and more
than that - spending your time to find an AUTOMATICally introduced bug.
Pity your debugger can't solve it AUTOMATICALly.

It seems that you'd rather have a broken compiler than take the trouble
to type some more characters?

    Leo    (join the union).

ftw@masscomp.UUCP (Farrell Woods) (06/29/88)

In article <1180@mcgill-vision.UUCP> mouse@mcgill-vision.UUCP (der Mouse) writes:
>Nowhere did I find anything that comes right out and
>*says* you can't cast to an aggregate type.  However, I also found
>nothing explicitly requiring it to even compile, much less work.

>Could someone with a copy of the draft standard say just how much
>latitude it allows on this point?

>					der Mouse

My old and crusty October '86 dpANS says this:

(page 39, 3.3.4 Cast operators)

Constraints

...the type name shall specify a scalar type and the operand shall have
scalar type.

...

Not much latitude there! 8-)
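
Read literally, that constraint permits conversions among arithmetic
and pointer types and nothing else.  A small sketch of what falls on
each side; the names truncate_to_int, as_bytes and int_or_float are
made up for illustration, and the rejected line is left in a comment
because a compiler enforcing the constraint would diagnose it.

```c
union int_or_float { int i; float f; };

/* Scalar to scalar: allowed.  Converting a float to int discards
   the fractional part. */
static int truncate_to_int(float f)
{
    return (int)f;
}

/* Scalar to scalar again: one pointer type converted to another. */
static unsigned char *as_bytes(int *p)
{
    return (unsigned char *)p;
}

/* Not allowed by the constraint: the type name is a union, not a
 * scalar.
 *
 *     union int_or_float u = (union int_or_float)f;
 */
```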



-- 
============================================================================
Farrell T. Woods - Engineer, OS Development                  MASSCOMP
(617)692-6200 x2471                                      1 Technology Park
{ihnp4|decvax|uunet|allegra}!masscomp!ftw               Westford, MA  01886

richard@aiva.ed.ac.uk (Richard Tobin) (07/08/88)

>>> So what it boils down to, is whether casting into a union type is
>>> legal and portable...
>>
>>No.  You have to use the temporary union variable and assign to one of
>>its members, as in your second example.
>
>Yes, but have you ever seen a compiler which deals with this efficiently?

Indeed I have.  It's gcc (of course).  At least, it works well in this 
simple case:

main()                                    _main:
{                                                 link a6,#-2
    int i = f();                                  jbsr _f
    char c;

    {
        union {int i; char c[4];} u;
        u.i = i;                                  moveb d0,a6@(-2)
        c = u.c[3];
    }
                                                  pea a6@(-2)
    g(&c);                                        jbsr _g
}                                                 unlk a6
                                                  rts

>(Not to mention the human overhead.)

Doesn't solve that of course.
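
One way to trim the human overhead, sketched under the assumption that
two small helper functions are acceptable: let each helper fill the
temporary and return the union by value, so that a call site can read
func(from_int(i)).  The names from_int and from_float are hypothetical,
and whether the temporary is optimized away still depends on the
compiler.

```c
typedef union {
    int   integer_value;
    float float_value;
} INTEGER_OR_FLOAT;

/* Hypothetical helper: build the union from an int by assigning to
   the int member. */
static INTEGER_OR_FLOAT from_int(int i)
{
    INTEGER_OR_FLOAT u;
    u.integer_value = i;
    return u;
}

/* Hypothetical helper: build the union from a float by assigning to
   the float member. */
static INTEGER_OR_FLOAT from_float(float f)
{
    INTEGER_OR_FLOAT u;
    u.float_value = f;
    return u;
}
```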

-- Richard

-- 
Richard Tobin,                         JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,             ARPA:  R.Tobin%uk.ac.ed@nss.cs.ucl.ac.uk
Edinburgh University.                  UUCP:  ...!ukc!ed.ac.uk!R.Tobin

atbowler@watmath.waterloo.edu (Alan T. Bowler [SDG]) (07/13/88)

In article <1180@mcgill-vision.UUCP> mouse@mcgill-vision.UUCP (der Mouse) writes:
>In article <5754@bloom-beacon.MIT.EDU>, tada@athena.mit.edu (Michael Zehr) writes:
>> So what it boils down to, is whether casting into a union type is
>> legal and portable
>
>Ouch.  I just searched through K&R V2 for a description of what may be
>cast to what.  Nowhere did I find anything that comes right out and
>*says* you can't cast to an aggregate type.  However, I also found
>nothing explicitly requiring it to even compile, much less work.

Actually I don't think you are guaranteed anything more than
if you assign to a particular union member you can get back the
value you assigned by naming that member provided that you do
not assign to any other member.  It is usual practice for a compiler
to put all members of a union at the same starting address
(i.e. equivalence them) however, there is no guarantee that
the compiler does not simply do the equivalent of
#define union struct
and proceed from there.  Using union for a "pun" operation
is a reasonable thing to do in many programs, however you
should always be aware that it is not a machine independent
action and would need to be checked in porting a program.
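
As a sketch of keeping the machine-dependent part visible, the bytes
of a float can be copied out with memcpy instead of being read through
another union member.  This is a substitute technique, not one proposed
in this thread; the function names are made up, and the byte values and
their order remain machine dependent.

```c
#include <string.h>

/* Copy the object representation of a float into a caller-supplied
   buffer of at least sizeof(float) bytes. */
static void float_to_bytes(float f, unsigned char *out)
{
    memcpy(out, &f, sizeof f);
}

/* Rebuild a float from bytes previously produced by float_to_bytes
   on the same machine. */
static float bytes_to_float(const unsigned char *in)
{
    float f;
    memcpy(&f, in, sizeof f);
    return f;
}
```

Only the round trip is relied on here; what the individual bytes hold
is exactly the part that would need checking when porting.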

jnh@ece-csc.UUCP (Joseph Nathan Hall) (07/14/88)

In article <19845@watmath.waterloo.edu> atbowler@watmath.waterloo.edu (Alan T. Bowler [SDG]) writes:
 Actually I don't think you are guaranteed anything more than
 if you assign to a particular union member you can get back the
 value you assigned by naming that member provided that you do
 not assign to any other member.  It is usual practice for a compiler
 to put all members of a union at the same starting address
 (i.e. equivalence them) however, there is no guarantee that
 the compiler does not simply do the equivalent of
 #define union struct
 and proceed from there.  Using union for a "pun" operation
...

Sorry, you're just plain wrong here.  From page 140 of K&R, I quote:

	"In effect, a union is a structure in which all members have
	 OFFSET ZERO [emphasis added], the structure is big enough to hold
	 the 'widest' member, and the alignment is appropriate for all
	 of the types in the union ..."
-- 
v   v sssss|| joseph hall                      || 201-1D Hampton Lee Court
 v v s   s || jnh@ece-csc.ncsu.edu (Internet)  || Cary, NC  27511
  v   sss  || the opinions expressed herein are not necessarily those of my
-----------|| employer, north carolina state university . . . . . . . . . . . 

chris@mimsy.UUCP (Chris Torek) (07/15/88)

>In article <19845@watmath.waterloo.edu> atbowler@watmath.waterloo.edu
>>(Alan T. Bowler [SDG]) writes:
>>... there is no guarantee that the compiler does not simply do the
>>equivalent of `#define union struct' ...

In article <3714@ece-csc.UUCP> jnh@ece-csc.UUCP (Joseph Nathan Hall) writes:
>Sorry, you're just plain wrong here.  From page 140 of K&R, I quote:

... from the *de*scriptive part of the text, which says only that

>	"In effect, a union is a structure in which all members have
>	 OFFSET ZERO [emphasis added] ..."

The point of this quote is to warn users that writing on any one
element of a union *may* stomp any other element, not that it *must*
stomp other elements.  Alan Bowler is right; unions make few
guarantees.  On the other hand, a compiler that does not conserve
storage with union definitions is probably not worth using.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

jnh@ece-csc.UUCP (Joseph Nathan Hall) (07/15/88)

In article <12490@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
 >In article <19845@watmath.waterloo.edu> atbowler@watmath.waterloo.edu
 >>(Alan T. Bowler [SDG]) writes:
 >>... there is no guarantee that the compiler does not simply do the
 >>equivalent of `#define union struct' ...
 
 In article <3714@ece-csc.UUCP> jnh@ece-csc.UUCP (Joseph Nathan Hall) writes:
 >Sorry, you're just plain wrong here.  From page 140 of K&R, I quote:
 
 ... from the *de*scriptive part of the text, which says only that
 
 >	"In effect, a union is a structure in which all members have
 >	 OFFSET ZERO [emphasis added] ..."
 
 The point of this quote is to warn users that writing on any one
 element of a union *may* stomp any other element, not that it *must*
 stomp other elements.  Alan Bowler is right; unions make few
 guarantees.  On the other hand, a compiler that does not conserve
 storage with union definitions is probably not worth using.

The description of unions in K&R (1st ed.; I don't have the 2nd close by
to look at) is, I agree, somewhat vague.  But it specifically states, in
the passage I quoted above, that all of the members start at offset
zero ... don't you think this implies, without ambiguity, that the members
of a union will a) be allocated space starting at the same address, and b)
that they will have in common the first n bytes of storage, where n is the
size of the smallest item?  (Notwithstanding cases where you have unions
of structs where storage isn't contiguously allocated, of course.)

Also, on p. 197 of the 1st ed., "A union may be thought of as a structure
all of whose members begin at offset 0 and whose size is sufficient to
contain any of its members.  At most one of the members can be stored in
a union at any time."

I don't see how you can come up with the liberal interpretation that a
compiler following the K&R standard could "#define union struct."  Maybe there
is a badly-behaved compiler out there that does, but in that case it's not
a "real" C compiler.

'Nuff said on my account.  I'd like to see comments from others ...

-- 
v   v sssss|| joseph hall                      || 201-1D Hampton Lee Court
 v v s   s || jnh@ece-csc.ncsu.edu (Internet)  || Cary, NC  27511
  v   sss  || the opinions expressed herein are not necessarily those of my
-----------|| employer, north carolina state university . . . . . . . . . . . 

karl@haddock.ISC.COM (Karl Heuer) (07/15/88)

In article <12490@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <3714@ece-csc.UUCP> jnh@ece-csc.UUCP (Joseph Nathan Hall) writes:
>[quoting from the *de*scriptive part of the text [K&R], which says only that]
>>	"In effect, a union is a structure in which all members have
>>	 OFFSET ZERO [emphasis added] ..."
>
>The point of this quote is to warn users that writing on any one
>element of a union *may* stomp any other element, not that it *must*
>stomp other elements.  Alan Bowler is right; unions make few
>guarantees.  On the other hand, a compiler that does not conserve
>storage with union definitions is probably not worth using.

Except possibly as a debugging tool to catch accidental punning.

Anyway, the dpANS does guarantee that "a pointer to a union object, suitably
cast, points to each of its members ... and vice versa" (3.5.2.1).  So an
implementation that tries to make structs out of unions would have to do some
gymnastics if it's to remain ANSI-conformant.
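
A short sketch of what that guarantee lets a program check, assuming
a union of two ints; the names pair, members_share_address and
store_via_member_read_via_union are made up for illustration.

```c
union pair { int a; int b; };

/* Returns 1 if a pointer to the union, converted to int *, compares
   equal to the address of each member. */
static int members_share_address(union pair *p)
{
    return (int *)p == &p->a && (int *)p == &p->b;
}

/* Stores through one member's address, then reads through the union
   pointer converted to that member's type. */
static int store_via_member_read_via_union(int value)
{
    union pair x;
    int *pa = &x.a;

    *pa = value;
    return *(int *)&x;
}
```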

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

chris@mimsy.UUCP (Chris Torek) (07/22/88)

>>>In article <19845@watmath.waterloo.edu> atbowler@watmath.waterloo.edu
>>>>(Alan T. Bowler [SDG]) wrote:
>>>>... there is no guarantee that the compiler does not simply do the
>>>>equivalent of `#define union struct' ...

>>In article <3714@ece-csc.UUCP> jnh@ece-csc.UUCP (Joseph Nathan Hall)
>>answered with a quote from K&R 1st ed., p. 140:
>>>	"In effect, a union is a structure in which all members have
>>>	 OFFSET ZERO [emphasis added] ..."

>In article <12490@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) I said:
>>The point of this quote is to warn users that writing on any one
>>element of a union *may* stomp any other element, not that it *must*
>>stomp other elements. ...

In article <3717@ece-csc.UUCP> jnh@ece-csc.UUCP (Joseph Nathan Hall)
continues:
>The description of unions in K&R (1st ed.; I don't have the 2nd close by
>to look at) is, I agree, somewhat vague.  But it specifically states, in
>the passage I quoted above, that all of the members start at offset
>zero ... don't you think this implies, without ambiguity, that the members
>of a union will a) be allocated space starting at the same address, and b)
>that they will have in common the first n bytes of storage, where n is the
>size of the smallest item?  (Notwithstanding cases where you have unions
>of structs where storage isn't contiguously allocated, of course.)

I am not sure.  In particular, if there is no testable assertion that
makes a union different from a structure, then a compiler that implements
a union as a structure will not break any (testable) rules and will
thus be correct.

>Also, on p. 197 of the 1st ed., "A union may be thought of as a structure
>all of whose members begin at offset 0 and whose size is sufficient to
>contain any of its members.  At most one of the members can be stored in
>a union at any time."

This is a bit stronger (being in the prescriptive text), but `may be
thought of' is not the same as `is'.

>I don't see how you can come up with the liberal interpretation that a
>compiler following the K&R standard could "#define union struct."

Write some correct code that produces a wrong answer if a union of a
set of elements were implemented as a structure containing all the
elements, and you will have a proof.  As it is, the only thing I can
come up with is this:

	union {
		int a;
		int b;
	} x;
	...
	x.b = 0;
	x.a = 123;
	assert(x.b == 123);

which I am not certain is guaranteed (by K&R 1st ed., at least) to work.
If it did not, it would violate the Rule of Least Astonishment, but that
rule does not appear in the text. . . .
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

walter@hpcllca.HP.COM (Walter Murray) (07/23/88)

Chris Torek writes:

>If there is no testable assertion that
>makes a union different from a structure, then a compiler that implements
>a union as a structure will not break any (testable) rules and will
>thus be correct.
>
>Write some correct code that produces a wrong answer if a union of a
>set of elements were implemented as a structure containing all the
>elements, and you will have a proof.  

O.K., how about:

#include <assert.h>
main ()
{
   struct {int a; int b;} s;
   union  {int c; int d;} u;
   assert (&s.a <  &s.b);
   assert (&u.c == &u.d);
}

I believe the draft proposed ANSI standard guarantees this to work
(May, 1988, section 3.3.8).  "Pointers to structure members declared
later compare higher than pointers to members declared earlier in
the structure."   "All pointers to members of the same union object
compare equal."

Walter Murray
All opinions expressed are my own.

henry@utzoo.uucp (Henry Spencer) (07/24/88)

In article <12628@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>Write some correct code that produces a wrong answer if a union of a
>set of elements were implemented as a structure containing all the
>elements, and you will have a proof...

Easy, given (3.5.2.1):  "A pointer to a union object, suitably cast,
points to each of its members... and vice versa."  If two members have
the same type, they have to be in the same place.
-- 
Anyone who buys Wisconsin cheese is|  Henry Spencer at U of Toronto Zoology
a traitor to mankind.  --Pournelle |uunet!mnetor!utzoo! henry @zoo.toronto.edu

chris@mimsy.UUCP (Chris Torek) (07/26/88)

>In article <12628@mimsy.UUCP> I suggested that someone
>>Write some correct code that produces a wrong answer if [union == struct]

to which, in article <1988Jul23.220609.22105@utzoo.uucp>, henry@utzoo.uucp
(Henry Spencer) suggests:
>Easy, given (3.5.2.1) . . . .

I had thought this was to be limited to K&R (1st ed.).  The dpANS makes
a number of guarantees not present in K&R.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

will.summers@p6.f18.n114.z1.fidonet.org (will summers) (07/30/88)

 > I am not sure.  In particular, if there is no testable assertion that
 > makes a union different from a structure, then a compiler that implements
 > a union as a structure will not break any (testable) rules and will
 > thus be correct.
 >
 > Write some correct code that produces a wrong answer if a union of a
 > set of elements were implemented as a structure containing all the
 > elements, and you will have a proof.
 
How about:
  union { int a; int b; } x;
 
  int offset_of_b = (int) (&x.b - &x);
    ...
 
    if  (offset_of_b != 0)
        printf("compiler is broke\n");
 
"A union may be thought of as a structure all of whose members begin at offset 
0" (K&R A.8.5, pg 197).
 
If I may "think of" a union as an "all 0 offset structure", then I may write 
code that fails when any aspect of that metaphor is violated.
 
So what's wrong with the above code?
 
   \/\/ill
 


--  
St. Joseph's Hospital/Medical Center - Usenet <=> FidoNet Gateway
Uucp: ...ncar!noao!asuvax!stjhmc!18.6!will.summers
Internet: will.summers@p6.f18.n114.z1.fidonet.org

chris@mimsy.UUCP (Chris Torek) (08/02/88)

In some article whose referent was deleted by deficient news software
(actually, by crossing a news/FIDO gateway), I wrote in re union offsets:
>>Write some correct code that produces a wrong answer if a union of a
>>set of elements were implemented as a structure containing all the
>>elements, and you will have a proof.

(Incidentally, you need to do this for all possible cases to make a
proof for more than just one case.  This was in an article in which
my point was `in a computer language, a difference that makes no
difference *is* no difference.')

In article <224.22F29448@stjhmc.fidonet.org>
will.summers@p6.f18.n114.z1.fidonet.org (will summers) writes:
>How about:
>  union { int a; int b; } x;
> 
>  int offset_of_b = (int) (&x.b - &x);
>    ...
> 
>    if  (offset_of_b != 0)
>        printf("compiler is broke\n");

>So what's wrong with the above code?

`Let me count the ways ...' :-)

First: you have subtracted pointers of different types; that operation
is undefined.  Well, we could fix that:

	int offset_of_b = &x.b - &x.a;

but it may still be wrong to assume that offset_of_b should then be 0.
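
If a byte offset is what is wanted, one sketch is to convert both
addresses to char * before subtracting, which gives two pointers of the
same type into the same object.  The function name is hypothetical.  A
result of 0 is what the dpANS wording quoted earlier in this thread
would require; K&R 1st ed. does not spell it out.

```c
#include <stddef.h>

union pair { int a; int b; };

/* Byte distance from the start of the union object to member b. */
static ptrdiff_t offset_of_b(const union pair *x)
{
    return (const char *)&x->b - (const char *)x;
}
```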

>"A union may be thought of as a structure all of whose members begin at
>offset 0" (K&R A.8.5, pg 197).
> 
>If I may "think of" a union as an "all 0 offset structure", then I may write 
>code that fails when any aspect of that metaphor is violated.

You may indeed---but it might still be incorrect code.

The question here is really in regard to the original wording: why say
`may be thought of as' rather than `is'?  Given that this is in a
tutorial, rather than a formal language definition, there is one very
likely answer: perhaps `is' would be false.  After all, a float `may be
thought of as' a pair of integers separated by a decimal point.  That
metaphor works to a fair extent, but it breaks down when you really
push it.  Perhaps the same is true of a union.  Just possibly, the idea
was that a union would be a structure with zero offsets, except
wherever that happened not to be ideal, such as when embedded into
another structure:

	struct this_could_be_aligned_fancily {
		char name[7];	/* object name, max 7 letters */
		union {
			long l;	/* value if long */
			char c[5]; /* value if string */
		} u;		/* (type distinguished by name) */
	};

On a byte-addressable machine where `long's must be aligned at a
multiple of four bytes (e.g., SPARC), the obvious way to pack this is:

	offset	object(s)
	------	---------
	 0	name[0]
	 1	name[1]
	...
	 6	name[6]
	 7	<hole>
	 8	u.l, u.c[0]
	 9	u.l, u.c[1]
	10	u.l, u.c[2]
	11	u.l, u.c[3]
	12	u.c[4]
	13	<hole>
	14	<hole>
	15	<hole>

A `better' way to pack it is this:

	offset	object(s)
	------	---------
	 0	name[0]
	...
	 6	name[6]
	 7	u.c[0]
	 8	u.l, u.c[1]
	 9	u.l, u.c[2]
	10	u.l, u.c[3]
	11	u.l, u.c[4]

This saves four bytes per structure object.

The question, then, is `is such a packing legal?'  K&R does not really
answer this.  I hope that the dpANS does, and does so with a `no'; but
I do not know whether this is in fact the case.
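
To see which packing a particular compiler chose, the offsets can be
measured directly.  This is only a sketch: the helper names are made
up, offsetof comes from <stddef.h> as defined by the ANSI work, and the
numbers it reports describe one implementation, not what the language
requires.

```c
#include <stddef.h>

struct this_could_be_aligned_fancily {
    char name[7];       /* object name, max 7 letters */
    union {
        long l;         /* value if long */
        char c[5];      /* value if string */
    } u;                /* (type distinguished by name) */
};

/* Offset of the embedded union within the enclosing structure. */
static size_t union_offset(void)
{
    return offsetof(struct this_could_be_aligned_fancily, u);
}

/* Byte distance from u.l to u.c: 0 when both members start at the
   same address, -1 under the `better' packing shown above. */
static ptrdiff_t c_minus_l(const struct this_could_be_aligned_fancily *s)
{
    return (const char *)s->u.c - (const char *)&s->u.l;
}
```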

(I had hoped not to have to be this explicit, but at least this is
better than reruns of `pointers vs. arrays' or `defining NULL' :-) .)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris