[comp.lang.c] C union problems

bills@sequent.UUCP (Bill Sears) (04/26/89)

What this posting basically boils down to is:

    Is a pointer a pointer?

In other words if I have a pointer to a given object, can I typecast
it into a pointer to another object and be guaranteed that the new
pointer will be "the same" as the old pointer?  The scenario is
fairly long.  Please ignore the errors caused by omission.

Consider the following scenario.  I have a list of structures which
have three fields: a char array, a type, and a pointer which points
to one of two objects depending upon the aforementioned type.

typedef enum { ismenu, isaction } objtype;
typedef struct
    {
    char	desc[40];
    objtype	stype;
    union
	{
	MENU	*menu;
	ACTION	*action;
	}	dummy;
    } SELECTION;

SELECTION a;

The problem with this is that in order to declare a union, you must
introduce a new variable into the structure (i.e. dummy).  Now my two
pointers must be accessed as "a.dummy.menu" and "a.dummy.action",
rather than (the preferable) "a.menu" and "a.action".  One way to 
solve this is not to use a union.

typedef struct
    {
    char	desc[40];
    objtype	stype;
    MENU	*menu;
    ACTION	*action;
    } SELECTION;

This results in unused storage being allocated for each SELECTION.
Another solution is to use a single pointer variable to reference
both possibilities.

typedef struct
    {
    char	desc[40];
    objtype	stype;
    char	*objptr;	/* or int * or long * */
    } SELECTION;

Now, by casting objptr to be the type of pointer that I am using
at any given time, I can get rid of any compile errors, but is this
always guaranteed to work?  In other words, is a pointer a pointer?

Although this is probably a matter of personal taste and programming
style, which of the above implementations is the "most desirable"?
They all have flaws (as I have documented), but which is the "best"?
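To put a number on the storage trade-off between the layouts above, here is a minimal sketch. MENU and ACTION are never shown in the post, so the definitions below are hypothetical stand-ins, as are the names SEL_UNION, SEL_BOTH, SEL_GENERIC and wasted_bytes. The generic layout uses "void *" rather than the "char *" of the original, which assumes an ANSI compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: the original post never shows MENU or ACTION. */
typedef struct { int nitems; } MENU;
typedef struct { int code; } ACTION;
typedef enum { ismenu, isaction } objtype;

/* Layout 1: named union, one pointer's worth of storage. */
typedef struct {
    char desc[40];
    objtype stype;
    union { MENU *menu; ACTION *action; } dummy;
} SEL_UNION;

/* Layout 2: two separate pointers, one of which is always unused. */
typedef struct {
    char desc[40];
    objtype stype;
    MENU *menu;
    ACTION *action;
} SEL_BOTH;

/* Layout 3: one generic pointer, cast at each point of use. */
typedef struct {
    char desc[40];
    objtype stype;
    void *objptr;
} SEL_GENERIC;

/* Bytes that layout 2 spends beyond layout 1 on this implementation. */
size_t wasted_bytes(void)
{
    assert(sizeof(SEL_BOTH) >= sizeof(SEL_UNION));
    return sizeof(SEL_BOTH) - sizeof(SEL_UNION);
}
```

The exact figure depends on pointer size and padding, so it is best read off from sizeof on the target machine rather than assumed.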

For no particular reason, other than information, the following is
the Pascal code which will implement this same scenario.  Since I
am not a Pascal programmer, please ignore errors in this fragment.
I think it is correct enough to convey the desired idea.

type
  objtype = (ismenu, isaction);
  act_ptr = ^action;
  menu_ptr = ^menu;
  selection = record
    desc : packed array[1..40] of char;
    case stype : objtype of
      ismenu   : ( menu   : menu_ptr );
      isaction : ( action : act_ptr )
  end;

var
  a : selection;

This solves all of the above flaws, but introduces one of its own
(i.e. it's in the wrong language :-).  The pointer fields are accessed
via "a.menu" and "a.action", there is no wasted space introduced, and
all of the pointer types will point to their own type of structure.
Was the Pascal variant record syntax designed to overcome the above
restrictions with the C union, or is it just coincidental that it does?

Any comments?

gwyn@smoke.BRL.MIL (Doug Gwyn) (04/26/89)

In article <15058@sequent.UUCP> bills@sequent.UUCP (Bill Sears) writes:
>Now my two pointers must be accessed as "a.dummy.menu" and "a.dummy.action",
>rather than (the preferable) "a.menu" and "a.action".

If that really, really bothers you, first consider using "u" instead of
"dummy" for the union identifier, and if that's still not enough then
try something like

struct {
	char	desc[40];
	objtype	stype;
	union {
		MENU	*umenu;
#define menu u.umenu
		ACTION	*uaction;
#define action u.uaction
	} u;
} a;

Then you can type your beloved "a.action" etc.
Personally I don't think typing the extra "u." is such a big deal.

>In other words, is a pointer a pointer?

No, different pointer types in general have different requirements
(although all pointers to structures have to have similar representation,
by a fairly subtle chain of reasoning).  There is a generic pointer
type (void* in ANSI C), but it's a mistake to use it unnecessarily,
because you lose type checking that way.  (There can also be run-time
overhead for pointer conversions in some implementations.)

>Was the Pascal variant record syntax designed to overcome the above
>restrictions with the C union, or is it just coincidental that it does?

There is no evidence I know of that Wirth was aware of C when he designed
Pascal.  (I'm not sure whether or not it was even chronologically possible.)
Pascal variant records and C unions don't address quite the same need,
although there are some similarities.  I'm actually quite glad that C
unions don't require an associated variant-selector tag; there are times
when it would get in the way.

Early versions of C allowed structure members to be accessed with
pointers to other structure types, which would have provided you with
an alternative solution to your "problem".  Each structure type was
given its own member name space somewhere around 1977, as I recall.
I think most experienced C programmers would agree with the change.
(Not that we have any choice in the matter.)

kurtk@tekcae.CAX.TEK.COM (Kurt Krueger) (04/26/89)

On 'real' computers a pointer is generally an absolute virtual memory address.
Casting a pointer is really a NOP; it is only done to keep the compiler happy
(and as an attempt to convey that you REALLY know what you are doing).

So in this case, a pointer is a pointer regardless of what it points to.

Now, enter the 8086 family (i.e. the IBM PC).  It takes TWO (2, count 'em) 16-bit
quantities to specify a memory address and the formula goes something like
address = segment*16 + offset.  Borland has implemented three pointer types
in an attempt to maintain speed but yet address all of memory if possible.
'Near' pointers are just the offset, 'far' pointers are both, but since several
combinations of segment and offset can reference the same address, they have
'huge' pointers which are just normalized far pointers.

In this case a pointer is nowhere near a pointer.
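The segment*16 + offset formula above can be sketched in portable C to show why several segment:offset pairs name the same byte. The type FarAddr and the functions linear_address and normalize are hypothetical names for illustration only; they model the arithmetic and are not any compiler's actual far or huge pointer implementation.

```c
#include <assert.h>
#include <stdint.h>

/* A real-mode segment:offset pair, held as two 16-bit quantities. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} FarAddr;

/* address = segment*16 + offset */
uint32_t linear_address(FarAddr p)
{
    return ((uint32_t)p.segment << 4) + p.offset;
}

/*
 * Normalize so that offset < 16; each address then has exactly one
 * representation.  Addresses past 1 MB are outside this sketch.
 */
FarAddr normalize(FarAddr p)
{
    uint32_t lin = linear_address(p);
    FarAddr n;

    n.segment = (uint16_t)(lin >> 4);
    n.offset  = (uint16_t)(lin & 0xF);
    assert(n.offset < 16);
    return n;
}
```

For example, 1234:0010 and 1235:0000 both yield linear address 0x12350, so comparing the raw pairs member by member would call them different even though they point at the same byte.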

I think you're better off 'unionizing' your pointers and living with the
additional annoyance of union1.union2.pointer.  Makes brain damaged processors
more likely to work with your code.

________________________________________________________________________________
					|
kurtk@tekcae.CAX.TEK.COM (Kurt Krueger)	| Everything runs on smoke.  When the
  Electrical Simulation Group (ECAX)	| smoke leaks out, it stops working.
    D.S. 59-432  (503) 627-4363		|
________________________________________|_______________________________________

henry@utzoo.uucp (Henry Spencer) (04/27/89)

In article <15058@sequent.UUCP> bills@sequent.UUCP (Bill Sears) writes:
>The problem with this is that in order to declare a union, you must
>introduce a new variable into the structure (i.e. dummy).  Now my two
>pointers must be accessed as "a.dummy.menu" and "a.dummy.action",
>rather than (the preferable) "a.menu" and "a.action".  One way to 
>solve this is not to use a union.

Another way is to use the union but use #defines to hide the extra
name:

	union {
		MENU	*dumenu;
		ACTION	*duaction;
	} dummy;
	#define	menu	dummy.dumenu
	#define	action	dummy.duaction

so you can say "a.menu" and have it work.  It's a bit inelegant, but it
works very nicely.
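For reference, a complete version of the union-plus-#define idea might look like the sketch below. The MENU and ACTION definitions and the function menu_item_count are hypothetical, added only so the fragment compiles on its own.

```c
#include <assert.h>

typedef struct { int nitems; } MENU;     /* hypothetical stand-in */
typedef struct { int code; } ACTION;     /* hypothetical stand-in */
typedef enum { ismenu, isaction } objtype;

typedef struct {
    char desc[40];
    objtype stype;
    union {
        MENU   *dumenu;
        ACTION *duaction;
    } dummy;
} SELECTION;

/* From here on, the tokens "menu" and "action" expand wherever they occur. */
#define menu   dummy.dumenu
#define action dummy.duaction

int menu_item_count(const SELECTION *s)
{
    assert(s->stype == ismenu);
    return s->menu->nitems;    /* expands to s->dummy.dumenu->nitems */
}
```

One caveat with this approach: the preprocessor knows nothing about scope, so any later variable, parameter, or member that happens to be named menu or action is rewritten too. Less common macro names reduce that risk.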

>    char	*objptr;	/* or int * or long * */
>
>Now, by casting objptr to be the type of pointer that I am using
>at any given time, I can get rid of any compile errors, but is this
>always guaranteed to work?  In other words, is a pointer a pointer?

No; in the general case a cast from one type of pointer to another yields
machine-dependent and unpredictable results.  However, "void *" and (as a
grandfather clause) "char *" are exceptions:  they are guaranteed to be
able to hold any object pointer (function pointers are a separate matter),
and conversions to and from them are guaranteed safe.
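A minimal sketch of the generic-pointer variant, assuming an ANSI compiler with "void *": the pointer is converted back only to the type that was originally stored, with the stype tag recording which one that was. MENU, ACTION, selection_menu and selection_action are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int nitems; } MENU;     /* hypothetical stand-in */
typedef struct { int code; } ACTION;     /* hypothetical stand-in */
typedef enum { ismenu, isaction } objtype;

typedef struct {
    char desc[40];
    objtype stype;
    void *objptr;   /* holds a MENU * or an ACTION *, as recorded in stype */
} SELECTION;

/* Convert back to MENU * only when the tag says a MENU * was stored. */
MENU *selection_menu(const SELECTION *s)
{
    assert(s->stype == ismenu);
    return (MENU *)s->objptr;
}

/* Likewise for ACTION *. */
ACTION *selection_action(const SELECTION *s)
{
    assert(s->stype == isaction);
    return (ACTION *)s->objptr;
}
```

The round trip through "void *" is what the guarantee covers. Reading a stored MENU * back as an ACTION * is not covered, and the compiler will not catch it, which is why the tag check is there.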

>Although this is probably a matter of personal taste and programming
>style, which of the above implementations is the "most desirable"?

I would give the nod to union-plus-#define, which avoids waste of storage
and, at the cost of slight inelegance at the declaration, avoids notational
clumsiness at the points of use.

>Was the Pascal variant record syntax designed to overcome the above
>restrictions with the C union, or is it just coincidental that it does?

They're somewhat differently-flavored solutions to the same problem.
Pascal wins on this particular point of syntax-at-point-of-use, at the
cost of more complex structure-declaration syntax.
-- 
Mars in 1980s:  USSR, 2 tries, |     Henry Spencer at U of Toronto Zoology
2 failures; USA, 0 tries.      | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

gwyn@smoke.BRL.MIL (Doug Gwyn) (04/27/89)

In article <2700@tekcae.CAX.TEK.COM> kurtk@tekcae.CAX.TEK.COM (Kurt Krueger) writes:
>Casting a pointer is really a NOP

No!  This is a common misconception.  Pointer casting can involve change
in representation, and when converting between pointer to char and
pointers to wider types it often does.

yair@tybalt.caltech.edu (Yair Zadik) (04/29/89)

I like the C++ solution the best: you can declare a union without a name
as long as it is within a struct.  These 'anonymous' unions behave the
way you would expect them to.  You could just declare:

	typedef struct {
		int a;
		char b;
		union {
			int *c1;
			char *c2;
		};	/* anonymous: no member name, but the ';' is required */
	} randomtype;

	randomtype random;

Then you could refer to random.a, random.b, random.c1, and random.c2 just
like in a Pascal record variant.  The only problem is that you need a C++
compiler to handle it.
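For readers with a newer toolchain: the C11 revision of the C standard later added anonymous unions to C itself, so the same declaration can be sketched in plain C, assuming a C11 or later compiler. The helper members_overlap is a hypothetical name added for illustration.

```c
#include <assert.h>

typedef struct {
    int a;
    char b;
    union {          /* no member name: c1 and c2 join the enclosing struct */
        int  *c1;
        char *c2;
    };
} randomtype;

/* Both members occupy the same storage, so their addresses coincide. */
int members_overlap(const randomtype *r)
{
    assert(r != 0);
    return (const void *)&r->c1 == (const void *)&r->c2;
}
```

With this in place, r.c1 and r.c2 are written directly on a randomtype variable r, with no intermediate union name.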

yair@tybalt.caltech.edu