[net.lang.c] making it easier to use unions

rgenter@BBN-LABS-B.arpa (Rick Genter) (06/26/86)

     Quite often, one will see a union used in a context similar to the
following:

	struct	device	{
		union	{
			unsigned short	_dr_word1;
			struct	{
				unsigned _dr_bit1  : 1;
				unsigned _dr_bit2  : 1;
				unsigned _dr_value : 3;
				unsigned	   : 2;
				unsigned _dr_ctrl  : 1;
				unsigned _dr_val2  : 8;
			} _dr_w1;
		} _dr_u1;

		unsigned short	dr_data;

		union	{
			< another control word >
		} _dr_u2;
	};

followed by a whole bunch of #defines like:

	#define	dr_word1	_dr_u1._dr_word1
	#define	dr_bit1		_dr_u1._dr_w1._dr_bit1
	#define	dr_value	_dr_u1._dr_w1._dr_value

usually bracketed by a comment saying something about "making it easier to
access the bit fields."  The reason for using the union in the first place
is that you want to write your device driver in an obvious manner - if you
want to assert the 'bit1' bit in the first control register, you want to
say something like:

	dp->dr_bit1 = 1;

(we'll ignore the issue of bit-field ordering and portability for now).  Yet
you have to write the control register as a whole word: it is a write-only
register, and the only bit manipulation instructions your processor executes
are read-modify-write, so you end up keeping a software copy of the register.

     It would be nice if we could do away with the intermediate labels
on structures/unions used in this manner.  It seems to me that it
should be possible given the constraint that within the enclosing
context for the structure/union declaration in question, field names
must be unique.  In my mind, this is merely an extension of insisting
that field names within a structure/union declaration be unique.

     Using the above example, we could define 'struct device' as:

	struct	device	{
		union	{
			unsigned short	dr_word1;
			struct	{
				unsigned dr_bit1  : 1;
				unsigned dr_bit2  : 1;
				unsigned dr_value : 3;
				unsigned	   : 2;
				unsigned dr_ctrl  : 1;
				unsigned dr_val2  : 8;
			};	/* <= note there is no label here */
		};		/* <= nor here */

		unsigned short	dr_data;

		union	{
			< another control word >
		};		/* <= nor here */
	};

Then if you had a variable 'dp' declared as (struct device *), you could
just reference dp->dr_bit1 directly, without going through all of that 
#define nonsense and without saying dp->_dr_u1._dr_w1._dr_bit1.  Only objects
of type 'struct device' would be able to reference the fields 'dr_bit1', etc.,
without generating a warning ("warning: illegal member use: dr_bit1").

     Now, someone tell me why this would be bad to add to X3J11.  My
apologies if it has already been added; my copy is the April 30, 1985 draft
and I haven't incorporated the 9 articles worth of changes which were
posted a couple of months ago into it yet (1/2 :-).

(By the way, accessing device registers is not the only application for
using structures/unions as shown.  Other applications include protocol
implementations, and interpreting binary files with complex record formats.)
--------
Rick Genter 				BBN Laboratories Inc.
(617) 497-3848				10 Moulton St.  6/512
rgenter@labs-b.bbn.COM  (Internet new)	Cambridge, MA   02238
rgenter@bbn-labs-b.ARPA (Internet old)	linus!rgenter%BBN-LABS-B.ARPA (UUCP)

chris@umcp-cs.UUCP (Chris Torek) (06/30/86)

In article <1725@brl-smoke.ARPA> rgenter@BBN-LABS-B.arpa (Rick Genter)
points out that unions often contain `useless' structures and
ad hoc `constructed' names, followed by a series of `#define's:

>	#define	dr_word1	_dr_u1._dr_word1
>	#define	dr_bit1		_dr_u1._dr_w1._dr_bit1
>	#define	dr_value	_dr_u1._dr_w1._dr_value
>
>usually bracketed by a comment saying something about "making it easier to
>access the bit fields."

This is true, and I, at least, have found this particular kludge
annoying, yet useful.  Rick suggests an extension to avoid the
constructed names, removing much of the kludgery.

His suggested method for doing this seems to me, however, rather
confusing; he suggests simply omitting the labels:

>	struct	device	{
>		union	{
>			unsigned short	dr_word1;
>			struct	{
>				unsigned dr_bit1  : 1;
>				unsigned dr_bit2  : 1;
>				unsigned dr_value : 3;
>				unsigned	   : 2;
>				unsigned dr_ctrl  : 1;
>				unsigned dr_val2  : 8;
>			};	/* <= note there is no label here */
>		};		/* <= nor here */
>
>		unsigned short	dr_data;
>
>		union	{
>			< another control word >
>		};		/* <= nor here */
>	};

The problem here is that constructs such as

	struct { int x; char *y; };

and

	union { short a; char b[2]; };

are already legal, if useless, and this particular extension would
have to be implemented with a rule such as `if a struct or union
has no tag name and declares no data objects, the fields it declares
migrate (along with their offset values) into the containing struct
or union'.  I am not certain why, but this `feels' confusing to me.

If this were to be implemented, I would like to see some sort of
keyword indicating that the `dummy' structures or unions are there
to declare fields in the next outer level:

	struct drdevice {
		this_is_a_fake union {
			u_short	dr_word1;
			this_is_a_fake struct {
				u_int	dr_bits1:1,
					dr_bits2:15;
			};
			/* dr_bits1 and dr_bits2 are now available in
			   the union */
		};
		/* dr_word1, dr_bits1, and dr_bits2 are now all
		   available in struct drdevice */
		u_short	dr_word2;
		...
	};

or similar.  *This*, unfortunately, requires yet another keyword
(`this_is_a_fake' is not a serious suggestion).  (I suppose one
could appropriate `entry', but that name is terrible.  `void'
perhaps, but that one is already way overused in the draft standard.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

mcdaniel@uicsrd.CSRD.UIUC.EDU (07/07/86)

C++ permits "anonymous unions" like
	struct {
		union {
			int x;
			char * y;
		};
		int z;
	} zap;
so
	zap.z = zap.x;
is legal.  I don't find it confusing.  [I think that anonymous structs
should likewise be permitted in C++ -- which is outside the scope of C.]

In such a case, I would say
	struct {		/* or union if appropriate */
		int x;
		char * y;
	};
	int z;
outside another struct/union would likewise define 3 external
variables named x, y, and z.

Anonymous structs would thus serve only for grouping (like parentheses)
and would not affect scope; all identifiers declared therein are
"exported" to the next scope out, whatever it may be.

Anonymous unions would be for grouping and for storage overlay.

As for C:  probably too late to add anonymity.  Anonymous is a lousy
keyword:  too hard to speel.

henry@utzoo.UUCP (Henry Spencer) (07/10/86)

> C++ permits "anonymous unions" ...

Actually, the really old C compilers permitted this too, by accident,
since they had no notion that a struct/union member name "belonged" to
a particular struct/union.  To them, a member name was just an offset
and a type.  Since all offsets in a union are 0, it all worked out.
A certain amount of old code, notably the Unix kernel, relied on this.
This trick broke when member names became local.
-- 
Usenet(n): AT&T scheme to earn
revenue from otherwise-unused	Henry Spencer @ U of Toronto Zoology
late-night phone capacity.	{allegra,ihnp4,decvax,pyramid}!utzoo!henry