[comp.lang.c] offsetof: definition & use

scs@adam.mit.edu (Steve Summit) (06/07/90)

In article <8242@crdgw1.crd.ge.com> volpe@underdog.crd.ge.com (Christopher R Volpe) writes:
>In the "Answers to Frequently Asked Questions" document,
>the structure offset macro is defined as follows:
>	#define offset(type,mem) ((size_t) (char *)&(((type *)0)->mem))
>If you want to convert it to be of type "size_t", why must it
>first be converted to "char *" ??

The offsetof macro (the "frequently asked" posting incorrectly
calls it "offset") is defined as returning a byte offset.  The
above implementation (which is the usual one, if you can get
away with it) involves casting a pointer to an integer, an
intrinsically unportable notion.  The pointer type with the
greatest likelihood of being successfully and "correctly" cast to
an integer byte offset is a pointer to a byte, or char *.
("Correctly" is in quotes because there is no generally "correct"
meaning for casting a pointer to an integer, although K&R
suggests that "the mapping... is intended to be unsurprising to
those who know the addressing structure of the machine." --First
Edition, Sec. A14.4, p. 210.)

As an example of why this distinction between pointer types could
matter, consider a machine with a 16-bit word size and a word
addressing scheme.  That is, the first 16-bit word has address 0,
the second 16-bit word has address 1, etc.  A character pointer on
such a machine must consist of a word pointer plus an additional
bit, to indicate which of the two bytes in the word is being
addressed.  On such a machine, casting an integer pointer (or a
pointer to any larger, word-addressed object) to a char *
involves a change of representation.  (Typically, this change
will amount to multiplying the word address by two.)

Here is how the first four words (eight bytes) of this machine's
memory might be addressed:

	word                byte
	address             addresses
	         ____ ____
	      0 |____|____| 0, 1
	      1 |____|____| 2, 3
	      2 |____|____| 4, 5
	      3 |____|____| 6, 7

If we had the structure

	struct x {
		char a, b;
		short c;
		int d;
	};

on this hypothetical machine, the expression

	&(((struct x *)0)->d)

would probably evaluate to address 2.  We can't predict exactly
how that machine would implement casting an int * to a size_t,
but it is conceivable that the result could be the integer 2.
However, casting (int *)2 to a char * should yield address 4, and
it seems likely that casting this (char *)4 to a size_t would in
fact yield 4, the desired answer.
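
The arithmetic in this example can be modeled in ordinary C.  The
sketch below is a toy simulation, not real pointer code: the names
word_ptr, byte_ptr, and the helper functions are all hypothetical,
and it assumes (as the text above does) that a char pointer is the
word address times two plus a byte-select bit, and that a
pointer-to-integer cast simply copies the bits.

```c
#include <assert.h>

/* Toy model of the hypothetical 16-bit word-addressed machine.
   These types and functions are illustrative only. */
typedef unsigned long word_ptr;	/* a word address: 0, 1, 2, ... */
typedef unsigned long byte_ptr;	/* word address * 2 + byte within word */

/* Build a char pointer from a word address and a byte selector (0 or 1). */
byte_ptr make_byte_ptr(word_ptr w, unsigned which)
{
	assert(which < 2);
	return w * 2 + which;
}

/* Model of (char *)p for a word pointer p: the representation changes. */
byte_ptr word_to_byte_ptr(word_ptr w)
{
	return make_byte_ptr(w, 0);
}

/* Model of (size_t)p applied directly to a word pointer
   (assumption: the cast copies the bits unchanged). */
unsigned long offset_without_char_cast(word_ptr member)
{
	return member;
}

/* Model of (size_t)(char *)p: convert to a byte pointer first. */
unsigned long offset_with_char_cast(word_ptr member)
{
	return word_to_byte_ptr(member);
}
```

With the member d at word address 2, offset_without_char_cast(2)
gives 2, while offset_with_char_cast(2) gives 4, the byte offset
that offsetof is supposed to return.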

In general, you shouldn't have to think about this level of
implementation unless you're a compiler writer.  Using the
standard tools (such as a vendor-supplied, correctly-implemented
offsetof() macro) should insulate you from arcane machine-
specific details.  As K&R says, "if you don't know how they are
done on various machines, that innocence may serve to protect
you."  (First Edition, Sec. 2.12, p. 50.)
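
For instance, under an ANSI C compiler the macro comes from
<stddef.h>, and a program can simply use it.  The following is a
minimal sketch; the function names are made up for illustration,
and the only layout facts it relies on are that the first member
sits at offset 0 and that later members have larger offsets.

```c
#include <assert.h>
#include <stddef.h>	/* size_t and the standard offsetof macro */

struct x {
	char a, b;
	short c;
	int d;
};

/* Byte offsets of two members, obtained the portable way. */
size_t offset_of_a(void) { return offsetof(struct x, a); }
size_t offset_of_d(void) { return offsetof(struct x, d); }

/* Sanity checks that hold on any conforming implementation:
   the first member is at offset 0, and members appear in
   declaration order at increasing offsets. */
void check_layout(void)
{
	assert(offsetof(struct x, a) == 0);
	assert(offsetof(struct x, a) < offsetof(struct x, b));
	assert(offsetof(struct x, b) < offsetof(struct x, c));
	assert(offsetof(struct x, c) < offsetof(struct x, d));
}
```

The exact value of offset_of_d() depends on the sizes and alignment
of the machine's types, which is precisely the detail the macro
hides.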

In article <20026@duke.cs.duke.edu> drh@cs.duke.edu writes:
>Having obtained the byte offset of some field in a structure, is there
>a standard way of using that offset?

The correct use essentially undoes the double cast in the
offsetof macro, casting the structure pointer to a char * so the
integer byte offset can meaningfully be added, then casting again
to the desired pointer type.  If structp is a pointer to a struct x,
and off is the offset of the integer field d (presumably
computed with offsetof(struct x, d)), then d's value can be set
indirectly with

	*(int *)((char *)structp + off) = value;

See the frequently-asked posting (question 35) for a bit more
information.
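
As a sketch of the idiom, the indirect assignment above can be
wrapped in a pair of small helper functions.  The names set_int_at
and get_int_at are hypothetical; the code assumes the offset really
was computed with offsetof for an int member of the structure being
pointed to.

```c
#include <assert.h>
#include <stddef.h>	/* size_t, NULL, offsetof */

struct x {
	char a, b;
	short c;
	int d;
};

/* Store value into the int member located off bytes into *structp. */
void set_int_at(void *structp, size_t off, int value)
{
	assert(structp != NULL);
	/* char * first, so the byte offset can be added meaningfully */
	*(int *)((char *)structp + off) = value;
}

/* Fetch the int member located off bytes into *structp. */
int get_int_at(const void *structp, size_t off)
{
	assert(structp != NULL);
	return *(const int *)((const char *)structp + off);
}
```

Given a struct x named s, the call
set_int_at(&s, offsetof(struct x, d), 42) has the same effect as
s.d = 42.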

                                            Steve Summit
                                            scs@adam.mit.edu