[comp.lang.c] Determining alignment of

newman@mit-trillian.MIT.EDU (Ron Newman) (12/05/86)

I am using a (char *) pointer to store a sequence of differently-typed
and differently-sized objects.  I want to make sure that each object
will be properly aligned before storing it.

In particular, I need to determine whether the pointer is 32-bit
aligned before attempting to store a long by casting it to a (long *).
Which is a better, more portable way of determining whether a pointer
is aligned?

   char *p;

   /* method 1 */

   if ((long)p & 3) ...


   /* method 2 */

   if ((long)(p - (char *) 0) & 3) ...


or if neither of these is best, what is better?

/Ron Newman

rbutterworth@watmath.UUCP (12/05/86)

In article <1510@mit-trillian.MIT.EDU>, newman@mit-trillian.MIT.EDU (Ron Newman) writes:
> 
>    char *p;
>    /* method 1 */
>    if ((long)p & 3) ...
>    /* method 2 */
>    if ((long)(p - (char *) 0) & 3) ...

As far as I know, all C compilers will return the integral number
of bytes between the two pointers in method 2.  It should always
work (and there is no need for the (long) cast).  (There might be
compilers out there in which (sizeof(char))!=1, I don't know.)

Method 1 is definitely not reliable.  First, there is no guarantee
that a (long) can hold a (char*); some machines have pointers that
are longer than longs (e.g. one word for the word address, a second
word for the bit or byte offset).  Second, even if it does fit, there
is no guarantee that the pointer looks anything like an integer.  I
use a machine in which the upper 18 bits are the word address of the
object and the lower 18 bits contain the byte offset.  This byte
offset is NOT right justified, so the lower few bits will always
be zero and your test will always fail.
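
To make the failure concrete, here is a minimal sketch that models such a
pointer as plain bits.  The exact layout (offset field in bits 17..16, four
bytes per word) and all function names are assumptions made for
illustration, not a description of any particular machine.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 36-bit char pointer, held in a uint64_t for illustration:
 *   bits 35..18  word address
 *   bits 17..16  byte offset within the word (left-justified in the low half)
 *   bits 15..0   always zero
 * The position of the offset field is an assumption.
 */
static uint64_t make_char_ptr(uint32_t word_addr, unsigned byte_off)
{
    assert(word_addr < (UINT32_C(1) << 18)); /* 18-bit word address */
    assert(byte_off < 4);                    /* four bytes per word assumed */
    return ((uint64_t)word_addr << 18) | ((uint64_t)byte_off << 16);
}

/* Method 1 applied to the raw bits: nonzero means "misaligned". */
static int method1_says_misaligned(uint64_t ptr_bits)
{
    return (ptr_bits & 3) != 0;
}

/* What the check would have to look at on this representation. */
static int really_misaligned(uint64_t ptr_bits)
{
    return ((ptr_bits >> 16) & 3) != 0;
}
```

With this layout, method1_says_misaligned() returns 0 for every pointer,
including one built with byte offset 1, while really_misaligned() tells the
two cases apart.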

In general, casting pointers into (char*), casting (char*) back to
the SAME TYPE of pointer, taking the difference between two pointers
of the SAME TYPE, and adding an integral amount to a pointer are
all safe operations.  If you find you are making other kinds of
casts or arithmetic operations, there is a good chance that what
you are doing won't be portable.

henry@utzoo.UUCP (Henry Spencer) (12/05/86)

> In particular, I need to determine whether the pointer is 32-bit
> aligned before attempting to store a long by casting it to a (long *).

Speaking generally, "there ain't no graceful way".

>    if ((long)p & 3) ...

This one will fail badly on any machine which uses a peculiar representation
for "char *", since it assumes that the low-order bits of the pointer are
the low-order bits of the address.  This is likely to be the case on most
fundamentally byte-addressed machines, but it will fall down on things
like the PDP-10, where the simplest form of pointer is a word address and
character addressing sneaks in using higher bits.  The excrement really
hits the fan if character pointers are too large to fit in a long, which
is technically illegal but can happen.  In general, you can't safely assume
anything about what you get when you convert a pointer into a long; to be
portable, such a value must be treated as a magic cookie without meaningful
internal structure.

>    if ((long)(p - (char *) 0) & 3) ...

If I had to pick one, this would be it.  It's a violation of the rules,
unless X3J11 has done something strange (I haven't brought myself up to date
yet), since pointer subtraction is meaningful only within the same array.
It might run into trouble on strangely-built machines (can you say "segments"
six times swiftly?) where memory is not treated as a single array of bytes,
and would be worrisome on machines that use unorthodox representations of
the null pointer.  But I think it's the best that can be done.  Neither of
the casts should be necessary, actually:  the subtraction is already giving
some kind of integer as a result, so & will work unaccompanied, and the 0
turns into a null pointer automatically because of the way it's used.

Whatever you do, make very sure that it is isolated in a single module,
and that the module's potentially machine-dependent nature is documented
properly.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry

desj@brahms (David desJardins) (12/06/86)

In article <1510@mit-trillian.MIT.EDU> newman@athena.mit.edu (Ron Newman) writes:
>   char *p;
>
>   if ((long)p & 3) ...
>
>   if ((long)(p - (char *) 0) & 3) ...

   Why not use

	if (p != (char *) (long *) p) ... ?

   This should give you the relevant information (can a long be stored at
the location pointed to by p?) without any machine dependencies.  I suppose
on some (broken) compilers it might not work...

   -- David desJardins

henry@utzoo.UUCP (Henry Spencer) (12/07/86)

> 	if (p != (char *) (long *) p) ... ?
>    This should give you the relevant information (can a long be stored at
> the location pointed to by p?) without any machine dependencies.  I suppose
> on some (broken) compilers it might not work...

Also on some non-broken compilers, alas.  Converting "char *" to "long *"
with a cast does not guarantee that the result is a VALID "long *".  That
is your problem, not the compiler's.  On some machines this will work,
because the conversion involves a change in representation, explicitly
dropping the higher-precision part of the char pointer.  But on orthodox
machines like VAXen and 68Ks, the compiler makes no attempt to clear those
nasty low-order bits, so the comparison tells you nothing.
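
The difference between the two kinds of machine can be sketched by modelling
the cast as a function on addresses.  This is a simulation only: addr_t and
both cast functions are hypothetical, and the word-addressed variant assumes
four bytes per long.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t addr_t; /* hypothetical flat byte address */

/* "Orthodox" byte-addressed machine: the cast leaves the bits alone. */
static addr_t cast_to_long_ptr_flat(addr_t char_ptr)
{
    return char_ptr;
}

/* Hypothetical word-addressed machine: the cast drops the byte position. */
static addr_t cast_to_long_ptr_word(addr_t char_ptr)
{
    return char_ptr & ~(addr_t)3;
}

/* The proposed test: does p survive char* -> long* -> char* unchanged? */
static int roundtrip_says_aligned(addr_t p, addr_t (*cast)(addr_t))
{
    assert(cast != 0);
    return cast(p) == p;
}
```

For the odd address 0x1001 the round trip reports "aligned" under the flat
model and "not aligned" under the word model, so the test is only
informative when the conversion actually changes the representation.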
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry

eppstein@garfield.columbia.edu (David Eppstein) (12/07/86)

In <7381@utzoo.UUCP>, henry@utzoo.UUCP (Henry Spencer) writes:
> >    if ((long)(p - (char *) 0) & 3) ...
> If I had to pick one, this would be it.

I'm responding to this partly because in his criticism of a different
and worse construction Henry mentioned that it wouldn't work on a PDP-10.

It's been too long since I worked on the relevant compiler, but I
doubt this works on the PDP-10 either.  Normal char pointers like p
have word addresses in their low bits and bitfield stuff in their high
bits as Henry said earlier, but (char *)0 is exactly the zero word
(for efficiency in comparisons and for losers who pass it as an
argument without casting it).  I don't remember what subtracting one
from the other does but it's probably not what you expect.

Also, the &3 part bothers me.  This is ok for the usual char pointers
you see on the PDP-10 because they use 9-bit bytes which pack four to
a word.  But 7-bit bytes packed 5 to a word are very common, and I
seem to recall some other architectures having 3 bytes per word (which
can also be done on the PDP-10 but I've never seen it actually used).

The best way to check pointer alignment on the PDP-10 is
	(int *) p == (int *) (p - 1) /* unaligned if p-1 is in same word */
which works for all types of byte pointers, but I would go with
 	p != (char *) (double *) p
since it works for the 9-bit pointers used most often by C and could
be made to work on most other architectures.
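
A rough model of why the p - 1 test does not care about the byte size: treat
a byte pointer as a word address plus a byte index, and let the (int *) cast
keep only the word.  struct byte_ptr and the helper names are hypothetical;
this is a sketch of the idea, not of the actual PDP-10 pointer format.

```c
#include <assert.h>

/* Hypothetical byte pointer: a word address plus a byte index in that word. */
struct byte_ptr {
    long word;
    int index;          /* 0 .. bytes_per_word - 1 */
    int bytes_per_word; /* e.g. 4 for 9-bit bytes, 5 for 7-bit bytes */
};

/* Models p - 1: step back one byte, crossing into the previous word if needed. */
static struct byte_ptr byte_ptr_prev(struct byte_ptr p)
{
    assert(p.index >= 0 && p.index < p.bytes_per_word);
    if (p.index > 0) {
        p.index--;
    } else {
        p.word--;
        p.index = p.bytes_per_word - 1;
    }
    return p;
}

/* Models the (int *) cast: keep the word address, drop the byte position. */
static long to_word_ptr(struct byte_ptr p)
{
    return p.word;
}

/* Models (int *) p == (int *) (p - 1): true when p is NOT word-aligned. */
static int is_unaligned(struct byte_ptr p)
{
    return to_word_ptr(p) == to_word_ptr(byte_ptr_prev(p));
}
```

Only index 0 makes p - 1 land in a different word, so the result is the same
whether a word holds four bytes or five.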

Or better rewrite the code to not need this information in the first place.
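
One way this might be done, assuming memcpy() from <string.h> is available:
copy the object's bytes in and out of the buffer instead of storing through a
cast pointer, so alignment never has to be tested.  put_long() and get_long()
are hypothetical names, and the bytes are written in the host's own size and
byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a long into the buffer at p, whatever p's alignment.
 * Returns the position just past the stored bytes. */
static char *put_long(char *p, long value)
{
    memcpy(p, &value, sizeof value);
    return p + sizeof value;
}

/* Copy a long back out of the buffer at p into *out.
 * Returns the position just past the bytes read. */
static const char *get_long(const char *p, long *out)
{
    assert(out != NULL);
    memcpy(out, p, sizeof *out);
    return p + sizeof *out;
}
```

Storing at buf + 1 and reading back from the same place returns the original
value, with no cast to (long *) anywhere.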
-- 
David Eppstein, eppstein@cs.columbia.edu, seismo!columbia!cs!eppstein

stuart@bms-at.UUCP (Stuart D. Gathman) (12/08/86)

In article <1510@mit-trillian.MIT.EDU>, newman@mit-trillian.MIT.EDU (Ron Newman) writes:

> I am using a (char *) pointer to store a sequence of differently-typed

> In particular, I need to determine whether the pointer is 32-bit
> aligned before attempting to store a long by casting it to a (long *).

Cast it to a (long *) and back again then see if it changed.

char *p;
if ( (char *) (long *) p == p) . . .
-- 
Stuart D. Gathman	<..!seismo!dgis!bms-at!stuart>

throopw@dg_rtp.UUCP (Wayne Throop) (12/08/86)

> desj@brahms (David desJardins)
>   Why not use
>       if (p != (char *) (long *) p) ... ?
>    This should give you the relevant information (can a long be stored at
> the location pointed to by p?) without any machine dependencies.  I suppose
> on some (broken) compilers it might not work...

Well... no.  C casts don't guarantee to alter the alignment of pointers
to reflect architectural restrictions.  They just DON'T guarantee NOT to
do so.  Basically, there isn't any portable way of doing this.

--
A LISP programmer knows the value of everything, but the cost of nothing.
                                --- Alan J. Perlis
-- 
Wayne Throop      <the-known-world>!mcnc!rti-sel!dg_rtp!throopw

henry@utzoo.UUCP (Henry Spencer) (12/08/86)

> ... Neither of the casts should be necessary, actually...

Argh, I must have been half-asleep when I typed that.  Of course, the
second cast remains necessary, to ensure that it's pointer subtraction
rather than decrementing a pointer by zero.  Sorry about that.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry

ron@brl-sem.ARPA (Ron Natalie <ron>) (12/09/86)

In article <3798@watmath.UUCP>, rbutterworth@watmath.UUCP (Ray Butterworth) writes:
> In article <1510@mit-trillian.MIT.EDU>, newman@mit-trillian.MIT.EDU (Ron Newman) writes:
> > 
> >    char *p;
> >    /* method 1 */
> >    if ((long)p & 3) ...
> >    /* method 2 */
> >    if ((long)(p - (char *) 0) & 3) ...
> 
> As far as I know, all C compilers will return the integral number
> of bytes between the two pointers in method 2.  It should always
> work (and there is no need for the (long) cast).  (There might be
> compilers out there in which (sizeof(char))!=1, I don't know.)

First, sizeof (char) has to be 1.  Second, method 2 assumes that
subtracting a null character pointer from a pointer results in some
meaningful value, which is not guaranteed.  All you are guaranteed
to be able to do with ZERO is assign it and test for it.  Third,
this assumes that longs will always need to have the lower two
bits of their address cleared.  If you are going to make this silly
assumption, you might as well forget about the other attempts at
portability as you have already blown it.

> In general, casting pointers into (char*), casting (char*) back to
> the SAME TYPE of pointer,
 
Maybe yes, maybe no.

> taking the difference between two pointers of the SAME TYPE,

By definition, this is so.

> and adding an integral amount to a pointer are all safe operations.

Provided that you stay within the declared data type, i.e.,
	char	foo[100];
	char	*goo;

	goo = foo;
	goo += 100;

is not guaranteed to work.

-Ron

rushfort@esunix.UUCP (12/10/86)

In article <487@cartan.Berkeley.EDU> desj@brahms (David desJardins) writes

>   Why not use
>
>	if (p != (char *) (long *) p) ... ?
>
>   This should give you the relevant information (can a long be stored at
>the location pointed to by p?) without any machine dependencies.  I suppose
>on some (broken) compilers it might not work...

Not quite.  This test merely checks whether casting a character pointer into a
long pointer and then back into a character pointer preserves all of the bits.
A compiler is not forced to check alignment on casts.  According to K&R
Appendix A, Section 14.4 (Explicit pointer conversions):

	Certain conversions involving pointers are permitted but have
	implementation-dependent aspects.
[...]
	A pointer to one type may be converted to a pointer to another
	type.  The resulting pointer may cause addressing exceptions
	upon use if the subject pointer does not refer to an object
	suitably aligned in storage.

Note that K&R specifically allows an implementation to leave a pointer
alone when casting between pointer types.  I tried this out on a VAX
running Ultrix 1.2 and a SUN 2 running some form of 4.2 BSD and both
chose to leave a mis-aligned pointer alone.  The bottom line is that
there is no PORTABLE way to determine whether a character pointer is
suitably aligned such that it can point to an object of a different
type.
-- 
                Kevin C. Rushforth
                Evans & Sutherland Computer Corporation

UUCP Address:   {ihnp4,decvax}!decwrl!esunix!rushfort
Alternate:      {ihnp4,seismo}!utah-cs!utah-gr!uplherc!esunix!rushfort

karl@haddock.UUCP (Karl Heuer) (12/10/86)

In article <1510@mit-trillian.MIT.EDU> newman@athena.mit.edu (Ron Newman) writes:
>Which is a better, more portable way of determining whether a pointer
>[char *p] is aligned?

Several answers have been proposed, each of which fails on some machine.
I had this same situation a while back, and I solved it with the "most
portable" method:  "if (isaligned(p)) ...".  I then wrote the isaligned()
macro, and put it in a header file with a big comment.  Now the code works
on any machine, as long as I keep that header file up-to-date.
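
The macro itself was not posted.  What follows is only a sketch of what such
a header could look like on one class of machine, assuming a C11 compiler
(_Alignof, static_assert), a <stdint.h> that provides uintptr_t, and a flat
byte-addressed machine where converting a pointer to uintptr_t yields its
byte address.  ISALIGNED is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * isaligned.h (hypothetical sketch)
 *
 * MACHINE-DEPENDENT.  Valid only where a pointer converted to uintptr_t
 * is its byte address (flat, byte-addressed memory).  Review this file
 * for every new port; word-addressed or segmented machines need their
 * own definition.
 */
static_assert(sizeof(uintptr_t) >= sizeof(void *),
              "uintptr_t cannot hold a pointer on this machine");

/* Nonzero if p is suitably aligned to point at an object of the given type. */
#define ISALIGNED(p, type) \
    (((uintptr_t)(const void *)(p) % _Alignof(type)) == 0)

/* The case asked about in this thread: may a long be stored at p? */
#define isaligned(p) ISALIGNED((p), long)
```

Callers write "if (isaligned(p)) ..." and never see the machine-dependent
part, which stays in this one file.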

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

mwm@eris.UUCP (12/10/86)

In article <299@bms-at.UUCP> stuart@bms-at.UUCP (Stuart D. Gathman) writes:
>In article <1510@mit-trillian.MIT.EDU>, newman@mit-trillian.MIT.EDU (Ron Newman) writes:
>> In particular, I need to determine whether the pointer is 32-bit
>> aligned before attempting to store a long by casting it to a (long *).
>
>Cast it to a (long *) and back again then see if it changed.
>
>char *p;
>if ( (char *) (long *) p == p) . . .

As has been pointed out several times, the C compiler isn't guaranteed
to convert p to a valid long* for you. As hasn't yet been pointed out,
many (most?) modern machines are perfectly happy to fetch a long from
an odd boundary; you just pay a performance penalty for it.

So even if your compiler is "correct" (n.b. - the quotes don't mean
that I think this is really correct for C), it could still fail. Worse
yet, if some overzealous (and given current memory costs, I'd be tempted
to say not very bright) language maintainer decides that memory is
more precious than CPU cycles, your code could quit working between
compiles.

This brought up an interesting question:

Given a char *p that is not aligned, what should the cast (long *) p
return if it wants to return an aligned pointer? The long containing
*p? The next long after *p? Any standards say anything?

	Thanx,
	<mike

rjk@mrstve.UUCP (12/10/86)

In article <7381@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>> In particular, I need to determine whether the pointer is 32-bit
>> aligned before attempting to store a long by casting it to a (long *).
>
>Speaking generally, "there ain't no graceful way".
> [and so forth]

Should there be?  I may very well be misunderstanding what's going on here,
but it appears to me that you're looking for a *machine-independent* way of
performing a *machine-specific* operation.  None of the methods I've seen
(I may have missed a few) would work on a machine like the CDC 6600
(famous in song and story) with its 60 bit words.  Am I right? or have I
missed something?  Our news feed is occasionally flakey...
-- 
Rich Kuhns		{ihnp4, decvax, etc...}!pur-ee!pur-phy!mrstve!rjk

henry@utzoo.UUCP (Henry Spencer) (12/16/86)

> ... I may very well be misunderstanding what's going on here,
> but it appears to me that you're looking for a *machine-independent* way of
> performing a *machine-specific* operation.  None of the methods I've seen
> (I may have missed a few) would work on a machine like the CDC 6600...

The *methods* wouldn't work, but the general notion of "is a character
pointer sufficiently well-aligned to be a valid pointer to something else"
probably remains valid on the 6600 (which I admit I'm not very familiar
with).  It would not be unreasonable for a language like C to provide a
primitive for determining this, although the implementation of the primitive
would necessarily be machine-specific.  [Note that this is *not* a call for
adding such a primitive to C in particular.  Right now we need less
innovation in C's definition, not more.]
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry