[comp.lang.c] Portability vs. Endianness

esink@turia.dit.upm.es (03/12/91)

OK.  I read regularly, I check the FAQ List and other sources before
posting ANYTHING.  I even mail occasional simple questions directly
to someone with authority so I don't flood the net with a possible
FAQ.  However, the 'policing' on this group at times approaches
a neurotic level, so I proceed with caution.  Forgive me if this
has been discussed to death (FMITHBDTD)

Given the following :

long var;
unsigned char Bytes[4];


Is there a portable way to move the value held in var
into the memory space pointed to by Bytes, with the restriction
that the representation be in Most Significant Byte first
format ?  I am well aware that sizeof(long) may not be 4.  I
want the value contained in var converted to a 68000
long word, and I want the code fragment to run on any
machine.  The solution must be ANSI C.

Feel free to suggest your own declarations
to replace or augment mine.

Thanks in advance for the input,

Eric




Eric W. Sink                     | Putting the phrase      |All opinions
Departamento de Telematica       | "Frequently Asked"      |are mine and
Universidad Politecnica de Madrid| in your kill file is    |not necessarily
esink@turia.dit.upm.es           | not recommended.        |yours.

jfw@ksr.com (John F. Woods) (03/13/91)

esink@turia.dit.upm.es writes:
>Forgive me if this has been discussed to death (FMITHBDTD)

It probably has been, but perhaps what can be said about the answer will
be interesting:

>Given the following :
>long var;
>unsigned char Bytes[4];


>Is there a portable way to move the value held in var
>into the memory space pointed to by Bytes, with the restriction
>that the representation be in Most Significant Byte first
>format ?  I am well aware that sizeof(long) may not be 4.  I
>want the value contained in var converted to a 68000
>long word, and I want the code fragment to run on any
>machine.  The solution must be ANSI C.

	Bytes[0] = (var >> 24) & 0xFF;
	Bytes[1] = (var >> 16) & 0xFF;
	Bytes[2] = (var >> 8)  & 0xFF;
	Bytes[3] =  var        & 0xFF;

This code is guaranteed.

It might not be out of place to surround it with #ifdefs, so as to enable
more efficient code sequences to be used when possible:

#if defined(LONGS_ARE_4_BYTES) && defined(BIGENDIAN) && defined(LOOSEALIGN)
	/* eg 68020 */
	*(long *)Bytes = var;
#elif defined(LONGS_ARE_4_BYTES) && defined(BIGENDIAN)
	/* eg 68000; maybe Bytes can be an odd address */
	Bytes[0] = ((char *)&var)[0];
	Bytes[1] = ((char *)&var)[1];
	Bytes[2] = ((char *)&var)[2];
	Bytes[3] = ((char *)&var)[3];
#else
	Bytes[0] = (var >> 24) & 0xFF;
	Bytes[1] = (var >> 16) & 0xFF;
	Bytes[2] = (var >> 8)  & 0xFF;
	Bytes[3] =  var        & 0xFF;
#endif

Note that BIGENDIAN and LOOSEALIGN are *application defined* macros here
that would be part of your configuration process; ANSI C does not mandate
any particular kind of compiler-provided help for this.  Considering the
subtleties that might interfere with these tricks even if "BIGENDIAN" and
"LOOSEALIGN" appear to be true, perhaps better defines might be something like
#ifdef TRICK666
	/* 68020 stuff */
#elif defined(TRICK665)
	/* 68000 stuff */
#else
	/* guaranteed stuff */
#endif

where your porting manual for your software package explains just what
TRICK665 and TRICK666 are, what constraints are involved, and points out
that the code is guaranteed to run correctly even if they are not defined
when they could be ("Make it right, then make it fast.").  You do write
porting manuals for your software packages, don't you?

One might also want to include the possibility (in the 68000 code) that
memcpy() is a compiler builtin which can be particularly efficient, but
this does seem like it's getting to be an awful lot of typing for relatively
little benefit.

Note that depending on WHY your application is copying longs into char[]
buffers, you might want the solution to look completely different, possibly
hidden behind a macro like "putlong()" (or "putint32()") which could then
expand into the above mess.
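
A minimal sketch of what such a macro might look like. The name PUTLONG
and its internal variable names are hypothetical; the sketch assumes the
caller has already made sure the value fits in 32 bits, and it converts
to unsigned long first so that the shifts are well defined:

```c
/* Hypothetical macro form of the "putlong()" idea: store the low 32 bits
 * of v into buf, most significant byte first.  Each argument is evaluated
 * exactly once, and the do/while(0) wrapper makes the macro behave like a
 * single statement. */
#define PUTLONG(buf, v)                                                \
    do {                                                               \
        unsigned long putlong_v_ = (unsigned long)(v);                 \
        unsigned char *putlong_p_ = (unsigned char *)(buf);            \
        putlong_p_[0] = (unsigned char)((putlong_v_ >> 24) & 0xFF);    \
        putlong_p_[1] = (unsigned char)((putlong_v_ >> 16) & 0xFF);    \
        putlong_p_[2] = (unsigned char)((putlong_v_ >> 8) & 0xFF);     \
        putlong_p_[3] = (unsigned char)(putlong_v_ & 0xFF);            \
    } while (0)
```

A faster machine-specific body could later be swapped in behind the same
name without touching any caller.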

gwyn@smoke.brl.mil (Doug Gwyn) (03/13/91)

In article <2628@ksr.com> jfw@ksr.com (John F. Woods) writes:
>This code is guaranteed.

"Famous last words."

In fact, it is not guaranteed on a host that uses ones-complement
or sign-magnitude integral representation.

bhoughto@hopi.intel.com (Blair P. Houghton) (03/13/91)

In article <1991Mar12.105451.19488@dit.upm.es> esink@turia.dit.upm.es () writes:
>long var;
>unsigned char Bytes[4];
>
>Is there a portable way to move the value held in var
>into the memory space pointed to by Bytes, with the restriction
>that the representation be in Most Significant Byte first
>format ?  I am well aware that sizeof(long) may not be 4.  I

#if (((sizeof long)%(sizeof char)) == 0)	/* even-numbered modulus */
/*
 *  Don't forget; chars _can_ be the same number of bits
 *  as longs, and longs _can_ be an odd number of bytes. 
 *
 *  Give'em an inch, and...
 */

    #define LONGWORD sizeof(long)
    #define HALFWORD (LONGWORD/2)

/* Memcpy(3) is the neat way. */

    #include <string.h>

    long var;
    unsigned char Bytes[LONGWORD];
    char *p;

    p = (char *)&var;
    memcpy( (void *)(Bytes+HALFWORD), (void *)p, HALFWORD); /* front to back */
    memcpy( (void *)Bytes, (void *)(p+HALFWORD), HALFWORD); /* back to front */

/* to avoid using p: */

    long var;
    unsigned char Bytes[LONGWORD];

    memcpy((void *)(Bytes+HALFWORD),(void *)&var,HALFWORD); /* front to back */
    memcpy((void *)Bytes,(void *)((char *)&var+HALFWORD),HALFWORD); /*b-f*/

/* but when you don't have the ANSI library or a bcopy(3), */

    long var;
    unsigned char Bytes[LONGWORD];
    char *p;
    int n;

    /* front to back, then back to front */
    for ( p = (char *)&var; p < LONGWORD + (char *)&var; p += HALFWORD )
	for ( n = 0; n < (int)HALFWORD; n++ )
	    /* destination offset: Halfword, then 0 */
	    Bytes[((char *)&var + HALFWORD - p) + n] = p[n];

/*
 *  Of course, since this applies architecturally, you can
 *  wrap it in machine names and simplify until you're silly,
 *  using those porridges of code above only for the default
 *  (unknown wordsize) case.
 */

#ifdef BLEET_TO_BLIT
    /* both: 16-bit chars, 32-bit longs; bleet: little; blit: big */
    long var;
    unsigned char Bytes[2];

    Bytes[0] = *((char *)&var + 1);
    Bytes[1] = *(char *)&var;
#endif

#endif

The basic answer to your underlying question is, yes, you
can access the individual bytes of any object in ANSI C;
i.e., all objects must have n*CHAR_BIT bits, where n is
some positive integer.  See ANSI X3.159-1989 (i.e., the
standard), sections 1.6 (for "Object"), 2.2.4.2.1 (for
"CHAR_BIT"), and the Rationale, section 1.6 (for more on
the multiples-of-bytes theme).

				--Blair
				  "AND, it has the added BONUS of
				   being able to convert the big-
				   endian BACK into little-endian
				   when you're DONE with it! :-)"

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/13/91)

In article <1991Mar12.105451.19488@dit.upm.es> esink@turia.dit.upm.es () writes:
> long var;
> unsigned char Bytes[4];
> Is there a portable way to move the value held in var
> into the memory space pointed to by Bytes, with the restriction
> that the representation be in Most Significant Byte first
> format ?

Portable questions may have unportable answers, but unportable questions
can never have portable answers. In other words, ``Most Significant Byte
first format (of longs in a 4-byte field)'' is not a portable concept,
so there is no way that portable code can implement the format.

Now what you probably mean is that Bytes[0] should contain the low 8
bits of var, Bytes[1] should contain the next 8 bits, etc. Literally:

  Bytes[0] = ((unsigned long) var) & 255;
  Bytes[1] = (((unsigned long) var) >> 8) & 255;
  Bytes[2] = (((unsigned long) var) >> 16) & 255;
  Bytes[3] = (((unsigned long) var) >> 24) & 255;

(You should use unsigned to keep the sign bit out of the way.) If
(unsigned long) var is between 0 and 2^32 - 1, then you can recover var
as

  var = (long) (((((((unsigned long) Bytes[3]) * 256
	  + Bytes[2]) * 256) + Bytes[1]) * 256) + Bytes[0]);
	
I think this is machine-independent. But there are no guarantees that
long has 32 bits, or that char has 8 bits.

In practice, if you can assume that longs are 32 bits and chars are 8
bits, you don't want to write code like the above to do simple byte
copies. Some people will suggest #ifdefs. If you want to avoid the
#ifdefs, you might find an alternate strategy useful; I quote from my
snuffle.c:

  WORD32 tr[16] =
   {
    50462976, 117835012, 185207048, 252579084, 319951120, 387323156,
    454695192, 522067228, 589439264, 656811300, 724183336, 791555372,
    858927408, 926299444, 993671480, 1061043516
   } ;
  ...
  register unsigned char *ctr = (unsigned char *) &tr[0];
  ...
  x[n] = h[n] + m[ctr[n & 31]];

Here WORD32 is a 32-bit type, and m is a char (actually unsigned char)
pointer, really pointing to an array of WORD32s. m[ctr[0]] will on most
32-bit architectures be the low 8 bits of the first WORD32; m[ctr[1]]
will be the next 8 bits; and so on. This works because every common byte
order---1234, 4321, 2143, 3412---is a self-inverse permutation of 1234.
(I suppose it would be clearer to write the tr[] values in hex, but the
magic numbers are more fun.)
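
A small sketch of how the table trick works, for one word only. It
assumes a C99 compiler, with uint32_t from <stdint.h> standing in for
WORD32; the function name is hypothetical and is not taken from
snuffle.c. The constant 50462976 is 0x03020100, so its bytes in memory
spell out the host's own byte order:

```c
#include <stdint.h>
#include <string.h>

/* tr0 is tr[0] from the table above: 50462976 == 0x03020100. */
static const uint32_t tr0 = 50462976UL;

/* Return byte number k of w, where k == 0 is the least significant
 * byte, by indexing w's in-memory bytes through tr0's in-memory bytes.
 * This relies on the host byte order being its own inverse, which holds
 * for the 1234, 4321, 2143 and 3412 orders. */
unsigned byte_by_significance(uint32_t w, unsigned k)
{
    unsigned char ctr[4];
    unsigned char m[4];

    memcpy(ctr, &tr0, sizeof ctr);  /* the host's byte-order permutation */
    memcpy(m, &w, sizeof m);        /* the host's layout of w */
    return m[ctr[k & 3]];
}
```

On a little-endian host ctr is {0,1,2,3}; on a big-endian host it is
{3,2,1,0}. Either way index 0 selects the low byte.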

---Dan

henry@zoo.toronto.edu (Henry Spencer) (03/14/91)

In article <3005@inews.intel.com> bhoughto@hopi.intel.com (Blair P. Houghton) writes:
>#if (((sizeof long)%(sizeof char)) == 0)	/* even-numbered modulus */

Bleep.  Error.  At least, it won't do what you think.  "sizeof" is not an
operator in #if; it is just another unknown identifier.

Incidentally, that expression is guaranteed to be true anyway, because
sizeof char ==== 1.  ("====" is the "emphatically equal to" operator. :-))
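
What the preprocessor can test is the set of macros in <limits.h>. A
minimal sketch, in which LONG_IS_32_BITS, CHAR_IS_8_BITS and the helper
function are hypothetical names, and "32 bits" is taken to mean that
ULONG_MAX is exactly 2^32 - 1:

```c
#include <limits.h>

/* Hypothetical configuration macros derived from <limits.h>, which,
 * unlike sizeof, is usable inside #if. */
#if ULONG_MAX == 0xFFFFFFFFUL
#define LONG_IS_32_BITS 1
#else
#define LONG_IS_32_BITS 0
#endif

#if CHAR_BIT == 8
#define CHAR_IS_8_BITS 1
#else
#define CHAR_IS_8_BITS 0
#endif

/* Run-time view of the same compile-time decision. */
int long_is_32_bits(void)
{
    return LONG_IS_32_BITS;
}
```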
-- 
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public."     -S. Harris |  henry@zoo.toronto.edu  utzoo!henry

wirzenius@cc.helsinki.fi (Lars Wirzenius) (03/14/91)

In article <1991Mar12.105451.19488@dit.upm.es>, esink@turia.dit.upm.es writes:
>Is there a portable way to move the value held in var
>into the memory space pointed to by Bytes, with the restriction
>that the representation be in Most Significant Byte first
>format ?  I am well aware that sizeof(long) may not be 4.  I
>want the value contained in var converted to a 68000
>long word, and I want the code fragment to run on any
>machine.  The solution must be ANSI C.

I think you're going to have difficulties on machines with non-32-bit
longs and non-8-bit chars. longs bigger than 32 bits won't fit into a
68000 longword (32 bits) without losing information.

On machines with 32-bit longs and 8-bit bytes, you could try the
following:
	
	#include <stdio.h>
	#include <stdlib.h>
	#define BITS_PER_CHAR 8
	
	int main() {
		unsigned long var;
		unsigned char Bytes[sizeof var];
		unsigned long mask;
		int i;
	
		var = 0x12345678;
		printf("var = %lx\n", var);
	
		mask = ~(~0UL << BITS_PER_CHAR);
			/* this creates a mask with only the lower
			 * BITS_PER_CHAR bits turned on
			 */

		for (i = 0; i < (int) sizeof var; ++i) {
			Bytes[ (sizeof var) - i - 1] = 
				(var & mask);
			var >>= BITS_PER_CHAR;
		}
	
		for (i = 0; i < (int) sizeof var; ++i) 
			printf("Byte %d: %02x\n", i, (unsigned) Bytes[i]);
		exit(0);
	}
	
This has been tested on a VAX (real good example, I know, but I can't
try it on anything else right now).

-- 
Lars Wirzenius    wirzenius@cc.helsinki.fi

jfw@ksr.com (John F. Woods) (03/14/91)

gwyn@smoke.brl.mil (Doug Gwyn) writes:
>In article <2628@ksr.com> jfw@ksr.com (John F. Woods) writes:
>>This code is guaranteed.
>"Famous last words."
>In fact, it is not guaranteed on a host that uses ones-complement
>or sign-magnitude integral representation.

oops.

jfw@ksr.com (John F. Woods) (03/14/91)

[One more time, this time with an open copy of ANSI C]

gwyn@smoke.brl.mil (Doug Gwyn) writes:
>In article <2628@ksr.com> jfw@ksr.com (John F. Woods) writes:
>>This code is guaranteed.
>"Famous last words."
>In fact, it is not guaranteed on a host that uses ones-complement
>or sign-magnitude integral representation.

Hmm.  Actually, even more offensive is that my code fails if the number
of bits in a byte isn't 8 (fails in some sense, anyway; it does (on a
2s-complement 9-bit-per-byte machine) pick up the low 32 bits of the long,
which might be what's desired, but it doesn't pick up the low 4xbytesize
bits of the long, which might well be the idea).  In fact, it isn't
"guaranteed" even on a garden-variety, right-thinking, 8-bits-per-byte
2s-complement machine because "if E1 has a signed type and a negative
value, the resulting value is implementation-defined."  One hopes the
compiler will not generate code to launch a nuclear airstrike, but if
it is so documented in the vendor's manual, it is OK by ANSI.

Further hmm.  Is memcpy() obliged to do anything sensible if handed
a pointer to an integer cast to a void *?  The ANSI C spec refers to
memcpy() as working on "arrays of characters", and I know there have been
discussions where (for example) integers are placed in a different address
space from characters and hence a routine cannot assume that it can make any
use of "(char *)&integervalue".

Hmm again.  Now that I am staring at the ANSI manual, there's lots of
fascinating things here.  First, a conforming implementation is not
required to be able to represent "-0x80000000" in a signed long, but 
must be able to represent all 0x100000000 32-bit values in an unsigned long.
So, I think that the variable definitions should be changed to:

unsigned long var;
unsigned char Bytes[4];

Then

Bytes[0] = (var >> 24) & 0xFF;
Bytes[1] = (var >> 16) & 0xFF;
Bytes[2] = (var >> 8)  & 0xFF;
Bytes[3] =  var        & 0xFF;

Ought to give a consistent answer.  I think.  At least, I think it gives
the same answer on a 68000 and a DataBlaster-1000.

The problem specification is, I think, the real problem:  what, for example,
does it really *mean* to "want the value contained in var converted to a 68000
long word" on a 9 bit (or 1s-complement) machine?  If the goal is to printf
the four bytes with %x, I think the above solution is right in all cases;
if the goal is to write the data out on tape, then a 9-bit machine may not
be able to give a useful answer in any case.

I'm going to put "guarantee" into my spell-checker's list of "always
misspelled" words (well, I would if I used one...).

jfw@ksr.com (John F. Woods) (03/14/91)

jfw@ksr.com (John F. Woods) writes:
>[One more time, this time with an open copy of ANSI C]
Now, opening the FIRST chapter, ...

>Further hmm.  Is memcpy() obliged to do anything sensible if handed
>a pointer to an integer cast to a void *?

Yes.  Courtesy of section 1.6 and explained by some handwaving in the
Rationale, all C objects can be addressed as arrays of characters.
Of course, on a 1s-complement machine, the characters you find in a long
will NOT be the same characters you would find on a 2s-complement machine,
so as long as the declaration of 'var' in the original stays just 'long',
then memcpy() isn't the right answer.
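
A short sketch of the distinction, with hypothetical function names:
memcpy() moves the object representation faithfully in both directions,
but the order and contents of the bytes in between are whatever the host
uses, so they are only meaningful on that same host:

```c
#include <string.h>

/* Copy the host's in-memory representation of v into out, which must
 * have room for sizeof(long) bytes.  The byte order and the encoding of
 * negative values are host-specific. */
void long_to_raw(long v, unsigned char *out)
{
    memcpy(out, &v, sizeof v);
}

/* Inverse of long_to_raw(); only valid for bytes produced on a host with
 * the same representation of long. */
long raw_to_long(const unsigned char *in)
{
    long v;

    memcpy(&v, in, sizeof v);
    return v;
}
```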

gwyn@smoke.brl.mil (Doug Gwyn) (03/14/91)

In article <2669@ksr.com> jfw@ksr.com (John F. Woods) writes:
>Further hmm.  Is memcpy() obliged to do anything sensible if handed
>a pointer to an integer cast to a void *?

Yes, memcpy() can be used to copy any valid object.

By the way, I think I have a correct solution for the original problem
embedded as a subset of what the MUVES Dx package does when sending
data between heterogeneous architectures, but it is hard to extract
from the context.

bhoughto@hopi.intel.com (Blair P. Houghton) (03/14/91)

In article <1991Mar13.164840.20615@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <3005@inews.intel.com> bhoughto@hopi.intel.com (Blair P. Houghton) writes:
>>#if (((sizeof long)%(sizeof char)) == 0)	/* even-numbered modulus */
>
>Bleep.  Error.  At least, it won't do what you think.  "sizeof" is not an
>operator in #if; it is just another unknown identifier.

I'll just mumble something here about "meta-code"
and try not to look too guilty...

And yes, that's been around here once or twice before, too.

>Incidentally, that expression is guaranteed to be true anyway, because
>sizeof char ==== 1.  ("====" is the "emphatically equal to" operator. :-))

Yeh.  I never were good about making my antecedents agree
with its objects...

				--Blair
				  "Pay no attention to the man
				   behind the curtain, a-mundo."

markt@nro.cs.athabascau.ca (Mark Tarrabain) (03/14/91)

esink@turia.dit.upm.es writes:

> Given the following :
> 
> long var;
> unsigned char Bytes[4];
> 
> 
> Is there a portable way to move the value held in var
> into the memory space pointed to by Bytes, with the restriction
> that the representation be in Most Significant Byte first
> format ?  I am well aware that sizeof(long) may not be 4.  I
> want the value contained in var converted to a 68000
> long word, and I want the code fragment to run on any
> machine.  The solution must be ANSI C.
> 

try the following code fragment: (no guarantees mind you, it's late at
night and I'm kinda wired on caffeine)

union swap_type
  {
  long lvalue;
  char cvalue[4];
  };
int hifirst()
  {
  union swap_type dummy;
  dummy.lvalue = 1;
  return dummy.cvalue[3];
  }
void put_var_to_bytes(long var,char Bytes[4])
  {
  union swap_type dummy;
  dummy.lvalue = var;
  if (!hifirst())
    {
    char temp;
    temp = dummy.cvalue[0];
    dummy.cvalue[0] = dummy.cvalue[3];
    dummy.cvalue[3] = temp;
    temp = dummy.cvalue[1];
    dummy.cvalue[1] = dummy.cvalue[2];
    dummy.cvalue[2] = temp;
    }
  Bytes[0] = dummy.cvalue[0];
  Bytes[1] = dummy.cvalue[1];
  Bytes[2] = dummy.cvalue[2];
  Bytes[3] = dummy.cvalue[3];
  }

this is slightly machine dependent - it assumes that the size of a long
is the same as 4 characters. (which should be the case on a 68000 
anyways)
        >> Mark

peter@ficc.ferranti.com (Peter da Silva) (03/14/91)

In article <1991Mar12.105451.19488@dit.upm.es> esink@turia.dit.upm.es () writes:
> long var;
> unsigned char Bytes[4];
> 
> Is there a portable way to move the value held in var
> into the memory space pointed to by Bytes, with the restriction
> that the representation be in Most Significant Byte first
> format ?  I am well aware that sizeof(long) may not be 4.  I
> want the value contained in var converted to a 68000
> long word, and I want the code fragment to run on any
> machine.  The solution must be ANSI C.

	Bytes[0] = (var & 0xFF000000) >> 24;
	Bytes[1] = (var & 0x00FF0000) >> 16;
	Bytes[2] = (var & 0x0000FF00) >> 8;
	Bytes[3] = (var & 0x000000FF) >> 0;

If you don't mind checking ifdefs, you can run a test program to check
endianness and do this:

#ifdef E4321
	*((long *)Bytes) = var;
#else
	... above code ...
#endif

One guy here wrote a fairly extensive .h file for manipulating things
in 68000 order. It needs a single #define for the native byte order and
defines everything else based on that.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

don@dgbt.doc.ca (Donald McLachlan) (03/15/91)

>Sender: @dit.upm.es
>
>
>Given the following :
>
>long var;
>unsigned char Bytes[4];
>
>
>Is there a portable way to move the value held in var
>into the memory space pointed to by Bytes, with the restriction
>that the representation be in Most Significant Byte first
>format ?  I am well aware that sizeof(long) may not be 4.  I
>want the value contained in var converted to a 68000
>long word, and I want the code fragment to run on any
>machine.  The solution must be ANSI C.
>
>Feel free to suggest your own declarations
>to replace or augment mine.
>
>Thanks in advance for the input,
>
>Eric
>

Eric:

I can't be completely sure (I've only checked it on 2 different architectures
and done dry runs in my head for 2 others), but I think this should work.

Please don't flame me for posting this, guys; since I don't know
whether this will work, I want some input.

Don.

_______________________ cut ________________________________________
#include <stdio.h>

static int init = 1;
static char to_motorola[sizeof(long)];

main()
{
	extern long cvt_to_moto();

	char format_string[32];
	int i;
	long number;
	unsigned char *c = (unsigned char *)&number;

	number = 0x574319;
	sprintf(format_string, "number = 0x%%0%dlx\n", (int)(sizeof(number) * 2));
	printf(format_string, (unsigned long)number);

	printf("local order = ");
	for(i = 0; i < sizeof(number); ++i)
		printf("%02lx ", (unsigned long)c[i]);
	puts("");
	
	number = cvt_to_moto(number);

	printf("moto  order = ");
	for(i = 0; i < sizeof(number); ++i)
		printf("%02lx ", (unsigned long)c[i]);
	
	puts("");
}

long cvt_to_moto(host_num)
long host_num;
{
	char *native, *motorola;
	int i;
	long moto_num;

	if(init == 1)
		init_cvt_to_moto();

	native = (char *)&host_num;
	motorola = (char *)&moto_num;

	for(i = 0; i < sizeof(moto_num); ++i)
		motorola[i] = native[to_motorola[i]];
	
	return(moto_num);
}

init_cvt_to_moto()
{
	int i;
	long num;
	char *c = (char *)&num;

	for(i = 0; i < sizeof(num); ++i)	/* build moto-endian image */
		c[i] = i;			/* of 0x00010203... */

	for(--i; i >= 0; --i)			/* for each byte in num */
	{					/* read LSB bytes from num */
		to_motorola[i] = num & 0x00FFL;	/* and put in to_moto[LSB] */
		num >>=8;			/* get next byte in num */
	}
	init = 0;
}
_______________________ cut ________________________________________

here are the results from a Vax 750 running Ultrix

number = 0x00574319
local order = 19 43 57 00 
moto  order = 00 57 43 19 

and from a Sun SPARCstation 330 running SunOS

number = 0x00574319
local order = 00 57 43 19 
moto  order = 00 57 43 19 

rjohnson@shell.com (Roy Johnson) (03/16/91)

esink@turia.dit.upm.es writes:

> Given the following :
> 
> long var;
> unsigned char Bytes[4];
> 
> 
> Is there a portable way to move the value held in var
> into the memory space pointed to by Bytes, with the restriction
> that the representation be in Most Significant Byte first
> format ?  I am well aware that sizeof(long) may not be 4.  I
> want the value contained in var converted to a 68000
> long word, and I want the code fragment to run on any
> machine.  The solution must be ANSI C.

I posted yesterday in response to this, but managed to cover my
face with egg.  The real question I had should have gone something
like this:

Would tempstr, after
 sprintf(tempstr, "%08lx", (unsigned long) var);
be independent of:
1) the endianness of the datatype var
2) the internal representation
?

If so, then you could check strlen(tempstr) to make sure it's not
too long, and convert the two-byte substrings of tempstr to
the bytes they represent, e.g.

#define mask(ch) (char)((((ch) >= 'a') && ((ch) <= 'f')) ? (ch)-'a'+0xa : (ch)-'0')

for (i=0; i<4; ++i)
  Bytes[i] = (mask(tempstr[2*i]) << 4) | mask(tempstr[2*i + 1]);
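
For what it's worth, printf formats the value, not the storage, so the
digits do come out most significant first on any host. A rough sketch of
the whole idea, assuming the value has already been converted to
unsigned long; the function name and its return convention are
hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper: write var into Bytes, most significant byte
 * first, by way of its hexadecimal text.  Returns 0 on success and -1
 * if var needs more than 32 bits. */
int put_via_hex(unsigned long var, unsigned char Bytes[4])
{
    char tempstr[9];    /* 8 hex digits plus the terminating '\0' */
    unsigned int byte;
    int i;

    if (var > 0xFFFFFFFFUL)
        return -1;      /* would need more than 8 digits */

    sprintf(tempstr, "%08lx", var);
    for (i = 0; i < 4; ++i) {
        /* "%2x" reads at most two hex digits starting at tempstr + 2*i */
        if (sscanf(tempstr + 2 * i, "%2x", &byte) != 1)
            return -1;
        Bytes[i] = (unsigned char)byte;
    }
    return 0;
}
```

It is far slower than shifting and masking, but it depends only on the
value of var.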
--
======= !{sun,psuvax1,bcm,rice,decwrl,cs.utexas.edu}!shell!rjohnson =======
Feel free to correct me, but don't preface your correction with "BZZT!"
Roy Johnson, Shell Development Company

dgil@pa.reuter.COM (Dave Gillett) (03/19/91)

In <2628@ksr.com> jfw@ksr.com (John F. Woods) writes:

>	Bytes[0] = (var >> 24) & 0xFF;
>	Bytes[1] = (var >> 16) & 0xFF;
>	Bytes[2] = (var >> 8)  & 0xFF;
>	Bytes[3] =  var        & 0xFF;

>This code is guaranteed.


     I don't think that the standard guarantees that chars are eight bits.
I will agree that that's probably even more common than 4-byte longs, but
even this assumption about word sizes cannot, IMHO, be guaranteed to be
portable.  (6-bit and 9-bit chars have gone out of style, but I've used
machines with 16-bit word sizes that would *not* have produced the desired
result for this code.)
                                                 Dave

gwyn@smoke.brl.mil (Doug Gwyn) (03/21/91)

In article <829@saxony.pa.reuter.COM> dgil@pa.reuter.COM (Dave Gillett) writes:
>>	Bytes[0] = (var >> 24) & 0xFF;
>>	Bytes[1] = (var >> 16) & 0xFF;
>>	Bytes[2] = (var >> 8)  & 0xFF;
>>	Bytes[3] =  var        & 0xFF;
>     I don't think that the standard guarantees that chars are eight bits.

It does guarantee that they can hold AT LEAST 8 bits.
Assuming sufficient care is taken with use of unsigned types, etc.,
that is not a problem.
However, as mentioned before, the failure to consider representation
issues for negative values can be a problem, depending on the implementation.

msb@sq.sq.com (Mark Brader) (03/24/91)

> >	Bytes[0] = (var >> 24) & 0xFF;
> >	Bytes[1] = (var >> 16) & 0xFF;
> >	Bytes[2] = (var >> 8)  & 0xFF;
> >	Bytes[3] =  var        & 0xFF;
> >This code is guaranteed.
> 
> I don't think that the standard guarantees that chars are eight bits.
> I will agree that that's probably even more common than 4-byte longs, but
> even this assumption about word sizes cannot, IMHO, be guaranteed to be
> portable. ...

The original question asked that the output be in "68000 format", or some
such words.  Therefore it *is* correct to pull off 8 bits at a time,
even if the code is running on a machine where chars are larger than
8 bits.  (They aren't allowed to be smaller.)

However, the >> operation is implementation-defined when applied to a
signed integer variable whose value is negative.  To avoid problems, the
original value should be copied into a variable of type unsigned long.
This will also in effect convert it to 2's complement format, as desired
for the "68000 format" result, no matter what format the machine uses.
(Guaranteed in ANSI C.)
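
A minimal sketch of that conversion step; the function name is
hypothetical, and the value is assumed to fit in 32 bits:

```c
/* Hypothetical helper: store var in "68000 format", i.e. 32-bit two's
 * complement, most significant byte first. */
void store_be32(long var, unsigned char Bytes[4])
{
    /* Conversion to unsigned long is defined arithmetically (modulo
     * ULONG_MAX + 1), so a negative var yields the two's complement bit
     * pattern whatever the host's own signed representation is. */
    unsigned long uvar = (unsigned long)var;

    Bytes[0] = (unsigned char)((uvar >> 24) & 0xFF);
    Bytes[1] = (unsigned char)((uvar >> 16) & 0xFF);
    Bytes[2] = (unsigned char)((uvar >> 8) & 0xFF);
    Bytes[3] = (unsigned char)(uvar & 0xFF);
}
```

For example, -2 is stored as FF FF FF FE.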

However, it isn't correct to assume that the original long fits in 32 bits.
That is, the original value may not be convertible to the desired format,
and the code should check for this.  There are several ways to do so, none
of them particularly pretty.  For instance, supposing that var is the
original input and uvar is the result of converting it to unsigned long:

	if (((uvar >> 31) >> 1) != (((var < 0)? -1UL : 0UL) >> 31) >> 1)
		complain_about_overflow();

There may be easier ways that I haven't thought of.  Note that it is not
portable to shift right by 32 bits, as shifts by the entire number of
bits in a type are undefined.  Also, I have not tested the above expression,
since I don't have access to a machine with longs wider than 32 bits.
Old compilers will require a cast rather than the UL suffix, but the
original poster asked for ANSI C.
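
One possible alternative, sketched here with a hypothetical function
name, is to compare against the 32-bit limits directly and let
<limits.h> decide whether the comparison is needed at all. Unlike the
shift test above, this one also rejects positive values of 2^31 and up,
which is an assumption about what "fits" should mean for a signed long
word:

```c
#include <limits.h>

/* Hypothetical helper: nonzero if var is representable as a signed
 * 32-bit two's complement value, i.e. as a 68000 long word. */
int fits_in_68000_long(long var)
{
#if LONG_MAX > 2147483647L
    /* long is wider than 32 bits here, so both bounds are representable
     * and can be compared against directly. */
    return var >= -2147483647L - 1L && var <= 2147483647L;
#else
    /* long is no wider than 32 bits: every value fits. */
    (void)var;
    return 1;
#endif
}
```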

-- 
Mark Brader, SoftQuad Inc., Toronto, utzoo!sq!msb, msb@sq.com
#define	MSB(type)	(~(((unsigned type)-1)>>1))

This article is in the public domain.