[comp.protocols.misc] big endians are good for networking, huh?

vixie@decwrl.dec.com (Paul A Vixie) (01/25/89)

I've got a diagram from Comer's Internetworking book (page 140), that says
that the TCP header starts out looking like this:

	0       8       16             31
	+-------+-------+-------+-------+
	|  SOURCE PORT  |   DEST PORT   |
	+-------+-------+-------+-------+

Looks reasonable.  It's even useful, mostly, since if I print the bytes in a
TCP header in the easiest possible way (lowest addresses on the left), I can
look from the bytes to the table and it looks just like it ought to.

But, for something I'm doing with SLIP header compression, I needed to put the
ports into a longword.  For a while I was overlaying an array of u_long on the
TCP header and grabbing element 0, but the pointer assignment looked
non-portable somehow, so I decided to shift and add.

	longports = ((u_long)tcp->th_sport << 16) + (tcp->th_dport);

Fine.  But oops: the "<<" operator, which _looks_ like it's going to move
things to the "left", is really going to move things to more-significant
bit positions.  On the VAX, that's "left", since the VAX Arch Ref Manual
shows endless diagrams with bit 31 on the left and bit 0 on the right.

On a 68000 or MIPSM, "up" is "right" and since the "<<" is "up" it's actually
"right" on those machines.  In my case, if I compared "longports" to something
extracted via the "overlay the u_long array" method, it would be different.
It doesn't really matter since I'm extracting it the same way everywhere, but
I believe I can justly complain that the law of least astonishment has been
violated, just the same.
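
The two extraction methods can be put side by side in a minimal sketch.
The struct and function names here are hypothetical stand-ins, not the
real BSD tcphdr, and the fixed-width types assume a C99 compiler.  A third,
byte-wise method is included because it gives the same answer on every host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the first four bytes of a TCP header. */
struct ports {
    uint16_t sport;   /* bytes 0-1, exactly as they arrived from the wire */
    uint16_t dport;   /* bytes 2-3 */
};

/* Shift and add: arithmetic on whatever values the host loaded. */
static uint32_t ports_by_shift(const struct ports *p)
{
    return ((uint32_t)p->sport << 16) + p->dport;
}

/* Overlay: reinterpret the same four bytes as one 32-bit word. */
static uint32_t ports_by_overlay(const struct ports *p)
{
    uint32_t v;

    assert(sizeof *p == sizeof v);   /* assumes the struct has no padding */
    memcpy(&v, p, sizeof v);
    return v;
}

/* Byte-wise: the first wire byte is most significant, on every host. */
static uint32_t ports_from_wire(const unsigned char *b)
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Returns 1 if the host stores the most significant byte first. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0x01;
}
```

With the wire bytes 04 01 00 17 (ports 1025 and 23) copied into the struct,
ports_from_wire() yields 0x04010017 everywhere.  On a big-endian host the
shift and the overlay both yield that same value; on a little-endian host
the shift yields 0x01041700 and the overlay yields 0x17000104.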

Perhaps "C" shows a strong little-endian bias in the visual representation of
"shift to more/less significant bit positions" operators ("<<" and ">>")?  I
toyed with the possibility that "<<" would really mean "left" (or "down") on
a big-endian, and I'm reasonably sure it doesn't work that way -- anybody know
otherwise?

Little endian, big endian -- they all suck.  Give me an alternative!  :-)
--
Paul Vixie
Work:    vixie@decwrl.dec.com    decwrl!vixie    +1 415 853 6600
Play:    paul@vixie.sf.ca.us     vixie!paul      +1 415 864 7013

ron@ron.rutgers.edu (Ron Natalie) (01/26/89)

> On a 68000 or MIPSM, "up" is "right" and since the "<<" is "up" it's actually
> "right" on those machines.  In my case, if I compared "longports" to something
> extracted via the "overlay the u_long array" method, it would be different.

Untrue, C is not little endian by nature.  On the 68000 (and with every other machine
that I've seen) the bits in a word are presented as if they were a binary number.
That is, the least significant bits are on the right, the most significant on the
right.  Shifts and rotates are referenced to this.  On the 68000, the rotate left/right
instructions behave as you would expect with this implementation.  Moving in the "right"
direction moves bits towards the more significant part of the word and the "left"
direction is towards the least significant part.

While various machines number their bits differently (for example, IBM numbers the
most significant bit 0 and the least 31, whereas 68000's and VAX's use little numbers
for low order bits, and I think that the Univac numbers them from 1 rather than 0),
they all represent them as having the LSB on the RIGHT.   What makes them big or
little endian is how they map this representation into the machine's address space.

-Ron

ron@ron.rutgers.edu (Ron Natalie) (01/27/89)

[ Some kind people pointed out that in addition to some weird margin sizes
  I inadvertently messed up right and left a couple of times.  This is a
  revised posting. ]

> On a 68000 or MIPSM, "up" is "right" and since the "<<" is "up" it's
> actually "right" on those machines.  In my case, if I compared
> "longports" to something extracted via the "overlay the u_long array"
> method, it would be different.

Untrue, C is not little endian by nature.  On the 68000 (and with every
other machine that I've seen) the bits in a word are presented as if
they were a binary number.  That is, the least significant bits are on
the right, the most significant on the left.  Shifts and rotates are
referenced to this.  On the 68000, the rotate left/right instructions
behave as you would expect with this implementation.  Moving in the
"left" direction moves bits towards the more significant part of the
word and the "right" direction is towards the least significant part.

While various machines number their bits differently (for example, IBM
numbers the most significant bit 0 and the least 31, whereas 68000's
and VAX's use little numbers for low order bits, and I think that the
Univac numbers them from 1 rather than 0), they all represent them as
having the LSB on the RIGHT.  What makes them big or little endian is
how they map this representation into the machine's address space.
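
The distinction between bit numbering, shifting, and byte placement can be
sketched in C.  This is only an illustration: the function names are
hypothetical, a 32-bit word is assumed, and the byte-offset probe assumes a
plain big- or little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mask for bit n when bit 0 is the least significant (VAX/68000 style). */
static uint32_t mask_lsb0(unsigned n)
{
    assert(n < 32);
    return (uint32_t)1 << n;
}

/* Mask for bit n when bit 0 is the most significant (IBM style). */
static uint32_t mask_msb0(unsigned n)
{
    assert(n < 32);
    return (uint32_t)1 << (31 - n);
}

/*
 * Offset, counted from the word's lowest address, of the first nonzero
 * byte in the stored form of `word`.  Returns -1 if the word is zero.
 * This is the only function here whose result depends on the host.
 */
static int byte_offset_of(uint32_t word)
{
    unsigned char bytes[sizeof word];
    size_t i;

    memcpy(bytes, &word, sizeof word);
    for (i = 0; i < sizeof word; i++)
        if (bytes[i] != 0)
            return (int)i;
    return -1;
}
```

Both numbering schemes name the same bits: mask_msb0(31) and mask_lsb0(0)
are both 1, and shifting that value left by 8 gives 256 on any machine.
Only byte_offset_of() changes between hosts: the byte holding the low-order
bit sits at offset 0 on a little-endian machine and at offset 3 on a
big-endian one.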