[comp.unix.internals] Byte Order on workstations

wangjw@cs.purdue.EDU (Jingwen Wang) (06/28/91)

  I am a little curious about the byte order differences on different
workstations. 

  When we communicate via sockets over the network, we don't need to care
about the differences between network order and host order. This is true 
if we are communicating among the workstations of the same byte order.

  But if we communicate between workstations of different byte order, e.g.,
DECstation -- Sun, we must first transform data into network order before
sending it and change it back to host order after receiving it.

  My question is: in a network environment, how is this problem solved?
For example, when this mail reaches your machine, how does your machine
know that this mail is from a Sun instead of a DECstation? How does your
machine process the byte order?

  Can anyone shed some light on this?

Jingwen Wang
Purdue University

mwarren@rws1.ma30.bull.com (Mark Warren) (06/28/91)

wangjw@cs.purdue.EDU (Jingwen Wang) writes:


>  I am a little curious about the byte order differences on different
>workstations. 

>  When we communicate via sockets over the network, we don't need to care
>about the differences between network order and host order. This is true 
>if we are communicating among the workstations of the same byte order.

>  But if we communicate between workstations of different byte order, e.g.,
>DECstation -- Sun, we must first transform data into network order before
>sending it and change it back to host order after receiving it.

First, all messages on the network are in "network" (big-endian) byte order,
as far as the message headers are concerned.  Even on machines whose host
order is the same as the network byte order, hton{l,s}() and friends are
used, even though they are no-ops on such machines.  The important thing is
that for ANY machine, network packets always have their headers and trailers
in network, not host, order.
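
As a minimal sketch of the point above, assuming a POSIX system that provides
<arpa/inet.h>: the header layout (struct demo_header), the 6-byte wire size
and the names demo_pack() and demo_unpack() are invented for illustration and
belong to no real protocol.  Only htons(), htonl(), ntohs() and ntohl() are
real library calls.

```c
#include <arpa/inet.h>  /* htons, htonl, ntohs, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header, made up for this example only. */
struct demo_header {
    uint16_t type;
    uint32_t length;
};

#define DEMO_WIRE_SIZE 6  /* 2 bytes of type + 4 bytes of length */

/* Write the header into buf in network byte order. */
void demo_pack(const struct demo_header *h, unsigned char *buf)
{
    uint16_t t;
    uint32_t l;

    assert(h != NULL && buf != NULL);
    t = htons(h->type);     /* a no-op on a big-endian host */
    l = htonl(h->length);
    memcpy(buf, &t, sizeof t);
    memcpy(buf + 2, &l, sizeof l);
}

/* Read the header back from buf into host byte order. */
void demo_unpack(const unsigned char *buf, struct demo_header *h)
{
    uint16_t t;
    uint32_t l;

    assert(h != NULL && buf != NULL);
    memcpy(&t, buf, sizeof t);
    memcpy(&l, buf + 2, sizeof l);
    h->type = ntohs(t);
    h->length = ntohl(l);
}
```

Packing type 0x1234 and length 0x0A0B0C0D should leave the bytes
12 34 0A 0B 0C 0D in the buffer on either kind of host, which is why the
sender and receiver never need to know each other's host order.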

>  My question is: in a network environment, how is this problem solved?
>For example, when this mail reaches your machine, how does your machine
>know that this mail is from a Sun instead of a DECstation? How does your
>machine process the byte order?

In general, the problem for the DATA contained in a message is not solved.
Mail works pretty well because it is byte stream data, so byte order is
immaterial.  For binary data, the byte order of a message may be wrong.
It is up to cooperating applications to perform any necessary byte swapping
and re-alignment to make binary data usable.  One way this is done is by
using XDR in Sun RPC.  This is ENTIRELY up to the application programs.
The network won't do anything for you except preserve the network headers.
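
A minimal sketch of what "up to the application" can look like, assuming
both sides agree to carry 32-bit integers as four bytes, most significant
byte first (the order XDR also uses for its integers).  The helper names
put_u32_be() and get_u32_be() are hypothetical; this is not the XDR library
interface, only an illustration of the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v in buf[0..3], most significant byte first.
 * Shifts work on values, not memory, so the host byte order never matters. */
void put_u32_be(unsigned char *buf, uint32_t v)
{
    assert(buf != NULL);
    buf[0] = (unsigned char)((v >> 24) & 0xFF);
    buf[1] = (unsigned char)((v >> 16) & 0xFF);
    buf[2] = (unsigned char)((v >> 8) & 0xFF);
    buf[3] = (unsigned char)(v & 0xFF);
}

/* Rebuild the value from buf[0..3], most significant byte first. */
uint32_t get_u32_be(const unsigned char *buf)
{
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8) |
           (uint32_t)buf[3];
}
```

The sender calls put_u32_be() before writing to the socket and the receiver
calls get_u32_be() after reading; neither needs to know what kind of machine
is at the other end.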
--

 == Mark Warren                      Bull HN Information Systems Inc. ==
 == (508) 294-3171 (FAX 294-3020)    300 Concord Road     MS836A      ==
 == M.Warren@bull.com                Billerica, MA 01821              ==

torek@elf.ee.lbl.gov (Chris Torek) (06/28/91)

In article <15145@ector.cs.purdue.edu> wangjw@brandon.cs.purdue.edu () writes:
>... If we communicate between workstations of different byte order, e.g.,
>DECstation -- Sun, we must first transform data into network order before
>sending it and change it back to host order after receiving it.

You probably misunderstand the `byte order problem':

>  My question is: in a network environment, how is this problem solved?
>For example, when this mail reaches your machine, how does your machine
>know that this mail is from a Sun instead of a DECstation?

It does not.  Mail is not a problem because it is interpreted consistently.

The `byte order problem' is not that one or another kind of machine gets
things `backwards'.  The bytes you send from a VAX to a Sun, or vice
versa, are the same when received as when sent.  The reason that a
`number' like 0x1234 seems to change to 0x3412 is not because it *did*
change, but rather because you changed the way you interpret the bytes.

,erehwyreve redro emas eht ni era enil siht no setyb ehT
but you have to read them right to left to make sense of them.

`Little endian' machines (VAX, DECstation, etc) interpret multibyte
numbers this way:

	byte 0: 0x12
	byte 1: 0x34

The number is the first byte (byte 0, at the lowest address) plus the next
byte times 256 plus the next times 65536 plus....  In this case the number
is 0x3412.  Big endian machines, on the other hand, start `at the top'.
If the multibyte number is four bytes long, the number is the first byte
times 16777216 plus the next byte times 65536 plus the next byte times 256
plus the last byte.  Here the number is two bytes, so it is 0x1234.  The
difference is that while we read English text left-to-right, the
machine reads the bytes in the machine's order (whatever that is) and
then presents them to us left-to-right in ASCII.
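
The two readings of the same two bytes can be sketched in C as follows.
This is only an illustration; the function names read_u16_le(),
read_u16_be() and read_u16_native() are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Little-endian reading: byte 0 is the least significant. */
uint16_t read_u16_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Big-endian reading: byte 0 is the most significant. */
uint16_t read_u16_be(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Whatever reading this machine's own hardware uses. */
uint16_t read_u16_native(const unsigned char *p)
{
    uint16_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

Given the bytes 0x12, 0x34 in that order, read_u16_le() yields 0x3412 and
read_u16_be() yields 0x1234, while read_u16_native() agrees with whichever
of the two matches the host.  The bytes in memory are identical in all
three cases.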

Once again: the underlying bit sequence HAS NOT CHANGED.  The reason
it `looks' different is that different machines `look' in different
orders.  As long as the machines agree to `look at' the bytes in the
same order, the problem vanishes.

(Internet) mail is defined as a text sequence, and most computers use
the same order for text.  Thus there is no problem.  The semantics are
the same because the symbols (ASCII characters) are interpreted
identically.%  This is the basis of communication: when two entities
agree on a set of symbols and interpretation rules, those two entities
can communicate.  The `network byte order problem' is a result of a
small variation in the interpretation rules.
-----
% Note that this breaks down in some instances, e.g., the bytes {|}
  may be interpreted differently on a display in Oslo than on another
  in Akron, Ohio.
-----

Incidentally, this is sort of a microcosmic version of the problems
ASN.1 and other standards are trying to solve.  In order to communicate
some data set, we need to get the machines to agree on both the symbols
and their meanings.  The ISO approach seems to be to define translation
layer after translation layer, and hope that, somewhere along the way,
all the pieces get defined, rather than to start with the fact that the
pieces must get defined, to define them, and only then to arrange those
definitions in some pleasing order for recording in English (or other
human language).
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov