[comp.protocols.tcp-ip] Byte and bit order within packet headers

mandrews@alias.com (Mark Andrews) (04/25/91)

I have a question concerning the byte and bit order of fields
within packet headers. Many of the RFCs (including RFC1060) state rules
about the byte (octet) order:

Data Notations

   The convention in the documentation of Internet Protocols is to
   express numbers in decimal and to picture data in "big-endian" order
   [21].  That is, fields are described left to right, with the most
   significant octet on the left and the least significant octet on the
   right.

   The order of transmission of the header and data described in this
   document is resolved to the octet level.  Whenever a diagram shows a
   group of octets, the order of transmission of those octets is the
   normal order in which they are read in English.  For example, in the
   following diagram the octets are transmitted in the order they are
   numbered.


       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |       1       |       2       |       3       |       4       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |       5       |       6       |       7       |       8       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |       9       |      10       |      11       |      12       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                        Transmission Order of Bytes


So, all multi-octet fields are transmitted in big-endian byte order.
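In C, this rule can be met without knowing the host's byte order at all, by working on values with shifts rather than on memory layout. A minimal sketch, assuming a C99 compiler with <stdint.h>; put_be32() and get_be32() are hypothetical helper names, not BSD or RFC functions:

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value in network (big-endian)
 * octet order, most significant octet first, on any host. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);  /* most significant octet: transmitted first */
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;          /* least significant octet: transmitted last */
}

/* Hypothetical helper: rebuild a 32-bit value from four octets
 * received in network order. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

For example, the address 10.0.0.1 (0x0A000001) is stored as the octets 0A 00 00 01, in that order, whichever kind of host runs the code.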

This is a small problem for little endian machines (most significant byte is
on the right). They must correct their byte order before the packet is
transmitted. In BSD systems, this job is performed by the htonl() and htons()
functions (host to network long, host to network short), but what about the
bit order? It is still little-endian (bits are numbered right to left instead
of left to right)! In what order are the bits transmitted?
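A small sketch of what htonl() actually changes, assuming a POSIX system where <arpa/inet.h> declares htonl(), htons(), ntohl() and ntohs(); htonl_bytes() is a hypothetical helper. It rearranges whole octets only; nothing in it touches the order of bits within an octet.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl(), htons(), ntohl(), ntohs() on POSIX systems */

/* Hypothetical helper: copy the in-memory bytes of htonl(v) into out[0..3].
 * On any host, out[0] ends up holding the most significant octet of v. */
static void htonl_bytes(uint32_t v, uint8_t out[4])
{
    uint32_t net = htonl(v);        /* identity on big-endian hosts, byte swap on little-endian */
    memcpy(out, &net, sizeof net);  /* inspect the bytes as they sit in memory */
}
```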

This is further complicated by the following code fragment from
/usr/include/netinet/ip.h:


	struct ip {
	#if BYTE_ORDER == LITTLE_ENDIAN
	        u_char  ip_hl:4,                /* header length */
	 	        ip_v:4;                 /* version */
	#endif
	#if BYTE_ORDER == BIG_ENDIAN
	        u_char  ip_v:4,                 /* version */
	                ip_hl:4;                /* header length */
	#endif
		<etc.>


When all this is translated, there are two views of the first byte of the
ip structure; one little endian:

	7              0
	+------+-------+
	| ip_v | ip_hl |
	+------+-------+

and one big endian:

	0              7
	+------+-------+
	| ip_v | ip_hl |
	+------+-------+

Now according to the 4.3BSD C Reference Manual by B. Kernighan, the addresses
of a structure increase as the declarations are read left to right
(irrespective of the bit or byte order), so in terms of a C addressing model,
the first byte of the ip structure is:

	0              7
	+------+-------+-----
	| ip_v | ip_hl | other elements of ip structure
	+------+-------+-----

Perhaps the bits are transmitted MSB to LSB based on the C model. I don't
know.


Any clarifications on my confusion or any other help would be appreciated.

Thanks,

Mark

------------------------------------------------------------------------------
	Mark Andrews
	Systems Programmer,
	Alias Research,
	Toronto, Canada
Phone:	(416)-362-9181
Mail box: mark@alias.com

greene@coral.com (Jeremy Greene) (04/26/91)

) This is a small problem for little endian machines (most significant byte is
) on the right). They must correct their byte order before the packet is
) transmitted. In BSD systems, this job is performed by the htonl() and htons()
) functions (host to network long, host to network short), but what about the
) bit order? It is still little-endian (bits are numbered right to left instead
) of left to right)! In what order are the bits transmitted?
) 

Byte order is a host to host issue. One host has a different view of
byte order than the other. Fortunately, there is an agreed upon inter-host
(network) format which is big-endian.

Bit order is not a host problem; it is only network related, and more
specifically, MAC layer related. In other words, you always have the
same hardware at both ends of the connection and from the host
perspective the bit order is treated the same. If you send 0x01 on
fddi you will receive 0x01 on some other fddi interface. Same for
Ethernet.  Unlike the byte order issue, you never send from fddi to
Ethernet, which would make a big mess.

The fact that bits are sent in a different order does not present a
problem in getting data from one host to another.

The problem is that the actual network hardware interprets the
mac address. Given that:

   - the address on both rings and Ethernets is the same, with the
     group bit first, and

   - rings transmit the most significant (leftmost) bit of each byte
     first, while Ethernet transmits the least significant (rightmost)
     bit first,

the same address has to be placed in memory in a different bit order
depending on the media type.

So, if the address starts 0x01 in Ethernet land (which is a group
address) then it must start 0x80 for a fddi interface. From the
network perspective it's the same address.

In other words, similar to the byte order problem, you want to have
the macros 'ntomac' and 'macton'. To do this, there has to be a
canonical format, which IEEE has (recently) stated is the Ethernet format.
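The 'ntomac'/'macton' idea can be sketched in C as below. This is an illustration, not an existing API: bitrev8() and mac_swap_bit_order() are hypothetical names, and the macros assume a ring interface whose driver keeps addresses in memory in non-canonical (most significant bit first) order. On an Ethernet interface both macros would be no-ops.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order within one octet: bit 0 <-> bit 7, 1 <-> 6, ... */
static uint8_t bitrev8(uint8_t b)
{
    b = (uint8_t)((b & 0xF0) >> 4 | (b & 0x0F) << 4);  /* swap nibbles */
    b = (uint8_t)((b & 0xCC) >> 2 | (b & 0x33) << 2);  /* swap bit pairs */
    b = (uint8_t)((b & 0xAA) >> 1 | (b & 0x55) << 1);  /* swap adjacent bits */
    return b;
}

/* Convert a 6-octet MAC address between canonical (Ethernet) bit order
 * and non-canonical (ring) bit order. Bit reversal is its own inverse,
 * so the same routine serves both directions. */
static void mac_swap_bit_order(uint8_t mac[6])
{
    for (size_t i = 0; i < 6; i++)
        mac[i] = bitrev8(mac[i]);
}

/* Hypothetical macros for a ring interface (assumption, see above). */
#define ntomac(mac) mac_swap_bit_order(mac)  /* canonical -> ring in-memory order */
#define macton(mac) mac_swap_bit_order(mac)  /* ring in-memory order -> canonical */
```

For example, the Ethernet group address 01:00:5e:00:00:01 becomes 80:00:7a:00:00:80 in ring order, and converting back restores the original.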

The bottom line is that you only have to worry about bit order if
you're working at the MAC layer.

Jeremy

erick@sunee.waterloo.edu (Erick Engelke) (04/26/91)

In article <9104241753.AA21589@dino.alias.com> mandrews@alias.com (Mark Andrews) writes:
>
>I have a question concerning the byte and bit order of fields
>within packet headers. Many of the RFCs (including RFC1060) state rules
>about the byte (octet) order:
>

Much confusion stems from the fact that Intel processors store 
bits in the following order

  +--+--+--+--+--+--+--+---+---+--+--+--+--+--+--+---+
  |07 06 05 04 03 02 01 00 | 15 14 13 12 11 10 09 08 |
  +--+--+--+--+--+--+--+---+---+--+--+--+--+--+--+---+
  |  first stored byte     |  second stored byte     |  etc.

whereas network order is

  +--+--+--+--+--+--+--+---+---+--+--+--+--+--+--+---+
  |15 14 13 12 11 10 09 08 | 07 06 05 04 03 02 01 00 |
  +--+--+--+--+--+--+--+---+---+--+--+--+--+--+--+---+

so the bits and nybbles are already in network order, you simply need to
organize quantities larger than a byte, namely 16 and 32 bit values.
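That only multi-byte quantities need rearranging can be checked on any host with a sketch like this. host_is_little_endian() and swap16() are hypothetical names; swap16() does what htons() does on a little-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if this host stores the least significant byte of a
 * multi-byte integer at the lowest address (little-endian, e.g. Intel),
 * 0 if it stores the most significant byte first (big-endian). */
static int host_is_little_endian(void)
{
    uint16_t probe = 0x0102;
    uint8_t first;
    memcpy(&first, &probe, 1);   /* first byte of probe in memory */
    return first == 0x02;
}

/* A single byte such as 0x45 has the same value everywhere; only wider
 * quantities need their bytes swapped. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}
```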

The intel code should have
    unsigned  ip_hl : 4;
    unsigned  ip_v : 4;

I hope this clears it up a bit.

Erick


-- 
----------------------------------------------------------------------------
Erick Engelke                                       Watstar Computer Network
Watstar Network Guy                                   University of Waterloo
Erick@Development.Watstar.UWaterloo.ca              (519) 885-1211 Ext. 2965

mark@alias.com (Mark Andrews) (04/26/91)

From NIC.DDN.MIL!tcp-ip-RELAY@utcsri Fri Apr 26 03:20:29 1991
Date: 26 Apr 91 05:15:36 GMT
From: usc!rpi!news-server.csri.toronto.edu!utgpu!watserv1!sunee!erick@apple.com  (Erick Engelke)
Organization: University of Waterloo
Subject: Re: Byte and bit order within packet headers
References: <9104241753.AA21589@dino.alias.com>
Sender: tcp-ip-relay@nic.ddn.mil
To: tcp-ip@nic.ddn.mil

Erick Engelke (usc!rpi!news-server.csri.toronto.edu!utgpu!watserv1!sunee!erick@apple.com) responds to my question:

>In article <9104241753.AA21589@dino.alias.com> mandrews@alias.com (Mark Andrews) writes:
>>
>>I have a question concerning the byte and bit order of fields
>>within packet headers. Many of the RFCs (including RFC1060) state rules
>>about the byte (octet) order:
>>
>
>Much confusion stems from the fact that Intel processors store 
>bits in the following order
>
>  +--+--+--+--+--+--+--+---+---+--+--+--+--+--+--+---+
>  |07 06 05 04 03 02 01 00 | 15 14 13 12 11 10 09 08 |
>  +--+--+--+--+--+--+--+---+---+--+--+--+--+--+--+---+
>  |  first stored byte     |  second stored byte     |  etc.
>
>whereas network order is
>
>  +--+--+--+--+--+--+--+---+---+--+--+--+--+--+--+---+
>  |15 14 13 12 11 10 09 08 | 07 06 05 04 03 02 01 00 |
>  +--+--+--+--+--+--+--+---+---+--+--+--+--+--+--+---+
>
>so the bits and nybbles are already in network order, you simply need to
>organize quantities larger than a byte, namely 16 and 32 bit values.
>
>The intel code should have
>    unsigned  ip_hl : 4;
>    unsigned  ip_v : 4;
>
>I hope this clears it up a bit.
>
>Erick

Fine, this is a good example. In my specific example, I was looking at how BSD
code interprets the version number and header length of an IP packet
header:

    struct ip {
    #if BYTE_ORDER == LITTLE_ENDIAN
            u_char  ip_hl:4,                /* header length */
                    ip_v:4;                 /* version */
    #endif
    #if BYTE_ORDER == BIG_ENDIAN
            u_char  ip_v:4,                 /* version */
                    ip_hl:4;                /* header length */
    #endif

Unfortunately, the bit order is machine and compiler dependent. On little
endian machines, the bit fields are assigned least significant bit first
(right to left), resulting in:

   MSB                     LSB
    +--+--+--+--+--+--+--+--+
    |   ip_v    |   ip_hl   |
    +--+--+--+--+--+--+--+--+
     07 06 05 04 03 02 01 00

On big endian machines, the bit fields are also assigned least significant
bit first, but this time the bit fields are assigned left to right:

   LSB                     MSB
    +--+--+--+--+--+--+--+--+
    |   ip_v    |   ip_hl   |
    +--+--+--+--+--+--+--+--+
     00 01 02 03 04 05 06 07


In which order are the bits transmitted such that the integrity of the
data is not compromised, independent of the endian order?

For example, if a big endian machine is talking to a little endian machine,
in what order are the bits transmitted so that the ip_v and ip_hl fields
from the big endian machine are interpreted properly on the little endian
machine?

In the case of the ip_v field, bits 0-3 of the big endian byte must be
transmitted to bits 4-7 of the little endian byte!

Thanks for any information,

Mark

henry@zoo.toronto.edu (Henry Spencer) (04/27/91)

In article <9104241753.AA21589@dino.alias.com> mandrews@alias.com (Mark Andrews) writes:
>... what about the
>bit order? It is still little-endian (bits are numbered right to left instead
>of left to right)! In what order are the bits transmitted?

Fortunately, this is not an issue, because the data is fed to the hardware
as bytes (usually) and consequently it is the hardware's business to get
the bit order right on both ends.  The usual practice is to send lsb first,
but this is completely invisible to the software.

There is sometimes confusion about how the bits are *numbered*, but that
is a separate issue.  The high-order bit is always the high-order bit and
is always in the same place, regardless of whether the manual calls it
bit 7 or bit 0.
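The numbering point can be made concrete. The RFC header diagrams number bits from the left, so bit 0 is the most significant bit of an octet (RFC 791, Appendix B, says this explicitly), while many CPU manuals number from the right. A sketch with hypothetical helper names, showing that both conventions name the same physical bits:

```c
#include <stdint.h>

/* LSB-0 numbering (many CPU manuals): bit 0 is the least significant bit. */
static int bit_lsb0(uint8_t b, int n) { return (b >> n) & 1; }

/* MSB-0 numbering (RFC header diagrams): bit 0 is the most significant bit. */
static int bit_msb0(uint8_t b, int n) { return (b >> (7 - n)) & 1; }

/* Extract a field that an RFC diagram labels as bits
 * first .. first+width-1 (MSB-0) within a single octet. */
static unsigned field_msb0(uint8_t b, int first, int width)
{
    return (b >> (8 - first - width)) & ((1u << width) - 1);
}
```

With the first octet of a typical IPv4 header, 0x45, field_msb0(0x45, 0, 4) gives the version 4 and field_msb0(0x45, 4, 4) gives the header length 5, whatever the host's byte order.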

>This is further complicated by the following code fragment from
>/usr/include/netinet/ip.h:

Here we have a different issue.  The reason why ip.h is #if'd is that C
does not define the order of bitfields within a word, and it is both
machine-specific and compiler-specific.  Using bitfields for this was
dumb, actually; convincing them to match an externally-defined storage
layout can be tricky.
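Since the placement is left to the compiler, the only reliable way to learn it is to ask the compiler. A hypothetical probe (struct nibbles and first_field_nibble() are made-up names) sets only the first-declared 4-bit field and looks at the first byte in memory. It uses unsigned int bitfields because that is one of the types C guarantees for bitfields; compilers for little-endian machines commonly yield 0x0F and those for big-endian machines commonly yield 0xF0, but C promises neither.

```c
#include <string.h>

struct nibbles {
    unsigned int first  : 4;   /* declared first, like ip_hl in the LITTLE_ENDIAN branch */
    unsigned int second : 4;   /* declared second, like ip_v */
};

/* Returns the first byte of the struct after setting only 'first':
 * 0x0F if the first-declared field landed in the low-order nibble,
 * 0xF0 if it landed in the high-order nibble. Implementation-defined. */
static unsigned first_field_nibble(void)
{
    struct nibbles n;
    unsigned char byte0;

    memset(&n, 0, sizeof n);   /* clear every bit, including padding */
    n.first = 0xF;             /* set only the first-declared field */
    memcpy(&byte0, &n, 1);     /* first byte of the struct in memory */
    return byte0;
}
```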

>Now according to the 4.3BSD C Reference Manual by B. Kernighan, the addresses
>of a structure increase as the declarations are read left to right...

Bitfields do not have addresses and that rule does not apply to them.
-- 
And the bean-counter replied,           | Henry Spencer @ U of Toronto Zoology
"beans are more important".             |  henry@zoo.toronto.edu  utzoo!henry

zweig@cs.uiuc.edu (Johnny Zweig) (04/27/91)

My solution is never to use bit-fields when decoding packets. Just mask things
with 0x0F and 0xF0 and let the compiler optimize it.
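A minimal sketch of the masking approach for the first octet of an IPv4 header, assuming the buffer holds the header exactly as received; ip_version(), ip_hdr_words() and ip_set_vhl() are hypothetical names. The masks operate on values rather than on storage layout, so the result does not depend on the compiler:

```c
#include <stdint.h>

/* Per RFC 791, the version is in the high-order four bits of octet 0
 * and the header length (in 32-bit words) in the low-order four bits. */
static unsigned ip_version(const uint8_t *hdr)   { return (hdr[0] & 0xF0) >> 4; }
static unsigned ip_hdr_words(const uint8_t *hdr) { return hdr[0] & 0x0F; }

/* Build octet 0 from a version and a header length in words. */
static void ip_set_vhl(uint8_t *hdr, unsigned v, unsigned hl)
{
    hdr[0] = (uint8_t)(((v & 0x0F) << 4) | (hl & 0x0F));
}
```

A header starting with 0x45 yields version 4 and a header length of 5 words, that is, 20 octets.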

The problem is not a network thing but a C language thing. Quoting from K&R
(2nd ed.):

	Almost everything about [bit] fields is implementation-dependent.
	... Fields are assigned left to right on some machines and right to
	left on others.

The term "machine" here is misleading -- it is actually the implementation of C
on a particular machine that decides what to do with bit fields. One could
imagine two different compilers on the same architecture that did it
differently.

So just get a compiler with inline expansion and a decent optimizer and define
functions to access fields inside of headers.
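Following that suggestion, accessors can be plain inline functions over the raw header bytes. A sketch assuming a C99 compiler (for static inline) and the RFC 791 header layout; the function names are hypothetical. With a decent optimizer each one should reduce to a couple of loads plus shifts and masks:

```c
#include <stdint.h>

/* Total length: octets 2-3, most significant octet first. */
static inline uint16_t ip_total_length(const uint8_t *h)
{
    return (uint16_t)((h[2] << 8) | h[3]);
}

/* Flags: the top 3 bits of octet 6. */
static inline unsigned ip_flags(const uint8_t *h)
{
    return (h[6] >> 5) & 0x07;
}

/* Fragment offset: the remaining 13 bits of octets 6-7. */
static inline uint16_t ip_frag_offset(const uint8_t *h)
{
    return (uint16_t)(((h[6] & 0x1F) << 8) | h[7]);
}
```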

And this htonl()/ntohs() business is a poor solution to the problem. It is too easy to
forget what byte-order a particular int currently is in. In my TCP/IP
implementation (in C++) I have a class that hides all that junk. I just assign
values into number-holders according to what byte order they are in, and
retrieve the values in the appropriate order for manipulation.  This makes
means errors such as calling htons() twice never happen....
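Zweig's class was C++; a rough C analogue wraps network-order values in their own struct type, so the compiler refuses to mix them with host-order integers. This is a sketch, not his code: net16_t, net16_from_host() and net16_to_host() are hypothetical names, and it assumes POSIX htons()/ntohs().

```c
#include <stdint.h>
#include <arpa/inet.h>   /* htons(), ntohs() */

/* A 16-bit value that is always held in network byte order.
 * Because it is a struct, not an integer, code such as
 *     uint16_t x = some_net16;
 * or htons(some_net16) does not compile. */
typedef struct { uint16_t raw; } net16_t;

static net16_t net16_from_host(uint16_t host)
{
    net16_t n = { htons(host) };   /* the only place host -> network happens */
    return n;
}

static uint16_t net16_to_host(net16_t n)
{
    return ntohs(n.raw);           /* the only place network -> host happens */
}
```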

-ynnhoJ redro-etyB

kre@cs.mu.oz.au (Robert Elz) (04/29/91)

mark@alias.com (Mark Andrews) writes:

>Unfortunately, the bit order is machine and compiler dependent.

This is true, in a sense.

But this ...

>On little endian machines, the bit fields are assigned least significant bit
>first (right to left),

>On big endian machines, the bit fields are also assigned least significant
>bit first, but this time the bit fields are assigned left to right:

Is simply wrong, or perhaps inaccurate.  Except on those processors
that have "extract/insert bitfield" instructions, the order in which
bitfields are placed in a byte (or whatever) is purely compiler
dependent - on a little endian host you could assign bit fields either
way, on a big endian host you could assign bit fields either way.

Even with a host with bitfield instructions, the compiler could do
it either way (the bit numbers of the fields will be constant,
whether the compiler emits instructions to extract bits 0-3 or
4-7 when fetching the ip_v field is pretty much irrelevant to
anything - except the expectations of the author of the code).

The ifdef's in the BSD source are just a latent bug waiting to bite
someone who isn't very careful porting the code to a new compiler.
(It just happens to work out right on the compilers the code is
normally compiled with).

>In which order are the bits transmitted such that the integrity of the
>data is not compromised, independent of the endian order?

It depends entirely on the medium over which the data is being sent,
and only on that - on serial (point to point) wires, sync or async,
and on ethernet (ISO 8802/3) the least significant bit is sent
first, on rings the most significant bit is sent first, if you
happen to have an 8 bit parallel bus, then all the bits are sent
simultaneously..., but as long as all the hardware understands this
the most significant bit of a received byte will be the most
significant bit of the transmitted byte, so once the hardware is
designed & built correctly you never need to worry about this.
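A toy model of this point, assuming an idealized shift register rather than the behavior of any particular UART, Ethernet or ring chip: as long as transmitter and receiver agree on the order, every byte arrives intact, and only a mismatch between the two ends would scramble the bits. The function names are hypothetical.

```c
#include <stdint.h>

/* LSB first, as on async serial lines and Ethernet. wire[0] is sent first. */
static void send_lsb_first(uint8_t byte, int wire[8])
{
    for (int i = 0; i < 8; i++)
        wire[i] = (byte >> i) & 1;
}

static uint8_t receive_lsb_first(const int wire[8])
{
    uint8_t byte = 0;
    for (int i = 0; i < 8; i++)
        byte |= (uint8_t)(wire[i] << i);
    return byte;
}

/* MSB first, as on rings. */
static void send_msb_first(uint8_t byte, int wire[8])
{
    for (int i = 0; i < 8; i++)
        wire[i] = (byte >> (7 - i)) & 1;
}

static uint8_t receive_msb_first(const int wire[8])
{
    uint8_t byte = 0;
    for (int i = 0; i < 8; i++)
        byte |= (uint8_t)(wire[i] << (7 - i));
    return byte;
}
```

Sending 0x01 LSB first and receiving it MSB first yields 0x80, which is why the hardware at both ends has to agree.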

On the other hand, you do need to worry about the order of the bytes
wrt their interpretation as multi-byte objects (int's etc), and
the way that the compiler lays out storage in structs, including bits
for bit fields.  Anyone attempting to use struct definitions to
represent network packet formats must have intimate knowledge of the
way the compiler works - and should the compiler decide to change
from one version to another, and the network code breaks because
of that, it's a bug in the net code - not the compiler.
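One way to turn that knowledge of the compiler into something checked on every build, assuming a C11 compiler for _Static_assert: describe the header without bitfields and assert the offsets the wire format requires. struct ip_wire is a hypothetical name, and the checks cover layout only; the 16- and 32-bit fields still hold network byte order, and copying a received packet into the struct with memcpy() avoids the alignment trouble a pointer cast from a packet buffer can cause.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed 20-octet IPv4 header (RFC 791 layout), no bitfields. */
struct ip_wire {
    uint8_t  vhl;    /* version << 4 | header length in 32-bit words */
    uint8_t  tos;
    uint16_t len;    /* network byte order */
    uint16_t id;     /* network byte order */
    uint16_t off;    /* flags + fragment offset, network byte order */
    uint8_t  ttl;
    uint8_t  proto;
    uint16_t sum;    /* network byte order */
    uint32_t src;    /* network byte order */
    uint32_t dst;    /* network byte order */
};

/* If a compiler ever lays the struct out differently, the build fails
 * instead of the network code. */
_Static_assert(sizeof(struct ip_wire) == 20, "IPv4 header must be 20 octets");
_Static_assert(offsetof(struct ip_wire, len) == 2, "len must be at octet 2");
_Static_assert(offsetof(struct ip_wire, ttl) == 8, "ttl must be at octet 8");
_Static_assert(offsetof(struct ip_wire, src) == 12, "src must be at octet 12");
_Static_assert(offsetof(struct ip_wire, dst) == 16, "dst must be at octet 16");
```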

kre

romkey@ASYLUM.SF.CA.US (John Romkey) (04/29/91)

It's true. In the portable UDP stack which Epilogue sells, I
originally had defined the header length and IP version numbers as
bitfields but finally took them out because we found that bitfield
ordering was really compiler dependent instead of processor
dependent. I replaced our bitfields with appropriate masking
operations to avoid the problem.
		- john romkey			Epilogue Technology
USENET/UUCP/Internet:  romkey@asylum.sf.ca.us	voice/fax: 415 594-1141

nreadwin@micrognosis.co.uk (Neil Readwin) (04/30/91)

In article <9104252242.AA28673@taipan.coral.com>, greene@coral.com
(Jeremy Greene) writes:
|> The bottom line is that you only have to worry about bit order if
|> you're working at the MAC layer.

Or reading IEEE documents that specify everything in a counter-intuitive
bit ordering :-\

 Phone: +44 71 528 8282  E-mail: nreadwin@micrognosis.co.uk
 Quote: Everything is a cause for sorrow that my mind or body has made

lance@motcsd.csd.mot.com (lance.norskog) (04/30/91)

mark@alias.com (Mark Andrews) writes:

>I have a question concerning the byte and bit order of fields
>within packet headers. Many of the RFCs (including RFC1060) state rules
>about the byte (octet) order:
> [ use of C bit fields elided ]

The C programming language is missing a lot of useful features;
paradoxically, bit fields should never have been added.  In
particular, they should not be used for expressing the movement
of binary data between machines.  BSD TCP/IP shouldn't have used
bit fields to begin with, and should be rewritten to get rid of
them.  Sorry, I can't volunteer.  You should quit trying to 
use them, and rewrite your code to remove them.

Lance Norskog