[comp.sys.m68k] Intel vs Motorola Byte ordering

swanson@ihlpl.UUCP (Swanson) (11/20/86)

Could someone please explain to me the rationale behind the
way INTEL stores words in memory?  The way Motorola stores
words in memory?  Please email.  Thank you.

Robert Swanson

ihnp4!tss

gnu@hoptoad.UUCP (11/23/86)

In article <1509@ihlpl.UUCP>, swanson@ihlpl.UUCP (Swanson) writes:
> Could someone please explain to me the rationale behind the
> way INTEL stores words in memory?  The way Motorola stores
> words in memory?

If enough people are apathetic (e.g. don't complain), I will post a
great piece, "On Holy Wars and a Plea for Peace", which is the best
description of byte ordering problems I've ever seen.  It was written
by Danny Cohen of USC-ISI, released as an Internet Experiment Note
(IEN-137), and eventually published in Datamation.  It runs about 36K
bytes.  Send me mail if you think I should not post it.

I will post it to mod.sources.doc if I can get any response out of
the moderator.  That group has been inactive for a long time.
-- 
John Gilmore  {sun,ptsfa,lll-crg,ihnp4}!hoptoad!gnu   jgilmore@lll-crg.arpa
    "I can't think of a better way for the War Dept to spend money than to
  subsidize the education of teenage system hackers by creating the Arpanet."

gene@cooper.UUCP (Eugene Kwiecinski ) (11/24/86)

In article <1509@ihlpl.UUCP>, swanson@ihlpl.UUCP (Swanson) writes:
> Could someone please explain to me the rationale behind the
> way INTEL stores words in memory?  The way Motorola stores
> words in memory?  Please email.  Thank you.

To save on a few transistors, of course. (money is money) From a hardware
point of view, it's easier to add the LSBs first and work up to the MSBs
( B = Byte, not bit ).

						Bye,
						Gene


	Usenet (UUCP) Address:
					 cucard\
				    psuvax!cmcl2\
      {psuvax1!princeton, ucbvax!ulysses}!allegra>!phri!cooper!gene
					columbia/
 {decwrl!ihnp4, harvard!seismo, decvax}!philabs/


	(Whew!)

hansen@mips.UUCP (Craig Hansen) (11/26/86)

In article <1509@ihlpl.UUCP>, swanson@ihlpl.UUCP (Swanson) writes:
> Could someone please explain to me the rationale behind the
> way INTEL stores words in memory?  The way Motorola stores
> words in memory?  Please email.  Thank you.

You will undoubtedly get several lame excuses why one ordering is better
than the other. They are both right...and both wrong.  The two conventions
are commonly described as little-endian and big-endian, which is a reference
to the frivolous dispute the Lilliputians engaged in over which end an egg
should be broken on, in Jonathan Swift's _Gulliver's_Travels_.

At this point, a debate between the two conventions is entirely frivolous.
What's important is that there are two distinct conventions, and that this
can be a barrier to porting programs and databases, and data communications
between machines employing different conventions.  At MIPS, we refer to
these conventions as "byte sex," emphasising the two-ness and
incompatibility between them, but begging the question as to which is male
and which is female.  (The MIPS processor is, in this way, bisexual, and can
be configured to follow either convention.)

What I find inexcusable is the existence of machines that mix up the two
conventions on a single machine.  Motorola-endian is little-endian for bits,
and big-endian for most everything else (including the 68020 bit field
operations).  VAX-endian is mostly little-endian, except for floating-point
values which are, well, little-big-endian.

-- 

Craig Hansen			|	 "Evahthun' tastes
MIPS Computer Systems		|	 bettah when it
...decwrl!mips!hansen		|	 sits on a RISC"

bjorn@alberta.UUCP (Bjorn R. Bjornsson) (11/27/86)

In article <1335@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:
> If enough people are apathetic (e.g. don't complain), I will post a
> great piece, "On Holy Wars and a Plea for Peace", which is the best
> description of byte ordering problems I've ever seen.  ......

If I recall correctly, the biggest problem with this paper was
its bias: Cohen expresses a definite preference (not in so many
words, but it shines through) and leaves out some good arguments
for the little-endian side.  I'm not unbiased either, but I certainly
don't pretend to be.  I'll elucidate if this discussion gets off the
ground again.

Then again, I can work quite comfortably with either byte ordering,
and do, on Suns and Vaxen, many times with applications that are
sensitive to the particular order.  When it's an issue, big-endian
usually makes things a little bit more interesting (if you have
trouble disposing of your free time, that is).


				Bjorn R. Bjornsson
				alberta!bjorn

lamaster@nike.uucp (Hugh LaMaster) (12/04/86)

In article <138@pembina.alberta.UUCP> bjorn@alberta.UUCP (Bjorn R. Bjornsson) writes:
>In article <1335@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:
>> If enough people are apathetic (e.g. don't complain), I will post a
>> great piece, "On Holy Wars and a Plea for Peace", which is the best
>> description of byte ordering problems I've ever seen.  ......
>
 .....
> ....                             and leaves out some good arguments
>for the little-endian side.  I'm not unbiased either, but I certainly
>don't pretend to be.  I'll elucidate if this discussion gets off the
>ground again.
>
>				Bjorn R. Bjornsson
>				alberta!bjorn


I, for one, would like to hear some good arguments for or against
a particular byte ordering.  It is my belief that there is no
intrinsic architectural reason for either one.  However, I am an
unapologetic big-endian for two reasons:
1)  A STANDARD is needed for the benefit of those of us who need
to move BINARY data files between machines of different types,
such as graphics and solids modeling data files;
2)  Big Endian is easier to read for English speaking people
because characters and floating point are in the same order as
in English.  (Has anyone ever wondered why we don't write 1 Million
as 000,000,1    ?)
But are there any intrinsic reasons for a particular order?  Some
people seem to think so.  What are they?




   Hugh LaMaster, m/s 233-9,   UUCP:  {seismo,hplabs}!nike!pioneer!lamaster 
   NASA Ames Research Center   ARPA:  lamaster@ames-pioneer.arpa
   Moffett Field, CA 94035     ARPA:  lamaster%pioneer@ames.arpa
   Phone:  (415)694-6117       ARPA:  lamaster@ames.arc.nasa.gov

"He understood the difference between results and excuses."

("Any opinions expressed herein are solely the responsibility of the
author and do not represent the opinions of NASA or the U.S. Government")

fouts@orville (Marty Fouts) (12/04/86)

A number of people have made claims along the lines
that BIG ENDIAN is "easier to read because it's like English."  This is
perhaps an oversimplification.  You can write a memory dump program to
present data in whichever format most amuses you, and I have seen terrible
examples of several possible formats, my favorite being the one which
gives lines of hex bytes alongside lines of the ascii character codes
for the same memory addresses, with the hex reading right to left and the
ascii reading left to right, like:

20 6e 69 74 72 61 4d 20   Martin 
20 20 20 73 74 75 6f 46  Fouts

Obviously, any of the four combinations LL, LR, RL, RR could have been coded,
independent of the wordsize and byte ordering of the machine in question.
Three of the four would be hard to read compared to the fourth, depending on
who you are.

I don't believe that there is an overriding hardware or software architectural
requirement that makes one byte ordering obviously right.  There are
application and implementation dependent factors favoring either, depending
on the circumstance.  (And of course, there is always the 60 bit word length
machine :-)

A standard would be nice, but it's probably too late for that.  (Anybody care
to discuss the superiority of EBCDIC over ASCII?)  I guess we should just be
happy that there aren't more byte-within-word order choices being made.

tim@amdcad.UUCP (Tim Olson) (12/05/86)

In article <791@nike.UUCP> lamaster@pioneer.UUCP (Hugh LaMaster) writes:
>I, for one, would like to hear some good arguments for or against
>a particular byte ordering.  It is my belief that there is no
>intrinsic architectural reason for either one.  However, I am an
>unapologetic big-endian for two reasons:
>1)  A STANDARD is needed for the benefit of those of us who need
>to move BINARY data files between machines of different types,
>such as graphics and solids modeling data files;
>2)  Big Endian is easier to read for English speaking people
>because characters and floating point are in the same order as
>in English.  (Has anyone ever wondered why we don't write 1 Million
>as 000,000,1    ?)
>But are there any intrinsic reasons for a particular order?  Some
>people seem to think so.  What are they?

I think a major reason why many microprocessors are little-endian is that
they have an 8-bit ancestry, and to perform multi-precision arithmetic
efficiently, they must index from the least-significant byte to the
most significant.  However, this is less of a problem with larger word size
microprocessors, since multi-precision arithmetic is not used as much past
16 or 32 bits.

One benefit of big-endian byte ordering on large (32-bit or more) wordsize
machines is possible fast lexicographical comparison of character data with
the use of integer compare instructions.  Since the "direction" of MSB to
LSB (B = byte) is the same as MSb to LSb (b = bit), strings may be compared
a word at a time instead of byte-by-byte.  (That is, as long as you use a
sane character encoding, not something like the CDC display-codes ;-)

	-- Tim Olson
	Advanced Micro Devices
	
"byte ordering preferences expressed in this article do not necessarily
represent the views of this station or its management"

mayer@rochester.ARPA (Jim Mayer) (12/05/86)

I have never been convinced of any fundamental reason to prefer one
byte ordering over another; however, I believe there are some practical
ones.  Basically, any networked machine that uses a different byte
order than the network(s) it is connected to will pay a (possibly
significant) performance penalty.  Furthermore, code that is written to
run on both byte orders will always pay some penalty even if run on
the "right" machine.  The rest of this article contains an example,
a possible way out of the mess, an observation, and a question.

THE EXAMPLE:

Suppose a C program reads a message into a structure (let's assume
problems with byte size and alignment have gone away).  A correct
program has two choices: it can convert the structure to the machine's
byte order, or it can leave the structure in network order and convert on
each reference.  In the first case, there are three options:

(1)	struct message { short x; long y; } m;
	m.x = ntohs(m.x); m.y = ntohl(m.y);

(2)	if (host byte order is not network byte order) {
		m.x = ntohs(m.x); m.y = ntohl(m.y);
		}

(3)	if (sending machine byte order is not host machine order) {
		m.x = ntohs(m.x); m.y = ntohl(m.y);
		}

In (1), there is a constant penalty of at least one unnecessary copy.
Unnecessary copy operations can quickly destroy the performance of a
message passing system.  Also, any missing swap operations will not be
detected on a machine with network byte order.  Case (2) assigns the
penalty where it is due, but opens up even more possibilities for
errors.  Case (3) uses the same trick the X display server uses:
accept messages in either byte order.  The code ends up being similar
to (2), but swapping is only done if the sending machine has a
different byte order than the receiving machine.  It has the same
testability problems as (2).

If the structure is maintained in network byte order, then each reference
to the structure entails a possible conversion.  The possibilities for
programmer error are quite large here as well.

THE SUGGESTION:

All of the above solutions are error prone when written in a language like
C that has no notion of byte order.  Languages like CLU and C++ offer
the possibility of encapsulating most of the nasty conversions.  Another
possibility is the addition of a "message" type constructor, analogous to
"struct" in C, but maintaining a particular byte order (and floating
point representation, etc.) and either prohibiting or interpreting correctly
things like pointer references.  Adding a "message" construct would be
syntactically prettier (I think) than forcing a lot of calls to "m.get_x()"
and "m.set_x(value)".  It would also help with other representation issues
(like value alignment, byte and word size, and floating point).

THE OBSERVATION:

I can easily envision a Load/Store architecture machine with a completely
bisexual instruction set.  Only the load and store operations would have
to be modified.  I understand from other postings that the MIPS processor
can already be configured in either mode.

THE QUESTION:

TCP and friends use a "big-endian" order (since TCP is a byte stream
protocol this applies only to things like addresses and headers).  What
order do other protocol families use?  Are there any major
"little-endian" protocol families?

-- Jim Mayer					Computer Science Department
(arpa) mayer@Rochester.EDU			University of Rochester
(uucp) rochester!mayer				Rochester, New York 14627

billw@navajo.STANFORD.EDU (William E. Westfield) (12/09/86)

Network interfaces should do any byte swapping that may be necessary for
a given machine.  That way there isn't any speed penalty.

(having to byte swap the headers only is not so bad - note that the
 ones complement checksum used in TCP/IP commutes with byte
 swapping...)

BillW

rb@cci632.UUCP (Rex Ballard) (12/10/86)

There are several protocols that are non-endian, such as AX.25.
Most such protocols simply establish a hierarchy of addresses.
This hierarchy simplifies routing significantly, since only
the machine that matches the first ID will have to look at the
second...

In general, there is no "best" endian approach.  Little-endian
machines have advantages in the integer processing of words larger
than their registers/backplane, while big-endian has advantages
in raw search/compare/sort type applications.  Even graphics
isn't "cut and dried" in favor of either one, since there is
lots of shifting (which gives big-endian an advantage), and
lots of integer computation (which favors little-endian).

The worst, of course, is "middle-endian", such as that used on
the PDP-11 and some 808X machines for longs.

When discussing network protocols, or data interchange formats,
it is a good idea to avoid "endian" thinking at all.  Bytes
with "extension bits", or just raw bytes, are definitely preferable.
One psychological factor that works well is to think in terms of
"system; node" rather than "nodelow; nodehigh" type labelling.

Internally, so long as the machine is consistent, either end
is acceptable, based on the trade-offs desired for the application.

Rex B.

ihm@minnie.UUCP (Ian Merritt) (12/11/86)

>There are several protocols that are Non-endian, such as AX.25.
>Most such protocols simply establish a hierarchy of addresses.
>This hierarchy simplifies routing significantly, since only
>the machine that matches the first ID will have to look at the
>second...

What you describe sounds like IP source routing, where each node
examines the address chain, strips the first address and sends the rest on
to that address.  Still, if the basic address size is larger than the CPU's
basic word (as it will certainly be in any network with more than 256
nodes, or at least 256 nodes per hierarchical subnet), the issue of how
to combine the basic units at the destination is no more well established
by your suggestion than by the existing standards.

>
>In general, there is no "best" endian approach.  Little-endian
>machines have advantages in the integer processing of words larger
>than their registers/backplane, while Big-endian has advantages
>in raw search/compare/sort type applications.  Even graphics
>isn't "cut and dried" in favor of either one, since there is
>lots of shifting (which gives Big-endian an advantage), and
>lots of integer computation (which favors Little-endian).

Agreed, mostly, though I am somewhat partial (as I have mentioned
before) to the Little endian school.

>
>The worst of course is "middle-endian", such as that used on
>the PDP-11 and some 808X machines for longs.

"middle-endian", you seem to use as a generic form for describing
anything that is not consistently either little or big.  I disagree with
the term, but not with the conclusion you draw.  To what 808x machines
do you refer?

>
>When discussing network protocols, or data interchange formats,
>it is a good idea to avoid "endian" thinking at all.  Bytes
>with "extension bits", or just raw bytes, are definitely preferable.

I can't agree that extension bits are preferable, and in any case, as I
mentioned above, there is often a need for numbers larger than can be
represented in a single byte.  There is no need to constrict the
capability of the protocol just because it's "hard" to juggle the bytes
around to accommodate a particular machine.  If computers aren't here to
juggle bits and bytes, what ARE they for? Granted it would save time in
this case if every machine had the same standard, but since we can't
agree on a uniform standard, juggling is likely to be here to stay for a
while at least.

>Some psychological factors which work well, is to think in terms of
>"system; node" rather than "nodelow; nodehigh" type labelling.

Again, source routing: not always useful; not necessarily byte oriented.

>
>Internally, so long as the machine is consistent, either end
>is acceptable, based on the trade-offs desired for the application.

Agreed, but there are very few machines that are totally consistent
about it.

>
>Rex B.


Cheerz--
						<>IHM<>
-- 

uucp:	ihnp4!nrcvax!ihm