[comp.arch] byte order: be reasonable - do it my way...

urip@orcisi.UUCP (01/08/87)

Although it's coming a little late, and some readers may have forgotten the 
original article (On Holy Wars and a Plea for Peace by Danny Cohen) by now, 
I still hope that my article will find enough of an audience.

My point is that the Least-Significant-Byte-first camp (LSBians,
pronounced: elesbians) has a more correct way than the Most-Significant-byte-
first (MSBians, pronounced: emesbians), and I am going to try to convince
the MSBians to go my way. 

Before I start with the main issue, let me comment about the side issue.
As someone whose native language is Hebrew and who also knows some
Arabic from school, I would like to confirm almost everything that was said 
in the article and in the responses about the order of digits etc. including
the examples from the Bible and computer terminals in Arabic/Hebrew. 
There was a slight inaccuracy about the way numbers are read in Arabic:
Only the units and tens are read the LSBian way, and the rest of
the number is read the MSBian way. For example, the year 1984 is read: 
"one thousand nine hundreds four and eighty". Also, for those who don't know,
the digit characters in Arabic are different from the Latin forms, but
in Hebrew they are the same.

The article was written in 1980, and things have changed since then.
Six years are a lifetime in the world of computers, and sentences like:

"I failed to find a Little-Endians' system which is totally consistent"

cannot be left without an objection in 1986 (almost 1987).
The Intel 80*86 microprocessors are true, consistent LSBians. They do not
have combined shift operations (the article suggested these as a good criterion
to tell between LSBians and MSBians), but the multiply operation sure leaves
the most significant part of the result in the high register, and the floating
point format is totally consistent with the rest of the data types.
The same is true for the National Series 32000 and I believe that Zilog
is with the LSBians too.

So in the microprocessor area, it seems that the Motorola 68000
is the only (though major...) MSBian around. Now, is it really as clean
and pure an MSBian as claimed in the article? Let me refresh your memory
with a quote from the article:

"Hence, the M68000 is a consistent Big-Endian, except for its bit
designation, which is used to camouflage its true identity.
Remember: the Big-Endians were the outlaws."

The author did not try to claim that the funny floating point format
of the VAX was to camouflage the VAX's true identity, so why should one believe
that the LSBian bit order of the M68000 is because "the Big-Endians were
the outlaws"? I suspect that the true reason behind the inconsistency 
of the M68000 is that only with an LSBian bit order is the value of 
bit number 'i' in a word always equal to  

		b[i] * 2^i

(where b[i] is 0 or 1 according to bit number 'i', and 2^i is 2 to the power 
of i) 
and the designers of M68000 wanted to keep this important feature in spite of 
the overall MSBian architecture.


There is another difference between LSBian and MSBian memory order that
was not mentioned in the article. 
In the LSBian scheme, if a long-word in memory contains a small value,
then a word or a byte at the same memory location still holds the same
value (if the value is small enough to fit into these).
For example, assume we have the value 0x00000002 in a (32 bit) long-word
in memory address 100.

                LSB in lower address

address	       104  103  102  101  100
                +----+----+----+----+
value           | 00 | 00 | 00 | 02 | 
                +----+----+----+----+

Note that a long-word, short word, byte and even nibble at address 100, all
contain value 2.
On the other hand,

                MSB in lower address

address	       100  101  102  103  104
                +----+----+----+----+
value           | 00 | 00 | 00 | 02 | 
                +----+----+----+----+

Note that only a long-word at address 100 contains 2. All the rest contain 0.

This may not seem to be a key issue, but it has some significance in type 
conversion as illustrated by the following C program segment:

/*=================================*/
int	i;
char	ch;

ch = i;
/*=================================*/

The 'int' (assume int is 32 bits) value has to be converted to 'char' or byte. 
In LSBian, this conversion is just a simple 'movb' (move byte) instruction 
from 'i' to 'ch':

		movb	i, ch

since both byte and long-word contain the same value. 

In MSBian it may involve an expensive bit field instruction (or worse, 
shifts and ands). Luckily for the M68000, it is byte addressable, so the 
compiler can do the trick and generate:

		movb	i+3, ch

So it is still a simple machine instruction, but it involves a small trick.
Not clean, but still consistent, as long as we stick to byte addressable
memory. 

But what about registers? Registers are not byte addressable. 
There is only one byte of a register that can be accessed by a 'movb' 
instruction. All the other 3 bytes can be accessed only through bit field
instructions (or worse, shifts and ands). 

Let's look at another program segment:

/*=================================*/
extern  int fgetc();
char	ch;

ch = fgetc(file);
/*=================================*/

The C library routine 'fgetc' returns an 'int' result and it has to be
converted to 'char'. Most implementations return function results in 
register 0. 

Assume that register D0 contains 'int' (32 bits) value 2,
and so does the long-word at address 100.

                MSB in lower address

address	       100  101  102  103  104
                +----+----+----+----+
value           | 00 | 00 | 00 | 02 | 
                +----+----+----+----+
                +----+----+----+----+
register D0     | 00 | 00 | 00 | 02 | 
                +----+----+----+----+

The instructions

	movl	100,x
	movl	D0,x

both move a long-word containing value 0x00000002 to location 'x'.
so
	movb	100,x
	movb	D0,x

both should move a byte containing value 00 to location 'x'.
So the code generated for the above program segment on a truly consistent
MSBian machine would be:

	jbsr	fgetc
	movl	#24,d1
	lsrl	d1,d0
	movb	d0,ch

But in the M68000 this is not true. No shift operation is needed, because
a 'movb' instruction with a register operand takes the byte that contains 2,
that is, the HIGH-address byte, so the compiler can generate a

	jbsr	fgetc
	movb	d0,ch

In other words, we see that the byte/word/long-word overlap of registers 
in the M68000 is implemented according to the more efficient LSBian way!!


Conclusion:
==========

I have shown that there are two aspects in which the LSBian way is more 
suitable and more efficient for binary computers. Even an MSBian machine
like the M68000 is LSBian in these aspects. This is in addition
to the argument of easier serial addition and multiplication that was mentioned
in the article (though the latter is balanced, to some extent, by serial 
comparison and division).

The main argument left against the LSBians is the more readable MSBian
dump format. I think that in the modern days of optimizing compilers and 
symbolic debuggers, dumps are almost an extinct species, and please 
let them stay that way. 

I don't have any illusions. I don't expect Motorola to change their byte order
after reading my article. I don't even expect users to prefer LSBian 
machines just for the sake of beauty and consistency. 
But I do hope that some day the LSBian method will prevail (or, maybe,
someone will convince me of the superiority of the MSBian method...).


Uri Postavsky (utcs!syntron!orcisi!urip)

		(currently with O.R.C Toronto,
		  formerly with National Semiconductor Tel Aviv).


mwm@cuuxb.UUCP (01/11/87)

In article <760@orcisi.UUCP> urip@orcisi.UUCP writes:
>My point is that the Least-Significant-Byte-first camp (LSBians,
>pronounced: elesbians) has a more correct way than the Most-Significant-byte-
>first (MSBians, pronounced: emesbians), and I am going to try to convince
>the MSBians to go my way. 
>...
>"Hence, the M68000 is a consistent Big-Endian, except for its bit
>designation, which is used to camouflage its true identity.
>Remember: the Big-Endians were the outlaws."
>
>The author did not try to claim that the funny floating point format
>of the VAX was to camouflage the VAX's true identity, so why should one believe
>that the LSBian bit order of the M68000 is because "the Big-Endians were
>the outlaws" ? I suspect that the true reason behind the inconsistency 
>of the M68000 is the fact that only with an LSBian bit order, the value of 
>bit number 'i' in a word is always equal to  
>
>		b[i] * 2^i
>
>(where b[i] is 0 or 1 according to bit number 'i', and 2^i is 2 to the power 
>of i) 
>and the designers of M68000 wanted to keep this important feature in spite of 
>the overall MSBian architecture.

	The bit notation is strictly notational, and has no bearing on
	the operation of the cpu.   One could go entirely through any
	MC68000 book and replace the bit-numbers appropriately throughout.

	Bit notation (what way you number which bits) has no relation
	to byte ordering.  I could number my bits:
		13579BDF2468ACE
		000000000000001
	to represent the number 1 if I so chose.  Saying that byte ordering
	that proceeds in the opposite direction to bit ordering is
	somehow "inconsistent" is part of your argument; I would maintain
	that it is moot.
> [ long disscussion of how in LSBian machines, a movebyte 2,x and a
> movelong 2,x generate the same bits at location x]
>
>This may not seem to be a key issue, but it has some significance in type 
>conversion as illustrated by the following C program segment:
>
>{int i; char ch; ch = i;}
>
>The 'int' (assume int is 32 bits) value has to be converted to 'char' or byte. 
>In LSBian, this conversion is just a simple 'movb' (move byte) instruction 
>from 'i' to 'ch':
>
>		movb	i, ch
>
>since both byte and long-word contain the same value. 
>
>In MSBian it may involve an expensive bit field instruction (or worse, 
>shifts and ands). Luckily for the M68000, it is byte addressable, so the 
>compiler can do the trick and generate:
>
>		movb	i+3, ch
>
>So it is still a simple machine instruction, but it involves a small trick.

Actually, what should be done, in general, is something similar to the VAX
trick of having a conversion instruction, and do a

	cvt_lb	i,ch

Unfortunately, in the 68000 the type-conversion (sign extension/truncation)
operations only work between registers, and one is forced to resort to
kluges like mov.b i+3,ch in order to do the job efficiently.

>Not clean, but still consistent, as long as we stick to byte addressable
>memory. 
>
>But what about registers? registers are not byte addressable. 
>There is only one byte of a register that can be accessed by a 'movb' 
>instruction. All the other 3 bytes can be accessed only through bit field
>instructions (or worse, shifts and ands). 

This is true of big-endian and little-endian machines which are byte 
addressable....

>Let's look at another program segment:
>
> {extern  int fgetc(); char ch; ch = fgetc(file);}
>
>The C library routine 'fgetc' returns an 'int' result and it has to be
>converted to 'char'. Most implementations return function results in 
>register 0. 
>
>Assume that register D0 contains 'int' (32 bits) value 2,
>and so does the long-word at address 100.
>
>                MSB in lower address
>
>address	       100  101  102  103  104
>                +----+----+----+----+
>value           | 00 | 00 | 00 | 02 | 
>                +----+----+----+----+
>                +----+----+----+----+
>register D0     | 00 | 00 | 00 | 02 | 
>                +----+----+----+----+
>
>The instructions
>
>	movl	100,x
>	movl	D0,x
>
>both move a long-word containing value 0x00000002 to location 'x'.
>so
>	movb	100,x
>	movb	D0,x
>
>both should move a byte containing value 00 to location 'x'.

Okay so far; both big-endian and little-endian machines do this, so long
as the definition of "location x" includes a *type*.  Whether that type
is a 17 bit wide field or a 1 byte wide field, in either machine, locations
need types to make them consistent.  Type conversion could even be implemented 
to generate an exception on truncation of significant bits, and to sign extend 
appropriately.

>So the code generated for above program segment in a true consistent
>MSBian machine would be:
>
>	jbsr	fgetc
>	movl	#24,d1
>	lsrl	d1,d0
>	movb	d0,ch

Huh?  Once again you are defining "consistent" to suit your argument against
MSBian machines.
>
>But in the M68000 this is not true. No shift operation is needed, because
>a 'movb' instruction with a register operand takes the byte that contains 2,
>that is, the HIGH-address byte, so the compiler can generate a
>
>	jbsr	fgetc
>	movb	d0,ch

Once again you miss the fact that the longword at location x is different from 
the character at location x.  (This is, incidentally, a register-based type
conversion, treating d0 first as a long and then as a character, as per my
discussion above. The code should read:
	jbsr fgetc
	cvt_lb d0,ch
It just so happens that a movb on a long word *does* convert long to byte.)
>
>In other words, we see that the byte/word/long-word overlap of registers 
>in the M68000 is implemented according to the more efficient LSBian way!!
>
Wait a minute.  First you claim that the MC68000 is MSBian, but that its
*registers* are LSBian????  What does byte order in memory have to do with 
registers???  Are you saying that the 68000 should load bytes into the
high order bits of a register rather than the low order bits in order to
be "consistent"?
>
>Conclusion:
>==========
>
>I have shown that there are two aspects in which the LSBian way is more 
>suitable and more efficient for binary computers. This is in addition
>to the argument of easier serial addition and multiplication that was mentioned
>in the article (though the latter is balanced, to some extent, by serial 
>comparison and division).
>
>The main argument left against the LSBians is the more readable MSBian
>dump format. I think that in the modern days of optimizing compilers and 
>symbolic debuggers, dumps are almost an extinct species, and please 
>let them stay that way. 
>
>I don't have any illusions. I don't expect Motorola to change their byte order
>after reading my article. I don't even expect users to prefer LSBian 
>machines just for the sake of beauty and consistency. 
>But I do hope that some day the LSBian method will prevail (or, maybe,
>someone will convince me of the superiority of the MSBian method...).
>
Unfortunately that won't happen either -- the point I am trying to make
is that for computational purposes, the byte-ordering question is 
moot.  When you introduce the paradigm of typed memory locations and
type conversion instructions, the two systems become computationally
equivalent.
>
>Uri Postavsky (utcs!syntron!orcisi!urip)
>
>		(currently with O.R.C Toronto,
>		  formerly with National Semiconductor Tel Aviv).
-- 
 Marc Mengel
 ...!ihnp4!cuuxb!mwm

amos@instable.UUCP (Amos Shapir) (01/11/87)

When I was working on CCI's 6/32, which is basically a vax-like architecture
turned MSB-ian, the problem that Uri mentioned (partial read/write to
registers) was hard to resolve. The result: on CCI 6/32, a register is
always long; that requires sign-extension, but that doesn't take more
time than a 'movb'.

-- 
	Amos Shapir
National Semiconductor (Israel)
6 Maskit st. P.O.B. 3007, Herzlia 46104, Israel
(011-972) 52-522261  amos%nsta@nsc 34.48'E 32.10'N

mangler@cit-vax.Caltech.Edu (System Mangler) (01/12/87)

The advantage of little-endian is that it allows type punning,
giving the optimizer a little something extra to work with.

The advantage of big-endian is that it disallows type punning,
making it easy to catch non-portable programming practices.

Don Speck   speck@vlsi.caltech.edu  {seismo,rutgers,ames}!cit-vax!speck

braun@drivax.UUCP (Karl T. Braun (kral)) (01/12/87)

In article <760@orcisi.UUCP> urip@orcisi.UUCP writes:
- I think that in the modern days of optimizing compilers and 
- symbolic debuggers, dumps are almost an extinct species,...
- 	:
- 	:
- I don't have any illusions.

'nuf said.


-- 
kral		408/647-6112		...!{amdahl,ihnp4}!drivax!braun
"Who is Number One?"			"You are, Number Six!"

keithe@tekgvs.UUCP (Keith Ericson) (01/12/87)

Who stinkin cares?
As you stated in the lead in to your article, you're too late.
There's Motorola's way and there's Intel's way. Your way, except
that it resembles one of the above, is irrelevant.

radford@calgary.UUCP (Radford Neal) (01/14/87)

In article <1011@cuuxb.UUCP>, mwm@cuuxb.UUCP (Marc W. Mengel) writes:
> ... The point I am trying to make
> is that for computational purposes, they byte-ordering question is 
> moot.  When you introduce the paradign of typed memory locations and
> type conversion instructions, the two systems become computationally
> equivalent.

Not quite. Consider the following C code:

    int i;
    char c;

    void proc(x)
      char *x;
    { ...
      access to *x;
      ...
    }

    ...

    c = 'a';
    proc(&c); 
    i = 'a';
    proc(&i);

The two calls to 'proc' both deliver parameters for which *x is 'a' on a
little-endian machine, but they don't on a big-endian machine. The little-
endian method thus allows a certain flexibility in typing that the
big-endian method doesn't. This is NOT just a notational matter.

I do not recommend use of this technique in C code, due to the obvious
portability problems. But if little-endian memory were entirely standard it
might be a useful technique, though really you want a more general run-time
typing scheme if you want to do this sort of thing.

    Radford Neal
    The University of Calgary

bjorn@alberta.UUCP (Bjorn R. Bjornsson) (01/14/87)

In article <753@vaxb.calgary.UUCP>, radford@calgary.UUCP (Radford Neal) writes:
> The two calls to 'proc' both deliver parameters for which *x is 'a' on a
> little-endian machine, but they don't on a big-endian machine. The little-
> endian method thus allows a certain flexibility in typing that the
> big-endian method doesn't. This is NOT just a notational matter.

Yes, this is what I call a subset property.  Of course Cohen because
of his bias never bothered to document this in his supposed plea for
peace.  This property was exhibited by all the usual data types on
a VAX until G and H format floating point came into general use.

Cohen focused on the fact that PDP-11/VAX floating point data rep-
resentations were not strictly little endian.  He was right of course
but I think that the subset property is worth mentioning.  If you
expect an F float by reference in a procedure, it's quite alright
for the caller to pass you a D float, what gets lopped off is the
least significant (substitute interesting for significant when
talking about integers).  This subset property is lost in a purely
little endian machine, such as the *86 from Intel.  What you get
there is pure garbage if a caller isn't careful to pass what's
expected.

But then a language that provides a reasonable facsimile of
IEEE floating point arithmetic on a VAX has to forego the
subset property, sacrificing the equal width exponents of
F and D, in favor of the greater exponent range of G format
floating point for double precision implementation.

So in the final analysis enter one more alternative into
the holy war:  The location invariant subset property.


	When my buss pass runs out,
	should I run out after it,
	should I pass,
	or just buzz?

			Bjorn R. Bjornsson
			alberta!bjorn

wsr@lmi-angel.UUCP (Wolfgang Rupprecht) (01/14/87)

In article <> mwm@cuuxb.UUCP (Marc W. Mengel) writes:
>	The bit notation is strictly notational, and has no bearing on
>	the operation of the cpu.   One could go entirely through any
>	MC68000 book and replace the bit-numbers appropriately throughout.
>
>	Bit notation (what way you number which bits) has no relation
>	to byte ordering.  I could number my bits:
>		13579BDF2468ACE
>		000000000000001
>	to represent the number 1 if I so chose.  Saying that byte ordering
>	that proceeds in the opposite direction to bit ordering is
>	some how "inconsistent" is part of your argument, I would maintain
>	that it is moot.

Unfortunately that's not the case. Look at the BSET, BCLR, BTST (bit
set, clear and test) instructions. If you do a:

	BSET.L d0, somelocation

with #31 in d0, you have set the MSB (byte 0!) of the long-word. This
is *LOW* endian.
-- 
Wolfgang Rupprecht	{harvard|decvax!cca|mit-eddie}!lmi-angel!wsr

joel@gould9.UUCP (Joel West) (01/16/87)

The 68000 is big-endian, as is the 68020.

Recently I saw the suggestion that the 68008 is little-endian,
except when fetching opcodes.  Could this be true?

If so, does anyone have any idea how many changes would be required
to programs and an OS to make them work on both processors?  It
boggles my mind...
-- 
	Joel West			     MCI Mail: 282-8879
	Western Software Technology, POB 2733, Vista, CA  92083
	{cbosgd, ihnp4, pyramid, sdcsvax, ucla-cs} !gould9!joel
	joel%gould9.uucp@NOSC.ARPA

lamaster@pioneer.UUCP (01/16/87)

In article <112@lmi-angel.UUCP> wsr@lmi-angel.UUCP (Wolfgang Rupprecht) writes:
>In article <> mwm@cuuxb.UUCP (Marc W. Mengel) writes:
>>	The bit notation is strictly notational, and has no bearing on
>>	the operation of the cpu.   One could go entirely through any
>>	MC68000 book and replace the bit-numbers appropriately throughout.
>
>Unfortunately that's not the case. Look at the BSET, BCLR, BTST (bit
>set, clear and test) instructions. If you do a:
>-- 
>Wolfgang Rupprecht	{harvard|decvax!cca|mit-eddie}!lmi-angel!wsr

There is a difference between a) the data formats  b)  addressing
conventions   c) instructions  d) notation and documentation.  
Few machines are consistently big-endian
or little endian on all four.  The CDC Cyber 205 is one of the few
machines that is consistently big-endian in ALL aspects.  It is BIT
addressable, byte addressable, 32-bit word addressable, and 64-bit
word addressable, with completely mutually consistent addresses for
each.  And, a plus, it even shows the formats in the documentation
numbered from the left.  I believe that the National Semiconductor
NS 32000 series MAY be a completely consistent little endian.  If it
is, then that too would be something of an accomplishment.
(Remembering what Emerson said about consistency, it is still
a valuable property of computer instruction sets).

The Motorola MC68000 series is big-endian in data formats, but is
inconsistent in instructions and addressing.  The VAX machines are
distinctive in that they are not even consistent in data formats.
I am not sure about the Intel machines.  Perhaps an Intel booster
could enlighten me.

As one of the instigators of the current discussion, I would like to
summarize some of what I have learned from subsequent arguments:
a) Some very very small machines may save a little bit of temporary
register space by being little endian (I looked up some hardware 
diagrams and discovered that none of the more recent PDP-11's used
reduced size memory data registers, but some of the early small
machines might have...I don't know)
b) Some variable length memory to memory instructions could be
benefited by using little endian notation.
c) Most people assume that there are other easy to prove properties
of "their" personal favorite ("If God had wanted us to use Hex, He
would have given us 16 fingers").  To date, I have not seen any such
argument that holds water. 
d) Some people believe that systems programmers don't have to read
dumps (Ha Ha Ha!).  (The same people believe that Pascal HAS ALREADY
REPLACED Cobol and Fortran :-) :-) )

As I said before, those of us who work in multi-user environments
need to be able to move binary data files between machines.  The
IEEE floating point standard was the declaration of independence;
now we are fighting the war.  Big manufacturers will never willingly
make it easier to move data between machines.  Fortunately, small
manufacturers will.  It is important that a standard for the mapping
of integers and floating point numbers onto bits, bytes, and words,
be developed.  Before this effort can succeed, it may be necessary
for microprocessor and workstation manufacturers to agree.  Why not
form an IEEE committee?

One last point:  As a systems programmer, I think it would be nice
if data formats, addressing, instruction formats, and notation were
all consistently big-endian or little-endian.  As a user, all I really
care about is that the data formats are consistently one way or the
other on ALL machines.  All the mainframes that I know of have
big-endian data formats; so does the (by far) most popular
microprocessor
for engineering and scientific use (MC68020).  That gives big-endians
a big head start.  But the (to be formed) committee may vote the other
way.  As long as a standard is chosen, I don't care particularly.   

By the way, for those that missed it, the Danny Cohen article that is
often referred to is in the October '81 IEEE Computer magazine.



   Hugh LaMaster, m/s 233-9,   UUCP:  {seismo,hplabs}!nike!pioneer!lamaster 
   NASA Ames Research Center   ARPA:  lamaster@ames-pioneer.arpa
   Moffett Field, CA 94035     ARPA:  lamaster%pioneer@ames.arpa
   Phone:  (415)694-6117       ARPA:  lamaster@ames.arc.nasa.gov

"He understood the difference between results and excuses."

("Any opinions expressed herein are solely the responsibility of the
author and do not represent the opinions of NASA or the U.S. Government")

jer@ipmoea.UUCP (Eric Roskos) (01/17/87)

uri@orcisi.UUCP writes:

>My point is that the Least-Significant-Byte-first camp (LSBians,
>pronounced: elesbians) has a more correct way than the Most-Significant-byte-
>first (MSBians, pronounced: emesbians), and I am going to try to convince
>the MSBians to go my way. 

It seems to me that the basic premise of the argument that follows the above
statement is that the name (memory address) bound to a long word in memory
is different from the name bound to its least significant byte, whereas
the same name (register number) references a longword (for a
long register operand) or its least significant byte (for a byte operand).
The poster argues that this is inconsistent.

In fact, the problem (and the cause of the whole debate, I often think) is that
separate bytes in a longword in memory have different names bound to them
at all.  What is the 2nd byte of a 4-byte integer good for, on its own?
Why should it be separately addressable at all?  

In the 68000, registers, however transiently, have a type associated with
them that determines their length; but the individual bytes of a register,
when it's a longword register, do not have separate names.
Register 5 is register 5, though its size varies depending on its type
(which persists only for the duration of 1 instruction).  Most problems
arise when you take a datum of one type and suddenly start treating it
as a different type; that is the real inconsistency.

jbn@glacier.STANFORD.EDU (John B. Nagle) (01/17/87)

big-endian, but TP4/MAP is little-endian. 
      Incidentally, there are some fields in X.25 control frames where the
BITS are in reverse order, for obscure historical reasons.

					John Nagle

radford@calgary.UUCP (01/17/87)

In article <980@gould9.UUCP>, joel@gould9.UUCP (Joel West) writes:
> The 68000 is big-endian, as is the 68020.
> 
> Recently I saw the suggestion that the 68008 is little-endian,
> except when fetching opcodes.  Could this be true?

No, it couldn't be true. A more ridiculous design decision would be hard
to imagine. Come to think of it, that may not be all that strong an 
argument :-)

Anyway, I've actually used both 68000 and 68008 processors, and can assure
you that they are pretty much compatible. According to the documentation,
they even went so far as to have the 68008 trap word accesses at an odd
address (I didn't actually try it). Personally, I think this was going 
too far, especially as (I am given to understand) the 68020 doesn't. (I
know, I know, the 68008 came earlier...)

    Radford Neal
    The University of Calgary

papowell@umn-cs.UUCP (Patrick Powell) (01/18/87)

In order to spare the innocent,  I will briefly note:

mumble proc(x)
	char *x;
	{ .... }


int i;
char c;
(void) proc(&c); <--O.K.
(void) proc(&i); <--SSSSSSSSSS BOOOOOO

Since when do you assume that a pointer to int has the same FORM
as a pointer to char?  Sigh...

I think that the originator of the idea has to work on a word based machine
for his sins.

Patrick ("Two days programming in FORTRAN IV on a IBM 7090 or
		50 lashes"  "I'll take the lashes!!!") Powell
-- 
Patrick Powell, Dept. Computer Science, 136 Lind Hall, 207 Church St. SE,
University of Minnesota,  Minneapolis, MN 55455 (612)625-3543/625-4002

davidsen@steinmetz.steinmetz.UUCP (william E Davidsen) (01/20/87)

In article <172@ames.UUCP> lamaster@pioneer.UUCP (Hugh LaMaster) writes:
 >The Motorola MC68000 series is big-endian in data formats, but is
 >inconsistent in instructions and addressing.  The VAX machines are
 >distinctive in that they are not even consistent in data formats.
 >I am not sure about the Intel machines.  Perhaps an Intel booster
 >could enlighten me.

The Intel 80*86 series seems to be consistent in being little endian in
the integer formats. The first (lowest address) byte is at the same
address for all integer data formats, followed by the other bytes, if
any, in increasing order. This is not true of text strings, which have
their first character at the lowest location; the same applies to BCD
numbers, as I recall.
 >
 >As one of the instigators of the current discussion, I would like to
 >summarize some of what I have learned from subsequent arguments:
 >a) Some very very small machines may save a little bit of temporary
 >register space by being little endian (I looked up some hardware 
 >diagrams and discovered that none of the more recent PDP-11's used
 >reduced size memory data registers, but some of the early small
 >machines might have...I don't know)

It does allow some code to be simpler when processing the same data
using several word lengths. This is in the area of making the code
easier to write, rather than (necessarily) more efficient.

 >b) Some variable length memory to memory instructions could be
 >benefited by using little endian notation.

Very much what is implied by (a).

 >c) Most people assume that there are other easy to prove properties
 >of "their" personal favorite ("If God had wanted us to use Hex, He
 >would have given us 16 fingers").  To date, I have not seen any such
 >argument that holds water.

How many fingers do you have? Seriously, the notation should be a
factor of the word length. Machines which have 9 bit bytes, such as the
GE/Honeywell/Hitachi series, DEC 10/20, etc, are much easier to
understand in octal. I don't see that this affects endianness at all.

 >d) Some people believe that systems programmers don't have to read
 >dumps (Ha Ha Ha!).  (The same people believe that Pascal HAS ALREADY
 >REPLACED Cobol and Fortran :-) :-) )

See previous comment.
 >
 >As I said before, those of us who work in multi-user environments
 >need to be able to move binary data files between machines.  The
 >IEEE floating point standard was the declaration of independence;
 >now we are fighting the war.

I wrote a little routine on the Cray to generate binary files for a VAX
or PC (running a BASIC program yet) in about 2 minutes and 4 lines of
code. The war isn't in integers, but float, where everyone uses what
they like, and IEEE if they feel like it. The micros are MUCH better at
staying with the standard. For what it's worth, it was far easier to
output the integers LSB first on any machine, since machines with short
word lengths may not have the bits needed to output MSB first.

 >                              Big manufacturers will never willingly
 >make it easier to move data between machines.  Fortunately, small
 >manufacturers will.  It is important that a standard for the mapping
 >of integers and floating point numbers onto bits, bytes, and words,
 >be developed.  Before this effort can succeed, it may be necessary
 >for microprocessor and workstation manufacturers to agree.  Why not
 >form an IEEE committee?

What am I missing? FP and bytes I can see, but bits?
 >
 >One last point:  As a systems programmer, I think it would be nice
 >if data formats, addressing, instruction formats, and notation were
 >all consistently big-endian or little-endian.

That's one of the things I don't like about the 680?0 series: I would
expect an integer with a value of 0x12345678 to be in memory as either
12345678, or 87654321, or 78563412 even, but not 34127856. I consider
that "middle-endian". (Yes, I have an att7300, Suns, etc., so save your
flames about how much I would like them.)
 >                                               As a user, all I really
 >care about are that the data formats are consistently one way or the
 >other on ALL machines.  All the mainframes that I know of have
 >big-endian data formats; so does the (by far) most popular
 >microprocessor
 >for engineering and scientific use (MC68020).

Does that mean that if it uses a National or Intel chip it isn't a
workstation?
 >                                               That gives big-endians
 >a big head start.  But the (to be formed) committee may vote the other
 >way.  As long as a standard is chosen, I don't care particularly.   

Anyone trying to categorize me as a lover of one endian over the other
has not read my comments carefully. I hate trying to read 680?0 dumps,
and fortunately don't do it often.
-- 
bill davidsen			sixhub \
      ihnp4!seismo!rochester!steinmetz ->  crdos1!davidsen
				chinet /
ARPA: davidsen%crdos1.uucp@crd.ge.com (or davidsen@crd.ge.com)

jesup@steinmetz.steinmetz.UUCP (Randell Jesup) (01/21/87)

In article <1116@steinmetz.steinmetz.UUCP> davidsen@kbsvax.UUCP (william E Davidsen) writes:
>That's one of the things I don't like about the 680?0 series, I would
>expect an integer with a value of 0x12345678 to be in memory as either
>12345678, or 87654321, or 78563412 even, but not 34127856. I consider
>that "middle-endian".

I'm not sure how you're dumping the memory, but I find it hard to
understand how you could get anything back but 12345678.  If a long-word
of 0x12345678 is stored at location 0, location 0 will contain 0x12,
location 1 will contain 0x34, location 2 will contain 0x56, and location
3 will contain 0x78.  The only way I can see for you to get anything
else is to dump it with a little-endian dump program.

The only cases where I can see little-endianness having any advantage
are machines that have (or are descendants of machines that have)
differing internal and external word sizes.  Any machine that can read
and write a register's worth of data at a time gets no performance
improvement from little-endianness.

	Randell Jesup
	jesup@steinmetz.uucp (seismo!rochester!steinmetz!jesup)
	jesup@ge-crd.arpa

guy%gorodish@Sun.COM (Guy Harris) (01/22/87)

>  >One last point:  As a systems programmer, I think it would be nice
>  >if data formats, addressing, instruction formats, and notation were
>  >all consistently big-endian or little-endian.
> 
> That's one of the things I don't like about the 680?0 series, I would
> expect an integer with a value of 0x12345678 to be in memory as either
> 12345678, or 87654321, or 78563412 even, but not 34127856. I consider
> that "middle-endian".

Could you please explain how you got "34127856" out of this?  If you read
the nibbles in a byte left-to-right, and sequence through the value
0x12345678 from byte 0 to byte 3 (which is the same as reading the bytes in
a word left-to-right), you get "12345678".

The only way *I* can see getting "34127856" is if you read the first 16-bit
word of 0x12345678 first (i.e., the 16-bit word whose 0th byte has the
lowest address) followed by the second 16-bit word; read the low-order byte
of the word first (i.e., the one in the least-significant bits, i.e.  the
one with the *highest* address), and read the two nibbles of each byte from
left to right (i.e., the nibble in the most-significant bits first).  If you
read the bits from the most-significant bit to the least-significant bit,
you are also reading the (bytes, words) from the lowest address to the
highest, and you get "12345678".

The 680x0 is perfectly consistently big-endian here; it's your way of
reading it that's "middle-endian" and inconsistent.

>  >All the mainframes that I know of have big-endian data formats; so does
>  >the (by far) most popular microprocessor for engineering and scientific
>  >use (MC68020).
> 
> Does that mean that if it uses a National or Intel chip it isn't a
> workstation?

Of course not; no reasonable reading of his statement could possibly lead to
that conclusion.  1) It didn't say "workstation" anywhere.  2) It didn't say
that only Motorola chips were used in engineering and scientific machines,
just that they were the most popular chip for those sorts of machines.

A couple of points:

	1) The aesthetics of an architecture is rather subjective.  Person
	   A may think that a machine that has a nice cute one-to-one
	   mapping between constructs in some high-level language and
	   machine primitives is Beautiful and therefore Good; person B
	   may think that a machine that has primitives that can be
	   used as simple building blocks to express the operations actually
	   performed by programs is Beautiful and therefore Good.  As such,
	   the aesthetics of little-endian, big-endian, middle-endian, or
	   whatever machines is really rather irrelevant to their merits as
	   designs.

	2) The chances that any given vendor will change their byte order
	   because 1) people say the current one is ugly or 2) some
	   committee decides that the other byte order is the One True
	   Way are somewhere between zip and nil.  You're just going to
	   have to live with different byte orders, and use data
	   representation conventions when exchanging data between machines.

socha@drivax.UUCP (01/23/87)

in reply to: radford@calgary.UUCP (Radford Neal)
> In article <980@gould9.UUCP>, joel@gould9.UUCP (Joel West) writes:
> > The 68000 is big-endian, as is the 68020.
> > 
> > Recently I saw the suggestion that the 68008 is little-endian,
> > except when fetching opcodes.  Could this be true?
> Anyway, I've actually used both 68000 and 68008 processors, and can assure
> you that they are pretty much compatible. According to the documentation,
                    ^^^^^^^^^^^
>     Radford Neal     The University of Calgary

Actually, the 68008 is internally EXACTLY a 68000, with the exception
of its interface to an 8 bit bus; i.e. the ALU, microcode, and registers
are all unchanged! Whenever the 68008 wants to do a 16 bit read, there
are two memory cycles to get the LSB and MSB at their correct 68000
addresses.

-- 
UUCP:...!amdahl!drivax!socha                                      WAT Iron'75
"Everything should be made as simple as possible but not simpler."  A. Einstein

lamaster@pioneer.arpa (Hugh LaMaster) (01/23/87)

In article <11858@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
>
>A couple of points:
>
>	1) The aesthetics of an architecture is rather subjective.  Person

           Agreed.  I note, though, that weird (e.g. VAX floating point)
           data formats can, in addition to being unaesthetic, cost
           users real money when they need to move data between machines
           of different types.
 
>
>	2) The chances that any given vendor will change their byte order
>	   because 1) people say the current one is ugly or 2) some
>	   committee decides that the other byte order is the One True
>	   Way are somewhere between zip and nil.  You're just going to
>	   have to live with different byte orders, and use data
>	   representation conventions when exchanging data between machines.

           I have to disagree with this.  When the IEEE floating point
           standard was first proposed, everyone said that it would
           never fly.  Well, it has been almost ten years, but in fact
           almost all new workstations and minicomputers are using it.
           If bit-byte-word order can be resolved, in ten years everyone
           will be using it.  And, in the meantime, software could be
           written knowing what the standard order will be.

           In any case, with networked environments becoming the norm,
           lots of sites are going to want to be able to move binary
           data between machines.  Don't expect that this issue will
           go away.


   Hugh LaMaster, m/s 233-9,   UUCP:  {seismo,hplabs}!nike!pioneer!lamaster 
   NASA Ames Research Center   ARPA:  lamaster@ames-pioneer.arpa
   Moffett Field, CA 94035     ARPA:  lamaster%pioneer@ames.arpa
   Phone:  (415)694-6117       ARPA:  lamaster@ames.arc.nasa.gov

"He understood the difference between results and excuses."

("Any opinions expressed herein are solely the responsibility of the
author and do not represent the opinions of NASA or the U.S. Government")

guy%gorodish@Sun.COM (Guy Harris) (01/23/87)

>            I have to disagree with this.  When the IEEE floating point
>            standard was first proposed, everyone said that it would
>            never fly.  Well, it has been almost ten years, but in fact
>            almost all new workstations and minicomputers are using it.

Note the word "new" here.  I said that the chances of getting any
current vendor to change their byte order based on some standard are
somewhere between zip and nil, and I stand by that claim.  If IEEE
chooses a standard byte order, the chances that Motorola, IBM, Intel,
National Semiconductor, DEC, or whoever will declare a flag day and
that the 68000, 370, 8086, NS32000, VAX, etc. families will instantly
change their byte order are, again, somewhere between zip and nil.
Neither the 370 nor the VAX has adopted the IEEE standard, so it
doesn't give any encouragement that *existing* machines will somehow
adopt an IEEE byte order.

We won't even discuss: the chances that a committee that will,
presumably, have Intel, DEC, and National Semiconductor on it will
choose a big-endian byte order; the chances that a committee that
will, presumably, have Motorola on it will choose a little-endian
byte order; or the chances that a committee that will, presumably,
have IBM on it will choose *any* byte order (they have one major
big-endian line, and one little-endian line, and there is no way in
hell that they're going to blow either of those lines out of the
water).

>            If bit-byte-word order can be resolved, in ten years everyone
>            will be using it.

Give me a break.  *Everyone*?  If we choose a big-endian byte order,
in ten years all the IBM PCs out there will become big-endian?  If we
choose a little-endian byte order, in ten years all the 370s out
there will become little-endian?

(I remind you that the ASCII character set has been adopted as a
standard, and there are still machines that use - and will continue
to use - EBCDIC internally.  It's been more than 10 years since ASCII
was adopted.)

>	     And, in the meantime, software could be written knowing what
>	     the standard order will be.

And that software will not run on a very large percentage of the
machines out there.

You don't seem to realize that the formats of integral data,
addresses, instructions, etc. are several orders of magnitude more
central than the format of floating-point numbers.  DEC has adopted
floating-point formats in the VAX (G and H), but they still support
the old formats.  In theory, if they felt so motivated they could
probably add IEEE formats as well.  It's nowhere near that simple for
byte order; if DEC were to adopt a big-endian byte order for the VAX,
they'd have to add big-endian versions of most of their instructions
to the VAX instruction set, or add a mode bit to select the byte
order.  If they did the latter, they'd have to have software that
could deal with data files, etc. with either byte order, and two
versions of library routines, and two versions of system calls, etc.,
etc., etc..

>            In any case, with networked environments becoming the norm,
>            lots of sites are going to want to be able to move binary
>            data between machines.  Don't expect that this issue will
>            go away.

Sigh.  Have you ever heard of X.400, or Sun XDR, or...?  I'm *quite*
aware that there is an issue of moving binary data *between*
machines.  This does not, however, mandate a standard byte order for
representing data internally on machines!  Yes, having to convert
data into some standard external form when passing it between
machines is a nuisance; however, the inconvenience involved in making
this conversion is much smaller than the inconvenience involved in
changing something as fundamental as the byte order in an
architecture that has a large existing software and data base.

I certainly don't expect that this issue will go away, for one simple
reason - it'll be a cold day in Hell before all the machines out
there have the same byte order!