[comp.arch] byte order and positional number systems

jer@ipmoea.UUCP (Eric Roskos) (01/19/87)

Here is another side to this debate which I was just thinking about.  I
apologise if this aspect has already been discussed; I just resumed
reading this group a few weeks ago after a long separation from it.

A lot of the arguments for byte order can be looked at in terms of
positional number systems.  (I'm thinking here specifically of data of
various sizes in memory that represent numbers, as opposed to arbitrary
data such as flags or characters).  Now, consider that the representation
of a number is just that, a representation, and the number is the same
no matter how it is represented, whether in base 2, or base 8, or in
Roman Numerals.

Now, on digital computers we generally represent numbers in a base-2
positional notation.  However, for notational convenience we often
write these numbers in bases which are a power of two, because if you
write a number in some base b, then the same number in base b**n represents
a grouping of each set of n consecutive digits in base b into a single
digit in base b**n.

I had said recently that there is no reason to address the individual
bytes in an integer word of more than 1 byte.  Thinking about it some more,
I realized that there is one case in which this is useful, specifically if
you treat the integer, on a machine with an n-bit byte, as a number
in base 2**n.

This is a strange case, but it is true that it is reasonable and consistent,
for example, to think of a 32-bit integer on a machine with 8-bit bytes
as a 4-digit number in base 256.

The only immediately useful result this leads to (that I can think of) is
that, for consistency, on an MSB-ian machine you should number bits in a
word such that the high-order bit in the word is bit 0.  The reason for
this is that it seems highly desirable that, on a given machine, for numbers
in any base the exponent for the i'th digit of the number should be
invariant; only the base should change.  In other words, for consistency
it seems desirable that 0001 0010 0011 0100 in binary should be written
1234 in hexadecimal and 011064 in octal; you shouldn't, for instance,
write 4321 in hexadecimal just because it is base 16.  If you "number" or
index the digits in the digit strings in any base, it also seems desirable that
the same index should apply to all bases; so if 0 is the leftmost digit
in base 16, it should be the leftmost digit in base 2.  Taking these two
together leads to the conclusion above.

This has the undesirable side-effect that the index for the digits doesn't
correspond to their exponents.  However, this is only one isolated
problem.  You could argue that in order to make the digit index equal to
the exponent, you should use a LSB-ian architecture.  But this, in turn,
leads to the dump-reading problem; you can fix the dump-reading problem
by making dumps on an LSB-ian machine go from large addresses to small
addresses, but then strings dump backwards; you can fix strings by assigning
characters in strings to addresses in descending order, and making subscripts
on strings work via subtracting the subscript from the base address rather
than adding, and everything works fine unless someone assumes the addresses
were assigned the other way (which they shouldn't, if you enforce strong
typing).

In any case, it's evident that both LSB-ian and MSB-ian machines have their
drawbacks, which is probably why everyone has been arguing about them for
several weeks in here.  However, the above raises an interesting question.
Given the fact that the dump-reading problem suggests putting strings
in memory such that the leftmost byte of the string when it's printed is
at the highest address, is that a reasonable approach?  It seems to be
inconsistent with the above "indexing" property, because the 0th element
of the string is at the highest address, whereas the whole argument on
bit numbering was based on the idea that the 0th digit of the number should
be the digit that is at the lowest-numbered address.  On the other hand,
is it really inconsistent to do so?  Or should strings be treated as
being different from numbers?  Or from arrays in general?

turk@apple.UUCP (Ken "Turk" Turkowski) (01/22/87)

Big- or little-endian ordering makes more sense if you think of the numbers
as being fractions or integers, respectively.  On a big-endian system, the most
natural way to write a number in terms of its digits is as:

	d  d  d  d  d  ...
	 -1 -2 -3 -4 -5

The indices are associated with powers of the radix, so here the
numbers are considered to be fractions, having a value less than 1.
For brevity, we may eliminate the minus signs, or start the digit
indexing at 0, so that the following variants exist:

	d d d d d ...	d d d d d ...	d  d  d  d  d  ...
	 1 2 3 4 5	 0 1 2 3 4	 -0 -1 -2 -3 -4

Whereas on a little-endian system it is most natural to have the digits
indexed in the following [more familiar] way:

	...d d d d d
	    4 3 2 1 0

as is suitable for an integer.

For communication purposes, the digit with index closest to 0 is the
one that is transmitted first.  If the base of the digit is 2, then we
have big- or little- endian BIT ordering, whereas if the base of the
digit is 256, we have big- or little-endian BYTE ordering.
-- 
Ken Turkowski @ Apple Computer, Inc., Cupertino, CA
UUCP: {sun,nsc}!apple!turk
CSNET: turk@Apple.CSNET
ARPA: turk%Apple@csnet-relay.ARPA