jer@ipmoea.UUCP (Eric Roskos) (01/19/87)
Here is another side to this debate which I was just thinking about. I apologise if this aspect has already been discussed; I just resumed reading this group a few weeks ago after a long separation from it.

A lot of the arguments about byte order can be looked at in terms of positional number systems. (I'm thinking here specifically of data of various sizes in memory that represent numbers, as opposed to arbitrary data such as flags or characters.) Consider that the representation of a number is just that, a representation; the number is the same no matter how it is written, whether in base 2, base 8, or Roman numerals. On digital computers we generally represent numbers in base-2 positional notation. For notational convenience, however, we often write these numbers in bases that are a power of two, because if you write a number in some base b, the same number in base b**n is obtained by grouping each set of n consecutive base-b digits into a single base-b**n digit.

I said recently that there is no reason to address the individual bytes in an integer word of more than 1 byte. Thinking about it some more, I realized there is one case in which this is useful: if you treat the integer, on a machine with an n-bit byte, as a number in base 2**n. This is a strange case, but it is reasonable and consistent, for example, to think of a 32-bit integer on a machine with 8-bit bytes as a 4-digit number in base 256.

The only immediately useful result this leads to (that I can think of) is that, for consistency, on an MSB-ian machine you should number bits in a word such that the high-order bit is bit 0. The reason is that it seems highly desirable that, on a given machine, the exponent associated with the i'th digit of a number be invariant across bases; only the base should change.
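To make the base-2**n view concrete, here is a short Python sketch (my own illustration, not from the original post; the function name is invented). It extracts the digits of a 32-bit word in several bases, most significant digit first, so that digit 0 is the high-order digit in every base:

```python
def digits_in_base(n, base, width):
    """Return the `width` digits of n in the given base, most significant first."""
    out = []
    for _ in range(width):
        out.append(n % base)
        n //= base
    return list(reversed(out))

n = 0x12345678

# The four base-256 digits are the four bytes of the word.
print(digits_in_base(n, 256, 4))   # [0x12, 0x34, 0x56, 0x78]

# Grouping: each base-256 digit is exactly two base-16 digits,
# and digit 0 is the leftmost (high-order) digit in both bases.
print(digits_in_base(n, 16, 8))    # [1, 2, 3, 4, 5, 6, 7, 8]
```

With this MSB-0 indexing, the i'th digit of a w-digit number always has exponent w-1-i, whatever the base, which is the invariance argued for above.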
In other words, for consistency it seems desirable that 0001 0010 0011 0100 in binary should be written 1234 in hexadecimal and 011064 in octal; you shouldn't, for instance, write 4321 in hexadecimal just because it is base 16. If you "number" or index the digits in a digit string, it also seems desirable that the same index apply in every base: if digit 0 is the leftmost digit in base 16, it should be the leftmost digit in base 2. Taking these two together leads to the conclusion above. This has the undesirable side effect that the index of a digit no longer equals its exponent, but that is only one isolated problem.

You could argue that, in order to make the digit index equal the exponent, you should use an LSB-ian architecture. But this, in turn, leads to the dump-reading problem. You can fix the dump-reading problem by making dumps on an LSB-ian machine run from large addresses to small addresses, but then strings dump backwards. You can fix strings by assigning the characters of a string to addresses in descending order and making string subscripting subtract the subscript from the base address rather than add it, and everything works fine unless someone assumes the addresses were assigned the other way (which they shouldn't, if you enforce strong typing).

In any case, it's evident that both LSB-ian and MSB-ian machines have their drawbacks, which is probably why everyone has been arguing about them for several weeks in here. The above does raise an interesting question, though. Given that the dump-reading problem suggests putting strings in memory such that the leftmost byte of the printed string is at the highest address, is that a reasonable approach?
It seems to be inconsistent with the above "indexing" property, because the 0th element of the string is at the highest address, whereas the whole argument on bit numbering was based on the idea that the 0th digit of the number should be the digit that is at the lowest-numbered address. On the other hand, is it really inconsistent to do so? Or should strings be treated as being different from numbers? Or from arrays in general?
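The descending-address scheme described above can be sketched in a few lines of Python (a hypothetical illustration; the memory model and all names here are invented, not from the post). Element 0 of the string sits at the highest address, and subscripting subtracts rather than adds:

```python
# Simulated byte-addressable memory for the illustration.
memory = bytearray(16)

def store_descending(base_addr, s):
    """Store s so that s[0] lands at base_addr and s[i] at base_addr - i."""
    for i, ch in enumerate(s.encode("ascii")):
        memory[base_addr - i] = ch

def index_descending(base_addr, i):
    """Subscripting subtracts the index from the base address."""
    return chr(memory[base_addr - i])

store_descending(10, "DUMP")
print(index_descending(10, 0))          # D  (0th element at the highest address)

# A dump printed from high addresses down to low shows the string forwards:
print(bytes(memory[10:6:-1]).decode("ascii"))   # DUMP
```

Nothing breaks as long as all access goes through the subscripting function; the hazard is exactly the one the post names, namely code that assumes addresses ascend with the index.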
turk@apple.UUCP (Ken "Turk" Turkowski) (01/22/87)
Big- or little-endian ordering makes more sense if you think of the numbers as fractions or integers, respectively. On a big-endian system, the most natural way to write a number in terms of its digits is

    d_-1 d_-2 d_-3 d_-4 d_-5 ...

The indices are associated with powers of the radix, so here the numbers are considered to be fractions, having a value less than 1. For brevity, we may eliminate the minus signs, or start the digit indexing at 0, so that the following variants exist:

    d_1 d_2 d_3 d_4 d_5 ...
    d_0 d_1 d_2 d_3 d_4 ...
    d_-0 d_-1 d_-2 d_-3 d_-4 ...

Whereas on a little-endian system it is most natural to have the digits indexed in the following [more familiar] way:

    ... d_4 d_3 d_2 d_1 d_0

as is suitable for an integer.

For communication purposes, the digit with index closest to 0 is the one that is transmitted first. If the base of the digits is 2, then we have big- or little-endian BIT ordering, whereas if the base is 256, we have big- or little-endian BYTE ordering.

-- Ken Turkowski @ Apple Computer, Inc., Cupertino, CA UUCP: {sun,nsc}!apple!turk CSNET: turk@Apple.CSNET ARPA: turk%Apple@csnet-relay.ARPA
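The transmission rule above ("the digit with index closest to 0 goes first") can be sketched with Python's struct module, using base-256 digits, i.e. bytes (my own illustration, not from either post):

```python
import struct

n = 0x11223344

# Little-endian (integer view): d_0 is the least significant digit,
# so the low byte is transmitted first.
little = struct.pack("<I", n)
print([hex(b) for b in little])   # ['0x44', '0x33', '0x22', '0x11']

# Big-endian (fraction view): d_-1 is the most significant digit,
# so the high byte is transmitted first.
big = struct.pack(">I", n)
print([hex(b) for b in big])      # ['0x11', '0x22', '0x33', '0x44']
```

The same rule applied with base-2 digits instead of base-256 digits gives bit ordering rather than byte ordering, as the post notes.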