rbi@beaufort.cs.wisc.edu (Bruce Irvin) (02/02/90)
Could someone send me (or post) a brief summary of the major issues in choosing Big or Little Endian byte ordering for an architecture? Alternatively, does anyone have a reference to an article which has a good discussion of this topic? Thank you, Bruce Irvin. (rbi@cs.wisc.edu)
gerry@zds-ux.UUCP (Gerry Gleason) (02/03/90)
In article <9656@spool.cs.wisc.edu> rbi@beaufort.cs.wisc.edu (Bruce Irvin) writes:
> Could someone send me (or post) a brief summary of the
>major issues in choosing Big or Little Endian byte ordering
>for an architecture?

Pretty much everyone agrees that it is a wash with respect to performance, functionality, and the other important issues, except for a few zealots. The main issue is which of a few tricks can be pulled: on a big-endian machine, you can compare aligned (and padded) strings a word at a time instead of byte by byte, while on a little-endian machine, byte, half-word, or word accesses through the same pointer will get the same value, assuming you haven't truncated any important bits. Of course, both of these are tricks, and the second could be considered a disadvantage because it makes it more likely that incorrect code might work much of the time.

Btw, has anyone else seen the extended rhyme on this topic? It's a takeoff on Swift's satirical piece about which end an egg should be opened from (it's in Gulliver's Travels if I'm not mistaken). If someone has it, could you email or post it?

Gerry Gleason
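Both tricks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (compare_bytewise, compare_wordwise) and the sample strings are invented here, and the struct format codes ">I" and "<I" stand in for what a real machine does with a native 32-bit load.

```python
import struct

def compare_bytewise(a: bytes, b: bytes) -> int:
    """Reference answer: ordinary byte-by-byte comparison (-1, 0 or 1)."""
    return (a > b) - (a < b)

def compare_wordwise(a: bytes, b: bytes, word_format: str) -> int:
    """Compare equal-length, 4-byte-padded strings one 32-bit word at a time.

    word_format is ">I" to load words big-endian or "<I" for little-endian.
    """
    for offset in range(0, len(a), 4):
        word_a = struct.unpack_from(word_format, a, offset)[0]
        word_b = struct.unpack_from(word_format, b, offset)[0]
        if word_a != word_b:
            return 1 if word_a > word_b else -1
    return 0

# Trick 1: word-at-a-time comparison agrees with byte order only when the
# words are loaded big-endian.
first, second = b"abcx", b"abdw"
big_result = compare_wordwise(first, second, ">I")
little_result = compare_wordwise(first, second, "<I")

# Trick 2: store a small value as a 32-bit word, then read it back through
# the same address (offset 0) as a byte, a half-word, and a word.
little_word = struct.pack("<I", 42)
big_word = struct.pack(">I", 42)
little_reads = [struct.unpack_from(f, little_word, 0)[0] for f in ("<B", "<H", "<I")]
big_reads = [struct.unpack_from(f, big_word, 0)[0] for f in (">B", ">H", ">I")]

print(compare_bytewise(first, second), big_result, little_result)
print(little_reads, big_reads)
```

In this sketch the little-endian word comparison disagrees with the byte-by-byte answer, and only the little-endian layout returns the stored value for all three access widths.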
henry@utzoo.uucp (Henry Spencer) (02/03/90)
In article <9656@spool.cs.wisc.edu> rbi@beaufort.cs.wisc.edu (Bruce Irvin) writes:
> Could someone send me (or post) a brief summary of the
>major issues in choosing Big or Little Endian byte ordering
>for an architecture?

My impression is that the big issue is usually backward compatibility with previous mistakes. As far as *technical* issues, the only ones I'm aware of weakly favor big-endian: it's the network standard order, which makes life easier for protocol code, and it makes for consistent bit ordering in things like frame buffers (do the dots on the screen run left-to-right in successive bytes, or successive words? -- on a little-endian machine, they're not the same), which simplifies the code for optimized raster operations.
--
1972: Saturn V #15 flight-ready| Henry Spencer at U of Toronto Zoology
1990: birds nesting in engines | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
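The point about protocol code can be illustrated with a small Python sketch. The header layout here (a 16-bit port followed by a 32-bit sequence number) and the name pack_header are hypothetical, chosen only to show that a big-endian host's native layout already matches network order while a little-endian host has to swap.

```python
import struct
import sys

def pack_header(port: int, seq: int) -> bytes:
    """Pack a made-up header in network (big-endian) order, no padding."""
    return struct.pack("!HI", port, seq)

wire = pack_header(0x1234, 0x01020304)

# "=" asks for the host's native byte order with standard sizes and no
# padding, so this is what a plain in-memory copy of the fields looks like.
native = struct.pack("=HI", 0x1234, 0x01020304)

# On a big-endian host the two agree and no conversion step is needed.
needs_swap = wire != native

print(wire.hex(), native.hex(), needs_swap)
```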
sl@van-bc.UUCP (Stuart Lynne) (02/03/90)
In article <148@zds-ux.UUCP> gerry@zds-ux.UUCP (Gerry Gleason) writes:
>In article <9656@spool.cs.wisc.edu> rbi@beaufort.cs.wisc.edu (Bruce Irvin) writes:
>> Could someone send me (or post) a brief summary of the
>Btw, has anyone else seen the extended rhyme on this topic. It's a takeoff
>on Swift's satirical piece about which end an egg should be opened from
>(it's in Gulliver's Travels if I'm not mistaken). If someone has it, could
>you email or post it.
>

From ubc-vision!alberta!ihnp4!cbatt!clyde!rutgers!lll-crg!hoptoad!gnu Tue Dec 2 23:49:31 PST 1986
Article 26 of comp.sys.m68k:
Path: van-bc!ubc-vision!alberta!ihnp4!cbatt!clyde!rutgers!lll-crg!hoptoad!gnu
From: gnu@hoptoad.uucp (John Gilmore)
Newsgroups: comp.sys.m68k,comp.arch,comp.sys.intel
Subject: Byte Order: On Holy Wars and a Plea for Peace
Message-ID: <1364@hoptoad.uucp>
Date: 30 Nov 86 01:29:46 GMT
References: <1509@ihlpl.UUCP> <1335@hoptoad.uucp>
Followup-To: comp.arch
Organization: Nebula Consultants in San Francisco
Lines: 829
Xref: van-bc comp.sys.m68k:26 comp.arch:38 comp.sys.intel:52

[Not a single person objected to my posting this, so here it is. Mod.sources.doc seems to be dead, so I am posting this here. Factual followups to comp.arch, please. Send flames to yourself via email. Note that the date of the article is 1980, so there are a few things that have changed since then; nevertheless, the spirit of the article is still relevant. --gnu]

IEN 137
Danny Cohen
U S C/I S I
1 April 1980

ON HOLY WARS AND A PLEA FOR PEACE

INTRODUCTION

This is an attempt to stop a war. I hope it is not too late and that somehow, magically perhaps, peace will prevail again.

The latecomers into the arena believe that the issue is: "What is the proper byte order in messages?".

The root of the conflict lies much deeper than that. It is the question of which bit should travel first, the bit from the little end of the word, or the bit from the big end of the word?
The followers of the former approach are called the Little-Endians, and the followers of the latter are called the Big-Endians. The details of the holy war between the Little-Endians and the Big-Endians are documented in [6] and described, in brief, in the Appendix. I recommend that you read it at this point.

[I have inserted it -- gnu]

A P P E N D I X

Some notes on Swift's Gulliver's Travels:

Gulliver finds out that there is a law, proclaimed by the grandfather of the present ruler, requiring all citizens of Lilliput to break their eggs only at the little ends. Of course, all those citizens who broke their eggs at the big ends were angered by the proclamation. Civil war broke out between the Little-Endians and the Big-Endians, resulting in the Big-Endians taking refuge on a nearby island, the kingdom of Blefuscu.

Using Gulliver's unquestioning point of view, Swift satirizes religious wars. For 11,000 Lilliputian rebels to die over a controversy as trivial as at which end eggs have to be broken seems not only cruel but also absurd, since Gulliver is sufficiently gullible to believe in the significance of the egg question. The controversy is important ethically and politically for the Lilliputians. The reader may think the issue is silly, but he should consider that Swift is making fun of the actual causes of religious- or holy-wars.

In political terms, Lilliput represents England and Blefuscu France. The religious controversy over egg-breaking parallels the struggle between the Protestant Church of England and the Catholic Church of France, possibly referring to some differences about what the Sacraments really mean. More specifically, the quarrel about egg-breaking may allude to the different ways that the Anglican and Catholic Churches distribute communion, bread and wine for the Anglican, but bread alone for the Catholic.
The French and English struggled over more mundane questions as well, but in this part of Gulliver's Travels, Swift points up the symbolic difference between the churches to ridicule any religious war.

For ease of reference please note that Lilliput and Little-Endians both start with an "L", and that both Blefuscu and Big-Endians start with a "B". This is handy while reading this note.

[End of appendix -- gnu]

The above question arises from the serialization process which is performed on messages in order to send them through communication media. If the communication unit is a message - these problems have no meaning. If the units are computer "words" then one may ask in which order these words are sent, what is their size, but not in which order the elements of these words are sent, since they are sent virtually "at-once". If the unit of transmission is an 8-bit byte, similar questions about bytes are meaningful, but not the order of the elementary particles which constitute these bytes.

If the units of communication are bits, the "atoms" ("quarks"?) of computation, then the only meaningful question is the order in which bits are sent. Obviously, this is actually the case for serial transmission. Most modern communication is based on a single stream of information ("bit-stream"). Hence, bits, rather than bytes or words, are the units of information which are actually transmitted over the communication channels such as wires and satellite connections.

Even though a great deal of effort, in both hardware and software, is dedicated to giving the appearance of byte or word communication, the basic fact remains: bits are communicated.

Computer memory may be viewed as a linear sequence of bits, divided into bytes, words, pages and so on. Each unit is a subunit of the next level. This is, obviously, a hierarchical organization.
If the order is consistent, then such a sequence may be communicated successfully while both parties maintain their freedom to treat the bits as a set of groups of any arbitrary size. One party may treat a message as a "page", another as so many "words", or so many "bytes" or so many bits. If a consistent bit order is used, the "chunk-size" is of no consequence.

If an inconsistent bit order is used, the chunk size must be understood and agreed upon by all parties. We will demonstrate some popular but inconsistent orders later.

In a consistent order, the bit-order, the byte-order, the word-order, the page-order, and all the other higher level orders are all the same. Hence, when considering a serial bit-stream, along a communication line for example, the "chunk" size which the originator of that stream has in mind is not important.

There are two possible consistent orders. One is starting with the narrow end of each word (aka "LSB") as the Little-Endians do, or starting with the wide end (aka "MSB") as their rivals, the Big-Endians, do.

In this note we usually use the following sample numbers: a "word" is a 32-bit quantity and is designated by a "W", and a "byte" is an 8-bit quantity which is designated by a "C" (for "Character", not to be confused with "B" for "Bit").

MEMORY ORDER

The first word in memory is designated as W0, by both regimes. Unfortunately, the harmony goes no further. The Little-Endians assign B0 to the LSB of the words and B31 is the MSB. The Big-Endians do just the opposite, B0 is the MSB and B31 is the LSB.

By the way, if mathematicians had their way, every sequence would be numbered from ZERO up, not from ONE, as is traditionally done. If so, the first item would be called the "zeroth"....

Since most computers are not built by mathematicians, it is no wonder that some computers designate bits from B1 to B32, in either the Little-Endians' or the Big-Endians' order.
These people probably would like to number their words from W1 up, just to be consistent.

Back to the main theme. We would like to illustrate the hierarchically consistent order graphically, but first we have to decide about the order in which computer words are written on paper. Do they go from left to right, or from right to left?

The English language, like most modern languages, suggests that we lay these computer words on paper from left to right, like this:

    |---word0---|---word1---|---word2---|....

In order to be consistent, B0 should be to the left of B31. If the bytes in a word are designated as C0 through C3 then C0 is also to the left of C3. Hence we get:

    |---word0---|---word1---|---word2---|....
    |C0,C1,C2,C3|C0,C1,C2,C3|C0,C1,C2,C3|.....
    |B0......B31|B0......B31|B0......B31|......

If we also use the traditional convention, as introduced by our numbering system, the wide-end is on the left and the narrow-end is on the right. Hence, the above is a perfectly consistent view of the world as depicted by the Big-Endians. Significance consistently decreases as the item number (address) increases.

Many computers share with the Big-Endians this view about order. In many of their diagrams the registers are connected such that when the word W(n) is shifted right, its LSB moves into the MSB of word W(n+1).

English text strings are stored in the same order, with the first character in C0 of W0, the next in C1 of W0, and so on.

This order is very consistent with itself and with the English language.

On the other hand, the Little-Endians have their view, which is different but also self-consistent. They believe that one should start with the narrow end of every word, and that low addresses are of lower order than high addresses.
Therefore they put their words on paper as if they were written in Hebrew, like this:

    ...|---word2---|---word1---|---word0---|

When they add the bit order and the byte order they get:

      ...|---word2---|---word1---|---word0---|
     ....|C3,C2,C1,C0|C3,C2,C1,C0|C3,C2,C1,C0|
    .....|B31......B0|B31......B0|B31......B0|

In this regime, when word W(n) is shifted right, its LSB moves into the MSB of word W(n-1).

English text strings are stored in the same order, with the first character in C0 of W0, the next in C1 of W0, and so on.

This order is very consistent with itself, with the Hebrew language, and (more importantly) with mathematics, because significance increases with increasing item numbers (address). It has the disadvantage that English character streams appear to be written backwards; this is only an aesthetic problem but, admittedly, it looks funny, especially to speakers of English.

In order to avoid receiving strange comments about this order, the Little-Endians pretend that they are Chinese, and write the bytes, not right-to-left but top-to-bottom, like:

    C0: "J"
    C1: "O"
    C2: "H"
    C3: "N"
    ..etc..

Note that there is absolutely no specific significance whatsoever to the notion of "left" and "right" in bit order in a computer memory. One could think about it as "up" and "down" for example, or mirror it by systematically interchanging all the "left"s and "right"s. However, this notion stems from the concept that computer words represent numbers, and from the old mathematical tradition that the wide-end of a number (aka the MSB) is called "left" and the narrow-end of a number is called "right". This mathematical convention is the point of reference for the notion of "left" and "right".

It is easy to determine whether any given computer system was designed by Little-Endians or by Big-Endians.
This is done by watching the way the registers are connected for the "COMBINED-SHIFT" operation and for multiple-precision arithmetic like integer products; also by watching how these quantities are stored in memory; and obviously also by the order in which bytes are stored within words.

Don't let the B0-to-B31 direction fool you!! Most computers were designed by Big-Endians, who under the threat of criminal prosecution pretended to be Little-Endians, rather than seeking exile in Blefuscu. They did it by using the B0-to-B31 convention of the Little-Endians, while keeping the Big-Endians' conventions for bytes and words.

The PDP10 and the 360, for example, were designed by Big-Endians: their bit order, byte-order, word-order and page-order are the same. The same order also applies to long (multi-word) character strings and to multiple precision numbers.

Next, let's consider the new M68000 microprocessor. Its way of storing a 32-bit number, xy, a 16-bit number, z, and the string "JOHN" in its 16-bit words is shown below (S = sign bit, M = MSB, L = LSB):

     SMxxxxxxx yyyyyyyyL SMzzzzzzL  "J" "O"   "H" "N"
    |--word0--|--word1--|--word2--|--word3--|--word4--|....
    |-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|.....
    |B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|......

The M68000 always has on the left (i.e., LOWER byte- or word-address) the wide-end of numbers in any of the various sizes which it may use: 4 (BCD), 8, 16 or 32 bits.

Hence, the M68000 is a consistent Big-Endian, except for its bit designation, which is used to camouflage its true identity. Remember: the Big-Endians were the outlaws.

Let's look next at the PDP11 order, since this is the first computer to claim to be a Little-Endian.
Let's again look at the way data is stored in memory:

            "N" "H"   "O" "J"  SMzzzzzzL SMyyyyyyL SMxxxxxxL
      ....|--word4--|--word3--|--word2--|--word1--|--word0--|
     .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
    ......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|

The PDP11 does not have an instruction to move 32-bit numbers. Its multiplication products are 32-bit quantities created only in the registers, and may be stored in memory in any way. Therefore, the 32-bit quantity, xy, was not shown in the above diagram.

Hence, the above order is a Little-Endians' consistent order. The PDP11 always stores on the left (i.e., HIGHER bit- or byte-address) the wide-end of numbers of any of the sizes which it may use: 8 or 16 bits.

However, due to some infiltration from the other camp, the registers of this Little-Endian's marvel are treated in the Big-Endians' way: a double length operand (32-bit) is placed with its MSB in the lower address register and the LSB in the higher address register. Hence, when depicted on paper, the registers have to be put from left to right, with the wide end of numbers in the LOWER-address register. This affects the integer multiplication and division, the combined-shifts and more. Admittedly, Blefuscu scores on this one.

Later, floating-point hardware was introduced for the PDP11/45. Floating-point numbers are represented by either 32- or 64-bit quantities, which are 2 or 4 PDP11 words. The wide end is the one with the sign bit(s), the exponent and the MSB of the fraction. The narrow end is the one with the LSB of the fraction. On paper these formats are clearly shown with the wide end on the left and the narrow on the right, according to the centuries old mathematical conventions. On page 12-3 of the PDP11/45 processor handbook, [3], there is a cute graphical demonstration of this order, with the word "FRACTION" split over all the 2 or the 4 words which are used to store it.
However, due to some oversights in the security screening process, the Blefuscuians took over, again. They assigned, as they always do, the wide end to the LOWer addresses in memory, and the narrow to the HIGHer addresses.

Let "xy" and "abcd" be 32- and 64-bit floating-point numbers, respectively. Let's look how these numbers are stored in memory:

           ddddddddL ccccccccc bbbbbbbbb SMaaaaaaa yyyyyyyyL SMxxxxxxx
      ....|--word5--|--word4--|--word3--|--word2--|--word1--|--word0--|
     .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
    ......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|

Well, Blefuscu scores many points for this. The above reference in [3] does not even try to camouflage it by any Chinese notation.

Encouraged by this success, as minor as it is, the Blefuscuians tried to pull another fast one. This time it was on the VAX, the sacred machine which all the Little-Endians worship.

Let's look at the VAX order. Again, we look at the way the above data (with xy being a 32-bit integer) is stored in memory:

            "N" "H"   "O" "J"  SMzzzzzzL SMxxxxxxx yyyyyyyyL
       ...ng2-------|-------long1-------|-------long0-------|
      ....|--word4--|--word3--|--word2--|--word1--|--word0--|
     .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
    ......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|

What a beautifully consistent Little-Endians' order this is!!!

So, what about the infiltrators? Did they completely fail in carrying out their mission? Since the integer arithmetic was closely guarded they attacked the floating point and the double-floating which were already known to be easy prey.
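The three byte layouts met so far can be compared side by side with a short Python sketch. It is an illustration only: the function names are invented here, an arbitrary 32-bit integer pattern stands in for the "xy" quantity, and nothing below models the actual floating-point bit fields of any machine, only the order in which the bytes land in memory.

```python
def store_big_endian(value: int) -> bytes:
    """Wide end at the lowest address (the M68000 integer layout)."""
    return value.to_bytes(4, "big")

def store_little_endian(value: int) -> bytes:
    """Narrow end at the lowest address (the VAX integer layout)."""
    return value.to_bytes(4, "little")

def store_wide_word_first(value: int) -> bytes:
    """Wide 16-bit word at the lower address, but each word's own two
    bytes narrow-end first (the PDP11 floating-point word order)."""
    high_word, low_word = value >> 16, value & 0xFFFF
    return high_word.to_bytes(2, "little") + low_word.to_bytes(2, "little")

sample = 0x11223344
layouts = {
    "big-endian": store_big_endian(sample),
    "little-endian": store_little_endian(sample),
    "wide word first": store_wide_word_first(sample),
}

# Bytes are printed in increasing address order, left to right.
for name, raw in layouts.items():
    print(f"{name:16}", " ".join(f"{byte:02X}" for byte in raw))
```

Printing in increasing address order is itself a Big-Endian paper convention; the little-endian diagrams in this note would show the same bytes mirrored.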
Let's look, again, at the way the above data is stored, except that now the 32-bit quantity xy is a floating point number: now this data is organized in memory in the following Blefuscuian way:

            "N" "H"   "O" "J"  SMzzzzzzL yyyyyyyyL SMxxxxxxx
       ...ng2-------|-------long1-------|-------long0-------|
      ....|--word4--|--word3--|--word2--|--word1--|--word0--|
     .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
    ......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|

Blefuscu scores again. The VAX is found guilty, however with the explanation that it tries to be compatible with the PDP11.

Having found themselves there, the VAXians found a way around this unaesthetic appearance: the VAX literature (e.g., p. 10 of [4]) describes this order by using the Chinese top-to-bottom notation, rather than an embarrassing left-to-right or right-to-left one. This page is a marvel. One has to admire the skillful way in which some quantities are shown in columns 8-bit wide, some in 16 and other in 32, all in order to avoid the egg-on-the-face problem.....

By the way, some engineering-type people complain about the "Chinese" (vertical) notation because usually the top (aka "up") of the diagrams corresponds to "low"-memory (low addresses). However, anyone who was brought up by computer scientists, rather than by botanists, knows that trees grow downward, having their roots at the top of the page and their leaves down below. Computer scientists seldom remember which way "up" really is (see 2.3 of [5], pp. 305-309).

Having scored so easily in the floating point department, the Blefuscuians moved to new territories: Packed-Decimal. The VAX is also capable of using 4-bit-chunk decimal arithmetic, which is similar to the well known BCD format.

The Big-Endians struck again, and without any resistance got their way.
The decimal number 12345678 is stored in the VAX memory in this order:

           7 8  5 6  3 4  1 2
       ...|-------long0-------|
      ....|--word1--|--word0--|
     .....|-C1-|-C0-|-C1-|-C0-|
    ......|B15....B0|B15....B0|

This ugliness cannot be hidden even by the standard Chinese trick.

SUMMARY (of the Memory-Order section)

To the best of my knowledge only the Big-Endians of Blefuscu have built systems with a consistent order which works across chunk-boundaries, registers, instructions and memories. I failed to find a Little-Endians' system which is totally consistent.

TRANSMISSION ORDER

In either of the consistent orders the first bit (B0) of the first byte (C0) of the first word (W0) is sent first, then the rest of the bits of this byte, then (in the same order) the rest of the bytes of this word, and so on.

Such a sequence of 8 32-bit words, for example, may be viewed as either 4 long-words, 8 words, 32 bytes or 256 bits.

For example, some people treat the ARPA-internet-datagrams as a sequence of 16-bit words whereas others treat them as either 8-bit byte streams or sequences of 32-bit words. This has never been a source of confusion, because the Big-Endians' consistent order has been assumed.

There are many ways to devise inconsistent orders. The two most popular ones are the following and its mirror image. Under this order the first bit to be sent is the LEAST significant bit (B0) of the MOST significant byte (C0) of the first word, followed by the rest of the bits of this byte, then the same right-to-left bit order inside the left-to-right byte order.

Figure 1 shows the transmission order for the 4 orders which were discussed above, the 2 consistent and the 2 inconsistent ones.

Those who use such an inconsistent order (or any other), and only those, have to be concerned with the famous byte-order problem. If they can pretend that their communication medium is really a byte-oriented link then this inconsistency can be safely hidden under the rug.
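The claim that chunk size is irrelevant under a consistent order, and matters under an inconsistent one, can be checked with a small Python sketch. The function bit_stream and its parameters are hypothetical names introduced for this illustration; memory is modelled as bytes in increasing address order, and the "mixed" case pairs wide-end-first byte significance with narrow-end-first bit transmission.

```python
def bit_stream(memory: bytes, chunk_bytes: int, byte_order: str, msb_first: bool) -> str:
    """Serialize memory chunk by chunk and return the bits in sending order.

    byte_order ("big" or "little") says how the bytes of a chunk combine
    into one number; msb_first says which end of that number goes out first.
    """
    width = 8 * chunk_bytes
    sent = []
    for offset in range(0, len(memory), chunk_bytes):
        value = int.from_bytes(memory[offset:offset + chunk_bytes], byte_order)
        bits = format(value, f"0{width}b")        # text with the MSB on the left
        sent.append(bits if msb_first else bits[::-1])
    return "".join(sent)

memory = bytes([0x12, 0x34, 0x56, 0x78])
chunk_sizes = (1, 2, 4)

# Consistent Big-Endian order: wide end first at every level.
big = {n: bit_stream(memory, n, "big", True) for n in chunk_sizes}
# Consistent Little-Endian order: narrow end first at every level.
little = {n: bit_stream(memory, n, "little", False) for n in chunk_sizes}
# Inconsistent order: big-endian byte significance, narrow-end-first bits.
mixed = {n: bit_stream(memory, n, "big", False) for n in chunk_sizes}

for name, streams in (("big", big), ("little", little), ("mixed", mixed)):
    print(name, len(set(streams.values())), "distinct stream(s)")
```

In this sketch each consistent order yields one bit stream whatever the chunk size, while the mixed order yields a different stream for each chunk size, which is why its users must agree on the chunk size.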
A few years ago 8-bit microprocessors appeared and changed drastically the way we do business. A few years later a wide variety of 8-bit communication hardware (e.g., Z80-SIO and 2652) followed, all of which operate in the Little-Endians' order.

Now a wave of 16-bit microprocessors has arrived. It is not inconceivable that 16-bit communication hardware will become a reality relatively soon.

Since the 16-bit communication gear will be provided by the same folks who brought us the 8-bit communication gear, it is safe to expect these two modes to be compatible with each other.

The only way to achieve this is by using the consistent Little-Endians order, since all the existing gear is already in Little-Endians order.

We have already observed that the Little-Endians do not have consistent memory orders for intra-computer organization.

IF the 16-bit communication link could be made to operate in any order, consistent or not, which would give it the appearance of being a byte-oriented link, THEN the Big-Endians could push (ask? hope? pray?) for an order which transmits the bytes in left-to-right (i.e., wide-end first) and use that as a basis for transmitting all quantities (except BCD) in the more convenient Big-Endians format, with the most significant portions leading the least significant, maintaining compatibility between 16- and 32-bit communication, and more.

However, this is a big "IF".

Wouldn't it be nice if we could encapsulate the byte-communication and forget all about the idiosyncrasies of the past, introduced by RS232 and TELEX, of sending the narrow-end first?

I believe that it would be nice, but nice things do not necessarily occur, especially if there is so much silicon against them.

Hence, our choice now is between (1) Big-Endians' computer-convenience and (2) future compatibility between communication gear of different chunk size.

I believe that this is the question, and we should address it as such.
Short term convenience considerations are in favor of the former, and the long term ones are in favor of the latter.

Since the war between the Little-Endians and the Big-Endians is imminent, let's count who is in whose camp.

The founders of the Little-Endians party are RS232 and TELEX, who stated that the narrow-end is sent first. So do the HDLC and the SDLC protocols, the Z80-SIO, Signetics-2652, Intel-8251, Motorola-6850 and all the rest of the existing communication devices. In addition to these protocols and chips the PDP11s and the VAXes have already pledged their allegiance to this camp, and deserve to be on this roster.

The HDLC protocol is a full fledged member of this camp because it sends all of its fields with the narrow end first, as is specifically defined in Table 1/X.25 (Frame formats) in section 2.2.1 of Recommendation X.25 (see [2]). A close examination of this table reveals that the bit order of transmission is always 1-to-8. Always, except the FCS (checksum) field, which is the only 16-bit quantity in the byte-oriented protocol. The FCS is sent in the 16-to-1 order. How did the Blefuscuians manage to pull off such a fiasco?! The answer is beyond me. Anyway, anyone who designates bits as 1-to-8 (instead of 0-to-7) must be gullible to such tricks.

The Big-Endians have the PDP10's, 370's, ALTO's and Dorado's...

An interesting creature is the ARPANet-IMP. The documentation of its standard host interface (aka "LH/DH") states that "The high order bit of each word is transmitted first" (p. 4-4 of [1]), hence, it is a Big-Endian. This is very convenient, and causes no confusion between diagrams which are either 32- (e.g., on p. 3-25) and 16-bit wide (e.g., on p. 5-14).

However, the IMP's Very Distant Host (VDH) interface is a Little-Endian. The same document ([1], again, p. F-18), states that the data "must consist of an even number of 8-bit bytes.
Further, considering each pair of bytes as a 16-bit word, the less significant (right) byte is sent first".

In order to make this even more clear, p. F-23 states "All bytes (data bytes too) are transmitted least significant (rightmost) bit first".

Hence, both camps may claim to have this schizophrenic double-agent in their camp.

Note that the Lilliputians' camp includes all the who's-who of the communication world, unlike the Blefuscuians' camp which is very much oriented toward the computing world.

Both camps have already adopted the slogan "We'd rather fight than switch!".

I believe they mean it.

SUMMARY (of the Transmission-Order section)

There are two camps each with its own language. These languages are as compatible with each other as any Semitic and Latin languages are.

All Big-Endians can talk to each other with relative ease.

So can all the Little-Endians, even though there are some differences among the dialects used by different tribes.

There is no middle ground. Only one end can go first.

CONCLUSION

Each camp tries to convert the other. Like all the religious wars of the past, logic is not the decisive tool. Power is. This holy war is not the first one, and probably will not be the last one either.

The "Be reasonable, do it my way" approach does not work. Neither does the Esperanto approach of "let's all switch to yet a new language".

Our communication world may split according to the language used. A certain book (which is NOT mentioned in the references list) has an interesting story about a similar phenomenon, the Tower of Babel.

Little-Endians are Little-Endians and Big-Endians are Big-Endians and never the twain shall meet.

We would like to see some Gulliver standing up between the two islands, forcing a unified communication regime on all of us. I do hope that my way will be chosen, but I believe that, after all, which way is chosen does not make too much difference. It is more important to agree upon an order than which order is agreed upon.
How about tossing a coin ???

[Figure 1 is a plain-text drawing of four panels, labelled order (1) through order (4). Each panel plots time against bit position, running from MSB to LSB.]

Figure 1: Possible orders, consistent: (1)+(2), inconsistent: (3)+(4).

R E F E R E N C E S

[1] Bolt Beranek & Newman. Report No. 1822: Interface Message Processor. Technical Report, BB&N, May, 1978.

[2] CCITT. Orange Book. Volume VIII.2: Public Data Networks. International Telecommunication Union, Geneva, 1977.

[3] DEC. PDP11 04/05/10/35/40/45 processor handbook. Digital Equipment Corp., 1975.

[4] DEC. VAX11 - Architecture Handbook. Digital Equipment Corp., 1979.

[5] Knuth, D. E. The Art of Computer Programming. Volume I: Fundamental Algorithms. Addison-Wesley, 1968.

[6] Swift, Jonathan. Gulliver's Travels. Unknown publisher, 1726.

OTHER SLIGHTLY RELATED TOPICS (IF AT ALL)
not necessarily for inclusion in this note

Who's on first? Zero or One ??

People start counting from the number ONE. The very word FIRST is abbreviated into the symbol "1st" which indicates ONE, but this is a very modern notation. The older notions do not necessarily support this relationship.

In English and French - the word "first" is not derived from the word "one" but from an old word for "prince" (which means "foremost"). Similarly, the English word "second" is not derived from the number "two" but from an old word which means "to follow". Obviously there is a close relation between "third" and "three", "fourth" and "four" and so on.

Similarly, in Hebrew, for example, the word "first" is derived from the word "head", meaning "the foremost", but not specifically No. 1.
The Hebrew word for "second" is specifically derived from the word "two". The same for three, four and all the other numbers.

However, people have, for a very long time, counted from the number One, not from Zero. As a matter of fact, the inclusion of Zero as a full-fledged member of the set of all numbers is a relatively modern concept.

Zero is one of the most important numbers mathematically. It has many important properties, such as being a multiple of any integer.

A nice mathematical theorem states that for any basis, b, the first b^N (b to the Nth power) positive integers are represented by exactly N digits (leading zeros included). This is true if and only if the count starts with Zero (hence, 0 through b^N-1), not with One (for 1 through b^N).

This theorem is the basis of computer memory addressing. Typically, 2^N cells are addressed by an N-bit addressing scheme. Starting the count from One, rather than Zero, would cause either the loss of one memory cell, or an additional address line. Since either price is too expensive, computer engineers agree to use the mathematical notation of starting with Zero. Good for them!

The designers of the 1401 were probably ashamed to have address-0 and hid it from the users, pretending that the memory started at address-1.

This is probably the reason that all memories start at address-0, even those of systems which count bits from B1 up.

Communication engineers, like most "normal" people, start counting from the number One. They never suffer by having to lose a memory cell, for example. Therefore, they are happily counting 1-to-8, and not 0-to-7 as computer people learn to do.

ORDER OF NUMBERS.

In English, we write numbers in Big-Endians' left-to-right order. I believe that this is because we SAY numbers in the Big-Endians' order, and because we WRITE English in Left-to-right order.

Mathematically there is a lot to be said for the Little-Endians' order.

Serial comparators and dividers prefer the former.
Serial adders and multipliers prefer the latter order.

When was the common Big-Endians order adopted by most modern languages?

In the Bible, numbers are described in words (like "seven") not by digits (like "7") which were "invented" nearly a thousand years after the Bible was written. In the old Hebrew Bible many numbers are expressed in the Little-Endians order (like "Seven and Twenty and Hundred") but many are in the Big-Endians order as well.

Whenever the Bible is translated into English the contemporary English order is used. For example, the above number appears in that order in the Hebrew source of The Book of Esther (1:1). In the King James Version it is (in English) "Hundred and Seven and Twenty". In the modern Revised American Standard Version of the Bible this number is simply "One Hundred and Twenty-Seven".

INTEGERS vs. FRACTIONS

Computer designers treat fixed-point multiplication in one of two ways, as an integer-multiplication or as a fractional-multiplication. The reason is that when two 16-bit numbers, for example, are multiplied, the result is a 31-bit number in a 32-bit field. Integers are right justified; fractions are left justified. The entire difference is only a single 1-bit shift. As small as it is, this is an important difference.

Hence, computers are wired differently for these kinds of multiplications. The addition/subtraction operation is the same for either integer/fraction operation.

If the LSB is B0 then the value of a number is SIGMA<B(i)*[(2)^i]>, for i=0,15, in the above example. This is, obviously, an integer.

If the MSB is B0 then the value of a number is SIGMA<B(i)*[(1/2)^i]>, for i=0,15. This is, obviously, a fraction.

Hence, after multiplication the Integerites would typically keep B0-B15, the LSH (Least Significant Half), and discard the MSH, after verifying that there is no overflow into it. The Fractionites would also keep B0-B15, which is the MSH, and discard the LSH.
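The integer/fraction distinction above can be sketched in C. This is only an illustration under stated assumptions: the fraction format is taken to be signed Q15 (one sign bit, 15 fraction bits), which the note does not specify, and the names mul_int16 and mul_q15 are made up for this example. Both functions compute the same 32-bit product; they differ only in which 16 bits they keep.

```c
#include <assert.h>
#include <stdint.h>

/* Integer view: the product is right justified, so keep the low half.
 * *overflow is set when the high half carried significant bits. */
int16_t mul_int16(int16_t a, int16_t b, int *overflow) {
    int32_t p = (int32_t)a * (int32_t)b;
    *overflow = (p < INT16_MIN || p > INT16_MAX);
    return *overflow ? 0 : (int16_t)p;
}

/* Fraction view (assumed Q15): the product is left justified after a
 * 1-bit shift, so keep the high half. (p << 1) >> 16 equals p >> 15.
 * Assumes non-negative operands here, because right-shifting a negative
 * value is implementation-defined in C. */
int16_t mul_q15(int16_t a, int16_t b) {
    int32_t p = (int32_t)a * (int32_t)b;   /* Q30 value in a 32-bit field */
    return (int16_t)(p >> 15);
}

void check_mul(void) {
    int ov;
    assert(mul_int16(3, 5, &ov) == 15 && ov == 0);
    assert(mul_q15(0x4000, 0x4000) == 0x2000);   /* 0.5 * 0.5 = 0.25 */
}
```

The single shift in mul_q15 is the "1-bit shift" the note refers to; everything else is the same multiply.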
One could expect Integerites to be Little-Endians, and Fractionites to be Big-Endians. I do not believe that the world is that consistent.

SWIFT's POINT

It may be interesting to notice that the point which Jonathan Swift tried to convey in Gulliver's Travels is exactly the opposite of the point of this note.

Swift's point is that the difference between breaking the egg at the little-end and breaking it at the big-end is trivial. Therefore, he suggests that everyone do it in his own preferred way.

We agree that the difference between sending eggs with the little- or the big-end first is trivial, but we insist that everyone must do it in the same way, to avoid anarchy. Since the difference is trivial we may choose either way, but a decision must be made.

***** An edited version of this note appears in Computer Magazine (IEEE) of October 1981. *****

-- John Gilmore {sun,ptsfa,lll-crg,ihnp4}!hoptoad!gnu jgilmore@lll-crg.arpa "I can't think of a better way for the War Dept to spend money than to subsidize the education of teenage system hackers by creating the Arpanet."
-- Stuart.Lynne@wimsey.bc.ca ubc-cs!van-bc!sl 604-937-7532(voice) 604-939-4768(fax)
david@sun.com (philosophy is my hobby) (02/04/90)
In article <1990Feb2.215421.24894@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >As far as *technical* issues, the only ones >I'm aware of weakly favor big-endian: ... and it makes for consistent >bit ordering in things like frame buffers (do the dots on the screen >run left-to-right in successive bytes, or successive words? -- on a >little-endian machine, they're not the same), which simplifies the >code for optimized raster operations. You can have "consistent" display pixel ordering with either byte sex. For big-endian machines you put the most significant bits of a word on the left, and for little-endian machines you put the least significant bits on the left. Sun's 68K, SPARC, and 386 machines are "consistent" in this way. I believe that IBM PCs are "inconsistent" -- they have a little-endian CPU, but display the most significant bits on the left. I have written a lot of optimized (?) raster graphics code, and I don't consider this "consistency" to be particularly important. -- David DiGiacomo, Sun Microsystems, Mt. View, CA sun!david david@eng.sun.com
frk@mtxinu.COM (Frank Korzeniewski) (02/04/90)
In article <131201@sun.Eng.Sun.COM> david@sun.com (philosophy is my hobby) writes: #In article <1990Feb2.215421.24894@utzoo.uucp> henry@utzoo.uucp (Henry #Spencer) writes: #>As far as *technical* issues, the only ones #>I'm aware of weakly favor big-endian: ... and it makes for consistent #>bit ordering in things like frame buffers (do the dots on the screen #>run left-to-right in successive bytes, or successive words? -- on a #>little-endian machine, they're not the same), which simplifies the #>code for optimized raster operations. # #You can have "consistent" display pixel ordering with either byte sex. For #big-endian machines you put the most significant bits of a word on the #left, and for little-endian machines you put the least significant bits on #the left. Sun's 68K, SPARC, and 386 machines are "consistent" in this way. # #I believe that IBM PCs are "inconsistent" -- they have a little-endian CPU, #but display the most significant bits on the left. # #I have written a lot of optimized (?) raster graphics code, and I don't #consider this "consistency" to be particularly important. # #-- #David DiGiacomo, Sun Microsystems, Mt. View, CA sun!david david@eng.sun.com I have written bitblt and graphics routines for an EGA on an 80386, and I consider this in"consistency" to be a serious mistake. You cannot take advantage of 32 bit shifts when the bits are laced back and forth thru the 32 bit word. This can (and does) result in a factor of two or more LOSS in performance. Wait you say, the EGA only has an 8 or 16 bit interface and so you don't need to reference it in 32 bit chunks. Hmmmm, I say, not all your pixel manipulation occurs onscreen. There is plenty of offscreen work to do where you can reference 32 bits at a time (or you could if the hardware were done correctly). Frank Korzeniewski (frk@mtxinu.com)
david@sun.com (POP-TARTS CONTAIN SLACK!!) (02/04/90)
In article <1116@mtxinu.UUCP> frk@mtxinu.UUCP (Frank Korzeniewski) writes: >I have written bitblt and graphics routines for an EGA on an 80386, and I >consider this in"consistency" to be a serious mistake. You cannot take >advantage of 32 bit shifts when the bits are laced back and forth thru >the 32 bit word. I'm not familiar with EGA, but it sounds like the problem is something beyond inconsistent pixel and byte order. Certainly the bits that make up each pixel should be contiguous, but once you've managed that, the order of the pixels within the word is not critical.
jgk@osc.COM (Joe Keane) (02/05/90)
In article <1990Feb2.215421.24894@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >My impression is that the big issue is usually backward compatibility with >previous mistakes. As far as *technical* issues, the only ones I'm aware of >weakly favor big-endian: it's the network standard order, which makes life >easier for protocol code, Isn't this also ``backward compatibility with previous mistakes''? And what about RS-232? >and it makes for consistent bit ordering in things like frame buffers (do the >dots on the screen run left-to-right in successive bytes, or successive words? >-- on a little-endian machine, they're not the same), which simplifies the >code for optimized raster operations. Sorry, but little-endian is just as consistent. The code is exactly the same, you just have to switch << and >> in the right places. On say a MicroVAX, the leftmost bit on the screen is the least significant in its byte, word, page, or whatever.
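A small C sketch of the "switch << and >>" remark, for readers who want to see it spelled out. The enum and function names are hypothetical, not taken from any real graphics library, and a one-bit-per-pixel 32-bit word is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Which end of the word holds the leftmost pixel on screen. */
enum bit_order { MSB_LEFTMOST, LSB_LEFTMOST };

/* Mask selecting the leftmost pixel of a word. */
uint32_t leftmost_pixel(enum bit_order order) {
    return order == MSB_LEFTMOST ? 0x80000000u : 0x00000001u;
}

/* Move every pixel n positions to the right on screen (n < 32).
 * The only difference between the two conventions is the shift direction. */
uint32_t move_pixels_right(uint32_t word, unsigned n, enum bit_order order) {
    return order == MSB_LEFTMOST ? word >> n : word << n;
}

void check_move(void) {
    assert(move_pixels_right(leftmost_pixel(MSB_LEFTMOST), 1, MSB_LEFTMOST)
           == 0x40000000u);
    assert(move_pixels_right(leftmost_pixel(LSB_LEFTMOST), 1, LSB_LEFTMOST)
           == 0x00000002u);
}
```

Either convention works with word-wide shifts as long as byte order and pixel order agree, which is the point being made.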
frk@mtxinu.COM (Frank Korzeniewski) (02/05/90)
In article <131204@sun.Eng.Sun.COM> david@sun.com (POP-TARTS CONTAIN SLACK!!) writes:
#In article <1116@mtxinu.UUCP> frk@mtxinu.UUCP (Frank Korzeniewski) writes:
#>I have written bitblt and graphics routines for an EGA on an 80386, and I
#>consider this in"consistency" to be a serious mistake. You cannot take
#>advantage of 32 bit shifts when the bits are laced back and forth thru
#>the 32 bit word.
#
#I'm not familiar with EGA, but it sounds like the problem is something
#beyond inconsistent pixel and byte order. Certainly the bits that make up
#each pixel should be contiguous, but once you've managed that, the order
#of the pixels within the word is not critical.

Okay, the following is a map of the correspondence between the cpu (register or memory) and the video memory. Video memory is numbered with the bit order that will show up on the screen. I.e., v0 is the leftmost pixel, v1 is to its right, v7 is followed on its right by v8, and so on. The cpu bit numbering is based on which bits move into which other bits when a shift operation is performed. B0 moves into b1's position, ... b7 moves into b8's position, and so on, when a single bit left shift is done.

|---------------------------------------------------|
|       order of bits in a word in cpu memory       |
|   byte 3   |   byte 2   |   byte 1   |   byte 0   |
| b31 .. b24 | b23 .. b16 | b15 .. b8  | b7  .. b0  |
|---------------------------------------------------|
| v24 .. v31 | v16 .. v23 | v8  .. v15 | v0  .. v7  |
|   byte 3   |   byte 2   |   byte 1   |   byte 0   |
|           order of bits in video memory           |
|---------------------------------------------------|

The important thing to note is that if you do a 16 bit rotate right on the half-word containing v8..v15,v0..v7, then v7 will move into the position occupied by v8. The converse is true for left rotates. This is very important in bitblt for aligning bitfields prior to writing them into memory. This works very well when you are dealing with the data on the basis of 16 bit words.
This same procedure falls apart when you try to do it on a 32 bit word scale. There is no shift instruction on the 386 that will move v7 into v8, and also move v15 into v16, and also move v23 into v24. As a consequence you can at most use 16 bit quantities when dealing with video memory data.

This is an example of single bit pixels. For multi-bit pixels, you would have to bank switch in the other bits. The EGA only allows aggregate pixel access, one bit plane at a time. You can access a 4 bit pixel as a single quantity, but only ONE of these at a time. EGA was not your basic high performance graphic design.

Frank Korzeniewski (frk@mtxinu.com)
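The mapping in this post can be modelled in a few lines of C to show why the 16-bit rotate works and the 32-bit one does not. This is a sketch of the bit arithmetic only, not EGA driver code; pixel_bit, ror16, ror32 and shift_pixels_right1 are names invented here, and the byte-swap workaround at the end is one possible software remedy, not something the post proposes.

```c
#include <assert.h>
#include <stdint.h>

/* CPU bit index (0 = least significant) that holds pixel v in the
 * layout above: byte k holds pixels 8k..8k+7, leftmost pixel in the
 * byte's most significant bit. Flipping the low three bits does it. */
unsigned pixel_bit(unsigned v) { return v ^ 7u; }

/* Rotate right by one bit, 16-bit and 32-bit. */
uint16_t ror16(uint16_t x) { return (uint16_t)((x >> 1) | ((uint32_t)x << 15)); }
uint32_t ror32(uint32_t x) { return (x >> 1) | (x << 31); }

/* Reverse the four bytes of a word. */
uint32_t bswap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Hypothetical workaround: swap into a consistent order, shift, swap back. */
uint32_t shift_pixels_right1(uint32_t w) { return bswap32(bswap32(w) >> 1); }

void check_ega_layout(void) {
    /* 16-bit rotate: v7 lands where v8 lives, as the post says. */
    assert(ror16((uint16_t)(1u << pixel_bit(7))) == (1u << pixel_bit(8)));
    /* 32-bit rotate: v15 lands on v0 instead of v16. */
    assert(ror32(1u << pixel_bit(15)) == (1u << pixel_bit(0)));
    /* With the extra byte swaps, v15 does reach v16. */
    assert(shift_pixels_right1(1u << pixel_bit(15)) == (1u << pixel_bit(16)));
}
```

The two byte swaps per word are the kind of overhead that could plausibly account for the performance loss described earlier in the thread.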
henry@utzoo.uucp (Henry Spencer) (02/06/90)
In article <1991@osc.COM> jgk@osc.COM (Joe Keane) writes: >>... As far as *technical* issues, the only ones I'm aware of >>weakly favor big-endian: it's the network standard order, which makes life >>easier for protocol code, > >Isn't this also ``backward compatibility with previous mistakes''? And what >about RS-232? It's a compatibility issue, yes, but it's compatibility with a widespread standard rather than just with your predecessors' mistakes. I don't see the relevance of RS232. Nothing except the UART chips ever cares about the bit order on the wire; all the computers involved deal with the data a byte/character at a time. It's the same way on Ethernet: the bytes are transmitted a bit at a time, but almost nobody knows or cares which bit order is used. -- SVR4: every feature you ever | Henry Spencer at U of Toronto Zoology wanted, and plenty you didn't.| uunet!attcan!utzoo!henry henry@zoo.toronto.edu
cik@l.cc.purdue.edu (Herman Rubin) (02/06/90)
In article <1991@osc.COM>, jgk@osc.COM (Joe Keane) writes: > In article <1990Feb2.215421.24894@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) > writes: ........................... > >-- on a little-endian machine, they're not the same), which simplifies the > >code for optimized raster operations. > > Sorry, but little-endian is just as consistent. The code is exactly the same, > you just have to switch << and >> in the right places. On say a MicroVAX, the > leftmost bit on the screen is the least significant in its byte, word, page, > or whatever. It depends on the purpose. If the part you want to keep after truncating is the most significant part, such as is the case with floating point numbers and fixed point fractions, big-endian is far easier to work with. If you want the low-order parts, little endian has the advantage. On the VAX, in order to provide for this problem, floating point numbers are little endian within words, but the words (16 bit) are arranged in a big endian manner. Packing and unpacking are quite difficult. The raster problem corresponds to fixed point fractions; when truncating, the most significant part is used. This is easier in big endian. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
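One way to picture the truncation argument: store a 32-bit value in each byte order, then keep only the first two bytes at the lowest addresses. The sketch below uses explicit byte arrays so that it behaves the same on any host; the function names are made up for illustration, and it does not model the VAX floating-point layout mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Store x most-significant byte first (big-endian). */
void store_be32(uint8_t out[4], uint32_t x) {
    out[0] = (uint8_t)(x >> 24); out[1] = (uint8_t)(x >> 16);
    out[2] = (uint8_t)(x >> 8);  out[3] = (uint8_t)x;
}

/* Store x least-significant byte first (little-endian). */
void store_le32(uint8_t out[4], uint32_t x) {
    out[0] = (uint8_t)x;         out[1] = (uint8_t)(x >> 8);
    out[2] = (uint8_t)(x >> 16); out[3] = (uint8_t)(x >> 24);
}

/* Keep only the first two bytes, read in the matching byte order. */
uint16_t truncate_be(const uint8_t b[4]) { return (uint16_t)((b[0] << 8) | b[1]); }
uint16_t truncate_le(const uint8_t b[4]) { return (uint16_t)(b[0] | (b[1] << 8)); }

void check_truncate(void) {
    uint8_t b[4];
    store_be32(b, 0x12345678u);
    assert(truncate_be(b) == 0x1234);   /* high-order half survives */
    store_le32(b, 0x12345678u);
    assert(truncate_le(b) == 0x5678);   /* low-order half survives */
}
```

So "same address, shorter access" keeps the significant part in big-endian storage and the low-order part in little-endian storage, which matches the preference stated in the post.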
mcgrath@homer.Berkeley.EDU (Roland McGrath) (02/07/90)
Obviously, for some purposes big-endian is better, and for some purposes little-endian is better. I suppose even twisted-endian (least-significant byte first in words, but most-significant word first in longwords) is good for something. I think the solution for the architects is to have it selectable. The Intel 860, the AMD 29000, and I'm sure others I don't know about, have this feature: a bit in the processor control register determines byte order. -- Roland McGrath Free Software Foundation, Inc. roland@ai.mit.edu, uunet!ai.mit.edu!roland
alan@oz.nm.paradyne.com (Alan Lovejoy) (02/07/90)
In article <1910@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>The raster problem corresponds to fixed point fractions; when truncating,
>the most significant part is used. This is easier in big endian.

As much as I like big-endianness, I beg to disagree. Have you ever implemented BitBlt? I have. In the general case, you have to splice words ON BOTH SIDES of each line in the bitmap. On the left end of the line, you want to combine the leftmost bits of the word in the destination bitmap with the rightmost bits in the word from the source bitmap. On the right end of the line, you want to combine the leftmost bits of the word from the source bitmap with rightmost bits of the word in the destination bitmap. (All the above assumes that the leftmost bit in a word represents the leftmost pixel.)

Also, there are generally two ways to clear a word of unwanted bits, if you are restricted to the standard shift and mask instructions (special bitfield and/or graphics instructions may provide other options). One way is to code "dataWord & mask." The other way is "((dataWord >> bits) << bits)."

Normally, one prepares masks outside the BitBlt loop as follows:

    ...code which calculates the left and right pivot bits mercifully omitted..

    /* ~0u rather than ~0: the words must be unsigned so that >> shifts in zeros */
    leftSrcMask = ((~0u << leftPivotBit) >> leftPivotBit);
    leftDstMask = ~leftSrcMask;
    rightSrcMask = ((~0u >> rightPivotBit) << rightPivotBit);
    rightDstMask = ~rightSrcMask;

The leftmost word to be copied in each line is then "blitted" as follows: (The following code assumes that the pixel bits happen to be aligned in the source and destination bitmaps.)

    srcWord = *srcPtr++;
    destWord = *destPtr;
    *destPtr++ = (srcWord & leftSrcMask) | (destWord & leftDstMask);

The rightmost word to be copied is blitted thusly:

    srcWord = *srcPtr++;
    destWord = *destPtr;
    *destPtr++ = (srcWord & rightSrcMask) | (destWord & rightDstMask);

I fail to see any advantage here for big-endian over any other consistent, contiguous byte ordering.
But perhaps you had something else in mind? Of course, it would be nice if there were a single machine instruction that could achieve the effect of "(srcWord & mask) | (destWord & ~mask)," where "mask" is either a field of 1 bits followed by a field of zero bits, or else vice versa. ____"Congress shall have the power to prohibit speech offensive to Congress"____ Alan Lovejoy; alan@pdn; 813-530-2211; AT&T Paradyne: 8550 Ulmerton, Largo, FL. Disclaimer: I do not speak for AT&T Paradyne. They do not speak for me. Mottos: << Many are cold, but few are frozen. >> << Frigido, ergo sum. >>
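For reference, the masked merge wished for above is easy to express as a C function, and it has a well-known equivalent that uses one operation fewer. This is a sketch with invented names (merge_bits, merge_bits_xor); whether a compiler turns either form into a single instruction depends on the target.

```c
#include <assert.h>
#include <stdint.h>

/* Take bits from src where mask is 1 and from dst where mask is 0. */
uint32_t merge_bits(uint32_t src, uint32_t dst, uint32_t mask) {
    return (src & mask) | (dst & ~mask);
}

/* Same result with three operations: flip only the bits of dst that
 * differ from src and lie inside the mask. */
uint32_t merge_bits_xor(uint32_t src, uint32_t dst, uint32_t mask) {
    return dst ^ ((src ^ dst) & mask);
}

void check_merge(void) {
    assert(merge_bits(0xAAAAAAAAu, 0x55555555u, 0xFFFF0000u) == 0xAAAA5555u);
    assert(merge_bits_xor(0xAAAAAAAAu, 0x55555555u, 0xFFFF0000u) == 0xAAAA5555u);
}
```

Neither form cares which end of the word holds the leftmost pixel; only the construction of the mask does.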
des@dtg.nsc.com (Desmond Young) (02/13/90)
In article <1990Feb5.192958.12091@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes: > I don't see the relevance of RS232. Nothing except the UART chips ever > cares about the bit order on the wire; all the computers involved deal > with the data a byte/character at a time. It's the same way on Ethernet: > the bytes are transmitted a bit at a time, but almost nobody knows or > cares which bit order is used. Well, not quite. Since the IEEE committees in their wisdom (and I suppose sort of justified) defined two bit orderings, this has meant there are people interested in the order. Ethernet vs Token-Ring/FDDI. When converting between the two (bridging), reversing the address bit orders is a bit of a pain. Des
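The address conversion mentioned here can be sketched as a per-byte bit reversal. This assumes, as the post implies, that the two families transmit the bits of each address byte in opposite order, so a bridge has to mirror every byte of a 6-byte address; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror the 8 bits of one byte: swap nibbles, then pairs, then neighbours. */
uint8_t reverse_bits8(uint8_t b) {
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));
    return b;
}

/* Mirror each byte of a 6-byte station address in place.
 * Byte order is left alone; only the bit order inside each byte changes. */
void reverse_mac_bits(uint8_t mac[6]) {
    for (int i = 0; i < 6; i++)
        mac[i] = reverse_bits8(mac[i]);
}

void check_reverse(void) {
    uint8_t mac[6] = { 0x01, 0x00, 0x5E, 0x00, 0x00, 0x08 };
    reverse_mac_bits(mac);
    assert(mac[0] == 0x80 && mac[5] == 0x10);
    reverse_mac_bits(mac);              /* applying it twice restores the input */
    assert(mac[0] == 0x01 && mac[5] == 0x08);
}
```

A table lookup with 256 entries is the usual faster alternative when this has to run per frame.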