urip@orcisi.UUCP (01/08/87)
Although it's coming a little late, and some readers may have forgotten the original article ("On Holy Wars and a Plea for Peace" by Danny Cohen) by now, I still hope that my article will get enough of an audience. My point is that the Least-Significant-Byte-first camp (LSBians, pronounced: elesbians) has a more correct way than the Most-Significant-Byte-first camp (MSBians, pronounced: emesbians), and I am going to try to convince the MSBians to go my way.

Before I start with the main issue, let me comment on a side issue. As someone whose native language is Hebrew and who also knows some Arabic from school, I would like to confirm almost everything that was said in the article and in the responses about the order of digits etc., including the examples from the Bible and computer terminals in Arabic/Hebrew. There was a slight inaccuracy about the way numbers are read in Arabic: only the units and tens are read the LSBian way, and the rest of the number is read the MSBian way. For example, the year 1984 is read: "one thousand nine hundred four and eighty". Also, for those who don't know, the digit characters in Arabic are different from the Latin forms, but in Hebrew they are the same.

The article was written in 1980, and things have changed since then. Six years is a lifetime in the world of computers, and sentences like "I failed to find a Little-Endians' system which is totally consistent" cannot be left without an objection in 1986 (almost 1987). The Intel 80*86 microprocessors are true, consistent LSBians. They do not have combined shift operations (the article suggested these as a good criterion for telling LSBians and MSBians apart), but the multiply operation does leave the most significant part of the result in the high register, and the floating point format is totally consistent with the rest of the data types. The same is true for the National Series 32000, and I believe that Zilog is with the LSBians too.
So in the microprocessor arena, it seems that the Motorola 68000 is the only (though major...) MSBian around. Now, is it really as clean and pure an MSBian as claimed in the article? Let me refresh your memory with a quote from the article:

"Hence, the M68000 is a consistent Big-Endian, except for its bit designation, which is used to camouflage its true identity. Remember: the Big-Endians were the outlaws."

The author did not try to claim that the funny floating point format of the VAX was meant to camouflage the VAX's true identity, so why should one believe that the LSBian bit order of the M68000 is there because "the Big-Endians were the outlaws"? I suspect that the true reason behind the inconsistency of the M68000 is that only with an LSBian bit order is the value contributed by bit number 'i' in a word always equal to

    b[i] * 2^i

(where b[i] is 0 or 1 according to bit number 'i', and 2^i is 2 to the power of i), and the designers of the M68000 wanted to keep this important property in spite of the overall MSBian architecture.

There is another difference between LSBian and MSBian memory order that was not mentioned in the article. In the LSBian scheme, if a long-word in memory contains a small value, then a word or a byte at the same memory location still holds the same value (provided the value is small enough to fit). For example, assume we have the value 0x00000002 in a (32 bit) long-word at memory address 100.

LSB in lower address:

    address  104  103  102  101  100
              +----+----+----+----+
    value     | 00 | 00 | 00 | 02 |
              +----+----+----+----+

Note that a long-word, short word, byte and even nibble at address 100 all contain the value 2. On the other hand:

MSB in lower address:

    address  100  101  102  103  104
              +----+----+----+----+
    value     | 00 | 00 | 00 | 02 |
              +----+----+----+----+

Note that only a long-word at address 100 contains 2. All the rest contain 0.
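This overlap property can be sketched in a few lines of Python (my own illustration, not from the original article; the byte strings stand in for the two memory layouts, and the addresses are the article's):

```python
import struct

def narrow_read(mem: bytes, width: int, little_endian: bool) -> int:
    """Read `width` bytes starting at the long-word's own address (offset 0)."""
    return int.from_bytes(mem[:width], "little" if little_endian else "big")

value = 2
le_mem = struct.pack("<I", value)  # LSB in lower address: 02 00 00 00
be_mem = struct.pack(">I", value)  # MSB in lower address: 00 00 00 02

# LSBian layout: byte, word, and long-word reads at address 100 all give 2.
assert [narrow_read(le_mem, w, True) for w in (1, 2, 4)] == [2, 2, 2]
# MSBian layout: only the full long-word read gives 2; the rest give 0.
assert [narrow_read(be_mem, w, False) for w in (1, 2, 4)] == [0, 0, 2]
```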
This may not seem to be a key issue, but it has some significance for type conversion, as illustrated by the following C program segment:

    /*=================================*/
    int  i;
    char ch;

    ch = i;
    /*=================================*/

The 'int' value (assume int is 32 bits) has to be converted to 'char', i.e. to a byte. On an LSBian, this conversion is just a simple 'movb' (move byte) instruction from 'i' to 'ch':

    movb  i, ch

since both the byte and the long-word contain the same value. On an MSBian it may involve an expensive bit field instruction (or worse, shifts and ands). Luckily for the M68000, it is byte addressable, so the compiler can do the trick and generate:

    movb  i+3, ch

So it is still a single machine instruction, but it involves a small trick. Not clean, but still consistent, as long as we stick to byte-addressable memory.

But what about registers? Registers are not byte addressable. There is only one byte of a register that can be accessed by a 'movb' instruction. All the other 3 bytes can be accessed only through bit field instructions (or worse, shifts and ands). Let's look at another program segment:

    /*=================================*/
    extern int fgetc();
    char ch;

    ch = fgetc(file);
    /*=================================*/

The C library routine 'fgetc' returns an 'int' result, and it has to be converted to 'char'. Most implementations return function results in register 0. Assume that register D0 contains the 'int' (32 bit) value 2, and so does the long-word at address 100.

MSB in lower address:

    address      100  101  102  103  104
                  +----+----+----+----+
    value         | 00 | 00 | 00 | 02 |
                  +----+----+----+----+
                  +----+----+----+----+
    register D0   | 00 | 00 | 00 | 02 |
                  +----+----+----+----+

The instructions

    movl  100, x
    movl  D0, x

both move a long-word containing the value 0x00000002 to location 'x'. So

    movb  100, x
    movb  D0, x

should both move a byte containing the value 00 to location 'x'.
So the code generated for the above program segment on a truly consistent MSBian machine would be:

    jbsr  fgetc
    movl  24, d1
    lsll  d1, d0
    movb  d0, ch

But on the M68000 this is not what happens. No shift operation is needed, because a 'movb' instruction with a register operand takes the byte that contains the 2, that is, the byte at the HIGH address, so the compiler can generate:

    jbsr  fgetc
    movb  d0, ch

In other words, we see that the byte/word/long-word overlap of registers in the M68000 is implemented according to the more efficient LSBian way!!

Conclusion:
==========

I have shown that there are two aspects in which the LSBian way is more suitable and more efficient for binary computers. Even an MSBian machine like the M68000 is LSBian in these aspects. This is in addition to the argument of easier serial addition and multiplication that was mentioned in the article (though the latter is balanced, to some extent, by serial comparison and division).

The main argument left against the LSBians is the more readable MSBian dump format. I think that in these modern days of optimizing compilers and symbolic debuggers, dumps are almost an extinct species, and please let them stay that way.

I don't have any illusions. I don't expect Motorola to change their byte order after reading my article. I don't even expect users to prefer LSBian machines just for the sake of beauty and consistency. But I do hope that some day the LSBian method will prevail (or, maybe, someone will convince me of the superiority of the MSBian method...).

Uri Postavsky (utcs!syntron!orcisi!urip)
	(currently with O.R.C Toronto,
	 formerly with National Semiconductor Tel Aviv).
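As a sanity check, the int-to-char conversion the article walks through can be modelled in Python (the helper names are mine, and the 32-bit int width is the article's assumption):

```python
def low_byte_offset(width: int, little_endian: bool) -> int:
    """Memory offset of the least significant byte of a `width`-byte integer."""
    return 0 if little_endian else width - 1

def truncate_int_to_char(value: int, little_endian: bool) -> int:
    """Emulate `ch = i`: fetch the single byte a compiler's movb would read."""
    mem = value.to_bytes(4, "little" if little_endian else "big")
    return mem[low_byte_offset(4, little_endian)]

# Both layouts recover the same low byte, but from different addresses,
# matching the generated code in the article:
assert low_byte_offset(4, True) == 0          # LSBian:  movb i, ch
assert low_byte_offset(4, False) == 3         # MSBian:  movb i+3, ch
assert truncate_int_to_char(0x12345678, True) == 0x78
assert truncate_int_to_char(0x12345678, False) == 0x78
```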
mwm@cuuxb.UUCP (01/11/87)
In article <760@orcisi.UUCP> urip@orcisi.UUCP writes:
>My point is that the Least-Significant-Byte-first camp (LSBians,
>pronounced: elesbians) has a more correct way than the Most-Significant-Byte-
>first camp (MSBians, pronounced: emesbians), and I am going to try to convince
>the MSBians to go my way.
>...
>"Hence, the M68000 is a consistent Big-Endian, except for its bit
>designation, which is used to camouflage its true identity.
>Remember: the Big-Endians were the outlaws."
>
>The author did not try to claim that the funny floating point format
>of the VAX was to camouflage the VAX's true identity, so why should one
>believe that the LSBian bit order of the M68000 is because "the Big-Endians
>were the outlaws"? I suspect that the true reason behind the inconsistency
>of the M68000 is that only with an LSBian bit order, the value of
>bit number 'i' in a word is always equal to
>
>	b[i] * 2^i
>
>(where b[i] is 0 or 1 according to bit number 'i', and 2^i is 2 to the power
>of i), and the designers of the M68000 wanted to keep this important feature
>in spite of the overall MSBian architecture.

The bit notation is strictly notational, and has no bearing on the operation of the CPU. One could go entirely through any MC68000 book and replace the bit numbers appropriately throughout. Bit notation (which way you number the bits) has no relation to byte ordering. I could number my bits

	13579BDF2468ACE
	000000000000001

to represent the number 1 if I so chose. The claim that a byte ordering which proceeds in the opposite direction to the bit ordering is somehow "inconsistent" is part of your argument; I would maintain that it is moot.
> [ long discussion of how on LSBian machines, a movebyte 2,x and a
> movelong 2,x generate the same bits at location x ]
>
>This may not seem to be a key issue, but it has some significance in type
>conversion as illustrated by the following C program segment:
>
>	{ int i; char ch; ch = i; }
>
>The 'int' (assume int is 32 bits) value has to be converted to 'char' or byte.
>In LSBian, this conversion is just a simple 'movb' (move byte) instruction
>from 'i' to 'ch':
>
>	movb i, ch
>
>since both byte and long-word contain the same value.
>
>In MSBian it may involve an expensive bit field instruction (or worse,
>shifts and ands). Luckily for the M68000, it is byte addressable, so the
>compiler can do the trick and generate:
>
>	movb i+3, ch
>
>So it is still a simple machine instruction, but it involves a small trick.

Actually, what should be done in general is something similar to the VAX trick of having a conversion instruction, and do a

	cvt_lb i,ch

Unfortunately, on the 68000 the type-conversion (sign extension/truncation) operations only work between registers, and one is forced to resort to kluges like

	mov.b i+3,ch

in order to do the job efficiently.

>Not clean, but still consistent, as long as we stick to byte addressable
>memory.
>
>But what about registers? Registers are not byte addressable.
>There is only one byte of a register that can be accessed by a 'movb'
>instruction. All the other 3 bytes can be accessed only through bit field
>instructions (or worse, shifts and ands).

This is true of big-endian and little-endian machines alike, as long as they are byte addressable...

>Let's look at another program segment:
>
>	{ extern int fgetc(); char ch; ch = fgetc(file); }
>
>The C library routine 'fgetc' returns an 'int' result and it has to be
>converted to 'char'. Most implementations return function results in
>register 0.
>
>Assume that register D0 contains the 'int' (32 bit) value 2,
>and so does the long-word at address 100.
>	MSB in lower address
>
>	address      100  101  102  103  104
>	              +----+----+----+----+
>	value         | 00 | 00 | 00 | 02 |
>	              +----+----+----+----+
>	              +----+----+----+----+
>	register D0   | 00 | 00 | 00 | 02 |
>	              +----+----+----+----+
>
>The instructions
>
>	movl 100,x
>	movl D0,x
>
>both move a long-word containing value 0x00000002 to location 'x'. So
>
>	movb 100,x
>	movb D0,x
>
>should both move a byte containing value 00 to location 'x'.

Okay so far; both big-endian and little-endian machines do this, so long as the definition of "location x" includes a *type*. Whether that type is a 17-bit-wide field or a 1-byte-wide field, on either machine, locations need types to make them consistent. Type conversion could even be implemented to raise an exception on truncation of significant bits, and to sign-extend appropriately.

>So the code generated for the above program segment on a truly consistent
>MSBian machine would be:
>
>	jbsr fgetc
>	movl 24,d1
>	lsll d1,d0
>	movb d0,ch

Huh? Once again you are defining "consistent" to suit your argument against MSBian machines.

>But on the M68000 this is not what happens. No shift operation is needed
>because a 'movb' instruction with a register operand takes the byte that
>contains the 2, that is, the byte at the HIGH address, so the compiler can
>generate:
>
>	jbsr fgetc
>	movb d0,ch

Once again you miss the fact that the long-word at location x is different from the character at location x. (This is, incidentally, a register-based type conversion, treating d0 first as a long and then as a character, as per my discussion above. The code should read:

	jbsr fgetc
	cvt_lb d0,ch

It just so happens that a movb on a long word *does* convert long to byte.)

>In other words, we see that the byte/word/long-word overlap of registers
>in the M68000 is implemented according to the more efficient LSBian way!!

Wait a minute. First you claim that the MC68000 is MSBian, but that its *registers* are LSBian???? What does byte order in memory have to do with registers???
Are you saying that the 68000 should load bytes into the high-order bits of a register, rather than the low-order bits, in order to be "consistent"?

>Conclusion:
>==========
>
>I have shown that there are two aspects in which the LSBian way is more
>suitable and more efficient for binary computers. This is in addition
>to the argument of easier serial addition and multiplication that was
>mentioned in the article (though the latter is balanced, to some extent,
>by serial comparison and division).
>
>The main argument left against the LSBians is the more readable MSBian
>dump format. I think that in the modern days of optimizing compilers and
>symbolic debuggers, dumps are almost an extinct species, and please
>let them stay that way.
>
>I don't have any illusions. I don't expect Motorola to change their byte
>order after reading my article. I don't even expect users to prefer LSBian
>machines just for the sake of beauty and consistency.
>But I do hope that some day the LSBian method will prevail (or, maybe,
>someone will convince me of the superiority of the MSBian method...).

Unfortunately, that won't happen either. The point I am trying to make is that for computational purposes, the byte-ordering question is moot. Once you introduce the paradigm of typed memory locations and type conversion instructions, the two systems become computationally equivalent.

>Uri Postavsky (utcs!syntron!orcisi!urip)
>
>	(currently with O.R.C Toronto,
>	 formerly with National Semiconductor Tel Aviv).
--
Marc Mengel	...!ihnp4!cuuxb!mwm
amos@instable.UUCP (Amos Shapir) (01/11/87)
When I was working on CCI's 6/32, which is basically a VAX-like architecture turned MSBian, the problem that Uri mentioned (partial reads/writes of registers) was hard to resolve. The result: on the CCI 6/32, a register is always long; that requires sign extension, but that doesn't take more time than a 'movb'.
--
Amos Shapir
National Semiconductor (Israel)
6 Maskit st. P.O.B. 3007, Herzlia 46104, Israel  (011-972) 52-522261
amos%nsta@nsc 34.48'E 32.10'N
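The sign extension that an always-long-register design relies on can be sketched as follows (a rough Python model of the arithmetic, not the 6/32's actual hardware):

```python
def sign_extend(pattern: int, from_bits: int) -> int:
    """Interpret an unsigned `from_bits`-wide bit pattern as a signed value,
    the way a byte load into an always-long register would."""
    sign_bit = 1 << (from_bits - 1)
    return (pattern & (sign_bit - 1)) - (pattern & sign_bit)

# Loading the byte 0xFE yields -2 in the full-width register, not 254:
assert sign_extend(0xFE, 8) == -2
assert sign_extend(0x02, 8) == 2
assert sign_extend(0x8001, 16) == -32767
```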
mangler@cit-vax.Caltech.Edu (System Mangler) (01/12/87)
The advantage of little-endian is that it allows type punning, giving the optimizer a little something extra to work with. The advantage of big-endian is that it disallows type punning, making it easy to catch non-portable programming practices.

Don Speck   speck@vlsi.caltech.edu   {seismo,rutgers,ames}!cit-vax!speck
braun@drivax.UUCP (Karl T. Braun (kral)) (01/12/87)
In article <760@orcisi.UUCP> urip@orcisi.UUCP writes:
- I think that in the modern days of optimizing compilers and
- symbolic debuggers, dumps are almost an extinct species,...
- :
- :
- I don't have any illusions.
'nuf said.
--
kral 408/647-6112 ...!{amdahl,ihnp4}!drivax!braun
"Who is Number One?" "You are, Number Six!"
keithe@tekgvs.UUCP (Keith Ericson) (01/12/87)
Who stinkin' cares? As you stated in the lead-in to your article, you're too late. There's Motorola's way and there's Intel's way. Your way, except insofar as it resembles one of the above, is irrelevant.
radford@calgary.UUCP (Radford Neal) (01/14/87)
In article <1011@cuuxb.UUCP>, mwm@cuuxb.UUCP (Marc W. Mengel) writes:
> ... The point I am trying to make
> is that for computational purposes, the byte-ordering question is
> moot. When you introduce the paradigm of typed memory locations and
> type conversion instructions, the two systems become computationally
> equivalent.

Not quite. Consider the following C code:

	int i;
	char c;

	void proc(x)
	char *x;
	{
		... access to *x; ...
	}

	...
	c = 'a'; proc(&c);
	i = 'a'; proc(&i);

The two calls to 'proc' both deliver parameters for which *x is 'a' on a little-endian machine, but they don't on a big-endian machine. The little-endian method thus allows a certain flexibility in typing that the big-endian method doesn't. This is NOT just a notational matter.

I do not recommend using this technique in C code, due to the obvious portability problems. But if little-endian memory were entirely standard, it might be a useful technique, though really you want a more general run-time typing scheme if you want to do this sort of thing.

Radford Neal
The University of Calgary
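Radford's example can be simulated by modelling the two memory layouts explicitly (a sketch under the article's 32-bit-int assumption; the helper names are mine):

```python
def deref_char(storage: bytes) -> int:
    """What proc() sees when it reads *x through its char* parameter:
    the byte at the lowest address of whatever object was passed."""
    return storage[0]

def char_object(ch: str) -> bytes:
    return bytes([ord(ch)])                                      # c = 'a'

def int_object(n: int, little_endian: bool) -> bytes:
    return n.to_bytes(4, "little" if little_endian else "big")   # i = 'a'

# proc(&c) sees 'a' on either kind of machine ...
assert deref_char(char_object("a")) == ord("a")
# ... but proc(&i) sees 'a' only with the little-endian layout:
assert deref_char(int_object(ord("a"), True)) == ord("a")
assert deref_char(int_object(ord("a"), False)) == 0
```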
bjorn@alberta.UUCP (Bjorn R. Bjornsson) (01/14/87)
In article <753@vaxb.calgary.UUCP>, radford@calgary.UUCP (Radford Neal) writes:
> The two calls to 'proc' both deliver parameters for which *x is 'a' on a
> little-endian machine, but they don't on a big-endian machine. The little-
> endian method thus allows a certain flexibility in typing that the
> big-endian method doesn't. This is NOT just a notational matter.

Yes, this is what I call a subset property. Of course Cohen, because of his bias, never bothered to document this in his supposed plea for peace. This property was exhibited by all the usual data types on a VAX until G and H format floating point came into general use. Cohen focused on the fact that PDP-11/VAX floating point data representations were not strictly little-endian. He was right, of course, but I think the subset property is worth mentioning.

If you expect an F float by reference in a procedure, it's quite all right for the caller to pass you a D float; what gets lopped off is the least significant part (substitute "interesting" for "significant" when talking about integers). This subset property is lost in a purely little-endian machine, such as the *86 from Intel. What you get there is pure garbage if a caller isn't careful to pass what's expected.

But then a language that provides a reasonable facsimile of IEEE floating point arithmetic on a VAX has to forego the subset property, sacrificing the equal-width exponents of F and D in favor of the greater exponent range of G format floating point for the double precision implementation.

So in the final analysis, enter one more alternative into the holy war: the location-invariant subset property. When my bus pass runs out, should I run out after it, should I pass, or just buzz?

Bjorn R. Bjornsson
alberta!bjorn
wsr@lmi-angel.UUCP (Wolfgang Rupprecht) (01/14/87)
In article <> mwm@cuuxb.UUCP (Marc W. Mengel) writes:
> The bit notation is strictly notational, and has no bearing on
> the operation of the cpu. One could go entirely through any
> MC68000 book and replace the bit-numbers appropriately throughout.
>
> Bit notation (which way you number the bits) has no relation
> to byte ordering. I could number my bits:
>	13579BDF2468ACE
>	000000000000001
> to represent the number 1 if I so chose. The claim that byte ordering
> that proceeds in the opposite direction to bit ordering is
> somehow "inconsistent" is part of your argument; I would maintain
> that it is moot.

Unfortunately, that's not the case. Look at the BSET, BCLR and BTST (bit set, clear and test) instructions. If you do a

	bset.l	d0,somelocation

with #31 in d0, you have set the MSB of the long-word, which lies in byte 0, the lowest-addressed byte! So the bit numbering is operationally visible, and it is *LOW*-endian.
--
Wolfgang Rupprecht	{harvard|decvax!cca|mit-eddie}!lmi-angel!wsr
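The offset arithmetic behind Wolfgang's point can be sketched in Python (my own model; it only encodes the bit-weight numbering the post describes, where bit i carries weight 2^i):

```python
def byte_holding_bit(bit: int, width_bytes: int, little_endian: bool) -> int:
    """Memory offset of the byte containing bit number `bit`, numbering
    bits so that bit i has weight 2**i (the 68000's convention)."""
    significance = bit // 8  # 0 = least significant byte of the value
    return significance if little_endian else (width_bytes - 1) - significance

# bset #31 on a long-word touches the *lowest*-addressed byte under the
# 68000's big-endian byte order, but the highest-addressed one under an
# LSBian byte order:
assert byte_holding_bit(31, 4, False) == 0
assert byte_holding_bit(31, 4, True) == 3
assert byte_holding_bit(0, 4, False) == 3  # bit 0 lives at the high address
```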
joel@gould9.UUCP (Joel West) (01/16/87)
The 68000 is big-endian, as is the 68020. Recently I saw the suggestion that the 68008 is little-endian, except when fetching opcodes. Could this be true? If so, does anyone have any idea how many changes would be required to programs and an OS to make them work on both processors? It boggles my mind...
--
Joel West	MCI Mail: 282-8879
Western Software Technology, POB 2733, Vista, CA 92083
{cbosgd, ihnp4, pyramid, sdcsvax, ucla-cs}!gould9!joel
joel%gould9.uucp@NOSC.ARPA
lamaster@pioneer.UUCP (01/16/87)
In article <112@lmi-angel.UUCP> wsr@lmi-angel.UUCP (Wolfgang Rupprecht) writes:
>In article <> mwm@cuuxb.UUCP (Marc W. Mengel) writes:
>> The bit notation is strictly notational, and has no bearing on
>> the operation of the cpu. One could go entirely through any
>> MC68000 book and replace the bit-numbers appropriately throughout.
>
>Unfortunately, that's not the case. Look at the BSET, BCLR and BTST (bit
>set, clear and test) instructions. If you do a:
>...

There is a difference between:

a) the data formats
b) addressing conventions
c) instructions
d) notation and documentation

Few machines are consistently big-endian or little-endian in all four. The CDC Cyber 205 is one of the few machines that is consistently big-endian in ALL aspects. It is bit addressable, byte addressable, 32-bit word addressable, and 64-bit word addressable, with completely mutually consistent addresses for each. And, as a plus, it even shows the formats in the documentation numbered from the left.

I believe that the National Semiconductor NS32000 series MAY be a completely consistent little-endian. If it is, then that too would be something of an accomplishment. (Remembering what Emerson said about consistency, it is still a valuable property of computer instruction sets.)

The Motorola MC68000 series is big-endian in data formats, but is inconsistent in instructions and addressing. The VAX machines are distinctive in that they are not even consistent in data formats. I am not sure about the Intel machines; perhaps an Intel booster could enlighten me.
As one of the instigators of the current discussion, I would like to summarize some of what I have learned from the subsequent arguments:

a) Some very, very small machines may save a little bit of temporary register space by being little-endian. (I looked up some hardware diagrams and discovered that none of the more recent PDP-11s used reduced-size memory data registers, but some of the early small machines might have... I don't know.)

b) Some variable-length memory-to-memory instructions could benefit from little-endian order.

c) Most people assume that there are other easy-to-prove properties of "their" personal favorite ("If God had wanted us to use Hex, He would have given us 16 fingers"). To date, I have not seen any such argument that holds water.

d) Some people believe that systems programmers don't have to read dumps (Ha Ha Ha!). (The same people believe that Pascal HAS ALREADY REPLACED Cobol and Fortran :-) :-) )

As I said before, those of us who work in multi-user environments need to be able to move binary data files between machines. The IEEE floating point standard was the declaration of independence; now we are fighting the war. Big manufacturers will never willingly make it easier to move data between machines. Fortunately, small manufacturers will. It is important that a standard for the mapping of integers and floating point numbers onto bits, bytes, and words be developed. Before this effort can succeed, it may be necessary for microprocessor and workstation manufacturers to agree. Why not form an IEEE committee?

One last point: as a systems programmer, I think it would be nice if data formats, addressing, instruction formats, and notation were all consistently big-endian or little-endian. As a user, all I really care about is that the data formats are consistently one way or the other on ALL machines.
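Pending such a standard, the usual defence when moving binary files between machines is to fix the byte order of the *file format* explicitly, so the host's own order stops mattering. A minimal sketch in Python (the record layout here is invented purely for illustration):

```python
import struct

def encode_record(n: int, x: float) -> bytes:
    """Serialize with a fixed little-endian layout ('<'): a 32-bit signed
    int followed by an IEEE-754 double, regardless of the host's order."""
    return struct.pack("<id", n, x)

def decode_record(blob: bytes):
    return struct.unpack("<id", blob)

# A record written this way reads back identically on any machine:
blob = encode_record(-42, 2.5)
assert decode_record(blob) == (-42, 2.5)
assert blob[:4] == (-42).to_bytes(4, "little", signed=True)
```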
All the mainframes that I know of have big-endian data formats; so does the (by far) most popular microprocessor for engineering and scientific use (the MC68020). That gives the big-endians a big head start. But the (to be formed) committee may vote the other way. As long as a standard is chosen, I don't care particularly.

By the way, for those who missed it, the Danny Cohen article that is often referred to is in the October '81 issue of IEEE Computer magazine.

Hugh LaMaster, m/s 233-9		UUCP: {seismo,hplabs}!nike!pioneer!lamaster
NASA Ames Research Center		ARPA: lamaster@ames-pioneer.arpa
Moffett Field, CA 94035			ARPA: lamaster%pioneer@ames.arpa
Phone: (415)694-6117			ARPA: lamaster@ames.arc.nasa.gov

"He understood the difference between results and excuses."

("Any opinions expressed herein are solely the responsibility of the author and do not represent the opinions of NASA or the U.S. Government.")
jer@ipmoea.UUCP (Eric Roskos) (01/17/87)
uri@orcisi.UUCP writes:
>My point is that the Least-Significant-Byte-first camp (LSBians,
>pronounced: elesbians) has a more correct way than the Most-Significant-byte-
>first (MSBians, pronounced: emesbians), and I am going to try to convince
>the MSBians to go my way.

It seems to me that the basic premise of the argument that follows the above statement is that the name (memory address) bound to a long word in memory is different from the name bound to its least significant byte, whereas the same name (register number) references a longword (for a long register operand) or its least significant byte (for a byte operand). The poster argues that this is inconsistent.

In fact, the problem (and the cause of the whole debate, I often think) is that separate bytes in a longword in memory have different names bound to them at all. What is the 2nd byte of a 4-byte integer good for, on its own? Why should it be separately addressable at all? In the 68000, registers, however transiently, have a type associated with them that determines their length; but the individual bytes of a register, when it's a longword register, do not have separate names. Register 5 is register 5, though its size varies depending on its type (which persists only for the duration of one instruction). Most problems arise when you take a datum of one type and suddenly start treating it as a different type; that is the real inconsistency.
jbn@glacier.STANFORD.EDU (John B. Nagle) (01/17/87)
big-endian, but TP4/MAP is little-endian. Incidentally, there are some fields in X.25 control frames where the BITS are in reverse order, for obscure historical reasons. John Nagle
radford@calgary.UUCP (01/17/87)
In article <980@gould9.UUCP>, joel@gould9.UUCP (Joel West) writes:
> The 68000 is big-endian, as is the 68020.
>
> Recently I saw the suggestion that the 68008 is little-endian,
> except when fetching opcodes. Could this be true?

No, it couldn't be true. A more ridiculous design decision would be hard to imagine. Come to think of it, that may not be all that strong an argument :-)

Anyway, I've actually used both 68000 and 68008 processors, and can assure you that they are pretty much compatible. According to the documentation, they even went so far as to have the 68008 trap word accesses at an odd address (I didn't actually try it). Personally, I think this was going too far, especially as (I am given to understand) the 68020 doesn't. (I know, I know, the 68008 came earlier...)

Radford Neal
The University of Calgary
papowell@umn-cs.UUCP (Patrick Powell) (01/18/87)
In order to spare the innocent, I will briefly note:

    mumble proc(x)
    char *x;
    {
        ....
    }

    int i;
    char c;

    (void) proc(&c);   <-- O.K.
    (void) proc(&i);   <-- SSSSSSSSSS BOOOOOO

Since when do you assume that a pointer to int has the same FORM as a pointer to char? Sigh... I think that the originator of the idea should have to work on a word-based machine for his sins.

Patrick ("Two days programming in FORTRAN IV on an IBM 7090 or 50 lashes" "I'll take the lashes!!!") Powell
--
Patrick Powell, Dept. Computer Science, 136 Lind Hall, 207 Church St. SE, University of Minnesota, Minneapolis, MN 55455 (612)625-3543/625-4002
davidsen@steinmetz.steinmetz.UUCP (william E Davidsen) (01/20/87)
In article <172@ames.UUCP> lamaster@pioneer.UUCP (Hugh LaMaster) writes:
>The Motorola MC68000 series is big-endian in data formats, but is
>inconsistent in instructions and addressing. The VAX machines are
>distinctive in that they are not even consistent in data formats.
>I am not sure about the Intel machines. Perhaps an Intel booster
>could enlighten me.

The Intel 80*86 series seems to be consistently little-endian in its integer formats. The first (lowest-address) byte is at the same address for all integer data formats, followed by the other bytes, if any, in increasing order of significance. This is not true of text strings, which have the first character at the lowest location; as I recall, the same goes for BCD numbers.

>As one of the instigators of the current discussion, I would like to
>summarize some of what I have learned from subsequent arguments:
>a) Some very very small machines may save a little bit of temporary
>register space by being little endian (I looked up some hardware
>diagrams and discovered that none of the more recent PDP-11's used
>reduced size memory data registers, but some of the early small
>machines might have...I don't know)

It does allow some code to be simpler when processing the same data at several word lengths. This falls in the area of making the code easier to write, rather than (necessarily) more efficient.

>b) Some variable length memory to memory instructions could be
>benefited by using little endian notation.

Very much as is implied by (a).

>c) Most people assume that there are other easy to prove properties
>of "their" personal favorite ("If God had wanted us to use Hex, He
>would have given us 16 fingers"). To date, I have not seen any such
>argument that holds water.

How many fingers do you have? Seriously, the notation should be a function of the word length. Machines which have 9-bit bytes, such as the GE/Honeywell/Hitachi series, the DEC 10/20, etc., are much easier to understand in octal. I don't see that this affects endianness at all.
>d) Some people believe that systems programmers don't have to read
>dumps (Ha Ha Ha!). (The same people believe that Pascal HAS ALREADY
>REPLACED Cobol and Fortran :-) :-) )

See previous comment.

>As I said before, those of us who work in multi-user environments
>need to be able to move binary data files between machines. The
>IEEE floating point standard was the declaration of independence;
>now we are fighting the war.

I wrote a little routine on the Cray to generate binary files for a VAX or PC (running a BASIC program, yet) in about 2 minutes and 4 lines of code. The war isn't over integers but over floating point, where everyone uses what they like, and IEEE only if they feel like it. The micros are MUCH better at staying with the standard. For what it's worth, it was far easier to output the integers LSB first on any machine, since machines with short word lengths may not have the bits needed to output MSB first.

>Big manufacturers will never willingly
>make it easier to move data between machines. Fortunately, small
>manufacturers will. It is important that a standard for the mapping
>of integers and floating point numbers onto bits, bytes, and words,
>be developed. Before this effort can succeed, it may be necessary
>for microprocessor and workstation manufacturers to agree. Why not
>form an IEEE committee?

What am I missing? FP and bytes I can see, but bits?

>One last point: As a systems programmer, I think it would be nice
>if data formats, addressing, instruction formats, and notation were
>all consistently big-endian or little-endian.

That's one of the things I don't like about the 680?0 series: I would expect an integer with a value of 0x12345678 to be in memory as either 12345678, or 87654321, or even 78563412, but not 34127856. I consider that "middle-endian". (Yes, I have an att7300, Suns, etc.; save your flames about how much I should like them.)

>As a user, all I really
>care about are that the data formats are consistently one way or the
>other on ALL machines.
>All the mainframes that I know of have
>big-endian data formats; so does the (by far) most popular
>microprocessor for engineering and scientific use (MC68020).

Does that mean that if it uses a National or Intel chip it isn't a workstation?

>That gives big-endians
>a big head start. But the (to be formed) committee may vote the other
>way. As long as a standard is chosen, I don't care particularly.
>
>By the way, for those that missed it, the Danny Cohen article that is
>often referred to is in the October '81 IEEE Computer magazine.

Anyone trying to categorize me as a lover of one endian over the other has not read my comments carefully. I hate trying to read 680?0 dumps, and fortunately don't do it often.
--
bill davidsen
  sixhub \
  ihnp4!seismo!rochester!steinmetz -> crdos1!davidsen
  chinet /
ARPA: davidsen%crdos1.uucp@crd.ge.com (or davidsen@crd.ge.com)
jesup@steinmetz.steinmetz.UUCP (Randell Jesup) (01/21/87)
In article <1116@steinmetz.steinmetz.UUCP> davidsen@kbsvax.UUCP (william E Davidsen) writes:
>That's one of the things I don't like about the 680?0 series, I would
>expect an integer with a value of 0x12345678 to be in memory as either
>12345678, or 87654321, or 78563412 even, but not 34127856. I consider
>that "middle-endian".

I'm not sure how you're dumping the memory, but I find it hard to understand how you could get back anything but 12345678. If a long-word of 0x12345678 is stored at location 0, location 0 will contain 0x12, location 1 will contain 0x34, location 2 will contain 0x56, and location 3 will contain 0x78. The only way I can see for you to get anything else is to dump it with a little-endian dump program.

The only cases where I can see little-endian having any advantage are machines that have (or are descendants of machines that have) differing internal and external word sizes. Any machine that can read and write a register's worth of data at a time gets no performance improvement from little-endedness.

Randell Jesup
jesup@steinmetz.uucp (seismo!rochester!steinmetz!jesup)
jesup@ge-crd.arpa
guy%gorodish@Sun.COM (Guy Harris) (01/22/87)
> >One last point: As a systems programmer, I think it would be nice
> >if data formats, addressing, instruction formats, and notation were
> >all consistently big-endian or little-endian.
>
> That's one of the things I don't like about the 680?0 series, I would
> expect an integer with a value of 0x12345678 to be in memory as either
> 12345678, or 87654321, or 78563412 even, but not 34127856. I consider
> that "middle-endian".

Could you please explain how you got "34127856" out of this? If you read the nibbles in a byte left-to-right, and sequence through the value 0x12345678 from byte 0 to byte 3 (which is the same as reading the bytes in a word left-to-right), you get "12345678". The only way *I* can see getting "34127856" is if you: read the first 16-bit word of 0x12345678 first (i.e., the 16-bit word whose 0th byte has the lowest address), followed by the second 16-bit word; read the low-order byte of each word first (i.e., the one in the least-significant bits, i.e., the one with the *highest* address); and read the two nibbles of each byte from left to right (i.e., the nibble in the most-significant bits first). If you read the bits from the most-significant bit to the least-significant bit, you are also reading the (bytes, words) from the lowest address to the highest, and you get "12345678". The 680x0 is perfectly consistently big-endian here; it's your way of reading it that's "middle-endian" and inconsistent.

> >All the mainframes that I know of have big-endian data formats; so does
> >the (by far) most popular microprocessor for engineering and scientific
> >use (MC68020).
>
> Does that mean that if it uses a National or Intel chip it isn't a
> workstation?

Of course not; no reasonable reading of his statement could possibly lead to that conclusion. 1) It didn't say "workstation" anywhere. 2) It didn't say that only Motorola chips were used in engineering and scientific machines, just that they were the most popular chip for those sorts of machines.
A couple of points:

1) The aesthetics of an architecture is rather subjective. Person A may think that a machine that has a nice cute one-to-one mapping between constructs in some high-level language and machine primitives is Beautiful and therefore Good; person B may think that a machine that has primitives that can be used as simple building blocks to express the operations actually performed by programs is Beautiful and therefore Good. As such, the aesthetics of little-endian, big-endian, middle-endian, or whatever machines is really rather irrelevant to their merits as designs.

2) The chances that any given vendor will change their byte order because a) people say the current one is ugly or b) some committee decides that the other byte order is the One True Way are somewhere between zip and nil. You're just going to have to live with different byte orders, and use data representation conventions when exchanging data between machines.
socha@drivax.UUCP (01/23/87)
in reply to: radford@calgary.UUCP (Radford Neal)
> In article <980@gould9.UUCP>, joel@gould9.UUCP (Joel West) writes:
> > The 68000 is big-endian, as is the 68020.
> >
> > Recently I saw the suggestion that the 68008 is little-endian,
> > except when fetching opcodes. Could this be true?
> Anyway, I've actually used both 68000 and 68008 processors, and can assure
> you that they are pretty much compatible. According to the documentation,
                                                                 ^^^^^^^^^^^

Actually, the 68008 is internally EXACTLY a 68000, with the exception of an interface to an 8-bit bus; i.e., the ALU, microcode, and registers are all unchanged! Whenever the 68008 wants to do a 16-bit read, it performs two memory cycles to get the LSB and MSB at their correct 68000 addresses.
----
--
UUCP:...!amdahl!drivax!socha  WAT Iron'75
"Everything should be made as simple as possible but not simpler." A. Einstein
lamaster@pioneer.arpa (Hugh LaMaster) (01/23/87)
In article <11858@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
>A couple of points:
>
> 1) The aesthetics of an architecture is rather subjective. Person

Agreed. I note, though, that weird (e.g. VAX floating point) data formats can, in addition to being unaesthetic, cost users real money when they need to move data between machines of different types.

> 2) The chances that any given vendor will change their byte order
> because 1) people say the current one is ugly or 2) some
> committee decides that the other byte order is the One True
> Way are somewhere between zip and nil. You're just going to
> have to live with different byte orders, and use data
> representation conventions when exchanging data between machines.

I have to disagree with this. When the IEEE floating point standard was first proposed, everyone said that it would never fly. Well, it has been almost ten years, but in fact almost all new workstations and minicomputers are using it. If bit-byte-word order can be resolved, in ten years everyone will be using it. And, in the meantime, software could be written knowing what the standard order will be. In any case, with networked environments becoming the norm, lots of sites are going to want to be able to move binary data between machines. Don't expect that this issue will go away.

  Hugh LaMaster, m/s 233-9,    UUCP: {seismo,hplabs}!nike!pioneer!lamaster
  NASA Ames Research Center    ARPA: lamaster@ames-pioneer.arpa
  Moffett Field, CA 94035      ARPA: lamaster%pioneer@ames.arpa
  Phone: (415)694-6117         ARPA: lamaster@ames.arc.nasa.gov

"He understood the difference between results and excuses."

("Any opinions expressed herein are solely the responsibility of the author and do not represent the opinions of NASA or the U.S. Government")
guy%gorodish@Sun.COM (Guy Harris) (01/23/87)
> I have to disagree with this. When the IEEE floating point
> standard was first proposed, everyone said that it would
> never fly. Well, it has been almost ten years, but in fact
> almost all new workstations and minicomputers are using it.

Note the word "new" here. I said that the chances of getting any current vendor to change their byte order based on some standard are somewhere between zip and nil, and I stand by that claim. If IEEE chooses a standard byte order, the chances that Motorola, IBM, Intel, National Semiconductor, DEC, or whoever will declare a flag day and that the 68000, 370, 8086, NS32000, VAX, etc. families will instantly change their byte order are, again, somewhere between zip and nil. Neither the 370 nor the VAX has adopted the IEEE standard, so it doesn't give any encouragement that *existing* machines will somehow adopt an IEEE byte order.

We won't even discuss: the chances that a committee that will, presumably, have Intel, DEC, and National Semiconductor on it will choose a big-endian byte order; the chances that a committee that will, presumably, have Motorola on it will choose a little-endian byte order; or the chances that a committee that will, presumably, have IBM on it will choose *any* byte order (they have one major big-endian line, and one little-endian line, and there is no way in hell that they're going to blow either of those lines out of the water).

> If bit-byte-word order can be resolved, in ten years everyone
> will be using it.

Give me a break. *Everyone*? If we choose a big-endian byte order, in ten years all the IBM PCs out there will become big-endian? If we choose a little-endian byte order, in ten years all the 370s out there will become little-endian? (I remind you that the ASCII character set has been adopted as a standard, and there are still machines that use - and will continue to use - EBCDIC internally. It's been more than 10 years since ASCII was adopted.)
> And, in the meantime, software could be written knowing what
> the standard order will be.

And that software will not run on a very large percentage of the machines out there. You don't seem to realize that the formats of integral data, addresses, instructions, etc. are several orders of magnitude more central than the format of floating-point numbers. DEC has added floating-point formats to the VAX (G and H), but they still support the old formats; in theory, if they felt so motivated, they could probably add the IEEE formats as well. It's nowhere near that simple for byte order. If DEC were to adopt a big-endian byte order for the VAX, they'd have to add big-endian versions of most of their instructions to the VAX instruction set, or add a mode bit to select the byte order. If they did the latter, they'd have to have software that could deal with data files, etc. in either byte order, and two versions of library routines, and two versions of system calls, etc., etc., etc.

> In any case, with networked environments becoming the norm,
> lots of sites are going to want to be able to move binary
> data between machines. Don't expect that this issue will
> go away.

Sigh. Have you ever heard of X.400, or Sun XDR, or...? I'm *quite* aware that there is an issue of moving binary data *between* machines. This does not, however, mandate a standard byte order for representing data internally on machines! Yes, having to convert data into some standard external form when passing it between machines is a nuisance; however, the inconvenience involved in making this conversion is much smaller than the inconvenience involved in changing something as fundamental as the byte order in an architecture that has a large existing software and data base. I certainly don't expect that this issue will go away, for one simple reason: it'll be a cold day in Hell before all the machines out there have the same byte order!