g-rh@cca.UUCP (Richard Harter) (04/07/86)
In article <> rbj%icst-cmr@smoke.UUCP writes:
>Bill Crews writes:
> > The only reason you got -28672 for BIG instead of nulls is
> because your machine has backwards byte order.
>
>Sorry Bill, *you're* the one that's got backwards byte order.  Little
>Endian is `correct', even tho bucking historical convention.
>
>My reasoning is this: The original byte ordering was done the obvious
>way, Big Endian.  If this was so perfect, why would a sane man design
>anything Little Endian?  For compelling mathematical reasons!
>You wouldn't number your bits backwards (within a register) would you?
>Admittedly, some people do, but they must not know any better.
>
	Well, no, little-endian came about because the engineers at DEC
who designed the PDP-11 made an arbitrary decision that was not well
thought out.  I will not essay to defend the sanity of DEC engineers,
and cannot recommend that anyone else do so (:-)).  It was a bad
decision.

	Consider the following problem.  You have an array of 4 byte
integers.  If you sort the array numerically you get one result.  If
you regard the bytes as characters and sort them lexicographically on
a little endian machine you get a different result.  The reason is that
the most significant byte occupies the eight least significant bits.
Consistency of significance requires that the direction of significance
be the same for both bytes and bits.

	Little-endian came about from the idea of making the lsb of a
word be bit 0.  From this it follows that byte 0 should be the lsbyte
of the word.  This is in natural analogy with the idea that word 0 is
the l.s. word.  The object is:

	bit  0 is the lowest address bit
	byte 0 is the lowest address byte
	word 0 is the lowest address word

This is the "all addressing flows in the same direction" constraint.
If that were the only constraint either big-endian or little-endian
would be acceptable, provided that it were followed through
consistently.  However we have a second constraint, that comparison be
valid whether a word is considered as a character string or as a
number.  Given both constraints, the correct addressing assignment is:

	bit  0 is the most significant bit
	byte 0 is the most significant byte.

In short, little-endian was a mistake, is a mistake, and will continue
to be a mistake.

		Richard Harter, SMDS Inc.
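[Editorial sketch, not part of the thread: a small C program that makes
the sorting claim above concrete.  memcmp() compares bytes starting at
the lowest address, i.e. lexicographically; the values below are chosen
only so the two orderings visibly disagree on a little-endian machine.]

	#include <stdio.h>
	#include <string.h>

	int main()
	{
		unsigned long a = 0x100;	/* 256 */
		unsigned long b = 0x002;	/*   2 */

		printf("numeric:  a %s b\n", a < b ? "<" : ">");
		printf("as bytes: a %s b\n",
		       memcmp(&a, &b, sizeof a) < 0 ? "<" : ">");
		/* On a big-endian machine both lines print ">".  On a
		 * little-endian machine the byte compare sees a's
		 * low-order byte (0x00) before b's (0x02) and prints
		 * "<", disagreeing with the numeric order. */
		return 0;
	}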
ggs@ulysses.UUCP (Griff Smith) (04/07/86)
...
> 	Well, no, little-endian came about because the engineers at DEC
> who designed the PDP-11 made an arbitrary decision that was not well
> thought out.  I will not essay to defend the sanity of DEC engineers,
> and cannot recommend that anyone else do so (:-)).  It was a bad
> decision.
...
> In short, little-endian was a mistake, is a mistake, and will continue
> to be a mistake.
>
> 		Richard Harter, SMDS Inc.

As an old PDP-11 hacker, I can't agree with the condemnation of the
DEC engineering decision.  You are looking at it from the perspective
of a modern software engineer who wouldn't think of type punning and
other hacks.  To an assembly language programmer, however, the ability
to use the same address to test the low and high bytes of a device
status register meant that code would be shorter and faster.  It also
increased the number of cases where indirect addressing could be used
with register pointers.  You can't expect the engineers to have
anticipated that high-level languages would discredit these practices.

My own theory about big vs. little end usage is that the mistake was
made hundreds of years ago when merchants started to adopt the Arabic
(as adapted from earlier Hindu sources) number system.  Note that
Arabic is written right to left; note that numbers are written right
to left.  I think the Arabs knew what they were doing; they set the
notation so that the natural computational order followed the
conventional lexical order.  The European merchants missed the point
and copied the notation verbatim instead of compensating for the
opposite lexical convention.

In summary, big-endian was a mistake, but there is no use fighting it.
Any better-informed historical challenges will be cheerfully accepted;
the best data I could get was from an expatriate of Iran.
-- 
Griff Smith	AT&T (Bell Laboratories), Murray Hill
Phone:		(201) 582-7736
Internet:	ggs@ulysses.uucp
UUCP:		ulysses!ggs  ( {allegra|ihnp4}!ulysses!ggs )
david@sun.uucp (David DiGiacomo) (04/07/86)
In article <7046@cca.UUCP> g-rh@cca.UUCP (Richard Harter) writes:
> 	Well, no, little-endian came about because the engineers at DEC
>who designed the PDP-11 made an arbitrary decision that was not well
>thought out.  I will not essay to defend the sanity of DEC engineers,
>and cannot recommend that anyone else do so (:-)).  It was a bad
>decision.

You are not considering the context of the decision.  A little-endian
architecture is highly desirable in an environment which uses
call-by-reference exclusively (e.g. PDP-11 Fortran).  It allows any
size value to be passed to a subroutine expecting a parameter of that
size or smaller.

In a predominantly call-by-value environment (e.g. C) this is not
particularly important, but there are plenty of programmers who have
been burned by { int c = 0; read(fd, &c, 1); }.
-- 
David DiGiacomo  {decvax, ihnp4, ucbvax}!sun!david  david@sun.com
Sun Microsystems, Mt. View, CA  (415) 960-7495
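[Editorial sketch, not from the thread, of the read(fd, &c, 1) burn
David mentions.  memcpy() stands in for the read() call so the example
is self-contained; the effect on c is the same.]

	#include <stdio.h>
	#include <string.h>

	int main()
	{
		int c = 0;
		char byte = 'A';	/* what read() would deliver */

		memcpy(&c, &byte, 1);	/* same effect as read(fd, &c, 1) */
		printf("c = %d, expected %d\n", c, 'A');
		/* On a little-endian machine the byte lands in the
		 * low-order end of the int and c == 65.  On a 32-bit
		 * big-endian machine it lands in the high-order byte
		 * and c == 65 << 24. */
		return 0;
	}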
mj@myrias.UUCP (Michal Jaegermann) (04/08/86)
>>I think the Arabs knew what they were doing; they set the >>notation so that the natural computational order followed the >>conventional lexical order. The European merchants missed the point >>and copied the notation verbatum instead of compensating for the >>opposite lexical convention. We have right now A. D. MCMLIIIVI. Somehow Romans scanned numbers also in a "wrong" order. :-) At least if you are writing from left-to-right. Michal Jaegermann Myrias Research Corporation Edmonton, Alberta ...ihnp4!alberta!myrias!mj
markb@sdcrdcf.UUCP (Mark Biggar) (04/08/86)
In article <7046@cca.UUCP> g-rh@cca.UUCP (Richard Harter) writes:
>However we have a second constraint, that comparison be valid whether
>a word is considered as a character string or as a number.

Where in the world did this constraint come from?  This has to be one
of the most blatant violations of the spirit of data abstraction and
information hiding I have ever seen.  You don't require this if the
types being considered are signed/unsigned or int/float, so why should
you if the types are char-string/int?

Mark Biggar
{allegra,burdvax,cbosgd,hplabs,ihnp4,akgua,sdcsvax}!sdcrdcf!markb
gwyn@brl-smoke.ARPA (Doug Gwyn ) (04/08/86)
In article <7046@cca.UUCP> g-rh@cca.UUCP (Richard Harter) writes:
> 	Well, no, little-endian came about because the engineers at DEC
>who designed the PDP-11 made an arbitrary decision that was not well
>thought out.  I will not essay to defend the sanity of DEC engineers,
>and cannot recommend that anyone else do so (:-)).  It was a bad
>decision.

One advantage of DEC's decision was that passing numeric quantities
by reference to procedures was simplified; the procedure could use
the lower bits of the parameter without worrying about the size of
the parameter.  This made code generation (e.g. Fortran) easier and
was a real programmer convenience (I certainly exploited it myself).
When the FP11 was designed, DEC screwed up and got the 4-byte (long)
integer format wrong (since fixed in the VAX-11, which was otherwise
most careful to use PDP-11 compatible data formats).  Some Fortrans
kept the bytes in little-endian order anyway, so as not to lose the
above-mentioned advantage, and reshuffled them during FP11 operations.

Many implementors of C on big-endian machines have had to cope with
the pointer conversion problem, which is simpler on little-endians.

> 	Consider the following problem.  You have an array of 4 byte
>integers.  If you sort the array numerically you get one result.  If
>you regard the bytes as characters and sort them lexicographically on
>a little endian machine you get a different result.  The reason is that
>the most significant byte occupies the eight least significant bits.
>Consistency of significance requires that the direction of significance
>be the same for both bytes and bits.

This is not a practical example.  If it were, you would have to
make the same argument for floating-point numbers, records from
a file, and so on.

>In short, little-endian was a mistake, is a mistake, and will continue
>to be a mistake.

About the only thing that is clear, apart from the better fit of
big-endian to block-oriented operations and of little-endian to
stream-oriented ones, is that this is a "religious issue".
yde@necis.UUCP (David Elins ext. 220) (04/09/86)
What is right. Whether a lexicographic sort be the equal of a numeric, or mayhaps the addresses of bytes within a word mimic their significance. 'Tis not ours to decide, we may only give our thanks that there be (so far) only two ways of attaching significance to these beasties. A good article on the "Byte Order" problem appeared in the April Fool's day issue of Computer Language Magazine. The author (Mike Higgins) took a very sensible stand on what is the "right" way to order bytes. That issue of CL was much fun. The article on Un*x is particularly good; it answered a lot of questions for me on why things are the way they are.
rbj@icst-cmr (Root Boy Jim) (04/09/86)
> The only reason you got -28672 for BIG instead of nulls is
> because your machine has backwards byte order.
>
>Sorry Bill, *you're* the one that's got backwards byte order.  Little
>Endian is `correct', even tho bucking historical convention.
>
>My reasoning is this: The original byte ordering was done the obvious
>way, Big Endian.  If this was so perfect, why would a sane man design
>anything Little Endian?  For compelling mathematical reasons!
>You wouldn't number your bits backwards (within a register) would you?
>Admittedly, some people do, but they must not know any better.
>
	Well, no, little-endian came about because the engineers at DEC
who designed the PDP-11 made an arbitrary decision that was not well
thought out.  I will not essay to defend the sanity of DEC engineers,
and cannot recommend that anyone else do so (:-)).  It was a bad
decision.

Like I said, `Compelling Mathematical Reasons'.  Suppose you were to
write an infinite precision multiply routine.  Each character
represents a decimal digit.  The routine is called as `mult(x,sx,y,sy)'.
The code becomes:

	extern char *calloc();		/* returns zero-filled storage */

	char *mult(x,sx,y,sy)		/* digits are 0-9, lsd first */
	register char *x,*y;
	{
		register char *p;
		register int j,k;

		p = calloc(sx + sy,1);
		for (j = 0; j < sx; j++)
			for (k = 0; k < sy; k++)
				if ((p[j + k] += x[j] * y[k]) >= 10)
					p[j + k + 1] += p[j + k] / 10,
					p[j + k] %= 10;
		return(p);
	}

If you did it Big Endian, you would have to go to the end of the
`string' first and do it backwards with all kinds of god-awful
subtraxion expressions.  There are other examples.

	Consider the following problem.  You have an array of 4 byte
integers.  If you sort the array numerically you get one result.  If
you regard the bytes as characters and sort them lexicographically on
a little endian machine you get a different result.  The reason is that
the most significant byte occupies the eight least significant bits.
Consistency of significance requires that the direction of significance
be the same for both bytes and bits.

No.  You sort character by character.  Character Strings are defined
as Big Endian.  Don't mix apples and oranges.

	Little-endian came about from the idea of making the lsb of a
word be bit 0.  From this it follows that byte 0 should be the lsbyte
of the word.  This is in natural analogy with the idea that word 0 is
the l.s. word.  The object is:

	bit  0 is the lowest address bit
	byte 0 is the lowest address byte
	word 0 is the lowest address word

You Got It!

This is the "all addressing flows in the same direction" constraint.
If that were the only constraint either big-endian or little-endian
would be acceptable, provided that it were followed thru consistently.

Perhaps now is a good time to address the issue of floating point.
Several machines, including the VAX, have the following formats:

		Byte 0		Bytes 1-3	Bytes 4-7
	Single:	sign+exp	3*mantissa
	Double:	sign+exp	3*mantissa	4*mantissa

Allow me to coin the term `Middle Endian', which means that values are
stored with the bits closest to the binary point in lower memory.  This
unifies the concept of floating points and integers.  Don't even
mention `BCD' or `Packed Decimal', which are more like strings than
numbers.

However we have a second constraint, that comparison be valid whether
a word is considered as a character string or as a number.  Given both
constraints, the correct addressing assignment is:

	bit 0 is the most significant bit

Full Shucking Bit!  This is a prime example of bogosity.  I have to
know how many bits you've got in a register (or word, or whatever) to
know what bit 13 is.  Bit n should represent `2 raised to the power n',
not `2 raised to the power (k - n)'.  That's what happens when
engineers design hardware instead of mathematicians or computer
scientists.

	byte 0 is the most significant byte.

In short, little-endian was a mistake, is a mistake, and will continue
to be a mistake.

By now, you have probably seen the posting on `type punning', which
allows inferior functions to manipulate larger call-by-reference
arguments as smaller quantities as long as they remain unsigned with
respect to the smaller quantity.  Why do you think DEC defined the
`Jump Low Bit' instruxion?  So they could use one test on any of the
LOGICAL*[124] variables.  And TRUE == 1.

	Richard Harter, SMDS Inc.

In short, DEC's decision *was* well thought out.  Like I said, if there
was no good reason to design Little Endian, it never would have been
designed.  Often times, the obvious (and usually first) idea is wrong.
Consider Origin One Indexing vs Origin Zero Indexing, for example.

P.S. Does anybody remember the `Big Indian' pinball machine?
     Does anybody remember pinball machines?

	(Root Boy) Jim Cottrell		<rbj@cmr>
	Baseball in D.C. in `87!
thomas@utah-gr.UUCP (Spencer W. Thomas) (04/09/86)
The best (and most tongue in cheek) discussion I have ever seen about this issue is a USC/ISI memo written by Danny Cohen, titled "On Holy Wars and a Plea for Peace". The whole thing is about 33000 bytes, so I won't post it, but I will give some excerpts. [Well, even my excerpts come to about 1/3 of the original.] IEN 137 Danny Cohen U S C/I S I 1 April 1980 ON HOLY WARS AND A PLEA FOR PEACE INTRODUCTION This is an attempt to stop a war. I hope it is not too late and that somehow, magically perhaps, peace will prevail again. The latecomers into the arena believe that the issue is: "What is the proper byte order in messages?". The root of the conflict lies much deeper than that. It is the question of which bit should travel first, the bit from the little end of the word, or the bit from the big end of the word? The followers of the former approach are called the Little-Endians, and the followers of the latter are called the Big-Endians. The details of the holy war between the Little-Endians and the Big-Endians are documented in [6] and described, in brief, in the Appendix. I recommend that you read it at this point. ... In a consistent order, the bit-order, the byte-order, the word-order, the page-order, and all the other higher level orders are all the same. Hence, when considering a serial bit-stream, along a communication line for example, the "chunk" size which the originator of that stream has in mind is not important. There are two possible consistent orders. One is starting with the narrow end of each word (aka "LSB") as the Little-Endians do, or starting with the wide end (aka "MSB") as their rivals, the Big-Endians, do. In this note we usually use the following sample numbers: a "word" is a 32-bit quantity and is designated by a "W", and a "byte" is an 8-bit quantity which is designated by a "C" (for "Character", not to be confused with "B" for "Bit)". MEMORY ORDER The first word in memory is designated as W0, by both regimes. Unfortunately, the harmony goes no further. The Little-Endians assign B0 to the LSB of the words and B31 is the MSB. The Big-Endians do just the opposite, B0 is the MSB and B31 is the LSB. By the way, if mathematicians had their way, every sequence would be numbered from ZERO up, not from ONE, as is traditionally done. If so, the first item would be called the "zeroth".... Since most computers are not built by mathematicians, it is no wonder that some computers designate bits from B1 to B32, in either the Little-Endians' or the Big-Endians' order. These people probably would like to number their words from W1 up, just to be consistent. ... On the other hand, the Little-Endians have their view, which is different but also self-consistent. They believe that one should start with the narrow end of every word, and that low addresses are of lower order than high addresses. Therefore they put their words on paper as if they were written in Hebrew, like this: ...|---word2---|---word1---|---word0---| When they add the bit order and the byte order they get: ...|---word2---|---word1---|---word0---| ....|C3,C2,C1,C0|C3,C2,C1,C0|C3,C2,C1,C0| .....|B31......B0|B31......B0|B31......B0| In this regime, when word W(n) is shifted right, its LSB moves into the MSB of word W(n-1). English text strings are stored in the same order, with the first character in C0 of W0, the next in C1 of W0, and so on. This order is very consistent with itself, with the Hebrew language, and (more importantly) with mathematics, because significance increases with increasing item numbers (address). 
It has the disadvantage that English character streams appear to be written backwards; this is only an aesthetic problem but, admittedly, it looks funny, especially to speakers of English. In order to avoid receiving strange comments about this orders the Little-Endians pretend that they are Chinese, and write the bytes, not right-to-left but top-to-bottom, like: C0: "J" C1: "O" C2: "H" C3: "N" ..etc.. ... (Discussion of PDP-11 Floating point unit leads into ...) However, due to some oversights in the security screening process, the Blefuscuians took over, again. They assigned, as they always do, the wide end to the LOWer addresses in memory, and the narrow to the HIGHer addresses. Let "xy" and "abcd" be 32- and 64-bit floating-point numbers, respectively. Let's look how these numbers are stored in memory: ddddddddL ccccccccc bbbbbbbbb SMaaaaaaa yyyyyyyyL SMxxxxxxx ....|--word5--|--word4--|--word3--|--word2--|--word1--|--word0--| .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-| ......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0| Well, Blefuscu scores many points for this. The above reference in [3] does not even try to camouflage it by any Chinese notation. Encouraged by this success, as minor as it is, the Blefuscuians tried to pull another fast one. This time it was on the VAX, the sacred machine which all the Little-Endians worship. Let's look at the VAX order. Again, we look at the way the above data (with xy being a 32-bit integer) is stored in memory: "N" "H" "O" "J" SMzzzzzzL SMxxxxxxx yyyyyyyyL ...ng2-------|-------long1-------|-------long0-------| ....|--word4--|--word3--|--word2--|--word1--|--word0--| .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-| ......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0| What a beautifully consistent Little-Endians' order this is !!! So, what about the infiltrators? Did they completely fail in carrying out their mission? Since the integer arithmetic was closely guarded they attacked the floating point and the double-floating which were already known to be easy prey. Let's look, again, at the way the above data is stored, except that now the 32-bit quantity xy is a floating point number: now this data is organized in memory in the following Blefuscuian way: "N" "H" "O" "J" SMzzzzzzL yyyyyyyyL SMxxxxxxx ...ng2-------|-------long1-------|-------long0-------| ....|--word4--|--word3--|--word2--|--word1--|--word0--| .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-| ......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0| Blefuscu scores again. The VAX is found guilty, however with the explanation that it tries to be compatible with the PDP11. Having found themselves there, the VAXians found a way around this unaesthetic appearance: the VAX literature (e.g., p. 10 of [4]) describes this order by using the Chinese top-to-bottom notation, rather than an embarrassing left-to-right or right-to-left one. This page is a marvel. One has to admire the skillful way in which some quantities are shown in columns 8-bit wide, some in 16 and other in 32, all in order to avoid the egg-on-the-face problem..... By the way, some engineering-type people complain about the "Chinese" (vertical) notation because usually the top (aka "up") of the diagrams corresponds to "low"-memory (low addresses). However, anyone who was brought up by computer scientists, rather than by botanists, knows that trees grow downward, having their roots at the top of the page and their leaves down below. 
Computer scientists seldom remember which way "up" really is (see 2.3 of [5], pp. 305-309). ... SUMMARY (of the Memory-Order section) To the best of my knowledge only the Big-Endians of Blefuscu have built systems with a consistent order which works across chunk-boundaries, registers, instructions and memories. I failed to find a Little-Endians' system which is totally consistent. ... (Discussion in similar vein of various transmission protocols) SUMMARY (of the Transmission-Order section) There are two camps each with its own language. These languages are as compatible with each other as any Semitic and Latin languages are. All Big-Endians can talk to each other with relative ease. So can all the Little-Endians, even though there are some differences among the dialects used by different tribes. There is no middle ground. Only one end can go first. CONCLUSION Each camp tries to convert the other. Like all the religious wars of the past, logic is not the decisive tool. Power is. This holy war is not the first one, and probably will not be the last one either. The "Be reasonable, do it my way" approach does not work. Neither does the Esperanto approach of "let's all switch to yet a new language". Our communication world may split according to the language used. A certain book (which is NOT mentioned in the references list) has an interesting story about a similar phenomenon, the Tower of Babel. Little-Endians are Little-Endians and Big-Endians are Big-Endians and never the twain shall meet. We would like to see some Gulliver standing up between the two islands, forcing a unified communication regime on all of us. I do hope that my way will be chosen, but I believe that, after all, which way is chosen does not make too much difference. It is more important to agree upon an order than which order is agreed upon. How about tossing a coin ??? ... For ease of reference please note that Lilliput and Little-Endians both start with an "L", and that both Blefuscu and Big-Endians start with a "B". This is handy while reading this note. R E F E R E N C E S [1] Bolt Beranek & Newman. Report No. 1822: Interface Message Processor. Technical Report, BB&N, May, 1978. [2] CCITT. Orange Book. Volume VIII.2: Public Data Networks. International Telecommunication Union, Geneva, 1977. [3] DEC. PDP11 04/05/10/35/40/45 processor handbook. Digital Equipment Corp., 1975. [4] DEC. VAX11 - Architecture Handbook. Digital Equipment Corp., 1979. [5] Knuth, D. E. The Art of Computer Programming. Volume I: Fundamental Algorithms. Addison-Wesley, 1968. [6] Swift, Jonathan. Gulliver's Travel. Unknown publisher, 1726. OTHER SLIGHTLY RELATED TOPICS (IF AT ALL) Who's on first? Zero or One ?? People start counting from the number ONE. The very word FIRST is abbreviated into the symbol "1st" which indicates ONE, but this is a very modern notation. The older notions do not necessarily support this relationship. In English and French - the word "first" is not derived from the word "one" but from an old word for "prince" (which means "foremost"). Similarly, the English word "second" is not derived from the number "two" but from an old word which means "to follow". Obviously there is an close relation between "third" and "three", "fourth" and "four" and so on. Similarly, in Hebrew, for example, the word "first" is derived from the word "head", meaning "the foremost", but not specifically No. 1. The Hebrew word for "second" is specifically derived from the word "two". The same for three, four and all the other numbers. ... 
SWIFT's POINT

It may be interesting to notice that the point which Jonathan Swift
tried to convey in Gulliver's Travels is exactly the opposite of the
point of this note.

Swift's point is that the difference between breaking the egg at the
little-end and breaking it at the big-end is trivial.  Therefore, he
suggests that everyone does it in his own preferred way.

We agree that the difference between sending eggs with the little- or
the big-end first is trivial, but we insist that everyone must do it
in the same way, to avoid anarchy.  Since the difference is trivial we
may choose either way, but a decision must be made.
-- 
=Spencer   ({ihnp4,decvax}!utah-cs!thomas, thomas@utah-cs.ARPA)
g-rh@cca.UUCP (Richard Harter) (04/10/86)
In article <> ggs@ulysses.UUCP (Griff Smith) writes:
>...
>> 	Well, no, little-endian came about because the engineers at DEC
>> who designed the PDP-11 made an arbitrary decision that was not well
>> thought out.  I will not essay to defend the sanity of DEC engineers,
>> and cannot recommend that anyone else do so (:-)).  It was a bad
>> decision.
>...
>> In short, little-endian was a mistake, is a mistake, and will continue
>> to be a mistake.
>>
>> 		Richard Harter, SMDS Inc.
>
>As an old PDP-11 hacker, I can't agree with the condemnation of the
>DEC engineering decision.  You are looking at it from the perspective
>of a modern software engineer who wouldn't think of type punning and
>other hacks.  To an assembly language programmer, however, the ability
>to use the same address to test the low and high bytes of a device
>status register meant that code would be shorter and faster.  It also
>increased the number of cases where indirect addressing could be used
>with register pointers.  You can't expect the engineers to have
>anticipated that high-level languages would discredit these practices.

Ah, but I too am an old PDP-11 hacker.  (In fact, my first DEC machine
was a PDP-1!)  I've done all those good things you talk about --
however you could do exactly the same things in a correctly designed
big-endian machine.  The issues at hand have nothing to do with modern
software engineering and high-level languages.  See below.

>My own theory about big vs. little end usage is that the mistake was
>made hundreds of years ago when merchants started to adopt the Arabic
>(as adapted from earlier Hindu sources) number system.  Note that
>Arabic is written right to left; note that numbers are written right
>to left.  I think the Arabs knew what they were doing; they set the
>notation so that the natural computational order followed the
>conventional lexical order.  The European merchants missed the point
>and copied the notation verbatim instead of compensating for the
>opposite lexical convention.
>
>In summary, big-endian was a mistake, but there is no use fighting it.
>Any better-informed historical challenges will be cheerfully accepted;
>the best data I could get was from an expatriate of Iran.

Being that we (you and I and a select few others) are all reasonable
beings, let us all eschew slogans and epithets (especially me) and
reason together.  Perhaps we can find some truth.  Let us see.

Discussions of little-endian vs big-endian are often muddied by two
collateral issues, the merits of coherent addressing, and the merits
of byte addressing.

Coherent addressing is a neologism I have invented for this discussion.
All it really means is that all addressing goes in the same direction.
Thus bit 0 of byte 0 of a word is bit 0 of the word, etc.  The diagram
below illustrates coherent addressing:

	Byte:	0 1 2 3 4 5 6 7 8 ....
	Int*2:	 0   1   2   3   4 ....
	Int*4:	   0       1       2 ....

Coherent addressing is clearly desirable.  However coherent addressing
has nothing to do, per se, with little-endian or big-endian.  A common
point of confusion in these arguments is to argue for the advantages
of coherent addressing in the belief that one is thereby arguing for
one's favorite position.

The PDP-11 uses uniform byte addressing.  The advantage of uniform
byte addressing is that the addresses are independent of the size of
the entity being addressed.  The disadvantage of uniform byte
addressing is that it consumes address bits -- one for shorts, two for
longs, and more if we extend the scheme to larger blocks.  This is not
critical in present architectures; it would be if we dropped to the
level of uniform bit addressing.  Again, the merits of little-endian
versus big-endian have nothing to do with the merits of uniform byte
addressing.

Big-endian versus little-endian only arises when we decide which bit
(byte) of a word is byte 0 -- the most significant byte or the least
significant byte.  Either choice will do for coherent addressing.
However the choice does affect two areas, arithmetic and comparison.

Let us consider the problems of doing arithmetic on integers of
indefinite size.  In that case the natural method is to represent
integers as polynomials in powers of two and do the arithmetic
starting with the lsb and work up (the point is that the algorithms
do not depend on knowing the location of the msb's of the operands.)
In short, little-endian is the correct choice for doing arithmetic on
integers of indefinite size.

For comparisons of strings of indefinite size, on the other hand, the
correct choice is big-endian.  The key point is that the natural
method for comparison is to first compare msb's and work down.  It
turns out, if one thinks about it, that the natural model for strings
is the binary fraction rather than the binary polynomial.

The advantages of little-endian probably show up in hardware design at
the microcode level even though the machine instructions for
arithmetic operate on fixed size operands.  If this is the case then
this might have been a design factor when the PDP-11 was first being
designed.  (Cheap small computers were a LOT slower in those days, and
every trick to gain speed and simplicity at the hardware level
counted.)  At the machine code level, however, all arithmetic
instructions are fixed length so little-endian/big-endian is, again,
irrelevant.

In general programming, indefinite length arithmetic is a rare animal.
Comparison of strings, either bits or characters, is ubiquitous.  And
it is here that big-endian is preferable.  In view of this fact I
conclude again, little-endian was a mistake, albeit less of one than I
made it out to be.

Some side issues:  We write left to right because most people are
right handed.  Script style matters here -- if we are drawing
characters the direction doesn't matter.  In choosing the
representation of numbers the issues are the same as they are for byte
ordering.  If size is the issue, big-endian has the advantage; if
arithmetic is the issue, little-endian has the advantage.

As a personal note, when I was young I drilled myself on fast
multiplication.  I could write down two numbers and then write down
their product underneath at hand writing speed.  When I was in
practice I could do products of five digit numbers readily.  The trick
isn't hard -- you just do cross multiplication.  However you have to
do it little-endian style, i.e. work from the low end up.

		Richard Harter, SMDS Inc.
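[Editorial sketch, not from the thread, of the indefinite-size
arithmetic point above: with digits stored least-significant-first,
adding numbers of different lengths needs no knowledge of where the
high ends are -- both arrays are walked from index 0 while the carry
propagates upward.  The function and variable names are illustrative.]

	#include <stdio.h>

	/* a has na digits, b has nb digits, least significant first;
	 * sum needs room for max(na,nb)+1 digits; returns digit count. */
	int add_le(const unsigned char *a, int na,
		   const unsigned char *b, int nb, unsigned char *sum)
	{
		int i, carry = 0, n = (na > nb) ? na : nb;

		for (i = 0; i < n; i++) {
			int d = carry + (i < na ? a[i] : 0) + (i < nb ? b[i] : 0);
			sum[i] = d % 10;
			carry = d / 10;
		}
		if (carry)
			sum[n++] = carry;
		return n;
	}

	int main()
	{
		unsigned char a[] = { 9, 9, 9 };	/* 999, low digit first */
		unsigned char b[] = { 5, 2 };		/* 25 */
		unsigned char s[5];
		int i, ns = add_le(a, 3, b, 2, s);

		for (i = ns - 1; i >= 0; i--)	/* print high digit first */
			putchar('0' + s[i]);
		putchar('\n');			/* prints 1024 */
		return 0;
	}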
greg@utcsri.UUCP (Gregory Smith) (04/10/86)
>> In short, little-endian was a mistake, is a mistake, and will continue
>In summary, big-endian was a mistake, but there is no use fighting it.

bleah... bleah... bleah...

I have only been reading this net.group since Jan86, but I can't
believe this issue hasn't been discussed.  My opinion: There ain't no
bad one and good one, just different ones...

Adding/subtracting multi-precision integers:
	Little-end is better because you start at the little end.

Comparing multi-prec. numbers:
	Big-end is better because you start at the big end.

Working with different-sized multi-prec numbers:
	Little-end is better because both numbers start at "1's".  In
	Big-End, the first bytes have different weights, so you can't
	even compare them directly.  LittleEnd also allows, say, an int
	to be copied to a char by reading the byte at the *same*
	address as the int.

Testing multi-prec ints for >=0/<0:
	Big-end wins.  Just test the first byte.

Testing multi-prec ints for odd/even:
	Little-end wins.  Just test the first byte. :-)

Working with character strings:
	Big-end is better because lexical comparison ( first char most
	significant ) is equivalent to numerical comparison of
	multi-prec ints.

Transmitting serial async data:
	Little-end (bitwise) is essential if 7-bit/8-bit formats are to
	be used, ( and abused :-)), since that way the LSB location is
	independent of the # of bits.  Only the 8th bit is in doubt.
	This also affects CRC calculation: if you write software to
	find a CRC for data which is to be serially transmitted,
	( async or otherwise ) you should know whether it is to be sent
	little-end or big-end, or burst error detection will suffer.
	I don't know which is easier or if there is a difference.

Human readability:
	Big-end is supposedly better `cause that's how we write'.  I
	feel that this is completely irrelevant and is rather like
	saying that all floating-point calculations should be done in
	ASCII.  Software to convert back and forth from ascii to binary
	is not affected much by the order.  And if God had wanted
	humans to spend a lot of time reading hex dumps, he would have
	given us 16 fingers :-).  So saying 'Well little-endian looks
	funny on a dump' makes no sense.

Bit numbering within bytes:
	Little-end wins, LSB = bit 0 means that bit n = 2^n regardless
	of word width.  This argument ( like the parallel one at the
	byte & word level ) doesn't carry much weight by itself -
	practical situations like I have given re adding and comparing
	are more significant.  However, numbering the LSB 0 is
	( almost ) universal.

Ahhh... say the big-endians, look at two consecutive bytes:

	bit #'s:	 7 6 5 4 3 2 1 0   7 6 5 4 3 2 1 0
			MSB           LSB MSB           LSB

	seen as an int:	............LSB   MSB............   (little-end int)
	bit #'s:	 7 6 5 4 3 2 1 0  15 14 13 12 11 10 9 8

The int bit #'s are out of sequence, says the big-endian.  That's
because the big-endian ( like the little-endian ) has written the
bytes down with the MSB on the left.  Just like a human.  If you write
them down the other way, the LSB of the int is on the left, and the
bottom line reads 0,1,2...14,15.  How nice.

Remember, in many computers, there is nothing joining the msb of one
byte to the lsb of the next, or vice-versa, except the byte order of
ints and longer types.  Some machines ( e.g. some of the 680XX? ) can
'peel' arbitrary strings of bits from memory.  This should be
consistent - i.e. peeling off 16 bits that happen to align with two
bytes should give the same result as just reading those two bytes as
an int.

I could come up with more, but it's late, and you don't want to read
them.  In fact, I'm a little-end fan, but only just - it doesn't
bother me a bit programming 6809 or 68K in assembler.  The point is,
whatever is done, it must be CONSISTENT.  Thus I don't like the
1,0,2,3 'hybrid' byte ordering of longs on a PDP11*.  And ( getting
obscure ) the BDS-C (CP/M 80 ) compiler implements longs in a big-end
fashion while ints are little-end - also nasty.

*not in the least bit [no pun] a result of the architecture of the
PDP11, BTW.

So please no more postings saying "X is completely stupid and I
wouldn't even be seen in the same building with a machine that orders
its bytes that way".  Considered arguments are, as always, welcome.

P.S. Someone said that 'Little-end is a result of the PDP-11'.  I
don't know, but I find that hard to believe... except of course in the
case of the VAX.  Maybe that's all that was meant.  Anybody know?  I
do know that the following micros are little-end: 8080,Z80,6502.
E-mail please.
-- 
"If you aren't making any mistakes, you aren't doing anything".
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
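[Editorial sketch, not from the thread, of Greg's "bit n = 2^n" point:
with the LSB numbered 0, the same expression tests bit n no matter how
wide the operand is; with MSB-first numbering you would also need to
know the width.]

	#include <stdio.h>

	#define BIT(x, n)	(((x) >> (n)) & 1)	/* LSB = bit 0 */

	int main()
	{
		unsigned char  c = 0x40;
		unsigned short s = 0x40;
		unsigned long  l = 0x40;

		/* All three print 1: bit 6 means "the 64's place"
		 * regardless of operand width. */
		printf("%d %d %d\n", BIT(c, 6), BIT(s, 6), BIT(l, 6));
		return 0;
	}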
ggs@ulysses.UUCP (Griff Smith) (04/10/86)
> >>I think the Arabs knew what they were doing; they set the
> >>notation so that the natural computational order followed the
> >>conventional lexical order.  The European merchants missed the point
> >>and copied the notation verbatim instead of compensating for the
> >>opposite lexical convention.
>
> We have right now A. D. MCMLIIIVI. Somehow Romans scanned numbers also
> in a "wrong" order. :-) At least if you are writing from left-to-right.
>
> Michal Jaegermann
> Myrias Research Corporation
> Edmonton, Alberta
> ...ihnp4!alberta!myrias!mj

Try doing arithmetic with roman numerals!  My impression is that the
Arabic notation was adopted (with much discussion about its sanity,
morality, etc) after merchants conceded that Roman numerals were too
clumsy for business computations.  Considering how the Romans blew the
notation, I wouldn't give them much credit for picking the right digit
order.

Given the entrenched big-endian order in most of the spoken languages,
the merchants may have hesitated to adjust the notation for fear that
it would be hard to speak the numbers (though German numbers up to 100
are little-endian!).  More likely, they just adopted the notation
verbatim without thinking much about it.
-- 
Griff Smith	AT&T (Bell Laboratories), Murray Hill
Phone:		(201) 582-7736
Internet:	ggs@ulysses.uucp
UUCP:		ulysses!ggs  ( {allegra|ihnp4}!ulysses!ggs )
henry@utzoo.UUCP (Henry Spencer) (04/10/86)
> 	Consider the following problem.  You have an array of 4 byte
> integers.  If you sort the array numerically you get one result.  If
> you regard the bytes as characters and sort them lexicographically on
> a little endian machine you get a different result.  The reason is that
> the most significant byte occupies the eight least significant bits.
> Consistency of significance requires that the direction of significance
> be the same for both bytes and bits.

If your machine has unsigned characters and two's-complement integers
with the sign bit in the high bit -- e.g. 360/370 -- then neither byte
order will make the two comparisons come out the same.  So quite apart
from all the other flaws in this argument, it just doesn't work.

Before you start defending signed characters, remember that they were
another arbitrary invention of the pdp11.  And a mistake, too.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry
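[Editorial sketch, not from the thread, of Henry's objection: with
two's-complement integers and unsigned byte comparison, a negative
number compares greater than a positive one whichever end comes first,
because its sign byte is 0x80 or above.]

	#include <stdio.h>
	#include <string.h>

	int main()
	{
		int neg = -1, pos = 1;

		/* memcmp treats bytes as unsigned; -1 is all 0xff
		 * bytes, so it compares greater than +1 on big- and
		 * little-endian machines alike. */
		printf("as bytes, -1 %s 1\n",
		       memcmp(&neg, &pos, sizeof neg) > 0 ? ">" : "<");
		return 0;
	}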
g-rh@cca.UUCP (Richard Harter) (04/11/86)
In article <> gwyn@brl.ARPA writes:
>
>One advantage of DEC's decision was that passing numeric quantities
>by reference to procedures was simplified; the procedure could use
>the lower bits of the parameter without worrying about the size of
>the parameter.  This made code generation (e.g. Fortran) easier and
>was a real programmer convenience (I certainly exploited it myself).
>

Point well taken.  I think that most of us will agree that it is
usually bad practice on the part of the programmers to have mismatched
types across procedure invocations.  However code generating programs
should exploit all of the tricks of the trade.

>
>Many implementors of C on big-endian machines have had to cope with
>the pointer conversion problem, which is simpler on little-endians.
>

Color me skeptical.  You may be right, but I would like to see more
details.  In the model of pointers that it occurs to me to implement,
big-endian and little-endian come out the same.

>
>> 	Consider the following problem.  You have an array of 4 byte
>>integers.  If you sort the array numerically you get one result.  If
>>you regard the bytes as characters and sort them lexicographically on
>>a little endian machine you get a different result.  The reason is that
>>the most significant byte occupies the eight least significant bits.
>>Consistency of significance requires that the direction of significance
>>be the same for both bytes and bits.
>
>This is not a practical example.  If it were, you would have to
>make the same argument for floating-point numbers, records from
>a file, and so on.
>

Grumble, grumble.  Sorry, this is a real practical problem.  Let me
give some context.  Suppose you are doing a high speed sort of
character strings.  Being a clever fellow you have read the glossed
over parts of the literature and have seen that a math sort is O(n)
rather than O(n log n).  You also notice that it is cheaper to compare
4 bytes at once with a 32-bit integer compare than to make 4 byte
compares.  On a BE machine you make the calculations directly; on a LE
machine you move the bytes into BE order and then do the compare.
Yes, Virginia, this is type punning, and all those other bad things
your CS professor told you were no-no's.  But it pays off because you
are exploiting the effective parallelism of integer address
calculation.

>>In short, little-endian was a mistake, is a mistake, and will continue
>>to be a mistake.
>
>About the only thing that is clear, apart from the better fit of
>big-endian to block-oriented operations and of little-endian to
>stream-oriented ones, is that this is a "religious issue".

Well, I won't exactly retract my words, but there is much to what you
say.  I will contend that it was a mistake to introduce a second
standard architecture where only one existed before unless the
advantages of the second standard are clear-cut.  And this is not the
case.  LE wins in some situations, BE wins in others.  But the
disadvantages of two standards to deal with outweigh the advantages of
either.

		Richard Harter, SMDS Inc.
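[Editorial sketch, not from the thread, of the trick Richard describes:
to compare four characters at a time on a little-endian machine,
repack them so the first character lands in the most significant byte;
one unsigned compare then does the work of four byte compares.  The
function name is illustrative.]

	#include <stdio.h>

	unsigned long pack4(const unsigned char *s)	/* big-endian pack */
	{
		return ((unsigned long)s[0] << 24) | ((unsigned long)s[1] << 16)
		     | ((unsigned long)s[2] <<  8) |  (unsigned long)s[3];
	}

	int main()
	{
		const unsigned char a[] = "abcd", b[] = "abda";

		printf("%s\n", pack4(a) < pack4(b) ? "abcd < abda"
						   : "abcd >= abda");
		return 0;
	}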
crowl@rochester.ARPA (Lawrence Crowl) (04/12/86)
Neither big-endian nor little-endian numbers can be sorted
lexicographically.  Since the only reason for attempting to mix
character string and number data is to save time, we must assume that
we will have an arbitrary mix of characters and numbers, including
different size numbers.

When sorted lexicographically, the numbers in little-endian format are
sorted based on the low order digit.  For example 32 > 41.  CLEARLY
WRONG.

When sorted lexicographically, the numbers in big-endian format are
sorted based on the first high order digit of each number.  This
presents a problem when numbers are of different sizes.  For example,
32 > 126.  CLEARLY WRONG.

In both cases, using two's complement sign representation causes
problems.  For example, both have -41 > 41.  CLEARLY WRONG.

The basic problem is that our definition of lexicographic sorting is
incompatible with our (natural) definition of numeric sorting.

CHALLENGE: Come up with a scheme for representing numbers, and a
sorting scheme in which numbers sort naturally.  Your scheme must deal
with variable length character strings and variable size numbers.
That is, you cannot require strings to be padded with nulls, or
numbers to be padded with zeros.  Note that many part numbering
schemes have intermixed letters and digits.
-- 
Lawrence Crowl		716-275-5766	University of Rochester
			Computer Science Department
...!{allegra,decvax,seismo}!rochester!crowl	Rochester, New York, 14627
buls@dataioDataio.UUCP (Rick Buls) (04/15/86)
In article <235@myrias.UUCP> mj@myrias.UUCP (Michal Jaegermann) writes:
>
>We have right now A. D. MCMLIIIVI. Somehow Romans scanned numbers also
                  syntax error ^^^
>in a "wrong" order. :-) At least if you are writing from left-to-right.
-- 
Rick Buls  (Data I/O; Redmond, Wa)
uw-beaver!entropy!dataio!buls
robison@uiucdcsb.CS.UIUC.EDU (04/16/86)
> CHALLENGE: Come up with a scheme for representing numbers, and a sorting
> scheme in which numbers sort naturally.  Your scheme must deal with variable
> length character strings and variable size numbers.  That is, you cannot
> require strings to be padded with nulls, or numbers to be padded with zeros.

	... -4 = AAAAZ   -3 = AAAZ   -2 = AAZ   -1 = AZ
	     0 = M        1 = Z       2 = ZZ     3 = ZZZ   4 = ZZZZ ...

Arch D. Robison
University of Illinois at Urbana-Champaign
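[Editorial sketch, not from the thread: a toy encoder for Robison's
unary scheme.  n > 0 becomes n 'Z's, n < 0 becomes -n 'A's followed by
a 'Z', and 0 is "M"; plain strcmp() order then matches numeric order
since 'A' < 'M' < 'Z' in ASCII.  The function name is illustrative.]

	#include <stdio.h>
	#include <string.h>

	void unary_encode(int n, char *out)	/* out must be large enough */
	{
		if (n == 0) {
			strcpy(out, "M");
		} else if (n > 0) {
			memset(out, 'Z', n);
			out[n] = '\0';
		} else {
			memset(out, 'A', -n);
			out[-n] = 'Z';
			out[-n + 1] = '\0';
		}
	}

	int main()
	{
		char a[16], b[16];

		unary_encode(-3, a);		/* "AAAZ" */
		unary_encode(2, b);		/* "ZZ"   */
		printf("%s %s %s\n", a, strcmp(a, b) < 0 ? "<" : ">", b);
		return 0;
	}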
thomas@kuling.UUCP (Thomas Hameenaho) (04/16/86)
In article <7046@cca.UUCP> g-rh@cca.UUCP (Richard Harter) writes:
> 	Consider the following problem.  You have an array of 4 byte
>integers.  If you sort the array numerically you get one result.  If
>you regard the bytes as characters and sort them lexicographically on
>a little endian machine you get a different result.  The reason is that
>the most significant byte occupies the eight least significant bits.
>Consistency of significance requires that the direction of significance
>be the same for both bytes and bits.
>
[Deleted text]
>In short, little-endian was a mistake, is a mistake, and will continue
>to be a mistake.
>
> 		Richard Harter, SMDS Inc.

I think big-endians were and are a mistake.  The only advantage with
them is that it's easy to read a hex dump!  I don't think people
usually compare text strings word-wise, so that is a pretty weak
argument.

What I find more significant is when you type-cast different types of
integers into other sized integers.  If for instance you have a long
and want to convert it to a short, you just have to read it as a short
on a little-endian, whereas on a big-endian you have to adjust the
address before you can read it.

I also think Motorola would have made the 68k a little-endian if they
were to design it from scratch knowing what they know today.  The
68020 is a pretty hefty processor but the dynamic bus sizing stinks.
It can dynamically (on a cycle by cycle basis) adjust to data busses
varying from 8 to 32 bits.  But as the processor always assumes it
will get a 32 bit word when it tries to read one, you'll have to put
the data on the top bits if you have a narrower bus, as it is the most
significant part that comes first!  If you give the processor 8 bits
you'll have to put it on D24-D31.  And for 16 bits on D16-D31.
-- 
Thomas Hameenaho, Dept. of Computer Science, Uppsala University, Sweden
Phone: +46 18 138650
UUCP: thomas@kuling.UUCP (...!{seismo,mcvax}!enea!kuling!thomas)
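[Editorial sketch, not from the thread, of the cast Thomas describes:
reading a long through a pointer to short.  This is exactly the
non-portable construct under discussion, shown only for its effect.]

	#include <stdio.h>

	int main()
	{
		long l = 0x1234;
		short s = *(short *)&l;	/* the cast under discussion */

		printf("%x\n", (unsigned)s);
		/* Prints 1234 on a little-endian machine (the short at
		 * the same address is the low-order part); prints 0 on
		 * a big-endian machine, where you would first have to
		 * adjust the address by sizeof(long) - sizeof(short). */
		return 0;
	}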
greg@utcsri.UUCP (Gregory Smith) (04/16/86)
In article <17162@rochester.ARPA> crowl@rochester.UUCP (Lawrence Crowl) writes:
>
>When sorted lexicographically, the numbers in little-endian format are
>sorted based on the low order digit.  For example 32 > 41.  CLEARLY WRONG.
>
>When sorted lexicographically, the numbers in big-endian format are sorted
>based on the first high order digit of each number.  This presents a problem
>when numbers are of different sizes.  For example, 32 > 126.  CLEARLY WRONG.
>incompatible with our (natural) definition of numeric sorting.
>
>CHALLENGE: Come up with a scheme for representing numbers, and a sorting
>scheme in which numbers sort naturally.  Your scheme must deal with variable
>length character strings and variable size numbers.  That is, you cannot
>require strings to be padded with nulls, or numbers to be padded with zeros.

How about prepending the digit count to big-endian digit strings?  So
32 and 126 become 232 and 3126, and a lexical comparison gives
3126 > 232.  Of course, leading zeroes in the significand cannot be
used.  If the strings are of different lengths ( and both positive )
then the longer one is always greater.

To support negative numbers, the length digit could be biased.  The
following convention supports signed strings of length 1-5:

	1  2  3  4  5	< length of number
	5  6  7  8  9	< prefix for +ve numbers
	4  3  2  1  0	< prefix for -ve numbers

The significand digits in a negative number would have to be 9's
complemented, so 32, 126, -37 and -8 are 632, 7126, 362 and 41.  There
are two 0's, 49 and 50.  ( This makes zero an exception to the
leading-zero rule ).  Note also that -99 becomes 300, and that 2999 is
illegal.  This is starting to resemble a floating-point scheme...

To allow longer strings, just allow a bigger 'length' field - more
digits, or a whole byte.

>Note that many part numbering schemes have intermixed letters and digits.

But what significance do letters have?  Do you want them to be ignored,
or to be assigned a dictionary order with the numbers?

P.S. this rather obscure scenario hardly seems reason to state that
little- and big-endians are both wrong.
-- 
"If you aren't making any mistakes, you aren't doing anything".
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
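[Editorial sketch, not from the thread: a rough encoder for Greg's
length-prefix idea, positive integers only.  Each number becomes
"<digit count><digits>", so strcmp() order matches numeric order as
long as the count stays below 10; the biased prefix for negative
numbers is left out.  Names and buffer sizes are illustrative.]

	#include <stdio.h>
	#include <string.h>

	void encode(unsigned n, char *out)	/* out needs ~16 bytes */
	{
		char digits[16];

		sprintf(digits, "%u", n);
		sprintf(out, "%d%s", (int)strlen(digits), digits);
	}

	int main()
	{
		char a[16], b[16];

		encode(32, a);		/* "232"  */
		encode(126, b);		/* "3126" */
		printf("%s %s %s\n", a, strcmp(a, b) < 0 ? "<" : ">", b);
		return 0;
	}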
friesen@psivax.UUCP (Stanley Friesen) (04/16/86)
In article <7158@cca.UUCP> g-rh@cca.UUCP (Richard Harter) writes:
>>
>>Many implementors of C on big-endian machines have had to cope with
>>the pointer conversion problem, which is simpler on little-endians.
>>
>	Color me skeptical.  You may be right, but I would like to
>	see more details.  In the model of pointers that it occurs
>	to me to implement, big-endian and little-endian come out
>	the same.

Could you expand on this!  Do you mean that if you cast a pointer to a
long to a pointer to a short and dereference you will get the *high*
order portion on a big-endian machine and the *low* order portion on a
little-endian?  Clearly a portability problem, and the big-endian
behavior is counter-intuitive.  Or do you intend that the pointer
always point to the low-order byte even on a big-endian machine?  Then
you have to index *backwards* to break the item up into bytes!
Really, the only way to get the right semantics on a big-endian
machine is to actually convert pointers when a cast is used.
-- 
Sarima (Stanley Friesen)
UUCP: {ttidca|ihnp4|sdcrdcf|quad1|nrcvax|bellcore|logico}!psivax!friesen
ARPA: ttidca!psivax!friesen@rand-unix.arpa
crowl@rochester.ARPA (Lawrence Crowl) (04/17/86)
>I wrote:
>> CHALLENGE: Come up with a scheme for representing numbers, and a sorting
>> scheme in which numbers sort naturally.  Your scheme must deal with variable
>> length character strings and variable size numbers.  That is, you cannot
>> require strings to be padded with nulls, or numbers to be padded with zeros.

Arch D. Robison wrote:
>... -4 = AAAAZ   -3 = AAAZ   -2 = AAZ   -1 = AZ
>     0 = M        1 = Z       2 = ZZ     3 = ZZZ   4 = ZZZZ ...
>

An interesting solution.  I totally failed to consider unary notations.
Just to keep the answers honest, I will add the requirement that some
efficient form of representation be used.  Radix notation is efficient,
but you are not required to use it.

For extra credit, extend your notation to deal with non-integral
values.  This is easy under current notation if the integral part is
the same.
-- 
Lawrence Crowl		716-275-5766	University of Rochester
			Computer Science Department
...!{allegra,decvax,seismo}!rochester!crowl	Rochester, New York, 14627
jjhnsn@ut-ngp.UUCP (James Johnson) (04/17/86)
> Consider the following problem. You have an array of 4 byte > integers. If you sort the array numerically you get one result. If > you regard the bytes as characters and sort them lexicographically on > a little endian machine you get a different result. WOW! You mean big endian machines are optimized to sort lists of four-letter-words ? :-) -- James Lee Johnson, UTexas Computation Center, Austin, Texas 78712 ARPA: jjhnsn@ngp.cc.utexas.edu jjhnsn@ut-ngp.arpa UUCP: ihnp4!ut-ngp!jjhnsn allegra!ut-ngp!jjhnsn gatech!ut-ngp!jjhnsn seismo!ut-sally!jjhnsn harvard!ut-sally!jjhnsn
greg@utcsri.UUCP (Gregory Smith) (04/20/86)
In article <1104@psivax.UUCP> friesen@psivax.UUCP (Stanley Friesen) writes:
>	Could you expand on this!  Do you mean that if you cast a
>pointer to a long to a pointer to a short and dereference you will get
>the *high* order portion on a big-endian machine and the *low* order
>portion on a little-endian?  Clearly a portability problem, and the
>big-endian behavior is counter-intuitive.  Or do you intend that the
>pointer always point to the low-order byte even on a big-endian
>machine?  Then you have to index *backwards* to break the item up into
>bytes!  Really, the only way to get the right semantics on a big-endian
>machine is to actually convert pointers when a cast is used.

I strongly disagree.  If you have long *x, then (char)*x ( as opposed
to *(char*)x ) is the low-order byte of the pointed-to long and is
portable.  If you also have char c, then c = *x is also portable.  One
would hope that these would be recognized by the code generator on a
big-endian machine, so that only the single byte would be read.
Besides, your solution would get really weird on things like the PDP
where lo bytes come first in words but lo words are second in longs
( admittedly a silly setup ).

Pointer conversions between 'pointer-to-x' and 'pointer-to-y' should be
no-ops whenever possible, since things like (struct foo *)malloc(...)
are done so often.  If the above scheme were to be used, then promoting,
say, the (char *) from malloc to a (long *) would require masking off
the lower 2 ( or whatever ) bits of the address.  On a 68K, e.g., the
lower bit would have to be cleared when a (char *) was promoted, and
there is no provision in the instruction set to do this directly to an
address register, which is where a pointer would most likely be.  Of
course, on some machines pointers have 'tag' bits which have to be
modified so the new pointer can be dereferenced properly - so a no-op
pointer conversion can't be done regardless of big/little-end.

The following portably strips a long into bytes:

	int i;
	unsigned char bytes[ sizeof( long ) ];	/* lo-byte first */
	long input;

	for( i = 0; i < sizeof(long); ++i )
		bytes[i] = input >> (i<<3);

Nit-pickers can substitute i*CHARBITS for i<<3.  Sure it's slow, but
it is portable, and the problem is one that tends to defy portability
more than others.  Besides, it could be sped up easily at the expense
of clarity ( that's what clarity is for :-) ).

If you really want speed, put the following in your 'config.h':

    On 68K:              On Vax:              On PDP11:

    union wombat{        union wombat{        union wombat{
        long longish;        long longish;        long longish;
        struct chs{          struct chs{          struct chs{
            char ch3,ch2;        char ch0,ch1;        char ch2,ch3;
            char ch1,ch0;        char ch2,ch3;        char ch0,ch1;
        }charish;            }charish;            }charish;
    };                   };                   };

If you declare 'union wombat x' then x.longish is a long and
x.charish.ch3 is the high-order byte, and is fast.  Given long *lp,
you could do ((struct chs*)lp)->ch2 which would portably refer to the
second-most-significant byte.  Not as general, but it *is* fast.

In summary, I don't feel that the portability problem you talk about
will rear its nasty head very often, and it can be dealt with using
the tools provided, just as other portability problems can be.  The
trouble you have to go to depends on (1) how portable you want it to
be (2) how fast you want it to be (3) how many intrinsically
hard-to-port things you want to do.  Stripping specific bytes out of a
long by addressing tricks *is* intrinsically non-portable and is not a
very common thing, anyway.
-- 
"If you aren't making any mistakes, you aren't doing anything".
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
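[Editorial sketch, not from the thread: one way to use Greg's union
wombat, taking the Vax (little-endian) variant from his config.h
fragment and assuming a 4-byte long as in his examples.  On a 68K or
PDP11 you would compile the matching variant and the field names keep
their meaning.]

	#include <stdio.h>

	union wombat {				/* Vax layout */
		long longish;
		struct chs { char ch0, ch1, ch2, ch3; } charish;
	};

	int main()
	{
		union wombat x;

		x.longish = 0x12345678L;
		printf("high byte %x, low byte %x\n",
		       (unsigned char)x.charish.ch3,
		       (unsigned char)x.charish.ch0);	/* 12, 78 */
		return 0;
	}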
guido@boring.UUCP (04/20/86)
In article <1104@psivax.UUCP> friesen@psivax.UUCP (Stanley Friesen) writes:
>	Could you expand on this!  Do you mean that if you cast a
>pointer to a long to a pointer to a short and dereference you will get
>the *high* order portion on a big-endian machine and the *low* order
>portion on a little-endian?  Clearly a portability problem, and the
>big-endian behavior is counter-intuitive.

You're right.  Casting "pointer to long" to "pointer to short" in C
has a portability problem.  This can't be solved by outlawing
big-endian architectures, nor by 'fixing' the C compilers.
Non-portable casts have an effect that is obvious or at least
understandable when you know the underlying hardware architecture of a
given implementation; never mind intuitivity, and you correctly
describe the effect of this particular example.

If you want to write portable C, don't use non-portable constructs.
If you want to flame big-endianness, don't use C portability as an
argument.

You're right about the counter-intuitivity of the effect of
dereferencing a wrong-size pointer in assembly language, and (as has
been said numerous times by now) this was one of the reasons for its
use in the PDP-11.
-- 
	Guido van Rossum, CWI, Amsterdam <guido@mcvax.UUCP>
crowl@rochester.ARPA (Lawrence Crowl) (04/21/86)
In article <2568@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
>In article <17162@rochester.ARPA> crowl@rochester.UUCP (Lawrence Crowl) writes:
>>CHALLENGE: Come up with a scheme for representing numbers, and a sorting
>>scheme in which numbers sort naturally.  Your scheme must deal with variable
>>length character strings and variable size numbers.  That is, you cannot
>>require strings to be padded with nulls, or numbers to be padded with zeros.
>How about prepending the digit count to big-endian digit strings?
>So 32 and 126 become 232 and 3126, and a lexical comparison gives
>3126 > 232.  Of course, leading zeroes in the significand cannot be used.
...

The problem with this scheme is that it assumes a constant size
"length" field.  This is the best that I have yet seen though.

>>Note that many part numbering schemes have intermixed letters and digits.
>But what significance do letters have?  Do you want them to be ignored,
>or to be assigned a dictionary order with the numbers?

This is part of the problem!

>P.S. this rather obscure scenario hardly seems reason to state that
>little- and big-endians are both wrong.

Sort the following list of "words":

	302, 3B2, -42, 132, 26, AD, ab, 74LS00

--The damn poster program wants more lines.
--It says "more included lines than new next".
--I've said all I want to say.
--You want more lines?
--Take this!
--Feed it to your line eater!
--Burp!
-- 
Lawrence Crowl		716-275-5766	University of Rochester
			Computer Science Department
...!{allegra,decvax,seismo}!rochester!crowl	Rochester, New York, 14627
friesen@psivax.UUCP (04/21/86)
In article <2590@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
>
>I strongly disagree.  If you have long *x, then (char)*x ( as opposed to
>*(char*)x ) is the low-order byte of the pointed-to long and is
>portable.

	What I was trying to say is that *both* should be portable and
equivalent.  Actually, it is when going the other way, from a pointer to
a small integer to a pointer to a large integer, that you *have* to use
the *(long *)x form; otherwise you will not get the whole thing.

>Besides, your solution would get really weird on things like the PDP
>where lo bytes come first in words but lo words are second in longs (
>admittedly a silly setup ).

	What solution?  I was only talking about the *semantics* of a
cast.  I was saying that pointer conversions should result in pointers
to rational objects, not weird things like the high-order fragment of a
larger entity.  This is a statement about desired behavior, *not*
implementation.

>
>Pointer conversions between 'pointer-to-x' and 'pointer-to-y' should be
>no-ops whenever possible, since things like (struct foo *)malloc(...)
>are done so often.  If the above scheme were to be used, then promoting,
>say, the (char *) from malloc to a (long *) would require masking off
>the lower 2 ( or whatever ) bits of the address.

	I agree, pointer conversions *should* be no-ops.  The article I
was responding to had stated that big-endian and little-endian pointer
conversions could *both* be no-ops, and I was asking HOW?  As far as I
can see, in order to get rational, intuitive *semantics* on a big-endian
machine, pointer conversions *cannot* be no-ops.  That is all I was
saying.  So again, on a big-endian machine, how do you get the desired
*semantics* and still leave pointer conversions as no-ops?

>
>The following portably strips a long into bytes:
>
>	int i;
>	unsigned char bytes[ sizeof( long )];	/* lo-byte first */
>	long input;
>	for(i=0;i<sizeof(long); ++i)
>		bytes[i] = input >> (i<<3);
>

	And I am saying that the following *should* be portable, and
that any implementation that it doesn't work on is brain-damaged.

	register int i;
	unsigned char bytes[ sizeof( long )];	/* lo-byte first */
	long input;
	register char *cvptr;

	for(cvptr = (char *)&input, i = 0; i < sizeof(long); i++)
		bytes[i] = cvptr[i];
-- 
Sarima (Stanley Friesen)
UUCP: {ttidca|ihnp4|sdcrdcf|quad1|nrcvax|bellcore|logico}!psivax!friesen
ARPA: ttidca!psivax!friesen@rand-unix.arpa
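[The disagreement is easier to see with a stand-alone version of the loop
above; the 0x01020304 test value is an invention and the comments assume a
32-bit long.  As compilers actually behave, the loop copies bytes in
machine storage order, so bytes[0] is 0x04 on a VAX-style machine and 0x01
on a 68000-style machine; the position being argued is that a cast *ought*
to guarantee the low-byte-first result everywhere.]

	#include <stdio.h>

	int main(void)
	{
	    long input = 0x01020304L;       /* comments assume a 32-bit long */
	    unsigned char bytes[sizeof(long)];
	    char *cvptr;
	    int i;

	    for (cvptr = (char *)&input, i = 0; i < (int)sizeof(long); i++)
	        bytes[i] = cvptr[i];        /* copies in whatever order the machine stores them */

	    for (i = 0; i < (int)sizeof(long); i++)
	        printf("bytes[%d] = %02x\n", i, bytes[i]);
	    /* little-endian: 04 03 02 01    big-endian: 01 02 03 04 */
	    return 0;
	}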
franka@mmintl.UUCP (Frank Adams) (04/22/86)
In article <17312@rochester.ARPA> crowl@rochester.UUCP (Lawrence Crowl) writes:
>>I wrote:
>>>CHALLENGE: Come up with a scheme for representing numbers, and a sorting
>>>scheme in which numbers sort naturally.  Your scheme must deal with variable
>>>length character strings and variable size numbers.  That is, you cannot
>>>require strings to be padded with nulls, or numbers to be padded with zeros.
>
>Arch D. Robison wrote:
>>...  -4 = AAAAZ  -3 = AAAZ  -2 = AAZ  -1 = AZ
>>	0 = M	1 = Z	2 = ZZ	3 = ZZZ	4 = ZZZZ ...
>>
>
>An interesting solution.  I totally failed to consider unary notations.  Just
>to keep the answers honest, I will add the requirement that some efficient
>form of representation be used.  Radix notation is efficient, but you are not
>required to use it.  For extra credit, extend your notation to deal with
>non-integral values.  This is easy under current notation if the integral
>part is the same.

Basically, you use a unary notation (pure or modified) to indicate how
long the number is, and the number follows.  Using the scheme above, 257
would be ZZZ257.  One and a half would be Z15.  This is essentially
floating point notation.

This has to be modified a bit to deal with negative numbers.  One
possibility is to allocate separate "exponent" spaces for positive and
negative numbers.  E.g., for positive numbers, the exponents are:

	... -4 = NNNNT  -3 = NNNT  -2 = NNT  -1 = NT
	     0 = T   1 = ZT   2 = ZZT   3 = ZZZT   4 = ZZZZT

For negative numbers, the exponents are:

	... -4 = MMMMG  -3 = MMMG  -2 = MMG  -1 = MG
	     0 = G   1 = AG   2 = AAG   3 = AAAG   4 = AAAAG

Also, for negative numbers, the digits should be in (9's) complement
form.  Thus -1 would be AG8.  A special notation, e.g. NA, is required
for zero.

This can of course be done somewhat better.  A higher radix can be used,
and the exponent encoding can be further optimized.  But the point is to
describe the approach, not to work out all the details.

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108
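[A sketch of the positive-number half of the scheme just described,
restricted to integers; the helper names and test values are inventions.
The exponent is the digit count, written as that many Z's closed by a T,
so a shorter Z-run sorts below a longer one and lexical order of the
encoded strings matches numeric order.]

	#include <stdio.h>
	#include <string.h>
	#include <stdlib.h>

	/* 257 -> "ZZZT257", 5 -> "ZT5"; negatives and fractions are left out here */
	static void encode(unsigned long n, char *out)
	{
	    char digits[32];
	    size_t k, i;
	    sprintf(digits, "%lu", n);
	    k = strlen(digits);
	    for (i = 0; i < k; i++)
	        out[i] = 'Z';
	    out[k] = 'T';
	    strcpy(out + k + 1, digits);
	}

	static int cmp(const void *a, const void *b)
	{
	    return strcmp((const char *)a, (const char *)b);
	}

	int main(void)
	{
	    unsigned long nums[] = { 257, 5, 99, 1000, 300 };
	    char keys[5][64];
	    int i;

	    for (i = 0; i < 5; i++)
	        encode(nums[i], keys[i]);
	    qsort(keys, 5, sizeof(keys[0]), cmp);
	    for (i = 0; i < 5; i++)
	        printf("%s\n", keys[i]);    /* ZT5  ZZT99  ZZZT257  ZZZT300  ZZZZT1000 */
	    return 0;
	}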
g-rh@cca.UUCP (Richard Harter) (04/24/86)
>In article <17162@rochester.ARPA> crowl@rochester.UUCP (Lawrence Crowl) writes:
>
>CHALLENGE: Come up with a scheme for representing numbers, and a sorting
>scheme in which numbers sort naturally.  Your scheme must deal with variable
>length character strings and variable size numbers.  That is, you cannot
>require strings to be padded with nulls, or numbers to be padded with zeros.

Presented without endorsement:

Addressing nomenclature -- big endian (i.e. msb is bit 0, etc.)

Item is divided into five parts:

Part I   - Sign bit (0 is negative, 1 is positive).
Part II  - Size of field size in unary, i.e. a string of 1's terminated
           by a 0.
Part III - Field size, in binary.  (Field size is the size of the field
           to the left of the implicit decimal point.)
Part IV  - Portion of the string to the left of the implied decimal
           point, encoded in bytes.
Part V   - Portion of the string to the right of the implied decimal
           point, encoded in ternary, using two bits per ternary digit,
           as follows:

		00	Sentinel, terminating right field
		01	ternary 0
		10	ternary 1
		11	ternary 2

All numbers are represented as ascii characters.  The point of the
ternary representation (or something equivalent) is that it provides a
sentinel for variable length strings that sorts lexicographically.  If
an integer is to be considered as a number and compared in that fashion
it goes in the left field -- if it is to be considered as a character
string it goes in the right field.

I believe that this representation meets the challenge.  I sincerely
hope that I never have to work with it.

	Richard Harter, SMDS Inc.
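[The load-bearing trick is Part V, so here is a sketch of just that piece,
under one reading of the spec: each ternary digit is stored as a two-bit
code one greater than its value, and the all-zero pair is the terminating
sentinel.  Because the sentinel is the smallest code, a field sorts before
any extension of itself under a plain byte comparison.  The pack3() helper
and the two test strings are inventions for illustration.]

	#include <stdio.h>
	#include <string.h>

	/* Pack a string of ternary digits ('0'..'2') two bits per digit,
	 * code = digit + 1, so the 00 pair is free to act as the sentinel.
	 * Unused pairs in the buffer stay 00 and sort below every data code. */
	static void pack3(const char *trits, unsigned char *out, size_t outsize)
	{
	    size_t i;
	    memset(out, 0, outsize);                             /* all-sentinel to start */
	    for (i = 0; trits[i] != '\0' && i / 4 < outsize; i++) {
	        unsigned code = (unsigned)(trits[i] - '0') + 1;  /* ternary 0,1,2 -> 1,2,3 */
	        out[i / 4] |= code << (6 - 2 * (i % 4));         /* high pair first */
	    }
	}

	int main(void)
	{
	    unsigned char a[8], b[8];
	    pack3("2101", a, sizeof a);     /* a field ...           */
	    pack3("21",   b, sizeof b);     /* ... versus its prefix */

	    printf("%s\n", memcmp(b, a, sizeof a) < 0 ? "prefix sorts first" : "oops");
	    return 0;
	}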
rose@think (04/26/86)
In article <1117@psivax.UUCP> friesen@psivax.UUCP (Stanley Friesen) writes:
>In article <2590@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
>>
>>I strongly disagree.  If you have long *x, then (char)*x ( as opposed to
>>*(char*)x ) is the low-order byte of the pointed-to long and is
>>portable.
>
>	What I was trying to say is that *both* should be portable and
>equivalent.
>.....
>	And I am saying that the following *should* be portable, and
>that any implementation that it doesn't work on is brain-damaged.
>
>	register int i;
>	unsigned char bytes[ sizeof( long )];	/* lo-byte first */
>	long input;
>	register char *cvptr;
>
>	for(cvptr = (char *)&input, i = 0; i < sizeof(long); i++)
>		bytes[i] = cvptr[i];
>
>	Sarima (Stanley Friesen)

Well, this is certainly false for all existing C's on big-endian
machines.  Even if you make the pointer "(char *)&input" point to the
last byte of the datum, when you index on it, say by referencing
"cvptr[1]", you get bytes *past* the last byte of the long datum.

But the funny thing is, a C compiler *could* be designed to make
Sarima's code work on a big-endian machine.  Pointer ordering and
arithmetic could be defined, in a self-consistent way, *backwards* from
the machine-word ordering.  Arrays and structs would be allocated
backward in memory.  (As a bonus, conversion between pointers and ints
could involve a negation, but portable C makes no such requirements.)
In essence, the abstract addressing structure imposed by this
hypothetical compiler would turn our misguided big-endian machine into
a virtual little-endian.

This hack fails for the waverers, such as PDP-11's.  (However, you could
complement the 1s bit before and after index arithmetic?)  Be wrong, but
be consistent!

Disclaimers:  This was not a serious suggestion to the compiler
designers of the world.  It certainly has nothing to do with the
official policy of Thinking Machines Corporation.  :-) :-) :-)

[P.S. Sarima, if this was your meaning all along, sorry to steal your
thunder; I must have missed an article.]

[P.P.S. Local work crunch precludes retro-flame on &array responses.
But stay tuned, netters.]
-- 
----------------------------------------------------------
John R. Rose		  Thinking Machines Corporation
245 First St., Cambridge, MA  02142  (617) 876-1111 X270
rose@think.arpa		  ihnp4!think!rose
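[The hypothetical compiler can be imitated, crudely, in source: leave
pointer arithmetic alone and hide byte access to a long behind an index
transform that runs backwards on big-endian hosts.  The BIG_ENDIAN_HOST
macro and the lo_byte() helper are inventions, and something (a config.h,
say) would have to define the macro per machine, which is exactly the
portability cost under discussion.]

	#include <stdio.h>

	/* #define BIG_ENDIAN_HOST */   /* would be set per machine, e.g. in config.h */

	/* byte i of *lp counting from the least significant end (i = 0 is the lo byte) */
	static unsigned char lo_byte(const long *lp, unsigned i)
	{
	    const unsigned char *p = (const unsigned char *)lp;
	#ifdef BIG_ENDIAN_HOST
	    return p[sizeof(long) - 1 - i];     /* reverse the index, not the pointer */
	#else
	    return p[i];
	#endif
	}

	int main(void)
	{
	    long x = 0x01020304L;
	    unsigned i;

	    for (i = 0; i < sizeof(long); i++)
	        printf("%02x ", lo_byte(&x, i));    /* 04 03 02 01 ... on either kind of machine */
	    printf("\n");
	    return 0;
	}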
crowl@rochester (04/26/86)
In article <1298@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
]In article <17312@rochester.ARPA> crowl@rochester.UUCP (Lawrence Crowl) writes:
]]CHALLENGE: Come up with a scheme for representing numbers, and a sorting
]]scheme in which numbers sort naturally.  Your scheme must deal with variable
]]length character strings and variable size numbers.  That is, you cannot
]]require strings to be padded with nulls, or numbers to be padded with zeros.
]
]Basically, you use a unary notation (pure or modified) to indicate how long
]the number is, and the number follows.  Using the scheme above, 257 would
]be ZZZ257.  One and a half would be Z15.  This is essentially floating point
]notation.  ...  This has to be modified a bit to deal with negative numbers.
]...
]Also, for negative numbers, the digits should be in (9's) complement form.
]...
]This can of course be done somewhat better.  A higher radix can be used, and
]the exponent encoding can be further optimized.  But the point is to describe
]the approach, not to work out all the details.

This seems good to me.  If there are no radical alternatives, let's
close the issue.
-- 
Lawrence Crowl		716-275-5766	University of Rochester
			Computer Science Department
...!{allegra,decvax,seismo}!rochester!crowl	Rochester, New York, 14627
greg@utcsri (04/26/86)
In article <1117@psivax.UUCP> friesen@psivax.UUCP (Stanley Friesen) writes:
>In article <2590@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
>>...
>	And I am saying that the following *should* be portable, and
>that any implementation that it doesn't work on is brain-damaged.
>
>	register int i;
>	unsigned char bytes[ sizeof( long )];	/* lo-byte first */
>	long input;
>	register char *cvptr;
>
>	for(cvptr = (char *)&input, i = 0; i < sizeof(long); i++)
>		bytes[i] = cvptr[i];
>

Sorry - I thought you were suggesting that big-endian pointer
conversions should set the lower bits (when pointing to a smaller
object) and clear them (when pointing to a larger one).  What you are
actually saying is that any implementation which is not strictly
little-endian is 'brain-damaged' (including (a) PDP11 _longs_,
(b) 68000).  I will agree that (a) is brain-damaged, but I think the
adjective is a little strong for (b).  I am a little-end fan myself, but
I can face reality, and the reality is there will be big-endian machines
around that are still worth using, and that the 'problem' can be dealt
with so easily that one needn't damn the 68K to eternal flames on this
one point.
-- 
"For every action there is an equal and opposite malfunction"
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
rbj@icst-cmr (04/30/86)
> In article <1117@psivax.UUCP> friesen@psivax.UUCP (Stanley Friesen) writes:
> >In article <2590@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
> >>...
> >	And I am saying that the following *should* be portable, and
> >that any implementation that it doesn't work on is brain-damaged.
> >
> >	register int i;
> >	unsigned char bytes[ sizeof( long )];	/* lo-byte first */
> >	long input;
> >	register char *cvptr;
> >
> >	for(cvptr = (char *)&input, i = 0; i < sizeof(long); i++)
> >		bytes[i] = cvptr[i];
> >
> Sorry - I thought you were suggesting that big-endian pointer conversions
> should set the lower bits (when pointing to a smaller object) and clear
> them (when pointing to a larger one).  What you are actually saying is
> that any implementation which is not strictly little-endian is 'brain-
> damaged' (including (a) PDP11 _longs_, (b) 68000).  I will agree that (a)
> is brain-damaged, but I think the adjective is a little strong for (b).
> I am a little-end fan myself, but I can face reality, and the reality is
> there will be big-endian machines around that are still worth using, and
> that the 'problem' can be dealt with so easily that one needn't damn
> the 68K to eternal flames on this one point.
>
> "For every action there is an equal and opposite malfunction"
> ----------------------------------------------------------------------
> Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg

To all you Big Endian fans out there:  What's your problem?  You guys
are the ones saying that the first byte is the most significant, so why
are y'all bitchin that you don't get the low byte like on *sane*
machines?

Axually, I like the 680[012]0 also.  Byte order doesn't really bother me
either way except as an academic issue.

I don't see the need or the possibility for your example to be both
portable and work the way you described.  The two are at odds.
Remember, C does what you tell it to do, not what you think it should
do.

	(Root Boy) Jim Cottrell		<rbj@cmr>
	"One man gathers what another man spills"
jsdy@hadron.UUCP (Joseph S. D. Yao) (05/08/86)
In article <5056@think.ARPA> rose@think.UUCP (John Rose) writes:
>In article <1117@psivax.UUCP> friesen@psivax.UUCP (Stanley Friesen) writes:
>>	And I am saying that the following *should* be portable, and
>>that any implementation that it doesn't work on is brain-damaged.
>>	register int i;
>>	unsigned char bytes[ sizeof( long )];	/* lo-byte first */
>>	long input;
>>	register char *cvptr;
>>	for(cvptr = (char *)&input, i = 0; i < sizeof(long); i++)
>>		bytes[i] = cvptr[i];
>Well, this is certainly false for all existing C's on big-endian machines.
>Even if you make the pointer "(char *)&input" point to the last byte of
>the datum, when you index on it, say by referencing "cvptr[1]", you get
>bytes *past* the last byte of the long datum.

I don't see rose's point.  Attend:

	0:	0123		3210
		big-endian	little-endian

In both cases, if &input == 0, and (cvptr = (char *) &input) == 0, then
cvptr[0-3] will map in the same order to bytes[0-3]; and the copy
succeeds.  Perhaps he is thinking that (char *) &input == 3, and
(short int *) &input == 2?  This is exactly the behaviour that
big-endians do *not* do, and which little-endian PDP-11 and VAX-11
programmers despair of because they can't perform type punning.

Incidentally, don't call an 11 a PDP, because there are radically
different architectures on the PDP-1,-4,-5,-6,-7,-8,-9,-10,-12,-15,
-16,-20,...	;-)
-- 
	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}
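[Since the last few articles disagree about which byte cvptr[0] names, a
minimal probe settles it for the machine at hand; the program is an
invention, and it only distinguishes "low byte first" from "anything
else", which lumps the PDP-11 mixed order in with big-endian.]

	#include <stdio.h>

	int main(void)
	{
	    long input = 1L;
	    char *cvptr = (char *)&input;

	    /* little-endian: the first byte of a long holding 1 is 1;
	     * big-endian: the first byte is the most significant, i.e. 0 */
	    if (cvptr[0] == 1)
	        printf("little-endian: cvptr[0] is the low-order byte\n");
	    else
	        printf("not little-endian: cvptr[0] is not the low-order byte\n");
	    return 0;
	}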