[comp.misc] "big endian" and "little endian" - first usage for computer

bcase@cup.portal.com (Brian bcase Case) (12/30/88)

>Both representations are useful.  Are there any other respective
>advantages I've left out?

Big endian has the significant advantage that, when properly aligned,
character strings can be compared using the full width of the machine's
ALU.  For 32-bit machines, this means that two four-character (sub)strings
can be compared at one time.  This is because the lowest address always
points to the *first* character in the string.  Little endian requires
character-at-a-time processing or hardware gymnastics.

Since it forces inefficiency, little-endian is for CISCs.  :-) :-)

bcase@cup.portal.com (Brian bcase Case) (12/31/88)

>Big endian has the significant advantage that, when properly aligned,
>character strings can be compared using the full width of the machine's
>ALU.  For 32-bit machines, this means that two four-character (sub)strings
>can be compared at one time.  This is because the lowest address always
>points to the *first* character in the string.

Oops, I meant to say that the first character is always in a more
significant position than the second and succeeding characters.  This
corresponds to the convention that the first character in a string is
the most important in determining its position in an alphabetically
sorted list of strings.  Thus, after properly aligning, (sub)strings
can be compared as if they were simple (unsigned) integers.
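
A minimal sketch of the claim above, modelled in Python rather than on real hardware. Four characters are packed into one 32-bit word, lowest address first, and the words are compared as unsigned integers. The helper name `word` and the sample strings are assumptions made for illustration only.

```python
def word(chars: bytes, byteorder: str) -> int:
    """Interpret four bytes, lowest address first, as one unsigned 32-bit word."""
    assert len(chars) == 4
    return int.from_bytes(chars, byteorder)

a, b = b"abcz", b"abda"   # a sorts before b as a string ('c' < 'd')

# Big-endian: the first character is the most significant byte, so
# unsigned integer order matches string order.
big_agrees = (word(a, "big") < word(b, "big")) == (a < b)

# Little-endian: the last character is the most significant byte, so
# 'z' versus 'a' in the fourth position decides the comparison.
little_agrees = (word(a, "little") < word(b, "little")) == (a < b)

print(big_agrees, little_agrees)   # True False
```

Equality tests are unaffected: two words are equal under either byte order exactly when all four characters match. Only the ordered comparison depends on which character lands in the most significant byte.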

patterso@hardees.rutgers.edu (Ross Patterson) (01/01/89)

>Since it forces inefficiency, little-endian is for CISCs.  :-) :-)

Does that make the IBM 370 a RISC? ;-)

stuart@bms-at.UUCP (Stuart Gathman) (01/04/89)

In article <13045@cup.portal.com>, bcase@cup.portal.com (Brian bcase Case) writes:

> Big endian has the significant advantage that, when properly aligned,
> character strings can be compared using the full width of the machine's
> ALU.  For 32-bit machines, this means that two four-character (sub)strings
> can be compared at one time.  This is because the lowest address always
> points to the *first* character in the string.  Little endian requires
> character-at-a-time processing or hardware gymnastics.

I do the same thing on little endian.  It all depends on how you store
the characters.  Read the "HOLY WAR" article for a detailed explanation.
The problem is, there are no consistent little-endian machines, the
big-endian infiltrators have sabotaged every last one (that I know of).

The major (dis)advantages are:

	BIGend		(numeric) compares / divides are faster
	LITTLEend	adds / multiplies are faster
-- 
Stuart D. Gathman	<stuart@bms-at.uucp>
			<..!{vrdxhq|daitc}!bms-at!stuart>

w-colinp@microsoft.UUCP (Colin Plumb) (01/04/89)

In article <145@bms-at.UUCP> stuart@bms-at.UUCP (Stuart Gathman) writes:
>The problem is, there are no consistent little-endian machines, the
>big-endian infiltrators have sabotaged every last one (that I know of).

The Inmos transputer is uniformly little-endian.  This applies to both
integers and floating-point numbers (where most others mess up).
-- 
	-Colin (uunet!microsof!w-colinp)

hutch@delft (David Hutchens) (01/05/89)

From article <170@microsoft.UUCP>, by w-colinp@microsoft.UUCP (Colin Plumb):
> In article <145@bms-at.UUCP> stuart@bms-at.UUCP (Stuart Gathman) writes:
>>The problem is, there are no consistent little-endian machines, the
>>big-endian infiltrators have sabotaged every last one (that I know of).
> 
> The Inmos transputer is uniformly little-endian.  This applies to both
> integers and floating-point numbers (where most others mess up).
> -- 
> 	-Colin (uunet!microsof!w-colinp)

Actually, where most little-endian machines screw up is storing the
bits in the byte in the wrong order.  It is good to hear that somebody got it
right and stored a one as 100000...0000 rather than 00000001000...000.
(That is what you meant wasn't it?).  Note that this implies one
multiplies by 2 by using a RIGHT shift (else there is an inconsistency
in the little-endian view in the registers)!  The Inmos sounds interesting.

		David Hutchens
		hutch@hubcap.clemson.edu
		...!gatech!hubcap!hutch

w-colinp@microsoft.UUCP (Colin Plumb) (01/05/89)

In article <4008@hubcap.UUCP> hutch@delft writes:
>Actually, where most little-endian machines screw up is storing the
>bits in the byte in the wrong order.  It is good to hear that somebody got it
>right and stored a one as 100000...0000 rather than 00000001000...000.
>(That is what you meant wasn't it?).  Note that this implies one
>multiplies by 2 by using a RIGHT shift (else there is an inconsistency
>in the little-endian view in the registers)!  The Inmos sounds interesting.

Sorry, no.  Little endian means that if two addressed objects (on the
Transputer, the smallest object that can be addressed is a byte) are
part of the same number, the object (byte) with the lower address is
less significant.

Note two things:
-> There is no ordering implied on bits within bytes; a byte is an atomic
   object, and you can't say which bit of it comes "first."  (Of course,
   in serial communications, the other site of the Holy War, this is
   significant.)
-> Both big- and little-endian types agree that more significant bits should
   be to the left, conceptually (the Arabic heritage, remember?); they
   *don't* agree on whether addresses increase left-to-right (big-endian) or
   right-to-left (little-endian).  See On Holy Wars and a Plea For Peace for
   more diagrams.  A one is stored as 00000000 00000000 00000000 00000001,
   with the bytes' addresses being     base+3   base+2   base+1   base+0.

Thus, it is impossible for a byte-addressed machine to store the bits in
a byte in the wrong order, unless it has bitfield instructions or some
such bit-addressing kludge.
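
The layout described above can be sketched with Python's `struct` module, which packs the same 32-bit value in either byte order. The variable names are illustrative assumptions; index 0 of each result stands for the byte at base+0.

```python
import struct

little = struct.pack("<I", 1)   # 32-bit unsigned, little-endian
big = struct.pack(">I", 1)      # same value, big-endian

for name, raw in (("little", little), ("big", big)):
    # raw[0] is the byte at base+0, raw[3] the byte at base+3
    print(name, [f"base+{i}={byte:08b}" for i, byte in enumerate(raw)])
```

In both cases the nonzero byte has the same bit pattern, 00000001. Only the address at which that byte sits differs, which is the point about bytes being atomic.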
-- 
	-Colin (uunet!microsof!w-colinp)

mac3n@babbage.acc.virginia.edu (Alex Colvin) (01/05/89)

> > The Inmos transputer is uniformly little-endian.  This applies to both
> > integers and floating-point numbers (where most others mess up).

You mean the beginning of a double looks like a float?

f(x) float x; { g(&x); } /* g() is actually passed a (double *) */

> Actually, where most little-endian machines screw up is storing the
> bits in the byte in the wrong order.  It is good to hear that somebody got it

On most of these machines, bits are stored vertically :-/. [half :-)]
If you can't index or address bits, there is no order.  If it makes you happy,
call a right shift (to less significance) a down shift, a left shift an up
shift.  The big/little thing only has meaning in addressing parts.

Another notational screw-up is where to put address 0 when drawing memory.  I
always put it at the top ("up there at the bottom of memory").

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (01/05/89)

In article <13045@cup.portal.com> bcase@cup.portal.com (Brian bcase Case) writes:
>Big endian has the significant advantage that, when properly aligned,
>character strings can be compared using the full width of the machine's

>Since it forces inefficiency, little-endian is for CISCs.  :-) :-)

I could be wrong, but I think a fully consistent little-endian machine
(e.g. nsc 32xxx) does not have this disadvantage.

All this was covered about 2 years ago on this group: the conclusion then
was that little-endian had a small advantage on tiny machines (e.g. 8008
class and slower) needing to do BCD arithmetic, big endian machines have
the "advantage" that it is easier to read dumps, and there are no other
significant differences.  VAXes, of course, are not consistent little-
endian or big-endian, but then, we are not supposed to have to read dumps
anymore anyway, remember ?  :-)

-- 
  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117

seanf@sco.COM (Sean Fagan) (01/05/89)

In article <20264@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov.UUCP (Hugh LaMaster) writes:
>I could be wrong, but I think a fully consistent little-endian machine
>(e.g. nsc 32xxx) does not have this disadvantage.

You're wrong.  On a NSC 32k, addresses are in the wrong order (actually, I
think it might just be displacements), because the upper 1 or 2 bits
determine the size of the address (and means that you can't use a
displacement of 2gigs unsigned, or 1 gig signed.  everybody sigh in unison
8-)).  Also, I'd bet that the FP format is backwards (wrt big vs. little
endian).

Now, *Cybers* don't have this problem, you betcha.  It's kinda nice not
having to worry about byte addressing...

-- 
Sean Eric Fagan  | "Merry Christmas, drive carefully and have some great sex."
seanf@sco.UUCP   |     -- Art Hoppe
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (01/06/89)

In article <20264@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov.UUCP (Hugh LaMaster) writes:
>I could be wrong, but I think a fully consistent little-endian machine
>(e.g. nsc 32xxx) does not have this disadvantage.

I was assuming an equality comparison.  Most people seem to assume strcmp(),
for which it does make a difference (this could lead to a very long discussion
of how important strcmp-like comparisons are, etc., which I will avoid.)

-- 
  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117

bcw@rti.UUCP (Bruce Wright) (01/06/89)

In article <20264@ames.arc.nasa.gov>, lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
> In article <13045@cup.portal.com> bcase@cup.portal.com (Brian bcase Case) writes:
> 
> >Since it forces inefficiency, little-endian is for CISCs.  :-) :-)
> 
> All this was covered about 2 years ago on this group: the conclusion then
> was that little-endian had a small advantage on tiny machines (e.g. 8008
> class and slower) needing to do BCD arithmetic, big endian machines have
> the "advantage" that it is easier to read dumps, and there are no other
> significant differences.  VAXes, of course, are not consistent little-
> endian or big-endian, but then, we are not supposed to have to read dumps
> anymore anyway, remember ?  :-)
> 
My immediate thought on seeing the VAX instruction set when it first came
out was that by making the byte order "little endian" it allowed something
like a Fortran compiler to take a statement like:

	call sub (1)

and pass a number to it (a longword - 4 bytes) which would be interpreted
correctly whether the receiving formal parameter was a byte, a word (2 bytes),
or a longword (4 bytes).  This is not possible in a "big endian" machine -
you have to know how many bytes of high order 0's to write before you get to
the low order byte.  Considering that the Fortran of the day had no way to
declare the formal parameters for subroutines, and the importance of
Fortran in the early days of the VAX (and the fact that the VAX was built
with a great deal of input from the software guys), could this be the REAL
motivation for "little endian"?
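
A rough model of the trick, assuming the callee reads its argument starting at the lowest address of the longword. The helper names `store` and `load` are hypothetical, and values are treated as unsigned for simplicity.

```python
def store(value: int, byteorder: str) -> bytes:
    """Lay out a 4-byte longword as it would sit in memory, lowest address first."""
    return value.to_bytes(4, byteorder)

def load(memory: bytes, width: int, byteorder: str) -> int:
    """Read `width` bytes starting at the lowest address of the longword."""
    return int.from_bytes(memory[:width], byteorder)

for order in ("little", "big"):
    memory = store(1, order)
    # Receive the same argument as a byte, a word, and a longword.
    print(order, [load(memory, width, order) for width in (1, 2, 4)])
```

Under these assumptions the little-endian layout yields 1 at every width, while the big-endian layout yields 0 for the byte and word reads because they land on the high-order zero bytes.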

Of course the fact I even thought of the possibility of such a trick
probably shows I'm just an old Fortrash hacker ...

						Bruce C. Wright

steve@nuchat.UUCP (Steve Nuchia) (01/06/89)

In article <20293@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov.UUCP (Hugh LaMaster) writes:
[supposed advantages of little-endian]

>I was assuming an equality comparison.  Most people seem to assume strcmp(),
>for which it does make a difference (this could lead to a very long discussion
>of how important strcmp-like comparisons are, etc., which I will avoid.)

Well, I won't.  :-)

The literature on sorting algorithms focuses on the use of a "<=" oracle,
by analogy with the mathematical definition of "well order", which is what
a sort is supposed to do.  In a previous life I derived a sort algorithm
that used a three-way oracle (strcmp, in fact) to good advantage.

I based the work on the fact that a large part of the comparison expense
for strings is in scanning the initial equal part; the three-way answer
comes for free after that.  My algorithm maintained in-core data in a
trinary tree with a degenerate (linear) subtree for the equals case.  The
expected data had significant clumping around discrete values so the extra
space was well justified.  The disk-resident format for intermediate runs
included a bit for "known equal" so the tests didn't have to be repeated
during merging.  It was a very fast sort, given the expected input
distribution.
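
A small in-memory sketch of the idea, not the original implementation: a ternary tree whose "equal" branch is a flat chain, driven by a strcmp-like three-way oracle. All names are assumptions, and the disk-resident merge phase is not modelled.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.equal = [key]    # degenerate "equals" chain for duplicates
        self.less = None
        self.greater = None

def three_way(a, b):
    """strcmp-like oracle: negative, zero, or positive."""
    return (a > b) - (a < b)

def insert(root, key):
    """Insert key and return the (possibly new) root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        order = three_way(key, node.key)
        if order == 0:
            # Known equal: no further comparisons needed for this key.
            node.equal.append(key)
            return root
        branch = "less" if order < 0 else "greater"
        child = getattr(node, branch)
        if child is None:
            setattr(node, branch, Node(key))
            return root
        node = child

def in_order(node):
    """Yield keys in sorted order, duplicates together."""
    if node is not None:
        yield from in_order(node.less)
        yield from node.equal
        yield from in_order(node.greater)
```

Note that `three_way` here costs two Python comparisons; a real strcmp gets the three-way answer from a single scan of the common prefix, which is where the saving described above comes from.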

(It used a number of other tricks, including bidirectional run management
 and very-high-order merging.  The other tricks exploited the unavoidable
 disk block caching in unix, but the trinary tree is quite general.)

So, don't discount strcmp's value.  Very many programs use it for
an equality test only, but sorting still consumes a great deal of
computer time in the real world, and when sorting we need to
know which way it went.  It would be a good idea for computer
architects to bear this in mind:  As mundane as sorting may
seem, it is the benchmark of choice for a great many check-signers.
-- 
Steve Nuchia	      South Coast Computing Services
uunet!nuchat!steve    POB 890952  Houston, Texas  77289
(713) 964 2462	      Consultation & Systems, Support for PD Software.

wen-king@cit-vax.Caltech.Edu (King Su) (01/06/89)

In article <2695@rti.UUCP> bcw@rti.UUCP (Bruce Wright) writes:
>My immediate thought on seeing the VAX instruction set when it first came
<out was that by making the byte order "little endian" it allowed something
>like a Fortran compiler to take a statement like:
<
>	call sub (1)
<
>and pass a number to it (a longword - 4 bytes) which would be interpreted
<correctly whether the receiving formal parameter was a byte, a word (2 bytes),
>or a longword (4 bytes).  This is not possible in a "big endian" machine -
<you have to know how many bytes of high order 0's to write before you get to
>the low order byte.

This cannot have anything to do with byte-ordering because the two
byte-ordering conventions are totally symmetrical and isomorphic.  Any
difference between two machines must have been a result of some asymmetries
that were imposed on the machine when it was designed.  In
the example above, the asymmetry was imposed when the following question
was answered:

   If a data unit consists of a sequence of bytes, what should the
   address of the data unit be: the address of the MSByte or the address
   of the LSByte?

In VAX, and in most little-endian machines, the address of the LSByte was
used to represent the address of the data unit.  In 68K and most big-endian
machines, the address of the MSByte was used.  The choice is quite arbitrary,
but the important thing is that it imposes an asymmetry.  The supposed
"advantage" of the little-endian byte-ordering is really the advantage of
choosing the address of the LSByte to be the address of a multi-byte unit.

We can build a big-endian machine with exactly the same advantage if we make
the same choice for it as we have made for VAX.  In this case, a 'long' that
occupies byte addresses 0x20, 0x21, 0x22, and 0x23 will have 0x23 as its address.
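
A sketch of that hypothetical machine, assuming a flat byte array for memory and a load that reads downward from the named address. The function name `load_from_lsbyte` is made up for illustration.

```python
memory = bytearray(0x30)
# Big-endian long holding 1: MSByte at 0x20, LSByte at 0x23.
memory[0x20:0x24] = (1).to_bytes(4, "big")

def load_from_lsbyte(address: int, width: int) -> int:
    """Read `width` bytes ending at `address`, which names the LSByte."""
    start = address - width + 1
    return int.from_bytes(memory[start:address + 1], "big")

# The same address works for byte, word, and longword reads.
print([load_from_lsbyte(0x23, width) for width in (1, 2, 4)])   # [1, 1, 1]
```

With the data unit addressed by its LSByte, the big-endian layout gives the same width-independent result that is usually credited to little-endian byte order.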

In general, given any little-endian machine, we can build a big-endian
machine that is exactly as good as the little-endian machine (in fact,
they will be duals), and vice versa.  Byte-ordering should cease to be the
focal point of any arguments; talks about the decisions that lead to the
asymmetries should replace it.
-- 
/*------------------------------------------------------------------------*\
| Wen-King Su  wen-king@vlsi.caltech.edu  Caltech Corp of Cosmic Engineers |
\*------------------------------------------------------------------------*/

d25001@mic.UUCP (Carrington Dixon) (01/07/89)

In article <2695@rti.UUCP> bcw@rti.UUCP (Bruce Wright) writes:

>My immediate thought on seeing the VAX instruction set when it first came
>out was that by making the byte order "little endian" it allowed something
>like a Fortran compiler to take a statement like:

>	call sub (1)

>and pass a number to it (a longword - 4 bytes) which would be interpreted
>correctly whether the receiving formal parameter was a byte, a word (2 bytes),
>or a longword (4 bytes).  This is not possible in a "big endian" machine -
>you have to know how many bytes of high order 0's to write before you get to
>the low order byte.  Considering that the Fortran of the day had no way to
>declare the formal parameters for subroutines, and the importance of
>Fortran in the early days of the VAX (and the fact that the VAX was built
>with a great deal of input from the software guys), could this be the REAL
                                                     ^^^^^^^^^^^^^^^^^^^^^^
>motivation for "little endian"?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

     If I thought that it was, my respect for DEC would take a BIG drop.
Passing a longword and receiving a word is an ERROR.  Sure, it works fine
for your example, but what if the statement were:

       call sub (70000)

Now both the big- and little-endian machines are receiving the "wrong"
value.   On the big-endian machine, the developer will probably find
his/her mistake during initial checkout.  On the little-endian machine
the error _may_ not be found until the module has been in production,
spewing out wrong answers for months.  I don't know about what kind of
environment you work in, but where I work this kind of error could cost my
company $k in data that had to be reprocessed.  (Not to mention egg on
our corporate face if a client were to discover the gaffe.)
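
To make the failure mode concrete, here is a small model of a callee that receives a 4-byte argument as a 2-byte integer by reading at the argument's lowest address. The function name `callee_sees` is an assumption, and values are treated as unsigned.

```python
def callee_sees(value: int, byteorder: str, width: int = 2) -> int:
    """Value read when a 4-byte argument is received as a `width`-byte integer."""
    memory = value.to_bytes(4, byteorder)       # lowest address first
    return int.from_bytes(memory[:width], byteorder)

for value in (1, 300, 70000):
    print(value, callee_sees(value, "little"), callee_sees(value, "big"))
# 1 1 0
# 300 300 0
# 70000 4464 1
```

In this model the big-endian callee is wrong even for small test values, so the mismatch shows up at once. The little-endian callee is right until the value exceeds 16 bits, at which point 70000 silently becomes 4464.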

     And now to lighten up....  No, this cannot be the _REAL_ motivation
for the little-endian data format, because this is INTEGER data (-::-)
snicker ... snicker ...  

Carrington Dixon
UUCP: { convex, killer }!mic!d25001

bcw@rti.UUCP (Bruce Wright) (01/08/89)

In article <205@mic.UUCP>, d25001@mic.UUCP (Carrington Dixon) writes:
> In article <2695@rti.UUCP> bcw@rti.UUCP (Bruce Wright) writes:
>  
> >My immediate thought on seeing the VAX instruction set when it first came
> >out was that by making the byte order "little endian" it allowed something
> >like a Fortran compiler to take a statement like:
> 
> >	call sub (1)
> 
> >and pass a number to it (a longword - 4 bytes) which would be interpreted
> >correctly whether the receiving formal parameter was a byte, a word (2 bytes),
> >or a longword (4 bytes).  [...] Could this be the REAL
>				   ^^^^^^^^^^^^^^^^^^^^^^
> >motivation for "little endian"?
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
>      If I thought that it was, my respect for DEC would take a BIG drop.
> Passing a longword and receiving a word is an ERROR.  Sure, it works fine
> for your example, but what if the statement were: [...]

Well, you can assert that the FORTRAN language itself is an error;  this is
essentially what you are saying.  The point is that there is NO WAY (repeat:
NO WAY) in the Fortran 77 standard to declare a formal argument list.  That
means that there is NO WAY to declare to the compiler that it is to pass
an INTEGER*2 value (as opposed to, say, an INTEGER*4 value) as a parameter
to the subroutine.  In other words, the INTENT OF THE PROGRAMMER was all
along to pass a word rather than a longword, but the DEFINITION OF THE
LANGUAGE does not allow this to be explicitly declared.

Now I am not going to defend FORTRAN as a "safe" language, or an "elegant"
language, or even a "good" language.  It is however a very commercially
significant language - which is not at all the same thing (eg, COBOL).
FORTRAN is still in considerable use (even for new development in some
environments).

> the error _may_ not be found until the module has been in production,
> spewing out wrong answers for months.  I don't know about what kind of
> environment you work in, but where I work this kind of error could cost my
> company $k in data that had to be reprocessed.  (Not to mention egg on
> our corporate face if a client were to discover the gaffe.)

The classic FORTRAN error had nothing whatsoever to do with this kind
of error, but with the terseness that FORTRAN uses for its syntax:

a statement something like

	do 100 i=1,10

got permuted to something like

	do 100 i=1.10

The former statement starts a loop varying "I" from 1 to 10, and the latter
assigns a value of 1.10 to a variable named "DO100I".  FORTRAN has no
explicit end-loop construct (the end of the loop is specified by the
statement label "100"), so the error went undetected ...
until the satellite got dumped in the ocean and NASA had lots of egg on
its face.

In other words, if you want to flame anything about safe computing, you
should probably be flaming FORTRAN, not DEC or the VAX or me.

						Bruce C. Wright

glennw@nsc.nsc.com (Glenn Weinberg) (01/10/89)

In article <2015@scolex> seanf@scolex.UUCP (Sean Fagan) writes:
>In article <20264@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov.UUCP (Hugh LaMaster) writes:
>>I could be wrong, but I think a fully consistent little-endian machine
>>(e.g. nsc 32xxx) does not have this disadvantage.
>
>You're wrong.  On a NSC 32k, addresses are in the wrong order (actually, I
>think it might just be displacements), because the upper 1 or 2 bits
>determine the size of the address (and means that you can't use a
>displacement of 2gigs unsigned, or 1 gig signed.  everybody sigh in unison
>8-)).  Also, I'd bet that the FP format is backwards (wrt big vs. little
>endian).

He's less wrong than you are :-)  The 32K is completely consistently
little-endian (including floating-point), except for displacements,
which are as you described: the upper two bits of the displacement
determine whether it is one, two or four bytes long.  Since displacements
are part of the instruction stream rather than the data, all data
representations in the 32K are consistently little-endian.  Unless you
write self-modifying code, the only time the reverse order of the
displacements is annoying is when you're writing an assembler or
disassembler.

-- 
Glenn Weinberg					Email: glennw@nsc.nsc.com
National Semiconductor Corporation		Phone: (408) 721-8102
(My opinions are strictly my own, but you can borrow them if you want.)

d25001@mic.UUCP (Carrington Dixon) (01/10/89)

In article <2702@rti.UUCP> bcw@rti.UUCP (Bruce Wright) writes:
>In article <205@mic.UUCP>, d25001@mic.UUCP (Carrington Dixon) writes:

>Well, you can assert that the FORTRAN language itself is an error;  this is
>essentially what you are saying.  The point is that there is NO WAY (repeat:
>NO WAY) in the Fortran 77 standard to declare a formal argument list.  That

     We both agree that there is no provision in FORTRAN to catch 
mismatched arguments at compile time.  We even seem to agree that this
is a failing of that language.  Thus there is a large category of errors
that FORTRAN cannot find at compile time.  I maintain that those who
wish to create "correct" programs will want to test these modules in
order to find as many errors as possible before dumping the mess on some
hapless user.

     With this in mind, I maintain that some data formats lend themselves
to finding such latent errors more readily than do others and that it
would be pernicious of any vendor to choose its data formats with an
eye to making such checkout as difficult as possible.  DEC and little-
endian integers was just the example at hand; I can think of other
architectures that allow the equally unfortunate passing of double-reals
and receiving single-reals with similar problems in runtime diagnoses.

>In other words, if you want to flame anything about safe computing, you
>should probably be flaming FORTRAN, not DEC or the VAX or me.
>
>						Bruce C. Wright

    I thought my response was a little mild to qualify as a full-
fledged usenet flame, but I suppose that opinions may differ.
For the record, I do not think that DEC was guilty of choosing
its data formats in some blind and misguided attempt to follow FORTRAN's
lead into the dismal swamp.  They chose the "little-endian" format for
other reasons.  I am sure that they were under no delusion that they had
to perpetuate FORTRAN's shortcomings in their hardware.

     Incidentally, I think that the phrase that you were trying to use
(twice) was "in error."  I might be offended if I thought that you
really meant that I was "an error."


Carrington Dixon
UUCP: { convex, killer }!mic!d25001

jesup@cbmvax.UUCP (Randell Jesup) (01/11/89)

In article <482@babbage.acc.virginia.edu> mac3n@babbage.acc.virginia.edu (Alex Colvin) writes:
>> Actually, where most little-endian machines screw up is storing the
>> bits in the byte in the wrong order.  It is good to hear that somebody got it

>Another notational screw-up is where to put address 0 when drawing memory.  I
>always put it at the top ("up there at the bottom of memory").

	I think there are two main reasons why more programmers seem to like
big-endian (no flames, local observation) and hardware people seem to like
little-endian:

1)  Little-endian used to make it easier to support big integers on small-
buswidth machines (minor issue, solved or irrelevant now in general).

2)  Hardware people like to draw diagrams with 0 at bottom-right, software
people, used to printers and screens that print top to bottom, left to right,
like to put 0 at upper-left.  It also makes dumping memory with strings easier
to read.

-- 
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup

bbadger@x102c.uucp (Badger BA 64810) (01/13/89)

In article <5658@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:
>2)  Hardware people like to draw diagrams with 0 at bottom-right, software
>people, used to printers and screens that print top to bottom, left to right,
>like to put 0 at upper-left.  It also makes dumping memory with strings easier
>to read.

DEC VAX DUMP prints out in a format that makes both integers and strings 
easy to read.  Namely, it prints out each in their ``natural'' order:
Integers in little-endian (right to left), and strings from left to right.
Here's an example:

Virtual block number 1 (00000001), 512 (0200) bytes

 4E4D4C4B 4A494847 46454443 4241002F /.ABCDEFGHIJKLMN 000000
 69685420 5A595857 56555453 5251504F OPQRSTUVWXYZ Thi 000010
 74736574 20612079 6C6E6F20 73692073 s is only a test 000020
 00000000 00000000 00000000 FFFF0021 !............... 000030
 00000000 00000000 00000000 00000000 ................ 000040
     <----- numbers go this way <---*---> strings go this way --->

People who expect the first word (000000) to appear first (at left) will be 
surprised by this, but it's perfectly consistent with the way we write 
our numbers and strings.
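
A sketch of a formatter that produces this layout, written from the example above rather than from any knowledge of how DUMP itself is implemented. The function name is hypothetical, and padding a short final row with zero bytes is an assumption.

```python
def vms_style_dump(data: bytes, width: int = 16) -> list:
    """Format bytes with hex running right to left and text left to right."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width].ljust(width, b"\x00")
        # Hex side: each 4-byte longword is byte-reversed so a little-endian
        # integer reads as an ordinary number, and the longword at the
        # lowest address is placed rightmost.
        longwords = [row[i:i + 4][::-1].hex().upper() for i in range(0, width, 4)]
        hex_part = " ".join(reversed(longwords))
        # Text side: lowest address on the left; non-printables shown as '.'.
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in row)
        lines.append(f" {hex_part} {text} {offset:06X}")
    return lines

for line in vms_style_dump(b"/\x00ABCDEFGHIJKLMNOPQRSTUVWXYZ Thi"):
    print(line)
```

Fed the first 32 bytes of the block shown above, this reproduces the first two rows of the dump.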
Bernard A. Badger Jr.	407/984-6385          |``Use the Source, Luke!''
Secure UNIX Products                          |It's not a bug; it's a feature!
Harris GISD, Melbourne, FL                    |Buddy, can you paradigm?
Internet: bbadger@cobra@trantor.harris-atd.com|Recursive:  see Recursive.

jesup@cbmvax.UUCP (Randell Jesup) (01/17/89)

In article <1433@trantor.harris-atd.com> bbadger@x102c.UUCP (Badger BA 64810) writes:
>In article <5658@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:
>>2)  Hardware people like to draw diagrams with 0 at bottom-right, software
>>people, used to printers and screens that print top to bottom, left to right,
>>like to put 0 at upper-left.  It also makes dumping memory with strings easier
>>to read.

>DEC VAX DUMP prints out in a format that makes both integers and strings 
>easy to read.  Namely, it prints out each in their ``natural'' order:
>Integers in little-endian (right to left), and strings from left to right.

> 4E4D4C4B 4A494847 46454443 4241002F /.ABCDEFGHIJKLMN 000000
> 69685420 5A595857 56555453 5251504F OPQRSTUVWXYZ Thi 000010
> 74736574 20612079 6C6E6F20 73692073 s is only a test 000020
>     <----- numbers go this way <---*---> strings go this way --->
>
>People who expect the first word (000000) to appear first (at left) will be 
>surprised by this, but it's perfectly consistent with the way we write 
>our numbers and strings.

	I don't know about you (or your hardware), but I tend to write from
left to right, not right to left.  :-)  And I don't start writing in the
middle of the page, and go both left and right from there.  :-)

	Sure you can write this way, or even make things scroll up, but
most terminals/whatever are easier to deal with in a sequential, left to
right, top to bottom fashion.  It's marginally more annoying to deal with
in your way.  Also, I get a headache trying to find the word/byte/whatever
I'm looking for in a listing like that, I have to reverse my thinking.  :-)

	Personally, that's a nice kludge to get around the fact that little-
endian is "naturally" written right to left, bottom to top by most people.
However, people don't read that way, certainly not text.

	I think little-endian is a long-standing joke played by hardware
engineers on software writers.  :-)

-- 
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup

ggs@ulysses.homer.nj.att.com (Griff Smith) (01/17/89)

In article <5703@cbmvax.UUCP>, jesup@cbmvax.UUCP (Randell Jesup) writes:
> 	Personally, that's a nice kludge to get around the fact that little-
> endian is "naturally" written right to left, bottom to top by most people.
> However, people don't read that way, certainly not text.

Where `people' are defined to be those who happen to be members of the
Western cultures that read left to right.  What does that make the others?

> 	I think little-endian is a long-standing joke played by hardware
> engineers on software writers.  :-)

Big-endian is a long-standing mistake imposed on us by merchants from the
Middle Ages who missed the point.  In transcribing the number system from
the Arabic, they should have had the sense to reverse the digits to compensate
for the strange Western custom of writing from left to right. ( :-), I suppose).

> -- 
> Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup

-- 
Griff Smith	AT&T (Bell Laboratories), Murray Hill
Phone:		1-201-582-7736
UUCP:		{most AT&T sites}!ulysses!ggs
Internet:	ggs@ulysses.att.com

jesup@cbmvax.UUCP (Randell Jesup) (01/18/89)

In article <11113@ulysses.homer.nj.att.com> ggs@ulysses.homer.nj.att.com (Griff Smith) writes:
>In article <5703@cbmvax.UUCP>, jesup@cbmvax.UUCP (Randell Jesup) writes:
>> 	Personally, that's a nice kludge to get around the fact that little-
>> endian is "naturally" written right to left, bottom to top by most people.
>> However, people don't read that way, certainly not text.
>
>Where `people' are defined to be those who happen to be members of the
>Western cultures that read left to right.  What does that make the others?

	Yes, sorry, I forgot to qualify that as people in "Western"
cultures.  This is the smallest problem with existing systems/software
for non-"Western" people (does your software support kanji? Arabic?)

-- 
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup

cik@l.cc.purdue.edu (Herman Rubin) (01/19/89)

In article <5721@cbmvax.UUCP>, jesup@cbmvax.UUCP (Randell Jesup) writes:
> In article <11113@ulysses.homer.nj.att.com> ggs@ulysses.homer.nj.att.com (Griff Smith) writes:
> >In article <5703@cbmvax.UUCP>, jesup@cbmvax.UUCP (Randell Jesup) writes:
> >> 	Personally, that's a nice kludge to get around the fact that little-
> >> endian is "naturally" written right to left, bottom to top by most people.
> >> However, people don't read that way, certainly not text.
> >
> >Where `people' are defined to be those who happen to be members of the
> >Western cultures that read left to right.  What does that make the others?
> 
> 	Yes, sorry, I forgot to qualify that as people in "Western"
> cultures.  This is the smallest problem with existing systems/software
> for non-"Western" people (does your software support kanji? Arabic?)
> 
> -- 
> Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup

A look at ancient writing of numbers, both in symbols and spelled out,
indicates that it is pretty much big-endian.  Except for the units and
tens digits, I know of no language in either the Semitic or the Indo-European
group which does not express numbers with the most significant part first.
For example, in Hebrew (and probably also in Arabic, they are sufficiently
similar), one would say the equivalent of two hundred and thirty, NOT
thirty and two hundred.  It would be written right-to-left big-endian,
just as the language is written.

These languages then introduced (mostly) decimal representations, using
different characters for multiples of different powers of 10.  Again, they
were written big-endian.  Then the idea of using the same symbol in each
place, with a zero to hold the place, originated in India.  The Indian 
writing is left-to-right.  After the Moslem invasion of India, the Arabs adopted
the Indian decimal notation without change.  That is why the Arabic expression
appears as little-endian.

There does not seem to be any support from "natural" languages for the
little-endian approach.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

gpwrdcs@gp.govt.nz (Don Stokes, Govt Print, Wellington) (01/20/89)

In article <20264@ames.arc.nasa.gov>, 
  lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
>                             VAXes, of course, are not consistent little-
> endian or big-endian, but then, we are not supposed to have to read dumps
> anymore anyway, remember ?  :-)
> 
VAXes are definitely little-endian as far as integers go ... and reading 
dumps is not a problem ... VMS DUMP puts the hex part of the dump in reverse
order, so all the bytes are in the right order, and numeric values can be 
easily distinguished.  It is just a matter of learning to read from right to
left...

The important part about little endian vs big endian (which can cause problems)
is overlaying of dissimilar data types.  If I overlay a byte onto a word on a 
VAX (or any other little-endian processor), put in a word value < 256, and do a
byte read from the same address, I will get a correct response.  If I do the
same thing with a big-endian processor, I will get zero. 
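
A minimal sketch of the overlay effect described above, in Python rather
than on a real VAX.  The helper name byte_at_word_address, the 16-bit word
size and the value 42 are assumptions made for illustration; struct.pack
lays the word out in the requested byte order, and index 0 stands in for a
byte read from the same address as the word.

```python
import struct


def byte_at_word_address(value: int, byte_order: str) -> int:
    """Store value as a 16-bit word, then read one byte at the word's address.

    byte_order uses struct notation: "<" is little-endian, ">" is big-endian.
    """
    word = struct.pack(byte_order + "H", value)  # two bytes in the chosen order
    return word[0]                               # the byte at the lowest address


little = byte_at_word_address(42, "<")  # low-order byte is stored first
big = byte_at_word_address(42, ">")     # high-order byte is stored first
print(little, big)
```

For any word value below 256 the little-endian read returns the value
itself, while the big-endian read returns the (zero) high-order byte.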

Of course you don't usually overlay floating point numbers ... so the order of
the bytes in a floating-point number is (usually) irrelevant ...

Don Stokes
Systems Programmer
Government Printing Office, Wellington, New Zealand.

mac@mrk.ardent.com (Michael McNamara) (01/21/89)

In article <1102@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:

|There does not seem to be any support from "natural" languages for the
|little-endian approach.

	Four and twenty black birds, baked in a pie....
|-- 
|Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
|Phone: (317)494-6054
|hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

Michael McNamara 
  mac@ardent.com

jkl@csli.STANFORD.EDU (John Kallen) (01/21/89)

In article <1102@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>
>There does not seem to be any support from "natural" languages for the
>little-endian approach.

What about Danish:     fem og halvfirsindtyve (75 (my Danish is rusty))

Or norwegian:	       en og femti (51). This fooled me once into believing
		       one could rent a room in Paris for Fr 1.50... :-)
		       
Or better yet, German: Zwei und Vierzig  (42!)

I believe Danish, Norwegian and German count as "natural" languages. 
At least in Denmark, Norway and German[y|ies] :-)

John.
_______________________________________________________________________________
 | |   |   |    |\ | |   /|\ | John Kallen       "The light works. The gravity
 | |\ \|/ \|  * |/ | |/|  |  | PoBox 11215        works. Anything else we must
 | |\ /|\  |\ * |\ |   |  |  | Stanford CA 94309  take our chances with."
_|_|___|___|____|_\|___|__|__|_jkl@csli.stanford.edu___________________________

bbadger@x102c.uucp (Badger BA 64810) (01/22/89)

In article <5703@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:
>In article <1433@trantor.harris-atd.com> bbadger@x102c.UUCP (Badger BA 64810) writes:
>>In article <5658@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:
>>>2)  Hardware people like to draw diagrams with 0 at bottom-right, software
>>>people, used to printers and screens that print top to bottom, left to right,
>>>like to put 0 at upper-left.  It also makes dumping memory with strings easier
>>>to read.
>
>>DEC VAX DUMP prints out in a format that makes both integers and strings 
>>easy to read.  Namely, it prints out each in their ``natural'' order:
>>Integers in little-endian (right to left), and strings from left to right.
>
>> 4E4D4C4B 4A494847 46454443 4241002F /.ABCDEFGHIJKLMN 000000
>> 69685420 5A595857 56555453 5251504F OPQRSTUVWXYZ Thi 000010
>> 74736574 20612079 6C6E6F20 73692073 s is only a test 000020
>>     <----- numbers go this way <---*---> strings go this way --->
>>
>>People who expect the first word (000000) to appear first (at left) will be 
>>surprised by this, but it's perfectly consistent with the way we write 
>>our numbers and strings.
>
>	I don't know about you (or your hardware), but I tend to write from
>left to right, not right to left.  :-)  And I don't start writing in the
>middle of the page, and go both left and right from there.  :-)
>
Actually, my hardware (VT100 terminal) normally writes left-to-right, but
this doesn't stop me from *reading* right-to-left (and LtR) once an entire 
line is on-screen.
>	Sure you can write this way, or even make things scroll up, but
>most terminals/whatever are easier to deal with in a sequential, left to
>right, top to bottom fashion.  It's marginally more annoying to deal with
>in your way.  Also, I get a headache trying to find the word/byte/whatever
>I'm looking for in a listing like that, I have to reverse my thinking.  :-)
	(Left-to-right and Top-to-bottom are separate issues.)
>
>	Personally, that's a nice kludge to get around the fact that little-
>endian is "naturally" written right to left, bottom to top by most people.
>However, people don't read that way, certainly not text.
>
Aaahh! That's just it.  People reading VMS DUMP output looking for numbers 
*do* read from right-to-left (RtL) (once they get the hang of it :-).  
It's not really hard, and it makes sense of all lengths of integers from 
1 byte to n.  The reasons for *choosing* big- or little-endian integer 
representations play more to hardware and software issues than adherence 
to historical human reading conventions.  The point I'm trying to make about 
DUMP output is that (Western) people expect to be able to *read* numeric 
output from left-to-right with the most-significant digits first.  If you 
think the first (i.e., leftmost) byte printed should also have the lowest 
byte-address, you are really *specifying* big-endian order.  By dropping 
this arbitrary restriction, VMS DUMP can print the bytes out in a contiguous 
block for that line.
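
As a rough illustration, the layout of the sample lines can be reproduced
in a few lines of Python.  This is a sketch inferred only from the three
sample lines quoted above, not the actual VMS DUMP implementation; the
function name dump_line and the zero padding of short lines are
assumptions.

```python
def dump_line(chunk: bytes, offset: int) -> str:
    """Format up to 16 bytes in the style of the sample dump lines."""
    padded = chunk.ljust(16, b"\x00")  # assumption: pad short lines with zeros
    # Reversing the whole line puts the highest-addressed byte at the left,
    # so every little-endian integer reads most-significant digit first.
    reversed_hex = padded[::-1].hex().upper()
    longwords = " ".join(reversed_hex[i:i + 8] for i in range(0, 32, 8))
    # The text column stays in storage order, left to right.
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    return f" {longwords} {text} {offset:06X}"


print(dump_line(b"/\x00ABCDEFGHIJKLMN", 0x000000))
print(dump_line(b"s is only a test", 0x000020))
```

Note that the hex column is just the line's bytes in reverse address
order; no per-integer swapping is needed, which is why integers of any
length come out readable.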

Taking the first line of the dump as an example, 
>> 4E4D4C4B 4A494847 46454443 4241002F /.ABCDEFGHIJKLMN 000000
note that the first two bytes of the file specify a single integer number,
LSB order:  002F  ==> byte(0) = 2F, byte(1) = 00.  It's certainly easier to 
read written MSB (002F) than in storage order (2F00).   
If the next element of the file were ``really'' an INTEGER*4 variable 
(please excuse the use of FORTRAN in mixed company :-), you would catenate 
the "4443 4241" into 44434241.  But if it turned out to be two INTEGER*2 
values you would read "4241" first, then "4443".  
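
The same readings can be checked mechanically.  A small Python sketch,
assuming the 16 bytes of the first dump line are held in a variable
(called line here purely for illustration) and using the standard
int.from_bytes with little-endian order:

```python
line = b"/\x00ABCDEFGHIJKLMN"  # the 16 bytes shown in the first dump line

# Bytes 0..1 as one little-endian 16-bit integer.
first_word = int.from_bytes(line[0:2], "little")

# Bytes 2..5 read as one INTEGER*4, or as two INTEGER*2 values.
as_longword = int.from_bytes(line[2:6], "little")
as_two_words = [int.from_bytes(line[i:i + 2], "little") for i in (2, 4)]

print(f"{first_word:04X}")                  # 002F
print(f"{as_longword:08X}")                 # 44434241
print([f"{w:04X}" for w in as_two_words])   # ['4241', '4443']
```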

This does result in your eyes moving RtL to increment addressing -- as when 
counting to a specific offset in a record structure -- and then scanning 
back from LtR to read an integer.  This is far easier to put up with than 
printing hexadecimal output with addresses increasing from left-to-right on 
a little-endian machine! 

As far as consistency goes, I always liked the fact that on little-endian 
architectures, the bit numbering (0..31) makes bit $ k $ represent 
$ 2^k $ no matter what the word size is.  Whereas on big-endian 32-bit words 
bit $ k $ equals $ 2 ^ {31 - k} $ and on 16-bit (half) words, the value is
$ 2 ^ {15 - k}$.
That is:
LSB (little-endian):	
        3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
        1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
2^7 =   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
   So 2^7 sets bit number 7.
MSB (big-endian), 32-bit word:
        0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
2^7 =   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
   So 2^7 sets bit number 24.
MSB (big-endian), 16-bit halfword:
        0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 
2^7 =   0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
   So 2^7 sets bit number 8.
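
The bit numbers in the diagrams above can be recomputed with a short
Python sketch.  The two function names are hypothetical; each one builds
the binary string for 2^k at a given word width and finds the set bit,
counting from the right (bit 0 = least significant) or from the left
(bit 0 = most significant).

```python
def lsb0_bit_number(k: int, width: int) -> int:
    """Bit number of 2**k when bit 0 is the least significant bit."""
    bits = format(1 << k, f"0{width}b")  # leftmost character is the MSB
    return bits[::-1].index("1")         # count from the right-hand end


def msb0_bit_number(k: int, width: int) -> int:
    """Bit number of 2**k when bit 0 is the most significant bit."""
    bits = format(1 << k, f"0{width}b")
    return bits.index("1")               # count from the left-hand end


for width in (32, 16):
    print(width, lsb0_bit_number(7, width), msb0_bit_number(7, width))
```

With bit 0 at the least significant end the answer is 7 at both widths;
with bit 0 at the most significant end it is width - 1 - k, so it changes
with the word size.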

Normally we can sweep these distinctions under a rug of abstraction.  It's 
only when we start to examine machine code or numeric representations that 
we operate on that low a level.

>	I think little-endian is a long-standing joke played by hardware
>engineers on software writers.  :-)
Right.  So if we just play along with the joke in DUMP output, we won't have 
to tangle up our bits too badly.  Of course, then there's communications 
software where some data is MSB and some is LSB, depending whether you're 
using the host format or the network format.  In that case, no matter which 
way we print our dump lines, some data will be written with the LSB on the 
left.

P.S.  You mentioned the bottom/top issue:  whether to print the low addresses 
at the top (normal first-things-first order) or at the bottom (like most 
hardware address space diagrams, or STACK dumps).  Again the most convenient 
order depends on the use that is made of the data, what its internal format 
*is*.  Both forms of output are useful.  The VAX DUMP doesn't have a "FFFFFFFF
at top" option.  Too bad.  
Bernard A. Badger Jr.	407/984-6385   | ``Use the Source, Luke!''
Secure UNIX Products                   | That's not a bug! It's a feature!
Harris GISD, Melbourne, FL  32902      | Buddy, can you paradigm?
Internet: bbadger@x102c.harris-atd.com | 's/./&&/' Tom sed [sic] expansively.

john@frog.UUCP (John Woods) (01/24/89)

In article <7193@csli.STANFORD.EDU>, jkl@csli.STANFORD.EDU (John Kallen) writes:
> In article <1102@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> >There does not seem to be any support from "natural" languages for the
> >little-endian approach.
> What about Danish:     fem og halvfirsindtyve (75 (my Danish is rusty))
> Or norwegian:	       en og femti (51). This fooled me once into believing
> 		       one could rent a room in Paris for Fr 1.50... :-)
> Or better yet, German: Zwei und Vierzig  (42!)

Ah, but consider the German for 1988: neunzehn hundert acht und achtzig
(nine-and-ten hundred eight and eighty).  Middle-endian.  AHA!  Germans
are PDP-11s!

:-)
-- 
John Woods, Charles River Data Systems, Framingham MA, (508) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

Presumably this means that it is vital to get the wrong answers quickly.
		Kernighan and Plauger, The Elements of Programming Style

nol2105@dsacg2.UUCP (Robert E. Zabloudil) (01/25/89)

In article <1916@ardent.UUCP>, mac@mrk.ardent.com (Michael McNamara) writes:
> In article <1102@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> |There does not seem to be any support from "natural" languages for the
> |little-endian approach.
> 	Four and twenty black birds, baked in a pie....

In German:  24 == vierundzwanzig
In Dutch it's expressed similarly

Also compare English thirteen, fourteen, ... nineteen.

cik@l.cc.purdue.edu (Herman Rubin) (01/25/89)

In article <250@dsacg2.UUCP>, nol2105@dsacg2.UUCP (Robert E. Zabloudil) writes:
> In article <1916@ardent.UUCP>, mac@mrk.ardent.com (Michael McNamara) writes:
> > In article <1102@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> > |There does not seem to be any support from "natural" languages for the
> > |little-endian approach.
> > 	Four and twenty black birds, baked in a pie....
> 
> In German:  24 == vierundzwanzig
> In Dutch it's expressed similarly
> 
> Also compare English thirteen, fourteen, ... nineteen.

If you read my posting, I did state that there was reversal of the units and
tens digits in many languages.  This occurs regularly in the Germanic languages,
as many have posted.  In Spanish, it only occurs from 11-15, and in French,
from 11-16.  A correction to my statement about Hebrew; it also applies there
to hundreds, but either order can occur, and in fact both orders occur in the
same passage.

However, my statement still holds.  To give a counterexample, it would be
necessary to come up with examples where such numbers as 46,378 have the
378 before the 46,000.  I know of no such examples.  The clear resolution
of this problem occurs in these cases of multi-"byte" expressions.  

The early symbolic representation of numbers by alphabetic characters or other
symbols is, in every case to my knowledge, in the same order as the written
letters.  Even the Roman numerals do this, in that if a less significant
symbol appears before a more significant one, it is treated anomalously. 
But the Roman numerals were not used for calculating.  The early numerical
representations used letters, but because of no 0 symbol, different letters
were used in different places, or other devices were used.  I know of no
ancient little-endian devices.  In Hebrew, 378 would always be 300 first,
then 70, then 8, in the right-to-left direction of the writing, even though
both word orders occur, and the other order would be unambiguous.

The apparent little-endianness of Arabic is due to the direct importation of
the left-to-right symbolic numerical writing from India.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

cramer@optilink.UUCP (Clayton Cramer) (01/27/89)

In article <1371@X.UUCP>, john@frog.UUCP (John Woods) writes:
> In article <7193@csli.STANFORD.EDU>, jkl@csli.STANFORD.EDU (John Kallen) writes:
> > In article <1102@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> > >There does not seem to be any support from "natural" languages for the
> > >little-endian approach.
> > What about Danish:     fem og halvfirsindtyve (75 (my Danish is rusty))
> > Or norwegian:	       en og femti (51). This fooled me once into believing
> > 		       one could rent a room in Paris for Fr 1.50... :-)
> > Or better yet, German: Zwei und Vierzig  (42!)
> 
> Ah, but consider the German for 1988: neunzehn hundert acht und achtzig
> (nine-and-ten hundred eight and eighty).  Middle-endian.  AHA!  Germans
> are PDP-11s!
> -- 
> John Woods, Charles River Data Systems, Framingham MA, (508) 626-1101

I used to work for a German company, and you haven't seen confusion
until you've seen a bunch of German engineers trying to say "68000"
in English, and it keeps coming out "86000", for exactly that reason.

It's an understandable mistake, and we rather got used to it after
a while.
-- 
Clayton E. Cramer
{pyramid,pixar,tekbspa}!optilink!cramer
Disclaimer?  You must be kidding!  No company would hold opinions like mine!