[comp.arch] 'big endian' and 'little endian' - first usage for computer

GQ.RLG@forsythe.stanford.edu (Dick Guertin) (01/29/89)

I'm surprised that the portability issue of code (in C or any other
language) for things like Data Base Management Systems has not been
raised.  This is where I'm finding a BIG headache when it comes to
the big-endian or little-endian approach to data storage.  Both of
these systems lay out "strings" the same way, from lowest address
to highest address for the English (left-to-right) order of reading.
This makes comparisons of "strings" the same on all machines.  When
such "strings" are written into a Data Base, they are transportable.
But that is not true of multi-byte <binary> values.  When written by
a big-endian machine, the result cannot be read correctly by a
little-endian machine without swapping the bytes...and vice-versa.
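
One common way around this is to fix the byte order of the file rather
than let each machine write its native layout.  A minimal sketch in C,
assuming 32-bit unsigned values and a file format defined as most
significant byte first; the helper names put_u32_be and get_u32_be are
hypothetical, not from any particular library:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v into buf[0..3], most significant byte first.
   Shifts work on the value, not on memory, so the result is the
   same on big-endian and little-endian hosts. */
void put_u32_be(unsigned char *buf, uint32_t v)
{
    assert(buf != NULL);
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Rebuild the value from four bytes stored most significant first. */
uint32_t get_u32_be(const unsigned char *buf)
{
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}
```

A record written with put_u32_be on one machine reads back with
get_u32_be on the other, at the cost of a few shifts per field.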

Another important consideration is the collating sequence of positive
<binary> values written as the keys of records that also can receive
"string" values as keys.  Unless the TYPE of the data is included in
the data, the key search algorithm hasn't got a clue when <binary>
values are intermixed with "strings" on little-endian systems.

In the big-endian systems, the sign-bit of <binary> values can be
flipped before the key is written to the file, and ordinary "string"
compare operations will then place the key into proper collating
sequence.  This is NOT TRUE for little-endian systems.

Therefore, I believe big-endian storage of <binary> data is
superior since retrieval algorithms can be blind to data type.
All data, be it "string" or <binary>, sorts properly on
big-endian systems when the sign-bit is flipped for <binary>
values.  That's something that can easily be done when building
record keys, and then the TYPE of the key doesn't have to be
considered any longer.  On a little-endian system, not only does
the sign-bit need to be flipped, but the data bytes need to be
reversed in the stored record so "typeless" searches can be
performed by the Data Base Record Manager.
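
The key-building step described above might look like the following
sketch.  It assumes 32-bit two's-complement keys and a record manager
that compares keys bytewise as unsigned values (memcmp does this); the
names make_sort_key and compare_keys are hypothetical.  Because the
bytes are produced by shifting, the same code emits the big-endian
layout on either kind of machine:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build a 4-byte key whose bytewise order matches the numeric order
   of value: flip the sign bit, then store most significant byte first. */
void make_sort_key(unsigned char key[4], int32_t value)
{
    uint32_t u;

    assert(key != NULL);
    u = (uint32_t)value ^ 0x80000000u;  /* negatives now sort below positives */
    key[0] = (unsigned char)(u >> 24);
    key[1] = (unsigned char)(u >> 16);
    key[2] = (unsigned char)(u >> 8);
    key[3] = (unsigned char)u;
}

/* "Typeless" comparison: the same call serves string keys and
   binary keys built by make_sort_key. */
int compare_keys(const unsigned char *a, const unsigned char *b, size_t n)
{
    return memcmp(a, b, n);
}
```

With this layout, the keys for -5, 3 and 100 compare in that order
without the comparison routine knowing they are integers.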

A Data Management System needs to be sensitive to big/little
endian machines, and that makes transportability of such systems
a matter of concern to the developers.  Just another headache!

peter@ficc.uu.net (Peter da Silva) (02/01/89)

[deleted discussion about using sign bit on binary data for search
 keys to distinguish them from strings in a big-endian database]

You can't assume all characters in the character set are positive. EBCDIC
and ISO Latin-1 violate this assumption.
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.
Work: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180.   `-_-'
Home: bigtex!texbell!sugar!peter, peter@sugar.uu.net.                 'U`
Opinions may not represent the policies of FICC or the Xenix Support group.

firth@sei.cmu.edu (Robert Firth) (02/01/89)

In article <2951@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>[deleted discussion about using sign bit on binary data for search
> keys to distinguish them from strings in a big-endian database]
>
>You can't assume all characters in the character set are positive. EBCDIC
>and ISO Latin-1 violate this assumption.

Um, I'd like to rephrase that.  The character sets you mention use
encodings between 0 and 255, which surely look positive.  Indeed,
if you ordered them by signed comparison, you'd go seriously wrong.

I think the message is that the binary representation of a character
should always be treated as unsigned; if you apply signed operations
to it, you're wrong.  Should it happen that all the characters in
the set have a leading zero, you're wrong but lucky.
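
A small C sketch of the difference, assuming 8-bit two's-complement
chars; cmp_unsigned and cmp_signed are hypothetical names written out
only to show the two interpretations side by side:

```c
#include <assert.h>
#include <stddef.h>

/* Compare two NUL-terminated strings, treating each byte as 0..255. */
int cmp_unsigned(const char *a, const char *b)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;

    assert(a != NULL && b != NULL);
    while (*ua != 0 && *ua == *ub) {
        ua++;
        ub++;
    }
    return (int)*ua - (int)*ub;
}

/* The same loop with bytes read as signed: codes 128..255 turn
   negative and sort below every 7-bit character. */
int cmp_signed(const char *a, const char *b)
{
    const signed char *sa = (const signed char *)a;
    const signed char *sb = (const signed char *)b;

    assert(a != NULL && b != NULL);
    while (*sa != 0 && *sa == *sb) {
        sa++;
        sb++;
    }
    return (int)*sa - (int)*sb;
}
```

For the byte 0xE9 (e-acute in ISO Latin-1) against "z", cmp_unsigned
reports greater and cmp_signed reports less.  The standard strcmp and
memcmp are specified to compare as unsigned char, so they behave like
the first function.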

GQ.RLG@forsythe.stanford.edu (Dick Guertin) (02/03/89)

In article <2951@ficc.uu.net>,
peter@ficc.uu.net (Peter da Silva) writes:
>[deleted discussion about using sign bit on binary data for search
> keys to distinguish them from strings in a big-endian database]
>
>You can't assume all characters in the character set are positive.
>EBCDIC and ISO Latin-1 violate this assumption.

Search algorithms typically work with "unsigned char", where
every character compares as a value from 0 to 255 and there are
no positive or negative characters.