eliot@chutney.rtp.dg.com (Topher Eliot) (05/08/91)
There has been some mention of the "problem" of endianism in code sets, in particular in Unicode. This seems to me to be completely avoidable, if one only gives it a little thought.

Endianism problems arise when one is dealing with an item of data that is larger than the smallest addressable item (e.g. a 16-bit quantity on a machine with addressable bytes), and is moreover compelled to decide both:

1) Which end comes "first" (e.g. for transmittal over a serial communication channel)
2) Which end has greater significance (e.g. when using the bits as an integer).

With code sets, we are not required to do (2). Just because something is 16 bits long doesn't mean that it is a 16-bit integer. We no more have to agree on which of the bytes is more significant than we have to agree on which bits are the mantissa and which are the exponent. We do have to keep straight which byte comes _first_, but I can't see any problem in that.

We are very used to thinking that, for example, "0A59" represents an integral value, with 00001010 in the high-order bits and 01011001 in the low-order bits. We just need to learn to think of (and process) "U+0A59" as representing the character with 00001010 in the _first_ byte and 01011001 in the second. This may represent a minor inconvenience for little-endian systems, where a text string representation like "\x0A59" would have to be parsed differently from the integer value 0xA59. I say "minor" because I'm not doing the work :-)

Of course, some implementations may choose to _treat_ a character as an integer, for indexing into a table or whatever. Such implementations probably will not be portable to architectures of the opposite endianness without appropriate provisions for the byte reversal. Also, I've never seen the Unicode standard, and maybe the authors there did something foolish that committed them to treating one byte of each character as being more significant than the other.
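
To make the distinction concrete, here is a minimal sketch in Python (my choice of language for illustration; the function names `code_to_bytes` and `as_native_integer` are hypothetical and not taken from any standard). It reads the written notation "U+0A59" straight into a first byte and a second byte, with no integer in between, and then shows that the two bytes only become order-sensitive once something decides to treat them as an integer.

```python
def code_to_bytes(notation: str) -> bytes:
    """Read a written code such as "U+0A59" as two bytes, first byte first.

    No integer value is ever formed, so no significance is assigned
    to either byte; only their order is recorded.
    """
    if notation.startswith("U+"):
        notation = notation[2:]
    code_bytes = bytes.fromhex(notation)
    if len(code_bytes) != 2:
        raise ValueError("expected exactly four hex digits")
    return code_bytes


def as_native_integer(code_bytes: bytes, byteorder: str) -> int:
    """Treat the two bytes as an integer, as a table-indexing
    implementation might. The result depends on `byteorder`
    ("big" or "little"), which is where portability is lost."""
    return int.from_bytes(code_bytes, byteorder)


code = code_to_bytes("U+0A59")
first, second = code[0], code[1]   # 0x0A, then 0x59, on any machine

# Only the integer interpretation differs between architectures.
big = as_native_integer(code, "big")        # 0x0A59
little = as_native_integer(code, "little")  # 0x590A
```

As long as the code stays a pair of bytes, the result is the same everywhere; the endian-dependent step is confined to `as_native_integer`, which is the one place a port to a differently-ended machine would need the reversal.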
Well, I managed to stir things up with my posting about message numbers. Is this one controversial, too?

--
Topher Eliot
Data General DG/UX Internationalization
(919) 248-6371
62 T. W. Alexander Dr., Research Triangle Park, NC 27709
eliot@dg-rtp.dg.com    {backbone}!mcnc!rti!dg-rtp!eliot
Obviously, I speak for myself, not for DG.