daveb@geac.UUCP (Dave Brown) (07/23/87)
In article <463@unisoft.UUCP> greywolf@unisoft.UUCP (The Grey Wolf @ ext 165) writes:
> What is the problem here? I see nothing wrong with eight bits for a
> character. Can you come up with anything better? What's the matter?
> Are escape sequences for special characters too much for you to handle?
> Gimme a break.
>
> Disgusted that this discussion is even *happening*,

Point 1: languages other than English have larger character sets than English. Unless you wish to read transliterated text, the character set used should contain *both* the English and non-English glyphs. Escape sequencing occupies much more space than a large character set, and is understandably unpopular in (for example) Japan.

Point 2: FLAME ON
Your last two sentences are an insult to non-English-speakers.
FLAME OFF

--David (Je ne parle pas Fran{\c}ais, je suis un rosbif) Collier-Brown
--
David (Collier-) Brown.              | Computer Science
Geac Computers International Inc.,   | loses its memory
350 Steelcase Road, Markham, Ontario, | (if not its mind)
CANADA, L3R 1B3 (416) 475-0525 x3279 | every 6 months.
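Editorial illustration of Point 1, using the escape that appears in the signature above. This is only a sketch: it assumes the escaped form is stored as plain ASCII and the direct form in ISO 8859-1 (Latin-1), an 8-bit character set chosen here for the comparison and not named in the post. The variable names are hypothetical.

```python
# The TeX-style escape from the signature: "{\c}" stands in for c-cedilla.
escaped = r"Fran{\c}ais"
# The same word with the glyph present in the character set itself.
direct = "Fran\u00e7ais"

escaped_bytes = escaped.encode("ascii")    # every character is 7-bit ASCII
direct_bytes = direct.encode("latin-1")    # one 8-bit code per glyph

# The escape costs 4 characters where the larger set needs 1.
escape_overhead = len(escaped_bytes) - len(direct_bytes)

print(len(escaped_bytes), len(direct_bytes), escape_overhead)
```

For a single accented letter the overhead is small; the post's complaint is about text in which most glyphs would need an escape.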
dhesi@bsu-cs.UUCP (Rahul Dhesi) (08/01/87)
In article <857@bsu-cs.UUCP> I wrote:
>A byte is therefore exactly 8 bits. No more and no less.

Amidst all the name-calling that followed, the following objection to my statement was faintly discernible: Not all character sets will fit in 8 bits.

This is true, but it does not affect my claim. A byte *is* exactly 8 bits.

First, 8 bits suffice for *most* of the world's languages. Second, even if 8 bits are insufficient to hold a given character set (and this is true for only a few languages), this simply means that tradition must give way, and "character" and "byte" will not be synonymous. (If ANSI is not prepared for this, it's in for a rude shock, in my opinion.)

Consider computer communications. The world's networks deal in 8-bit units. Political reality being what it is, it was considered unwise to call these bytes. They are called octets. What does one do with a machine/character set with 9-bit bytes? Map them to 8-bit bytes and lose some information, or split them with shifting/masking and transmit them as 8-bit units anyway. One then finds things rather awkward. One embraces the 8-bit byte as soon as possible.

Consider the cost-benefit analysis manufacturers must do. Those that want bytes to be other than 8 bits must give up the convenience of using a lot of off-the-shelf parts. Custom hardware is expensive.

Consider simple elegance. With a 9-bit byte, one is either stuck with wasted bits in a 32-bit machine word, or one must use a 36-bit word and end up with wasted bits within machine instructions and within data structures and/or get a nonorthogonal machine architecture. (Aside: Why do we see useless machine instructions such as "jump never, label" and "mov a,a"? Because orthogonality simplifies machine design.) The same goes for any other byte size except 16 bits, in which case we could just as well take a pair of 8-bit bytes and call them by a new name.

Consider devices. The 8-bit byte is a standard unit of information transfer using tape drives. And I have a hunch most disk drives/controllers are designed with 512-bytes-per-sector formatting in mind, which won't neatly fit with any arbitrary byte/word size.

Consider a lot of things, and the 8-bit byte stares you in the face.

And consider that in most cases, if 8 bits are not enough, neither are 9, or 10, or perhaps even 11.

How, then, does one deal with a character set that won't fit in 8 bits? Predictions:

o Such characters will, in the future, occupy two bytes.

o There will be an increasing trend towards using transliterations that will allow unusual character sets to be represented using the Roman alphabet.

o Increasingly, computations will be done using English, even in countries where English is not a major language.

o Special-purpose machines using esoteric sizes of data units will continue to exist but will not replace general-purpose computers, which will continue to be based on the 8-bit byte.
--
Rahul Dhesi         UUCP:  {ihnp4,seismo}!{iuvax,pur-ee}!bsu-cs!dhesi
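Editorial illustration of the "split them with shifting/masking" option and of the word-size arithmetic in the post above. This is a minimal sketch, not anything the poster supplied: the function names `pack_9bit` and `unpack_9bit`, the most-significant-bit-first ordering, and the zero padding of the last octet are all assumptions made for the example.

```python
def pack_9bit(values):
    """Pack 9-bit integers (0..511) into 8-bit octets, MSB first."""
    acc = 0      # bit accumulator
    nbits = 0    # number of not-yet-emitted bits held in acc
    out = bytearray()
    for v in values:
        if not 0 <= v < 512:
            raise ValueError("value does not fit in 9 bits")
        acc = (acc << 9) | v
        nbits += 9
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1          # keep only the leftover bits
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)  # zero-pad the final octet
    return bytes(out)


def unpack_9bit(octets, count):
    """Recover `count` 9-bit integers from octets produced by pack_9bit."""
    acc = 0
    nbits = 0
    values = []
    for b in octets:
        acc = (acc << 8) | b
        nbits += 8
        while nbits >= 9 and len(values) < count:
            nbits -= 9
            values.append((acc >> nbits) & 0x1FF)
        acc &= (1 << nbits) - 1
    return values


sample = [0, 1, 255, 256, 511]
packed = pack_9bit(sample)
restored = unpack_9bit(packed, len(sample))

# Word-size arithmetic from the post:
wasted_in_32 = 32 % 9            # bits left over after three 9-bit bytes
wasted_in_36 = 36 % 9            # a 36-bit word holds four 9-bit bytes exactly
sector_leftover = (512 * 8) % 9  # a 512-octet sector is not a whole number of 9-bit bytes

print(len(packed), restored, wasted_in_32, wasted_in_36, sector_leftover)
```

The receiver has to be told the value count out of band, since the padding bits are otherwise indistinguishable from data; that bookkeeping is part of the awkwardness the post describes.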
elman@sdamos.ling.ucsd.edu (Jeff Elman) (08/01/87)
One of the arguments people have advanced in favor of "bigger bytes" is to accommodate a broader diversity of character sets. 8-bit byters have been accused of being lingua-centric (or worse) by assuming that 8 bits suffice for all characters. Kanji is usually mentioned as a counter-example.

I'm a little confused about this argument. While Kanji are often called "characters", they're not characters in the sense most people probably understand. Kanji are ideograms, and Kanji characters (or character pairs) correspond to what we think of as words. Is the proposal thus that bytes should be capable of transmitting entire words? That hardly seems reasonable.

Or have I missed something?

Jeff Elman
Linguistics/UCSD
elman@amos.ling.ucsd.edu
gwyn@brl-smoke.ARPA (Doug Gwyn ) (08/02/87)
In article <3566@sdcsvax.UCSD.EDU> elman@amos.ling.ucsd.edu (Jeff Elman) writes:
>I'm a little confused about this argument. While Kanji are often
>called "characters", they're not characters in the sense most people
>probably understand. Kanji are ideograms, and Kanji characters (or
>character pairs) correspond to what we think of as words. Is the proposal
>thus that bytes should be capable of transmitting entire words? That
>hardly seems reasonable.

The confusion is introduced by trying to take "character" and "word" too literally. What is necessary computationally is support for handling individual basic textual units, whatever they might be. In English, that includes letters of the alphabet in both upper- and lower-case as well as digits and punctuation and separator symbols. One could include additional formatting controls as well, and for some specialized disciplines such as mathematics a batch of funny-looking squiggly things are also needed. Thus, the desired "character set" contains whatever is necessary so that a sequence of selections from the set can represent the language.

In any case, the point was that a BYTE is NOT in general large enough to encode all requisite basic textual units.
guy%gorodish@Sun.COM (Guy Harris) (08/02/87)
> This is true, but it does not affect my claim. A byte *is* exactly
> 8 bits.
> ...9-bit bytes...
> ... 9-bit byte ...

Gee, if a byte *is* (emphasis yours) exactly 8 bits, why are you talking about 9-bit bytes?

A byte is not exactly 8 bits. Proof by counterexample: the DEC 10s and 20s. Or, for an even better counterexample, namely one that runs UNIX: the Sperroughs Burrivac 1100 series.

If you meant "a byte *should be* exactly 8 bits wherever possible", that's an opinion that can be meaningfully discussed. However, you said "a byte *is* exactly 8 bits", which is not an opinion, but an antifact (i.e., it is stated as if it were a fact not subject to debate, but it is false).

Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
guy@sun.com
mark@ems.MN.ORG (Mark H. Colburn) (08/03/87)
In article <911@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>In article <857@bsu-cs.UUCP> I wrote:
>This is true, but it does not affect my claim. A byte *is* exactly
>8 bits.

ARGH!!!!!!! NO! An 'octet' is exactly eight bits; a byte is whatever size corresponds to the machine on which you are working.

Saying that a byte is 8 bits is like saying a word is *EXACTLY* 16 bits! Do I hear any BOOs out there from the people working on 680X0, or Crays, or Amdahls or ...????

The ASCII character set fits in 8 bits, true, but that does not determine the byte size. The standard transfer data size for most telecommunications protocols uses 8-bit characters, true, but that, again, does not determine it either. And, again, *MOST* machines these days use 8-bit bytes, but that does not mean that all bytes are 8 bits long.

Drop it, you're wrong!
--
Mark H. Colburn          DOMAIN: mark@ems.MN.ORG
EMS/McGraw-Hill            UUCP: ihnp4!meccts!ems!mark
                           AT&T: (612) 829-8200
henry@utzoo.UUCP (Henry Spencer) (08/05/87)
> o Increasingly, computations will be done using English, even in
>   countries where English is not a major language

The French and the Japanese, to name two, will dispute this. And let us not forget that "computations" nowadays are often done by people like secretaries and accountants, who do not appreciate having to learn a foreign language to do them.
--
Support sustained spaceflight: fight | Henry Spencer @ U of Toronto Zoology
the soi-disant "Planetary Society"!  | {allegra,ihnp4,decvax,utai}!utzoo!henry