srg@quick.COM (Spencer Garrett) (09/20/87)
I'm the one who posted the remark about English and Russian being the only two languages written using unaccented alphabets. Perhaps I should have made the distinction more clear. By "unaccented" I meant "written without overstrikes" and "sorted one letter at a time in a fixed order". These characteristics make dealing with computers a darn sight easier, and thus represent advantages in this context. By this metric modern English and Russian qualify.

We use accents in English only to write foreign words which use modified Roman alphabets. We transliterate other languages because a) few people would even be able to pronounce the words, much less understand them and b) we often don't have the facilities to render other writing schemes. The Russian letters "yo" and "e kratkoe" are not accented letters by this definition. They have their own keys on typewriters, their own place in the collating sequence, and presumably their own values in whatever character code Russian computers use (RSCII ?).

The only exceptions which have come to my attention are minor languages which use the English alphabet, presumably because they were codified by English-speaking missionaries and such.

I don't mean to imply by all this that I think other languages are inferior and should be changed or forgotten, but I do think this observation helps explain why English is so often used to talk to computers even in non-English-speaking countries.

With the advent of configurable keyboards and bitmapped screens (I know a Russian major who does all her papers on a Macintosh) we could conceivably "fix" this problem for some languages (e.g. the Scandinavian group) by giving the accented letters their own keys and character codes, at the cost of a proliferation of standards. The cost of implementing a "superset" standard is unfortunately unacceptable. (Imagine merely doubling the size of all the "text" files on your system!)
I don't think there will ever be a *solution* to this problem, just various ways of dealing with its existence.
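The two technical points in the post above (the storage cost of a wider character code, and "sorted one letter at a time in a fixed order") can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the "utf-16-le" codec stands in for a hypothetical fixed two-byte "superset" code, which the post does not name, and the sample text and word list are made up.

```python
# Assumption: "utf-16-le" stands in for a hypothetical fixed two-byte code;
# no such standard is named in the post.
sample = "The quick brown fox jumps over the lazy dog.\n" * 100

one_byte = sample.encode("ascii")      # one byte per character
two_byte = sample.encode("utf-16-le")  # two bytes per character for this text

growth = len(two_byte) / len(one_byte)
print(f"size ratio: {growth:.1f}")

# With a single-case, unaccented alphabet, comparing character codes
# one position at a time already yields dictionary order.
words = ["pear", "fig", "apple"]
print(sorted(words))
```

The size ratio is exactly 2 here only because every character in the sample fits in one byte to begin with; that is the "merely doubling" case the post has in mind.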
daveb@geac.UUCP (Brown) (09/21/87)
In article <120@quick.COM> srg@quick.COM (Spencer Garrett) writes:
>...[reasoned discussion of collating sequences of languages].
> The cost of implementing a "superset" standard is unfortunately
>unacceptable. (Imagine merely doubling the size of all the "text" files
>on your system!) I don't think there will ever be a *solution* to this
>problem, just various ways of dealing with its existence.

On the other hand, I do think there will be a solution. Why? Because a 100% increase in disk space *has* been accepted by even penny-pinching commercial customers.... Don't forget, all of the old mainframes used 4-bit BCD at one time or another, and are now using 6-bit, 7-bit ascii and 8-bit ebcdic.

--dave
--
David Collier-Brown. {mnetor|yetti|utgpu}!geac!daveb
Geac Computers International Inc., | Computer Science loses its
350 Steelcase Road,Markham, Ontario, | memory (if not its mind)
CANADA, L3R 1B3 (416) 475-0525 x3279 | every 6 months.
sommar@enea.UUCP (Erland Sommarskog) (09/22/87)
srg@quick.COM (Spencer Garrett) writes:
>I'm the one who posted the remark about English and Russian being the only
>two languages written using unaccented alphabets. Perhaps I should have
>made the distinction more clear. By "unaccented" I meant "written without
>overstrikes" and "sorted one letter at a time in a fixed order". These
>characteristics make dealing with computers a darn sight easier, and thus
>represent advantages in this context. By this metric modern English and
>Russian qualify.

Don't you get the feeling that we are just going round and round in this discussion? Since Mr. Garrett seems to have missed a lot of the discussion, I will have to repeat. English is by no means unique among the Latin-written languages. Swedish, Danish, Finnish, Norwegian and I guess also Dutch qualify as well.

>I don't mean to imply by all this that I think
>other languages are inferior and should be changed or forgotten, but I do
>think this observation helps explain why English is so often used to talk
>to computers even in non-English-speaking countries.

That has nothing to do with simplicity. It is just a matter of dominating culture. If French had been the leading language, computers would have been able to handle accents decently today. (Et, ils auraient parl'e le fran,cais.)

>The cost of implementing a "superset" standard is unfortunately
>unacceptable. (Imagine merely doubling the size of all the "text" files
>on your system!)

And stupid me, thinking we had computers to make things easier and better. Forever, I will do better with a simple pen or a typewriter. :-)

Of course we can afford a better character standard! We have computers to serve and help us, not rule and delimit us!
--
Erland Sommarskog
ENEA Data, Stockholm
sommar@enea.UUCP
gadfly@ihlpa.ATT.COM (Gadfly) (09/22/87)
--
> I'm the one who posted the remark about English and Russian being the only
> two languages written using unaccented alphabets. Perhaps I should have
> made the distinction more clear. By "unaccented" I meant "written without
> overstrikes" and "sorted one letter at a time in a fixed order"...

Danish (and Norwegian, which uses the same alphabet) also qualify. The extra (w/r/t English) vowels are unique single letters (sort order following "z"). The "a"-with-a-circle-over-it (not an overstrike any more than the dot over a Roman "i") is a fairly modern creation that used to be written "aa" (same pronunciation, close to English sound "aw"). This changed the sort order of words beginning with that letter quite dramatically ("aa"s come first; "a"-with-thingie comes last). So, people whose names began with "aa" got to choose whether they would modernize the spelling.

*** *** J'EN AI RAS-LE-BOL
***** *****
****** ****** 22 Sep 87 [6ieme Jour Sans-culottide An CXCV]
ken perlow ***** *****
(312)979-8042 ** ** ** **
ihnp4!ihlpa!gadfly *** ***
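The sort-order jump described above ("aa" first, "a"-with-circle last) can be reproduced with a per-letter sort key. This is a minimal sketch, not a real collation routine: it assumes lowercase input, assumes the three extra vowels follow "z" in the order ae-ligature, o-slash, a-ring, and uses made-up sample words. The name danish_key is hypothetical.

```python
# Assumption: lowercase words only; the letters after "z" are taken to be
# ae-ligature, o-slash, a-ring, in that order.
DANISH = "abcdefghijklmnopqrstuvwxyzæøå"

def danish_key(word):
    """Map each letter to its position in the alphabet string above."""
    return [DANISH.index(ch) for ch in word]

names = ["ålborg", "odense", "aalborg", "zebra"]

# Old spelling "aa" sorts at the front, new spelling at the very end.
print(sorted(names, key=danish_key))

# Sorting by raw character code puts the extra vowels in a different order,
# which is why each one needs its own place in the collating sequence.
print(sorted(["ø", "å", "z", "æ"]))
print(sorted(["ø", "å", "z", "æ"], key=danish_key))
```

Each letter is still compared one at a time in a fixed order, which is the property the quoted definition asks for; only the order table differs from the English one.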
franka@mmintl.UUCP (09/26/87)
In article <1430@geac.UUCP> daveb@geac.UUCP (Dave Collier-Brown) writes:
>a 100% increase in disk space *has* been accepted by even
>penny-pinching commercial customers.... Don't forget, all of the old
>mainframes used 4-bit BCD at one time or another, and are now using
>6-bit, 7-bit ascii and 8-bit ebcdic.

Sorry, but 4-bit BCD uses 4 bits to store a digit, and 2 digits to store a character: voila, 8 bit characters.

There *may* have been some machine, some time, which used 5 bit characters; but I doubt it. 5 bits is not enough to store an upper-case only alphabet and 10 digits. (5 bit codes have been used in specialized data structures, where the data is known to be mono-case alphabetic.) But basically, 6 bits is the minimum character size any computer has used; and 8 bit characters go all the way back.
--

Frank Adams ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate 52 Oakland Ave North E. Hartford, CT 06108
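The counting argument behind "5 bits is not enough" can be checked directly: 26 upper-case letters plus 10 digits is 36 symbols, and 5 bits give only 32 distinct codes. A small sketch, where bits_needed is a hypothetical helper name and the symbol counts are the only inputs:

```python
def bits_needed(symbols):
    """Smallest number of bits that gives each symbol its own code."""
    return max(1, (symbols - 1).bit_length())

upper_only = 26
upper_and_digits = 26 + 10

print(bits_needed(upper_only))        # mono-case alphabetic data
print(bits_needed(upper_and_digits))  # upper-case letters plus digits
```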
daveb@geac.UUCP (Brown) (09/28/87)
In article <2418@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>In article <1430@geac.UUCP> daveb@geac.UUCP (Dave Collier-Brown) writes:
>>a 100% increase in disk space *has* been accepted by even
>>penny-pinching commercial customers.... Don't forget, all of the old
>>mainframes used 4-bit BCD at one time or another, and are now using
>>6-bit, 7-bit ascii and 8-bit ebcdic.
>
>Sorry, but 4-bit BCD uses 4 bits to store a digit, and 2 digits to store a
>character: voila, 8 bit characters.
>

Oh dear, I think I had a brain-fault. When I posted that I was thinking of excess-three, which only requires 5 bits. Not bcd, which requires 6 for the alphabetic extension.

Therefore, the first line above should be "a 33% increase in disk...", unless you are on a 36-bit Honeywell, where it's a 50% increase (4 bytes/word, down from 6).
--
David Collier-Brown. {mnetor|yetti|utgpu}!geac!daveb
Geac Computers International Inc., | Computer Science loses its
350 Steelcase Road,Markham, Ontario, | memory (if not its mind)
CANADA, L3R 1B3 (416) 475-0525 x3279 | every 6 months.
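The corrected percentages follow from the storage used per character. The sketch below works them out with exact fractions; the helper name growth is hypothetical, and the Honeywell case assumes a 36-bit word holding either six or four characters, as stated in the post.

```python
from fractions import Fraction

def growth(old_bits, new_bits):
    """Relative increase in storage when a character grows from old to new bits."""
    return Fraction(new_bits - old_bits, old_bits)

# 6-bit characters replaced by 8-bit characters.
six_to_eight = growth(6, 8)

# 36-bit word: six characters per word (6 bits each) down to four (9 bits each).
word_bits = 36
honeywell = growth(word_bits // 6, word_bits // 4)

# The originally posted figure corresponds to 4 bits growing to 8.
four_to_eight = growth(4, 8)

for label, g in [("6 -> 8", six_to_eight),
                 ("36-bit word", honeywell),
                 ("4 -> 8", four_to_eight)]:
    print(f"{label}: {float(g) * 100:.0f}% increase")
```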
zentrale@rmi.UUCP (RMI Net) (10/17/87)
In article <120@quick.COM> srg@quick.COM (Spencer Garrett) writes:
: I'm the one who posted the remark about English and Russian being the only
: two languages written using unaccented alphabets. Perhaps I should have
: made the distinction more clear. By "unaccented" I meant "written without
: overstrikes" and "sorted one letter at a time in a fixed order". ...
: ... We use accents in English only to write foreign words
: which use modified Roman alphabets. We transliterate other languages
: because a) few people would even be able to pronounce the words, much less
: understand them and b) we often don't have the facilities to render other
: writing schemes. The Russian letters "yo" and "e kratkoe" are not accented
: letters by this definition. They have their own keys on typewriters, their
: own place in the collating sequence, and presumably their own values in
: whatever character code Russian computers use (RSCII ?). ...
Using this definition, German also is unaccented. The "Umlaut" characters
have their own keys and are not produced by overstriking. They also have
their own ASCII values (different ones in the normal and the IBM world).
-rm
*****************************************************************
* addresses: uucp rmohr@rmi.uucp rmohr@unido.bitnet *
* bix rmiaachen Btx 024121144-0001 *
* cis 72446,415 *
*****************************************************************
zentrale@rmi.UUCP (RMI Net) (10/17/87)
In article <2418@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
: There *may* have been some machine, some time, which used 5 bit characters;
: but I doubt it. 5 bits is not enough to store an upper-case only alphabet
: and 10 digits. (5 bit codes have been used in specialized data structures,
: where the data is known to be mono-case alphabetic.) But basically, 6 bits
: is the minimum character size any computer has used; and 8 bit characters go
: all the way back.
: --
:
: Frank Adams ihnp4!philabs!pwa-b!mmintl!franka
: Ashton-Tate 52 Oakland Ave North E. Hartford, CT 06108
Don't forget the TELEX (Baudot) code, which most of our "modern" Telex
computers have to deal with ... (5 bits, with case switching between upper and lower)
Rupert Mohr
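The "5 bits plus case switching" idea can be sketched as a decoder that keeps a current shift state and looks each code up in one of two tables. Everything numeric here is an assumption for illustration: the shift code values and the tiny tables are made up and are not the real Telex code assignments, and the two states are called letters and figures rather than upper and lower.

```python
# Hypothetical 5-bit code values, chosen only for this sketch.
LTRS = 31  # switch to the letters table
FIGS = 27  # switch to the figures table

# Toy tables: the same 5-bit code means different things in each state.
LETTERS = {1: "A", 2: "B", 3: "C", 4: " "}
FIGURES = {1: "1", 2: "2", 3: "3", 4: " "}

def decode(codes):
    """Decode a sequence of 5-bit codes, tracking the current shift state."""
    table = LETTERS  # assume the line starts in the letters state
    out = []
    for code in codes:
        if code == LTRS:
            table = LETTERS
        elif code == FIGS:
            table = FIGURES
        else:
            out.append(table[code])
    return "".join(out)

message = [1, 2, FIGS, 1, 2, LTRS, 3]
print(decode(message))
```

With two codes reserved for the shifts, the remaining 30 codes can each carry two meanings, which is how a 5-bit code covers letters and digits at the cost of extra shift characters in the stream.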