[sci.lang] accented alphabets and computers

srg@quick.COM (Spencer Garrett) (09/20/87)

I'm the one who posted the remark about English and Russian being the only
two languages written using unaccented alphabets.  Perhaps I should have
made the distinction more clear.  By "unaccented" I meant "written without
overstrikes" and "sorted one letter at a time in a fixed order".  These
characteristics make dealing with computers a darn sight easier, and thus
represent advantages in this context.  By this metric modern English and
Russian qualify.  We use accents in English only to write foreign words
which use modified Roman alphabets.  We transliterate other languages
because a) few people would even be able to pronounce the words, much less
understand them and b) we often don't have the facilities to render other
writing schemes.  The Russian letters "yo" and "e kratkoe" are not accented
letters by this definition.  They have their own keys on typewriters, their
own place in the collating sequence, and presumably their own values in
whatever character code Russian computers use (RSCII ?).  The only exceptions
which have come to my attention are minor languages which use the English
alphabet, presumably because they were codified by English-speaking
missionaries and such.  I don't mean to imply by all this that I think
other languages are inferior and should be changed or forgotten, but I do
think this observation helps explain why English is so often used to talk
to computers even in non-English-speaking countries.  With the advent of
configurable keyboards and bitmapped screens (I know a Russian major who
does all her papers on a Macintosh) we could conceivably "fix" this problem
for some languages (e.g. the Scandinavian group) by giving the accented
letters their own keys and character codes, at the cost of a proliferation
of standards.  The cost of implementing a "superset" standard is unfortunately
unacceptable.  (Imagine merely doubling the size of all the "text" files
on your system!)  I don't think there will ever be a *solution* to this
problem, just various ways of dealing with its existence.
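
The "doubling" estimate can be checked with a few lines of Python. This is only a sketch under stated assumptions: the two codecs are modern stand-ins that did not exist when this was posted (`latin-1` for a one-byte-per-character code, `utf-16-le` for a fixed two-byte "superset" code), and the sample string is made up.

```python
# Stand-ins (assumptions, not period standards):
#   latin-1   = one byte per character
#   utf-16-le = two bytes per character for these letters
sample = "plain text file contents"

one_byte = sample.encode("latin-1")
two_byte = sample.encode("utf-16-le")

# Fractional growth in storage when every character takes two bytes.
growth = (len(two_byte) - len(one_byte)) / len(one_byte)
print(len(one_byte), len(two_byte), f"{growth:.0%}")
```

For text that fits the one-byte code, the fixed two-byte form is exactly twice as large, which is the cost being objected to here.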

daveb@geac.UUCP (Brown) (09/21/87)

In article <120@quick.COM> srg@quick.COM (Spencer Garrett) writes:
>...[reasoned discussion of collating sequences of languages].
>  The cost of implementing a "superset" standard is unfortunately
>unacceptable.  (Imagine merely doubling the size of all the "text" files
>on your system!)  I don't think there will ever be a *solution* to this
>problem, just various ways of dealing with its existence.

  On the other hand, I do think there will be a solution.  Why? Because
a 100% increase in disk space *has* been accepted by even
penny-pinching commercial customers.... Don't forget, all of the old
mainframes used 4-bit BCD at one time or another, and are now using
6-bit, 7-bit ascii and 8-bit ebcdic.
  --dave

-- 
 David Collier-Brown.                 {mnetor|yetti|utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind)
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.

sommar@enea.UUCP (Erland Sommarskog) (09/22/87)

srg@quick.COM (Spencer Garrett) writes:
>I'm the one who posted the remark about English and Russian being the only
>two languages written using unaccented alphabets.  Perhaps I should have
>made the distinction more clear.  By "unaccented" I meant "written without
>overstrikes" and "sorted one letter at a time in a fixed order".  These
>characteristics make dealing with computers a darn sight easier, and thus
>represent advantages in this context.  By this metric modern English and
>Russian qualify.  

Don't you get the feeling that we are just going round and round in this
discussion? Since Mr. Garrett seems to have missed a lot of the discussion,
I will have to repeat. English is by no means unique among the latin-written
languages. Swedish, Danish, Finnish, Norwegian and I guess also Dutch
qualify as well. 

>I don't mean to imply by all this that I think
>other languages are inferior and should be changed or forgotten, but I do
>think this observation helps explain why English is so often used to talk
>to computers even in non-English-speaking countries.  

That has nothing to do with simplicity. It is just a matter of the dominant
culture. If French had been the leading language, computers would
have been able to handle accents decently today. (Et ils auraient parl'e
le fran,cais.)

>The cost of implementing a "superset" standard is unfortunately
>unacceptable.  (Imagine merely doubling the size of all the "text" files
>on your system!)  

And stupid me, thinking we had computers to make things easier and better.
Forever, I will do better with a simple pen or a typewriter. :-)
Of course we can afford a better character standard! We have computers
to serve and help us, not to rule and limit us!


-- 

Erland Sommarskog       
ENEA Data, Stockholm    
sommar@enea.UUCP        

gadfly@ihlpa.ATT.COM (Gadfly) (09/22/87)

--
> I'm the one who posted the remark about English and Russian being the only
> two languages written using unaccented alphabets.  Perhaps I should have
> made the distinction more clear.  By "unaccented" I meant "written without
> overstrikes" and "sorted one letter at a time in a fixed order"...

Danish (and Norwegian, which uses the same alphabet) also qualify.
The extra (w/r/t English) vowels are unique single letters (sort order
following "z").  The "a"-with-a-circle-over-it (not an overstrike any
more than the dot over a Roman "i") is a fairly modern creation that
used to be written "aa" (same pronunciation, close to English sound
"aw").  This changed the sort order of words beginning with that letter
quite dramatically ("aa"s come first; "a"-with-thingie comes last).
So, people whose names began with "aa" got to choose whether they would
modernize the spelling.
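
The effect on sort order can be sketched in Python, assuming a 29-letter alphabet with the three extra vowels placed after "z" as described above. The names are made up, and the `aa_is_a_ring` switch is a hypothetical option for illustration, not a statement of any official collation rule.

```python
# Danish/Norwegian alphabet: 26 Roman letters, then the three extra vowels.
ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

def sort_key(word, aa_is_a_ring=False):
    """Rank a word one letter at a time in the fixed alphabet order."""
    w = word.lower()
    if aa_is_a_ring:
        w = w.replace("aa", "å")  # treat the old digraph as the modern letter
    return [RANK[ch] for ch in w]

names = ["Zimmer", "Aagaard", "Berg", "Ågård", "Østby"]  # made-up sample

# "aa" spelled as two letters "a": the name sorts at the very front.
print(sorted(names, key=sort_key))
# "aa" treated as the a-with-circle: the same name moves to the very end.
print(sorted(names, key=lambda n: sort_key(n, aa_is_a_ring=True)))
```

Each letter still has one fixed rank, so the comparison stays "one letter at a time"; only the spelling of the name decides whether it lands first or last.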

                      *** ***
J'EN AI RAS-LE-BOL  ***** *****
                   ****** ******  22 Sep 87 [6ieme Jour Sans-culottide An CXCV]
ken perlow         *****   *****
(312)979-8042       ** ** ** **
ihnp4!ihlpa!gadfly    *** ***

franka@mmintl.UUCP (09/26/87)

In article <1430@geac.UUCP> daveb@geac.UUCP (Dave Collier-Brown) writes:
>a 100% increase in disk space *has* been accepted by even
>penny-pinching commercial customers.... Don't forget, all of the old
>mainframes used 4-bit BCD at one time or another, and are now using
>6-bit, 7-bit ascii and 8-bit ebcdic.

Sorry, but 4-bit BCD uses 4 bits to store a digit, and 2 digits to store a
character: voila, 8 bit characters.

There *may* have been some machine, some time, which used 5 bit characters;
but I doubt it.  5 bits is not enough to store an upper-case only alphabet
and 10 digits.  (5 bit codes have been used in specialized data structures,
where the data is known to be mono-case alphabetic.)  But basically, 6 bits
is the minimum character size any computer has used; and 8 bit characters go
all the way back.
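
The counting argument behind "5 bits is not enough" can be written out as a short Python sketch. The helper name `bits_needed` is made up for illustration.

```python
def bits_needed(symbols):
    """Smallest fixed code width (in bits) that can tell `symbols` values apart."""
    return max(1, (symbols - 1).bit_length())

upper_and_digits = 26 + 10  # upper-case alphabet plus ten digits

# 5 bits give 2**5 = 32 codes, fewer than the 36 symbols required.
print(2 ** 5, upper_and_digits, bits_needed(upper_and_digits))
# A mono-case alphabet alone does fit in 5 bits.
print(bits_needed(26))
```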
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

daveb@geac.UUCP (Brown) (09/28/87)

In article <2418@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>In article <1430@geac.UUCP> daveb@geac.UUCP (Dave Collier-Brown) writes:
>>a 100% increase in disk space *has* been accepted by even
>>penny-pinching commercial customers.... Don't forget, all of the old
>>mainframes used 4-bit BCD at one time or another, and are now using
>>6-bit, 7-bit ascii and 8-bit ebcdic.
>
>Sorry, but 4-bit BCD uses 4 bits to store a digit, and 2 digits to store a
>character: voila, 8 bit characters.
>

  Oh dear, I think I had a brain-fault.
  When I posted that I was thinking of excess-three, which only requires
5 bits.  Not bcd, which requires 6 for the alphabetic extension.
  Therefore, the first line above should be "a 33% increase in
disk...", unless you are on a 36-bit Honeywell, where it's a 50%
increase (4 bytes/word, down from 6).
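
The percentages in this exchange follow from simple ratios, sketched here in Python. The helper name `growth` is made up; the 36-bit case assumes a word holds either six 6-bit characters or four 9-bit bytes.

```python
def growth(old_bits, new_bits):
    """Fractional increase in storage per character."""
    return (new_bits - old_bits) / old_bits

print(f"{growth(4, 8):.0%}")   # the originally claimed step: 4 bits to 8 bits
print(f"{growth(6, 8):.0%}")   # the corrected step: 6 bits to 8 bits

word = 36                      # bits per word on the 36-bit machine
print(f"{growth(word // 6, word // 4):.0%}")  # 6 chars/word down to 4 chars/word
```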


zentrale@rmi.UUCP (RMI Net) (10/17/87)

In article <120@quick.COM> srg@quick.COM (Spencer Garrett) writes:
: I'm the one who posted the remark about English and Russian being the only
: two languages written using unaccented alphabets.  Perhaps I should have
: made the distinction more clear.  By "unaccented" I meant "written without
: overstrikes" and "sorted one letter at a time in a fixed order".  ...
: ...         We use accents in English only to write foreign words
: which use modified Roman alphabets.  We transliterate other languages
: because a) few people would even be able to pronounce the words, much less
: understand them and b) we often don't have the facilities to render other
: writing schemes.  The Russian letters "yo" and "e kratkoe" are not accented
: letters by this definition.  They have their own keys on typewriters, their
: own place in the collating sequence, and presumably their own values in
: whatever character code Russian computers use (RSCII ?).  ...

Using this definition, German also is unaccented. The "Umlaut" characters
have their own keys and are not produced by overstriking. They also have
their own ASCII values (different ones in the normal and the IBM world).
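
One possible reading of "different ones in the normal and the IBM world" is the German national variant of 7-bit ASCII (DIN 66003) versus the IBM PC code page 437. That reading is an assumption, not something stated in the post. In the Python sketch below the DIN table is typed in by hand, while `cp437` is a codec that ships with Python.

```python
# Assumed reading of "normal": DIN 66003, which reuses seven ASCII
# punctuation positions for German letters (table typed in by hand).
DIN_66003 = {"Ä": 0x5B, "Ö": 0x5C, "Ü": 0x5D,
             "ä": 0x7B, "ö": 0x7C, "ü": 0x7D, "ß": 0x7E}

for letter, din_code in DIN_66003.items():
    ibm_code = letter.encode("cp437")[0]  # IBM PC code page 437
    print(letter, hex(din_code), hex(ibm_code))
```

Either way each letter is a single code value, which is all the "unaccented" definition asks for; the two worlds simply disagree about which value.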

-rm
*****************************************************************
* addresses:  uucp   rmohr@rmi.uucp       rmohr@unido.bitnet    *
*             bix    rmiaachen            Btx    024121144-0001 *
*             cis    72446,415                                  *
*****************************************************************

zentrale@rmi.UUCP (RMI Net) (10/17/87)

In article <2418@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:

: There *may* have been some machine, some time, which used 5 bit characters;
: but I doubt it.  5 bits is not enough to store an upper-case only alphabet
: and 10 digits.  (5 bit codes have been used in specialized data structures,
: where the data is known to be mono-case alphabetic.)  But basically, 6 bits
: is the minimum character size any computer has used; and 8 bit characters go
: all the way back.

Don't forget the Telex (Baudot) code, which most of our "modern" Telex
computers have to deal with ... (5 bits, with case switching between upper
and lower).
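
How a 5-bit code can carry more than 32 symbols by switching cases can be sketched in Python. The code values below are a toy assignment chosen for illustration, not the real Baudot table, and the two "cases" are modelled as a letters table and a figures table.

```python
# Toy 5-bit code: values are illustrative, NOT the real Baudot/ITA2 table.
LTRS, FIGS = 31, 27          # shift codes that select the active table
LETTERS = {ch: i + 1 for i, ch in enumerate("ABCDEFGHIJKLMNOPQRSTUVWXYZ")}
FIGURES = {ch: i + 1 for i, ch in enumerate("0123456789")}

def encode(text):
    """Encode text as 5-bit values, emitting a shift code on each table change."""
    out, table = [], None
    for ch in text.upper():
        wanted = LETTERS if ch in LETTERS else FIGURES
        if wanted is not table:
            out.append(LTRS if wanted is LETTERS else FIGS)
            table = wanted
        out.append(wanted[ch])   # KeyError for characters outside the toy tables
    return out

codes = encode("TELEX87")
print(codes, all(c < 32 for c in codes))
```

The same 5-bit value means a letter or a figure depending on the last shift code seen, so the receiver has to track state, which is part of why such codes are awkward for sorting and searching.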

 Rupert Mohr