[comp.std.internat] Change the software or the alphabet?

karl@haddock.ISC.COM (Karl Heuer) (10/21/87)

Several posters have remarked that certain algorithms would be much simpler if
the natural language being processed by them had been designed sensibly.  This
remark usually draws a reply like "The software must change to meet the needs
of the users.  Computers are the servant of Man, not the other way around.  It
is absurd to suggest that a society should change its alphabet.".

It's true that a computer is a tool.  What nobody seems to have noticed is
that *natural language is also a tool*.  The alphabet is the servant of Man,
not the other way around; thus it is appropriate to suggest that it should
evolve to meet Man's changing needs.

I learned from the textbooks that English has certain rules concerning whether
punctuation goes inside or outside of quotes.  As a computer user, I regularly
break these rules and instead apply a more sensible one: the punctuation goes
inside if and only if it is part of the text being quoted.  If the text being
quoted is input to a computer, this can be critical; but I do this even with
straight English.  We who follow this convention are figuratively rewriting
the textbooks.

If it is painful to adapt the software to handle the peculiarities of certain
languages/alphabets (I have in mind Chinese, Japanese, and to a lesser extent
the accented letters of some European languages, and to some extent English),
then it is reasonable to consider the possibility that the language/alphabet
should change instead of the software.  I am not saying that the former *must*
be the one to change, only that it should be considered.  I recognize that
there's a lot of inertia to overcome, but might not the benefits be worth it?

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
Disclaimer: The word "Man" in the above denotes the entire species.
--->  Followup cautiously; this article was cross-posted!  <---

dik@cwi.nl (Dik T. Winter) (10/21/87)

In article <1446@haddock.ISC.COM> karl@haddock.isc.com (Karl Heuer) writes:
 >                                          The alphabet is the servant of Man,
 > not the other way around; thus it is appropriate to suggest that it should
 > evolve to meet Man's changing needs.

 > If it is painful to adapt the software to handle the peculiarities of certain
 > languages/alphabets (I have in mind Chinese, Japanese, and to a lesser extent
 > the accented letters of some European languages, and to some extent English),
 > then it is reasonable to consider the possibility that the language/alphabet
 > should change instead of the software.  I am not saying that the former *must*
 > be the one to change, only that it should be considered.  I recognize that
 > there's a lot of inertia to overcome, but might not the benefits be worth it?

Oh yes, pi is about 3.1103.
I do not understand you ask?
Well it is clear, we use base 8.
Oh, you ask, why not base 16?
Mm, are our computers using different alphabets?
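
The base-8 figure quoted above can be checked with a few lines of Python.
This is only a sketch: the helper name to_base is made up for illustration,
and it truncates rather than rounds.

```python
import math

DIGITS = "0123456789ABCDEF"

def to_base(x, base, places):
    """Truncated expansion of a non-negative float x in a base from 2 to 16."""
    whole = int(x)
    frac = x - whole
    # Convert the integer part by repeated division.
    int_digits = ""
    while True:
        int_digits = DIGITS[whole % base] + int_digits
        whole //= base
        if whole == 0:
            break
    # Convert the fractional part by repeated multiplication.
    frac_digits = ""
    for _ in range(places):
        frac *= base
        digit = int(frac)
        frac_digits += DIGITS[digit]
        frac -= digit
    return int_digits + "." + frac_digits

print(to_base(math.pi, 8, 4))   # 3.1103
print(to_base(math.pi, 16, 4))  # 3.243F
```

The same value has a different spelling in each base, which is the point of
the comparison with alphabets.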
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

gls@odyssey.ATT.COM (g.l.sicherman) (10/22/87)

> It's true that a computer is a tool.  What nobody seems to have noticed is
> that *natural language is also a tool*.  The alphabet is the servant of Man,
> not the other way around; thus it is appropriate to suggest that it should
> evolve to meet Man's changing needs.

What Karl says is true, but it might be well to distinguish the alphabet
from natural language.  Alphabetic writing is highly unnatural.  Historically,
when alphabets were introduced they proved as revolutionary as computers are
proving now.  Remember the story of Cadmus and the dragon's teeth?

> I learned from the textbooks that English has certain rules concerning whether
> punctuation goes inside or outside of quotes.  As a computer user, I regularly
> break these rules and instead apply a more sensible one: ...

Naturally.  Programmers cannot afford to position their "punctuation marks"
wherever they will look best.  Move the semicolon outside the right brace
and you have a syntax error.

With truly "natural" language--that is, speech--the problem does not arise.
As McLuhan says, Shakespeare never heard a grammatical error.  (There were
none!)

---

		"No more `mutiny'!"	--A. Razaf, "Christopher Columbus"
-- 
Col. G. L. Sicherman
...!ihnp4!odyssey!gls

jim@hpiacla.HP.COM (Jim Rogers) (10/23/87)

The concept that "natural languages" are tools has merit.  The concept that
these languages should be standardized to simplify the life of computer
programmers is ludicrous.

Each local language has local customs, history, and even thought patterns
deeply embedded in its fabric.  Every time a local language is replaced by
another language an irreplaceable piece of human creativity and knowledge
is lost.  Language is more than just the way governments control their
populace.  Language is the fundamental basis for all human communication.

The basenote made reference to the "inertia" involved in scrapping all
"natural languages" in favor of a single standard language.  I say that
this is not only impractical but also undesirable.  Would we re-write all
literature for the simple purpose of making life easier for computer
programmers?  When this is done how many different versions of the
"natural language" parsing tools would be built?  I would guess there
would be at least one for every version of every computer language in
use now or in the future.

Why not first invent a standardized computer environment which is used by
all hardware?  This would include operating systems, file systems,
networking, and, of course, computer languages.  After all, only a small
percentage of the residents of this planet are computer programmers.  It
should be much easier to create a standard acceptable to the smaller
group than to create one acceptable to the general population of the
planet.

The answer to that question is obvious.

There are many different computer environments for many reasons.  Most
of those reasons reduce to the fact that a given computer environment
has been designed and developed to meet a specific set of needs.  I
have not yet met the genius who has designed a computer environment which
meets all the critical needs of all users.

Am I saying that standards are impossible?  No.  I am saying that there
are (and must be) a finite set (containing more than one member) of 
standards which will be developed to meet the known needs of the
computing community.  New research and design in areas not covered by
standards, or in methods not accepted by standards will (and must)
continue.  As capabilities, expectations, and needs change the standards
must change.

The only constant is change, and even that happens at varying rates.



Jim Rogers

hmj@tut.fi (Matti J{rvinen) (10/26/87)

In article <1446@haddock.ISC.COM> karl@haddock.isc.com (Karl Heuer) writes:
>                                          The alphabet is the servant of Man,
> not the other way around; thus it is appropriate to suggest that it should
> evolve to meet Man's changing needs.
> If it is painful to adapt the software to handle the peculiarities of certain
> languages/alphabets (I have in mind Chinese, Japanese, and to a lesser extent
> the accented letters of some European languages, and to some extent English),
> then it is reasonable to consider the possibility that the language/alphabet
> should change instead of the software.  I am not saying that the former *must*
> be the one to change, only that it should be considered.  I recognize that
> there's a lot of inertia to overcome, but might not the benefits be worth it?

Is this a joke or are you really stupid enough to be serious?
Those so called "accented" letters are very important in some languages.
How would you change alphabets using them as separate letters?
If { refers to a with dots (umlaut a), I may write two Finnish
words
	valittaa  and
	v{litt{{
having meanings "mourn" and "deliver". So, replacing { with a can not be
done.  Letter e can be after letters a or o, so replacing { with ae
can not be done.
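
The collision described above can be shown in a short Python sketch. It keeps
the post's convention that "{" stands for a-with-dots; the helper names
collisions and drop_dots are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict

def collisions(words, fold):
    """Group words by their folded spelling and keep groups with more than one member."""
    groups = defaultdict(list)
    for word in words:
        groups[fold(word)].append(word)
    return {folded: group for folded, group in groups.items() if len(group) > 1}

def drop_dots(word):
    # Replace "{" (a-with-dots in this thread's encoding) with a plain "a".
    return word.replace("{", "a")

print(collisions(["valittaa", "v{litt{{"], drop_dots))
# {'valittaa': ['valittaa', 'v{litt{{']}
```

Two words with different meanings end up with one spelling, so the folding
cannot be undone without outside knowledge.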

Finnish is written as it is spoken.  Every letter has only one way to
pronounce it.  If you drop letters off, how would you write words
containing those letters?

This all is (partially) true for several languages (e.g. Swedish and German).
KEEP YOUR NASTY FINGERS OUT OF OUR ALPHABET AND FIX YOUR PROGRAMME(R)S!!

-- 
Hannu-Matti Jarvinen, Tampere University of Technology, Finland
Project EAST - European Advanced Software Technology
hmj@tut.fi, hmj@tut.uucp, hmj@tut.funet (tut.ARPA is not the same computer).

john@frog.UUCP (John Woods, Software) (10/27/87)

In article <365@zuring.cwi.nl>, dik@cwi.nl (Dik T. Winter) writes:
>In article <1446@haddock.ISC.COM> karl@haddock.isc.com (Karl Heuer) writes:
>>                                         The alphabet is the servant of Man,
>>not the other way around; thus it is appropriate to suggest that it should
>>evolve to meet Man's changing needs.
> 
> Oh yes, pi is about 3.1103.
> I do not understand you ask?
> ...etc...

Fie on you.  Languages are constantly evolving to meet the needs of those
using them (except, perhaps, for CERTAIN languages with governmental bodies
created to ensure permanent ossification... :-).  English, for instance,
dropped grammar-coding endings many centuries ago, mostly because of the
difficulties people encountered in trying to reconcile differing sets of
endings (thanks to the recent Norse invaders, etc.) (there is a PBS series,
and a corresponding book, "The Story of English", that tells of this and many
more things, in quite an entertaining style).

Some believe that humans walk upright because of evolving to better use tools.
Perhaps you feel this was a mistake, and that sticks should have been designed
to be used while knuckle-walking... :-)

(Note, I don't necessarily feel that alphabets must, or even should, change
because of inadequacies of computers.  It's still an idea worth contemplating,
however.)

--
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, ...!mit-eddie!jfw, jfw@eddie.mit.edu

"Cutting the space budget really restores my faith in humanity.  It
eliminates dreams, goals, and ideals and lets us get straight to the
business of hate, debauchery, and self-annihilation."
		-- Johnny Hart

karl@haddock.ISC.COM (Karl Heuer) (10/27/87)

In article <190001@hpiacla.HP.COM> jim@hpiacla.HP.COM (Jim Rogers) writes:
>The concept that "natural languages" are tools has merit.  The concept that
>these languages should be standardized to simplify the life of computer
>programmers is ludicrous.

Actually, not so much the programmers as the users.  There are a lot more of
the latter.

>Each local language has local customs, history, and even thought patterns
>deeply embedded in its fabric.

I don't think Spanish would be impoverished if "ch" were to be sorted as two
letters instead of one, nor do I think Spanish-speaking people would be losing
a significant part of their cultural heritage if they straightened this out.
(Just to pick one example.  Btw, an equally valid "fix" would be to make it a
single letter, with its own ASCII value and everything.)
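
To make the "ch" example concrete, here is a minimal Python sketch of the two
orderings. It assumes the traditional Spanish ordering in which "ch" follows
"c" and "ll" follows "l", and it handles only lowercase, unaccented words; the
names LETTERS, RANK and traditional_key are invented for this illustration.

```python
# Traditional alphabet with the digraphs "ch" and "ll" as letters of their own.
LETTERS = "a b c ch d e f g h i j k l ll m n ñ o p q r s t u v w x y z".split()
RANK = {letter: position for position, letter in enumerate(LETTERS)}

def traditional_key(word):
    """Sort key that treats 'ch' and 'll' as single letters."""
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ("ch", "ll"):
            key.append(RANK[pair])
            i += 2
        else:
            key.append(RANK[word[i]])
            i += 1
    return key

words = ["chico", "cuna", "dedo", "cosa"]
print(sorted(words))                       # ['chico', 'cosa', 'cuna', 'dedo']
print(sorted(words, key=traditional_key))  # ['cosa', 'cuna', 'chico', 'dedo']
```

Plain character-by-character sorting gives the "two letters" behavior for
free; the "one letter" behavior needs the extra tokenizing step.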

>The basenote made reference to the "inertia" involved in scrapping all
>"natural languages" in favor of a single standard language.

I did *not* suggest a single standard language.  I didn't even ask for a
single standard alphabet, although that would solve a lot of problems.  I
merely suggested that if the lexical warts get in the way, it's possible that
they'll get removed rather than avoided.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

edwards@uwmacc.UUCP (mark edwards) (10/27/87)

In article <1890@frog.UUCP> john@frog.UUCP (John Woods, Software) writes:
>In article <365@zuring.cwi.nl>, dik@cwi.nl (Dik T. Winter) writes:
>>In article <1446@haddock.ISC.COM> karl@haddock.isc.com (Karl Heuer) writes:
>>>                                         The alphabet is the servant of Man,
>>>not the other way around; thus it is appropriate to suggest that it should
>>>evolve to meet Man's changing needs.

>Fie on you.  Languages are constantly evolving to meet the needs of those
>using them (except, perhaps, for CERTAIN languages with governmental bodies
>created to ensure permanent ossification... :-).  
[...]
>(Note, I don't necessarily feel that alphabets must, or even should, change
>because of inadequacies of computers.  It's still an idea worth contemplating,
>however.)

 Well, time for my two cents. There is a history of countries
 changing other countries' alphabets. Consider the countries of
 Indochina. They used to use Chinese characters. After France had its
 way the connection between China and the countries of Indochina
 became historical. Imagine having to learn a different language just
 in order to read your country's history books in their original form.

 Now to take the argument of changing alphabets to meet the needs of
 computers.

 If people think that is a reasonable argument then I suggest we go
 further and change natural language to meet the needs of the computer.
 After all the benefits of having 100% computer understanding of every
 thing we say and write and do is just astronomical. We would have to
 mold our speaking to speak in simple, very distinct, and very unambiguous
 words and sentences. The sentences we would speak would be precise and
 very logical. Each sentence would have a determinable boolean qualifier.
 It would be either true or false; "maybe" causes too many problems.

 Words like cute or pretty, or hot, and cold would disappear because
 what is cute to one person may not be to another. That is what we
 computer people call inconsistent. A computer can not have inconsistent
 rules. Language would become boring, life would become dull.

 Well, unfortunately, or should I say fortunately, computer people will
 not get their way in this one. There are domains where the methods
 must change to meet the needs of the computer. Language is not one
 of these domains. The computer/programs will have to change to
 meet the needs of the humans. And that includes characters in alphabets
 also.

 mark

-- 
    edwards@vms.macc.wisc.edu
    {allegra, ihnp4, seismo}!uwvax!uwmacc!edwards
    UW-Madison, 1210 West Dayton St., Madison WI 53706

joe@haddock.ISC.COM (Joe Chapman) (10/29/87)

Mr Karl Heuer: "The alphabet is the servant of Man, not the other way
around; thus it is appropriate to suggest that it should evolve to
meet Man's changing needs."  Hra Hannu-Matti Jarvinen: "KEEP YOUR
NASTY FINGERS OUT OF OUR ALPHABET AND FIX YOUR PROGRAMME(R)S!!"

We've probably hammered this topic to death (at least in sci.lang) but
it seems to me rather silly to lump languages such as Finnish, which
have a few non-ascii characters, in with languages such as Chinese.  A
proposal to truncate alphabets in the former group strikes me, at the
risk of seeming unpatriotic, as a peculiarly American sort of
chauvinism.  Whether the trend towards simplification of ideographic
languages---for example, official simplified characters and the use of
pinyin in Chinese, and the disuse of non-toyo list kanji in
Japanese---reflects a natural evolution in the language or another
instance of Western-influenced information-processing arrogance is
anyone's guess.

The more interesting assertion to me is the notion that language is
simply another tool which can be altered to suit societal needs, as
opposed to something people and societies find themselves in the midst
of.  Granted, minor changes in the fabric of language can be made by
governments and individuals, but one simply has to wait for the
fundamental process of signification to change.  This is a topic that
can probably only be argued in French; any comments?

--
Joe Chapman
harvard!ima!joe

gls@odyssey.ATT.COM (g.l.sicherman) (10/29/87)

> Finnish is written as it is spoken.  Every letter has only one way to
> pronounce it.  If you drop letters off, how would you write words
> containing those letters?

In English we use combinations like "th" and "ch." It works fine unless
you insist on phonetic spelling.

> This all is (partially) true for several languages (e.g. Swedish and German).
> KEEP YOUR NASTY FINGERS OUT OF OUR ALPHABET AND FIX YOUR PROGRAMME(R)S!!

This seems awfully possessive.  He who steals my alphabet, steals trash.
I can always invent a new one!

By the way, I have yet to see a standard that accommodates the Seuss
postzetals: yuzz, wum, um, humph, ...

---
	"No matter where you go, there you are ... except that
	 when you're on the phone, you're nowhere."

				--Ollaroo MacNoonzai
-- 
Col. G. L. Sicherman
...!ihnp4!odyssey!gls

lisper@yale.UUCP (Bjorn Lisper) (10/31/87)

In article <1924@kuukkeli.tut.fi> hmj@kuukkeli.UUCP (Hannu-Matti J{rvinen) writes:
>In article <1446@haddock.ISC.COM> karl@haddock.isc.com (Karl Heuer) writes:
>>                                          The alphabet is the servant of Man,
>> not the other way around; thus it is appropriate to suggest that it should
>> evolve to meet Man's changing needs.
>> If it is painful to adapt the software to handle the peculiarities of
>> certain 
>> languages/alphabets (I have in mind Chinese, Japanese, and to a lesser
>> extent 
>> the accented letters of some European languages, and to some extent
>> English), 
>> then it is reasonable to consider the possibility that the language/alphabet
>> should change instead of the software.  I am not saying that the former
>> *must* 
>> be the one to change, only that it should be considered.  I recognize that
>> there's a lot of inertia to overcome, but might not the benefits be worth
>> it? 
>
>Is this a joke or are you really stupid enough to be serious?
>Those so called "accented" letters are very important in some languages.
>How would you change alphabets using them as separate letters?
>If { refers to a with dots (umlaut a), I may write two Finnish
>words
>	valittaa  and
>	v{litt{{
>having meanings "mourn" and "deliver". So, replacing { with a can not be
>done.  Letter e can be after letters a or o, so replacing { with ae
>can not be done.
>
>Finnish is written as it is spoken.  Every letter has only one way to
>pronounce it.  If you drop letters off, how would you write words
>containing those letters?
>
>This all is (partially) true for several languages (e.g. Swedish and German).
>KEEP YOUR NASTY FINGERS OUT OF OUR ALPHABET AND FIX YOUR PROGRAMME(R)S!!
>
There are, of course, sometimes unambiguous ways to transcribe "nonstandard"
characters to "standard" characters (with regard to the English character
set). The three special Swedish letters a-with-circle,
a-with-dieresis and o-with-dieresis, for instance, have the transcriptions
aa, ae and oe, respectively. Thus all Swedish words can really be
transcribed to "English" form. Context will then decide whether for instance
"oe" means o-with-dieresis or o followed by e.

But why did the Swedish character set include these extra characters in the
first place? The answer is that Swedish has more vowels than can be
expressed with ordinary Latin characters without resorting to constructions
such as the above, and these "extra" vowels ARE AS IMPORTANT AS THE OTHERS
for the meaning of the words and should not be treated differently; thus
they deserve characters of their own. This is also economical, since these
vowels are frequent in Swedish and "single character codes" for them save
work and space.

Another aspect is that according to the Swedish pronunciation rules "oe"
should really be pronounced as "o" followed by "e", so using it for
o-with-dieresis would clutter the Swedish pronunciation rules with
"unswedish" exceptions.

The alphabet is certainly the servant of Man; in particular, a national
alphabet is the servant of the people of the nation in question.

Bjorn Lisper	(for ignorant anglosaxons)
Bjoern Lisper	(for somewh

larry@sgistl.SGI.COM (Larry Autry) (11/01/87)

In article <348@odyssey.ATT.COM>, gls@odyssey.ATT.COM (g.l.sicherman) writes:
> 
> In English we use combinations like "th" and "ch." It works fine unless
> you insist on phonetic spelling.

When other languages such as Chinese, Japanese, and Polynesian are Anglicized,
they appear to be spelled by rules similar to Spanish pronunciation.  Am I
mistaken?  If more languages, even English, were to adopt at least similar
guidelines, a large gap would close.

-- 
					Larry Autry
larry@sgistl.sgi.com
       or
{ucbvax,sun,ames,pryamid,decwrl}!sgi!sgistl!larry

karl@haddock.UUCP (11/02/87)

In article <1924@kuukkeli.tut.fi> hmj@kuukkeli.UUCP (Hannu-Matti J{rvinen) writes:
>Is this a joke or are you really stupid enough to be serious?

Neither.  Please note that I did not make any specific proposals for changing
any alphabet.  My article can be summarized as "alphabets are not immutable";
it was a rebuttal to previous articles which seemed to implicitly assume the
opposite.

To forestall accusations of American chauvinism, let me concentrate on English
(which my article also mentioned).  English words include two non-letters, 
hyphen ("-") and apostrophe ("'").  Let's look at the latter.

It's been several years since I've seen the word "Halloween" spelled with an
apostrophe.  Many traffic lights say "DONT WALK".  So many people confuse
"its" and "it's" that they might as well be alternate spellings of each other.

Given the above, and the collation problem caused by apostrophe, I would
consider it possible (not necessarily desirable) that American English may
soon drop the use of apostrophe, at least in some contexts.  This would create
some collisions; I would guess that the existing words "cant" and "wont" (but
not "shell") would probably be dropped from the language, just as "quean"
disappeared after the Great Vowel Shift made it a homonym for "queen".
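
The collation problem mentioned above can be sketched in Python. In ASCII the
apostrophe sorts before every letter, so a plain sort separates "can't" from
"cant". The key function below is a hypothetical fix, not a rule taken from
any standard, and the word list is made up for illustration.

```python
def ignore_apostrophes(word):
    # Primary key: the word with apostrophes removed.
    # The original spelling is kept as a tie-breaker.
    return (word.replace("'", ""), word)

words = ["cantor", "can't", "canal", "cant"]
print(sorted(words))                          # ["can't", 'canal', 'cant', 'cantor']
print(sorted(words, key=ignore_apostrophes))  # ['canal', "can't", 'cant', 'cantor']
```

Under this key "can't" and "cant" share a primary key, as do "won't" and
"wont" or "she'll" and "shell", which is exactly the set of collisions that
dropping the apostrophe from the spelling would create.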

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
Followups to sci.lang only.

minow@decvax.UUCP (Martin Minow) (11/08/87)

In article <18306@yale-celray.yale.UUCP> Bjorn Lisper (lisper@yale-celray.UUCP)
suggests that "All Swedish words can really be transcribed to 'English' form"
by replacing a-ring by 'aa', a-dieresis by 'ae', and o-dieresis by 'oe'.

Unfortunately, this will not work properly:

1. Many Finnish words are used in Swedish.  For example, the common Finnish
   name "Paavo" is pronounced with a long-a (as in father) sound, not with
   an 'o' (as in boat).  However, P<aa>ve means "Pope".

2. Sequences of vowels become ambiguous.  For example, Sj<aa>are (dock-worker).
   Many of these sequences arise from Swedish compounding rules.  In
   general, a sequence of vowels will indicate a morpheme boundary.
   Turning these sequences into what appear to be trigraphs will cause
   confusion.

We can see examples of Bjorn's suggestion causing problems in modern Danish.
In 1948, Danish started using a-ring for the previously used 'aa' sequence.
Also, words beginning with a-ring were moved to the back of the alphabet,
*even* if they were written with 'aa.'  Thus, AAlborg (the town) was
alphabetized after 'Z'.  The one exception to this was foreign words,
such as from Finnish, with natural sequences of 'aa'.

Martin Minow
decvax!minow

lisper@yale.UUCP (11/12/87)

In article <182@decvax.UUCP> minow@decvax.UUCP (Martin Minow) writes:
>In article <18306@yale-celray.yale.UUCP> Bjorn Lisper (lisper@yale-celray.UUCP)
>suggests that "All Swedish words can really be transcribed to 'English' form"
>by replacing a-ring by 'aa', a-dieresis by 'ae', and o-dieresis by 'oe'.
>
>Unfortunately, this will not work properly:
>
>1. Many Finnish words are used in Swedish.  For example, the common Finnish
>   name "Paavo" is pronounced with a long-a (as in father) sound, not with
>   an 'o' (as in boat).  However P<aa>ve means "Pope"
>
>2. Sequences of vowels become ambiguous.  For example, Sj<aa>are (dock-worker).
>   Many of these sequences arise from Swedish compounding rules.  In
>   general, a sequence of vowels will indicate a morpheme boundary.
>   Turning these sequences into what appear to be trigraphs will cause
>   confusion.
>
>We can see examples of Bjorn's suggestion causing problems in modern Danish.
>In 1948, Danish started using a-ring for the previously used 'aa' sequence.
>Also, words beginning with a-ring were moved to the back of the alphabet,
>*even* if they were written with 'aa.'  Thus, AAlborg (the town) was
>alphabetized after 'Z'.  The one exception to this was foreign words,
>such as from Finnish, with natural sequences of 'aa'.
>
>Martin Minow
>decvax!minow

Certainly there will be problems. The meanings of aa, ae and oe will be
context-dependent, as I pointed out in my previous posting. This is for
exactly the same reasons as you mention. (Another example: "o-" is the
prefix in Swedish equivalent to the English "un-". Thus the Swedish word for
uneconomical, "oekonomisk", contains the "oe", but it is pronounced as "o"
followed by "e", NOT as o-with-dieresis.)  My proposal was merely rhetorical
and I do not advocate its enforcement.
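
The context dependence can be seen in a small Python sketch. The mapping
follows the aa/ae/oe transcription discussed in this thread; the function
names transcribe and naive_reverse are hypothetical, and the reverse step
deliberately uses no dictionary or context.

```python
TO_ASCII = {"å": "aa", "ä": "ae", "ö": "oe"}

def transcribe(word):
    """Replace each special Swedish letter with its two-letter transcription."""
    return "".join(TO_ASCII.get(ch, ch) for ch in word)

def naive_reverse(word):
    """Turn every aa/ae/oe back into a single letter, ignoring context."""
    for letter, digraph in TO_ASCII.items():
        word = word.replace(digraph, letter)
    return word

print(naive_reverse(transcribe("påve")))        # påve        (round trip works)
print(naive_reverse(transcribe("oekonomisk")))  # ökonomisk   (wrong: "o" + "e" was meant)
print(naive_reverse(transcribe("Paavo")))       # Påvo        (wrong: a real "aa")
```

A correct reverse mapping would need a word list or morphological knowledge,
which is the "context" referred to above.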

Bjorn Lisper