[comp.ai.nlang-know-rep] NL-KR Digest Volume 3 No. 43

nl-kr-request@CS.ROCHESTER.EDU (NL-KR Moderator Brad Miller) (11/06/87)
NL-KR Digest             (11/05/87 22:03:16)            Volume 3 Number 43

Today's Topics:
        Re: Infinite alphabets - (Turing via Berke)
        Re: Infinite alphabets
        Phonemes and Alphabets
        Re: Phonemes and Alphabets
        Chinese
        
----------------------------------------------------------------------

Date: Fri, 16 Oct 87 12:07 EDT
From: Richard Wojcik <ssc-vax!bcsaic!rwojcik@beaver.cs.washington.edu>
Subject: Re: Infinite alphabets - (Turing via Berke)

There seems to be some confusion over the nature of writing systems for
human languages.  There are three 'ideal' types: logographic
(word-based), syllabic, and alphabetic.  Alphabetic writing systems are
based on phonemes, the basic units of sound that make up words.  Since
most languages have between 30 and 50 phonemes, it makes little sense to
talk about natural writing systems with infinite alphabets.  Logographic
writing systems, the least efficient type of writing, can have only as
many symbols as there are words (or morphemes) in the vocabulary.  If it
is necessary to carry on this discussion, why not debate the size of the
class of objects that written symbols map onto?

------------------------------

Date: Thu, 22 Oct 87 09:50 EDT
From: allegra!lcuxlm!dougf@ucbvax.Berkeley.EDU
Subject: Re: Infinite alphabets - (Turing via Berke)

In article <436@russell.STANFORD.EDU>, nakashim@russell.STANFORD.EDU (Hideyuki Nakashima) writes:
> >In article <417@russell.STANFORD.EDU> nakashim@russell.UUCP (Hideyuki Nakashima) writes:
> >> Chinese can DO have infinite characters.

> What I wanted to say was, however, not that point.  What Chinese tried
> was (or so it seems to me) to assign a symbol to each distinct
> concept.  So, as the number of concepts grows, so does the number of
> symbols (potentially).  European "word"s correspond to Chinese
> "character"s.  
Most Chinese "words" these days consist of two characters.  Some "words" are
made up of more characters.  There are fewer than 10,000 Chinese characters,
but the language is not that limited.  New words are created all the time,
generally composed of characters with related meanings.  "Telephone", for
example, is two characters, the first for electricity, & the second for 
speech.

Other "words" are formed in the way we form acronyms.  A poor example, but
to the  point: I taught at He-fei Gong-ye Da-xue (the 3rd "word", "University",
is composed of the characters for "big" and a root for "study/school"), 
however it was almost always referred to as He-gong-da. This works for 
things other than names also.  

I am quoting the word "word", because the concept is often slippery in 
Chinese.  In writing, each character is separated by the same space
whether or not it is part of the same "word".

> No Chinese equivalent of European "alphabet"s exist.

True, but as someone has mentioned, it has a very limited number of stroke types.
In fact a typewriter keyboard hooked to a microprocessor was invented in 
1982 that could accept characters as input from a combination of as few as
5 types of strokes.  The keyboard also has a number of standard stroke
combinations that are part of many characters.  These save time, but the
5 strokes are all that is needed.  A key to this process is that the 
strokes in a Chinese character are to be written in a specific order. Thus,
the character dictionary consists of ordered sets of the 5 strokes.

The most complex single character has about 24 strokes.  If we take that as
a limit then the maximum number of characters, (assuming all combinations would
be acceptable, which I doubt), would be:

	i=24
	___
	\     5^i	or about 7*10^16
	/
	---
 	i=1

70 quadrillion, although rather large, is not infinite.  In fact, few people
would be able to spend the time to learn them all :-)
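
The sum above can be checked with a few lines of Python.  This is only a
sketch under the same assumption the estimate makes: every ordered sequence
of 1 to 24 strokes drawn from 5 stroke types counts as a distinct character.
The function name is made up for illustration.

```python
def count_stroke_sequences(stroke_types: int = 5, max_strokes: int = 24) -> int:
    """Count ordered stroke sequences of length 1 through max_strokes.

    Assumes every sequence is an acceptable character, which is an upper
    bound and not a claim about real Chinese characters.
    """
    return sum(stroke_types ** i for i in range(1, max_strokes + 1))


total = count_stroke_sequences()
print(total)           # exact integer value of the sum
print(f"{total:.2e}")  # same value in scientific notation
```

The exact value is 74,505,805,969,238,280, i.e. roughly 7.45*10^16, which
agrees with the "about 7*10^16" figure.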

> Hideyuki Nakashima


-- 
	doug foxvog	...allegra!lcuxlj!dougf	[Please use lcuxlj not lcuxlm]
			If only Bell Labs agreed with my opinion...
NSA: names of CIA agents in NRO working on TEMPEST encrypted above.  Drug
	dealing terrorists assassinated for planned hijacking.

------------------------------

Date: Fri, 23 Oct 87 08:34 EDT
From: Greg Lee <nosc!humu!uhmanoa!uhccux!lee@sdcsvax.ucsd.edu>
Subject: Re: Infinite alphabets - (Turing via Berke)

Alphabets are often close to phoneme systems.  They may have characters
for allophones, however.  Karl Menge in his book on Turkish grammar
mentions the Orkhon script of a Turkic dialect which had different
signs for most consonants, depending on whether they occur in front
or back harmonic words.  Judging from Turkish and following the
conventional view, these were probably allophones.
	Greg Lee, lee@uhccux.uhcc.hawaii.edu

------------------------------

Date: Tue, 20 Oct 87 17:24 EDT
From: Hideyuki Nakashima <nakashim@russell.STANFORD.EDU>
Subject: Re: Infinite alphabets


In article <821@iscuva.ISCS.COM> Randyg@iscuva writes:
>In article <436@russell.STANFORD.EDU> nakashim@russell.UUCP (Hideyuki Nakashima) writes:
>>.............  No Chinese equivalent of European "alphabet"s exist.
>>
>>Hideyuki Nakashima
>
>I don't know much about Chinese writing, but in terms of formal game theory,
>does it really make much of a difference to the equivalence of the languages
>that Chinese use simple tokens (well, sorta) and Europeans use complex tokens?
>From what I can tell, there is nothing that is impossible to translate either
>way (tho it may take a number of manipulations).
>
>RANDY GORDON(TAO KUO TSE FUN PEE)

What you are trying to say is no more than the argument that since a
Turing machine can simulate any symbol manipulation machine, all are
the same.

I agree that they are mutually translatable.  But the HEART of the
systems is different.  European words are constructed from a rigid
alphabet.  Chinese words are not.  The Chinese tried to have a different
character for each concept they had.  But naturally, they failed in
their attempt, because their imagination was not infinite.  They could
not invent as many characters, or components of characters, as
concepts.  So they ended up using the same components in several places.

If you think each stroke of a character is like a letter of an alphabet,
I object.  It is just like saying each painting is composed of strokes of
brushes.  Strokes in Chinese characters are not their atomic elements.
Rather, the radicals are, such as:

   |             +------+             |
---+---          |      |         +--------
  /|\            +------+       --|
 / | \           |      |       --|
/  |  \  (tree)  +------+ (sun)   /       (sickness)

There are hundreds of those and most of them are themselves characters.

-- 
Hideyuki Nakashima
CSLI and ETL
nakashima@csli.stanford.edu (until Aug. 1988)
nakashima%etl.jp@relay.cs.net (afterwards)

------------------------------

Date: Thu, 29 Oct 87 15:55 EST
From: Richard Wojcik <rwojcik@bcsaic.UUCP>
Subject: Phonemes and Alphabets


This is a response to Greg Lee's comment about alphabetic writing
(Infinite Alphabets Thu Oct 29 08:33:39 PST 1987).  I am replying
under a new topic name, since this has little to do with the question
of infinity.  Greg's comment follows:

>Alphabets are often close to phoneme systems.  They may have characters
>for allophones, however.  Karl Menge in his book on Turkish grammar
>mentions the Orkhon script of a Turkic dialect which had different
>signs for most consonants, depending on whether they occur in front
>or back harmonic words.  Judging from Turkish and following the
>conventional view, these were probably allophones.
>	Greg Lee, lee@uhccux.uhcc.hawaii.edu

I know little about Turkish, and nothing about Orkhon script.  Why do
you think that the representation is allophonic?  Vowel harmony in
Turkish is morphophonological, not phonological.  Therefore,
consonantal signs may be tied to the phonemic identity of the vowel
that they cooccur with.  What are the consonantal allophones being
represented?  Are these 'allophones' phonemic or nonphonemic variants
of the phoneme?

My use of the term 'phoneme' here follows that of Sapir and Baudouin
de Courtenay.  My understanding of their view is that phonemes are the
phonological segments that speakers use to encode morphemes in memory.
Such units sometimes correspond to segments that Sapir called
'pseudo-phonemes'.  For example, in Russian /dub/~/duba/ 'oak~of the
oak', gets pronounced [dup]~[duba].  In Baudouin's seminal work on
phonemics, the phonemic representation of {dub} ended in a /b/, but
speakers pronounced it as [p].  (Since [p] is otherwise phonemic in
Russian, Sapir would have called it a 'pseudo-phoneme' in this case.)
Baudouin pointed out that there could be two types of alphabetic
writing for a language like Russian. 'Phonemographic' writing would
represent the surfacy [p] variant.  'Morphemographic' would represent
the underlying /b/ variant (cf. "The Influence of Language on
World-View and Mood" in the Stankiewicz reader).  The interesting
thing is that he never allowed for any level of alphabetic writing
that was more surfacy than phonemographic.

David Stampe has suggested one other alphabet that is more surfacy
than phonemographic:  Sanskrit's devanagari script, which contains
things like totally predictable allophonic variants of /n/.  Is Orkhon
script 'allophonic' in the same way?  Are there any other alphabetic
systems that get more surfacy than phonemographic?


------------------------------

Date: Sat, 31 Oct 87 11:11 EST
From: Greg Lee <lee@uhccux.UUCP>
Subject: Phonemes and Alphabets

In <2551@bcsaic.UUCP>, Rick Wojcik writes:
|
|This is a response to Greg Lee's comment about alphabetic writing
| ...
|I know little about Turkish, and nothing about Orkhon script.  Why do
|you think that the representation is allophonic?  Vowel harmony in

I don't think it is allophonic, personally.  My view is unconventional.

|Turkish is morphophonological, not phonological.  Therefore,

I doubt that vowel harmony in Turkish is morphonological. (More below.)

|consonantal signs may be tied to the phonemic identity of the vowel
|that they cooccur with.  What are the consonantal allophones being

Front vs. back p, t, k, b, etc., I suppose.  I know only about the
situation in modern Turkish (and only a little about that).

|represented?  Are these 'allophones' phonemic or nonphonemic variants
|of the phoneme?

Ah! There's the problem.  Who knows?  It depends how you look at it.
Judging from the way things are spelled, they are allophones.
And if we confine ourselves to "native" Turkish morphemes and devise
a set of segments just large enough to transcribe such morphemes
distinctly, they would also be judged to be allophones. (Thus
we arrive at the conventional view.)

|
|My use of the term 'phoneme' here follows that of Sapir and Baudouin
|de Courtenay.  My understanding of their view is that phonemes are the
|phonological segments that speakers use to encode morphemes in memory.
| ... [ Rick goes on to summarize some views of Baudouin and Sapir.]

Yes, now if we can just figure out how morphemes are memorized ...

Difficulties with the conventional phonemic analysis of Turkish arise
when we take into account the many (mostly) non-native inharmonic morphemes
-- borrowings from Arabic, French, German, ... -- and also take into
account consonant harmony in Turkish and the way it interacts with
vowel harmony and other phonological processes (rules?).  I will
make a long story short by giving my opinion about the matter without
going into the evidence I know of, pro and con.  And that is this.
The conventional view is fundamentally mistaken because it proceeds
from the "phonemic principle" that languages are organized around
a principle of least effort which leads to a minimization of the
distinctive segments -- the phonemes.  Rather, I think, it is the
distinctions among morphemes which are minimized.  In native Turkish,
there is, or was, a front versus back distinction for whole morphemes,
and all the segments, consonants as well as vowels, are phonetically
and "phonemically" front in a front morpheme, back in a back morpheme.
If we count as "phonemes" then, those segments which appear in the
memorized basic forms of morphemes, we arrive at twice the number
of consonants as we do in the conventional analysis.  The conventional
idea about this is a carry-over from the technology of developing
practical writing systems, but has no theoretical basis.

As to whether Turkish harmony is morphonological, well, that depends
on whether there are morphological processes which affect backness,
I guess.  I've never heard of any, and so far as I know, Turkish
harmony can be treated as completely phonological, with the one
exception that there are a few suffixes (formerly separate words)
which fail to harmonize for no phonological reason.

I gave the example of the Orkhon script as an example of allophonic
writing on the assumption that everything I just said is wrong.

Greg Lee, lee@uhccux.uhcc.hawaii.edu

------------------------------

Date: Mon, 2 Nov 87 13:18 EST
From: Rick Wojcik <rwojcik@bcsaic.UUCP>
Subject: Re: Phonemes and Alphabets


In article <1038@uhccux.UUCP> lee@uhccux.UUCP (Greg Lee) writes:
....
>The conventional view is fundamentally mistaken because it proceeds
>from the "phonemic principle" that languages are organized around
>a principle of least effort which leads to a minimization of the
>distinctive segments -- the phonemes.  Rather, I think, it is the
>...The conventional
>idea about this is a carry-over from the technology of developing
>practical writing systems, but has no theoretical basis.

I would not call the Baudouin-Sapir phoneme conventional, since it is
not the same as the better-known structuralist phoneme.  It has nothing
at all to do with systematic phonemics (although most modern
generativists think that it does, thanks to Chomsky).  The principle is
that language speakers have a set of pronounceable/perceivable phonetic
segments that can be used to encode morphemes (actually, allomorphs) in
memory.  Morphophonological alternations hold between allomorphs--e.g.
the /f/~/v/ alternation in {knife~knives}.  To English speakers, the /f/
and /v/ are distinct speech sounds.  What sets Baudouin & Sapir apart
from subsequent phonemicists is their use of 'phoneme' to characterize
abstract segments in cases of automatic neutralization.  They really
believed that German and Russian speakers were trying to pronounce final
voiced segments at the ends of syllables, even though the sounds always
came out voiceless.  So a final [t] in a German word could represent a
/t/ or a /d/.  Historically, German spelling used voiceless letters
(phonemographic), but it now uses voiced (morphemographic).  
 
>As to whether Turkish harmony is morphonological, well, that depends
>on whether there are morphological processes which affect backness,
>I guess.  I've never heard of any, and so far as I know, Turkish
>harmony can be treated as completely phonological, with the one
>exception that there are a few suffixes (formerly separate words)
>which fail to harmonize for no phonological reason.
>Greg Lee, lee@uhccux.uhcc.hawaii.edu

The term 'morphophonology' is Trubetzkoy's term for Baudouin's
'psychophonetics' (alternations involving 2 phonemes).  'Phonology' is
his term for Baudouin's 'physiophonetics' (alternations involving 1
basic phoneme).  The question, as it relates to Turkish, is whether or
not speakers hear front and back consonants as morpheme-distinguishing
speech sounds.  Note that both Baudouin & Trubetzkoy considered vowel
harmony to be morphophonological, not phonological.  The basis for this
is the fact that morphemic identities control harmony, not phonetic
environments.  Your reference to 'a few suffixes which fail to
harmonize' places vowel harmony in morphophonology.

------------------------------

Date: Wed, 4 Nov 87 06:27 EST
From: Greg Lee <lee@uhccux.UUCP>
Subject: Re: Phonemes and Alphabets



No. Baudouin, Sapir, and for that matter (or especially) David Stampe
are all conventionalists.  They accept the phonemic principle. Let
me illustrate the alternatives.  Say we have front ell [l] and
back ell [l-] occurring phonetically, and amongst the vowels front [i]
and back [a].  Phonetically, ell is always front before a front vowel
and back before a back vowel, so [li] and [l-a] are possible, but
never *[l-i] or *[la].  Then what about the basic memorized forms
of morphemes?  We must reduce "morpheme space" so that of the
four possibilities /li/, /l-i/, /la/, /l-a/, we can have only
two, because (let us say) in fact only two such morphemes can be
distinguished.  There are at least two conceivable alternatives.
We could require that in morphemes there should be only one variety
of ell, say the front one, in which case we have only /li/ and /la/ --
then [l-a] is described as having a phonetically conditioned
allophonic variant of basic /l/.  That's the phonemic analysis.
Or we could require that in morphemes ell should always agree
in frontness with the following vowel, in which case we have only
the two possible morphemes /li/ and /l-a/.  This would be an
unconventional analysis because two basic segments /l/ and /l-/
are assumed even though these can never be used to distinguish
morphemes.  Morpheme space has been reduced, but segment space
has not.  It's this type of analysis I suggested for Turkish.
But I don't think any of the above mentioned persons would have (had)
any hesitation in saying this must be wrong.  It violates the
phonemic principle and, note, is alphabetically inconvenient,
since more letters than necessary are required to transcribe
basic pronunciations.
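
The two alternatives can be laid out as a small Python sketch.  The segment
names, the table marking which segments are front, and the function names
are all made up for illustration; the only rule assumed is the one stated
above, that ell agrees in frontness with the following vowel when
pronounced.

```python
from itertools import product

# Toy inventory from the example: "l" is front ell, "l-" is back ell,
# "i" is a front vowel, "a" is a back vowel.
IS_FRONT = {"l": True, "l-": False, "i": True, "a": False}
ELLS = ["l", "l-"]
VOWELS = ["i", "a"]


def candidate_forms():
    """All four conceivable basic forms: /li/, /la/, /l-i/, /l-a/."""
    return list(product(ELLS, VOWELS))


def phonemic_forms():
    """Phonemic analysis: only front ell may appear in a basic form."""
    return [(c, v) for c, v in candidate_forms() if c == "l"]


def agreeing_forms():
    """Alternative analysis: ell must agree in frontness with the vowel."""
    return [(c, v) for c, v in candidate_forms() if IS_FRONT[c] == IS_FRONT[v]]


def pronounce(form):
    """Surface form: ell takes the frontness of the following vowel."""
    _, vowel = form
    ell = "l" if IS_FRONT[vowel] else "l-"
    return ell + vowel


def segments(forms):
    """Distinct basic segments needed to write the given basic forms."""
    return sorted({seg for form in forms for seg in form})


for name, forms in [("phonemic", phonemic_forms()), ("agreeing", agreeing_forms())]:
    print(name, forms, [pronounce(f) for f in forms], segments(forms))
```

Both analyses leave two basic forms and the same two pronunciations, [li]
and [l-a].  They differ only in the segment inventory: three basic segments
under the phonemic analysis, four under the alternative.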

	Generative phonologists are unconventional by accident,
since no principle of generative phonology requires phonemic
analyses.  But they are conventional by habit.  David Stampe
is conventional by principle, and Natural Phonology has
explicit assumptions to require phonemic analyses.  As for
Baudouin and Sapir, I don't know whether they arrive at
phonemics through reflection and choice or through alphabetic
habit.

	When we've agreed about the issue, I hope we can go
on to discuss some evidence, especially if there are some
Turkish or Hungarian experts here on the net to consult.

	Greg Lee, lee@uhccux.uhcc.hawaii.edu

------------------------------

Date: Sun, 25 Oct 87 20:26 EST
From: Neural Activists of the World .. Unite! <dwong@zgov03.dec.com>
Subject: Chinese

>Please note that it is Turing claiming Chinese "attempts" to have an
>enumerable infinity of symbols.
....
>Chinese cannot succeed at having an enumerable infinity of symbols.  It
>can only "attempt" to have them.  Unless you "go down a level" and
>consider "things that make up" Chinese symbols "the symbols."  Then
>there must be a finite number of them.  Does anyone know if this is true
>about Chinese?
Somebody may already have replied to this ahead of me but anyways, let
me first congratulate Peter Berke on his correct deduction.
It sure is true about Chinese.  Ancient Chinese writing WAS composed of
a finite set of symbols and that set of symbols does exist to this day.
It's just that instead of writing those symbols linearly, they have been
written two-dimensionally, thus a Chinese symbol is more of a
two-dimensional word.  The problem with the writing today is that in the
course of two ... three thousand years, there have been many additions,
subtractions and the like.  Contemporary Chinese writing is going in
for a 'shorthand' version of these words and this will probably obscure
the original two-dimensional word more.
Lastly, there is no enumerable infinity of symbols.  Obviously, any
language must be reducible to a finite set of primitive symbols.  An
infinity of symbols implies zero information content per symbol.  You
cannot distinguish between 99999999999999999 and 99999999999999999999
mainly because you can't count at a glance, but it is simple to
arbitrarily assign 'values' to 17 9's and 20 9's which can be
distinguished more easily.  In a two-dimensional sense, in finite
paper space (but perhaps not on Turing's tape !?!?!?), one would probably
come out with a blob of ink.
Lastly lastly, not all Chinese know all Chinese words but because of
their composition of these 'primitive symbols' and associations with
other 'known words', an educated guess can be made.  This occurs in
English commonly too but I get this feeling that it is more easily
done in Chinese because of the symbolism of the words.  Take for
example the word "electronic", which could be deduced if the word "electron"
or "electric" is known.  Likewise in Chinese.

DISCLAIMER: ALL OPINIONS DISCUSSED ARE STRICTLY MY OWN AND DO NOT REFLECT
THE OPINIONS OF MY EMPLOYER OR ANYBODY WHO DOES NOT WISH TO REFLECT MY
OPINIONS.

------------------------------

End of NL-KR Digest
*******************