nl-kr-request@CS.ROCHESTER.EDU (NL-KR Moderator Brad Miller) (11/06/87)
NL-KR Digest (11/05/87 22:03:16) Volume 3 Number 43

Today's Topics:
        Re: Infinite alphabets - (Turing via Berke)
        Re: Infinite alphabets
        Phonemes and Alphabets
        Re: Phonemes and Alphabets
        Chinese

----------------------------------------------------------------------

Date: Fri, 16 Oct 87 12:07 EDT
From: Richard Wojcik <ssc-vax!bcsaic!rwojcik@beaver.cs.washington.edu>
Subject: Re: Infinite alphabets - (Turing via Berke)

There seems to be some confusion over the nature of writing systems for human languages. There are three 'ideal' types: logographic (word-based), syllabic, and alphabetic. Alphabetic writing systems are based on phonemes, the basic units of sound that make up words. Since most languages have between 30 and 50 phonemes, it makes little sense to talk about natural writing systems with infinite alphabets. Logographic writing systems, the least efficient type of writing, can have only as many symbols as there are words (or morphemes) in the vocabulary. If it is necessary to carry on this discussion, why not debate the size of the class of objects that written symbols map onto?

------------------------------

Date: Thu, 22 Oct 87 09:50 EDT
From: allegra!lcuxlm!dougf@ucbvax.Berkeley.EDU
Subject: Re: Infinite alphabets - (Turing via Berke)

In article <436@russell.STANFORD.EDU>, nakashim@russell.STANFORD.EDU (Hideyuki Nakashima) writes:
> >In article <417@russell.STANFORD.EDU> nakashim@russell.UUCP (Hideyuki Nakashima) writes:
> >> Chinese can DO have infinite characters.
> What I wanted to say was, however, not that point. What Chinese tried
> was (or so it seems to me) to assign a symbol for each distinct
> concept. So, as the number of concepts grows, so does the number of
> symbols (potentially). European "word"s correspond to Chinese
> "character"s.

Most Chinese "words" these days consist of two characters. Some "words" are made up of more characters. There are fewer than 10,000 Chinese characters, but the language is not that limited.
New words are created all the time, generally composed of characters with related meanings. "Telephone", for example, is two characters: the first for electricity and the second for speech. Other "words" are formed in the way we form acronyms. A poor example, but to the point: I taught at He-fei Gong-ye Da-xue (the 3rd "word", "University", is composed of the characters for "big" and a root for "study/school"); however, it was almost always referred to as He-gong-da. This works for things other than names also.

I am quoting the word "word" because the concept is often slippery in Chinese. In writing, each character is separated by the same space whether or not it is part of the same "word".

> No Chinese equivalent of European "alphabet"s exists.

True, but as someone has mentioned, it has a very limited number of strokes. In fact a typewriter keyboard hooked to a microprocessor was invented in 1982 that could accept characters as input from a combination of as few as 5 types of strokes. The keyboard also has a number of standard stroke combinations that are part of many characters. These save time, but the 5 strokes are all that is needed.

A key to this process is that the strokes in a Chinese character are to be written in a specific order. Thus, the character dictionary consists of ordered sets of the 5 strokes. The most complex single character has about 24 strokes. If we take that as a limit, then the maximum number of characters (assuming all combinations would be acceptable, which I doubt) would be:

        i=24
        ___
        \     5^i     or about 7*10^16
        /
        ---
        i=1

70 quadrillion, although rather large, is not infinite. In fact, few people would be able to spend the time to learn them all :-)

> Hideyuki Nakashima

--
doug foxvog
...allegra!lcuxlj!dougf   [Please use lcuxlj not lcuxlm]

If only Bell Labs agreed with my opinion...
NSA: names of CIA agents in NRO working on TEMPEST encrypted above.
Drug dealing terrorists assassinated for planned hijacking.
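
[The stroke-count bound in the message above can be checked with a few lines of Python. This is only a sketch under the message's own assumptions (5 stroke types, at most 24 strokes, every ordered sequence counted as a possible character); the function and constant names are hypothetical, chosen for illustration.]

```python
# Figures taken from the message above; the names are illustrative only.
STROKE_TYPES = 5    # number of basic stroke types
MAX_STROKES = 24    # strokes in the most complex character (approximate)


def count_stroke_sequences(stroke_types: int, max_strokes: int) -> int:
    """Count ordered stroke sequences of length 1..max_strokes."""
    return sum(stroke_types ** i for i in range(1, max_strokes + 1))


def closed_form(stroke_types: int, max_strokes: int) -> int:
    """Same count via the geometric-series formula (r^(n+1) - r) / (r - 1)."""
    r, n = stroke_types, max_strokes
    return (r ** (n + 1) - r) // (r - 1)


total = count_stroke_sequences(STROKE_TYPES, MAX_STROKES)
print(total)            # exact integer count
print(f"{total:.2e}")   # same value in scientific notation
```

[The direct sum and the closed form agree, and the result is roughly 7.45*10^16, consistent with the "about 7*10^16" quoted above.]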
------------------------------

Date: Fri, 23 Oct 87 08:34 EDT
From: Greg Lee <nosc!humu!uhmanoa!uhccux!lee@sdcsvax.ucsd.edu>
Subject: Re: Infinite alphabets - (Turing via Berke)

Alphabets are often close to phoneme systems. They may have characters for allophones, however. Karl Menge in his book on Turkish grammar mentions the Orkhon script of a Turkic dialect which had different signs for most consonants, depending on whether they occur in front or back harmonic words. Judging from Turkish and following the conventional view, these were probably allophones.

        Greg Lee, lee@uhccux.uhcc.hawaii.edu

------------------------------

Date: Tue, 20 Oct 87 17:24 EDT
From: Hideyuki Nakashima <nakashim@russell.STANFORD.EDU>
Subject: Re: Infinite alphabets

In article <821@iscuva.ISCS.COM> Randyg@iscuva writes:
>In article <436@russell.STANFORD.EDU> nakashim@russell.UUCP (Hideyuki Nakashima) writes:
>>............. No Chinese equivalent of European "alphabet"s exists.
>>
>>Hideyuki Nakashima
>
>I don't know much about Chinese writing, but in terms of formal game theory,
>does it really make much of a difference to the equivalence of the languages
>that chinese use simple tokens(well sorta) and europeans use complex tokens?
>From what I can tell, there is nothing that is impossible to translate either
>way(tho it may take a number of manipulations).
>
>RANDY GORDON(TAO KUO TSE FUN PEE)

What you are trying to say is no more than the argument that since a Turing machine can simulate any symbol manipulation machine, all are the same. I agree that they are mutually translatable. But the HEARTS of the systems are different. European words are constructed on a rigid alphabet. Chinese words are not. The Chinese tried to have a different character for each concept they had. But naturally, they failed in their attempt, because their imagination was not infinite. They could not invent as many characters, or components of characters, as concepts.
So they ended up using the same components in several places.

If you think each stroke of a character is like a letter of an alphabet, I object. It is just like saying each painting is composed of brush strokes. Strokes in Chinese characters are not their atomic elements. Their radicals are, rather, such as:

      |           +------+             |
   ---+---        |      |         +--------
     /|\          +------+       --|
    / | \         |      |       --|
   /  |  \ (tree) +------+ (sun)   /  (sickness)

There are hundreds of those and most of them are themselves characters.

--
Hideyuki Nakashima
CSLI and ETL
nakashima@csli.stanford.edu (until Aug. 1988)
nakashima%etl.jp@relay.cs.net (afterwards)

------------------------------

Date: Thu, 29 Oct 87 15:55 EST
From: Richard Wojcik <rwojcik@bcsaic.UUCP>
Subject: Phonemes and Alphabets

This is a response to Greg Lee's comment about alphabetic writing (Infinite Alphabets Thu Oct 29 08:33:39 PST 1987). I am replying under a new topic name, since this has little to do with the question of infinity. Greg's comment follows:

>Alphabets are often close to phoneme systems. They may have characters
>for allophones, however. Karl Menge in his book on Turkish grammar
>mentions the Orkhon script of a Turkic dialect which had different
>signs for most consonants, depending on whether they occur in front
>or back harmonic words. Judging from Turkish and following the
>conventional view, these were probably allophones.
>       Greg Lee, lee@uhccux.uhcc.hawaii.edu

I know little about Turkish, and nothing about Orkhon script. Why do you think that the representation is allophonic? Vowel harmony in Turkish is morphophonological, not phonological. Therefore, consonantal signs may be tied to the phonemic identity of the vowel that they cooccur with. What are the consonantal allophones being represented? Are these 'allophones' phonemic or nonphonemic variants of the phoneme?

My use of the term 'phoneme' here follows that of Sapir and Baudouin de Courtenay.
My understanding of their view is that phonemes are the phonological segments that speakers use to encode morphemes in memory. Such units sometimes correspond to segments that Sapir called 'pseudo-phonemes'. For example, in Russian /dub/~/duba/ 'oak~of the oak', gets pronounced [dup]~[duba]. In Baudouin's seminal work on phonemics, the phonemic representation of {dub} ended in a /b/, but speakers pronounced it as [p]. (Since [p] is otherwise phonemic in Russian, Sapir would have called it a 'pseudo-phoneme' in this case.)

Baudouin pointed out that there could be two types of alphabetic writing for a language like Russian. 'Phonemographic' writing would represent the surfacy [p] variant. 'Morphemographic' would represent the underlying /b/ variant (cf. "The Influence of Language on World-View and Mood" in the Stankiewicz reader). The interesting thing is that he never allowed for any level of alphabetic writing that was more surfacy than phonemographic.

David Stampe has suggested one other alphabet that is more surfacy than phonemographic: Sanskrit's devanagari script, which contains things like totally predictable allophonic variants of /n/. Is Orkhon script 'allophonic' in the same way? Are there any other alphabetic systems that get more surfacy than phonemographic?

------------------------------

Date: Sat, 31 Oct 87 11:11 EST
From: Greg Lee <lee@uhccux.UUCP>
Subject: Phonemes and Alphabets

In <2551@bcsaic.UUCP>, Rick Wojcik writes:
|
|This is a response to Greg Lee's comment about alphabetic writing
| ...
|I know little about Turkish, and nothing about Orkhon script. Why do
|you think that the representation is allophonic? Vowel harmony in

I don't think it is allophonic, personally. My view is unconventional.

|Turkish is morphophonological, not phonological. Therefore,

I doubt that vowel harmony in Turkish is morphonological. (More below.)

|consonantal signs may be tied to the phonemic identity of the vowel
|that they cooccur with.
|What are the consonantal allophones being

Front vs. back p, t, k, b, etc., I suppose. I know only about the situation in modern Turkish (and only a little about that).

|represented? Are these 'allophones' phonemic or nonphonemic variants
|of the phoneme?

Ah! There's the problem. Who knows? It depends how you look at it. Judging from the way things are spelled, they are allophones. And if we confine ourselves to "native" Turkish morphemes and devise a set of segments just large enough to transcribe such morphemes distinctly, they would also be judged to be allophones. (Thus we arrive at the conventional view.)

|
|My use of the term 'phoneme' here follows that of Sapir and Baudouin
|de Courtenay. My understanding of their view is that phonemes are the
|phonological segments that speakers use to encode morphemes in memory.
| ...

[ Rick goes on to summarize some views of Baudouin and Sapir.]

Yes, now if we can just figure out how morphemes are memorized ...

Difficulties with the conventional phonemic analysis of Turkish arise when we take into account the many (mostly) non-native inharmonic morphemes -- borrowings from Arabic, French, German, ... -- and also take into account consonant harmony in Turkish and the way it interacts with vowel harmony and other phonological processes (rules?).

I will make a long story short by giving my opinion about the matter without going into the evidence I know of, pro and con. And that is this. The conventional view is fundamentally mistaken because it proceeds from the "phonemic principle" that languages are organized around a principle of least effort which leads to a minimization of the distinctive segments -- the phonemes. Rather, I think, it is the distinctions among morphemes which are minimized. In native Turkish, there is, or was, a front versus back distinction for whole morphemes, and all the segments, consonants as well as vowels, are phonetically and "phonemically" front in a front morpheme, back in a back morpheme.
If we count as "phonemes", then, those segments which appear in the memorized basic forms of morphemes, we arrive at twice the number of consonants as we do in the conventional analysis. The conventional idea about this is a carry-over from the technology of developing practical writing systems, but has no theoretical basis.

As to whether Turkish harmony is morphonological, well, that depends on whether there are morphological processes which affect backness, I guess. I've never heard of any, and so far as I know, Turkish harmony can be treated as completely phonological, with the one exception that there are a few suffixes (formerly separate words) which fail to harmonize for no phonological reason.

I gave the example of the Orkhon script as an example of allophonic writing on the assumption that everything I just said is wrong.

        Greg Lee, lee@uhccux.uhcc.hawaii.edu

------------------------------

Date: Mon, 2 Nov 87 13:18 EST
From: Rick Wojcik <rwojcik@bcsaic.UUCP>
Subject: Re: Phonemes and Alphabets

In article <1038@uhccux.UUCP> lee@uhccux.UUCP (Greg Lee) writes:
....
>The conventional view is fundamentally mistaken because it proceeds
>from the "phonemic principle" that languages are organized around
>a principle of least effort which leads to a minimization of the
>distinctive segments -- the phonemes. Rather, I think, it is the
>...The conventional
>idea about this is a carry-over from the technology of developing
>practical writing systems, but has no theoretical basis.

I would not call the Baudouin-Sapir phoneme conventional, since it is not the same as the better-known structuralist phoneme. It has nothing at all to do with systematic phonemics (although most modern generativists think that it does, thanks to Chomsky). The principle is that language speakers have a set of pronounceable/perceivable phonetic segments that can be used to encode morphemes (actually, allomorphs) in memory. Morphophonological alternations hold between allomorphs--e.g.
the /f/~/v/ alternation in {knife~knives}. To English speakers, the /f/ and /v/ are distinct speech sounds. What sets Baudouin & Sapir apart from subsequent phonemicists is their use of 'phoneme' to characterize abstract segments in cases of automatic neutralization. They really believed that German and Russian speakers were trying to pronounce final voiced segments at the ends of syllables, even though the sounds always came out voiceless. So a final [t] in a German word could represent a /t/ or a /d/. Historically, German spelling used voiceless letters (phonemographic), but it now uses voiced (morphemographic).

>As to whether Turkish harmony is morphonological, well, that depends
>on whether there are morphological processes which affect backness,
>I guess. I've never heard of any, and so far as I know, Turkish
>harmony can be treated as completely phonological, with the one
>exception that there are a few suffixes (formerly separate words)
>which fail to harmonize for no phonological reason.
>Greg Lee, lee@uhccux.uhcc.hawaii.edu

The term 'morphophonology' is Trubetzkoy's term for Baudouin's 'psychophonetics' (alternations involving 2 phonemes). 'Phonology' is his term for Baudouin's 'physiophonetics' (alternations involving 1 basic phoneme). The question, as it relates to Turkish, is whether or not speakers hear front and back consonants as morpheme-distinguishing speech sounds. Note that both Baudouin & Trubetzkoy considered vowel harmony to be morphophonological, not phonological. The basis for this is the fact that morphemic identities control harmony, not phonetic environments. Your reference to 'a few suffixes which fail to harmonize' places vowel harmony in morphophonology.

------------------------------

Date: Wed, 4 Nov 87 06:27 EST
From: Greg Lee <lee@uhccux.UUCP>
Subject: Re: Phonemes and Alphabets

No. Baudouin, Sapir, and for that matter (or especially) David Stampe are all conventionalists. They accept the phonemic principle.
Let me illustrate the alternatives. Say we have front ell [l] and back ell [l-] occurring phonetically, and amongst the vowels front [i] and back [a]. Phonetically, ell is always front before a front vowel and back before a back vowel, so [li] and [l-a] are possible, but never *[l-i] or *[la]. Then what about the basic memorized forms of morphemes? We must reduce "morpheme space" so that of the four possibilities /li/, /l-i/, /la/, /l-a/, we can have only two, because (let us say) in fact only two such morphemes can be distinguished.

There are at least two conceivable alternatives. We could require that in morphemes there should be only one variety of ell, say the front one, in which case we have only /li/ and /la/ -- then [l-a] is described as having a phonetically conditioned allophonic variant of basic /l/. That's the phonemic analysis. Or we could require that in morphemes ell should always agree in frontness with the following vowel, in which case we have only the two possible morphemes /li/ and /l-a/. This would be an unconventional analysis because two basic segments /l/ and /l-/ are assumed even though these can never be used to distinguish morphemes. Morpheme space has been reduced, but segment space has not.

It's this type of analysis I suggested for Turkish. But I don't think any of the above mentioned persons would have (had) any hesitation in saying this must be wrong. It violates the phonemic principle and, note, is alphabetically inconvenient, since more letters than necessary are required to transcribe basic pronunciations.

Generative phonologists are unconventional by accident, since no principle of generative phonology requires phonemic analyses. But they are conventional by habit. David Stampe is conventional by principle, and Natural Phonology has explicit assumptions to require phonemic analyses. As for Baudouin and Sapir, I don't know whether they arrive at phonemics through reflection and choice or through alphabetic habit.
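
[The two ways of reducing "morpheme space" described above can be made concrete with a small Python sketch. It uses the message's own toy inventory (front ell "l", back ell "l-", front vowel "i", back vowel "a"); the function names and the dictionary encoding are assumptions made for illustration, not part of either analysis.]

```python
from itertools import product

# Toy inventory from the message: "l" = front ell, "l-" = back ell,
# "i" = front vowel, "a" = back vowel.
ELLS = {"l": "front", "l-": "back"}
VOWELS = {"i": "front", "a": "back"}

# The four conceivable memorized forms: /li/, /la/, /l-i/, /l-a/.
candidates = list(product(ELLS, VOWELS))


def phonemic_analysis(forms):
    """Keep forms with front ell only: a single basic segment /l/."""
    return [(c, v) for c, v in forms if c == "l"]


def agreement_analysis(forms):
    """Keep forms where ell agrees in frontness with the following vowel."""
    return [(c, v) for c, v in forms if ELLS[c] == VOWELS[v]]


def segments_used(forms):
    """Distinct ell segments that appear in the memorized forms."""
    return {c for c, _ in forms}


def surface(form):
    """Phonetic form: ell always takes the frontness of the following vowel."""
    _, v = form
    return ("l" if VOWELS[v] == "front" else "l-") + v


for name, analysis in [("phonemic", phonemic_analysis),
                       ("agreement", agreement_analysis)]:
    kept = analysis(candidates)
    print(name, kept, sorted(segments_used(kept)), [surface(f) for f in kept])
```

[Both filters keep two of the four candidate forms and yield the same two phonetic outputs, [li] and [l-a]; they differ only in how many basic ell segments the memorized forms use, one under the phonemic analysis and two under the agreement analysis.]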
When we've agreed about the issue, I hope we can go on to discuss some evidence, especially if there are some Turkish or Hungarian experts here on the net to consult.

        Greg Lee, lee@uhccux.uhcc.hawaii.edu

------------------------------

Date: Sun, 25 Oct 87 20:26 EST
From: Neural Activists of the World .. Unite! <dwong@zgov03.dec.com>
Subject: Chinese

>Please note that it is Turing claiming Chinese "attempts" to have an
>enumerable infinity of symbols.
....
>Chinese cannot succeed at having an enumerable infinity of symbols. It
>can only "attempt" to have them. Unless you "go down a level" and
>consider "things that make up" Chinese symbols "the symbols." Then
>there must be a finite number of them. Does anyone know if this is true
>about Chinese?

Somebody may already have replied to this ahead of me, but anyway, let me first congratulate Peter Berke on his correct deduction. It sure is true about Chinese. Ancient Chinese writing WAS composed of a finite set of symbols, and that set of symbols does exist to this day. It's just that instead of writing those symbols linearly, they have been written two-dimensionally; thus a Chinese symbol is more of a two-dimensional word. The problem with the writing today is that in the course of two ... three thousand years, there have been many additions, subtractions and the like. Contemporary Chinese writing is going in for a 'shorthand' version of these words, and this will probably obscure the original two-dimensional word more.

Lastly, there is no enumerable infinity of symbols. Obviously, any language must be reducible to a finite set of primitive symbols. An infinity of symbols implies zero information content per symbol. You cannot distinguish between 99999999999999999 and 99999999999999999999, mainly because you can't count at a glance, but it is simple to arbitrarily assign 'values' to 17 9's and 20 9's which can be distinguished more easily.
In a two-dimensional sense, in finite paper space (but perhaps not on Turing's tape !?!?!?), one would probably come out with a blob of ink.

Lastly lastly, not all Chinese know all Chinese words, but because of their composition from these 'primitive symbols' and associations with other 'known words', an educated guess can be made. This occurs commonly in English too, but I get the feeling that it is more easily done in Chinese because of the symbolism of the words. Take for example the word "electronic", which could be deduced if the word "electron" or "electric" is known. Likewise in Chinese.

DISCLAIMER: ALL OPINIONS DISCUSSED ARE STRICTLY MY OWN AND DO NOT REFLECT THE OPINIONS OF MY EMPLOYER OR ANYBODY WHO DOES NOT WISH TO REFLECT MY OPINIONS.

------------------------------

End of NL-KR Digest
*******************