bayers@kodak.UUCP (mitch bayersdorfer) (01/01/70)
In article <417@russell.STANFORD.EDU> nakashim@russell.UUCP (Hideyuki Nakashima) writes: > >The comparison between Europian alphabetical system and Chinese >character system is very interesting. Chinese can DO have infinite >characters. What they do is to assign one character to each concept. >Of course, they don't have infinite characters now. But I say it is >possible to have them if they want. > >Hideyuki Nakashima >CSLI and ETL >nakashima@csli.stanford.edu (until Aug. 1988) >nakashima%etl.jp@relay.cs.net (afterwards) I don't mean to make a reductio ad adsurdum, but can any alphabet which a given human can perceive be truely infinite? Given that human beings have a finite (but very large) number of neurons in their visual and cerebral cortex, and that any distinct alphabetic character would exceed the thresholds of an enumerable permutation of those neurons, mustn't there be only a finite number of characters (and concepts?). I am assuming that neural thresholds for a certain learned concept are relatively constant. If another concept produces the same permutation, but only to a greater degree, then couldn't that be reduced down to two concepts-- one a measure of the concept, and the other a measure of its degree? - Mitch Bayersdorfer ...{}!rochester!kodak!bayers "The above does not represent the views of the Eastman Kodak Company or its management."
berke@CS.UCLA.EDU (10/13/87)
I thought you might find Turing's words on the subject interesting. These are from his 1936/7 article "On Computable Numbers, with an application to the Entscheidungsproblem." I think they are important. I also quote them in my article "That Does Not Compute" from ICNN '87, distinguishing brains from computers from neural networks, an expanded version of which I have submitted to Computer Magazine. Turing: "If we regard a symbol as literally printed on a square... The symbol is defined as a set of points in theis square, viz. the set occupied by the printer's ink. If these sets are restricted to be measurable, we can define the "distance" between two symbols as the cost of transforming one symbol into the other if the cost of moving unit area of printer's ink unit distance is unity..." He later goes on... "I shall also suppose that the number of symbols which may be printed is finite. If we were to allow an infinity of symbols, then there would be symbols differing to an arbitrarily small extent. The effect of this restriction of the number of symbols is not very serious. It is always possible to use sequences of symbols in the place of single symbols. Thus an Arabic numeral such as 17 or 999999999999 is normally treated as a single symbol. "Similarly, in any European language words are treated as single symbols (Chinese however, attempts to have an enumerable infinity of symbols). The differences from our point of view between the single and compound symbols is that the compound symbols, if they are too lengthy, cannot be observed at one glance. This is in accord with experience. We cannot tell at a glance whether 999999999999999 and 9999999999999999 are the same." These passages led me to suggest that what we want to do with neural networks is to mechanize perception, not simply to build faster computers. (Though that is a desirable goal in its own right.) Please note that it is Turing claiming Chinese "attempts" to have an enumerable infinity of symbols. I am ignorant of Chinese. At some point, with an increasing number of symbols, all members of an alphabet cannot be recognized at-a-glance. Or, looking at it another way, if you presume an infinity of symbols upon which you are operating, you are not computing. This is also the intuitive reason I claim ambiguity resolution is not computable. I think of ambiguity resolution as the general case of "object recognition" and "problem formulation." Imagine a vague scene or situation not already described by a finite enumeration of the objects in it and their relationships, etc. Such a vague scene is equivalent, in Turing's sense, to supposing an infinite number of things in the scene, which you are going to reduce to a finite number of options from which to choose. Since you are supposing an infinite alphabet (or the equivalent "vague" scene) technically, you are not computing. By Turing's above remarks, I think by his definition, Chinese cannot succeed at having an enumerable infinity of symbols. It can only "attempt" to have them. Unless you "go down a level" and consider "things that make up" Chinese symbols "the symbols." Then, there must be a finite number of them. Does anyone know if this is true about Chinese? It would seem that even in English it does not apply to orthography, though it apparently does to letters, since we use in hand-writing no fixed "alphabet" of strokes, etc. Well, I meant only to introduce Turing's words, forgive me for going on. Regards, Peter Berke
nakashim@russell.STANFORD.EDU (Hideyuki Nakashima) (10/13/87)
In article <8583@shemp.UCLA.EDU> berke@CS.UCLA.EDU (Peter Berke) writes: > >By Turing's above remarks, I think by his definition, Chinese cannot >succeed at having an enumerable infinity of symbols. It can only >"attempt" to have them. Unless you "go down a level" and consider >"things that make up" Chinese symbols "the symbols." Then, there >must be a finite number of them. Does anyone know if this is true >about Chinese? It would seem that even in English it does not apply to >orthography, though it apparently does to letters, >since we use in hand-writing no fixed "alphabet" of strokes, etc. Well, >I meant only to introduce Turing's words, forgive me for going on. > The comparison between Europian alphabetical system and Chinese character system is very interesting. Chinese can DO have infinite characters. What they do is to assign one character to each concept. Of course, they don't have infinite characters now. But I say it is possible to have them if they want. One chinese character usually consists of smaller constituents. For example, a character for "maple tree" consists of two parts: one designating "tree" and the other designating "wind". The character for "forest" consists of three "tree"s. The character for "tree" comes from a pictorial representation of a real tree. It is like this: | ----+---- /|\ / | \ / | \ So are characters for "sun", "house", "river" and so on. By this way, you can just invent a new character for a concept if it is a basic one. Otherwise you can combine several characters to define a new one. I think that Europian way of thinking is analytic while Eastern is holistic. Having alphabets to compose words vs. having characters for each concepts, is one of the examples. I further think that digital computer follows Europian way. I want to come up with Eastern equivalent of digital computer (not an analog one, though). Connectionism is one of the possibilities. -- Hideyuki Nakashima CSLI and ETL nakashima@csli.stanford.edu (until Aug. 1988) nakashima%etl.jp@relay.cs.net (afterwards)
david@linc.cis.upenn.edu (David Feldman) (10/14/87)
In <8583@shemp.UCLA.EDU>, Peter asks: >By Turing's above remarks, I think by his definition, Chinese cannot >succeed at having an enumerable infinity of symbols. It can only >"attempt" to have them. Unless you "go down a level" and consider >"things that make up" Chinese symbols "the symbols." Then, there >must be a finite number of them. Does anyone know if this is true >about Chinese? It would seem that even in English it does not apply to > >Peter Berke Chinese characters are by and large composed of 'radicals' which have base meanings. These meanings are not necessarily related to the meanings of the characters that are composed of the radicals, but they sometimes do. For instance, the question particle 'ma' has the 'ko' radical which is associated with the mouth. And indeed, there is a finite number of radicals. I have a couple of sheets describing them. Anything component of a character that is not a radical can be classified as a stroke, and there are specific kinds of strokes also. Dave Feldman david@linc.cis.upenn.edu
rbl@nitrex.UUCP ( Dr. Robin Lake ) (10/15/87)
In article <8583@shemp.UCLA.EDU> berke@CS.UCLA.EDU (Peter Berke) writes: > > ... >This is also the intuitive reason I claim ambiguity resolution >is not computable. I think of ambiguity resolution as the >general case of "object recognition" and "problem formulation." >Imagine a vague scene or situation not already described by a finite >enumeration of the objects in it and their relationships, etc. >Such a vague scene is equivalent, in Turing's sense, to supposing >an infinite number of things in the scene, which you are going to reduce >to a finite number of options from which to choose. Since you are >supposing an infinite alphabet (or the equivalent "vague" scene) technically, >you are not computing. > > ... > >Peter Berke "Computable/Computing" with the current mathematics. If you go back to the fundamentals of math (Robinson's axiomatic set theory) and remove the Axiom of Choice, one can derive a new mathematics which is more appropriate for situations where uncertainty prevails. This line of reasoning has worked extremely well in developing some powerful tools for "smart" data analyzers where the data is too dirty and uncertain for classic statistical techniques. -- Rob Lake {decvax,ihnp4!cbosgd}!mandrill!nitrex!rbl
zhu@ogcvax.UUCP (Jianhua Zhu) (10/15/87)
In article <417@russell.STANFORD.EDU>, nakashim@russell.STANFORD.EDU (Hideyuki Nakashima) writes: > In article <8583@shemp.UCLA.EDU> berke@CS.UCLA.EDU (Peter Berke) writes: > > > >By Turing's above remarks, I think by his definition, Chinese cannot > >succeed at having an enumerable infinity of symbols. It can only > >"attempt" to have them. Unless you "go down a level" and consider > >"things that make up" Chinese symbols "the symbols." Then, there > >must be a finite number of them. Does anyone know if this is true > >about Chinese? It would seem that even in English it does not apply to > >orthography, though it apparently does to letters, > >since we use in hand-writing no fixed "alphabet" of strokes, etc. Well, > >I meant only to introduce Turing's words, forgive me for going on. > > > > The comparison between Europian alphabetical system and Chinese > character system is very interesting. Chinese can DO have infinite > characters. What they do is to assign one character to each concept. > Of course, they don't have infinite characters now. But I say it is > possible to have them if they want. > ... > > each concepts, is one of the examples. I further think that digital > computer follows Europian way. I want to come up with Eastern > equivalent of digital computer (not an analog one, though). > Connectionism is one of the possibilities. > It looks like a symbol is defined as a finite set of discrete points in a *finitely bounded* square (otherwise, the number of symbols would be infinite). According to this definition, I cannot image how Chinese can have infinite symbols. On the other hand, if one is allowed to use certain operators operating on symbols to introduce new symbols, then the number of symbols in any language can potentially be infinite. As far as written languages are concerned, the main difference between Chinese and English (or any Europian language) is that English words are composed of letters from an finite alphabet in a one-dimensional manner (namely, concatenation), whereas Chinese words are composed of *strokes* from a finite stroke set as two-dimentional pictures. (YES the stroke set is VERY finite, in fact not that bigger than English alphabet.) Although we do have one more degree of freedom by which we can build words from elements of one level down, we wouldn't want to go too far in this other direction (we still want to have neatly printed document in our language). As a matter of fact, currently nobody is inventing new words by piling up existing words any more, and new words are added, as new concepts come along, almost exclusively in a form of *word combinations*, whose English analogy would be hyphenated words. Yes I quite agree with Mr. Nakashima in that digital computers follow Europian way. But I don't see how Connectionism in particular can lead to digital computers (or equivalents) of Eastern way (I do know that Connectionism architecture can do extremely well for tasks such as pattern recognition), and would be delighted if someone could provide further information. -- /__|__ _/-\_ | /__|__ * Jianhua Zhu *** UUCP: ...!tektronix!ogcvax!zhu "___|__, --- || /| ,|_/ * CSE Dept - OGC *** CSNet: zhu@oregon-grad ,"|", \|/ || ---+--- * 19600NW von Neumann Dr *** / \| \ /--- \| | * Beaverton, OR 97006 ***
tsmith@gryphon.CTS.COM (Tim Smith) (10/16/87)
In article <971@kodak.UUCP> bayers@kodak.UUCP (mitch bayersdorfer) writes: +===== | I don't mean to make a reductio ad adsurdum, but can any alphabet which | a given human can perceive be truely infinite? Given that human beings have | a finite (but very large) number of neurons in their visual and cerebral | cortex, and that any distinct alphabetic character would exceed the | thresholds of an enumerable permutation of those neurons, mustn't there | be only a finite number of characters (and concepts?). I am assuming that | neural thresholds for a certain learned concept are relatively constant. If | another concept produces the same permutation, but only to a greater | degree, then couldn't that be reduced down to two concepts-- one a measure | of the concept, and the other a measure of its degree? | - Mitch Bayersdorfer +===== You are addressing a problem that linguists have had to wrestle with for a long time. Back in the 1950's, the linguist Noam Chomsky did some original research on the properties of formal grammars that might be used to generate sentences in a natural language. Among other things, he discovered that any reasonable grammar (or production system) to generate sentences in natural language would have recursive properties that would, in effect, make the set of sentences of the language infinite in size. The corollary to this, of course, is that there is no "longest sentence" in a natural language. Realizing that it is somewhat silly to claim that a human can utter or comprehend a sentence that might be a few giga-centuries in duration, Chomsky fudged around and invented a distinction between linguistic "competence" and linguistic "performance". Actually, it's a reasonable fudge. A good analogy for this is simple arithmetic. If you know the rules of, say, multiplication, there is no reason why you can't multiply two numbers that require a few giga-light-years worth of paper to write down (in 10-point type). You are theoretically competent to do this. Your performance abilities (life span, attention span, etc.) will, however, undoubtedly keep you from achieving the end product. Cantor discovered several kinds of infinities. Perhaps what we need to discover is a kind of sub-infinity. The actual number of sentences in any natural language is "infinite" in the same sense that the number of grains of sand on a beach is infinite, i.e. "not really". Are alphabets like sentences? Uh, no, not at all! I have stayed out of this discussion about "infinite alphabets", since I do not understand the issue. I enter reluctantly. Here goes... An alphabet is a set (decidedly finite) of glyphs (e.g., little marks on paper). A language adopts an alphabet, and tries to map its sound system (its phonology) onto the alphabet. Sometimes this works well, sometimes not so well. For example, both Italian and English use the Latin alphabet. Neither language's phonology maps directly to the alphabet, but Italian maps better than English. In Italian there are only a few little problems (the letters "c", "g", "e", and "o" do not map one-to-one to Italian phonemes). In English, there are many, many problems (which we are all aware of). The sound system of every language contains a finite number of phonemes, and a finite (but much larger) number of syllables. Therefore, any alphabet (phomeme-glyph mapping), or any syllabary (syllable-glyph mapping) should be finite. Ideographic writing systems, such as Japanese "kanji", are not alphabets, and they are not syllabaries. If we assume that the languages that use ideographic writing systems allow free creation of new glyphs, then these writing systems have an infinite number of glyphs, in exactly the same sense that there is an infinite number of sentences in English, or that there is an infinite number of art works that can be created by mankind. -- Tim Smith INTERNET: tsmith@gryphon.CTS.COM UUCP: {hplabs!hp-sdd, sdcsvax, ihnp4, ....}!crash!gryphon!tsmith UUCP: {philabs, trwrb}!cadovax!gryphon!tsmith
rwojcik@bcsaic.UUCP (Richard Wojcik) (10/16/87)
There seems to be some confusion over the nature of writing systems for human languages. There are three 'ideal' types: logographic (word-based), syllabic, and alphabetic. Alphabetic writing systems are based on phonemes, the basic units of sound that make up words. Since most languages have between 30 and 50 phonemes, it makes little sense to talk about natural writing systems with infinite alphabets. Logographic writing systems, the least efficient type of writing, can have only as many symbols as there are words (or morphemes) in the vocabulary. If it is necessary to carry on this discussion, why not debate the size of the class of objects that written symbols map onto?
nakashim@russell.STANFORD.EDU (Hideyuki Nakashima) (10/16/87)
In article <971@kodak.UUCP> bayers@kodak.UUCP (mitch bayersdorfer) writes: >In article <417@russell.STANFORD.EDU> nakashim@russell.UUCP (Hideyuki Nakashima) writes: >> >> Chinese can DO have infinite characters. > >I don't mean to make a reductio ad adsurdum, but can any alphabet which >a given human can perceive be truely infinite? OK. Let me change "infinite" to "unbounded". Who cares for infinity in real life, anyway? What I wanted to say was, however, not that point. What Chinese tried was (or so seems tome was) to assign a symbol for each distinct concepts. So, as the number of concepts grow, so does the number of symbols (potentially). Europian "word"s correspond to Chinese "character"s. No Chinese equivalent of Europian "alphabet"s exist. -- Hideyuki Nakashima CSLI and ETL nakashima@csli.stanford.edu (until Aug. 1988) nakashima%etl.jp@relay.cs.net (afterwards)
spector@suvax1.UUCP (mitchell spector) (10/19/87)
In article <557@nitrex.UUCP>, rbl@nitrex.UUCP ( Dr. Robin Lake ) says: > .... If you go back to the fundamentals > of math (Robinson's axiomatic set theory) and remove the Axiom of Choice, one > can derive a new mathematics which is more appropriate for situations where > uncertainty prevails. This line of reasoning has worked extremely well in > developing some powerful tools for "smart" data analyzers where the data is > too dirty and uncertain for classic statistical techniques. What specifically are you thinking of? Robinson developed non-standard analysis, which is basically calculus with infinitesimals, done rigorously. The axiom of choice *is* generally used to construct a hyperreal number system (a structure containing both real numbers and infinitesimals which has certain desirable properties), as in Robinson's theory. While there is a substantial body of mathematics in which the axiom of choice is rejected, two things should be pointed out. The first is that it is almost never satisfactory merely to remove the axiom of choice from set theory (or even to adopt as an axiom the negation of the axiom of choice; the resulting theory is far too weak to prove many interesting results. Instead, various strong axioms which happen to contradict the axiom of choice are used as replacements; the chief of these is the axiom of determinateness. The second is that the acceptance or rejection of the axiom of choice has no known effect on any practical problem (for example, anything in physics). (This is certainly true if one is willing to adopt the much weaker axiom of dependent choices, as almost everyone in the field does.) For example, the same sentences of number theory can be proven with the axiom of choice as without it. Moreover, all statements of calculus or analysis of the type used in physics can be rephrased as statements of number theory, although this can get rather tedious and technical. It follows that none of these statements require the axiom of choice for their proofs either. It is only when one starts to consider things at a more abstract mathematical level that the axiom of choice comes into play (for example, the existence of a non-measurable set of real numbers). I'd be interested in the examples referred to in the previous message (although I don't know if this discussion belongs in sci.lang -- that depends on the examples, I guess). -- Mitchell Spector Dept. of Computer Science/Software Engineering, Seattle University Path: ...!uw-beaver!uw-entropy!dataio!suvax1!spector or: dataio!suvax1!spector@entropy.ms.washington.edu
lee@uhccux.UUCP (Greg Lee) (10/19/87)
Fixing a reference in a recent posting: Yngve, Victor. 1960. "A model and a hypothesis for language structure", Proc. of the Amer. Philos. Society 104, pp. 444-466. (Criticised in a footnote in Aspects.) Greg Lee, lee@uhccux.uhcc.hawaii.edu
verma@hpscad.dec.com (Virendra Verma, DTN 297-5510) (10/19/87)
In article <sci.lang:1530> tsmith@gryphon.CTS.COM (Tim Smith) writes: > The sound system of every language contains a finite number of > phonemes, and a finite (but much larger) number of syllables. > Therefore, any alphabet (phomeme-glyph mapping), or any > syllabary (syllable-glyph mapping) should be finite. That is very true in Devanagari script. There are only 33 alphabets in Devanagari script and they produce unique phonetic words. This means that you can pronounce any unheard word (of any language) exactly the way it is written in Devanagari script without any ambiguity. Devanagari script is used by Sanskrit and Hindi, the two prominent languages of India. - Virendra
randyg@iscuva.ISCS.COM (Randy Gordon) (10/19/87)
In article <436@russell.STANFORD.EDU> nakashim@russell.UUCP (Hideyuki Nakashima) writes: >............. No Chinese equivalent of Europian "alphabet"s exist. > > >-- >Hideyuki Nakashima >CSLI and ETL >nakashima@csli.stanford.edu (until Aug. 1988) >nakashima%etl.jp@relay.cs.net (afterwards) sigh, I am probably gonna get my head bit off, but.... I don't know much about Chinese writing, but in terms of formal game theory, does it really make much of a difference to the equivalence of the languages that chinese use simple tokens(well sorta) and europeans use complex tokens? From what I can tell, there is nothing that is impossible to translate either way(tho it may take a number of manipulations). RANDY GORDON(TAO KUO TSE FUN PEE)
dougf@lcuxlm.UUCP (10/22/87)
In article <436@russell.STANFORD.EDU>, nakashim@russell.STANFORD.EDU (Hideyuki Nakashima) writes: > >In article <417@russell.STANFORD.EDU> nakashim@russell.UUCP (Hideyuki Nakashima) writes: > >> Chinese can DO have infinite characters. > What I wanted to say was, however, not that point. What Chinese tried > was (or so seems tome was) to assign a symbol for each distinct > concepts. So, as the number of concepts grow, so does the number of > symbols (potentially). Europian "word"s correspond to Chinese > "character"s. Most Chinese "words" these days consist of two characters. Some "words" are made up of more characters. There are less than 10,000 Chinese characters, but the language is not that limited. New words are created all the time generally composed of characters with related meanings. "Telephone", for example is two characters, the first for electricity, & the second for speech. Other "words" are formed in the way we form acronyms. A poor example, but to the point: I taught at He-fei Gong-ye Da-xue (the 3rd "word", "University", is composed of the characters for "big" and a root for "study/school"), however it was almost always refered to as He-gong-da. This works for things other than names also. I am quoting the word "word", because the concept is often slippery in Chinese. In writing, each character is the separated by the same space whether or not it is part of the same "word". > No Chinese equivalent of Europian "alphabet"s exist. True, but as someone has mentioned, It has a very limited number of strokes. In fact a typewriter keyboard hooked to a microprocessor was invented in 1982 that could accept characters as input from a combination of as few as 5 types of strokes. The keyboard also has a number of standard stroke combinations that are part of many characters. These save time, but the 5 strokes are all that is needed. A key to this process is that the strokes in a Chinese character are to be written in a specific order. Thus, the character dictionary consists of ordered sets of the 5 strokes. The most complex single character has about 24 strokes. If we take that as a limit then the maximum number of characters, (assuming all combinations would be acceptable, which i doubt), would be : i=24 ___ \ 5^i or about 7*10^16 / --- i=1 70 quadrillion, although rather large is not infinite. In fact, few people would be able to spend the time to learn them all -) > Hideyuki Nakashima -- doug foxvog ...allegra!lcuxlj!dougf [Please use lcuxlj not lcuxlm] If only Bell Labs agreed with my opinion... NSA: names of CIA agents in NRO working on TEMPEST encrypted above. Drug dealing terrorists assassinated for planned hijacking.
lee@uhccux.UUCP (Greg Lee) (10/23/87)
Alphabets are often close to phoneme systems. They may have characters for allophones, however. Karl Menge in his book on Turkish grammar mentions the Orkhon script of a Turkic dialect which had different signs for most consonants, depending on whether they occur in front or back harmonic words. Judging from Turkish and following the conventional view, these were probably allophones. Greg Lee, lee@uhccux.uhcc.hawaii.edu