nl-kr-request@CS.ROCHESTER.EDU (NL-KR Moderator Brad Miller) (11/11/87)
NL-KR Digest             (11/11/87 02:53:50)            Volume 3 Number 46

Today's Topics:
        Re: Practical effects of AI (speech)
        power of Montague syntax
        Re: Langendoen and Postal (posted by: B
        Re: Why can't my cat talk?

----------------------------------------------------------------------

Date: Tue, 3 Nov 87 09:29 EST
From: George Tatge <gt@hpfcmp.HP.COM>
Subject: Re: Practical effects of AI (speech)

>Those of us who work on speech will be very encouraged by this enthusiasm.
>However,
>
>(1) Speaker-independent continuous speech is much farther from reality
>    than some companies would have you think.  Currently, the best
>    speech recognizer is IBM's Tangora, which makes about 6% errors
>    on a 20,000-word vocabulary.  But the Tangora is for speaker-
>    dependent, isolated-word, grammar-guided recognition in a benign
>    environment.  Each of these four constraints cuts the error rate
>    by 3 or more times if used independently.  I don't know how well
>    they will do if you remove all four constraints, but I would guess
>    about 70% error rate.  So while speech recognition has made a lot
>    of advancements, it is still far from usable in the application you
>    mentioned.
>
>Kai-Fu Lee
>Computer Science Department
>Carnegie-Mellon University
>----------

Just curious what the definition of "best" is.  For example, I have seen
6% error rates and better on grammar-specific, speaker-dependent,
continuous speech recognition.  I would guess that for some applications
this is better than the "best" described above.

George (floundering in superlative ambiguity) Tatge

------------------------------

Date: Sun, 8 Nov 87 12:14 EST
From: Kai-Fu Lee <kfl@SPEECH2.CS.CMU.EDU>
Subject: Re: Practical effects of AI (speech)

In article <930001@hpfcmp.HP.COM>, gt@hpfcmp.HP.COM (George Tatge) writes:
> > >(1) Speaker-independent continuous speech is much farther from reality
> > ...
> >Kai-Fu Lee
>
> Just curious what the definition of "best" is.
> For example, I have seen
> 6% error rates and better on grammar-specific, speaker-dependent,
> continuous speech recognition.  I would guess that for some applications
> this is better than the "best" described above.

"Best" is not measured in terms of error rate alone.  More effort and new
technologies have gone into IBM's system than into any other system, and
I believe that it will do better than any other system on a comparable
task.  I guess this definition is subjective, but I think if you asked
other speech researchers, you would find that most people believe the
same.

I know many commercial (and research) systems have error rates lower than
6%.  But you have to remember that the IBM system works on a 20,000-word
vocabulary, and its grammar is a very loose one, accepting arbitrary
sentences in office correspondence.  Their grammar has a perplexity
(roughly speaking, the number of choices at each decision point) of
several hundred.  Nobody else has such a large vocabulary or such a
difficult grammar.

IBM has experimented with tasks like the one you mentioned.  In 1978,
they tried a 1000-word task with a very tight grammar (perplexity = 5?),
the same task CMU used on Hearsay and Harpy.  They achieved a 0.1% error
rate.

> George (floundering in superlative ambiguity) Tatge

Kai-Fu Lee

------------------------------

Date: Tue, 3 Nov 87 09:36 EST
From: Greg Lee <lee@uhccux.UUCP>
Subject: power of Montague syntax

I posted a question about the power of Montague syntax.  I guess the
answer is obvious.  No assumptions constrain the functions which
determine the form of phrases given the forms of their parts.  So such
functions could be specified as the product of a list of transformations,
or a Turing machine, for that matter.

So why is it that Montague grammar is widely regarded as a
non-transformational model?  Am I missing something?
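The point raised above -- that nothing constrains the functions combining
phrases -- can be made concrete with a small sketch.  The "rules" below are
hypothetical toy examples, not anything from Montague's own fragments: once
a syntactic operation may be any computable string function, a rule can do
things (such as reversing a daughter) that no ordinary phrase-structure
rule can express.

```python
# Toy illustration (hypothetical rules, not Montague's actual fragment):
# if the functions that build a phrase from its parts are unconstrained,
# any computable string operation can serve as a "syntactic rule".

def concat(a: str, b: str) -> str:
    # An ordinary, harmless combination operation: concatenation.
    return a + " " + b

def mirror(a: str, b: str) -> str:
    # A perverse but, on the unconstrained view, perfectly legal rule:
    # it reverses its first daughter character by character.  No single
    # phrase-structure rule behaves like this, yet nothing in the bare
    # formalism rules it out.
    return a[::-1] + " " + b

print(concat("the", "cat"))   # the cat
print(mirror("the", "cat"))   # eht cat
```

Since arbitrary functions of this kind can simulate any computation step
by step, a "grammar" built from them is as powerful as a Turing machine,
which is exactly the worry about calling the model non-transformational.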
Greg Lee, lee@uhccux.uhcc.hawaii.edu

------------------------------

Date: Tue, 3 Nov 87 18:14 EST
From: Jeffrey Goldberg <goldberg@russell.STANFORD.EDU>
Subject: Re: power of Montague syntax

In article <1057@uhccux.UUCP> lee@uhccux.UUCP (Greg Lee) writes:
>No assumptions constrain the functions
>which determine the form of phrases given the forms of their parts.
>So such functions could be specified as the product of a list
>of transformations, or a Turing machine, for that matter.
>So why is it that Montague grammar is widely regarded as a
>non-transformational model?  Am I missing something?
> Greg Lee, lee@uhccux.uhcc.hawaii.edu

Montague grammar is more or less a theory of semantics, though many of
its practitioners use some form of categorial grammar, or a PSG with
wrapping.  But MG as defined in "Introduction to Montague Semantics" by
D. Dowty, R. Wall, and S. Peters places no restriction on the syntactic
combination of elements and is very likely Turing-equivalent.

-jeff goldberg
--
Jeff Goldberg
ARPA goldberg@russell.stanford.edu
UUCP ...!ucbvax!russell.stanford.edu!goldberg

------------------------------

Date: Wed, 4 Nov 87 10:33 EST
From: Paul Neubauer <neubauer@bsu-cs.UUCP>
Subject: Re: power of Montague syntax

In article <1057@uhccux.UUCP>, lee@uhccux.UUCP (Greg Lee) writes:
> I posted a question about the power of Montague syntax.  I guess
> the answer is obvious.  No assumptions constrain the functions
> which determine the form of phrases given the forms of their parts.
> So such functions could be specified as the product of a list
> of transformations, or a Turing machine, for that matter.
>
> So why is it that Montague grammar is widely regarded as a
> non-transformational model?  Am I missing something?

I don't think you're missing anything.  If the functions are not defined,
then they could be anything, but since they are not defined to be
transformational-type rules, Montague grammar is not [explicitly] a
transformational grammar.
Unfortunately, we just don't know what it IS.  To be fair, though, I
don't think that the question of weak generative power worries most
Montague grammarians.  In fact, I don't think it really worries ME.  I
suppose that my former life as a Generative Semanticist has jaded me on
that question, but I can't get excited about the weak generative power of
an under-defined class of grammars when what I see as relevant is the
strong generative power of a particular, substantively defined grammar or
class of grammars.  [I use "substantive" in a more substantive sense than
Chomsky, whose "substantive" universals I still consider mostly formal.]

--
Paul Neubauer
UUCP: <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!neubauer

------------------------------

Date: Wed, 4 Nov 87 01:31 EST
From: goldfain@osiris.cso.uiuc.edu
Subject: Re: Langendoen and Postal (posted by: B

> /* Written 10:34 am Nov 1, 1987 by berke@CS.UCLA.EDU in comp.ai */
> /* ---------- "Langendoen and Postal (posted by: B" ---------- */
> I just read this fabulous book over the weekend, called "The Vastness
> of Natural Languages," by D. Terence Langendoen and Paul M. Postal.
> ...
> Their basic proof/conclusion holds that natural languages, as
> linguistics construes them (as products of grammars), are what they call
> mega-collections, which Quine calls proper classes, and which some
> people hold cannot exist.  That is, they maintain that (1) sentences
> cannot be excluded from being of any, even transfinite, size by the laws
> of a grammar, and (2) collections of these sentences are bigger than
> even the continuum.  They are the size of the collection of all sets:
> too big to be sets.
> ...
> /* End of text from osiris.cso.uiuc.edu:comp.ai */

Hang on a minute!  It *sounds* as though you are talking about
Context-Free Grammars/Languages (CFGs/CFLs) here.  Most linguists (I'd
wager) set up their CFGs as admitting only finite derivations over a
finite set of production rules, each rule only allowing finite expansion.
Thus, although usually a CFL is only a proper subset of this, we are
ALWAYS working WITHIN the set of finite strings (of arbitrary length)
over a finite alphabet.  Such a set is countably infinite.  Far from
being a proper class, this is a very manageable set.

If you move the discussion up to the cardinality of the set of
"discourses", which would be finite sequences of strings in the language,
you are still only up to the power set of the integers, which has the
same cardinality as the set of real numbers.  Again, this is a set, and
not a proper class.

I haven't seen the book you cite.  They must make some argument as to why
they think natural languages (or linguistic theories about them) admit
infinite sentences.  Even given that, we would have only the reals (i.e.,
the "continuum") as a cardinality, without some further surprising
claims.  Can you summarize their argument (if it exists)?

Mark Goldfain
arpa: goldfain@osiris.cso.uiuc.edu
Department of Computer Science
University of Illinois at Shampoo-Banana

------------------------------

Date: Fri, 6 Nov 87 20:47 EST
From: Mitchell Spector <spector@suvax1.UUCP>
Subject: Re: Langendoen and Postal (posted by: B

In article <8300011@osiris.cso.uiuc.edu>, goldfain@osiris.cso.uiuc.edu
comments on an article by berke@CS.UCLA.EDU:
> > /* Written 10:34 am Nov 1, 1987 by berke@CS.UCLA.EDU in comp.ai */
> > /* ---------- "Langendoen and Postal (posted by: B" ---------- */
> > /* End of text from osiris.cso.uiuc.edu:comp.ai */
>
> Hang on a minute!  It *sounds* as though you are talking about
> Context-Free Grammars/Languages (CFGs/CFLs) here...

The set of all finite sequences of finite strings in a language (the set
of "discourses") is still just a countably infinite set (assuming that
the alphabet is finite or countably infinite, of course).  The set of
infinite sequences of finite strings is uncountable, with the same
cardinality as the set of real numbers, as is the set of infinite
strings.
(By an infinite string or infinite sequence, I mean an object which is
indexed by the natural numbers 0, 1, 2, ....)

In general, sets of finite objects are finite or countably infinite.  (A
finite object is, vaguely speaking, one that can be identified by means
of a finite representation.  More specifically, this finite
representation or description must enable you to distinguish this object
from all the other objects in the set.)  If you want to get an
uncountable set, you must use objects which are themselves infinite as
members of the set.

Many people lose sight of the fact that a real number is an infinite
object (although an integer or a rational number is a finite object).
Any general method of identifying real numbers must use infinitely long
or large representations (for example, decimals, continued fractions,
Cauchy sequences, or Dedekind cuts).  Real numbers are much more
difficult to pin down than one might gather from many math classes.  This
misimpression is partly due to the fact that one deals only with a
relatively small (finite!) set of specific real numbers; these either
have their own names in mathematics or they can be defined by a finite
sequence of symbols in the usual mathematical notation.  The other real
numbers belong to a nameless horde which we use in general arguments but
never by specific mention.

I certainly agree with the general objections raised to the idea that
natural languages are uncountably large (or, worse yet, proper classes),
although I haven't read the book in question.  Maybe somebody can state
more precisely what the book claimed, but at first glance it seems to
indicate a lack of understanding of modern set theory.

By the way, logicians do study infinite languages, including both the
possibility of infinitely many symbols and that of infinitely long
sentences, but such languages are very different from what we think of
as "natural language."
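The countability claim above can be illustrated directly.  The sketch
below is just the standard length-then-alphabetical enumeration: it lists
every finite string over a finite alphabet, so each string -- however
long -- appears at some finite position, which is exactly what it means
for the set to be countable.

```python
from itertools import count, product

def enumerate_strings(alphabet):
    """Yield every finite string over `alphabet`, shortest first.

    Within each length, strings come out in alphabet order, so any
    given finite string is reached after finitely many steps.
    """
    for length in count(0):                          # lengths 0, 1, 2, ...
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

# The first few strings over the two-letter alphabet {a, b}:
gen = enumerate_strings("ab")
print([next(gen) for _ in range(7)])
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Note that this only enumerates finite strings; no such listing exists for
the set of *infinite* strings, which is why that set is uncountable, as
the message above says.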
It doesn't matter whether you're talking about context-free languages or
more general sorts of languages -- in any language used by people for
communication, the alphabet is finite, each word is finitely long, and
each sentence is finitely long.

-- Mitchell Spector                                        |"Give me a
Dept. of Computer Science & Software Eng., Seattle Univ.   | ticket to
Path: ...!uw-beaver!uw-entropy!dataio!suvax1!spector       | Mars!!"
or: dataio!suvax1!spector@entropy.ms.washington.edu        | -- Zippy the Pinhead

------------------------------

Date: Wed, 4 Nov 87 08:49 EST
From: necntc!adelie!mirror!ishmael!inmet!justin@ames.arpa
Subject: Re: Why can't my cat talk?

/* Written 4:03 pm Oct 31, 1987 by roberts@cognos.UUCP in inmet:comp.ai */
Should this crystallization hypothesis prove true, what does this tell us
about gorillas?  And is AMSLAN, in which I understand at least one
gorilla has achieved not only a considerable vocabulary but a remarkable
proficiency at combining "symbols" to denote new concepts, a natural
language?  That is to say, does mastery of a sign language require the
same brain functions as those required to speak a natural language?
/* End of text from inmet:comp.ai */

As I understand it, AMSLAN is, in fact, a proper natural language.  The
rub is that the gorillas learning it have only learned it to a point.
AMSLAN has its own particular syntax, and that seems to be the sticking
point.  While the gorillas seem perfectly able to learn the concepts, and
are able to stick them together, they don't seem to be able to understand
sophisticated *syntax* (beyond two-word combinations).  Just what this
implies about cognition, I'm not sure.

-- Justin du Coeur

------------------------------

Date: Wed, 4 Nov 87 10:20 EST
From: Alan Lovejoy <alan@pdn.UUCP>
Subject: Re: Why can't my cat talk?
In article <576@russell.STANFORD.EDU> goldberg@russell.UUCP (Jeffrey
Goldberg) writes:
/[I allege that most languages (especially primitive ones) rely more on
/ morphology than word order to encode syntax]
/I will take your claim seriously if you do the following:
/
/(1) Devise a sampling method that factors out things that should
/    be factored out.  (A linguist named Matthew Dryer has done
/    some excellent work on this problem, and has constructed a
/    method that I would certainly trust.)
/
/(2) Provide a definition of "primitive" which would yield the same
/    result when applied by a number of anthropologists.  (That is,
/    your definition must be explicit enough so that an arbitrary
/    anthropologist could determine what is "primitive".)
/
/(3) Provide a definition of whatever grammatical property you wish
/    to test for which would yield the same result when applied by a
/    number of linguists.  (That is, your definition must be
/    explicit enough so that an arbitrary linguist could tell
/    whether it is "free word order" (or whatever).)
/
/(4) Apply standard statistical techniques to determine
/    significance.
/
/Until you do something like that, your claim is like claiming
/"People with big feet like tomatoes" and basing this on the fact
/that you have met a couple of families with bigger feet than yours who
/served spaghetti with tomato sauce, and one even put tomatoes in the
/salad.

Excuse me, but I believe you were the one to propose a new theory
relating hand-movement sequences to syntax, and you are the one
publishing a paper expounding your theory.  Why should I do your research
work for you?  I was happy to point out the likely form of the attacks
you would receive once your paper is published, but I get paid to do
software engineering, not to publish research in cultural anthropology
(might be fun, but it doesn't pay enough :-) ).

--alan@pdn

------------------------------

Date: Thu, 5 Nov 87 12:00 EST
From: Elizabeth D. Zwicky <zwicky@dormouse.cis.ohio-state.edu>
Subject: Re: Why can't my cat talk?

In article <8986@shemp.UCLA.EDU> srt@CS.UCLA.EDU (Scott Turner) writes:
>Hypotheses drawn from
>degenerate cases like Genie need to be carefully tested in the normal
>adult population before they can be given any serious consideration.
>
> Scott R. Turner

Give me a break here.  You CANNOT test hypotheses about whether or not
there is a crystallization period after which language cannot be learned
without dealing with degenerate cases.  The case of someone who has been
deprived of all language contact for n years, starting at birth, whether
n is 2, 5, or 12, will always be a degenerate case.  Certainly, Genie is
not conclusive evidence, and such cases are (thank God!) rare, so the
evidence remains inconclusive.  However, in all known cases, children
deprived of language contact can still learn languages normally if they
start before puberty.  The idea of a crystallization period is supported
by the data about second-language learning in normal humans, but the
question I was answering was about learning of *first* languages.

Elizabeth Zwicky

------------------------------

Date: Fri, 6 Nov 87 07:07 EST
From: srt@CS.UCLA.EDU
Subject: Re: Why can't my cat talk?

In article <1125@tut.cis.ohio-state.edu>
zwicky@dormouse.cis.ohio-state.edu (Elizabeth D. Zwicky) writes:
>In article <8986@shemp.UCLA.EDU> srt@CS.UCLA.EDU (Scott Turner) writes:
>> Hypotheses drawn from
>>degenerate cases like Genie need to be carefully tested in the normal
>>adult population before they can be given any serious consideration.
>
>Give me a break here.

Take two, they're cheap.

> ...You CANNOT test hypotheses about whether or not
>there is a crystallization period after which language cannot be learned
>without dealing with degenerate cases.

Huh?  Studies about second-language learning and use clearly bear on this
question.  I don't consider adults who can learn a second language
degenerate.
(Well, no more degenerate than the average adult :-).  I agree with your
points, by the way.  I'm just cautioning against building models based on
people like Genie without having separate, confirming evidence that the
model is reasonable for normal people.

Scott R. Turner
UCLA Computer Science "Delving into mockery science"
Domain: srt@cs.ucla.edu
UUCP:  ...!{cepu,ihnp4,trwspp,ucbvax}!ucla-cs!srt

------------------------------

End of NL-KR Digest
*******************