marty1@houdi.UUCP (05/27/87)
In article <13263@watmath.UUCP>, erhoogerbeet@watmath.UUCP writes:
> Hello. I am an undergrad going into computer science....
>
> Is there a Backus-Naur Form for the English language itself, or is this too
> complicated? If not, how is it that we can understand a huge variety of
> different sentence forms and still recognize that some are syntactically
> incorrect? Basically, what I am asking is: is it possible to do syntactic
> checking as if "compiling" a sentence with rules set down in some BNF?

About 30 years ago, when I was at MIT doing graduate study in EE, my wife
was talking with a guy named Chomsky who wanted to do machine translation.
The effort resulted in new approaches to English grammar, but not in
machine translation.

Basically, people do not classify words and structures neatly. Some words
are syntactically unique, and some structures are valid only for a specific
sequence of words. Some sentences are recognized as syntactically correct
by some speakers of English but not by others. We read, write, talk,
listen, understand, and misunderstand each other in this chaos because our
brains are not structured like computers.

> As I understand it so far, natural language processing would have at least
> two levels (syntactic, semantic) and that syntactic checking level would
> be the basis of the other.

True.

> "Colourless green ideas sleep furiously." syntactically but not
> semantically correct.

Your ideas, being those of an undergraduate, are green, and further, since
you lack imagination, colorless. Put them to sleep. I hope they sleep
calmly, but if they sleep furiously, that's your problem. :-) Now that may
be semantically correct, but is it factually correct?

Seriously, I would like to know what (in plain English, if possible) is
going on in formal English grammar and natural language parsing.

M. B. Brilliant                 Marty
AT&T-BL HO 3D-520               (201)-949-1858
Holmdel, NJ 07733               ihnp4!houdi!marty1
randyg@iscuva.ISCS.COM (Randy Gordon) (05/27/87)
There are two major "camps" of natural language techniques.

The "Old School" is the syntactic/transformational grammar types, who
comprise the majority of folks actually producing natural language
parsers. Harris's Intellect, for example. Some translation work is done by
them: Tomita has a nice book on English/Oriental translation, and McCord's
LMT does English/German translation, for example. Personally, I find their
efforts brilliant, but as futile as the dying stages of the "ether" theory.

The "Young Turks" are the descendants of Schank's Conceptual Dependency
theory: Sowa, Schank, etc. These generally treat semantics as secondary to
syntactics. Schank's company, Cognitive Systems, several years ago produced
a system for a Belgian bank that could translate six languages well enough
to act as an interface to an EFT system. The nice thing about concept-based
parsing is that you don't have to fully map the sentence to understand it.
This is very useful in unrestricted input systems.

There is a small group that believes you ought to pick the input words off
of menus. It works, and with modifications, better than ANY of the other
methods, but frankly, I don't consider it natural language processing.

A lot of the concept-based parsing techniques have fascinating consequences
in expert systems and database representation work, and may very well
represent the next "hot button" in AI.

Randy Gordon
"Tao ku tse fun pee"
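The remark that concept-based parsing need not fully map a sentence can be illustrated with a toy sketch. The primitive names below loosely echo Conceptual Dependency's ATRANS/PTRANS/MTRANS acts, but the word lists and frame layout are invented for illustration, not Schank's actual system:

```python
# A minimal sketch of concept-based parsing: extract a meaning frame
# without a complete syntactic parse. Trigger words and frame fields
# are illustrative inventions.

PRIMITIVES = {
    "gave": "ATRANS", "sold": "ATRANS",   # transfer of possession
    "went": "PTRANS", "walked": "PTRANS", # physical transfer
    "told": "MTRANS", "asked": "MTRANS",  # mental (information) transfer
}

def conceptualize(sentence):
    """Return a crude concept frame: the primitive act plus the words
    around it. Unknown words are simply skipped, which is why partial
    or ill-formed input can still yield a usable structure."""
    words = sentence.lower().rstrip(".").split()
    for i, w in enumerate(words):
        if w in PRIMITIVES:
            return {
                "act": PRIMITIVES[w],
                "actor": words[i - 1] if i > 0 else None,
                "rest": words[i + 1:],
            }
    return None

print(conceptualize("John gave Mary the book"))
# {'act': 'ATRANS', 'actor': 'john', 'rest': ['mary', 'the', 'book']}
```

Note that "John um gave uh Mary the book" would produce the same act and actor; that robustness to unmapped material is the point being made above.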
srt@CS.UCLA.EDU (05/27/87)
In article <529@iscuva.ISCS.COM> randyg@iscuva.UUCP (Randy Gordon) writes:
>
>The "Young Turks" are the descendents of Schank's Conceptual Dependency
>theory, Sowa, Schank, etc. These generally treat semantics as secondary
>to syntactics.
>

I think you probably mean "syntactics as secondary to semantics". The
important distinction is that the "Old School" believes that there exist
semantics-free language principles (Universal Grammar, the Language
Acquisition Device, and so on) and as a consequence studies language
separate from meaning (i.e., semantics). The "Young Turks" study language
as a (natural) representation for internal concepts (Schank's CD is more a
theory of concept representation than of linguistics). As a consequence,
the research focus in this group is on the processes involved in turning
language into concepts and vice versa.

It has turned out that "semantics" (in the sense of word meanings and
concept manipulation) is much more important to correctly translating from
language to concepts than syntactic information. Unlike the "Old School",
which tries to study syntax in isolation, the "Young Turks" recognize the
importance of syntax and use syntactic information whenever appropriate.
The apparent bias arises from the actual relative importance of semantics
vs. syntax, not from any research methodology.

Scott R. Turner
UCLA Computer Science           "But If I Graduate,
Domain: srt@ucla.cs.edu          I'll Have to Work for a Living"
UUCP: ...!{cepu,ihnp4,trwspp,ucbvax}!ucla-cs!srt
code@sphinx.uchicago.edu (paul robinson wilson) (05/28/87)
In article <529@iscuva.ISCS.COM> randyg@iscuva.UUCP (Randy Gordon) writes:
>
>There are two major "camps" of natural language techniques.
>
>The "Old School" is the syntactic/transformational grammer types, which
>[...]
>The "Young Turks" are the descendents of Schank's Conceptual Dependency
>theory, Sowa, Schank, etc. These generally treat semantics as secondary
>to syntactics...

Uh, I think it's the other way around (Schankians treat semantics as
primary), but it's a very important point.

The posting that started this exchange suggested intro articles for
beginners. I don't think this is an appropriate place for that, since any
good articles are not going to be really short (translation: they'll cost
the net a lot of money). However, it seems like an excellent idea to post
references to good intro material, both for beginners and for teachers of
beginners.

So how about it? Any references to good intro articles on NLP, especially
ones that clearly discuss both the syntax-first and integrated approaches?
(Also, is James Allen's new book good?)

|=-=-=-=- "COMPUTER TALKS WITH THE DEAD: New microchip receives  -=-=-=-=|
|  brainwaves from beyond the grave" -- Weekly World News, 5/26/87       |
| Paul R. Wilson  EECS Dept.(M/C 154)  UIC Box 4348  Chicago, IL 60680   |
| ARPA: uicbert!wilson@uxc.cso.uiuc.edu  UUCP: ihnp4!uicbert!wilson      |
|=-=-=-=-=-=- if no answer try: ihnp4!gargoyle!sphinx!code  -=-=-=-=-=-=-|
hughes@endor.harvard.edu (Brian Hughes) (05/28/87)
In article <1116@houdi.UUCP> marty1@houdi.UUCP (M.BRILLIANT) writes:
(summarized)
>In article <13263@watmath.UUCP>, erhoogerbeet@watmath.UUCP writes:
>> ...
>> Is there a Backus-Naur Form for the English language itself or is this too
>> complicated? ... Basically, what I am asking is it possible to do syntactic
>> checking as if "compiling" a sentence with rules set down in some BNF?

Natural language is not context free (though some people disagree on
this). BNF formalisms cannot deal with context-sensitive languages.

>About 30 years ago when I was at MIT doing graduate study in EE, my
>wife was talking with a guy named Chomsky who wanted to do machine
>translation. The effort resulted in new approaches to English grammar,
>but not in machine translation.

While in a strict sense this is true, Chomsky's transformational grammar
seems to be almost universally accepted as the basis upon which to build
models that deal with the syntax of natural languages. This is true for
computerized models as well as for purely abstract models.

>We read, write, talk, listen, understand and misunderstand each other
>in this chaos because our brains are not structured like computers.

While I agree that my brain is not structured like an IBM PC (at least IBM
hasn't asked me for royalties :-)) or any existing silicon processor, I
don't agree with the implication that a computer cannot, a priori,
understand natural language. We just haven't built intelligent enough
systems yet (hardware and software architectures). I realize that Marty
Brilliant may not have meant the implication I read; I just wanted to make
my position clear.

>> As I understand it so far, natural language processing would have at least
>> two levels (syntactic, semantic) and that syntactic checking level would
>> be the basis of the other.
>
>True.

But that's not the end of the matter. You also have to deal with the
higher levels of language organization, such as discourse (e.g., a
conversation).
An utterance may refer back to an entity introduced in a previous
utterance in "shorthand" - by a pronoun or a short referring expression.
To understand that shorthand, we somehow are able to (unconsciously)
retrieve the full referent. At an even higher level, a discourse may go
from one concept to a sub-concept, engage in temporary diversions, and
jump from one thing to another, but we can still understand it. People are
starting to explicate the rules of discourse grammar, but lots remains to
be done.

>Seriously, I would like to know what (in plain English, if possible) is
>going on in formal English grammar and natural language parsing.

For a nice, powerful method of parsing English into a deep structure
representation, check out ATNs (Augmented Transition Networks). Bill Woods
developed this method in the late 60's. You can implement a simple ATN in
a couple of pages of LISP (see Jon Amsterdam's article in AI Expert a few
months ago, also online on AI Expert's BBS).

For a full ATN writeup, including code, see The Lunar Sciences Natural
Language Information System: Final Report, by W.A. Woods, R.M. Kaplan, and
B. Nash-Webber, BBN Report #2378, BBN, 50 Moulton St., Cambridge, MA. Also
see Madeline Bates' article "The Theory and Practice of Augmented
Transition Networks" in Natural Language Communication with Computers,
Bolc, L., ed., Springer-Verlag, 1978.

For a lot of different views of language understanding, all the way from
syntax through discourse, see Readings in Natural Language Processing,
Grosz, B., Jones, K.S., and Webber, B., eds., 1986, Morgan Kaufmann. After
reading this, read a transcript of a naturally occurring informal
conversation to see how far we have yet to go.

Please excuse the length of this - I just finished 2 courses in this stuff
with Bill Woods and Barbara Grosz, and it's still buzzing in my head.

--------------------------------------------------------------------
Disclaimer: Correct statements mainly due to profs.
Woods & Grosz, stupidities all my own.
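The "couple of pages" claim above is easy to believe; a stripped-down ATN fits in far less, here in Python rather than LISP. The network layout, register names, and five-word lexicon are invented for illustration, and a real ATN also has recursive PUSH arcs into subnetworks plus arbitrary tests and actions on arcs:

```python
# A tiny transition network with registers, in the spirit of Woods' ATNs.
# Network, lexicon, and register names are illustrative inventions.

LEXICON = {"the": "DET", "dog": "N", "cat": "N", "chased": "V"}

# Each arc: (state, category consumed, next state, register to fill)
ARCS = [
    ("S0", "DET", "S1", "subj_det"),
    ("S1", "N",   "S2", "subject"),
    ("S2", "V",   "S3", "verb"),
    ("S3", "DET", "S4", "obj_det"),
    ("S4", "N",   "SF", "object"),
]
FINAL = {"SF"}

def atn_parse(sentence):
    """Walk the network word by word, filling registers as arcs are
    taken. Returns the registers (a crude 'deep structure') or None."""
    state, registers = "S0", {}
    for word in sentence.lower().split():
        cat = LEXICON.get(word)
        for src, arc_cat, dst, reg in ARCS:
            if src == state and arc_cat == cat:
                registers[reg] = word
                state = dst
                break
        else:
            return None          # no arc accepts this word
    return registers if state in FINAL else None

print(atn_parse("The dog chased the cat"))
# {'subj_det': 'the', 'subject': 'dog', 'verb': 'chased',
#  'obj_det': 'the', 'object': 'cat'}
```

The registers are what distinguish this from a plain finite automaton: the output is a structure naming subject, verb, and object, not just an accept/reject bit.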
randyg@iscuva.ISCS.COM (Randy Gordon) (05/28/87)
*Sigh*, when I make an unconscious slip, it's a doozy, ain't it? It was my
third try at getting it past the article eater, and I mistyped. OF COURSE
it's semantics more than syntactics...

Anyhow, some of the better intros:

The best of the current crop seems to be Introduction to Natural Language
Processing, by Mary D. Harris. It is clear, well written, and at a
beginner's level (approximately equivalent to a Scientific American
article). C 1985, Reston.

Everyone's mentioned Conceptual Structures: Information Processing in Mind
and Machine, by J.F. Sowa. For some reason I get a headache when I read
it, which is unfair, since it is a very well written book and appears to
be easy.

Schank emits books at the speed of a Turbo Prolog compiler, and they are
ALL readable. If you are into Lisp (for which I pity you), then a rather
old book, Inside Computer Understanding: Five Programs Plus Miniatures, is
about the nicest intro to Conceptual Dependency around. It has very
thorough explanations plus programs written in pre-Common Lisp. If you
ignore the instructions on how to implement primitives that are probably
already supplied by your Lisp (such as the MAP??? functions), the programs
are easy to implement.

Wilensky has done some rather nice work, such as UC, a smart-help Unix
consultant natural language processor I have been dying to get my hands
on, and the PEARL, PAM, PANDORA class of programs. He has a new book out,
Planning and Understanding: A Computational Approach to Human Reasoning,
Addison-Wesley, that I am just starting to read, but it looks like a
fascinating example of the forefronts of the science.

I have been waiting FOREVER for the second volume of Terry Winograd's
Language as a Cognitive Process, the first volume of which does a
wonderful, if slightly uneven, job of explaining syntax. It's an excellent
reference if you are reading someone else's paper and trying to figure out
what they are talking about when they discuss some obscure grammar
problem.
These should get you started. Randy Gordon, "No relation to the Idiot who spoonerized the previous message"
bds@mtgzz.UUCP (05/28/87)
In article <2112@husc6.UUCP>, hughes@endor.harvard.edu (Brian Hughes) writes:
> In article <1116@houdi.UUCP> marty1@houdi.UUCP (M.BRILLIANT) writes:
> (summarized)
> >In article <13263@watmath.UUCP>, erhoogerbeet@watmath.UUCP writes:
> >> ...
> >> Is there a Backus-Naur Form for the English language itself or is this too
> >> complicated? ... Basically, what I am asking is it possible to do syntactic
> >> checking as if "compiling" a sentence with rules set down in some BNF?
>
> Natural language is not context free (though some people disagree
> on this). BNF formalisms cannot deal with context sensitive languages

What I've seen in the literature is a definite trend towards considering
English as context free. I think it was Patrick Winston who wrote in his
AI text that examples of context-sensitive English were really examples of
ambiguous English. Context-free parsing algorithms can handle ambiguous
grammars, and in fact a BNF-like formalism for parsing English was done in
the LSP system (described in a book I don't have handy right now). The
ambiguity in parses was resolved through a "restriction language", which
was essentially a set of rules that avoided invalid parses (this is
analogous to how many C compilers handle parsing of "lvalues" - the
grammar just knows about expressions, and the semantics worry about the
"lvalue" quality). The LSP grammar for English was quite large (but so are
the ATNs; take a peek at the one for LUNAR!) and was still evolving even
as the book was written. Still, the issue has not been resolved, to my
knowledge.

It's also worthwhile to look at some work being done with WASP systems
(aka the Marcus parser - see Winston's AI book again). There are serious
arguments that WASP systems model human parsing of English, and they are
being used as a basis for theories on how English is learned (see
"Acquisition of Syntactic Knowledge", by R.C. Berwick; MIT Press, 1985).
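The "restriction language" idea, letting an ambiguous grammar overgenerate and then filtering parses with separate rules, can be sketched as follows. The parse records and the single restriction below are invented for illustration; LSP's actual restriction language is far richer:

```python
# Sketch of grammar-overgenerates-then-filter, like a C compiler whose
# grammar accepts any expression left of '=' and rejects non-lvalues
# in a later check. All data and rules here are illustrative inventions.

# Two parses of "I saw the man with the telescope": the PP attaches
# either to the verb (instrument of seeing) or to the object noun
# (the man who has a telescope).
parses = [
    {"verb": "saw", "object": "man", "pp_attach": "verb"},
    {"verb": "saw", "object": "man", "pp_attach": "object"},
]

def restriction_instrument(parse):
    """An invented restriction: with a perception verb, prefer reading
    'with the telescope' as an instrument, i.e. verb attachment."""
    if parse["verb"] == "saw" and parse["pp_attach"] == "object":
        return False   # reject the noun-attachment reading
    return True

valid = [p for p in parses if restriction_instrument(p)]
print(valid)  # [{'verb': 'saw', 'object': 'man', 'pp_attach': 'verb'}]
```

The point of the design is separation of concerns: the grammar stays small and purely context free, while the knowledge that rules out bad parses lives in independently editable restrictions.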
goldberg@su-russell.ARPA (Jeffrey Goldberg) (05/30/87)
In article <2112@husc6.UUCP> hughes@endor.UUCP (Brian Hughes) writes:
>In article <1116@houdi.UUCP> marty1@houdi.UUCP (M.BRILLIANT) writes:
> (summarized)
>>In article <13263@watmath.UUCP>, erhoogerbeet@watmath.UUCP writes:
>>> ...
>>> Is there a Backus-Naur Form for the English language itself or is this too
>>> complicated? ... Basically, what I am asking is it possible to do syntactic
>>> checking as if "compiling" a sentence with rules set down in some BNF?
>
> Natural language is not context free (though some people disagree
>on this). BNF formalisms cannot deal with context sensitive languages

I don't think that there is any serious disagreement here. The work done
by Culy on Bambara reduplication and by Shieber on Swiss German
cross-serial dependencies has convinced the last holdouts for CFness
(Geoff Pullum, Gerald Gazdar, and their students: me, Culy, etc.).

>>About 30 years ago when I was at MIT doing graduate study in EE, my
>>wife was talking with a guy named Chomsky who wanted to do machine
>>translation. The effort resulted in new approaches to English grammar,
>>but not in machine translation.
>
> While in a strict sense this is true, Chomsky's transformational
>grammer seems to be almost universaly accepted as the basis upon which to
>build models that deal with the syntax of natural languages. This is true
>for computerized models as well as pure abstract models.

This is hardly true at all. It is true that "generative grammar" is nearly
universally accepted, and this comes from Chomsky. While the most popular
current generative theory is transformational (Government and Binding
theory), the role of transformations has been reduced radically, and much
more emphasis is placed on interacting well-formedness conditions on
different levels of representation. Substantial minority theories,
Generalized Phrase Structure Grammar and Lexical Functional Grammar, do
not employ transformations.
A summary of these three theories can be found in "Lectures on
Contemporary Syntactic Theories: An Introduction to Government-Binding
Theory, Generalized Phrase Structure Grammar, and Lexical-Functional
Grammar" by Peter Sells, published by the Center for the Study of Language
and Information and distributed by the University of Chicago Press.

I have seen implementations based on LFG and GPSG (and an offshoot of
that), as well as some other non-transformational models. I have only once
seen a GB-based parser. It was very clever, but it only parsed four
sentences. None of these theories was constructed with computer processing
in mind, but it does turn out that it is often easier to build a parser
based on non-transformational representations. None of the authors of
these theories would claim that their theory is a better linguistic theory
because of this property.

>>> As I understand it so far, natural language processing would have at least
>>> two levels (syntactic, semantic) and that syntactic checking level would
>>> be the basis of the other.

I have seen parsers that build up semantic representations along with the
syntax, in which there is no sense that the syntax is prior.

Again, I am directing follow-ups to my follow-up to sci.lang.
--
Jeff Goldberg
ARPA   goldberg@russell.stanford.edu
UUCP   ...!ucbvax!russell.stanford.edu!goldberg
tfra@ur-tut.UUCP (Tom Frauenhofer) (06/01/87)
[Et tu, line-eater?] Actually, there is a (one-paragraph) discussion comparing BNF versus Transition Networks in the latest issue of AI Magazine (Volume 8, Number 1). It is part of the article "YANLI: A Powerful Natural Language Front-End Tool" by John C. Glasgow II. It even includes an example of a BNF and a Transition Network representation of the same grammar fragment.
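For readers without that issue handy, the flavor of the comparison can be reproduced with an invented fragment: the same noun-phrase rule written once in BNF style and once as a transition network, where a loop on a state replaces the recursive rule. Everything below is an illustrative sketch, not Glasgow's example:

```python
# The same NP fragment two ways. The BNF dict is shown only for
# comparison; the recognizer below walks the network form.
# Fragment, states, and lexicon are illustrative inventions.

# BNF style:   NP ::= DET ADJS N       ADJS ::= ADJ ADJS | (empty)
BNF = {"NP": [["DET", "ADJS", "N"]], "ADJS": [["ADJ", "ADJS"], []]}

# Transition-network style: the recursive ADJS rule becomes a loop.
NETWORK = {
    "q0": {"DET": "q1"},
    "q1": {"ADJ": "q1", "N": "q2"},   # ADJ loops on q1; N finishes
}
FINAL = "q2"
LEXICON = {"the": "DET", "big": "ADJ", "small": "ADJ", "dog": "N"}

def accepts(phrase):
    """Walk the network; accept iff every word is consumed and we
    end in the final state."""
    state = "q0"
    for word in phrase.lower().split():
        cat = LEXICON.get(word)
        state = NETWORK.get(state, {}).get(cat)
        if state is None:
            return False
    return state == FINAL

print(accepts("the big dog"))   # True
print(accepts("big the dog"))   # False
```

The two notations accept the same strings; the trade-off is that recursion in the BNF becomes iteration in the network, which is often easier to follow for simple fragments and harder for deeply nested ones.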