[comp.ai] English grammar

marty1@houdi.UUCP (05/27/87)

In article <13263@watmath.UUCP>, erhoogerbeet@watmath.UUCP writes:
> Hello. I am an undergrad going into computer science....
> 
> Is there a Backus-Naur Form for the English language itself or is this too
> complicated? If not, how is it that we can understand a huge variety of
> different sentence forms and still recognize that some are syntactically
> incorrect? Basically, what I am asking is it possible to do syntactic
> checking as if "compiling" a sentence with rules set down in some BNF?

About 30 years ago when I was at MIT doing graduate study in EE, my
wife was talking with a guy named Chomsky who wanted to do machine
translation.  The effort resulted in new approaches to English grammar,
but not in machine translation.

Basically, people do not classify words and structures neatly.  Some
words are syntactically unique, and some structures are valid only for
a specific sequence of words.  Some sentences are recognized as
syntactically correct by some speakers of English but not others.
We read, write, talk, listen, understand and misunderstand each other
in this chaos because our brains are not structured like computers.

> As I understand it so far, natural language processing would have at least
> two levels (syntactic, semantic) and that syntactic checking level would
> be the basis of the other.

True.

> "Colourless green ideas sleep furiously." syntactically but not semantically
>   correct.

Your ideas, being those of an undergraduate, are green, and further,
since you lack imagination, are colorless.  Put them to sleep.  I hope
they sleep calmly, but if they sleep furiously, that's your problem. :-)
Now that may be semantically correct, but is it factually correct?
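
The syntactic half of that is easy enough to mechanize.  Here is a toy
sketch (my own one-rule grammar and five-word lexicon, nothing like a real
system) of a Lisp recognizer that cheerfully accepts Chomsky's sentence,
because the words fall into the right categories in the right order;
nothing in it knows that ideas have no colour:

;;; One rule -- S -> ADJ* NOUN VERB ADV -- plus a five-word lexicon.
(defparameter *categories*
  '((colourless adj) (green adj) (ideas noun)
    (sleep verb) (furiously adv)))

(defun category (word) (second (assoc word *categories*)))

(defun sentence-p (words)
  "Syntax only: accept ADJ* NOUN VERB ADV, with no notion of meaning."
  (let ((rest (member-if (lambda (w) (not (eq (category w) 'adj))) words)))
    (and (eq (category (first rest))  'noun)
         (eq (category (second rest)) 'verb)
         (eq (category (third rest))  'adv)
         (null (nthcdr 3 rest)))))

;; (sentence-p '(colourless green ideas sleep furiously))  => T
;; (sentence-p '(furiously sleep ideas green colourless))  => NIL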

Seriously, I would like to know what (in plain English, if possible) is
going on in formal English grammar and natural language parsing.

M. B. Brilliant					Marty
AT&T-BL HO 3D-520	(201)-949-1858
Holmdel, NJ 07733	ihnp4!houdi!marty1

randyg@iscuva.ISCS.COM (Randy Gordon) (05/27/87)

There are two major "camps" of natural language techniques. 

The "Old School" is the syntactic/transformational grammar types, which
comprise the majority of folks actually producing natural language parsers.
Harris's Intellect, for example.  Some translation work is done by them:
Tomita has a nice book on English/Oriental translation, and McCord's LMT
does English/German translation, for example.  Personally, I find their
efforts brilliant, but as futile as the dying stages of the "ether" theory.

The "Young Turks" are the descendants of Schank's Conceptual Dependency
theory: Sowa, Schank, etc.  These generally treat semantics as secondary
to syntactics.  Schank's company, Cognitive Systems, several years ago
produced a system for a Belgian bank that could translate six languages
well enough to act as an interface to an EFT system.  The nice thing about
concept-based parsing is that you don't have to fully map the sentence to
understand it.  This is very useful in unrestricted input systems.
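
A toy illustration (mine, not Cognitive Systems' code) of what not fully
mapping the sentence buys you: look only for the words that signal a known
concept, fill in its slots, and skip everything else, so even garbled or
ungrammatical input can still yield the concept:

;;; Concept sketch: a funds TRANSFER with an amount, a currency and a
;;; destination.  Anything else in the input is ignored.
(defparameter *money-words* '((dollars . usd) (francs . bef) (marks . dem)))

(defun parse-transfer (words)
  "Fill a TRANSFER concept from a word list, or return NIL."
  (let (amount currency dest)
    (loop for (w next) on words
          do (cond ((numberp w) (setf amount w))
                   ((assoc w *money-words*)
                    (setf currency (cdr (assoc w *money-words*))))
                   ((eq w 'to) (setf dest next))))
    (when (and amount currency dest)
      (list :concept 'transfer :amount amount :currency currency :to dest))))

;; Both of these yield the same concept, though only the first is a
;; well-formed sentence:
;;   (parse-transfer '(please wire 500 dollars to smith immediately))
;;   (parse-transfer '(dollars 500 to smith wire))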

There is a small group that believes you ought to pick the input words off
of menus.  It works, and with modifications works better than ANY of the
other methods, but frankly, I don't consider it natural language processing.

A lot of the concept-based parsing techniques have fascinating consequences
for expert systems and database-representation work, and may very well
represent the next "hot button" in AI.

Randy Gordon     "Tao ku tse fun pee"

srt@CS.UCLA.EDU (05/27/87)

In article <529@iscuva.ISCS.COM> randyg@iscuva.UUCP (Randy Gordon) writes:
>
>The "Young Turks" are the descendants of Schank's Conceptual Dependency
>theory, Sowa, Schank, etc. These generally treat semantics as secondary 
>to syntactics.
>

I think you probably mean "syntactics as secondary to semantics".

The important distinction is that the "Old School" believes that there
exist semantics-free language principles (Universal Grammar, the Language
Acquisition Device and so on) and as a consequence studies language
separately from meaning (i.e., semantics).

The "Young Turks" study language as a (natural) representation for internal
concepts (Schank's CD is more a theory of concept representation than of
linguistics).  As a consequence, the research focus in this group is on
the processes involved in turning language into concepts and vice versa.
"Semantics" (in the sense of word meanings and concept manipulation)
has proven much more important than syntactic information for correctly
translating from language to concepts.
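
For a feel of what "turning language into concepts" means, here is a rough
sketch (mine, not Schank's exact notation) of the structure CD builds for
"John gave Mary a book": a transfer of possession based on the ATRANS
primitive.  "Mary got a book from John" should come out as essentially the
same structure, and generation is the same mapping run in the other
direction:

;;; Conceptual-Dependency-style structure for "John gave Mary a book".
(defparameter *gave-concept*
  '(:primitive atrans          ; abstract transfer of possession
    :actor     john
    :object    book
    :from      john
    :to        mary
    :tense     past))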

Unlike the "Old School" that tries to study syntax in isolation, the "Young
Turks" recognize the importance of syntax and use syntactic information
whenever appropriate.  The apparent bias arises because of the actual
relative importance of semantics vs. syntax, not because of any research
methodology.
 
  Scott R. Turner
  UCLA Computer Science     "But If I Graduate,
  Domain: srt@ucla.cs.edu       I'll Have to Work for a Living"
  UUCP:  ...!{cepu,ihnp4,trwspp,ucbvax}!ucla-cs!srt

code@sphinx.uchicago.edu (paul robinson wilson) (05/28/87)

In article <529@iscuva.ISCS.COM> randyg@iscuva.UUCP (Randy Gordon) writes:
>
>There are two major "camps" of natural language techniques. 
>
>The "Old School" is the syntactic/transformational grammar types, which
>[...]
>The "Young Turks" are the descendants of Schank's Conceptual Dependency
>theory, Sowa, Schank, etc. These generally treat semantics as secondary 
>to syntactics...
  Uh, I think it's the other way around (Schankians treat semantics as
primary), but it's a very important point.

The posting that started this exchange suggested intro articles for
beginners.  I don't think this is an appropriate place for that, since
any good articles are not going to be really short (translation: they'll
cost the net a lot of money).

However, it seems like an excellent idea to post references to good intro
material, both for beginners and for teachers of beginners.

So how about it?  Any references to good intro articles on NLP, especially
ones that clearly discuss both the syntax-first and integrated approaches?

(Also, is James Allen's new book good?)

|=-=-=-=-   "COMPUTER TALKS WITH THE DEAD:  New microchip receives    -=-=-=-=|
|       brainwaves from beyond the grave" -- Weekly World News, 5/26/87       |
|  Paul R. Wilson   EECS Dept.(M/C 154)   UIC   Box 4348   Chicago, IL 60680  |
|    ARPA: uicbert!wilson@uxc.cso.uiuc.edu    UUCP: ihnp4!uicbert!wilson      |
|=-=-=-=-=-=-=-  if no answer try:  ihnp4!gargoyle!sphinx!code  -=-=-=-=-=-=-=|

hughes@endor.harvard.edu (Brian Hughes) (05/28/87)

In article <1116@houdi.UUCP> marty1@houdi.UUCP (M.BRILLIANT) writes:
	(summarized)
>In article <13263@watmath.UUCP>, erhoogerbeet@watmath.UUCP writes:
>> ...
>> Is there a Backus-Naur Form for the English language itself or is this too
>> complicated? ... Basically, what I am asking is it possible to do syntactic
>> checking as if "compiling" a sentence with rules set down in some BNF?

	Natural language is not context free (though some people disagree
on this).  BNF formalisms cannot deal with context sensitive languages.
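
The usual quick illustration (mine, and of course whether natural language
really forces it is exactly what people argue about) is cross-serial
agreement, as in "the dogs, the cats and the birds bark, meow and chirp,
respectively": the i-th noun has to match the i-th verb.  Enforcing that
pairing for arbitrary lengths is a copy-language-like pattern that no set
of context-free (BNF) rules can capture, though it is trivial for a
procedure with memory:

;;; Toy check of cross-serial ("respectively") agreement.
(defparameter *sound-of* '((dogs . bark) (cats . meow) (birds . chirp)))

(defun respectively-ok-p (nouns verbs)
  "T when the i-th verb is the right sound for the i-th noun."
  (and (= (length nouns) (length verbs))
       (every (lambda (n v) (eq (cdr (assoc n *sound-of*)) v))
              nouns verbs)))

;; (respectively-ok-p '(dogs cats birds) '(bark meow chirp))  => T
;; (respectively-ok-p '(dogs cats birds) '(meow bark chirp))  => NIL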

>About 30 years ago when I was at MIT doing graduate study in EE, my
>wife was talking with a guy named Chomsky who wanted to do machine
>translation.  The effort resulted in new approaches to English grammar,
>but not in machine translation.

	While in a strict sense this is true, Chomsky's transformational
grammar seems to be almost universally accepted as the basis upon which to
build models that deal with the syntax of natural languages.  This is true
for computerized models as well as purely abstract models.

>We read, write, talk, listen, understand and misunderstand each other
>in this chaos because our brains are not structured like computers.

	While I agree that my brain is not structured like an IBM PC
(at least IBM hasn't asked me for royalties :-)) or any existing silicon
processor, I don't agree with the implication that a computer cannot,
a priori, understand natural language. We just haven't built intelligent
enough systems yet (hardware & software architectures). I realize that
Marty Brilliant may not have meant the implication I read; just wanted
to make my position clear.

>> As I understand it so far, natural language processing would have at least
>> two levels (syntactic, semantic) and that syntactic checking level would
>> be the basis of the other.
>
>True.

	But that's not the end of the matter.  You also have to deal with the
higher levels of language organization, such as discourse (e.g., a
conversation).  An utterance may refer back to an entity introduced in a
previous utterance in "shorthand" - by a pronoun or a short referring
expression.  To understand that shorthand, we somehow are able to
(unconsciously) retrieve the full referent.  At an even higher level, a
discourse may go from one concept to a sub-concept, engage in temporary
diversions, and jump from one thing to another, but we can understand it.
People are starting to explicate the rules of discourse grammar, but lots
remains to be done.
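
A cartoon version of that referent retrieval (my own toy, not anybody's
discourse theory): keep a history of the entities mentioned so far and
resolve a pronoun to the most recent one whose features match:

;;; "Marty posted an article.  Barbara answered it."
(defparameter *entities*            ; most recently mentioned first
  '((barbara :gender fem  :number sg)
    (article :gender neut :number sg)
    (marty   :gender masc :number sg)))

(defun resolve-pronoun (gender number entities)
  "Most recently mentioned entity matching GENDER and NUMBER."
  (first (find-if (lambda (e)
                    (and (eq (getf (rest e) :gender) gender)
                         (eq (getf (rest e) :number) number)))
                  entities)))

;; (resolve-pronoun 'neut 'sg *entities*)  => ARTICLE   ("it")
;; (resolve-pronoun 'masc 'sg *entities*)  => MARTY     ("he")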


>Seriously, I would like to know what (in plain English, if possible) is
>going on in formal English grammar and natural language parsing.

	For a nice, powerful method of parsing English into a deep
structure representation, check out ATNs (Augmented Transition Networks).
Bill Woods developed this method in the late 60's. You can implement a
simple ATN in a couple of pages of LISP (see Jon Amsterdam's article in
AI Expert a few months ago, also online in AI Expert's BBS). For a
full ATN writeup, including code, see The Lunar Sciences Natural Language
Information System: Final Report by W.A. Woods, R.M. Kaplan, and
B. Nash-Webber, BBN Report #2378, BBN, 50 Moulton St., Cambridge, MA.
Also see Madeleine Bates' article "The Theory and Practice of Augmented
Transition Networks" in Natural Language Communication with Computers,
Bolc, L., ed., Springer-Verlag, 1978.
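
To give the flavor (this is my own ten-minute toy, hand-coded rather than
table-driven the way a real ATN interpreter would be, and nowhere near the
LUNAR grammar): the arcs of an ATN can test and set registers as well as
consume words.  Here the only register is NUMBER, used for subject-verb
agreement:

;;; Lexicon entries are (word category number).
(defparameter *lexicon*
  '((the   det  any)      (a     det  singular)
    (dog   noun singular) (dogs  noun plural)
    (bone  noun singular) (bones noun plural)
    (chews verb singular) (chew  verb plural)))

(defun lookup (word) (cdr (assoc word *lexicon*)))
(defun agree (a b) (or (eq a 'any) (eq b 'any) (eq a b)))

;; NP network: optional DET arc, then a NOUN arc that must agree with
;; the NUMBER register; returns (number . remaining-words) or NIL.
(defun parse-np (words)
  (let ((entry (lookup (first words)))
        (rest  words)
        (num   'any))
    (when (eq (first entry) 'det)
      (setf num   (second entry)
            rest  (cdr words)
            entry (lookup (first rest))))
    (when (and entry (eq (first entry) 'noun) (agree num (second entry)))
      (cons (second entry) (cdr rest)))))

;; S network: NP, a VERB arc tested against the NP's NUMBER register,
;; then an optional object NP; succeed only if all words are used up.
(defun parse-s (words)
  (let ((np (parse-np words)))
    (when np
      (let ((entry (lookup (second np))))
        (when (and entry (eq (first entry) 'verb)
                   (agree (car np) (second entry)))
          (let ((remaining (cddr np)))
            (or (null remaining)
                (let ((obj (parse-np remaining)))
                  (and obj (null (cdr obj)))))))))))

;; (parse-s '(the dog chews a bone))   => T
;; (parse-s '(the dogs chews a bone))  => NIL   ; agreement register fails
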
	For a lot of different views of language understanding, all the
way from syntax through discourse, see Readings in Natural Language
Processing, Grosz, B., Jones, K.S., and Webber, B., eds., 1986,
Morgan Kaufmann.  After reading this, read a transcript of a naturally
occurring informal conversation to see how far we have yet to go.
	Please excuse the length of this - I just finished 2 courses
in this stuff with Bill Woods and Barbara Grosz, & it's still buzzing
in my head.

--------------------------------------------------------------------
Disclaimer: Correct statements mainly due to profs. Woods & Grosz,
stupidities all my own.

randyg@iscuva.ISCS.COM (Randy Gordon) (05/28/87)

*Sigh*, when I make an unconscious slip, it's a doozy, ain't it?

It was my third try at getting it past the article eater, and I mistyped.
OF COURSE it's semantics more than syntactics...

Anyhow, some of the better intros:

The best of the current crop seems to be Introduction to Natural Language
Processing by Mary D. Harris.  It is clear, well written, and at a beginner's
level (approximately equivalent to a Scientific American article).
(c) 1985, Reston.

Everyone's mentioned Conceptual Structures: Information Processing in Mind and
Machine by J.F. Sowa. For some reason, I get a headache when I read it, which
is unfair since it is a very well written book, and appears to be easy.

Schank emits books at the speed of a Turbo Prolog compiler, and they are ALL
readable.  If you are into Lisp (for which I pity you) then a rather old book,
Inside Computer Understanding: Five Programs Plus Miniatures, is about the
nicest intro to Conceptual Dependency around.  It has very thorough
explanations plus programs written in pre-Common-Lisp dialects.  If you ignore
the instructions on how to implement primitives that are probably already
supplied by your Lisp (such as the MAP??? functions), the programs are easy
to get running.


Wilensky has done some rather nice work, such as UC, a smart natural-language
Unix help consultant I have been dying to get my hands on, and the PEARL,
PAM, PANDORA class of programs.  He has a new book out, Planning and
Understanding: A Computational Approach to Human Reasoning, Addison-Wesley,
that I am just starting to read, but it looks like a fascinating example of
the forefront of the science.

I have been waiting FOREVER for the second volume of Terry Winograd's
Language as a Cognitive Process; the first volume does a wonderful, if
slightly uneven, job of explaining syntax.  It's an excellent reference if
you are reading someone else's paper and trying to figure out what they
are talking about when they discuss some obscure grammar problem.


These should get you started. 


Randy Gordon, "No relation to the Idiot who spoonerized the previous message"

bds@mtgzz.UUCP (05/28/87)

In article <2112@husc6.UUCP>, hughes@endor.harvard.edu (Brian Hughes) writes:
> In article <1116@houdi.UUCP> marty1@houdi.UUCP (M.BRILLIANT) writes:
> 	(summarized)
> >In article <13263@watmath.UUCP>, erhoogerbeet@watmath.UUCP writes:
> >> ...
> >> Is there a Backus-Naur Form for the English language itself or is this too
> >> complicated? ... Basically, what I am asking is it possible to do syntactic
> >> checking as if "compiling" a sentence with rules set down in some BNF?
> 
> 	Natural language is not context free (though some people disagree
> on this). BNF formalisms cannot deal with context sensitive languages

	What I've seen in the literature is a definite trend towards
considering English as context free. I think it was Patrick Winston who
wrote in his AI text that examples of context sensitive English were
really examples of ambiguous English. Context free parsing algorithms
can handle ambiguous grammars, and in fact a BNF-like formalism for
parsing English was done in the LSP system (described in a book I don't
have handy right now).  Ambiguities in the parses were
resolved through a "restriction language" which was essentially a
set of rules that avoided invalid parses (this is analogous to how many
C compilers handle parsing of "lvalues" - the grammar just knows about
expressions, the semantics worry about the "lvalue" quality).
The LSP grammar for English was quite large (but so are the ATNs; take
a peek at the one for LUNAR!) and was still evolving even as the book was
written. Still the issue has not been resolved to my knowledge.
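
A toy rendering (mine, not the actual LSP restriction language) of that
division of labor: let a loose grammar propose every analysis, then run
restriction rules that throw out the bad ones, the same way the C grammar
happily parses "3 = x" as expr = expr and a later check complains about
the left side:

;;; Analyses come out of the parser as feature lists; restrictions are
;;; predicates that a surviving parse must satisfy.
(defun transitive-needs-object (parse)
  "Reject an analysis where a verb marked transitive got no object."
  (not (and (getf parse :transitive) (null (getf parse :object)))))

(defun apply-restrictions (parses rules)
  (remove-if-not (lambda (p) (every (lambda (r) (funcall r p)) rules))
                 parses))

;; Two proposed analyses of the same clause; only the first survives:
;; (apply-restrictions '((:verb flies :transitive nil :object nil)
;;                       (:verb time  :transitive t   :object nil))
;;                     (list #'transitive-needs-object))
;; => ((:VERB FLIES :TRANSITIVE NIL :OBJECT NIL))
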
	It's also worthwhile to look at some work being done
with WASP systems (aka the Marcus parser - see Winston's AI book again).
There are serious arguments that WASP systems model human parsing of
English, and they are being used as a basis for theories on how English is
learned (see "Acquisition of Syntactic Knowledge", by R.C. Berwick;
MIT Press 1985).

goldberg@su-russell.ARPA (Jeffrey Goldberg) (05/30/87)

In article <2112@husc6.UUCP> hughes@endor.UUCP (Brian Hughes) writes:
>In article <1116@houdi.UUCP> marty1@houdi.UUCP (M.BRILLIANT) writes:
>	(summarized)
>>In article <13263@watmath.UUCP>, erhoogerbeet@watmath.UUCP writes:
>>> ...
>>> Is there a Backus-Naur Form for the English language itself or is this too
>>> complicated? ... Basically, what I am asking is it possible to do syntactic
>>> checking as if "compiling" a sentence with rules set down in some BNF?

>	Natural language is not context free (though some people disagree
>on this). BNF formalisms cannot deal with context sensitive languages

I don't think that there is any serious disagreement here.  The
work done by Culy on Bambara reduplication and Shieber on Swiss German
cross-serial dependencies has convinced the last holdouts for CFness
(Geoff Pullum, Gerald Gazdar, and their students: me, Culy, etc.).

>>About 30 years ago when I was at MIT doing graduate study in EE, my
>>wife was talking with a guy named Chomsky who wanted to do machine
>>translation.  The effort resulted in new approaches to English grammar,
>>but not in machine translation.

>	While in a strict sense this is true, Chomsky's transformational
>grammer seems to be almost universaly accepted as the basis upon which to
>build models that deal with the syntax of natural languages. This is true
>for computerized models as well as pure abstract models.

This is hardly true at all.  It is true that "generative grammar" is
nearly universally accepted, and this comes from Chomsky.  While
the most popular current generative theory is transformational
(Government and Binding theory), the role of transformations has
been reduced radically, and much more emphasis is placed on
interacting well formedness conditions on different levels of
representations.

Substantial minority theories, Generalized Phrase Structure
Grammar, and Lexical Functional Grammar, do not employ
transformations.

A summary of these three theories can be found in "Lectures
on Contemporary Syntactic Theories:  An Introduction to
Government-Binding Theory, Generalized Phrase Structure Grammar,
and Lexical-Functional Grammar" by Peter Sells.  Published by the
Center for the Study of Language and Information, and distributed
by the University of Chicago Press.

I have seen implementations based on LFG, GPSG (and an offshoot
of that) as well as some other nontransformational models.  I
have only once seen a GB-based parser.  It was very clever, but
it only parsed four sentences.

None of these theories were constructed with computer processing in
mind, but it does turn out that it is often easier to build a
parser based on nontransformational representations.  None of the
authors of these theories would claim that their theory was a
better linguistic theory because of this property.

>>> As I understand it so far, natural language processing would have at least
>>> two levels (syntactic, semantic) and that syntactic checking level would
>>> be the basis of the other.

I have seen parsers that build up semantic representations along
with the syntactic ones, with no sense that the syntax is prior.

Again, I am directing follow-ups to my follow-up to sci.lang.

-- 
Jeff Goldberg 
ARPA   goldberg@russell.stanford.edu
UUCP   ...!ucbvax!russell.stanford.edu!goldberg

tfra@ur-tut.UUCP (Tom Frauenhofer) (06/01/87)

[Et tu, line-eater?]

Actually, there is a (one-paragraph) discussion comparing BNF and Transition
Networks in the latest issue of AI Magazine (Volume 8, Number 1).  It is part
of the article "YANLI: A Powerful Natural Language Front-End Tool" by
John C. Glasgow II.  It even includes an example of a BNF and a Transition
Network representation of the same grammar fragment.
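
For anyone who can't get at the magazine right away, here is the general
flavor of such a comparison (my own fragment, not Glasgow's example).
In BNF:

    <np>   ::= <det> <adjs> <noun>
    <adjs> ::= <adj> <adjs> | (empty)

and the same fragment as a transition network, written out as a list of
arcs (from-state, arc label, to-state):

    ((np/0 det  np/1)
     (np/1 adj  np/1)        ; any number of adjectives
     (np/1 noun np/done))    ; np/done is the accepting state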