[comp.lang.misc] Syntactical definition of English

val@wsccs.UUCP (Val Kartchner) (10/12/88)

     Does someone out there have a syntactical definition of English?  I
     would like to build English language parsers for various purposes
     including adventure game authoring systems.
		Thanks in advance,
			-=:[ VAL ]:=-
-- 
----  /\  ----------------------------------------------------------------
     /\/\  .    /\     |  Val Kartchner  {UT@WSC}  |  'vi' must go, this
    /    \/ \/\/  \    |  #include <disclaimer.h>  |  is non-negotiable.
===/ U i n T e c h \===!ihnp4!utah-cs!utah-gr!uplherc!sp7040!obie!val=====

skh@hpclskh.HP.COM (10/19/88)

If you get one...post it.  Should be an AMAZING grammar to see, if not good for
a few laughs.

djones@megatest.UUCP (Dave Jones) (10/21/88)

I recall having seen a hardback book of a few hundred pages, filled
with English language productions, nothing else.  That was about ten
years ago.  I don't remember the name of it.  It wouldn't help much
in writing shells, I fear, but it might be interesting to look at again.


		Dave J.

dougs@hcx2.SSD.HARRIS.COM (10/24/88)

>/* Written  8:12 am  Oct 22, 1988 by jkim@uhccux.uhcc.hawaii.edu */
>Clay Bond wrote:
>
>> a CFL is not going to describe English.
>
>Could you give us convincing evidence for this?
>If you are going to bring up the 1950s argument based on long-distance
>dependencies, I would recommend that you first read Gerald Gazdar (1982),
>"Phrase structure grammar," in Pauline Jacobson and Geoffrey K. Pullum (eds),
>The Nature of Syntactic Representation. Dordrecht: D. Reidel, 131-186.
>/* End of text */

Context-free languages have enough trouble adequately describing
programming languages.  Sure, they can do a half-decent job on the written
syntax as it appears in the file.  But to use syntactical productions to
recognize things such as various data types in expressions, or even worse,
checking that the number of parameters agrees between a caller and a callee
is either too exhaustive to be useful or just simply beyond a CFL.  Hey, if
a context-free grammer can't recognize the regular expression

              x  y  z  y  x     (note: this requires a pushdown machine with
	     a  b  c  b  a             multiple stacks, more power than an 
				       automata equivalent to a CFL can be)

how the hell is it going to handle English, or Spanish, or whatever?
Remember, we must check proper pluralization, subject-verb agreement, all
that good stuff.  For programming languages, the CFL describes the written
syntax and the semantic actions fill in the context-sensitive features
we need.  My wild guess is that our minds use a context-sensitive grammar
with hundreds of thousands of semantic checks to fill in where the CSG 
is inadequate for our needs.

Doug Scofield                          dougs@ssd.harris.com
Harris Computer Systems        {uunet,mit-eddie,novavax}!hcx1!dougs
Ft. Lauderdale, FL                     voice: (305) 973 5340

pardo@june.cs.washington.edu (David Keppel) (10/25/88)

> somebody writes;
>>[ english grammar? ]

In article <44600003@hcx2> dougs@hcx2.SSD.HARRIS.COM writes:
> [...] But to use syntactical productions to recognize things such as
> various data types in expressions, or even worse, checking that the
> number of parameters agrees between a caller and a callee is either
> too exhaustive to be useful or just simply beyond a CFL.

Attribute grammars are a current research topic.  It is possible
(although "too exhaustive") to write an attribute grammar that
recognizes (semantically) Ada.  It runs to some thousand pages (whew!).

Here's another "goodie":  somebody fed the statement "Time flies like
an arrow" into a computer and the computer said:

* This is an analogy; time is a thing that moves in a way (flying)
  that is similar to the way that an arrow moves.
* Definition: "time flies" are some species that have characteristics
  much like those of an arrow.
* Command: [go get a stopwatch and] time flies the same way that you
  would time an arrow.
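
The three readings above can be reproduced with a toy grammar.  What
follows is only a sketch: the lexicon and rules are invented for this
illustration (they are not the grammar of whatever program produced the
output above), and count_parses is a hypothetical helper name.  It counts
the distinct parse trees a small binary-branching context-free grammar
assigns to the sentence.

```python
from functools import lru_cache

# Assumed toy lexicon: each word maps to the categories it may belong to.
# Bare nouns are also listed as NP so the grammar needs no unit rules.
LEXICON = {
    "time": {"N", "NP", "V"},
    "flies": {"N", "NP", "V"},
    "like": {"P", "V"},
    "an": {"Det"},
    "arrow": {"N", "NP"},
}

# Assumed binary rules, written as (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),   # declarative sentence
    ("S", "V", "X"),     # imperative: verb, then object plus manner phrase
    ("X", "NP", "PP"),
    ("VP", "V", "PP"),   # "flies [like an arrow]"
    ("VP", "V", "NP"),   # "like [an arrow]"
    ("PP", "P", "NP"),
    ("NP", "Det", "N"),
    ("NP", "N", "N"),    # noun compound: "time flies"
]


def count_parses(words, root="S"):
    """Count the parse trees rooted at `root` that span all of `words`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def count(sym, i, j):
        # A one-word span is covered only by a lexical category.
        if j - i == 1:
            return 1 if sym in LEXICON.get(words[i], set()) else 0
        total = 0
        for parent, left, right in RULES:
            if parent != sym:
                continue
            # Try every split point between the two children.
            for k in range(i + 1, j):
                total += count(left, i, k) * count(right, k, j)
        return total

    return count(root, 0, len(words))


print(count_parses("time flies like an arrow".split()))
```

With this particular grammar the sentence gets one tree per reading in
the list above: the analogy (NP VP with "flies" as the verb), the insect
species (NP VP with "like" as the verb), and the command (V X).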

If you think that's fun, the Lojban people enumerate something like
20 different ways to understand the phrase "pretty little girl's
school".  Lojban is a synthetic language related to Loglan that is
designed to be unambiguous and machine-parseable; there *are* parsers
for Lojban, so quick, everybody run out and learn Lojban so we can
have "synthetic-language query systems" :-)

	;-D on  ( Eh?  I don't grok, Mike )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo

hermit@shockeye.UUCP (Mark Buda) (10/26/88)

In article <960009@hpclskh.HP.COM> skh@hpclskh.HP.COM writes:
>If you get one...post it.  Should be an AMAZING grammar to see, if not good for
>a few laughs.

<utterance> ::= <word>*

Everything else is semantics, of course.
-- 
Mark Buda / Smart UUCP: hermit@shockeye.uucp / Phone(work):(717)299-5189
Dumb UUCP: ...rutgers!bpa!vu-vlsi!devon!shockeye!hermit
Entropy will get you in the end.
"A little suction does wonders." - Gary Collins

wsmith@m.cs.uiuc.edu (10/26/88)

>
>              x  y  z  y  x     (note: this requires a pushdown machine with
>	     a  b  c  b  a             multiple stacks, more power than an 
>				       automata equivalent to a CFL can be)
>

Don't you mean:
	  x   y   z   x   y
	a   b   c   a   b

instead?  (Technically, what you give is not a regular expression, either.)

The language you describe is generated by this context free grammar:

R ->   a R a | S ;
S ->   b S b | T ; 
T ->   T c   |   ;
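
As a rough sketch of what the grammar above accepts (strings of the form
a^x b^y c^z b^y a^x), here is a small recognizer that mirrors the three
productions.  The function name matches_nested is made up for this
example, and peeling symbols off both ends is just one convenient way to
follow the rules, not a general parsing method.

```python
def matches_nested(s: str) -> bool:
    """Recognize a^x b^y c^z b^y a^x by following the grammar's shape."""
    # R -> a R a : peel matching 'a's off both ends.
    while len(s) >= 2 and s[0] == "a" and s[-1] == "a":
        s = s[1:-1]
    # S -> b S b : peel matching 'b's off both ends.
    while len(s) >= 2 and s[0] == "b" and s[-1] == "b":
        s = s[1:-1]
    # T -> T c | (empty) : whatever remains must be zero or more 'c's.
    return all(ch == "c" for ch in s)


print(matches_nested("aabcbaa"))  # nested pairing: accepted
print(matches_nested("abcab"))    # crossed pairing: rejected
```

The nesting is what keeps this within reach of a single stack: the last
symbol opened is always the first one closed.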

Bill Smith
wsmith@cs.uiuc.edu
uiucdcs!wsmith

gvcormack@watdragon.waterloo.edu (Gordon V. Cormack) (10/26/88)

In article <44600003@hcx2>, dougs@hcx2.SSD.HARRIS.COM writes:
> is either too exhaustive to be useful or just simply beyond a CFL.  Hey, if
> a context-free grammer can't recognize the regular expression
> 
>               x  y  z  y  x     (note: this requires a pushdown machine with
> 	     a  b  c  b  a             multiple stacks, more power than an 
> 				       automata equivalent to a CFL can be)
> 


1.  the expression above is not regular
2.  the expression above is easily expressed as a CFG:

     A -> B
     A -> a A a
     B -> C
     B -> b B b
     C ->
     C -> C c

3.  two stacks suffice for most recognition problems
4.  grammer [sic] is misspelled
5.  automata is plural
6.  why is everybody picking on this guy so much?  All he asked
    for was a CFG for English.  If I asked for a CFG for Pascal,
    would you hassle me about all the Pascal constructs that aren't
    context-free?
7.  The UNIX command "style" contains a yacc grammar for English.
    A paper is included in the supplementary UNIX documentation
    describing "style", but the source is not supplied with the
    BSD distribution.
-- 
Gordon V. Cormack     CS Dept, University of Waterloo, Canada N2L 3G1
                      gvcormack@waterloo  { .CSNET or .CDN or .EDU }
                      gvcormack@uwaterloo.CA
                      gvcormack@water         { UUCP or BITNET }

lee@uhccux.uhcc.hawaii.edu (Greg Lee) (10/28/88)

From article <44600003@hcx2>, by dougs@hcx2.SSD.HARRIS.COM:
" ...
" is either too exhaustive to be useful or just simply beyond a CFL.  Hey, if
" a context-free grammer can't recognize the regular expression
" 
"               x  y  z  y  x     (note: this requires a pushdown machine with
" 	     a  b  c  b  a             multiple stacks, more power than an 
" 				       automata equivalent to a CFL can be)
" 
" how the hell is it going to handle English, or Spanish, or whatever?

           x y z x y
Supposing a b c a b  was meant, then the answer is it's going to the
hell handle them if they don't the hell have such constructions.
Whether one does find such constructions in natural language is
debatable -- there is discussion in the linguistic literature going
back about a decade.  At least, it seems clear that they are not
common.

" Remember, we must check proper pluralization, subject-verb agreement, all
" that good stuff.

Since natural languages have grammatical agreement with respect to only
a finite (and rather small) number of categories, and since the
strings that separate agreeing items can be characterized by a finite
number of strings of category symbols, agreement does not pose a problem
in principle.
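
One way to picture this argument: because the set of agreement values is
finite, each rule that mentions agreement can be copied once per value,
and the result is still an ordinary context-free grammar.  The sketch
below assumes a single number feature with two values and a made-up
"{n}" slot notation; the schemas and the expand function are
illustrative only.

```python
# The finite set of agreement values (assumed: grammatical number only).
NUMBER = ("sg", "pl")

# Rule schemas: "{n}" marks a slot that must carry the same value
# everywhere it appears within one rule.
SCHEMAS = [
    ("S", ["NP_{n}", "VP_{n}"]),
    ("NP_{n}", ["Det_{n}", "N_{n}"]),
    ("VP_{n}", ["V_{n}"]),
]


def expand(schemas, values):
    """Multiply each schema out into plain context-free rules."""
    rules = []
    for lhs, rhs in schemas:
        for v in values:
            rule = (lhs.format(n=v), tuple(sym.format(n=v) for sym in rhs))
            if rule not in rules:
                rules.append(rule)
    return rules


for lhs, rhs in expand(SCHEMAS, NUMBER):
    print(lhs, "->", " ".join(rhs))
```

The expanded grammar contains S -> NP_sg VP_sg and S -> NP_pl VP_pl but
no rule that mixes the two, so a singular subject with a plural verb is
simply underivable.  The grammar grows by a constant factor per feature;
it does not leave the context-free class.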

" For programming languages, the CFL describes the written
" syntax and the semantic actions fill in the context-sensitive features
" we need.

And so it may be for natural languages.

" My wild guess is that our minds use a context-sensitive grammar
" with hundreds of thousands of semantic checks to fill in where the CSG 
" is inadequate for our needs.

The proposal that natural languages are context free is also a guess,
at this point, but I think it's fair to say it's an educated guess.
There is some evidence against the proposal, but in my opinion this
evidence is rather marginal.  Other linguists have other opinions.

		Greg, lee@uhccux.uhcc.hawaii.edu

dougs@hcx2.SSD.HARRIS.COM (10/28/88)

>
>              x  y  z  y  x     (note: this requires a pushdown machine with
>	     a  b  c  b  a             multiple stacks, more power than an 
>				       automata equivalent to a CFL can be)
>                                      ^^^^^^^^ oops.  should be automaton.
						a most relevant mistake.

Yeah, I know I made a few typos with this expression.
It should have been
                   x  y  z  x  y    
		  a  b  c  a  b      x,y,z >= 0  

More than one stack in an automaton means that it is not equivalent
to a CFL.  It doesn't matter if there are only two.  Two is too many.
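
For comparison, here is a minimal membership check for the corrected
language a^x b^y c^z a^x b^y.  It is an ordinary program rather than a
stack automaton, and the name matches_crossed is invented for this
sketch; it simply tries every possible x and y and compares the input
against the string they would produce.

```python
def matches_crossed(s: str) -> bool:
    """Check whether s has the form a^x b^y c^z a^x b^y (x, y, z >= 0)."""
    n = len(s)
    for x in range(n // 2 + 1):
        for y in range(n // 2 - x + 1):
            # The a and b blocks use 2*(x + y) symbols; the rest are c's.
            z = n - 2 * (x + y)
            if s == "a" * x + "b" * y + "c" * z + "a" * x + "b" * y:
                return True
    return False


print(matches_crossed("aabcaab"))  # crossed pairing: accepted
print(matches_crossed("abcba"))    # nested pairing: rejected
```

Here the first block of a's pairs with the second block of a's across
the first block of b's, so the dependencies cross instead of nesting.
With z = 0 this is the familiar a^x b^y a^x b^y pattern, which a single
stack cannot match.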


Doug Scofield                          dougs@ssd.harris.com
Harris Computer Systems        {uunet,mit-eddie,novavax}!hcx1!dougs
Ft. Lauderdale, FL                     voice: (305) 973 5340

[These are my mistakes _only_]