[comp.ai] Langendoen and Postal

berke@ucla-cs.UUCP (11/01/87)

I just read this fabulous book over the weekend, called "The Vastness
of Natural Languages," by D. Terence Langendoen and Paul M. Postal.

If you have read this, I have some questions, and could use some help,
especially on the more Linguistics aspects of the book.

Are Langendoen or Postal on the net somewhere?  They might be in England,
the Publisher is Blackwell 1984.  

Their basic proof/conclusion holds that natural languages, as linguistics
construes them (as products of grammars), are what they call mega-collections,
Quine calls proper classes, and some people hold cannot exist.  That is,
they maintain that (1) Sentences cannot be excluded from being of any,
even transfinite size, by the laws of a grammar, and (2) Collections of
these sentences are bigger than even the continuum.  They are the size
of the collection of all sets:  too big to be sets.

It's wonderfully written.  Clear wording, proofs, etc.  Good reading.
Help!

Regards,
Pete

rapaport@sunybcs.uucp (William J. Rapaport) (11/02/87)

In article <8941@shemp.UCLA.EDU> berke@CS.UCLA.EDU (Peter Berke) writes:
>I just read this fabulous book over the weekend, called "The Vastness
>of Natural Languages," by D. Terence Langendoen and Paul M. Postal.
>
>Are Langendoen or Postal on the net somewhere?

Langendoen used to be on the net as: tergc%cunyvm@wiscvm.wisc.edu

but he's moved to, I think, U of Arizona.  Postal, I think, used to be
at IBM Watson.

turpin@ut-sally.UUCP (Russell Turpin) (11/02/87)

In article <8941@shemp.UCLA.EDU>, berke@CS.UCLA.EDU writes:
> I just read this fabulous book over the weekend, called "The Vastness
> of Natural Languages," by D. Terence Langendoen and Paul M. Postal.
> ...
> 
> Their basic proof/conclusion holds that natural languages, as linguistics
> construes them (as products of grammars), are what they call mega-collections,
> Quine calls proper classes, and some people hold cannot exist.  That is,
> they maintain that (1) Sentences cannot be excluded from being of any,
> even transfinite size, by the laws of a grammar, and (2) Collections of
> these sentences are bigger than even the continuum.  They are the size
> of the collection of all sets:  too big to be sets.

Let me switch contexts. I have not read the above-mentioned book,
but it seems to me that this claim is just plain wrong. I would
think a minimum requirement for a sentence in a natural language
is that some person who knows the language can read and
understand the sentence in a finite amount of time. This would
exclude any infinitely long sentences. Perhaps less obviously, it
also excludes infinite languages. The reason is that there will
never be more than a finite number of people (ET's included), and
that each will fail to parse sentences beyond some maximum
length, given a finite life for each. (I am not saying that
natural languages include only those sentences that are in fact
spoken and understood, but that only those sentences that could
be understood are included.)

In this view, infinite languages are solely a mathematical
construct.

Russell

lee@uhccux.UUCP (Greg Lee) (11/03/87)

In article <9445@ut-sally.UUCP> turpin@ut-sally.UUCP (Russell Turpin) writes:
>In article <8941@shemp.UCLA.EDU>, berke@CS.UCLA.EDU writes:
>> I just read this fabulous book over the weekend, called "The Vastness
>> of Natural Languages," by D. Terence Langendoen and Paul M. Postal.
>> ...
>
>Let me switch contexts. I have not read the above-mentioned book,
>but it seems to me that this claim is just plain wrong. I would
> ...
>also excludes infinite languages. The reason is that there will
>never be more than a finite number of people (ET's included), and
> ...
>Russell

Although the number of sentences in a natural language might be
finite, the most appropriate model for human language processing
might reasonably assume the contrary.  Suppose, for instance, that
we wish to compare the complexities of various languages with
regard to how easily they could be used by humans, and that we
take the number of phrase structure rules in a phrase structure
grammar as a measure of such complexity.  A grammar to generate
100,000 sentences of the pattern "Oh boy, oh boy, ...!" would be
much more complex than a grammar to generate an infinite number
of such sentences.  And the pattern seems easy enough to learn ...

Concerning the length of sentences, I think Postal and Langendoen
are not very persuasive.  Most of their arguments are to the
effect that previously given attempted demonstrations that
sentences cannot be of infinite length are incorrect.  I think
they make that point very well.  But obviously this is not
enough To show that one should assume some sentences of infinite
length.
	Greg Lee, lee@uhccux.uhcc.hawaii.edu

djh@beach.cis.ufl.edu (David J. Hutches) (11/03/87)

In article <9445@ut-sally.UUCP> turpin@ut-sally.UUCP (Russell Turpin) writes:
>In article <8941@shemp.UCLA.EDU>, berke@CS.UCLA.EDU writes:
>> ... That is,
>> they maintain that (1) Sentences cannot be excluded from being of any,
>> even transfinite size, by the laws of a grammar, and (2) Collections of
>> these sentences are bigger than even the continuum.  They are the size
>> of the collection of all sets:  too big to be sets.
>
>... I would
>think a minimum requirement for a sentence in a natural language
>is that some person who knows the language can read and
>understand the sentence in a finite amount of time. This would
>exclude any infinitely long sentences.
>
>Russell

Because of the processing capabilities of human beings (actually, on a
person-by-person basis), sentences of greater and greater length (and
complexity) are more and more difficult to understand.  Past a certain
point, a human being will go into cognitive overload when asked to
process a sentence which his or her capacities (short-term memory, stack
space, whatever you want to call it) are not designed to handle.  What
the human being can, in practice, process and what is *possible* in a
language are two different things.  I think that it is the case that
some theories of language/grammar explain the production of sentences
which are grammatical by use of a generative model.  In such a model, it
is possible to generate sentences of potentially infinite length, even
though it would not be possible for a human being to understand them.

== David J. Hutches                                           CIS Department ==
==                                                     University of Florida ==
== Internet:  djh@beach.cis.ufl.edu                   Gainesville, FL  32611 ==
== UUCP:  ...{ihnp4,rutgers}!codas!ufcsv!ufcsg!djh            (904) 335-8049 ==

goldfain@osiris.cso.uiuc.edu (11/04/87)

> /* Written 10:34 am  Nov  1, 1987 by berke@CS.UCLA.EDU in comp.ai */
> /* ---------- "Langendoen and Postal (posted by: B" ---------- */
> I just read this fabulous book over the weekend, called "The Vastness
> of Natural Languages," by D. Terence Langendoen and Paul M. Postal.
>    ...
> Their basic proof/conclusion holds that natural languages, as linguistics
> construes them (as products of grammars), are what they call
> mega-collections, Quine calls proper classes, and some people hold cannot
> exist.  That is, they maintain that (1) Sentences cannot be excluded from
> being of any, even transfinite size, by the laws of a grammar, and (2)
> Collections of these sentences are bigger than even the continuum.  They are
> the size of the collection of all sets: too big to be sets.
>            ...
> /* End of text from osiris.cso.uiuc.edu:comp.ai */

Hang on a minute!   It *sounds* as though  you are talking  about Context-Free
Grammars/Languages (CFGs/CFLs) here.  Most linguists  (I'd wager) set up their
CFGs as admitting  only finite derivations  over a  finite  set of  production
rules, each rule only allowing finite expansion.  Thus, although usually a CFL
is only a  proper subset of  this, we are   ALWAYS working  WITHIN the  set of
finite strings (of  arbitrary  length) over a  finite alphabet.
     Such a set is countably infinite.  Far from being a proper class, this is
a very  manageable set.  If you  move the discussion  up to the cardinality of
the  set of "discourses", which would   be finite  sequences of strings in the
language, you are still only  up to  the power set of the  integers, which has
the same cardinality  as the  set of Real numbers.  Again,  this is a set, and
not a proper class.
     I haven't seen the book you cite.  They must make some argument as to why
they  think natural  languages  (or linguistic   theories about  them)   admit
infinite sentences.  Even given that, we  would have only  the Reals (i.e. the
"Continuum") as a cardinality without some further surprising claims.  Can you
summarize their argument (if it exists) ?

           Mark Goldfain                    arpa: goldfain@osiris.cso.uiuc.edu
           Department of Computer Science
           University of Illinois at Shampoo-Banana

spector@suvax1.UUCP (Mitchell Spector) (11/07/87)

In article <8300011@osiris.cso.uiuc.edu>, goldfain@osiris.cso.uiuc.edu
comments on an article by berke@CS.UCLA.EDU:
> 
> > /* Written 10:34 am  Nov  1, 1987 by berke@CS.UCLA.EDU in comp.ai */
> > /* ---------- "Langendoen and Postal (posted by: B" ---------- */
> > I just read this fabulous book over the weekend, called "The Vastness
> > of Natural Languages," by D. Terence Langendoen and Paul M. Postal.
> >    ...
> > Their basic proof/conclusion holds that natural languages, as linguistics
> > construes them (as products of grammars), are what they call
> > mega-collections, Quine calls proper classes, and some people hold cannot
> > exist.  That is, they maintain that (1) Sentences cannot be excluded from
> > being of any, even transfinite size, by the laws of a grammar, and (2)
> > Collections of these sentences are bigger than even the continuum.  They are
> > the size of the collection of all sets: too big to be sets.
> >            ...
> > /* End of text from osiris.cso.uiuc.edu:comp.ai */
> 
> Hang on a minute!   It *sounds* as though  you are talking  about Context-Free
> Grammars/Languages (CFGs/CFLs) here.  Most linguists  (I'd wager) set up their
> CFGs as admitting  only finite derivations  over a  finite  set of  production
> rules, each rule only allowing finite expansion.  Thus, although usually a CFL
> is only a  proper subset of  this, we are   ALWAYS working  WITHIN the  set of
> finite strings (of  arbitrary  length) over a  finite alphabet.
>      Such a set is countably infinite.  Far from being a proper class, this is
> a very  manageable set.  If you  move the discussion  up to the cardinality of
> the  set of "discourses", which would   be finite  sequences of strings in the
> language, you are still only  up to  the power set of the  integers, which has
> the same cardinality  as the  set of Real numbers.  Again,  this is a set, and
> not a proper class.
> 
>            Mark Goldfain                    arpa: goldfain@osiris.cso.uiuc.edu
>            Department of Computer Science
>            University of Illinois at Shampoo-Banana

   The set of all finite sequences of finite strings in a language (the set
of "discourses") is still just a countably infinite set (assuming that the
alphabet is finite or countably infinite, of course).  The set of infinite
sequences of finite strings is uncountable, with the same cardinality as the
set of real numbers, as is the set of infinite strings.  (By infinite string
or infinite sequence, I mean an object which is indexed by the natural
numbers 0, 1, 2, ....)

   In general, sets of finite objects are finite or countably infinite.
(A finite object is, vaguely speaking, one that can be identified by means
of a finite representation.  More specifically, this finite representation
or description must enable you to distinguish this object from all the
other objects in the set.)  If you want to get an uncountable set, you
must use objects which are themselves infinite as members of the set.
Many people lose sight of the fact that a real number is an infinite object
(although an integer or a rational number is a finite object).  Any general
method of identifying real numbers must use infinitely long or large
representations (for example, decimals, continued fractions, Cauchy
sequences, or Dedekind cuts).  Real numbers are much more difficult to
pin down than one might gather from many math classes.  This misimpression
is partly due to the fact that one deals only with a relative small (finite!)
set of specific real numbers; these either have their own names in
mathematics or they can be defined by a finite sequence of symbols in the
usual mathematical notation.  The other real numbers belong to a nameless
horde which we use in general arguments but never by specific mention.

   I certainly agree with the general objections raised to the idea that
natural languages are uncountably large (or, worse yet, proper classes),
although I haven't read the book in question.  Maybe somebody can state
more precisely what the book claimed, but it seems at first glance to
indicate a lack of understanding of modern set theory.

   By the way, logicians do study infinite languages, including both the
possibility of infinitely many symbols and that of infinitely long
sentences, but such languages are very different from what we think of
as "natural language."  It doesn't matter whether you're talking about
context-free languages or more general sorts of languages -- in any
language used by people for communication, the alphabet is finite,
each word is finitely long, and each sentence is finitely long.

-- 
Mitchell Spector                                        |"Give me a
Dept. of Computer Science & Software Eng., Seattle Univ.|     ticket to
Path:    ...!uw-beaver!uw-entropy!dataio!suvax1!spector |             Mars!!"
  or:   dataio!suvax1!spector@entropy.ms.washington.edu | -- Zippy the Pinhead

goldfain@osiris.cso.uiuc.edu (11/12/87)

< /* Written  7:47 pm  Nov  6, 1987 by spector@suvax1.UUCP in comp.ai */
< In article <8300011@osiris.cso.uiuc.edu>, goldfain@osiris.cso.uiuc.edu
< comments on an article by berke@CS.UCLA.EDU:
< >       ...
< > Such  a  set  is countably  infinite.   Far from being   a proper class,
< > this is a very manageable  set.  If you  move the discussion   up  to the
< > cardinality of the set of "discourses", which would be finite sequences of
< > strings in the language, you  are still only  up to the  power set of the
< > integers, which  has the same  cardinality  as  the  set of  Real numbers.
< > Again, this is a set, and not a proper class.
< >            Mark Goldfain               arpa: goldfain@osiris.cso.uiuc.edu
< ---------------------
<    The set of all finite sequences of finite strings in a language (the set
< of "discourses") is still just a countably infinite set (assuming that the
< alphabet is finite or countably infinite, of course).  The set of infinite
< sequences of finite strings is uncountable, with the same cardinality as the
< set of real numbers, as is the set of infinite strings.
<          ...
<   I certainly agree with the general objections raised to the idea that
< natural languages are uncountably large (or, worse yet, proper classes),
< although I haven't read the book in question.  Maybe somebody can state
< more precisely what the book claimed, but it seems at first glance to
< indicate a lack of understanding of modern set theory.
<          ...
< Mitchell Spector                                        |"Give me a
< Dept. of Computer Science & Software Eng., Seattle Univ.|     ticket to
< Path:    ...!uw-beaver!uw-entropy!dataio!suvax1!spector |             Mars!!"
< or:   dataio!suvax1!spector@entropy.ms.washington.edu   | - Zippy the Pinhead
< ------------------

OOPS---OOPS---OOPS---OOPS---OOPS---OOPS----OOPS---OOPS----OOPS---OOPS----OOPS
 Mitchell Spector is correct!   I  must have  been thinking very sluggishly,
 and  I hope none of  my professors (past or present)  is  watching.  In any
 case, we   still   have  our  basic objection,   only  it  is   now greatly
 strengthened.   Indeed, it  is  a well-known   and oft-used proposition  in
 computing   theory that the number  of   things that  can   be   said in  a
 finite-alphabet language is COUNTABLE.
ENDOOPS---ENDOOPS---ENDOOPS---ENDOOPS---ENDOOPS---ENDOOPS---ENDOOPS---ENDOOPS

Continuing the "attack" ... recall that the original posting said:

> /* ---------- "Langendoen and Postal ---------- */
>     ...
> Their basic proof/conclusion  holds that natural  languages, as  linguistics
> construes  them   (as   products  of   grammars),   are   what   they   call
> mega-collections,  Quine calls proper  classes,  and some people hold cannot
> exist.   That is, they  maintain that  (1) Sentences cannot be excluded from
> being of  any, even   transfinite size, by the  laws  of a grammar, and  (2)
> Collections of these sentences are bigger than even the continuum.  They are
> the size of the collection of all sets: too big to be sets.         ...

My  follow-up   and Mitchell Spector's   note explained why  this statement is
incorrect.  Note that from the above, the  issue is not really "real English",
but the size of a language as SPECIFIED BY A GRAMMAR.

I then asked:

> I haven't seen the book you cite.  They must make some argument as to why
> they  think natural  languages  (or linguistic   theories about  them)
> admit infinite sentences.  Even given that, we  would have only  the Reals
> (i.e. the "Continuum") as a cardinality without some further surprising
> claims.  Can you summarize their argument (if it exists) ?

The only response so far that hints as to their arguments is that posted by
lee@uhccux.UUCP :

>  Concerning the length of sentences,  I think Postal  and Langendoen are not
> very persuasive.  Most of their arguments are  to the effect that previously
> given attempted demonstrations that  sentences cannot  be of infinite length
> are incorrect.  I think they make that  point very well.  But obviously this
> is not enough  To show that  one  should assume  some sentences  of infinite
> length.
>         Greg Lee, lee@uhccux.uhcc.hawaii.edu

Still, without more specifics, this whole argument may  continue to be way out
in left field.  At the risk of knocking down a mere "straw man",  consider: If
these experiments   set a  size,   say 100   words  (or more  reasonably   100
constituents), then  proceeded  to test  subjects, finding  that   this was an
insufficient upper bound, then tried 200, failing again, then tried 500, still
finding subjects who thought the sentences were grammatical, then:

1) I am really, really surprised.  I think the experiments should be
   replicated simply because I find the above too ludicrous to swallow.
2) We still have a LONG ways to go before we conclude that NO upper bound
   exists.  (It is like the humorous story about the "engineer's proof" that
   all odd numbers > 1 are prime :  "Let's see, 3 is prime, 5 is prime, 7 is
   prime, ... Yep!  All of 'em must be!")  Let them test a LARGE number like
   10,000 and get back to me when the results come in ...
3) Finally, the notion of a set with no upper bound is a different
   mathematical beast than a set containing non-finite elements!  Consider the
   positive integers; there is no largest integer, but that does not mean that
   any of them must be infinite ... in fact, none of them are.

Note: The last point is rather briefly stated - if you  haven't run  across it
in a course   or somewhere before,  it  deserves a moment  or two  to sink in.
These concepts generally get their first airing in Calculus courses,  and many
students tend to really grasp them only after about their 2nd Calculus course.

                    -----------------------------------

From here, further discussion seems pointless, unless some of the actual data
and claims from Langendoen and Postal are put forth.

                                                        - Mark Goldfain