[sci.lang] Infinte alphabets

lee@uhccux.UUCP (Greg Lee) (10/15/87)

The number of phonetic characters must be finite if language is context-free.
There are apparent context sensitivities in pronunciation, such as for
some English speakers, pronouncing "band" as "bam" before a word beginning
with "p", as in "The bam played on".  Yet such context sensitivities may
be represented in a context-free grammar, provided that the sensitive item
and its context are not separated by an unbounded number of constituent
boundaries.  And as a matter of fact, "bam" and "p" in the example cannot
be so separated.  Thus, apparently, language is context-free.

Perhaps, though, language is only almost context free, failing to be
so through having an infinite number of terminal symbols.

Greg Lee, Lee@uhccux.uhcc.hawaii.edu

goldberg@russell.STANFORD.EDU (Jeffrey Goldberg) (10/16/87)

In article <959@uhccux.UUCP> lee@uhccux.UUCP (Greg Lee) writes:

>The number of phonetic characters must be finite if language is context-free.

That is true, but it is uninformative.  Any recursively enumerable
language will have a finite terminal vocabulary.

>There are apparent context sensitivities in pronunciation, such as for
>some English speakers, pronouncing "band" as "bam" before a word beginning
>with "p", as in "The bam played on".

"Context Sensitive" and "Context Free" are technical terms.  You
can get into trouble by using both the technical and non technical
meanings in the same argument.

>Yet such context sensitivities may
>be represented in a context-free grammar, provided that the sensitive item
>and its context are not separated by an unbounded number of constituent
>boundaries.

Dependencies over arbitary distance can be maintained in a CF
grammar:

S -> a B a
S -> c B c
B -> b B
B -> b

(lowercase letters are terminal symbols.)

>Perhaps, though, language is only almost context free, failing to be
>so through having an infinite number of terminal symbols.

The linguistics literature from 1960 to 1985 is filled with
arguments about why NLs are not CF.  All of those arguments had
holes in them.  In 1985, however, two mathematically well founded
arguments were published in the "Linguistics and Philosophy".  One
by Christopher Culy and the other by Stuart Shieber.

The whole question raised had to do with writing systems, and I
don't quite see what this has to do with the (mathematical) class
languages fall into.

>Greg Lee, Lee@uhccux.uhcc.hawaii.edu

-- 
Jeff Goldberg 
ARPA   goldberg@russell.stanford.edu
UUCP   ...!ucbvax!russell.stanford.edu!goldberg

lee@uhccux.UUCP (Greg Lee) (10/17/87)

In article <434@russell.STANFORD.EDU> goldberg@russell.STANFORD.EDU
 (Jeffrey Goldberg) writes:
>In article <959@uhccux.UUCP> lee@uhccux.UUCP (Greg Lee) writes:
>[stuff ...]

The commentary on my posting seems to me to be rather obtuse.  Rather
than proceeding point by point, I think I should restate the matter.
Perhaps this time I can make myself clearer.  Before doing that, I'll
explain that by "separated by ... constituent boundaries"  I meant
to refer to the circumstance that one item is within a constituent
and another is outside that constituent.

Now, to begin again.  Gerald Gazdar made the point that a local
context sensitivity, such as a transitive verb occurring always
with a sister noun phrase, does not prevent a language from being
context free.  And since there is evidence that such subcategorization
context sensitivity is in fact local in natural languages, this provides
us some evidence (not probative) that language is context free.

Suppose now that we apply the same reasoning in the case of phonology.
If language is phonologically context free, we would predict that
context sensitivities are local.  It is a commonplace that phonological
rules tend not to apply across major constituent breaks.  The prediction
appears to be correct, and so we have some evidence (not probative)
that language is phonologically context free.

Because of the qualification in my original posting, I concede in
advance that this does not provide a very compelling argument for
finiteness of alphabets.  But the logic, at least, seems clear
enough:
	language is context free (hypothesis supported by empirical
				  argument)
	if language is context free, the alphabet is finite (obvious)
	therefore, the alphabet is finite (well known rule of logic)

(We could talk about the logic in the Schieber article sometime,
if you like.)

Greg Lee, lee@uhccux.uhcc.hawaii.edu

goldberg@russell.STANFORD.EDU (Jeffrey Goldberg) (10/18/87)

In article <969@uhccux.UUCP> lee@uhccux.UUCP (Greg Lee) writes:
>In article <434@russell.STANFORD.EDU> goldberg@russell.STANFORD.EDU
> (Jeffrey Goldberg) writes:
>>In article <959@uhccux.UUCP> lee@uhccux.UUCP (Greg Lee) writes:
>>[stuff ...]

> [ ... ]

>Now, to begin again.  Gerald Gazdar made the point that a local
>context sensitivity, such as a transitive verb occurring always
>with a sister noun phrase, does not prevent a language from being
>context free.  And since there is evidence that such subcategorization
>context sensitivity is in fact local in natural languages, this provides
>us some evidence (not probative) that language is context free.

Gazdar also deals with so-called "Unbounded" Dependencies.  See
"Unbounded Dependcies and Coordinate Structures" in _Linguistic
Inquiry_ 12, 1981.  Also, "Generalized Phrase Structure Grammar"
(Gazdar et al, 1985).

>Suppose now that we apply the same reasoning in the case of phonology.
>If language is phonologically context free, we would predict that
>context sensitivities are local.  It is a commonplace that phonological
>rules tend not to apply across major constituent breaks.  The prediction
>appears to be correct, and so we have some evidence (not probative)
>that language is phonologically context free.

I'm not sure that I follow this argument.  Are you saying: "All
these things involve local dependencies and therefore ought to be
dealt with by a CF grammar."?  That same argument could be applied
equally well (or badly) to conclude that NLs are finite state.

>Because of the qualification in my original posting, I concede in
>advance that this does not provide a very compelling argument for
>finiteness of alphabets.  But the logic, at least, seems clear
>enough:
>	language is context free (hypothesis supported by empirical
>				  argument)
>	if language is context free, the alphabet is finite (obvious)
>	therefore, the alphabet is finite (well known rule of logic)

I'm afraid the logic isn't clear to me.  If a language is is CF (or
CS for that matter) its terminal vocabulary is finite.  Agreed.
But it is not obvious to me how you get from there to concluding
that the WRITING system of a language must employ a finite
alphabet.

>(We could talk about the logic in the Schieber article sometime,
>if you like.)

I think that that would be a refreshing change of topic.

Anyway, I too believe that there are no writing systems based on
infinite alphabets.  But I have nothing more to add to what I have
already said on that topic.

>Greg Lee, lee@uhccux.uhcc.hawaii.edu

-jeff goldberg

-- 
Jeff Goldberg 
Internet:   goldberg@russell.stanford.edu
UUCP   ...!ucbvax!russell.stanford.edu!goldberg
-- 
Jeff Goldberg 
ARPA   goldberg@russell.stanford.edu
UUCP   ...!ucbvax!russell.stanford.edu!goldberg

lee@uhccux.UUCP (Greg Lee) (10/18/87)

In article <448@russell.STANFORD.EDU> goldberg@russell.STANFORD.EDU
 (Jeffrey Goldberg) writes:
*In article <969@uhccux.UUCP> lee@uhccux.UUCP (Greg Lee) writes:
*>In article <434@russell.STANFORD.EDU> goldberg@russell.STANFORD.EDU
*> (Jeffrey Goldberg) writes:
*>>In article <959@uhccux.UUCP> lee@uhccux.UUCP (Greg Lee) writes:
*>>[stuff ...]
*>...
*>Now, to begin again.  Gerald Gazdar made the point that a local
*>...
*
*Gazdar also deals with so-called "Unbounded" Dependencies.  See
*"Unbounded Dependcies and Coordinate Structures" in _Linguistic
*Inquiry_ 12, 1981.  Also, "Generalized Phrase Structure Grammar"
*(Gazdar et al, 1985).

He does not deal with all unbounded dependencies, because they cannot
in general be expressed in a CFPSG.  If there is a problem with the
argument I gave it would hinge on whether unbounded phonological
dependencies are of the sort which can be expressed in a CFPSG.  I don't
know whether they are.  For the special case of a transparently
applying phonological rule, i.e. one whose conditions are met in the
pronunciation, it seems to me that unbounded phonological dependencies
could indeed be expressed, by the device of "foot initial features"
which propagate up the tree from left-most daughters, "foot final
features" which propagate up from right-most daughters, and appropriate
agreement rules. So, upon more careful consideration, I would say my
argument was incomplete, at best.

*
*>Suppose now that we apply the same reasoning in the case of phonology.
*>...
*
*I'm not sure that I follow this argument.  Are you saying: "All
*these things involve local dependencies and therefore ought to be
*dealt with by a CF grammar."?  That same argument could be applied
*equally well (or badly) to conclude that NLs are finite state.
*
Yes, that is what I'm saying.  And of course regular languages are
context free, so that possibility is included.  So far as I'm aware,
all the evidence that has been cited in favor of language being context
free also supports the hypothesis that language is regular. (Remember
Ingve's "A Model and an Hypothesis for Language Behavior"?)

*>Because of the qualification in my original posting, I concede in
*>...
*I'm afraid the logic isn't clear to me.  If a language is is CF (or
*CS for that matter) its terminal vocabulary is finite.  Agreed.
*But it is not obvious to me how you get from there to concluding
*that the WRITING system of a language must employ a finite
*alphabet.
*
alphabet = segments of adequate transcription system

*>(We could talk about the logic in the Schieber article sometime,
*>if you like.)
*
*I think that that would be a refreshing change of topic.
*
Look at his footnote 4.  Is he right that the optionality of
objects makes no difference to his argument?  One can't simply
take an intersection with a language `... some number of verbs ...
same number of nouns' because this is not regular.

*Anyway, I too believe that there are no writing systems based on
*infinite alphabets.  But I have nothing more to add to what I have
*already said on that topic.
*...

Greg Lee, lee@uhccux.uhcc.hawaii.edu

hunt@spar.SPAR.SLB.COM (Neil Hunt) (10/19/87)

Hey guys !

>Subject: Re: Infinte alphabets (and CFness of NLs)
              ^^^^^^^

I thought I had 'K'illed this topic, but it keeps recurring !

I propose a new discussion of the infiniteness of spelling errors
and possible new additions to 'rn' to filter out all the possible
variants of any subject line due to such errors.

Neil/.