[net.nlang] Q: How can structure be learned? A: PDP

ix133@sdcc6.ucsd.EDU (Catherine L. Harris) (08/18/86)

(I'm cross-posting to net.ai because this article, part of the Where
Does Structure Come From series, ends with an invitation to discuss PDP
(connectionist) models of cognition -- which would best be included in
net.cog-sci, except that there is no such group!)

Questions:

1.  Why are languages so similar?
2.  Why do children learning different languages show the same acquisition 
    strategies and make the same patterns of errors?
3.  How can children possibly learn a grammar without being
    explicitly taught any rules?

Answers:

1.  They aren't (as much as was thought....)
2.  They don't (as much as was hoped) and when they do, it's not for
    the right reason.
3.  It's this new method.  See, they don't let the kids hear anything
    else except the one (or few) languages they're supposed to be 
    learning... 

(attempts at an) Explanation:

(I'm aware that I really don't even scratch the surface of these
questions, but I'll, uh, leave that as an exercise... a follow-up...)

1.  How similar languages are is very debatable.  One side alleges that
each year sees another few putative language universals hit the dust.
I guess the other side states it differently.  (Other-siders...?)

2.  Chomsky's original explanation for similarities in acquisition
patterns was that children are born with information about the structure
of language -- that is, they are born with phrase structure trees; with
a Universal Grammar.  As the cross-linguistic data dribbled in, showing
that languages aren't as similar as was thought, this Grammar in the
Head idea was modified to become the current Parameter Setting
theory.   Kids come to the language learning task with information
about the range of different permissible forms that a language can
take.  Their job is to scan the input for clues as to whether they're
learning Turkish or English (or Chinese, Kaluli, Tagalog, Navajo,
etc.).  Is their language one which requires strict word-order (e.g.,
English) or can word-order vary (Turkish)?  Can the subject of a
sentence be dropped if it's uninformative or redundant
(Italian), or is the subject obligatory (English -- which is why we say
"it's raining" rather than simply "raining")?  When the child decides
that her language is one which allows word order to vary, she sets the
'word-order-vary?' flag to 'yes' and after a few crunches of the gears
whole classes of hypotheses about the structure of the target rule
system no longer have to be considered.

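To make the flag-setting idea concrete, here is a toy sketch in
Python.  The parameter names and hypothesis classes are my own
invention (nothing from the parameter-setting literature); the point
is just that fixing one parameter discards whole classes of candidate
grammars in a single step.

hypotheses = [
    {"name": "English-like", "word_order_varies": False, "subject_optional": False},
    {"name": "Italian-like", "word_order_varies": False, "subject_optional": True},
    {"name": "Turkish-like", "word_order_varies": True,  "subject_optional": True},
]

def set_parameter(candidates, parameter, value):
    """Prune every candidate grammar inconsistent with the new setting."""
    return [h for h in candidates if h[parameter] == value]

# The child hears freely ordered sentences and flips the
# 'word-order-vary?' flag to 'yes'; strict-order grammars drop out.
remaining = set_parameter(hypotheses, "word_order_varies", True)
print([h["name"] for h in remaining])    # ['Turkish-like']
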
The problem with Chomsky's parameter setting model is that it
predicts (a) sudden, all-or-none decisions; (b) decisions carried out
in a pre-specified order; and (c) no opportunity to turn back once a
parameter is set.  Instead, what we find in the acquisition data is
that children cycle in and out of different hypotheses over months or
even years -- and even the adult "steady state" exhibits statistical 
variation that appears difficult to explain with discrete rules of the
rewrite variety (or any kind of discrete rules, for that matter).

Chomsky predicts that the pattern of acquisition should
be similar across languages, because the child's early language
behavior is being driven, top-down fashion, from the
genetically-specified pool of hypotheses.  Instead, we find that early
language behavior shows an extreme mirroring of the input language.  It
looks like children's developing rule systems are being driven, bottom
up, by the input data.

(Digression:
Not only do the order and type of hypotheses vary between children
acquiring different languages, but they vary among children learning
the *same* language.  The amount of individual variation in language
learning is probably the same as the variation in learning any complex
skill -- people adopt their own strategies and form their own, often
incorrect, representations of the target domain.   So, some children
seem to take an "analytical" approach to language.  They focus on
trying to decompose the speech stream into its component parts and to
learn how those parts function; they learn nouns first and speak
"telegraphically".  Another strategy for learning language has been
called the "dramatic", "expressive", or "holistic" approach.  These
kids  focus on language as a means to social goals; they learn
sentences as unanalyzed wholes, are more socially gregarious, and
make more use of imitation.)


	One Alternative to the Endogenous Structure View

Jeffrey Goldberg says (in an immediately preceding article),

> Chomsky has set himself up asking the question:  "How can children,
> given a finite amount of input, learn a language?"  The only answer
> could be that children are equipped with a large portion of language to
> begin with.  If something is innate, then it will show up in all
> languages (a universal), and if something is unlearnable then it, too,
> must be innate (and therefore universal).

The important idea behind the nativist and language-modularity
hypotheses is that language structure is too complex, time is too
short, and the form of the input data (i.e., parents' speech to
children) is too degenerate for the target grammar to be learned.
Several people (e.g., Steven Pinker of MIT) have bolstered this
argument with formal "learnability" analyses:  you make an estimate of
the power of the learning mechanism, make assumptions about factors in
the learning situation (e.g., no negative feedback) and then
mathematically prove that a given grammar (a transformational grammar,
or a lexical functional grammar, or whatever) is unlearnable.

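The flavor of such a proof can be conveyed with a toy example (my own
construction, in Python -- not Pinker's formalism).  With positive
examples only, a learner whose guess is a superset of the target
language never sees anything that contradicts the guess:

# The languages here are invented stand-ins, not from any real proof.
target   = {"ab", "aabb"}              # the language actually spoken
superset = {"ab", "aabb", "aaabbb"}    # an overgeneral hypothesis

def consistent(guess, observed):
    """Positive data can confirm a guess but never refute a superset."""
    return observed <= guess

observed = set()
for sentence in ["ab", "aabb", "ab"]:      # the child hears only the target
    observed.add(sentence)
    print(consistent(superset, observed))  # True every time: the error is invisible

Without negative feedback, retreat from the superset has to come from
somewhere other than the data -- which is exactly where the nativist
appeals to innate constraints.
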
My problem with these analyses  -- and with nativist assumptions in
general -- is that they aren't considering a type of learning mechanism
that may be powerful enough to learn something as complex as a grammar,
even under the supposedly impoverished learning environment a child
encounters.  The mechanism is what Rumelhart and McClelland (of UCSD)
call the PDP approach (see their just-released MIT Press volumes,
Parallel Distributed Processing:  Explorations in the Microstructure of
Cognition).

The idea behind PDP (and other connectionist approaches to explaining
intelligent behavior) is that inputs from hundreds/thousands/millions
of information sources jointly combine to specify a result.  A
rule-governed system is, according to this approach, best represented
not by explicit rules (e.g., a set of productions or rewrite rules) but
by a large network of units:  input units, internal units, and output
units.  Given any set of inputs, the whole system iteratively "relaxes"
to a stable configuration (e.g., the soap bubble relaxing to
a parabola, our visual system finding one stable interpretation of
a visual illusion).

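Here is a minimal relaxation sketch in Python, with weights hand-picked
by me for illustration (nothing below is taken from Rumelhart and
McClelland's actual models).  Two units that excite each other and
inhibit a third settle into a stable pattern after a couple of update
sweeps:

weights = [[ 0,  1, -1],   # symmetric constraints: units 0 and 1
           [ 1,  0, -1],   # support each other; both inhibit unit 2
           [-1, -1,  0]]
bias  = [0.5, 0.5, 0.0]    # external input favoring units 0 and 1
state = [0, 0, 1]          # an arbitrary starting configuration

for sweep in range(5):     # update until no unit wants to change
    for unit in range(3):
        net = bias[unit] + sum(weights[unit][j] * state[j] for j in range(3))
        state[unit] = 1 if net > 0 else 0

print(state)               # "relaxes" to the stable pattern [1, 1, 0]
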
While many/most people accept the idea that constraint-satisfaction
networks may underlie phenomena like visual perception, they are more
reluctant to see their application to language processing or language
acquisition.  There are currently (in the Rumelhart and McClelland
work  -- and I'm sure you cognitive science buffs have already rushed
to your bookstore/library!) two convincing PDP models on language,
one on sentence processing (case role assignment) and the other on
children's acquisition of past-tense morphology.  While no one has yet
tried to use this approach to explain syntactic acquisition, I see this
as the next step.

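To give a feel for how a "rule" can live in connection weights rather
than in a rewrite system, here is a drastically simplified pattern
associator in Python.  The real past-tense model uses Wickelfeature
encodings; the two-feature verbs below are my own toy stand-ins.

# A one-unit perceptron trained with the delta rule.  Target 1 means
# "form the past tense with -ed"; the feature coding is invented.
data = [
    ([1.0, 0.0], 1.0),   # walk -> walked   features: [bias, sing/ring class]
    ([1.0, 0.0], 1.0),   # play -> played
    ([1.0, 0.0], 1.0),   # want -> wanted
    ([1.0, 1.0], 0.0),   # sing -> sang (irregular)
]

w, rate = [0.0, 0.0], 0.2
for epoch in range(50):
    for x, t in data:
        y = 1.0 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0.0
        for i in range(2):
            w[i] += rate * (t - y) * x[i]    # delta-rule weight update

# A novel regular verb gets -ed without an explicit rule anywhere:
novel = [1.0, 0.0]
print(sum(wi * xi for wi, xi in zip(w, novel)) > 0)    # True

(Partway through training this net even treats the irregular like the
regulars -- a very distant echo of children's overregularizations.)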

For people interested in hard empirical, cross-linguistic data that
supports a connectionist, non-nativist, approach to acquisition, I
recommend *Mechanisms of Language Acquisition*, Brian MacWhinney, Ed.,
in press.

I realize I rushed so fast over the explanation of what PDP is that
people who haven't heard about it before may be lost.   I'd like to see
a discussion on this -- perhaps other people can talk about the brand
of connectionism they're encountering at their school/research/job and
what they think its benefits and limitations are  -- in
explaining the psycholinguistic facts or just in general.

_______
Cathy Harris 	"Sweating it out on the reaction time floor -- what,
		when you could be in that ole armchair theo-- ? Never mind;
		it's only til 1990!"

goldberg@SU-Russell.ARPA (Jeffrey Goldberg) (08/19/86)

I wish to make it clear that my own opinions are not reflected in
the quote below...
In article <2814@sdcc6.ucsd.EDU> ix133@sdcc6.ucsd.EDU (Catherine L. Harris) writes:

>Jeffrey Goldberg says (in an immediately preceding article),
>
>> Chomsky has set himself up asking the question:  "How can children,
>> given a finite amount of input, learn a language?"  The only answer
>> could be that children are equipped with a large portion of language to
>> begin with.  If something is innate, then it will show up in all
>> languages (a universal), and if something is unlearnable then it, too,
>> must be innate (and therefore universal).
>
In that paragraph, I was presenting the Chomsky view (and
ridiculing it).  For those of you who did not see my original
posting, it is in net.nlang.

I will refrain from presenting a lengthy response to Harris's
posting.  (I have work to do, and I sent more over the net in the
past week than I have in my entire life.)  But I will say that her
attack on language universals is an attack on Chomsky, and there
are people (linguists even) who believe in language universals, but
share her objections to the Chomsky line.  I realize that my
original posting was very long (and should have been edited down),
but I would suggest to Catherine Harris that she make a hard copy of
it, and read it more carefully.  She will find that we agree more
than we disagree.


-Jeff Goldberg {ucbvax, pyramid}!glacier!russell!goldberg
-- 
/* 
**  Jeff Goldberg (best reached at GOLDBERG@SU-CSLI.ARPA)
*/