[comp.ai.neural-nets] S. Pinker / A. Prince

clutx.clarkson.edu (Roger Gonzalez,,,) (03/23/89)

I just read a rather scathing article by Steven Pinker (MIT) and
Alan Prince (Brandeis) that tore PDP apart.  Has anyone seen any
responses to this article that defend Rumelhart and McClelland?
The article bummed me out, and I'm not up enough on language
processing to really debate what they claim.

++ Roger Gonzalez ++ spam@clutx.clarkson.edu ++ Clarkson University ++

   "Just like I've always said; there's nothing an agnostic can't do
    if he's not sure he believes in anything or not!" - Monty Python

kortge@Portia.Stanford.EDU (Chris Kortge) (03/23/89)

In article <2726@sun.soe.clarkson.edu> spam@clutx.clarkson.edu writes:
>I just read a rather scathing article by Steven Pinker (MIT) and
>Alan Prince (Brandeis) that tore PDP apart.  Has anyone seen any
>responses to this article that defend Rumelhart and McClelland?
>The article bummed me out, and I'm not up enough on language
>processing to really debate what they claim.
>

The article doesn't tear PDP apart at all (I assume you mean the one in
_Cognition_, published also as a book, "Connections and Symbols").  What it
does do is tear apart one very simple model of a complex phenomenon (learning
of past tenses).  As far as I could tell, virtually all its criticisms
could be answered with a multilayer network model; that is, the faults
of the R&M model derive mostly from the fact that it's just a single-layer
associative network with hand-wired representations.  As I
understand it (having heard Dave Rumelhart's response),
the R&M model was never intended to be anything near a
complete model of tense acquisition--rather, it was mainly supposed to
demonstrate that "rule-like" behavior can coexist with "special-case"
behavior in the same set of connection weights.
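
To make that concrete, here is a toy sketch of the difference (my own
illustration, nothing like the actual R&M network or its encoding of verbs):
a single-layer associator trained with the delta rule can never learn a
mapping like XOR, while the same data is easy once you add a hidden layer
trained by back-propagation, because the hidden layer builds its own
internal representation instead of relying on hand-wired input features.

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])        # XOR: not linearly separable

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(A):
    return np.hstack([A, np.ones((A.shape[0], 1))])

Xb = add_bias(X)

# single-layer associator (delta rule on squared error), no hidden units
W = rng.normal(0, 0.5, (3, 1))
for _ in range(10000):
    y = sigmoid(Xb @ W)
    W += 0.5 * Xb.T @ ((T - y) * y * (1 - y))
print("single layer:", sigmoid(Xb @ W).ravel().round(2))  # can't get all 4 right

# same data, one hidden layer trained by back-propagation
W1 = rng.normal(0, 0.5, (3, 4))     # input (+bias) -> 4 hidden units
W2 = rng.normal(0, 0.5, (5, 1))     # hidden (+bias) -> output
for _ in range(10000):
    h = add_bias(sigmoid(Xb @ W1))                  # learned internal rep.
    y = sigmoid(h @ W2)
    d2 = (T - y) * y * (1 - y)                      # output error signal
    d1 = (d2 @ W2[:-1].T) * h[:, :-1] * (1 - h[:, :-1])  # hidden error
    W2 += 0.5 * h.T @ d2
    W1 += 0.5 * Xb.T @ d1
print("hidden layer:", y.ravel().round(2))   # typically close to 0 1 1 0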

If you want to read an article which _attempts_ to tear PDP _in general_
apart, read the one by Fodor and Pylyshyn in the same book.  It didn't
make much sense to me, but then I guess I don't have the MIT
perspective.  If someone really wants to blast PDP, there are _real_
problems with it, like scaling of learning time, which make more
sense to focus on than the things F&P talk about.

Chris Kortge
kortge@psych.stanford.edu

manj@brand.usc.edu (B. S. Manjunath) (03/24/89)

In article <2726@sun.soe.clarkson.edu> spam@clutx.clarkson.edu writes:
>I just read a rather scathing article by Steven Pinker (MIT) and
                               ^^^^^^^^^^^^^^^^^^^^^^^^
>Alan Prince (Brandeis) that tore PDP apart.  Has anyone seen any
>responses to this article that defend Rumelhart and McClelland?
>The article bummed me out, and I'm not up enough on language
>processing to really debate what they claim.
>
>++ Roger Gonzalez ++ spam@clutx.clarkson.edu ++ Clarkson University ++

	Could you please post a complete reference to this article (where
it was published, etc.)?  I am sure many of us will be interested in
getting hold of a copy.
     Thank you,

bs manjunath.

clutx.clarkson.edu (Roger Gonzalez,,,) (03/24/89)

> The article doesn't tear PDP apart at all (I assume you mean the one in
> _Cognition_, published also as a book, "Connections and Symbols").  What it
> does do is tear apart one very simple model of a complex phenomenon (learning
> of past tenses).  As far as I could tell, virtually all its criticisms
> could be answered with a multilayer network model; that is, the faults
> of the R&M model derive mostly from the fact that it's just a single
> layer associative network, with hand-wired representations. 

Yeah, so I noticed after I waded through the last 50 pages.

My impressions changed after I was done reading the article.  It seems
to me that they were getting picky about details that I don't think
R&M ever intended as the gospel truth about language processing...
Correct me if I'm wrong, but weren't R&M just saying, "Look, here's
a simple little model that does a pretty good job with past tenses...
and look, it even seems to exhibit some of the behavior of children
learning language..."?

Some of P&P's accusations seemed pretty trivial anyway:
"It can learn rules found in no human language"

SOoooo?

(Or are they assuming the ol' language acquisition device?)


Anyway, I'm reading the "nastier" article you suggested right now.


- Roger


++ Roger Gonzalez ++ spam@clutx.clarkson.edu ++ Clarkson University ++

   "Just like I've always said; there's nothing an agnostic can't do
    if he's not sure he believes in anything or not!" - Monty Python

clutx.clarkson.edu (Roger Gonzalez,,,) (03/24/89)

> 	Could you please post a complete reference to this article  (where
> it was published etc.). I am sure many of us will be interested in 
> getting hold of a copy.

Yeesh! I got over 20 mail inquiries for this!

The article is Pinker and Prince, "On Language and Connectionism: Analysis
of a Parallel Distributed Processing Model of Language Acquisition,"
_Cognition_ 28 (1988); it is reprinted in "Connections and Symbols" from
MIT Press.

++ Roger Gonzalez ++ spam@clutx.clarkson.edu ++ Clarkson University ++

   "Just like I've always said; there's nothing an agnostic can't do
    if he's not sure he believes in anything or not!" - Monty Python

kavuri@cb.ecn.purdue.edu (Surya N Kavuri ) (03/24/89)

In article <1078@Portia.Stanford.EDU>, kortge@Portia.Stanford.EDU (Chris Kortge) writes:
> In article <2726@sun.soe.clarkson.edu> spam@clutx.clarkson.edu writes:
> >I just read a rather scathing article by Steven Pinker (MIT) and
> >Alan Prince (Brandeis) that tore PDP apart.  Has anyone seen any
> >responses to this article that defend Rumelhart and McClelland?
> 
> If you want to read an article which _attempts_ to tear PDP _in general_
> apart, read the one by Fodor and Pylyshyn in the same book.  It didn't
 ......
There is another paper with a similar claim, "Gradient descent fails to
separate," by M. Brady and R. Raghavan.  The paper shows the failure of
backpropagation on examples where there are no local minima.  They assert
(and they could be right, as such claims have been "romantic," as Minsky
put it!) that least-squares solutions do not minimize the number of
misclassifications.  They have examples where the perceptron does well
while gradient descent on the least-squares error fails.  They conclude
that this failure of gradient descent on least-squares error may be much
more widespread than presumed.
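
To see the flavor of the claim (this is my own toy 1-D example, not the
construction from their paper): the exact least-squares fit of a linear
unit misclassifies points of a trivially separable set, because the far-out
point on the correct side drags the solution over, while the perceptron
rule separates the same data:

import numpy as np

x = np.array([1., 2., 3., -1., -2., -100.])  # -100 is far on the CORRECT side
t = np.array([1., 1., 1., -1., -1., -1.])    # targets; separable at x = 0

# exact least-squares fit of t ~ w*x + b
A = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(A, t, rcond=None)
print("least-squares errors:", np.sum(np.sign(w * x + b) != t))  # 2 (x=-1, x=-2)

# perceptron rule on the same data
w, b = 0.0, 0.0
for _ in range(100):                          # far more passes than needed
    for xi, ti in zip(x, t):
        if ti * (w * xi + b) <= 0:            # misclassified -> update
            w += ti * xi
            b += ti
print("perceptron errors   :", np.sum(np.sign(w * x + b) != t))  # 0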
                                          SURYA
                                         (FIAT LUX)
                                                               

randall@alberta.UUCP (Allan F Randall) (03/26/89)

In article <1078@Portia.Stanford.EDU>, kortge@Portia.Stanford.EDU (Chris Kortge) writes:
> If you want to read an article which _attempts_ to tear PDP _in general_
> apart, read the one by Fodor and Pylyshyn in the same book.  It didn't
> make much sense to me, but then I guess I don't have the MIT
> perspective.  If someone really wants to blast PDP, there are _real_
> problems with it, like scaling of learning time, which make more
> sense to focus on than the things F&P talk about.
> 
> Chris Kortge
> kortge@psych.stanford.edu

I think the reason Fodor and Pylyshyn do not concentrate on those issues is
that they are criticizing the philosophy of connectionism as a general
approach to studying the mind, rather than attacking specific problems with
current systems. The points they do discuss are all intended to be general
problems inherent in the philosophy of connectionism, which they do not
anticipate being solved.

There are a few aspects of their reasoning that puzzle me; I thought I'd
respond by posting my own reactions to the article. I would be interested in
hearing from anyone who perhaps understands their perspective better,
particularly since I haven't read any connectionist responses to Fodor and
Pylyshyn (does anybody know of any?).

First of all, though, I think they do a reasonably good job of clearing up a
few contentious points concerning which issues are directly relevant to the
connectionist/symbolist debate and which are not. For instance, while
parallelism is central to most connectionist systems, there is nothing contrary
to the classical symbolist view in massive parallelism. The same goes for the
idea of soft or fuzzy constraints.

I have two major problems with the rest of their article. First, they seem
very limited in the types of connectionist systems they are willing to discuss.
Most of the examples they give are of systems where each node represents some
concept or proposition. Hence, they only discuss connectionist systems that
already have a lot in common with symbolic systems. They talk very little about
distributed representations and sub-symbolic processes. This seems strange to
me, since I would consider these things to be the central justification for
the connectionist approach. Fodor and Pylyshyn seem to be artificially limiting
the connectionist architecture to a narrow form of symbolism and then judging
it on its performance as a logic. What they fail to realize is that it is these
very assumptions they use in judging PDP that connectionists are calling into
question in the first place. *Of course* PDP, at least in its current forms,
fails as a general logical inference mechanism. What (many of) the new
connectionists are saying is that these systems work better at *certain types
of tasks* than the classical systems. They are meant to address problems with
the symbolist approach. Yes, they fail miserably at many things symbol systems
do well, but this does not mean we must choose one over the other.
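
To be clear about what I mean by that distinction, here is a toy example of
my own (nothing from their article): in a localist scheme every concept gets
its own unit, so no two concepts share any structure, while in a distributed
scheme a concept is a pattern over shared feature units, and similarity
between concepts falls out of the overlap between patterns.

import numpy as np

# localist: one unit per concept
localist = {
    "cat":   np.array([1., 0., 0.]),
    "dog":   np.array([0., 1., 0.]),
    "truck": np.array([0., 0., 1.]),
}

# distributed: units stand for (made-up) micro-features
#                       furry barks wheels alive
distributed = {
    "cat":   np.array([1.,   0.,   0.,    1.]),
    "dog":   np.array([1.,   1.,   0.,    1.]),
    "truck": np.array([0.,   0.,   1.,    0.]),
}

def overlap(rep, a, b):
    """Cosine similarity between two concept vectors."""
    va, vb = rep[a], rep[b]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(overlap(localist, "cat", "dog"))       # 0.0  -- no shared structure
print(overlap(distributed, "cat", "dog"))    # ~0.82 -- related concepts overlap
print(overlap(distributed, "cat", "truck"))  # 0.0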

This brings me to the other point, which I think is the key problem with Fodor
and Pylyshyn's approach. They do not seem to consider the possibility of using
*both* approaches. Their main argument is that mental representations have a
"combinatorial syntax and semantics" and "structure sensitivity of processes."
The upshot of this is that to do the sorts of things humans are good at, a
system must have a systematic way to generate mental representations that have
a constituent structure. Connectionist systems lack this language-like ability.
This is an argument for a "Language of Thought." Because of this emphasis on
the language-like aspects of cognition, many of Fodor and Pylyshyn's arguments
are about the inability of PDP nets to deal with language. They then generalize
to the rest of cognition. While this is not entirely invalid, I think it really
weakens their argument, as language is the one aspect of cognition that seems
to be the most symbolic and the least connectionistic. 

However, I would still agree with much of what they say. It is true that thought
must have these properties. Cognition must be more than the statistical
modelling of the environment. But Fodor and Pylyshyn give short shrift to the
idea that both types of architectures will be needed to handle all aspects of
cognition. Why could we not have a connectionist system modelling the
environment and creating distributed representations that are used by a more
classical symbolic processor? (This is, of course, only one way of looking at
hybrid systems.) While Fodor and Pylyshyn do spend a little time discussing this
sort of thing, it seems to be more of an afterthought than a central
part of their argument. This seems strange, especially since this is where the
field of AI as a whole seems to be going.
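
Just to make the kind of arrangement I have in mind concrete, here is a
sketch (a toy only, with made-up weights, feature labels, and rules, and
certainly not a serious proposal): a connectionist front end maps raw input
to a distributed activation pattern, symbols are read off that pattern, and
a classical rule-based component does the reasoning over them.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# "connectionist" front end: raw input -> distributed representation
W = rng.normal(0.0, 1.0, (8, 3))        # untrained stand-in for a learned net
def encode(raw):
    return sigmoid(raw @ W)             # activation pattern over 3 units

# read symbols off the distributed representation
FEATURES = ["ANIMATE", "LARGE", "MOVING"]    # hypothetical feature labels
def symbolize(pattern, threshold=0.5):
    return {f for f, a in zip(FEATURES, pattern) if a > threshold}

# "symbolic" back end: classical condition-action rules
RULES = [
    (lambda s: {"ANIMATE", "MOVING"} <= s,           "track-it"),
    (lambda s: "LARGE" in s and "ANIMATE" not in s,  "go-around-it"),
    (lambda s: True,                                 "ignore-it"),
]
def decide(raw):
    symbols = symbolize(encode(raw))
    action = next(act for cond, act in RULES if cond(symbols))
    return symbols, action

print(decide(rng.normal(size=8)))   # prints extracted symbols and an action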

In short, Fodor and Pylyshyn are extreme symbolists. They believe in the
classical symbolist view in its most extreme form: physical symbol systems are
a necessary and sufficient condition for cognition. Their article does a good
job of arguing for the "necessary" part, but pays little attention to the more
central "sufficient" part. Like the extreme connectionists, they seem convinced
that we must choose one or the other. They show that a pure connectionist system
could not work and thus conclude that pure symbolism is the answer.

To give them credit, while I disagree with their conclusions, I think they do
a good job of explaining why an intelligent system must display these properties
and why current connectionist architectures are insufficient on their own to do
the job. They build a good case against extreme connectionism, but fail to
explain why this implies the other extreme.

To summarize, my problems with Fodor and Pylyshyn are:
  i) they criticize connectionism as (a) a symbolic logic and (b) a model of
     language, largely dismissing its other aspects as unimportant to cognition.
  ii) they ignore hybrid connectionist/symbolist approaches.

Finally, I think Fodor and Pylyshyn simply have different intuitions than I do.
They seem to feel that the statistical and distributed nature of intelligence
is really not very crucial, if it is there at all. While I disagree, I can
certainly respect that. But I was disappointed with their article, because I
didn't think it really addressed the issue. I would love to see a critique of
connectionism that considered the "sufficient" aspect of symbol systems as
rigorously as they discussed the "necessary" aspects.

-----------------------
Allan Randall
Dept. Computing Science
University of Alberta
Edmonton, AB