harnad@mind.UUCP (Stevan Harnad) (09/02/88)
Posted for Pinker & Prince by S. Harnad
-----------------------------------------------------------
From: Steve Pinker <steve@cogito.mit.edu>
Site: MIT Center for Cognitive Science
Subject: answers to S. Harnad's questions, short version

Alluding to our paper "On Language and Connectionism: Analysis of a PDP model of language acquisition", Stevan Harnad has posted a list of questions and observations as a 'challenge' to us. His remarks owe more to the general ambience of the connectionism / symbol-processing debate than to the actual text of our paper, in which the questions are already answered. We urge those interested in these issues to read the paper or the nutshell version published in Trends in Neurosciences, either of which may be obtained from Prince (address below). In this note we briefly answer Harnad's three questions. In another, longer message to follow, we direct an open letter to Harnad which justifies the answers and goes over the issues he raises in more detail.

Question #1: Do we believe that English past tense formation is not learnable?

Of course we don't! So imperturbable is our faith in the learnability of this system that we ourselves propose a way in which it might be done (OLC, 130-136).

Question #2: If it is learnable, is it specifically unlearnable by nets?

No, there may be some nets that can learn it; certainly any net that is intentionally wired up to behave exactly like a rule-learning algorithm can learn it. Our concern is not with (the mathematical question of) what nets can or cannot do in principle, but with which theories are true, and our conclusions were about pattern associators using distributed phonological representations. We showed that it is unlikely that human children learn the regular rule the way such a pattern associator learns the regular rule, because it is simply the wrong tool for the job.
Therefore it's not surprising that the developmental data confirm that children do not behave in the way that such a pattern associator behaves.

Question #3: If past tense formation is learnable by nets, but only if the invariance that the net learns and that causally constrains its successful performance is describable as a "rule", what's wrong with that?

Absolutely nothing! -- just like there's nothing wrong with saying that past tense formation is learnable by a bunch of precisely-arranged molecules (viz., the brain) but only if the invariance that the molecules learn, etc. etc. The question is, what explains the facts of human cognition? Pattern associator networks have some interesting properties that can shed light on certain kinds of phenomena, such as *irregular* past tense forms. But it is simply a fact about the *regular* past tense alternation in English that it is not that kind of phenomenon. You can focus on the interesting empirical predictions of pattern associators, and use them to explain certain things (but not others), or you can generalize them to a class of universal devices that can explain nothing without an appeal to the rules that they happen to implement. But you can't have it both ways.

Alan Prince
Program in Cognitive Science
Department of Psychology, Brown 125
Brandeis University
Waltham, MA 02254-9110
prince@brandeis.bitnet

Steven Pinker
Department of Brain and Cognitive Sciences, E10-018
MIT
Cambridge, MA 02139
steve@cogito.mit.edu

References:

Pinker, S. & Prince, A. (1988) On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73-193. Reprinted in S. Pinker & J. Mehler (Eds.), Connections and symbols. Cambridge, MA: Bradford Books/MIT Press.

Prince, A. & Pinker, S. (1988) Rules and connections in human language. Trends in Neurosciences, 11, 195-202.

Rumelhart, D. E. & McClelland, J. L. (1986) On learning the past tenses of English verbs. In J. L. McClelland, D. E.
Rumelhart, & The PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition. Volume 2: Psychological and biological models. Cambridge, MA: Bradford Books/MIT Press.

----------------------------------------------------------------
Posted for Pinker & Prince by:
-- Stevan Harnad
ARPANET: harnad@mind.princeton.edu harnad@princeton.edu harnad@confidence.princeton.edu srh@flash.bellcore.com harnad@mind.uucp
BITNET: harnad%mind.princeton.edu@pucc.bitnet
UUCP: princeton!mind!harnad
CSNET: harnad%mind.princeton.edu@relay.cs.net
harnad@mind.UUCP (Stevan Harnad) (09/02/88)
Posted for Pinker & Prince by S. Harnad
------------------------------------------------------------------
From: Steve Pinker <steve@cogito.mit.edu>
To: Stevan Harnad (harnad@mind.princeton.edu)
Site: MIT Center for Cognitive Science
Subject: answers to S. Harnad's questions, longer version

This letter is a reply to your posted list of questions and observations alluding to our paper "On language and connectionism: Analysis of a PDP model of language acquisition" (Pinker & Prince, 1988; see also Prince and Pinker, 1988). The questions are based on misunderstandings of our papers, in which they are already answered.

(1) Contrary to your suggestion, we never claimed that pattern associators cannot learn the past tense rule, or anything else, in principle. Our concern is with which theories of the psychology of language are true. This question cannot be answered from an armchair but only by examining what people learn and how they learn it. Our main conclusion is that the claim that the English past tense rule is learned and represented as a pattern associator with distributed representations over phonological features for input and output forms (e.g., the Rumelhart-McClelland 1986 model) is false. That's because what pattern associators are good at is precisely what the regular rule doesn't need. Pattern associators are designed to pick up patterns of correlation among input and output features. The regular past tense alternation, as acquired by English speakers, is not systematically sensitive to phonological features. Therefore some of the failures of the R-M model we found are traceable to its trying to handle the regular rule with an architecture inappropriate to the regular rule. We therefore predict that these failures should be seen in other network models that compute the regular past tense alternation using pattern associators with distributed phonological representations (*not* all conceivable network models, in general, in principle, forever, etc.).
This prediction has been confirmed. Egedi and Sproat (1988) devised a network model that retained the assumption of associations between distributed phonological representations but otherwise differed radically from the R-M model: it had three layers, not two; it used a back-propagation learning rule, not just the simple perceptron convergence procedure; it used position-specific phonological features, not context-dependent ones; and it had a completely different output decoder. Nonetheless its successes and failures were virtually identical to those of the R-M model.

(2) You claim that "the regularities you describe -- both in the irregulars and the regulars -- are PRECISELY the kinds of invariances you would expect a statistical pattern learner that was sensitive to higher order correlations to be able to learn successfully. In particular, the form-independent default option for the regulars should be readily inducible from a representative sample."

This is an interesting claim and we strongly encourage you to back it up with argument and analysis; a real demonstration of its truth would be a significant advance. It's certainly false of the R-M and Egedi-Sproat models. There's a real danger in this kind of glib commentary of trivializing the issues by assuming that net models are a kind of miraculous wonder tissue that can do anything. The brilliance of the Rumelhart and McClelland (1986) paper is that they studiously avoided this trap. In the section of their paper called "Learning regular and exceptional patterns in a pattern associator" they took great pains to point out that pattern associators are good at specific things, especially exploiting statistical regularities in the mapping from one set of featural patterns to another. They then made the interesting empirical claim that these basic properties of the pattern associator model lie at the heart of the acquisition of the past tense.
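[Editorial sketch: since the debate turns on what a pattern associator is, here is a minimal one in code: a single weight layer trained with the perceptron convergence procedure. Everything here (features, data) is invented for illustration; it is not the R-M model, only the kind of device under discussion.]

```python
# Minimal two-layer pattern associator (a sketch, not the R-M model):
# binary feature vectors in, binary feature vectors out, one weight
# matrix trained by the perceptron convergence procedure. All it can
# exploit is correlation between input and output features.

def train(pairs, n_in, n_out, epochs=50, lr=0.1):
    w = [[0.0] * n_in for _ in range(n_out)]
    b = [0.0] * n_out
    for _ in range(epochs):
        for x, t in pairs:
            for j in range(n_out):
                y = 1 if sum(w[j][i] * x[i] for i in range(n_in)) + b[j] > 0 else 0
                err = t[j] - y                  # perceptron error signal
                for i in range(n_in):
                    w[j][i] += lr * err * x[i]
                b[j] += lr * err
    return w, b

def predict(w, b, x):
    return [1 if sum(wj[i] * x[i] for i in range(len(x))) + bj > 0 else 0
            for wj, bj in zip(w, b)]

# Invented mapping: output feature 0 copies input feature 0; output
# feature 1 fires only when input features 1 and 2 co-occur.
pairs = [([1, 0, 0], [1, 0]), ([0, 1, 1], [0, 1]),
         ([1, 1, 1], [1, 1]), ([0, 0, 0], [0, 0])]
w, b = train(pairs, 3, 2)
print(all(predict(w, b, x) == t for x, t in pairs))  # True: correlations learned
```

On such a correlational mapping the procedure converges; the authors' point is precisely that the regular alternation is not a mapping of this kind.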
Indeed, the properties of the model afforded it some interesting successes with the *irregular* alternations, which fall into family resemblance clusters of the sort that pattern associators handle in interesting ways. But it is exactly these properties of the model that made it fail at the *regular* alternation, which does not form family resemblance clusters. We like to think that these kinds of comparisons make for productive empirical science. The successes of the pattern associator architecture for irregulars teach us something about the psychology of the irregulars (basically a memory phenomenon, we argue), and its failures for the regulars teach us something about the psychology of the regulars (use of a default rule, we argue). Rumelhart and McClelland disagree with us over the facts but not over the key empirical tests. They hold that pattern associators have particular aptitudes that are suited to modeling certain kinds of processes, which they claim are those of cognition. One can argue for or against this and learn something about psychology while so doing. Your claim about a 'statistical pattern learner...sensitive to higher order correlations' is essentially impossible to evaluate.

(3) We're mystified that you attribute to us the claim that "past tense formation is not learnable in principle." The implication is that our critique of the R-M model was based on the assertion that the rule is unlearned and that this is the key issue separating us from R&M. Therefore -- you seem to reason -- if the rule is learned, it is learned by a network. But both parts are wrong. No one in his right mind would claim that the English past tense rule is "built in". We spent a full seven pages (130-136) of 'OLC' presenting a simple model of how the past tense rule might be learned by a symbol manipulation device. So obviously we don't believe it can't be learned. The question is how children in fact do it.
The only way we can make sense of this misattribution is to suppose that you equate "learnable" with "learnable by some (nth-order) statistical algorithm". The underlying presupposition is that statistical modeling (of an undefined character) has some kind of philosophical priority over other forms of analysis, so that if statistical modeling seems somehow possible-in-principle, then rule-based models (and the problems they solve) can be safely ignored. As a kind of corollary, you seem to assume that unless the input is so impoverished as to rule out all statistical modeling, rule theories are irrelevant; that rules are impossible without major stimulus-poverty. In our view, the question is not CAN some (ungiven) algorithm 'learn' it, but DO learners approach the data in that fashion. Poverty-of-the-stimulus considerations are one out of many sources of evidence on this issue. (In the case of the past tense rule, there is a clear P-of-S argument for at least one aspect of the organization of the inflectional system: across languages, speakers automatically regularize verbs derived from nouns and adjectives (e.g., 'he high-sticked/*high-stuck the goalie'; 'she braked/*broke the car'), despite virtually no exposure to crucial informative data in childhood. This is evidence that the system is built around representations corresponding to the constructs 'word', 'root', and 'irregular'; see OLC 110-114.)

(4) You bring up the old distinction between rules that describe overall behavior and rules that are explicitly represented in a computational device and play a causal role in its behavior. Perhaps, as you say, "these are not crisp issues, and hence not a solid basis for a principled critique". But it was Rumelhart and McClelland who first brought them up, and it was the main thrust of their paper. We tend to agree with them that the issues are crisp enough to motivate interesting research, and don't just degenerate into discussions of logical possibilities.
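[Editorial sketch: the parenthetical P-of-S example above ('high-sticked', 'braked') can be made concrete with a toy lexicon, entirely invented here and not OLC's model: if irregular pasts are indexed by verb *roots*, a verb derived from a noun has no verb root to look up, so the regular default applies even when the verb sounds like an irregular.]

```python
# Sketch of the word/root/irregular organization (hypothetical lexicon):
# irregular pasts attach to verb roots; denominal verbs lack a verb
# root, so they fall through to the phonology-blind regular default.

IRREGULAR_ROOTS = {"stick": "stuck", "break": "broke"}

def past_tense(verb, derived_from_noun=False):
    root = verb.split("-")[-1]            # crude root extraction for the demo
    if not derived_from_noun and root in IRREGULAR_ROOTS:
        return verb[: len(verb) - len(root)] + IRREGULAR_ROOTS[root]
    return verb + ("d" if verb.endswith("e") else "ed")   # regular default

print(past_tense("stick"))                                # stuck
print(past_tense("high-stick", derived_from_noun=True))   # high-sticked
print(past_tense("brake", derived_from_noun=True))        # braked
```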
We just disagree about which conclusions are warranted. We noted that (a) the R-M model is empirically incorrect, therefore you can't use it to defend any claims for whether or not rules are explicitly represented; (b) if you simply wire up a network to do exactly what a rule does, by making every decision about how to build the net (which features to use, what its topology should be, etc.) by consulting the rule-based theory, then that's a clear sense in which the network "implements" the rule. The reason is that the hand-wiring and tweaking of such a network would not be motivated by principles of connectionist theory; at the level at which the manipulations are carried out, the units and connections are indistinguishable from one another and could be wired together any way one pleased. The answer to the question "Why is the network wired up that way?" would come from the rule-theory; for example, "Because the regular rule is a default operation that is insensitive to stem phonology". Therefore in the most interesting sense such a network *is* a rule. The point carries over to more complex cases, where one would have different subnetworks corresponding to different parts of rules. Since it is the fact that the network implements such-and-such a rule that is doing the work of explaining the phenomenon, the question now becomes: is there any reason to believe that the rule is implemented in that way rather than some other way?

Please note that we are *not* asserting that no PDP model of any sort could ever acquire linguistic knowledge without directly implementing linguistic rules. Our hope, of course, is that as the discussion proceeds, models of all kinds will become more sophisticated and ambitious. As we said in our Conclusion, "These problems are exactly that, problems. They do not demonstrate that interesting PDP models of language are impossible in principle.
At the same time, they show that there is no basis for the belief that connectionism will dissolve the difficult puzzles of language, or even provide radically new solutions to them."

So to answer the catechism:

(a) Do we believe that English past tense formation is not learnable? Of course we don't!

(b) If it is learnable, is it specifically unlearnable by nets? No, there may be some nets that can learn it; certainly any net that is intentionally wired up to behave exactly like a rule-learning algorithm can learn it. Our concern is not with (the mathematical question of) what nets can or cannot do in principle, but with which theories are true, and our analysis was of pattern associators using distributed phonological representations. We showed that it is unlikely that human children learn the regular rule the way such a pattern associator learns the regular rule, because it is simply the wrong tool for the job. Therefore it's not surprising that the developmental data confirm that children do not behave the way such a pattern associator behaves.

(c) If past tense formation is learnable by nets, but only if the invariance that the net learns and that causally constrains its successful performance is describable as a "rule", what's wrong with that? Absolutely nothing! -- just like there's nothing wrong with saying that past tense formation is learnable by a bunch of precisely-arranged molecules (viz., the brain) such that the invariance that the molecules learn, etc. etc. The question is, what explains the facts of human cognition? Pattern associator networks have some interesting properties that can shed light on certain kinds of phenomena, such as irregular past tense forms. But it is simply a fact about the regular past tense alternation in English that it is not that kind of phenomenon.
You can focus on the interesting empirical properties of pattern associators, and use them to explain certain things (but not others), or you can generalize them to a class of universal devices that can explain nothing without appeals to the rules that they happen to implement. But you can't have it both ways.

Steven Pinker
Department of Brain and Cognitive Sciences, E10-018
MIT
Cambridge, MA 02139
steve@cogito.mit.edu

Alan Prince
Program in Cognitive Science
Department of Psychology, Brown 125
Brandeis University
Waltham, MA 02254-9110
prince@brandeis.bitnet

References:

Egedi, D.M. & Sproat, R.W. (1988) Neural nets and natural language morphology. AT&T Bell Laboratories, Murray Hill, NJ 07974.

Pinker, S. & Prince, A. (1988) On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73-193. Reprinted in S. Pinker & J. Mehler (Eds.), Connections and symbols. Cambridge, MA: Bradford Books/MIT Press.

Prince, A. & Pinker, S. (1988) Rules and connections in human language. Trends in Neurosciences, 11, 195-202.

Rumelhart, D. E. & McClelland, J. L. (1986) On learning the past tenses of English verbs. In J. L. McClelland, D. E. Rumelhart, & The PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition. Volume 2: Psychological and biological models. Cambridge, MA: Bradford Books/MIT Press.

-------------------------------------------------------------
Posted for Pinker & Prince by:
-- Stevan Harnad
ARPANET: harnad@mind.princeton.edu harnad@princeton.edu harnad@confidence.princeton.edu srh@flash.bellcore.com harnad@mind.uucp
BITNET: harnad%mind.princeton.edu@pucc.bitnet
UUCP: princeton!mind!harnad
CSNET: harnad%mind.princeton.edu@relay.cs.net
harnad@mind.UUCP (Stevan Harnad) (09/02/88)
ON THEFT VS HONEST TOIL

Pinker & Prince (steve@cogito.mit.edu) write in reply:

>> Contrary to your suggestion, we never claimed that pattern associators
>> cannot learn the past tense rule, or anything else, in principle.

I've reread the paper, and unfortunately I still find it ambiguous: For example, in one place (p. 183) you write: "These problems are exactly that, problems. They do not demonstrate that interesting PDP models of language are impossible in principle." But elsewhere (p. 179) you write: "the representations used in decomposed, modular systems are abstract, and many aspects of their organization cannot be learned in any obvious way." [Does past tense learning depend on any of this unlearnable organization?] On p. 181 you write: "Perhaps it is the limitations of these simplest PDP devices -- two-layer association networks -- that causes problems for the R & M model, and these problems would diminish if more sophisticated kinds of PDP networks were used." But earlier on the same page you write: "a model that can learn all possible degrees of correlation among a set of features is not a model of a human being" [Sounds like a Catch-22...]

It's because of this ambiguity that my comments were made in the form of conditionals and questions rather than assertions. But we now stand answered: You do NOT claim "that pattern associators cannot learn the past tense rule, or anything else, in principle." [Oddly enough, I do: if by "pattern associators" you mean (as you mostly seem to mean) 2-layer perceptron-style nets like the R & M model, then I would claim that they cannot learn the kinds of things Minsky showed they couldn't learn, in principle. Whether or not more general nets (e.g., PDP models with hidden layers, back-prop, etc.) will turn out to have corresponding higher-order limitations seems to be an open question at this point.]
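[Editorial sketch: the Minsky-style limitation alluded to here is a textbook demonstration, independent of OLC: no weight setting of a 2-layer perceptron classifies XOR, because XOR is not linearly separable, so training can never drive the error to zero.]

```python
# A 2-layer (no hidden units) perceptron cannot learn XOR: the four
# points are not linearly separable, so some error always remains no
# matter how long the perceptron convergence procedure runs.

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def train_perceptron(data, epochs=1000, lr=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, t in data:
            y = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = t - y
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

w, b = train_perceptron(XOR)
errors = sum(t != (1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0)
             for x, t in XOR)
print(errors)  # always at least 1, however many epochs are run
```

Adding a hidden layer removes this particular limit, which is why the open question above concerns higher-order analogues of it, not XOR itself.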
You go on to quote my claim that:

"the regularities you describe -- both in the irregulars and the regulars -- are PRECISELY the kinds of invariances you would expect a statistical pattern learner that was sensitive to higher order correlations to be able to learn successfully. In particular, the form-independent default option for the regulars should be readily inducible from a representative sample."

and then you comment:

>> This is an interesting claim and we strongly encourage you to back it
>> up with argument and analysis; a real demonstration of its truth would
>> be a significant advance. It's certainly false of the R-M and
>> Egedi-Sproat models. There's a real danger in this kind of glib
>> commentary of trivializing the issues by assuming that net models are
>> a kind of miraculous wonder tissue that can do anything.

I don't understand the logic of your challenge. You've disavowed having claimed that any of this was unlearnable in principle. Why is it glibber to conjecture that it's learnable in practice than that it's unlearnable in practice? From everything you've said, it certainly LOOKS perfectly learnable: Sample a lot of forms and discover that the default regularity turns out to work well in most cases (i.e., the "regulars"; the rest, the "irregulars," have their own local invariances, likewise inducible from statistical regularities in the data). This has nothing to do with a belief in wonder tissue. It was precisely in order to avoid irrelevant stereotypes like that that the first posting was prominently preceded by the disclaimer that I happen to be a sceptic about connectionism's actual accomplishments and an agnostic about its future potential. My critique was based solely on the logic of your argument against connectionism (in favor of symbolism).
Based only on what you've written about its underlying regularities, past tense rule learning simply doesn't seem to pose a serious challenge for a statistical learner -- not in principle, at any rate. It seems to have stumped R & M 86 and E & S 88 in practice, but how many tries is that? It is possible, for example, as suggested by your valid analysis of the limitations of the Wickelfeature representation, that some of the requisite regularities are simply not reflected in this phonological representation, or that other learning (e.g., plurals) must complement past-tense data. This looks more like an entry-point problem (see (1) below), however, than a problem of principle for connectionist learning of past tense formation. After all, there's no serious underdetermination here; it's not like looking for a needle in a haystack, or NP-complete, or the like. I agree that R & M made rather inflated general claims on the basis of the limited success of R & M 86. But (to me, at any rate) the only potentially substantive issue here seems to be the one of principle (about the relative scope and limits of the symbolic vs. the connectionistic approach). Otherwise we're all just arguing about the scope and limits of R & M 86 (and perhaps now also E & S 88).

Two sources of ambiguity seem to be keeping this disagreement unnecessarily vague:

(1) There is an "entry-point" problem in comparing a toy model (e.g., R & M 86) with a lifesize cognitive capacity (e.g., the human ability to form past tenses): The capacity may not be modular; it may depend on other capacities. For example, as you point out in your article, other phonological and morphological data and regularities (e.g., pluralization) may contribute to successful past tense formation.
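[Editorial sketch: for readers without the paper at hand, the Wickelfeature scheme starts from context-sensitive phoneme triples ("Wickelphones"). The sketch below uses letters in place of phonemes and omits the further mapping of triples onto distributed feature units.]

```python
# Schematic Wickelphone encoding (letters stand in for phonemes):
# each segment is coded together with its left and right neighbors,
# '#' marking the word boundary. A word is then just the SET of its
# triples, which is part of what makes the representation contentious.

def wickelphones(word):
    padded = "#" + word + "#"
    return {padded[i - 1: i + 2] for i in range(1, len(padded) - 1)}

print(sorted(wickelphones("came")))   # ['#ca', 'ame', 'cam', 'me#']
```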
Here again, the challenge is to come up with a PRINCIPLED limitation, for otherwise the connectionist can reasonably claim that there's no reason to doubt that those further regularities could have been netted exactly the same way (if they had been the target of the toy model); the entry point just happened to be arbitrarily downstream. I don't say this isn't hand-waving; but it can't be interestingly blocked by hand-waving in the opposite direction.

(2) The second factor is the most critical one: learning. You put a lot of weight on the idea that if nets turn out to behave rulefully then this is a vindication of the symbolic approach. However, you make no distinction between rules that are built in (as "constraints," say) and rules that are learned. The endstate may be the same, but there's a world of difference in how it's reached -- and that may turn out to be one of the most important differences between the symbolic approach and connectionism: not whether they use rules, but how they come by them -- by theft or honest toil. Typically, the symbolic approach builds them in, whereas the connectionistic one learns them from statistical regularities in its input data. This is why the learnability issue is so critical. (It is also what makes it legitimate for a connectionist to conjecture, as in (1) above, that if a task is nonmodular, and depends on other knowledge, then that other knowledge too could be acquired the same way: by learning.)

>> Your claim about a 'statistical pattern learner...sensitive to higher
>> order correlations' is essentially impossible to evaluate.

There are in principle two ways to evaluate it, one empirical and open-ended, the other analytical and definitive. You can demonstrate that specific regularities can be learned from specific data by getting a specific learning model to do it (but its failure would only be evidence that that model fails for those data).
The other way is to prove analytically that certain kinds of regularities are (or are not) learnable from certain kinds of data (by certain means, I might add, because connectionism may be only one candidate class of statistical learning algorithms). Poverty-of-the-stimulus arguments attempt to demonstrate the latter (i.e., unlearnability in principle).

>> We're mystified that you attribute to us the claim that "past
>> tense formation is not learnable in principle."... No one in his right
>> mind would claim that the English past tense rule is "built in". We
>> spent a full seven pages (130-136) of 'OLC' presenting a simple model
>> of how the past tense rule might be learned by a symbol manipulation
>> device. So obviously we don't believe it can't be learned.

Here are some extracts from OLC 130ff:

"When a child hears an inflected verb in a single context, it is utterly ambiguous what morphological category the inflection is signalling... Pinker (1984) suggested that the child solves this problem by "sampling" from the space of possible hypotheses defined by combinations of an innate finite set of elements, maintaining these hypotheses in the provisional grammar, and testing them against future uses of that inflection, expunging a hypothesis if it is counterexemplified by a future word. Eventually... only correct ones will survive." [The text goes on to describe a mechanism in which hypothesis strength grows with success frequency and diminishes with failure frequency through trial and error.]

"Any adequate rule-based theory will have to have a module that extracts multiple regularities at several levels of generality, assign them strengths related to their frequency of exemplification by input verbs, and let them compete in generating a past tense for a given verb."

It's not entirely clear from the description on pp.
130-136 (probably partly because of the finessed entry-point problem) whether (i) this is an innate parameter-setting or fine-tuning model, as it sounds, with the "learning" really just choosing among or tuning the built-in parameter settings, or whether (ii) there's genuine bottom-up learning going on here. If it's the former, then that's not what's usually meant by "learning." If it's the latter, then the strength-adjusting mechanism sounds equivalent to a net, one that could just as well have been implemented nonsymbolically. (You do state that your hypothetical module would be equivalent to R & M's in many respects, but it is not clear how this supports the symbolic approach.)

[It's also unclear what to make of the point you add in your reply (again partly because of the entry-point problem):

>> (In the case of the past tense rule, there is a clear P-of-S argument
>> for at least one aspect of the organization of the inflectional system...)

Is this or is this not a claim that all or part of English past tense formation is not learnable (from the data available to the child) in principle? There seems to be some ambiguity (or perhaps ambivalence) here.]

>> The only way we can make sense of this misattribution is to suppose
>> that you equate "learnable" with "learnable by some (nth-order)
>> statistical algorithm". The underlying presupposition is that
>> statistical modeling (of an undefined character) has some kind of
>> philosophical priority over other forms of analysis; so that if
>> statistical modeling seems somehow possible-in-principle, then
>> rule-based models (and the problems they solve) can be safely ignored.

Yes, I equate learnability with an algorithm that can extract statistical regularities (possibly nth order) from input data. Connectionism seems to be (an interpretation of) a candidate class of such algorithms; so does multiple nonlinear regression.
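[Editorial sketch: the strength-adjusting mechanism quoted above from OLC is indeed easy to cast as a statistical competition. Everything below (the two candidate rules, the corpus) is invented for illustration; it is not OLC's actual module.]

```python
# Candidate regularities gain strength with each input verb that
# exemplifies them; the strongest rule then generates the past tense
# of a novel verb. (Toy rules and corpus, invented for illustration.)

candidates = {
    "add -ed": lambda v: v + "ed",             # the regular default
    "i -> a":  lambda v: v.replace("i", "a"),  # sing/sang-type pattern
}
strength = {name: 0.0 for name in candidates}

corpus = [("walk", "walked"), ("jump", "jumped"), ("play", "played"),
          ("sing", "sang"), ("ring", "rang")]

for stem, past in corpus:
    for name, rule in candidates.items():
        if rule(stem) == past:
            strength[name] += 1.0   # strength tracks exemplification frequency

def best_past(stem):
    winner = max(candidates, key=lambda name: strength[name])
    return candidates[winner](stem)

print(strength)           # the default is exemplified most often
print(best_past("wug"))   # a novel stem gets the strongest rule
```

Whether such a strength-competition module counts as symbolic or as a net re-implemented in symbols is, of course, exactly the point at issue here.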
The question of "philosophical priority" is a deep one (on which I've written: "Induction, Evolution and Accountability," Ann. NY Acad. Sci. 280, 1976). Suffice it to say that induction has epistemological priority over innatism (or such a case can be made) and that a lot of induction (including hypothesis-strengthening by sampling instances) has a statistical character. It is not true that where statistical induction is possible, rule-based models must be ignored (especially if the rule-based models learn by what is equivalent to statistics anyway), only that the learning NEED not be implemented symbolically. But it is true that where a rule can be learned from regularities in the data, it need not be built in. [Ceterum sentio: there is an entry-point problem for symbols that I've also written about: "Categorical Perception," Cambr. U. Pr. 1987. I describe there a hybrid approach in which symbolic and nonsymbolic representations, including a connectionistic component, are put together bottom-up in a principled way that avoids spuriously pitting connectionism against symbolism.]

>> As a kind of corollary, you seem to assume that unless the input is so
>> impoverished as to rule out all statistical modeling, rule theories
>> are irrelevant; that rules are impossible without major stimulus-poverty.

No, but I do think there's an entry-point problem. Symbolic rules can indeed be used to implement statistical learning, or even to preempt it, but they must first be grounded in nonsymbolic learning or in innate structures. Where there is learnability in principle, learning does have "philosophical (actually methodological) priority" over innateness.

>> In our view, the question is not CAN some (ungiven) algorithm
>> 'learn' it, but DO learners approach the data in that fashion.
>> Poverty-of-the-stimulus considerations are one out of many
>> sources of evidence in this issue...
>> developmental data confirm that children do not behave the way such a
>> pattern associator behaves.

Poverty-of-the-stimulus arguments are the cornerstone of modern linguistics because, if they are valid, they entail that certain rules (or constraints) are unlearnable in principle (from the data available to the child) and hence that a learning model must fail for such cases. The rule system itself must accordingly be attributed to the brain, rather than just the general-purpose inductive wherewithal to learn the rules from experience. Where something IS learnable in principle, there is of course still a question as to whether it is indeed learned in practice rather than being innate; but neither (a) the absence of data on whether it is learned nor (b) the existence of a rule-based model that confers it on the child for free provides very strong empirical guidance in such a case. In any event, developmental performance data themselves seem far too impoverished to decide between rival theories at this stage. It seems advisable to devise theories that account for more lifesize chunks of our asymptotic (adult) performance capacity before trying to fine-tune them with developmental (or neural, or reaction-time, or brain-damage) tests or constraints. (Standard linguistic theory has in any case found it difficult to find either confirmation or refutation in developmental data to date.)

By way of a concrete example, suppose we had two pairs of rival toy models, symbolic vs. connectionistic, one pair doing chess-playing and the other doing factorials. (By a "toy" model I mean one that models some arbitrary subset of our total cognitive capacity; all models to date, symbolic and connectionistic, are toy models in this sense.) The symbolic chess player and the connectionistic chess player both perform at the same level; so do the symbolic and connectionistic factorializers.
It seems evident that so little is known about how people actually learn chess and factorials that "developmental" support would hardly be a sound basis for choosing between the respective pairs of models (particularly because of the entry-point problem, since these skills are unlikely to be acquired in isolation). A much more principled way would be to see how they scaled up from this toy skill to more and more lifesize chunks of cognitive capacity. (It has to be conceded, however, that the connectionist models would have a marginal lead in this race, because they would already be using the same basic [statistical learning] algorithm for both tasks, and for all future tasks, presumably, whereas the symbolic approach would have to be making its rules on the fly, an increasingly heavy load.)

I am agnostic about who would win this race; connectionism may well turn out to be side-lined early because of a higher-order Perceptron-like limit on its rule-learning ability, or because of principled unlearnability handicaps. Who knows? But the race is on. And it seems obvious that it's far too early to use developmental (or neural) evidence to decide which way to bet. It's not even clear that it will remain a 2-man race for long -- or that a finish might not be more likely as a collaborative relay. (Nor is the one who finishes first or gets farthest guaranteed to be the "real" winner -- even WITH developmental and neural support. But that's just normal underdetermination.)

>> if you simply wire up a network to do exactly what a rule does, by
>> making every decision about how to build the net (which features to
>> use, what its topology should be, etc.) by consulting the rule-based
>> theory, then that's a clear sense in which the network "implements"
>> the rule

What if you don't WIRE it up but TRAIN it up? That's the case at issue here, not the one you describe.
(I would of course agree that if nets wire in a rule as a built-in constraint, that's theft, not honest toil, but that's not the issue!)
--
Stevan Harnad
ARPANET: harnad@mind.princeton.edu harnad@princeton.edu
harnad@confidence.princeton.edu srh@flash.bellcore.com harnad@mind.uucp
BITNET: harnad%mind.princeton.edu@pucc.bitnet
UUCP: princeton!mind!harnad
CSNET: harnad%mind.princeton.edu@relay.cs.net
harnad@mind.UUCP (Stevan Harnad) (09/03/88)
Posted for Pinker & Prince [pinker@cogito.mit.edu] by S. Harnad
--------------------------------------------------------------
In his reply to our answers to his questions, Harnad writes:
-Looking at the actual behavior and empirical fidelity of
connectionist models is not the right way to test
connectionist hypotheses.
-Developmental, neural, reaction time, and brain-damage data
should be put aside in evaluating psychological theories.
-The meaning of the word "learning" should be stipulated to
apply only to extracting statistical regularities
from input data.
-Induction has philosophical priority over innatism.
We don't have much to say here (thank God, you are probably all
thinking). We disagree sharply with the first two claims, and have no
interest whatsoever in discussing the last two.
Alan Prince
Steven Pinker
----------------------------------------------------------------------
Posted for Pinker & Prince by:
--
Stevan Harnad ARPANET: harnad@mind.princeton.edu harnad@princeton.edu
harnad@confidence.princeton.edu srh@flash.bellcore.com harnad@mind.uucp
BITNET: harnad%mind.princeton.edu@pucc.bitnet UUCP: princeton!mind!harnad
CSNET: harnad%mind.princeton.edu@relay.cs.net
harnad@mind.UUCP (Stevan Harnad) (09/04/88)
Pinker & Prince attribute the following 4 points (not quotes) to me, indicating that they sharply disagree with (1) and (2) and have no interest whatsoever in discussing (3) and (4):

(1) Looking at the actual behavior and empirical fidelity of connectionist models is not the right way to test connectionist hypotheses.

This was not the issue, as any attentive follower of the discussion can confirm. The question was whether Pinker & Prince's article was to be taken as a critique of the connectionist approach in principle, or just of the Rumelhart & McClelland 1986 model in particular.

(2) Developmental, neural, reaction time, and brain-damage data should be put aside in evaluating psychological theories.

This was a conditional methodological point; it is not correctly stated in (2): IF one has a model for a small fragment of human cognitive performance capacity (a "toy" model), a fragment that one has no reason to suppose to be functionally self-contained and independent of the rest of cognition, THEN it is premature to try to bolster confidence in the model by fitting it to developmental (neural, reaction time, etc.) data. It is a better strategy to try to reduce the model's vast degrees of freedom by scaling up to a larger and larger fragment of cognitive performance capacity. This certainly applies to past-tense learning (although my example was chess-playing and doing factorials). It also seems to apply to all cognitive models proposed to date. "Psychological theories" will begin when these toy models begin to approach lifesize; then fine-tuning and implementational details may help decide between asymptotic rivals.

[Here's something for connectionists to disagree with me about: I don't think there is a solid enough fact known about the nervous system to warrant "constraining" cognitive models with it. Constraints are handicaps; what's needed in the toy world that contemporary modeling lives in is more power and generality in generating our performance capacities. If "constraints" help us to get that, then they're useful (just as any source of insight, including analogy and pure fantasy, can be useful). Otherwise they are just arbitrary burdens. The only face-valid "constraint" is our cognitive capacity itself, and we all know enough about that already to provide us with competence data till doomsday. Fine-tuning details are premature; we haven't even come near the station yet.]

(3) The meaning of the word "learning" should be stipulated to apply only to extracting statistical regularities from input data.

(4) Induction has philosophical priority over innatism.

These are substantive issues, very relevant to the matters under discussion (and not decidable by stipulation). However, obviously, they can only be discussed seriously with interested parties.
--
Stevan Harnad
ARPANET: harnad@mind.princeton.edu harnad@princeton.edu
harnad@confidence.princeton.edu srh@flash.bellcore.com harnad@mind.uucp
BITNET: harnad%mind.princeton.edu@pucc.bitnet
UUCP: princeton!mind!harnad
CSNET: harnad%mind.princeton.edu@relay.cs.net