bakker@cs.uq.oz.au (Paultje Bakker) (05/13/91)
Here follows a summary of replies to my posting requesting: "References on
extracting rules from trained neural networks." I received many replies,
containing helpful advice and anecdotes as well as straight-out references. In
this posting I have included just the reference list and some comments on the
papers. I did not sort the references into alphabetical order, as they were
often organized by the respondents into meaningful groups. If anyone would like
the full summary, or has references to add, please email me at
bakker@cs.uq.oz.au. Many thanks to all who replied! (especially Ray Lister).

------

[...] Have a look at the following papers, all from the 11th International
Joint Conference on Artificial Intelligence (IJCAI-89), Detroit, 1989:

Mooney et al, "An Experimental Comparison of Symbolic and Connectionist
Learning Algorithms", Vol. 2, pp 775-780.

Weiss and Kapouleas, "An Empirical Comparison of Pattern Recognition, Neural
Nets, and Machine Learning Classification Methods", Vol. 2, pp 781-787.

Fisher and McKusick, "An Empirical Comparison of ID3 and Back-propagation",
Vol. 2, pp 788-793.

All three papers are worthwhile reading, but the first two perform comparative
studies on big, real-world problems. The "artificial examples" of the third
paper are more useful, and as a result I find the conclusions of that paper
more thoughtful than those of the first two.

Also:

Draper, J, Frankel, D, Hancock, H, Mize, A, "A Microcomputer Neural Net
Benchmarked Against Standard Classification Techniques", IEEE First
International Conference on Neural Networks, San Diego (1987), pp IV-651 to
IV-658.

Gallant, S, "Connectionist Expert Systems", Comm. Assoc. Comp. Mach., Vol. 31,
No. 2 (February 1988), pp 152-169.

Lister, "Toward Context Dependent Classification of Infra-Red Spectra by
Energy Minimization", International Joint Conference on Neural Networks, San
Diego, June 1990, Vol. II, pp 1-6.
Venkatasubramanian, V, "Inexact Reasoning in Expert Systems: A Stochastic
Parallel Network Approach", IEEE Second Conference on Artificial Intelligence
Applications, Miami Beach (1985), pp 13-15.

Some theory (i.e. the connection with orthodox pattern recognition):

Anderson, C, "A Bayesian Probability Network", Neural Networks for Computing,
Snowbird, Utah (1986), in Denker, J (ed.), American Institute of Physics
Conf. Proc. 151 (1986), pp 7-11.

Anderson, C, "The Bayes Connection", IEEE First International Conference on
Neural Networks, San Diego (1987), pp III-105 to III-112.

Geffner, H, and Pearl, J, "On the Probabilistic Semantics of Connectionist
Networks", IEEE First International Conference on Neural Networks, San Diego
(1987), pp II-187 to II-195.

Golden, R, "A Unified Framework for Connectionist Systems", Biol. Cybern. 59,
pp 109-120 (1988).

Dietterich et al, "A Comparative Study of ID3 and Backpropagation for English
Text-to-Speech Mapping", 7th International Workshop on Machine Learning,
Austin, Texas, June 1990.

Denker et al, "Large Automatic Learning, Rule Extraction, and Generalization",
Complex Systems 1 (1987), pp 877-922.

The Hecht-Nielsen Corporation sells a product, "KnowledgeNet", which allegedly
extracts "explanations" from neural networks. See R. Hecht-Nielsen,
Neurocomputing, Addison-Wesley, 1990.

Claude Sammut, Computer Science, University of NSW, published a very
interesting paper on extracting rules for balancing a pole:

Sammut, C, & Cribb, J, "Is Learning Rate a Good Performance Criterion for
Learning?", Seventh International Conference on Machine Learning, Austin,
Texas, June 1990, pp 170-178. (Proceedings edited by Bruce Porter and Ray
Mooney, and published by Morgan Kaufmann.)

-----

Bochereau, L., Bourgine, P., "Extraction of Semantic Features and Logical
Rules from a Multilayer Neural Network", IJCNN-90, Washington, D.C.,
Applications vol., p. 579 etc.
They analyzed a NN that had learned the optimal strategy for the first bid in
bridge, and extracted the rules that the NN used.

----

McMillan, C., Mozer, M. C., Smolensky, P. (submitted). "The Connectionist
Scientist Game: Rule Extraction and Refinement in a Neural Network",
Thirteenth Annual Conference of the Cognitive Science Society, Chicago, IL,
August 1991.

McMillan, C., Mozer, M. C., Smolensky, P. (1991). "Learning Rules in a Neural
Network". To appear in: Proceedings of the International Joint Conference on
Neural Networks, Seattle, WA, July 1991.

McMillan, C., Smolensky, P. (1988). "Analyzing a Connectionist Network as a
System of Soft Rules", Proceedings of the 10th Conference of the Cognitive
Science Society, Hillsdale, NJ: Lawrence Erlbaum Associates.

------

Garson, David G., "Interpreting Neural Net Connection Weights", A.I. Expert,
Vol. 6, No. 4 (April 1991). Miller Freeman Publications, 600 Harrison St.,
San Francisco, CA 94107, (415) 905-2200.

"Discovering the underlying causal model behind a neural network's solution
is difficult but not impossible. The trick is to use the connection weights
from input layer to hidden layer to output layer to partition the relative
share of the output prediction associated with each input variable."

-----

The latest version of NeuralWorks Professional, from NeuralWare in Pittsburgh,
reputedly extracts rules. I think it is based on the work of Stephen I.
Gallant.

-----

Pavel, M., Gluck, M. A., & Henkle, V. have written two articles that touch on
your question:

"Generalization by Humans and Multi-layer Adaptive Networks" (submitted to the
Tenth Annual Conference of the Cognitive Science Society, August 17-19, 1988).

"Constraints on Adaptive Networks for Modeling Human Generalization". This one
I found in: Touretzky, David S. (ed.), "Advances in Neural Information
Processing Systems 1" (ISBN 1-558-60015-9), Morgan Kaufmann Publishers, Inc.
(1989).

--------

Servan-Schreiber, Cleeremans, & McClelland, "Encoding Sequential Structure in
Simple Recurrent Networks", Carnegie Mellon CS Dept. technical report
CMU-CS-88-183, November 1988.

Gorman & Sejnowski, "Analysis of Hidden Units in a Layered Network Trained to
Classify Sonar Targets", Neural Networks, Vol. 1, No. 2, pp 75-89, 1988.

Both of these use cluster analysis to analyze hidden unit activity patterns
and correlate them with outputs. A similar approach does correlation analysis,
yielding principal components of hidden unit activity. See:

Dennis Sanger, "Contribution Analysis: A Technique for Assigning
Responsibilities to Hidden Units in Connectionist Networks", Connection
Science, Vol. 1, No. 2, pp 115-138, 1989.

Also, you may consider methods of trimming stuff out of a network to reduce
its complexity, and thereby have less left to analyze. See:

Mozer & Smolensky, "Skeletonization" - short paper in the NIPS II proceedings,
1989.

LeCun, Denker, Solla, "Optimal Brain Damage" - also presented at NIPS II,
1989.

-----

Sammut, C, and Michie, D, "Controlling a Black Box Simulation of a
Spacecraft", AI Magazine, Vol. 12, No. 1 (Spring 1991), pp 56-63.

Touretzky and Pomerleau, "What's Hidden in the Hidden Layers?", Byte, August
1989, pp 227-233. It talks about some ways of visualizing what weights
represent in networks.

----

You may find the following paper interesting:

Marcus Frean, "The Upstart Algorithm: A Method for Constructing and Training
Feed-forward Neural Networks", Neural Computation 2:2.

This is a constructive method for building MLPs, which builds networks of
close to minimal size, at least for binary classification tasks. The
interesting thing from your point of view is that the rules which the network
is using (in effect) to do a given classification are apparent from the
architecture. This point isn't emphasised in the paper, but it should be
clear enough - the algorithm's very simple.
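As an aside, the weight-partitioning trick quoted from Garson's A.I. Expert
article further up can be sketched in a few lines. This is only my own rough
reading of the quoted description, not Garson's published procedure: it
assumes a single-hidden-layer network with one output, and the function name
and exact normalisation are mine.

```python
import numpy as np

def garson_importance(W_ih, w_ho):
    """Partition connection weights to estimate each input's relative share.

    W_ih : (n_inputs, n_hidden) input-to-hidden weight matrix
    w_ho : (n_hidden,) hidden-to-output weights (single output unit)
    Returns an array of relative importances over the inputs, summing to 1.
    """
    # Contribution of input i through hidden unit j: product of the
    # absolute input-hidden and hidden-output weights along that path.
    c = np.abs(W_ih) * np.abs(w_ho)            # shape (n_inputs, n_hidden)
    # Within each hidden unit, each input's share of that unit's traffic.
    r = c / c.sum(axis=0, keepdims=True)
    # Sum each input's shares over all hidden units, then normalise.
    imp = r.sum(axis=1)
    return imp / imp.sum()

# Toy 2-input, 2-hidden-unit network: input 0 carries more weight.
W = np.array([[3.0, 1.0],
              [1.0, 1.0]])
v = np.array([1.0, 1.0])
print(garson_importance(W, v))   # -> [0.625 0.375]
```

Whether such weight-based shares really reflect the "underlying causal model"
is exactly the question the article discusses; treat the numbers as a first
probe, not as extracted rules.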
For a paper on the relationship between MLPs and classic "expert" Bayesian
reasoning, see:

Ruck, D, et al, "The Multilayer Perceptron as an Approximation to a Bayes
Optimal Discriminant Function", IEEE Transactions on Neural Networks, Vol. 1,
No. 4, December 1990, pp 296-298.

------

Nguyen, D, and Widrow, B, "The Truck Backer-Upper", International Neural
Network Conference, Paris, July 1990, pp 399-407.

If you looked at the insides of their network, or used their network to
generate a lot of cases for ID3-like inductive learning, I'll bet you'd find
a set of simple "rules".

-----

Paul

--
Paul Bakker                                email: bakker@cs.uq.oz.au
Depts. of Computer Science/Psychology, University of Queensland, Qld 4072,
Australia
Famous Last Words (I). George Bernard Shaw: "I am going to die."
bakker@cs.uq.oz.au (Paultje Bakker) (05/14/91)
Oops, I forgot to include these two references in the list:

Sabrina Sestito and Tharam Dillon, "Machine Learning Using Single-layered and
Multi-layered Neural Networks", in Tools for AI, Washington, D.C., 1990.

Sabrina Sestito and Tharam Dillon, "Using Multi-layered Neural Networks for
Learning Symbolic Knowledge", in AI'90, Perth, Australia, 1990.

Apologies, apologies...

paul

--
Paul Bakker                                email: bakker@cs.uq.oz.au
Depts. of Computer Science/Psychology, University of Queensland, Qld 4072,
Australia
Famous Last Words (I). George Bernard Shaw: "I am going to die."