[comp.ai.neural-nets] SUMMARY : Extracting Rules from a Trained Network

bakker@cs.uq.oz.au (Paultje Bakker) (05/13/91)

Here follows a summary of replies to my posting that requested :

"References on extracting rules from trained neural networks."

I received many replies, which contained helpful advice and anecdotes
as well as straight-out references. In this posting I have just included the
reference list and some comments on the papers. I did not sort the
references into alphabetical order, as they were often organized by the
respondents into meaningful groups.

If anyone would like the full summary, or has references to add,
please email me at 
   bakker@cs.uq.oz.au
Many thanks to all who replied! (especially Ray Lister).

------
[...]
Have a look
at the following papers, all from the 11th International Joint Conference
on Artificial Intelligence (IJCAI-89), Detroit 1989,  ...
 
Mooney et al "An Experimental Comparison of Symbolic and Connectionist
Learning Algorithms", Vol. 2, pp 775-780.

Weiss and Kapouleas "An Empirical Comparison of Pattern Recognition, Neural
Nets, and Machine Learning Classification Methods", Vol. 2, pp 781-787.
 
Fisher and McKusick "An Empirical Comparison of ID3 and Back-propagation",
Vol 2, pp 788-793.
 
   ... all three papers are worth reading, but the first two
perform comparative studies on big, real-world problems. The "artificial
examples" of the third paper are more useful, and as a result I find its
conclusions more thoughtful than those of the first two papers.
 
also..
 
Draper, J, Frankel, D, Hancock, H, Mize, A "A Microcomputer Neural Net
Benchmarked Against Standard Classification Techniques" IEEE First
International Conference on Neural Networks, San Diego (1987), pp
IV-651 to IV-658

Gallant, S "Connectionist Expert Systems", Comm. Assoc. Comp. Mach.
Vol. 31, No. 2 (February 1988), pp 152-169

Lister "Toward Context Dependent Classification of Infra-Red Spectra
by Energy Minimization", International Joint Conference on Neural Networks,
San Diego, June 1990, Vol II, pp 1-6.

Venkatasubramanian, V "Inexact Reasoning in Expert Systems: A
Stochastic Parallel Network Approach" IEEE Second Conference on
Artificial Intelligence Applications, Miami Beach (1985) pp 13-15


Some theory (i.e., the connection with orthodox pattern recognition) ...

Anderson, C "A Bayesian Probability Network" Neural Networks for
Computing, Snowbird, Utah (1986), in Denker, J (ed.) American Institute
of Physics Conf.  Proc. 151 (1986), pp 7-11

Anderson, C "The Bayes Connection" IEEE First International Conference
on Neural Networks, San Diego (1987), pp III-105 to III-112

Geffner, H, and Pearl, J "On the Probabilistic Semantics of
Connectionist Networks" IEEE First International Conference on Neural
Networks, San Diego (1987), pp II-187 to II-195

Golden, R "A Unified Framework for Connectionist Systems" Biol.
Cybern.  59, pp 109-120 (1988)

Dietterich et al "A Comparative Study of ID3 and Backpropagation for English
Text-to-Speech Mapping", Seventh International Conference on Machine Learning,
Austin Texas, June 1990.

Denker et al "Large Automatic Learning, Rule Extraction, and Generalization"
Complex Systems 1 (1987) pp 877-922.

The Hecht-Nielsen Corporation sells a product, "KnowledgeNet", which
allegedly extracts "explanations" from neural networks.
See R. Hecht-Nielsen, Neurocomputing, Addison-Wesley, 1990.

Claude Sammut, Computer Science, University of NSW, published a very
interesting paper on extracting rules for balancing a pole:

Sammut, C & Cribb, J "Is Learning Rate a Good Performance Criterion for
Learning?" Seventh International Conference on Machine Learning, Austin
Texas, June 1990, pp 170-178.
(Proceedings edited by Bruce Porter and Ray Mooney, and published by
Morgan-Kaufmann.)

-----
Bochereau, L., and Bourgine, P., "Extraction of Semantic Features
and Logical Rules from a Multilayer Neural Network",
IJCNN-90, Washington, D.C., Applications volume, p. 579 ff.

They analyzed a neural network that had learned the optimal strategy
for the first bid in bridge, and extracted the rules
that the network used.
----

McMillan, C., Mozer, M. C., Smolensky, P., (submitted). The connectionist 
scientist game: rule extraction and refinement in a neural network. 
Thirteenth Annual Conference of the Cognitive Science Society, Chicago, IL, 
August 1991.

McMillan, C., Mozer, M. C., Smolensky, P., (1991). Learning rules in a
neural network. To appear in: Proceedings of the International Joint 
Conference on Neural Networks, Seattle, WA, July 1991.

McMillan, C., Smolensky, P., (1988). Analyzing a connectionist network as 
a system of soft rules, Proceedings of the 10th Conference of the Cognitive 
Science Society, Hillsdale, NJ: Lawrence Erlbaum Associates.

------
Garson, G. David, "Interpreting Neural Net Connection Weights",
  A.I. Expert, Vol. 6, No. 4 (April 1991), Miller Freeman Publications,
  600 Harrison St.  San Francisco, CA 94107  (415) 905-2200

"Discovering the underlying causal model behind a neural network's
 solution is difficult but not impossible.  The trick is to use the
 connection weights from input layer to hidden layer to output layer
 to partition the relative share of the output prediction associated
 with each input variable."
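
In case the recipe isn't obvious from that quote, here is my own
back-of-the-envelope sketch of the weight-partitioning idea in modern
Python/NumPy. It is my reading of the article, not Garson's code; the
function name is mine, and it assumes a single hidden layer and a single
output unit.

import numpy as np

def garson_importance(W, v):
    # W: (n_inputs, n_hidden) input-to-hidden weights
    # v: (n_hidden,) hidden-to-output weights (single output unit)
    c = np.abs(W) * np.abs(v)             # credit for input i via hidden unit j
    r = c / c.sum(axis=0, keepdims=True)  # each hidden unit's credit, shared out
    imp = r.sum(axis=1)                   # accumulate shares across hidden units
    return imp / imp.sum()                # relative importances, summing to 1

# Toy example: three inputs, four hidden units, random "trained" weights.
rng = np.random.default_rng(0)
print(garson_importance(rng.normal(size=(3, 4)), rng.normal(size=4)))

Absolute values are used so that large negative weights count as heavily
as large positive ones; the trick says nothing about the sign of an
input's effect, only its share of the network's attention.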
-----
The latest version of NeuralWorks Professional from NeuralWare in Pittsburgh
reputedly extracts rules.

I think it is based on the work of Stephen I. Gallant.
-----
Pavel, M., Gluck, M. A., & Henkle, V. have written two articles that
touch on your question.

"Generalization by humans and multi-layer adaptive networks." 
(Submitted to Tenth Annual Conference of the Cognitive Science Society,
August 17-19, 1988.)

"Constraints on adaptive networks for modeling human generalization"
This one I found in : Touretzky David S. "Advances in Neural Information
Processing Systems 1". (ISBN 1-558-60015-9) MORGAN KAUFMANN PUBLISHERS, INC.
(1989)

--------
Servan-Schreiber, Cleeremans, & McClelland, "Encoding Sequential
Structure in Simple Recurrent Networks", Carnegie Mellon CS dept.
technical report CMU-CS-88-183. November 1988.

Gorman & Sejnowski, "Analysis of Hidden Units in a Layered Network
Trained to Classify Sonar Targets", Neural Networks, Vol. 1, No. 1,
pp 75-89, 1988.

Both of these use cluster analysis to analyze hidden unit activity
patterns and correlate them with outputs.
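
For what it's worth, that cluster-analysis idea is cheap to try yourself.
A minimal sketch (mine, not the authors'; random data stands in for the
recorded activations) in Python with NumPy/SciPy:

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
acts = rng.random((100, 8))            # hidden activations: one row per pattern
labels = rng.integers(0, 2, 100)       # the network's output class per pattern

Z = linkage(acts, method="average")    # hierarchical clustering of the patterns
clusters = fcluster(Z, t=4, criterion="maxclust")

# Correlate clusters with outputs: which class dominates each cluster?
for c in np.unique(clusters):
    print(c, np.bincount(labels[clusters == c], minlength=2))

If the clusters line up cleanly with output classes, each cluster is a
candidate "concept" the hidden layer has carved out.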

A similar approach does correlation analysis, yielding principal components
of hidden unit activity. See:

Dennis Sanger, "Contribution Analysis: A Technique for Assigning
Responsibilities to Hidden Units in Connectionist Networks",
Connection Science, Vol. 1, No. 2, pp 115-138, 1989.
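
Again, a minimal sketch of the principal-components variant (mine, not
Sanger's actual contribution-analysis procedure): run an SVD over the
centred matrix of hidden activations and see how few components carry
most of the variance.

import numpy as np

rng = np.random.default_rng(0)
acts = rng.random((100, 8))                  # hidden activations, row per pattern
centred = acts - acts.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
print(S**2 / np.sum(S**2))                   # variance carried by each component
# Rows of Vt are directions in hidden-unit space; projecting the patterns
# onto the leading rows shows which unit combinations do the real work.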

Also, you may consider methods of pruning weights or units out of a network
to reduce its complexity, and thereby have less left to analyze (a rough
sketch follows these two references). See

Mozer& Smolensky - Skeletonization - short paper in NIPS II proceedings, 1989.

LeCun, Denker, Solla - Optimal Brain Damage - also presented at NIPS II, 1989.
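
As I read the LeCun et al paper (this is a paraphrase, not their code),
the recipe is: approximate the Hessian of the error by its diagonal,
score each weight by the error increase its removal would cause, and
delete the lowest-scoring weights before retraining. A toy sketch, with
random numbers standing in for a trained network:

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=50)                # flattened weights of a "trained" net
h_diag = np.abs(rng.normal(size=50))   # stand-in for diagonal Hessian terms

saliency = 0.5 * h_diag * w**2         # estimated cost of deleting each weight
keep = saliency > np.percentile(saliency, 20)   # drop the least salient 20%
w_pruned = np.where(keep, w, 0.0)
print(np.count_nonzero(w_pruned), "weights survive; retrain, then analyze.")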
-----
Sammut, C and Michie, D "Controlling a Black Box Simulation of a Spacecraft"
AI Magazine, Vol 12, No. 1 (Spring 1991), pp 56-63.

Touretzky and Pomerleau
"What Hidden in the Hidden Layers?", Byte, August 1989, pp 227-233.

It discusses some ways of visualizing what the weights in a network represent.
----
You may find the following paper interesting:

 "The Upstart Algorithm: a method for constructing and training
feed-forward neural networks" Marcus Frean, in Neural Computation 2:2.

This is a constructive method for building MLPs which produces
networks of close to minimal size, at least for binary classification
tasks. The interesting thing from your point of view is that the rules
which the network is (in effect) using to do a given classification
are apparent from the architecture. This point isn't emphasised in the
paper, but it should be clear enough, since the algorithm is very simple.

For a paper on the relationship between MLPs and classical "expert system"
Bayesian reasoning, see ...

Ruck, D et al "The Multilayer Perceptron as an Approximation to a Bayes
Optimal Discriminant Function", IEEE Transactions on Neural Networks,
Vol 1, No 4, December 1990, pp 296-298.
------
"The Truck Backer-Upper", by Nguyen, D and Widrow, B, International Neural 
Network Conference, Paris, July 1990, pp 399-407.  
If you looked at the insides of their network, or used their network to
generate a lot of cases for ID3-like inductive learning, I'll bet you'd
find a set of simple "rules". (A sketch of that second idea follows.)
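
That second suggestion might look something like the sketch below. Fair
warning: the two-input "network" and the feature names are invented, and
I've used a modern entropy-based decision tree in place of ID3 proper.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def net(X):
    # Stand-in for the trained network's decision function.
    return (X[:, 0] + 0.5 * X[:, 1] > 0.7).astype(int)

rng = np.random.default_rng(0)
X = rng.random((5000, 2))              # sample the input space
y = net(X)                             # let the network label the cases

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X, y)
print(export_text(tree, feature_names=["cab_angle", "position"]))

The printed tree is, in effect, the symbolic rule set that the sampled
network behaviour supports.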
-----
Paul

--
Paul Bakker     email: bakker@cs.uq.oz.au
Depts. of Computer Science/Psychology, University of Queensland, Qld 4072, Australia

Famous Last Words (I). George Bernard Shaw: "I am going to die."

bakker@cs.uq.oz.au (Paultje Bakker) (05/14/91)

Ooops, I forgot to include these two references in the list:

"Machine learning using single-layered and multi-layered
neural networks" by Sabrina Sestito and Tharam Dillon,
in Tools for AI, Washington D.C., 1990
 
"Using multi-layered neural networks for learning 
symbolic knowledge" by Sabrina Sestito and Tharam Dillon,
in AI'90, Perth, Australia, 1990.

Apologies, apologies...

paul
