neuron-request@HPLMS2.HPL.HP.COM ("Neuron-Digest Moderator Peter Marvit") (05/09/91)
Neuron Digest   Wednesday,  8 May 1991
                Volume 7 : Issue 24

Today's Topics:
                IEEE Transactions on Software Eng. Special Issue
                              Enquiry about Perceptron
                                  NN generalization
                          Re: Rigorous Results on Fau
                          Re: lack of generalization
             CONSCIOUSNESS & SCIENCE DISCUSSION GROUP/MAY 10 MEETING
                     IJCNN-91-SEATTLE call for volunteers


Send submissions, questions, address maintenance and requests for old
issues to "neuron-request@hplabs.hp.com" or
"{any backbone,uunet}!hplabs!neuron-request"
Use "ftp" to get old issues from hplpm.hpl.hp.com (15.255.176.205).

------------------------------------------------------------

Subject: IEEE Transactions on Software Eng. Special Issue
From:    Erol Gelenbe <gelenbe@csd36.NYU.EDU>
Date:    Mon, 22 Apr 91 12:26:56 -0400

A special issue of the IEEE TSE on "Artificial Neural Network Models and
Systems" is planned for April 1992, for which I will serve as Editor.

Papers concerning models and their theory, system implementations
including one or more applications, and system environments which use
artificial neural networks are solicited. They will be reviewed in
accordance with the normal practices of these IEEE Transactions.

Paper submissions should be sent to me by August 1, 1991, at the
following address:

        Erol Gelenbe
        EHEI
        45 rue des Saints-Peres
        75006 Paris, France

        erol@ehei.ehei.fr

------------------------------

Subject: Enquiry about Perceptron
From:    fernand@computervision.bristol.ac.uk
Date:    Thu, 25 Apr 91 00:09:31 +0100

Bristol, 24 April 1991

I'm a Portuguese student finishing my five-year degree (Engenharia de
Sistemas e Informatica, Dep. Informatica, Universidade do Minho, Braga,
Portugal). I am currently in Bristol developing a neural network for
pattern recognition (British number plates). I'm using a single-layer
perceptron trained with the delta rule. I had trouble finding the "best"
set of weights because, for the same learning rate (nu) and different
random number seeds, the results were very inconsistent (sometimes very
good, other times very bad).

I trained the network for classification and used the step function to
decide whether the output was +1 or -1. The problems appeared during
testing: the node that classifies a given pattern is the one with the
highest weighted sum, and for some patterns two or more output nodes had
nearly equal values. This means that if the pattern is noisy, the
classification may not be the one I want.

To solve this problem, in the training phase I now use a "stair
function". That is, if the weighted sum for a given output node is
between -100 and 100 (these constants can be changed!!), the
classification is 0 rather than the desired +1 or -1. The effect is that
the weighted sum for the output node that should classify the pattern
becomes much bigger than all the others; there is a "gap" between the
good node and the others!!! The results are now consistent and quite
good. I only have problems with O and D, 8 and B, and 6 and G (highly
correlated patterns).

I've never read about this "stair function" and would like to know where
I can read about it, because I would like to refine my network. I also
wonder whether this kind of threshold function can cause convergence
problems during training; so far it hasn't. Finally, I ask whether this
stair function is the same as the sigmoid one (with a tolerance).

Thanks in advance.

Joao Miguel Fernandes.
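[[ Editor's note: a minimal Python sketch of the "stair" (dead-zone)
training described above, under my reading of the posting; the gap of
100, the learning rate, and all names here are illustrative, not the
poster's actual code. ]]

import numpy as np

def stair(s, gap=100.0):
    """Stair activation: +1 or -1 outside the dead zone, 0 inside it."""
    if s > gap:
        return 1.0
    if s < -gap:
        return -1.0
    return 0.0

def train(patterns, targets, nu=0.1, epochs=100, gap=100.0, seed=0):
    """Single-layer perceptron, one output node per class, trained with
    the delta rule; each target vector holds +1 for the correct class
    and -1 elsewhere."""
    rng = np.random.default_rng(seed)          # the seed affects the outcome
    w = rng.uniform(-0.1, 0.1, (targets.shape[1], patterns.shape[1]))
    for _ in range(epochs):
        for x, t in zip(patterns, targets):
            s = w @ x                          # weighted sums, one per node
            y = np.array([stair(v, gap) for v in s])
            w += nu * np.outer(t - y, x)       # delta-rule update
    return w

def classify(w, x):
    """At test time the winning node is the one with the largest sum."""
    return int(np.argmax(w @ x))

[[ Within the dead zone the output is 0, so the error (t - y) stays
nonzero and the weights keep moving until the correct node's sum clears
the gap; in effect this is a perceptron trained with a fixed margin,
which is where the "gap" between the good node and the others comes
from. It is not the same as a sigmoid, though both discourage outputs
near the threshold. ]]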
------------------------------

Subject: NN generalization
From:    shoe@cod.nosc.mil (Patrick A. Shoemaker)
Date:    Thu, 25 Apr 91 09:54:54 -0700

In Neuron Digest, Volume 7, Issue 21, Phil Neal writes:

>I have a problem with the ability of a neural net to generalize...
>I break the data into a 400 observation training set and a 200
>observation test set...
>When I use a simple linear discriminant function with separate covariance
>matrices and compare that against a NN with 6 input, 12 hidden and 4
>output nodes. Here's what I get for correct classification rates:
>
>          LDF    NN
> train    48.5   59.0
> test     42.0   37.0

These symptoms look like overfitting, although the data sample would seem
to be large enough to support training of a network of that size. We have
run back-propagation using data sets and networks of similar size and
have never seen such overfitting on the problems we have looked at.
Perhaps the structure of the problem itself is particularly simple. Have
networks with as few as two or three hidden nodes been tried?

Neal further writes:

>I have heard of workers creating synthetic data from the data set they
>had. From what I understand, it goes something like this:
> 1. for each predictor variable in the training set
>    a. Assume a distribution
>    b. find the empirical parameters for that distribution...
>Now, I am not too sure that this is "statistically" acceptable...
>So, has anybody done this, or read any reports on anybody doing this?

Several coworkers and I have carried out a study of classification by
neural networks in which class probability density functions were
established in closed form, and networks were trained with random samples
drawn according to these distributions. (For one problem, the density
functions were obtained by fitting real data.) This allows evaluation of
the expected performance of the networks, which gives an unbiased measure
of their generalization ability.

This approach is "statistically acceptable" in the context of a
comparative study of neural network classification performance, but I
don't believe that it is applicable to the general problem in which you
must induce something about class probability laws based upon samples of
data (i.e., training sets). If you are confident that the observations
are generated according to some distributions (e.g., Gaussian) and you
obtain the parameters for such distributions by fitting the data in some
way, then why train and use a neural network for classification? You
should be able to use your statistical model to make discriminations. On
the other hand, if the model is significantly mis-specified, then why use
it to train a network? You will have a network based upon an incorrect
model; it is better to treat the network as a non-parametric estimator
and train on the data directly.

I can send reprints of our work in this area to those who are interested.

Patrick Shoemaker
Code 552
Naval Ocean Systems Center
San Diego, CA 92152-5000
shoe@cod.nosc.mil
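[[ Editor's note: a minimal Python sketch of the evaluation scheme
Shoemaker describes, assuming two Gaussian classes with equal priors; a
nearest-mean rule stands in for the trained network, and all names and
parameters are illustrative. Because the true class densities are known,
expected performance can be estimated to any precision from fresh
samples, which is what makes the generalization measure unbiased. ]]

import numpy as np

rng = np.random.default_rng(1)

# Known class-conditional densities (closed form): two 2-D unit-variance
# Gaussians with different means and equal priors.
mu = {0: np.array([0.0, 0.0]), 1: np.array([1.5, 1.5])}

def draw(n):
    """Draw n labeled samples according to the true distributions."""
    labels = rng.integers(0, 2, n)
    x = np.array([rng.normal(mu[c], 1.0) for c in labels])
    return x, labels

# Train a classifier on a finite random sample (a nearest-mean rule
# stands in for the neural network here).
x_train, y_train = draw(400)
m = [x_train[y_train == c].mean(axis=0) for c in (0, 1)]

def predict(x):
    d0 = np.linalg.norm(x - m[0], axis=1)
    d1 = np.linalg.norm(x - m[1], axis=1)
    return (d1 < d0).astype(int)

# Unbiased estimate of expected (generalization) performance: score the
# trained classifier on a large fresh sample from the *true* densities.
x_test, y_test = draw(100000)
print("expected error ~", np.mean(predict(x_test) != y_test))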
------------------------------

Subject: Re: Rigorous Results on Fau
From:    Mark Dzwonczyk <mark_dzwonczyk@qmlink.draper.com>
Date:    26 Apr 91 16:13:10 +0800

RE> Rigorous Results on Fault Tolerance, ND V7 #16 (01 April 1991)

26 April 1991

I recently completed my master's thesis [1] on this subject in the
Aero/Astro department at MIT. It involved quantifying the probability of
network failure after the insertion of faults. Madaline feed-forward
networks of arbitrary size were considered, so that the probability of
network failure is equivalent to the probability of misclassification at
the output layer. (A madaline network has strictly binary neurons with a
hard-threshold activation function.) A spatial analysis of input space,
as developed by Widrow et al. over the past 30 years, was used to
quantify the probability of misclassification.

Specifically, the thesis is an extension of Stevenson's work [2, 3] on
determining the probability of madaline misclassification when the
network weights have limited precision. She determined madaline failure
probability given weights which are perturbed from ideal; in my fault
model, all faults are emulated as weight faults, so weight perturbation
can be quite severe. Closed-form solutions for a number of interesting
fault modes (including multiple and mixed fault scenarios) were derived
for an arbitrary-size madaline, but these involved n-dimensional
integrals, where n is the number of inputs to a neuron, which were quite
complicated. Monte Carlo simulations were therefore used to evaluate the
probabilities of failure.

The results indicate that errors propagate extensively in a
fully-connected network: faults that occur in early layers are likely to
cause misclassification if the Hamming distance of the output encoding
is small. This is due to the binary nature of the neuron threshold
function, which is not forgiving of small input errors near the
threshold value. Also, some assumptions were made in the thesis in order
to keep the model general (for example, the uniform distribution of an
n-dimensional weight vector in n-dimensional space), which may differ
significantly for particular instantiations.

A sparsely-connected madaline was shown to limit the propagation of
errors and thereby reduce the probability of network failure. Of course,
this resiliency is obtained at the expense of network capacity (fewer
weights), so a larger sparse network would be required to achieve the
capacity of a fully-connected network. After all, that's what fault
tolerance is all about: using redundancy (more neurons) to increase
reliability.

Apart from the specific results, the basic failure model developed in
the thesis is a tool which can be used to quantitatively evaluate the
reliability of madaline networks. Extension of the model to networks of
sigmoidal neurons has now begun.

A summary paper for publication is in preparation. If you would like a
preprint of the paper (when complete) or the thesis (with all the gory
details, available now), send me a surface mail address.

Mark Dzwonczyk
Fault-Tolerant Systems Division
The Charles Stark Draper Laboratory

surface mail: 555 Technology Square, MS 6F
              Cambridge, MA 02139
email:        mdz@draper.com

REFERENCES:

1. Dzwonczyk, M., "Quantitative Failure Models of Feed-Forward Neural
   Networks," Master of Science Thesis, MIT, February 1991. Also
   available as a C. S. Draper Laboratory technical report, CSDL-T-1068.

2. Stevenson, M., R. Winter, and B. Widrow, "Sensitivity of Feedforward
   Neural Networks to Weight Errors," IEEE Transactions on Neural
   Networks, vol. 1 (1990), pp. 71-80.

3. Stevenson, M., R. Winter, and B. Widrow, "Sensitivity of Layered
   Neural Networks to Errors in the Weights," Proc. IJCNN-90-Wash-DC,
   pp. I:337-340.
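[[ Editor's note: a minimal Python sketch of the kind of Monte Carlo
evaluation described above, assuming a two-layer madaline and faults
emulated as Gaussian weight perturbations; this illustrates the approach
only, not the thesis's actual fault model, and every parameter here is
made up. ]]

import numpy as np

rng = np.random.default_rng(2)

def hard(s):
    """Hard-threshold activation of a strictly binary (+1/-1) neuron."""
    return np.where(s >= 0.0, 1.0, -1.0)

def madaline(x, W1, W2):
    """Two-layer feed-forward madaline."""
    return hard(W2 @ hard(W1 @ x))

def failure_probability(W1, W2, sigma, trials=10000):
    """Monte Carlo estimate of the probability that weight perturbation
    of severity sigma flips at least one output bit (misclassification
    when the output encoding has no error-correcting margin)."""
    n_in = W1.shape[1]
    failures = 0
    for _ in range(trials):
        x = hard(rng.standard_normal(n_in))        # random binary input
        y_good = madaline(x, W1, W2)
        y_bad = madaline(x,
                         W1 + sigma * rng.standard_normal(W1.shape),
                         W2 + sigma * rng.standard_normal(W2.shape))
        failures += np.any(y_bad != y_good)
    return failures / trials

W1 = rng.standard_normal((6, 8))                   # 8 inputs, 6 hidden
W2 = rng.standard_normal((4, 6))                   # 4 output neurons
for sigma in (0.1, 0.5, 1.0):
    print(sigma, failure_probability(W1, W2, sigma))

[[ A larger Hamming distance between legal output codewords would let
some single-bit flips be corrected, which is one reading of the point
about output encodings above. ]]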
------------------------------

Subject: Re: lack of generalization
From:    Naji Younes <NAJI@gwuvm.gwu.edu>
Date:    Sun, 28 Apr 91 17:32:06 -0400

I'm curious about a procedure described by Phil Neal in vol 7, issue 21.
The procedure is supposed to increase the generalization power of a
neural network. If I understand it correctly, it consists of two stages:

1. A distribution is assumed for each predictor in the data set, and the
   parameters of these distributions are estimated from the data.

2. The distributions are then used to add noise to each observation in
   the original data set, thereby creating "synthetic" observations.

This sounds like a noisy version of the bootstrap algorithm proposed by
Efron. I'm puzzled by several aspects of the procedure, and here are two
questions:

1. Why not just generate observations at random using the estimated
distributions? Is there any particular advantage in adding noise to the
original observations? The only rationale I can imagine is that it might
be an attempt to deal with the case in which the predictors are not
independent of each other (in which case generating values for each
predictor separately would be quite misleading). If that is the concern,
I don't think the procedure is going to do a whole lot of good: the noise
components for each variable are still independent of each other and are
just as likely to make matters worse as better. What is really needed is
an estimate of the *joint* distribution of the predictors. That's hard.
One way to fudge it would be to compute the correlations between
predictors and then generate correlated random variables; this should
work if the dependence between the predictors is linear (see the sketch
below). And, as always, there are fancier ways to go about it. But adding
noise to the original observations is not going to help with dependent
predictors. Is there something else?

2. The only new "information" contained in the synthetic observations is
the postulated distribution for the variables. If the postulated
distributions are correct (or close to it), then the fake variables are
not contributing anything that isn't in the original data set, and, from
a statistical point of view, it is hard to see why they would do any
good. If the postulated distributions are wrong, then you have that much
more to worry about: bear in mind that in many data sets a Cauchy
distribution can look deceptively like a Gaussian. The difference is
that the Cauchy has no mean and infinite variance (ugh!). So my second
question is: what's in the synthetic observations that isn't in the
original ones?

I guess what puzzles me is this: one way to increase the generalization
power of a technique is to supply it with some fresh and unexpected new
data --- to broaden its horizons, if you will. It seems to me that the
synthetic observations are only new in the sense that they were not in
the original data set. In an information-theoretic sense, there is
nothing fresh and unexpected about them that I can see. Am I missing
something? I would love to hear more about this!

/ Naji

Naji Younes
Statistics/Computer and Information Systems Department
George Washington University
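[[ Editor's note: a minimal Python sketch of the "fudge" mentioned in
question 1: estimate the predictors' means and covariance, then generate
correlated Gaussian draws through a Cholesky factor. Function names are
illustrative, and, as the author notes, this captures only linear
dependence. ]]

import numpy as np

rng = np.random.default_rng(3)

def synthetic_correlated(data, n):
    """Generate n synthetic observations whose mean vector and covariance
    matrix match the original data; only linear dependence is
    reproduced."""
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)     # estimated joint second moments
    L = np.linalg.cholesky(cov)          # cov = L @ L.T
    z = rng.standard_normal((n, data.shape[1]))
    return mean + z @ L.T                # rows have covariance L @ L.T

# Example: two predictors with a built-in linear dependence.
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=500)
data = np.column_stack([x1, x2])
fake = synthetic_correlated(data, 500)
print(np.corrcoef(fake, rowvar=False))   # close to the original correlation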
------------------------------

Subject: CONSCIOUSNESS & SCIENCE DISCUSSION GROUP/MAY 10 MEETING
From:    bvi@cca.ucsf.edu (Ravi Gomatam)
Organization: Computer Center, UCSF
Date:    25 Apr 91 06:44:43 +0000

[[ Editor's Note: For San Francisco Bay Area folks, I've found this group
to have provocative speakers. I particularly enjoyed Benjamin Libet last
February. This Friday's talk promises some hard neural data relevant to
many of the issues which modelers and connectionists work with. -PM ]]

CONSCIOUSNESS AND SCIENCE DISCUSSION GROUP

The purpose of this group, which meets on the second Friday of every
month, is to explore the nature of consciousness and its relationship to
science, in such fields as biology, physics, artificial intelligence,
psychology and parapsychology. Relevant ideas from mathematics will also
be discussed. In general, a minimum of graduate-level training is assumed
of the participants. The meetings are free and open to all interested
persons.

NEXT MEETING ANNOUNCEMENT

"INTRACRANIAL EVIDENCE FOR ASSOCIATIVE ACTIVATION IN HUMAN HIPPOCAMPUS"

SPEAKER: Gary Heit, M.D., Ph.D., Stanford University

Dr. Gary Heit received his Ph.D. from the Interdepartmental Neuroscience
program at U.C.L.A. in 1988 and his M.D. from Stanford University in
1991. He is continuing his training in the Division of Neurosurgery at
Stanford University.

DATE:  May 10, 1991 (Friday)
PLACE: Room N721, School of Nursing, U.C. San Francisco
TIME:  7:30 p.m.-8:00 p.m. Social; 8:00 p.m.-10:00 p.m. Talk and
       Discussion

ABSTRACT: Data from intracranial recordings of the electrical activity of
single neurons and local EEG from the human amygdala and hippocampus will
be presented within the framework of a parallel processing network. It is
postulated that the activity represents construction of cognitive
gestalts based on shared features between the current stimulus and prior
experience. A more general discussion will center on the methodology and
experimental philosophy for studying the mind/brain phenomenon.

REFERENCES:

1. Heit, G., Smith, M.E., and E. Halgren; Neural encoding of individual
   words and faces by human hippocampus and amygdala. Nature, 333,
   773-775 (1988).

2. Heit, G., Smith, M.E., and E. Halgren; Activity in the human medial
   temporal lobe during recognition memory. Brain, 113, 1093-1112 (1990).

REGISTRATION: If you are attending for the first time, please
pre-register by calling Kainila Rajan, Ph.D., at (415) 753-8647/8648, or
Jean Burns, Ph.D., at (415) 481-7507.

DIRECTIONS: The closest parking is the UCSF public garage at 500
Parnassus Avenue. Special $1 parking rate if you get your ticket
validated at the meeting. After parking, come up to street level on
Parnassus and cross the street to enter the Medical Sciences Building
(513 Parnassus). Go through the double doors and follow the first
corridor to your right all the way to the School of Nursing Building.
Take the elevator to the seventh floor and turn left.

------------------------------

Subject: IJCNN-91-SEATTLE call for volunteers
From:    worth@park.bu.edu (Andrew J. Worth)
Date:    Fri, 26 Apr 91 12:37:23 -0400

[[ Editor's Note: Especially for starving students, I recommend
volunteering for IJCNN. If last year's tradition continues, you get a
nifty T-shirt, enough free time, and get to meet many (potential)
colleagues easily. (Hi to San Diego volunteers whom I met!) I especially
recommend the stuffing session at the beginning of the conference for a
bizarre exercise in assembly-line manual labour. -PM ]]

=---------------------------------------------------------------------------
                  IJCNN-91-Seattle Call for Volunteers
               July 8-12th, 1991, Seattle, Washington, USA
=---------------------------------------------------------------------------

The International Joint Conference on Neural Networks (IJCNN-91-SEATTLE)
has volunteer positions available. If you or anyone you know would like
to exchange admittance to the conference for working as a volunteer,
please respond directly to me at the e-mail address below.
In the past, volunteers have given approximately 20 hours of labor
(spread out over the entire conference) to receive:

 o admittance to the conference
 o a full set of proceedings
 o attendance at a limited number of tutorials (while working)

The exact benefits are still being worked out. Volunteer positions
include helping at:

 o Stuffing Conference Proceedings
 o Poster Sessions
 o Technical Sessions
 o Evening Plenary Sessions
 o Social Events
 o OPTIONAL duty: Tutorials

If you are interested in volunteering, please respond directly to me
with the following information:

 o Electronic Mail Address
 o Last Name, First Name
 o Address
 o Country
 o Phone Number
 o Volunteer Position Preference

Positions will be filled on a first-commit, first-served basis. There
will be no funding available for volunteers' travel and lodging expenses.

PLEASE RESPOND TO: worth@park.bu.edu

Thank you,
Andy.

=-------------------------------------------------------------------------
Andrew J. Worth                      Cognitive & Neural Systems Dept.
IJCNN-91 Volunteer Chair             Boston University
worth@park.bu.edu                    111 Cummington Street, Rm 244
(617) 353-6741                       Boston, MA 02215
=-------------------------------------------------------------------------

------------------------------

End of Neuron Digest [Volume 7 Issue 24]
****************************************