[comp.ai.neural-nets] Neuron Digest V7 #24

neuron-request@HPLMS2.HPL.HP.COM ("Neuron-Digest Moderator Peter Marvit") (05/09/91)

Neuron Digest   Wednesday,  8 May 1991
                Volume 7 : Issue 24

Today's Topics:
            IEEE Transactions on Software Eng. Special Issue
                        Enquiry about Perceptron
                            NN generalization
                       Re: Rigorous Results on Fau
                       Re: lack of generalization
         CONSCIOUSNESS & SCIENCE DISCUSSION GROUP/MAY 10 MEETING
                  IJCNN-91-SEATTLE call for volunteers


Send submissions, questions, address maintenance and requests for old issues to
"neuron-request@hplabs.hp.com" or "{any backbone,uunet}!hplabs!neuron-request"
Use "ftp" to get old issues from hplpm.hpl.hp.com (15.255.176.205).

------------------------------------------------------------

Subject: IEEE Transactions on Software Eng. Special Issue
From:    Erol Gelenbe <gelenbe@csd36.NYU.EDU>
Date:    Mon, 22 Apr 91 12:26:56 -0400


A special issue of the IEEE TSE on "Artificial Neural Network Models and
Systems" is planned for April 1992, for which I will serve as Editor.

        Papers concerning models and their theory, system implementations
including one or more applications, and system environments which use
artificial neural networks are solicited.

        They will be reviewed in accordance with the normal practices of
these IEEE Transactions.

        Paper submissions should be sent to me by August 1, 1991 to the
following address:

                                Erol Gelenbe
                                EHEI
                                45 rue des Saints-Peres
                                75006 Paris, France

erol@ehei.ehei.fr

------------------------------

Subject: Enquiry about Perceptron
From:    fernand@computervision.bristol.ac.uk
Date:    Thu, 25 Apr 91 00:09:31 +0100

Bristol 24-April-1991

  I am a Portuguese student finishing my five-year degree (Engenharia de
Sistemas e Informatica - Dep. Informatica - Universidade do Minho -
Braga - Portugal).  I am currently in Bristol developing a neural
network for pattern recognition (British number plates).  I am using a
single-layer perceptron trained with the delta rule.  I had some trouble
finding the "best" set of weights, because for the same learning rate
(niu) and different random-number seeds the results were very
inconsistent (sometimes very good, other times very bad).  I trained the
network for classification and used the step function to decide whether
each output was +1 or -1.  The problems appeared at test time: the node
that classifies a given pattern is the one with the highest weighted
sum, and for some patterns two or more output nodes had nearly equal
values.  This means that if the pattern has some noise, the
classification may not be the one I want.  To solve this, in the
training phase I now use a "stair function": if the weighted sum for a
given output node is between -100 and 100 (these constants can be
changed!!), the output is treated as 0 rather than the desired +1 or -1,
so the delta rule keeps correcting it.  The effect is that the weighted
sum of the output node that should classify the pattern ends up much
bigger than all the others.  There is a "gap" between the good node and
the rest!!!  The results are now consistent and quite good.  My only
remaining problems are with O and D, 8 and B, and 6 and G (very
correlated patterns).
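
  In code, the rule I am using looks roughly like this (just a sketch,
not my actual program; the margin of 100, the learning rate, and the
single output node shown here are arbitrary simplifications):

    # Sketch of the delta rule with a "stair" (dead-zone) output during
    # training: weighted sums between -MARGIN and +MARGIN count as 0,
    # so the weights keep being corrected until the node clears the gap.
    import numpy as np

    MARGIN = 100.0   # the +/-100 dead zone (adjustable)
    ETA = 0.1        # learning rate ("niu"), an arbitrary value here

    def stair(s):
        """Return -1, 0 or +1 depending on the weighted sum s."""
        if s > MARGIN:
            return 1.0
        if s < -MARGIN:
            return -1.0
        return 0.0

    def train(patterns, targets, n_epochs=100, seed=0):
        rng = np.random.default_rng(seed)
        w = rng.normal(scale=0.01, size=patterns.shape[1])
        for _ in range(n_epochs):
            for x, t in zip(patterns, targets):   # t is +1 or -1
                y = stair(np.dot(w, x))           # 0 inside the dead zone
                w += ETA * (t - y) * x            # delta rule update
        return w

    def classify(w, x):
        """At test time, the plain step function (sign of the sum)."""
        return 1.0 if np.dot(w, x) >= 0.0 else -1.0

At test time each character class still has its own output node and the
winner is the node with the highest weighted sum; only the training
targets change.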

  I have never read about this "stair function" and I would like to know
where I can read about it, because I would like to refine my network.  I
also wonder whether this kind of threshold function can cause
convergence problems during training; so far it has not.  Is this stair
function the same as a sigmoid with a tolerance?

  Thanks in advance.

  Joao Miguel Fernandes.

------------------------------

Subject: NN generalization
From:    shoe@cod.nosc.mil (Patrick A. Shoemaker)
Date:    Thu, 25 Apr 91 09:54:54 -0700

In Neuron Digest, Volume 7, Issue 21, Phil Neal writes:

>I have a problem with the ability of a neural net to generalize...  

>I break the data into a 400 observation training set and a 200
>observation test set...

>When I use a simple linear discriminant function with separate covariance
>matrices and compare it against a NN with 6 input, 12 hidden and 4
>output nodes, here's what I get for correct classification rates:

>                        LDF     NN
>train                   48.5    59.0
>test                    42.0    37.0

These symptoms look like overfitting, although the data sample would seem
to be large enough to support training of a network of that size.  We
have run back-propagation using data sets and networks of similar size
and have never seen such overfitting on the problems we have looked at.
Perhaps the structure of the problem itself is particularly simple.  Have
networks with as few as two or three hidden nodes been tried?
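
A quick way to check this is to train nets of several sizes on the same
split and compare the classification rates.  The following is a generic
sketch (scikit-learn, not the original poster's setup); the random data
merely stands in for his 400/200-observation split with 6 predictors and
4 classes.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(600, 6))          # 6 input variables (placeholder)
    y = rng.integers(0, 4, size=600)       # 4 output classes (placeholder)
    X_train, y_train = X[:400], y[:400]
    X_test,  y_test  = X[400:], y[400:]

    for n_hidden in (2, 3, 12):
        net = MLPClassifier(hidden_layer_sizes=(n_hidden,),
                            max_iter=2000, random_state=0)
        net.fit(X_train, y_train)
        print(n_hidden, "hidden:",
              "train %.2f" % net.score(X_train, y_train),
              "test %.2f" % net.score(X_test, y_test))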

Neal further writes:

>I have heard of workers creating synthetic data from the data set they
>had. From what I understand, it goes something like this:

>        1. for each predictor variable in the training set
>            a. Assume a distribution
>            b. find the empirical parameters for that distribution...

>Now, I am not too sure that this is "statistically" acceptable...

>So, has anybody done this , or read any reports on anybody doing this ?

I and several coworkers have carried out a study of classification by
neural networks in which class probability density functions were
established in closed form, and networks were trained with random samples
drawn according to these distributions.  (For one problem, the density
functions were obtained by fitting real data.)  This allows evaluation of
the expected performance of the networks, which gives an unbiased measure
of their generalization ability.  This approach is "statistically
acceptable" in the context of a comparative study of neural network
classification performance, but I don't believe that it is applicable to
the general problem in which you must induce something about class
probability laws based upon samples of data (i.e., training sets).  If
you are confident that the observations are generated according to some
distributions (e.g., Gaussian) and you obtain the parameters for such
distributions by fitting the data in some way, then why train and use a
neural network for classification?  You should be able to use your
statistical model to make discriminations.  On the other hand, if the
model is significantly mis-specified, then why use it to train a network?
You will have a network based upon an incorrect model; better to treat
the network as a non-parametric estimator and train on the data directly.
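
In rough outline, the evaluation scheme is the following (my own toy
sketch, not the study's actual setup): the class densities are known in
closed form, the network is trained on a sample drawn from them, and
expected performance is then estimated on an arbitrarily large fresh
sample from the same densities, so the estimate is not tied to any
particular fixed test set.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(1)

    def draw(n):
        """Draw n labelled points from two known Gaussian class densities."""
        y = rng.integers(0, 2, size=n)
        x = rng.normal(loc=np.where(y == 1, 1.0, -1.0)[:, None],
                       scale=1.0, size=(n, 2))
        return x, y

    X_train, y_train = draw(400)               # training sample
    net = MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000,
                        random_state=1).fit(X_train, y_train)

    X_big, y_big = draw(200000)                # estimate expected accuracy
    print("estimated expected accuracy: %.3f" % net.score(X_big, y_big))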

I can send reprints of our work in this area to those who are interested.

Patrick Shoemaker
Code 552
Naval Ocean Systems Center
San Diego, CA  92152-5000
shoe@cod.nosc.mil

------------------------------

Subject: Re: Rigorous Results on Fau
From:    Mark Dzwonczyk <mark_dzwonczyk@qmlink.draper.com>
Date:    26 Apr 91 16:13:10 +0800

  RE> Rigorous Results on Fault Tolerance, ND V7 #16 (01 April 1991)
  26 April 1991

I recently completed my master's thesis [1] on this subject in the
Aero/Astro department at MIT.  It involved quantifying the probability of
network failure after the insertion of faults.  Madaline feed-forward
networks, of arbitrary size, were considered, so that the probability of
network failure is equivalent to the probability of misclassification at
the output layer.  (A madaline network has strictly binary neurons with a
hard threshold activation function.)

A spatial analysis of input space, as developed by Widrow et al. over the
past 30 years, was used to quantify the probability of misclassification.
Specifically, the thesis is an extension of Stevenson's work [2, 3] on
determining the probability of madaline misclassification when the
network weights have limited precision.  She determined madaline failure
probability given weights which are perturbed from ideal; in my fault
model, all faults are emulated as weight faults so weight perturbation
can be quite severe.

Closed-form solutions for a number of interesting fault modes (including
multiple and mixed fault scenarios) were derived for an arbitrary size
madaline, but these involved n-dimensional integrals, where n is the
number of inputs to a neuron, which were quite complicated.  Monte Carlo
simulations were used to evaluate the probabilities of failure.
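
A toy Monte Carlo estimate in the spirit of the above (my sketch, not
the thesis code) would build a small madaline with hard-threshold (+/-1)
neurons, emulate a fault as a perturbation of one randomly chosen
weight, and estimate the failure probability as the fraction of random
inputs whose output classification changes; the layer widths and fault
magnitude below are made up.

    import numpy as np

    rng = np.random.default_rng(2)
    sizes = [8, 6, 4]                          # made-up layer widths

    def forward(weights, x):
        """Propagate through hard-threshold (+/-1) layers."""
        for W in weights:
            x = np.sign(W @ x)
            x[x == 0] = 1.0                    # strictly binary outputs
        return x

    def mc_failure_prob(n_trials=10000, fault_mag=5.0):
        weights = [rng.normal(size=(m, n)) for n, m in zip(sizes, sizes[1:])]
        failures = 0
        for _ in range(n_trials):
            x = np.sign(rng.normal(size=sizes[0]))    # random +/-1 input
            good = forward(weights, x)
            faulty = [W.copy() for W in weights]
            layer = rng.integers(len(faulty))         # pick one weight fault
            i = rng.integers(faulty[layer].shape[0])
            j = rng.integers(faulty[layer].shape[1])
            faulty[layer][i, j] += fault_mag * rng.normal()
            if not np.array_equal(good, forward(faulty, x)):
                failures += 1
        return failures / n_trials

    print("estimated P(network failure):", mc_failure_prob())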

The results indicate that errors propagate extensively in a
fully-connected network: faults that occur in early layers are likely to
cause misclassification if the Hamming distance of the output encoding
is small.  This is due to the binary nature of the neuron threshold
function, which is not forgiving of small input errors near the
threshold value.  Also, some assumptions were made in the thesis in
order to keep the model general (for example, the uniform distribution
of an n-dimensional weight vector in n-dimensional space), and these may
differ significantly in particular instantiations.

A sparsely-connected madaline was shown to limit the propagation of
errors and thereby reduce the probability of network failure.  Of course,
this resiliency is obtained at the expense of network capacity (fewer
weights), so a larger sparse network would be required to achieve the
capacity of a fully-connected network.  After all, that's what
fault-tolerance is all about: using redundancy (more neurons) to increase
reliability.

Apart from the specific results, the basic failure model developed in the
thesis is a tool which can be used to quantitatively evaluate the
reliability of madaline networks.  Extension of the model to networks of
sigmoidal neurons has now begun.

A summary paper for publication is in preparation.  If you would like a
preprint of the paper (when complete) or the thesis (with all the gory
details, available now) send me a surface mail address.

Mark Dzwonczyk
Fault-Tolerant Systems Division
The Charles Stark Draper Laboratory

surface mail: 555 Technology Square, MS 6F
              Cambridge, MA 02139
email: mdz@draper.com


REFERENCES:

1.  Dzwonczyk, M., "Quantitative Failure Models of Feed-Forward Neural
Networks," Master of Science Thesis, MIT, February 1991.  Also available
as a C. S. Draper Laboratory technical report, CSDL-T-1068.

2.  Stevenson, M., R. Winter, & B. Widrow, "Sensitivity of Feedforward
Neural Networks to Weight Errors," IEEE Transactions on Neural Networks,
vol. 1 (1990), pp. 71-80.

3.  Stevenson, M., R. Winter & B. Widrow, "Sensitivity of Layered Neural
Networks to Errors in the Weights," Proc. IJCNN-90-Wash-DC, pp. I:337-340.


------------------------------

Subject: Re: lack of generalization
From:    Naji Younes <NAJI@gwuvm.gwu.edu>
Date:    Sun, 28 Apr 91 17:32:06 -0400

I'm curious about a procedure described by Phil Neal in vol 7, issue 21.
The procedure is supposed to increase the generalization power of a
neural network.  If I understand it correctly, it consists of two stages:

  1. A distribution is assumed for each predictor in the data set, and
     the parameters of these distributions are estimated from the data.

  2. The distributions are then used to add noise to each observation in
     the original data set, thereby creating "synthetic" observations.
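
In code, my reading of the procedure is roughly the following (a sketch
only; I am assuming Gaussian noise whose spread is a scaled-down version
of each predictor's fitted standard deviation, since the original post
did not spell out those details):

    import numpy as np

    def synthesize(X, n_copies=3, noise_scale=0.1, seed=0):
        """Return X plus n_copies noisy replicas of each observation."""
        rng = np.random.default_rng(seed)
        sigma = X.std(axis=0)               # stage 1: fit each predictor's spread
        noisy = [X + rng.normal(scale=noise_scale * sigma, size=X.shape)
                 for _ in range(n_copies)]  # stage 2: add per-variable noise
        return np.vstack([X] + noisy)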

This sounds like a noisy version of the bootstrap algorithm proposed by
Efron.  I'm puzzled by several aspects of the procedure; here are two
questions:


 1. Why not just generate observations at random using the estimated
    distributions?  Is there any particular advantage in adding noise to
    the original observations?  The only rationale I can imagine is that
    it might be an attempt to deal with the case in which the predictors
    are not independent of each other (in which case generating values
    for each predictor separately would be quite misleading).  If that is
    the concern, I don't think the procedure is going to do a whole lot
    of good: the noise components for each variable are still independent
    of each other and are just as likely to make matters worse as they
    are to make them better.  What is really needed is an estimate of the
    *joint* distribution of the predictors.  That's hard.  One way to
    fudge it would be to compute the correlations between predictors and
    then generate correlated random variables (see the sketch after these
    questions) - this should work if the dependence between the
    predictors is linear.  And, as always, there are fancier ways to go
    about it.  But adding noise to the original observations is not going
    to help with dependent predictors.  Is there something else?

 2. The only new "information" contained in the synthetic observations is
    the postulated distribution for the variables.  If the postulated
    distributions are correct (or close to it), then the synthetic
    observations contribute nothing that isn't in the original data set, and,
    from a statistical point of view, it is hard to see why they would do
    any good.  If the postulated distributions are wrong, then you have
    that much more to worry about: bear in mind that in many data sets a
    Cauchy distribution can look deceptively like a Gaussian.  The
    difference is that the Cauchy has no mean and an infinite variance
    (ugh!). So my second question is: what's in the synthetic
    observations that isn't in the original ones?
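
As for the "fudge" mentioned in question 1, here is a sketch under the
stated linearity assumption: estimate the covariance of the predictors
and generate new observations with the same first and second moments via
a Cholesky factor.  (This captures linear dependence only, and assumes
the estimated covariance matrix is positive definite.)

    import numpy as np

    def sample_correlated(X, n_samples, seed=0):
        """Draw Gaussian samples matching the mean and covariance of X."""
        rng = np.random.default_rng(seed)
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)      # estimated linear dependence
        L = np.linalg.cholesky(cov)        # cov = L @ L.T
        z = rng.normal(size=(n_samples, X.shape[1]))
        return mu + z @ L.T                # correlated samples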


I guess what puzzles me is this: one way to increase the generalization
power of a technique is to supply it with some fresh and unexpected new
data --- to broaden its horizons if you will.  It seems to me that the
synthetic observations are only new in the sense that they were not in
the original data set.  From an information-theoretic point of view, there is
nothing fresh or unexpected about them that I can see. Am I missing
something?  I would love to hear more about this!

/ Naji

Naji Younes
Statistics/Computer and Information Systems Department
George Washington University


------------------------------

Subject: CONSCIOUSNESS & SCIENCE DISCUSSION GROUP/MAY 10 MEETING
From:    bvi@cca.ucsf.edu (Ravi Gomatam)
Organization: Computer Center, UCSF
Date:    25 Apr 91 06:44:43 +0000

[[ Editor's Note: For San Francisco Bay Area folks, I've found this group
to have provocative speakers.  I particularly enjoyed Benjamin Libet last
February. This Friday's talk promises some hard neural data relevant to
many of the issues which modelers and connectionists work with. -PM ]]

                  CONSCIOUSNESS AND SCIENCE 
                      DISCUSSION GROUP

The purpose of this group, which meets on the second Friday of every
month, is to explore the nature of consciousness and its relationship to
science, in such fields as biology, physics, artificial intelligence,
psychology and parapsychology.  Relevant ideas from mathematics will also
be discussed.  In general, a minimum of graduate level training is
assumed of the participants.  The meetings are free and open to all
interested persons.

                 NEXT MEETING ANNOUNCEMENT

 "INTRACRANIAL EVIDENCE FOR ASSOCIATIVE ACTIVATION IN HUMAN HIPPOCAMPUS"

SPEAKER: Gary Heit, M.D., Ph.D., Stanford University
Dr. Gary Heit received his Ph.D. from the Interdepartmental Neuroscience
program at U.C.L.A. in 1988 and his M.D. from Stanford University in 
1991.  He is continuing his training in the Division of Neurosurgery at
Stanford University.

    DATE: May 10, 1991 (Friday)

    PLACE: Room N721, School of Nursing, U.C. San Francisco

    TIME: 7:30 p.m.-8:00 p.m. Social; 8:00 p.m.-10:00 p.m. Talk and Discussion

ABSTRACT: Data from intracranial recordings of the electrical activity of
single neurons and local EEG from the human amygdala and hippocampus will be
presented within the framework of a parallel processing network.  It is
postulated that the activity represents construction of cognitive
gestalts based on shared features between current stimulus and prior
experience.  A more general discussion will center on the methodology and
experimental philosophy for studying the mind/brain phenomenon.

REFERENCES: 1. Heit, G., Smith, M.E., and E. Halgren; Neural encoding of
             individual words and faces by human hippocampus and amygdala.
             Nature. 333, 773-775 (1988).
            2. Heit, G., Smith, M.E., and E. Halgren; Activity in the 
             human medial temporal lobe during recognition memory.
             Brain, 113, 1093-1112 (1990)

REGISTRATION: If you are attending for the first time, please pre-register
by calling Kainila Rajan, Ph.D. at 415-753-8647/8648, or Jean Burns, Ph.D.,
at (415) 481-7507.

DIRECTIONS: The closest parking is the UCSF public garage at 500
Parnassus Avenue. Special $1 parking rate if you get your ticket
validated at the meeting.  After parking, come up to street level on
Parnassus, cross the street to enter the Medical Sciences Building (513
Parnassus). Go through the double doors and follow the first corridor to
your right all the way to the School of Nursing Building.  Take the
elevator to the seventh floor and turn left.

------------------------------

Subject: IJCNN-91-SEATTLE call for volunteers
From:    worth@park.bu.edu (Andrew J. Worth)
Date:    Fri, 26 Apr 91 12:37:23 -0400


[[ Editor's Note: Especially for starving students, I recommend
volunteering for IJCNN.  If last year's tradition continues, you get a
nifty T-shirt, enough free time, and the chance to meet many (potential)
colleagues easily. (Hi to the San Diego volunteers whom I met!) I especially
recommend the stuffing session at the beginning of the conference for a
bizarre exercise in assembly-line manual labour. -PM ]]

=---------------------------------------------------------------------------
                  IJCNN-91-Seattle  Call for Volunteers
                            July 8-12th, 1991
                         Seattle, Washington, USA.
=---------------------------------------------------------------------------

The International Joint Conference on Neural Networks (IJCNN-91-SEATTLE)
has volunteer positions available.

If you or anyone you know would like to exchange admittance to the
conference for working as a volunteer, please respond directly to me at
the e-mail address below.

In the past, volunteers have given approximately 20 hours of labor
(spread out over the entire conference) to receive:

    o admittance to the conference
    o a full set of proceedings
    o attendance to a limited number of tutorials (while working)

The exact benefits are still being worked out.  Volunteer positions
include helping at:

    o Stuffing Conference Proceedings
    o Poster Sessions
    o Technical Sessions
    o Evening Plenary Sessions
    o Social Events
    o OPTIONAL duty: Tutorials

If you are interested in volunteering, please respond directly to me
with the following information:

    o Electronic Mail Address

    o Last Name, First Name
    o Address
    o Country
    o Phone number

    o Volunteer Position Preference

Positions will be filled on a first-commit, first-served basis.  There
will be no funding available for volunteers' travel and lodging
expenses.

PLEASE RESPOND TO:

           worth@park.bu.edu

Thank you,
           Andy.
=-------------------------------------------------------------------------
 Andrew J. Worth                        Cognitive & Neural Systems Dept.
 IJCNN-91 Volunteer Chair               Boston University
 worth@park.bu.edu                      111 Cummington Street, Rm 244
 (617) 353-6741                         Boston, MA 02215
=-------------------------------------------------------------------------


------------------------------

End of Neuron Digest [Volume 7 Issue 24]
****************************************