[comp.ai.neural-nets] Neuron Digest V4 #29

neuron-request@HPLABS.HP.COM (Neuron-Digest Moderator Peter Marvit) (12/02/88)

Neuron Digest   Thursday,  1 Dec 1988
                Volume 4 : Issue 29

Today's Topics:
			       Administrivia
                       neural net job opening at MCC
                 Re: Learning arbitrary transfer functions
                        Radial basis functions etc.
                           Journal of Complexity
         car pooling from Denver Airport to NIPS conference hotel
                           NIPS Speech Workshop
               Flaming on Neural Nets and Transfer Functions
                RE: advantages of NNs over symbolic systems


Send submissions, questions, address maintenance and requests for old issues to
"neuron-request@hplabs.hp.com" or "{any backbone,uunet}!hplabs!neuron-request"

------------------------------------------------------------

Subject: Administrivia
From: Your local Moderator
Date: Thu, 1 Dec 88 13:00:00 PST

[[ Editor's Preface:  I've received several comments resulting from the
recent discussion of public conduct, information referencing and my
editorial policy (such as it is).  I'll report on those in the next issue.
I'm still open to suggestions and recommendations.  As always, though, I
look for submissions -- especially those with good technical foundations.

On another note, the Neuron Digest has grown to over 650 addresses, many of
which are redistribution points, and a large number of which are
international sites.  Please, if you are going to move or your site has mail
difficulties, let me know.  I get a large number of (frustrating)
undeliverable returns and am often forced to delete addresses.

One of the aims of this Digest is to offer information and provide conduits
for people working in this field to communicate effectively.  To this end,
I'd like to start publishing short "biographies" -- especially of groups
around the world.  In my "welcome" message, I usually ask about the new
subscriber's interests.  I now invite YOU (the entire readership) to
introduce yourselves to the Digest.  Who are YOU and what are YOU doing? I
will publish responses appropriately in future issues.   -PM ]]


------------------------------

Subject: neural net job opening at MCC
From:    keeler@mcc.com (Jim Keeler)
Date:    Tue, 22 Nov 88 19:04:48 -0600 



WANTED: CONNECTIONIST/NEURAL NET RESEARCHERS

MCC (Microelectronics and Computer Technology Corporation, Austin Texas) is
looking for research scientists to join our newly formed neural network
research team.  We are looking for researchers with strong theoretical
skills in Physics, Electrical Engineering, or Computer Science (Ph.D. level
or above preferred).  The research will focus on (non-military),
fundamental questions about neural networks including

 -Scaling and improvement of existing algorithms

 -Development of new learning algorithms
  
 -Temporal pattern recognition and processing

 -Reverse engineering of biological networks

 -Optical neural network architectures


MCC offers competitive salaries and a very stimulating, academic-like
research environment.

Contact Jim Keeler at jdk.mcc.com or Haran Boral at haran.mcc.com, or
contact Jim Keeler at the NIPS conference in Denver.


------------------------------

Subject: Re: Learning arbitrary transfer functions
From:    djb@flash.bellcore.com (David J Burr)
Date:    Wed, 23 Nov 88 00:04:32 -0500 

Based on arguments outlined in the report "Knowledge Representation in
Connectionist Networks", S. J. Hanson and D. J. Burr, Bellcore Tech.
Report, February 9, 1987, a layered neural net with "two" hidden layers
should be able to approximate any continuous function.  We argue that a
continuous function is approximated by superposing individual activation
"hills" in the (input) space.  Negative hills or valleys can also be
formed.

In a network with conventional linear summation neurons the hills are
formed in the "first" hidden layer.  However, when special quadratic
"spherical" neurons are used, they individually form activation hills and
the first hidden layer is not needed.  These hills are superposed (OR-ed)
with linear summation neurons in the outer hidden layer.  Superposition can
be quasi-linear depending on the values of the output layer weights.
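
As a rough illustration of the superposition argument, here is a toy
Python/numpy sketch of my own (not the construction or training procedure
from our report): an 8x8 grid of "spherical" units, each producing a
Gaussian-shaped activation hill over the unit square, feeds a single
linear-summation output unit whose weights are fit by least squares
(backprop on the output layer alone would converge to the same solution).

    import numpy as np

    def target(x, y):                       # any continuous function will do
        return np.sin(3 * x) * np.cos(2 * y)

    # 8 x 8 grid of "spherical" hidden units over the unit square.
    centers = np.array([(i, j) for i in np.linspace(0, 1, 8)
                               for j in np.linspace(0, 1, 8)])
    width = 0.02                            # controls how broad each hill is

    def spherical_layer(pts):
        d2 = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / width)          # one activation hill per unit

    # Output-layer weights chosen by least squares.
    rng = np.random.default_rng(1)
    train_pts = rng.uniform(0, 1, (400, 2))
    H = spherical_layer(train_pts)
    w, *_ = np.linalg.lstsq(H, target(train_pts[:, 0], train_pts[:, 1]),
                            rcond=None)

    test_pts = rng.uniform(0, 1, (100, 2))
    approx = spherical_layer(test_pts) @ w
    print("max |error| on held-out points:",
          np.max(np.abs(approx - target(test_pts[:, 0], test_pts[:, 1]))))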

We actually demonstrated a one-hidden-layer network learning a noisy 2-D
Gaussian function in a later paper: "Minkowski-r Back Propagation: Learning
in Connectionist Models with Non-Euclidean Error Metrics", S. J. Hanson and
D. J. Burr, in D. Anderson, Neural Information Processing Systems: Natural
and Synthetic, AIP, 1988.

D. J. Burr, 2B-397
Bellcore
445 South St.
Morristown, NJ 07960
djb@bellcore.com

------------------------------

Subject: Radial basis functions etc.
From:    "M. Niranjan" <niranjan%digsys.engineering.cambridge.ac.uk@NSS.Cs.Ucl.AC.UK>
Date:    Wed, 23 Nov 88 12:42:17 +0000 


[[ Editor's Note: The original announcement of Niranjan's paper was in
Vol 4 #17. -PM ]]

Some recent comments on RBFs that you might find interesting!

niranjan

============================== ONE =====================================
Date:        Wed, 16 Nov 88 09:41:13 +0200
From: Dario Ringach <dario@earn.techunix>
To: ajr <ajr%uk.ac.cambridge.engineering.digsys@uk.ac.ucl.cs.nss>
Subject:     Comments on TR.25
 
Thanks a lot for the papers!
 
I'd like to share a few thoughts on TR.25 "Generalising the Nodes of the
Error Propagation Network".  What are the advantages of choosing radial
basis functions (or Gaussian nodes) in *general* discrimination tasks?  It
seems clear to me that the results presented in Table 1 are due to the
fact that the spectral distribution of steady state vowels can be closely
represented by normal/radial distributions.  If I have no a-priori
information about the distribution of the classes then how can I know which
kind of nodes will perform better?  I think that in this case the best we
can do is to look at the combinatorial problem of how many partitions of
the n-dimensional Euclidean space can be obtained using N (proposed shape)
boundaries.  This is closely related to obtaining the Vapnik-Chervonenkis
Dimension of the boundary class.  In the case of n-dimensional hyperplanes
and hyperspheres, both have VC-dimension n+1, so I think there is really no
difference in using hyperplanes or hyperspheres in *general* discrimination
problems.  Don't you agree?
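
To make the claim concrete, here is a small brute-force check of my own
(toy Python code, assuming numpy and scipy are available): lines in the
plane (n=2) can shatter 3 points in general position, but they fail on a
4-point XOR-style configuration (in fact no 4-point set can be shattered
by half-planes in the plane), consistent with a VC-dimension of n+1 = 3.

    import numpy as np
    from itertools import product
    from scipy.optimize import linprog

    def linearly_separable(points, labels):
        """LP feasibility: is there (w, b) with label_i*(w.x_i + b) >= 1?"""
        X = np.asarray(points, float)
        y = np.asarray(labels, float)
        A_ub = -y[:, None] * np.hstack([X, np.ones((len(X), 1))])
        res = linprog(c=np.zeros(X.shape[1] + 1), A_ub=A_ub,
                      b_ub=-np.ones(len(X)),
                      bounds=[(None, None)] * (X.shape[1] + 1),
                      method="highs")
        return res.status == 0              # 0 means a separator was found

    def shattered_by_lines(points):
        return all(linearly_separable(points, labs)
                   for labs in product([-1.0, 1.0], repeat=len(points)))

    three = [(0, 0), (1, 0), (0, 1)]         # non-collinear triple
    four = [(0, 0), (1, 1), (1, 0), (0, 1)]  # XOR-style quadruple

    print("3 points shattered:", shattered_by_lines(three))  # True
    print("4 points shattered:", shattered_by_lines(four))   # False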
 
Thanks again for the papers!
Dario
 
============================== TWO =====================================

From: M. Niranjan <niranjan>
Date: Mon, 21 Nov 88 14:03:16 GMT
To: dario@bitnet.techunix
Subject: RBF etc


With RBFs of the Gaussian type, the class conditional density function is
approximated by a mixture of multiple Gaussians. But the parameters of the
mixture are estimated to maximise discrimination rather than to model
the individual probability densities.
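
As a toy Python sketch of what I mean (my own illustration, not any
particular published network; for brevity only the output weights are
adapted here, though the centres and widths could be trained the same way),
take a fixed set of Gaussian nodes and fit the output weights by gradient
descent on a cross-entropy classification error, so the "mixture"
parameters serve discrimination rather than density estimation:

    import numpy as np

    rng = np.random.default_rng(0)

    # Two overlapping 2-D classes (synthetic data for the sketch).
    x0 = rng.normal([-1.0, 0.0], 0.7, (100, 2))
    x1 = rng.normal([+1.0, 0.0], 0.7, (100, 2))
    X = np.vstack([x0, x1])
    y = np.r_[np.zeros(100), np.ones(100)]

    centers = X[rng.choice(len(X), 10, replace=False)]  # fixed Gaussian nodes

    def rbf(pts):
        d2 = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2)                  # unit-width Gaussian hills

    Phi = rbf(X)
    w, b = np.zeros(10), 0.0
    for _ in range(2000):                   # gradient descent on cross-entropy
        p = 1.0 / (1.0 + np.exp(-(Phi @ w + b)))   # estimate of P(class 1 | x)
        grad = p - y                        # discriminative error signal
        w -= 0.1 * Phi.T @ grad / len(X)
        b -= 0.1 * grad.mean()

    print("training accuracy:", float(((p > 0.5) == y).mean()))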

> If I have no a-priori information about the distribution of the classes
> then how can I know which kind of nodes will perform better?

There is no way other than by a set of experiments. In small scale
problems, we can probably plot cross sections of the feature space, or even
projections of it on a linear discriminant plane, and get a rough idea.

> problem of how many partitions of the n-dimensional Euclidean space
> can be obtained using N (proposed shape) boundaries.

It is not how many different partitions; I think our problem in pattern
classification is dealing with the breakpoints of the class boundary. It is
this capability that is the power of MLPs (and RBFs). In a two-class
problem, we still partition the input space into two using N boundary
segments (or splines), with N-1 break-points.

What I like about RBFs is that you can have a probabilistic interpretation.
With standard MLPs this is not very obvious, and what happens is more like
functional interpolation.

> both have VC-dimension n+1, so I think there is really no difference
 
I don't know what VC-dimension is. Any reference, please?

Best wishes
niranjan

============================ THREE =======================================
 
Date:        Tue, 22 Nov 88 08:22:53 +0200
From: Dario Ringach <dario@EARN.TECHUNIX>
To: M. Niranjan <niranjan@UK.AC.CAM.ENG.DSL>
Subject:     Re: RBF etc

Thanks for your Re!
 
[some stuff deleted]
 
> > I think that in this case the best we can do is to look at the combinatorial
> > problem of how many partitions of the n-dimensional Euclidean space
> > can be obtained using N (proposed shape) boundaries.
>
> It is not how many different partitions; I think our problem in pattern
> classification is dealing with the breakpoints of the class boundary. It is
> this capability that is the power of MLPs (and RBFs). In a two-class
> problem, we still partition the input space into two using N boundary
> segments (or splines), with N-1 break-points.
 
Sure, I agree.  But if you address the question of how many hidden units of
a given type you need to classify the input vector into one of N distinct
classes, and consider it a rough measure of the complexity of the boundary
class proposed for the units, then the problem seems to be one of
partitioning the input space.  Note that I was not concerned with the
nature of the class shapes in real-world problems; in that case I must
agree with you that the issue of breakpoints of the class boundary becomes
of real importance.
 
[...]
>
> I don't know what VC-dimension is. Any reference, please?
>
 
An earlier draft is "Classifying Learnable Geometric Concepts with the
Vapnik-Chervonenkis Dimension" by D. Haussler et al, at FOCS '86, pp
273-282.  But if you don't know what Valiant's learnability model is,
take a look at "A Theory of the Learnable" by L. Valiant, CACM 27(11),
1984, pp 1134-42.  The original article by Vapnik and Chervonenkis is "On
the Uniform Convergence of Relative Frequencies of Events to their
 Probabilities", Th. Prob. and its Appl., 16(2), 1971, pp 264-80.  More
up-to-date papers dealing with the VC-dimension can be found at the
Proc. of the first Workshop on Computational Learning Theory, COLT '88,
held at MIT last June.
 
--Dario.

=========================== THE END =====================================


------------------------------

Subject: Journal of Complexity
From:    hirsch%math.Berkeley.EDU@cartan.berkeley.edu
Date:    Wed, 23 Nov 88 11:37:07 -0800 

The issue on neural nets is Volume 4, Number 3, September 1988.
Address is
                1 East First St
                Duluth MN 55802.



                        Professor Morris W. Hirsch
                        Department of Mathematics
                        University of California
                        Berkeley, CA 94720 USA

                        Phone:      (415) 642-4318
                        (messages)  (415) 642-5026     

                        e-mail: hirsch@math.berkeley.edu

------------------------------

Subject: car pooling from Denver Airport to NIPS conference hotel
From:    john moody <moody-john@YALE.ARPA>
Date:    Fri, 25 Nov 88 17:01:03 -0500 


I'm arriving at Denver Airport at 10:15 PM Monday night (after the
last shuttle leaves the airport for the hotel) and will probably have
to rent a Hertz car to get to the hotel.

Would anyone out there arriving Monday night like to car pool with me
and possibly split the cost of a one-day car rental? (Starving students
are welcome to tag along for free.)

If interested, please reply ASAP.

--John Moody
(203)432-6493


------------------------------

Subject: NIPS Speech Workshop
From:    John.Hampshire@SPEECH2.CS.CMU.EDU
Date:    Mon, 28 Nov 88 14:02:50 -0500 

This is a preliminary outline for those planning to attend the speech workshop
following NIPS 88 in Keystone, CO.  For answers to questions/details, please 
contact Alex Waibel.

Speech Workshop
------------------

----Dec. 1, eve:   Overview.  All Groups meet. ---------------------------

----Dec. 2, Neural Nets (NN) and Hidden Markov Models (HMM)---------------

7:30 - 9:30     Introduction.  Short Informal Presentations (15 mins each).

                Connectionist Speech. The HMM/NN Debate. (Alex Waibel, CMU)
                State of the Art in HMMs (Rich Schwartz, BBN)
                Links between HMMs and NNs (Herve Bourlard, ICSI)
                Commonalities, Differences, HMMs, NNs. (John Bridle, RSRE)
                NNs and HMMs (Richard Lippmann, Lincoln Labs)
                Brief Questions and Answers.
                
4:30 - 6:30     Discussion.  NNs, HMMs.
                Strengths and Weaknesses, Commonalities.
                Comparisons.  Performance, Computational Needs, Extensions.
                Hybrid Approaches.

Evening:        Highlights.


----Dec. 3, Directions for Connectionist Speech Understanding.-----------

7:30 - 9:30     Introduction.
                Phoneme Recognition.  Word Recognition.  Syntax.
                Semantics.  Pragmatics.  Integral System Design.
                Learning Algorithms.  Computational Needs/ Limitations.
                Large Scale Neural System Design. Modularity.
                Instruction. Heuristic Knowledge.
                
4:30 - 6:30     Discussion.  Extensions.

Evening:        Highlights.  Summary.


------------------------------

Subject: Flaming on Neural Nets and Transfer Functions
From:    alexis%yummy@GATEWAY.MITRE.ORG
Organization: The Internet
Date:    18 Nov 88 15:18:37 +0000 

I have to admit some surprise that so many people got this "wrong."  Our
experience is that neural nets of the PDP/backprop variety are at their
*BEST* with continuous mappings.  If you just want classification, you
might as well go with nearest-neighbor algorithms (or, if you want the same
thing in a net, try Nestor's Coulombic stuff).  If you can't learn
x=>sin(x) in a couple of minutes, you've done something wrong and should
check your code (I'm assuming you thought to scale sin(x) to [0,1]).
Actually, requiring a PDP net to output 1's and 0's means your weights must
be quite large, which takes a lot of time and puts you way out on the tails
of the sigmoids, where learning is slow and painful.  What I do for fun (?)
these days is try to make nets output sin(t) {where t is time} and other
waveforms with static or "seed" wave inputs.

For those who like math, G. Cybenko (currently at U. Illinois and, starting
12/10/88, at Tufts) has a very good paper "Approximation by Superpositions
of a Sigmoidal Function" where he gives an existence proof that you can
uniformly approximate any continuous function with support in the unit
hypercube.  This means an NN with one hidden layer (1 up from a perceptron).
Certainly more layers generally give more compact and robust codings ...
but the theory is *finally* coming together.

    Alexis Wieland    ....    alexis%yummy@gateway.mitre.org

------------------------------

Subject: RE: advantages of NNs over symbolic systems
From:    kortge@psych.Stanford.EDU (Chris Kortge)
Date:    Wed, 23 Nov 88 14:41:31 -0800 

>From: bradb@ai.toronto.edu (Brad Brown)
>   Neural network-based systems have advantages over symbolic
>   systems for the following reasons.
> [...]
>   (2)  Neural nets can adapt to changes in their environment.
>        For instance, a financial expert system implemented as a
>        NN could use new information to modify its performance
>        over time to reflect changing market conditions.
>        Symbolic systems are usually either static or require
>        re-training on a substantial fraction of the dataset to
>        adapt to new data.
 
I'm a Connectionist, but I don't think this advantage typically holds.  The
powerful existing learning procedures, those which can learn distributed
representations (e.g. back-prop), actually require that the environment
(i.e., the input distribution) remain _fixed_.  If, after learning, you
change the environment a little bit, you can't just train on the new
inputs; rather, you must retrain on the entire distribution.  Otherwise,
the NN happily wipes out old knowledge in order to learn the new.  Roger
Ratcliffe at Northwestern has a new paper (unpublished as yet, I believe)
on this problem with regard to modeling recognition memory.  Also, Stephen
Grossberg pointed the problem out long ago, and his ART networks don't
suffer from it (but they can't learn distributed representations, either).
His system has a learned attention mechanism, which gates learning such
that it only occurs in response to novel inputs.  Back-prop networks don't
have such a mechanism, so it's only natural that they don't treat new
information any differently from old.
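
A quick way to see the effect is the following toy Python/numpy sketch of
my own (not Ratcliffe's or Grossberg's setup): train a small backprop net
on inputs drawn from one region, then continue training only on inputs
from a new region, and the error on the old region typically climbs back
up because the shared weights get overwritten.

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(0, 1, (1, 20)); b1 = np.zeros(20)
    W2 = rng.normal(0, 1, (20, 1)); b2 = np.zeros(1)

    def forward(x):
        h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))
        return h, h @ W2 + b2               # sigmoid hidden, linear output

    def train(x, y, epochs=3000, lr=0.05):
        global W1, b1, W2, b2
        for _ in range(epochs):
            h, out = forward(x)
            err = out - y                   # squared-error gradient at output
            d_h = (err @ W2.T) * h * (1 - h)
            W2 -= lr * (h.T @ err) / len(x); b2 -= lr * err.mean(0)
            W1 -= lr * (x.T @ d_h) / len(x); b1 -= lr * d_h.mean(0)

    def mse(x, y):
        return float(np.mean((forward(x)[1] - y) ** 2))

    # "Old" environment: x in [0,1].  "New" environment: x in [2,3].
    x_old = rng.uniform(0, 1, (200, 1)); y_old = np.sin(np.pi * x_old)
    x_new = rng.uniform(2, 3, (200, 1)); y_new = np.cos(np.pi * x_new)

    train(x_old, y_old)
    print("old-task error after learning old task:", mse(x_old, y_old))
    train(x_new, y_new)                     # now train on the NEW inputs only
    print("old-task error after learning new task:", mse(x_old, y_old))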

Chris Kortge
kortge@psych.stanford.edu


------------------------------

End of Neurons Digest
*********************