[comp.ai.neural-nets] Back-propagation

guansy@cs.tamu.edu (Sheng-Yih Guan) (08/01/90)

In article <6985@helios.TAMU.EDU> vu2jok@cs.tamu.edu (Jogen K Pathak) writes:
>We are encountering problems while training the different paradigms,
>especially the Back-Propagation paradigm.  The training is very time
>consuming and tedious.  Can anyone help us choose training parameter
>values that will shorten the training sessions?  We are working on
>pattern classification problems of moderate size, e.g. 100 input
>attributes.
>
>Any literature references also will be greatly appreciated.
>
>Jogen and Rajan.

In Fahlman and Lebiere's paper, "The Cascade-Correlation Learning
Architecture", they analyze the reasons why backprop learning is so
slow and identify two major problems:
	1. the step-size problem, and 
	2. the moving target problem.
In their references, there are several other articles on how to improve the
convergence of Back-Propagation.

Fahlman and Lebiere's paper is available via anonymous ftp.  The procedure
is as follows:
  >ftp cheops.cis.ohio-state.edu
  >cd /pub/neuroprose
  >bin
  >get fahlman.cascor-tr.ps.Z
  >quit

Hope this is helpful.

 _       _   _                            ___________          
|  \    /_| / /    Visualization Lab     /____  ____/         
 \  \  //  / /   Computer Science Dept       / /  _   _    _   _   
  | | //  / /     Texas A&M University      / /  / | | \  / | | | ||
  | |//  / / College Station, TX 77843-3112/ /  / /| |  \//|| | | ||
  /  /  / /____    Tel: (409)845-0531     / /  / -|| | |\/ || | !_!| 
 !__/  /______/ stanley@visual1.tamu.edu /_/  /_/ || !_!   || !____!

ins_atge@jhunix.HCF.JHU.EDU (Thomas G Edwards) (08/01/90)

In article <7010@helios.TAMU.EDU> guansy@cs.tamu.edu (Sheng-Yih Guan) writes:
>
>In article <6985@helios.TAMU.EDU> vu2jok@cs.tamu.edu (Jogen K Pathak) writes:
>>We are encountering problems while training the different paradigms,
>>especially the Back-Propagation paradigm.  The training is very time
>>consuming and tedious.  Can anyone help us choose training parameter
>>values that will shorten the training sessions?  We are working on
>>pattern classification problems of moderate size, e.g. 100 input
>>attributes.

>In Fahlman and Lebiere's paper, "The Cascade-Correlation Learning
>Architecture", they analyze the reasons why backprop learning is so
>slow and identify two major problems:
>	1. the step-size problem, and 
>	2. the moving target problem.

Fahlman and Lebiere's Cascade-Correlation learning is a definite
improvement over conventional backprop methods.  By building up the
network layer by layer, they reduce the backprop calculation to
dealing with a single weight layer at a time, which greatly speeds
up the process, as well as eliminating the moving target problem.
I find this algorithm very pleasing, as it explains how a multi-layered
neural system can be built up quickly.  Their TR has a wonderful example
of how C.C. learned the two-spirals problem.  The first layer splits the
input space in half, the second forms a few big receptive fields,
and each layer after that forms receptive fields which come closer and closer
to exactly partitioning the input space into the two separate spirals.
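
To give a feel for the candidate training: each new hidden unit is
trained to maximize the correlation between its output and the
remaining output error before being frozen into the net.  A rough
sketch of that score in C (my own names and code, not theirs):

    /* Sketch of the correlation score a candidate unit tries to maximize:
     *    S = sum over outputs o of | sum over patterns p of
     *        (V[p] - Vbar) * (E[p][o] - Ebar[o]) |
     * where V is the candidate's output and E the residual output error. */

    #include <math.h>

    double candidate_score(int npat, int nout, double *v, double **err)
    {
        double vbar = 0.0, s = 0.0;
        int p, o;

        for (p = 0; p < npat; p++)          /* mean candidate activation   */
            vbar += v[p];
        vbar /= npat;

        for (o = 0; o < nout; o++) {
            double ebar = 0.0, cov = 0.0;

            for (p = 0; p < npat; p++)      /* mean residual error, unit o */
                ebar += err[p][o];
            ebar /= npat;

            for (p = 0; p < npat; p++)      /* covariance with the error   */
                cov += (v[p] - vbar) * (err[p][o] - ebar);

            s += fabs(cov);
        }
        return s;
    }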

The single-weight-layer learning is done with Quickprop (which
could be used in a multi-layer network by itself).  This method
uses second-order information about the gradient to determine the
next step.  (In my experience, Cascade-Correlation performs much
worse with plain perceptron learning than with Quickprop.)
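
For those who haven't seen it, the per-weight Quickprop step looks
roughly like this (a sketch of the basic rule only, with my own
names; Fahlman's full version has a few more cases, e.g. an added
gradient term and weight decay):

    /* Sketch of the basic Quickprop step for a single weight.
     * slope = dE/dw this epoch, prev_slope = dE/dw last epoch,
     * prev_step = the weight change made last epoch.                    */

    #include <math.h>

    #define EPSILON 0.55     /* learning rate for the plain gradient step */
    #define MU      1.75     /* maximum growth factor for the jump        */

    double quickprop_step(double slope, double prev_slope, double prev_step)
    {
        double step;

        if (prev_step == 0.0 || prev_slope == slope)
            return -EPSILON * slope;     /* fall back to gradient descent */

        /* jump to the minimum of the parabola through the two slopes */
        step = prev_step * slope / (prev_slope - slope);

        /* never grow the step by more than MU over the previous one */
        if (fabs(step) > MU * fabs(prev_step))
            step = (step > 0.0) ? MU * fabs(prev_step)
                                : -MU * fabs(prev_step);
        return step;
    }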

However, there is another ftpable answer.  Conjugate-gradient methods
are well known for their ability to determine function minima in
numerical analysis.  Check out the chapter in _Numerical_Recipes_
on function minimization for an explanation and comparison with other
methods, such as steepest-descent.  A conjugate gradient program called OPT is
available by anonymous ftp from cse.ogc.edu in the /pub/nnvowels
directory.  
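
The core of conjugate gradient is easy to state: after each line
minimization you pick the new search direction from the new gradient
and the old direction, rather than just following the new gradient.
A sketch of the Polak-Ribiere direction update (my own names; the
line search and restart details are what _Numerical_Recipes_ covers):

    /* Sketch of the Polak-Ribiere direction update at the heart of
     * conjugate gradient.  g_new/g_old are the gradients at the new and
     * previous points; dir holds the old direction on entry and the new
     * direction on exit.  The line minimization along dir is left out.  */

    void cg_new_direction(int n, double *g_new, double *g_old, double *dir)
    {
        double num = 0.0, den = 0.0, beta;
        int i;

        for (i = 0; i < n; i++) {
            num += g_new[i] * (g_new[i] - g_old[i]);
            den += g_old[i] * g_old[i];
        }
        beta = (den > 0.0) ? num / den : 0.0;
        if (beta < 0.0)
            beta = 0.0;               /* restart with steepest descent */

        for (i = 0; i < n; i++)
            dir[i] = -g_new[i] + beta * dir[i];
    }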

I have used this program to develop a threat-determination network
using infrared temporal intensity data (128 or 256 inputs, 8-32
hidden units, 2 outputs; it takes about 1 minute to learn 20
exemplars, but I am running on a Convex).

I would like to see a comparison (over many runs, since, as we all
know, backpropagation is sensitive to initial conditions) of OPT
vs. Cascade-Correlation with Quickprop learning.  In fact, I might
just try this myself.

-Thomas Edwards

mcdonald@undeed.uucp (Bruce J McDonald) (11/06/90)

Hello

I am designing a reconfigurable neural-net chip which is arranged
as several layers, each containing a number of processing elements
( nodes ).  The NN is designed for recall operation only, as the
logic needed to support comprehensive training would make each node
too large and would also mean that each node would need its own
state machine to control it.  I have instead decided that each
input weight for each node in the NN can be set externally using a
single serial data channel which is multiplexed by an overall
controller to each weight in each node.  To keep things simple, the
input and output of each node are one bit wide, and this technique
allows for a very compact node design.  All training and weight
adjustment is done off-chip.
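
For recall, each node then effectively computes something like the
following (a sketch of my off-chip model of a node, not the
gate-level design; names are mine):

    /* Sketch of what one node computes during recall: a weighted sum of
     * its one-bit inputs, hard-limited to a one-bit output.  The weights
     * and threshold are whatever was loaded over the serial channel.    */

    int node_recall(int nin, int *in, int *weight, int threshold)
    {
        int sum = 0, i;

        for (i = 0; i < nin; i++)
            if (in[i])                /* inputs are 0 or 1 */
                sum += weight[i];

        return (sum >= threshold) ? 1 : 0;
    }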

The key to this approach is a generic ( remaining within the limits
of digital data representation - to some, a crippling limitation )
NN simulator and trainer.  An arbitrarily sized NN can be specified,
together with any number of training operations which detail the
training data, number of iterations, etc.  The programme is up
and working, but I find that convergence of the weights is often
hard to achieve, as it requires lots of fine tuning of the training
data.  I suspect that my implementation of the back-propagation
training method is somewhat flawed, especially the derivative of
the threshold function ( T'(net) ).
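
For reference, this is roughly how I understand the logistic
threshold and its derivative are supposed to go (names are mine);
please tell me if this is where I have gone wrong:

    /* The logistic threshold and its derivative:
     *   T(net)  = 1 / (1 + exp(-net))
     *   T'(net) = T(net) * (1 - T(net))
     * so the derivative can be computed from the unit's output alone.   */

    #include <math.h>

    double threshold(double net)
    {
        return 1.0 / (1.0 + exp(-net));
    }

    double threshold_prime(double out)        /* pass T(net), not net */
    {
        return out * (1.0 - out);
    }

    /* delta for an output unit j:  (target_j - out_j) * T'(net_j) */
    double output_delta(double target, double out)
    {
        return (target - out) * threshold_prime(out);
    }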

Could anyone out there please mail me some examples of back-
propagation source code ( C preferably ), as I am sure that this
is a small problem.  Any other correspondence would be most
appreciated ( helpful or not ).

Thanx 

( Let's help make the world a smaller (and healthier) place )