guansy@cs.tamu.edu (Sheng-Yih Guan) (08/01/90)
In article <6985@helios.TAMU.EDU> vu2jok@cs.tamu.edu (Jogen K Pathak) writes:
>We are encountering problems while training the different paradigms, especially
>the Back-Propagation paradigm. The training is very time consuming and tedious.
>Can anyone help us choose training parameter values that reduce the number of
>training sessions? We are working on pattern classification of moderate size,
>e.g. 100 input attributes.
>
>Any literature references will also be greatly appreciated.
>
>Jogen and Rajan.

In Fahlman and Lebiere's paper, "The Cascade-Correlation Learning Architecture,"
they analyze the reasons why backprop learning is so slow and identify two major
problems:
 1. the step-size problem, and
 2. the moving target problem.
In their references there are several other articles on how to improve the
convergence of Back-Propagation. (A short sketch illustrating the step-size
problem is appended at the end of this post.)

Fahlman and Lebiere's paper is available via anonymous ftp. The procedure is as
follows:

>ftp cheops.cis.ohio-state.edu
>cd /pub/neuroprose
>bin
>get fahlman.cascor-tr.ps.Z
>quit

Hope this is helpful.

Sheng-Yih Guan, Visualization Lab, Computer Science Dept.
Texas A&M University, College Station, TX 77843-3112
Tel: (409) 845-0531                stanley@visual1.tamu.edu
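The step-size problem in a nutshell: plain backprop moves every weight by a
fixed fraction (the learning rate epsilon) of its error derivative, and that
fraction must be tuned by hand - too small and learning crawls, too large and
the weights oscillate or diverge. A minimal sketch, with purely illustrative
names (this is not code from Fahlman and Lebiere's report):

void gradient_step(double *w, const double *dE_dw, int n, double epsilon)
{
    /* Plain gradient descent: each weight moves by a fixed fraction
     * (epsilon) of its error derivative.  Choosing epsilon well is
     * exactly the step-size problem. */
    int i;

    for (i = 0; i < n; i++)
        w[i] -= epsilon * dE_dw[i];
}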
ins_atge@jhunix.HCF.JHU.EDU (Thomas G Edwards) (08/01/90)
In article <7010@helios.TAMU.EDU> guansy@cs.tamu.edu (Sheng-Yih Guan) writes:
>In article <6985@helios.TAMU.EDU> vu2jok@cs.tamu.edu (Jogen K Pathak) writes:
>>We are encountering problems while training the different paradigms, especially
>>the Back-Propagation paradigm. The training is very time consuming and tedious.
>>Can anyone help us choose training parameter values that reduce the number of
>>training sessions? We are working on pattern classification of moderate size,
>>e.g. 100 input attributes.
>
>In Fahlman and Lebiere's paper, "The Cascade-Correlation Learning Architecture,"
>they analyze the reasons why backprop learning is so slow and identify two major
>problems:
> 1. the step-size problem, and
> 2. the moving target problem.

Fahlman and Lebiere's Cascade-Correlation learning is a definite improvement
over conventional backprop methods. By building the network up layer by layer,
they reduce the backprop calculation to dealing with a single weight layer at a
time, which speeds up the process enormously as well as eliminating the moving
target problem. I find this algorithm very pleasing, as it shows how a
multi-layered neural system can be built up quickly.

Their TR has a wonderful example of how Cascade-Correlation learned the
two-spirals problem. The first layer splits the input space in half, the second
forms a few big receptive fields, and each layer after that forms receptive
fields which come closer and closer to exactly partitioning the input space into
the two separate spirals.

The single-weight-layer learning is done with Quickprop (which could also be
used in a multi-layer network by itself). This method uses second-order
information about the gradient to determine the next step. (In my experience,
Cascade-Correlation performs much worse with perceptron learning than with
Quickprop.) A rough sketch of the Quickprop update rule is appended at the end
of this post.

However, there is another ftpable answer. Conjugate-gradient methods are well
known in numerical analysis for their ability to find function minima. Check out
the chapter in _Numerical_Recipes_ on function minimization for an explanation
and a comparison with other methods such as steepest descent. A conjugate
gradient program called OPT is available by anonymous ftp from cse.ogc.edu in
the /pub/nnvowels directory.

I have used this program to develop a threat-determination network using
infrared temporal intensity data (128 or 256 inputs, 8-32 hidden units, 2
outputs...takes about 1 minute to learn 20 exemplars, but I am running on a
Convex).

I would like to see a comparison of OPT vs. Cascade-Correlation with Quickprop
learning (over many runs, since, as we all know, backpropagation is sensitive to
initial conditions). In fact, I might just try this myself.

-Thomas Edwards
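As promised above, a rough sketch of the core Quickprop update for a single
weight, assuming Fahlman's 1988 formulation: the previous and current error
derivatives define a parabola, and the weight jumps toward its minimum, with the
growth of any step capped by a constant MU. The names are mine, and the real
algorithm also mixes in a plain gradient term, weight decay, and other
safeguards that are omitted here.

#include <math.h>

#define MU 1.75   /* maximum growth factor; Fahlman suggests about 1.75 */

/* S       : current error derivative dE/dw for this weight
 * prev_S  : derivative from the previous epoch
 * prev_dw : previous weight change
 * epsilon : learning rate used when no history is available
 */
double quickprop_step(double S, double prev_S, double prev_dw, double epsilon)
{
    double dw;

    if (prev_dw == 0.0 || prev_S == S)      /* no usable history: plain gradient step */
        return -epsilon * S;

    dw = prev_dw * S / (prev_S - S);        /* parabolic jump toward the estimated minimum */

    if (fabs(dw) > MU * fabs(prev_dw))      /* cap the growth of the step */
        dw = (dw > 0.0 ? 1.0 : -1.0) * MU * fabs(prev_dw);

    return dw;
}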
mcdonald@undeed.uucp (Bruce J McDonald) (11/06/90)
Hello

I am designing a reconfigurable neural-net chip which is arranged as several
layers, each containing a number of processing elements (nodes). The NN is
designed for recall operation only, as the logic needed to support comprehensive
on-chip training would make each node too large and would also require a state
machine in every node to control it. Instead, each input weight of each node can
be set externally over a single serial data channel, which an overall controller
multiplexes to each weight in each node. To keep things simple, the input and
output of each node are one bit wide, which allows a very compact node design.
All training and weight adjustment is done off-chip.

The key to this approach is a generic (remaining within the limits of digital
data representation - to some, a crippling limitation) NN simulator and trainer.
An arbitrary-sized NN can be specified, together with any number of training
operations detailing the training data, number of iterations, etc. The program
is up and working, but I find that weight convergence is often hard to achieve
and requires a lot of fine tuning of the training data. I suspect that my
implementation of the back-propagation training method is flawed, especially the
derivative of the threshold function (T'(net)).

Could anyone out there please mail me some examples of back-propagation source
code (C preferably), as I am sure that this is a small problem. Any other
correspondence would be most appreciated (helpful or not). For reference, a
sketch of the standard formulation is appended at the end of this post.

Thanx
(Let's help make the world a smaller (and healthier) place)
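A minimal sketch of the standard backprop quantities for a logistic threshold,
T(net) = 1/(1 + exp(-net)), whose derivative can be computed from the unit's
output alone: T'(net) = out * (1 - out). The names here are illustrative only;
this is the textbook formulation, not code from any particular simulator.

#include <math.h>

/* logistic threshold function */
double threshold(double net)
{
    return 1.0 / (1.0 + exp(-net));
}

/* delta for an output unit: error times T'(net) */
double output_delta(double out, double target)
{
    return (target - out) * out * (1.0 - out);
}

/* delta for a hidden unit: w[k] is the weight from this unit to the k-th
 * unit in the next layer, next_delta[k] that unit's delta */
double hidden_delta(double out, const double *w, const double *next_delta, int n)
{
    double sum = 0.0;
    int k;

    for (k = 0; k < n; k++)
        sum += w[k] * next_delta[k];

    return out * (1.0 - out) * sum;      /* back-propagated error times T'(net) */
}

/* The weight update for the connection from unit i to unit j is then
 *     w_ij += eta * delta_j * out_i
 * where eta is the learning rate. */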