mgj@cup.portal.com (Mark Gregory Jurik) (02/11/91)
New Manuscript Available BACKPERCOLATION : Assigning Localized Activation Error in Feedforward Perceptron Networks Mark Jurik Jurik Research & Consulting PO 2379, Aptos, Calif. USA Abstract This work introduces a new algorithm, BackPercolation, for training multilayered feedforward perceptron networks. It assigns each processing element (PE) its own activation error, thereby giving each PE its own error surface. In contrast, Backpropagation decreases global error by descending along gradients of the global error surface. Experimental results reveal that weight adjustments which reduce local activation errors permit the system to "tunnel through" the global error surface, increasing convergence rate and the likelihood of attaining an optimal minima. **************** In early 1990, over 30 researchers had experimented with a preliminary version of Backpercolation (Perc). Their feedback motivated the development of a more sophisticated version that includes the following options: learn rate feedback control and a "kick-starter" for weight initialization. Perc uses local computations that are slightly more complex than Backpropagation, and does NOT employ any of the following techniques: - momentum, - matrix inversion or pseudo-inversion (computationally non-local), - nested subroutines (such as LineSearch), - architecture reconfiguration (troublesome with non-stationary modeling). Yet despite these simplifying constraints, Perc trains very quickly. For example, it solves the 12-2-12 encoder problem in less than 600 epochs and the 6-6-6-1 parity problem in only 93 epochs. The table below compares Perc's speed against other high-performance nets on some popular simple tests. For each test, the opposing paradigm listed is the one that the literature (listed below) revealed to be the fastest one for that test. This way, Perc is being compared to the best of the best (or something like that). The D symbol in the configuration column below denotes that both Perc and the compared paradigm included direct connections between the input and output nodes. The G symbol denotes that the hidden layer cells in both Perc and the other paradigm used Gaussian rather than sigmoidal thresholding. The Epoch Count is the number of training cycles required to get the output of every cell, for every pattern, to be within the error tolerance as specified in the respective published sources. Perc's epoch count was averaged over 50 trials per problem task. Only successfully converging episodes were included in the average, but as shown below, Perc's probability of success was frequently 100%. Also keep in mind that many published articles do not mention the probability of success of convergence. (I wonder why?) ***************************************************************************** PROBLEM ERROR COMPARING PARADIGM'S PERC'S PERC'S TYPE CONFIG TOLER. PARADIGM EPOCHS EPOCHS % SUCC -------------------------------------------------------------------------- Parity 2-2-1 O.1 CONJUGATE GRADIENT 73 8 100 3-3-1 0.1 GRAM-SCHMIDT 247 18 100 2-1-1 (D) 0.1 CASCADE-CORR (D,G) 24 8 100 3-1-1 (D) 0.1 CASCADE-CORR (D,G) 32 6 100 4-4-1 (D,G) 0.1 CASCADE-CORR (D,G) 66 5 100 8-8-1 (D,G) 0.1 CASCADE-CORR (D,G) 357 28 100 6-6-6-1 * SOLIS/WETS OPTIMIZE 5713 93 80 Encoder/ 8-2-8 0.4 QUICKPROP 103 88 100 8-2-8 0.1 ** 194 100 decoder 8-3-8 0.4 QUICKPROP 22 22 100 8-3-8 0.1 CONJUGATE GRADIENT 68 28 100 10-5-10 0.4 QUICKPROP 14 15 100 10-5-10 0.1 CONJUGATE GRADIENT 71 19 100 12-2-12 0.4 ** 581 100 Linear Channel 10-10-10 0.1 SUPER-SAB 125 8 100 Multiplex 6-6-1 0.1 DELTA-BAR-DELTA 137 24 100 Symmetry 6-2-1 0.3 GAIN BACK-PROP 198 124 84 * In the 6-6-6-1 task, training stopped when the squared error, summed across all 64 training patterns, totaled less than 0.0002. ** I am unaware of any published result for this test. Data for the above comparisons were obtained from the following publications: Gain Backprop --- J. Complex Systems, Feb 1990, p 51 Delta_bar_delta --- J. Neural Networks, #4, 1988, p 295 Quickprop --- Proceedings 1988 Connectionist Models, p 38 Super-SAB --- J. Neural Networks, #5, 1990, p 561 Conjugate Gradient --- Advances in Neural Info Proc Sys 1, 1988, p 40 Cascade Correlation --- Advances in Neural Info Proc Sys 2, 1989, p 524 Gram_Schmidt --- Neural Computation, #1, 1990, p 116 Solis/Wets Optimization --- J. Neural Networks, #5, 1989, p367 As an option, the learn-rate parameter can be initially set to zero and it will automatically adjust as learning proceeds. This option frees the user from needing to find the optimal learn-rate; however, it has one major drawback: the algorithm needs to access information on all the cells, which is a non-local process. With Perc, the user is still required to determine these things: 1. number of hidden layers, 2. number of cells per hidden layer, 3. the scale of the bounds on the initialized weights. Journal reviewers of an earlier manuscript on Perc have asked for a mathematical analysis of the algorithm s stability and convergence properties. Unfortunately, I am not aware of any proper set of analytical tools for this kind of nonlinear behavior. As I see it, the literature all too frequently applies linear analysis techniques that require too many simplifying assumptions about the network. As a result, many conclusions thought to be gospel one year get discarded the next. Thus for the sake of keeping things clean, this manuscript will simply present the Perc algorithm, some derivation and lots of experimental results. You will need to verify Perc's utility for your own particular task(s). Speaking of validation, Perc has recently been licensed for inclusion into BrainCel, a commercially available set of macros which give the Excel spreadsheet program on the PC the capability to train a neural net on spreadsheet data. I have been informed that Perc has trained a neural net to predict the commodities price of leather, as well as estimate from proprietary demographic data the likelihood that a prospective customer will agree to buy a luxury cruise from a salseman over the phone. Now a computer can prepare a list of likely prospects to teleoperators and thereby cut down on useless calls that only end up irritating many people. TThe spreadsheet-neuralnet process of BrainCel is so automated that the user does not even need to know what a neural network is. Call 203-562-7335 for details. A revised manuscript on BACKPERCOLATION is now available. It will include a hardcopy of the source code used to solve the 12-2-12 encoder/decoder problem. For one copy, please enclose US$10 (or foreign currency equivalent) to cover printing, shipping and handling expenses. Sorry, this will not be available via ftp. PS. All researchers who have tested Perc and helped develop its upgrade during this past year will automatically be receiving a free copy of the manuscript and source code. ---------------------------------------------------------------------------- JURIK RESEARCH & CONSULTING PO BOX 2379 APTOS, CALIFORNIA 95001 ---------------------------------------------------------------------------- ...