mgj@cup.portal.com (Mark Gregory Jurik) (02/11/91)
New Manuscript Available
BACKPERCOLATION :
Assigning Localized Activation Error in Feedforward Perceptron Networks
Mark Jurik
Jurik Research & Consulting
PO Box 2379, Aptos, Calif. USA
Abstract
This work introduces a new algorithm, BackPercolation, for training
multilayered feedforward perceptron networks. It assigns each processing
element (PE) its own activation error, thereby giving each PE its own error
surface. In contrast, Backpropagation decreases global error by descending
along gradients of the global error surface. Experimental results reveal
that weight adjustments which reduce local activation errors permit the
system to "tunnel through" the global error surface, increasing convergence
rate and the likelihood of attaining an optimal minimum.
****************
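To make the distinction drawn in the abstract concrete, here is a minimal
sketch in Python/NumPy. It is NOT the Perc update rule and is not taken from
the manuscript's source code; in particular, the way h_target is filled in
below is a placeholder for illustration only. Part (a) shows where
Backpropagation's hidden-layer deltas come from (the gradient of one global
error surface); part (b) shows the alternative bookkeeping, in which each
hidden PE is handed its own target activation and therefore its own local
error to shrink.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    W1 = rng.uniform(-0.5, 0.5, (2, 3))   # a small 2-3-1 network, for illustration
    W2 = rng.uniform(-0.5, 0.5, (3, 1))

    x = np.array([1.0, 0.0])              # one training pattern
    t = np.array([1.0])                   # its desired output

    h = sigmoid(x @ W1)                   # hidden activations
    y = sigmoid(h @ W2)                   # output activation

    # (a) Backpropagation: one global error E = 0.5*(t - y)^2.  The hidden
    #     deltas are chain-rule gradients of that single error surface.
    delta_out = (t - y) * y * (1.0 - y)
    delta_hid = (delta_out @ W2.T) * h * (1.0 - h)

    # (b) Activation-error view: give each hidden PE its own target
    #     activation, so each PE has its own error (and error surface) to
    #     reduce.  The assignment below is a placeholder; Perc's actual
    #     assignment rule is what the manuscript derives.
    h_target = h + delta_out @ W2.T
    local_error = h_target - h            # one activation error per hidden PE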
By early 1990, over 30 researchers had experimented with a preliminary
version of Backpercolation (Perc). Their feedback motivated the development
of a more sophisticated version that includes two options: learn-rate
feedback control and a "kick-starter" for weight initialization. Perc's
local computations are slightly more complex than Backpropagation's, and it
does NOT employ any of the following techniques:
- momentum,
- matrix inversion or pseudo-inversion (computationally non-local),
- nested subroutines (such as LineSearch),
- architecture reconfiguration (troublesome with non-stationary modeling).
Yet despite these simplifying constraints, Perc trains very quickly. For
example, it solves the 12-2-12 encoder problem in less than 600 epochs and
the 6-6-6-1 parity problem in only 93 epochs. The table below compares
Perc's speed against other high-performance training methods on several
popular benchmark tests. For each test, the opposing paradigm listed is the
one that the cited literature (listed below) reports as the fastest for that
test. This way, Perc is being compared against the best of the best (or
something like that).
The D symbol in the configuration column below denotes that both Perc and
the compared paradigm included direct connections between the input and
output nodes. The G symbol denotes that the hidden layer cells in both
Perc and the other paradigm used Gaussian rather than sigmoidal thresholding.
The Epoch Count is the number of training cycles required to get the output
of every cell, for every pattern, to be within the error tolerance as
specified in the respective published sources. Perc's epoch count was
averaged over 50 trials per problem. Only successfully converging trials
were included in the average, but as shown below, Perc's success rate was
frequently 100%. Also keep in mind that many published articles do not
report their probability of convergence at all. (I wonder why?)
*****************************************************************************
PROBLEM                 ERROR   COMPARING                 PARADIGM'S  PERC'S  PERC'S
TYPE       CONFIG       TOLER.  PARADIGM                      EPOCHS  EPOCHS  % SUCC
-----------------------------------------------------------------------------
Parity     2-2-1        0.1     CONJUGATE GRADIENT                73       8     100
           3-3-1        0.1     GRAM-SCHMIDT                     247      18     100
           2-1-1 (D)    0.1     CASCADE-CORR (D,G)                24       8     100
           3-1-1 (D)    0.1     CASCADE-CORR (D,G)                32       6     100
           4-4-1 (D,G)  0.1     CASCADE-CORR (D,G)                66       5     100
           8-8-1 (D,G)  0.1     CASCADE-CORR (D,G)               357      28     100
           6-6-6-1      *       SOLIS/WETS OPTIMIZE             5713      93      80

Encoder/   8-2-8        0.4     QUICKPROP                        103      88     100
decoder    8-2-8        0.1     **                                --     194     100
           8-3-8        0.4     QUICKPROP                         22      22     100
           8-3-8        0.1     CONJUGATE GRADIENT                68      28     100
           10-5-10      0.4     QUICKPROP                         14      15     100
           10-5-10      0.1     CONJUGATE GRADIENT                71      19     100
           12-2-12      0.4     **                                --     581     100

Linear
Channel    10-10-10     0.1     SUPER-SAB                        125       8     100

Multiplex  6-6-1        0.1     DELTA-BAR-DELTA                  137      24     100

Symmetry   6-2-1        0.3     GAIN BACK-PROP                   198     124      84
* In the 6-6-6-1 task, training stopped when the squared error, summed across
all 64 training patterns, totaled less than 0.0002.
** I am unaware of any published result for this test.
Data for the above comparisons were obtained from the following publications:
Gain Backprop            --- J. Complex Systems, Feb 1990, p 51
Delta-Bar-Delta          --- J. Neural Networks, #4, 1988, p 295
Quickprop                --- Proceedings 1988 Connectionist Models, p 38
Super-SAB                --- J. Neural Networks, #5, 1990, p 561
Conjugate Gradient       --- Advances in Neural Info Proc Sys 1, 1988, p 40
Cascade Correlation      --- Advances in Neural Info Proc Sys 2, 1989, p 524
Gram-Schmidt             --- Neural Computation, #1, 1990, p 116
Solis/Wets Optimization  --- J. Neural Networks, #5, 1989, p 367
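In case the benchmark names are unfamiliar, the sketch below (Python/NumPy,
my own illustration, not part of the manuscript) builds the two data sets
that dominate the table: the N-M-N encoder/decoder task, in which each of N
one-hot input patterns must be reproduced at the output through a narrow
hidden layer, and the N-bit parity task, whose 2^N patterns account for the
64 training patterns mentioned in footnote (*).

    import itertools
    import numpy as np

    def encoder_patterns(n):
        # N-M-N encoder/decoder: N one-hot inputs; the target equals the input.
        patterns = np.eye(n)
        return patterns, patterns

    def parity_patterns(n):
        # N-bit parity: all 2^N binary inputs; target is 1 when an odd
        # number of input bits are set.
        inputs = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
        targets = (inputs.sum(axis=1) % 2).reshape(-1, 1)
        return inputs, targets

    X_enc, T_enc = encoder_patterns(8)   # the 8-2-8 and 8-3-8 tasks use these 8 patterns
    X_par, T_par = parity_patterns(6)    # the 6-6-6-1 task uses these 64 patterns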
As an option, the learn-rate parameter can be initially set to zero and it
will automatically adjust as learning proceeds. This option frees the user
from needing to find the optimal learn-rate; however, it has one major
drawback: the algorithm needs to access information on all the cells, which
is a non-local process.
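The exact feedback rule is given in the manuscript. Purely to illustrate why
any such controller is non-local, here is a generic "bold driver" style
adjustment (in Python, and NOT Perc's rule): before it can change the rate,
it must see the error accumulated over every cell and every pattern.

    def adjust_learn_rate(rate, prev_global_error, global_error,
                          grow=1.05, shrink=0.5, floor=1e-4):
        # Generic illustration: raise the rate while the global error keeps
        # falling, cut it back when the error rises.  Computing global_error
        # requires summing over all cells and patterns, hence "non-local".
        if rate == 0.0:                  # the "start at zero" option
            return floor
        if global_error < prev_global_error:
            return rate * grow
        return max(rate * shrink, floor)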
With Perc, the user is still required to determine the following (a short
illustration appears after the list):
1. number of hidden layers,
2. number of cells per hidden layer,
3. the scale of the bounds on the initialized weights.
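For concreteness, those three choices might be written down as follows; the
variable names and the uniform initialization shown are my own illustration,
not notation from the manuscript.

    import numpy as np

    n_inputs, n_outputs = 6, 1           # e.g. the 6-6-6-1 parity network
    cells_per_layer     = [6, 6]         # choices 1 and 2: two hidden layers of 6 cells
    weight_init_bound   = 0.5            # choice 3: weights drawn from [-0.5, +0.5]

    rng = np.random.default_rng()
    layer_sizes = [n_inputs] + cells_per_layer + [n_outputs]
    weights = [rng.uniform(-weight_init_bound, weight_init_bound, (m, n))
               for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]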
Journal reviewers of an earlier manuscript on Perc have asked for a
mathematical analysis of the algorithm's stability and convergence
properties. Unfortunately, I am not aware of any proper set of analytical
tools for this kind of nonlinear behavior. As I see it, the literature all
too frequently applies linear analysis techniques that require too many
simplifying assumptions about the network. As a result, many conclusions
thought to be gospel one year get discarded the next. Thus, for the sake of
keeping things clean, this manuscript simply presents the Perc algorithm,
some derivation, and lots of experimental results. You will need to verify
Perc's utility for your own particular task(s).
Speaking of validation, Perc has recently been licensed for inclusion into
BrainCel, a commercially available set of macros which give the Excel
spreadsheet program on the PC the capability to train a neural net on
spreadsheet data. I have been informed that Perc has trained a neural net to
predict the commodity price of leather, as well as estimate from
proprietary demographic data the likelihood that a prospective customer will
agree to buy a luxury cruise from a salesman over the phone. Now a computer
can prepare a list of likely prospects for teleoperators and thereby cut down
on useless calls that only end up irritating many people. The spreadsheet/neural-net
process of BrainCel is so automated that the user does not even need to know
what a neural network is. Call 203-562-7335 for details.
A revised manuscript on BACKPERCOLATION is now available. It includes a
hardcopy of the source code used to solve the 12-2-12 encoder/decoder
problem. For one copy, please enclose US$10 (or foreign currency equivalent)
to cover printing, shipping and handling expenses. Sorry, this will not be
available via ftp.
PS. All researchers who have tested Perc and helped develop its upgrade
during this past year will automatically receive a free copy of the
manuscript and source code.
----------------------------------------------------------------------------
JURIK RESEARCH & CONSULTING
PO BOX 2379
APTOS, CALIFORNIA 95001
----------------------------------------------------------------------------
...