mgj@cup.portal.com (Mark Gregory Jurik) (02/11/91)
New Manuscript Available
BACKPERCOLATION :
Assigning Localized Activation Error in Feedforward Perceptron Networks
Mark Jurik
Jurik Research & Consulting
PO Box 2379, Aptos, Calif. USA
Abstract
This work introduces a new algorithm, BackPercolation, for training
multilayered feedforward perceptron networks. It assigns each processing
element (PE) its own activation error, thereby giving each PE its own error
surface. In contrast, Backpropagation decreases global error by descending
along gradients of the global error surface. Experimental results reveal
that weight adjustments which reduce local activation errors permit the
system to "tunnel through" the global error surface, increasing convergence
rate and the likelihood of attaining an optimal minimum.
****************
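To make the distinction drawn in the abstract concrete, here is a minimal
sketch in Python/NumPy. It is NOT the Perc update rule and is not taken from
the manuscript's source code; in particular, the way h_target is filled in
below is a placeholder for illustration only. Part (a) shows where
Backpropagation's hidden-layer deltas come from (the gradient of one global
error surface); part (b) shows the alternative bookkeeping, in which each
hidden PE is handed its own target activation and therefore its own local
error to shrink.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    W1 = rng.uniform(-0.5, 0.5, (2, 3))   # a small 2-3-1 network, for illustration
    W2 = rng.uniform(-0.5, 0.5, (3, 1))

    x = np.array([1.0, 0.0])              # one training pattern
    t = np.array([1.0])                   # its desired output

    h = sigmoid(x @ W1)                   # hidden activations
    y = sigmoid(h @ W2)                   # output activation

    # (a) Backpropagation: one global error E = 0.5*(t - y)^2.  The hidden
    #     deltas are chain-rule gradients of that single error surface.
    delta_out = (t - y) * y * (1.0 - y)
    delta_hid = (delta_out @ W2.T) * h * (1.0 - h)

    # (b) Activation-error view: give each hidden PE its own target
    #     activation, so each PE has its own error (and error surface) to
    #     reduce.  The assignment below is a placeholder; Perc's actual
    #     assignment rule is what the manuscript derives.
    h_target = h + delta_out @ W2.T
    local_error = h_target - h            # one activation error per hidden PE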
By early 1990, over 30 researchers had experimented with a preliminary
version of Backpercolation (Perc). Their feedback motivated the development
of a more sophisticated version that includes two options: learn-rate
feedback control and a "kick-starter" for weight initialization. Perc's
local computations are slightly more complex than Backpropagation's, and it
does NOT employ any of the following techniques:
- momentum,
- matrix inversion or pseudo-inversion (computationally non-local),
- nested subroutines (such as LineSearch),
- architecture reconfiguration (troublesome with non-stationary modeling).
Yet despite these simplifying constraints, Perc trains very quickly. For
example, it solves the 12-2-12 encoder problem in less than 600 epochs and
the 6-6-6-1 parity problem in only 93 epochs. The table below compares
Perc's speed against other high-performance training methods on several
popular benchmark tests. For each test, the opposing paradigm listed is the
one that the cited literature (listed below) reports as the fastest for that
test. This way, Perc is being compared against the best of the best (or
something like that).
The D symbol in the configuration column below denotes that both Perc and
the compared paradigm included direct connections between the input and
output nodes. The G symbol denotes that the hidden layer cells in both
Perc and the other paradigm used Gaussian rather than sigmoidal thresholding.
The Epoch Count is the number of training cycles required to get the output
of every cell, for every pattern, to be within the error tolerance as
specified in the respective published sources. Perc's epoch count was
averaged over 50 trials per problem. Only successfully converging trials
were included in the average, but as shown below, Perc's success rate was
frequently 100%. Also keep in mind that many published articles do not
report their probability of convergence at all. (I wonder why?)
*****************************************************************************
PROBLEM                 ERROR   COMPARING                 PARADIGM'S  PERC'S  PERC'S
TYPE       CONFIG       TOLER.  PARADIGM                      EPOCHS  EPOCHS  % SUCC
-----------------------------------------------------------------------------
Parity     2-2-1        0.1     CONJUGATE GRADIENT                73       8     100
           3-3-1        0.1     GRAM-SCHMIDT                     247      18     100
           2-1-1 (D)    0.1     CASCADE-CORR (D,G)                24       8     100
           3-1-1 (D)    0.1     CASCADE-CORR (D,G)                32       6     100
           4-4-1 (D,G)  0.1     CASCADE-CORR (D,G)                66       5     100
           8-8-1 (D,G)  0.1     CASCADE-CORR (D,G)               357      28     100
           6-6-6-1      *       SOLIS/WETS OPTIMIZE             5713      93      80

Encoder/   8-2-8        0.4     QUICKPROP                        103      88     100
decoder    8-2-8        0.1     **                                --     194     100
           8-3-8        0.4     QUICKPROP                         22      22     100
           8-3-8        0.1     CONJUGATE GRADIENT                68      28     100
           10-5-10      0.4     QUICKPROP                         14      15     100
           10-5-10      0.1     CONJUGATE GRADIENT                71      19     100
           12-2-12      0.4     **                                --     581     100

Linear
Channel    10-10-10     0.1     SUPER-SAB                        125       8     100

Multiplex  6-6-1        0.1     DELTA-BAR-DELTA                  137      24     100

Symmetry   6-2-1        0.3     GAIN BACK-PROP                   198     124      84
* In the 6-6-6-1 task, training stopped when the squared error, summed across
all 64 training patterns, totaled less than 0.0002.
** I am unaware of any published result for this test.
Data for the above comparisons were obtained from the following publications:
Gain Backprop            --- J. Complex Systems, Feb 1990, p 51
Delta-Bar-Delta          --- J. Neural Networks, #4, 1988, p 295
Quickprop                --- Proceedings 1988 Connectionist Models, p 38
Super-SAB                --- J. Neural Networks, #5, 1990, p 561
Conjugate Gradient       --- Advances in Neural Info Proc Sys 1, 1988, p 40
Cascade Correlation      --- Advances in Neural Info Proc Sys 2, 1989, p 524
Gram-Schmidt             --- Neural Computation, #1, 1990, p 116
Solis/Wets Optimization  --- J. Neural Networks, #5, 1989, p 367
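In case the benchmark names are unfamiliar, the sketch below (Python/NumPy,
my own illustration, not part of the manuscript) builds the two data sets
that dominate the table: the N-M-N encoder/decoder task, in which each of N
one-hot input patterns must be reproduced at the output through a narrow
hidden layer, and the N-bit parity task, whose 2^N patterns account for the
64 training patterns mentioned in footnote (*).

    import itertools
    import numpy as np

    def encoder_patterns(n):
        # N-M-N encoder/decoder: N one-hot inputs; the target equals the input.
        patterns = np.eye(n)
        return patterns, patterns

    def parity_patterns(n):
        # N-bit parity: all 2^N binary inputs; target is 1 when an odd
        # number of input bits are set.
        inputs = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
        targets = (inputs.sum(axis=1) % 2).reshape(-1, 1)
        return inputs, targets

    X_enc, T_enc = encoder_patterns(8)   # the 8-2-8 and 8-3-8 tasks use these 8 patterns
    X_par, T_par = parity_patterns(6)    # the 6-6-6-1 task uses these 64 patterns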
As an option, the learn-rate parameter can be initially set to zero and it
will automatically adjust as learning proceeds. This option frees the user
from needing to find the optimal learn-rate; however, it has one major
drawback: the algorithm needs to access information on all the cells, which
is a non-local process.
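The exact feedback rule is given in the manuscript. Purely to illustrate why
any such controller is non-local, here is a generic "bold driver" style
adjustment (in Python, and NOT Perc's rule): before it can change the rate,
it must see the error accumulated over every cell and every pattern.

    def adjust_learn_rate(rate, prev_global_error, global_error,
                          grow=1.05, shrink=0.5, floor=1e-4):
        # Generic illustration: raise the rate while the global error keeps
        # falling, cut it back when the error rises.  Computing global_error
        # requires summing over all cells and patterns, hence "non-local".
        if rate == 0.0:                  # the "start at zero" option
            return floor
        if global_error < prev_global_error:
            return rate * grow
        return max(rate * shrink, floor)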
With Perc, the user is still required to determine the following (a short
illustration appears after the list):
1. number of hidden layers,
2. number of cells per hidden layer,
3. the scale of the bounds on the initialized weights.
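For concreteness, those three choices might be written down as follows; the
variable names and the uniform initialization shown are my own illustration,
not notation from the manuscript.

    import numpy as np

    n_inputs, n_outputs = 6, 1           # e.g. the 6-6-6-1 parity network
    cells_per_layer     = [6, 6]         # choices 1 and 2: two hidden layers of 6 cells
    weight_init_bound   = 0.5            # choice 3: weights drawn from [-0.5, +0.5]

    rng = np.random.default_rng()
    layer_sizes = [n_inputs] + cells_per_layer + [n_outputs]
    weights = [rng.uniform(-weight_init_bound, weight_init_bound, (m, n))
               for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]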
Journal reviewers of an earlier manuscript on Perc have asked for a
mathematical analysis of the algorithm's stability and convergence
properties. Unfortunately, I am not aware of any proper set of analytical
tools for this kind of nonlinear behavior. As I see it, the literature all
too frequently applies linear analysis techniques that require too many
simplifying assumptions about the network. As a result, many conclusions
thought to be gospel one year get discarded the next. Thus, for the sake of
keeping things clean, this manuscript simply presents the Perc algorithm,
some derivation, and lots of experimental results. You will need to verify
Perc's utility for your own particular task(s).
Speaking of validation, Perc has recently been licensed for inclusion into
BrainCel, a commercially available set of macros which give the Excel
spreadsheet program on the PC the capability to train a neural net on
spreadsheet data. I have been informed that Perc has trained a neural net to
predict the commodity price of leather, as well as estimate from
proprietary demographic data the likelihood that a prospective customer will
agree to buy a luxury cruise from a salesman over the phone. Now a computer
can prepare a list of likely prospects for teleoperators and thereby cut down
on useless calls that only end up irritating many people. The spreadsheet/neural-net
process of BrainCel is so automated that the user does not even need to know
what a neural network is. Call 203-562-7335 for details.
A revised manuscript on BACKPERCOLATION is now available. It includes a
hardcopy of the source code used to solve the 12-2-12 encoder/decoder
problem. For one copy, please enclose US$10 (or foreign currency equivalent)
to cover printing, shipping and handling expenses. Sorry, this will not be
available via ftp.
PS. All researchers who have tested Perc and helped develop its upgrade
during this past year will automatically receive a free copy of the
manuscript and source code.
----------------------------------------------------------------------------
JURIK RESEARCH & CONSULTING
PO BOX 2379
APTOS, CALIFORNIA 95001
----------------------------------------------------------------------------
...