[comp.ai.neural-nets] Explanation of Conjugate Gradient

pjhamvs@cs.vu.nl (Summeren van Peter) (09/20/90)

Greetings to the reader,
this file contains:
	1. a literature reference file for troff
	2. a troff file (for printing on a Pp4p) with an explanation of the
	   method of Powell. The implementation was ftp'ed and the ftp
	   address was posted in this group before, but here are some
	   addresses again:
	   for a hardcopy of the docs:  vincew@cse.ogc.edu
	   for the ftp distribution:    opt-dist@cse.ogc.edu
	   for bug reports:             opt-bugs@cse.ogc.edu
	3. The implementation of opt was done by
	   Etienne Barnard and Ronald Cole (OGC Tech. Report No. CSE 89-014)

You have to use your own troff commands for referencing. Ask the local
troff specialists about it (it's not that difficult). I use a macro
that places the references on the same page, but you may have other
habits.

The troff literature reference file starts after the next line.
=========================================================================
%A William H. Press
%T Numerical Recipes in C
%I Cambridge University Press
%D 1989

%A M.J.D. Powell
%T Restart procedures for the conjugate gradient method
%J Mathematical Programming
%V 12
%D 1977
%P 241-254

%A E.M.L. Beale
%T A derivation of conjugate gradients
%E F.A. Lootsma
%B Numerical Methods for Nonlinear Optimization
%I Academic Press
%C London
%D 1972
%P 39-43

%A J. Stoer
%A R. Bulirsch
%T Introduction to Numerical Analysis
%I Springer-Verlag
%C New York
%D 1980

The troff document itself starts after the next line.
==========================================================================
.EQ
delim $$
.EN
.nr PS 12
.nr VS 15
.ps 12
.vs 15
.ft R
.NH
Introduction
.LP
A neural network can be built in many ways. The terms used are neural node,
neural node function, site, and link. They are shown in the figure below.
.PS
scale = 2
.ps
.ps 10
ellipse at 3.737,8.012 wid 1.000 ht 1.000
ellipse at 1.988,6.513 wid 1.000 ht 1.000
ellipse at 1.238,9.012 wid 1.000 ht 1.000
dashwid = 0.050i
line dashed from 1.988,9.137 to 2.987,9.137
line dashed from 2.487,9.762 to 2.487,8.512
line from 1.800,6.513 to 1.800,6.513
line dashed from 1.050,8.387 to 1.238,8.200
line from 1.149,8.253 to 1.238,8.200 to 1.184,8.288
line from 3.887,9.412 to 3.862,9.512 to 3.837,9.412
line from 3.862,9.512 to 3.862,8.512
line from 0.657,8.536 to 0.675,8.637 to 0.612,8.556
line from 0.675,8.637 to 0.425,8.075
line from 0.600,8.567 to 0.675,8.637 to 0.575,8.611
line from 0.675,8.637 to 0.113,8.325
line from 0.675,8.637 to 0.675,8.637
line from 3.362,7.638 to 3.237,7.450 to 3.425,7.325 to 3.550,7.575
line from 2.362,6.200 to 2.550,6.075 to 2.612,6.200 to 2.425,6.325
line from 1.550,6.200 to 1.425,6.138 to 1.488,6.013 to 1.675,6.138
line from 1.488,6.513 to 1.300,6.513 to 1.300,6.388 to 1.488,6.388
line from 0.988,8.575 to 0.925,8.450 to 1.050,8.387 to 1.113,8.575
line from 0.800,8.825 to 0.613,8.700 to 0.738,8.575 to 0.863,8.700
spline from 1.988,8.762 to 2.237,8.825 to 2.362,8.950 to 2.487,9.137 to 2.675,9.450 to 2.925,9.575 to 3.175,9.575
line from 2.652,6.125 to 2.550,6.138 to 2.635,6.079
line from 2.550,6.138 	to 2.584,6.122\
	to 2.617,6.107\
	to 2.650,6.093\
	to 2.682,6.079\
	to 2.713,6.065\
	to 2.745,6.052\
	to 2.775,6.038\
	to 2.805,6.026\
	to 2.835,6.013\
	to 2.864,6.001\
	to 2.892,5.990\
	to 2.921,5.978\
	to 2.948,5.967\
	to 2.975,5.957\
	to 3.002,5.947\
	to 3.028,5.937\
	to 3.054,5.927\
	to 3.080,5.918\
	to 3.129,5.900\
	to 3.177,5.884\
	to 3.224,5.869\
	to 3.269,5.856\
	to 3.313,5.843\
	to 3.355,5.832\
	to 3.396,5.822\
	to 3.435,5.814\
	to 3.474,5.807\
	to 3.511,5.800\
	to 3.548,5.796\
	to 3.583,5.792\
	to 3.617,5.789\
	to 3.651,5.788\
	to 3.684,5.788\
	to 3.716,5.789\
	to 3.747,5.791\
	to 3.778,5.794\
	to 3.808,5.798\
	to 3.838,5.803\
	to 3.867,5.809\
	to 3.896,5.817\
	to 3.925,5.825\

line from 3.925,5.825 	to 3.955,5.835\
	to 3.985,5.848\
	to 4.014,5.862\
	to 4.043,5.879\
	to 4.073,5.897\
	to 4.101,5.916\
	to 4.130,5.938\
	to 4.158,5.960\
	to 4.186,5.984\
	to 4.214,6.010\
	to 4.241,6.036\
	to 4.268,6.063\
	to 4.294,6.091\
	to 4.320,6.120\
	to 4.346,6.150\
	to 4.371,6.180\
	to 4.395,6.210\
	to 4.419,6.241\
	to 4.442,6.272\
	to 4.465,6.303\
	to 4.486,6.334\
	to 4.508,6.364\
	to 4.528,6.395\
	to 4.548,6.425\
	to 4.567,6.454\
	to 4.585,6.483\
	to 4.602,6.511\
	to 4.619,6.539\
	to 4.634,6.565\
	to 4.649,6.591\
	to 4.675,6.638\

line from 4.675,6.638 	to 4.697,6.678\
	to 4.722,6.720\
	to 4.748,6.764\
	to 4.775,6.811\
	to 4.804,6.860\
	to 4.818,6.885\
	to 4.833,6.910\
	to 4.848,6.936\
	to 4.863,6.962\
	to 4.879,6.989\
	to 4.894,7.016\
	to 4.910,7.044\
	to 4.925,7.071\
	to 4.941,7.100\
	to 4.956,7.128\
	to 4.972,7.157\
	to 4.987,7.186\
	to 5.002,7.216\
	to 5.017,7.245\
	to 5.032,7.275\
	to 5.047,7.306\
	to 5.062,7.336\
	to 5.076,7.367\
	to 5.090,7.398\
	to 5.104,7.429\
	to 5.117,7.461\
	to 5.130,7.492\
	to 5.143,7.524\
	to 5.155,7.556\
	to 5.166,7.588\
	to 5.178,7.620\
	to 5.188,7.652\
	to 5.198,7.684\
	to 5.208,7.717\
	to 5.217,7.749\
	to 5.225,7.782\
	to 5.233,7.814\
	to 5.240,7.847\
	to 5.246,7.880\
	to 5.252,7.912\
	to 5.256,7.945\
	to 5.260,7.977\
	to 5.263,8.010\
	to 5.266,8.042\
	to 5.267,8.075\
	to 5.267,8.107\
	to 5.267,8.139\
	to 5.265,8.171\
	to 5.263,8.203\
	to 5.259,8.235\
	to 5.254,8.267\
	to 5.249,8.298\
	to 5.242,8.329\
	to 5.234,8.360\
	to 5.224,8.391\
	to 5.214,8.422\
	to 5.202,8.452\
	to 5.189,8.483\
	to 5.175,8.512\

line from 5.175,8.512 	to 5.159,8.543\
	to 5.141,8.572\
	to 5.122,8.600\
	to 5.101,8.628\
	to 5.079,8.653\
	to 5.056,8.678\
	to 5.006,8.724\
	to 4.979,8.746\
	to 4.951,8.766\
	to 4.923,8.785\
	to 4.894,8.802\
	to 4.864,8.819\
	to 4.833,8.834\
	to 4.801,8.848\
	to 4.770,8.861\
	to 4.737,8.872\
	to 4.704,8.882\
	to 4.671,8.891\
	to 4.637,8.899\
	to 4.604,8.905\
	to 4.570,8.911\
	to 4.536,8.914\
	to 4.502,8.917\
	to 4.468,8.918\
	to 4.434,8.918\
	to 4.400,8.916\
	to 4.367,8.913\
	to 4.334,8.909\
	to 4.301,8.903\
	to 4.269,8.896\
	to 4.237,8.887\

line from 4.237,8.887 	to 4.190,8.869\
	to 4.146,8.843\
	to 4.107,8.808\
	to 4.069,8.762\
	to 4.051,8.735\
	to 4.033,8.705\
	to 4.015,8.672\
	to 3.998,8.635\
	to 3.980,8.594\
	to 3.962,8.550\
	to 3.944,8.502\
	to 3.934,8.477\
	to 3.925,8.450\

line from 0.973,8.285 to 0.988,8.387 to 0.926,8.304
line from 0.988,8.387 	to 0.974,8.361\
	to 0.961,8.335\
	to 0.938,8.285\
	to 0.917,8.238\
	to 0.899,8.195\
	to 0.883,8.154\
	to 0.870,8.116\
	to 0.859,8.079\
	to 0.851,8.046\
	to 0.845,8.013\
	to 0.841,7.983\
	to 0.839,7.954\
	to 0.839,7.927\
	to 0.842,7.900\
	to 0.847,7.875\
	to 0.863,7.825\

line from 0.863,7.825 	to 0.891,7.778\
	to 0.932,7.737\
	to 0.957,7.719\
	to 0.983,7.703\
	to 1.011,7.687\
	to 1.039,7.672\
	to 1.068,7.659\
	to 1.097,7.645\
	to 1.125,7.633\
	to 1.152,7.621\
	to 1.200,7.598\
	to 1.238,7.575\

line from 1.238,7.575 	to 1.276,7.546\
	to 1.324,7.512\
	to 1.350,7.492\
	to 1.377,7.472\
	to 1.404,7.451\
	to 1.432,7.430\
	to 1.460,7.408\
	to 1.487,7.386\
	to 1.512,7.364\
	to 1.537,7.343\
	to 1.580,7.301\
	to 1.613,7.263\

line from 1.613,7.263 	to 1.639,7.219\
	to 1.653,7.190\
	to 1.667,7.157\
	to 1.682,7.117\
	to 1.699,7.070\
	to 1.708,7.043\
	to 1.717,7.014\
	to 1.727,6.983\
	to 1.738,6.950\

.ps
.ps 12
.ft
.ft R
"A  NETWORK" at 2.175,5.296 ljust
"activity" at 2.175,10.046 ljust
"input" at 2.800,8.921 ljust
"node" at 1.738,6.546 ljust
"link" at 1.300,7.608 ljust
"site" at 1.175,8.108 ljust
.ps
.ft
.PE
.nr PS 12
.nr VS 15
.ps 12
.vs 15
.ft R
A gradient descent algorithm is normally used in the backpropagation method
to calculate the weights of the network. The activation of a neuron is a
nondecreasing and differentiable function of the total input. If one looks
at the output layer it is very easy to make the connection between what
happens there and any gradient based method (gradient descent or conjugate
gradient).
.br
The output can be thought of as a vector with dimension N, the number of output
units: $ act sub i sup p $ denotes the response of output unit \fIi\fR for
input pattern \fIp\fR. In the learning phase it is known what the output has
to be: $ s sub i sup p $.
The total error over all the patterns,
.br
.sp
.ce
E = $ SIGMA sub p E sup p $ = $ SIGMA sub p ~ SIGMA sub {i=1} sup N ( act sub i sup p - s sub i sup p ) sup 2 $
.br
is minimised. The input itself is 
.br
.sp
.ce
$ input sub i sup p $ = $ SIGMA sub j $ $w sub {i j}$ $ act sub j sup p$ + $ bias sub i $.
The gradient of E is now: 
.br 
.ce
$ {partial E sup p } over {partial w sub {i j} }  =   {partial  E sup p} over {partial input sub i sup p }  {partial input sub i sup p} over {partial w sub {i j}} $.
.br
But ${partial input sub i sup p} over {partial w sub {i j}} $ is known to be $ act sub j sup p$. 
.br
We now have to work out the term: 
.br
.ce 
${partial  E sup p} over {partial input sub i sup p } $. 
.br
And this is equal to : 
.br
.sp
.ce
$ {partial  E sup p} over {partial act sub i sup p }  {partial act sub i sup p} over { partial input sub i sup p} $.
.br
The last term is the derivative of the node function \fIF(x)\fR. The first
term differs for an output unit and a unit somewhere in the network. In the
case of an output unit one can differentiate $ E sup p $ directly. In the case of
a hidden unit
.br
.sp
.ce
$ {partial  E sup p} over {partial act sub i sup p} ~=~
SIGMA sub {h = 1} sup N { partial E sup p} over {partial input sub h sup p}  
{ partial input sub h sup p} over { partial  act sub i sup p}$
.br
The last term equals $w sub {h i} $ and so one gets:
.br
.sp
.ce
 $ {partial  E sup p} over {partial act sub i sup p} ~=~ SIGMA sub {h = 1} sup N { partial E sup p} over {partial input sub h sup p} w sub {h i} $.
.br
Now we can collect the results of all these calculations for hidden units:
.br
.sp
.ce
 ${partial E sup p} over {partial w sub {i j}} ~=~ italic F roman \(qs(input sub i sup p) {act sub j sup p}{ SIGMA sub {h = 1} sup N }{ partial E sup p} over {partial input sub h sup p} w sub {h i} $ 

\fRWorking back from the output layer to the input layer one can calculate all
the necessary gradients. Therefore methods like gradient descent and
conjugate gradient can be implemented. The whole procedure is sketched in the
figure. 
.PS
scale = 1.6
.ps
.ps 10
ellipse at 0.988,5.763 wid 0.500 ht 0.500
ellipse at 0.487,7.263 wid 0.500 ht 0.500
ellipse at 1.238,8.762 wid 0.500 ht 0.500
dashwid = 0.050i
line dashed from 1.113,5.575 to 2.175,4.638
line from 2.083,4.685 to 2.175,4.638 to 2.117,4.722
line from 6.487,2.8 to 6.487,4.450 to 2.175,4.450 to 2.175,2.8 to 6.487,2.8
line from 3.550,3.888 to 3.550,3.513 to 4.925,3.513
line from 4.825,3.488 to 4.925,3.513 to 4.825,3.538
line from 4.8,3.8 to 4.8,4.4 to 2.237,4.4 to 2.237,3.8 to 4.8,3.8
line from 2.862,6.763 to 2.862,5.013
line from 2.837,5.112 to 2.862,5.013 to 2.887,5.112
line from 6.487,5.513 to 6.487,8.7 to 2.237,8.7 to 2.237,5.513 to 6.487,5.513
line from 2.987,6.763 to 2.987,6.013 to 2.987,5.763 to 4.987,5.763
line from 4.888,5.737 to 4.987,5.763 to 4.888,5.788
line from 4.175,6.763 to 4.175,6.263 to 4.987,6.263
line from 4.888,6.237 to 4.987,6.263 to 4.888,6.288
line from 2.987,8.075 to 2.987,7.513
line from 2.962,7.612 to 2.987,7.513 to 3.012,7.612
line from 5.0,6.8 to 5.0,7.513 to 2.362,7.513 to 2.362,6.8 to 5.0,6.8
line from 4.112,8.075 to 4.112,7.575 to 4.925,7.575
line from 4.825,7.550 to 4.925,7.575 to 4.825,7.600
line from 4.987,8.075 to 4.987,8.637 to 2.300,8.637 to 2.300,8.075 to 4.987,8.075
line from 0.925,6.013 to 0.613,7.075
line from 0.665,6.986 to 0.613,7.075 to 0.617,6.972
line from 1.238,9.012 to 1.238,9.512
line from 1.262,9.412 to 1.238,9.512 to 1.213,9.412
line from 1.488,8.325 to 1.988,9.262 to 2.487,9.262
line from 2.387,9.237 to 2.487,9.262 to 2.387,9.287
line from 1.425,8.493 to 1.363,8.575 to 1.379,8.473
line from 1.363,8.575 to 1.550,8.137
line from 0.863,6.013 to 0.863,6.013
line from 0.863,6.013 to 0.863,6.013
line from 0.863,6.013 to 0.863,6.013
line from 1.095,8.473 to 1.113,8.575 to 1.049,8.494
line from 1.113,8.575 to 0.613,7.450
line from 1.090,8.317 to 0.988,8.325 to 1.074,8.270
spline from 0.988,8.325 to 1.175,8.262 to 1.300,8.262 to 1.488,8.325
line from 1.401,8.270 to 1.488,8.325 to 1.385,8.317
.ps
.ps 10 
.ft
.ft R
"$SIGMA sub {h = 1} sup N { partial E sup p} over {partial input sub h sup p} w sub {h i} $ " at 4.987,3.579 ljust
"FOR HIDDEN UNITS" at 2.237,4.579 ljust
"$SIGMA sub {h = 1} sup N { partial E sup p} over {partial input sub h sup p}{ partial input sub h sup p} over { partial  act sub i sup p}$ " at 2.425,4.25 ljust
"$ {partial E sup p } over {partial w sub {i j} }$ " at 2.237,9.1 ljust
" $ act sub j sup p$"  at 5.112,7.517 ljust
"$ ~~~~{partial  E sup p} over {partial act sub i sup p } ~~~~~~~~~~~~ {partial act sub i sup p}over { partial input sub {i j}} $ " at 2.487,7.3 ljust
"$ 2({act sub i sup p} - {s sub i sup p})$" at 5.050,5.829 ljust
" \fnodefunction\(qs(x)\fR " at 5.050,6.329 ljust
"$~~~~{partial  E sup p} over {partial input sub i sup p }~~~~~~~~~~~~  {partial input sup p} over { w sub {i j}} $" at 2.425,8.5 ljust
.ps
.ps 12
"K" at 0.97,5.796 ljust
"J" at 0.47,7.296 ljust
"I" at 1.22,8.796 ljust
.ps 10
"$~~~~ input sub i sup p $ = $ SIGMA sub j $ $w sub i sub j$ $ act sub j sup p$ + $ bias sub j $ " at 2.487,9.296 ljust
"E = $ SIGMA $ $ E sup p $ = $ SIGMA sub 1 sup N $ $ (act sub i sup p $ - $ s sub i sup p$ $ ) sup 2 $ " at 0.988,9.608 ljust
.ps
.nr PS 12
.nr VS 15
.ps 12
.vs 15
.ft
.PE
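.LP
The same calculation can be written out as a short routine. Below is a minimal
sketch in C for a net with one hidden layer and the logistic node function;
all names (backprop_gradient, NIN, NHID, NOUT and the weight arrays) are
only illustrative and do not come from the opt package itself.
.nf
/*
 * Gradient of E^p = sum_i (act_i - s_i)^2 for one pattern p,
 * one hidden layer, logistic node function F.
 */
#include <math.h>

#define NIN  2
#define NHID 2
#define NOUT 1

static double F(double x)      { return 1.0 / (1.0 + exp(-x)); }
static double Fprime(double a) { return a * (1.0 - a); } /* F'(x) in terms of a = F(x) */

void backprop_gradient(const double in[NIN], const double s[NOUT],
                       double w1[NHID][NIN], double w2[NOUT][NHID],
                       double gw1[NHID][NIN], double gw2[NOUT][NHID])
{
    double hid[NHID], out[NOUT], dout[NOUT], dhid[NHID];
    int i, j, h;

    /* forward pass: input -> hidden -> output */
    for (h = 0; h < NHID; h++) {
        double net = 0.0;
        for (j = 0; j < NIN; j++) net += w1[h][j] * in[j];
        hid[h] = F(net);
    }
    for (i = 0; i < NOUT; i++) {
        double net = 0.0;
        for (h = 0; h < NHID; h++) net += w2[i][h] * hid[h];
        out[i] = F(net);
    }

    /* output units: dE/dinput_i = 2 (act_i - s_i) F'(input_i) */
    for (i = 0; i < NOUT; i++)
        dout[i] = 2.0 * (out[i] - s[i]) * Fprime(out[i]);

    /* hidden units: dE/dinput_h = (sum_i dE/dinput_i w_hi) F'(input_h) */
    for (h = 0; h < NHID; h++) {
        double sum = 0.0;
        for (i = 0; i < NOUT; i++) sum += dout[i] * w2[i][h];
        dhid[h] = sum * Fprime(hid[h]);
    }

    /* dE/dw = dE/dinput of the receiving unit times act of the sending unit */
    for (i = 0; i < NOUT; i++)
        for (h = 0; h < NHID; h++) gw2[i][h] += dout[i] * hid[h];
    for (h = 0; h < NHID; h++)
        for (j = 0; j < NIN; j++)  gw1[h][j] += dhid[h] * in[j];
}
.fi
.LP
Calling this for every pattern and accumulating gw1 and gw2 gives the full
gradient vector \fBg\fR that the methods below work with.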
\fRWe will now turn to conjugate gradient.

.NH
The conjugate gradient algorithm
.LP
We want to know the least value of a function F(\fBx\fR) where \fBx\fR is
a vector of dimension \fIn\fR. In the conjugate gradient method there is
no need to calculate second derivatives. We denote the $k sup th $ iteration
of the vector \fBx\fR  as $ bold x  sub k $ \fRand the gradient vector as:
.br
.sp
.ce
\fBg(x)\fR = $grad F$(\fBx\fR)
.sp
If $ bold x  sub 1 $ \fRis a given starting point in the space of variables then
we can calculate the gradient $ bold g sub 1 $. \fRThe steepest descent direction 
is of course in the opposite direction: - $ bold g sub 1 $. \fRThis direction is
taken as the first search direction $ bold d sub 1 $\fR. For \fIk\fR > 1 we
apply the following formula:
.sp
.ce
$ bold d sub k $ \fR =  -$ bold g  sub k $ \fR + $ beta sub k $ $ bold d  sub {k-1}$.
.sp
.ce
\fRwhere  $ beta sub k $ = ${norm( {bold g sub k} , {bold g sub k})} over {norm( {bold g sub {k-1}}, {bold g sub {k-1}} )} $.
.br
.sp
\fRHere norm(\fBa\fR,\fBb\fR) denotes the Euclidean inner product of the vectors \fBa\fR and \fBb\fR. Now we have to find the minimum of
F(\fBx\fR) in the search direction. This is certainly not an easy task.
Implementations for this can be found in 
.[ [
Recipes 
.]].
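.LP
To give an idea of what such a line search involves, below is a very small
sketch in C: a golden-section search for the step that minimises F along the
search direction on a bracket [0, lmax]. It assumes F is unimodal on that
bracket and the names are only illustrative; a serious implementation
(bracketing plus Brent's method) is the one referred to above.
.nf
/* F(x + lambda d) for a trial step lambda; tmp is scratch space of length n. */
double line_value(double (*F)(const double *, int), const double *x,
                  const double *d, double lambda, double *tmp, int n)
{
    int i;
    for (i = 0; i < n; i++) tmp[i] = x[i] + lambda * d[i];
    return F(tmp, n);
}

/* Golden-section search for the lambda that minimises F(x + lambda d). */
double golden_line_search(double (*F)(const double *, int), const double *x,
                          const double *d, double *tmp, int n, double lmax)
{
    const double r = 0.6180339887;        /* golden ratio minus one */
    double a = 0.0, b = lmax;
    double c = b - r * (b - a), e = a + r * (b - a);
    double fc = line_value(F, x, d, c, tmp, n);
    double fe = line_value(F, x, d, e, tmp, n);

    while (b - a > 1.0e-8) {              /* shrink the bracket */
        if (fc < fe) {
            b = e; e = c; fe = fc;
            c = b - r * (b - a);
            fc = line_value(F, x, d, c, tmp, n);
        } else {
            a = c; c = e; fc = fe;
            e = a + r * (b - a);
            fe = line_value(F, x, d, e, tmp, n);
        }
    }
    return 0.5 * (a + b);                 /* lambda of the minimum */
}
.fi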
In the course of the algorithm one gets in this way the successive
search directions $ bold d sub k $ \fRand the points
.sp
.ce
$bold x sub {k+1} $ = $bold x sub k $ + $ lambda sub k $ $ bold d sub k $\fR
.sp
where  F($ bold x sub {k+1}$) is minimal. It would be very advantageous if
the search directions lead in a very short time  to the desired solution.
This indeed can be done by the above method in the case of a quadratic function.
Let :
.sp
.ce
F(\fBx\fR) = $ 1 over 2 $ $ bold x sup T $ A\fBx\fR + $ bold {b sup T} x $\fR + c
.sp
where T denotes transposition.
Then it can be proven that the above calculated search directions have the
property:
.sp
.ce
$ bold d sub i sup T$ \fRA $ bold d sub j$ = 0 for i != j.
.sp
We say that $ bold d sub i $ \fRand $ bold d sub j $ \fR are conjugate.
Because the search directions are conjugate, every $bold x sub {k+1} $ \fRis the lowest
point of F over all steps along the directions used so far. The step from the
starting point to this last point can be expressed as a sum of the previous
$bold d sub k $, \fRof which there are at most N linearly independent ones.
Therefore the algorithm leads in at most N iterations to
the least value. It is the essence of Powell's idea to make use of this
property. One could consider every function as built up from quadratic
partial functions. Then in every subregion one could use a kind of
conjugate gradient for N steps. After these N steps a restart is made.
The point now is to guard the convergence during a traverse in a
non-quadratic region. Here lie a number of tricks which we will give later.
.br
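.LP
In its basic form the direction update described above is only a few lines of
code. The sketch below (in C, with an inner product routine; the names are
again only illustrative) is the skeleton around which the line search and
the safeguards discussed later are built.
.nf
/* Inner product of two vectors of length n. */
double dot(const double *a, const double *b, int n)
{
    double s = 0.0;
    int i;
    for (i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

/*
 * d_k = -g_k + beta_k d_{k-1}  with  beta_k = (g_k,g_k)/(g_{k-1},g_{k-1}).
 * On the first iteration, and on a restart every N steps, the steepest
 * descent direction -g_k is taken instead.
 */
void cg_direction(const double *g, const double *g_old, double *d,
                  int n, int restart)
{
    double beta;
    int i;

    if (restart) {
        for (i = 0; i < n; i++) d[i] = -g[i];
        return;
    }
    beta = dot(g, g, n) / dot(g_old, g_old, n);
    for (i = 0; i < n; i++) d[i] = -g[i] + beta * d[i];
}
.fi
.LP
After each such update the line search supplies $ lambda sub k $ \fRand the new point
$bold x sub {k+1} $ = $bold x sub k $ + $ lambda sub k $ $ bold d sub k $\fR.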
To understand what happens in the many non-quadratic cases a geometrical
representation of the principle of the conjugate gradient method is given below.
At a starting point on the left side of the ellipse the first search direction
is chosen to be the opposite of the gradient vector. In a line search the point
is found where the function F is minimal. The box in the corner elaborates
what happens in this case: the gradient increases, so the weight of the old
search direction is increased according to $ beta $ and a new search direction
can be established. This new search direction leads to a third point etc.
.sp
.PS
scale = 2
.ps
.ps 10
ellipse at 4.987,4.650 wid 0.250 ht 0.250
ellipse at 4.987,5.013 wid 0.525 ht 1.025
ellipse at 4.987,4.513 wid 0.025 ht 0.025
ellipse at 4.987,6.513 wid 1.650 ht 4.025
ellipse at 4.987,5.825 wid 0.775 ht 2.650
line from 4.300,7.213 to 4.300,7.312 to 4.175,7.312 to 4.175,7.213 to 4.300,7.213
line from 4.225,7.263 to 7.225,6.950
line from 7.123,6.935 to 7.225,6.950 to 7.128,6.985
line from 4.987,6.013 to 4.987,6.013
line from 5.175,7.000 to 5.175,7.312 to 4.800,7.312 to 4.800,7.000 to 5.175,7.000
line from 5.112,7.312 to 7.225,8.338
line from 7.146,8.271 to 7.225,8.338 to 7.124,8.316
.ps
.ps 20
line from 8.175,9.137 to 8.662,9.075
line from 8.458,9.051 to 8.662,9.075 to 8.470,9.150
line from 8.175,9.137 to 8.113,8.438
line from 8.080,8.641 to 8.113,8.438 to 8.180,8.632
dashwid = 0.050i
line dashed from 8.175,9.137 to 8.550,8.438
line from 8.411,8.590 to 8.550,8.438 to 8.500,8.637
line dashed from 9.738,8.338 to 9.738,9.775 to 7.225,9.775 to 7.225,8.338 to 9.738,8.338
line dashed from 4.987,7.213 to 6.237,5.013
line from 6.095,5.162 to 6.237,5.013 to 6.182,5.211
line from 9.863,4.388 to 9.863,9.825 to 3.862,9.825 to 3.862,4.388 to 9.863,4.388
.ps
.ps 12
.ft
.ft R
"new search direction" at 6.362,5.108 ljust
"grad" at 7.362,8.546 ljust
"Sold" at 8.675,9.171 ljust
"Snew" at 8.675,8.546 ljust
.ps
.ft
.PE
.sp
.nr PS 12
.nr VS 15
.ps 12
.vs 15
.ft R
Yet in the figure all the disadvantages of the conjugate gradient method can
be seen. Most functions are not quadratic at all in a large domain. The effect
of this can be seen in the next figure.
.br
.sp
.PS
scale = 2
.ps
.ps 10
ellipse at 2.900,6.250 wid 1.350 ht 1.350
ellipse at 2.900,6.250 wid 0.200 ht 0.200
ellipse at 2.900,6.250 wid 0.300 ht 0.300
ellipse at 2.900,6.250 wid 0.525 ht 0.525
ellipse at 2.900,6.250 wid 0.425 ht 0.425
ellipse at 2.900,6.250 wid 0.625 ht 0.625
.ps
.ps 20
ellipse at 2.900,6.300 wid 3.100 ht 3.100
.ps
.ps 10
line from 4.002,5.261 to 3.913,5.312 to 3.967,5.225
line from 3.913,5.312 to 4.362,4.875
dashwid = 0.050i
line dashed from 6.013,9.775 to 7.463,9.000
line from 7.363,9.025 to 7.463,9.000 to 7.386,9.069
line from 6.013,9.775 to 5.963,9.675
line from 5.985,9.776 to 5.963,9.675 to 6.030,9.753
line from 6.013,9.775 to 7.513,9.050
line from 7.412,9.071 to 7.513,9.050 to 7.433,9.116
line from 4.825,10.088 to 4.825,10.088
line from 7.825,8.637 to 7.825,10.300 to 4.825,10.300 to 4.825,8.637 to 7.825,8.637
.ps
.ps 20
dashwid = 0.037i
line dotted from 3.987,7.912 to 4.875,8.637
line from 4.752,8.472 to 4.875,8.637 to 4.688,8.550
line from 4.037,7.338 to 4.037,7.912 to 3.525,7.912 to 3.525,7.338 to 4.037,7.338
dashwid = 0.050i
line dashed from 1.450,9.050 to 6.537,5.938
line from 6.341,5.999 to 6.537,5.938 to 6.393,6.085
.ps
.ps 10
line from 3.750,7.562 	to 3.785,7.533\
	to 3.814,7.508\
	to 3.861,7.466\
	to 3.892,7.436\
	to 3.913,7.412\

line from 3.913,7.412 	to 3.957,7.363\
	to 3.982,7.332\
	to 4.009,7.300\
	to 4.035,7.267\
	to 4.060,7.237\
	to 4.100,7.188\

line from 4.100,7.188 	to 4.113,7.164\
	to 4.128,7.136\
	to 4.146,7.104\
	to 4.165,7.071\
	to 4.183,7.037\
	to 4.200,7.005\
	to 4.215,6.975\
	to 4.225,6.950\

line from 4.225,6.950 	to 4.239,6.921\
	to 4.255,6.885\
	to 4.271,6.844\
	to 4.287,6.800\
	to 4.302,6.754\
	to 4.316,6.711\
	to 4.328,6.671\
	to 4.338,6.638\

line from 4.338,6.638 	to 4.341,6.612\
	to 4.346,6.579\
	to 4.350,6.543\
	to 4.354,6.503\
	to 4.358,6.463\
	to 4.361,6.425\
	to 4.362,6.391\
	to 4.362,6.362\

line from 4.362,6.362 	to 4.362,6.322\
	to 4.358,6.273\
	to 4.356,6.246\
	to 4.354,6.218\
	to 4.350,6.189\
	to 4.347,6.160\
	to 4.343,6.131\
	to 4.339,6.103\
	to 4.335,6.075\
	to 4.331,6.049\
	to 4.322,6.001\
	to 4.312,5.963\

line from 4.312,5.963 	to 4.300,5.922\
	to 4.283,5.875\
	to 4.273,5.849\
	to 4.263,5.822\
	to 4.251,5.795\
	to 4.240,5.768\
	to 4.228,5.742\
	to 4.216,5.715\
	to 4.192,5.666\
	to 4.170,5.622\
	to 4.150,5.588\

line from 4.150,5.588 	to 4.118,5.545\
	to 4.077,5.496\
	to 4.054,5.470\
	to 4.030,5.443\
	to 4.005,5.416\
	to 3.979,5.389\
	to 3.953,5.363\
	to 3.927,5.337\
	to 3.901,5.312\
	to 3.876,5.288\
	to 3.828,5.246\
	to 3.788,5.213\

line from 3.788,5.213 	to 3.759,5.189\
	to 3.723,5.163\
	to 3.682,5.134\
	to 3.637,5.105\
	to 3.592,5.077\
	to 3.548,5.051\
	to 3.508,5.029\
	to 3.475,5.013\

line from 3.475,5.013 	to 3.432,4.999\
	to 3.407,4.991\
	to 3.381,4.984\
	to 3.352,4.977\
	to 3.323,4.970\
	to 3.293,4.963\
	to 3.262,4.956\
	to 3.232,4.950\
	to 3.201,4.945\
	to 3.172,4.939\
	to 3.144,4.935\
	to 3.117,4.931\
	to 3.092,4.928\
	to 3.050,4.925\

line from 3.050,4.925 	to 3.003,4.922\
	to 2.975,4.922\
	to 2.946,4.922\
	to 2.915,4.922\
	to 2.883,4.923\
	to 2.850,4.925\
	to 2.817,4.927\
	to 2.784,4.930\
	to 2.752,4.933\
	to 2.720,4.937\
	to 2.689,4.941\
	to 2.660,4.946\
	to 2.634,4.951\
	to 2.587,4.963\

line from 2.587,4.963 	to 2.548,4.978\
	to 2.501,5.001\
	to 2.477,5.014\
	to 2.451,5.028\
	to 2.425,5.043\
	to 2.399,5.059\
	to 2.373,5.074\
	to 2.348,5.090\
	to 2.301,5.121\
	to 2.259,5.150\
	to 2.225,5.175\

line from 2.225,5.175 	to 2.178,5.225\
	to 2.151,5.256\
	to 2.121,5.294\
	to 2.086,5.338\
	to 2.067,5.363\
	to 2.047,5.391\
	to 2.025,5.421\
	to 2.002,5.453\
	to 1.977,5.488\
	to 1.950,5.525\

line from 2.140,5.233 to 2.225,5.175 to 2.177,5.266
.ps
.ps 20
line from 2.688,10.300 	to 2.691,10.258\
	to 2.692,10.227\
	to 2.688,10.188\

line from 2.688,10.188 	to 2.690,10.139\
	to 2.688,10.110\
	to 2.688,10.088\

line from 2.688,10.088 	to 2.701,10.063\
	to 2.719,10.036\
	to 2.750,9.988\

line from 2.750,9.988 	to 2.748,9.962\
	to 2.744,9.932\
	to 2.743,9.901\
	to 2.750,9.875\

line from 2.750,9.875 	to 2.773,9.855\
	to 2.800,9.825\

line from 2.800,9.825 	to 2.794,9.798\
	to 2.777,9.772\
	to 2.750,9.725\

line from 2.750,9.725 	to 2.745,9.701\
	to 2.744,9.672\
	to 2.750,9.625\

line from 2.750,9.625 	to 2.748,9.599\
	to 2.748,9.569\
	to 2.750,9.538\
	to 2.750,9.512\

line from 2.750,9.512 	to 2.719,9.464\
	to 2.701,9.436\
	to 2.688,9.412\

line from 2.688,9.412 	to 2.688,9.363\

line from 2.688,9.363 	to 2.681,9.338\
	to 2.666,9.310\
	to 2.638,9.262\

line from 2.638,9.262 	to 2.639,9.231\
	to 2.638,9.200\

line from 2.638,9.200 	to 2.638,9.150\

line from 2.638,9.150 	to 2.640,9.127\
	to 2.638,9.100\

line from 2.638,9.100 	to 2.640,9.078\
	to 2.640,9.050\
	to 2.638,9.000\

line from 2.638,9.000 	to 2.642,8.973\
	to 2.638,8.950\

line from 2.638,8.950 	to 2.615,8.921\
	to 2.587,8.900\

line from 2.587,8.900 	to 2.587,8.868\
	to 2.587,8.838\

line from 2.587,8.838 	to 2.593,8.791\
	to 2.593,8.762\
	to 2.587,8.738\

line from 2.587,8.738 	to 2.563,8.713\
	to 2.538,8.688\

line from 2.538,8.688 	to 2.537,8.661\
	to 2.538,8.637\

line from 2.538,8.637 	to 2.513,8.607\
	to 2.487,8.588\

line from 2.487,8.588 	to 2.460,8.557\
	to 2.438,8.525\

line from 2.438,8.525 	to 2.407,8.506\
	to 2.388,8.475\

line from 2.388,8.475 	to 2.388,8.425\

line from 2.388,8.425 	to 2.357,8.399\
	to 2.325,8.375\

line from 2.325,8.375 	to 2.316,8.353\
	to 2.305,8.324\
	to 2.275,8.275\

line from 2.275,8.275 	to 2.227,8.243\
	to 2.198,8.232\
	to 2.175,8.225\

line from 2.175,8.225 	to 2.140,8.211\
	to 2.095,8.196\
	to 2.050,8.181\
	to 2.013,8.162\

line from 2.013,8.162 	to 1.991,8.141\
	to 1.962,8.113\

line from 1.962,8.113 	to 1.924,8.089\
	to 1.883,8.065\
	to 1.837,8.042\
	to 1.788,8.018\
	to 1.762,8.006\
	to 1.735,7.994\
	to 1.706,7.981\
	to 1.676,7.969\
	to 1.645,7.956\
	to 1.613,7.943\
	to 1.579,7.930\
	to 1.543,7.916\
	to 1.506,7.903\
	to 1.467,7.889\
	to 1.427,7.874\
	to 1.385,7.860\
	to 1.341,7.845\
	to 1.295,7.829\
	to 1.248,7.814\
	to 1.198,7.797\
	to 1.173,7.789\
	to 1.147,7.781\
	to 1.120,7.772\
	to 1.093,7.764\
	to 1.065,7.755\
	to 1.037,7.746\
	to 1.009,7.737\
	to 0.979,7.728\
	to 0.950,7.719\
	to 0.919,7.709\
	to 0.888,7.700\
	to 0.857,7.690\
	to 0.825,7.680\
	to 0.792,7.670\
	to 0.759,7.660\
	to 0.725,7.650\

line from 2.438,10.238 	to 2.455,10.201\
	to 2.468,10.174\
	to 2.487,10.137\

line from 2.487,10.137 	to 2.497,10.115\
	to 2.510,10.087\
	to 2.538,10.037\

line from 2.538,10.037 	to 2.587,9.988\

line from 2.587,9.988 	to 2.602,9.963\
	to 2.618,9.933\
	to 2.631,9.902\
	to 2.638,9.875\

line from 2.638,9.875 	to 2.642,9.827\
	to 2.640,9.798\
	to 2.638,9.775\

line from 2.638,9.775 	to 2.640,9.750\
	to 2.638,9.725\

line from 2.638,9.725 	to 2.640,9.690\
	to 2.642,9.646\
	to 2.641,9.600\
	to 2.638,9.562\

line from 2.638,9.562 	to 2.630,9.520\
	to 2.623,9.493\
	to 2.615,9.464\
	to 2.606,9.435\
	to 2.598,9.408\
	to 2.587,9.363\

line from 2.587,9.363 	to 2.587,9.339\
	to 2.588,9.310\
	to 2.587,9.262\

line from 2.587,9.262 	to 2.589,9.231\
	to 2.587,9.200\

line from 2.587,9.200 	to 2.590,9.153\
	to 2.590,9.124\
	to 2.587,9.100\

line from 2.587,9.100 	to 2.579,9.066\
	to 2.563,9.024\
	to 2.547,8.983\
	to 2.538,8.950\

line from 2.538,8.950 	to 2.536,8.925\
	to 2.537,8.894\
	to 2.539,8.863\
	to 2.538,8.838\

line from 2.538,8.838 	to 2.529,8.805\
	to 2.516,8.763\
	to 2.501,8.720\
	to 2.487,8.688\

line from 2.487,8.688 	to 2.466,8.651\
	to 2.438,8.606\
	to 2.409,8.561\
	to 2.388,8.525\

line from 2.388,8.525 	to 2.355,8.506\
	to 2.325,8.475\

line from 2.325,8.475 	to 2.325,8.425\

line from 2.325,8.425 	to 2.278,8.396\
	to 2.248,8.386\
	to 2.225,8.375\

line from 2.225,8.375 	to 2.200,8.350\
	to 2.175,8.325\

line from 2.175,8.325 	to 2.131,8.296\
	to 2.104,8.281\
	to 2.074,8.266\
	to 2.044,8.251\
	to 2.014,8.239\
	to 1.987,8.230\
	to 1.962,8.225\

line from 1.962,8.225 	to 1.938,8.219\
	to 1.907,8.216\
	to 1.873,8.215\
	to 1.836,8.216\
	to 1.800,8.218\
	to 1.766,8.220\
	to 1.736,8.223\
	to 1.712,8.225\

line from 1.712,8.225 	to 1.685,8.223\
	to 1.653,8.220\
	to 1.617,8.218\
	to 1.579,8.216\
	to 1.542,8.215\
	to 1.507,8.216\
	to 1.476,8.219\
	to 1.450,8.225\

line from 1.450,8.225 	to 1.427,8.231\
	to 1.400,8.240\
	to 1.371,8.254\
	to 1.341,8.269\
	to 1.312,8.284\
	to 1.284,8.300\
	to 1.238,8.325\

line from 1.238,8.325 	to 1.188,8.348\
	to 1.159,8.362\
	to 1.137,8.375\

line from 1.137,8.375 	to 1.113,8.393\
	to 1.085,8.415\
	to 1.054,8.440\
	to 1.023,8.467\
	to 0.992,8.497\
	to 0.965,8.527\
	to 0.942,8.558\
	to 0.925,8.588\

line from 0.925,8.588 	to 0.922,8.620\
	to 0.927,8.662\
	to 0.930,8.703\
	to 0.925,8.738\

line from 0.925,8.738 	to 0.915,8.763\
	to 0.899,8.790\
	to 0.879,8.819\
	to 0.856,8.848\
	to 0.832,8.877\
	to 0.810,8.904\
	to 0.775,8.950\

line from 0.775,8.950 	to 0.749,8.980\
	to 0.718,9.020\
	to 0.690,9.063\
	to 0.675,9.100\

line from 0.675,9.100 	to 0.669,9.124\
	to 0.669,9.153\
	to 0.675,9.200\

line from 0.675,9.200 	to 0.664,9.250\
	to 0.656,9.279\
	to 0.648,9.309\
	to 0.639,9.339\
	to 0.630,9.368\
	to 0.613,9.412\

line from 0.613,9.412 	to 0.600,9.449\
	to 0.585,9.476\
	to 0.562,9.512\

line from 2.175,9.875 	to 2.219,9.849\
	to 2.257,9.827\
	to 2.289,9.806\
	to 2.316,9.787\
	to 2.357,9.755\
	to 2.388,9.725\

line from 2.388,9.725 	to 2.402,9.691\
	to 2.416,9.647\
	to 2.428,9.602\
	to 2.438,9.562\

line from 2.438,9.562 	to 2.436,9.515\
	to 2.436,9.486\
	to 2.438,9.463\

line from 2.438,9.463 	to 2.435,9.438\
	to 2.438,9.412\

line from 2.438,9.412 	to 2.435,9.390\
	to 2.435,9.361\
	to 2.438,9.312\

line from 2.438,9.312 	to 2.435,9.284\
	to 2.438,9.262\

line from 2.438,9.262 	to 2.435,9.237\
	to 2.435,9.206\
	to 2.435,9.175\
	to 2.438,9.150\

line from 2.438,9.150 	to 2.437,9.102\
	to 2.439,9.073\
	to 2.438,9.050\

line from 2.438,9.050 	to 2.416,9.012\
	to 2.384,8.970\
	to 2.350,8.931\
	to 2.325,8.900\

line from 2.325,8.900 	to 2.315,8.863\
	to 2.303,8.817\
	to 2.289,8.772\
	to 2.275,8.738\

line from 2.275,8.738 	to 2.256,8.702\
	to 2.229,8.659\
	to 2.200,8.618\
	to 2.175,8.588\

line from 2.175,8.588 	to 2.153,8.560\
	to 2.125,8.529\
	to 2.097,8.499\
	to 2.075,8.475\

line from 2.075,8.475 	to 2.051,8.464\
	to 2.021,8.449\
	to 1.990,8.435\
	to 1.962,8.425\

line from 1.962,8.425 	to 1.919,8.420\
	to 1.891,8.420\
	to 1.862,8.420\
	to 1.834,8.420\
	to 1.806,8.421\
	to 1.762,8.425\

line from 1.762,8.425 	to 1.725,8.436\
	to 1.681,8.452\
	to 1.637,8.468\
	to 1.600,8.475\

line from 1.600,8.475 	to 1.568,8.478\
	to 1.526,8.477\
	to 1.484,8.474\
	to 1.450,8.475\

line from 1.450,8.475 	to 1.408,8.490\
	to 1.360,8.511\
	to 1.335,8.524\
	to 1.309,8.537\
	to 1.283,8.552\
	to 1.257,8.567\
	to 1.232,8.582\
	to 1.207,8.598\
	to 1.160,8.629\
	to 1.119,8.660\
	to 1.087,8.688\

line from 1.087,8.688 	to 1.058,8.719\
	to 1.027,8.761\
	to 0.998,8.804\
	to 0.975,8.838\

line from 0.975,8.838 	to 0.965,8.874\
	to 0.952,8.919\
	to 0.938,8.964\
	to 0.925,9.000\

line from 0.925,9.000 	to 0.915,9.033\
	to 0.901,9.075\
	to 0.886,9.117\
	to 0.875,9.150\

line from 0.875,9.150 	to 0.869,9.182\
	to 0.864,9.220\
	to 0.859,9.263\
	to 0.853,9.308\
	to 0.847,9.353\
	to 0.840,9.395\
	to 0.833,9.433\
	to 0.825,9.463\

line from 0.825,9.463 	to 0.804,9.510\
	to 0.789,9.538\
	to 0.772,9.567\
	to 0.756,9.597\
	to 0.742,9.625\
	to 0.725,9.675\

line from 0.725,9.675 	to 0.725,9.725\

.ft
.ps 10
.ft R
"a very slow approximation" at 4.362,4.893 ljust
"gradient" at 5.388,9.418 ljust
"Sold" at 7.050,9.418 ljust
.ps
.ft
.PE

.nr PS 12
.nr VS 15
.ps 12
.vs 15
The hills on the left side are very steep, so the search direction is
a large vector. Then when one arrives at the landscape of the quadratic
function, the gradient becomes much smaller, but still big enough to do harm.
The conjugate gradient method requires that a portion of the
old search direction be conserved, and in a landscape like this that portion
is very big in comparison with the local gradient. The quadratic region
is then traversed in a very slow spiral.
This effect can be observed in neural networks. At the start of the algorithm
everything works well. A considerable diminishing of the error takes place.
But when the error approaches zero, it takes a lot of iterations to reduce it
by a further factor of two - so it is very time consuming. Powell
.[ [
Powell
.]]
points at the difference between two expressions:
.sp
.ce
$ beta sub k $ = ${norm( {bold g sub k} , {bold g sub k})} over {norm( {bold g sub {k-1}}, {bold g sub {k-1}})}~~ $  $ beta sub k $ = ${norm( {bold g sub k} , {bold g sub k} - {bold g sub {k-1})}} over {norm( {bold g sub {k-1}}, {bold g sub {k-1}} )} $ 
.sp
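.LP
Written out as code (a sketch with illustrative names only; the first
expression is the one used so far, usually called the Fletcher-Reeves formula,
the second is the Polak-Ribiere formula that Powell points at) the two
expressions are:
.nf
/* beta_k = (g_k,g_k) / (g_{k-1},g_{k-1}) */
double beta_fletcher_reeves(const double *g, const double *g_old, int n)
{
    double num = 0.0, den = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        num += g[i] * g[i];
        den += g_old[i] * g_old[i];
    }
    return num / den;
}

/* beta_k = (g_k, g_k - g_{k-1}) / (g_{k-1},g_{k-1}) */
double beta_polak_ribiere(const double *g, const double *g_old, int n)
{
    double num = 0.0, den = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        num += g[i] * (g[i] - g_old[i]);
        den += g_old[i] * g_old[i];
    }
    return num / den;
}
.fi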
\fRIf the function is quadratic there is no difference between the two
expressions. But when the search direction is formed in a part
of the function which is not quadratic at all, it makes a lot of difference.
To understand this, again a geometric representation can help.
.sp
.PS
scale = 2
.ps
.ps 20
line from 1.175,7.200 to 3.175,7.950
line from 3.005,7.833 to 3.175,7.950 to 2.970,7.927
line from 2.300,8.637 to 3.237,7.950
line from 3.047,8.028 to 3.237,7.950 to 3.106,8.109
line from 1.175,7.200 to 2.300,8.637
line from 2.216,8.449 to 2.300,8.637 to 2.137,8.511
.ps
.ps 12
.ft
.ft R
"$ space 0 bold d sub k $" at 2.237,7.421 ljust
"$ space 0 {beta sub k} {bold d sub {k-1}} $" at 2.737,8.796 ljust
"$ space 0 bold -g sub k $" at 0.925,7.983 ljust
.ps
.ft
.PE
.nr PS 12
.nr VS 15
.ps 12
.vs 15
.ft R
.sp
\fRThe angle between $ bold -g sub k $ \fRand $ bold d sub k $\fR is called $ theta sub k$ \fRand satisfies:
.sp
.ce
norm($ bold d sub k $\fR) = sec $theta sub k$ norm($ bold g sub k $\fR)
.sp
and if we replace k by k+1 and eliminate norm($ bold d sub k $\fR) then:
.sp
.ce
$ tan theta sub {k+1}$ = sec$theta sub k$ ${norm({bold g sub {k+1}})} over {norm({bold g sub k})} $
.sp
\fRand therefore:
.sp
.sp
.ce
$ tan theta sub {k+1}$ > tan$theta sub k$ ${norm({bold g sub {k+1}})} over {norm({bold g sub k})} $
.sp
\fRNow we are able to give some idea of how the very slow spiraling comes about:
if $theta sub k$ is close to ${1 over 2} pi$ then the iteration takes only a small step,
so $ bold g sub {k+1}$ \fRdiffers very little from $ bold g sub k $\fR.
Therefore the ratio between the norms is close to one, while sec$theta sub k$ is very large.
This results in a $ theta sub {k+1} $ \fRthat is again close to ${1 over 2} pi$: and there is the
slow approximation of the spiral.
.br
The other formula however can save the day:
.sp
.ce
$ tan theta sub {k+1}$ < sec$theta sub k$ ${{norm({bold g sub {k+1}} - { bold g sub k)}} over {norm({bold g sub k})}} $
.sp
The problem is, however, that it does not work all the time; Powell gives an
example of this.
Although it is a matter of experience that restarting every N iterations does
help, this is a meager result for all the trouble. Such a restart often has
the disadvantage that the new search direction along the gradient is no
better than the old one - and this is understandable. So one looks for a
direction better than the gradient and also better than the conjugate gradient.
Beale 
.[ [
Beale
.]]
gives arguments to take a multiple of the restarting direction $ bold d sub t$ \fRand to add this
to our conjugate gradient direction. If for the moment we take $ bold d sub t$ \fRto be an arbitrary
direction and the function F to be quadratic, is it then possible to make $ bold d sub t $, $ bold d sub {t+1} $, $ bold d sub {t+2} $, $ bold d sub {t+3} $,.... mutually conjugate? We
have:
.sp
.ce
$ bold d sub k $ \fR =  -$ bold g  sub k $ \fR + $ beta sub k $ $ bold d  sub {k-1}$ + $ gamma sub k $ $bold d sub  t$.
.sp
\fRThe factors $beta$ and $gamma$ \fR look like the old $beta$, but now the
restart direction is involved in the $gamma$.
.sp
.ce
$ beta sub k $ = ${norm( {bold g sub k} , {bold g sub k} - {bold g sub {k-1})}} over {norm( {bold d sub {k-1}}, {bold g sub k} - {bold g sub {k-1}} )} $ 
.sp
.ce
$ gamma sub k $ = ${norm( {bold g sub k} , {bold g sub {t+1}} - {bold g sub t})} over {norm( {bold d sub t}, {bold g sub {t+1}} - {bold g sub t} )} $ 
.sp
But the augmentation in $bold d sub k$ \fR can easily lead uphill instead of
downhill. One way to avoid this situation is to demand that:
.sp
.ce
norm($ bold d sub k $, $bold g sub k$) < 0 \fRfor k > t.
.sp
\fRAnd this is just what the algorithm of Powell does! It checks the inner product of successive gradients, which enters the $beta$ \fRof
the conjugate gradient method, and demands that:
${norm( {bold g sub {k-1}} , {bold g sub k})} ~~~<=~~~ 0.2 {norm( {bold g sub k}, {bold g sub k})} $
.sp
Furthermore the search direction has to be sufficiently downhill:
.sp
.ce
-1.2 ${norm( {bold g sub k} , {bold g sub k})} ~~~<=~~~ norm( bold d sub k $, $bold g sub k$) $~~~<=~~~ -0.8 {norm( {bold g sub k}, {bold g sub k})} $
.sp
\fRIn the above the method of line search is not mentioned. But there too
it is wise to set some conditions:
.sp
.ce
${norm( {bold g sub k} , {bold g sub {k+1}})} ~~~<~~~ 0.1 norm( bold d sub k $, $bold g sub k$) 
.sp
\fRto ensure that one really has a decrease in the directional derivative. To require that the angle $theta$ between $bold g sub {k+1} $ and $bold d sub k$ is
nearly ${1 over 2}pi $ also makes sense according to Powell.
.sp
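.LP
The first two of these conditions amount to only a few lines of code. The
sketch below is again only an illustration (the restart test is written with
an absolute value, as it is usually stated); the names are hypothetical.
.nf
#include <math.h>

/* Inner product of two vectors of length n. */
static double inner(const double *a, const double *b, int n)
{
    double s = 0.0;
    int i;
    for (i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

/* Restart when successive gradients are no longer sufficiently orthogonal. */
int need_restart(const double *g, const double *g_old, int n)
{
    return fabs(inner(g_old, g, n)) > 0.2 * inner(g, g, n);
}

/* Accept d_k only if -1.2 (g_k,g_k) <= (d_k,g_k) <= -0.8 (g_k,g_k). */
int sufficiently_downhill(const double *d, const double *g, int n)
{
    double dg = inner(d, g, n);
    double gg = inner(g, g, n);
    return dg >= -1.2 * gg && dg <= -0.8 * gg;
}
.fi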
To compare the increase in efficiency we take the well-known problem of the
XOR. We ask for a total error of 0.000001. The implementation for the
normal backpropagation method had 2 inputs, 1 hidden layer with 2 hidden units and
1 output. The unit function was:
.br
.sp
.ce
f(x) = $ 1 over { 1 + e sup {-x + bias}}$.
.sp
.sp
In the implementation which used the method of Powell the unit function had
a bias of zero. Therefore a hidden layer of three units was needed. The results
are given below.
.sp
.sp
.TS
center allbox tab(;);
c s
c s
c c
c c.
error 0.000001
number of iterations
Powell;backprop.
46;> 320,000
.TE
.sp
.nr PS 12
.nr VS 15
.ps 12
.vs 15
.ft R
Of course the comparison is not entirely fair: it only shows that the method
normally used in backpropagation can be very slow. It does, however, make the
difference between success and failure in the case of this research. There are some
154 inputs of binary value, one layer of 9 hidden units, and 16 classes.
A network with the normal backpropagation method could learn some 50 different
cases, but it took about 24 hours to learn 100 cases, and the whole set of
training cases could not be mastered at all.