simonof@aplcen.apl.jhu.edu (Simonoff Robert 301 540 1864) (12/21/90)
Netters:

A question on the dynamic range of nodes in a backpropagation network.
The answer should be obvious, but I cannot for the life of me find the
solution.

Below are two code fragments from a backpropagation network I have
written.  The first fragment (above the dotted line) works perfectly
for neurons having a dynamic range of [0.0, 1.0].  I decided to
rewrite the code so as to allow networks to have a range of
[-1.0, 1.0] (the second code fragment).  I am under the impression
that the activation function must be changed, as well as the
computation of delta, which uses the derivative of the activation
function.  I have chosen as my new activation function the hyperbolic
tangent function, whose output range is [-1.0, 1.0].  The derivative
of this function is:

   tanh'(X) == sech(X)**2 == 1/cosh(X)**2

If anyone can discern what is wrong with the second code fragment, I
would appreciate the help.  If I am forgetting to make other changes
(I have already made the administrative changes, such as the input
and output value ranges), please notify me.

The symptom is that the weights connecting the input layer to the
hidden layer grow rapidly to large numbers (both positive and
negative), but the network never converges to an answer.  The weights
just grow, never changing sign -- if they start positive, they grow
to be ever larger positive numbers.

The following code is taken from a BP program I have written that
works.  If I substitute this code for the code that does not appear
to work (changing the -1.0 inputs to 0.0, and the outputs the same
way), everything works fine.  But when the code below the dotted line
is used, the network never converges.

/*
   w1[node1][node2] = weight from node2 in the input layer to node1
                      in the hidden layer
   w2[node1][node2] = weight from node2 in the hidden layer to node1
                      in the output layer
   input_vector[pattern][node] = input node output value for pattern
   out1[pattern][node]   = hidden node output value for pattern
   out2[pattern][node]   = output node output value for pattern
   target[pattern][node] = target output value for pattern
   delta1[pattern][node] = delta for hidden node, pattern
   delta2[pattern][node] = delta for output node, pattern
*/

int compute_outputs(int pattern, int player)
{
   int i, j;
   double netinput;

   for (j = 1; j <= nh; j++) {
      netinput = w1[j][nip1];                     /* bias weight */
      for (i = 1; i <= ni; i++)
         netinput += w1[j][i] * input_vector[pattern][i];
      out1[pattern][j] = 1.0 / (1.0 + exp(-netinput));
   } /* endfor */

   for (j = 1; j <= no; j++) {
      netinput = w2[j][nhp1];                     /* bias weight */
      for (i = 1; i <= nh; i++)
         netinput += w2[j][i] * out1[pattern][i];
      out2[pattern][j] = 1.0 / (1.0 + exp(-netinput));
   } /* endfor */
}

int compute_delta(int pattern, int winner)
{
   int i, m;
   double sum;

   for (i = 1; i <= no; i++)
      delta2[pattern][i] = (target[pattern][i] - out2[pattern][i]) *
                           out2[pattern][i] * (1.0 - out2[pattern][i]);

   for (i = 1; i <= nh; i++) {
      sum = 0.0;
      for (m = 1; m <= no; m++)
         sum += delta2[pattern][m] * w2[m][i];
      delta1[pattern][i] = sum * out1[pattern][i] * (1.0 - out1[pattern][i]);
   } /* endfor */
}

-----------------------------------------------------------

The following are the routines that I believe should change as a
result of the new dynamic range for the neurons, [-1.0, 1.0].  There
are also administrative changes that include the input values and
output values.
int compute_outputs(int pattern)
{
   int j, i;
   double netinput;

   for (j = 1; j <= nh; j++) {
      netinput = w1[j][nip1];
      for (i = 1; i <= ni; i++)
         netinput += w1[j][i] * input_vector[pattern][i];
      out1[pattern][j] = tanh(netinput);
   } /* endfor */

   for (j = 1; j <= no; j++) {
      netinput = w2[j][nhp1];
      for (i = 1; i <= nh; i++)
         netinput += w2[j][i] * out1[pattern][i];
      out2[pattern][j] = tanh(netinput);
   } /* endfor */
}

int compute_delta(int pattern)
{
   int i, m;
   double sum;

   for (i = 1; i <= no; i++)
      delta2[pattern][i] = (target[pattern][i] - out2[pattern][i]) *
                           1.0/(cosh(out2[pattern][i])*
                                cosh(out2[pattern][i]));

   for (i = 1; i <= nh; i++) {
      sum = 0.0;
      for (m = 1; m <= no; m++)
         sum += delta2[pattern][m] * w2[m][i];
      delta1[pattern][i] = sum * 1.0/(cosh(out1[pattern][i])*
                                      cosh(out1[pattern][i]));
   } /* endfor */
}

------------------------------------------------------

Thanks.

Bob Simonoff
simonof@aplcen.apl.edu
--
***********************************************************
Bob Simonoff          simonof@aplcen
Johns Hopkins University
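Neither fragment shows the weight-update step that consumes delta1 and
delta2.  A conventional delta-rule update for this network would look
roughly like the sketch below; it is only an assumption about the
surrounding program, and the learning-rate variable eta is an invented
name that does not appear in the original code.

/* Hypothetical sketch of the weight update both fragments feed into.
   Assumes the globals from the original program (w1, w2, delta1,
   delta2, input_vector, out1, ni, nh, no, nip1, nhp1); eta is an
   assumed learning-rate variable, not from the original code. */
void update_weights(int pattern)
{
   int i, j;

   for (j = 1; j <= nh; j++) {
      w1[j][nip1] += eta * delta1[pattern][j];         /* bias weight */
      for (i = 1; i <= ni; i++)
         w1[j][i] += eta * delta1[pattern][j] * input_vector[pattern][i];
   }
   for (j = 1; j <= no; j++) {
      w2[j][nhp1] += eta * delta2[pattern][j];         /* bias weight */
      for (i = 1; i <= nh; i++)
         w2[j][i] += eta * delta2[pattern][j] * out1[pattern][i];
   }
}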
markh@csd4.csd.uwm.edu (Mark William Hopkins) (12/22/90)
In article <1990Dec21.010536.17034@aplcen.apl.jhu.edu> simonof@aplcen.apl.edu (Simonoff Robert 301 540 1864) writes:
>I have chosen as my new activation function the hyperbolic
>tangent function, whose output range is [-1.0, 1.0].  The
>derivative of this function is:
>
>   tanh'(X) == sech(X)**2 == 1/cosh(X)**2

... = 1 - (tanh(x))^2. ...

(A code fragment was presented with the question "what's wrong with it?")

Thus, in:

>int compute_delta(int pattern)
...
>   delta2[pattern][i] = (target[pattern][i] - out2[pattern][i]) *
>                        1.0/(cosh(out2[pattern][i])*
>                             cosh(out2[pattern][i]));

should be

   delta2[pattern][i] = (target[pattern][i] - out2[pattern][i]) *
                        (1 - out2[pattern][i]*out2[pattern][i]);

and

>   delta1[pattern][i] = sum * 1.0/(cosh(out1[pattern][i])*
>                                   cosh(out1[pattern][i]));

should be

   delta1[pattern][i] = sum * (1 - out1[pattern][i]*out1[pattern][i]);
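For reference, here is that correction folded back into the full
routine from the original post.  This is a sketch assembled from the
two posts, reusing Bob's array names and loop bounds; the point is
that, because out = tanh(netinput), the derivative 1 - out*out needs
only the outputs that are already stored.

/* Sketch: compute_delta with the corrected tanh derivative.
   Since out = tanh(netinput), tanh'(netinput) = 1 - out*out,
   so the stored outputs are all that is needed.  Assumes the
   globals declared in the original program. */
int compute_delta(int pattern)
{
   int i, m;
   double sum;

   for (i = 1; i <= no; i++)
      delta2[pattern][i] = (target[pattern][i] - out2[pattern][i]) *
                           (1.0 - out2[pattern][i] * out2[pattern][i]);

   for (i = 1; i <= nh; i++) {
      sum = 0.0;
      for (m = 1; m <= no; m++)
         sum += delta2[pattern][m] * w2[m][i];
      delta1[pattern][i] = sum * (1.0 - out1[pattern][i] * out1[pattern][i]);
   } /* endfor */
}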
simonof@aplcen.apl.jhu.edu (Simonoff Robert 301 540 1864) (12/22/90)
In article <8513@uwm.edu> markh@csd4.csd.uwm.edu (Mark William Hopkins) writes:
>In article <1990Dec21.010536.17034@aplcen.apl.jhu.edu> simonof@aplcen.apl.edu (Simonoff Robert 301 540 1864) writes:
>>I have chosen as my new activation function the hyperbolic
>>tangent function, whose output range is [-1.0, 1.0].  The
>>derivative of this function is:
>>
>>   tanh'(X) == sech(X)**2 == 1/cosh(X)**2
>
>... = 1 - (tanh(x))^2. ...
>
>(A code fragment was presented with the question "what's wrong with it?")
>
>Thus, in:
>
>>int compute_delta(int pattern)
>...
>>   delta2[pattern][i] = (target[pattern][i] - out2[pattern][i]) *
>>                        1.0/(cosh(out2[pattern][i])*
>>                             cosh(out2[pattern][i]));
>
>should be
>
>   delta2[pattern][i] = (target[pattern][i] - out2[pattern][i]) *
>                        (1 - out2[pattern][i]*out2[pattern][i]);
>
>and
>
>>   delta1[pattern][i] = sum * 1.0/(cosh(out1[pattern][i])*
>>                                   cosh(out1[pattern][i]));
>
>should be
>
>   delta1[pattern][i] = sum * (1 - out1[pattern][i]*out1[pattern][i]);

Why is delta1[pattern][i] = sum * (1 - out1[pattern][i]*out1[pattern][i]) ?

My activation function is tanh(netinput), and I believe the derivative
of the hyperbolic tangent is:

   tanh'(x) = 1/sech(x)**2 = 1/cosh(x)**2 = 2/(e**x + e**(-x))

Maybe I am not seeing the algebra that makes:

   2/(e**x + e**(-x)) = 1/((1 + e**(-x)) * (1 + e**(-x))) - 1

Bob Simonoff
simonof@aplcen.apl.edu
--
***********************************************************
Bob Simonoff          simonof@aplcen
Johns Hopkins University
jon@calsci (Parallax & Red Shift) (12/28/90)
In article <1990Dec22.042610.23800@aplcen.apl.jhu.edu>, simonof@aplcen (Simonoff Robert 301 540 1864) writes:
>In article <8513@uwm.edu> markh@csd4.csd.uwm.edu (Mark William Hopkins) writes:
>>In article <1990Dec21.010536.17034@aplcen.apl.jhu.edu> simonof@aplcen.apl.edu (Simonoff Robert 301 540 1864) writes:
>>>I have chosen as my new activation function the hyperbolic
>>>tangent function, whose output range is [-1.0, 1.0].  The
>>>derivative of this function is:
>>>
>>>   tanh'(X) == sech(X)**2 == 1/cosh(X)**2
>>
>>... = 1 - (tanh(x))^2. ...
>>
>>(A code fragment was presented with the question "what's wrong with it?")
>>
[some stuff deleted to save bandwidth]
>>and
>>
>>>   delta1[pattern][i] = sum * 1.0/(cosh(out1[pattern][i])*
>>>                                   cosh(out1[pattern][i]));
>>
>>should be
>>
>>   delta1[pattern][i] = sum * (1 - out1[pattern][i]*out1[pattern][i]);
>
>Why is delta1[pattern][i] = sum * (1 - out1[pattern][i]*out1[pattern][i]) ?
>
>My activation function is tanh(netinput), and I believe the derivative
>of the hyperbolic tangent is:
>
>   tanh'(x) = 1/sech(x)**2 = 1/cosh(x)**2 = 2/(e**x + e**(-x))
>
>Maybe I am not seeing the algebra that makes:
>
>   2/(e**x + e**(-x)) = 1/((1 + e**(-x)) * (1 + e**(-x))) - 1
>
>Bob Simonoff
>simonof@aplcen.apl.edu

Bob, look again at the equation for tanh'(x) you wrote, above.

First off, tanh'(x) doesn't equal 1/sech(x)**2, but rather
tanh'(x) = sech(x)**2.  (This was *probably* just a typo, as you give
the correct equation at the top of your original posting.)

Continuing on to the 2nd '=' in your tanh'(x) equation: tanh'(x) is,
in fact, equal to 1/cosh(x)**2, as you have noted, but you blow it on
the 3rd equals sign.
1/cosh(x)**2 is NOT equal to 2/(e**x + e**(-x)), but rather is equal to
TWICE that quantity:  1/cosh(x)**2 = 4 / (e**x + e**(-x)).

Similarly, I don't know where you got the r.h.s. of the next equation.

Mark Hopkins suggested that

   delta1[pattern][i] = sum * (1 - out1[pattern][i]*out1[pattern][i]);

This is, in fact, correct.  But assuming a transfer function of
tanh(x), it doesn't equal what you wrote; i.e.,

   1 - tanh(x)**2  does NOT equal  1/((1 + e**(-x)) * (1 + e**(-x))) - 1

In fact,

                         e**(2x) + e**(-2x) - 2
   1 - tanh(x)**2 = 1 - ------------------------ = 1/cosh(x)**2 = sech(x)**2
                         e**(2x) + e**(-2x) + 2

which is the correct value for tanh'(x), as noted above.

On the other hand, this appears to be equivalent to your actual code
fragment.  I assume Mark suggested the alternate form for reasons of
computational efficiency (so you don't waste time computing the
additional coshines, but rather use the outputs which you already have
lying around).  But the code you wrote SHOULD work (albeit slower than
necessary).

So, I would suggest that you have a different problem.  (Unless the
additional cosh(out1[pattern][i]) calculation loses more precision
than your algorithm can tolerate, in which case switching to Mark's
formulation will fix that problem, as well as improving the speed!)

BTW, I wrote the engine for the commercially available back-prop-based
neural-net software package called BrainMaker(tm).  (Perhaps you've
heard of it?)  So I've done more than my share of this kind of coding.
(I.e., I'm not just talking through my hat... :-)

Good luck,
   Jon

---
Jon "J.D." "Parallax" Hartzberg, GSXR pilot        '86 GSXR1100 "Red Shift"
Cal. Sci. Software, Grass Valley CA     DoD #0220  '80 CB750F "Ol' Flexible"
jon@calsci.gvgpsa.gvg.tek.com OR ...!calsci!jon    '81 XL185S "SquirtintheDirt"
"When you stop falling down, you stop learning."   -Kenny Roberts
"When I found out what fairings cost, I decided I'd learned enough!"   -me
Disclaimer: If my boss knew I was doing this he'd kill me.
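The identity the thread keeps circling, tanh'(x) = sech(x)**2 =
1 - tanh(x)**2, is easy to spot-check numerically.  The standalone
program below is not from any of the posts; it simply evaluates both
forms at a few points and prints them side by side (they agree to
machine precision).

#include <stdio.h>
#include <math.h>

/* Spot-check (not from the posts): tanh'(x) = sech(x)^2 = 1 - tanh(x)^2.
   The two columns printed below should be identical. */
int main(void)
{
   int k;

   for (k = -4; k <= 4; k++) {
      double x = 0.5 * k;                           /* sample points   */
      double via_tanh = 1.0 - tanh(x) * tanh(x);    /* Mark's form     */
      double via_cosh = 1.0 / (cosh(x) * cosh(x));  /* sech(x)**2 form */
      printf("x = %5.2f   1 - tanh^2 = %.15f   1/cosh^2 = %.15f\n",
             x, via_tanh, via_cosh);
   }
   return 0;
}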
jon@calsci (Parallax & Red Shift) (12/28/90)
In article <0022@calsci>, calsci!jon@gvgpsa.gvg.tek.com (Parallax & Red Shift) writes:
>1/cosh(x)**2 is NOT equal to 2/(e**x + e**(-x)), but rather is equal to
>TWICE that quantity:  1/cosh(x)**2 = 4 / (e**x + e**(-x)).

Oops!  Now *I'm* making typos!  I meant, of course:

1/cosh(x)**2 is NOT equal to 2/(e**x + e**(-x)), but rather is equal to
the SQUARE of that quantity:

   1/cosh(x)**2 = 4 / (e**x + e**(-x))**2
                = 4 / (e**(2x) + e**(-2x) + 2)

I actually used the correct value later on in my post, so everything
else should still be cool.

---
Jon "J.D." "Parallax" Hartzberg, GSXR pilot        '86 GSXR1100 "Red Shift"
Cal. Sci. Software, Grass Valley CA     DoD #0220  '80 CB750F "Ol' Flexible"
jon@calsci.gvgpsa.gvg.tek.com OR ...!calsci!jon    '81 XL185S "SquirtintheDirt"
"When you stop falling down, you stop learning."   -Kenny Roberts
"When I found out what fairings cost, I decided I'd learned enough!"   -me
Disclaimer: If my boss knew I was doing this he'd kill me.
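As a final check on the corrected equation, the short program below
(again not from the posts) compares 1/cosh(x)**2 with
4/(e**x + e**(-x))**2 and with the erroneous 2/(e**x + e**(-x)) at an
arbitrary point; the first two agree and the third does not.

#include <stdio.h>
#include <math.h>

/* Check of the corrected identity above (not from the posts):
   1/cosh(x)^2 is the SQUARE of 2/(e^x + e^-x), i.e. 4/(e^x + e^-x)^2,
   not twice it. */
int main(void)
{
   double x = 0.7;                       /* arbitrary test point    */
   double s = exp(x) + exp(-x);          /* e^x + e^-x = 2*cosh(x)  */

   printf("1/cosh(x)**2          = %.15f\n", 1.0 / (cosh(x) * cosh(x)));
   printf("4/(e**x + e**(-x))**2 = %.15f\n", 4.0 / (s * s));  /* agrees  */
   printf("2/(e**x + e**(-x))    = %.15f\n", 2.0 / s);        /* differs */
   return 0;
}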