simonof@aplcen.apl.jhu.edu (Simonoff Robert 301 540 1864) (12/21/90)
Netters:

A question on the dynamic range of nodes in a backpropagation network.
The answer should be obvious, but I cannot for the life of me find the
solution.

Below are two code fragments from a backpropagation network I have
written.  The first fragment (above the dotted line) works perfectly
for neurons having a dynamic range of [0.0, 1.0].  I decided to
rewrite the code so as to allow networks to have a range of
[-1.0, 1.0] (the second code fragment).  I am under the impression
that the activation function must be changed, as well as the
computation of delta, which uses the derivative of the activation
function.  I have chosen as my new activation function the hyperbolic
tangent function, whose output range is [-1.0, 1.0].  The derivative
of this function is:

   tanh'(X) == sech(X)**2 == 1/cosh(X)**2

If anyone can discern what is wrong with the second code fragment, I
would appreciate the help.  If I am forgetting to make other changes
(I have already made the administrative changes, such as the input
and output value ranges), please notify me.

The symptom is that the weights connecting the input layer to the
hidden layer grow rapidly to large numbers (both positive and
negative), but the network never converges to an answer.  The weights
just grow, never changing sign -- if they start positive, they grow
to be ever larger positive numbers.

The following code is taken from a BP program I have written that
works.  If I substitute this code for the code that does not appear
to work (changing the -1.0 inputs to 0.0, and the outputs the same
way), everything works fine.  But when the code below the dotted line
is used, the network never converges.

/*
   w1[node1][node2] = weight from node2 in the input layer to node1
                      in the hidden layer
   w2[node1][node2] = weight from node2 in the hidden layer to node1
                      in the output layer
   input_vector[pattern][node] = input node output value for pattern
   out1[pattern][node]   = hidden node output value for pattern
   out2[pattern][node]   = output node output value for pattern
   target[pattern][node] = target output value for pattern
   delta1[pattern][node] = delta for hidden node, pattern
   delta2[pattern][node] = delta for output node, pattern
*/

int compute_outputs(int pattern, int player)
{
   int i, j;
   double netinput;

   for (j = 1; j <= nh; j++) {
      netinput = w1[j][nip1];                     /* bias weight */
      for (i = 1; i <= ni; i++)
         netinput += w1[j][i] * input_vector[pattern][i];
      out1[pattern][j] = 1.0 / (1.0 + exp(-netinput));
   } /* endfor */

   for (j = 1; j <= no; j++) {
      netinput = w2[j][nhp1];                     /* bias weight */
      for (i = 1; i <= nh; i++)
         netinput += w2[j][i] * out1[pattern][i];
      out2[pattern][j] = 1.0 / (1.0 + exp(-netinput));
   } /* endfor */
}

int compute_delta(int pattern, int winner)
{
   int i, m;
   double sum;

   for (i = 1; i <= no; i++)
      delta2[pattern][i] = (target[pattern][i] - out2[pattern][i]) *
                           out2[pattern][i] * (1.0 - out2[pattern][i]);

   for (i = 1; i <= nh; i++) {
      sum = 0.0;
      for (m = 1; m <= no; m++)
         sum += delta2[pattern][m] * w2[m][i];
      delta1[pattern][i] = sum * out1[pattern][i] * (1.0 - out1[pattern][i]);
   } /* endfor */
}

-----------------------------------------------------------

The following are the routines that I believe should change as a
result of the new dynamic range for the neurons, [-1.0, 1.0].  There
are also administrative changes that include the input values and
output values.
int compute_outputs(int pattern)
{
   int j, i;
   double netinput;

   for (j = 1; j <= nh; j++) {
      netinput = w1[j][nip1];
      for (i = 1; i <= ni; i++)
         netinput += w1[j][i] * input_vector[pattern][i];
      out1[pattern][j] = tanh(netinput);
   } /* endfor */

   for (j = 1; j <= no; j++) {
      netinput = w2[j][nhp1];
      for (i = 1; i <= nh; i++)
         netinput += w2[j][i] * out1[pattern][i];
      out2[pattern][j] = tanh(netinput);
   } /* endfor */
}

int compute_delta(int pattern)
{
   int i, m;
   double sum;

   for (i = 1; i <= no; i++)
      delta2[pattern][i] = (target[pattern][i] - out2[pattern][i]) *
                           1.0/(cosh(out2[pattern][i])*
                                cosh(out2[pattern][i]));

   for (i = 1; i <= nh; i++) {
      sum = 0.0;
      for (m = 1; m <= no; m++)
         sum += delta2[pattern][m] * w2[m][i];
      delta1[pattern][i] = sum * 1.0/(cosh(out1[pattern][i])*
                                      cosh(out1[pattern][i]));
   } /* endfor */
}

------------------------------------------------------

Thanks.

Bob Simonoff
simonof@aplcen.apl.edu
--
***********************************************************
Bob Simonoff          simonof@aplcen
Johns Hopkins University
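Neither fragment shows the weight-update step that consumes delta1 and
delta2.  A conventional delta-rule update for this network would look
roughly like the sketch below; it is only an assumption about the
surrounding program, and the learning-rate variable eta is an invented
name that does not appear in the original code.

/* Hypothetical sketch of the weight update both fragments feed into.
   Assumes the globals from the original program (w1, w2, delta1,
   delta2, input_vector, out1, ni, nh, no, nip1, nhp1); eta is an
   assumed learning-rate variable, not from the original code. */
void update_weights(int pattern)
{
   int i, j;

   for (j = 1; j <= nh; j++) {
      w1[j][nip1] += eta * delta1[pattern][j];         /* bias weight */
      for (i = 1; i <= ni; i++)
         w1[j][i] += eta * delta1[pattern][j] * input_vector[pattern][i];
   }
   for (j = 1; j <= no; j++) {
      w2[j][nhp1] += eta * delta2[pattern][j];         /* bias weight */
      for (i = 1; i <= nh; i++)
         w2[j][i] += eta * delta2[pattern][j] * out1[pattern][i];
   }
}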
markh@csd4.csd.uwm.edu (Mark William Hopkins) (12/22/90)
In article <1990Dec21.010536.17034@aplcen.apl.jhu.edu> simonof@aplcen.apl.edu (Simonoff Robert 301 540 1864) writes:
>I have chosen as my new activation function the hyperbolic
>tangent function, whose output range is [-1.0, 1.0].  The
>derivative of this function is:
>
>   tanh'(X) == sech(X)**2 == 1/cosh(X)**2

... = 1 - (tanh(x))^2. ...

(A code fragment was presented with the question "what's wrong with it?")

Thus, in:

>int compute_delta(int pattern)
...
>   delta2[pattern][i] = (target[pattern][i] - out2[pattern][i]) *
>                        1.0/(cosh(out2[pattern][i])*
>                             cosh(out2[pattern][i]));

should be

   delta2[pattern][i] = (target[pattern][i] - out2[pattern][i]) *
                        (1 - out2[pattern][i]*out2[pattern][i]);

and

>   delta1[pattern][i] = sum * 1.0/(cosh(out1[pattern][i])*
>                                   cosh(out1[pattern][i]));

should be

   delta1[pattern][i] = sum * (1 - out1[pattern][i]*out1[pattern][i]);
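For reference, here is that correction folded back into the full
routine from the original post.  This is a sketch assembled from the
two posts, reusing Bob's array names and loop bounds; the point is
that, because out = tanh(netinput), the derivative 1 - out*out needs
only the outputs that are already stored.

/* Sketch: compute_delta with the corrected tanh derivative.
   Since out = tanh(netinput), tanh'(netinput) = 1 - out*out,
   so the stored outputs are all that is needed.  Assumes the
   globals declared in the original program. */
int compute_delta(int pattern)
{
   int i, m;
   double sum;

   for (i = 1; i <= no; i++)
      delta2[pattern][i] = (target[pattern][i] - out2[pattern][i]) *
                           (1.0 - out2[pattern][i] * out2[pattern][i]);

   for (i = 1; i <= nh; i++) {
      sum = 0.0;
      for (m = 1; m <= no; m++)
         sum += delta2[pattern][m] * w2[m][i];
      delta1[pattern][i] = sum * (1.0 - out1[pattern][i] * out1[pattern][i]);
   } /* endfor */
}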
simonof@aplcen.apl.jhu.edu (Simonoff Robert 301 540 1864) (12/22/90)
In article <8513@uwm.edu> markh@csd4.csd.uwm.edu (Mark William Hopkins) writes:
>In article <1990Dec21.010536.17034@aplcen.apl.jhu.edu> simonof@aplcen.apl.edu (Simonoff Robert 301 540 1864) writes:
>>I have chosen as my new activation function the hyperbolic
>>tangent function, whose output range is [-1.0, 1.0].  The
>>derivative of this function is:
>>
>>   tanh'(X) == sech(X)**2 == 1/cosh(X)**2
>
>... = 1 - (tanh(x))^2. ...
>
>(A code fragment was presented with the question "what's wrong with it?")
>
>Thus, in:
>
>>int compute_delta(int pattern)
>...
>>   delta2[pattern][i] = (target[pattern][i] - out2[pattern][i]) *
>>                        1.0/(cosh(out2[pattern][i])*
>>                             cosh(out2[pattern][i]));
>
>should be
>
>   delta2[pattern][i] = (target[pattern][i] - out2[pattern][i]) *
>                        (1 - out2[pattern][i]*out2[pattern][i]);
>
>and
>
>>   delta1[pattern][i] = sum * 1.0/(cosh(out1[pattern][i])*
>>                                   cosh(out1[pattern][i]));
>
>should be
>
>   delta1[pattern][i] = sum * (1 - out1[pattern][i]*out1[pattern][i]);

Why is delta1[pattern][i] = sum * (1 - out1[pattern][i]*out1[pattern][i]) ?

My activation function is tanh(netinput), and I believe the derivative
of the hyperbolic tangent is:

   tanh'(x) = 1/sech(x)**2 = 1/cosh(x)**2 = 2/(e**x + e**(-x))

Maybe I am not seeing the algebra that makes:

   2/(e**x + e**(-x)) = 1/((1 + e**(-x)) * (1 + e**(-x))) - 1

Bob Simonoff
simonof@aplcen.apl.edu
--
***********************************************************
Bob Simonoff          simonof@aplcen
Johns Hopkins University
jon@calsci (Parallax & Red Shift) (12/28/90)
In article <1990Dec22.042610.23800@aplcen.apl.jhu.edu>, simonof@aplcen (Simonoff Robert 301 540 1864) writes:
>In article <8513@uwm.edu> markh@csd4.csd.uwm.edu (Mark William Hopkins) writes:
>>In article <1990Dec21.010536.17034@aplcen.apl.jhu.edu> simonof@aplcen.apl.edu (Simonoff Robert 301 540 1864) writes:
>>>I have chosen as my new activation function the hyperbolic
>>>tangent function, whose output range is [-1.0, 1.0].  The
>>>derivative of this function is:
>>>
>>>   tanh'(X) == sech(X)**2 == 1/cosh(X)**2
>>
>>... = 1 - (tanh(x))^2. ...
>>
>>(A code fragment was presented with the question "what's wrong with it?")
>>
[some stuff deleted to save bandwidth]
>>and
>>
>>>   delta1[pattern][i] = sum * 1.0/(cosh(out1[pattern][i])*
>>>                                   cosh(out1[pattern][i]));
>>
>>should be
>>
>>   delta1[pattern][i] = sum * (1 - out1[pattern][i]*out1[pattern][i]);
>
>Why is delta1[pattern][i] = sum * (1 - out1[pattern][i]*out1[pattern][i]) ?
>
>My activation function is tanh(netinput), and I believe the derivative
>of the hyperbolic tangent is:
>
>   tanh'(x) = 1/sech(x)**2 = 1/cosh(x)**2 = 2/(e**x + e**(-x))
>
>Maybe I am not seeing the algebra that makes:
>
>   2/(e**x + e**(-x)) = 1/((1 + e**(-x)) * (1 + e**(-x))) - 1
>
>Bob Simonoff
>simonof@aplcen.apl.edu

Bob, look again at the equation for tanh'(x) you wrote, above.

First off, tanh'(x) doesn't equal 1/sech(x)**2, but rather
tanh'(x) = sech(x)**2.  (This was *probably* just a typo, as you give
the correct equation at the top of your original posting.)

Continuing on to the 2nd '=' in your tanh'(x) equation: tanh'(x) is,
in fact, equal to 1/cosh(x)**2, as you have noted, but you blow it on
the 3rd equals sign.
1/cosh(x)**2 is NOT equal to 2/(e**x + e**(-x)), but rather is equal to
TWICE that quantity:  1/cosh(x)**2 = 4 / (e**x + e**(-x)).

Similarly, I don't know where you got the r.h.s. of the next equation.

Mark Hopkins suggested that

   delta1[pattern][i] = sum * (1 - out1[pattern][i]*out1[pattern][i]);

This is, in fact, correct.  But assuming a transfer function of
tanh(x), it doesn't equal what you wrote; i.e.,

   1 - tanh(x)**2  does NOT equal  1/((1 + e**(-x)) * (1 + e**(-x))) - 1

In fact,

                         e**(2x) + e**(-2x) - 2
   1 - tanh(x)**2 = 1 - ------------------------ = 1/cosh(x)**2 = sech(x)**2
                         e**(2x) + e**(-2x) + 2

which is the correct value for tanh'(x), as noted above.

On the other hand, this appears to be equivalent to your actual code
fragment.  I assume Mark suggested the alternate form for reasons of
computational efficiency (so you don't waste time computing the
additional coshines, but rather use the outputs which you already have
lying around).  But the code you wrote SHOULD work (albeit slower than
necessary).

So, I would suggest that you have a different problem.  (Unless the
additional cosh(out1[pattern][i]) calculation loses more precision
than your algorithm can tolerate, in which case switching to Mark's
formulation will fix that problem, as well as improving the speed!)

BTW, I wrote the engine for the commercially available back-prop-based
neural-net software package called BrainMaker(tm).  (Perhaps you've
heard of it?)  So I've done more than my share of this kind of coding.
(I.e., I'm not just talking through my hat... :-)

Good luck,
   Jon

---
Jon "J.D." "Parallax" Hartzberg, GSXR pilot        '86 GSXR1100 "Red Shift"
Cal. Sci. Software, Grass Valley CA     DoD #0220  '80 CB750F "Ol' Flexible"
jon@calsci.gvgpsa.gvg.tek.com OR ...!calsci!jon    '81 XL185S "SquirtintheDirt"
"When you stop falling down, you stop learning."   -Kenny Roberts
"When I found out what fairings cost, I decided I'd learned enough!"   -me
Disclaimer: If my boss knew I was doing this he'd kill me.
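The identity the thread keeps circling, tanh'(x) = sech(x)**2 =
1 - tanh(x)**2, is easy to spot-check numerically.  The standalone
program below is not from any of the posts; it simply evaluates both
forms at a few points and prints them side by side (they agree to
machine precision).

#include <stdio.h>
#include <math.h>

/* Spot-check (not from the posts): tanh'(x) = sech(x)^2 = 1 - tanh(x)^2.
   The two columns printed below should be identical. */
int main(void)
{
   int k;

   for (k = -4; k <= 4; k++) {
      double x = 0.5 * k;                           /* sample points   */
      double via_tanh = 1.0 - tanh(x) * tanh(x);    /* Mark's form     */
      double via_cosh = 1.0 / (cosh(x) * cosh(x));  /* sech(x)**2 form */
      printf("x = %5.2f   1 - tanh^2 = %.15f   1/cosh^2 = %.15f\n",
             x, via_tanh, via_cosh);
   }
   return 0;
}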
jon@calsci (Parallax & Red Shift) (12/28/90)
In article <0022@calsci>, calsci!jon@gvgpsa.gvg.tek.com (Parallax & Red Shift) writes:
>1/cosh(x)**2 is NOT equal to 2/(e**x + e**(-x)), but rather is equal to
>TWICE that quantity:  1/cosh(x)**2 = 4 / (e**x + e**(-x)).

Oops!  Now *I'm* making typos!  I meant, of course:

1/cosh(x)**2 is NOT equal to 2/(e**x + e**(-x)), but rather is equal to
the SQUARE of that quantity:

   1/cosh(x)**2 = 4 / (e**x + e**(-x))**2
                = 4 / (e**(2x) + e**(-2x) + 2)

I actually used the correct value later on in my post, so everything
else should still be cool.

---
Jon "J.D." "Parallax" Hartzberg, GSXR pilot        '86 GSXR1100 "Red Shift"
Cal. Sci. Software, Grass Valley CA     DoD #0220  '80 CB750F "Ol' Flexible"
jon@calsci.gvgpsa.gvg.tek.com OR ...!calsci!jon    '81 XL185S "SquirtintheDirt"
"When you stop falling down, you stop learning."   -Kenny Roberts
"When I found out what fairings cost, I decided I'd learned enough!"   -me
Disclaimer: If my boss knew I was doing this he'd kill me.
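As a final check on the corrected equation, the short program below
(again not from the posts) compares 1/cosh(x)**2 with
4/(e**x + e**(-x))**2 and with the erroneous 2/(e**x + e**(-x)) at an
arbitrary point; the first two agree and the third does not.

#include <stdio.h>
#include <math.h>

/* Check of the corrected identity above (not from the posts):
   1/cosh(x)^2 is the SQUARE of 2/(e^x + e^-x), i.e. 4/(e^x + e^-x)^2,
   not twice it. */
int main(void)
{
   double x = 0.7;                       /* arbitrary test point    */
   double s = exp(x) + exp(-x);          /* e^x + e^-x = 2*cosh(x)  */

   printf("1/cosh(x)**2          = %.15f\n", 1.0 / (cosh(x) * cosh(x)));
   printf("4/(e**x + e**(-x))**2 = %.15f\n", 4.0 / (s * s));  /* agrees  */
   printf("2/(e**x + e**(-x))    = %.15f\n", 2.0 / s);        /* differs */
   return 0;
}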