[comp.sources.misc] v14i084: Fast Backpropagation Part 1 of 4

drt@chinet.chi.il.us (Donald Tveter) (09/16/90)
Posting-number: Volume 14, Issue 84
Submitted-by: Donald Tveter <drt@chinet.chi.il.us>
Archive-name: back-prop/part01

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of archive 1 (of 4)."
# Contents:  README
# Wrapped by drt@chinet on Fri Aug 31 08:17:03 1990
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'README' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'README'\"
else
echo shar: Extracting \"'README'\" \(34957 characters\)
sed "s/^X//" >'README' <<'END_OF_FILE'
X.ce
XFast Back-Propagation
X.ce
XCopyright (c) 1990 by Donald R. Tveter
X
X
X.ul
XIntroduction
X
X   The programs described below were produced for my own use in studying
Xback-propagation and for doing experiments that are found in my
Xintroduction to Artificial Intelligence textbook, \fIThe Basis of
XArtificial Intelligence\fR, to be published by Computer Science Press.
XI have copyrighted these files but I hereby give permission to anyone to
Xuse them for experimentation, educational purposes or to redistribute
Xthem on a not-for-profit basis.  Anyone who wants to use, change or
Xredistribute these programs for commercial purposes should contact
Xme by mail at:
X
X.na
X.nf
X                  Dr. Donald R. Tveter
X                  5228 N. Nashville Ave.
X                  Chicago, Illinois   60656
X                  USENET:  drt\@chinet.chi.il.us
X.ad
X.fi
X
XAlso, I would be interested in hearing your suggestions, bug reports
Xand major successes or failures.
X
X   There are four simulators that can be constructed from the
Xincluded files.  The program, rbp, does back-propagation using double
Xprecision floating point weights and arithmetic.  The program, bp, does
Xback-propagation using 16-bit integer weights, 16 and 32-bit integer
Xarithmetic and some double precision floating point arithmetic.  The
Xprogram, sbp, uses 16-bit integer symmetric weights but only allows
Xtwo-layer networks.  The program srbp does the same using 64-bit
Xfloating point weights.  The purpose of sbp and srbp is to produce
Xnetworks that can be used with the Boltzmann machine relaxation
Xalgorithm (not included).
X
X   In most cases, the 16-bit integer programs are the most useful,
Xbecause they are the fastest.  With a 10 MHz 68010, connections can be
Xprocessed at up to about 45,000 per second and weight changes can be
Xdone at up to about 25,000 per second.  These values depend on the exact
Xproblem.  The integer versions will probably be faster on most machines
Xthan the versions that use real arithmetic.  Unfortunately, sometimes
X16-bit integer weights don't have enough range or precision and then
Xusing the floating point versions may be necessary.  Many other speed-up
Xtechniques are included in these programs.
X
X.ul
XMaking the Simulators
X
X   To make a particular executable file, use the makefile given
Xwith the data files and make any or all of them like so:
X
X.ce
Xmake bp
X.ce
Xmake sbp
X.ce
Xmake rbp
X.ce
Xmake srbp
X
XOne option exists for bp and sbp.  If your compiler is smart enough
Xto divide by 1024 by shifting, use "-DSMART".
X
X   To make a record of all the input and output from the programs,
Xthe following small UNIX command file I call record can be used:
X
X.na
X.nf
Xtrap "" 2
Xoutfile="${1}.record"
Xif test -f "$outfile"
X   then
X      rm "$outfile"
X   fi
Xecho "$outfile"
X(tee -a "$outfile" | "$@") | tee -a "$outfile"
Xprocess=`ps | grep tee | cut -c1-6`
Xkill $process
X.ad
X.fi
X
XFor example, to make a record of all the input and output from the
Xprogram bp using the data file xor, use:
X
X.ce
Xrecord bp xor
X
X
X.ul
XA Simple Example
X
X  Each version would normally be called with the name of a file to read
Xcommands from, as in:
X
X.ce
Xbp xor
X
XWhen no file name is specified, bp expects to take commands from the
Xkeyboard (UNIX stdin file).  After the file name from the command line
Xis read and the commands in the file are executed, commands are then
Xtaken from the keyboard.
X
X   Each command is a single letter, and most commands take
Xoptional parameters.  The `*' character starts a comment that runs to
Xthe end of the line.  Here is an example of
Xan input file to do the xor problem:
X           
X.na
X.nf
X* input file for the xor problem
X           
Xm 2 1 1           * make a 2-1-1 network
Xc 1 1 3 1         * add this extra connection
Xc 1 2 3 1         * add this extra connection
Xs 7               * seed the random number function
Xk 0 1             * give the network random weights
X
Xn 4               * read four new patterns into memory
X1 0 1
X0 0 0
X0 1 1
X1 1 0
X
Xe 0.5             * set eta to 0.5 (and eta2 to 0.05)
Xa 0.9             * set alpha to 0.9
X.ad
X.fi
X
XIn this example, the m command is a command to make a network.  The
Xnumbers following it are the number of units for each layer.  The m
Xcommand connects adjacent layers with weights.  The following c
Xcommands create extra connections from layer 1, unit 1 to layer 3,
Xunit 1 and from layer 1, unit 2 to layer 3, unit 1.  The `s' command
Xsets the seed for the random number function.  The `k' command then
Xgives the network random weights.  The `k' command has another use as
Xwell.  It can be used to try to kick a network out of a local minimum.
XHere, the meaning of "k 0 1" is to examine all the weights in the
Xnetwork and for every weight equal to 0 (and they all start out at 0),
Xadd in a random number between -1 and +1.  The `n' command
Xspecifies four new patterns to be read into memory.  With the `n'
Xcommand, any old patterns that may have been present are removed.
XThere is also an `x' command that behaves like the `n' command, except
Xthat the `x' command \fIadds\fR the extra patterns to the current training
Xset.  The input pattern comes first, followed by the output pattern.
XThe statement, e 0.5, sets eta, the learning rate, to 0.5 and eta2 from
Xthe differential step size algorithm to one tenth this, or 0.05.  The
Xlast line sets alpha, the momentum parameter, to 0.9.
X
X   The above statements set up the network and when the list of
Xcommands runs out, commands are taken from the keyboard.  The
Xfollowing messages and prompt appear:
X
X.na
X.nf
X.ne2
XFast Backpropagation Copyright (c) 1990 by Donald R. Tveter
Xtaking commands from stdin now
X[?!*AabCcEefHhijklmnoPpQqRrSstWwx]?
X.ad
X.fi
X
XThe square brackets enclose a list of the possible commands.
XThe `r' command is used to run the training algorithm.  Typing "r 200
X100", as shown below, means run up to 200 iterations through the patterns
Xand print the output patterns every 100 iterations:
X
X.na
X.nf
X.ne3
X[?!*AabCcEefHhijklmnoPpQqRrSstWwx]? r 200 100
Xrunning . . .
X.ne5
X100 iterations, s 7, k 0 1.00, file = xor
X  1  0.81  (0.03739)
X  2  0.13  (0.01637)
X  3  0.85  (0.02262)
X  4  0.17  (0.02988)
X.ne5
X159 iterations, s 7, k 0 1.00, file = xor
X  1  0.90  (0.00973)
X  2  0.07  (0.00467)
X  3  0.92  (0.00565)
X  4  0.09  (0.00739)
Xpatterns learned to within 0.10 at iteration 159
X.ad
X.fi
X
XThe program immediately prints out the "running . . ." message.  After
Xeach 100 iterations, a header line giving some program parameters
Xis printed out, followed by the results that occur when each of the four
Xpatterns is submitted to the network.  If the second number defining
Xhow often to print out values is omitted, the values will not print
Xeven when the learning is finished.  The values in parentheses at the
Xend of each line give the sum of the squared error on the output units
Xfor each output pattern.  These error numbers are useful to see because
Xthey give you some idea of how fast each pattern is being learned.
XThe program also reports that the patterns have been learned to within
Xthe default tolerance of 0.1.  This check for the tolerance being met
Xis done for every learning iteration.  Sometimes in the integer version
Xthe program will do a few extra iterations before declaring
Xthe problem done.  This is because of truncation errors in the
Xarithmetic done to check for convergence.
X
X   A particular test pattern can be input to the network with the `p'
Xcommand, as in:
X
X.na
X.nf
X.ne2
X[?!*AabCcEefHhijklmnoPpQqRrSstWwx]? p 1 0
X     0.91 
X.ad
X.fi
X
XTo have the system evaluate a particular stored pattern, say pattern
Xnumber 4, use the `P' command as in:
X
X.na
X.nf
X.ne2
X[?!*AabCcEefHhijklmnoPpQqRrSstWwx]? P4
X  4  0.09  (0.00739)
X.ad
X.fi
X
XTo print all the values for all the training patterns without doing
Xany learning, type `P':
X
X.na
X.nf
X.ne5
X[?!*AabCcEefHhijklmnoPpQqRrSstWwx]? P
X  1  0.90  (0.00973)
X  2  0.07  (0.00467)
X  3  0.92  (0.00565)
X  4  0.09  (0.00739)
X.ad
X.fi
X
X   You may also want to know the values of the weights
Xthat have been produced.  To see them, use the `w' command.
XThe `w' command gives the value of the weights leading into
Xa particular unit and also data about how the activation value of the
Xunit is computed.  Two integers after the w specify the layer and
Xunit number within the layer whose weights should be printed.  For
Xexample, if you want the weights leading into the unit at layer 2,
Xposition number 1, type:
X
X.na
X.nf
X.ne6
X[?!*AabCcEefHhijklmnoPpQqRrSstWwx]? w 2 1
Xlayer unit  unit value     weight         input from unit
X  1      1    1.00000     7.27930             7.27930
X  1      2    1.00000    -5.66797            -5.66797
X  2      t    1.00000     2.74902             2.74902
X                                      sum =   4.36035
X.ad
X.fi
X
XIn this example, the unit at layer 2, number 1 is receiving input from
Xunits 1 and 2 in the previous (the input) layer and from a unit, t.
XUnit t is the threshold unit.  The "unit value" column gives the value
Xof the input units for the last time some pattern was placed on the
Xinput units.  In this case, the fourth pattern was the last one that the
Xnetwork has seen.  The next column lists the weights on the connections
Xinto the unit at (2,1).  The final column is the result from multiplying
Xtogether the unit value and the weight.  Beneath this column, the sum of
Xthe inputs is given.
X
X   Another important command is the help command.  It is the letter
X`h' (not `?') followed by the letter of the command.  The help command
Xwill give a brief summary of how to use the command.  Here, we type
Xh h for help with help:
X
X.na
X.nf
X.ne3
X[?!*AabCcEefHhijklmnoPpQqRrSstWwx]? h h
X
Xh <letter> gives help for command <letter>.
X.ad
X.fi
X
X   Finally, to end the program, the `q' (for quit) command is entered:
X
X[?!*AabCcEefHhijklmnoPpQqRrSstWwx]? q
X
X.ul
XInput and Output Formats
X
X   The programs are able to read patterns in two different formats.  The
Xdefault input format is the compressed (condensed) format.  In it, each
Xvalue is one character and it is not necessary to have blanks between
Xthe characters.  For example, in compressed format, the patterns for xor
Xcould be written out in either of the following ways:
X
X.ce
X101               10 1
X.ce
X000               00 0
X.ce
X011               01 1
X.ce
X110               11 0
X
XThe second example is preferable because it makes it
Xeasier to see the input and the output patterns.  Compressed format can
Xalso be used to input patterns with the `p' command.
XIn addition to using 1 and 0 as input, the character, `?' can be used.
XThis character is initially defined to be 0.5, but it can be redefined
Xusing the Q command like so:
X
X.ce
XQ 0.7
X
XThis sets the value of ? to 0.7.  Other valid input characters are the
Xletters, `h', `i', `j' and `k'.  The `h' stands for `hidden'.  Its
Xmeaning in an input string is that the value at this point in the string
Xshould be taken from the next unit in the second layer of the network.
XNormally this will be the hidden layer of a three-layer network.  This
Xnotation is useful for specifying simple recurrent
Xnetworks.  Similarly, `i', `j' and `k' stand for taking input
Xvalues from the third, fourth and fifth layers (if they exist).  A
Xsimple example of a recurrent network is given later.
X
X   The other input format for numbers is real.  The number portion must
Xstart with a digit (.35 is not allowed, but 0.35 is).  Exponential
Xnotation is not allowed.  Real numbers have to be separated by a space.
XThe `h', `i', `j', `k' and `?' characters are also allowed with real
Xinput patterns.  To take input in this format, it is necessary
Xto set the input format to be real using the `f' (format) command as in:
X
X.ce
Xf ir
X
XTo change back to the compressed format, use:
X
X.ce
Xf ic
X
XOutput format is controlled with the `f' command as in:
X
X.ce
Xf or
X.ce
Xf oc
X.ce
Xf oa
X
XThe first sets the output to real numbers.  The second sets the
Xoutput to be condensed mode where the value printed will be a `1' when
Xthe unit value is greater than 1.0 - tolerance, a `^' when the value
Xis above 0.5 but less than 1.0 - tolerance, a `v' when the value is
Xless than 0.5 but greater than the tolerance.  Below the tolerance
Xvalue, a `0' is printed.  The tolerance can be changed using the `t'
Xcommand.  For example, to make all values greater than 0.8 print
Xas `1' and all values less than 0.2 print as `0', use:
X
X.ce
Xt 0.2
X
XOf course, this same tolerance value is also used to check to see if all
Xthe patterns have converged.  The third output format is meant to
Xgive "analog condensed" output.  In this format, a `c' is printed when
Xa value is close enough to its target value.  Otherwise, if the answer
Xis close to
X1, a `1' is printed, if the answer is close to 0, a `0' is printed, if
Xthe answer is above the target but not close to 1, a `^' is printed and
Xif the answer is below the target but not close to 0, a `v' is printed.
XThis output format is designed for
Xproblems where the output is a real number, as for instance, when the
Xproblem is to make a network learn sin(x).
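X
X   As an illustration of the condensed output mode described above,
Xhere is a minimal Python sketch of the mapping from a unit value to
Xa printed character.  It is not taken from the simulator source; the
Xfunction name is hypothetical and the treatment of values that fall
Xexactly on a boundary is an assumption.
X
X.na
X.nf
```python
def condensed_char(value, tolerance=0.1):
    """Map a unit value to a condensed-mode output character."""
    if value > 1.0 - tolerance:
        return "1"
    if value < tolerance:
        return "0"
    # Assumption: a value of exactly 0.5 is shown as "^".
    if value >= 0.5:
        return "^"
    return "v"


row = [0.95, 0.70, 0.30, 0.02]
print("".join(condensed_char(v, tolerance=0.2) for v in row))
```
X.ad
X.fi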
X
X   With the f command, a number of sub-commands can be put on one line
Xas in the following, where the input is set to real and the output
Xis set to analog condensed:
X
X.ce
Xf ir oa
X
XAlso, for the sake of convenience, the output format (and only the
Xoutput format) can be set without using the `f', so that:
X
X.ce
Xor
X
Xwill also make the output format real.
X
X   In the condensed formats, the default is to print a blank after every
X10 values.  This can be altered using the `b' (for inserting breaks)
Xcommand.  The use for this command is to separate output values into
Xlogical groups to make the output more readable.  For instance, you may
Xhave 24 output units where it makes sense to insert blanks after the
X4th, 7th and 19th positions.  To do this, specify:
X
X.ce
Xb 4 7 19
X
XThen for example, the output will look like:
X
X.na
X.nf
X  1 10^0 10^ ^000v00000v0 01000 (0.17577)
X  2 1010 01v 0^0000v00000 ^1000 (0.16341)
X  3 0101 10^ 00^00v00000v 00001 (0.16887)
X  4 0100 0^0 000^00000v00 00^00 (0.19880)
X.ad
X.fi
X
XThe `b' command allows up to 20 break positions to be specified.
XThe default output format is the real format with 10 numbers per
Xline.  For the output of real values, the `b' command specifies when to
Xprint a carriage return, rather than when to print a blank.
X
X   Sometimes the training set is so large that it is annoying to
Xhave all the patterns print out every n iterations.  To get a summary of
Xhow learning is going, instead of all these patterns, use "f s+".
XWith this setting, the command "r 200 50" in the xor problem produces
Xthe following summary:
X
X.na
X.nf
X    50        0 learned      4 unlearned     0.48364 error/unit
X   100        0 learned      4 unlearned     0.16528 error/unit
X   150        3 learned      1 unlearned     0.08813 error/unit
X   159        4 learned      0 unlearned     0.08203 error/unit
Xpatterns learned to within 0.10 at iteration 159
X.ad
X.fi
X
XThe program counts up how many patterns were learned or not learned
Xin each training pass before the weights are updated.  Therefore, the
Xstatus is one iteration out of date.  The error/unit is the average
Xabsolute value of the error on each unit for each pattern.  To switch
Xback to the longer report, use "f s-".  The P command will list all the
Xpatterns no matter what the setting of the summary parameter is.
X
X.ul
XSaving and Restoring Weights and Related Values
X
X   Sometimes the amount of time and effort needed to produce a set of
Xweights to solve a problem is so great that it is more convenient to
Xsave the weights rather than constantly recalculate them.  Weights can
Xbe saved as real values (the default) or as binary, to save space.  To
Xsave the weights, enter the command `S'.  The weights are written to a
Xfile called "weights".  The following file comes from the
Xxor problem:
X
X.na
X.nf
X159r  file = xor
X    7.2792968750
X   -5.6679687500
X    2.7490234375
X    5.8486328125
X   -5.0400390625
X  -11.8574218750
X    8.3193359375
X.ad
X.fi
X
XTo write the weights, the program starts with the second layer, writes
Xout the weights leading into these units in order with the threshold
Xweight last.  Then it moves on to
Xthe third layer, and so on.  To restore these weights, type an `R' for
Xrestore.  At this time, the program reads the header line and sets the
Xtotal number of iterations the program has gone through to be the first
Xnumber it finds on the header line.  It then reads the character
Ximmediately after the number.  The `r' indicates that the weights will
Xbe real numbers represented as character strings.  If the weights were
Xbinary, the character would be a `b' rather than an `r'.  Also, if the
Xcharacter is `b', the next character is read.  This next character
Xindicates how many bytes are used per value.  The integer versions, bp
Xand sbp write files with 2 bytes per weight, while the real versions,
Xrbp and srbp write files with 8 bytes per weight.  With this notation,
Xweight files written by one program can be read by the other.  A binary
Xweight format is specified within the `f' command by using "f wb".  A
Xreal format is specified by using "f wr".  If your program specifies
Xthat weights should be written in one format, but the weight file you
Xread from is different, a warning message will be printed.  No
Xcheck is made to see if the number of weights in the file equals the
Xnumber of weights in the network.
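X
X   As a rough check on the file layout, the following Python sketch
Xrecomputes the outputs of the xor network from the weights listed
Xabove.  Two things in it are assumptions rather than facts from the
Xprograms: the order of the four weights leading into the output unit
X(input 1, input 2, hidden unit, threshold) and the use of the smooth
Xactivation function, so the results differ a little from what bp
Xprints with its piece-wise linear function.  The function names are
Xhypothetical.
X
X.na
X.nf
```python
import math

# Weights from the "weights" file shown above, in file order.
W = [7.2792968750, -5.6679687500, 2.7490234375,
     5.8486328125, -5.0400390625, -11.8574218750, 8.3193359375]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def xor_net(in1, in2, w=W):
    """Forward pass through the 2-1-1 network with two extra connections."""
    # Layer 2, unit 1: inputs from both input units, threshold weight last.
    hidden = sigmoid(w[0] * in1 + w[1] * in2 + w[2])
    # Layer 3, unit 1. Assumed order: input 1, input 2, hidden, threshold.
    net = w[3] * in1 + w[4] * in2 + w[5] * hidden + w[6]
    return sigmoid(net)


for in1, in2, target in [(1, 0, 1), (0, 0, 0), (0, 1, 1), (1, 1, 0)]:
    print(in1, in2, target, round(xor_net(in1, in2), 2))
```
X.ad
X.fi
X
XUnder these assumptions all four outputs fall within the default
Xtolerance of 0.1 of their targets.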
X
X   The above formats specify that only weights are written out and
Xthis is all you need once the patterns have converged.  However, if
Xyou're still training the network and want to break off training and
Xpick up the training from exactly the same point later, you need to save
Xthe old weight changes when using momentum, and the parameters for the
Xdelta-bar-delta method if you are using this technique.  To save these
Xextra parameters on the weights file, use "f wR" to write the extra
Xvalues as real and "f wB" to write the extra values as binary.
X
X   In the above example, the command S, was used to save the weights
Ximmediately.  Another alternative is to save weights at regular
Xintervals.  The command, S 100, will automatically save weights every
X100 iterations the program does, that is, when the total iterations mod
X100 = 0.  The default save interval is 100,000 iterations,
Xwhich in practice means that weights are never saved automatically.
X
X   Another use for saving weights has to do with trying to find the
Xproper parameters to quickly solve the problem.  Ordinarily, a high
Xrate of learning is desirable, but often too high a rate of learning
Xwill increase the error, rather than decrease it.  In trying to find
Xthe answer as quickly as possible, if the network seems to be
Xconverging with the current parameters you can save the current weights
Xand increase the learning rate.  If this increased learning rate ruins
Xthe convergence, then you can restore the weights you had before you
Xmade this increase.
X
X
X.ul
XInitializing Weights and Giving the Network a `Kick'
X
X   All the weights in the network initially start out at 0.  In
Xsymmetric networks then, no learning may result because error signals
Xcancel themselves out.  Even in non-symmetric
Xnetworks, the training process will often converge faster if the weights
Xstart out at small random values.  To do this, the `k' command will
Xtake the network and alter the weights in the following ways.  Suppose
Xthe command given is:
X
X.ce
Xk 0 0.5
X
XNow, if a weight is exactly 0, then the weight will be changed to a
Xrandom value between +0.5 and -0.5.  The above command can therefore be
Xused to initialize the weights in the network.  A more complex use of
Xthe `k' command is to decrease the magnitude of large weights in the
Xnetwork by a certain random amount.  For instance, in the following
Xcommand:
X
X.ce
Xk 2 8
X
Xall the weights in the network that are greater than or equal to 2 will
Xbe decreased by a random number between 0 and 8.  Weights
Xless than or equal to -2 will be increased by a random number
Xbetween 0 and 8.  The seed for the random number generator can be
Xchanged using the `s' command as in "s 7".  The integer parameter of the
X`s' command is of type unsigned.
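X
X   The two uses of the `k' command can be sketched in Python as
Xfollows.  This is only an illustration of the behavior described
Xabove, not the code used in the simulators: the function name is
Xhypothetical, treating a first parameter of 0 as a separate case is an
Xassumption, and Python's random number generator will not reproduce
Xthe values that "s 7" gives in bp.
X
X.na
X.nf
```python
import random


def kick(weights, threshold, amount, rng=random):
    """Return a kicked copy of weights, following the k command text."""
    kicked = []
    for w in weights:
        if threshold == 0:
            # "k 0 b": only weights that are exactly 0 are changed.
            if w == 0:
                w = rng.uniform(-amount, amount)
        elif w >= threshold:
            w -= rng.uniform(0, amount)
        elif w <= -threshold:
            w += rng.uniform(0, amount)
        kicked.append(w)
    return kicked


random.seed(7)
print(kick([0.0, 0.0, 0.0], 0, 0.5))      # like "k 0 0.5"
print(kick([3.0, -4.5, 1.0, 0.0], 2, 8))  # like "k 2 8"
```
X.ad
X.fi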
X
X   Another method of giving a network a kick is to add hidden layer
Xunits.  The command:
X
X.ce
XH 2 0.5
X
Xadds one unit to layer 2 of the network and all the weights that are
Xcreated are initialized to between -0.5 and +0.5.
X
X   The subject of kicking a back-propagation network out of local minima
Xhas barely been studied and there is no guarantee that the above methods
Xare very useful in general.
X
X.ul
XSetting the Algorithm to Use
X
X   A number of different variations on the original back-propagation
Xalgorithm have been proposed in order to speed up convergence.  Some
Xof these have been built into these simulators.  Some of the methods
Xcan be mixed together.  The two most important choices are the
Xderivative term to use and the update method to use.  The default
Xderivative is the one devised by Fahlman:
X
X.ce
X0.1 + s(1-s)
X
Xwhere s is the activation value of the unit.  The reason for adding in
Xthe 0.1 term to the correct formula for the derivative is that when s is
Xclose to 0 or 1, the amount of error passed back is very small and so
Xlearning is very slow.  Adding the 0.1 speeds up the learning process.
X(For the original description of this method, see "Faster Learning
XVariations of Back-Propagation:  An Empirical Study", by Scott E.
XFahlman, in \fIProceedings of the 1988 Connectionist Models Summer
XSchool\fR, Morgan Kaufmann, 1989.)  Besides Fahlman's derivative and the
Xoriginal one, the differential step size method (see "Stepsize Variation
XMethods for Accelerating the Back-Propagation Algorithm", by Chen and
XMars, in \fIIJCNN-90-WASH-DC\fR, Lawrence Erlbaum, 1990) takes the
Xderivative to be 1 in the layer going into the output units and uses the
Xoriginal derivative for all other layers.  The learning rate for the
Xinner layers is normally set to 1/10 the rate in the outer layer.  To
Xset the derivative, use the `A' command as in:
X
X.ne4
X   A do   * use the original derivative
X   A df   * use Fahlman's derivative
X   A dd   * use the differential step size derivative
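X
X   The three derivative choices can be compared with a short Python
Xsketch.  The function name and its arguments are hypothetical; only
Xthe formulas come from the description above.
X
X.na
X.nf
```python
def derivative(s, kind="f", output_layer=False):
    """Derivative term for a unit with activation value s.

    kind is "o" (original), "f" (Fahlman) or "d" (differential step size),
    mirroring the letters of the "A d" sub-command.
    """
    original = s * (1.0 - s)
    if kind == "o":
        return original
    if kind == "f":
        return 0.1 + original
    if kind == "d":
        # 1 for weights going into the output units, original elsewhere.
        return 1.0 if output_layer else original
    raise ValueError("unknown derivative kind: " + kind)


for s in (0.01, 0.5, 0.99):
    print(s, derivative(s, "o"), derivative(s, "f"))
```
X.ad
X.fi
X
XNear s = 0 or s = 1 the original term is close to 0, while Fahlman's
Xterm never drops below 0.1.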
X
X   The algorithm command can contain other sub-commands besides the
Xsetting of the derivative.  The other major choice is the update method.
XThe choices are the original one, the differential step size method,
XJacobs' delta-bar-delta method, the continuous update method and the
Xcontinuous update method with the differential step size etas.  To set
Xthese update methods use:
X
X.na
X.nf
X.ne6
X   A uo   * the original update method
X   A ud   * the differential step size method
X   A uj   * Jacobs' delta-bar-delta method
X   A uc   * the continuous update method
X   A uC   * the continuous update method with the differential
X          * step size etas
X.ad
X.fi
X
XThe differential step size method uses the standard eta when updates
Xare made to the units leading into the output layer.  For deeper layers,
Xanother value will be used.  The default is to use an eta, called eta2,
Xfor the inner layers that is one-tenth the standard eta.  These etas
Xboth get set using the `e' command (not a sub-command of the `A'
Xcommand) as in:
X
X.ce
Xe 0.5 0.1
X
XThe standard eta will be set to 0.5 and eta2 will be 0.1.  If eta2
Xhad been omitted, it would have been set to 0.05.  Jacobs'
Xdelta-bar-delta method uses a number of special parameters and these
Xare set using the `j' command.  Jacobs' update method can actually be
Xused with any of the three choices for derivatives and the algorithm
Xwill find its own value of eta for each weight.  The differential
Xstep size derivative is often very effective with Jacobs'
Xdelta-bar-delta method.
X
X   There are five other `A' sub-commands.  First, the activation
Xfunction can be either the piece-wise linear function or the original
Xsmooth activation function, but the smooth function is only available
Xwith the programs that use real weights and arithmetic.  To set the
Xtype of function, use:
X
X   A ap   * for the piece-wise activation function
X   A as   * for the smooth activation function
X
XThe piece-wise function can save quite a lot in execution time despite
Xthe fact that it normally increases the number of iterations required
Xto solve a problem.
X
X   Second, it has been reported that using a sharper sigmoid shaped
Xactivation function will produce faster convergence (see "Speeding Up
XBack Propagation" by Yoshio Izui and Alex Pentland in the Proceedings of
X\fIIJCNN-90-WASH-DC\fR, Lawrence Erlbaum Associates, 1990).  If we let
Xthe function be:
X
X                                1
X                         ----------------,
X                         1 + exp (-D * x)
X
Xincreasing D will make the sigmoid sharper while decreasing D will
Xmake it flatter.  To set this parameter, to say, 8, use:
X
X.ce
XA D 8  * sets the sharpness to 8
X
XThe default value is 1.  A larger D is also useful in the integer
Xversion of back-propagation where the weights are limited to between
X-32 and +31.999.  A larger D value in effect magnifies the weights and
Xmakes it possible for the weights to stay smaller.  Values of D less
Xthan one may be useful in extracting a network from a local minimum
X(see "Handwritten Numeral Recognition by Multi-layered Neural Network
Xwith Improved Learning Algorithm" by Yamada, Kami, Temma and Tsukumo in
XProceedings of the 1989 IJCNN, IEEE Press).  Also, when you have large
Xinput values, values of D less than 1 can be used to scale down the
Xactivation to higher level units.
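X
X   The effect of D can be seen in a small Python sketch of the smooth
Xactivation function given above.  The function name is hypothetical.
X
X.na
X.nf
```python
import math


def sigmoid(x, D=1.0):
    """Logistic activation with sharpness D, as in 1 / (1 + exp(-D * x))."""
    return 1.0 / (1.0 + math.exp(-D * x))


# A weight of 4 with D = 8 gives the same activation as a weight of 32
# with D = 1, which is why a larger D lets the weights stay smaller.
print(sigmoid(4.0, D=8.0), sigmoid(32.0, D=1.0))
print(sigmoid(0.5, D=0.5), sigmoid(0.5, D=1.0), sigmoid(0.5, D=8.0))
```
X.ad
X.fi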
X
X   The third miscellaneous command is the `b' command to control
Xwhether or not to backpropagate error for units that have learned
Xtheir response to within a given tolerance.  The default is to
Xalways backpropagate error.  The advantage to not backpropagating
Xerror is that this can save computer time and sometimes actually
Xdecrease the number of iterations that are required to solve the
Xproblem.  This parameter can be set like so:
X
X   A b+   * always backpropagate error
X   A b-   * don't backpropagate error when close
X
X   The fourth `A' sub-command allows you to limit the weights
Xthat the network produces to some restricted range.  This can be
Ximportant in the programs with 16-bit weights.  These programs limit
Xthe weights to be from -32 to +31.999.  When a weight near +31.999 is
Xincreased a little it can overflow and produce a negative value.  When
Xone or more weights overflow, the learning usually takes a dramatic
Xturn for the worse, or on rare occasions, it suddenly improves.  To
Xhave the program check for weights above 30 or below -30, enter
X"A l 30".  This also limits the
Xabsolute values of the weights to be less than or equal to 30.  The
Xweights are checked after they have been updated and if a weight is
Xgreater than this limit, it is set equal to this limit.  The first time
Xthis happens, a warning message is produced.  With this method, it is
Xpossible, in principle, for a large weight change to cause overflow
Xwithout being caught, but this is unlikely.  To stop the weight
Xchecking, set the limit to 0.  The default is to not check.
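X
X   The overflow problem and the effect of a limit can be illustrated
Xwith the Python sketch below.  It assumes that a weight is stored as
Xa signed 16-bit integer equal to the weight times 1024; this scale is
Xinferred from the stated range and from the division by 1024 mentioned
Xearlier, and is not confirmed here.  The function names are
Xhypothetical, and the clamp is done in floating point for simplicity.
X
X.na
X.nf
```python
SCALE = 1024  # assumed: weight = stored 16-bit integer / 1024


def wrap16(n):
    """Wrap an integer into the signed 16-bit range, as overflow would."""
    return (n + 32768) % 65536 - 32768


def add_unchecked(weight, change):
    """Add a change to a fixed-point weight with 16-bit wraparound."""
    return wrap16(round(weight * SCALE) + round(change * SCALE)) / SCALE


def add_limited(weight, change, limit=30.0):
    """Add a change, then clamp the result to [-limit, +limit]."""
    updated = weight + change
    return max(-limit, min(limit, updated))


print(32767 / SCALE)                # largest representable weight
print(add_unchecked(31.9, 0.5))     # overflows to a negative value
print(add_limited(29.9, 0.5))       # clamped to the limit
```
X.ad
X.fi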
X
X   The final miscellaneous `A' sub-command is `s', for skip.  Setting
Xs = n makes the program skip, for n iterations, whole patterns that
Xhave been learned to within the required tolerance.  For example, to
Xskip learned patterns for 5 iterations, use:  "A s 5".
X
X.ul
XJacobs' Delta-Bar-Delta Method and Parameters
X
X   Jacobs' delta-bar-delta method attempts to find a learning rate
Xeta, for each individual weight.  The parameters are the initial
Xvalue for the etas, the amount by which to increase an eta that seems
Xto be too small, the rate at which to decrease an eta that is apparently
Xtoo large, a maximum value for each eta and a parameter used in keeping
Xa running average of the slopes.  Here are examples of setting these
Xparameters:
X
X.na
X.nf
X   j d 0.5    * sets the decay rate to 0.5
X   j e 0.1    * sets the initial etas to 0.1
X   j k 0.25   * sets the amount to increase etas by (kappa) to
X              * 0.25
X   j m 10     * sets the maximum eta to 10
X   j t 0.7    * sets the history parameter, theta, to 0.7
X.ad
X.fi
X
XThese settings can all be placed on one line:
X
X.ce
Xj d 0.5  e 0.1  k 0.25  m 10  t 0.7
X
XThe version implemented here does not use momentum.
X
X   The idea behind the delta-bar-delta method is to let the program find
Xits own learning rate for each weight.  The `e' sub-command sets the
Xinitial value for each of these learning rates.  When the program sees
Xthat the slope of the error surface averages out to be in the same
Xdirection for several iterations for a particular weight, the program
Xincreases the eta value by an amount, kappa, given by the `k' parameter.
XThe network will then move down this slope faster.  When the program
Xfinds the slope changes signs, the assumption is that the program has
Xstepped over to the other side of the minimum and is nearing the
Xminimum from the opposite side.   Therefore, it cuts down the learning
Xrate, by the decay factor, given by the `d' parameter.  For instance, a
Xd value of 0.5 cuts the learning rate for the weight in half.  The `m'
Xparameter specifies the maximum allowable value for an eta.  The `t'
Xparameter (theta) is used to compute a running average of the slope of
Xthe weight and must be in the range 0 <= t < 1.  The running average at
Xiteration i, a\di\u , is defined as:
X
X.ce
Xa\di\u = (1 - t) slope\di\u + t a\di-1\u,
X
Xso small values for t make the most recent slope more important than
Xthe previous average of the slope.  Determining the learning rate for
Xback-propagation automatically is, of course, very desirable and this
Xmethod often speeds up convergence by quite a lot.  Unfortunately, bad
Xchoices for the delta-bar-delta parameters give bad results and a lot of
Xexperimentation may be necessary.  For more, see "Increased Rates of
XConvergence Through Learning Rate Adaptation" by Robert A. Jacobs, in
X\fINeural Networks\fR, Volume 1, Number 4, 1988.
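X
X   One step of the learning rate adjustment for a single weight can be
Xsketched in Python as follows.  The parameter names follow the `j'
Xsub-commands (kappa, decay, maximum eta, theta), but the function
Xitself is hypothetical, and comparing the current slope with the
Xprevious running average is an assumption about how "the same
Xdirection" is tested.
X
X.na
X.nf
```python
def dbd_step(eta, avg, slope, kappa=0.25, decay=0.5, max_eta=10.0,
             theta=0.7):
    """One delta-bar-delta update for a single weight.

    Returns the new learning rate and the new running average of the slope.
    """
    if avg * slope > 0:
        # Slope agrees with its recent average: increase eta by kappa.
        eta = min(eta + kappa, max_eta)
    elif avg * slope < 0:
        # Slope changed sign: cut eta by the decay factor.
        eta = eta * decay
    new_avg = (1.0 - theta) * slope + theta * avg
    return eta, new_avg


eta, avg = 0.1, 0.0
for slope in (0.4, 0.3, 0.2, -0.5):
    eta, avg = dbd_step(eta, avg, slope)
    print(round(eta, 4), round(avg, 4))
```
X.ad
X.fi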
X
X.ul
XRecurrent Networks
X
X   Recurrent back-propagation networks take values from higher level
Xunits and use them as activation values of lower level units.  This
Xgives a network a simple kind of short-term memory, possibly a little
Xlike human short-term memory.  For instance, suppose you want a network
Xto memorize the two short sequences, "acb" and "bcd".  In the middle of
Xboth of these sequences is the letter, "c".  In the first case you
Xwant a network to take in "a" and output "c".  Then take in "c" and
Xoutput "b".  In the second case you want a network to take in "b" and
Xoutput "c".  Then take in "c" and output "d".  To do this, a network
Xneeds a simple memory of what came before the "c".
X
X   Let the network be a 7-3-4 network where input units 1-4 and output
Xunits 1-4 stand for the letters a-d.  Furthermore, let there be 3 hidden
Xlayer units.  The hidden units will feed their values back down to the
Xinput units 5-7, where they become input for the next step.  To see why
Xthis works, suppose the patterns have been learned by the network.
XInputting the "a" from the first string produces some random pattern of
Xactivation on the hidden layer units and "c" on the output units.  The
Xpattern from the hidden units is copied down to input units 5-7.
XSecond, the letter "c" is presented to the network together with this
Xrandom pattern, now on units 5-7, and the two combine to produce a
Xnew pattern on the hidden layer units.
XHowever, if the "b" from the second string is presented first, there
Xwill be a different random pattern on the hidden layer units.  These
Xvalues are copied to units 5-7, where they
Xcombine with the "c" to produce another random pattern.  This random
Xpattern will be different from the pattern the first string produced.
XThe network can use this difference to give the response "b" for the
Xfirst string and the response "d" for the second string.
XThe training patterns for the network can be:
X
X     1000 000   0010  * "a" prompts the output, "c"
X     0010 hhh   0100  * inputing "c" should produce "b"
X
X     0100 000   0010  * "b" prompts the output, "c"
X     0010 hhh   0001  * inputing "c" should produce "d"
X
Xwhere the first four values on each line are the normal input, the
Xmiddle three either start out all zeros or take their values from the
Xprevious values of the hidden units.  The code for taking these values
Xfrom the hidden layer units is "h".  The last set of values represents
Xthe output that should be produced.  To take values from the third layer
Xof a network, the code is "i".  For the fourth and fifth layers (if they
Xexist) the codes are "j" and "k".  Training recurrent networks can take
Xmuch longer than training standard networks.
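X
X   As an illustration only, with invented activation values, suppose
Xthat presenting the first pattern of "acb" leaves the three hidden
Xunits at 0.2, 0.9 and 0.1.  When the second pattern is presented, each
X"h" is replaced, presumably unit by unit, by the corresponding hidden
Xunit value, so the network in effect sees:
X
X     0010 0.2 0.9 0.1   0100
X
XIf presenting the first pattern of "bcd" instead leaves the hidden
Xunits at, say, 0.8, 0.1 and 0.7, the second pattern of that sequence
Xbecomes:
X
X     0010 0.8 0.1 0.7   0001
X
XThe first four inputs are the same in both cases, and only the copied
Xvalues on units 5-7 tell the two situations apart.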
X
X.ul
XMiscellaneous Commands
X
X   Below is a list of some miscellaneous commands, a short example of
Xeach and a short description of the command.
X
X.IP "   ?   ?       " 15
XA `?' will print program status information.
X
X.IP "   !   !cmd    " 15
XAnything after `!' will be passed on to UNIX as a command to execute.
X
X.IP "   C           " 15
XThe C command will clear the network of values, reset the number of
Xiterations, set the seed to 0 and reset other values so that another
Xrun can be made with a new seed value.
X
X.IP "   E   E 1     " 15
XEntering "E 1" will echo all the input.  "E 0" will stop
Xechoing command input.  The default is to not echo input, since it
Xappears on the screen automatically.  Echoing input is useful when
Xcommands are taken from a file of commands, using the `i' command
Xdescribed below, particularly for locating an error within
Xthe file.
X
X.IP "   i   i f     " 15
XEntering "i f" will read commands from the file, f.  When there are
Xno more commands in the file, the program starts reading from the
Xkeyboard.  (It's very handy to have a set of fixed commands in a file
Xto, in effect, create a new command.)
X
X.IP "   l   l 2     " 15
XEntering "l 2" will print the values of the units on layer 2,
Xor whatever layer is specified.
X
X.IP "   T   T -3    " 15
XIn sbp and srbp only, "T -3" sets all the threshold weights
Xto -3 or whatever value is specified and freezes them at this value.
X
X.IP "   W   W 0.9   " 15
XEntering "W 0.9" will remove (whittle away) all the weights with
Xabsolute values less than 0.9.
X.in-15
X
XIn addition, when a user-generated interrupt occurs (by typing DEL)
Xthe program will drop its current task and take the next command.
X
X.ul
XLimitations
X
X   Weights in the bp and sbp programs are 16-bit integer weights, where
Xthe real value of the weight has been multiplied by 1024.  The integer
Xversions cannot handle weights less than -32 or greater than 31.999.
XWeights are only checked if the Algorithm parameter, l, is set to a
Xvalue greater than 0.  Large learning rates combined with the
Xdifferential step size derivative and the continuous update method can
Xproduce overflow.  There are other places in these programs where
Xcalculations can overflow as well and none of these places are checked.
XOverflow seems highly unlikely in these other places, however.  Input
Xvalues for the integer versions can run from -31.994 to
X31.999.  Due to the method used to implement recurrent connections,
Xinput values in the real version are limited to -31994.0 and above.
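X
X   The weight limits given at the start of this section follow
Xdirectly from the representation.  A 16-bit signed integer runs from
X-32768 to 32767, and dividing by the scale factor of 1024 gives:
X
X.na
X.nf
X     -32768 / 1024 = -32
X      32767 / 1024 =  31.9990234375, or about 31.999
X          1 / 1024 =  0.0009765625, the smallest weight change
X.ad
X.fi
X
XFor example, a real weight of 1.5 is stored as the integer 1536 and
Xa weight of -0.25 is stored as -256.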
END_OF_FILE
if test 34957 -ne `wc -c <'README'`; then
    echo shar: \"'README'\" unpacked with wrong size!
fi
# end of 'README'
fi
echo shar: End of archive 1 \(of 4\).
cp /dev/null ark1isdone
MISSING=""
for I in 1 2 3 4 ; do
    if test ! -f ark${I}isdone ; then
	MISSING="${MISSING} ${I}"
    fi
done
if test "${MISSING}" = "" ; then
    echo You have unpacked all 4 archives.
    rm -f ark[1-9]isdone
else
    echo You still need to unpack the following archives:
    echo "        " ${MISSING}
fi
##  End of shell archive.
exit 0