[comp.ai.neural-nets] Neural Net Applications in Chemistry

dtinker@gpu.utcs.utoronto.ca (Prof. David Tinker) (05/10/89)

    An interesting (non-tech) account of Neural Network applications in 
Chemistry appeared in "Chemical and Engineering News", April 24, 1989.  
The article describes papers given at a symposium on NN's held by the 
Division of Computers in Chemistry of the American Chemical Society (the 
article doesn't give details on this symposium).  Besides a thumbnail
sketch of NN theory, several applications are described.
     D.W. Elrod, G.M. Maggiora and R.G. Trenary used the "ANSim" 
commercial NN simulator(*) to predict reaction products involving 
nitrations of monosubstituted benzenes; a 25-element connection table
incorporated data on identity, connectivity and charges of non-H atoms.  
The net, trained by a back-prop algorithm, was tested with 13
compounds - it predicted 10 of the 13 product distributions correctly,
about as well as "three organic chemists at Upjohn".  An "artificial 
intelligence" program (not specified) did not do as well.  No reference 
was cited. (* Source: "Science Applications International Corp", running
on a 386 PC-AT clone with math co-processor).
     D.F. Stubbs described use of a NN program (not described in detail) 
to predict adverse gastrointestinal reactions to non-steroidal
anti-inflammatory drugs, using data on pK, blood half-life, molecular weight
and dosage.  The net was able to predict adverse drug reaction frequency 
to an accuracy of 1%.
     J.D. Bryngelson and J.J. Hopfield discussed use of a NN to predict 
protein secondary structure from data on amino-acid properties.  Here, a 
couple of recent references are given: N. Qian & T.J. Sejnowski, J. Mol. 
Biol. 202(4), 865, 1988;   L.H. Holley & M. Karplus, Proc. Natl. Acad. 
Sci. 86(1), 152, 1989.  Another approach was described by M.N. Liebman.
     If anyone has more details on the meeting or the work described, 
or further references to chemistry/biochemistry applications, please 
post!
------------------------------------------------------------------------
Disclaimer: I'm just learning about all this stuff!
-- 
-----------------------------------------------------------------------------
! David O. Tinker            !  ^ ^  !  UUCP: dtinker@gpu.utcs.utoronto.ca  !
! Department of Biochemistry !< O O >!  BITNET: dtinker@vm.utcs.utoronto.ca !
! University of Toronto      !   ^   !  BIX: dtinker                        !
! TORONTO, Ontario, Canada   ! ##### !  Voice: (416) 978-3636               !
! M5S 1A8                    !       !  And so on.                          !
!                            ! Hi ho !                                      !
-----------------------------------------------------------------------------

ted@nmsu.edu (Ted Dunning) (05/11/89)

alan lapedes and co at los alamos were able to predict whether short
sequences of dna coded for specific proteins with good accuracy using
modified neural nets.

much of the work at lanl using neural net methods has been supplanted
by doyne farmers local approximation method which (for many problems)
is several orders of magnitude more computationally efficient.  the
use of radial basis functions improves the value of this method
considerably (in addition to making the link to nn techniques even
stronger). 

andrew@berlioz (Lord Snooty @ The Giant Poisoned Electric Head) (05/11/89)

"Science News", April 29th, p271 "Neural Network Predicts Reactions".
The key quote for me was "When tested with 13 [..] not in the training set
[of 32] the network predicted [..] proportions within 20% of actual values
in 10 cases. That equals the performance by a small set of human chemists
and beats out by three an existing conventional computer expert system for
predicting reaction outcomes."
		Go Nets!
-- 
Andrew Palfreyman 		USENET: ...{this biomass}!nsc!logic!andrew
National Semiconductor M/S D3969, 2900 Semiconductor Dr., PO Box 58090,
Santa Clara, CA 95052-8090 ; 408-721-4788 		there's many a slip
							'twixt cup and lip

andrew@berlioz (Lord Snooty @ The Giant Poisoned Electric Head) (05/11/89)

In article <TED.89May10130744@kythera.nmsu.edu>, ted@nmsu.edu (Ted Dunning) writes:
> much of the work at lanl using neural net methods has been supplanted
> by doyne farmers local approximation method which (for many problems)
> is several orders of magnitude more computationally efficient.  the
> use of radial basis functions improves the value of this method
> considerably (in addition to making the link to nn techniques even
> stronger). 

This is very interesting, Ted. Could you post some references to the net
on Doyne Farmer's method, and perhaps a brief resume of his approach?
-- 
Andrew Palfreyman 		USENET: ...{this biomass}!nsc!logic!andrew
National Semiconductor M/S D3969, 2900 Semiconductor Dr., PO Box 58090,
Santa Clara, CA 95052-8090 ; 408-721-4788 		there's many a slip
							'twixt cup and lip

aboulang@bbn.com (Albert Boulanger) (05/11/89)

In article  <201@bach.nsc.com> Andrew Palfreyman asks:

  In article <TED.89May10130744@kythera.nmsu.edu>, ted@nmsu.edu (Ted Dunning) writes:
  > much of the work at lanl using neural net methods has been supplanted
  > by doyne farmers local approximation method which (for many problems)
  > is several orders of magnitude more computationally efficient.  the
  > use of radial basis functions improves the value of this method
  > considerably (in addition to making the link to nn techniques even
  > stronger). 

  This is very interesting, Ted. Could you post some references to the net
  on Doyne Farmer's method, and perhaps a brief resume of his approach?
  -- 

I should mention that besides Doyne Farmer's method, James Crutchfield
and Bruce S. McNamara have a method for recovering the equations of
motion from a time series. The reference is:

"Equations of Motion from a Data Series", James Crutchfield & Bruce
McNamara, Complex Systems, 1(1987) 417-452.

Unfortunately, I never did get a reference to Doyne's method.


Chaotically yours,
Albert Boulanger
BBN Systems & Technologies Corp.
aboulanger@bbn.com

ted@nmsu.edu (Ted Dunning) (05/12/89)

In article <201@bach.nsc.com> andrew@berlioz (Lord Snooty @ The Giant Poisoned Electric Head) writes:


   In article <TED.89May10130744@kythera.nmsu.edu>, ted@nmsu.edu (Ted Dunning) writes:
   > much of the work at lanl using neural net methods has been supplanted
   > by doyne farmers local approximation method which (for many problems)
	...

   This is very interesting, Ted. Could you post some references to the net
   on Doyne Farmer's method, and perhaps a brief resume of his approach?


the best reference is the los alamos tech report LA-UR-88-901.  i
assume that this is available from lanl, somehow (i got mine by
hand).  (interestingly, on the last page i note a us gov printing
office number:  1988-0-573-034/80049).  (you might also try doyne
whose net address is jdf@lanl.gov)

an extract of the abstract follows:

Exploiting Chaos to Predict the Future and Reduce Noise

J. Doyne Farmer and John J. Sidorowich

We discuss new approaches to forecasting, noise reduction, and the
analysis of experimental data.  The basic idea is to embed the data in
a state space and then use straightforward numerical techniques to
build a nonlinear dynamical model.  We pick an ad hoc nonlinear
representation, and fit it to the data.  For higher dimensional
problems we find that breaking the domain into neighborhoods using
local approximation is usually better than using an arbitrary global
representation.  When random behavior is caused by low dimensional
chaos our short term forecasts can be several orders of magnitude
better than those of standard linear methods.  We derive error
estimates for the accuracy of approximation in terms of attractor
dimension and Lyapunov exponents, the number of data points, and the
extrapolation time.  We demonstrate that for a given extrapolation
time T iterating a short-term estimate is superior to computing an
estimate for T directly.
	...

We propose a nonlinear averaging scheme for separating noise from
deterministic dynamics.  For chaotic time series the noise reduction
possible depends exponentially on the length of the time series,
whereas for non-chaotic behavior it is proportional to the square
root.  When the equations of motion are known exactly, we can achieve
noise reductions of more than ten orders of magnitude.  When the
equations are not known, the limitation comes from prediction error,
but for low dimensional systems noise reductions of several orders of
magnitude are still possible.

The basic principles underlying our methods are similar to those of
neural nets, but are more straightforward.  For forecasting, we get
equivalent or better results with vastly less computer time.  We
suggest that these ideas can be applied to a much larger class of
problems. 

----------------
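
a minimal sketch of the "embed the data in a state space" step, in
case it helps (my own python illustration, not code from the report;
the embedding dimension m and the delay tau below are arbitrary
choices):

import numpy as np

def delay_embed(x, m=3, tau=1):
    """embed a scalar series x into m-dimensional delay vectors.

    row t is (x[t], x[t-tau], ..., x[t-(m-1)*tau]); the matching
    one-step-ahead target is x[t+1]."""
    x = np.asarray(x, dtype=float)
    start = (m - 1) * tau
    rows = [x[t - np.arange(m) * tau] for t in range(start, len(x) - 1)]
    return np.array(rows), x[start + 1:]

# toy usage: delay vectors and one-step targets from a noisy sine wave
t = np.linspace(0.0, 20.0, 500)
X, y = delay_embed(np.sin(t) + 0.01 * np.random.randn(t.size), m=3, tau=2)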

hope that this helps.

mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel) (05/12/89)

In article <39817@bbn.COM> aboulanger@bbn.com writes:
>In article  <201@bach.nsc.com> Andrew Palfreyman asks:
>
>  In article <TED.89May10130744@kythera.nmsu.edu>, ted@nmsu.edu (Ted Dunning) writes:
>  > much of the work at lanl using neural net methods has been supplanted
>  > by doyne farmers local approximation method which (for many problems)
>  > is several orders of magnitude more computationally efficient.  the
>  > use of radial basis functions improves the value of this method
>  > considerably (in addition to making the link to nn techniques even
>  > stronger). 
>
>  This is very interesting, Ted. Could you post some references to the net
>  on Doyne Farmer's method, and perhaps a brief resume of his approach?
>  -- 
>
>I should mention that besides Doyne Farmer's method, James Crutchfield
>and Bruce S. McNamara have a method for recovering the equations of
>motion from a time series. The reference is:
>
>"Equations of Motion from a Data Series", James Crutchfield & Bruce
>McNamara, Complex Systems, 1(1987) 417-452.
>
>Unfortunately, I never did get a reference to Doyne's method.
>
>
>Chaotically yours,
>Albert Boulanger
>BBN Systems & Technologies Corp.
>aboulanger@bbn.com

The best reference for Farmer's local approximation method is the
LANL preprint:  "Exploiting Chaos to Predict the Future and Reduce Noise"
by Farmer and Sidorowich.

Note that this paper only deals with predicting chaotic dynamical systems.

Having just done a thesis on the same sort of prediction using neural
networks, I might be able to give an outline of various methods.  There
are basically two major categories:

I)  Global methods.  In this scheme, one tries to fit some complex function
to all the data.  There are various functional forms.
    A)  Linear methods.  By "linear" I mean the output of the function
        depends linearly on the _free parameters_; all of these functions
        can represent nonlinear transformations from input to output, of
        course.  If the output is linear in the free parameters (weights),
        there is a well-known guaranteed, optimal & deterministic
	learning algorithm for the least squared error measure.

        1)  Global polynomials.  Basically a Taylor expansion of degree
            d in m unknowns.  This is what Crutchfield & McNamara use.
            Advantages: easy to compute.  Disadvantages: for complicated
            functions, you need large values of d and the number of
            free weights increases very rapidly, approx = d^m.
            High-degree polynomials tend to blow up away from the domains
            on which they are fitted (trained).
	2)  Rational quotients of polynomials.  Similar to above, but if
            the degrees of num. and denom. are the same, they don't blow up.

        3)  Radial basis functions, with fixed centers.  Here you choose
            the centers of your radial functions, and then learn the
            weights of the output layer using linear least squares, for
            example.  This can be put in terms of a network with a single
            hidden layer of neurons, where the first layer of weights is
            fixed, and the second layer of weights leads to a linear
            output unit.  See a recent paper by Broomhead & (somebody-or
            other) in Complex Systems.  There's also a pre-print
            by Martin Casdagli, using this to predict chaotic systems.
            Ed. note:  this method looks promising and should probably
            be investigated further, especially to see whether it works
            in applications other than chaotic systems prediction.
            (A toy sketch of this fixed-center fit follows this outline.)

     B)  Nonlinear methods.  Now, you have to use an iterative
	 gradient-descent type of method.

	1)  Standard sigmoidal back-prop net.  Well known learning algorithm.
        2)  Radial basis functions, with adjustable centers.  (This is what
            I used.)  I found this lets you represent the mapping more
            accurately than a sigmoidal net of the same size (= number of
            free parameters).  Learn with conjugate gradient.
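
To make the fixed-center RBF item (A.3 above) concrete, here's a toy
sketch (my own code, not from Broomhead's paper or Casdagli's preprint;
the Gaussian basis, the width, and reusing a few data points as centers
are arbitrary choices):

import numpy as np

def rbf_design(X, centers, width):
    """Gaussian basis matrix: one column per center, plus a constant."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.exp(-d2 / (2.0 * width ** 2)),
                      np.ones((len(X), 1))])

def fit_rbf_fixed_centers(X, y, centers, width=0.5):
    """Output weights by linear least squares; the centers stay fixed,
    so the fit is linear in the free parameters."""
    w, *_ = np.linalg.lstsq(rbf_design(X, centers, width), y, rcond=None)
    return w

# toy usage: fit y = sin(x) with 20 training points reused as centers
X = np.random.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0])
centers = X[:20]
w = fit_rbf_fixed_centers(X, y, centers)
y_hat = rbf_design(X, centers, 0.5) @ w

Swapping polynomial columns into the design matrix gives the global
polynomial fit of A.1; only the basis changes, not the least-squares
step.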

II)  Local methods.

     Now, instead of trying to fit a global function to all examples,
you make a simple individual fit _every_ time you need an output.
Here's how Farmer's method works:
        Given some input vector, find the N closest points in the
        data base to this input vector.  For each of these stored points,
        you know what the output was.  With these input-output pairs,
        make a linear or quadratic approximation, and fit the free parameters
        using linear least squares.  This should be fast because there aren't
        that many examples (10-30 say) and free parameters.

	"Learning" simply consists of putting the input data base into
a k-d tree that permits you to retrieve nearest neighbors in
O(log N) time, as needed to make predictions.

	This method has a nice intuitive interpretation:  given some new
input, you look around to see what things like that input did, and you
do something like that.  
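
Here's a bare-bones sketch of that local fit (my own toy version, not
Farmer & Sidorowich's code; it uses a brute-force neighbor search where
they use a k-d tree, and the choice of k and of an affine model is
arbitrary):

import numpy as np

def local_linear_predict(X_train, y_train, x_query, k=15):
    """Fit an affine model to the k stored pairs nearest x_query and
    evaluate it there (brute-force search; a k-d tree would make the
    neighbor lookup roughly O(log N))."""
    d2 = ((X_train - x_query) ** 2).sum(axis=1)
    idx = np.argsort(d2)[:k]                          # k nearest neighbors
    A = np.hstack([X_train[idx], np.ones((k, 1))])    # inputs plus a bias
    coef, *_ = np.linalg.lstsq(A, y_train[idx], rcond=None)
    return np.append(x_query, 1.0) @ coef

# toy usage: one-step prediction from 3-dimensional delay vectors
X_train = np.random.uniform(-1.0, 1.0, size=(500, 3))
y_train = np.sin(X_train).sum(axis=1)
print(local_linear_predict(X_train, y_train, np.array([0.1, -0.2, 0.3])))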

Advantages:  Much faster computationally than an iterative gradient-descent
optimization, especially for large data bases of examples.  Doesn't
require any hard crunching to learn some function.  Probably more accurate
for large data bases, too, because most people wouldn't have the patience
or computer power to learn a large network to high accuracy.

Disadvantages:  The mapping isn't in a nice analytic functional form.
You can't realize it in silicon.  You need to carry around the data base
of examples, and making predictions is slower (a search & small fit vs.
a simple functional evaluation).

=================================
Personal opinion:
    If you have a large data base (say over 1K examples) of noise-free
continuous-valued examples, local linear and local quadratic methods
will probably be the best.  I don't know what effect input noise
would have on this method, compared to neural networks, but I don't
think it would be that bad.  For binary values, it may not be as good,
for the whole method is rooted in the field of dynamical systems.
But I don't think anybody's tried yet, so I have no real evidence one
way or the other.
==================

Get the preprint (it's very good) from Doyne Farmer at LANL.  I believe
his e-mail address is "jdf%heretic@lanl.gov".  An earlier version of
this appeared in Physical Review Letters, so it's definitely real.

Matt Kennel
mbkennel@phoenix.princeton.edu

hsg@romeo.cs.duke.edu (Henry Greenside) (05/12/89)

In these discussions of Farmer et al's methods versus
neural nets, has anyone addressed the real issue, how
to treat high-dimensional data?

In his paper, Farmer et al point out the crucial fact
that one can learn only low dimensional chaotic systems
(where low is rather vague, say of dimension less than
about 5). High dimensional systems require huge amounts
of data for learning. Presumably many interesting data
sets (weather, stock markets, chemical patterns, etc.)
are not low-dimensional and neither method will be
useful.

mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel) (05/13/89)

In article <14480@duke.cs.duke.edu> hsg@romeo.UUCP (Henry Greenside) writes:
)In these discussions of Farmer et al's methods versus
)neural nets, has anyone addressed the real issue, how
)to treat high-dimensional data?
)
)In his paper, Farmer et al point out the crucial fact
)that one can learn only low dimensional chaotic systems
)(where low is rather vague, say of dimension less than
)about 5). High dimensional systems require huge amounts
)of data for learning. Presumably many interesting data
)sets (weather, stock markets, chemical patterns, etc.)
)are not low-dimensional and neither method will be
)useful.

Quite true.  This is definitely a fundamental problem.   As the
dimension gets higher and higher, the data series looks more and more
like true random noise, and so prediction becomes impossible.  Note, for
example, that the output of your favorite "random number generator" is
most likely deterministic, but probably has such a high dimension that
you can't predict with any accuracy without knowing the exact algorithm
used.

The real problem comes down to finding a representation with both
smooth mappings and low (fractal) dimensional input spaces.  This
requires plain old hard work and clever insight.

Matt Kennel
mbkennel@phoenix.princeton.edu