[comp.ai.neural-nets] Neural Nets in Gene Recognition

eesnyder@boulder.Colorado.EDU (Eric E. Snyder) (11/06/89)

I am looking for some references on the application of neural nets to
recognition of genes in DNA sequences.  I have heard second-hand of some
research at LANL but Medline does not seem to be well informed on the
matter....

Thanx,

---------------------------------------------------------------------------
TTGATTGCTAAACACTGGGCGGCGAATCAGGGTTGGGATCTGAACAAAGACGGTCAGATTCAGTTCGTACTGCTG
Eric E. Snyder                            
Department of Biochemistry              Proctoscopy recapitulates   
University of Colorado, Boulder         hagiography.            
Boulder, Colorado 80309                  
LeuIleAlaLysHisTrpAlaAlaAsnGlnGlyTrpAspLeuAsnLysAspGlyGlnIleGlnPheValLeuLeu
---------------------------------------------------------------------------

eesnyder@boulder.Colorado.EDU (Eric E. Snyder) (11/09/89)

Thanks for the several replies providing references on the subject.  
Several more people requested that I forward the information I received
on the subject.  Our mailer bounced several of these replies so I'll 
just post it here (the comp.ai people have probably seen it before):

*************************************************************************

From: ohsu-hcx!spackmank@cse.ogc.edu (Dr. Kent Spackman)
Subject: connectionist protein structure

The two articles I mentioned are:

Holley, L.H.; Karplus, M.  Protein structure prediction with a neural
    network. Proceedings of the National Academy of Sciences, USA; 1989;
    86: 152-156.

Qian, Ning; Sejnowski, Terrence J.  Predicting the secondary structure
    of globular proteins using neural network models.  J Mol Biol; 1988;
    202: 865-884.

    I have an article that will be published in the proceedings of the
Symposium on Computer Applications in Medical Care, in Washington,
D.C., in November, entitled: "Evaluation of Neural Network Performance
by ROC analysis: Examples from the Biotechnology Domain".
Authors are M.L. Meistrell and myself.

Kent A. Spackman, MD PhD
Biomedical Information Communication Center (BICC)
Oregon Health Sciences University
3181 SW Sam Jackson Park Road
Portland, OR 97201-3098

From: Lambert.Wixson@MAPS.CS.CMU.EDU
Subject: DNA,RNA, etc.

Holley and Karplus, Proceedings of the National Academy of Sciences 86,
152-156 (1989).

----

From: mv10801@uc.msc.umn.edu
Subject: Re:  applications to DNA, RNA and proteins

George Wilcox (mf12801@sc.msc.umn.edu) does work on predicting protein
tertiary structure using large backprop nets.

--Jonathan Marshall
  Center for Research in Learning, Perception, and Cognition
  205 Elliott Hall, Univ. of Minnesota, Minneapolis, MN  55455

----

From: munnari!cluster.cs.su.OZ.AU!ray@uunet.UU.NET (Fri Sep 29 23:40:55 1989)
Subject: applications to DNA, RNA and proteins

Borman, Stu "Neural Network Applications In Chemistry Begin to Appear", 
	  C&E News, April 24 1989, pp 24-28.

Thornton, Janet "The shape of things to come?" Nature, Vol. 335 (1st
	  September 1988), pp 10-11.

You probably know about the Qian and Sejnowski paper already.

The Thornton "paper" is a fast overview with a sentence or two comparing
Q&S's work with other work.

Borman's C&E piece is fairly superficial, but it mentions some other people
who have played with this stuff, including Bryngelson and Hopfield, Holley
and Karplus (who apparently have published in Proc. Nat. Acad. Sci., 86(1),
152 (1989)) and Liebman.

The 1990 Spring Symposium at Stanford (March 27-29, 1990) will have a
session on "Artificial Intelligence and Molecular Biology".  The CFP lists
Neural Networks (very broad-minded of them!), so it might be worth a look
when it comes around.

From: "Evan W. Steeg" <steeg@ai.toronto.edu>
Subject: NNets and macromolecules

 There is a fair amount of work on applying neural networks to
questions involving DNA, RNA, and proteins.  The two major
types of application are:

 1) Using neural networks to predict conformation (secondary
structure and/or tertiary structure) of molecules from their
sequence (primary structure).

 2) Using nets to find regularities, patterns, etc. in the sequence
itself, e.g. find coding regions, search for homologies between
sequences, etc.

 The two areas are not disjoint -- one might look for alpha-helix
"signals" in a protein sequence as part of a structure prediction
method, for example.

  I did my M.Sc. on "Neural Network Algorithms for RNA Secondary
Structure Prediction", basically using a modified Hopfield-Tank
(Mean Field Theory) network to perform an energy minimization
search for optimal structures.   A technical report and journal
paper will be out soon.  I'm currently working on applications
of nets to protein structure prediction.  (Reference below).

  Qian and Sejnowski used a feed-forward net to predict local
secondary structure of proteins.  (Reference above).
At least two other groups repeated and extended the Qian &
Sejnowski experiments.  One was Karplus et al (ref. above)
and the other was Cotterill et al in Denmark. (Discussed in
a poster at the Fourth International Symposium on Artificial
Intelligence Systems, Trento, Italy Sept. 1988). 
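
  As a rough illustration of how a protein sequence can be presented to
a feed-forward net for local structure prediction, here is a minimal
sketch of a sliding-window, one-hot input encoding.  The window of 13
residues, the 21-symbol alphabet (20 amino acids plus a spacer), the
function names, and the example sequence are all assumptions made for
illustration; this is not code from any of the papers cited here.

```python
# Illustrative sketch only: window size, alphabet and names are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard residues, one-letter codes
SPACER = "-"                            # fills window positions past the chain ends
ALPHABET = AMINO_ACIDS + SPACER


def one_hot(symbol):
    """Return a 21-element 0/1 list with a single 1 at the symbol's index."""
    vec = [0] * len(ALPHABET)
    vec[ALPHABET.index(symbol)] = 1
    return vec


def window_inputs(sequence, window=13):
    """Yield one flat input vector per residue, built from a centred window."""
    half = window // 2
    padded = SPACER * half + sequence + SPACER * half
    for i in range(len(sequence)):
        chunk = padded[i:i + window]
        # Concatenate the one-hot codes of every position in the window.
        yield [bit for symbol in chunk for bit in one_hot(symbol)]


# Made-up ten-residue sequence, used only to exercise the encoder.
inputs = list(window_inputs("MKTAYIAKQR"))
```

Each residue yields one vector of 13 x 21 = 273 inputs, and a net would
be trained to map that vector to the structure class of the central
residue.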

  Finally, a group in Minnesota used a supercomputer and back-prop 
to try to find regularities in the 2-d distance matrices (distances 
between alpha-carbon atoms in a protein structure).  An interim 
report on this work was discussed at the IJCNN-88 (Wash. DC) conference.
(Sorry, I don't recall the names, but the two researchers were
at the Minnesota Supercomputer Center, I believe.)
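
  For readers unfamiliar with the representation, a distance matrix of
this kind can be sketched in a few lines.  The coordinates and the
function name below are invented for illustration and have nothing to
do with the Minnesota group's actual data or code.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of Euclidean distances between points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


# Hypothetical alpha-carbon coordinates in angstroms, invented for illustration.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
dm = distance_matrix(ca_coords)
```

Entry dm[i][j] is the distance between residues i and j, so the matrix
is symmetric with zeros on the diagonal and does not change when the
molecule is rotated or translated.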

  As for the numerous research efforts in finding signals and patterns
in sequences, I don't have these references handy.  But the work
of Lapedes of Los Alamos comes to mind as an interesting bit
of work.

Refs:

E.W. Steeg.
Neural Network Algorithms for the Prediction of RNA Secondary
Structure.
M.Sc. Thesis, Computer Science Dept., University of Toronto,
Toronto, Ontario, Canada, 1988.

Evan W. Steeg (416) 978-7321      steeg@ai.toronto.edu (CSnet,UUCP,Bitnet)
Dept of Computer Science          steeg@ai.utoronto    (other Bitnet)
University of Toronto,            steeg@ai.toronto.cdn (EAN X.400)
Toronto, Canada M5S 1A4           {seismo,watmath}!ai.toronto.edu!steeg

-----

From: pastor@PRC.Unisys.COM (Jon Pastor)
Subject: Re:  applications to DNA, RNA and proteins

@article(nakata85a,
	Author="K. Nakata and M. Kanehisa and C. DeLisi",
	Title="Prediction of splice junctions in mRNA sequences",
	Journal="Nucleic Acids Research",
	Year="1985",
	Volume="13",
	Pages="5327--5340")
@article(stormo82a,
	Author="G.D. Stormo and T.D. Schneider and L.M. Gold",
	Title="Characterization of translational initiation sites in E. coli",
	Journal="Nucleic Acids Research",
	Year="1982",
	Volume="10",
	Pages="2971--2996")
@article(stormo82b,
	Author="G.D. Stormo and T.D. Schneider and L.M. Gold and A. Ehrenfeucht",
	Title="Use of the `perceptron' algorithm to distinguish translational initiation sites in E. coli",
	Journal="Nucleic Acids Research",
	Year="1982",
	Volume="10",
	Pages="2997--3010")

In addition, there is going to be (I think) a paper by Alan Lapedes, from
Los Alamos, in a forthcoming book published by the Santa Fe Institute;
my group also has a paper in this book, which is how I know about Lapedes'
submission.  I am going to try to contact the editor to see if I can get a
preprint; if so, I'll let you know.  I didn't attend the meeting at which
Lapedes presented his paper, but I'm told that he was looking for splice
junctions.
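
For anyone who has not met the perceptron rule named in the stormo82b
title, a minimal sketch on one-hot encoded DNA follows.  The encoding,
the function names, and the toy sequences are assumptions invented for
illustration; this is the textbook perceptron update, not a
reconstruction of the method in that paper.

```python
BASES = "ACGT"


def encode(seq):
    """One-hot encode a DNA string into a flat list, four inputs per base."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]


def train_perceptron(examples, epochs=100):
    """Textbook perceptron rule: update weights only on misclassified examples.

    `examples` is a list of (sequence, label) pairs with label +1 or -1.
    """
    n = len(encode(examples[0][0]))
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        errors = 0
        for seq, label in examples:
            x = encode(seq)
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if label * score <= 0:          # on or beyond the wrong side
                weights = [w + label * xi for w, xi in zip(weights, x)]
                bias += label
                errors += 1
        if errors == 0:                     # every example is now separated
            break
    return weights, bias


def classify(weights, bias, seq):
    """Return +1 or -1 according to the sign of the weighted sum."""
    score = sum(w * xi for w, xi in zip(weights, encode(seq))) + bias
    return 1 if score > 0 else -1


# Toy data, invented for illustration: positives carry ATG at positions 3-5.
toy = [("GGATGC", 1), ("TAATGA", 1), ("CCATGG", 1),
       ("GGTTCC", -1), ("TACCGA", -1), ("CAGTCG", -1)]
w, b = train_perceptron(toy)
```

Because the toy set is linearly separable, training stops once a full
pass produces no errors, and the learned weights can be read as a
position-specific scoring matrix.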

----

From: ff%FRLRI61.BITNET@CUNYVM.CUNY.EDU (Francoise Fogelman)
Subject: proteins

We have done some work on the prediction of secondary structures
of proteins. This was presented at a NATO meeting (Les Arcs, March 1989)
and will be published in the proceedings.

F. Fogelman
LRI Bat 490
Universite de Paris Sud
91405 ORSAY cedex FRANCE
Tel 33 1 69 41 63 69
e-mail:  ff@lri.lri.fr

----

See also the article "Learning to Predict the Secondary Structure of
Globular Proteins" by N. Qian & T. J. Sejnowski, in the book
"Evolution, Learning and Cognition".