[bionet.general] Splice site recognition with Neural Networks

engeje@uts.uni-c.dk (Jacob Engelbrecht) (03/28/91)

The following preprint is available in PostScript form by anonymous ftp:

"Prediction of human mRNA donor and acceptor sites from the DNA
sequence". S. Brunak, J. Engelbrecht and S. Knudsen.

Journal of Molecular Biology, to appear. 


Abstract: 

Artificial neural networks have been applied to the prediction of
splice site location in human pre-mRNA. A joint prediction scheme, where
prediction of transition regions between introns and exons regulates a
cutoff level for splice site assignment, was able to predict splice site
locations with confidence levels far better than previously reported in
the literature. The problem of predicting donor and acceptor sites in
human genes is hampered by the presence of large numbers of false
positives; in the paper, the distribution of these false splice sites
is examined and linked to a possible scenario for the splicing
mechanism in vivo. When the presented method detects 95% of the true
donor and acceptor sites, it makes less than 0.1% false donor site
assignments and less than 0.4% false acceptor site assignments. For the
large data set used in this study, this means that on average there
are one and a half false donor sites per true donor site and six false
acceptor sites per true acceptor site. With the joint assignment method,
more than a fifth of the true donor sites and around one fourth of the
true acceptor sites could be detected without any accompanying false
positive predictions. High-confidence splice sites could not be
isolated with a widely used weight matrix method or by separate splice
site networks. A complementary relation between the confidence levels
of the coding/non-coding network and the separate splice site networks
was observed: many weak splice sites have sharp transitions in the
coding/non-coding signal, and many stronger splice sites have more
ill-defined transitions between coding and non-coding.
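The abstract gives the false-positive level in two forms: as a rate and as a count per true site. A simple relation connects the two. The following assumptions are not stated in the abstract, which gives neither the denominator nor the data set size:

* The percentages are false assignments taken over the scored non-site positions.
* $r$ is that rate.
* $N_{\text{neg}}$ is the number of non-site positions scored.
* $N_{\text{true}}$ is the number of true sites of the same type.

```latex
\frac{\text{false sites}}{\text{true site}} = r \cdot \frac{N_{\text{neg}}}{N_{\text{true}}}
\quad\Longrightarrow\quad
\left.\frac{N_{\text{neg}}}{N_{\text{true}}}\right|_{\text{donor}} = \frac{1.5}{r_{\text{donor}}} > \frac{1.5}{0.001} = 1500,
\qquad
\left.\frac{N_{\text{neg}}}{N_{\text{true}}}\right|_{\text{acceptor}} = \frac{6}{r_{\text{acceptor}}} > \frac{6}{0.004} = 1500
```

Under that assumption, both pairs of figures imply at least roughly 1,500 scored non-site positions per true site, so the two statements agree with each other. This also shows why a rate below 0.4% can still leave about six false acceptors per true acceptor: non-sites vastly outnumber real sites.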
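The joint scheme in the abstract uses a coding/non-coding transition signal to regulate the splice site cutoff. The minimal sketch below illustrates that idea. Everything concrete in it is hypothetical rather than the paper's published method:

* the function names `transition_strength` and `assign_sites`;
* the window averaging;
* the linear lowering of the cutoff;
* all parameter values.

Network outputs are assumed to be given as per-position scores in [0, 1].

```python
from typing import List, Sequence

# Hypothetical sketch: names, window averaging, and the linear cutoff rule
# are illustrative assumptions, not the method published in the paper.


def transition_strength(coding: Sequence[float], i: int, window: int, kind: str) -> float:
    """Change in mean coding signal across position i.

    A donor site (exon -> intron) should show the coding signal dropping,
    and an acceptor site (intron -> exon) should show it rising. The result
    is positive when the change is in the expected direction.
    """
    if kind not in ("donor", "acceptor"):
        raise ValueError("kind must be 'donor' or 'acceptor'")
    left = list(coding[max(0, i - window):i])
    right = list(coding[i:i + window])
    if not left or not right:
        return 0.0  # too close to a sequence end to judge a transition
    delta = sum(right) / len(right) - sum(left) / len(left)
    return -delta if kind == "donor" else delta


def assign_sites(
    splice_score: Sequence[float],
    coding: Sequence[float],
    kind: str,
    base_cutoff: float = 0.9,
    slope: float = 0.5,
    window: int = 3,
    floor: float = 0.2,
) -> List[int]:
    """Return positions whose splice site score clears a transition-adjusted cutoff.

    The cutoff starts at base_cutoff and is lowered in proportion to the
    strength of the coding/non-coding transition, but never below floor.
    """
    sites = []
    for i, score in enumerate(splice_score):
        t = max(0.0, transition_strength(coding, i, window, kind))
        cutoff = max(floor, base_cutoff - slope * t)
        if score >= cutoff:
            sites.append(i)
    return sites


# Toy example: exon-like signal for positions 0-4, intron-like for 5-9.
coding = [0.9] * 5 + [0.1] * 5
score = [0.1] * 10
score[5] = 0.7  # weak candidate at the exon/intron boundary
score[8] = 0.7  # equally weak candidate inside the intron
print(assign_sites(score, coding, "donor"))             # [5]
print(assign_sites(score, coding, "donor", slope=0.0))  # []
```

In this toy run, the weak donor candidate at position 5 is accepted because the coding signal drops sharply there. An equally scored candidate inside the intron is rejected. With `slope=0.0`, which amounts to a fixed cutoff, neither candidate is accepted. This mirrors the abstract's observation that weak splice sites often coincide with sharp coding/non-coding transitions.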

Subject category: Genes, under the sub-headings: expression, sequence
and structure.

Keywords: Intron-splicing, human genes, exon selection, neural
network, computer-prediction.


-----------------------------------------------------------------------

You will need a PostScript printer to print the file.
To obtain a copy of the preprint, use anonymous ftp from
cheops.cis.ohio-state.edu (the transaction looks like this):

unix> ftp
ftp> open cheops.cis.ohio-state.edu
Connected to cheops.cis.ohio-state.edu.
220 cheops.cis.ohio-state.edu FTP server (Version blah blah) ready.
Name (cheops.cis.ohio-state.edu:yourname): anonymous
331 Guest login ok, send ident as password.
Password: anything 
230 Guest login ok, access restrictions apply.
ftp> cd pub/neuroprose
250 CWD command successful.
ftp> bin  
200 Type set to I.
ftp> get brunak.netgene.ps.Z 
200 PORT command successful.
150 Opening BINARY mode data connection for brunak.netgene.ps.Z 
226 Transfer complete.
local: brunak.netgene.ps.Z remote: brunak.netgene.ps.Z
ftp> quit
221 Goodbye.
unix> uncompress brunak.netgene.ps.Z
unix> lpr brunak.netgene.ps 

Hardcopies are also available:

S. Brunak and J. Engelbrecht
Department of Structural Properties of Materials  
Building 307
The Technical University of Denmark 
DK-2800 Lyngby, Denmark  
brunak@nbivax.nbi.dk