engeje@uts.uni-c.dk (Jacob Engelbrecht) (03/28/91)
The following preprint is available in postscript form by anonymous ftp:

"Prediction of human mRNA donor and acceptor sites from the DNA sequence".
S. Brunak, J. Engelbrecht and S. Knudsen.
Journal of Molecular Biology, to appear.

Abstract:

Artificial neural networks have been applied to the prediction of splice site location in human pre-mRNA. A joint prediction scheme, where prediction of transition regions between introns and exons regulates a cutoff level for splice site assignment, was able to predict splice site locations with confidence levels far better than previously reported in the literature. The problem of predicting donor and acceptor sites in human genes is hampered by the presence of large numbers of false positives. In the paper, the distribution of these false splice sites is examined and linked to a possible scenario for the splicing mechanism in vivo.

When the presented method detects 95% of the true donor and acceptor sites, it makes less than 0.1% false donor site assignments and less than 0.4% false acceptor site assignments. For the large data set used in this study, this means that on average there are one and a half false donor sites per true donor site and six false acceptor sites per true acceptor site. With the joint assignment method, more than a fifth of the true donor sites and around one fourth of the true acceptor sites could be detected without any accompanying false positive predictions.

Highly confident splice sites could not be isolated with a widely used weight matrix method or by separate splice site networks. A complementary relation between the confidence levels of the coding/non-coding and the separate splice site networks was observed, with many weak splice sites having sharp transitions in the coding/non-coding signal and many stronger splice sites having more ill-defined transitions between coding and non-coding.

Subject category: Genes, under the sub-headings: expression, sequence and structure.
Keywords: Intron-splicing, human genes, exon selection, neural network, computer-prediction.

-----------------------------------------------------------------------

You will need a POSTSCRIPT printer to print the file.

To obtain a copy of the preprint, use anonymous ftp from cheops.cis.ohio-state.edu (here is what the transaction looks like):

    unix> ftp
    ftp> open cheops.cis.ohio-state.edu
    Connected to cheops.cis.ohio-state.edu.
    220 cheops.cis.ohio-state.edu FTP server (Version blah blah) ready.
    Name (cheops.cis.ohio-state.edu:yourname): anonymous
    331 Guest login ok, send ident as password.
    Password: anything
    230 Guest login ok, access restrictions apply.
    ftp> cd pub/neuroprose
    250 CWD command successful.
    ftp> bin
    200 Type set to I.
    ftp> get brunak.netgene.ps.Z
    200 PORT command successful.
    150 Opening BINARY mode data connection for brunak.netgene.ps.Z
    226 Transfer complete.
    local: brunak.netgene.ps.Z remote: brunak.netgene.ps.Z
    ftp> quit
    221 Goodbye.
    unix> uncompress brunak.netgene.ps.Z
    unix> lpr brunak.netgene.ps

Hardcopies are also available:

    S. Brunak and J. Engelbrecht
    Department of Structural Properties of Materials
    Building 307
    The Technical University of Denmark
    DK-2800 Lyngby, Denmark
    brunak@nbivax.nbi.dk
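
A note on the false-positive figures in the abstract: the percentages (0.1% and 0.4%) and the per-site counts (1.5 and 6) can be tied together with a little arithmetic. The sketch below is only a back-of-envelope check and rests on assumptions not stated in the announcement: the "less than" bounds are treated as equalities, the percentages are read as the fraction of non-site positions wrongly assigned, and "per true site" is taken to mean per true site in the data set, not per detected site. The function name is hypothetical.

```python
# Back-of-envelope check of the abstract's false-positive figures.
# Assumptions (not from the announcement): the quoted upper bounds are
# treated as exact values, and the rate is taken relative to non-site
# positions in the data set.

def implied_negatives_per_true(fp_rate, false_per_true_site):
    """Number of non-site positions per true site implied by the figures.

    If a fraction `fp_rate` of non-site positions is wrongly assigned, and
    this yields `false_per_true_site` false assignments per true site, then
    there must be false_per_true_site / fp_rate non-site positions per
    true site.
    """
    return false_per_true_site / fp_rate

# Donor sites: 0.1% false assignments, 1.5 false sites per true site.
donor = implied_negatives_per_true(0.001, 1.5)

# Acceptor sites: 0.4% false assignments, 6 false sites per true site.
acceptor = implied_negatives_per_true(0.004, 6.0)

print(round(donor), round(acceptor))
```

Under these assumptions both figures imply roughly 1500 candidate positions per true site, so the donor and acceptor numbers in the abstract are consistent with each other. It also illustrates why a false-positive rate well below one percent can still leave more false sites than true ones.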