BAIROCH@CGECMU51.BITNET (Amos Bairoch) (02/07/90)
< I, too, am interested in consensus sequence databases. I have a
<document from EMBL called "PROSITE: A dictionary of protein sites and
<patterns" which seems to have quite a lot of work put into it, but it has
<two major problems. First, it is hardcopy (very disappointing for something
<from the EMBL biocomputing group), and second, it has no table of contents
<or index. So far I haven't managed to find a listing for signal peptides.
<
< If anyone knows of a source of consensus sequence information in
<computer-readable form, could they please post a message to this list?
<
<
<Stephen Clark
<
<clark@mshri.utoronto.ca (Internet)
<sinai@utoroci (Netnorth/Bitnet)

In November I have posted to this list (as well as other lists) a news
bulletin that explained that, while release 4 of PROSITE is available,
in a printed form, future releases will be distributed in a computer
readable form.

In fact, PROSITE has been available in computer-readable form all along,
in my sequence analysis package (PC/Gene). But as I have now decided to
make the data bank public domain, I had to define a new distribution
format, and it was not possible to do so for release 4.

So before flaming on the fact that it is "very disappointing for something
from the EMBL biocomputing group" it would be nice if the info that's
available on the net was read. Furthermore the introduction of the PROSITE
book says that:
"This dictionary will very soon be available, on-line, in Europe on the
 EMBL Data Library file server, and in the U.S.A. on the GenBank on-line
 service computer facility."

---------------------------------------------------------------------
<it has no table of contents

There is a table of contents (pages 5 to 9).
---------------------------------------------------------------------
<So far I haven't managed to find a listing for signal peptides.

And you will never see one in PROSITE, signal peptides are not found using
consensus patterns, but using a matrix (profile) approach, as implemented
in the very reliable method of von Heijne [1] which is available in the
majority of sequence analysis packages.

[1] Von Heijne G.
    A new method for predicting signal sequence cleavage sites.
    Nucleic Acids Res. 14:4683-4690(1986).

In addition to von Heijne, three other publications describe methods for
finding signal peptides that are similar to, or based on, that method.

    Folz R.J., Gordon J.I.
    Computer-assisted predictions of signal peptidase processing sites.
    Biochem. Biophys. Res. Commun. 146:870-877(1987).

    Pascarella S., Bossa F.
    CLEAVAGE: a microcomputer program for predicting signal sequence
    cleavage sites.
    CABIOS 5:53-54(1989).

    Popowicz A.M., Dash P.F.
    SIGSEQ: a computer program for predicting signal sequence cleavage
    sites.
    CABIOS 4:405-406(1988).
    The program for this method is available on the EMBL file server
    (get dos_software:SIGSEQ$.UUE)

---------------------------------------------------------------------
< If anyone knows of a source of consensus sequence information in
<computer-readable form, could they please post a message to this list?

PROSITE release 5.0, will be available on the EMBL file server somewhere
in March.

*****************************************************************************
* Amos Bairoch                * Email: bairoch@cgecmu51                     *
* Dept. Medical Biochemistry  * Tel  : +(41 22) 61 84 92                    *
* CMU                         ***********************************************
* 1, rue Michel Servet        *                                             *
* 1211 Geneva 4               * H(2)O is hot water, CO(2) is cold water     *
* Switzerland                 * --High school chemistry exam response-      *
*****************************************************************************

clark@MSHRI.UTORONTO.CA (02/07/90)
Amos Bairoch (bairoch@cgecmu51), author/compiler/distributor of PROSITE,
writes (quoting me):

/< I, too, am interested in consensus sequence databases. I have a
/<document from EMBL called "PROSITE: A dictionary of protein sites and
/<patterns" which seems to have quite a lot of work put into it, but it has
/<two major problems. First, it is hardcopy (very disappointing for something
/<from the EMBL biocomputing group), and second, it has no table of contents
/<or index. So far I haven't managed to find a listing for signal peptides.
/<
/<Stephen Clark
/
/In November I have posted to this list (as well as other lists) a news
/bulletin that explained that, while release 4 of PROSITE is available,
/in a printed form, future releases will be distributed in a computer
/readable form.

Yes, I saw your message concerning PROSITE, which is how I got hold of it.
Thank you for sending it to me. Actually, I saw your message on one of the
BIONET lists and don't recall seeing it on info-gcg. It probably came
through when our link was down and bounced. I'll never know how many
useful messages I have missed because of this problem.

/So before flaming on the fact that it is "very disappointing for something
/from the EMBL biocomputing group" it would be nice if the info that's
/available on the net was read. Furthermore the introduction of the PROSITE
/book says that:
/"This dictionary will very soon be available, on-line, in Europe on the
/ EMBL Data Library file server, and in the U.S.A. on the GenBank on-line
/ service computer facility."

I'm sorry if you took this comment to be a flame; that wasn't my intention
at all. You have obviously gone to a lot of work to produce this document
that contains loads of useful information. Nevertheless, I _am_
disappointed because, short of keying in all the data by hand, there is no
way for me to hack together something to allow me to search new protein
sequences for any of these motifs.
I'll just have to wait for the computer-readable form. Haven't you heard
that molecular biologists are very impatient?

/---------------------------------------------------------------------
/<it has no table of contents
/
/There is a table of contents (pages 5 to 9).
/---------------------------------------------------------------------

To flame, or not to flame; the temptation is strong. Let me just say, so
the people who read this list and haven't seen PROSITE don't think that
I'm a total idiot, that the table of contents does not mention page
numbers for the patterns.

/<So far I haven't managed to find a listing for signal peptides.
/
/And you will never see one in PROSITE, signal peptides are not found using
/consensus patterns, but using a matrix (profile) approach, as implemented
/in the very reliable method of von Heijne [1] which is available in the
/majority of sequence analysis packages.
/
/[1] Von Heijne G.
/    A new method for predicting signal sequence cleavage sites.
/    Nucleic Acids Res. 14:4683-4690(1986).

Thanks for the reference; I wasn't aware of this work. I'll look it up in
the library as soon as I can, along with the other three you mentioned.
Nevertheless, the distinction between a profile and a pattern is quite
fine, in fact nonexistent to your average molecular biologist, so it
might be a good idea to include this information in your next release.
Given the fact that so much can be gained from having a reasonable idea
whether a protein is cytoplasmic or secreted/membrane-bound, you might
want to use large letters (and put it on the front cover).

/< If anyone knows of a source of consensus sequence information in
/<computer-readable form, could they please post a message to this list?
/
/PROSITE release 5.0, will be available on the EMBL file server somewhere
/in March.

Fantastic! Will it be a flat ascii file so that someone can mail it to me?
I don't have access to ftp. :^(

And how about DNA sequences?
_Everybody_ wants to look for regulatory protein binding sites. I vaguely
recall that there was such a thing on BIONET when they were still alive.
Does anyone know if this is right, and if so, is it in the public domain
(considering they were supported by NIH money), or does it belong to
IntelliGenetics?

Stephen Clark

clark@mshri.utoronto.ca (Internet)
sinai@utoroci (Netnorth/Bitnet)

"We should be quite remiss not to emphasize that despite the popularity of
secondary structural prediction schemes, and the almost ritual performance
of these calculations, the information available from this is of limited
reliability. This is true even of the best methods now known, and much
more so of the less successful methods commonly available in sequence
analysis packages. Running a secondary structure prediction on a
newly-determined sequence just because everyone else does so, is to be
deplored, and the fact that the results of such predictions are generally
ignored is insufficient justification for doing and publishing them."
 - Arthur Lesk, 1988
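
Editorial note on the pattern-versus-profile point raised in this thread:
the difference can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not the von Heijne method and not
PROSITE's own software. The consensus pattern is the N-glycosylation site
N-{P}-[ST]-{P}, used here simply as a familiar example; the weight matrix
(TOY_MATRIX), its scores, and all function names are invented for the
example. A pattern either matches or it does not, whereas a matrix gives
every window of the sequence a score, and the user decides where to draw
the line.

```python
import re

# Consensus pattern N-{P}-[ST]-{P} written as a regular expression.
# The lookahead lets overlapping matches be reported as well.
GLYCO_PATTERN = re.compile(r"(?=N[^P][ST][^P])")

def pattern_hits(seq):
    """Return 0-based start positions where the consensus pattern matches."""
    return [m.start() for m in GLYCO_PATTERN.finditer(seq)]

# Hypothetical 3-column weight matrix: one dict of residue scores per
# position. The numbers are invented for illustration only.
TOY_MATRIX = [
    {"A": 2.0, "V": 1.0},
    {"L": 1.0},
    {"A": 2.0, "G": 1.0},
]
DEFAULT_SCORE = -1.0  # assumed score for residues not listed in a column

def matrix_scores(seq, matrix, default=DEFAULT_SCORE):
    """Score every window of the sequence; return (start, score) pairs."""
    width = len(matrix)
    return [
        (i, sum(col.get(seq[i + j], default) for j, col in enumerate(matrix)))
        for i in range(len(seq) - width + 1)
    ]

def best_window(seq, matrix):
    """Return the (start, score) pair with the highest score."""
    return max(matrix_scores(seq, matrix), key=lambda pair: pair[1])
```

With the pattern, pattern_hits("MKNGTANPSA") reports the single site
starting at position 2 and silently rejects the N-P-S-A stretch. With the
matrix, every window receives a number, so a near miss still shows up as a
middling score instead of disappearing.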