RICE@EMBL.BITNET (Peter Rice) (02/06/90)
Stephen Clark asks:

>/Does anybody know, whether there are any files in existing databases,
>/which represent so called consensus-sequences ?

>I, too, am interested in consensus sequence databases. I have a
>document from EMBL called "PROSITE: A dictionary of protein sites and
>patterns" which seems to have quite a lot of work put into it, but it has
>two major problems. First, it is hardcopy (very disappointing for something
>from the EMBL biocomputing group), and second, it has no table of contents
>or index. So far I haven't managed to find a listing for signal peptides.
>If anyone knows of a source of consensus sequence information in
>computer-readable form, could they please post a message to this list?

PROSITE is produced by Amos Bairoch at the University of Geneva. The first
few releases were all produced in document form while the database contents
were being developed. The latest release (release 4, November 1989) was
issued by EMBL in the Biocomputing Technical Document series.

There is a table of contents at the beginning, on pages 5-10, but most of
the document has no page numbers. I have put page numbers in my copy, and
added them to the table of contents at the beginning so I can find my way
around.

Amos is planning to produce an online version of PROSITE, in a similar
format to the SWISSPROT protein sequence database. The exact format
specification has yet to be decided, hence the delay. The database will be
distributed by EMBL together with SWISSPROT, and will include
cross-references between PROSITE and SWISSPROT entries.

Amos Bairoch posted details of the PROSITE database to this list in November
last year. You can get more information about the progress of PROSITE by
sending mail to PROSITE@CGECMU51.bitnet.

PROSITE uses ambiguous amino acid positions (L,I,V,M for example) in the
motif definitions, which at present are not supported by the GCG pattern
matching programs.
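
To illustrate what an ambiguous position means for a pattern matcher, here
is a minimal sketch in Python. The pattern representation (one string of
allowed residues per position, with "x" meaning any residue) is an
assumption made purely for illustration; it is not the PROSITE syntax, which
has yet to be decided. The function names and the example motif are also
hypothetical.

```python
import re

def pattern_to_regex(positions):
    """Compile a list of per-position residue sets into a regular expression.

    Each element is a string of allowed one-letter residue codes for that
    position (for example "LIVM"), or "x" for any residue. This input format
    is an assumption for illustration, not the PROSITE pattern syntax.
    """
    parts = []
    for allowed in positions:
        if allowed.lower() == "x":
            parts.append(".")                          # any residue
        elif len(allowed) == 1:
            parts.append(re.escape(allowed))           # one fixed residue
        else:
            parts.append("[" + re.escape(allowed) + "]")  # ambiguous position
    return re.compile("".join(parts))

def find_motif(sequence, positions):
    """Return the 1-based start of every match, overlapping matches included."""
    regex = pattern_to_regex(positions)
    hits = []
    start = 0
    while True:
        match = regex.search(sequence, start)
        if match is None:
            break
        hits.append(match.start() + 1)
        start = match.start() + 1   # step by one residue to catch overlaps
    return hits

# Made-up motif: (L, I, V or M), then G, then any residue, then (S or T).
motif = ["LIVM", "G", "x", "ST"]
print(find_motif("AALGKSMGRTAA", motif))
```

The search restarts one residue after each hit rather than after the end of
the match, so overlapping occurrences of a motif are also reported.
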
I set up a series of pattern files for FIND from a previous version of
PROSITE, but it was too much work to keep updating it. I am now waiting for
the database distribution before writing a new program (for a future version
of the GCGEMBL package) that understands the PROSITE pattern syntax.

Signal peptides do not fit into the PROSITE database, as the sequences are
too degenerate. EMBL distributes a program SIGCLEAVE in the GCGEMBL package
on the EMBL Network File Server. SIGCLEAVE uses the von Heijne method to
identify signal peptides. The method is reported to be 95% accurate in
locating signal sequences, and 75-80% accurate in identifying the cleavage
site (although I have heard that it may be better than the original
claims :-)

Just send E-mail with the subject HELP SOFTWARE to NETSERV@EMBL.bitnet for
more information on the Network File Server.

Peter Rice

-----------------------------------------------------------------------------
Peter Rice, EMBL                             | Post:
BioComputing Programme                       | European Molecular
EARN/Bitnet: rice@embl.bitnet                | Biology Laboratory
Internet: rice%embl.bitnet@cunyvm.cuny.edu   | Postfach 10-2209
                                             | D-6900 Heidelberg
Phone: +49-6221-387247                       | West Germany