[bit.listserv.info-gcg] ProSite database for consensus sequences

RICE@EMBL.BITNET (Peter Rice) (02/06/90)

Stephen Clark asks:

>/Does anybody know, whether there are any files in existing databases,
>/which represent so called consensus-sequences ?

>I, too, am interested in consensus sequence databases. I have a
>document from EMBL called "PROSITE: A dictionary of protein sites and
>patterns" which seems to have quite a lot of work put into it, but it has
>two major problems. First, it is hardcopy (very disappointing for something
>from the EMBL biocomputing group), and second, it has no table of contents
>or index. So far I haven't managed to find a listing for signal peptides.

>If anyone knows of a source of consensus sequence information in
>computer-readable form, could they please post a message to this list?


  PROSITE is produced by Amos Bairoch at the University of Geneva. The
  first few releases were all produced in document form while the
  database contents were being developed. The latest release (release 4,
  November 1989) was issued by EMBL in the Biocomputing Technical
  Document series. There is a table of contents at the beginning, on
  pages 5-10, but most of the document has no page numbers. I have put
  page numbers in my copy, and added them to the table of contents at
  the beginning so I can find my way around.

  Amos is planning to produce an online version of PROSITE, in a similar
  format to the SWISSPROT protein sequence database and to be
  distributed with SWISSPROT. The exact format specification has yet to
  be decided, hence the delay. The database will be distributed by EMBL
  together with SWISSPROT, and will include cross-references between
  PROSITE and SWISSPROT entries.

  Amos Bairoch posted details of the PROSITE database to this list in
  November last year. You can get more information about the progress of
  PROSITE by sending mail to PROSITE@CGECMU51.bitnet

  PROSITE uses ambiguous amino acid positions (L,I,V,M for example) in
  the motif definitions, which at present are not supported by the GCG
  pattern matching routines/programs. I set up a series of pattern files
  for FIND from a previous version of PROSITE, but it was too much work
  to keep updating it. I am now waiting for the database distribution
  before writing a new program (for a future version of the GCGEMBL
  package) that understands the PROSITE pattern syntax.
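
  To illustrate what matching ambiguous positions involves, here is a
  minimal sketch in Python. It is not the PROSITE pattern syntax (which
  has not been fixed yet) and not the planned GCGEMBL program. The
  list-of-residue-sets representation, the "x" wildcard, the function
  names and the example motif are all assumptions made for illustration.

```python
import re


def pattern_to_regex(positions):
    """Build a compiled regex from a list of per-position residue sets.

    Each element is a string of allowed one-letter amino acid codes.
    The lowercase "x" wildcard (any residue) is an assumption of this
    sketch, not a documented PROSITE convention.
    """
    parts = []
    for allowed in positions:
        if allowed == "x":
            parts.append(".")
        elif len(allowed) == 1:
            parts.append(re.escape(allowed))
        else:
            # An ambiguous position becomes a character class, e.g. [ILMV].
            parts.append("[" + "".join(sorted(set(allowed))) + "]")
    return re.compile("".join(parts))


def find_motif(positions, sequence):
    """Return the 0-based start offsets of every match, overlaps included."""
    regex = pattern_to_regex(positions)
    # A zero-width lookahead reports overlapping occurrences as well.
    lookahead = re.compile("(?=" + regex.pattern + ")")
    return [m.start() for m in lookahead.finditer(sequence)]


# Hypothetical motif: one of L, I, V or M, then any residue, then G.
motif = ["LIVM", "x", "G"]
hits = find_motif(motif, "MKLAGTVSGIPG")
print(hits)  # [2, 6, 9]
```

  A fixed-string search cannot express the first position of this motif
  without listing four separate patterns, which is why ambiguity has to
  be handled by the matcher itself.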

  Signal peptides do not fit into the PROSITE database, as the sequences
  are too degenerate. EMBL distributes a program SIGCLEAVE in the
  GCGEMBL package on the EMBL Network File Server. SIGCLEAVE uses the
  von Heijne method to identify signal peptides. The method is reported
  to be 95% accurate in locating signal sequences, and 75-80% accurate
  in identifying the cleavage site (although I have heard that it may be
  better than the original claims :-). Just send E-mail with the subject
  HELP SOFTWARE to NETSERV@EMBL.bitnet for more information on the Network
  File Server.
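
  For readers unfamiliar with this kind of method, the sketch below shows
  the general idea of scoring a sliding window with a position weight
  matrix, which is my understanding of how the von Heijne approach works.
  It is not the SIGCLEAVE code. The function name, the four-position
  window and every weight in the toy matrix are invented for illustration
  and are not the published values.

```python
def best_cleavage_site(sequence, weights, before):
    """Slide a weight matrix along a sequence and return (score, site).

    weights: one dict per window position, mapping residue -> score.
    before:  number of window positions upstream of the cleavage point.
    site:    0-based index of the first residue after the cleavage point.
    Returns None if the sequence is shorter than the window.
    """
    width = len(weights)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues absent from a column contribute nothing to the score.
        score = sum(col.get(res, 0.0) for col, res in zip(weights, window))
        if best is None or score > best[0]:
            best = (score, start + before)
    return best


# Toy matrix (made-up numbers): three positions before the cleavage
# point and one after, rewarding small residues at -3 and -1.
toy_weights = [
    {"A": 1.0, "V": 0.5},  # position -3
    {},                    # position -2
    {"A": 1.0, "G": 0.5},  # position -1
    {},                    # position +1
]

print(best_cleavage_site("MKLLLAVAQE", toy_weights, before=3))  # (2.0, 8)
```

  With real weights the highest-scoring window marks the predicted
  cleavage site, and a threshold on the score decides whether the
  sequence is called a signal peptide at all.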

                                        Peter Rice

 -----------------------------------------------------------------------------
 Peter Rice, EMBL                             | Post: BioComputing Programme
                                              |       European Molecular
 EARN/Bitnet: rice@embl.bitnet                |             Biology Laboratory
 Internet: rice%embl.bitnet@cunyvm.cuny.edu   |       Postfach 10-2209
                                              |       D-6900 Heidelberg
 Phone:   +49-6221-387247                     |       West Germany