clark@mshri.utoronto.ca (06/05/90)
The Prosite database, compiled by Amos Bairoch at the Centre Medical Universitaire in Geneva, Switzerland, is a great resource for identifying short patterns (or motifs) which are typical of proteins of a certain class or function. Release 5 of the database contains over 300 patterns. The database has been set up to make it relatively easy to write a program that can read the patterns and search for them in a protein of unknown function. It has not been designed to answer the question "What patterns have been identified for this protein of known function?", even though the database contains that information.

To make these data easily accessible, I have written the program Proindex which creates an index of all the proteins in the database. For example, somebody interested in proteolytic enzymes could immediately find that there are five references to "protease" and eight to "proteases", then look up their patterns and associated information. A small part of the index file is shown here as an example:

ANHYDRASES      00146 Carbonic anhydrases signature.
ANION           00192 Anion exchangers family signature 1.
ANION           00192 Anion exchangers family signature 2.
ANNEXINS        00195 Annexins phospholipid/calcium-binding domain signature.
ANTENNAPEDIA-TY 00032 'Homeobox' antennapedia-type protein signature.
ANTIGEN         00265 Proliferating cell nuclear antigen signature.
ARAC            00040 Bacterial activator proteins, araC family signature.
ARGINASE        00135 Arginase signature 2.
ARGINASE        00135 Arginase signature 1.
ARRESTIN        00267 Arrestin signature.

Each line is a maximum of 79 bytes, with three fields. The first field of 15 characters is the keyword, the next field of 5 bytes is the pointer to the Prosite documentation, and the last field is as much of the pattern description from the "DE" line as will fit. The keywords are obtained from the DE line of the data file. (The uninformative words like "site", "signature", "family", etc are screened out during the indexing process.)
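
For readers who want to experiment with the index layout described above, here is a minimal Python sketch. It is not the Proindex source (which is VAX Fortran) and may differ from it in detail. The function names, the single-space separators between fields, the punctuation stripping, and the rule that skips purely numeric tokens are all assumptions made for illustration. The stop-word list holds only the three words named above; the real list is presumably longer.

```python
# Hypothetical sketch of a Proindex-style keyword index (not the original program).
STOP_WORDS = {"SITE", "SIGNATURE", "FAMILY"}  # only the words named in the post
KEY_WIDTH = 15   # keyword field width, from the post
MAX_LINE = 79    # maximum line length, from the post


def index_lines(doc_id, description):
    """Return one index line per keyword found in a DE-line description."""
    lines = []
    for token in description.split():
        # Assumption: surrounding punctuation is dropped, hyphens are kept.
        word = token.strip(".,;:'\"()").upper()
        # Assumption: tokens without letters (e.g. "1") are not keywords.
        if word in STOP_WORDS or not any(ch.isalpha() for ch in word):
            continue
        key = word[:KEY_WIDTH].ljust(KEY_WIDTH)
        # Assumption: fields are separated by a single space.
        line = key + " " + format(doc_id, "05d") + " " + description
        lines.append(line[:MAX_LINE])
    return lines


def build_index(entries):
    """Build a sorted index from (documentation number, description) pairs."""
    lines = []
    for doc_id, description in entries:
        lines.extend(index_lines(doc_id, description))
    return sorted(lines)


def lookup(index, keyword):
    """Return every index line whose keyword field matches the given word."""
    key = keyword.upper()[:KEY_WIDTH]
    return [line for line in index if line[:KEY_WIDTH].rstrip() == key]


# Descriptions taken from the sample index lines shown above.
SAMPLE = [
    (146, "Carbonic anhydrases signature."),
    (192, "Anion exchangers family signature 1."),
    (192, "Anion exchangers family signature 2."),
    (32, "'Homeobox' antennapedia-type protein signature."),
    (135, "Arginase signature 1."),
]

index = build_index(SAMPLE)
for line in lookup(index, "anion"):
    print(line)
```

Because the keyword field is fixed-width and the file is sorted, all lines for one keyword are adjacent, so a simple scan (or a binary search on the first 15 characters) is enough to find every pattern indexed under a word.
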

The source for the program, written in VAX Fortran, the associated files, and a sorted index for Prosite version 5 have been deposited with EMBL. Anyone who would like copies of these and who doesn't have access to the EMBL file server can ask me for them directly.

To my knowledge, this is the second program to take advantage of the Prosite database, the first being Kay Hofmann's for converting it to a format that can be used by the GCG programs. I expect there will be many others as this gem of a database becomes more and more widely distributed. (If others are already available, please let us know about them!)

Steve Clark
clark@mshri.utoronto.ca (Internet)
clark@utoroci (Netnorth/Bitnet)