[bionet.molbio.proteins] ENZYME data bank: announcement and description.

BAIROCH@cgecmu51.bitnet (Amos Bairoch) (04/12/90)

ENZYME DATA BANK PRE-ANNOUNCEMENT
=================================

A new "secondary" data bank is being established. It is called the 'ENZYME'
data bank and it contains the following data for each type of enzyme:

  1) EC number.
  2) Recommended name.
  3) Alternative names (if any).
  4) Catalytic activity.
  5) Cofactors (if any).
  6) Pointers to the SWISS-PROT entry or entries corresponding to that
     enzyme (if any).

We think that the ENZYME data bank will be useful to anybody working with
enzymes and will allow programs to be developed that can help with the
creation of new metabolic pathways.

With the ENZYME data bank, the current situation, in terms of data base
interconnections, will now be the following:


          +------+      +------------+
          |      |      |            |  <--> ENZYME
 EPD  <-- | EMBL | <--> | SWISS-PROT |  ---> PDB
          |      |      |            |  <--> PROSITE
          +------+      +------------+


IMPACT ON SWISS-PROT

This new data bank will have the following impact on SWISS-PROT:

  1) The existence of this data bank will make the ECINDEX.TXT document
     obsolete and it will thus be discarded.

  2) Instead of having CC (comments) lines with the topics:

     CC   -!- CATALYTIC ACTIVITY: description_of_catalytic_activity.
     CC   -!- COFACTOR: description_of_cofactor.

     the enzyme entries in SWISS-PROT will have two new types of lines:

     CA   Description_of_catalytic_activity.
     CF   Description_of_cofactor.

     These lines will be carried over from the ENZYME data bank and
     will be automatically generated at each release of SWISS-PROT
     from the information stored in the ENZYME data bank.

     The introduction of the new line types is planned for release 16 of
     SWISS-PROT (October 1990).
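
To make the planned change concrete, here is a minimal Python sketch of how
the CA and CF lines of an ENZYME entry could be carried over into a
SWISS-PROT entry in place of the old CC topic lines. It is an illustration
only, not the program actually used: the function name carry_over, the toy
entries, the assumption that each old CC topic fits on a single line, and
the choice of where the new lines are placed are all hypothetical.

```python
# Hypothetical sketch: swap the old CC topic lines of a SWISS-PROT entry
# for the CA/CF lines taken from the matching ENZYME entry.
OLD_TOPICS = ("CC   -!- CATALYTIC ACTIVITY:", "CC   -!- COFACTOR:")
NEW_CODES = ("CA   ", "CF   ")


def carry_over(swissprot_lines, enzyme_lines):
    """Return a copy of swissprot_lines with CA/CF lines carried over."""
    # The CA and CF lines are copied unchanged from the ENZYME entry.
    new_lines = [ln for ln in enzyme_lines if ln.startswith(NEW_CODES)]
    result = []
    inserted = False
    for ln in swissprot_lines:
        if ln.startswith(OLD_TOPICS):
            # Assumption: the new lines take the place of the first old
            # topic line; further old topic lines are simply dropped.
            if not inserted:
                result.extend(new_lines)
                inserted = True
            continue
        result.append(ln)
    if not inserted:
        # Assumption: with no old topic lines present, the new lines go
        # just before the "//" terminator (or at the end of the entry).
        ends_with_terminator = bool(result) and result[-1].strip() == "//"
        pos = len(result) - 1 if ends_with_terminator else len(result)
        result[pos:pos] = new_lines
    return result


# Toy entries, heavily abbreviated for illustration.
swissprot = [
    "ID   AMD$BOVIN",
    "CC   -!- CATALYTIC ACTIVITY: description_of_catalytic_activity.",
    "CC   -!- COFACTOR: description_of_cofactor.",
    "//",
]
enzyme = [
    "ID   1.14.17.3",
    "CA   Description_of_catalytic_activity.",
    "CF   Description_of_cofactor.",
    "//",
]
print("\n".join(carry_over(swissprot, enzyme)))
```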


CREATION AND MAINTENANCE

How will this data bank be created and maintained?

The majority of the data in the ENZYME data bank comes from the IUPAC/IUB
1984 enzyme nomenclature book [1] and its two supplements (1986 and 1989)
[2,3].

Unfortunately, these documents do not seem to be available on any computer
medium, and we were forced to type in the information relevant to all the
different enzymes which are represented in SWISS-PROT. There are 3056
different EC numbers; the information concerning 30% of these enzymes is
already entered. We have decided to type in the rest of the data (optical
reading of the documents has been attempted, but is not reliable enough).

The full data bank will probably be available in late autumn. Preliminary
versions will be distributed along with SWISS-PROT, starting with the next
release (release 14 in mid-April).

This data bank will be very easy to maintain. Except for error corrections,
or new information concerning cofactors, updates of the enzyme list will
only occur when a new supplement is published (every two or three years).
The pointers to SWISS-PROT are not a problem either: the program that used
to build the ECINDEX file now automatically creates the DR lines in the
ENZYME data bank. This program will be run at every release of SWISS-PROT.
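
As an illustration of that last step, the following Python sketch formats a
list of SWISS-PROT cross-references as DR lines. It is not the actual
program: the function name build_dr_lines is hypothetical, and the layout
(three references per line, entry names padded to 10 characters) is only
inferred from the single sample entry given further down.

```python
# Hypothetical sketch of DR line generation for one ENZYME entry.
def build_dr_lines(refs, per_line=3, name_width=10):
    """Format (accession, entry_name) pairs as DR lines.

    per_line and name_width are assumptions read off the sample entry.
    """
    # Each reference becomes "ACCESSION, NAME;" with the name left-justified.
    items = [f"{acc}, {name:<{name_width}};" for acc, name in refs]
    # Group the references per_line at a time, separated by two spaces.
    return [
        "DR   " + "  ".join(items[i:i + per_line])
        for i in range(0, len(items), per_line)
    ]


refs = [
    ("P10731", "AMD$BOVIN"),
    ("P14925", "AMD$RAT"),
    ("P08478", "AMD1$XENLA"),
    ("P12890", "AMD2$XENLA"),
]
for line in build_dr_lines(refs):
    print(line)
```

Run on the four references of the sample entry, this reproduces its two DR
lines.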


PRELIMINARY FORMAT DESCRIPTION

Global format: EMBL/SWISS-PROT like.

Line-types:

ID  Identification line.
    Contains the EC number of the enzyme.
DE  Description line(s).
    Contains the recommended name of the enzyme.
AN  Alternative name(s) line(s).
    Contains the alternative name(s) of the enzyme.
CA  Catalytic activity line(s).
    Contains the description of the catalytic activity. The format used
    is that of IUPAC/IUB.
CF  Cofactor(s) line(s).
    Description of known cofactors.
CC  Comments line(s).
    Free text comments.
DR  Data bank cross-reference line(s).
    Cross-reference to the SWISS-PROT entries corresponding to the enzyme
    described.
//  Entry termination line.


SAMPLE ENTRY

ID   1.14.17.3
DE   PEPTIDYLGLYCINE MONOOXYGENASE.
AN   PEPTIDYL ALPHA-AMIDATING ENZYME.
CA   PEPTIDYLGLYCINE + ASCORBATE + O(2) = PEPTIDYL(2-HYDROXYGLYCINE) +
CA   DEHYDROASCORBATE + H(2)O.
CC   THE PRODUCT IS UNSTABLE AND DISMUTATES TO GLYOXYLATE AND THE
CC   CORRESPONDING DESGLYCINE PEPTIDE AMIDE.
CF   COPPER.
DR   P10731, AMD$BOVIN ;  P14925, AMD$RAT   ;  P08478, AMD1$XENLA;
DR   P12890, AMD2$XENLA;
//
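
Since the format is line-oriented, entries like the one above should be easy
to read by program. The Python sketch below is one possible way to do it,
written against this preliminary format only. The names parse_enzyme and
_finish, the dictionary keys, and the assumptions that the data start in
column 6 and that each AN line holds one alternative name are illustrative
choices, not part of the data bank definition.

```python
# Hypothetical reader for the preliminary ENZYME flat-file format.
from collections import defaultdict

SAMPLE = """\
ID   1.14.17.3
DE   PEPTIDYLGLYCINE MONOOXYGENASE.
AN   PEPTIDYL ALPHA-AMIDATING ENZYME.
CA   PEPTIDYLGLYCINE + ASCORBATE + O(2) = PEPTIDYL(2-HYDROXYGLYCINE) +
CA   DEHYDROASCORBATE + H(2)O.
CC   THE PRODUCT IS UNSTABLE AND DISMUTATES TO GLYOXYLATE AND THE
CC   CORRESPONDING DESGLYCINE PEPTIDE AMIDE.
CF   COPPER.
DR   P10731, AMD$BOVIN ;  P14925, AMD$RAT   ;  P08478, AMD1$XENLA;
DR   P12890, AMD2$XENLA;
//
"""


def _finish(fields):
    """Turn the collected lines of one entry into a dictionary."""
    refs = []
    for dr_line in fields["DR"]:
        # Each DR item looks like "ACCESSION, ENTRY_NAME;".
        for item in dr_line.split(";"):
            if item.strip():
                accession, name = (part.strip() for part in item.split(","))
                refs.append((accession, name))
    return {
        "ec_number": " ".join(fields["ID"]),
        "name": " ".join(fields["DE"]),
        "alternative_names": list(fields["AN"]),
        # Continuation lines are joined with a single space.
        "catalytic_activity": " ".join(fields["CA"]),
        "cofactors": " ".join(fields["CF"]),
        "comments": " ".join(fields["CC"]),
        "swissprot": refs,
    }


def parse_enzyme(text):
    """Yield one dictionary per entry terminated by a '//' line."""
    fields = defaultdict(list)
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line == "//":
            yield _finish(fields)
            fields = defaultdict(list)
        else:
            # Two-letter line code, then data from column 6 onwards.
            fields[line[:2]].append(line[5:].strip())


entry = next(parse_enzyme(SAMPLE))
print(entry["ec_number"], entry["name"])
print(entry["swissprot"])
```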


[1] Enzyme Nomenclature, NC-IUB, Academic Press, New York, (1984).
[2] Supp. 1: Corrections and Additions, Eur. J. Biochem. 157:1-26(1986).
[3] Supp. 2: Corrections and Additions, Eur. J. Biochem. 179:489-533(1989).

----------------------------------

This is a pre-announcement; feedback is welcome and encouraged.

*****************************************************************************
* Amos Bairoch                * Email: bairoch@cgecmu51                     *
* Dept. Medical Biochemistry  * Tel  : +(41 22) 61 84 92                    *
* CMU                         ***********************************************
* 1, rue Michel Servet        * Greer's third law:                          *
* 1211 Geneva 4               * To err is human, but to really foul things  *
* Switzerland                 * up you need a computer.                     *
*****************************************************************************