BAIROCH@cgecmu51.bitnet (04/07/89)
-----------------------------------------------------------------------------
SWISS-PROT bulletin board
Concern : SWISS-PROT release 10.
Category: Release notes.
Date : March 5, 1989
-----------------------------------------------------------------------------
1. Introduction
1.1 Evolution
Release 10.0 of SWISS-PROT contains 10008 sequence entries,
comprising 2'952'613 amino acids abstracted from 9920
references. This represents an increase of 15% over release
9.0. The recent growth of the data bank is summarised
below:
Release   Date    Number of entries   Nb of amino acids
  3.0     11/86          4160               969 641
  4.0     04/87          4387             1 036 010
  5.0     09/87          5205             1 327 683
  6.0     01/88          6102             1 653 982
  7.0     04/88          6821             1 885 771
  8.0     08/88          7724             2 224 465
  9.0     11/88          8702             2 498 140
 10.0     03/89         10008             2 952 613
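The growth figures can be rechecked from the table with a few lines of Python. This is only a sketch: the numbers are copied from the table above, and the names RELEASES and percent_increase are illustrative, not part of any SWISS-PROT tool.

```python
# (release, number of entries, number of amino acids), copied from the table above.
RELEASES = [
    ("3.0", 4160, 969_641),
    ("4.0", 4387, 1_036_010),
    ("5.0", 5205, 1_327_683),
    ("6.0", 6102, 1_653_982),
    ("7.0", 6821, 1_885_771),
    ("8.0", 7724, 2_224_465),
    ("9.0", 8702, 2_498_140),
    ("10.0", 10008, 2_952_613),
]


def percent_increase(old, new):
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


# Growth of each release over the previous one: release -> (entries %, amino acids %).
growth = {}
for (_, e0, a0), (rel, e1, a1) in zip(RELEASES, RELEASES[1:]):
    growth[rel] = (percent_increase(e0, e1), percent_increase(a0, a1))

for rel, (entries_pct, residues_pct) in growth.items():
    print(f"{rel:>5}: entries +{entries_pct:.1f}%  amino acids +{residues_pct:.1f}%")
```

For release 10.0 the entry count grows by about 15%, which matches the figure quoted in the text; the amino acid count grows somewhat faster.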
1.2 Source of data
Release 10.0 has been updated using protein sequence data
from release 19.0 of the PIR (Protein Identification
Resource) protein data bank, as well as translation of
nucleotide sequence data from release 18.0 of the EMBL
nucleotide sequence Data Library.
As an indication of the source of the sequence data in the
SWISS-PROT data bank, we list here the statistics concerning
the DR (Databank Reference) pointer lines:
Entries with pointer(s) to only PIR entry(ies): 3009
Entries with pointer(s) to only EMBL entry(ies): 3383
Entries with pointer(s) to both EMBL and PIR entry(ies): 2973
Entries with no pointer lines (entered in house): 643
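As a quick consistency check, the four categories above should add up to the total number of entries in the release. A minimal sketch, with the counts copied from the list above and purely illustrative names:

```python
# Counts copied from the DR pointer statistics above.
DR_STATS = {
    "PIR only": 3009,
    "EMBL only": 3383,
    "EMBL and PIR": 2973,
    "no pointer lines": 643,
}
TOTAL_ENTRIES = 10008  # number of entries in release 10.0

total = sum(DR_STATS.values())
# Share of each category, in percent of all entries.
shares = {name: count / TOTAL_ENTRIES * 100.0 for name, count in DR_STATS.items()}

for name, pct in shares.items():
    print(f"{name:<18} {pct:5.1f}%")
print("categories add up to the release total:", total == TOTAL_ENTRIES)
```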
2. Description of the changes made to SWISS-PROT since
release 9
2.1 Sequences and annotations
Some 1306 new sequences have been added since the last
release; the sequence data of 159 existing entries have been
updated, and the annotations of 1699 entries have been
revised. In particular, we have used review articles to
update the annotations of the following groups or families
of proteins:
Aerolysin type toxins
Asparaginases / Glutaminases
Aspartyl proteases
ATP-binding proteins 'active transport' family
Bacterial and fungal ribonucleases
Bowman-Birk serine protease inhibitors
Cadherins
Calcitonins
Clathrin light chains
Crystallins beta and gamma
Cytosine-specific DNA methylases
Colony stimulating factors
Glucagon / GIP / secretin / VIP family
E1-E2 ATPases
Crystalline entomocidal toxin proteins
Fos/jun proteins family
Galactose-1-phosphate uridyl transferase
Glutathione S-transferase
Herpes and Varicella Zoster viruses proteins
Int-1 family
Interferons alpha and beta
Interferon-induced proteins
Kazal serine protease inhibitors
Lipoxygenases
LysR bacterial activator proteins family
Manganese and Iron superoxide dismutases
Myb family proteins
Myc family proteins
Nicotinic acetylcholine receptors
Pancreatic hormone / Neuropeptide Y family
Pancreatic ribonucleases
Peptidyl-prolyl cis-trans isomerase (PPIase) (cyclophilin)
Platelet factor 4 family
Shiga/Ricin ribosomal inactivating toxins
Somatotropin, prolactin and related hormones
Squash family of serine protease inhibitors
Thymidylate synthase
TNF alpha and beta
Topoisomerases type II
Tropomyosins
2.2 New format for the date (DT) line type
The format of the DT line has been changed and is now:
DT DD-MMM-YYYY (REL. XX, COMMENT)
where DD is the day, MMM the month, YYYY the year, and XX
the SWISS-PROT release number. The comment portion of the
line indicates the action taken on that date. Each entry
always contains three DT lines, each associated with a
specific comment:
- The first DT line indicates when the entry first
appeared in the data bank. The associated comment is
"CREATED".
- The second DT line indicates when the sequence data was
last modified. The associated comment is "LAST SEQUENCE
UPDATE".
- The third DT line indicates when any data other than the
sequence was last modified. The associated comment is
"LAST ANNOTATION UPDATE".
Example of a block of DT lines:
DT 01-JAN-1988 (REL. 06, CREATED)
DT 01-AUG-1988 (REL. 08, LAST SEQUENCE UPDATE)
DT 01-MAR-1989 (REL. 10, LAST ANNOTATION UPDATE)
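A block of DT lines in the new format could be parsed along the following lines. This is a sketch under assumptions, not an official parser: it assumes three-letter upper-case English month abbreviations (as in the example) and tolerates any amount of white space after the line code; the names DT_PATTERN and parse_dt_line are hypothetical.

```python
import re
from datetime import date

# DT DD-MMM-YYYY (REL. XX, COMMENT)
DT_PATTERN = re.compile(
    r"^DT\s+(\d{2})-([A-Z]{3})-(\d{4})\s+\(REL\. (\d+), ([A-Z ]+)\)\s*$"
)

# Assumption: months are abbreviated as in the example (JAN, AUG, MAR, ...).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}


def parse_dt_line(line):
    """Return (date, release number, comment) for one DT line."""
    match = DT_PATTERN.match(line)
    if match is None or match.group(2) not in MONTHS:
        raise ValueError(f"not a valid DT line: {line!r}")
    day, month, year, release, comment = match.groups()
    return date(int(year), MONTHS[month], int(day)), int(release), comment


block = [
    "DT 01-JAN-1988 (REL. 06, CREATED)",
    "DT 01-AUG-1988 (REL. 08, LAST SEQUENCE UPDATE)",
    "DT 01-MAR-1989 (REL. 10, LAST ANNOTATION UPDATE)",
]
created, sequence_update, annotation_update = [parse_dt_line(l) for l in block]
print(created, sequence_update, annotation_update)
```

Because the three comments are fixed, a reader of the data bank can rely on the order of the DT lines or on the comment text, whichever is more convenient.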
2.3 Extension of the taxonomic classification in the OC
lines
In previous releases of SWISS-PROT the OC (Organism
Classification) lines only contained the first node of the
taxonomic tree (PROKARYOTA, EUKARYOTA or VIRIDAE). Starting
with release 10 we are implementing a full taxonomic
classification. In release 10, 164 different taxonomic
nodes have been defined. The list of these nodes is
available in the SPECLIST.TXT document file.
2.4 New topic for the comments (CC) line type
As of release 10 we have added a new 'topic' for the
comments (CC) line-type: COFACTOR, which is used to
describe enzyme cofactor(s). Example of its usage:
CC -!- COFACTOR: REQUIRES PYRIDOXAL PHOSPHATE.
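A program reading the comment lines might separate the topic from its text as sketched below. The pattern is an assumption based only on the example above (line code, "-!-" marker, upper-case topic, colon); continuation lines of a multi-line comment are not handled, and parse_cc_topic is a hypothetical name.

```python
import re

# CC -!- TOPIC: free text
CC_TOPIC = re.compile(r"^CC\s+-!-\s+([A-Z][A-Z ]*):\s*(.*)$")


def parse_cc_topic(line):
    """Return (topic, text) for a CC line that starts a topic, else None."""
    match = CC_TOPIC.match(line.rstrip())
    if match is None:
        return None
    return match.group(1), match.group(2)


result = parse_cc_topic("CC -!- COFACTOR: REQUIRES PYRIDOXAL PHOSPHATE.")
print(result)
```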
2.5 New feature key
A new feature key has been introduced in this release:
TRANSIT, which describes the extent of a transit peptide
(mitochondrial or chloroplastic). Examples of TRANSIT key
feature lines:
FT TRANSIT 1 25 MITOCHONDRION.
FT TRANSIT 1 42 CHLOROPLAST.
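The TRANSIT lines can be read in the same spirit. The sketch below assumes white-space separated fields and plain integer positions, as in the two examples; the real FT line layout may be stricter (fixed columns), and the function names are illustrative only.

```python
def parse_ft_line(line):
    """Return (key, start, end, description) for one FT line."""
    fields = line.split(None, 4)
    if len(fields) != 5 or fields[0] != "FT":
        raise ValueError(f"not a valid FT line: {line!r}")
    _, key, start, end, description = fields
    # Drop surrounding white space and the final period of the description.
    return key, int(start), int(end), description.strip().rstrip(".")


def feature_length(start, end):
    """Number of residues covered by a feature (both ends included)."""
    return end - start + 1


for line in ("FT TRANSIT 1 25 MITOCHONDRION.", "FT TRANSIT 1 42 CHLOROPLAST."):
    key, start, end, description = parse_ft_line(line)
    print(key, description, feature_length(start, end), "residues")
```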
3. THE NEXT RELEASE
SWISS-PROT release 11.0 will be available in June 1989.
4. WE NEED YOUR HELP !
We welcome any feedback from our users. We would especially
appreciate being notified if you find that sequences
belonging to your field of expertise are missing from the
data bank. We would also like to be notified about
annotations that need updating, for example if the function
of a protein has been clarified or if new post-translational
information has become available.
APPENDIX A: SOME STATISTICS
A.1 Amino acid composition
COMPOSITION IN PERCENT FOR THE COMPLETE DATA BANK
Ala (A) 7.77 Gln (Q) 4.11 Leu (L) 9.08 Ser (S) 7.00
Arg (R) 5.23 Glu (E) 6.15 Lys (K) 5.81 Thr (T) 5.84
Asn (N) 4.36 Gly (G) 7.30 Met (M) 2.26 Trp (W) 1.35
Asp (D) 5.21 His (H) 2.29 Phe (F) 3.94 Tyr (Y) 3.22
Cys (C) 1.89 Ile (I) 5.30 Pro (P) 5.21 Val (V) 6.51
Asx (B) 0.01 Glx (Z) 0.01 Xaa (X) 0.03
CLASSIFICATION OF THE AMINO ACIDS BY THEIR FREQUENCY
Leu, Ala, Gly, Ser, Val, Glu, Thr, Lys, Ile, Arg, Asp, Pro,
Asn, Gln, Phe, Tyr, His, Met, Cys, Trp
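The classification by frequency follows directly from the composition table. A minimal sketch, with the percentages copied from the table above (Asx, Glx and Xaa left out) and illustrative names:

```python
# Percent composition of the 20 standard amino acids, copied from the table above.
COMPOSITION = {
    "Ala": 7.77, "Arg": 5.23, "Asn": 4.36, "Asp": 5.21, "Cys": 1.89,
    "Gln": 4.11, "Glu": 6.15, "Gly": 7.30, "His": 2.29, "Ile": 5.30,
    "Leu": 9.08, "Lys": 5.81, "Met": 2.26, "Phe": 3.94, "Pro": 5.21,
    "Ser": 7.00, "Thr": 5.84, "Trp": 1.35, "Tyr": 3.22, "Val": 6.51,
}

# Most frequent first. sorted() is stable, so Asp and Pro (both 5.21)
# keep the order in which they appear in the dictionary.
ranking = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
print(", ".join(ranking))
```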
A.2 Distribution of the sequences by their organism of
origin
Total number of species represented in this release of the
data bank: 1590
Species represented       1x: 741
                          2x: 291
                          3x: 163
                          4x:  99
                          5x:  58
                          6x:  45
                          7x:  27
                          8x:  28
                          9x:  30
                         10x:  13
                     11- 20x:  43
                     21-100x:  42
                       >100x:  10
TABLE OF THE MOST REPRESENTED SPECIES
918: HUMAN 173: DROME 71: BPT4 59: VACCV
838: ECOLI 143: RABIT 70: HSV11 57: WHEAT
497: MOUSE 126: PIG 69: TOBAC 54: BPT7
414: RAT 93: XENLA 67: VZVD 53: SOYBN
324: YEAST 84: BACSU 62: LAMBD 53: SHEEP
282: BOVIN 80: SALTY 60: MARPO
176: CHICK 74: MAIZE 59: EPBAR
A.3 Distribution of the sequences by size
  From   To   Number       From   To   Number
     1-  50      582       1001-1100       70
    51- 100     1291       1101-1200       46
   101- 150     2043       1201-1300       35
   151- 200     1066       1301-1400       24
   201- 250      805       1401-1500       17
   251- 300      662       1501-1600        8
   301- 350      588       1601-1700       11
   351- 400      549       1701-1800        9
   401- 450      420       1801-1900        8
   451- 500      461       1901-2000        5
   501- 550      369          >2000        49
   551- 600      216
   601- 650      168
   651- 700      114
   701- 750       99
   751- 800       70
   801- 850       66
   851- 900       85
   901- 950       37
   951-1000       35
Currently the two largest sequences are:
   APB$HUMAN    4563 a.a.
   APOA$HUMAN   4548 a.a.
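The size classes of the table above (steps of 50 residues up to 1000, steps of 100 up to 2000, then a single class for longer sequences) can be reproduced as sketched below. The function name size_class is hypothetical; only the class boundaries come from the table.

```python
def size_class(length):
    """Return the label of the size class that a sequence length falls into."""
    if length < 1:
        raise ValueError("a sequence has at least one residue")
    if length > 2000:
        return ">2000"
    width = 50 if length <= 1000 else 100
    # Round the length up to the next multiple of the class width.
    upper = -(-length // width) * width
    lower = upper - width + 1
    return f"{lower}-{upper}"


for length in (50, 51, 1000, 1001, 4563):
    print(length, "->", size_class(length))
```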