KRISTOFFERSON@BIONET-20.ARPA (02/06/88)
From: MJB1@VMS-SUPP.CAM.AC.UK
Bulletin_# 72 ATTIMONELLI%VAXBA0.INFNET 5 Feb 88 BBOARDS on Nucleic Acid colle
From: <ATTIMONELLI%VAXBA0.INFNET@EARN.IBOINFN> 5-FEB-1988 02:29
To: SEQNET
Subject: BBOARDS on Nucleic Acid collections
Date: 5 Feb 88
Via: UK.AC.RL.EARN; Fri, 05 Feb 88 02:27:26 GMT
Date: Thu, 4 Feb 88 19:30 N
To the managers of the GenBank and EMBL collections and to
all databank users.
This is a note on release 54 of GenBank, which we have recently examined.
In this release GenBank has changed the FEATURE table format and has
announced that it is moving toward a common format together with EMBL
and the DNA Data Bank of Japan.
We are glad to note that there is at least an intent to standardize the
format of the nucleic acid databanks, but we want to point out a few
important issues which we hope will be taken into account by both the
scientific community and the Bank managers.
As reported in our paper [ACNUC - a portable retrieval system for...(CABIOS,
vol.1(3), 1985, pp.167-172)], we have adopted the GenBank collection for the
generation of the database ACNUC. We preferred the GenBank format to the EMBL
one mainly because of the organization of the FEATURE and SITES tables.
In particular, we found the SITE keys very useful: they indicate the start and
stop of a region (e.g. -> and <- ) and the boundary between two regions (e.g. /).
Moreover, the distinction GenBank made between the FEATURES table and the SITES
table allowed us to select easily the regions that ACNUC extracts as
SUBSEQUENCES.
In other words, this organization allowed us to extract specific fragments of
a GenBank locus directly through ACNUC.
This facility is one of the most useful features of ACNUC and makes the
software more flexible and powerful.
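As a rough illustration of the subsequence extraction described above, the
following Python sketch cuts the regions named in a feature table out of a
locus sequence. The function name, the tuple layout and the toy sequence are
assumptions made for this note; it is not the ACNUC code, which is not
reproduced here.

```python
def extract_subsequence(sequence, segments):
    """Return the fragment covered by `segments`, joined in the given order.

    `segments` is a list of (start, end) pairs using 1-based, inclusive
    positions, as in the 'from' and 'to/span' columns of a feature table.
    """
    pieces = []
    for start, end in segments:
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"segment {start}..{end} outside 1..{len(sequence)}")
        # Convert 1-based inclusive coordinates to a Python slice.
        pieces.append(sequence[start - 1:end])
    return "".join(pieces)


# Toy locus (hypothetical data, not a real GenBank entry).
locus = "aaaATGGCCgtaagtTTTTGAccc"
# A 'pept'-like feature made of two exons separated by an intron.
exons = [(4, 9), (16, 21)]
coding = extract_subsequence(locus, exons)
print(coding)
```

A multi-segment feature such as the three-exon pept.psi entry quoted further
down would be passed as three (start, end) pairs in the same way.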
The great advantages of the old organization of GenBank have also been
stressed by several researchers (see for example [Nussinov,R. et al. Biochimica
et Biophysica Acta 866 (1986),109-119]).
It is therefore a pity to note that precisely these useful keys have been
abolished. Moreover, in our opinion the present temporary structure of GenBank
is loose and not very useful.
Of course we do not know the future developments and goals of GenBank but we
would like to stress that with this new format the scientific community has
lost a very important tool.
We also wish to point out several inconsistencies between the information
reported in the release notes and the content of the entry files.
In particular, in the Primate entry file we have noted:
a) several EMBL sequences have been converted into the GenBank format
in a crude way (the EMBL feature tables have simply been relegated to
Comments);
b) feature keys such as pept.psi, matp.psi, mRNA.psi, sigp.psi and mRNA+IVS
are not listed among the feature key names (section 3.5.7.1 of the release
notes);
c) the announced replacement of the key "variation" by the key "variant"
has not been applied, and this has produced an incorrect tabulation of the
'from' and 'to/span' fields.
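One plausible reading of point (c) is that a nine-character key no longer fits
a fixed-width key column. The Python sketch below illustrates that effect only;
the column widths (8 characters for the key, 8 for each number) are assumptions
suggested by the length of keys such as "refnumbr" and are not taken from the
release notes.

```python
KEY_WIDTH = 8  # assumed width of the key column; not taken from the release notes


def overflowing_keys(keys, width=KEY_WIDTH):
    """Return the keys that do not fit in a key column of `width` characters."""
    return [key for key in keys if len(key) > width]


def format_row(key, start, end, width=KEY_WIDTH):
    """Lay out one row: key padded to `width`, then two right-justified numbers."""
    # ljust pads short keys but never truncates, so a long key pushes the
    # numeric fields to the right, as a naive fixed-width writer would.
    return f"{key.ljust(width)}{start:>8}{end:>8}"


def parse_row(line, width=KEY_WIDTH):
    """Read a row back by column position, as a fixed-width reader would."""
    key = line[:width].strip()
    start = line[width:width + 8].strip()
    end = line[width + 8:width + 16].strip()
    return key, start, end


keys = ["pept.psi", "mRNA.psi", "mRNA+IVS", "refnumbr", "variant", "variation"]
print(overflowing_keys(keys))
print(parse_row(format_row("variant", 17864, 17866)))
print(parse_row(format_row("variation", 17864, 17866)))
```

Under these assumed widths "variant" round-trips cleanly, while "variation"
spills one character into the 'from' column and corrupts both numeric fields
for a reader that slices by position.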
The examples reported below clarify the situation:
1) Partial feature table of GenBank entry HUMHBB
pept.psi    45741   45831  pseudo-hbp, exon 1 [62]
            45953   46175  pseudo-hbp, exon 2 [62]
            47030   47157  pseudo-hbp, exon 3 [62]
mRNA.psi    45688   47425  pseudo-hbp mRNA [62]
mRNA+IVS    19289   21098  hbe mRNA (alt.) [19],[40],[52]
mRNA+IVS    19504   21098  hbe mRNA (alt.) [19],[40],[52]
mRNA+IVS    19506   21098  hbe mRNA (alt.) [19],[40],[52]
rpt         66817   66827  Alu flank repeat 5' copy [49],[63]
rpt         66828   67094  Alu family repeat [49],[63]
variation   17864   17866  cag in clone lambda-epsilon; g in ph 1.8 [24]
revision    18641   18646  aatata in [34]; gatgtg in [19]
refnumbr    19120   19120  numbered 1 in [19]
refnumbr    19560   19560  numbered 1 in [67]; zero used
variation   32761   32762  ag in [26]; ga in [25]
variation   33204   33204  a in [26]; g in [25]
variation   46596   46597  aa in [62]; a in [63]
variation   46851   46853  aca in [62]; a in [63]
variation   47186   47208  ggtccactatgtttgtacctatg in [62]; g in [63]
variation   47341   47341  t in [62]; tt in [63]
refnumbr    50768   50768  numbered 1 in [45],[54]
2) EMBL PTAGGLOG entry converted into GenBank CHPAGGLOG
LOCUS       CHPAGGLOG   1815 bp ds-DNA   pre-entry   12/31/87
DEFINITION  Chimpanzee fetal A-gamma-globin gene.
ACCESSION   X03110
KEYWORDS    A-gamma-globin; direct repeat; gamma-globin; tandem repeat.
SOURCE      chimpanzee (Pan troglodytes).
  ORGANISM  Pan troglodytes
            Eukaryota; Metazoa; Chordata; Vertebrata; Tetrapoda; Mammalia;
            Eutheria; Primates; Anthropoidea; Hominoidea; Ponginae; Ponginae.
REFERENCE   1 (bases 1 to 1815; enum. 1 to 1815)
  AUTHORS   Slightom,J.L., Chang,L.-Y.E., Koop,B.F. and Goodman,M.
  TITLE     Chimpanzee fetal G-gamma and A-gamma globin gene nucleotide
            sequences provide further evidence of gene conversions in hominine
            evolution
  JOURNAL   Mol Biol Evol 2, 370-389 (1985)
COMMENT     Data kindly reviewed (07-JUL-1986) by Slightom J.L.
            EMBL features not translated to GenBank features:
            key       from     to  description
            PRM         24     28  put. TATA-box
            TRANSCR     55   1647  put. primary transcript
            CAP         55     55  put. cap site
            MSG         55    199  put. exon 1
            IVS        200    321  intron I
            IVS        545   1431  intron iI
            RPT       1123   1162  TG(14) repeat (hot spot sequence
            MSG       1431   1647  put. exon 3
            SITE      1621   1626  put. polyadenylation signal
            POLYA     1647   1647  put. polyadenylation site
FEATURES      from  to/span  description
    pept       108      199  A-gamma-globin (aa 1-31) (199 is 2nd base in
                             codon)
               322      544  A-gamma-globin (aa 32-105) (322 is 3rd base in
                             codon)
              1432     1560  A-gamma-globin (aa 106-147)
BASE COUNT     471 a    357 c    474 g    513 t
ORIGIN
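The EMBL features left in the COMMENT block of the entry above remain regular
enough to be read back by a program. The Python sketch below shows one way this
might be done; the regular expression and the function name are assumptions
made for this note and do not describe how TRANSFORM or GenBank handle these
lines.

```python
import re

# One feature row: upper-case key, two integer positions, free-text description.
ROW = re.compile(
    r"^\s*(?P<key>[A-Z]+)\s+(?P<start>\d+)\s+(?P<end>\d+)\s+(?P<desc>.+?)\s*$"
)


def parse_comment_features(lines):
    """Collect (key, start, end, description) tuples from comment lines.

    Lines that do not look like a feature row (headers, prose) are skipped.
    """
    features = []
    for line in lines:
        match = ROW.match(line)
        if match:
            features.append((match["key"], int(match["start"]),
                             int(match["end"]), match["desc"]))
    return features


# A few rows copied from the CHPAGGLOG comment block quoted above.
comment = [
    "EMBL features not translated to GenBank features:",
    "key from to description",
    "PRM 24 28 put. TATA-box",
    "CAP 55 55 put. cap site",
    "MSG 55 199 put. exon 1",
    "IVS 200 321 intron I",
]
for feature in parse_comment_features(comment):
    print(feature)
```

Mapping the recovered EMBL keys onto GenBank keys is a separate step and is
deliberately left out, since the correspondence is not given in this note.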
We agree that this is an intermediate format, but we believe it would have
been more appropriate to keep distributing the collection in the old format
until the conversion was complete.
We cannot use release 54 to update our database ACNUC.
Fortunately, our package MERGE (in press in the NAR special issue, Jan 1988)
includes the program TRANSFORM, which converts the EMBL format into the "old"
GenBank format, so for the moment we prefer to use only the EMBL
collection.
We hope that GenBank can quickly complete a revision of the data, checking the
collection in all its structural parts.
We would like to stress another important point.
Many Italian research units have adopted our database and software (ACNUC and
GLORIA), which are distributed through the Italian network. This demonstrates
the responsibility of the Bank management and the importance for users
(researchers and software developers) of being able to rely on a structure
that can be easily manipulated with automatic procedures. In this context we
welcome changes, but only if they provide an improvement.
Marcella Attimonelli
BioComputing Unit Manager
Bari (Italy)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
attimonelli@vaxba0.infnet