[bionet.molbio.genbank] SEQNET Bulletin

KRISTOFFERSON@BIONET-20.ARPA (02/06/88)
From: MJB1@VMS-SUPP.CAM.AC.UK

Bulletin_# 72   ATTIMONELLI%VAXBA0.INFNET 5 Feb 88 BBOARDS on Nucleic Acid colle
From: <ATTIMONELLI%VAXBA0.INFNET@EARN.IBOINFN>  5-FEB-1988 02:29
To: SEQNET
Subject: BBOARDS on Nucleic Acid collections
Date: 5 Feb 88



Message-id: <4495>
Date: THU,  4-FEB-88 19:29 N
From: <ATTIMONELLI@VAXBA0.INFNET>
Reply-To: <ATTIMONELLI%VAXBA0.INFNET@IBOINFN.BITNET>  (alternate reply)
Subject: BBOARDS on Nucleic Acid collections
To:   <MJB1@VMS-SUPP.CAM.AC.UK>
X-Original-To:  MJB1@VMS-SUPP.CAM.AC.UK, ATTIMONELLI


           To the managers of GenBank and EMBL collections and to
           all the Databank users.

This note concerns release 54 of GenBank, which we have recently examined.
In this release GenBank has changed the FEATURE table format and has
announced that it is moving toward a common format together with EMBL
and the DNA Data Bank of Japan.
We are glad to note that there is at least an intention to standardize the
format of the nucleic acid databanks, but we want to point out a few
important issues which we hope will be taken into account by both the
scientific community and the databank managers.

As reported in our paper [ACNUC - a portable retrieval system for...(CABIOS,
vol.1(3), 1985, pp.167-172)], we adopted the GenBank collection for the
generation of the ACNUC database. We preferred the GenBank format to the EMBL
one mainly because of the organization of the FEATURES and SITES tables.
In particular, we found very useful the SITES keys that mark the start and
stop of a region (e.g. -> and <-) and the boundary between two regions (e.g. /).
Moreover, the distinction GenBank made between the FEATURES table and the
SITES table allowed us to select easily the regions that ACNUC extracts as
SUBSEQUENCES.
In other words, this organization let us extract specific fragments of a
GenBank locus directly through ACNUC.
This facility is one of the most useful features of ACNUC and makes the
software more flexible and powerful.
The great advantages of the old GenBank organization have also been stressed
by several researchers (see for example [Nussinov,R. et al. Biochimica et
Biophysica Acta 866 (1986),109-119]).
It is therefore a pity that precisely these useful keys have been abolished.
Moreover, in our opinion the present temporary structure of GenBank is
loosely defined and not very useful.
Of course we do not know the future developments and goals of GenBank, but we
would like to stress that with this new format the scientific community has
lost a very important tool.
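
As an illustration of this kind of direct extraction, the following minimal
Python sketch joins regions given as 1-based, inclusive (start, end) spans.
It is not ACNUC code; the function name `extract`, the toy sequence and the
spans are all hypothetical.

```python
def extract(sequence, spans):
    """Join the 1-based, inclusive (start, end) spans of `sequence`."""
    parts = []
    for start, end in spans:
        # Reject spans that fall outside the locus or run backwards.
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"span {start}..{end} outside 1..{len(sequence)}")
        # Convert to Python's 0-based, end-exclusive slicing.
        parts.append(sequence[start - 1:end])
    return "".join(parts)


# Toy locus (hypothetical): two upper-case regions separated by a spacer.
locus = "aaaATGGCCgtaagtcagTTTTAAccc"
exons = [(4, 9), (19, 24)]

coding = extract(locus, exons)
print(coding)  # ATGGCCTTTTAA
```
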
We also wish to point out several inconsistencies between the changes
announced in the release notes and the content of the entry files.

In particular, in the Primate entry file we have noted:

a) several EMBL sequences have been converted into the GenBank format
   only superficially (the EMBL feature tables have simply been relegated
   to the COMMENT field);
b) feature keys such as pept.psi, matp.psi, mRNA.psi, sigp.psi, mRNA+IVS
   are not listed among the feature key names (section 3.5.7.1 of the release
   notes);
c) the announced replacement of the key "variation" by the key "variant"
   has not been applied, and this has produced a misaligned tabulation of the
   'from' and 'to/span' fields.

The examples reported below clarify the situation:

1)             Partial feature table of GenBank entry HUMHBB

    pept.psi  45741    45831     pseudo-hbp, exon 1 [62]
              45953    46175     pseudo-hbp, exon 2 [62]
              47030    47157     pseudo-hbp, exon 3 [62]
    mRNA.psi  45688    47425     pseudo-hbp mRNA [62]
    mRNA+IVS  19289    21098     hbe mRNA (alt.) [19],[40],[52]
    mRNA+IVS  19504    21098     hbe mRNA (alt.) [19],[40],[52]
    mRNA+IVS  19506    21098     hbe mRNA (alt.) [19],[40],[52]
    rpt       66817    66827     Alu flank repeat 5' copy [49],[63]
    rpt       66828    67094     Alu family repeat [49],[63]
    variation  17864    17866     cag in clone lambda-epsilon; g in ph 1.8 [24]
    revision  18641    18646     aatata in [34]; gatgtg in [19]
    refnumbr  19120    19120     numbered 1 in [19]
    refnumbr  19560    19560     numbered 1 in [67]; zero used
    variation  32761    32762     ag in [26]; ga in [25]
    variation  33204    33204     a in [26]; g in [25]
    variation  46596    46597     aa in [62]; a in [63]
    variation  46851    46853     aca in [62]; a in [63]
    variation  47186    47208     ggtccactatgtttgtacctatg in [62]; g in [63]
    variation  47341    47341     t in [62]; tt in [63]
    refnumbr  50768    50768     numbered 1 in [45],[54]
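
The misalignment described in point c) can be checked mechanically. The
Python sketch below is only an illustration: the column positions are
inferred from the well-formed lines of the table above, not taken from the
release notes, and the helper names are hypothetical.

```python
# Column layout inferred from the well-formed lines of the HUMHBB table
# (an assumption, not the official GenBank specification).
KEY = slice(4, 12)      # feature key, at most 8 characters
START = slice(12, 19)   # 'from' field
END = slice(19, 28)     # 'to/span' field
DESC = slice(33, None)  # free-text description

# The rename announced in the release notes.
RENAMES = {"variation": "variant"}


def parse_fixed(line):
    """Parse a feature line by column position; ValueError if misaligned."""
    return (line[KEY].strip(), int(line[START]), int(line[END]),
            line[DESC].strip())


def is_aligned(line):
    """Return True if the line parses under the assumed column layout."""
    try:
        parse_fixed(line)
    except ValueError:
        return False
    return True


def realign(line):
    """Re-emit a keyed feature line in the assumed layout, applying RENAMES.

    Continuation lines (blank key) are not handled by this sketch.
    """
    key, start, end, desc = line.split(None, 3)
    key = RENAMES.get(key, key)
    return f"    {key:<8}  {int(start):>5}    {int(end):>5}     {desc}"


good = "    rpt       66828    67094     Alu family repeat [49],[63]"
bad = "    variation  32761    32762     ag in [26]; ga in [25]"

print(is_aligned(good))  # True
print(is_aligned(bad))   # False: the 9-letter key shifts every later field
print(realign(bad))
```

With the 7-letter key "variant" the fields fall back into the same columns
as in the other lines, which is why the unapplied rename matters for any
program that reads the table by position.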

2)    EMBL PTAGGLOG entry converted into GenBank CHPAGGLOG

LOCUS       CHPAGGLOG    1815 bp ds-DNA             pre-entry 12/31/87
DEFINITION  Chimpanzee fetal A-gamma-globin gene.
ACCESSION   X03110
KEYWORDS    A-gamma-globin; direct repeat; gamma-globin; tandem repeat.
SOURCE      chimpanzee (Pan troglodytes).
  ORGANISM  Pan troglodytes
            Eukaryota; Metazoa; Chordata; Vertebrata; Tetrapoda; Mammalia;
            Eutheria; Primates; Anthropoidea; Hominoidea; Ponginae; Ponginae.
REFERENCE   1  (bases 1 to 1815; enum. 1 to 1815)
  AUTHORS   Slightom,J.L., Chang,L.-Y.E., Koop,B.F. and Goodman,M.
  TITLE     Chimpanzee fetal G-gamma and A-gamma globin gene nucleotide
            sequences provide further evidence of gene conversions in hominine
            evolution
  JOURNAL   Mol Biol Evol 2, 370-389 (1985)
COMMENT     Data kindly reviewed (07-JUL-1986) by Slightom J.L.

               EMBL features not translated to GenBank features:
               key        from     to       description

               PRM          24     28       put. TATA-box
               TRANSCR      55   1647       put. primary transcript
               CAP          55     55       put. cap site
               MSG          55    199       put. exon 1

               IVS         200    321       intron I

               IVS         545   1431       intron iI
               RPT        1123   1162       TG(14) repeat (hot spot sequence
               MSG        1431   1647       put. exon 3

               SITE       1621   1626       put. polyadenylation signal
               POLYA      1647   1647       put. polyadenylation site
FEATURES       from  to/span     description
    pept        108      199     A-gamma-globin (aa 1-31) (199 is 2nd base in
                                 codon)
                322      544     A-gamma-globin (aa 32-105) (322 is 3rd base in
                                 codon)
               1432     1560     A-gamma-globin (aa 106-147)
BASE COUNT      471 a    357 c    474 g    513 t
ORIGIN
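
Because the untranslated EMBL features in the COMMENT block above still
follow a regular key/from/to/description layout, they can in principle be
recovered automatically. A minimal Python sketch, assuming that layout holds
for every such line (the regular expression and the function name
`comment_features` are hypothetical):

```python
import re

# One feature per line: upper-case key, two integers, free-text description.
# This pattern is an assumption based on the lines shown in the entry above.
FEATURE_RE = re.compile(r"^\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\S.*)$")


def comment_features(comment_lines):
    """Return (key, start, end, description) tuples found in COMMENT lines."""
    features = []
    for line in comment_lines:
        match = FEATURE_RE.match(line)
        if match:  # heading, caption and blank lines do not match
            key, start, end, desc = match.groups()
            features.append((key, int(start), int(end), desc.rstrip()))
    return features


# A few lines copied from the CHPAGGLOG comment block.
comment = """\
               EMBL features not translated to GenBank features:
               key        from     to       description

               PRM          24     28       put. TATA-box
               TRANSCR      55   1647       put. primary transcript
               IVS         200    321       intron I
""".splitlines()

for feature in comment_features(comment):
    print(feature)
```
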




We agree that this is an intermediate format, but we believe that it would
have been better to continue distributing the collection in the old format
until the conversion was complete.

We cannot use release 54 to update our database ACNUC.

Fortunately, our package MERGE (in press in the NAR special issue, Jan 1988)
includes the program TRANSFORM, which converts the EMBL format into the "old"
GenBank format, so for the moment we prefer to use only the EMBL collection.

We hope that GenBank can quickly carry out a revision of the data, checking
the collection in all its structural parts.

We would like to stress another important point.
Many Italian research units have adopted our database and software (ACNUC and
GLORIA), which are distributed through the Italian network. This underlines
the responsibility of the databank management and the importance for users
(researchers and software developers) of being able to rely on a structure
that can be easily manipulated by automatic procedures. In this context we
welcome changes, but only if they bring an improvement.

                                            Marcella Attimonelli

                                            BioComputing Unit Manager

                                            Bari (Italy)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
attimonelli@vaxba0.infnet