[bionet.molbio.genbank] What's in EMBL that's not in GenBank?

roy@phri.UUCP (Roy Smith) (07/15/89)

	Can anybody who has used both the EMBL and GenBank nucleotide
databases tell me how much (if anything) is in one that isn't in the other?
We currently have a subscription to GB and are considering subscribing to
EMBL.  If there isn't really anything in EMBL that isn't already in GB, then
it isn't worth the cost and effort (and, more importantly, disk space) to
maintain both.  The same question applies to SWISS-PROT and Dayhoff (PIR).
-- 
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
{att,philabs,cmcl2,rutgers,hombre}!phri!roy -or- roy@alanine.phri.nyu.edu
"The connector is the network"

dd@beta.lanl.gov (Dan Davison) (07/15/89)

In article <3863@phri.UUCP>, roy@phri.UUCP (Roy Smith) writes:
> 	Can anybody who has used both the EMBL and GenBank nucleotide
> databases tell me how much (if anything) is in one that isn't in the other?

Disclaimer: I work at T-10, Los Alamos National Laboratory, but I'm
not part of the GenBank project.  Comments here are worth what you
paid for them...


In general, data in one appears in the other.  Delays are due to the
different release schedules of the two databases.  A rule of thumb:
if a sequence is published in a European journal, it appears first in
the EMBL database; if it is published in an American journal, it
appears first in GenBank.

If you are concerned, you could get GenBank releases regularly,
then check the EMBL server once a week for new entries.  EMBL
adds completed entries to their BITNET mail server daily.  I will
do the same for my GenBank mail server beginning in October, I hope.
I've just posted a message about the GenBank mail server in the last
two days.  Send e-mail if you want more information about it.

dna dan
-- 
dan davison/theoretical biology/t-10 ms k710/los alamos national laboratory
los alamos, nm 87545/dd@lanl.gov (arpa)/dd@lanl.uucp(new)/..cmcl2!lanl!dd

cb%intron@LANL.GOV (Christian Burks) (07/17/89)

In terms of historical data, there's very little (a handful of citations)
that is in one database but not in the other.  Both databases put
out quarterly releases that are approximately interleaved; since each
database grows by roughly 20% each release, and assuming that a share
of that comes from in house entry and a share from the other database,
it means that each database contains about 10% (or so) that isn't
in the other database for 6 weeks or so after the release.
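
The arithmetic behind that estimate can be sketched roughly as follows.
This is only an illustration: the even 50/50 split between in-house entry
and data taken from the other database is an assumption (the text says
only "a share" of each), and the function and parameter names are
hypothetical.

```python
def exclusive_fraction(growth_per_release=0.20, in_house_share=0.5):
    """Portion of one release's growth that the other database lacks.

    growth_per_release: growth per release, roughly 20% per the post.
    in_house_share: ASSUMED share of that growth entered in house
    rather than imported from the other database.
    """
    return growth_per_release * in_house_share


def lag_weeks(releases_per_year=4):
    """Weeks until the other database's next release, if releases interleave.

    With quarterly releases the cycle is 13 weeks, and an interleaved
    release falls about halfway through it.
    """
    cycle_weeks = 52 / releases_per_year
    return cycle_weeks / 2


if __name__ == "__main__":
    print(f"exclusive share: about {exclusive_fraction():.0%}")
    print(f"lag: about {lag_weeks():.1f} weeks")
```

Under these assumptions the sketch gives about 10% and about 6.5 weeks,
in line with the figures quoted above.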

bottom line: if you're desperate for all that's very recently been
published, and don't have access to the EMBL network server (where
the interleaving lags are done away with), then you will gain 1-2
months head start on 10% or less of the data by subscribing
to both.

note that EMBL, GenBank, and DDBJ are working on reducing these numbers
even further, both through mechanisms like the EMBL e-mail server
(GenBank will soon have a similar service through on-line dial-up access),
and through real-time updating of the data at the various sites.

- Christian Burks

wrp@biochsn.acc.Virginia.EDU (William R. Pearson) (07/17/89)

In article <3863@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
]
]	Can anybody who has used both the EMBL and GenBank nucleotide
]databases tell me how much (if anything) is in one that isn't in the other?
]We currently have a subscription to GB and are considering subscribing to
]EMBL.  If there isn't really anything in EMBL that isn't already in GB, then
]it isn't worth the cost and effort (and, more importantly, disk space) to
]maintain both.  The same question applies to SWISS-PROT and Dayhoff (PIR).
]-- 
]Roy Smith, Public Health Research Institute
]455 First Avenue, New York, NY 10016
]{att,philabs,cmcl2,rutgers,hombre}!phri!roy -or- roy@alanine.phri.nyu.edu
]"The connector is the network"

	This is a question that is best answered by the U. Wisconsin
people, or a subscriber to their software.  They distribute a version
of the EMBL database that has had GenBank entries removed.  I checked
the size of the EMBL file; it is approximately 10% larger than the PRIMATE
file on GenBank.

Bill Pearson

OLIVER@calstate.bitnet (OLIVER SEELY) (07/17/89)

We just received the April 1989 CD of EMBL -- and no CD player to read it!
Shoot, I don't even know if it's in sonata or symphonic form -- or if
it's done with full instrumentation or synthesizer.

More seriously, I haven't looked at the mnemonics used with EMBL
to determine how difficult it would be to do a comparative search
between GenBank and EMBL.
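
One way such a comparison might be approached is to collect the
accession numbers from each flat file and take set differences.  The
sketch below is hedged throughout: it assumes GenBank entries carry an
"ACCESSION" line and EMBL entries an "AC" line with semicolon-terminated
numbers, and that the two databases use the same accession numbers for
the same sequence.  None of that is confirmed in this thread, the
function names are hypothetical, and the sample records are made up
purely for illustration.

```python
def genbank_accessions(text):
    """Collect accession numbers from ACCESSION lines (assumed GenBank layout)."""
    found = set()
    for line in text.splitlines():
        if line.startswith("ACCESSION"):
            found.update(line.split()[1:])
    return found


def embl_accessions(text):
    """Collect accession numbers from AC lines (assumed EMBL layout)."""
    found = set()
    for line in text.splitlines():
        if line.startswith("AC "):
            # Assumed form: "AC   X00001; X00002;"
            found.update(tok.strip(";") for tok in line.split()[1:])
    return found


if __name__ == "__main__":
    # Made-up sample entries, not real database records.
    gb_text = "LOCUS       SAMPLE1\nACCESSION   J00001 J00002\n//\n"
    embl_text = "ID   SAMPLE1\nAC   J00001; X00003;\n//\n"

    gb = genbank_accessions(gb_text)
    embl = embl_accessions(embl_text)
    print("only in EMBL:   ", sorted(embl - gb))
    print("only in GenBank:", sorted(gb - embl))
```

Reading whole releases this way would only need the accession sets in
memory, not the sequences, so disk space for a second full copy would
not be required just to measure the overlap.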