[bionet.molbio.genbank] Time lag for sequence appearance

pgil%histone@LANL.GOV (Paul Gilna) (01/17/90)

Rupert de Wachter (RRNA@ccv.uia.ac.be) from the University of 
Antwerp (Belgium) writes:

<text deleted>

     I would like to have some more information about a few things:
- can sequences automatically be retrieved using this same e-mail number or is
  there another access to the file server?
- can we ask on-line help?
- how would a retrieval using the accession number of a particular sequence,
  for example M22441, look like?
- does a sequence appear on the server as soon as it is mentioned in a
  publication or is there any delay?



Dear Dr. de Wachter,

Our colleagues at IntelliGenetics will handle your inquiries regarding the
online system.  I should like to address the final question in your list:

"- does a sequence appear on the server as soon as it is mentioned in a
  publication or is there any delay?"

There are three principal sources of nucleotide sequence data that are handled
by the data entry and annotation staff here at LANL: (1) the printed
publication, where data are manually entered by our data entry crew;
(2) direct author submission, where the sequence data and associated
bibliographic and biological information are provided directly to us by the
scientist; and (3) incorporation of data from EMBL and DDBJ releases.

In the first case (extracted from publication), the time taken for
the data to appear on the on-line system is a function of the time taken to
process a particular article through our data entry and annotation
staff.  As soon as our staff here have completed an entry,
it is immediately passed to the servers both at IntelliGenetics
and at EMBL (as well as Houston).  Currently we are averaging a six-week
turnaround from the date of publication to the appearance
of a fully annotated "entry" on the on-line system.  This is in contrast
to the 13-month average for this source of data two years ago.

In regard to the second source of data, i.e., from the author, if the
data are received in computer-readable form, they should appear on the
servers in fully annotated form within two weeks.  If received
in hard-copy form, they go through the process described above.  The fact
that we receive the bulk of our direct submissions AHEAD of publication
means that the data appear on the on-line systems and servers before
or close to the date of publication.  We often have the data in our hands
far enough in advance of publication to have errors that we spot in our
routine integrity checking procedures corrected by the author before
publication; in a sense we provide a peer-review function for the sequence
data itself, a review not often carried out in the conventional editorial
review process.

If the data submitted to us are associated with a manuscript that has
yet to be accepted by the journal editorial process, they will be
classified as "unpublished" (this removes complications
which might occur if the journal chose not to accept the manuscript);
the entry will be updated with the correct citation once we spot or are
notified of publication.

We now receive about 65-70% of our data directly from the community.  About
70-80% of these data are in electronic form, whether by e-mail or on floppy disc.

While we here at Los Alamos currently incorporate data from EMBL and DDBJ
releases within two weeks of receipt of the tapes, EMBL in addition
supplies new data to the GenBank on-line server on a daily basis.

Finally, for all submissions, we offer the author the choice of holding
the data confidential until such time as we are given permission to
release them or they are published.  In some cases, there is a time
lag before we spot the appearance of data in the literature and link
this to data we are holding as confidential, but this should not normally
exceed two weeks if the data appear in journals that we regularly scan.

I hope this answers your question.

Regards,

Paul Gilna Ph.D.,
Biology Domain Leader
GenBank, Los Alamos.

kristoff@genbank.BIO.NET (David Kristofferson) (01/18/90)

From Dr. de Wachter:

>- can sequences automatically be retrieved using this same e-mail number or is
>  there another access to the file server?
>- can we ask on-line help?
>- how would a retrieval using the accession number of a particular sequence,
>  for example M22441, look like?

Dr. de Wachter's questions were answered previously by private mail, but
for the sake of general information I provide the following
instructions from our database entry retrieval server:

----------------------------------------------------------------------
Database entries can be retrieved by either locus name or accession
number.  To use the GenBank Retrieval System, send an electronic
message to RETRIEVE@GENBANK.BIO.NET containing as text (leave the
Subject: line blank) either an accession number, or an entry name, but
not both.  The message text should contain exactly one word.

The data banks are searched in the order: GenBank New Data, GenBank
current release, EMBL New Data, EMBL current release, and Swiss-Prot
(available mid-January 1990), until a match is found.  If an entry
exists in both GenBank and EMBL with the same accession number (the
usual case), a query on the accession number will return the GenBank
version of the entry.  If the EMBL-format version is required, it can
be retrieved from the file server at NETSERV@EMBL.BITNET (for
instructions send a message containing the line HELP to that address).

An electronic version of the sequence data submission form used by the
sequence data banks is also available through the RETRIEVE server.  To
receive a copy, send a message containing the word DATASUB as the only
line.  Instructions for completing and submitting the form are
included.

If you have any questions or comments, feel free to mail them to
RETRIEVE-REQUEST@GENBANK.BIO.NET.
----------------------------------------------------------------------
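As a rough illustration of the request format described in the instructions above, here is a minimal Python sketch that composes (but does not send) such a message.  The function name build_retrieve_request and the sender address user@example.org are hypothetical; only the RETRIEVE address, the one-word body, and the blank Subject come from the instructions.  No Subject header is set at all, on the assumption that this is an acceptable way to leave it blank.

```python
from email.message import EmailMessage

# Address taken from the retrieval server instructions above.
RETRIEVE_ADDRESS = "RETRIEVE@GENBANK.BIO.NET"


def build_retrieve_request(query: str, sender: str) -> EmailMessage:
    """Compose a request whose body is exactly one word.

    `query` may be an accession number (e.g. M22441), an entry name,
    or the keyword DATASUB for the data submission form.
    """
    words = query.split()
    if len(words) != 1:
        raise ValueError("the message text must contain exactly one word")

    msg = EmailMessage()
    msg["From"] = sender            # hypothetical sender address
    msg["To"] = RETRIEVE_ADDRESS
    # No Subject header is added: the instructions ask for it to be blank.
    msg.set_content(words[0])
    return msg


if __name__ == "__main__":
    request = build_retrieve_request("M22441", "user@example.org")
    print(request.as_string())
```

Passing DATASUB instead of an accession number would build the request for the submission form in the same way.  Actually sending the message (for example through a local mail system) is left out of the sketch.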
-- 
				Sincerely,

				Dave Kristofferson
				GenBank On-line Service Manager

				kristoff@genbank.bio.net