cb%intron@LANL.GOV (Christian Burks) (11/20/88)
Dear Dr. Smith:

I gather that Dave Benton answered some of your questions...I've made a stab at answering the others. Thanks for your concern and interest.

Christian Burks

> Return-Path: <phri!alanine.phri!roy@nyu.edu>
> Received: from rutgers.edu by BIONET-20.BIO.NET with TCP; Sat 5 Nov 88 02:34:36-PST
> Received: by rutgers.edu (5.59/1.15)
>     id AA06289; Sat, 5 Nov 88 05:34:19 EST
> Received: by phri.phri (5.51/5.17)
>     id AA11252; Fri, 4 Nov 88 14:05:43 EST
> Received: by alanine.phri (3.2/5.17)
>     id AA24967; Fri, 4 Nov 88 14:07:05 EST
> Date: Fri, 4 Nov 88 14:07:05 EST
> From: phri!alanine.phri!roy@nyu.edu (Roy Smith)
> Message-Id: <8811041907.AA24967@alanine.phri>
> To: benton@bionet-20.bio.net, nucall@bionet-20.bio.net
> Subject: Re: Missing entries in GenBank (cooperation with EMBL)
> Cc: roy@alanine.phri
>
> Dr. Benton,
>
> Thank you for checking this out for me. While I am glad to know it
> will be in the next release, I do wonder if a 10 month (end of January to
> start of December) lag between publication and entry into the data base is
> too long. Granted, we're looking at a worst case because we just missed a
> release and thus incurred an extra 3-month delay, but even 7 months seems
> like a long time. Keep in mind that this 7 or 10 month delay starts
> counting from the date the paper is published; given that most papers are 6
> months or more from submission to when they hit the presses, you're talking
> over a year from the time a sequence is known to when it's on-line.
>
> What is a typical amount of time between publication of a sequence
> in a journal and when it goes out on a GenBank tape? What amount of time
> is considered "good" by the GenBank staff (i.e. the target delay, beyond
> which subscribers should feel justified in complaining)?
>

Most data are now getting into a public release of GenBank within 3-5 months of receipt date at LANL.
The major exception is for data received prior to publication which the author requests we withhold from public release until some future date (e.g., date of publication in a journal article); in this case the data are queued into a public release as soon as that date is reached. (This may still fall short of the ideal...but it should be compared with 2 years ago, when the average time from receipt at LANL to appearance in the database was 12-14 months. That was clearly unacceptable, and we've put much effort and resources into turning that around.)

When should a correspondent feel concerned enough to follow up? At this point I would suggest that if more than one public release has gone by since LANL received and acknowledged receipt of the data (and if the author released the data for public consumption at that point), they should definitely contact us. If one submits data initially and doesn't receive an acknowledgement within two weeks, that should be followed up on immediately.

> Are there any plans to have more frequent updates in-between the
> major quarterly releases? I could envision once a week ftping the latest
> stuff from bionet-20 (or wherever the master copy is maintained, or perhaps
> there could be several repository sites around the country to reduce system
> load and network congestion). These intermediate updates could be
> unannotated to get them out faster. I could envision three levels an entry
> would go through. First, as soon as possible, an unannotated entry made
> available for ftp. Second, each time a quarterly tape goes out, all those
> entries in the ftp area which are still not yet fully ready would be put on
> the tape as part of the current unannotated section. Lastly, when entries
> are fully annotated, checked, indexed, and otherwise massaged into their
> final form, merged into the main data base.
There are many schemes (including the one you suggest) for getting incremental data out earlier...in fact, we did, until a year ago, distribute an interim (six weeks) release that included only "new" data...this was dropped because very few people requested subscriptions to it and those who did admitted (with only 1-2 exceptions) that they didn't use it anyway. Given the way that we were maintaining the data at that time, these interim releases were very time consuming for us with little -- as far as we could see -- benefit reaching the user community.

Over the next year, we will be shifting over to a data maintenance scheme that will allow for the continuous or almost-continuous updating of the database for internal maintenance...we hope by the end of the year to have established some reflection of this continuity in the distributed data, perhaps with even weekly updates being available in some distributed form.

>
> The idea is to get the sequence data out as fast as possible to the
> scientists who want to see it. From what I see, I classify GenBank (and
> the same comments go pretty much for Dayhoff and other similar databases)
> usage into two categories. First is "I want to know the sequence of XXXX".
> This is straight-forward and if XXXX is not yet in the database, you find
> out fast. If it's critical that you know about XXXX, you can always call
> the author or something like that. The second one is the shot-in-the-dark
> search. This latter one is where you really get killed by slow updates,
> because if you don't find something, you don't know what you missed. These
> searches are often for sequence homologies; but people just as often say
> "give me all the erythromycin resistance genes" or something like that, for
> which the same comments about the dangers of slow updates apply.
> -------

We share these concerns, and although we've made great strides in this regard over the past 18 months, we believe the release cycle time will improve much more over the coming year.
kramerj@bionette.CS.ORST.EDU (Jack Kramer - CMBL) (12/03/88)
I have been working with many molecular biologists for several years, doing much of their computer work. I often get complaints about the interval of time between the publication of a paper and inclusion of a contained sequence in the database. Most often, when I ask if they submitted the sequence to the database, the response is that they didn't have time to fill out the form or didn't think it was necessary. "It was published, why shouldn't everyone rush to take care of this most important sequence?"

I think this may be the most common reason for the gap between publication and availability. Both Dave and Christian were too kind to scold complainers in this class. The original publication does not always have all the information that should be included in the database. Even when it does, it can take an unreasonable amount of time to decipher many of the papers. Prompt submission of all the required data directly to the correct location would probably remove any rational basis for dissatisfaction with the delays.

I would like to relay the thanks of several hundred biologists at Oregon State University, who have benefited from the excellent work that has been and is being done by the molecular database staffs. Handling just the volume of data that is submitted properly, I'm sure, calls for much effort beyond the normal. And then much is done even beyond that.

THANKS!!!!!!

Jack Kramer
Computational Molecular Biology Laboratory
Oregon State University
Kristofferson@BIONET-20.BIO.NET (David Kristofferson) (12/03/88)
Well put, Jack! The attitudes that you describe in your message are encountered time and time again. You have truly hit the nail right on the head. I am sure that the database staff will appreciate your kind remarks. Criticism often comes from those with little knowledge of the magnitude of the efforts involved.

Dave