smith@mcclb0.med.nyu.edu (08/04/90)
We have just installed our new GenBank release (64) and have attempted to strip the USENET update bank of the sequences now to be found in the bank proper. In doing so we found a bug in the way that a small number of dates were decoded: the bug was caused by the presence of an extra blank in the date string, and only affected 2-3 days in the three-month period. This bug has been fixed, and the updated s/w for VMS is available as NIGHTLY.1 from our MAILSERVer, or via anonymous FTP.

However... In running the program to eliminate the duplicated sequences we found that, of the approximately 4,600 sequences in our update bank, 1,700 were duplicated in the new GenBank release, using 11-May-1990 [provided by GenBank] as the drop-dead date for this release. The number of new sequences in this release, however, is about 2,700, or 1,000 more than we found. Since the USENET 'delivery' started close to the drop-dead date for release 63, finding 1,000 sequences in the new bank that are not in the update bank doesn't sound right: either a lot of sequences were lost in delivery, or we still have a s/w problem (sigh).

We will look into this problem in the next two weeks. Until we resolve it we intend to keep our unstripped update bank on line here. Any comments, or reports of experiences at other sites, would be helpful (Roy?, Dave?, Eliot?).

+---------------------------------------------------------------------------+
|Ross Smith, Cell Biology, NYU Medical Center, 550 First Ave., NYC, 10016|
|Phone: (212) 340-5356: FAX: (212) 340-8139 (Alternate NYUMC) (212) 340-7190|
|E-Mail: SMITH@NYUMED.BITNET (BITNET), SMITH@MCCLB0.MED.NYU.EDU (Internet)|
+---------------------------------------------------------------------------+
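For readers who want to see the shape of the date problem, here is a minimal sketch in Python. It is not the NIGHTLY.1 code (which is VMS s/w and is not shown in this thread). It assumes dates of the form DD-MON-YYYY, that the stray blank can appear anywhere in the string, and that an entry dated on or before the drop-dead date counts as already in the release. The accession numbers and the names parse_entry_date and strip_released are made up for illustration.

```python
from datetime import date

# Month abbreviations as used in DD-MON-YYYY dates (assumed format).
MONTHS = {m: i for i, m in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

def parse_entry_date(text):
    """Parse a DD-MON-YYYY date, tolerating stray blanks such as ' 4-MAY-1990'."""
    compact = "".join(text.split())      # drop every blank, wherever it occurs
    day, mon, year = compact.split("-")
    return date(int(year), MONTHS[mon.upper()], int(day))

def strip_released(update_bank, cutoff):
    """Keep only entries dated after the cutoff (assumed not yet in the release)."""
    return {acc: d for acc, d in update_bank.items()
            if parse_entry_date(d) > cutoff}

cutoff = parse_entry_date("11-May-1990")
# Hypothetical update bank: accession number -> date string as delivered.
bank = {"X00001": "10-MAY-1990", "X00002": " 4-MAY-1990", "X00003": "12-MAY-1990"}
remaining = strip_released(bank, cutoff)
print(sorted(remaining))
```

A decoder that does not tolerate the blank would either reject the second entry or misread its date, and so leave it in (or wrongly remove it from) the update bank.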
roy@phri.nyu.edu (Roy Smith) (08/04/90)
In <6946.26b9a433@mcclb0.med.nyu.edu> smith@mcclb0.med.nyu.edu writes:
> either a lot of sequences were lost in delivery, or we still have a s/w
> problem (sigh).

I know a lot of sequences were dropped on the floor at phri when our network link was down for a while, so mcclb0 didn't get them either. Whether that accounts for the missing killolocus or not, I can't say.

It is clear, however, that, at least given the current technology, the tape copies of the database still have to be relied on as the only authoritative ones, with the network updates being value-added, but not (yet, if ever) a full replacement.

--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane? Did you say arcane? It wouldn't be Unix if it wasn't arcane!"
smith@mcclb0.med.nyu.edu (08/04/90)
In article <1990Aug4.012439.9628@phri.nyu.edu>, roy@phri.nyu.edu (Roy Smith) writes:
> I know a lot of sequences were dropped on the floor at phri when
> our network link was down for a while, so mcclb0 didn't get them either.

The link was down for about two weeks, so we FTPed the sequence batches from GenBank for that period, and loaded them into the Update bank by hand. So we should not have lost that many for that reason.

One question is therefore whether many sequences missed being sent out in the update feed. 1,000 seems a lot to have missed.
smith@mcclb0.med.nyu.edu (08/18/90)
In article <1990Aug4.012439.9628@phri.nyu.edu>, roy@phri.nyu.edu (Roy Smith) writes:
> In <6946.26b9a433@mcclb0.med.nyu.edu> smith@mcclb0.med.nyu.edu writes:
>> either a lot of sequences were lost in delivery, or we still have a s/w
>> problem (sigh).
>
> I know a lot of sequences were dropped on the floor at phri when
> our network link was down for a while, so mcclb0 didn't get them either.
> Whether that accounts for the missing killolocus or not, I can't say.

We still do not have an explanation as to where the 'lost' sequences went. What is clear, however, is that stripping the banks of the duplicates found does not cause the loss of non-duplicate sequence data. So it's OK to strip the UPDATE bank (if you've been waiting).

In the meantime we have implemented sequence base-count checking for each sequence delivered. The modified distribution is in NIGHTLY.1 on our MAILSERVer, or via anonymous FTP. Thanks to Brent Hobbs for the code.
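The idea behind base-count checking can be sketched as follows. This is an illustration only, not Brent Hobbs's code (which is not reproduced in this thread). It assumes the usual GenBank flat-file layout, with a BASE COUNT line ahead of ORIGIN and the entry closed by "//", and it compares only the a, c, g and t counts. The function name base_count_ok and the sample entry are hypothetical.

```python
import re
from collections import Counter

def base_count_ok(entry):
    """Return True if the BASE COUNT line agrees with the bases after ORIGIN."""
    declared = {}
    observed = Counter()
    in_sequence = False
    for line in entry.splitlines():
        if line.startswith("BASE COUNT"):
            # e.g. "BASE COUNT        3 a      2 c      2 g      1 t"
            for n, base in re.findall(r"(\d+)\s+([acgt])\b", line.lower()):
                declared[base] = int(n)
        elif line.startswith("ORIGIN"):
            in_sequence = True
        elif line.startswith("//"):
            break
        elif in_sequence:
            # Sequence lines carry a position number, then the bases.
            observed.update(ch for ch in line.lower() if ch.isalpha())
    # An entry with no BASE COUNT line at all is treated as a failure.
    return bool(declared) and all(observed[b] == n for b, n in declared.items())

# Made-up entry for illustration.
sample = """\
LOCUS       EXAMPLE   8 bp
BASE COUNT        3 a      2 c      2 g      1 t
ORIGIN
        1 aacgtacg
//
"""
print(base_count_ok(sample))
```

An entry truncated in transit would normally fail this test, since the bases actually received no longer add up to the declared counts.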
lear@turbo.bio.net (Eliot) (08/21/90)
I suppose what I ought to do is to add a line in the header with a sequence number and have associated with each sequence number the accession numbers in a particular posting. That way if you do lose some in transit you'll be able to figure out which ones.

--
Eliot Lear
[lear@turbo.bio.net]
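On the receiving side, the numbering scheme proposed above might be used roughly like this. It is a sketch only: no header line or index format has been defined, so the posting numbers, the index mapping and the name missing_postings are all assumptions made for illustration.

```python
def missing_postings(received):
    """Return posting numbers absent between the lowest and highest seen."""
    if not received:
        return []
    seen = set(received)
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]

# Hypothetical index published by the sender: posting number -> accession numbers.
index = {
    101: ["X00001", "X00002"],
    102: ["X00003"],
    103: ["X00004"],
    104: ["X00005"],
}

received = [101, 102, 104]          # posting numbers that actually arrived
lost = missing_postings(received)
lost_accessions = [acc for n in lost for acc in index.get(n, [])]
print(lost, lost_accessions)
```

A gap in the numbers shows that a posting was lost; the index then says which accession numbers to fetch by other means. Postings lost before the first or after the last one received cannot be detected from the gaps alone.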