[bionet.molbio.genbank] More about the USENET updates: VMS software.

smith@mcclb0.med.nyu.edu (08/04/90)

      We have just installed our new GenBank release (64) and have attempted 
to strip the USENET update bank of the sequences now to be found in the bank 
proper.

      In doing so we found a bug in the way that a small number of dates were
decoded: the bug was caused by the presence of an extra blank in the date
string, and only affected 2-3 days in the three-month period.  This bug has
been fixed, and the updated s/w for VMS is available as NIGHTLY.1 from our
MAILSERVer, or via anonymous FTP.  However... 
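
      For sites decoding the dates with their own tools, here is a minimal
sketch in Python (not the VMS code in NIGHTLY.1) of a decoder that tolerates
stray blanks.  It assumes the date is a DD-MON-YYYY string such as
11-May-1990; where the extra blank actually appeared is not stated above, so
the sketch simply removes all whitespace first.  The function name is
hypothetical.

```python
from datetime import date

# Three-letter month abbreviations as used in DD-MON-YYYY dates.
_MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def parse_entry_date(text):
    """Parse a DD-MON-YYYY date, ignoring blanks anywhere in the string."""
    # Assumption: blanks carry no meaning in the date field, so drop them all.
    compact = "".join(text.split())
    day, month, year = compact.split("-")
    return date(int(year), _MONTHS[month.upper()], int(day))
```

A string such as " 4-MAY-1990" or "11-MAY- 1990" then decodes the same way as
the clean form.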

      In running the program to eliminate the duplicated sequences we found
that of the approximately 4,600 sequences in our update bank, 1,700 were
duplicated in the new GenBank release, using 11-May-1990 [provided by
GenBank] as the drop-dead date for this release. However, the number of new
sequences in this release is about 2,700, or 1,000 more than we found.  Since
the USENET 'delivery' started close to the drop-dead date for release 63,
finding 1,000 sequences in the new bank not found in the update bank doesn't
sound right: either a lot of sequences were lost in delivery, or we still
have a s/w problem (sigh). 
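
      The accounting above can be sketched as follows.  This is an
illustration in Python, not our VMS program: it assumes an update-bank entry
counts as duplicated when its date falls on or before the drop-dead date,
which may not match the exact rule the real s/w applies.  The function names
and any accession numbers fed to it are hypothetical.

```python
from datetime import date


def split_by_cutoff(entries, cutoff):
    """Split (accession, entry_date) pairs around the drop-dead date.

    Returns (duplicated, retained): entries dated on or before the cutoff
    are assumed to be in the new release; later ones stay in the update bank.
    """
    duplicated = [acc for acc, when in entries if when <= cutoff]
    retained = [acc for acc, when in entries if when > cutoff]
    return duplicated, retained


def shortfall(new_in_release, duplicated_found):
    """Number of new release sequences never seen in the update bank."""
    return new_in_release - duplicated_found
```

With the figures quoted above, shortfall(2700, 1700) gives the 1,000
sequences that are unaccounted for.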

      We will look into this problem in the next two weeks.  Until we resolve
it we intend to keep our unstripped update bank on line here.  Any comments
or reports of experiences at other sites would be helpful (Roy?, Dave?,
Eliot?). 

+---------------------------------------------------------------------------+
|Ross Smith, Cell Biology,  NYU Medical Center,  550 First Ave.,  NYC, 10016|
|Phone: (212) 340-5356: FAX: (212) 340-8139 (Alternate NYUMC) (212) 340-7190|
|E-Mail:  SMITH@NYUMED.BITNET (BITNET),  SMITH@MCCLB0.MED.NYU.EDU (Internet)|
+---------------------------------------------------------------------------+

roy@phri.nyu.edu (Roy Smith) (08/04/90)

In <6946.26b9a433@mcclb0.med.nyu.edu> smith@mcclb0.med.nyu.edu writes:
> either a lot of sequences were lost in delivery, or we still have a s/w
> problem (sigh). 

	I know a lot of sequences were dropped on the floor at phri when
our network link was down for a while, so mcclb0 didn't get them either.
Whether that accounts for the missing kilolocus or not, I can't say.  It
is clear, however, that, at least given the current technology, the tape
copies of the database still have to be relied on as the only authoritative
ones, with the network updates being value-added, but not (yet, if ever) a
full replacement.
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane?  Did you say arcane?  It wouldn't be Unix if it wasn't arcane!"

smith@mcclb0.med.nyu.edu (08/04/90)

In article <1990Aug4.012439.9628@phri.nyu.edu>, roy@phri.nyu.edu (Roy Smith) writes:
> 
> 	I know a lot of sequences were dropped on the floor at phri when
> our network link was down for a while, so mcclb0 didn't get them either.

The link was down for about two weeks, so we FTPed the sequence batches from 
GenBank for that period, and loaded them into the Update bank by hand.  So we 
should not have lost that many for that reason.

The question is therefore whether many sequences were never sent out in the
update feed at all.  1,000 seems a lot to have missed.

smith@mcclb0.med.nyu.edu (08/18/90)

In article <1990Aug4.012439.9628@phri.nyu.edu>, roy@phri.nyu.edu (Roy Smith) writes:
> In <6946.26b9a433@mcclb0.med.nyu.edu> smith@mcclb0.med.nyu.edu writes:
>> either a lot of sequences were lost in delivery, or we still have a s/w
>> problem (sigh). 
> 
> 	I know a lot of sequences were dropped on the floor at phri when
> our network link was down for a while, so mcclb0 didn't get them either.
> Whether that accounts for the missing kilolocus or not, I can't say.

We still do not have an explanation as to where the 'lost' sequences went.
What is clear, however, is that stripping the banks of the duplicates found 
does not cause the loss of non-duplicate sequence data.

So it's OK to strip the UPDATE bank (if you've been waiting).

In the meantime we have implemented sequence base-count checking for each
sequence delivered.  The modified distribution is in NIGHTLY.1 on our 
MAILSERVer, or via anonymous FTP.  Thanks to Brent Hobbs for the code.
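
For anyone wanting a similar check outside VMS, here is a rough sketch of the
idea in Python.  It is not Brent's code.  It assumes the usual flat-file
layout: a BASE COUNT line giving the a/c/g/t totals, an ORIGIN line, numbered
sequence lines, and a closing "//".  Only a, c, g and t are compared; any
"others" figure is ignored.  The function name is hypothetical.

```python
import re
from collections import Counter


def base_count_matches(entry_text):
    """Return True if the a/c/g/t totals on the BASE COUNT line match the sequence."""
    declared = None
    observed = Counter()
    in_sequence = False
    for line in entry_text.splitlines():
        if line.startswith("BASE COUNT"):
            # Pick out pairs such as "3 a" or "12 t" from the header line.
            pairs = re.findall(r"(\d+)\s+([acgt])\b", line.lower())
            declared = {base: int(count) for count, base in pairs}
        elif line.startswith("ORIGIN"):
            in_sequence = True
        elif line.startswith("//"):
            break
        elif in_sequence:
            # Sequence lines carry a position number and blanks; keep letters only.
            observed.update(ch for ch in line.lower() if ch.isalpha())
    if declared is None:
        raise ValueError("no BASE COUNT line found")
    return all(observed.get(base, 0) == count for base, count in declared.items())
```

An entry truncated in transit then fails the check, because the letters
counted after ORIGIN no longer add up to the declared totals.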

lear@turbo.bio.net (Eliot) (08/21/90)

I suppose what I ought to do is to add a line in the header with a
sequence number and have associated with each sequence number the
accession numbers in a particular posting.  That way if you do lose
some in transit you'll be able to figure out which ones.
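
Roughly, the receiving end could then do something like the sketch below
(Python, purely illustrative).  It assumes the postings are numbered with
consecutive integers and that a table mapping each posting number to its
accession numbers can be fetched from the distributor; neither the header
line nor that table exists yet, and all the names here are made up.

```python
def missing_postings(received_numbers):
    """Return posting numbers absent between the lowest and highest received."""
    seen = set(received_numbers)
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


def lost_accessions(manifest, received_numbers):
    """List accession numbers carried by postings that never arrived.

    manifest maps a posting number to the accession numbers it contained
    (assumed to be available from the distributor).
    """
    lost = []
    for number in missing_postings(received_numbers):
        lost.extend(manifest.get(number, []))
    return lost
```

Note that a gap can only be seen once a later posting has arrived, so the
most recent postings could still be lost without the site noticing.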
-- 
Eliot Lear
[lear@turbo.bio.net]