[bionet.molbio.genbank] A question for FTP users

kristoff@GENBANK.BIO.NET (Dave Kristofferson) (01/19/91)

We have had a request to keep both the current GenBank release AND the
previous GenBank release in the FTP directory as standard policy.
This places certain demands on disk space obviously.  We would like to
know how many users really think that this feature would be of use and
why.  Please feel free to post your responses publicly to
genbank-bb@genbank.bio.net (this will not be applicable to most
European BIOSCI readers except those with Internet connections).

				Sincerely,

				Dave Kristofferson
				GenBank Manager

				kristoff@genbank.bio.net

roy@ALANINE.PHRI.NYU.EDU (Roy Smith) (01/19/91)

Dave Kristofferson says:
> We have had a request to keep both the current GenBank release AND the
> previous GenBank release in the FTP directory as standard policy.

	To be honest, I'm hard pressed to find a reason why anybody would
want to see release N-1 of GenBank.  I certainly don't see a reason to use
valuable public disk space for this purpose.  What could possibly be in an
older release that's not also in the current one?
-- 
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane?  Did you say arcane?  It wouldn't be Unix if it wasn't arcane!"

jkramer@molbio.med.miami.edu (Jack Kramer) (01/19/91)

In article <9101190214.AA17249@alanine.phri.nyu.edu> roy@ALANINE.PHRI.NYU.EDU (Roy Smith) writes:
>Dave Kristofferson says:
>> We have had a request to keep both the current GenBank release AND the
>> previous GenBank release in the FTP directory as standard policy.
>
>	To be honest, I'm hard pressed to find a reason why anybody would
>want to see release N-1 of GenBank.  I certainly don't see a reason to use
>valuable public disk space for this purpose.  What could possibly be in an
>older release that's not also in the current one?
>-- 

I am one of those who requested that the previous release UPDATE files
be kept on line for some overlap period after a new release.  My primary
reason for this is that I maintain two major software packages (IG and GCG)
that work with the databases.  I make every attempt to keep all the 
databases as up to date as possible.  Each of the packages uses a 
proprietary format for the data.  Even though it is ultimately possible,
it is extremely inconvenient to download the entire databases as soon
as a new version is posted and reformat them to the proprietary formats.
I therefore usually depend on the vendors' normal distributions for full
releases.  There is delay from the GenBank release date until the vendors get 
the new release, reformat it, and distribute it to their customers.  This
can be a several week delay.  My request was that the previous updates be 
kept online for a reasonable period to allow those dependent upon vendor 
distribution to get the new baseline release.  This is mainly to prevent
any confusion and mistakes which could affect the work of the database
and software package users.

Now that the new feature table fiasco is finally over, and the state of
update files is well documented on-line, this may all be moot.  But I
still feel a little uncomfortable about everything being deleted for
the previous release as soon as a new release is available at GenBank.

This is not a complaint about GenBank.  The anonymous ftp service is
a real lifesaver for me and I really appreciate all the cooperation
and service I have received from the GenBank staff.

BROE@AARDVARK.UCS.UOKNOR.EDU (Bruce Roe) (01/19/91)

David K. recently asked us to comment on the idea of keeping both
the current GenBank release AND the previous GenBank release in the
FTP directory as standard policy.

Personally I don't care since I do not down load these.  However,
we have the GCG programs at our site and it would be very nice if
we could ftp the latest GenBank release which has been converted
to the GCG format.  It seems that a lot of us in netland have to
go through a lot of work to convert the databases from GenBank to
GCG format before we can use them and it would be nice if we could
just ftp the database in a format which we don't have to fool with
before using.

At present we (GCG sites) either have to pay $1600 for the tapes
from GCG or download the databases from GenBank and convert them
ourselves to GCG format.  This is money and time we could spend on
other, more productive work.  Is there a site we can ftp the databases
in GCG format out there?  If not, why not?

I await responses with bated breath....

Thanks 
        Bruce A. Roe
        Professor of Chemistry and Biochemistry
        INTERNET: BROE@aardvark.ucs.uoknor.edu
        BITNET:   BROE@uokucsvx
        AT&TNET:  405-325-4912 or 405-325-7610
        SnailNet: Department of Chemistry and Biochemistry
                  University of Oklahoma
                  620 Parrington Oval, Rm 208
                  Norman, Oklahoma 73019
        FAXnet:   405-325-6111

reisner@ee.su.oz.au (Alex Reisner) (01/20/91)

>We have had a request to keep both the current GenBank release AND the
>previous GenBank release in the FTP directory as standard policy.
>                                Dave Kristofferson
>                                GenBank Manager
 =======================================================================

For our part we download new releases as soon as they become available via
the Internet.  They are converted to PIR format for use by the packages
we've purchased and the in house software we run.  The previous release is
then compressed and moved to an 8 mm Exabyte cartridge.
Therefore, we don't require holding the previous release on GOS
discs.
        One inexpensive option that may be open to GOS is to place on-line
the previous version which will be on CD-ROM starting this quarter via
a symbolic link.  That should be a fairly cheap solution.  It wouldn't be
in compressed format but at least it would be available.

Alex Reisner
(Australian Genomic Information Service)

jkramer@molbio.med.miami.edu (Jack Kramer) (01/21/91)

In article <9101191404.AA18024@genbank.bio.net> BROE@AARDVARK.UCS.UOKNOR.EDU (Bruce Roe) writes:
>
>go through a lot of work to convert the databases from GenBank to
>GCG format before we can use them and it would be nice if we could
>just ftp the database in a format which we don't have to fool with
>before using.
>
>At present we (GCG sites) either have to pay $1600 for the tapes
>from GCG or download the databases from GenBank and convert them
>ourselves to GCG format.  This is money and time we could spend on
>other, more productive work.  Is there a site we can ftp the databases
>in GCG format out there?  If not, why not?
>

My correspondence with major vendors over the past few years on the
ftp availability of proprietary formatted versions of the government
provided sequence databases would fill a small book.  After lots of
haggling over all the details it seems to all boil down to the fact
that the databases will never be available for "free" as long as these
commercial vendors can make additional profit by reformatting and
distributing the databases.  I have never seen any argument which
"yet" justifies the reformatting other than the profit motive.

roy@phri.nyu.edu (Roy Smith) (01/21/91)

jkramer@molbio.med.miami.edu (Jack Kramer) writes:
> I am one of those who requested that the previous release UPDATE files
> be kept on line for some overlap period after a new release.  My primary
> reason for this is that I maintain two major software packages [...] Each
> of the packages uses a proprietary format for the data.

	Perhaps I misunderstood the original posting; I thought the request
was to keep the *entire* previous release on line.  Just the updates sounds
more reasonable.  But, the real reason I'm following up to Jack's posting
is to flame the software vendors.

	The idea of each vendor having a proprietary format for Genbank is
nuts.  Do vendors really think it's a good idea for people who use two or
more packages to have to keep two or more complete copies of the database
on-line?  Or do they just think that their package is so wonderful, so
complete, and so able to fulfill the needs of every user at every site that
nobody might ever want to possibly run any software other than their own?

	I could see how you could make a point for reformatting the database
to be in some drastically better format (a relational data base, for
example), but many of the reformats I've seen have been nothing more than
trivial textual changes that don't make it better, they just make it
different.

	For example, Ross Smith and I both maintain complete copies of
GenBank (and other databases) on different machines on the same LAN.  For a
while, we've been talking about just having a single copy which one of us
would NFS mount from the other's disk.  A couple of days ago, I got to look
at his copy of GenBank.  It's still formatted as plain old ascii flat files,
but his software vendor decided it was important to insert lines starting
with >'s to delimit loci, instead of the "//" delimiter that the files have
coming off the tape from IG.  There were a couple of other textual
differences which I didn't study too closely, but it was obvious that none of
them were fundamental changes; they didn't make the file substantially better
than it was before, just different.  Enough so that in order for us to share
a single copy of the database, one of us would have to re-write a lot of our
software to know about the format of the other's database.

	Assuming the only difference is purely reformatting the text, then
there is no excuse.  If there is some added information, then it seems to me
that the best thing would have been to create a parallel flat file with the extra
info; the vendor's programs could read both files and other programs that
wanted to see a virgin GB file could see that too.  If the vendor wanted
some sort of index into the file, they could have made an index that pointed
into the original file; again, programs that wanted the virgin file could
just ignore the index.
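
A minimal sketch of the index-into-the-original-file idea described above, in Python. It assumes the usual flat-file layout in which each entry begins with a LOCUS line and ends with a line containing only "//". The function names (build_locus_index, fetch_entry) and the sample entries are hypothetical and exist only for illustration. The index stores byte offsets into the untouched file, so programs that want the original file can simply ignore it.

```python
import io


def build_locus_index(handle):
    """Map each locus name to the (start, end) byte offsets of its entry.

    `handle` is a binary file object positioned at the start of the data.
    The file itself is never modified.
    """
    index = {}
    name = None
    start = None
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line:
            break
        fields = line.split()
        if line.startswith(b"LOCUS") and len(fields) >= 2:
            # Second field of the LOCUS line is the entry name.
            name = fields[1].decode("ascii")
            start = pos
        elif line.rstrip() == b"//" and name is not None:
            # End of entry: record where it started and where it stops.
            index[name] = (start, handle.tell())
            name = None
    return index


def fetch_entry(handle, index, name):
    """Return the text of one entry by seeking directly to its offsets."""
    start, end = index[name]
    handle.seek(start)
    return handle.read(end - start).decode("ascii")


# Made-up sample data standing in for a real flat file.
sample = (
    b"LOCUS       ABC00001     12 bp    DNA\n"
    b"ORIGIN\n"
    b"        1 acgtacgtac gt\n"
    b"//\n"
    b"LOCUS       XYZ00002      8 bp    DNA\n"
    b"ORIGIN\n"
    b"        1 ttgaccaa\n"
    b"//\n"
)
flat_file = io.BytesIO(sample)
locus_index = build_locus_index(flat_file)
entry = fetch_entry(flat_file, locus_index, "XYZ00002")
```

In practice the index would be written to its own file next to the database, so that one copy of the flat file could be shared over NFS by packages that each keep their own index.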

> This is not a complaint about GenBank.  The anonymous ftp service is
> a real lifesaver for me and I really appreciate all the cooperation
> and service I have received from the GenBank staff.

	I'll go along with that.  I've had some minor disagreements with the
GenBank folks, but even the closest long-term collaborators don't always agree
100%.  By and large, the GB people (both at IG and LANL) have gone out of
their way to service every request we have made of them, even when those
requests haven't been entirely reasonable.
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane?  Did you say arcane?  It wouldn't be Unix if it wasn't arcane!"

Cherry@Frodo.MGH.Harvard.EDU (J. Michael Cherry) (01/21/91)

In article <9101191404.AA18024@genbank.bio.net> 
BROE@AARDVARK.UCS.UOKNOR.EDU (Bruce Roe) writes:
> David K. recently asked us to comment on the idea of keeping both
> the current GenBank release AND the previous GenBank release in the
> FTP directory as standard policy.

I see no reason for GenBank.Bio.Net to keep the old release files online. 
Each release exists on GenBank.Bio.Net for about three months before it's 
deleted and replaced with the next release. That seems like plenty of time 
for anyone to retrieve the files if GenBank is important to their site. If 
someone really needs an easy archive of old versions they should subscribe 
to the CD-ROM distributions.

> Is there a site we can ftp the databases in GCG format out there?  If 
> not, why not?

I know of no sites that make the GCG format of the database open to the 
public. GCG's format once was just NBRF's format but they have been moving 
away from that in recent years. I don't think GenBank.Bio.Net should 
provide anything but the GenBank format files. I would be willing to 
provide the GCG formatted database to the net but I really don't want to 
be the only site in the world providing access. If other sites around the 
world, or at least the US, are interested in being regional ftp access 
sites for the GCG formatted database please let me know. As a closing note 
the reformatting from GenBank to GCG is quite simple, involving two 
commands to rebuild the entire database. Transferring the database via the 
Internet can take longer in real time than the GCG reformatting process.

Mike Cherry
cherry@frodo.mgh.harvard.edu
Department of Molecular Biology
Massachusetts General Hospital, Boston
617-726-5955

kristoff@genbank.bio.net (David Kristofferson) (01/22/91)

Bruce,

	As I am sure you are aware, it is not in GenBank's charter to
supply the databank in any commercial format.  Reformatting costs
money regardless of who does it.  If we were required to reformat the
database as you suggest, we would be obligated to provide it for
*every* commercial vendor.  This is clearly impractical.  Also since
many users do not have access to FTP, they would still have to rely on
tape or CDROM distributions.  The net effect of this would be to delay
the production of GenBank tremendously.  Reformatting GenBank clearly
belongs where it is right now, in the hands of the commercial vendors.

				Sincerely,

				Dave Kristofferson
				GenBank Manager

				kristoff@genbank.bio.net

kristoff@genbank.bio.net (David Kristofferson) (01/22/91)

The issue of why commercial vendors choose their own format goes way
back.  A common complaint in the "good old days" was that GenBank was
"always changing their format."  Commercial vendors did not feel that
they could reliably support their users if the format of the data that
they were receiving was not consistent.  A considerable investment in
time, money, and accumulated data has been made in the interim by
vendors and the users of their software.  Note, however, that when
GenBank changed the features table format recently, there was still a
lot of controversy despite the fact that many attempts had been made
to alert users in advance.  Having been on both sides of the fence
there is undoubtedly blame to go around everywhere.  I do not think
that one can allege any kind of commercial conspiracy here, because it
also costs the companies a significant amount of money to fiddle with
these conversions.  IBM may be able to "lock people in" with
proprietary products because of their size, but this is not a
significant consideration in this rather little arena.  

When people buy commercial software and pay a not insignificant sum,
they expect to get something for their money.  I can understand it if,
having been burned in the past, most vendors still use their own
formats.  Remember that GenBank has been based on five year contracts,
the second of which will end in another year and 9 months.  Each
change brings potential uncertainty although it appears that the
GenBank format will continue to be produced after the end of the
current contract.  Whether it is more cost effective for vendors to
change formats is a decision which is up to them since each faces
their own market conditions with their own set of resources.

As you are well aware, the National Center for Biotechnology
Information is trying to establish another format using ASN.1 to try
to develop a new standard for this area.  If this is well thought out
and well received by the user community, perhaps this will eventually
put an end to some of these issues.  Until some reliable degree of
stability is assured to any format, others will undoubtedly continue
to exist.

jkramer@molbio.med.miami.edu (Jack Kramer) (01/22/91)

In article <Jan.21.09.32.02.1991.7956@genbank.bio.net> kristoff@genbank.bio.net (David Kristofferson) writes:
>Bruce,
>
>	As I am sure you are aware, it is not in GenBank's charter to
>supply the databank in any commercial format.  Reformatting costs

I think all the comments here have been directed to the commercial
distributors.  At least my intent was completely in that direction.
GenBank is to be commended on providing the original databases for
access via Internet.  This is definitely not true for all government
sponsored sequence databases.  Try getting PIR from the NBRF.  The
number of price lists and order forms they have sent me in response
to requests for network access must by now exceed the volume of the
actual database.

>the production of GenBank tremendously.  Reformatting GenBank clearly
>belongs where it is right now, in the hands of the commercial vendors.

I am completely satisfied with current and planned GenBank formats.
And if all the vendors standardized on the GenBank format it would 
certainly make life much easier.  No information is added by any of
the proprietary vendors in any of the proprietary formats.  And
indexing schemes which seem to be the most common justification,
work just as well with the original GenBank format as any other.

BROE@AARDVARK.UCS.UOKNOR.EDU (Bruce Roe) (01/23/91)

Hi,

Obviously the problem with the databases, their formats, and
programs to access the databases continues.  As with most things
in life there are no simple solutions until, of course, the solution
is found and then everyone says:

"My, that solution was so simple, why didn't we think of it before".

The solution is rather simple, a common, stable, database format.

Without this, vendors of software have 2 choices:
	1. Reformat the databases to fit their software
	2. Change their software to read the distributed databases.
	
Until now the choice of the vendors has been the former, mainly
because the format of the databases was (and still is) in a state
of change.  It is more efficient to write a program to change the
database format than it is to change the multitude of code for dealing
with the databases.

John and the folks at GCG have provided tools for converting GenBank
to GCG format and for inter-converting individual sequences from one
format to another.
The Staden programs read the databases stored in the PIR format but
can analyze individual sequences stored in any of several formats.
I do not know what IG does in their package but am sure they have
some similar approaches or do they use GenBank without reformatting?


David K. has written:
>        As I am sure you are aware, it is not in GenBank's charter to
>supply the databank in any commercial format.  Reformatting costs
>money regardless of who does it.  If we were required to reformat the
>database as you suggest, we would be obligated to provide it for
>*every* commercial vendor.  This is clearly impractical.  Also since
>many users do not have access to FTP, they would still have to rely on
>tape or CDROM distributions.  The net effect of this would be to delay
>the production of GenBank tremendously.  Reformatting GenBank clearly
>belongs where it is right now, in the hands of the commercial vendors.

Give me a break.  How many vendors is *every* ? Do folks really
search the entire GenBank from their pc's?  Some search the protein
databases on their pc's/macs but the entire GenBank?

Could we at least concentrate our discussion on MainFrame computer
programs and databases on these.  Maybe I'm mistaken but I count 
three MainFrame program sets as the vast majority used, GCG, IG, and
NBRF/PIR.  A few sites have the Staden programs but most of us who use
the Staden programs use them for purposes other than database searching.

In reality, Bill Pearson's FASTA and companion programs probably are
used the most and they handle the GCG formatted databases.

I think what we need here is a survey of what's out there.  I suggest we
limit our discussion to Main Frame programs and FTP sites, and deal with
sites rather than individual users. I also do not think we should
consider other forms of the databases, such as those which require
pre-processing for the NLM BLAST programs or GCG's QUICKSEARCH.

The problem is time and money.  If GCG supplies users with tapes for
$1600 they make money but they sure save me lots of time and I get ALL
the databases we want and need in a format we can use.  I also do not
have to worry about transmission error which may corrupt an ftp-ed
database.  If I get the GenBank tapes I still have to pay (although less)
but then I have to spend time re-formatting databases and also get additional
tapes from PIR and maybe others which could bring the cost in tapes and
effort to a figure greater than the cost from GCG.  No matter what it
looks like the NIH is going to pay the bills, either from individual
grants or from contracts to GenBank/IG.

I'd like to hear from the funding agencies and also like comments from
those who supply databases to the rest of us.

My overall conclusions are:
(1) pay the money to GCG and get quarterly database updates on tape as
it is the least hassle for me and our system folks.  
(2) encourage users to search the latest databases using FASTA-Mail,etc.
(3) continue to join with others to encourage discussions which will
result in a common, stable database format.

Best to one and all,

        Bruce A. Roe
        Professor of Chemistry and Biochemistry
        INTERNET: BROE@aardvark.ucs.uoknor.edu
        BITNET:   BROE@uokucsvx
        AT&TNET:  405-325-4912 or 405-325-7610
        SnailNet: Department of Chemistry and Biochemistry
                  University of Oklahoma
                  620 Parrington Oval, Rm 208
                  Norman, Oklahoma 73019
        FAXnet:   405-325-6111
        ICBMnet:  35 deg 14 min North, 97 deg 27 min West

kristoff@GENBANK.BIO.NET (Dave Kristofferson) (01/24/91)

Bruce,

	One "break" coming up 8-)!  I may respond in more detail
later, but one quick note now.  I think that you *will* find people
who search the entire database on PC's.  They just turn the thing on
and walk away for the evening.  There's a sizable crowd that doesn't
want anything to do with "mainframes" if they can avoid them.  I have
to leave official statements as to what the government can and can't
do to government officials, but there are many people that use
commercial PC and Mac programs, and it is not clear to me that the
government can provide the data in a format specifically tailored to a
subset of commercial programs.  Another thing that comes to mind is
what happens if tapes sent out in XYZ format by GenBank cause a
problem.  Does GenBank get access to the commercial software and run
tests first to ensure that the stuff works?  Does this task remain in
the hands of the vendors and result in constant exchanges between the
vendor and GenBank?  It strikes me that GenBank would have to accept
some degree of responsibility for this, but why should a public agency
get involved in support for a commercial company?  Because people out
there can't come up with $1600 a year for a commercial tape
subscription (* see below)?  It strikes me as being a lot cleaner for
GenBank simply to provide its own format and for the vendors to adapt.
We both agree that this will only occur if there is continued
stability in the format, but this has not happened yet and may still
not happen for some time unfortunately.  Technological progress in its
own right has a nasty habit of requiring format changes.

Dave


* - When I used to work at UCSF I never saw people cringe at spending
several hundred dollars on radionucleotides, GTP, etc.  However, there
is some kind of psychological barrier when it comes to spending money
on computers and software (maybe because it's easier to copy software
than it is to make reagents!).

kristoff@GENBANK.BIO.NET (Dave Kristofferson) (01/24/91)

Bruce,

P.S. - The people at NCBI will be running the next GenBank contract
and have it in their charter to develop standards.  I would hope that
they would comment on these issues.  NCBI has already held developers
meetings on their proposed ASN.1 standard which I hope people are
anticipating.  

I am all for minimizing disruptions to people's software (I'm still
suffering from some of these problems myself), but the longer term
future is in the hands of the NIH.  They can dictate what the
contractors are required to provide.

Dave

JREES@vax.oxford.ac.uk (01/24/91)

	Time I guess to say my piece also...

	It shouldn't even need questioning that reformatting the databases
from one format to another is inappropriate, and I won't reiterate the
arguments made here before, as that has been done already.

	But I am going to add my voice to those who would see ALL the software
packages accept the format distributed by the databases as the one to use for
access to the databases in straight ascii format. Will Gilbert articulated
this whole area very clearly on INFO-GCG last August, and perhaps he can be
persuaded to repost to this discussion, but in essence it is very simple for
everyone using the flat format files to interface to them in "native" format
and to provide utilities to index the access in a fashion which facilitates
access for their own package. Some programmers seem willing to do this (Rodger
Staden for one has stated a willingness to use whatever format is chosen by
PIR, and has no objection to using "native" format if they do), others seem
very determined to go their own way in the face of opposition (perhaps rather
silent opposition until now) from those who are actually on the receiving end of
the effect of this dogma.

	Since there can be no programming advantage that I can see for the
reformatting, the question is why is no one willing to standardise? It is clear
that the standard HAS to be that created by the database in question, and that
software can be written to meet whatever format it is presented with, and that
all the packages COULD use whatever format the database was presented to them
in (EMBL, Genbank, PIR, Codata, whatever) by setting the appropriate parameter
to the package at the start.
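
As a rough illustration of the "appropriate parameter" approach, here is a minimal Python sketch of a reader that takes the database format as an argument and reads the files as distributed. The FORMATS table and the iter_entries function are hypothetical names; only two formats are listed, on the assumption that GenBank entries open with a LOCUS line, EMBL entries open with an ID line, and both end with a "//" line. The sample entries are made up.

```python
# Hypothetical table: first-line keyword and terminator line per format.
# Other formats could be added with their own keywords.
FORMATS = {
    "genbank": {"start": "LOCUS", "end": "//"},
    "embl": {"start": "ID", "end": "//"},
}


def iter_entries(lines, fmt):
    """Yield (name, entry_lines) for each entry in `lines`.

    `fmt` selects the native format; the input is read as distributed,
    with no reformatting step.
    """
    spec = FORMATS[fmt]
    name, current = None, []
    for line in lines:
        line = line.rstrip("\n")
        if name is None:
            # Outside an entry: look for the format's opening keyword.
            fields = line.split()
            if len(fields) >= 2 and fields[0] == spec["start"]:
                name = fields[1].rstrip(";")
                current = [line]
            continue
        current.append(line)
        if line.strip() == spec["end"]:
            yield name, current
            name, current = None, []


# Made-up sample entries in each native format.
genbank_text = [
    "LOCUS       ABC00001     12 bp    DNA\n",
    "ORIGIN\n",
    "//\n",
]
embl_text = [
    "ID   ABC00001   standard; DNA; UNC; 12 BP.\n",
    "SQ   Sequence 12 BP;\n",
    "//\n",
]
names_gb = [name for name, _ in iter_entries(genbank_text, "genbank")]
names_embl = [name for name, _ in iter_entries(embl_text, "embl")]
```

The rest of a package would then work from the (name, entry_lines) pairs and never need its own copy of the data on disk.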

	Perhaps if this were done then there would be less time and effort
wasted making multiple copies for everyday use. The problems the Genbank/IG
have with disk space probably apply to most of us - my own operations running
software and databases in Oxford and at MIT run 700MB on each machine,
reformatting databases generally means finding 150 MB of spare space at a
minimum, more if the active version is not deleted first, and this is getting
worse as the databases get larger each year.

	Clearly the change we could have all hoped for in the construction of
the relational format Genbank and Embl has not yet gained the active interest
of the programming community as an option, it is my hope that this will be the
way forward in the long term, and that those in a position to advance this
will do so on this forum.

	Finally to avoid the point being missed, I am fully in support of the
use of total reformatting where it achieves a significant change in the
response to the user, the preprocessing required to run BLAST or GCG's Quick
software is an investment well worth making - even when the overall cost in
resource has been higher, but the problem under discussion here does NOT
achieve that end.

	Jasper Rees

	Jrees@Vax.Oxford.ac.uk   (%nsfnet-relay.ac.uk)
	Seqtest@Wccf.MIT.edu

	"One World, One database format ?"

droufa@MATT.KSU.KSU.EDU (Donald J Roufa) (01/24/91)

Bruce Roe writes:
 
> 
> Give me a break.  How many vendors is *every* ? Do folks really
> search the entire GenBank from their pc's?  Some search the protein
> databases on their pc's/macs but the entire GenBank?
> 
> Could we at least concentrate our discussion on MainFrame computer
> programs and databases on these.  Maybe I'm mistaken but I count
> three MainFrame program sets as the vast majority used, GCG, IG, and
> NBRF/PIR.  A few sites have the Staden programs but most of us who use
> the Staden programs use them for purposes other than database searching.
> 
> In reality, Bill Pearson's FASTA and companion programs probably are
> used the most and they handle the GCG formatted databases.
> 
> I think what we need here is a survey of what's out there.  If we limit
> our discussion to Main Frame programs and FTP sites and not deal with
> individual users but rather with sites. I also do not think we should
> consider other forms of the databases, such as those which require
> pre-processing for the NLM BLAST programs or GCG's QUICKSEARCH.
> 
> 
	Although I agree that users carrying out sophisticated
analyses of GenBank on PCs are severely handicapped by their machines'
command of memory and speed, as a 'bench' molecular biologist who
frequently uses GenBank information for experiments, I have found that
most of my searches are not sophisticated ones.  They simply are
requests to retrieve a single locus within the database, or, as Bruce
asserts, are TFASTA sequence comparisons.  The former are most
conveniently done on our laboratory PCs using the CD-ROM or floppy
report release.  The latter are best done, as suggested, on mainframes
or via e-mail directly to GenBank, but they can also be carried out in
just a few minutes (20 minutes to be precise) for a TFASTA search on an
80386 PC.  Since the vast majority of working laboratories have access
to PC's, whereas only a subset of them have close ties to mainframes,
I think that Dr. Roe's suggestion would not be in the best interests
of the entire population of working molecular biologists.  In
addition, it has been my experience that, despite the fact that I do
use our university's mainframe and unix network for GenBank work, as a
research scientist I have little influence over our institution's
allocation of mainframe computing resources.  In contrast, I have
complete control over our local microcomputing resources, and can
tailor our database needs for research quite specifically at that
level.  Inasmuch as GenBank is, in fact, a research resource, it is
important that we not lose sight of its use by people who are
depositing data in the database.

-- 
    Don Roufa        E-Mail: DROUFA@MATT.KSU.KSU.EDU    //  | /  /---  |  |
Division of Biology          DROUFA@KSUVM.KSU.EDU      //   |/   |__   |  |
Kansas State Univ.   Tel: (913) 532-6641              //    |\      |  |  |
Manhattan, KS 66506  Fax: (913) 532-6653             //     | \  \__/  \__/

Cherry@Frodo.MGH.Harvard.EDU (J. Michael Cherry) (01/25/91)

In article <CMM.0.88.664670183.kristoff@genbank.bio.net> 
kristoff@GENBANK.BIO.NET (Dave Kristofferson) writes:
> The people at NCBI will be running the next GenBank contract
> and have it in their charter to develop standards.  I would hope that
> they would comment on these issues.  NCBI has already held developers
> meetings on their proposed ASN.1 standard which I hope people are
> anticipating.  

Please forgive my nitpicking that follows, but I'd hate to see things get 
more confused. This is not directed at Dave's posting; I just quoted it
so you would see where things start.

The NCBI proposed database standard is built using a transaction/notation
standard called ASN.1. ASN.1 has been adopted by several commercial computer 
and software companies for a variety of applications. ASN.1 is not the name 
of the NCBI standard. I believe the NCBI refers to the database format by the 
name of their nascent database - GenInfo Backbone. You can retrieve a copy of
the GenInfo Backbone format version 0.5 via anonymous ftp from 
ncbi.nlm.nih.gov. Look in the toolbox/asn_0.5 directory.

One more little point if I may. Several people have referred to 
"mainframe" computers in this discussion of formats. A mainframe computer 
is a very large computer typically produced by IBM. There are few 
if any dedicated mainframe computers run for molecular biologists. However, 
Digital's VAX computers - generally called minicomputers - are everywhere. 
Currently, though, most of the computers being sold by Digital, Sun 
Microsystems, HP, Apple and even IBM are microcomputers. Sun and others 
may call them supermicros - but that is just marketing.

Mike Cherry
cherry@frodo.mgh.harvard.edu
Department of Molecular Biology
Massachusetts General Hospital, Boston
617-726-5955