[bionet.molbio.genbank] Data banks and CD-ROM.

BAIROCH@cgecmu51.bitnet (Amos Bairoch) (12/20/89)

I do not know why everyone seems to discuss the long-term advantages of
distributing sequence data banks on CD-ROM: if you add up the number of
CD-ROM disks containing Genbank and/or EMBL + PIR and/or SWISS-PROT
distributed by the many secondary distributors, you already have more CDs
out there than tapes!

You must remember that most commercial PC sequence analysis packages
distribute data banks on CD. EMBL also ships an increasing number of
CDs.

I always have the impression that managers of big computer centers tend
to ignore what is happening in the real world. CDs are not only a good
and very cheap distribution medium, they are also an efficient on-line
storage alternative. OK, they are slower than a hard disk and it takes 2 to 4
hours to run a Lipman and Pearson search on the complete data bank, but
buying a 386 system plus a CD drive and using it only for this purpose is
much cheaper than using a VAX.

I saw someone quoting a price of $2000 for a CD drive; well, that
sure is an expensive one! Some discount houses sell drives
for about $500. List prices of most drives are around $1000.

Amos Bairoch
Dept. Medical Biochemistry
University of Geneva
Switzerland

naftoli@aecom.yu.edu (Robert N. Berlinger) (12/27/89)

In article <8912192350.AA01990@net.bio.net>, BAIROCH@cgecmu51.bitnet (Amos Bairoch) writes:
> I always have the impression that managers of big computer centers tend
> to ignore what is happening in the real world. CDs are not only a good
> and very cheap distribution medium, they are also an efficient on-line
> storage alternative. OK, they are slower than a hard disk and it takes 2 to 4
> hours to run a Lipman and Pearson search on the complete data bank, but
> buying a 386 system plus a CD drive and using it only for this purpose is
> much cheaper than using a VAX.
> 
> I saw someone quoting a price of $2000 for a CD drive; well, that
> sure is an expensive one! Some discount houses sell drives
> for about $500. List prices of most drives are around $1000.

I agree that CD-ROM is a reasonable distribution medium for the data banks,
but you seem to imply that it will be cheaper for individuals than a centralized
storage and access point because everyone can get hold of a 386 and buy
a CD player for $1000.  That's simply not true if you scale it.

Suppose there are 30 labs within your institution that need this capability. 
There's $30,000 just for the players.  Then you need 30 subscriptions to each
data bank, etc.; you know what I'm talking about.
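
The scaling argument can be sketched in a few lines of Python. The $1000
drive price and the 30 labs are the figures quoted above; the subscription
price, the number of data banks, and the server cost are hypothetical
placeholders, since no numbers are given for them here.

```python
# Cost of equipping every lab versus running one central server.
# DRIVE_PRICE and N_LABS are the figures quoted above; SUBSCRIPTION_PRICE,
# N_DATABANKS and SERVER_COST are hypothetical placeholders.

DRIVE_PRICE = 1000         # dollars per CD drive
N_LABS = 30                # labs needing access
SUBSCRIPTION_PRICE = 200   # assumed dollars per data bank subscription
N_DATABANKS = 2            # assumed number of data banks subscribed to
SERVER_COST = 20000        # assumed dollars for the server plus hard disks


def per_lab_total(n_labs, drive_price, subscription_price, n_databanks):
    """Every lab buys its own drive and its own set of subscriptions."""
    return n_labs * (drive_price + subscription_price * n_databanks)


def central_total(server_cost, subscription_price, n_databanks):
    """One server holds a single set of subscriptions for everyone."""
    return server_cost + subscription_price * n_databanks


def break_even_labs(drive_price, subscription_price, n_databanks, server_cost):
    """Smallest number of labs at which the per-lab route costs more."""
    per_lab = drive_price + subscription_price * n_databanks
    central = central_total(server_cost, subscription_price, n_databanks)
    return central // per_lab + 1


if __name__ == "__main__":
    print("drives alone:", N_LABS * DRIVE_PRICE)
    print("per-lab total:",
          per_lab_total(N_LABS, DRIVE_PRICE, SUBSCRIPTION_PRICE, N_DATABANKS))
    print("central total:",
          central_total(SERVER_COST, SUBSCRIPTION_PRICE, N_DATABANKS))
    print("break-even labs:",
          break_even_labs(DRIVE_PRICE, SUBSCRIPTION_PRICE, N_DATABANKS,
                          SERVER_COST))
```

With these made-up inputs the per-lab route becomes the more expensive one
at about 15 labs; real prices, and the staff costs that the sketch ignores,
would move that point.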

Compare that to one subscription stored on a central network
server which can provide virtual disk access.  Speed will be comparable
or better than CD-ROM directly, because the data can be stored
inexpensively (once) on fast hard disks attached to the server.

I also feel that a remote search submission facility does have merit
in this situation.  Let's face it -- a 386 is only so fast, and submitting
a job to a search engine could be the way to go.  In a network access
scheme, this will cut down on network traffic significantly.

It's true that given CD-ROM distribution it's within the reach of an
individual lab to afford the complete Genbank, and have convenient access to it
(i.e., not on floppies).  But once you start to talk about larger
numbers, the aggregate cost offsets the expense of a centralized facility.

Actually, I don't think the distribution medium of the data should be much
of a concern -- more importantly users should concentrate on obtaining
and encouraging the development of more sophisticated software that
will provide the type of access that I'm suggesting in a seamless fashion.
-- 
Robert N. Berlinger                 |Domain: naftoli@aecom.yu.edu        
Supervisor of Systems Support       |UUCP: ...uunet!aecom!naftoli
Scientific Computing Center         |CompuServe: 76067,1114
Albert Einstein College of Medicine |AppleLink: D3913

smith@mcclb0.med.nyu.edu (Ross Smith: (212) 340-5356) (12/27/89)

In article <2693@aecom.yu.edu>, naftoli@aecom.yu.edu (Robert N. Berlinger) writes:
> In article <8912192350.AA01990@net.bio.net>, BAIROCH@cgecmu51.bitnet (Amos Bairoch) writes:
>> I always have the impression that managers of big computer centers tend
>> to ignore what is happening in the real world.....
> 
> I agree that CD-ROM is a reasonable distribution medium for the data banks,
> but you seem to imply that it will be cheaper for individuals than a centralized
> storage and access point because everyone can get hold of a 386 and buy
> a CD player for $1000.  That's simply not true if you scale it.
> 
> Suppose there are 30 labs within your institution that need this capability. 
> There's $30,000 just for the players.  Then you need 30 subscriptions to each
> data bank, etc.; you know what I'm talking about.
> 
> Compare that to one subscription stored on a central network
> server which can provide virtual disk access.  Speed will be comparable
> or better than CD-ROM directly, because the data can be stored
> inexpensively (once) on fast hard disks attached to the server.

I strongly agree.  A central system, well managed, provides a cost-effective 
method to distribute the computing needed by a group of investigators, most 
of whom have no interest in computing per se.  More importantly, a central 
service allows users to share expertise effectively, and to use a central 
consultant who can help ensure that all users can benefit from the latest 
releases of data and software.  Individual investigators generally do not have, 
and do not want, a postdoc spending a large fraction of his or her time 
maintaining a personal data analysis system.

We have a site with about 40 groups and 180ish users.  Most of these groups 
could not afford a standalone system.  Only one or two have the depth in 
their groups to develop and maintain the expertise to run it and keep it up to 
date.  For our site, and our size, a central system is the best solution.

+---------------------------------------------------------------------------+
|Ross Smith, Cell Biology,  NYU Medical Center,  550 First Ave.,  NYC, 10016|
|Phone: (212) 340-5356: FAX: (212) 340-8139 (Alternate NYUMC) (212) 340-7190|
|E-Mail:  SMITH@NYUMED.BITNET (BITNET),  SMITH@MCCLB0.MED.NYU.EDU (Internet)|
+---------------------------------------------------------------------------+

kristoff@genbank.BIO.NET (David Kristofferson) (12/29/89)

> I agree that CD-ROM is a reasonable distribution medium for the data
> banks, but you seem to imply that it will be cheaper for individuals
> than a centralized storage and access point because everyone can get
> hold of a 386 and buy a CD player for $1000.  That's simply not true
> if you scale it.
> 
> Suppose there are 30 labs within your institution that need this
> capability.  There's $30,000 just for the players.  Then you need 30
> subscriptions to each data bank, etc.; you know what I'm talking
> about.
> 
> Compare that to one subscription stored on a central network server
> which can provide virtual disk access.  Speed will be comparable or
> better than CD-ROM directly, because the data can be stored
> inexpensively (once) on fast hard disks attached to the server.
> 
> I also feel that a remote search submission facility does have merit
> in this situation.  Let's face it -- a 386 is only so fast, and
> submitting a job to a search engine could be the way to go.  In a
> network access scheme, this will cut down on network traffic
> significantly.

For those groups that can afford to set up and maintain their own
system, you are definitely correct.  Don't forget that this includes
personnel costs as well as hardware and software.  I should point out
that a central service from GenBank is available to anyone with just a
terminal/PC and a modem over the Telenet public data network.  We
provide *daily* updates to the nucleic acid sequence databases, a
high-speed (80 MIPS soon) computer, and lots of fast hard disk space.
More importantly we have a reliable, tested staff of high quality
systems programmers and support people that do not have to be
duplicated at each site.  When one starts looking at the economics
angle, this definitely makes sense over setting up many local servers.

The drawback is that 1200 or 2400 baud modem connections are not as
responsive as in-house systems, but Telenet is starting to offer 9600
baud service in some locations, the speed of the Internet is improving
(we have a 1.54 Mbps connection), and I am sure that additional speed
improvements will continue in this area.  Unlike the old BIONET
service, the new GenBank On-line Service has **plenty** of computing
power, so the Telenet connection speed remains our last weak link.
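
To get a feel for that weak link, here is a small Python sketch of how long
a result file would take to arrive at each of the speeds mentioned. It
treats baud as bits per second and assumes 10 bits on the wire per byte for
the modem links and 8 for the 1.54 Mbps line; the 50,000-byte file size is a
made-up example, and protocol overhead and line sharing are ignored.

```python
# Rough transfer times for a result file at the link speeds named above.
# Each entry is (bits per second, assumed bits on the wire per byte).
LINKS = {
    "1200 baud modem": (1200, 10),
    "2400 baud modem": (2400, 10),
    "9600 baud Telenet": (9600, 10),
    "1.54 Mbps line": (1_540_000, 8),
}

RESULT_BYTES = 50_000  # hypothetical size of a search result file


def transfer_seconds(n_bytes, bits_per_second, bits_per_byte):
    """Idealized transfer time in seconds, ignoring protocol overhead."""
    return n_bytes * bits_per_byte / bits_per_second


if __name__ == "__main__":
    for name, (bps, bits_per_byte) in LINKS.items():
        seconds = transfer_seconds(RESULT_BYTES, bps, bits_per_byte)
        print(f"{name:>20}: {seconds:8.1f} s")
```

Under these assumptions the modem links take minutes where the fast line
takes a fraction of a second, which is the gap described above.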
-- 
				Sincerely,

				Dave Kristofferson
				GenBank On-line Service Manager

				kristoff@genbank.bio.net