[bionet.molbio.genome-program] USENet and GenBank Updates

BROE@AARDVARK.UCS.UOKNOR.EDU (Bruce Roe) (03/18/90)

Fellow netters, sequencers and assorted hacks...

Some very interesting stuff has recently been posted regarding USENET and
making computers easier to use for us poor, dumb cloners and sequencers.

The message from gilbertd@iubio.bio.indiana.edu
Message-Id: <9003162052.AA04441@genbank.bio.net>

containing "A brief guide to installing and reading Internet news from
a networked Macintosh," was extremely useful. It includes:

>>Macs that are connected to a network that includes tcp/ip links to the Internet
>>can be set up to let you easily read and post network news, including the
>>bionet.* newsgroups for research biologists.  The netnews reader stack is
>>easy to use, but requires some network knowledge to install.
>>Requirements:
>>*  Macintosh with an appletalk or ethernet connection
>>*  MacTCP software to provide the Mac with the tcp/ip communications link
>>*  Harry Chesley's netnews reader hypercard stack
>>*  A local area NNTP netnews server computer

The message from genbank-bb@mcclb0.med.nyu.edu
Message-Id: <9003162115.AA05606@genbank.bio.net>

adds:

>>In order to use this data feed, software is needed to automatically extract
>>the news items (sequences) and load them into a database usable by the local
>>bank search software.  Two groups, one at the Public Health Research
>>Institute and the other at New York University Medical Center are
>>collaborating with Genbank to provide software to furnish this service.  We
>>expect to publish a short account of it in the near future.

Together, these two messages are a great start on a system that provides,
first, a user-friendly interface to easily accessible network news and,
second, an automated method for immediate database updates.

I do have some comments regarding the above and alternative approaches
to the problems they address.

   I just put up Dr. Clark's FAMAIL shells and they are fantastic.
This brings up the question, why should we clutter up our disk drives
with all the databases when GenBank has them easily accessible via the
Internet?  Is it because we want them here and do not want to depend
on a network??  I guess that's why.  I'd love to have daily database
updates on our VAX and take 5x more CPU time to search the databases
via WORDSEARCH or FASTA and bring every other user on our VAX to a
screeching halt while I search.  That gives me power. (Is sarcasm
allowed on the net, Dave??)

   Seriously though, as the databases grow in size, maintaining
them locally is going to be very difficult and searching them locally
is going to be very, very slow.  So rather than having all of us deal
with these databases locally, why doesn't the NIH consider funding
several sites nationwide to serve as mini-GenBanks with the appropriate
access and search programs?  I think this is called *distributed
computing*, and it makes more sense to me than creating a local nightmare.

   What most users want is the ANSWER to the question they are
trying to ASK.  It is our responsibility to provide the tools to solve
the complex questions we are addressing by large scale sequencing and
not create a situation worse than what we now have.
 
   The idea is that, with several computers holding the databases,
we could access one central server (maybe not the correct word) and
our search would be sent to the least-used computer at that time
and/or split across several computers, each holding a portion of the
database.  This idea did not originate with me, but it is one we should
be revisiting.  Distributed computing and parallel processing may
provide a better and faster way for us to obtain results.
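For readers who want something concrete, here is a rough sketch of the two strategies just described: send the whole search to the least-loaded server, or split the database into partitions, search each on its own server in parallel, and merge the hits. Everything in it is hypothetical (the server names, the load figures, and `search_partition`, which stands in for a remote search); the scoring is a toy count of shared 4-letter words, loosely in the spirit of a ktup=4 lookup but *not* FASTA.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mini-GenBank sites and their current load (e.g. load average).
SERVERS = {"mini-genbank-east": 0.7,
           "mini-genbank-west": 0.2,
           "mini-genbank-central": 1.5}

def least_loaded(servers):
    """Strategy 1: pick the server reporting the lowest load."""
    return min(servers, key=servers.get)

def search_partition(server, partition, query):
    """Hypothetical stand-in for a remote search of one database partition.
    Toy score: number of 4-letter words in each entry that also occur in
    the query (NOT the real FASTA algorithm)."""
    words = {query[i:i + 4] for i in range(len(query) - 3)}
    hits = []
    for name, seq in partition.items():
        score = sum(1 for i in range(len(seq) - 3) if seq[i:i + 4] in words)
        hits.append((score, name, server))
    return hits

def distributed_search(servers, partitions, query, top=3):
    """Strategy 2: search each partition on its own server in parallel,
    then merge the per-partition hits into one ranked list."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_partition, srv, part, query)
                   for srv, part in zip(list(servers), partitions)]
        merged = [hit for f in futures for hit in f.result()]
    return sorted(merged, reverse=True)[:top]
```

With three toy partitions such as `[{"seqA": "ACGTACGTAA"}, {"seqB": "TTTTGGGG"}, {"seqC": "ACGTTTTT"}]` and the query `"ACGTACGT"`, each partition is scored on a different site and the merged list ranks `seqA` first. The user sees only the ranked answer, not which machine produced it.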

   As for making computers easier to use, I think it's great that two of
my former students (Leslie and Chris Dow) are advocates for this.
Both hated my Macs and loved PCs, but that's another story.  It is,
as both Leslie and Chris have pointed out, very important for users
to have an understandable, easy-to-use, intuitive interface.  The vast
majority of users do not care which computer is running which program
or which operating system is in use, and all of this should be
transparent to the user.

Well, that's it from the home of the number one basketball team in the
USA.  In the words of a previous president of the University of Oklahoma,
"Let's create an academic environment our football (and now our basketball)
team will be proud of..."

Best to one and all,
Bruce A. Roe
Professor of Chemistry and Biochemistry
University of Oklahoma, Norman, OK 73019

kristoff@genbank.BIO.NET (David Kristofferson) (03/20/90)

From Prof. Bruce Roe:

>    I just put up Dr. Clark's FAMAIL shells and they are fantastic.

So I have heard from others.  The purpose of these is described
further below along with some other important observations.

> This brings up the question, why should we clutter up our disk drives
> with all the databases when GenBank has them easily accessible via the
> Internet?  Is it because we want them here and do not want to depend

Please note that users on BITNET, EARN, NETNORTH, JANET, etc. can also
access the databases on the GenBank computer.

> on a network??  I guess that's why.  I'd love to have daily database
> updates on our VAX and take 5x more CPU time to search the databases
> via WORDSEARCH or FASTA and bring every other user on our VAX to a
> screeching halt while I search.  That gives me power. (is sarcasm
> allowed on the net Dave??).
>
>    Seriously though, as the databases grow in size, maintaining
> them locally is going to be very difficult and searching them locally
> is going to be very very slow.  So rather than having all of us deal 
> with these databases locally, why doesn't the NIH think of funding
> various sites located nation-wide to be mini-genbanks with the appropriate
> access and searching programs?  I think this is called *distributed
> computing* and it makes more sense to me than creating a local nightmare.
> 

Frankly, Bruce, I am very much in agreement with you on this point.
We (the GenBank On-line Service or GOS for short) have been providing
these alternative means of database distribution simply because the
demand is there for them and because it does not require much effort
to do.

HOWEVER, each time anyone has approached me with a new distribution
scheme, I have always asked them the question:

"WHY DO YOU WANT TO WASTE YOUR DISK SPACE AND CPU POWER AND
HAVE TO KEEP ON TOP OF THE UPDATE SITUATION DAILY WHEN THE DATABASES
ARE MAINTAINED ON THE GENBANK ON-LINE SERVICE AND ACCESS TO THEM OVER
THE NETWORK IS **FREE** FOR FASTA SEARCHING AND SEQUENCE ENTRY
RETRIEVAL???  The NIH has funded a high speed computer with lots of
disk space at GOS precisely for this purpose!!!"

The only valid answer seems to be that someone needs the entire
database present locally and updated each day for some kind of
analysis other than FASTA or IRX searching.

However, I believe that many sites may just latch on to local
maintenance because 

	  **it is possible for the systems manager to do**.

The usage will probably turn out to be 90%+ for FASTA searches anyway
and the NIH will be continually faced with requests for more disks on
which to store the data.  Once the database grows much larger than
it is now, this will clearly not be the right way to proceed.  It
makes more sense economically for the NIH to
provide easier access to the database by providing just enough
computing power to enable people to get their jobs done expeditiously.

Currently, the existing 80 MIPS Solbourne computer at the GenBank
On-line Service (GOS) is more than up to this task.  At some point
this computer will become inadequate and the NIH will have to expand
the available power.

Again the most economical way to do this will be to set up
"mini-GenBanks," as Dr. Roe proposed, by using hardware that is
capable of doing the job and is already in place, if possible (I am
obviously arguing *against* my own self-interest here, since I could
just say "give us more money and we'll do the job here.").
Possibly these additional sites could be located at the various Genome
Centers under consideration for funding.

Dr. Roe also mentioned Steve Clark's scripts for use with GOS.
Although I have not seen them myself (we use Suns/Solbournes at GOS,
not VAXen), I understand that they provide, for the VAX, something
similar to what we had on BIONET, namely a simple interface that
constructs the required mail message for the GOS FASTA Server and then
sends it off automatically.  This is a straightforward program that
relieves the user of having to compose the e-mail submission in the
precise server format and greatly simplifies the process.  Mailing of
the message to the appropriate server address is also automated.
Basically, all the user needs to do is answer a few prompts: which
file contains the query sequence, which database to search, and which
parameter settings to use.  The search is then sent off to GenBank
automatically; the GOS computer reads the message, runs the search,
and mails back the scores and requested alignments.  Finally, the
user can use the entry retrieval server to retrieve any entries of
interest found during the search.
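To make the idea concrete, here is a rough modern sketch of what such a helper does: clean up the query sequence from a file, fill in the database and parameter choices, and build the mail message for the server. This is not Steve Clark's FAMAIL code, and the real GOS FASTA server message format is not reproduced here. The address `fasta-server@example.org` and the keyword lines (`DATABASE`, `KTUP`, `BEGIN`/`END`) are hypothetical placeholders. The function only builds the message; it does not send it.

```python
from email.message import EmailMessage

# Hypothetical server address; the real GOS server address is not given here.
FASTA_SERVER = "fasta-server@example.org"

def read_sequence(text):
    """Drop FASTA-style '>' header lines and whitespace from a query file's text."""
    lines = [ln.strip() for ln in text.splitlines() if not ln.startswith(">")]
    return "".join(lines).upper()

def build_fasta_request(sender, sequence, database="genbank", ktup=4):
    """Compose a search request in a HYPOTHETICAL keyword format,
    standing in for the server's real (unspecified here) format."""
    body = "\n".join([
        f"DATABASE {database}",
        f"KTUP {ktup}",
        "BEGIN",
        # Wrap the query at 60 residues per line.
        *(sequence[i:i + 60] for i in range(0, len(sequence), 60)),
        "END",
    ])
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = FASTA_SERVER
    msg["Subject"] = "FASTA search request"
    msg.set_content(body)
    return msg
```

Sending would then be one more step, for example with `smtplib.SMTP(host).send_message(msg)` against a local mail host. The point of such a helper is simply that the user never has to remember the exact server format or address.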

I have tested the GOS systems from other machines on the Internet.
Mail transit time is very short (a couple of minutes round trip,
sometimes less), and searching a 1000-base query against all of
GenBank release 62 at ktup=4 took about 21-22 minutes!

For the majority of users, this will probably be sufficient.  It will
also keep their local computer freed up for less compute-intensive
tasks!!  On the other hand, if everyone wants to tie up their
machines' CPUs and disks with FASTA searches of the GenBank databank,
we at GenBank do not have the right to refuse them this privilege
8-)!!

If I were in charge of a local computer, I would make absolutely
certain that there is a legitimate need other than FASTA and IRX
searching *before* I took steps to maintain a constantly updated
version of GenBank locally!!  Users, consider yourselves forewarned.
-- 
				Sincerely,

				Dave Kristofferson
				GenBank On-line Service Manager

				kristoff@genbank.bio.net