BROE@AARDVARK.UCS.UOKNOR.EDU (Bruce Roe) (03/18/90)
Fellow netters, sequencers and assorted hacks...

Some very interesting stuff has recently been posted regarding USENET and making computers easier to use for us poor, dumb, clones and sequencers.

The message from gilbertd@iubio.bio.indiana.edu
Message-Id: <9003162052.AA04441@genbank.bio.net>
containing "A brief guide to installing and reading Internet news from a networked Macintosh" was extremely useful and includes:

>>Macs that are connected to a network that includes tcp/ip links to the Internet
>>can be set up to let you easily read and post network news, including the
>>bionet.* newsgroups for research biologists. The netnews reader stack is
>>easy to use, but requires some network knowledge to install.
>>Requirements:
>>* Macintosh with an appletalk or ethernet connection
>>* MacTCP software to provide the Mac with the tcp/ip communications link
>>* Harry Chesley's netnews reader hypercard stack
>>* A local area NNTP netnews server computer

and the message from genbank-bb@mcclb0.med.nyu.edu
Message-Id: <9003162115.AA05606@genbank.bio.net>

>>In order to use this data feed, software is needed to automatically extract
>>the news items (sequences) and load them into a database usable by the local
>>bank search software. Two groups, one at the Public Health Research
>>Institute and the other at New York University Medical Center are
>>collaborating with Genbank to provide software to furnish this service. We
>>expect to publish a short account of it in the near future.

are a great start on a system to provide first both a user friendly interface and easily accessible network news, and second an automated method for immediate database updates. I do have some comments regarding the above and alternative approaches to the problems they address.

I just put up Dr. Clark's FAMAIL shells and they are fantastic. This brings up the question, why should we clutter up our disk drives with all the databases when GenBank has them easily accessible via the Internet?
Is it because we want them here and do not want to depend on a network?? I guess that's why. I'd love to have daily database updates on our VAX and take 5x more CPU time to search the databases via WORDSEARCH or FASTA and bring every other user on our VAX to a screeching halt while I search. That gives me power. (is sarcasm allowed on the net Dave??).

Seriously though, as the databases grow in size, maintaining them locally is going to be very difficult and searching them locally is going to be very very slow. So rather than having all of us deal with these databases locally, why doesn't the NIH think of funding various sites located nation-wide to be mini-genbanks with the appropriate access and searching programs? I think this is called *distributive computing* and it makes more sense to me than creating a local nightmare.

What most users want is the ANSWER to the question they are trying to ASK. It is our responsibility to provide the tools to solve the complex questions we are addressing by large scale sequencing and not create a situation worse than what we now have. The idea is that with several computers containing the databases, we could access one central server (maybe not the correct word) and our search would be distributed to the least used computer at that time and/or search several computers each containing a portion of the database. This idea is not new with me but is one which we should be re-addressing. Distributive computing and parallel processing may provide a better and faster way of allowing us to obtain the results.

As for making computer use easier, I think it's great that two of my former students (Leslie and Chris Dow) should be advocates for this. Both hated my Macs and loved PC's but that's another story. It is, as both Leslie and Chris have pointed out, very important for users to have an understandable, easy to use, intuitive interface.
The vast majority of users do not care what computer is running what or what operating system is being used and all this should be transparent to the user.

Well, that's it from the home of the number one basketball team in the USA. In the words of a previous president of the University of Oklahoma, "Let's create an academic environment our football (and now our basketball) team will be proud of..."

Best to one and all,

Bruce A. Roe
Professor of Chemistry and Biochemistry
University of Oklahoma, Norman, OK 73019
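The "central server" idea in the post above (forward each search to the least used machine, or search several machines that each hold a portion of the database) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `SearchSite` class, the site names, and the job counter are hypothetical and do not describe any existing GenBank software.

```python
from dataclasses import dataclass


@dataclass
class SearchSite:
    """A hypothetical 'mini-GenBank' site (name and load figure are assumptions)."""
    name: str
    active_jobs: int = 0


def dispatch(sites, query):
    """Forward a whole-database search to the site with the fewest running jobs.

    Ties go to the site listed first. Returns (site name, query).
    """
    site = min(sites, key=lambda s: s.active_jobs)
    site.active_jobs += 1  # record the job so the next search sees the new load
    return site.name, query


def fan_out(partitions, query):
    """Alternative: each site holds one portion of the database.

    `partitions` maps a portion label to the site that holds it; the query is
    sent to every site and the caller would merge the scores afterwards.
    """
    return [(site.name, portion, query) for portion, site in partitions.items()]
```

With three sites carrying 3, 1 and 2 jobs, `dispatch` picks the second site first; once its load has risen the third site is chosen. A real service would also need to merge and re-rank the partial results from `fan_out`, which is left out here.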
kristoff@genbank.BIO.NET (David Kristofferson) (03/20/90)
From Prof. Bruce Roe:

> I just put up Dr. Clark's FAMAIL shells and they are fantastic.

So I have heard from others. The purpose of these is described further below along with some other important observations.

> This brings up the question, why should we clutter up our disk drives
> with all the databases when GenBank has them easily accessible via the
> Internet? Is it because we want them here and do not want to depend

Please note that users on BITNET, EARN, NETNORTH, JANET, etc. can also access the databases on the GenBank computer.

> on a network?? I guess that's why. I'd love to have daily database
> updates on our VAX and take 5x more CPU time to search the databases
> via WORDSEARCH or FASTA and bring every other user on our VAX to a
> screeching halt while I search. That gives me power. (is sarcasm
> allowed on the net Dave??).
>
> Seriously though, as the databases grow in size, maintaining
> them locally is going to be very difficult and searching them locally
> is going to be very very slow. So rather than having all of us deal
> with these databases locally, why doesn't the NIH think of funding
> various sites located nation-wide to be mini-genbanks with the appropriate
> access and searching programs? I think this is called *distributive
> computing* and it makes more sense to me than creating a local nightmare.
>

Frankly, Bruce, I am very much in agreement with you on this point. We (the GenBank On-line Service or GOS for short) have been providing these alternate means of database distribution simply because the demand is there for them and because it does not require much effort to do.
HOWEVER, each time anyone has approached me with a new distribution scheme, I have always asked them the question: "WHY DO YOU WANT TO WASTE YOUR DISK SPACE AND CPU POWER AND HAVE TO KEEP ON TOP OF THE UPDATE SITUATION DAILY WHEN THE DATABASES ARE MAINTAINED ON THE GENBANK ON-LINE SERVICE AND ACCESS TO THEM OVER THE NETWORK IS **FREE** FOR FASTA SEARCHING AND SEQUENCE ENTRY RETRIEVAL??? The NIH has funded a high speed computer with lots of disk space at GOS precisely for this purpose!!!"

The only answer that seems to be valid is if someone needs the entire database present and updated each day locally for some kind of analysis other than FASTA or IRX searching. However, I believe that many sites may just latch on to local maintenance because **it is possible for the systems manager to do**. The usage will probably turn out to be 90%+ for FASTA searches anyway and the NIH will be continually faced with requests for more disks on which to store the data. When the database assumes much larger dimensions than it has currently, this may obviously not be the right way to proceed.

It makes more sense economically for the NIH to provide easier access to the database by providing just enough computing power to enable people to get their jobs done expeditiously. Currently, the existing 80 MIPS Solbourne computer at the GenBank On-line Service (GOS) is more than up to this task. At some point this computer will become inadequate and the NIH will have to expand the available power. Again the most economical way to do this will be to set up "mini-GenBanks," as Dr. Roe proposed, by using hardware that is capable of doing the job and is already in place, if possible (I obviously am arguing *against* my own self-interest here since I could just say "give us more money and we'll solve the job here."). Possibly these additional sites could be located at the various Genome Centers under consideration for funding.

Dr. Roe also mentioned Steve Clark's scripts for use with GOS.
Although I have not seen them myself (we use Suns/Solbournes at GOS, not VAXen), I understand that they provide for the VAX something similar to what we had on BIONET, namely a simple interface that appropriately constructs the required mail message for the GOS FASTA Server and then sends it off automatically. This is a straightforward program that relieves the user of having to compose the e-mail submission in the precise server format and greatly simplifies the process. Mailing of the message to the appropriate server address is also automated.

Basically all that the user need do is answer a couple of prompts about the file containing his/her query sequence, what database they wish to search, and what parameter settings they wish to use. The search is then sent off to GenBank automatically, the GOS computer reads the message, runs the search automatically, and sends back the scores and requested alignments. Finally the user can then access the entry retrieval server to return anything of interest found during the search.

I have run tests of the GOS systems from other machines on the Internet. The transit time for the mail is very short (a couple of minutes back and forth and sometimes less) and the time that it took to search a 1000 base query against all of GenBank rel. 62 at ktup=4 was about 21-22 minutes! For the majority of users, this will probably be sufficient. It will also keep their local computer freed up for less compute-intensive tasks!!

On the other hand, if everyone wants to tie up their machines' CPUs and disks with FASTA searches of the GenBank databank, we at GenBank do not have the right to refuse them this privilege 8-)!!

If I were in charge of a local computer, I would make absolutely certain that there is a legitimate need other than FASTA and IRX searching *before* I took steps to maintain a constantly updated version of GenBank locally!! Users, consider yourselves forewarned.
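The front-end workflow described above (answer a few prompts, have the mail message built in the server's format, mail it off) might look something like the sketch below. Everything specific in it is an assumption made for illustration: the server address is a placeholder, and the keyword layout of the message body is invented, since the post does not give the actual GOS FASTA Server format.

```python
from email.message import EmailMessage

# Placeholder address (assumption); the real server address is not stated in the post.
SERVER_ADDRESS = "fasta-server@example.org"


def prompt_for_request(ask=input):
    """Ask for the query file, database, and ktup, applying simple defaults."""
    path = ask("File containing query sequence: ")
    database = ask("Database to search [genbank]: ") or "genbank"
    ktup = int(ask("ktup [4]: ") or 4)
    return path, database, ktup


def build_search_message(sender, sequence, database="genbank", ktup=4):
    """Compose the search request as a plain-text mail message.

    The body keywords below are hypothetical, not the real server syntax.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = SERVER_ADDRESS
    msg["Subject"] = "FASTA search request"
    body = "\n".join([
        f"DATABASE {database}",
        f"KTUP {ktup}",
        "BEGIN",
        sequence.strip(),
    ])
    msg.set_content(body)
    return msg
```

The finished message could then be handed to a local mail relay, for example with `smtplib.SMTP("localhost").send_message(msg)`, assuming such a relay is available. Keeping the prompting separate from the message construction means the same builder can be driven from a script as well as from the keyboard.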
--
Sincerely,

Dave Kristofferson
GenBank On-line Service Manager

kristoff@genbank.bio.net