BCHTANTW%NUSVM@PUCC.PRINCETON.EDU (Tan Tin Wee) (09/29/90)
Recently somebody asked for help on getting random sequences from GENBANK. One of the options suggested was anonymous FTP with the caveat that the netter must be on INTERNET. Even if he is not, it is still possible to do anonymous FTP via BITNET. I frequently use the BITFTP Princeton BITNET FTP server which provides a mail interface to the FTP portion of the IBM TCP/IP running on the Princeton VM system. It allows BITNET (or NetNorth or EARN) users to ftp files from sites on the INTERNET. The load on BITFTP is "often very heavy". For further information, send a mail message "HELP" to BITFTP@PUCC.BITNET or ask M Varian at MAINT@PUCC.BITNET or MAINT@pucc.princeton.edu (INTERNET). Hope it solves your problem if you are not on INTERNET and wish to do anonymous FTP. Sincerely, Tin Wee TAN Dept of Biochemistry National University of Singapore BCHTANTW@NUSVM.BITNET PS. Many thanks to folks who run the show at BITFTP. Wouldn't know what I'd do without this service.
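As a rough illustration of the mail interface described above, the sketch below only builds the "HELP" request that the post suggests sending to BITFTP; it does not send anything. The helper name `build_help_request` and the sender address are placeholders invented for this example, and the commands BITFTP accepts beyond HELP are not shown because the post does not give them.

```python
from email.message import EmailMessage

# Server address as given in the post.
BITFTP_ADDRESS = "BITFTP@PUCC.BITNET"


def build_help_request(sender: str) -> EmailMessage:
    """Build (but do not send) a mail message asking BITFTP for its help text.

    `sender` is whatever return address the user mails from (placeholder here).
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = BITFTP_ADDRESS
    # The post says the message itself is simply "HELP".
    msg.set_content("HELP")
    return msg


if __name__ == "__main__":
    # Hypothetical sender address, for illustration only.
    print(build_help_request("user@example.bitnet"))
```

The reply to such a message would describe the actual request format; nothing about that format is assumed here.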
kristoff@genbank.bio.net (David Kristofferson) (09/30/90)
A WORD OF EXTREME CAUTION HERE on trying anonymous FTP using these BITNET servers for GenBank files!!!!! GenBank data bank files are LARGE, i.e. many MEGABYTES. I shudder to think what would happen to BITNET if people all around the world on BITNET started using this mechanism regularly to access GenBank. You would be much better off having us mail you a tape than trying to get GenBank over BITNET as it would hog vital resources. I will add the caveat that this is based on second hand information about other BITNET disasters of which I have heard, so perhaps someone with more direct experience using these things should respond. My understanding is that this service may be great for getting small programs from places like SIMTEL-20, but would be a disaster trying to retrieve 10+ megabyte files. Anyone else like to comment? -- Sincerely, Dave Kristofferson GenBank Manager kristoff@genbank.bio.net
gilbertd@silver.ucs.indiana.edu (Don Gilbert) (09/30/90)
I would guess trying to pump megabyte-sized files thru bitnet is asking for trouble. It is a strain on Internet traffic using FTP to transfer the 50+ megabyte releases of GenBank, and the ability of Internet to transfer data is orders of magnitude greater than Bitnet. GenBank already provides two very handy services thru e-mail that should suffice for most individual users' needs to have access to the most recent GenBank data: the FastA search and the retrieval of individual sequences. I also suggest that the most economical and usable way to receive quarterly updates of the full GenBank is to subscribe to the CD-Rom release. CD-Rom drives are inexpensive ($500-1000) and can be put on about any microcomputer, workstation or VAX. GenBank now sells their CDs for an annual subscription of $300 (for 4 releases). This cost should drop as more users subscribe, as the actual media cost of compact disks is quite low if enough copies are pressed. For instance, Apple Computer now prefers to release software on a CD rather than two or more floppies, as the cost to them is less (e.g., $1 or so). While the cost of using even FTP thru Internet may not be seen by you directly as an end user, we are paying for it thru federal taxes. Don.Gilbert@Iubio.Bio.Indiana.Edu biocomputing office, indiana univ., bloomington, in 47405, usa
OLIVER@calstate.bitnet (OLIVER SEELY) (09/30/90)
There is another problem associated with using FTP (help me out, Dave, I'm not sure what anonymous FTP is) to retrieve MEGABYTE files. I tried to transfer the primate sequence file first to an ELXSI computer and when I was unsuccessful, I tried to transfer it to my account on our central Cyber 960. Well, in both cases I exceeded my memory allocation but in one (sorry, I cannot remember which) I was unable to login again until one of the systems analysts "unfroze" the problem. FTP's advantage (in my mind at least) lies in its simplicity of use, but as Dave writes, there are some serious problems such as tying up valuable resources. Oh, yeah, I forgot to write that the transfer process had gone on for about 20 minutes before it bombed. Oliver Seely CSU Dominguez Hills
usenet@nlm.nih.gov (usenet news poster) (09/30/90)
In article <9009290610.AA23614@genbank.bio.net> BCHTANTW%NUSVM@PUCC.PRINCETON.EDU (Tan Tin Wee) writes: > >Recently somebody asked for help on getting random sequences from >GENBANK. One of the options suggested was anonymous FTP with the >caveat that the netter must be on INTERNET. >Even if he is not, it is still possible to do anonymous FTP via >BITNET. ... >Tin Wee TAN Even if it is possible to hack a path from E-mail into FTP, in the long run your best solution is to get on INTERNET. There is already a dramatic difference in transfer speed between INTERNET and BITNET, and this is only going to get worse. BITNET cannot handle files larger than 25 kbytes without segmenting them into chunks, but 25k is pretty small (even a 2-page paper with some graphics will exceed this). The network protocol used on INTERNET, TCP/IP, is more reliable than modems, and once your site is on line, much less hassle. Perhaps most important, FTP is only one of many services available via INTERNET. Until you have a network route which supports TCP/IP, you will not be able to use remote shell, remote procedure call, socket communication, and other services available through INTERNET. The importance of these other services is that they give you much greater flexibility to optimize your communications and computation. For example, rather than attempt to maintain the current, most up to date version of a database on your local machine, you could use a relatively simple local program to send your search queries via the net to an up to date server, perhaps in another city. You save the expense and aggravation of attempting to maintain an up to date local database copy, and the net saves the traffic of sending the whole database. With TCP/IP, the turn around is still interactive rather than hours or days as in FASTA-mail. The argument that "Internet is not free, we pay for it one way or another through taxes" is true on some level, but, in my view, misleading. 
The communication of scientific data between academic research groups is precisely what INTERNET was created to do. It is one of the support mechanisms that the US government provides to encourage research and development. Saying "Don't use INTERNET because we will all pay for it in our taxes" is like saying "Don't write grant applications because ..." David States states@ncbi.nlm.nih.gov National Center for Biotechnology Information National Library of Medicine
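To put the 25 kbyte segmentation limit mentioned in this post in perspective, here is a small sketch that counts and produces fixed-size segments. It assumes "25 kbytes" means 25 * 1024 bytes and takes the "50+ megabyte" release size from Don Gilbert's post as 50 * 1024 * 1024 bytes; both readings are assumptions, and the function names are made up for the example. It says nothing about how BITNET actually performs the segmentation.

```python
from typing import List

# Assumption: "25 kbytes" read as 25 * 1024 bytes.
SEGMENT_BYTES = 25 * 1024


def segment_count(file_bytes: int, segment_bytes: int = SEGMENT_BYTES) -> int:
    """Number of segments needed for a file of the given size (ceiling division)."""
    return (file_bytes + segment_bytes - 1) // segment_bytes


def split_into_segments(data: bytes, segment_bytes: int = SEGMENT_BYTES) -> List[bytes]:
    """Cut data into consecutive chunks of at most segment_bytes each."""
    return [data[i:i + segment_bytes] for i in range(0, len(data), segment_bytes)]


if __name__ == "__main__":
    release = 50 * 1024 * 1024  # assumed size of a "50+ megabyte" release
    print(segment_count(release))  # 2048 segments under these assumptions
```

Under these assumptions a single full release would arrive as a couple of thousand separate pieces, which is the scale of the problem the posters are warning about.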
BCHTANTW%NUSVM@PUCC.PRINCETON.EDU ("T.W.Tan") (09/30/90)
I would agree with Don Gilbert and Dr Kristofferson that pumping megasize files through the network is not on. I only use BITFTP to get much smaller files, e.g. public domain programs or single sequence files, and certainly would *NOT* recommend anyone to try getting the whole lot by ftp through BITNET or otherwise. Apologies if I have conveyed the wrong impression. Tin Wee TAN Dept of Biochemistry Natl. U. of Singapore
kristoff@genbank.bio.net (David Kristofferson) (10/03/90)
David States gave an excellent description of the advantages of the Internet over BITNET, and I would heartily second the fact that sites should get on the Internet. Unfortunately it often takes some time, money, and effort for this to occur. I suggest that if someone at your campus is not already working on getting an Internet style network connection, then they should begin ***immediately*** before the data problem reaches overwhelming proportions. However there was one statement made in Mr. States' message which was less than accurate. > With TCP/IP, the turn around is still interactive rather than hours or > days as in FASTA-mail. As many of our readers know, FASTA-MAIL is a GenBank service. As our readers who have USED THE SERVICE also know, the turnaround on FASTA-MAIL, while not "interactive," is very fast, on the order of minutes, not "hours or days." I recently did a demonstration of the service in Mr. States' back yard at the NIH and got the results of my search back in about ten minutes. During this time my terminal was freed for other aspects of the demo, instead of sitting "interactively" looking at a "Working ..." message. Because many biologists still do not have Internet connections, FASTA-MAIL provides a needed service to them. We are also working on providing access by e-mail to the newer BLAST program which was developed at NCBI and appears to be a faster search algorithm. Another point that needs clarification: > You save the expense and aggravation of attempting to maintain > an up to date local database copy, and the net saves the traffic > of sending the whole database. 
GenBank's goal *is* to allow remote sites to have their own local copies of the database in a relational database management system and to have the local copies updated over the network, not by sending megabyte size files, but instead by providing sites with an initial copy of the database and then by sending "transactions" which automatically update individual entities in the local copies every time the master copy is changed. The software to provide these transactions to remote sites is currently undergoing testing and more will be announced about this later. While it is true that for things like FASTA searches it is a waste to maintain a local copy, I have heard enough comments from the community over the last several years that indicate that the desired set-up is to have a local copy for more specialized applications, but also to have access to a powerful remote facility for offloading routine, but CPU-intensive searches. Although I have personally managed centralized time-sharing services such as BIONET, it appears to be the case that these systems are not the wave of the future except for specialized applications. Right now remote database searching can be done for free on the GenBank On-line Service via FASTA-MAIL or interactively over the Internet or SprintNet by GOS account holders. So much for specifics, but now for a more general and much more important statement about Mr. States' remark about FASTA-MAIL. As I mentioned in a recent posting on BIONEWS, there are many discussions going on right now in "high places" related to the future of bio-computing, particularly as it impacts the Genome Project. The National Center for Biotechnology Information where David works is a key player in these debates and will be the agency that oversees the next GenBank contract which will start in 1992. One would hope that, given NCBI's important role, public statements by its employees should be very carefully considered and based on fact, not on distortions. 
If there is a better way of doing things, then it should be perfectly possible to demonstrate it by setting up and successfully running a service. NCBI has already provided us with some fine software such as IRX and BLAST, so I do not doubt their talents in software development. However, I sincerely hope that we will evolve into the future in this fashion **** rather than by attempting to put down existing systems through the spread of misinformation ****. GenBank has unfortunately been an easy target to shoot at because the first five year contract underestimated the size of the task, and the resulting lack of funds led to a tremendous data backlog. This backlog has been largely eliminated during the second five year contract and the NIH GenBank advisors commended both LANL and IntelliGenetics for their progress at the last advisory meeting. Word of this progress is slow to get out unfortunately and complaints are always remembered much longer than compliments. One can also still find responsible people quoting outdated GenBank backlog statistics in print. You have my solemn word that if flaws are pointed out we will OPENLY either attempt to correct them to the best of our ability or step aside if the system is so structurally flawed that an entirely new attempt is needed. However you may also rest assured that I will vigorously respond to any attempt at distortion of the facts. It is always easy to tear down through distortion, but this is not the kind of tactic that one would expect from those who are really professional and who really have better ways of doing things. Their results should be able to speak for themselves. I also suggest that the community pay close attention to any services offered and provide their feedback ** before ** decisions are made. *** In the end, it will be the users who will be left with the results. 
*** Given the amount of data projected to be generated by the Genome Project a mistake made now would make the backlog of the initial GenBank attempt appear minuscule by comparison. Unfortunately the users are often the last to react because they are not brought into the decision loop. I have argued before, and will do so again, that electronic newsgroups can be a new element in this review process. Although the decision must ultimately be the responsibility of a single person or small group, the technology now exists to easily sample a wide range of opinion. Why not take advantage of this, particularly when so much is at stake? Why not utilize the collective experience residing on the net? Currently we have "developers meetings" where people are asked to digest a large amount of new information in the course of a day. Why not do this over the net so that people can react more intelligently than in a one day jet-lagged haze? After all, scientists are supposed to be progressive, right? ... right? -- Sincerely, Dave Kristofferson GenBank Manager kristoff@genbank.bio.net
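The transaction-based update scheme described in this post (an initial copy at each site, then small "transactions" that change individual entries) can be pictured with a toy sketch. Everything concrete below is hypothetical: the tuple layout, the operation names, and the entry identifiers are invented for illustration and are not GenBank's actual transaction format, which the post says was still under testing.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical transaction layout: (operation, entry_id, new_record or None).
Transaction = Tuple[str, str, Optional[str]]


def apply_transactions(local_copy: Dict[str, str],
                       transactions: Iterable[Transaction]) -> Dict[str, str]:
    """Apply a stream of small updates to a local copy, in order, in place."""
    for operation, entry_id, record in transactions:
        if operation in ("insert", "update"):
            if record is None:
                raise ValueError(f"{operation} for {entry_id} needs a record")
            local_copy[entry_id] = record
        elif operation == "delete":
            local_copy.pop(entry_id, None)  # deleting a missing entry is a no-op
        else:
            raise ValueError(f"unknown operation: {operation}")
    return local_copy


if __name__ == "__main__":
    # Made-up entry identifiers and sequences.
    db = {"ENTRY1": "ACGT", "ENTRY2": "TTGA"}
    updates = [("update", "ENTRY1", "ACGTAC"),
               ("delete", "ENTRY2", None),
               ("insert", "ENTRY3", "GGCC")]
    print(apply_transactions(db, updates))
```

The point of the design is that only the changed entries cross the network, so traffic scales with the amount of change rather than with the size of the whole database.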
JS05STAF@MIAMIU.BITNET (Joe Simpson) (10/03/90)
BITNET usage guidelines suggest that no file should be larger than 3000 lines of 80 column records. Many GenBank data sets are much larger than this. Please use BITFTP only for files that approximate the BITNET usage guidelines.
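The guideline above works out to at most 3000 * 80 = 240,000 characters per file. A minimal check along those lines might look like the following; the function names are invented for this example, and treating the guideline as a hard limit on both line count and line width is an interpretation, not something the guideline text here spells out.

```python
MAX_LINES = 3000    # from the guideline quoted above
MAX_COLUMNS = 80    # "80 column records"


def guideline_max_bytes() -> int:
    """Largest file the guideline allows, counting one byte per column."""
    return MAX_LINES * MAX_COLUMNS


def within_bitnet_guideline(text: str) -> bool:
    """True if the text has at most 3000 lines, none wider than 80 columns."""
    lines = text.splitlines()
    return len(lines) <= MAX_LINES and all(len(line) <= MAX_COLUMNS for line in lines)


if __name__ == "__main__":
    print(guideline_max_bytes())                     # 240000
    print(within_bitnet_guideline("ACGT\n" * 3001))  # False: one line too many
```

By this measure the multi-megabyte files discussed earlier in the thread exceed the guideline many times over.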
usenet@nlm.nih.gov (usenet news poster) (10/04/90)
In response to my comment: >> With TCP/IP, the turn around is still interactive rather than hours or >> days as in FASTA-mail. kristoff@genbank.bio.net (David Kristofferson) writes: >[...] However there was one statement made in Mr. States' message >which was less than accurate. > >As many of our readers know, FASTA-MAIL is a GenBank service. As our >readers who have USED THE SERVICE also know, the turnaround on >FASTA-MAIL, while not "interactive," is very fast, on the order of >minutes, not "hours or days." I recently did a demonstration of the >service in Mr. States' back yard at the NIH and got the results of my >search back in about ten minutes. The turn around time for FASTA-MAIL is dependent on both the response time of the server itself and the network handling the electronic mail transactions. My comments were made in the context of a discussion of the latter. As Dave Kristofferson points out, at a site like NIH, where E-mail is handled by INTERNET, the turn around time can be pretty good. There are, however, sites where E-mail processing is handled as a low priority batch process. If you are dependent on one of those sites, you may see a dramatic improvement in E-mail mediated services by getting onto INTERNET. David Kristofferson then goes on to say: >So much for specifics, but now for a more general and much more >important statement about Mr. [sic] States' remark about FASTA-MAIL. > >As I mentioned in a recent posting on BIONEWS, there are many >discussions going on right now in "high places" related to the future >of bio-computing, particularly as it impacts the Genome Project. The >National Center for Biotechnology Information where David works is a >key player in these debates and will be the agency that oversees the >next GenBank contract which will start in 1992. One would hope that, >given NCBI's important role, public statements by its employees should >be very carefully considered and based on fact, not on distortions. 
>If there is a better way of doing things, then it should be perfectly >possible to demonstrate it by setting up and successfully running a >service. NCBI has already provided us with some fine software such as >IRX and BLAST, so I do not doubt their talents in software >development. However, I sincerely hope that we will evolve into the >future in this fashion **** rather than by attempting to put down >existing systems through the spread of misinformation ****. We appreciate the compliment on our software, but I really don't feel that a comment on the relative technological merits of electronic mail handling should be construed as "attempting to put down existing systems through the spread of misinformation". Readers of this news group are quite able to assess response time at their own sites, and the global handling of E-mail is beyond the control of GenBank in any case. >GenBank has unfortunately been an easy target to shoot at because the >first five year contract underestimated the size of the task [...] >One can also still find responsible people quoting outdated GenBank >backlog statistics in print. I did not quote any statistics on the GenBank backlog in my posting (current or previous). >You have my solemn word that if flaws are pointed out we will OPENLY >either attempt to correct them to the best of our ability or step >aside if the system is so structurally flawed that an entirely new >attempt is needed. However you may also rest assured that I will >vigorously respond to any attempt at distortion of the facts. It is >always easy to tear down through distortion, but this is not the kind >of tactic that one would expect from those who are really professional >and who really have better ways of doing things. Their results should >be able to speak for themselves. > >I also suggest that the community pay close attention to any services >offered and provide their feedback ** before ** decisions are made. 
> >*** In the end, it will be the users who will be left with the results. *** > >Given the amount of data projected to be generated by the Genome >Project a mistake made now would make the backlog of the initial >GenBank attempt appear minuscule by comparison. Unfortunately the >users are often the last to react because they are not brought into >the decision loop. I have argued before, and will do so again, that >electronic newsgroups can be a new element in this review process. >Although the decision must ultimately be the responsibility of a >single person or small group, the technology now exists to easily >sample a wide range of opinion. Why not take advantage of this, >particularly when so much is at stake? Why not utilize the collective >experience residing on the net? Currently we have "developers >meetings" where people are asked to digest a large amount of new >information in the course of a day. Why not do this over the net so >that people can react more intelligently than in a one day jet-lagged >haze? > >After all, scientists are supposed to be progressive, right? ... right? It is apparent that my posting elicited some rather strongly held feelings. I think both Dave and I agree that the biomedical research community is going to face some quite significant information handling challenges in the near future, and that we would all be best served by the rational use of available technology. Electronic mail and networks are clearly a part of this so let's avoid unnecessary personal flames. A lively and open discussion depends on all parties feeling free to post their opinions. >-- > Sincerely, > > Dave Kristofferson > GenBank Manager > > kristoff@genbank.bio.net David J. States, M.D.,Ph.D. Senior Staff Fellow National Center for Biotechnology Information National Library of Medicine states@ncbi.nlm.nih.gov
kristoff@genbank.bio.net (David Kristofferson) (10/04/90)
My earlier response to Dr. States' message, while possibly appearing to be personal, was actually the result of the straw being dropped on the camel's back. I apologize to David that he was in the unfortunate position of having dropped that straw. His reply was a model of forbearance. However, what I am concerned about is not the attitude of any one individual, but a sequence of events which is continuing to occur, mainly outside of these newsgroups, of which this one small incident is just the latest manifestation. If this was an isolated statement I would never have taken the time to reply in the intensity or at the length that I did. I agree completely that "a lively and open discussion depends on all parties feeling free to post their opinions," but one can only compromise so far for the sake of gentility. I intend to be "gentlemanly" only as long as advantage is not taken of that forbearance, as has occurred in my past experiences. I emphasize that this is not a personal accusation against Dr. States, but a general pronouncement to all concerned. In the context of a simple comment about FASTA-MAIL, I realize that the readership may think that these words may border on the absurd, but I can assure you that much larger computing issues are on the agenda. As I stated in my last message, our real concern here is with issues to be decided in the next two years that will affect the future direction of biological computing. Now to return to details and comment on the following: > The turn around time for FASTA-MAIL is dependent on both the response > time of the server itself and the network handling the electronic mail > transactions. My comments were made in the context of a discussion of > the latter. As Dave Kristofferson points out, at a site like NIH, > where E-mail is handled by INTERNET, the turn around time can be pretty > good. There are, however, sites where E-mail processing is handled as > a low priority batch process. 
If you are dependent on one of those > sites, you may see a dramatic improvement in E-mail mediated services > by getting onto INTERNET. > There is no doubt that e-mail transfer would be faster from our site to other Internet sites rather than to, e.g., JANET, but it is also true that sites which are not on the Internet would be completely cut off from using an interactive, TCP/IP based system. FASTA-MAIL is accessible to people on virtually any major network. I will also suggest that the effect of e-mail delays on other networks may not always be as great as suggested above. While it is the case that the transatlantic BITNET gateways have become extremely congested at times, I have received reports of excellent FASTA-MAIL turnaround times from many non-Internet sites, not only in the U.S. but overseas as well. For example, in the U.K. a user on JANET reported a 15 minute turnaround time to get the results back of a protein search. I invite others to tell us of their experiences, either positive or negative. Of course, if the network or our machine goes down, a delay would occur until the access was restored, but note that this would affect both e-mail and interactive access equally. The facts are the following. Our computer takes about ten minutes (plus or minus one depending on the load) to do a search of 1000 bases against all of GenBank 64. It takes about 1.5 minutes to do a search of 1000 amino acids against all of SWISS-PROT 14. FASTA provides the search time as part of its output. Any additional time is due to transit. Users can utilize this information themselves to decide if the network is unduly affecting them. I have never received any evidence that FASTA-MAIL took "days" unless there was an extreme malfunction on the network or with the computer system. I suggest that in terms of turnaround time the difference is not a big issue. 
On the other hand, if someone wanted to design some nice user-friendly software that would reside on a PC or Mac (which were, e.g., etherneted in to a gateway to the network) and provide an interactive interface to FASTA or BLAST on another machine with a comprehensive up-to-date database somewhere else on the Internet (which is what I assume Dr. States is alluding to), this would be a great help to people who have PC's and Mac's with such a network connection. Such a suggestion could be made without having to mention FASTA-MAIL which is not obsoleted by such software. Had the suggestion been made in this context, much of this discussion would have been kept on a lower key. I would also venture to suggest that this idea has not occurred in only one place. As usual, the question of who does it usually gets down to either who has the funding or who expects to be able to make money by selling it. No one place has the sole monopoly on the talent. -- Sincerely, David Kristofferson, Ph.D. GenBank Manager kristoff@genbank.bio.net
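The rule of thumb given in this post (FASTA reports its own search time, and anything beyond that is transit) amounts to a one-line subtraction. The sketch below applies it to the JANET report quoted above, under the assumption that the user's protein query was comparable to the 1000 amino acid search said to take about 1.5 minutes; the post does not say how long that user's query actually was, and the function name is made up.

```python
def transit_minutes(total_turnaround_min: float, reported_search_min: float) -> float:
    """Time spent in mail transit: total turnaround minus the search time FASTA reports."""
    if reported_search_min > total_turnaround_min:
        raise ValueError("search time cannot exceed total turnaround")
    return total_turnaround_min - reported_search_min


if __name__ == "__main__":
    # 15 minute turnaround reported from JANET; 1.5 minute search time is an
    # assumption borrowed from the SWISS-PROT figure in the post.
    print(transit_minutes(15, 1.5))  # 13.5
```

A user who sees a large value here can conclude that the mail path, not the search itself, dominates the wait.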
harper@csc.fi (Rob Harper (Supercomputer Centre Finland)) (10/04/90)
In article <Oct.3.16.58.11.1990.195@genbank.bio.net>, kristoff@genbank.bio.net (David Kristofferson) writes: > There is no doubt that e-mail transfer would be faster from our site > to other Internet sites rather than to, e.g., JANET, but it is also > true that sites which are not on the Internet would be completely cut > off from using an interactive, TCP/IP based system. FASTA-MAIL is > accessible to people on virtually any major network. I would suggest that people in Europe could use the similar service from EMBL. There is a very nice DCL script for VMS called FASTEMBL which can take your sequence, convert it to UPPERCASE, modify it to STADEN format, and prepare a file to be sent to EMBL. I have tried sending the same sequence to EMBL with FASTEMBL and to GenBank with FAMAIL (a similar script for GenBank). The results I can get back from EMBL within an hour; the GenBank results usually arrive the next day. I have never had to wait days for a reply. Scientists should learn to utilize the resource that is closest to home. Rob " it's not where you're from... it's where you're at... Harper
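The preprocessing that FASTEMBL is described as doing (take a sequence, convert it to uppercase, and lay it out for mailing) can be approximated in a few lines. This is not the FASTEMBL script, which is written in DCL for VMS, and it does not claim to reproduce the exact STADEN format: stripping non-letters and wrapping at 60 columns are assumptions made for the sake of a runnable example.

```python
def prepare_sequence(raw: str, width: int = 60) -> str:
    """Keep only letters, uppercase them, and wrap into fixed-width lines.

    The 60-column width is an assumed default, not a documented requirement.
    """
    letters = "".join(ch for ch in raw if ch.isalpha()).upper()
    return "\n".join(letters[i:i + width] for i in range(0, len(letters), width))


if __name__ == "__main__":
    # Digits, spaces and line breaks from a numbered listing are dropped.
    print(prepare_sequence("  1 acgtacgt ttga\n 13 ccgg", width=8))
```

The output of such a step would then be placed in the body of the mail message sent to the search server.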