IO00865@MAINE.BITNET (09/21/90)
Hello, I'm new to the net. I was wondering if it is possible to acquire GenBank sequences through the net. I'm looking specifically for Potato virus X and Potato virus M sequences and possible analysis software.

Ethan Strauss
IO00865@MAINE.BITNET
kristoff@genbank.bio.net (David Kristofferson) (09/24/90)
Here are the instructions for using the GenBank entry retrieval server:

----------------------------------------------------------------------

Database entries can be retrieved by either locus name or accession number. To use the GenBank Retrieval System, send an electronic message to

    RETRIEVE@GENBANK.BIO.NET

containing as text (leave the Subject: line blank) either an accession number, or an entry name, but not both. The message text should contain exactly one word.

The data banks are searched in the following order until a match is found:

    GenBank New Data
    GenBank current release
    EMBL New Data
    EMBL current release
    GenPept New Data
    GenPept current release
    Swiss-Prot

If an entry exists in both GenBank and EMBL with the same accession number (the usual case), a query on the accession number will return the GenBank version of the entry. If the EMBL-format version is required, it can be retrieved from the file server at NETSERV@EMBL.BITNET (for instructions send a message containing the line HELP to that address).

An electronic version of the sequence data submission form used by the sequence data banks is also available through the RETRIEVE server. To receive a copy, send a message containing the word DATASUB as the only line. Instructions for completing and submitting the form are included.

If you have any questions or comments, feel free to mail them to RETRIEVE-REQUEST@GENBANK.BIO.NET.

--
Sincerely,

Dave Kristofferson
GenBank Manager

kristoff@genbank.bio.net
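The one-word message format described above can be sketched in Python. This is only an illustration of the rules in the post (blank Subject, exactly one word of text); the function name build_retrieve_request is hypothetical, the sender address is the sample address style used elsewhere in this thread, and nothing here assumes the server still exists. The message is built but not sent.

```python
from email.message import EmailMessage

# Address quoted in the post; treated here as a historical value.
RETRIEVE_ADDRESS = "RETRIEVE@GENBANK.BIO.NET"


def build_retrieve_request(query, sender):
    """Build a request whose body is exactly one word (hypothetical helper).

    `query` is a locus name, an accession number, or the keyword DATASUB.
    """
    words = query.split()
    if len(words) != 1:
        raise ValueError("the message text must contain exactly one word")
    msg = EmailMessage()
    msg["From"] = sender            # must be a fully qualified return address
    msg["To"] = RETRIEVE_ADDRESS
    # No Subject header is set, so the Subject line stays blank.
    msg.set_content(words[0])
    return msg


if __name__ == "__main__":
    request = build_retrieve_request("DATASUB", "joe@foo.nyu.edu")
    print(request)
```

Rejecting multi-word input up front mirrors the "accession number or entry name, but not both" rule, so a malformed request fails locally instead of being mailed.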
toms@fcs260c2.ncifcrf.gov (Tom Schneider) (09/24/90)
In article <90264.125319IO00865@MAINE.BITNET> IO00865@MAINE.BITNET writes:
> I was wondering if it is possible to acquire GenBank sequences through the
> net. Ethan Strauss IO00865@MAINE.BITNET

Yes. Use ftp:

    ftp genbank.bio.net
    anonymous
    (any string)
    ls
    cd pub/db/gb-rel61      (or whichever is the most recent release)
    get (whatever)

For more details contact:

    David Benton
    GenBank Manager
    415-962-7360
    benton@genbank.ig.com

Tom Schneider
National Cancer Institute
Laboratory of Mathematical Biology
Frederick, Maryland 21702-1201
toms@ncifcrf.gov
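For readers who would rather script these steps, here is a minimal sketch using Python's standard ftplib module. The function names fetch_file and fetch_from_host and the "guest" password string are assumptions made for illustration; the host and directory are the ones quoted in the post and are not assumed to be reachable today.

```python
from ftplib import FTP


def fetch_file(ftp, directory, filename, out, ident="guest"):
    """Anonymous login, change directory, and stream one file into `out`.

    `ftp` is an already connected ftplib.FTP (or any object offering the
    same login/cwd/retrbinary methods); `out` is a binary file object.
    """
    ftp.login(user="anonymous", passwd=ident)   # "anonymous", then any string
    ftp.cwd(directory)                          # e.g. pub/db/gb-rel61
    # Binary mode keeps compressed (.Z) files intact during transfer.
    ftp.retrbinary(f"RETR {filename}", out.write)


def fetch_from_host(host, directory, filename, local_path):
    """Connect to `host` and save `filename` to `local_path`."""
    with FTP(host) as ftp, open(local_path, "wb") as out:
        fetch_file(ftp, directory, filename, out)
```

Passing the connection in as an argument keeps the login, cd, and get sequence separate from the network connection, so the steps can be exercised against a stand-in object without contacting a server.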
kristoff@GENBANK.BIO.NET (Dave Kristofferson) (09/25/90)
FTP is only available to people who have computers connected to the Internet. It is not clear whether or not the person who posted the query has Internet access since the posting came from a BITNET address. The e-mail server works from virtually any mail address as long as the sender's From: line has a recognizable return address. Unfortunately, some people still set their mailers up as joe@foo and expect the outside world to know where "foo" is. Retrieval server users should either use a domain-style address, e.g., joe@foo.nyu.edu, or append .bitnet to their BITNET address, e.g., use joe@CUNYVM.BITNET instead of joe@CUNYVM.

In the event that FTP access is available to the person who posted the question, more complete instructions are included below.

On another note, Dave Benton is moving shortly to the NIH Genome Office. Queries about GenBank should not be addressed to individuals, since any one person might be on the road at any time. Please send GenBank questions either to this newsgroup or to genbank@genbank.bio.net. The main GenBank phone number is 415-962-7364.

Sincerely,

Dave Kristofferson
GenBank Manager

kristoff@genbank.bio.net

----------------------------------------------------------------------

Example of FTPing new weekly data updates from GenBank
(last update 2/20/90 - D.K.)

**********************************************************************
In the following example NET<n> is the Unix prompt on the computer used for this example. Your computer's system prompt will be different. Details of the ftp protocol may also be different on your local computer, so please consult your local systems manager if the following commands are not recognized on your local computer. Comments in the example below are set off by ***'s as here or by ;'s at the end of a line. Input is underlined and <cr> means press the return key.

NOTE that the GenBank UNIX system is CASE-SENSITIVE. Use lower case input unless specifically noted otherwise below.
**********************************************************************

NET<1>ftp genbank.bio.net<cr>           ;FTP to the computer genbank.bio.net
      -----------------------
Connected to genbank.bio.net.
220 GENBANK.BIO.NET FTP server (SunOS 4.0) ready.
Name (genbank.bio.net:kristoff): anonymous<cr>    ;use name "anonymous"
                                 -------------
331 Guest login ok, send ident as password.
Password: <cr>                          ;use your user name as a
          ------------                  ;password, e.g., kristoff
                                        ;in my case.
230 Guest login ok, access restrictions apply.
ftp> ls<cr>                             ;list the directory contents
     ------
200 PORT command successful.
150 ASCII data connection for /bin/ls (134.172.3.252,2591).
.hushlogin
Public
bin
dev
etc
lib
pub
usr
226 ASCII Transfer complete.
50 bytes received in .68 seconds (0.072 Kbytes/s)

**********************************************************************
The GenBank new data is actually kept a few subdirectories down under pub/db/gb-newdata. EMBL new data is under pub/db/embl-newdata. For the sake of brevity we simply change directly to the pub/db directory.
**********************************************************************

ftp> cd /pub/db<cr>
     --------------
250 CWD command successful.
ftp> ls<cr>                             ;show GenBank and EMBL subdirectories
     ------
200 PORT command successful.
150 Opening ASCII mode data connection for file list.
alu
seqanalref
embl-newdata                            ;directory for new EMBL data
gb-newdata                              ;directory for new GenBank data
readme.doc                              ;IMPORTANT UPDATE INFORMATION
gb-rel62                                ;directory for GenBank Release 62
gp-rel62                                ;directory for GenPept Release 62
gp-newdata                              ;directory for new GenPept data
226 Transfer complete.
79 bytes received in 0.04 seconds (1.9 Kbytes/s)

**********************************************************************
Finally we change into the GenBank new data subdirectory. The above steps could have been omitted and one could simply use "cd pub/db/gb-newdata" to switch to this directory after logging in. The ls command below shows the names of the new data files.
Note that the file names all start with "gb" for GenBank followed by the month and the day for the file in question (each file contains the previous week's data), and finally an extension of .seq is used to indicate that the file contains nucleic acid sequence data. A similar file naming convention is used for EMBL data as can be seen by using the ls command in the directory pub/db/embl-newdata. Files that end with a .Z extension after the .seq are compressed using the UNIX "compress" utility.

Notes:

Weekly update files are available as standard ASCII files or as compressed ASCII files. The compressed files are about one-third the size of the standard files. They can be distinguished by the .Z suffix and can be uncompressed after transfer with the standard Unix uncompress utility.

In addition to the weekly (incremental) update files, the newdata directories each contain a file which represents the cumulative contents of the (GenBank, GenPept, or EMBL) new data since the last (quarterly) public release, pruned of duplicate entries. A given weekly update file will not contain duplicate instances of an entry, but a given entry may appear (with changed status) in more than one of the weekly update files. Within a cumulative update file (gbseq.all, gpseq.all, or emseq.all) there should be no duplicate entries. The cumulative update files are updated daily and are available in compressed form only.

The current GenBank release files (in gb-rel62) and GenPept release files (in gp-rel62) are provided in compressed form only.
**********************************************************************

ftp> cd gb-newdata<cr>
     -----------------
250 CWD command successful.
ftp> ls<cr>
     ------
200 PORT command successful.
150 Opening ASCII mode data connection for file list.
gb1113.seq.Z
gb1225.seq                              ;data for the week ending 12/25.
gb1225.seq.Z                            ;compressed data for the week ending 12/25.
gb0101.seq
gb0101.seq.Z
gb1113.seq
gb1120.seq
gb1120.seq.Z
gb1127.seq
gb1127.seq.Z
gb1204.seq
gb1204.seq.Z
gb1211.seq
gb1211.seq.Z
gb1218.seq
gb1218.seq.Z
gb0108.seq
gb0108.seq.Z
gb0115.seq
gb0115.seq.Z
gb0122.seq
gb0122.seq.Z
gb0129.seq
gbseq.all.Z                             ;compressed cumulative update file.
gb0129.seq.Z
gb0205.seq
gb0205.seq.Z
gb0212.seq
gb0212.seq.Z
gb0219.seq
gb0219.seq.Z
226 Transfer complete.
372 bytes received in 0.02 seconds (18 Kbytes/s)

**********************************************************************
Next use the "get" command to retrieve the desired file. These files are typically anywhere from 1 to 4 megabytes. Transfer time will most likely be limited by the speed of your local Internet connection. The GenBank connection runs at 1.54 Mbps, so it will not be the rate-limiting step in most cases. The "get" command retrieves the file over the Internet into your directory on your local computer.

NOTES FOR VMS USERS: Some of the data files are in compressed format (using the UNIX "compress" utility) and are named as follows: gbxxxx.seq.Z. To transfer these successfully to a VMS system, you will have to rename them locally (before beginning the transfer) to eliminate one of the "."'s, e.g., rename the file to gbxxxx_seq.Z. Also note that case is important in the file names, i.e., the final "Z" is definitely in uppercase!
**********************************************************************

ftp> get gb1106.seq<cr>                 ;get data for week ending Nov. 6th.
     ------------------
200 PORT command successful.
150 ASCII data connection for gb1106.seq (134.172.3.252,2595) (1510609 bytes).
226 ASCII Transfer complete.
local: gb1106.seq remote: gb1106.seq
1535022 bytes received in 19 seconds (80 Kbytes/s)

**********************************************************************
After the first transfer is complete you can retrieve other files or use the "bye" or "quit" command to end the session.
**********************************************************************

ftp> bye<cr>
     -------
221 Goodbye.

**********************************************************************
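The file naming convention and the VMS renaming rule described in the example can be sketched in Python. This is a rough illustration, not part of the GenBank instructions: the names UpdateFile, parse_update_name, and vms_local_name are hypothetical, and accepting any two-letter prefix (rather than only "gb") is an assumption based on the remark that EMBL files follow a similar convention.

```python
import re
from typing import NamedTuple, Optional


class UpdateFile(NamedTuple):
    prefix: str        # "gb" for GenBank
    month: int         # month of the week-ending date
    day: int           # day of the week-ending date
    compressed: bool   # True when the name carries the .Z suffix


# Two-letter prefix, MMDD, ".seq", optional uppercase ".Z".
# The two-letter prefix rule is an assumption; only "gb" appears in the listing.
_WEEKLY = re.compile(r"^([a-z]{2})(\d{2})(\d{2})\.seq(\.Z)?$")


def parse_update_name(name: str) -> Optional[UpdateFile]:
    """Split a weekly update file name into its parts, or return None."""
    match = _WEEKLY.match(name)
    if match is None:
        return None        # e.g. the cumulative file gbseq.all.Z
    prefix, month, day, z_suffix = match.groups()
    return UpdateFile(prefix, int(month), int(day), z_suffix is not None)


def vms_local_name(name: str) -> str:
    """Keep only the last "." so that gbxxxx.seq.Z becomes gbxxxx_seq.Z."""
    stem, dot, extension = name.rpartition(".")
    return stem.replace(".", "_") + dot + extension


if __name__ == "__main__":
    print(parse_update_name("gb1225.seq.Z"))
    print(vms_local_name("gb1225.seq.Z"))
```

The pattern is case-sensitive on purpose, since the notes above stress that the final "Z" is uppercase and that lower case is used elsewhere.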