[bionet.molbio.genbank] Is it possible to get random genbank sequences through the net?

IO00865@MAINE.BITNET (09/21/90)

Hello, I'm new to the net. I was wondering if it is possible to acquire
GenBank sequences through the net. I'm looking specifically for Potato
virus X and Potato virus M sequences and possibly analysis software.
 
 
Ethan Strauss       IO00865@MAINE.BITNET

kristoff@genbank.bio.net (David Kristofferson) (09/24/90)

Here are the instructions for using the GenBank entry retrieval
server:

----------------------------------------------------------------------

Database entries can be retrieved by either locus name or accession
number.  To use the GenBank Retrieval System, send an electronic
message to RETRIEVE@GENBANK.BIO.NET containing as text (leave the
Subject: line blank) either an accession number, or an entry name, but
not both.  The message text should contain exactly one word.
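
As a rough illustration of the request format described above, the following Python sketch builds such a message. The function name, the sender address, and the accession number M12345 are made up for the example; only the server address and the "one word, blank Subject" rule come from the instructions. Actually sending the message is left to whatever mail transport is available locally.

```python
from email.message import EmailMessage

RETRIEVE_ADDRESS = "RETRIEVE@GENBANK.BIO.NET"  # address given in the instructions


def build_retrieve_request(sender: str, query: str) -> EmailMessage:
    """Build a request whose body is exactly one word (hypothetical helper).

    `query` is either an accession number or an entry (locus) name.
    """
    words = query.split()
    if len(words) != 1:
        raise ValueError("the message text must contain exactly one word")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = RETRIEVE_ADDRESS
    # No Subject header is set, since the instructions say to leave it blank.
    msg.set_content(words[0])
    return msg


if __name__ == "__main__":
    # "M12345" is an invented accession number used only for illustration.
    print(build_retrieve_request("joe@foo.nyu.edu", "M12345"))
```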

The data banks are searched in the order: GenBank New Data, GenBank
current release, EMBL New Data, EMBL current release, GenPept New
Data, GenPept current release, and Swiss-Prot until a match is found.
If an entry exists in both GenBank and EMBL with the same accession
number (the usual case), a query on the accession number will return
the GenBank version of the entry.  If the EMBL-format version is
required, it can be retrieved from the file server at
NETSERV@EMBL.BITNET (for instructions send a message containing the
line HELP to that address).
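
The search order above amounts to a first-match lookup over an ordered list of data banks. The sketch below is not the server's code, only a minimal model of that behaviour; the in-memory dictionaries and the function name are assumptions made for the example. It shows why a query on a shared accession number returns the GenBank copy rather than the EMBL one.

```python
# Order taken from the instructions above.
SEARCH_ORDER = [
    "GenBank New Data",
    "GenBank current release",
    "EMBL New Data",
    "EMBL current release",
    "GenPept New Data",
    "GenPept current release",
    "Swiss-Prot",
]


def first_match(query, banks):
    """Return (bank_name, entry) for the first bank holding `query`, else None.

    `banks` maps a bank name to a dict of {accession or entry name: entry text}.
    This in-memory layout is hypothetical and used only for illustration.
    """
    for name in SEARCH_ORDER:
        entry = banks.get(name, {}).get(query)
        if entry is not None:
            return name, entry
    return None


if __name__ == "__main__":
    banks = {
        "GenBank current release": {"M12345": "GenBank-format entry"},
        "EMBL current release": {"M12345": "EMBL-format entry"},
    }
    print(first_match("M12345", banks))
```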

An electronic version of the sequence data submission form used by the
sequence data banks is also available through the RETRIEVE server.  To
receive a copy, send a message containing the word DATASUB as the only
line.  Instructions for completing and submitting the form are
included.

If you have any questions or comments, feel free to mail them to
RETRIEVE-REQUEST@GENBANK.BIO.NET.
-- 
				Sincerely,

				Dave Kristofferson
				GenBank Manager

				kristoff@genbank.bio.net

toms@fcs260c2.ncifcrf.gov (Tom Schneider) (09/24/90)

In article <90264.125319IO00865@MAINE.BITNET> IO00865@MAINE.BITNET writes:
> I was wondering if it is possible to acquire GenBank sequences through the
> net.  Ethan Strauss       IO00865@MAINE.BITNET

Yes.  Use ftp:

ftp genbank.bio.net
anonymous
(any string)
ls
cd pub/db/gb-rel61 (or whichever is most recent release)
get (whatever)

for more details contact:

                                        David Benton
                                        GenBank Manager
                                        415-962-7360
                                        benton@genbank.ig.com

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov

kristoff@GENBANK.BIO.NET (Dave Kristofferson) (09/25/90)

FTP is only available to people who have computers connected to the
Internet.  It is not clear whether or not the person who posted the
query has Internet access since the posting came from a BITNET
address.  The e-mail server works from virtually any mail address as
long as the sender's From: line has a recognizable return address.
Unfortunately some people still set their mailers up as joe@foo and
expect the outside world to know where "foo" is.  Retrieval server
users should use either a domain style address, e.g., joe@foo.nyu.edu
or append .bitnet to their BITNET address, e.g., use joe@CUNYVM.BITNET
instead of joe@CUNYVM.
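
The address rule above can be expressed as a small check. This is only a sketch: the function name is invented, and it assumes that a host with no dot in it is a BITNET node name, which will not be true at every site.

```python
def qualify_return_address(address: str) -> str:
    """Return an address the retrieval server should be able to reply to.

    Domain-style addresses (host contains a dot) are returned unchanged.
    A bare host is ASSUMED to be a BITNET node and gets ".BITNET" appended.
    """
    local, sep, host = address.partition("@")
    if not sep or not local or not host:
        raise ValueError(f"not a user@host address: {address!r}")
    if "." in host:
        return address
    return f"{local}@{host}.BITNET"


if __name__ == "__main__":
    for addr in ("joe@CUNYVM", "joe@foo.nyu.edu", "joe@CUNYVM.BITNET"):
        print(addr, "->", qualify_return_address(addr))
```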

In the event that FTP access is available to the person who posted the
question, more complete instructions are included below.

On another note, Dave Benton is moving shortly to the NIH Genome
Office.  Queries about GenBank should not be addressed to individual
staff members, since any one of them might be on the road at any time.
Please send GenBank questions either to this newsgroup or to
genbank@genbank.bio.net.  The main GenBank phone number is 415-962-7364.


				Sincerely,

				Dave Kristofferson
				GenBank Manager

				kristoff@genbank.bio.net

----------------------------------------------------------------------

	Example of FTPing new weekly data updates from GenBank
		    (last update 2/20/90 - D.K.)

**********************************************************************
In the following example NET<n> is the Unix prompt on the computer
used for this example.  Your computer's system prompt will be
different.  Details of the ftp protocol may also be different on your
local computer, so please consult your local systems manager if the
following commands are not recognized on your local computer.

Comments in the example below are set off by ***'s as here or by ;'s
at the end of a line.  Input is underlined and <cr> means press the
return key.  NOTE that the GenBank UNIX system is CASE-SENSITIVE.  Use
lower case input unless specifically noted otherwise below.
**********************************************************************

NET<1>ftp genbank.bio.net<cr>		;FTP to the computer genbank.bio.net
      -----------------------
Connected to genbank.bio.net.
220 GENBANK.BIO.NET FTP server (SunOS 4.0) ready.
Name (genbank.bio.net:kristoff): anonymous<cr>	;use name "anonymous"
				 -------------
331 Guest login ok, send ident as password.
Password:	 <cr>				;use your user name as a
	 ------------				;password, e.g., kristoff
						;in my case.
230 Guest login ok, access restrictions apply.
ftp> ls<cr>					;list the directory contents
     ------
200 PORT command successful.
150 ASCII data connection for /bin/ls (134.172.3.252,2591).
.hushlogin
Public
bin
dev
etc
lib
pub
usr
226 ASCII Transfer complete.
50 bytes received in .68 seconds (0.072 Kbytes/s)

**********************************************************************
The GenBank new data is actually kept a few subdirectories down under
pub/db/gb-newdata.  EMBL new data is under pub/db/embl-newdata.  For
the sake of brevity we simply change directly to the pub/db directory.
**********************************************************************

ftp> cd /pub/db<cr>
     --------------
250 CWD command successful.
ftp> ls<cr>				;show GenBank and EMBL subdirectories
     ------
200 PORT command successful.
150 Opening ASCII mode data connection for file list.
alu
seqanalref
embl-newdata				;directory for new EMBL data
gb-newdata				;directory for new GenBank data
readme.doc				;IMPORTANT UPDATE INFORMATION
gb-rel62				;directory for GenBank Release 62
gp-rel62				;directory for GenPept Release 62
gp-newdata				;directory for new GenPept data
226 Transfer complete.
79 bytes received in 0.04 seconds (1.9 Kbytes/s)

**********************************************************************
Finally we change into the GenBank new data subdirectory.  The above
steps could have been omitted and one could simply use "cd
pub/db/gb-newdata" to switch to this directory after logging in.  The
ls command below shows the names of the new data files.  Note that the
file names all start with "gb" for GenBank followed by the month and
the day for the file in question (each file contains the previous
week's data), and finally an extension of .seq is used to indicate
that the file contains nucleic acid sequence data.  A similar
file naming convention is used for EMBL data as can be seen by using
the ls command in the directory pub/db/embl-newdata.  Files that end
with a .Z extension after the .seq are compressed using the UNIX
"compress" utility.

Notes: Weekly update files are available as standard ASCII files or
       as compressed ASCII files.  The compressed files are about
       one-third the size of the standard files.  They can be
       distinguished by the .Z suffix and can be uncompressed after
       transfer with the standard Unix uncompress utility.

       In addition to the weekly (incremental) update files, the
       newdata directories each contain a file which represents the
       cumulative contents of the (GenBank, GenPept, or EMBL) new data
       since the last (quarterly) public release, pruned of duplicate
       entries.  A given weekly update file will not contain duplicate
       instances of an entry, but a given entry may appear (with
       changed status) in more than one of the weekly update files.
       Within a cumulative update file (gbseq.all, gpseq.all, or
       emseq.all) there should be no duplicate entries.  The cumulative
       update files are updated daily and are available in compressed
       form only.

       The current GenBank release files (in gb-rel62) and
       GenPept release files (in gp-rel62) are provided
       in compressed form only.
**********************************************************************
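
Because an entry may appear in more than one weekly file, anyone combining weekly files locally has to prune duplicates, as the cumulative files do. The notes above do not say how GenBank does the pruning; the sketch below simply assumes that the copy from the most recent weekly file should win. It relies on the GenBank flat-file layout, in which each entry begins with a LOCUS line and ends with a line containing only "//". All function names are invented for the example.

```python
def split_entries(text):
    """Split flat-file text into entries, each ending with a '//' line.

    Trailing lines that are not closed by '//' are dropped as incomplete.
    """
    entries, current = [], []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == "//":
            entries.append("\n".join(current) + "\n")
            current = []
    return entries


def locus_name(entry):
    """Return the entry name, i.e. the second field of the LOCUS line."""
    for line in entry.splitlines():
        if line.startswith("LOCUS"):
            return line.split()[1]
    raise ValueError("entry has no LOCUS line")


def merge_updates(weekly_texts):
    """Merge weekly files (oldest first); a later copy replaces an earlier one.

    "Latest copy wins" is an assumption of this sketch, not a documented rule.
    """
    latest = {}
    for text in weekly_texts:
        for entry in split_entries(text):
            latest[locus_name(entry)] = entry
    return list(latest.values())
```

To use it, read each uncompressed weekly file into a string, pass the strings in date order, and write the returned entries to a single output file.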

ftp> cd gb-newdata<cr>
     -----------------
250 CWD command successful.
ftp> ls<cr>
     ------
200 PORT command successful.
150 Opening ASCII mode data connection for file list.
gb1113.seq.Z
gb1225.seq			;data for the week ending 12/25.
gb1225.seq.Z			;compressed data for the week ending 12/25.
gb0101.seq
gb0101.seq.Z
gb1113.seq
gb1120.seq
gb1120.seq.Z
gb1127.seq
gb1127.seq.Z
gb1204.seq
gb1204.seq.Z
gb1211.seq
gb1211.seq.Z
gb1218.seq
gb1218.seq.Z
gb0108.seq
gb0108.seq.Z
gb0115.seq
gb0115.seq.Z
gb0122.seq
gb0122.seq.Z
gb0129.seq
gbseq.all.Z			;compressed cumulative update file.
gb0129.seq.Z
gb0205.seq
gb0205.seq.Z
gb0212.seq
gb0212.seq.Z
gb0219.seq
gb0219.seq.Z
226 Transfer complete.
372 bytes received in 0.02 seconds (18 Kbytes/s)
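
The file names in the listing above follow the gb + month + day + .seq convention, with an optional .Z suffix. A short sketch of decoding them is given below; the function and type names are invented. The match is case-sensitive on purpose, since the GenBank system is.

```python
import re
from typing import NamedTuple, Optional


class UpdateFile(NamedTuple):
    month: int
    day: int
    compressed: bool


# gbMMDD.seq, optionally followed by .Z for files made with "compress"
_WEEKLY = re.compile(r"^gb(\d{2})(\d{2})\.seq(\.Z)?$")


def parse_update_name(name: str) -> Optional[UpdateFile]:
    """Decode a weekly update file name, or return None if it is not one."""
    m = _WEEKLY.match(name)
    if m is None:
        return None
    month, day = int(m.group(1)), int(m.group(2))
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return None
    return UpdateFile(month, day, m.group(3) is not None)


if __name__ == "__main__":
    for name in ("gb1225.seq", "gb0219.seq.Z", "gbseq.all.Z"):
        print(name, parse_update_name(name))
```

The cumulative file gbseq.all.Z does not follow the weekly pattern, so the function returns None for it.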

**********************************************************************
Next use the "get" command to retrieve the desired file.  These files
are typically anywhere from 1 to 4 Megabytes.  Transfer time will most
likely be limited by the speed of your local Internet connection.
The GenBank connection runs at 1.54 Mbps so it will not be the
rate-limiting step in most cases.  The "get" command retrieves the file
over the Internet into your directory on your local computer.

NOTES FOR VMS USERS: Some of the data files are in compressed format
(using the UNIX "compress" utility) and are named as follows:
gbxxxx.seq.Z.  To transfer these successfully to a VMS system, you will
have to rename them locally (before beginning the transfer) to
eliminate one of the "."'s, e.g., rename the file to gbxxxx_seq.Z.
Also note that case is important in the file names, i.e., the final
"Z" is definitely in uppercase!
**********************************************************************
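
The VMS renaming rule above can be written as a small helper. This is a sketch only: the function name is invented, and it assumes the goal is a local name with a single ".", keeping the final suffix (including the uppercase "Z") unchanged.

```python
def vms_local_name(remote_name: str) -> str:
    """Return a local file name with at most one '.', for use on VMS.

    Every '.' except the last one is replaced by '_', so the final
    suffix (for example the uppercase 'Z') is preserved as is.
    """
    if remote_name.count(".") < 2:
        return remote_name
    head, _, suffix = remote_name.rpartition(".")
    return head.replace(".", "_") + "." + suffix


if __name__ == "__main__":
    for name in ("gb1225.seq.Z", "gb1225.seq", "gbseq.all.Z"):
        print(name, "->", vms_local_name(name))
```

Most ftp clients accept the local name as a second argument to "get", e.g. "get gb1225.seq.Z gb1225_seq.Z", which names the local copy during the transfer.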

ftp> get gb1106.seq<cr>		;get data for week ending Nov. 6th.
     ------------------
200 PORT command successful.
150 ASCII data connection for gb1106.seq (134.172.3.252,2595) (1510609 bytes).
226 ASCII Transfer complete.
local: gb1106.seq remote: gb1106.seq
1535022 bytes received in 19 seconds (80 Kbytes/s)

**********************************************************************
After the first transfer is complete you can retrieve other files or
use the "bye" or "quit" command to end the session.
**********************************************************************

ftp> bye<cr>
     -------
221 Goodbye.
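
For those who would rather script the session than type it, the same steps can be sketched with Python's standard ftplib module. The host and directory are the ones used in the example above and may not match what is current; the function name, the default password string, and the ftp_factory parameter (which exists only so the function can be tried without a network connection) are assumptions of this sketch. The transfer is done in binary mode, which is required for the compressed .Z files.

```python
import ftplib


def fetch_update(filename, local_path,
                 host="genbank.bio.net",
                 directory="pub/db/gb-newdata",
                 ident="guest",
                 ftp_factory=ftplib.FTP):
    """Fetch one file by anonymous FTP, mirroring the manual session above."""
    ftp = ftp_factory(host)              # ftp genbank.bio.net
    try:
        ftp.login("anonymous", ident)    # name "anonymous", user name as password
        ftp.cwd(directory)               # cd pub/db/gb-newdata
        with open(local_path, "wb") as out:
            # Binary transfer, so compressed files arrive intact.
            ftp.retrbinary("RETR " + filename, out.write)
        ftp.quit()                       # bye
    finally:
        ftp.close()                      # safe to call even after quit()


if __name__ == "__main__":
    # Names taken from the listing above; requires Internet access.
    fetch_update("gb0219.seq.Z", "gb0219.seq.Z")
```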
