dbd@THEORY.BCHS.UH.EDU (Dan Davison) (01/29/91)
Archive-name: bionet/molbio/seqf/1991-01-24
Archive-directory: menudo.uh.edu:/pub/genbank-server/unix/seqf-shar.a[a-k] [129.7.1.6]
Original-posting-by: dbd@THEORY.BCHS.UH.EDU (Dan Davison)
Original-subject: New release of the SEQF database search routines
Reposted-by: emv@ox.com (Edward Vielmetti)

A collection of programs based on Minoru Kanehisa's SEQF library search codes is now available for public use. Directions for retrieval from the UH Gene-Server are given below; the programs will appear at IUBIO, the EMBL File Server, and FUNET shortly.

The codes have been worked over and now run on most Unix boxes, Crays, and VMS. Much work has gone into making the code as portable as possible. This does not (yet) extend to DOS compilers, though. Don't ask about the Mac until, say, 5 years after System 7 comes out...

The package consists of four programs for searching genetic sequence libraries:

SN  - Search Nucleotide: D/RNA query sequence against a nucleotide sequence library.
SP  - Search Protein: amino acid query sequence against a protein sequence library.
ST  - Search Translated: amino acid query sequence against a nucleotide sequence library, with 3-frame translation.
SPR - Search Protein Reduced: amino acid query sequence against a protein sequence library, with the 20-letter amino acid alphabet reduced to 6 letters based on charge, hydrophobicity, and size.
SU  - Search Unformatted: SN with its I/O specially hacked for the Cray. It requires some care and feeding, partially documented in the code, and is about 55% faster than SN on the same problems.

These codes can also be used to compare two sequences with each other; the underlying algorithm is the Needleman-Wunsch-Sellers metric alignment, in distance mode.

[Yes, that's 5, but SU is not usable on non-Crays without some effort.]

SEQUENCE FILE FORMATS

This code is designed to handle most common formats; if you have a format you want included, contact dbd at one of the addresses below.
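As a rough illustration of how automatic format detection can work for the formats this release handles (GenBank, EMBL/SwissProt, Bionet/IntelliGenetics/Stanford, and plain ASCII), here is a minimal Python sketch. It is not the SEQF code: the function names `guess_format` and `extract_sequence` are hypothetical, and the rules are simplifying assumptions (GenBank entries start with a `LOCUS` line and list residues after `ORIGIN`; EMBL/SwissProt entries start with `ID` and list residues after `SQ`; IntelliGenetics-style files open with `;` comment lines followed by a name line; anything else is treated as bare residues).

```python
import re


def guess_format(text: str) -> str:
    """Guess a sequence file's format from its first non-blank line.

    Hypothetical heuristic for illustration; not SEQF's actual detection logic.
    """
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith("LOCUS"):
            return "genbank"
        if line.startswith("ID   "):
            return "embl"  # EMBL and SwissProt share the two-letter line codes
        if line.startswith(";"):
            return "intelligenetics"  # Bionet/IntelliGenetics/Stanford style
        return "ascii"
    return "ascii"  # empty input: nothing to distinguish


def extract_sequence(text: str) -> str:
    """Return the residues of the first entry as an uppercase string."""
    fmt = guess_format(text)
    lines = text.splitlines()
    if fmt in ("genbank", "embl"):
        # Residues follow the ORIGIN (GenBank) or SQ (EMBL) line, up to "//".
        marker = "ORIGIN" if fmt == "genbank" else "SQ"
        body, inside = [], False
        for line in lines:
            if line.startswith("//"):
                break
            if inside:
                body.append(line)
            elif line.startswith(marker):
                inside = True
        raw = "".join(body)
    elif fmt == "intelligenetics":
        # Drop ";" comments and blank lines; the first remaining line is the name.
        rest = [l for l in lines if l.strip() and not l.startswith(";")]
        raw = "".join(rest[1:])
    else:
        raw = "".join(lines)
    # Keep letters only: removes position numbers, spaces, and the
    # IntelliGenetics "1" (linear) / "2" (circular) terminator.
    return re.sub(r"[^A-Za-z]", "", raw).upper()
```

For example, `extract_sequence("; note\nTEST\nACGTAC1\n")` returns `'ACGTAC'`. Keying on the first non-blank line keeps detection cheap; like SEQF, the sketch does not recognize GCG files, which it would treat as plain text.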
Supported formats include GenBank, EMBL/SwissProt, Bionet/IntelliGenetics/Stanford, and straight ASCII. The code should detect the proper type automatically. Note that the GCG format and the Staden code and format are NOT supported at present. If you have GCG files, try TOEMBL in the GCG package for sequence file format conversion.

THE CREDITS

The code was written by Minoru Kanehisa while with the Theoretical Biology and Biophysics Group, Theoretical Division, Los Alamos National Laboratory. I/O and other modifications are by Dan Davison, while at LANL and the University of Houston. Additional I/O improvements are due to Hugh Nicholas of the Pittsburgh Supercomputing Center (thanks!); some last-minute work was done by Ed Chen of the University of Houston. The reduced protein search code came out of discussions with Jim Ostell, now at the National Center for Biotechnology Information at the National Library of Medicine (thanks, Jim!).

University of Houston Gene-Server retrieval info:

The files are available for e-mail retrieval from the Unix directory. The command

    send unix seqf-shar.aa seqf-shar.ab seqf-shar.ac seqf-shar.ad seqf-shar.ae seqf-shar.af seqf-shar.ag seqf-shar.ah seqf-shar.ai seqf-shar.aj seqf-shar.ak

will send all the files to you. Remove the mail headers, concatenate the files in order, and run "unshar", or just "/bin/sh filename", where "filename" is the name of the concatenated file. Then read "seqf.relnotes" for more information.

The shar file is also available by anonymous FTP from menudo.uh.edu (129.7.1.6) as ~ftp/pub/genbank-server/unix/seqf.shar, and as the split files ~ftp/pub/genbank-server/unix/seqf-shar.a[a-k].

If you have questions, comments, flames, or even kind words about the code, direct them all to:

Dr. Dan Davison
BCHS-5500
Dept. of Biochemical and Biophysical Sciences
University of Houston
4800 Calhoun
Houston, TX 77204-5500

phone:  713-749-2801
fax:    713-749-3239
e-mail: davison@uh.edu (Internet)
        DAVISON@UHOU (BITNET)
        davison@uhnix1.UUCP (Usenet, new style)
        uhnix1!davison (Usenet, old style)
        74065,41 Compu$erve (rarely!)