[bionet.software] Mailfasta in the rerun

deboer@balaena.bio.vu.nl (Thon de Boer) (07/18/90)
Yesterday's posting of my program 'mailfasta' had some bugs, so here it is again. Sorry for the double posting!

I wrote a program that uses the services offered by GenBank and EMBL for
doing FASTA searches. It sends a sequence to a mail-server, which runs a
sequence similarity search against the specified databases using the FASTA
program written by William Pearson and David Lipman.
The program runs interactively and is easy to use.
It can read files in the Pearson format, DNA Strider files, or plain DNA and
protein files with no comment lines.
For more information about the FASTA mail servers, contact GenBank
or EMBL (send a mail message with "HELP" in the subject field to
SEARCH@GENBANK.BIO.NET or to FASTA@EMBL (bitnet/EARN etc.)).

Hope you can use it

-------------------------------------------------------------------------------
 _______                 ___           Thon de Boer
    /                   /__/           Dept. of Microbiological physiology
   //   __ __  _   __  /  \ __ __ __   Biological Faculty
  //-- / // / / \ /_/ /   // //_//  '  Vrije Universiteit, Amsterdam, Holland
 //  //_// / /__//_  /___//_//_ /      deboer@bio.vu.nl
-------------------------------------------------------------------------------
#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  mfasta mfasta/cid.c mfasta/getentry mfasta/mailfasta
#   mfasta/mailfasta.doc
# Wrapped by deboer@bio.vu.nl on Tue Jul 17 19:22:46 1990
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test ! -d 'mfasta' ; then
    echo shar: Creating directory \"'mfasta'\"
    mkdir 'mfasta'
fi
if test -f 'mfasta/cid.c' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'mfasta/cid.c'\"
else
echo shar: Extracting \"'mfasta/cid.c'\" \(927 characters\)
sed "s/^X//" >'mfasta/cid.c' <<'END_OF_FILE'
X#include <ctype.h>
X#include <stdio.h>
X
X/* cid: exit 0 if stdin looks like DNA, i.e. at least 85% of the
X   letters are A, C, G or T; exit 1 otherwise. Leading ';' comment
X   lines and one '>' title line are skipped. */
X
X/* discard the rest of the current input line */
Xstatic void readln(void)
X{
X int c;
X while ((c = getchar()) != '\n' && c != EOF)
X  ;
X}
X
Xint main(void)
X{
X /* ch is an int, not a char, so that EOF can be detected */
X int ch, acgt = 0, total = 0;
X while ((ch = getchar()) == ';') readln();
X /* skip one '>' title line, otherwise keep the first residue */
X if (ch == '>') readln();
X else if (ch != EOF) ungetc(ch, stdin);
X while ((ch = getchar()) != EOF)
X { if (!isalpha(ch)) continue;
X   total++;
X   switch (toupper(ch))
X   { case 'A': case 'C': case 'G': case 'T': acgt++; break;
X   }
X }
X /* an empty sequence (total == 0) counts as "not DNA" */
X return (total > 0 && (double)acgt / total >= 0.85) ? 0 : 1;
X}
END_OF_FILE
if test 927 -ne `wc -c <'mfasta/cid.c'`; then
    echo shar: \"'mfasta/cid.c'\" unpacked with wrong size!
fi
chmod +x 'mfasta/cid.c'
# end of 'mfasta/cid.c'
fi
if test -f 'mfasta/getentry' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'mfasta/getentry'\"
else
echo shar: Extracting \"'mfasta/getentry'\" \(294 characters\)
sed "s/^X//" >'mfasta/getentry' <<'END_OF_FILE'
X#!/bin/csh
Xif ($1 != '') then
X  foreach i ($*)
X    echo $i | mail retrieve@genbank.bio.net
X    end
Xelse
X  echo getentry: Retrieve a sequence database entry via mail from GenBank
X  echo 'Usage: getentry  entry [entry entry ....]'
X  echo ' entry can be an accession number or a locus name'
Xendif
END_OF_FILE
if test 294 -ne `wc -c <'mfasta/getentry'`; then
    echo shar: \"'mfasta/getentry'\" unpacked with wrong size!
fi
chmod +x 'mfasta/getentry'
# end of 'mfasta/getentry'
fi
if test -f 'mfasta/mailfasta' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'mfasta/mailfasta'\"
else
echo shar: Extracting \"'mfasta/mailfasta'\" \(6214 characters\)
sed "s/^X//" >'mfasta/mailfasta' <<'END_OF_FILE'
X#!/bin/csh
Xecho ' '
Xecho ' '
Xecho '                                 MailFasta'
Xecho '                                 *********'
Xecho By Thon de Boer
Xecho Department of Microbiological physiology
Xecho Vrije Universiteit Amsterdam, Holland.
Xecho This program is in the Public Domain
Xecho 'Send comments to deboer@bio.vu.nl (email).'
Xecho ' '
Xecho Use \'mailfasta -q\' to get the queue from the fasta mail server
Xecho " or 'mailfasta [sequencefile1 sequencefile2 ..]'"
Xecho ' '
Xecho This program will read a sequence file and mail it to
Xecho 'GenBank (or EMBL) where it will be compared with the sequence databases'
Xecho 'using the FASTA program.'
Xecho ' '
Xecho ' '
Xset stop = false
Xwhile ($stop == false)
Xif ($1 == '') then
X  set argument = empty
Xelse
X  set argument = $1
Xendif
Xswitch ($argument)
Xcase -q:
X        echo QUEUE | mail SEARCH@GENBANK.BIO.NET
X        echo Queue command sent
X        exit
X        breaksw
Xcase -*:
X        echo mailfasta: Unknown flag \'$argument\'
X        echo Use \'mailfasta -q\' to view the queue on the FASTA mail server
X        echo " or 'mailfasta [sequencefile1 sequencefile2 ..]'"
X        exit
X        breaksw
Xendsw
Xif ($argument == empty) then
Xset good=false
Xwhile ($good == false)
Xecho Enter the name of the file with the sequence:
Xset file=$<
Xif ($file != '') then
X  if !(-f $file) then
X    echo mailfasta: File \'$file\' does not exist
X  else
X    echo ' '
X    more $file
X    echo ' '
X    echo 'Is this the right file ? ([yes]/no)'
X    set choice = $<
X    switch ($choice)
X    case n*:
X         breaksw
X    default:
X         set good=true
X         breaksw
X    endsw
X  endif
Xendif
Xend
Xelse if !(-f $argument) then
X  echo mailfasta: File \'$argument\' does not exist
X  exit
Xelse
X  set file = $argument
Xendif
Xif {(cid < $file)} then
X      set type = DNA
X    else
X      set type = PROTEIN
X    endif
Xecho ' '
Xecho Now using $type file: $file
Xecho '*********'
Xgrep '>' $file
Xgrep ';' $file
Xecho ' '
Xendif
Xset db1 = (genbank/all genbank/new genpept/all genpept/new embl/all embl/new swiss-prot/all nbrf genbank/primate genbank/rodent genbank/other_mammalian genbank/other_vertebrate genbank/invertebrate genbank/plant genbank/organelle genbank/bacterial)
Xset db2 = (genbank/structural_rna genbank/viral genbank/phage genbank/synthetic genbank/unannotated)
Xset db = ($db1 $db2)
Xecho 'Which database(s) do you want to search ?'
Xecho ' '
Xif ($type == DNA) echo '    1   All GenBank sequences (including new seq since latest release)'
Xif ($type == DNA) echo '    2   The new GenBank entries'
Xif ($type == PROTEIN) echo '    3   All translated protein reading frames from GenBank'
Xif ($type == PROTEIN) echo '    4   The new entries of translated protein reading frames'
Xif ($type == DNA) echo '    5   All EMBL sequences (including the new sequences)'
Xif ($type == DNA) echo '    6   The new EMBL entries'
Xif ($type == PROTEIN) echo '    7   All SWISS-PROT protein entries'
Xif ($type == PROTEIN) echo '    8   All NBRF/PIR protein entries (results come back slowly) '
Xecho ' '
Xif ($type == DNA) then
Xecho ' GenBank subdivisions (for faster searching)'
Xecho ' '
Xecho '    9   The primate sequences            16  The bacterial sequences'
Xecho '    10  The rodent sequences             17  The structural RNA sequences'
Xecho '    11  The other mammalian sequences    18  The viral sequences'
Xecho '    12  The other vertebrate seq         19  The phage sequences'
Xecho '    13  The invertebrate sequences       20  The synthetic sequences'
Xecho '    14  The plant sequences              21  The unannotated sequences'
Xecho '    15  The organelle sequences'
Xecho ' '
Xendif
Xset good=false
Xwhile ($good == false)
X  echo 'Enter the number(s) of your choice (separated by a <SPACE>)'
X  set choice=$<
X  if ("$choice" != "") set good = true
X  foreach i ($choice)
X  if ($type == DNA) then
X    switch ($i)
X    case 1:
X    case 2:
X    case 5:
X    case 6:
X    case 9:
X    case 10:
X    case 11:
X    case 12:
X    case 13:
X    case 14:
X    case 15:
X    case 16:
X    case 17:
X    case 18:
X    case 19:
X    case 20:
X    case 21:
X         breaksw
X    default:
X         echo mailfasta: Invalid choice \'$i\'
X         set good = false
X         breaksw
X    endsw
X  else
X   switch ($i)
X   case 3:
X   case 4:
X   case 7:
X   case 8:
X         breaksw
X   default:
X         echo mailfasta: Invalid choice \'$i\'
X         set good = false
X         breaksw
X   endsw
X  endif
Xend
Xend
Xset good=false
Xwhile ($good == false)
X  echo ' '
X  echo 'Enter the sensitivity (lower number means more sensitive)'
X  if ($type == DNA) then
X    echo '  (3..6) [4]'
X  else
X    echo '  (1..2) [1]'
X  endif
X  set ktup=$<
X  if ($ktup != '') then
X    if ($type == DNA) then
X      switch ($ktup)
X      case [3-6]:
X         set good=true
X         breaksw
X      default:
X         echo mailfasta: Invalid choice \'$ktup\'
X         breaksw
X    endsw
X    else
X      switch ($ktup)
X      case [1-2]:
X         set good=true
X         breaksw
X      default:
X         echo mailfasta: Invalid choice \'$ktup\'
X         breaksw
X      endsw
X    endif
Xelse
X    set good=true
X  endif
Xend
Xecho ' '
Xecho 'Enter the maximum number of matched sequences [100]'
Xset scores=$<
Xecho ' '
Xecho 'Enter the maximum number of alignments [20]'
Xset align=$<
Xgrep ">" $file > /tmp/mf2$$
Xset grp = `cat /tmp/mf2$$`
Xforeach i ($choice)
X  if ($i != 8) then
X   echo DATALIB $db[$i] >> /tmp/mf$$
X   if ($ktup != '') echo KTUP $ktup >> /tmp/mf$$
X   if ($scores != '') echo SCORES $scores >> /tmp/mf$$
X   if ($align != '') echo ALIGNMENTS $align >> /tmp/mf$$
X   echo BEGIN >> /tmp/mf$$
X   if (-z /tmp/mf2$$) echo ">"$file $type file >> /tmp/mf$$
X   grep -v \; $file >> /tmp/mf$$
X   mail SEARCH@GENBANK.BIO.NET < /tmp/mf$$
X  else
X   echo LIB nbrf >> /tmp/mf$$
X   if ($ktup != '') echo WORD $ktup >> /tmp/mf$$
X   if ($scores != '') echo LIST $scores >> /tmp/mf$$
X   if ($align != '') echo ALIGN $align >> /tmp/mf$$
X   if !(-z /tmp/mf2$$) then
X    echo TITLE $grp >> /tmp/mf$$
X    echo SEQ >> /tmp/mf$$
X    grep -v ">" $file >> /tmp/mf$$
X   else
X    echo TITLE $file $type file >> /tmp/mf$$
X    echo SEQ >> /tmp/mf$$
X    grep -v \; $file >> /tmp/mf$$
X   endif
X  mail FASTA@EMBL.BITNET < /tmp/mf$$
X  endif
Xrm /tmp/mf$$
Xend
Xif ($2 == '') then
X  set stop = true
Xelse
X  shift
Xendif
Xend
X
END_OF_FILE
if test 6214 -ne `wc -c <'mfasta/mailfasta'`; then
    echo shar: \"'mfasta/mailfasta'\" unpacked with wrong size!
fi
chmod +x 'mfasta/mailfasta'
# end of 'mfasta/mailfasta'
fi
if test -f 'mfasta/mailfasta.doc' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'mfasta/mailfasta.doc'\"
else
echo shar: Extracting \"'mfasta/mailfasta.doc'\" \(3103 characters\)
sed "s/^X//" >'mfasta/mailfasta.doc' <<'END_OF_FILE'
XMAILFASTA and GETENTRY
X
XNAME
X     mailfasta - send a sequence file for comparison with sequence databases
X
X     getentry - get an entry from the sequence databases
X
XSYNOPSIS
X     mailfasta [-q] [sequencefile1 sequencefile2 ...]
X
X     getentry entry1 [entry2 ...]
X
XDESCRIPTION
X   
X     MAILFASTA
X    
X     The GenBank site (genbank.bio.net) and the EMBL site (embl.bitnet)
X     contain several sequence databases. A sequence can be compared with these
X     databases by sending a specially formatted email message to the site's
X     mail-server. At the site the sequence is compared to the databases with
X     the FASTA program written by Bill Pearson. The FASTA program also finds
X     related sequences.
X
X     Mailfasta is an interactive program which reads a sequence file (either
X     DNA or (one-letter coded) protein) and asks which databases are to be
X     searched. Then it asks for the sensitivity of the search and the maximum
X     number of best-ranked results. It will send the sequence to the appropriate
X     site and the results of the search will be e-mailed back.
X     The sequence file must be an ASCII file and the first line can be a comment
X     line if it starts with '>'. The program will also read and correctly
X     handle sequence files made by the Macintosh program 'DNA Strider 1.0', a
X     program written by Christian Marck. The sequence does not have to be
X     continuous; spaces are allowed, but no other characters such as numbers.
X
X     The list of waiting jobs at the GenBank site can be obtained by using
X     the option -q.
X
X     GETENTRY
X    
X     Entries of interest can be obtained with the program getentry.
X     Each entry can be either a locus name or an accession number.
X     The entries will be e-mailed back.
X
XDATABASES
X     The following databases can be used.
X
X       All GenBank entries plus the new entries added since the latest release.
X       The added sequences only.
X       All translated reading frames from the latest GenBank release plus the
X        added sequences.
X       The added translated reading frames only.
X       All EMBL entries plus the new entries added since the latest release.
X       The new EMBL entries only.
X       The SWISS-PROT protein database.
X       The NBRF/PIR protein database.
X  
X     The NBRF/PIR database is on the EMBL site; the others are on the GenBank
X     site.
X
XBUGS
X     If the NBRF/PIR database is to be searched, it will not appear in the
X     list of waiting jobs, because the EMBL site does not support a waiting
X     queue list.
X     Most of the results will be available within a few minutes except for
X     comparisons with the NBRF/PIR database, which can take over an hour.
X
XFILES
X     cid                A C program which checks if a file is a DNA sequence
X                        or a protein sequence. If at least 85% of the
X                        letters are A, C, G or T, it is considered a DNA file.
X                        This file MUST be present.
X     mailfasta.doc      This documentation
X     /tmp/mf$$          Temporary storage of the email message.
X     
X
XLast change: 17 July 1990
END_OF_FILE
if test 3103 -ne `wc -c <'mfasta/mailfasta.doc'`; then
    echo shar: \"'mfasta/mailfasta.doc'\" unpacked with wrong size!
fi
chmod +x 'mfasta/mailfasta.doc'
# end of 'mfasta/mailfasta.doc'
fi
cd mfasta
echo Now compiling cid.c
cc -o cid cid.c
echo shar: End of shell archive.
exit 0