[bionet.software.contrib] Latest version of TDALIGN

SYEH@BIONET-20.BIO.NET (Spencer Yeh) (01/13/89)

Dear Bionet Users,

Version 1.61 of Dan Davison and Keith Thompson's TDALIGN program is
now available for downloading from BIONET or from the BIONET Lending
Library.  TDALIGN was previously known as ALIGN.  The latest version
has been recompiled using RM FORTRAN, and works on most, if not all, 
IBM-compatibles.  The previous version, compiled with Microsoft FORTRAN,
crashed on some "compatibles."  The actual algorithm has not been
changed.  

The latest distribution includes two self-extracting archive files,
one for the executables, documentation, and test files, and another
for the source code and object files.  

I'm appending the "README.DOC" file below.  The same information is
available on BIONET by typing HELP TDALIGN at the BIONET @ prompt.  
Type HELP SOFTWARE for information on other contributed software.  

 
Sincerely,

Spencer Yeh			Internet: bionet@bionet-20.bio.net
Applications Analyst			      (for BIONET issues) 
BIONET					 
(415) 324-4363

----------------------------------------------------------------------

TDALIGN (version 1.61)
December 1988  Dan Davison and Dr. Keith Thompson
("readme.doc" added 1/4/89 by Spencer Yeh)


INTRODUCTION

TDALIGN is a global alignment program for two nucleotide sequences or
two protein sequences.  It works by placing the first residue of one
sequence opposite the first residue of the other sequence, and then
"stretching" the two sequences by adding gaps to find matches.
Because of this, it does NOT permit a gap at the 5' terminus.  This may
be a problem for some users.  The algorithm is described fully in
Davison and Thompson, "A non-metric sequence alignment algorithm", Bull.
Math. Biol.  1984 46(4):  579-590. 
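
The effect of anchoring the first residues can be illustrated with a
short Python sketch.  This is NOT the Davison and Thompson algorithm; it
is an ordinary dynamic-programming global alignment with one added
constraint, namely that the first residue of each sequence is forced
opposite the other, so no gap can open at the 5' end.  The function name
and the match, mismatch, and gap scores are assumptions made for
illustration and do not correspond to TDALIGN's own parameters.

```python
def anchored_global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b with a[0] forced opposite b[0].

    Illustrative only: a standard dynamic-programming alignment with an
    anchored start, not the TDALIGN algorithm. Scores are assumptions.
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1] (1-based table indices).
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j]: best score for a[:i] versus b[:j], given that a[0] is
    # paired with b[0]. Row 0 and column 0 are never used, which is what
    # rules out a leading gap.
    score = [[None] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    score[1][1] = sub(1, 1)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if i == 1 and j == 1:
                continue
            options = []
            if i > 1 and j > 1:
                options.append((score[i - 1][j - 1] + sub(i, j), "pair"))
            if i > 1:
                options.append((score[i - 1][j] + gap, "gap_in_b"))
            if j > 1:
                options.append((score[i][j - 1] + gap, "gap_in_a"))
            score[i][j], move[i][j] = max(options)

    # Trace back from the end of both sequences to the anchored start.
    top, bottom = [], []
    i, j = n, m
    while (i, j) != (1, 1):
        step = move[i][j]
        if step == "pair":
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif step == "gap_in_b":
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    top.append(a[0])
    bottom.append(b[0])
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


if __name__ == "__main__":
    top, bottom, total = anchored_global_align("ccggtt", "aaccggtt")
    print(top)
    print(bottom)
    print("score:", total)
```

With "ccggtt" and "aaccggtt" the first column always pairs "c" with "a",
even though shifting the shorter sequence by two positions would match
it exactly.  That is the kind of case in which the restriction on 5'
gaps matters.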



CONTACT ADDRESS

For questions about the program or suggestions for future improvements
please contact:

     Dan Davison
     Theoretical Biology and Biophysics Group
     T-10 MS K710
     Los Alamos National Laboratory
     Los Alamos, NM 87545

     tel.: (505) 665-1355
     e-mail: dd@lanl.gov
             or goad.davison@bionet-20.bio.net
     CompuServe:  74065,41 (rarely)


SYSTEMS SUPPORTED

An IBM-compatible computer running MS-DOS is needed.  Two versions of
the program are available: one for machines with an 80X87 math
coprocessor and one for machines without.


AVAILABILITY

TDALIGN is available for downloading from BIONET in the directory
<PC-SOFTWARE.DAVISON> or by postal mail from the BIONET Lending
Library.  If you would like to receive TDALIGN by mail, please send a
stamped, self-addressed return envelope along with a formatted diskette
(specify capacity) and your request to:

     BIONET Administrator
     BIONET/IntelliGenetics, Inc.
     700 East El Camino Real, Suite 300
     Mountain View, CA 94040

     tel.: (415) 962-7337


   
SOURCE CODE

Source code written in FORTRAN is available in the self-extracting
archive file "TDALNSRC.EXE".  The program was originally compiled under
Microsoft FORTRAN, but has since been recompiled using RM FORTRAN 2.42
(also known as AUSTEC FORTRAN), which is a far better implementation of
FORTRAN. 


PROGRAM FILES before de-ARCing (Approx. 228 Kb):

README.DOC     This documentation file.  (8 Kb).
TDALIGN.EXE    Self-extracting archive file for the executables 
               and documentation.  (110 Kb).
TDALNSRC.EXE   Self-extracting archive file for the source code
               and object files. (110 Kb).


Files from TDALIGN.EXE after de-ARCing (Approx. 210 Kb total):

READ     ME      4219  12-11-88  12:43a   Documentation file.
ALIGN    DOC    20025  12-11-88  12:41a   Documentation file.
NTDALIGN EXE    99920  12-11-88  12:24a   Executable for machines w/o 80X87.
TDALIGN  EXE    86416  12-10-88  12:31a   Executable for machines w/ 80X87.
SEQ1     SEQ       35   1-15-88   9:09p   Test data file.
SEQ2     SEQ       43   1-15-88   9:10p   Test data file.


DOCUMENTATION

The program is documented in the files READ.ME and ALIGN.DOC, and it
also contains internal help messages.  The source code is commented
as well.


STARTING THE PROGRAM

De-archive the TDALIGN.EXE file by "running" the archive file and
specifying the drive and directory path where you want the program
installed.  E.g., to install TDALIGN in the \tdalign directory of the c:
drive, you should type:

	>TDALIGN c:\tdalign

Once installed, CD to the appropriate directory and then type the
program name at the MS-DOS prompt:

	>NTDALIGN        (for machines without an 80X87 math coprocessor)

or      >TDALIGN         (for machines with an 80X87 math coprocessor)


DE-ARCHIVING THE SOURCE CODE AND OBJECT FILES

Run the TDALNSRC.EXE source archive file, specifying the drive and
directory path where you want the source files installed.  E.g., to
install the source files in the \tdalnsrc directory of the c: drive, you
should type:

	>TDALNSRC c:\tdalnsrc

Once installed, the object files can be relinked with the DOS linker or
PLINK86, or the source code can be re-compiled.  The RM FORTRAN
libraries are required. 



SAMPLE PROGRAM OUTPUT

   This program takes any two R/DNA or amino acid sequences in one letter code
   and compares them for similarity. Copyright 1982, 1984, 1985, 1986, 1987,
   1988, 1989 by Dan Davison and Keith Thompson. Version 1.61 12/11/88



   Enter the name of the file containing sequence 1
   or a  ?:  seq1.seq


   Enter the name of the file containing the second sequence (can be in the
    same file) or a ?:  seq2.seq


   Enter the name of the first sequence (this parameter is case sensitive)
    or a ?:  seq1


   Enter the name of the second sequence (this parameter is case sensitive)
    or a ?:  seq2






   The sequences to be matched are  seq1      and  seq2      





   Enter the start and end positionsin sequence  seq1      
    ...YOU MUST ENTER TWO NUMBERS..... 0,0 for the complete sequence
   (not a ?):  0,0





   Enter the start and end positionsin sequence  seq2      
    ...YOU MUST ENTER TWO NUMBERS..... 0,0 for the complete sequence
   (not a ?):  0,0





   Enter gapsize, matchlength, range,gap penalty--free format
   (? or a non-number for more info):  10,2,20,4

   Do you want the input sequences printed out?
   0=No, 1=sequence 1, 2=sequence 2, 3=both:  3

   File for output ? (y/n):  n

   Print out match table? (y/n):  y


    seq1      

               10
       aaccggtt


    seq2      

               10        20
       aaccggcgcgcgcgcg


   seq1        limits:          1 -          8
   seq2        limits:          1 -         16

        K  1START    1END    2START    2END  LENGTH

        1       1       6         1       6       6
        2       9       6        17       6       0



   Gapsize=      10 Matchlength=       2 Range=      20 Gap penalty=   4.00


   seq1       LIMITS:          1 -          8
   seq2       LIMITS:          1 -         16







   1                                                                           
   1           tt                                                              
   1     aaccgg                                                                
   2     aaccgg                                                                
   2           cgcgcgcgcg                                                      
   2             10                                                            

   End of program....bye



KNOWN PROBLEMS

1.  Please be aware that TDALIGN is CASE-SENSITIVE at the prompt which asks
for the sequence name.  The sequence name must be entered in EXACTLY the
same case as it exists in the sequence file.  "SEQUENCE1" is different
from "Sequence1" which is different from "sequence1"!!

2.  Because of the algorithm that it uses, TDALIGN does not allow any
offset at the 5' end.  The two sequences must start with the same 5'
residues.  
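
Both problems can be checked for before running the program.  The
following Python sketch is a hypothetical helper, not part of the
TDALIGN distribution.  It assumes you have already read the sequence
names and the sequences from your files yourself, since the file format
is not described here, and it compares the first residues without regard
to case, which is also an assumption.

```python
def preflight(names_in_file, requested_name, seq1, seq2):
    """Return warnings about the two known problems (hypothetical helper).

    names_in_file  -- sequence names exactly as they appear in the file
    requested_name -- the name you intend to type at the TDALIGN prompt
    seq1, seq2     -- the two sequences as plain strings
    """
    warnings = []

    # Known problem 1: the sequence name must match in exact case.
    if requested_name not in names_in_file:
        near = [name for name in names_in_file
                if name.lower() == requested_name.lower()]
        if near:
            warnings.append(
                f"name {requested_name!r} differs only in case from {near[0]!r}")
        else:
            warnings.append(f"name {requested_name!r} not found")

    # Known problem 2: no offset is allowed at the 5' end, so the first
    # residues should agree. Case is ignored here (an assumption).
    if seq1 and seq2 and seq1[0].lower() != seq2[0].lower():
        warnings.append("sequences start with different 5' residues")

    return warnings


if __name__ == "__main__":
    for message in preflight(["seq1", "seq2"], "SEQ1", "aaccggtt", "ccggtt"):
        print("warning:", message)
```

An empty list means neither problem was detected for that name and
pair of sequences.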

-------