SYEH@BIONET-20.BIO.NET (Spencer Yeh) (01/13/89)
Dear Bionet Users,
Version 1.61 of Dan Davison and Keith Thompson's TDALIGN program is
now available for downloading from BIONET or from the BIONET Lending
Library. TDALIGN was previously known as ALIGN. The latest version
has been recompiled using RM FORTRAN, and works on most, if not all,
IBM-compatibles. The previous version compiled with MicroSoft FORTRAN
crashed on some "compatibles." The actual algorithm has not been
changed.
The latest distribution includes two self-extracting archive files,
one for the executables, documentation, and test files, and another
for the source code and object files.
I'm appending the "README.DOC" file below. The same information is
available on BIONET by typing HELP TDALIGN at the BIONET @ prompt.
Type HELP SOFTWARE for information on other contributed software.
Sincerely,
Spencer Yeh Internet: bionet@bionet-20.bio.net
Applications Analyst (for BIONET issues)
BIONET
(415) 324-4363
----------------------------------------------------------------------
TDALIGN (version 1.61)
December 1988 Dan Davison and Dr. Keith Thompson
("readme.doc" added 1/4/89 by Spencer Yeh)
INTRODUCTION
TDALIGN is a global alignment program for two nucleotide sequences or
two protein sequences. It works by placing the first residue of one
sequence opposite the first residue of the other sequence, and then
"stretching" the two sequences by adding gaps to find matching regions.
Because of this, it does NOT permit a gap at the 5' terminus (the
N-terminus, for proteins). This may be a problem when the two sequences
do not begin at equivalent positions. The algorithm is described fully in
Davison and Thompson, "A non-metric sequence alignment algorithm", Bull.
Math. Biol. 1984 46(4): 579-590.
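The Davison-Thompson algorithm itself is non-metric and is not reproduced here. To illustrate only the 5'-anchoring constraint described above, the sketch below scores a conventional Needleman-Wunsch-style global alignment in Python. The function name, the linear scoring values (match=1, mismatch=-1, gap=-2) and the anchor_5prime switch are hypothetical choices for illustration. They do not correspond to TDALIGN's gapsize, matchlength, range or gap penalty parameters.

```python
# Illustrative sketch only: a standard Needleman-Wunsch global alignment
# score, NOT the Davison-Thompson non-metric algorithm used by TDALIGN.
# Scoring values are hypothetical.
NEG_INF = float("-inf")


def global_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2, anchor_5prime: bool = True) -> float:
    """Best global alignment score of a vs. b (linear gap cost).

    With anchor_5prime=True, residue 1 of a must sit opposite residue 1
    of b, so no gap may precede the first aligned pair.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0
    if not anchor_5prime:
        # Ordinary Needleman-Wunsch: leading gaps allowed, each costs `gap`.
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap
    # When anchored, row 0 and column 0 stay at -inf (except the origin),
    # which forbids any alignment that opens with a gap.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # residue vs residue
                              score[i - 1][j] + gap,       # a[i-1] vs gap
                              score[i][j - 1] + gap)       # gap vs b[j-1]
    return score[n][m]


print(global_score("gcat", "cat"))                       # -1
print(global_score("gcat", "cat", anchor_5prime=False))  # 1
```

Without the anchor, the best alignment opens with a gap opposite the leading "g" and then matches "cat" exactly. With the anchor, "g" is forced opposite "c", which lowers the score. This is the same trade-off that KNOWN PROBLEMS item 2 below warns about.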
CONTACT ADDRESS
For questions about the program or suggestions for future improvements
please contact:
Dan Davison
Theoretical Biology and Biophysics Group
T-10 MS K710
Los Alamos National Laboratory
Los Alamos, NM 87545
tel.: (505) 665-1355
e-mail: dd@lanl.gov
or goad.davison@bionet-20.bio.net
CompuServe: 74065,41 (rarely)
SYSTEMS SUPPORTED
An IBM-compatible computer running MS-DOS is needed. Two versions of
the program are available: one for machines with an 80X87 math
coprocessor and one for machines without.
AVAILABILITY
TDALIGN is available for downloading from BIONET in the directory
<PC-SOFTWARE.DAVISON> or by postal mail from the BIONET Lending
Library. If you would like to receive TDALIGN by mail, please send a
stamped, self-addressed return envelope along with a formatted diskette
(specify capacity) and your request to:
BIONET Administrator
BIONET/IntelliGenetics, Inc.
700 East El Camino Real, Suite 300
Mountain View, CA 94040
tel.: (415) 962-7337
SOURCE CODE
Source code written in FORTRAN is available in the self-extracting
archive file "TDALNSRC.EXE". The program was originally compiled under
MicroSoft FORTRAN, but has since been recompiled using RM FORTRAN 2.42
(also known as AUSTEC FORTRAN), which is a far better implementation of
FORTRAN.
PROGRAM FILES before de-ARCing (Approx. 228 Kb):
README.DOC This documentation file. (8 Kb).
TDALIGN.EXE Self-extracting archive file for the executables
and documentation. (110 Kb).
TDALNSRC.EXE Self-extracting archive file for the source code
and object files. (110 Kb).
Files from TDALIGN.EXE after de-ARCing (Approx. 210 Kb total):
READ ME 4219 12-11-88 12:43a Documentation file.
ALIGN DOC 20025 12-11-88 12:41a Documentation file.
NTDALIGN EXE 99920 12-11-88 12:24a Executable for machines w/o 80X87.
TDALIGN EXE 86416 12-10-88 12:31a Executable for machines w/ 80X87.
SEQ1 SEQ 35 1-15-88 9:09p Test data file.
SEQ2 SEQ 43 1-15-88 9:10p Test data file.
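As a quick consistency check on the "Approx. 210 Kb total" figure, the byte counts in the listing above can be summed. This is a small Python sketch. The dictionary simply restates the listing, with the DOS name/extension columns joined by a dot.

```python
# Byte counts copied from the de-ARCed TDALIGN.EXE listing above.
sizes = {
    "READ.ME": 4219,
    "ALIGN.DOC": 20025,
    "NTDALIGN.EXE": 99920,
    "TDALIGN.EXE": 86416,
    "SEQ1.SEQ": 35,
    "SEQ2.SEQ": 43,
}
total = sum(sizes.values())
print(total)                   # 210658 bytes
print(round(total / 1000, 1))  # 210.7 (decimal kilobytes)
print(round(total / 1024, 1))  # 205.7 (binary kilobytes)
```

The stated total therefore matches if "Kb" is read as 1000 bytes.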
DOCUMENTATION
The program is documented in the files READ.ME and ALIGN.DOC, and it
also displays internal help messages (enter ? at most prompts). The
source code is also commented.
STARTING THE PROGRAM
De-archive the TDALIGN.EXE file by "running" the archive file and
specifying the drive and directory path where you want the program
installed. E.g., to install TDALIGN in the \tdalign directory of the c:
drive, you should type:
>TDALIGN c:\tdalign
Once it is installed, CD to the installation directory and type the
program name at the MS-DOS prompt:
>NTDALIGN (for machines without an 80X87 math coprocessor)
or >TDALIGN (for machines with an 80X87 math coprocessor)
DE-ARCHIVING THE SOURCE CODE AND OBJECT FILES
Run the TDALNSRC.EXE source archive file, specifying the drive and
directory path where you want the source files installed. E.g., to
install the source files in the \tdalnsrc directory of the c: drive, you
should type:
>TDALNSRC c:\tdalnsrc
Once installed, the object files can be relinked with the DOS linker or
PLINK86, or the source code can be re-compiled. The RM FORTRAN
libraries are required.
SAMPLE PROGRAM OUTPUT
This program takes any two R/DNA or amino acid sequences in one letter code
and compares them for similarity. Copyright 1982, 1984, 1985, 1986, 1987,
1988, 1989 by Dan Davison and Keith Thompson. Version 1.61 12/11/88
Enter the name of the file containing sequence 1
or a ?: seq1.seq
Enter the name of the file containing the second sequence (can be in the
same file) or a ?: seq2.seq
Enter the name of the first sequence (this parameter is case sensitive)
or a ?: seq1
Enter the name of the second sequence (this parameter is case sensitive)
or a ?: seq2
The sequences to be matched are seq1 and seq2
Enter the start and end positionsin sequence seq1
...YOU MUST ENTER TWO NUMBERS..... 0,0 for the complete sequence
(not a ?): 0,0
Enter the start and end positionsin sequence seq2
...YOU MUST ENTER TWO NUMBERS..... 0,0 for the complete sequence
(not a ?): 0,0
Enter gapsize, matchlength, range,gap penalty--free format
(? or a non-number for more info): 10,2,20,4
Do you want the input sequences printed out?
0=No, 1=sequence 1, 2=sequence 2, 3=both: 3
File for output ? (y/n): n
Print out match table? (y/n): y
seq1
10
aaccggtt
seq2
10 20
aaccggcgcgcgcgcg
seq1 limits: 1 - 8
seq2 limits: 1 - 16
 K  1START  1END  2START  2END  LENGTH
 1       1     6       1     6       6
 2       9     6      17     6       0
Gapsize= 10 Matchlength= 2 Range= 20 Gap penalty= 4.00
seq1 LIMITS: 1 - 8
seq2 LIMITS: 1 - 16
1
1 tt
1 aaccgg
2 aaccgg
2 cgcgcgcgcg
2 10
End of program....bye
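The README does not define the match-table columns. Reading each row as a pair of segments in 1-based, inclusive coordinates is an assumption. It is, however, consistent with the sample run. The Python sketch below checks row K=1 against the two printed test sequences. The helper name segment() is hypothetical. The sketch does not try to interpret the second row (LENGTH 0).

```python
# Sequences as printed in the sample run above.
seq1 = "aaccggtt"
seq2 = "aaccggcgcgcgcgcg"

# Row K=1 of the sample match table: 1START, 1END, 2START, 2END, LENGTH.
# Assumption: positions are 1-based and inclusive, as the
# "limits: 1 - 8" and "limits: 1 - 16" lines suggest.
start1, end1, start2, end2, length = 1, 6, 1, 6, 6


def segment(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq (1-based, inclusive)."""
    return seq[start - 1:end]


frag1 = segment(seq1, start1, end1)
frag2 = segment(seq2, start2, end2)
print(len(seq1), len(seq2))          # 8 16
print(frag1, frag2, frag1 == frag2)  # aaccgg aaccgg True
print(len(frag1) == length)          # True
```

Under this reading, the first match is the shared 5' segment "aaccgg", which is what the anchored algorithm would be expected to find first.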
KNOWN PROBLEMS
1. Please be aware that TDALIGN is CASE-SENSITIVE at the prompt which asks
for the sequence name. The sequence name must be entered in EXACTLY the
same case as it exists in the sequence file. "SEQUENCE1" is different
from "Sequence1" which is different from "sequence1"!!
2. Because of the algorithm that it uses, TDALIGN does not allow any
offset at the 5' end. The two sequences must start with the same 5'
residues.
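Item 1 above is easy to trip over, so here is a small Python illustration of the difference between an exact-case lookup (TDALIGN's behaviour, per item 1) and a case-insensitive one. The in-memory dictionary is a hypothetical stand-in for a sequence file, since the file format is not described here. The case-insensitive variant is likewise hypothetical and is not a TDALIGN option. The practical workaround remains to copy the name exactly as it appears in the file.

```python
# Hypothetical stand-in for a sequence file: name -> residues.
# (TDALIGN's actual sequence-file format is not modelled here.)
sequences = {"Sequence1": "aaccggtt"}


def find_exact(name: str):
    """Case-sensitive lookup, mirroring TDALIGN's behaviour (item 1)."""
    return sequences.get(name)


def find_ignore_case(name: str):
    """Hypothetical case-insensitive alternative using str.casefold()."""
    for key, residues in sequences.items():
        if key.casefold() == name.casefold():
            return residues
    return None


print(find_exact("sequence1"))        # None: case differs
print(find_exact("Sequence1"))        # aaccggtt
print(find_ignore_case("SEQUENCE1"))  # aaccggtt
```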
-------