SYEH@BIONET-20.BIO.NET (Spencer Yeh) (01/13/89)
Dear Bionet Users,

Version 1.61 of Dan Davison and Keith Thompson's TDALIGN program is now
available for downloading from BIONET or from the BIONET Lending Library.
TDALIGN was previously known as ALIGN.

The latest version has been recompiled using RM FORTRAN, and works on
most, if not all, IBM-compatibles. The previous version, compiled with
Microsoft FORTRAN, crashed on some "compatibles." The actual algorithm
has not been changed.

The latest distribution includes two self-extracting archive files, one
for the executables, documentation, and test files, and another for the
source code and object files.

I'm appending the "README.DOC" file below. The same information is
available on BIONET by typing HELP TDALIGN at the BIONET @ prompt. Type
HELP SOFTWARE for information on other contributed software.

Sincerely,

Spencer Yeh                Internet: bionet@bionet-20.bio.net
Applications Analyst       (for BIONET issues)
BIONET                     (415) 324-4363

----------------------------------------------------------------------

TDALIGN (version 1.61)
December 1988
Dan Davison and Dr. Keith Thompson
("readme.doc" added 1/4/89 by Spencer Yeh)

INTRODUCTION

TDALIGN is a global alignment program for two nucleotide sequences or
two protein sequences. It works by placing the first residue of one
sequence opposite the first residue of the other sequence, and then
"stretching" the two sequences by adding gaps to find matches. Because
of this, it does NOT permit a gap at the 5' terminus. This may be a
problem for some users.

The algorithm is described fully in Davison and Thompson, "A non-metric
sequence alignment algorithm", Bull. Math. Biol. 1984 46(4): 579-590.
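The anchoring rule described in the introduction can be illustrated with
a short sketch. This is NOT the Davison-Thompson algorithm from the paper
cited above. It is an ordinary dynamic-programming global alignment
(Needleman-Wunsch style) with one added constraint: the first residues of
the two sequences are always paired, so no gap can appear at the 5' end.
The function name align_anchored and the scoring values (match, mismatch,
gap) are hypothetical and are unrelated to TDALIGN's own parameters.

```python
def align_anchored(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment in which a[0] is always paired with b[0].

    Illustrative only: standard dynamic programming with leading gaps
    forbidden. Scores are assumptions, not TDALIGN's scoring scheme.
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")

    neg_inf = float("-inf")
    n, m = len(a), len(b)

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0
    # Row 0 and column 0 (apart from the origin) stay at -inf, so every
    # alignment must begin by pairing a[0] with b[0].

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair_score = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + pair_score, "pair"),
                (score[i - 1][j] + gap, "gap_in_b"),
                (score[i][j - 1] + gap, "gap_in_a"),
            ]
            score[i][j], move[i][j] = max(candidates, key=lambda c: c[0])

    # Trace back from the end of both sequences to the origin.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "pair":
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif step == "gap_in_b":
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:  # "gap_in_a"
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1

    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


if __name__ == "__main__":
    top, bottom, total = align_anchored("aaccggtt", "aaccggcgcgcgcgcg")
    print(top)
    print(bottom)
    print("score:", total)
```

Whatever the input, the first column of the result pairs the first
residue of each sequence, even when the two residues differ. This is the
behaviour that makes a 5' offset impossible to represent.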
CONTACT ADDRESS

For questions about the program or suggestions for future improvements
please contact:

Dan Davison
Theoretical Biology and Biophysics Group
T-10 MS K710
Los Alamos National Laboratory
Los Alamos, NM 87545
tel.: (505) 665-1355
e-mail: dd@lanl.gov or goad.davison@bionet-20.bio.net
CompuServe: 74065,41 (rarely)

SYSTEMS SUPPORTED

An IBM-compatible computer running MS-DOS is needed. Two versions of the
program are available: one for machines with an 80X87 math coprocessor
and one for machines without.

AVAILABILITY

TDALIGN is available for downloading from BIONET in the directory
<PC-SOFTWARE.DAVISON> or by postal mail from the BIONET Lending Library.
If you would like to receive TDALIGN by mail, please send a stamped,
self-addressed return envelope along with a formatted diskette (specify
capacity) and your request to:

BIONET Administrator
BIONET/IntelliGenetics, Inc.
700 East El Camino Real, Suite 300
Mountain View, CA 94040
tel.: (415) 962-7337

SOURCE CODE

Source code written in FORTRAN is available in the self-extracting
archive file "TDALNSRC.EXE". The program was originally compiled under
Microsoft FORTRAN, but has since been recompiled using RM FORTRAN 2.42
(also known as AUSTEC FORTRAN), which is a far better implementation of
FORTRAN.

PROGRAM FILES

Before de-ARCing (Approx. 228 Kb):

README.DOC     This documentation file. (8 Kb)
TDALIGN.EXE    Self-extracting archive file for the executables and
               documentation. (110 Kb)
TDALNSRC.EXE   Self-extracting archive file for the source code and
               object files. (110 Kb)

Files from TDALIGN.EXE after de-ARCing (Approx. 210 Kb total):

READ     ME      4219  12-11-88  12:43a  Documentation file.
ALIGN    DOC    20025  12-11-88  12:41a  Documentation file.
NTDALIGN EXE    99920  12-11-88  12:24a  Executable for machines w/o 80X87.
TDALIGN  EXE    86416  12-10-88  12:31a  Executable for machines w/ 80X87.
SEQ1     SEQ       35   1-15-88   9:09p  Test data file.
SEQ2     SEQ       43   1-15-88   9:10p  Test data file.

DOCUMENTATION

The program is documented in the files READ.ME and TDALIGN.DOC in
addition to containing internal help messages. The source code is also
commented.

STARTING THE PROGRAM

De-archive the TDALIGN.EXE file by "running" the archive file and
specifying the drive and directory path where you want the program
installed. E.g., to install TDALIGN in the \tdalign directory of the c:
drive, you should type:

>TDALIGN c:\tdalign

Once installed, CD to the appropriate directory and then type the
program name at the MS-DOS prompt:

>NTDALIGN    (for machines without an 80X87 math coprocessor)

or

>TDALIGN     (for machines with an 80X87 math coprocessor)

DE-ARCHIVING THE SOURCE CODE AND OBJECT FILES

Run the TDALNSRC.EXE source archive file, specifying the drive and
directory path where you want the source files installed. E.g., to
install the source files in the \tdalnsrc directory of the c: drive, you
should type:

>TDALNSRC c:\tdalnsrc

Once installed, the object files can be relinked with the DOS linker or
PLINK86, or the source code can be re-compiled. The RM FORTRAN libraries
are required.

SAMPLE PROGRAM OUTPUT

This program takes any two R/DNA or amino acid sequences
in one letter code and compares them for similarity.
Copyright 1982, 1984, 1985, 1986, 1987, 1988, 1989
by Dan Davison and Keith Thompson.
Version 1.61 12/11/88

Enter the name of the file containing sequence 1
or a ?: seq1.seq

Enter the name of the file containing the second sequence
(can be in the same file) or a ?: seq2.seq

Enter the name of the first sequence
(this parameter is case sensitive) or a ?: seq1

Enter the name of the second sequence
(this parameter is case sensitive) or a ?: seq2

The sequences to be matched are seq1 and seq2

Enter the start and end positionsin sequence seq1
...YOU MUST ENTER TWO NUMBERS.....
0,0 for the complete sequence (not a ?): 0,0

Enter the start and end positionsin sequence seq2
...YOU MUST ENTER TWO NUMBERS.....
0,0 for the complete sequence (not a ?): 0,0

Enter gapsize, matchlength, range,gap penalty--free format
(? or a non-number for more info): 10,2,20,4

Do you want the input sequences printed out?
0=No, 1=sequence 1, 2=sequence 2, 3=both: 3

File for output ? (y/n): n

Print out match table? (y/n): y

seq1
10
aaccggtt

seq2
10 20
aaccggcgcgcgcgcg

seq1 limits: 1 - 8    seq2 limits: 1 - 16

K  1START  1END  2START  2END  LENGTH
1       1     6       1     6       6
2       9     6      17     6       0

Gapsize= 10 Matchlength= 2 Range= 20 Gap penalty= 4.00

seq1 LIMITS: 1 - 8    seq2 LIMITS: 1 - 16

1 1 tt 1 aaccgg 2 aaccgg 2 cgcgcgcgcg 2 10

End of program....bye

KNOWN PROBLEMS

1. Please be aware that TDALIGN is CASE-SENSITIVE at the prompt which
   asks for the sequence name. The sequence name must be entered in
   EXACTLY the same case as it exists in the sequence file. "SEQUENCE1"
   is different from "Sequence1" which is different from "sequence1"!!

2. Because of the algorithm that it uses, TDALIGN does not allow any
   offset at the 5' end. The two sequences must start with the same 5'
   residues.

-------
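Both known problems can be checked before starting a run. What follows
is a minimal sketch, assuming the sequences and their names are already
held in memory as plain strings. The TDALIGN sequence file format is not
described here, so no file parsing is attempted, and all function names
are hypothetical.

```python
def shared_5prime_prefix(seq_a, seq_b):
    """Return the run of identical residues at the start of both sequences."""
    prefix = []
    for x, y in zip(seq_a, seq_b):
        if x != y:
            break
        prefix.append(x)
    return "".join(prefix)


def same_5prime_residue(seq_a, seq_b):
    """True when both sequences are non-empty and begin with the same residue.

    Residues are compared exactly as given; whether TDALIGN itself treats
    residue case as significant is not stated, so this is an assumption.
    """
    return bool(seq_a) and bool(seq_b) and seq_a[0] == seq_b[0]


def find_sequence_name(names, wanted):
    """Case-sensitive lookup of a sequence name.

    Returns (name, []) on an exact match. Otherwise returns (None, near),
    where near lists the names that differ from `wanted` only in case.
    """
    if wanted in names:
        return wanted, []
    near = [n for n in names if n.lower() == wanted.lower()]
    return None, near


if __name__ == "__main__":
    a, b = "aaccggtt", "aaccggcgcgcgcgcg"
    print(same_5prime_residue(a, b))          # True
    print(shared_5prime_prefix(a, b))         # aaccgg
    print(find_sequence_name(["seq1", "seq2"], "SEQ1"))  # (None, ['seq1'])
```

For the two test sequences shown in the sample output, the shared prefix
is "aaccgg", six residues long, which agrees with the LENGTH of 6 in the
first row of the match table.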