zuker@nrcbsz.bio.nrc.ca (Michael Zuker) (05/14/91)
As far as I can determine, Andrew Coulson (biochemist) and John Collins (computer scientist) from Edinburgh University were the first to use specialized hardware for DNA/protein sequence database searching. They used the DAP (distributed array processor) which comprises 4096 bit addressable processors in parallel. The DAP was developed by the British armed forces for military purposes, but was discarded. They use a rigorous Needleman-Wunsch dynamic programming algorithm together with a variety of PAM matrices (Dayhoff ref.) for protein database searching. No shortcuts or compromises are needed as in the FASTA algorithm. In my opinion, their combination of hardware and software is still the best. Like Geir Hauge, I have heard many promises of specialized chips, but I have not seen results. References Collins, J. F. and Coulson, A. F. W. (1984). "Applications of parallel processing algorithms for DNA sequence analysis." Nucleic Acids Res. 12, 181-192. Coulson, A. F. W., Collins, J. F. and Lyall, A. (1987). "Protein and nucleic acid sequence database searching: a suitable case for parallel processing." The Computer Journal 30, No. 5,420-424. Dayhoff, M. O., Schwartz, R. M. and Orcutt, B. C. (1978). "A Model of Evolutionary Change in Proteins." In Atlas of Protein Sequence and Structure, Vol. 5, Suppl. 3, National Biomedical Research Foundation, Washington, 345-352. _________________________________________________________ | ID: Michael Zuker | | INTER-net: zuker@vm.nrc.ca | | zuker@nrcbsz.bio.nrc.ca | | PHONE-net: (613) 993-4830 | | FAX-net: (613) 952-0583 | | TELEX-net: 053-3145 | | SNAIL-net: Institute for Biological Sciences | | M-54, National Research Council | | Ottawa, Ontario | | Canada K1A 0R6 | |=> Absolutum obsoletum - If it works, it's out of date. | |________________________________________________________| -- --- Moderator --- Domain: curtiss@umiacs.umd.edu Phillip Curtiss UUCP: uunet!mimsy!curtiss UMIACS - Univ. of Maryland Phone: +1-301-405-6710 College Park, Md 20742
watt@eleazar.dartmouth.edu (05/17/91)
howdy, I am currently working on a simple NuBus board for the MacII to implement the Needleman-Wunsch algorithm as my M.S. thesis. I will also be writing the user-interface software that will do some basic color dot-plot homologies. The architecture is not all that sophisticated: I have a single Actel FPGA to do the additions and comparisons and a lot of DRAM to store the array for extracting the resulting alignment by backtracking through the array. The project is not meant to push back the frontiers of string comparison computer architecture, but rather to bring some of the advances in hardware to the lab at a reasonable cost. The algorithm is still O(N^2), but the curve is stretched out some. I am hoping for a 50x speedup over my plain vanilla IIfx. At the moment I am working on the schematic capture and simulation phase of the design and hope to go to prototype within a month. I am a hardware designer, not a biologist, and I still have some doubts about the features of my board, so if you have any suggestions, I would love to hear them. At the moment the design has the following features: User settable (in Mac software) costs (integers 0-15) for Insertion gap creation Insertion of a single nucleotide Deletion gap creation Deletion of a single nucleotide Substitution of a single nucleotide. Memory space for sequences up to 4000 x 4000 nts. with 1Mb SIMMs 8000 x 8000 nts. with 4Mb SIMMs 16000 x 16000 nts. with 16Mb SIMMs Ability to calculate 8000 x 8000 in approx. 6 seconds. Thanks for your time. -- --- Moderator --- Domain: curtiss@umiacs.umd.edu Phillip Curtiss UUCP: uunet!mimsy!curtiss UMIACS - Univ. of Maryland Phone: +1-301-405-6710 College Park, Md 20742
wrp@biochsn.acc.virginia.edu (05/18/91)
In article <34608@mimsy.umd.edu> watt@eleazar.dartmouth.edu writes: > >I am currently working on a simple NuBus board for the MacII to >implement the Needleman-Wunsch algorithm as my M.S. thesis. >I will also be writing the user-interface software that will do >some basic color dot-plot homologies. > >The architecture is not all that sophisticated: I have a single >Actel FPGA to do the additions and comparisons and a lot of >DRAM to store the array for extracting the resulting alignment >by backtracking through the array. > > > Memory space for sequences up to > 4000 x 4000 nts. with 1Mb SIMMs > 8000 x 8000 nts. with 4Mb SIMMs > 16000 x 16000 nts. with 16Mb SIMMs > > Ability to calculate 8000 x 8000 in approx. 6 seconds. > It should be noted that efficient implementations of both rigorous global similarity calculations (Needleman Wunsch) and local similarity calculations (Smith-Waterman, Waterman-Eggert) that require only linear (O(N)) space have been described by Miller and Myers (CABIOS 1988 4:11-17) and Huang and Miller (CABIOS, 1990 6:373-381, Adv. Appl. Math, 1991, in press). It seems silly to place memory restrictions on the algorithm when none are necessary. Bill Pearson -- --- Moderator --- Domain: curtiss@umiacs.umd.edu Phillip Curtiss UUCP: uunet!mimsy!curtiss UMIACS - Univ. of Maryland Phone: +1-301-405-6710 College Park, Md 20742