zuker@nrcbsz.bio.nrc.ca (Michael Zuker) (05/14/91)
As far as I can determine, Andrew Coulson (biochemist) and John
Collins (computer scientist) from Edinburgh University were the first
to use specialized hardware for DNA/protein sequence database
searching. They used the DAP (Distributed Array Processor), which
comprises 4096 bit-addressable processors working in parallel. The DAP was
developed by the British armed forces for military purposes, but was
discarded. They use a rigorous Needleman-Wunsch dynamic programming
algorithm together with a variety of PAM matrices (Dayhoff ref.)
for protein database searching. No shortcuts or compromises are
needed as in the FASTA algorithm. In my opinion, their combination of
hardware and software is still the best.
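
For readers unfamiliar with the "rigorous" approach mentioned above, here is a minimal sketch of a full Needleman-Wunsch score computation, in which every cell of the comparison matrix is filled and nothing is skipped heuristically. The names nw_score and toy_score, the linear gap penalty, and the score values are assumptions made for illustration. They are not taken from the Collins and Coulson papers, and toy_score merely stands in for a real PAM matrix lookup.

```python
def nw_score(a, b, score, gap=-2):
    """Global alignment score by exhaustive dynamic programming.

    score(x, y) returns the similarity of two residues (a PAM-style
    lookup would go here); gap is a per-residue gap penalty (assumed).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align pair
                dp[i - 1][j] + gap,                            # gap in b
                dp[i][j - 1] + gap,                            # gap in a
            )
    return dp[-1][-1]


def toy_score(x, y):
    # Illustrative values only; these are NOT Dayhoff PAM scores.
    return 2 if x == y else -1
```

Each cell depends only on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time, which is what makes the method a natural fit for an array of simple processors.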
Like Geir Hauge, I have heard many promises of specialized chips, but
I have not seen results.
References
Collins, J. F. and Coulson, A. F. W. (1984).
"Applications of parallel processing algorithms for DNA sequence
analysis."
Nucleic Acids Res. 12, 181-192.
Coulson, A. F. W., Collins, J. F. and Lyall, A. (1987).
"Protein and nucleic acid sequence database searching: a suitable
case for parallel processing."
The Computer Journal 30, No. 5, 420-424.
Dayhoff, M. O., Schwartz, R. M. and Orcutt, B. C. (1978).
"A Model of Evolutionary Change in Proteins." In Atlas of Protein
Sequence and Structure, Vol. 5, Suppl. 3, National Biomedical
Research Foundation, Washington, 345-352.
_________________________________________________________
| ID: Michael Zuker |
| INTER-net: zuker@vm.nrc.ca |
| zuker@nrcbsz.bio.nrc.ca |
| PHONE-net: (613) 993-4830 |
| FAX-net: (613) 952-0583 |
| TELEX-net: 053-3145 |
| SNAIL-net: Institute for Biological Sciences |
| M-54, National Research Council |
| Ottawa, Ontario |
| Canada K1A 0R6 |
|=> Absolutum obsoletum - If it works, it's out of date. |
|________________________________________________________|
--
--- Moderator ---
Domain: curtiss@umiacs.umd.edu Phillip Curtiss
UUCP: uunet!mimsy!curtiss UMIACS - Univ. of Maryland
Phone: +1-301-405-6710 College Park, Md 20742

watt@eleazar.dartmouth.edu (05/17/91)
howdy,
I am currently working on a simple NuBus board for the MacII to
implement the Needleman-Wunsch algorithm as my M.S. thesis.
I will also be writing the user-interface software that will do
some basic color dot-plot homologies.
The architecture is not all that sophisticated: I have a single
Actel FPGA to do the additions and comparisons and a lot of
DRAM to store the array for extracting the resulting alignment
by backtracking through the array.
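
The store-then-backtrack scheme described above can be sketched in software roughly as follows. This is only an illustration: the function name, the one-move-code-per-cell layout, and the unit costs are assumptions, not details of the board.

```python
def align_with_traceback(a, b, sub=1, indel=1):
    """Minimum-cost global alignment, keeping one move code per cell."""
    n, m = len(a), len(b)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    # move[i][j]: 0 = diagonal, 1 = up (delete from a), 2 = left (insert from b)
    move = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], move[i][0] = i * indel, 1
    for j in range(1, m + 1):
        cost[0][j], move[0][j] = j * indel, 2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else sub)
            up = cost[i - 1][j] + indel
            left = cost[i][j - 1] + indel
            best = min(diag, up, left)
            cost[i][j] = best
            move[i][j] = 0 if best == diag else (1 if best == up else 2)
    # Walk back from the bottom-right corner to recover the alignment.
    i, j, top, bottom = n, m, [], []
    while i > 0 or j > 0:
        if move[i][j] == 0:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif move[i][j] == 1:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return cost[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

The move table is the part that grows with the product of the two sequence lengths, which is why this style of design needs a lot of memory.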
The project is not meant to push back the frontiers of string
comparison computer architecture, but rather to bring some of
the advances in hardware to the lab at a reasonable cost.
The algorithm is still O(N^2), but the curve is stretched out
some. I am hoping for a 50x speedup over my plain vanilla IIfx.
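
To put the "stretched out" remark in numbers: a constant-factor speedup leaves the quadratic growth untouched, so the sequence length reachable in a fixed amount of time grows only by the square root of the speedup. A small sketch of that arithmetic (the function names are illustrative):

```python
import math


def cells(n):
    """Number of matrix cells when comparing two sequences of length n."""
    return n * n


def reachable_length(n_before, speedup):
    """Length reachable in the same time after a constant-factor speedup.

    Work is proportional to n**2, so n scales with sqrt(speedup).
    """
    return n_before * math.sqrt(speedup)
```

For a 50x speedup the factor is sqrt(50), or roughly 7, so a comparison that was practical at 1000 x 1000 becomes practical at about 7000 x 7000.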
At the moment I am working on the schematic capture and simulation
phase of the design and hope to go to prototype within a month.
I am a hardware designer, not a biologist, and I still have some
doubts about the features of my board, so if you have any
suggestions, I would love to hear them.
At the moment the design has the following features:

  User-settable (in Mac software) costs (integers 0-15) for
    Insertion gap creation
    Insertion of a single nucleotide
    Deletion gap creation
    Deletion of a single nucleotide
    Substitution of a single nucleotide

  Memory space for sequences up to
    4000 x 4000 nts. with 1Mb SIMMs
    8000 x 8000 nts. with 4Mb SIMMs
    16000 x 16000 nts. with 16Mb SIMMs

  Ability to calculate 8000 x 8000 in approx. 6 seconds.

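
One way to read the five user-settable costs is as an affine gap model with separate parameters for insertions and deletions: a gap of k nucleotides costs the gap-creation cost plus k times the per-nucleotide cost. That reading is an assumption, as are the parameter names and the three-matrix (Gotoh-style) recurrence in the sketch below. The post does not say how the board combines the costs.

```python
INF = float("inf")


def affine_cost(a, b, ins_open, ins_ext, del_open, del_ext, sub):
    """Minimum alignment cost with separate affine insertion/deletion costs."""
    n, m = len(a), len(b)
    # M: a[i-1] paired with b[j-1]; I: b[j-1] inserted; D: a[i-1] deleted.
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    I = [[INF] * (m + 1) for _ in range(n + 1)]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for j in range(1, m + 1):
        I[0][j] = ins_open + j * ins_ext
    for i in range(1, n + 1):
        D[i][0] = del_open + i * del_ext
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = 0 if a[i - 1] == b[j - 1] else sub
            M[i][j] = min(M[i - 1][j - 1], I[i - 1][j - 1], D[i - 1][j - 1]) + s
            # Extend an open insertion, or open a new one.
            I[i][j] = min(I[i][j - 1] + ins_ext,
                          min(M[i][j - 1], D[i][j - 1]) + ins_open + ins_ext)
            # Extend an open deletion, or open a new one.
            D[i][j] = min(D[i - 1][j] + del_ext,
                          min(M[i - 1][j], I[i - 1][j]) + del_open + del_ext)
    return min(M[n][m], I[n][m], D[n][m])
```

With costs limited to integers 0-15, each step of the recurrence is a handful of small additions and comparisons per cell.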
Thanks for your time.

wrp@biochsn.acc.virginia.edu (05/18/91)
In article <34608@mimsy.umd.edu> watt@eleazar.dartmouth.edu writes:
>
>I am currently working on a simple NuBus board for the MacII to
>implement the Needleman-Wunsch algorithm as my M.S. thesis.
>I will also be writing the user-interface software that will do
>some basic color dot-plot homologies.
>
>The architecture is not all that sophisticated: I have a single
>Actel FPGA to do the additions and comparisons and a lot of
>DRAM to store the array for extracting the resulting alignment
>by backtracking through the array.
>
>
> Memory space for sequences up to
>     4000 x 4000 nts. with 1Mb SIMMs
>     8000 x 8000 nts. with 4Mb SIMMs
>     16000 x 16000 nts. with 16Mb SIMMs
>
> Ability to calculate 8000 x 8000 in approx. 6 seconds.
>

It should be noted that efficient implementations of both rigorous
global similarity calculations (Needleman-Wunsch) and local similarity
calculations (Smith-Waterman, Waterman-Eggert) that require only
linear (O(N)) space have been described by Miller and Myers (CABIOS
1988 4:11-17) and Huang and Miller (CABIOS, 1990 6:373-381, Adv. Appl.
Math, 1991, in press). It seems silly to place memory restrictions
on the algorithm when none are necessary.

Bill Pearson
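
The basic observation behind the linear-space methods is that each row of the comparison matrix depends only on the row before it. The sketch below shows just that part: it computes the optimal global cost while keeping two rows. It is not the algorithm from the cited papers, which add a divide-and-conquer step to recover the alignment itself in linear space; that step is not shown. The function name and the unit costs are illustrative assumptions.

```python
def nw_cost_linear_space(a, b, sub=1, indel=1):
    """Optimal global alignment cost using O(min(len(a), len(b))) memory."""
    if len(b) > len(a):
        a, b = b, a  # costs are symmetric, so keep the shorter one as the row
    prev = [j * indel for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * indel] + [0] * len(b)
        for j in range(1, len(b) + 1):
            curr[j] = min(
                prev[j - 1] + (0 if a[i - 1] == b[j - 1] else sub),  # pair
                prev[j] + indel,                                     # gap in b
                curr[j - 1] + indel,                                 # gap in a
            )
        prev = curr  # the older row is no longer needed
    return prev[-1]
```

The running time is still proportional to the product of the two lengths; only the memory requirement drops.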