[bionet.biology.computational] Special-purpose hardware for fast comparison of DNA and/or protein sequences

zuker@nrcbsz.bio.nrc.ca (Michael Zuker) (05/14/91)

As far as I can determine, Andrew Coulson (biochemist) and John 
Collins (computer scientist) from Edinburgh University were the first
to use specialized hardware for DNA/protein sequence database 
searching. They used the DAP (distributed array processor) which 
comprises 4096 bit addressable processors in parallel. The DAP was 
developed by the British armed forces for military purposes, but was 
discarded. They use a rigorous Needleman-Wunsch dynamic programming 
algorithm together with a variety of PAM matrices (Dayhoff ref.)
for protein database searching. No shortcuts or compromises are 
needed as in the FASTA algorithm. In my opinion, their combination of
hardware and software is still the best.

Like Geir Hauge, I have heard many promises of specialized chips, but
I have not seen results.

References

Collins, J. F. and Coulson, A. F. W. (1984).
"Applications of parallel processing algorithms for DNA sequence
analysis."
Nucleic Acids Res. 12, 181-192.

Coulson, A. F. W., Collins, J. F. and Lyall, A. (1987). 
"Protein and nucleic acid sequence database searching: a suitable
case for parallel processing." 
The Computer Journal 30, No. 5,420-424.

Dayhoff, M. O., Schwartz, R. M. and Orcutt, B. C. (1978). 
"A Model of Evolutionary Change in Proteins." In Atlas of Protein
Sequence and Structure, Vol. 5, Suppl. 3, National Biomedical
Research Foundation, Washington, 345-352.


            _________________________________________________________
            |   ID:         Michael Zuker                            |
            |   INTER-net:  zuker@vm.nrc.ca                          |
            |               zuker@nrcbsz.bio.nrc.ca                  |
            |   PHONE-net:  (613) 993-4830                           |
            |   FAX-net:    (613) 952-0583                           |
            |   TELEX-net:  053-3145                                 |
            |   SNAIL-net:  Institute for Biological Sciences        |
            |               M-54, National Research Council          |
            |               Ottawa, Ontario                          |
            |               Canada  K1A 0R6                          |
            |=> Absolutum obsoletum - If it works, it's out of date. |
            |________________________________________________________|

--
                                --- Moderator ---
Domain: curtiss@umiacs.umd.edu		     Phillip Curtiss
  UUCP:	uunet!mimsy!curtiss		UMIACS - Univ. of Maryland
 Phone:	+1-301-405-6710			  College Park, Md 20742

watt@eleazar.dartmouth.edu (05/17/91)

howdy,

I am currently working on a simple NuBus board for the MacII to 
implement the Needleman-Wunsch algorithm as my M.S. thesis.  
I will also be writing the user-interface software that will do 
some basic color dot-plot homologies.  

The architecture is not all that sophisticated: I have a single
Actel FPGA to do the additions and comparisons and a lot of 
DRAM to store the array for extracting the resulting alignment 
by backtracking through the array.  

The project is not meant to push back the frontiers of string
comparison computer architecture, but rather to bring some of
the advances in hardware to the lab at a reasonable cost.  
The algorithm is still O(N^2), but the curve is stretched out
some.  I am hoping for a 50x speedup over my plain vanilla IIfx.  

At the moment I am working on the schematic capture and simulation
phase of the design and hope to go to prototype within a month. 

I am a hardware designer, not a biologist, and I still have some
doubts about the features of my board, so if you have any 
suggestions, I would love to hear them.  

At the moment the design has the following features:

	User settable (in Mac software) costs (integers 0-15) for
		Insertion gap creation
		Insertion of a single nucleotide
		Deletion gap creation
		Deletion of a single nucleotide
		Substitution of a single nucleotide.

	Memory space for sequences up to 
		4000 x 4000 nts. with 1Mb SIMMs
		8000 x 8000 nts. with 4Mb SIMMs
		16000 x 16000 nts. with 16Mb SIMMs

	Ability to calculate 8000 x 8000 in approx. 6 seconds.

Thanks for your time.

--
                                --- Moderator ---
Domain: curtiss@umiacs.umd.edu		     Phillip Curtiss
  UUCP:	uunet!mimsy!curtiss		UMIACS - Univ. of Maryland
 Phone:	+1-301-405-6710			  College Park, Md 20742

wrp@biochsn.acc.virginia.edu (05/18/91)

In article <34608@mimsy.umd.edu> watt@eleazar.dartmouth.edu writes:
>

>I am currently working on a simple NuBus board for the MacII to 
>implement the Needleman-Wunsch algorithm as my M.S. thesis.  
>I will also be writing the user-interface software that will do 
>some basic color dot-plot homologies.  
>
>The architecture is not all that sophisticated: I have a single
>Actel FPGA to do the additions and comparisons and a lot of 
>DRAM to store the array for extracting the resulting alignment 
>by backtracking through the array.  
>
>
>	Memory space for sequences up to 
>		4000 x 4000 nts. with 1Mb SIMMs
>		8000 x 8000 nts. with 4Mb SIMMs
>		16000 x 16000 nts. with 16Mb SIMMs
>
>	Ability to calculate 8000 x 8000 in approx. 6 seconds.
>

	It should be noted that efficient implementations of both
rigorous global similarity calculations (Needleman Wunsch) and local
similarity calculations (Smith-Waterman, Waterman-Eggert) that require
only linear (O(N)) space have been described by Miller and Myers
(CABIOS 1988 4:11-17) and Huang and Miller (CABIOS, 1990 6:373-381,
Adv. Appl. Math, 1991, in press).

	It seems silly to place memory restrictions on the algorithm
when none are necessary.

Bill Pearson
--
                                --- Moderator ---
Domain: curtiss@umiacs.umd.edu		     Phillip Curtiss
  UUCP:	uunet!mimsy!curtiss		UMIACS - Univ. of Maryland
 Phone:	+1-301-405-6710			  College Park, Md 20742