[sci.bio] Molecular Bio texts, protein shapes and computers

roy@phri.UUCP (Roy Smith) (09/26/87)

In article <2040@super.upenn.edu> li@linc.cis.upenn.edu (Siufai Li) writes:
> Does anyone know if someone is doing research involving the prediction of
> the shape of a protein from its amino acid sequence (or nucleotide sequence)?
> [...] I wonder if such a task is possible using computers.

	This is without a doubt one of the big open issues of computational
biology (does this field have an official name?)  Lots of people are
working on it, but as far as I know, nobody has had much success.  The
old standby is an algorithm described in:

%A P.Y. Chou
%A G.D. Fasman
%T Prediction of the secondary structure of proteins from their amino
acid sequence
%J Adv. Enzymol.
%V 47
%P 45-148
%D 1978

	Chou-Fasman is almost a decade old and pretty much worthless as an
analytical tool.  Yet, because not much better has come along, people still
use it (and, in my opinion, put far too much faith in the results).
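For readers who haven't seen it, here is a minimal sketch of the helix part of Chou-Fasman as it is usually summarized. A helix "nucleates" where at least 4 of 6 consecutive residues are helix formers (propensity >= 1.0). It then extends in both directions until a 4-residue window averages below 1.0. The published procedure also handles beta sheets, turns, overlap resolution and further boundary rules, all of which are omitted here. The function names are hypothetical, and the propensity values in the example are toy placeholders, NOT the published Chou-Fasman table.

```python
from typing import Dict, List, Tuple


def mean_propensity(segment: str, p_alpha: Dict[str, float]) -> float:
    # Residues missing from the table count as 0.0, i.e. as strong breakers.
    return sum(p_alpha.get(aa, 0.0) for aa in segment) / len(segment)


def helix_segments(seq: str, p_alpha: Dict[str, float],
                   window: int = 6, min_formers: int = 4,
                   threshold: float = 1.0) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges predicted as helix."""
    if window < 4:
        raise ValueError("window must be at least 4 residues")
    n = len(seq)
    helix = [False] * n
    for i in range(n - window + 1):
        # Nucleation: enough helix formers inside the window.
        formers = sum(1 for aa in seq[i:i + window]
                      if p_alpha.get(aa, 0.0) >= threshold)
        if formers < min_formers:
            continue
        start, end = i, i + window
        # Extend right while the 4-residue window ending at the new residue
        # still averages at or above the threshold.
        while end < n and mean_propensity(seq[end - 3:end + 1], p_alpha) >= threshold:
            end += 1
        # Extend left the same way.
        while start > 0 and mean_propensity(seq[start - 1:start + 3], p_alpha) >= threshold:
            start -= 1
        for k in range(start, end):
            helix[k] = True
    # Collapse per-residue flags into contiguous segments.
    segments: List[Tuple[int, int]] = []
    k = 0
    while k < n:
        if not helix[k]:
            k += 1
            continue
        j = k
        while j < n and helix[j]:
            j += 1
        segments.append((k, j))
        k = j
    return segments


# Hypothetical toy propensities, NOT the published Chou-Fasman values.
toy = {"A": 1.4, "E": 1.5, "L": 1.2, "G": 0.5}
print(helix_segments("GGGGAAAAAAGGGG", toy))  # -> [(2, 12)]
```

Note that the 4-of-6 nucleation rule lets the predicted helix absorb a couple of flanking breakers (the G residues at positions 2, 3, 10 and 11 above). That kind of crude boundary behaviour is one reason not to put much faith in residue-level output.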

	The ultimate goal is to take an amino acid sequence and predict not
only the 3-dimensional structure, but also the function of the resultant
enzyme, and its Km.  Not that I think this ever will happen, but it's nice
to dream.
-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

dd@beta.UUCP (09/27/87)

In article <2910@phri.UUCP>, roy@phri.UUCP (Roy Smith) writes:
> In article <2040@super.upenn.edu> li@linc.cis.upenn.edu (Siufai Li) writes:
> > Does anyone know if someone is doing research involving the prediction of
> > the shape of a protein from its amino acid sequence (or nucleotide sequence)?
> 	This is without a doubt one of the big open issues of computational
> biology (does this field have an official name?)  Lots of people are
> working on it, but as far as I know, nobody has had much success. 

See comments below.  It depends on what you mean by "much success".

> 
> 	Chou-Fasman is almost a decade old and pretty much worthless as an
> analytical tool.  Yet, because not much better has come along, people still
> use it (and, in my opinion, put far too much faith in the results).

This is an understatement.  Yet it remains the Gold Standard of structure
prediction: if you come up with an algorithm, whether statistical,
computational or heuristic (I'm thinking of the AI approaches of Abarbanel
and company), you still have to compare your results to C&F.
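In practice such a comparison comes down to scoring both predictions against an experimentally determined structure, residue by residue. A minimal sketch, assuming a three-state labeling (H = helix, E = strand, C = anything else) and simple fraction-correct scoring. The function names and the example strings are hypothetical, not data from any real protein.

```python
from typing import Dict


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("predicted and observed must be non-empty and equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def compare_to_baseline(new: str, baseline: str, observed: str) -> Dict[str, float]:
    """Score a new method and the reference method (e.g. C&F) on the same protein."""
    return {
        "new": per_residue_accuracy(new, observed),
        "baseline": per_residue_accuracy(baseline, observed),
    }


# Hypothetical per-residue states: H = helix, E = strand, C = other.
observed = "HHHHCCEE"
print(compare_to_baseline("HHHCCCEE", "HHCCCCCC", observed))
# -> {'new': 0.875, 'baseline': 0.5}
```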

> 	The ultimate goal is to take an amino acid sequence and predict not
> only the 3-dimensional structure, but also the function of the resultant
> enzyme, and its Km.  Not that I think this ever will happen, but it's nice
> to dream.
> -- 
> Roy Smith, {allegra,cmcl2,philabs}!phri!roy

I think there has been more progress here than is immediately apparent; it's
easy to miss because the gains have been in limited domains.  For example, the
Abarbanel (Biochemistry, about 1983 and 1986) pattern recognition approach
works very well, but only on proteins that have alpha-beta-alpha structure.
There was a paper by Stroud et al. in either Immunology or the Journal of
Immunology in which they use a power spectrum analysis to detect alpha
helices that are amphipathic (specifically, hydrophobic on one side
and hydrophilic on the other).  Both methods succeed more than 80% of the
time (I'm pretty sure it's actually higher), but neither can say anything
about joe-random-protein.
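The exact procedure of the Stroud et al. paper isn't described here, so what follows is only a generic sketch of the power-spectrum idea. Treat the sequence as a hydrophobicity signal h_n and compute its power at each angular frequency. An alpha helix has about 3.6 residues per turn, i.e. roughly 100 degrees per residue, so a helix that is hydrophobic on one face should show a peak near 100 degrees. The +1/-1 two-letter scale and the function names are hypothetical; a real analysis would use a published hydrophobicity scale and a sliding window along the protein.

```python
import cmath
import math
from typing import Dict, List, Sequence, Tuple


def power_spectrum(values: Sequence[float],
                   angles_deg: Sequence[float]) -> List[float]:
    """|sum_n (h_n - mean) * exp(i * n * w)|^2 for each angle w (in degrees)."""
    mean = sum(values) / len(values)
    powers = []
    for angle in angles_deg:
        w = math.radians(angle)
        s = sum((h - mean) * cmath.exp(1j * n * w) for n, h in enumerate(values))
        powers.append(abs(s) ** 2)
    return powers


def peak_angle(seq: str, scale: Dict[str, float]) -> Tuple[int, float]:
    """Return the angle in 0..180 degrees with the largest power, and that power."""
    h = [scale[aa] for aa in seq]
    angles = list(range(0, 181))
    powers = power_spectrum(h, angles)
    best = max(range(len(angles)), key=lambda k: powers[k])
    return angles[best], powers[best]


# Hypothetical two-letter scale: L = hydrophobic (+1), K = hydrophilic (-1).
scale = {"L": 1.0, "K": -1.0}
# Idealized amphipathic helix: hydrophobic wherever cos(100 * n degrees) > 0.
seq = "".join("L" if math.cos(math.radians(100 * n)) > 0 else "K" for n in range(36))
angle, _ = peak_angle(seq, scale)
print(angle)
```

For this idealized sequence the peak should land near 100 degrees. A helix that is hydrophobic all the way around, or a random coil, gives no such peak, which is what makes the method selective for amphipathic helices.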

The C&F approach suffers because its database was 29 globular, highly soluble
proteins, and people have been applying the procedure to non-globular,
amphipathic structures.  The divide-and-conquer approach (pick a limited domain
and get it right) has yielded better results, but since these methods can't be
applied to an arbitrary protein, they haven't made a splash.

If anyone is interested I can post references to these and other articles.
For instance, some time this year in PNAS there was an article comparing
protein structure prediction methods for membrane-bound proteins.

Regarding the prospects for full tertiary prediction, yes, we're a long way
away, but molecular dynamics and simulated annealing may help a great deal.
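To make "simulated annealing" concrete, here is a toy sketch. It folds a chain on a 2D square lattice using the HP model: each residue is just H (hydrophobic) or P (polar), and every non-bonded H-H contact lowers the energy by 1. It anneals with Metropolis acceptance under a geometric cooling schedule. The lattice model, the move set (re-pointing a single bond, which rigidly shifts the rest of the chain), the schedule, the example sequence and all names are illustrative assumptions, far cruder than anything a real tertiary-structure calculation would use.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[int, int]
DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # E, N, W, S on a square lattice


def coords(moves: List[int]) -> Optional[List[Point]]:
    """Turn bond directions into lattice points; None if the chain overlaps itself."""
    pos = [(0, 0)]
    seen = {(0, 0)}
    for m in moves:
        dx, dy = DIRS[m]
        x, y = pos[-1]
        nxt = (x + dx, y + dy)
        if nxt in seen:
            return None
        seen.add(nxt)
        pos.append(nxt)
    return pos


def energy(seq: str, pos: List[Point]) -> int:
    """-1 for each pair of H residues that touch on the lattice but are not bonded."""
    index = {p: i for i, p in enumerate(pos)}
    e = 0
    for i, (x, y) in enumerate(pos):
        if seq[i] != "H":
            continue
        for dx, dy in DIRS:
            j = index.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbours and counts each contact once.
            if j is not None and j > i + 1 and seq[j] == "H":
                e -= 1
    return e


def anneal(seq: str, steps: int = 20000, t_start: float = 2.0,
           t_end: float = 0.05, seed: int = 0) -> Tuple[List[int], int]:
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    rng = random.Random(seed)
    moves = [0] * (len(seq) - 1)  # start fully extended, always self-avoiding
    start = coords(moves)
    assert start is not None
    cur_e = energy(seq, start)
    best_moves, best_e = list(moves), cur_e
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        trial = list(moves)
        trial[rng.randrange(len(trial))] = rng.randrange(4)
        pos = coords(trial)
        if pos is None:
            continue  # reject moves that make the chain overlap
        e = energy(seq, pos)
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        if e <= cur_e or rng.random() < math.exp(-(e - cur_e) / t):
            moves, cur_e = trial, e
            if e < best_e:
                best_moves, best_e = list(trial), e
    return best_moves, best_e


seq = "HPHPPHHPHPPHPHHPPHPH"  # hypothetical 20-residue H/P string
best_moves, best_e = anneal(seq, seed=1)
print(best_e)
```

The high starting temperature lets the chain escape early, poorly packed folds; as it cools, the search settles into conformations with more buried H-H contacts. The same accept/reject loop carries over to more realistic energy functions, which is where the real difficulty lies.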

Care to comment, dizzy Dan?

dan davison / theoretical biology/ t-10 ms k710/lanl/los alamos,NM 87545
dd@lanl.gov ...cmcl2!lanl!dd