[bionet.molbio.evolution] KRAMER vs ....

MAILER@PASTEUR.BITNET (05/17/88)

From: Jean-Michel Claverie <JMC%PASTEUR.BITNET@CUNYVM.CUNY.EDU>

Just a little French intrusion in this all-American debate ...

We have found another type of multidimensional sequence representation and
objective analysis convenient when you don't know what to search for, or
where, in a sequence. It combines a discriminant method, a k-tuple
representation and a multidimensional representation.

Bougueleret et al. (1988) Nucl. Acids Res. 16, 1729-1738.
Claverie & Bougueleret (1986) Nucl. Acids Res. 14, 179-196.
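
To make the k-tuple part concrete, here is a minimal sketch in Python. It is
NOT the procedure of the two papers above, only the generic idea of turning a
sequence into a point in a multidimensional space by counting its overlapping
k-tuples. The function name ktuple_vector, the choice k=2 and the DNA alphabet
are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def ktuple_vector(seq, k=2, alphabet="ACGT"):
    """Return the overlapping k-tuple frequencies of seq as a fixed-order vector.

    Illustrative only: k and the alphabet are assumed defaults, not values
    taken from the cited papers.
    """
    seq = seq.upper()
    # Count every overlapping word of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts.values()), 1)  # avoid dividing by zero on short input
    # One coordinate per possible word, always in the same order.
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts[w] / total for w in words]


vec = ktuple_vector("ACGTACGT", k=2)  # a point in 4**2 = 16 dimensions
```

Vectors like this one, computed on sequences of known classes, are the kind of
input a discriminant method could then separate.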

As for the debate on the need for a vector representation of amino acids
(a.a.): I feel it misses the point, because the main problem we face in
sequence analysis is that the information is not STRICTLY linear, but only
approximately so. Here are all the situations met in real life:

- at some precise position a given a.a. is needed (e.g. catalysis)
- around a position a given a.a. is needed (e.g. the "conserved" K in protein
  kinases)
- at some precise position everything BUT a given a.a. is needed (RAS activation)
- at some precise position a given type (+,-,aromatic) of a.a. is needed (binding)
- around a position a given a.a. is needed (C-C link)
- around a position a vague type of a.a. is needed (secondary structure
  conservation)

With one more difficulty: we don't know where that "position" is ...
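
The situations listed above can be written down as small predicates, which
shows how different they are from a plain position-by-position comparison. The
sketch below is only an illustration: the helper names (exact, anything_but,
of_type, near, scan), the residue classes and the toy sequences are all
assumptions, not an existing program. Because the anchor "position" is
unknown, scan simply tries every offset.

```python
# Assumed residue classes, for illustration only.
POSITIVE = set("KR")
AROMATIC = set("FWY")


def exact(aa):
    """A given residue is required at this precise position."""
    return lambda seq, i: seq[i] == aa


def anything_but(aa):
    """Every residue is allowed at this position except one."""
    return lambda seq, i: seq[i] != aa


def of_type(group):
    """A residue of a given class (+, -, aromatic ...) is required."""
    return lambda seq, i: seq[i] in group


def near(pred, slack):
    """Relax a constraint: it may be met anywhere within +/- slack residues."""
    def check(seq, i):
        lo, hi = max(0, i - slack), min(len(seq), i + slack + 1)
        return any(pred(seq, j) for j in range(lo, hi))
    return check


def scan(seq, constraints):
    """Return every start offset at which all (offset, predicate) pairs hold."""
    hits = []
    for start in range(len(seq)):
        if all(0 <= start + off < len(seq) and pred(seq, start + off)
               for off, pred in constraints):
            hits.append(start)
    return hits


strict = [(0, exact("G")), (3, exact("K")), (5, of_type(AROMATIC))]
loose = [(0, exact("G")), (3, near(exact("K"), 1)), (5, of_type(AROMATIC))]

print(scan("AAGAAKAFAA", strict))  # K exactly where the pattern expects it
print(scan("AAGAKAAFAA", strict))  # K one residue early: strict pattern misses it
print(scan("AAGAKAAFAA", loose))   # the "around a position" version still finds it
```

Even this toy version needs a slack value chosen by hand for each constraint,
which is exactly the knowledge we usually lack.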

It is clearly illusory to think that a new computer algorithm might solve
all these problems at once, given the evolutionary noise stored in all
these sequences which, don't forget, have all evolved from very few ancestors
(maybe only able to fold into a somewhat compact shape to resist hydrolysis).
For instance, how do you instruct a multialignment program, very proud to have
located a conserved KKKK or LLLL motif, that it has almost no biological
significance whatsoever in MOST cases? At present, most alignment programs
will focus on those, and miss the more subtle, isolated and approximate, but
functionally relevant, truly homologous positions.

Matrix and "perceptron" (a sexy name for the former) algorithms will fail
lamentably whenever the positional constraint is not strict. Representing
a.a. by vectors will not alleviate this difficulty, which is inherent to the
fundamental fact that proteins are 3-D objects that only Nature (I wonder
if even God knows how ...) knows how to fold from an evolutionarily sloppy
1-D a.a. sequence.
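
A toy example of that failure, as a hedged sketch: a log-odds weight matrix is
built from four made-up aligned sites, then used to score a window with the
same residues but with the K shifted by one position. The function names
(build_pwm, score), the uniform background, the pseudocount of 1 and the sites
themselves are all assumptions chosen for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(aligned, alphabet=AMINO_ACIDS, pseudo=1.0):
    """Log-odds weight matrix from equal-length aligned sites.

    Assumes a uniform background and an arbitrary pseudocount.
    """
    background = 1.0 / len(alphabet)
    pwm = []
    for col in range(len(aligned[0])):
        column = [site[col] for site in aligned]
        denom = len(column) + pseudo * len(alphabet)
        pwm.append({
            a: math.log(((column.count(a) + pseudo) / denom) / background)
            for a in alphabet
        })
    return pwm


def score(pwm, window):
    """Sum the per-column weights: each residue is judged only at its own column."""
    return sum(col[a] for col, a in zip(pwm, window))


sites = ["GAKAF", "GSKAF", "GAKSF", "GTKAF"]  # invented sites, K always in column 3
pwm = build_pwm(sites)

in_register = score(pwm, "GAKAF")
shifted = score(pwm, "GKAAF")  # same composition, K one residue early

print(in_register, shifted)  # the shifted window scores lower
```

The matrix has no way of knowing that the K is still "around" the right place;
it is penalised twice, once for the missing K and once for the unexpected one.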

... and please, don't talk about A.I. (Absolutely Incompetent) programming.

JMC@PASTEUR
Computer Science Unit, Institut Pasteur, Paris, France