[bionet.molbio.bio-matrix] Response to Sunil Maulik's questions

dbd%benden@LANL.GOV (06/02/88)
From: dbd%benden@LANL.GOV (Dan Davison)

*********************************************************************
Address postings to BIO-MATRIX@BIONET-20.ARPA
Administrivia (subscription start/stop requests) to
					BIO-MATRIX-REQUEST@BIONET-20.ARPA
*********************************************************************

The following are one person's thoughts on Sunil Maulik's questions that
were posted to the BIO-MATRIX bboard a few weeks ago.

>  From: Sunil Maulik <MAULIK@BIONET-20.ARPA>
>  Subject: Questions
>  
>  I have some questions regarding the BioMatrix concept and the LiMB db
>  (Listing of Molecular Biology DataBases database).
>  
>  	1. What suggestions were made regarding integrating sequence
>  information (e.g. the PIR protein sequence database) with structure
>  information (e.g. the Brookhaven atomic coordinate database) ?

	For the full Matrix report, contact Andi Sutherland at the Santa
Fe Institute address I posted a few weeks ago.  The point you raise is
a good one; it was suggested that combining the two databases with
information retrieval technology would give biologists an extremely
useful tool.


>  	2. At a recent meeting it came to light that groups in the
>  U.K. were placing Brookhaven in a Relational Database format (using
>  ORACLE). Is this format being supported and distributed in the U.S.? 


	To the best of my knowledge (I haven't looked at LiMB yet),
this is not yet distributed in the US, though I expect it to be at
some point.

>  Will it [the relationalized Brookhaven db] be integrated into BIO-MATRIX ?

	This question rests on a premise that is not quite right.
BIO-MATRIX is not a database or a functional tool.  It is a concept,
evangelized by Dr. Harold Morowitz of Yale University.  The concept's
underpinnings are best described in the Final Report of the Workshop
on Matrix Biology.  I will summarize here what my interpretation of
the Matrix concept is [dissenting or correcting views welcome!].

The Matrix of Biological Knowledge is a response to the way biologists
reason about their systems.  Physicists have recourse to first
principles, and in the last 20 years we've seen implications of quantum
mechanics on the cosmological scale.  Biological systems are so complex
that it will be a *long* time before anyone can derive a Tetrahymena
from first principles.  Instead, scientists reason about their
particular systems by analogy, sometimes consciously and frequently
unconsciously.  A striking example of this recently appeared on the
cover of Science: the three-dimensional structure of _ras_ is
essentially identical to one proposed a few years earlier on the
strength of a single known property of _ras_, that it binds GTP.  By
examining the already-determined tertiary structure of another
GTP-binding protein, the earlier workers were able to predict
accurately what _ras_ would look like.

	The Matrix concept aims to organize biological knowledge so
that the predictive power of models in one discipline can be applied
to a different, perhaps new, discipline.  Molecular biologists have
been reasoning this way for years; but what does the hydra biologist
know of the models in toxicology?  Are there any toxicological model
systems that speak to a protist system?  I don't know the answer, and
I doubt that anyone else does either.

	The Matrix substitutes reasoning by analogy for reasoning
from first principles.  The proposal is to combine biological
knowledge in three ways.  (1) Collect data into databases, and have the
agencies that fund research get serious about the proper disposition
of the knowledge they've been funding (for example, by requiring, as a
condition of grant funding, that any resulting data be submitted to
GenBank for nucleotide information, to PIR for protein information, and
to Brookhaven for x-ray crystallographic information).  (2) Organize the
databases so that access to them is transparent.  You tell your
Macintosh that you want to know all about X; the program goes and
calls MedLine, ToxLine, BRS, and whatever else (including databases
you may not know exist) and retrieves the information for you.  This
is the knowledge base component of the Matrix (yes, highly
simplified).  (3) Provide tools that help you get at that information
even if you don't know it's there; this is the Information Retrieval
component of the Matrix.
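	The "transparent access" idea in point (2) amounts to sending
one query to every registered database and merging the answers.  What
follows is a minimal sketch.  It assumes each database can be wrapped
as a function that takes a query string and returns a list of matching
records.  The source names and the toy keyword-matching sources are
hypothetical stand-ins, not real interfaces to MedLine, ToxLine, or BRS.

```python
from typing import Callable

# Hypothetical: a "source" is any function mapping a query string to matching records.
Source = Callable[[str], list[str]]


def make_keyword_source(records: list[str]) -> Source:
    """Build a toy source that returns records containing the query (case-insensitive)."""
    def search(query: str) -> list[str]:
        q = query.lower()
        return [r for r in records if q in r.lower()]
    return search


def federated_search(query: str, sources: dict[str, Source]) -> dict[str, list[str]]:
    """Send one query to every registered source and collect the results by source name."""
    return {name: search(query) for name, search in sources.items()}


# Usage with two made-up in-memory "databases".
sources = {
    "literature": make_keyword_source(["ras binds GTP", "hydra regeneration"]),
    "toxicology": make_keyword_source(["GTP analog toxicity", "solvent exposure"]),
}
print(federated_search("gtp", sources))
```

The user sees one call and one merged answer.  Which databases were
consulted is a detail of the registry, and a database the user has
never heard of can be added to it without changing the query.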

	Much to my surprise, the US funding agencies are actually
acting on parts of the Matrix concept now.  Much of the
information that could have been useful is now lost to GenBank because
of past practices; with the coming mega-sequencing projects we need to
be sure that important information is not lost in the flood of A's,
G's, C's, and T's.

>  	3. Are there any attempts to maintain databases of "pointers"
>  e.g. pointers to all  sequences known to be greater than 50% homologous
>  to a given query sequence ?

	Walter Goad and others in the Theoretical Biology group at Los Alamos
National Laboratory have been discussing this possibility and
attempting to devise a structure for such a database.
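	To illustrate what a "pointer" database could hold, here is a
minimal sketch.  For every entry, it precomputes the IDs of all other
entries whose similarity exceeds a threshold.  Two assumptions are
made.  First, similarity is taken to be plain percent identity between
sequences that are already aligned and of equal length, which is a
simplification because real comparisons need alignment.  Second, the
50% threshold is taken from the question.  The function names and
sequence IDs are hypothetical and do not describe the Los Alamos design.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned, equal length, and non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def build_pointer_index(seqs: dict[str, str], threshold: float = 50.0) -> dict[str, list[str]]:
    """For each sequence ID, list the IDs of all other sequences above the threshold."""
    index: dict[str, list[str]] = {seq_id: [] for seq_id in seqs}
    # Compare each unordered pair once; record the pointer in both directions.
    for (id_a, a), (id_b, b) in combinations(seqs.items(), 2):
        if percent_identity(a, b) > threshold:
            index[id_a].append(id_b)
            index[id_b].append(id_a)
    return index


# Usage with made-up sequences: SEQ1 and SEQ2 are 80% identical, SEQ3 is unrelated.
seqs = {"SEQ1": "ACGTACGTAC", "SEQ2": "ACGTACGTTT", "SEQ3": "TTTTGGGGCC"}
print(build_pointer_index(seqs))
```

Precomputing means the question "what is similar to this entry?"
becomes a lookup rather than a fresh search.  The cost is that building
the index takes one comparison per pair of entries, and the index must
be updated as new entries arrive.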

>  	4. What about "higher-order" databases ? For instance  would it
>  be possible to have databases that link genetic maps with physical maps,
>  or common structural motifs with homologous sequences ?

	Yes, it would; there are at least three separate groups at work
on exactly this.
 
>  	5. How will the BIO-MATRIX type of database be distributed ?
>  What type of software will be needed to access the information within it?

	The Matrix concept is not a single monolithic database, as
discussed at length above.  The kind of software needed to access the
information exists in prototype form; BBN has C-CIN, Bellcore has
Telesophy.  These two systems address different but closely related
problems in accessing biological databases.  For more information on
these, see the Final Report or contact the authors of the programs.
If there is enough interest, I will ask the authors to post a short
note about their systems.

	BIONET is playing an important role in the Matrix concept; I
have found that we have a number of readers on USENET and BITNET
whose first contact with the concept was through the
BIONET-distributed bulletin board.


dan davison, BIO-MATRIX BBOARD leader
t-10 ms k710l
theoretical biology
los alamos national laboratory
los alamos, nm 87545
dd@lanl.gov (arpa) dd@lanl.UUCP, goad.davison@bionet-20.arpa