dbd%benden@LANL.GOV (06/02/88)
From: dbd%benden@LANL.GOV (Dan Davison)

*********************************************************************
Address postings to BIO-MATRIX@BIONET-20.ARPA
Administrivia (subscription start/stop requests) to
BIO-MATRIX-REQUEST@BIONET-20.ARPA
*********************************************************************

The following are one person's thoughts on Sunil Maulik's questions,
which were posted to the BIO-MATRIX bboard a few weeks ago.

> From: Sunil Maulik <MAULIK@BIONET-20.ARPA>
> Subject: Questions
>
> I have some questions regarding the BioMatrix concept and the LiMB db
> (Listing of Molecular Biology DataBases database).
>
> 1. What suggestions were made regarding integrating sequence
> information (e.g. the PIR protein sequence database) with structure
> information (e.g. the Brookhaven atomic coordinate database) ?

For the full Matrix report, you should contact Andi Sutherland at the
Santa Fe Institute address I posted a few weeks ago. The point you raise
is a good one; the suggestion was made that a combination of the two
databases with information retrieval technology would be an extremely
good tool for biologists.

> 2. At a recent meeting it came to light that groups in the
> U.K. were placing Brookhaven in a Relational Database format (using
> ORACLE). Is this format being supported and distributed in the U.S.?

To the best of my knowledge (I haven't looked at LiMB yet) this is not
yet distributed in the US. I expect it to be at some point, though.

> Will it [the relationalized Brookhaven db] be integrated into BIO-MATRIX ?

This question implies something which is not quite the case. BIO-MATRIX
is not a database or a functional tool. It is a concept, evangelized by
Dr. Harold Morowitz of Yale University. The concept's underpinnings are
best described in the Final Report of the Workshop on Matrix Biology. I
will summarize my interpretation of the Matrix concept here [dissenting
or correcting views welcome!].

The Matrix of Biological Knowledge is a response to the way biologists
reason about their systems. Physicists have recourse to first
principles, and in the last 20 years we've seen implications of quantum
mechanics on the cosmological scale. The complexity of biological
systems is such that it's going to be a *long* time before one can
reason a Tetrahymena from first principles.

As each scientist thinks about their particular system, they consciously
(and frequently unconsciously) reason about it by analogy. A striking
example of this appeared recently on the cover of Science: the
three-dimensional structure of _ras_ is essentially identical to one
proposed a few years before, based on one known property of _ras_, that
it binds GTP. By examining an already-determined tertiary structure of a
GTP-binding protein, the proposers were able to make an accurate
prediction of what _ras_ would look like.

The Matrix concept wants to organize biological knowledge so that the
predictive power of models in different disciplines can be applied to a
different, perhaps new, discipline. Molecular biologists have been using
such reasoning for years; but what does the hydra biologist know of the
models in toxicology? Are there any toxicological model systems that
speak to a protist system? I don't know the answer, and I doubt that
anyone else does either.

The Matrix substitutes reasoning by analogy for reasoning from first
principles. The proposal is to combine biological knowledge in three
ways:

(1) Collect data into databases, and have the agencies that fund
    research get serious about the proper disposition of the knowledge
    they've been funding (such as requiring, as a condition of grant
    funding, any resulting data to be submitted to GenBank for
    nucleotide information; PIR for protein information; and Brookhaven
    for x-ray crystallographic information).

(2) Organize the databases in such a way that access to them is
    transparent. You tell your Macintosh that you want to know all
    about X; the program goes and calls MedLine, ToxLine, BRS, and
    whatever else...including databases that you may not know exist...
    and retrieves the information for you. This is the knowledge base
    component of the Matrix (yes, highly simplified).

(3) Provide tools to help get that information even if you don't know
    it's there; this is the Information Retrieval component of the
    Matrix.

Much to my personal surprise, the US funding agencies are actually
acting on parts of the Matrix concept now. Much of the information that
could have been useful is now lost to GenBank because of past practices;
with the coming mega-sequencing projects we need to be sure that
important information is not lost in the flood of A's, G's, C's, and
T's.

> 3. Are there any attempts to maintain databases of "pointers"
> e.g. pointers to all sequences known to be greater than 50% homologous
> to a given query sequence ?

Walter Goad and others in the Theoretical Biology group at Los Alamos
National Laboratory have been discussing this possibility and attempting
to devise a structure for such a database.

> 4. What about "higher-order" databases ? For instance would it
> be possible to have databases that link genetic maps with physical maps,
> or common structural motifs with homologous sequences ?

Yes, it is; there are at least three separate groups at work on exactly
this.

> 5. How will the BIO-MATRIX type of database be distributed ?
> What type of software will be needed to access the information within it?

The Matrix concept is not a single monolithic database, as discussed at
length above. The kind of software needed to access the information
exists in prototype form; BBN has C-CIN, Bellcore has Telesophy. These
two systems address different but closely related problems in accessing
biological databases. For more information on these, see the Final
Report or contact the authors of the programs.
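
To make the "pointers" idea from question 3 a little more concrete, here
is a toy sketch in Python. It is only an illustration and not the
structure being discussed at Los Alamos. It assumes the sequences are
already aligned to equal length and uses plain percent identity as a
stand-in for "homology"; the names percent_identity and
build_pointer_table, and the three toy sequences, are made up for the
example.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match.

    Assumption: a and b are already aligned and of equal length.
    A real system would run an alignment step first.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned, non-empty, equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def build_pointer_table(seqs: dict[str, str],
                        threshold: float = 50.0) -> dict[str, list[str]]:
    """Map each sequence name to the names of all others above the threshold."""
    pointers: dict[str, list[str]] = {name: [] for name in seqs}
    # Compare every unordered pair once and record the pointer both ways.
    for p, q in combinations(sorted(seqs), 2):
        if percent_identity(seqs[p], seqs[q]) > threshold:
            pointers[p].append(q)
            pointers[q].append(p)
    return pointers


# Hypothetical toy data, not real database entries.
toy = {
    "seqA": "ACGTACGTAC",
    "seqB": "ACGTACGTTT",
    "seqC": "TTTTTTTTTT",
}
table = build_pointer_table(toy)
```

With this toy data, seqA and seqB point at each other (80% identical)
and seqC points at nothing. The table is computed once, so answering
"what is more than 50% similar to seqA?" becomes a lookup rather than a
fresh search of the whole database.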
If there is enough interest, I will ask the authors to post a short note
about their systems.

BIONET is playing an important role in the Matrix concept; I have found
that we have a number of readers on USENET and BITNET whose first
contact with the concept was through the BIONET-distributed bulletin
board.

dan davison, BIO-MATRIX BBOARD leader
t-10 ms k710l
theoretical biology
los alamos national laboratory
los alamos, nm 87545
dd@lanl.gov (arpa) dd@lanl.UUCP, goad.davison@bionet-20.arpa