[bionet.software] Life beyond GUIs??

B_FOLEY%UVMVAX@PUCC.PRINCETON.EDU (03/19/91)

The UNIX/MAC/DOS/VMS/GUI debate has been quite interesting and
somewhat informative.  I would like to bring up another topic
that won't be solved by open discussion, but that people should
give some thought to:  The data we work with.

From the length of the discussions I have seen on minor changes in
GenBank data format, I am sure that opinions will run strongly in
many directions on database issues.  This list should be a good
conduit for exposing some of those directions.

IMHO we need to be more concerned with getting good data into
machine-readable form than we are about what format the data is in.
Reformatting IS a major pain, but having a database in a bad format
that can be re-formatted by machine is better than not having a
database at all.

EMBL/GenBank/PIR are wonderful resources that are tremendously
needed and used by many biologists/biochemists, but they are
only the tip of an iceberg that is rapidly coming into view.
These databases are quite up to date at storing the data that
they set out to store, but one can imagine storing much more
information in database form.

Examples:

1) It might be useful to be able to search for all genes known
to be expressed in liver in response to insulin.  A lot of this
type of information is known, but it may never get cross-indexed
to the existing database entries for those genes.

2) It would be useful to be able to find out what mutagenic agent
(if known) was responsible for each mutation site noted in EMBL/
GenBank.  This data is not stored.

3) It would be nice to have a database of known secondary structures
of RNAs such as tRNAs and self-splicing introns.  But it would be
another leap to have this information "mapped" onto the existing
database entries for those structural RNAs and the genes that
encode them.  Looking at a GenBank entry for a tRNA right now
gives you no clue if the secondary structure is known and stored
somewhere.

4) A look through LiMB shows that hundreds of micro-databases are
springing up.  Is anyone thinking about a grand scheme to plan
them so that they can be cross-indexed or linked at some future
point?  I know it took GenBank quite some effort just to put the
E. coli K12 map positions onto each E. coli gene in GenBank.

5) The Human Genome Project will generate enough raw data that
we may not have time to enhance the existing data we have.  Should
some sort of survey be taken to see what priority gets put on the
various types of data?

6) We are making big progress in Artificial Intelligence fields.  Are
current journal publications being edged into a format that might
facilitate machine reading or at least machine-searching of journals?
I know that what I can do with MEDLINE now is a vast improvement
over scanning Current Contents, but I think even MEDLINE is
quite crude compared to what is potentially possible.

7) Should anyone be trained in Bio-Information?  Should we make
database design and maintenance a science in itself?  Should
graduate students in biological sciences be forced to take a
computer science class?

I could go on, but I think I have said too much already.  I would like
to end with a round of applause for GenBank, EMBL, PIR and all of
the databases; for Don Gilbert of IUBIO, Dan Davison of UH-gene-server,
Rob Harper of FINFUN, and all the other very helpful people in
NET-LAND; for John Devereux of UW-GCG, Jim Ostell of IBI/NBRF, Amos Bairoch
of Pro-Site and all the other programmers!!!!!

I hope to be of help someday soon too.

Naturally;
Brian Foley
B_FOLEY@UVMVAX.UVM.EDU

gilbertd@cricket.bio.indiana.edu (Don Gilbert) (03/20/91)

Perhaps Jim Ostell or someone from NCBI will comment on what they
are working on, as it addresses several of your points on advancing
the organization of molecular biology data.  There is also an
RNA database group working toward one of your points.  You can expect
to see some of these projects forthcoming in the next few months, I
believe.
-- Don

-- 
Don Gilbert                                     gilbert@bio.indiana.edu
biocomputing office, biology dept., indiana univ., bloomington, in 47405