[bionet.molbio.bio-matrix] All that genome data...

Peter.Rice%EMBL@PUCC.PRINCETON.EDU (Peter Rice) (02/15/91)

Bio-matrix has been quiet for so long, it is sad to see a spate of "please
remove me" messages as soon as a lively discussion springs to life. Perhaps I
can dilute the effect, and increase the number of postings, by starting a new
discussion.

There are now several "genome projects" under way, either (1) to map and
sequence (Caenorhabditis elegans, alias the nematode or "the worm"), (Drosophila
melanogaster, alias "the fly"), (Saccharomyces cerevisiae, or "yeast"),
(Schizosaccharomyces pombe, or "the other yeast"), (Arabidopsis thaliana, or
"the plant") (and others becoming too numerous to mention) or (2) simply to
finish the job that individual labs have done pretty well already (Escherichia
coli, or E.coli for short, which is already over 30% sequenced and pretty well
mapped both genetically and physically).

I was wondering what the current status is of each of these projects, and also
what their aims should be; in particular what the biological community sees as
the needs for the storage of the data, and how to access it in individual labs.
In the case of E.coli for example the data is already there in a large number
of publications but online access to it is a serious problem. The other projects
are just starting and have the opportunity to get things right from the
beginning.

Enough for starters. If this turns you off too, please say what you want to see
on bio-matrix.

 -----------------------------------------------------------------------------
 Peter Rice, EMBL                             | Post: Computer Group
                                              |       European Molecular
 Internet:    Peter.Rice@EMBL-Heidelberg.DE   |            Biology Laboratory
 EARN/Bitnet: rice@embl.bitnet                |       Postfach 10-2209
                                              |       D-6900 Heidelberg
 Phone:   +49-6221-387247                     |       Germany

gilbertd@cricket.bio.indiana.edu (Don Gilbert) (02/15/91)

In article <9102142237.AA11441@genbank.bio.net> Peter.Rice%EMBL@PUCC.PRINCETON.EDU (Peter Rice) writes:
>
>There are now several "genome projects" under way, either (1) to map and
...
>I was wondering what the current status is of each of these projects, and also
>what their aims should be; in particular what the biological community sees as
>the needs for the storage of the data, and how to access it in individual labs.
>In the case of E.coli for example the data is already there in a large number of
>publications but online access to it is a serious problem. The other projects
>are just starting and have the opportunity to get things right from the
>beginning.

The Worm sequencing project includes much software development.
There is now a working, X-Window based worm data base browser that
integrates sequence data, physical / contig maps, genetic and other
descriptive data and references.  It also includes an "annotation"
facility where worm researchers can interactively add or change
data in the primary, remotely-located data files.  

The Fly sequencing project is gearing up for some kind of similar
integrated data management which will be available to fly biologists
with Internet (tcp/ip) network access.

I'll leave it to others to provide more details (I'm not directly
involved in these).  But you can expect to hear more on these in
the coming months.  This would be a good time for people to
discuss what their needs are for access to such data, and in what
form. E.g., is X-Window software running on central computers that
host the data, and that allows groups of researchers scattered around 
the globe to view and add to the data, a viable way to proceed?
Some of the recent news (minutes of JITF?) on the genome-program 
newsgroup touches on this.

-- Don

-- 
Don Gilbert                                     gilbert@bio.indiana.edu
biocomputing office, biology dept., indiana univ., bloomington, in 47405