[bionet.molbio.bio-matrix] Mike Hawley's note

sticklen@CPSWH.CPS.MSU.EDU (Jon Sticklen) (05/24/89)

mike,
  i am curious about how you propose to "siphon" biological
literature into useful computer form. what does useful
mean in that context? useful for "grepping"?
  isn't there some difficulty now in biological circles
(that from the outside would appear to be rather
small circles) about the language protein
researchers use vs. the language dna researchers use?
i.e., developments in one field are hard to propagate to the
other?
  it'd be easy to put all the biological literature
on line. it's an entirely different matter to make all those
mega-Gbytes somehow accessible.
  again, from the perspective of a CS person - if incorrect,
then please set me straight.

	Jon Sticklen

elliston@rob.UUCP ( Keith Elliston) (05/24/89)

In article <8905240412.AA04767@cpswh.cps.msu.edu>, sticklen@CPSWH.CPS.MSU.EDU (Jon Sticklen) writes:
>   i am curious about how you propose to "siphon" biological
> literature into useful computer form. what does useful
> mean in that context? useful for "grepping"?

Using a straight searching function like grep would be nice for small sets of
data, but for large data sets (mega-Gbytes) it would not make much sense.  It would
probably take forever to look through all the occurrences of a specific term to find
just the thing you are looking for.

>   isn't there some difficulty now in biological circles
> (that from the outside would appear to be rather
> small circles) about the language protein
> researchers use vs. the language dna researchers use?
> i.e., developments in one field are hard to propagate to the
> other?

There are significant problems concerning terms and usage in all disciplines.
The only way around them is to learn the language of the discipline.  If
one is not familiar with the language, one probably does not have the knowledge
base to make sense of the data anyway.  Having a sort of "keyword dictionary"
would alleviate the major problems of usage, but still, a general knowledge of
the discipline is always necessary to understand its literature.
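A minimal sketch of the "keyword dictionary" idea, assuming it works as a synonym table: every abbreviation or variant a subfield uses is mapped onto one shared canonical term before searching, so a query in one field's vocabulary still finds text written in another's. The names `KEYWORD_DICTIONARY`, `canonical`, and `mentions` are hypothetical, and the entries are illustrative, not a real vocabulary.

```python
import re

# Hypothetical "keyword dictionary": every variant (abbreviation, plural)
# maps to one shared canonical term. Entries are illustrative only.
KEYWORD_DICTIONARY = {
    "orf": "open reading frame",
    "orfs": "open reading frame",
    "open reading frames": "open reading frame",
    "aa": "amino acid",
    "amino acids": "amino acid",
    "bp": "base pair",
    "base pairs": "base pair",
}


def canonical(term: str) -> str:
    """Return the canonical term for a variant, or the lowercased term itself."""
    key = term.strip().lower()
    return KEYWORD_DICTIONARY.get(key, key)


def mentions(text: str, query: str) -> bool:
    """True if the text contains the query's canonical term or any known variant of it."""
    target = canonical(query)
    variants = {v for v, c in KEYWORD_DICTIONARY.items() if c == target}
    variants.add(target)
    lowered = text.lower()
    # Match whole words/phrases only, so "aa" does not match inside "aaa".
    return any(re.search(r"\b" + re.escape(v) + r"\b", lowered) for v in variants)


print(mentions("A 1.2 kb ORF was identified", "open reading frame"))  # True
```

The dictionary only smooths over vocabulary; as noted above, the reader still needs enough background to make sense of what the search turns up.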


>   it'd be easy to put all the biological literature
> on line. it's an entirely different matter to make all those
> mega-Gbytes somehow accessible.
>   again, from the perspective of a CS person - if incorrect,
> then please set me straight.

I am not really a CS person... but I think that hypertext would make all of this
material accessible.  In fact, I think that it would make the vast amount of
information that is now present in printed form actually useful.  It is extremely
difficult to sift the wheat from the chaff in today's journals, especially when
you sit down and try to read the 10 or so journals you are really interested in.
The information content is effectively quite low (I think that I really read
only 1 or 2 articles from each issue of Science, and maybe 2 or 3 from NAR,
etc.).  Having a hypertext "browse" function would make this sort of
reading both easier and more efficient.

I think that we can start making sense of this sort of material with today's
tools.  HyperCard (from Apple) is a pretty good hypertext engine.  I have seen
one really nice stack that uses HyperCard and a series of XCMDs to browse
straight text files.  The stack is called TEX, and it is really quite good.
In fact, I store all the Usenet articles I find interesting in large files
(the author calls them dataspaces), and then push them into TEX.  It then
creates an index and lets you hypertext through the text files using whatever
keywords you want.  It is a really nice way to sort through lots of info to
find the info that is of interest to you.  This sort of approach could be
used with journals and other forms of written communication.  It would allow
a researcher to browse the current literature and find the info that is of
the most interest to him or her.
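The "build an index once, then jump straight to keywords" approach is what separates this from grepping through everything on every query. A minimal sketch of it as an inverted index over plain-text files, assuming paragraphs are separated by blank lines. This is not how TEX itself is implemented; `build_index` and `lookup` are hypothetical names.

```python
from __future__ import annotations

import re
from collections import defaultdict


def build_index(documents: dict[str, str]) -> dict[str, set[tuple[str, int]]]:
    """Build an inverted index: each lowercased word maps to the set of
    (file name, paragraph number) pairs where it occurs.
    Paragraphs are assumed to be separated by blank lines."""
    index: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for name, text in documents.items():
        for para_no, para in enumerate(re.split(r"\n\s*\n", text)):
            for word in re.findall(r"[a-z0-9]+", para.lower()):
                index[word].add((name, para_no))
    return dict(index)


def lookup(index: dict[str, set[tuple[str, int]]], *keywords: str) -> set[tuple[str, int]]:
    """Return the paragraphs containing every keyword (an AND query).
    Each keyword costs one dictionary lookup instead of a rescan of all text."""
    if not keywords:
        return set()
    hits = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*hits)
```

The cost of reading every file is paid once, in `build_index`; after that, each query touches only the index, which is why this scales to a large archive where a linear grep-style scan would not.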

Anyway... I thought that I would just put my two cents' worth in... and it
looks like you all got at least a nickel's worth.


> 
> 	Jon Sticklen

   -Keith Elliston

===============================================================================
Keith O. Elliston                        |  Usenet:  uunet!rob!elliston 
Senior Information Scientist             |  Arpanet: rob!elliston@uunet.uu.net 
Merck Sharp & Dohme Res. Lab.            |  Bitnet:  elliston%rob.uucp@psuvax1
Rahway, NJ  07065  U.S.A.                |   -or-    elliston@biovax 
===============================================================================
Disclaimer:  I can have no OFFICIAL comments about anything........
===============================================================================

kristoff@NET.BIO.NET (David Kristofferson) (05/24/89)

> mike,
>   i am curious about how you propose to "siphon" biological
> literature into useful computer form. what does useful
> mean in that context? useful for "grepping"?

This, of course, is the major problem.

>   isn't there some difficulty now in biological circles
> (that from the outside would appear to be rather
> small circles) about the language protein
> researchers use vs. the language dna researchers use?
> i.e., developments in one field are hard to propagate to the
> other?
>   it'd be easy to put all the biological literature
> on line. it's an entirely different matter to make all those
> mega-Gbytes somehow accessible.
>   again, from the perspective of a CS person - if incorrect,
> then please set me straight.

The odds of rewriting all of the biological literature, or that of any
other discipline, using a standard nomenclature are obviously zero.
Nonetheless, the National Library of Medicine is attempting to use
Medical Subject Headings (MeSH terms) in its cataloging of the
literature.  Searching can then be performed using this kind of
standardized vocabulary.  However, one still has to foot it
over to the library to retrieve the text.  If journals began
publishing electronically, one could simply call up the text on one's
computer screen (simple character-based terminals would, of course, be
at a loss here for lack of graphics capabilities).  One then gets into
questions of copyright law, loss of subscription money to publishers,
etc., etc.  This is not a trivial problem.  Building libraries
to house ever-expanding shelves of journals still seems to be the
preferred route.  Even so, I believe that this too will come to
pass, although it may take a few decades.  It's very useful but not
glamorous work.  Yet making the literature available on-line as
described above would probably do more to help the progress of biology
than most research projects do.
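Searching by a standardized vocabulary works differently from free-text search. Each citation is tagged once, by an indexer, with headings from a controlled list, and queries match those tags rather than the article's own wording. A minimal sketch follows. The `Citation` records are invented and their headings are placeholders in a MeSH-like style, not real NLM assignments or the NLM's actual search system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Citation:
    """A catalog record: the article's own title plus the subject headings
    an indexer assigned from a controlled vocabulary."""
    title: str
    headings: set[str] = field(default_factory=set)


def search_by_heading(catalog: list[Citation], heading: str) -> list[str]:
    """Return titles of citations tagged with the heading. Matching is on the
    assigned headings, not on words in the title."""
    return [c.title for c in catalog if heading in c.headings]


# Invented records; the headings are placeholders, not real MeSH assignments.
catalog = [
    Citation("Isolation of a novel transcript from liver", {"Liver", "RNA, Messenger"}),
    Citation("mRNA levels after partial hepatectomy", {"Liver", "RNA, Messenger"}),
    Citation("Folding intermediates of lysozyme", {"Protein Folding"}),
]
print(search_by_heading(catalog, "Liver"))
```

The second record never uses the word "liver" in its title, so a free-text search for "liver" would miss it, while the heading search finds it. That is the payoff of the standardized vocabulary; retrieving the full text is still a separate step.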

	Again, it would be nice to get input from someone at the NLM
on progress in these areas.
-- 
				Sincerely,

				Dave Kristofferson
				BIONET Resource Manager

				kristoff@net.bio.net
			     or	kristofferson@bionet-20.bio.net