[bionet.molbio.genbank] Creating Electronic Research Libraries

hybl@mbph.UUCP (Albert Hybl Dept of Biophysics SM) (12/19/89)

Dear Santa,

The Internet has been providing useful services to the academic
community.  It provides a vehicle for exchange of electronic mail;
the usenet is an international bulletin board for communication
with a wide audience; it disseminates and archives public domain
software; bionet.journal.contents is the fledgling equivalent of
an electronic Current Contents; and bionet.sci-resources announces
funding opportunities and changes in NIH regulations.

Contributors to the bionet.molbio.genbank news group are advocating
that the Internet be used for the electronic transfer of current and
archived gene sequences.  I agree!  However, the Internet could also
be used for the dissemination of protein sequences, crystallographic
data from small molecules and macromolecules, bibliographic abstracts
such as Med-Line or Chem Abstracts and a host of other information
resources.

There appears to be a need to establish electronic research libraries.
Slip into any brick academic library and you will find shelf after
shelf devoted to publications such as Chem Abstracts
and Index Medicus, a few terminals used for accessing Med-Line, and
several PCs, each dedicated to a single-minded task.  Faculty and
students use these facilities to keep current on certain subjects
or to locate references for a research project.

Surely it would be much more efficient to receive electronic mail,
or to log in and read the current postings to info.groups, selected
according to the user's interest profile.  The electronic research
libraries could use gigabyte or multi-terabyte optical storage devices;
these devices are considerably less expensive than building more
brick libraries.  What are the advantages?

o  Rapid posting of new entries

The public domain software already developed for the usenet can
be used to transfer compressed batches of information between
the libraries.  The producers of the information can post
directly to the network, eliminating publication delays as well as
tape/disk production and distribution problems.
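As a rough illustration of "compressed batches", here is a minimal sketch, assuming the rnews batch convention of B and C news, in which each article in a batch is preceded by a `#! rnews <byte count>` line. The function names `make_batch` and `split_batch` are hypothetical, and zlib is used only as a stand-in for the compress(1) program that news links commonly used.

```python
import zlib


def make_batch(articles):
    """Join articles in rnews batch style: each one is preceded by a
    '#! rnews <byte count>' line giving its exact length in bytes."""
    out = bytearray()
    for article in articles:
        data = article.encode("utf-8")
        out += f"#! rnews {len(data)}\n".encode("ascii")
        out += data
    return bytes(out)


def split_batch(batch):
    """Recover the list of articles from a batch made by make_batch."""
    articles, pos = [], 0
    while pos < len(batch):
        newline = batch.index(b"\n", pos)
        header = batch[pos:newline].decode("ascii")
        if not header.startswith("#! rnews "):
            raise ValueError(f"bad batch header: {header!r}")
        size = int(header.split()[2])
        start = newline + 1
        articles.append(batch[start:start + size].decode("utf-8"))
        pos = start + size
    return articles


if __name__ == "__main__":
    entries = ["Subject: new entry 1\n\nfirst body\n",
               "Subject: new entry 2\n\nsecond body\n"]
    # zlib stands in for compress(1); the receiving library reverses both steps.
    packed = zlib.compress(make_batch(entries))
    assert split_batch(zlib.decompress(packed)) == entries
    print(f"{len(entries)} entries, {len(packed)} compressed bytes")
```

The byte count lets the receiver split the batch without scanning article bodies for a separator, so an entry may contain any text.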

o  User profile subject selection (.medrc or .chemrc)

The National Library of Medicine publishes a tree structure
for its Medical Subject Headings; hence, just as a user defines the
newsgroups he wishes to read in a .newsrc file, he could define
his interest profile in a .medrc or .chemrc file.  Again the
public domain software of the usenet could be used.
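A minimal sketch of how a .medrc profile might be applied, assuming a hypothetical format borrowed from .newsrc, where a trailing `:` means subscribed and `!` means unsubscribed, with one MeSH tree-number prefix per line. The tree numbers below are illustrative only, and the rule that the most specific matching entry wins is an assumption, not anything NLM defines.

```python
# Hypothetical .medrc format, modeled on .newsrc: one MeSH tree-number
# prefix per line, ending in ':' (subscribed) or '!' (unsubscribed).


def parse_medrc(text):
    """Return {tree-number prefix: True if subscribed, False if not}."""
    profile = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        prefix, mark = line[:-1].strip(), line[-1]
        if mark not in ":!":
            raise ValueError(f"bad .medrc line: {raw!r}")
        profile[prefix] = mark == ":"
    return profile


def under(tree_number, prefix):
    """True if tree_number is prefix itself or lies below it in the tree."""
    return tree_number == prefix or tree_number.startswith(prefix + ".")


def wanted(tree_numbers, profile):
    """Select a citation if, for any of its tree numbers, the most
    specific (longest) matching profile entry is subscribed."""
    for tn in tree_numbers:
        matches = [p for p in profile if under(tn, p)]
        if matches and profile[max(matches, key=len)]:
            return True
    return False


if __name__ == "__main__":
    profile = parse_medrc("C04:\nC04.557.470!\n")
    print(wanted(["C04.588.274"], profile))      # True
    print(wanted(["C04.557.470.200"], profile))  # False
    print(wanted(["G01"], profile))              # False
```

Because the headings form a tree, subscribing to one branch while excluding a twig beneath it works the same way as subscribing to a newsgroup hierarchy and dropping one subgroup.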

o  Archiving the entries

Archiving is essential.  Once it becomes obvious that the PCs
located in brick libraries cannot provide the breadth of
service that is technically possible, perhaps enlightened
administrators will begin to equip and staff the facilities
needed to accomplish the job.  Software like the M. Lesk Refer
programs can be used to maintain links to the archived information.
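To make the Refer idea concrete, here is a simplified sketch that parses refer-format records (blank-line separated, one `%X` field per line) and builds a small inverted index, roughly the role indxbib plays for lookbib. This is a toy stand-in, not the actual Refer algorithm; the function names and the three-letter minimum word length are assumptions. The sample record is taken from the Nature citation quoted later in this thread, with the author list omitted.

```python
import re
from collections import defaultdict


def parse_refer(text):
    """Split refer-format text into records (separated by blank lines).
    Each record becomes {field letter: [values]}; a line that does not
    start with '%' continues the previous field."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields, last = defaultdict(list), None
        for line in block.splitlines():
            if line.startswith("%") and len(line) >= 2:
                last = line[1]
                fields[last].append(line[2:].strip())
            elif last is not None:
                fields[last][-1] += " " + line.strip()
        if fields:
            records.append(dict(fields))
    return records


def build_index(records, min_len=3):
    """Toy inverted index: lowercased word (length >= min_len) ->
    set of record numbers containing it."""
    index = defaultdict(set)
    for i, rec in enumerate(records):
        for values in rec.values():
            for value in values:
                for word in re.findall(r"[a-z0-9]+", value.lower()):
                    if len(word) >= min_len:
                        index[word].add(i)
    return index


def lookup(index, keywords):
    """Record numbers containing every keyword (lookbib-like AND search)."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return sorted(set.intersection(*sets)) if sets else []


SAMPLE = """\
%T Specificity pockets for the side chains of peptide antigens
in HLA-Aw68
%J Nature
%V 342
%P 692-696
%D 1989

%T A hypothetical second record
%D 1989
"""

if __name__ == "__main__":
    idx = build_index(parse_refer(SAMPLE))
    print(lookup(idx, ["HLA", "pockets"]))  # [0]
    print(lookup(idx, ["1989"]))            # [0, 1]
```

Building the index once when entries are archived keeps each later lookup to a few set intersections instead of a scan of the whole archive.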

o  User retrieval of archived entries

A UNIX shell program such as lookbib or seekbib can be used to extract
relevant references from an M. Lesk archive.  Because it is unlikely
that all users will be able to ftp directly to the computer system
that contains the archive, they should be able to issue a control
command that would cause the remote execution of a seekbib shell
with the results of the search returned via e-mail.  Although
there is plenty of room for improvement, software for all the
proposed services exists.
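The remote-execution idea can be sketched as a handler that reads an incoming request, runs each `seekbib keyword ...` line against the archive, and composes a mail reply. This assumes the request arrives as an RFC 822-style message (Usenet articles use the same header format, so a posted control article parses the same way). The `seekbib` line syntax, the `handle_request` name and the server address are hypothetical, the match is a naive substring test rather than seekbib's own logic, and actually sending the reply is not shown.

```python
from email import message_from_string
from email.message import EmailMessage


def handle_request(raw_message, records, server="seekbib@archive.example"):
    """Answer every 'seekbib kw1 kw2 ...' line in an incoming message.

    records: list of refer-format record strings.  A record matches when
    every keyword occurs in it (naive case-insensitive substring test).
    Returns the reply as an EmailMessage; sending it is not shown.
    """
    request = message_from_string(raw_message)
    sections = []
    for line in request.get_payload().splitlines():
        parts = line.split()
        if len(parts) > 1 and parts[0] == "seekbib":
            keys = [k.lower() for k in parts[1:]]
            hits = [r for r in records if all(k in r.lower() for k in keys)]
            sections.append(f"# {line.strip()}: {len(hits)} hit(s)")
            sections.extend(hits)

    reply = EmailMessage()
    reply["From"] = server
    sender = request.get("Reply-To") or request.get("From")
    if sender:
        reply["To"] = sender
    reply["Subject"] = "Re: " + request.get("Subject", "seekbib request")
    reply.set_content("\n\n".join(sections) or "No seekbib commands found.")
    return reply


if __name__ == "__main__":
    archive = [
        "%T Specificity pockets for the side chains of peptide antigens\n%J Nature",
        "%T A hypothetical second record\n%J None",
    ]
    raw = "From: reader@lab.example\nSubject: search\n\nseekbib pockets Nature\n"
    print(handle_request(raw, archive))
```

Replying to `Reply-To` before `From` lets a user at a gatewayed site direct the results to an address the archive can actually reach.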

Technically, both the hardware and the software exist to create electronic
research libraries.  So, Santa, please give our administrators
the wisdom to speedily provide the services.

Thank you,
----------------------------------------------------------------------
Albert Hybl, PhD.             UUCP  Office: uunet!mimsy!mbph!hybl
Department of Biophysics              Home: uunet!mimsy!mbph!hybl!ah
University of Maryland        Bitnet: hybl@umbc1         CoSy: ahybl
School of Medicine            Phone Office: (301) 328-7940
Baltimore, MD  21201                  Home: (301) 243-1710
----------------------------------------------------------------------
Responders--DO NOT USE:  hybl@cs.umd.edu  or  ah@cs.umd.edu 

kristoff@genbank.BIO.NET (David Kristofferson) (12/20/89)

Albert,

	You may be relieved to note that many people at the NIH read
the bionet.* newsgroups, so if Santa lives at the NIH then he/she may
be listening.  The usual problem, though, is who is going to pay for
all of these wonderful things.  Perhaps an answer might be found in
the long term savings on library shelf space, buildings, etc.

	This is more than just an NIH jurisdictional issue
unfortunately.  It may be of interest to note that Senator Gore is
pushing a national networking initiative and this may be a good place
to begin rallying support.

Ho, ho, ho!
-- 
				Sincerely,

				Dave Kristofferson
				GenBank On-line Service Manager

				kristoff@genbank.bio.net

sachs@tanner.berkeley.edu (Rainer Sachs) (12/20/89)

In article <617@mbph.UUCP> hybl@mbph.UUCP (Albert Hybl  Dept of Biophysics  SM) writes:
>
>Dear Santa,...
>There appears to be a need to establish electronic research libraries.
>...
>An UNIX shell program like lookbib or seekbib can be used to extract
>relevant reference from an M. Lesk archive.  Because it is unlikely
>that all users will be able to ftp directly to the computer system
>that contains the archive, they should be able to issue a control
>command that would cause the remote execution of a seekbib shell
>with the results of the search returned via e-mail.  ...

Is this advocating non-interactive searches?
Speaking for myself, I think a non-interactive search would be
virtually useless.  I have a lot of experience searching medline.
I can't recall *ever* getting the search exactly right the first
time.  When I do get it right, I always need to pick and
choose among abstracts in order to get the ones I want without
hopelessly cluttering up my own reference files, duplicating, etc.
Assuming Santa has limited funds, I would urge him to put them
only into programs and situations where interactive searches are
possible.  How does the net feel?

karish@forel.stanford.edu (Chuck Karish) (12/20/89)

[ Note the `Followup-To:' header.  This is an exciting topic as
  far as data sharing goes, but it's probably a tangential thread
  for most of the newsgroups in the original distribution. --crk ]

In article <1989Dec20.011113.724@agate.berkeley.edu> sachs@tanner.UUCP
(Rainer Sachs) wrote:
>In article <617@mbph.UUCP> hybl@mbph.UUCP
>(Albert Hybl  Dept of Biophysics  SM) writes:
>>Dear Santa,...
>>There appears to be a need to establish electronic research libraries.
>>...
>>An UNIX shell program like lookbib or seekbib can be used to extract
>>relevant reference from an M. Lesk archive.  Because it is unlikely
>>that all users will be able to ftp directly to the computer system
>>that contains the archive, they should be able to issue a control
>>command that would cause the remote execution of a seekbib shell
>>with the results of the search returned via e-mail.  ...
>
>Is this advocating non-interactive searches?

    So it would seem.

>Speaking for myself, I think a non-interactive search would be
>virtually useless.  I have lot of experience searching medline.
>I can't recall *ever* getting the search exactly right the first
>time.  When I do get it right, I always need to pick and
>choose among abstracts in order to get the ones I want without
>hopelessly cluttering up my own reference files, duplicating, etc.
>Assuming Santa has limited funds, I would urge him to put them
>only into programs and situations where interactive searches are
>possible.  How does the net feel?

  I agree.  Mr. Hybl was not ambitious enough in compiling his wish
  list.  An electronic library should be built around the capabilities
  of the available high-speed networks.  I'd do this by finding or
  inventing an Internet protocol that users connected to the NSFnet
  could use to make queries of remote database server machines.

  The user would invoke an interactive client program on the local
  machine.  This client would send requests to the server, which would
  search the database using the chosen patterns and return the
  results.  Once the user has found interesting information, it might
  be downloaded directly to the screen or to a file, depending on the
  amount and organization of the data.

  It might be desirable to provide hooks by which users without NSFnet
  access might use the servers in batch mode, presumably by downloading
  the indexes by anonymous UUCP or by using a LISTSERV-style
  interface.

  A USENET-style interface would be of limited utility, unless several
  servers are to share all data submitted.  For the end user, news
  readers aren't good enough at searching databases.  The server model
  exemplified by NNTP might be a good inspiration for the proposed
  project, but the protocol would be different.
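A minimal sketch of the server side of such an interactive protocol, borrowing two NNTP conventions: every reply begins with a numeric status line, and multi-line bodies end with a line holding a single `.`, with any body line that starts with `.` having it doubled ("dot-stuffing"). The verbs `SEARCH` and `FETCH`, the status codes, and the dictionary standing in for the database are all hypothetical, and the TCP transport is omitted.

```python
# Sketch of an NNTP-inspired query protocol.  The verbs and status codes
# are illustrative; the status-line and '.'-terminated, dot-stuffed body
# conventions are the parts borrowed from NNTP.


def dot_stuff(line):
    """Double a leading '.' so a body line cannot end the response early."""
    return "." + line if line.startswith(".") else line


def handle(command_line, database):
    """Return the response lines for one client command.

    database: dict mapping record id -> record text (hypothetical store).
    """
    parts = command_line.split()
    if not parts:
        return ["500 empty command"]
    verb, args = parts[0].upper(), parts[1:]

    if verb == "SEARCH" and args:
        keys = [a.lower() for a in args]
        ids = [rid for rid, text in database.items()
               if all(k in text.lower() for k in keys)]
        return ([f"211 {len(ids)} match(es) follow"]
                + [dot_stuff(rid) for rid in ids] + ["."])

    if verb == "FETCH" and len(args) == 1:
        text = database.get(args[0])
        if text is None:
            return ["430 no such record"]
        body = [dot_stuff(line) for line in text.splitlines()]
        return [f"220 {args[0]} follows"] + body + ["."]

    if verb == "QUIT":
        return ["205 closing connection"]
    return ["500 command not recognized"]


if __name__ == "__main__":
    db = {
        "REF.1": "Specificity pockets for the side chains of peptide "
                 "antigens in HLA-Aw68",
        "REF.2": "A hypothetical unrelated record",
    }
    # An interactive session: narrow the search, then fetch one hit.
    for cmd in ["SEARCH hla", "SEARCH hla pockets", "FETCH REF.1", "QUIT"]:
        print("C:", cmd)
        for line in handle(cmd, db):
            print("S:", line)
```

Returning only record ids from `SEARCH` and full text from `FETCH` supports the pick-and-choose style of searching described earlier in the thread: the user refines the query cheaply and downloads only the records worth keeping.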

	Chuck Karish		karish@mindcraft.com
	(415) 323-9000		karish@forel.stanford.edu

roy@phri.nyu.edu (Roy Smith) (12/20/89)

In <1989Dec20.011113.724@agate.berkeley.edu> sachs@tanner.berkeley.edu (Rainer Sachs) writes:
> Speaking for myself, I think a non-interactive search would be virtually
> useless [...] Assuming Santa has limited funds, I would urge him to put them
> only into programs and situations where interactive searches are possible.

	I absolutely agree.  This is the old batch-processing vs. time-sharing
argument all over again.  Batch lost out a decade or two ago; why
should we bring it back in the guise of bibliographic search systems?
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,philabs,cmcl2,rutgers,hombre}!phri!roy
"My karma ran over my dogma"

hybl@mbph.UUCP (Albert Hybl Dept of Biophysics SM) (12/21/89)

In message <1989Dec20.011113.724@agate.berkeley.edu> from 
sachs@tanner.berkeley.edu, Rainer Sachs writes:
>>In article <617@mbph.UUCP> hybl@mbph.UUCP (Albert Hybl
>> Dept of Biophysics  SM) writes:
>>There appears to be a need to establish electronic research libraries.
>>...
>>A UNIX shell program like lookbib or seekbib can be used to extract
>>relevant reference from an M. Lesk archive.  Because it is unlikely
>>that all users will be able to ftp directly to the computer system
>>that contains the archive, they should be able to issue a control
>>command that would cause the remote execution of a seekbib shell
>>with the results of the search returned via e-mail.  ...
>
>Is this advocating non-interactive searches? [...]
>Assuming Santa has limited funds, I would urge him to put them
>only into programs and situations where interactive searches are
>possible.

I agree that electronic research libraries must be available to
users on an interactive basis.  However, there are circumstances
where interactive access isn't necessary.  For example, I may not
want to take the time to read certain info.groups every day.  I
might prefer to create an entry in my crontab that would
periodically post seekbib control commands to locate citations
posted during the previous week or month that satisfy my interest
profile and e-mail them to me.  In an article titled "Specificity
pockets for the side chains of peptide antigens in HLA-Aw68" by
Garrett, Saper, Bjorkman, Strominger & Wiley in Nature 342(89Dec7):
692-696, I read:  "Coordinates of HLA-Aw68 have been deposited in the
Brookhaven Protein Data Bank (Accession no 2HLA)."  Why deny me a
control-initiated "seekbib PDB.2HLA" to find and deliver the
coordinate file during odd hours?  For your information, seekbib is
a 115-line shell script that I use interactively; it is my
modification of lookbib.  It would be a trivial matter to modify it
further to produce a version that responds to remote execution
requests submitted by a posted control command.  I expect that an
electronic library would offer several alternatives for reading or
retrieving information from its holdings; seekbib could be just one
choice.
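A minimal sketch of the cron-driven request, assuming a hypothetical `since=`/`until=` syntax for the posted seekbib control command (the real seekbib's options are not described here) and a hypothetical script name in the crontab comment. The script only composes the command text; posting it is left out.

```python
# Crontab entry that would run this script at 03:00 every Monday
# (the script name is hypothetical):
#   0 3 * * 1  python3 weekly_seekbib.py
from datetime import date, timedelta


def previous_week(today):
    """(start, end) covering the seven days before today."""
    return today - timedelta(days=7), today


def previous_month(today):
    """(first, last) day of the calendar month before today's month."""
    last_prev = today.replace(day=1) - timedelta(days=1)
    return last_prev.replace(day=1), last_prev


def seekbib_command(start, end, keywords):
    """Text of a hypothetical seekbib control command for a date window."""
    return (f"seekbib since={start.isoformat()} until={end.isoformat()} "
            + " ".join(keywords))


if __name__ == "__main__":
    start, end = previous_week(date.today())
    print(seekbib_command(start, end, ["HLA-Aw68", "PDB.2HLA"]))
```

Computing the window from the run date, rather than storing a "last run" time, keeps the script stateless; the cost is that a missed cron run leaves a gap unless the window is widened.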

Sincerely yours,

hybl@mbph.UUCP (Albert Hybl Dept of Biophysics SM) (12/21/89)

The following e-mail message supports my assertion that creating
electronic research libraries is technically feasible:
>The fundamental problem is obtaining this information in electronic
>form.  Everything else is easy.  ---rick

Some of the information already exists in electronic form; inertia,
policy, money, and politics are the real obstacles.  The Health Science
Library (HSL) at UMAB provides a mini Med-Line service called MaryMED.
At present the MaryMED database is updated at most once or twice a
month from magnetic tapes obtained from NLM.  The following is a
summary:

}    The MaryMED database contains citations to English language
}    articles in journals owned by the [HSL] Library.
}     
}    Contents of the MaryMED database: [Wed Dec 20 15:32:53 EST 1989]
}      JAN-DEC 1989 Updates  228,699 citations
}      JAN-DEC 1988 Updates  205,489 citations
}      NOV-DEC 1987 Updates   33,164 citations
}                            -------
}                            467,352 citations

MaryMED averages roughly 600 citations a day while usenet generates
about 2000 messages.  The average MaryMED citation is about
1,500 bytes; the average usenet message is 2,358 bytes long.  Usenet
moves over 4 megabytes per day while MaryMED processes less than 1
megabyte per day.  The usenet distributes messages among various
branches and twigs of its newsgroups tree structure.  The NLM likewise
allows searches based on a tree structure of Medical Subject Headings.
The USENET allows cross posting of a news article.  MaryMED articles
have many keywords associated with the reference that would be amenable
for cross posting.
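The volume comparison can be checked from the figures quoted above. This short calculation uses only numbers from the posting: the 1989 citation count for the daily rate, and the rounded 600 citations per day for the byte total. Since the 1989 count was compiled on December 20, dividing by 365 may understate the daily rate slightly.

```python
# Daily volume estimates, using only the figures quoted in the posting.
marymed_per_year = 228_699            # JAN-DEC 1989 updates
marymed_per_day = marymed_per_year / 365
marymed_bytes = 600 * 1_500           # rounded rate x average citation size
usenet_bytes = 2_000 * 2_358          # messages/day x average message size

print(f"MaryMED citations/day: {marymed_per_day:.0f}")       # 627
print(f"MaryMED bytes/day: {marymed_bytes:,}")                # 900,000
print(f"usenet bytes/day: {usenet_bytes:,}")                  # 4,716,000
print(f"ratio: {usenet_bytes / marymed_bytes:.1f}")           # 5.2
```

So usenet already carries about five times the daily volume a MaryMED-style feed would add.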

MaryMED citations are limited to English-language articles in HSL
journals abstracted during the past two years.  I would prefer a more
complete search.  The restriction to the current two years is both
too long and too short.  It is too short when initiating a search on
a new subject, where one wants to locate references 10, 20 or more
years old.  It is too long for the user wanting to search only the
most recent month or two.  MaryMED is a giant step forward for the
HSL but only a small step toward an electronic research library.
