jones@THINK.COM (Robert Jones) (06/25/91)
As a follow-up to Rob Harper's messages about WAIS (Wide Area Information Servers), here is the scoop on the two molecular biology sources that are currently available.

NIH Guide (source NIH)
    This is a chunk of the NIH Guide to RFAs and Announcements covering a few months.

GenBank (source MOLBIO)
    This is the TEXT component of the Bacterial Division of GenBank (Release 65 if I remember correctly). Sequences are not included, as doing text search on sequences isn't too useful.

I'm responsible for preparing these. Note that currently they are NOT supported, so use them to try out the system - don't expect them to be kept up to date for a while.

WAIS provides a consistent mechanism for a variety of interfaces (X11, gmacs, Mac, Unix shell) on various machines to access a variety of information servers on various machines (Macs, Unix boxes, Connection Machines). Transparency is the key - you don't need to know where the database is or what it's running on.

Currently the search engines that are available for different machines vary in one important feature. The UNIX and Mac search engines that we provide (public domain) simply index a file and search for keywords. The Connection Machine Text Retrieval software is a commercial package that is quite sophisticated. In particular, it provides Relevance Feedback.

By way of a simplistic example of the value of Relevance Feedback, consider searching a text database for the keyword HIV. You would miss those articles that referred to AIDS but not HIV. With relevance feedback the user has the option of examining the articles found by the 'HIV' search, marking some of these as being "What I'm interested in", and then asking for more articles "like the ones I've marked". The software examines the full text of the marked articles and extracts common terms including, one might expect, the term 'AIDS'. This new set of terms is used to rescreen the database, and any new matches are presented to the user.
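
The HIV/AIDS example above can be sketched in a few lines of Python. This is only a toy illustration of the general idea (keyword search, then re-querying with terms shared by the marked articles); it is not the algorithm used by the Connection Machine software, whose internals are not described here. The document set, the stopword list, and the function names `tokenize`, `search`, and `feedback_terms` are all made up for the illustration.

```python
import re
from collections import Counter

# Hypothetical toy "database"; the ids and texts are invented for this sketch.
DOCS = {
    "a1": "HIV infection is the cause of AIDS",
    "a2": "AIDS patients and HIV protease inhibitors",
    "a3": "Clinical trials for AIDS therapy",
    "a4": "Bacterial ribosome structure",
}

# Tiny stopword list (an assumption; a real system would use a larger one).
STOPWORDS = {"a", "and", "for", "in", "is", "of", "the"}


def tokenize(text):
    """Lowercase the text and split it into words, dropping stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]


def search(docs, terms):
    """Return ids of documents containing any query term.

    Documents are ranked by the total number of query-term occurrences,
    with ties broken by document id.
    """
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))


def feedback_terms(docs, marked_ids, top_n=2):
    """Pick the terms that occur in the most marked documents.

    Ties are broken alphabetically so the result is deterministic.
    """
    doc_freq = Counter()
    for doc_id in marked_ids:
        doc_freq.update(set(tokenize(docs[doc_id])))
    ranked = sorted(doc_freq, key=lambda t: (-doc_freq[t], t))
    return ranked[:top_n]


if __name__ == "__main__":
    first = search(DOCS, ["hiv"])          # plain keyword search
    terms = feedback_terms(DOCS, first)    # user marks every hit as relevant
    second = search(DOCS, terms)           # rescreen with the new terms
    new_matches = [d for d in second if d not in first]
    print("first pass:", first)
    print("feedback terms:", terms)
    print("new matches:", new_matches)
```

In this toy set the first pass finds only the two articles that mention HIV; the terms shared by those articles include 'aids', so the second pass also picks up the article that mentions AIDS but never HIV.
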
As biology abounds with nomenclature issues like this, I feel that this approach has a lot of value for our community. Most of the user interfaces that are around provide the option of relevance feedback even if the information server does not support it. This can be confusing - check the source that you want to work with.

Both the biology sources reside on a Connection Machine here in Cambridge. I hope to be able to spend more time on this project over the summer. Specifically, I want to set things up so that the NIH Guide is automatically updated. Doing the same for the GenBank server is somewhat more complicated and it takes up a lot of space (I used to have all of GenBank and PIR on there), but I'll endeavour to do the same. Text-rich databases like Amos Bairoch's are excellent candidates for WAIS databases.

Play with it ... try out the variety of databases that are available on all sorts of topics ... let us know what you think.

regards

--Robert Jones
Thinking Machines Corporation
jones@think.com