hybl@mbph.UUCP (Albert Hybl Dept of Biophysics SM) (11/27/89)
In message <2179@prune.bbn.com>, Re: Modifying news storage for fast
searches, dated 22 Nov 89, rsalz@bbn.com (Rich Salz) writes:
>In <51195@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>>...Another idea is to store articles in a special compressed form
>>that lists the dictionary first (ie. the list of words) followed
>>by the text expressed and indices into the word list.
>
>Free-text retrieval is basically a solved problem.  ...[See]
>"Some Examples of Inverted Indices on the Unix System" by
>Mike Lesk (USD:30 in the BSD docs, ...for Version 7, ...).
>
>There will be a [relevant posting] in c.s.unix in a couple of weeks.
>	/r$

I have been using the M. Lesk inverted indexes to maintain
bibliographic citations on several general subjects.  For example, I
search the MaryMed database located at the UMAB Health Science Library
for all citations on "cholera"; download the complete list containing
Title, Author, Subject, Keywords, Abstract, and a few other categories
for each reference; filter each citation into the Lesk format; and
then produce the inverted reference files.  The HSL does not provide a
means to search for words in the Abstracts; with the inverted files I
can search for "X-ray crystal structure toxin" and find citations that
the MaryMed software would ignore.  In addition, of course, refer can
be used for insertion of citations into a document being produced with
the aid of (n/t)roff.

I would like to see something like this available on the USENET, not
to complement or supplant my daily reading technique, but to search
news.group archives for specific information.
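
For readers who have not seen the technique, the workflow above (filter
citations into refer-style records, build an inverted file, then look up
several words at once) can be sketched roughly as follows.  This is only
an illustration in Python, not the Lesk programs themselves: the real
mkey/inv/hunt use hashed keys and on-disk files, and the sample records,
the function names keys(), build_index() and hunt(), and the tokenizing
rule are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical refer-style records, made up for illustration only.
# Blank lines separate records; %A author, %T title, %K keywords,
# %X abstract.
SAMPLE = """\
%A A. Author
%T Crystal structure of a toxin
%K cholera toxin
%X The X-ray crystal structure of the toxin is described.

%A B. Writer
%T Outbreak report
%K cholera epidemiology
%X A survey of cases; no structural data.
"""

def parse_records(text):
    """Split refer-style text into a list of records (one string each)."""
    return [r.strip() for r in re.split(r"\n\s*\n", text) if r.strip()]

def keys(record):
    """Lower-cased words from every field, with the %X-style tags dropped."""
    words = set()
    for line in record.splitlines():
        body = line[2:] if line.startswith("%") else line
        words.update(re.findall(r"[a-z0-9]+", body.lower()))
    return words

def build_index(records):
    """Map each word to the set of record numbers that contain it."""
    index = defaultdict(set)
    for n, rec in enumerate(records):
        for w in keys(rec):
            index[w].add(n)
    return index

def hunt(index, query):
    """Return the record numbers that contain every word of the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))

records = parse_records(SAMPLE)
index = build_index(records)
print(sorted(hunt(index, "X-ray crystal structure toxin")))  # prints [0]
```

Because the abstract (%X) is indexed along with the other fields, the
query matches the first record even though "X-ray" appears nowhere in
its title or keywords; that is the kind of hit a title/keyword-only
search would miss.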

I would like to use control commands such as:

    seekinfo -n news.all -k "history expire failed write"
    seekinfo -y 198[7-9] -n news.admin -k "expire date unparsable"

because I have been annoyed by:

>Thu Nov 16 21:19:00 EST 1989
>expire: history write failed
>expire: history write failed
>expire: history write failed
>
>Fri Nov 17 21:19:00 EST 1989
>expire: Unparsable date "31 Dec 69 23:59:59 GMT"
>expire: history write failed
>expire: history write failed
>expire: history write failed
>expire: history write failed

I think I remember that these questions have been posted before, and
rather than reposting them I want to locate what already exists in
some archive somewhere.

The AT&T refer package contains the programs mkey, inv, hunt, refer,
deliv and others; I don't think they are in the public domain.  The
refer program is not needed, and the other programs in the package are
not entirely ideal.  However, they work well enough to persuade me
that the technique could be applied to large databases like MedLine
citations and an archive of USENET postings.

----------------------------------------------------------------------
Albert Hybl, PhD.              Office UUCP: uunet!mimsy!mbph!hybl
Department of Biophysics       Home   UUCP: uunet!mimsy!mbph!hybl!ah
University of Maryland         CoSy: ahybl
School of Medicine             Office Phone: (301) 328-7940
Baltimore, MD 21201            Home   Phone: (301) 243-1710
----------------------------------------------------------------------
Responders--DO NOT USE:  hybl@cs.umd.edu or ah@cs.umd.edu