LEWIS%cs.umass.edu@RELAY.CS.NET (10/21/86)
Does anyone know of a KWIC (Keyword In Context) indexing program that runs under VMS? I would be interested in hearing of one (preferably public domain), as well as of other programs useful in Information Retrieval research.

--David D. Lewis
  COINS Dept.
  Univ. of Massachusetts
  Amherst, MA 01004
  LEWIS%cs.umass.edu
LEICHTER-JERRY@YALE.ARPA (10/22/86)
> Does anyone know of a KWIC (Keyword In Context) indexing program that
> runs under VMS? I would be interested in hearing of one (preferably
> public domain), as well as of other programs useful in Information
> Retrieval research.

Such a program is part of the DECUS C distribution - in fact, it's used to build the DECUS C documentation. It runs fine on VMS under VAX C. A couple of related Unix-like tools are also in the DECUS C distribution, including tr, mc (a multi-column filter), wc, and grep.

I am not in a position to do massive distributions of this stuff, or to make it available via FTP; it would be best if you went through DECUS. Note that some DECUS C distributions include full source, while others are pre-built versions for various OS's - the full source versions are BIG. I'm quite sure there is no pre-built version for VMS, so pick some convenient source distribution and go from there.

I've attached the documentation for DECUS C KWIK [sic] below.

                                                -- Jerry

        ********
        * kwik *
        ********

NAME:   kwik -- Keyword in Context Index

SYNOPSIS:

        kwik [options] [file ...]

DESCRIPTION:

        Kwik constructs a keyword in context (kwik) index using the
        data in the named files, writing the resulting index to the
        standard output. The standard input is read if no files are
        specified; kwik may be used as a filter.

        The following options are defined:

        -s              The kwik index normally excludes common (stop)
                        words. Specifying the '-s' option empties the
                        stop list, thus including the following words:

                            a       by      in      to
                            an      for     of      the
                            and     from    on      with
                            at

        -r              Make the index in reverse alphabetic order.

        -t offset       This is used to build index tables. The input
                        is entered in the following format:

                            name<TAB>index text

                        The kwik index will be output with the name in
                        the left hand column. The kwik'ed text then
                        follows. The '-t' option takes a mandatory
                        argument: the column at which the first byte
                        of the kwik'ed text should be placed.

                        For example, the index for the Decus library
                        documentation was produced by the following
                        command:

                            kwik -t 16 -w 64 <infile >outfile

        -x file_name    The named file contains a user-specified
                        exclusion (stopword) list, one word per line.
                        The '-x' option may be repeated if multiple
                        exclusion lists are needed. Note that the
                        order of the '-s' and '-x' options is
                        important:

                            kwik -x file
                                Append the contents of the file to
                                the default list.

                            kwik -s -x file
                                Replace the default stoplist by the
                                contents of the named file.

                            kwik -x file -s
                                After reading the exclusion file, the
                                entire stop list is erased. (This is
                                not a useful procedure.)

        -w width        The output line width is normally 80
                        characters. The '-w' option changes it to a
                        user-specified value.

[detailed stuff omitted]

AUTHOR:

        David Conroy, Martin Minow

[more details omitted]
-------
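
For anyone who wants to see what a KWIC index amounts to before tracking down the DECUS sources, here is a minimal Python sketch of the idea. It is not the DECUS C implementation and makes no claim to match its output format. The names `kwic` and `build_stop_list`, the tuple encoding of the '-s'/'-x' actions, the punctuation stripping, and the choice to align keywords at the middle column are all assumptions made for illustration. Only the default stop list and the order-dependent '-s'/'-x' behaviour come from the documentation above.

```python
# Default stop list, as listed in the kwik documentation.
DEFAULT_STOP = frozenset({
    "a", "an", "and", "at", "by", "for", "from",
    "in", "of", "on", "the", "to", "with",
})


def build_stop_list(actions):
    """Apply '-s' / '-x' style actions in command-line order.

    `actions` is a list of ("s", None) or ("x", words) tuples. This
    encoding is hypothetical; it only mirrors the documented rule that
    the order of '-s' and '-x' matters.
    """
    stop = set(DEFAULT_STOP)
    for kind, words in actions:
        if kind == "s":
            stop.clear()                           # '-s' empties the list
        elif kind == "x":
            stop.update(w.lower() for w in words)  # '-x' appends to it
        else:
            raise ValueError("unknown action: %r" % (kind,))
    return stop


def kwic(lines, stop=DEFAULT_STOP, reverse=False, width=80):
    """Return one index line per non-stop keyword occurrence.

    Each keyword is placed at column width // 2, with as much left and
    right context as fits. Lines are sorted by keyword (assumed layout).
    """
    if width < 4:
        raise ValueError("width must be at least 4")
    half = width // 2
    entries = []
    for line in lines:
        words = line.split()
        for i, word in enumerate(words):
            key = word.strip(".,;:!?()\"'").lower()
            if not key or key in stop:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i:])
            # Right-justify the left context so keywords line up.
            text = left[-(half - 1):].rjust(half - 1) + " " + right[:width - half]
            entries.append((key, text))
    entries.sort(key=lambda e: e[0], reverse=reverse)
    return [text for _, text in entries]


if __name__ == "__main__":
    titles = [
        "A Keyword in Context index for the library",
        "Building the documentation with kwik",
    ]
    for entry in kwic(titles, width=64):
        print(entry)
```

Passing `stop=build_stop_list([("s", None), ("x", my_words)])` corresponds to the documented `kwik -s -x file` case, where the default list is replaced rather than extended.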