LEWIS%cs.umass.edu@RELAY.CS.NET (10/21/86)
Does anyone know of a KWIC (Keyword In Context) indexing program
that runs under VMS? I would be interested in hearing of one (preferably
public domain) as well as of other programs useful in Information Retrieval
research.
--David D. Lewis
COINS Dept.
Univ. of Massachusetts
Amherst, MA 01004
LEICHTER-JERRY@YALE.ARPA (10/22/86)
> Does anyone know of a KWIC (Keyword In Context) indexing program
> that runs under VMS? I would be interested in hearing of one (preferably
> public domain) as well as of other programs useful in Information Retrieval
> research.
Such a program is part of the DECUS C distribution - in fact, it's used to
build the DECUS C documentation. Runs fine on VMS under VAX C.
A couple of related Unix-like tools are also in the DECUS C distribution,
including tr, mc (a multi-column filter), wc, and grep.
I am not in a position to do massive distributions of this stuff, or to make
it available via FTP; it would be best if you went through DECUS.
Note that some DECUS C distributions include full source, while others are
pre-built versions for various OS's - the full source versions are BIG. I'm
quite sure there is no pre-built version for VMS, so pick some convenient
source distribution and go from there.
I've attached the documentation for DECUS C KWIK [sic] below.
-- Jerry
********
* kwik *
********
NAME:   kwik -- Keyword in Context Index

SYNOPSIS:
        kwik [options] [file ...]

DESCRIPTION:
        Kwik constructs a keyword in context (kwik) index using
        the data in the named files, writing the resulting index
        to the standard output. The standard input is read if
        no files are specified; kwik may be used as a filter.

        The following options are defined:

        -s            The kwik index normally excludes common
                      (stop) words. Specifying the '-s' option
                      empties the stop list, thus including
                      the following words:

                          a       by      in      to
                          an      for     of      the
                          and     from    on      with
                          at

        -r            Make the index in reverse alphabetic
                      order.

        -t offset     This is used to build index tables. The
                      input is entered in the following
                      format:

                          name<TAB>index text

                      The kwik index will be output with the
                      name in the left hand column. The
                      kwik'ed text then follows. The '-t'
                      option takes a mandatory argument: the
                      column at which the first byte of the
                      kwik'ed text should be placed. For
                      example, the index for the Decus library
                      documentation was produced by the
                      following command:

                          kwik -t 16 -w 64 <infile >outfile

        -x file_name  The named file contains a user-specified
                      exclusion (stopword) list. The '-x'
                      option may be repeated if multiple
                      exclusion lists are needed. Note that
                      the order of the '-s' and '-x' options
                      is important:

                      kwik -x file
                          The file contains an exclusion list,
                          one word per line. Append the
                          contents of the file to the default
                          list.

                      kwik -s -x file
                          Replace the default stoplist by the
                          contents of the named file.

                      kwik -x file -s
                          After reading the exclusion file,
                          the entire stop list is erased.
                          (This is not a useful procedure.)

        -w width      The output line width is normally 80
                      characters. The '-w' option changes it
                      to a user-specified value.
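
[Illustration, not part of the DECUS documentation.] For readers
without the DECUS C sources at hand, here is a minimal Python sketch
of the behaviour described above. It is not the DECUS C code. The
function names (build_stop_list, kwic), the "actions" list standing in
for the command line, the punctuation stripping, and the choice to
put the keyword just past the middle of the line are all assumptions
made for this example. Only the default stop list, the left-to-right
handling of '-s' and '-x', the 80-column default, and the reverse
ordering of '-r' are taken from the text.

```python
# Default stop words, as listed under the '-s' option.
DEFAULT_STOP = ["a", "an", "and", "at", "by", "for", "from",
                "in", "of", "on", "the", "to", "with"]


def build_stop_list(actions):
    """Apply '-s' / '-x' style actions in the order given.

    actions is a list of ("s", None) or ("x", [words]) pairs; this
    representation is hypothetical and stands in for the command line.
    """
    stop = set(DEFAULT_STOP)
    for kind, words in actions:
        if kind == "s":        # '-s': empty the stop list
            stop.clear()
        elif kind == "x":      # '-x file': append the words to the list
            stop.update(w.lower() for w in words)
    return stop


def kwic(lines, stop, width=80, reverse=False):
    """Return index lines, one per non-stop word, sorted by keyword."""
    entries = []
    for line in lines:
        words = line.split()
        for i, word in enumerate(words):
            key = word.strip(".,;:!?\"'()").lower()
            if not key or key in stop:
                continue
            left = " ".join(words[:i])    # context before the keyword
            right = " ".join(words[i:])   # keyword and what follows
            entries.append((key, left, right))
    entries.sort(key=lambda e: e[0], reverse=reverse)

    half = max(width // 2, 2)             # keyword starts at this column
    out = []
    for _, left, right in entries:
        left_part = left[-(half - 1):].rjust(half - 1)
        out.append((left_part + " " + right[:width - half]).rstrip())
    return out


if __name__ == "__main__":
    for row in kwic(["the cat sat on the mat"], build_stop_list([]), width=40):
        print(row)
```

Because the actions are applied strictly left to right, the three
command lines shown under '-x' fall out naturally: 'x' alone extends
the default list, 's' then 'x' replaces it, and 'x' then 's' leaves
the list empty.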
[detailed stuff omitted]
AUTHOR:
David Conroy, Martin Minow
[more details omitted]
-------