[mod.computers.vax] KWIC indexer

LEWIS%cs.umass.edu@RELAY.CS.NET (10/21/86)

     Does anyone know of a KWIC (Keyword In Context) indexing program
that runs under VMS?  I would be interested in hearing of one (preferably
public domain) as well as of other programs useful in Information Retrieval
research.
         --David D. Lewis
           COINS Dept.
           Univ. of Massachusetts
           Amherst, MA  01004
           LEWIS%cs.umass.edu

LEICHTER-JERRY@YALE.ARPA (10/22/86)

         Does anyone know of a KWIC (Keyword In Context) indexing program
    that runs under VMS?  I would be interested in hearing of one (preferably
    public domain) as well as of other programs useful in Information Retrieval
    research.
Such a program is part of the DECUS C distribution - in fact, it's used to
build the DECUS C documentation.  Runs fine on VMS under VAX C.

Several related Unix-like tools are also in the DECUS C distribution,
including tr, mc (a multi-column filter), wc, and grep.

I am not in a position to do massive distributions of this stuff, or to make
it available via FTP; it would be best if you went through DECUS.

Note that some DECUS C distributions include full source, while others are
pre-built versions for various OS's - the full source versions are BIG.  I'm
quite sure there is no pre-built version for VMS, so pick some convenient
source distribution and go from there.

I've attached the documentation for DECUS C KWIK [sic] below.

							-- Jerry

                                    ********
                                    * kwik *
                                    ********



        NAME:   kwik -- Keyword in Context Index

        SYNOPSIS:

                kwik [options] [file ...]

        DESCRIPTION:

                Kwik constructs a keyword in context (kwik) index  using
                the data in the named files, writing the resulting index
                to the standard output.  The standard input is  read  if
                no  files  are  specified; kwik may be used as a filter.
                The following options are defined:

                -s              The kwik index normally excludes  common
                                (stop) words.  Specifying the '-s' option
                                empties the stop  list,  thus  including
                                the following words:

                                        a       by      in      to
                                        an      for     of      the
                                        and     from    on      with
                                        at


                -r              Make the  index  in  reverse  alphabetic
                                order.

                -t offset       This is used to build index tables.  The
                                input   is   entered  in  the  following
                                format:

                                        name<TAB>index text

                                The kwik index will be output  with  the
                                name in the left-hand column.  The
                                kwik'ed text  then  follows.   The  '-t'
                                option  takes a mandatory argument:  the
                                column at which the first  byte  of  the
                                kwik'ed  text  should  be  placed.   For
                                example, the index for the Decus library
                                documentation   was   produced   by  the
                                following command:

                                    kwik -t 16 -w 64 <infile >outfile

                -x file_name    The named file contains a user-specified
                                exclusion  (stopword)  list.   The  '-x'
                                option  may  be  repeated  if   multiple
                                exclusion  lists  are needed.  Note that
                                the order of the '-s' and  '-x'  options
                                is important:

                                kwik -x file

                                    The file contains an exclusion list,
                                    one   word  per  line.   Append  the
                                    contents of the file to the  default
                                    list.

                                kwik -s -x file

                                    Replace the default stop list with the
                                    contents of the named file.

                                kwik -x file -s

                                    After reading  the  exclusion  file,
                                    the  entire  stop  list  is  erased.
                                    (This is not a useful procedure.)

                -w width        The output line  width  is  normally  80
                                characters.   The '-w' option changes it
                                to a user-specified value.
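The behaviour described above boils down to a simple procedure: every word of every input line that is not on the stop list becomes an index entry, the line is rotated so that word comes first, and the entries are sorted. The Python sketch below illustrates this under stated assumptions and is not the DECUS C implementation. The function names build_stop_list and kwic, the rotated "keyword text / leading words" output layout, passing -x words as a list instead of a file, and treating the -t offset as a 0-based character position are all hypothetical. The -s and -x options are applied strictly left to right, which reproduces the three cases listed under -x.

```python
# A minimal keyword-in-context (KWIC) sketch in the spirit of kwik.
# Hypothetical illustration only: this is not the DECUS C source, and the
# rotated "keyword ... / leading words" layout is an assumption.

# Default stop list, copied from the kwik documentation.
DEFAULT_STOP = {"a", "an", "and", "at", "by", "for", "from",
                "in", "of", "on", "the", "to", "with"}

PUNCT = ".,;:!?()[]\"'"


def build_stop_list(options):
    """Apply -s / -x options strictly left to right.

    `options` is a list of (flag, words) pairs; for "-x" the words stand in
    for the lines of an exclusion file (an assumption of this sketch).
    """
    stop = set(DEFAULT_STOP)
    for flag, words in options:
        if flag == "-s":
            stop.clear()                                   # empty the stop list
        elif flag == "-x":
            stop.update(w.strip().lower() for w in words)  # append to the list
        else:
            raise ValueError(f"unsupported option {flag!r}")
    return stop


def kwic(lines, stop, reverse=False, width=80, offset=None):
    """Return one index line per non-stop word, sorted alphabetically.

    reverse -- like -r, sort in reverse order
    width   -- like -w, truncate each output line to this many characters
    offset  -- like -t, input is "name<TAB>text" and the rotated text
               starts at this 0-based character position (assumption)
    """
    entries = []
    for line in lines:
        name = None
        if offset is not None:
            name, _, line = line.partition("\t")
        words = line.split()
        for i, word in enumerate(words):
            if word.strip(PUNCT).lower() in stop:
                continue                                   # skip stop words
            before = " ".join(words[:i])
            rotated = " ".join(words[i:]) + (" / " + before if before else "")
            entries.append((rotated.lower(), name, rotated))
    # Sort case-insensitively on the rotated text, i.e. by keyword first.
    entries.sort(key=lambda e: e[0], reverse=reverse)
    out = []
    for _, name, rotated in entries:
        if name is not None:
            # Keep at least one blank between the name and the text.
            rotated = name[:max(offset - 1, 0)].ljust(offset) + rotated
        out.append(rotated[:width])
    return out


if __name__ == "__main__":
    # Demo on fixed input; a real filter would read sys.stdin instead.
    sample = ["Keyword in Context index", "the quick brown fox"]
    for row in kwic(sample, build_stop_list([])):
        print(row)
```

With the default stop list, the line "the quick fox" yields two entries, "fox / the quick" and "quick fox / the"; "the" itself is skipped. Sorting on the whole rotated text keeps all entries for the same keyword together, ordered by the words that follow it.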

	[detailed stuff omitted]

        AUTHOR:

                David Conroy, Martin Minow

	[more details omitted]
-------