[comp.theory.info-retrieval] IRList Digest V4 #1


IRList Digest           Sunday, 24 January 1988      Volume 4 : Issue 1

Today's Topics:
   Email - Welcome message with latest info on submission, etc.
   Announcement - Free text retrieval software for Mac, SUN, VAX, etc.
   Abstracts - Software Psychology Society Newsletter for Winter 1988

News addresses are
   Internet or CSNET: fox@vtopus.cs.vt.edu
   BITNET: foxea@vtvax3.bitnet


Date: Sun, 24 Jan 88 13:13:35 est
From: fox (Ed Fox)
Subject: welcome message to IRList to start off the year

Welcome to the IRList.  I am the moderator of the IRList discussion.
I am responsible for composing the digest from pending submissions,
controlling the volume and frequency of mail, keeping an archive, and
answering administrative requests.  You may submit material for the
digest to a variety of places, depending on what network you are on
and how quickly and reliably you want mail to reach me.  We do not
have to pay for mail deliveries, but they do vary in speediness and
reliability.  Possibilities include:
     If on ARPANET and can use domains, or on CSNET, use
     If on ARPANET and can't use domains use
     If on BITNET, use foxea@vtvax3
     If on UUCPNET, use something like
        ... seismo!vtvax3.bitnet!foxea

As you might expect, archival copies of all digests will be kept; feel
free to ask for recent back issues.  Note that FTP is now finally
possible but details have yet to be worked out regarding access.  Meanwhile,
all communication must be by EMAIL or phone or letter.  IRList is open
to discussion of any topic (vaguely) related to information retrieval.
Certainly, any material relating to ACM SIGIR (the Special Interest
Group on Information Retrieval of the Association for Computing
Machinery) is of interest.  Our field has close ties to artificial
intelligence, database management, information and library science,
linguistics, ...  A partial list of suitable topics is:
     Information Management/Processing/Science/Technology
     AI Applications to IR        Hardware aids for IR
     Abstracting                  Hypertext and Hypermedia
     CD-ROM / CD-I / ...          Indexing/Classification
     Citations                    Information Display/Presentation
     Cognitive Psychology         Information Retrieval Applications
     Communications Networks      Information Theory
     Computational Linguistics    Knowledge Representation
     Computer Science             Language Understanding
     Cybernetics                  Library Science
     Data Abstraction             Message Handling
     Dictionary analysis          Natural Languages, NL Processing
     Document Representations     Optical disc technology and applications
     Electronic Books             Pattern Recognition, Matching
     Evidential Reasoning         Probabilistic Techniques
     Expert Systems in IR         Speech Analysis
     Expert Systems use of IR     Statistical Techniques
     Full-Text Retrieval          Thesaurus construction
     Fuzzy Set Theory

Contributions may be anything from tutorials to rampant speculation.
In particular, the following are sought:
     Abstracts of Papers,Reports,Dissertations   Address Changes
     Bibliographies                              Conference Reports
     Descriptions of Projects/Laboratories       Half-Baked Ideas
     Histories                                   Humorous,Enlightening Anecdotes
     Questions                                   Requests
     Research Overviews                          Seminar Announcements/Summaries
     Work Planned or in Progress

The only real boundaries to the discussion are defined by the topics
of other mailing lists.  Please do not send communications to both
this list and AIList or the Prolog list, except in special cases.
I will try not to overlap much with NL-KR, except when we both receive
materials from contributors or from some bulletin board or researchers.

PLEASE "sign" subscriptions with full name and address so that people
can access you from Internet and/or BITNET (many other networks can be
reached through them and are certainly urged to participate).  Editing
of contributions will usually be limited to text justifications and
spelling corrections.  Editorial remarks and elisions will be marked
with square brackets.  The author will be contacted if significant
editing is required.  I have no objection to distributing material
that is destined for conference proceedings or any other publication.

I support ACM SIGIR Forum and unless you request otherwise may encourage
inclusion of submissions in whole or in part in future paper versions of
the FORUM.  Indeed, this is one form of solicitation for FORUM
contributions!  Both IRList and the FORUM are unrefereed, and opinions
are always those of the author and not of any organization unless
there are other indications.  Copies of list items should credit the
original author, not necessarily the IRList.  If you are interested in
submitting to Information Processing and Management (IP&M), I would like to
entertain a discussion with you as well.  The same goes for The Laserdisk
Professional, a new publication about CD-ROM and optical discs.

The list does not assume copyright, nor does it accept any liability
arising from remailing of submitted material.  Further, no liability
is accepted for use of such materials for information retrieval research,
including distribution of test collections.  I reserve the right,
however, to refuse to remail any contribution that I judge to be of
commercial purpose, obscene, libelous, irrelevant, or pointless.
Replies to public requests for information should be sent, at least
in "carbon" form, to this list unless the request states otherwise.
If necessary, I will digest or abstract the replies to control the
volume of distributed mail.  However, PLEASE DO contribute! I would
rather deal with too much material than with too little.  -- Ed Fox
    Edward A. Fox,  Assistant  Professor,  Dept. of Computer Science,
    Virginia Tech  (VPI&SU),  McBryde Hall Rm. 562, Blacksburg VA 24061
    (703) 961-5113 or 6931


Date: 27 Dec 87 09:35 EST
From: science@nems.ARPA (Mark Zimmermann)
Subject: free text retrieval software for SUN, VAX, Macintosh

Hi there!  Ed, if you could forward this note to SUN-SPOTS and/or to Igor
Metz, who asked about text retrieval software for the Sun, I'd greatly
appreciate it -- I am terrible at figuring out addresses to send things to
from here, and my mailer is even worse.

I wrote up a bunch of programs in C about 6 months ago that run on Sun,
VAX, Macintosh, etc., which generate simple complete inverted indices to
every word in an ascii text file.  (Leaving out 'stop words' turns out to
be something of a waste of the computer's time and doesn't save a significant
amount of disk space either.)  If anybody wants to see copies of the best
of these programs, 'qndxr.c' and 'brwsr.c', and can get me an address on
the net to send them to (from arpanet, from a picky mailer) I'd be more
than happy to do so.

'qndxr.c' is about 50 kB long, including comments, and seems pretty
transportable ... I've sent out dozens of copies and haven't heard of
any bugs from the latest version.  It takes an arbitrarily-large text
file (disk space limits you, until you get to 2 or 4 GB where my 32-bit
pointers run out) and breaks it up into chunks that fit into memory,
then does a quicksort on pointers to every word in the chunk, and writes
the resulting chunks of index files to disk ... then, it goes through
and merges the chunks of index together until there is a single (pair)
of index files (one holding keys, the other holding pointers to every
occurrence of words).  Very very simple ... I'm working on extensions,
but more on that later.
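
[Illustrative sketch of the chunk-sort-merge scheme described above.  It is
written in Python rather than C and is not the qndxr source; the names
index_chunk and build_index, the word pattern, and the (word, first slot,
count) key layout are all assumptions.  Chunks are sorted and merged in
memory here, where qndxr writes chunk indices to disk and merges files.]

```python
import heapq
import re

# Assumed definition of a "word": a run of ASCII letters or digits.
WORD = re.compile(r"[A-Za-z0-9]+")

def index_chunk(text, base):
    """Sort (word, absolute offset) pairs for one chunk that fits in memory."""
    pairs = [(m.group().lower(), base + m.start()) for m in WORD.finditer(text)]
    pairs.sort()
    return pairs

def build_index(text, chunk_size):
    """Return (keys, pointers) for the whole text.

    keys     -- one (word, first_slot, count) tuple per distinct word
    pointers -- offsets of every word occurrence, grouped by word
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    runs, start = [], 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        # Push the boundary forward so that no word is split across chunks.
        while end < len(text) and text[end].isalnum():
            end += 1
        runs.append(index_chunk(text[start:end], start))
        start = end
    keys, pointers = [], []
    # Merge the sorted runs into a single sorted stream of (word, offset).
    for word, offset in heapq.merge(*runs):
        if keys and keys[-1][0] == word:
            keys[-1][2] += 1
        else:
            keys.append([word, len(pointers), 1])
        pointers.append(offset)
    return [tuple(k) for k in keys], pointers
```

[The two returned lists stand in for the pair of index files: keys plays the
role of the key file and pointers the occurrence file.  The chunk size changes
only how the work is split, not the resulting index.]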

  Current version seems to build indices at roughly 10-15 MB/hour pace
  on a Sun or Mac II, and at 3-4 MB/hour on a Mac Plus....

'brwsr.c' lets you browse through the index ... gives you a display of
words and their occurrence rates, like:
   100 aardvark
  9876 aaron
    21 aarons
etc.  If you are interested in aardvarks, you can pop down into a
complete key-word-in-context display of the occurrences of the string
aardvark (all 100 of them), like:
 was eaten by a voracious aardvark in 1492, when his boat landed...
 took the left leg of his aardvark and painted it blue without a...
 among the earliest known aardvark civilizations.  Now it can be...
etc.  Then, if any of these lines of the KWIC display look promising,
you can pop down into the full text around that chosen line, and
read, copy to a file of notes, etc.  The C code for brwsr is also about
50 kB long including comments.
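
[Illustrative sketch of the two displays described above: the word list with
occurrence counts and the key-word-in-context lines.  It is Python, not the
brwsr source; build_postings, word_list, kwic, and the width parameter are
hypothetical names, and the index is a simple in-memory dictionary.]

```python
import re
from collections import defaultdict

# Assumed definition of a "word": a run of ASCII letters or digits.
WORD = re.compile(r"[A-Za-z0-9]+")

def build_postings(text):
    """Map each lower-cased word to the sorted offsets where it occurs."""
    postings = defaultdict(list)
    for m in WORD.finditer(text):
        postings[m.group().lower()].append(m.start())
    return postings

def word_list(postings):
    """One 'count word' line per distinct word, in alphabetical order."""
    return ["%6d %s" % (len(postings[w]), w) for w in sorted(postings)]

def kwic(text, postings, word, width=30):
    """One context line per occurrence, with the keyword at column `width`."""
    lines = []
    for pos in postings.get(word.lower(), []):
        left = text[max(0, pos - width):pos].rjust(width)
        right = text[pos:pos + width]
        # Flatten newlines so that each occurrence stays on one display line.
        lines.append((left + right).replace("\n", " "))
    return lines
```

[Padding the left context to a fixed width lines the keyword up in the same
column on every row, which is what makes a KWIC display easy to scan.]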

I have been spending the past few weeks rewriting most of the above to
integrate it into HyperCard (Macintosh program ... my routines become
external functions and commands) ... should have some good stuff to
start distributing in a few weeks, if all goes well.  My sabbatical
time is running out, so my work will be slower next year, alas.

Oh, I forgot to mention, 'brwsr.c' above has simple proximity searching ...
you can define a working subset of the dataspace to include, for example,
only words within a few sentences of '1492', in which case the index
display shows the counts in that subset, e.g.,
  1/100 aardvark
 17/9876 aaron
  2/21   aarons
etc.  Now, if you ask for a KWIC display of aardvark, you only see the
one occurrence in the neighborhood of '1492'.
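
[Illustrative sketch of the subset counts described above, in Python rather
than the brwsr C code.  "Within a few sentences" is approximated here by a
fixed character radius, which is an assumption, and build_postings, near,
and subset_counts are hypothetical names.]

```python
import re
from bisect import bisect_left
from collections import defaultdict

# Assumed definition of a "word": a run of ASCII letters or digits.
WORD = re.compile(r"[A-Za-z0-9]+")

def build_postings(text):
    """Map each lower-cased word to the sorted offsets where it occurs."""
    postings = defaultdict(list)
    for m in WORD.finditer(text):
        postings[m.group().lower()].append(m.start())
    return postings

def near(postings, anchor, radius):
    """Return a test: is an offset within `radius` characters of `anchor`?"""
    anchors = postings.get(anchor.lower(), [])  # offsets are already sorted

    def in_subset(pos):
        # Only the anchor occurrences on either side of pos can be nearest.
        i = bisect_left(anchors, pos)
        return any(abs(anchors[j] - pos) <= radius
                   for j in (i - 1, i) if 0 <= j < len(anchors))

    return in_subset

def subset_counts(postings, in_subset):
    """Map each word to (occurrences inside the subset, total occurrences)."""
    return {w: (sum(1 for p in offs if in_subset(p)), len(offs))
            for w, offs in postings.items()}
```

[Each (hits, total) pair corresponds to one row of the display above, such as
1/100 for aardvark.]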

I use my Macintosh versions of brwsr and qndxr all the time ... have
accumulated over 12 MB of text from the past year or so of arpanet
and usenet and delphi digests, mostly related to Macintosh programming,
information retrieval, etc. -- it's easy to browse and pull out tidbits
that I vaguely recall the existence of.

As stated earlier, the programs are free (at the moment), but I can't
afford to spend a lot of time distributing them or supporting them at
that price, and my time will be even scarcer starting next week.

Best,   ^z   (Mark Zimmermann, 'science@nems.arpa')


Date: Sun, 27 Dec 87 16:50:37 EST
Subject: Software Psychology Newsletter - Winter 1988
From: ("Ben Shneiderman <ben@mimsy.umd.edu>") <ben%MIMSY.UMD.EDU@UMD2>


Happy New Year...Ben




VOLUME 12 NUMBER 2                              WINTER 1988

   Note: All meetings will be held at the George Washington University's Marvin
Center (800 21st Street, N.W.) between 10:00 AM and noon. Coffee and doughnuts
will be provided by the Department of Electrical Engineering and Computer
Sciences.

Send correspondence for this newsletter to: Software Psychology Society, c/o
Skip Williamson, Knowledge Systems, Inc., 5705 Stillwell Rd., Rockville, MD


January 8
Room 413-414


                John T. Christian (1) and Bruce H. Thomas (2)
           Computer Sciences Corporation, System Sciences Division
                    8728 Colesville Rd., Silver Spring, MD

         (1) now at CSC, 4600 Powder Mill Rd., Beltsville, MD   20705
       (2) now at National Bureau of Standards, Gaithersburg, MD 20899

Often during the design of a user-system interface, human factors engineers
are asked if and how color can be used to code information.  Frequently the
response is that color can be used but in limited ways (e.g., follow cultural
stereotypes and use fewer than six colors).  On the other hand, designers and
practitioners in the software field (e.g., word processing and presentation
graphics) have used color, sometimes in highly artistic ways, to enhance
processing of presented information.  With the advent of improved
high-resolution color graphics monitors, more people want to use color coding,
presumably as a strategy to improve productivity.  The basis for making color
coding decisions, for example in a word processing task, is unclear at best.

In an attempt to sort out the consequences of color coding information for
user productivity, several experiments were conducted.  We investigated the
effects of character color and background color combinations on people's
perception and comprehension of information in timed target detection and
reading comprehension tasks.  In two other studies, we examined cultural
stereotypes for coding meteorological parameters.  The mixed results may serve
as a palette to color future decisions on color coding.


February 12                                                     Room 413-414


           Sylvia B. Sheppard, Elizabeth D. Murphy, Lisa J. Stewart
                    Computer Technology Associates, Inc.,
                    14900 Sweitzer Lane, Laurel, MD 20702

     Walter Truszkowski, NASA Goddard Space Flight Center, Greenbelt, MD

A theoretical model has been designed to predict the performance of users of
automated systems.  The Operating Personnel Performance Model is based on the
premise that user performance in control rooms can be predicted from a
knowledge of the cognitive, sensory, and motor demands imposed on the users in
the performance of their tasks and from a knowledge of the capabilities
required to meet those demands.  Two studies, related to the Network Control
Center (NCC) and the Georgia Tech Multi-Satellite Operations Control Center
(GT-MSOCC) at NASA Goddard Space Flight Center, were conducted to test the
model's predictive validity.  The results supported the conclusion that the
model is an aid in the rapid, systematic evaluation of design alternatives.


March 11
Room 413-414

                           USER-CENTERED DESIGN OF
                        AN INTELLIGENT DATABASE SYSTEM

                Elizabeth Roop, Carlow Associates Incorporated
                8315 Lee Highway, Suite 410, Fairfax, VA 22031

To facilitate the dissemination of its collection and compilation of UCI
(User-Computer Interface) reports and literature, the U.S. Army's Human
Engineering Lab (HEL) is applying an intense R&D effort toward a fully
automated, intelligent database system.  Specifications for this
state-of-the-technology system were based on user preferences, determined by
surveys of the intended users of the system, and a review of the current
literature for both hardware and software techniques.

Results from early testing of the prototype will be presented.  The prototype
includes hardware to scan documents for rapid data entry, a character
recognition server to divide the data into separate ASCII and bitmap files,
and the latest in WORM (Write Once Read Many) drive technology.  HEL's
database system is supported by an intelligent front end, which provides new
features for a traditional query and retrieval subsystem, hypertext
capability, and personal files.


END OF IRList Digest