FOXEA@VTCC1 (06/27/88)
IRList Digest Tuesday, 7 May 1988 Volume 4 : Issue 33
Today's Topics:
Abstract - Selected abstracts appearing in SIGIR FORUM (part 1 of 2)
News addresses are
Internet or CSNET: fox@vtopus.cs.vt.edu
BITNET: foxea@vtvax3.bitnet
----------------------------------------------------------------------
Date: Tue, 17 May 88 09:10:51 CDT
From: "Dr. Raghavan" <raghavan%raghavansun%usl.csnet@RELAY.CS.NET>
Subject: Abstracts from SIGIR Forum [Part I of II - Ed.]
Ed,
These are the abstracts I included in the recent Forum.
...
Regards, Vijay
ABSTRACTS
(Chosen by G. Salton from recent issues of journals in the retrieval area.)
INFORMATION RETRIEVAL BY CONSTRAINED SPREADING ACTIVATION IN SEMANTIC NETWORKS
Paul R. Cohen and Rick Kjeldsen, Department of Computer Information Science,
Lederle Graduate Research Center, University of Massachusetts, Amherst, MA
01003
GRANT is an expert system for finding sources of funding given research
proposals. Its search method - constrained spreading activation - makes
inferences about the goals of the user and thus finds information that the
user did not explicitly request but that is likely to be useful. The
architecture of GRANT and the implementation of constrained spreading
activation are described, and GRANT's performance is evaluated.
(INFORMATION PROCESSING & MANAGEMENT, Vol. 23, No. 4, pp. 255-268, 1987)
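For readers unfamiliar with the technique, here is a minimal sketch of generic constrained spreading activation, assuming a toy semantic network stored as a Python dict. The constraint set (decay per link, activation threshold, maximum path length, fan-out limit, no re-entry of activated nodes) and all parameter values are common textbook choices picked for illustration, not GRANT's actual rules; the network and node names are hypothetical.

```python
from collections import defaultdict


def constrained_spread(graph, sources, decay=0.5, threshold=0.1,
                       max_hops=3, max_fanout=10):
    """Generic constrained spreading activation over a semantic network.

    graph maps each node to a list of linked nodes (hypothetical format).
    Illustrative constraints, not GRANT's own:
      - activation is multiplied by `decay` on each link traversed;
      - pulses weaker than `threshold` are not propagated;
      - spreading stops after `max_hops` links (distance constraint);
      - nodes with more than `max_fanout` links do not propagate
        (fan-out constraint: very general nodes carry little information);
      - an already activated node is not re-entered, so pulses never bounce back.
    """
    activation = {s: 1.0 for s in sources}
    frontier = dict(activation)
    for _ in range(max_hops):
        next_frontier = defaultdict(float)
        for node, level in frontier.items():
            links = graph.get(node, [])
            if len(links) > max_fanout:
                continue
            pulse = level * decay
            if pulse < threshold:
                continue
            for neighbour in links:
                if neighbour not in activation:
                    # several paths reaching one node in the same wave add up
                    next_frontier[neighbour] += pulse
        if not next_frontier:
            break
        activation.update(next_frontier)
        frontier = next_frontier
    return activation


# Hypothetical toy network linking a proposal to funding agencies.
grant_net = {
    "proposal": ["topic:AI", "topic:neuroscience"],
    "topic:AI": ["agency:NSF", "agency:DARPA"],
    "topic:neuroscience": ["agency:NIH", "agency:NSF"],
}
scores = constrained_spread(grant_net, ["proposal"])
agencies = sorted((n for n in scores if n.startswith("agency:")),
                  key=lambda n: -scores[n])
```

In this toy run the agency reachable through both topics accumulates the most activation, which is the kind of inference about unstated user goals the abstract describes.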
DEVELOPMENT OF THE CODER SYSTEM: A TESTBED FOR ARTIFICIAL INTELLIGENCE
METHODS IN INFORMATION RETRIEVAL
Edward A. Fox, Department of Computer Science, Virginia Tech, Blacksburg, VA
24061
The CODER (COmposite Document Expert/Extended/Effective Retrieval) system
is a testbed for investigating the application of artificial intelligence
methods to increase the effectiveness of information retrieval systems.
Particular attention is being given to analysis and representation of
heterogeneous documents, such as electronic mail digests or messages, which
vary widely in style, length, topic, and structure. Since handling passages
of various types in these collections is difficult even for experimental
systems like SMART, it is necessary to turn to other techniques being
explored by information retrieval and artificial intelligence researchers.
The CODER system architecture involves communities of experts around active
blackboards, accessing knowledge bases that describe users, documents, and
lexical items of various types. The initial lexical knowledge base
construction work is now complete, and experts for search and time/date
handling can perform a variety of processing tasks. User information and
queries are being gathered, and a simple distributed skeletal system is
operational. It appears that a number of artificial intelligence techniques
are needed to best handle such common but complex document analysis and
retrieval tasks.
(INFORMATION PROCESSING & MANAGEMENT, Vol. 23, No. 4, pp. 341-366, 1987)
USER MODELING IN INTELLIGENT INFORMATION RETRIEVAL
Giorgio Brajnik, Giovanni Guida, and Carlo Tasso, Laboratorio di Intelligenza
Artificiale, Dipartimento di Matematica e Informatica, Universita di Udine,
Udine, Italy
The issue of exploiting user modeling techniques in the framework of
cooperative interfaces to complex artificial systems has recently received
increasing attention. In this paper we present the IR-NLI II system, an
expert interface that allows casual users to access online information
retrieval systems and encompasses user modeling capabilities. More
specifically, an illustration of the user modeling subsystem is given by
describing the organization of the user model proposed for the particular
application area, together with its use during system operation. The
techniques utilized for the construction of the model are presented as
well. They are based on the use of stereotypes, which are descriptions of
typical classes of users. More specifically, they include both declarative
and procedural knowledge for describing the features of the class to which
the stereotype is related, for assigning a user to that class, and for
acquiring and validating the necessary information during system operation.
(INFORMATION PROCESSING & MANAGEMENT, Vol. 23, No. 4, pp. 305-320, 1987)
A PROTOTYPE OF AN INTELLIGENT SYSTEM FOR INFORMATION RETRIEVAL: IOTA
Y. Chiaramella and B. Defude,
Laboratoire IMAG ``Genie Informatique,''
BP 68-38402 St. Martin d'Heres,
France
Recent results in artificial intelligence research are of prime interest in
various fields of computer science; in particular, we think information
retrieval may benefit from significant advances in this approach. Expert
systems seem to be valuable tools for components of information retrieval
systems related to semantic inference. The query component is the one we
consider in this paper. IOTA is the name of the resulting prototype
presented here, which is our first step toward what we call an intelligent
system for information retrieval.
After explaining what we mean by this concept and presenting current
studies in the field, the presentation of IOTA begins with the architecture
problem, that is, how to put together a declarative component, such as an
expert system, and a procedural component, such as an information retrieval
system. Then we detail our proposed solution, which is based on a
procedural expert system acting as the general scheduler of the entire
query processing. The main steps of natural language query processing are
then described according to the order in which they are processed, from
the initial parsing of the query to the evaluation of the answer. The
distinction between expert tasks and nonexpert tasks is emphasized. The
paper ends with experimental results obtained from a technical corpus, and
a conclusion about current and future developments.
(INFORMATION PROCESSING & MANAGEMENT, Vol. 23, No. 4, pp. 285-303, 1987)
TEXT SIGNATURES BY SUPERIMPOSED CODING OF LETTER TRIPLETS AND QUADRUPLETS
Friedrich Gebhardt, Gesellschaft fur Mathematik und Datenverarbeitung mbH,
D-5205 St Augustin, West Germany
Text signatures are a condensed, coded form of a text; due to the reduced
length, information is retrieved faster than with the full text if
inverted files are not available. It has been proposed to base a particular
form of signatures, the superimposed coding, on letter triplets (or
quadruplets) rather than on complete words, admitting in this way the
masking of searchwords. This situation is analyzed here theoretically,
considering the unequal occurrence probabilities of the triplets; the
results are compared with a set of experiments. It turns out that the
signatures based on letter triplets produce too many false associations,
since the triplets occur in words other than the searchword. With
quadruplets, the number of false associations might be tolerable.
(INFORMATION SYSTEMS, Vol. 12, No. 2, pp. 151-156, 1987)
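The superimposed-coding idea can be sketched in a few lines. This is a minimal illustration, not Gebhardt's setup: the 512-bit signature width, 3 bits per n-gram, SHA-256 as the bit-selection hash, and blanks as word-boundary markers are all assumptions. A text signature is the bitwise OR of the bit patterns of all its letter n-grams; a searchword is a candidate match when every bit of its own pattern is set, and a truncated ("masked") searchword simply omits the trailing boundary n-gram.

```python
import hashlib

M_BITS = 512  # signature width in bits (assumption)
K_BITS = 3    # bits set per letter n-gram (assumption)


def letter_ngrams(word, n=3, pad_end=True):
    """Letter n-grams of a word, with a blank marking the word boundaries.

    pad_end=False drops the trailing blank, so a truncated ("masked")
    searchword such as "retriev" only matches word beginnings.
    """
    w = " " + word.lower() + (" " if pad_end else "")
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def ngram_bits(gram, m=M_BITS, k=K_BITS):
    """Deterministically map one n-gram to up to k bit positions in [0, m)."""
    digest = hashlib.sha256(gram.encode("utf-8")).digest()
    return {int.from_bytes(digest[2 * i:2 * i + 2], "big") % m for i in range(k)}


def superimpose(grams, m=M_BITS, k=K_BITS):
    """OR together the bit patterns of all n-grams (superimposed coding)."""
    sig = 0
    for g in grams:
        for b in ngram_bits(g, m, k):
            sig |= 1 << b
    return sig


def text_signature(words, n=3):
    return superimpose(set().union(*(letter_ngrams(w, n) for w in words)))


def query_signature(searchword, n=3, truncated=False):
    return superimpose(letter_ngrams(searchword, n, pad_end=not truncated))


def may_contain(text_sig, query_sig):
    """True if every query bit is set: a candidate, possibly a false drop."""
    return (text_sig & query_sig) == query_sig


doc = text_signature("superimposed coding of letter triplets".split())
hit = may_contain(doc, query_signature("coding"))
masked_hit = may_contain(doc, query_signature("superim", truncated=True))

# A false association at the triplet level: every triplet of "the"
# (" th", "the", "he ") occurs in "there" or "she", although "the" does not.
false_assoc = may_contain(text_signature(["there", "she"]), query_signature("the"))
```

With quadruplets the same toy case no longer matches at the n-gram level, since the 4-gram "the " occurs in neither "there" nor "she"; this mirrors, on one small example, the abstract's finding that quadruplets cause fewer false associations. Bit collisions in the signature can still produce occasional false drops.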
CONCEPT RECOGNITION IN AN AUTOMATIC TEXT-PROCESSING SYSTEM FOR THE LIFE
SCIENCES
Natasha Vieduts-Stokolov, BIOSIS, 2100 Arch Street, Philadelphia, PA 19103
This article describes a natural-language text-processing system designed
as an automatic aid to subject indexing at BIOSIS. The intellectual
procedure the system should model is a deep indexing with a controlled
vocabulary of biological concepts - Concept Headings (CHs). On average, ten
CHs are assigned to each article by BIOSIS indexers. The automatic
procedure consists of two stages: (1) translation of natural-language
biological titles into title-semantic representations, which are in the
constructed formalized language of Concept Primitives, and (2) translation
of the latter representations into the language of CHs. The first stage is
performed by matching the titles against the system's Semantic Vocabulary
(SV). The SV currently contains approximately 15,000 biological
natural-language terms and their translations in the language of Concept
Primitives. For ambiguous terms, the SV contains algorithmic rules of term
disambiguation, rules based on semantic analysis of the contexts. The
second stage of the automatic procedure is performed by matching the title
representations against the CH definitions, formulated as Boolean search
strategies in the language of Concept Primitives. Three experiments
performed with the system and their results are described. The most
typical problems the system encounters, the problems of lexical and
situational ambiguities, are discussed. The disambiguation techniques
employed are described and demonstrated in many examples.
(JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38, No. 4,
pp. 269-287, 1987)
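The two-stage procedure can be illustrated with a toy sketch. Everything in it is hypothetical: a handful of Semantic Vocabulary entries mapping title words to invented Concept Primitives, and three invented CH labels whose definitions are Boolean expressions over those primitives. The term-disambiguation rules the abstract mentions are not modelled; each word simply maps to a fixed set of primitives.

```python
import re

# Stage 1 data: hypothetical Semantic Vocabulary entries (word -> primitives).
SEMANTIC_VOCABULARY = {
    "rat": {"RODENT"}, "rats": {"RODENT"},
    "liver": {"LIVER"}, "hepatic": {"LIVER"},
    "stomach": {"STOMACH"}, "gastric": {"STOMACH"},
    "tumor": {"NEOPLASM"}, "tumors": {"NEOPLASM"}, "carcinoma": {"NEOPLASM"},
}

# Stage 2 data: hypothetical CH definitions as Boolean expressions.
# A string is a primitive; a tuple is (operator, operand, ...).
CONCEPT_HEADINGS = {
    "CH_DIGESTIVE_NEOPLASM": ("AND", "NEOPLASM", ("OR", "LIVER", "STOMACH")),
    "CH_OTHER_NEOPLASM": ("AND", "NEOPLASM", ("NOT", ("OR", "LIVER", "STOMACH"))),
    "CH_RODENTS": "RODENT",
}


def title_primitives(title):
    """Stage 1: translate a title into a set of Concept Primitives."""
    primitives = set()
    for token in re.findall(r"[a-z]+", title.lower()):
        primitives |= SEMANTIC_VOCABULARY.get(token, set())
    return primitives


def evaluate(expr, primitives):
    """Evaluate a Boolean CH definition against a title representation."""
    if isinstance(expr, str):
        return expr in primitives
    op, *args = expr
    if op == "AND":
        return all(evaluate(a, primitives) for a in args)
    if op == "OR":
        return any(evaluate(a, primitives) for a in args)
    if op == "NOT":
        return not evaluate(args[0], primitives)
    raise ValueError(f"unknown operator {op!r}")


def assign_chs(title):
    """Stage 2: return the CHs whose definitions the title satisfies."""
    primitives = title_primitives(title)
    return sorted(ch for ch, expr in CONCEPT_HEADINGS.items()
                  if evaluate(expr, primitives))


print(assign_chs("Hepatic carcinoma in aging rats"))
```

Keeping the CH definitions as data rather than code means an indexer could, in principle, revise a heading's Boolean strategy without touching the matcher.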
PROBABILISTIC RETRIEVAL AND COORDINATION LEVEL MATCHING
Robert Losee, School of Library Science, University of North Carolina,
Chapel Hill, NC 27514
Probabilistic models of document-retrieval systems incorporating sequential
learning through relevance feedback may require frequent and time-consuming
reevaluations of documents. Coordination level matching is shown to provide
document rankings equivalent to those of binary models when term
discrimination values are equal for all terms; this condition may be found,
for example, in probabilistic systems with no feedback. A nearest-neighbor
algorithm is presented which allows probabilistic sequential models
consistent with two-Poisson or binary-independence assumptions to easily
locate the ``best'' document using temporary sets of documents at a given
coordination level. Conditions under which reranking is unnecessary are
given.
(JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38, No. 4,
pp. 239-244, 1987)
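The equivalence claim has a one-line reason: if every matching query term contributes the same weight w > 0, a binary-model score is just w times the coordination level, so both produce the same ordering. The sketch below checks this on a hypothetical toy collection and also groups documents by coordination level. It illustrates only the equivalence and the grouping; it is not Losee's nearest-neighbor algorithm, and the weights are invented.

```python
from collections import defaultdict


def coordination_level(query, doc):
    """Number of query terms the document contains."""
    return len(query & doc)


def weighted_score(query, doc, weights):
    """Binary-model style score: sum of the weights of matching query terms."""
    return sum(weights[t] for t in query & doc)


def ranking(docs, score):
    """Document names by descending score; sorted() is stable, so ties
    keep collection order."""
    return sorted(docs, key=lambda name: -score(docs[name]))


def levels(docs, query):
    """Temporary sets of documents, one per coordination level."""
    groups = defaultdict(list)
    for name, terms in docs.items():
        groups[coordination_level(query, terms)].append(name)
    return dict(groups)


query = {"retrieval", "feedback", "poisson"}
docs = {  # hypothetical toy collection
    "d1": {"retrieval", "feedback", "clustering"},
    "d2": {"retrieval", "feedback", "poisson"},
    "d3": {"poisson", "indexing"},
}
equal = dict.fromkeys(query, 0.7)                               # equal weights
unequal = {"retrieval": 0.2, "feedback": 0.2, "poisson": 3.0}  # invented

by_coord = ranking(docs, lambda d: coordination_level(query, d))
by_equal = ranking(docs, lambda d: weighted_score(query, d, equal))
by_unequal = ranking(docs, lambda d: weighted_score(query, d, unequal))
```

Here `by_equal` matches `by_coord`, while the unequal weights move d3 above d1, which is exactly the situation in which coordination level matching stops being a safe substitute.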
OPTIMAL DETERMINATION OF USER-ORIENTED CLUSTERS
Vijay V. Raghavan, The Center for Advanced Computer Studies, University of
Southwestern Louisiana, Lafayette, LA 70504-4330 and Jitender S. Deogun,
Department of Computer Science, University of Nebraska, Lincoln, NE
68588-0115
User-oriented clustering schemes enable the classification of documents
based upon the user perception of the similarity between documents, rather
than on some similarity function presumed by the designer to represent the
user criteria. In this paper, an enhancement of such a clustering scheme is
presented. This is accomplished by the formulation of the user-oriented
clustering as a function-optimization problem. The problem formulated is
termed the Boundary Selection Problem (BSP). Heuristic approaches to solve
the BSP are proposed, and some preliminary results that motivate the need
for further evaluation of these approaches are provided.
(PROCEEDINGS OF THE TENTH ANNUAL INTERNATIONAL ACM-SIGIR CONFERENCE ON
RESEARCH & DEVELOPMENT IN INFORMATION RETRIEVAL, New Orleans, LA, USA,
pp. 140-146, 1987)
PROBABILISTIC SEARCH TERM WEIGHTING--SOME NEGATIVE RESULTS
Norbert Fuhr and Peter Muller, TH Darmstadt, Fachbereich Informatik, 6100
Darmstadt, West Germany
The effect of probabilistic search term weighting on the improvement of
retrieval quality has been demonstrated in various experiments described
in the literature. In this paper, we investigate the feasibility of this
method for Boolean retrieval with terms from a prescribed indexing
vocabulary. This is quite a different test setting from other experiments,
where linear retrieval with free-text terms was used. The experimental
results show that in our case no improvement over a simple coordination
match function can be achieved. On the other hand, models based on
probabilistic indexing outperform the ranking procedures using search term
weights.
(PROCEEDINGS OF THE TENTH ANNUAL INTERNATIONAL ACM-SIGIR CONFERENCE ON
RESEARCH & DEVELOPMENT IN INFORMATION RETRIEVAL, New Orleans, LA, USA,
pp. 13-18, 1987)
NON-HIERARCHIC DOCUMENT CLUSTERING USING THE ICL DISTRIBUTED ARRAY PROCESSOR
Edie M. Rasmussen and Peter Willett, Department of Information Studies,
University of Sheffield, Western Bank, Sheffield S10 2TN, U.K.
This paper considers the suitability and efficiency of a highly parallel
computer, the ICL Distributed Array Processor (DAP), for document
clustering. Algorithms are described for the implementation of the
single-pass and reallocation clustering methods on the DAP and on a
conventional mainframe computer. These methods are used to classify the
Cranfield, Vaswani and UKCIS document test collections. The results suggest
that the parallel architecture of the DAP is not well suited to the
variable-length records which characterize bibliographic data.
(PROCEEDINGS OF THE TENTH ANNUAL INTERNATIONAL ACM-SIGIR CONFERENCE ON
RESEARCH & DEVELOPMENT IN INFORMATION RETRIEVAL, New Orleans, LA, USA,
pp. 132-139, 1987)
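As background on the first of the two methods, here is a minimal sequential sketch of single-pass clustering, the conventional-computer form rather than the DAP implementation the paper studies. Assumptions: documents are sets of index terms with binary weights, clusters are represented by their centroids, similarity is the cosine coefficient, and the 0.5 threshold is arbitrary. The reallocation method is not shown.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=0.5):
    """Sequential single-pass clustering.

    docs is a sequence of (doc_id, set_of_terms). Each document joins the
    cluster whose centroid is most similar if that similarity reaches
    `threshold`; otherwise it starts a new cluster. The result depends on
    the order in which documents are presented.
    """
    clusters = []  # each: {"members": [...], "totals": Counter of term counts}
    for doc_id, terms in docs:
        vec = {t: 1.0 for t in terms}
        best, best_sim = None, -1.0
        for cl in clusters:
            size = len(cl["members"])
            centroid = {t: c / size for t, c in cl["totals"].items()}
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = cl, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(doc_id)
            best["totals"].update(terms)
        else:
            clusters.append({"members": [doc_id], "totals": Counter(terms)})
    return [cl["members"] for cl in clusters]


docs = [  # hypothetical toy documents
    ("d1", {"cluster", "parallel", "array"}),
    ("d2", {"cluster", "parallel", "processor"}),
    ("d3", {"poisson", "weighting"}),
    ("d4", {"weighting", "poisson", "feedback"}),
]
clusters = single_pass(docs)
```

The inner loop over every existing centroid is the part a parallel machine would try to do at once; the variable-length term sets it iterates over hint at why the abstract reports a poor fit with the DAP's fixed-width array.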
QUALITY OF INDEXING IN ONLINE DATA BASES
Howard D. White and Belver C. Griffith, College of Information Studies,
Drexel University, Philadelphia, PA 19104
We describe practical tests by which the quality of subject indexing in
online bibliographic data bases can be compared and judged. The tests are
illustrated with 18 clusters of documents from the medical behavioral
science literature and with terms drawn from MEDLINE, PsycINFO, BIOSIS, and
Excerpta Medica. Each test involves obtaining a cluster of about five
documents known on some grounds to be related in subject matter, and
retrieving their descriptors from at least two data bases. We then tabulate
the average number of descriptors applied to the documents, the number of
descriptors applied to all and to a majority of the documents in the
cluster, and the relative rarity of the applied descriptors. Comparable
statistics emerge on how each data base links related documents and
discriminates broadly and finely among documents. We also gain qualitative
insights into the expressiveness and pertinence of the available indexing
terms.
(INFORMATION PROCESSING & MANAGEMENT, Vol. 23, No. 3, pp. 211-224, 1987)
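The tabulation step can be sketched for one cluster and one data base. This is an illustration under assumptions: the descriptors and the three-document cluster are hypothetical, "majority" is taken as strictly more than half the documents, and relative rarity is approximated by the mean number of postings per applied descriptor in that data base (lower means rarer). The paper's exact rarity measure may differ.

```python
from collections import Counter
from statistics import mean


def cluster_descriptor_stats(indexing, postings=None):
    """Tabulate descriptor statistics for one document cluster in one data base.

    indexing: dict doc_id -> set of descriptors assigned by that data base.
    postings: optional dict descriptor -> number of postings in the data base,
              used here (as an assumption) as a proxy for relative rarity.
    """
    n_docs = len(indexing)
    counts = Counter(d for descs in indexing.values() for d in descs)
    stats = {
        "avg_descriptors": mean(len(descs) for descs in indexing.values()),
        "applied_to_all": sum(1 for c in counts.values() if c == n_docs),
        "applied_to_majority": sum(1 for c in counts.values() if c > n_docs / 2),
    }
    if postings is not None:
        stats["mean_postings"] = mean(postings[d] for d in counts)
    return stats


# Hypothetical indexing of a three-document cluster in one data base.
indexing = {
    "a": {"Stress", "Hypertension", "Rats"},
    "b": {"Stress", "Hypertension"},
    "c": {"Stress", "Blood Pressure"},
}
postings = {"Stress": 1000, "Hypertension": 500, "Rats": 2000, "Blood Pressure": 500}
stats = cluster_descriptor_stats(indexing, postings)
```

Running the same function on the descriptors another data base assigned to the same cluster gives directly comparable figures, which is the point of the tests the abstract describes.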
------------------------------
END OF IRList Digest
********************