[comp.ai.nlang-know-rep] NL-KR Digest Volume 5 No. 15

nl-kr-request@CS.ROCHESTER.EDU (NL-KR Moderator Brad Miller) (09/13/88)

NL-KR Digest             (9/12/88 21:11:42)            Volume 5 Number 15

Today's Topics:
        Re: open/closed classes
        nl evaluation workshop
        Data Wanted:
        Re: GPSG parsers
        
Submissions: NL-KR@CS.ROCHESTER.EDU 
Requests, policy: NL-KR-REQUEST@CS.ROCHESTER.EDU
----------------------------------------------------------------------

Date: Thu, 1 Sep 88 07:54 EDT
From: Bruce E. Nevin <bnevin@cch.bbn.com>
Subject: open/closed classes

There is some recent work by Leonard Talmy on the supposed cognitive
whys and wherefores of open vs closed classes.  Sorry, I don't have a
reference handy.

The supposition that a speech recognizer has to be especially good at
hearing closed-class words misses an important point:  the closed-class
words are unstressed and generally subject to reduction in phonemic--how
shall I say--extent.  This is part of a general process, apparently in
all languages, of reducing the phonemic representation of words that
carry less information.  They are reducible to the extent that they are
redundant.  (Not much difficulty predicting the filler in the context
'He __ gone.' You only need enough phonemic content to distinguish the
words 'has, had, was' plus of course more obvious--and less reduced--
constructions incorporating these such as their negatives, `will have
gone', etc.)

Historically, closed-class morphology derives from open-class words that
have become more redundant and predictable, so that their reduced forms
become `frozen' in their now predictable contexts.  An example is the
suffix -hood in `childhood', from an earlier form `had' meaning `state',
something like `child-state'.  The suffix -ly in adverbs of manner
derives from the dative of a word for `form, body'.  Ancestors of
Proto-Indo-European not having been reconstructed, we have no
confirmation that this is the origin of inflectional morphology such as
the preterit in descendant languages like English, but that is certainly
the most plausible assumption.  In American Indian languages, Shirley
Silver dubbed this process `morphemization' almost 20 years ago.

So affixes (inherently closed-class morphology) appear to be derived by
reduction from once free-standing words.  Similarly for closed-class
words.  `Because' derives from `by cause'.  OED cites 1305 `bi cause
whi'; whi or `why' is the instrumental of the wh- pronouns typified by
`what', reduced to `that' in the later `by cause that, because that'.
(Compare reduction of cause to zero in `for the cause why' --> `forwhy',
a common conjunction now obsolete, to which compare further `from the
place where' --> `from where'.)  Zeroing of `why ~ that' in `because
why, because that' leaves `because' as a conjunction, a closed-class
word.  (See Jespersen _Modern English Grammar on Historical Principles_
V 397 and Harris _A Grammar of English on Mathematical Principles_ 195
for further details.)

An example currently in progress in English is `going to' --> `gonna', a
reduction that takes place before verbs but not before nouns (*`I'm
gonna New York')  precisely because `going to' can occur before the
whole class of verbs (and consequently carries less information and is
subject to reduction there)  but cannot occur before every possible
noun.  (Note that in e.g.  `I'm going to authority' an indefinite noun,
one of exceptionally broad distribution, can be understood as having
been elided:  `I'm going to someone of/in authority'.  It is not
possible to reverse a reduction in this way to account for the broad
distribution of `going to' before verbs.)  This appears to be on the way
to being a separate future tense morpheme in the closed-class set.

The above example of `forwhy' illustrates that closed-class words also
become obsolete and drop from the language.  The class is closed with
respect to distribution, and conservative but not closed with respect to
change.

Bruce Nevin
bn@cch.bbn.com
<usual_disclaimer>


------------------------------

Date: Fri, 2 Sep 88 12:19 EDT
From: palmer@PRC.Unisys.COM
Subject: nl evaluation workshop

                   CALL FOR PARTICIPATION

                        Workshop on
     Evaluation of Natural Language Processing Systems

                          Dec 8-9
           Wayne Hotel, Wayne, PA (Philadelphia)

     There has been much recent interest  in  the  difficult
problem  of  evaluating  natural language systems.  With the
exception of natural language interfaces there are few work-
ing systems in existence, and they tend to be concerned with
very different tasks and use equally  different  techniques.
There  has been little agreement in the field about training
sets and test sets, or  about  clearly  defined  subsets  of
problems  that  constitute standards for different levels of
performance.  Even those groups that have attempted a  meas-
ure of self-evaluation have often been reduced to discussing
a system's performance in isolation - comparing its  current
performance  to  its  previous  performance  rather  than to
another system. As this technology  begins  to  move  slowly
into  the  marketplace, the need for useful evaluation tech-
niques is becoming more and more obvious.  The  speech  com-
munity  has  made some recent progress toward developing new
methods of evaluation, and  it  is  time  that  the  natural
language  community followed suit.  This is much more easily
said than done and will require a concentrated effort on the
part of the field.

     There are certain premises that should underlie any discussion
of evaluation of natural language processing systems:

(1)  It should be possible to  discuss   system   evaluation
     in   general  without  having to state whether the pur-
     pose  of the  system is  "question-answering" or  "text
     processing."     Evaluating   a   system   requires the
     definition of an  application  task  in  terms  of  I/O
     pairs   which   are  equally  applicable  to  question-
     answering, text processing, or generation.

(2)  There are two basic types of evaluation: a) "black  box
     evaluation"  which  measures  system  performance  on a
     given task in terms of well-defined I/O pairs;  and  b)
     "glass  box  evaluation"  which  examines  the internal
     workings of the system.  For example,  glass  box  per-
     formance   evaluation   for  a  system that is supposed
     to  perform semantic  and  pragmatic   analysis  should
     include the  examination  of  predicate-argument  rela-
     tions,  referents,  and temporal and causal relations.
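
     As a rough illustration of premise (2a), black box evaluation
over I/O pairs might be sketched as below.  This is only a minimal
sketch under stated assumptions: the names black_box_score, system_a,
system_b and IO_PAIRS are hypothetical, the two "systems" are toy
stand-ins, and exact string match is assumed as the scoring criterion,
which a real task definition would have to replace with its own.

```python
from typing import Callable, Iterable, Tuple

# Assumption: a system is viewed purely as a function from input text
# to output text; nothing about its internals is inspected.
System = Callable[[str], str]


def black_box_score(system: System, io_pairs: Iterable[Tuple[str, str]]) -> float:
    """Return the fraction of I/O pairs the system reproduces exactly."""
    pairs = list(io_pairs)
    if not pairs:
        raise ValueError("need at least one I/O pair")
    correct = sum(1 for text, expected in pairs if system(text) == expected)
    return correct / len(pairs)


# Toy stand-ins for two different systems (hypothetical, for illustration).
def system_a(text: str) -> str:
    return text.upper()


def system_b(text: str) -> str:
    return text.strip().upper()


# A shared test set: the same I/O pairs are applied to every system.
IO_PAIRS = [("hello", "HELLO"), (" hi ", "HI"), ("ok", "OK")]

if __name__ == "__main__":
    for name, system in [("A", system_a), ("B", system_b)]:
        print(name, black_box_score(system, IO_PAIRS))
```

Because both systems are scored against the same pairs, the numbers
compare one system to another rather than to its own earlier
performance.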

     Given these premises, the workshop will be structured around
the following three sessions:

1)   Defining "glass box evaluation" and "black box evaluation."

2)   Defining criteria for "black box evaluation."  _A Proposal for
     establishing task oriented benchmarks for NLP Systems_
     (Session Chair - Beth Sundheim)

3)   Defining criteria for "glass box evaluation."
     (Session Chair - Jerry Hobbs)

     Several different types of systems will be discussed, including
question-answering systems, text processing systems and generation
systems.

     Researchers interested in participating  are  requested
to  submit  a  short  (250-500  word)  description  of their
experience and interests, and what they could contribute  to
the  workshop.  In particular, if they have been involved in
any evaluation efforts that they would like  to  report  on,
they  should  include  a  short abstract (500-1000 words) as
well. The number of participants at  the  workshop  must  be
restricted  due  to limited room size.  The descriptions and
abstracts will be reviewed by the following committee:  Mar-
tha  Palmer  (Unisys),  Mitch Marcus (University of Pennsyl-
vania), Beth Sundheim  (NOSC),  Ed  Hovy  (ISI),  Tim  Finin
(Unisys),  Lynn  Bates  (BBN).   Submissions should arrive at the
address given below no later than October 1st.  Responses to
all  who  submit  abstracts  or descriptions will be sent by
November 1st.

                       Martha Palmer
                           Unisys
                   Research & Development
                         PO Box 517
                      Paoli, PA 19301
                   palmer@prc.unisys.com
                       (215) 648-7228


------------------------------

Date: Mon, 5 Sep 88 16:36 EDT
From: Mark William Hopkins <markh@csd4.milw.wisc.edu>
Subject: Data Wanted:

     I am in need of some English text, for setting up a data base.  If you
have any to contribute please e-mail it to me.

     I asked Jerry Lewis to set up a telethon for this, but he said he was 
busy :-)

------------------------------

Date: Mon, 12 Sep 88 08:02 EDT
From: COR_HVH%HNYKUN52.BITNET@CUNYVM.CUNY.EDU
Subject:  GPSG parsers

Some time ago I asked for information on GPSG parsers (or parser-generators)
and promised to report any replies. Up to now, I have been notified of two
efforts in this area.

At the Technical University in Berlin a PROLOG system is being developed in
a machine translation context (Eurotra). It is able to parse and generate
sentences according to a small English or a medium-sized German grammar.

At Boeing work is under way on a LISP GPSG parser with the eventual aim of
automatic message processing. The system can parse English sentences
using a fairly large grammar and dictionary.

Neither system uses "pure" GPSG (if such a thing exists at all), the most
important difference being the absence of metarules.

I will ask both my contacts to write up their work in more detail and
submit it to this list.

Hans van Halteren             COR_HVH@HNYKUN52.BITNET

------------------------------

End of NL-KR Digest
*******************