palmer@PRC.UNISYS.COM (09/02/88)
CALL FOR PARTICIPATION
Workshop on
Evaluation of Natural Language Processing Systems
Dec 8-9
Wayne Hotel, Wayne, PA (Philadelphia)
There has been much recent interest in the difficult problem of evaluating natural language systems. With the exception of natural language interfaces, there are few working systems in existence, and those that exist tend to be concerned with very different tasks and to use equally different techniques. There has been little agreement in the field about training sets and test sets, or about clearly defined subsets of problems that constitute standards for different levels of performance. Even those groups that have attempted a measure of self-evaluation have often been reduced to discussing a system's performance in isolation, comparing its current performance to its previous performance rather than to another system. As this technology begins to move slowly into the marketplace, the need for useful evaluation techniques is becoming more and more obvious. The speech community has made some recent progress toward developing new methods of evaluation, and it is time that the natural language community followed suit. This is much more easily said than done and will require a concentrated effort on the part of the field.
There are certain premises that should underlie any discussion of the evaluation of natural language processing systems:
(1) It should be possible to discuss system evaluation in general without having to state whether the purpose of the system is "question-answering" or "text processing." Evaluating a system requires the definition of an application task in terms of I/O pairs that are equally applicable to question-answering, text processing, or generation.

(2) There are two basic types of evaluation: a) "black box evaluation," which measures system performance on a given task in terms of well-defined I/O pairs; and b) "glass box evaluation," which examines the internal workings of the system. For example, glass box performance evaluation for a system that is supposed to perform semantic and pragmatic analysis should include the examination of predicate-argument relations, referents, and temporal and causal relations. (A minimal illustrative sketch of black box scoring over I/O pairs follows below.)
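To make the notion of black box evaluation over I/O pairs concrete, here is a minimal sketch, not part of the original announcement, of a harness that scores a system purely on input/output pairs. The toy test set, the nlp_system callable, and the exact-match metric are illustrative assumptions, not a proposed benchmark.

    # Minimal sketch of black box evaluation: the system is scored only on
    # well-defined I/O pairs, with no reference to its internal workings.
    # The task, test set, and exact-match metric here are assumptions
    # chosen for illustration only.

    def black_box_score(nlp_system, io_pairs):
        """Return the fraction of inputs whose output exactly matches the reference."""
        correct = sum(1 for text, expected in io_pairs if nlp_system(text) == expected)
        return correct / len(io_pairs)

    if __name__ == "__main__":
        # Toy benchmark: defined entirely by I/O pairs, never by the
        # system's internal design (that would be glass box territory).
        test_set = [
            ("Who chairs the black box session?", "Beth Sundheim"),
            ("Who chairs the glass box session?", "Jerry Hobbs"),
        ]
        toy_system = lambda q: "Beth Sundheim" if "black" in q else "Jerry Hobbs"
        print("Black-box accuracy:", black_box_score(toy_system, test_set))

A glass box evaluation of the same system would instead inspect intermediate representations, such as the predicate-argument relations or referents it computes, rather than only the final outputs.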
Given these premises, the workshop will be structured around the following three sessions:

1) Defining "glass box evaluation" and "black box evaluation."
2) Defining criteria for "black box evaluation": A Proposal for Establishing Task-Oriented Benchmarks for NLP Systems (Session Chair - Beth Sundheim)
3) Defining criteria for "glass box evaluation." (Session Chair - Jerry Hobbs)

Several different types of systems will be discussed, including question-answering systems, text processing systems, and generation systems.
Researchers interested in participating are requested to submit a short (250-500 word) description of their experience and interests, and of what they could contribute to the workshop. In particular, if they have been involved in any evaluation efforts that they would like to report on, they should include a short abstract (500-1000 words) as well. The number of participants at the workshop must be restricted due to limited room size. The descriptions and abstracts will be reviewed by the following committee: Martha Palmer (Unisys), Mitch Marcus (University of Pennsylvania), Beth Sundheim (NOSC), Ed Hovy (ISI), Tim Finin (Unisys), Lynn Bates (BBN). Submissions should arrive at the address given below no later than October 1st. Responses to all who submit abstracts or descriptions will be sent by November 1st.
Martha Palmer
Unisys
Research & Development
PO Box 517
Paoli, PA 19301
palmer@prc.unisys.com
(215) 648-7228