[comp.ai.nlang-know-rep] NL-KR Digest Volume 3 No. 57

nl-kr-request@CS.ROCHESTER.EDU (NL-KR Moderator Brad Miller) (12/05/87)
NL-KR Digest             (12/04/87 20:08:14)            Volume 3 Number 57

Today's Topics:
        Knowledge-based bibliographies
        Wanted: a module for natural language interface (in LISP)
        Text Encoding Standard for the Humanities - Vassar Workshop report
        DCG
        Re:  measures of "Englishness"
        Re: Lip Movement and Mental Lexicons?
        Re: Language Learning
        Re: Language Learning (a Turing test)
        Re: Language Learning (anecdotes)
        Re: Language Learning (anecdotes)
        
----------------------------------------------------------------------

Date: Wed, 2 Dec 87 09:01 EST
From: Roland Zito-Wolf <RJZ@JASPER.Palladian.COM>
Subject: Knowledge-based bibliographies


I am looking for references regarding knowledge-bases and KB-based tools
for organizing a bibliographic database on AI. I want to be able to retrieve
references by various indices.

Specific issues I'd like to know about:
	- friendly data entry
	- searching through alternate paths (say, finding articles related 
	  to a given article in some way: by author, topic, system name,
	  etc.)
	- ability to "evolve" the structure of the KB with time
	- what is a reasonable conceptual structure for reference databases, in
	  general?

I'll post a digest of responses to the list.

Roland J. Zito-Wolf
Palladian Software
4 Cambridge Center
Cambridge, Mass 02142
617-661-7171
RJZ%JASPER@LIVE-OAK.LCS.MIT.EDU

------------------------------

Date: Wed, 2 Dec 87 12:54 EST
From: David Naumann <naumann@umn-cs.cs.umn.edu>
Subject: Wanted: a module for natural language interface (in LISP)


Wanted: A module for a natural language interface (in LISP)

We are developing a tool for research on systems analyst behavior. The tool
requires a natural language front end. We would like to know if anybody has,
or knows of, any natural language interface module (in LISP) that would take a
question in English, validate it and produce a parse tree.

We prefer public domain software, but are also willing to pay for it if
necessary. Please note that we have a limited budget.

Thanks for your help.

J. David Naumann
Macedonio Alanis
University of Minnesota
Management Sciences Department
Management Information Systems Area

ARPA   naumann@umn-cs
BITNET naumann@umnacvx
       alanis@umnacvx

------------------------------

Date: Wed, 2 Dec 87 22:50 EST
From: Robert Amsler <amsler@flash.bellcore.com>
Subject: Text Encoding Standard for the Humanities - Vassar Workshop report

[The following is a summary prepared by Michael Sperberg-McQueen for
the HUMANIST mailing list of the first workshop on the preparation of
an encoding standard for text in the humanities held at Vassar
College last month. As an attendee and steering committee member, I
would be willing to answer further questions concerning this effort
for the IRLIST or NL-KR communities.  The effort to develop a standard for
encoding texts in the humanities is just starting and anyone with
interest in this noble and ambitious goal should not feel the
slightest hesitancy about becoming a part of the effort.  What is at
stake is nothing less than the creation, use and preservation of our
global electronic cultural heritage - R. Amsler, (amsler@flash.bellcore.com)]

Contributor: "Michael Sperberg-McQueen" <U18189@UICVM>

A followup on the current status of the ACH effort to formulate
guidelines for text encoding practices.  

   ******************************************************************
   * NOTE: The following encoding conventions have been used to     *
   *       represent French accents throughout this message:        *
   *                                                                *
   *   To Represent Accents  --  Pour la representation des accents *
   *    /       acute accent - accent aigu                          *
   *    `       grave accent - accent grave                         *
   *                                                                *
   * The accent codes are typed    Les codes pour les accents se    *
   * AFTER the letter, and are     trouvent APRES la lettre qu'ils  *
   * used with both upper and      modifient, et s'utilisent avec   *
   * lower case letters.           les majuscules aussi bien que    *
   *                               les minuscules.                  *
   ******************************************************************
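
For readers who want to restore the accented characters mechanically, the
convention in the box above could be decoded with a short Python sketch like
the one below. This is an illustration only, not part of the workshop
materials: the function name `decode_accents` is hypothetical, and the sketch
assumes that the acute code `/` only ever follows e or E and the grave code
`` ` `` only follows a, e or u (the French cases). An ordinary slash after an
"e", as in "he/she", would be misread, so the output should be checked by eye.

```python
import re
import unicodedata

# Combining marks for the two accent codes named in the note above.
_COMBINING = {"/": "\u0301", "`": "\u0300"}  # acute, grave

# Assumption: acute only after e/E, grave only after a/e/u (either case).
_PATTERN = re.compile(r"([eE])(/)|([aeuAEU])(`)")


def decode_accents(text):
    """Replace letter+code pairs with the single accented character."""
    def repl(match):
        letter = match.group(1) or match.group(3)
        code = match.group(2) or match.group(4)
        # NFC composes letter + combining mark into one code point.
        return unicodedata.normalize("NFC", letter + _COMBINING[code])
    return _PATTERN.sub(repl, text)


if __name__ == "__main__":
    print(decode_accents("de/ja` en vigueur"))
```

Because the pattern requires a vowel immediately before the code, a string
such as "M/C 135" passes through unchanged.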


On November 12 and 13, 1987, 31 representatives of professional
societies, universities, and text archives met to consider the
possibility of developing a set of guidelines for the encoding of texts
for literary, linguistic, and historical research. The meeting was
called by the Association for Computers and the Humanities and funded
by the National Endowment for the Humanities.  The list of participants
is appended to this document.

The participants heartily endorsed the idea of developing encoding
guidelines. In order to guide such development, they agreed on
the following principles:


       The Preparation of                 Re/daction des directives
     Text Encoding Guidelines             pour le codage des textes

                         Poughkeepsie, New York
                            13 November 1987

1.  The guidelines are intended   1.  Le but des directives est de cre/er
    to provide a standard format      un format standard pour l'e/change
    for data interchange in           des donne/es utilise/es pour la
    humanities research.              recherche dans les humanite/s.

2.  The guidelines are also       2.  Les directives sugge/reront
    intended to suggest principles    e/galement des principes pour
    for the encoding of texts         l'enregistrement des textes
    in the same format.               destine/s a` utiliser ce format.

3.  The guidelines should         3.  Les directives devraient

  a.  define a recommended          a.  de/finir une syntaxe recommande/e
      syntax for the format             pour exprimer le format,

  b.  define a metalanguage         b.  de/finir un me/ta-langage
      for the description               de/crivant les syste`mes de
      of text-encoding schemes,         codage des textes,

  c.  describe the new format       c.  de/crire par le moyen de ce
      and representative                me/talangage, aussi bien qu'en
      existing schemes both in          prose, le nouveau syste`me de
      that metalanguage and             codage aussi bien qu'un choix
      in prose.                         repre/sentatif de syste`mes
                                        de/ja` en vigueur.

4.  The guidelines should         4.  Les directives devraient proposer
    propose sets of coding            des syste`mes de codage utilisables
    conventions suited for            pour un large e/ventail
    various applications.             d'applications.

5.  The guidelines should         5.  Sera incluse dans les directives
    include a minimal set of          l'e/nonciation d'un syste`me de
    conventions for encoding          codage minimum, pour guider
    new texts in the format.          l'enregistrement de nouveaux textes
                                      conforme/ment au format propose/.

6.  The guidelines are to be      6.  Le travail d'e/laboration des
    drafted by committees on:         directives sera confie/ a` quatre
                                      comite/s centre/s sur les sujets
                                      suivants:

  a.  text documentation            a.  la documentation des textes,

  b.  text representation           b.  la repre/sentation des textes,

  c.  text interpretation           c.  l'analyse et l'interpre/tation
      and analysis                      des textes

  d.  metalanguage definition       d.  la de/finition du me/talangage et
      and description of                son utilisation pour de/crire le
      existing and proposed             nouveau syste`me aussi bien que
      schemes                           ceux qui existent de/ja`.

    co-ordinated by a steering        Ce travail sera coordonne/ par un
    committee of representatives      comite/ d'organisation ou`
    of the principal                  sie`geront des repre/sentants des
    sponsoring organizations.         principales associations qui
                                      soutiennent cet effort.

7.  Compatibility with existing   7.  Dans la mesure du possible, le
    standards will be maintained      nouveau syste`me sera compatible
    as far as possible.               avec les syste`mes de codage
                                      existants.

8.  A number of large text        8.  Des repre/sentants de plusieurs
    archives have agreed in           grandes archives de textes en forme
    principle to support the          lisible par machine acceptent en
    guidelines in their function      principe d'utiliser les directives
    as an interchange format.         en tant que description des formats
    We encourage funding agencies     pour l'e/change de leurs donne/es.
    to support development of         Nous encourageons les organismes
    tools to facilitate this          qui fournissent des fonds pour la
    interchange.                      recherche de soutenir le
                                      de/veloppement de ce qui est
                                      ne/cessaire pour faciliter cela.

9.  Conversion of existing        9.  En convertissant des textes
    machine-readable texts to         lisibles par machine de/ja`
    the new format involves the       existants, on remplacera
    translation of their              automatiquement leur codage actuel
    conventions into the syntax       par ce qui est ne/cessaire pour les
    of the new format.  No            rendre conformes au format nouveau.
    requirements will be made for     Nul n'exigera l'ajout
    the addition of information       d'informations qui ne sont pas
    not already coded in the          de/ja` repre/sente/es dans ces
    texts.                            textes.

                                         (trad. P. A. Fortier)

                            ******************

The further organization and drafting of the guidelines will be
supervised by a steering committee selected by the three sponsoring
organizations:  ACH (the Association for Computers and the Humanities),
ACL (the Association for Computational Linguistics), and ALLC (the
Association for Literary and Linguistic Computing).  Drafts of the
guidelines will be submitted for comment to an editorial committee with
representatives of all participating organizations (in addition to the
sponsors, thus far:  the Modern Language Association, the Association
for Computing Machinery Special Interest Group for Information
Retrieval, and the Association of American Publishers; the following
groups have indicated interest informally but have not yet formally
pledged participation, in most cases pending a formal vote: the
Linguistic Society of America, the Association for Documentary Editing,
the American Philological Association).  The American Anthropological
Association, plus several organizations within Europe, are now being
asked to consider participation.

The interchange format defined by the guidelines is expected to be
compatible with the Standard Generalized Markup Language defined
by ISO 8879, if that proves compatible with the needs of research.  The
needs of specialized research interests will be addressed wherever it
proves possible to find interested groups or individuals to do the
necessary work and achieve the necessary consensus.  Formation of
specific working groups will be announced later; in the meantime, those
interested in working on specific problems are invited to contact
either Dr. C. M. Sperberg-McQueen, Computer Center, University of
Illinois at Chicago (M/C 135), P.O. Box 6998, Chicago IL 60680 (on
Bitnet: U18189 at UICVM), or Prof. Nancy Ide, Dept. of Computer
Science, Vassar College, Poughkeepsie NY 12601 (on Bitnet:  IDE at
VASSAR).

                                                 - N.I., C.M.S-McQ

------------------------------------------------------------------------------

                    List of Participants

  NOTE: Association names are given following the names of their
        representatives at this meeting.

   Helen Aguera, National Endowment for the Humanities
   Robert A. Amsler, Bell Communications Research
   David T. Barnard, Department of Computing and Information Science,
      Queen's University, Ontario
   Lou Burnard, Oxford Text Archive
   Roy Byrd, IBM Research
   Nicoletta Calzolari, Istituto di linguistica computazionale, Pisa
   David Chestnutt  (Assoc. for Documentary Editing, American Historical
      Assoc.), Department of History, University of South Carolina
   Yaacov Choueka (Academy of the Hebrew Language), Department of
      Mathematics and Computer Science, Bar-Ilan University
   Jacques Dendien, Institut National de la Langue Francaise
   Paul A. Fortier, Department of Romance Languages, University of
      Manitoba
   Thomas Hickey, OCLC Online Computer Library Center
   Susan Hockey  (Association for Literary and Linguistic Computing),
      Oxford University Computing Service
   Nancy M. Ide (Association for Computers and the Humanities),
      Department of Computer Science, Vassar College
   Stig Johansson, International Computer Archive of Modern English,
      University of Oslo
   Randall Jones  (Modern Language Association), Humanities Research
      Computing Center, Brigham Young University
   Robert Kraft, Center for the Computer Analysis of Texts, University of
      Pennsylvania
   Ian Lancashire, Center for Computing in the Humanities, University of
      Toronto
   D. Terence Langendoen (Linguistic Society of America), Graduate
      Center, City University of New York
   Charles (Jack) Meyers, National Endowment for the Humanities
   Junichi Nakamura, Department of Electrical Engineering, Kyoto
      University
   Wilhelm Ott, Universitaet Tuebingen
   Eugenio Picchi, Istituto di linguistica computazionale, Pisa
   Carol Risher (Association of American Publishers), Association of
      American Publishers, Inc.
   Jane Rosenberg, National Endowment for the Humanities
   Jean Schumacher, Centre de traitement e/lectronique de textes,
      Universite/ catholique de Louvain a` Louvain-la-neuve
   J. Penny Small (American Philological Association), U.S. Center for
      the Lexicon Iconographicum Mythologiae Classicae, Rutgers
      University
   C.M. Sperberg-McQueen, Computer Center, University of Illinois at
      Chicago
   Paul Tombeur, Centre de traitement e/lectronique de textes,
      Universite/ catholique de Louvain a` Louvain-la-neuve, Belgium
   Frank Tompa, New Oxford English Dictionary Project, University of
      Waterloo
   Donald E. Walker (Association for Computational Linguistics), Bell
      Communications Research
   Antonio Zampolli, Istituto di linguistica computazionale, Pisa, Italy

------------------------------

Date: Thu, 3 Dec 87 20:27 EST
From: ganguly@ATHENA.MIT.EDU
Subject: DCG

Hi! 
	Does someone have a Definite Clause Grammar parser written in
Edinburgh PROLOG that I may use as a user interface?
Thanking you in advance,


Jaideep Ganguly

------------------------------

Date: Fri, 20 Nov 87 11:46 EST
From: Bruce Nevin <bnevin@cch.bbn.com>
Subject: Re:  measures of "Englishness"

Re statistical measures of `Englishness':

A number of studies were made of admissible and inadmissible phoneme
sequences in English vocabulary in the '50s.  One application was
provision of a list of unused potential English vocabulary for new trade
names.  There may be something about this in Gleason's old textbook.
There are some examples illustrating the general method of generating
tables of next-successor phonemes or of next-successor morphemes in
words in Harris's _Methods in Structural Linguistics_ (1952).

In his 1968 book _Mathematical Structures of Language_, Harris
summarizes results of a computer test of a hypothesis made earlier in his
`From phoneme to morpheme' paper (sorry, I don't have the reference--
_Language_ in the early '50s I think).  The report of results of the
test appears in full in one of the TDAP papers from U. Penn.  The
general observation is that the number of next successors drops as you
proceed along the phoneme sequence making up a morpheme, and rises again
when you get to morpheme boundary, reflecting the relative arbitrariness
of how the next morpheme may begin.  Thus for the sentence `Dogs were
indisputably quicker', the number of next successors for each phoneme is
as follows (numbers under phonemes):

  d  o g . z . w ^ r . i  n . d  i  s . p  y u w t . ^ b . l i y .
  12 7 29  29  7 3 28  13 28  10 14 21  9  2 2 2 28  2 4   2 2 28 


  k  w i  k . ^ r .
  12 8 10 28  3 29

The dots indicating morpheme boundaries suggested by the test were not
input to the test, and are included only to clarify results.  The
boundary between the last two syllables of `indisputably' is the least
strongly indicated.  Running the test in reverse order (next
predecessors, as it were) helps confirm or eliminate marginal cases.
And all results are subject to regularization by standard distributional
methods of linguistics.
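
The successor-count procedure described above can be sketched in a few lines
of Python. This is only an illustration, not Harris's program: the toy corpus
is invented, letters stand in for phonemes, the end-of-utterance marker `#` is
counted as a possible successor, and the names `build_successor_table`,
`successor_counts` and `peak_boundaries` are made up for this sketch.

```python
from collections import defaultdict

END = "#"  # assumed end-of-utterance marker, counted as a successor


def build_successor_table(utterances):
    """Map each initial sequence to the set of symbols seen right after it."""
    table = defaultdict(set)
    for utterance in utterances:
        symbols = list(utterance) + [END]
        for i, symbol in enumerate(symbols):
            table[tuple(symbols[:i])].add(symbol)
    return table


def successor_counts(utterance, table):
    """Number of distinct next symbols after each symbol of the utterance."""
    return [len(table.get(tuple(utterance[:i]), ()))
            for i in range(1, len(utterance) + 1)]


def peak_boundaries(counts):
    """Suggest a boundary after k symbols wherever the count is a local peak."""
    return [i + 1 for i in range(1, len(counts) - 1)
            if counts[i] > counts[i - 1] and counts[i] > counts[i + 1]]


if __name__ == "__main__":
    corpus = ["dogs", "dog", "dogma", "doghouse", "dot"]  # invented toy data
    table = build_successor_table(corpus)
    counts = successor_counts("dogs", table)
    print(counts, peak_boundaries(counts))
```

On this toy corpus the count rises to its peak after "dog" and then drops,
which is the pattern the test looks for at a morpheme boundary. Running the
same functions on reversed strings gives the next-predecessor counts mentioned
above.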

I have altered the display on p. 25 of Harris's book (1) by using ^ for
schwa and (2) by estimating the numbers from his graph.  I may not have
got the numbers just right but they are certainly good enough to make
the point.

Bruce Nevin
bn@cch.bbn.com

(Disclaimer:  if you infer anything from this about the opinions of my
employer, its clients, etc, it's not by my intent, and you're on your own.)

------------------------------

Date: Wed, 25 Nov 87 17:20 EST
From: Steve Cassidy <steve@comp.vuw.ac.nz>
Subject: Re: Lip Movement and Mental Lexicons?

   Date: Sun, 15 Nov 87 10:41 EST
   From: Murray Watt <murrayw@utai.UUCP>
   Subject: Re: Lip Movement and Mental Lexicons?


   What does phonemic representation have to do with LEXICAL MEANING?
   (Phonemic meaning is all the rage in current linguistic research,
   but I think this is a different type of meaning.)
   ...
   I have never SEEN any arguments that the phonemic representation
   resides in the same location as lexical entries and I have never
   heard of a letter based lexicon in the mind. Are you sure you're not
   confusing dictionaries and the human mind? 8-)

							  Murray Watt 

   
The current `best' theory of human word processing (going from printed word to
`lexical item') is based on making analogies with stored representations of
the words based on a letter-mediated representation. That is, there are letters
in there somewhere but they may be grouped or organised in a way which is not
yet clear. 

The current best theory of reading development suggests that it is heavily
tied in with spelling development and that the same `lexical entry' is used
for both, and that there is transference between the two skills. For competent
spelling, a letter-by-letter representation of the word is needed;
sound-to-letter rules don't work well enough. Similarly, it would seem that
a phonemic representation is needed to pronounce (some) words.

So the mental lexicon should contain references to an orthographic
representation (probably close to letter strings) and a phonemic
representation.

I don't know what lexical meaning is. Do you?

The judgement as to 'best' theories above is my own.

Steve Cassidy				        ACSnet: steve@vuwcomp.nz|
Victoria University, Private Bag,  -------------------------------------|
Wellington, New Zealand	             UUCP: ...seismo!uunet!vuwcomp!steve|

"If God had meant us to be perfect, He would have made us that way"
					     - Winston Niles Roomford III

------------------------------

Date: Wed, 25 Nov 87 08:45 EST
From: Richard Wexelblat <rutgers!philabs.philips.com!rlw>
Subject: Re: Language Learning

Readers of this group might be interested in looking up the Ph.D. dissertation
of Kathy Hirsh-Pasek (Univ. of Penna., 1980+-2?) who did an extensive study of
language learning in hearing children of deaf parents.  As I recall, she
concluded that there was no statistically significant difference -- but I
don't really remember the parameters of the study.  Perhaps someone with
access to _Dissertation_Abstracts_ will look up the specific reference.
-- 

--Dick Wexelblat  {uunet|ihnp4|decvax}!philabs!rlw
		  rlw@philabs.philips.com

------------------------------

Date: Tue, 1 Dec 87 11:56 EST
From: Rick Wojcik <rwojcik@bcsaic.UUCP>
Subject: Re: Language Learning (a Turing test)


In article <2363@tut.cis.ohio-state.edu> paul@tut.cis.ohio-state.edu (Paul W. Placeway) writes:
>
>Actually, the "story" I was thinking of is similar, but with a big
>difference: I am told that Dr. Lehiste (whose native language is
>Estonian), when traveling in Germany, regularly fools native speakers
>into thinking that she is German, but from some other region.  From
>what I have been told, this effect is true, even for extended
>conversations.
>
I am familiar with your examples, since I took my undergraduate and
graduate degrees in linguistics at OSU.  Having studied Estonian with
Dr. Lehiste, a world-renowned acoustician and phonetician, I can well
believe that she fools native German speakers.  She has pointed out some
very subtle differences between Estonian & German--for example, the fact
that word-initial vowels in German are always preceded by a glottal
stop, but that those in Estonian never are.  I may be wrong, but I don't
think that she has native-like control over this aspect of German.  When
did she learn German, anyway?  That's an essential point here.  (Don't
forget that the Estonia of her childhood had close ties to Germany.)  It
is also worth noting that, despite her many years of residence in
America and her linguistic sophistication, she retains a noticeable
foreign accent in English.  Her control of English is about as good as
it can get in adult language learners.

>The similarity of dialect does not always hold either.  Elizabeth
>Zwicky does not speak the same regional dialect of SAE that I do, even
>though the two of us spent the majority of our lives growing up within
>10 miles of each other, in the same side of the same city.  Our

You miss the point.  I never said anything about the social and ethnic
factors that shape dialects.  The Columbus neighborhood that you and she
grew up in contains a mixture of Northern and Midland dialects.
Elizabeth's dialect (Northern) and yours (Midland?) are recognizably
American.

===========
Rick Wojcik   rwojcik@boeing.com

------------------------------

Date: Tue, 1 Dec 87 14:23 EST
From: goldfain@osiris.cso.uiuc.edu
Subject: Re: Language Learning (anecdotes)


I  think the  "crystallization   hypothesis"  in language   acquisition  is  a
hypothesis which by its very nature will snag people into a debate.  I think a
review of the overall nature of this hypothesis and debate  is  instructive as
to something which should be avoided whenever possible in science.

1) We have an observable phenomenon at a very high level of complexity:
   It concerns fine distinctions in natural language behavior.
2) The observations of  the phenomenon are  not  well pinned down: Researchers
   mention something about "mastery" of the language,  then sometimes back off
   and   simply  make  claims  about  phonetic  categorical  perception,  then
   shift back to discussing scores on grammar tests among people who have been
   in  a culture  for  10-20 years,  immigrating at  different  times in their
   lives, etc.

   **************************************************************************
   *  I am not saying the phenomenon isn't real!  There are observable and  *
   *  interesting phenomena here.  I am just qualifying that.               *
   **************************************************************************

3) The phenomenon *suggests* that *possibly* there is a physiological basis for
   such trends  and differences as are  observed.  To  make a really  concrete
   claim, it  *suggests* that perhaps  some maturation  process  in the normal
   human brain occurs at about mid teenage years.
4) There  are lots of  other mechanisms that are consistent  with the observed
   phenomena: a wide   range of psychological "lower-level"  factors have been
   listed in the current debate in this note file.
5) If one  really steps back and looks  at this objectively,  we can tell that
   the "experiments" ("studies" is actually a  better word) thus far performed
   and currently underway will  never help distinguish whether this phenomenon
   has a physiological basis or merely a psychological basis, or a combination
   of both (don't forget that possibility!)
6) There is a large set of anecdotal rumor floating around that is  only going
   to keep the issue cloudy.  It may keep us from the wrong conclusion, but it
   is not going to settle us down on whatever the correct answer is.

     I think the only way  to settle the  matter will have  to wait on tighter
experimentation (if it is ever judged that this issue is worth the experiments
it would  take to  settle it.)   It will require  a great  deal of progress in
neurophysiology,   or    some  volunteers   for  some  outrageous   psychology
experiments.  (Find me 100  open-minded  adults who  will set aside  all other
interests for at least 5 years of their lives ... )

In other words, I think the moral of this issue  is that you cannot  expect to
settle an issue that is several layers of abstraction below the level  of your
observational apparatus.  (In this case it might be more  than "several".)  In
a sense I'm saying: "Go back to the lab and let's look for other things we can
get a better grip on - this issue will have to wait until another day."

                 Mark Goldfain      arpa:     goldfain@osiris.cso.uiuc.edu
                                    US Mail:  Mark Goldfain
           (A lowly student at)-->            Department of Computer Science
                                              University of Illinois at U-C
                                              1304 West Springfield Avenue
                                              Urbana, Illinois  61801

------------------------------

[Editor's Note: There is still a backlog of items on language learning which
will be posted next week.]

End of NL-KR Digest
*******************