[comp.lang.prolog] nlp code: request for comments

rpg@rex.cs.tulane.edu (Robert Goldman) (10/27/90)

I'm teaching a course in NLP here at Tulane, using James Allen's
Natural Language Understanding as a text, and using Prolog as the
language of instruction.  I've written a small semantic processing
system along the lines of the one Allen outlines in his book.  That
is:

1. semantic processing is done on a completed parse;
2. semantic information (selection restrictions and object typing) is
represented in isa hierarchies;
3. semantic representation is a quasi-logical form, centered around a
representation of the head verb of the sentence, with a fixed number
of verb cases filled by semantic representations of other sentence
constituents.
4. I have used feature structures, along the lines of the ones in Gazdar
and Mellish's Natural Language Processing in Prolog, as a
representation for the quasi-logical form.
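
For concreteness, here is roughly the kind of structure the system is
meant to build for a simple sentence like "john loves mary" (schematic
only: the generated event token and the full set of inherited features
depend on unify/2 and on the gensym counter, and most features are
omitted here):

	decl([ token: foo1,
	       agent: [name:john, gender:male, human:t, animate:t |_],
	       theme: [name:mary, gender:female, human:t, animate:t |_]
	       |_ ])

The outer functor records the sentence mood; each case slot is itself
an open-ended feature structure, so further features can still be
added to it by unification.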

I would like to give this semantic processor to my students to
examine, and I would appreciate it if any of you could comment on the
coding, and let me know if I have committed any Prolog solecisms.

Some notes:

1. I have not tried to make it terribly efficient, preferring
clarity over efficiency where possible.

2. I'm not terribly concerned about details of the syntactic
representation that are not of immediate import.  E.g., I have
bracketed the sentence in ways that have to do less with linguistic
validity than with ease of translation into semantic representation,
and played fast and loose with terms like nbar.

3. I have not felt free to include, in the source listing below, the
code for the two predicates pathval/4 and unify/2.  They are
predicates for finding values in feature structures and for unifying
feature structures, and I have used the code from Gazdar & Mellish,
which is probably protected by copyright.

The comment for pathval is
  pathval(Dag,Path,Value,Remainder)
  The pathvalue of Dag at Path is Value.
  Remainder is Dag with the Path and Value
  removed.
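
For orientation only, here is a minimal sketch of a predicate meeting
that contract for a single feature, assuming feature structures are
open-ended lists of Feature:Value pairs; this is my own illustration,
not the Gazdar & Mellish code (their pathval is more general, taking a
path rather than a single feature):

	%  pathval(Dag, Feature, Value, Remainder)
	pathval([Feature:Value|Rest], Feature, Value, Rest) :- !.
	pathval([Pair|Rest], Feature, Value, [Pair|Remainder]) :-
		pathval(Rest, Feature, Value, Remainder).

On an open-ended Dag this will also add the feature if it is absent.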

4. The code is admittedly fragmentary.  I have just implemented a
small fragment of Allen's suggested system.  E.g., there is no
provision for embedded sentences or adjectives.
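
What it does handle are simple sentences over the small lexicon
included below.  The intended top-level usage is a call like the
following (it presupposes the unify/2 and pathval/4 predicates
mentioned in note 3):

	| ?- translate([john,loves,mary], Tree, Sem).

with Tree returned as the parse tree and Sem as the quasi-logical
form.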

EDITORIAL COMMENT:
Quite frankly, I have had a fairly difficult time teaching this course
using Allen's book and Prolog together.  I think I had good reasons
for choosing each of these, but I wouldn't recommend the combination
to anyone else.  Why?  Because the discussion in Allen's book gives a
very procedural view, and it's very hard for me, let alone my
students, to read his discussion, `declarativify' it, and then develop
Prolog code based on that.

Following is a shar archive of the system.  Thanks for your patience
with my long posting.  If there's interest, I will take the comments
people send me, incorporate them in the code, and make the results
available for anonymous ftp.

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  gensym.pl hierarchy.pl sem.pl grammar.pl load.pl
#   lexicon.pl 
# Wrapped by rpg@rex on Fri Oct 26 13:44:58 1990
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'gensym.pl' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'gensym.pl'\"
else
echo shar: Extracting \"'gensym.pl'\" \(303 characters\)
sed "s/^X//" >'gensym.pl' <<'END_OF_FILE'
X
X/*
X  a utility predicate
X  new_atom(A)
X  A must be unbound.  Will be bound to a new name.
X  */
Xnew_atom(A) :- counter(N),
X	number_chars(N, L),
X	append("foo",L,Name),
X	atom_chars(A,Name).
X
Xcounter(N) :-
X	count(N),!,
X	retract(count(N)),
X	N1 is N + 1,
X	assert(count(N1)).
Xcounter(0) :-
X	assert(count(1)).
END_OF_FILE
if test 303 -ne `wc -c <'gensym.pl'`; then
    echo shar: \"'gensym.pl'\" unpacked with wrong size!
fi
# end of 'gensym.pl'
fi
if test -f 'hierarchy.pl' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'hierarchy.pl'\"
else
echo shar: Extracting \"'hierarchy.pl'\" \(2690 characters\)
sed "s/^X//" >'hierarchy.pl' <<'END_OF_FILE'
X:- op(500, xfx, isa).
X
Xfeatures(Thing,Feats) :-
X	local_feats(Thing,Feats),
X	inherited_feats(Thing,Feats).
X
Xlocal_feats(Thing,Feats) :- feats(Thing,Feats),!.
Xlocal_feats(_,_).
X
Xinherited_feats(Thing,Feats) :-
X	Thing isa OtherThing,!,
X	features(OtherThing, NewFeats),
X	unify(Feats, NewFeats).
Xinherited_feats(_,_).
X
Xfindcase(Verb,Indicator,Case) :-
X	case(Indicator,Verb,Case).
Xfindcase(Verb,Indicator,Case) :-
X	Verb isa OtherVerb,!,   %single inheritance
X	findcase(OtherVerb,Indicator,Case).
X
X/*
X  feats(Type,FeatStr)
X  The features in FeatStr are characteristic of Type
X  */
X%features for things corresponding to nouns
Xfeats(physobj,[concrete:t|_]).
Xfeats(organic,[organic:t|_]).
Xfeats(inanimate,[organic:f|_]).
Xfeats(animate,[animate:t|_]).
Xfeats(vegetable,[animate:f|_]).
Xfeats(human,[human:t|_]).
Xfeats(dog,[human:f|_]).
Xfeats(cat,[human:f|_]).
X%features for verbs
Xfeats(actionverb,[agent:[animate:t|_],
X                  experiencer:nil,
X                  co_agent:[animate:t|_],
X                  at_time:[time:t|_],
X		  beneficiary:[human:t|_],
X		  at_loc:[animate:f|_]|_]).
Xfeats(objaction,[theme:[physobj:t|_],
X                 instrument:[animate:f|_]|_]).
Xfeats(put,[to_loc:[animate:f|_]|_]).
Xfeats(transfer,[to_poss:[animate:t|_]|_]).
Xfeats(interhuman,[agent:[human:t|_],
X            theme:[human:t|_]|_]).
Xfeats(break,[agent:[human:t|_],
X             theme:[animate:f|_]|_]).
Xfeats(unactionverb,[agent:nil|_]).
X
X
Xcase(subject,actionverb,agent).
Xcase(subject,objaction,instrument).
Xcase(subject,objaction,theme).
Xcase(dobj,objaction,theme).
Xcase(dobj,interhuman,theme).
Xcase(iobj,objaction,beneficiary).
Xcase(iobj,transfer,to_poss).
Xcase(by,objaction,agent).
Xcase(at,actionverb,at_loc).
Xcase(on,actionverb,at_loc).
Xcase(in,actionverb,at_loc).
Xcase(along,actionverb,at_loc).
Xcase(from,transfer,from_poss).
Xcase(from,transfer,from_loc).
Xcase(to,transfer,to_poss).
Xcase(to,transfer,to_loc).
Xcase(for,actionverb,beneficiary).
Xcase(with,actionverb,instrument).
Xcase(with,actionverb,co_agent).
Xcase(subject,unactionverb,experiencer).
Xcase(dobj,unactionverb,theme).
X
X
X/*
X  isa hierarchy
X  */
Xorganic isa physobj.
Xinanimate isa physobj.
X
Xanimate isa organic.
Xvegetable isa organic.
X
Xhuman isa animate.
Xdog isa animate.
Xcat isa animate.
X
Xtree isa vegetable.
Xplant isa vegetable.
X
Xrock isa inanimate.
Xcar isa inanimate.
Xbuilding isa inanimate.
Xhammer isa inanimate.
Xwindow isa inanimate.
X
Xhouse isa building.
Xapartment_complex isa building.
X
X%verb isa hierarchy
Xobjaction isa actionverb.
Xinstraction isa objaction.
Xinterhuman isa actionverb.
Xkiss isa interhuman.
Xlove isa interhuman.
Xput isa objaction.
Xtransfer isa objaction.
Xbreak isa objaction.
Xcry isa actionverb.
Xsee isa unactionverb.
END_OF_FILE
if test 2690 -ne `wc -c <'hierarchy.pl'`; then
    echo shar: \"'hierarchy.pl'\" unpacked with wrong size!
fi
# end of 'hierarchy.pl'
fi
if test -f 'sem.pl' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'sem.pl'\"
else
echo shar: Extracting \"'sem.pl'\" \(3619 characters\)
sed "s/^X//" >'sem.pl' <<'END_OF_FILE'
X/*
X  SEM.PL
X  Rules for semantic processing
X  */
Xtranslate(String,Tree,Sem) :-
X	parse(Tree,Mood,String),
X	semantics(Tree,Mood,Sem).
X
Xsemantics(smaj(Tree),Mood,Sem) :- Mood \= wh_q,
X	Sem =.. [Mood,SSem],
X	sem_translate(Tree,SSem,nil).
X
Xsemantics(smaj(Tree),wh_q,wh_q(Whvar,SSem)) :-
X	new_atom(Whvar),
X	sem_translate(Tree,SSem,Whvar).
X
Xsem_translate(s(NPTree,VPTree),Sem,Whvar) :-
X	vp_sem(VPTree,NPSlot, Sem, Whvar),
X	np_sem(NPTree,NP,Whvar),
X	unify(NP,NPSlot),
X	new_atom(EventName),
X	unify([token:EventName|_],Sem).
X
X
X/*
X  vp_sem(VPTree, NPSlot, Sem, Whvar)
X  The semantics of VPTree is Sem.  A substructure of Sem is
X  NPSlot.  This is the substructure with which the semantic
X  representation of the subject np should be unified.
X  Whvar is an input variable indicating the variable corresponding
X  to the wh-word in a wh_q.
X  */
Xvp_sem(VPTree, NPSlot, Sem, Whvar) :-
X	head_verb(VPTree, Verb),
X	verb_sem(Verb,VerbSense,Sem),
X	findcase(VerbSense,subject,Case),
X	pathval(Sem,Case,NPSlot,_),
X	complement_semantics(VPTree,VerbSense,Sem,Whvar).
X
X/*
X  if there's a verb complement, it must have some
X  semantics.  If not, no problem.
X  */
Xcomplement_semantics(VPTree,VerbSense,Sem,Whvar) :-
X	verb_complement(VPTree,Comp),!,
X	comp_sem(Comp,VerbSense,Sem,Whvar).
X
Xhead_verb(vp(aux(_),v(Verb),compl(_)),Verb) :- !.
X
Xverb_complement(vp(aux(_),v(_),Comp),Comp) :- !.
X
X
X/*
X  comp_sem(Comp,VerbSense,Sem,Whvar)
X  */
Xcomp_sem(compl(nil),_,_,_).
Xcomp_sem(compl(Tree1,Tree2),VerbSense,Sem,Whvar) :-
X	np_sem(Tree1,Iobj,Whvar),
X	findcase(VerbSense,iobj,Case1),
X	unify([Case1:Iobj|_],Sem),
X	np_sem(Tree2,Dobj,Whvar),
X	findcase(VerbSense,dobj,Case2),
X	unify([Case2:Dobj|_],Sem).
Xcomp_sem(compl(Tree),VerbSense,Sem,Whvar) :-
X	np_sem(Tree,Dobj,Whvar),
X	findcase(VerbSense,dobj,Case),
X	unify([Case:Dobj|_],Sem).
X
X
X  
X
X/*
X  np_sem(NPTree,NPSem,Wh_var),
X  */
Xnp_sem(np(what),[token:Wh_var|_],Wh_var).
Xnp_sem(np(who),[token:Wh_var|_],Wh_var).
Xnp_sem(NPTree,Sem,Wh_var) :-
X	np_sem1(NPTree,Sem,Wh_var),
X	new_atom(A),
X	unify([token:A|_],Sem).	
X
Xnp_sem1(np(Nbar,Mods), NbarSem, Wh_var) :-
X	np_mod_sem(Mods, ModsSem),
X	nbar_sem(Nbar, NbarSem),
X	unify([mods:ModsSem|_],NbarSem).
Xnp_sem1(np(Nbar), NbarSem, _) :-
X	nbar_sem(Nbar, NbarSem).
X
Xnbar_sem(nbar(prop_n(Name)),Sem) :- prop_noun_sem(Name,Sem).
Xnbar_sem(nbar(art(Art),n(Noun)), Sem) :-
X	art_sem(Art,ArtSem),
X	noun_sem(Noun,Sem),
X	unify(Sem,ArtSem).
Xnbar_sem(nbar(pn(PN)),Sem) :-
X	pronoun_sem(PN,Sem).
X
Xnp_mod_sem((Mod,Mods),Sem) :-
X	np_mod_sem(Mod,Sem1),
X	np_mod_sem(Mods,Sem),
X	unify(Sem1,Sem).
Xnp_mod_sem(pp(p(P),np(NPTree)),Sem) :-
X	pp_noun_mod(P, NPSlot, Sem),
X	np_sem(NPTree,NP,nil),
X	unify(NP,NPSlot).
X
Xprop_noun_sem(Name,Sem) :-
X	name_sem(Name,Sem),
X	features(human,PSem),
X	unify(Sem,PSem).
X
Xpronoun_sem(PN,Sem) :-
X	PN \= it,
X	noun_sem(person,PSem),
X	pnsem(PN,Sem),
X	unify(Sem,PSem).
Xpronoun_sem(it,Sem) :-
X	pnsem(it, Sem),
X	noun_sem(inanimate,NSem),
X	unify(Sem,NSem).
X
X/*
X  pp_noun_mod(Preposition, NPSlot, Structure)
X  The prep builds the semantic Structure;
X  a substructure of this semantic Structure is
X  NPSlot.  This is the place in the structure which should be
X  unified with the semantics of the np which is the argument
X  of the pp.
X  */
Xpp_noun_mod(at, NPSlot, [at_location:NPSlot|_]) :-
X	NPSlot = [organic:f|_].
X
Xpp_noun_mod(on, NPSlot, [on_location:NPSlot|_]) :-
X	NPSlot = [organic:f|_].
X
Xpp_noun_mod(in, NPSlot, [in_location:NPSlot|_]) :-
X	NPSlot = [organic:f|_].
X	
X
Xnoun_sem(Noun,Sem) :- noun_sense(Noun,Sense),
X	features(Sense, Sem).
X
Xverb_sem(Verb,Sense,Sem) :- verb_sense(Verb,Sense),
X	features(Sense, Sem).
X
Xart_sem(the,[ref:def|_]).
Xart_sem(a,[ref:indef|_]).
X
X
X
X
X
X
END_OF_FILE
if test 3619 -ne `wc -c <'sem.pl'`; then
    echo shar: \"'sem.pl'\" unpacked with wrong size!
fi
# end of 'sem.pl'
fi
if test -f 'grammar.pl' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'grammar.pl'\"
else
echo shar: Extracting \"'grammar.pl'\" \(2707 characters\)
sed "s/^X//" >'grammar.pl' <<'END_OF_FILE'
X/*
X  GRAMMAR.PL
X  The grammar we will use with semantic processing.
X*/
X
Xparse(Tree,Mood,String) :- smaj(Tree,Mood,String,[]).
X
X/*
X  smaj has been added to the grammar for the sake of
X  allowing it to later be expanded to include
X  clauses (embedded sentences), like the ones in our
X  verb complementation homework.
X  */
Xsmaj(smaj(Tree),Mood) --> s(Tree,Mood).
X
X/*
X s(ParseTree,Mood)
X */
Xs(s(NP,VP),decl) --> np(NP,NPnum,[]/[]),
X	vp(VP,VPnum,[]/[],[]/[]),
X	{num_agree(NPnum,VPnum,_)}.
X
Xs(s(NP,VP),yes-no-q) --> 	
X	aux(Aux,AuxNum,[]/[]),
X	np(NP,NPnum,[]/[]),
X	vp(VP,AuxNum,[Aux,AuxNum]/[],[]/[]),
X	{num_agree(NPnum,AuxNum,_)}.
X
X%A wh-question querying the subject is normal in
X%structure
Xs(s(NP,VP),wh_q) -->
X	wh(GapInfo),
X	np(NP,NPnum,[GapInfo]/[]),
X	vp(VP,VPNum,[]/[],[]/[]),
X	{num_agree(NPnum,VPNum,_)}.
X%querying anything else requires aux-inversion.
Xs(s(NP,VP),wh_q) -->
X	wh(GapInfo),
X	aux(Aux,AuxNum,[]/[]),
X	np(NP,NPnum,[]/[]),
X	vp(VP,AuxNum,[Aux,AuxNum]/[],[GapInfo]/[]),
X	{num_agree(NPnum,AuxNum,_)}.
X
X/*
X  vp(Tree,Number,AuxGap,GapInfo)
X  The vp rules have GapInfo as an argument because they have to
X  pass the gaps on to the np's that may be nested in them.
X  */
Xvp(vp(aux(Aux),v(V),Compl),Num,AuxGap,GapInfo) --> aux(Aux,Num,AuxGap),
X	verb(V, Num, Form),
X	{aux_agree(Aux,Form),
X	 subcat(V,Subcat)},
X	compl(Compl,Subcat,GapInfo).
X
X
X/*
X  compl(Tree,Subcat,GapInfo)
X  verb complement with Tree representation, admitted by a verb with
X  subcategorization alternatives SubCat.
X  */
Xcompl(compl(nil),Subcat,X/X) --> {member(iv,Subcat)},[].
Xcompl(compl(NP),Subcat,GapInfo) --> {member(tv,Subcat)},
X	np(NP,_,GapInfo).
Xcompl(compl(NP1,NP2),Subcat,GapIn/GapOut) --> {member(bv,Subcat)},
X	np(NP1,_,GapIn/Gap1),
X	np(NP2,_,Gap1/GapOut).
X
X/*
X  np(ParseTree,Number,GapInfo)
X  */
Xnp(np(Nbar),Num,GapIn/GapOut) -->
X	nbar(Nbar,Num,GapIn/GapOut).
Xnp(np(Nbar,Mods),Num,GapIn/GapOut) -->
X	nbar(Nbar,Num,GapIn/Gap1),
X	npmods(Mods,Gap1/GapOut).
Xnp(np(Gap),_,[np(Gap)|Gaps]/Gaps) --> [].
X
X/*
X  npmods(TreeFragment,Gap)
X  for the moment, we allow only prepositional
X  phrases as modifiers
X  */
Xnpmods(NPmod,GapIn/GapOut) --> pp(NPmod,GapIn/GapOut).
Xnpmods((NPmod,NPmods),GapIn/GapOut) --> pp(NPmod,GapIn/Gap1),
X	npmods(NPmods,Gap1/GapOut).
X
X
X/*
X  nbar(Tree,Num,GapInfo)
X  */
Xnbar(nbar(art(Art),n(Noun)),Num,Gap/Gap) --> art(Art,ArtNum),
X	noun(Noun,NounNum),
X	{num_agree(ArtNum,NounNum,Num)}.
Xnbar(nbar(prop_n(Name)),[s3],Gap/Gap) --> proper_noun(Name).
Xnbar(nbar(pn(PN)),Num,Gap/Gap) --> pronoun(PN,Num).
X
X/*
X  pp(Tree, GapInfo)
X  */
Xpp(pp(p(Prep),np(NP)),GapIn/GapOut) --> prep(Prep),np(NP,_,GapIn/GapOut).
X
X/*
X  aux(Aux,Num,AuxGapInfo)
X  */
Xaux(Aux,Num,[Aux,Num]/[]) --> [].
Xaux(Aux,Num,AuxGap/AuxGap) --> aux(Aux,Num).
END_OF_FILE
if test 2707 -ne `wc -c <'grammar.pl'`; then
    echo shar: \"'grammar.pl'\" unpacked with wrong size!
fi
# end of 'grammar.pl'
fi
if test -f 'load.pl' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'load.pl'\"
else
echo shar: Extracting \"'load.pl'\" \(251 characters\)
sed "s/^X//" >'load.pl' <<'END_OF_FILE'
X/*
X  The load file for our parser with semantic analysis
X  */
X:-consult(library(not)).
X:-consult(library(sets)).
X:-consult(library(basics)). 
X
X:-compile([dag_unify,gensym]).
X
X:- consult([grammar,lexicon,sem,hierarchy]). %fs_utils
X
X:- dynamic count/1.
END_OF_FILE
if test 251 -ne `wc -c <'load.pl'`; then
    echo shar: \"'load.pl'\" unpacked with wrong size!
fi
# end of 'load.pl'
fi
if test -f 'lexicon.pl' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'lexicon.pl'\"
else
echo shar: Extracting \"'lexicon.pl'\" \(2521 characters\)
sed "s/^X//" >'lexicon.pl' <<'END_OF_FILE'
X/*
X  LEXICON.PL
X  */
X
X
X
X/*
X  noun(Lexeme,Num)
X  */
Xnoun(dog,[p3]) --> [dogs].
Xnoun(dog,[s3]) --> [dog].
Xnoun(hammer,[s3]) --> [hammer].
Xnoun(window,[s3]) --> [window].
X
X
X/*
X  art(Article,Num)
X  */
Xart(the,[s3,p3]) --> [the].
Xart(a,[s3]) --> [a].
X
X
X/*
X  verb(Root,Number-and-person,Form)
X  Form = 1 tenseless/present
X         2 -s form (3s)
X	 3 past
X	 4 present participle
X	 5 past participle
X  */
Xverb(cry,[s1,p1,s2,p2,p3],1) --> [cry].
Xverb(cry,[s3],2) --> [cries].
Xverb(cry,_,4) --> [crying].
Xverb(see,[s1,p1,s2,p2,p3],1) --> [see].
Xverb(see,[s3],2) --> [sees].
Xverb(see,_,3) --> [saw].
Xverb(see,_,4) --> [seeing].
Xverb(see,_,5) --> [seen].
Xverb(love,[s3],2) --> [loves].
Xverb(love,[s1,p1,s2,p2,p3],1) --> [love].
Xverb(love,_,4) --> [loving].
Xverb(break,[s1,p1,s2,p2,p3],1) --> [break].
Xverb(break,[s3],2) --> [breaks].
Xverb(break,_,3) --> [broke].
Xverb(break,_,5) --> [broken].
X
Xproper_noun(john) --> [john].
Xproper_noun(jack) --> [jack].
Xproper_noun(mary) --> [mary].
X
X/*
X  pronoun(Pronoun,NumberAndPerson)
X  */
Xpronoun(i,[s1]) --> [i].
Xpronoun(we,[p1]) --> [we].
Xpronoun(you,[s2,p2]) --> [you].
Xpronoun(he,[s3]) --> [he].
Xpronoun(she,[s3]) --> [she].
Xpronoun(it,[s3]) --> [it].
Xpronoun(they,[p3]) --> [they].
X
Xprep(at) --> [at].
Xprep(on) --> [on].
Xprep(in) --> [in].
Xprep(along) --> [along].
Xprep(from) --> [from].
Xprep(to) --> [to].
Xprep(for) --> [for].
Xprep(by) --> [by].
Xprep(with) --> [with].
X
X
X/*
X  aux(AuxVerb,Number)
X  */
Xaux(do,[s1,p1,s2,p2,p3]) --> [do].
Xaux(do,[s3]) --> [does].
Xaux(be,[s1]) --> [am].
Xaux(be,[s3]) --> [is].
Xaux(be,[p1,s2,p2,p3]) --> [are].
Xaux(nil,_) --> [].
X
X/*
X  wh(GapIntroduced)
X  */
Xwh(np(who)) --> [who].
Xwh(np(what)) --> [what].
X
X/*
X  num_agree(N1,N2,N3)
X  N3 is the non-empty result of making N1 and N2 agree wrt
X  number.
X  */
Xnum_agree(N1,N2,N3) :- intersection(N1,N2,N3),
X	( N3 = [] -> fail
X	| true
X	).
X
X%do can be followed by the tenseless form
Xaux_agree(do,1).
X% be can be followed by the present participle
Xaux_agree(be,4).
X%participles not permitted without auxverb
Xaux_agree(nil,1).
Xaux_agree(nil,2).
Xaux_agree(nil,3).
X
Xsubcat(cry,[iv]).
Xsubcat(love,[tv]).
Xsubcat(see,[tv]).
Xsubcat(break,[iv,tv]).
X
X
X/*
X  Word sense entries
X  */
Xnoun_sense(dog,dog).
Xnoun_sense(hammer,hammer).
Xnoun_sense(window,window).
Xnoun_sense(rock,rock).
X
X
Xverb_sense(break,break).
Xverb_sense(cry,cry).
Xverb_sense(love,love).
Xverb_sense(kiss,kiss).
Xverb_sense(see,see).
X
Xname_sem(john, [name:john,gender:male|_]).
Xname_sem(jack, [name:jack,gender:male|_]).
Xname_sem(mary, [name:mary,gender:female|_]).
END_OF_FILE
if test 2521 -ne `wc -c <'lexicon.pl'`; then
    echo shar: \"'lexicon.pl'\" unpacked with wrong size!
fi
# end of 'lexicon.pl'
fi
echo shar: End of shell archive.
exit 0

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (10/30/90)

In article <4691@rex.cs.tulane.edu>, rpg@rex.cs.tulane.edu (Robert Goldman) writes:
> 4. I have used feature structures, along the lines of the ones in Gazdar
> and Mellish's Natural Language Processing in Prolog, as a
> representation for the quasi-logical form.

It's worth noting that their representation for feature structures
(an improper list of Feature:Value pairs) is more than somewhat ugly.
A tiny touch of pre-processing can make the source code much clearer
(declare feature clusters, e.g.
	:- features(case_frame, [agent,patient,beneficiary,...])
 and then write rules that say e.g.
	p(Subj, Features, ...) -->
		{Features^[agent] = Subj}.
) and the code that actually _runs_ much faster (because unifications
are done in-line *as* unifications, not as calls to a non-logical
unify/3 or whatever it was).

I guess I should tidy up the code I gave my students for this and post it.
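
To give the flavour without that code (treat this as a sketch, with
made-up names): given a declaration like

	:- features(case_frame, [agent, patient, beneficiary]).

the preprocessor can represent a case_frame as the compound term
case_frame(Agent, Patient, Beneficiary) and expand each
Features^[agent] = Subj annotation at load time, so that a rule
written as

	p(Subj, Features, ...) -->
		{Features^[agent] = Subj}.

comes out as nothing more than

	p(Subj, Features, ...) -->
		{Features = case_frame(Subj, _, _)}.

(or, with a little more work, folded straight into the head), i.e.
the access is a plain unification, and no list of pairs is ever
walked at run time.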

> I would like to give this semantic processor to my students to
> examine, and I would appreciate it if any of you could comment on the
> coding, and let me know if I have committed any Prolog solecisms.

I hope you really meant that.

> EDITORIAL COMMENT:
> Quite frankly, I have had a fairly difficult time teaching this course
> using Allen's book and Prolog together.

I was doing exactly the same thing this year.  We ended up using rather
little of Allen.  My students got a _lot_ of handouts to make up for it.

> X  a utility predicate
> X  new_atom(A)
> X  A must be unbound.  Will be bound to a new name.

The program appears to be using Quintus Prolog (use of library(basics)
and the like).  What on earth was wrong with the existing library
predicate gensym/1, or, if the "foo" prefix was so very important,
gensym/2?  The :- dynamic declaration for count/1 should have been in
the file gensym.pl (which is the _only_ file that has any business
knowing about that predicate), not in load.pl.


There are actually two essentially unrelated things going on in the
file 'hierarchy.pl'.  First, symbols are being mapped to classes.
The improper-list-of-pairs encoding of feature structures is a rather
poor representation for types.  A much better representation is due to
Chris Mellish.  Suppose we have the single-inheritance 'ako' tree

	a
		b
			c
			d
		e
			f
			g

with the individuals cee: c, dee: d, eff: f, gee: g.
We would map the individuals to terms representing their types thus:

	type_of(cee, b(c(_))).
	type_of(dee, b(d(_))).
	type_of(eff, e(f(_))).
	type_of(gee, e(g(_))).
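
Compatibility checks then come free from unification; for instance,
against the facts just above:

	| ?- type_of(cee, T), T = b(_).		% cee is a kind of b: succeeds
	| ?- type_of(cee, T), T = e(_).		% cee is not a kind of e: fails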

More generally, for each <class> / <parent> arc in the tree, we have
	class_type(<class>, T0, T) :-
		class_type(<parent>, <class>(T0), T).
The top of the hierarchy has the corresponding rule
	class_type(<top>, T, T).
When we have an individual <indiv> belonging to <class> we say
	indiv_type(<indiv>, T) :-
		class_type(<class>, _, T).

So here we have
	class_type(c, T0, T) :- class_type(b, c(T0), T).
	class_type(d, T0, T) :- class_type(b, d(T0), T).
	class_type(b, T0, T) :- class_type(a, b(T0), T).
	class_type(f, T0, T) :- class_type(e, f(T0), T).
	class_type(g, T0, T) :- class_type(e, g(T0), T).
	class_type(e, T0, T) :- class_type(a, e(T0), T).
	class_type(a, T,  T).

	class_type(C, T) :- class_type(C, _, T).

	indiv_type(cee, T) :- class_type(c, T).
	indiv_type(dee, T) :- class_type(d, T).
	indiv_type(eff, T) :- class_type(f, T).
	indiv_type(gee, T) :- class_type(g, T).

This too is the kind of thing that can be done rather neatly by
a preprocessor.  Now, imagine that we want to say that a particular
verb must have an animate subject.  We might say
	may_fill(subject, see, X) :-
		class_type(animate, T),
		indiv_type(X, T).	% X's type is compatible with T

where the class_type/2 call can be preprocessed away.

Chris Mellish pointed out that this scheme generalises to
systems where multiple classifications apply to the same thing.
For example, something of type "agreement" might be classified
according to "person", "number", and "gender", so we might have
	agreement(1 | 2 | 3, s | p, m | f | n)
With that scheme, we can easily represent things like
	agreement(_,p,_)	"plural"
	agreement(3,_,f)	"third person feminine"
and combine them:
	agreement(3,p,f)	"third person plural feminine"

This can be much more economical, and is in my view much clearer,
than lists of unstructured atoms.
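
As a one-line illustration of that: asking for something that is both
plural and third person feminine is just two unifications against the
same term,

	| ?- A = agreement(_,p,_), A = agreement(3,_,f).

	A = agreement(3,p,f)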

	MAKE UNIFICATION WORK FOR YOU!

As a particular example of doing things clearly with terms instead
of pounding away on lists, consider the complements of a verb phrase.

Goldman's program does

	vp(...) -->
		...
		{subcat(V, Subcat)},
		compl(..., Subcat, Gap).

	compl(compl(nil),Subcat,X/X) --> {member(iv,Subcat)},[].
	compl(compl(NP),Subcat,GapInfo) --> {member(tv,Subcat)},
		np(NP,_,GapInfo).
	compl(compl(NP1,NP2),Subcat,GapIn/GapOut) --> {member(bv,Subcat)},
		np(NP1,_,GapIn/Gap1),
		np(NP2,_,Gap1/GapOut).

where subcat/2 returns a subset of {iv,tv,bv} represented as a list.
But why use a list here?  Suppose instead that we represent the
verb subcategorisation as a triple
	v(i | 0, t | 0, b | 0)
where i, t, b mean that the verb _can_ be used as an intransitive,
transitive, or ditransitive-or-benefactive respectively, and 0 in
a particular slot means it can't.  Let's move this information to
the front as well:  it is always a good idea to have the argument
which we're dispatching on be the first so that a human reader
has the least possible trouble finding it.  Then we get

	compl(v(i,_,_), comp0, Gap, Gap) --> [].
	compl(v(_,t,_), comp1(NP), Gap0, Gap) -->
		np(NP, _, Gap0, Gap).
	compl(v(_,_,b), comp2(N1,N2), Gap0, Gap) -->
		np(N1, _, Gap0, Gap1),
		np(N2, _, Gap1, Gap).
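
The lexicon entries then read straight off (a sketch mirroring the
subcat/2 facts in the posted lexicon):

	subcat(cry,   v(i,0,0)).
	subcat(see,   v(0,t,0)).
	subcat(love,  v(0,t,0)).
	subcat(break, v(i,t,0)).

and the applicable compl clause is selected by unification instead of
by member/2.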


There's a lot of left-over Lisp in the code.  For example, there's
a rule that starts out

	s(s(NP,VP),yes-no-q) --> ...

Now yes-no-q is a perfectly good Lisp atom (it's a spelling of
|YES-NO-Q|) but in Prolog it is the compound term -(-(yes,no),q).
Why does that matter?  Because a later rule tries to use it as
a function symbol!
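
To see the problem, note what (=..)/2 makes of it:

	| ?- X = yes-no-q, X =.. L.

	L = [-,yes-no,q]

so Sem =.. [Mood,SSem] in the semantics/3 clause quoted below cannot
do what was intended when Mood is yes-no-q.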

A rather worse hangover (and if it isn't a headache now, it soon
will be) from Lisp is the use of 'nil' as a "default" or "absent"
marker.  Here's a particularly important case.

	translate(String,Tree,Sem) :-
		parse(Tree,Mood,String),
		semantics(Tree,Mood,Sem).

	semantics(smaj(Tree),Mood,Sem) :- Mood \= wh_q,
		Sem =.. [Mood,SSem],
		sem_translate(Tree,SSem,nil).

	semantics(smaj(Tree),wh_q,wh_q(Whvar,SSem)) :-
		new_atom(Whvar),
		sem_translate(Tree,SSem,Whvar).

In both of the calls to sem_translate/3 we pass an atom as the
last argument.  An atom spelled "nil" means "there isn't any
Wh-variable".  An atom spelled "foo123" or the like means
"there is a Wh-variable called foo123".  That is NOT good Prolog
coding practice.

	What are the situations, and what are the associated data?
	- there is a Wh-variable X
	- there is no Wh-variable
	Invent names for these situations, and make the associated
	data the arguments of appropriate terms
	- var(X) means there is a Wh-variable X
	- novar  means there is no Wh-variable

Then later on we'll be able to ask "was there a Wh-variable" by
doing	Wh = var(_)  instead of by doing Wh \== nil.

That's far from the only problem here.  The program does
"Mood \= wh_q" in order to test whether Mood is decl or yn_q
(assuming that yes-no-q should have been yn_q).  There is
no point in using (\=)/2 here; it would be better to use the
built-in predicate (\==)/2.  But it's better still to say
exactly what you do mean, so that a human reader can see what
the possible cases for Mood are.  (The use of (=..)/2 is a
fairly reliable cue that something rather strange is going on.
This is the bit that breaks if Mood is yes-no-q.)  The variable
names aren't too good either:  there isn't any String here; but
there _is_ a list of Words.

	translate(Words, Tree, Sem) :-
		parse(Tree, Mood, Words),
		semantics(Mood, Tree, Sem).

	semantics(decl, smaj(Tree), decl(Sem)) :-
		sem_translate(Tree, Sem, novar).
	semantics(yn_q, smaj(Tree), yn_q(Sem)) :-
		sem_translate(Tree, Sem, novar).
	semantics(wh_q, smaj(Tree), wh_q(WhVar,Sem)) :-
		gensym(WhVar),
		sem_translate(Tree, Sem, var(WhVar)).

And so it goes.

It would improve the program a _lot_ to have a comment which
says exactly what a Tree or a Sem can look like.
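
For instance, something along these lines, reconstructed from the
posted grammar (so take the details with a grain of salt):

	%  Tree  ::= smaj(S)
	%  S     ::= s(NP, VP)
	%  NP    ::= np(Nbar) | np(Nbar, Mods) | np(who) | np(what)
	%  Nbar  ::= nbar(art(Art), n(Noun)) | nbar(prop_n(Name)) | nbar(pn(PN))
	%  VP    ::= vp(aux(Aux), v(Verb), Compl)
	%  Compl ::= compl(nil) | compl(NP) | compl(NP, NP)
	%  Mods  ::= PP | (PP, Mods)
	%  PP    ::= pp(p(Prep), np(NP))
	%  Sem   ::= decl(FS) | wh_q(WhVar, FS)
	%		where FS is an open-ended list of Feature:Value pairs
	%		(the yes/no case is the one that breaks, as above)

Even an informal comment like that tells a reader more than a trace
ever will.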

There's a lot more that could be said.

One thing that _does_ need to be said is that I was very pleased to
see this posting, and I've put a copy of it where my students can get
at it.  Never mind the flaws, at least it's _there_ and it's a place
to _start_.  Much the same can be said about the code in the Gazdar
& Mellish book; the code there isn't very good, but it's _there_ and
is a place to _start_, whereas Allen leaves you pretty much on your own.


-- 
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.