[comp.lang.icon] prolog2.doc

alanf@bruce.OZ (Alan Grant Finlay) (03/29/90)
.ce
.uh "Prolog in Icon version 2 : Program documentation"
.ce
Alan Finlay, Monash University, March 1990.

This version of Prolog is loosely based upon the various denotational
semantics for Prolog with which I have been acquainted\*[*\*]
.(f
* In particular: T. Nicholson and N. Foo. "A Denotational Semantics for Prolog",
to appear in ACM TOPLAS.
.)f
and discussion with colleagues at Monash University.
The motivation was to gain experience with Icon and to see if there was any
truth to the claim "It should only take about half a page of Icon to implement
a Prolog interpreter."

.sh1 "Data structures"

.ftCW
########################## global variables and types ######################

.nf
record ctxt(env,subst)    # integer[string] * ((integer | struct | null) list)
record struct(name,args)  # string * ((integer | struct) list)
record rule(ids,head,body)# string list * predicate * predicate list \\
record all(ids,body)      # string list * predicate list              } clauses
record one(ids,body)      # string list * predicate list             /
record fun(name,args)     # string * predicate list        \\ types of 
record var(name)          # string                         / predicates
global dbase              # table of clauses indexed by head name
global consult            # stack of files being consulted
global query              # top level query
.ft
.fi

Despite the lack of enforced type discipline in Icon I have used some 
self discipline.  The record and global declarations above indicate types
using "|" for disjoint sum and shared field names to get a degree of type
inheritance.
Type "ctxt" is a context for goal resolution and consists of an environment
and a substitution.  The environment is a table which maps variable identifiers
to variables.  A variable is represented by an integer which is its position
in the substitution list.  A substitution then effectively maps variables
to terms or unbound.  The null value is used to represent an unbound variable. 
A term is either another variable or a structure.
Type "struct" is used to represent structures which are either functors or
constants.  Functors have a name and a list of arguments which are terms.
Constants are represented as functors with zero arguments.

The types for clauses and predicates are used to represent the syntax of
Prolog.  I will use the words predicate and functor more or less
interchangeably since the interpreter doesn't distinguish between them
structurally.  A consequence of this is that a variable may be used as
a goal.
The use of separate domains for syntactic predicates and their run time
equivalents (called structures above) is required because the syntax is
independent of any context whereas the "terms" referred to above are only
meaningful when considered together with an appropriate context.  At various
stages in a computation a given clause or predicate will have to be used
in different contexts.  There are a few idiosyncrasies of the syntactic 
domains worth mentioning.  Clauses have the redundant field "ids" which is a
list of all the variable identifiers used in the clause.  This saves a lot
of recalculation during execution.  The types "all" and "one" are respectively
queries where all or one solution are requested.  Syntactically this is
signaled by using the turnstile ":-" for all solutions and "?-" for one.
As before type "fun" can be used with no arguments to indicate constants.
Facts are represented as rules with empty bodies.  

The database is global and consists of a table of clause lists indexed
by the name of the predicate at the head.  The index is used for efficiency
reasons only.  The global "consult" is a stack of files being consulted,
used for nesting of consult commands.  The top level query is global simply 
to simplify the parser and could have been passed to the driver as a parameter
via procedure "prog".

.ne7
.sh1 "The driver"

.ftCW
################################## driver ##################################

procedure main()
   dbase:=table([]); consult:=[&input] # empty dbase; standard input
   while \\query | *consult>0 do { # more queries possible
      prog() # parse clauses, possibly setting query as a side effect
      if \\query then case type(query) of {
         "all" : {every printsoln(query); write("no more solutions")}
         "one" : if not printsoln(query) then write("no")}
      else pop(consult)}
end

procedure printsoln(qry) # print first or next solution to qry 
   local ans,v
   every ans:=resolve(qry.body,1,*qry.body,newctxt(qry.ids,[])) do { 
      writes("yes")
      every v:=!qry.ids do writes(", ",v,"=",trmstr(ans.env[v],ans.subst))
      suspend write()}
end
.ft

Most of procedure "main" is concerned with providing a usable interactive
interface and the details are of little interest.  The parser is called
until a query is encountered and "printsoln" is used to resolve it and display
the answer substitution.  The separate
procedure "printsoln" is required because it can generate all solutions and
may be activated once only or resumed until it fails depending on the type
of query.
The procedure "printsoln" simply uses procedure "resolve" to generate solution
contexts from an initial context determined by the query.  This initial 
context has an environment with all and only the free variables in the query
and a substitution in which they are unbound.  This context is created by
calling procedure "newctxt" which will be described later.

.ne7
.sh1 "The interpreter proper"

.ftCW
########################### Prolog interpreter #############################

procedure resolve(qry,hd,tl,ctext) # generates all solutions of qry[hd:tl]
   local sub,q                     # in given context, returns updated context
   if hd>tl then return ctext
   if (q:=qry[hd]).name=="~" then # negation by failure
      {if not resolve(q.args,1,1,ctext) then suspend resolve(qry,hd+1,tl,ctext)}
   else every sub:=tryclause(scanpred(q,ctext),!dbase[q.name],ctext.subst) do
           suspend resolve(qry,hd+1,tl,ctxt(ctext.env,sub))
end

procedure tryclause(term,cls,sub) # resolves term using given clause or fails
   local ctext                    # a copy of sub is used so no side effects
   ctext:=newctxt(cls.ids,copy(sub)) # preallocate context for whole clause
   if unify(term,scanpred(cls.head,ctext),ctext.subst) then 
      suspend resolve(cls.body,1,*cls.body,ctext).subst
end

procedure scanpred(prd,ctext) # converts predicate to structure 
   local args; args:=[] 
   if type(prd)=="var" then return ctext.env[prd.name]
   every put(args,scanpred(!prd.args,ctext))
   return struct(prd.name,args) 
end
.ft

The procedure "resolve" is a generator which produces all solutions to a sublist
of the supplied goal list "qry".  This sublist is from element "hd" to element
.(f
* Lisp's variety of lists can be implemented in Icon but lack the useful
operators the inbuilt lists have.  On the other hand the inbuilt list type
in Icon does not have a non copying "tail" or "cdr" operation.
.)f
"tl" and is passed in this way to save copying sublists\*[*\*].  The
resolution takes place with respect to the supplied context and
an updated context is returned as the answer.  
Ignoring negation for the moment, "resolve" proceeds by resolving
the first goal in the list and for each solution uses a recursive call
to satisfy the rest of the goal list if possible and generates a result
for every case.  The first goal is resolved by resuming "tryclause" for
each clause in the database matching the goal and "tryclause" itself being
a generator can be resumed several times for each clause.  For those not
accustomed to Icon's procedure resumption conventions this is more clearly
expressed

.ftCW
   q:=qry[hd]
   every cls:=!dbase[q.name] do
      every sub:=tryclause(scanpred(q,ctext),cls,ctext.subst) do
         suspend resolve(qry,hd+1,tl,ctxt(ctext.env,sub))
.ft

Notice that the updated substitution return by "tryclause" is supplied
to "resolve" for the rest of the list hence the effects of goal resolution
within a clause body are cumulative.  Another important point is to note that
the recursion bottoms out when the sublist of goals is empty and succeeds in
this case.  It is tempting to try to use a goal generator instead of a goal
list as in

.ftCW
   sub:=ctext.subst
   every q:=!qry do
      every sub:=tryclause(scanpred(q,ctext),!dbase[q.name],sub)
   return ctxt(ctext.env,sub)
.ft

This appears to save an explicit test for the end of the list but suffers
from a fatal flaw.  Apart from only being able to generate one solution this
scheme finds the first solution provided there is one but otherwise
succeeds anyway with a partial solution.  

Negation is simply handled as "negation as failure" by using Icon's "not"
operator which succeeds if and only if its argument fails.  Since when
a negated goal
succeeds the substitution cannot be updated the original substitution is
passed to the remaining goals in this case.

The procedure "tryclause" first creates a context for resolving the supplied
term against the supplied clause.  This context is based upon the supplied 
substitution and all the free identifiers in the clause.  The free identifiers
and the a copy of the substitution are used by "newctxt" to create a context
which has an environment for the identifiers as new variables and the
substitution extended with these new variables unbound.  Denotational
semantics for Prolog may perform this task on the fly as a clause is
interpreted and this corresponds operationally to a great deal of recomputation.
Here the syntax parser generates a list of free variable identifiers (without
repetitions) only once and combined with the "newctxt" this avoids extending
the context in a piecemeal fashion.

The substitution passed to "newctxt" is a copy since "newctxt" and "unify"
cause side effects upon their substitution parameter.  If these side effects
were eliminated it would require two copying operations to be performed on
very similar substitutions where only one is required.  The unifier returns
only success or failure of the attempted unification but in the case of
success the supplied substitution is updated as a side effect.

The procedure "scanpred" simply converts a predicate from its syntactic form
into a term according to some relevant context.  There are no side effects.

.ne7
.sh1 "Primitive domain operations and the parser"

.ftCW
######################## primitive domain operations ########################

procedure unify(t1,t2,sub) # (integer | struct),(integer | struct),sub 
   local v,i,num           # side effect: sub is updated
   if type(t1)=="integer" then {
      while type(v:=sub[t1])=="integer" do t1:=v # apply sub to t1
      return if type(v)=="struct" then unify(v,t2,sub) else sub[t1]:=t2}
   if type(t2)=="integer" then return unify(t2,t1,sub)
   if (t1.name==t2.name) & ((num:=*t1.args)=*t2.args) then {
      every i:=1 to num do if not unify(t1.args[i],t2.args[i],sub) then fail
      return}
end

procedure newctxt(ids,sub)       # forms a new context by extending sub
   local env; env:=table(&null)  # to accommodate the unbound identifiers
   every env[!ids]:=*put(sub,&null)
   return ctxt(env,sub)
end
   
procedure trmstr(trm,sub) # converts a term to a string suitable for output
   local s; s:=""
   case type(trm) of {
      "integer" : return trmstr(sub[trm],sub)
      "struct" : if s:=lstr(trm,sub) then return "["||s||"]" # non-empty list 
                 else {every s:=s||trmstr(!trm.args,sub)||","
                      return trm.name||(if *s=0 then "" else "("||s[1:-1]||")")}
      "null" : return "undefined"}
end

procedure lstr(l,sub) # succeeds if l is a proper non-empty list and
   local hd,tl        # converts l to string suitable for output
   if l.name=="." & *l.args=2 then {
      hd:=trmstr(l.args[1],sub); tl:=l.args[2]
      while type(tl)=="integer" do tl:=sub[tl] # apply sub to tl
      case type(tl) of {
         "struct" : {if tl.name=="nil" & *tl.args=0 then return hd # nil
                     return hd||","||lstr(tl,sub)}                 # cons
         "null" : return "undefined"}}
end

############################## Prolog parser ###############################

procedure prog() # parses consult[1] until query found or end of file
   query:=&null
   while write(read(consult[1])) ? clause() 
   if /query & consult[1]~===&input then close(consult[1])
end

procedure clause() # adds a clause to the dbase or fails when query set
   local p,b,ids,t; b:=[]; ids:=[]
   if =":-" then query:=all(ids,b:=body())
   else if ="?-" then query:=one(ids,b:=body())
   else {p:=pred(); if =":-" then b:=body()}
   if (t:=trim(tab(0)))~=="." then # syntax error
      return write("syntax error: ",t,if *t=0 then "." else " not"," expected")
   every extractids(ids,\\p|!b) # list of variable identifiers
   if (\\p).name=="consult" then every push(consult,open((!p.args).name))
   return dbase[(\\p).name]:=dbase[p.name]|||[rule(ids,p,b)]
end
 
procedure body() # list of predicates (may be empty)
   local b; b:=[]
   if put(b,pred()) then while ="," & put(b,pred())
   return b
end

procedure dots() # converts non-empty body of list to cons cells
   local p
   if p:=pred() then if ="," then return fun(".",[p,dots()])
                     else return fun (".",[p,fun("nil",[])])
end

procedure pred() # ~pred , name(body) , uc_name , lc_name , [body] , pred|pred
   local name,args,d,p,pp; args:=[]
   if ="~" then p:=fun("~",[pred()])
   else if name:=tab(many(&ucase++&lcase++'0123456789._')) then {
      if any(&ucase,name) then p:=var(name)
      else {if ="(" & args:=body() then check(")"); p:=fun(name,args)}}
   else if ="[]" then p:=fun("nil",[]) # empty list abbreviation
   else if ="[" then {p:=dots(); check("]")} # non-empty list abbreviation
   if ="|" then if pp:=pred() then return fun(".",[p,pp]) # infix cons
                else write("syntax error: missing second argument to \\"|\\"")
   return \\p # n.b. fails if predicate invalid
end

procedure check(s) # report error if s not present or skip over it
   if not =s then write("syntax error: ",s," expected before ",tab(0))
end

procedure extractids(ids,pred) # build the set of variable identifiers 
   if type(pred)=="fun" then every extractids(ids,!pred.args)
   else if not (pred.name==!ids) then put(ids,pred.name)
   return # the identifiers have been appended to reference parameter ids
end
.ft

The rest of the interpreter is supplied for completeness but is of little
intrinsic interest.  

The unifier updates the supplied substitution.  It is natural to consider 
using the backtrackable assignment and hence avoid the need to copy
substitutions altogether.  A preliminary investigation indicates that this
is feasible but space inefficient.  This may be due to poor implementation
of backtrackable assignment in the particular Icon interpreter used\*[*\*].
.(f
* &version = "Icon Version 6.0.  July 7, 1986." (University of Arizona)
.)f

The parser uses procedures which return parse trees except that "clause"
uses a global variable to return a top level query and fails when it
encounters one.  This causes termination of string scanning in "prog".
The behaviour of the user interface is described in the user manual.