[comp.lang.prolog] How to do it

thom@tuhold (Thom Fruehwirth) (08/19/88)

The fact that Prolog is both a specification and implementation language
seems to cause many problems and misunderstandings. Most people
do not distinguish between these to aspects when discussing about,
writing or using Prolog code. I want to stress that distinction. My
suggestion is that for every predicate you write, you should have
a specification version as well as an implementation version optimized
fo the specific context in which it is used.

The specification is declarative, pure Prolog, thus easy to
understand, to modify, reuse and transform. Even the transformation
to the implementation version could be done semi-automatically
ensuring a degree of equivalence. The implementation version is
efficient, often full of cuts and other ugly side-effective things.

To illustrate what I mean, I will do an example based on an article
implementing the soundex algorithm in Prolog. 

The soundex algorithm maps similar words into the same code.
The code consists of the starting letter and a digit sequence. 
The letters of the alphabet are grouped by mapping them into digits. 
Letters of a certain group are removed, then successive 
letters belonging to the same group are removed, last another group
of letters is removed. Note the difference between removal before
and after processing successive letters of the same group. Removal
before can make letters successive which were not, while removal
afterwards does not change the result of the mappings of other letters.
Only the first three digits are used for the actual code. If there are
less digits, zeroes are added.

Here is the article. I changed the syntax from Waterloo Prolog to
Edingburgh and added comments. I assume the algorithm given here 
to implement soundex. (I have no textual specification of soundex).

------------------------Start-of-Article-----------------------

Date: Wed, 13 Jul 88 10:35 EDT
From: William J. Joel <JZEM%MARIST.BITNET@MITVMA.MIT.EDU>
Subject: Soundex algorithm
To: AILIST@AI.AI.MIT.EDU

/* key(Letter,CodeDigit) - Letter has CodeDigit */

key(a,-1).		key(b,1).		key(c,2).
key(d,3).
key(e,-1).		key(f,1).		key(g,2).
key(h,-2).
key(i,0).		key(j,2).		key(k,2).
key(l,4).
key(m,5).		key(n,5).		key(o,-1).
key(p,1).
key(q,2).		key(r,6).		key(s,2).
key(t,3).
key(u,-1).		key(v,1).		key(w,-3).
key(x,2).
key(y,-2).		key(z,2).

/* soundex(Name,Code) - Name has Code */

soundex(Name,Code):-
   string(Name,Code1) , write(Code1) ,	/* convert name to charlist */
   soundex1(Code1,[A,B,C,D|Rem]) ,
   string(Code,[A,B,C,D]).	/* convert four-element char/digitlist
			(!?) to n	ame */

/* soundex1(CharList,CharDigitList) - Charlist has code CharDigitlist */

soundex1([Head|Code1],[Head|Code]):-
   keycode([Head|Code1],Code2) , write(Code2) ,
   reduce(Code2,[T|Code3]) , write([T|Code3]) ,
   eliminate(Code3,Code4) , write(Code4) ,
   append(Code4,[0,0,0],Code).

/*keycode(CharList,DigitList) - CharList is coded as DigitList */

keycode([H|T],[N|CodeList]):-
   key(H,N) ,
   keycode(T,CodeList).
keycode([],[]).

/* reduce(DigitList1,DigitList2) - DigitList1 is reduced to DigitList2
    by removing (-2) and successive letters of the same group */

reduce([X,(-2),X|Rem],List):-
   reduce([X|Rem],List).
reduce([X,X|Rem],List):-
   reduce([X|Rem],List).
reduce([X,Y,Z|Rem],[X|List]):-
   not X==Z ,
   reduce([Y,Z|Rem],List).
reduce([X,Y|Rem],[X|List]):-
   not X==Y ,
   reduce([Y|Rem],List).
reduce([X],[X]).
reduce([],[]).

/* eliminate(DigitList1,DigitList2) - DigitList1 is reduced to
DigitList2
    by removing negative code digits */

eliminate([X|Rem],List):-
   X<0,
   eliminate(Rem,List).
eliminate([X|Rem],[X|List]):-
   X>=0,
   eliminate(Rem,List).
eliminate([],[]).

/* append(List1,List2,List3) - List3 is List1 appended by List2 */

append([Head|Tail],List,[Head|NewList]):-
   append(Tail,List,NewList).
append([],List,List).

-------------------End-of-Article----------------------------

The write/1-calls in soundex/2 and soundex1/2 seem to stem from
the debugging phase of the program and should be removed. The string/2
predicate used in soundex/1 is a obscured built-in, as the second call 
converts a list of digits (integers) _and_ characters into a string.

Soundex/2 should be deterministic, but it isn't, because reduce/2 isn't.
The reason is the insufficient negation of conditions. The clauses of
reduce/2 should be mutually exclusive, so without using cut one has to
negate the positive conditions satiesfied by the previous clauses. If
the
conditions are the unifications in the head, errors can be avoided by
just negating these unifications. Things get much simpler, two clauses
less are necessary: 

reduce([],[]).
reduce([X,(-2),X|Rem],List):-
   reduce([X|Rem],List).
reduce([X,X|Rem],List):-
   not [X,X|Rem]=[XN,(-2),XN|RemN],	/* negate head-unfication of
first clause */
   reduce([X|Rem],List).
reduce([X|Rem],[X|List]):-
   not [X|Rem]=[XN,(-2),XN|RemN] ,	/* negate head-unfication of
first clause */
   not [X|Rem]=[XN,XN|RemN] ,		/* negate head-unfication of
second clause */
   reduce(Rem,List).

Of course in many cases these negations can be simplified:

reduce([],[]).
reduce([X,(-2),X|Rem],List):-
   reduce([X|Rem],List).
reduce([X,X|Rem],List):-
   not (X=(-2),Rem=[(-2)|_]),
   reduce([X|Rem],List).
reduce([X|Rem],[X|List]):-
   not Rem=[(-2),X|_] ,		
   not Rem=[X|_] ,		
   reduce(Rem,List).

===============================================================

But lets go back to our initial thoughts. How does a specification of
the soundex algorithm look like? Well, the steps of the algorithm
have to be cleary seperated. The predicate remove/3 removes all
occurences of a certain code. Remdup/3 removes all duplicates, 
it is derived from remove/3 by changing a single variable !
In key/2 all codes are given as strings, letters which are 
removed-before get code 'rb', those removed-after get code 'ra'.

------------------------Start-of-Specification-------------------

/* Specification of SOUNDEX Algorithm as given by William J. Joel */

/* key(Letter,Code) - Letter has Code */

key(a,'ra').	key(b,'1').		key(c,'2').
key(d,'3').
key(e,'ra').	key(f,'1').		key(g,'2').
key(h,'rb').
key(i,'0').	key(j,'2').		key(k,'2').
key(l,'4').
key(m,'5').	key(n,'5').		key(o,'ra').
key(p,'1').
key(q,'2').	key(r,'6').		key(s,'2').
key(t,'3').
key(u,'ra').	key(v,'1').		key(w,'ra').
key(x,'2').
key(y,'rb').	key(z,'2').

/* soundex(Name,Code) - Name has Code */

soundex(Name,Code):-
   string(Name,Chars),
   soundex1(Chars,Codes),
   string(Code,Codes).

/* soundex1(Char,Chars,Codes) - Char and Chars have code Codes */

soundex1([Char|Chars],[Char,C2,C3,C4]):-
   keycode([Char|Chars],[C1|Codes2]),
   remove(ra,Codes2,Codes3),
   remdup(C1,Codes3,Codes4),
   remove(rb,Codes4,Codes5),
   append(Codes5,['0','0','0'],[C2,C3,C4|UnusedCodes]).

/*keycode(Chars,Codes) - Chars is coded as Codes */

keycode([],[]).
keycode([Char|Chars],[Code|Codes]):-
   key(Char,Code),
   keycode(Chars,Codes).

/* remove(Code,Codes1,Codes2) - Codes2 is Codes1 
    with all occurences of Code removed */

remove(Code,[],[]).
remove(Code,[Code|Codes1],Codes2):-
   remove(Code,Codes1,Codes2).
remove(RCode,[Code|Codes1],[Code|Codes2]):-
   not RCode=Code,
   remove(RCode,Codes1,Codes2).

/* remdup(Code,Codes1,Codes2) - Codes2 is Codes1 
    with all duplicates removed, Code is the current Code */

remdup(Code,[],[]).
remdup(Code,[Code|Codes1],Codes2):-
   remdup(Code,Codes1,Codes2).
remdup(RCode,[Code|Codes1],[Code|Codes2]):-
   not RCode=Code,
   remdup(Code,Codes1,Codes2).

/* append(List1,List2,List3) - List3 is List1 appended by List2 */

append([],List,List).
append([Head|Tail],List,[Head|NewList]):-
   append(Tail,List,NewList).

-------------------End-of-Specification-------------------------

This specification in pure Prolog can be called not only to give the
code
for a word, but in any mode, as long as the first argument is a list.

================================================================

We now derive a optimized implementation version. 
We first partially evaluate the subgoals of soundex1 
and replace negations by cuts:
---------------------------------------------------------------
/* soundex(Name,Code) - Name has Code */

soundex(Name,Code):-
   string(Name,[Char|Chars]),
   soundex1(Char,Chars,C1,C2,C3),
   string(Code,[Char,C1,C2,C3]).

/* soundex1(Char,Chars,Codes) - Char and Chars have code Codes */

soundex1(Char,Chars,C2,C3,C4):-
   key(Char,C1),
   keycode(Chars,Codes2),
   remove_ra(Codes2,Codes3),
   remdup(C1,Codes3,Codes4),
   remove_rb(Codes4,Codes5),
   appdcodes(Codes5,C2,C3,C4).

/*keycode(Chars,Codes) - Chars is coded as Codes */

keycode([],[]).
keycode([Char|Chars],[Code|Codes]):-
   key(Char,Code),
   keycode(Chars,Codes).

/* remove_ra(Codes1,Codes2) - Codes2 is Codes1 
    with all occurences of 'ra' removed */

remove_ra([],[]).
remove_ra(['ra'|Codes1],Codes2):-
   !,
   remove_ra(Codes1,Codes2).
remove_ra([Code|Codes1],[Code|Codes2]):-
   remove_ra(Codes1,Codes2).

/* remove_rb(Codes1,Codes2) - Codes2 is Codes1 
    with all occurences of 'rb' removed */

remove_rb([],[]).
remove_rb(['rb'|Codes1],Codes2):-
   !,
   remove_rb(Codes1,Codes2).
remove_rb([Code|Codes1],[Code|Codes2]):-
   remove_rb(Codes1,Codes2).

/* remdup(Code,Codes1,Codes2) - Codes2 is Codes1 
    with all duplicates removed, Code is the current Code */

remdup(Code,[],[]).
remdup(Code,[Code|Codes1],Codes2):-
   !,
   remdup(Code,Codes1,Codes2).
remdup(RCode,[Code|Codes1],[Code|Codes2]):-
   remdup(Code,Codes1,Codes2).

/* appdcodes(Codes1,Codes2) - Codes2 is Codes1 cut or extended to 4
digits */

appdcodes([],'0','0','0').
appdcodes([C1],C1,'0','0').
appdcodes([C1,C2],C1,C2,'0').
appdcodes([C1,C2,C3|UnusedCodes],C1,C2,C3).
----------------------------------------------------------------

We now join remove_rb/2,remdup/3 and remove_ra/2 into a single
subgoal remove/3: 

/* remove(Code,Codes1,Codes2) - Codes2 is Codes1 
    with all 'ra',duplicates and 'rb' removed, Code is the current Code
*/

remove(Code,[],[]):-!.
remove(Code,['rb'|Codes1],Codes2):-
   !,
   remove(Code,Codes1,Codes2).
remove(Code,[Code|Codes1],Codes2):-
   !,
   remove(Code,Codes1,Codes2).
remove(RCode,['ra'|Codes1],Codes2):-
   !,
   remove('ra',Codes1,Codes2).
remove(RCode,[Code|Codes1],[Code|Codes2]):-
   remove(Code,Codes1,Codes2).

We now join remove/3 and keycode/2 into single subgoal remcode/3: 

/* remcode(Code,Chars,Codes) - Codes is the code of Chars 
    with all 'ra',duplicates and 'rb' remcoded, Code is the current Code
*/

remcode(Code,[],[]):-!.
remcode(Code,[Char|Chars],Codes2):-
   key(Char,'rb'),
   !,
   remcode(Code,Chars,Codes2).
remcode(Code,[Char|Chars],Codes2):-
   key(Char,Code),
   !,
   remcode(Code,Chars,Codes2).
remcode(RCode,[Char|Chars],Codes2):-
   key(Char,'ra'),
   !,
   remcode('ra',Chars,Codes2).
remcode(RCode,[Char|Chars],[Code|Codes2]):-
   key(Char,Code),
   remcode(Code,Chars,Codes2).

Key/2 is called all over again, but we can do better  splitting
remcode/3 into remcode/3 and remcode/4:

remcode(Code,[],[]):- !.
remcode(Code1,[Char|Chars],Codes):-
   key(Char,Code2),
   !,
   remcode(Code1,Code2,Chars,Codes).
   
remcode(Code,'rb',Codes1,Codes2):-
   !,
   remcode(Code,Codes1,Codes2).
remcode(Code,Code,Codes1,Codes2):-
   !,
   remcode(Code,Codes1,Codes2).
remcode(RCode,'ra',Codes1,Codes2):-
   !,
   remcode('ra',Codes1,Codes2).
remcode(RCode,Code,Codes1,[Code|Codes2]):-
   remcode(Code,Codes1,Codes2).

In remcode/4 the third and fourth argument are copied to the recursive
call in every but the last clause. We thus make the last clause explicit

in remcode/3 and move the recursive call up to remcode 2:

remcode(Code,[],[]).
remcode(Code1,[Char|Chars],Codes):-
   key(Char,Code2),
   !,
   (remcode1(Code1,Code2,Code3) ->		/* if-then-else */
      Codes=Codes1;
      Code2=Code3, Codes=[Code2|Codes1]),
   remcode(Code3,Chars,Codes1).
   
remcode1(Code,'rb',Code).
remcode1(Code,Code,Code).
remcode1(Code,'ra','ra').

Note that the solution with if-then-else is more or less arbitrary,
there are at least four other versions with the same timings on my
Prolog. (You can move the if-then-else down to remcode/3 or up to
soundex/1  etc.). _There is never a _single_ best implementation_!.
 
Now there is still inefficiency in our program, because we compute all
the code for a word although we only need the first four elements of 
the code. Therefore we join remcode/2 and appdcodes/4 (was append/3).
The idea is to call an additional argument of remcode/3 with a list of 
three zeroes which is shortened if the resulting list is made longer. 
This is the case in the else-part of the if-then-else construct:

remcode([],Code,UnusedChars,[]):- !.	/* This clause saves the work */
remcode(Zeroes,Code,[],Zeroes).
remcode(Zeroes,Code1,[Char|Chars],Codes):-
   key(Char,Code2),
   !,
   (remcode1(Code1,Code2,Code3) ->
      Codes=Codes1, Zeroes=Zeroes1;
      Code2=Code3, Codes=[Code2|Codes1], Zeroes=[Zero|Zeroes1]),
   remcode(Zeroes1,Code3,Chars,Codes1).

Now remcode/4 is the only subgoal of soundex1 besides key/2, 
we at last therefore result in:

------------------------Start-of-Implementation-------------------

/* An Implementation of SOUNDEX Algorithm */

/* key(Letter,Code) - Letter has Code */

key(a,'ra').	key(b,'1').		key(c,'2').
key(d,'3').
key(e,'ra').	key(f,'1').		key(g,'2').
key(h,'rb').
key(i,'0').	key(j,'2').		key(k,'2').
key(l,'4').
key(m,'5').	key(n,'5').		key(o,'ra').
key(p,'1').
key(q,'2').	key(r,'6').		key(s,'2').
key(t,'3').
key(u,'ra').	key(v,'1').		key(w,'ra').
key(x,'2').
key(y,'rb').	key(z,'2').

/* soundex(Name,Code) - Name has Code */

soundex(Name,Code):-
   string(Name,[Char|Chars]),
   key(Char,Code),
   soundex1(['0','0','0'],Code,Chars,Codes),
   string(Code,[Char|Codes]).

/* remcode(Code1,Code2,Code3) - Code1 and Code2 can be replaced
   by the single code Code3 */

remcode(Code,'rb',Code).
remcode(Code,Code,Code).
remcode(Code,'ra','ra').

/* soundex1(Zeroes,Char,Chars,Codes) - Char and Chars have code Codes
cut
to the length of the list Zeroes or extended by the rest of the list
Zeroes */

soundex1([],Code,UnusedChars,[]):- !.
soundex1(Zeroes,Code,[],Zeroes).
soundex1(Zeroes,Code1,[Char|Chars],Codes):-
   key(Char,Code2),
   !,
   (remcode(Code1,Code2,Code3) ->
      Codes=Codes1, Zeroes=Zeroes1;
      Code2=Code3, Codes=[Code2|Codes1], Zeroes=[Zero|Zeroes1]),
   soundex1(Zeroes1,Code3,Chars,Codes1).
   
------------------------End-of-Implementation-------------------


CONCLUSION:
Distinguish between specification and implementation.
Do the specification first, then derive the implementation.

thom fruehwirth
technical university of vienna