thom@tuhold (Thom Fruehwirth) (08/19/88)
The fact that Prolog is both a specification and implementation language seems to cause many problems and misunderstandings. Most people do not distinguish between these to aspects when discussing about, writing or using Prolog code. I want to stress that distinction. My suggestion is that for every predicate you write, you should have a specification version as well as an implementation version optimized fo the specific context in which it is used. The specification is declarative, pure Prolog, thus easy to understand, to modify, reuse and transform. Even the transformation to the implementation version could be done semi-automatically ensuring a degree of equivalence. The implementation version is efficient, often full of cuts and other ugly side-effective things. To illustrate what I mean, I will do an example based on an article implementing the soundex algorithm in Prolog. The soundex algorithm maps similar words into the same code. The code consists of the starting letter and a digit sequence. The letters of the alphabet are grouped by mapping them into digits. Letters of a certain group are removed, then successive letters belonging to the same group are removed, last another group of letters is removed. Note the difference between removal before and after processing successive letters of the same group. Removal before can make letters successive which were not, while removal afterwards does not change the result of the mappings of other letters. Only the first three digits are used for the actual code. If there are less digits, zeroes are added. Here is the article. I changed the syntax from Waterloo Prolog to Edingburgh and added comments. I assume the algorithm given here to implement soundex. (I have no textual specification of soundex). ------------------------Start-of-Article----------------------- Date: Wed, 13 Jul 88 10:35 EDT From: William J. Joel <JZEM%MARIST.BITNET@MITVMA.MIT.EDU> Subject: Soundex algorithm To: AILIST@AI.AI.MIT.EDU /* key(Letter,CodeDigit) - Letter has CodeDigit */ key(a,-1). key(b,1). key(c,2). key(d,3). key(e,-1). key(f,1). key(g,2). key(h,-2). key(i,0). key(j,2). key(k,2). key(l,4). key(m,5). key(n,5). key(o,-1). key(p,1). key(q,2). key(r,6). key(s,2). key(t,3). key(u,-1). key(v,1). key(w,-3). key(x,2). key(y,-2). key(z,2). /* soundex(Name,Code) - Name has Code */ soundex(Name,Code):- string(Name,Code1) , write(Code1) , /* convert name to charlist */ soundex1(Code1,[A,B,C,D|Rem]) , string(Code,[A,B,C,D]). /* convert four-element char/digitlist (!?) to n ame */ /* soundex1(CharList,CharDigitList) - Charlist has code CharDigitlist */ soundex1([Head|Code1],[Head|Code]):- keycode([Head|Code1],Code2) , write(Code2) , reduce(Code2,[T|Code3]) , write([T|Code3]) , eliminate(Code3,Code4) , write(Code4) , append(Code4,[0,0,0],Code). /*keycode(CharList,DigitList) - CharList is coded as DigitList */ keycode([H|T],[N|CodeList]):- key(H,N) , keycode(T,CodeList). keycode([],[]). /* reduce(DigitList1,DigitList2) - DigitList1 is reduced to DigitList2 by removing (-2) and successive letters of the same group */ reduce([X,(-2),X|Rem],List):- reduce([X|Rem],List). reduce([X,X|Rem],List):- reduce([X|Rem],List). reduce([X,Y,Z|Rem],[X|List]):- not X==Z , reduce([Y,Z|Rem],List). reduce([X,Y|Rem],[X|List]):- not X==Y , reduce([Y|Rem],List). reduce([X],[X]). reduce([],[]). /* eliminate(DigitList1,DigitList2) - DigitList1 is reduced to DigitList2 by removing negative code digits */ eliminate([X|Rem],List):- X<0, eliminate(Rem,List). eliminate([X|Rem],[X|List]):- X>=0, eliminate(Rem,List). eliminate([],[]). /* append(List1,List2,List3) - List3 is List1 appended by List2 */ append([Head|Tail],List,[Head|NewList]):- append(Tail,List,NewList). append([],List,List). -------------------End-of-Article---------------------------- The write/1-calls in soundex/2 and soundex1/2 seem to stem from the debugging phase of the program and should be removed. The string/2 predicate used in soundex/1 is a obscured built-in, as the second call converts a list of digits (integers) _and_ characters into a string. Soundex/2 should be deterministic, but it isn't, because reduce/2 isn't. The reason is the insufficient negation of conditions. The clauses of reduce/2 should be mutually exclusive, so without using cut one has to negate the positive conditions satiesfied by the previous clauses. If the conditions are the unifications in the head, errors can be avoided by just negating these unifications. Things get much simpler, two clauses less are necessary: reduce([],[]). reduce([X,(-2),X|Rem],List):- reduce([X|Rem],List). reduce([X,X|Rem],List):- not [X,X|Rem]=[XN,(-2),XN|RemN], /* negate head-unfication of first clause */ reduce([X|Rem],List). reduce([X|Rem],[X|List]):- not [X|Rem]=[XN,(-2),XN|RemN] , /* negate head-unfication of first clause */ not [X|Rem]=[XN,XN|RemN] , /* negate head-unfication of second clause */ reduce(Rem,List). Of course in many cases these negations can be simplified: reduce([],[]). reduce([X,(-2),X|Rem],List):- reduce([X|Rem],List). reduce([X,X|Rem],List):- not (X=(-2),Rem=[(-2)|_]), reduce([X|Rem],List). reduce([X|Rem],[X|List]):- not Rem=[(-2),X|_] , not Rem=[X|_] , reduce(Rem,List). =============================================================== But lets go back to our initial thoughts. How does a specification of the soundex algorithm look like? Well, the steps of the algorithm have to be cleary seperated. The predicate remove/3 removes all occurences of a certain code. Remdup/3 removes all duplicates, it is derived from remove/3 by changing a single variable ! In key/2 all codes are given as strings, letters which are removed-before get code 'rb', those removed-after get code 'ra'. ------------------------Start-of-Specification------------------- /* Specification of SOUNDEX Algorithm as given by William J. Joel */ /* key(Letter,Code) - Letter has Code */ key(a,'ra'). key(b,'1'). key(c,'2'). key(d,'3'). key(e,'ra'). key(f,'1'). key(g,'2'). key(h,'rb'). key(i,'0'). key(j,'2'). key(k,'2'). key(l,'4'). key(m,'5'). key(n,'5'). key(o,'ra'). key(p,'1'). key(q,'2'). key(r,'6'). key(s,'2'). key(t,'3'). key(u,'ra'). key(v,'1'). key(w,'ra'). key(x,'2'). key(y,'rb'). key(z,'2'). /* soundex(Name,Code) - Name has Code */ soundex(Name,Code):- string(Name,Chars), soundex1(Chars,Codes), string(Code,Codes). /* soundex1(Char,Chars,Codes) - Char and Chars have code Codes */ soundex1([Char|Chars],[Char,C2,C3,C4]):- keycode([Char|Chars],[C1|Codes2]), remove(ra,Codes2,Codes3), remdup(C1,Codes3,Codes4), remove(rb,Codes4,Codes5), append(Codes5,['0','0','0'],[C2,C3,C4|UnusedCodes]). /*keycode(Chars,Codes) - Chars is coded as Codes */ keycode([],[]). keycode([Char|Chars],[Code|Codes]):- key(Char,Code), keycode(Chars,Codes). /* remove(Code,Codes1,Codes2) - Codes2 is Codes1 with all occurences of Code removed */ remove(Code,[],[]). remove(Code,[Code|Codes1],Codes2):- remove(Code,Codes1,Codes2). remove(RCode,[Code|Codes1],[Code|Codes2]):- not RCode=Code, remove(RCode,Codes1,Codes2). /* remdup(Code,Codes1,Codes2) - Codes2 is Codes1 with all duplicates removed, Code is the current Code */ remdup(Code,[],[]). remdup(Code,[Code|Codes1],Codes2):- remdup(Code,Codes1,Codes2). remdup(RCode,[Code|Codes1],[Code|Codes2]):- not RCode=Code, remdup(Code,Codes1,Codes2). /* append(List1,List2,List3) - List3 is List1 appended by List2 */ append([],List,List). append([Head|Tail],List,[Head|NewList]):- append(Tail,List,NewList). -------------------End-of-Specification------------------------- This specification in pure Prolog can be called not only to give the code for a word, but in any mode, as long as the first argument is a list. ================================================================ We now derive a optimized implementation version. We first partially evaluate the subgoals of soundex1 and replace negations by cuts: --------------------------------------------------------------- /* soundex(Name,Code) - Name has Code */ soundex(Name,Code):- string(Name,[Char|Chars]), soundex1(Char,Chars,C1,C2,C3), string(Code,[Char,C1,C2,C3]). /* soundex1(Char,Chars,Codes) - Char and Chars have code Codes */ soundex1(Char,Chars,C2,C3,C4):- key(Char,C1), keycode(Chars,Codes2), remove_ra(Codes2,Codes3), remdup(C1,Codes3,Codes4), remove_rb(Codes4,Codes5), appdcodes(Codes5,C2,C3,C4). /*keycode(Chars,Codes) - Chars is coded as Codes */ keycode([],[]). keycode([Char|Chars],[Code|Codes]):- key(Char,Code), keycode(Chars,Codes). /* remove_ra(Codes1,Codes2) - Codes2 is Codes1 with all occurences of 'ra' removed */ remove_ra([],[]). remove_ra(['ra'|Codes1],Codes2):- !, remove_ra(Codes1,Codes2). remove_ra([Code|Codes1],[Code|Codes2]):- remove_ra(Codes1,Codes2). /* remove_rb(Codes1,Codes2) - Codes2 is Codes1 with all occurences of 'rb' removed */ remove_rb([],[]). remove_rb(['rb'|Codes1],Codes2):- !, remove_rb(Codes1,Codes2). remove_rb([Code|Codes1],[Code|Codes2]):- remove_rb(Codes1,Codes2). /* remdup(Code,Codes1,Codes2) - Codes2 is Codes1 with all duplicates removed, Code is the current Code */ remdup(Code,[],[]). remdup(Code,[Code|Codes1],Codes2):- !, remdup(Code,Codes1,Codes2). remdup(RCode,[Code|Codes1],[Code|Codes2]):- remdup(Code,Codes1,Codes2). /* appdcodes(Codes1,Codes2) - Codes2 is Codes1 cut or extended to 4 digits */ appdcodes([],'0','0','0'). appdcodes([C1],C1,'0','0'). appdcodes([C1,C2],C1,C2,'0'). appdcodes([C1,C2,C3|UnusedCodes],C1,C2,C3). ---------------------------------------------------------------- We now join remove_rb/2,remdup/3 and remove_ra/2 into a single subgoal remove/3: /* remove(Code,Codes1,Codes2) - Codes2 is Codes1 with all 'ra',duplicates and 'rb' removed, Code is the current Code */ remove(Code,[],[]):-!. remove(Code,['rb'|Codes1],Codes2):- !, remove(Code,Codes1,Codes2). remove(Code,[Code|Codes1],Codes2):- !, remove(Code,Codes1,Codes2). remove(RCode,['ra'|Codes1],Codes2):- !, remove('ra',Codes1,Codes2). remove(RCode,[Code|Codes1],[Code|Codes2]):- remove(Code,Codes1,Codes2). We now join remove/3 and keycode/2 into single subgoal remcode/3: /* remcode(Code,Chars,Codes) - Codes is the code of Chars with all 'ra',duplicates and 'rb' remcoded, Code is the current Code */ remcode(Code,[],[]):-!. remcode(Code,[Char|Chars],Codes2):- key(Char,'rb'), !, remcode(Code,Chars,Codes2). remcode(Code,[Char|Chars],Codes2):- key(Char,Code), !, remcode(Code,Chars,Codes2). remcode(RCode,[Char|Chars],Codes2):- key(Char,'ra'), !, remcode('ra',Chars,Codes2). remcode(RCode,[Char|Chars],[Code|Codes2]):- key(Char,Code), remcode(Code,Chars,Codes2). Key/2 is called all over again, but we can do better splitting remcode/3 into remcode/3 and remcode/4: remcode(Code,[],[]):- !. remcode(Code1,[Char|Chars],Codes):- key(Char,Code2), !, remcode(Code1,Code2,Chars,Codes). remcode(Code,'rb',Codes1,Codes2):- !, remcode(Code,Codes1,Codes2). remcode(Code,Code,Codes1,Codes2):- !, remcode(Code,Codes1,Codes2). remcode(RCode,'ra',Codes1,Codes2):- !, remcode('ra',Codes1,Codes2). remcode(RCode,Code,Codes1,[Code|Codes2]):- remcode(Code,Codes1,Codes2). In remcode/4 the third and fourth argument are copied to the recursive call in every but the last clause. We thus make the last clause explicit in remcode/3 and move the recursive call up to remcode 2: remcode(Code,[],[]). remcode(Code1,[Char|Chars],Codes):- key(Char,Code2), !, (remcode1(Code1,Code2,Code3) -> /* if-then-else */ Codes=Codes1; Code2=Code3, Codes=[Code2|Codes1]), remcode(Code3,Chars,Codes1). remcode1(Code,'rb',Code). remcode1(Code,Code,Code). remcode1(Code,'ra','ra'). Note that the solution with if-then-else is more or less arbitrary, there are at least four other versions with the same timings on my Prolog. (You can move the if-then-else down to remcode/3 or up to soundex/1 etc.). _There is never a _single_ best implementation_!. Now there is still inefficiency in our program, because we compute all the code for a word although we only need the first four elements of the code. Therefore we join remcode/2 and appdcodes/4 (was append/3). The idea is to call an additional argument of remcode/3 with a list of three zeroes which is shortened if the resulting list is made longer. This is the case in the else-part of the if-then-else construct: remcode([],Code,UnusedChars,[]):- !. /* This clause saves the work */ remcode(Zeroes,Code,[],Zeroes). remcode(Zeroes,Code1,[Char|Chars],Codes):- key(Char,Code2), !, (remcode1(Code1,Code2,Code3) -> Codes=Codes1, Zeroes=Zeroes1; Code2=Code3, Codes=[Code2|Codes1], Zeroes=[Zero|Zeroes1]), remcode(Zeroes1,Code3,Chars,Codes1). Now remcode/4 is the only subgoal of soundex1 besides key/2, we at last therefore result in: ------------------------Start-of-Implementation------------------- /* An Implementation of SOUNDEX Algorithm */ /* key(Letter,Code) - Letter has Code */ key(a,'ra'). key(b,'1'). key(c,'2'). key(d,'3'). key(e,'ra'). key(f,'1'). key(g,'2'). key(h,'rb'). key(i,'0'). key(j,'2'). key(k,'2'). key(l,'4'). key(m,'5'). key(n,'5'). key(o,'ra'). key(p,'1'). key(q,'2'). key(r,'6'). key(s,'2'). key(t,'3'). key(u,'ra'). key(v,'1'). key(w,'ra'). key(x,'2'). key(y,'rb'). key(z,'2'). /* soundex(Name,Code) - Name has Code */ soundex(Name,Code):- string(Name,[Char|Chars]), key(Char,Code), soundex1(['0','0','0'],Code,Chars,Codes), string(Code,[Char|Codes]). /* remcode(Code1,Code2,Code3) - Code1 and Code2 can be replaced by the single code Code3 */ remcode(Code,'rb',Code). remcode(Code,Code,Code). remcode(Code,'ra','ra'). /* soundex1(Zeroes,Char,Chars,Codes) - Char and Chars have code Codes cut to the length of the list Zeroes or extended by the rest of the list Zeroes */ soundex1([],Code,UnusedChars,[]):- !. soundex1(Zeroes,Code,[],Zeroes). soundex1(Zeroes,Code1,[Char|Chars],Codes):- key(Char,Code2), !, (remcode(Code1,Code2,Code3) -> Codes=Codes1, Zeroes=Zeroes1; Code2=Code3, Codes=[Code2|Codes1], Zeroes=[Zero|Zeroes1]), soundex1(Zeroes1,Code3,Chars,Codes1). ------------------------End-of-Implementation------------------- CONCLUSION: Distinguish between specification and implementation. Do the specification first, then derive the implementation. thom fruehwirth technical university of vienna