ok@quintus (08/23/88)
In a recent article, Thom Fruehwirth suggested that it is a good idea to distinguish between Prolog as a specification language and Prolog as an implementation language. Quite right. When he suggests that you should write a specification version and an implementation version of every predicate, I think he goes rather too far. In particular, it is not necessary for the implementation of Soundex to be quite as ugly as his implementation version. Proof by illustration: /* This description of the Soundex algorithm is taken from Knuth, "The Art of Computer Programming", Vol 3 "Sorting and Searching", page 392 of the first edition. 1. Retain the first letter of the name, and drop all occurrences of a, e, h, i, o, u, w, y in other positions. 2. Assign the following numbers to the remaining letters after the first: b, f, p, v -> 1 l -> 4 c, g, j, k, q, s, x, z -> 2 m, n -> 5 d, t -> 3 r -> 6 3. If two or more letters with the same code were adjacent in the original name (before step 1), omit all but the first. 4. Convert to the form "letter, digit, digit, digit" by adding trailing zeros (if there are fewer than three digits) or by dropping rightmost digits (if there are more than three). Note that this specification refers only to lower-case letters, so that is what the Prolog code below implements. To handle upper- case letters as well, see the comment in chars_to_codes/2. */ soundex_atoms(Name, Soundex) :- name(Name, Chars), soundex_chars(Chars, Code), name(Soundex, Code). soundex_chars([Char|Chars], [Char|Digits]) :- chars_to_codes([Char|Chars], [Code|Codes]), omit_duplicates_and_vowels(Codes, Code, CodesWithoutDuplicates), three_digits(CodesWithoutDuplicates, Digits). /* To handle both cases of letters, do to_lower(Char, Lower), char_to_code(Lower, Code) if you have to_lower/2, or do Lower is Char \/ 8'040, char_to_code(Lower, Code) if the input uses ISO 8859/1 and you haven't got to_lower/2. */ chars_to_codes([], []). chars_to_codes([Char|Chars], [Code|Codes]) :- char_to_code(Char, Code), chars_to_codes(Chars, Codes). char_to_code(0'a, 0). char_to_code(0'e, 0). char_to_code(0'h, 0). char_to_code(0'i, 0). char_to_code(0'o, 0). char_to_code(0'u, 0). char_to_code(0'w, 0). char_to_code(0'y, 0). char_to_code(0'b, 0'1). char_to_code(0'f, 0'1). char_to_code(0'p, 0'1). char_to_code(0'v, 0'1). char_to_code(0'c, 0'2). char_to_code(0'g, 0'2). char_to_code(0'j, 0'2). char_to_code(0'k, 0'2). char_to_code(0'q, 0'2). char_to_code(0's, 0'2). char_to_code(0'x, 0'2). char_to_code(0'z, 0'2). char_to_code(0'd, 0'3). char_to_code(0't, 0'3). char_to_code(0'l, 0'4). char_to_code(0'm, 0'5). char_to_code(0'n, 0'5). char_to_code(0'r, 0'6). % Note that omit_duplicates_and_vowels/3 uses if->then;elses, but that % the "then" arrows could be replaced by conjunction without harming % the logic. omit_duplicates_and_vowels([], _, []). omit_duplicates_and_vowels([Code|Codes], Prev, Without) :- ( Code =:= Prev -> Without = More % omit a duplicate ; Code =:= 0 -> Without = More % omit a vowel ; Code =\= Prev, Code =\= 0 -> Without = [Code|More] ), omit_duplicates_and_vowels(Codes, Code, More). % The cuts in three_digits/2 are all "green" ones. That is, they % prune away (failed) proofs, not solutions. Deeper indexing would % obviate the need for such cuts. three_digits([], "000"). three_digits([D1], [D1|"00"]) :- !. three_digits([D1,D2], [D1,D2|"0"]) :- !. three_digits([D1,D2,D3|_], [D1,D2,D3]). /* Test cases (from Knuth): euler, ellery -> e460 gauss, ghosh -> g200 hilbert, heilbronn -> h416 knuth, kant -> k530 lloyd, ladd -> l300 lukasiewicz, lissajous -> l222 */
thom@tuhold (Thom Fruehwirth) (08/26/88)
Commenting on my recent article, Richard O'Keefe <305@quintus.UUCP> tried to proof by illustration that my implementation version of soundex is too ugly. Richard O'Keefes implementation is based on a different specification, which is easier to implement. Second, his version is only half-way between my specification and implementation version To yield the same efficiency, a joining of his three predicates called in soundex_chars/2 into one is necessary. Note that the resulting version is arbitrarily faster than the current version depending on the length of the word (how about 'CodesWithoutDuplicates') one codes. The reason for this is that my version never codes more than three digits, while O'Keefes version codes the whole word.
ok@quintus.uucp (Richard A. O'Keefe) (08/30/88)
In article <1178@tuhold> thom@tuhold (Thom Fruehwirth) writes: >Richard O'Keefe's implementation is based on a different >specification, which is easier to implement. My program was based on the specification of the Soundex function that Knuth provides. Anything which computes a different function is not Soundex. >Note that the resulting version is arbitrarily faster than the current version depending on the length of the word one codes. The reason for >for this is that my version never codes more than three digits, while >O'Keefe's version codes the whole word. Since the Soundex algorithm is primarily intended for encoding surnames, this is not much of a problem. Since the specification of the Soundex function requires that duplicate characters be discarded, it is possible to construct strings of arbitrary length _all_ of which must be inspected even if only 3 codes are produced. (Stick "sss.......sss" in front of any word you like.) However, Fruehwirth is quite right that I didn't take my "implementation" version as far as I could have. Here's my "optimised" version: the soundex_atoms/2 predicate and char_to_code/2 table didn't change at all. soundex_chars([Char|Chars], [Char|Codes]) :- char_to_code(Char, Code), soundex_chars(Chars, Code, 3, Codes). % soundex_chars(+Letters, +PreviousCode, +DigitsWanted, -Digits) soundex_chars([], _, N, Zeros) :- zeros(N, Zeros). soundex_chars([Char|Chars], Prev, N, Codes) :- char_to_code(Char, Code), ( Code =:= Prev -> % Discard duplicate characters soundex_chars(Chars, Code, N, Codes) ; Code =:= 0 -> % Ignore vowels soundex_chars(Chars, Code, N, Codes) ; N =:= 1 -> % Stop when 3 digits have been done Codes = [Code] ; M is N-1, % otherwise, convert 1 more character Codes = [Code|Codes1], soundex_chars(Chars, Code, M, Codes1) ). zeros(1, "0"). zeros(2, "00"). zeros(3, "000"). Ironically, this has even fewer cuts than the version I started with! It provides even less support for the idea that ugliness is warranted for efficiency's sake. Testing the two versions on a random sample of 20 names drawn from /usr/dict/words, the improved version was a little under twice as fast as the version I first thought of.
waldau@kuling.UUCP (Mattias Waldau) (08/30/88)
Thom Fruewirth and R.O.Keefe have both specified following soundex algorithm with Horn clauses. Following the tradition of Keith CLark, Sten-Ake Tarnlund, Zohar Manna, Richard Waldinger, and Alan Bundy (see his last lecture at the logic conference in Seattle) I have tried to specify the same algorithm in first order predicate logic with equality. The idea is that that specification should be easier to understand, and therefor more likely to be correct. The question to the newsgroup is whether this is true or not. /* This description of the Soundex algorithm is taken from Knuth, "The Art of Computer Programming", Vol 3 "Sorting and Searching", page 392 of the first edition. 1. Retain the first letter of the name, and drop all occurrences of a, e, h, i, o, u, w, y in other positions. 2. Assign the following numbers to the remaining letters after the first: b, f, p, v -> 1 l -> 4 c, g, j, k, q, s, x, z -> 2 m, n -> 5 d, t -> 3 r -> 6 3. If two or more letters with the same code were adjacent in the original name (before step 1), omit all but the first. 4. Convert to the form "letter, digit, digit, digit" by adding trailing zeros (if there are fewer than three digits) or by dropping rightmost digits (if there are more than three). */ Syntax: ======= I use ~, (x), (e x), &, |, ->, <-, <-> for not, forall, exists, and, or, onlyif, if, and iff respectively. Not, exists, forall binds titest, then and, then or, then onlyif and if, and at last iff. So a<->b|c&(e x)d&e->f is the same as a<->((b|(c&((e x)d)&e))->f). Variable names start with a capital letter, constants, functions and predicates are written with lower-case only. Free variables are quantified with forall. I use standard Prolog list syntax. * is the append function. *, eq, teq, eq2, teq2, in, =, and < are written as infix operators. Specification: ============== % soundex(Sequence_of_chars, Soundex_code), Soundex_code is the % soundex code of the Sequence_of_chars soundex([Letter|Rest], [Letter|Digits])) <-> (e T)(e V) {without_adjacent_identical_codes([Letter|Rest], [Letter|T]) & without_vowels(T, V) & three_first_digits(V, Digits)} % without_identical_adjancent_codes(With, Without), Without is the % shortest sequence without identical adjancent codes that is identical % to With. without_adjancent_identical_codes(With, Without) <-> With teq Without & ~(e Shorter){With teq Shorter & Shorter<Without} % A eq B, B is the same sequence as A except that one letter with the % same code as the preceeding one is removed. % % A teq B, teq is the transitive closure of eq. With eq Without <-> (e Before)(e After)(e Dup1)(e Dup2) {With=Before*[Dup1, Dup2]*After & Without=Before*[Dup1]*After & code(Dup1)=code(Dup2)} A teq B <-> A=B | (e C){A eq C & C teq B} % A<B, the length of A is shorter than the length of B. A<B <-> (e Before)(e After)(e Element) {B=Before*[Element]*After & (A=Before*After | A<Before*After)} % without_vowels(With, Without), Without is all the non-vowel letters % of With in the same order. without_vowels(With, Without) <-> With teq2 Without & (Element){Element in Without -> ~vowel(Element)} % A eq2 B, B is the same sequence as A, but with one vowel removed. % A teq2 B, teq2 is the transitive closure of eq2. A eq2 B <-> (e Before)(e Vowel)(e After) {A=Before*[Vovel]*After & B=Before*After & vowel(Vowel)} A teq2 B <-> A=B | (e C){A eq2 C & C teq2 B} % three_digits(V, Digits), V is a sequence of letters, Digits is a % sequence of the codes of the three first letters. If there are fewer % than three letters, add trailing zeroes to Digits. three_digits(V, Digits) <-> V=[] & Digits=[0, 0, 0] | V=[Char1] & Digits=[code(char1), 0, 0] | V=[Char1, Char2] & Digits=[code(Char1), code(Char2), 0] | (e Garbage){V=[Char1, Char2, Char3|Garbage] & Digits=[code(Char1), code(Char2), code(Char3)]} % vowel(V), V is a vowel (almost, h and w isn't) vowel(V) <-> {V=a | V=e | V=h | V=i | V=o | V=u | V=w | V=y } % code(Letter)=Code, Letter has the code Code. code(L)=C <-> (L=b | L=f | L=p | L=v -> C=1) & (L=c | L=g | L=j | L=k | L=q | L=s | L=x | L=z -> C=2) & (L=d | L=t -> C=3) & (L=l -> C=4) & (L=m | L=n -> C=5) & (L+r -> C=6) % A*B=C, C is the concatenation of the sequences A and B. A*B=C <-> A=[] & B=C | (e H)(e T){A=[H|T] & C=[H|T*B]} % E in S, E is a element of the sequence S. E in S <-> (e H)(e T){S=[H|T] & (E=H | E in T)} /Mattias Waldau -- Mattias Waldau mattias-waldau@aida.uu.se or Computing Science Department waldau@kuling.uucp P.O. Box 520, S-751 20 Uppsala, Sweden Phone: +46-18-181055
ok@quintus.uucp (Richard A. O'Keefe) (09/01/88)
In article <816@kuling.UUCP> waldau@kuling.UUCP (Mattias Waldau) writes: >Thom Fruewirth and R.O'Keefe have both specified following Soundex >algorithm with Horn clauses. Following the tradition of Keith Clark, >Sten-Ake Tarnlund, Zohar Manna, Richard Waldinger, and Alan Bundy (see >his last lecture at the logic conference in Seattle) I have tried to >specify the same algorithm in first order predicate logic with >equality. > >The idea is that that specification should be easier to understand, >and therefore more likely to be correct. The question to the newsgroup >is whether this is true or not. Excluding comments, blank lines, and the table which maps letters to digits (and which are to be dropped), the last version I presented had 24 lines, 82 "words", 3.4 "words"/line. With the same exclusions, Waldau's version has 32 lines, 201 "words", 6.3 "words"/line. {I really *must* write an editor command to count tokens rather than "words".} Other things being equal (which they aren't) I would be surprised if a text containing two and a half times as many "words" were easier to understand. Note that the version I presented *is* first-order logic except for the use of if->then;else, and eliminating that by adding an appropriate number of inequalities adds a mere 6 "words" to the total. Easily the shortest specification is the one which Knuth provides; as I typed it in it took a mere 11 non-blank lines of English, and that includes the tables. It is interesting to note that Waldau's comment explaining "A<B disagrees with the code: > % A<B, the length of A is shorter than the length of B. > A<B <-> (e Before)(e After)(e Element) > {B=Before*Element*After & (A=Before*After | A<Before*After)} > A*B=C <-> A=[] & B=C | (e H)(e T){A=[H|T] & C=[H|T*B]} The code defines A < B when A is a *subsequence* of B, so that although [a] < [b,c] according to the comment, not so according to the code. If I've understood correctly, the rest of the program wants the definition in the comment. I offer for comparison three other definitions of the relation, using "<<" rather than "<" because I want arithmetic comparison as well. A version in a functional language like ML or Miranda: len([]) = 0. len([_|T]) = 1+len(T). X << Y = len(X) < len(Y). A version in English: X << Y iff X and Y are lists and X has fewer elements than Y. A version in Prolog: [] << [_|_]. [_|X] << [_|Y] :- X << Y.
thom@tuhold (Thom Fruehwirth) (09/01/88)
In a recent article ok@quintus.uucp (Richard A. O'Keefe) writes: > Since the specification of the Soundex function requires that > duplicate characters be discarded, it is possible to construct > strings of arbitrary length _all_ of which must be inspected > even if only 3 codes are produced. (Stick "sss.......sss" in > front of any word you like.) But what sense does this make ? > Fruehwirth is quite right that I didn't take my "implementation" > version as far as I could have. Here's my "optimised" version: the > soundex_atoms/2 predicate and char_to_code/2 table didn't change. I pleased to see that Richard O'Keefes code follows my initial suggestions on how to transform soundex(-like) specifications. Only one little transformation is still missing: That of transforming zeros/2 away. It's a minor change, but it avoids going the roundabout way of counting the number of character-codes produced so far: % soundex_chars(+Letters, +PreviousCode, +FillInCode, -Digits) % in our case FillInCode = "000" soundex_chars([], _, Zeros, Zeros). soundex_chars([Char|Chars], Prev, Zeros, Codes) :- char_to_code(Char, Code), ( Code =:= Prev -> % Discard duplicate characters soundex_chars(Chars, Code, Zeros, Codes) ; Code =:= 0 -> % Ignore vowels soundex_chars(Chars, Code, Zeros, Codes) ; Zeros = [Zero] -> % Stop when 3 digits have been done Codes = [Code] ; Zeros = [Zero|Zeros1], % otherwise, convert 1 more character Codes = [Code|Codes1], soundex_chars(Chars, Code, Zeros1, Codes1) ). Isn't it more beautiful this way ? thom fruehwirth PS: Effiency is about the same, results depend on the test cases used.
ok@quintus.uucp (Richard A. O'Keefe) (09/02/88)
In article <1190@tuhold> thom@tuhold (Thom Fruehwirth) writes: >In a recent article ok@quintus.uucp (Richard A. O'Keefe) writes: >> Since the specification of the Soundex function requires that >> duplicate characters be discarded, it is possible to construct >> strings of arbitrary length _all_ of which must be inspected >> even if only 3 codes are produced. (Stick "sss.......sss" in >> front of any word you like.) >But what sense does this make ? The point is that for any N I can construct a list of length N which any implementation of the Soundex function must examine in its entirety: stopping after generating 4 elements of *output* doesn't mean that you can avoid looking at all of the *input*. Stopping early doesn't improve the worst case at all. >I pleased to see that Richard O'Keefe's code follows my initial >suggestions on how to transform Soundex(-like) specifications. Well, no. My code doesn't have all those cuts in it... >Only one little transformation is still missing: That of transforming >zeros/2 away. It's a minor change, but it avoids going the roundabout >way of counting the number of character-codes produced so far: It's a very pretty method, which I have often used. But it doesn't avoid counting. The technique is to exploit the epimorphism from lists to natural numbers (length/2), using [] -> 0 [_|X] -> X'+1 Minor correction: my code didn't count the number of character codes produced so far, but the number *NOT* produced (as does Fruewirth's latest version). The point of my postings about Soundex is that I deny the strong distinction Fruewirth draws bewteen "specification" and "implementation". In particular, I was challenging the idea implicit in his original pair of versions that the "implementation" version was licensed to be ugly. I think the series of postings so far has demonstrated quite clearly that "specification" and "implementation" are *directions* rather than *places*. We're still left with the rather unfortunate point that the clearest specification for Soundex that we've seen so far is Knuth's English text. I was going to include the cleanest functional specification I could come up with, but excluding the code table it is 14 lines, 84 "words", so isn't really an improvement on the Prolog "implementation". If this specification/implementation topic is of interest, perhaps we should find another example.
waldau@kuling.UUCP (Mattias Waldau) (09/02/88)
R.O'Keefe points out that my specification of < is wrong, thank you. He also says that his program should be easier to understand since it is shorter, but I say that this only holds under the assumption that you have to read the whole program to understand what it describes. An example: we want to define the relation that removes the vowels from a sequence of letters. In Prolog we would have to write a recursive program like remove_vowels([], []). remove_vowels([Vowel|With], Without) :- vowel(Vowel), remove_vowels(With, Without). remove_vovels([Not_a_vowel|With], [Not_a_vowel|Without]) :- not_a_vowel(Not_a_vowel), remove_vowels(With, Without). If we have an explicit definition of not_a_vowel this program can also be used backwards. If we had full first order logic we could instead write remove_vowels(With, Without) <-> subsequence(Without, With) & (Elements){Element in Without -> ~vowel(Element)} saying that Without must be a subsequence of With and all elements of Without are nonvowels. It is assumed that the predicates subsequence and in are already understood by the programmer, so the number of NEW symbols that he has to write/understand are fewer then in the Prolog example above. I generally believe that quantifiers are easier to understand than recursion, but if we have recursive datastructures we still have to use recursion when defining primitive relations like in and subsequence. -- Mattias Waldau
ok@quintus.uucp (Richard A. O'Keefe) (09/03/88)
In article <818@kuling.UUCP> waldau@kuling.UUCP (Mattias Waldau) writes: >He also says that his program should be easier to understand since it >is shorter, but I say that this only holds under the assumption that >you have to read the whole program to understand what it describes. > >An example: we want to define the relation that removes the vowels >from a sequence of letters. [He provides a Prolog version and the following 1st-order version:] >remove_vowels(With, Without) <-> > subsequence(Without, With) & > (Elements){Element in Without -> ~vowel(Element)} (1) The two definitions do not say the same thing. Consider the simpler case of vowel(0), nonvowel(1). With that definition, the Prolog version would reject remove_vowels([0,*], [*]), but the version quoted above would accept it. (2) By assuming that "the predicate... subseqence [is] already understood by the programmer", Waldau builds in the conclusion he wants. I would derive the Prolog version from the generic predicate exclude/3: remove_vowels(With, Without) :- exclude(vowel, With, Without). I believe that by distinguishing between "specific" predicates and "generic" predicates (predicate schemes) it is possible to produce a Prolog-like language with second-order convenience but provably first-order power, but I haven't proved it yet. I think it is fair to compare the full text of Waldau's specification with the full text of my "implementation" because both were self- contained. Waldau says >I generally believe that quantifiers are easier to understand >than >recursion (3) I guess it's a matter of what you're used to. I find definitions like exclude P [] = []. exclude P [X|Xs] = exclude P Xs if P(X). exclude P [X|Xs] = [X|exclude P Xs] otherwise. remove_vowels = exclude vowel. ever so much clearer than Waldau's version with explicit quantifiers. {Note that if Waldau can assume subsequence/2, I can assume exclude, so this specification has *neither* recursion nor quantifiers.} I would point out that definitions like this are (equational) logic too. I'd also be happy with a specification like remove_vowels(With, Wout) <-> subsequence(With, Wout) & range(Wout) .intersect. vowels = {} Again, this specification has *neither* recursion nor quantifiers. The whole thing can be expressed in APL as (WITH .notelement VOWELS) .compress WITH which is very much in the 'exclude' spirit. I think it is a bad idea to have a lot of explicit recursion in a specification and an even worse one to have a lot of explicit quantifiers. A specification should build up a high level vocabulary which permits the *succint* statement of the interesting bits. For an example of this, consider the Soundex algorithm again. It's about sequences. A language with a LOT of predefined vocabulary for sequences is APL. Modulo the fact that I haven't touched APL in about 5 years, here's an APL definition of Soundex. (I know that characters aren't integers in APL; I also know what to do about that.) $ K <- code # nullary function [1] K <- 256 .rho 0 # 256s 0s [2] K["aehiouwy"] <- 1 ... [7] K["r"] <- 6 $ $ S <- soundex X ;K # K is a local [1] K <- code[X] # K[i] = code[X[i]] [2] K <- ((1 .drop K) = -1 .drop K) .compress 1 .drop K # no dups [3] K <- (K = 0) .compress K # discard vowels [4] S <- X[1], 3 .take K, 0 0 0 # 1st letter, 3 0-padded digits $ That's pretty concise, and it manages without any form of iteration, whether indexed loops, recursion, or quantifiers, because it starts with a large given sequence-manipulation vocabulary. I have enough sequence-handling "predicate schemas" kicking around in my libraries that I can do this kind of thing in Prolog as soundex([X|Xs], [X,D1,D2,D3]) :- maplist(code, [X|Xs], [K|K1]), %\ these three schema exclude(=:=, K1, [K|K1], K2), %| instances combine into exclude(=:=(0), K2, K3), %/ a single induction on K1 append(K3, [0,0,0], [D1,D2,D3|_]). which is a deliberate parallel to the APL version. Just for interest, let's see what happens when we attack this with a few simple program transformations. a. Unfold the call to maplist. It is replaced by code(X, K), maplist(code, Xs, K1) b. Introduce an auxiliary predicate soundex(K1, K, Xs, K3) :- maplist(code, Xs, K1), exclude(=:=, K1, [K|K1], K2), exclude(=:=(0), K2, K3). c. Unfold maplist again, getting two cases: K1 = [] and K1 = [C|Cs]. soundex([], K, [], K3) :- exclude(=:=, [], [K], K2), exclude(=:=(0), K2, K3). soundex([C|Cs], K, [X|Xs], K3) :- code(X, C), maplist(code, Xs, Cs), exclude(=:=, [C|Cs], [K,C|Cs], K2), exclude(=:=(0), K2, K3). d. Unfold exclude/4 and exclude/3 in the first clause, getting soundex([], _, [], []). e. Unfold exclude/4 in the second clause, getting two new clauses soundex([C|Cs], K, [X|Xs], K3) :- code(X, C), C =:= K, maplist(code, Xs, Cs), exclude(=:=, Cs, [C|Cs], K2), exclude(=:=(0), K2, K3). soundex([C|Cs], K, [X|Xs], K3) :- code(X, C), C =\= K, maplist(code, Xs, Cs), exclude(=:=, Cs, [C|Cs], K2), exclude(=:=(0), [C|K2], K3). f. Fold the original definition of soundex/4 into the first clause, getting soundex([C|Cs], K, [X|Xs], K3) :- code(X, C), C =:= K, soundex(Cs, C, Xs, K3). g. Unfold exclude/3 in the last clause, replacing it by two new clauses: soundex([C|Cs], K, [X|Xs], K3) :- code(X, C), C =\= K, maplist(code, Xs, Cs), exclude(=:=, Cs, [C|Cs], K2), 0 =:= C, exclude(=:=(0), K2, K3). soundex([C|Cs], K, [X|Xs], [C|K3]) :- code(X, C), C =\= K, maplist(code, Xs, Cs), exclude(=:=, Cs, [C|Cs], K2), 0 =\= C, exclude(=:=(0), K2, K3). h. Fold the original definition of soundex/4 into these two clauses. The complete definition of soundex/4 now looks like soundex([], _, [], []). soundex([C|Cs], K, [X|Xs], K3) :- code(X, C), C =:= K, soundex(Cs, C, Xs, K3). soundex([C|Cs], K, [X|Xs], K3) :- code(X, C), C =\= K, 0 =:= C, soundex(Cs, C, Xs, K3). soundex([C|Cs], K, [X|Xs], [C|K3]) :- code(X, C), C =\= K, 0 =\= C, soundex(Cs, C, Xs, K3). Going from this to the version I offered a while back with if-then-elses (C =:= K -> .. ; C =:= 0 -> .. ; ..) is easy. This is such a *mechanical* derivation from the original specification that one normally just writes it straight down. That's one of the reasons why I stick with languages like ML and Prolog rather than going for full 1st-order predicate calculus with equality: with the aid of a few simple ideas I can do derivations like this and *easily* produce efficient code at the other end.
goldfain@osiris.cso.uiuc.edu (09/05/88)
In comp.lang.prolog, Sep 2, 1988 by waldau@kuling.UUCP writes:
(incomplete excerpts below, bracketted with "[" symbols -MSG)
[ An example: we want to define the relation that removes the vowels
[ from a sequence of letters. In Prolog we would have to write a
[ recursive program like
[
[ remove_vowels([], []).
[ remove_vowels([Vowel|With], Without) :-
[ vowel(Vowel),
[ remove_vowels(With, Without).
[ remove_vovels([Not_a_vowel|With], [Not_a_vowel|Without]) :-
[ not_a_vowel(Not_a_vowel),
[ remove_vowels(With, Without).
[
[ If we had full first order logic we could instead write
[
[ remove_vowels(With, Without) <->
[ subsequence(Without, With) &
[ (Elements){Element in Without -> ~vowel(Element)}
[
[ I generally believe that quantifiers are easier to understand than
[ recursion, but if we have recursive datastructures we still have to use
[ recursion when defining primitive relations like in and subsequence.
[ Mattias Waldau
(end of excerpts)
Perhaps they are NOT that much easier to work with! For example, the prolog
program looks like it correctly fulfills the problem specification, whereas
the richer-logic version allows the following solutions, which were evidently
not the author's intention:
?- remove_vowels([t,o,o,m,a,n,y], Result)
Result = [t,m,n] (this one is fine)
Result = [m,n]
Result = [t,n]
Result = [t,m]
Result = [t]
Result = [m]
Result = [n]
Result = []
This is partly fixed if we change the "->" to "<->" in the proposed clause:
remove_vowels(With, Without) <->
subsequence(Without, With) &
(Elements){Element in Without <-> ~vowel(Element)}
but this still allows erroneous results for remove_vowels([x,x,x], Result)
where a non-vowel has multiple occurrences. (This latter conclusion depends
on the precise meaning of the construct:
(Elements){ expression }
which is not clearly specified in the note.)
On the other hand, I DO support the idea of working with a richer set of
connectives than provided in basic Prolog systems. I would like them for
convenience.
- Mark Goldfain
waldau@kuling.UUCP (Mattias Waldau) (09/06/88)
Mark Goldfain found a serious bug in the last full first order version of remove_vowels, and I see no easy fix right now. So the original version is the only correct one. It just a set of rewrite rules, and is probably more efficient in an equational language. The soundex-algorithm is not easy specifiable, since it is an algorithm where we transform A to B, B to C, and the relationship between A and C is not easy to specify. (What does remove adjancent identical codes mean after we have removed the vowels?) Therefor I tried, but I didn't get a clearer description than the corresponding conventional well written program. -- Mattias Waldau
ok@quintus.uucp (Richard A. O'Keefe) (09/07/88)
In article <822@kuling.UUCP> waldau@kuling.UUCP (Mattias Waldau) writes: >The soundex-algorithm is not easy specifiable, since it is an >algorithm where we transform A to B, B to C, and the relationship >between A and C is not easy to specify. (a) If we transform A to B then B to C, it is easy to specify this provided the A->B and B->C mappings are separately easy to specify. Why should we care about A->C? Presumably if that was easy to describe, we'd describe it directly and forget about B entirely. (b) The Soundex function is not hard to specify at all, *if* you use an appropriate *vocabulary*. I showed in an earlier message how, if you already have a suitable kit of schemas, Soundex can be specified in Prolog with one clause having four body goals. That particular kit wouldn't be much good for handling problems involving sets, whereas the methods that Waldau used would work rather well in that case.