[comp.lang.prolog] Soundex revisited

ok@quintus (08/23/88)

In a recent article, Thom Fruehwirth suggested that it is a good idea
to distinguish between Prolog as a specification language and Prolog
as an implementation language.  Quite right.  When he suggests that
you should write a specification version and an implementation version
of every predicate, I think he goes rather too far.  In particular, it
is not necessary for the implementation of Soundex to be quite as ugly
as his implementation version.  Proof by illustration:

/*  This description of the Soundex algorithm is taken from
    Knuth, "The Art of Computer Programming", Vol 3 "Sorting and Searching",
    page 392 of the first edition.

    1.  Retain the first letter of the name,
	and drop all occurrences of a, e, h, i, o, u, w, y in other positions.

    2.	Assign the following numbers to the remaining letters after the first:
	b, f, p, v -> 1			l -> 4
	c, g, j, k, q, s, x, z -> 2	m, n -> 5
	d, t -> 3			r -> 6

    3.	If two or more letters with the same code were adjacent in the
	original name (before step 1), omit all but the first.

    4.	Convert to the form "letter, digit, digit, digit" by adding
	trailing zeros (if there are fewer than three digits) or by
	dropping rightmost digits (if there are more than three).

    Note that this specification refers only to lower-case letters, so
    that is what the Prolog code below implements.  To handle upper-
    case letters as well, see the comment in chars_to_codes/2.
*/

soundex_atoms(Name, Soundex) :-
	name(Name, Chars),
	soundex_chars(Chars, Code),
	name(Soundex, Code).

soundex_chars([Char|Chars], [Char|Digits]) :-
	chars_to_codes([Char|Chars], [Code|Codes]),
	omit_duplicates_and_vowels(Codes, Code, CodesWithoutDuplicates),
	three_digits(CodesWithoutDuplicates, Digits).

/*  To handle both cases of letters, do
	to_lower(Char, Lower),
	char_to_code(Lower, Code)
    if you have to_lower/2, or do
	Lower is Char \/ 8'040,
	char_to_code(Lower, Code)
    if the input uses ISO 8859/1 and you haven't got to_lower/2.
*/
chars_to_codes([], []).
chars_to_codes([Char|Chars], [Code|Codes]) :-
	char_to_code(Char, Code),
	chars_to_codes(Chars, Codes).

char_to_code(0'a, 0).
char_to_code(0'e, 0).
char_to_code(0'h, 0).
char_to_code(0'i, 0).
char_to_code(0'o, 0).
char_to_code(0'u, 0).
char_to_code(0'w, 0).
char_to_code(0'y, 0).
char_to_code(0'b, 0'1).
char_to_code(0'f, 0'1).
char_to_code(0'p, 0'1).
char_to_code(0'v, 0'1).
char_to_code(0'c, 0'2).
char_to_code(0'g, 0'2).
char_to_code(0'j, 0'2).
char_to_code(0'k, 0'2).
char_to_code(0'q, 0'2).
char_to_code(0's, 0'2).
char_to_code(0'x, 0'2).
char_to_code(0'z, 0'2).
char_to_code(0'd, 0'3).
char_to_code(0't, 0'3).
char_to_code(0'l, 0'4).
char_to_code(0'm, 0'5).
char_to_code(0'n, 0'5).
char_to_code(0'r, 0'6).

%   Note that omit_duplicates_and_vowels/3 uses if->then;elses, but that
%   the "then" arrows could be replaced by conjunction without harming
%   the logic.

omit_duplicates_and_vowels([], _, []).
omit_duplicates_and_vowels([Code|Codes], Prev, Without) :-
	(   Code =:= Prev -> Without = More	% omit a duplicate
	;   Code =:= 0    -> Without = More	% omit a vowel
	;   Code =\= Prev, Code =\= 0 -> Without = [Code|More]
	),
	omit_duplicates_and_vowels(Codes, Code, More).


%   The cuts in three_digits/2 are all "green" ones.  That is, they
%   prune away (failed) proofs, not solutions.  Deeper indexing would
%   obviate the need for such cuts.

three_digits([],		"000").
three_digits([D1],	    [D1|"00"]) :- !.
three_digits([D1,D2],	  [D1,D2|"0"]) :- !.
three_digits([D1,D2,D3|_], [D1,D2,D3]).


/*  Test cases (from Knuth):
	euler, ellery		-> e460
	gauss, ghosh		-> g200
	hilbert, heilbronn	-> h416
	knuth, kant		-> k530
	lloyd, ladd		-> l300
	lukasiewicz, lissajous	-> l222
*/

thom@tuhold (Thom Fruehwirth) (08/26/88)

Commenting on my recent article, Richard O'Keefe <305@quintus.UUCP>
tried to proof by illustration that my implementation version of
soundex is too ugly.

Richard O'Keefes implementation is based on a different
specification, which is easier to implement. Second, his version
is only half-way between my specification and implementation version
To yield the same efficiency, a joining of his three predicates
called in soundex_chars/2 into one is necessary. Note that the
resulting version is arbitrarily faster than the current version
depending on the length of the word (how about 'CodesWithoutDuplicates')
one codes. The reason for this is that my  version
never codes more than three digits, while O'Keefes version codes
the whole word.

ok@quintus.uucp (Richard A. O'Keefe) (08/30/88)

In article <1178@tuhold> thom@tuhold (Thom Fruehwirth) writes:
>Richard O'Keefe's implementation is based on a different
>specification, which is easier to implement.

My program was based on the specification of the Soundex function
that Knuth provides.  Anything which computes a different function
is not Soundex.

>Note that the resulting version is arbitrarily faster than the current
version depending on the length of the word one codes.  The reason for
>for this is that my version never codes more than three digits, while
>O'Keefe's version codes the whole word.

Since the Soundex algorithm is primarily intended for encoding surnames,
this is not much of a problem.  Since the specification of the Soundex
function requires that duplicate characters be discarded, it is possible
to construct strings of arbitrary length _all_ of which must be inspected
even if only 3 codes are produced.  (Stick "sss.......sss" in front of any
word you like.)

However, Fruehwirth is quite right that I didn't take my "implementation"
version as far as I could have.  Here's my "optimised" version: the
soundex_atoms/2 predicate and char_to_code/2 table didn't change at all.  

soundex_chars([Char|Chars], [Char|Codes]) :-
	char_to_code(Char, Code),
	soundex_chars(Chars, Code, 3, Codes).

%   soundex_chars(+Letters, +PreviousCode, +DigitsWanted, -Digits)

soundex_chars([], _, N, Zeros) :-
	zeros(N, Zeros).
soundex_chars([Char|Chars], Prev, N, Codes) :-
	char_to_code(Char, Code),
	(   Code =:= Prev ->		% Discard duplicate characters
	    soundex_chars(Chars, Code, N, Codes)
	;   Code =:= 0 ->		% Ignore vowels
	    soundex_chars(Chars, Code, N, Codes)
	;   N =:= 1 ->			% Stop when 3 digits have been done
	    Codes = [Code]
	;   M is N-1,			% otherwise, convert 1 more character
	    Codes = [Code|Codes1],
	    soundex_chars(Chars, Code, M, Codes1)
	).

zeros(1, "0").
zeros(2, "00").
zeros(3, "000").

Ironically, this has even fewer cuts than the version I started with!
It provides even less support for the idea that ugliness is warranted
for efficiency's sake.

Testing the two versions on a random sample of 20 names drawn from
/usr/dict/words, the improved version was a little under twice as
fast as the version I first thought of.

waldau@kuling.UUCP (Mattias Waldau) (08/30/88)

Thom Fruewirth and R.O.Keefe have both specified following soundex
algorithm with Horn clauses. Following the tradition of Keith CLark,
Sten-Ake Tarnlund, Zohar Manna, Richard Waldinger, and Alan Bundy (see
his last lecture at the logic conference in Seattle) I have tried to
specify the same algorithm in first order predicate logic with
equality. 

The idea is that that specification should be easier to understand,
and therefor more likely to be correct. The question to the newsgroup
is whether this is true or not.


/*  This description of the Soundex algorithm is taken from
    Knuth, "The Art of Computer Programming", Vol 3 "Sorting and Searching",
    page 392 of the first edition.

    1.  Retain the first letter of the name,
	and drop all occurrences of a, e, h, i, o, u, w, y in other positions.

    2.	Assign the following numbers to the remaining letters after the first:
	b, f, p, v -> 1			l -> 4
	c, g, j, k, q, s, x, z -> 2	m, n -> 5
	d, t -> 3			r -> 6

    3.	If two or more letters with the same code were adjacent in the
	original name (before step 1), omit all but the first.

    4.	Convert to the form "letter, digit, digit, digit" by adding
	trailing zeros (if there are fewer than three digits) or by
	dropping rightmost digits (if there are more than three).


*/


Syntax:
=======


I use ~, (x), (e x), &, |, ->, <-, <-> for not, forall, exists, and,
or, onlyif, if, and iff respectively. Not, exists, forall binds
titest, then and, then or, then onlyif and if, and at last iff. So
a<->b|c&(e x)d&e->f is the same as a<->((b|(c&((e x)d)&e))->f).

Variable names start with a capital letter, constants, functions and
predicates are written with lower-case only. Free variables are
quantified with forall.

I use standard Prolog list syntax. * is the append function. *, eq,
teq, eq2, teq2, in, =, and < are written as infix operators.



Specification:
==============


% soundex(Sequence_of_chars, Soundex_code), Soundex_code is the
% soundex code of the Sequence_of_chars

soundex([Letter|Rest], [Letter|Digits])) <->
	(e T)(e V)
	  {without_adjacent_identical_codes([Letter|Rest], [Letter|T]) &
	   without_vowels(T, V) &
	   three_first_digits(V, Digits)}


% without_identical_adjancent_codes(With, Without), Without is the
% shortest sequence without identical adjancent codes that is identical
% to With.

without_adjancent_identical_codes(With, Without) <->
	With teq Without &
	~(e Shorter){With teq Shorter & Shorter<Without}

 
% A eq B, B is the same sequence as A except that one letter with the
% same code as the preceeding one is removed. 
%
% A teq B, teq is the transitive closure of eq.


With eq Without <-> (e Before)(e After)(e Dup1)(e Dup2)
	     {With=Before*[Dup1, Dup2]*After &
	      Without=Before*[Dup1]*After &
	      code(Dup1)=code(Dup2)}

A teq B <-> A=B | (e C){A eq C & C teq B}


% A<B, the length of A is shorter than the length of B.

A<B <-> (e Before)(e After)(e Element)
	  {B=Before*[Element]*After &
	   (A=Before*After | A<Before*After)}


% without_vowels(With, Without), Without is all the non-vowel letters
% of With in the same order. 

without_vowels(With, Without) <->
	With teq2 Without &
	(Element){Element in Without -> ~vowel(Element)}


% A eq2 B, B is the same sequence as A, but with one vowel removed. 
% A teq2 B, teq2 is the transitive closure of eq2.

A eq2 B <-> (e Before)(e Vowel)(e After)
	      {A=Before*[Vovel]*After &
	       B=Before*After &
	       vowel(Vowel)}

A teq2 B <-> A=B | (e C){A eq2 C & C teq2 B}


% three_digits(V, Digits), V is a sequence of letters, Digits is a
% sequence of the codes of the three first letters. If there are fewer
% than three letters, add trailing zeroes to Digits.


three_digits(V, Digits) <->
	V=[] & Digits=[0, 0, 0] |
	V=[Char1] & Digits=[code(char1), 0, 0] |
	V=[Char1, Char2] & Digits=[code(Char1), code(Char2), 0] |
	(e Garbage){V=[Char1, Char2, Char3|Garbage] & 
	              Digits=[code(Char1), code(Char2), code(Char3)]}





% vowel(V), V is a vowel (almost, h and w isn't)

vowel(V) <-> {V=a | V=e | V=h | V=i | V=o | V=u | V=w | V=y }


% code(Letter)=Code, Letter has the code Code.

code(L)=C <-> (L=b | L=f | L=p | L=v -> C=1) &
	      (L=c | L=g | L=j | L=k | L=q | L=s | L=x | L=z -> C=2) &
	      (L=d | L=t -> C=3) &
	      (L=l -> C=4) &
	      (L=m | L=n -> C=5) &
	      (L+r -> C=6)


% A*B=C, C is the concatenation of the sequences A and B.

A*B=C <-> A=[] & B=C | (e H)(e T){A=[H|T] & C=[H|T*B]}


% E in S, E is a element of the sequence S.

E in S <-> (e H)(e T){S=[H|T] & (E=H | E in T)}



/Mattias Waldau



-- 
Mattias Waldau                              mattias-waldau@aida.uu.se or
Computing Science Department                waldau@kuling.uucp
P.O. Box 520, S-751 20  Uppsala, Sweden     Phone: +46-18-181055

ok@quintus.uucp (Richard A. O'Keefe) (09/01/88)

In article <816@kuling.UUCP> waldau@kuling.UUCP (Mattias Waldau) writes:
>Thom Fruewirth and R.O'Keefe have both specified following Soundex
>algorithm with Horn clauses.  Following the tradition of Keith Clark,
>Sten-Ake Tarnlund, Zohar Manna, Richard Waldinger, and Alan Bundy (see
>his last lecture at the logic conference in Seattle) I have tried to
>specify the same algorithm in first order predicate logic with
>equality. 
>
>The idea is that that specification should be easier to understand,
>and therefore more likely to be correct.  The question to the newsgroup
>is whether this is true or not.

Excluding comments, blank lines, and the table which maps letters to
digits (and which are to be dropped), the last version I presented had
    24 lines,  82 "words", 3.4 "words"/line.
With the same exclusions, Waldau's version has
    32 lines, 201 "words", 6.3 "words"/line.
{I really *must* write an editor command to count tokens rather than "words".}

Other things being equal (which they aren't) I would be surprised if a
text containing two and a half times as many "words" were easier to
understand.  Note that the version I presented *is* first-order logic
except for the use of if->then;else, and eliminating that by adding an
appropriate number of inequalities adds a mere 6 "words" to the total.

Easily the shortest specification is the one which Knuth provides; as I
typed it in it took a mere 11 non-blank lines of English, and that
includes the tables.

It is interesting to note that Waldau's comment explaining "A<B
disagrees with the code:

>	% A<B, the length of A is shorter than the length of B.

>	A<B <-> (e Before)(e After)(e Element)
>	    {B=Before*Element*After & (A=Before*After | A<Before*After)}

>	A*B=C <-> A=[] & B=C | (e H)(e T){A=[H|T] & C=[H|T*B]}

The code defines A < B when A is a *subsequence* of B, so that although
[a] < [b,c] according to the comment, not so according to the code.  If
I've understood correctly, the rest of the program wants the definition
in the comment.

I offer for comparison three other definitions of the relation, using
"<<" rather than "<" because I want arithmetic comparison as well.
A version in a functional language like ML or Miranda:

    len([]) = 0.
    len([_|T]) = 1+len(T).

    X << Y = len(X) < len(Y).

A version in English:

    X << Y iff X and Y are lists and X has fewer elements than Y.

A version in Prolog:

    [] << [_|_].
    [_|X] << [_|Y] :- X << Y.

thom@tuhold (Thom Fruehwirth) (09/01/88)

In a recent article ok@quintus.uucp (Richard A. O'Keefe) writes:  
> Since the specification of the Soundex function requires that 
> duplicate characters be discarded, it is possible to construct
> strings of arbitrary length _all_ of which must be inspected
> even if only 3 codes are produced.  (Stick "sss.......sss" in
> front of any word you like.)
But what sense does this make ?

> Fruehwirth is quite right that I didn't take my "implementation"
> version as far as I could have.  Here's my "optimised" version: the
> soundex_atoms/2 predicate and char_to_code/2 table didn't change.

I pleased to see that Richard O'Keefes code follows my initial 
suggestions on how to transform soundex(-like) specifications.
Only one little transformation is still missing: That of transforming
zeros/2 away. It's a minor change, but it avoids going the roundabout
way of counting the number of character-codes produced so far:

%   soundex_chars(+Letters, +PreviousCode, +FillInCode, -Digits)
%   	in our case FillInCode = "000"

soundex_chars([], _, Zeros, Zeros).
soundex_chars([Char|Chars], Prev, Zeros, Codes) :-
   char_to_code(Char, Code),
   (   Code =:= Prev ->        % Discard duplicate characters
            soundex_chars(Chars, Code, Zeros, Codes)
   ;   Code =:= 0 ->           % Ignore vowels
            soundex_chars(Chars, Code, Zeros, Codes)
   ;   Zeros = [Zero] ->       % Stop when 3 digits have been done
            Codes = [Code]
   ;   Zeros = [Zero|Zeros1],  % otherwise, convert 1 more character
            Codes = [Code|Codes1],
            soundex_chars(Chars, Code, Zeros1, Codes1)
   ).

Isn't it more beautiful this way ?

thom fruehwirth

PS: Effiency is about the same, results depend on the test cases used.

ok@quintus.uucp (Richard A. O'Keefe) (09/02/88)

In article <1190@tuhold> thom@tuhold (Thom Fruehwirth) writes:
>In a recent article ok@quintus.uucp (Richard A. O'Keefe) writes:  
>> Since the specification of the Soundex function requires that 
>> duplicate characters be discarded, it is possible to construct
>> strings of arbitrary length _all_ of which must be inspected
>> even if only 3 codes are produced.  (Stick "sss.......sss" in
>> front of any word you like.)
>But what sense does this make ?

The point is that for any N I can construct a list of length N
which any implementation of the Soundex function must examine
in its entirety: stopping after generating 4 elements of *output*
doesn't mean that you can avoid looking at all of the *input*.
Stopping early doesn't improve the worst case at all.

>I pleased to see that Richard O'Keefe's code follows my initial 
>suggestions on how to transform Soundex(-like) specifications.

Well, no.  My code doesn't have all those cuts in it...

>Only one little transformation is still missing: That of transforming
>zeros/2 away. It's a minor change, but it avoids going the roundabout
>way of counting the number of character-codes produced so far:

It's a very pretty method, which I have often used.  But it doesn't
avoid counting.  The technique is to exploit the epimorphism from
lists to natural numbers (length/2), using
	[]		-> 0
	[_|X]		-> X'+1
Minor correction: my code didn't count the number of character codes
produced so far, but the number *NOT* produced (as does Fruewirth's
latest version).

The point of my postings about Soundex is that I deny the strong
distinction Fruewirth draws bewteen "specification" and "implementation".
In particular, I was challenging the idea implicit in his original
pair of versions that the "implementation" version was licensed to be
ugly.  I think the series of postings so far has demonstrated quite
clearly that "specification" and "implementation" are *directions*
rather than *places*.

We're still left with the rather unfortunate point that the clearest
specification for Soundex that we've seen so far is Knuth's English text.

I was going to include the cleanest functional specification I could
come up with, but excluding the code table it is 14 lines, 84 "words",
so isn't really an improvement on the Prolog "implementation".

If this specification/implementation topic is of interest, perhaps we
should find another example.

waldau@kuling.UUCP (Mattias Waldau) (09/02/88)

R.O'Keefe points out that my specification of < is wrong, thank you.
He also says that his program should be easier to understand since it
is shorter, but I say that this only holds under the assumption that
you have to read the whole program to understand what it describes.

An example: we want to define the relation that removes the vowels
from a sequence of letters. In Prolog we would have to write a
recursive program like

remove_vowels([], []).
remove_vowels([Vowel|With], Without) :-
	vowel(Vowel), 
	remove_vowels(With, Without).
remove_vovels([Not_a_vowel|With], [Not_a_vowel|Without]) :-
	not_a_vowel(Not_a_vowel),
	remove_vowels(With, Without).

If we have an explicit definition of not_a_vowel this program can also
be used backwards.


If we had full first order logic we could instead write

remove_vowels(With, Without) <->
	subsequence(Without, With) &
	(Elements){Element in Without -> ~vowel(Element)}

saying that Without must be a subsequence of With and all elements of
Without are nonvowels. It is assumed that the predicates subsequence
and in are already understood by the programmer, so the number of NEW
symbols that he has to write/understand are fewer then in the Prolog
example above.

I generally believe that quantifiers are easier to understand than
recursion, but if we have recursive datastructures we still have to use
recursion when defining primitive relations like in and subsequence.
-- 
Mattias Waldau

ok@quintus.uucp (Richard A. O'Keefe) (09/03/88)

In article <818@kuling.UUCP> waldau@kuling.UUCP (Mattias Waldau) writes:
>He also says that his program should be easier to understand since it
>is shorter, but I say that this only holds under the assumption that
>you have to read the whole program to understand what it describes.
>
>An example: we want to define the relation that removes the vowels
>from a sequence of letters.
[He provides a Prolog version and the following 1st-order version:]
>remove_vowels(With, Without) <->
>	subsequence(Without, With) &
>	(Elements){Element in Without -> ~vowel(Element)}

(1) The two definitions do not say the same thing.  Consider the simpler
    case of vowel(0), nonvowel(1).  With that definition, the Prolog
    version would reject remove_vowels([0,*], [*]), but the version
    quoted above would accept it.

(2) By assuming that "the predicate... subseqence [is] already understood
    by the programmer", Waldau builds in the conclusion he wants.  I would
    derive the Prolog version from the generic predicate exclude/3:

    remove_vowels(With, Without) :- exclude(vowel, With, Without).

    I believe that by distinguishing between "specific" predicates and
    "generic" predicates (predicate schemes) it is possible to produce
    a Prolog-like language with second-order convenience but provably
    first-order power, but I haven't proved it yet.

    I think it is fair to compare the full text of Waldau's specification
    with the full text of my "implementation" because both were self-
    contained.

Waldau says
>I generally believe that quantifiers are easier to understand
>than >recursion

(3) I guess it's a matter of what you're used to.  I find definitions like

	exclude P [] = [].
	exclude P [X|Xs] = exclude P Xs     if P(X).
	exclude P [X|Xs] = [X|exclude P Xs] otherwise.

	remove_vowels = exclude vowel.

    ever so much clearer than Waldau's version with explicit quantifiers.
    {Note that if Waldau can assume subsequence/2, I can assume exclude,
    so this specification has *neither* recursion nor quantifiers.}
    I would point out that definitions like this are (equational) logic
    too.  I'd also be happy with a specification like

	remove_vowels(With, Wout) <->
	    subsequence(With, Wout) &
	    range(Wout) .intersect. vowels = {}

    Again, this specification has *neither* recursion nor quantifiers.
    The whole thing can be expressed in APL as
	(WITH .notelement VOWELS) .compress WITH
    which is very much in the 'exclude' spirit.

    I think it is a bad idea to have a lot of explicit recursion in a
    specification and an even worse one to have a lot of explicit
    quantifiers.  A specification should build up a high level vocabulary
    which permits the *succint* statement of the interesting bits.

    For an example of this, consider the Soundex algorithm again.
    It's about sequences.  A language with a LOT of predefined vocabulary
    for sequences is APL.  Modulo the fact that I haven't touched APL in
    about 5 years, here's an APL definition of Soundex.  (I know that
    characters aren't integers in APL; I also know what to do about that.)

    $ K <- code				# nullary function
    [1] K <- 256 .rho 0			# 256s 0s
    [2] K["aehiouwy"] <- 1
        ...
    [7] K["r"]        <- 6
    $

    $ S <- soundex X	;K		# K is a local
    [1] K <- code[X]			# K[i] = code[X[i]]
    [2] K <- ((1 .drop K) = -1 .drop K) .compress 1 .drop K # no dups
    [3] K <- (K = 0) .compress K	# discard vowels
    [4] S <- X[1], 3 .take K, 0 0 0	# 1st letter, 3 0-padded digits
    $

    That's pretty concise, and it manages without any form of iteration,
    whether indexed loops, recursion, or quantifiers, because it starts
    with a large given sequence-manipulation vocabulary.

    I have enough sequence-handling "predicate schemas" kicking around
    in my libraries that I can do this kind of thing in Prolog as

	soundex([X|Xs], [X,D1,D2,D3]) :-
	    maplist(code, [X|Xs], [K|K1]),  %\ these three schema
	    exclude(=:=, K1, [K|K1], K2),   %| instances combine into
	    exclude(=:=(0), K2, K3),	    %/ a single induction on K1
	    append(K3, [0,0,0], [D1,D2,D3|_]).

    which is a deliberate parallel to the APL version.

Just for interest, let's see what happens when we attack this with a few
simple program transformations.

a. Unfold the call to maplist.  It is replaced by

	code(X, K),
	maplist(code, Xs, K1)

b. Introduce an auxiliary predicate

    soundex(K1, K, Xs, K3) :-
	maplist(code, Xs, K1),
	exclude(=:=, K1, [K|K1], K2),
	exclude(=:=(0), K2, K3).

c. Unfold maplist again, getting two cases: K1 = [] and K1 = [C|Cs].

    soundex([], K, [], K3) :-
	exclude(=:=, [], [K], K2),
	exclude(=:=(0), K2, K3).
    soundex([C|Cs], K, [X|Xs], K3) :-
	code(X, C),
	maplist(code, Xs, Cs),
	exclude(=:=, [C|Cs], [K,C|Cs], K2),
	exclude(=:=(0), K2, K3).

d. Unfold exclude/4 and exclude/3 in the first clause, getting

    soundex([], _, [], []).

e. Unfold exclude/4 in the second clause, getting two new clauses

    soundex([C|Cs], K, [X|Xs], K3) :-
	code(X, C),
	C =:= K,
	maplist(code, Xs, Cs),
	exclude(=:=, Cs, [C|Cs], K2),
	exclude(=:=(0), K2, K3).
    soundex([C|Cs], K, [X|Xs], K3) :-
	code(X, C),
	C =\= K,
	maplist(code, Xs, Cs),
	exclude(=:=, Cs, [C|Cs], K2),
	exclude(=:=(0), [C|K2], K3).

f. Fold the original definition of soundex/4 into the first clause,
   getting

    soundex([C|Cs], K, [X|Xs], K3) :-
	code(X, C),
	C =:= K,
	soundex(Cs, C, Xs, K3).

g. Unfold exclude/3 in the last clause, replacing it by two new clauses:

    soundex([C|Cs], K, [X|Xs], K3) :-
	code(X, C),
	C =\= K,
	maplist(code, Xs, Cs),
	exclude(=:=, Cs, [C|Cs], K2),
	0 =:= C,
	exclude(=:=(0), K2, K3).
    soundex([C|Cs], K, [X|Xs], [C|K3]) :-
	code(X, C),
	C =\= K,
	maplist(code, Xs, Cs),
	exclude(=:=, Cs, [C|Cs], K2),
	0 =\= C,
	exclude(=:=(0), K2, K3).

h. Fold the original definition of soundex/4 into these two clauses.
   The complete definition of soundex/4 now looks like

    soundex([], _, [], []).
    soundex([C|Cs], K, [X|Xs], K3) :-
	code(X, C),
	C =:= K,
	soundex(Cs, C, Xs, K3).
    soundex([C|Cs], K, [X|Xs], K3) :-
	code(X, C),
	C =\= K,
	0 =:= C,
	soundex(Cs, C, Xs, K3).
    soundex([C|Cs], K, [X|Xs], [C|K3]) :-
	code(X, C),
	C =\= K,
	0 =\= C,
	soundex(Cs, C, Xs, K3).

    Going from this to the version I offered a while back with
    if-then-elses (C =:= K -> .. ; C =:= 0 -> .. ; ..) is easy.

This is such a *mechanical* derivation from the original specification
that one normally just writes it straight down.  That's one of the
reasons why I stick with languages like ML and Prolog rather than going
for full 1st-order predicate calculus with equality:  with the aid of a
few simple ideas I can do derivations like this and *easily* produce
efficient code at the other end.

goldfain@osiris.cso.uiuc.edu (09/05/88)

In comp.lang.prolog, Sep  2, 1988 by waldau@kuling.UUCP writes:
(incomplete excerpts below, bracketted with "[" symbols  -MSG)
[ An example: we want to define the relation that removes the vowels
[ from a sequence of letters. In Prolog we would have to write a
[ recursive program like
[ 
[ remove_vowels([], []).
[ remove_vowels([Vowel|With], Without) :-
[ 	vowel(Vowel), 
[ 	remove_vowels(With, Without).
[ remove_vovels([Not_a_vowel|With], [Not_a_vowel|Without]) :-
[ 	not_a_vowel(Not_a_vowel),
[ 	remove_vowels(With, Without).
[ 
[ If we had full first order logic we could instead write
[ 
[ remove_vowels(With, Without) <->
[ 	subsequence(Without, With) &
[ 	(Elements){Element in Without -> ~vowel(Element)}
[ 
[ I generally believe that quantifiers are easier to understand than
[ recursion, but if we have recursive datastructures we still have to use
[ recursion when defining primitive relations like in and subsequence.
[ Mattias Waldau
(end of excerpts)

Perhaps they are NOT that much easier to  work with!  For  example, the prolog
program looks like  it correctly fulfills  the problem  specification, whereas
the richer-logic version allows the  following solutions, which were evidently
not the author's intention:

?- remove_vowels([t,o,o,m,a,n,y], Result)

Result = [t,m,n]           (this one is fine)
Result = [m,n]
Result = [t,n]
Result = [t,m]
Result = [t]
Result = [m]
Result = [n]
Result = []

This is partly fixed if we change the "->" to "<->" in the proposed clause:

    remove_vowels(With, Without) <->
        subsequence(Without, With) &
        (Elements){Element in Without <-> ~vowel(Element)}

but this still allows erroneous results for remove_vowels([x,x,x], Result)
where a non-vowel has multiple occurrences.  (This latter conclusion depends
on the precise meaning of the construct:
     (Elements){ expression }
which is not clearly specified in the note.)

On the other hand, I DO support the idea of working with a richer set of
connectives than provided in basic Prolog systems.  I would like them for
convenience.
                                                      - Mark Goldfain

waldau@kuling.UUCP (Mattias Waldau) (09/06/88)

Mark Goldfain found a serious bug in the last full first order version
of remove_vowels, and I see no easy fix right now. So the original
version is the only correct one. It just a set of rewrite rules, and
is probably more efficient in an equational language.
 

The soundex-algorithm is not easy specifiable, since it is an
algorithm where we transform A to B, B to C, and the relationship
between A and C is not easy to specify. (What does remove adjancent
identical codes mean after we have removed the vowels?) Therefor I
tried, but I didn't get a clearer description than the corresponding
conventional well written program.

-- 
Mattias Waldau

ok@quintus.uucp (Richard A. O'Keefe) (09/07/88)

In article <822@kuling.UUCP> waldau@kuling.UUCP (Mattias Waldau) writes:
>The soundex-algorithm is not easy specifiable, since it is an
>algorithm where we transform A to B, B to C, and the relationship
>between A and C is not easy to specify.

(a) If we transform A to B then B to C, it is easy to specify this
    provided the A->B and B->C mappings are separately easy to specify.
    Why should we care about A->C?  Presumably if that was easy to
    describe, we'd describe it directly and forget about B entirely.

(b) The Soundex function is not hard to specify at all, *if* you use an
    appropriate *vocabulary*.  I showed in an earlier message how, if
    you already have a suitable kit of schemas, Soundex can be specified
    in Prolog with one clause having four body goals.  That particular
    kit wouldn't be much good for handling problems involving sets,
    whereas the methods that Waldau used would work rather well in that case.