[comp.lang.prolog] PROLOG Digest V5 #73

PROLOG-REQUEST@SUSHI.STANFORD.EDU (Chuck Restivo, The Moderator) (10/13/87)

PROLOG Digest           Wednesday, 14 Oct 1987     Volume 5 : Issue 73

Today's Topics:
                   Announcement - Call For Papers,
                     Implementation - Persimmons
----------------------------------------------------------------------

Date: Fri, 9 Oct 87 0:07:32 EDT
From: Ken Bowen <kabowen@logiclab.cis.syr.edu>
Subject: Digest Submission: Call for Papers

                           CALL FOR PAPERS
                  ASSOCIATION FOR LOGIC PROGRAMMING

           Fifth International Logic Programming Conference
                 Fifth Symposium on Logic Programming

        15-19 August 1988 -- Seattle,  Washington

    The Association for Logic Programming is sponsoring the first joint
meeting of the two principal Logic Programming conferences.  Previous meetings
of the International Conference have taken place in Marseilles, Stockholm,
London, and Melbourne.  Previous meetings of the Symposium have taken place
in Atlantic City, Boston, Salt Lake City, and San Francisco.

    Authors are invited to submit papers on all aspects of Logic Programming,
including, but not restricted to:

        +  Applications
        +  Role in Artificial Intelligence
        +  Deductive Databases
        +  Relations to other computational paradigms
        +  Language issues
        +  Methodology
        +  Implementations on sequential and parallel architectures
        +  Theory
Of special interest are applications that exploit the unique character of
logic programming.

PAPERS DUE:  1 February 1988.
============================
    Short papers are limited to five double-spaced 10-point pages;
    Long papers are limited to twelve double-spaced 10-point pages.

Notification of Acceptance:  9 May 1988.
Camera-Ready Copy Due:       10 June, 1988.

Send five (5) copies of submitted manuscripts to the Program Chairman:
        Robert A. Kowalski, LP88
        Department of Computing
        Imperial College
        180 Queen's Gate,
        London SW7 2BZ
        ENGLAND

For other information contact the Conference Chairman:
        Kenneth A. Bowen, LP88
        Logic Programming Research Group
        School of Computer and Information Science
        313 Link Hall
        Syracuse University
        Syracuse, NY 13210
        USA

    Authors are also invited to submit proposals for TUTORIALS, both advanced
and introductory.  Tutorial sessions may be half or full-day, and will be
conducted on a schedule throughout the week.  Tutorial proposals should
include (1) a statement of the proposed topic area, (2) an outline of the
tutorial, (3) relevant bibliography, and (4) proposer's curriculum vita.
Proposals for ADVANCED TUTORIALS are especially encouraged.

Send three (3) copies of tutorial proposals to the Tutorial Chairman:
        Veronica Dahl
        Department of Computing
        Simon Fraser University
        Burnaby, B.C.
        CANADA

-------------------

Date: Thu, 1 Oct 87 17:16:54 PDT
From: quintus!cresswell!ok@Sun.COM (Richard A. O'Keefe)
Subject: Persimmons

    Edouard Lagache has replied to the criticisms of his library.  I
think it is worth replying to his reply, because there are some
important issues involved.

    I was violently criticised here at Quintus for my first message
on the subject.  I should make it plain that my opinions are my own,
that I send messages reflecting my own opinions rather than any
official Quintus position (there isn't any such animal) or even any
general consensus among Quintus staff (people at Quintus had the
chance to read these messages only when they came back through the
Prolog Digest).  My message was denounced at Quintus as "appallingly
arrogant".  Three particular points that were raised were
 -  my characterisation of Lagache's library as "regrettable",
 -  my criticism of his sending an unstable version of an intrinsically
    inefficient sorting routine as "perverse", and
 -  my claim that "a competent Prolog programmer who is concerned
    with efficiency" would avoid univ (=..) except in dialects where
    it was superfluous (LM Prolog, for example).

    Well, yes, I should apologise for using intemperate
language, and I do.

    I should also make it clear that while I am strongly
critical of Lagache's LIBRARY, I am not at all critical of
LAGACHE.  Someone who goes to the trouble to write, DOCUMENT,
and distribute a library of any sort deserves praise.  What is
regrettable about Lagache's library is that the quality of his
code is in such contrast to the quality of his character.

    I should also confess that some of the errors in Lagache's
code are to be found in some old code of mine:  if you look at
the public-domain file FLAT.PL you will see that the "predicates"
it defines are no more steadfast than his.  (After explaining how
it should all work, I took a look at the old file to check that
it did work that way.  And of course it didn't.)  But if you
have been following the Prolog Digest since say 1983, you will
recall that my postings of code have tended to say "please tell
me about bugs", not "NO KNOWN BUGS".

    Having said that writing, documenting, and distributing a
library is a praiseworthy act, let me point out some of the
obligations that it carries with it.

    Lagache says:
        "I am not a PROLOG guru, and never claimed that I was."
I'm afraid this simply is not true.  To shower your code on people
who never asked for it is implicitly to say "you OUGHT to be using
this code, or at the very least you OUGHT to read and consider it
before rejecting it".  To publish a book on a subject is to make
an implicit claim to your readers that you are an expert on the
subject whose opinions are worth reading.  To publish source code
for a programming library is to implicitly tell your readers that
this is good code that they ought to use and/or IMITATE.  If you
are not an expert, by what right do you thrust your code on our
attention?

    These implicit claims can be over-ridden by explicit counter-
claims.  And, readers being what they are, the counter-claims
have to be repeated in every file.  (Did you ever take that
programming aptitude test which had you doing all sorts of absurd
things, and then the final instruction was "do none of the above"?)

    The reason that I reacted so immediately and with such vehemence
is that I was seriously worried that if immediate counter-measures
were not taken, innocent readers of the Digest would be deceived by
a general silence into imagining that Lagache's code was a model
worthy of imitation.  But it is not only bad, it is systematically bad.
Since I knew of Lagache's code, and understood in what ways it was
bad, and thought I had the ability to explain the ways in which it was
bad and how to do better, it seemed to me that I had a plain duty to
the Prolog community to do so.

    Let me make this clear:  I surmise that Lagache's basic problem
is that he has simply never been taught how to write Prolog.  There
are a lot of very bad books out there, nearly all of them, in fact.
To be fair to Lagache, some of the books he is likely to have had
access to present worse code than his.  Most people, trying to write
Prolog by "the light of nature" and with the aid of such books, do a
rather bad job of it.  I'll repeat a point I made in an earlier
message:  it looks to me as though Lagache has got his code right
with respect to every criterion he could think of, and I'm sure that
he carefully tested his code.  The trouble is that he didn't know
enough of the important criteria.

    Lagache speaks of "nit-pickers".  He presumably means Lee
Naish, Stan Shebs, and me.  He speaks of "nit-picking".  Well, I
for one was not nit-picking.  If it were only matters of style,
such as the use of [list|notation] or cons.notation or even if
it were only a matter of :- fail clauses, I would have remained
silent.  There is something more serious at issue:

        many of Lagache's predicates are just plain INCORRECT.

    The fundamental problem is that Lagache is writing Lisp in Prolog.
Many things that LOOK like stylistic issues are in fact symptoms of
the same problem.  Lagache thinks that when Lee Naish and I complain
about :- !, fail clauses, we are nit-picking.  Wrong.  Those clauses
are not neutral in their effect:  they actually introduce errors.
Lagache seems to think that

        merge([],[],[]).                % 'lagache'
        merge([],List2,List2):- !, fail.
        merge(List1,[],List1):- !, fail.
        merge([Head1|Tail1],[Head2|Tail2],[Head1,Head2|R]):-
                merge(Tail1,Tail2,R), !.

is easier to read than

       merge([], [], []).               % 'pure'
       merge(Head1.Tail1, Head2.Tail2, Head1.Head2.R) :-
                 merge(Tail1,Tail2,R).

He explicitly condemns the latter as "cryptic", and "terse",
and suggests that people who write "terse" code should "go back
and take a course in structured Pascal".

    Let's keep the falsely-so-called "algebraic" languages out of
the discussion.  What would this look like in a good modern
application language, of the type of ML?  (I can't remember the
actual ML syntax, so handwave a bit.)

        merge nil nil = nil.
        merge nil H2.T2 = failwith "wrong length".
        merge H1.T1 nil = failwith "wrong length".
        merge H1.T1 H2.T2 = H1.H2.(merge T1 T2).

As a matter of fact, this is ill-typed unless the two arguments
have the same type of elements.  Some compilers, having worked
out that the type is

        merge : list(Alpha) x list(Alpha) -> list(Alpha)

then check that each combination of the constructors has an
explicit equation.  Since list(Alpha) has constructors
        nil : -> list(Alpha)
        .   : Alpha x list(Alpha) -> list(Alpha)
there must be four cases, one for each pair of constructors.  This
leads to the code I wrote.  But ML has an exception handling facility,
triggered by the 'failwith' construct.

    Lacking a comparable generally accepted construct in Prolog,
let's define a fail_with/1 of our own.  If something is actually
an error, your code should not just quietly fail, but should
print an error message.  After that, all you can do is 'fail'
or 'abort'.  You should seriously consider 'abort'.

        fail_with(Message) :-
                format(user_error, '~N! Error: ~w~n', [Message]),
                fail.

A faithful translation of the ML style into Prolog would then be

        merge([], [], []).
        merge([], [_|_], _) :- fail_with('merge/3: wrong length').
        merge([_|_], [], _) :- fail_with('merge/3: wrong length').
        merge([H1|T1], [H2|T2], [H1,H2|T3]) :-
                merge(T1, T2, T3).

    PROVIDED that this was documented as ONLY being usable to
compute the third argument from the first two, this would be tolerable.
Mind you, I find it extremely odd that
        merge([], [a], X)
would be reported as an error, but that
        merge([], a, X)
would NOT be reported.  What is so special about wrong length that
it is the only error that deserves reporting?  Another problem is
that merge/3 is not the best name for this operation.  merge/3
is more commonly used for merging two ORDERED lists (ordered set
union, as it were).  Lagache's merge/3 is neither this nor the
stream merge used in committed choice languages.  What this predicate
actually does is to INTERLEAVE its first two arguments.  Why not call
it interleave/3?
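The rename costs nothing; restating the 'pure' clauses under the
suggested name (interleave/3 is the name proposed just above, not
an existing library predicate):

```prolog
%   interleave(L1, L2, Interleaved)
%   is true when L1 and L2 have the same length and Interleaved
%   alternates their elements, beginning with the head of L1.

interleave([], [], []).
interleave([H1|T1], [H2|T2], [H1,H2|T3]) :-
        interleave(T1, T2, T3).

%   ?- interleave([a,c], [b,d], X).     % X = [a,b,c,d]
%   ?- interleave(X, Y, [a,b,c,d]).     % X = [a,c], Y = [b,d]
```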

    Let's suppose that we don't really regard it as a serious error
to provide lists of the wrong length.  Let's suppose that we just
want to list all the combinations of constructors as a general
discipline, and want to make it explicit that certain combinations
are NOT in the relation.  Then the code to use is

        merge([], [], []).                      % 'defensible'
        merge([], [_|_], _) :- fail.
        merge([_|_], [], _) :- fail.
        merge([H1|T1], [H2|T2], [H1,H2|T3]) :-
                merge(T1, T2, T3).

I don't really like it, but it IS a defensible coding style.
Coding in this style can indeed help you eliminate "missing
clause" cases, and it is indeed arguable that it makes life
easier for the human reader.

    But it is NOT what Lagache wrote!  His code was

        merge([], [], []).                      % 'lagache'
        merge([], [_|_], _) :- !, fail.
        merge([_|_], [], _) :- !, fail.
        merge([H1|T1], [H2|T2], [H1,H2|T3]) :-
                merge(T1, T2, T3),
                !.

And this is NOT easier to read than the 'pure' version.
The reader first has to ask himself: "why is there a cut at the end
of the last clause?"  It turns out that there is no good reason,
but it takes a lot of effort to work that out.  The reader then
has to ask himself: "why are there cuts in the preceding two
clauses?"  And again, it turns out that there is no good reason.
In fact, those cuts stop you using this predicate to decompose
a given last argument, which is a perfectly sensible thing to want
to do.
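The point is easy to check at a terminal.  Suppose the 'pure'
clauses are loaded as pure_merge/3 and the version with cuts as
cut_merge/3 (both names invented here just to keep the two apart):

```prolog
%   ?- pure_merge(X, Y, [a,b,c,d]).
%   X = [a,c],
%   Y = [b,d]               % the last argument decomposes happily

%   ?- cut_merge(X, Y, [a,b,c,d]).
%   no                      % the second clause unifies X with [],
%                           % cuts, and fails before the recursive
%                           % clause is ever tried
```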

    So yes, ":- fail" clauses are always redundant, and that is all
they are, and Lagache could have dismissed complaints about them as
nit-picking.  But ":- !, fail" clauses are NOT redundant!  They
actually change what the code MEANS.  It can be quite hard to work
out what the effect is, and it seems clear that Lagache has not
done this, otherwise he wouldn't think that they are redundant.

    The only material difference between the 'defensible' version
(which is presumably what Lagache intended) and the 'pure' version
is the genuinely redundant :- fail clauses.  Can we find an objective
reason for preferring one or the other?  There is an initial
difficulty, because the two styles are trying to help the reader
answer two different questions:

(1) The 'defensible' version helps you answer the question:
        for which arguments is the query SENSIBLE?
(2) The 'pure' version helps you answer the question:
        for which arguments is the query TRUE?

    Speaking for myself, I always read code expecting to find out
the conditions under which the predicate is true (or, if it is a
command, the conditions under which the command will take effect).
So I am asking the second question.  When I want to know the answer
to the first question, I look at the comments.  I find it very
confusing to read my way through a clause, only to find at the
last minute "GOTCHA! DON'T use this clause!".  For merge/3, there
are only two ways that the predicate can be true.  Having four
clauses obscures this.

    But what about someone who is interested in the other question?
Well, is there any need to answer the question in the CODE?  Would
not the following text answer everyone's questions:

        %   merge(L1, L2, Merge)
        %   is true when all three arguments are lists, and
        %       L1 = [X1,...,Xn] for some n,
        %       L2 = [Y1,...,Yn] for the same n, and
        %       Merge = [X1,Y1,X2,Y2,...,Xn,Yn].
        %   This predicate can be used to solve for any two
        %   of the arguments given the other one, but at
        %   least one of the arguments should be proper.

        merge([], [], []).                      % 'defensible'
        merge([H1|T1], [H2|T2], [H1,H2|T3]) :-
                merge(T1, T2, T3).

    To get this thing in perspective, let's consider another
example.  One of my earlier messages included a predicate

        sort_aux([], [], [], [], []).
        sort_aux([K|Ks], [V|Vs], [K-V|Ps], [W|Ws], [_-W|Qs]) :-
                sort_aux(Ks, Vs, Ps, Ws, Qs).

This has five arguments.  2**5 = 32.  Would Lagache really want
me to write 32 clauses, only two of which provide useful information?
You might object that I was actually supplying only two inputs, so
I would only need four cases.  But there is NOTHING in this predicate
to say which arguments are inputs and which arguments are outputs.
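For what it is worth, here is one plausible way such an auxiliary
gets used; the wrapper and its name are inventions of this sketch,
and keysort/2 is the standard stable sort on Key-Value pairs.  The
five-argument predicate pairs the keys with the values and strips
the keys off the sorted result in a single traversal:

```prolog
sort_values_by_keys(Keys, Values, SortedValues) :-
        sort_aux(Keys, Values, Pairs, SortedValues, SortedPairs),
        keysort(Pairs, SortedPairs).
```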

    Let me give you another example.  Imagine that we have a
data type 'country' with 100 country names as nullary constructor
functions, and a data type 'city' with 300 names as nullary
constructor functions, and that we want to represent the relation
between countries and capitals.  In a functional language, we would
write
        capital(belgium) = brussels.
        capital(britain) = london.
        ...
        capital(bolivia) = sucre.       % I asked 'chat' for this one

        is_capital_of(brussels)  = belgium.
        is_capital_of(london)    = britain.
        ...
        is_capital_of(sucre)     = bolivia.
        is_capital_of(berkeley)  = failwith 'is_capital_of'.
        ...
        is_capital_of(new_york)  = failwith 'is_capital_of'.
Writing down 200 equations that say so-and-so is not the capital of
any country is rather tedious, but at least there are only 200 such
equations in this example.

    Now suppose that we code this in Prolog.  It is perfectly
plausible that we will get questions like 'is it true that new_york
is the capital of the usa?'.  In the 'pure' style, we just list the
100 clauses which are true:
        capital(belgium, brussels).
        capital(britain, london).
        ...
        capital(bolivia, sucre).
But in the other style, we have to list all the sensible combinations
which are false, as well.  And that is 29,900 of them!
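With the 100 'pure' clauses alone, both kinds of question are
answered at the top level:

```prolog
%   ?- capital(bolivia, X).         % X = sucre
%   ?- capital(X, sucre).           % X = bolivia
%   ?- capital(usa, new_york).      % no: false by omission, at a
%                                   % cost of zero extra clauses
```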

    So now we see in full clarity the hidden assumption behind the
style Lagache favours:

        there is always a SMALL number of arguments
        which are always inputs, and they are made from
        a SMALL set of constructor functions, and the
        other arguments are always outputs.

    As Lee Naish pointed out, there is nothing in the 'pure' version
of merge/3 to encourage you to regard any of the arguments as special:
the 'pure' version makes a simple logical statement which a Prolog
system can use to solve for any two of the arguments given the others.
The redundant clauses in the 'defensible' version actually serve to
mislead the reader:  they suggest that the first two arguments are
very different from the last one.  That is, they put BLINKERS over
the reader's eyes.

    So we see that the practice of introducing ":- fail" clauses
(1) is laudable practice in functional programming languages
(2) is part of a "functional" mind-set which discourages effective
    use of Prolog
(3) cannot be consistently carried out for predicates which have
    many inputs, or inputs with many cases.

    We simply CANNOT follow this practice in many cases.  Suppose we
SOMETIMES follow the practice.  Different programmers will be patient
to different degrees.  So a reader looking at a piece of code which
does not use :-fail clauses has to ask himself "is this significant,
or did the programmer just get tired?".  Bah!

    Since we cannot carry the practice out all the time, it is better
never to start.  People who publish source code have no right to
squander their reader's time like this.

    Apart from this inevitable "global" inconsistency in the :-fail
style, does it tend to have good or bad effects in other ways?
Well, we've already seen that there is a temptation to use :-!,fail
instead, which is almost always wrong.  Can we find any other
problems in Lagache's code that can be attributed to this practice?
Yes we can.

        nthcdr(Tail, 0, Tail).
        nthcdr([], _, []) :- !, fail.
        nthcdr([_|Tail], Index, Result) :-
                Next is Index-1,
                nthcdr(Tail, Next, Result).

THIS WORKS (at least, it is naively correct).  But notice what it says!
The first two clauses manage to claim that nthcdr([],0,[]) is both
true (first clause) and false (second clause).  This is "logic"
programming?  He has also managed to obscure the fact that this is a
perfectly ordinary induction on the second argument.
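Written as the ordinary induction the text describes, with an
explicit guard so that no clause contradicts another (this
reconstruction keeps Lagache's argument order and, like his
version, assumes the index is supplied):

```prolog
nthcdr(List, 0, List).
nthcdr([_|Tail], Index, Result) :-
        Index > 0,
        Next is Index-1,
        nthcdr(Tail, Next, Result).
```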

    We see, therefore, that there is some evidence that the :-fail
style may confuse the writer as well as the reader.  There are
objective reasons to prefer the 'pure' style.

    Another reason to prefer the 'pure' style is that it naturally
arises in a disciplined approach to designing logic programs.  For
example, it is often easiest to design a predicate by induction on
the OUTPUT.  You want to know "what inputs could give rise to this
output", and listing all sorts of inputs that COULDN'T really does
not help.


    What about quicksort and univ?  It has been pointed out to me
that the DEC-10 Prolog manual, the C Prolog manual, and so on, do
not say anything about the efficiency or otherwise of 'univ'.
The answer is that I expect "a competent Prolog programmer who is
concerned about efficiency" to do what I did:

 -  THINK about it.  Something which forces you to construct a
    list which you don't want should arouse suspicion.

 -  MEASURE it.  Your intuition about Prolog efficiency is not
    to be trusted.  My intuition about Prolog efficiency is not
    to be trusted.  The way to find out what is the efficient
    way to code something is to code as many variants as you
    have wit and energy for and MEASURE them.
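As a concrete sketch of "MEASURE it": the timing convention below
(statistics/2 with the runtime key) is the DEC-10/Quintus one and
varies between systems, and the term f(N,N) and the loop are
arbitrary choices of mine:

```prolog
benchmark(N) :-
        statistics(runtime, _),         % reset the msec counter
        univ_loop(N),
        statistics(runtime, [_,T1]),
        functor_loop(N),
        statistics(runtime, [_,T2]),
        format('univ: ~w msec, functor/arg: ~w msec~n', [T1,T2]).

univ_loop(0) :- !.
univ_loop(N) :-
        _ =.. [f,N,N],                  % builds the list [f,N,N] first
        M is N-1,
        univ_loop(M).

functor_loop(0) :- !.
functor_loop(N) :-
        functor(T, f, 2),               % builds f(_,_) directly
        arg(1, T, N),
        arg(2, T, N),
        M is N-1,
        functor_loop(M).
```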

If anybody other than my critic interpreted my statements about
"univ" as an attack on Lagache, please stand corrected.  I never supposed
that efficiency was Lagache's first concern in his library, nor
that it should have been.  The statement was intended as a piece
of advice directed to everybody.

    The other objection to my initial comments on Lagache's library
was my strong criticism of his recommending quicksort.  (And let me
remind you that including something in a library is a "structural"
recommendation to use it, unless there is an explicit counter-claim.)
My critic reckoned that Lagache was not setting himself up as an
expert on sorting methods in Prolog or any other language, and should
not be criticised for not having informed himself of the problems of
quicksort and the virtues of merge sort.

    My counter-claim is that publishing library source code IS a
structural claim of expertise, and that someone who recommends a
particular sorting routine to a large group of people has an
obligation to inform himself of at least the elementary facts
about sorting routines.  It is not necessary to have written
"The Art of Computer Programming, Volume 3: Sorting and Searching".  But
one ought to have READ it.

    What is the advantage of quick-sort over merge-sort?  It can
be implemented very efficiently >>in languages with cheap
updateable arrays<<.  The point is that the partitioning phase
can be done in-place.  This means that the only extra storage
that quick-sort needs is space for a very small stack (a fixed
table of 2*N words will let you sort arrays of up to 2**N words).

    This advantage does not exist in pure functional languages,
and it does not exist in Prolog.  Even with SICStus Prolog's
backtrackable setarg/3, this advantage does not exist.  Lagache's
code for quick-sort does not sort the list in place: it constructs
lots and lots of new lists.  As it should.  That's how you do it
in Prolog.
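For comparison, this is roughly what the merge-sort alternative
looks like in the same constructive style (a sketch; the predicate
names are mine, chosen not to clash with anything, and library
versions split the list more cleverly):

```prolog
list_msort([], []).
list_msort([X], [X]).
list_msort([X,Y|Zs], Sorted) :-
        split([X,Y|Zs], Front, Back),   % deal elements alternately
        list_msort(Front, S1),
        list_msort(Back, S2),
        ordered_merge(S1, S2, Sorted).

split([], [], []).
split([X|Xs], [X|Ys], Zs) :-
        split(Xs, Zs, Ys).              % swap the outputs each step

ordered_merge([], L2, L2).
ordered_merge([X|Xs], [], [X|Xs]).
ordered_merge([X|Xs], [Y|Ys], [X|Zs]) :-
        X @=< Y, !,                     % prefer the left on ties:
        ordered_merge(Xs, [Y|Ys], Zs).  % this is what makes it stable
ordered_merge(Xs, [Y|Ys], [Y|Zs]) :-
        ordered_merge(Xs, Ys, Zs).
```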

    Is there any other advantage of quick-sort over merge-sort?
Well, look at the name.  Surely it must be faster.  WRONG.  Even
in languages like Fortran, Pascal, or ADA, quick-sort is not the
method of choice.  There have indeed been studies which have gone
as far as counting individual instructions, and every one of them
has concluded that quick-sort is a very fast method indeed.  But
then, every one that I have read was actually considering the
very special case of sorting an array of numbers, where comparing
two numbers was a single machine instruction.  In that case, OF
COURSE quick-sort is going to look good.  It does very little
book-keeping.  But if you want to sort an array of integers,
there are methods with O(N) expected cost, so you would be silly
to use quick-sort with its O(N.lgN) expected cost.  You can
recode k-bit floating-point numbers as k-bit integers so as to
preserve order (the PDP-10 actually used the same instructions
to compare integers and floats) so the same methods can be used
to sort arrays of k-bit floats.  As soon as comparison becomes
at all costly (such as when you are sorting strings, or when you
are calling a user-supplied comparison routine), quick-sort turns
out to be less efficient than merge-sort even in conventional
languages, because it does more comparisons.

    I have to admit that most people probably don't know that,
and that it would not be fair for me to expect Lagache to know
it.  But it IS fair to expect someone who has heard of quick-
sort to know that its worst-case behaviour is O(N^2), and that
the version which Lagache distributed has the worst case when
the list is already sorted, and it is also fair to expect
someone to know that merge sort has O(N.lgN) worst case time as
well as O(N.lgN) average time, and that the constant factors
aren't all that different either.  You don't actually have to
understand Knuth Vol 3, you just have to look at the
conclusions.

    Another of quicksort's defects is that it suffers badly when
the input contains lots of duplicates.  Handling them in place
is rather tricky, though methods have been published.  The good
news is that handling duplicates in the Prolog version is trivial,
and that doing so restores stability.  But that does nothing about
the worst-case behaviour.
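The trivial duplicate handling referred to is a three-way
partition; a sketch (the predicate names are mine), in which
elements equal to the pivot are collected in input order instead of
being re-partitioned:

```prolog
qsort3([], []).
qsort3([Pivot|Rest], Sorted) :-
        partition3(Rest, Pivot, Less, Equal, Greater),
        qsort3(Less, SL),
        qsort3(Greater, SG),
        append(SL, [Pivot|Equal], Front),
        append(Front, SG, Sorted).

partition3([], _, [], [], []).
partition3([X|Xs], Pivot, [X|L], E, G) :-
        X @< Pivot, !,
        partition3(Xs, Pivot, L, E, G).
partition3([X|Xs], Pivot, L, [X|E], G) :-
        X == Pivot, !,
        partition3(Xs, Pivot, L, E, G).
partition3([X|Xs], Pivot, L, E, [X|G]) :-
        partition3(Xs, Pivot, L, E, G).
```

A list of equal elements now costs one pass, but a list that is
already sorted (and duplicate-free) is still quadratic.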

    It was wrong of me to describe as "perverse" Lagache's
distribution of an unstable quadratic-worst-case algorithm when a
stable n-log-n algorithm has been in use for years.  I would only
have been entitled to say that if I had known that he knew what
he was doing.  I didn't (and don't) know that.  But I do know
that he ought to have known what he was doing, and that in his
documentation, he claims ("the execution time is quite reasonable")
that he does know what he was doing.

    A comment something like this

        "This is the best sorting method I know of in Prolog.
        If anyone knows of a better one, please tell me."

would have served as a counter-claim to the structural claim of
expertise.

    Someone who offers his code for general use must expect
criticism of it.  You have no right to say BOTH "here, you ought
to use this" AND "no nit-picking".  I have no right to expect
the code I sent to the library in '83 and '84 to be immune from
criticism.  I wish it HAD been criticised, there were several
things wrong with it.  (Every flaw that I learn of is fixed in
the Quintus version of the library.  The more criticism of my
code the better, say I.)  If you offer people medlars, you must
expect persimmons back.

------------------------------

End of PROLOG Digest
********************