[net.lang.prolog] More on LIPS and benchmarking problems

mckay@burdvax.UUCP (06/01/83)

.ll 6.5
.pp
In many articles and talks about logic programming implementations, and in
this newsgroup, the
efficiency of a system is measured in "resolutions per second" or
"logical inferences per second" (aka LIPS). These figures are usually quoted
for Horn clause systems of various flavors, e.g. DEC-10/20 Prolog and
LOGLISP. By far the fastest systems appear to be the DEC-10/20 Prolog systems,
which claim rates of 20,000 to 40,000 "resolutions per second" for compiled
clauses.
.pp
Stating the benchmark is important because the specific program being run
interacts with the definition of "LIPS". Consider the two obvious definitions.
The first is the one suggested by the phrase "resolutions per second".
Here one counts successful unifications of a literal with the
head of a clause from some procedure, but the work measured also covers
finding the appropriate clauses with which to attempt unification. The
measurement therefore includes the cost of many unification attempts that fail.
Such a figure is extremely dependent on the
order of clauses within a procedure and the arity of the predicate involved,
and it can be severely affected by the "shape" of clauses or literals.
The second definition counts attempted unifications, regardless of whether
they fail or succeed. This equates LIPS with unification speed. One still has
the problem with the arity of the literals involved, but the problem with the
order of clauses is minimized. While unification is a critical component
of a logic programming system, it does not by itself measure the "progress" of
a computation.
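.pp
To make the difference between the two definitions concrete, here is a minimal
sketch of a Horn clause interpreter that counts both quantities. It is written
in Python only for illustration and is not how any of the systems mentioned
above work. The names (Var, unify, solve, first_solution_counts) and the tiny
"color" program are assumptions of this sketch. There is no clause indexing
and no occurs check.

```python
class Var:
    """A logic variable; only object identity matters."""


def walk(t, s):
    """Follow variable bindings in substitution s until a non-bound term."""
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Return an extended substitution, or None if a and b do not unify."""
    a, b = walk(a, s), walk(b, s)
    if a is b:
        return s
    if isinstance(a, Var):
        return {**s, a: b}
    if isinstance(b, Var):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return s if a == b else None


def rename(t, m):
    """Copy term t, replacing each variable with a fresh one (recorded in m)."""
    if isinstance(t, Var):
        if t not in m:
            m[t] = Var()
        return m[t]
    if isinstance(t, tuple):
        return tuple(rename(x, m) for x in t)
    return t


def solve(goals, s, program, counts):
    """Depth-first, left-to-right resolution; yields one substitution per solution."""
    if not goals:
        yield s
        return
    goal, rest = goals[0], goals[1:]
    for head, body in program:          # clauses are tried in textual order
        m = {}
        head, body = rename(head, m), [rename(g, m) for g in body]
        counts["attempted"] += 1        # second definition: every attempt
        s2 = unify(goal, head, s)
        if s2 is not None:
            counts["successful"] += 1   # first definition: successes only
            yield from solve(body + rest, s2, program, counts)


def first_solution_counts(program, query):
    """Run the query up to its first solution and return both counters."""
    counts = {"attempted": 0, "successful": 0}
    next(solve([query], {}, program, counts))
    return counts


# The same three facts in two different clause orders.
slow = [(("color", "red"), []), (("color", "green"), []), (("color", "blue"), [])]
fast = list(reversed(slow))
query = ("color", "blue")

print(first_solution_counts(slow, query))  # {'attempted': 3, 'successful': 1}
print(first_solution_counts(fast, query))  # {'attempted': 1, 'successful': 1}
```

.pp
Both orderings perform one "resolution", yet the first ordering does three
times the unification work. Under the first definition a system would show a
different rate for the two orderings; under the second the rate per attempt
stays comparable, but the count no longer says how far the computation got.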
.pp
All of this suggests that LIPS (whichever definition one uses) is extremely
application specific. Therefore, if one quotes a LIPS figure for a particular
system, one MUST state what program the measurement was made with, i.e. a plain
LIPS figure is not good enough; it must be "LIPS with respect to X".
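.pp
One way to keep a figure from being quoted bare is to store the benchmark and
the counting definition alongside the number. The following is a sketch only.
The LipsFigure record, its field names, and the sample values are all
hypothetical and are not measurements of any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LipsFigure:
    """A LIPS figure that always carries what it was measured on."""
    system: str
    benchmark: str    # the "X" in "LIPS with respect to X"
    definition: str   # "successful" or "attempted" unifications
    inferences: int   # count under the stated definition
    seconds: float    # elapsed time for the run

    @property
    def lips(self):
        return self.inferences / self.seconds

    def __str__(self):
        return (f"{self.system}: {self.lips:,.0f} LIPS "
                f"({self.definition} unifications) "
                f"with respect to {self.benchmark}")


# Hypothetical values, for illustration only.
fig = LipsFigure("hypothetical-system", "hypothetical-benchmark",
                 "successful", 30000, 1.5)
print(fig)
```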
.pp
Therefore:
.(b
What is appropriate to measure for such systems?
.br
What are reasonable benchmarks?
.br
What have you used for benchmarks in the past?
.br
Which specific logic programs did you choose as benchmarks, AND WHY?
.br
What measurements are there for the various systems?
.)b