mckay@burdvax.UUCP (06/01/83)
.ll 6.5
.pp
In many articles and talks about logic programming implementations, and in this newsgroup, the efficiency of a system is measured in "resolutions per second" or "logical inferences per second" (aka LIPS). These figures are usually quoted for Horn clause systems of various flavors, e.g. DEC-10/20 Prolog and LOGLISP. By far the fastest systems appear to be the DEC-10/20 Prolog systems, which claim rates of 20,000 to 40,000 "resolutions per second" for compiled clauses.
.pp
A benchmark is important because the specific program being run interacts with the definition of "LIPS". Consider the two obvious definitions. The first is the one suggested by the phrase "resolutions per second". This suggests that one is measuring successful unifications of a literal with the head of a clause from some procedure, together with the work of finding the appropriate clauses with which to attempt unification. The measurement therefore includes many unification attempts that may fail. It is extremely dependent on the order of clauses within a procedure and on the arity of the predicate involved, and it can be severely affected by the "shape" of the clauses or literals. The second definition to consider is attempted unifications, regardless of whether they fail or succeed. This would equate LIPS with unification. One still has the problem with the arity of the literals involved, but the problem with the order of clauses is minimized. While unification is a critical component of a logic programming system, it does not by itself measure the "progress" of a computation.
.pp
All of this suggests that LIPS (whichever definition one uses) is extremely application specific. Therefore, anyone quoting a LIPS figure for a particular system MUST state what the measurement was made with, i.e. a plain LIPS figure is not good enough; it must be "LIPS with respect to X".
.pp
Therefore:
.(b
What is appropriate to measure for such systems?
.br
What are reasonable benchmarks?
.br
What have you used for benchmarks in the past?
.br
Which specific logic programs did you choose as benchmarks, AND WHY?
.br
What measurements are there for the various systems?
.)b
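.pp
To make the difference between the two definitions concrete, here is a minimal sketch of a toy Horn clause interpreter in Python that counts both quantities for the same query. It is only an illustration under stated assumptions: it is not how DEC-10/20 Prolog or LOGLISP work internally, and every name in it (unify, solve, first_answer_counts, the color facts) is hypothetical.

```python
import itertools

# Assumed term encoding: variables are strings starting with an uppercase
# letter, atoms are other strings, compound terms are tuples (functor, args...).
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    # Follow variable bindings in substitution s.
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Return an extended substitution, or None if a and b do not unify."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def rename(t, n):
    # Give clause variables fresh names for each use of the clause.
    if is_var(t):
        return f"{t}_{n}"
    if isinstance(t, tuple):
        return tuple(rename(x, n) for x in t)
    return t

def solve(goals, s, program, stats, fresh):
    # Depth-first, left-to-right search; clauses are tried in program order.
    if not goals:
        yield s
        return
    first, rest = goals[0], goals[1:]
    for head, body in program:
        n = next(fresh)
        head, body = rename(head, n), [rename(g, n) for g in body]
        stats["attempts"] += 1           # second definition: every attempt
        s2 = unify(first, head, s)
        if s2 is not None:
            stats["successes"] += 1      # first definition: a resolution
            yield from solve(body + rest, s2, program, stats, fresh)

def first_answer_counts(program, query):
    # Count the work done up to the first answer only.
    stats = {"attempts": 0, "successes": 0}
    next(solve([query], {}, program, stats, itertools.count()), None)
    return stats

facts = [(("color", "a", "red"), []),
         (("color", "b", "green"), []),
         (("color", "c", "blue"), [])]
query = ("color", "c", "X")

late = first_answer_counts(facts, query)         # matching clause tried last
early = first_answer_counts(facts[::-1], query)  # matching clause tried first
print(late, early)
```

In this sketch both clause orderings record one successful resolution, but the first ordering needs three unification attempts and the reversed ordering needs one. A "resolutions per second" figure for this query would therefore move with clause order even though the logical content of the program is unchanged, while a count of attempts tracks the work done but says nothing about progress.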