[net.lang.prolog] metering Prolog performance: LIPS?

smh@mit-eddie.UUCP (Steven M. Haflich) (09/28/84)

The speed of Prolog implementations is usually quantified in Logical
Inferences Per Second (LIPS), but I have never seen specified exactly
what qualifies as a LI.  The most obvious count is the number of times
the evaluator reaches an EXIT port (in the "box model").  This would
thus count each time a rule is satisfied, recursively including
satisfaction of subclauses.  Can someone confirm that this convention?

However, depending on details of benchmark code and even coding style,
the number of EXIT ports reach may not be a very good measure of the
work the evaluator is doing.  (It's a little like counting *lines* of
code executed by Pascal.)  As the refutation of a clause is a useful
result, perhaps it would make more sense to count CALL ports instead?
It would seem important to include in this count satisfaction of
"facts", that is, rules with no subclauses, although this might be
awkward to meter in some database implementations.

There must be any number of interesting issues lurking under the
occasional LIPS figures we see.  Any takers?  (Probably it's all been
discussed before...)

Steve Haflich, {decvax!genrad, ihnp4}!mit-eddie!smh, smh@mit-ems@mit-mc

rggoebel@water.UUCP (Randy Goebel) (09/29/84)

I have not seen anyone provide a precise specification of
what a logical inference is (w.r.t. Prolog implementation), 
although the notion of a Prolog inference has a precise 
machine-independent definition.  The inference in question is
a resolution inference; for any two complementary literals P 
and Q, a successful unification of P and Q (i.e., generating 
a substitution that makes P syntactically equivalent to Q) 
sanctions the inference of the null clause.  The cost of an
inference is the cost of successful unification plus the cost
of applying the inference rule to derive the null clause.

In practice, the counting of LIPS is potentially misleading for
several reasons.   For example, the actual time for any given
unification depends on the terms being unified (e.g., constants 
versus lists) and their internal representation as data structures
(e.g., structure-sharing, copying, demand copying).  Another
important consideration questions the literal meaning of ``logical
inference;'' while failed unifications certainly consume time, they
do not sanction any inference...should they be counted?  Still
another consideration is that some may argue that what should be
counted is the computational notion of ``predicate call,'' i.e.,
the time it takes to reduce a single goal to the sequence of goals
resulting from a successfully matched rule.   The accuracy of
this measure if affected by such things as clause database indexing,
data representation, even secondary storage delays in systems that
manage external store.

Still counting LIPS seems not an unreasonable measure, and it
is used as a rough indication of speed by nearly everyone I
know.  It always provides a starting point in a conversation
between two implementors, who seek further details when a large
discrepancy in LIPS indicates some perhaps fundamental implementation
advance.

Now that I've committed my self to the belief that the computational
notion of a LIPS is a bit sketchy, I'm quite sure that several people
will feel compelled to contribute their opinion.   At least I hope
that's the case.

Cheers,
Randy Goebel
Logic Programming and Artificial Intelligence Group
University of Waterloo

schachte@ittvax.UUCP (Peter Schachte) (10/01/84)

> I have not seen anyone provide a precise specification of
> what a logical inference is (w.r.t. Prolog implementation), 
> although the notion of a Prolog inference has a precise 
> machine-independent definition.  The inference in question is
> a resolution inference; for any two complementary literals P 
> and Q, a successful unification of P and Q (i.e., generating 
> a substitution that makes P syntactically equivalent to Q) 
> sanctions the inference of the null clause.  The cost of an
> inference is the cost of successful unification plus the cost
> of applying the inference rule to derive the null clause.
> 
> In practice, the counting of LIPS is potentially misleading for
> several reasons.   For example, the actual time for any given
> unification depends on the terms being unified (e.g., constants 
> versus lists) and their internal representation as data structures
> (e.g., structure-sharing, copying, demand copying).  Another
> important consideration questions the literal meaning of ``logical
> inference;'' while failed unifications certainly consume time, they
> do not sanction any inference...should they be counted?  Still
> another consideration is that some may argue that what should be
> counted is the computational notion of ``predicate call,'' i.e.,
> the time it takes to reduce a single goal to the sequence of goals
> resulting from a successfully matched rule.   The accuracy of
> this measure if affected by such things as clause database indexing,
> data representation, even secondary storage delays in systems that
> manage external store.
> 
> Still counting LIPS seems not an unreasonable measure, and it
> is used as a rough indication of speed by nearly everyone I
> know.  It always provides a starting point in a conversation
> between two implementors, who seek further details when a large
> discrepancy in LIPS indicates some perhaps fundamental implementation
> advance.
> 
> Now that I've committed my self to the belief that the computational
> notion of a LIPS is a bit sketchy, I'm quite sure that several people
> will feel compelled to contribute their opinion.   At least I hope
> that's the case.
> 
> Cheers,
> Randy Goebel
> Logic Programming and Artificial Intelligence Group
> University of Waterloo

*** REPLACE THIS LINE WITH YOUR MESSAGE ***

	There's another potential problem with LIPS:  there's no
indication of how big the terms being unified are.  If your benchmark
involves predicates with many arguments, or involves unifying large
terms, timings will seem slow.  How big is a "logical inference?"
	A good suite of benchmmarks is probably the only way to compare
prolog implimentations fairly.  No single number will do it.

Peter Schachte (decvax!ittvax!schachte)