[comp.arch] BENCHMARKS AND LIPS

MLWLG@CUNYVM.CUNY.EDU (11/23/88)

DEAR READER,

I WOULD LIKE INFORMATION ON MEASURING THE LIPS OF A SYSTEM, I.E.,
LOGICAL INFERENCES PER SECOND. FOR EXAMPLE I HEAR THAT A PSI COMPUTER
CAN RUN, SAY, AT 100K LIPS. IS THERE SOME STANDARD PROGRAM (I.E., PROLOG)
THAT IS USED?

ALSO, HOW DO LIPS COMPARE TO FLOPS?

THANK YOU IN ADVANCE.

SINCERELY,
LARRY.

ok@quintus.uucp (Richard A. O'Keefe) (11/26/88)

In article <1740MLWLG@CUNYVM> MLWLG@CUNYVM.CUNY.EDU writes:
>I WOULD LIKE INFORMATION ON MEASURING THE LIPS OF A SYSTEM, I.E.,
>LOGICAL INFERENCES PER SECOND. FOR EXAMPLE I HEAR THAT A PSI COMPUTER
>CAN RUN, SAY, AT 100K LIPS. IS THERE SOME STANDARD PROGRAM (I.E., PROLOG)
>THAT IS USED?

Logical Inferences Per Second is a property of a _Prolog_ implementation,
not of any other sort of system, and is defined by the Naive Reverse
benchmark _only_.  Divide 496 by the time in seconds to reverse a 30-
element list the hard way, and you get LI/s.  It has been fairly widely
agreed for about the last 10 years that much better benchmarks are
needed.  A Motorola MC68020 running at Sun-3/50 speeds can get 150kLI/s
(using native code).  Surprise! the benchmark fits entirely into the
on-chip cache...  Two popular semi-realistic benchmarks use CHAT-80 and
the Berkeley PLM compiler.

>ALSO, HOW DO LIPS COMPARE TO FLOPS?
About like apples and eggplants.  You can have a high LI/s rating and a
low FlOp/s rating (e.g. a 68020 with software floating point).  You can
have a low LI/s rating and a high FlOp/s rating (a Cray running a poor
interpreter).  The LI/s rating, for a good Prolog system, depends on
memory bandwidth, integer operations (like bit tests and comparisons),
and branches.

reiter@babbage.harvard.edu (Ehud Reiter) (11/29/88)

In article <1740MLWLG@CUNYVM> MLWLG@CUNYVM.CUNY.EDU writes:
>I WOULD LIKE INFORMATION ON MEASURING THE LIPS OF A SYSTEM, I.E.,
>LOGICAL INFERENCES PER SECOND

From a technical point of view, "LIPS" is a term used by PROLOG types to
refer to the number of unifications per second their systems can
perform when running a trivial program such as naive reverse.  See,
for example, Sterling&Shapiro, THE ART OF PROLOG, pg 48 and 203.

I must say, though, that calling such a unification a "logical inference"
ranks pretty high on my list of misleading names.  If what you're really
interested in is evaluating how good a computer system is at performing
symbolic computations, I suggest you start by looking at Gabriel's book
on benchmarking LISP, which contains a lot of thoughtful analysis on
what is necessary for good performance on symbolic (and numeric) programs
written in LISP.

					Ehud Reiter
					reiter@harvard	(ARPA,BITNET,UUCP)
					reiter@harvard.harvard.EDU  (new ARPA)

s8504867@mqcomp.oz (John Gardner) (11/30/88)

In article <746@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>In article <1740MLWLG@CUNYVM> MLWLG@CUNYVM.CUNY.EDU writes:
>>THAT IS USED?
>
>Logical Inferences Per Second is a property of a _Prolog_ implementation,
>

Come off the grass.  That's like saying MIPS are a property of C
only.  While it is nice to do benchmarks by only varying what you want to
compare, it is certainly valid to calculate LIPS using any theorem prover,
not just prolog.  Do you think that all other theorem provers are incapable
of logical inferences ?  Do you think prolog is the only langauge availible
for this sort of work ?

------------------------------------------------------------------------------
John Gardner              ACSnet,Ean,CSnet: s8504867@mqcomp.mq.oz
Computing Discipline      Arpa:  s8504867%mqcomp.mq.oz@UUNET.UU.NET
Macquarie University 2109 Janet: s8504867%mqcomp.mq.oz@UK.AC.UKC
Australia                 UUCP:  {uunet,mcvax}!munnari!mqcomp.mq.oz!s8504867

[  Macquarie University  - Where you are not just a number .. you are
   a letter followed by seven numbers. ]

rang@cpsin3.cps.msu.edu (Anton Rang) (12/02/88)

In article <595@mqcomp.oz>, John Gardner (s8504867@mqcomp.mq.oz) writes:
>In article <746@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>>In article <1740MLWLG@CUNYVM> MLWLG@CUNYVM.CUNY.EDU writes:
>>>THAT IS USED?
>>
>>Logical Inferences Per Second is a property of a _Prolog_ implementation,
>>
>
>Come off the grass.  That's like saying MIPS are a property of C
>only.  While it is nice to do benchmarks by only varying what you want to
>compare, it is certainly valid to calculate LIPS using any theorem prover,
>not just prolog.

  It's not valid to calculate and compare them using any theorem
prover.  The term "LIPS" as it is usually used refers to the internal
workings of PROLOG (unification, etc.).  While it would be possible to
find LIPS with any theorem prover, the results couldn't be compared
against PROLOG ones.
  The basic problem is the definition of a "logical inference".
PROLOG has one; other languages would have others.  MIPS is a property
of a system, not a language, because you can always look at the system
and count the number of instructions it's running!  You can't look and
just say "Oh, that's...346 logical inferences" for any given proof,
because there are a very large number of ways to handle it.  Even
PROLOG people don't always agree on the LIPS count for a program.
  The "LIPS" term is misleading, but it *is* a property of PROLOG (at
least, as it is currently used).

+---------------------------+------------------------+----------------------+
| Anton Rang (grad student) | "VMS Forever!"         | "Do worry...be SAD!" |
| Michigan State University | rang@cpswh.cps.msu.edu |                      |
+---------------------------+------------------------+----------------------+

ok@quintus.uucp (Richard A. O'Keefe) (12/02/88)

In article <595@mqcomp.oz> s8504867@mqcomp.mq.oz (John Gardner) writes:
>In article <746@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>>Logical Inferences Per Second is a property of a _Prolog_ implementation,

>Come off the grass.  That's like saying MIPS are a property of C
>only.  While it is nice to do benchmarks by only varying what you want to
>compare, it is certainly valid to calculate LIPS using any theorem prover,
>not just prolog.  Do you think that all other theorem provers are incapable
>of logical inferences ?  Do you think prolog is the only langauge availible
>for this sort of work ?

Stop knocking down straw men; you'll only get straw in your hair and then
what will the neighbours think?

It is no more "valid to calculate LIPS using any theorem prover" than it
is to calculate Dhrystones "using any program".  Remember: the meaning
of a compound noun is *not* a simple composition of the meanings of the
words it is made from.  Is a foot-hill a hill made of feet?  Is a
benchmark a mark on a bench?  (Not now.)  Yes, other programs are
theorem provers capable of drawing logical inferences.  Prolog is a
pretty weak theorem prover, that's _why_ it is usable as a programming
language.  If you compare the number of resolutions per second in a good
theorem prover (say Markgraf Karl, or ITP) with the kind of LIPS rating
a good Prolog should get (a) the theorem prover would look _terrible_,
and (b) you would learn nothing of interest.  In fact, you don't learn
a whole lot comparing the LIPS rating of two Prolog systems, either.
Run some tests and you can get quite a few more procedure calls per second
than the LIPS rating; run some others and you can get far fewer.

We should be trying to get rid of "LIPS", not trying to spread the disease.

For comparing theorem provers in general, there's a book of examples
from one of the Argonne crowd which might be useful for benchmarking.
Cpu and wall time in seconds to solve each of those problems would be
much more illuminating than a single figure which favours depth-first
search.

mac3n@babbage.acc.virginia.edu (Alex Colvin) (12/02/88)

In various articles
>   The "LIPS" term is misleading, but it *is* a property of PROLOG (at
> least, as it is currently used).

it's true, it's even pretty much restricted to nai:ve reverse. think of
LIPS as comparable to Dhrystones rather than MIPS. bad terminology.
should've been called something like NRPS or Lhipstones.

jec@iuvax.cs.indiana.edu (James E. Conley) (12/03/88)

	MIPS and LIPS have a similar problem-- that you may not be counting
the same sorts of instructions or logical inferences.  The VAX instruction
to add two integers and the MIPS instruction really don't do the same thing
and similarly a logical inference has the same sort of problem-- even if
you are dealing with the same system (Prolog in this case).  This gets worse
if you start comparing other systems which do their inferences in a different
manner (production systems like YAPS and OPS5 come to mind).

	As usual, benchmarking using LIPS will get you a number with no
solid meaning (as with MIPS).

    III			Usenet:     iuvax!jec
UUU  I  UUU		ARPANet:    jec@iuvax.cs.indiana.edu
 U   I   U		Phone:      (812) 855-7729
 U   I   U		U.S. Mail:  James E. Conley
 U   I   U			    Indiana University 
  UUUIUUU			    Dept. of Computer Science
     I				    004 Lindley Hall
    III				    Bloomington, IN. 47405

jwmills@iuvax.cs.indiana.edu (Jonathan Mills) (12/03/88)

To expand a little on what Richard has said (and I agree with him) LIPS 
depend heavily on the ability to perform the following fragment of Warren 
abstract machine code (the inner loop of list concatenate - see An Abstract 
Prolog Instruction Set by D.H.D. Warren, SRI Tech Note 309, October 1983):

conc/3:	switch_on_term C1a, C1, C2, fail

<variable & nil code omitted>

C2:		get_list	A1	% conc( [
		unify_variable 	X4	%	X|
		unify_variable	A1	%	L1],

		get_variable 	X2,A2	%	L2,  <omit because X2 = A2>

		get_list	A3	%	[
		unify_value	X4	%	X|
		unify_variable	A3	%	L3]  ) :-

		execute conc/3		%	conc( L1, L2, L3 ).


Because the functionality of this instruction sequence could be duplicated 
by writing it in assembler, Pascal, C, LISP, or whatever, the "LIPS" for 
ANY language or machine can be obtained... and be just as meaningful as you 
like.  Or as meaningless.

LIPS bear little relation to inferences in the rest of the system for other 
reasons:  smart Prolog compilers look for concatenate-like operations, and 
produce code optimized for it (so LIPS may vary between NREV and any other 
program by a factor of 2 to 10).  And if one is willing to accept as LIPS 
the rate at which a Prolog can do procedure calls, unbelievably high (and 
non-representative) speeds can be claimed.  (For example, benchmarking "a 
:- a", which reduces to "a: goto a" in assembly language produces a LIPS 
rate *approximately* equal to the MIPS rate of your CPU).  And I am still
embarrassed to recall a version of NREV on a DPS-88 that fit both the
program and the list into cache... and ran at 2.7 MegaLIPS...and dropped to
700 KLIPS when the list was made too large for the cache.  Live and learn.

Theorem provers and Prologs can be rated for LIPS, but again, this has 
little relation to the power of the system.  Proving Sam's Lemma takes ITP 
(an interactive theorem prover written in Pascal running on a VAX 11-780) 
16,000 seconds, ALS Prolog (running on an 88000 ANGELFIRE 1) 150 seconds, 
and OTTER (a batch theorem prover written in C running on a VAX 8800) 29 
seconds.  All use the set of support strategy, and generate roughly the 
same number of clauses, so the number of logical inferences are similar - 
but this tells little about what makes the programs' execution times vary 
(aside from the CPU, of course).

Because rewriting operations (subsumption, demodulation, paramodulation), 
clause selection (indexing), unification, and clause integration (asserts) 
all interact to determine how many clauses are generated (i.e., how many 
inferences are *completed*), it is not an easy task to give a single 
"inference rate".  Indeed, it would be NAIVE to do so - LIPS are now part 
of the *Prolog* mythos, and should be left there, as firmly embedded as is 
TAK in the Lisp community.  If anyone out there would undertake Prolog 
benchmarks analogous to Gabriel's benchmarks for Lisp, we would all 
benefit.  At least we could attempt to justify our prejudices with better 
numbers! (:-)

And when I have better numbers for the behavior of various theorem provers
under various constraints over a wide variety of domains, I'll post them.
Don't expect to see anything before 1990.

(ITP is available from NAG (Numerical Algorithms Group), and OTTER is
available from Argonne National Lab, try "mccune@mcs.anl.gov".  I can e-mail
a compiled version of Sam's Lemma (useful as a benchmark) to interested folks,
but the FOL compiler itself is a ways off yet.)